On 28/09/14 03:36, Armindo Rodrigues wrote:

have noted the beginning and end of the quotes list so you can easily skip
and go straight to the code section. ***

It would probably have been better to just delete all but a few of the quotes. We don't need all of them to evaluate your code.

import re
from datetime import datetime
import time


###################  DATA LIST STARTS HERE

data_list=["And now here is my secret, a very simple secret: It is only "
"with the heart that one can see rightly; what is essential is invisible to "
"the eye.",
"All grown-ups were once children... but only few of them remember it.",
...
"If you love a flower that lives on a star, then it's good at night, to "
"look up at the sky. All the stars are blossoming."]


################## CODE STARTS HERE

#Create a list of words taken from each individual word in the datalist
word_list = []
for item in data_list:
     for word in item.split(" "):
         word = re.sub('^[^a-zA-z]*|[^a-zA-Z]*$','', word)

word.strip() would be better here. You can specify a string of chars to be stripped if it's not only whitespace. Consider regular expressions as a weapon of last resort.
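
A minimal sketch of the strip() approach. It assumes the characters to remove are ASCII punctuation plus whitespace; string.punctuation and the clean_word name are my choices for illustration, not something from your code. Note that, unlike your regex, this leaves leading and trailing digits alone.

```python
import string

# Characters to strip from both ends of each word: ASCII punctuation
# plus whitespace. This set is an assumption; adjust it to your data.
STRIP_CHARS = string.punctuation + string.whitespace


def clean_word(word):
    """Remove leading/trailing punctuation; inner characters are kept."""
    return word.strip(STRIP_CHARS)


print(clean_word('"secret,'))   # secret
print(clean_word("it's"))       # it's (the inner apostrophe survives)
print(repr(clean_word("...")))  # '' (nothing left, so filter these out)
```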

         word_list.append(word)
word_list = sorted(list(set(word_list))) #Remove repeated words

You don't need to convert the set into a list. sorted() works
with sets too.
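
For example, with a small made-up word list:

```python
words = ["the", "eye", "the", "heart", "eye"]

# sorted() accepts any iterable, so the set can be passed directly;
# the result is always a new list.
unique_sorted = sorted(set(words))
print(unique_sorted)  # ['eye', 'heart', 'the']
```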

quotesDict = {}
for word in word_list:
     quotesDict.setdefault(word,[]) #Create a dictionary with keys based on each word in the word list

By putting the words in the dictionary you lose the sorting you did above. So the sorting was a waste of time.

for key, value in quotesDict.items():
     indexofquote = 0
     for quote in data_list:

You should use enumerate for this. It will automatically give you the index and quote and be less error prone than maintaining the index yourself.
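
A sketch of the same loop with enumerate(). The two quotes and the three keys are shortened sample data standing in for your full list and dictionary:

```python
data_list = [
    "It is only with the heart that one can see rightly.",
    "All grown-ups were once children.",
]
quotesDict = {"heart": [], "children": [], "only": []}

for key in quotesDict:
    # enumerate() yields (index, quote) pairs, so there is no counter
    # to initialise or to forget to increment.
    for index, quote in enumerate(data_list):
        if key in quote:
            quotesDict[key].append(index)

print(quotesDict)  # {'heart': [0], 'children': [1], 'only': [0]}
```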

         if key in quote:
             quotesDict[key].append(indexofquote) #Append the index of the found quotes to the dictionary key
         indexofquote+=1

query=input("query: ")
query = query.strip(" ").split(" ")
query = list(set(query))


I don't think you need the conversion to list here either.
You can just use the set.

start_time = time.time()

FoundQuotes = []

# Right now the OR search just prints out the index of the found quotes.
if ("or" in query) and ("and" not in query):

The logic here can be simplified by testing for 'and' first

if 'and' in query
   remove 'or'
   process and
elif 'or' in query
   process 'or'
else process simple query
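
In Python that outline could look something like the sketch below. The classify_query name and the (mode, terms) return value are hypothetical, just a way to show the branching without the search code itself:

```python
def classify_query(query):
    """Return (mode, terms) for a collection of query words."""
    terms = set(query)
    if "and" in terms:
        # 'and' takes priority; drop both keywords so that only the
        # actual search terms remain.
        terms -= {"and", "or"}
        return "and", terms
    elif "or" in terms:
        terms.discard("or")
        return "or", terms
    else:
        return "simple", terms


print(classify_query(["heart", "and", "eye"]))
print(classify_query(["heart", "or", "eye"]))
print(classify_query(["heart"]))
```

Each branch then only has to deal with its own kind of search.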



     query.remove("or")
     print("Performing OR search for: ", query)
     for item in query:
         if (item in quotesDict):
             print("FOUND ",len(quotesDict[item]),  " ", item, "QUOTES: ",
quotesDict.get(item))
     print("\n--- Execution ---\n", (time.time() - start_time) * 1000,
"microseconds\n")

else:
     if "and" in query:
         query.remove("and")
     if "or" in query:
         query.remove("or")
     print("Performing AND search for: ", query)

This looks wrong. What about the case where neither and/or are in the query?

     for item in query:
         if (item in quotesDict):
             FoundQuotes = FoundQuotes + (quotesDict.get(item))
     FoundQuotes = list(set([x for x in FoundQuotes if FoundQuotes.count(x) > 1]))

This doesn't look right either.
FoundQuotes is a list of indexes. The comprehension builds a list of all the indexes that appear more than once - what about a quote that was only found once?

It then eliminates all the duplicates (set()) and converts the result back to a list (why not leave it as a set?)

I'd have expected a simple conversion of FoundQuotes to a set would be what you wanted.
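
Something like this, using a small made-up dictionary in place of yours:

```python
quotesDict = {"heart": [0, 3], "eye": [0, 5]}
query = {"heart", "eye", "missing"}

FoundQuotes = []
for item in query:
    if item in quotesDict:
        FoundQuotes = FoundQuotes + quotesDict[item]

# set() keeps each index exactly once, including the indexes that
# were only found for a single word.
unique_quotes = set(FoundQuotes)
print(sorted(unique_quotes))  # [0, 3, 5]
```

If AND is meant to return only the quotes that contain every search word, a set intersection would be one way to do that, but I'm guessing at the intent there.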

     for x in FoundQuotes:
         print(data_list[x])
     print("\n--- Execution ---\n", (time.time() - start_time) * 1000,
"microseconds\n")

The other problem is that you are searching the dictionary
several times, thus losing some of the speed advantage of
using a dictionary.

You would get more benefit from the dictionary if you adopt a try/except approach and just access the key directly. So, instead of:

>      for item in query:
>          if (item in quotesDict):
>              FoundQuotes = FoundQuotes + (quotesDict.get(item))

for item in query:
  try: FoundQuotes = FoundQuotes + quotesDict[item]
  except KeyError: pass

Or better still use the default value of get:

for item in query:
    FoundQuotes = FoundQuotes + quotesDict.get(item,[])
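
Combined with the earlier point about keeping a set rather than a list, a sketch might be (made-up dictionary again, and the found name is mine):

```python
quotesDict = {"heart": [0, 3], "eye": [0, 5]}
query = ["heart", "eye", "rose"]   # 'rose' is not a key

found = set()
for item in query:
    # get() returns the empty list for unknown words, so neither an
    # 'in' test nor a try/except is needed.
    found.update(quotesDict.get(item, []))

print(sorted(found))  # [0, 3, 5]
```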

There are a few other things that could be tidied up but that should give you something to get started with.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos
