On 28/09/14 03:36, Armindo Rodrigues wrote:
have noted the beginning and end of the quotes list so you can easily skip
and go straight to the code section. ***
It would probably have been better to just delete all but a few of the
quotes. We don't need all of them to evaluate your code.
import re
from datetime import datetime
import time
################### DATA LIST STARTS HERE
data_list=["And now here is my secret, a very simple secret: It is only
with the heart that one can see rightly; what is essential is invisible to
the eye.",
"All grown-ups were once children... but only few of them remember it.",
...
"If you love a flower that lives on a star, then it's good at night, to
look up at the sky. All the stars are blossoming."]
################## CODE STARTS HERE
#Create a list of words taken from each individual word in the datalist
word_list = []
for item in data_list:
for word in item.split(" "):
word = re.sub('^[^a-zA-z]*|[^a-zA-Z]*$','', word)
word.strip() would be better here. You can specify a string of chars to
be stripped if it's not only whitespace. Consider regular expressions as
a weapon of last resort.
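A minimal sketch of the strip() approach. The choice of characters to strip (string.punctuation plus whitespace) and the helper name clean_word are assumptions for illustration, not taken from the original code:

```python
import string

# Assumption: punctuation and whitespace are the characters to trim.
STRIP_CHARS = string.punctuation + string.whitespace

def clean_word(word):
    """Trim unwanted characters from both ends; the middle is left alone."""
    return word.strip(STRIP_CHARS)

words = [clean_word(w) for w in "All grown-ups were once children... but".split()]
print(words)
```

Note that, unlike the regular expression, this leaves leading or trailing digits in place, so the two are not exactly equivalent.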
word_list.append(word)
word_list = sorted(list(set(word_list))) #Remove repeated words
You don't need to convert the set into a list. sorted() works
with sets too.
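For example, with a small made-up word list:

```python
word_list = ["heart", "eye", "heart", "secret", "eye"]

# sorted() accepts any iterable, including a set, and always returns a list.
unique_sorted = sorted(set(word_list))
print(unique_sorted)  # ['eye', 'heart', 'secret']
```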
quotesDict = {}
for word in word_list:
quotesDict.setdefault(word,[]) #Create a dictionary with keys based on
each word in the word list
By putting the words in the dictionary you lose the sorting you did
above. So the sorting was a waste of time.
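If you do need the words in order later, one option is to sort the keys at the point of use. A small sketch with made-up index data:

```python
quotesDict = {"secret": [0], "eye": [0], "heart": [0, 2]}

# Sort the keys when they are needed rather than before building the dict.
ordered_words = sorted(quotesDict)
for word in ordered_words:
    print(word, quotesDict[word])
```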
for key, value in quotesDict.items():
indexofquote = 0
for quote in data_list:
You should use enumerate for this. It will automatically give you the
index and quote and be less error prone than maintaining the index yourself.
if key in quote:
quotesDict[key].append(indexofquote) #Append the index of the
found quotes to the dictionary key
indexofquote+=1
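A sketch of the same loop using enumerate(). The quotes and keys here are shortened or made-up sample data, just to keep the example self-contained:

```python
data_list = [
    "It is only with the heart that one can see rightly.",
    "All grown-ups were once children.",
    "The heart sees what the eye cannot.",  # made-up sample quote
]
quotesDict = {"heart": [], "children": []}

for key in quotesDict:
    # enumerate() yields (index, item) pairs, so no manual counter is needed.
    for index, quote in enumerate(data_list):
        if key in quote:
            quotesDict[key].append(index)

print(quotesDict)
```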
query=input("query: ")
query = query.strip(" ").split(" ")
query = list(set(query))
I don't think you need the conversion to list here either.
You can just use the set.
start_time = time.time()
FoundQuotes = []
# Right now the OR search just prints out the index of the found quotes.
if ("or" in query) and ("and" not in query):
The logic here can be simplified by testing for 'and' first:

if 'and' in query
    remove 'or'
    process 'and'
elif 'or' in query
    process 'or'
else
    process simple query
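One way that structure might look in Python. The function name classify_query and the returned mode strings are hypothetical, chosen only for this sketch:

```python
def classify_query(words):
    """Return (mode, terms) where mode is 'and', 'or' or 'simple'."""
    terms = set(words)
    if "and" in terms:
        # 'and' is tested first; drop both keywords from the search terms.
        terms.discard("and")
        terms.discard("or")
        mode = "and"
    elif "or" in terms:
        terms.discard("or")
        mode = "or"
    else:
        mode = "simple"
    return mode, terms

print(classify_query("heart and eye".split()))
```

Each branch can then call whatever search code fits that mode, and the plain query with no keyword gets its own explicit case.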
query.remove("or")
print("Performing OR search for: ", query)
for item in query:
if (item in quotesDict):
print("FOUND ",len(quotesDict[item]), " ", item, "QUOTES: ",
quotesDict.get(item))
print("\n--- Execution ---\n", (time.time() - start_time) * 1000,
"microseconds\n")
else:
if "and" in query:
query.remove("and")
if "or" in query:
query.remove("or")
print("Performing AND search for: ", query)
This looks wrong. What about the case where neither and/or are in the query?
for item in query:
if (item in quotesDict):
FoundQuotes = FoundQuotes + (quotesDict.get(item))
FoundQuotes = list(set([x for x in FoundQuotes if FoundQuotes.count(x) > 1]))
This doesn't look right either.
Foundquotes is a list of indexes. The comprehension builds a list of all
the indexes that appear more than once - what about a quote that was
only found once?
It then eliminates all the duplicates (set()) and converts it back to a
list (why not leave it as a set?)
I'd have expected a simple conversion of FoundQuotes to a set would be
what you wanted.
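To illustrate the difference, here is a small sketch with made-up index values:

```python
FoundQuotes = [0, 2, 2, 5]  # index 2 was found for two different words

# The original comprehension keeps only indexes that occur more than once.
repeated_only = set(x for x in FoundQuotes if FoundQuotes.count(x) > 1)

# A plain set() keeps every index exactly once.
all_found = set(FoundQuotes)

print(repeated_only)  # {2}
print(all_found)
```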
for x in FoundQuotes:
print(data_list[x])
print("\n--- Execution ---\n", (time.time() - start_time) * 1000,
"microseconds\n")
The other problem is that you are searching the dictionary
several times, thus losing some of the speed advantage of
using a dictionary.
You would get more benefit from the dictionary if you adopt a try/except
approach and just access the key directly. So, instead of:
> for item in query:
>     if (item in quotesDict):
>         FoundQuotes = FoundQuotes + (quotesDict.get(item))
you could write:
for item in query:
    try: FoundQuotes = FoundQuotes + quotesDict[item]
    except KeyError: pass
Or better still use the default value of get:
for item in query:
    FoundQuotes = FoundQuotes + quotesDict.get(item,[])
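A runnable version of the get() variant, using a tiny made-up index in which one of the query words is deliberately missing:

```python
quotesDict = {"heart": [0, 2], "eye": [0]}
query = {"heart", "rose"}  # 'rose' is not in the index

FoundQuotes = []
for item in query:
    # get() returns the default [] for a missing key, so no separate
    # membership test or try/except is needed.
    FoundQuotes = FoundQuotes + quotesDict.get(item, [])

print(FoundQuotes)  # [0, 2]
```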
There are a few other things that could be tidied up but that should
give you something to get started with.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.flickr.com/photos/alangauldphotos