Re: Parsing a search string
That's not bad going, considering you only ran out of alcohol at 6 in the morning and *then* asked Python questions.

Anyway - you could write a character-by-character parser function that would do that in a few minutes... My 'listquote' module has one - but it splits on commas, not whitespace.

Sounds like you're looking for a one-liner, though; regular expressions *could* do it...

Regards,

Fuzzy
http://www.voidspace.org.uk/atlantibots/pythonutils.html#llistquote

--
http://mail.python.org/mailman/listinfo/python-list
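For readers wondering what such a character-by-character parser might look like, here is a minimal sketch. It is not the 'listquote' code; parse_search and its rules (whitespace separates terms, double quotes group words, a leading '-' excludes a term) are assumptions made up for illustration.

```python
def parse_search(text):
    """Split a search string into (wanted, unwanted) term lists.

    Hypothetical helper: whitespace separates terms, double quotes group
    several words into one term, and a leading '-' marks a term as excluded.
    """
    wanted, unwanted = [], []
    term = []          # characters of the term being built
    in_quotes = False  # True while inside "..."
    negated = False    # True if the current term began with '-'
    started = False    # True once the current term has begun

    def flush():
        """Store the finished term (if any) and reset the per-term state."""
        nonlocal term, negated, started
        if term:
            (unwanted if negated else wanted).append("".join(term))
        term, negated, started = [], False, False

    for ch in text:
        if ch == '"':
            in_quotes = not in_quotes
            started = True
        elif ch.isspace() and not in_quotes:
            flush()
        elif ch == "-" and not started:
            negated = True
            started = True
        else:
            term.append(ch)
            started = True
    flush()
    return wanted, unwanted


print(parse_search('moo cow "farmer john" -zug'))
```

A '-' only counts as an exclusion marker at the very start of a term, so a hyphen inside a word or inside quotes is kept as ordinary text. An unclosed quote simply runs to the end of the string in this sketch.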
Re: Parsing a search string
Freddie wrote:
> Happy new year! Since I have run out of alcohol, I'll ask a question that I
> haven't really worked out an answer for yet. Is there an elegant way to turn
> something like:
>
>     moo cow "farmer john" -zug
>
> into:
>
>     ['moo', 'cow', 'farmer john'], ['zug']
>
> I'm trying to parse a search string so I can use it for SQL WHERE
> constraints, preferably without horrifying regular expressions. Uhh yeah.

The shlex approach, finished:

    import shlex

    searchstring = 'moo cow "farmer john" -zug'
    lexer = shlex.shlex(searchstring)
    # let '-' be part of a word so '-zug' stays one token
    lexer.wordchars += '-'
    poslist, neglist = [], []
    while 1:
        token = lexer.get_token()
        # token is '' on eof
        if not token: break
        # remove quotes
        if token[0] in '"\'':
            token = token[1:-1]
        # select in which list to put it
        if token[0] == '-':
            neglist.append(token[1:])
        else:
            poslist.append(token)

regards,
Reinhold
Re: Parsing a search string
I am right in the middle of doing text parsing so I used your example as a mental exercise. :-)

Here's a NDFA for your text:

           b    0    1-9  a-Z  ,    .    +    -    '    "    \n
    S0:    S0   E    E    S1   E    E    E    S3   E    S2   E
    S1:    T1   E    E    S1   E    E    E    E    E    E    T1
    S2:    S2   E    E    S2   E    E    E    E    E    T2   E
    S3:    T3   E    E    S3   E    E    E    E    E    E    T3

and the end-states are:

    E:  error in text
    T1: You have the words: moo, cow
    T2: You get "farmer john" (w quotes)
    T3: You get zug

Can't guarantee that I did it right - I did it really quick - and it's *specific* to your text string.

Now just need to hire a programmer to write some clean Python parsing code. :-)

--
It's me

Freddie wrote:
> Happy new year! Since I have run out of alcohol, I'll ask a question that I
> haven't really worked out an answer for yet. Is there an elegant way to turn
> something like:
>
>     moo cow "farmer john" -zug
>
> into:
>
>     ['moo', 'cow', 'farmer john'], ['zug']
>
> I'm trying to parse a search string so I can use it for SQL WHERE
> constraints, preferably without horrifying regular expressions. Uhh yeah.
>
> From 2005,
> Freddie
Re: Parsing a search string
Ah! That is what the __future__ brings, I guess. Damn that progress making me outdated ;)

Python 2.2.3 (a lot of extensions I use are stuck there, so I still use it)

M.E.Farmer
Re: Parsing a search string
M.E.Farmer wrote:
> Ah! That is what the __future__ brings, I guess. Damn that progress making
> me outdated ;)
>
> Python 2.2.3 (a lot of extensions I use are stuck there, so I still use it)

I'm also positively surprised how many cute little additions there are in every new Python version. Great thanks to the great devs!

Reinhold
Re: Parsing a search string
Andrew Dalke wrote:
> It's me wrote:
>> Here's a NDFA for your text:
>>
>>            b    0    1-9  a-Z  ,    .    +    -    '    "    \n
>>     S0:    S0   E    E    S1   E    E    E    S3   E    S2   E
>>     S1:    T1   E    E    S1   E    E    E    E    E    E    T1
>>     S2:    S2   E    E    S2   E    E    E    E    E    T2   E
>>     S3:    T3   E    E    S3   E    E    E    E    E    E    T3
>
> Now if I only had an NDFA for parsing that syntax...

Just finished one (don't ask me to show it - very clumsy Python code - still in learning mode). :)

Here's one for parsing integer:

    #        b    0    1-9  ,    .    +    -    '    "    a-Z  \n
    # S0:    S0   S0   S1   T0   E    S2   S2   E    E    E    T0
    # S1:    S3   S1   S1   T1   E    E    E    E    E    E    T1
    # S2:    E    S2   S1   E    E    E    E    E    E    E    E
    # S3:    S3   T2   T2   T1   T2   T2   T2   T2   T2   T2   T1

    T0: you got a null token
    T1: you got a good token, separator was ,
    T2: you got a good token b, separator was
    E:  bad token

> :)
>
> Andrew
Re: Parsing a search string
Freddie wrote:
> I'm trying to parse a search string so I can use it for SQL WHERE
> constraints, preferably without horrifying regular expressions. Uhh yeah.

If you're interested, I've written a function that parses query strings using a customizable version of Google's search syntax. Features include:

- Binary operators like OR
- Unary operators like '-' for exclusion
- Customizable modifiers like Google's site:, intitle:, inurl: syntax
- *No* query is an error (invalid characters are fixed up, etc.)
- Result is a dictionary in one of two possible forms, both geared towards being input to a search method for your database

I'd be glad to post the code, although I'd probably want to have a last look at it before I let others see it...

--
Brian Beck
Adventurer of the First Order
Re: Parsing a search string
Andrew Dalke wrote:
> It's me wrote:
>> Here's a NDFA for your text:
>>
>>            b    0    1-9  a-Z  ,    .    +    -    '    "    \n
>>     S0:    S0   E    E    S1   E    E    E    S3   E    S2   E
>>     S1:    T1   E    E    S1   E    E    E    E    E    E    T1
>>     S2:    S2   E    E    S2   E    E    E    E    E    T2   E
>>     S3:    T3   E    E    S3   E    E    E    E    E    E    T3
>
> Now if I only had an NDFA for parsing that syntax...

Parsing your sentence as written (if I only had): If you were the sole keeper of the secret??

Parsing it as intended (if only I had), and ignoring the smiley: Looks like a fairly straight-forward state-transition table to me. The column headings are not aligned properly in the message, b means blank, a-Z is bletchworthy, but the da Vinci code it ain't.

If only we had an NDFA (whatever that is) for guessing what acronyms mean ...

Where I come from:

    DFA  = deterministic finite-state automaton
    NFA  = non-det..
    SFA  = content-free
    NFI  = concept-free
    NDFA = National Dairy Farmers' Association

HTH, and Happy New Year!
Re: Parsing a search string
John Machin wrote:
> Andrew Dalke wrote:
>> It's me wrote:
>>> Here's a NDFA for your text:
>>>
>>>            b    0    1-9  a-Z  ,    .    +    -    '    "    \n
>>>     S0:    S0   E    E    S1   E    E    E    S3   E    S2   E
>>>     S1:    T1   E    E    S1   E    E    E    E    E    E    T1
>>>     S2:    S2   E    E    S2   E    E    E    E    E    T2   E
>>>     S3:    T3   E    E    S3   E    E    E    E    E    E    T3
>>
>> Now if I only had an NDFA for parsing that syntax...
>
> Parsing your sentence as written (if I only had): If you were the sole
> keeper of the secret??
>
> Parsing it as intended (if only I had), and ignoring the smiley: Looks like
> a fairly straight-forward state-transition table to me.

Exactly.

> The column headings are not aligned properly in the message, b means blank,
> a-Z is bletchworthy, but the da Vinci code it ain't.
>
> If only we had an NDFA (whatever that is) for guessing what acronyms mean ...

I believe (I am not a computer science major):

    NDFA = non-deterministic finite automaton

and:

    S: state
    T: terminal
    E: error

So S1 means State #1, T1 means Terminal #1, and so forth.

You are correct that parsing that table is not hard.

a) Set up a stack and place the buffer onto the stack; start with S0
b) For each character that comes from the stack, look up the next state for that token
c) If it's not a T or E state, jump to that state
d) If it's a T or E state, finish
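Steps a) to d) could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the '"' column is inferred from the eleven entries in each table row, a string index stands in for the stack, end of input is treated like a newline, and the names char_class, next_token and tokens are invented here.

```python
import string

# Column order as in the table quoted above. The '"' column is an assumption
# inferred from the row width (every row has eleven entries).
COLUMNS = ["b", "0", "1-9", "a-Z", ",", ".", "+", "-", "'", '"', "\n"]

TABLE = {
    "S0": "S0 E E S1 E E E S3 E S2 E".split(),
    "S1": "T1 E E S1 E E E E E E T1".split(),
    "S2": "S2 E E S2 E E E E E T2 E".split(),
    "S3": "T3 E E S3 E E E E E E T3".split(),
}


def char_class(ch):
    """Map a character to its column name, or None if no column fits."""
    if ch == " ":
        return "b"
    if ch == "0":
        return "0"
    if ch in "123456789":
        return "1-9"
    if ch in string.ascii_letters:
        return "a-Z"
    return ch if ch in COLUMNS else None


def next_token(text, pos):
    """Run the table from S0; return (end_state, lexeme, new_pos)."""
    state, start = "S0", None
    while True:
        # End of input is treated like a newline (an assumption).
        ch = text[pos] if pos < len(text) else "\n"
        cls = char_class(ch)
        previous = state
        state = TABLE[state][COLUMNS.index(cls)] if cls else "E"
        if previous == "S0" and state != "S0":
            start = pos  # first character after any leading blanks
        pos += 1
        if state[0] in "TE":  # step d): terminal or error state, finish
            return state, text[start:pos].strip(), pos


def tokens(text):
    """Collect (end_state, lexeme) pairs until the input is used up."""
    text = text.strip()
    pos, found = 0, []
    while pos < len(text):
        state, lexeme, pos = next_token(text, pos)
        if state == "E":
            raise ValueError("bad input near %r" % lexeme)
        found.append((state, lexeme))
    return found


print(tokens('moo cow "farmer john" -zug'))
```

The lexemes are returned raw, so a T2 token still carries its quotes and a T3 token still carries its leading '-'; stripping those is left to the caller.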
Re: Parsing a search string
Reinhold Birkenfeld wrote:
> Freddie wrote:
>> Happy new year! Since I have run out of alcohol, I'll ask a question that I
>> haven't really worked out an answer for yet. Is there an elegant way to turn
>> something like:
>>
>>     moo cow "farmer john" -zug
>>
>> into:
>>
>>     ['moo', 'cow', 'farmer john'], ['zug']
>>
>> I'm trying to parse a search string so I can use it for SQL WHERE
>> constraints, preferably without horrifying regular expressions. Uhh yeah.
>
> The shlex approach, finished:
>
>     import shlex
>
>     searchstring = 'moo cow "farmer john" -zug'
>     lexer = shlex.shlex(searchstring)
>     lexer.wordchars += '-'
>     poslist, neglist = [], []
>     while 1:
>         token = lexer.get_token()
>         # token is '' on eof
>         if not token: break
>         # remove quotes
>         if token[0] in '"\'':
>             token = token[1:-1]
>         # select in which list to put it
>         if token[0] == '-':
>             neglist.append(token[1:])
>         else:
>             poslist.append(token)
>
> regards,
> Reinhold

Thanks for this, though there was one issue:

    >>> lexer = shlex.shlex('moo cow +"farmer john" -dog')
    >>> lexer.wordchars += '-+'
    >>> while 1:
    ...     tok = lexer.get_token()
    ...     if not tok: break
    ...     print(tok)
    ...
    moo
    cow
    +"farmer
    john"
    -dog

The '+"farmer john"' part would be turned into two separate words, '+"farmer' and 'john"'. I ended up using shlex.split() (which the docs say is new in Python 2.3), which gives me the desired result. Thanks for the help from yourself and M.E.Farmer :)

Freddie

    >>> shlex.split('moo cow +"farmer john" -"evil dog"')
    ['moo', 'cow', '+farmer john', '-evil dog']
    >>> shlex.split('moo cow +"farmer john" -"evil dog" +elephant')
    ['moo', 'cow', '+farmer john', '-evil dog', '+elephant']
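To connect the shlex.split() result back to the original goal of SQL WHERE constraints, here is one possible sketch. The helper names split_terms and build_where, the column name "body", and the use of LIKE with '?' placeholders (the style used by sqlite3, for example) are all assumptions for illustration, not something anyone in the thread posted.

```python
import shlex


def split_terms(query):
    """Return (required, excluded) term lists from a search string.

    A leading '+' is treated as optional decoration on a required term,
    and a leading '-' marks an excluded term.
    """
    required, excluded = [], []
    for term in shlex.split(query):
        sign = term[0] if term[:1] in ("+", "-") else ""
        word = term[len(sign):]
        if not word:
            continue  # ignore a bare '+', '-' or empty quoted string
        (excluded if sign == "-" else required).append(word)
    return required, excluded


def build_where(column, required, excluded):
    """Build a parameterised WHERE fragment and its parameter list.

    'column' must come from trusted code, never from user input, because it
    is interpolated into the SQL text. The search terms themselves are only
    passed as parameters.
    """
    clauses = ["%s LIKE ?" % column for _ in required]
    clauses += ["%s NOT LIKE ?" % column for _ in excluded]
    params = ["%" + term + "%" for term in required + excluded]
    return " AND ".join(clauses), params


required, excluded = split_terms('moo cow +"farmer john" -"evil dog"')
where, params = build_where("body", required, excluded)
print(where)
print(params)
```

Two caveats with this sketch: shlex.split() raises ValueError when a quote is left unclosed, so a caller would probably want to catch that, and any '%' or '_' typed by the user still acts as a LIKE wildcard because the terms are not escaped.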