On Fri, 4 Nov 2005, Carroll, Barry wrote:
> My UDP client is receiving responses from the server, and now I need to
> process them. A typical response string looks like this:
>
>     "(0, ''),some data from the test system"
>
> The tuple represents the error code and message. If the command had
> failed, the response would look like this:
>
>     "(-1, 'Error message from the test system')"
>
> I need to extract the tuple from the rest of the response string. I can do
> this using eval, like so:
>
>     errtuple = eval(mytxt[:mytxt.find(')')+1])
>
> Is there another, more specific method for transforming a string into a
> tuple?

Hi Barry,

Since this seems to be such a popular request, here is some sample kludgy
code that provides a parse() function that does the structure-building.
This parser doesn't do much error checking yet, and I apologize in advance
for that. But I just want to do something to make sure people don't use
eval() to extract simple stuff out of network traffic. *grin*

(In reality, we'd use a parser-generating tool like pyparsing to make the
code below simpler and with good error messages.)

##########################################################################
"""Simple parsing of expressions.  Meant to be a demo of how one could
turn strings into structures.  If we were to do this for real, though,
we'd definitely use parser generator tools instead.

Main usage:

    >>> parse("(0, ''),some data from the test system")
    (0, '')
    >>> parse("(-1, 'Error message from the test system')")
    (-1, 'Error message from the test system')
"""

import re

stringRegex = re.compile(r"""
                         '             # a single quote
                         (             # followed by any number of
                           [^'\\]      # non-quote, non-backslash characters
                           |           # or
                           (\\')       # an escaped quote
                         )*
                         '             # and a closing single quote
                         """, re.VERBOSE)

numberRegex = re.compile(r"""
                         [+-]?         ## optional sign
                         \d+           ## one or more digits
                         """, re.VERBOSE)


def tokenize(s):
    """Returns a list of tokens.  Each token will be of the form:

        (tokenType, datum)

    with the tokenType in ['string', 'number', '(', ')', ','].

    Tokenizes as much as it can.  When it first hits a non-token, will
    give up and return what it can.
    """
    tokens = []
    while True:
        s = s.lstrip()
        if not s:
            break
        if stringRegex.match(s):
            m = stringRegex.match(s)
            # Drop the surrounding quotes and unescape any \' inside.
            tokens.append( ('string', m.group(0)[1:-1].replace("\\'", "'")) )
            s = s[len(m.group(0)):]
        elif numberRegex.match(s):
            m = numberRegex.match(s)
            tokens.append( ('number', int(m.group(0))) )
            s = s[len(m.group(0)):]
        elif s[0] in ['(', ')', ',']:
            tokens.append( (s[0], None) )
            s = s[1:]
        else:
            break
    return tokens


def parse(s):
    """Given a string s, parses out a single expression from s.
    The result may be a string, a number, or a tuple."""
    tokens = tokenize(s)
    return parseExpression(tokens)


def parseExpression(tokens):
    """Parses a single expression.  An expression can either be a
    number, a string, or a tuple.
    """
    if not tokens:
        raise ValueError("Empty token list")
    firstToken = tokens[0]
    if firstToken[0] in ['number', 'string']:
        tokens.pop(0)
        return firstToken[1]
    elif firstToken[0] == '(':
        return parseTuple(tokens)
    else:
        raise ValueError("Don't know how to handle %r" % (tokens[0],))


def parseTuple(tokens):
    """Parses a tuple expression.  A tuple is a '(', followed by a
    bunch of comma separated expressions, followed by a ')'.
    """
    elements = []
    eat(tokens, '(')
    while True:
        if not tokens:
            raise ValueError("Expected either ',', an expression,"
                             " or ')', but exhausted token list")
        if tokens[0][0] in ['number', 'string', '(']:
            elements.append(parseExpression(tokens))
            # Guard against running out of tokens right after an element;
            # eat() then reports the problem as a ValueError.
            if tokens and tokens[0][0] == ')':
                break
            else:
                eat(tokens, ',')
        elif tokens[0][0] == ')':
            break
        else:
            raise ValueError("Don't know how to handle %r" % (tokens[0],))
    eat(tokens, ')')
    return tuple(elements)


def eat(tokens, typeExpected):
    """Tries to eat a token of the given type, and returns its datum.
    If we can't, raises ValueError."""
    if not tokens:
        raise ValueError("Expected %r, but exhausted token list"
                         % (typeExpected,))
    token = tokens.pop(0)
    if token != (typeExpected, None):
        raise ValueError("Expected %r, but got %r" % (typeExpected, token))
    return token[1]
###########################################################################

Whew. That was a mouthful. *grin*

But, again, that's because I'm cooking almost everything from scratch.
Parser-generating tools will make this a lot simpler.

Anyway, let's see how this parse() function works:

######
>>> parse("(0, ''),some data from the test system")
(0, '')
>>> parse("(-1, 'Error message from the test system')")
(-1, 'Error message from the test system')
>>> parse("'question 1) can this handle embedded parens in strings?'")
'question 1) can this handle embedded parens in strings?'
>>> parse("('question 2) how about this?:', ((((1), 2), 3), 4), 5)")
('question 2) how about this?:', ((((1,), 2), 3), 4), 5)
######

Note that a parenthesized single item such as (1) comes back as the
one-element tuple (1,), since this parser treats every pair of parentheses
as a tuple.

The worst thing that might happen from parsing an arbitrary string with
parse() is an exception, or a structure that is way too deep or large.
But other than that, parse() should be resistant to a code-injection
attack since it doesn't do anything too special: it's mostly just
list/string manipulation and recursion.

I hope this helps!

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
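
The original question also asked for the tuple to be separated from the
rest of the response string. Here is a minimal sketch of one way to do
that, and it is not the hand-written parser from this message: it assumes
a Python that has ast.literal_eval (2.6 or later, so not the Python of
2005), which evaluates only literals such as numbers, strings and tuples
and refuses anything else. The name split_response is made up for this
illustration. It scans for the parenthesis that closes the leading tuple,
skipping over parentheses inside quoted strings, which the
mytxt.find(')') approach does not do.

```python
import ast


def split_response(text):
    """Split "(code, 'msg'),rest" into ((code, 'msg'), 'rest').

    Hypothetical helper: assumes the response starts with a tuple
    literal that uses single-quoted strings.
    """
    if not text.startswith('('):
        raise ValueError("response does not start with a tuple: %r" % (text,))
    depth = 0
    in_string = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            if ch == '\\':
                i += 1              # skip the escaped character
            elif ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
        elif ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
            if depth == 0:
                # literal_eval accepts literals only, never calls or names.
                head = ast.literal_eval(text[:i + 1])
                rest = text[i + 1:]
                if rest.startswith(','):
                    rest = rest[1:]  # drop the single separating comma
                return head, rest
        i += 1
    raise ValueError("no complete tuple at start of %r" % (text,))
```

With the two sample responses, split_response("(0, ''),some data from the
test system") gives the pair ((0, ''), 'some data from the test system'),
and the failure response gives the error tuple with an empty string as the
remainder.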