On Fri, Dec 27, 2013 at 11:55 PM, <t.giuse...@gmail.com> wrote:
> I'm rewriting a program previously written in C #, and trying to keep the
> same configuration file, I have a problem with untapped strings.

Not sure what you mean by "untapped" here?

> Taking for example a classic line of apache log:
>
> 0.0.0.0 - [27/Dec/2013: 00:56:51 +0100] "GET / webdav / HTTP/1.1" 404 524 "-"
> "Mozilla/5.0 (Windows, U, Windows NT 5.1, en-US , rv: 1.9.2.12)
> Gecko/20101026 Firefox/3.6.12 "
>
> Is there any way to pull out the values so arranged as follows:
>
> ip = 0.0.0.0
> date = 27/Dec/2013: 00:56:51 +0100
> url = / webdav /
>

(Aside: Do you really have spaces in your URLs? That seems odd.)

One common way to implement this sort of thing is with a regular
expression. You can either derive a regex from your config file, or
have users directly manage the regex.

For the specific case of parsing the Apache common log format, there's
plenty of material around. This page [1] has a tidy regex that'll do
the job, and this module [2] purports to create a parser by reading
the configuration line that creates it. I don't know anything about
either, save that they came up in a Google search for 'python apache
common log', along with a whole lot of other decent-looking results.

But for a more general solution - supposing you have piles and piles
of those parser strings - I'd be inclined to write a preparser that
reads your config file and derives regex patterns. It needs to figure
out what's a placeholder and what's literal text, then escape the
literal text (if there are regex metacharacters in it) and come up
with some sort of capturing sequence for the placeholder. I don't know
what you'd want there; possibly (.*?) will be the best (that means
"capture any number of characters, as few as possible"). But you know
your data far better than I do.

ChrisA

[1] http://www.seehuhn.de/blog/52
[2] https://pypi.python.org/pypi/apachelog/1.0
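
To make the preparser idea above a bit more concrete, here is a rough
sketch. The config format isn't shown in this thread, so the {name}
placeholder syntax, the template_to_regex function name and the sample
template are all made up for illustration; adapt them to whatever your
file really looks like. It splits the template into literal text and
placeholders, escapes the literal text, and turns each placeholder into
a named non-greedy capture.

```python
import re

# ASSUMPTION: placeholders look like {name}. The real config syntax is
# not shown in the thread, so this pattern is purely illustrative.
PLACEHOLDER = re.compile(r"\{([A-Za-z_]\w*)\}")


def template_to_regex(template):
    """Derive a compiled regex from a template with {name} placeholders."""
    parts = []
    pos = 0
    for m in PLACEHOLDER.finditer(template):
        # Literal text before this placeholder: escape any metacharacters.
        parts.append(re.escape(template[pos:m.start()]))
        # The placeholder itself: named group, as few characters as possible.
        parts.append("(?P<%s>.*?)" % m.group(1))
        pos = m.end()
    # Trailing literal text after the last placeholder.
    parts.append(re.escape(template[pos:]))
    # Anchor both ends; otherwise a final non-greedy group would match
    # the empty string and stop.
    return re.compile("^" + "".join(parts) + "$")


# Hypothetical template and a log line without the stray spaces.
template = '{ip} - [{date}] "{method} {url} {protocol}" {status} {size}'
line = '0.0.0.0 - [27/Dec/2013:00:56:51 +0100] "GET /webdav/ HTTP/1.1" 404 524'

match = template_to_regex(template).match(line)
fields = match.groupdict() if match else {}
for name, value in fields.items():
    print("%s = %s" % (name, value))
```

Note that (.*?) only behaves well when each placeholder is followed by
literal text that can't appear inside the value. If a URL really can
contain spaces, a space-delimited template like the one above will
split it in the wrong place, and you'd want a stricter pattern for
that field.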