Re: Multiline regex

2010-07-21 Thread Steven D'Aprano
On Wed, 21 Jul 2010 10:06:14 -0500, Brandon Harris wrote: > What do you mean by slurp the entire file? I'm trying to use regular > expressions because line by line parsing will be too slow. An example > file would have somewhere in the realm of 6 million lines of code. And you think trying to r

Re: Multiline regex

2010-07-21 Thread Jeremy Sanders
Brandon Harris wrote: > I'm trying to read in and parse an ascii type file that contains > information that can span several lines. > Example: What about something like this (you need re.MULTILINE): In [16]: re.findall('^([^ ].*\n([ ].*\n)+)', a, re.MULTILINE) Out[16]: [('createNode animCurveTU
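
A minimal sketch of the re.MULTILINE approach suggested above, assuming Python 3 and assuming the continuation lines are indented with spaces (the preview has lost the original whitespace). The second node name and the choice of a non-capturing inner group are illustrative assumptions, not part of the original post.

```python
import re

# Hypothetical sample: a header line starting in column 0, followed by
# space-indented continuation lines. "test:other" is made up.
sample = (
    'createNode animCurveTU -n "test:master_globalSmooth";\n'
    '    setAttr ".tan" 9;\n'
    '    setAttr -s 4 ".kit[3]" 10;\n'
    'createNode animCurveTU -n "test:other";\n'
    '    setAttr ".tan" 9;\n'
)

# With re.MULTILINE, ^ matches at the start of every line. A block is one
# line that does not start with a space plus one or more lines that do.
# The inner group is non-capturing so findall returns plain strings.
BLOCK_RE = re.compile(r'^([^ ].*\n(?:[ ].*\n)+)', re.MULTILINE)

blocks = BLOCK_RE.findall(sample)
for block in blocks:
    print(repr(block))
```

Each element of blocks is then one complete createNode entry, which can be parsed further on its own.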

RE: Multiline regex

2010-07-21 Thread Andreas Tawn
>>> I could make it that simple, but that is also incredibly slow and on >>> a file with several million lines, it takes somewhere in the league of >>> half an hour to grab all the data. I need this to grab data from >>> many, many files and return the data quickly. >>> >>> Brandon L. Harris >>> >> T

Re: Multiline regex

2010-07-21 Thread Brandon Harris
Could it be that there isn't just that type of data in the file? There are many different types; that is just one that I'm trying to grab. Brandon L. Harris Andreas Tawn wrote: I could make it that simple, but that is also incredibly slow and on a file with several million lines, it takes som

RE: RE: Multiline regex

2010-07-21 Thread Andreas Tawn
> I could make it that simple, but that is also incredibly slow and on a > file with several million lines, it takes somewhere in the league of > half an hour to grab all the data. I need this to grab data from many, > many files and return the data quickly. > > Brandon L. Harris That's surprising.

Re: RE: Multiline regex

2010-07-21 Thread Brandon Harris
I could make it that simple, but that is also incredibly slow and on a file with several million lines, it takes somewhere in the league of half an hour to grab all the data. I need this to grab data from many, many files and return the data quickly. Brandon L. Harris Andreas Tawn wrote: I'm

Re: Multiline regex

2010-07-21 Thread Peter Otten
Brandon Harris wrote: > I'm trying to read in and parse an ascii type file that contains > information that can span several lines. > Example: > > createNode animCurveTU -n "test:master_globalSmooth"; > setAttr ".tan" 9; > setAttr -s 4 ".ktv[0:3]" 101 0 163 0 169 0 201 0; > setAttr -

RE: Multiline regex

2010-07-21 Thread Andreas Tawn
> I'm trying to read in and parse an ascii type file that contains > information that can span several lines. > Example: > > createNode animCurveTU -n "test:master_globalSmooth"; > setAttr ".tan" 9; > setAttr -s 4 ".ktv[0:3]" 101 0 163 0 169 0 201 0; > setAttr -s 4 ".kit[3]" 10; >

Re: Multiline regex

2010-07-21 Thread Brandon Harris
At the moment I'm trying to stick with built-in Python modules to create tools for a much larger pipeline on multiple OSes. Brandon L. Harris Eknath Venkataramani wrote: On Wed, Jul 21, 2010 at 8:12 PM, Brandon Harris wrote: I'm trying to read in an

Re: Multiline regex

2010-07-21 Thread Eknath Venkataramani
On Wed, Jul 21, 2010 at 8:12 PM, Brandon Harris wrote: > I'm trying to read in and parse an ascii type file that contains > information that can span several lines. > Do you have to use only regex? If not, I'd certainly suggest 'pyparsing'. It's a pleasure to use and very easy on the eye too, if

Re: Multiline regex

2010-07-21 Thread Brandon Harris
What do you mean by slurp the entire file? I'm trying to use regular expressions because line by line parsing will be too slow. An example file would have somewhere in the realm of 6 million lines of code. Brandon L. Harris Rodrick Brown wrote: Slurp the entire file into a string and pick o

Re: Multiline regex

2010-07-21 Thread Rodrick Brown
Slurp the entire file into a string and pick out the fields you need. On Jul 21, 2010, at 10:42 AM, Brandon Harris wrote: > I'm trying to read in and parse an ascii type file that contains information > that can span several lines. > Example: > > createNode animCurveTU
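
To illustrate what "slurping" means here, a rough sketch assuming Python 3: read the whole file into one string, then run a single compiled regex over it. The helper name slurp, the temporary file, the second node name, and the exact pattern are assumptions made for the example, not something the thread specifies.

```python
import os
import re
import tempfile

# Hypothetical stand-in for a real input file; "test:other_node" is made up.
sample = (
    'createNode animCurveTU -n "test:master_globalSmooth";\n'
    'setAttr ".tan" 9;\n'
    'createNode animCurveTU -n "test:other_node";\n'
    'setAttr ".tan" 9;\n'
)
with tempfile.NamedTemporaryFile('w', delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name


def slurp(filename):
    """Return the entire contents of a text file as one string."""
    with open(filename) as handle:
        return handle.read()


text = slurp(path)
os.remove(path)  # clean up the temporary file

# One pass over the whole string; re.MULTILINE anchors ^ and $ per line.
NODE_RE = re.compile(r'^createNode (\w+) -n "([^"]+)";$', re.MULTILINE)
nodes = NODE_RE.findall(text)
print(nodes)
```

Whether this beats line-by-line parsing on a file of several million lines depends on available memory and on the pattern, so it is worth timing both.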

Multiline regex

2010-07-21 Thread Brandon Harris
I'm trying to read in and parse an ascii type file that contains information that can span several lines. Example: createNode animCurveTU -n "test:master_globalSmooth"; setAttr ".tan" 9; setAttr -s 4 ".ktv[0:3]" 101 0 163 0 169 0 201 0; setAttr -s 4 ".kit[3]" 10; setAttr -s 4 ".kot

Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Steven Bethard wrote: Kent Johnson wrote: for line in raw_data: if line.startswith('RelevantInfo1'): info1 = raw_data.next().strip() elif line.startswith('RelevantInfo2'): info2 = raw_data.next().strip() elif line.startswith('RelevantInfo3'): info3 = raw_data.nex

Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Kent Johnson wrote: for line in raw_data: if line.startswith('RelevantInfo1'): info1 = raw_data.next().strip() elif line.startswith('RelevantInfo2'): info2 = raw_data.next().strip() elif line.startswith('RelevantInfo3'): info3 = raw_data.next().strip() sc
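
The loop quoted above is Python 2 (raw_data.next()). A sketch of the same idea for Python 3, assuming the layout from the original question where each value sits on the line after its label; the dictionary and the empty-string default passed to next() are additions for the example.

```python
# Sample data as posted in the original question, one field per line.
raw_text = """Gibberish
53
MoreGarbage
12
RelevantInfo1
10/10/04
NothingImportant
ThisDoesNotMatter
44
RelevantInfo2
22
BlahBlah
343
RelevantInfo3
23
Hubris
Crap
34
"""

WANTED = ('RelevantInfo1', 'RelevantInfo2', 'RelevantInfo3')

# A single iterator is shared by the for loop and by next(), so calling
# next() consumes the value line and the loop resumes after it.
lines = iter(raw_text.splitlines())
info = {}
for line in lines:
    if line.startswith(WANTED):  # str.startswith accepts a tuple
        # The default avoids StopIteration if a label is the last line.
        info[line.strip()] = next(lines, '').strip()

print(info)
```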

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 3 Mar 2005 12:26:37 -0800, James Stroud wrote: > Have a look at "martel", part of biopython. The world of bioinformatics is > filled with files with structure like this. > > http://www.biopython.org/docs/api/public/Martel-module.html > > James Thanks for the link. Stev

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 16:25:39 -0500, Kent Johnson wrote: > Here is another attempt. I'm still not sure I understand what form you want > the data in. I made a > dict -> dict -> list structure so if you lookup e.g. scores['10/11/04']['60'] > you get a list of all > the Relevan

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 13:45:31 -0700, Steven Bethard wrote: > > I think if you use the non-greedy .*? instead of the greedy .*, you'll > get this behavior. For example: > > py> s = """\ > ... Gibberish > ... 53 > ... MoreGarbage > [snip a whole bunch of stuff] > ... RelevantInfo
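
A small sketch of the greedy versus non-greedy difference mentioned above, assuming Python 3. The sample string is shortened and partly invented (the "Junk" lines and the value 33 are hypothetical), and the patterns are illustrative rather than the ones from the original post.

```python
import re

# Two records in one string. With re.DOTALL, '.' also matches newlines.
s = (
    'RelevantInfo1\n10/10/04\nJunk\nRelevantInfo2\n22\n'
    'RelevantInfo1\n10/11/04\nJunk\nRelevantInfo2\n33\n'
)

# Greedy .* runs to the LAST RelevantInfo2, merging both records into one.
greedy = re.findall(
    r'RelevantInfo1\n([^\n]*)\n.*RelevantInfo2\n([^\n]*)\n', s, re.DOTALL)

# Non-greedy .*? stops at the NEAREST RelevantInfo2, one match per record.
lazy = re.findall(
    r'RelevantInfo1\n([^\n]*)\n.*?RelevantInfo2\n([^\n]*)\n', s, re.DOTALL)

print(greedy)  # [('10/10/04', '33')]
print(lazy)    # [('10/10/04', '22'), ('10/11/04', '33')]
```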

Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Here is another attempt. I'm still not sure I understand what form you want the data in. I made a dict -> dict -> list structure so if you lookup e.g. scores['10/11/04']['60'] you get a list of all the RelevantInfo2 values for Relevant1='10/11/04' and Relevant2='60'. The parser is a simple-minde
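
A rough sketch of a dict -> dict -> list structure like the one described above, assuming Python 3. Which field serves as the inner key is a guess here (the third field is used), and the records themselves are invented for illustration; the parser from the original post is not reproduced.

```python
from collections import defaultdict

# Hypothetical parsed records: (RelevantInfo1, RelevantInfo2, RelevantInfo3).
records = [
    ('10/10/04', '22', '23'),
    ('10/11/04', '45', '60'),
    ('10/11/04', '52', '60'),
]

# Outer dict keyed by the first field, inner dict keyed by the third field
# (an assumption), with a list of second-field values at the leaves.
scores = defaultdict(lambda: defaultdict(list))
for info1, info2, info3 in records:
    scores[info1][info3].append(info2)

print(scores['10/11/04']['60'])  # ['45', '52']
```

Using defaultdict avoids checking whether each key already exists before appending.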

Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote: On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard wrote: A possible solution, using the re module: py> s = """\ ... Gibberish ... 53 ... MoreGarbage ... 12 ... RelevantInfo1 ... 10/10/04 ... NothingImportant ... ThisDoesNotMatter ... 44 ... RelevantInfo2 ... 22 ..

Re: Multiline regex help

2005-03-03 Thread James Stroud
I found the original paper for Martel: http://www.dalkescientific.com/Martel/ipc9/ On Thursday 03 March 2005 12:26 pm, James Stroud wrote: > Have a look at "martel", part of biopython. The world of bioinformatics is > filled with files with structure like this. > > http://www.biopython.org/docs/a

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 07:14:50 -0500, Kent Johnson wrote: > > Here is a way to create a list of [RelevantInfo, value] pairs: > import cStringIO > > raw_data = '''Gibberish > 53 > MoreGarbage > 12 > RelevantInfo1 > 10/10/04 > NothingImportant > ThisDoesNotMatter > 44 > RelevantInfo
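
The quoted code imports cStringIO, which exists only in Python 2. A possible Python 3 adaptation that builds the list of [RelevantInfo, value] pairs, assuming io.StringIO as the replacement and assuming each value is on the line after its label; the body of the original post is truncated here, so this is a reconstruction of the idea rather than the posted code.

```python
import io

# io.StringIO stands in for an open file so the example is self-contained.
raw_data = io.StringIO(
    'Gibberish\n53\nMoreGarbage\n12\n'
    'RelevantInfo1\n10/10/04\n'
    'NothingImportant\nThisDoesNotMatter\n44\n'
    'RelevantInfo2\n22\nBlahBlah\n343\n'
    'RelevantInfo3\n23\nHubris\nCrap\n34\n'
)

pairs = []
for line in raw_data:
    if line.startswith('RelevantInfo'):
        # The file-like object is its own iterator, so next() reads the
        # line that follows the label; '' is the fallback at end of input.
        pairs.append([line.strip(), next(raw_data, '').strip()])

print(pairs)
```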

Re: Multiline regex help

2005-03-03 Thread James Stroud
Have a look at "martel", part of biopython. The world of bioinformatics is filled with files with structure like this. http://www.biopython.org/docs/api/public/Martel-module.html James On Thursday 03 March 2005 12:03 pm, Yatima wrote: > On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard <[EMAI

Re: Multiline regex help

2005-03-03 Thread Yatima
On Thu, 03 Mar 2005 09:54:02 -0700, Steven Bethard wrote: > > A possible solution, using the re module: > > py> s = """\ > ... Gibberish > ... 53 > ... MoreGarbage > ... 12 > ... RelevantInfo1 > ... 10/10/04 > ... NothingImportant > ... ThisDoesNotMatter > ... 44 > ... RelevantI

Re: Multiline regex help

2005-03-03 Thread Steven Bethard
Yatima wrote: Hey Folks, I've got some info in a bunch of files that kind of looks like so: Gibberish 53 MoreGarbage 12 RelevantInfo1 10/10/04 NothingImportant ThisDoesNotMatter 44 RelevantInfo2 22 BlahBlah 343 RelevantInfo3 23 Hubris Crap 34 and so on... Anyhow, these "fields" repeat several times

Re: Multiline regex help

2005-03-03 Thread Kent Johnson
Yatima wrote: Hey Folks, I've got some info in a bunch of files that kind of looks like so: Gibberish 53 MoreGarbage 12 RelevantInfo1 10/10/04 NothingImportant ThisDoesNotMatter 44 RelevantInfo2 22 BlahBlah 343 RelevantInfo3 23 Hubris Crap 34 and so on... Anyhow, these "fields" repeat several times

Multiline regex help

2005-03-03 Thread Yatima
Hey Folks, I've got some info in a bunch of files that kind of looks like so: Gibberish 53 MoreGarbage 12 RelevantInfo1 10/10/04 NothingImportant ThisDoesNotMatter 44 RelevantInfo2 22 BlahBlah 343 RelevantInfo3 23 Hubris Crap 34 and so on... Anyhow, these "fields" repeat several times in a give