Re: [Tutor] splits and pops

bob gailer Sat, 12 Jul 2008 07:31:08 -0700

Please reply to the list and not just me. That way we all get to contribute and to learn.


Eric Abrahamsen wrote:

Sorry I haven't explained this clearly, it's just one more symptom of my confusion... Your example has a tab between records as well as between fields:


That's not how I see it! Look again:

"11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"

My text file had tabs only between fields, and only a newline between records.
The test string I was practicing with was this:

test = 'one\ttwo\tthree\nfour\tfive\tsix'

split on tabs produced this:

test = ['one', 'two', 'three\nfour', 'five', 'six']
My loop (breaking test[2] on '\n') worked fine with this test, which was what confused me. I only realized what the problem was when I tried it on a test like this:
test = ['one', 'two', 'three\nfour', 'five', 'six', 'seven\neight', 'nine']
That showed me that I needed to step one extra item in order to reach the next item that needed to be split. My brain still hurts.
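
The fix described above (stepping one extra item after each split) can be sketched as follows. This is only an illustration written for Python 3, not code from the thread: the function name flatten_records and its fields_per_record parameter are made up here, and the sketch assumes each boundary item is cut at its first newline.

```python
def flatten_records(fullstring, fields_per_record):
    """Return one flat list of fields from a tab/newline-delimited dump.

    Hypothetical helper: assumes each 'last-field\\nfirst-field' item
    should be cut at its first newline.
    """
    items = fullstring.split('\t')
    # The first merged item sits where the first record's last field belongs.
    i = fields_per_record - 1
    # Stop before the final item: it is the last field of the last record.
    while i < len(items) - 1:
        # Slice assignment replaces one item with two, keeping the list flat.
        items[i:i + 1] = items[i].split('\n', 1)
        # The list just grew by one, so the next merged item is a whole
        # record further on, not (fields_per_record - 1) items further on.
        i += fields_per_record
    return items
```

With the three-field test string from this message, flatten_records('one\ttwo\tthree\nfour\tfive\tsix', 3) returns ['one', 'two', 'three', 'four', 'five', 'six']. Assigning to the slice items[i:i + 1] rather than to items[i] is what avoids the nested-list result.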
E

On Jul 12, 2008, at 9:44 PM, bob gailer wrote:
Eric Abrahamsen wrote:
I have a horribly stupid text parsing problem that is driving me crazy, and making me think my Python skills have a long, long way to go...
What I've got is a poorly-thought-out SQL dump, in the form of a text file, where each record is separated by a newline, and each field in each record is separated by a tab. BUT, and this is what sinks me, there are also newlines within some of the fields. Newlines are not 'safe' (they could appear anywhere), but tabs are 'safe' (they only appear as field delimiters).
There are nine fields per record. All I can think to do is read the file in as a string, then split on tabs. That gives me a list where every eighth item is a string like this: u'last-field\nfirst-field'. Now I want to iterate through the list of strings, taking every eighth item, splitting it on '\n', and replacing it with the two resulting strings. Then I'll have the proper flat list where every nine list items constitutes one complete record, and I'm good to go from there.
I've been fooling around with variations on the following (assuming splitlist = fullstring.split('\t')):
for x in xrange(8, sys.maxint, 8):
   try:
       splitlist[x:x] = splitlist.pop(x).split('\n')
   except IndexError:
       break
The first line correctly steps over all the list items that need to be split, but I can't come up with a line that correctly replaces those list items with the two strings I want. Either the cycle goes off and splits the wrong strings, or I get nested list items, which is not what I want. Can someone please point me in the right direction here?
I tried a simple case with
fullstring = "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
Your spec is a little vague: "each field in each record is separated by a tab". I assumed that to mean "fields in each record are separated by tabs".
The result was
['11', '12', '13', '\n14', '15', '16', '17', '18', '19', '21', '22', '23', '24', '25', '26', '27', '28', '29']
which I had expected.
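
For anyone who wants to repeat this check, here is a rough Python 3 rendering of the same experiment. It is a sketch, not the code that was actually run: range and sys.maxsize stand in for the Python 2 xrange and sys.maxint, and the wrapper name run_step8_loop is made up for the example.

```python
import sys


def run_step8_loop(fullstring):
    """Apply the step-8 loop from the question to a tab-delimited string."""
    splitlist = fullstring.split('\t')
    # Python 3 stand-in for: for x in xrange(8, sys.maxint, 8)
    for x in range(8, sys.maxsize, 8):
        try:
            # Pop item x, split it on newlines, and splice the pieces
            # back in at the same position.
            splitlist[x:x] = splitlist.pop(x).split('\n')
        except IndexError:
            # pop() ran past the end of the list, so we are done.
            break
    return splitlist


fullstring = "11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
result = run_step8_loop(fullstring)
```

Note that this input contains only one record boundary ('19\n21'), so the loop never has to find a second item to split. A step of 8 would only start landing on the wrong items once there are two or more boundaries.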

Give us an example of text for which it does not work.
--
Bob Gailer
919-636-4239 Chapel Hill, NC
