Please reply to the list and not just me. That way we all get to
contribute and to learn.
Eric Abrahamsen wrote:
Sorry I haven't explained this clearly, it's just one more symptom of
my confusion... Your example has a tab between records as well as
between fields:
That's not how I see it! Look again:
"11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
My text file had tabs only between fields, and only a newline between
records.
The test string I was practicing with was this:
test = 'one\ttwo\tthree\nfour\tfive\tsix'
Splitting on tabs produced this:
test = ['one', 'two', 'three\nfour', 'five', 'six']
My loop (breaking test[2] on '\n') worked fine with this test, which
was what confused me. I only realized what the problem was when I
tried it on a test like this:
test = ['one', 'two', 'three\nfour', 'five', 'six', 'seven\neight',
'nine']
That showed me that I needed to step one extra item, in order to reach
the next item that needed to be split. My brain still hurts.
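
A minimal Python 3 sketch of that "step one extra item" fix, for
illustration only. The helper name split_record_boundaries and the
fields_per_record parameter are hypothetical, and the sketch assumes
each joined item should be cut at its last newline (that is, the first
field of a record never contains a newline):

```python
def split_record_boundaries(pieces, fields_per_record):
    """Return a flat list with each joined 'last\\nfirst' item split in two.

    Assumption (not from the original thread): the first field of a
    record never contains a newline, so cutting at the LAST newline
    of a joined item is safe.
    """
    flat = list(pieces)  # work on a copy, leave the caller's list alone
    # The first joined item sits at index fields_per_record - 1.
    x = fields_per_record - 1
    while x < len(flat):
        # Replace the single item at x with its one or two parts.
        flat[x:x + 1] = flat[x].rsplit('\n', 1)
        # Each split made the list one item longer, so the next joined
        # item is a full fields_per_record positions further on.
        x += fields_per_record
    return flat


test = 'one\ttwo\tthree\nfour\tfive\tsix'
print(split_record_boundaries(test.split('\t'), 3))
```

With nine fields per record this means starting at index 8 and stepping
by 9, not 8. If the dump ends with a trailing newline, the last item
would gain an empty string, so it may be worth stripping that first.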
E
On Jul 12, 2008, at 9:44 PM, bob gailer wrote:
Eric Abrahamsen wrote:
I have a horribly stupid text parsing problem that is driving me
crazy, and making me think my Python skills have a long, long way to
go...
What I've got is a poorly-thought-out SQL dump, in the form of a text
file, where each record is separated by a newline, and each field in
each record is separated by a tab. BUT, and this is what sinks me,
there are also newlines within some of the fields. Newlines are not
'safe' – they could appear anywhere – but tabs are 'safe' – they
only appear as field delimiters.
There are nine fields per record. All I can think to do is read the
file in as a string, then split on tabs. That gives me a list where
every eighth item is a string like this: u'last-field\nfirst-field'.
Now I want to iterate through the list of strings, taking every
eighth item, splitting it on '\n', and replacing it with the two
resulting strings. Then I'll have the proper flat list where every
nine list items constitutes one complete record, and I'm good to go
from there.
I've been fooling around with variations on the following (assuming
splitlist = fullstring.split('\t')):
    for x in xrange(8, sys.maxint, 8):
        try:
            splitlist[x:x] = splitlist.pop(x).split('\n')
        except IndexError:
            break
The first line correctly steps over all the list items that need to
be split, but I can't come up with a line that correctly replaces
those list items with the two strings I want. Either the cycle goes
off and splits the wrong strings, or I get nested list items, which
is not what I want. Can someone please point me in the right
direction here?
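
One possible way to avoid the index bookkeeping altogether is to build
the records directly while walking the tab-split pieces. This is only a
sketch in Python 3, not the approach used in the thread. The name
parse_dump is hypothetical, and it assumes the first field of every
record is free of newlines, so a boundary piece can be cut at its last
newline:

```python
def parse_dump(text, fields_per_record=9):
    """Group a tab/newline dump into lists of fields_per_record fields.

    Assumption (not from the original thread): the first field of each
    record contains no newline, so the LAST newline in a boundary piece
    is the record separator.
    """
    if text.endswith('\n'):
        text = text[:-1]  # drop a single trailing record terminator
    pieces = text.split('\t')
    records, current = [], []
    for i, piece in enumerate(pieces):
        at_boundary = len(current) == fields_per_record - 1
        is_final_piece = i == len(pieces) - 1
        if at_boundary and not is_final_piece:
            # This piece holds the last field of one record and the
            # first field of the next. Raises ValueError if the piece
            # has no newline, which signals a malformed dump.
            last_field, first_field = piece.rsplit('\n', 1)
            records.append(current + [last_field])
            current = [first_field]
        else:
            current.append(piece)
    if current:
        records.append(current)
    return records


sample = 'a\tb\nb2\tc\nd\te\tf'
print(parse_dump(sample, fields_per_record=3))
```

Newlines inside any field other than the first survive untouched,
because only the piece that completes a record is ever split.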
I tried a simple case with fullstring =
"11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21\t22\t23\t24\t25\t26\t27\t28\t29"
Your spec is a little vague: "each field in each record is separated
by a tab". I assumed that to mean "fields in each record are
separated by tabs".
The result was ['11', '12', '13', '\n14', '15', '16', '17', '18',
'19', '21', '22', '23', '24', '25', '26', '27', '28', '29']
which I had expected.
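
For anyone following along on Python 3, the same experiment can be
reproduced roughly as below. This is a sketch, with itertools.count
standing in for xrange(8, sys.maxint, 8); the loop itself is unchanged:

```python
from itertools import count

fullstring = ("11\t12\t13\t\n14\t15\t16\t17\t18\t19\n21"
              "\t22\t23\t24\t25\t26\t27\t28\t29")
splitlist = fullstring.split('\t')

# Same loop as in the question: visit indices 8, 16, 24, ... and
# replace each visited item with its newline-separated parts.
for x in count(8, 8):
    try:
        splitlist[x:x] = splitlist.pop(x).split('\n')
    except IndexError:
        break  # ran off the end of the list

print(splitlist)
```

This input contains only one item that joins two records (index 8), so
the loop splits it and then merely re-inserts '28' unchanged at index
16. A stride problem can only show up once there are two or more
joined items.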
Give us an example of text for which it does not work.
--
Bob Gailer
919-636-4239 Chapel Hill, NC
_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor