Hi,

Thanks Kent, I'll check out the CSV module. I had a go with Pyparsing a while ago, and it's clocking in at the 3 minute mark also.
Alan, the data is of the form:

a = { b = 1 c = 2 d = { e = { f = 4 g = "Ultimate Showdown of Ultimate Destiny" } h = { i j k } } }

Everything is whitespace delimited. I'd like to turn it into:

["a", "=", "{", "b", "=", "1", "c", "=", "2", "d", "=", "{",
 "e", "=", "{", "f", "=", "4",
 "g", "=", "\"Ultimate Showdown of Ultimate Destiny\"", "}",
 "h", "=", "{", "i", "j", "k", "}", "}", "}"]

Regards,

Liam Clarke

On 1/26/06, Alan Gauld <[EMAIL PROTECTED]> wrote:
> Hi Liam,
>
> I'm not sure I really understand what you are trying
> to get to here.
>
> Can you provide a short sample of before/after data
> so we can see what we are trying to achieve?
>
> Alan G
>
> ----- Original Message -----
> From: "Liam Clarke" <[EMAIL PROTECTED]>
> To: "Python Tutor" <tutor@python.org>
> Sent: Wednesday, January 25, 2006 1:18 PM
> Subject: [Tutor] strings & splitting
>
>
> Hi all,
>
> I have a large string which I'm attempting to manipulate, which I find
> very convenient to call
> large_string.split(" ") on to conveniently tokenise.
>
> Except, however for the double quoted strings within my string, which
> contain spaces.
>
> At the moment I'm doing a split by \n, and then looping line by line,
> splitting by spaces and then reuniting double quoted strings by
> iterating over the split line, looking for mismatched quotation marks,
> storing the indexes of each matching pair and then:
>
> for (l,r) in pairs:
>     sub_string = q[l:r+1] #Up to r and including it.
>     rejoined_string = " ".join(sub_string)
>     indices = range(l,r+1)
>     indices.reverse()
>     for i in indices: q.pop(i)
>     q.insert(l, rejoined_string)
>
> I'm doing it split line by split line, extending the resulting line
> into a big flat list as I found out that Python doesn't cope overly
> well with stuff like the above when it's a 800,000 item list, I think
> it was the insert mainly.
>
> My question is, is there a more Pythonic solution to this?
>
> I was thinking of using a regex to pluck qualifying
> quoted-space-including sentences out, and then trying to remember
> where they went based on context, but that sounds prone to error; so I
> thought of perhaps the same thing with a unique token of my own that I
> can find once the list is created and then sub the original string
> back in, but I wonder if calling index() repeatedly would be any
> faster.
>
> I've got it down to 3 seconds now, but I'm trying to get... a stable
> solution, if possible an elegant solution. The current one is prone to
> breaking based on funny whitespace and is just ugly and prickly
> looking.
>
> Regards,
>
> Liam Clarke
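
For reference, here is one possible way to get the flat token list described in this message. It is only a minimal sketch and has not been run against the real 800,000-item data. It assumes that quoted strings never contain an escaped double quote and that every token really is separated by whitespace. The names TOKEN_RE and tokenise are made up for illustration. The idea is to let a single regular expression match either a whole double-quoted string or a run of non-whitespace characters, so nothing has to be rejoined afterwards.

```python
import re

# Assumption: a token is either a complete double-quoted string (with no
# escaped quotes inside it) or a run of non-whitespace characters.
# The quoted alternative is listed first so it wins when a token starts with ".
TOKEN_RE = re.compile(r'"[^"]*"|\S+')


def tokenise(text):
    """Split text on whitespace, keeping double-quoted strings in one piece."""
    return TOKEN_RE.findall(text)


sample = ('a = { b = 1 c = 2 d = { e = { f = 4 '
          'g = "Ultimate Showdown of Ultimate Destiny" } h = { i j k } } }')

tokens = tokenise(sample)
print(tokens)
```

Because the pattern treats newlines like any other whitespace, the whole string can be passed in at once instead of being split line by line first, and the quotation marks stay on the quoted token.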