It occurred to me last night on the drive home that I should just run this through an xml parser, and lo and behold, this email was sitting in my inbox when I got home. Having tried that, I find my data is not as clean as I first thought. It seems like a fairly simple fix, but durned if I can figure out how to do it. One of the problems is data such as this (as viewed in the text editor; this is a log, not a stream):

    1  \x02 data data data \x03\x02 more data more data more data \x03\x02 even more data even
    2  more data even more data\x03\x02 Mary had a little\x03\x02 lamb whose fleece was white as
    3  snow\x03\x02 and so on.

The 1, 2, 3 at the beginning of each line above are just line numbers in the text editor; they do not actually exist in the file.

How do I read in the file, either in its entirety or line by line, and then output the text with each \x02 event data \x03 on its own line, so that whenever Python sees a \x03 it starts a new line and carries on?

On Tue, Dec 27, 2016 at 7:46 PM, David Rock <[email protected]> wrote:

> * Alan Gauld via Tutor <[email protected]> [2016-12-28 00:40]:
> > On 27/12/16 19:44, richard kappler wrote:
> > > Using python 2.7 - I have a large log file we recorded of streamed xml data
> > > that I now need to feed into another app for stress testing. The problem is
> > > the data comes in 2 formats.
> > >
> > > 1. each 'event' is a full set of xml data with opening and closing tags +
> > > x02 and x03 (stx and etx)
> > >
> > > 2. some events have all the xml data on one 'line' in the log, others are
> > > in typical nested xml format with lots of white space and multiple 'lines'
> > > in the log for each event, the first line of the 'event' starting with an
> > > stx and the last line of the 'event' ending in an etx.
> >
> > It sounds as if an xml parser should work for both. After all
> > xml doesn't care about layout and whitespace etc.
> >
> > Which xml parser are you using - I assume you are not trying
> > to parse it manually using regex or string methods - that's
> > rarely a good idea for xml.
>
> Yeah, since everything appears to be <data>..</data>, the "event" flags
> of [\x02] [\x03] may not even matter if you use an actual parser.
>
> --
> David Rock
> [email protected]
> _______________________________________________
> Tutor maillist  -  [email protected]
> To unsubscribe or change subscription options:
> https://mail.python.org/mailman/listinfo/tutor
