* richard kappler <[email protected]> [2016-12-27 15:39]:
> I was actually working somewhat in that direction while I waited. I had in
> mind to use something along the lines of:
>
>
> stx = '\x02'
> etx = '\x03'
> line1 = ""
>
> with open('original.log', 'r') as f1:
>     with open('new.log', 'w') as f2:
>         for line in f1:
>             if stx in line:
>                 line1 = line1 + line
>             if not stx in line:
>                 if not etx in line:
>                     line1 = line1 + line
>             if etx in line:
>                 line1 = line1 + line + '\n'
>                 f2.write(line1)
>                 line1 = ""
>
>
> but that didn't work. It neither broke each line on etx (multiple events
> with stx and etx on one line) nor did it concatenate the multi-line events.
A big part of the challenge sounds like inconsistent data formatting. You are going to have to identify some way to reliably check for the beginning and end of your data for it to work. Do you know if you will always have \x02 at the start of a section of input, for example?

The way I usually do log parsing in that case is to use the stx as a flag to start doing other things (i.e., if I find stx, I collect lines until I see the next stx, then dump and continue). If you have intermediary data that is not between your stx and etx (comment lines, other data that you don't want), then it gets a lot harder.

If you don't have at least marginally consistent input, your only real option is probably going to be scanning by character and looking for the \x02 and \x03 to get a glob of data, then parsing that glob with some kind of XML parser, since the data between those two is likely safe-ish.

-- 
David Rock
[email protected]
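
For what it's worth, here is a rough sketch of the "scan for \x02 and \x03" idea. It is only an illustration under a few assumptions: the whole log fits in memory, every event you care about sits between one \x02 and the next \x03, and anything outside such a pair can be thrown away. The function names (extract_events, flatten, rewrite_log) are made up for this example, and the file names are the ones from your snippet.

```python
STX = "\x02"
ETX = "\x03"


def extract_events(text):
    """Return the text found between each STX and the ETX that follows it."""
    events = []
    pos = 0
    while True:
        start = text.find(STX, pos)
        if start == -1:
            break  # no more events
        end = text.find(ETX, start + 1)
        if end == -1:
            break  # unterminated event at end of input; dropped in this sketch
        events.append(text[start + 1:end])
        pos = end + 1  # resume scanning after this event
    return events


def flatten(event):
    """Remove embedded line breaks so a multi-line event fits on one line."""
    return event.replace("\r", "").replace("\n", "")


def rewrite_log(src_path="original.log", dst_path="new.log"):
    """Write one event per line (assumes the log is small enough to read whole)."""
    with open(src_path, "r") as src:
        text = src.read()
    with open(dst_path, "w") as dst:
        for event in extract_events(text):
            dst.write(flatten(event) + "\n")
```

Because this searches the whole text rather than going line by line, it should cope both with several events on one line and with one event spread over several lines. Each string it returns could then be handed to an XML parser. If a second \x02 can show up before the matching \x03, you would need to decide which one counts as the real start; this sketch just takes the first.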
