Re: xml.sax parsing elements with the same name
If you are using Jython, you might also want to consider VTD-XML, which is
a lot easier to use and faster than SAX. Its native XPath support may be
useful too.

http://vtd-xml.sf.net

On Jan 12, 12:13 am, Stefan Behnel stefan...@behnel.de wrote:
> amadain, 11.01.2010 20:13:
>> I have an event log with 100s of thousands of entries with logs of the
>> form:
>>
>> <event eventTimestamp="2009-12-18T08:22:49.035"
>>        uniqueId="1261124569.35725_PFS_1_1340035961">
>>   <result value="Blocked"/>
>>   <filters>
>>     <filter code="338" type="Filter_Name">
>>       <diagnostic>
>>         <result value="Triggered"/>
>>       </diagnostic>
>>     </filter>
>>     <filter code="338" type="Filter_Name">
>>       <diagnostic>
>>         <result value="Blocked"/>
>>       </diagnostic>
>>     </filter>
>>   </filters>
>> </event>
>>
>> I am using xml.sax to parse the event log.
>
> You should give ElementTree's iterparse() a try (xml.etree package).
> Instead of a stream of simple events, it will give you a stream of
> subtrees, which are a lot easier to work with. You can intercept the
> event stream on each 'event' tag, handle it completely in one obvious
> code step, and then delete any content you are done with to save memory.
>
> It's also very fast, you will likely not lose much performance compared
> to xml.sax.
>
> Stefan

-- 
http://mail.python.org/mailman/listinfo/python-list
Re: xml.sax parsing elements with the same name
On Mon, 2010-01-11 at 13:24 -0800, amadain wrote:
> On Jan 11, 9:03 pm, John Bokma j...@castleamber.com wrote:
>> amadain mfmdev...@gmail.com writes:
>> I was thinking about something like:
>>
>>     self.filterIndex = 0
>>
>>     in startElement:
>>         if name == 'filter':
>>             self.filterIndex += 1
>>             return
>>         if name == 'result' and self.filterIndex == 1:
>>             ... = attrs.get('value', '')
>>
>>     in endElement:
>>         if name == 'filters':
>>             self.filterIndex = 0
>>
>> If you want the result of the first filter in filters
>
> Thank you. I will try that

If your document is reasonably complex, I usually define some modes like:

    BPML_BOOTSTRAP_MODE  = 0
    BPML_PACKAGE_MODE    = 1
    BPML_PROCESS_MODE    = 2
    BPML_CONTEXT_MODE    = 3
    BPML_EVENT_MODE      = 10
    BPML_FAULTS_MODE     = 11
    BPML_ATTRIBUTES_MODE = 12

so I can self.mode.append(BPML_PROCESS_MODE) when I enter an element
(startElement) and self.mode = self.mode[:-1] when I leave an element
(endElement). This provides you with a complete 'stack trace' of how you
got where you are and still lets you efficiently process the stream
[versus using an evil document model].

In startElement you can check the current mode and tag with something
like:

    ...
    elif (name == 'event' and self.mode[-1] == BPML_PROCESS_MODE):
    ...

-- 
OpenGroupware developer: awill...@whitemice.org
http://whitemiceconsulting.blogspot.com/
OpenGroupware Cyrus IMAPd documentation @
http://docs.opengroupware.org/Members/whitemice/wmogag/file_view
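The mode-stack idea above can be sketched as a small handler for the event log from this thread. This is only an illustration under assumptions: the mode names, the MODES table, the ModeStackHandler class and its topLevelResults attribute are hypothetical, and pop() is used in place of self.mode = self.mode[:-1] (same effect on the stack).

```python
import xml.sax

# Hypothetical modes for the event log in this thread (not the BPML ones).
EVENT_MODE, FILTERS_MODE, FILTER_MODE, OTHER_MODE = range(4)
MODES = {"event": EVENT_MODE, "filters": FILTERS_MODE, "filter": FILTER_MODE}

SAMPLE = b"""<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>"""


class ModeStackHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.mode = []             # stack of modes, innermost element last
        self.topLevelResults = []  # <result> values directly under <event>

    def startElement(self, name, attrs):
        # The top of the stack is the parent of the element being opened.
        if name == "result" and self.mode and self.mode[-1] == EVENT_MODE:
            self.topLevelResults.append(attrs.get("value", ""))
        self.mode.append(MODES.get(name, OTHER_MODE))

    def endElement(self, name):
        self.mode.pop()  # leave the element we entered in startElement


handler = ModeStackHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.topLevelResults)
```

Because the parent's mode is checked before the new element is pushed, the two result elements nested under diagnostic are ignored and only the one directly under event is collected.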
Re: xml.sax parsing elements with the same name
amadain, 11.01.2010 20:13:
> I have an event log with 100s of thousands of entries with logs of the
> form:
>
> <event eventTimestamp="2009-12-18T08:22:49.035"
>        uniqueId="1261124569.35725_PFS_1_1340035961">
>   <result value="Blocked"/>
>   <filters>
>     <filter code="338" type="Filter_Name">
>       <diagnostic>
>         <result value="Triggered"/>
>       </diagnostic>
>     </filter>
>     <filter code="338" type="Filter_Name">
>       <diagnostic>
>         <result value="Blocked"/>
>       </diagnostic>
>     </filter>
>   </filters>
> </event>
>
> I am using xml.sax to parse the event log.

You should give ElementTree's iterparse() a try (xml.etree package).
Instead of a stream of simple events, it will give you a stream of
subtrees, which are a lot easier to work with. You can intercept the event
stream on each 'event' tag, handle it completely in one obvious code step,
and then delete any content you are done with to save memory.

It's also very fast, you will likely not lose much performance compared to
xml.sax.

Stefan
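A minimal sketch of the iterparse() approach described above, using only xml.etree.ElementTree from the standard library. The enclosing log root element, the SAMPLE data and the iter_events() helper are assumptions made for illustration; the thread shows only a single event entry.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical sample: the <log> wrapper is an assumption.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>
</log>"""


def iter_events(source):
    """Yield (uniqueId, top-level result, per-filter results) per <event>."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "event":
            continue  # wait until a whole <event> subtree has been built
        top = elem.find("result")  # matches direct children of <event> only
        per_filter = [r.get("value")
                      for r in elem.findall("filters/filter/diagnostic/result")]
        yield (elem.get("uniqueId"),
               top.get("value") if top is not None else None,
               per_filter)
        elem.clear()  # drop the subtree's content to save memory


rows = list(iter_events(io.BytesIO(SAMPLE)))
print(rows)
```

Each event arrives as a complete subtree, so the top-level result and the per-filter results can be told apart by path rather than by tracking parser state. Note that elem.clear() empties the element but leaves it attached to the root; for very large logs the emptied children may also need to be removed from the root.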
xml.sax parsing elements with the same name
I have an event log with 100s of thousands of entries with logs of the
form:

<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic>
        <result value="Triggered"/>
      </diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic>
        <result value="Blocked"/>
      </diagnostic>
    </filter>
  </filters>
</event>

I am using xml.sax to parse the event log. The trouble with the file above
is that when I parse for the result value I get the last result value
(Blocked, from above). I want to get the result value Triggered (the
second in the event).

My code is as follows:

    def startElement(self, name, attrs):
        if name == 'event':
            self.eventTime = attrs.get('eventTimestamp', "")
            self.eventUniqueId = attrs.get('uniqueId', "")
        if name == 'result':
            self.resultValue = attrs.get('value', "")
        return

    def endElement(self, name):
        if name == "event":
            result = eval(self.filter)
            if result:
                ...

How do I get the result value I require when elements have the same names
like above?
Re: xml.sax parsing elements with the same name
amadain mfmdev...@gmail.com writes:
> I have an event log with 100s of thousands of entries with logs of the
> form:
>
> <event eventTimestamp="2009-12-18T08:22:49.035"
>        uniqueId="1261124569.35725_PFS_1_1340035961">
>   <result value="Blocked"/>
>   <filters>
>     <filter code="338" type="Filter_Name">
>       <diagnostic>
>         <result value="Triggered"/>
>       </diagnostic>
>     </filter>
>     <filter code="338" type="Filter_Name">
>       <diagnostic>
>         <result value="Blocked"/>
>       </diagnostic>
>     </filter>
>   </filters>
> </event>
>
> I am using xml.sax to parse the event log. The trouble with the file
> above is that when I parse for the result value I get the last result
> value (Blocked, from above). I want to get the result value Triggered
> (the second in the event).
>
> My code is as follows:
>
>     def startElement(self, name, attrs):
>         if name == 'event':
>             self.eventTime = attrs.get('eventTimestamp', "")
>             self.eventUniqueId = attrs.get('uniqueId', "")
>         if name == 'result':
>             self.resultValue = attrs.get('value', "")
>         return
>
>     def endElement(self, name):
>         if name == "event":
>             result = eval(self.filter)
>             if result:
>                 ...
>
> How do I get the result value I require when elements have the same
> names like above?

You have to keep track of whether you're inside a filters section, and
keep track of the filter elements (first, second, etc.), assuming you want
the result value of the first filter.

-- 
John Bokma

Read my blog: http://johnbokma.com/
Hire me (Perl/Python): http://castleamber.com/
Re: xml.sax parsing elements with the same name
On Jan 11, 7:26 pm, John Bokma j...@castleamber.com wrote:
> amadain mfmdev...@gmail.com writes:
>> I have an event log with 100s of thousands of entries with logs of the
>> form:
>>
>> <event eventTimestamp="2009-12-18T08:22:49.035"
>>        uniqueId="1261124569.35725_PFS_1_1340035961">
>>   <result value="Blocked"/>
>>   <filters>
>>     <filter code="338" type="Filter_Name">
>>       <diagnostic>
>>         <result value="Triggered"/>
>>       </diagnostic>
>>     </filter>
>>     <filter code="338" type="Filter_Name">
>>       <diagnostic>
>>         <result value="Blocked"/>
>>       </diagnostic>
>>     </filter>
>>   </filters>
>> </event>
>>
>> I am using xml.sax to parse the event log. The trouble with the file
>> above is that when I parse for the result value I get the last result
>> value (Blocked, from above). I want to get the result value Triggered
>> (the second in the event).
>>
>> My code is as follows:
>>
>>     def startElement(self, name, attrs):
>>         if name == 'event':
>>             self.eventTime = attrs.get('eventTimestamp', "")
>>             self.eventUniqueId = attrs.get('uniqueId', "")
>>         if name == 'result':
>>             self.resultValue = attrs.get('value', "")
>>         return
>>
>>     def endElement(self, name):
>>         if name == "event":
>>             result = eval(self.filter)
>>             if result:
>>                 ...
>>
>> How do I get the result value I require when elements have the same
>> names like above?
>
> You have to keep track of whether you're inside a filters section, and
> keep track of the filter elements (first, second, etc.), assuming you
> want the result value of the first filter.
>
> --
> John Bokma
>
> Read my blog: http://johnbokma.com/
> Hire me (Perl/Python): http://castleamber.com/

How do I keep track? The first result value is outside a filters section
and the rest are inside one. Do you mean something like:

    def startElement(self, name, attrs):
        if name == 'event':
            self.eventTime = attrs.get('eventTimestamp', "")
            self.eventUniqueId = attrs.get('uniqueId', "")
        if name == 'result':
            self.resultValue = attrs.get('value', "")
        if name == "filters":
            if name == 'result':
                self.resultValueF = attrs.get('value', "")
        return

A
Re: xml.sax parsing elements with the same name
amadain mfmdev...@gmail.com writes:
> On Jan 11, 7:26 pm, John Bokma j...@castleamber.com wrote:
>> amadain mfmdev...@gmail.com writes:
>>> <event eventTimestamp="2009-12-18T08:22:49.035"
>>>        uniqueId="1261124569.35725_PFS_1_1340035961">
>>>   <result value="Blocked"/>
>>>   <filters>
>>>     <filter code="338" type="Filter_Name">
>>>       <diagnostic>
>>>         <result value="Triggered"/>
>>>       </diagnostic>
>>>     </filter>
>>>     <filter code="338" type="Filter_Name">
>>>       <diagnostic>
>>>         <result value="Blocked"/>
>>>       </diagnostic>
>>>     </filter>
>>>   </filters>
>>> </event>
>
> How do I keep track? The first result value is outside a filters section
> and the rest are inside one. Do you mean something like:
>
>     def startElement(self, name, attrs):
>         if name == 'event':
>             self.eventTime = attrs.get('eventTimestamp', "")
>             self.eventUniqueId = attrs.get('uniqueId', "")
>         if name == 'result':
>             self.resultValue = attrs.get('value', "")
>         if name == "filters":
>             if name == 'result':
>                 self.resultValueF = attrs.get('value', "")
>         return

I was thinking about something like:

    self.filterIndex = 0

    in startElement:
        if name == 'filter':
            self.filterIndex += 1
            return
        if name == 'result' and self.filterIndex == 1:
            ... = attrs.get('value', '')

    in endElement:
        if name == 'filters':
            self.filterIndex = 0

if you want the result of the first filter in filters.

-- 
John Bokma

Read my blog: http://johnbokma.com/
Hire me (Perl/Python): http://castleamber.com/
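The filterIndex outline above can be filled in as a complete xml.sax handler. This is a sketch under assumptions rather than the code anyone in the thread ran: the EventHandler class name, the firstFilterResult and events attributes, and the SAMPLE data are hypothetical, and the elided assignment target ("... =") is given an illustrative name.

```python
import xml.sax

SAMPLE = b"""<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>"""


class EventHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.filterIndex = 0           # 0 means "no <filter> seen yet"
        self.eventUniqueId = ""
        self.firstFilterResult = None  # hypothetical name for the "..." target
        self.events = []               # collected (uniqueId, result) pairs

    def startElement(self, name, attrs):
        if name == "event":
            self.eventUniqueId = attrs.get("uniqueId", "")
            self.firstFilterResult = None
        elif name == "filter":
            self.filterIndex += 1
        elif name == "result" and self.filterIndex == 1:
            # Only the <result> seen while the first <filter> is current.
            self.firstFilterResult = attrs.get("value", "")

    def endElement(self, name):
        if name == "filters":
            self.filterIndex = 0       # reset for the next event
        elif name == "event":
            self.events.append((self.eventUniqueId, self.firstFilterResult))


handler = EventHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.events)
```

The top-level result is skipped because filterIndex is still 0 when it is seen, and the second filter's result is skipped because the counter has already moved to 2.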
Re: xml.sax parsing elements with the same name
On Jan 11, 9:03 pm, John Bokma j...@castleamber.com wrote:
> amadain mfmdev...@gmail.com writes:
>> On Jan 11, 7:26 pm, John Bokma j...@castleamber.com wrote:
>>> amadain mfmdev...@gmail.com writes:
>>>> <event eventTimestamp="2009-12-18T08:22:49.035"
>>>>        uniqueId="1261124569.35725_PFS_1_1340035961">
>>>>   <result value="Blocked"/>
>>>>   <filters>
>>>>     <filter code="338" type="Filter_Name">
>>>>       <diagnostic>
>>>>         <result value="Triggered"/>
>>>>       </diagnostic>
>>>>     </filter>
>>>>     <filter code="338" type="Filter_Name">
>>>>       <diagnostic>
>>>>         <result value="Blocked"/>
>>>>       </diagnostic>
>>>>     </filter>
>>>>   </filters>
>>>> </event>
>>
>> How do I keep track? The first result value is outside a filters
>> section and the rest are inside one. Do you mean something like:
>>
>>     def startElement(self, name, attrs):
>>         if name == 'event':
>>             self.eventTime = attrs.get('eventTimestamp', "")
>>             self.eventUniqueId = attrs.get('uniqueId', "")
>>         if name == 'result':
>>             self.resultValue = attrs.get('value', "")
>>         if name == "filters":
>>             if name == 'result':
>>                 self.resultValueF = attrs.get('value', "")
>>         return
>
> I was thinking about something like:
>
>     self.filterIndex = 0
>
>     in startElement:
>         if name == 'filter':
>             self.filterIndex += 1
>             return
>         if name == 'result' and self.filterIndex == 1:
>             ... = attrs.get('value', '')
>
>     in endElement:
>         if name == 'filters':
>             self.filterIndex = 0
>
> if you want the result of the first filter in filters.
>
> --
> John Bokma
>
> Read my blog: http://johnbokma.com/
> Hire me (Perl/Python): http://castleamber.com/

Thank you. I will try that.