Re: xml.sax parsing elements with the same name

2010-02-08 Thread dontcare
If you are using Jython, then you might also want to consider VTD-XML,
which is a lot easier to use and faster than SAX; its native XPath
support may be useful too.

http://vtd-xml.sf.net

On Jan 12, 12:13 am, Stefan Behnel stefan...@behnel.de wrote:
 amadain, 11.01.2010 20:13:

  I have an event log with 100s of thousands of entries with logs of the
  form:

  <event eventTimestamp="2009-12-18T08:22:49.035"
         uniqueId="1261124569.35725_PFS_1_1340035961">
     <result value="Blocked"/>
     <filters>
        <filter code="338" type="Filter_Name">
           <diagnostic>
              <result value="Triggered"/>
           </diagnostic>
        </filter>
        <filter code="338" type="Filter_Name">
           <diagnostic>
              <result value="Blocked"/>
           </diagnostic>
        </filter>
     </filters>
  </event>

  I am using xml.sax to parse the event log.

 You should give ElementTree's iterparse() a try (xml.etree package).
 Instead of a stream of simple events, it will give you a stream of
 subtrees, which are a lot easier to work with. You can intercept the event
 stream on each 'event' tag, handle it completely in one obvious code step,
 and then delete any content you are done with to save memory.

 It's also very fast; you will likely not lose much performance compared
 to xml.sax.

 Stefan

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: xml.sax parsing elements with the same name

2010-01-15 Thread Adam Tauno Williams
On Mon, 2010-01-11 at 13:24 -0800, amadain wrote:
 On Jan 11, 9:03 pm, John Bokma j...@castleamber.com wrote:
  amadain mfmdev...@gmail.com writes:
  I was thinking about something like:
  self.filterIndex = 0
  in startElement:
      if name == 'filter':
          self.filterIndex += 1
          return
      if name == 'result' and self.filterIndex == 1:
          ...  = attrs.get('value', '')
  in endElement:
      if name == 'filters':
          self.filterIndex = 0
  If you want the result of the first filter in filters
 Thank you. I will try that

If your document is reasonably complex, I usually define some modes like:

BPML_BOOTSTRAP_MODE  = 0
BPML_PACKAGE_MODE    = 1
BPML_PROCESS_MODE    = 2
BPML_CONTEXT_MODE    = 3

BPML_EVENT_MODE      = 10
BPML_FAULTS_MODE     = 11
BPML_ATTRIBUTES_MODE = 12

- so I can self.mode.append(BPML_PROCESS_MODE) when I enter an element
(startElement) and self.mode = self.mode[:-1] when I leave an element
(endElement).  This provides you with a complete 'stack trace' of how
you got where you are and still lets you efficiently process the stream
[versus using the evil document model].  In startElement you can check the
current mode and tag with something like -
...
elif (name == 'event' and self.mode[-1] == BPML_PROCESS_MODE):
...

-- 
OpenGroupware developer: awill...@whitemice.org
http://whitemiceconsulting.blogspot.com/
OpenGroupware & Cyrus IMAPd documentation @
http://docs.opengroupware.org/Members/whitemice/wmogag/file_view
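Adam's mode stack can be sketched with plain element names pushed onto the stack instead of his BPML_* constants. That is a simplification, since his constants are application-specific. This is a minimal sketch, assuming the events sit under a single root element (the hypothetical <log> wrapper below). StackHandler and SAMPLE are illustrative names, not from the thread. list.pop() has the same effect as self.mode = self.mode[:-1] without copying the list.

```python
import xml.sax

# Hypothetical input: one <event> from the thread, wrapped in an assumed <log> root.
SAMPLE = b"""<log><event eventTimestamp="2009-12-18T08:22:49.035">
<result value="Blocked"/>
<filters>
<filter code="338"><diagnostic><result value="Triggered"/></diagnostic></filter>
<filter code="338"><diagnostic><result value="Blocked"/></diagnostic></filter>
</filters>
</event></log>"""


class StackHandler(xml.sax.ContentHandler):
    """Track the open elements so every callback knows where it is."""

    def __init__(self):
        super().__init__()
        self.stack = []    # names of currently open elements, outermost first
        self.results = []  # (ancestor path, value) for every <result>

    def startElement(self, name, attrs):
        if name == "result":
            # At this point the stack holds only the ancestors of this <result>.
            self.results.append(("/".join(self.stack), attrs.get("value", "")))
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()  # leaving the element: drop it from the stack


handler = StackHandler()
xml.sax.parseString(SAMPLE, handler)
for path, value in handler.results:
    print(path, value)
```

With this sample the first printed line is "log/event Blocked" and the other two carry the path "log/event/filters/filter/diagnostic", so a check such as self.stack[-1] == "event" (the counterpart of Adam's self.mode[-1] == BPML_PROCESS_MODE) picks out only the top-level result.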



Re: xml.sax parsing elements with the same name

2010-01-12 Thread Stefan Behnel

amadain, 11.01.2010 20:13:

I have an event log with 100s of thousands of entries with logs of the
form:

<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
   <result value="Blocked"/>
   <filters>
      <filter code="338" type="Filter_Name">
         <diagnostic>
            <result value="Triggered"/>
         </diagnostic>
      </filter>
      <filter code="338" type="Filter_Name">
         <diagnostic>
            <result value="Blocked"/>
         </diagnostic>
      </filter>
   </filters>
</event>

I am using xml.sax to parse the event log.


You should give ElementTree's iterparse() a try (xml.etree package). 
Instead of a stream of simple events, it will give you a stream of 
subtrees, which are a lot easier to work with. You can intercept the event 
stream on each 'event' tag, handle it completely in one obvious code step, 
and then delete any content you are done with to save memory.


It's also very fast; you will likely not lose much performance compared to 
xml.sax.


Stefan
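Stefan's iterparse() suggestion might look roughly like this. It is a minimal sketch, assuming the log has a single root element (the hypothetical <log> wrapper below) and that the wanted value is the first <result> inside <filters>. iter_events and SAMPLE are illustrative names. For a bare tag name, ElementTree's find() matches only direct children, which is what separates the top-level <result> from the nested ones.

```python
import io
import xml.etree.ElementTree as ET

# Hypothetical input: the event from the thread inside an assumed <log> root.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035" uniqueId="1261124569.35725_PFS_1_1340035961">
  <result value="Blocked"/>
  <filters>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Triggered"/></diagnostic>
    </filter>
    <filter code="338" type="Filter_Name">
      <diagnostic><result value="Blocked"/></diagnostic>
    </filter>
  </filters>
</event>
</log>"""


def iter_events(source):
    """Yield (timestamp, uniqueId, top-level result, first filter result) per <event>."""
    for _, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "event":
            continue  # inner elements are handled when their <event> ends
        top = elem.find("result")  # direct child of <event> only
        first = elem.find("filters/filter/diagnostic/result")  # first match in document order
        row = (
            elem.get("eventTimestamp", ""),
            elem.get("uniqueId", ""),
            top.get("value", "") if top is not None else "",
            first.get("value", "") if first is not None else "",
        )
        elem.clear()  # drop the finished subtree's contents to save memory
        yield row


for row in iter_events(io.BytesIO(SAMPLE)):
    print(row)
```

elem.clear() empties each finished <event>, but the emptied elements stay attached to the root. For files with hundreds of thousands of events, a common refinement is to also clear the root element (obtained from a "start" event) as you go.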


xml.sax parsing elements with the same name

2010-01-11 Thread amadain
I have an event log with 100s of thousands of entries with logs of the
form:

<event eventTimestamp="2009-12-18T08:22:49.035"
       uniqueId="1261124569.35725_PFS_1_1340035961">
   <result value="Blocked"/>
   <filters>
      <filter code="338" type="Filter_Name">
         <diagnostic>
            <result value="Triggered"/>
         </diagnostic>
      </filter>
      <filter code="338" type="Filter_Name">
         <diagnostic>
            <result value="Blocked"/>
         </diagnostic>
      </filter>
   </filters>
</event>

I am using xml.sax to parse the event log. The trouble with the file
above is when I parse for result value I get the last result value
(Blocked from above). I want to get the result value triggered (the
second in the event).

my code is as follows:

def startElement(self, name, attrs):
    if name == 'event':
        self.eventTime = attrs.get('eventTimestamp', "")
        self.eventUniqueId = attrs.get('uniqueId', "")
    if name == 'result':
        self.resultValue = attrs.get('value', "")
    return

def endElement(self, name):
    if name == "event":
        result = eval(self.filter)
        if result:
            ...

How do I get the result value I require when events have the same
names like above?


Re: xml.sax parsing elements with the same name

2010-01-11 Thread John Bokma
amadain mfmdev...@gmail.com writes:

 I have an event log with 100s of thousands of entries with logs of the
 form:

 <event eventTimestamp="2009-12-18T08:22:49.035"
        uniqueId="1261124569.35725_PFS_1_1340035961">
    <result value="Blocked"/>
    <filters>
       <filter code="338" type="Filter_Name">
          <diagnostic>
             <result value="Triggered"/>
          </diagnostic>
       </filter>
       <filter code="338" type="Filter_Name">
          <diagnostic>
             <result value="Blocked"/>
          </diagnostic>
       </filter>
    </filters>
 </event>

 I am using xml.sax to parse the event log. The trouble with the file
 above is when I parse for result value I get the last result value
 (Blocked from above). I want to get the result value triggered (the
 second in the event).

 my code is as follows:

 def startElement(self, name, attrs):
     if name == 'event':
         self.eventTime = attrs.get('eventTimestamp', "")
         self.eventUniqueId = attrs.get('uniqueId', "")
     if name == 'result':
         self.resultValue = attrs.get('value', "")
     return

 def endElement(self, name):
     if name == "event":
         result = eval(self.filter)
         if result:
             ...

 How do I get the result value I require when events have the same
 names like above?

You have to keep track if you're inside a filters section, and keep
track of the filter elements (first, second, etc.) assuming you want the
result value of the first filter.

-- 
John Bokma

Read my blog: http://johnbokma.com/
Hire me (Perl/Python): http://castleamber.com/
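John's advice to keep track of whether you are inside a filters section can be sketched as a SAX handler with a boolean flag. This is a minimal sketch, assuming the same hypothetical <log> wrapper around the events. FilterAwareHandler, in_filters and events are illustrative names, not from the thread. The top-level <result> and the ones inside <filters> are stored separately, so the later "Blocked" no longer overwrites "Triggered".

```python
import xml.sax

# Hypothetical input: the event from the thread inside an assumed <log> root.
SAMPLE = b"""<log>
<event eventTimestamp="2009-12-18T08:22:49.035" uniqueId="1261124569.35725_PFS_1_1340035961">
<result value="Blocked"/>
<filters>
<filter code="338" type="Filter_Name"><diagnostic><result value="Triggered"/></diagnostic></filter>
<filter code="338" type="Filter_Name"><diagnostic><result value="Blocked"/></diagnostic></filter>
</filters>
</event>
</log>"""


class FilterAwareHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.in_filters = False  # True between <filters> and </filters>
        self.events = []         # (uniqueId, top-level result, [filter results]) per event

    def startElement(self, name, attrs):
        if name == "event":
            self.unique_id = attrs.get("uniqueId", "")
            self.event_result = ""
            self.filter_results = []
        elif name == "filters":
            self.in_filters = True
        elif name == "result":
            value = attrs.get("value", "")
            if self.in_filters:
                self.filter_results.append(value)  # every filter result, in document order
            else:
                self.event_result = value          # the <result> directly under <event>

    def endElement(self, name):
        if name == "filters":
            self.in_filters = False
        elif name == "event":
            self.events.append((self.unique_id, self.event_result, self.filter_results))


handler = FilterAwareHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.events)
```

With this sample, handler.events[0][2][0] holds the result of the first filter.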


Re: xml.sax parsing elements with the same name

2010-01-11 Thread amadain
On Jan 11, 7:26 pm, John Bokma j...@castleamber.com wrote:
 amadain mfmdev...@gmail.com writes:
  I have an event log with 100s of thousands of entries with logs of the
  form:

  <event eventTimestamp="2009-12-18T08:22:49.035"
         uniqueId="1261124569.35725_PFS_1_1340035961">
     <result value="Blocked"/>
     <filters>
        <filter code="338" type="Filter_Name">
           <diagnostic>
              <result value="Triggered"/>
           </diagnostic>
        </filter>
        <filter code="338" type="Filter_Name">
           <diagnostic>
              <result value="Blocked"/>
           </diagnostic>
        </filter>
     </filters>
  </event>

  I am using xml.sax to parse the event log. The trouble with the file
  above is when I parse for result value I get the last result value
  (Blocked from above). I want to get the result value triggered (the
  second in the event).

  my code is as follows:

      def startElement(self, name, attrs):
          if name == 'event':
              self.eventTime = attrs.get('eventTimestamp', "")
              self.eventUniqueId = attrs.get('uniqueId', "")
          if name == 'result':
              self.resultValue = attrs.get('value', "")
          return

      def endElement(self, name):
          if name == "event":
              result = eval(self.filter)
              if result:
                  ...

  How do I get the result value I require when events have the same
  names like above?

 You have to keep track if you're inside a filters section, and keep
 track of the filter elements (first, second, etc.) assuming you want the
 result value of the first filter.

 --
 John Bokma

 Read my blog: http://johnbokma.com/
 Hire me (Perl/Python): http://castleamber.com/

how do I keep track? The first result value is outside a filters
section and the rest are. Do you mean something like:

def startElement(self, name, attrs):
    if name == 'event':
        self.eventTime = attrs.get('eventTimestamp', "")
        self.eventUniqueId = attrs.get('uniqueId', "")
    if name == 'result':
        self.resultValue = attrs.get('value', "")
    if name == "filters":
        if name == 'result':
            self.resultValueF = attrs.get('value', "")
    return

A


Re: xml.sax parsing elements with the same name

2010-01-11 Thread John Bokma
amadain mfmdev...@gmail.com writes:

 On Jan 11, 7:26 pm, John Bokma j...@castleamber.com wrote:
 amadain mfmdev...@gmail.com writes:


  <event eventTimestamp="2009-12-18T08:22:49.035"
         uniqueId="1261124569.35725_PFS_1_1340035961">
     <result value="Blocked"/>
     <filters>
        <filter code="338" type="Filter_Name">
           <diagnostic>
              <result value="Triggered"/>
           </diagnostic>
        </filter>
        <filter code="338" type="Filter_Name">
           <diagnostic>
              <result value="Blocked"/>
           </diagnostic>
        </filter>
     </filters>
  </event>

 how do I keep track? The first result value is outside a filters
 section and the rest are. Do you mean something like:

 def startElement(self, name, attrs):
     if name == 'event':
         self.eventTime = attrs.get('eventTimestamp', "")
         self.eventUniqueId = attrs.get('uniqueId', "")
     if name == 'result':
         self.resultValue = attrs.get('value', "")
     if name == "filters":
         if name == 'result':
             self.resultValueF = attrs.get('value', "")
     return

I was thinking about something like:

self.filterIndex = 0

in startElement:

    if name == 'filter':
        self.filterIndex += 1
        return
    if name == 'result' and self.filterIndex == 1:
        ...  = attrs.get('value', '')

in endElement:

    if name == 'filters':
        self.filterIndex = 0

If you want the result of the first filter in filters

-- 
John Bokma

Read my blog: http://johnbokma.com/
Hire me (Perl/Python): http://castleamber.com/
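John's filterIndex idea, filled out into a runnable handler. This is a minimal sketch: the post leaves the assignment target as "...", so the attribute name firstFilterResult is hypothetical, as are FirstFilterHandler, found and the <log> wrapper around the events. Using elif instead of the early return in startElement has the same effect.

```python
import xml.sax

# Hypothetical input: the event from the thread inside an assumed <log> root.
SAMPLE = b"""<log>
<event uniqueId="1261124569.35725_PFS_1_1340035961">
<result value="Blocked"/>
<filters>
<filter code="338" type="Filter_Name"><diagnostic><result value="Triggered"/></diagnostic></filter>
<filter code="338" type="Filter_Name"><diagnostic><result value="Blocked"/></diagnostic></filter>
</filters>
</event>
</log>"""


class FirstFilterHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.filterIndex = 0         # 0 outside <filters>; n once the n-th <filter> has started
        self.uniqueId = ""
        self.firstFilterResult = ""  # hypothetical name for the "..." target in the post
        self.found = []              # (uniqueId, result of the first filter) per event

    def startElement(self, name, attrs):
        if name == "event":
            self.uniqueId = attrs.get("uniqueId", "")
            self.firstFilterResult = ""
        elif name == "filter":
            self.filterIndex += 1    # count <filter> elements inside <filters>
        elif name == "result" and self.filterIndex == 1:
            self.firstFilterResult = attrs.get("value", "")

    def endElement(self, name):
        if name == "filters":
            self.filterIndex = 0     # reset so the next event counts from zero again
        elif name == "event":
            self.found.append((self.uniqueId, self.firstFilterResult))


handler = FirstFilterHandler()
xml.sax.parseString(SAMPLE, handler)
print(handler.found)
```

The top-level <result> is seen while filterIndex is still 0, and the second filter's <result> while it is 2, so only "Triggered" is kept.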


Re: xml.sax parsing elements with the same name

2010-01-11 Thread amadain
On Jan 11, 9:03 pm, John Bokma j...@castleamber.com wrote:
 amadain mfmdev...@gmail.com writes:
  On Jan 11, 7:26 pm, John Bokma j...@castleamber.com wrote:
  amadain mfmdev...@gmail.com writes:
   <event eventTimestamp="2009-12-18T08:22:49.035"
          uniqueId="1261124569.35725_PFS_1_1340035961">
      <result value="Blocked"/>
      <filters>
         <filter code="338" type="Filter_Name">
            <diagnostic>
               <result value="Triggered"/>
            </diagnostic>
         </filter>
         <filter code="338" type="Filter_Name">
            <diagnostic>
               <result value="Blocked"/>
            </diagnostic>
         </filter>
      </filters>
   </event>
  how do I keep track? The first result value is outside a filters
  section and the rest are. Do you mean something like:

      def startElement(self, name, attrs):
          if name == 'event':
              self.eventTime = attrs.get('eventTimestamp', "")
              self.eventUniqueId = attrs.get('uniqueId', "")
          if name == 'result':
              self.resultValue = attrs.get('value', "")
          if name == "filters":
              if name == 'result':
                  self.resultValueF = attrs.get('value', "")
          return

 I was thinking about something like:

 self.filterIndex = 0

 in startElement:

     if name == 'filter':
        self.filterIndex += 1
        return
     if name == 'result' and self.filterIndex == 1:
        ...  = attrs.get('value', '')

 in endElement:

    if name == 'filters':
       self.filterIndex = 0

 If you want the result of the first filter in filters

 --
 John Bokma

 Read my blog: http://johnbokma.com/
 Hire me (Perl/Python): http://castleamber.com/

Thank you. I will try that