On Tue, 2008-04-01 at 08:35 -0400, Doran, Harold wrote:
> David et al:
> 
> Attached is a sample xml file. Below is my python code. I am using
> python 2.5.2 on a Windows XP machine.
> 
> Test.py
> from xml.etree.ElementTree import ElementTree as ET
> 
> # create a new file defined by the user
> f = open('output.txt', 'w')
> 
> et = ET(file='g:\python\ml\out_g3r_b2.xml')
> 
> for statentityref in \
> et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
>    for statval in \
> et.findall('admin/responseanalyses/analysis/analysisdata/statentityref/statval'):
>       print >> f, statentityref.attrib['id'], '\t', \
>             statval.attrib['type'], '\t', statval.attrib['value']
>       
> f.close()
> 
> If you run this you will see the output organized almost exactly as I
> need it. But, there is a bug in my program, which I suspect is in the
> order in which I am looping. For example, here is a snippet of output
> from the file output.txt. I've added in some comments so you can see
> where I am struggling.
> 
> 9568  OmitCount       0.000000 # This is correct
> 9568  NotReachedCount         0.000000 # This is correct
> 9568  PolyserialCorrelation   0.602525 # This is correct
> 9568  AdjustedPolyserial      0.553564 # This is correct
> 9568  AverageScore    0.817348 # This is correct
> 9568  StdevItemScore  0.386381 # This is correct
> 9568  OmitCount       0.000000 # This is NOT correct
> 9568  NotReachedCount         0.000000 # This is NOT correct
> 9568  PolyserialCorrelation   0.672088 # This is NOT correct
> 9568  AdjustedPolyserial      0.590175 # This is NOT correct
> 9568  AverageScore    1.034195 # This is NOT correct
> 9568  StdevItemScore  0.926668 # This is NOT correct
> 
> Now, here is what *should* be returned. Note that I have manually
> changed the item id (the number preceding the text) to 9569. The data
> are pulled in correctly, but for some reason I am not looping properly
> to get the correct item ID to line up with its corresponding data.
> 
> 9568  OmitCount       0.000000
> 9568  NotReachedCount         0.000000
> 9568  PolyserialCorrelation   0.602525
> 9568  AdjustedPolyserial      0.553564
> 9568  AverageScore    0.817348
> 9568  StdevItemScore  0.386381
> 9569  OmitCount       0.000000    # Note the item ID has been modified here and below.
> 9569  NotReachedCount         0.000000
> 9569  PolyserialCorrelation   0.672088
> 9569  AdjustedPolyserial      0.590175
> 9569  AverageScore    1.034195
> 9569  StdevItemScore  0.926668
> 
> Last, notice the portion of code
> 
> admin/responseanalyses/analysis/analysisdata/statentityref')
> 
> I know this is what to use only because I manually went through the xml
> file to examine its hierarchical structure. I assume this is bad
> practice. Is there a way to examine the parent-child structure of an XML
> file in python so I can see the hierarchical structure?
> 
> Thanks,
> Harold

If you keep looking in your probably massive output file, you'll also
find the same results under 9569, 9567, 9571, and all your other
statentityrefs.  In the following code:

for statentityref in \
et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
   for statval in \
et.findall('admin/responseanalyses/analysis/analysisdata/statentityref/statval'):
       print >> f, statentityref.attrib['id'], '\t', statval.attrib['type'], \
            '\t', statval.attrib['value']     

there is nothing limiting statval to within statentityref, so for each
statentityref, you get all the statvals from *every* statentityref.  Try
something like this:

for statentityref in \
et.findall('admin/responseanalyses/analysis/analysisdata/statentityref'):
    for statval in statentityref.findall('statval'):
        do(stuff)

Note that now the xpath from which you get statval is limited to
searching within the current statentityref, and takes that statentityref
as its context node.
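
To make that concrete, here is a minimal sketch of the scoped loop run
against a tiny stand-in document. The sample XML is made up (only the
element and attribute names and a few values come from your post, and
the <root> wrapper is an assumption since I am guessing at your file's
top level). It uses Python 3 print syntax rather than your 2.5 style,
and rows() is just a helper name I picked for illustration.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the attached file; the <root> wrapper is an assumption.
SAMPLE = """
<root>
  <admin>
    <responseanalyses>
      <analysis>
        <analysisdata>
          <statentityref id="9568">
            <statval type="OmitCount" value="0.000000"/>
            <statval type="AverageScore" value="0.817348"/>
          </statentityref>
          <statentityref id="9569">
            <statval type="OmitCount" value="0.000000"/>
            <statval type="AverageScore" value="1.034195"/>
          </statentityref>
        </analysisdata>
      </analysis>
    </responseanalyses>
  </admin>
</root>
"""

PATH = 'admin/responseanalyses/analysis/analysisdata/statentityref'


def rows(root):
    """Return (id, type, value) tuples, pairing each statval with its own parent."""
    out = []
    for statentityref in root.findall(PATH):
        # Search only the children of the current statentityref.
        for statval in statentityref.findall('statval'):
            out.append((statentityref.attrib['id'],
                        statval.attrib['type'],
                        statval.attrib['value']))
    return out


root = ET.fromstring(SAMPLE)
for row in rows(root):
    print('\t'.join(row))
```

Each id now appears only next to its own statvals, so 9569 picks up
1.034195 for AverageScore instead of being repeated under 9568.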

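Or, if you want to shorten your code lines a bit, break out part of
your xpath.  Use find() here rather than findall(): find() returns the
first matching element, while findall() returns a list, which has no
findall() method of its own.  (If the file has more than one
analysisdata element, loop over the findall() result instead.)

analysisdata = et.find('admin/responseanalyses/analysis/analysisdata')
for statentityref in analysisdata.findall('statentityref'):
    for statval in statentityref.findall('statval'):
        do(stuff)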
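
As for seeing the parent-child structure without reading the file by
hand, one option is to walk the tree and print each tag indented by its
depth. This is only a rough sketch: outline() is a name I made up, the
sample document is invented, and it lists each tag once per parent
(using the first child with that tag as the representative), so
children that appear only under a later sibling would be missed.

```python
import xml.etree.ElementTree as ET


def outline(elem, depth=0, lines=None):
    """Collect one indented line per distinct tag under each parent, depth-first."""
    if lines is None:
        lines = []
    lines.append('  ' * depth + elem.tag)
    seen = set()
    for child in elem:
        if child.tag not in seen:  # show each tag only once per parent
            seen.add(child.tag)
            outline(child, depth + 1, lines)
    return lines


# Invented sample; only the element names are taken from the thread.
DOC = (
    "<admin><responseanalyses><analysis><analysisdata>"
    "<statentityref id='1'><statval/><statval/></statentityref>"
    "<statentityref id='2'><statval/></statentityref>"
    "</analysisdata></analysis></responseanalyses></admin>"
)

structure = outline(ET.fromstring(DOC))
print('\n'.join(structure))
```

For a real file you would pass ET.parse(your_path).getroot() to
outline() instead of parsing a string.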


Cheers,
Cliff


_______________________________________________
XML-SIG maillist  -  XML-SIG@python.org
http://mail.python.org/mailman/listinfo/xml-sig
