Re: [xml] Two semantically identical files are being treated differently
David Grohmann wrote:
> Confirmed that this fixed the problem. But is this a bug since they should have both parsed the same since there was no whitespace difference in that area?

Possibly, but I can't reproduce it. What OS are you using? Could you include a small program that reads first.xml and second.xml and outputs the first child of PruningSection that exhibits the problem?

Jason
___
xml mailing list, project page http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
Re: [xml] Two semantically identical files are being treated differently
using libxml2 2.6.26

David Grohmann wrote:
> These 2 files have an element PruningSection which contains an element Channels that contains some text.
>
> These 2 files are extremely similar and I don't think they should be creating node trees any different from each other.
>
> 1) Why are these coming out different?
>
> 2) Is there a method (or group of methods put together) that will take a PruningSection node and return the Channels node no matter which of these 2 trees is passed in (other than trying to follow one pointer path vs. trying to follow the other path)?
>
> Something like
>
>     give_me_the_node_below( parent_node, name_of_child_node )
>
> so, more concretely, using the node variable from below:
>
>     xmlNodePtr channel_node = give_me_the_node_below( node, "Channels" )
>
> Thank you
> - David
>
> Trees described below:
>
> first.xml:
>
> (gdb) p *node
> $40 = { _private = 0x0, type = XML_ELEMENT_NODE, name = 0x18489a3 "PruningSection",
>   children = 0x1849c00, last = 0x1849da0, parent = 0x1849180, next = 0x1849e20,
>   prev = 0x18493f0, doc = 0x1848d20, ns = 0x1848f90, content = 0x0,
>   properties = 0x1849510, nsDef = 0x0, psvi = 0x0, line = 11, extra = 0 }
> (gdb) p *node->children
> $45 = { _private = 0x0, type = XML_TEXT_NODE, name = 0x2a9bb827c6 "text",
>   children = 0x0, last = 0x0, parent = 0x1849490, next = 0x1849c80,
>   prev = 0x0, doc = 0x1848d20, ns = 0x0, content = 0x1848a55 "\n",
>   properties = 0x0, nsDef = 0x0, psvi = 0x0, line = 0, extra = 0 }
> (gdb) p *node->children->next
> $46 = { _private = 0x0, type = XML_ELEMENT_NODE, name = 0x1848a5b "Channels",
>   children = 0x1849d00, last = 0x1849d00, parent = 0x1849490, next = 0x1849da0,
>   prev = 0x1849c00, doc = 0x1848d20, ns = 0x1848f90, content = 0x0,
>   properties = 0x0, nsDef = 0x0, psvi = 0x0, line = 12, extra = 0 }
> (gdb) p *node->children->next->children
> $47 = { _private = 0x0, type = XML_TEXT_NODE, name = 0x2a9bb827c6 "text",
>   children = 0x0, last = 0x0, parent = 0x1849c80, next = 0x0,
>   prev = 0x0, doc = 0x1848d20, ns = 0x0, content = 0x1849d80 "1:3,10:15,20",
>   properties = 0x0, nsDef = 0x0, psvi = 0x0, line = 0, extra = 0 }
>
> second.xml:
>
> (gdb) p *node
> $48 = { _private = 0x0, type = XML_ELEMENT_NODE, name = 0x18a6b7f "PruningSection",
>   children = 0x1864ff0, last = 0x1864ff0, parent = 0x18645e0, next = 0x0,
>   prev = 0x0, doc = 0x18a6f00, ns = 0x18a7170, content = 0x0,
>   properties = 0x1864940, nsDef = 0x0, psvi = 0x0, line = 3, extra = 0 }
> (gdb) p *node->children
> $49 = { _private = 0x0, type = XML_ELEMENT_NODE, name = 0x18a6c31 "Channels",
>   children = 0x1865070, last = 0x1865070, parent = 0x18648c0, next = 0x0,
>   prev = 0x0, doc = 0x18a6f00, ns = 0x18a7170, content = 0x0,
>   properties = 0x0, nsDef = 0x0, psvi = 0x0, line = 4, extra = 0 }
> (gdb) p *node->children->children
> $50 = { _private = 0x0, type = XML_TEXT_NODE, name = 0x2a9bb827c6 "text",
>   children = 0x0, last = 0x0, parent = 0x1864ff0, next = 0x0,
>   prev = 0x0, doc = 0x18a6f00, ns = 0x0, content = 0x18650f0 "1:3,10:15,20",
>   properties = 0x0, nsDef = 0x0, psvi = 0x0, line = 0, extra = 0 }
>
> <?xml version="1.0" encoding="UTF-8"?>
> <NominalPruning xmlns="http://www.arlut.utexas.edu/esl/SPEAR/library/xml/NominalPruning-v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.arlut.utexas.edu/esl/SPEAR/library/xml/NominalPruning-v1 /opt/spear-dev/share/schemas/parameters-nominal-pruning-v1.xsd">
> <PruningSection index="0" frequency_start_in_hertz="0" frequency_stop_in_hertz="10" azimuth_start_in_degrees="5" azimuth_stop_in_degrees="15" range_start_in_meters="1" range_stop_in_meters="INF">
> <Channels>1:3,10:15,20</Channels>
> </PruningSection>
> </NominalPruning>
>
> <?xml version="1.0" encoding="UTF-8"?>
> <NominalPruning xmlns="http://www.arlut.utexas.edu/esl/SPEAR/library/xml/NominalPruning-v1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.arlut.utexas.edu/esl/SPEAR/library/xml/NominalPruning-v1 /opt/spear-dev/share/schemas/parameters-nominal-pruning-v1.xsd">
> <PruningSection index="0" frequency_start_in_hertz="0.0E1" frequency_stop_in_hertz="10" range_start_in_meters="1" range_stop_in_meters="INF" azimuth_start_in_degrees="5" azimuth_stop_in_degrees="15">
> <Channels>1:3,10:15,20</Channels>
> </PruningSection>
> </NominalPruning>

--
David Grohmann
Senior Student Associate
Applied Research Lab : UT Austin : ESL - S206
Office: 512-835-3237
Re: [xml] Two semantically identical files are being treated differently
David Grohmann wrote:
> These 2 files have an element PruningSection which contains an element Channels that contains some text.
>
> These 2 files are extremely similar and I don't think they should be creating node trees any different from each other.

Yes, they should. Whitespace between XML elements is significant, even though it's handy to ignore 'indenting' whitespace. If you want to ignore it, pass XML_PARSE_NOBLANKS when creating the doc.

Jason
Re: [xml] Two semantically identical files are being treated differently
Jason Viers wrote:
> David Grohmann wrote:
>> These 2 files have an element PruningSection which contains an element Channels that contains some text.
>>
>> These 2 files are extremely similar and I don't think they should be creating node trees any different from each other.
>
> Yes, they should.

My apologies, I looked at the examples too quickly and didn't notice that the extra line breaks only occurred between attributes, not elements.

Are you sure they're being read in the same manner? When I read them, both have XML_TEXT_NODE as the first child. Is one being given XML_PARSE_NOBLANKS?

Jason
Re: [xml] Two semantically identical files are being treated differently
Jason Viers wrote:
> Jason Viers wrote:
>> David Grohmann wrote:
>>> These 2 files have an element PruningSection which contains an element Channels that contains some text.
>>>
>>> These 2 files are extremely similar and I don't think they should be creating node trees any different from each other.
>>
>> Yes, they should.
>
> My apologies, I looked at the examples too quickly and didn't notice that the extra line breaks only occurred between attributes, not elements.
>
> Are you sure they're being read in the same manner? When I read them, both have XML_TEXT_NODE as the first child. Is one being given XML_PARSE_NOBLANKS?
>
> Jason

The call I'm using is this:

    xmlReadFile( document_filename, NULL, XML_PARSE_PEDANTIC )

Are you saying that if I change XML_PARSE_PEDANTIC to XML_PARSE_NOBLANKS it will parse them exactly the same?

--
David Grohmann
Senior Student Associate
Applied Research Lab : UT Austin : ESL - S206
Office: 512-835-3237
Re: [xml] Two semantically identical files are being treated differently
Jason Viers wrote:
> Jason Viers wrote:
>> David Grohmann wrote:
>>> These 2 files have an element PruningSection which contains an element Channels that contains some text.
>>>
>>> These 2 files are extremely similar and I don't think they should be creating node trees any different from each other.
>>
>> Yes, they should.
>
> My apologies, I looked at the examples too quickly and didn't notice that the extra line breaks only occurred between attributes, not elements.
>
> Are you sure they're being read in the same manner? When I read them, both have XML_TEXT_NODE as the first child. Is one being given XML_PARSE_NOBLANKS?
>
> Jason

Enum xmlParserOption {
    XML_PARSE_RECOVER    = 1     : recover on errors
    XML_PARSE_NOENT      = 2     : substitute entities
    XML_PARSE_DTDLOAD    = 4     : load the external subset
    XML_PARSE_DTDATTR    = 8     : default DTD attributes
    XML_PARSE_DTDVALID   = 16    : validate with the DTD
    XML_PARSE_NOERROR    = 32    : suppress error reports
    XML_PARSE_NOWARNING  = 64    : suppress warning reports
    XML_PARSE_PEDANTIC   = 128   : pedantic error reporting
    XML_PARSE_NOBLANKS   = 256   : remove blank nodes
    XML_PARSE_SAX1       = 512   : use the SAX1 interface internally
    XML_PARSE_XINCLUDE   = 1024  : Implement XInclude substitution
    XML_PARSE_NONET      = 2048  : Forbid network access
    XML_PARSE_NODICT     = 4096  : Do not reuse the context dictionary
    XML_PARSE_NSCLEAN    = 8192  : remove redundant namespaces declarations
    XML_PARSE_NOCDATA    = 16384 : merge CDATA as text nodes
    XML_PARSE_NOXINCNODE = 32768 : do not generate XINCLUDE START/END nodes
    XML_PARSE_COMPACT    = 65536 : compact small text nodes; no modification of the tree allowed afterwards (will possibly crash if you try to modify the tree)
}

So I guess I should OR them together and then call it like this?

    xmlReadFile( document_filename, NULL, XML_PARSE_PEDANTIC | XML_PARSE_NOBLANKS )

Also, let's say this works (I'm about to try it). But is there a better way to get at that data than manually following pointers like I was showing in the GDB prompts?

Thank you,

--
David Grohmann
Senior Student Associate
Applied Research Lab : UT Austin : ESL - S206
Office: 512-835-3237
Re: [xml] Two semantically identical files are being treated differently
David Grohmann wrote:
> the call I'm using is this
>
>     xmlReadFile( document_filename, NULL, XML_PARSE_PEDANTIC )
>
> are you saying that if i change XML_PARSE_PEDANTIC to XML_PARSE_NOBLANKS it will parse them exactly the same?

They SHOULD be parsing the same anyway, don't know why they're not. But it sounds like you want to ignore the blank text node in between the elements, so adding that should make them both ignore it.

You don't have to get rid of PEDANTIC if you want, you can OR them together for multiple options:

    xmlReadFile( document_filename, NULL, XML_PARSE_PEDANTIC | XML_PARSE_NOBLANKS )

Jason
Re: [xml] Two semantically identical files are being treated differently
David Grohmann wrote:
> But is there a better way to get at that data than manually following pointers like I was showing in the GDB prompts?

Nope, following children and next pointers is the way you get around the trees.

Jason
Re: [xml] Two semantically identical files are being treated differently
On Wed, 2007-21-02 at 15:59 -0600, David Grohmann wrote:
> [...] But is there a better way to get at that data than manually following pointers like I was showing in the GDB prompts?

The XPath interface is higher level, if that helps.

Liam

--
Liam Quin - XML Activity Lead, W3C, http://www.w3.org/People/Quin/
Pictures from old books: http://fromoldbooks.org/
Ankh: irc.sorcery.net irc.gnome.org www.advogato.org
Re: [xml] Two semantically identical files are being treated differently
Jason Viers wrote:
> David Grohmann wrote:
>> But is there a better way to get at that data than manually following pointers like I was showing in the GDB prompts?
>
> Nope, following children and next pointers is the way you get around the trees.
>
> Jason

Confirmed that this fixed the problem. But is this a bug since they should have both parsed the same since there was no whitespace difference in that area?

--
David Grohmann
Senior Student Associate
Applied Research Lab : UT Austin : ESL - S206
Office: 512-835-3237