kalyanasundaram wrote:
> On Thu, 2007-06-28 at 11:00 +0200, Stefan Behnel wrote:
>> kalyanasundaram wrote:
>>> On Thu, 2007-06-28 at 09:10 +0200, Stefan Behnel wrote:
>>>> kalyanasundaram s wrote:
>>>>> I need to parse a huge xml file for a specific set of nodes. Is there
>>>>> any method like getElementByName is available in libxml. I could not
>>>>> find it in the document. Does it exists with some other name?
>>>>> Otherwise i will have to travel in the entire tree. It is inefficient.
>>>>
>>>> No it's not. It just depends on the implementation of getElementByName. :)
>>>>
>>>> You can always use XPath to find the tag. However, if your XML tree is
>>>> really so big (note that XPath is pretty fast, so I'd try it first) you
>>>> may consider building an index of the tree, i.e. some kind of data
>>>> structure that maps tag names to a list of node pointers.
>>>>
>>>> Note that there is also a hash map implementation in libxml2, see hash.c.
>>>>
>>>> Stefan
>>>
>>> Thanks for your information. I need to parse the xml file only once.
>>
>> Do you mean: parse it once, keep it in memory and keep doing lots of things
>> with it? Or rather: parse it once, extract what you need and then throw it
>> away?
>
> Yeh, in my case parse it once and update few nodes and save it as
> another document. Nothing more than that.
>
>> In the first case: build an index. In the latter: Either read it in,
>> traverse it and extract what you need (possibly with XPath), or read it in
>> with SAX and extract what you need while parsing. Depends on whether you
>> need a tree to know what you need or not.
>>
>>> So which would be better? XPath or linear traversing?
>>> I dont know much about XPath implementation. (Do they not traverse
>>> atleast once?) The file size is about 500 KB. :)
>>
>> That sounds rather small. Just parse it in and walk through it, that's what
>> I'd do.
>
> Really! I thought 500 Kb is bigger.
> How much it would be able to handle?
> At what size I should go for XPath ?

It's a very subjective call. It depends on the structure of the XML itself, the XPath expression being executed, the specs of the machine running it, and what you consider "fast enough". I'd recommend trying the XPath approach and measuring how it performs. As Stefan said, 500 KB really isn't that big.

Jason
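
A minimal sketch of the "try it and measure" advice, under some loud assumptions: it uses Python's standard-library xml.etree.ElementTree rather than libxml2 (so it only supports a limited XPath subset and says nothing about libxml2's actual timings), and the element names `catalog`, `item`, and `target`, as well as the document size, are invented for illustration. It builds a synthetic document of a few hundred KB, finds the same nodes once by linear traversal and once by an XPath-style query, times both, then updates the matches and serializes the result as a new document.

```python
import time
import xml.etree.ElementTree as ET


def build_sample(n_items):
    """Build a synthetic document; all element names are hypothetical."""
    root = ET.Element("catalog")
    for i in range(n_items):
        item = ET.SubElement(root, "item", id=str(i))
        ET.SubElement(item, "name").text = f"item-{i}"
        if i % 100 == 0:
            # Only every 100th item carries the node we want to update.
            ET.SubElement(item, "target").text = "old"
    return ET.tostring(root)


def find_by_walk(root, tag):
    """Linear traversal: visit every element once and keep the matches."""
    return [el for el in root.iter() if el.tag == tag]


def find_by_xpath(root, tag):
    """XPath-style lookup of all descendants with the given tag."""
    return root.findall(f".//{tag}")


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


data = build_sample(10000)
root = ET.fromstring(data)  # parse once

walked, t_walk = timed(find_by_walk, root, "target")
queried, t_xpath = timed(find_by_xpath, root, "target")

# Update the few matching nodes, then serialize as a separate document.
for el in queried:
    el.text = "new"
updated = ET.tostring(root)

print(f"document size: {len(data)} bytes")
print(f"walk:  {len(walked)} nodes in {t_walk:.4f}s")
print(f"xpath: {len(queried)} nodes in {t_xpath:.4f}s")
```

Both lookups have to visit the tree at least once, so on a document this size the difference is likely to be small either way; the point of the sketch is only that measuring on your own data is cheap.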
