On Wed, Jun 6, 2012 at 10:50 PM, Josh Elser <[email protected]> wrote:
> Aside from losing the hierarchy knowledge, if you have a skewed
> distribution of elements in the XML document, you can't get good
> locality in your query/analytic. What was your idea behind storing
> the offsets?
My XML looks like this:

    <RECORDS>
      <RECORD>
        <KEY_FIELD/>
        <TAG/>
      </RECORD>
      <RECORD>
        <KEY_FIELD/>
        <TAG/>
      </RECORD>
    </RECORDS>

There is no hierarchy beyond that: each file is a flat list of records. A record might have 200 tags, but normally I only need 45 of them.

I don't know how the information in the XML will be used in the future, and I don't want to re-scan large numbers of XML files to find a single record. For example, yesterday we found a potential bug. My analysis showed that the source data was in record X of 450,000 records. Since I knew which XML file held that record, I was able to pull that file down locally and use command-line tools to find the surrounding information.
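To make the offset idea concrete, here is a rough Python sketch, not the code I actually run. It assumes the flat layout shown above, that the opening tag is literally <RECORD> with no attributes, and that records never nest. The function names index_records and read_record are made up for illustration. It records a (start, length) byte offset for every record so that a single record can later be read with one seek instead of a full re-scan.

```python
RECORD_OPEN = b"<RECORD>"    # assumption: no attributes on the opening tag
RECORD_CLOSE = b"</RECORD>"  # assumption: records are flat and never nest


def index_records(data: bytes):
    """Return a list of (start, length) byte offsets, one per record."""
    offsets = []
    pos = 0
    while True:
        start = data.find(RECORD_OPEN, pos)
        if start == -1:
            break  # no more records
        end = data.find(RECORD_CLOSE, start)
        if end == -1:
            break  # truncated record at end of input; skip it
        end += len(RECORD_CLOSE)
        offsets.append((start, end - start))
        pos = end
    return offsets


def read_record(path, start, length):
    """Read one record from the file at `path` using a stored offset."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(length)
```

With the (file name, start, length) triple stored next to whatever fields were extracted, going back to the raw record and its neighbours is a seek and a read rather than a pass over the whole file. Working in bytes rather than decoded text keeps the offsets valid for seek() regardless of the file's encoding.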
