Re: XML Storage - Accumulo or HDFS

David Medinets Thu, 07 Jun 2012 04:30:35 -0700

On Wed, Jun 6, 2012 at 10:50 PM, Josh Elser <[email protected]> wrote:
>  Aside from losing the hierarchy
> knowledge, if you have a skewed distribution of elements in the XML
> document, you can't get good locality in your query/analytic. What was your
> idea behind storing the offsets?


<RECORDS>
 <RECORD>
  <KEY_FIELD/>
  <TAG/>
 </RECORD>
 <RECORD>
  <KEY_FIELD/>
  <TAG/>
 </RECORD>
</RECORDS>

My XML looks like that. I don't know how the information in the XML
will be used in the future, and I don't want to re-scan a large number
of XML files to find a single record. For example, yesterday we found a
potential bug. My analysis traced the source data to record X of
450,000 records. Since I knew which XML file held that record, I was
able to copy that file locally and use command-line tools to look at
the surrounding information. My XML file might have 200 tags, but
normally I only need 45 of them. My XML has no real hierarchy: it is
just a flat list of records.
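
A minimal sketch of the offset idea, assuming each record is wrapped in a
plain <RECORD>...</RECORD> pair with no attributes and no nesting. The
function names (index_record_offsets, read_record) are hypothetical, and
where the offsets end up being stored (Accumulo, a side file, ...) is left
open. It reads the whole file into memory to build the index, which a real
version would probably replace with a streaming scan.

```python
import io

OPEN_TAG = b"<RECORD>"
CLOSE_TAG = b"</RECORD>"


def index_record_offsets(data: bytes):
    """Return a (start, length) byte span for every <RECORD>...</RECORD>."""
    spans = []
    pos = 0
    while True:
        start = data.find(OPEN_TAG, pos)
        if start == -1:
            break
        end = data.find(CLOSE_TAG, start)
        if end == -1:
            break  # unterminated record: stop instead of guessing
        end += len(CLOSE_TAG)
        spans.append((start, end - start))
        pos = end
    return spans


def read_record(stream, span):
    """Seek straight to one record and read only its bytes."""
    start, length = span
    stream.seek(start)
    return stream.read(length)


# Illustrative data only, shaped like the sample above.
sample = (
    b"<RECORDS>\n"
    b" <RECORD>\n  <KEY_FIELD>a</KEY_FIELD>\n  <TAG/>\n </RECORD>\n"
    b" <RECORD>\n  <KEY_FIELD>b</KEY_FIELD>\n  <TAG/>\n </RECORD>\n"
    b"</RECORDS>\n"
)

spans = index_record_offsets(sample)
second = read_record(io.BytesIO(sample), spans[1])
```

With the spans saved next to the file name, pulling record X out of a
450,000-record file becomes one seek and one read rather than a re-scan.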
