John is spot on. One additional implication: if you are continuing to write new data to your table, you need to pick a table structure that doesn't require adding more data to the same tablet over time. Depending on the type of indexing you'd like to use, this generally means a document-partitioned structure like the one used in the WikiSearch example: http://accumulo.apache.org/example/wikisearch.html
For some problems (like building a graph or an RDF triple store) this isn't really feasible, and you will eventually need to major compact.

Cheers,
Adam

On Fri, Jul 27, 2012 at 11:35 AM, John Armstrong <[email protected]> wrote:
> On 07/27/2012 11:23 AM, Hugh Xedni wrote:
>> If I load sorted key-value map or ISAM files into HDFS via bulk loading,
>> how can I ensure only one file will be assigned to a tablet and major
>> compaction is avoided?
>
> I think (and those more knowledgeable will correct me if I'm wrong) that
> you could achieve this by
>
> (a) making sure that all your bulk-load files contain non-overlapping
> Accumulo key ranges,
>
> (b) making each file smaller than the maximum tablet size on the table, and
>
> (c) setting the table splits to the file key range boundaries before bulk
> importing.
>
> These should be sufficient conditions, though possibly (likely?) not
> necessary.
>
> hth
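John's three conditions lend themselves to a pre-flight check before you import. Below is a minimal illustrative sketch (plain Python, not Accumulo API code, and the function name is hypothetical) that assumes you can enumerate each bulk-load file's first key, last key, and size; it validates (a) and (b) and derives the split points for (c):

```python
# Sketch: sanity-check bulk-load files against John's conditions and
# compute the table splits to add before importing.

ASSUMED_MAX_TABLET_SIZE = 1 << 30  # assumption: 1 GB split threshold


def plan_bulk_import(files, max_tablet_size=ASSUMED_MAX_TABLET_SIZE):
    """files: list of (first_key, last_key, size_bytes), one per file.

    Returns the split points to set so each tablet receives exactly
    one file; raises if a condition is violated.
    """
    files = sorted(files)  # order files by their first key

    # (a) key ranges must not overlap
    for (_, prev_last, _), (first, _, _) in zip(files, files[1:]):
        if first <= prev_last:
            raise ValueError(f"overlapping key ranges at {first!r}")

    # (b) each file must be smaller than the maximum tablet size
    for first, last, size in files:
        if size >= max_tablet_size:
            raise ValueError(f"file {first!r}..{last!r} exceeds tablet size")

    # (c) split at each file's last key; the final file needs no split
    # because its tablet extends to the end of the table
    return [last for _, last, _ in files[:-1]]


splits = plan_bulk_import([("a", "f", 100), ("g", "m", 200), ("n", "z", 150)])
# splits -> ["f", "m"]
```

With splits at "f" and "m", the tablets cover (-inf, "f"], ("f", "m"], and ("m", +inf), so each file lands in exactly one tablet. You would then add the computed splits (e.g. via `addsplits` in the Accumulo shell) before running the bulk import.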
