John is spot on. One additional implication: if you are continuing to write new data to your table, you need to pick a table structure that doesn't require adding more data to the same tablet over time. Depending on the type of indexing you'd like to use, this generally means a document-partitioned structure like the one used in the WikiSearch example: http://accumulo.apache.org/example/wikisearch.html
For some problems (like building a graph or an RDF triple store) this isn't really feasible, and you will eventually need to major compact.

Cheers,
Adam

On Fri, Jul 27, 2012 at 11:35 AM, John Armstrong <[email protected]> wrote:
> On 07/27/2012 11:23 AM, Hugh Xedni wrote:
>> If I load sorted key-value map or ISAM files into HDFS via bulk loading,
>> how can I ensure only one file will be assigned to a tablet and major
>> compaction is avoided?
>
> I think (and those more knowledgeable will correct me if I'm wrong) that
> you could achieve this by
>
> (a) making sure that all your bulk-load files contain non-overlapping
> Accumulo key ranges,
>
> (b) making each file smaller than the maximum tablet size on the table, and
>
> (c) setting the table splits to the file key range boundaries before bulk
> importing.
>
> These should be sufficient conditions, though possibly (likely?) not
> necessary.
>
> hth
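John's three conditions lend themselves to a pre-flight check before you import. Below is a minimal illustrative sketch (plain Python, not Accumulo API code, and the function name is hypothetical) that assumes you can enumerate each bulk-load file's first key, last key, and size; it validates (a) and (b) and derives the split points for (c):

```python
# Sketch: sanity-check bulk-load files against John's conditions and
# compute the table splits to add before importing.

ASSUMED_MAX_TABLET_SIZE = 1 << 30  # assumption: 1 GB split threshold


def plan_bulk_import(files, max_tablet_size=ASSUMED_MAX_TABLET_SIZE):
    """files: list of (first_key, last_key, size_bytes), one per file.

    Returns the split points to set so each tablet receives exactly
    one file; raises if a condition is violated.
    """
    files = sorted(files)  # order files by their first key

    # (a) key ranges must not overlap
    for (_, prev_last, _), (first, _, _) in zip(files, files[1:]):
        if first <= prev_last:
            raise ValueError(f"overlapping key ranges at {first!r}")

    # (b) each file must be smaller than the maximum tablet size
    for first, last, size in files:
        if size >= max_tablet_size:
            raise ValueError(f"file {first!r}..{last!r} exceeds tablet size")

    # (c) split at each file's last key; the final file needs no split
    # because its tablet extends to the end of the table
    return [last for _, last, _ in files[:-1]]


splits = plan_bulk_import([("a", "f", 100), ("g", "m", 200), ("n", "z", 150)])
# splits -> ["f", "m"]
```

With splits at "f" and "m", the tablets cover (-inf, "f"], ("f", "m"], and ("m", +inf), so each file lands in exactly one tablet. You would then add the computed splits (e.g. via `addsplits` in the Accumulo shell) before running the bulk import.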
