Could you use Culvert to control the indexing across platforms?

On Thu, Jul 12, 2012 at 8:57 AM, William Slacum <[email protected]> wrote:
> It'd be nice to see some numbers, but I also think it's important to
> account for use cases. Doing secondary indexing on records/files,
> metadata extraction and document storage will increase the raw storage
> required by some factor. Then, it's all compressed in various ways
> (i.e., at the RFile level, at the HDFS block level)!
>
> Could we try to define some rudimentary structure that we'd put the
> data in? Like just create a term index on it, since I know HBase and
> Cassandra should be able to handle that.
>
> On Thu, Jul 12, 2012 at 6:42 AM, David Medinets
> <[email protected]> wrote:
> > Are there any published numbers for the amount of disk space used by
> > Accumulo versus other products? I'm thinking some dataset like dbpedia
> > or something from http://books.google.com/ngrams/datasets. If there is
> > not such a comparison, what comparisons would you like to see? What
> > about WordNet stored in CSV, MySQL, Cassandra, HBase, and Accumulo?
> > WordNet is just a large set of CSV files so it would be a good
> > candidate for this concept, I think.
>
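For the "rudimentary structure" idea in the quoted thread, a minimal Python sketch of a term index might look like the following. It assumes the input is a two-column CSV of (id, text) records. The (term, doc_id) key layout, the naive tokenizer and the helper names are hypothetical illustrations, not how Accumulo, HBase or Cassandra actually store data, and the byte count ignores RFile/HDFS compression and per-entry storage overhead. It only shows how an index multiplies the raw key bytes relative to the source text.

```python
import csv
import io
import re


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters (deliberately naive)."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]


def parse_csv(text):
    """Read (doc_id, text) pairs from two-column CSV; shorter rows are skipped."""
    return [(r[0], r[1]) for r in csv.reader(io.StringIO(text)) if len(r) >= 2]


def build_term_index(rows):
    """Build a term index as a sorted list of (term, doc_id) keys.

    Hypothetical layout: one entry per distinct term per record, sorted by
    term, loosely like a row-per-term table in a sorted key-value store.
    """
    entries = set()
    for doc_id, text in rows:
        for term in tokenize(text):
            entries.add((term, doc_id))
    return sorted(entries)


def raw_bytes(entries):
    """Uncompressed UTF-8 size of the index keys only (no timestamps, no overhead)."""
    return sum(len(t.encode("utf-8")) + len(d.encode("utf-8")) for t, d in entries)


if __name__ == "__main__":
    # Toy input for illustration only; not WordNet data.
    sample = "d1,the quick brown fox\nd2,the lazy dog\n"
    rows = parse_csv(sample)
    index = build_term_index(rows)
    for term, doc_id in index:
        print(term, doc_id)
    source = sum(len(t.encode("utf-8")) for _, t in rows)
    print("source text bytes:", source, "index key bytes:", raw_bytes(index))
```

Running the same loader and index builder against each system's client would at least give every store an identical logical structure before comparing on-disk sizes.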
