Re: Accumulo Data Storage Efficiency

Clint Green Thu, 12 Jul 2012 08:09:03 -0700

Could you use Culvert to control the indexing across platforms?

On Thu, Jul 12, 2012 at 8:57 AM, William Slacum <[email protected]> wrote:


> It'd be nice to see some numbers, but I also think it's important to
> account for use cases. Doing secondary indexing on records/files,
> metadata extraction and document storage will increase the raw storage
> required by some factor. Then, it's all compressed in various ways
> (i.e., at the RFile level and at the HDFS block level).
>
> Could we try to define some rudimentary structure that we'd put the
> data in? Like just create a term index on it, since I know HBase and
> Cassandra should be able to handle that.
>
> On Thu, Jul 12, 2012 at 6:42 AM, David Medinets
> <[email protected]> wrote:
> > Are there any published numbers for the amount of disk space used by
> > Accumulo versus other products? I'm thinking some dataset like dbpedia
> > or something from http://books.google.com/ngrams/datasets. If there is
> > not such a comparison, what comparisons would you like to see? What
> > about WordNet stored in CSV, MySQL, Cassandra, HBase, and Accumulo?
> > WordNet is just a large set of CSV files so it would be a good
> > candidate for this concept, I think.
>
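
To make the "rudimentary structure" suggestion above a bit more concrete, here is a rough sketch of a term index over CSV rows, with a crude estimate of how much the index inflates raw storage before any compression. Everything in it is an assumption for illustration: the function names, the whitespace tokenization, the "term<TAB>row_id" entry layout, and the two-line sample are hypothetical and do not reflect how Accumulo, HBase, or Cassandra actually lay out keys on disk.

```python
import csv
import io
from collections import defaultdict


def build_term_index(csv_text):
    """Map each lowercased term to the sorted list of row ids containing it."""
    index = defaultdict(set)
    reader = csv.reader(io.StringIO(csv_text))
    for row_id, row in enumerate(reader):
        for field in row:
            # Naive whitespace tokenization (an assumption, not a real analyzer).
            for term in field.lower().split():
                index[term].add(row_id)
    return {term: sorted(ids) for term, ids in index.items()}


def index_overhead(csv_text, index):
    """Ratio of (raw + index) bytes to raw bytes, before any compression."""
    raw_bytes = len(csv_text.encode("utf-8"))
    # One hypothetical "term<TAB>row_id\n" entry per posting.
    index_bytes = sum(
        len(f"{term}\t{row_id}\n".encode("utf-8"))
        for term, ids in index.items()
        for row_id in ids
    )
    return (raw_bytes + index_bytes) / raw_bytes


# Made-up sample rows, standing in for a real dataset such as WordNet CSVs.
sample = "dog,a domesticated canine\ncat,a small domesticated feline\n"
index = build_term_index(sample)
overhead = index_overhead(sample, index)
print(index["domesticated"], round(overhead, 2))
```

Running the same measurement against each store's on-disk footprint, after compression, would give the kind of comparison being asked about; the ratio here only shows the uncompressed growth from indexing.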
