Hi guys,

While doing pre-analytics we generate hundreds of millions of mutations that 
result in 1-100 megabytes of useful data after major compaction. We ingest into 
Accumulo from the Mapper of a MapReduce job. We have found that performance 
degrades badly as the number of mutations increases.

The obvious improvement is to do some calculations in-memory before sending 
mutations to Accumulo.
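
As a rough illustration of that kind of in-memory calculation, here is a 
minimal sketch in plain Java. It assumes the calculation is a per-key sum. 
PreAggregatingBuffer, maxKeys and the sink callback are hypothetical names, 
not part of Accumulo or Hadoop; the sink stands in for whatever code would 
actually build and write the mutation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiConsumer;

/**
 * Hypothetical client-side buffer: sums values per key in memory and only
 * hands the combined totals to the sink, instead of one write per update.
 * Assumption: the aggregation is a simple sum of long values.
 */
class PreAggregatingBuffer {
    private final Map<String, Long> pending = new HashMap<>();
    private final int maxKeys;                      // flush once this many distinct keys are buffered
    private final BiConsumer<String, Long> sink;    // placeholder for "build and send one mutation"

    PreAggregatingBuffer(int maxKeys, BiConsumer<String, Long> sink) {
        this.maxKeys = maxKeys;
        this.sink = sink;
    }

    /** Combine the new value with whatever is already buffered for this key. */
    void add(String key, long delta) {
        pending.merge(key, delta, Long::sum);
        if (pending.size() >= maxKeys) {
            flush();
        }
    }

    /** Emit one combined value per buffered key, then empty the buffer. */
    void flush() {
        pending.forEach(sink);
        pending.clear();
    }
}
```

In a Mapper, one could presumably call add() from map() and flush() from 
cleanup(), so that many updates to the same key collapse into a single write. 
The maxKeys bound only limits memory use; a smaller value means less combining.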

Of course, at the same time we are looking for a solution to minimize 
development effort.

I guess I am asking about micro compaction/ingest-time iterators on the client 
side (before data is sent to Accumulo).

As far as I understand, Accumulo does not support them; is that correct? And if 
so, are there any plans to support this functionality in the future?

Thanks
Roman


