Re: micro compaction

2015-06-10 Thread John Vines
:*Russ Weeks [mailto:rwe...@newbrightidea.com] *Sent:* 09 June 2015 20:54 *To:* accumulo-user *Subject:* Re: micro compaction For consistency and ease of implementation. Say I've written a stack of combiners that do statistical aggregation, sampling etc. on my table. Rather than port

Re: micro compaction

2015-06-10 Thread Josh Elser
mailto:rwe...@newbrightidea.com] *Sent:* 09 June 2015 20:54 *To:* accumulo-user *Subject:* Re: micro compaction For consistency and ease of implementation. Say I've written a stack of combiners that do statistical aggregation, sampling etc. on my table

Re: micro compaction

2015-06-10 Thread Josh Elser
qualifiers inside is just 10-30% faster than 19 individual mutations. *From:*Russ Weeks [mailto:rwe...@newbrightidea.com] *Sent:* 09 June 2015 20:54 *To:* accumulo-user *Subject:* Re: micro compaction For consistency and ease of implementation. Say I've written a stack of combiners that do

micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
. The obvious improvement is to do some calculations in-memory before sending mutations to Accumulo. Of course, at the same time we are looking for a solution to minimize development effort. I guess I am asking about micro compaction/ingest-time iterators on the client side (before data is sent

Re: micro compaction

2015-06-09 Thread Russ Weeks
to Accumulo), I can reduce the overall number of mutations by 1000x or so -Original Message- From: Josh Elser [mailto:josh.el...@gmail.com] Sent: 09 June 2015 16:54 To: user@accumulo.apache.org Subject: Re: micro compaction Well, you win the prize for new terminology. I haven't ever

Re: micro compaction

2015-06-09 Thread Josh Elser
Well, you win the prize for new terminology. I haven't ever heard the term micro compaction before. Can you clarify though, you say hundreds of millions of mutations that result in megabytes of data. Is that an increase or decrease in size? Comparing apples to oranges :) roman.drap

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
@accumulo.apache.org Subject: Re: micro compaction Well, you win the prize for new terminology. I haven't ever heard the term micro compaction before. Can you clarify though, you say hundreds of millions of mutations that result in megabytes of data. Is that an increase or decrease in size? Comparing apples

Re: micro compaction

2015-06-09 Thread Adam Fuchs
:54 To: user@accumulo.apache.org Subject: Re: micro compaction Well, you win the prize for new terminology. I haven't ever heard the term micro compaction before. Can you clarify though, you say hundreds of millions of mutations that result in megabytes of data. Is that an increase

Re: micro compaction

2015-06-09 Thread Christopher
[mailto:josh.el...@gmail.com] Sent: 09 June 2015 16:54 To: user@accumulo.apache.org Subject: Re: micro compaction Well, you win the prize for new terminology. I haven't ever heard the term micro compaction before. Can you clarify though, you say hundreds of millions of mutations that result

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
: Re: micro compaction I think this might be the same concept as in-mapper combining, but applied to data being sent to a BatchWriter rather than an OutputCollector. See [1], section 3.1.1. A similar performance analysis and probably a lot of the same code should apply here. Cheers, Adam [1
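A minimal sketch of the idea discussed in this thread: combine values in client memory and send only the aggregated result. It is written in Python for brevity and does not use the Accumulo API. `CombiningBuffer`, `sink`, and `max_keys` are hypothetical names, and the sketch assumes the aggregation is a simple sum, so that a summing combiner on the table could still merge partial sums from separate flushes. In a real client the `sink` callback would build mutations and hand them to a BatchWriter.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (row, column family, column qualifier); an illustrative key shape only.
Key = Tuple[str, str, str]


class CombiningBuffer:
    """Hypothetical client-side buffer that sums values per key before sending."""

    def __init__(self, sink: Callable[[List[Tuple[Key, int]]], None],
                 max_keys: int = 10000) -> None:
        self._sink = sink              # stand-in for "write these to the BatchWriter"
        self._max_keys = max_keys      # bound on distinct keys held in memory
        self._pending: Dict[Key, int] = defaultdict(int)

    def add(self, row: str, family: str, qualifier: str, value: int) -> None:
        # Combine in memory instead of emitting one mutation per update.
        self._pending[(row, family, qualifier)] += value
        if len(self._pending) >= self._max_keys:
            self.flush()

    def flush(self) -> None:
        # Emit one combined entry per key, sorted by key, then reset the buffer.
        if not self._pending:
            return
        self._sink(sorted(self._pending.items()))
        self._pending.clear()
```

For example, calling `add("row1", "stats", "count", 1)` a thousand times and then `flush()` hands the sink a single entry for that key with the value 1000, rather than a thousand separate updates. Because each flush may carry only a partial sum, this shortcut is safe only for aggregations that are associative and commutative.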

Re: micro compaction

2015-06-09 Thread Russ Weeks
...@apache.org] *Sent:* 09 June 2015 19:08 *To:* user@accumulo.apache.org *Subject:* Re: micro compaction I think this might be the same concept as in-mapper combining, but applied to data being sent to a BatchWriter rather than an OutputCollector. See [1], section 3.1.1. A similar performance

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
% faster than 19 individual mutations. From: Russ Weeks [mailto:rwe...@newbrightidea.com] Sent: 09 June 2015 20:54 To: accumulo-user Subject: Re: micro compaction For consistency and ease of implementation. Say I've written a stack of combiners that do statistical aggregation, sampling etc

Re: micro compaction

2015-06-09 Thread Keith Turner
). *From:* Russ Weeks [mailto:rwe...@newbrightidea.com] *Sent:* 09 June 2015 20:54 *To:* accumulo-user *Subject:* Re: micro compaction For consistency and ease of implementation. Say I've written a stack of combiners that do statistical aggregation, sampling etc. on my table. Rather than

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
(or do multiple compares of the row for each column in the mutation). From: Russ Weeks [mailto:rwe...@newbrightidea.com] Sent: 09 June 2015 20:54 To: accumulo-user Subject: Re: micro compaction For consistency and ease of implementation. Say I've written a stack

Re: micro compaction

2015-06-09 Thread David Medinets
@accumulo.apache.org *Subject:* Re: micro compaction I think this might be the same concept as in-mapper combining, but applied to data being sent to a BatchWriter rather than an OutputCollector. See [1], section 3.1.1. A similar performance analysis and probably a lot of the same code should

RE: micro compaction

2015-06-09 Thread roman.drap...@baesystems.com
Thanks a lot, will give it a try! From: Keith Turner [mailto:ke...@deenlo.com] Sent: 09 June 2015 22:28 To: user@accumulo.apache.org Subject: Re: micro compaction On Tue, Jun 9, 2015 at 5:10 PM, roman.drap...@baesystems.com roman.drap

Re: micro compaction

2015-06-09 Thread Chris Bennight
there is a relatively easy way to do this with Accumulo or whether it’s time to look closer into something like Spark. Thanks Roman *From:* Adam Fuchs [mailto:afu...@apache.org] *Sent:* 09 June 2015 19:08 *To:* user@accumulo.apache.org *Subject:* Re: micro compaction I think this might