Good, I'm glad you found it useful.

The important thing to remember is that your data is split across many tablet servers and that Iterators run locally on each tablet server. As such, you cannot compute a single sum via an iterator; at best, you can compute N intermediate sums, one for each tablet server the BatchScanner had to talk to.
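
To make the final client-side step concrete, here is a minimal sketch in plain Java with no Accumulo dependency. It assumes the client has already collected one intermediate sum per tablet server; the class name, method name, and sample numbers are made up for illustration and are not part of the Accumulo API.

```java
import java.util.List;

class PartialSumMerge {
    /** Adds up the intermediate sums, one per tablet server that answered the scan. */
    static long mergePartialSums(List<Long> partialSums) {
        long total = 0L;
        for (long partial : partialSums) {
            total += partial;
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical intermediate sums returned by three tablet servers.
        List<Long> partialSums = List.of(120L, 45L, 310L);
        System.out.println(mergePartialSums(partialSums)); // prints 475
    }
}
```

In a real client, the list would be filled by parsing the Values read from the BatchScanner, which is the part this sketch leaves out.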

Also, ignore my previous comment about a second iterator. I had assumed you were doing something fancier than selecting a single column qualifier from a row.

Since you're passing in what are likely multiple, disjoint ranges, I'm not sure you're going to get much of a performance optimization out of a custom iterator in this case. After each seek, your iterator would need to return the entries that it summed in the provided Range. The Iterator framework isn't designed to know the overall state of the scan: you might have more data to read, or you might be done. You must return your result once the data you're reading moves outside of the current range.

You'd see the real optimization an Iterator provides when scanning over a large, contiguous set of rows specified by a single Range: many key/value pairs read on the server are reduced to a single pair returned to the client.
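
The difference between the two cases can be modeled in plain Java. This is only a sketch under simplifying assumptions: a TreeMap stands in for one tablet's sorted data, a pair of row strings stands in for a Range, and the names `RangeSums`, `sumRange`, and `sumPerRange` are hypothetical. It does not use the Accumulo iterator API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

class RangeSums {
    /** Sums the values for rows in [startRow, endRow), like one seek over one range. */
    static long sumRange(SortedMap<String, Long> table, String startRow, String endRow) {
        long sum = 0L;
        for (long value : table.subMap(startRow, endRow).values()) {
            sum += value;
        }
        return sum;
    }

    /** Emits one sum per range, because each result must be returned when its range ends. */
    static List<Long> sumPerRange(SortedMap<String, Long> table, List<String[]> ranges) {
        List<Long> results = new ArrayList<>();
        for (String[] range : ranges) {
            results.add(sumRange(table, range[0], range[1]));
        }
        return results;
    }

    public static void main(String[] args) {
        SortedMap<String, Long> table = new TreeMap<>();
        table.put("row1", 10L);
        table.put("row2", 20L);
        table.put("row3", 30L);
        table.put("row4", 40L);

        // Two disjoint single-row ranges: two results go back to the client.
        List<String[]> disjoint = List.of(
                new String[] {"row1", "row1\0"},
                new String[] {"row3", "row3\0"});
        System.out.println(sumPerRange(table, disjoint)); // prints [10, 30]

        // One contiguous range: four entries are read, one result goes back.
        System.out.println(sumRange(table, "row1", "row5")); // prints 100
    }
}
```

With single-row ranges, the number of results equals the number of ranges, so the server-side sum saves little. With one contiguous range, the reduction grows with the number of rows covered.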

If I mis-stated your situation, please do let me know.

madhvi wrote:
Hi,

Thanks for the blog you shared. I found it quite useful for my requirement.
"How are you passing these IDs to the batch scanner?"
I am passing row ids received as a previous query result from another
table as 'new Range(entry.getKey().getRow())' in a Range type list and
passing that list to batch Scanner.

"Are you trying to sum across all rows that you queried? "
Yes, we need to sum a particular column qualifier across the row IDs
passed to the batch scanner. How can the summation be done across the
rows, as you said "you can put a second iterator "above" the first"?

Thanks
Madhvi
On Wednesday 17 June 2015 08:43 PM, Josh Elser wrote:
Madhvi,

Understood. A few more questions..

How are you passing these IDs to the batch scanner? Are you providing
individual Ranges for each ID (e.g. `new Range(new Key("row1", "",
"id1"), true, new Key("row1", "", "id1\0"), false)`)? Or are you
providing an entire row (or set of rows) and using the
fetchColumns(Text,Text) method (or similar) on the BatchScanner?

Are you trying to sum across all rows that you queried? Or is your sum
per-row? If the former, that is going to cause you problems. The quick
explanation is that you can't reliably know the tablet boundaries so
you should try to perform an initial sum, per row. If you want, you
can put a second iterator "above" the first and do a summation across
all rows to reduce the amount of data sent to a client. However, if
you use a BatchScanner, you will still have to perform a final
summation at the client.

Check out
https://blogs.apache.org/accumulo/entry/thinking_about_reads_over_accumulo
for more details on that..

madhvi wrote:
Hi Josh,

Sorry, my company policy doesn't allow me to share the full source. What
we are trying to do is sum over a unique field stored in the column
qualifier for the IDs passed to the batch scanner. Can you suggest how it
can be done in Accumulo?

Thanks
Madhvi
On Wednesday 17 June 2015 10:32 AM, Josh Elser wrote:
You put random values in the family and qualifier? Do I misunderstand
you?

Also, if you can put up the full source for the iterator, that will be
much easier if you need help debugging it. It's hard for us to guess
at why your code might not be working as you expect.

madhvi wrote:
Hi Josh,

I have changed the HashMap to a TreeMap, which sorts lexicographically,
and I have inserted random values in the column family and qualifier. The
TreeMap's value goes in the Value.
I used both a scanner and a batch scanner, but I am getting results only
with the scanner.

Thanks
Madhvi

On Tuesday 16 June 2015 08:42 PM, Josh Elser wrote:
Additionally, you're placing the Value into the ColumnQualifier and
dropping the ColumnFamily completely. Granted, that may not be a
problem for the specific data in your table, but it's not going to
work for arbitrary data.

Christopher wrote:
You're iterating over a HashMap. That's not sorted.
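
A minimal plain-Java sketch of the ordering issue follows. It uses String keys as a stand-in for Accumulo Keys, and the class and method names are hypothetical. A HashMap makes no ordering guarantee when iterated, whereas copying the entries into a TreeMap yields them in sorted key order, which is what an iterator is expected to return.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SortedEmit {
    /** Returns the holder map's keys in sorted order by copying them into a TreeMap. */
    static List<String> sortedKeys(Map<String, String> holder) {
        return new ArrayList<>(new TreeMap<>(holder).keySet());
    }

    public static void main(String[] args) {
        // Stand-in for the iterator's holder map; insertion order is deliberately unsorted.
        Map<String, String> holder = new HashMap<>();
        holder.put("row3", "c");
        holder.put("row1", "a");
        holder.put("row2", "b");

        // Iterating the HashMap directly gives no ordering guarantee.
        // Iterating the TreeMap copy always yields ascending key order.
        System.out.println(sortedKeys(holder)); // prints [row1, row2, row3]
    }
}
```

Accumulo's Key class is Comparable, so the same approach should carry over to a TreeMap keyed by Key rather than String.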

--
Christopher L Tubbs II
http://gravatar.com/ctubbsii


On Tue, Jun 16, 2015 at 1:58 AM, madhvi wrote:
Hi Josh,
Thanks for replying. I will enable remote debugger on my Accumulo
server.

However, I am slightly confused by your statement "you are not
returning your data in sorted order". Can you point out the part of my
iterator code which seems inappropriate, and any possible solution for
that?

Thanks
Madhvi


On Tuesday 16 June 2015 11:07 AM, Josh Elser wrote:
//matched the condition and put values to holder map.



