I think I understand what your query is doing, but I'm partly
guessing too.
What does your data in Accumulo look like? The only way I'm seeing that
you would be able to implement this fully in Accumulo would be if your
student_id is the leading component in the Accumulo row ID. With the
student_id anywhere else in the key, you would need a multi-level
computation (including an additional client-side aggregation).
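To illustrate why the position of student_id matters, here is a small Python sketch (plain Python, not Accumulo code). The row IDs, the "|" separator, and the helper name are made up for the example. A scan returns entries sorted by key, so a leading student_id keeps each student's entries in one contiguous run, while a trailing student_id scatters them:

```python
from itertools import groupby

# Hypothetical (row_id, value) pairs, already sorted by row_id,
# which is the order a scan would return them in.
leading = [("s1|docA", 4), ("s1|docB", 7), ("s2|docC", 2)]
trailing = [("docA|s1", 4), ("docB|s2", 2), ("docC|s1", 7)]

def runs(entries, student_of):
    """Return the student_id of each contiguous run, in scan order."""
    return [sid for sid, _ in groupby(entries, key=lambda e: student_of(e[0]))]

leading_runs = runs(leading, lambda row: row.split("|")[0])
trailing_runs = runs(trailing, lambda row: row.split("|")[1])

print(leading_runs)   # one run per student
print(trailing_runs)  # s1 shows up in two separate runs
```

In the second layout a single streaming pass sees s1 twice, so the partial results would have to be merged in a later step, which is the extra client-side aggregation mentioned above.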
Hoping that your data is in this form, a first implementation could be:
1. WholeRowIterator (collapse an entire row into one key-value pair)
2. Custom Filter (remove rows which do not match your criteria)
3. Custom transformation (permute the row into your 'np2' and 'shared'
columns)
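The three steps above can be sketched in plain Python to show the data flow. This is only a simulation under assumptions, not the Accumulo iterator API: I am assuming the row ID is exactly the student_id, that each column in the row holds one record's 'count', and the filter criterion (at least two records) is invented for the example.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical scan output: (row, column_qualifier, value), sorted by key.
# Assumption: row == student_id and each column holds one record's 'count'.
entries = [
    ("s1", "rec1", 4),
    ("s1", "rec2", 7),
    ("s2", "rec3", 2),
    ("s3", "rec4", 9),
    ("s3", "rec5", 1),
    ("s3", "rec6", 5),
]

def whole_rows(sorted_entries):
    """Step 1: collapse each row into one (row, [(qualifier, value), ...]) pair."""
    for row, group in groupby(sorted_entries, key=itemgetter(0)):
        yield row, [(cq, value) for _, cq, value in group]

def keep(row, columns, min_records=2):
    """Step 2: hypothetical criterion; keep rows with at least min_records columns."""
    return len(columns) >= min_records

def to_np2_shared(row, columns):
    """Step 3: mimic $first on 'count' and $sum: 1 for a single row."""
    return {"_id": row, "np2": columns[0][1], "shared": len(columns)}

result = [to_np2_shared(r, cols) for r, cols in whole_rows(entries) if keep(r, cols)]
print(result)
```

Note that 'np2' here is the first value in sorted key order within the row, which may or may not match the document order that $first sees in Mongo.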
Once you get the above working, there are a number of further
optimizations you could make (avoid serializing rows you are going to
filter out, or avoid the intermediate serialization entirely).
Yamini Joshi wrote:
Hi Dylan
This is what I'm trying to do:
#groupby id and create 2 new columns: np2 and shared
query = {'$group': {'_id': '$student_id', 'np2': {'$first': '$count'},
'shared': {'$sum': 1}}}
The statement written above is one of the stages in a mongo aggregate
query. The results of all the stages are computed on the server side and
the final result is returned to the user.
My problem is that I can't figure out 2 things:
1. How to add new columns while writing a Combiner/iterator
2. How to do group by (based on a condition since data in accumulo is
always stored in a group).
Best regards,
Yamini Joshi
On Sun, Sep 25, 2016 at 5:18 PM, Dylan Hutchison wrote:
Hi Yamini,
Could you further describe the computation you have in mind, for
those of us not familiar with MongoDB's "Aggr" function? You may
want to look at Accumulo's built-in Combiner iterators
<https://accumulo.apache.org/1.8/accumulo_user_manual#_combiners>.
They seem more relevant than Filters.
I don't know what you mean when you write that your output is not
visible to "the complete Database".
Regards, Dylan
On Sun, Sep 25, 2016 at 11:34 AM, Yamini Joshi wrote:
Hello everyone
I wanted to know if there is any equivalent of Mongo Aggr
queries in Accumulo. I have a complex query in the form of a Mongo
aggregate (multi-staged) query. I'm trying to model the same in
Accumulo. As of now, with the limited knowledge that I have, I
have created a class extending the Filter class. My question is:
since my queries depend on an input, is there any other way of
using the iterators/filters for only one query, or of changing their
input with every single query? As of now, my filter is getting
attached to the table on 'SCAN', which means the output will be
visible to the subsequent queries and not the complete Database.
Best regards,
Yamini Joshi