I think I understand what your query is doing, but I'm only guessing as well.

What does your data in Accumulo look like? The only way I can see to implement this entirely in Accumulo is if your student_id is the leading component of the Accumulo row ID. Because Accumulo keeps entries sorted by key, that would place all of a student's entries next to each other. With the student_id anywhere else in the key, you would need some multi-level computation (involving an additional aggregation client-side).

Hoping that your data is in this form, a first implementation could be:

1. WholeRowIterator (collapse an entire row into one key-value pair)
2. Custom Filter (remove rows which do not match your criteria)
3. Custom transformation (permute the row into your 'np2' and 'shared' columns)
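
To make the data flow of those three steps concrete, here is a minimal sketch in plain Python. It only simulates the flow over sorted entries and does not use the Accumulo API. The layout (row = student_id, column family "count", one column qualifier per document), the sample values, and the helper names whole_rows, keep_row and to_np2_shared are all assumptions made for illustration, as is the filter criterion of at least two entries.

```python
from itertools import groupby

# Hypothetical layout: (row, column_family, column_qualifier, value),
# kept sorted the way Accumulo keeps its keys sorted.
entries = sorted([
    ("s1", "count", "doc1", 4),
    ("s1", "count", "doc2", 7),
    ("s2", "count", "doc3", 2),
    ("s3", "count", "doc4", 9),
    ("s3", "count", "doc5", 1),
    ("s3", "count", "doc6", 5),
])

def whole_rows(sorted_entries):
    """Step 1: collapse every row into one (row, columns) pair."""
    for row, group in groupby(sorted_entries, key=lambda e: e[0]):
        yield row, [(cf, cq, value) for _, cf, cq, value in group]

def keep_row(row, columns, min_entries=2):
    """Step 2: hypothetical criterion, keep rows with enough 'count' entries."""
    return sum(1 for cf, _, _ in columns if cf == "count") >= min_entries

def to_np2_shared(row, columns):
    """Step 3: emit 'np2' (first count in sort order) and 'shared' (entry count)."""
    counts = [value for cf, _, value in columns if cf == "count"]
    return row, {"np2": counts[0], "shared": len(counts)}

def aggregate(sorted_entries):
    return [to_np2_shared(row, cols)
            for row, cols in whole_rows(sorted_entries)
            if keep_row(row, cols)]

result = aggregate(entries)
```

Note that "first" here means first in key sort order, which may differ from the document order that Mongo's $first sees.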

Once you have the above working, there are several further optimizations you could make (for example, avoid serializing rows you're going to filter out, or avoid the intermediate serialization entirely).
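
As a rough illustration of skipping the intermediate whole-row step, the sketch below makes a single pass over sorted entries and keeps only a running first value and a counter per row. It is plain Python rather than Accumulo iterator code. The layout (row = student_id, column family "count"), the sample data, the function name stream_aggregate and the min_entries criterion are assumptions for illustration only.

```python
def stream_aggregate(sorted_entries, min_entries=2):
    """Single pass over key-sorted entries; no per-row list is ever built."""
    current_row, first_count, shared = None, None, 0
    for row, cf, _cq, value in sorted_entries:
        if cf != "count":
            continue  # ignore columns that do not feed the aggregation
        if row != current_row:
            # A new row starts: emit the previous one if it passes the filter.
            if current_row is not None and shared >= min_entries:
                yield current_row, {"np2": first_count, "shared": shared}
            current_row, first_count, shared = row, value, 0
        shared += 1
    # Emit the final row, which has no successor to trigger the flush.
    if current_row is not None and shared >= min_entries:
        yield current_row, {"np2": first_count, "shared": shared}

# Hypothetical entries: (row, column_family, column_qualifier, value)
sample = sorted([
    ("s1", "count", "doc1", 4),
    ("s1", "count", "doc2", 7),
    ("s2", "count", "doc3", 2),
    ("s3", "count", "doc4", 9),
    ("s3", "count", "doc5", 1),
])

streamed = list(stream_aggregate(sample))
```

Rows that fail the criterion are dropped without ever being materialized, which is the point of the optimization.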

Yamini Joshi wrote:
Hi Dylan

This is what I'm trying to do:
#groupby id and create 2 new columns: np2 and shared
  query = {'$group': {'_id': '$student_id', 'np2': {'$first': '$count'},
'shared': {'$sum': 1}}}

The statement written above is one of the stages in a mongo aggregate
query. The results of all the stages are computed on the server side and
the final result returned to the user.

My problem is: I can't figure out 2 things:
1. How to add new columns while writing a Combiner/iterator
2. How to do group by (based on a condition since data in accumulo is
always stored in a group).


Best regards,
Yamini Joshi

On Sun, Sep 25, 2016 at 5:18 PM, Dylan Hutchison wrote:

    Hi Yamini,

    Could you further describe the computation you have in mind, for
    those of us not familiar with MongoDB's "Aggr" function?  You may
    want to look at Accumulo's built-in Combiner iterators
    <https://accumulo.apache.org/1.8/accumulo_user_manual#_combiners>.
    They seem more relevant than Filters.

    I don't know what you mean when you write that your output is not
    visible to "the complete Database".

    Regards, Dylan

    On Sun, Sep 25, 2016 at 11:34 AM, Yamini Joshi wrote:


        Hello everyone

        I wanted to know if there is any equivalent of Mongo Aggr
        queries in Accumulo. I have a complex query in the form of a Mongo
        aggregate (multi-staged) query. I'm trying to model the same in
        Accumulo. As of now, with the limited knowledge that I have, I
        have created a class extending Filter class. My question is:
        since my queries depend on a input, is there any other way of
        using the iterators/filters only for one query or change their
        input with every single query? As of now, my filter is getting
        attached to the table on 'SCAN' that means the output will be
        visible to the subsequent queries and not the complete Database.

        Best regards,
        Yamini Joshi


