You can use three different bolts in Storm, each grouping on a
different set of fields: bolt1 groups by (col1), bolt2 groups by
(col1, col2), and bolt3 groups by (col1, col2, col3).

bolt1 will give the top-level output you are expecting:
a1            {43,136}
a2            {52,77}
a4            {99,66}

bolt2 will give the second-level output:
a1
    --b1        {21,33}
    --b2        {22,103}
a2
    --b1        {30,44}
    --b3        {22,33}
a4
    --b4        {99,66}

The remaining individual rows (one per col1, col2, col3 combination)
will be the output of bolt3.

You cannot have dynamic grouping: the grouping fields of each bolt are
fixed when the topology is defined.
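
To make the three group-by levels concrete, here is a minimal sketch in
plain Java (not Storm code) of the aggregation each bolt would perform on
the sample rows from the original message. The class name RollupSketch and
the method rollup are hypothetical names chosen for this illustration; in a
real topology each call would correspond to one bolt fed through a fields
grouping on the same columns.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RollupSketch {
    // Sample rows from the original message: col1, col2, col3, val1, val2.
    static final List<String[]> ROWS = Arrays.asList(
        new String[]{"a1", "b1", "c1", "10", "20"},
        new String[]{"a1", "b1", "c2", "11", "13"},
        new String[]{"a1", "b2", "c1", "9", "15"},
        new String[]{"a1", "b2", "c3", "13", "88"},
        new String[]{"a2", "b1", "c1", "30", "44"},
        new String[]{"a2", "b3", "c2", "22", "33"},
        new String[]{"a4", "b4", "c4", "99", "66"});

    // Sums val1 and val2 per distinct combination of the given key columns.
    // keyCols holds column indexes (0 = col1, 1 = col2, 2 = col3); the map
    // key is the column values joined with '/', e.g. "a1/b2".
    static Map<String, long[]> rollup(List<String[]> rows, int... keyCols) {
        Map<String, long[]> sums = new LinkedHashMap<>();
        for (String[] row : rows) {
            StringBuilder key = new StringBuilder();
            for (int c : keyCols) {
                if (key.length() > 0) {
                    key.append('/');
                }
                key.append(row[c]);
            }
            long[] acc = sums.computeIfAbsent(key.toString(), k -> new long[2]);
            acc[0] += Long.parseLong(row[3]); // val1
            acc[1] += Long.parseLong(row[4]); // val2
        }
        return sums;
    }

    public static void main(String[] args) {
        // One pass per level, mirroring bolt1, bolt2 and bolt3.
        int[][] levels = {{0}, {0, 1}, {0, 1, 2}};
        for (int[] level : levels) {
            rollup(ROWS, level).forEach(
                (k, v) -> System.out.println(k + " " + Arrays.toString(v)));
        }
    }
}
```

Passing the column indexes in a different order, for example
rollup(ROWS, 2, 0), gives the col3->col1 level of a re-arranged hierarchy.
In this sketch that is just a different argument; in Storm it would mean a
separate set of bolts defined for that order.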




On 1/31/14, Kavi Kumar <[email protected]> wrote:
> I have rows in BigData DB (Cassandra in my case) with column names
> col1,col2,col3,val1,val2
>
> With a SQL approach I can group by col1,col2 or col2,col1 or any other
> combination. This way I can form a tree hierarchy easily.
>
> But now we are using Cassandra to store the data, which doesn't support
> group by. So we want to use Storm for the group by and aggregations.
> We wrote some sample code to do aggregation and group by, but we are
> unable to tell whether we can achieve it or not.
>
> Data looks like this
>
>
> col1,col2,col3,val1,val2
> ------------------------
> a1,b1,c1,10,20
> a1,b1,c2,11,13
> a1,b2,c1,9,15
> a1,b2,c3,13,88
> a2,b1,c1,30,44
> a2,b3,c2,22,33
> a4,b4,c4,99,66
>
>
> As in an Excel pivot table, I want to build a hierarchy
> root->child1->child2->child3-val1,val2. It may look like this if my
> hierarchy is col1->col2->col3:
>
>
> a1            {43,136}
>     --b1        {21,33}
>         --c1    10,20
>         --c2    11,13
>     --b2        {22,103}
>         --c1    9,15
>         --c3    13,88
> a2            {52,77}
>     --b1        {30,44}
>         --c1    30,44
>     --b3        {22,33}
>         --c2    22,33
> a4            {99,66}
>     --b4        {99,66}
>         --c4    99,66
>
>
> I want to let the user re-arrange the hierarchy elements, for example
> col3->col1->col2 (or any other order, chosen dynamically). In that case
> the data will look like this:
>
> c1            {49,79}
>     --a1        {19,35}
>         --b1    10,20
>         --b2    9,15
>     --a2        {30,44}
>         --b1    30,44
> c2            {33,46}
>     --a1        {11,13}
>         --b1    11,13
>     --a2        {22,33}
>         --b3    22,33
> c3            {13,88}
>     --a1        {13,88}
>         --b2    13,88
> c4            {99,66}
>     --a4        {99,66}
>         --b4    99,66
>
>
>
> A few lines of my Trident code look like this; it is not working as
> expected.
>
>
> topology.newStream("aggregation", spout).groupBy(new Fields(
> "col1","col2","col3","val1","val2")).aggregate(new Fields(
> "val1","val2"), new Sum(), new Fields("val1sum","val2sum"))
> .each(new Fields("col1","col2","col3","val1sum","val2sum"), new
> Utils.PrintFilter());
>
>
> For the above transformations I want to use Storm, with or without the
> Trident API.
> Can anyone guide me on how to achieve it? Any program ideas are very much
> appreciated.
>
