Re: extract dynamic number of tuples from bags

Dmitriy Ryaboy Thu, 14 Oct 2010 13:33:32 -0700

I see. I think you can flatten each row in data to un-nest it, so you will get
(["amenity","parking"],["name","Vuurtoren"]); then for each resulting row
call ToBag(*), getting ({["amenity","parking"],["name","Vuurtoren"]}); then
flatten *that*, getting one row per pair. Now you can group and count.
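
A rough Pig Latin sketch of the last part (flatten, group, count), assuming
the tags are already available as a bag of (key, value) tuples with a declared
schema. The file name tags.txt, its bag-literal layout, and the aliases pairs,
grouped and counts are made up for illustration and are not from Kim's script;
getting from the map value returned by PigJsonLoader to such a bag depends on
how that loader represents JSON arrays.

```pig
-- Hypothetical input: one bag literal per line, for example
--   {(amenity,parking),(name,Vuurtoren)}
data = LOAD 'tags.txt'
       AS (tags: bag{t: tuple(k: chararray, v: chararray)});

-- FLATTEN on a bag emits one output row per tuple in the bag,
-- so records with any number of tags become one row per pair.
pairs = FOREACH data GENERATE FLATTEN(tags) AS (k, v);

-- Group identical (key, value) pairs and count the rows in each group.
grouped = GROUP pairs BY (k, v);
counts  = FOREACH grouped GENERATE FLATTEN(group) AS (k, v),
                                   COUNT(pairs) AS n;

DUMP counts;
```

For the single record in this thread, that should give one row per distinct
pair, each with a count of 1.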


Haven't tried it, let me know how it goes.

-D

On Thu, Oct 14, 2010 at 12:09 PM, Kim Vogt <[email protected]> wrote:

> Hey Dmitriy,
>
> I tried to keep my example simple but maybe that doesn't work so here it
> goes.  I'm trying to do group/counts on the "tag" key/values in json data
> that looks like this:
>
> {
>   "type":"Feature",
>   "id":61561312,
>   "geometry":{
>      "type":"Polygon",
>      "coordinates":[
>         [
>            [
>               "53.18119",
>               "4.85247"
>            ],
>            [
>               "53.180908",
>               "4.8518934"
>            ],
>            [
>               "53.1807441",
>               "4.8520919"
>            ],
>            [
>               "53.181027",
>               "4.8526444"
>            ],
>            [
>               "53.18119",
>               "4.85247"
>            ]
>         ]
>      ]
>   },
>   "properties":{
>      "uid":26959,
>      "timestamp":"2010-06-09T12:25:02Z",
>      "changeset":4944796,
>      "user":"ttwimlex",
>      "version":1
>   },
>   "tags":[
>      [
>         "amenity",
>         "parking"
>      ],
>      [
>         "name",
>         "Vuurtoren"
>      ]
>   ]
> }
>
> So in the end, I want stats on how many unique key/value tag pairs I see.
> For just this one record, my output would be something like:
>
> (["amenity", "parking"], 1)
> (["name", "Vuurtoren"], 1)
>
> I load the data using my jsonLoader and grab just the tags, like this:
>
> data = LOAD 'data.txt' using PigJsonLoader as (json: map[]);
> data = FOREACH data GENERATE json#'tags';
> dump data;
>
> ([["amenity","parking"],["name","Vuurtoren"]])
>
> and then I get stuck, because there can be any number of these tags in each
> json record.  I thought if I could split them out into multiple bags, I
> could group and count.  Or maybe I'm missing something obvious :-)
>
> -Kim
>
> On Thu, Oct 14, 2010 at 11:33 AM, Dmitriy Ryaboy <[email protected]>
> wrote:
>
> > Kim,
> > You can't just flatten it? Not sure I am following the example right.
> >
> > -D
> >
> > On Thu, Oct 14, 2010 at 10:30 AM, Kim Vogt <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > > If I have bags that have a dynamic number of fields that look something
> > > > like this:
> > >
> > > ("park", "building", "office")
> > > ("store", "school")
> > > > ("building", "school", "restaurant", "hotel")
> > >
> > > > Is it possible to transform this into one tuple per bag so my data looks
> > > > like this and then I can do group bys and counts?  Maybe I can do this in
> > > > an eval udf?
> > >
> > > ("park")
> > > ("building")
> > > ("office")
> > > ("store")
> > > ...
> > >
> > >
> > > -Kim
> > >
> >
>
