You're right that this is difficult to do, but it's not impossible. Pig
does not HAVE to have a schema for things to work: it just helps. That
said, I think a map might be better suited to what you want, since that
way you can reference columns that may or may not be present. It's
definitely possible to do what you describe (just override outputSchema
and return null, then go to town); I just question whether or not it is a
good idea!
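
To make the map suggestion concrete, here is a minimal sketch of the
folding step such a UDF would perform, written in plain Java with no Pig
classes so it runs on its own. The class name BrowserPivot and the method
pivot are hypothetical, and the rows are modelled as Map.Entry pairs
rather than Pig tuples. In a real UDF this logic would presumably sit
inside exec(), iterating over the bag you pass in.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Pig: illustrates the "use a map" idea only.
class BrowserPivot {

    // Folds (browser, total) rows into a single map keyed by browser name.
    // If a browser appears more than once, its totals are summed.
    static Map<String, Long> pivot(List<Map.Entry<String, Long>> rows) {
        // LinkedHashMap keeps the browsers in first-seen order.
        Map<String, Long> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> row : rows) {
            result.merge(row.getKey(), row.getValue(), Long::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample rows taken from the question.
        List<Map.Entry<String, Long>> rows = List.of(
                Map.entry("firefox", 1234L),
                Map.entry("ie", 123L),
                Map.entry("chrome", 321L),
                Map.entry("ipad", 437L));

        Map<String, Long> pivoted = pivot(rows);

        // A browser that is present is looked up by key...
        System.out.println(pivoted.get("firefox"));
        // ...and one that is absent can fall back to a default.
        System.out.println(pivoted.getOrDefault("opera", 0L));
    }
}
```

The point of the map is that downstream code looks browsers up by key
instead of relying on a fixed set of columns, so a browser that never
shows up in the data is just a missing key rather than a schema problem.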

2012/3/19 Eli Finkelshteyn <[email protected]>

> Hi,
> I have a relation set of browsers and number of people using each of the
> form:
>
> _browser_, _total_
> firefox,1234
> ie,123
> chrome,321
> ipad,437
>
> Is there any good way I can rotate this, so that the first row dynamically
> generates columns and I wind up with a result like:
>
> _firefox_, _ie_, _chrome_, _ipad_
> 1234, 123, 321, 437
>
> The basic Pig I'm using to load what I have so far is along the lines of:
>
>        good = FILTER new BY (browser_identity IS NOT NULL)
>                AND (browser_version IS NOT NULL)
>                AND (ip_address IS NOT NULL);
>        distincted = DISTINCT good;
>        distincted = FOREACH distincted GENERATE browser_identity,
>                browser_version;
>        grouped = GROUP distincted BY (browser_identity, browser_version);
>        counted = FOREACH grouped GENERATE
>                group AS colname, COUNT(distincted) AS total;
>
> In case that helps.
>
> I was thinking of writing a udf for this, but figured the output schema
> would be really annoying to deal with, so I'd ask here first in case
> there's an easier way, or someone had already done it.
>
> Cheers,
> Eli
>
