Re: Best Practice: store depending on data content

Markus Resch Thu, 28 Jun 2012 02:18:18 -0700

Thanks Thejas,

This _really_ helped a lot :)
An additional question on this:
As far as I can see, MultiStorage is currently only capable of writing CSV
output, right? Is there any ongoing effort to make this storage more
generic regarding the output format? For our needs we would require Avro
output as well as a special proprietary binary encoding, for which we have
already written our own storage. I'm thinking of a storage that selects a
writer depending on the file name's extension.
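
The extension-based selection described above could look roughly like the following plain-Java sketch. It is not a Pig StoreFunc and not an existing piggybank API: `ExtensionDispatch`, `FormatWriter`, `register`, and `select` are hypothetical names, and the CSV/TSV writers are only stand-ins for Avro and the proprietary binary encoder.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ExtensionDispatch {

    /** Hypothetical writer abstraction: turns one record into its encoded form. */
    interface FormatWriter {
        String encode(List<String> fields);
    }

    // Registry from lower-case file extension (without the dot) to its writer.
    private final Map<String, FormatWriter> writers = new HashMap<>();

    void register(String extension, FormatWriter writer) {
        writers.put(extension.toLowerCase(Locale.ROOT), writer);
    }

    /** Chooses a writer from the extension of the last path component. */
    FormatWriter select(String path) {
        String name = path.substring(path.lastIndexOf('/') + 1);
        int dot = name.lastIndexOf('.');
        if (dot < 0 || dot == name.length() - 1) {
            throw new IllegalArgumentException("No file extension in: " + path);
        }
        String ext = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        FormatWriter writer = writers.get(ext);
        if (writer == null) {
            throw new IllegalArgumentException("No writer registered for ." + ext);
        }
        return writer;
    }

    public static void main(String[] args) {
        ExtensionDispatch dispatch = new ExtensionDispatch();
        // Stand-in writers; real code would register Avro and the proprietary encoder here.
        dispatch.register("csv", fields -> String.join(",", fields));
        dispatch.register("tsv", fields -> String.join("\t", fields));

        List<String> record = List.of("key1", "42");
        System.out.println(dispatch.select("/my/output/Result.csv").encode(record));
        System.out.println(dispatch.select("/my/output/Result.TSV").encode(record));
    }
}
```

Only the last path component is inspected, so a dot in a directory name (e.g. `/data.v2/Result`) is not mistaken for an extension. In a real storage the lookup would presumably happen once per output file, with each registered entry wrapping an actual output format rather than building strings.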


Do you know of such efforts?

Thanks

Markus


On Friday, 22.06.2012, 11:23 -0700, Thejas Nair wrote:
> You can use the MultiStorage store func:
> http://pig.apache.org/docs/r0.9.1/api/org/apache/pig/piggybank/storage/MultiStorage.html
> 
> Or, if you want something more flexible that also keeps metadata, use
> HCatalog. Declare the keys you want to split on as the table's
> partition keys, then use HCatStorer() to store the data.
> See http://incubator.apache.org/hcatalog/docs/r0.4.0/index.html
> 
> Thanks,
> Thejas
> 
> 
> 
> On 6/22/12 4:54 AM, Markus Resch wrote:
> > Hey everyone,
> >
> > We're doing some aggregation. The result contains a key, and we want
> > a separate output file for each key value. Is it possible to store files
> > like this, in particular with the output path derived from the key's value?
> >
> > Example:
> > Input = LOAD 'my/data.avro' USING AvroStorage;
> > [.... doing stuff....]
> > Output = GROUP AggregatesValues BY Key;
> > FOREACH Output Store * into
> > '/my/output/path/by/$Output.Key/Result.avro'
> >
> > I know this example does not work, but is anything similar
> > possible? And if not, as I assume, is there some framework in the Hadoop
> > world that can do this?
> >
> >
> > Thanks
> >
> > Markus
> >
> >
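
The per-key split that MultiStorage provides (and that HCatStorer provides through partition keys) can be pictured with a plain-Java sketch. It is not either storer's implementation: `PartitionByKey`, `Row`, `outputPathFor`, and `partition` are hypothetical names, and the sketch only computes which rows would land under which `parent/<key>/Result.avro` path instead of writing any files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartitionByKey {

    /** One aggregated row; its key decides which directory it belongs to. */
    static final class Row {
        final String key;
        final long value;

        Row(String key, long value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Builds parent/<key>/<fileName>, mirroring the path template in the question. */
    static String outputPathFor(String parent, String key, String fileName) {
        if (key.isEmpty() || key.contains("/")) {
            // An empty key or a '/' inside the key would break the one-directory-per-key layout.
            throw new IllegalArgumentException("Key not usable as a directory name: " + key);
        }
        return parent + "/" + key + "/" + fileName;
    }

    /** Groups rows by target path; TreeMap keeps the paths sorted for stable output. */
    static Map<String, List<Row>> partition(List<Row> rows, String parent, String fileName) {
        Map<String, List<Row>> byPath = new TreeMap<>();
        for (Row row : rows) {
            String path = outputPathFor(parent, row.key, fileName);
            byPath.computeIfAbsent(path, p -> new ArrayList<>()).add(row);
        }
        return byPath;
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row("de", 3), new Row("us", 5), new Row("de", 7));
        Map<String, List<Row>> files = partition(rows, "/my/output/path/by", "Result.avro");
        files.forEach((path, group) -> System.out.println(path + ": " + group.size() + " row(s)"));
    }
}
```

Rejecting keys that contain '/' is a choice made for this sketch; a real job might escape such values instead.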