Hey Chao,

I wrote one for Cloudera ML. Here's the Source:

https://github.com/cloudera/ml/blob/master/hcatalog/src/main/java/com/cloudera/science/ml/hcatalog/HCatalogSource.java

Couple of caveats:

1) I developed it against HCat 0.4; I think there are newer versions
now.
2) I wrote the Source by hand because we didn't have support for providing
extra conf info on FileSourceImpl at the time.
3) I was using my own custom Record interface that wrapped HCatalog
records, Avro records, and CSV files, so that's the type of data the
Source provides.

There are also a bunch of Hive utilities I wrote that I found useful for
working with Hive tables:

https://github.com/cloudera/ml/blob/master/hcatalog/src/main/java/com/cloudera/science/ml/hcatalog/HCatalog.java

My opinion at the time was that writing to an HCat output was sort of a
pain unless the table was already defined and you were just creating a new
partition of it. That didn't really apply to my use case, so I would just
write my regular outputs and then call the Hive APIs to create a table
around them.
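
To make the "write first, then wrap a table around it" idea concrete,
here is a minimal sketch. It is not the code from the HCatalog.java
utilities linked above. Instead of calling the Hive metastore APIs, it
only builds a HiveQL CREATE EXTERNAL TABLE statement that points at a
directory the job has already written. The class name, method name, table
name, columns, and path are all made up for illustration, and it assumes
the output is comma-delimited text.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: builds DDL for an external Hive table over
// files that a job has already written to `location`.
class ExternalTableDdl {

    // `columns` maps column name -> Hive type, in declaration order.
    // Assumes comma-delimited text output; adjust the storage clauses
    // for other formats.
    static String createExternalTable(String table,
                                      Map<String, String> columns,
                                      String location) {
        StringJoiner cols = new StringJoiner(", ", "(", ")");
        for (Map.Entry<String, String> e : columns.entrySet()) {
            cols.add(e.getKey() + " " + e.getValue());
        }
        return "CREATE EXTERNAL TABLE IF NOT EXISTS " + table + " " + cols
                + " ROW FORMAT DELIMITED FIELDS TERMINATED BY ','"
                + " STORED AS TEXTFILE"
                + " LOCATION '" + location + "'";
    }

    public static void main(String[] args) {
        // Illustrative schema and path only.
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("user_id", "BIGINT");
        columns.put("score", "DOUBLE");
        System.out.println(
                createExternalTable("scores", columns, "/data/out/scores"));
    }
}
```

You would run the resulting statement through whatever Hive client you
already use once the job finishes. Because the table is external, dropping
it later leaves the output files in place.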

Hope that helps-- good luck!

J


On Thu, Jan 23, 2014 at 5:35 AM, Chao Shi <[email protected]> wrote:

> Hi all,
>
> One of our recent projects needs to read from and write to HCatalog. We
> currently use raw MR with HCatInputFormat/HCatOutputFormat shipped with
> HCatalog. Does anyone know if there is already a Crunch wrapper for it?
>
> Thanks,
> Chao
>
