Hey Chao, I wrote one for Cloudera ML. Here's the Source:
https://github.com/cloudera/ml/blob/master/hcatalog/src/main/java/com/cloudera/science/ml/hcatalog/HCatalogSource.java

A couple of caveats:

1) I developed it against HCat 0.4; I think there are more recent versions now.
2) I wrote the Source by hand because we didn't have support for providing extra conf info on FileSourceImpl at the time.
3) I was using my own custom Record interface that wrapped HCatalog records, Avro records, and CSV files, so that's the type of data the Source provides.

There are also a bunch of Hive utilities I wrote that I found useful for working with Hive tables:
https://github.com/cloudera/ml/blob/master/hcatalog/src/main/java/com/cloudera/science/ml/hcatalog/HCatalog.java

My opinion at the time was that writing to an HCat output was sort of a pain unless the table was already defined and you were just creating a new partition of it. That didn't really apply to my use case, so I would just write my regular outputs and then call the Hive APIs to create a table around them.

Hope that helps. Good luck!

J

On Thu, Jan 23, 2014 at 5:35 AM, Chao Shi <[email protected]> wrote:
> Hi all,
>
> One of our recent projects needs to read and write from/to HCatalog. We
> currently use raw MR with HCatInputFormat/HCatOutputFormat shipped with
> HCatalog. Does anyone know if there is already a Crunch wrapper for it?
>
> Thanks,
> Chao
