HFileOutputFormatForCrunch with spark pipeline

Surbhi Mungre Wed, 12 Aug 2015 10:47:40 -0700

I am converting an MRPipeline to a SparkPipeline by following these[1]
instructions. My SparkPipeline fails with this[2] exception. The pipeline
writes to HBase using HFiles. IIUC, the M/R job that creates the HFiles uses
a custom partitioner, and I am not sure how Crunch translates that to Spark.
From the exception stack trace, it looks like Spark is using the M/R
partitioner. I am completely new to Spark, but I think I will have to create
a custom Spark partitioner and use it instead. When converting an MRPipeline
to a SparkPipeline, does Crunch handle an M/R job that uses a custom
partitioner?
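
To make the question concrete, here is a minimal sketch of the kind of
partitioning I assume the HFile job depends on: each row key is routed to a
partition according to sorted region split points, so that every output file
covers one key range. This is plain Java with no Spark or Crunch dependency.
The class name RegionRangePartitioner and the split points are hypothetical,
and this is not taken from the Crunch or HBase source. A real Spark version
would presumably have to extend org.apache.spark.Partitioner and expose the
same two methods.

```java
import java.util.Arrays;

// Hypothetical sketch: range-partition row keys by sorted split points.
// Partition i receives keys k with splitPoints[i-1] <= k < splitPoints[i].
final class RegionRangePartitioner {
    private final byte[][] splitPoints; // must be sorted ascending

    RegionRangePartitioner(byte[][] sortedSplitPoints) {
        // Shallow copy so later changes to the caller's outer array are not seen.
        this.splitPoints = Arrays.copyOf(sortedSplitPoints, sortedSplitPoints.length);
    }

    // N split points divide the key space into N + 1 ranges.
    int numPartitions() {
        return splitPoints.length + 1;
    }

    // Binary search for the number of split points that are <= rowKey.
    int getPartition(byte[] rowKey) {
        int lo = 0;
        int hi = splitPoints.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (compareUnsigned(rowKey, splitPoints[mid]) >= 0) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    // Lexicographic comparison treating each byte as unsigned (0..255).
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        return a.length - b.length;
    }
}
```

With split points "g" and "p", for example, this gives three partitions, and
a key equal to a split point goes to the partition that starts at that point.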



[1]
http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_crunch_with_spark.html

[2] https://gist.github.com/anonymous/920c000f20229eaa76d8

Thanks,
Surbhi
