Hey Surbhi, I think it's just a bug -- Crunch-on-Spark should be handling the partitioner stuff correctly w/o requiring you to write your own. I think the problem is that we set the location of the partition file (the one that the code is mad it can't find in your gist) inside of the GroupingOptions class, but we're not updating the Configuration object that the Spark job is going to use w/the location of that file in the same way we do on MapReduce. I'll file a bug for it and see if I can't come up w/a fix and unit test tomorrow.
Thanks!
Josh

On Wed, Aug 12, 2015 at 10:45 AM, Surbhi Mungre <[email protected]> wrote:

> I am converting a MRPipeline to SparkPipeline with these[1] instructions.
> My SparkPipeline fails with this[2] exception. In my pipeline I am trying
> to write to HBase using HFiles. IIUC the M/R job which creates HFiles uses a
> custom partitioner. I am not sure how Crunch translates this to Spark. From
> the exception stack trace it looks like Spark is using the M/R partitioner. I
> am completely new to Spark but I think I will have to create a custom Spark
> partitioner and use it instead. When I am converting a MRPipeline to
> SparkPipeline, if a M/R job uses a custom partitioner will Crunch handle it?
>
> [1] http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_crunch_with_spark.html
>
> [2] https://gist.github.com/anonymous/920c000f20229eaa76d8
>
> Thanks,
> Surbhi

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
