Tracking here: https://issues.apache.org/jira/browse/CRUNCH-556
On Wed, Aug 12, 2015 at 8:10 PM, Josh Wills <[email protected]> wrote:

> Hey Surbhi,
>
> I think it's just a bug-- Crunch-on-Spark should be handling the
> partitioner stuff correctly w/o requiring you to write your own. I think
> the problem is that we set the location of the partition file (the one that
> the code is mad that it can't find in your gist) inside of the
> GroupingOptions class, and we're not updating the Configuration object that
> the Spark job is going to use w/the location of that file in the same way
> we do on MapReduce. I'll file a bug for it and see if I can't come up w/a
> fix and unit test tomorrow.
>
> Thanks!
> Josh
>
> On Wed, Aug 12, 2015 at 10:45 AM, Surbhi Mungre <[email protected]>
> wrote:
>
>> I am converting a MRPipeline to SparkPipeline with these[1] instructions.
>> My SparkPipeline fails with this[2] exception. In my pipeline I am trying
>> to write to HBase using HFiles. IIUC M/R job which creates HFiles uses a
>> custom partitioner. I am not sure how Crunch translates this to Spark. From
>> the exception stack trace it looks like Spark is using M/R partitioner. I
>> am completely new to Spark but I think I will have to create a custom spark
>> partitioner and use it instead. When I am converting a MRPipeline to
>> SparkPipeline, if a M/R job uses custom partitioner will Crunch handle it?
>>
>>
>> [1]
>> http://www.cloudera.com/content/cloudera/en/documentation/core/latest/topics/cdh_ig_running_crunch_with_spark.html
>>
>> [2] https://gist.github.com/anonymous/920c000f20229eaa76d8
>>
>> Thanks,
>> Surbhi
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>

--
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
