It would be really helpful if someone could point me to resources showing how sequences are used in MapReduce, so that I can try to replicate that in Spark.
Thank you James and Josh for your answers.

On 17-Aug-2015 8:25 pm, "Josh Mahonin" <[email protected]> wrote:

> Oh, neat! I was looking for some references to it in code, unit tests and
> docs and didn't see anything relevant.
>
> It's possible they might "just work" then, although it's definitely an
> untested scenario.
>
> On Mon, Aug 17, 2015 at 10:48 AM, James Taylor <[email protected]>
> wrote:
>
>> Sequences are supported by MR integration, but I'm not sure if their
>> usage by the Spark integration would cause any issues.
>>
>>
>> On Monday, August 17, 2015, Josh Mahonin <[email protected]> wrote:
>>
>>> Hi Satya,
>>>
>>> I don't believe sequences are supported by the broader Phoenix
>>> map-reduce integration, which the phoenix-spark module uses under the hood.
>>>
>>> One workaround that would give you sequential IDs, is to use the
>>> 'zipWithIndex' method on the underlying Spark RDD, with a small 'map()'
>>> operation to unpack / reorganize the tuple, before saving it to Phoenix.
>>>
>>> Good luck!
>>>
>>> Josh
>>>
>>> On Sat, Aug 15, 2015 at 10:02 AM, Ns G <[email protected]> wrote:
>>>
>>>> Hi All,
>>>>
>>>> I hope that someone will reply to this email as all my previous emails
>>>> have been unanswered.
>>>>
>>>> I have 10-20 Million records in file and I want to insert it through
>>>> Phoenix-Spark.
>>>> The table primary id is generated by a sequence. So, every time an
>>>> upsert is done, the sequence Id gets generated.
>>>>
>>>> Now I want to implement this in Spark and more precisely using data
>>>> frames. Since RDDs are immutables, How can I add sequence to the rows in
>>>> dataframe?
>>>>
>>>> Thanks for any help or direction or suggestion.
>>>>
>>>> Satya
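
For reference, the 'zipWithIndex' workaround Josh describes in the quoted thread could look roughly like the sketch below. It is a plain-Python model of the idea, not tested phoenix-spark code: the helper names (zip_with_index, with_sequential_ids), the sample records, and the start offset are all hypothetical and chosen only for illustration. The one Spark fact it relies on is that RDD.zipWithIndex() pairs each element with a 0-based index as (element, index).

```python
# Plain-Python model of the zipWithIndex workaround; runs without Spark.
# All names and sample data here are hypothetical, for illustration only.

def zip_with_index(rows):
    """Mimic RDD.zipWithIndex(): yield (row, index) with a 0-based index."""
    for index, row in enumerate(rows):
        yield row, index


def with_sequential_ids(rows, start=1):
    """Unpack each (row, index) pair and put the generated ID first.

    `start` is an assumed offset so that IDs do not have to begin at 0.
    """
    for row, index in zip_with_index(rows):
        yield (start + index, *row)


records = [("alice", 30), ("bob", 25), ("carol", 41)]
rows_with_ids = list(with_sequential_ids(records, start=100))

# On a real PySpark RDD the same reshaping would be roughly:
#   rdd.zipWithIndex().map(lambda pair: (start + pair[1],) + tuple(pair[0]))
# followed by whatever save call the Phoenix integration provides.
```

Note that IDs produced this way are independent of any Phoenix sequence, so a later load would presumably need a start offset above the highest ID already in the table.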
