Re: REG: Using Sequences in Phoenix Data Frame

Ns G Mon, 17 Aug 2015 09:12:10 -0700

It would be really helpful if someone could provide links to resources
where sequences are used in MapReduce, which I will try to replicate in
Spark.


Thank you, James and Josh, for your answers.
On 17-Aug-2015 8:25 pm, "Josh Mahonin" wrote:

> Oh, neat! I was looking for some references to it in code, unit tests and
> docs and didn't see anything relevant.
>
> It's possible they might "just work" then, although it's definitely an
> untested scenario.
>
> On Mon, Aug 17, 2015 at 10:48 AM, James Taylor wrote:
>
>> Sequences are supported by MR integration, but I'm not sure if their
>> usage by the Spark integration would cause any issues.
>>
>>
>> On Monday, August 17, 2015, Josh Mahonin wrote:
>>
>>> Hi Satya,
>>>
>>> I don't believe sequences are supported by the broader Phoenix
>>> map-reduce integration, which the phoenix-spark module uses under the hood.
>>>
>>> One workaround that would give you sequential IDs is to use the
>>> 'zipWithIndex' method on the underlying Spark RDD, with a small 'map()'
>>> operation to unpack / reorganize the tuple, before saving it to Phoenix.
>>>
>>> Good luck!
>>>
>>> Josh
>>>
>>> On Sat, Aug 15, 2015 at 10:02 AM, Ns G wrote:
>>>
>>>> Hi All,
>>>>
>>>> I hope that someone will reply to this email as all my previous emails
>>>> have been unanswered.
>>>>
>>>> I have 10-20 million records in a file and I want to insert them
>>>> through Phoenix-Spark.
>>>> The table's primary ID is generated by a sequence, so every time an
>>>> upsert is done, a new sequence ID gets generated.
>>>>
>>>> Now I want to implement this in Spark, and more precisely using data
>>>> frames. Since RDDs are immutable, how can I add a sequence to the rows
>>>> in a data frame?
>>>>
>>>> Thanks for any help or direction or suggestion.
>>>>
>>>> Satya
>>>>
>>>
>>>
>
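
The 'zipWithIndex' workaround Josh describes in the quoted reply can be
sketched roughly as below. This is only an illustration, not code from
the phoenix-spark module: the helper names 'attach_id' and
'with_sequential_ids', the 'start' parameter, and the choice to put the
ID in the first column are all assumptions. It also assumes each record
in the RDD is a tuple (or a Row, which behaves like one).

```python
# Sketch only: `rdd` is assumed to be a Spark RDD of tuples or Rows.
# The helper names and the `start` parameter are hypothetical.

def attach_id(pair, start=1):
    """Turn a (row, index) pair from zipWithIndex into (id, *row_fields)."""
    row, index = pair
    # zipWithIndex numbers elements from 0; shift so IDs begin at `start`.
    return (index + start,) + tuple(row)


def with_sequential_ids(rdd, start=1):
    """Return a new RDD whose records carry a leading sequential ID.

    RDDs are immutable, so nothing is modified in place: zipWithIndex
    and map each produce a new RDD.
    """
    return rdd.zipWithIndex().map(lambda pair: attach_id(pair, start))
```

For a data frame 'df', something like 'with_sequential_ids(df.rdd)'
followed by rebuilding a data frame with the extra ID column would
probably fit just before the save to Phoenix. Note that these IDs come
from Spark, not from the Phoenix sequence, so 'start' would need to be
chosen to avoid colliding with IDs already in the table.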
