Re: Replicating a row n times

2017-09-29 Thread Weichen Xu
I suggest using `monotonicallyIncreasingId`, which is highly efficient.
But note that the IDs it generates will not be consecutive.
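
To illustrate why the IDs are not consecutive, here is a rough sketch in plain Java (no Spark needed). It assumes the layout described in the Spark API docs for this function: the partition ID goes in the upper 31 bits and the record number within the partition in the lower 33 bits. The class and method names are made up for this example and are not part of Spark. Also note that in Spark 2.x the function is spelled `monotonically_increasing_id()`; the camelCase name is deprecated.

```java
public class MonotonicIdSketch {
    // Hypothetical helper that mirrors the documented bit layout:
    // partition ID in the upper 31 bits, record number in the lower 33 bits.
    static long idFor(int partitionId, long recordNumber) {
        return ((long) partitionId << 33) + recordNumber;
    }

    public static void main(String[] args) {
        // Rows inside one partition get consecutive IDs...
        System.out.println(idFor(0, 0)); // 0
        System.out.println(idFor(0, 1)); // 1
        // ...but the first row of the next partition jumps by 2^33.
        System.out.println(idFor(1, 0)); // 8589934592
    }
}
```

So the IDs are unique and increasing, but gaps appear wherever a partition boundary is crossed.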



Re: Replicating a row n times

2017-09-29 Thread Kanagha Kumar
Thanks for the response.
I can use either row_number() or monotonicallyIncreasingId to generate
unique IDs, as in
https://hadoopist.wordpress.com/2016/05/24/generate-unique-ids-for-each-rows-in-a-spark-dataframe/

I'm looking for a Java example that uses one of these to replicate a single
row n times, either by appending a rownum column generated as above or by
using the explode function.

Ex:

ds.withColumn("ROWNUM", org.apache.spark.sql.functions.explode(columnEx));

columnEx needs to be of array type in order for explode to work.
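
For illustration, one way such an array column might be built in Java is sketched below. It assumes Spark 2.x with the spark-sql artifact on the classpath; the class name `ReplicateRow`, the helper `replicate`, and the sample column names are made up for this example. It mirrors the Scala snippet from the Stack Overflow reference, but keeps the exploded value as the ROWNUM column instead of dropping it.

```java
import org.apache.spark.sql.Column;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.array;
import static org.apache.spark.sql.functions.explode;
import static org.apache.spark.sql.functions.lit;

public class ReplicateRow {
    // Hypothetical helper: returns n copies of every row in ds,
    // each copy tagged with ROWNUM = 1..n.
    static Dataset<Row> replicate(Dataset<Row> ds, int n) {
        Column[] indices = new Column[n];
        for (int i = 0; i < n; i++) {
            indices[i] = lit(i + 1); // literal column holding the value i + 1
        }
        // array(...) packs the literals into one array-typed column;
        // explode(...) then emits one output row per array element.
        return ds.withColumn("ROWNUM", explode(array(indices)));
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("ReplicateRow")
                .master("local[*]") // local mode, for trying the sketch out
                .getOrCreate();

        // A one-row dataset standing in for the real single-row input.
        Dataset<Row> single = spark.range(1).toDF("id")
                .withColumn("name", lit("example"));

        replicate(single, 5).show(); // five rows, ROWNUM running from 1 to 5
        spark.stop();
    }
}
```

Since ROWNUM is different for each copy, it could serve as (or be combined into) the primary key.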

Any suggestions are helpful.
Thanks




Re: Replicating a row n times

2017-09-28 Thread ayan guha
How about using the row number as the primary key?

Select row_number() over (), * from table

On Fri, 29 Sep 2017 at 10:21 am, Kanagha Kumar 
wrote:

> Hi,
>
> I'm trying to replicate a single row from a dataset n times and create a
> new dataset from it. While replicating, though, I need one column's value
> to change for each copy, since it will end up as the primary key when the
> data is finally stored.
>
> Looked at the following reference:
> https://stackoverflow.com/questions/40397740/replicate-spark-row-n-times
>
> import org.apache.spark.sql.functions._
> val result = singleRowDF
>   .withColumn("dummy", explode(array((1 until 100).map(lit): _*)))
>   .selectExpr(singleRowDF.columns: _*)
>
> How can I create a column from an array of values in Java and pass it to
> the explode function? Any suggestions would be helpful.
>
>
> Thanks
> Kanagha
>
-- 
Best Regards,
Ayan Guha