Re: add an auto_increment column

2022-02-08 Thread capitnfrakass

I got my answer from Mich's reply. Thank you both.

frakass



Re: add an auto_increment column

2022-02-08 Thread Gourav Sengupta
Hi,

So, do you want to rank apple and tomato both as 2? The use case is not
quite clear to me.

Regards,
Gourav Sengupta



Re: add an auto_increment column

2022-02-08 Thread Bitfox
Maybe the col() function is not even needed here. :)

>>> df.select(F.dense_rank().over(wOrder).alias("rank"),
...           "fruit", "amount").show()
+----+------+------+
|rank| fruit|amount|
+----+------+------+
|   1|cherry|     5|
|   2| apple|     3|
|   2|tomato|     3|
|   3|orange|     2|
+----+------+------+






Re: add an auto_increment column

2022-02-07 Thread Mich Talebzadeh
Simple: use either rank() or dense_rank().

>>> from pyspark.sql import functions as F
>>> from pyspark.sql.functions import col
>>> from pyspark.sql.window import Window
>>> wOrder = Window().orderBy(df['amount'].desc())
>>> df.select(F.rank().over(wOrder).alias("rank"), col('fruit'),
...           col('amount')).show()
+----+------+------+
|rank| fruit|amount|
+----+------+------+
|   1|cherry|     5|
|   2| apple|     3|
|   2|tomato|     3|
|   4|orange|     2|
+----+------+------+

>>> df.select(F.dense_rank().over(wOrder).alias("rank"), col('fruit'),
...           col('amount')).show()
+----+------+------+
|rank| fruit|amount|
+----+------+------+
|   1|cherry|     5|
|   2| apple|     3|
|   2|tomato|     3|
|   3|orange|     2|
+----+------+------+

HTH





*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.






Re: add an auto_increment column

2022-02-07 Thread Stelios Philippou
https://stackoverflow.com/a/51854022/299676



Re: add an auto_increment column

2022-02-07 Thread Stelios Philippou
This has the information that you require in order to add an extra column
with a sequence to it.




Re: add an auto_increment column

2022-02-07 Thread capitnfrakass



Hello Gourav,

As you can see here, orderBy already gives a result for the "equal
amount" case:


>>> df = sc.parallelize([("orange",2),("apple",3),("tomato",3),("cherry",5)]).toDF(['fruit','amount'])
>>> df.select("*").orderBy("amount", ascending=False).show()
+------+------+
| fruit|amount|
+------+------+
|cherry|     5|
| apple|     3|
|tomato|     3|
|orange|     2|
+------+------+


I want to add a column on the right, named "top", whose value
auto-increments from 1 to N.


Thank you.






Re: add an auto_increment column

2022-02-07 Thread Gourav Sengupta
Hi,

Sorry, once again I will try to understand the problem first :)

As we can clearly see, the initial responses incorrectly guessed the
solution to be the monotonically_increasing_id function.

What if there are two fruits with an equal amount? For any real-life
application, can we understand what you are trying to achieve with the rankings?



Regards,
Gourav Sengupta



Re: add an auto_increment column

2022-02-07 Thread ayan guha
For this requirement you can use rank or dense_rank.



Re: add an auto_increment column

2022-02-07 Thread capitnfrakass

Hello,

For this query:


>>> df.select("*").orderBy("amount", ascending=False).show()
+------+------+
| fruit|amount|
+------+------+
|tomato|     9|
| apple|     6|
|cherry|     5|
|orange|     3|
+------+------+


I want to add a column "top" whose value is 1, 2, 3, ..., meaning
top 1, top 2, top 3, ...


How can I do it?

Thanks.






Re: add an auto_increment column

2022-02-07 Thread Gourav Sengupta
Hi,

can we understand the requirement first?

What are you trying to achieve with an auto-increment id? Do you just want
distinct IDs for rows, do you want to keep track of the record count of a
table as well, or do you want to use them for surrogate keys? And if you
insert records into a table multiple times, should they still get different
values?

I think that without knowing the requirements, all the above responses, like
everything else where solutions are reached before understanding the
problem, have a high chance of being wrong.


Regards,
Gourav Sengupta



Re: add an auto_increment column

2022-02-06 Thread Siva Samraj
monotonically_increasing_id() will give the same functionality.

>


Re: add an auto_increment column

2022-02-06 Thread ayan guha
Try this:
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.functions.monotonically_increasing_id.html



On Mon, 7 Feb 2022 at 12:27 pm,  wrote:

> For a dataframe object, how do I add a column that auto-increments, like
> MySQL's behavior?
>
> Thank you.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>
--
Best Regards,
Ayan Guha