[jira] [Comment Edited] (SPARK-23291) SparkR : substr : In SparkR dataframe , starting and ending position arguments in "substr" is giving wrong result when the position is greater than 1

2018-05-06 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465356#comment-16465356
 ] 

Hyukjin Kwon edited comment on SPARK-23291 at 5/7/18 2:04 AM:
--

[~felixcheung], sure, I agree with that in general. However, we could probably 
also look at this specific case another way:

In other words, it has been wrong for 3 years, and it forces awkward code in R 
compared with the other language APIs. IMHO, the difference is subtle, and 
users may have adapted to the bug rather than bothered to report it (with some 
annoyance, I would guess). Note that expr("substr(...)") and substr behave 
differently; I have also seen [expr("substr(...)") suggested as an alternative 
to 
substr|https://stackoverflow.com/questions/37413122/use-of-substr-on-dataframe-column-in-sparkr?rq=1].
  If the change is clearly documented in the migration guide, I think it is fine.
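
As a quick illustration, here is a sketch of the difference (my own example, 
not from the report: it assumes a running SparkR 2.x session from before the 
fix, and the one-row DataFrame is only for demonstration):

 library(SparkR)
 sparkR.session()   # assumes a local Spark installation is available

 # One-row DataFrame for demonstration only
 df <- createDataFrame(data.frame(col1 = "2017-12-01", stringsAsFactors = FALSE))

 # SparkR's substr(column, start, stop): before the fix, both positions have to
 # be shifted by one to pull "12" out of "2017-12-01".
 head(withColumn(df, "col2", substr(df$col1, 7, 8)))

 # SQL substr via expr() takes (pos, len) and already uses the natural 1-based
 # position, so position 6 with length 2 gives the same "12".
 head(withColumn(df, "col2", expr("substr(col1, 6, 2)")))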

Also, this substr case is pretty well understood and isolated.

As a reference, I recall a similar case - 
https://github.com/apache/spark/pull/20499#issuecomment-363863660. I was 
hesitant at that time too, but after thinking it over for a while I ended up 
agreeing that the backport was okay. That one wasn't a regression either.





> SparkR : substr : In SparkR dataframe , starting and ending position 
> arguments in "substr" is giving wrong result  when the position is greater 
> than 1
> --
>
> Key: SPARK-23291
> URL: https://issues.apache.org/jira/browse/SPARK-23291
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.1.2, 2.2.0, 2.2.1, 2.3.0
>Reporter: Narendra
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 2.4.0
>
>
> Defect Description :
> -
> For example, an input string "2017-12-01" is read into a SparkR DataFrame 
> "df" with a column named "col1".
>  The goal is to create a new column named "col2" with the value "12" that 
> sits inside the string. "12" can be extracted with a "starting position" of 
> "6" and an "ending position" of "7"
>  (the starting position of the first character is "1").
> However, the code that currently has to be written is :
>  
>  df <- withColumn(df, "col2", substr(df$col1, 7, 8))
> Observe that the first argument of "substr", which indicates the 
> 'starting position', has to be given as "7".
>  Also observe that the second argument of "substr", which indicates 
> the 'ending position', has to be given as "8".
> i.e. the number passed to indicate a position has to be the 
> "actual position + 1".
> Expected behavior :
> 
> The code that should be needed is :
>  
>  df <- withColumn(df, "col2", substr(df$col1, 6, 7))
> Note :
> ---
>  This defect is observed only when the starting position is greater than 
> 1.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-23291) SparkR : substr : In SparkR dataframe , starting and ending position arguments in "substr" is giving wrong result when the position is greater than 1

2018-05-06 Thread Felix Cheung (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-23291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465307#comment-16465307
 ] 

Felix Cheung edited comment on SPARK-23291 at 5/6/18 10:38 PM:
---

Actually, I'm not sure we should backport this to an x.x.1 release.

Yes, the behavior "was unexpected", but it has been around for the last 3 years, 
if I recall, since the very beginning, and it is not a regression per se.

Either users don't care, since it has never been reported, or (most likely) 
users have adapted to the behavior, in which case we would break existing jobs 
in a patch release.
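
For example (a sketch of my own, reusing the one-row DataFrame from the issue 
description; the names are only illustrative):

 df <- createDataFrame(data.frame(col1 = "2017-12-01", stringsAsFactors = FALSE))

 # A job written against the long-standing behavior compensates for the
 # off-by-one by passing 7/8 instead of 6/7:
 monthly <- withColumn(df, "month", substr(df$col1, 7, 8))

 # On 2.2.x/2.3.x this returns "12"; once substr honors real 1-based positions,
 # the same call would return characters 7-8 of "2017-12-01", i.e. "2-".
 head(monthly)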

Anyway, it's just my 2c.





