[jira] [Comment Edited] (SPARK-35741) Variance of 1 record gives NULL in Spark 3.x and NaN in Spark 2.x

2021-06-12 Thread Abdeali Kothari (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362244#comment-17362244 ] Abdeali Kothari edited comment on SPARK-35741 at 6/12/21, 6:08 AM: ---

[jira] [Commented] (SPARK-35741) Variance of 1 record gives NULL in Spark 3.x and NaN in Spark 2.x

2021-06-12 Thread Abdeali Kothari (Jira)
[ https://issues.apache.org/jira/browse/SPARK-35741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17362244#comment-17362244 ] Abdeali Kothari commented on SPARK-35741: - Aha, I see. Seems like it was in Spark 3.1

[jira] [Created] (SPARK-35741) Variance of 1 record gives NULL in Spark 3.x and NaN in Spark 2.x

2021-06-11 Thread Abdeali Kothari (Jira)
Abdeali Kothari created SPARK-35741: --- Summary: Variance of 1 record gives NULL in Spark 3.x and NaN in Spark 2.x Key: SPARK-35741 URL: https://issues.apache.org/jira/browse/SPARK-35741 Project:

[jira] [Comment Edited] (SPARK-7276) withColumn is very slow on dataframe with large number of columns

2021-03-12 Thread Abdeali Kothari (Jira)
[ https://issues.apache.org/jira/browse/SPARK-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17300724#comment-17300724 ] Abdeali Kothari edited comment on SPARK-7276 at 3/13/21, 5:09 AM: -- I can

[jira] [Commented] (SPARK-7276) withColumn is very slow on dataframe with large number of columns

2021-03-12 Thread Abdeali Kothari (Jira)
[ https://issues.apache.org/jira/browse/SPARK-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17300724#comment-17300724 ] Abdeali Kothari commented on SPARK-7276: I can confirm that even with 2.4 there is a significant

[jira] [Comment Edited] (SPARK-7276) withColumn is very slow on dataframe with large number of columns

2021-03-12 Thread Abdeali Kothari (Jira)
[ https://issues.apache.org/jira/browse/SPARK-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17300724#comment-17300724 ] Abdeali Kothari edited comment on SPARK-7276 at 3/13/21, 5:03 AM: -- I can

[jira] [Updated] (SPARK-7276) withColumn is very slow on dataframe with large number of columns

2021-03-12 Thread Abdeali Kothari (Jira)
[ https://issues.apache.org/jira/browse/SPARK-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdeali Kothari updated SPARK-7276: --- Attachment: test.py > withColumn is very slow on dataframe with large number of columns >

[jira] [Created] (SPARK-30187) NULL handling in PySpark-PandasUDF

2019-12-09 Thread Abdeali Kothari (Jira)
Abdeali Kothari created SPARK-30187: --- Summary: NULL handling in PySpark-PandasUDF Key: SPARK-30187 URL: https://issues.apache.org/jira/browse/SPARK-30187 Project: Spark Issue Type: Bug

[jira] [Comment Edited] (SPARK-25992) Accumulators giving KeyError in pyspark

2018-12-09 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713965#comment-16713965 ] Abdeali Kothari edited comment on SPARK-25992 at 12/9/18 1:05 PM: -- Here

[jira] [Commented] (SPARK-25992) Accumulators giving KeyError in pyspark

2018-12-09 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713965#comment-16713965 ] Abdeali Kothari commented on SPARK-25992: - Here is a reproducible example in pyspark where using

[jira] [Comment Edited] (SPARK-25992) Accumulators giving KeyError in pyspark

2018-12-09 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16713965#comment-16713965 ] Abdeali Kothari edited comment on SPARK-25992 at 12/9/18 1:02 PM: -- Here

[jira] [Commented] (SPARK-25992) Accumulators giving KeyError in pyspark

2018-11-22 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16695684#comment-16695684 ] Abdeali Kothari commented on SPARK-25992: - I do need to solve this, so I will be looking into it

[jira] [Commented] (SPARK-25992) Accumulators giving KeyError in pyspark

2018-11-15 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687726#comment-16687726 ] Abdeali Kothari commented on SPARK-25992: - [~hyukjin.kwon] I tried a fair bit and was unable to

[jira] [Updated] (SPARK-26067) Pandas GROUPED_MAP udf breaks if DF has >255 columns

2018-11-14 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-26067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdeali Kothari updated SPARK-26067: Description: When I run Spark's Pandas GROUPED_MAP UDFs to apply a UDAF I wrote in

[jira] [Created] (SPARK-26067) Pandas GROUPED_MAP udf breaks if DF has >255 columns

2018-11-14 Thread Abdeali Kothari (JIRA)
Abdeali Kothari created SPARK-26067: --- Summary: Pandas GROUPED_MAP udf breaks if DF has >255 columns Key: SPARK-26067 URL: https://issues.apache.org/jira/browse/SPARK-26067 Project: Spark

[jira] [Created] (SPARK-25992) Accumulators giving KeyError in pyspark

2018-11-09 Thread Abdeali Kothari (JIRA)
Abdeali Kothari created SPARK-25992: --- Summary: Accumulators giving KeyError in pyspark Key: SPARK-25992 URL: https://issues.apache.org/jira/browse/SPARK-25992 Project: Spark Issue Type:

[jira] [Updated] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2018-10-07 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abdeali Kothari updated SPARK-25591: Description: When having multiple Python UDFs - the last Python UDF's accumulator is the

[jira] [Created] (SPARK-25591) PySpark Accumulators with multiple PythonUDFs

2018-10-01 Thread Abdeali Kothari (JIRA)
Abdeali Kothari created SPARK-25591: --- Summary: PySpark Accumulators with multiple PythonUDFs Key: SPARK-25591 URL: https://issues.apache.org/jira/browse/SPARK-25591 Project: Spark Issue

[jira] [Commented] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-19 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516899#comment-16516899 ] Abdeali Kothari commented on SPARK-24458: - It can be anything. I had a file with 1 column, 1 row

[jira] [Comment Edited] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-16 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514723#comment-16514723 ] Abdeali Kothari edited comment on SPARK-24458 at 6/16/18 9:10 AM: --

[jira] [Commented] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-16 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514725#comment-16514725 ] Abdeali Kothari commented on SPARK-24458: - Weirdly, adding a random column at the start makes

[jira] [Commented] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-16 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514723#comment-16514723 ] Abdeali Kothari commented on SPARK-24458: - Found a reproducible and much simpler example:  

[jira] [Comment Edited] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-16 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514709#comment-16514709 ] Abdeali Kothari edited comment on SPARK-24458 at 6/16/18 7:22 AM: -- Got

[jira] [Commented] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-16 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514709#comment-16514709 ] Abdeali Kothari commented on SPARK-24458: - Got into this a bit more and found that the error is

[jira] [Comment Edited] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-16 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16514709#comment-16514709 ] Abdeali Kothari edited comment on SPARK-24458 at 6/16/18 7:19 AM: -- Got

[jira] [Created] (SPARK-24458) Invalid PythonUDF check_1(), requires attributes from more than one child

2018-06-04 Thread Abdeali Kothari (JIRA)
Abdeali Kothari created SPARK-24458: --- Summary: Invalid PythonUDF check_1(), requires attributes from more than one child Key: SPARK-24458 URL: https://issues.apache.org/jira/browse/SPARK-24458

[jira] [Created] (SPARK-22448) Add functions like Mode(), NumNulls(), etc. in Summarizer

2017-11-04 Thread Abdeali Kothari (JIRA)
Abdeali Kothari created SPARK-22448: --- Summary: Add functions like Mode(), NumNulls(), etc. in Summarizer Key: SPARK-22448 URL: https://issues.apache.org/jira/browse/SPARK-22448 Project: Spark

[jira] [Created] (SPARK-22447) SAS reading functionality

2017-11-04 Thread Abdeali Kothari (JIRA)
Abdeali Kothari created SPARK-22447: --- Summary: SAS reading functionality Key: SPARK-22447 URL: https://issues.apache.org/jira/browse/SPARK-22447 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-16957) Use weighted midpoints for split values.

2016-08-27 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16957?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15441640#comment-15441640 ] Abdeali Kothari commented on SPARK-16957: - Hi, I'd like to begin contributing, and this seems

[jira] [Commented] (SPARK-5456) Decimal Type comparison issue

2016-08-18 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5456?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426351#comment-15426351 ] Abdeali Kothari commented on SPARK-5456: For the record, I get this error in 1.4.1 too. > Decimal

[jira] [Commented] (SPARK-12070) PySpark implementation of Slicing operator incorrect

2016-08-18 Thread Abdeali Kothari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15426048#comment-15426048 ] Abdeali Kothari commented on SPARK-12070: - I notice that this has been set to Won't Fix and