[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351305#comment-16351305 ] Xiao Li commented on SPARK-21658: - Yes. We should revert this. It is risky. > Adds the default None for

[jira] [Assigned] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23327: Assignee: Xiao Li (was: Apache Spark) > Update the description of three external API or

[jira] [Assigned] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23327: Assignee: Apache Spark (was: Xiao Li) > Update the description of three external API or

[jira] [Commented] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23327?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351297#comment-16351297 ] Apache Spark commented on SPARK-23327: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Created] (SPARK-23327) Update the description of three external API or functions

2018-02-02 Thread Xiao Li (JIRA)
Xiao Li created SPARK-23327: --- Summary: Update the description of three external API or functions Key: SPARK-23327 URL: https://issues.apache.org/jira/browse/SPARK-23327 Project: Spark Issue Type:

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351260#comment-16351260 ] Reynold Xin commented on SPARK-21658: - I'd revert this one first. I'd even consider the other one a

[jira] [Resolved] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23317. - Resolution: Fixed Fix Version/s: 2.3.0 > rename ContinuousReader.setOffset to setStartOffset >

[jira] [Assigned] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23310: Assignee: (was: Apache Spark) > Perf regression introduced by SPARK-21113 >

[jira] [Assigned] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23310: Assignee: Apache Spark > Perf regression introduced by SPARK-21113 >

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351207#comment-16351207 ] Apache Spark commented on SPARK-23310: -- User 'sitalkedia' has created a pull request for this issue:

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351196#comment-16351196 ] Hyukjin Kwon commented on SPARK-21658: -- [~rxin], this JIRA fixes the signature of an alias to match

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351188#comment-16351188 ] Felix Cheung commented on SPARK-23314: -- Thanks. I have isolated this to a different subset of data,

[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351186#comment-16351186 ] Reynold Xin commented on SPARK-23081: - Scala and Python actually. Sorry I was only commenting on

[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351166#comment-16351166 ] Hyukjin Kwon commented on SPARK-23081: -- Do you mean both Scala and Python APIs or Python

[jira] [Commented] (SPARK-20090) Add StructType.fieldNames to Python API

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351159#comment-16351159 ] Hyukjin Kwon commented on SPARK-20090: -- I don't object to add an alias in Scala side and remove

[jira] [Commented] (SPARK-20090) Add StructType.fieldNames to Python API

2018-02-02 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351155#comment-16351155 ] Hyukjin Kwon commented on SPARK-20090: -- I think we deprecated this roughly to rename {{names}} to

[jira] [Commented] (SPARK-23064) Add documentation for stream-stream joins

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351136#comment-16351136 ] Apache Spark commented on SPARK-23064: -- User 'tdas' has created a pull request for this issue:

[jira] [Commented] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351118#comment-16351118 ] Apache Spark commented on SPARK-23326: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23326: Assignee: Apache Spark > "Scheduler Delay" of a task is confusing >

[jira] [Assigned] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23326: Assignee: (was: Apache Spark) > "Scheduler Delay" of a task is confusing >

[jira] [Updated] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-23326: - Description: Run the following code and check the UI {code} sc.makeRDD(1 to 1, 1).foreach { i =>

[jira] [Updated] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-23326: - Environment: (was: Run the following code and check the UI {code} sc.makeRDD(1 to 1,

[jira] [Created] (SPARK-23326) "Scheduler Delay" of a task is confusing

2018-02-02 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-23326: Summary: "Scheduler Delay" of a task is confusing Key: SPARK-23326 URL: https://issues.apache.org/jira/browse/SPARK-23326 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-21113) Support for read ahead input stream to amortize disk IO cost in the Spill reader

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351096#comment-16351096 ] Apache Spark commented on SPARK-21113: -- User 'sitalkedia' has created a pull request for this issue:

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351097#comment-16351097 ] Sital Kedia commented on SPARK-23310: - https://github.com/apache/spark/pull/20492 > Perf regression

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Nicolas Poggi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351071#comment-16351071 ] Nicolas Poggi commented on SPARK-23310: --- [~sitalke...@gmail.com] we have found around 18% higher

[jira] [Commented] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351060#comment-16351060 ] Sameer Agarwal commented on SPARK-23324: Thanks [~eje], this is definitely going to be a major

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351048#comment-16351048 ] Sameer Agarwal commented on SPARK-23310: [~sitalke...@gmail.com] it'd be great if you can create

[jira] [Commented] (SPARK-23325) DataSourceV2 readers should always produce InternalRow.

2018-02-02 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351043#comment-16351043 ] Ryan Blue commented on SPARK-23325: --- [~cloud_fan], FYI. > DataSourceV2 readers should always produce

[jira] [Updated] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23324: -- Labels: documentation kubernetes releasenotes (was: documentation kubernetes release_notes) The

[jira] [Created] (SPARK-23325) DataSourceV2 readers should always produce InternalRow.

2018-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23325: - Summary: DataSourceV2 readers should always produce InternalRow. Key: SPARK-23325 URL: https://issues.apache.org/jira/browse/SPARK-23325 Project: Spark Issue

[jira] [Commented] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Erik Erlandson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351020#comment-16351020 ] Erik Erlandson commented on SPARK-23324: cc [~sameer], [~foxish] > Announce new Kubernetes

[jira] [Created] (SPARK-23324) Announce new Kubernetes back-end for 2.3 release notes

2018-02-02 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-23324: -- Summary: Announce new Kubernetes back-end for 2.3 release notes Key: SPARK-23324 URL: https://issues.apache.org/jira/browse/SPARK-23324 Project: Spark

[jira] [Commented] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351015#comment-16351015 ] Apache Spark commented on SPARK-23323: -- User 'rdblue' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23323: Assignee: (was: Apache Spark) > DataSourceV2 should use the output commit

[jira] [Assigned] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23323: Assignee: Apache Spark > DataSourceV2 should use the output commit coordinator. >

[jira] [Created] (SPARK-23323) DataSourceV2 should use the output commit coordinator.

2018-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23323: - Summary: DataSourceV2 should use the output commit coordinator. Key: SPARK-23323 URL: https://issues.apache.org/jira/browse/SPARK-23323 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-23321: -- Issue Type: Sub-task (was: Bug) Parent: SPARK-22386 > DataSourceV2 should apply some

[jira] [Created] (SPARK-23322) Launcher handles can miss application updates if application finishes too quickly

2018-02-02 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-23322: -- Summary: Launcher handles can miss application updates if application finishes too quickly Key: SPARK-23322 URL: https://issues.apache.org/jira/browse/SPARK-23322

[jira] [Updated] (SPARK-23053) taskBinarySerialization and task partitions calculate in DagScheduler.submitMissingTasks should keep the same RDD checkpoint status

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-23053: - Component/s: Scheduler > taskBinarySerialization and task partitions calculate in >

[jira] [Assigned] (SPARK-22820) Spark 2.3 SQL API audit

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-22820: --- Assignee: Xiao Li > Spark 2.3 SQL API audit > --- > > Key:

[jira] [Updated] (SPARK-22820) Spark 2.3 SQL API audit

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-22820: Priority: Blocker (was: Major) > Spark 2.3 SQL API audit > --- > >

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350971#comment-16350971 ] Li Jin commented on SPARK-23314: Hi [~felixcheung] Thanks for the information. However, I still cannot

[jira] [Assigned] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23321: Assignee: (was: Apache Spark) > DataSourceV2 should apply some validation when

[jira] [Commented] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350967#comment-16350967 ] Apache Spark commented on SPARK-23321: -- User 'rdblue' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23321: Assignee: Apache Spark > DataSourceV2 should apply some validation when writing. >

[jira] [Updated] (SPARK-23321) DataSourceV2 should apply some validation when writing.

2018-02-02 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-23321: -- Summary: DataSourceV2 should apply some validation when writing. (was: DataSourceV2 should apply

[jira] [Created] (SPARK-23321) DataSourceV2 should apply preprocess rules for inserts.

2018-02-02 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-23321: - Summary: DataSourceV2 should apply preprocess rules for inserts. Key: SPARK-23321 URL: https://issues.apache.org/jira/browse/SPARK-23321 Project: Spark Issue

[jira] [Commented] (SPARK-23139) Read eventLog file with mixed encodings

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350933#comment-16350933 ] Imran Rashid commented on SPARK-23139: -- Apologies if this is a really silly question -- but does

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350901#comment-16350901 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 8:29 PM: --- I should

[jira] [Commented] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-02 Thread Andre Menck (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350917#comment-16350917 ] Andre Menck commented on SPARK-23290: - Hey [~ueshin] apologies, I tried to come up with a simpler

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350813#comment-16350813 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 8:15 PM: --- I'm still

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350901#comment-16350901 ] Thomas Graves commented on SPARK-23309: --- I should ask is there a log statement or query plan I can

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350900#comment-16350900 ] Thomas Graves commented on SPARK-23309: --- So the last test I did was spark 2.3 with the old hive

[jira] [Commented] (SPARK-20425) Support an extended display mode to print a column data per line

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350876#comment-16350876 ] Reynold Xin commented on SPARK-20425: - Hey so I don't think we should be doing multiple boolean

[jira] [Commented] (SPARK-23310) Perf regression introduced by SPARK-21113

2018-02-02 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350875#comment-16350875 ] Sital Kedia commented on SPARK-23310: - [~yhuai] - Sorry about introducing the regression for TPC-DS

[jira] [Commented] (SPARK-21852) Empty Parquet Files created as a result of spark jobs fail when read

2018-02-02 Thread Ravi Chittilla (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350871#comment-16350871 ] Ravi Chittilla commented on SPARK-21852: +1 > Empty Parquet Files created as a result of spark

[jira] [Commented] (SPARK-20090) Add StructType.fieldNames to Python API

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350861#comment-16350861 ] Reynold Xin commented on SPARK-20090: - Why would we deprecate this? I'd probably add names to Scala

[jira] [Commented] (SPARK-23290) inadvertent change in handling of DateType when converting to pandas dataframe

2018-02-02 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350860#comment-16350860 ] Sameer Agarwal commented on SPARK-23290: [~amenck] [~aash] any updates here? > inadvertent

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350857#comment-16350857 ] Xiao Li commented on SPARK-23309: - Based on my understanding about what [~tgraves] said above, the number

[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350852#comment-16350852 ] Reynold Xin commented on SPARK-23081: - Sorry why are we adding things like this? I see the value of

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350846#comment-16350846 ] Dongjoon Hyun commented on SPARK-23309: --- To sum up, the same Hive code (old Hive path) of Spark

[jira] [Commented] (SPARK-21658) Adds the default None for value in na.replace in PySpark to match

2018-02-02 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350848#comment-16350848 ] Reynold Xin commented on SPARK-21658: - Sorry but I object to this change. Why would we put null as

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350846#comment-16350846 ] Dongjoon Hyun edited comment on SPARK-23309 at 2/2/18 7:35 PM: --- To sum up,

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350819#comment-16350819 ] Felix Cheung commented on SPARK-23314: -- I'm running Python 2, pandas 0.22.0, PyArrow 0.8.0 > Pandas

[jira] [Reopened] (SPARK-17859) persist should not impede with spark's ability to perform a broadcast join.

2018-02-02 Thread Fernando Pereira (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17859?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fernando Pereira reopened SPARK-17859: -- This bug persists {code:java} SPARK version 2.2.1 SparkSession available as 'spark'. In

[jira] [Comment Edited] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350813#comment-16350813 ] Thomas Graves edited comment on SPARK-23309 at 2/2/18 7:04 PM: --- I'm still

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350813#comment-16350813 ] Thomas Graves commented on SPARK-23309: --- I'm still seeing spark 2.3 slower by about 15% for the

[jira] [Updated] (SPARK-23288) Incorrect number of written records in structured streaming

2018-02-02 Thread Yuriy Bondaruk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuriy Bondaruk updated SPARK-23288: --- Component/s: SQL > Incorrect number of written records in structured streaming >

[jira] [Resolved] (SPARK-23295) Exclude Waring message when generating versions in make-distribution.sh

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-23295. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20469

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350599#comment-16350599 ] Li Jin commented on SPARK-23314: [~felixcheung], what's the version of pandas you are using in your

[jira] [Assigned] (SPARK-23295) Exclude Waring message when generating versions in make-distribution.sh

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-23295: - Assignee: Kent Yao > Exclude Waring message when generating versions in make-distribution.sh >

[jira] [Commented] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350568#comment-16350568 ] Li Jin commented on SPARK-23314: I am taking a look at this > Pandas grouped udf on dataset with

[jira] [Updated] (SPARK-23314) Pandas grouped udf on dataset with timestamp column error

2018-02-02 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-23314: --- Issue Type: Sub-task (was: Bug) Parent: SPARK-22216 > Pandas grouped udf on dataset with timestamp

[jira] [Commented] (SPARK-23309) Spark 2.3 cached query performance 20-30% worse then spark 2.2

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350533#comment-16350533 ] Thomas Graves commented on SPARK-23309: --- Note the schema of "something" here is a "string". I'll

[jira] [Assigned] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-23253: Assignee: Kent Yao > Only write shuffle temporary index file when there is not an

[jira] [Resolved] (SPARK-23253) Only write shuffle temporary index file when there is not an existing one

2018-02-02 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-23253. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20422

[jira] [Resolved] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-23304. --- Resolution: Invalid > Spark SQL coalesce() against hive not working >

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350440#comment-16350440 ] Thomas Graves commented on SPARK-23304: --- ok so I guess by that logic then the coalesce won't ever

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350423#comment-16350423 ] Thomas Graves commented on SPARK-23304: --- it doesn't look like sql("xyz").rdd.partitions.length

[jira] [Commented] (SPARK-23304) Spark SQL coalesce() against hive not working

2018-02-02 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350428#comment-16350428 ] Thomas Graves commented on SPARK-23304: --- well I guess that gives you end # of partitions and not the

[jira] [Resolved] (SPARK-23312) add a config to turn off vectorized cache reader

2018-02-02 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23312. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20483

[jira] [Updated] (SPARK-23320) RANDOM pseudo environment variable has low resolution under Windows

2018-02-02 Thread Olivier Sannier (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olivier Sannier updated SPARK-23320: Description: Under Windows, spark-submit.bat calls spark-class2.cmd which then runs

[jira] [Created] (SPARK-23320) RANDOM pseudo environment variable has low resolution under Windows

2018-02-02 Thread Olivier Sannier (JIRA)
Olivier Sannier created SPARK-23320: --- Summary: RANDOM pseudo environment variable has low resolution under Windows Key: SPARK-23320 URL: https://issues.apache.org/jira/browse/SPARK-23320 Project:

[jira] [Commented] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350365#comment-16350365 ] Apache Spark commented on SPARK-23319: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23319: Assignee: Apache Spark > Skip PySpark tests for old Pandas and old PyArrow >

[jira] [Assigned] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23319: Assignee: (was: Apache Spark) > Skip PySpark tests for old Pandas and old PyArrow >

[jira] [Created] (SPARK-23319) Skip PySpark tests for old Pandas and old PyArrow

2018-02-02 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-23319: Summary: Skip PySpark tests for old Pandas and old PyArrow Key: SPARK-23319 URL: https://issues.apache.org/jira/browse/SPARK-23319 Project: Spark Issue

[jira] [Commented] (SPARK-23269) FP-growth: Provide last transaction for each detected frequent pattern

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23269?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350308#comment-16350308 ] Sean Owen commented on SPARK-23269: --- Doesn't this incur similar overhead for every caller though? >

[jira] [Commented] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached

2018-02-02 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350309#comment-16350309 ] Sean Owen commented on SPARK-23318: --- Yes, a similar change sounds fine. > FP-growth: WARN FPGrowth:

[jira] [Comment Edited] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2018-02-02 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350240#comment-16350240 ] Zoltan Ivanfi edited comment on SPARK-12297 at 2/2/18 12:35 PM: Hive

[jira] [Commented] (SPARK-12297) Add work-around for Parquet/Hive int96 timestamp bug.

2018-02-02 Thread Zoltan Ivanfi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350240#comment-16350240 ] Zoltan Ivanfi commented on SPARK-12297: --- Hive already has a workaround based on the writer

[jira] [Created] (SPARK-23318) FP-growth: WARN FPGrowth: Input data is not cached

2018-02-02 Thread Arseniy Tashoyan (JIRA)
Arseniy Tashoyan created SPARK-23318: Summary: FP-growth: WARN FPGrowth: Input data is not cached Key: SPARK-23318 URL: https://issues.apache.org/jira/browse/SPARK-23318 Project: Spark

[jira] [Assigned] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23317: Assignee: Apache Spark (was: Wenchen Fan) > rename ContinuousReader.setOffset to

[jira] [Commented] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350145#comment-16350145 ] Apache Spark commented on SPARK-23317: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23317: Assignee: Wenchen Fan (was: Apache Spark) > rename ContinuousReader.setOffset to

[jira] [Created] (SPARK-23317) rename ContinuousReader.setOffset to setStartOffset

2018-02-02 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-23317: --- Summary: rename ContinuousReader.setOffset to setStartOffset Key: SPARK-23317 URL: https://issues.apache.org/jira/browse/SPARK-23317 Project: Spark Issue

[jira] [Commented] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350196#comment-16350196 ] Bogdan Raducanu commented on SPARK-23316: - I'll work on a fix > AnalysisException after max

[jira] [Updated] (SPARK-23316) AnalysisException after max iteration reached for IN query

2018-02-02 Thread Bogdan Raducanu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bogdan Raducanu updated SPARK-23316: Affects Version/s: 2.4.0 > AnalysisException after max iteration reached for IN query >
