[jira] [Assigned] (SPARK-23169) Run lintr on the changes of lint-r script and .lintr configuration

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23169: Assignee: (was: Apache Spark) > Run lintr on the changes of lint-r script and .lintr

[jira] [Commented] (SPARK-23169) Run lintr on the changes of lint-r script and .lintr configuration

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333427#comment-16333427 ] Apache Spark commented on SPARK-23169: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-23169) Run lintr on the changes of lint-r script and .lintr configuration

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23169: Assignee: Apache Spark > Run lintr on the changes of lint-r script and .lintr

[jira] [Resolved] (SPARK-23156) Code of method "initialize(I)V" of class "org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection" grows beyond 64 KB

2018-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-23156. --- Resolution: Duplicate > Code of method "initialize(I)V" of class >

[jira] [Assigned] (SPARK-22119) Add cosine distance to KMeans

2018-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-22119: - Assignee: Marco Gaido > Add cosine distance to KMeans > - > >

[jira] [Resolved] (SPARK-22119) Add cosine distance to KMeans

2018-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22119. --- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 19340

[jira] [Commented] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans

2018-01-21 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333551#comment-16333551 ] Takeshi Yamamuro commented on SPARK-23171: -- ok, I'll check code based on these metrics. >

[jira] [Commented] (SPARK-23167) Update TPCDS queries from v1.4 to v2.7 (latest)

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333563#comment-16333563 ] Apache Spark commented on SPARK-23167: -- User 'maropu' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23167) Update TPCDS queries from v1.4 to v2.7 (latest)

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23167: Assignee: (was: Apache Spark) > Update TPCDS queries from v1.4 to v2.7 (latest) >

[jira] [Assigned] (SPARK-23167) Update TPCDS queries from v1.4 to v2.7 (latest)

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23167: Assignee: Apache Spark > Update TPCDS queries from v1.4 to v2.7 (latest) >

[jira] [Created] (SPARK-23169) Run lintr on the changes of lint-r script and .lintr configuration

2018-01-21 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-23169: Summary: Run lintr on the changes of lint-r script and .lintr configuration Key: SPARK-23169 URL: https://issues.apache.org/jira/browse/SPARK-23169 Project: Spark

[jira] [Commented] (SPARK-21293) R document update structured streaming

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333438#comment-16333438 ] Apache Spark commented on SPARK-21293: -- User 'felixcheung' has created a pull request for this

[jira] [Assigned] (SPARK-21293) R document update structured streaming

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21293: Assignee: Felix Cheung (was: Apache Spark) > R document update structured streaming >

[jira] [Assigned] (SPARK-21293) R document update structured streaming

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21293: Assignee: Apache Spark (was: Felix Cheung) > R document update structured streaming >

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-21 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333477#comment-16333477 ] Steve Loughran commented on SPARK-23050: there's one thing which worries me here: the implication

[jira] [Commented] (SPARK-23167) Update TPCDS queries from v1.4 to v2.7 (latest)

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333533#comment-16333533 ] Xiao Li commented on SPARK-23167: - [~maropu] Could you add a new suite for TPC-DS 2.7? Thanks! > Update

[jira] [Commented] (SPARK-23168) Hints for fact tables and unique columns

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333534#comment-16333534 ] Xiao Li commented on SPARK-23168: - This is part of https://issues.apache.org/jira/browse/SPARK-19842 >

[jira] [Created] (SPARK-23170) Dump the statistics of effective runs of analyzer and optimizer rules

2018-01-21 Thread Xiao Li (JIRA)
Xiao Li created SPARK-23170: --- Summary: Dump the statistics of effective runs of analyzer and optimizer rules Key: SPARK-23170 URL: https://issues.apache.org/jira/browse/SPARK-23170 Project: Spark

[jira] [Assigned] (SPARK-23170) Dump the statistics of effective runs of analyzer and optimizer rules

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23170: Assignee: Apache Spark (was: Xiao Li) > Dump the statistics of effective runs of

[jira] [Commented] (SPARK-23167) Update TPCDS queries from v1.4 to v2.7 (latest)

2018-01-21 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333536#comment-16333536 ] Takeshi Yamamuro commented on SPARK-23167: -- ok, will do. > Update TPCDS queries from v1.4 to

[jira] [Commented] (SPARK-23170) Dump the statistics of effective runs of analyzer and optimizer rules

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333535#comment-16333535 ] Apache Spark commented on SPARK-23170: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23170) Dump the statistics of effective runs of analyzer and optimizer rules

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23170: Assignee: Xiao Li (was: Apache Spark) > Dump the statistics of effective runs of

[jira] [Commented] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333540#comment-16333540 ] Xiao Li commented on SPARK-23171: - cc [~maropu] > Reduce the time costs of the rule runs that do not

[jira] [Created] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans

2018-01-21 Thread Xiao Li (JIRA)
Xiao Li created SPARK-23171: --- Summary: Reduce the time costs of the rule runs that do not change the plans Key: SPARK-23171 URL: https://issues.apache.org/jira/browse/SPARK-23171 Project: Spark

[jira] [Comment Edited] (SPARK-23171) Reduce the time costs of the rule runs that do not change the plans

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333540#comment-16333540 ] Xiao Li edited comment on SPARK-23171 at 1/21/18 2:24 PM: -- cc [~maropu] This is

[jira] [Updated] (SPARK-23166) Add maxDF Parameter to CountVectorizer

2018-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-23166: -- Priority: Minor (was: Major) Seems fine; open a pull request. > Add maxDF Parameter to

[jira] [Commented] (SPARK-19842) Informational Referential Integrity Constraints Support in Spark

2018-01-21 Thread Ioana Delaney (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333703#comment-16333703 ] Ioana Delaney commented on SPARK-19842: --- The benefit of this work is that it opens up an area of

[jira] [Created] (SPARK-23173) from_json can produce nulls for fields which are marked as non-nullable

2018-01-21 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-23173: - Summary: from_json can produce nulls for fields which are marked as non-nullable Key: SPARK-23173 URL: https://issues.apache.org/jira/browse/SPARK-23173

[jira] [Commented] (SPARK-22320) ORC should support VectorUDT/MatrixUDT

2018-01-21 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333784#comment-16333784 ] Dongjoon Hyun commented on SPARK-22320: --- For this one, Parquet saves the original schema as a

[jira] [Commented] (SPARK-20906) Constrained Logistic Regression for SparkR

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333654#comment-16333654 ] Felix Cheung commented on SPARK-20906: -- [~wm624] would you like to add example of this in the R

[jira] [Commented] (SPARK-22208) Improve percentile_approx by not rounding up targetError and starting from index 0

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333659#comment-16333659 ] Felix Cheung commented on SPARK-22208: -- Is this documented in the SQL programming guide/ migration

[jira] [Commented] (SPARK-23115) SparkR 2.3 QA: New R APIs and API docs

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333677#comment-16333677 ] Felix Cheung commented on SPARK-23115: -- Another pass, we should add API doc for SPARK-20906 >

[jira] [Commented] (SPARK-20307) SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer

2018-01-21 Thread Joseph Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333682#comment-16333682 ] Joseph Wang commented on SPARK-20307: - Hi Felix, I can do that but I have a family emergency lately.

[jira] [Commented] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333712#comment-16333712 ] Felix Cheung commented on SPARK-23107: -- We don't have doc on RFormula but it'll be good idea to also

[jira] [Assigned] (SPARK-23169) Run lintr on the changes of lint-r script and .lintr configuration

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-23169: Assignee: Hyukjin Kwon > Run lintr on the changes of lint-r script and .lintr

[jira] [Resolved] (SPARK-23169) Run lintr on the changes of lint-r script and .lintr configuration

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23169?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23169. -- Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20339

[jira] [Resolved] (SPARK-20947) Encoding/decoding issue in PySpark pipe implementation

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-20947. -- Resolution: Fixed Fixed in https://github.com/apache/spark/pull/18277 > Encoding/decoding

[jira] [Updated] (SPARK-20947) Encoding/decoding issue in PySpark pipe implementation

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-20947: - Fix Version/s: 2.4.0 > Encoding/decoding issue in PySpark pipe implementation >

[jira] [Assigned] (SPARK-23174) Fix pep8 to latest official version

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23174: Assignee: Apache Spark > Fix pep8 to latest official version >

[jira] [Commented] (SPARK-23174) Fix pep8 to latest official version

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333793#comment-16333793 ] Apache Spark commented on SPARK-23174: -- User 'rekhajoshm' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23174) Fix pep8 to latest official version

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23174: Assignee: (was: Apache Spark) > Fix pep8 to latest official version >

[jira] [Updated] (SPARK-23173) from_json can produce nulls for fields which are marked as non-nullable

2018-01-21 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23173: -- Description: The {{from_json}} function uses a schema to convert a string into a Spark

[jira] [Updated] (SPARK-23173) from_json can produce nulls for fields which are marked as non-nullable

2018-01-21 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell updated SPARK-23173: -- Description: The {{from_json}} function uses a schema to convert a string into a Spark

[jira] [Commented] (SPARK-20129) JavaSparkContext should use SparkContext.getOrCreate

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333808#comment-16333808 ] Apache Spark commented on SPARK-20129: -- User 'rekhajoshm' has created a pull request for this issue:

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-01-21 Thread Yash Sharma (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333809#comment-16333809 ] Yash Sharma commented on SPARK-23050: - Hi [~ste...@apache.org], Thanks for bringing this great

[jira] [Comment Edited] (SPARK-23114) Spark R 2.3 QA umbrella

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333755#comment-16333755 ] Hyukjin Kwon edited comment on SPARK-23114 at 1/22/18 3:05 AM: ---

[jira] [Resolved] (SPARK-22808) saveAsTable() should be marked as deprecated

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-22808. - Resolution: Duplicate > saveAsTable() should be marked as deprecated >

[jira] [Commented] (SPARK-23173) from_json can produce nulls for fields which are marked as non-nullable

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333855#comment-16333855 ] Hyukjin Kwon commented on SPARK-23173: -- I believe this one is related with SPARK-17763. My first try

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2018-01-21 Thread Ruslan Dautkhanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333916#comment-16333916 ] Ruslan Dautkhanov commented on SPARK-18016: --- In Spark 2.2 I have the same issue with pivoting

[jira] [Commented] (SPARK-19217) Offer easy cast from vector to array

2018-01-21 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333936#comment-16333936 ] Wenchen Fan commented on SPARK-19217: - Before publishing UDT, I'm a little worried about adding more

[jira] [Commented] (SPARK-19217) Offer easy cast from vector to array

2018-01-21 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333959#comment-16333959 ] Takeshi Yamamuro commented on SPARK-19217: -- If we can, I think it's the best to reuse `sqlType`,

[jira] [Created] (SPARK-23176) REPL project build failing in Spark v2.2.0

2018-01-21 Thread shekhar reddy (JIRA)
shekhar reddy created SPARK-23176: - Summary: REPL project build failing in Spark v2.2.0 Key: SPARK-23176 URL: https://issues.apache.org/jira/browse/SPARK-23176 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-23066) Master Page increase master start-up time.

2018-01-21 Thread guoxiaolongzte (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] guoxiaolongzte resolved SPARK-23066. Resolution: Won't Fix > Master Page increase master start-up time. >

[jira] [Resolved] (SPARK-23020) Re-enable Flaky Test: org.apache.spark.launcher.SparkLauncherSuite.testInProcessLauncher

2018-01-21 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23020. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 20297

[jira] [Commented] (SPARK-19217) Offer easy cast from vector to array

2018-01-21 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19217?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333963#comment-16333963 ] Wenchen Fan commented on SPARK-19217: - Then we also need to define how to do cast, which needs a

[jira] [Updated] (SPARK-23176) REPL project build failing in Spark v2.2.0

2018-01-21 Thread shekhar reddy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shekhar reddy updated SPARK-23176: -- Priority: Blocker (was: Major) > REPL project build failing in Spark v2.2.0 >

[jira] [Updated] (SPARK-23176) REPL project build failing in Spark v2.2.0

2018-01-21 Thread shekhar reddy (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shekhar reddy updated SPARK-23176: -- Description: I tried building Spark v2.2.0 and got compilation in 

[jira] [Commented] (SPARK-23122) Deprecate register* for UDFs in SQLContext and Catalog in PySpark

2018-01-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333911#comment-16333911 ] Apache Spark commented on SPARK-23122: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Comment Edited] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2018-01-21 Thread Gaurav Garg (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333917#comment-16333917 ] Gaurav Garg edited comment on SPARK-18016 at 1/22/18 6:32 AM: -- [~Tagar], it

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2018-01-21 Thread Gaurav Garg (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333917#comment-16333917 ] Gaurav Garg commented on SPARK-18016: - [~Tagar], it is working fine with me when pivoting around 10K

[jira] [Updated] (SPARK-23122) Deprecate register* for UDFs in SQLContext and Catalog in PySpark

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-23122: Description: Deprecate register* for UDFs in SQLContext and Catalog in PySpark Seems we allow many other

[jira] [Commented] (SPARK-23173) from_json can produce nulls for fields which are marked as non-nullable

2018-01-21 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23173?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333940#comment-16333940 ] Wenchen Fan commented on SPARK-23173: - +1 on proposal 1. > from_json can produce nulls for fields

[jira] [Commented] (SPARK-20307) SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333663#comment-16333663 ] Felix Cheung commented on SPARK-20307: -- for SPARK-20307 and SPARK-21381, do you think you can write

[jira] [Comment Edited] (SPARK-20906) Constrained Logistic Regression for SparkR

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333654#comment-16333654 ] Felix Cheung edited comment on SPARK-20906 at 1/21/18 8:54 PM: --- [~wm624]

[jira] [Comment Edited] (SPARK-23114) Spark R 2.3 QA umbrella

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333725#comment-16333725 ] Felix Cheung edited comment on SPARK-23114 at 1/21/18 11:02 PM:

[jira] [Comment Edited] (SPARK-23114) Spark R 2.3 QA umbrella

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333730#comment-16333730 ] Felix Cheung edited comment on SPARK-23114 at 1/21/18 11:03 PM: [~falaki]

[jira] [Commented] (SPARK-23114) Spark R 2.3 QA umbrella

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333755#comment-16333755 ] Hyukjin Kwon commented on SPARK-23114: -- [~felixcheung], I maybe misunderstood but you mean if we can

[jira] [Created] (SPARK-23174) Fix pep8 to latest official version

2018-01-21 Thread Rekha Joshi (JIRA)
Rekha Joshi created SPARK-23174: --- Summary: Fix pep8 to latest official version Key: SPARK-23174 URL: https://issues.apache.org/jira/browse/SPARK-23174 Project: Spark Issue Type: Improvement

[jira] [Comment Edited] (SPARK-22320) ORC should support VectorUDT/MatrixUDT

2018-01-21 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333787#comment-16333787 ] Dongjoon Hyun edited comment on SPARK-22320 at 1/22/18 2:18 AM: With the

[jira] [Comment Edited] (SPARK-23117) SparkR 2.3 QA: Check for new R APIs requiring example code

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333718#comment-16333718 ] Felix Cheung edited comment on SPARK-23117 at 1/21/18 10:47 PM: I did a

[jira] [Commented] (SPARK-23114) Spark R 2.3 QA umbrella

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333730#comment-16333730 ] Felix Cheung commented on SPARK-23114: -- [~falaki] [~hyukjin.kwon] About SPARK-21093, do you think

[jira] [Resolved] (SPARK-22976) Worker cleanup can remove running driver directories

2018-01-21 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved SPARK-22976. - Resolution: Fixed Fix Version/s: 2.3.0 > Worker cleanup can remove running driver

[jira] [Resolved] (SPARK-23026) Add RegisterUDF to PySpark

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-23026. - Resolution: Won't Fix > Add RegisterUDF to PySpark > -- > > Key:

[jira] [Commented] (SPARK-23084) Add unboundedPreceding(), unboundedFollowing() and currentRow() to PySpark

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23084?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333860#comment-16333860 ] Xiao Li commented on SPARK-23084: - Yeah. Please go ahead. > Add unboundedPreceding(),

[jira] [Commented] (SPARK-23081) Add colRegex API to PySpark

2018-01-21 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23081?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333861#comment-16333861 ] Xiao Li commented on SPARK-23081: - Yeah. Please go ahead. > Add colRegex API to PySpark >

[jira] [Commented] (SPARK-22208) Improve percentile_approx by not rounding up targetError and starting from index 0

2018-01-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333678#comment-16333678 ] Sean Owen commented on SPARK-22208: --- It's a bug fix, and more of a corner case of behavior, so I don't

[jira] [Commented] (SPARK-23116) SparkR 2.3 QA: Update user guide for new features & APIs

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333717#comment-16333717 ] Felix Cheung commented on SPARK-23116: -- I did a pass. > SparkR 2.3 QA: Update user guide for new

[jira] [Resolved] (SPARK-23116) SparkR 2.3 QA: Update user guide for new features & APIs

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-23116. -- Resolution: Fixed Assignee: Felix Cheung Fix Version/s: 2.3.0 > SparkR 2.3 QA:

[jira] [Commented] (SPARK-23117) SparkR 2.3 QA: Check for new R APIs requiring example code

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333718#comment-16333718 ] Felix Cheung commented on SPARK-23117: -- I did a pass, I think these could use an example, preferably

[jira] [Commented] (SPARK-23114) Spark R 2.3 QA umbrella

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333725#comment-16333725 ] Felix Cheung commented on SPARK-23114: -- [~sameerag] Here are some ideas for the release notes (that

[jira] [Assigned] (SPARK-20947) Encoding/decoding issue in PySpark pipe implementation

2018-01-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-20947: Assignee: Xiaozhe Wang > Encoding/decoding issue in PySpark pipe implementation >

[jira] [Resolved] (SPARK-23175) Type conversion does not make sense under case like select ’0.1’ = 0

2018-01-21 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-23175. - Resolution: Duplicate > Type conversion does not make sense under case like select ’0.1’ = 0 >

[jira] [Commented] (SPARK-23118) SparkR 2.3 QA: Programming guide, migration guide, vignettes updates

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333707#comment-16333707 ] Felix Cheung commented on SPARK-23118: -- for programming guide, perhaps SPARK-20906 But it mostly

[jira] [Resolved] (SPARK-23118) SparkR 2.3 QA: Programming guide, migration guide, vignettes updates

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-23118. -- Resolution: Fixed Assignee: Felix Cheung Fix Version/s: 2.3.0 > SparkR 2.3 QA:

[jira] [Comment Edited] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2018-01-21 Thread Gaurav Garg (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333857#comment-16333857 ] Gaurav Garg edited comment on SPARK-18016 at 1/22/18 4:27 AM: -- In sequence,

[jira] [Resolved] (SPARK-23000) Flaky test suite DataSourceWithHiveMetastoreCatalogSuite in Spark 2.3

2018-01-21 Thread Sameer Agarwal (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sameer Agarwal resolved SPARK-23000. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by 

[jira] [Updated] (SPARK-22208) Improve percentile_approx by not rounding up targetError and starting from index 0

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-22208: - Labels: releasenotes (was: ) > Improve percentile_approx by not rounding up targetError and

[jira] [Commented] (SPARK-23108) ML, Graph 2.3 QA: API: Experimental, DeveloperApi, final, sealed audit

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333708#comment-16333708 ] Felix Cheung commented on SPARK-23108: -- From reviewing R, it would be good to document constrained

[jira] [Comment Edited] (SPARK-20906) Constrained Logistic Regression for SparkR

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333654#comment-16333654 ] Felix Cheung edited comment on SPARK-20906 at 1/21/18 10:30 PM: [~wm624]

[jira] [Comment Edited] (SPARK-23107) ML, Graph 2.3 QA: API: New Scala APIs, docs

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333712#comment-16333712 ] Felix Cheung edited comment on SPARK-23107 at 1/21/18 11:08 PM: We don't

[jira] [Assigned] (SPARK-22976) Worker cleanup can remove running driver directories

2018-01-21 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao reassigned SPARK-22976: --- Assignee: Russell Spitzer > Worker cleanup can remove running driver directories >

[jira] [Resolved] (SPARK-22838) Avoid unnecessary copying of data

2018-01-21 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Saisai Shao resolved SPARK-22838. - Resolution: Invalid > Avoid unnecessary copying of data > - > >

[jira] [Comment Edited] (SPARK-20307) SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333682#comment-16333682 ] Felix Cheung edited comment on SPARK-20307 at 1/21/18 10:40 PM: Hi Felix,

[jira] [Commented] (SPARK-20307) SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333716#comment-16333716 ] Felix Cheung commented on SPARK-20307: -- I think [~wm624] if you have the time > SparkR: pass on

[jira] [Commented] (SPARK-21727) Operating on an ArrayType in a SparkR DataFrame throws error

2018-01-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21727?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333733#comment-16333733 ] Felix Cheung commented on SPARK-21727: -- how are we doing? > Operating on an ArrayType in a SparkR

[jira] [Updated] (SPARK-22320) ORC should support VectorUDT/MatrixUDT

2018-01-21 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-22320: -- Priority: Minor (was: Major) > ORC should support VectorUDT/MatrixUDT >

[jira] [Commented] (SPARK-11222) Add style checker rules to validate doc tests aren't included in docs

2018-01-21 Thread Rekha Joshi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333786#comment-16333786 ] Rekha Joshi commented on SPARK-11222: - I have raised the doctest blank line as an

[jira] [Commented] (SPARK-22320) ORC should support VectorUDT/MatrixUDT

2018-01-21 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333787#comment-16333787 ] Dongjoon Hyun commented on SPARK-22320: --- With the above workaround, I think this seems to be a

[jira] [Created] (SPARK-23175) Type conversion does not make sense under case like select ’0.1’ = 0

2018-01-21 Thread Shaoquan Zhang (JIRA)
Shaoquan Zhang created SPARK-23175: -- Summary: Type conversion does not make sense under case like select ’0.1’ = 0 Key: SPARK-23175 URL: https://issues.apache.org/jira/browse/SPARK-23175 Project:

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2018-01-21 Thread Gaurav (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333857#comment-16333857 ] Gaurav commented on SPARK-18016: In sequence, df .groupBy("col1")
