[jira] [Created] (SPARK-21436) Take advantage of known partitioner for distinct on RDDs

2017-07-17 Thread holdenk (JIRA)
holdenk created SPARK-21436: --- Summary: Take advantage of known partitioner for distinct on RDDs Key: SPARK-21436 URL: https://issues.apache.org/jira/browse/SPARK-21436 Project: Spark Issue Type:

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Li Yuanjian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089442#comment-16089442 ] Li Yuanjian commented on SPARK-21435: - [~sowen] I tested the patch in our scenario and add a UT, I

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089613#comment-16089613 ] Sean Owen commented on SPARK-21401: --- I also am not clear why the implementation didn't use Scala's

[jira] [Commented] (SPARK-12777) Dataset fields can't be Scala tuples

2017-07-17 Thread Alexander Lopatin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089620#comment-16089620 ] Alexander Lopatin commented on SPARK-12777: --- The issue reproduces for me in standalone

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089633#comment-16089633 ] Sean Owen commented on SPARK-21401: --- I'm not sure how you would remove and add an element to the

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089648#comment-16089648 ] Peng Meng commented on SPARK-21401: --- Yes, you don't need to do it for the vast majority of elements.

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089656#comment-16089656 ] Peng Meng commented on SPARK-21401: --- Got it, thanks [~srowen] > add poll function for

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089740#comment-16089740 ] Apache Spark commented on SPARK-21435: -- User 'xuanyuanking' has created a pull request for this

[jira] [Commented] (SPARK-21440) Refactor ArrowConverters and add DecimalType, ArrayType and StructType support.

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089779#comment-16089779 ] Apache Spark commented on SPARK-21440: -- User 'ueshin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21440) Refactor ArrowConverters and add DecimalType, ArrayType and StructType support.

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21440: Assignee: (was: Apache Spark) > Refactor ArrowConverters and add DecimalType,

[jira] [Commented] (SPARK-21227) Unicode in Json field causes AnalysisException when selecting from Dataframe

2017-07-17 Thread Seydou Dia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089786#comment-16089786 ] Seydou Dia commented on SPARK-21227: Hello, please let me know your thoughts on this... Maybe I

[jira] [Created] (SPARK-21438) Update to scalatest 3.x

2017-07-17 Thread PJ Fanning (JIRA)
PJ Fanning created SPARK-21438: -- Summary: Update to scalatest 3.x Key: SPARK-21438 URL: https://issues.apache.org/jira/browse/SPARK-21438 Project: Spark Issue Type: Task Components:

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089651#comment-16089651 ] Sean Owen commented on SPARK-21401: --- That's why using the Scala implementation may be better. It should

[jira] [Created] (SPARK-21440) Refactor ArrowConverters and add DecimalType, ArrayType and StructType support.

2017-07-17 Thread Takuya Ueshin (JIRA)
Takuya Ueshin created SPARK-21440: - Summary: Refactor ArrowConverters and add DecimalType, ArrayType and StructType support. Key: SPARK-21440 URL: https://issues.apache.org/jira/browse/SPARK-21440

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089629#comment-16089629 ] Peng Meng commented on SPARK-21401: --- Thanks @srowen. I mean for BoundedPriorityQueue, you also can

[jira] [Commented] (SPARK-21398) Data on Rest end point is not updating after first fetch

2017-07-17 Thread Shubham Gupta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21398?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089627#comment-16089627 ] Shubham Gupta commented on SPARK-21398: --- I am using yarn so , I called using

[jira] [Created] (SPARK-21439) Cannot use Spark with Python ABCmeta (exception from cloudpickle)

2017-07-17 Thread Maciej Bryński (JIRA)
Maciej Bryński created SPARK-21439: -- Summary: Cannot use Spark with Python ABCmeta (exception from cloudpickle) Key: SPARK-21439 URL: https://issues.apache.org/jira/browse/SPARK-21439 Project: Spark

[jira] [Assigned] (SPARK-21441) Incorrect Codegen in SortMergeJoinExec results failures in some cases

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21441: Assignee: (was: Apache Spark) > Incorrect Codegen in SortMergeJoinExec results

[jira] [Assigned] (SPARK-21441) Incorrect Codegen in SortMergeJoinExec results failures in some cases

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21441: Assignee: Apache Spark > Incorrect Codegen in SortMergeJoinExec results failures in some

[jira] [Commented] (SPARK-21441) Incorrect Codegen in SortMergeJoinExec results failures in some cases

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089807#comment-16089807 ] Apache Spark commented on SPARK-21441: -- User 'DonnyZone' has created a pull request for this issue:

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089638#comment-16089638 ] Peng Meng commented on SPARK-21401: --- I mean we totally rewrite the BoundedPriorityQueue, not use Java

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089641#comment-16089641 ] Sean Owen commented on SPARK-21401: --- I think it's not worth writing and maintaining a custom

[jira] [Resolved] (SPARK-21438) Update to scalatest 3.x

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21438. --- Resolution: Duplicate This duplicates the changes / discussions in the Scala 2.12 umbrella. Yes I

[jira] [Updated] (SPARK-21438) Update to scalatest 3.x

2017-07-17 Thread PJ Fanning (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-21438: --- Issue Type: Sub-task (was: Task) Parent: SPARK-14220 > Update to scalatest 3.x >

[jira] [Updated] (SPARK-21439) Cannot use Spark with Python ABCmeta (exception from cloudpickle)

2017-07-17 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Bryński updated SPARK-21439: --- Component/s: PySpark > Cannot use Spark with Python ABCmeta (exception from cloudpickle) >

[jira] [Assigned] (SPARK-21441) Incorrect Codegen in SortMergeJoinExec results failures in some cases

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21441: Assignee: (was: Apache Spark) > Incorrect Codegen in SortMergeJoinExec results

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089729#comment-16089729 ] Apache Spark commented on SPARK-21435: -- User 'xuanyuanking' has created a pull request for this

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089730#comment-16089730 ] Apache Spark commented on SPARK-21435: -- User 'xuanyuanking' has created a pull request for this

[jira] [Assigned] (SPARK-21440) Refactor ArrowConverters and add DecimalType, ArrayType and StructType support.

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21440: Assignee: Apache Spark > Refactor ArrowConverters and add DecimalType, ArrayType and

[jira] [Created] (SPARK-21441) Incorrect Codegen in SortMergeJoinExec results failures in some cases

2017-07-17 Thread Feng Zhu (JIRA)
Feng Zhu created SPARK-21441: Summary: Incorrect Codegen in SortMergeJoinExec results failures in some cases Key: SPARK-21441 URL: https://issues.apache.org/jira/browse/SPARK-21441 Project: Spark

[jira] [Assigned] (SPARK-21441) Incorrect Codegen in SortMergeJoinExec results failures in some cases

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21441: Assignee: Apache Spark > Incorrect Codegen in SortMergeJoinExec results failures in some

[jira] [Commented] (SPARK-21433) Spark SQL should support higher version of Hive metastore

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089414#comment-16089414 ] Sean Owen commented on SPARK-21433: --- What does this have to do with the Hive version? What version?

[jira] [Resolved] (SPARK-21431) Unable to parallelize a list in Scala

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21431. --- Resolution: Not A Problem > Unable to parallelize a list in Scala >

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089450#comment-16089450 ] Sean Owen commented on SPARK-21435: --- I don't think that's sufficient, not necessarily. You need to run

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089437#comment-16089437 ] Peng Meng commented on SPARK-21401: --- I benchmarking just change pq.toArray.sorted. and pq.poll. pq.poll

[jira] [Assigned] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21435: Assignee: Apache Spark > Empty files should be skipped while write to file >

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089448#comment-16089448 ] Sean Owen commented on SPARK-21401: --- What about the time needed to build the queue? insertions are

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089581#comment-16089581 ] Sean Owen commented on SPARK-21401: --- Scratch that, I get the issue now. We can't remove the queue

[jira] [Comment Edited] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089586#comment-16089586 ] Peng Meng edited comment on SPARK-21401 at 7/17/17 10:10 AM: - I have tested

[jira] [Created] (SPARK-21434) Add PySpark pip documentation

2017-07-17 Thread holdenk (JIRA)
holdenk created SPARK-21434: --- Summary: Add PySpark pip documentation Key: SPARK-21434 URL: https://issues.apache.org/jira/browse/SPARK-21434 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089415#comment-16089415 ] Sean Owen commented on SPARK-21401: --- I can't see how polling would be faster. When you poll you have to

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089438#comment-16089438 ] Apache Spark commented on SPARK-21435: -- User 'xuanyuanking' has created a pull request for this

[jira] [Assigned] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21435: Assignee: (was: Apache Spark) > Empty files should be skipped while write to file >

[jira] [Assigned] (SPARK-21383) YARN can allocate to many executors

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21383: Assignee: Apache Spark > YARN can allocate to many executors >

[jira] [Created] (SPARK-21437) Java Keyword cannot be used in table schema

2017-07-17 Thread pin_zhang (JIRA)
pin_zhang created SPARK-21437: - Summary: Java Keyword cannot be used in table schema Key: SPARK-21437 URL: https://issues.apache.org/jira/browse/SPARK-21437 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-21435. -- Resolution: Duplicate I am resolving this as a duplicate of SPARK-20065. I believe the fix

[jira] [Updated] (SPARK-21433) Spark SQL should support higher version of Hive metastore

2017-07-17 Thread Gu Chao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Chao updated SPARK-21433: Description: Now. Spark SQL supports Hive metastore versions ranging from 0.12.0 to 1.2.1 (inclusive).

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089424#comment-16089424 ] Peng Meng commented on SPARK-21401: --- Hi [~srowen], for ALS optimization, the difference of using poll

[jira] [Updated] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-21435: -- Priority: Minor (was: Major) Not major. This seems to relate to other JIRAs. Are you sure you can

[jira] [Commented] (SPARK-21430) Add PMML support to SparkR

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089595#comment-16089595 ] Sean Owen commented on SPARK-21430: --- That's going to be a big effort. Are you working on it? Not sure

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089431#comment-16089431 ] Sean Owen commented on SPARK-21401: --- You're benchmarking a lot of other changes all at once. I don't

[jira] [Created] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Li Yuanjian (JIRA)
Li Yuanjian created SPARK-21435: --- Summary: Empty files should be skipped while write to file Key: SPARK-21435 URL: https://issues.apache.org/jira/browse/SPARK-21435 Project: Spark Issue Type:

[jira] [Resolved] (SPARK-21397) Maven shade plugin adding dependency-reduced-pom.xml to base directory

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21397. --- Resolution: Not A Problem > Maven shade plugin adding dependency-reduced-pom.xml to base directory >

[jira] [Commented] (SPARK-21383) YARN can allocate to many executors

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089459#comment-16089459 ] Apache Spark commented on SPARK-21383: -- User 'djvulee' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21383) YARN can allocate to many executors

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21383: Assignee: (was: Apache Spark) > YARN can allocate to many executors >

[jira] [Updated] (SPARK-21383) YARN can allocate too many executors

2017-07-17 Thread DjvuLee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DjvuLee updated SPARK-21383: Summary: YARN can allocate too many executors (was: YARN can allocate to many executors) > YARN can

[jira] [Updated] (SPARK-21433) Spark SQL should support higher version of Hive metastore

2017-07-17 Thread Gu Chao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gu Chao updated SPARK-21433: Description: Now. Spark SQL supports Hive metastore versions ranging from 0.12.0 to 1.2.1 (inclusive).

[jira] [Reopened] (SPARK-21431) Unable to parallelize a list in Scala

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-21431: --- > Unable to parallelize a list in Scala > Key:

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089453#comment-16089453 ] Hyukjin Kwon commented on SPARK-21435: -- I guess this is a duplicate of SPARK-20065 (issue itself

[jira] [Commented] (SPARK-21435) Empty files should be skipped while write to file

2017-07-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089455#comment-16089455 ] Hyukjin Kwon commented on SPARK-21435: -- Yea, particularly, in this case, many issues are related I

[jira] [Commented] (SPARK-21340) Bring PySpark MLLib evaluation metrics to parity with Scala API

2017-07-17 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089487#comment-16089487 ] Marco Gaido commented on SPARK-21340: - [~jake.charland] I submitted a PR but I am not sure it will be

[jira] [Commented] (SPARK-21437) Java Keyword cannot be used in table schema

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21437?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089527#comment-16089527 ] Sean Owen commented on SPARK-21437: --- Almost certainly because of code generation, because {{const}} is

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089578#comment-16089578 ] Peng Meng commented on SPARK-21401: --- Hi [~srowen], I got why my original test pq.toArray.sorted is very

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089586#comment-16089586 ] Peng Meng commented on SPARK-21401: --- I have tested much about poll and toArray.sorted. If the queue is

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089589#comment-16089589 ] Sean Owen commented on SPARK-21401: --- Yes, I agree with you now. I find they're roughly the same, but,

[jira] [Commented] (SPARK-21401) add poll function for BoundedPriorityQueue

2017-07-17 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089606#comment-16089606 ] Peng Meng commented on SPARK-21401: --- I think the BoundedPriorityQueue should be rewritten. there are

[jira] [Updated] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread eugen yushin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugen yushin updated SPARK-21442: - Environment: version 2.1.0-mapr-1703 Using Scala version 2.11.8 (OpenJDK 64-Bit Server VM, Java

[jira] [Resolved] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-21442. -- Resolution: Duplicate > Spark CSV writer trims trailing spaces >

[jira] [Created] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread eugen yushin (JIRA)
eugen yushin created SPARK-21442: Summary: Spark CSV writer trims trailing spaces Key: SPARK-21442 URL: https://issues.apache.org/jira/browse/SPARK-21442 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-21415) Triage scapegoat warnings, part 1

2017-07-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089915#comment-16089915 ] Kazuaki Ishizaki commented on SPARK-21415: -- I see. When another JIRA will happen for these

[jira] [Updated] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread eugen yushin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugen yushin updated SPARK-21442: - Affects Version/s: 2.1.0 > Spark CSV writer trims trailing spaces >

[jira] [Comment Edited] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread eugen yushin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089927#comment-16089927 ] eugen yushin edited comment on SPARK-21442 at 7/17/17 2:45 PM: --- Agree,

[jira] [Commented] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread eugen yushin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089927#comment-16089927 ] eugen yushin commented on SPARK-21442: -- Agree, thanks for quick response > Spark CSV writer trims

[jira] [Created] (SPARK-21443) Very long planning duration for queries with lots of operations

2017-07-17 Thread Eyal Zituny (JIRA)
Eyal Zituny created SPARK-21443: --- Summary: Very long planning duration for queries with lots of operations Key: SPARK-21443 URL: https://issues.apache.org/jira/browse/SPARK-21443 Project: Spark

[jira] [Commented] (SPARK-14922) Alter Table Drop Partition Using Predicate-based Partition Spec

2017-07-17 Thread David Hodeffi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089897#comment-16089897 ] David Hodeffi commented on SPARK-14922: --- got this error too, when are you going to fix that, any

[jira] [Updated] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread eugen yushin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugen yushin updated SPARK-21442: - Affects Version/s: 2.1.1 > Spark CSV writer trims trailing spaces >

[jira] [Commented] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16089923#comment-16089923 ] Hyukjin Kwon commented on SPARK-21442: -- Please check out the examples in

[jira] [Updated] (SPARK-21442) Spark CSV writer trims trailing spaces

2017-07-17 Thread eugen yushin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] eugen yushin updated SPARK-21442: - Affects Version/s: (was: 2.1.1) > Spark CSV writer trims trailing spaces >

[jira] [Commented] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-17 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090003#comment-16090003 ] Imran Rashid commented on SPARK-21425: -- Just to make sure I understand, the problem here is only

[jira] [Commented] (SPARK-21433) Spark SQL should support higher version of Hive metastore

2017-07-17 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090035#comment-16090035 ] Marcelo Vanzin commented on SPARK-21433: I have no idea what is being asked here. Spark 2.2

[jira] [Commented] (SPARK-18085) SPIP: Better History Server scalability for many / large applications

2017-07-17 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090042#comment-16090042 ] Marcelo Vanzin commented on SPARK-18085: > SPIP: Better History Server scalability for many /

[jira] [Assigned] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20871: Assignee: Apache Spark > Only log Janino code in debug mode >

[jira] [Commented] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090099#comment-16090099 ] Apache Spark commented on SPARK-20871: -- User 'pjfanning' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20871: Assignee: (was: Apache Spark) > Only log Janino code in debug mode >

[jira] [Updated] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-20871: -- Priority: Trivial (was: Major) Fix Version/s: (was: 2.3.0) > Only log Janino code in

[jira] [Resolved] (SPARK-21221) CrossValidator and TrainValidationSplit Persist Nested Estimators such as OneVsRest

2017-07-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-21221. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18428

[jira] [Updated] (SPARK-21392) Unable to infer schema when loading large Parquet file

2017-07-17 Thread Stuart Reynolds (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stuart Reynolds updated SPARK-21392: Description: The following boring code works up until when I read in the parquet file.

[jira] [Updated] (SPARK-21392) Unable to infer schema when loading large Parquet file

2017-07-17 Thread Stuart Reynolds (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stuart Reynolds updated SPARK-21392: Description: The following boring code works up until when I read in the parquet file.

[jira] [Commented] (SPARK-21392) Unable to infer schema when loading large Parquet file

2017-07-17 Thread Stuart Reynolds (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090134#comment-16090134 ] Stuart Reynolds commented on SPARK-21392: - I've made the example self contained and sourced from

[jira] [Commented] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090146#comment-16090146 ] Sean Owen commented on SPARK-20871: --- Got it, so it can fail in normal operation. I can see turning down

[jira] [Updated] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread PJ Fanning (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-20871: --- Component/s: (was: Spark Core) SQL > Only log Janino code in debug mode >

[jira] [Updated] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread PJ Fanning (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] PJ Fanning updated SPARK-20871: --- Fix Version/s: 2.3.0 > Only log Janino code in debug mode >

[jira] [Commented] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090106#comment-16090106 ] Sean Owen commented on SPARK-20871: --- Why would you not want this information, if compilation has

[jira] [Commented] (SPARK-20871) Only log Janino code in debug mode

2017-07-17 Thread PJ Fanning (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090125#comment-16090125 ] PJ Fanning commented on SPARK-20871: [~srowen] One of the main reasons for the Janino compile to fail

[jira] [Commented] (SPARK-21425) LongAccumulator, DoubleAccumulator not threadsafe

2017-07-17 Thread Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21425?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090017#comment-16090017 ] Ryan Williams commented on SPARK-21425: --- Yea, static accumulators; I was able to reproduce it in

[jira] [Resolved] (SPARK-21321) Spark very verbose on shutdown confusing users

2017-07-17 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-21321. --- Resolution: Fixed Assignee: Jong Yoon Lee Fix Version/s: 2.3.0

[jira] [Commented] (SPARK-21392) Unable to infer schema when loading large Parquet file

2017-07-17 Thread Stuart Reynolds (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090159#comment-16090159 ] Stuart Reynolds commented on SPARK-21392: - So trying to look at the csv was helpful.

[jira] [Commented] (SPARK-20307) SparkR: pass on setHandleInvalid to spark.mllib functions that use StringIndexer

2017-07-17 Thread Miao Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090213#comment-16090213 ] Miao Wang commented on SPARK-20307: --- [~yanboliang] Have you added the support on Python side? I think

[jira] [Commented] (SPARK-21443) Very long planning duration for queries with lots of operations

2017-07-17 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090242#comment-16090242 ] Kazuaki Ishizaki commented on SPARK-21443: -- These two optimizations

[jira] [Commented] (SPARK-11182) HDFS Delegation Token will be expired when calling "UserGroupInformation.getCurrentUser.addCredentials" in HA mode

2017-07-17 Thread John Zhuge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16090183#comment-16090183 ] John Zhuge commented on SPARK-11182: Is this issue fixed by HDFS-9276? > HDFS Delegation Token will