[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Harish (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565493#comment-15565493 ] Harish commented on SPARK-17463: It looks like a show stopper for my current project. Can you please let

[jira] [Created] (SPARK-17873) ALTER TABLE ... RENAME TO ... should allow users to specify database in destination table name

2016-10-11 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-17873: --- Summary: ALTER TABLE ... RENAME TO ... should allow users to specify database in destination table name Key: SPARK-17873 URL: https://issues.apache.org/jira/browse/SPARK-17873

[jira] [Commented] (SPARK-17873) ALTER TABLE ... RENAME TO ... should allow users to specify database in destination table name

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565623#comment-15565623 ] Apache Spark commented on SPARK-17873: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17873) ALTER TABLE ... RENAME TO ... should allow users to specify database in destination table name

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17873: Assignee: Apache Spark (was: Wenchen Fan) > ALTER TABLE ... RENAME TO ... should allow

[jira] [Updated] (SPARK-17870) ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peng Meng updated SPARK-17870: -- Summary: ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong (was: ML/MLLIB:

[jira] [Updated] (SPARK-8425) Add blacklist mechanism for task scheduling

2016-10-11 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid updated SPARK-8425: Attachment: DesignDocforBlacklistMechanism.pdf Seems like there is agreement on the design, so I'm

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565559#comment-15565559 ] Cody Koeninger commented on SPARK-17853: Good, will keep this ticket open at least until

[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-10-11 Thread Don Drake (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565592#comment-15565592 ] Don Drake commented on SPARK-16845: --- Unfortunately, it does not work around it. 16/10/10 18:19:47

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565400#comment-15565400 ] Cody Koeninger commented on SPARK-17853: Use a different group id. Let me know if that addresses

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Aleksander Ihnatowicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565448#comment-15565448 ] Aleksander Ihnatowicz commented on SPARK-17853: --- Setting different group ids solved the
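
For readers following the SPARK-17853 thread, here is a minimal sketch of what "a different group id" per Kafka cluster might look like as consumer configuration. The host names, the group names, and the helper `kafka_params` are hypothetical and do not come from the ticket; only the `bootstrap.servers` and `group.id` keys are standard Kafka consumer settings.

```python
def kafka_params(bootstrap_servers, group_id):
    """Build a consumer configuration dict for one Kafka cluster."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
    }

# Hypothetical hosts and group names: one consumer group per cluster,
# even though both clusters carry a topic with the same name.
cluster_a = kafka_params("kafka-a.example.com:9092", "union-job-cluster-a")
cluster_b = kafka_params("kafka-b.example.com:9092", "union-job-cluster-b")

print(cluster_a["group.id"], cluster_b["group.id"])
```

Each dict would then be passed to the stream created for its own cluster; how that is wired up depends on the integration version in use.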

[jira] [Resolved] (SPARK-17656) Decide on the variant of @scala.annotation.varargs and use consistently

2016-10-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-17656. -- Resolution: Fixed This was fixed in the PR together. > Decide on the variant of

[jira] [Assigned] (SPARK-17822) JVMObjectTracker.objMap may leak JVM objects

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17822: Assignee: (was: Apache Spark) > JVMObjectTracker.objMap may leak JVM objects >

[jira] [Commented] (SPARK-17822) JVMObjectTracker.objMap may leak JVM objects

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565374#comment-15565374 ] Apache Spark commented on SPARK-17822: -- User 'techaddict' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17822) JVMObjectTracker.objMap may leak JVM objects

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17822: Assignee: Apache Spark > JVMObjectTracker.objMap may leak JVM objects >

[jira] [Comment Edited] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Harish (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565493#comment-15565493 ] Harish edited comment on SPARK-17463 at 10/11/16 2:03 PM: -- It looks like a show

[jira] [Created] (SPARK-17872) aggregate function on dataset with tuples grouped by non sequential fields

2016-10-11 Thread Niek Bartholomeus (JIRA)
Niek Bartholomeus created SPARK-17872: - Summary: aggregate function on dataset with tuples grouped by non sequential fields Key: SPARK-17872 URL: https://issues.apache.org/jira/browse/SPARK-17872

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565225#comment-15565225 ] Peng Meng commented on SPARK-17870: --- yes, the selectKBest and selectPercentile in scikit learn only use

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Piotr Guzik (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565307#comment-15565307 ] Piotr Guzik commented on SPARK-17853: - Hi. We are using version 0-10. We are also using the same

[jira] [Created] (SPARK-17871) Dataset joinwith syntax should support specifying the condition in a compile-time safe way

2016-10-11 Thread Jamie Hutton (JIRA)
Jamie Hutton created SPARK-17871: Summary: Dataset joinwith syntax should support specifying the condition in a compile-time safe way Key: SPARK-17871 URL: https://issues.apache.org/jira/browse/SPARK-17871

[jira] [Commented] (SPARK-17808) BinaryType fails in Python 3 due to outdated Pyrolite

2016-10-11 Thread Pete Fein (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565470#comment-15565470 ] Pete Fein commented on SPARK-17808: --- Any reason this can't be included in the next 2.0.x bug fix

[jira] [Updated] (SPARK-17872) aggregate function on dataset with tuples grouped by non sequential fields

2016-10-11 Thread Niek Bartholomeus (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17872?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Niek Bartholomeus updated SPARK-17872: -- Description: The following lines where the field index in the tuple used in an

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565278#comment-15565278 ] Cody Koeninger commented on SPARK-17853: Which version of DStream are you using, 0-10 or 0-8? Are

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565251#comment-15565251 ] Peng Meng commented on SPARK-17870: --- The scikit learn code is here:

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565315#comment-15565315 ] Peng Meng commented on SPARK-17870: --- https://github.com/apache/spark/pull/1484#issuecomment-51024568 Hi

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565180#comment-15565180 ] Sean Owen commented on SPARK-17870: --- I don't think the raw statistic can be directly compared here

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565238#comment-15565238 ] Sean Owen commented on SPARK-17870: --- I don't quite understand this example, can you point me to the

[jira] [Commented] (SPARK-4411) Add "kill" link for jobs in the UI

2016-10-11 Thread Alex Bozarth (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566430#comment-15566430 ] Alex Bozarth commented on SPARK-4411: - I'm currently working on this. I'm updating the original pr to

[jira] [Comment Edited] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Harish (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566427#comment-15566427 ] Harish edited comment on SPARK-17463 at 10/11/16 8:06 PM: -- No, I don't have any

[jira] [Commented] (SPARK-17870) ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Alexander Ulanov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566467#comment-15566467 ] Alexander Ulanov commented on SPARK-17870: --

[jira] [Comment Edited] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Harish (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566515#comment-15566515 ] Harish edited comment on SPARK-17463 at 10/11/16 8:34 PM: -- My second approach

[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Harish (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566515#comment-15566515 ] Harish commented on SPARK-17463: My second approach was: def testfunc(keys, vals, columnsToStandardize):

[jira] [Assigned] (SPARK-17845) Improve window function frame boundary API in DataFrame

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17845: Assignee: Apache Spark (was: Reynold Xin) > Improve window function frame boundary API

[jira] [Created] (SPARK-17879) Don't compact metadata logs constantly into a single compacted file

2016-10-11 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-17879: --- Summary: Don't compact metadata logs constantly into a single compacted file Key: SPARK-17879 URL: https://issues.apache.org/jira/browse/SPARK-17879 Project: Spark

[jira] [Commented] (SPARK-17709) spark 2.0 join - column resolution error

2016-10-11 Thread Ashish Shrowty (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566160#comment-15566160 ] Ashish Shrowty commented on SPARK-17709: Cool, thanks. Will do this in the next day or two. > spark

[jira] [Commented] (SPARK-17811) SparkR cannot parallelize data.frame with NA or NULL in Date columns

2016-10-11 Thread Miao Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566163#comment-15566163 ] Miao Wang commented on SPARK-17811: --- :) Just want to submit a PR and found that you have a fix. Good to

[jira] [Assigned] (SPARK-17876) Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17876: Assignee: (was: Apache Spark) > Write StructuredStreaming WAL to a stream instead of

[jira] [Resolved] (SPARK-17817) PySpark RDD Repartitioning Results in Highly Skewed Partition Sizes

2016-10-11 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17817?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-17817. -- Resolution: Fixed Assignee: Liang-Chi Hsieh Fix Version/s: 2.1.0 > PySpark RDD

[jira] [Commented] (SPARK-17870) ML/MLLIB: Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565041#comment-15565041 ] Peng Meng commented on SPARK-17870: --- Hi [~srowen], thanks very much for your quick reply. Yes, the

[jira] [Comment Edited] (SPARK-14272) Evaluate GaussianMixtureModel with LogLikelihood

2016-10-11 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565069#comment-15565069 ] zhengruifeng edited comment on SPARK-14272 at 10/11/16 10:07 AM: - Yes, I

[jira] [Commented] (SPARK-14272) Evaluate GaussianMixtureModel with LogLikelihood

2016-10-11 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565069#comment-15565069 ] zhengruifeng commented on SPARK-14272: -- Yes, I will update after SPARK-17847 gets merged >

[jira] [Updated] (SPARK-14272) Evaluate GaussianMixtureModel with LogLikelihood

2016-10-11 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng updated SPARK-14272: - Component/s: (was: MLlib) ML > Evaluate GaussianMixtureModel with

[jira] [Commented] (SPARK-9879) OOM in LIMIT clause with large number

2016-10-11 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566639#comment-15566639 ] Dongjoon Hyun commented on SPARK-9879: -- Hi, All. The PR seems to be closed last December. Can we

[jira] [Updated] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated SPARK-17877: Description: The following code demonstrates the issue {code} import

[jira] [Commented] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15565973#comment-15565973 ] Apache Spark commented on SPARK-17139: -- User 'WeichenXu123' has created a pull request for this

[jira] [Updated] (SPARK-17858) Provide option for Spark SQL to skip corrupt files

2016-10-11 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17858: - Description: In Spark 2.0, corrupt files will fail a SQL query. However, the user may just want

[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566104#comment-15566104 ] Sean Owen commented on SPARK-17463: --- What do you mean? This has been released already in 2.0.1. >

[jira] [Commented] (SPARK-17808) BinaryType fails in Python 3 due to outdated Pyrolite

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566107#comment-15566107 ] Sean Owen commented on SPARK-17808: --- I think it could be OK. It's a bug fix, and while it is a minor

[jira] [Commented] (SPARK-17709) spark 2.0 join - column resolution error

2016-10-11 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566124#comment-15566124 ] Xiao Li commented on SPARK-17709: - Below is the link:

[jira] [Commented] (SPARK-17709) spark 2.0 join - column resolution error

2016-10-11 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566126#comment-15566126 ] Xiao Li commented on SPARK-17709: - Below is the link:

[jira] [Created] (SPARK-17875) Remove unneeded direct dependence on Netty 3.x

2016-10-11 Thread Sean Owen (JIRA)
Sean Owen created SPARK-17875: - Summary: Remove unneeded direct dependence on Netty 3.x Key: SPARK-17875 URL: https://issues.apache.org/jira/browse/SPARK-17875 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2016-10-11 Thread Artur Sukhenko (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-4105: -- Affects Version/s: 2.0.0 > FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with

[jira] [Commented] (SPARK-15343) NoClassDefFoundError when initializing Spark with YARN

2016-10-11 Thread Jo Desmet (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566019#comment-15566019 ] Jo Desmet commented on SPARK-15343: --- By design we apparently have a very tight coupling of scheduling

[jira] [Assigned] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17139: Assignee: Apache Spark > Add model summary for MultinomialLogisticRegression >

[jira] [Assigned] (SPARK-17139) Add model summary for MultinomialLogisticRegression

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17139?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17139: Assignee: (was: Apache Spark) > Add model summary for MultinomialLogisticRegression >

[jira] [Commented] (SPARK-17709) spark 2.0 join - column resolution error

2016-10-11 Thread Ashish Shrowty (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566114#comment-15566114 ] Ashish Shrowty commented on SPARK-17709: I assume I would need to modify the Spark code and build

[jira] [Updated] (SPARK-17874) Additional SSL port on HistoryServer should be configurable

2016-10-11 Thread Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash updated SPARK-17874: --- Summary: Additional SSL port on HistoryServer should be configurable (was: Enabling SSL on

[jira] [Created] (SPARK-17874) Enabling SSL on HistoryServer should only open one port not two

2016-10-11 Thread Andrew Ash (JIRA)
Andrew Ash created SPARK-17874: -- Summary: Enabling SSL on HistoryServer should only open one port not two Key: SPARK-17874 URL: https://issues.apache.org/jira/browse/SPARK-17874 Project: Spark

[jira] [Assigned] (SPARK-17875) Remove unneeded direct dependence on Netty 3.x

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17875: Assignee: Apache Spark (was: Sean Owen) > Remove unneeded direct dependence on Netty 3.x

[jira] [Commented] (SPARK-17875) Remove unneeded direct dependence on Netty 3.x

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566136#comment-15566136 ] Apache Spark commented on SPARK-17875: -- User 'srowen' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17875) Remove unneeded direct dependence on Netty 3.x

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17875: Assignee: Sean Owen (was: Apache Spark) > Remove unneeded direct dependence on Netty 3.x

[jira] [Commented] (SPARK-15153) SparkR spark.naiveBayes throws error when label is numeric type

2016-10-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566251#comment-15566251 ] Joseph K. Bradley commented on SPARK-15153: --- Note I'm setting the target version for 2.1, not

[jira] [Updated] (SPARK-17812) More granular control of starting offsets

2016-10-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17812: - Description: Right now you can only run a Streaming Query starting from either the

[jira] [Comment Edited] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566385#comment-15566385 ] Alexander Pivovarov edited comment on SPARK-17877 at 10/11/16 7:50 PM:

[jira] [Created] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
Alexander Pivovarov created SPARK-17877: --- Summary: Can not checkpoint connectedComponents resulting graph Key: SPARK-17877 URL: https://issues.apache.org/jira/browse/SPARK-17877 Project: Spark

[jira] [Updated] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated SPARK-17877: Description: The following code demonstrates an issue {code} import

[jira] [Updated] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated SPARK-17877: Description: The following code demonstrates the issue {code} import

[jira] [Resolved] (SPARK-15153) SparkR spark.naiveBayes throws error when label is numeric type

2016-10-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-15153. --- Resolution: Fixed Fix Version/s: 2.1.0 Issue resolved by pull request 15431

[jira] [Updated] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated SPARK-17877: Description: The following code demonstrates the issue {code} import

[jira] [Updated] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexander Pivovarov updated SPARK-17877: Description: The following code demonstrates the issue {code} import

[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566391#comment-15566391 ] Shixiong Zhu commented on SPARK-17463: -- Do you have a reproducer? I saw `at

[jira] [Updated] (SPARK-17812) More granular control of starting offsets (assign)

2016-10-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-17812: - Summary: More granular control of starting offsets (assign) (was: More granular control

[jira] [Commented] (SPARK-17812) More granular control of starting offsets

2016-10-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566392#comment-15566392 ] Michael Armbrust commented on SPARK-17812: -- For the seeking back {{X}} offsets use case, I was

[jira] [Assigned] (SPARK-17876) Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17876: Assignee: Apache Spark > Write StructuredStreaming WAL to a stream instead of

[jira] [Commented] (SPARK-17876) Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17876?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566181#comment-15566181 ] Apache Spark commented on SPARK-17876: -- User 'brkyvz' has created a pull request for this issue:

[jira] [Commented] (SPARK-17870) ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566284#comment-15566284 ] Sean Owen commented on SPARK-17870: --- OK I get it, they're doing different things really. The scikit

[jira] [Comment Edited] (SPARK-12216) Spark failed to delete temp directory

2016-10-11 Thread Jerome Scheuring (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566349#comment-15566349 ] Jerome Scheuring edited comment on SPARK-12216 at 10/11/16 7:34 PM:

[jira] [Commented] (SPARK-12216) Spark failed to delete temp directory

2016-10-11 Thread Jerome Scheuring (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566349#comment-15566349 ] Jerome Scheuring commented on SPARK-12216: -- _Note that I am entirely new to the process of

[jira] [Commented] (SPARK-17845) Improve window function frame boundary API in DataFrame

2016-10-11 Thread Timothy Hunter (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566380#comment-15566380 ] Timothy Hunter commented on SPARK-17845: I like the {{Window.rowsBetween(Long.MinValue, -3)}}
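
As an illustration of the frame quoted in the SPARK-17845 comment, the sketch below shows in plain Python which rows a frame such as `rowsBetween(Long.MinValue, -3)` would cover. It assumes `Long.MinValue` is meant as "unbounded preceding" and that offsets are inclusive and relative to the current row. The function `rows_between` and the sample data are hypothetical and are not Spark code.

```python
import sys

# Stand-in for Scala's Long.MinValue, assumed here to mean
# "unbounded preceding" in the quoted call.
UNBOUNDED_PRECEDING = -sys.maxsize - 1

def rows_between(partition, current, start, end):
    """Return the rows of an ordered partition inside a row frame.

    start and end are inclusive offsets relative to the current row index.
    """
    lo = max(0, current + start)
    hi = min(len(partition) - 1, current + end)
    # An empty frame arises when the upper bound falls before the first row.
    return partition[lo:hi + 1] if lo <= hi else []

rows = [10, 20, 30, 40, 50, 60]
# Frame for the row at index 4 (value 50): everything up to 3 rows before it.
print(rows_between(rows, 4, UNBOUNDED_PRECEDING, -3))  # [10, 20]
```

For the first three rows of a partition this frame is empty, which is one reason an explicit, named "unbounded" boundary can be easier to read than a sentinel value.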

[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Harish (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566206#comment-15566206 ] Harish commented on SPARK-17463: Is this fix part of the https://github.com/apache/spark/pull/15371

[jira] [Commented] (SPARK-17877) Can not checkpoint connectedComponents resulting graph

2016-10-11 Thread Alexander Pivovarov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566385#comment-15566385 ] Alexander Pivovarov commented on SPARK-17877: - Another open issue with checkpointing is

[jira] [Created] (SPARK-17876) Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-11 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-17876: --- Summary: Write StructuredStreaming WAL to a stream instead of materializing all at once Key: SPARK-17876 URL: https://issues.apache.org/jira/browse/SPARK-17876

[jira] [Commented] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-10-11 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566191#comment-15566191 ] Michael Armbrust commented on SPARK-17344: -- I think the fact that CDH is still distributing 0.9

[jira] [Commented] (SPARK-17344) Kafka 0.8 support for Structured Streaming

2016-10-11 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566244#comment-15566244 ] Cody Koeninger commented on SPARK-17344: How long would it take CDH to distribute 0.10 if there

[jira] [Updated] (SPARK-15153) SparkR spark.naiveBayes throws error when label is numeric type

2016-10-11 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-15153: -- Target Version/s: 2.1.0 > SparkR spark.naiveBayes throws error when label is numeric

[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566299#comment-15566299 ] Sean Owen commented on SPARK-17463: --- No, that change came after, and is part of a different JIRA that

[jira] [Commented] (SPARK-17463) Serialization of accumulators in heartbeats is not thread-safe

2016-10-11 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566300#comment-15566300 ] Sean Owen commented on SPARK-17463: --- No, that change came after, and is part of a different JIRA that

[jira] [Updated] (SPARK-17816) Json serialzation of accumulators are failing with ConcurrentModificationException

2016-10-11 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-17816: - Affects Version/s: 2.0.1 > Json serialzation of accumulators are failing with >

[jira] [Commented] (SPARK-17878) Support for multiple null values when reading CSV data

2016-10-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567046#comment-15567046 ] Hyukjin Kwon commented on SPARK-17878: -- Maybe it'd be nicer if options allow list or nested map (if

[jira] [Commented] (SPARK-17878) Support for multiple null values when reading CSV data

2016-10-11 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567104#comment-15567104 ] Hossein Falaki commented on SPARK-17878: That would require API change in SparkSQL. Otherwise, we

[jira] [Commented] (SPARK-17878) Support for multiple null values when reading CSV data

2016-10-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567124#comment-15567124 ] Hyukjin Kwon commented on SPARK-17878: -- Oh, I didn't mean I am against this. I am just wondering if

[jira] [Comment Edited] (SPARK-17878) Support for multiple null values when reading CSV data

2016-10-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567124#comment-15567124 ] Hyukjin Kwon edited comment on SPARK-17878 at 10/12/16 12:50 AM: - Oh, I

[jira] [Commented] (SPARK-17878) Support for multiple null values when reading CSV data

2016-10-11 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567149#comment-15567149 ] Hyukjin Kwon commented on SPARK-17878: -- BTW, maybe, I will try to investigate further if it is

[jira] [Commented] (SPARK-17878) Support for multiple null values when reading CSV data

2016-10-11 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567180#comment-15567180 ] Hossein Falaki commented on SPARK-17878: Sure. If passing a list is possible it is the better

[jira] [Commented] (SPARK-4411) Add "kill" link for jobs in the UI

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567028#comment-15567028 ] Apache Spark commented on SPARK-4411: - User 'ajbozarth' has created a pull request for this issue:

[jira] [Assigned] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17853: Assignee: (was: Apache Spark) > Kafka OffsetOutOfRangeException on DStreams union

[jira] [Assigned] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-17853: Assignee: Apache Spark > Kafka OffsetOutOfRangeException on DStreams union from separate

[jira] [Resolved] (SPARK-17387) Creating SparkContext() from python without spark-submit ignores user conf

2016-10-11 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-17387. Resolution: Fixed Assignee: Jeff Zhang Fix Version/s: 2.1.0 > Creating

[jira] [Commented] (SPARK-16845) org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificOrdering" grows beyond 64 KB

2016-10-11 Thread K (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15566741#comment-15566741 ] K commented on SPARK-16845: --- We manually wrote parts that were throwing errors (StringIndexer and

[jira] [Commented] (SPARK-17870) ML/MLLIB: ChiSquareSelector based on Statistics.chiSqTest(RDD) is wrong

2016-10-11 Thread Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17870?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567170#comment-15567170 ] Peng Meng commented on SPARK-17870: --- hi [~avulanov], the question here is not use raw chi2 scores or

[jira] [Commented] (SPARK-17853) Kafka OffsetOutOfRangeException on DStreams union from separate Kafka clusters with identical topic names.

2016-10-11 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15567199#comment-15567199 ] Apache Spark commented on SPARK-17853: -- User 'koeninger' has created a pull request for this issue:
