[jira] [Resolved] (SPARK-10483) spark-submit can not support symbol link

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-10483. --- Resolution: Duplicate Please have a look at

[jira] [Updated] (SPARK-10483) spark-submit can not support symbol link

2015-09-08 Thread xuqing (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuqing updated SPARK-10483: --- Environment: Red Hat Enterprise Linux Server release 6.4 (Santiago) (was: [root@xqwin03 bin]# cat

[jira] [Created] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen

2015-09-08 Thread Yi Zhou (JIRA)
Yi Zhou created SPARK-10484: --- Summary: [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen Key: SPARK-10484 URL: https://issues.apache.org/jira/browse/SPARK-10484 Project:

[jira] [Updated] (SPARK-10481) SPARK_PREPEND_CLASSES make spark-yarn related jar could not be found

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10481: -- Priority: Minor (was: Major) > SPARK_PREPEND_CLASSES make spark-yarn related jar could not be found >

[jira] [Commented] (SPARK-3369) Java mapPartitions Iterator->Iterable is inconsistent with Scala's Iterator->Iterator

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734495#comment-14734495 ] Sean Owen commented on SPARK-3369: -- I don't think there's a "why" -- just hasn't been done by someone who

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Description: Found that it lost task or GC OOM when below cross join happen. The left big table is ~1.2G

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Description: Found that it lost task or GC OOM when below cross join happen. The left big table is ~1.2G

[jira] [Commented] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join

2015-09-08 Thread Cheng Hao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734395#comment-14734395 ] Cheng Hao commented on SPARK-10484: --- In cartesian produce implementation, there is 2 level nested

[jira] [Created] (SPARK-10483) spark-submit can not support symbol link

2015-09-08 Thread xuqing (JIRA)
xuqing created SPARK-10483: -- Summary: spark-submit can not support symbol link Key: SPARK-10483 URL: https://issues.apache.org/jira/browse/SPARK-10483 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-10483) spark-submit can not support symbol link

2015-09-08 Thread xuqing (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuqing updated SPARK-10483: --- Description: Create a symbol link for spark-submit {quote} [root@xqwin03 bin]# ll spark-submit lrwxrwxrwx 1

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Issue Type: Improvement (was: Bug) > [Spark SQL] Come across lost task(timeout) or GC OOM error when

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Issue Type: Bug (was: Improvement) > [Spark SQL] Come across lost task(timeout) or GC OOM error when

[jira] [Created] (SPARK-10485) IF expression is not correctly resolved when one of the options have NullType

2015-09-08 Thread Antonio Jesus Navarro (JIRA)
Antonio Jesus Navarro created SPARK-10485: - Summary: IF expression is not correctly resolved when one of the options have NullType Key: SPARK-10485 URL: https://issues.apache.org/jira/browse/SPARK-10485

[jira] [Commented] (SPARK-10479) LogisticRegression copy should copy model summary if available

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734504#comment-14734504 ] Sean Owen commented on SPARK-10479: --- Seems OK, but this seems so logically related to SPARK-10480 that

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two table do cross join

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Summary: [Spark SQL] Come across lost task(timeout) or GC OOM error when two table do cross join (was:

[jira] [Assigned] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10484: Assignee: Apache Spark > [Spark SQL] Come across lost task(timeout) or GC OOM error when

[jira] [Assigned] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10484: Assignee: (was: Apache Spark) > [Spark SQL] Come across lost task(timeout) or GC OOM

[jira] [Commented] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734389#comment-14734389 ] Apache Spark commented on SPARK-10484: -- User 'chenghao-intel' has created a pull request for this

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Summary: [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join (was:

[jira] [Updated] (SPARK-10483) spark-submit can not support symbol link

2015-09-08 Thread xuqing (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuqing updated SPARK-10483: --- Description: Create a symbol link for spark-submit run spark-submit meets following errors: {color:red}

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when cross join happen

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Description: Found that it lost task or GC OOM when below cross join happen. The left big table is ~1.2G

[jira] [Comment Edited] (SPARK-6350) Make mesosExecutorCores configurable in mesos "fine-grained" mode

2015-09-08 Thread Iulian Dragos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734575#comment-14734575 ] Iulian Dragos edited comment on SPARK-6350 at 9/8/15 10:26 AM: --- I'm

[jira] [Reopened] (SPARK-6350) Make mesosExecutorCores configurable in mesos "fine-grained" mode

2015-09-08 Thread Iulian Dragos (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Iulian Dragos reopened SPARK-6350: -- I'm re-opening this, since in the meantime this regressed. See changes in d86bbb, which regressed

[jira] [Commented] (SPARK-10479) LogisticRegression copy should copy model summary if available

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734606#comment-14734606 ] Sean Owen commented on SPARK-10479: --- This is already being fixed in

[jira] [Created] (SPARK-10486) Spark intermittently fails to recover from a worker failure (in standalone mode)

2015-09-08 Thread Cheuk Lam (JIRA)
Cheuk Lam created SPARK-10486: - Summary: Spark intermittently fails to recover from a worker failure (in standalone mode) Key: SPARK-10486 URL: https://issues.apache.org/jira/browse/SPARK-10486 Project:

[jira] [Updated] (SPARK-10486) Spark intermittently fails to recover from a worker failure (in standalone mode)

2015-09-08 Thread Cheuk Lam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheuk Lam updated SPARK-10486: -- Description: We have run into a problem where some Spark job is aborted after one worker is killed in

[jira] [Commented] (SPARK-5421) SparkSql throw OOM at shuffle

2015-09-08 Thread Romi Kuntsman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5421?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734561#comment-14734561 ] Romi Kuntsman commented on SPARK-5421: -- does this still happen on the latest version? I got some OOM

[jira] [Commented] (SPARK-9610) Class and instance weighting for ML

2015-09-08 Thread Nickolay Yakushev (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734556#comment-14734556 ] Nickolay Yakushev commented on SPARK-9610: -- 1. Is basic statistics a good candidate for this

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Description: Found that it lost task or GC OOM when below cross join happen. The left big table is ~1.2G

[jira] [Updated] (SPARK-10484) [Spark SQL] Come across lost task(timeout) or GC OOM error when two tables do cross join

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yi Zhou updated SPARK-10484: Description: Found that it lost task or GC OOM when below cross join happen. The left big table is ~1.2G

[jira] [Updated] (SPARK-10479) LogisticRegression copy should copy model summary if available

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10479: -- Assignee: Yanbo Liang > LogisticRegression copy should copy model summary if available >

[jira] [Commented] (SPARK-10288) Add a rest client for Spark on Yarn

2015-09-08 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734637#comment-14734637 ] Steve Loughran commented on SPARK-10288: The long-haul filesystem communications is addressed by

[jira] [Commented] (SPARK-5791) [Spark SQL] show poor performance when multiple table do join operation

2015-09-08 Thread Yi Zhou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734721#comment-14734721 ] Yi Zhou commented on SPARK-5791: [~yhuai], Yes. Thank you ! > [Spark SQL] show poor performance when

[jira] [Updated] (SPARK-10480) ML.LinearRegressionModel.copy() can not use argument "extra"

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-10480: -- Assignee: Yanbo Liang > ML.LinearRegressionModel.copy() can not use argument "extra" >

[jira] [Commented] (SPARK-6350) Make mesosExecutorCores configurable in mesos "fine-grained" mode

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734706#comment-14734706 ] Apache Spark commented on SPARK-6350: - User 'dragos' has created a pull request for this issue:

[jira] [Updated] (SPARK-10486) Spark intermittently fails to recover from a worker failure (in standalone mode)

2015-09-08 Thread Cheuk Lam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheuk Lam updated SPARK-10486: -- Description: We have run into a problem where some Spark job is aborted after one worker is killed in

[jira] [Commented] (SPARK-9435) Java UDFs don't work with GROUP BY expressions

2015-09-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735416#comment-14735416 ] Michael Armbrust commented on SPARK-9435: - From a quick glance, the problem is likely that the

[jira] [Commented] (SPARK-10441) Cannot write timestamp to JSON

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735454#comment-14735454 ] Apache Spark commented on SPARK-10441: -- User 'yhuai' has created a pull request for this issue:

[jira] [Assigned] (SPARK-10492) Update Streaming documentation about rate limiting and backpressure

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10492: Assignee: Tathagata Das (was: Apache Spark) > Update Streaming documentation about rate

[jira] [Assigned] (SPARK-10492) Update Streaming documentation about rate limiting and backpressure

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10492: Assignee: Apache Spark (was: Tathagata Das) > Update Streaming documentation about rate

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735392#comment-14735392 ] Davies Liu commented on SPARK-8632: --- [~rxin] As [~justin.uang] suggested before, the batch mode will

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-08 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735406#comment-14735406 ] Justin Uang commented on SPARK-8632: Davies, what do you mean by upstream? I didn't quite understand

[jira] [Resolved] (SPARK-10470) ml.IsotonicRegressionModel.copy did not set parent

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10470. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8637

[jira] [Commented] (SPARK-10492) Update Streaming documentation about rate limiting and backpressure

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735510#comment-14735510 ] Apache Spark commented on SPARK-10492: -- User 'tdas' has created a pull request for this issue:

[jira] [Updated] (SPARK-10470) ml.IsotonicRegressionModel.copy did not set parent

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10470: -- Fix Version/s: 1.5.1 > ml.IsotonicRegressionModel.copy did not set parent >

[jira] [Commented] (SPARK-10309) Some tasks failed with Unable to acquire memory

2015-09-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735401#comment-14735401 ] Davies Liu commented on SPARK-10309: [~nadenf] In my case, the job finally finished (after retry), so

[jira] [Created] (SPARK-10492) Update Streaming documentation about rate limiting and backpressure

2015-09-08 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-10492: - Summary: Update Streaming documentation about rate limiting and backpressure Key: SPARK-10492 URL: https://issues.apache.org/jira/browse/SPARK-10492 Project: Spark

[jira] [Updated] (SPARK-10470) ml.IsotonicRegressionModel.copy did not set parent

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10470: -- Target Version/s: 1.6.0, 1.5.1 > ml.IsotonicRegressionModel.copy did not set parent >

[jira] [Created] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
Glenn Strycker created SPARK-10493: -- Summary: reduceByKey not returning distinct results Key: SPARK-10493 URL: https://issues.apache.org/jira/browse/SPARK-10493 Project: Spark Issue Type:

[jira] [Commented] (SPARK-10373) Move @since annotator to pyspark to be shared by all components

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735520#comment-14735520 ] Xiangrui Meng commented on SPARK-10373: --- No, this is for 1.6. > Move @since annotator to pyspark

[jira] [Resolved] (SPARK-10316) respect non-deterministic expressions in PhysicalOperation

2015-09-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10316. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8486

[jira] [Updated] (SPARK-10470) ml.IsotonicRegressionModel.copy did not set parent

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10470: -- Assignee: Yanbo Liang > ml.IsotonicRegressionModel.copy did not set parent >

[jira] [Closed] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB

2015-09-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin closed SPARK-6101. -- Resolution: Won't Fix Assignee: (was: Chris Fregly) Fix Version/s: (was: 1.6.0)

[jira] [Assigned] (SPARK-9014) Allow Python spark API to use built-in exponential operator

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9014: --- Assignee: (was: Apache Spark) > Allow Python spark API to use built-in exponential

[jira] [Assigned] (SPARK-9014) Allow Python spark API to use built-in exponential operator

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-9014: --- Assignee: Apache Spark > Allow Python spark API to use built-in exponential operator >

[jira] [Commented] (SPARK-9014) Allow Python spark API to use built-in exponential operator

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735664#comment-14735664 ] Apache Spark commented on SPARK-9014: - User '0x0FFF' has created a pull request for this issue:

[jira] [Commented] (SPARK-10442) select cast('false' as boolean) returns true

2015-09-08 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735682#comment-14735682 ] Yin Huai commented on SPARK-10442: -- A related Hive jira is

[jira] [Updated] (SPARK-9769) Add Python API for ml.feature.CountVectorizer

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-9769: - Summary: Add Python API for ml.feature.CountVectorizer (was: Add Python API for

[jira] [Commented] (SPARK-10441) Cannot write timestamp to JSON

2015-09-08 Thread Don Drake (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735690#comment-14735690 ] Don Drake commented on SPARK-10441: --- I see that PR 8597 was merged into master. Does master represent

[jira] [Closed] (SPARK-10482) Add Python interface for CountVectorizer

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng closed SPARK-10482. - Resolution: Duplicate > Add Python interface for CountVectorizer >

[jira] [Updated] (SPARK-9769) Add Python API for ml.feature.CountVectorizer

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9769?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-9769: - Assignee: holdenk Target Version/s: 1.6.0 Priority: Major (was: Minor)

[jira] [Commented] (SPARK-10408) Autoencoder

2015-09-08 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735706#comment-14735706 ] Debasish Das commented on SPARK-10408: -- [~avulanov] In MLP can we change BFGS to OWLQN and get L1

[jira] [Commented] (SPARK-10301) For struct type, if parquet's global schema has less fields than a file's schema, data reading will fail

2015-09-08 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735735#comment-14735735 ] Yin Huai commented on SPARK-10301: -- [~lian cheng] Let's also have a follow-up pr for the master branch

[jira] [Updated] (SPARK-10466) UnsafeRow exception in Sort-Based Shuffle with data spill

2015-09-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10466: Target Version/s: 1.5.1 (was: 1.5.0) > UnsafeRow exception in Sort-Based Shuffle with data spill

[jira] [Commented] (SPARK-2620) case class cannot be used as key for reduce

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735557#comment-14735557 ] Glenn Strycker commented on SPARK-2620: --- I am finding similar behavior for a non-case-class RDD...

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735626#comment-14735626 ] Glenn Strycker commented on SPARK-10493: Thanks for the speedy follow-up, [~frosner]! I'm

[jira] [Commented] (SPARK-10441) Cannot write timestamp to JSON

2015-09-08 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735702#comment-14735702 ] Yin Huai commented on SPARK-10441: -- [~dondrake] https://github.com/apache/spark/pull/8655 is the 1.5

[jira] [Updated] (SPARK-10301) For struct type, if parquet's global schema has less fields than a file's schema, data reading will fail

2015-09-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10301: - Labels: backport-needed (was: ) > For struct type, if parquet's global schema has less

[jira] [Resolved] (SPARK-10428) Struct fields read from parquet are mis-aligned

2015-09-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10428. -- Resolution: Fixed > Struct fields read from parquet are mis-aligned >

[jira] [Commented] (SPARK-9503) Mesos dispatcher NullPointerException (MesosClusterScheduler)

2015-09-08 Thread Sal Uryasev (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735750#comment-14735750 ] Sal Uryasev commented on SPARK-9503: Someone on my team is hitting the same bug. There is something

[jira] [Updated] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB

2015-09-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-6101: --- Affects Version/s: (was: 1.2.0) > Create a SparkSQL DataSource API implementation for DynamoDB >

[jira] [Assigned] (SPARK-10373) Move @since annotator to pyspark to be shared by all components

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10373: Assignee: Apache Spark (was: Davies Liu) > Move @since annotator to pyspark to be shared

[jira] [Assigned] (SPARK-10373) Move @since annotator to pyspark to be shared by all components

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10373: Assignee: Davies Liu (was: Apache Spark) > Move @since annotator to pyspark to be shared

[jira] [Commented] (SPARK-10373) Move @since annotator to pyspark to be shared by all components

2015-09-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735581#comment-14735581 ] Apache Spark commented on SPARK-10373: -- User 'davies' has created a pull request for this issue:

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Glenn Strycker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735653#comment-14735653 ] Glenn Strycker commented on SPARK-10493: Note: this only seems to be occurring "at scale" so far.

[jira] [Updated] (SPARK-10468) Verify schema before Dataframe select API call

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10468: -- Assignee: Vinod KC > Verify schema before Dataframe select API call >

[jira] [Updated] (SPARK-9717) Document persistence recommendation for MulticlassMetrics

2015-09-08 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9717: - Description: If a user wants to request multiple metrics from MulticlassMetrics, they

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-08 Thread Justin Uang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735749#comment-14735749 ] Justin Uang commented on SPARK-8632: I set the batch mode to be 100, which is the same as before

[jira] [Resolved] (SPARK-10441) Cannot write timestamp to JSON

2015-09-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10441. -- Resolution: Fixed Fix Version/s: 1.5.1 1.6.0 Issue resolved

[jira] [Commented] (SPARK-8632) Poor Python UDF performance because of RDD caching

2015-09-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735644#comment-14735644 ] Davies Liu commented on SPARK-8632: --- The upstream means child of current SparkPlan, could have other

[jira] [Updated] (SPARK-10474) Aggregation failed with unable to acquire memory

2015-09-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10474: Description: In aggregation case, a Lost task happened with below error. {code}

[jira] [Commented] (SPARK-10304) Partition discovery does not throw an exception if the dir structure is invalid

2015-09-08 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735710#comment-14735710 ] Zhan Zhang commented on SPARK-10304: Did more investigation. Currently all files are included

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Frank Rosner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735598#comment-14735598 ] Frank Rosner commented on SPARK-10493: -- Thanks for submitting the issue, [~glenn.strycker] :) Can

[jira] [Commented] (SPARK-10482) Add Python interface for CountVectorizer

2015-09-08 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735639#comment-14735639 ] holdenk commented on SPARK-10482: - This seems to duplicate

[jira] [Commented] (SPARK-10442) select cast('false' as boolean) returns true

2015-09-08 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735668#comment-14735668 ] Yin Huai commented on SPARK-10442: -- [~lian cheng] Looks like postgresql support more string literals

[jira] [Resolved] (SPARK-10468) Verify schema before Dataframe select API call

2015-09-08 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-10468. --- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8636

[jira] [Updated] (SPARK-10492) Update Streaming documentation about rate limiting and backpressure

2015-09-08 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-10492: -- Affects Version/s: (was: 1.5.0) > Update Streaming documentation about rate limiting and

[jira] [Resolved] (SPARK-10492) Update Streaming documentation about rate limiting and backpressure

2015-09-08 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-10492. --- Resolution: Fixed Fix Version/s: 1.5.0 > Update Streaming documentation about rate

[jira] [Comment Edited] (SPARK-10441) Cannot write timestamp to JSON

2015-09-08 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735702#comment-14735702 ] Yin Huai edited comment on SPARK-10441 at 9/8/15 10:00 PM: --- [~dondrake]

[jira] [Updated] (SPARK-9717) Document persistence recommendation for MulticlassMetrics

2015-09-08 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-9717: - Summary: Document persistence recommendation for MulticlassMetrics (was: Add persistence

[jira] [Commented] (SPARK-9717) Document persistence recommendation for MulticlassMetrics

2015-09-08 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735726#comment-14735726 ] Joseph K. Bradley commented on SPARK-9717: -- True. Changing this to document recommendation

[jira] [Created] (SPARK-10494) Multiple Python UDFs together with aggregation or sort merge join may cause OOM (failed to acquire memory)

2015-09-08 Thread Davies Liu (JIRA)
Davies Liu created SPARK-10494: -- Summary: Multiple Python UDFs together with aggregation or sort merge join may cause OOM (failed to acquire memory) Key: SPARK-10494 URL:

[jira] [Commented] (SPARK-10467) Vector is converted to tuple when extracted from Row using __getitem__

2015-09-08 Thread Alexey Grishchenko (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10467?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734780#comment-14734780 ] Alexey Grishchenko commented on SPARK-10467: Issue is not reproduced on master: {code} >>>

[jira] [Updated] (SPARK-10486) Spark intermittently fails to recover from a worker failure (in standalone mode)

2015-09-08 Thread Cheuk Lam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Cheuk Lam updated SPARK-10486: -- Description: We have run into a problem where some Spark job is aborted after one worker is killed in

[jira] [Comment Edited] (SPARK-6101) Create a SparkSQL DataSource API implementation for DynamoDB

2015-09-08 Thread Rustam Aliyev (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6101?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734830#comment-14734830 ] Rustam Aliyev edited comment on SPARK-6101 at 9/8/15 1:58 PM: -- What's the

[jira] [Commented] (SPARK-4940) Support more evenly distributing cores for Mesos mode

2015-09-08 Thread Martin Tapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734792#comment-14734792 ] Martin Tapp commented on SPARK-4940: I see your point and thinking about it, round-robin is excellent

[jira] [Commented] (SPARK-10493) reduceByKey not returning distinct results

2015-09-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735772#comment-14735772 ] Sean Owen commented on SPARK-10493: --- There are some key pieces of info missing, like what the key and

[jira] [Commented] (SPARK-10466) UnsafeRow exception in Sort-Based Shuffle with data spill

2015-09-08 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735776#comment-14735776 ] Davies Liu commented on SPARK-10466: [~chenghao] I tried your test case, it passed in master. Is

[jira] [Commented] (SPARK-10433) Gradient boosted trees

2015-09-08 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14735800#comment-14735800 ] Joseph K. Bradley commented on SPARK-10433: --- I had seen the input size growing, but I missed

[jira] [Resolved] (SPARK-10327) Cache Table is not working while subquery has alias in its project list

2015-09-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-10327. -- Resolution: Fixed Fix Version/s: 1.6.0 Issue resolved by pull request 8494
