[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mark Grover (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174149#comment-15174149 ] Mark Grover commented on SPARK-12177: - Thanks Cody, I appreciate your thoughts. I have been keeping

[jira] [Created] (SPARK-13600) Incorrect number of buckets in QuantileDiscretizer

2016-03-01 Thread Oliver Pierson (JIRA)
Oliver Pierson created SPARK-13600: -- Summary: Incorrect number of buckets in QuantileDiscretizer Key: SPARK-13600 URL: https://issues.apache.org/jira/browse/SPARK-13600 Project: Spark Issue

[jira] [Updated] (SPARK-13583) Support `UnusedImports` Java checkstyle rule

2016-03-01 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-13583: -- Priority: Minor (was: Trivial) Description: After SPARK-6990, `dev/lint-java` keeps

[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-03-01 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174193#comment-15174193 ] Shubhanshu Mishra commented on SPARK-13525: --- I am running my code from the interactive R

[jira] [Resolved] (SPARK-10694) Prevent Data Loss in Spark Streaming when used with OFF_HEAP ExternalBlockStore (Tachyon)

2016-03-01 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-10694. Resolution: Won't Fix Due to the removal of the ExternalBlockStore API in SPARK-12667, I think

[jira] [Updated] (SPARK-13449) Naive Bayes wrapper in SparkR

2016-03-01 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-13449: -- Description: Following SPARK-13011, we can add a wrapper for naive Bayes in SparkR. R's naive

[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-03-01 Thread Shubhanshu Mishra (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174191#comment-15174191 ] Shubhanshu Mishra commented on SPARK-13525: --- Hi [~felixcheung], I used the steps from the link

[jira] [Resolved] (SPARK-6112) Provide external block store support through HDFS RAM_DISK

2016-03-01 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-6112. --- Resolution: Won't Fix > Provide external block store support through HDFS RAM_DISK >

[jira] [Resolved] (SPARK-10314) [CORE]RDD persist to OFF_HEAP tachyon got block rdd_x_x not found exception when parallelism is big than data split size

2016-03-01 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-10314. Resolution: Cannot Reproduce Due to the removal of the ExternalBlockStore API in SPARK-12667, I

[jira] [Updated] (SPARK-13449) Naive Bayes wrapper in SparkR

2016-03-01 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-13449: -- Shepherd: Yanbo Liang > Naive Bayes wrapper in SparkR > - > >

[jira] [Commented] (SPARK-6112) Provide external block store support through HDFS RAM_DISK

2016-03-01 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174238#comment-15174238 ] Josh Rosen commented on SPARK-6112: --- Due to the removal of the ExternalBlockStore API in SPARK-12667, I

[jira] [Issue Comment Deleted] (SPARK-10659) DataFrames and SparkSQL saveAsParquetFile does not preserve REQUIRED (not nullable) flag in schema

2016-03-01 Thread Paul Greyson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Paul Greyson updated SPARK-10659: - Comment: was deleted (was: I believe this makes predicate pushdown in parquet useless due to

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174212#comment-15174212 ] Cody Koeninger commented on SPARK-12177: I'm happy to help in whatever way. If people think a

[jira] [Resolved] (SPARK-7477) TachyonBlockManager Store Block in TRY_CACHE mode which gives BlockNotFoundException when blocks are evicted from cache

2016-03-01 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-7477. --- Resolution: Cannot Reproduce Resolving as "Cannot Reproduce" / "Won't Fix" for now. Due to the

[jira] [Commented] (SPARK-13073) creating R like summary for logistic Regression in Spark - Scala

2016-03-01 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174691#comment-15174691 ] Joseph K. Bradley commented on SPARK-13073: --- It sounds reasonable to provide the same printed

[jira] [Created] (SPARK-13605) Bean encoder cannot handle nonbean properties - no way to Encode nonbean Java objects with columns

2016-03-01 Thread Steven Lewis (JIRA)
Steven Lewis created SPARK-13605: Summary: Bean encoder cannot handle nonbean properties - no way to Encode nonbean Java objects with columns Key: SPARK-13605 URL:

[jira] [Commented] (SPARK-13573) Open SparkR APIs (R package) to allow better 3rd party usage

2016-03-01 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174726#comment-15174726 ] Sun Rui commented on SPARK-13573: - [~chipsenkbeil] glad to know Toree is to support SparkR. I tried it

[jira] [Resolved] (SPARK-13548) Move tags and unsafe modules into common

2016-03-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13548. - Resolution: Fixed Fix Version/s: 2/ > Move tags and unsafe modules into common >

[jira] [Updated] (SPARK-13548) Move tags and unsafe modules into common

2016-03-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-13548: Fix Version/s: (was: 2/) 2.0.0 > Move tags and unsafe modules into common >

[jira] [Commented] (SPARK-13525) SparkR: java.net.SocketTimeoutException: Accept timed out when running any dataframe function

2016-03-01 Thread Sun Rui (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174720#comment-15174720 ] Sun Rui commented on SPARK-13525: - the interactive R session is for your driver, Rscript is needed for

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mark Grover (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174856#comment-15174856 ] Mark Grover commented on SPARK-12177: - Hi [~tdas] and [~rxin], can you help us with your opinion on

[jira] [Commented] (SPARK-13596) Move misc top-level build files into appropriate subdirs

2016-03-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13596?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174642#comment-15174642 ] Reynold Xin commented on SPARK-13596: - Are those dot files even possible to move? > Move misc

[jira] [Created] (SPARK-13603) SQL generation for subquery

2016-03-01 Thread Davies Liu (JIRA)
Davies Liu created SPARK-13603: -- Summary: SQL generation for subquery Key: SPARK-13603 URL: https://issues.apache.org/jira/browse/SPARK-13603 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-13030) Change OneHotEncoder to Estimator

2016-03-01 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174685#comment-15174685 ] Joseph K. Bradley commented on SPARK-13030: --- I agree this is an issue, but I think we need to

[jira] [Updated] (SPARK-13604) Sync worker's state after registering with master

2016-03-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13604: - Description: If Master cannot talk with Worker for a while and then network is back, Worker may

[jira] [Assigned] (SPARK-13604) Sync worker's state after registering with master

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13604: Assignee: Apache Spark (was: Shixiong Zhu) > Sync worker's state after registering with

[jira] [Created] (SPARK-13604) Sync worker's state after registering with master

2016-03-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-13604: Summary: Sync worker's state after registering with master Key: SPARK-13604 URL: https://issues.apache.org/jira/browse/SPARK-13604 Project: Spark Issue

[jira] [Closed] (SPARK-13586) add config to skip generate down time batch when restart StreamingContext

2016-03-01 Thread jeanlyn (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jeanlyn closed SPARK-13586. --- Resolution: Invalid > add config to skip generate down time batch when restart StreamingContext >

[jira] [Updated] (SPARK-13604) Sync worker's state after registering with master

2016-03-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13604: - Description: If Master cannot talk with Worker for a while and then network is back, Worker may

[jira] [Assigned] (SPARK-13604) Sync worker's state after registering with master

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13604: Assignee: Shixiong Zhu (was: Apache Spark) > Sync worker's state after registering with

[jira] [Updated] (SPARK-13604) Sync worker's state after registering with master

2016-03-01 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-13604: - Description: If Master cannot talk with Worker for a while and then network is back, Worker may

[jira] [Commented] (SPARK-13604) Sync worker's state after registering with master

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174724#comment-15174724 ] Apache Spark commented on SPARK-13604: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Commented] (SPARK-13230) HashMap.merged not working properly with Spark

2016-03-01 Thread Łukasz Gieroń (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174809#comment-15174809 ] Łukasz Gieroń commented on SPARK-13230: --- [~srowen] Can you please assign me to this ticket? I have

[jira] [Resolved] (SPARK-13167) JDBC data source does not include null value partition columns rows in the result.

2016-03-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-13167. - Resolution: Fixed Assignee: Suresh Thalamati Fix Version/s: 2.0.0 > JDBC data

[jira] [Commented] (SPARK-7768) Make user-defined type (UDT) API public

2016-03-01 Thread Jaka Jancar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174807#comment-15174807 ] Jaka Jancar commented on SPARK-7768: [~randallwhitman] UDT, not UDF:

[jira] [Comment Edited] (SPARK-7768) Make user-defined type (UDT) API public

2016-03-01 Thread Randall Whitman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15169380#comment-15169380 ] Randall Whitman edited comment on SPARK-7768 at 3/2/16 1:47 AM: Am I

[jira] [Commented] (SPARK-13213) BroadcastNestedLoopJoin is very slow

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174441#comment-15174441 ] Apache Spark commented on SPARK-13213: -- User 'davies' has created a pull request for this issue:

[jira] [Commented] (SPARK-13307) TPCDS query 66 degraded by 30% in 1.6.0 compared to 1.4.1

2016-03-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174501#comment-15174501 ] Xiao Li commented on SPARK-13307: - Really thank you for your response. > TPCDS query 66 degraded by 30%

[jira] [Created] (SPARK-13602) o.a.s.deploy.worker.DriverRunner may leak the driver processes

2016-03-01 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-13602: Summary: o.a.s.deploy.worker.DriverRunner may leak the driver processes Key: SPARK-13602 URL: https://issues.apache.org/jira/browse/SPARK-13602 Project: Spark

[jira] [Assigned] (SPARK-13603) SQL generation for subquery

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13603: Assignee: Apache Spark (was: Davies Liu) > SQL generation for subquery >

[jira] [Assigned] (SPARK-13603) SQL generation for subquery

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13603: Assignee: Davies Liu (was: Apache Spark) > SQL generation for subquery >

[jira] [Commented] (SPARK-13603) SQL generation for subquery

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13603?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174649#comment-15174649 ] Apache Spark commented on SPARK-13603: -- User 'davies' has created a pull request for this issue:

[jira] [Commented] (SPARK-13574) Improve parquet dictionary decoding for strings

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174656#comment-15174656 ] Apache Spark commented on SPARK-13574: -- User 'nongli' has created a pull request for this issue:

[jira] [Resolved] (SPARK-13598) Remove LeftSemiJoinBNL

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-13598. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11448

[jira] [Commented] (SPARK-13592) pyspark failed to launch on Windows client

2016-03-01 Thread Masayoshi TSUZUKI (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173472#comment-15173472 ] Masayoshi TSUZUKI commented on SPARK-13592: --- The error message says {quote}

[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-03-01 Thread Daniel Jouany (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173489#comment-15173489 ] Daniel Jouany commented on SPARK-10795: --- Hi - I am facing the exact same problem. However * I do

[jira] [Comment Edited] (SPARK-13587) Support virtualenv in PySpark

2016-03-01 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173228#comment-15173228 ] Jeff Zhang edited comment on SPARK-13587 at 3/1/16 8:30 AM: This method is

[jira] [Created] (SPARK-13592) pyspark failed to launch on Windows client

2016-03-01 Thread Masayoshi TSUZUKI (JIRA)
Masayoshi TSUZUKI created SPARK-13592: - Summary: pyspark failed to launch on Windows client Key: SPARK-13592 URL: https://issues.apache.org/jira/browse/SPARK-13592 Project: Spark Issue

[jira] [Assigned] (SPARK-13592) pyspark failed to launch on Windows client

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13592: Assignee: Apache Spark > pyspark failed to launch on Windows client >

[jira] [Commented] (SPARK-13592) pyspark failed to launch on Windows client

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173474#comment-15173474 ] Apache Spark commented on SPARK-13592: -- User 'tsudukim' has created a pull request for this issue:

[jira] [Assigned] (SPARK-13592) pyspark failed to launch on Windows client

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13592: Assignee: (was: Apache Spark) > pyspark failed to launch on Windows client >

[jira] [Assigned] (SPARK-13591) Remove Back-ticks in Attribute/Alias Names

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13591: Assignee: (was: Apache Spark) > Remove Back-ticks in Attribute/Alias Names >

[jira] [Commented] (SPARK-13591) Remove Back-ticks in Attribute/Alias Names

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173418#comment-15173418 ] Apache Spark commented on SPARK-13591: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Assigned] (SPARK-13591) Remove Back-ticks in Attribute/Alias Names

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13591: Assignee: Apache Spark > Remove Back-ticks in Attribute/Alias Names >

[jira] [Commented] (SPARK-10795) FileNotFoundException while deploying pyspark job on cluster

2016-03-01 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173503#comment-15173503 ] Jeff Zhang commented on SPARK-10795: What's your spark version ? And is it possible for you attach

[jira] [Commented] (SPARK-13117) WebUI should use the local ip not 0.0.0.0

2016-03-01 Thread Jeremiah Jordan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173511#comment-15173511 ] Jeremiah Jordan commented on SPARK-13117: - If you want minimal side effects to this then you can

[jira] [Closed] (SPARK-10712) JVM crashes with spark.sql.tungsten.enabled = true

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu closed SPARK-10712. -- Resolution: Cannot Reproduce > JVM crashes with spark.sql.tungsten.enabled = true >

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174534#comment-15174534 ] Mansi Shah commented on SPARK-12177: Thanks for the explanation Cody. I understand committing from

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174490#comment-15174490 ] Mansi Shah commented on SPARK-12177: Sorry I forgot to mention that the numbers I quoted above were

[jira] [Commented] (SPARK-13463) Support Column pruning for Dataset logical plan

2016-03-01 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174506#comment-15174506 ] Xiao Li commented on SPARK-13463: - I see your points. Thank you! Let me try to write such test cases. : )

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174507#comment-15174507 ] Cody Koeninger commented on SPARK-12177: Thanks for the example of performance numbers. The

[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-01 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174438#comment-15174438 ] Nicholas Chammas commented on SPARK-7481: - Many people seem to be downgrading to use Spark built

[jira] [Updated] (SPARK-12313) getPartitionsByFilter doesnt handle predicates on all / multiple Partition Columns

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-12313: --- Assignee: Harsh Gupta > getPartitionsByFilter doesnt handle predicates on all / multiple Partition

[jira] [Updated] (SPARK-13499) Optimize vectorized parquet reader for dictionary encoded data and RLE decoding

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-13499: --- Assignee: Davies Liu (was: Nong Li) > Optimize vectorized parquet reader for dictionary encoded

[jira] [Commented] (SPARK-13463) Support Column pruning for Dataset logical plan

2016-03-01 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174448#comment-15174448 ] Michael Armbrust commented on SPARK-13463: -- If you are reading in a wide parquet schema and you

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174485#comment-15174485 ] Mansi Shah commented on SPARK-12177: Caching the new consumer across batches. I did not know how to

[jira] [Created] (SPARK-13601) Invoke task failure callbacks before calling outputstream.close()

2016-03-01 Thread Davies Liu (JIRA)
Davies Liu created SPARK-13601: -- Summary: Invoke task failure callbacks before calling outputstream.close() Key: SPARK-13601 URL: https://issues.apache.org/jira/browse/SPARK-13601 Project: Spark

[jira] [Commented] (SPARK-7481) Add Hadoop 2.6+ profile to pull in object store FS accessors

2016-03-01 Thread Peng Cheng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-7481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174442#comment-15174442 ] Peng Cheng commented on SPARK-7481: --- +1 Me four > Add Hadoop 2.6+ profile to pull in object store FS

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Mansi Shah (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174392#comment-15174392 ] Mansi Shah commented on SPARK-12177: Cody / Mark I am glad you are discussing about caching the

[jira] [Updated] (SPARK-13499) Optimize vectorized parquet reader for dictionary encoded data and RLE decoding

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu updated SPARK-13499: --- Assignee: Nong Li (was: Davies Liu) > Optimize vectorized parquet reader for dictionary encoded

[jira] [Resolved] (SPARK-13582) Improve performance of parquet reader with dictionary encoding

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Davies Liu resolved SPARK-13582. Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request 11437

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174415#comment-15174415 ] Cody Koeninger commented on SPARK-12177: Mansi are you talking about performance improvements

[jira] [Updated] (SPARK-13319) Pyspark VectorSlicer, StopWordsRemvoer should have setDefault

2016-03-01 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-13319: -- Summary: Pyspark VectorSlicer, StopWordsRemvoer should have setDefault (was: Pyspark VectorSlicer

[jira] [Updated] (SPARK-13319) Pyspark VectorSlicer, StopWordsRemvoer should have setDefault

2016-03-01 Thread Xusen Yin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xusen Yin updated SPARK-13319: -- Description: Pyspark VectorSlicer should have setDefault, otherwise it will cause error when calling

[jira] [Commented] (SPARK-13174) Add API and options for csv data sources

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174451#comment-15174451 ] Davies Liu commented on SPARK-13174: We may still need Python and R API, also some convenient

[jira] [Commented] (SPARK-13307) TPCDS query 66 degraded by 30% in 1.6.0 compared to 1.4.1

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174473#comment-15174473 ] Davies Liu commented on SPARK-13307: In the plan, I saw that the column pruning does not work well,

[jira] [Assigned] (SPARK-13601) Invoke task failure callbacks before calling outputstream.close()

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13601: Assignee: Davies Liu (was: Apache Spark) > Invoke task failure callbacks before calling

[jira] [Assigned] (SPARK-13601) Invoke task failure callbacks before calling outputstream.close()

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13601?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13601: Assignee: Apache Spark (was: Davies Liu) > Invoke task failure callbacks before calling

[jira] [Commented] (SPARK-13601) Invoke task failure callbacks before calling outputstream.close()

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174433#comment-15174433 ] Apache Spark commented on SPARK-13601: -- User 'davies' has created a pull request for this issue:

[jira] [Commented] (SPARK-13463) Support Column pruning for Dataset logical plan

2016-03-01 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174445#comment-15174445 ] Davies Liu commented on SPARK-13463: [~smilegator] It could be true, then we just remove the TODO,

[jira] [Assigned] (SPARK-13576) Make examples jar not be an assembly

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13576: Assignee: Apache Spark > Make examples jar not be an assembly >

[jira] [Assigned] (SPARK-13576) Make examples jar not be an assembly

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13576: Assignee: (was: Apache Spark) > Make examples jar not be an assembly >

[jira] [Commented] (SPARK-13576) Make examples jar not be an assembly

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13576?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174587#comment-15174587 ] Apache Spark commented on SPARK-13576: -- User 'vanzin' has created a pull request for this issue:

[jira] [Commented] (SPARK-13174) Add API and options for csv data sources

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174912#comment-15174912 ] Apache Spark commented on SPARK-13174: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Assigned] (SPARK-13174) Add API and options for csv data sources

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13174: Assignee: (was: Apache Spark) > Add API and options for csv data sources >

[jira] [Commented] (SPARK-13606) Error from python worker: /usr/local/bin/python2.7: undefined symbol: _PyCodec_LookupTextEncoding

2016-03-01 Thread Avatar Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174953#comment-15174953 ] Avatar Zhang commented on SPARK-13606: -- /usr/local/bin/python2.7 can launch normally.

[jira] [Commented] (SPARK-13587) Support virtualenv in PySpark

2016-03-01 Thread Mike Sukmanowsky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175010#comment-15175010 ] Mike Sukmanowsky commented on SPARK-13587: -- Gotcha. I might suggest

[jira] [Comment Edited] (SPARK-12177) Update KafkaDStreams to new Kafka 0.9 Consumer API

2016-03-01 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175066#comment-15175066 ] Reynold Xin edited comment on SPARK-12177 at 3/2/16 5:32 AM: - This thread is

[jira] [Comment Edited] (SPARK-13141) Dataframe created from Hive partitioned tables using HiveContext returns wrong results

2016-03-01 Thread zhichao-li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13141?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174895#comment-15174895 ] zhichao-li edited comment on SPARK-13141 at 3/2/16 2:29 AM: Just try, but

[jira] [Assigned] (SPARK-13174) Add API and options for csv data sources

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13174: Assignee: Apache Spark > Add API and options for csv data sources >

[jira] [Commented] (SPARK-13073) creating R like summary for logistic Regression in Spark - Scala

2016-03-01 Thread Gayathri Murali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174943#comment-15174943 ] Gayathri Murali commented on SPARK-13073: - I can work on this, can you please assign it to me? >

[jira] [Commented] (SPARK-13606) Error from python worker: /usr/local/bin/python2.7: undefined symbol: _PyCodec_LookupTextEncoding

2016-03-01 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174941#comment-15174941 ] Jeff Zhang commented on SPARK-13606: This might be a Python environment issue. Can you launch python on

[jira] [Commented] (SPARK-13025) Allow user to specify the initial model when training LogisticRegression

2016-03-01 Thread Gayathri Murali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15174983#comment-15174983 ] Gayathri Murali commented on SPARK-13025: - PR : https://github.com/apache/spark/pull/11458 >

[jira] [Comment Edited] (SPARK-13587) Support virtualenv in PySpark

2016-03-01 Thread Jeff Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13587?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173228#comment-15173228 ] Jeff Zhang edited comment on SPARK-13587 at 3/2/16 4:17 AM: This method is

[jira] [Commented] (SPARK-13609) Support Column Pruning for MapPartitions

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15175011#comment-15175011 ] Apache Spark commented on SPARK-13609: -- User 'gatorsmile' has created a pull request for this issue:

[jira] [Assigned] (SPARK-13609) Support Column Pruning for MapPartitions

2016-03-01 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13609: Assignee: (was: Apache Spark) > Support Column Pruning for MapPartitions >

[jira] [Created] (SPARK-13611) import Aggregator doesn't work in Spark Shell

2016-03-01 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-13611: --- Summary: import Aggregator doesn't work in Spark Shell Key: SPARK-13611 URL: https://issues.apache.org/jira/browse/SPARK-13611 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-13612) Multiplication of BigDecimal columns not working as expected

2016-03-01 Thread Varadharajan (JIRA)
Varadharajan created SPARK-13612: Summary: Multiplication of BigDecimal columns not working as expected Key: SPARK-13612 URL: https://issues.apache.org/jira/browse/SPARK-13612 Project: Spark

[jira] [Updated] (SPARK-13010) Survival analysis in SparkR

2016-03-01 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-13010: -- Shepherd: yuhao yang (was: Xiangrui Meng) > Survival analysis in SparkR >

[jira] [Updated] (SPARK-13435) Add Weighted Cohen's kappa to MulticlassMetrics

2016-03-01 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-13435: -- Shepherd: (was: Xiangrui Meng) > Add Weighted Cohen's kappa to MulticlassMetrics >
