[jira] [Created] (SPARK-16109) Separate out statfunctions in generated R doc

2016-06-21 Thread Shivaram Venkataraman (JIRA)
Shivaram Venkataraman created SPARK-16109: - Summary: Separate out statfunctions in generated R doc Key: SPARK-16109 URL: https://issues.apache.org/jira/browse/SPARK-16109 Project: Spark

[jira] [Updated] (SPARK-16108) Why is KMeansModel (scala) private?

2016-06-21 Thread Florian Golemo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Golemo updated SPARK-16108: --- Description: Hey guys, I was wondering, in the file KMeans.scala (org/apache/spark/ml/clus

[jira] [Updated] (SPARK-16107) Group GLM-related methods in generated doc

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-16107: -- Assignee: Junyang Qian > Group GLM-related methods in generated doc > -

[jira] [Commented] (SPARK-15917) Define the number of executors in standalone mode with an easy-to-use property

2016-06-21 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342188#comment-15342188 ] Marcelo Vanzin commented on SPARK-15917: Like Sean, standalone mode is not really

[jira] [Updated] (SPARK-16107) Group GLM-related methods in generated doc

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-16107: -- Labels: starter (was: ) > Group GLM-related methods in generated doc > ---

[jira] [Updated] (SPARK-16107) Group GLM-related methods in generated doc

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-16107: -- Description: Group API docs of spark.glm, glm, predict(GLM), summary(GLM), read/write.ml(GLM) u

[jira] [Commented] (SPARK-16086) Python UDF failed when there is no arguments

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342187#comment-15342187 ] Apache Spark commented on SPARK-16086: -- User 'davies' has created a pull request for

[jira] [Updated] (SPARK-16108) Why is KMeansModel (scala) private?

2016-06-21 Thread Florian Golemo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Golemo updated SPARK-16108: --- Description: Hey guys, I was wondering, in the file KMeans.scala (org/apache/spark/ml/clus

[jira] [Updated] (SPARK-16108) Why is KMeansModel (scala) private?

2016-06-21 Thread Florian Golemo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Florian Golemo updated SPARK-16108: --- Description: Hey guys, I was wondering, in the file KMeans.scala (org/apache/spark/ml/clus

[jira] [Updated] (SPARK-16107) Group GLM-related methods in generated doc

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-16107: -- Description: spark.glm: spark.glm, glm, predict(GLM), summary(GLM), read/write.ml(GLM) > Group

[jira] [Created] (SPARK-16108) Why is KMeansModel (scala) private?

2016-06-21 Thread Florian Golemo (JIRA)
Florian Golemo created SPARK-16108: -- Summary: Why is KMeansModel (scala) private? Key: SPARK-16108 URL: https://issues.apache.org/jira/browse/SPARK-16108 Project: Spark Issue Type: Improveme

[jira] [Commented] (SPARK-16107) Group GLM-related methods in generated doc

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342178#comment-15342178 ] Xiangrui Meng commented on SPARK-16107: --- ping [~junyangq] > Group GLM-related meth

[jira] [Created] (SPARK-16107) Group GLM-related methods in generated doc

2016-06-21 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-16107: - Summary: Group GLM-related methods in generated doc Key: SPARK-16107 URL: https://issues.apache.org/jira/browse/SPARK-16107 Project: Spark Issue Type: Sub-

[jira] [Commented] (SPARK-16090) Improve method grouping in SparkR generated docs

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342173#comment-15342173 ] Xiangrui Meng commented on SPARK-16090: --- I changed the issue type to umbrella since

[jira] [Commented] (SPARK-15980) Add PushPredicateThroughObjectConsumer rule to Optimizer.

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342155#comment-15342155 ] Apache Spark commented on SPARK-15980: -- User 'shafiquejamal' has created a pull requ

[jira] [Updated] (SPARK-16090) Improve method grouping in SparkR generated docs

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-16090: -- Issue Type: Umbrella (was: Improvement) > Improve method grouping in SparkR generated docs > -

[jira] [Commented] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-21 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342135#comment-15342135 ] Ryan Blue commented on SPARK-16032: --- I agree with the push to unify the Hive and DataSo

[jira] [Commented] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-21 Thread Hiroshi Inoue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342055#comment-15342055 ] Hiroshi Inoue commented on SPARK-16100: --- Deenar, I identified the reason of this bu

[jira] [Assigned] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16100: Assignee: Apache Spark > Aggregator fails with Tungsten error when complex types are used

[jira] [Commented] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342043#comment-15342043 ] Apache Spark commented on SPARK-16100: -- User 'inouehrs' has created a pull request f

[jira] [Assigned] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16100: Assignee: (was: Apache Spark) > Aggregator fails with Tungsten error when complex type

[jira] [Created] (SPARK-16106) TaskSchedulerImpl does not correctly handle new executors on existing hosts

2016-06-21 Thread Imran Rashid (JIRA)
Imran Rashid created SPARK-16106: Summary: TaskSchedulerImpl does not correctly handle new executors on existing hosts Key: SPARK-16106 URL: https://issues.apache.org/jira/browse/SPARK-16106 Project:

[jira] [Updated] (SPARK-15177) SparkR 2.0 QA: New R APIs and API docs for mllib.R

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-15177: -- Description: Audit new public R APIs in mllib.R (was: Audit new public R APIs in mllib.R.) >

[jira] [Updated] (SPARK-15177) SparkR 2.0 QA: make SparkR model params and default values consistent with MLlib

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-15177: -- Summary: SparkR 2.0 QA: make SparkR model params and default values consistent with MLlib (was

[jira] [Updated] (SPARK-15177) SparkR 2.0 QA: New R APIs and API docs for mllib.R

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-15177: -- Shepherd: Xiangrui Meng > SparkR 2.0 QA: New R APIs and API docs for mllib.R >

[jira] [Updated] (SPARK-15177) SparkR 2.0 QA: make SparkR model params and default values consistent with MLlib

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-15177: -- Description: Make SparkR model params and default values consistent with MLlib (was: Audit new

[jira] [Resolved] (SPARK-15177) SparkR 2.0 QA: New R APIs and API docs for mllib.R

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-15177. --- Resolution: Fixed Fix Version/s: 2.0.0 Target Version/s: 2.0.0 I marked this

[jira] [Updated] (SPARK-15177) SparkR 2.0 QA: New R APIs and API docs for mllib.R

2016-06-21 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-15177: -- Assignee: Yanbo Liang > SparkR 2.0 QA: New R APIs and API docs for mllib.R > --

[jira] [Commented] (SPARK-16015) Datasource register for shutdown?

2016-06-21 Thread Michael Nitschinger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341976#comment-15341976 ] Michael Nitschinger commented on SPARK-16015: - Yeah that sounds fair - althou

[jira] [Commented] (SPARK-16095) Yarn cluster mode should return consistent result for command line and SparkLauncher

2016-06-21 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341903#comment-15341903 ] Thomas Graves commented on SPARK-16095: --- FINISHED does not mean success, finished m

[jira] [Commented] (SPARK-12177) Update KafkaDStreams to new Kafka 0.10 Consumer API

2016-06-21 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12177?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341859#comment-15341859 ] Cody Koeninger commented on SPARK-12177: Thanks for the comments, clarified where

[jira] [Created] (SPARK-16105) PCA Reverse Transformer

2016-06-21 Thread Stefan Panayotov (JIRA)
Stefan Panayotov created SPARK-16105: Summary: PCA Reverse Transformer Key: SPARK-16105 URL: https://issues.apache.org/jira/browse/SPARK-16105 Project: Spark Issue Type: New Feature

[jira] [Commented] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-21 Thread Deenar Toraskar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341847#comment-15341847 ] Deenar Toraskar commented on SPARK-16100: - similar issue > Aggregator fails with

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Deenar Toraskar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341844#comment-15341844 ] Deenar Toraskar commented on SPARK-15704: - done see https://issues.apache.org/jir

[jira] [Comment Edited] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD

2016-06-21 Thread Kevin Conaway (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341793#comment-15341793 ] Kevin Conaway edited comment on SPARK-16087 at 6/21/16 2:14 PM: ---

[jira] [Commented] (SPARK-16083) spark HistoryServer memory increases until gets killed by OS.

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341840#comment-15341840 ] Sean Owen commented on SPARK-16083: --- OK, that does look like the JVM is getting the op

[jira] [Commented] (SPARK-16015) Datasource register for shutdown?

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341835#comment-15341835 ] Sean Owen commented on SPARK-16015: --- How about some kind of pool that simply times out

[jira] [Issue Comment Deleted] (SPARK-16015) Datasource register for shutdown?

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-16015: -- Comment: was deleted (was: On the executor? I don't think there's a lifecycle there to manage, because

[jira] [Commented] (SPARK-16083) spark HistoryServer memory increases until gets killed by OS.

2016-06-21 Thread Sudhakar Thota (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341832#comment-15341832 ] Sudhakar Thota commented on SPARK-16083: 1. JVM is not allowed to use more than 1

[jira] [Updated] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD

2016-06-21 Thread Kevin Conaway (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Conaway updated SPARK-16087: -- Attachment: spark-16087.tar.gz I'm attaching a sample maven project that exhibits the issue as

[jira] [Commented] (SPARK-16087) Spark Hangs When Using Union With Persisted Hadoop RDD

2016-06-21 Thread Kevin Conaway (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341793#comment-15341793 ] Kevin Conaway commented on SPARK-16087: --- {quote} Are you actually using 1.4? {quote

[jira] [Commented] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341754#comment-15341754 ] Cheng Lian commented on SPARK-16032: [~rdblue], I also migrated some test cases from

[jira] [Commented] (SPARK-16037) use by-position resolution when insert into hive table

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341752#comment-15341752 ] Apache Spark commented on SPARK-16037: -- User 'liancheng' has created a pull request

[jira] [Commented] (SPARK-16015) Datasource register for shutdown?

2016-06-21 Thread Michael Nitschinger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341721#comment-15341721 ] Michael Nitschinger commented on SPARK-16015: - okay well what I wanted to avo

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341715#comment-15341715 ] Hyukjin Kwon commented on SPARK-15393: -- This is said in the comment above. The examp

[jira] [Updated] (SPARK-16098) Multiclass SVM Learning

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-16098: -- Flags: (was: Patch) Affects Version/s: (was: 2.1.0) Target Version/s: (was:

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-21 Thread Jurriaan Pruis (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341703#comment-15341703 ] Jurriaan Pruis commented on SPARK-15393: That's interesting because my example wo

[jira] [Commented] (SPARK-16069) rdd.map(identity).cache very slow

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341702#comment-15341702 ] Sean Owen commented on SPARK-16069: --- I don't think data has to be moved to the driver i

[jira] [Commented] (SPARK-16095) Yarn cluster mode should return consistent result for command line and SparkLauncher

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16095?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341695#comment-15341695 ] Sean Owen commented on SPARK-16095: --- Is this description correct? you're saying they bo

[jira] [Assigned] (SPARK-16104) Do not create CSV writer object for every flush when writing

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16104: Assignee: Apache Spark > Do not create CSV writer object for every flush when writing > -

[jira] [Commented] (SPARK-16104) Do not create CSV writer object for every flush when writing

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341693#comment-15341693 ] Apache Spark commented on SPARK-16104: -- User 'HyukjinKwon' has created a pull reques

[jira] [Assigned] (SPARK-16104) Do not create CSV writer object for every flush when writing

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16104: Assignee: (was: Apache Spark) > Do not create CSV writer object for every flush when

[jira] [Resolved] (SPARK-16091) Dataset.partitionBy.csv raise a java.io.FileNotFoundException when launched on an hadoop cluster

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16091. --- Resolution: Duplicate > Dataset.partitionBy.csv raise a java.io.FileNotFoundException when launched

[jira] [Commented] (SPARK-16015) Datasource register for shutdown?

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341687#comment-15341687 ] Sean Owen commented on SPARK-16015: --- On the executor? I don't think there's a lifecycle

[jira] [Commented] (SPARK-16015) Datasource register for shutdown?

2016-06-21 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341689#comment-15341689 ] Sean Owen commented on SPARK-16015: --- On the executor? I don't think there's a lifecycle

[jira] [Created] (SPARK-16104) Do not create CSV writer object for every flush when writing

2016-06-21 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-16104: Summary: Do not create CSV writer object for every flush when writing Key: SPARK-16104 URL: https://issues.apache.org/jira/browse/SPARK-16104 Project: Spark

[jira] [Created] (SPARK-16103) Share a single Row for CSV data source rather than creating every time

2016-06-21 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-16103: Summary: Share a single Row for CSV data source rather than creating every time Key: SPARK-16103 URL: https://issues.apache.org/jira/browse/SPARK-16103 Project: Spark

[jira] [Commented] (SPARK-16099) Refactoring CSV data source and improve performance

2016-06-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341659#comment-15341659 ] Hyukjin Kwon commented on SPARK-16099: -- Basically it was split because the origin

[jira] [Commented] (SPARK-16032) Audit semantics of various insertion operations related to partitioned tables

2016-06-21 Thread Cheng Lian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341658#comment-15341658 ] Cheng Lian commented on SPARK-16032: Hey [~rdblue], [~yhuai] and [~cloud_fan] had alr

[jira] [Created] (SPARK-16102) Use Record API from Univocity rather than current data cast API.

2016-06-21 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-16102: Summary: Use Record API from Univocity rather than current data cast API. Key: SPARK-16102 URL: https://issues.apache.org/jira/browse/SPARK-16102 Project: Spark

[jira] [Created] (SPARK-16101) Refactoring CSV data source to be consistent with JSON data source

2016-06-21 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-16101: Summary: Refactoring CSV data source to be consistent with JSON data source Key: SPARK-16101 URL: https://issues.apache.org/jira/browse/SPARK-16101 Project: Spark

[jira] [Updated] (SPARK-14480) Remove meaningless StringIteratorReader for CSV data source for better performance

2016-06-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-14480: - Issue Type: Sub-task (was: Improvement) Parent: SPARK-16099 > Remove meaningless StringI

[jira] [Updated] (SPARK-14480) Remove meaningless StringIteratorReader for CSV data source for better performance

2016-06-21 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-14480: - Summary: Remove meaningless StringIteratorReader for CSV data source for better performance (was

[jira] [Created] (SPARK-16100) Aggregator fails with Tungsten error when complex types are used for results and partial sum

2016-06-21 Thread Deenar Toraskar (JIRA)
Deenar Toraskar created SPARK-16100: --- Summary: Aggregator fails with Tungsten error when complex types are used for results and partial sum Key: SPARK-16100 URL: https://issues.apache.org/jira/browse/SPARK-16100

[jira] [Updated] (SPARK-15904) High Memory Pressure using MLlib K-means

2016-06-21 Thread Alessio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessio updated SPARK-15904: Description: *Please Note*: even though the issue has been marked as "not a problem" and "resolved", this

[jira] [Updated] (SPARK-15904) High Memory Pressure using MLlib K-means

2016-06-21 Thread Alessio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessio updated SPARK-15904: Description: *Please Note*: even though the issue has been marked as "not a problem" and "resolved", this

[jira] [Updated] (SPARK-16024) add tests for table creation with column comment

2016-06-21 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-16024: Summary: add tests for table creation with column comment (was: column comment is ignored for data

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-21 Thread Daniel Mescheder (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341640#comment-15341640 ] Daniel Mescheder commented on SPARK-15393: -- Update: The same issue also occurs w

[jira] [Created] (SPARK-16099) Refactoring CSV data source and improve performance

2016-06-21 Thread Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-16099: Summary: Refactoring CSV data source and improve performance Key: SPARK-16099 URL: https://issues.apache.org/jira/browse/SPARK-16099 Project: Spark Issue Type

[jira] [Commented] (SPARK-14480) Simplify CSV parsing process with a better performance

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341624#comment-15341624 ] Apache Spark commented on SPARK-14480: -- User 'HyukjinKwon' has created a pull reques

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Hiroshi Inoue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341618#comment-15341618 ] Hiroshi Inoue commented on SPARK-15704: --- Yes, please. Thank you. > TungstenAggrega

[jira] [Commented] (SPARK-16076) Dataset - outer join nulls can sometimes combinate to default values

2016-06-21 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341609#comment-15341609 ] Wenchen Fan commented on SPARK-16076: - I think this is caused by https://issues.apach

[jira] [Commented] (SPARK-16097) Encoders.tuple should handle null object correctly

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341604#comment-15341604 ] Apache Spark commented on SPARK-16097: -- User 'cloud-fan' has created a pull request

[jira] [Assigned] (SPARK-16097) Encoders.tuple should handle null object correctly

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16097: Assignee: Apache Spark (was: Wenchen Fan) > Encoders.tuple should handle null object corr

[jira] [Assigned] (SPARK-16097) Encoders.tuple should handle null object correctly

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16097?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16097: Assignee: Wenchen Fan (was: Apache Spark) > Encoders.tuple should handle null object corr

[jira] [Created] (SPARK-16098) Multiclass SVM Learning

2016-06-21 Thread Hayri Volkan Agun (JIRA)
Hayri Volkan Agun created SPARK-16098: - Summary: Multiclass SVM Learning Key: SPARK-16098 URL: https://issues.apache.org/jira/browse/SPARK-16098 Project: Spark Issue Type: Request

[jira] [Created] (SPARK-16097) Encoders.tuple should handle null object correctly

2016-06-21 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-16097: --- Summary: Encoders.tuple should handle null object correctly Key: SPARK-16097 URL: https://issues.apache.org/jira/browse/SPARK-16097 Project: Spark Issue Type:

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Deenar Toraskar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341598#comment-15341598 ] Deenar Toraskar commented on SPARK-15704: - [~inouehrs] thanks for checking this o

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Hiroshi Inoue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341587#comment-15341587 ] Hiroshi Inoue commented on SPARK-15704: --- I confirmed the same error by executing De

[jira] [Commented] (SPARK-16044) input_file_name() returns empty strings in data sources based on NewHadoopRDD.

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341552#comment-15341552 ] Apache Spark commented on SPARK-16044: -- User 'HyukjinKwon' has created a pull reques

[jira] [Updated] (SPARK-15904) High Memory Pressure using MLlib K-means

2016-06-21 Thread Alessio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessio updated SPARK-15904: Description: *Please Note*: even though the issue has been marked as "not a problem" and "resolved", this

[jira] [Updated] (SPARK-15904) High Memory Pressure using MLlib K-means

2016-06-21 Thread Alessio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessio updated SPARK-15904: Description: *Please Note*: even though the issue has been marked as "not a problem" and "resolved", this

[jira] [Commented] (SPARK-16075) Make VectorUDT/MatrixUDT singleton under spark.ml package

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341506#comment-15341506 ] Nick Pentreath commented on SPARK-16075: [~wangmiao1981] SPARK-15746 will probabl

[jira] [Comment Edited] (SPARK-16069) rdd.map(identity).cache very slow

2016-06-21 Thread Julien Diener (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341492#comment-15341492 ] Julien Diener edited comment on SPARK-16069 at 6/21/16 10:08 AM: --

[jira] [Comment Edited] (SPARK-16069) rdd.map(identity).cache very slow

2016-06-21 Thread Julien Diener (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341492#comment-15341492 ] Julien Diener edited comment on SPARK-16069 at 6/21/16 10:07 AM: --

[jira] [Comment Edited] (SPARK-16069) rdd.map(identity).cache very slow

2016-06-21 Thread Julien Diener (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341492#comment-15341492 ] Julien Diener edited comment on SPARK-16069 at 6/21/16 10:06 AM: --

[jira] [Commented] (SPARK-16069) rdd.map(identity).cache very slow

2016-06-21 Thread Julien Diener (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341492#comment-15341492 ] Julien Diener commented on SPARK-16069: --- Maybe I wasn't clear: the input rdd is alr

[jira] [Commented] (SPARK-15393) Writing empty Dataframes doesn't save any _metadata files

2016-06-21 Thread Daniel Mescheder (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341486#comment-15341486 ] Daniel Mescheder commented on SPARK-15393: -- I am observing what I think is the s

[jira] [Commented] (SPARK-15704) TungstenAggregate crashes

2016-06-21 Thread Deenar Toraskar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15704?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341478#comment-15341478 ] Deenar Toraskar commented on SPARK-15704: - Hi guys I get a similar error when us

[jira] [Assigned] (SPARK-16096) R deprecate unionAll and add union

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16096: Assignee: (was: Apache Spark) > R deprecate unionAll and add union > -

[jira] [Commented] (SPARK-16096) R deprecate unionAll and add union

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341443#comment-15341443 ] Apache Spark commented on SPARK-16096: -- User 'felixcheung' has created a pull reques

[jira] [Assigned] (SPARK-16096) R deprecate unionAll and add union

2016-06-21 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-16096: Assignee: Apache Spark > R deprecate unionAll and add union >

[jira] [Commented] (SPARK-16015) Datasource register for shutdown?

2016-06-21 Thread Michael Nitschinger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341442#comment-15341442 ] Michael Nitschinger commented on SPARK-16015: - Sean, thanks for your input.

[jira] [Created] (SPARK-16096) R deprecate unionAll and add union

2016-06-21 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-16096: Summary: R deprecate unionAll and add union Key: SPARK-16096 URL: https://issues.apache.org/jira/browse/SPARK-16096 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-16090) Improve method grouping in SparkR generated docs

2016-06-21 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341436#comment-15341436 ] Felix Cheung commented on SPARK-16090: -- statfunction: https://github.com/apache/spar

[jira] [Comment Edited] (SPARK-16070) DataFrame/Parquet issues with primitive arrays

2016-06-21 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341427#comment-15341427 ] Kazuaki Ishizaki edited comment on SPARK-16070 at 6/21/16 9:22 AM:

[jira] [Comment Edited] (SPARK-16070) DataFrame/Parquet issues with primitive arrays

2016-06-21 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341408#comment-15341408 ] Kazuaki Ishizaki edited comment on SPARK-16070 at 6/21/16 9:22 AM:

[jira] [Commented] (SPARK-16070) DataFrame/Parquet issues with primitive arrays

2016-06-21 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341427#comment-15341427 ] Kazuaki Ishizaki commented on SPARK-16070: -- Other JIRAs for DataFrame issues wit

[jira] [Updated] (SPARK-16063) Add storageLevel to Dataset

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-16063: --- Summary: Add storageLevel to Dataset (was: Add getStorageLevel to Dataset) > Add storageLeve

[jira] [Updated] (SPARK-16063) Add storageLevel to Dataset

2016-06-21 Thread Nick Pentreath (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nick Pentreath updated SPARK-16063: --- Description: SPARK-11905 added {{cache}}/{{persist}} to {{Dataset}}. We should add {{Dataset.
