[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198305#comment-16198305 ] DB Tsai commented on SPARK-22231: - [~rxin] {{df.drop("items.b")}} works too. We also want to have

[jira] [Resolved] (SPARK-22152) Add Dataset flatten function

2017-10-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22152. --- Resolution: Won't Fix > Add Dataset flatten function > > >

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198286#comment-16198286 ] DB Tsai commented on SPARK-22231: - [~viirya] Thanks. I fixed the typo :) Yes, {{mapItems}} will work on

[jira] [Updated] (SPARK-21646) Add new type coercion rules to compatible with Hive

2017-10-10 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-21646: Summary: Add new type coercion rules to compatible with Hive (was: Add new type coercion to

[jira] [Commented] (SPARK-22233) filter out empty InputSplit in HadoopRDD

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198526#comment-16198526 ] Apache Spark commented on SPARK-22233: -- User 'liutang123' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22233) filter out empty InputSplit in HadoopRDD

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22233: Assignee: (was: Apache Spark) > filter out empty InputSplit in HadoopRDD >

[jira] [Assigned] (SPARK-22233) filter out empty InputSplit in HadoopRDD

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22233: Assignee: Apache Spark > filter out empty InputSplit in HadoopRDD >

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198306#comment-16198306 ] Reynold Xin commented on SPARK-22231: - Can you say more? I can't think of a case in which you'd want

[jira] [Updated] (SPARK-21646) Add new type coercion to compatible with Hive

2017-10-10 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-21646: Summary: Add new type coercion to compatible with Hive (was: BinaryComparison shouldn't auto cast

[jira] [Commented] (SPARK-22225) wholeTextFilesIterators

2017-10-10 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198326#comment-16198326 ] sam commented on SPARK-22225: - Thanks [~srowen] and [~hyukjin.kwon], I wasn't aware of either of these

[jira] [Created] (SPARK-22234) Distinct window functions are not supported

2017-10-10 Thread cen yuhai (JIRA)
cen yuhai created SPARK-22234: - Summary: Distinct window functions are not supported Key: SPARK-22234 URL: https://issues.apache.org/jira/browse/SPARK-22234 Project: Spark Issue Type: New

[jira] [Resolved] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions

2017-10-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21770. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19106
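The entry does not say what the improved normalization does. Below is a plain-Python sketch under one stated assumption: when every raw prediction is zero, the result falls back to a uniform distribution instead of dividing 0 by 0. It is not Spark's `ProbabilisticClassificationModel` code, and the function name is hypothetical.

```python
from typing import List


def normalize_to_probabilities(raw: List[float]) -> List[float]:
    """Normalize non-negative raw predictions so that they sum to 1.

    Assumption for illustration: an all-zero input maps to a uniform
    distribution (1/n per class) rather than producing NaN from 0/0.
    """
    if not raw:
        raise ValueError("raw predictions must not be empty")
    if any(x < 0 for x in raw):
        raise ValueError("raw predictions must be non-negative")
    total = sum(raw)
    n = len(raw)
    if total == 0.0:
        # All-zero case: spread probability mass evenly across classes.
        return [1.0 / n] * n
    return [x / total for x in raw]


print(normalize_to_probabilities([0.0, 0.0, 0.0, 0.0]))  # [0.25, 0.25, 0.25, 0.25]
print(normalize_to_probabilities([1.0, 3.0]))            # [0.25, 0.75]
```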

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198301#comment-16198301 ] Reynold Xin commented on SPARK-22231: - For drop columns - why not just df.drop("items.b")? >
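Under this proposal, dropping `items.b` should remove field `b` from every struct in the array column `items`. The plain-Python sketch below illustrates only those intended semantics, with rows modeled as dicts. It does not use Spark's API, and `drop_nested_field` is a hypothetical helper.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def drop_nested_field(row: Row, array_col: str, field: str) -> Row:
    """Return a copy of `row` with `field` removed from each struct in `row[array_col]`."""
    items = row.get(array_col)
    if items is None:
        # A null (or missing) array column is left unchanged.
        return dict(row)
    new_items = [{k: v for k, v in item.items() if k != field} for item in items]
    return {**row, array_col: new_items}


row = {"id": 1, "items": [{"a": 1, "b": "x"}, {"a": 2, "b": "y"}]}
print(drop_nested_field(row, "items", "b"))  # {'id': 1, 'items': [{'a': 1}, {'a': 2}]}
```

The helper builds new dicts, so the input row is not mutated. This mirrors the immutable style of DataFrame transformations.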

[jira] [Commented] (SPARK-22225) wholeTextFilesIterators

2017-10-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198287#comment-16198287 ] Sean Owen commented on SPARK-22225: --- I tend to agree that this is already available from binaryFiles as

[jira] [Assigned] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions

2017-10-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-21770: - Assignee: Weichen Xu > ProbabilisticClassificationModel: Improve normalization of all-zero raw

[jira] [Commented] (SPARK-22220) Spark SQL: LATERAL VIEW OUTER null pointer exception with GROUP BY

2017-10-10 Thread Marco Gaido (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22220?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198295#comment-16198295 ] Marco Gaido commented on SPARK-22220: - Please may you provide some sample data and easy code to

[jira] [Commented] (SPARK-22225) wholeTextFilesIterators

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198335#comment-16198335 ] Hyukjin Kwon commented on SPARK-22225: -- Sure, Thanks [~sams]. > wholeTextFilesIterators >

[jira] [Created] (SPARK-22235) Can not kill job gracefully in spark standalone cluster

2017-10-10 Thread Mariusz Dubielecki (JIRA)
Mariusz Dubielecki created SPARK-22235: -- Summary: Can not kill job gracefully in spark standalone cluster Key: SPARK-22235 URL: https://issues.apache.org/jira/browse/SPARK-22235 Project: Spark

[jira] [Commented] (SPARK-8186) date/time function: date_add

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198651#comment-16198651 ] Hyukjin Kwon commented on SPARK-8186: - I think the problem is filed in separately - SPARK-17174. >

[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198716#comment-16198716 ] Liang-Chi Hsieh commented on SPARK-22229: - As we have the pluggable mechanism to set up external

[jira] [Commented] (SPARK-20171) Analyzer should include the arity of a function when reporting "AnalysisException: Undefined function"

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20171?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198719#comment-16198719 ] Hyukjin Kwon commented on SPARK-20171: -- I happen to look at this JIRA. Looks we now throw a

[jira] [Commented] (SPARK-19325) Running query hang-up 5min

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198668#comment-16198668 ] Hyukjin Kwon commented on SPARK-19325: -- [~Sephiroth-Lin], do you maybe have a reproducer for this?

[jira] [Resolved] (SPARK-22212) Some SQL functions in Python fail with string column name

2017-10-10 Thread Jakub Nowacki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jakub Nowacki resolved SPARK-22212. --- Resolution: Later Keeping the resolution on-hold until API unification consensus will be

[jira] [Resolved] (SPARK-18479) spark.sql.shuffle.partitions defaults should be a prime number

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-18479. -- Resolution: Won't Fix I am resolving this assuming there is no explicit objection on ^. >

[jira] [Updated] (SPARK-19039) UDF ClosureCleaner bug when UDF, col applied in paste mode in REPL

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-19039: - Affects Version/s: 2.3.0 > UDF ClosureCleaner bug when UDF, col applied in paste mode in REPL >

[jira] [Commented] (SPARK-21737) Create communication channel between arbitrary clients and the Spark AM in YARN mode

2017-10-10 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198674#comment-16198674 ] Saisai Shao commented on SPARK-21737: - I was trying to understand how Spark communicate with Mesos,

[jira] [Commented] (SPARK-19860) DataFrame join get conflict error if two frames has a same name column.

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19860?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198704#comment-16198704 ] Hyukjin Kwon commented on SPARK-19860: -- [~wuchang1989], would you mind if I ask self-contained

[jira] [Assigned] (SPARK-21506) The description of "spark.executor.cores" may be not correct

2017-10-10 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-21506: --- Assignee: liuxian > The description of "spark.executor.cores" may be not correct >

[jira] [Resolved] (SPARK-21506) The description of "spark.executor.cores" may be not correct

2017-10-10 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-21506. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18711

[jira] [Resolved] (SPARK-20025) Driver fail over will not work, if SPARK_LOCAL* env is set.

2017-10-10 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-20025. - Resolution: Fixed Fix Version/s: 2.3.0 > Driver fail over will not work, if SPARK_LOCAL*

[jira] [Commented] (SPARK-21737) Create communication channel between arbitrary clients and the Spark AM in YARN mode

2017-10-10 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198665#comment-16198665 ] Thomas Graves commented on SPARK-21737: --- I haven't had time to think about it much since the pr

[jira] [Commented] (SPARK-20162) Reading data from MySQL - Cannot up cast from decimal(30,6) to decimal(38,18)

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198712#comment-16198712 ] Hyukjin Kwon commented on SPARK-20162: -- Let me resolve it as Cannot Reproduce. Please reopen this if

[jira] [Resolved] (SPARK-20162) Reading data from MySQL - Cannot up cast from decimal(30,6) to decimal(38,18)

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-20162. -- Resolution: Cannot Reproduce > Reading data from MySQL - Cannot up cast from decimal(30,6) to

[jira] [Assigned] (SPARK-20025) Driver fail over will not work, if SPARK_LOCAL* env is set.

2017-10-10 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-20025: --- Assignee: Prashant Sharma > Driver fail over will not work, if SPARK_LOCAL* env is set. >

[jira] [Commented] (SPARK-18536) Failed to save to hive table when case class with empty field

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198613#comment-16198613 ] Hyukjin Kwon commented on SPARK-18536: -- The codes: {code} import scala.collection.mutable.Queue

[jira] [Commented] (SPARK-8186) date/time function: date_add

2017-10-10 Thread eugen yushin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8186?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198630#comment-16198630 ] eugen yushin commented on SPARK-8186: - Unfortunately, current implementation doesn't correspond to

[jira] [Commented] (SPARK-19039) UDF ClosureCleaner bug when UDF, col applied in paste mode in REPL

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198655#comment-16198655 ] Hyukjin Kwon commented on SPARK-19039: -- Still happens in the master: {code} scala> :paste //

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Jeremy Smith (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198965#comment-16198965 ] Jeremy Smith commented on SPARK-22231: -- I implemented these at Netflix, so I wanted to provide some

[jira] [Assigned] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-22231: --- Assignee: Jeremy Smith > Support of map, filter, withColumn, dropColumn in nested list of

[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Yuval Degani (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199052#comment-16199052 ] Yuval Degani commented on SPARK-22229: -- [~srowen], [~viirya], thanks for your response. Regarding

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Nathan Kronenfeld (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198861#comment-16198861 ] Nathan Kronenfeld commented on SPARK-22231: --- One couple related concerns... # I think

[jira] [Comment Edited] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Nathan Kronenfeld (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198861#comment-16198861 ] Nathan Kronenfeld edited comment on SPARK-22231 at 10/10/17 7:39 PM: -

[jira] [Assigned] (SPARK-22237) Spark submit script should use downloaded files in standalone/local client mode

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22237: Assignee: Apache Spark > Spark submit script should use downloaded files in

[jira] [Commented] (SPARK-22237) Spark submit script should use downloaded files in standalone/local client mode

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199296#comment-16199296 ] Apache Spark commented on SPARK-22237: -- User 'loneknightpy' has created a pull request for this

[jira] [Assigned] (SPARK-22237) Spark submit script should use downloaded files in standalone/local client mode

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22237: Assignee: (was: Apache Spark) > Spark submit script should use downloaded files in

[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199242#comment-16199242 ] Sean Owen commented on SPARK-22229: --- It's still something that should start as a package. I think that

[jira] [Commented] (SPARK-21988) Add default stats to StreamingExecutionRelation

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199182#comment-16199182 ] Apache Spark commented on SPARK-21988: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Commented] (SPARK-22236) CSV I/O: does not respect RFC 4180

2017-10-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199245#comment-16199245 ] Sean Owen commented on SPARK-22236: --- Interesting, because the Univocity parser internally seems to

[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Yuval Degani (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199293#comment-16199293 ] Yuval Degani commented on SPARK-22229: -- We already published an open-source package that implements

[jira] [Updated] (SPARK-22237) Spark submit script should use downloaded files in standalone/local client mode

2017-10-10 Thread Yu Peng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Peng updated SPARK-22237: Description: SPARK-10643 is added to allow spark-submit script to download jars/files from remote hadoop

[jira] [Created] (SPARK-22237) Spark submit script should use downloaded files in standalone/local client mode

2017-10-10 Thread Yu Peng (JIRA)
Yu Peng created SPARK-22237: --- Summary: Spark submit script should use downloaded files in standalone/local client mode Key: SPARK-22237 URL: https://issues.apache.org/jira/browse/SPARK-22237 Project: Spark

[jira] [Created] (SPARK-22236) CSV I/O: does not respect RFC 4180

2017-10-10 Thread Ondrej Kokes (JIRA)
Ondrej Kokes created SPARK-22236: Summary: CSV I/O: does not respect RFC 4180 Key: SPARK-22236 URL: https://issues.apache.org/jira/browse/SPARK-22236 Project: Spark Issue Type: Improvement
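For reference, RFC 4180 (section 2, rule 7) requires that a double quote inside a quoted field be escaped by preceding it with another double quote. The sketch below uses Python's standard `csv` module to show that convention and a round trip. It illustrates the standard only, not Spark's CSV reader or writer.

```python
import csv
import io

rows = [["id", "comment"], ["1", 'She said "hi"'], ["2", "a,b"]]

buf = io.StringIO()
# QUOTE_MINIMAL quotes only fields that need it (embedded quote, comma, newline).
# doublequote=True escapes an embedded quote by doubling it, as RFC 4180 requires,
# and CRLF is the record separator the RFC specifies.
writer = csv.writer(buf, quoting=csv.QUOTE_MINIMAL, doublequote=True, lineterminator="\r\n")
writer.writerows(rows)
text = buf.getvalue()
print(repr(text))  # 'id,comment\r\n1,"She said ""hi"""\r\n2,"a,b"\r\n'

# Reading the text back recovers the original fields exactly.
parsed = list(csv.reader(io.StringIO(text)))
```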

[jira] [Comment Edited] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Nathan Kronenfeld (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198861#comment-16198861 ] Nathan Kronenfeld edited comment on SPARK-22231 at 10/10/17 7:38 PM: -

[jira] [Reopened] (SPARK-21988) Add default stats to StreamingExecutionRelation

2017-10-10 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu reopened SPARK-21988: -- > Add default stats to StreamingExecutionRelation >

[jira] [Assigned] (SPARK-21988) Add default stats to StreamingExecutionRelation

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21988: Assignee: Apache Spark (was: Jose Torres) > Add default stats to

[jira] [Assigned] (SPARK-21988) Add default stats to StreamingExecutionRelation

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21988: Assignee: Jose Torres (was: Apache Spark) > Add default stats to

[jira] [Created] (SPARK-22238) EnsureStatefulOpPartitioning shouldn't ask for the child RDD before planning is completed

2017-10-10 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-22238: --- Summary: EnsureStatefulOpPartitioning shouldn't ask for the child RDD before planning is completed Key: SPARK-22238 URL: https://issues.apache.org/jira/browse/SPARK-22238

[jira] [Assigned] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()

2017-10-10 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell reassigned SPARK-21907: - Assignee: Eyal Farago > NullPointerException in UnsafeExternalSorter.spill() >

[jira] [Commented] (SPARK-22238) EnsureStatefulOpPartitioning shouldn't ask for the child RDD before planning is completed

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199451#comment-16199451 ] Apache Spark commented on SPARK-22238: -- User 'brkyvz' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22238) EnsureStatefulOpPartitioning shouldn't ask for the child RDD before planning is completed

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22238: Assignee: Burak Yavuz (was: Apache Spark) > EnsureStatefulOpPartitioning shouldn't ask

[jira] [Assigned] (SPARK-22238) EnsureStatefulOpPartitioning shouldn't ask for the child RDD before planning is completed

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22238: Assignee: Apache Spark (was: Burak Yavuz) > EnsureStatefulOpPartitioning shouldn't ask

[jira] [Resolved] (SPARK-21907) NullPointerException in UnsafeExternalSorter.spill()

2017-10-10 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Herman van Hovell resolved SPARK-21907. --- Resolution: Fixed Fix Version/s: 2.3.0 > NullPointerException in

[jira] [Commented] (SPARK-22216) Improving PySpark/Pandas interoperability

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199514#comment-16199514 ] Hyukjin Kwon commented on SPARK-22216: -- Thanks for updating. > Improving PySpark/Pandas

[jira] [Commented] (SPARK-22216) Improving PySpark/Pandas interoperability

2017-10-10 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199516#comment-16199516 ] Li Jin commented on SPARK-22216: [~hyukjin.kwon], My intention is to keep this open to track all the

[jira] [Created] (SPARK-22239) Used-defined window functions with pandas udf

2017-10-10 Thread Li Jin (JIRA)
Li Jin created SPARK-22239: -- Summary: Used-defined window functions with pandas udf Key: SPARK-22239 URL: https://issues.apache.org/jira/browse/SPARK-22239 Project: Spark Issue Type: Sub-task

[jira] [Commented] (SPARK-22199) Spark Job on YARN fails with executors "Slave registration failed"

2017-10-10 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198258#comment-16198258 ] Saisai Shao commented on SPARK-22199: - Can you please list the steps to reproduce this issue? Also

[jira] [Updated] (SPARK-22233) filter out empty InputSplit in HadoopRDD

2017-10-10 Thread Lijia Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijia Liu updated SPARK-22233: -- Description: Sometimes, Hive will create an empty table with many empty files, Spark use the
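The idea in the issue title can be sketched as follows, assuming each input split otherwise becomes its own partition and task. Zero-length splits from empty files are discarded before partitions are built. `FileSplit` here is a hypothetical stand-in for a Hadoop `InputSplit` (which reports its size via `getLength()`), not HadoopRDD's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileSplit:
    # Hypothetical stand-in for a Hadoop InputSplit: a file path and its byte length.
    path: str
    length: int


def non_empty_splits(splits: List[FileSplit]) -> List[FileSplit]:
    """Keep only splits with data, so no task is scheduled for an empty file."""
    return [s for s in splits if s.length > 0]


splits = [FileSplit("part-00000", 0), FileSplit("part-00001", 128), FileSplit("part-00002", 0)]
kept = non_empty_splits(splits)
print(len(splits), "->", len(kept))  # 3 -> 1
```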

[jira] [Updated] (SPARK-22233) filter out empty InputSplit in HadoopRDD

2017-10-10 Thread Lijia Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lijia Liu updated SPARK-22233: -- Description: Sometimes, Hive will create an empty table with many empty files, Spark use the

[jira] [Updated] (SPARK-22232) Row objects in pyspark created using the `Row(**kwars)` syntax do not get serialized/deserialized properly

2017-10-10 Thread Bago Amirbekian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bago Amirbekian updated SPARK-22232: Summary: Row objects in pyspark created using the `Row(**kwars)` syntax do not get

[jira] [Updated] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai updated SPARK-22231: Description: At Netflix's algorithm team, we work on ranking problems to find the great content to

[jira] [Commented] (SPARK-22229) SPIP: RDMA Accelerated Shuffle Engine

2017-10-10 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16198284#comment-16198284 ] Sean Owen commented on SPARK-22229: --- RDMA is specialized networking hardware right? I don't think this

[jira] [Resolved] (SPARK-20396) groupBy().apply() with pandas udf in pyspark

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-20396. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 18732

[jira] [Commented] (SPARK-22216) Improving PySpark/Pandas interoperability

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199502#comment-16199502 ] Hyukjin Kwon commented on SPARK-22216: -- [~icexelloss] and [~bryanc], do you think we need more

[jira] [Updated] (SPARK-20791) Use Apache Arrow to Improve Spark createDataFrame from Pandas.DataFrame

2017-10-10 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-20791: --- Issue Type: Sub-task (was: New Feature) Parent: SPARK-22216 > Use Apache Arrow to Improve Spark

[jira] [Resolved] (SPARK-19558) Provide a config option to attach QueryExecutionListener to SparkSession

2017-10-10 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19558?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19558. - Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 2.3.0 > Provide a config

[jira] [Created] (SPARK-22241) Apache spark giving InvalidSchemaException: Cannot write a schema with an empty group: optional group element {

2017-10-10 Thread Ritika Maheshwari (JIRA)
Ritika Maheshwari created SPARK-22241: - Summary: Apache spark giving InvalidSchemaException: Cannot write a schema with an empty group: optional group element { Key: SPARK-22241 URL:

[jira] [Commented] (SPARK-22231) Support of map, filter, withColumn, dropColumn in nested list of structures

2017-10-10 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199789#comment-16199789 ] Liang-Chi Hsieh commented on SPARK-22231: - [~Jeremy Smith] Thanks for the context. Regarding the

[jira] [Commented] (SPARK-20937) Describe spark.sql.parquet.writeLegacyFormat property in Spark SQL, DataFrames and Datasets Guide

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199808#comment-16199808 ] Hyukjin Kwon commented on SPARK-20937: -- +1 too. > Describe spark.sql.parquet.writeLegacyFormat

[jira] [Updated] (SPARK-15474) ORC data source fails to write and read back empty dataframe

2017-10-10 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-15474: -- Affects Version/s: 2.1.1 > ORC data source fails to write and read back empty dataframe >

[jira] [Commented] (SPARK-22223) ObjectHashAggregate introduces unnecessary shuffle

2017-10-10 Thread Michele Costantino Soccio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199831#comment-16199831 ] Michele Costantino Soccio commented on SPARK-22223: --- [~maropu] Not sure I understand

[jira] [Resolved] (SPARK-20561) Running SparkR with no RHive installed in secured environment

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-20561. -- Resolution: Invalid Question should better usually go to the mailing list, see

[jira] [Commented] (SPARK-21641) Combining windowing (groupBy) and mapGroupsWithState (groupByKey) in Spark Structured Streaming

2017-10-10 Thread kant kodali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199811#comment-16199811 ] kant kodali commented on SPARK-21641: - [~marmbrus] > Combining windowing (groupBy) and

[jira] [Comment Edited] (SPARK-21641) Combining windowing (groupBy) and mapGroupsWithState (groupByKey) in Spark Structured Streaming

2017-10-10 Thread kant kodali (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199811#comment-16199811 ] kant kodali edited comment on SPARK-21641 at 10/11/17 4:58 AM: ---

[jira] [Comment Edited] (SPARK-22236) CSV I/O: does not respect RFC 4180

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199825#comment-16199825 ] Hyukjin Kwon edited comment on SPARK-22236 at 10/11/17 5:51 AM: Yup, what

[jira] [Resolved] (SPARK-21751) CodeGeneraor.splitExpressions counts code size more precisely

2017-10-10 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21751. - Resolution: Fixed Assignee: Kazuaki Ishizaki Fix Version/s: 2.3.0 >

[jira] [Commented] (SPARK-22223) ObjectHashAggregate introduces unnecessary shuffle

2017-10-10 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199597#comment-16199597 ] Takeshi Yamamuro commented on SPARK-22223: -- The hash-based aggregate implementation requires the

[jira] [Resolved] (SPARK-15757) Error occurs when using Spark sql "select" statement on orc file after hive sql "insert overwrite tb1 select * from sourcTb" has been executed on this orc file

2017-10-10 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-15757. --- Resolution: Duplicate This issue only happens when `convertMetastoreOrc` is true. The root

[jira] [Resolved] (SPARK-21686) spark.sql.hive.convertMetastoreOrc is causing NullPointerException while reading ORC tables

2017-10-10 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21686?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-21686. --- Resolution: Duplicate > spark.sql.hive.convertMetastoreOrc is causing NullPointerException

[jira] [Commented] (SPARK-22241) Apache spark giving InvalidSchemaException: Cannot write a schema with an empty group: optional group element {

2017-10-10 Thread Ritika Maheshwari (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199724#comment-16199724 ] Ritika Maheshwari commented on SPARK-22241: --- I know parquet does not allow empty struct types.

[jira] [Updated] (SPARK-21929) Support `ALTER TABLE table_name ADD COLUMNS(..)` for ORC data source

2017-10-10 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-21929: -- Description: SPARK-19261 implemented `ADD COLUMNS` at Spark 2.2, but ORC data source is not

[jira] [Created] (SPARK-22243) job failed to restart from checkpoint

2017-10-10 Thread StephenZou (JIRA)
StephenZou created SPARK-22243: -- Summary: job failed to restart from checkpoint Key: SPARK-22243 URL: https://issues.apache.org/jira/browse/SPARK-22243 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-22242) job failed to restart from checkpoint

2017-10-10 Thread StephenZou (JIRA)
StephenZou created SPARK-22242: -- Summary: job failed to restart from checkpoint Key: SPARK-22242 URL: https://issues.apache.org/jira/browse/SPARK-22242 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-22243) streaming job failed to restart from checkpoint

2017-10-10 Thread StephenZou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] StephenZou updated SPARK-22243: --- Summary: streaming job failed to restart from checkpoint (was: job failed to restart from

[jira] [Updated] (SPARK-15474) ORC data source fails to write and read back empty dataframe

2017-10-10 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-15474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-15474: -- Affects Version/s: 2.2.0 > ORC data source fails to write and read back empty dataframe >

[jira] [Commented] (SPARK-22236) CSV I/O: does not respect RFC 4180

2017-10-10 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199825#comment-16199825 ] Hyukjin Kwon commented on SPARK-22236: -- Yup, what is said ^ looks all correct. I think we should set

[jira] [Commented] (SPARK-1911) Warn users if their assembly jars are not built with Java 6

2017-10-10 Thread Swaapnika Guntaka (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199666#comment-16199666 ] Swaapnika Guntaka commented on SPARK-1911: -- Does this issue still exist with Spark-2.2.? > Warn

[jira] [Assigned] (SPARK-22243) job failed to restart from checkpoint

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22243: Assignee: (was: Apache Spark) > job failed to restart from checkpoint >

[jira] [Assigned] (SPARK-22243) job failed to restart from checkpoint

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22243: Assignee: Apache Spark > job failed to restart from checkpoint >

[jira] [Updated] (SPARK-22243) job failed to restart from checkpoint

2017-10-10 Thread StephenZou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] StephenZou updated SPARK-22243: --- Attachment: CheckpointTest.scala the reproducible demo > job failed to restart from checkpoint >

[jira] [Commented] (SPARK-22243) job failed to restart from checkpoint

2017-10-10 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16199763#comment-16199763 ] Apache Spark commented on SPARK-22243: -- User 'ChenjunZou' has created a pull request for this issue:
