[jira] [Updated] (SPARK-24376) compiling spark with scala-2.10 should use the -P parameter instead of -D

2018-05-23 Thread Yu Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Wang updated SPARK-24376: Description: compiling spark with scala-2.10 should use the -P parameter instead of -D (was: compiling

[jira] [Updated] (SPARK-24376) compiling spark with scala-2.10 should use the -P parameter instead of -D

2018-05-23 Thread Yu Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yu Wang updated SPARK-24376: Summary: compiling spark with scala-2.10 should use the -P parameter instead of -D (was: compiling spark

[jira] [Created] (SPARK-24376) compiling spark with scala-2.10 should use the -p parameter instead of -d

2018-05-23 Thread Yu Wang (JIRA)
Yu Wang created SPARK-24376: --- Summary: compiling spark with scala-2.10 should use the -p parameter instead of -d Key: SPARK-24376 URL: https://issues.apache.org/jira/browse/SPARK-24376 Project: Spark

[jira] [Commented] (SPARK-24362) SUM function precision issue

2018-05-23 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488464#comment-16488464 ] Yuming Wang commented on SPARK-24362: - *{{SortMergeJoin}}* vs *{{BroadcastHashJoin}}*: {code}

[jira] [Resolved] (SPARK-24364) Files deletion after globbing may fail StructuredStreaming jobs

2018-05-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24364. -- Resolution: Fixed Fix Version/s: 2.3.1 2.4.0 Issue resolved by pull

[jira] [Assigned] (SPARK-24364) Files deletion after globbing may fail StructuredStreaming jobs

2018-05-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-24364: Assignee: Hyukjin Kwon > Files deletion after globbing may fail StructuredStreaming jobs

[jira] [Commented] (SPARK-24375) Design sketch: support barrier scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488432#comment-16488432 ] Xiangrui Meng commented on SPARK-24375: --- [~jiangxb1987] Could you summarize the design sketch based

[jira] [Created] (SPARK-24375) Design sketch: support barrier scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-24375: - Summary: Design sketch: support barrier scheduling in Apache Spark Key: SPARK-24375 URL: https://issues.apache.org/jira/browse/SPARK-24375 Project: Spark

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-24374: -- Shepherd: Reynold Xin > SPIP: Support Barrier Scheduling in Apache Spark >

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-24374: -- Issue Type: Epic (was: Story) > SPIP: Support Barrier Scheduling in Apache Spark >

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-24374: -- Description: (See details in the linked/attached SPIP doc.) {quote} The proposal here is to

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-24374: -- Attachment: SPIP_ Support Barrier Scheduling in Apache Spark.pdf > SPIP: Support Barrier

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-24374: -- Description: (See details in the linked/attached SPIP doc.) The proposal here is to add a new

[jira] [Updated] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-24374: -- Labels: SPIP (was: ) > SPIP: Support Barrier Scheduling in Apache Spark >

[jira] [Created] (SPARK-24374) SPIP: Support Barrier Scheduling in Apache Spark

2018-05-23 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-24374: - Summary: SPIP: Support Barrier Scheduling in Apache Spark Key: SPARK-24374 URL: https://issues.apache.org/jira/browse/SPARK-24374 Project: Spark Issue

[jira] [Commented] (SPARK-24369) A bug when having multiple distinct aggregations

2018-05-23 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488411#comment-16488411 ] Takeshi Yamamuro commented on SPARK-24369: -- ok, I will look into this. Thanks! > A bug when

[jira] [Commented] (SPARK-24369) A bug when having multiple distinct aggregations

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488396#comment-16488396 ] Xiao Li commented on SPARK-24369: - cc [~maropu] Are you interested in this? > A bug when having multiple

[jira] [Assigned] (SPARK-24322) Upgrade Apache ORC to 1.4.4

2018-05-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-24322: --- Assignee: Dongjoon Hyun > Upgrade Apache ORC to 1.4.4 > --- > >

[jira] [Resolved] (SPARK-24322) Upgrade Apache ORC to 1.4.4

2018-05-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-24322. - Resolution: Fixed Fix Version/s: 2.3.1 2.4.0 Issue resolved by pull

[jira] [Resolved] (SPARK-24257) LongToUnsafeRowMap calculate the new size may be wrong

2018-05-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-24257. - Resolution: Fixed Fix Version/s: 2.3.1 2.1.3 2.4.0

[jira] [Assigned] (SPARK-24257) LongToUnsafeRowMap calculate the new size may be wrong

2018-05-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-24257: --- Assignee: dzcxzl > LongToUnsafeRowMap calculate the new size may be wrong >

[jira] [Commented] (SPARK-23416) Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false

2018-05-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488355#comment-16488355 ] Dongjoon Hyun commented on SPARK-23416: --- Thank you, [~joseph.torres] and [~tdas] . Actually,

[jira] [Commented] (SPARK-24370) spark checkpoint creates many 0 byte empty files(partitions) in checkpoint directory

2018-05-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488330#comment-16488330 ] Hyukjin Kwon commented on SPARK-24370: -- Sounds more like a question though. Do you have any

[jira] [Commented] (SPARK-24370) spark checkpoint creates many 0 byte empty files(partitions) in checkpoint directory

2018-05-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488329#comment-16488329 ] Hyukjin Kwon commented on SPARK-24370: -- (let's avoid setting critical usually reserved for a

[jira] [Updated] (SPARK-24370) spark checkpoint creates many 0 byte empty files(partitions) in checkpoint directory

2018-05-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-24370: - Priority: Major (was: Critical) > spark checkpoint creates many 0 byte empty files(partitions)

[jira] [Resolved] (SPARK-24363) spark context exit abnormally

2018-05-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24363. -- Resolution: Invalid Sounds more like a question. Please ask it to dev mailing list. I believe

[jira] [Commented] (SPARK-24358) createDataFrame in Python 3 should be able to infer bytes type as Binary type

2018-05-23 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488261#comment-16488261 ] Hyukjin Kwon commented on SPARK-24358: -- Yea, that's exactly what I thought. There are some

[jira] [Resolved] (SPARK-23416) Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false

2018-05-23 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-23416. --- Resolution: Fixed Fix Version/s: (was: 2.3.0) 3.0.0 Issue

[jira] [Assigned] (SPARK-23416) Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress test for failOnDataLoss=false

2018-05-23 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das reassigned SPARK-23416: - Assignee: Jose Torres > Flaky test: KafkaSourceStressForDontFailOnDataLossSuite.stress

[jira] [Updated] (SPARK-24322) Upgrade Apache ORC to 1.4.4

2018-05-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24322: -- Description: ORC 1.4.4 includes [nine

[jira] [Updated] (SPARK-24373) "df.cache() df.count()" no longer eagerly caches data

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24373: --- Summary: "df.cache() df.count()" no longer eagerly caches data (was: Spark Dataset groupby.agg/count

[jira] [Comment Edited] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488109#comment-16488109 ] Li Jin edited comment on SPARK-24373 at 5/23/18 11:18 PM: -- We found after

[jira] [Updated] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Li Jin updated SPARK-24373: --- Component/s: (was: Input/Output) SQL > Spark Dataset groupby.agg/count doesn't respect

[jira] [Commented] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Li Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488109#comment-16488109 ] Li Jin commented on SPARK-24373: I think this might be a regression from 2.2 Any one uses  "df.cache()

[jira] [Commented] (SPARK-24358) createDataFrame in Python 3 should be able to infer bytes type as Binary type

2018-05-23 Thread Joel Croteau (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488090#comment-16488090 ] Joel Croteau commented on SPARK-24358: -- This may be trickier than I first thought. In Python 2,

[jira] [Updated] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Wenbo Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbo Zhao updated SPARK-24373: --- Description: Here is the code to reproduce in local mode {code:java} scala> val df = sc.range(1,

[jira] [Updated] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Wenbo Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbo Zhao updated SPARK-24373: --- Description: Here is the code to reproduce in local mode {code:java} scala> val df = sc.range(1,

[jira] [Updated] (SPARK-24322) Upgrade Apache ORC to 1.4.4

2018-05-23 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-24322: -- Description: ORC 1.4.4 includes [nine

[jira] [Updated] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache with UDF

2018-05-23 Thread Wenbo Zhao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenbo Zhao updated SPARK-24373: --- Summary: Spark Dataset groupby.agg/count doesn't respect cache with UDF (was: Spark Dataset

[jira] [Created] (SPARK-24373) Spark Dataset groupby.agg/count doesn't respect cache

2018-05-23 Thread Wenbo Zhao (JIRA)
Wenbo Zhao created SPARK-24373: -- Summary: Spark Dataset groupby.agg/count doesn't respect cache Key: SPARK-24373 URL: https://issues.apache.org/jira/browse/SPARK-24373 Project: Spark Issue

[jira] [Commented] (SPARK-22054) Allow release managers to inject their keys

2018-05-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488024#comment-16488024 ] Marcelo Vanzin commented on SPARK-22054: I filed SPARK-24372 to create an easier way to prepare

[jira] [Updated] (SPARK-22054) Allow release managers to inject their keys

2018-05-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-22054: --- Priority: Major (was: Blocker) > Allow release managers to inject their keys >

[jira] [Updated] (SPARK-22055) Port release scripts

2018-05-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22055?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin updated SPARK-22055: --- Priority: Major (was: Blocker) > Port release scripts > > >

[jira] [Commented] (SPARK-22055) Port release scripts

2018-05-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488023#comment-16488023 ] Marcelo Vanzin commented on SPARK-22055: I filed SPARK-24372 for the stuff I'm working on. I'm

[jira] [Created] (SPARK-24372) Create script for preparing RCs

2018-05-23 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-24372: -- Summary: Create script for preparing RCs Key: SPARK-24372 URL: https://issues.apache.org/jira/browse/SPARK-24372 Project: Spark Issue Type: New Feature

[jira] [Assigned] (SPARK-24371) Added isinSet in DataFrame API for Scala and Java.

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24371: Assignee: Apache Spark (was: DB Tsai) > Added isinSet in DataFrame API for Scala and

[jira] [Assigned] (SPARK-24371) Added isinSet in DataFrame API for Scala and Java.

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24371: Assignee: DB Tsai (was: Apache Spark) > Added isinSet in DataFrame API for Scala and

[jira] [Commented] (SPARK-24371) Added isinSet in DataFrame API for Scala and Java.

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488012#comment-16488012 ] Apache Spark commented on SPARK-24371: -- User 'dbtsai' has created a pull request for this issue:

[jira] [Created] (SPARK-24371) Added isinSet in DataFrame API for Scala and Java.

2018-05-23 Thread DB Tsai (JIRA)
DB Tsai created SPARK-24371: --- Summary: Added isinSet in DataFrame API for Scala and Java. Key: SPARK-24371 URL: https://issues.apache.org/jira/browse/SPARK-24371 Project: Spark Issue Type: New

[jira] [Updated] (SPARK-24322) Upgrade Apache ORC to 1.4.4

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24322: Labels: correctness (was: ) > Upgrade Apache ORC to 1.4.4 > --- > >

[jira] [Resolved] (SPARK-24294) Throw SparkException when OOM in BroadcastExchangeExec

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24294. - Resolution: Fixed Assignee: jin xing Fix Version/s: 2.4.0 > Throw SparkException when

[jira] [Resolved] (SPARK-24206) Improve DataSource benchmark code for read and pushdown

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-24206. - Resolution: Fixed Assignee: Takeshi Yamamuro Fix Version/s: 2.4.0 > Improve DataSource

[jira] [Reopened] (SPARK-24244) Parse only required columns of CSV file

2018-05-23 Thread Maxim Gekk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk reopened SPARK-24244: Previous PR was reverted due flaky UnivocityParserSuite > Parse only required columns of CSV file >

[jira] [Commented] (SPARK-24368) Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487923#comment-16487923 ] Apache Spark commented on SPARK-24368: -- User 'MaxGekk' has created a pull request for this issue:

[jira] [Commented] (SPARK-24244) Parse only required columns of CSV file

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487922#comment-16487922 ] Apache Spark commented on SPARK-24244: -- User 'MaxGekk' has created a pull request for this issue:

[jira] [Commented] (SPARK-23206) Additional Memory Tuning Metrics

2018-05-23 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23206?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487901#comment-16487901 ] Imran Rashid commented on SPARK-23206: -- [~felixcheung] can you give an example of the network & IO

[jira] [Updated] (SPARK-24370) spark checkpoint creates many 0 byte empty files(partitions) in checkpoint directory

2018-05-23 Thread Jami Malikzade (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jami Malikzade updated SPARK-24370: --- Attachment: partitions.PNG > spark checkpoint creates many 0 byte empty files(partitions)

[jira] [Created] (SPARK-24370) spark checkpoint creates many 0 byte empty files(partitions) in checkpoint directory

2018-05-23 Thread Jami Malikzade (JIRA)
Jami Malikzade created SPARK-24370: -- Summary: spark checkpoint creates many 0 byte empty files(partitions) in checkpoint directory Key: SPARK-24370 URL: https://issues.apache.org/jira/browse/SPARK-24370

[jira] [Updated] (SPARK-24369) A bug when having multiple distinct aggregations

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24369: Summary: A bug when having multiple distinct aggregations (was: A bug of multiple distinct aggregations)

[jira] [Updated] (SPARK-24369) A bug of multiple distinct aggregations

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24369: Summary: A bug of multiple distinct aggregations (was: A bug for multiple distinct aggregations) > A bug

[jira] [Updated] (SPARK-24369) A bug for multiple distinct aggregations

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-24369: Description: {code} SELECT corr(DISTINCT x, y), corr(DISTINCT y, x), count(*) FROM (VALUES (1, 1),

[jira] [Created] (SPARK-24369) A bug for multiple distinct aggregations

2018-05-23 Thread Xiao Li (JIRA)
Xiao Li created SPARK-24369: --- Summary: A bug for multiple distinct aggregations Key: SPARK-24369 URL: https://issues.apache.org/jira/browse/SPARK-24369 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-23 Thread Hossein Falaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487815#comment-16487815 ] Hossein Falaki commented on SPARK-24359: Thanks [~shivaram] and [~zero323]. It seems CRAN release

[jira] [Assigned] (SPARK-24368) Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24368: Assignee: (was: Apache Spark) > Flaky tests: >

[jira] [Commented] (SPARK-24368) Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487755#comment-16487755 ] Apache Spark commented on SPARK-24368: -- User 'MaxGekk' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24368) Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24368: Assignee: Apache Spark > Flaky tests: >

[jira] [Commented] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-23 Thread Shivaram Venkataraman (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487754#comment-16487754 ] Shivaram Venkataraman commented on SPARK-24359: --- I'd just like to echo the point on release

[jira] [Assigned] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23161: Assignee: Apache Spark > Add missing APIs to Python GBTClassifier >

[jira] [Commented] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487739#comment-16487739 ] Apache Spark commented on SPARK-23161: -- User 'huaxingao' has created a pull request for this issue:

[jira] [Assigned] (SPARK-23161) Add missing APIs to Python GBTClassifier

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-23161: Assignee: (was: Apache Spark) > Add missing APIs to Python GBTClassifier >

[jira] [Commented] (SPARK-21945) pyspark --py-files doesn't work in yarn client mode

2018-05-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487707#comment-16487707 ] Marcelo Vanzin commented on SPARK-21945: {noformat} $ cat test.py from pyspark import

[jira] [Comment Edited] (SPARK-21945) pyspark --py-files doesn't work in yarn client mode

2018-05-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487707#comment-16487707 ] Marcelo Vanzin edited comment on SPARK-21945 at 5/23/18 5:27 PM: -

[jira] [Commented] (SPARK-22055) Port release scripts

2018-05-23 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487647#comment-16487647 ] Marcelo Vanzin commented on SPARK-22055: No, my script just prepares the release (asks questions

[jira] [Commented] (SPARK-24368) Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite

2018-05-23 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487516#comment-16487516 ] Xiao Li commented on SPARK-24368: - cc [~maxgekk] > Flaky tests: >

[jira] [Created] (SPARK-24368) Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite

2018-05-23 Thread Xiao Li (JIRA)
Xiao Li created SPARK-24368: --- Summary: Flaky tests: org.apache.spark.sql.execution.datasources.csv.UnivocityParserSuite Key: SPARK-24368 URL: https://issues.apache.org/jira/browse/SPARK-24368 Project:

[jira] [Commented] (SPARK-18805) InternalMapWithStateDStream make java.lang.StackOverflowError

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487391#comment-16487391 ] Apache Spark commented on SPARK-18805: -- User 'chermenin' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18805) InternalMapWithStateDStream make java.lang.StackOverflowError

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18805: Assignee: Apache Spark > InternalMapWithStateDStream make java.lang.StackOverflowError >

[jira] [Assigned] (SPARK-18805) InternalMapWithStateDStream make java.lang.StackOverflowError

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18805: Assignee: (was: Apache Spark) > InternalMapWithStateDStream make

[jira] [Resolved] (SPARK-23711) Add fallback to interpreted execution logic

2018-05-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-23711. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 21106

[jira] [Assigned] (SPARK-23711) Add fallback to interpreted execution logic

2018-05-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-23711: --- Assignee: Liang-Chi Hsieh > Add fallback to interpreted execution logic >

[jira] [Updated] (SPARK-24313) Collection functions interpreted execution doesn't work with complex types

2018-05-23 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-24313: Fix Version/s: 2.3.1 > Collection functions interpreted execution doesn't work with complex types

[jira] [Commented] (SPARK-24367) Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487302#comment-16487302 ] Apache Spark commented on SPARK-24367: -- User 'gengliangwang' has created a pull request for this

[jira] [Assigned] (SPARK-24367) Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24367: Assignee: Apache Spark > Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag

[jira] [Assigned] (SPARK-24367) Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24367: Assignee: (was: Apache Spark) > Parquet: use JOB_SUMMARY_LEVEL instead of deprecated

[jira] [Created] (SPARK-24367) Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY

2018-05-23 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-24367: -- Summary: Parquet: use JOB_SUMMARY_LEVEL instead of deprecated flag ENABLE_JOB_SUMMARY Key: SPARK-24367 URL: https://issues.apache.org/jira/browse/SPARK-24367

[jira] [Commented] (SPARK-24359) SPIP: ML Pipelines in R

2018-05-23 Thread Maciej Szymkiewicz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487209#comment-16487209 ] Maciej Szymkiewicz commented on SPARK-24359: Just my two cents: * As proposed right now,

[jira] [Updated] (SPARK-24366) Improve error message for Catalyst type converters

2018-05-23 Thread Maxim Gekk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-24366: --- Summary: Improve error message for Catalyst type converters (was: Improve error message for type

[jira] [Updated] (SPARK-24366) Improve error message for type converting

2018-05-23 Thread Maxim Gekk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maxim Gekk updated SPARK-24366: --- Summary: Improve error message for type converting (was: Improve error message for type

[jira] [Commented] (SPARK-24366) Improve error message for type conversions

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487171#comment-16487171 ] Apache Spark commented on SPARK-24366: -- User 'MaxGekk' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24366) Improve error message for type conversions

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24366: Assignee: (was: Apache Spark) > Improve error message for type conversions >

[jira] [Assigned] (SPARK-24366) Improve error message for type conversions

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24366: Assignee: Apache Spark > Improve error message for type conversions >

[jira] [Created] (SPARK-24366) Improve error message for type conversions

2018-05-23 Thread Maxim Gekk (JIRA)
Maxim Gekk created SPARK-24366: -- Summary: Improve error message for type conversions Key: SPARK-24366 URL: https://issues.apache.org/jira/browse/SPARK-24366 Project: Spark Issue Type:

[jira] [Commented] (SPARK-24341) Codegen compile error from predicate subquery

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487120#comment-16487120 ] Apache Spark commented on SPARK-24341: -- User 'mgaido91' has created a pull request for this issue:

[jira] [Assigned] (SPARK-24341) Codegen compile error from predicate subquery

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24341: Assignee: (was: Apache Spark) > Codegen compile error from predicate subquery >

[jira] [Assigned] (SPARK-24341) Codegen compile error from predicate subquery

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24341: Assignee: Apache Spark > Codegen compile error from predicate subquery >

[jira] [Comment Edited] (SPARK-24324) UserDefinedFunction mixes column labels

2018-05-23 Thread Cristian Consonni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487110#comment-16487110 ] Cristian Consonni edited comment on SPARK-24324 at 5/23/18 11:36 AM: -

[jira] [Commented] (SPARK-24324) UserDefinedFunction mixes column labels

2018-05-23 Thread Cristian Consonni (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487110#comment-16487110 ] Cristian Consonni commented on SPARK-24324: --- [~hyukjin.kwon] said: > Ah, I meant shorter

[jira] [Commented] (SPARK-24365) Add Parquet write benchmark

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16487050#comment-16487050 ] Apache Spark commented on SPARK-24365: -- User 'gengliangwang' has created a pull request for this

[jira] [Assigned] (SPARK-24365) Add Parquet write benchmark

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24365: Assignee: (was: Apache Spark) > Add Parquet write benchmark >

[jira] [Assigned] (SPARK-24365) Add Parquet write benchmark

2018-05-23 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-24365: Assignee: Apache Spark > Add Parquet write benchmark > --- > >
