Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread wuyi
Could this be a sub-task of https://issues.apache.org/jira/browse/SPARK-25299 (Use remote storage for persisting shuffle data)? It would be good if we could put the whole of SPARK-25299 in Spark 3.1. Holden Karau wrote > Should we also consider the

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Jungtaek Lim
Does this count only "new features" (probably major), or also "improvements"? I'm aware of a couple of improvements which should ideally be included in the next release, but if this counts only major new features then I don't feel they should be listed. On Tue, Jun 30, 2020 at 1:32 AM Holden

Contribute to Apache Spark

2020-06-29 Thread ????????
Hi, I want to contribute to Apache Spark. Would you please give me the contributor permission? My JIRA ID is suizhe007.

Re: Spark 3 pod template for the driver

2020-06-29 Thread edeesis
If I could muster a guess, you still need to specify the executor image. As is, this will only specify the driver image. You can specify it as --conf spark.kubernetes.container.image or --conf spark.kubernetes.executor.container.image
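As a rough illustration of the point above, the following sketch assembles a spark-submit argument list that sets either one shared image or separate driver and executor images. The three spark.kubernetes.*.container.image keys are real Spark configuration properties; the helper name build_submit_args, the master URL, the image names, and the application path are hypothetical placeholders, not values from this thread.

```python
def build_submit_args(master, image, app, executor_image=None):
    """Build a spark-submit argument list for Kubernetes (illustrative helper).

    With only `image`, the shared spark.kubernetes.container.image key is used,
    which covers both driver and executors. With `executor_image` set, the
    driver and executor images are specified separately.
    """
    args = ["spark-submit", "--master", master, "--deploy-mode", "cluster"]
    if executor_image is None:
        # One key sets the image for both driver and executor pods.
        args += ["--conf", f"spark.kubernetes.container.image={image}"]
    else:
        # Separate keys: omitting the executor key leaves executors without an image.
        args += [
            "--conf", f"spark.kubernetes.driver.container.image={image}",
            "--conf", f"spark.kubernetes.executor.container.image={executor_image}",
        ]
    args.append(app)
    return args


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    print(" ".join(build_submit_args(
        "k8s://https://example.invalid:6443",
        "example/spark:3.0.0",
        "local:///opt/spark/examples/app.jar",
    )))
```

Setting the shared key is the shorter option when driver and executors run the same image; the split keys only matter when the two differ.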

Re: UnknownSource NullPointerException in CodeGen. with Custom Strategy

2020-06-29 Thread wuyi
Hi Nasrulla, Could you give a complete demo to reproduce the issue?

Re: Spark 3 pod template for the driver

2020-06-29 Thread Michel Sumbul
Hello, Adding the dev mailing list; maybe there is someone here who can help by showing a valid/accepted pod template for Spark 3? Thanks in advance, Michel On Fri, Jun 26, 2020 at 14:03, Michel Sumbul wrote: > Hi Jorge, > If I set that in the spark submit command it works but I want it

Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-29 Thread wuyi
Thank you for your effort, Holden. I left a few comments in the SPIP. I asked for some details, though I know some of the contents have been included in the design doc. I'm not very clear about the difference between the design doc and the SPIP. But from what I saw at the SPIP template questions, I think some

Re: Setting spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1 and Doc issue

2020-06-29 Thread Steve Loughran
v2 does a file-by-file copy to the dest dir in task commit; v1 promotes task attempts to the job attempt dir by directory rename, and job commit lists those and moves the contents. If the worker fails during task commit, the next task attempt has to replace every file, so it had better use the same filenames.
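The difference described above can be sketched with a toy model on the local filesystem. This is a simplified simulation under stated assumptions, not Hadoop's FileOutputCommitter code: the function names, the fail_after parameter, and the directory layout are all hypothetical, and the failure case is shown against the file-by-file (v2-style) commit.

```python
import os
import shutil
import tempfile


def write_attempt(path, files):
    """Create a task attempt directory holding the given {name: text} files."""
    os.makedirs(path)
    for name, text in files.items():
        with open(os.path.join(path, name), "w") as f:
            f.write(text)
    return path


def task_commit_v1(task_attempt_dir, job_attempt_dir, task_id):
    """v1-style task commit: promote the whole attempt with one directory rename."""
    target = os.path.join(job_attempt_dir, task_id)
    if os.path.exists(target):
        shutil.rmtree(target)  # a retried task replaces the earlier attempt
    os.rename(task_attempt_dir, target)


def job_commit_v1(job_attempt_dir, dest_dir):
    """v1-style job commit: list the committed tasks and move their contents."""
    for task_id in sorted(os.listdir(job_attempt_dir)):
        task_dir = os.path.join(job_attempt_dir, task_id)
        for name in sorted(os.listdir(task_dir)):
            os.replace(os.path.join(task_dir, name), os.path.join(dest_dir, name))


def task_commit_v2(task_attempt_dir, dest_dir, fail_after=None):
    """v2-style task commit: move files one by one straight into the destination.

    fail_after simulates a worker dying after that many files were moved.
    """
    for i, name in enumerate(sorted(os.listdir(task_attempt_dir))):
        if fail_after is not None and i >= fail_after:
            raise RuntimeError("simulated worker failure during task commit")
        os.replace(os.path.join(task_attempt_dir, name), os.path.join(dest_dir, name))


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    dest = os.path.join(root, "dest")
    os.mkdir(dest)
    first = write_attempt(os.path.join(root, "attempt_0"), {"part-0000": "a", "part-0001": "b"})
    try:
        task_commit_v2(first, dest, fail_after=1)
    except RuntimeError:
        pass
    # A partial result from the failed attempt is already visible in dest.
    print(sorted(os.listdir(dest)))
    shutil.rmtree(root)
```

In this model, a retry that writes the same filenames overwrites whatever the failed attempt left behind; a retry that used different filenames would leave the stale files in the destination.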

Re: Contract for PartitionReader/InputPartition for ColumnarBatch?

2020-06-29 Thread Bobby Evans
Micah, You are correct. The contract for processing ColumnarBatches is that the code that produced the batch is responsible for closing it and anything downstream of it cannot keep any references to it. This is just like with UnsafeRow. If an UnsafeRow is cached, like for aggregates or sorts, it
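The ownership rule stated above (the producer closes the batch, downstream code must not keep references) can be illustrated with a toy model. This is a sketch under stated assumptions and not Spark's ColumnarBatch API: the Batch class, produce, and sum_and_cache are hypothetical names invented for the example.

```python
class Batch:
    """Toy stand-in for a columnar batch that owns a releasable buffer."""

    def __init__(self, values):
        self._values = list(values)
        self.closed = False

    def values(self):
        if self.closed:
            raise ValueError("batch already closed")
        return self._values

    def close(self):
        self._values = None
        self.closed = True


def produce(chunks):
    """Producer: creates each batch and is the one responsible for closing it."""
    for chunk in chunks:
        batch = Batch(chunk)
        try:
            yield batch
        finally:
            # Runs when the consumer asks for the next batch (or stops early).
            batch.close()


def sum_and_cache(batches):
    """Consumer: anything it wants to keep must be copied out of the batch."""
    total = 0
    cached = []
    for batch in batches:
        vals = batch.values()
        total += sum(vals)
        cached.append(list(vals))  # copy the data, do not hold the batch itself
    return total, cached


if __name__ == "__main__":
    print(sum_and_cache(produce([[1, 2], [3, 4]])))
```

A consumer that stored the Batch objects themselves would find them closed as soon as it advanced to the next batch, which is why the copy is made before moving on.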

Re: java.lang.ClassNotFoundException for s3a comitter

2020-06-29 Thread Steve Loughran
You are going to need hadoop-3.1 on your classpath, with hadoop-aws and the same aws-sdk it was built with (1.11.something). Mixing hadoop JARs is doomed. Using a different aws sdk jar is a bit risky, though more recent upgrades have all been fairly low stress. On Fri, 19 Jun 2020 at 05:39, murat

Re: preferredlocations for hadoopfsrelations based baseRelations

2020-06-29 Thread Steve Loughran
Here's a class which lets you provide a function on a row-by-row basis to declare location: https://github.com/hortonworks-spark/cloud-integration/blob/master/spark-cloud-integration/src/main/scala/org/apache/spark/cloudera/ParallelizedWithLocalityRDD.scala needs to be in o.a.spark as something

Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-29 Thread Holden Karau
So from the template I believe the SPIP is supposed to be more high level and then design goes into the linked “design sketch.” What sort of detail would you like to see added? On Mon, Jun 29, 2020 at 1:38 AM wuyi wrote: > Thank you for your effort, Holden. > > I left a few comments in SPIP. I

Announcing ApacheCon @Home 2020

2020-06-29 Thread Rich Bowen
Hi, Apache enthusiast! (You’re receiving this because you’re subscribed to one or more dev or user mailing lists for an Apache Software Foundation project.) The ApacheCon Planners and the Apache Software Foundation are pleased to announce that ApacheCon @Home will be held online, September

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Maxim Gekk
Hi Dongjoon, I would add: - Filters pushdown to JSON (https://github.com/apache/spark/pull/27366) - Filters pushdown to other datasources like Avro - Support nested attributes of filters pushed down to JSON Maxim Gekk Software Engineer Databricks, Inc. On Mon, Jun 29, 2020 at 7:07 PM

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread JackyLee
Thank you for putting this forward. Can we put the support of view and partition catalog in version 3.1? AFAIT, these are great features in DSv2 and Catalog. With these, we can work well with warehouses such as Delta or Hive. https://github.com/apache/spark/pull/28147

Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-29 Thread wuyi
I've left the comments in SPIP, so let's discuss there. Holden Karau wrote > So from the template I believe the SPIP is supposed to be more high level > and then design goes into the linked “design sketch.” What sort of detail > would you like to see added? > > On Mon, Jun 29, 2020 at 1:38 AM

Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Dongjoon Hyun
Hi, All. After a short celebration of Apache Spark 3.0, I'd like to ask for the community's opinion on Apache Spark 3.1 feature expectations. First of all, Apache Spark 3.1 is scheduled for December 2020. - https://spark.apache.org/versioning-policy.html I'm expecting the following items: 1.

Re: Apache Spark 3.1 Feature Expectation (Dec. 2020)

2020-06-29 Thread Holden Karau
Should we also consider the shuffle service refactoring to support pluggable storage engines as targeting the 3.1 release? On Mon, Jun 29, 2020 at 9:31 AM Maxim Gekk wrote: > Hi Dongjoon, > > I would add: > - Filters pushdown to JSON (https://github.com/apache/spark/pull/27366) > - Filters

Re: [DISCUSS][SPIP] Graceful Decommissioning

2020-06-29 Thread Holden Karau
Ah, I had thought there was a larger issue given the scope of the comments. Excited to hear that is not the case. I'll respond in the doc :) On Mon, Jun 29, 2020 at 8:03 AM wuyi wrote: > I've left the comments in SPIP, so let's discuss there. > > > Holden Karau wrote > > So from the template I