Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-29 Thread Yinan Li
Hi David, regarding the CPU limit: in Spark 2.3 we do have the following config properties to specify a CPU limit for the driver and executors (see http://spark.apache.org/docs/latest/running-on-kubernetes.html): spark.kubernetes.driver.limit.cores and spark.kubernetes.executor.limit.cores. On Thu, Mar 29, ...
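
A minimal sketch of how those two properties might be set alongside the core requests. The values are placeholders, and in 2.3 they would normally be passed to spark-submit as --conf flags rather than set in code:

    import org.apache.spark.SparkConf

    // Illustrative values only, not from the thread.
    val conf = new SparkConf()
      .set("spark.driver.cores", "1")                    // CPU request for the driver pod
      .set("spark.executor.cores", "2")                  // CPU request for each executor pod
      .set("spark.kubernetes.driver.limit.cores", "1")   // hard CPU limit for the driver pod
      .set("spark.kubernetes.executor.limit.cores", "2") // hard CPU limit for each executor pod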

[Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-29 Thread David Vogelbacher
Hi, at the moment driver and executor pods are created using the following requests and limits:
  CPU request:    [driver,executor].cores
  CPU limit:      unlimited (but can be specified using spark.[driver,executor].cores)
  Memory request: [driver,executor].memory
  Memory limit:   [driver,executor].memory + [driver,executor].memoryOverhead ...
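
To make the memory row concrete, a small sketch of the arithmetic. The 4 GiB figure is an illustrative assumption, and the 10% / 384 MiB fallback is Spark's documented default when the overhead property is not set explicitly:

    // Illustrative numbers only; not values from the thread.
    val executorMemoryMiB = 4096                                            // spark.executor.memory = 4g
    val overheadMiB       = math.max((executorMemoryMiB * 0.10).toInt, 384) // default memoryOverhead heuristic
    val podMemoryLimitMiB = executorMemoryMiB + overheadMiB                 // memory request == limit: 4505 MiB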

[DISCUSS] Catalog APIs and multi-catalog support

2018-03-29 Thread Ryan Blue
Hi everyone, as a follow-up to the SPIP to clean up SparkSQL logical plans, I've written up a proposal for catalog APIs that are required for Spark to implement reliable high-level ...

Re: DataSourceV2 write input requirements

2018-03-29 Thread Russell Spitzer
@RyanBlue I'm hoping that through the CBO effort we will continue to get more detailed statistics. For example, on read we could be using sketch data structures to get estimates of unique values and density for each column. You may be right that the real way for this to be handled would be giving a "cost" ...
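
As one concrete illustration of the sketch-based estimates mentioned here (not the API under discussion), Spark's built-in approx_count_distinct uses a HyperLogLog++ sketch to estimate distinct values; the column and error tolerance below are arbitrary:

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.functions.approx_count_distinct

    val spark = SparkSession.builder().master("local[*]").appName("sketch-demo").getOrCreate()
    val df = spark.range(0, 1000000).selectExpr("id % 1000 AS key")

    // HyperLogLog++ sketch: estimated distinct count within ~5% relative error,
    // without materializing the full set of distinct values.
    df.agg(approx_count_distinct("key", 0.05)).show()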

Re: DataSourceV2 write input requirements

2018-03-29 Thread Ryan Blue
Cassandra can insert records with the same partition key faster if they arrive in the same payload, but this is only beneficial if the incoming dataset has multiple entries for the same partition key. Thanks for the example; the recommended partitioning use case makes more sense now. I think we could ...
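
A rough sketch, on the user side, of the kind of clustering such a recommendation would ask for: repartition and sort by the Cassandra partition key so rows sharing a key end up in the same task and can be batched into the same payload. The keyspace, table, and column names are made up, and the format string assumes the DataStax spark-cassandra-connector is on the classpath:

    import org.apache.spark.sql.DataFrame
    import org.apache.spark.sql.functions.col

    // Hypothetical example: cluster rows by the Cassandra partition key before writing,
    // so entries with the same key arrive in the same task and can be batched together.
    def writeClustered(df: DataFrame): Unit = {
      df.repartition(col("partition_key"))
        .sortWithinPartitions(col("partition_key"))
        .write
        .format("org.apache.spark.sql.cassandra")              // spark-cassandra-connector data source
        .options(Map("keyspace" -> "ks", "table" -> "events")) // placeholder keyspace/table
        .mode("append")
        .save()
    }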

Re: [Spark R] Proposal: Exposing RBackend in RRunner

2018-03-29 Thread Jeremy Liu
The use case is to cache a reference to the JVM object created by SparkR. On Wed, Mar 28, 2018 at 12:03 PM Reynold Xin wrote: > If you need the functionality I would recommend you just copy the code over to your project and use it that way. > On Wed, Mar 28, 2018 at 9:02 AM Felix Cheung wrote: ...

Re: Build issues with apache-spark-on-k8s.

2018-03-29 Thread Yinan Li
For 2.3, the Dockerfile is under kubernetes/ in the tarball, not under the directory where you started the build. Once the build succeeds, copy the tarball out, untar it, and you should see the kubernetes/ directory in it. On Thu, Mar 29, 2018 at 3:00 AM, Atul Sowani wrote: ...

Re: Build issues with apache-spark-on-k8s.

2018-03-29 Thread Rob Vesse
Kubernetes support was only added as an experimental feature in Spark 2.3.0; it does not exist in the Apache Spark branch-2.2. If you really must build for Spark 2.2, you will need to use branch-2.2-kubernetes from the apache-spark-on-k8s fork on GitHub. Note that there are various functio...

Re: Build issues with apache-spark-on-k8s.

2018-03-29 Thread Atul Sowani
Thanks all for responding and helping me with the build issue. I tried building the code at git://github.com/apache/spark.git (master branch) in my ppc64le Ubuntu 16.04 VM and it failed. I tried building a specific branch (branch-2.2) using the following command: build/mvn -DskipTests -Pkubernetes clean ...