Kubernetes reading a yaml file with Pyspark fails

2021-07-23 Thread Mich Talebzadeh
Hi, I have an issue with my PySpark job running in Kubernetes (testing on minikube). The project is zipped as DSBQ.zip and passed to spark-submit, with the zipped file on HDFS (the pod can read it). DSBQ.zip is zipped at the root and has the following structure: one of the py-files is DSBQ.zip
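A common cause of this kind of failure is trying to read a non-Python resource (such as a YAML file) with a plain `open()` after the project zip has been added via py-files: the file lives inside the archive, not on the pod's filesystem. A minimal sketch of reading a member directly out of the zip instead (the archive name `DSBQ.zip` comes from the thread; the member path `conf/config.yml` is an assumed example layout):

```python
import io
import zipfile


def read_text_from_zip(zip_path: str, member: str) -> str:
    """Return the text of `member` stored inside the archive at `zip_path`.

    Plain open() cannot see files packed inside a zip shipped with
    --py-files, but zipfile can read them without extracting.
    """
    with zipfile.ZipFile(zip_path) as zf:
        with zf.open(member) as f:
            return io.TextIOWrapper(f, encoding="utf-8").read()


# Hypothetical usage on the driver, assuming this layout inside the zip:
# text = read_text_from_zip("DSBQ.zip", "conf/config.yml")
```

The same idea works with `pkgutil.get_data()` when the zip is on `sys.path`, which is often the more idiomatic choice for packaged resources.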

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-23 Thread Mich Talebzadeh
Thanks Julien for the further info. I have been spending a few days of free time working on PySpark on Kubernetes, both on minikube and on Google Cloud Platform (GCP), which provides Spark on Google Kubernetes Engine (GKE). Frankly, my work on k8s has been a bit disappointing. In GCP the only available and supported

Re: Benchmarks on Spark running on Yarn versus Spark on K8s

2021-07-23 Thread Julien Laurenceau
Hi, good question! It is very dependent on your jobs and your developer team. The thing that differs most, in my view, is: 1/ data locality & fast reads. If your data are stored in an HDFS cluster (not HCFS) and your Spark compute nodes are allowed to run on the Hadoop nodes, then definitely use Yarn to

[Spark SQL] Why doesn't Spark SQL use code generation for my custom expression?

2021-07-23 Thread Han You
Hello, I’m writing a custom Spark Catalyst Expression with custom codegen, but it seems that Spark (3.0.0) doesn’t want to generate code and falls back to interpreted mode. I created my SparkSession with spark.sql.codegen.factoryMode=CODEGEN_ONLY and spark.sql.codegen.fallback=false, hoping that
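For reference, a minimal sketch of creating a session with the two codegen settings mentioned in the thread (this is a config fragment only; whether codegen is actually used still depends on the Expression implementing `doGenCode` correctly rather than mixing in `CodegenFallback`):

```python
from pyspark.sql import SparkSession

# Force the codegen object factory to never fall back to interpreted mode,
# and disable the whole-stage codegen fallback, so codegen failures surface
# as errors instead of silently reverting to interpretation.
spark = (
    SparkSession.builder
    .appName("codegen-debug")  # app name is an arbitrary example
    .config("spark.sql.codegen.factoryMode", "CODEGEN_ONLY")
    .config("spark.sql.codegen.fallback", "false")
    .getOrCreate()
)
```

Note that `spark.sql.codegen.factoryMode` is an internal configuration intended mainly for testing, so relying on it in production code is not advisable.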