[Spark SQL][Intermediate][How to] Custom transformation to datasource V2 write apis

2021-06-03 Thread Sivabalan
Hey folks, is it possible to add some custom transformations to a dataframe with a custom DataSource V2 write API? I understand I need to define Table -> SupportsWrite …
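For reference, a minimal sketch of the shape the Spark 3.x DSv2 write path expects. The class names and the per-row transform below are hypothetical, not the poster's code; DSv2 exposes no hook for rewriting the incoming DataFrame itself, so any per-row transformation would have to live inside the `DataWriter`.

```scala
import java.util

import scala.collection.JavaConverters._

import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.catalog.{SupportsWrite, Table, TableCapability}
import org.apache.spark.sql.connector.write._
import org.apache.spark.sql.types.StructType

// Hypothetical table that advertises batch-write support.
class SimpleTable(tableSchema: StructType) extends Table with SupportsWrite {
  override def name(): String = "simple_table"
  override def schema(): StructType = tableSchema
  override def capabilities(): util.Set[TableCapability] =
    Set(TableCapability.BATCH_WRITE).asJava

  // Spark calls this when it plans a write against the table.
  override def newWriteBuilder(info: LogicalWriteInfo): WriteBuilder =
    new WriteBuilder {
      override def buildForBatch(): BatchWrite = new SimpleBatchWrite
    }
}

class SimpleBatchWrite extends BatchWrite {
  override def createBatchWriterFactory(info: PhysicalWriteInfo): DataWriterFactory =
    new DataWriterFactory {
      override def createWriter(partitionId: Int, taskId: Long): DataWriter[InternalRow] =
        new SimpleDataWriter
    }
  override def commit(messages: Array[WriterCommitMessage]): Unit = ()
  override def abort(messages: Array[WriterCommitMessage]): Unit = ()
}

// Per-row custom transformation would happen here, inside write().
class SimpleDataWriter extends DataWriter[InternalRow] {
  override def write(row: InternalRow): Unit = { /* transform + persist the row */ }
  override def commit(): WriterCommitMessage = new WriterCommitMessage {}
  override def abort(): Unit = ()
  override def close(): Unit = ()
}
```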

Re: Kube estimate for Spark

2021-06-03 Thread Femi Anthony
I think he’s running Spark on Kubernetes, not YARN, as the cluster manager. > On Jun 3, 2021, at 6:05 AM, Mich Talebzadeh wrote: > Please provide the spark version, the environment you are running (on-prem, cloud etc), state if you are running in YARN etc and your spark-submit …

Questions about `CreateViewCommand`

2021-06-03 Thread Zhun Wang
hi, When using `CREATE OR REPLACE VIEW v0 AS SELECT ...`, there can be problems while the catalog state is between `dropTable` and `createTable`. Perhaps `CreateViewCommand` would more appropriately use `alterTable`. What are the problems with using `alterTable`? Why do we replace it with `dropTable` …
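A hypothetical illustration (not from the thread) of the window the poster describes: if REPLACE is implemented as `dropTable` followed by `createTable`, a concurrent reader between the two steps finds no view at all.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").appName("view-race").getOrCreate()

spark.sql("CREATE OR REPLACE VIEW v0 AS SELECT 1 AS id")

// A second REPLACE runs. Internally, per the poster's description:
//   1. dropTable(v0)        <-- view disappears from the catalog
//   2. createTable(v0, ...) <-- view reappears with the new definition
// A query issued from another session between steps 1 and 2 can fail:
spark.sql("SELECT * FROM v0") // may throw AnalysisException: Table or view not found
```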

Re: Reading Large File in Pyspark

2021-06-03 Thread Gourav Sengupta
Hi, could not agree more with Molotch :) Regards, Gourav Sengupta. On Thu, May 27, 2021 at 7:08 PM Molotch wrote: > You can specify the line separator to make Spark split your records into separate rows. > df = spark.read.option("lineSep","^^^").text("path") > Then you need to df.select( …
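A Scala rendition of the approach Molotch quotes (his snippet is PySpark): read with a custom record separator, then split each record into fields. The `"^^^"` separator comes from the thread; the `"|"` field delimiter is an assumption for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, split}

val spark = SparkSession.builder().master("local[*]").appName("lineSep-demo").getOrCreate()

// Treat "^^^" as the end-of-record marker, so each row of the resulting
// DataFrame holds one full record in its `value` column.
val df = spark.read
  .option("lineSep", "^^^")
  .text("path")

// Split each record into individual fields on a hypothetical "|" delimiter.
val parsed = df.select(split(col("value"), "\\|").alias("fields"))
```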

Re: Kube estimate for Spark

2021-06-03 Thread Mich Talebzadeh
Please provide the Spark version, the environment you are running in (on-prem, cloud, etc.), state whether you are running on YARN, and your spark-submit parameters. Have you checked the Spark UI (default port 4040) under the Stages and Executors tabs? HTH

Kube estimate for Spark

2021-06-03 Thread Subash Prabanantham
Hi Team, I am trying to understand how to estimate Kubernetes CPU with respect to Spark executor cores. For example, the job configuration (as submitted) was cores/executor = 4 and # of executors = 240, but the resources actually allocated when we ran the job were cores/executor = 4 and # of executors = 47. So the question …
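A sketch of the settings involved, with values that just mirror the numbers in the post. On Kubernetes, the executor's task slots (`spark.executor.cores`) and the pod's CPU request (`spark.kubernetes.executor.request.cores`) are separate knobs; if the latter is unset, the pod request falls back to `spark.executor.cores`.

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("kube-estimate")
  .config("spark.executor.cores", "4")                     // task slots per executor JVM
  .config("spark.executor.instances", "240")               // executors requested
  .config("spark.kubernetes.executor.request.cores", "4")  // CPU requested per executor pod
  .getOrCreate()
```

Note that executors only register as their pods are actually scheduled, so the allocated executor count can sit well below `spark.executor.instances` when the cluster lacks capacity for the remaining pods.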