Re: [EXTERNAL] Spark Thrift Server - Autoscaling on K8

2023-03-09 Thread Saurabh Gulati
Hey Jayabindu, We use thriftserver on K8S. May I ask why you are not going for Trino instead? I know it didn't support autoscaling when we tested it in the past, but I am not sure if it does now. Autoscaling also means that users might have to wait for the cluster to autoscale but that usually

Re: [EXTERNAL] Re: Online classes for spark topics

2023-03-09 Thread Saurabh Gulati
Hey guys, It's a nice idea and I appreciate the effort you guys are taking. I can add to the list of topics which might be of interest: * Spark UI * Dynamic allocation * Tuning of jobs * Collecting spark metrics for monitoring and alerting HTH From:

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-05 Thread Saurabh Gulati
and 2 single quotes together'' are looking like a single double quote ". Mvg/Regards Saurabh Gulati From: Saurabh Gulati Sent: 05 January 2023 12:24 To: Sean Owen Cc: User Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-05 Thread Saurabh Gulati
It's the same input except that headers are also being read with csv reader. Mvg/Regards Saurabh Gulati From: Sean Owen Sent: 04 January 2023 15:12 To: Saurabh Gulati Cc: User Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

Re: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within the data

2023-01-05 Thread Saurabh Gulati
Yes, there are other ways to solve this, but I am trying to understand why there is a difference in behaviour between df.show() and df.select("c").show() Mvg/Regards Saurabh Gulati From: Shay Elbaz Sent: 04 January 2023 14:54 To: Saurabh Gulati ; Sean Owen

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Saurabh Gulati
for row in csv_reader: print(row) ['a', 'b', 'c'] ['1', '', ',see what "I did",\ni am still writing'] ['2', '', 'abc'] And also, I don't understand why there is a distinction in outputs from df.show() and df.select("c").show() Mvg/Regards Saurabh Gulati Data Platform __
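
For readers who want to reproduce the csv_reader output quoted in this snippet, here is a minimal sketch using Python's standard csv module. The input string is an assumption: it is reconstructed from the rows printed above and is not copied from the original message, which is truncated in this listing.

```python
import csv
import io

# Assumed input, reconstructed from the rows printed in the snippet above.
# The third field of the second record contains the delimiter, doubled
# double quotes, and an embedded newline, all inside one quoted field.
raw = (
    '"a","b","c"\n'
    '"1","",",see what ""I did"",\ni am still writing"\n'
    '"2","","abc"\n'
)

# newline="" leaves newline handling to the csv module, so the embedded
# newline stays inside the quoted field instead of ending the record.
csv_reader = csv.reader(io.StringIO(raw, newline=""))
rows = list(csv_reader)

for row in rows:
    print(row)
```

With the default dialect, a doubled double quote inside a quoted field is read as one literal double quote, so the second record keeps three fields.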

Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within the data

2023-01-04 Thread Saurabh Gulati
"|null|null| |2 |null|abc | +---+----+--------+ df.select("c").show(10, False) ++ |c | ++ |",see what ""I did""| |null

Re: How to set a config for a single query?

2023-01-04 Thread Saurabh Gulati
Hey Felipe, Since you are collecting the dataframes, you might as well run them separately with desired configs and store them in your storage. Regards Saurabh From: Felipe Pessoto Sent: 04 January 2023 01:14 To: user@spark.apache.org Subject: [EXTERNAL] How to

Incorrect csv parsing when delimiter used within the data

2023-01-03 Thread Saurabh Gulati
Hello, We are seeing a case with csv data when it parses csv data incorrectly. The issue can be replicated using the below csv data "a","b","c" "1","","," "2","","abc" and using the spark csv read command. df = spark.read.format("csv")\ .option("multiLine", True)\ .option("escape", '"')\

Re: spark-submit fails in kubernetes 1.24.x cluster

2022-12-27 Thread Saurabh Gulati
erving the following deprecated API versions: Flow control ... kubernetes.io You will need to upgrade your endpoint to use the new general available endpoint. Regards Saurabh Gulati From: Thimme Gowda TP (Nokia) Sent: 23 December 2022 11:31 To: user@spark.apache.org S

Re: [EXTERNAL] Re: Spark streaming

2022-08-19 Thread Saurabh Gulati
You can also try out https://debezium.io/documentation/reference/0.10/connectors/mysql.html From: Ajit Kumar Amit Sent: 19 August 2022 14:30 To: sandra sukumaran Cc: user@spark.apache.org Subject: [EXTERNAL] Re: Spark streaming Caution! This email originated

Re: [EXTERNAL] Re: Spark streaming - Data Ingestion

2022-08-17 Thread Saurabh Gulati
Another take: * Debezium to read Write Ahead Logs (WAL) and send to Kafka * Kafka Connect to write to cloud storage -> Hive * OR * Spark streaming to parse WAL -> Storage -> Hive Regards

[Spark Core]: Unexpectedly exiting executor while gracefully decommissioning

2022-04-25 Thread Saurabh Gulati
Hey guys, My colleague tried to post a question twice but somehow it doesn't show up in our emails, but it does exist in the archive. So, I will post the question here again. We are running into some issues while attempting

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-03-10 Thread Saurabh Gulati
Hi Gourav, We use auto-scaling containers in GKE for running the Spark thriftserver. From: Gourav Sengupta Sent: 07 March 2022 14:36 To: Saurabh Gulati Cc: Mich Talebzadeh ; Kidong Lee ; user@spark.apache.org Subject: Re: [EXTERNAL] Re: Need to make WHERE

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-03-07 Thread Saurabh Gulati
for all types of queries. For example, we can parse the SQL and see if count(where) == count(partition_column), but this may not work for complex queries. Regards Saurabh From: Gourav Sengupta Sent: 05 March 2022 11:06 To: Saurabh Gulati Cc: Mich
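
The counting heuristic mentioned in this snippet could look roughly like the sketch below. It is only an illustration under stated assumptions: the function name `where_matches_partition`, the partition column name `dt`, and the sample queries are all hypothetical, and the check uses plain regular expressions rather than a real SQL parser.

```python
import re


def where_matches_partition(sql: str, partition_column: str) -> bool:
    """Naive check: the number of WHERE keywords equals the number of
    mentions of the partition column (hypothetical helper)."""
    where_count = len(re.findall(r"\bwhere\b", sql, flags=re.IGNORECASE))
    column_count = len(
        re.findall(rf"\b{re.escape(partition_column)}\b", sql, flags=re.IGNORECASE)
    )
    # Require at least one WHERE so that unfiltered queries are rejected.
    return where_count > 0 and where_count == column_count


# Hypothetical queries against a table partitioned on "dt".
filtered = where_matches_partition("SELECT * FROM t WHERE dt = '2022-03-01'", "dt")
unfiltered = where_matches_partition("SELECT * FROM t", "dt")
# False positive: "dt" appears only in the SELECT list, not in the filter.
misleading = where_matches_partition("SELECT dt FROM t WHERE x = 1", "dt")
```

The third query passes the check even though it does not filter on the partition column, which is one concrete way the heuristic breaks down on less trivial queries.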

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
which has the same overview of data as we have in gcs. We tried presto but performance was similar and presto didn't support auto scaling. TIA Saurabh From: Mich Talebzadeh Sent: 22 February 2022 16:49 To: Kidong Lee ; Saurabh Gulati Cc: user@spark.apache.org

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
To correct my last message, it's hive-metastore running as a service in a container and not hive. We use Spark-thriftserver for query execution. From: Saurabh Gulati Sent: 22 February 2022 16:33 To: Mich Talebzadeh Cc: user@spark.apache.org Subject: Re

Re: [EXTERNAL] Re: Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
third container. We use Spark on GKE setup to run thrift-server which spawns workers depending on the load. For buckets we use gcs. TIA Saurabh From: Mich Talebzadeh Sent: 22 February 2022 16:05 To: Saurabh Gulati Cc: user@spark.apache.org Subject: [EXTERNAL] Re

Need to make WHERE clause compulsory in Spark SQL

2022-02-22 Thread Saurabh Gulati
Hello, We are trying to set up Spark as the execution engine for exposing our data stored in lake. We have hive metastore running along with Spark thrift server and are using Superset as the UI. We save all tables as External tables in hive metastore with storage being on Cloud. We see that

Re: [EXTERNAL] Re: Unable to access Google buckets using spark-submit

2022-02-14 Thread Saurabh Gulati
Hey Karan, you can get the jar from here From: karan alang Sent: 13 February 2022 20:08 To: Gourav Sengupta Cc: Holden Karau ; Mich Talebzadeh ; user @spark

Re: [EXTERNAL] [Marketing Mail] Re: [Spark] Optimize spark join on different keys for same data frame

2021-10-05 Thread Saurabh Gulati
Hi Amit, The only approach I can think of is to create 2 copies of schema_df1, one partitioned on key1 and the other on key2, and then use these to join. From: Amit Joshi Sent: 04 October 2021 19:13 To: spark-user Subject: [EXTERNAL] [Marketing Mail] Re: [Spark]

Re: [EXTERNAL] [Marketing Mail] Reading SPARK 3.1.x generated parquet in SPARK 2.4.x

2021-08-12 Thread Saurabh Gulati
We had issues with this migration mainly because of changes in spark date calendars. See We got this working by setting the below params:

Spark 3.0.1 new Proleptic Gregorian calendar

2020-11-19 Thread Saurabh Gulati
Hello, First of all, Thanks to you guys for maintaining and improving Spark. We just updated to Spark 3.0.1 and are facing some issues with the new Proleptic Gregorian calendar. We have data from different sources in our platform and we saw there were some date/timestamp columns that go back