Re: Spark 3.4.1 and Hive 3.1.3

2023-09-07 Thread Yeachan Park
k/sql/hive. But when trying to build Spark 3.4.1 with Hive 2.3.9, the build completes successfully. Has anyone tried building Spark 3.4.1 with Hive 3.1.3 or higher? Thanks, Sanket A.

Re: Spark File Output Committer algorithm for GCS

2023-07-17 Thread Yeachan Park
Did you check whether mapreduce.fileoutputcommitter.algorithm.version=2 is supported on GCS? IIRC it wasn't, but you could check with GCP support. On Mon, Jul 17, 2023 at 3:54 PM Dipayan Dev wrote: > Thanks Jay, > I will try that option. > Any insight on the file committer algorithms? > I
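For reference, a minimal sketch of how that committer property is typically passed through a Spark conf (the "spark.hadoop." prefix forwards it to the underlying Hadoop configuration); whether version 2 actually behaves correctly on GCS is exactly what the thread is questioning, and the bucket path below is a made-up placeholder:

    from pyspark.sql import SparkSession

    # Forward the Hadoop property via the Spark conf; "spark.hadoop." hands it
    # to the Hadoop configuration used by the writers.
    spark = (
        SparkSession.builder
        .appName("committer-example")
        .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
        .getOrCreate()
    )

    # Any subsequent write to a gs:// path uses the configured committer algorithm.
    spark.range(10).write.mode("overwrite").parquet("gs://some-bucket/some/path")  # hypothetical bucket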

Loading in custom Hive jars for spark

2023-07-11 Thread Yeachan Park
Hi all, We made some changes to Hive which require changes to the Hive jars that Spark is bundled with. Since Spark 3.3.1 comes bundled with Hive 2.3.9 jars, we built our changes against Hive 2.3.9 and put the necessary jars under $SPARK_HOME/jars (replacing the original jars that were there),
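As a hedged sketch (not the approach the thread itself describes, and the jar paths are hypothetical): for metastore interaction specifically, Spark can also be pointed at a separate set of Hive jars via the spark.sql.hive.metastore.* options, instead of overwriting the bundled jars. Note this only swaps the metastore client classes, not the execution-side Hive classes under $SPARK_HOME/jars:

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("custom-hive-metastore-jars")
        .config("spark.sql.hive.metastore.version", "2.3.9")
        # "path" tells Spark to load the metastore jars from the list below.
        .config("spark.sql.hive.metastore.jars", "path")
        .config(
            "spark.sql.hive.metastore.jars.path",
            # hypothetical locations of the custom-built Hive jars
            "file:///opt/hive-custom/hive-metastore-2.3.9.jar,"
            "file:///opt/hive-custom/hive-exec-2.3.9.jar",
        )
        .enableHiveSupport()
        .getOrCreate()
    )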

Raise exception whilst casting instead of defaulting to null

2023-04-05 Thread Yeachan Park
Hi all, The default behaviour of Spark is to return null for casts that fail, unless ANSI SQL mode is enabled (SPARK-30292). Whilst I understand that this is a subset of ANSI-compliant behaviour, I don't understand why this feature is so
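A minimal illustration of the behaviour being discussed, using the config key introduced under SPARK-30292 (spark.sql.ansi.enabled); the sample data is made up:

    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = SparkSession.builder.appName("ansi-cast-example").getOrCreate()
    df = spark.createDataFrame([("123",), ("not a number",)], ["value"])

    # Default behaviour: the failed cast silently becomes null.
    spark.conf.set("spark.sql.ansi.enabled", "false")
    df.select(col("value").cast("int")).show()

    # With ANSI mode on, the same cast raises an error at runtime instead of producing null.
    spark.conf.set("spark.sql.ansi.enabled", "true")
    df.select(col("value").cast("int")).show()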

How to check the liveness of a SparkSession

2023-01-19 Thread Yeachan Park
Hi all, We have a long-running PySpark session in client mode that occasionally dies. We'd like to check whether the session is still alive. One solution we came up with was checking whether the UI is still up, but we were wondering if there's maybe an easier way than that. Maybe
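A rough sketch of two ways this could be probed from the client side: running a trivial action, or (as the thread suggests) polling the UI's REST endpoint. The host and port below are assumptions for a local driver, not something from the thread:

    import requests
    from pyspark.sql import SparkSession

    def session_is_alive(spark: SparkSession) -> bool:
        """Run a trivial action; if the session/context is dead this will throw."""
        try:
            spark.sql("SELECT 1").collect()
            return True
        except Exception:
            return False

    def ui_is_up(host: str = "localhost", port: int = 4040) -> bool:
        """Poll the Spark UI's REST API (host/port are placeholder assumptions)."""
        try:
            resp = requests.get(f"http://{host}:{port}/api/v1/applications", timeout=5)
            return resp.ok
        except requests.RequestException:
            return False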

Re: Converting None/Null into json in pyspark

2022-10-04 Thread Yeachan Park
> but I couldn't find the exact snippet. Could you share a sample snippet for the same, i.e. how do I set that property? My step: df = df.selectExpr(f'to_json(struct(*)) as json_data') On Tue, Oct 4, 2022 at 10:57 AM Yeachan Park wrote: >> Hi,

Re: Converting None/Null into json in pyspark

2022-10-03 Thread Yeachan Park
Hi, There's a config option for this. Try setting this to false in your Spark conf: spark.sql.jsonGenerator.ignoreNullFields. On Tuesday, October 4, 2022, Karthick Nk wrote: > Hi all, I need to convert a pyspark dataframe into JSON. While converting, if all row values are null/None
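A short sketch combining the suggested config with the to_json step from the thread (the sample data and column names are made up for illustration):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("json-nulls-example").getOrCreate()

    # Keep null fields in the generated JSON instead of dropping them.
    spark.conf.set("spark.sql.jsonGenerator.ignoreNullFields", "false")

    df = spark.createDataFrame([(1, None), (2, "b")], ["id", "name"])
    df.selectExpr("to_json(struct(*)) AS json_data").show(truncate=False)
    # With the flag set to false, the first row keeps "name":null rather than omitting the key.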

Filtering by job group in the Spark UI / API

2022-08-18 Thread Yeachan Park
Hi All, Is there a way to filter the jobs shown in the history server UI / Spark's API based on the Job Group to which the job belongs? Ideally we would like to supply a particular job group and only see the jobs associated with that job group in the UI. Thanks, Yeachan
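The UI itself doesn't appear to expose such a filter, but as a rough sketch one could tag jobs with setJobGroup and filter the jobs endpoint of the REST API client-side on its jobGroup field; the base URL and the job group name below are placeholder assumptions:

    import requests
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("job-group-example").getOrCreate()
    sc = spark.sparkContext

    # Tag subsequent jobs with a job group.
    sc.setJobGroup("nightly-etl", "nightly ETL run")
    spark.range(1000).count()

    # Fetch jobs from the live UI or history server and filter by group client-side.
    base_url = "http://localhost:4040"  # placeholder: live driver UI
    app_id = sc.applicationId
    jobs = requests.get(f"{base_url}/api/v1/applications/{app_id}/jobs", timeout=10).json()
    etl_jobs = [j for j in jobs if j.get("jobGroup") == "nightly-etl"]
    print(etl_jobs)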

Reading snappy/lz4 compressed csv/json files

2022-07-05 Thread Yeachan Park
Hi all, We are trying to use Spark to read csv/json files that have been snappy/lz4 compressed. The files were compressed with the lz4 command line tool and the python-snappy library. Neither succeeded, while other formats (bzip2 & gzip) worked fine. I've read in some places that the codec is not
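For reference, a minimal sketch of the case that did work according to the thread: Spark infers the decompression codec from the file extension, so gzip- or bzip2-compressed text sources read without extra options. The paths are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("compressed-read-example").getOrCreate()

    # Codec is inferred from the extension (.gz / .bz2); no extra option is needed.
    df_csv = spark.read.option("header", "true").csv("/data/input/events.csv.gz")
    df_json = spark.read.json("/data/input/events.json.bz2")

    df_csv.show(5)
    df_json.printSchema()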

[Spark Core]: Unexpectedly exiting executor while gracefully decommissioning

2022-04-22 Thread Yeachan Park
Hello all, we are running into some issues while attempting graceful decommissioning of executors. We are running spark-thriftserver (3.2.0) on Kubernetes (GKE 1.20.15-gke.2500). We enabled: - spark.decommission.enabled - spark.storage.decommission.rddBlocks.enabled -
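The snippet cuts off mid-list, so the full set of flags used in the thread isn't visible here; as a hedged sketch, a graceful-decommissioning setup typically enables roughly the following (only the first two flags are named in the thread, the rest are assumptions about commonly paired settings):

    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("decommission-example")
        # Flags named in the thread:
        .config("spark.decommission.enabled", "true")
        .config("spark.storage.decommission.rddBlocks.enabled", "true")
        # Related flags often enabled alongside them (assumption, not from the thread):
        .config("spark.storage.decommission.enabled", "true")
        .config("spark.storage.decommission.shuffleBlocks.enabled", "true")
        .getOrCreate()
    )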

[Spark Core]: Unexpectedly exiting executor while gracefully decommissioning

2022-04-21 Thread Yeachan Park
Hello all, we are running into some issues while attempting graceful decommissioning of executors. We are running spark-thriftserver (3.2.0) on Kubernetes (GKE 1.20.15-gke.2500). We enabled: - spark.decommission.enabled - spark.storage.decommission.rddBlocks.enabled -