Re: [Spark SQL] spark.sql insert overwrite on existing partition not updating hive metastore partition transient_lastddltime and column_stats

2025-05-02 Thread Sathi Chowdhury
I think it is not happening because it is a ddl time and upsert operation does not recreate the partition. It is just a dml statement.  Sent from Yahoo Mail for iPhone On Friday, May 2, 2025, 7:53 AM, Pradeep wrote: I have a partitioned hive external table as belowscala> spark.sql("describe

Re: BUG :: UI Spark

2024-05-26 Thread Sathi Chowdhury
Can you please explain how did you realize it’s wrong? Did you check cloudwatch for the same metrics and compare? Also are you using do.cache() and expecting that shuffle read/write to go away ? Sent from Yahoo Mail for iPhone On Sunday, May 26, 2024, 7:53 AM, Prem Sahoo wrote: Can anyone p

Re: Tune hive query launched thru spark-yarn job.

2019-09-05 Thread Sathi Chowdhury
What I can immediately think of is,  as you are doing IN in the where clause for a series of timestamps, if you can consider breaking them and for each epoch timestampYou can load your results to an intermediate staging table and then do a final aggregate from that table keeping the group by sam

Spark streaming connecting to two kafka clusters

2018-07-17 Thread Sathi Chowdhury
Hi,My question is about ability to integrate spark streaming with multiple clusters.Is it a supported use case. An example of that is that two topics owned by different group and they have their own kakka infra .Can i have two dataframes as a result of spark.readstream listening to different kaf

Re: Dataset - withColumn and withColumnRenamed that accept Column type

2018-07-17 Thread Sathi Chowdhury
Hi,My question is about ability to integrate spark streaming with multiple clusters.Is it a supported use case. An example of that is that two topics owned by different group and they have their own kakka infra .Can i have two dataframes as a result of spark.readstream listening to different kaf