Spark streaming connecting to two kafka clusters

2018-07-17 Thread Sathi Chowdhury
Hi, my question is about the ability to integrate Spark Streaming with multiple Kafka clusters. Is this a supported use case? For example, two topics are owned by different groups, and each group has its own Kafka infrastructure. Can I have two DataFrames as a result of spark.readStream listening to different
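
A minimal sketch of the setup the question describes, assuming Structured Streaming with the Kafka source package (spark-sql-kafka) on the classpath. Each source takes its own `kafka.bootstrap.servers` option, so two independent readers can point at two clusters. The broker addresses, topic names, and helper function names below are hypothetical and not from the original message.

```python
def kafka_source_options(bootstrap_servers, topic):
    """Build the options for one Kafka source; each source has its own broker list."""
    return {
        "kafka.bootstrap.servers": bootstrap_servers,
        "subscribe": topic,
        "startingOffsets": "latest",
    }


def read_kafka_stream(spark, bootstrap_servers, topic):
    """Return a streaming DataFrame bound to a single Kafka cluster.

    `spark` is an existing SparkSession supplied by the caller.
    """
    return (
        spark.readStream
        .format("kafka")
        .options(**kafka_source_options(bootstrap_servers, topic))
        .load()
    )


# Hypothetical broker addresses and topics, one per owning group.
CLUSTER_A = "kafka-a.example.com:9092"
CLUSTER_B = "kafka-b.example.com:9092"

opts_a = kafka_source_options(CLUSTER_A, "orders")
opts_b = kafka_source_options(CLUSTER_B, "payments")
```

With a live session, `df_a = read_kafka_stream(spark, CLUSTER_A, "orders")` and `df_b = read_kafka_stream(spark, CLUSTER_B, "payments")` would give two separate streaming DataFrames, each of which can be processed or written with its own query.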

Re: Dataset - withColumn and withColumnRenamed that accept Column type

2018-07-17 Thread Sathi Chowdhury
Hi, my question is about the ability to integrate Spark Streaming with multiple Kafka clusters. Is this a supported use case? For example, two topics are owned by different groups, and each group has its own Kafka infrastructure. Can I have two DataFrames as a result of spark.readStream listening to different

Re: Tune hive query launched thru spark-yarn job.

2019-09-05 Thread Sathi Chowdhury
What I can immediately think of is this: since you are using IN in the WHERE clause for a series of timestamps, you could consider breaking them up and, for each epoch timestamp, loading your results into an intermediate staging table, then doing a final aggregate from that table, keeping the group by
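
The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions: the table names (`events`, `events_staging`), the column names (`key`, `value`, `event_ts`), and the `SUM` aggregate are hypothetical placeholders, since the original query is not shown in the thread.

```python
def staging_insert_sql(epoch_ts, source="events", staging="events_staging"):
    """One INSERT per epoch timestamp, replacing a single large IN (...) filter."""
    return (
        f"INSERT INTO {staging} "
        f"SELECT key, value, event_ts FROM {source} "
        f"WHERE event_ts = {int(epoch_ts)}"
    )


def final_aggregate_sql(staging="events_staging"):
    """Final aggregate over the staging table, keeping the original GROUP BY."""
    return f"SELECT key, SUM(value) AS total FROM {staging} GROUP BY key"


def run_in_steps(spark, epoch_timestamps):
    """Load the staging table one timestamp at a time, then aggregate once.

    `spark` is an existing SparkSession with Hive support, supplied by the caller.
    """
    for ts in epoch_timestamps:
        spark.sql(staging_insert_sql(ts))
    return spark.sql(final_aggregate_sql())


# Hypothetical epoch timestamps standing in for the values in the IN list.
statements = [staging_insert_sql(ts) for ts in (1567641600, 1567645200)]
```

Each per-timestamp statement scans with a simple equality filter, and the grouping work happens once over the smaller staging table.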