Hi all,

We upgraded a cluster of three nodes from Spark 3.0.1 to the new Spark
3.1.1 release. The cluster is running RHES 7.6 with YARN as the resource
manager.

We tested a job that creates random data and writes it to Hive 3.0. This
worked fine, as before.
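
For reference, a minimal sketch of such a job in PySpark is below. The
database, table name, and row count are placeholders, not the exact job we
ran:

from pyspark.sql import SparkSession
from pyspark.sql.functions import rand, randn

# Hive support is needed so that saveAsTable goes through the Hive metastore
spark = SparkSession.builder \
    .appName("RandomDataToHive") \
    .enableHiveSupport() \
    .getOrCreate()

# Generate rows of random data (the row count is arbitrary)
df = spark.range(0, 100000) \
    .withColumn("random_value", rand(seed=42)) \
    .withColumn("normal_value", randn(seed=42))

# Write to a hypothetical Hive table; adjust database/table as needed
df.write.mode("overwrite").saveAsTable("test.random_data")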

In version 3.0.1 we could run structured streaming from Kafka and write to
a Google BigQuery table. This was done using PySpark.

We had an issue running the same code on Google Dataproc, which was built
on Spark 3.1.1 RC2: it was not producing any results and was stuck on
BatchId = 0. I reported this on this forum a few days ago. Once we upgraded
our on-premise cluster to 3.1.1 today, we ran the same code under Spark
3.1.1 to populate Google BigQuery. I am pleased to say that this is now
working correctly!
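
For anyone trying the same pipeline, a minimal sketch of the streaming code
follows. The broker, topic, BigQuery table, GCS bucket, and checkpoint path
are all placeholders; the BigQuery write goes through foreachBatch because
the spark-bigquery connector writes each micro-batch in batch mode:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("KafkaToBigQuery").getOrCreate()

# Read the Kafka topic as a streaming source (broker/topic are placeholders)
stream_df = spark.readStream \
    .format("kafka") \
    .option("kafka.bootstrap.servers", "broker1:9092") \
    .option("subscribe", "my_topic") \
    .option("startingOffsets", "latest") \
    .load() \
    .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")

# Write each micro-batch to BigQuery via the spark-bigquery connector.
# The table and temporary GCS bucket are hypothetical names.
def write_to_bq(batch_df, batch_id):
    batch_df.write \
        .format("bigquery") \
        .option("table", "my_dataset.my_table") \
        .option("temporaryGcsBucket", "my-temp-bucket") \
        .mode("append") \
        .save()

query = stream_df.writeStream \
    .foreachBatch(write_to_bq) \
    .option("checkpointLocation", "/tmp/checkpoints/kafka_bq") \
    .start()

query.awaitTermination()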

The jar files needed with version 3.1.1 to read from Kafka and write to
BigQuery are as follows:

All are placed under $SPARK_HOME/jars on all nodes. These are the latest
available jar files:


   - commons-pool2-2.9.0.jar
   - spark-token-provider-kafka-0-10_2.12-3.1.0.jar
   - spark-sql-kafka-0-10_2.12-3.1.0.jar
   - kafka-clients-2.7.0.jar
   - spark-bigquery-latest_2.12.jar


The following entries are also added to
$SPARK_HOME/conf/spark-defaults.conf on all nodes:


spark.driver.extraClassPath        $SPARK_HOME/jars/*.jar

spark.executor.extraClassPath      $SPARK_HOME/jars/*.jar
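
Alternatively, the jars can be supplied per job with spark-submit instead
of through spark-defaults.conf. A sketch (the script name is a
placeholder):

spark-submit \
  --master yarn \
  --jars $SPARK_HOME/jars/commons-pool2-2.9.0.jar,$SPARK_HOME/jars/spark-token-provider-kafka-0-10_2.12-3.1.0.jar,$SPARK_HOME/jars/spark-sql-kafka-0-10_2.12-3.1.0.jar,$SPARK_HOME/jars/kafka-clients-2.7.0.jar,$SPARK_HOME/jars/spark-bigquery-latest_2.12.jar \
  kafka_to_bigquery.py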



HTH


LinkedIn:
https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw





Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.
