I am trying to use PySpark to read a Kafka stream and then write it to Redis. However, PySpark does not have support for a ForEach sink. So, I am thinking of reading the Kafka stream into a DataFrame in Python and then handing that DataFrame to a Scala application to be written to Redis. Is there a way to do this? All I have found is to extract the JVM instance from the SparkSession and call into it, something like this:
```python
spark.sparkContext._jvm.com.application.writeToRedis(df._jdf)
```

Is this the correct approach?
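For context, here is a minimal sketch of the full flow I have in mind, assuming a Kafka broker at `localhost:9092`, a topic named `events`, and a Scala object `com.application.RedisWriter` with a `writeToRedis(df: DataFrame)` method on the driver's classpath (all of these names are placeholders):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kafka-to-redis").getOrCreate()

# Read the Kafka topic as a streaming DataFrame.
# Requires the spark-sql-kafka package on the classpath.
df = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "localhost:9092")  # placeholder broker
    .option("subscribe", "events")                        # placeholder topic
    .load()
)

# Hand the underlying Java DataFrame to the Scala side through the Py4J
# gateway. df._jdf is the Java DataFrame backing the Python DataFrame,
# and _jvm is the Py4J view of the driver JVM. RedisWriter is a
# hypothetical Scala object supplied by my application jar.
spark.sparkContext._jvm.com.application.RedisWriter.writeToRedis(df._jdf)
```

The Scala jar would be shipped with `spark-submit --jars` so that `com.application.RedisWriter` is visible to the driver JVM.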