Hi All,
I am using Spark 3.0.1 Structured Streaming with PySpark.
The problem is that Spark is running only 1 executor with 1 task. Below is a
summary of what I am doing.
Can anyone help me understand why only one executor is used?
def process_events(event):
    fetch_actual_data()
    # many more steps

def fetch_actual_data():
    # applying operations on the actual data
    pass

df = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", KAFKA_URL) \
    .option("subscribe", KAFKA_TOPICS) \
    .option("startingOffsets", START_OFFSET) \
    .load() \
    .selectExpr("CAST(value AS STRING)")

query = df.writeStream \
    .foreach(process_events) \
    .option("checkpointLocation", "/opt/checkpoint") \
    .trigger(processingTime="30 seconds") \
    .start()
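For reference, my understanding is that the Kafka source's parallelism follows the number of Kafka topic partitions, so if that turns out to be the issue, one knob I could try is the Kafka source's minPartitions option (a sketch only; KAFKA_URL, KAFKA_TOPICS, and START_OFFSET are the same placeholders as above, and 8 is an arbitrary example value):

```python
# Sketch: same reader as above, but asking the Kafka source to split the
# input into at least 8 partitions via the `minPartitions` option of the
# Structured Streaming Kafka source.
df = spark.readStream.format("kafka") \
    .option("kafka.bootstrap.servers", KAFKA_URL) \
    .option("subscribe", KAFKA_TOPICS) \
    .option("startingOffsets", START_OFFSET) \
    .option("minPartitions", "8") \
    .load() \
    .selectExpr("CAST(value AS STRING)")
```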
Kind Regards,
Sachit Murarka