Hello. I have a Python process that reads from a Kafka topic and, for each record, checks whether its id exists in a table.
```python
# Load table in memory
table = sqlContext.sql("select id from table")
table.cache()

def processForeach(time, rdd):
    print(time)
    for k in rdd.collect():
        if table.filter("id = '%s'" % k["id"]).count() > 0:
            print(k)

kafkaTopic.foreachRDD(processForeach)
```

The problem is that, little by little, Spark falls behind real time: I can see it in the `print(time)` output, even though the Kafka topic receives at most 3 messages per second.
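For context, here is a minimal pure-Python sketch (the names are illustrative, not taken from my Spark code) of what I am effectively doing per record. I am wondering whether collecting the ids into a local set once, instead of running a Spark filter job for every message, is the right direction:

```python
# Hypothetical stand-in for table.select("id").collect(): the ids are
# pulled into driver memory once, so each Kafka record costs an O(1)
# set lookup instead of a full Spark job.
known_ids = {"a1", "b2", "c3"}

def process_record(record):
    # record is a dict like the ones yielded by rdd.collect()
    return record["id"] in known_ids

print(process_record({"id": "b2"}))  # prints True
print(process_record({"id": "zz"}))  # prints False
```

This only works if the id column fits in driver memory and the table does not change during the streaming job, which may not hold in my case.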