I am struggling to read data from Kafka and save it as parquet files
on HDFS using Spark Streaming, following this post.
My code is similar to the following:
    val df = spark.readStream.format("kafka")
      .option("kafka.bootstrap.servers", "host1:9092")
      .option("subscribe", "topic1").load()
    df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
The only difference is that I am writing it in Java.
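Roughly, my Java equivalent of the read side looks like this (a sketch; the bootstrap servers and topic name are placeholders, and the write side is omitted here):

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaToParquet")
                .getOrCreate();

        // Streaming read from Kafka (readStream, not read)
        Dataset<Row> df = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host1:9092") // placeholder
                .option("subscribe", "topic1")                   // placeholder
                .load();

        // Kafka key/value arrive as binary; cast them to strings
        Dataset<Row> strings = df.selectExpr(
                "CAST(key AS STRING)", "CAST(value AS STRING)");
    }
}
```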
But in practice, this code just runs once and then exits gracefully. Although
it produces the parquet file successfully and no exception is thrown,
it behaves like a normal Spark batch program rather than a Spark Streaming program.
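In other words, it behaves as if the job were a one-shot bounded read/write like the following sketch (hypothetical, not my exact code; servers, topic, and output path are placeholders):

```java
import org.apache.spark.sql.SparkSession;

public class KafkaBatchToParquet {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("KafkaBatchToParquet")
                .getOrCreate();

        // spark.read() (not readStream()) performs a single bounded read:
        // the job consumes the currently available offsets, writes one set
        // of parquet files, and then terminates.
        spark.read()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host1:9092") // placeholder
                .option("subscribe", "topic1")                   // placeholder
                .load()
                .selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
                .write()
                .parquet("hdfs:///tmp/kafka-out"); // placeholder path

        spark.stop();
    }
}
```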
What should I do if I want to continuously read from Kafka and save the data to parquet in batches?