Hi, can you please take a screenshot and show us the number of records that the streaming programme is reading from the source? If I am not mistaken, it should be writing records out to the output location every 5 seconds, given the ProcessingTime trigger in your code. A small sketch for printing the per-batch input counts from the query itself follows below.
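If a screenshot of the Spark UI is awkward to get, something along these lines (untested, and assuming the `q` and `spark` values from your mail below) would print how many rows each micro-batch actually read:

    // print the most recent micro-batch progress reports, including the
    // number of input rows read from the source in each batch
    q.recentProgress.foreach { p =>
      println(s"batch=${p.batchId} inputRows=${p.numInputRows} sink=${p.sink.description}")
    }

    // or register a listener before starting the query, so every completed
    // micro-batch logs its input row count
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    spark.streams.addListener(new StreamingQueryListener {
      override def onQueryStarted(e: QueryStartedEvent): Unit = ()
      override def onQueryProgress(e: QueryProgressEvent): Unit =
        println(s"batch=${e.progress.batchId} inputRows=${e.progress.numInputRows}")
      override def onQueryTerminated(e: QueryTerminatedEvent): Unit = ()
    })

If numInputRows stays at 0, the problem is on the read side rather than the ORC sink.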
Also, it may help to check whether you have permissions to write to the output location (a quick check is sketched after the quoted message below).

Thanks and Regards,
Gourav Sengupta

On Fri, Apr 22, 2022 at 3:57 PM hsy...@gmail.com <hsy...@gmail.com> wrote:
> Hello all,
>
> I'm trying to build a pipeline that reads data from a streaming source
> and writes it out as ORC files, but I don't see any files written to the
> file system, nor any exceptions.
>
> Here is an example:
>
>     val df = spark.readStream.format("...")
>       .option(
>         "Topic",
>         "Some topic"
>       )
>       .load()
>
>     val q = df.writeStream.format("orc")
>       .option("path", "gs://testdata/raw")
>       .option("checkpointLocation", "gs://testdata/raw_chk")
>       .trigger(Trigger.ProcessingTime(5, TimeUnit.SECONDS))
>       .start()
>     q.awaitTermination(1200000)
>     q.stop()
>
> I couldn't find any files until the 1200 seconds were over.
> Does this mean all the data is cached in memory? Even if I keep the
> pipeline running, no files are flushed to the file system.
>
> How do I control how often Spark streaming writes to disk?
>
> Thanks!
>
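For the permissions question above, a rough probe like the following (again untested; the path is the one from your mail, and "_perm_check" is just a throwaway object name) should fail immediately if the job's credentials cannot write to the bucket:

    import org.apache.hadoop.fs.{FileSystem, Path}

    // write and then delete a small marker object under the output prefix;
    // this throws straight away if the service account cannot write there
    val probe = new Path("gs://testdata/raw/_perm_check")
    val fs = probe.getFileSystem(spark.sparkContext.hadoopConfiguration)
    val out = fs.create(probe, true)
    out.writeBytes("ok")
    out.close()
    fs.delete(probe, false)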