Re: how to refresh the loaded non-streaming dataframe for each streaming batch?

2019-09-06 Thread David Zhou
Not yet. Learning Spark.

On Fri, Sep 6, 2019 at 2:17 PM Shyam P wrote:
> Cool, but did you find a way, or any help or clue?
>
> On Fri, Sep 6, 2019 at 11:40 PM David Zhou wrote:
>> I have the same question as yours.
>>
>> On Thu, Sep 5, 2019 at 9:18 PM Shyam P

Question on streaming job wait and re-run

2019-09-06 Thread David Zhou
Hi,

My streaming job consumes data from Kafka and writes it into Cassandra.

Current status: Cassandra is not stable. The streaming job crashes when it can't write data into Cassandra. The streaming job has a checkpoint. Usually, the Cassandra cluster comes back in 4 hours. Finally, I start the
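One way to make the job wait instead of crash, sketched under assumptions: wrap the Cassandra write for each batch (for example inside a `foreachBatch` function, if this is Structured Streaming) in a retry loop with exponential backoff. If the job does give up, restarting it with the same checkpoint location should resume from the last committed Kafka offsets, provided Kafka retention still covers the outage. The failed batch may then be written a second time, so the writes should be idempotent (Cassandra inserts keyed by primary key are upserts). `write_with_retry` and its parameters are hypothetical; `write` stands in for the real Cassandra write.

```python
import time
from typing import Callable, Tuple, Type


def write_with_retry(
    write: Callable[[], None],
    max_attempts: int = 5,
    base_delay_s: float = 1.0,
    max_delay_s: float = 300.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError,),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call write() until it succeeds and return the number of attempts used.

    Only exceptions listed in retry_on are retried; after max_attempts
    failures the last exception is re-raised so the job can fail and be
    restarted from its checkpoint.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = base_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            write()
            return attempt
        except retry_on:
            if attempt == max_attempts:
                raise  # give up: surface the error to the streaming query
            sleep(delay)  # wait before the next attempt
            delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
    raise RuntimeError("unreachable")
```

With `base_delay_s=1.0` and `max_delay_s=300.0`, waiting out a 4-hour outage takes roughly 57 attempts (ignoring the time each write takes), so `max_attempts` would need to be around 60. Note that blocking inside the batch function also stalls the whole query for that time, which is the intended "wait" behaviour here.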

Re: how to refresh the loaded non-streaming dataframe for each streaming batch?

2019-09-06 Thread David Zhou
I have the same question as yours.

On Thu, Sep 5, 2019 at 9:18 PM Shyam P wrote:
> Hi,
>
> I am using spark-sql 2.4.1 for streaming in my PoC.
>
> How do I refresh the loaded DataFrame from an HDFS/Cassandra table every time
> a new batch of the stream is processed? What is the practice followed in general
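A minimal sketch of one common approach, under stated assumptions: in Spark 2.4 Structured Streaming, `DataStreamWriter.foreachBatch` hands each micro-batch and its batch id to a function, so the static table can be re-read inside that function (unpersisting any old cached copy), either on every batch or only when the copy is older than some TTL. The Python below models only that refresh policy, without Spark: `RefreshingReference`, `process_batch` and the TTL are hypothetical names, a dict stands in for the static DataFrame, and `loader` stands in for the batch read of the HDFS/Cassandra table.

```python
import time
from typing import Callable, Generic, Optional, TypeVar

T = TypeVar("T")


class RefreshingReference(Generic[T]):
    """Holds a static lookup dataset and reloads it once it is ttl_seconds old."""

    def __init__(self, loader: Callable[[], T], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._loader = loader          # e.g. a function that re-reads the table
        self._ttl = ttl_seconds
        self._clock = clock
        self._value: Optional[T] = None
        self._loaded_at: Optional[float] = None

    def get(self) -> T:
        now = self._clock()
        # Load on first use, or reload when the cached copy is stale.
        if self._loaded_at is None or now - self._loaded_at >= self._ttl:
            self._value = self._loader()
            self._loaded_at = now
        return self._value  # type: ignore[return-value]


def process_batch(batch, batch_id, reference: RefreshingReference):
    """Stand-in for a foreachBatch function: enrich each record via the lookup."""
    lookup = reference.get()  # refreshed here, once per micro-batch at most
    return [(key, lookup.get(key)) for key in batch]
```

Setting `ttl_seconds=0` reloads on every batch; a larger TTL trades freshness for fewer reads of the source table.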

Re: Start point to read source code

2019-09-05 Thread David Zhou
Hi Hichame,

Thanks a lot. I forked it. There is a lot of code; I need documentation to guide me on which part I should start from.

On Thu, Sep 5, 2019 at 1:30 PM Hichame El Khalfi wrote:
> Hey David,
>
> You can find the source code on GitHub:
> https://github.com/apache/spark
>
> Hope this helps,