Thanks for the follow up.

> Data will be in sync because it's stored in Ignite cache. IgniteRDD uses
> Ignite API to update it and you can do this as well in your code. 
> 
> There is no copy of the data maintained in Spark, it's always stored in
> Ignite caches. Spark runs Ignite client(s) that can fetch the data for
> computation, but it doesn't store it. 

I think I didn't make my point clear in my earlier comment. When I said
that "I will have to discard the Spark RDD/Dataset/DataFrame every time the
data is updated in Ignite through the Ignite API", what I also meant was
that I cannot cache the Dataset in Spark's memory for future
transformations (using the dataset.cache() Spark API), because if the
Ignite cache is updated concurrently by another user, my Dataset in Spark
becomes stale. This happens because Spark acts as an Ignite client and
fetches a copy of the data, rather than a tighter integration where Spark
could work against the same copy of the data held on the Ignite server.
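
To make the concern concrete, here is a minimal sketch of what I mean
(assumptions: a running Ignite cluster, an existing SparkContext `sc`, and
a hypothetical cache named "partitioned" with the config file path shown):

```scala
// Sketch only: requires a running Ignite cluster; the cache name
// "partitioned" and the XML config path are hypothetical.
import org.apache.ignite.spark.IgniteContext

val ic = new IgniteContext(sc, "config/example-cache.xml")
val igniteRdd = ic.fromCache[Int, String]("partitioned")

// Materialize a snapshot of the values in Spark executor memory for reuse:
val snapshot = igniteRdd.map(_._2).cache()
snapshot.count() // triggers the fetch; Spark now holds its own copy

// If another client updates the Ignite cache at this point, e.g.
//   ic.ignite().cache[Int, String]("partitioned").put(1, "updated")
// subsequent actions on `snapshot` still see the old data,
// until the Spark-side copy is discarded and re-fetched:
snapshot.unpersist()
```

This is the discard-and-refetch cycle I was referring to: the cached
Spark copy and the live Ignite cache are two separate copies.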

If my understanding is correct, I wanted to confirm that the behavior is no
different when Ignite runs in embedded mode with Spark. Kindly let me know.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Spark-Ignite-Integration-tp8556p9121.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
