Thanks for the follow-up.
> Data will be in sync because it's stored in the Ignite cache. IgniteRDD uses
> the Ignite API to update it, and you can do this as well in your code.
>
> There is no copy of the data maintained in Spark; it's always stored in
> Ignite caches. Spark runs Ignite client(s) that can fetch the data for
> computation, but it doesn't store it.

I think I failed to make my point clear in my earlier comment. When I said that "I will have to discard the Spark RDD/Dataset/DataFrame every time the data is updated in Ignite through the Ignite API", what I also meant was that I could not cache the Dataset in Spark's memory for future transformations (using the Dataset.cache() Spark API), because if the Ignite cache is updated concurrently by another user, my Dataset in Spark would become stale. This happens because Spark acts as an Ignite client and fetches a copy of the data, rather than there being a tighter integration in which Spark could work against the same copy of the data on the Ignite server.

If my understanding is correct, I wanted to confirm that this behavior is no different when Ignite runs in embedded mode with Spark.

Kindly let me know.

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Apache-Spark-Ignite-Integration-tp8556p9121.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
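
P.S. To make the staleness concern concrete, here is a minimal, untested sketch (it assumes a running Ignite cluster, the ignite-spark module on the classpath, and a hypothetical cache named "myCache") of the scenario I am describing:

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("staleness-demo"))
val ic = new IgniteContext(sc, () => new IgniteConfiguration())

// IgniteRDD is a live view: each Spark action fetches from Ignite.
val live = ic.fromCache[Int, String]("myCache")

// Calling cache() materializes a snapshot in Spark executor memory...
val snapshot = live.map { case (k, v) => (k, v.toUpperCase) }.cache()
snapshot.count() // triggers the fetch and pins the snapshot in Spark

// ...so an update made through the Ignite API (here, by this client,
// but it could equally be another user) is visible on subsequent
// actions against `live`, but NOT against the cached `snapshot`.
ic.ignite().cache[Int, String]("myCache").put(1, "updated")
```

It is exactly this divergence between `live` and `snapshot` that I want to understand for embedded mode.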
