Sure.

1. The first diagram is about the data-visibility aspect of the Spark
integration. Given that a cache exists on the Ignite node, Spark creates a
DataFrame from the IgniteRDD and performs an action (df.show()) on it. If
changes are made to the cache concurrently (either by another Spark
application or by another application using the Ignite API), the question
is: would the Spark worker see those changes? My understanding from our
discussion so far is that the df.show() action would not display the latest
changes in the cache, since the underlying IgniteRDD might be updated but
the DataFrame is another layer above it.
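If it helps make question 1 concrete, the issue boils down to snapshot vs. live-view semantics. Below is a minimal plain-Java analogy (not the Ignite or Spark API): a HashMap stands in for the cache and a copied list stands in for the DataFrame, so a later write to the "cache" is invisible through the "DataFrame".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshotVisibility {

    // Copies the cache's values out (the "DataFrame"), then mutates the
    // cache (the concurrent writer), and returns the earlier snapshot.
    static List<String> snapshotThenUpdate(Map<Integer, String> cache) {
        List<String> snapshot = new ArrayList<>(cache.values());
        cache.put(1, "alice-updated"); // concurrent update by another app
        cache.put(2, "bob");           // concurrent insert
        return snapshot;
    }

    public static void main(String[] args) {
        Map<Integer, String> cache = new HashMap<>();
        cache.put(1, "alice");

        List<String> snapshot = snapshotThenUpdate(cache);

        System.out.println(snapshot); // prints [alice]: misses later writes
        System.out.println(cache);    // the live "cache" has the new data
    }
}
```

This only illustrates the question being asked; whether the DataFrame behaves as such a snapshot is exactly what I am hoping someone can confirm.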

2. The second diagram is about the locking and concurrency behavior of the
Spark integration. Given that a cache exists on the Ignite node, Spark
creates a DataFrame from the IgniteRDD and adds a new column to the data
(the email column in the diagram). If changes are made to the cache
concurrently (either by another Spark application or by another application
using the Ignite API), the questions are:
a. What happens when Spark persists the RDD back to the Ignite cache
through the saveRDD() API? Would the changes made to the Ignite cache in
the meantime be lost?
b. What is the locking behavior when updating the Ignite cache? Would
Ignite lock all partitions of the cache, preventing read/write access, or
can it determine which partitions are going to be updated and lock only
those?
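To spell out the lost-update scenario I am worried about in 2a, here is another plain-Java analogy (again not the Ignite API; the map stands in for the cache, the copied map for the RDD Spark holds, and the email value is made up): a bulk write-back of values read earlier silently overwrites a concurrent update, since the last write wins per key.

```java
import java.util.HashMap;
import java.util.Map;

public class LostUpdate {

    // Simulates: Spark reads the cache, derives the email column, a
    // concurrent writer updates the cache, then Spark bulk-writes its
    // stale copy back. Returns the final value for key 1.
    static String writeBackOverConcurrentUpdate() {
        Map<Integer, String> cache = new HashMap<>();
        cache.put(1, "alice");

        // Spark's read plus the derived column (hypothetical email value).
        Map<Integer, String> rddCopy = new HashMap<>(cache);
        rddCopy.replaceAll((k, v) -> v + ",alice@example.com");

        // Meanwhile another application updates the cache.
        cache.put(1, "alice-renamed");

        // The saveRDD()-style write-back: stale values clobber the rename.
        cache.putAll(rddCopy);
        return cache.get(1);
    }

    public static void main(String[] args) {
        // Prints alice,alice@example.com: the concurrent rename is lost.
        System.out.println(writeBackOverConcurrentUpdate());
    }
}
```

Whether Ignite's write-back actually behaves this way (per-key last write wins) or offers some locking/transactional protection is the crux of questions 2a and 2b.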

Thanks.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Spark-Ignite-Integration-tp8556p9502.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.