Re: Apache Spark & Ignite Integration

pragmaticbigdata Thu, 17 Nov 2016 05:50:26 -0800

I appreciate your follow-ups.

a. "Data is stored in Ignite and Spark will fetch data for a particular
partition when you execute something." Does IgniteRDD (i.e. Spark) fetch the
data to the closest Spark node, which probably resides on the same server? One
of the earlier responses mentions that this is done when new entries are
added to the cache.


a1. Could you please elaborate on how (a) is achieved? I looked at the
IgniteRDD.compute() method implementation, which creates a ScanQuery and
makes a call to the affinity API, but I didn't follow how the code
finds the closest Ignite node.

b. For igniteRDD.sql() query execution, it seems that the behavior, and hence
the performance, would be the same as executing a SQL query on the IgniteCache
from an Ignite client node. Is my understanding right? I understand
that the performance would be better than that of a similar Spark SQL
query because of the in-memory indexes.

c. How can I take advantage of Ignite's ACID transaction support when
doing the data processing in Spark? Based on one of the earlier points, the
code flow would look like:
         val sharedRDD1: IgniteRDD[Int, Int] = ic.fromCache("partitioned")
         val sharedRDD2: IgniteRDD[Int, Int] = ic.fromCache("anotherCache")

         val tx = Ignition.ignite().transactions().txStart()
         sharedRDD1.savePairs(...)
         sharedRDD2.savePairs(...)
         tx.commit()

Is my understanding of the flow correct? If so, how do I maintain
transaction isolation when other Spark jobs try to read the data from Ignite
in parallel?

Thanks.



--
View this message in context: 
http://apache-ignite-users.70518.x6.nabble.com/Apache-Spark-Ignite-Integration-tp8556p9047.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
