Dear Ignite enthusiasts,

I am a beginner in Apache Ignite, but I want to prototype a solution that
uses Ignite caches for market data distributed across multiple nodes running
Spark RDD jobs.

I'd like to be able to send sequenced binary messages (numbered from 1; 40
bytes to 1 KB each) to a custom Spark job that processes a multidimensional
cube of parameters.
Each market data event must be processed exactly once, in order from #1 to
#records, for each parameter combination.
One batch contains roughly 40-50 M messages.

It would be great if you could share your experience with a similar
implementation.

My high-level thinking:
* Prepare the system by loading an Ignite cache: unzip the market data
drop-copy file, convert it to the preferred binary format, and publish it as
IgniteCache<Long, BinaryObject> (see the loader sketch after this list);
* Spawn a Spark job to process the input cube of parameters (Spark RDD),
with every task reading the same IgniteCache sequentially by sequence number
(keys 1 to #messages);
* Store the results in RDBMS/NoSQL storage;
* Build reports in Apache Zeppelin using the Spark R interpreter.
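
To make the load step concrete, here is a rough sketch of what I have in
mind, using IgniteDataStreamer for bulk loading. The cache name "marketData",
the type name "MarketDataEvent", the config file path, and the readEvents()
decode step are all placeholders of mine, not settled names:

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteDataStreamer;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.binary.BinaryObject;

    public class MarketDataLoader {
        public static void main(String[] args) {
            // Join the already running Ignite cluster as a client node.
            try (Ignite ignite = Ignition.start("ignite-client-config.xml")) {
                ignite.getOrCreateCache("marketData"); // ensure the cache exists

                // IgniteDataStreamer batches puts and routes them to the
                // owning nodes; much faster than put() for 40-50 M entries.
                try (IgniteDataStreamer<Long, BinaryObject> streamer =
                         ignite.dataStreamer("marketData")) {
                    streamer.keepBinary(true);

                    long seq = 1;
                    for (byte[] payload : readEvents()) {
                        BinaryObject evt = ignite.binary().builder("MarketDataEvent")
                            .setField("seq", seq)
                            .setField("payload", payload)
                            .build();
                        streamer.addData(seq, evt); // key = sequence number
                        seq++;
                    }
                }
            }
        }

        // Placeholder for unzipping the drop-copy file and decoding events
        // (40 B to 1 KB each); must yield them in sequence order.
        static Iterable<byte[]> readEvents() {
            return java.util.Collections.emptyList();
        }
    }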

I need the cache to outlive the Spark jobs, i.e. I may run a different cube
of parameters after one job finishes; my current idea for that is sketched
below.
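
The plan would be to run a standalone Ignite cluster started outside of
Spark, and have each Spark executor attach to it as a client node, so jobs
come and go while the data stays. A sketch, assuming Spark 2.x's Java API,
the "marketData" cache from above, and default discovery settings (the real
IgniteConfiguration would need discovery configured for the cluster):

    import org.apache.ignite.Ignite;
    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.Ignition;
    import org.apache.ignite.binary.BinaryObject;
    import org.apache.ignite.configuration.IgniteConfiguration;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    import java.util.ArrayList;
    import java.util.Iterator;
    import java.util.List;

    public class CubeJob {
        static JavaRDD<String> run(JavaSparkContext sc,
                                   List<String> paramCube, long totalMsgs) {
            return sc.parallelize(paramCube).mapPartitions(
                (Iterator<String> params) -> {
                    // Attach to the external Ignite cluster as a client;
                    // getOrStart() reuses one instance per executor JVM.
                    Ignite ignite = Ignition.getOrStart(
                        new IgniteConfiguration().setClientMode(true));
                    IgniteCache<Long, BinaryObject> cache =
                        ignite.<Long, BinaryObject>cache("marketData").withKeepBinary();

                    List<String> results = new ArrayList<>();
                    while (params.hasNext()) {
                        String p = params.next();
                        for (long seq = 1; seq <= totalMsgs; seq++) {
                            BinaryObject evt = cache.get(seq); // batched version below
                            // ... apply parameter combination p to evt ...
                        }
                        results.add(p); // placeholder for the real result
                    }
                    return results.iterator();
                });
        }
    }

I'm aware the ignite-spark module also provides an IgniteRDD that exposes a
cache to Spark directly; that may be cleaner, but I haven't tried it yet.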

I am not sure whether Ignite can look up messages efficiently enough (I'd
need a sustained retrieval rate of roughly 400 K messages/sec).
Or should I consider something more file-oriented, e.g. a memory-mounted
file system on each node?
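
On the retrieval rate, my understanding (please correct me) is that one
cache.get() per message pays a network round trip, which would make 400 K
messages/sec hard, but getAll() over key ranges amortizes the trips. A sketch
of what I would try, with the batch size as a tuning knob I made up:

    import org.apache.ignite.IgniteCache;
    import org.apache.ignite.binary.BinaryObject;

    import java.util.Map;
    import java.util.TreeSet;

    public class BatchedReplay {
        static final int BATCH = 10_000; // tune: memory vs. round trips

        // Replays events [1..total] in key order, one getAll() per batch
        // instead of one get() per message.
        static void replay(IgniteCache<Long, BinaryObject> cache, long total) {
            for (long from = 1; from <= total; from += BATCH) {
                long to = Math.min(from + BATCH - 1, total);

                TreeSet<Long> keys = new TreeSet<>();
                for (long k = from; k <= to; k++)
                    keys.add(k);

                // One network hop per owning node per batch.
                Map<Long, BinaryObject> batch = cache.getAll(keys);
                for (long k = from; k <= to; k++) {
                    BinaryObject evt = batch.get(k);
                    // ... process evt in sequence order ...
                }
            }
        }
    }

The other option I can think of is shipping the computation to the data
instead (affinity-colocated compute, or a local ScanQuery on each server
node), so the payloads never cross the network at all; would that be the
recommended approach here?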

Thanks in advance for sharing your ideas/proposals/know-how!



