My company provides big data analytics for large banks (managing and analyzing their loan portfolios). We have a number of applications that are fundamentally grid-based, but each tends to use a different framework for grid computation. We are considering shifting these to a common Hadoop stack to consolidate infrastructure and provide a more uniform way of managing our services, as well as to open up more options for different classes of analytics (MapReduce, streaming, etc.).
One of these applications seems to be a good fit for Ignite (lots of concurrent low-latency queries against a massive but highly partitionable dataset) and, possibly, Spark (distributed batch computing). It's the latter I'm uncertain about. I understand the general concept of the IgniteRDD as a bridge to a distributed Ignite cache (or a set of them), but do I give anything up by deploying our app as a Spark job, versus a custom YARN app that hosts Ignite nodes? I'm specifically looking at the implications for:

- affinity (both data-with-data and compute-with-data)
- advanced SQL queries (cross-cache joins, aggregations, etc.)
- persistence (warm-up, write-through)
- transactions

If I can still get all the benefits of IgniteCache while going the IgniteRDD route, then Spark seems a good fit, but I want to be clear on any limitations that such an abstraction might impose. Appreciate any guidance here.

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Downsides-of-Spark-Ignite-for-extended-cache-management-and-access-tp5154.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
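P.S. To make the question concrete, the IgniteRDD pattern I have in mind looks roughly like this (a sketch only; the cache name, key/value types, and query are illustrative, and I may be misreading the ignite-spark API):

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.SparkContext

// Assumed: an existing SparkContext `sc` and an Ignite cluster reachable
// with a default IgniteConfiguration.
val igniteContext = new IgniteContext(sc, () => new IgniteConfiguration())

// IgniteRDD is a live view over the named cache, not a one-time snapshot.
val loans = igniteContext.fromCache[String, Double]("loanCache")

// Writes from Spark go through to the underlying Ignite cache...
loans.savePairs(sc.parallelize(1 to 1000).map(i => (i.toString, i * 1.0)))

// ...and SQL is pushed down to the Ignite nodes rather than run in Spark.
val large = loans.sql("select _val from Double where _val > ?", 500.0)
```

My question is essentially whether this path preserves affinity, cross-cache SQL, write-through persistence, and transactional semantics, or whether some of those are only available when managing IgniteCache instances directly.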
