My company provides big data analytics for large banks (managing and analyzing their loan portfolios). We have a number of applications that are fundamentally grid-based, but each tends to use a different framework for grid computation. We are considering shifting these to a common Hadoop stack to consolidate infrastructure and provide a more uniform way of managing our services, as well as to open up more options for different classes of analytics (MapReduce, streaming, etc.).
One of these applications seems to be a good fit for Ignite (lots of concurrent low-latency queries against a massive but highly partitionable dataset) and, possibly, Spark (distributed batch computing). It's the latter I'm uncertain about. I understand the general concept of the IgniteRDD as a bridge to a distributed Ignite cache (or a set of them), but do I give anything up by deploying our app as a Spark job, versus a custom YARN app that hosts Ignite nodes? I'm specifically looking at the implications for:

- affinity (both data-with-data and compute-with-data)
- advanced SQL queries (cross-cache joins, aggregations, etc.)
- persistence (warm-up, write-through)
- transactions

If I can still get all the benefits of IgniteCache while going the IgniteRDD route, then Spark seems a good fit, but I want to be clear on any limitations that such an abstraction might impose. Appreciate any guidance here.

--
View this message in context: http://apache-ignite-users.70518.x6.nabble.com/Downsides-of-Spark-Ignite-for-extended-cache-management-and-access-tp5154.html
Sent from the Apache Ignite Users mailing list archive at Nabble.com.
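P.S. To make the question concrete, the IgniteRDD pattern I have in mind looks roughly like this (a sketch only; the cache name, key/value types, and query are illustrative, and I may be misreading the ignite-spark API):

```scala
import org.apache.ignite.configuration.IgniteConfiguration
import org.apache.ignite.spark.IgniteContext
import org.apache.spark.SparkContext

// Assumed: an existing SparkContext `sc` and an Ignite cluster reachable
// with a default IgniteConfiguration.
val igniteContext = new IgniteContext(sc, () => new IgniteConfiguration())

// IgniteRDD is a live view over the named cache, not a one-time snapshot.
val loans = igniteContext.fromCache[String, Double]("loanCache")

// Writes from Spark go through to the underlying Ignite cache...
loans.savePairs(sc.parallelize(1 to 1000).map(i => (i.toString, i * 1.0)))

// ...and SQL is pushed down to the Ignite nodes rather than run in Spark.
val large = loans.sql("select _val from Double where _val > ?", 500.0)
```

My question is essentially whether this path preserves affinity, cross-cache SQL, write-through persistence, and transactional semantics, or whether some of those are only available when managing IgniteCache instances directly.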
