Hello, I have the following scenario and was wondering if I can use Spark to address it.
I want to query two different data stores (say, Elasticsearch and MySQL) and then merge the two result sets based on a join key shared between them. Is Spark an appropriate tool for this join if the intermediate data sets are large? (This is a no-ETL scenario.) I was thinking of two possibilities:

1) Send the intermediate data sets to Spark as streams and have Spark do the join (a rough sketch of the non-streaming variant is in the P.S. below). The complexity here is dealing with multiple concurrent streams. If I don't use streams, there would be intermediate disk writes and data transfer to the Spark master.

2) Skip Spark and do the same thing with an in-memory distributed engine such as MemSQL or Redis.

What's the experts' view on this?

Regards,
Ashish
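
P.S. To make option 1 concrete, here is a minimal Scala sketch of the non-streaming variant, assuming the elasticsearch-spark connector and the MySQL Connector/J JDBC driver are on the classpath. The host names, the "orders" index, the "customers" table, and the customer_id join key are all hypothetical placeholders.

    import org.apache.spark.sql.SparkSession

    object EsMysqlJoinSketch {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder()
          .appName("es-mysql-join")
          .getOrCreate()

        // Pull one side from Elasticsearch via the elasticsearch-spark connector.
        val esDf = spark.read
          .format("org.elasticsearch.spark.sql")
          .option("es.nodes", "es-host:9200") // hypothetical ES node
          .load("orders")                     // hypothetical index

        // Pull the other side from MySQL over JDBC, split into parallel
        // partitions (assumes customer_id is numeric).
        val mysqlDf = spark.read
          .format("jdbc")
          .option("url", "jdbc:mysql://mysql-host:3306/shop") // hypothetical URL
          .option("dbtable", "customers")                     // hypothetical table
          .option("user", "reader")
          .option("password", "secret")
          .option("partitionColumn", "customer_id")
          .option("lowerBound", "1")
          .option("upperBound", "1000000")
          .option("numPartitions", "8")
          .load()

        // Spark shuffles both sides by the join key across the executors,
        // so neither full result set has to fit on a single machine.
        val joined = esDf.join(mysqlDf, Seq("customer_id")) // hypothetical key
        joined.show()

        spark.stop()
      }
    }

The point of the sketch is that the executors read from both stores in parallel and the join is a distributed shuffle, so nothing is funneled through the master; whether that scales acceptably for my data sizes is exactly what I'm asking.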