Hi All, I am very impressed with the work done on Spark SQL; however, when I have to pick something to serve real-time queries, I am in a dilemma for the following reasons.
1. Even though Spark SQL has logical plans, physical plans, runtime code generation, and so on, it still doesn't look like the right tool for serving real-time queries the way we normally would from a database. I tend to think this is because every query has to go through job submission first; I don't want to call this overhead, but that is what it appears to be (a rough sketch of what I mean is at the end of this message). Compare that with the data we want to serve sitting in a database, where we simply issue a SQL query and get the response back. For that use case, what would be an appropriate tool? I tend to think it's Drill, but I would like to hear any interesting arguments.

2. I can see a case for Spark SQL with queries that need to be expressed iteratively, for example graph traversals such as BFS or DFS, or even simple pre-order, in-order, and post-order traversals of a BST. All of these are very hard to express in a declarative syntax like SQL (the second sketch below shows the kind of loop I mean). I also tend to think ad-hoc distributed joins (by ad-hoc I mean one is not certain about one's query patterns) are better expressed in map-reduce style than in SQL, unless one knows the query patterns well enough ahead of time that queries requiring redistribution are unlikely. I am sure there are plenty of other cases where Spark SQL will excel, but I wanted to ask: what is a good choice for simply serving the data?

Any suggestions are appreciated. Thanks!
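To illustrate the first point, here is a minimal sketch in Scala against the Spark 2.x SparkSession API (the table and the timing code are made up for the example, not taken from any benchmark). Even this single-row lookup is parsed, optimized, code-generated, and then submitted to the scheduler as a job before the row comes back, which is the per-query cost I am asking about:

import org.apache.spark.sql.SparkSession

object PointLookupSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("point-lookup-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // A tiny table standing in for the data we want to serve.
    Seq((1L, "alice"), (2L, "bob")).toDF("id", "name")
      .createOrReplaceTempView("users")

    // collect() forces planning, code generation, and job submission,
    // even though the query only touches one row.
    val start = System.nanoTime()
    val rows  = spark.sql("SELECT name FROM users WHERE id = 1").collect()
    val ms    = (System.nanoTime() - start) / 1e6
    println(s"result = ${rows.mkString(", ")}, latency = $ms ms")

    spark.stop()
  }
}

Against an indexed database table the same lookup is a direct read with no scheduler in the path, which is what makes me hesitate to use Spark SQL as the serving layer.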
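To illustrate the second point, here is a minimal sketch of BFS written as a driver-side loop of one-hop joins over an edge table (the toy graph and all the names are mine, just for illustration). The while loop is exactly the iterative part that a single declarative SQL statement has no way to express:

import org.apache.spark.sql.SparkSession

object IterativeBfsSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("iterative-bfs-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Directed edges of a toy graph: 1 -> 2 -> 3, 1 -> 4.
    val edges = Seq((1L, 2L), (2L, 3L), (1L, 4L)).toDF("src", "dst")

    var frontier = Seq(1L).toDF("id") // start BFS from vertex 1
    var visited  = frontier

    var done = false
    while (!done) {
      // Expand the frontier by one hop: join frontier ids against edge sources.
      val next = frontier.withColumnRenamed("id", "src")
        .join(edges, "src")
        .select($"dst".as("id"))
        .except(visited) // EXCEPT has set semantics, so this also deduplicates
      if (next.count() == 0L) done = true
      else {
        visited  = visited.union(next)
        frontier = next
      }
    }
    visited.show()
    spark.stop()
  }
}

A real implementation would cache or checkpoint each iteration to keep the lineage from growing (or just use GraphX's Pregel), but the point stands: the loop lives in the driver program, not in SQL.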