Re: Apache Drill vs Spark SQL

Pierce Lamb Fri, 07 Apr 2017 10:41:33 -0700

Hi Kant,

If you are interested in using Spark alongside a database to serve
real-time queries, there are many options. Almost every popular database
has built some sort of connector to Spark. I've listed most of them and
tried to differentiate them in this StackOverflow answer:


http://stackoverflow.com/a/39753976/3723346

As an employee of SnappyData <https://github.com/SnappyDataInc/snappydata>,
I'm biased toward its solution, in which Spark and the database are deeply
integrated and run in the same JVM. But there are many options depending on
your needs.

I'm not sure if the above link also answers your second question, but there
are two graph databases listed that connect to Spark as well.
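
To make the second question concrete: a traversal like BFS is naturally a
loop over a frontier, which is the part that is awkward to state as a single
declarative query. Here is a minimal sketch in plain Python (not Spark code;
the toy graph and the function name bfs_order are made up for illustration):

```python
from collections import deque


def bfs_order(graph, start):
    """Return the nodes reachable from `start` in breadth-first order.

    `graph` is assumed to be an adjacency list: {node: [neighbors]}.
    """
    visited = {start}
    order = []
    frontier = deque([start])
    while frontier:
        node = frontier.popleft()
        order.append(node)
        # Each step depends on the `visited` state built by earlier steps.
        for neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return order


# Hypothetical toy graph, only for illustration.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
print(bfs_order(graph, "a"))
```

Every pass through the loop reads state left behind by the previous pass,
and that dependency is what makes the traversal feel iterative rather than
declarative.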

Hope this helps,

Pierce

On Thu, Apr 6, 2017 at 10:34 PM, kant kodali <[email protected]> wrote:

> Hi All,
>
> I am very impressed with the work done on Spark SQL. However, when I have to
> pick something to serve real-time queries, I am in a dilemma for the
> following reasons.
>
> 1. Even though Spark SQL has logical plans, physical plans, run-time
> code generation and all that, it still doesn't look like the tool to serve
> real-time queries like we normally do from a database. I tend to think this
> is because the queries have to go through job submission first. I don't want
> to call this overhead or anything, but this is what it seems to do.
> By comparison, when the data we want to serve is sitting in a database, we
> simply issue an SQL query and get the response back. So for this use case,
> what would be an appropriate tool? I tend to think it's Drill, but I would
> like to hear if there are any interesting arguments.
>
> 2. I can see a case for Spark SQL, such as queries that need to be
> expressed in an iterative fashion. For example, doing a graph traversal such
> as BFS or DFS, or even a simple pre-order, in-order, or post-order traversal
> of a BST. All this would be very hard to express in a declarative syntax
> like SQL. I also tend to think ad-hoc distributed joins (by ad-hoc I mean
> one is not certain about the query patterns) are better expressed in
> map-reduce style than in SQL, unless one knows the query patterns well
> enough ahead of time that queries requiring redistribution are unlikely.
> I am also sure there are plenty of other cases where Spark SQL will excel,
> but what is a good choice to simply serve the data?
>
> Any suggestions are appreciated.
>
> Thanks!
>
>
