What might be the biggest factor affecting running time here is that
Drill's query execution is not fault tolerant, while Spark's is. The
philosophies differ: Drill's says, "when you're doing interactive
analytics and a node dies, killing your query as it goes, just run the
query again."
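A toy cost model may make this trade-off concrete. It is not Drill or Spark code, and every name and number in it is hypothetical. It assumes a query made of n equal tasks. It also assumes that Spark-style execution pays a fixed per-task overhead to keep results recoverable (a stand-in for things like persisting shuffle output), and that a failure re-runs only the lost task. Drill-style execution pays no overhead, but a failure throws away the completed work and the whole query is rerun.

```python
from typing import Optional


def drill_style_cost(n_tasks: int, task_cost: int, fail_after: Optional[int] = None) -> int:
    """Query-level recovery (hypothetical model): no bookkeeping overhead,
    but a node failure kills the query. Tasks finished before the failure
    are wasted and the full query is run again."""
    if fail_after is None:
        return n_tasks * task_cost
    return (fail_after + n_tasks) * task_cost


def spark_style_cost(n_tasks: int, task_cost: int, overhead: int, failed: bool = False) -> int:
    """Task-level recovery (hypothetical model): every task pays `overhead`
    to stay recoverable, and a failure re-runs only the one lost task."""
    per_task = task_cost + overhead
    total = n_tasks * per_task
    if failed:
        total += per_task
    return total


if __name__ == "__main__":
    # 10 tasks of 10 time units each, 2 units of assumed per-task overhead.
    print(drill_style_cost(10, 10))                # no failure: 100
    print(spark_style_cost(10, 10, 2))             # no failure: 120
    print(drill_style_cost(10, 10, fail_after=7))  # node dies after 7 tasks: 170
    print(spark_style_cost(10, 10, 2, failed=True))  # one task retried: 132
```

With no failures, the "just run it again" model finishes first (100 vs 120 units). With a failure late in the query, task-level recovery wins (170 vs 132). Real engines are much more complicated. For example, Spark may have to recompute lost upstream partitions, not just a single task.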
On 2022/04/07 16:11, Wes Peng wrote:
Hi Jacek,
Spark and Drill have no direct relation, but they have a similar
architecture.
If you read the book "Learning Apache Drill" (I guess it's free
online), chapter 3 describes Drill's SQL engine architecture, which is
quite similar to Spark's.
The distributed execution architecture is also almost the same as
Spark's. Though they are separate products, they have similar
implementations, IMO.
No, I didn't use a statement optimized for Drill. It's just a common
SQL statement.
I think the reason Drill is faster is Drill's direct mmap
technology. It consumes more memory than Spark, so it is faster.
Thanks.
Jacek Laskowski wrote:
Is it true that Drill is Spark under the hood, or vice versa? If so,
how is it possible that Drill is faster? What does Drill do to make
the query faster? Could it be that you used a type of query Drill
is optimized for? Just guessing and really curious (not implying
that one is better or worse than the other(s)).