Re: Some questions on Drill.

Tugdual Grall Mon, 26 Sep 2016 08:29:33 -0700

Adding one small comment to the rich answer Ted just made.

I started creating a Couchbase storage plugin a few weeks... hmm, months
back. Feel free to continue the work (there is still a lot of work to do):


https://github.com/tgrall/drill/tree/couchbase-storage-plugin

Tug

On Monday, 26 September 2016, Ted Dunning <[email protected]> wrote:

> On Mon, Sep 26, 2016 at 4:05 PM, ramki <[email protected]>
> wrote:
>
> > Below are some initial (+ some dumb) questions that I have.
> >
>
> Dumb questions are rare.
>
> Naive questions are common and often quite valuable.
>
>
> > 1) Does Drill have a way to represent queries in JSON format? For
> > example, "select ... where name = 'x' and age = 10" can be written in
> > JSON as {name = 'x', age = 10}. You can think of it like Mongo queries.
> > If this is not already there, can we implement the same on Drill to
> > expose both SQL and JSON ways to query?
> >
>
> Yes, but no.
>
> There is an interior representation of the logical query that can be
> injected in JSON form, but it isn't as simple as the Mongo query language.
> Of course, the Mongo query language isn't nearly as simple as it appears,
> either.
>
> It wouldn't be hard to build something that converts a simple form of JSON
> query into something suitable for Drill (this is a purposeful integration
> point), but it doesn't quite exist just now.
>
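To illustrate the kind of converter described above, here is a minimal sketch that turns a flat, Mongo-style JSON equality filter into a SQL string that could be submitted to Drill. It is not an existing Drill feature. The function names `sql_literal` and `filter_to_sql` and the table path are hypothetical, and only `field = value` conditions joined with AND are handled.

```python
import json


def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted, quotes doubled)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported filter value: {value!r}")


def filter_to_sql(table, flt):
    """Build SELECT * FROM <table> WHERE k1 = v1 AND k2 = v2 ...

    `table` is passed through verbatim (assumed to be trusted input);
    column names are wrapped in backticks, which Drill uses to quote
    identifiers.
    """
    conditions = []
    for column, value in flt.items():
        if "`" in column:
            raise ValueError(f"invalid column name: {column!r}")
        conditions.append(f"`{column}` = {sql_literal(value)}")
    sql = f"SELECT * FROM {table}"
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


# Hypothetical table path, for illustration only.
query = json.loads('{"name": "x", "age": 10}')
print(filter_to_sql("dfs.`/data/people.json`", query))
```

Anything beyond flat equality (ranges, OR, nested documents) would need a richer translation, which is where the Mongo query language stops being simple.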
>
>
> >
> > 2) Does Drill have any expectations of the data sources? Can I plug
> > any data source into Drill by implementing a driver for it? For
> > example, if I want to add support for Elasticsearch and Couchbase, is
> > it easily possible?
> >
>
> It is pretty easy if you don't allow any push-down or clever optimization.
>
> It isn't hugely harder if you support simple forms of push down.
>
> This is very doable for the cases you mention.
>
>
> >
> > 3) Does Drill have the ability to "stream" results so that we can build
> > some sort of pipeline? For example, Reactive Streams?
> >
>
> Internally, Drill works as a streaming engine, but only in the sense of
> data streaming, not in the sense of an engine like Flink or Apex that
> supports checkpoints and event time.
>
> The memory management also strongly assumes that queries have a finite
> lifetime.
>
> There has been some talk about making Drill into a true streaming engine,
> but I don't think that there has been much progress in that direction.
>
>
> > 4) Is there any characterization of resource usage (CPU, memory, ...)
> > on data sources containing many terabytes of data?
> >
>
> I think it is impossible to answer this in general other than to say that
> Drill will usually spill to disk when it can't keep everything in memory.
> There are a number of production use cases that process much larger
> amounts of data than just terabytes with much smaller memory sets. There
> are also knobs to turn that will strictly limit memory usage.
>
> But saying anything more specific than that is probably impossible unless
> you can give specifics.
>
>
>
> >
> > 5) We can use Drill for querying only and not for ingestion, right?
> >
>
> Yes and no.
>
> Drill has very good capabilities for [create table as ...]. This can be
> used to create files in directories, and the directories can be queried by
> Drill, thus reflecting a growing dataset. This works really well for
> ingestion that happens in fairly substantial chunks.
>
> There is no support for replace or update, but I could be a bit out of date
> on that.
>
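The chunked ingestion pattern described above can be sketched as follows: each batch is written with a CREATE TABLE AS statement into its own subdirectory, and queries against the parent directory then see the growing dataset. This is a rough sketch under assumptions, not a tested ingestion pipeline. The helper names, the `events` dataset, and the source file path are hypothetical; `dfs.tmp` is assumed to be a writable workspace, and the payload shape is the one Drill's REST endpoint (`POST /query.json`) accepts.

```python
import json


def ctas_for_batch(workspace, dataset, batch_id, select_sql):
    """Build a CTAS statement that writes one batch into its own subdirectory."""
    if "`" in dataset:
        raise ValueError(f"invalid dataset name: {dataset!r}")
    target = f"{workspace}.`{dataset}/batch_{batch_id:05d}`"
    return f"CREATE TABLE {target} AS {select_sql}"


def rest_payload(sql):
    """Wrap a SQL string in the JSON body used for Drill's REST query endpoint."""
    return json.dumps({"queryType": "SQL", "query": sql})


# Hypothetical source file for the first batch.
stmt = ctas_for_batch(
    "dfs.tmp", "events", 1,
    "SELECT * FROM dfs.`/incoming/events-0001.json`",
)
print(stmt)
print(rest_payload(stmt))

# Querying the parent directory covers every batch written so far.
print("SELECT COUNT(*) FROM dfs.tmp.`events`")
```

Giving each batch its own directory means no statement ever has to modify existing files, which fits the lack of replace or update support mentioned above.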
