Can somebody else help Ramki with the details here?



On Tue, Sep 27, 2016 at 6:46 PM, ramki <[email protected]> wrote:

> Thanks for your response Ted. Further questions / thoughts inline.
>
> On Mon, Sep 26, 2016 at 8:30 PM, Ted Dunning <[email protected]>
> wrote:
>
> > On Mon, Sep 26, 2016 at 4:05 PM, ramki <[email protected]> wrote:
> >
> > > Below are some initial (+ some dumb) questions that I have.
> > >
> >
> > Dumb questions are rare.
> >
> > Naive questions are common and often quite valuable.
> >
> >
> > > 1) Does Drill have a way to represent queries in JSON format? For
> > > example, "select ... where name = 'x' and age = 10" can be written in
> > > JSON as {"name": "x", "age": 10}. You can think of it like Mongo
> > > queries. If this is not already there, can we implement it in Drill to
> > > expose both SQL and JSON ways to query?
> > >
> >
> > Yes, but no.
> >
> > There is an internal representation of the logical query that can be
> > injected in JSON form, but it isn't as simple as the Mongo query
> > language. Of course, the Mongo query language isn't nearly as simple as
> > it appears, either.
> >
> > It wouldn't be hard to build something that converts a simple form of
> > JSON query into something suitable for Drill (this is a purposeful
> > integration point), but it doesn't quite exist just now.
> >
> >
> Are there any docs or code links for the internal JSON structure? If I
> want to implement this, where should I start?
>
>
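As a starting point, the conversion mentioned above can be sketched outside of Drill entirely: translate a flat JSON filter such as {"name": "x", "age": 10} into a SQL string and submit that as an ordinary query. This is only a rough sketch under my own assumptions, not Drill's internal logical plan format. The function names (`filter_to_sql`, `_sql_literal`) and the table path are hypothetical, and only equality and null checks joined by AND are handled.

```python
import json
import re

# Hypothetical helper names; nothing here is part of Drill itself.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _sql_literal(value):
    """Render a JSON scalar as a SQL literal."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        # Double any single quotes so the string stays one literal.
        return "'" + value.replace("'", "''") + "'"
    raise ValueError(f"unsupported value type: {type(value).__name__}")


def filter_to_sql(table, filter_json):
    """Turn a flat JSON object of column/value pairs into a SELECT statement.

    Every pair becomes an equality test (or IS NULL); pairs are ANDed.
    """
    conditions = json.loads(filter_json)
    clauses = []
    for column, value in conditions.items():
        if not _IDENTIFIER.fullmatch(column):
            raise ValueError(f"invalid column name: {column!r}")
        if value is None:
            clauses.append(f"`{column}` IS NULL")
        else:
            clauses.append(f"`{column}` = {_sql_literal(value)}")
    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


if __name__ == "__main__":
    # The table path below is made up for illustration.
    print(filter_to_sql("dfs.`/data/people.json`", '{"name": "x", "age": 10}'))
```

The resulting string can be sent to Drill like any other SQL query, for example through the REST interface or JDBC. Mongo-style operators (ranges, OR, nested documents) would need a real grammar rather than this flat mapping.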
> >
> > >
> > > 2) Does Drill have any expectations of the data sources? Can I plug
> > > any data source into Drill by implementing a driver for it? For
> > > example, if I want to add support for ElasticSearch and CouchBase, is
> > > that easily possible?
> > >
> >
> > It is pretty easy if you don't allow any push-down or clever
> > optimization.
> >
> > It isn't hugely harder if you support simple forms of push down.
> >
> > This is very doable for the cases you mention.
> >
> >
> Any sample data-source that I can use as a template?
>
>
> >
> > >
> > > 3) Does Drill have the ability to "stream" results so that we can build
> > > some sort of pipeline? For example, Reactive Streams?
> > >
> >
> > Internally, Drill works as a streaming engine, but only in the sense of
> > data streaming, not in the sense of an engine like Flink or Apex that
> > supports checkpoints and event time.
> >
> > There is also a strong assumption that queries have a finite lifetime in
> > the memory management.
> >
> > There has been some talk about making Drill into a true streaming engine,
> > but I don't think that there has been much progress in that direction.
> >
>
> Even if only the query result is streamed, that would be good enough for
> my use cases. How can this be added?
>
>
> >
> > > 4) Is there any characterization of resource usage (CPU, memory, ...)
> > > on data sources containing many terabytes of data?
> > >
> >
> > I think it is impossible to answer this in general other than to say that
> > Drill will usually spill to disk when it can't keep everything in
> > memory. There are a number of production use cases processing much
> > larger amounts of data than just terabytes with much smaller memory
> > sets. There are also knobs to turn that will strictly limit memory usage.
> >
> > But saying anything more specific than that is probably impossible unless
> > you can give specifics.
> >
> >
> I will probably get Drill up and see it with our loads.
>
>
> >
> > >
> > > 5) We can use Drill for querying only and not for ingestion, right?
> > >
> >
> > Yes and no.
> >
> > Drill has very good capabilities for CREATE TABLE AS (CTAS). This can be
> > used to create files in directories, and the directories can be queried by
> > Drill, thus reflecting a growing dataset. This works really well for
> > ingestion that works in fairly substantial chunks.
> >
> > There is no support for replace or update, but I could be a bit out of
> > date on that.
> >
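To make the chunked ingestion pattern above concrete, here is a small sketch that builds the SQL for it: each batch is written by its own CREATE TABLE AS statement into a subdirectory, and the parent directory is then queried as one table. The workspace `dfs.tmp`, the directory `events`, the staging path, and the helper names are all assumptions of mine for illustration; the workspace has to be writable in your storage plugin configuration.

```python
# Hypothetical helpers that only build SQL strings; they do not talk to Drill.

def ctas_statement(workspace, dataset_dir, batch_id, select_sql):
    """Build a CTAS statement that writes one batch into its own subdirectory."""
    target = f"{workspace}.`{dataset_dir}/batch_{batch_id:05d}`"
    return f"CREATE TABLE {target} AS {select_sql}"


def dataset_query(workspace, dataset_dir):
    """Build a query over the parent directory, covering all batches so far."""
    return f"SELECT * FROM {workspace}.`{dataset_dir}`"


if __name__ == "__main__":
    # Paths below are made up for illustration.
    print(ctas_statement("dfs.tmp", "events", 7,
                         "SELECT * FROM dfs.`/staging/events_7.json`"))
    print(dataset_query("dfs.tmp", "events"))
```

Each new batch only adds files, which fits the lack of replace or update noted above.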
>
> I'm not sure that is the direction I'm going. I might actually need a way
> to do INSERT INTO rather than CREATE TABLE AS ... I will drop this
> requirement for now.
>
> For now, I would like to focus just on the JSON representation of a query,
> as it's my immediate need.
>
> thanks much,
>
> -ramki
>
