Can somebody else help Ramki with the details here?

On Tue, Sep 27, 2016 at 6:46 PM, ramki <[email protected]> wrote:

> Thanks for your response Ted. Further questions / thoughts inline.
>
> On Mon, Sep 26, 2016 at 8:30 PM, Ted Dunning <[email protected]> wrote:
>
> > On Mon, Sep 26, 2016 at 4:05 PM, ramki <[email protected]> wrote:
> >
> > > Below are some initial (+ some dumb) questions that I have.
> >
> > Dumb questions are rare.
> >
> > Naive questions are common and often quite valuable.
> >
> > > 1) Does Drill have a way to represent the queries in JSON format? For
> > > example, "select ... where name = 'x' and age = 10 " can be written in
> > > JSON as {name = 'x', age ='10' }. You can think of it like Mongo
> > > queries. If this is not already there, can we implement the same on
> > > Drill to expose both SQL & JSON represented way to query?
> >
> > Yes, but no.
> >
> > There is an interior representation of the logical query that can be
> > injected in JSON form, but it isn't as simple as the Mongo query
> > language. Of course, the Mongo query language isn't nearly as simple as
> > it appears, either.
> >
> > It wouldn't be hard to build something that converts a simple form of
> > JSON query into something suitable for Drill (this is a purposeful
> > integration point), but it doesn't quite exist just now.
>
> Any docs / code link for the internal JSON structure? If I want to
> implement, where should I start?
>
> > > 2) Does Drill have any expectations from the data-sources in any ways?
> > > Can I plug any data-source to Drill by implementing the driver for it?
> > > Like if I want to add support for ElasticSearch and CouchBase, is it
> > > easily possible?
> >
> > It is pretty easy if you don't allow any push-down or clever
> > optimization.
> >
> > It isn't hugely harder if you support simple forms of push down.
> >
> > This is very doable for the cases you mention.
>
> Any sample data-source that I can use as a template?
>
> > > 3) Does Drill have abilities to "stream" the results and so we can
> > > build some sort of pipelines? For example, Reactive Streams?
> >
> > Internally, Drill works as a streaming engine, but only in the sense of
> > data streaming, not in the sense of an engine like Flink or Apex that
> > supports checkpoints and event time.
> >
> > There is also a strong assumption that queries have a finite lifetime in
> > the memory management.
> >
> > There has been some talk about making Drill into a true streaming
> > engine, but I don't think that there has been much progress in that
> > direction.
>
> Even if the query result is streamed, that would be good enough for my
> use-cases. How can this be added?
>
> > > 4) Is there any characterization of resource usage like CPU, memory...
> > > on data sources containing many terabytes of data?
> >
> > I think it is impossible to answer this in general other than to say
> > that Drill will usually spill to disk when it can't keep everything in
> > memory. There are a number of production use cases processing much
> > larger amounts of data than just terabytes with much smaller memory
> > sets. There are also knobs to turn that will strictly limit memory
> > usage.
> >
> > But saying anything more specific than that is probably impossible
> > unless you can give specifics.
>
> I will probably get Drill up and see it with our loads.
>
> > > 5) We can use Drill for querying only and not for ingestion, right?
> >
> > Yes and no.
> >
> > Drill has very good capabilities for [create tables as ...]. This can be
> > used to create files in directories and the directories can be queried
> > by Drill, thus reflecting a growing dataset. This works really well for
> > ingestion that works in fairly substantial chunks.
> >
> > There is no support for replace or update, but I could be a bit out of
> > date on that.
>
> I'm not sure if that is the direction I'm going. I might actually need
> ways to actually do insert into rather than create table as ... Will drop
> this requirement for now.
>
> For now, I would like to focus just on the JSON representation of a query
> as it's my immediate need.
>
> thanks much,
>
> -ramki
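
On question 1, while waiting for pointers to the internal JSON plan format: the "simple form of JSON query" idea can be sketched outside Drill by translating a flat, Mongo-style equality filter into a SQL string and submitting that. This is only an illustrative sketch, not anything that ships with Drill. The function names `sql_literal` and `filter_to_sql` and the table path are made up for the example, and it handles only top-level `field = value` pairs joined with AND.

```python
import json


def sql_literal(value):
    """Render a JSON scalar as a SQL literal (hypothetical helper)."""
    # bool must be tested before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "TRUE" if value else "FALSE"
    if isinstance(value, (int, float)):
        return repr(value)
    if isinstance(value, str):
        # Escape embedded single quotes by doubling them.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported literal: {value!r}")


def filter_to_sql(table, json_filter):
    """Turn a flat JSON object of equality conditions into a SELECT statement.

    `table` is assumed to be an already-quoted table reference supplied by
    trusted code, not by the end user.
    """
    conditions = json.loads(json_filter)
    if not isinstance(conditions, dict):
        raise ValueError("filter must be a JSON object")

    clauses = []
    for column, value in conditions.items():
        # Column names are wrapped in backticks, so reject embedded backticks.
        if "`" in column:
            raise ValueError(f"invalid column name: {column!r}")
        clauses.append(f"`{column}` = {sql_literal(value)}")

    sql = f"SELECT * FROM {table}"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql


if __name__ == "__main__":
    # Hypothetical table path; substitute a real workspace and file.
    print(filter_to_sql("dfs.`/data/people.json`", '{"name": "x", "age": 10}'))
```

Generating SQL text keeps the translator independent of Drill internals. Operators such as ranges, OR, or nested fields would need a richer filter grammar than this sketch covers.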
