Re: Can Apache Drill perform streaming queries?

AnilKumar B Thu, 09 Nov 2017 13:11:39 -0800

You are correct Kant.

It would be great if you could raise a JIRA to discuss the *feasibility* of
incremental query support in Drill. I can also see this being a very useful
feature for plugins like Kafka, HBase and Cassandra. Thanks for asking this
question.


Thanks & Regards,
B Anil Kumar.

On Thu, Nov 9, 2017 at 12:45 PM, kant kodali <[email protected]> wrote:

> Hi Anil,
>
> Thanks a lot for your response. It looks like I am indeed looking for
> incremental queries. So if I have a thread that polls every second to get
> the latest updates, I just have to change the partition values to minimize
> the scans, right?
>
> Also, I guess I can build some notification mechanism in case my older
> partitions have an update.
>
> Thanks!
>
>
>
>
> On Thu, Nov 9, 2017 at 11:58 AM, AnilKumar B <[email protected]>
> wrote:
>
> > Hi Kant,
> >
> > If I understand your questions properly, you are looking for incremental
> > queries.
> >
> > Drill supports predicate pushdown for most data sources. In your case,
> > suppose you are generating hourly partitions in HDFS using a Spark
> > application. Drill is then optimized to scan only the specific partitions
> > that match the query predicates (by using partition pruning); see for
> > example https://issues.apache.org/jira/browse/DRILL-3121.
> >
> > But Drill will not manage any checkpointing. So if BI/dashboard tools like
> > Tableau can manage this checkpointing themselves, then it is possible to
> > query Drill incrementally.
> >
> > Coming to the new Kafka storage plugin: in the first version we are
> > targeting batch support. That is, at query time it will fetch all the
> > messages from the start to the end offsets of each topic partition and
> > process the data. For now it will support JSON, and in the next version we
> > are targeting Avro support with a schema registry. We are also discussing
> > the feasibility of specifying start and end offset ranges, so that we can
> > achieve incremental support by managing checkpointing externally.
> >
> > Thanks,
> > B Anil Kumar.
> >
> > On Thu, Nov 9, 2017 at 11:14 AM, kant kodali <[email protected]> wrote:
> >
> > > Can someone elaborate on what happens underneath if I poll every second
> > > (Specifically related to my questions in my previous email)?
> > >
> > > Thanks!
> > >
> > > On Thu, Nov 9, 2017 at 7:56 AM, Ted Dunning <[email protected]> wrote:
> > >
> > > > Confluent has a non-Apache product, I think, for streaming SQL.
> > > >
> > > >
> > > > On Thu, Nov 9, 2017 at 4:50 PM, Saurabh Mahapatra <[email protected]>
> > > > wrote:
> > > >
> > > > > Isn't there the new Kafka plugin? What does that exactly do?
> > > > >
> > > > > Best,
> > > > > Saurabh
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > >
> > > > >
> > > > > > > On Nov 9, 2017, at 5:15 AM, kant kodali <[email protected]> wrote:
> > > > > >
> > > > > > Hi Tug,
> > > > > >
> > > > > > > It's Parquet data on HDFS, and the data is constantly written to
> > > > > > > HDFS by Spark while consuming from Kafka.
> > > > > > >
> > > > > > > Is polling a common technique for, say, a real-time analytics
> > > > > > > dashboard? More importantly, if I poll, does Drill do the scan
> > > > > > > every time? If the answer is no, how does it know which data is
> > > > > > > new, since the data is written to HDFS constantly as a stream?
> > > > > > > (The query can be the same; however, the new data will be
> > > > > > > appended or updated to HDFS in Parquet format as a stream.)
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > >> On Thu, Nov 9, 2017 at 4:47 AM, Tugdual Grall <[email protected]> wrote:
> > > > > >>
> > > > > >> Hello,
> > > > > >>
> > > > > >>
> > > > > > >> Today Drill cannot do continuous/streaming queries, so as you
> > > > > > >> mentioned you will have to use a polling technique.
> > > > > > >>
> > > > > > >>
> > > > > > >> Just out of curiosity, which data source are you planning to use?
> > > > > >>
> > > > > >> Regards
> > > > > >> Tug
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > >>
> > > > > > >>> On Thu 9 Nov 2017 at 04:31, kant kodali <[email protected]> wrote:
> > > > > >>>
> > > > > >>> Hi All,
> > > > > >>>
> > > > > > >>> I am new to Apache Drill. I am wondering if Apache Drill can
> > > > > > >>> perform streaming queries? For example, I have a constant stream
> > > > > > >>> of data in a 24-hour period and I would like to get updates as
> > > > > > >>> soon as I receive them.
> > > > > > >>>
> > > > > > >>> Do I need to have a polling thread that issues a Drill query
> > > > > > >>> every second?
> > > > > >>>
> > > > > >>> Thanks!
> > > > > >>>
> > > > > >>
> > > > >
> > > >
> > >
> >
>
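
The polling approach discussed in this thread (partition pruning plus a
checkpoint managed outside Drill) could be sketched roughly as below. This is
only a minimal sketch under assumptions: the table path, the date/hour
directory layout, and the function names are hypothetical and not part of
Drill. The only Drill feature it relies on is the implicit dir0/dir1 columns
that Drill exposes for directory levels under a file-system table. Whether a
given predicate shape is actually pruned is worth checking with
EXPLAIN PLAN FOR on a real cluster.

```python
from datetime import datetime, timedelta


def hourly_partitions(checkpoint, now):
    """Return (date, hour) string pairs for every hourly partition from the
    checkpoint's hour up to and including the hour containing `now`."""
    cursor = checkpoint.replace(minute=0, second=0, microsecond=0)
    parts = []
    while cursor <= now:
        parts.append((cursor.strftime("%Y-%m-%d"), cursor.strftime("%H")))
        cursor += timedelta(hours=1)
    return parts


def build_query(table, partitions):
    """Build a Drill SQL string restricted to the given partitions.

    Assumes a hypothetical layout <table>/<yyyy-mm-dd>/<hh>/..., so that
    dir0 is the date directory and dir1 is the hour directory.
    """
    if not partitions:
        raise ValueError("no partitions to scan")
    predicate = " OR ".join(
        "(dir0 = '{}' AND dir1 = '{}')".format(day, hour)
        for day, hour in partitions
    )
    return "SELECT * FROM {} WHERE {}".format(table, predicate)


# The caller owns the checkpoint; Drill does not track it.
checkpoint = datetime(2017, 11, 9, 12, 45)
now = datetime(2017, 11, 9, 14, 5)
query = build_query("dfs.`/data/events`", hourly_partitions(checkpoint, now))
print(query)
```

After each successful poll the client would store `now` as the new
checkpoint, so the next query only names the partitions written since then.
Updates to older partitions are not covered by this sketch and would need the
separate notification mechanism mentioned above.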
