Re: Can Apache Drill perform streaming queries?

AnilKumar B Thu, 09 Nov 2017 12:15:47 -0800

Hi Kant,

If I understand your questions properly, you are looking for incremental
queries.

Drill supports predicate pushdown with most data sources. In your case,
suppose you are generating hourly partitions in HDFS using a Spark
application. Drill is then optimized to scan only the partitions that match
the query predicates (by using partition pruning); see, for example,
https://issues.apache.org/jira/browse/DRILL-3121.
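
As a rough sketch of what a pruning-friendly query could look like: the
snippet below assumes a hypothetical layout /data/events/<yyyy-MM-dd>/<HH>/
under a `dfs` storage plugin. Drill exposes the directory levels of a
file-system table as the implicit columns dir0, dir1, and so on, so filtering
on them lets the planner skip the other directories. The path and the function
name are made up for illustration.

```python
from datetime import datetime

# Hypothetical table root; adjust to your own storage plugin and path.
TABLE = "dfs.`/data/events`"


def hourly_partition_query(hour: datetime, table: str = TABLE) -> str:
    """Build a Drill query restricted to one hourly directory.

    Assumes the layout <root>/<yyyy-MM-dd>/<HH>/*.parquet, so the date
    directory is dir0 and the hour directory is dir1.
    """
    day = hour.strftime("%Y-%m-%d")
    hh = hour.strftime("%H")
    return (
        f"SELECT * FROM {table} "
        f"WHERE dir0 = '{day}' AND dir1 = '{hh}'"
    )


print(hourly_partition_query(datetime(2017, 11, 9, 12)))
```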

But Drill will not manage any checkpointing. So if BI/dashboard tools like
Tableau can support this checkpointing, then it's possible to query Drill
incrementally.
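
To make "checkpointing on the client side" a bit more concrete, here is a
minimal sketch, assuming hourly partitions and a client that remembers the
last hour it has already queried. Nothing here is a Drill API; the function
name and the rule of skipping the still-open current hour are assumptions for
illustration.

```python
from datetime import datetime, timedelta
from typing import List, Optional

HOUR = timedelta(hours=1)


def pending_hours(last_done: Optional[datetime], now: datetime,
                  first: datetime) -> List[datetime]:
    """Return the closed hourly partitions that have not been queried yet.

    last_done is the checkpoint kept by the client (None on the first run)
    and first is the oldest partition to consider. The hour containing
    `now` is skipped because the writer may still be appending to it.
    """
    start = first if last_done is None else last_done + HOUR
    current = now.replace(minute=0, second=0, microsecond=0)
    hours = []
    while start < current:
        hours.append(start)
        start += HOUR
    return hours
```

On each poll, the client would issue one query per returned hour and then
store the last of those hours as its new checkpoint.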

Coming to the latest Kafka storage plugin: in the first version we are
targeting batch support. I mean, at query time it will fetch all the
messages from the start to the end offsets for each topic partition and
process the data. Currently it supports JSON, and in the next version we are
targeting Avro support with schema registry. We are also discussing the
feasibility of specifying start and end offset ranges, so that we can achieve
incremental support by managing checkpointing externally.
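
Just to illustrate the kind of external bookkeeping this would need (the
plugin does not offer offset ranges today, and this sketch calls neither
Drill nor Kafka): the client keeps the next offset to read per partition and
compares it with the current end offsets. The function name is hypothetical,
and starting from offset 0 for an unseen partition is an assumption; with
retention, the earliest available offset may be higher.

```python
from typing import Dict, Tuple


def next_offset_ranges(committed: Dict[int, int],
                       latest: Dict[int, int]) -> Dict[int, Tuple[int, int]]:
    """Compute the half-open [start, end) offset range to read per partition.

    committed: next offset to read per partition, kept by the client.
    latest: current end offset per partition, obtained from Kafka.
    Partitions with nothing new are left out of the result.
    """
    ranges = {}
    for partition, end in latest.items():
        # Assumption: a partition never seen before is read from offset 0.
        start = committed.get(partition, 0)
        if start < end:
            ranges[partition] = (start, end)
    return ranges
```

After a range has been processed successfully, the client would store its
end offset as the new committed value for that partition.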

Thanks,
B Anil Kumar.


On Thu, Nov 9, 2017 at 11:14 AM, kant kodali <[email protected]> wrote:

> Can someone elaborate on what happens underneath if I poll every second
> (Specifically related to my questions in my previous email)?
>
> Thanks!
>
> On Thu, Nov 9, 2017 at 7:56 AM, Ted Dunning <[email protected]> wrote:
>
> > Confluent has a non-Apache product, I think, for streaming SQL.
> >
> >
> > On Thu, Nov 9, 2017 at 4:50 PM, Saurabh Mahapatra <[email protected]>
> > wrote:
> >
> > > Isn't there the new Kafka plugin? What does that exactly do?
> > >
> > > Best,
> > > Saurabh
> > >
> > > Sent from my iPhone
> > >
> > >
> > >
> > > > On Nov 9, 2017, at 5:15 AM, kant kodali <[email protected]> wrote:
> > > >
> > > > Hi Tug,
> > > >
> > > > > It's Parquet data on HDFS, and the data is constantly written to HDFS
> > > > > by Spark while consuming from Kafka.
> > > > >
> > > > > Is polling a common technique for, say, a real-time analytics
> > > > > dashboard? More importantly, if I poll, does Drill do the scan every
> > > > > time? If the answer is no, how does it know which data is new, since
> > > > > the data is written to HDFS constantly as a stream? (The query can be
> > > > > the same; however, the new data will be appended or updated to HDFS
> > > > > in Parquet format as a stream.)
> > > >
> > > > Thanks!
> > > >
> > > >> On Thu, Nov 9, 2017 at 4:47 AM, Tugdual Grall <[email protected]>
> > > wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >>
> > > > >> Today Drill cannot do continuous/streaming queries, so, as you
> > > > >> mentioned, you will have to use a polling technique.
> > > >>
> > > >>
> > > > >> Just out of curiosity, which data source are you planning to use?
> > > >>
> > > >> Regards
> > > >> Tug
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>> On Thu 9 Nov 2017 at 04:31, kant kodali <[email protected]>
> wrote:
> > > >>>
> > > >>> Hi All,
> > > >>>
> > > > >>> I am new to Apache Drill. I am wondering if Apache Drill can
> > > > >>> perform streaming queries? For example, I have a constant stream
> > > > >>> of data in a 24-hour period and I would like to get updates as
> > > > >>> soon as I receive them.
> > > > >>>
> > > > >>> Do I need to have a polling thread that issues a Drill query every
> > > > >>> second?
> > > >>>
> > > >>> Thanks!
> > > >>>
> > > >>
> > >
> >
>
