Hi Kant,

If I understand your questions properly, you are looking for incremental queries.

Drill supports predicate pushdown with most data sources. In your case, suppose you are generating hourly partitions in HDFS using a Spark application. Drill is then optimized to scan only the specific partitions that match the query predicates (by using partition pruning); see, for example, https://issues.apache.org/jira/browse/DRILL-3121. But Drill will not manage any checkpointing. So if BI/dashboard tools like Tableau can support this checkpointing, then it's possible to query Drill incrementally.

Coming to the latest Kafka storage plugin: in the first version we are targeting batch support. I mean, at query time it will fetch all the messages from the start to the end offsets of each topic partition and process the data. Currently it supports JSON, and in the next version we are targeting Avro support with the schema registry. We are also discussing the feasibility of specifying start and end offset ranges, so that we can achieve incremental support by managing checkpointing externally.

Thanks & Regards,
B Anil Kumar.

On Thu, Nov 9, 2017 at 11:14 AM, kant kodali wrote:

> Can someone elaborate on what happens underneath if I poll every second
> (specifically related to my questions in my previous email)?
>
> Thanks!
>
> On Thu, Nov 9, 2017 at 7:56 AM, Ted Dunning wrote:
>
> > Confluent has a non-Apache product, I think, for streaming SQL.
> >
> > On Thu, Nov 9, 2017 at 4:50 PM, Saurabh Mahapatra wrote:
> >
> > > Isn't there the new Kafka plugin? What does that exactly do?
> > >
> > > Best,
> > > Saurabh
> > >
> > > Sent from my iPhone
> > >
> > > > On Nov 9, 2017, at 5:15 AM, kant kodali wrote:
> > > >
> > > > Hi Tug,
> > > >
> > > > It's Parquet data on HDFS and the data to HDFS is constantly written by
> > > > Spark while consuming from Kafka.
> > > >
> > > > Is polling a common technique for, say, a real-time analytics dashboard?
> > > > More importantly, if I poll, does Drill do the scan every time? If the
> > > > answer is no, how does it know which is the new data, since the data is
> > > > written to HDFS constantly as a stream? (The query can be the same;
> > > > however, the new data will be appended or updated to HDFS in Parquet
> > > > format as a stream.)
> > > >
> > > > Thanks!
> > > >
> > > >> On Thu, Nov 9, 2017 at 4:47 AM, Tugdual Grall wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >> Today Drill cannot do continuous/streaming queries, so as you mentioned
> > > >> you will have to use a polling technique.
> > > >>
> > > >> Just out of curiosity, which data source are you planning to use?
> > > >>
> > > >> Regards
> > > >> Tug
> > > >>
> > > >>> On Thu 9 Nov 2017 at 04:31, kant kodali wrote:
> > > >>>
> > > >>> Hi All,
> > > >>>
> > > >>> I am new to Apache Drill. I am wondering if Apache Drill can perform
> > > >>> streaming queries? For example, I have a constant stream of data in a
> > > >>> 24 hour period and I would like to get updates as soon as I receive them.
> > > >>>
> > > >>> Do I need to have a polling thread that issues a Drill query every second?
> > > >>>
> > > >>> Thanks!
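
For reference, the externally managed checkpointing described in the reply at the top of this thread could be sketched roughly as below. This is only an illustration under assumptions, not how Drill or any BI tool implements it. The table path, the `event_hour` partition column, the hour format, the JSON checkpoint file, and the `run_query` callable (whatever client actually submits SQL to Drill) are all hypothetical names. Whether the predicate is actually pruned depends on how the data is laid out and partitioned.

```python
import json
from datetime import datetime, timedelta
from pathlib import Path

# Hypothetical names for illustration only: neither the table path nor the
# partition column comes from the thread.
TABLE = "dfs.`/data/events`"
PARTITION_COLUMN = "event_hour"
HOUR_FORMAT = "%Y-%m-%d-%H"  # sorts lexicographically in time order


def load_checkpoint(path, default):
    """Return the last fully processed hour, or `default` if none is stored."""
    p = Path(path)
    if not p.exists():
        return default
    stored = json.loads(p.read_text())["last_hour"]
    return datetime.strptime(stored, HOUR_FORMAT)


def save_checkpoint(path, hour):
    """Persist the checkpoint outside Drill, since Drill does not track it."""
    Path(path).write_text(json.dumps({"last_hour": hour.strftime(HOUR_FORMAT)}))


def build_incremental_query(after, up_to):
    """Build a query whose predicate restricts the scan to hours in (after, up_to]."""
    return (
        f"SELECT * FROM {TABLE} "
        f"WHERE {PARTITION_COLUMN} > '{after.strftime(HOUR_FORMAT)}' "
        f"AND {PARTITION_COLUMN} <= '{up_to.strftime(HOUR_FORMAT)}'"
    )


def poll_once(run_query, checkpoint_path, now, default_start):
    """Query only the hours completed since the checkpoint, then advance it."""
    last = load_checkpoint(checkpoint_path, default_start)
    # Only closed hours are queried, so the partition still being written is skipped.
    latest_closed = now.replace(minute=0, second=0, microsecond=0) - timedelta(hours=1)
    if latest_closed <= last:
        return []  # nothing new since the last poll
    rows = run_query(build_incremental_query(last, latest_closed))
    save_checkpoint(checkpoint_path, latest_closed)
    return rows
```

A dashboard's polling thread would call `poll_once` on each tick. Polls that arrive before a new hour has closed return immediately without issuing a query, so polling every second does not mean rescanning every second.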
