You are correct, Kant. It would be great if you could raise a JIRA to discuss the *feasibility* of incremental query support in Drill. I can see this being a very useful feature for plugins like Kafka, HBase and Cassandra. Thanks for asking this question.

Thanks & Regards,
B Anil Kumar.

On Thu, Nov 9, 2017 at 12:45 PM, kant kodali <[email protected]> wrote:

> Hi Anil,
>
> Thanks a lot for your response. It looks like I am indeed looking for
> incremental queries. So if I have a thread that polls every second to get
> the latest updates, I just have to change the partition values to minimize
> the scans, right?
>
> Also, I guess I can build some notification mechanism in case my older
> partitions have an update.
>
> Thanks!
>
> On Thu, Nov 9, 2017 at 11:58 AM, AnilKumar B <[email protected]> wrote:
>
> > Hi Kant,
> >
> > If I understand your questions properly, you are looking for incremental
> > queries.
> >
> > Drill supports predicate pushdown with most data sources. In your case,
> > suppose you are generating hourly partitions in HDFS using a Spark
> > application. Then Drill is optimized to scan only specific partitions
> > based on the query predicates (by using partition pruning); see for
> > example https://issues.apache.org/jira/browse/DRILL-3121.
> >
> > But Drill will not manage any checkpointing. So if BI/dashboard tools
> > like Tableau etc. can support this checkpointing, then it's possible to
> > connect with Drill incrementally.
> >
> > Coming to the latest Kafka storage plugin: in the first version we are
> > targeting batch support. I mean, at query time it will fetch all the
> > messages from the start to end offsets for each topic partition and
> > process the data. Currently it supports JSON, and in the next version we
> > are targeting Avro support with schema registry. We are also discussing
> > the feasibility of mentioning start and end offset ranges, so that we can
> > achieve incremental support by managing checkpointing externally.
> >
> > Thanks,
> > B Anil Kumar.
> >
> > Thanks & Regards,
> > B Anil Kumar.
> >
> > On Thu, Nov 9, 2017 at 11:14 AM, kant kodali <[email protected]> wrote:
> >
> > > Can someone elaborate on what happens underneath if I poll every second
> > > (specifically related to the questions in my previous email)?
> > >
> > > Thanks!
> > >
> > > On Thu, Nov 9, 2017 at 7:56 AM, Ted Dunning <[email protected]> wrote:
> > >
> > > > Confluent has a non-Apache product, I think, for streaming SQL.
> > > >
> > > > On Thu, Nov 9, 2017 at 4:50 PM, Saurabh Mahapatra <[email protected]> wrote:
> > > >
> > > > > Isn't there the new Kafka plugin? What exactly does that do?
> > > > >
> > > > > Best,
> > > > > Saurabh
> > > > >
> > > > > Sent from my iPhone
> > > > >
> > > > > On Nov 9, 2017, at 5:15 AM, kant kodali <[email protected]> wrote:
> > > > >
> > > > > > Hi Tug,
> > > > > >
> > > > > > It's Parquet data on HDFS, and the data is constantly written to
> > > > > > HDFS by Spark while consuming from Kafka.
> > > > > >
> > > > > > Is polling a common technique for, say, a real-time analytics
> > > > > > dashboard? More importantly, if I poll, does Drill do the scan
> > > > > > every time? If the answer is no, how does it know which is the new
> > > > > > data, since the data is written to HDFS constantly as a stream?
> > > > > > (The query can be the same; however, the new data will be appended
> > > > > > or updated in HDFS in Parquet format as a stream.)
> > > > > >
> > > > > > Thanks!
> > > > > >
> > > > > > On Thu, Nov 9, 2017 at 4:47 AM, Tugdual Grall <[email protected]> wrote:
> > > > > >
> > > > > >> Hello,
> > > > > >>
> > > > > >> Today Drill cannot do continuous/streaming queries, so as you
> > > > > >> mentioned you will have to use a polling technique.
> > > > > >>
> > > > > >> Just out of curiosity, which data source are you planning to use?
> > > > > >>
> > > > > >> Regards
> > > > > >> Tug
> > > > > >>
> > > > > >> On Thu 9 Nov 2017 at 04:31, kant kodali <[email protected]> wrote:
> > > > > >>
> > > > > >>> Hi All,
> > > > > >>>
> > > > > >>> I am new to Apache Drill. I am wondering if Apache Drill can
> > > > > >>> perform streaming queries? For example, I have a constant stream
> > > > > >>> of data in a 24-hour period and I would like to get updates as
> > > > > >>> soon as I receive them.
> > > > > >>>
> > > > > >>> Do I need to have a polling thread that issues a Drill query
> > > > > >>> every second?
> > > > > >>>
> > > > > >>> Thanks!
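
For anyone reading this thread later, here is a rough Python sketch of the polling approach discussed above: the client keeps its own checkpoint (Drill does not manage one) and restricts each query to the hourly partitions that could hold new data, so that partition pruning can limit the scan. Everything specific here is an assumption made for illustration: the table path, the `event_time` column, the `YYYY-MM-DD-HH` directory naming, and the `run_query` callable, which stands in for whatever client you use to submit SQL to Drill. `dir0` is the column Drill exposes for the first directory level under the queried path.

```python
from datetime import datetime, timedelta


def hourly_partitions(checkpoint, now):
    """List hourly partition names from the checkpoint hour through the current hour.

    The 'YYYY-MM-DD-HH' directory naming is hypothetical.
    """
    hour = checkpoint.replace(minute=0, second=0, microsecond=0)
    end = now.replace(minute=0, second=0, microsecond=0)
    names = []
    while hour <= end:
        names.append(hour.strftime("%Y-%m-%d-%H"))
        hour += timedelta(hours=1)
    return names


def build_query(table, partitions, checkpoint):
    """Build a query whose dir0 predicate lets Drill prune unrelated partitions.

    'event_time' is a hypothetical column written by the ingesting job.
    """
    in_list = ", ".join("'{}'".format(p) for p in partitions)
    return (
        "SELECT * FROM {} WHERE dir0 IN ({}) AND event_time > TIMESTAMP '{}'"
        .format(table, in_list, checkpoint.strftime("%Y-%m-%d %H:%M:%S"))
    )


def poll_once(run_query, table, checkpoint, now):
    """Run one poll and return (rows, new_checkpoint).

    run_query is a caller-supplied function (hypothetical) that takes a SQL
    string and returns a list of dicts with a datetime under 'event_time'.
    The checkpoint only advances to the newest row actually seen.
    """
    sql = build_query(table, hourly_partitions(checkpoint, now), checkpoint)
    rows = run_query(sql)
    new_checkpoint = max((r["event_time"] for r in rows), default=checkpoint)
    return rows, new_checkpoint
```

A polling thread would call `poll_once` on a timer and persist the returned checkpoint somewhere outside Drill. Note that this sketch does not pick up late updates to partitions older than the checkpoint hour; that would need the separate notification mechanism Kant mentions.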
