If other queries are acceptable, you can use something similar to:

0: jdbc:drill:> select sum(`ROWS`) `TOTAL_NUMBER` from (select count(*) as `ROWS` from cp.`tpch/nation.parquet` union all select count(*) as `ROWS` from cp.`tpch/region.parquet`);
+-------------------------+
| TOTAL_NUMBER            |
+-------------------------+
| 30                      |
+-------------------------+
1 row selected (0.324 seconds)

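
If there are more than two sources, the same query can be assembled programmatically. The following is only a rough sketch: the helper name build_total_count_query is hypothetical, and it assumes each entry is already a valid, correctly quoted Drill table identifier that the cluster can reach. It only builds the SQL text and does not connect to Drill.

```python
def build_total_count_query(tables):
    """Build a Drill SQL string that sums count(*) over every table given.

    `tables` is a list of table identifiers, already quoted for Drill
    (assumption: the caller supplies trusted, valid identifiers).
    """
    if not tables:
        raise ValueError("at least one table is required")
    # One count(*) subquery per table, each exposing the same column alias.
    subqueries = ["select count(*) as `ROWS` from " + table for table in tables]
    # UNION ALL keeps every per-table count, even when two counts are equal;
    # a plain UNION would drop duplicates and give a wrong total.
    inner = " union all ".join(subqueries)
    return "select sum(`ROWS`) `TOTAL_NUMBER` from (" + inner + ")"


if __name__ == "__main__":
    # The two sample tables used in the query above.
    print(build_total_count_query([
        "cp.`tpch/nation.parquet`",
        "cp.`tpch/region.parquet`",
    ]))
```

The resulting string can be pasted into sqlline followed by a semicolon.
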
Kind regards
Vitalii

On Sun, Aug 5, 2018 at 9:56 PM 王亮 <wanglian...@gmail.com> wrote:

> Hi all,
>
> I have apache HTTP server logs in different machines and want to query
> these log files.
>
> So I install the drill (distributed mode) in these machines, for example,
> node1,node2.
>
> I use this command:
> sqlline -u jdbc:drill:zk:node1,node2
> or
> sqlline -u jdbc:drill:drillbit:node1,node2
>
> then input query like: select count(*) from dfs.`/apache/logs/access_log`
> I could only get the data of one machine.
>
> Maybe I can upload all logs file to s3 or Hadoop.
> But is there an easy way to query all local files in different machines by
> drill?
>
> If we need develop the new features to support this requirement, How much
> work we should do? for example, only revise the physical plan distribution
> codes? or need write the completely new data source plugin?
>
> I found these discussions, but seems no clear answer.
>
> https://stackoverflow.com/questions/29365320/apache-drill-in-distributed-mode
>
> http://mail-archives.apache.org/mod_mbox/drill-user/201506.mbox/thread
>
> https://stackoverflow.com/questions/33952568/how-to-configure-drill-to-use-all-the-nodes-for-a-query-by-creating-multiple-fr
>
> Thanks,
>
> Wang Liang