Hi Abhishek and all,

Thanks for your answer.

I may try the HDFS way later. But there are cases where restrictions are set
on the cluster that do not allow users to use a distributed file system.
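
For reference, this is my understanding of what the dfs storage plugin
would look like when pointed at HDFS (a sketch only; the namenode host,
port, and workspace path below are assumptions for an example cluster):

```json
{
  "type": "file",
  "connection": "hdfs://namenode-host:8020/",
  "workspaces": {
    "tmp": {
      "location": "/tmp",
      "writable": true,
      "defaultInputFormat": null
    }
  },
  "formats": {
    "parquet": {
      "type": "parquet"
    }
  }
}
```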

In that case, Drill would need a way to query all nodes to assemble the
result, not just the local one. Since CTAS writes the result across several
nodes, Drill should read from those nodes as well.

Does my suggestion make sense?

Thanks!

George

On Mon, Jun 1, 2015 at 9:51 AM, Abhishek Girish <[email protected]>
wrote:

> Hey George,
>
> Can I ask why you aren't using a distributed file system? You would see the
> behavior you expect when you use the dfs plug-in configured with a
> distributed file system (HDFS / MapR-FS).
>
> In your case, the parquet files from CTAS will be written to a specific
> node's local file system, depending on which Drill-bit the client connects
> to. And if the table is moderate to large in size, Drill may process them
> in a distributed manner and write data into more than one node - hence the
> behavior you see.
>
> -Abhishek
>
> On Sun, May 31, 2015 at 6:34 PM, George Lu <[email protected]> wrote:
>
> > Hi all,
> >
> > I use dfs.tmp as my schema, and when I use CTAS to create tables with over
> > 10,000 rows, the resulting parquet files are created on two nodes in the
> > cluster. However, when I query the table, I only get the portion on that
> > node. So I get 700 rows on one node when I run "select * from T1" and
> > 10,000 rows on another.
> >
> > May I ask whether that behavior is correct? How can I make Drill return
> > all tuples when I create or query a table from one node using local
> > dfs.tmp?
> >
> > Otherwise, the EXISTS query doesn't work.
> >
> > Thanks!
> >
> > George
> >
>
