Put a little more simply, the node we end up planning the query on
enumerates all of the files the query will read so that it can assign work
to individual nodes. This currently assumes that the complete list of files
is known at planning time, on that single node. That happens to work in a
single-node setup, because all of the work will be done on one machine
against one filesystem (the local fs). In the distributed case we currently
require that every node has a connection to a DFS.
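
Roughly, you can picture the current behavior like this (a minimal sketch
in Java, not Drill's actual planner code; all class and method names here
are made up for illustration):

import java.io.IOException;
import java.nio.file.*;
import java.util.*;
import java.util.stream.*;

// Simplified sketch of planning-time file enumeration (not Drill's
// actual planner code; the names here are made up for illustration).
public class PlanSketch {

    // The planning node walks the table directory and lists every
    // file up front, at planning time.
    static List<Path> enumerateFiles(Path tableRoot) throws IOException {
        try (Stream<Path> walk = Files.walk(tableRoot)) {
            return walk.filter(p -> p.toString().endsWith(".parquet"))
                       .collect(Collectors.toList());
        }
    }

    // It then divides that fixed list across the nodes it knows
    // about, e.g. round-robin. Every node must be able to open every
    // path it is handed, which is why a shared DFS is required.
    static Map<String, List<Path>> assignWork(List<Path> files, List<String> nodes) {
        Map<String, List<Path>> work = new HashMap<>();
        for (int i = 0; i < files.size(); i++) {
            work.computeIfAbsent(nodes.get(i % nodes.size()), n -> new ArrayList<>())
                .add(files.get(i));
        }
        return work;
    }
}

The assignment step is the crux: it hands out absolute paths, so it is only
correct when every node sees the same filesystem.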

There is an outstanding feature request to support a use case like querying
a series of server logs, where each machine may have a different number of
log files. We will need to modify the planning process to allow for a more
flexible scan description, one that lets each machine enumerate its own
files separately when we actually go to read them.
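
Something along these lines, again hedged: none of this is existing Drill
API, just a sketch of the idea that the plan would ship a pattern instead
of a file list:

import java.io.IOException;
import java.nio.file.*;
import java.util.*;

// Hypothetical sketch of a deferred scan (none of this is existing
// Drill API): the plan ships only a directory and a glob, and each
// node expands them against its own local filesystem.
public class DeferredScanSketch {

    // Runs on each executing node rather than on the planning node,
    // so machines with different numbers of log files each read
    // exactly the files they actually have.
    static List<Path> enumerateLocally(String directory, String glob) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(Paths.get(directory), glob)) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        return files;
    }
}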

The JIRA below discusses the issue you are facing in more detail. I believe
we should have one open for the feature request as well; I will take a look
around for it and open one if I can't find it soon.

https://issues.apache.org/jira/browse/DRILL-3230

On Wed, Jul 29, 2015 at 4:14 PM, Kristine Hahn <[email protected]> wrote:

> Yes, you need a distributed file system to take advantage of Drill's query
> planning. If you use multiple Drillbits and do not use a distributed file
> system, the consistency of the fragment information cannot be maintained.
>
>
>
> Kristine Hahn
> Sr. Technical Writer
> 415-497-8107 @krishahn skype:krishahn
>
>
> On Wed, Jul 29, 2015 at 4:37 AM, Geercken, Uwe <[email protected]> wrote:
>
> > Hello,
> >
> > If I have a list of partitioned parquet files on the filesystem and two
> > drillbits with access to the filesystem and I query the data using the
> > column I partitioned on in the where clause of the query, will both
> > drillbits share the work?
> >
> > Or do I need an underlying distributed filesystem such as Hadoop to
> > make the bits work in parallel (or work together)?
> >
> > Tks.
> >
> > Uwe
> >
>
