Just to clarify this, Jason - you don't necessarily need HDFS or the like
for this, right? If you had, say, an NFS volume (for example, Amazon Elastic
File System) mounted on every node, you could still accomplish it, or even
if you simply had all of the files duplicated locally on every node.
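
For the archives, here is a minimal sketch of what I mean by the NFS case,
assuming the shared volume is mounted at the same path on every Drillbit node
(the mount point /mnt/efs/data, the workspace name "shared", and the plugin
name "nfs" below are all made up for illustration). You would register a
file-based storage plugin over that path, roughly like this:

  {
    "type": "file",
    "enabled": true,
    "connection": "file:///",
    "workspaces": {
      "shared": {
        "location": "/mnt/efs/data",
        "writable": false,
        "defaultInputFormat": null
      }
    },
    "formats": {
      "parquet": { "type": "parquet" }
    }
  }

Since every node resolves /mnt/efs/data to the same files, the node that
plans the query can enumerate them and hand work to the other Drillbits. If
the Parquet files are laid out in subdirectories by partition value, a query
like select count(*) from nfs.shared.`events` where dir0 = '2015' (with a
hypothetical events directory) should also be able to prune directories the
same way it would on HDFS.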

On Thu, Jul 30, 2015 at 10:00 AM, Jason Altekruse <[email protected]> wrote:

> Put a little more simply, the node on which we end up planning the query is
> going to enumerate the files to be read in the query so that we can assign
> work to the individual nodes. This currently assumes that we are going to
> know, at planning time (on that single node), all of the files to be read.
> This happens to work in a single-node setup, because all of the work will
> be done on the single machine against one filesystem (the local fs). In the
> distributed case we currently require a connection from each node to a DFS.
>
> There is an outstanding feature request to support use cases like querying a
> series of server logs, where each machine may have a different number of log
> files. We will need to modify the planning process to allow for a more
> flexible scan description, one that lets us enumerate the files on each
> machine separately when we actually go to read them.
>
> This JIRA discusses the issue you are facing in more detail; I believe we
> should have one outstanding for the feature request as well. I will take a
> look around for it soon and open one if I can't find it.
>
> https://issues.apache.org/jira/browse/DRILL-3230
>
> On Wed, Jul 29, 2015 at 4:14 PM, Kristine Hahn <[email protected]> wrote:
>
> > Yes, you need a distributed file system to take advantage of Drill's query
> > planning. If you use multiple Drillbits and do not use a distributed file
> > system, the consistency of the fragment information cannot be maintained.
> >
> >
> >
> > Kristine Hahn
> > Sr. Technical Writer
> > 415-497-8107 @krishahn skype:krishahn
> >
> >
> > On Wed, Jul 29, 2015 at 4:37 AM, Geercken, Uwe <[email protected]> wrote:
> >
> > > Hello,
> > >
> > > If I have a set of partitioned Parquet files on the filesystem, and two
> > > Drillbits with access to that filesystem, and I query the data using the
> > > partition column in the where clause, will both Drillbits share the
> > > work?
> > >
> > > Or do I need an underlying distributed filesystem such as Hadoop to make
> > > the Drillbits work in parallel (or work together)?
> > >
> > > Tks.
> > >
> > > Uwe
> > >
> >
>
