Re: Querying partitioned Parquet files

Parth Chandra Wed, 29 Jul 2015 22:14:58 -0700

Yes, that would work too, though if the copies of the files are not
identical on every node, the results would be unreliable.


Parth

On Wed, Jul 29, 2015 at 6:45 PM, Adam Gilmore <[email protected]> wrote:

> Just to clarify this, Jason: you don't necessarily need HDFS or the like
> for this. If you had, say, an NFS volume (for example, Amazon Elastic File
> System), you could still accomplish it, right? Or even if you merely had
> all files duplicated locally on every node?
>
> On Thu, Jul 30, 2015 at 10:00 AM, Jason Altekruse <[email protected]> wrote:
>
> > Put a little more simply, the node that we end up planning the query
> > on is going to enumerate the files we will be reading in the query so
> > that we can assign work to given nodes. This currently assumes we are
> > going to know at planning time (on the single node) all of the files to
> > be read. This happens to work in a single node setup, because all of the
> > work will be done on the single machine against one filesystem (the
> > local fs). In the distributed case we currently require that we have a
> > connection from each node to a DFS.
> >
> > There is an outstanding feature request to support a use case like
> > querying a series of server logs, where each machine may have a
> > different number of log files. We will need to modify the planning
> > process to allow for a more flexible description of a scan, one that
> > lets each machine enumerate its own files when we actually go to read
> > them.
> >
> > This JIRA discusses the issue you are facing in more detail. I believe
> > we should have one outstanding for the feature request as well. I will
> > try to take a look around for it and open one if I can't find it soon.
> >
> > https://issues.apache.org/jira/browse/DRILL-3230
> >
> > On Wed, Jul 29, 2015 at 4:14 PM, Kristine Hahn <[email protected]> wrote:
> >
> > > Yes, you need a distributed file system to take advantage of Drill's
> > > query planning. If you use multiple Drillbits and do not use a
> > > distributed file system, the consistency of the fragment information
> > > cannot be maintained.
> > >
> > >
> > >
> > > Kristine Hahn
> > > Sr. Technical Writer
> > > 415-497-8107 @krishahn skype:krishahn
> > >
> > >
> > > On Wed, Jul 29, 2015 at 4:37 AM, Geercken, Uwe <[email protected]> wrote:
> > >
> > > > Hello,
> > > >
> > > > If I have a set of partitioned Parquet files on the filesystem and
> > > > two Drillbits with access to that filesystem, and I query the data
> > > > using the column I partitioned on in the WHERE clause of the query,
> > > > will both Drillbits share the work?
> > > >
> > > > Or do I need an underlying distributed filesystem such as Hadoop to
> > > > make the bits work in parallel (or work together)?
> > > >
> > > > Thanks.
> > > >
> > > > Uwe
> > > >
> > >
> >
>
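
The planning behavior Jason describes, and the risk Parth raises about
inconsistent local copies, can be illustrated with a small Python sketch.
This is not Drill's actual planner: the round-robin assignment, the
function names (plan_scan, unreadable_files), the node names, and the file
paths are all assumptions made up for illustration. It only shows why a
file list built on one node is safe when every node sees the same
filesystem and unsafe when per-node copies differ.

```python
from typing import Dict, List, Set


def plan_scan(files_seen_by_planner: List[str], nodes: List[str]) -> Dict[str, List[str]]:
    """Assign every file visible to the planning node to a worker, round-robin.

    Round-robin is an assumption for this sketch, not Drill's real strategy.
    """
    assignment: Dict[str, List[str]] = {node: [] for node in nodes}
    for i, path in enumerate(sorted(files_seen_by_planner)):
        assignment[nodes[i % len(nodes)]].append(path)
    return assignment


def unreadable_files(
    assignment: Dict[str, List[str]], files_on_node: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """For each node, list the assigned files that the node cannot see."""
    return {
        node: [path for path in paths if path not in files_on_node[node]]
        for node, paths in assignment.items()
    }


# Hypothetical paths and node names, for illustration only.
files = [f"/data/logs/year=2015/{i}.parquet" for i in range(4)]
nodes = ["drillbit1", "drillbit2"]

# The plan is built once, from the planning node's view of the filesystem.
plan = plan_scan(files, nodes)

# Shared filesystem (DFS or NFS): every node sees the same files.
shared_view = {node: set(files) for node in nodes}

# Local copies that have drifted: drillbit2 is missing the last file.
local_copies = {"drillbit1": set(files), "drillbit2": set(files[:3])}

print(unreadable_files(plan, shared_view))
print(unreadable_files(plan, local_copies))
```

With the shared view, no node is handed a file it cannot read. With the
drifted local copies, drillbit2 is assigned a file that exists only on the
planning node, which is the kind of inconsistency that makes results
unreliable.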
