Yes, that would work too, though if the copies of the files are inconsistent across nodes, the results would be unreliable.

Parth

On Wed, Jul 29, 2015 at 6:45 PM, Adam Gilmore <[email protected]> wrote:

> Just to clarify this, Jason - you don't necessarily need HDFS or the like
> for this, if you had say a NFS volume (for example, Amazon Elastic File
> System), you can still accomplish it, right?  Or merely if you had all
> files duplicated on every node locally.
>
> On Thu, Jul 30, 2015 at 10:00 AM, Jason Altekruse <[email protected]>
> wrote:
>
> > Put a little more simply, the node that we end up planning the query on
> > is going to enumerate the files we will be reading in the query so that
> > we can assign work to given nodes. This currently assumes we are going
> > to know at planning time (on the single node) all of the files to be
> > read. This happens to work in a single node setup, because all of the
> > work will be done on the single machine against one filesystem (the
> > local fs). In the distributed case we currently require that we have a
> > connection from each node to a DFS.
> >
> > There is an outstanding feature request to support a use case like
> > querying a series of server logs, each machine may have a different
> > number of log files. We will need to modify the planning process to
> > allow for the description of a scan that is more flexible and allows
> > enumerating the files on each machine separately when we go to actually
> > read them.
> >
> > This JIRA discusses the issue you are facing in more detail, I believe
> > we should have one outstanding for the feature request as well. I will
> > try to take a look around for it and open one if I can't find it soon.
> >
> > https://issues.apache.org/jira/browse/DRILL-3230
> >
> > On Wed, Jul 29, 2015 at 4:14 PM, Kristine Hahn <[email protected]>
> > wrote:
> >
> > > Yes, you need a distributed file system to take advantage of Drill's
> > > query planning. If you use multiple Drillbits and do not use a
> > > distributed file system, the consistency of the fragment information
> > > cannot be maintained.
> > >
> > > Kristine Hahn
> > > Sr. Technical Writer
> > > 415-497-8107 @krishahn skype:krishahn
> > >
> > > On Wed, Jul 29, 2015 at 4:37 AM, Geercken, Uwe <[email protected]>
> > > wrote:
> > >
> > > > Hello,
> > > >
> > > > If I have a list of partitioned parquet files on the filesystem and
> > > > two drillbits with access to the filesystem and I query the data
> > > > using the column I partitioned on in the where clause of the query,
> > > > will both drillbits share the work?
> > > >
> > > > Or do I need a distributed filesystem such as Hadoop underlying to
> > > > make the bits work in parallel (or work together)?
> > > >
> > > > Tks.
> > > >
> > > > Uwe
