On 6/1/15, 12:14 PM, "Matt" <[email protected]> wrote:

>Segmenting data into directories in HDFS would require clients to
>structure queries accordingly, but would there be benefit in reduced
>query time by limiting scan ranges?

Yes. I am just a newbie user, but I have already seen this work with the
local filesystem and S3. I fully expect it to work for HDFS too, since I
have seen the same strategy recommended for HDFS outside the context of
Drill. Clients that don't know about the layout can still query the root
directory; they just won't get the benefit. I believe you could even
define a view that lets clients apply WHERE clause filters against
artificial date columns that you map to the directory structure, thereby
hiding that structure from the client.
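To make the idea concrete, here is a minimal Python sketch of directory-based pruning. It assumes a hypothetical `<root>/<year>/<month>/<file>` layout; the file names and helpers (`partition_date`, `prune`) are made up for illustration and are not Drill's implementation. (Drill exposes directory levels below the queried root as implicit columns such as `dir0` and `dir1`, which is what a view would map to a date.) The sketch derives an artificial date from each file's directories and keeps only files in the filtered range. Scanning the root with no filter touches every file.

```python
from datetime import date
from pathlib import PurePosixPath

# Hypothetical layout: <root>/<year>/<month>/<file>
FILES = [
    "logs/2014/12/d.json",
    "logs/2015/04/a.json",
    "logs/2015/05/b.json",
    "logs/2015/06/c.json",
]


def partition_date(path: str, root: str = "logs") -> date:
    """Map the year/month directories below `root` to an artificial date column."""
    parts = PurePosixPath(path).relative_to(root).parts
    year, month = int(parts[0]), int(parts[1])
    return date(year, month, 1)


def prune(files, start: date, end: date, root: str = "logs"):
    """Keep only files whose directory-derived date lies in [start, end].

    This stands in for a WHERE filter on the mapped columns: files outside
    the range are never opened, which is where the query-time savings come from.
    """
    return [f for f in files if start <= partition_date(f, root) <= end]


if __name__ == "__main__":
    # A client that filters on the date range scans 2 of the 4 files.
    print(prune(FILES, date(2015, 5, 1), date(2015, 6, 30)))
    # A client that queries the root without a filter scans everything.
    print(FILES)
```

Month granularity is an assumption here. A real layout could go down to days or hours, and finer directories let a filter skip more data at the cost of more, smaller files.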

HTH,
Paul
