On 6/1/15, 12:14 PM, "Matt" <[email protected]> wrote:
>Segmenting data into directories in HDFS would require clients to
>structure queries accordingly, but would there be benefit in reduced
>query time by limiting scan ranges?

Yes. I am just a newbie user, but I have already seen that work with localFS and S3. I fully expect it will work for HDFS as well, since I have seen such a strategy mentioned for HDFS outside the context of Drill. Clients that are unaware of the layout can still query the root directory; they just won't get the benefit.

I believe you could even define a view that lets clients apply WHERE clause filters against artificial date columns that you map to the directory structure, thereby hiding the structure from the client.

HTH,
Paul
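
A rough sketch of the view idea, not run against a real cluster. It assumes the data sits under a hypothetical /data/logs/<year>/<month>/<day>/ layout in the dfs storage plugin, that dfs.tmp is a writable workspace, and that the files carry fields named host and message. The path, view name, and field names are made up for illustration. dir0, dir1, and dir2 are the columns Drill exposes for the first, second, and third directory levels below the queried path.

```sql
-- Hypothetical layout: /data/logs/2015/06/01/*.json
-- Map the directory levels to date-like columns so clients never see dirN.
CREATE VIEW dfs.tmp.logs_by_date AS
SELECT
  t.dir0 AS log_year,    -- first directory level below /data/logs
  t.dir1 AS log_month,   -- second directory level
  t.dir2 AS log_day,     -- third directory level
  t.host,                -- assumed field in the data files
  t.message              -- assumed field in the data files
FROM dfs.`/data/logs` t;

-- Clients filter on the artificial columns. Directory names are strings,
-- so the comparison values are quoted.
SELECT host, message
FROM dfs.tmp.logs_by_date
WHERE log_year = '2015' AND log_month = '06';
```

Whether filters on the view columns actually reduce the directories scanned is worth checking in the query plan (EXPLAIN PLAN FOR ...) rather than taking on faith.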
