How many columns do you have? Are you familiar with columnar data stores, and with how selecting only a single column means that much less data needs to be read? If your data consists of, say, integers, then Drill only needs to read about 160MB (40 million values x 4 bytes each) to satisfy your query, which can reasonably be read in a second or less.
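To make the arithmetic concrete, here is a rough back-of-envelope sketch. It assumes uncompressed 4-byte integers, 40 million rows, a hypothetical 50-column record for the 'select *' case, and a hypothetical 150 MB/s read bandwidth; real Parquet files are compressed and encoded, and Drill reads in parallel, so treat the numbers as illustrative only.

```python
# Rough estimate of bytes scanned by a columnar (Parquet) query.
# All constants are illustrative assumptions, not measurements:
ROWS = 40_000_000
BYTES_PER_VALUE = 4      # 32-bit integer, ignoring compression/encoding
TOTAL_COLUMNS = 50       # hypothetical record width for 'select *'
BANDWIDTH_MB_S = 150     # hypothetical sequential read throughput


def bytes_scanned(rows: int, columns_read: int, bytes_per_value: int) -> int:
    """A columnar reader only touches the column chunks the query projects."""
    return rows * columns_read * bytes_per_value


def seconds_to_read(num_bytes: int, bandwidth_mb_s: float) -> float:
    """Time to stream num_bytes at the given bandwidth (1 MB = 10**6 bytes)."""
    return num_bytes / (bandwidth_mb_s * 1_000_000)


one_col = bytes_scanned(ROWS, 1, BYTES_PER_VALUE)
all_cols = bytes_scanned(ROWS, TOTAL_COLUMNS, BYTES_PER_VALUE)

print(f"one column: {one_col / 1e6:.0f} MB, ~{seconds_to_read(one_col, BANDWIDTH_MB_S):.1f} s")
print(f"select *:   {all_cols / 1e6:.0f} MB, ~{seconds_to_read(all_cols, BANDWIDTH_MB_S):.1f} s")
# one column: 160 MB, ~1.1 s
# select *:   8000 MB, ~53.3 s
```

The point is the linear scaling: under these assumptions the cost of a columnar scan grows with the number of columns projected, not with the number of columns stored.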
If your records are much wider than that (say, 50 columns or so), then 'select *' has to read roughly 50 times as much data, which could easily take a minute, especially if you don't have the disk bandwidth to read that much data in parallel.

On Mon, Jul 6, 2015 at 7:11 PM, Yousef Lasi <[email protected]> wrote:

> I'm hoping someone can expand my understanding of the mechanics of a query
> against a parquet file. We're finding that selecting a single column in a
> record from a file with > 40 million records is extremely fast - typically
> less than a second. However, running a 'select *' query against the same
> record using the same criteria is somewhat slow - as in greater than 60
> seconds.
>
> This might be expected behavior, but hopefully a better understanding of
> why this occurs might help us optimize the structure of our data files
> better as we create them.
>
> Thanks
>
