Re: Performance querying a single column out of a parquet file

2016-04-11 Thread Jacques Nadeau
There was a major conflict between the patch and the metadata caching feature that came in right at the same time (right before it). I believe there was a discussion about this on the list. It would be great if a developer could pick this up.
--
Jacques Nadeau
CTO and Co-Founder, Dremio
On Mon,

Re: Performance querying a single column out of a parquet file

2016-04-11 Thread Ted Dunning
On Mon, Apr 11, 2016 at 10:36 AM, Aman Sinha wrote:
> There is a JIRA related to one aspect of this: DRILL-1950 (filter pushdown into parquet scan). This is still work in progress I believe.
Actually, it looks like there was a patch from the community nearly a year

Re: Creating an Interpreter - %alias

2016-04-11 Thread John Omernik
LOL Sorry folks, yes, I meant this for the Apache Zeppelin list. Thanks Magnus for the response; I will also send this over to the Zeppelin list.
On Mon, Apr 11, 2016 at 8:15 AM, Magnus Pierre wrote:
> Hello John,
> I assume you are talking about Zeppelin -> Drill? I think

Re: Performance querying a single column out of a parquet file

2016-04-11 Thread Aman Sinha
There is a JIRA related to one aspect of this: DRILL-1950 (filter pushdown into parquet scan). This is still work in progress I believe. Once that is implemented, the scan will produce the filtered rows only. Regarding column projections, currently in Drill, the columns referenced anywhere in

Re: Performance querying a single column out of a parquet file

2016-04-11 Thread Johannes Zillmann
Hey Ted, sorry, I mixed up row and column! The queries look like this:
(1) "SELECT * FROM dfs.`myParquetFile` WHERE `id` = 23"
(2) "SELECT id FROM dfs.`myParquetFile` WHERE `id` = 23"
(1) takes 14 sec and (2) takes 1.5 sec, using Drill 1.6. So it looks like Drill is extracting the columns
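
One plausible reading of the gap between the two queries above can be sketched with a toy in-memory column store. It assumes the scan decodes every projected column for every row before the filter is applied (that is, no filter pushdown, which this thread describes as still in progress under DRILL-1950). The names `make_table` and `scan` are hypothetical, and this is not Drill's or Parquet's code; it only counts how many values each query shape would have to decode.

```python
# Toy columnar layout: each column is stored as its own list, loosely
# mimicking how a columnar file keeps each column's data separately.
# Illustration only; this is NOT how Drill or Parquet is implemented.

def make_table(num_rows, num_cols):
    """Build a table whose "id" column holds 0..num_rows-1 plus filler columns."""
    table = {"id": list(range(num_rows))}
    for c in range(1, num_cols):
        table[f"col{c}"] = [f"v{c}_{r}" for r in range(num_rows)]
    return table

def scan(table, projected, predicate_col, predicate_val):
    """Scan without filter pushdown (assumed): decode each needed column
    in full, then apply the equality filter to the decoded rows."""
    # Columns to decode: everything projected plus the filter column.
    needed = list(dict.fromkeys(list(projected) + [predicate_col]))
    values_read = 0
    decoded = {}
    for name in needed:
        decoded[name] = list(table[name])  # stands in for a full column decode
        values_read += len(decoded[name])
    rows = [
        {name: decoded[name][i] for name in projected}
        for i, value in enumerate(decoded[predicate_col])
        if value == predicate_val
    ]
    return rows, values_read

table = make_table(num_rows=1000, num_cols=9)

# Shape of query (1): SELECT * ... WHERE id = 23
star_rows, star_cost = scan(table, list(table), "id", 23)

# Shape of query (2): SELECT id ... WHERE id = 23
id_rows, id_cost = scan(table, ["id"], "id", 23)

print("values decoded for SELECT *: ", star_cost)
print("values decoded for SELECT id:", id_cost)
```

In this toy model the `SELECT *` shape decodes nine times as many values as the `SELECT id` shape even though both return a single row. That is the same order of magnitude as the 14 sec versus 1.5 sec reported above, but the model says nothing about where Drill actually spends its time.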

Re: Performance querying a single column out of a parquet file

2016-04-11 Thread Ted Dunning
Did you mean that you are doing a select to find a single column? What you typed was row, but that seems out of line with the rest of what you wrote. If you are truly asking about filtering down to a single row, whether it costs more to return all of the columns rather than just one from a single

Performance querying a single column out of a parquet file

2016-04-11 Thread Johannes Zillmann
Hey there, I am currently doing some performance measurements on Drill. In my case it’s a single parquet file with a single local Drillbit. Now in one case I have unexpected results and I’m curious if somebody has a good explanation for them! So I have a file with 10 million rows and 9 columns. Now

Re: Creating an Interpreter - %alias

2016-04-11 Thread Magnus Pierre
Hello John, I assume you are talking about Zeppelin -> Drill? I think I’ve seen it done. Let me check with the person I believe did what you ask for and ask how he did it. I think it is somewhat convenient to have one and the same "tag" for all statements that go against JDBC and just

Creating an Interpreter - %alias

2016-04-11 Thread John Omernik
So I copied the %jdbc interpreter. I am looking to create a specific connection to Apache Drill, and would like to use the jdbc interpreter but invoke it by calling %drill rather than %jdbc. Is this possible? I tried creating an interpreter with the new name "drill" and interpreter type jdbc, but I