Re: Varying Execution Times For The Same Query On The Same File

2015-01-18 Thread Jacques Nadeau
We should consider whether to add an option that supports splitting files known to contain one record per line. Does someone want to file an enhancement request? On Saturday, January 17, 2015, Ted Dunning ted.dunn...@gmail.com wrote: On Fri, Jan 16, 2015 at 6:25 PM, George Chow

Re: Varying Execution Times For The Same Query On The Same File

2015-01-17 Thread Ted Dunning
On Fri, Jan 16, 2015 at 6:25 PM, George Chow geo...@overcoil.com wrote:
> Are you saying that Drill will serialize one file to one DrillBit?
For unsplittable files, yes.

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread Ted Dunning
If you do want to have more parallelism, use several input files. On Fri, Jan 16, 2015 at 9:13 AM, Jason Altekruse altekruseja...@gmail.com wrote: I do not think we currently consider JSON files splittable. If we do treat them as such, it would depend on the file size and the available read
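As a rough illustration of the "several input files" suggestion, here is a minimal sketch that deals the records of one JSON file out into several smaller files. It assumes the input holds exactly one JSON record per line; the function name `split_ndjson`, the output naming scheme, and the round-robin distribution are illustrative choices, not anything Drill provides or requires.

```python
import os


def split_ndjson(src_path, out_dir, parts):
    """Distribute the records of a one-record-per-line JSON file across `parts` files.

    Hypothetical helper: assumes every non-blank line is a complete record.
    Records are dealt out round-robin, so the output files end up with
    nearly equal record counts. Returns the list of output paths.
    """
    if parts < 1:
        raise ValueError("parts must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    stem, ext = os.path.splitext(os.path.basename(src_path))
    out_paths = [
        os.path.join(out_dir, f"{stem}_{i:03d}{ext or '.json'}")
        for i in range(parts)
    ]
    outputs = [open(p, "w", encoding="utf-8") for p in out_paths]
    try:
        with open(src_path, encoding="utf-8") as src:
            written = 0
            for line in src:
                record = line.strip()
                if not record:
                    continue  # skip blank lines rather than emit empty records
                outputs[written % parts].write(record + "\n")
                written += 1
    finally:
        for f in outputs:
            f.close()
    return out_paths
```

Querying the directory of output files rather than the original single file is what would give Drill more than one file to read. How many of them are actually read concurrently depends on the cluster and is not something this sketch controls.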

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread Steven Phillips
JSON files are not splittable. There will be exactly one thread reading the file, regardless of how big it is. On Fri, Jan 16, 2015 at 4:15 PM, George Chow geo...@overcoil.com wrote: It should be possible to compare your HDFS block size with your file size to determine how many blocks (and
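The block-size comparison mentioned in the quoted message is a ceiling division. A minimal sketch follows; the function name `hdfs_block_count` is hypothetical, and the 128 MiB block size in the example is an assumed value, so substitute the block size your cluster is actually configured with. Per the reply above, for a JSON file this count describes how the file is stored, not how many threads read it.

```python
def hdfs_block_count(file_size_bytes, block_size_bytes):
    """Return how many blocks a file of the given size occupies.

    Hypothetical helper: plain ceiling division, with no call into HDFS.
    """
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    if file_size_bytes < 0:
        raise ValueError("file size cannot be negative")
    # Ceiling division using integers only, so large sizes stay exact.
    return -(-file_size_bytes // block_size_bytes)


MIB = 1024 * 1024
# Assumed example values: a 1 GiB file with a 128 MiB block size.
print(hdfs_block_count(1024 * MIB, 128 * MIB))  # 8
```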

Re: Varying Execution Times For The Same Query On The Same File

2015-01-16 Thread George Chow
Hi Steven, But a JSON file residing on HDFS is nonetheless split across datanode boundaries. Are you saying that Drill will serialize one file to one DrillBit? George On Fri, Jan 16, 2015 at 4:50 PM, Steven Phillips sphill...@maprtech.com wrote: json files are not splittable. There will be