We should consider adding an option to support splitting files that are
known to contain one record per line. Does someone want to file an
enhancement request?
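
For anyone picking up that request, here is a minimal Python sketch of how a one-record-per-line file could be read in independent byte ranges. It is not Drill code. The function name `read_split` and the rule that a record belongs to the range containing its first byte are assumptions chosen for illustration.

```python
def read_split(path, start, end):
    """Return the records (lines) whose first byte lies in [start, end).

    Hypothetical helper: each reader gets a byte range and aligns itself
    to line boundaries, so every record is read by exactly one reader.
    """
    records = []
    with open(path, "rb") as f:
        if start > 0:
            # Back up one byte and discard through the next newline. If the
            # byte before `start` is itself a newline, only that newline is
            # discarded and reading begins exactly at `start`.
            f.seek(start - 1)
            f.readline()
        # Keep reading while the next record starts inside this range; the
        # last record may run past `end`, which is expected.
        while f.tell() < end:
            line = f.readline()
            if not line:
                break  # end of file
            records.append(line.rstrip(b"\n"))
    return records
```

Each range can then be handed to a separate reader, for example `read_split(path, 0, mid)` and `read_split(path, mid, size)`. This only works when no record contains an embedded newline, which is why it would have to be an explicit option.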
On Saturday, January 17, 2015, Ted Dunning ted.dunn...@gmail.com wrote:
On Fri, Jan 16, 2015 at 6:25 PM, George Chow geo...@overcoil.com wrote:
Are you saying that Drill will serialize one file to one DrillBit?
For unsplittable files, yes.
If you do want to have more parallelism, use several input files.
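
As a rough illustration of the "several input files" suggestion, the Python sketch below distributes the lines of a one-record-per-line JSON file across several smaller files. The function name `split_records` and the output naming scheme are hypothetical, and it assumes every record sits on its own line.

```python
from pathlib import Path


def split_records(src, out_dir, parts):
    """Distribute the lines of `src` round-robin across `parts` files.

    Assumes one complete record per line. Returns the output paths.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: events.json -> events_000.json, ...
    paths = [out_dir / f"{src.stem}_{i:03d}{src.suffix}" for i in range(parts)]
    outs = [p.open("wb") for p in paths]
    try:
        with src.open("rb") as f:
            for n, line in enumerate(f):
                if not line.endswith(b"\n"):
                    line += b"\n"  # last line may lack a trailing newline
                outs[n % parts].write(line)
    finally:
        for o in outs:
            o.close()
    return paths
```

Pointing the query at the output directory instead of the single large file should then give one reader per file. Round-robin keeps the parts close to equal in size, but it does not preserve the original record order within the directory.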
On Fri, Jan 16, 2015 at 9:13 AM, Jason Altekruse altekruseja...@gmail.com
wrote:
I do not think we currently consider JSON files splittable. If we do treat
them as such, it would depend on the file size and the available read
JSON files are not splittable. There will be exactly one thread reading the
file, regardless of how big it is.
On Fri, Jan 16, 2015 at 4:15 PM, George Chow geo...@overcoil.com wrote:
It should be possible to compare your HDFS block size with your file size
to determine how many blocks (and
Hi Steven,
But a JSON file residing on HDFS is nonetheless split across datanode
boundaries.
Are you saying that Drill will serialize one file to one DrillBit?
George
On Fri, Jan 16, 2015 at 4:50 PM, Steven Phillips sphill...@maprtech.com
wrote:
JSON files are not splittable. There will be