Hi Drill users,
I have submitted issue https://issues.apache.org/jira/browse/DRILL-2159
for the TableStatsCalculator bug.
Recently we have run into two confusing issues.
1st: I ran the query select id, some_other_columns from
hdfs.tmp.`table_name` order by id limit 100 in Drill 0.7 on Hadoop 2.3.0
and got the profile below (times in seconds; the number in parentheses is
the minor fragment that hit the min/max):

Operator                           Setup (min / avg / max)             Process (min / avg / max)           Wait (min / avg / max)
00-xx-00 SCREEN                    0.000 (0) / 0.000 / 0.000 (0)       0.001 (0) / 0.001 / 0.001 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-01 PROJECT                   0.004 (0) / 0.004 / 0.004 (0)       0.000 (0) / 0.000 / 0.000 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-02 SELECTION_VECTOR_REMOVER  0.048 (0) / 0.048 / 0.048 (0)       0.002 (0) / 0.002 / 0.002 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-03 LIMIT                     0.000 (0) / 0.000 / 0.000 (0)       0.004 (0) / 0.004 / 0.004 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-04 MERGING_RECEIVER          0.000 (0) / 0.000 / 0.000 (0)       0.255 (0) / 0.255 / 0.255 (0)       13.398 (0) / 13.398 / 13.398 (0)
01-xx-00 SINGLE_SENDER             0.000 (0) / 0.000 / 0.000 (275)     0.000 (44) / 0.000 / 0.000 (99)     0.000 (164) / 0.002 / 0.061 (63)
01-xx-01 SELECTION_VECTOR_REMOVER  0.000 (184) / 0.001 / 0.001 (189)   0.000 (100) / 0.001 / 0.013 (205)   0.000 (100) / 0.000 / 0.000 (205)
01-xx-02 TOP_N_SORT                0.000 (0) / 0.000 / 0.000 (275)     0.062 (88) / 0.350 / 0.739 (17)     0.000 (88) / 0.000 / 0.000 (17)
01-xx-03 UNORDERED_RECEIVER        0.000 (0) / 0.000 / 0.000 (275)     0.000 (0) / 0.010 / 0.305 (133)     0.000 (0) / 6.710 / 13.897 (268)
02-xx-00 HASH_PARTITION_SENDER     0.000 (0) / 0.000 / 0.000 (275)     0.624 (144) / 1.453 / 2.370 (245)   0.005 (199) / 0.228 / 1.015 (196)
02-xx-01 PROJECT                   0.000 (252) / 0.005 / 0.271 (204)   0.000 (144) / 0.001 / 0.002 (94)    0.000 (144) / 0.000 / 0.000 (94)
02-xx-02 PARQUET_ROW_GROUP_SCAN    0.000 (124) / 0.210 / 1.801 (78)    0.565 (144) / 7.138 / 11.801 (2)    0.000 (144) / 0.000 / 0.000 (2)
But TOP_N_SORT runs after HASH_PARTITION_SENDER, so each machine has to
send all of its data to the other drillbits first. Why isn't TOP_N_SORT
placed between the scan/project and HASH_PARTITION_SENDER? That would be
much faster.
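To make the question concrete, our reading of the profile above is roughly
this operator ordering:

    SCAN -> PROJECT -> HASH_PARTITION_SENDER => TOP_N_SORT -> SINGLE_SENDER => MERGING_RECEIVER -> LIMIT -> SCREEN

whereas we expected each scanning fragment to apply the top-N locally
before the exchange, something like the following (this is our assumed
plan shape, not one Drill actually produced):

    SCAN -> PROJECT -> TOP_N_SORT (limit 100) -> SENDER => MERGING_RECEIVER -> LIMIT -> SCREEN

so that each fragment only sends its top 100 rows across the network
instead of the whole table.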
Major Fragment  Minor Fragments Reporting  First Start  Last Start   First End    Last End      t (min / avg / max)
00-xx-xx        1 / 1                      4.883 (0)    4.883 (0)    20.779 (0)   20.779 (0)    15.896 (0) / 15.896 / 15.896 (0)
01-xx-xx        276 / 276                  4.932 (7)    5.639 (268)  19.972 (0)   20.521 (267)  14.876 (268) / 15.016 / 15.134 (217)
02-xx-xx        276 / 276                  5.686 (0)    7.674 (275)  8.206 (144)  20.529 (117)  1.623 (144) / 9.778 / 14.635 (2)

The fragments only start after almost 5 seconds, and I would like to know
which operation takes those first 5 seconds. Is it building the execution
plan?
2nd: Drill reads a Parquet file (size: 200 MB) from HDFS, but it always
throws IOException: FAILED_TO_UNCOMPRESS(5) from
org.apache.drill.exec.store.parquet.columnreaders (line 122). The code at
that line is:
bytesIn = parentColumnReader.parentReader.getCodecFactoryExposer()
    .decompress(parentColumnReader.columnChunkMetaData.getCodec(),
                compressedData,
                uncompressedData,
                pageHeader.compressed_page_size,
                pageHeader.getUncompressed_page_size());
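For reference, FAILED_TO_UNCOMPRESS(5) is the message snappy-java produces
when it is handed an incomplete or corrupt buffer. A minimal standalone
sketch (plain org.xerial.snappy, outside Drill; the payload size is
arbitrary) reproduces the same error by truncating a compressed block,
which is consistent with the page buffer being only partially filled:

import java.util.Arrays;
import org.xerial.snappy.Snappy;

public class TruncatedSnappyDemo {
  public static void main(String[] args) throws Exception {
    // Compress an arbitrary payload, then drop its tail to mimic a
    // page buffer that a short read left only partially filled.
    byte[] compressed = Snappy.compress(new byte[64 * 1024]);
    byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);
    Snappy.uncompress(truncated); // throws java.io.IOException: FAILED_TO_UNCOMPRESS(5)
  }
}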
We found that in org.apache.drill.exec.store.parquet.ColumnDataReader the
CompatibilityUtil.getBuf() call copies only part of the page content. So
we added a while loop to keep reading until the whole page is in the
buffer, and it now works fine, but I'm not sure whether this is a bug.
Our patched getPageAsBytesBuf():

public ByteBuf getPageAsBytesBuf(ByteBuf byteBuf, int pageLength) throws IOException {
  // View the first pageLength bytes of the Netty buffer as an NIO buffer;
  // getBuf() advances its position as data arrives.
  ByteBuffer directBuffer = byteBuf.nioBuffer(0, pageLength);
  try {
    // A single getBuf() call may return after a partial read, so loop
    // until the page buffer is completely filled.
    do {
      CompatibilityUtil.getBuf(input, directBuffer, pageLength);
    } while (directBuffer.remaining() > 0);
  } catch (Exception e) {
    logger.error("Failed to read data into direct ByteBuffer: " + e.getMessage());
    throw new DrillRuntimeException(e.getMessage(), e);
  }
  return byteBuf;
}
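For context on why the loop is needed: a single java.nio read into a
ByteBuffer is allowed to return before the buffer is full, so the usual
pattern is to drain it in a loop. A generic sketch of that contract (plain
java.nio, nothing Drill-specific):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class ReadFully {
  // Read until buf is full; a single read() may legitimately return
  // fewer bytes than requested, or -1 at end of stream.
  static void readFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
    while (buf.hasRemaining()) {
      if (ch.read(buf) < 0) {
        throw new IOException("Premature end of stream");
      }
    }
  }
}

If CompatibilityUtil.getBuf() follows the same contract, a caller that
invokes it once and assumes a full buffer would see exactly the truncated
pages we observed.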