Hi Drill users,
I have submitted issue https://issues.apache.org/jira/browse/DRILL-2159
for the TableStatsCalculator bug.
Recently we have run into two confusing issues.
1st: I ran the query select id, some_other_columns from
hdfs.tmp.`table_name` order by id limit 100 in Drill 0.7 on Hadoop 2.3.0
and got the profile below (times in seconds; the number in parentheses is
the minor fragment that hit the min/max):

Operator                           Setup (min / avg / max)             Process (min / avg / max)           Wait (min / avg / max)
00-xx-00 SCREEN                    0.000 (0) / 0.000 / 0.000 (0)       0.001 (0) / 0.001 / 0.001 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-01 PROJECT                   0.004 (0) / 0.004 / 0.004 (0)       0.000 (0) / 0.000 / 0.000 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-02 SELECTION_VECTOR_REMOVER  0.048 (0) / 0.048 / 0.048 (0)       0.002 (0) / 0.002 / 0.002 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-03 LIMIT                     0.000 (0) / 0.000 / 0.000 (0)       0.004 (0) / 0.004 / 0.004 (0)       0.000 (0) / 0.000 / 0.000 (0)
00-xx-04 MERGING_RECEIVER          0.000 (0) / 0.000 / 0.000 (0)       0.255 (0) / 0.255 / 0.255 (0)       13.398 (0) / 13.398 / 13.398 (0)
01-xx-00 SINGLE_SENDER             0.000 (0) / 0.000 / 0.000 (275)     0.000 (44) / 0.000 / 0.000 (99)     0.000 (164) / 0.002 / 0.061 (63)
01-xx-01 SELECTION_VECTOR_REMOVER  0.000 (184) / 0.001 / 0.001 (189)   0.000 (100) / 0.001 / 0.013 (205)   0.000 (100) / 0.000 / 0.000 (205)
01-xx-02 TOP_N_SORT                0.000 (0) / 0.000 / 0.000 (275)     0.062 (88) / 0.350 / 0.739 (17)     0.000 (88) / 0.000 / 0.000 (17)
01-xx-03 UNORDERED_RECEIVER        0.000 (0) / 0.000 / 0.000 (275)     0.000 (0) / 0.010 / 0.305 (133)     0.000 (0) / 6.710 / 13.897 (268)
02-xx-00 HASH_PARTITION_SENDER     0.000 (0) / 0.000 / 0.000 (275)     0.624 (144) / 1.453 / 2.370 (245)   0.005 (199) / 0.228 / 1.015 (196)
02-xx-01 PROJECT                   0.000 (252) / 0.005 / 0.271 (204)   0.000 (144) / 0.001 / 0.002 (94)    0.000 (144) / 0.000 / 0.000 (94)
02-xx-02 PARQUET_ROW_GROUP_SCAN    0.000 (124) / 0.210 / 1.801 (78)    0.565 (144) / 7.138 / 11.801 (2)    0.000 (144) / 0.000 / 0.000 (2)
But TOP_N_SORT runs after HASH_PARTITION_SENDER, so each machine has to
send all of its data to the other drillbits first. Why isn't TOP_N_SORT
placed between the scan/project and HASH_PARTITION_SENDER? That would be
much faster.
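To make the question concrete, our reading of the profile above is roughly
this operator ordering:

    SCAN -> PROJECT -> HASH_PARTITION_SENDER => TOP_N_SORT -> SINGLE_SENDER => MERGING_RECEIVER -> LIMIT -> SCREEN

whereas we expected each scanning fragment to apply the top-N locally
before the exchange, something like the following (this is our assumed
plan shape, not one Drill actually produced):

    SCAN -> PROJECT -> TOP_N_SORT (limit 100) -> SENDER => MERGING_RECEIVER -> LIMIT -> SCREEN

so that each fragment only sends its top 100 rows across the network
instead of the whole table.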
Major Fragment  Minor Fragments Reporting  First Start  Last Start   First End    Last End      t (min / avg / max)
00-xx-xx        1 / 1                      4.883 (0)    4.883 (0)    20.779 (0)   20.779 (0)    15.896 (0) / 15.896 / 15.896 (0)
01-xx-xx        276 / 276                  4.932 (7)    5.639 (268)  19.972 (0)   20.521 (267)  14.876 (268) / 15.016 / 15.134 (217)
02-xx-xx        276 / 276                  5.686 (0)    7.674 (275)  8.206 (144)  20.529 (117)  1.623 (144) / 9.778 / 14.635 (2)

The fragments only start after almost 5 seconds, and I would like to know
which operation takes those first 5 seconds. Is it building the execution
plan?
2nd: Drill reads a Parquet file (size: 200 MB) from HDFS, but it always
throws IOException: FAILED_TO_UNCOMPRESS(5) from
org.apache.drill.exec.store.parquet.columnreaders (line 122). The code at
that line is:
bytesIn = parentColumnReader.parentReader.getCodecFactoryExposer()
    .decompress(parentColumnReader.columnChunkMetaData.getCodec(),
                compressedData,
                uncompressedData,
                pageHeader.compressed_page_size,
                pageHeader.getUncompressed_page_size());
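For reference, FAILED_TO_UNCOMPRESS(5) is the message snappy-java produces
when it is handed an incomplete or corrupt buffer. A minimal standalone
sketch (plain org.xerial.snappy, outside Drill; the payload size is
arbitrary) reproduces the same error by truncating a compressed block,
which is consistent with the page buffer being only partially filled:

import java.util.Arrays;
import org.xerial.snappy.Snappy;

public class TruncatedSnappyDemo {
  public static void main(String[] args) throws Exception {
    // Compress an arbitrary payload, then drop its tail to mimic a
    // page buffer that a short read left only partially filled.
    byte[] compressed = Snappy.compress(new byte[64 * 1024]);
    byte[] truncated = Arrays.copyOf(compressed, compressed.length / 2);
    Snappy.uncompress(truncated); // throws java.io.IOException: FAILED_TO_UNCOMPRESS(5)
  }
}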
We found that in org.apache.drill.exec.store.parquet.ColumnDataReader the
CompatibilityUtil.getBuf() call copies only part of the page content. So
we added a while loop to keep reading until the whole page is in the
buffer, and it now works fine, but I'm not sure whether this is a bug.
Our patched getPageAsBytesBuf():

public ByteBuf getPageAsBytesBuf(ByteBuf byteBuf, int pageLength) throws IOException {
  // View the first pageLength bytes of the Netty buffer as an NIO buffer;
  // getBuf() advances its position as data arrives.
  ByteBuffer directBuffer = byteBuf.nioBuffer(0, pageLength);
  try {
    // A single getBuf() call may return after a partial read, so loop
    // until the page buffer is completely filled.
    do {
      CompatibilityUtil.getBuf(input, directBuffer, pageLength);
    } while (directBuffer.remaining() > 0);
  } catch (Exception e) {
    logger.error("Failed to read data into direct ByteBuffer: " + e.getMessage());
    throw new DrillRuntimeException(e.getMessage(), e);
  }
  return byteBuf;
}
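For context on why the loop is needed: a single java.nio read into a
ByteBuffer is allowed to return before the buffer is full, so the usual
pattern is to drain it in a loop. A generic sketch of that contract (plain
java.nio, nothing Drill-specific):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.ReadableByteChannel;

final class ReadFully {
  // Read until buf is full; a single read() may legitimately return
  // fewer bytes than requested, or -1 at end of stream.
  static void readFully(ReadableByteChannel ch, ByteBuffer buf) throws IOException {
    while (buf.hasRemaining()) {
      if (ch.read(buf) < 0) {
        throw new IOException("Premature end of stream");
      }
    }
  }
}

If CompatibilityUtil.getBuf() follows the same contract, a caller that
invokes it once and assumes a full buffer would see exactly the truncated
pages we observed.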