Hi guys, I am doing some benchmarks with Kudu and Impala/Parquet and hope to share them soon, but there is one thing that bugs me. This is perhaps an Impala question, but since I am using Kudu with Impala I am going to ask here anyway.
One of my queries takes 120 seconds the very first time it runs. It joins one large table (5B rows) with a bunch of smaller tables and then stores the result in Impala/Parquet (not Kudu). If I run it a second and third time, it takes only 60 seconds. Can someone explain why? Are there any settings that would narrow this gap?

I've compared the query profiles in CM, and the only thing that differs substantially is the scan against the Kudu table (the large one):

*************************** first time: ***************************
KUDU_SCAN_NODE (id=0) (47.68s) <https://lkmaorabd103.multihosp.net:7183/cmf/impala/queryDetails?queryId=5143f7165be82819%3Ae00a103500000000&serviceName=impala#>
 - BytesRead: 0 B
 - InactiveTotalTime: 0ns
 - KuduRemoteScanTokens: 0
 - NumScannerThreadsStarted: 20
 - PeakMemoryUsage: 35.8 MiB
 - RowsRead: 693,502,241
 - RowsReturned: 693,502,241
 - RowsReturnedRate: 14643448 per second
 - ScanRangesComplete: 20
 - ScannerThreadsInvoluntaryContextSwitches: 1,341
 - ScannerThreadsTotalWallClockTime: 36.2m
 - MaterializeTupleTime(*): 47.57s
 - ScannerThreadsSysTime: 31.42s
 - ScannerThreadsUserTime: 1.7m
 - ScannerThreadsVoluntaryContextSwitches: 96,855
 - TotalKuduScanRoundTrips: 52,308
 - TotalReadThroughput: 0 B/s
 - TotalTime: 47.68s

*************************** second time: ***************************
KUDU_SCAN_NODE (id=0) (4.28s) <https://lkmaorabd103.multihosp.net:7183/cmf/impala/queryDetails?queryId=53497a308f860837%3A243772e000000000&serviceName=impala#>
 - BytesRead: 0 B
 - InactiveTotalTime: 0ns
 - KuduRemoteScanTokens: 0
 - NumScannerThreadsStarted: 20
 - PeakMemoryUsage: 37.9 MiB
 - RowsRead: 693,502,241
 - RowsReturned: 693,502,241
 - RowsReturnedRate: 173481534 per second
 - ScanRangesComplete: 20
 - ScannerThreadsInvoluntaryContextSwitches: 1,451
 - ScannerThreadsTotalWallClockTime: 19.5m
 - MaterializeTupleTime(*): 4.20s
 - ScannerThreadsSysTime: 38.22s
 - ScannerThreadsUserTime: 1.7m
 - ScannerThreadsVoluntaryContextSwitches: 480,870
 - TotalKuduScanRoundTrips: 52,142
 - TotalReadThroughput: 0 B/s
 - TotalTime: 4.28s
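To put rough numbers on the difference, here is a small Python sketch that works only from the counter values in the two profiles. The derived ratios (wall-clock time per scanner thread, average time per Kudu round trip, and the share of wall-clock time spent on CPU) are my own back-of-the-envelope metrics, not counters that Impala reports, and the dictionary keys and the `summarize` helper are hypothetical names I made up for illustration.

```python
# Counter values copied from the two KUDU_SCAN_NODE profiles.
# Key names are illustrative only; they are not Impala counter names.
first_run = {
    "wall_clock_min": 36.2,   # ScannerThreadsTotalWallClockTime
    "sys_s": 31.42,           # ScannerThreadsSysTime
    "user_min": 1.7,          # ScannerThreadsUserTime
    "round_trips": 52308,     # TotalKuduScanRoundTrips
    "threads": 20,            # NumScannerThreadsStarted
}
second_run = {
    "wall_clock_min": 19.5,
    "sys_s": 38.22,
    "user_min": 1.7,
    "round_trips": 52142,
    "threads": 20,
}


def summarize(profile):
    """Derive a few rough ratios from one scan node's counters."""
    wall_s = profile["wall_clock_min"] * 60
    cpu_s = profile["sys_s"] + profile["user_min"] * 60
    return {
        # Total scanner wall-clock time spread evenly over the threads.
        "wall_per_thread_s": wall_s / profile["threads"],
        # Loose average: summed thread wall-clock time per round trip.
        "ms_per_round_trip": 1000 * wall_s / profile["round_trips"],
        # Fraction of scanner wall-clock time accounted for by CPU time.
        "cpu_fraction": cpu_s / wall_s,
    }


for label, profile in (("first", first_run), ("second", second_run)):
    stats = summarize(profile)
    print(label, {name: round(value, 3) for name, value in stats.items()})
```

By this estimate the scanner threads spend about 109 seconds each in the first run versus about 59 seconds in the second, which lines up with the 120 s and 60 s query times, while sys plus user CPU time is nearly the same in both runs. So the extra time in the first run looks like waiting rather than computing, but I can't tell from the profile what the threads are waiting on.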
