Hi Boris,

Given that we don't have much data we can use here, I'll have to
extrapolate. As an aside though, this is yet another example where we need
more Kudu-side metrics in the query profile.

So, Kudu lazily loads a bunch of metadata and that can really affect scan
times. If this was your first time ever scanning the table since a Kudu
restart, it's very possible that that's where that time was spent. There's
also the page cache in the OS that might now be populated. You could do
something like "sync; echo 3 > /proc/sys/vm/drop_caches" on all the
machines and run the query 2 times again, without restarting Kudu, to
understand the effect of the page cache itself. There's currently now way
to purge the cached metadata in Kudu though.

Hope this helps a bit,

J-D

On Wed, Dec 13, 2017 at 8:07 AM, Boris Tyukin <bo...@boristyukin.com> wrote:

> Hi guys,
>
> I am doing some benchmarks with Kudu and Impala/Parquet and hope to share
> it soon but there is one thing that bugs me. This is perhaps Impala
> question but since I am using Kudu with Impala I am going to try and ask
> anyway.
>
> One of my queries takes 120 seconds to run the very first time. It joins
> one large 5B row table with a bunch of smaller tables and then stores
> result in Impala/parquet (not Kudu).
>
> Now if I run it second and third time, it only takes 60 seconds. Can
> someone explain why? Is there any settings to decrease this gap?
>
> I've compared query profiles in CM and the only thing that was very
> different is scan against Kudu table (the large one):
>
> ***************************
> first time:
> ***************************
> KUDU_SCAN_NODE (id=0) (47.68s)
> <https://lkmaorabd103.multihosp.net:7183/cmf/impala/queryDetails?queryId=5143f7165be82819%3Ae00a103500000000&serviceName=impala#>
>
>
>
>    - BytesRead: *0 B*
>    - InactiveTotalTime: *0ns*
>    - KuduRemoteScanTokens: *0*
>    - NumScannerThreadsStarted: *20*
>    - PeakMemoryUsage: *35.8 MiB*
>    - RowsRead: *693,502,241*
>    - RowsReturned: *693,502,241*
>    - RowsReturnedRate: *14643448 per second*
>    - ScanRangesComplete: *20*
>    - ScannerThreadsInvoluntaryContextSwitches: *1,341*
>    - ScannerThreadsTotalWallClockTime: *36.2m*
>       - MaterializeTupleTime(*): *47.57s*
>       - ScannerThreadsSysTime: *31.42s*
>       - ScannerThreadsUserTime: *1.7m*
>    - ScannerThreadsVoluntaryContextSwitches: *96,855*
>    - TotalKuduScanRoundTrips: *52,308*
>    - TotalReadThroughput: *0 B/s*
>    - TotalTime: *47.68s*
>
>
> ***************************
> second time:
> ***************************
> KUDU_SCAN_NODE (id=0) (4.28s)
> <https://lkmaorabd103.multihosp.net:7183/cmf/impala/queryDetails?queryId=53497a308f860837%3A243772e000000000&serviceName=impala#>
>
>
>
>    - BytesRead: *0 B*
>    - InactiveTotalTime: *0ns*
>    - KuduRemoteScanTokens: *0*
>    - NumScannerThreadsStarted: *20*
>    - PeakMemoryUsage: *37.9 MiB*
>    - RowsRead: *693,502,241*
>    - RowsReturned: *693,502,241*
>    - RowsReturnedRate: *173481534 per second*
>    - ScanRangesComplete: *20*
>    - ScannerThreadsInvoluntaryContextSwitches: *1,451*
>    - ScannerThreadsTotalWallClockTime: *19.5m*
>       - MaterializeTupleTime(*): *4.20s*
>       - ScannerThreadsSysTime: *38.22s*
>       - ScannerThreadsUserTime: *1.7m*
>    - ScannerThreadsVoluntaryContextSwitches: *480,870*
>    - TotalKuduScanRoundTrips: *52,142*
>    - TotalReadThroughput: *0 B/s*
>    - TotalTime: *4.28s*
>
>
>
>
>

Reply via email to