The CPU and memory are sufficient. Only one core is at 100% during the PHJ:
-----> top + H
   PID  USER    PR  NI    VIRT     RES    SHR  S  %CPU  %MEM      TIME+  COMMAND
146455  impala  20   0  34.534g  2.130g  80992  R  99.9   0.8    0:05.94  impalad
 51161  kudu    20   0  50.041g  0.021t  36788  S   4.9   8.6  672:05.66  rpc reactor-511
 51177  kudu    20   0  50.041g  0.021t  36788  S   4.9   8.6    0:37.49  rpc worker-5117
 51163  kudu    20   0  50.041g  0.021t  36788  S   4.6   8.6  417:25.94  rpc reactor-511
 51185  kudu    20   0  50.041g  0.021t  36788  S   3.9   8.6    0:03.43  rpc worker-5118
 51337  kudu    20   0  50.041g  0.021t  36788  S   3.9   8.6  417:15.66  maintenance_sch
114193  impala  20   0  34.534g  2.130g  80992  S   3.9   0.8    0:03.96  rpc reactor-114
 51191  kudu    20   0  50.041g  0.021t  36788  S   3.6   8.6    0:20.62  rpc worker-5119
 51190  kudu    20   0  50.041g  0.021t  36788  S   2.9   8.6    0:05.81  rpc worker-5119
 51162  kudu    20   0  50.041g  0.021t  36788  S   2.6   8.6  423:10.56  rpc reac
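For reference, the same per-thread view can also be captured non-interactively; this is just a sketch, reusing the impalad PID (146455) from the output above:
  # one batch-mode snapshot of the impalad threads
  top -H -b -n 1 -p 146455 | head -n 20
  # or per-thread CPU samples once per second for 10 seconds (sysstat)
  pidstat -t -p 146455 1 10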
And in order to improve efficiency, I changed the SQL to:
select count(*) from supplier as s, partsupp as p where s.s_suppkey = p.ps_suppkey;
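As a sanity check, the join plan for the rewritten query can be compared between the two formats; a possible invocation (the database names tpch_kudu and tpch_parquet are only placeholders here):
  impala-shell -q "explain select count(*) from tpch_kudu.supplier s, tpch_kudu.partsupp p where s.s_suppkey = p.ps_suppkey"
  impala-shell -q "explain select count(*) from tpch_parquet.supplier s, tpch_parquet.partsupp p where s.s_suppkey = p.ps_suppkey"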
Then I used 'sar' and 'perf' to capture some statistics, and found that the 'page-faults' count for kudu is almost twice that of parquet, though I don't know whether that is meaningful.
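To narrow down where the extra faults come from, one possible follow-up (only a sketch; <impalad_pid> is a placeholder, and perf must be available on the node) is to split the counter into minor/major faults and sample the fault call stacks while the query runs:
  # count minor vs. major faults for impalad over one minute
  perf stat -e page-faults,minor-faults,major-faults -p <impalad_pid> -- sleep 60
  # sample call stacks on page faults, then look at the hot paths
  perf record -e page-faults -g -p <impalad_pid> -- sleep 60
  perf report --stdio | head -n 40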
kudu(sar) =================================================>>>
root@hadoop958:~# sar -B 1 1000
Linux 3.16.0-4-amd64 (hadoop958.hz.163.org)  01/08/2018  _x86_64_  (40 CPU)
05:21:21 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
05:21:22 PM  0.00  0.00  25.00  0.00  50.00  0.00  0.00  0.00  0.00
05:21:23 PM  0.00  0.00  24.00  0.00  49.00  0.00  0.00  0.00  0.00
05:21:24 PM  0.00  0.00  2385.00  0.00  516.00  0.00  0.00  0.00  0.00
05:21:25 PM  0.00  0.00  29041.00  0.00  20017.00  0.00  0.00  0.00  0.00
05:21:26 PM  0.00  0.00  40957.00  0.00  14432.00  0.00  0.00  0.00  0.00
05:21:27 PM  0.00  32.00  45776.00  0.00  1433.00  0.00  0.00  0.00  0.00
05:21:28 PM  0.00  0.00  12949.00  0.00  20950.00  0.00  0.00  0.00  0.00
05:21:29 PM  0.00  0.00  11335.00  0.00  22574.00  0.00  0.00  0.00  0.00
05:21:30 PM  0.00  100.00  15852.00  0.00  27227.00  0.00  0.00  0.00  0.00
05:21:31 PM  0.00  0.00  17678.00  0.00  29853.00  0.00  0.00  0.00  0.00
05:21:32 PM  0.00  0.00  12681.00  0.00  20305.00  0.00  0.00  0.00  0.00
05:21:33 PM  0.00  0.00  11062.00  0.00  20368.00  0.00  0.00  0.00  0.00
05:21:34 PM  0.00  0.00  11891.00  0.00  20317.00  0.00  0.00  0.00  0.00
05:21:35 PM  0.00  12.00  12495.00  0.00  20329.00  0.00  0.00  0.00  0.00
05:21:36 PM  0.00  68.00  10453.00  0.00  20320.00  0.00  0.00  0.00  0.00
05:21:37 PM  0.00  0.00  10313.00  0.00  20309.00  0.00  0.00  0.00  0.00
05:21:38 PM  0.00  0.00  10738.00  0.00  20350.00  0.00  0.00  0.00  0.00
05:21:39 PM  0.00  0.00  12216.00  0.00  20398.00  0.00  0.00  0.00  0.00
05:21:40 PM  0.00  0.00  12834.00  0.00  21116.00  0.00  0.00  0.00  0.00
05:21:41 PM  0.00  4.00  12497.00  0.00  20319.00  0.00  0.00  0.00  0.00
05:21:42 PM  0.00  16.00  11831.00  0.00  20376.00  0.00  0.00  0.00  0.00
05:21:43 PM  0.00  0.00  14087.00  0.00  24922.00  0.00  0.00  0.00  0.00
05:21:44 PM  0.00  0.00  17885.00  0.00  24956.00  0.00  0.00  0.00  0.00
05:21:45 PM  0.00  28.00  19622.00  0.00  24578.00  0.00  0.00  0.00  0.00
05:21:46 PM  0.00  4.00  11262.00  0.00  20351.00  0.00  0.00  0.00  0.00
05:21:47 PM  0.00  0.00  13248.00  0.00  21835.00  0.00  0.00  0.00  0.00
05:21:48 PM  0.00  24.00  12915.00  0.00  20322.00  0.00  0.00  0.00  0.00
05:21:49 PM  0.00  0.00  13170.00  0.00  21733.00  0.00  0.00  0.00  0.00
05:21:50 PM  0.00  20.00  13614.00  0.00  20660.00  0.00  0.00  0.00  0.00
05:21:51 PM  0.00  8.00  20138.00  0.00  20972.00  0.00  0.00  0.00  0.00
05:21:52 PM  0.00  0.00  13014.00  0.00  20372.00  0.00  0.00  0.00  0.00
05:21:53 PM  0.00  0.00  11716.00  0.00  20336.00  0.00  0.00  0.00  0.00
05:21:54 PM  0.00  12.00  10320.00  0.00  20442.00  0.00  0.00  0.00  0.00
05:21:54 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
05:21:55 PM  0.00  0.00  19144.00  0.00  29316.00  0.00  0.00  0.00  0.00
05:21:56 PM  0.00  96.00  10316.00  0.00  20297.00  0.00  0.00  0.00  0.00
05:21:57 PM  0.00  0.00  10310.00  0.00  20395.00  0.00  0.00  0.00  0.00
05:21:58 PM  0.00  0.00  10583.00  0.00  20864.00  0.00  0.00  0.00  0.00
05:21:59 PM  0.00  0.00  11559.00  0.00  21296.00  0.00  0.00  0.00  0.00
05:22:00 PM  0.00  0.00  10327.00  0.00  20313.00  0.00  0.00  0.00  0.00
05:22:01 PM  0.00  132.00  10406.00  0.00  29654.00  0.00  0.00  0.00  0.00
05:22:02 PM  0.00  12.00  11089.00  0.00  21865.00  0.00  0.00  0.00  0.00
05:22:03 PM  0.00  0.00  11085.00  0.00  21675.00  0.00  0.00  0.00  0.00
05:22:04 PM  0.00  0.00  11332.00  0.00  22583.00  0.00  0.00  0.00  0.00
05:22:05 PM  0.00  0.00  10820.00  0.00  21572.00  0.00  0.00  0.00  0.00
05:22:06 PM  0.00  40.00  15241.00  0.00  28091.00  0.00  0.00  0.00  0.00
05:22:07 PM  0.00  4.00  12533.00  0.00  24664.00  0.00  0.00  0.00  0.00
05:22:08 PM  0.00  0.00  12149.00  0.00  23097.00  0.00  0.00  0.00  0.00
05:22:09 PM  0.00  0.00  11511.00  0.00  21385.00  0.00  0.00  0.00  0.00
05:22:10 PM  0.00  0.00  10339.00  0.00  20445.00  0.00  0.00  0.00  0.00
05:22:11 PM  0.00  16.00  10306.00  0.00  20319.00  0.00  0.00  0.00  0.00
05:22:12 PM  0.00  16.00  10306.00  0.00  20382.00  0.00  0.00  0.00  0.00
05:22:13 PM  0.00  0.00  10312.00  0.00  20299.00  0.00  0.00  0.00  0.00
05:22:14 PM  0.00  0.00  10700.00  0.00  20363.00  0.00  0.00  0.00  0.00
05:22:15 PM  0.00  0.00  8725.00  0.00  18867.00  0.00  0.00  0.00  0.00
05:22:16 PM  0.00  20.00  55.00  0.00  96883.00  0.00  0.00  0.00  0.00
05:22:17 PM  0.00  0.00  24.00  0.00  53.00  0.00  0.00  0.00  0.00
05:22:18 PM  0.00  0.00  24.00  0.00  107.00  0.00  0.00  0.00  0.00
05:22:19 PM  0.00  0.00  1287.00  0.00  1136.00  0.00  0.00  0.00  0.00
05:22:20 PM  0.00  0.00  31.00  0.00  66.00  0.00  0.00  0.00  0.00
05:22:21 PM  0.00  64.00  24.00  0.00  58.00  0.00  0.00  0.00  0.00
05:22:22 PM  0.00  0.00  24.00  0.00  88.00  0.00  0.00  0.00  0.00
05:22:23 PM  0.00  0.00  24.00  0.00  89.00  0.00  0.00  0.00  0.00
05:22:24 PM  0.00  0.00  24.00  0.00  49.00  0.00  0.00  0.00  0.00
05:22:25 PM  0.00  0.00  24.00  0.00  49.00  0.00  0.00  0.00  0.00
05:22:26 PM  0.00  0.00  24.00  0.00  93.00  0.00  0.00  0.00  0.00
05:22:27 PM  0.00  4.00  24.00  0.00  97.00  0.00  0.00  0.00  0.00
parquet(sar) =============================================>>>
root@hadoop958:~# sar -B 1 1000
Linux 3.16.0-4-amd64 (hadoop958.hz.163.org)  01/08/2018  _x86_64_  (40 CPU)
04:37:37 PM  pgpgin/s  pgpgout/s  fault/s  majflt/s  pgfree/s  pgscank/s  pgscand/s  pgsteal/s  %vmeff
04:37:38 PM  0.00  16.00  25.00  0.00  57.00  0.00  0.00  0.00  0.00
04:37:39 PM  0.00  0.00  14975.00  0.00  2585.00  0.00  0.00  0.00  0.00
04:37:40 PM  0.00  0.00  29143.00  0.00  6272.00  0.00  0.00  0.00  0.00
04:37:41 PM  0.00  0.00  67780.00  0.00  7579.00  0.00  0.00  0.00  0.00
04:37:42 PM  0.00  8.00  3734.00  0.00  4083.00  0.00  0.00  0.00  0.00
04:37:43 PM  0.00  60.00  7019.00  0.00  8188.00  0.00  0.00  0.00  0.00
04:37:44 PM  0.00  24.00  11843.00  0.00  8926.00  0.00  0.00  0.00  0.00
04:37:45 PM  0.00  0.00  9809.00  0.00  8289.00  0.00  0.00  0.00  0.00
04:37:46 PM  0.00  0.00  3792.00  0.00  4031.00  0.00  0.00  0.00  0.00
04:37:47 PM  0.00  0.00  3809.00  0.00  4057.00  0.00  0.00  0.00  0.00
04:37:48 PM  0.00  24.00  3796.00  0.00  4065.00  0.00  0.00  0.00  0.00
04:37:49 PM  0.00  0.00  6021.00  0.00  5349.00  0.00  0.00  0.00  0.00
04:37:50 PM  0.00  0.00  3762.00  0.00  4017.00  0.00  0.00  0.00  0.00
04:37:51 PM  0.00  0.00  13333.00  0.00  4450.00  0.00  0.00  0.00  0.00
04:37:52 PM  0.00  8.00  3839.00  0.00  4095.00  0.00  0.00  0.00  0.00
04:37:53 PM  0.00  0.00  3873.00  0.00  4134.00  0.00  0.00  0.00  0.00
04:37:54 PM  0.00  28.00  3763.00  0.00  3992.00  0.00  0.00  0.00  0.00
04:37:55 PM  0.00  0.00  12371.00  0.00  12656.00  0.00  0.00  0.00  0.00
04:37:56 PM  0.00  0.00  3731.00  0.00  60638.00  0.00  0.00  0.00  0.00
04:37:57 PM  0.00  104.00  3866.00  0.00  3835.00  0.00  0.00  0.00  0.00
04:37:58 PM  0.00  0.00  4116.00  0.00  4394.00  0.00  0.00  0.00  0.00
04:37:59 PM  0.00  0.00  7188.00  0.00  6622.00  0.00  0.00  0.00  0.00
04:38:00 PM  0.00  0.00  4940.00  0.00  10193.00  0.00  0.00  0.00  0.00
04:38:01 PM  0.00  60.00  133.00  0.00  9299.00  0.00  0.00  0.00  0.00
04:38:02 PM  0.00  44.00  49.00  0.00  299.00  0.00  0.00  0.00  0.00
04:38:03 PM  0.00  0.00  160.00  0.00  83.00  0.00  0.00  0.00  0.00
04:38:04 PM  0.00  0.00  24.00  0.00  64.00  0.00  0.00  0.00  0.00
04:38:05 PM  0.00  0.00  24.00  0.00  65.00  0.00  0.00  0.00  0.00
04:38:06 PM  0.00  0.00  746.00  0.00  470.00  0.00  0.00  0.00  0.00
04:38:07 PM  0.00  0.00  81.00  0.00  93162.00  0.00  0.00  0.00  0.00
04:38:08 PM  0.00  16.00  34.00  0.00  981.00  0.00  0.00  0.00  0.00
04:38:09 PM  0.00  0.00  680.00  0.00  545.00  0.00  0.00  0.00  0.00
04:38:10 PM  0.00  0.00  66.00  0.00  118.00  0.00  0.00  0.00  0.00
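For comparing the two runs, the per-second fault rates above can also be averaged directly; a rough sketch (it assumes the default 'sar -B' column order, with fault/s in the 5th field, and only counts non-idle samples):
  sar -B 1 60 | awk '$1 ~ /^[0-9]/ && $5+0 > 0 {sum += $5; n++} END {if (n) printf "avg fault/s = %.0f over %d samples\n", sum/n, n}'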
kudu(perf)===============================================>>>
root@hadoop958:~# perf stat -p 29160 -d
^C
Performance counter stats for process id '29160':
      75367.439887 task-clock (msec)        #    1.365 CPUs utilized
           323,646 context-switches         #    0.004 M/sec
             8,683 cpu-migrations           #    0.115 K/sec
           632,653 page-faults              #    0.008 M/sec
   162,296,209,992 cycles                   #    2.153 GHz                   [93.18%]
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
    86,693,164,726 instructions             #    0.53  insns per cycle       [95.31%]
    15,026,610,761 branches                 #  199.378 M/sec                 [95.44%]
        29,680,120 branch-misses            #    0.20% of all branches       [95.54%]
    33,528,975,631 L1-dcache-loads          #  444.873 M/sec                 [95.91%]
     2,484,449,334 L1-dcache-load-misses    #    7.41% of all L1-dcache hits [11.40%]
     1,465,862,774 LLC-loads                #   19.450 M/sec                 [10.11%]
   <not supported> LLC-load-misses:HG

      55.217146805 seconds time elapsed
parquet(perf) =============================================>>>
root@hadoop958:~# perf stat -p 29160 -d
^C
Performance counter stats for process id '29160':
      51130.516140 task-clock (msec)        #    1.557 CPUs utilized
           316,994 context-switches         #    0.006 M/sec
             8,232 cpu-migrations           #    0.161 K/sec
           190,272 page-faults              #    0.004 M/sec
   102,279,158,848 cycles                   #    2.000 GHz                   [92.86%]
   <not supported> stalled-cycles-frontend
   <not supported> stalled-cycles-backend
    94,061,719,561 instructions             #    0.92  insns per cycle       [95.23%]
    16,498,918,033 branches                 #  322.682 M/sec                 [95.15%]
        30,073,148 branch-misses            #    0.18% of all branches       [95.15%]
    29,651,847,951 L1-dcache-loads          #  579.925 M/sec                 [95.43%]
     2,101,137,839 L1-dcache-load-misses    #    7.09% of all L1-dcache hits [12.35%]
     1,194,026,289 LLC-loads                #   23.353 M/sec                 [10.25%]
   <not supported> LLC-load-misses:HG

      32.849196160 seconds time elapsed
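Since the largest gaps above are in page-faults and insns per cycle, TLB and last-level-cache behaviour might also be worth a look; a possible follow-up (a sketch only; <impalad_pid> is a placeholder, and as the output above shows, some of these events may not be supported on this CPU):
  perf stat -e dTLB-loads,dTLB-load-misses,LLC-loads,LLC-load-misses -p <impalad_pid> -- sleep 60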
On 2018-01-04 00:20, Tim Armstrong <[email protected]> wrote:
> That's a tricky one. I have a couple of ideas but it's a bit difficult to
> confirm since the profile isn't really designed to easily answer questions
> like this. ProbeTime measures wall-clock time rather than actual time spent
> executing on the CPU.
>
> My first guess is that it's because the Kudu scan is using more CPU than
> the Parquet scan and ProbeTime so the thread doing the scan is competing
> for resources more with the join thread (either hyperthreads competing for
> resources on the same CPU or threads competing for time on logical
> processors). You could compare the User CPU time for the fragment instances
> containing the joins and scans to see if there is also a discrepancy in CPU
> time. That won't answer this directly but might provide some clues.
>
> My second guess is that there's some kind of subtle memory locality or
> scheduling effect.
>
> On Tue, Jan 2, 2018 at 11:40 PM, helifu <[email protected]> wrote:
>
> > Hi everybody,
> >
> >
> >
> > Recently I ran a simple PHJ on "parquet" and "kudu" with this sql
> > independently:
> >
> > select count(*) from lineitem as l, orders as o where l.l_orderkey =
> > o.o_orderkey and o.o_orderdate < '1996-01-01' and o.o_orderdate >=
> > '1995-01-01';
> >
> >
> >
> > And I found that the "ProbeTime" on "kudu" is much larger than on
> > "parquet"!!
> >
> >
> >
> > Below are the plans and profiles:
> >
> >
> >
> > Thanks in advance.
> >
> >
> >
> >
> >
> > helifu
> >
> > 2017-04-10 16:06:24
> >
> >
> >
>