Kudu would use a different folder, but they could all be on the same disks; it just depends on how you configured it. ZK needs dedicated disks to be usable, so I can definitely see Kudu having a huge impact on ZK if they were sharing those spindles.
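A quick way to check whether the two services are in fact sharing spindles is to compare the block devices backing their data directories. A minimal sketch, assuming hypothetical paths for Kudu's `--fs_data_dirs` and ZooKeeper's `dataDir` (substitute your cluster's actual values):

```shell
#!/bin/sh
# Hypothetical data-dir paths -- substitute the values from your
# Kudu --fs_data_dirs flag and ZooKeeper's dataDir setting.
kudu_dir=/data/kudu
zk_dir=/data/zookeeper

# df -P prints the backing filesystem device in column 1 of row 2;
# two directories on the same device share spindles.
same_device() {
  [ "$(df -P "$1" | awk 'NR==2 {print $1}')" = \
    "$(df -P "$2" | awk 'NR==2 {print $1}')" ]
}

# Only run the comparison when both directories actually exist.
if [ -d "$kudu_dir" ] && [ -d "$zk_dir" ]; then
  if same_device "$kudu_dir" "$zk_dir"; then
    echo "WARNING: Kudu and ZK data dirs share the same device"
  else
    echo "OK: Kudu and ZK data dirs are on separate devices"
  fi
fi
```

Run it on each node that hosts both a Kudu tserver and a ZK server.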
On Wed, Jan 3, 2018 at 12:20 PM, Boris Tyukin <bo...@boristyukin.com> wrote:

> it is possible but I thought Kudu keeps its stuff in its own folders
>
> On Wed, Jan 3, 2018 at 1:45 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
>> Hey Boris,
>>
>> Thanks for reporting back with results!
>>
>> On Wed, Jan 3, 2018 at 10:38 AM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>
>>> so it was the page cache that makes this difference. we did a series of
>>> tests, either restarting Kudu only, Impala only, or both, and either
>>> resetting or not touching the page cache.
>>>
>>> as for the Kudu failures after restart, it was the sequence of services
>>> that need to be started before Kudu. If we start Kudu after HDFS,
>>> everything is fine. Data is intact.
>>
>> Is it possible that Kudu is sharing disks with ZK?
>>
>>> thanks again for your help, J-D
>>>
>>> On Sat, Dec 16, 2017 at 4:05 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>
>>>> I'm more thinking in terms of the startup IO having some impact on the
>>>> co-located services, but we really need to know what "went down" means.
>>>>
>>>> On Sat, Dec 16, 2017 at 12:50 PM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>>>
>>>>> yep, it is really weird since Kudu doesn't use either one. I'll get
>>>>> with him on Monday to gather more details.
>>>>>
>>>>> On Sat, Dec 16, 2017 at 3:28 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>
>>>>>> Hi Boris,
>>>>>>
>>>>>> How exactly did HDFS and ZK go down? A Kudu restart is fairly
>>>>>> IO-intensive, but I don't know how that could cause things like
>>>>>> DataNodes to fail.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Sat, Dec 16, 2017 at 11:45 AM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>>>>>
>>>>>>> well, our admin had fun for two days - it was the first time we
>>>>>>> restarted Kudu on our DEV cluster and it did not go well. He is still
>>>>>>> troubleshooting what happened, but after the Kudu restart ZooKeeper
>>>>>>> and HDFS went down after 3-4 minutes. If we disable Kudu, all is
>>>>>>> well. No errors in the Kudu logs... I will have more details next
>>>>>>> week, so I'm not asking for help yet since I do not know all the
>>>>>>> details. What is obvious, though, is that it has something to do
>>>>>>> with Kudu :)
>>>>>>>
>>>>>>> On Thu, Dec 14, 2017 at 9:40 AM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>>>>>>
>>>>>>>> thanks for your suggestions, J-D, I am sure you are right more
>>>>>>>> often than that! :))
>>>>>>>>
>>>>>>>> I will report back with our results. So far I am really impressed
>>>>>>>> with Kudu - we have been benchmarking ingest and egress throughput
>>>>>>>> and our typical query runtimes. The biggest pain so far is the lack
>>>>>>>> of support for decimals.
>>>>>>>>
>>>>>>>> On Wed, Dec 13, 2017 at 5:07 PM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> On Wed, Dec 13, 2017 at 11:30 AM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>>>>>>>>
>>>>>>>>>> thanks J-D! we are going to try that and see how it impacts the
>>>>>>>>>> runtime.
>>>>>>>>>>
>>>>>>>>>> is there any way to load this metadata upfront? a lot of our
>>>>>>>>>> queries are ad hoc in nature, but they will be hitting the same
>>>>>>>>>> tables with different predicates and join patterns.
>>>>>>>>>
>>>>>>>>> You could use Impala to compute all the stats of all the tables
>>>>>>>>> after each Kudu restart. Actually, do try that: restart Kudu, then
>>>>>>>>> compute stats, and see how fast it scans.
>>>>>>>>>
>>>>>>>>>> I am curious why this metadata does not survive restarts though.
>>>>>>>>>> We are going to run our benchmarks again and this time restart
>>>>>>>>>> Kudu and Impala.
>>>>>>>>>
>>>>>>>>> It's in the tserver memory, it can't survive a restart.
>>>>>>>>>
>>>>>>>>>> I just ran another query for the first time which hits 2 large
>>>>>>>>>> tables, and these tables had already been scanned by the previous
>>>>>>>>>> query; this time I do not see any difference in query time
>>>>>>>>>> between the first and second runs - I guess this confirms your
>>>>>>>>>> statement about "first time ever scanning the table since a Kudu
>>>>>>>>>> restart" and collecting metadata.
>>>>>>>>>
>>>>>>>>> Maybe, I've been known to be right once or twice a year :)
>>>>>>>>>
>>>>>>>>>> On Wed, Dec 13, 2017 at 11:18 AM, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Boris,
>>>>>>>>>>>
>>>>>>>>>>> Given that we don't have much data we can use here, I'll have to
>>>>>>>>>>> extrapolate. As an aside though, this is yet another example
>>>>>>>>>>> where we need more Kudu-side metrics in the query profile.
>>>>>>>>>>>
>>>>>>>>>>> So, Kudu lazily loads a bunch of metadata and that can really
>>>>>>>>>>> affect scan times. If this was your first time ever scanning the
>>>>>>>>>>> table since a Kudu restart, it's very possible that that's where
>>>>>>>>>>> that time was spent. There's also the page cache in the OS that
>>>>>>>>>>> might now be populated. You could do something like
>>>>>>>>>>> "sync; echo 3 > /proc/sys/vm/drop_caches" on all the machines
>>>>>>>>>>> and run the query 2 times again, without restarting Kudu, to
>>>>>>>>>>> understand the effect of the page cache itself. There's
>>>>>>>>>>> currently no way to purge the cached metadata in Kudu though.
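The cache-drop test quoted above can be scripted across the cluster. A sketch that only prints the per-host commands so they can be reviewed first (the hostnames are placeholders, and actually dropping caches requires root on each machine):

```shell
#!/bin/sh
# Hypothetical tserver hostnames -- replace with your own.
HOSTS="tserver1 tserver2 tserver3"

# Print (rather than run) the per-host cache-drop command. Review the
# output, run the ssh lines as root, then re-run the query twice
# without restarting Kudu to isolate the page-cache effect.
for h in $HOSTS; do
  echo "ssh root@$h 'sync; echo 3 > /proc/sys/vm/drop_caches'"
done
```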
>>>>>>>>>>>
>>>>>>>>>>> Hope this helps a bit,
>>>>>>>>>>>
>>>>>>>>>>> J-D
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Dec 13, 2017 at 8:07 AM, Boris Tyukin <bo...@boristyukin.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi guys,
>>>>>>>>>>>>
>>>>>>>>>>>> I am doing some benchmarks with Kudu and Impala/Parquet and
>>>>>>>>>>>> hope to share them soon, but there is one thing that bugs me.
>>>>>>>>>>>> This is perhaps an Impala question, but since I am using Kudu
>>>>>>>>>>>> with Impala I am going to try and ask anyway.
>>>>>>>>>>>>
>>>>>>>>>>>> One of my queries takes 120 seconds to run the very first time.
>>>>>>>>>>>> It joins one large 5B-row table with a bunch of smaller tables
>>>>>>>>>>>> and then stores the result in Impala/Parquet (not Kudu).
>>>>>>>>>>>>
>>>>>>>>>>>> Now if I run it a second and third time, it only takes 60
>>>>>>>>>>>> seconds. Can someone explain why? Are there any settings to
>>>>>>>>>>>> decrease this gap?
>>>>>>>>>>>>
>>>>>>>>>>>> I've compared the query profiles in CM and the only thing that
>>>>>>>>>>>> was very different is the scan against the Kudu table (the
>>>>>>>>>>>> large one):
>>>>>>>>>>>>
>>>>>>>>>>>> ***************************
>>>>>>>>>>>> first time:
>>>>>>>>>>>> ***************************
>>>>>>>>>>>> KUDU_SCAN_NODE (id=0) (47.68s)
>>>>>>>>>>>>
>>>>>>>>>>>> - BytesRead: 0 B
>>>>>>>>>>>> - InactiveTotalTime: 0ns
>>>>>>>>>>>> - KuduRemoteScanTokens: 0
>>>>>>>>>>>> - NumScannerThreadsStarted: 20
>>>>>>>>>>>> - PeakMemoryUsage: 35.8 MiB
>>>>>>>>>>>> - RowsRead: 693,502,241
>>>>>>>>>>>> - RowsReturned: 693,502,241
>>>>>>>>>>>> - RowsReturnedRate: 14,643,448 per second
>>>>>>>>>>>> - ScanRangesComplete: 20
>>>>>>>>>>>> - ScannerThreadsInvoluntaryContextSwitches: 1,341
>>>>>>>>>>>> - ScannerThreadsTotalWallClockTime: 36.2m
>>>>>>>>>>>> - MaterializeTupleTime(*): 47.57s
>>>>>>>>>>>> - ScannerThreadsSysTime: 31.42s
>>>>>>>>>>>> - ScannerThreadsUserTime: 1.7m
>>>>>>>>>>>> - ScannerThreadsVoluntaryContextSwitches: 96,855
>>>>>>>>>>>> - TotalKuduScanRoundTrips: 52,308
>>>>>>>>>>>> - TotalReadThroughput: 0 B/s
>>>>>>>>>>>> - TotalTime: 47.68s
>>>>>>>>>>>>
>>>>>>>>>>>> ***************************
>>>>>>>>>>>> second time:
>>>>>>>>>>>> ***************************
>>>>>>>>>>>> KUDU_SCAN_NODE (id=0) (4.28s)
>>>>>>>>>>>>
>>>>>>>>>>>> - BytesRead: 0 B
>>>>>>>>>>>> - InactiveTotalTime: 0ns
>>>>>>>>>>>> - KuduRemoteScanTokens: 0
>>>>>>>>>>>> - NumScannerThreadsStarted: 20
>>>>>>>>>>>> - PeakMemoryUsage: 37.9 MiB
>>>>>>>>>>>> - RowsRead: 693,502,241
>>>>>>>>>>>> - RowsReturned: 693,502,241
>>>>>>>>>>>> - RowsReturnedRate: 173,481,534 per second
>>>>>>>>>>>> - ScanRangesComplete: 20
>>>>>>>>>>>> - ScannerThreadsInvoluntaryContextSwitches: 1,451
>>>>>>>>>>>> - ScannerThreadsTotalWallClockTime: 19.5m
>>>>>>>>>>>> - MaterializeTupleTime(*): 4.20s
>>>>>>>>>>>> - ScannerThreadsSysTime: 38.22s
>>>>>>>>>>>> - ScannerThreadsUserTime: 1.7m
>>>>>>>>>>>> - ScannerThreadsVoluntaryContextSwitches: 480,870
>>>>>>>>>>>> - TotalKuduScanRoundTrips: 52,142
>>>>>>>>>>>> - TotalReadThroughput: 0 B/s
>>>>>>>>>>>> - TotalTime: 4.28s
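J-D's suggestion upthread, recomputing stats on every Kudu-backed table right after a restart so the first real query does not pay the metadata-load cost, can be scripted with impala-shell. A sketch under stated assumptions: the coordinator address and table names are made up, so substitute your own, and the script falls back to printing the commands when impala-shell is not on the PATH:

```shell
#!/bin/sh
# Hypothetical coordinator address and Kudu-backed table list --
# substitute your own values.
IMPALAD=impalad-host:21000
TABLES="db.big_fact db.dim_one db.dim_two"

for t in $TABLES; do
  if command -v impala-shell >/dev/null 2>&1; then
    # COMPUTE STATS performs a full scan, which also warms Kudu's
    # lazily loaded metadata and the OS page cache as a side effect.
    impala-shell -i "$IMPALAD" -q "COMPUTE STATS $t"
  else
    # Dry run when impala-shell is not available on this machine.
    echo "impala-shell -i $IMPALAD -q \"COMPUTE STATS $t\""
  fi
done
```

Wiring this into whatever automation restarts Kudu keeps the first ad hoc query of the day from absorbing the warm-up penalty seen in the profiles above.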