Re: Impala Parquet to Kudu 1.5 - severe ingest performance degradation

2018-02-22 Thread Hao Hao
Did you happen to check the health of the cluster after the upgrade by 'kudu cluster ksck'? Best, Hao On Thu, Feb 22, 2018 at 6:31 AM, Boris Tyukin wrote: > Hello, > > we just upgraded our dev cluster from Kudu 1.3 to kudu 1.5.0-cdh5.13.1 > and noticed quite severe

Re: Locks are acquired to cost much time in transactions

2018-09-24 Thread Hao Hao
Hi Xiaokai, If I understand you correctly, you are proposing to use a key->ops hash to apply the write transaction instead of serializing the operations in the PREPARE phase with a lock? Here are some questions I have about this approach: 1) what would the timestamp be if there is no longer a lock

Re: Changing number of Kudu worker threads

2019-02-12 Thread Hao Hao
arrived requests. But I don't see how it can help with handling more concurrent requests. Best, Hao On Tue, Feb 12, 2019 at 6:45 PM Boris wrote: > Thanks Hao, appreciate your response. > > Do we also need to bump other RPC thread related parameters queue etc.? > > On Tue, Feb 12, 2019

Re: Inconsistent read performance with Spark

2019-02-12 Thread Hao Hao
Hi Faraz, Answered inline below. Best, Hao On Tue, Feb 12, 2019 at 6:59 AM Faraz Mateen wrote: > Hi all, > > I am using spark to pull data from my single node testing kudu setup and > publish it to kafka. However, my query time is not consistent. > > I am querying a table with around *1.1

Re: Changing number of Kudu worker threads

2019-02-12 Thread Hao Hao
Hi Boris, Sorry for the delay, --rpc_num_service_threads sets the number of threads in RPC service thread pool (the default is 20 for tablet server, 10 for master). It should help with processing concurrent incoming RPC requests, but increasing it more than the number of available CPU cores of

Re: Inconsistent read performance with Spark

2019-02-13 Thread Hao Hao
ting data in random >>> primary key order? >> >> >> The table has hash partitioning on a ID column that can have 15 different >> values and range partition on datetime which is split monthly. Both ID and >> datetime are my primary keys. The data we ingest is in increasing o

Re: Inconsistent read performance with Spark

2019-02-14 Thread Hao Hao
hu, Feb 14, 2019 at 4:31 AM Hao Hao wrote: > >> Hi Faraz, >> >> What is the order of your primary key? Is it (datetime, ID) or (ID, >> datetime)? >> >> On the contrary, I suspect your scan performance got better for the same >> query because compaction hap

Re: [ANNOUNCE] Welcoming Yingchun Lai as a Kudu committer and PMC member

2019-06-06 Thread Hao Hao
Congratulations Yingchun, well deserved! Thank you for the hard work! On Thu, Jun 6, 2019 at 4:31 PM Andrew Wong wrote: > Well done Yingchun and congratulations! Keep up the good work! :) > > Andrew > > On Wed, Jun 5, 2019 at 11:25 AM Todd Lipcon wrote: > > > Hi Kudu community, > > > > I'm

Re: [ANNOUNCE] Welcoming Lifu He, Yao Xu, and Yao Zhang as Kudu committers and PMC members

2019-08-26 Thread Hao Hao
Congratulations! On Mon, Aug 26, 2019 at 10:33 AM Andrew Wong wrote: > Congratulations everyone! Keep up the great work! > > On Sun, Aug 25, 2019 at 9:40 PM Adar Lieber-Dembo > wrote: > > > Hi Kudu community, > > > > I'm happy to announce that the Kudu PMC has voted to add Lifu He, Yao > > Xu,

Re: [ANNOUNCE] Welcoming Bankim Bhavsar as Kudu committer and PMC member

2020-04-19 Thread Hao Hao
Congrats Bankim! Well deserved! Best, Hao On Sat, Apr 18, 2020 at 5:45 PM Andrew Wong wrote: > Congratulations Bankim! Keep up the great work > > On Sat, Apr 18, 2020 at 3:28 PM Adar Dembo wrote: > >> Hi Kudu community, >> >> I'm happy to announce that the Kudu PMC has voted to add Bankim

[ANNOUNCE] Apache Kudu 1.12.0 Released

2020-05-18 Thread Hao Hao
The Apache Kudu team is happy to announce the release of Kudu 1.12.0! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It is designed within the context of the Apache Hadoop ecosystem and