Re: How can I look up the total disk space occupied by the kudu table

2017-05-31 Thread William Berkeley
Since a Kudu table is distributed across tablet servers, the total size of the table is the sum of the sizes of its tablets. /metrics has an entry for each tablet, which will list which table it came from and the on-disk size of the tablet, so you can roll up all these numbers to compute the total
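The roll-up described above could be sketched roughly as follows. This is a minimal sketch, not an official tool: it assumes the /metrics JSON from each tablet server has already been fetched and parsed, and the field names used here (`type`, `attributes.table_name`, and a metric named `on_disk_size`) are assumptions about the JSON layout that may differ between Kudu versions. `table_sizes` is a hypothetical helper name.

```python
from collections import defaultdict


def table_sizes(metrics_dumps):
    """Sum per-tablet on-disk sizes into per-table totals.

    metrics_dumps: one parsed /metrics JSON document (a list of entities)
    per tablet server. Field names are assumptions, see the note above.
    """
    totals = defaultdict(int)
    for entities in metrics_dumps:
        for entity in entities:
            # Skip non-tablet entities such as the server-level entry.
            if entity.get("type") != "tablet":
                continue
            table = entity.get("attributes", {}).get("table_name")
            for metric in entity.get("metrics", []):
                if metric.get("name") == "on_disk_size":
                    totals[table] += metric.get("value", 0)
    return dict(totals)


# Made-up sample data standing in for two tablet servers' /metrics output.
sample_dumps = [
    [
        {"type": "server", "id": "kudu.tabletserver", "metrics": []},
        {"type": "tablet", "id": "t1",
         "attributes": {"table_name": "users"},
         "metrics": [{"name": "on_disk_size", "value": 100}]},
        {"type": "tablet", "id": "t2",
         "attributes": {"table_name": "events"},
         "metrics": [{"name": "on_disk_size", "value": 50}]},
    ],
    [
        {"type": "tablet", "id": "t3",
         "attributes": {"table_name": "users"},
         "metrics": [{"name": "on_disk_size", "value": 500}]},
    ],
]

print(table_sizes(sample_dumps))
```

Because every tablet server reports the replicas it hosts, a sum over all servers counts each replica separately, so the result reflects replicated size rather than the size of a single copy.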

Re: Why RowSet size is much smaller than flush_threshold_mb

2018-06-15 Thread William Berkeley
The op seen in the logs is a rowset compaction, which takes existing diskrowsets and rewrites them. It's not a flush, which writes data in memory to disk, so I don't think the flush_threshold_mb is relevant. Rowset compaction is done to reduce the amount of overlap of rowsets in primary key space,

Re: How to migrate kudu tablets

2018-01-02 Thread William Berkeley
move_replica isn't available in CDH 5.12 / Kudu 1.4. It's first available in CDH 5.13 / Kudu 1.5. There are a few solutions: 1. Running move_replica against a 5.12 cluster using a 5.13 or later kudu tool should work. 2. You can move replicas manually using add_replica to add the new replica,

Re: Segmentation Fault when running kudu ksck

2018-08-20 Thread William Berkeley
That looks like KUDU-2113, which was fixed in 1.6.0. It happens if the tablet servers report peers in their config that are not known to the master. Probably, you have removed servers from the cluster and some of the tablets are in a bad state as a result. These sorts of problems were

Re: Kudu hashes and Java hashes

2018-08-28 Thread William Berkeley
> 1. We have multiple Kudu clients (Reducers). Would it be better if each one has a single session to a single tablet writing large number of records, or multiple sessions writing to different tablets (total number of records is the same)? The advantage I see in writing to a single tablet from

Re: Kudu's data pagination

2018-09-04 Thread William Berkeley
Hi Irtiza. What do you mean by paginate? I'm guessing you mean doing something like taking the results of a query like SELECT name, age FROM users SORT BY age DESC and displaying the results on some UI 10 at a time, say. If that's the case, the answer is no. It requires additional application
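As a rough illustration of the application-side logic involved, here is a minimal sketch that pages through rows after they have been fetched. It assumes the rows are already in memory as dictionaries with an `age` field; `paginate` is a hypothetical helper, not part of any Kudu client API.

```python
def paginate(rows, page, page_size=10):
    """Return one page of rows ordered by age, descending.

    page is zero-based. Sorting happens in the application because the
    rows are assumed to arrive in no particular order.
    """
    ordered = sorted(rows, key=lambda row: row["age"], reverse=True)
    start = page * page_size
    return ordered[start:start + page_size]


# Made-up data: 25 users with ages 20 through 44.
users = [{"name": f"user{i}", "age": 20 + i} for i in range(25)]

first_page = paginate(users, page=0)
last_page = paginate(users, page=2)
print(len(first_page), len(last_page))
```

Re-sorting the whole result for every page is wasteful for large tables; a real application would more likely cache the ordered result or remember the last key shown.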

Re: Unable to Initialize catalog manager

2018-07-05 Thread William Berkeley
You need to follow the directions at http://kudu.apache.org/docs/administration.html#migrate_to_multi_master to migrate from 1 to 3 masters. It's not sufficient just to start up the new masters and change the master_addresses flag. -Will On Thu, Jul 5, 2018 at 7:10 PM Sangeeta Gulia wrote: >

Re: Any plans for supporting schemas (namespaces of tables)?

2018-04-24 Thread William Berkeley
Hi Martin. I don't see any conflicts between that and any current or near-term work I know of happening in Kudu. There are a couple of related JIRAs for database support: KUDU-2063 and KUDU-2362.

Re: Problems connecting from Spark

2018-03-06 Thread William Berkeley
In each case the problem is that some part of your application can't find the leader master of the Kudu cluster: org.apache.kudu.client.NoLeaderFoundException: Master config (172.17.0.43:7077) has no leader. org.apache.kudu.client.NoLeaderFoundException: Master config

Re: kudu use impala query question!

2018-10-18 Thread William Berkeley
I think those messages are harmless and indicative of an underlying issue of leader elections, perhaps caused by load. The full ksck output would be helpful to understand more. This shouldn't cause writes to be lost. More likely the data was not written at all. Are you checking for the success of

Re: Re[2]: [KUDU] Rebalancing tool

2018-12-04 Thread William Berkeley
question. > > is this (i mean leaders skew) something i should be concerned about? > In terms of load balancing for example in case if i use kudu 1.8.0 with > spark > > Regards Dmitry Pavlov > > > Tuesday, 4 December 2018, 0:10 +03:00 from William Berkeley < > wdberk

Re: [KUDU] Rebalancing tool

2018-12-03 Thread William Berkeley
Yes, it's expected. The rebalancing tool does not balance leadership. In fact, it tries to avoid relocating leader replicas because the tool wants to minimize the disturbance to the cluster. It will relocate leader replicas if it has to, but it won't make any attempt to balance them. You can use

Re: Slow queries after massive deletions. Is it due to compaction?

2018-11-25 Thread William Berkeley
Hi Sergejs. You are correct. Kudu tracks deletes as base data plus a "redo" that contains delete operations. The base data and the redos are stored on disk separately and are logically reconciled on scan. Brock is right that this situation is improved greatly for certain deletion patterns with
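The effect of reconciling base data with redo deletes at scan time can be pictured with a toy model. This is only an illustration of the idea described above, not Kudu's actual storage code; `scan` and its data layout are hypothetical.

```python
def scan(base_rows, redo_deletes):
    """Reconcile base rows with recorded delete operations.

    Returns (live_rows, rows_examined). Every base row is still read,
    even if most of them turn out to be deleted.
    """
    deleted_keys = set(redo_deletes)
    live_rows = []
    rows_examined = 0
    for row in base_rows:
        rows_examined += 1
        if row["key"] not in deleted_keys:
            live_rows.append(row)
    return live_rows, rows_examined


# Made-up data: five base rows, four of which were later deleted.
base = [{"key": k, "value": k * 10} for k in range(5)]
deletes = [0, 1, 2, 3]

live, examined = scan(base, deletes)
print(len(live), examined)
```

In this model the scan does work proportional to the base data rather than to the surviving rows, which is one way to picture why scans can stay slow after massive deletions until the deleted rows are physically removed.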

Re: Re: kuduissue!

2019-03-12 Thread William Berkeley
th impala, I found that the tablet of the newly > created kudu table was in the initialization state. > > ------ > From: William Berkeley > Date: 2019-03-12 05:46:07 > To: > Cc: Attila Bukor > Subject: Re: Re: kuduissue!