Re: [ANNOUNCE] Welcoming Márton Greber as Kudu committer and PMC member

2023-11-14 Thread Adar Lieber-Dembo
Congrats, Márton! On Tue, Nov 14, 2023 at 11:15 AM Alexey Serbin wrote: > Congratulations, Márton! > > On Tue, Nov 14, 2023 at 9:37 AM Andrew Wong wrote: > >> Hi Kudu community, >> >> I'm happy to announce that the Kudu PMC has voted to add Márton Greber as >> a >> new committer and PMC

Re: Tablet Server with almost 1TB of WALs (and large number of open files)

2020-03-31 Thread Adar Lieber-Dembo
they may be > prioritizing flushing of inserted rows at the expense of updates, > causing the tablets to retain a great number of WAL segments > (containing older updates) for durability's sake. > > > Just an FYI in case it helps confirm or rule it out, this refers to > KUDU

Re: Tablet Server with almost 1TB of WALs (and large number of open files)

2020-03-30 Thread Adar Lieber-Dembo
> - the number of open files in the Kudu process in the tablet servers has > increased to now more than 150,000 (as counted using 'lsof'); we raised the > limit on the maximum number of open files twice already to avoid a crash, but we > (and our vendor) are concerned that something might not be

Re: Kudu - Dremio

2020-03-29 Thread Adar Lieber-Dembo
I don't believe that's true; there's support for Kudu in other SQL engines: - Presto: https://prestodb.io/docs/current/connector/kudu.html - Apache Hive: https://cwiki.apache.org/confluence/display/Hive/Kudu+Integration - Apache Drill:

Re: hash and range partition uneven distribution for one tablet server

2020-03-24 Thread Adar Lieber-Dembo
What you're seeing sort of makes sense given that partition assignment uses a "power of 2" selection process: two servers are chosen at random, and the one with fewer partitions is selected as the recipient of the new partition. Given enough partitions, this algorithm should result in an even
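
The selection rule described above (often called "power of two choices") can be simulated in a few lines. This is a minimal sketch, not Kudu's placement code: it assumes a hypothetical in-memory list of per-server partition counts, draws the two candidates without replacement, and breaks ties toward the first candidate.

```python
import random


def place_partitions(num_servers, num_partitions, seed=0):
    """Simulate "power of two choices" placement (hypothetical helper).

    For each new partition, pick two distinct servers at random and assign the
    partition to whichever currently hosts fewer partitions (ties go to the
    first candidate). Returns the final per-server partition counts.
    """
    if num_servers < 2:
        raise ValueError("need at least two servers to choose between")
    rng = random.Random(seed)  # seeded so runs are reproducible
    counts = [0] * num_servers
    for _ in range(num_partitions):
        a, b = rng.sample(range(num_servers), 2)
        chosen = a if counts[a] <= counts[b] else b
        counts[chosen] += 1
    return counts


if __name__ == "__main__":
    # Compare the max - min spread for a small and a large number of partitions.
    for n in (12, 10_000):
        counts = place_partitions(num_servers=5, num_partitions=n)
        print(n, counts, "spread:", max(counts) - min(counts))
```

Because the less loaded of the two candidates always wins, a server that falls behind tends to catch up, which is why the distribution should even out as the partition count grows.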

Re: The demo of mapreduce's problem_adjust last mail's format

2020-03-24 Thread Adar Lieber-Dembo
This doesn't seem to be a Kudu-related problem. The stack trace shows no Kudu-related frames, and suggests an issue within YARN. On Tue, Mar 24, 2020 at 8:18 PM chu zhi xing <1711731...@qq.com> wrote: > > Hi, > Sorry, my last mail's format is terrible, so I edited it and sent it > again. >

Re: Partitioning Rules of Thumb

2020-03-14 Thread Adar Lieber-Dembo
ans and > freezing the entire Kudu cluster. We had a good discussion on Slack and > Todd Lipcon suggested a good workaround using flush_threshold_secs till we > move to 1.9 and it worked fine. Nowhere in the documentation was it suggested to set this flag, and actually it was one of these

Re: Partitioning Rules of Thumb

2020-03-12 Thread Adar Lieber-Dembo
This has been an excellent discussion to follow, with very useful feedback. Thank you for that. Boris, if I can try to summarize your position, it's that manual partitioning doesn't scale when dealing with hundreds of (small) tables and when you don't control the PK of each table. The Kudu schema

Re: Impala or other query engine

2019-12-10 Thread Adar Lieber-Dembo
1. That's a great question. Maybe ask in u...@impala.apache.org? 2. Some Kudu users have reported success with Spark SQL, so if you're comfortable with Spark, you could give that a shot. I wouldn't expect it to out-perform Impala though. On Tue, Dec 10, 2019 at 7:41 AM Yariv Moshe wrote: > >

Re: Please please add bloom filter support

2019-10-20 Thread Adar Lieber-Dembo
I commented on KUDU-2483, but just to summarize: the remaining Kudu-side work is small, and may have already been implemented in a private branch by Lifu (cc'ed). I'm not familiar enough with the Impala codebase to know how large of an undertaking it would be to take advantage of Kudu bloom

[ANNOUNCE] Welcoming Lifu He, Yao Xu, and Yao Zhang as Kudu committers and PMC members

2019-08-25 Thread Adar Lieber-Dembo
Hi Kudu community, I'm happy to announce that the Kudu PMC has voted to add Lifu He, Yao Xu, and Yao Zhang as new committers and PMC members. Lifu has worked on a variety of patches, from dead container deletion in the block manager, to metric aggregation in the master, to several performance

Re: Dimension table delete and recreate

2019-08-14 Thread Adar Lieber-Dembo
(+user, -dev, as this is more appropriate for the users list) The Kudu master currently keeps a record of all tables and partitions, including those that have been deleted. With a high enough rate of table deletion it's theoretically possible for that to consume a lot of disk space or memory. In

Re: Delete or Update by Query

2019-07-16 Thread Adar Lieber-Dembo
Unfortunately there's no way to do that currently: if you want to delete a row, you must provide its complete primary key. On Tue, Jul 16, 2019 at 2:01 PM John Mora wrote: > > Hi. > > I am trying to delete multiple rows at the same time through a condition > using kudu-client. > > Let's say: >
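
Because each delete needs the complete primary key, a client that wants "delete where <condition>" semantics has to find the matching keys first and then delete them one at a time. The sketch below models that pattern against a hypothetical in-memory table (a dict keyed by primary-key tuple); `delete_where` is a made-up helper, not part of the Kudu client API. In a real cluster the scan and the deletes are separate operations, so rows written in between could be missed.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
PrimaryKey = Tuple[object, ...]


def delete_where(
    table: Dict[PrimaryKey, Row],
    key_columns: List[str],
    condition: Callable[[Row], bool],
) -> int:
    """Hypothetical helper: emulate "DELETE ... WHERE condition" on a store
    that deletes only by full primary key.

    Step 1 (the "scan"): collect the complete primary key of every matching row.
    Step 2: issue one delete per collected key.
    Returns the number of rows deleted.
    """
    # Collect keys first so the table is not mutated while it is being iterated.
    keys_to_delete = [
        tuple(row[col] for col in key_columns)
        for row in table.values()
        if condition(row)
    ]
    for key in keys_to_delete:
        del table[key]  # each delete names the full primary key
    return len(keys_to_delete)


if __name__ == "__main__":
    # Hypothetical table with a compound primary key (region, id).
    table = {
        ("eu", 1): {"region": "eu", "id": 1, "status": "stale"},
        ("eu", 2): {"region": "eu", "id": 2, "status": "live"},
        ("us", 1): {"region": "us", "id": 1, "status": "stale"},
    }
    deleted = delete_where(table, ["region", "id"], lambda r: r["status"] == "stale")
    print(deleted, sorted(table))
```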

Re: Underutilization of hardware resources with smaller number of Tservers

2019-07-12 Thread Adar Lieber-Dembo
Thanks for the detailed summary and analysis. I want to make sure I understand the overall cluster topology. You have two physical hosts, each running one VM, with each VM running either 4 or 8 tservers. Is that correct? Here are some other questions: - Have you verified that your _client_

Re: Kudu CLI tool JSON format

2019-06-11 Thread Adar Lieber-Dembo
Thanks for the report. I filed KUDU-2845 to track the issue. On Tue, Jun 11, 2019 at 9:44 AM Todd Lipcon wrote: > > I guess the issue is that we use rapidjson's 'String' support to write out > C++ strings, which are binary data, not valid UTF8. That's somewhat incorrect > of us, and we should
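
Todd's point is that a C++ string may hold arbitrary bytes, while a JSON string must be valid Unicode. As a hedged illustration only (this is not the fix tracked in KUDU-2845), a tool emitting JSON could check each binary value and fall back to an explicit encoding when the bytes are not valid UTF-8. The `{"base64": ...}` wrapper and both helper names are hypothetical conventions chosen for this sketch.

```python
import base64
import json


def json_safe_value(raw: bytes):
    """Hypothetical helper: return a JSON-serializable form of a binary value.

    Valid UTF-8 is emitted as a plain JSON string; anything else is wrapped
    as {"base64": "..."} so no bytes are lost or mangled.
    """
    try:
        return raw.decode("utf-8")  # strict decoding raises on invalid bytes
    except UnicodeDecodeError:
        return {"base64": base64.b64encode(raw).decode("ascii")}


def rows_to_json(rows):
    """Serialize rows whose values are raw bytes (hypothetical row format)."""
    return json.dumps(
        [{col: json_safe_value(val) for col, val in row.items()} for row in rows]
    )


if __name__ == "__main__":
    print(rows_to_json([{"name": b"kudu", "blob": b"\xff\xfe"}]))
```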

Re: [ANNOUNCE] Welcoming Yingchun Lai as a Kudu committer and PMC member

2019-06-06 Thread Adar Lieber-Dembo
Congrats, Yingchun! Thank you for all of your hard work on Kudu. On Wed, Jun 5, 2019 at 11:25 AM Todd Lipcon wrote: > > Hi Kudu community, > > I'm happy to announce that the Kudu PMC has voted to add Yingchun Lai as a > new committer and PMC member. > > Yingchun has been contributing to Kudu for

Re: Kudu memory pressure

2019-06-04 Thread Adar Lieber-Dembo
To add to Lifu's advice, assuming the transaction in this error is representative of the average Kudu transaction, the tablet in question has thousands of in-flight transactions. You should look into whether one of its replicas is lagging and can't keep up with the incoming writes. On Tue, Jun 4,

Re: tablet server not using all resources available

2019-04-21 Thread Adar Lieber-Dembo
500 MB RAM per VM is very low. Is this a typo? If not, it's the first thing you should tackle. As for your questions: - You can look into increasing --maintenance_manager_num_threads. We typically recommend a 1:3 ratio of threads to number of disks, but that's when using spinning disks; with fast
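
As a worked example of the 1:3 guideline for spinning disks (one maintenance thread per three data disks), the helper below turns a disk count into a candidate value for --maintenance_manager_num_threads. Rounding up and the floor of one thread are assumptions of this sketch, not part of the advice above, and the helper does not apply to fast disks, where the reply indicates a different ratio.

```python
import math


def suggested_maintenance_threads(num_spinning_disks: int) -> int:
    """Hypothetical helper: apply a 1:3 threads-to-disks ratio for spinning disks.

    Rounds up and returns at least one thread (both are assumptions).
    """
    if num_spinning_disks < 1:
        raise ValueError("need at least one data disk")
    return max(1, math.ceil(num_spinning_disks / 3))


if __name__ == "__main__":
    for disks in (1, 3, 4, 12):
        threads = suggested_maintenance_threads(disks)
        print(f"{disks} disks -> --maintenance_manager_num_threads={threads}")
```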

Re: Kudu table api

2019-03-22 Thread Adar Lieber-Dembo
Probably a scan with no predicates and a minimal projection. Then you can iterate over the results and increment a count of rows. Or, if you're using Impala, "SELECT COUNT(*) FROM FOO". On Fri, Mar 22, 2019 at 3:23 AM Дмитрий Павлов wrote: > > Hi guys > > What is the quickest way to get total

Re: Check existing range partitions using the Java API

2019-03-06 Thread Adar Lieber-Dembo
ample errors: Not found: non-covered range' on the tasks, but of > course I still end up with a bunch of failed tasks, and the partition is > only added once all my tasks have failed. > > Do you perhaps have some guidance in this regard? > > On Wed, Mar 6, 2019 at 7:58 AM Adar Lieber-

Re: Check existing range partitions using the Java API

2019-03-05 Thread Adar Lieber-Dembo
Here are some other options: 1. Use the new KuduPartitioner class, available in master but not yet in any releases. Given a PartialRow (i.e. a row to be inserted), you can find its "partition index" and, more importantly for your use case, receive an exception if no partition exists for the row.
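
To illustrate what "no partition exists for the row" means for range partitioning, here is a toy model of the lookup: range partitions as sorted, non-overlapping [lower, upper) intervals over an integer key, searched with `bisect`. It is a hypothetical stand-in, not KuduPartitioner's implementation or API; `RangePartitionIndex` and `NonCoveredRangeError` are made-up names, the latter chosen to echo the "Not found: non-covered range" error quoted in the thread.

```python
import bisect
from typing import List, Tuple


class NonCoveredRangeError(LookupError):
    """Raised when no range partition covers the key (hypothetical name)."""


class RangePartitionIndex:
    """Hypothetical model: map a range-partition key to a partition index."""

    def __init__(self, bounds: List[Tuple[int, int]]):
        # Each bound is [lower, upper); sort by lower bound and validate.
        self.bounds = sorted(bounds)
        for lower, upper in self.bounds:
            if lower >= upper:
                raise ValueError(f"empty range [{lower}, {upper})")
        for (_, prev_upper), (lower, _) in zip(self.bounds, self.bounds[1:]):
            if lower < prev_upper:
                raise ValueError("range partitions must not overlap")
        self.lowers = [lower for lower, _ in self.bounds]

    def partition_index(self, key: int) -> int:
        # Rightmost partition whose lower bound is <= key.
        i = bisect.bisect_right(self.lowers, key) - 1
        if i < 0 or key >= self.bounds[i][1]:
            raise NonCoveredRangeError(f"no range partition covers key {key}")
        return i


if __name__ == "__main__":
    index = RangePartitionIndex([(0, 100), (100, 200), (300, 400)])
    for key in (150, 250):
        try:
            print(key, "->", index.partition_index(key))
        except NonCoveredRangeError as e:
            # A caller could add the missing range partition here, then retry.
            print(key, "->", e)
```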

Re: Service unavailable due to exceeding transaction memory consumption

2019-02-22 Thread Adar Lieber-Dembo
The error indicates that the set of outstanding transactions on this tablet is hovering around 64MB. The error message is inherently racy in that the values included aren't necessarily the values that triggered the failure, so don't worry about that aspect of it. A couple of things to think about:

Re: KuduScanner with multiple sets of compound primary keys

2018-12-11 Thread Adar Lieber-Dembo
Unfortunately that isn't possible with Kudu today. The workaround is, as you said, to perform one scan per predicate and to union the results. KUDU-2494 tracks adding support for disjunctions (i.e. OR predicates); if this is something you'd be interested in working on, your patches would be
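
The workaround described above (one scan per predicate, then a union) can be sketched as follows. The `scan` function here is a hypothetical filter over an in-memory list of rows rather than a KuduScanner, and results are de-duplicated by primary key because a row matching more than one predicate would otherwise be returned twice.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Row = Dict[str, object]


def scan(rows: Iterable[Row], predicate: Callable[[Row], bool]) -> List[Row]:
    """Hypothetical stand-in for a single scan with one conjunctive predicate."""
    return [row for row in rows if predicate(row)]


def scan_any(
    rows: List[Row],
    predicates: List[Callable[[Row], bool]],
    key_columns: List[str],
) -> List[Row]:
    """Emulate 'p1 OR p2 OR ...' by running one scan per predicate and
    unioning the results, keeping each primary key only once."""
    seen: Dict[Tuple[object, ...], Row] = {}
    for predicate in predicates:
        for row in scan(rows, predicate):
            key = tuple(row[col] for col in key_columns)
            seen.setdefault(key, row)
    return list(seen.values())


if __name__ == "__main__":
    # Hypothetical rows with a compound primary key (host, ts).
    rows = [
        {"host": "a", "ts": 1, "v": 10},
        {"host": "a", "ts": 2, "v": 11},
        {"host": "b", "ts": 2, "v": 12},
    ]
    wanted = [("a", 1), ("b", 2)]  # sets of compound key values to fetch
    preds = [lambda r, h=h, t=t: r["host"] == h and r["ts"] == t for h, t in wanted]
    print(scan_any(rows, preds, ["host", "ts"]))
```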

Re: trying to install kudu from source

2018-12-10 Thread Adar Lieber-Dembo
The 'make install' target only installs client headers/libraries; it does not install Kudu binaries. We should really update the documentation to clarify that (KUDU-1375 describes the misleading docs, but nobody has stepped up to fix it yet). KUDU-1344 tracks additional work for improving the

Re: any debian repository for Kudu 1.8.0 yet?

2018-12-04 Thread Adar Lieber-Dembo
The repository you're alluding to is published by a Kudu vendor; if you want to see the repository updated for Kudu 1.8.0, you should contact that vendor. As for upstream Kudu, we don't publish binary release artifacts and I'm not aware of any plans to change that. You might be interested in

Re: Index question

2018-11-01 Thread Adar Lieber-Dembo
Secondary indexing is a feature request that comes up fairly often. KUDU-2613 is the tracking JIRA, but there's no real content in there. Better to look at KUDU-2038, for which there's a work-in-progress patch for bitmap indexing (https://gerrit.cloudera.org/c/11722/) that you can also follow.

Re: kudu-tserver can not be started

2018-10-30 Thread Adar Lieber-Dembo
It is possible for a tserver's memory consumption to exceed --memory_limit_hard_bytes if the tserver is hosting an excessive number of tablets (or tablet data blocks). That's usually a sign that the tserver is overloaded relative to the amount of resources allocated to it by the operator, and

Re: clarification on Partitioning Guidelines and CPU cores

2018-10-17 Thread Adar Lieber-Dembo
ill be around 8-10 GB in size. > Should I be worried, since the recommendation is to keep tablets at about 1 GB in > size? > > On Wed, Oct 17, 2018 at 8:06 PM Adar Lieber-Dembo > wrote: > >> Hi Boris, >> >> > Also, when they say tablets - I assume this is before repl

Re: clarification on Partitioning Guidelines and CPU cores

2018-10-17 Thread Adar Lieber-Dembo
Hi Boris, > Also, when they say tablets - I assume this is before replication? so in > reality, it is number of nodes x cpu cores / replication factor? If this is > the case, it is not looking good... No, I think this is post-replication. The underlying assumption is that you want to maximize

Re: Kudu tablet server generated new uuid and did not disappeared from active configs

2018-10-17 Thread Adar Lieber-Dembo
How many tservers did you reformat? If more than one, it's important to do the reformatting one at a time, so that degraded tablets can re-replicate elsewhere. Based on the ksck results it looks like maybe you reformatted two tservers at the same time? If not, perhaps tablet

Re: Kudu 300 columns limitation

2018-08-30 Thread Adar Lieber-Dembo
Check out this older post by Todd Lipcon about the 300 column limit: http://mail-archives.apache.org/mod_mbox/kudu-user/201706.mbox/%3CCADY20s7iT7%2BrVZNhagnNFUjk7-nNMxJK6%2BnHV%2B2SzpHXKFxvmw%40mail.gmail.com%3E There are probably other folks who run with over 300 columns in their schemas, but

Re: How to decrease kudu server restart time

2018-08-15 Thread Adar Lieber-Dembo
table every day, with many updates. > > > I did a deep dive into the Kudu flag configuration and found the following flags > related to **BLOCK_SIZE**. What are the recommended values for these flags: > > --cfile_default_block_size=262144 > > --deltafile_default_block_size=32768 > > -default_co

Re: How to decrease kudu server restart time

2018-08-13 Thread Adar Lieber-Dembo
> Even when the Kudu server started, it also spent too much time copying tablets, as shown in > the following tablet block copying log: > > > Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table > 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not > RUNNING >

Re: Recommended maximum amount of stored data per tablet server

2018-08-02 Thread Adar Lieber-Dembo
The 8TB limit isn't a hard one, it's just a reflection of the scale that Kudu developers commonly test. Beyond 8TB we can't vouch for Kudu's stability and performance. For example, we know that as the amount of on-disk data grows, node restart times get longer and longer (see KUDU-2014 for some

Re: Kudu deployment best practice

2018-04-10 Thread Adar Lieber-Dembo
On Tue, Apr 10, 2018 at 3:11 AM, Ksenya Leonova wrote: > > 1) Best practice in Kudu deployment: > it is planned to use Kudu in conjunction with HDFS, so how do you usually > solve the problem of sharing and flexible resource management between Kudu > and HDFS? At least

Re: Kudu silently failing to set nullable column

2018-04-09 Thread Adar Lieber-Dembo
This sounds like an Impala bug to me. http://issues.cloudera.org/browse/IMPALA-5217 seems relevant. Since you're using Cloudera's distribution of Impala, I took a look at Cloudera's release notes, and that bug was fixed in CDH 5.12.0, 5.11.2, and 5.10.2. So it makes sense that your Impala would

Re: Change Data Capture (CDC) with Kudu

2017-09-22 Thread Adar Lieber-Dembo
Franco, Thanks for the detailed description of your problem. I'm afraid there's no such mechanism in Kudu today. Mining the WALs seems like a path fraught with land mines. Kudu GCs WAL segments aggressively so I'd be worried about a listening mechanism missing out on some row operations. Plus

Re: Question about per server data upper limit.

2017-08-31 Thread Adar Lieber-Dembo
The upper limit of 4 TB is for data on-disk (post-encoding, post-compression, and post-replication); it does not include in-memory data from memrowsets or deltamemstores. The value of the limit is based on the kinds of workloads tested by the Kudu development community. As a group we feel
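
Because the limit counts replicated bytes, a quick capacity estimate divides the cluster's total allowed on-disk bytes by the replication factor. The sketch below does that arithmetic under stated assumptions: data is spread evenly across tablet servers, the server count and replication factor are example inputs, and the helper name is hypothetical. The result is still post-encoding, post-compression size, so it says nothing about how much raw data fits.

```python
def unique_on_disk_capacity_tb(num_tservers: int, replication_factor: int = 3,
                               per_server_limit_tb: float = 4.0) -> float:
    """Hypothetical helper: upper bound on distinct (pre-replication) on-disk
    data, in TB, assuming data is spread evenly across tablet servers.

    Every tablet is stored replication_factor times, and the per-server limit
    applies to the replicated, encoded, compressed bytes.
    """
    if num_tservers < replication_factor:
        raise ValueError("need at least as many tablet servers as replicas")
    return num_tservers * per_server_limit_tb / replication_factor


if __name__ == "__main__":
    # e.g. 12 tablet servers at 4 TB each with 3x replication
    print(unique_on_disk_capacity_tb(12))
```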