Congrats, Márton!
On Tue, Nov 14, 2023 at 11:15 AM Alexey Serbin wrote:
> Congratulations, Márton!
>
> On Tue, Nov 14, 2023 at 9:37 AM Andrew Wong wrote:
>
>> Hi Kudu community,
>>
>> I'm happy to announce that the Kudu PMC has voted to add Márton Greber as
>> a new committer and PMC member.
they may be
> prioritizing flushing of inserted rows at the expense of updates,
> causing the tablets to retain a great number of WAL segments
> (containing older updates) for durability's sake.
>
>
> Just an FYI in case it helps confirm or rule it out, this refers to
> KUDU
> - the number of open files in the Kudu process in the tablet servers has
> increased to now more than 150,000 (as counted using 'lsof'); we raised the
> limit of maximum number of open files twice already to avoid a crash, but we
> (and our vendor) are concerned that something might not be
I don't believe that's true; there's support for Kudu in other SQL engines:
- Presto: https://prestodb.io/docs/current/connector/kudu.html
- Apache Hive: https://cwiki.apache.org/confluence/display/Hive/Kudu+Integration
- Apache Drill:
What you're seeing sort of makes sense given that partition assignment
uses a "power of two" selection process: two servers are chosen at random,
and the one with the fewer partitions is selected as the recipient of
the new partition. Given enough partitions, this algorithm should
result in an even distribution of partitions across servers.
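The selection step described above can be sketched as a small simulation. The server names and counts below are made up for illustration; this is not Kudu's actual placement code, just the power-of-two-choices idea it uses:

```python
import random

def assign_partition(partitions_per_server):
    """Power-of-two choices: pick two servers at random and place the
    new partition on whichever currently holds fewer partitions."""
    a, b = random.sample(list(partitions_per_server), 2)
    winner = a if partitions_per_server[a] <= partitions_per_server[b] else b
    partitions_per_server[winner] += 1
    return winner

# Simulate placing 10,000 partitions across 8 hypothetical tservers.
random.seed(0)
counts = {f"tserver-{i}": 0 for i in range(8)}
for _ in range(10_000):
    assign_partition(counts)

# Given enough partitions, the spread stays tight around the mean (1250).
print(min(counts.values()), max(counts.values()))
```

Compared to purely random placement, choosing the less-loaded of two random candidates keeps the maximum load very close to the average.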
This doesn't seem to be a Kudu-related problem. The stack trace shows
no Kudu-related frames, and suggests an issue within YARN.
On Tue, Mar 24, 2020 at 8:18 PM chu zhi xing <1711731...@qq.com> wrote:
>
> Hi,
> Sorry, my last mail's format is terrible, so I edited it and sent it
> again.
>
ans and
> freezing the entire Kudu cluster. We had a good discussion on slack and
> Todd Lipcon suggested a good workaround using flush_threshold_secs till we
> move to 1.9 and it worked fine. Nowhere in the documentation, it was
> suggested to set this flag and actually it was one of these
This has been an excellent discussion to follow, with very useful feedback.
Thank you for that.
Boris, if I can try to summarize your position, it's that manual
partitioning doesn't scale when dealing with hundreds of (small) tables and
when you don't control the PK of each table. The Kudu schema
1. That's a great question. Maybe ask in u...@impala.apache.org?
2. Some Kudu users have reported success with Spark SQL, so if you're
comfortable with Spark, you could give that a shot. I wouldn't expect
it to out-perform Impala though.
On Tue, Dec 10, 2019 at 7:41 AM Yariv Moshe wrote:
>
>
I commented on KUDU-2483, but just to summarize: the remaining Kudu-side
work is small, and may have already been implemented in a private branch by
Lifu (cc'ed).
I'm not familiar enough with the Impala codebase to know how large of an
undertaking it would be to take advantage of Kudu bloom
Hi Kudu community,
I'm happy to announce that the Kudu PMC has voted to add Lifu He, Yao
Xu, and Yao Zhang as new committers and PMC members.
Lifu has worked on a variety of patches, from dead container deletion
in the block manager, to metric aggregation in the master, to several
performance
(+user, -dev, as this is more appropriate for the users list)
The Kudu master currently keeps a record of all tables and partitions,
including those that have been deleted. With a high enough rate of
table deletion it's theoretically possible for that to consume a lot
of disk space or memory. In
Unfortunately there's no way to do that currently: if you want to
delete a row, you must provide its complete primary key.
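Since predicate-based deletes aren't supported, the usual client-side pattern is: scan with the predicate, collect each matching row's full primary key, then issue one delete per key. A minimal sketch of that pattern, using a plain dict as a stand-in for a Kudu table (the real kudu-client scan and Delete calls are omitted):

```python
# In-memory stand-in for a table keyed by its (composite) primary key.
table = {
    (1, "a"): {"id": 1, "name": "a", "score": 10},
    (2, "b"): {"id": 2, "name": "b", "score": 99},
    (3, "c"): {"id": 3, "name": "c", "score": 99},
}

def delete_where(table, predicate):
    """Scan for rows matching the predicate, then delete each one by its
    complete primary key -- mirroring what a Kudu client must do."""
    doomed = [pk for pk, row in table.items() if predicate(row)]
    for pk in doomed:
        del table[pk]  # with kudu-client this would be one Delete op per key
    return len(doomed)

deleted = delete_where(table, lambda r: r["score"] == 99)
print(deleted)  # 2
```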
On Tue, Jul 16, 2019 at 2:01 PM John Mora wrote:
>
> Hi.
>
> I am trying to delete multiple rows at the same time through a condition
> using kudu-client.
>
> Let's say:
>
Thanks for the detailed summary and analysis. I want to make sure I
understand the overall cluster topology. You have two physical hosts,
each running one VM, with each VM running either 4 or 8 tservers. Is
that correct?
Here are some other questions:
- Have you verified that your _client_
Thanks for the report. I filed KUDU-2845 to track the issue.
On Tue, Jun 11, 2019 at 9:44 AM Todd Lipcon wrote:
>
> I guess the issue is that we use rapidjson's 'String' support to write out
> C++ strings, which are binary data, not valid UTF8. That's somewhat incorrect
> of us, and we should
Congrats, Yingchun! Thank you for all of your hard work on Kudu.
On Wed, Jun 5, 2019 at 11:25 AM Todd Lipcon wrote:
>
> Hi Kudu community,
>
> I'm happy to announce that the Kudu PMC has voted to add Yingchun Lai as a
> new committer and PMC member.
>
> Yingchun has been contributing to Kudu for
To add to Lifu's advice, assuming the transaction in this error is
representative of the average Kudu transaction, the tablet in question
has thousands of in-flight transactions. You should look into whether
one of its replicas is lagging and can't keep up with the incoming
writes.
On Tue, Jun 4,
500 MB RAM per VM is very low. Is this a typo? If not, it's the first
thing you should tackle.
As for your questions:
- You can look into increasing --maintenance_manager_num_threads. We
typically recommend a 1:3 ratio of threads to number of disks, but
that's when using spinning disks; with fast
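As an illustration of the 1:3 ratio, a tserver with 12 spinning data disks might carry something like the following in its gflagfile (the disk count and value here are hypothetical; tune to your hardware):

```
# Hypothetical tserver gflagfile entry: ~1 maintenance thread per 3 data disks.
--maintenance_manager_num_threads=4
```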
Probably a scan with no predicates and a minimal projection. Then you
can iterate over the results and increment a count of rows.
Or, if you're using Impala, "SELECT COUNT(*) FROM FOO".
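In client code the counting loop is trivial once the scanner is set up with no predicates and a minimal (ideally empty) projection. A self-contained sketch, with a list of batches standing in for the scanner's result batches:

```python
def count_rows(batches):
    """Count rows returned by a predicate-free scan. Only the row count
    matters, so each row can be projected down to nothing."""
    return sum(len(batch) for batch in batches)

# Stand-in for scanner batches; each "row" carries no columns.
batches = [[()] * 1000, [()] * 1000, [()] * 37]
print(count_rows(batches))  # 2037
```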
On Fri, Mar 22, 2019 at 3:23 AM Дмитрий Павлов wrote:
>
> Hi guys
>
> What is the quickest way to get total
ample errors: Not found: non-covered range' on the tasks, but of
> course I still end up with a bunch of failed tasks, and the partition is
> only added once all my tasks have failed.
>
> Do you perhaps have some guidance in this regard?
>
> On Wed, Mar 6, 2019 at 7:58 AM Adar Lieber-
Here are some other options:
1. Use the new KuduPartitioner class, available in master but not yet
in any releases. Given a PartialRow (i.e. a row to be inserted), you
can find its "partition index" and, more importantly for your use
case, receive an exception if no partition exists for the row.
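The behavior described in option 1 can be sketched with a toy range partitioner. The class and ranges below are invented for illustration; the real KuduPartitioner operates on PartialRows and knows the table's actual partition schema:

```python
class NonCoveredRangeError(Exception):
    """Raised when no range partition covers the given key."""

class ToyRangePartitioner:
    """Maps a key to the index of the range partition covering it,
    raising if none exists -- mimicking the KuduPartitioner behavior."""
    def __init__(self, bounds):
        # bounds: sorted list of (inclusive_lower, exclusive_upper) pairs.
        self.bounds = bounds

    def partition_index(self, key):
        for i, (lo, hi) in enumerate(self.bounds):
            if lo <= key < hi:
                return i
        raise NonCoveredRangeError(f"no partition covers key {key!r}")

p = ToyRangePartitioner([(0, 100), (100, 200)])
print(p.partition_index(150))  # 1
```

A caller can catch the exception to detect rows that fall outside the covered ranges before attempting an insert.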
The error indicates that the set of outstanding transactions on this
tablet is hovering around 64MB. The error message is inherently racy
in that the values included aren't necessarily the values that
triggered the failure, so don't worry about that aspect of it.
A couple things to think about:
Unfortunately that isn't possible with Kudu today. The workaround is,
as you said, to perform one scan per predicate and to union the
results.
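A sketch of that workaround: run one scan per predicate and union the results by primary key. Plain Python stands in for the Kudu scans; the rows and column names are made up:

```python
rows = [
    {"id": 1, "city": "SF", "score": 10},
    {"id": 2, "city": "NY", "score": 99},
    {"id": 3, "city": "SF", "score": 99},
]

def scan(predicate):
    # Stand-in for one Kudu scan carrying a single predicate.
    return [r for r in rows if predicate(r)]

def scan_or(predicates):
    """Emulate 'p1 OR p2 OR ...' by scanning once per predicate and
    deduplicating on the primary key."""
    by_pk = {}
    for pred in predicates:
        for r in scan(pred):
            by_pk[r["id"]] = r
    return sorted(by_pk.values(), key=lambda r: r["id"])

result = scan_or([lambda r: r["city"] == "SF", lambda r: r["score"] == 99])
print([r["id"] for r in result])  # [1, 2, 3]
```

Note row 3 matches both predicates but appears once, which is what keying the union on the primary key buys you.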
KUDU-2494 tracks adding support for disjunctions (i.e. OR predicates);
if this is something you'd be interested in working on, your patches
would be
The 'make install' target only installs client headers/libraries; it
doesn't install Kudu binaries. We should really update the
documentation to clarify that (KUDU-1375 describes the misleading
docs, but nobody has stepped up to fix it yet).
KUDU-1344 tracks additional work for improving the
The repository you're alluding to is published by a Kudu vendor; if
you want to see the repository updated for Kudu 1.8.0, you should
contact that vendor.
As for upstream Kudu, we don't publish binary release artifacts and
I'm not aware of any plans to change that. You might be interested in
Secondary indexing is a feature request that comes up fairly often.
KUDU-2613 is the tracking JIRA, but there's no real content in there.
Better to look at KUDU-2038, for which there's a work in progress
patch for bitmap indexing (https://gerrit.cloudera.org/c/11722/) that
you can also follow.
It is possible for a tserver's memory consumption to exceed
--memory_limit_hard_bytes if the tserver is hosting an excessive
amount of tablets (or tablet data blocks). That's usually a sign that
the tserver is overloaded relative to the amount of resources
allocated to it by the operator, and
will be around 8-10 GB in size.
> Should I be worried since recommendation is to keep tablets about 1Gb in
> size?
>
> On Wed, Oct 17, 2018 at 8:06 PM Adar Lieber-Dembo
> wrote:
>
>> Hi Boris,
>>
>> > Also, when they say tablets - I assume this is before repl
Hi Boris,
> Also, when they say tablets - I assume this is before replication? so in
> reality, it is number of nodes x cpu cores / replication factor? If this is
> the case, it is not looking good...
No, I think this is post-replication. The underlying assumption is
that you want to maximize
How many tservers did you reformat? If more than one, it's important
to do the reformatting one at a time, so that degraded tablets can
rereplicate elsewhere. Based on the ksck results it looks like maybe
you reformatted two tservers at the same time? If not, perhaps tablet
Check out this older post by Todd Lipcon about the 300 column limit:
http://mail-archives.apache.org/mod_mbox/kudu-user/201706.mbox/%3CCADY20s7iT7%2BrVZNhagnNFUjk7-nNMxJK6%2BnHV%2B2SzpHXKFxvmw%40mail.gmail.com%3E
There are probably other folks who run with over 300 columns in their
schemas, but
table every day, with many updates.
>
>
> I deep dived into kudu flags configuration and found the following flags
> related to **BLOCK_SIZE**, what is the recommended value of these flags:
>
> --cfile_default_block_size=262144
>
> --deltafile_default_block_size=32768
>
> -default_co
> Even if the kudu server started, it also spent too much copying tablet, as
> the following tablet block copying log:
>
>
> Tablet 1ecbe230e14a4d9f9125dbc49c32860e of table
> 'impala::venus.ods_xk_pay_fee_order' is under-replicated: 1 replica(s) not
> RUNNING
>
The 8TB limit isn't a hard one, it's just a reflection of the scale
that Kudu developers commonly test. Beyond 8TB we can't vouch for
Kudu's stability and performance. For example, we know that as the
amount of on-disk data grows, node restart times get longer and longer
(see KUDU-2014 for some
On Tue, Apr 10, 2018 at 3:11 AM, Ksenya Leonova wrote:
>
> 1) Best practice in Kudu deployment:
> it is planned to use Kudu in conjunction with HDFS, so how do you usually
> solve the problem of sharing and flexible resource management between Kudu
> and HDFS?
At least
This sounds like an Impala bug to me.
http://issues.cloudera.org/browse/IMPALA-5217 seems relevant.
Since you're using Cloudera's distribution of Impala, I took a look at
Cloudera's release notes, and that bug was fixed in CDH 5.12.0,
5.11.2, and 5.10.2. So it makes sense that your Impala would
Franco,
Thanks for the detailed description of your problem.
I'm afraid there's no such mechanism in Kudu today. Mining the WALs seems
like a path fraught with land mines. Kudu GCs WAL segments aggressively so
I'd be worried about a listening mechanism missing out on some row
operations. Plus
The upper limit of 4 TB is for data on-disk (post-encoding,
post-compression, and post-replication); it does not include in-memory
data from memrowsets or deltamemstores.
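To make the accounting concrete, here is a hedged back-of-the-envelope helper. The ratios are illustrative assumptions, not Kudu guarantees; actual encoding and compression savings depend heavily on the data:

```python
def on_disk_bytes(logical_bytes, encode_compress_ratio, replication_factor):
    """Estimate the post-encoding/compression, post-replication footprint.
    encode_compress_ratio is the combined shrink factor: 0.25 means the
    data encodes and compresses to a quarter of its logical size."""
    return logical_bytes * encode_compress_ratio * replication_factor

# Example: 10 TB of logical data, 4x combined shrink, 3 replicas.
tb = 1024 ** 4
print(on_disk_bytes(10 * tb, 0.25, 3) / tb)  # 7.5 (TB on disk, cluster-wide)
```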
The value of the limit is based on the kinds of workloads tested by
the Kudu development community. As a group we feel