Re: [ANNOUNCE] Two new Kudu committer/PMC members

2016-09-12 Thread Dan Burkert
Congrats! On Mon, Sep 12, 2016 at 4:27 PM, Jordan Birdsell wrote: > Congrats > > On Mon, Sep 12, 2016 at 7:09 PM Brock Noland wrote: > >> Congratulations!! >> >> On Mon, Sep 12, 2016 at 6:06 PM, David Alves >> wrote: >> >>> Congrats Alexey and Will!! >>> >>> On Mon, Sep 12, 2016 at 3:55 PM, To

Re: Spark on Kudu

2016-09-20 Thread Dan Burkert
: > Now that Kudu 1.0.0 is officially out and ready for production use, where > do we find the spark connector jar for this release? > > Thanks, > Ben > > > On Jun 17, 2016, at 11:08 AM, Dan Burkert wrote: > > Hi Ben, > > To your first question about `CREATE TABLE

Re: Create encoded columns in kudu

2016-09-21 Thread Dan Burkert
On Wed, Sep 21, 2016 at 7:53 AM, Jean-Daniel Cryans wrote: > Hi Amit, > > There's this jira on the Impala side: > https://issues.cloudera.org/browse/IMPALA-3726 > > I don't know exactly when it'll be available, but I think it's being > looked at. >

Re: Spark on Kudu

2016-10-10 Thread Dan Burkert
onnector jar for this release? >>> >>> >> It's available in the official ASF maven repository: >> https://repository.apache.org/#nexus-search;quick~kudu-spark >> >> <dependency> >> <groupId>org.apache.kudu</groupId> >> <artifactId>kudu-spark_2.10</artifactId> >> <version>1.0.0</version> >> </dependency> >>

[ANNOUNCE] Apache Kudu 1.0.1 release

2016-10-11 Thread Dan Burkert
The Apache Kudu team is happy to announce the release of Kudu 1.0.1! Kudu is an open source storage engine for structured data which supports low-latency random access together with efficient analytical access patterns. It is designed within the context of the Apache Hadoop ecosystem and supports

Re: thirdparty llvm build failed on mac

2016-12-14 Thread Dan Burkert
Hi Zhen, The issue with Homebrew krb5 is that it's contained in the dupes tap. After you run `brew tap homebrew/dupes`, `brew install krb5` should succeed. I've submitted a change to the docs to call this out in the install instructions. The compile issue may be that you do not have Xcode installe

Re: thirdparty llvm build failed on mac

2016-12-15 Thread Dan Burkert
w tap homebrew/dupes', krb5 can be installed >> successfully. But what do mean about 'CLT is not sufficient'? What else >> should I Installed? And how to install them? >> >> Thanks, >> >> Zhen >> >> >> >> 2016-12-15 10:39 GM

Re: Monitoring Use Cases for Kudu

2017-01-31 Thread Dan Burkert
Hi Senthil, Kudu masters and tservers expose internal metrics in JSON format at the /metrics endpoint of the web UI. You could start by scraping this endpoint every 10 seconds or so, and recording that. I think that's more or less how Cloudera Manager exposes Kudu metrics. Our docs
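A minimal sketch of the scraping approach described in the message above, in Python. The host name in the usage note, port 8050 (assumed here to be the tablet server web UI port), and the JSON layout (a list of entities, each with a "metrics" list of objects carrying "name" and "value") are assumptions for illustration; check them against the actual /metrics output of your cluster.

```python
import json
import time
import urllib.request


def flatten_metrics(payload):
    """Flatten a /metrics document into {(entity_type, entity_id, name): value}.

    Assumes the payload is a list of entities, each with a "metrics" list of
    objects that have a "name" and, for counters and gauges, a "value".
    """
    flat = {}
    for entity in payload:
        for metric in entity.get("metrics", []):
            if "value" in metric:  # entries without a plain value are skipped
                key = (entity.get("type"), entity.get("id"), metric["name"])
                flat[key] = metric["value"]
    return flat


def scrape(url, timeout=5.0):
    """Fetch one /metrics snapshot and return it flattened."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return flatten_metrics(json.load(resp))


def poll(url, interval_s=10.0, rounds=3):
    """Scrape the endpoint every interval_s seconds and print a summary line."""
    for _ in range(rounds):
        snapshot = scrape(url)
        print(time.time(), len(snapshot), "metrics")
        time.sleep(interval_s)
```

Usage, with a hypothetical host: poll("http://tserver.example.com:8050/metrics"). In practice each snapshot would be written to a time-series store rather than printed.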

Re: Kudu kerberos flags

2017-02-06 Thread Dan Burkert
Hi Amit, Kerberos support is not yet ready to turn on; it's still being actively worked on. When it's ready for production use, we'll remove the 'experimental' designator, and you will see those flags move out of the unsupported section (we also reserve the right to change or remove them while the

Re: Adding examples to docs?

2017-02-12 Thread Dan Burkert
Hi Darren, Assuming you are asking about Impala syntax, you can find some examples here: https://kudu.apache.org/docs/kudu_impala_integration.html#advanced_partitioning - Dan On Sun, Feb 12, 2017 at 6:37 PM, Darren Hoo wrote: > specifically what is the SQL syntax for multi-level partitioning?

Re: How to get the health of Kudu

2017-02-16 Thread Dan Burkert
Hi Mike, I think your best bet is the 'ksck' tool; you can see the various options and health checks it exposes by running 'kudu cluster ksck --help'. - Dan On Thu, Feb 16, 2017 at 1:06 PM, Mike Zupan wrote: > Hi all, > > We need to upgrade nodes in the kudu cluster and we are planning on > br

Re: kudu table design question

2017-02-23 Thread Dan Burkert
Hi Tenny, First off, how many tablet servers are in your cluster? 16 partitions is appropriate for one or maybe two tablet servers, so if your cluster is bigger you could try bumping the number of partitions. Second, the schemas don't look identical, you have an additional 'id' column in the Kud

Re: mixing range and hash partitioning

2017-02-24 Thread Dan Burkert
Hi Paul, I think the issue you are running into is that if you don't add a range partition explicitly during table creation (by calling add_range_partition or inserting a split with add_range_partition_split), Kudu will default to creating 1 unbounded range partition. So your two options are to a

Re: kudu table design question

2017-02-24 Thread Dan Burkert
> On Thu, Feb 23, 2017 at 6:29 PM, Todd Lipcon wrote: > >> I'd add that moving the print_date_id to the beginning of the primary key >> in the Kudu fact table would allow each server to do a range scan instead >> of a full scan. >> >> -Todd >>

Re: mixing range and hash partitioning

2017-02-24 Thread Dan Burkert
tioning (by commenting out the call to > add_hash_partitions), adding a bounded partition succeeds, regardless of > whether I first drop the unbounded partition. This seems surprising; why > the difference? > > On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert > wrote: > >> Hi P

Re: mixing range and hash partitioning

2017-02-28 Thread Dan Burkert
ly have range partitioning (by commenting out the call to add_hash_partitions), adding a bounded partition succeeds, regardless of whether I first drop the unbounded partition. This seems surprising; why the difference? On Fri, Feb 24, 2017 at 4:20 PM, Dan Burkert wrote: Hi Paul, I think the issue you a

Re: mixing range and hash partitioning

2017-02-28 Thread Dan Burkert
> On Tue, Feb 28, 2017 at 10:05 AM, Dan Burkert > wrote: > >> Hey Paul, >> >> Thanks for checking that out and following up. I'm going to try and root >> cause this today so that we have plenty of time to get a fix in to 1.3 if >> it requires one. Th

Re: stripes, JBOD: Assignment and rebalance ?

2017-03-02 Thread Dan Burkert
Hi Alexandre, responses inline On Thu, Mar 2, 2017 at 9:18 AM, Alexandre Fouché wrote: > > > When storing data on multiple JBOD disks, will Kudu assign data for > tablets efficiently as far as tablet sizes or activity are concerned, or > will it simply try to assign roughly the same number of ta

Re: stripes, JBOD: Assignment and rebalance ?

2017-03-02 Thread Dan Burkert
y when some > gets bigger than other, in order not to fill a disk while the other disk > space could remain mostly free ? > > 2017-03-02 18:26 GMT+01:00 Dan Burkert : > >> Hi Alexandre, >> >> responses inline >> >> On Thu, Mar 2, 2017 at 9:18 AM, Alexandre Fou

Re: mixing range and hash partitioning

2017-03-06 Thread Dan Burkert
itions. Thanks again for the report! - Dan On Tue, Feb 28, 2017 at 1:03 PM, Dan Burkert wrote: > Yep: https://issues.apache.org/jira/browse/KUDU-1903 > > - Dan > > On Tue, Feb 28, 2017 at 12:51 PM, Todd Lipcon wrote: > >> Hey Dan, >> >> Mind filing a critical o

Re: I have a question about KUDU Disk.

2017-03-23 Thread Dan Burkert
Hi Jinsu, There is no limit quota functionality in Kudu, per se, but we do have a flag that configures Kudu to stop using a data directory after the disk has less than a set number of bytes free: -fs_data_dirs_reserved_bytes (Number of bytes to reserve on each data directory filesystem for

Re: How to flush `block_cache_capacity_mb` easily?

2017-04-07 Thread Dan Burkert
Hi Jason, There is no command to have Kudu evict its block cache, but restarting the tablet server process will have that effect. Ideally all written data will be flushed before the restart, otherwise startup/bootstrap will take a bit longer. Flushing typically happens within 60s of the last writ

Re: Spark 2.1 and Hive Metastore

2017-04-09 Thread Dan Burkert
Hi Ben, Was this meant for the Spark user list, or is there something specific to the Spark/Kudu integration you are asking about? - Dan On Sun, Apr 9, 2017 at 11:13 AM, Benjamin Kim wrote: > I’m curious about if and when Spark SQL will ever remove its dependency on > Hive Metastore. Now that

Re: Is there any recommended scale out strategy?

2017-04-10 Thread Dan Burkert
Kudu does not yet have a way to request tablet rebalancing, but we do have a few tools for balancing tablets manually. For example, if you had a tablet 'c5299ec14315401a89316b62afad5877' which you wanted to remove from an old tserver 'c5299ec14315401a89316b62afad5877' and add to a new tserver '4e6

Re: Is there any recommended scale out strategy?

2017-04-10 Thread Dan Burkert
Oops, the tablet ID I used in the example is '4398cf80d68141cdbdae882e97b6da45', not 'c5299ec14315401a89316b62afad5877'. - Dan On Mon, Apr 10, 2017 at 4:34 PM, Dan Burkert wrote: > Kudu does not yet have a way to request tablet rebalancing, but we do have > a few t

Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Dan Burkert
Hi Jason, First question: what filesystem and OS are you running? This has been an ongoing area of work; we fixed a few major issues in 1.2, and a few more major issues in 1.3, and have a new tool ('kudu fs check') that will be released in 1.4 to diagnose and fix further issues. In some cases we

Re: Question about redistributing tablets on failure of a tserver.

2017-04-12 Thread Dan Burkert
Hi Jason, answers inline: On Wed, Apr 12, 2017 at 5:53 AM, Jason Heo wrote: > > Q1. Can I disable redistributing tablets on failure of a tserver? The > reason why I need this is described in Background. > We don't have any kind of built-in maintenance mode that would prevent this, but it can be

Re: Physical Tablet Data size is larger than size in Chart Library.

2017-04-12 Thread Dan Burkert
Adar has told me it's fine to run the new 'kudu fs check' tool against a Kudu 1.2 server. It will require building locally, though. - Dan On Wed, Apr 12, 2017 at 10:59 AM, Dan Burkert wrote: > Hi Jason, > > First question: what filesystem and OS are you running? >

Re: Data encryption in Kudu

2017-04-25 Thread Dan Burkert
Hi Franco, I think you are right that a client-based approach wouldn't work, because we wouldn't want to encrypt at the level of individual cell values. That would get in the way of encoding, compression, predicate evaluation, etc. As you note, adding encryption at the block layer is probably the

Re: Data encryption in Kudu

2017-05-02 Thread Dan Burkert
hat Dan said. > > I think there are a number of interesting design alternatives to be > considered, so before coding it would be great to work through a design > document to explore the alternatives. For example, we could try to apply > encryption at the 'fs/' layer, which

Re: Data encryption in Kudu

2017-05-05 Thread Dan Burkert
n the same way and > should be encrypted too), and after that the WALs. > Yah, I think cfiles are a good place to start. AFAIK delta files reuse the cfile machinery when writing to disk. I originally considered recommending looking at the filesystem block manager, but we often do offset lookups

Re: Coprocessors

2017-05-17 Thread Dan Burkert
Hi Cheyenne, There is currently no support for coprocessors, nor is it something anyone is working on, as far as I know. Is there specific functionality you are looking for? - Dan On Wed, May 17, 2017 at 1:06 PM, Cheyenne Forbes < cheyenne.osanu.for...@gmail.com> wrote: > Will there be or are

Re: Coprocessors

2017-05-17 Thread Dan Burkert
> On Wed, May 17, 2017 at 3:22 PM, Cheyenne Forbes < > cheyenne.osanu.for...@gmail.com> wrote: > >> full text indexing... in hbase I use the coprocessors to create and >> search full text indexes with lucene. >> >> Regards, >> >> Cheyenne O. Forbes >>

Re: Hbase's Phoenix SQL "clone" for Kudu

2017-05-17 Thread Dan Burkert
The closest thing that exists right now is the Impala or SparkSQL integrations. As far as I know the targeted use cases are a little different, with Phoenix more focussed on OLTP workloads and Kudu targeting analytic workloads, at least on the read side. - Dan On Wed, May 17, 2017 at 1:26 PM, Ch

Re: Hbase's Phoenix SQL "clone" for Kudu

2017-05-17 Thread Dan Burkert
ng built > in 2017 and wanted to use something instead of mysql to store users, posts, > likes, comments and messages would you recommend using Kudu over Hbase in > this case? > > Regards, > > Cheyenne O. Forbes > > On Wed, May 17, 2017 at 3:41 PM, Dan Burkert > wrote:

Re: Question about redistributing tablets on failure of a tserver.

2017-05-20 Thread Dan Burkert
XXX7a2c637e3 of table 'impala::tbl1' is >>> under-replicated: 1 replica(s) not RUNNING >>> a7ca07f9b3d94414XXXb (hostname.com:7050): RUNNING >>> 40XXXd5b5feda1de212b (hostname.com:7050): RUNNING [LEADER] >>> aec55b4e2acX

Re: Question about redistributing tablets on failure of a tserver.

2017-05-22 Thread Dan Burkert
d' > already in progress: copying tablet > > But, after applied, there were no such messages. > > 3. > before applying, I used Kudu 1.3.0 and version is upgraded to 1.4 by using > the patch. > > Thanks. > > > 2017-05-21 0:02 GMT+09:00 Dan Burkert : > >>

Re: Question about redistributing tablets on failure of a tserver.

2017-05-22 Thread Dan Burkert
Woops, I meant it should land in time for 1.4. - Dan On Mon, May 22, 2017 at 12:32 PM, Dan Burkert wrote: > Thanks for the info, Jason. I spent some more time looking at this today, > and confirmed that the patch is working as intended. I've updated the > commit message with m

Re: KuduScanToken pushdown question

2017-07-21 Thread Dan Burkert
Hi Clifford, Currently there isn't a way to do that. If you are 100% sure the PK ranges don't overlap, you might consider creating multiple sets of scan tokens, each with a unique range (through separate ScanTokenBuilder instances). This is more or less what Kudu would do behind the scenes to sup

Re: how can I get data by Primary Key faster with c++ api of kudu

2017-07-24 Thread Dan Burkert
That is the correct way. Adding EQUAL predicates on all columns of the primary key will result in an optimal single-row scan. - Dan On Mon, Jul 24, 2017 at 12:50 AM, 曾巍 wrote: > hello: > my code as below > p = KuduTable->NewComparisonPredicate("c1", kudu.EQUAL, i) > scanner.AddConjunctPredicat

Re: DMP/CDP Profile Store

2017-09-08 Thread Dan Burkert
Hi Ben, This is certainly an interesting idea. I think the architecture you laid out could be successful, especially if the set of attributes is relatively static. I just have a couple thoughts on various things: * Co-locating partitions Assuming you will be hash partitioning over the base ID o

Re: kudu resource/hardware question

2017-09-14 Thread Dan Burkert
Hi Amit, Access to Kudu via the Impala JDBC interface does go through Impala, and should be accounted for in Impala resource and capacity planning. Access to Kudu via the Kudu Java client API does not go through Impala, and therefore does not need to be accounted for in Impala capacity planning. Usage

Re: INT128 Column Support Interest

2017-11-16 Thread Dan Burkert
I think it would be useful. As far as I've seen the main costs in carrying data types are in writing performant encoders, and updating integrations to work with them. I'm guessing with 128 bit integers there would be some integrations that can't or won't support it, which might be a cause for con

Re: INT128 Column Support Interest

2017-11-16 Thread Dan Burkert
Aren't we going to need efficient encodings in order to make decimal work well, anyway? - Dan On Thu, Nov 16, 2017 at 2:54 PM, Todd Lipcon wrote: > On Thu, Nov 16, 2017 at 2:28 PM, Dan Burkert > wrote: > > > I think it would be useful. As far as I've seen the main c

Re: Efficient way of computing max(PK) in Kudu

2017-12-14 Thread Dan Burkert
Hi Franco, Great question, and I think this gets towards a deeper use-case that Kudu could really excel at, but currently doesn't have the full set of required features to support. To your original question: you've pretty much covered all of the bases. Kudu doesn't have an efficient way to searc

Re: Kudu Queries

2017-12-20 Thread Dan Burkert
Hi Ajay, Have you looked at the documentation section on kudu.apache.org? In particular these sections may be helpful: https://kudu.apache.org/docs/schema_design.html https://kudu.apache.org/docs/administration.html#migrate_to_multi_master https://kudu.apache.org/docs/administration.html#_adding

Re: Cannot create Kudu table with Range Partitioning

2018-02-12 Thread Dan Burkert
Hi Zakaria, There's a lot going on in that error message. I've got a suggestion, but first a question: Where does the line which contains 'Bad indirect slice' come from? Are you perhaps catching an exception returned by createTable and printing the error? If so, this could explain the subsequent

Re: A few questions for using Kudu

2018-03-15 Thread Dan Burkert
Hi, answers inline: On Thu, Mar 15, 2018 at 3:12 AM, 张晓宁 wrote: > I have a few questions for using kudu: > > 1. As more and more data inserted to kudu, the performance > decrease. After continuous data insertion for about 30 minutes, the TPS > performance decreased with 20%, and after 1-ho

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Dan Burkert
The replication count is the number of tablet servers which Kudu will host copies on. So if you set the replication level to 5, Kudu will put the data on 5 separate tablet servers. There's no built-in broadcast table feature; upping the replication factor is the closest thing. A couple of things

Re: "broadcast" tablet replication for kudu?

2018-03-16 Thread Dan Burkert
t odd/even WRT number of tablet servers. - Dan > > From: Dan Burkert > Reply-To: "user@kudu.apache.org" > Date: Friday, March 16, 2018 at 2:09 PM > To: "user@kudu.apache.org" > Subject: Re: "broadcast" tablet replication for kudu? > > The re

Re: AsyncKudu

2018-04-09 Thread Dan Burkert
Hi José, The Deferred class is indeed pretty difficult to come to grips with, which is why we don't really recommend the async API for most use cases. I've personally found the Deferred class docs to be pretty useful when getting u

Re: AsyncKudu

2018-04-09 Thread Dan Burkert
duClient.tableExists(tableName)); > } > > > A little of the background of my project. The clients read and write on > other Database, and when they write something, the same information is sent > to Kudu. I don't want to block the client with the Kudu part, because the > client only need

Re: AsyncKudu

2018-04-10 Thread Dan Burkert
lient in that scenario. - Dan > > -José > ---------- > *From:* Dan Burkert > *Sent:* April 9, 2018 18:32:43 > > *To:* user@kudu.apache.org > *Subject:* Re: AsyncKudu > > Hi José, > > I would consider doing this a little bit diff

Re: Reverse sort on Primary Key

2018-04-23 Thread Dan Burkert
Hey Scott, Patrick's answer is spot on. I'm curious, though, is your usecase to find the latest value? Effectively a 'SORT BY DESC date LIMIT 1', or are you looking for the last n values, or all values? I ask because we frequently get the 'last value' question, and the solution for that might b

Re: Reverse sort on Primary Key

2018-04-24 Thread Dan Burkert
h one sequentially. Does that sound crazy? > > On Mon, Apr 23, 2018 at 3:23 PM Dan Burkert wrote: > >> Hey Scott, >> >> Patrick's answer is spot on. I'm curious, though, is your usecase to >> find the latest value? Effectively a 'SORT BY DESC dat

Re: Right way to insert to timestamp column via Java api

2018-05-02 Thread Dan Burkert
Hi Mauricio, The docs you linked to are for Impala, not Kudu. Kudu's timestamp type internally keeps microsecond precision. Your example of multiplying by 1000 is correct; you should adjust whatever your timestamp is to microseconds since the unix epoch. There are a bunch of different time APIs
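A small sketch of the unit conversion described in the message above, in Python. It assumes the source value is either milliseconds since the Unix epoch (hence the multiplication by 1000) or a timezone-aware datetime; the helper names are hypothetical and are not part of any Kudu client API.

```python
from datetime import datetime, timezone

MICROS_PER_MILLI = 1000
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_micros(epoch_millis):
    """Convert milliseconds since the Unix epoch to microseconds."""
    return epoch_millis * MICROS_PER_MILLI


def datetime_to_micros(dt):
    """Convert a timezone-aware datetime to microseconds since the Unix epoch."""
    if dt.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    delta = dt - EPOCH
    # Integer arithmetic avoids the rounding that float timestamps can introduce.
    return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
```

The resulting integer is the value to adjust to before writing into a timestamp column, per the message above; how it is passed to the client depends on the API in use.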

Re: Difference in count(*) result for KUDU and parquet

2018-05-10 Thread Dan Burkert
Hi Geetika, this is a known issue in the Impala JDBC driver. For further questions about that JDBC driver I'd direct you to Cloudera's forums, since it's not an Apache or Kudu component. - Dan On Thu, Ma

[ANNOUNCE] Recognizing the newest Apache Kudu committers

2018-07-25 Thread Dan Burkert
Hi all, I'm pleased to announce that the Kudu PMC has voted to add Attila Bukor and Sailesh Mukil as committers and PMC members. Attila has contributed many supportability, build, docs, and quality of life improvements. In addition, Attila has been very active helping users on our Slack and emai

Re: Kudu hashes and Java hashes

2018-08-28 Thread Dan Burkert
I'm only aware of one reason you'd want to pre-partition the data before inserting it into Kudu, and that's if you are sorting the input data prior to inserting. Having a way to map a row to a partition means the sort step can be done per-partition instead of globally, which can help reduce memory

Re: Kudu's data pagination

2018-09-04 Thread Dan Burkert
Without the SORT BY requirement it's possible to do this by setting the primary key range of the scan to the incremented previous value, plus a limit, plus making it a fault-tolerant scan. Here are the options you'll need to configure: https://kudu.apache.org/apidocs/org/apache/kudu/client/Abstr
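The paging pattern described in the message above can be sketched in Python as follows. This is a simulation only: an in-memory sorted list stands in for a key-ordered scan, the key is assumed to be a single integer column, and scan_page and paginate are hypothetical helpers, not Kudu client calls.

```python
import bisect


def scan_page(sorted_keys, lower_bound, limit):
    """Stand-in for a key-ordered scan: keys >= lower_bound, at most `limit` rows."""
    if lower_bound is None:
        start = 0
    else:
        start = bisect.bisect_left(sorted_keys, lower_bound)
    return sorted_keys[start:start + limit]


def paginate(sorted_keys, page_size):
    """Yield pages by restarting the scan just past the last key already seen."""
    lower_bound = None
    while True:
        page = scan_page(sorted_keys, lower_bound, page_size)
        if not page:
            return
        yield page
        # "Incremented previous value": the next page starts after the last key.
        lower_bound = page[-1] + 1
```

The pattern relies on each page coming back in primary key order, so that the last row of a page is also its largest key; the fault-tolerant scan setting mentioned in the message is understood here to be what provides that ordering.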

Re: Multi-level partitions question

2018-10-11 Thread Dan Burkert
Hi Boris, The two examples you gave are exactly equivalent; the relative ordering of hash levels has no effect on query performance, hotspotting, or anything else. Given that 60% of your queries don't specify a specific customer_id, it does make sense to use hash(shop_id), hash(customer_id) inste

Re: Multi-level partitions question

2018-10-11 Thread Dan Burkert
e > them as a bunch of independent files instead and each file will have data > for the specific hash of shop_id/customer_id? > > Boris > > On Thu, Oct 11, 2018 at 4:05 PM Dan Burkert wrote: > >> Hi Boris, >> >> The two examples you gave are exactly equivalen

Re: Multi-level partitions question

2018-10-11 Thread Dan Burkert
ot of data you would actually want it to be parallelized across many tablets, and therefore be able to take advantage of many tservers to perform the scan. - Dan On Thu, Oct 11, 2018 at 3:25 PM Dan Burkert wrote: > > Just to clarify, are you saying that partition by hash(shop_id), > ha

Re: is it worth to have partitions on very small tables?

2018-10-15 Thread Dan Burkert
Often for these cases having multiple partitions doesn't provide any advantage. There are fixed-cost overheads to having many tablets, so if the tablets are small these costs can outweigh the benefit. Additionally, if you aren't actively writing to the table then the benefit of parallelizing thos