RE: Encryption

2016-05-12 Thread Jordan Birdsell
Thanks, I’ll see if I can find some available cycles. From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, May 12, 2016 6:25 PM To: user@kudu.incubator.apache.org Subject: Re: Encryption On Thu, May 12, 2016 at 9:45 AM, Jordan Birdsell <jordan.birdsell.k...@statefarm.com> wrote: T

Re: Partition and Split rows

2016-05-12 Thread Sand Stone
Cool, I will study it. On Thu, May 12, 2016 at 2:17 PM, Dan Burkert wrote: > > > On Thu, May 12, 2016 at 2:04 PM, Sand Stone > wrote: > > >Instead, take advantage of the index capability of Primary Keys. >> Currently I did make the "5-min" field a part of the primary key as well. >> I am most l

Re: Encryption

2016-05-12 Thread Todd Lipcon
On Thu, May 12, 2016 at 9:45 AM, Jordan Birdsell <jordan.birdsell.k...@statefarm.com> wrote: > Thanks Todd. From a roadmap perspective, do you think this will be the > recommended way of enabling encryption for Kudu or should a design be put > together for something more integrated with Kudu itself?

Re: Sparse Data

2016-05-12 Thread Benjamin Kim
So, basically go back to using relational database techniques. Got it. But, how was the performance? Cheers, Ben > On May 12, 2016, at 2:43 PM, Chris George wrote: > > I’ve used kudu with an EAV model for sparse data and that worked extremely > well for us with billions of rows and the correc

Re: Sparse Data

2016-05-12 Thread Chris George
I've used Kudu with an EAV model for sparse data and that worked extremely well for us with billions of rows and the correct partitioning. -Chris On 5/12/16, 3:21 PM, "Dan Burkert" <d...@cloudera.com> wrote: Hi Ben, Kudu doesn't support sparse datasets with many columns very well. Kudu
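
For readers unfamiliar with the EAV (entity-attribute-value) layout mentioned above, the following is a minimal sketch of how one sparse record could be pivoted into narrow rows before ingestion. The function name `to_eav_rows`, the three-field row shape, and the choice to store every value as a string are assumptions made for illustration; the thread does not describe the actual schema that was used, and no Kudu client calls are shown.

```python
from typing import Any, Dict, List, Tuple


def to_eav_rows(entity_id: str, attributes: Dict[str, Any]) -> List[Tuple[str, str, str]]:
    """Pivot one sparse record into (entity, attribute, value) rows.

    Hypothetical helper: attributes that are absent or None produce no row,
    which is what keeps the table narrow even when the set of column names
    is large and not known in advance.
    """
    return [
        (entity_id, name, str(value))          # values stored as strings (assumption)
        for name, value in sorted(attributes.items())
        if value is not None
    ]


if __name__ == "__main__":
    record = {"color": "red", "size": None, "age": 3}
    for row in to_eav_rows("u1", record):
        print(row)
```

In such a layout the pair (entity, attribute) would be a natural candidate for the primary key, so that a lookup of one attribute of one entity stays a key lookup.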

Re: Sparse Data

2016-05-12 Thread Dan Burkert
Hi Ben, Kudu doesn't support sparse datasets with many columns very well. Kudu's data model looks much more like the relational, structured data model of a traditional SQL database than HBase's data model. Kudu doesn't yet have a map column type (or any nested column types), but we do have BINAR

Re: Partition and Split rows

2016-05-12 Thread Dan Burkert
On Thu, May 12, 2016 at 2:04 PM, Sand Stone wrote: >Instead, take advantage of the index capability of Primary Keys. > Currently I did make the "5-min" field a part of the primary key as well. > I am most likely overdoing it. I will play around with the schema and use > cases around it. > Defini

Sparse Data

2016-05-12 Thread Benjamin Kim
Can Kudu handle the use case where sparse data is involved? In many of our processes, we deal with data that can have any number of columns and many previously unknown column names depending on what attributes are brought in at the time. Currently, we use HBase to handle this. Since Kudu is base

Re: Partition and Split rows

2016-05-12 Thread Sand Stone
Thanks for the advice, Dan. >Instead, take advantage of the index capability of Primary Keys. Currently I did make the "5-min" field a part of the primary key as well. I am most likely overdoing it. I will play around with the schema and use cases around it. >since each tablet server should only

Re: Partition and Split rows

2016-05-12 Thread Dan Burkert
On Thu, May 12, 2016 at 11:39 AM, Sand Stone wrote: I don't know how Kudu load balance the data across the tablet servers. > Individual tablets are replicated and balanced across all available tablet servers, for more on that see http://getkudu.io/docs/schema_design.html#data-distribution. >

Re: Partition and Split rows

2016-05-12 Thread Sand Stone
Thanks, Dan. In your scheme, I assume you suggest the range partition on the timestamp. I don't know how Kudu load balance the data across the tablet servers. For example, do I need to pre-calculate every day, a list of 5 minutes apart timestamps at table creation? [assume I have to create a new t

Re: Partition and Split rows

2016-05-12 Thread Dan Burkert
Forgot to add the PK specification to the CREATE TABLE, it should have read as follows: CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE) PRIMARY KEY (metric, time); - Dan On Thu, May 12, 2016 at 11:12 AM, Dan Burkert wrote: > > On Thu, May 12, 2016 at 11:05 AM, Sand Stone >

Re: Partition and Split rows

2016-05-12 Thread Dan Burkert
On Thu, May 12, 2016 at 11:05 AM, Sand Stone wrote: > > Is the requirement to pre-aggregate by time window? > No, I am thinking to create a column say, "minute". It's basically the > minute field of the timestamp column(even round to 5-min bucket depending > on the needs). So it's a computed colu

Re: Partition and Split rows

2016-05-12 Thread Sand Stone
> Is the requirement to pre-aggregate by time window? No, I am thinking to create a column say, "minute". It's basically the minute field of the timestamp column(even round to 5-min bucket depending on the needs). So it's a computed column being filled in on data ingestion. My goal is that this fie
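
The computed "5-min bucket" column described above could be derived at ingestion time roughly as follows. This is a sketch only: the names `five_minute_bucket` and `with_bucket`, and the row being a plain dict with a `time` field, are assumptions for illustration and do not come from the thread.

```python
from datetime import datetime, timezone

BUCKET_SECONDS = 5 * 60  # width of one bucket


def five_minute_bucket(ts: datetime) -> datetime:
    """Round a timezone-aware timestamp down to the start of its 5-minute bucket."""
    epoch = int(ts.timestamp())
    return datetime.fromtimestamp(epoch - epoch % BUCKET_SECONDS, tz=timezone.utc)


def with_bucket(row: dict) -> dict:
    """Return a copy of the row with the computed bucket column filled in."""
    return {**row, "bucket": five_minute_bucket(row["time"])}


if __name__ == "__main__":
    row = {
        "metric": "cpu",
        "time": datetime(2016, 5, 12, 11, 7, 42, tzinfo=timezone.utc),
        "value": 0.5,
    }
    print(with_bucket(row)["bucket"])
```

Because the bucket is a pure function of the timestamp, it carries no extra information; it only helps if queries filter on it directly.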

Re: best practices to remove/retire data

2016-05-12 Thread Dan Burkert
On Thu, May 12, 2016 at 8:32 AM, Chris George wrote: > How hard would a predicate based delete be? > Ie ScanDelete or something. > -Chris George > That might be pretty difficult, since it implicitly assumes cross row transactional consistency. If consistency isn't required you can simulate it t

Re: best practices to remove/retire data

2016-05-12 Thread Jean-Daniel Cryans
It should be fully implemented for 1.0 which we're aiming for August. You can follow this jira: https://issues.apache.org/jira/browse/KUDU-1306 J-D On Thu, May 12, 2016 at 10:10 AM, Sand Stone wrote: > Thanks J-D. > > Any idea when the partition level deletion will be implemented? > > On Thu, M

Re: best practices to remove/retire data

2016-05-12 Thread Sand Stone
Thanks J-D. Any idea when the partition level deletion will be implemented? On Thu, May 12, 2016 at 8:24 AM, Jean-Daniel Cryans wrote: > Hi, > > Right now this use case is more difficult than it needs to be. In your > previous thread, "Partition and Split rows", we talked about non-covering > r

RE: Encryption

2016-05-12 Thread Jordan Birdsell
Thanks Todd. From a roadmap perspective, do you think this will be the recommended way of enabling encryption for Kudu or should a design be put together for something more integrated with Kudu itself? From: Todd Lipcon [mailto:t...@cloudera.com] Sent: Thursday, May 12, 2016 12:31 PM To: user@kudu.

Re: Encryption

2016-05-12 Thread Todd Lipcon
Hi Jordan, I'm not aware of someone doing this yet, but since dm-crypt operates at the device level and not the filesystem level, I don't see any reasons it wouldn't work. It's possible some developers here have already been running like this if they use dm-crypt on their laptops. Maybe someone e

Re: best practices to remove/retire data

2016-05-12 Thread Chris George
How hard would a predicate based delete be? Ie ScanDelete or something. -Chris George On 5/12/16, 9:24 AM, "Jean-Daniel Cryans" <jdcry...@apache.org> wrote: Hi, Right now this use case is more difficult than it needs to be. In your previous thread, "Partition and Split rows", we talked

Encryption

2016-05-12 Thread Jordan Birdsell
Hi, A while back we had a thread going about using dm-crypt as a means to encrypt Kudu data. Out of curiosity, has anyone actually done this? Thanks, Jordan Birdsell

Re: best practices to remove/retire data

2016-05-12 Thread Jean-Daniel Cryans
Hi, Right now this use case is more difficult than it needs to be. In your previous thread, "Partition and Split rows", we talked about non-covering range partition and this is something that would help your use case a lot. Basically, you could create partitions that cover full days, and everyday
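
The bookkeeping behind day-wide range partitions with a fixed retention window can be sketched as below. This only computes bounds and decides which days have aged out; it makes no Kudu calls, since the partition-level operations discussed in this thread were not yet available at the time. The names `day_bounds` and `expired_days`, the UTC day alignment, and the 3-day default are assumptions for illustration.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Iterable, List, Tuple


def day_bounds(day: date) -> Tuple[datetime, datetime]:
    """Return the half-open [start, end) range covering one full UTC day."""
    start = datetime(day.year, day.month, day.day, tzinfo=timezone.utc)
    return start, start + timedelta(days=1)


def expired_days(existing: Iterable[date], today: date, retention_days: int = 3) -> List[date]:
    """List the day partitions that lie entirely before the retention cutoff."""
    cutoff = today - timedelta(days=retention_days)
    return sorted(d for d in existing if d < cutoff)


if __name__ == "__main__":
    days = [date(2016, 5, 8), date(2016, 5, 9), date(2016, 5, 10), date(2016, 5, 12)]
    for d in expired_days(days, today=date(2016, 5, 12)):
        print("retire", d, day_bounds(d))
```

Until whole partitions can be removed, the same bounds could serve as the time range for a scan that feeds row-by-row deletes.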

best practices to remove/retire data

2016-05-12 Thread Sand Stone
Hi. Presumably I need to write a program to delete the unwanted rows, say, remove all data older than 3 days, while the table is still ingesting new data. How well will this perform for large tables? Both deletion and ingestion wise. Or for this specific case that I retire data by day, I should c