Thanks, I’ll see if I can find some available cycles
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Thursday, May 12, 2016 6:25 PM
To: user@kudu.incubator.apache.org
Subject: Re: Encryption
On Thu, May 12, 2016 at 9:45 AM, Jordan Birdsell
<jordan.birdsell.k...@statefarm.com> wrote:
Cool, I will study it.
On Thu, May 12, 2016 at 2:17 PM, Dan Burkert wrote:
>
>
> On Thu, May 12, 2016 at 2:04 PM, Sand Stone
> wrote:
>
> >Instead, take advantage of the index capability of Primary Keys.
>> Currently I did make the "5-min" field a part of the primary key as well.
>> I am most l…
On Thu, May 12, 2016 at 9:45 AM, Jordan Birdsell <
jordan.birdsell.k...@statefarm.com> wrote:
> Thanks Todd. From a roadmap perspective, do you think this will be the
> recommended way of enabling encryption for Kudu or should a design be put
> together for something more integrated with Kudu itself?
So, basically go back to using relational database techniques. Got it. But, how
was the performance?
Cheers,
Ben
> On May 12, 2016, at 2:43 PM, Chris George wrote:
>
> I’ve used kudu with an EAV model for sparse data and that worked extremely
> well for us with billions of rows and the correc…
I've used kudu with an EAV model for sparse data and that worked extremely well
for us with billions of rows and the correct partitioning.
-Chris
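Chris's EAV (entity-attribute-value) layout can be sketched roughly as follows. This is a hypothetical illustration, not the schema from the thread: each sparse record is flattened into (entity, attribute, value) rows before insert, so absent attributes cost nothing.

```python
# Hypothetical sketch of the EAV flattening step; the entity/attribute/value
# names are illustrative, not from the thread.

def to_eav_rows(entity_id, record):
    """Flatten one sparse record (a dict) into (entity, attribute, value) rows."""
    return [(entity_id, attr, str(val)) for attr, val in sorted(record.items())]

rows = to_eav_rows("sensor-42", {"temp_c": 21.5, "humidity": 0.4})
print(rows)
# [('sensor-42', 'humidity', '0.4'), ('sensor-42', 'temp_c', '21.5')]
```

With (entity, attribute) as the leading primary-key columns, all attributes of one entity cluster together, which is what makes this layout work for sparse data.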
On 5/12/16, 3:21 PM, "Dan Burkert"
<d...@cloudera.com> wrote:
Hi Ben,
Kudu doesn't support sparse datasets with many columns very well. Kudu's
data model looks much more like the relational, structured data model of a
traditional SQL database than HBase's data model. Kudu doesn't yet have a
map column type (or any nested column types), but we do have BINARY…
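Until nested types exist, one common workaround (sketched here on the assumption that the application owns the encoding) is to serialize a map yourself and store the bytes in a BINARY column:

```python
import json

def pack_map(d):
    """Serialize a small map to bytes, suitable for storing in a BINARY column.
    JSON is just one possible encoding; the choice is application-defined."""
    return json.dumps(d, sort_keys=True).encode("utf-8")

def unpack_map(b):
    """Inverse of pack_map: decode the BINARY cell back into a dict."""
    return json.loads(b.decode("utf-8"))

blob = pack_map({"colour": "red", "size": 3})
print(unpack_map(blob))  # {'colour': 'red', 'size': 3}
```

The trade-off is that the server cannot evaluate predicates inside the blob, so any filtering on the map's contents has to happen client-side after the scan.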
On Thu, May 12, 2016 at 2:04 PM, Sand Stone wrote:
>Instead, take advantage of the index capability of Primary Keys.
> Currently I did make the "5-min" field a part of the primary key as well.
> I am most likely overdoing it. I will play around with the schema and use
> cases around it.
>
Defini…
Can Kudu handle the use case where sparse data is involved? In many of our
processes, we deal with data that can have any number of columns and many
previously unknown column names depending on what attributes are brought in at
the time. Currently, we use HBase to handle this. Since Kudu is base…
Thanks for the advice, Dan.
>Instead, take advantage of the index capability of Primary Keys.
Currently I did make the "5-min" field a part of the primary key as well. I
am most likely overdoing it. I will play around with the schema and use
cases around it.
> since each tablet server should only…
On Thu, May 12, 2016 at 11:39 AM, Sand Stone wrote:
I don't know how Kudu load balances the data across the tablet servers.
>
Individual tablets are replicated and balanced across all available tablet
servers, for more on that see
http://getkudu.io/docs/schema_design.html#data-distribution.
>
Thanks, Dan.
In your scheme, I assume you suggest range partitioning on the timestamp.
I don't know how Kudu load balances the data across the tablet servers. For
example, do I need to pre-calculate every day a list of timestamps 5 minutes
apart at table creation? [assume I have to create a new t…
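Pre-computing such split rows at table creation could look like the sketch below. The interval and the date are illustrative; whether split rows are needed at all depends on the partitioning scheme chosen.

```python
from datetime import datetime, timedelta

def day_splits(day_start, interval_minutes=5):
    """Pre-compute split-row timestamps, interval_minutes apart, covering one day.
    Purely illustrative of the 'pre-calculate a list of timestamps' idea."""
    end = day_start + timedelta(days=1)
    splits, ts = [], day_start
    while ts < end:
        splits.append(ts)
        ts += timedelta(minutes=interval_minutes)
    return splits

splits = day_splits(datetime(2016, 5, 12))
print(len(splits))  # 288 boundaries: 24 h * 60 min / 5 min
```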
Forgot to add the PK specification to the CREATE TABLE, it should have read
as follows:
CREATE TABLE metrics (metric STRING, time TIMESTAMP, value DOUBLE)
PRIMARY KEY (metric, time);
- Dan
On Thu, May 12, 2016 at 11:12 AM, Dan Burkert wrote:
>
> On Thu, May 12, 2016 at 11:05 AM, Sand Stone
>
On Thu, May 12, 2016 at 11:05 AM, Sand Stone wrote:
> > Is the requirement to pre-aggregate by time window?
> No, I am thinking to create a column say, "minute". It's basically the
> minute field of the timestamp column(even round to 5-min bucket depending
> on the needs). So it's a computed colu
> Is the requirement to pre-aggregate by time window?
No, I am thinking to create a column, say, "minute". It's basically the
minute field of the timestamp column (even rounded to a 5-min bucket depending
on the needs). So it's a computed column being filled in on data ingestion.
My goal is that this fie…
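That computed column can be as simple as truncating the ingest timestamp. A minimal sketch, assuming the timestamp arrives as Unix-epoch seconds:

```python
def five_min_bucket(epoch_seconds):
    """Round a Unix timestamp (in seconds) down to its 5-minute bucket:
    the kind of value a computed 'minute' column could hold at ingestion."""
    return epoch_seconds - (epoch_seconds % 300)  # 300 s = 5 min

print(five_min_bucket(1463064323))  # 1463064300
```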
On Thu, May 12, 2016 at 8:32 AM, Chris George
wrote:
> How hard would a predicate based delete be?
> Ie ScanDelete or something.
> -Chris George
>
That might be pretty difficult, since it implicitly assumes cross-row
transactional consistency. If consistency isn't required you can simulate
it t…
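The simulation Dan alludes to is roughly scan-then-delete. A toy sketch over in-memory rows (a real client would drive a Kudu scanner and issue per-row deletes; the field names here are illustrative):

```python
def scan_then_delete(rows, cutoff):
    """Toy version of a client-side predicate delete: scan for keys whose
    'time' is older than cutoff, then drop them one by one. Rows written
    concurrently can be missed -- hence no cross-row consistency."""
    doomed = {r["key"] for r in rows if r["time"] < cutoff}
    remaining = [r for r in rows if r["key"] not in doomed]
    return sorted(doomed), remaining

rows = [{"key": 1, "time": 10}, {"key": 2, "time": 99}]
doomed, remaining = scan_then_delete(rows, cutoff=50)
print(doomed)  # [1]
```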
It should be fully implemented for 1.0 which we're aiming for August. You
can follow this jira: https://issues.apache.org/jira/browse/KUDU-1306
J-D
On Thu, May 12, 2016 at 10:10 AM, Sand Stone wrote:
> Thanks J-D.
>
> Any idea when the partition level deletion will be implemented?
>
> On Thu, M…
Thanks J-D.
Any idea when the partition level deletion will be implemented?
On Thu, May 12, 2016 at 8:24 AM, Jean-Daniel Cryans
wrote:
> Hi,
>
> Right now this use case is more difficult than it needs to be. In your
> previous thread, "Partition and Split rows", we talked about non-covering
> r…
Thanks Todd. From a roadmap perspective, do you think this will be the recommended
way of enabling encryption for Kudu or should a design be put together for
something more integrated with Kudu itself?
From: Todd Lipcon [mailto:t...@cloudera.com]
Sent: Thursday, May 12, 2016 12:31 PM
To: user@kudu.incubator.apache.org
Hi Jordan,
I'm not aware of someone doing this yet, but since dm-crypt operates at the
device level and not the filesystem level, I don't see any reasons it
wouldn't work.
It's possible some developers here have already been running like this if
they use dm-crypt on their laptops. Maybe someone e…
How hard would a predicate based delete be?
Ie ScanDelete or something.
-Chris George
On 5/12/16, 9:24 AM, "Jean-Daniel Cryans"
<jdcry...@apache.org> wrote:
Hi,
Right now this use case is more difficult than it needs to be. In your previous
thread, "Partition and Split rows", we talked…
Hi,
A while back we had a thread going about using dm-crypt as a means to encrypt
Kudu data. Out of curiosity, has anyone actually done this?
Thanks,
Jordan Birdsell
Hi,
Right now this use case is more difficult than it needs to be. In your
previous thread, "Partition and Split rows", we talked about non-covering
range partition and this is something that would help your use case a lot.
Basically, you could create partitions that cover full days, and everyday…
Hi. Presumably I need to write a program to delete the unwanted rows, say,
remove all data older than 3 days, while the table is still ingesting new
data.
How well will this perform for large tables? Both deletion and ingestion
wise.
Or for this specific case that I retire data by day, I should c…