[jira] [Commented] (KUDU-2888) Better encoding for dictionary code-words

2019-07-09 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/KUDU-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881716#comment-16881716
 ] 

Todd Lipcon commented on KUDU-2888:
-----------------------------------

Attached a small test file I wrote. NOTE: it has a quirk where it reports the 
compression ratio of bitshuffle off by four, and it may have some performance 
problems too (I didn't spend a lot of time on it).

> Better encoding for dictionary code-words
> -----------------------------------------
>
> Key: KUDU-2888
> URL: https://issues.apache.org/jira/browse/KUDU-2888
> Project: Kudu
> Issue Type: Bug
> Components: cfile, perf
> Reporter: Todd Lipcon
> Priority: Major
> Attachments: codec-test.py
>
>
> Currently we use bitshuffle for all ints, including dictionary codewords. For 
> dictionary codewords, we know the maximum possible value up-front, and we 
> also know that the ints will be non-negative and small. This set of 
> constraints makes it much better to use a specialized bitpacking algorithm 
> rather than a more generic compression like bitshuffle+lz4. Based on some 
> quick experiments I ran, we can probably get a several-fold decoding speedup 
> with no loss of compression by switching to a codec like simdbitpacking for 
> these codewords.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KUDU-2888) Better encoding for dictionary code-words

2019-07-09 Thread Todd Lipcon (JIRA)
Todd Lipcon created KUDU-2888:
------------------------------

Summary: Better encoding for dictionary code-words
Key: KUDU-2888
URL: https://issues.apache.org/jira/browse/KUDU-2888
Project: Kudu
Issue Type: Bug
Components: cfile, perf
Reporter: Todd Lipcon
Attachments: codec-test.py

Currently we use bitshuffle for all ints, including dictionary codewords. For 
dictionary codewords, we know the maximum possible value up-front, and we also 
know that the ints will be non-negative and small. This set of constraints 
makes it much better to use a specialized bitpacking algorithm rather than a 
more generic compression like bitshuffle+lz4. Based on some quick experiments I 
ran, we can probably get a several-fold decoding speedup with no loss of 
compression by switching to a codec like simdbitpacking for these codewords.
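
To illustrate the idea above, here is a minimal sketch of fixed-width 
bitpacking for codewords whose maximum value is known up-front. It is not the 
attached codec-test.py and not the simdbitpacking library: the function names 
(bits_needed, pack_codewords, unpack_codewords) are made up for this example, 
and a pure-Python version only shows the storage layout, not the SIMD decoding 
speed discussed in the issue.

```python
def bits_needed(max_value: int) -> int:
    """Bits required to represent any value in [0, max_value] (at least 1)."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    return max(1, max_value.bit_length())


def pack_codewords(codewords, max_value: int) -> bytes:
    """Pack non-negative ints into a little-endian bit stream.

    Every codeword occupies exactly bits_needed(max_value) bits, so the
    decoder only needs the count and max_value to recover the values.
    """
    width = bits_needed(max_value)
    acc = 0
    for i, cw in enumerate(codewords):
        if not 0 <= cw <= max_value:
            raise ValueError(f"codeword {cw} outside [0, {max_value}]")
        acc |= cw << (i * width)
    nbytes = (len(codewords) * width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack_codewords(data: bytes, count: int, max_value: int) -> list:
    """Inverse of pack_codewords: read `count` fixed-width values."""
    width = bits_needed(max_value)
    mask = (1 << width) - 1
    acc = int.from_bytes(data, "little")
    return [(acc >> (i * width)) & mask for i in range(count)]


if __name__ == "__main__":
    # A dictionary with 8 entries needs only 3 bits per codeword.
    codes = [0, 5, 3, 7, 1, 6, 2, 4, 7, 0]
    packed = pack_codewords(codes, max_value=7)
    assert unpack_codewords(packed, len(codes), max_value=7) == codes
    print(len(packed), "bytes packed vs", 4 * len(codes), "bytes as 32-bit ints")
```

Because the width is fixed by the dictionary size, decoding is a shift and a 
mask per value with no data-dependent branching, which is the property that 
makes this kind of layout a good fit for SIMD implementations.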





[jira] [Updated] (KUDU-2888) Better encoding for dictionary code-words

2019-07-09 Thread Todd Lipcon (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Todd Lipcon updated KUDU-2888:
------------------------------
Attachment: codec-test.py






[jira] [Assigned] (KUDU-1948) Client-side configuration of cluster details

2019-07-09 Thread Yingchun Lai (JIRA)


 [ 
https://issues.apache.org/jira/browse/KUDU-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yingchun Lai reassigned KUDU-1948:
----------------------------------

Assignee: (was: Yingchun Lai)

> Client-side configuration of cluster details
> --------------------------------------------
>
> Key: KUDU-1948
> URL: https://issues.apache.org/jira/browse/KUDU-1948
> Project: Kudu
> Issue Type: New Feature
> Components: client, security
> Affects Versions: 1.3.0
> Reporter: Todd Lipcon
> Priority: Major
>
> In the beginning, Kudu clients were configured with only the address of the 
> single Kudu master. This was nice and simple, and there was no need for a 
> client "configuration file".
> Then, we added multi-masters, and the client API had to take a list of master 
> addresses. This wasn't awful, but started to be a bit aggravating when trying 
> to use tools on a multi-master cluster (who wants to type out three long 
> hostnames in a 'ksck' command line every time?).
> Now with security, we have a couple more bits of configuration for the 
> client. Namely:
> - "require SSL" and "require authentication" booleans -- necessary to prevent 
> MITM downgrade attacks
> - custom Kerberos principal -- if the server wants to use a principal other 
> than 'kudu/@REALM' then the client needs to know to expect it and fetch 
> the appropriate service ticket. (Note: this isn't yet supported, but we would 
> like it to be!)
> In the future, there are other items that might be best specified as part of 
> a client configuration as well (e.g. CA cert for BYO PKI, wire compression 
> options, etc).
> For the above use cases it would be nicer to allow the various options to be 
> specified in a configuration file rather than adding specific APIs for all 
> options.
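
As a rough illustration of what such a configuration file could look like, 
here is a sketch that parses a hypothetical INI-style file with Python's 
standard configparser module. Nothing here is an existing Kudu format or API: 
the [cluster] section, the key names (master_addresses, require_ssl, 
require_authentication), the example hostnames, and the load_client_config 
helper are all assumptions made for this example.

```python
import configparser

# Hypothetical file contents; the section and key names are illustrative only.
SAMPLE_CONFIG = """
[cluster]
master_addresses = master-1.example.com:7051, master-2.example.com:7051, master-3.example.com:7051
require_ssl = true
require_authentication = true
"""


def load_client_config(text: str) -> dict:
    """Parse the hypothetical client config into a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["cluster"]
    masters = [
        addr.strip()
        for addr in section.get("master_addresses", "").split(",")
        if addr.strip()
    ]
    return {
        "master_addresses": masters,
        # Default to False here only so the sketch handles missing keys;
        # a real client would need to pick its defaults deliberately.
        "require_ssl": section.getboolean("require_ssl", fallback=False),
        "require_authentication": section.getboolean(
            "require_authentication", fallback=False
        ),
    }


if __name__ == "__main__":
    config = load_client_config(SAMPLE_CONFIG)
    print(config["master_addresses"])
    print(config["require_ssl"], config["require_authentication"])
```

A tool could then read the master list and the security flags from one 
well-known file instead of taking them on every command line, and new options 
would become new keys rather than new API calls.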


