[jira] [Commented] (KUDU-2888) Better encoding for dictionary code-words
[ https://issues.apache.org/jira/browse/KUDU-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16881716#comment-16881716 ]

Todd Lipcon commented on KUDU-2888:
-----------------------------------

attached a little test file I wrote. NOTE: it has some weirdness where it gets the compression ratio of bitshuffle off by four, and maybe there are some perf problems too (didn't spend a lot of time on it).

> Better encoding for dictionary code-words
> -----------------------------------------
>
>                 Key: KUDU-2888
>                 URL: https://issues.apache.org/jira/browse/KUDU-2888
>             Project: Kudu
>          Issue Type: Bug
>          Components: cfile, perf
>            Reporter: Todd Lipcon
>            Priority: Major
>         Attachments: codec-test.py
>
>
> Currently we use bitshuffle for all ints, including dictionary codewords. For
> dictionary codewords, we know the maximum possible value up-front, and we
> also know that the ints will be non-negative and small. This set of
> constraints makes it much better to use a specialized bitpacking algorithm
> rather than a more generic compression like bitshuffle+lz4. Based on some
> quick experiments I ran, we can probably get a several-fold decoding speedup
> with no loss of compression by switching to a codec like simdbitpacking for
> these codewords.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (KUDU-2888) Better encoding for dictionary code-words
Todd Lipcon created KUDU-2888:
------------------------------

             Summary: Better encoding for dictionary code-words
                 Key: KUDU-2888
                 URL: https://issues.apache.org/jira/browse/KUDU-2888
             Project: Kudu
          Issue Type: Bug
          Components: cfile, perf
            Reporter: Todd Lipcon
         Attachments: codec-test.py

Currently we use bitshuffle for all ints, including dictionary codewords. For
dictionary codewords, we know the maximum possible value up-front, and we
also know that the ints will be non-negative and small. This set of
constraints makes it much better to use a specialized bitpacking algorithm
rather than a more generic compression like bitshuffle+lz4. Based on some
quick experiments I ran, we can probably get a several-fold decoding speedup
with no loss of compression by switching to a codec like simdbitpacking for
these codewords.
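
For readers unfamiliar with the idea, here is a minimal scalar sketch of fixed-width bitpacking for dictionary codewords. It is not the attached codec-test.py, not Kudu's cfile code, and not the simdbitpacking library; the names `bit_width`, `pack`, and `unpack` are hypothetical. It only illustrates why a known, small maximum value lets every codeword be stored in the same small number of bits.

```python
def bit_width(max_value: int) -> int:
    """Bits needed to represent any value in [0, max_value] (at least 1)."""
    if max_value < 0:
        raise ValueError("max_value must be non-negative")
    return max(1, max_value.bit_length())


def pack(codewords, max_value: int) -> bytes:
    """Pack non-negative codewords into fixed-width fields, lowest bits first."""
    width = bit_width(max_value)
    acc = 0
    for i, cw in enumerate(codewords):
        if not 0 <= cw <= max_value:
            raise ValueError(f"codeword {cw} outside [0, {max_value}]")
        # Place codeword i in bits [i * width, (i + 1) * width).
        acc |= cw << (i * width)
    nbytes = (len(codewords) * width + 7) // 8
    return acc.to_bytes(nbytes, "little")


def unpack(data: bytes, count: int, max_value: int) -> list:
    """Inverse of pack(): recover `count` codewords from the packed bytes."""
    width = bit_width(max_value)
    mask = (1 << width) - 1
    acc = int.from_bytes(data, "little")
    return [(acc >> (i * width)) & mask for i in range(count)]


# Hypothetical example: a dictionary with 8 entries, so codewords are 0..7.
codes = [0, 5, 3, 7, 1, 6]
packed = pack(codes, max_value=7)
restored = unpack(packed, len(codes), max_value=7)
```

With a maximum codeword of 7, each value takes 3 bits, so the six codewords above fit in 3 bytes instead of the 24 bytes that six 32-bit ints would occupy. A dictionary with 1000 entries would need 10 bits per codeword. A SIMD codec applies the same layout idea to many values per instruction, which is where the decoding speedup discussed in the issue would come from.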
[jira] [Updated] (KUDU-2888) Better encoding for dictionary code-words
[ https://issues.apache.org/jira/browse/KUDU-2888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Todd Lipcon updated KUDU-2888:
------------------------------
    Attachment: codec-test.py
[jira] [Assigned] (KUDU-1948) Client-side configuration of cluster details
[ https://issues.apache.org/jira/browse/KUDU-1948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yingchun Lai reassigned KUDU-1948:
----------------------------------

    Assignee:     (was: Yingchun Lai)

> Client-side configuration of cluster details
> --------------------------------------------
>
>                 Key: KUDU-1948
>                 URL: https://issues.apache.org/jira/browse/KUDU-1948
>             Project: Kudu
>          Issue Type: New Feature
>          Components: client, security
>    Affects Versions: 1.3.0
>            Reporter: Todd Lipcon
>            Priority: Major
>
> In the beginning, Kudu clients were configured with only the address of the
> single Kudu master. This was nice and simple, and there was no need for a
> client "configuration file".
> Then, we added multi-masters, and the client API had to take a list of master
> addresses. This wasn't awful, but started to be a bit aggravating when trying
> to use tools on a multi-master cluster (who wants to type out three long
> hostnames in a 'ksck' command line every time?).
> Now with security, we have a couple more bits of configuration for the
> client. Namely:
> - "require SSL" and "require authentication" booleans -- necessary to prevent
> MITM downgrade attacks
> - custom Kerberos principal -- if the server wants to use a principal other
> than 'kudu/@REALM' then the client needs to know to expect it and fetch
> the appropriate service ticket. (Note this isn't yet supported but would like
> to be!)
> In the future, there are other items that might be best specified as part of
> a client configuration as well (e.g. CA cert for BYO PKI, wire compression
> options, etc).
> For the above use cases it would be nicer to allow the various options to be
> specified in a configuration file rather than adding specific APIs for all
> options.
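
To make the proposal concrete, here is a rough sketch of what reading such a client configuration file could look like. Everything in it is an assumption for illustration: the issue does not specify a file format, so the INI layout, the section name `prod`, the option names, the example hostnames, and the `ClientConfig` / `parse_client_config` helpers are all hypothetical and are not part of any Kudu client API.

```python
import configparser
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical file contents: one section per cluster, options named after
# the items listed in the issue description.
EXAMPLE = """
[prod]
master_addresses = master-1.example.com:7051, master-2.example.com:7051, master-3.example.com:7051
require_ssl = true
require_authentication = true
"""


@dataclass
class ClientConfig:
    master_addresses: List[str]
    require_ssl: bool = False
    require_authentication: bool = False
    kerberos_principal: Optional[str] = None  # None: use the default principal


def parse_client_config(text: str, cluster: str) -> ClientConfig:
    """Parse the options for one named cluster from INI-formatted text."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    section = parser[cluster]  # raises KeyError for an unknown cluster
    masters = [m.strip() for m in section["master_addresses"].split(",") if m.strip()]
    if not masters:
        raise ValueError("master_addresses must list at least one master")
    return ClientConfig(
        master_addresses=masters,
        require_ssl=section.getboolean("require_ssl", fallback=False),
        require_authentication=section.getboolean("require_authentication", fallback=False),
        kerberos_principal=section.get("kerberos_principal", fallback=None),
    )


cfg = parse_client_config(EXAMPLE, "prod")
```

With something along these lines, a tool could be pointed at a cluster by name rather than by a list of three hostnames, and new options (a CA cert path, wire compression settings) could be added as further keys without changing the client API for each one.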