Which instruction set extensions should Impala require?

2017-11-04 Thread Jim Apple
In a discussion on https://issues.apache.org/jira/browse/IMPALA-6128,
we are talking about which instruction sets (available on newer x86-64
processors) we want to require.

At this point, I'm not sure how strong the motivation is for requiring
certain instruction sets, but it may be worth some effort to talk
about guidelines. As of now, we can decide at run time which methods
to use based on CPU info gathered at daemon start time. See
cpu-info.cc.
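
To make the run-time approach concrete, here is a minimal sketch of the
pattern in Python. It is not Impala's code: the real detection is C++ in
cpu-info.cc, and the names parse_cpu_flags, select_popcount and the two
popcount functions are hypothetical. It assumes Linux-style
/proc/cpuinfo text, where features appear on a "flags" line.

```python
# Illustrative only: Impala's real detection is C++ code in cpu-info.cc.
# All function names below are hypothetical.


def parse_cpu_flags(cpuinfo_text):
    """Return the feature flags found on the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def popcount_portable(x):
    """Fallback for non-negative ints: clear the lowest set bit each pass."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


def popcount_fast(x):
    """Stand-in for an implementation backed by the POPCNT instruction."""
    return bin(x).count("1")


def select_popcount(flags):
    """Pick an implementation once, at startup, from the detected flags."""
    return popcount_fast if "popcnt" in flags else popcount_portable


# Sample text in the shape of /proc/cpuinfo (assumed format).
SAMPLE = "processor\t: 0\nflags\t\t: fpu sse4_1 sse4_2 popcnt avx\n"

# The choice is made once; callers then use `popcount` directly.
popcount = select_popcount(parse_cpu_flags(SAMPLE))
print(popcount(0b1011))
```

The point of the pattern is that the feature check happens once at
start time, so each later call pays only for an indirect call.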

The instruction set in question is CLMUL (carry-less multiplication),
which we believe has been available on all new server-class x86-64
chips from Intel and AMD since Q2 2011. It gives good performance
benefits for spill-to-disk encryption.

We currently use the following, but only through run-time dispatch:

SSSE3(*), SSE4.1, SSE4.2 (Available since late 2011 on both AMD and Intel)
POPCNT (Available since late 2008 on both AMD and Intel)
AVX (late 2011)
AVX2 (late 2015)

One argument for continuing with our current requirements is that
dispatching still gets us good speedup in some cases, and the branch
predictor should take care of some of the latency of dispatching.

One argument for adding more requirements is that not only can
dispatching go away, but we can also pass flags to the compiler that
let it emit newer instructions, which can speed up auto-vectorized
operations or standard library operations. For instance, AVX has
256-bit registers that can speed up bulk memory operations.

A concern I have with setting a time-based rule is that it doesn't
seem easy to me to figure out when, say, AMD *stopped* selling
server-class chips without AVX. So, if we started requiring AVX, we
could have some Impala user with recent AMD chips become unable to run
the latest Impala, which would be a shame.

Thoughts about what we should require?

(*) We spit out an error if the machine does not have SSSE3


New Impala Contributors: IMPALA-3323

2017-11-04 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what
you want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on,
with hopefully enough detail to get you going but not so much as to
take away the fun.


How can we fix https://issues.apache.org/jira/browse/IMPALA-3323,
"impala-shell --ldap_password_cmd has no config file equivalent"?
First, make sure you have your development environment set up. Let's
see if we can reproduce the issue. Once your impala-server is running,
try to launch the impala shell with the --ldap_password_cmd flag set:


$ bin/impala-shell.sh --ldap_password_cmd
Usage: impala_shell.py [options]

impala_shell.py: error: --ldap_password_cmd option requires an argument
$ bin/impala-shell.sh --ldap_password_cmd SOME_ARGUMENT
Option --ldap_password_cmd requires using LDAP authentication mechanism (-l)
$ bin/impala-shell.sh --ldap_password_cmd SOME_ARGUMENT -l
LDAP credentials may not be sent over insecure connections. Enable SSL
or set --auth_creds_ok_in_clear
$ bin/impala-shell.sh --ldap_password_cmd SOME_ARGUMENT -l --auth_creds_ok_in_clear
Starting Impala Shell using LDAP-based authentication
Error retrieving LDAP password (command was: 'SOME_ARGUMENT',
exception was: '[Errno 2] No such file or directory')
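
The sequence of errors above suggests the order in which the shell
checks option combinations. As a rough sketch of that kind of
validation (not the shell's actual code), here is a self-contained
Python version. The parser layout, the validate function, and the
--ldap and --ssl spellings are assumptions for illustration; the error
strings are copied from the output above.

```python
import argparse


def build_parser():
    """Hypothetical parser with only the options needed for this sketch."""
    parser = argparse.ArgumentParser(prog="shell_sketch")
    parser.add_argument("-l", "--ldap", action="store_true")
    parser.add_argument("--ldap_password_cmd")
    parser.add_argument("--ssl", action="store_true")  # assumed spelling
    parser.add_argument("--auth_creds_ok_in_clear", action="store_true")
    return parser


def validate(opts):
    """Return an error message, or None if the combination is acceptable."""
    if opts.ldap_password_cmd and not opts.ldap:
        return ("Option --ldap_password_cmd requires using LDAP "
                "authentication mechanism (-l)")
    if opts.ldap and not (opts.ssl or opts.auth_creds_ok_in_clear):
        return ("LDAP credentials may not be sent over insecure "
                "connections. Enable SSL or set --auth_creds_ok_in_clear")
    return None


if __name__ == "__main__":
    args = ["--ldap_password_cmd", "SOME_ARGUMENT", "-l"]
    print(validate(build_parser().parse_args(args)))
```

Only once both checks pass would the password command itself be run,
which is where the final "No such file or directory" error comes from.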

While not a resounding success, at least we know that the shell can
get past its argument parsing phase! To duplicate the issue referenced
in the ticket, let's create a .impalarc file that tries to set the
ldap_password_cmd option. To see what a valid impalarc file looks
like, grep through the source code for references to it using
"git grep impalarc". You'll see references in
tests/shell/test_shell_commandline.py to the --config_file flag and a
file named good_impalarc. You can find that file using "find . -name
good_impalarc" and try to duplicate the command that test runs. Then,
run it again, but with a config file that sets ldap_password_cmd. What
error do you get? If you grep through the source code, where can you
find that error text referenced? What triggers it, and how can you fix
it?
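
If it helps to have a mental model while you dig, here is a generic
sketch of how a tool can read an INI-style config file and reject
options it does not know about. It is not the shell's implementation:
the [impala] section name, the KNOWN_OPTIONS set, and load_options are
all assumptions made for illustration, so treat good_impalarc and the
shell source as the authority on the real format.

```python
import configparser

# Hypothetical allow-list; the real shell derives its options elsewhere.
KNOWN_OPTIONS = {"impalad", "ldap", "ldap_password_cmd"}


def load_options(text, section="impala"):
    """Parse INI-style text and return the options in `section`.

    Raises ValueError if the section contains a name outside
    KNOWN_OPTIONS. Returns an empty dict if the section is absent.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    if not parser.has_section(section):
        return {}
    options = dict(parser.items(section))
    unknown = sorted(set(options) - KNOWN_OPTIONS)
    if unknown:
        raise ValueError("unrecognized option(s): " + ", ".join(unknown))
    return options


if __name__ == "__main__":
    sample = "[impala]\nldap_password_cmd=echo secret\n"
    print(load_options(sample))
```

In a design like this, supporting a new option in the config file means
adding it to whatever list the loader checks against, which is one
possible shape for the fix you are looking for.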

Once you've solved that mystery and you can make an impala config file
that causes the shell to recognize the ldap_password_cmd option,
you'll want to write a regression test for it. In the
test_shell_commandline.py file, you'll see references to tests of
config files and tests of LDAP options. Use your best judgment on
whether this ticket deserves its own test method or can be folded into
one of the other two. As you iterate, you can test this file with

bin/impala-py.test tests/shell/test_shell_commandline.py::TestImpalaShell::test_ldap_3323

In that example command line, test_ldap_3323 is a test method name -
you can change it to the method name of any other test method in that
file.

Have fun, and don't be afraid to ask d...@impala.apache.org if you have
any questions!