Re: Docker image for impala development

2017-11-15 Thread Jim Apple
I don't know of an up-to-date Docker image or Dockerfile.

When you say "following the apache site docker image for impala dev
steps", I assume you mean this page:

https://cwiki.apache.org/confluence/display/IMPALA/Docker+for+Impala+Developers

It does take a lot of time and a lot of disk space, which is
unfortunate and unlikely to be easily fixable. You could run just the
bootstrap_system.sh script, which will get a working system, but won't
compile the code (which takes perhaps 40 minutes) or load the test
data (which takes something like an hour).

On Wed, Nov 15, 2017 at 10:43 AM, Sandish Kumar HN wrote:
> Hi,
>
> Does anyone have a Docker image for Impala dev? Or at least the Dockerfile
> script?
> I have been following the apache site docker image for impala dev steps,
> which take a lot of time to build and a large amount of disk space,
> which makes it hard to upload the image to Docker Hub.
>
> Please suggest some steps.
> --
> Sent from Gmail Mobile


thread_local compatible with other threading models?

2017-11-14 Thread Jim Apple
A quick git grep shows use of both boost::thread and pthread. C++14 has a
thread_local keyword:

http://eel.is/c++draft/basic.stc.thread

Do we know if the semantics of thread_local in C++14 are compatible with
thread-locality in pthreads and boost::thread?


Re: Outdated information in the wiki

2017-11-14 Thread Jim Apple
Jinchul, I changed the admin permissions for you. I think you now have
access to editing the wiki.

On Mon, Nov 13, 2017 at 4:30 PM, Jin Chul Kim wrote:

> Hi,
>
> It seems the wiki page does not have up-to-date information. Please see
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests#
>
> Since the change "IMPALA-3516: Avoid writing to /tmp in testing", the
> location needs to be updated in the part "Run just end-to-end tests":
>
> # To update the results of tests (The new test files will be located in
> /tmp/test_file_name.test):
> ./tests/run-tests.py --update_results
>
> The new test files will be located at ${IMPALA_EE_TEST_LOGS_DIR}. The
> environment variable should be set by loading impala-config.sh
>
> export IMPALA_EE_TEST_LOGS_DIR="${IMPALA_LOGS_DIR}/ee_tests"
>
> Do you mind if I correct that part? Currently I don't have permission to
> edit or comment on the wiki, so it would be great if one of you could grant
> it to me. My username is "jinchul".
>
> Best regards,
> Jinchul
>
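
For anyone scripting around the updated test files, here is a minimal Python sketch of how the location described above could be resolved. It only mirrors the export line quoted in the message; the function name ee_test_logs_dir is hypothetical and is not part of Impala.

```python
import os


def ee_test_logs_dir(environ=None):
    """Return the directory where updated end-to-end test files are expected.

    Mirrors the quoted shell line:
    IMPALA_EE_TEST_LOGS_DIR="${IMPALA_LOGS_DIR}/ee_tests"
    """
    env = os.environ if environ is None else environ
    # Prefer the variable that impala-config.sh exports, if it is already set.
    explicit = env.get("IMPALA_EE_TEST_LOGS_DIR")
    if explicit:
        return explicit
    # Otherwise derive it the same way the export line does.
    logs_dir = env.get("IMPALA_LOGS_DIR")
    if not logs_dir:
        raise RuntimeError("IMPALA_LOGS_DIR is not set; load impala-config.sh first")
    return os.path.join(logs_dir, "ee_tests")
```

Passing a plain dict instead of the real environment makes the helper easy to check without touching a development box.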


Re: Getting started with 'newbie' tasks

2017-11-12 Thread Jim Apple
https://issues.apache.org/jira/browse/IMPALA-5341 might be a good choice

On Sun, Nov 12, 2017 at 12:58 AM, Sagar Batchu wrote:

> Hi Impala devs,
>
> I came across several Impala tasks listed under Apache's help wanted
> sections. Do you have any recommendations on which 'newbie' tasks to look
> at first? Looking forward to contributing.
>
> Best,
> Sagar Batchu
>


Re: Graduation resolution proposal

2017-11-11 Thread Jim Apple
That vote passed. Now the IPMC will recommend to the ASF board that Impala
be created as a TLP.

The board meets once a month, so they will likely be considering Impala TLP
status on either November 15 or December 20. I will let this list know when
I know more.

On Wed, Nov 8, 2017 at 8:56 PM, Jim Apple <jbap...@cloudera.com> wrote:

> We are now on step 3, in which the IPMC votes on the proposed graduation
> resolution:
>
> https://lists.apache.org/thread.html/4abfbf40b7d822cdc19421ea55de21f19ce70c4fd73c6f4c8cc98ce8@%3Cgeneral.incubator.apache.org%3E
>
> If it passes, the next step is a board resolution:
>
> http://incubator.apache.org/guides/graduation.html#submission_of_the_resolution_to_the_board
>
> On Tue, Oct 31, 2017 at 10:36 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
>> Thanks Jim!
>>
>> -Todd
>>
>> On Tue, Oct 31, 2017 at 10:35 PM, Jim Apple <jbap...@cloudera.com> wrote:
>>
>> > I have sent this to general@ for discussion:
>> >
>> > https://lists.apache.org/thread.html/6b8598408f76a472532923c5a7fc510470b21671677ba3486568c57e@%3Cgeneral.incubator.apache.org%3E
>> >
>> > On Sat, Oct 28, 2017 at 8:12 AM, Jim Apple <jbap...@cloudera.com> wrote:
>> > > Below is a graduation resolution I would like to send to
>> > > general@incubator for discussion. It includes the PMC volunteers as
>> > > well as the result of the first PMC chair election, which was me.
>> > >
>> > > Unless there is objection, I'll send this to general@incubator for
>> > > discussion in a couple of days. If you want to participate in that
>> > > discussion at general@incubator, you can subscribe by emailing
>> > > general-subscr...@incubator.apache.org.
>> > >
>> > > As a reminder, the next steps I will take are:
>> > >
>> > > 1. Prepare a charter (i.e. this email)
>> > >
>> > > 2. Start a discussion on general@incubator.
>> > >
>> > > Should the discussion look mostly positive:
>> > >
>> > > 3. Call a vote on general@incubator.
>> > >
>> > > Should that vote succeed:
>> > >
>> > > 4. Submit the resolution to the ASF Board. See more here:
>> > > http://incubator.apache.org/guides/graduation.html
>> > >
>> > > ---
>> > >
>> > > Establish the Apache Impala Project
>> > >
>> > > WHEREAS, the Board of Directors deems it to be in the best interests of
>> > > the Foundation and consistent with the Foundation's purpose to establish
>> > > a Project Management Committee charged with the creation and maintenance
>> > > of open-source software, for distribution at no charge to the public,
>> > > related to a high-performance distributed SQL engine.
>> > >
>> > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
>> > > (PMC), to be known as the "Apache Impala Project", be and hereby is
>> > > established pursuant to Bylaws of the Foundation; and be it further
>> > >
>> > > RESOLVED, that the Apache Impala Project be and hereby is responsible
>> > > for the creation and maintenance of software related to a
>> > > high-performance distributed SQL engine; and be it further
>> > >
>> > > RESOLVED, that the office of "Vice President, Apache Impala" be and
>> > > hereby is created, the person holding such office to serve at the
>> > > direction of the Board of Directors as the chair of the Apache Impala
>> > > Project, and to have primary responsibility for management of the
>> > > projects within the scope of responsibility of the Apache Impala
>> > > Project; and be it further
>> > >
>> > > RESOLVED, that the persons listed immediately below be and hereby are
>> > > appointed to serve as the initial members of the Apache Impala Project:
>> > >
>> > >  * Alex Behm <ab...@apache.org>
>> > >  * Bharath Vissapragada <bhara...@apache.org>
>> > >  * Brock Noland <br...@apache.org>
>> > >  * Carl Steinbach <c...@apache.org>
>> > >  * Casey Ching <ca...@apache.org>
>> > >  * Daniel Hecht <dhe...@apache.org>
>> > >  * Dimitris Tsirogiannis <dt

Experienced Impala Contributors: writing "New Impala Contributors" walk-throughs

2017-11-11 Thread Jim Apple
If you are an experienced Impala contributor, and you'd like to help with
community outreach, you can, in only an hour of your time, post a
walkthrough of a ticket that a newbie will be able to address.

Get started by looking through open tickets labelled "newbie"
https://issues.apache.org/jira/issues/?filter=12341668

Look for a ticket with a few characteristics:

1. It looks like a new contributor could honestly handle this ticket!
Sometimes the "newbie" label is applied overzealously.
2. It looks like a ticket you could write a quick-and-ugly patch for very
quickly. Nobody will have to ever see it but you!
3. It's not essential to the next release - we wouldn't want an experienced
contributor taking on an issue out of urgency and snatching it away from a
new contributor.
4. It's not so out-of-date that it no longer describes the system as it
currently exists.
5. It hasn't been commented on with a walkthrough already.

If you can't find a ticket like this that is labelled "newbie", look for
newer open tickets with priority "Major" or below. The last community
effort to go through open tickets and label some "newbie" may have been
months before you are looking, so there may be good tickets for new
contributors that just haven't been labelled "newbie" yet.

Once you find a ticket you like, get a document ready to take notes in, and
start hacking up a patch, just for yourself, to make sure you know what
direction to point newbies in. Your notes document does not need to include
any of the code you write, only hints and signposts. You don't need your
code or prose to be perfect. Writing a quick-and-dirty patch and doing your
writeup of how you did it should take an hour or less. It's OK to remind
new contributors that they can choose to only partially fix an issue, as
long as the state of the code is still coherent and lends itself to being
fixed more, later.

Once you're done, prepend to your writeup a note about how to get started.
I use the following text, but feel free to improvise:

"If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much to take away the
fun."

Append to your writeup a note of encouragement, like "Have fun, and you can
ask d...@impala.apache.org for help if you need a hint!"

Now post your writeup to d...@impala.apache.org. You can email me when
you're done and I will post it two more places:

1. I'll wait a few minutes to see your writeup on
https://lists.apache.org/list.html?d...@impala.apache.org. I'll click the
message title, then click "Permalink", then copy the URL and post that to
the ticket.
2. I'll post to https://helpwanted.apache.org. Here's an example of what it
will look like:
https://helpwanted.apache.org/task.html?b1b131fffb24afb17f52f7aae67beb73034832a7

If you're feeling energetic, feel free to do this cross-posting yourself.

Have fun, and don't hesitate to email me with any questions!


Re: Graduation resolution proposal

2017-11-08 Thread Jim Apple
We are now on step 3, in which the IPMC votes on the proposed graduation
resolution:

https://lists.apache.org/thread.html/4abfbf40b7d822cdc19421ea55de21f19ce70c4fd73c6f4c8cc98ce8@%3Cgeneral.incubator.apache.org%3E

If it passes, the next step is a board resolution:

http://incubator.apache.org/guides/graduation.html#submission_of_the_resolution_to_the_board

On Tue, Oct 31, 2017 at 10:36 PM, Todd Lipcon <t...@cloudera.com> wrote:

> Thanks Jim!
>
> -Todd
>
> On Tue, Oct 31, 2017 at 10:35 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
> > I have sent this to general@ for discussion:
> >
> > https://lists.apache.org/thread.html/6b8598408f76a472532923c5a7fc510470b21671677ba3486568c57e@%3Cgeneral.incubator.apache.org%3E
> >
> > On Sat, Oct 28, 2017 at 8:12 AM, Jim Apple <jbap...@cloudera.com> wrote:
> > > Below is a graduation resolution I would like to send to
> > > general@incubator for discussion. It includes the PMC volunteers as
> > > well as the result of the first PMC chair election, which was me.
> > >
> > > Unless there is objection, I'll send this to general@incubator for
> > > discussion in a couple of days. If you want to participate in that
> > > discussion at general@incubator, you can subscribe by emailing
> > > general-subscr...@incubator.apache.org.
> > >
> > > As a reminder, the next steps I will take are:
> > >
> > > 1. Prepare a charter (i.e. this email)
> > >
> > > 2. Start a discussion on general@incubator.
> > >
> > > Should the discussion look mostly positive:
> > >
> > > 3. Call a vote on general@incubator.
> > >
> > > Should that vote succeed:
> > >
> > > 4. Submit the resolution to the ASF Board. See more here:
> > > http://incubator.apache.org/guides/graduation.html
> > >
> > > 
> > ---
> > >
> > > Establish the Apache Impala Project
> > >
> > > > WHEREAS, the Board of Directors deems it to be in the best interests of
> > > > the Foundation and consistent with the Foundation's purpose to establish
> > > > a Project Management Committee charged with the creation and maintenance
> > > > of open-source software, for distribution at no charge to the public,
> > > > related to a high-performance distributed SQL engine.
> > >
> > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
> > > (PMC), to be known as the "Apache Impala Project", be and hereby is
> > > established pursuant to Bylaws of the Foundation; and be it further
> > >
> > > RESOLVED, that the Apache Impala Project be and hereby is responsible
> > > for the creation and maintenance of software related to a
> > > high-performance distributed SQL engine; and be it further
> > >
> > > RESOLVED, that the office of "Vice President, Apache Impala" be and
> > > hereby is created, the person holding such office to serve at the
> > > direction of the Board of Directors as the chair of the Apache Impala
> > > Project, and to have primary responsibility for management of the
> > > projects within the scope of responsibility of the Apache Impala
> > > Project; and be it further
> > >
> > > RESOLVED, that the persons listed immediately below be and hereby are
> > > appointed to serve as the initial members of the Apache Impala Project:
> > >
> > > >  * Alex Behm <ab...@apache.org>
> > > >  * Bharath Vissapragada <bhara...@apache.org>
> > > >  * Brock Noland <br...@apache.org>
> > > >  * Carl Steinbach <c...@apache.org>
> > > >  * Casey Ching <ca...@apache.org>
> > > >  * Daniel Hecht <dhe...@apache.org>
> > > >  * Dimitris Tsirogiannis <dtsirogian...@apache.org>
> > > >  * Henry Robinson <he...@apache.org>
> > > >  * Ishaan Joshi <ish...@apache.org>
> > > >  * Jim Apple <jbap...@apache.org>
> > > >  * John Russell <jruss...@apache.org>
> > > >  * Juan Yu <j...@apache.org>
> > > >  * Lars Volker <l...@apache.org>
> > > >  * Lenni Kuff <lsk...@apache.org>
> > > >  * Marcel Kornacker <mar...@apache.org>
> > > >  * Martin Grund <mgr...@apache.org>
> > > >  * Matthew Jacobs <mjac...@apache.org>
> > >  * Michael Brown

S3 connections

2017-11-08 Thread Jim Apple
http://impala.apache.org/docs/build/html/topics/impala_s3.html
recommends "Set the safety valve fs.s3a.connection.maximum to 1500 for
impalad." For best performance, should this be increased for nodes
with very high CPU, RAM, or bandwidth? Or decreased for less-beefy
nodes?


Re: A question about loading data by functional-query workload

2017-11-08 Thread Jim Apple
The recommended way to get a development environment set up calls
https://github.com/apache/incubator-impala/blob/master/bin/bootstrap_development.sh
which calls ./buildall.sh -noclean -format -testdata -skiptests. That
is the recommended way to load the test data.
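
If you would rather drive that step from a script, a minimal Python sketch follows. The flags are copied from the sentence above; the helper names and the idea of passing the checkout path as impala_home are my own assumptions, not anything the Impala scripts define. My understanding is that -format reformats the local test cluster, so treat it as destructive.

```python
import os
import subprocess

# Flags copied from the message above.
BUILDALL_FLAGS = ["-noclean", "-format", "-testdata", "-skiptests"]


def buildall_command(impala_home):
    """Build the argument list for the data-loading build."""
    return [os.path.join(impala_home, "buildall.sh")] + BUILDALL_FLAGS


def load_test_data(impala_home):
    """Run buildall.sh from the checkout root and return its exit code."""
    return subprocess.call(buildall_command(impala_home), cwd=impala_home)
```

Keeping the command construction separate from the call makes it possible to print or inspect the command before running a build that takes this long.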

On Tue, Nov 7, 2017 at 10:17 PM, Jin Chul Kim wrote:
> Hi,
>
> I would like to run E2E test using the document:
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
> .
>
> Here is my command: ./tests/run-tests.py query_test/test_queries.py -k
> TestQueriesTextTables
>
> It failed because the functional database was not found. Should I load data
> manually?
> Anyway, I found ./bin/load-data.py by chance and I ran it with the
> command: ./bin/load-data.py -w functional-query
> However, it failed because it could not find a file matching the path
> ${IMPALA_HOME}/testdata/target/AllTypes/090101.txt. I can't find the
> directory ${IMPALA_HOME}/testdata/target/AllTypes. I guess it is
> generated internally. Would you please guide me?
>
> 0: jdbc:hive2://localhost:11050/default> LOAD DATA LOCAL INPATH
> '/home/jinchulkim/workspace/Impala/testdata/target/AllTypes/090101.txt'
> OVERWRITE INTO TABLE functional.alltypes PARTITION(year=2009, month=1);
> going to print operations logs
> printed operations logs
> Getting log thread is interrupted, since query is done!
> Error: Error while compiling statement: FAILED: SemanticException Line 1:23
> Invalid path
> ''/home/jinchulkim/workspace/Impala/testdata/target/AllTypes/090101.txt'':
> No files matching path
> file:/home/jinchulkim/workspace/Impala/testdata/target/AllTypes/090101.txt
> (state=42000,code=4)
> org.apache.hive.service.cli.HiveSQLException: Error while compiling
> statement: FAILED: SemanticException Line 1:23 Invalid path
> ''/home/jinchulkim/workspace/Impala/testdata/target/AllTypes/090101.txt'':
> No files matching path
> file:/home/jinchulkim/workspace/Impala/testdata/target/AllTypes/090101.txt
>
> Best regards,
> Jinchul


Re: Mentoring Programme, was Re: APACHE - IMPALA

2017-11-06 Thread Jim Apple
Moved dev@ to BCC, in case this was sent there by mistake.

Don't worry about the timing, and feel free to ask dev@ for help if
you could use a hint.

On Mon, Nov 6, 2017 at 10:45 AM, kenneth mcfarland
<kennethpmcfarl...@gmail.com> wrote:
> Jim, I'm having a hard time finding the email you sent me recently. I
> wanted to let you know that I'm still working on that issue and my
> development box is set up. I'm sorry I haven't made more progress on it
> sooner; I am a student with some other stuff slowing me down right now.
> However, I am going to finish this before the end of the week for sure,
> unless for some reason it turns out to be a huge issue, which I don't
> expect. So thank you for staying patient with me. I will reach out if I
> have any problems; so far everything has been smooth setting up under
> Ubuntu. Also, I am not going to sit on this; it is going to get done, so
> thank you for being supportive. I will do my best to stick to that time
> frame.
>
> On Oct 24, 2017 1:22 PM, "kenneth mcfarland" <kennethpmcfarl...@gmail.com>
> wrote:
>
>> Thank you for doing that, sorry about the title. I'm going to learn and
>> become useful either way, it would mostly have been for resumes so I can be
>> patient. Thank you for your reply.
>>
>> On Oct 24, 2017 1:14 PM, "Jim Apple" <jbap...@cloudera.com> wrote:
>>
>>> I am glad that you have made a decision to contribute. This is a good
>>> mailing list for talking about issues like this. I have changed the
>>> subject to make it reflect the topic of conversation - a topic like the
>>> original one ("APACHE - IMPALA") applies to all threads on this mailing
>>> list.
>>>
>>> https://community.apache.org/mentoringprogramme.html
>>>
>>> I'm not sure Impala qualifies, since Impala is not a "Top Level Project",
>>> (aka "TLP"). Impala is "Incubating". We may become a "Top Level Project"
>>> at some point, and in fact are hoping to do so, soon, but there is no
>>> guarantee.
>>>
>>>
>>> On Tue, Oct 24, 2017 at 12:25 PM, kenneth mcfarland <
>>> kennethpmcfarl...@gmail.com> wrote:
>>>
>>> > Hi Impala Community!
>>> >
>>> > I just took on IMPALA-5392 and I'm getting set up. My language of
>>> > choice is Java but I secretly love assembler code, so it is cool to do
>>> > some work in C++; all my previous contributions to ASF have been in Java.
>>> >
>>> > Anyways, I have been searching like heck to find someone to do the ASF
>>> > mentor programme with me (it is a light load on the mentor, not much). If
>>> > there is anyone here that thinks they can do this, it would be really
>>> > cool! It's great for ASF, your resume, community in general, and it helps
>>> > get people up to speed.
>>> >
>>> > Thank you guys so much in advance
>>> >
>>> > Kenneth
>>> >
>>>
>>


Re: Which instruction set extensions should Impala require?

2017-11-06 Thread Jim Apple
I like those ideas. Filed https://issues.apache.org/jira/browse/IMPALA-6166

On Mon, Nov 6, 2017 at 11:26 AM, Todd Lipcon <t...@cloudera.com> wrote:
> FWIW in Kudu we draw the line at requiring SSE4.2 but not AVX. I think this
> means we support Westmere (2010) and potentially also 1st gen Nehalem
> (2008). Going back older than 2008 definitely seems excessively generous.
>
> Another thing to keep in mind is that some virtualization software may not
> pass through all instruction set extensions. It would be worth double
> checking that relatively recent versions of VirtualBox or other
> commonly-used desktop VM options pass through AVX properly before making it
> a requirement.
>
> -Todd
>
> On Mon, Nov 6, 2017 at 10:33 AM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
>> I don't believe that we've seen people "seriously" using Impala with older
>> hardware than that recently, but it's hard to know for sure.
>>
>> It's nice for adoption to support relatively old hardware - one thing we
>> have seen before is people installing Impala on an old cluster to try it
>> out. E.g. they wanted to try Impala and had four old servers sitting around
>> unused. I don't think we should optimise too much for that case, but it's
>> an argument for supporting ~5 year old hardware that has probably been
>> retired from its original purpose.
>>
>> One option we could consider is drawing the line at
>> https://en.wikipedia.org/wiki/Sandy_Bridge and
>> https://en.wikipedia.org/wiki/Bulldozer_(microarchitecture)#Instruction_set_extensions.
>> That would give us CLMUL, POPCNT, SSSE3, SSE4.1, SSE4.2 and AVX. It doesn't
>> look like AVX2 was available on AMD chips until 2015.
>>
>> It seems less disruptive if we drop support for older processors in a major
>> release - i.e. Impala 3.0. I don't think that needs to be a strict policy
>> of never dropping hardware support in a minor release but I think it's more
>> convenient for users.
>>
>> On Sat, Nov 4, 2017 at 10:53 PM, Jim Apple <jbap...@cloudera.com> wrote:
>>
>> > In a discussion on https://issues.apache.org/jira/browse/IMPALA-6128,
>> > we are talking about which instruction sets (available on newer x86-64
>> > processors) we want to require.
>> >
>> > At this point, I'm not sure how strong the motivation is for requiring
>> > certain instruction sets, but it may be worth some effort to talk
>> > about guidelines. As of now, we can decide at run time which methods
>> > to use based on CPU info gathered at daemon start time. See
>> > cpu-info.cc.
>> >
>> > The instruction in this case is the CLMUL instruction, which we
>> > believe was available on all new server-class x86-64 chips by Intel
>> > and AMD as of Q2, 2011. It has good performance benefits for
>> > spill-to-disk encryption.
>> >
>> > We currently use the following, but only dispatching at run time:
>> >
>> > SSSE3(*), SSE4.1, SSE4.2 (Available since late 2011 on both AMD and Intel)
>> > POPCNT (Available since late 2008 on both AMD and Intel)
>> > AVX (late 2011)
>> > AVX2 (late 2015)
>> >
>> > One argument for continuing with our current requirements is that
>> > dispatching still gets us good speedup in some cases, and the branch
>> > predictor should take care of some of the latency of dispatching.
>> >
>> > One argument for adding more requirements is that not only can
>> > dispatching go away, but we can add flags to the compilers to use
>> > later instructions, which can speed up auto-vectorized operations or
>> > standard library operations. For instance, AVX has 256-bit registers
>> > that can speed up bulk memory operations.
>> >
>> > A concern I have with setting a time-based rule is that it doesn't
>> > seem easy to me to figure out when, say, AMD *stopped* selling
>> > server-class chips without AVX. So, if we started requiring AVX, we
>> > could have some Impala user with recent AMD chips become unable to run
>> > the latest Impala, which would be a shame.
>> >
>> > Thoughts about what we should require?
>> >
>> > (*) We spit out an error if the machine does not have SSSE3
>> >
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera


Re: IMPALA-5607

2017-11-05 Thread Jim Apple
My intuition is that this would not be as easy as decimal_v2 to put
behind a flag. Is that right?



On Wed, Nov 1, 2017 at 2:26 PM, Greg Rahn wrote:
> It might not be that simple.
>
> One side-effect of the return type promotion of EXTRACT and DATE_PART() is
> that INSERTs that use either could potentially fail after this change
> simply because of the type mismatch and the fact that Impala does not do
> implicit type casting even if the value could fit.
>
> For example:
> create table t(i int);  -- assume this is a destination for any EXTRACT or
> DATE_PART() value
> insert into t values( cast(10 as bigint));  -- emulating BIGINT return type
> ERROR: Expression 'cast(10 as bigint)' (type: BIGINT) would need to be cast
> to INT for column 'I'
>
> The solution here is an obvious CAST because w/o the nanosecond support, no
> values can overflow INT, but that still requires changes from end users.
>
> Similarly with #2, there is a value change.  Arguably a good change, but a
> change nonetheless.  Again, there is a trivial solution to provide an
> equivalent expression that returns the legacy value, but again, a change.
>
> Based on this I would suggest we apply this change to the 3.0 branch and
> make sure we call out both behavioral changes in the 3.0 docs.
>
>
>
> On Tue, Oct 31, 2017 at 2:48 PM, Zachary Amsden wrote:
>
>> In discussion about IMPALA-5607, it came up that implementing this as
>> described is a compatibility breaking change.  There are a couple of
>> required changes:
>>
>> 1) Return data type for date_part and EXTRACT FROM must be promoted to
>> BIGINT.
>> 2) MILLISECONDS now will include seconds part in the calculation of
>> milliseconds.
>>
>> I think the first is a non-issue; we're promoting the type to be wide
>> enough to hold nanoseconds precision values (with the seconds component
>> included).  An alternative could be to return this as a decimal type, but
>> that seems rather unwieldy for other date expressions so I'd prefer these
>> values to all be returned as integral types.
>>
>> The bigger issue is including seconds in the calculation of milliseconds,
>> microseconds and nanoseconds breaks the existing value returned for
>> milliseconds, which is just bare milliseconds with no seconds component.
>> For compatibility with other SQL implementations, I think we'd like to
>> include seconds with all of these date parts, but that is certainly
>> debatable.
>>
>> The question then, is anyone relying on this functionality that can't
>> easily work around such a change?  The Impala documentation doesn't specify
>> this behavior either way, and there isn't a formal specification for how
>> sub-second granularity time is handled.  Whatever we decide, we should
>> document this going forward.
>>
>>  - Zach
>>
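
To make the two behavioral changes in this thread concrete, here is a small Python sketch of the arithmetic as I read the descriptions above. It is an illustration only, not Impala's implementation: the function names are hypothetical, and leap seconds are ignored.

```python
from datetime import datetime

INT_MAX = 2**31 - 1  # largest value a 32-bit signed INT can hold


def legacy_milliseconds(ts):
    """Bare milliseconds with no seconds component (the existing behavior)."""
    return ts.microsecond // 1000


def milliseconds_with_seconds(ts):
    """Milliseconds including the seconds part (the proposed behavior)."""
    return ts.second * 1000 + ts.microsecond // 1000


def max_part_value(units_per_second):
    """Largest sub-minute value when the seconds field (0-59) is included."""
    return 59 * units_per_second + (units_per_second - 1)


ts = datetime(2017, 11, 1, 12, 34, 56, 789000)
print(legacy_milliseconds(ts), milliseconds_with_seconds(ts))
print(max_part_value(10**6) <= INT_MAX, max_part_value(10**9) <= INT_MAX)
```

Under these assumptions microseconds-with-seconds still fits in INT, while nanoseconds-with-seconds does not, which matches the point that only nanosecond support forces the promotion to BIGINT.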


Which instruction set extensions should Impala require?

2017-11-04 Thread Jim Apple
In a discussion on https://issues.apache.org/jira/browse/IMPALA-6128,
we are talking about which instruction sets (available on newer x86-64
processors) we want to require.

At this point, I'm not sure how strong the motivation is for requiring
certain instruction sets, but it may be worth some effort to talk
about guidelines. As of now, we can decide at run time which methods
to use based on CPU info gathered at daemon start time. See
cpu-info.cc.

The instruction in this case is the CLMUL instruction, which we
believe was available on all new server-class x86-64 chips by Intel
and AMD as of Q2, 2011. It has good performance benefits for
spill-to-disk encryption.

We currently use the following, but only dispatching at run time:

SSSE3(*), SSE4.1, SSE4.2 (Available since late 2011 on both AMD and Intel)
POPCNT (Available since late 2008 on both AMD and Intel)
AVX (late 2011)
AVX2 (late 2015)

One argument for continuing with our current requirements is that
dispatching still gets us good speedup in some cases, and the branch
predictor should take care of some of the latency of dispatching.

One argument for adding more requirements is that not only can
dispatching go away, but we can add flags to the compilers to use
later instructions, which can speed up auto-vectorized operations or
standard library operations. For instance, AVX has 256-bit registers
that can speed up bulk memory operations.

A concern I have with setting a time-based rule is that it doesn't
seem easy to me to figure out when, say, AMD *stopped* selling
server-class chips without AVX. So, if we started requiring AVX, we
could have some Impala user with recent AMD chips become unable to run
the latest Impala, which would be a shame.

Thoughts about what we should require?

(*) We spit out an error if the machine does not have SSSE3
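
As a rough illustration of the "require some extensions, dispatch on the rest" idea, here is a Python sketch. It is not how cpu-info.cc works (that is C++ inside the daemon); it assumes an x86-64 Linux machine where /proc/cpuinfo has a "flags" line using the names ssse3, sse4_1, sse4_2, popcnt, avx and avx2, and the function names are hypothetical.

```python
# Extensions treated as mandatory in this sketch (only SSSE3, per the footnote).
REQUIRED = {"ssse3"}
# Optional extensions, most capable first; the first one present wins.
OPTIONAL_BEST_FIRST = ["avx2", "avx", "sse4_2", "sse4_1"]


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line found."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def choose_implementation(flags):
    """Fail if a required extension is missing, else pick the best optional one."""
    missing = REQUIRED - set(flags)
    if missing:
        raise RuntimeError("CPU lacks required extensions: " + ", ".join(sorted(missing)))
    for name in OPTIONAL_BEST_FIRST:
        if name in flags:
            return name
    return "baseline"
```

On a real machine you could call parse_cpu_flags(open("/proc/cpuinfo").read()). Running the same check inside a VM is one way to see whether the hypervisor passes a given extension through.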


New Impala Contributors: IMPALA-3323

2017-11-04 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what
you want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on,
with hopefully enough detail to get you going but not so much to take
away the fun.


How can we fix https://issues.apache.org/jira/browse/IMPALA-3323,
"impala-shell --ldap_password_cmd has no config file equivalent"?
First, make sure you have your development environment set up. Let's
see if we can reproduce the issue. Once your impala-server is running,
try to launch the impala shell with the --ldap_password_cmd flag set:


$ bin/impala-shell.sh --ldap_password_cmd
Usage: impala_shell.py [options]

impala_shell.py: error: --ldap_password_cmd option requires an argument
$ bin/impala-shell.sh --ldap_password_cmd SOME_ARGUMENT
Option --ldap_password_cmd requires using LDAP authentication mechanism (-l)
$ bin/impala-shell.sh --ldap_password_cmd SOME_ARGUMENT -l
LDAP credentials may not be sent over insecure connections. Enable SSL
or set --auth_creds_ok_in_clear
$ bin/impala-shell.sh --ldap_password_cmd SOME_ARGUMENT -l
--auth_creds_ok_in_clear
Starting Impala Shell using LDAP-based authentication
Error retrieving LDAP password (command was: 'SOME_ARGUMENT',
exception was: '[Errno 2] No such file or directory')

While not a resounding success, at least we know that the shell can
get past its argument parsing phase! To duplicate the issue referenced
in the ticket, let's create a .impalarc file that should recognize
that the --ldap_password_cmd flag is set. To see how a valid impalarc
file looks, grep through the source code for references to it using
"git grep impalarc". You'll see references in
tests/shell/test_shell_commandline.py to the --config_file flag and a
file named good_impalarc. You can find that file using "find . -name
good_impalarc" and try to duplicate the command. Then, run it again,
but with a config file with a reference to ldap_password_cmd. What
error do you get? If you grep through the source code, where can you
find that error text referenced? What triggers it, and how can you fix
it?
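
If you want to generate experimental config files rather than hand-edit them, here is a minimal Python sketch. It assumes an INI-style file with a single section named "impala"; check that against good_impalarc, since the section name is my assumption. It uses Python 3's configparser, which may differ from the module the shell itself uses, and the helper names are hypothetical. It does not answer the questions above; it only produces a file to experiment with.

```python
import configparser
import os
import tempfile


def write_config(path, options, section="impala"):
    """Write `options` (a dict of str -> str) under one section."""
    parser = configparser.ConfigParser()
    parser[section] = options
    with open(path, "w") as f:
        parser.write(f)


def read_config(path, section="impala"):
    """Read the options of one section back as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read(path)
    return dict(parser[section])


def demo():
    # SOME_ARGUMENT is the placeholder used in the shell session above.
    options = {"verbose": "true", "ldap_password_cmd": "SOME_ARGUMENT"}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "impalarc_with_ldap")
        write_config(path, options)
        return read_config(path)
```

Outside the demo, pass a path you can hand to the --config_file flag mentioned above.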

Once you've solved that mystery and you can make an impala config file
that causes the shell to recognize the ldap_password_cmd option,
you'll want to write a regression test for it. In the
test_shell_commandline.py file, you'll see references to tests of
config files and tests of LDAP options. Use your best judgment on
whether this ticket deserves its own test method or can be folded into
one of the other two. As you iterate, you can test this file with

bin/impala-py.test
tests/shell/test_shell_commandline.py::TestImpalaShell::test_ldap_3323

In that example command line, test_ldap_3323 is a test method name -
you can change it to the method name of any other test method in that
file.

Have fun, and don't be afraid to ask d...@impala.apache.org if you have
any questions!


Re: Graduation resolution proposal

2017-10-31 Thread Jim Apple
I have sent this to general@ for discussion:

https://lists.apache.org/thread.html/6b8598408f76a472532923c5a7fc510470b21671677ba3486568c57e@%3Cgeneral.incubator.apache.org%3E

On Sat, Oct 28, 2017 at 8:12 AM, Jim Apple <jbap...@cloudera.com> wrote:
> Below is a graduation resolution I would like to send to
> general@incubator for discussion. It includes the PMC volunteers as
> well as the result of the first PMC chair election, which was me.
>
> Unless there is objection, I'll send this to general@incubator for
> discussion in a couple of days. If you want to participate in that
> discussion at general@incubator, you can subscribe by emailing
> general-subscr...@incubator.apache.org.
>
> As a reminder, the next steps I will take are:
>
> 1. Prepare a charter (i.e. this email)
>
> 2. Start a discussion on general@incubator.
>
> Should the discussion look mostly positive:
>
> 3. Call a vote on general@incubator.
>
> Should that vote succeed:
>
> 4. Submit the resolution to the ASF Board. See more here:
> http://incubator.apache.org/guides/graduation.html
>
> ---
>
> Establish the Apache Impala Project
>
> WHEREAS, the Board of Directors deems it to be in the best interests of
> the Foundation and consistent with the Foundation's purpose to establish
> a Project Management Committee charged with the creation and maintenance
> of open-source software, for distribution at no charge to the public,
> related to a high-performance distributed SQL engine.
>
> NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
> (PMC), to be known as the "Apache Impala Project", be and hereby is
> established pursuant to Bylaws of the Foundation; and be it further
>
> RESOLVED, that the Apache Impala Project be and hereby is responsible
> for the creation and maintenance of software related to a
> high-performance distributed SQL engine; and be it further
>
> RESOLVED, that the office of "Vice President, Apache Impala" be and
> hereby is created, the person holding such office to serve at the
> direction of the Board of Directors as the chair of the Apache Impala
> Project, and to have primary responsibility for management of the
> projects within the scope of responsibility of the Apache Impala
> Project; and be it further
>
> RESOLVED, that the persons listed immediately below be and hereby are
> appointed to serve as the initial members of the Apache Impala Project:
>
>  * Alex Behm             <ab...@apache.org>
>  * Bharath Vissapragada  <bhara...@apache.org>
>  * Brock Noland          <br...@apache.org>
>  * Carl Steinbach        <c...@apache.org>
>  * Casey Ching           <ca...@apache.org>
>  * Daniel Hecht          <dhe...@apache.org>
>  * Dimitris Tsirogiannis <dtsirogian...@apache.org>
>  * Henry Robinson        <he...@apache.org>
>  * Ishaan Joshi          <ish...@apache.org>
>  * Jim Apple             <jbap...@apache.org>
>  * John Russell          <jruss...@apache.org>
>  * Juan Yu               <j...@apache.org>
>  * Lars Volker           <l...@apache.org>
>  * Lenni Kuff            <lsk...@apache.org>
>  * Marcel Kornacker      <mar...@apache.org>
>  * Martin Grund          <mgr...@apache.org>
>  * Matthew Jacobs        <mjac...@apache.org>
>  * Michael Brown         <mi...@apache.org>
>  * Michael Ho            <k...@apache.org>
>  * Sailesh Mukil         <sail...@apache.org>
>  * Skye Wanderman-Milne  <s...@apache.org>
>  * Taras Bobrovytsky     <taras...@apache.org>
>  * Tim Armstrong         <tarmstr...@apache.org>
>  * Todd Lipcon           <t...@apache.org>
>
> NOW, THEREFORE, BE IT FURTHER RESOLVED, that Jim Apple be appointed to
> the office of Vice President, Apache Impala, to serve in accordance with
> and subject to the direction of the Board of Directors and the Bylaws of
> the Foundation until death, resignation, retirement, removal or
> disqualification, or until a successor is appointed; and be it further
>
> RESOLVED, that the initial Apache Impala PMC be and hereby is tasked
> with the creation of a set of bylaws intended to encourage open
> development and increased participation in the Apache Impala Project;
> and be it further
>
> RESOLVED, that the Apache Impala Project be and hereby is tasked with
> the migration and rationalization of the Apache Incubator Impala
> podling; and be it further
>
> RESOLVED, that all responsibilities pertaining to the Apache Incubator
> Impala podling encumbered upon the Apache Incubator PMC are hereafter
> discharged.


PMC Chair vote results in detail

2017-10-30 Thread Jim Apple
In response to a voter query, the detailed vote results are as follows:

The final electronic tally was 11-6. However, Marcel and I were both
sent two ballots, for technical reasons, and for the same reason,
Marcel was unable to vote either time. That makes the final actual
tally 10-7.


Re: Wiki update on triaging test failures

2017-10-27 Thread Jim Apple
Nice work, Phil!

On Fri, Oct 27, 2017 at 9:39 AM, Philip Zeyliger 
wrote:

> I wrote up some quick notes on how to look at logs on a Jenkins server to
> figure out what failed. They're at
> https://cwiki.apache.org/confluence/pages/diffpagesbyversion.action?
> pageId=65147141=18=19
> (diff) or
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
> (raw).
>
> If anyone has tricks up their sleeve, please do share them.
>
> Comments/edits welcome!
>
> -- Philip
>


New Impala Contributors: IMPALA-941

2017-10-26 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much as to take away
the fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-941, "Impala
Parser issue when using fully qualified table names that start with a
number"? First, you'll want to get your development environment set up and
make sure the parse tests are passing. Follow the examples on
https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
:

(pushd fe && mvn -fae test -Dtest=ParserTest)

This is running the tests in the file
fe/src/test/java/org/apache/impala/analysis/ParserTest.java. Now that you
have checked your development environment is working, add a new test to
ParserTest.java. There is an example of a statement that fails to parse in
the ticket. Given that test case, can you find a method in ParserTest.java
that should be testing this statement? If not, make a new test method
annotated with @Test and with a method name starting with "Test".

Now run the test again. It should fail and give an error message similar to
the one in the ticket. You should now be ready to fix the bug.

The lexing and parsing of SQL are performed in
fe/src/main/jflex/sql-scanner.flex and fe/src/main/cup/sql-parser.cup,
respectively. The error message indicates "Encountered: DECIMAL LITERAL".
If you run "git grep 'DECIMAL LITERAL'", you will see that this is
referenced only in sql-scanner.flex. This is because a decimal literal is
lexed as a single token. In other words, in the query listed in the ticket,
"INVALIDATE METADATA db.571_market", "db" is lexed as IdentifierOrKw,
".571" is lexed as a DecimalLiteral, and "_market" is lexed as
IdentifierOrKw.

To fix this, you need "db.571_market" to be lexed as the sequence
IdentifierOrKw SqlParserSymbols.DOT IdentifierOrKw. The parser in
sql-parser.cup can then match that sequence as a table_name. For that to
work, the lexer must not over-eagerly identify a DecimalLiteral. You can
probably achieve that by delaying the recognition of decimal literals to
the parser. Try to translate the lexer's definition of DecimalLiteral to a
definition that works in the parser.
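
To make the lexing problem concrete, here is a toy longest-match tokenizer in Python. It is only a sketch: the regular expressions are simplified stand-ins for the rules in sql-scanner.flex, the token names are invented for this example, and whitespace is not handled. It shows the eager behavior described above and one way the token stream could look if decimal recognition were left to the parser.

```python
import re

# Simplified stand-ins for the scanner rules; the real definitions in
# sql-scanner.flex differ in detail.
EAGER_RULES = [
    ("DECIMAL_LITERAL", re.compile(r"\d+\.\d*|\.\d+")),
    ("IDENT", re.compile(r"\d*[A-Za-z_][A-Za-z0-9_]*")),
    ("DOT", re.compile(r"\.")),
]

# Hypothetical alternative: the lexer only emits integers and dots, and the
# parser would be responsible for assembling decimal literals from them.
DEFERRED_RULES = [
    ("INTEGER", re.compile(r"\d+")),
    ("IDENT", re.compile(r"\d*[A-Za-z_][A-Za-z0-9_]*")),
    ("DOT", re.compile(r"\.")),
]


def lex(text, rules):
    """Tokenize text, always taking the longest match; ties go to the earlier rule."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_name, best_len = None, 0
        for name, pattern in rules:
            match = pattern.match(text, pos)
            if match and len(match.group()) > best_len:
                best_name, best_len = name, len(match.group())
        if best_name is None:
            raise ValueError("cannot lex %r at offset %d" % (text, pos))
        tokens.append((best_name, text[pos:pos + best_len]))
        pos += best_len
    return tokens


print(lex("db.571_market", EAGER_RULES))
# [('IDENT', 'db'), ('DECIMAL_LITERAL', '.571'), ('IDENT', '_market')]
print(lex("db.571_market", DEFERRED_RULES))
# [('IDENT', 'db'), ('DOT', '.'), ('IDENT', '571_market')]
print(lex("1.5", DEFERRED_RULES))
# [('INTEGER', '1'), ('DOT', '.'), ('INTEGER', '5')]
```

With the deferred rules, "1.5" arrives at the parser as three tokens, so the grammar would need a production that puts them back together as a decimal literal.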

You'll probably find the manuals for the lexer and the parser useful:

http://jflex.de/manual.html
http://www2.cs.tum.edu/projects/cup/docs.php

Have fun! Once all the tests are passing again, you can send your patch for
review following
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.


Mentoring Programme, was Re: APACHE - IMPALA

2017-10-24 Thread Jim Apple
I am glad that you have made a decision to contribute. This is a good
mailing list for talking about issues like this. I have changed the subject
to make it reflect the topic of conversation - a topic like the original
one ("APACHE - IMPALA") applies to all threads on this mailing list.

https://community.apache.org/mentoringprogramme.html

I'm not sure Impala qualifies, since Impala is not a "Top Level Project"
(aka "TLP"). Impala is "Incubating". We may become a "Top Level Project" at
some point, and in fact are hoping to do so soon, but there is no
guarantee.


On Tue, Oct 24, 2017 at 12:25 PM, kenneth mcfarland <
kennethpmcfarl...@gmail.com> wrote:

> Hi Impala Community!
>
> I just took on IMPALA-5392 and I'm getting set up. My language of choice is
> java but I secretly love assembler code so this is cool to do some work in
> C++, all my previous contributions to ASF have been in java.
>
> Anyways, I have been searching like heck to find someone to do the ASF
> mentor programme with me (it is a light load on mentor, not much). If there
> is anyone here that thinks they can do this, it would be really cool! It's
> great for ASF, your resume, community in general, and it helps get people
> up to speed.
>
> Thank you guys so much in advance
>
> Kenneth
>


Re: Will there be a 2.12.0 release?

2017-10-24 Thread Jim Apple
Do we want to have a 3.0 process, where one person tracks all of the open
breaking-change JIRAs and makes sure nothing gets accidentally left out? I
ask this because, if the answer is "yes", we might make the 2.12 decision
based on scope and quantity of 3.0 JIRAs.

On Tue, Oct 24, 2017 at 10:07 AM, Tim Armstrong 
wrote:

> I was just retargeting some JIRAs from 2.11 to a later release. I'm
> wondering if people had thoughts on whether we should have a 2.12 release
> before 3.0?
>
> We have a lot of breaking changes queued up so I'm sure people are looking
> forward to 3.0, but do we think there will be a minor release before then?
>


Re: Broken Link

2017-10-24 Thread Jim Apple
Fixed; thanks for finding this, Kenneth.

On Mon, Oct 23, 2017 at 8:04 PM, kenneth mcfarland <
kennethpmcfarl...@gmail.com> wrote:

> That works great, thank you!
>
> On Oct 23, 2017 7:57 PM, "Mostafa Mokhtar"  wrote:
>
> > This link should work:
> >
> > cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
> >
> > Will try to update the link in http://impala.apache.org/overview.html.
> >
> > On Mon, Oct 23, 2017 at 7:52 PM, kenneth mcfarland <
> > kennethpmcfarl...@gmail.com> wrote:
> >
> > > Hi Impala Crew,
> > >
> > > I really wanted to read about the architecture as I'm new, curious, and
> > > decided to take a swing at IMPALA-5392.
> > >
> > > This link on the /overview page is busted:
> > >
> > > http://www.cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
> > >
> > > Thanks in advance,
> > >
> > > Kenny
> > >
> >
>


Re: Podling Report Reminder - November 2017

2017-10-23 Thread Jim Apple
Thanks - added to wiki with those two sign-offs

On Mon, Oct 23, 2017 at 11:31 AM, Brock Noland <br...@phdata.io> wrote:

> Same for me
>
> On Mon, Oct 23, 2017 at 1:28 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > Looks good to me. Feel free to add my sign-off when you post to the wiki.
> >
> > -Todd
> >
> > On Sat, Oct 21, 2017 at 4:34 PM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> > > Please review:
> > >
> > > Impala is a high-performance C++ and Java SQL query engine for data stored
> > > in Apache Hadoop-based clusters.
> > >
> > > Impala has been incubating since 2015-12-03.
> > >
> > > Three most important issues to address in the move towards graduation:
> > >
> > >   Our graduation proposal is in the works.
> > >
> > > Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> > > aware of?
> > >
> > >  No
> > >
> > > How has the community developed since the last report?
> > >
> > >  There have been 279 Commits:
> > >    git log --format='%ci' | grep -cE '2017-(08|09|10)'
> > >
> > >  62 of those commits were by non-committers:
> > >    git log --format='%an %ci' | grep -E '2017-(08|09|10)' | tr -d '0-9\-' |
> > > cut -d ' ' -f -2 | sort | uniq -c | sort -n
> > >
> > > Of the 37 patch authors, 16 were not committers at the beginning of this
> > > reporting period.
> > >
> > >  There are three new committers and one new PPMC member:
> > >
> > > https://lists.apache.org/list.html?d...@impala.apache.org:dfr=2017-8-1|dto=2017-10-31:%22has%20invited%22
> > >
> > > Impala has done a fourth release with a third release manager.
> > >
> > > Impala has begun graduation procedures: we have held a community discussion
> > > and a community vote on graduation, both unanimous. We have established our
> > > intended PMC. Next, we will draft our charter and hold a discussion on
> > > general@incubator.
> > >
> > > How has the project developed since the last report?
> > >
> > > Impala has removed the old unpartitioned hash and aggregation nodes, relics
> > > from years ago that were kept around for backwards compatibility: the new
> > > buffer management makes these obsolete. Code generation for decimal and
> > > timestamp types has been added to the text scanner, increasing the
> > > performance of some queries by up to 19%. More robust query plans in case
> > > of data skew have made some aggregations eight times as fast. A number of
> > > large changes are in-flight, including changes to equivalence class
> > > computation in the planner, more decimal semantics adjustments, min-max
> > > filters for Kudu, and multi-threaded metadata loading that increases the
> > > performance of some metadata operations by 8x.
> > >
> > > How would you assess the podling's maturity?
> > > Please feel free to add your own commentary.
> > >
> > >  [ ] Initial setup
> > >  [ ] Working towards first release
> > >  [ ] Community building
> > >  [X] Nearing graduation
> > >  [ ] Other:
> > >
> > > Date of last release:
> > >
> > >  2017-09-14
> > >
> > > When were the last committers or PPMC members elected?
> > >
> > >  2017-09-29
> > >
> > > Signed-off-by:
> > >
> > >  [ ](impala) Tom White
> > > Comments:
> > >  [ ](impala) Todd Lipcon
> > > Comments:
> > >  [ ](impala) Carl Steinbach
> > > Comments:
> > >  [ ](impala) Brock Noland
> > > Comments:
> > >
> > > On Sat, Oct 21, 2017 at 3:43 PM, <johndam...@apache.org> wrote:
> > >
> > > > Dear podling,
> > > >
> > > > This email was sent by an automated system on behalf of the Apache
> > > > Incubator PMC. It is an initial reminder to give you plenty of time to
> > > > prepare your quarterly board report.
> > > >
> > > > The board meeting is scheduled for Wed, 15 November 2017, 10:30 am PDT.
> > > > The report for your podling will form a part of the Incubator PMC
> > > > report. The Incubator PMC requires your report to be submitted 2 weeks
> > > > before the board meeting, to allow sufficient tim

Re: Time for graduation?

2017-10-23 Thread Jim Apple
As a clarification: Todd, Marcel, and I are "vote monitors", but we can't
see who voted for whom or even who has cast a vote.

On Mon, Oct 23, 2017 at 11:17 AM, Todd Lipcon <t...@cloudera.com> wrote:

> Thanks for setting this up, Jim! Can confirm that the vote request came
> through.
>
> -Todd
>
> On Mon, Oct 23, 2017 at 11:14 AM, Jim Apple <jbap...@cloudera.com> wrote:
>
> > I have sent out the Chair election ballot via steve.apache.org to people
> > who will be listed in our resolution as PMC members, via email addresses
> > (@apache.org). Please vote. The vote closes Friday.
> >
> > On Fri, Oct 20, 2017 at 10:19 PM, Jim Apple <jbap...@cloudera.com>
> wrote:
> >
> > > I have a resolution with a blank space for chair, and the community
> voted
> > > unanimously to graduate.
> > >
> > > I also have a set of people who will make up the PMC (should we
> > graduate),
> > > based on their responses to the email I sent.
> > >
> > > We have two volunteers for PMC chair. I'll call a vote with potential
> PMC
> > > members as voters, starting on Monday using https://steve.apache.org/,
> > > following examples from other ASF projects.
> > >
> > > On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple <jbap...@cloudera.com>
> wrote:
> > >
> > >> I think it would be a good time to graduate. I'm very proud of the
> > >> progress the community has made in terms of acting in an Apache way.
> > >>
> > >> Some logistics:
> > >>
> > >> I would be happy to serve as an initial chair.
> > >>
> > >> I'll draft a resolution, with a blank space for chair. This doesn't
> mean
> > >> we have to agree now is the time to graduate, but we'll have it
> > available
> > >> for discussion and revision whenever we are ready.
> > >>
> > >> If we decide to graduate now, maybe we could email everyone who is on
> > the
> > >> PPMC, ccing private@, to see if they are still interested in being on
> > >> the PMC, and taking no response to mean "yes" until we hear otherwise,
> > in
> > >> case someone is on vacation away from email, or in the hospital, or
> > >> something.
> > >>
> > >> Also, mentors are traditionally included in a graduating podling's
> PMC,
> > >> right?
> > >>
> > >> On Thu, Oct 12, 2017 at 2:17 PM, Todd Lipcon <t...@apache.org> wrote:
> > >>
> > >>> Hey Impala community,
> > >>>
> > >>> It's been a while that all of the Impala infrastructure has been
> moved
> > >>> over, and the community appears to be functioning healthily,
> generating
> > >>> new
> > >>> releases on a regular cadence as well as adding new committers and
> PPMC
> > >>> members. All of the branding stuff seems great, and the user mailing
> > list
> > >>> has a healthy amount of traffic and a good track record of answering
> > >>> questions when they come up.
> > >>>
> > >>> As a mentor I think it's probably time to discuss graduation. The
> > project
> > >>> is already functioning in the same way as your typical Apache TLP and
> > it
> > >>> seems like it's time to become one.
> > >>>
> > >>> Any thoughts? If everyone is on board, the next step would be:
> > >>>
> > >>> 1. Pick the initial PMC chair for the TLP. According to the published
> > >>> Impala Bylaws it seems that this is meant to rotate annually, so no
> > need
> > >>> to
> > >>> stress too much about it.
> > >>>
> > >>> A couple obvious choices here would be Marcel (as the original
> founder
> > of
> > >>> the project) or perhaps Jim (who has done yeoman's work on a lot of
> the
> > >>> incubation process, podling reports, etc). Others could certainly
> > >>> volunteer
> > >>> or be nominated as well.
> > >>>
> > >>> 2. Draft a Resolution for the PPMC and IPMC to vote upon.
> > >>> -- the resolution would include the above-decided chair as well as
> the
> > >>> list
> > >>> of initial PMC, etc.
> > >>> -- the Initial PMC could be just the current list of PPMC, or you
> could
> > >>> consider adding others at this point as well.
> > >>>
> > >>>
> > > >>> I can help with the above process but figured I'd solicit opinions first
> > > >>> on whether the community feels it's ready to graduate.
> > >>>
> > >>> Thanks
> > >>> Todd
> > >>>
> > >>
> > >>
> > >
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: Time for graduation?

2017-10-23 Thread Jim Apple
I have sent out the Chair election ballot via steve.apache.org to people
who will be listed in our resolution as PMC members, via email addresses
(@apache.org). Please vote. The vote closes Friday.

On Fri, Oct 20, 2017 at 10:19 PM, Jim Apple <jbap...@cloudera.com> wrote:

> I have a resolution with a blank space for chair, and the community voted
> unanimously to graduate.
>
> I also have a set of people who will make up the PMC (should we graduate),
> based on their responses to the email I sent.
>
> We have two volunteers for PMC chair. I'll call a vote with potential PMC
> members as voters, starting on Monday using https://steve.apache.org/,
> following examples from other ASF projects.
>
> On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> I think it would be a good time to graduate. I'm very proud of the
>> progress the community has made in terms of acting in an Apache way.
>>
>> Some logistics:
>>
>> I would be happy to serve as an initial chair.
>>
>> I'll draft a resolution, with a blank space for chair. This doesn't mean
>> we have to agree now is the time to graduate, but we'll have it available
>> for discussion and revision whenever we are ready.
>>
>> If we decide to graduate now, maybe we could email everyone who is on the
>> PPMC, ccing private@, to see if they are still interested in being on
>> the PMC, and taking no response to mean "yes" until we hear otherwise, in
>> case someone is on vacation away from email, or in the hospital, or
>> something.
>>
>> Also, mentors are traditionally included in a graduating podling's PMC,
>> right?
>>
>> On Thu, Oct 12, 2017 at 2:17 PM, Todd Lipcon <t...@apache.org> wrote:
>>
>>> Hey Impala community,
>>>
>>> It's been a while since all of the Impala infrastructure was moved
>>> over, and the community appears to be functioning healthily, generating
>>> new releases on a regular cadence as well as adding new committers and PPMC
>>> members. All of the branding stuff seems great, and the user mailing list
>>> has a healthy amount of traffic and a good track record of answering
>>> questions when they come up.
>>>
>>> As a mentor I think it's probably time to discuss graduation. The project
>>> is already functioning in the same way as your typical Apache TLP and it
>>> seems like it's time to become one.
>>>
>>> Any thoughts? If everyone is on board, the next step would be:
>>>
>>> 1. Pick the initial PMC chair for the TLP. According to the published
>>> Impala Bylaws it seems that this is meant to rotate annually, so no need
>>> to
>>> stress too much about it.
>>>
>>> A couple obvious choices here would be Marcel (as the original founder of
>>> the project) or perhaps Jim (who has done yeoman's work on a lot of the
>>> incubation process, podling reports, etc). Others could certainly
>>> volunteer
>>> or be nominated as well.
>>>
>>> 2. Draft a Resolution for the PPMC and IPMC to vote upon.
>>> -- the resolution would include the above-decided chair as well as the
>>> list
>>> of initial PMC, etc.
>>> -- the Initial PMC could be just the current list of PPMC, or you could
>>> consider adding others at this point as well.
>>>
>>>
>>> I can help with the above process but figured I'd solicit opinions first
>>> on whether the community feels it's ready to graduate.
>>>
>>> Thanks
>>> Todd
>>>
>>
>>
>


Re: Podling Report Reminder - November 2017

2017-10-21 Thread Jim Apple
Please review:

Impala is a high-performance C++ and Java SQL query engine for data stored
in Apache Hadoop-based clusters.

Impala has been incubating since 2015-12-03.

Three most important issues to address in the move towards graduation:

  Our graduation proposal is in the works.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 No

How has the community developed since the last report?

 There have been 279 Commits:
   git log --format='%ci' | grep -cE '2017-(08|09|10)'

 62 of those commits were by non-committers:
   git log --format='%an %ci' | grep -E '2017-(08|09|10)' | tr -d '0-9\-' |
cut -d ' ' -f -2 | sort | uniq -c | sort -n

Of the 37 patch authors, 16 were not committers at the beginning of this
reporting period.
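
For anyone who wants to reproduce per-author counts like these without the shell pipeline, here is a rough Python sketch. It assumes input lines in the same '%an %ci' format used above; the function name commits_per_author and the sample lines are made up for illustration and are not real Impala history. Unlike the cut step in the pipeline, it keeps the whole author name rather than the first two words.

```python
import re
from collections import Counter

# Matches the date at the start of a git "%ci" timestamp,
# e.g. "2017-10-20 09:15:00 -0700".
DATE_RE = re.compile(r"\s(\d{4})-(\d{2})-\d{2}\s")


def commits_per_author(log_lines, year="2017", months=("08", "09", "10")):
    """Count commits per author among lines formatted as '%an %ci'."""
    counts = Counter()
    for line in log_lines:
        match = DATE_RE.search(line)
        if match and match.group(1) == year and match.group(2) in months:
            # Everything before the timestamp is the author name.
            counts[line[:match.start()].strip()] += 1
    return counts


# Made-up sample lines standing in for real git log output.
sample = [
    "Author One 2017-08-03 10:00:00 -0700",
    "Author Two 2017-09-12 11:30:00 -0700",
    "Author One 2017-10-01 09:00:00 -0700",
    "Author Three 2017-07-30 16:45:00 -0700",
]

# Print authors in ascending order of commit count, like "sort -n".
for author, count in sorted(commits_per_author(sample).items(), key=lambda kv: kv[1]):
    print(count, author)
```

Deciding which of the resulting authors were committers at the start of the period still has to be done by hand against the committer list.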

 There are three new committers and one new PPMC member:

https://lists.apache.org/list.html?d...@impala.apache.org:dfr=2017-8-1|dto=2017-10-31:%22has%20invited%22

Impala has done a fourth release with a third release manager.

Impala has begun graduation procedures: we have held a community discussion
and a community vote on graduation, both unanimous. We have established our
intended PMC. Next, we will draft our charter and hold a discussion on
general@incubator.

How has the project developed since the last report?

Impala has removed the old unpartitioned hash and aggregation nodes, relics
from years ago that were kept around for backwards compatibility: the new
buffer management makes these obsolete. Code generation for decimal and
timestamp types has been added to the text scanner, increasing the
performance of some queries by up to 19%. More robust query plans in case
of data skew have made some aggregations eight times as fast. A number of
large changes are in-flight, including changes to equivalence class
computation in the planner, more decimal semantics adjustments, min-max
filters for Kudu, and multi-threaded metadata loading that increases the
performance of some metadata operations by 8x.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [ ] Community building
 [X] Nearing graduation
 [ ] Other:

Date of last release:

 2017-09-14

When were the last committers or PPMC members elected?

 2017-09-29

Signed-off-by:

 [ ](impala) Tom White
Comments:
 [ ](impala) Todd Lipcon
Comments:
 [ ](impala) Carl Steinbach
Comments:
 [ ](impala) Brock Noland
Comments:

On Sat, Oct 21, 2017 at 3:43 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 15 November 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, November 01).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/November2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


Re: Want to contribute

2017-10-21 Thread Jim Apple
You can subscribe to the list by mailing
dev-subscr...@impala.incubator.apache.org.

On Wed, Oct 18, 2017 at 1:52 AM, Kapil Jain  wrote:

> I would like to contribute. Could you please add me to the list.
>
> Kapil
>


Re: Time for graduation?

2017-10-20 Thread Jim Apple
I have a resolution with a blank space for chair, and the community voted
unanimously to graduate.

I also have a set of people who will make up the PMC (should we graduate),
based on their responses to the email I sent.

We have two volunteers for PMC chair. I'll call a vote with potential PMC
members as voters, starting on Monday using https://steve.apache.org/,
following examples from other ASF projects.

On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple <jbap...@cloudera.com> wrote:

> I think it would be a good time to graduate. I'm very proud of the
> progress the community has made in terms of acting in an Apache way.
>
> Some logistics:
>
> I would be happy to serve as an initial chair.
>
> I'll draft a resolution, with a blank space for chair. This doesn't mean
> we have to agree now is the time to graduate, but we'll have it available
> for discussion and revision whenever we are ready.
>
> If we decide to graduate now, maybe we could email everyone who is on the
> PPMC, ccing private@, to see if they are still interested in being on the
> PMC, and taking no response to mean "yes" until we hear otherwise, in case
> someone is on vacation away from email, or in the hospital, or something.
>
> Also, mentors are traditionally included in a graduating podling's PMC,
> right?
>
> On Thu, Oct 12, 2017 at 2:17 PM, Todd Lipcon <t...@apache.org> wrote:
>
>> Hey Impala community,
>>
>> It's been a while that all of the Impala infrastructure has been moved
>> over, and the community appears to be functioning healthily, generating
>> new
>> releases on a regular cadence as well as adding new committers and PPMC
>> members. All of the branding stuff seems great, and the user mailing list
>> has a healthy amount of traffic and a good track record of answering
>> questions when they come up.
>>
>> As a mentor I think it's probably time to discuss graduation. The project
>> is already functioning in the same way as your typical Apache TLP and it
>> seems like it's time to become one.
>>
>> Any thoughts? If everyone is on board, the next step would be:
>>
>> 1. Pick the initial PMC chair for the TLP. According to the published
>> Impala Bylaws it seems that this is meant to rotate annually, so no need
>> to
>> stress too much about it.
>>
>> A couple obvious choices here would be Marcel (as the original founder of
>> the project) or perhaps Jim (who has done yeoman's work on a lot of the
>> incubation process, podling reports, etc). Others could certainly
>> volunteer
>> or be nominated as well.
>>
>> 2. Draft a Resolution for the PPMC and IPMC to vote upon.
>> -- the resolution would include the above-decided chair as well as the
>> list
>> of initial PMC, etc.
>> -- the Initial PMC could be just the current list of PPMC, or you could
>> consider adding others at this point as well.
>>
>>
>> I can help with the above process but figured I'd solicit opinions first
>> on whether the community feels it's ready to graduate.
>>
>> Thanks
>> Todd
>>
>
>


[RESULT] [VOTE] Graduate to a TLP

2017-10-20 Thread Jim Apple
The result is

+1: 30: Greg Rahn, Michael Brown, Bikramjeet Vig, Shant Hovsepian, Martin
Grund, Juan, Tom White, Lars Volker, Brock Noland, Mostafa Mokhtar, Bharath
Vissapragada, Thomas Tauber-Marshall, Michael Ho, Taras Bobrovytsky,
Alexander Behm, Tim Armstrong, Jeszy, Marcel Kornacker, Sailesh Mukil,
Matthew Jacobs, yu feng, Philip Zeyliger, John Sherman, Joe McDonnell, Anuj
Phadke, Daniel Hecht, David Knupp, Zachary Amsden, Quanlong Huang, Jim Apple
Other votes: none

We haven't graduated yet. The next steps are these:

1. Prepare a charter.

2. Start a discussion on general@incubator.

Should the discussion look mostly positive:

3. Call a vote on general@incubator.

Should that vote succeed:

4. Submit the resolution to the ASF Board. See more here:
http://incubator.apache.org/guides/graduation.html


Re: Using Gerrit drafts

2017-10-19 Thread Jim Apple
I'm surprised my edits gave you the impression that I think people should
not review their code before asking for peer review.

On Thu, Oct 19, 2017 at 1:06 PM, Daniel Hecht  wrote:

> I think it's a good idea for everyone to review their own code before
> asking for a peer review.  A lot of style nits, etc can be addressed
> before the first peer review iteration that way.
>
> On Thu, Oct 19, 2017 at 11:44 AM, Philip Zeyliger 
> wrote:
> 
> > I think it's a good idea for new contributors to review their code reviews
> > first and explicitly hit publish. Experienced contributors will get
> > sufficiently acquainted with Gerrit over time.
> >
>


Re: Using Gerrit drafts

2017-10-19 Thread Jim Apple
Sorry, I didn't mean "workflow"; I meant "repo".

On Thu, Oct 19, 2017 at 11:54 AM, Jim Apple <jbap...@cloudera.com> wrote:

> In between when I saw the edit and when I saw this email, I re-edited.
> Please feel free to re-edit again. Mostly I didn't want "Publish" to sound
> like it was going to be part of the workflow, but I also made drafts the
> second suggestion, because I would expect gerrit newbies to publish there
> too often and then disappear when nobody reviews their code.
>
> New users misunderstanding and abusing refs/for/master is rare.
>
> On Thu, Oct 19, 2017 at 11:44 AM, Philip Zeyliger <phi...@cloudera.com>
> wrote:
>
>> On Thu, Oct 19, 2017 at 10:50 AM, Daniel Hecht <dhe...@cloudera.com>
>> wrote:
>>
>> > Add this info to
>> > https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches
>> > if not already there?
>>
>>
>> Thanks for the suggestion.
>>
>> I updated
>> https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches#UsingGerrittosubmitandreviewpatches-Sendingapatchforreview.1
>> to basically recommend the draft workflow over the "refs/for" workflow. I
>> think it's a good idea for new contributors to review their code reviews
>> first and explicitly hit publish. Experienced contributors will get
>> sufficiently acquainted with Gerrit over time.
>>
>> -- Philip
>>
>
>


Re: Using Gerrit drafts

2017-10-19 Thread Jim Apple
In between when I saw the edit and when I saw this email, I re-edited.
Please feel free to re-edit again. Mostly I didn't want "Publish" to sound
like it was going to be part of the workflow, but I also made drafts the
second suggestion, because I would expect gerrit newbies to publish there
too often and then disappear when nobody reviews their code.

New users misunderstanding and abusing refs/for/master is rare.

On Thu, Oct 19, 2017 at 11:44 AM, Philip Zeyliger 
wrote:

> On Thu, Oct 19, 2017 at 10:50 AM, Daniel Hecht 
> wrote:
>
> > Add this info to
> > https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches
> > if not already there?
>
>
> Thanks for the suggestion.
>
> I updated
> https://cwiki.apache.org/confluence/display/IMPALA/Using+Gerrit+to+submit+and+review+patches#UsingGerrittosubmitandreviewpatches-Sendingapatchforreview.1
> to basically recommend the draft workflow over the "refs/for" workflow. I
> think it's a good idea for new contributors to review their code reviews
> first and explicitly hit publish. Experienced contributors will get
> sufficiently acquainted with Gerrit over time.
>
> -- Philip
>


New Impala contributors: IMPALA-5314

2017-10-19 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much to take away the
fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5314, "Rename
single letter tables in FE tests"? First, get your development environment
set up. Then, since this is a ticket about frontend ("FE") tests, you'll
want to check that your frontend tests are passing. Following
https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
:

(pushd fe && mvn -fae test)

This particular ticket proposes to change single-letter table names to
names less likely to collide with a developer's existing environment. Let's
check to see what these look like. Run bin/impala-shell.sh and:

[localhost:21000] > create table t (id int);
Query: create table t (id int)
Fetched 0 row(s) in 0.04s
[localhost:21000] > create table p (id int);
Query: create table p (id int)
Fetched 0 row(s) in 0.03s

Now run your frontend tests again. At the moment I am writing this, that
causes 15 failures. Doing a very simple search-and-replace (using git
ls-files and sed), I replaced " table p " and " table t " with " table
JohnJacobJingleheimerSchmidt ", which reduced the number of failures to 11.
This likely means that this simple method can make four tests resilient to
developer-created single-letter table names.
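
The search-and-replace described above could be sketched in Python roughly as follows. This is only an illustration: the replacement name comes from the message, but the helper name rename_single_letter_tables and the sample statement are made up here, and a real run would apply the same substitution to the files listed by git ls-files.

```python
# Replacement name taken from the walkthrough above.
NEW_NAME = "JohnJacobJingleheimerSchmidt"


def rename_single_letter_tables(text, letters=("p", "t")):
    """Replace ' table p ' / ' table t ' with ' table <NEW_NAME> '.

    Like the sed approach, this only matches the exact pattern
    ' table <letter> ' with surrounding spaces, so other uses of the
    single-letter names are left untouched and need manual review.
    """
    for letter in letters:
        text = text.replace(f" table {letter} ", f" table {NEW_NAME} ")
    return text


sample = "create table t (id int)"
print(rename_single_letter_tables(sample))
```

Because the pattern is so narrow, statements such as "select * from t" are not rewritten, which is consistent with only some of the failures going away.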

Next, drop the tables t and p that you created above via
bin/impala-shell.sh. Run the frontend tests again and carefully inspect the
output. This will help you fix any tests broken by the simple
search-and-replace.

Once all the tests are passing again, you can send your patch for review
following
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
Your patch doesn't have to fix the entirety of the ticket to be sent for
review. It is probably good to get one patch under your belt before sending
any large patches. Once this patch is committed and in Impala, you can come
back to this ticket and continue to fix single-letter table names in future
patches.

One place you'll want to look for those is in
testdata/workloads/functional-planner/queries/PlannerTest/*.test. These
files are test data for use in
fe/src/test/java/org/apache/impala/planner/*Test.java. For those, a more
complex search-and-replace pattern will remove some issues, but it won't
fix everything.

Have fun!


Re: New Impala contributors: outreach

2017-10-19 Thread Jim Apple
I am going to also start adding to https://helpwanted.apache.org/, now that
https://issues.apache.org/jira/browse/COMDEV-225 is fixed. Here is the
first one:

https://helpwanted.apache.org/task.html?3cb146d3a253d0dd1f951c151b37f7e8e8fc97c7

On Wed, Sep 6, 2017 at 3:01 PM, Jim Apple <jbap...@cloudera.com> wrote:

> OK, done:
>
> https://github.com/yourfirstpr/yourfirstpr.github.io/issues/86
>
> https://github.com/up-for-grabs/up-for-grabs.net/pull/717
>
> Can whoever runs the Apache Impala twitter account tweet at
> https://twitter.com/yourfirstpr?lang=en?
>
> On Sun, Sep 3, 2017 at 7:27 PM, Jim Apple <jbap...@cloudera.com> wrote:
> > I'd like to encourage people who haven't contributed to Impala before
> > to get started making patches. One way to do that would be to engage
> > with communities that focus on reaching out to new contributors.
> >
> > It appears that Your First PR has a mechanism to invite new
> > contributors to the project by filing a ticket:
> >
> > https://github.com/yourfirstpr/yourfirstpr.github.io/issues?q=is%3Aopen+is%3Aissue
> >
> > Any objections to me filing a ticket there and pointing to
> > https://issues.apache.org/jira/issues/?filter=12341668? That's
> > "newbie" open bugs with no assignee.
> >
> > We could also tweet at https://twitter.com/yourfirstpr?lang=en.
> >
> > I'm interested in also reaching out via
> > https://github.com/up-for-grabs/up-for-grabs.net#add-a-project and
> > maybe https://helpwanted.apache.org/, if
> > https://issues.apache.org/jira/browse/COMDEV-225 gets fixed.
> >
> > I'll go ahead with these in a couple of days, unless I hear any
> > objections before then.
>


Re: Graduation voting and work agenda

2017-10-18 Thread Jim Apple
1 is ongoing, 2 is done.

The response to #1 is so strong I am going to do some work on #3: inviting
all PPMC members and Mentors to be prospective PMC members in our charter,
as discussed previously in the first graduation discussion thread. I will
be sending out that email shortly.

On Tue, Oct 17, 2017 at 7:00 PM, Jim Apple <jbap...@cloudera.com> wrote:

> Following the community discussion, my observations from general@incubator,
> and
> https://incubator.apache.org/guides/graduation.html#the_graduation_process
> , I will do the following:
>
> 1. Call for a formal graduation vote on dev@.
>
> 2. While that is happening, let general@ know we are voting.
>
> Should the dev@ vote pass:
>
> 3. Prepare a charter with the community.
>
> 4. Start a discussion on general@.
>
> Should the discussion look mostly positive:
>
> 5. Call a vote on general@.
>


[VOTE] Graduate to a TLP

2017-10-17 Thread Jim Apple
Following our discussion
https://lists.apache.org/thread.html/2f5db4788aff9b0557354b9106c0328a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E
, I propose that we graduate to a TLP. According to
https://incubator.apache.org/guides/graduation.html#community_graduation_vote
this is not required, and https://impala.apache.org/bylaws.html does not
say whose votes are "binding" in a graduation vote, so all community
members are welcome to vote.

This will remain open 72 hours. I will be notifying general@incubator it is
occurring.

This is my +1.


Graduation voting and work agenda

2017-10-17 Thread Jim Apple
Following the community discussion, my observations from general@incubator,
and
https://incubator.apache.org/guides/graduation.html#the_graduation_process
, I will do the following:

1. Call for a formal graduation vote on dev@.

2. While that is happening, let general@ know we are voting.

Should the dev@ vote pass:

3. Prepare a charter with the community.

4. Start a discussion on general@.

Should the discussion look mostly positive:

5. Call a vote on general@.


Re: Impala Version Stability

2017-10-16 Thread Jim Apple
Impala is a project at the Apache Software Foundation. Various vendors
package and distribute Impala, including Cloudera.

That being said, this is a mailing list for the Apache Impala community, so
questions about the stability of a particular vendor's releases might be
better addressed on a Cloudera forum.

On Sun, Oct 15, 2017 at 11:29 PM, sky  wrote:

> Hi all,
> Which impala version of the stability is better ? cdh5.13.0-release or
> cdh5.12.1-release ?


Re: Time for graduation?

2017-10-12 Thread Jim Apple
All of that SGTM


Re: Time for graduation?

2017-10-12 Thread Jim Apple
I think it would be a good time to graduate. I'm very proud of the progress
the community has made in terms of acting in an Apache way.

Some logistics:

I would be happy to serve as an initial chair.

I'll draft a resolution, with a blank space for chair. This doesn't mean we
have to agree now is the time to graduate, but we'll have it available for
discussion and revision whenever we are ready.

If we decide to graduate now, maybe we could email everyone who is on the
PPMC, ccing private@, to see if they are still interested in being on the
PMC, and taking no response to mean "yes" until we hear otherwise, in case
someone is on vacation away from email, or in the hospital, or something.

Also, mentors are traditionally included in a graduating podling's PMC,
right?

On Thu, Oct 12, 2017 at 2:17 PM, Todd Lipcon  wrote:

> Hey Impala community,
>
> It's been a while that all of the Impala infrastructure has been moved
> over, and the community appears to be functioning healthily, generating new
> releases on a regular cadence as well as adding new committers and PPMC
> members. All of the branding stuff seems great, and the user mailing list
> has a healthy amount of traffic and a good track record of answering
> questions when they come up.
>
> As a mentor I think it's probably time to discuss graduation. The project
> is already functioning in the same way as your typical Apache TLP and it
> seems like it's time to become one.
>
> Any thoughts? If everyone is on board, the next step would be:
>
> 1. Pick the initial PMC chair for the TLP. According to the published
> Impala Bylaws it seems that this is meant to rotate annually, so no need to
> stress too much about it.
>
> A couple obvious choices here would be Marcel (as the original founder of
> the project) or perhaps Jim (who has done yeoman's work on a lot of the
> incubation process, podling reports, etc). Others could certainly volunteer
> or be nominated as well.
>
> 2. Draft a Resolution for the PPMC and IPMC to vote upon.
> -- the resolution would include the above-decided chair as well as the list
> of initial PMC, etc.
> -- the Initial PMC could be just the current list of PPMC, or you could
> consider adding others at this point as well.
>
>
> I can help with the above process but figured I'd solicit opinions first on
> whether the community feels it's ready to graduate.
>
> Thanks
> Todd
>


New Impala contributors: IMPALA-5341

2017-10-09 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much to take away the
fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5341, "File
size filter in planner tests also filters row-size"? The very first thing
to do is understand the pieces of test infrastructure that relate to this
issue.

When Impala processes a query, before running it, the query has to be
"planned". Part of the output of that is what you see when you EXPLAIN
SELECT. For example:

[localhost:21000] > explain select * from tpch.lineitem;
Query: explain select * from tpch.lineitem
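+----------------------------------------------+
| Explain String                               |
+----------------------------------------------+
| Max Per-Host Resource Reservation: Memory=0B |
| Per-Host Resource Estimates: Memory=264.00MB |
|                                              |
| PLAN-ROOT SINK                               |
| |                                            |
| 01:EXCHANGE [UNPARTITIONED]                  |
| |                                            |
| 00:SCAN HDFS [tpch.lineitem]                 |
|    partitions=1/1 files=1 size=718.94MB      |
+----------------------------------------------+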
Fetched 9 row(s) in 0.01s


Impala has planner-specific tests that focus on just this one part of the
system. You can see what these tests look like in
testdata/workloads/functional-planner/queries/PlannerTest/. For example, in
aggregation.test, one test starts with:

# basic aggregation
select count(*), count(tinyint_col), min(tinyint_col), max(tinyint_col),
sum(tinyint_col),
avg(tinyint_col)
from functional.alltypesagg
---- PLAN
PLAN-ROOT SINK
|
01:AGGREGATE [FINALIZE]
|  output: count(*), count(tinyint_col), min(tinyint_col), max(tinyint_col), sum(tinyint_col), avg(tinyint_col)
|
00:SCAN HDFS [functional.alltypesagg]
   partitions=11/11 files=11 size=814.73KB
---- DISTRIBUTEDPLAN
...

As you can see, this looks like the output of EXPLAIN. These tests are run
by running EXPLAIN on the query in the first section and diffing the result
with the plan in the second section.
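
The diffing step can be pictured with a small Python sketch. It is not the harness Impala uses (the real tests are driven from Java); the function name plan_diff and the two sample plans are made up purely to show how an expected plan is compared with the plan actually produced.

```python
import difflib


def plan_diff(expected_plan, actual_plan):
    """Return unified-diff lines between the expected and actual plan text."""
    return list(difflib.unified_diff(
        expected_plan.splitlines(), actual_plan.splitlines(),
        fromfile="expected", tofile="actual", lineterm=""))


expected = "PLAN-ROOT SINK\n|\n00:SCAN HDFS [functional.alltypesagg]"
actual = "PLAN-ROOT SINK\n|\n00:SCAN HDFS [functional.alltypes]"

# An empty diff means the plans agree; any output is a failure to inspect.
for line in plan_diff(expected, actual):
    print(line)
```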

One part of the EXPLAIN output that isn't consistent is the file size.
Because this can change, text like "size=814.73KB" is replaced with just
"size=" before diffing. This covers up any differences in the file sizes,
but it also covers up differences in "row-size=" sections, which you can
see in constant-folding.test. Try changing one of the "row-size=" sections
to be much larger or smaller and see that it doesn't cause the tests to
fail.

You can find the Java methods that run these tests in
fe/src/test/java/org/apache/impala/planner/PlannerTest.java. They mostly
just refer to the .test files, so for instance, constant-folding.test is
referenced in

  @Test
  public void testConstantFolding() {
    // Tests that constant folding is applied to all relevant PlanNodes and DataSinks.
    // Note that not all Exprs are printed in the explain plan, so validating those
    // via this test is currently not possible.
    TQueryOptions options = defaultQueryOptions();
    options.setExplain_level(TExplainLevel.EXTENDED);
    runPlannerTestFile("constant-folding", options);
  }

You can run this test by using:

(pushd fe && mvn -fae test -Dtest=PlannerTest#testConstantFolding)

OK, now that we have covered the background, you are ready to fix the
issue. You probably want to make the filter more restrictive, perhaps by
changing the static Strings or changing the matches() and transform()
methods in TestUtils.java. Once that's done, try running the Planner tests
on a test that includes row-size again. This time, it should pass with the
row-size as written and fail if you change the row-size, like we did above.
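
To make the idea of a "more restrictive" filter concrete, here is a Python sketch of the general problem. The actual filter lives in Java in TestUtils.java and is not shown in this message, so the patterns below (BROAD, NARROW), the helper scrub, and the sample row-size line are assumptions for illustration only.

```python
import re

# Too broad: 'size=' also matches the tail of 'row-size='.
BROAD = re.compile(r"size=\S+")
# Narrower: match 'size=' only when it is not preceded by 'row-'.
NARROW = re.compile(r"(?<!row-)size=\S+")


def scrub(line, pattern):
    """Blank out the value after 'size=' so it is ignored when diffing."""
    return pattern.sub("size=", line)


file_line = "partitions=11/11 files=11 size=814.73KB"
row_line = "row-size=29B cardinality=10"  # made-up sample line

print(scrub(row_line, BROAD))    # the row-size value is lost
print(scrub(row_line, NARROW))   # the row-size value survives
print(scrub(file_line, NARROW))  # the file size is still blanked out
```

A negative lookbehind is only one way to narrow the match; anchoring on the preceding whitespace would work as well.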

Have fun, and don't hesitate to ask d...@impala.apache.org if you get stuck!


New Impala contributors: IMPALA-5362

2017-10-02 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much to take away the
fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5362, "Preserve
case-sensitivity in field titles"?

First, set up your development environment, and make sure you can run the
tests. In particular, you'll want to run the front-end tests and the
end-to-end shell tests:

(pushd fe && mvn -fae test)
tests/run-tests.py shell

The test case listed in the ticket uses the impala shell, so let's first
look to add it to the tests for the shell. Look in
tests/shell/test_shell_interactive.py. Among Impala's automated tests, this
file interacts with the shell in the way most similar to a human. You
will notice that a number of tests check stdout or stderr for the string
they expect in the results. You can use this as well to write a test that
fails in HEAD but that will pass when you are done with your patch.
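
As a rough picture of what such a check might look like, the sketch below inspects captured shell output for a column header with its original capitalization. The helper has_column_header and the sample output are hypothetical; the real tests in test_shell_interactive.py have their own helpers for running the shell, which are not reproduced here.

```python
def has_column_header(stdout, header):
    """Return True if the first table row in stdout has this exact header.

    Assumes pretty-printed output where border lines start with '+' and
    header/data rows look like '| a | b |'. The comparison is
    case-sensitive on purpose.
    """
    for line in stdout.splitlines():
        if line.startswith("|"):
            cells = [cell.strip() for cell in line.strip("|").split("|")]
            return header in cells
    return False


sample_stdout = "+----+\n| Id |\n+----+\n| 1  |\n+----+"
print(has_column_header(sample_stdout, "Id"))
print(has_column_header(sample_stdout, "id"))
```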

The column headers printed by impala-shell.sh are stored in some variables
and parameters named column_names. These are fetched from
ImpalaClient.get_column_names. You can continue to trace this all the way
to the frontend. That code is in the fe directory. There, the terminology
changes to column labels, rather than column names. If you grep around for
toLowerCase, you will find that column labels are forced to lowercase in
SelectListItem.toColumnLabel.

That call to toLowerCase() must be there for a reason - see if you can
figure out why! You may decide to keep it and add to column labels an
additional string containing the original capitalization, or you may decide
to eliminate the call to toLowerCase(). In the former case, you can limit
the uses of the original capitalization to impala-shell.sh, but in the
latter all clients will see the original capitalization. This may break
other tests, including frontend tests.
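
The first option, keeping the lowercased label and carrying the original capitalization alongside it, might be modeled like this. The sketch is in Python rather than the frontend's Java, and the names ColumnLabel, normalized, display, and make_label are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnLabel:
    """Hypothetical pair: the lowercased label plus the original spelling."""
    normalized: str  # what existing code paths would keep seeing
    display: str     # original capitalization, e.g. for impala-shell


def make_label(raw):
    """Build a label that preserves the user's capitalization for display."""
    return ColumnLabel(normalized=raw.lower(), display=raw)


label = make_label("TotalPrice")
print(label.normalized, label.display)
```

Clients that compare labels would keep using the normalized form, while only the shell would need to read the display form.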

You may need to fix some planner tests. Some of these have a special format
that you can see examples of in
testdata/workloads/functional-planner/queries/PlannerTest/.

Have fun! Once you're done, re-run the front-end tests and the end-to-end
shell tests, as described above. Then submit your patch for review
following the instructions on
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.


New Impala committer: Bikramjeet Vig

2017-09-29 Thread Jim Apple
The Podling Project Management Committee (PPMC) for Apache Impala
(incubating) has invited Bikramjeet Vig to become a committer and we
are pleased to announce that they have accepted.

Congratulations and welcome, Bikramjeet!


New Impala committer: Zach Amsden

2017-09-29 Thread Jim Apple
The Podling Project Management Committee (PPMC) for Apache Impala
(incubating) has invited Zach Amsden to become a committer and we are
pleased to announce that they have accepted.

Congratulations and welcome, Zach!


New Impala contributors: IMPALA-5392

2017-09-27 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what
you want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on,
with hopefully enough detail to get you going but not so much to take
away the fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5392,
"Stack depth for threads printed in the Catalog UI under JVM Threads
is not deep enough"?

First, set up your development environment and start a development
impala cluster by running bin/start-impala-cluster.py. Once it is
running, check out http://localhost:25020/jvm-threadz, which you will
see referenced offhand in
http://impala.apache.org/docs/build/html/topics/impala_webui.html.
That page has the stack traces the ticket (IMPALA-5392) is talking
about. You can see that several end in ellipses, indicating they are
cut off. Fixing that is the aim of this ticket.

In www/jvm-threadz.tmpl, you will see the template for creating this
page. The section to fix is "{{summary}}". To see where this is
populated, git grep jvm-threadz. There you will see it is referenced
in be/src/util/thread.cc. The references to summary in that file are
getting that information from a TJvmThreadInfo object, so we'll need
to trace where the summary field of that object is populated. That
object is referenced in common/thrift/Frontend.thrift, which is its
definition, and in
fe/src/main/java/org/apache/impala/common/JniUtil.java. In that file,
you will see setSummary called with the result of a call to
java.lang.management.ThreadInfo.toString(), which is described here:

https://docs.oracle.com/javase/8/docs/api/java/lang/management/ThreadInfo.html#toString--

Investigate those docs and find a way to add more stack trace
information. Once you think you have it, you will need to rebuild and
restart the cluster:

./buildall.sh -notests -noclean -ninja -start_impala_cluster

Reload the webpage and admire your handiwork! Once you're satisfied,
submit your patch for review following the instructions on
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.


New Impala PPMC member: Bharath Vissapragada

2017-09-26 Thread Jim Apple
The Podling Project Management Committee (PPMC) for Apache Impala
(incubating) has invited Bharath Vissapragada to become a PPMC member
and we are pleased to announce that they have accepted.

Congratulations and welcome, Bharath!


New Impala contributors: IMPALA-5440

2017-09-19 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much to take away the
fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5440, "Add
planner tests with extreme statistics values"? The comments on the ticket
address a number of ways, some of them rather ambitious for a new
contributor, so let's talk about a smaller chunk of it.

This ticket was filed in response to
https://issues.apache.org/jira/browse/IMPALA-5282, which included an
exception in the frontend (which does parsing, analyzing, and planning for
queries) from an overflow. Take a look at the patch which fixed the issue,
https://gerrit.cloudera.org/#/c/7084. It doesn't include any new tests,
which is why IMPALA-5440 was filed. You can see this in the comments on the
patch: "For now, I feel pretty good about the computePerHostResources()
with respect to overflow since I read all the code carefully. We should
still have tests to not break it sometime later. I filed IMPALA-5440 to
address the long-standing bug in test coverage."

Reading the comments on a patch is a good way to understand why something
in Impala is the way it is. All recent Impala patches have a line in the
bottom of the commit message with a URL of the code review so you can do
archaeology for information that wasn't included in the patch itself. All
code review comments are also sent to
https://lists.apache.org/list.html?revi...@impala.apache.org, which you can
subscribe to in the same way you subscribed to this list, by mailing
reviews-subscr...@impala.incubator.apache.org.

In this case, the question to address is arithmetic overflow in the
frontend. The previous patch shows many places where overflow is checked,
and you may be able to add new tests for each line in that patch. For now,
let's just work on two categories of overflow: cardinality estimation and
memory estimates.

Impala's planner, in order to execute a query efficiently, makes estimates
about the number of rows that will be produced by different parts of the
query. If cardinality estimations have arithmetic overflow, they will
estimate a negative number of rows!

To see if you can get arithmetic overflow, start up impala-shell.sh and set
explain_level=2. This will show the planner's estimates on the number of
rows each part of a query produces. Then explain the plans for some cross
joins:

use tpch;
explain select * from lineitem a;
explain select * from lineitem a, lineitem b;
explain select * from lineitem a, lineitem b, lineitem c;
...

At some point in that sequence, you will see that the cardinality estimate
reaches a ceiling, even though those queries would actually produce more
and more rows with each cross-join. This is because the overflow check is
working and capping the cardinality estimate at the largest long value,
2^63 - 1.
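
The difference between wrapping and capping can be sketched in Python, which needs the 64-bit behaviour emulated because its integers do not overflow. How the planner actually implements the check is not shown in this message; the function names and the row count used for lineitem are assumptions.

```python
LONG_MAX = 2**63 - 1  # largest value of a signed 64-bit long


def wrapping_mul(a, b):
    """Multiply as a signed 64-bit long would, wrapping on overflow."""
    product = (a * b) & (2**64 - 1)
    return product - 2**64 if product > LONG_MAX else product


def saturating_mul(a, b):
    """Multiply non-negative values, capping the result at LONG_MAX."""
    return min(a * b, LONG_MAX)


# Without a check, an overflowing estimate wraps to a negative number.
print(wrapping_mul(2**62, 2))

# With a check, repeated cross joins stop growing at LONG_MAX.
rows = 6_001_215  # assumed row count for lineitem; verify with count(*)
estimate = rows
for _ in range(2):  # lineitem a, lineitem b, lineitem c
    estimate = saturating_mul(estimate, rows)
print(estimate == LONG_MAX)
```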

To see how to test this, take a look at
fe/src/test/java/org/apache/impala/planner/PlannerTest.java. Each of the
tests in that file references a file in
testdata/workloads/functional-planner/queries/PlannerTest/. To look for a
test that can check that cardinality is bounded, look for the string
"cardinality" in the PlannerTest directory. Check out the test method in
PlannerTest.java that corresponds, and write a similar test file and test
method.

Have fun, and don't hesitate to ask on d...@impala.apache.org if you get
stuck and need help!


Re: jenkins.impala.io pre-existing workspace

2017-09-19 Thread Jim Apple
Nobody did.

On Tue, Sep 19, 2017 at 2:29 PM, Matthew Jacobs <m...@cloudera.com> wrote:

> Did anyone file a JIRA for this? I saw this again.
>
> On Thu, Aug 31, 2017 at 1:36 PM, Jim Apple <jbap...@cloudera.com> wrote:
> > Also, to be clear, I don't have the cycles to lead the fix-the-cleanup
> > task at the moment.
> >
> > On Wed, Aug 30, 2017 at 4:45 PM, Jim Apple <jbap...@cloudera.com> wrote:
> >> The workspace cleanup isn't working - see the last bit of any recent
> >> ub1604 job:
> >> https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/206/console
> >>
> >> 03:56:40.920 [WS-CLEANUP] Deleting project workspace...Cannot delete
> >> workspace :remote file operation failed: /home/ubuntu at
> >> hudson.remoting.Channel@4384d5b9:ubuntu-16.04 (i-032d527b9c801df4c):
> >> java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times
> >> (of a maximum of 3) waiting 0.1 sec between attempts.
> >> 03:56:48.161 ERROR: Step ‘Delete workspace when build is done’ failed:
> >> Cannot delete workspace: remote file operation failed: /home/ubuntu at
> >> hudson.remoting.Channel@4384d5b9:ubuntu-16.04 (i-032d527b9c801df4c):
> >> java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times
> >> (of a maximum of 3) waiting 0.1 sec between attempts.
> >>
> >> The workspace is $HOME, so you can't just delete it without being root.
> >>
> >> This could be changed to
> >>
> >> 1. A post-build script to "rm -rf ~/*". This doesn't reset everything,
> >> though - the job makes changes to other parts of the filesystem.
> >>
> >> 2. A post-build script to "sudo shutdown -h now" to make sure ec2
> >> instances are not re-used. I'm not sure how Jenkins would feel about
> >> this. :-)
> >>
> >> 3. A post-build script to move $HOME to some archived location on the
> >> disk, to preserve debuggability.
> >>
> >> 4. A bash trap in the script to do one of the above.
> >>
> >> 5. Run the whole thing in a docker in the build machine, then delete
> >> the container when the script is done. Or don't, if there's enough
> >> disk space to not worry about that.
> >>
> >> 6. Do all of the work in a workspace inside $HOME. This would require
> >> some changes to bootstrap_development.sh.
> >>
> >> #5 is the most hermetic, I'd guess.
> >>
> >> On Thu, Aug 24, 2017 at 8:29 AM, Michael Brown <mi...@cloudera.com>
> wrote:
> >>> Looks like someone has done this.
> >>>
> >>> On Wed, Aug 23, 2017 at 8:16 PM, Alexander Behm <
> alex.b...@cloudera.com>
> >>> wrote:
> >>>
> >>>> Yes, let's please add the post-build action for sanity and
> consistency with
> >>>> our other jobs.
> >>>>
> >>>> On Wed, Aug 23, 2017 at 7:42 PM, Tim Armstrong <
> tarmstr...@cloudera.com>
> >>>> wrote:
> >>>>
> >>>> > Maybe the workspace just got left in a weird state - I think in most
> >>>> cases
> >>>> > "git init" followed by checking out a branch and doing a clean would
> >>>> work.
> >>>> >
> >>>> > Should we add the delete workspace post-build action?
> >>>> >
> >>>> > On Wed, Aug 23, 2017 at 5:32 PM, Michael Brown <mi...@cloudera.com>
> >>>> wrote:
> >>>> >
> >>>> > > Not a known issue. I noticed ubuntu-16.04-from-scratch is not set
> to
> >>>> > clean
> >>>> > > up its workspace, and its config has not been touched since Aug
> 11. It
> >>>> > > seems strange we only saw this now
> >>>> > >
> >>>> > > On Wed, Aug 23, 2017 at 5:25 PM, Tim Armstrong <
> >>>> tarmstr...@cloudera.com>
> >>>> > > wrote:
> >>>> > >
> >>>> > > > Is this a known problem? My job failed because the Impala repo
> >>>> already
> >>>> > > > existed on the machine:
> >>>> > > >
> >>>> > > > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/164/
> >>>> > > >
> >>>> > > > *23:00:24* + /usr/bin/git init /home/ubuntu/Impala*23:00:24*
> >>>> > > > Reinitialized existing Git repository in
> /home/ub

Re: [RESULT] Vote on Impala 2.10.0 release candidate 2

2017-09-18 Thread Jim Apple
I have updated the release instructions at
https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release to
reflect this addition.

On Wed, Sep 13, 2017 at 10:10 AM, sebb  wrote:

> The summary is easy to read, but has no context.
>
> Ideally the result mail should either form part of the VOTE thread
> (reply and prefix the subject with [RESULT]) or should contain a link
> to the vote thread.
>
> Please could you consider addressing this for the next vote result?
>
> Thanks.
>
> Just my 2p
>
>
>
> On 13 September 2017 at 07:14, Bharath Vissapragada
>  wrote:
> > The vote has passed with the following tally.
> >
> > +1 (binding)
> >
> > - Brock Noland
> > - Carl Steinbach
> > - John D. Ament
> >
> > -1 (binding) - None
> > 0 - None
> >
> > Thanks everyone for testing and voting on the release.
>


Re: [ANNOUNCE] Apache Impala (incubating) 2.10.0 release

2017-09-15 Thread Jim Apple
-everyone but dev@

Thank you, Bharath!

On Thu, Sep 14, 2017 at 10:41 PM, Bharath Vissapragada 
wrote:

> The Apache Impala (incubating) team is pleased to announce the release of
> Impala 2.10.0.
>
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
>
> The release is available at:
> https://impala.incubator.apache.org/downloads.html
>
> Thanks,
>
> The Apache Impala (incubating) team
>
> =
>
> *Disclaimer*
>
> Apache Impala is an effort undergoing incubation at The Apache Software
> Foundation (ASF),
> sponsored by the name of Apache Incubator PMC. Incubation is required of
> all newly accepted
> projects until a further review indicates that the infrastructure,
> communications, and
> decision making process have stabilized in a manner consistent with other
> successful ASF
> projects. While incubation status is not necessarily a reflection of the
> completeness or
> stability of the code, it does indicate that the project has yet to be
> fully endorsed by
> the ASF.
>


New Impala contributors: IMPALA-5610

2017-09-14 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much to take away the
fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5610, "Warn if
deprecated flags are set"?

First, set up your development environment. Then, double-check that the bug
works the way you think it does by starting your impala cluster with a
deprecated flag:

bin/start-impala-cluster.py \
  --impalad_args="--enable_partitioned_hash_join=false"

This succeeds, even though

$ git grep enable_partitioned_hash_join
be/src/exec/exec-node.cc:DEFINE_bool_hidden(enable_partitioned_hash_join,
true, "Deprecated - has no effect");

OK, so this is a deprecated flag - we shouldn't be able to start Impala
with it without at least a warning. How can we warn if it is set to a
non-default value? Take a look at be/src/service/impalad-main.cc. There
you will see a number of LOG(WARNING) messages if FLAGS_ENABLE_RM,
another deprecated flag, is set. That seems as good a place as any to
start.

Impala's runtime flags are defined with gflags:
https://gflags.github.io/gflags/. Some good patterns to look for that
indicate deprecated flags are "deprecated" and "_hidden".
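
If you'd like a quick way to list candidates, here is a minimal Python
sketch (not part of Impala) that scans source text for hidden gflags
definitions whose help string mentions "deprecated". The regular
expression, the function name find_deprecated_flags, and the two
example_* flags in the sample are assumptions made for illustration;
only the enable_partitioned_hash_join definition comes from the git grep
output above.

```python
import re

# Matches definitions shaped like DEFINE_<type>_hidden(name, default, "help");
# This is a rough pattern for illustration, not a full C++ parser.
_HIDDEN_FLAG = re.compile(r'DEFINE_\w+?_hidden\(\s*(\w+)\s*,(.*?)\);', re.DOTALL)


def find_deprecated_flags(source_text):
    """Return names of hidden flags whose definition mentions 'deprecated'."""
    names = []
    for match in _HIDDEN_FLAG.finditer(source_text):
        name, rest = match.group(1), match.group(2)
        if "deprecated" in rest.lower():
            names.append(name)
    return names


# The first definition is from be/src/exec/exec-node.cc as quoted above.
# The other two are hypothetical and exist only to show non-matches.
SAMPLE = '''
DEFINE_bool_hidden(enable_partitioned_hash_join,
    true, "Deprecated - has no effect");
DEFINE_int32_hidden(example_internal_knob, 4, "Hypothetical flag, still in use");
DEFINE_bool(example_visible_flag, false, "Hypothetical flag");
'''

deprecated = find_deprecated_flags(SAMPLE)
print(deprecated)
```

A scan like this only finds candidates; each hit still needs a look by
hand, since a help string could contain ");" or mention deprecation for
some other reason.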

To start easy, try fixing just one flag at first, then get your patch
through code review and committed into the master branch by an Impala
committer. Once you have that experience under your belt, the next patch
can tackle more flags.

If you find yourself repeating the same pattern for many flags, consider
how you might reduce the boilerplate associated with that by refactoring
the code to use functions, classes, or even macros (if necessary) to
prevent having to repeat yourself.

Good luck, and have fun!


Re: Impala entry on Encyclopedia of Big Data

2017-09-12 Thread Jim Apple
I'd be happy to join a team to do this. Anybody else?

On Tue, Sep 12, 2017 at 10:22 AM, Dimitris Tsirogiannis <
dtsirogian...@cloudera.com> wrote:

> I've been contacted by the members of the editorial board of the new
> Encyclopedia of Big Data asking if we can contribute an entry for Impala.
> Here are some details:
>
> Each entry contribution should not be more than 2-3 pages in length
> including figures and references. Please feel free to recruit one or two
> more co-authors, if you choose. Just let us know who will be responsible
> for this entry.
>
> The current timeline for the project is as follows:
>
> - Authors submitting their entries --> 15 November 2017
> - Section Editors release the first batch of accepted and approved entries
> --> 15 December 2017
> - Section Editors release the second batch of revised, accepted and
> approved entries -->  31 January 2018
>
> Let me know if there are any volunteers (it doesn't have to be a single person)
> interested in writing the Impala entry and I will put you in touch with
> the members of the editorial board.
>
> Regards
> Dimitris
>


New Impala contributors: IMPALA-5614

2017-09-11 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what you
want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on, with
hopefully enough detail to get you going but not so much to take away the
fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5614, "Add
COMMENT ON syntax to support comments on all objects"?

First, set up your development environment. Then launch bin/impala-shell.sh
to see that the syntax, as expected, doesn't yet work:

$ bin/impala-shell.sh
Starting Impala Shell without Kerberos authentication
Connected to localhost:21000
Server version: impalad version 2.10.0-SNAPSHOT DEBUG (build
23d79462da5d0108709e8b1399c97606f4ebdf92)
***
Welcome to the Impala shell.
(Impala Shell v2.10.0-SNAPSHOT (23d7946) built on Thu Aug 31 23:52:28 PDT
2017)

The HISTORY command lists all shell commands in chronological order.
***
[localhost:21000] > COMMENT ON DATABASE functional IS 'Development
Database';
Query: comment ON DATABASE functional IS 'Development Database'
Query submitted at: 2017-09-01 21:19:11 (Coordinator:
http://jbapple-optiplex:25000)
ERROR: AnalysisException: Syntax error in line 1:
comment ON DATABASE functional IS 'Development Database'
^
Encountered: COMMENT
Expected: ALTER, COMPUTE, CREATE, DELETE, DESCRIBE, DROP, EXPLAIN, GRANT,
INSERT, INVALIDATE, LOAD, REFRESH, REVOKE, SELECT, SET, SHOW, TRUNCATE,
UPDATE, UPSERT, USE, VALUES, WITH

CAUSED BY: Exception: Syntax error


The first thing you'll want to do is to change the parser to recognize
statements of this form. Statements are parsed in the front end. Before we
talk about that, note that Impala does use a traditional lex-then-parse
method for generating the abstract syntax tree. The lexer is in JFlex, and
is located in fe/src/main/jflex. The parser is in CUP and is located in
fe/src/main/cup/. If you look at the lexer, you'll see that all of the
keywords referenced in the ticket (COMMENT, ON, DATABASE, TABLE, COLUMN,
and IS) are already keywords of the language, so you won't need to alter the
lexer.


If you look at the parser, you'll see it's in a BNF-like format, with the
top-level starting non-terminal being stmt. You'll probably want to add a
new type of statement, perhaps something like comment_on_stmt. First, build
the frontend to make sure you can iterate quickly on the changes you are
making, using ./buildall.sh -fe_only.

Now, try to copy an existing statement type to make your new COMMENT ON
statement. I'd recommend starting with a single type of COMMENT ON and
making sure that it works, including tests, before you do the other types.
You might even want to break this up into multiple commits - first get
COMMENT ON DATABASE working, tested, through code review, and committed,
before doing the rest.

Some places to look when modifying or adding files:

fe/src/main/java/org/apache/impala/analysis contains the statement type
classes for use in the front-end "analysis", which runs on the AST.

fe/src/main/java/org/apache/impala/service contains Frontend.java, which
can analyze a statement and turn it into a DDL request, and
CatalogOpExecutor.java, which can execute operations that alter tables.

For both of those directories, there is a corresponding directory in
fe/src/test/java/org/apache/impala with unit tests. You'll want to add some
unit tests, probably.

common/thrift contains Thrift definitions for the statement types that the
catalog can execute.

testdata/workloads/functional-query/queries/QueryTest contains .test files
for running end-to-end tests.

That should hopefully be enough to get you started. Have fun!


Re: expr-test stuck in getJNIEnv

2017-09-10 Thread Jim Apple
It was . bin/set-classpath.sh. Forgot about that one. Thanks, Henry!

On Sun, Sep 10, 2017 at 3:31 PM, Henry Robinson <he...@apache.org> wrote:
> I've seen this deadlock before although not in expr-test. I can't remember
> exactly how I cleared it but I believe it was either:
>
> 1. make fe && . bin/set-classpath.sh
> 2. bin/create-test-configuration.sh
>
> Sailesh knows about the upstream HDFS bug which I think has been fixed but
> not incorporated into Impala's dependencies.
>
> On Sun, Sep 10, 2017 at 1:42 PM Jim Apple <jbap...@cloudera.com> wrote:
>
>> When I run expr-test, it gets stuck in getJNIEnv(). Here's the full stack
>> trace:
>>
>> #0  __lll_lock_wait () at
>> ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
>> #1  0x74885dbd in __GI___pthread_mutex_lock (mutex=0x45fe5a0
>> ) at ../nptl/pthread_mutex_lock.c:80
>> #2  0x02cf79f6 in mutexLock (m=) at
>>
>> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
>> #3  0x02cf01b7 in setTLSExceptionStrings (rootCause=0x0,
>> stackTrace=0x0) at
>>
>> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
>> #4  0x02cf7f77 in printExceptionAndFreeV (env=0x4f221e8,
>> exc=0x4eb6a00, noPrintFlags=, fmt=0x33d7994
>> "loadFileSystems", ap=0x7fff9da0)
>> at
>> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
>> #5  0x02cf81dd in printExceptionAndFree (env=,
>> exc=, noPrintFlags=, fmt=> out>)
>> at
>> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
>> #6  0x02cf0faf in getGlobalJNIEnv () at
>>
>> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
>> #7  getJNIEnv () at
>>
>> /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:528
>> #8  0x01a0116a in impala::JniUtil::Init () at
>> be/src/util/jni-util.cc:105
>> #9  0x014fd881 in impala::InitCommonRuntime (argc=1,
>> argv=0x7fffa628, init_jvm=true,
>> test_mode=impala::TestInfo::BE_TEST) at be/src/common/init.cc:236
>> #10 0x0143da3f in main (argc=1, argv=0x7fffa628) at
>> be/src/exprs/expr-test.cc:7420
>>
>> I've tried git fetch, bin/clean.sh, running with the minicluster on,
>> running with the minicluster off, running with the impala cluster on,
>> running with it off, running in release mode, debug mode, in gdb, and
>> out of gdb.
>>
>> Has anyone else seen this and escaped from its clutches?
>>


expr-test stuck in getJNIEnv

2017-09-10 Thread Jim Apple
When I run expr-test, it gets stuck in getJNIEnv(). Here's the full stack trace:

#0  __lll_lock_wait () at ../sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x74885dbd in __GI___pthread_mutex_lock (mutex=0x45fe5a0
) at ../nptl/pthread_mutex_lock.c:80
#2  0x02cf79f6 in mutexLock (m=) at
/data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28
#3  0x02cf01b7 in setTLSExceptionStrings (rootCause=0x0,
stackTrace=0x0) at
/data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581
#4  0x02cf7f77 in printExceptionAndFreeV (env=0x4f221e8,
exc=0x4eb6a00, noPrintFlags=, fmt=0x33d7994
"loadFileSystems", ap=0x7fff9da0)
at 
/data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183
#5  0x02cf81dd in printExceptionAndFree (env=,
exc=, noPrintFlags=, fmt=)
at 
/data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:213
#6  0x02cf0faf in getGlobalJNIEnv () at
/data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:463
#7  getJNIEnv () at
/data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:528
#8  0x01a0116a in impala::JniUtil::Init () at
be/src/util/jni-util.cc:105
#9  0x014fd881 in impala::InitCommonRuntime (argc=1,
argv=0x7fffa628, init_jvm=true,
test_mode=impala::TestInfo::BE_TEST) at be/src/common/init.cc:236
#10 0x0143da3f in main (argc=1, argv=0x7fffa628) at
be/src/exprs/expr-test.cc:7420

I've tried git fetch, bin/clean.sh, running with the minicluster on,
running with the minicluster off, running with the impala cluster on,
running with it off, running in release mode, debug mode, in gdb, and
out of gdb.

Has anyone else seen this and escaped from its clutches?


Re: Encountering failure during build on docker

2017-09-08 Thread Jim Apple
Hm. Haven't seen this before. Does "MBP" stand for "MacBook Pro"?
This could be an issue with the Docker instructions in
bootstrap_development.sh not accounting for some transparency in
Docker exposing the host to the container.

If possible, can you send the full stdout and stderr from that last command?

On Fri, Sep 8, 2017 at 8:57 AM, Manaswini Maharana
<mmahar...@cloudera.com> wrote:
> Here you go -
>
>
> 1. The command you used to start the docker container
> mmaharana-MBP:~ mmaharana$ *docker pull ubuntu:16.04*
> mmaharana-MBP:~ mmaharana$ *docker run --privileged --interactive --tty
> --name impala-dev ubuntu:16.04 bash*
>
> 2. The output of gcc --version inside your docker container
> impdev@ea385187b032:~$ *gcc --version*
> *gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609*
> *Copyright (C) 2015 Free Software Foundation, Inc.*
> *This is free software; see the source for copying conditions.  There is NO*
> *warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR
> PURPOSE.*
>
>
> 3. The output of lsb_release -a both in the host and inside the docker
> container
> On Host:
> mmaharana-MBP:~ mmaharana$* lsb_release -a*
> *-bash: lsb_release: command not found*
>
> On Container:
> *impdev@ea385187b032:~$ lsb_release -a *
> *No LSB modules are available.*
> *Distributor ID: Ubuntu*
> *Description: Ubuntu 16.04.3 LTS*
> *Release: 16.04*
> *Codename: xenial*
>
> 4. The commands you ran inside the container
> root@ea385187b032:/# *apt-get update*
> root@ea385187b032:/# *apt-get install sudo*
> root@ea385187b032:/# *adduser --disabled-password --gecos '' impdev*
> root@ea385187b032:/# *echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers*
> root@ea385187b032:/#* su - impdev*
> impdev@ea385187b032:~$ *sudo apt-get --yes install git*
> impdev@ea385187b032:~$* git clone
> https://git-wip-us.apache.org/repos/asf/incubator-impala.git ~/Impala*
> impdev@ea385187b032:~$ *source ~/Impala/bin/bootstrap_development.sh*
>
>
> Thanks!
> Mansi
>
>
>
>
>
> On Fri, Sep 8, 2017 at 10:11 AM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> Can you provide:
>>
>> 1. The command you used to start the docker container
>>
>> 2. The output of gcc --version inside your docker container
>>
>> 3. The output of lsb_release -a both in the host and inside the docker
>> container
>>
>> 4. The commands you ran inside the container
>>
>> Thank you!
>>
>> On Fri, Sep 8, 2017 at 7:58 AM, Manaswini Maharana
>> <mmahar...@cloudera.com> wrote:
>> > Hello team,
>> >
>> > I'm trying to setup docker for development and encountering the below
>> issue
>> > during bootstrap_development.sh sourcing. Any pointers on how to resolve
>> > this? If you need more stack trace to backtrack or any other kind of
>> > information to debug let me know.
>> >
>> >
>> > *Please submit a full bug report,*
>> >
>> > *with preprocessed source if appropriate.*
>> >
>> > *Please include the complete backtrace with any bug report.*
>> >
>> > *See <http://gcc.gnu.org/bugs.html> for
>> > instructions.*
>> >
>> > *be/src/service/CMakeFiles/Service.dir/build.make:123: recipe for target
>> > 'be/src/service/CMakeFiles/Service.dir/impala-server.cc.o' failed*
>> >
>> > *make[2]: *** [be/src/service/CMakeFiles/Service.dir/impala-server.cc.o]
>> > Error 4*
>> >
>> > *CMakeFiles/Makefile2:5694: recipe for target
>> > 'be/src/service/CMakeFiles/Service.dir/all' failed*
>> >
>> > *make[1]: *** [be/src/service/CMakeFiles/Service.dir/all] Error 2*
>> >
>> > *Linking CXX static library ../../build/debug/testutil/libTestUtil.a*
>> >
>> > *[ 23%] Built target TestUtil*
>> >
>> > *Makefile:85: recipe for target 'all' failed*
>> >
>> > *make: *** [all] Error 2*
>> >
>> > *Error in /home/impdev/Impala/bin/make_impala.sh at line 178:
>> ${MAKE_CMD}
>> > ${MAKE_ARGS}*
>> >
>> > Thanks!
>> >
>> > Mansi
>>


Re: Encountering failure during build on docker

2017-09-08 Thread Jim Apple
Can you provide:

1. The command you used to start the docker container

2. The output of gcc --version inside your docker container

3. The output of lsb_release -a both in the host and inside the docker container

4. The commands you ran inside the container

Thank you!

On Fri, Sep 8, 2017 at 7:58 AM, Manaswini Maharana
 wrote:
> Hello team,
>
> I'm trying to setup docker for development and encountering the below issue
> during bootstrap_development.sh sourcing. Any pointers on how to resolve
> this? If you need more stack trace to backtrack or any other kind of
> information to debug let me know.
>
>
> *Please submit a full bug report,*
>
> *with preprocessed source if appropriate.*
>
> *Please include the complete backtrace with any bug report.*
>
> *See <http://gcc.gnu.org/bugs.html> for
> instructions.*
>
> *be/src/service/CMakeFiles/Service.dir/build.make:123: recipe for target
> 'be/src/service/CMakeFiles/Service.dir/impala-server.cc.o' failed*
>
> *make[2]: *** [be/src/service/CMakeFiles/Service.dir/impala-server.cc.o]
> Error 4*
>
> *CMakeFiles/Makefile2:5694: recipe for target
> 'be/src/service/CMakeFiles/Service.dir/all' failed*
>
> *make[1]: *** [be/src/service/CMakeFiles/Service.dir/all] Error 2*
>
> *Linking CXX static library ../../build/debug/testutil/libTestUtil.a*
>
> *[ 23%] Built target TestUtil*
>
> *Makefile:85: recipe for target 'all' failed*
>
> *make: *** [all] Error 2*
>
> *Error in /home/impdev/Impala/bin/make_impala.sh at line 178: ${MAKE_CMD}
> ${MAKE_ARGS}*
>
> Thanks!
>
> Mansi


New Impala contributors: IMPALA-5886

2017-09-07 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what
you want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on,
with hopefully enough detail to get you going but not so much to take
away the fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5886,
"run-tests.py returns error overeagerly"?

First, set yourself up with a development environment. Then, reproduce
the test case shown in the ticket:

FE_TEST=false BE_TEST=false EE_TEST=true \
  EE_TEST_FILES=query_test/test_udfs.py JDBC_TEST=false \
  CLUSTER_TEST=false bin/run-all-tests.sh

This takes a couple of minutes to run for me. If you look through the
output, you can see that no stress tests are executed. A notice of
that is printed (in my terminal) in a yellow font, unlike the passing
test results, which are printed in green. The problem, of course, is
not the color, but the fact that the exit value from the program is 1,
which indicates a non-successful run. You can verify this in bash by
"echo $?" after running the test.

The ticket points out where the bug is in run-tests.py:
in the run_tests method on the class TestExecutor. The variable
exit_code is set to the return value of the call to pytest.main, and
that is compared against 0 in order to set self.tests_failed. However,
https://docs.pytest.org/en/latest/usage.html points out that a return
value of 5 simply means "no tests were run", which is arguably not an
error.

One way to fix this is to compare 0 < exit_code < 5, so that an exit
code of 5 no longer counts as a failure. The downside is that the run
will then be reported as successful when no tests are actually run,
even if the user intended for some to be run. Another way to fix that
would be to add a flag to the program (following how the --help flag
is specified) to enforce that at least one test ran. If that flag is
passed, the sys.exit(1) call would be made in either the case that a
test failed or that no tests ran. Otherwise, running zero tests is
likely not an error and the program can exit normally.


Re: New Impala contributors: outreach

2017-09-06 Thread Jim Apple
OK, done:

https://github.com/yourfirstpr/yourfirstpr.github.io/issues/86

https://github.com/up-for-grabs/up-for-grabs.net/pull/717

Can whoever runs the Apache Impala twitter account tweet at
https://twitter.com/yourfirstpr?lang=en?

On Sun, Sep 3, 2017 at 7:27 PM, Jim Apple <jbap...@cloudera.com> wrote:
> I'd like to encourage people who haven't contributed to Impala before
> to get started making patches. One way to do that would be to engage
> with communities that reach out to new contributors.
>
> It appears that Your First PR has a mechanism to invite new
> contributors to the project by filing a ticket:
>
> https://github.com/yourfirstpr/yourfirstpr.github.io/issues?q=is%3Aopen+is%3Aissue
>
> Any objections to me filing a ticket there and pointing to
> https://issues.apache.org/jira/issues/?filter=12341668? That's
> "newbie" open bugs with no assignee.
>
> We could also tweet at https://twitter.com/yourfirstpr?lang=en.
>
> I'm interested in also reaching out via
> https://github.com/up-for-grabs/up-for-grabs.net#add-a-project and
> maybe https://helpwanted.apache.org/, if
> https://issues.apache.org/jira/browse/COMDEV-225 gets fixed.
>
> I'll go ahead with these in a couple of days, unless I hear any
> objections before then.


Re: New Impala contributors: IMPALA-5754

2017-09-06 Thread Jim Apple
I have posted a link on the ticket to
https://lists.apache.org/thread.html/6fbcfa650cbb920e2b517ae643bcd0859f1ba0368451d2949eda274d@%3Cdev.impala.apache.org%3E.
I hope to write some more of these, after which perhaps I should make
a space on the wiki to hold them all.

On Wed, Sep 6, 2017 at 10:08 AM, Todd Lipcon <t...@cloudera.com> wrote:
> Hey Jim,
>
> This is a great tutorial, thanks for posting it. One thought: would be
> great to put this somewhere on the web -- either as a blog post or wiki
> entry, so if someone googles they are more likely to find it. (sometimes
> mailing list archives are harder to bring up in google results)
>
> On Wed, Sep 6, 2017 at 10:05 AM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> If you'd like to contribute a patch to Impala, but aren't sure what
>> you want to work on, you can look at Impala's newbie issues:
>> https://issues.apache.org/jira/issues/?filter=12341668. You can find
>> detailed instructions on submitting patches at
>> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
>> This is a walkthrough of a ticket a new contributor could take on,
>> with hopefully enough detail to get you going but not so much to take
>> away the fun.
>>
>> How can we fix https://issues.apache.org/jira/browse/IMPALA-5754,
>> "rand() algorithm is very non-random"? This is a partial walk-through
>> of how to get started.
>>
>> Set up your development environment. Then, look for where we might
>> first write a failing test. The test case given in the ticket is
>> "select count(distinct(rand(867-5309))), count(*) from alltypes a,
>> alltypes b;". Tests that run a full query are considered "end-to-end
>> tests".
>>
>> End-to-end tests are described in two ways: .test files and .py files.
>>
>> .test files contain queries and their expected results. For example:
>>
>> 
>> ====
>> ---- QUERY
>> # Regression test for IMPALA-938
>> select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
>> interval smallint_col days)
>> from functional.alltypes where smallint_col = 1 limit 1
>> ---- RESULTS
>> 1,1,1970-01-02 00:00:00
>> ---- TYPES
>> smallint, int, timestamp
>> ====
>> 
>>
>> That is taken from
>> testdata/workloads/functional-query/queries/QueryTest/exprs.test.
>> That's a good test file to add a test case to, since it is testing
>> "exprs", and the bug is in  MathFunctions::Rand, which is defined in
>> be/src/exprs.
>>
>> First, let's run all of the exprs tests to see that they pass. You can
>> see them called in tests/query_test/test_exprs.py. The Python scripts
>> in tests/ can run these .test files by calling ImpalaTestSuite's
>> run_test_case() method with an abbreviated name of the .test file. In
>> test_exprs.py, this looks like
>>
>> self.run_test_case('QueryTest/exprs', vector)
>>
>> That call is in the method TestExprs.test_exprs(); you can invoke it with:
>>
>> ./bin/impala-py.test
>> tests/query_test/test_exprs.py::TestExprs::test_exprs --sanity
>>
>> This should take about 40 seconds and should pass, indicated by a
>> return value of 0 and a green line printed to the terminal reading:
>>
>> ...== 1 passed in 39.85 seconds ==...
>>
>> Now add a test case, following the example from the ticket and the
>> format in exprs.test. Run the test again; it should fail.
>>
>> Fix the bug and run the test again. Once the test is passing, follow
>> the instructions on the wiki to send your patch for code review:
>> https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala
>>
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera


New Impala contributors: IMPALA-5754

2017-09-06 Thread Jim Apple
If you'd like to contribute a patch to Impala, but aren't sure what
you want to work on, you can look at Impala's newbie issues:
https://issues.apache.org/jira/issues/?filter=12341668. You can find
detailed instructions on submitting patches at
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala.
This is a walkthrough of a ticket a new contributor could take on,
with hopefully enough detail to get you going but not so much to take
away the fun.

How can we fix https://issues.apache.org/jira/browse/IMPALA-5754,
"rand() algorithm is very non-random"? This is a partial walk-through
of how to get started.

Set up your development environment. Then, look for where we might
first write a failing test. The test case given in the ticket is
"select count(distinct(rand(867-5309))), count(*) from alltypes a,
alltypes b;". Tests that run a full query are considered "end-to-end
tests".

End-to-end tests are described in two ways: .test files and .py files.

.test files contain queries and their expected results. For example:


====
---- QUERY
# Regression test for IMPALA-938
select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
interval smallint_col days)
from functional.alltypes where smallint_col = 1 limit 1
---- RESULTS
1,1,1970-01-02 00:00:00
---- TYPES
smallint, int, timestamp
====


That is taken from
testdata/workloads/functional-query/queries/QueryTest/exprs.test.
That's a good test file to add a test case to, since it is testing
"exprs", and the bug is in  MathFunctions::Rand, which is defined in
be/src/exprs.
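
To make the layout of a .test file concrete, here is a minimal Python
sketch of a reader for it. It assumes that test cases are separated by
lines starting with ==== and that each section begins with a line of the
form "---- NAME". It is a simplified illustration, not Impala's own
parser, and the function name parse_test_file is made up for this
example.

```python
def parse_test_file(text):
    """Split .test-style text into a list of {section_name: body} dicts."""
    cases = []
    current = {}
    section = None
    for line in text.splitlines():
        if line.startswith("===="):
            # A separator closes the current test case, if there is one.
            if current:
                cases.append(current)
            current, section = {}, None
        elif line.startswith("---- "):
            # A section header such as "---- QUERY" or "---- RESULTS".
            section = line[5:].strip()
            current[section] = []
        elif section is not None:
            current[section].append(line)
    if current:
        cases.append(current)
    return [{name: "\n".join(body) for name, body in case.items()}
            for case in cases]


SAMPLE = """====
---- QUERY
# Regression test for IMPALA-938
select smallint_col, int_col, (cast("1970-01-01" as timestamp) +
interval smallint_col days)
from functional.alltypes where smallint_col = 1 limit 1
---- RESULTS
1,1,1970-01-02 00:00:00
---- TYPES
smallint, int, timestamp
====
"""

cases = parse_test_file(SAMPLE)
print(cases[0]["RESULTS"])
print(cases[0]["TYPES"])
```

Seen this way, adding a test case to exprs.test means appending one more
block with at least a QUERY, a RESULTS, and a TYPES section.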

First, let's run all of the exprs tests to see that they pass. You can
see them called in tests/query_test/test_exprs.py. The Python scripts
in tests/ can run these .test files by calling ImpalaTestSuite's
run_test_case() method with an abbreviated name of the .test file. In
test_exprs.py, this looks like

self.run_test_case('QueryTest/exprs', vector)

That call is in the method TestExprs.test_exprs(); you can invoke it with:

./bin/impala-py.test \
  tests/query_test/test_exprs.py::TestExprs::test_exprs --sanity

This should take about 40 seconds and should pass, indicated by a
return value of 0 and a green line printed to the terminal reading:

...== 1 passed in 39.85 seconds ==...

Now add a test case, following the example from the ticket and the
format in exprs.test. Run the test again; it should fail.

Fix the bug and run the test again. Once the test is passing, follow
the instructions on the wiki to send your patch for code review:
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala


New Impala contributors: getting started

2017-09-03 Thread Jim Apple
If you are new to Impala and would like to contribute, you can start
by setting up an Impala development environment. For this you'll need
an Ubuntu 14.04 or 16.04 machine. Then just:

git clone https://git-wip-us.apache.org/repos/asf/incubator-impala.git ~/Impala
source ~/Impala/bin/bootstrap_development.sh

This will take about two hours to run, but when it is done you will be
ready to start developing Impala!

If you are then ready to start developing, take a look at Impala's
newbie issues: https://issues.apache.org/jira/issues/?filter=12341668.
If you find one you like, feel free to email d...@impala.apache.org to
discuss it, or dig right in. Before you start, though, register on the
Apache JIRA system and ask someone on dev@ to assign the ticket to
you. That way you don't end up in a race condition with another new
contributor! :-D

More detailed instructions on Impala's contribution process are
available on the wiki:
https://cwiki.apache.org/confluence/display/IMPALA/Contributing+to+Impala

If you don't have an Ubuntu 14.04 or 16.04 environment available, you
can use Docker. First, install Docker as you normally would. Then,

docker pull ubuntu:16.04
docker run --privileged --interactive --tty --name impala-dev ubuntu:16.04 bash

Now, within the container:

apt-get update
apt-get install sudo
adduser --disabled-password --gecos '' impdev
echo 'impdev ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers
su - impdev

Then, as impdev in the container:

sudo apt-get --yes install git
git clone https://git-wip-us.apache.org/repos/asf/incubator-impala.git ~/Impala
source ~/Impala/bin/bootstrap_development.sh

When that's done, start developing! When you're ready to pause, in a
new terminal in the host:

docker commit impala-dev && docker stop impala-dev

When you're ready to get back to work:

docker start --interactive impala-dev

If instead of committing your work and stopping the container, you
just want to detach from it, use ctrl-p ctrl-q. You can re-attach
using the start command.


New Impala contributors: outreach

2017-09-03 Thread Jim Apple
I'd like to encourage people who haven't contributed to Impala before
to get started making patches. One way to do that would be to engage
with communities that reach out to new contributors.

It appears that Your First PR has a mechanism to invite new
contributors to the project by filing a ticket:

https://github.com/yourfirstpr/yourfirstpr.github.io/issues?q=is%3Aopen+is%3Aissue

Any objections to me filing a ticket there and pointing to
https://issues.apache.org/jira/issues/?filter=12341668? That's
"newbie" open bugs with no assignee.

We could also tweet at https://twitter.com/yourfirstpr?lang=en.

I'm interested in also reaching out via
https://github.com/up-for-grabs/up-for-grabs.net#add-a-project and
maybe https://helpwanted.apache.org/, if
https://issues.apache.org/jira/browse/COMDEV-225 gets fixed.

I'll go ahead with these in a couple of days, unless I hear any
objections before then.


Re: [VOTE] 2.10.0 release candidate 2 (RC2)

2017-09-02 Thread Jim Apple
Yeah, this jenkins.impala.io environment is pretty stable for DEBUG
builds with core test exploration, but ASAN builds are infrequently
done on it, so there might be a flaky test situation that is either
Jenkins-related or test-related, which doesn't directly implicate the
RC.

I'm still OK with the +1 I cast above.

On Fri, Sep 1, 2017 at 10:54 PM, Bharath Vissapragada
<bhara...@cloudera.com> wrote:
> Jim, thanks for testing the ASAN configuration. I tested it in my own
> environment (centos6) and it has passed. I'm suspecting that either
> something is flaky or probably related to the build environment.
>
> On Thu, Aug 31, 2017 at 11:23 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> I think ASAN doesn't work with FE tests - I mistakenly left them in
>> for this run.
>>
>> We have never before seen two jobs on the same worker in jenkins.impala.io.
>>
>> Here's an ASAN rerun that is failing other e2e tests:
>> https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/222/parameters/
>>
>> With this one, because the machine is still up, you can see no other
>> builds have run on it:
>> https://jenkins.impala.io/computer/ubuntu-16.04%20(i-0c375c963694b400b)/builds
>>
>> exhaustive DEBUG and RELEASE runs passed.
>>
>> On Thu, Aug 31, 2017 at 10:27 PM, Alexander Behm <alex.b...@cloudera.com>
>> wrote:
>> > Thanks for testing, Jim. I looked into your build but could not determine
>> > what happened. I found no ASAN output in the logs. It's interesting that
>> > this test also fails with a crash:
>> >
>> > ERROR at teardown of TestGrantRevoke.test_role_update[exec_option:
>> > {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
>> > 'disable_codegen': False, 'abort_on_error': 1,
>> > 'exec_single_node_rows_threshold': 0} | table_format: text/none]
>> >
>> > Looks like these two completely unrelated tests both failed in a strange
>> > way.
>> >
>> > It's also interesting that many FE unit tests failed with NoClassDefFound
>> > (very unusual).
>> >
>> > Could it be that two jobs were scheduled on the same worker? Alternatively,
>> > maybe another job did not clean up after itself and your run landed on an
>> > unclean workspace leading to problems? This smells like an infra problem to
>> > me,
>> > Maybe do another run?
>> >
>> >
>> >
>> >
>> >
>> > On Thu, Aug 31, 2017 at 8:02 PM, Jim Apple <jbap...@cloudera.com> wrote:
>> >
>> >> This ASAN testing failed:
>> >>
>> >> https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/218/consoleFull
>> >> failed in
>> >> query_test/test_udfs.py::TestUdfExecution::test_ir_functions[exec_option:
>> >> {'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
>> >> 'exec_single_node_rows_threshold': 0, 'enable_expr_rewrites': False} |
>> >> table_format: text/none]. Looks like a crash to me.
>> >>
>> >> I didn't see a corresponding bug. Has anyone else seen something like
>> >> this before?
>> >>
>> >> On Thu, Aug 31, 2017 at 1:15 PM, Jim Apple <jbap...@cloudera.com> wrote:
>> >> > BTW, this Jenkins job includes the log of what it tested, which
>> >> > follows the Release Guide, so you should be able to follow along OK.
>> >> > All committers should have access to run that job, too, if you don't
>> >> > trust my result.
>> >> >
>> >> > I am also testing exhaustive (not just core) tests at
>> >> > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/, builds
>> >> > 218-220 (once the instances come up).
>> >> >
>> >> > I tested with ImpalaLZO at this commit:
>> >> > https://github.com/cloudera/impala-lzo/tree/62c4b94ed6e89f0ce2068280864546ebccfb0729
>> >> >
>> >> > On Thu, Aug 31, 2017 at 6:19 AM, Jim Apple <jbap...@cloudera.com> wrote:
>> >> >> +1
>> >> >>
>> >> >> https://jenkins.impala.io/job/release-test/20/console
>> >> >>
>> >> >> This tested following
>> >> >> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
>> >> >> and https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#

Re: [VOTE] 2.10.0 release candidate 2 (RC2)

2017-09-01 Thread Jim Apple
I think ASAN doesn't work with FE tests - I mistakenly left them in
for this run.

We have never before seen two jobs on the same worker in jenkins.impala.io.

Here's an ASAN rerun that is failing other e2e tests:
https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/222/parameters/

With this one, because the machine is still up, you can see no other
builds have run on it:
https://jenkins.impala.io/computer/ubuntu-16.04%20(i-0c375c963694b400b)/builds

exhaustive DEBUG and RELEASE runs passed.

On Thu, Aug 31, 2017 at 10:27 PM, Alexander Behm <alex.b...@cloudera.com> wrote:
> Thanks for testing, Jim. I looked into your build but could not determine
> what happened. I found no ASAN output in the logs. It's interesting that
> this test also fails with a crash:
>
> ERROR at teardown of TestGrantRevoke.test_role_update[exec_option:
> {'batch_size': 0, 'num_nodes': 0, 'disable_codegen_rows_threshold': 0,
> 'disable_codegen': False, 'abort_on_error': 1,
> 'exec_single_node_rows_threshold': 0} | table_format: text/none]
>
> Looks like these two completely unrelated tests both failed in a strange
> way.
>
> It's also interesting that many FE unit tests failed with NoClassDefFound
> (very unusual).
>
> Could it be that two jobs were scheduled on the same worker? Alternatively,
> maybe another job did not clean up after itself and your run landed on an
> unclean workspace leading to problems? This smells like an infra problem to
> me,
> Maybe do another run?
>
>
>
>
>
> On Thu, Aug 31, 2017 at 8:02 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> This ASAN testing failed:
>>
>> https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/218/consoleFull
>> failed in
>> query_test/test_udfs.py::TestUdfExecution::test_ir_functions[exec_option:
>> {'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
>> 'exec_single_node_rows_threshold': 0, 'enable_expr_rewrites': False} |
>> table_format: text/none]. Looks like a crash to me.
>>
>> I didn't see a corresponding bug. Has anyone else seen something like
>> this before?
>>
>> On Thu, Aug 31, 2017 at 1:15 PM, Jim Apple <jbap...@cloudera.com> wrote:
>> > BTW, this Jenkins job includes the log of what it tested, which
>> > follows the Release Guide, so you should be able to follow along OK.
>> > All committers should have access to run that job, too, if you don't
>> > trust my result.
>> >
>> > I am also testing exhaustive (not just core) tests at
>> > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/, builds
>> > 218-220 (once the instances come up).
>> >
>> > I tested with ImpalaLZO at this commit:
>> > https://github.com/cloudera/impala-lzo/tree/62c4b94ed6e89f0ce2068280864546ebccfb0729
>> >
>> > On Thu, Aug 31, 2017 at 6:19 AM, Jim Apple <jbap...@cloudera.com> wrote:
>> >> +1
>> >>
>> >> https://jenkins.impala.io/job/release-test/20/console
>> >>
>> >> This tested following
>> >> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
>> >> and https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate.
>> >>
>> >> On Wed, Aug 30, 2017 at 11:35 PM, Bharath Vissapragada
>> >> <bhara...@cloudera.com> wrote:
>> >>> This is a vote to release Impala 2.10.0.
>> >>>
>> >>> - The artefacts for testing can be downloaded from <
>> >>> https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC2/>
>> >>>
>> >>> - The git tag for this release candidate is 2.10.0-rc2 and treehash is
>> >>> visible at
>> >>> <
>> >>> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=tree;hb=23d79462da5d0108709e8b1399c97606f4ebdf92
>> >>>>
>> >>>
>> >>> Please vote +1 or -1. -1 votes should be accompanied by an explanation of
>> >>> the reason. Only PPMC members and mentors have binding votes, but other
>> >>> community members are encouraged to cast non-binding votes. This vote will
>> >>> pass if there are 3 binding +1 votes and more binding +1 votes than -1
>> >>> votes.
>> >>>
>> >>> This wiki page describes how to check the release before you vote:
>> >>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>> >>>
>> >>> The vote will be open until the end of day, September 5th, Pacific time
>> >>> zone (UTC-08:00).
>> >>> Once the vote passes the Impala PPMC vote, it still must pass the incubator
>> >>> PMC vote before a release is made.
>>


Re: [VOTE] 2.10.0 release candidate 2 (RC2)

2017-08-31 Thread Jim Apple
This ASAN testing failed:

https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/218/consoleFull
failed in 
query_test/test_udfs.py::TestUdfExecution::test_ir_functions[exec_option:
{'disable_codegen_rows_threshold': 0, 'disable_codegen': False,
'exec_single_node_rows_threshold': 0, 'enable_expr_rewrites': False} |
table_format: text/none]. Looks like a crash to me.

I didn't see a corresponding bug. Has anyone else seen something like
this before?

On Thu, Aug 31, 2017 at 1:15 PM, Jim Apple <jbap...@cloudera.com> wrote:
> BTW, this Jenkins job includes the log of what it tested, which
> follows the Release Guide, so you should be able to follow along OK.
> All committers should have access to run that job, too, if you don't
> trust my result.
>
> I am also testing exhaustive (not just core) tests at
> https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/, builds
> 218-220 (once the instances come up).
>
> I tested with ImpalaLZO at this commit:
> https://github.com/cloudera/impala-lzo/tree/62c4b94ed6e89f0ce2068280864546ebccfb0729
>
> On Thu, Aug 31, 2017 at 6:19 AM, Jim Apple <jbap...@cloudera.com> wrote:
>> +1
>>
>> https://jenkins.impala.io/job/release-test/20/console
>>
>> This tested following
>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
>> and 
>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate.
>>
>> On Wed, Aug 30, 2017 at 11:35 PM, Bharath Vissapragada
>> <bhara...@cloudera.com> wrote:
>>> This is a vote to release Impala 2.10.0.
>>>
>>> - The artefacts for testing can be downloaded from <
>>> https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC2/>
>>>
>>> - The git tag for this release candidate is 2.10.0-rc2 and treehash is
>>> visible at
>>> <
>>> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=tree;hb=23d79462da5d0108709e8b1399c97606f4ebdf92
>>>>
>>>
>>> Please vote +1 or -1. -1 votes should be accompanied by an explanation of
>>> the reason. Only PPMC members and mentors have binding votes, but other
>>> community members are encouraged to cast non-binding votes. This vote will
>>> pass if there are 3 binding +1 votes and more binding +1 votes than -1
>>> votes.
>>>
>>> This wiki page describes how to check the release before you vote:
>>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>>>
>>> The vote will be open until the end of day, September 5th, Pacific time
>>> zone (UTC-08:00).
>>> Once the vote passes the Impala PPMC vote, it still must pass the incubator
>>> PMC vote before a release is made.


Re: jenkins.impala.io pre-existing workspace

2017-08-31 Thread Jim Apple
Also, to be clear, I don't have the cycles to lead the fix-the-cleanup
task at the moment.

On Wed, Aug 30, 2017 at 4:45 PM, Jim Apple <jbap...@cloudera.com> wrote:
> The workspace cleanup isn't working - see the last bit of any recent
> ub1604 job: 
> https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/206/console
>
> 03:56:40.920 [WS-CLEANUP] Deleting project workspace...Cannot delete
> workspace :remote file operation failed: /home/ubuntu at
> hudson.remoting.Channel@4384d5b9:ubuntu-16.04 (i-032d527b9c801df4c):
> java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times
> (of a maximum of 3) waiting 0.1 sec between attempts.
> 03:56:48.161 ERROR: Step ‘Delete workspace when build is done’ failed:
> Cannot delete workspace: remote file operation failed: /home/ubuntu at
> hudson.remoting.Channel@4384d5b9:ubuntu-16.04 (i-032d527b9c801df4c):
> java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times
> (of a maximum of 3) waiting 0.1 sec between attempts.
>
> The workspace is $HOME, so you can't just delete it without being root.
>
> This could be changed to
>
> 1. A post-build script to "rm -rf ~/*". This doesn't reset everything,
> though - the job makes changes to other parts of the filesystem.
>
> 2. A post-build script to "sudo shutdown -h now" to make sure ec2
> instances are not re-used. I'm not sure how Jenkins would feel about
> this. :-)
>
> 3. A post-build script to move $HOME to some archived location on the
> disk, to preserve debuggability.
>
> 4. A bash trap in the script to do one of the above.
>
> 5. Run the whole thing in a docker in the build machine, then delete
> the container when the script is done. Or don't, if there's enough
> disk space to not worry about that.
>
> 6. Do all of the work in a workspace inside $HOME. This would require
> some changes to bootstrap_development.sh.
>
> #5 is the most hermetic, I'd guess.
>
> On Thu, Aug 24, 2017 at 8:29 AM, Michael Brown <mi...@cloudera.com> wrote:
>> Looks like someone has done this.
>>
>> On Wed, Aug 23, 2017 at 8:16 PM, Alexander Behm <alex.b...@cloudera.com>
>> wrote:
>>
>>> Yes, let's please add the post-build action for sanity and consistency with
>>> our other jobs.
>>>
>>> On Wed, Aug 23, 2017 at 7:42 PM, Tim Armstrong <tarmstr...@cloudera.com>
>>> wrote:
>>>
>>> > Maybe the workspace just got left in a weird state - I think in most
>>> > cases "git init" followed by checking out a branch and doing a clean
>>> > would work.
>>> >
>>> > Should we add the delete workspace post-build action?
>>> >
>>> > On Wed, Aug 23, 2017 at 5:32 PM, Michael Brown <mi...@cloudera.com>
>>> > wrote:
>>> >
>>> > > Not a known issue. I noticed ubuntu-16.04-from-scratch is not set to
>>> > > clean up its workspace, and its config has not been touched since
>>> > > Aug 11. It seems strange we only saw this now
>>> > >
>>> > > On Wed, Aug 23, 2017 at 5:25 PM, Tim Armstrong
>>> > > <tarmstr...@cloudera.com> wrote:
>>> > >
>>> > > > Is this a known problem? My job failed because the Impala repo
>>> > > > already existed on the machine:
>>> > > >
>>> > > > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/164/
>>> > > >
>>> > > > 23:00:24 + /usr/bin/git init /home/ubuntu/Impala
>>> > > > 23:00:24 Reinitialized existing Git repository in /home/ubuntu/Impala/.git/
>>> > > >
>>> > > > 23:02:18 + for ITER in '$(seq 1 10)'
>>> > > > 23:02:18 + echo 'ATTEMPT: 1'
>>> > > > 23:02:18 ATTEMPT: 1
>>> > > > 23:02:18 + /usr/bin/git checkout FETCH_HEAD
>>> > > > 23:02:18 + cat /home/ubuntu/Impala/tmp.3tYBn0GUga
>>> > > > 23:02:18 23:02:18.712300 git.c:344 trace: built-in: git 'checkout' 'FETCH_HEAD'
>>> > > > 23:02:18 error: The following untracked working tree files would be overwritten by checkout:
>>> > > > 23:02:18   .clang-format
>>> > > > 23:02:18   .clang-tidy
>>> > > > 23:02:18   .gitignore
>>> > > > 23:02:18   CMakeLists.txt
>>> > > > 23:02:18   DISCLAIMER
>>> > > > 23:02:18   EXPORT_CONTROL.md
>>> > > > 23:02:18   LICENSE.txt
>>> > > > 23:02:18   LOGS.md
>>> > > > 23:02:18   NOTICE.txt
>>> > > > 23:02:18   README.md
>>> > > > 23:02:18   be/.gitignore
>>> > > > 23:02:18   be/.impala.doxy
>>> > > > 23:02:18   be/CMakeLists.txt
>>> > > > 23:02:18   be/src/benchmarks/CMakeLists.txt
>>> > > > 23:02:18   be/src/benchmarks/atod-benchmark.cc
>>> > > > 23:02:18   be/src/benchmarks/atof-benchmark.cc
>>> > > > 23:02:18   be/src/benchmarks/atoi-benchmark.cc
>>> > > > 23:02:18   be/src/benchmarks/bit-packing-benchmark.cc
>>> > > > 23:02:18   be/src/benchmarks/bitmap-benchmark.cc
>>> > > > ...
>>> > > >
>>> > >
>>> >
>>>


Re: [VOTE] 2.10.0 release candidate 2 (RC2)

2017-08-31 Thread Jim Apple
BTW, this Jenkins job includes the log of what it tested, which
follows the Release Guide, so you should be able to follow along OK.
All committers should have access to run that job, too, if you don't
trust my result.

I am also testing exhaustive (not just core) tests at
https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/, builds
218-220 (once the instances come up).

I tested with ImpalaLZO at this commit:
https://github.com/cloudera/impala-lzo/tree/62c4b94ed6e89f0ce2068280864546ebccfb0729

On Thu, Aug 31, 2017 at 6:19 AM, Jim Apple <jbap...@cloudera.com> wrote:
> +1
>
> https://jenkins.impala.io/job/release-test/20/console
>
> This tested following
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
> and 
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate.
>
> On Wed, Aug 30, 2017 at 11:35 PM, Bharath Vissapragada
> <bhara...@cloudera.com> wrote:
>> This is a vote to release Impala 2.10.0.
>>
>> - The artefacts for testing can be downloaded from <
>> https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC2/>
>>
>> - The git tag for this release candidate is 2.10.0-rc2 and treehash is
>> visible at
>> <
>> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=tree;hb=23d79462da5d0108709e8b1399c97606f4ebdf92
>>>
>>
>> Please vote +1 or -1. -1 votes should be accompanied by an explanation of
>> the reason. Only PPMC members and mentors have binding votes, but other
>> community members are encouraged to cast non-binding votes. This vote will
>> pass if there are 3 binding +1 votes and more binding +1 votes than -1
>> votes.
>>
>> This wiki page describes how to check the release before you vote:
>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>>
>> The vote will be open until the end of day, September 5th, Pacific time
>> zone (UTC-08:00).
>> Once the vote passes the Impala PPMC vote, it still must pass the incubator
>> PMC vote before a release is made.


Re: [VOTE] 2.10.0 release candidate 2 (RC2)

2017-08-31 Thread Jim Apple
Clarification: +1 (binding)

As a reminder, binding votes in the PPMC vote are not binding in the
IPMC vote unless the voter is also an IPMC member.

Since I am not, this is a PPMC "+1 (binding)" only.

On Thu, Aug 31, 2017 at 6:19 AM, Jim Apple <jbap...@cloudera.com> wrote:
> +1
>
> https://jenkins.impala.io/job/release-test/20/console
>
> This tested following
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
> and 
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate.
>
> On Wed, Aug 30, 2017 at 11:35 PM, Bharath Vissapragada
> <bhara...@cloudera.com> wrote:
>> This is a vote to release Impala 2.10.0.
>>
>> - The artefacts for testing can be downloaded from <
>> https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC2/>
>>
>> - The git tag for this release candidate is 2.10.0-rc2 and treehash is
>> visible at
>> <
>> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=tree;hb=23d79462da5d0108709e8b1399c97606f4ebdf92
>>>
>>
>> Please vote +1 or -1. -1 votes should be accompanied by an explanation of
>> the reason. Only PPMC members and mentors have binding votes, but other
>> community members are encouraged to cast non-binding votes. This vote will
>> pass if there are 3 binding +1 votes and more binding +1 votes than -1
>> votes.
>>
>> This wiki page describes how to check the release before you vote:
>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>>
>> The vote will be open until the end of day, September 5th, Pacific time
>> zone (UTC-08:00).
>> Once the vote passes the Impala PPMC vote, it still must pass the incubator
>> PMC vote before a release is made.


Re: [VOTE] 2.10.0 release candidate 2 (RC2)

2017-08-31 Thread Jim Apple
+1

https://jenkins.impala.io/job/release-test/20/console

This tested following
https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests
and 
https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate.

On Wed, Aug 30, 2017 at 11:35 PM, Bharath Vissapragada
 wrote:
> This is a vote to release Impala 2.10.0.
>
> - The artefacts for testing can be downloaded from <
> https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC2/>
>
> - The git tag for this release candidate is 2.10.0-rc2 and treehash is
> visible at
> <
> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=tree;hb=23d79462da5d0108709e8b1399c97606f4ebdf92
>>
>
> Please vote +1 or -1. -1 votes should be accompanied by an explanation of
> the reason. Only PPMC members and mentors have binding votes, but other
> community members are encouraged to cast non-binding votes. This vote will
> pass if there are 3 binding +1 votes and more binding +1 votes than -1
> votes.
>
> This wiki page describes how to check the release before you vote:
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>
> The vote will be open until the end of day, September 5th, Pacific time
> zone (UTC-08:00).
> Once the vote passes the Impala PPMC vote, it still must pass the incubator
> PMC vote before a release is made.


Re: jenkins.impala.io pre-existing workspace

2017-08-30 Thread Jim Apple
The workspace cleanup isn't working - see the last bit of any recent
ub1604 job: 
https://jenkins.impala.io/view/Utility/job/ubuntu-16.04-from-scratch/206/console

03:56:40.920 [WS-CLEANUP] Deleting project workspace...Cannot delete
workspace :remote file operation failed: /home/ubuntu at
hudson.remoting.Channel@4384d5b9:ubuntu-16.04 (i-032d527b9c801df4c):
java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times
(of a maximum of 3) waiting 0.1 sec between attempts.
03:56:48.161 ERROR: Step ‘Delete workspace when build is done’ failed:
Cannot delete workspace: remote file operation failed: /home/ubuntu at
hudson.remoting.Channel@4384d5b9:ubuntu-16.04 (i-032d527b9c801df4c):
java.io.IOException: Unable to delete '/home/ubuntu'. Tried 3 times
(of a maximum of 3) waiting 0.1 sec between attempts.

The workspace is $HOME, so you can't just delete it without being root.

This could be changed to

1. A post-build script to "rm -rf ~/*". This doesn't reset everything,
though - the job makes changes to other parts of the filesystem.

2. A post-build script to "sudo shutdown -h now" to make sure ec2
instances are not re-used. I'm not sure how Jenkins would feel about
this. :-)

3. A post-build script to move $HOME to some archived location on the
disk, to preserve debuggability.

4. A bash trap in the script to do one of the above.

5. Run the whole thing in a docker in the build machine, then delete
the container when the script is done. Or don't, if there's enough
disk space to not worry about that.

6. Do all of the work in a workspace inside $HOME. This would require
some changes to bootstrap_development.sh.

#5 is the most hermetic, I'd guess.
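
As a rough illustration of options 3 and 4 together, here is a minimal
sketch of a bash trap that moves the workspace aside when the job ends.
The function name run_job_with_archive and both directory arguments are
hypothetical and are not taken from the actual Jenkins job. The sketch
assumes the archive directory does not already hold a directory with the
same name as the workspace.

```shell
#!/usr/bin/env bash
# Sketch only: run_job_with_archive and its directory arguments are
# hypothetical names, not part of the real Jenkins job configuration.

# Usage: run_job_with_archive WORKSPACE ARCHIVE_DIR COMMAND [ARGS...]
run_job_with_archive() {
  local workspace="$1" archive_dir="$2"
  shift 2
  (
    # An EXIT trap set inside the subshell fires when the subshell ends,
    # whether the job succeeded or failed. It moves the workspace into
    # the archive directory instead of deleting it, so it can still be
    # inspected afterwards.
    trap 'mkdir -p "$archive_dir" && mv "$workspace" "$archive_dir/"' EXIT
    # Run the job; on failure, exit the subshell with the job's status.
    "$@" || exit $?
  )
}
```

The function returns the job's own exit status, so a failing build is
still reported as failing after its workspace has been archived. Like
option 1, this does not undo changes made outside the workspace.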

On Thu, Aug 24, 2017 at 8:29 AM, Michael Brown  wrote:
> Looks like someone has done this.
>
> On Wed, Aug 23, 2017 at 8:16 PM, Alexander Behm 
> wrote:
>
>> Yes, let's please add the post-build action for sanity and consistency with
>> our other jobs.
>>
>> On Wed, Aug 23, 2017 at 7:42 PM, Tim Armstrong 
>> wrote:
>>
>> > Maybe the workspace just got left in a weird state - I think in most
>> > cases "git init" followed by checking out a branch and doing a clean
>> > would work.
>> >
>> > Should we add the delete workspace post-build action?
>> >
>> > On Wed, Aug 23, 2017 at 5:32 PM, Michael Brown wrote:
>> >
>> > > Not a known issue. I noticed ubuntu-16.04-from-scratch is not set to
>> > > clean up its workspace, and its config has not been touched since
>> > > Aug 11. It seems strange we only saw this now
>> > >
>> > > On Wed, Aug 23, 2017 at 5:25 PM, Tim Armstrong
>> > > <tarmstr...@cloudera.com> wrote:
>> > >
>> > > > Is this a known problem? My job failed because the Impala repo
>> > > > already existed on the machine:
>> > > >
>> > > > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/164/
>> > > >
>> > > > 23:00:24 + /usr/bin/git init /home/ubuntu/Impala
>> > > > 23:00:24 Reinitialized existing Git repository in /home/ubuntu/Impala/.git/
>> > > >
>> > > > 23:02:18 + for ITER in '$(seq 1 10)'
>> > > > 23:02:18 + echo 'ATTEMPT: 1'
>> > > > 23:02:18 ATTEMPT: 1
>> > > > 23:02:18 + /usr/bin/git checkout FETCH_HEAD
>> > > > 23:02:18 + cat /home/ubuntu/Impala/tmp.3tYBn0GUga
>> > > > 23:02:18 23:02:18.712300 git.c:344 trace: built-in: git 'checkout' 'FETCH_HEAD'
>> > > > 23:02:18 error: The following untracked working tree files would be overwritten by checkout:
>> > > > 23:02:18   .clang-format
>> > > > 23:02:18   .clang-tidy
>> > > > 23:02:18   .gitignore
>> > > > 23:02:18   CMakeLists.txt
>> > > > 23:02:18   DISCLAIMER
>> > > > 23:02:18   EXPORT_CONTROL.md
>> > > > 23:02:18   LICENSE.txt
>> > > > 23:02:18   LOGS.md
>> > > > 23:02:18   NOTICE.txt
>> > > > 23:02:18   README.md
>> > > > 23:02:18   be/.gitignore
>> > > > 23:02:18   be/.impala.doxy
>> > > > 23:02:18   be/CMakeLists.txt
>> > > > 23:02:18   be/src/benchmarks/CMakeLists.txt
>> > > > 23:02:18   be/src/benchmarks/atod-benchmark.cc
>> > > > 23:02:18   be/src/benchmarks/atof-benchmark.cc
>> > > > 23:02:18   be/src/benchmarks/atoi-benchmark.cc
>> > > > 23:02:18   be/src/benchmarks/bit-packing-benchmark.cc
>> > > > 23:02:18   be/src/benchmarks/bitmap-benchmark.cc
>> > > > ...
>> > > >
>> > >
>> >
>>


Re: [VOTE] 2.10.0 release candidate 1 (RC1)

2017-08-30 Thread Jim Apple
I ran some release tests following the instructions at
https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
and
https://cwiki.apache.org/confluence/display/IMPALA/How+to+load+and+run+Impala+tests.
Everything passed.

I would +1, but I notice downthread that there is going to be an rc2,
so: +0 for now.

On Sun, Aug 27, 2017 at 10:32 PM, Bharath Vissapragada
 wrote:
> This is a vote to release Impala 2.10.0.
>
> - The artefacts for testing can be downloaded from <
> https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC1/>.
>
> - The git tag for this release candidate is 2.10.0-rc1 and tree hash is
> visible at
> <
> https://git-wip-us.apache.org/repos/asf?p=incubator-impala.git;a=tree;hb=2a7c8b9011905bfeb21b0610f0739f9df9daacef
>>
>
> Please vote +1 or -1. -1 votes should be accompanied by an explanation of
> the reason. Only PPMC members and mentors have binding votes, but other
> community members are encouraged to cast non-binding votes. This vote will
> pass if there are 3 binding +1 votes and more binding +1 votes than -1
> votes.
>
> This wiki page describes how to check the release before you vote:
> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>
> The vote will be open until the end of Wednesday, August 30, Pacific time
> zone (UTC-08:00).
> Once the vote passes the Impala PPMC vote, it still must pass the incubator
> PMC vote before a release is made.


Re: IGFS (Ignite FS) support

2017-08-30 Thread Jim Apple
Doesn't look like there is a JIRA yet:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20text%20~%20%22ignite%22

https://issues.apache.org/jira/issues/?jql=project%20%3D%20IMPALA%20AND%20text%20~%20%22igfs%22

You might want to file one as a single point of coordination for
people interested in this work.

On Wed, Aug 30, 2017 at 2:59 AM, Andrey Kuznetsov
 wrote:
> Hi team,
> My team has run into the following issue:
> Impala doesn't work with IGFS because IgniteHadoopFileSystem is not
> accepted as a writable filesystem: it does not extend
> org.apache.hadoop.hdfs.DistributedFileSystem and is not in the list of
> other accepted filesystems:
>
>   /**
>    * Returns true iff the given location is on a filesystem that Impala can
>    * write to.
>    */
>   public static boolean isImpalaWritableFilesystem(String location)
>       throws IOException {
>     Path path = new Path(location);
>     return (FileSystemUtil.isDistributedFileSystem(path) ||
>         FileSystemUtil.isLocalFileSystem(path) ||
>         FileSystemUtil.isS3AFileSystem(path) ||
>         FileSystemUtil.isADLFileSystem(path));
>   }
>
>
> Does anybody know if we plan to support IGFS?
>
> Best regards,
> ANDREY KUZNETSOV


Re: Re: Re: Re: Load Data from HDFS to Parquet

2017-08-14 Thread Jim Apple
I do not know of a way to get Impala to read data that it does not
consider a table. Are you concerned about the overhead of Impala's
maintenance of the metadata?

On Mon, Aug 14, 2017 at 7:57 PM, sky <x_h...@163.com> wrote:
> Thank you,
> I am currently doing it this way. But is there any way to load data from
> HDFS into a Parquet table without going through an external or internal
> table?
>
>
>
>
>
>
> At 2017-08-15 10:53:55, "Jim Apple" <jbap...@cloudera.com> wrote:
>>http://impala.apache.org/docs/build/html/topics/impala_create_table.html#create_table
>>
>>I think you can follow these two steps in order:
>>
>>1. Make an external table referring to the CSV
>>
>>2. Use CREATE TABLE AS SELECT to make a parquet table
>>
>>On Mon, Aug 14, 2017 at 7:48 PM, sky <x_h...@163.com> wrote:
>>> csv file on the HDFS.
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> At 2017-08-15 10:42:13, "Jim Apple" <jbap...@cloudera.com> wrote:
>>>>Is the data in a format that Impala can read?
>>>>
>>>>On Mon, Aug 14, 2017 at 7:31 PM, sky <x_h...@163.com> wrote:
>>>>> Thank you,
>>>>> I read the document, but it only describes the conversion of internal
>>>>> and external tables. How can I load data directly into a Parquet
>>>>> table? Could you provide an example? Thank you!
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> At 2017-08-15 03:25:43, "Jim Apple" <jbap...@cloudera.com> wrote:
>>>>>>Maybe this will help:
>>>>>>
>>>>>>http://impala.apache.org/docs/build/html/topics/impala_create_table.html#create_table
>>>>>>
>>>>>>"Although the EXTERNAL and LOCATION clauses are often specified
>>>>>>together, LOCATION is optional for external tables, and you can also
>>>>>>specify LOCATION for internal tables. The difference is all about
>>>>>>whether Impala "takes control" of the underlying data files and moves
>>>>>>them when you rename the table, or deletes them when you drop the
>>>>>>table. For more about internal and external tables and how they
>>>>>>interact with the LOCATION attribute, see Overview of Impala Tables."
>>>>>>
>>>>>>On Thu, Aug 10, 2017 at 10:45 PM, sky <x_h...@163.com> wrote:
>>>>>>> Hi all,
>>>>>>> Is there any way to load data from hdfs to parquet table not via
>>>>>>> external table or inner table?
>>>
>>>
>>>
>>>
>
>
>
>


Re: Re: Re: Load Data from HDFS to Parquet

2017-08-14 Thread Jim Apple
http://impala.apache.org/docs/build/html/topics/impala_create_table.html#create_table

I think you can follow these two steps in order:

1. Make an external table referring to the CSV

2. Use CREATE TABLE AS SELECT to make a parquet table
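
To make the two steps concrete, here is a minimal sketch that prints the
pair of statements. The table names csv_staging and parquet_copy, the
column list, and the HDFS directory are all hypothetical placeholders;
the real schema depends on the CSV.

```shell
#!/usr/bin/env bash
# Sketch only: csv_staging, parquet_copy, the column list, and the HDFS
# directory are hypothetical placeholders, not taken from the thread.

# Prints the two statements; redirect the output to a file to run them.
print_csv_to_parquet_sql() {
  cat <<'SQL'
-- Step 1: external table over the directory that holds the CSV file(s).
CREATE EXTERNAL TABLE csv_staging (id BIGINT, name STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE
  LOCATION '/user/example/csv_dir';

-- Step 2: CREATE TABLE AS SELECT writes the rows out as Parquet.
CREATE TABLE parquet_copy STORED AS PARQUET AS
  SELECT * FROM csv_staging;
SQL
}

print_csv_to_parquet_sql
```

Saving the output to a file and passing it to impala-shell with -f should
run both statements. Because csv_staging is external, dropping it
afterwards leaves the CSV files in place.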

On Mon, Aug 14, 2017 at 7:48 PM, sky <x_h...@163.com> wrote:
> csv file on the HDFS.
>
>
>
>
>
>
>
> At 2017-08-15 10:42:13, "Jim Apple" <jbap...@cloudera.com> wrote:
>>Is the data in a format that Impala can read?
>>
>>On Mon, Aug 14, 2017 at 7:31 PM, sky <x_h...@163.com> wrote:
>>> Thank you,
>>> I read the document, but it only describes the conversion of internal
>>> and external tables. How can I load data directly into a Parquet
>>> table? Could you provide an example? Thank you!
>>>
>>>
>>>
>>>
>>>
>>>
>>>
>>> At 2017-08-15 03:25:43, "Jim Apple" <jbap...@cloudera.com> wrote:
>>>>Maybe this will help:
>>>>
>>>>http://impala.apache.org/docs/build/html/topics/impala_create_table.html#create_table
>>>>
>>>>"Although the EXTERNAL and LOCATION clauses are often specified
>>>>together, LOCATION is optional for external tables, and you can also
>>>>specify LOCATION for internal tables. The difference is all about
>>>>whether Impala "takes control" of the underlying data files and moves
>>>>them when you rename the table, or deletes them when you drop the
>>>>table. For more about internal and external tables and how they
>>>>interact with the LOCATION attribute, see Overview of Impala Tables."
>>>>
>>>>On Thu, Aug 10, 2017 at 10:45 PM, sky <x_h...@163.com> wrote:
>>>>> Hi all,
>>>>> Is there any way to load data from hdfs to parquet table not via
>>>>> external table or inner table?
>
>
>
>


Re: Re: Load Data from HDFS to Parquet

2017-08-14 Thread Jim Apple
Is the data in a format that Impala can read?

On Mon, Aug 14, 2017 at 7:31 PM, sky <x_h...@163.com> wrote:
> Thank you,
> I read the document, but it only describes the conversion of internal
> and external tables. How can I load data directly into a Parquet
> table? Could you provide an example? Thank you!
>
>
>
>
>
>
>
> At 2017-08-15 03:25:43, "Jim Apple" <jbap...@cloudera.com> wrote:
>>Maybe this will help:
>>
>>http://impala.apache.org/docs/build/html/topics/impala_create_table.html#create_table
>>
>>"Although the EXTERNAL and LOCATION clauses are often specified
>>together, LOCATION is optional for external tables, and you can also
>>specify LOCATION for internal tables. The difference is all about
>>whether Impala "takes control" of the underlying data files and moves
>>them when you rename the table, or deletes them when you drop the
>>table. For more about internal and external tables and how they
>>interact with the LOCATION attribute, see Overview of Impala Tables."
>>
>>On Thu, Aug 10, 2017 at 10:45 PM, sky <x_h...@163.com> wrote:
>>> Hi all,
>>> Is there any way to load data from HDFS into a Parquet table without
>>> going through an external or internal table?


Re: Load Data from HDFS to Parquet

2017-08-14 Thread Jim Apple
Maybe this will help:

http://impala.apache.org/docs/build/html/topics/impala_create_table.html#create_table

"Although the EXTERNAL and LOCATION clauses are often specified
together, LOCATION is optional for external tables, and you can also
specify LOCATION for internal tables. The difference is all about
whether Impala "takes control" of the underlying data files and moves
them when you rename the table, or deletes them when you drop the
table. For more about internal and external tables and how they
interact with the LOCATION attribute, see Overview of Impala Tables."

On Thu, Aug 10, 2017 at 10:45 PM, sky  wrote:
> Hi all,
> Is there any way to load data from HDFS into a Parquet table without going
> through an external or internal table?
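
For readers of this thread, a minimal sketch of the two usual routes, which is why the file format matters. It only prints SQL; nothing here is taken from the thread itself. The table names (`staging_text`, `sales_parquet`) and the HDFS path are hypothetical placeholders. `LOAD DATA` moves files as they are, so it fits only when the files are already Parquet; otherwise the rows have to pass through a table that Impala can read and be rewritten with `INSERT ... SELECT`.

```shell
#!/bin/bash
# Sketch only: table names and the HDFS path are hypothetical placeholders.

# Print the SQL for the case where the files in HDFS are ALREADY Parquet.
# LOAD DATA moves the files into the table's directory without converting them.
load_existing_parquet_sql() {
  local hdfs_path="$1" table="$2"
  printf "LOAD DATA INPATH '%s' INTO TABLE %s;\n" "${hdfs_path}" "${table}"
}

# Print the SQL for the case where the files are in another readable format
# (for example delimited text): expose them through a staging table, then let
# Impala rewrite the rows as Parquet.
convert_to_parquet_sql() {
  local staging="$1" target="$2"
  printf "CREATE TABLE %s LIKE %s STORED AS PARQUET;\n" "${target}" "${staging}"
  printf "INSERT INTO %s SELECT * FROM %s;\n" "${target}" "${staging}"
}

load_existing_parquet_sql /user/etl/incoming sales_parquet
convert_to_parquet_sql staging_text sales_parquet
```

The printed statements could be saved to a file and run with `impala-shell -f`, after replacing the placeholder names.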


Re: [DISCUSS] 2.10.0 release

2017-08-14 Thread Jim Apple
This sounds like a good idea to me. Thank you for volunteering!

On Mon, Aug 14, 2017 at 12:37 AM, Bharath Vissapragada
 wrote:
> Folks,
>
> It has been almost 2 months since we released Apache Impala (incubating)
> 2.9.0 and there have been new feature improvements and a good number of bug
> fixes checked in since then.
>
> I propose that we release 2.10.0 soon and I volunteer to be its release
> manager. Please speak up and let the community know if anyone has any
> objections to this.
>
> Thanks,
> Bharath


Re: Small change to all-build-options job

2017-08-12 Thread Jim Apple
Thanks, Henry!

On Fri, Aug 11, 2017 at 11:08 PM, Henry Robinson  wrote:
> [You can probably skip this mail unless you're interested. TLDR: I slightly
> changed the all-build-options job in a very minor way; let me know if you
> see issues with it]
>
> Impala will shortly have a dependency on libkrb5-dev, which provides
> Kerberos headers and libraries for security on the KRPC branch. As of Jim's
> recent work, the bootstrap_development.sh script installs that on an Ubuntu
> machine as a pre-requisite. I have added it to bootstrap_build.sh myself.
>
> However, the all-build-options job doesn't seem to use either script, and
> does not appear to install any dependencies. This blocks the patch that
> introduces the libkrb5 dependency from passing GVO. So for now, I've added
> the apt-get line from our bootstrap commands to the job script directly.
> This is a temporary change, and over the next few days I'll file a JIRA to
> sort out the bootstrap scripts so we can call it directly (if all GVO jobs
> are succeeding).
>
> My test jobs have passed the apt-get statement (having installed
> libkrb5-dev successfully), and seem to be proceeding fine, so there will
> probably be no visible effect from this change. But if you see suspicious
> failures in all-build-options, let me know.
>
> Thanks,
> Henry


Making a new development environment from scratch

2017-08-11 Thread Jim Apple
bin/bootstrap_development.sh no longer references the chef repo <
https://github.com/awleblang/impala-setup>. My hope is that this makes it
easier to maintain and extend to new Linux distributions.

It now also supports Ubuntu 16.04 and 14.04. I have tested it in EC2,
Google Compute Engine, and Docker.

It currently uses OpenJDK -- version 7 on 14.04 and version 8 on 16.04. It
could be changed to Oracle JDK on 16.04:
https://issues.apache.org/jira/browse/IMPALA-5793

The pre-merge job has been switched away from using <
https://jenkins.impala.io/view/Utility/job/ubuntu-14.04-from-scratch/> to
using . You will
need to rebase before starting the pre-merge job.


Re: Unable to start catalog, but with no error message?

2017-08-10 Thread Jim Apple
I can no longer repro this. ¯\_(ツ)_/¯

On Sun, Jul 30, 2017 at 11:48 AM, Bharath Vissapragada
<bhara...@cloudera.com> wrote:
> How about attaching 'strace' to the catalogd startup to see where it
> crashes (if it's reproducible on demand)? Maybe others have better ideas.
>
> On Sat, Jul 29, 2017 at 3:14 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> To be specific about "no error message": the logs written in the logs
>> directory near the time of the crash are nearly identical to those of a
>> process that got much further on a machine with a configuration that I do
>> not know how to reproduce. The one that ended earlier has output like:
>>
>> Creating /test-warehouse HDFS directory (logging to
>> /home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
>> OK (Took: 0 min 2 sec)
>> Derived params for create-load-data.sh:
>> EXPLORATION_STRATEGY=exhaustive
>> SKIP_METADATA_LOAD=0
>> SKIP_SNAPSHOT_LOAD=0
>> SNAPSHOT_FILE=
>> CM_HOST=
>> REMOTE_LOAD=
>> Starting Impala cluster (logging to
>> /home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
>> FAILED (Took: 0 min 11 sec)
>> '/home/ubuntu/Impala/bin/start-impala-cluster.py
>> --log_dir=/home/ubuntu/Impala/logs/data_loading -s 3' failed. Tail of log:
>> Log for command '/home/ubuntu/Impala/bin/start-impala-cluster.py
>> --log_dir=/home/ubuntu/Impala/logs/data_loading -s 3'
>> Starting State Store logging to
>> /home/ubuntu/Impala/logs/data_loading/statestored.INFO
>> Starting Catalog Service logging to
>> /home/ubuntu/Impala/logs/data_loading/catalogd.INFO
>> Error starting cluster: Unable to start catalogd. Check log or file
>> permissions for more details.
>> Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
>> LOAD_DATA_ARGS=""
>> + cleanup
>> + rm -rf /tmp/tmp.HVkbPNl08R
>>
>>
>> The one that got further in the process (and I think may be dying due to a
>> spurious out-of-disk failure that I am putting on the back-burner for the
>> moment) has the following output:
>>
>> Creating /test-warehouse HDFS directory (logging to
>> /home/ubuntu/Impala/logs/data_loading/create-test-warehouse-dir.log)...
>> OK (Took: 0 min 2 sec)
>> Derived params for create-load-data.sh:
>> EXPLORATION_STRATEGY=exhaustive
>> SKIP_METADATA_LOAD=0
>> SKIP_SNAPSHOT_LOAD=0
>> SNAPSHOT_FILE=
>> CM_HOST=
>> REMOTE_LOAD=
>> Starting Impala cluster (logging to
>> /home/ubuntu/Impala/logs/data_loading/start-impala-cluster.log)...
>> OK (Took: 0 min 11 sec)
>> Setting up HDFS environment (logging to
>> /home/ubuntu/Impala/logs/data_loading/setup-hdfs-env.log)...
>> OK (Took: 0 min 8 sec)
>> Loading custom schemas (logging to
>> /home/ubuntu/Impala/logs/data_loading/load-custom-schemas.log)...
>> OK (Took: 0 min 35 sec)
>> Loading functional-query data (logging to
>> /home/ubuntu/Impala/logs/data_loading/load-functional-query.log)...
>> OK (Took: 37 min 14 sec)
>> Loading TPC-H data (logging to
>> /home/ubuntu/Impala/logs/data_loading/load-tpch.log)...
>> OK (Took: 14 min 11 sec)
>> Loading nested data (logging to
>> /home/ubuntu/Impala/logs/data_loading/load-nested.log)...
>> OK (Took: 3 min 41 sec)
>> Loading TPC-DS data (logging to
>> /home/ubuntu/Impala/logs/data_loading/load-tpcds.log)...
>> FAILED (Took: 5 min 50 sec)
>> 'load-data tpcds core' failed. Tail of log:
>> ss_net_paid_inc_tax,
>> ss_net_profit,
>> ss_sold_date_sk
>> from store_sales_unpartitioned
>> WHERE ss_sold_date_sk < 2451272
>> distribute by ss_sold_date_sk
>> INFO  : Query ID =
>> ubuntu_20170729150909_583df9cf-e54b-44bf-a104-ef5e690cfa0d
>> INFO  : Total jobs = 1
>> INFO  : Launching Job 1 out of 1
>> INFO  : Starting task [Stage-1:MAPRED] in serial mode
>> INFO  : Number of reduce tasks not specified. Estimated from input data
>> size: 2
>> INFO  : In order to change the average load for a reducer (in bytes):
>> INFO  :   set hive.exec.reducers.bytes.per.reducer=
>> INFO  : In order to limit the maximum number of reducers:
>> INFO  :   set hive.exec.reducers.max=
>> INFO  : In order to set a constant number of reducers:
>> INFO  :   set mapreduce.job.reduces=
>> INFO  : number of splits:2
>> INFO  : Submitting tokens for job: job_local1041198115_0826
>> INFO  : The url to track the job: http://localhost:8080/
>> INFO  : Job running in-process (local Hadoop)
>> INFO  : 2017-07-29 15:09:25,495 Stage-1 

Re: [DRAFT] Incubator PMC Board Report - August 2017

2017-08-06 Thread Jim Apple
> * Podlings missing sign off, will be moved to failed to report

This is new, right? I can't find "will be moved to failed to report" in
my email archives of earlier reports.


Re: Reminder: "newbie" label on tickets

2017-07-31 Thread Jim Apple
https://issues.apache.org/jira/browse/IMPALA-5742?jql=project%20%3D%20IMPALA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20labels%20%3D%20newbie%20AND%20assignee%20in%20(EMPTY)

On Mon, Jul 31, 2017 at 7:01 PM, yu feng <olaptes...@gmail.com> wrote:

> As a newbie to the Impala community, I have completed one JIRA. Where are the
> newbie tickets that I can try to solve? Thanks a lot.
>
> 2017-07-31 23:57 GMT+08:00 Tim Armstrong <tarmstr...@cloudera.com>:
>
> > Let's also make sure that everything with the "newbie" label is actually
> > straightforward and has a clear end-goal. Oh, and is reasonably easy to
> > test.
> >
> > E.g. adding a built-in function is a good one if the semantics of the
> > function are clearly documented in the JIRA and there aren't any
> potential
> > compatibility issues.
> >
> > We've seen a few new contributors pick up JIRAs with the newbie label that
> > sounded easy but were actually tricky to get right - that's not a great
> > experience.
> >
> >
> >
> > On Sun, Jul 30, 2017 at 1:30 PM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> > > As a reminder, when you file a ticket, you can label tickets that could
> > be
> > > completed by a first-time Impala contributor "newbie". This can be a
> tool
> > > to help grow the community.
> > >
> >
>


Reminder: "newbie" label on tickets

2017-07-30 Thread Jim Apple
As a reminder, when you file a ticket, you can label tickets that could be
completed by a first-time Impala contributor "newbie". This can be a tool
to help grow the community.


Re: Unable to start catalog, but with no error message?

2017-07-29 Thread Jim Apple
beeline.BeeLine.execute(BeeLine.java:1010)
at org.apache.hive.beeline.BeeLine.executeFile(BeeLine.java:987)
at org.apache.hive.beeline.BeeLine.begin(BeeLine.java:914)
at org.apache.hive.beeline.BeeLine.mainWithInputRedirection(BeeLine.java:518)
at org.apache.hive.beeline.BeeLine.main(BeeLine.java:501)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
Error executing file from Hive: load-tpcds-core-hive-generated.sql
Error in /home/ubuntu/Impala/testdata/bin/create-load-data.sh at line 48:
LOAD_DATA_ARGS=""
+ cleanup
+ rm -rf /tmp/tmp.Yfeh8QGfi1




On Sat, Jul 29, 2017 at 12:47 AM, Jim Apple <jbap...@cloudera.com> wrote:

> I'm seeing https://issues.apache.org/jira/browse/IMPALA-5700 when trying
> to bootstrap a new development environment on an EC2 machine with Ubuntu
> 14.04, 250GB of free disk space and over 60GB of free memory. I've seen
> this with and without the -so flag.
>
> I'm running the below script, which I thought was the canonical way to
> bootstrap a development environment. When catalog doesn't start, I don't
> see anything amiss in any of the logs. I was thinking that maybe a port is
> closed that should be open? I only have port 22 open in my ec2 config.
>
> Has anyone else fixed a problem like this before?
>
> #!/bin/bash -eux
>
> IMPALA_REPO_URL=https://git-wip-us.apache.org/repos/asf/incubator-impala.git
> IMPALA_REPO_BRANCH=master
>
> sudo apt-get install --yes git
>
> sudo apt-get install --yes openjdk-7-jdk
>
> # JAVA_HOME needed by chef scripts
> export JAVA_HOME="/usr/lib/jvm/$(ls -tr /usr/lib/jvm/ | tail -1)"
> $JAVA_HOME/bin/javac -version
>
> # TODO: check that df . is large enough.
> df -h .
>
> IMPALA_LOCATION=Impala
>
> cd "/home/$(whoami)"
>
> git clone "${IMPALA_REPO_URL}" "${IMPALA_LOCATION}"
> cd "${IMPALA_LOCATION}"
> git checkout "${IMPALA_REPO_BRANCH}"
> GIT_LOG_FILE=$(mktemp)
> git log --pretty=oneline >"${GIT_LOG_FILE}"
> head "${GIT_LOG_FILE}"
>
> ./bin/bootstrap_development.sh
>
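
The quoted script leaves "check that df . is large enough" as a TODO. A minimal sketch of such a check, assuming a hypothetical helper name `has_free_space_gb` and an arbitrary 120 GB threshold (neither comes from the Impala scripts):

```shell
#!/bin/bash
# Sketch only: the function name and the 120 GB threshold are assumptions,
# not values taken from the Impala scripts.

# Succeeds when the filesystem holding DIR has at least MIN_GB gigabytes free.
has_free_space_gb() {
  local dir="$1" min_gb="$2" avail_kb
  # -P forces the POSIX single-line format; -k reports 1024-byte blocks.
  # Column 4 of the second line is the available space.
  avail_kb="$(df -Pk "${dir}" | awk 'NR == 2 { print $4 }')"
  [ "${avail_kb}" -ge $(( min_gb * 1024 * 1024 )) ]
}

if has_free_space_gb . 120; then
  echo "enough free disk space"
else
  echo "less than 120 GB free; bootstrap may run out of disk" >&2
fi
```

Because the call sits inside an `if`, it would not abort a script running under `bash -e`; whether a failed check should stop the bootstrap is a separate decision.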


Re: Podling Report Reminder - August 2017

2017-07-29 Thread Jim Apple
Done.

On Sun, Jul 23, 2017 at 6:29 PM, Jim Apple <jbap...@cloudera.com> wrote:

> Here is my draft report. Any comments?
>
> Impala is a high-performance C++ and Java SQL query engine for data stored
> in Apache Hadoop-based clusters.
>
> Impala has been incubating since 2015-12-03.
>
> Three most important issues to address in the move towards graduation:
>
>  1. Growth of the developer community
>  2.
>  3.
>
> Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
> aware of?
>
>  No
>
> How has the community developed since the last report?
>
>  There have been 268 Commits:
>git log --format='%ci' | grep -cE '2017-0(5|6|7)'
>
>  51 of those commits were by non-committers:
>git log --format='%ae %ci' | grep -E '2017-0(5|6|7)' | cut -d ' ' -f 1
> | sort | uniq -c | sort -n
>
>  There are two new PPMC members:
>https://lists.apache.org/list.html?dev@impala.apache.org:dfr=2017-2-1|dto=2017-4-30:%22has%20invited%22
>
> Impala has done a third release with a second release manager. Two CVEs
> were issued, our first ones under the Apache security guidelines.
>
> How has the project developed since the last report?
>
> There have been big changes to the buffer pool, as outlined in
> https://lists.apache.org/thread.html/f573698455bf2ff9ac2073c778802d0d5c9f3c8be43ede80614259cb@%3Cdev.impala.apache.org%3E . There have also
> been big changes landing to the RPC layer to improve scalability. Impala
> now has TABLESAMPLE to allow running queries on only a small percentage of
> the table for experimenting with queries quickly, and it now works on ADLS.
>
> How would you assess the podling's maturity?
> Please feel free to add your own commentary.
>
>  [ ] Initial setup
>  [ ] Working towards first release
>  [X] Community building
>  [X] Nearing graduation
>  [ ] Other:
>
>  Once the developer community has grown a bit, Impala will be ready
>  to contemplate graduation.
>
> Date of last release:
>
>  2017-06-16
>
> When were the last committers or PPMC members elected?
>
>  2017-07-17
>
> Signed-off-by:
>
>  [ ](impala) Tom White
> Comments:
>  [ ](impala) Todd Lipcon
> Comments:
>  [ ](impala) Carl Steinbach
> Comments:
>  [ ](impala) Brock Noland
> Comments:
>
> On Sun, Jul 23, 2017 at 5:07 PM, <johndam...@apache.org> wrote:
>
>> Dear podling,
>>
>> This email was sent by an automated system on behalf of the Apache
>> Incubator PMC. It is an initial reminder to give you plenty of time to
>> prepare your quarterly board report.
>>
>> The board meeting is scheduled for Wed, 16 August 2017, 10:30 am PDT.
>> The report for your podling will form a part of the Incubator PMC
>> report. The Incubator PMC requires your report to be submitted 2 weeks
>> before the board meeting, to allow sufficient time for review and
>> submission (Wed, August 02).
>>
>> Please submit your report with sufficient time to allow the Incubator
>> PMC, and subsequently board members to review and digest. Again, the
>> very latest you should submit your report is 2 weeks prior to the board
>> meeting.
>>
>> Thanks,
>>
>> The Apache Incubator PMC
>>
>> Submitting your Report
>>
>> --
>>
>> Your report should contain the following:
>>
>> *   Your project name
>> *   A brief description of your project, which assumes no knowledge of
>> the project or necessarily of its field
>> *   A list of the three most important issues to address in the move
>> towards graduation.
>> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
>> aware of
>> *   How has the community developed since the last report
>> *   How has the project developed since the last report.
>> *   How does the podling rate their own maturity.
>>
>> This should be appended to the Incubator Wiki page at:
>>
>> https://wiki.apache.org/incubator/August2017
>>
>> Note: This is manually populated. You may need to wait a little before
>> this page is created from a template.
>>
>> Mentors
>> ---
>>
>> Mentors should review reports for their project(s) and sign them off on
>> the Incubator wiki page. Signing off reports shows that you are
>> following the project - projects that are not signed may raise alarms
>> for the Incubator PMC.
>>
>> Incubator PMC
>>
>
>


Re: IMPALA-5702 - disable shared linking on jenkins?

2017-07-24 Thread Jim Apple
I meant the from-scratch on Ubuntu 14.04 job. I've started an ASAN build:

https://jenkins.impala.io/view/Utility/job/ubuntu-14.04-from-scratch/1764/

On Mon, Jul 24, 2017 at 5:52 PM, Henry Robinson <he...@apache.org> wrote:

> Could you point me to the failing job? I couldn't see it obviously on
> https://jenkins.impala.io/.
>
> On 24 July 2017 at 17:42, Jim Apple <jbap...@cloudera.com> wrote:
>
> > Yes, ASAN in the current 1404 job fails with something about linking. I
> > haven't got around to investigating in detail.
> >
> > On Mon, Jul 24, 2017 at 1:39 PM, Todd Lipcon <t...@cloudera.com> wrote:
> >
> > > Is it possible that the issue here is due to a "one definition rule"
> > > violation? eg something like
> > > https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
> > > Another similar thing is described here:
> > > https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
> > >
> > > ASAN with the appropriate flags might help expose if one of the above
> is
> > > related.
> > >
> > > I wonder whether it is a kind of coincidence that it is fine in a
> static
> > > build but causes problems in dynamic, and at some point the static link
> > > order may slightly shift, causing another new subtle bug.
> > >
> > >
> > >
> > > On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson <he...@apache.org>
> > wrote:
> > >
> > > > We've started seeing isolated incidences of IMPALA-5702 during GVOs,
> > > where
> > > > a custom cluster test fails by throwing an exception during locale
> > > > handling.
> > > >
> > > > I've been able to reproduce this locally, but only with shared
> linking
> > > > enabled (which makes sense since the issue is symptomatic of a global
> > > c'tor
> > > > not getting called the right number of times).
> > > >
> > > > It's probable that my patch for IMPALA-5659 exposed this (since it
> > > forced a
> > > > more correct linking strategy for thirdparty libraries when dynamic
> > > linking
> > > > was enabled), but it looks to me at first glance like there were
> latent
> > > > dynamic linking bugs that we weren't getting hit by. Fixing
> IMPALA-5702
> > > > will probably take a while, and I don't think we should hold up GVOs
> or
> > > put
> > > > them at risk.
> > > >
> > > > So there are two options:
> > > >
> > > > 1. Revert IMPALA-5659
> > > >
> > > > 2. Switch GVO to static linking
> > > >
> > > > IMPALA-5659 is important to commit the kudu util library, which is
> > needed
> > > > for the KRPC work. Without it, shared linking doesn't work *at all*
> > when
> > > > the kudu util library is committed.
> > > >
> > > > Static linking doesn't take much longer in my unscientific
> > measurements,
> > > > and is closer to how Impala is actually used. In the interest of
> > forward
> > > > progress I'd like to try switching ubuntu-14.04-from-scratch to use
> > > static
> > > > linking while I work on IMPALA-5702.
> > > >
> > > > What does everyone else think?
> > > >
> > > > Henry
> > > >
> > >
> > >
> > >
> > > --
> > > Todd Lipcon
> > > Software Engineer, Cloudera
> > >
> >
>


Re: IMPALA-5702 - disable shared linking on jenkins?

2017-07-24 Thread Jim Apple
Got it - thanks for the clarification!

Also, I think I was unclear in my stated concern for new contributors. It
seems to me that new contributors could choose to use the -so flag, even if
the official pre-merge jobs doesn't, but that there is a cost to diverging
from the pre-merge job in that it is hard to know what is to blame if your
pre-merge job fails.

On Mon, Jul 24, 2017 at 5:46 PM, Henry Robinson <he...@apache.org> wrote:

> On 24 July 2017 at 17:43, Jim Apple <jbap...@cloudera.com> wrote:
>
> > On Mon, Jul 24, 2017 at 5:08 PM, Henry Robinson <he...@apache.org>
> wrote:
> >
> > > On 24 July 2017 at 17:04, Jim Apple <jbap...@cloudera.com> wrote:
> > >
> > > > I had anticipated that shared linking would save time and disk space,
> > but
> > > > it sounds like, from your testing, it doesn't save much time. Does it
> > > save
> > > > disk space?
> > > >
> > >
> > > I haven't measured but I would expect not. Do we need to be very
> careful
> > > about disk space in the current configuration?
> > >
> >
> > I don't think so, but since we are trying to entice new community members
> > to commit patches, I am concerned about the cost on developer machines.
> >
> >
> > >
> > >
> > > >
> > > > Does static linking save time when compiling incremental changes?
> > > >
> > >
> > > Again, I haven't measured.
> > >
> >
> >
> > I'm confused. You said, "Static linking doesn't take much longer in my
> > unscientific measurements".
> >
>
> I am also confused. I spoke about end-to-end builds on
> ubuntu-14.04-from-scratch. I haven't measured incremental changes, unless
> they're covered by that build.
>


Re: IMPALA-5702 - disable shared linking on jenkins?

2017-07-24 Thread Jim Apple
Yes, ASAN in the current 1404 job fails with something about linking. I
haven't got around to investigating in detail.

On Mon, Jul 24, 2017 at 1:39 PM, Todd Lipcon  wrote:

> Is it possible that the issue here is due to a "one definition rule"
> violation? eg something like
> https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
> Another similar thing is described here:
> https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
>
> ASAN with the appropriate flags might help expose if one of the above is
> related.
>
> I wonder whether it is a kind of coincidence that it is fine in a static
> build but causes problems in dynamic, and at some point the static link
> order may slightly shift, causing another new subtle bug.
>
>
>
> On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson  wrote:
>
> > We've started seeing isolated incidences of IMPALA-5702 during GVOs,
> where
> > a custom cluster test fails by throwing an exception during locale
> > handling.
> >
> > I've been able to reproduce this locally, but only with shared linking
> > enabled (which makes sense since the issue is symptomatic of a global
> c'tor
> > not getting called the right number of times).
> >
> > It's probable that my patch for IMPALA-5659 exposed this (since it
> forced a
> > more correct linking strategy for thirdparty libraries when dynamic
> linking
> > was enabled), but it looks to me at first glance like there were latent
> > dynamic linking bugs that we weren't getting hit by. Fixing IMPALA-5702
> > will probably take a while, and I don't think we should hold up GVOs or
> put
> > them at risk.
> >
> > So there are two options:
> >
> > 1. Revert IMPALA-5659
> >
> > 2. Switch GVO to static linking
> >
> > IMPALA-5659 is important to commit the kudu util library, which is needed
> > for the KRPC work. Without it, shared linking doesn't work *at all* when
> > the kudu util library is committed.
> >
> > Static linking doesn't take much longer in my unscientific measurements,
> > and is closer to how Impala is actually used. In the interest of forward
> > progress I'd like to try switching ubuntu-14.04-from-scratch to use
> static
> > linking while I work on IMPALA-5702.
> >
> > What does everyone else think?
> >
> > Henry
> >
>
>
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
>
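
The quoted message mentions running ASAN "with the appropriate flags" without naming them. As an assumption based on the two linked AddressSanitizer wiki pages, the relevant runtime options would be `detect_odr_violation`, `check_initialization_order` and `strict_init_order`, passed through the `ASAN_OPTIONS` environment variable. A minimal sketch, with a hypothetical helper `join_asan_options` and a hypothetical binary path:

```shell
#!/bin/bash
# Sketch only: which options the thread had in mind is an assumption.

# Join the given key=value options with ':' as ASAN_OPTIONS expects.
join_asan_options() {
  local IFS=':'
  echo "$*"
}

ASAN_OPTIONS="$(join_asan_options \
  detect_odr_violation=2 \
  check_initialization_order=1 \
  strict_init_order=1)"
export ASAN_OPTIONS

echo "ASAN_OPTIONS=${ASAN_OPTIONS}"
# A binary built with -fsanitize=address would then be started as usual, e.g.:
# ./path/to/asan-built-binary   (hypothetical path)
```

These options only take effect in a process that was compiled and linked with AddressSanitizer; they do nothing for a regular build.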


Re: IMPALA-5702 - disable shared linking on jenkins?

2017-07-24 Thread Jim Apple
On Mon, Jul 24, 2017 at 5:08 PM, Henry Robinson <he...@apache.org> wrote:

> On 24 July 2017 at 17:04, Jim Apple <jbap...@cloudera.com> wrote:
>
> > I had anticipated that shared linking would save time and disk space, but
> > it sounds like, from your testing, it doesn't save much time. Does it
> save
> > disk space?
> >
>
> I haven't measured but I would expect not. Do we need to be very careful
> about disk space in the current configuration?
>

I don't think so, but since we are trying to entice new community members
to commit patches, I am concerned about the cost on developer machines.


>
>
> >
> > Does static linking save time when compiling incremental changes?
> >
>
> Again, I haven't measured.
>


I'm confused. You said, "Static linking doesn't take much longer in my
unscientific measurements".


Re: IMPALA-5702 - disable shared linking on jenkins?

2017-07-24 Thread Jim Apple
I had anticipated that shared linking would save time and disk space, but
it sounds like, from your testing, it doesn't save much time. Does it save
disk space?

Does static linking save time when compiling incremental changes?

On Mon, Jul 24, 2017 at 4:51 PM, Henry Robinson  wrote:

> :) I agree - we should also track any known breaks to shared linking in a
> best effort fashion because it's so useful to some dev workflows.
>
> On 24 July 2017 at 16:49, Tim Armstrong  wrote:
>
> > I vote for changing Jenkins' linking strategy now and not changing it
> back
> > :). Static linking is the blessed configuration so I think we should be
> > running tests with that primarily.
> >
> > On Mon, Jul 24, 2017 at 4:34 PM, Henry Robinson 
> wrote:
> >
> > > On 24 July 2017 at 13:58, Todd Lipcon  wrote:
> > >
> > > > On Mon, Jul 24, 2017 at 1:47 PM, Henry Robinson 
> > > wrote:
> > > >
> > > > > Thanks for the asan pointer - I'll give it a go.
> > > > >
> > > > > My understanding of linking isn't deep, but my working theory has
> > been
> > > > that
> > > > > the complications have been caused by glog getting linked twice -
> > once
> > > > > statically (possibly into libkudu.so), and once dynamically (via
> > > everyone
> > > > > else).
> > > > >
> > > >
> > > > In libkudu_client.so, we use a linker script to ensure that we don't
> > leak
> > > > glog/gflags/etc symbols. Those are all listed as 'local' in
> > > > src/kudu/client/symbols.map. We also have a unit test
> > > > 'client_symbol-test.sh' which uses nm to dump the list of symbols and
> > > make
> > > > sure that all non-local non-weak symbols are under the 'kudu::'
> > > > namespace.
> > > >
> > > > So it's possible that something's getting linked twice but I'd be
> > > somewhat
> > > > surprised if it's from the Kudu client.
> > > >
> > > >
> > > Good to know, thanks.
> > >
> > > ASAN hasn't turned up anything yet - so does anyone have an opinion
> about
> > > changing Jenkins' linking strategy for now?
> > >
> > >
> > > > -Todd
> > > >
> > > >
> > > > >
> > > > > I would think that could lead to one or both of the issues you
> linked
> > > to.
> > > > >
> > > > >
> > > > > On 24 July 2017 at 13:39, Todd Lipcon  wrote:
> > > > >
> > > > > > Is it possible that the issue here is due to a "one definition
> > rule"
> > > > > > violation? eg something like
> > > > > > > https://github.com/google/sanitizers/wiki/AddressSanitizerOneDefinitionRuleViolation
> > > > > > > Another similar thing is described here:
> > > > > > > https://github.com/google/sanitizers/wiki/AddressSanitizerInitializationOrderFiasco
> > > > > >
> > > > > > ASAN with the appropriate flags might help expose if one of the
> > above
> > > > is
> > > > > > related.
> > > > > >
> > > > > > I wonder whether it is a kind of coincidence that it is fine in a
> > > > static
> > > > > > build but causes problems in dynamic, and at some point the
> static
> > > link
> > > > > > order may slightly shift, causing another new subtle bug.
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson <
> he...@apache.org>
> > > > > wrote:
> > > > > >
> > > > > > > We've started seeing isolated incidences of IMPALA-5702 during
> > > GVOs,
> > > > > > where
> > > > > > > a custom cluster test fails by throwing an exception during
> > locale
> > > > > > > handling.
> > > > > > >
> > > > > > > I've been able to reproduce this locally, but only with shared
> > > > linking
> > > > > > > enabled (which makes sense since the issue is symptomatic of a
> > > global
> > > > > > c'tor
> > > > > > > not getting called the right number of times).
> > > > > > >
> > > > > > > It's probable that my patch for IMPALA-5659 exposed this (since
> > it
> > > > > > forced a
> > > > > > > more correct linking strategy for thirdparty libraries when
> > dynamic
> > > > > > linking
> > > > > > > was enabled), but it looks to me at first glance like there
> were
> > > > latent
> > > > > > > dynamic linking bugs that we weren't getting hit by. Fixing
> > > > IMPALA-5702
> > > > > > > will probably take a while, and I don't think we should hold up
> > > GVOs
> > > > or
> > > > > > put
> > > > > > > them at risk.
> > > > > > >
> > > > > > > So there are two options:
> > > > > > >
> > > > > > > 1. Revert IMPALA-5659
> > > > > > >
> > > > > > > 2. Switch GVO to static linking
> > > > > > >
> > > > > > > IMPALA-5659 is important to commit the kudu util library, which
> > is
> > > > > needed
> > > > > > > for the KRPC work. Without it, shared linking doesn't work *at
> > all*
> > > > > when
> > > > > > > the kudu util library is committed.
> > > > > > >
> > > > > > > Static linking doesn't take much longer in my unscientific
> > > > > measurements,
> > > > > > > and is closer to how Impala is actually used. In the interest
> of
> > > > > forward
> > > > > > > progress 

Re: Podling Report Reminder - August 2017

2017-07-23 Thread Jim Apple
Here is my draft report. Any comments?

Impala is a high-performance C++ and Java SQL query engine for data stored
in Apache Hadoop-based clusters.

Impala has been incubating since 2015-12-03.

Three most important issues to address in the move towards graduation:

 1. Growth of the developer community
 2.
 3.

Any issues that the Incubator PMC (IPMC) or ASF Board wish/need to be
aware of?

 No

How has the community developed since the last report?

 There have been 268 Commits:
   git log --format='%ci' | grep -cE '2017-0(5|6|7)'

 51 of those commits were by non-committers:
   git log --format='%ae %ci' | grep -E '2017-0(5|6|7)' | cut -d ' ' -f 1 |
sort | uniq -c | sort -n

 There are two new PPMC members:

https://lists.apache.org/list.html?d...@impala.apache.org:dfr=2017-2-1|dto=2017-4-30:%22has%20invited%22

Impala has done a third release with a second release manager. Two CVEs
were issued, our first ones under the Apache security guidelines.
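
The two commit counts above come from piping `git log` through `grep`. A minimal sketch of the same tally as small functions that read `git log` output on stdin, so they can be tried without a repository. The function names are hypothetical, and the pattern is anchored to the date field so that it cannot match elsewhere on the line:

```shell
#!/bin/bash
# Sketch only: function names are hypothetical.

# Count lines of `git log --format='%ci'` output dated May-July 2017.
count_commits_in_window() {
  grep -cE '^2017-0(5|6|7)-'
}

# Tally commits per author from `git log --format='%ae %ci'` output,
# least active author first.
tally_authors_in_window() {
  grep -E '^[^ ]+ 2017-0(5|6|7)-' | cut -d ' ' -f 1 | sort | uniq -c | sort -n
}

# Inside a checkout these would be fed from git, for example:
#   git log --format='%ci' | count_commits_in_window
#   git log --format='%ae %ci' | tally_authors_in_window
printf '%s\n' \
  '2017-04-30 09:00:00 -0700' \
  '2017-05-02 10:15:00 -0700' \
  '2017-07-21 16:40:00 -0700' | count_commits_in_window
```

With the three sample dates, only the May and July lines fall in the window, so the sample prints 2. Separating committers from non-committers in the per-author tally still has to be done by hand against the committer list.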

How has the project developed since the last report?

There have been big changes to the buffer pool, as outlined in
https://lists.apache.org/thread.html/f573698455bf2ff9ac2073c778802d0d5c9f3c8be43ede80614259cb@%3Cdev.impala.apache.org%3E
. There have also been big changes landing to the RPC layer to improve
scalability. Impala now has TABLESAMPLE to allow running queries on only a
small percentage of the table for experimenting with queries quickly, and
it now works on ADLS.

How would you assess the podling's maturity?
Please feel free to add your own commentary.

 [ ] Initial setup
 [ ] Working towards first release
 [X] Community building
 [X] Nearing graduation
 [ ] Other:

 Once the developer community has grown a bit, Impala will be ready
 to contemplate graduation.

Date of last release:

 2017-06-16

When were the last committers or PPMC members elected?

 2017-07-17

Signed-off-by:

 [ ](impala) Tom White
Comments:
 [ ](impala) Todd Lipcon
Comments:
 [ ](impala) Carl Steinbach
Comments:
 [ ](impala) Brock Noland
Comments:

On Sun, Jul 23, 2017 at 5:07 PM,  wrote:

> Dear podling,
>
> This email was sent by an automated system on behalf of the Apache
> Incubator PMC. It is an initial reminder to give you plenty of time to
> prepare your quarterly board report.
>
> The board meeting is scheduled for Wed, 16 August 2017, 10:30 am PDT.
> The report for your podling will form a part of the Incubator PMC
> report. The Incubator PMC requires your report to be submitted 2 weeks
> before the board meeting, to allow sufficient time for review and
> submission (Wed, August 02).
>
> Please submit your report with sufficient time to allow the Incubator
> PMC, and subsequently board members to review and digest. Again, the
> very latest you should submit your report is 2 weeks prior to the board
> meeting.
>
> Thanks,
>
> The Apache Incubator PMC
>
> Submitting your Report
>
> --
>
> Your report should contain the following:
>
> *   Your project name
> *   A brief description of your project, which assumes no knowledge of
> the project or necessarily of its field
> *   A list of the three most important issues to address in the move
> towards graduation.
> *   Any issues that the Incubator PMC or ASF Board might wish/need to be
> aware of
> *   How has the community developed since the last report
> *   How has the project developed since the last report.
> *   How does the podling rate their own maturity.
>
> This should be appended to the Incubator Wiki page at:
>
> https://wiki.apache.org/incubator/August2017
>
> Note: This is manually populated. You may need to wait a little before
> this page is created from a template.
>
> Mentors
> ---
>
> Mentors should review reports for their project(s) and sign them off on
> the Incubator wiki page. Signing off reports shows that you are
> following the project - projects that are not signed may raise alarms
> for the Incubator PMC.
>
> Incubator PMC
>


To code is human, to review divine.

2017-07-20 Thread Jim Apple
To newer Impala contributors out there:

Thank you for contributing! I hope you continue contributing!

To increase your understanding of Impala and make your next contribution
easier, you could consider investing time in code reviews! You can watch
existing reviews at
http://mail-archives.apache.org/mod_mbox/incubator-impala-reviews/ or
https://lists.apache.org/list.html?revi...@impala.apache.org or
https://gerrit.cloudera.org/#/q/project:Impala-ASF

I find reviewing code is very educational for me. If there is a change I
want to do but don't have the time or expertise for, I try to follow the
review of that code when someone else gets around to writing the change.

High quality reviewing is also one path towards committership.


New Impala PPMC member: Michael Brown

2017-07-17 Thread Jim Apple
The Podling Project Management Committee (PPMC) for Apache Impala
(incubating) has invited Michael Brown to become a PPMC member and we are
pleased to announce that they have accepted.

Congratulations and welcome, Michael!


Re: Loading tpc-ds

2017-07-13 Thread Jim Apple
I also see this with the Oracle JDK. I have also now checked I am not
running out of memory.

Oracle JDK7 is harder to get one's hands on, and OpenJDK7 isn't packaged by
Canonical for Ubuntu 16.04.

On Wed, Jul 12, 2017 at 11:20 PM, Jim Apple <jbap...@cloudera.com> wrote:

> I'm getting data loading errors on Ubuntu 16.04 in TPC-DS. The terminal
> shows:
>
> ERROR : FAILED: Execution Error, return code 2 from
> org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>
> logs/cluster/hive/hive.log shows the error below, which previous bugs have
> called an issue with the disk being out of space, but my disk has at least
> 45GB left on it
>
> IMPALA-3246, IMPALA-2856, IMPALA-2617
>
> I see this with openJDK8. I haven't tried Oracle's JDK yet.
>
> Has anyone else seen this and been able to diagnose it as something that
> doesn't mean a full disk?
>
>
> FATAL ExecReducer (ExecReducer.java:reduce(264)) -
> org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
> while processing row (tag=0)
> {"key":{},"value":{"_col0":48147,"_col1":17805,"_col2":27944,"_col3":606992,"_col4":3193,"_col5":16641,"_col6":10,"_col7":209,"_col8":44757,"_col9":20,"_col10":5.51,"_col11":9.36,"_col12":9.17,"_col13":0,"_col14":183.4,"_col15":110.2,"_col16":187.2,"_col17":3.66,"_col18":0,"_col19":183.4,"_col20":187.06,"_col21":73.2,"_col22":2452013}}
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:346)
> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> /test-warehouse/tpcds.store_sales/.hive-staging_hive_2017-07-12_22-51-18_139_3687815919405186455-760/_task_tmp.-ext-1/ss_sold_date_sk=2452013/_tmp.01_0
> could only be replicated to 0 nodes instead of minReplication (=1).  There
> are 3 datanode(s) running and no node(s) are excluded in this operation.
> at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
> at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3385)
> at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
> at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
> at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
> at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
> at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
> at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
> at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
>
> at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:751)
> at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
> at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
> at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
>


Loading tpc-ds

2017-07-13 Thread Jim Apple
I'm getting data loading errors on Ubuntu 16.04 in TPC-DS. The terminal
shows:

ERROR : FAILED: Execution Error, return code 2 from
org.apache.hadoop.hive.ql.exec.mr.MapRedTask

logs/cluster/hive/hive.log shows the error below, which previous bugs have
called an issue with the disk being out of space, but my disk has at least
45GB left on it

IMPALA-3246, IMPALA-2856, IMPALA-2617

I see this with openJDK8. I haven't tried Oracle's JDK yet.

Has anyone else seen this and been able to diagnose it as something that
doesn't mean a full disk?


FATAL ExecReducer (ExecReducer.java:reduce(264)) -
org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while
processing row (tag=0)
{"key":{},"value":{"_col0":48147,"_col1":17805,"_col2":27944,"_col3":606992,"_col4":3193,"_col5":16641,"_col6":10,"_col7":209,"_col8":44757,"_col9":20,"_col10":5.51,"_col11":9.36,"_col12":9.17,"_col13":0,"_col14":183.4,"_col15":110.2,"_col16":187.2,"_col17":3.66,"_col18":0,"_col19":183.4,"_col20":187.06,"_col21":73.2,"_col22":2452013}}
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:253)
at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
at org.apache.hadoop.mapred.LocalJobRunner$Job$ReduceTaskRunnable.run(LocalJobRunner.java:346)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
/test-warehouse/tpcds.store_sales/.hive-staging_hive_2017-07-12_22-51-18_139_3687815919405186455-760/_task_tmp.-ext-1/ss_sold_date_sk=2452013/_tmp.01_0
could only be replicated to 0 nodes instead of minReplication (=1).  There
are 3 datanode(s) running and no node(s) are excluded in this operation.
at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:1724)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:3385)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:683)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.addBlock(AuthorizationProviderProxyClientProtocol.java:214)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:495)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1917)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)

at org.apache.hadoop.hive.ql.exec.FileSinkOperator.processOp(FileSinkOperator.java:751)
at org.apache.hadoop.hive.ql.exec.Operator.forward(Operator.java:815)
at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(SelectOperator.java:84)
at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(ExecReducer.java:244)
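
When triaging a log like the one above, it can help to pull the numbers out of the "could only be replicated" line, since they show whether datanodes were down or excluded (here: 3 running, none excluded, yet 0 accepted the block). The following is a rough Python sketch, not part of any Hadoop tooling; the helper name `parse_replication_error` and the regular expression are assumptions based only on the wording of the message in this thread. It extracts the numbers and does not diagnose the cause.

```python
import re

# Pattern written against the message text quoted in this thread; other
# Hadoop versions may word the message differently.
_REPLICATION_RE = re.compile(
    r"could only be replicated to (\d+) nodes instead of "
    r"minReplication \(=(\d+)\)\.\s+There are (\d+) datanode\(s\) running "
    r"and (no|\d+) node\(s\) are excluded"
)


def parse_replication_error(message):
    """Return the counts from the HDFS replication error, or None if absent."""
    # Log lines are often wrapped by mail clients, so collapse whitespace first.
    text = " ".join(message.split())
    match = _REPLICATION_RE.search(text)
    if match is None:
        return None
    replicated, min_replication, running, excluded = match.groups()
    return {
        "replicated": int(replicated),
        "min_replication": int(min_replication),
        "datanodes_running": int(running),
        "excluded": 0 if excluded == "no" else int(excluded),
    }


SAMPLE = """could only be replicated to 0 nodes instead of minReplication (=1).  There
are 3 datanode(s) running and no node(s) are excluded in this operation."""

print(parse_replication_error(SAMPLE))
```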


Re: Disabling all clang-tidy checks

2017-07-12 Thread Jim Apple
The clang-diagnostics are, IIRC, also enabled by the -W flags. You could
try turning all warnings off via compiler flags.

There is also a tool that auto-fixes clang-tidy warnings, but only some of
them, and I never got even that much to work :-/
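
For reference, clang-tidy's help text describes `Checks` as a comma-separated list of globs processed in order, where a leading `-` removes matching checks from the set. Below is a small Python model of just that filtering rule, to make the effect of ordering concrete. The function name `enabled_checks` and the check names in the sample list are illustrative, and the model deliberately ignores everything else (default checks, inherited configs, and how compiler diagnostics reach clang-tidy), so it does not explain the clang-diagnostic-* behaviour Henry is seeing.

```python
from fnmatch import fnmatchcase


def enabled_checks(checks_option, available):
    """Apply a clang-tidy style glob list to a list of check names.

    Globs are applied left to right: a plain glob adds every matching
    check, a glob prefixed with '-' removes every matching check.
    """
    enabled = set()
    for glob in checks_option.split(","):
        glob = glob.strip()
        if not glob:
            continue
        if glob.startswith("-"):
            pattern = glob[1:]
            enabled -= {name for name in available if fnmatchcase(name, pattern)}
        else:
            enabled |= {name for name in available if fnmatchcase(name, glob)}
    return enabled


# Illustrative check names only; a real list comes from `clang-tidy -list-checks`.
AVAILABLE = [
    "clang-diagnostic-unused-variable",
    "misc-unused-parameters",
    "modernize-use-nullptr",
]

print(sorted(enabled_checks("-*", AVAILABLE)))
print(sorted(enabled_checks("-*,misc-*", AVAILABLE)))
print(sorted(enabled_checks("misc-*,-*", AVAILABLE)))
```

Under this model, a positive glob placed before `-*` is removed again by the `-*`, so the order of entries matters.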

On Wed, Jul 12, 2017 at 5:24 PM, Henry Robinson <he...@apache.org> wrote:

> That does not, for whatever reason, actually disable clang-diagnostic-*. I
> don't know why either :/
>
> On 12 July 2017 at 17:15, Jim Apple <jbap...@cloudera.com> wrote:
>
> > What about "diagnostic-henry-thinks-will-never-fire,-*,-clang-diagnostic-*"?
> >
> >
> >
> > On Wed, Jul 12, 2017 at 5:01 PM, Henry Robinson <he...@apache.org> wrote:
> >
> > > Has anyone found a way to disable all clang-tidy checks for a directory?
> > >
> > > I've tried a directory-specific .clang-tidy file with
> > >
> > > ---
> > > Checks: "-*"
> > >
> > > but that causes clang-tidy to exit with an error (because I didn't
> > > configure any checks). So I tried adding one check that I thought would
> > > never fire. But that silently re-enables a bunch of clang-diagnostic*
> > > checks that I don't want.
> > >
> > > This happens when running:
> > >
> > > git diff HEAD~1 |
> > >  "${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/share/clang/clang-tidy-diff.py"
> > > -clang-tidy-binary
> > > "${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/bin/clang-tidy" -p 1
> > >
> > > per
> > > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536
> > >
> > > Any ideas? Am I running clang-tidy wrong?
> > >
> >
>


Re: Disabling all clang-tidy checks

2017-07-12 Thread Jim Apple
What about "diagnostic-henry-thinks-will-never-fire,-*,-clang-diagnostic-*"?



On Wed, Jul 12, 2017 at 5:01 PM, Henry Robinson  wrote:

> Has anyone found a way to disable all clang-tidy checks for a directory?
>
> I've tried a directory-specific .clang-tidy file with
>
> ---
> Checks: "-*"
>
> but that causes clang-tidy to exit with an error (because I didn't
> configure any checks). So I tried adding one check that I thought would
> never fire. But that silently re-enables a bunch of clang-diagnostic*
> checks that I don't want.
>
> This happens when running:
>
> git diff HEAD~1 |
>  "${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/share/clang/clang-tidy-diff.py"
> -clang-tidy-binary
> "${IMPALA_TOOLCHAIN}/llvm-${IMPALA_LLVM_VERSION}/bin/clang-tidy" -p 1
>
> per
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=65868536
>
> Any ideas? Am I running clang-tidy wrong?
>


Re: Can't start minicluster

2017-07-10 Thread Jim Apple
Using that command and then running a build without -noclean WFM. Thanks!
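
For anyone else who hits the FATAL quoted below: the check compares the number of entries in the Thrift-generated name table with the last enum value plus one, so it would fire if the two were built from different versions of the query-options definition, which fits Tim's guess about stale generated sources. Here is a toy Python illustration of that kind of consistency check. It is not Impala code; the helper `enum_size_mismatch` and the `OPTION_n` names are made up.

```python
def enum_size_mismatch(values_to_names, last_value):
    """Return None if a 0-based enum table is consistent, else a message.

    `values_to_names` stands in for a generated value-to-name map and
    `last_value` for the numeric value of the last enum member known to
    the code doing the check.
    """
    expected = last_value + 1
    actual = len(values_to_names)
    if actual == expected:
        return None
    return f"Check failed: size == last_value + 1 ({actual} vs. {expected})"


# Table and enum generated from the same definition: 56 options, last is 55.
in_sync = {i: f"OPTION_{i}" for i in range(56)}
# Table regenerated after an option was added, checked against the old enum.
out_of_sync = {i: f"OPTION_{i}" for i in range(57)}

print(enum_size_mismatch(in_sync, last_value=55))
print(enum_size_mismatch(out_of_sync, last_value=55))
```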

On Sun, Jul 9, 2017 at 9:10 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> Maybe the thrift be/generated-sources are out of sync with the source code?
>
> We had some kind of metastore schema upgrade that caused the other one.
> Dimitris' instructions to fix them were:
>
> > To fix this without doing a full data reload, you can use the following command:
> > ${IMPALA_TOOLCHAIN}/cdh_components/hive-1.1.0-cdh5.13.0-SNAPSHOT/bin/schematool -upgradeSchema -dbType {type}
> > where type is one of 'postgres' or 'mysql', depending on your setup.
>
> On Sun, Jul 9, 2017 at 3:52 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
> > I am getting the following message in FATAL when I try to start a
> > minicluster
> >
> > Check failed: _TImpalaQueryOptions_VALUES_TO_NAMES.size() ==
> > TImpalaQueryOptions::DEFAULT_JOIN_DISTRIBUTION_MODE + 1 (57 vs. 56)
> >
> > Any ideas what is going on? I was actually trying to buildall.sh
> > -format_metastore -format_sentry_policy_db because I was seeing messages
> > like the following (in hive.log) when I tried to start the minicluster:
> >
> >  org.postgresql.util.PSQLException: ERROR: column A0.SCHEMA_VERSION_V2 does not exist
> >
>


Re: Impala Build issue

2017-07-10 Thread Jim Apple
For #1, I recommend one of the following:

1. Get access to Oracle JDK7 however you would normally do so (legally, of
course)

2. OR use OpenJDK7

3. OR use JDK8, either Oracle or OpenJDK

On Sun, Jul 9, 2017 at 1:31 PM, Suresh Pujari 
wrote:

> Hi Sir,
>
>I am receiving the below issues.
>
> 1. The JDK URL
> "http://download.oracle.com/otn-pub/java/jdk/7u75-b13/jdk-7u75-linux-x64.tar.gz"
> in install.sh is incorrect, which causes the errors below.
>
> sudo ./install
>
> Compiled Resource:
> --
> # Declared in /impala-setup/cookbooks/java/recipes/oracle.rb:53:in
> `from_file'
>
> java_ark("jdk") do
>   action [:install]
>   supports {:report=>true, :exception=>true}
>   retries 0
>   retry_delay 2
>   default_guard_interpreter :default
>   declared_type :java_ark
>   cookbook_name :java
>   recipe_name "oracle"
>   url "http://download.oracle.com/otn-pub/java/jdk/7u75-b13/jdk-7u75-linux-x64.tar.gz"
>   default true
>   checksum "6f1f81030a34f7a9c987f8b68a24d139"
>   app_home "/usr/lib/jvm/java-7-oracle-amd64"
>   bin_cmds ["appletviewer", "apt", "ControlPanel", "extcheck", "idlj",
> "jar", "jarsigner", "java", "javac", "javadoc", "javafxpackager", "javah",
> "javap", "javaws", "jcmd", "jconsole", "jcontrol", "jdb", "jhat", "jinfo",
> "jmap", "jps", "jrunscript", "jsadebugd", "jstack", "jstat", "jstatd",
> "jvisualvm", "keytool", "native2ascii", "orbd", "pack200", "policytool",
> "rmic", "rmid", "rmiregistry", "schemagen", "serialver", "servertool",
> "tnameserv", "unpack200", "wsgen", "wsimport", "xjc"]
>   alternatives_priority 1062
>   connect_timeout 600
>   owner "root"
> end
>
>
> Running handlers:
> [2017-07-10T00:28:41+04:00] ERROR: Running exception handlers
> Running handlers complete
> [2017-07-10T00:28:41+04:00] ERROR: Exception handlers complete
> [2017-07-10T00:28:41+04:00] FATAL: Stacktrace dumped to
> /mnt/DATA/impala-setup/chef-stacktrace.out
> Chef Client failed. 4 resources updated in 7.840529619 seconds
> [2017-07-10T00:28:41+04:00] ERROR: java_ark[jdk] (java::oracle line 53) had
> an error: SystemExit: exit
> [2017-07-10T00:28:41+04:00] FATAL: Chef::Exceptions::ChildConvergeError:
> Chef run process exited unsuccessfully (exit code 1)
>
> 2. ./bin/bootstrap_build.sh
>
> Linking CXX shared library libgutil.so
> [ 10%] Built target gutil
> make[1]: *** [be/src/service/CMakeFiles/impalad.dir/rule] Error 2
> make: *** [impalad] Error 2
> Error in /impala-2.9.0/bin/make_impala.sh at line 179: ${MAKE_CMD}
> ${MAKE_ARGS} ${MAKE_TARGETS}
>
> Please help.
>
> Regards
> Suresh
>


Lars Volker has joined the Impala PPMC

2017-07-03 Thread Jim Apple
The Podling Project Management Committee (PPMC) for Apache Impala
(incubating) has invited Lars Volker to become a PPMC member and we are
pleased to announce that they have accepted.

Congratulations and welcome, Lars!

