Re: thread_local compatible with other threading models?

2017-11-15 Thread Tim Armstrong
Pretty sure that boost::thread uses pthreads under the covers, so I think
the question is whether thread_local works with the lowest common
denominator, pthreads.

It sounds like thread_local uses the older __thread mechanism under the
covers (see
https://sourceware.org/glibc/wiki/Destructor%20support%20for%20thread_local%20variables),
but with some special handling for destructors. My guess is that it will
work with pthreads, but we could check experimentally.
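
For example, an experiment along these lines would show whether a
thread_local destructor fires when a raw pthread exits (a minimal sketch,
not code from our tree):

// Expected output: "thread_local destructor ran" before pthread_join returns.
#include <cstdio>
#include <pthread.h>

struct Tracker {
  ~Tracker() { std::printf("thread_local destructor ran\n"); }
};

thread_local Tracker tracker;

void* Work(void*) {
  // Odr-use the variable so this thread actually constructs it.
  std::printf("tracker at %p\n", static_cast<void*>(&tracker));
  return nullptr;
}

int main() {
  pthread_t tid;
  pthread_create(&tid, nullptr, &Work, nullptr);
  pthread_join(&tid, nullptr);
  return 0;
}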

On Tue, Nov 14, 2017 at 9:21 AM, Jim Apple  wrote:

> A quick git grep shows use of both boost::thread and pthread. C++14 has a
> thread_local keyword:
>
> http://eel.is/c++draft/basic.stc.thread
>
> Do we know if the semantics of thread_local in C++14 are compatible with
> thread-locality in pthreads and boost::thread?
>


Re: Graduation resolution proposal

2017-11-09 Thread Tim Armstrong
Very exciting - thanks for the update Jim!


On Wed, Nov 8, 2017 at 8:56 PM, Jim Apple <jbap...@cloudera.com> wrote:

> We are now on step 3, in which the IPMC votes on the proposed graduation
> resolution:
>
> https://lists.apache.org/thread.html/4abfbf40b7d822cdc19421ea55de21
> f19ce70c4fd73c6f4c8cc98ce8@%3Cgeneral.incubator.apache.org%3E
>
> If it passes, the next step is a board resolution:
>
> http://incubator.apache.org/guides/graduation.html#
> submission_of_the_resolution_to_the_board
>
> On Tue, Oct 31, 2017 at 10:36 PM, Todd Lipcon <t...@cloudera.com> wrote:
>
> > Thanks Jim!
> >
> > -Todd
> >
> > On Tue, Oct 31, 2017 at 10:35 PM, Jim Apple <jbap...@cloudera.com>
> wrote:
> >
> > > I have sent this to general@ for discussion:
> > >
> > > https://lists.apache.org/thread.html/6b8598408f76a472532923c5a7fc51
> > > 0470b21671677ba3486568c57e@%3Cgeneral.incubator.apache.org%3E
> > >
> > > On Sat, Oct 28, 2017 at 8:12 AM, Jim Apple <jbap...@cloudera.com>
> wrote:
> > > > Below is a graduation resolution I would like to send to
> > > > general@incubator for discussion. It includes the PMC volunteers as
> > > > well as the result of the first PMC chair election, which was me.
> > > >
> > > > Unless there is objection, I'll send this to general@incubator for
> > > > discussion in a couple of days. If you want to participate in that
> > > > discussion at general@incubator, you can subscribe by emailing
> > > > general-subscr...@incubator.apache.org.
> > > >
> > > > As a reminder, the next steps I will take are:
> > > >
> > > > 1. Prepare a charter (i.e. this email)
> > > >
> > > > 2. Start a discussion on general@incubator.
> > > >
> > > > Should the discussion look mostly positive:
> > > >
> > > > 3. Call a vote on general@incubator.
> > > >
> > > > Should that vote succeed:
> > > >
> > > > 4. Submit the resolution to the ASF Board. See more here:
> > > > http://incubator.apache.org/guides/graduation.html
> > > >
> > > > 
> > > ---
> > > >
> > > > Establish the Apache Impala Project
> > > >
> > > > WHEREAS, the Board of Directors deems it to be in the best interests
> of
> > > > the Foundation and consistent with the Foundation's purpose to
> > establish
> > > > a Project Management Committee charged with the creation and
> > maintenance
> > > > of open-source software, for distribution at no charge to the public,
> > > > related to a high-performance distributed SQL engine.
> > > >
> > > > NOW, THEREFORE, BE IT RESOLVED, that a Project Management Committee
> > > > (PMC), to be known as the "Apache Impala Project", be and hereby is
> > > > established pursuant to Bylaws of the Foundation; and be it further
> > > >
> > > > RESOLVED, that the Apache Impala Project be and hereby is responsible
> > > > for the creation and maintenance of software related to a
> > > > high-performance distributed SQL engine; and be it further
> > > >
> > > > RESOLVED, that the office of "Vice President, Apache Impala" be and
> > > > hereby is created, the person holding such office to serve at the
> > > > direction of the Board of Directors as the chair of the Apache Impala
> > > > Project, and to have primary responsibility for management of the
> > > > projects within the scope of responsibility of the Apache Impala
> > > > Project; and be it further
> > > >
> > > > RESOLVED, that the persons listed immediately below be and hereby are
> > > > appointed to serve as the initial members of the Apache Impala
> Project:
> > > >
> > > >  * Alex Behm <ab...@apache.org>
> > > >  * Bharath Vissapragada  <bhara...@apache.org>
> > > >  * Brock Noland  <br...@apache.org>
> > > >  * Carl Steinbach<c...@apache.org>
> > > >  * Casey Ching   <ca...@apache.org>
> > > >  * Daniel Hecht  <dhe...@apache.org>
> > > >  * Dimitris Tsirogiannis <dtsirogian...@apache.org>
> > > >  * Henry Robinson<he...@apache.org>
> > > >  * Ishaan Joshi   

Re: long codegen time while codegen disabled

2017-11-08 Thread Tim Armstrong
This was cross-posted to a Cloudera forum:
http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/long-codegen-time-while-codegen-disabled/m-p/61635#M3812?eid=1=1,
where I gave the same answer that Mostafa gave.

If you cross-post to multiple related mailing lists and forums, can you
please at least link to the previous post so that people don't waste time
answering the same question twice? We're happy to help but don't appreciate
having our time wasted.

On Wed, Nov 8, 2017 at 8:29 AM, Mostafa Mokhtar 
wrote:

> From the profile codegen is disabled for HDFS_SCAN_NODE (id=8) and not the
> entire query.
> If you wish to disable codegen run "set disable_codegen=1;" before
> executing the query from impala-shell or add it to the connection string if
> using JDBC.
>
>HDFS_SCAN_NODE (id=8):(Total: 2.314ms, non-child: 2.314ms, %
> non-child: 100.00%)
>   Hdfs split stats (<volume id>:<# splits>/<split lengths>):
> 2:1/14.28 KB
>   Hdfs Read Thread Concurrency Bucket: 0:0% 1:0% 2:0% 3:0%
>   File Formats: PARQUET/SNAPPY:3
>   ExecOption: Codegen enabled: 0 out of 1
>
>
> On a side note, I recommend trying out a more recent version of Impala,
> as a lot has improved since.
>
>
> On Wed, Nov 8, 2017 at 12:13 AM, chen  wrote:
>
> >  I have a query that took a long time on codegen:
> >
> >   CodeGen:(Total: 32m22s, non-child: 32m22s, % non-child: 100.00%)
> >  - CodegenTime: 0ns
> >  - CompileTime: 53.143ms
> >  - LoadTime: 58.680us
> >  - ModuleFileSize: 1.96 MB (2054956)
> >  - OptimizationTime: 32m22s
> >  - PrepareTime: 157.700ms
> >
> > but from the profile, we can see that codegen is disabled for this query:
> >
> > ExecOption: Codegen enabled: 0 out of 1
> >
> > attached is the complete profile.
> >
> >
> > can anyone help figure out a way to bypass this?
> >
> >
> > Chen
> >
> >
>


Re: Plan to support Date type?

2017-11-06 Thread Tim Armstrong
I think we'd really like to add it, but so far nobody has invested the time
into doing it. It's a fairly big task because I think we would have to add
a new internal type to Impala, which requires changes across the codebase,
and then some thought needs to go into adding builtins, etc.

That's my two cents. Other people have probably thought about it a lot more
than I have though!

- Tim

On Mon, Nov 6, 2017 at 12:38 AM, Quanlong Huang 
wrote:

> Hi all,
>
>
> AFAIK, Impala has not supported the Date type, which describes a particular
> year/month/day in the form YYYY-MM-DD, as in Hive.
> However, Hive has supported the Date type in Parquet since two years ago, in
> Hive-1.2.0. (See https://issues.apache.org/jira/browse/HIVE-8119 and
> https://cwiki.apache.org/confluence/display/Hive/Parquet#Parquet-
> VersionsandLimitations.) I know that we have the Timestamp type, which can
> totally cover the Date type. But we have huge legacy data in our Hive
> warehouse containing the Date type. It's inconvenient to have to convert
> the data we need into types Impala supports each time.
>
>
> Has there been any discussion about this before? Is this blocked by the fact
> that the latest CDH still uses Hive-1.1.0? Finally, do we plan to support
> the Date type?
>
> Thanks,
>
> Quanlong


Re: impala::Mutex

2017-10-31 Thread Tim Armstrong
The main case where we still use boost::mutex is when it's paired with a
condition variable - we don't have any condition variable that integrates
with SpinLock.

Mutex does provide some stronger fairness guarantees, I think, but afaik we
don't rely on that anywhere.
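
For illustration, this is the kind of pattern that still needs a real mutex
(a minimal sketch using the std:: equivalents rather than boost, since the
shape is the same either way):

// A condition variable must atomically release the lock while waiting,
// which is why it needs a real mutex type rather than a SpinLock.
#include <condition_variable>
#include <mutex>
#include <queue>

std::mutex mu;
std::condition_variable cv;
std::queue<int> work;

void Produce(int item) {
  {
    std::lock_guard<std::mutex> lock(mu);
    work.push(item);
  }
  cv.notify_one();
}

int Consume() {
  std::unique_lock<std::mutex> lock(mu);
  cv.wait(lock, [] { return !work.empty(); });  // Blocks with mu released.
  int item = work.front();
  work.pop();
  return item;
}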

On Tue, Oct 31, 2017 at 9:54 AM, Daniel Hecht  wrote:

> We have impala::SpinLock, which works with lock_guard, unique_lock, etc.
> It's adaptive so it will block after spinning for a little bit, and seems
> to work well in most cases. It's built on gutil, which also has some
> tracing we could enable. Any reason not to stick with that?
>
> I haven't looked at Kudu's implementation in a while and am not opposed to
> using it if there's a good reason to switch.  But I think maybe we should
> first figure out how to better share code with Kudu so that we don't just
> continue to fork general purpose utility code (which is what we're
> currently doing).
>
> On Tue, Oct 31, 2017 at 9:47 AM, Zoltan Borok-Nagy <
> borokna...@cloudera.com>
> wrote:
>
> > Hi Everyone,
> >
> > I started to review the usage of synchronization primitives in Impala and
> > created this JIRA issue: https://issues.apache.org/
> jira/browse/IMPALA-6125
> >
> > After some chat with Tim, we talked about the possibility of reducing the
> > dependence on boost. Since we are now using C++11, there are std::thread,
> > std::mutex, and std::condition_variable in the standard library.
> >
> > On the other hand, the standard library likes to use exceptions in case of
> > errors, and AFAIK we don't like exceptions in Impala. Also, we already have
> > utility classes like ConditionVariable in the code base. In the kudu/util
> > directory, there is a Mutex class which conforms more closely to the Impala
> > code conventions than boost or the standard library. It also checks the
> > ownership of the mutex in debug mode, which can be useful for detecting
> > bugs.
> >
> > Do you think it would be a good idea to create our own Mutex
> implementation
> > based on kudu/util/mutex?
> >
> > BR,
> > Zoltan
> >
>


Re: Unknown clang-tidy failure in GVO

2017-10-26 Thread Tim Armstrong
I think the actual failure was this:

16:13:18 + grep ']' /home/ubuntu/tidylog.txt
16:13:18 /home/ubuntu/Impala/be/src/rpc/thrift-server-test.cc:105:26: warning: extra ';' after member function definition [clang-diagnostic-extra-semi]
16:13:18 /home/ubuntu/Impala/be/src/rpc/thrift-server-test.cc:106:29: warning: extra ';' after member function definition [clang-diagnostic-extra-semi]
16:13:18 /home/ubuntu/Impala/be/src/rpc/thrift-server-test.cc:149:7: warning: ignoring return value of function declared with 'nodiscard' attribute [clang-diagnostic-unused-result]
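
For reference, the extra-semi pattern being flagged looks like this (a
minimal made-up illustration, not the actual test code):

class Example {
  void Flagged() {};   // warning: extra ';' after member function definition
  void Fixed() {}      // no trailing semicolon, no warning
};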

I'm not sure if the hadoop-lzo build noise was related.

- Tim

On Thu, Oct 26, 2017 at 12:57 PM, Sailesh Mukil 
wrote:

> Does anyone know the cause for this failure in the clang-tidy run in GVO?
> It's something to do with hadoop-lzo.
>
> https://jenkins.impala.io/job/clang-tidy-ub1604/78/consoleFull
>
>
> 15:52:44   [javadoc] Generating /home/ubuntu/hadoop-lzo/build/docs/api/help-doc.html...
> 15:52:44   [javadoc] 1 error
> 15:52:44   [javadoc] 1 warning
> 15:52:44   [javadoc] javadoc: error - Error while reading file /home/ubuntu/hadoop-lzo/src/java/overview.html
> 15:52:44
> 15:52:44 package:
> 15:52:44     [mkdir] Created dir: /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15
> 15:52:44     [mkdir] Created dir: /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib
> 15:52:44     [mkdir] Created dir: /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/docs
> 15:52:44     [mkdir] Created dir: /home/ubuntu/hadoop-lzo/lib
> 15:52:44      [copy] Copying 56 files to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib
> 15:52:44      [copy] Copying 1 file to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15
> 15:52:44      [exec] Created /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/docs
> 15:52:44      [exec] Copying libraries in /docs to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/docs/
> 15:52:44      [exec] Created /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/hadoop-lzo-0.4.15.jar
> 15:52:44      [exec] Copying libraries in /hadoop-lzo-0.4.15.jar to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/hadoop-lzo-0.4.15.jar/
> 15:52:44      [exec] Created /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/lib
> 15:52:44      [exec] Copying libraries in /lib to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/lib/
> 15:52:44      [exec] Created /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/Linux-amd64-64
> 15:52:44      [exec] Copying libraries in /home/ubuntu/hadoop-lzo/build/native/Linux-amd64-64/lib to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/lib/native/Linux-amd64-64/
> 15:52:44      [exec] /home/ubuntu/hadoop-lzo/src/native/packageNativeHadoop.sh: 44: cd: can't cd to /docs/
> 15:52:44      [exec] tar: *gplcompression*: Cannot stat: No such file or directory
> 15:52:44      [exec] tar: Exiting with failure status due to previous errors
> 15:52:44      [exec] /home/ubuntu/hadoop-lzo/src/native/packageNativeHadoop.sh: 44: cd: can't cd to /hadoop-lzo-0.4.15.jar/
> 15:52:44      [exec] tar: *gplcompression*: Cannot stat: No such file or directory
> 15:52:44      [exec] tar: Exiting with failure status due to previous errors
> 15:52:44      [exec] tar: *gplcompression*: Cannot stat: No such file or directory
> 15:52:44      [exec] tar: Exiting with failure status due to previous errors
> 15:52:44      [copy] Copying 77 files to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/docs
> 15:52:44      [copy] Copying 1 file to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15
> 15:52:44      [copy] Copying 4 files to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/ivy
> 15:52:44      [copy] Copying 1 file to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15
> 15:52:44      [copy] Copying 64 files to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15/src
> 15:52:44      [copy] Copying 1 file to /home/ubuntu/hadoop-lzo/build/hadoop-lzo-0.4.15
>


Re: Parquet min/max statistics & null values

2017-10-26 Thread Tim Armstrong
Hi Bruno,
 Could you provide an example of the specific predicates that aren't being
used to successfully skip the row group?

- Tim

On Thu, Oct 26, 2017 at 7:21 AM, Jeszy  wrote:

> Hello Bruno,
>
> Thanks for bringing this up. While not apparent from the commit
> comments, this limitation was mentioned during the code review:
> 'min/max are only set when there are non-null values, so we don't
> consider statistics for "is null".' (see
> https://gerrit.cloudera.org/#/c/6147/).
> It looks to me that this was intended, but I'll let others confirm.
> Definitely a point where we can improve.
>
> Thanks!
>
> On 26 October 2017 at 08:02, Bruno Quinart  wrote:
> > Hi all
> >
> > With IMPALA-2328, Parquet row group statistics are now being used to skip
> > the row group completely if the min/max range is excluded from the
> > predicate.
> > We have a use case in which we make sure the data is sorted on a 'key'
> and
> > have then many selective queries on that 'key' field. We notice a
> > significant performance increase.
> > So thanks a lot for all the work on that!
> >
> > One thing we notice is an unexpected behavior for records where that
> 'key'
> > has null values. It seems that as soon as null values are present in a
> row
> > group, the test on the min/max fails and the row group is read.
> >
> > We work with Impala 2.9. The data is put in parquet files by Impala
> itself.
> > We have noticed this effect for both bigint and decimal fields. Note that
> > it's difficult for me to extract the min/max statistics from the parquet
> > files. The parquet-tools included in our distribution (5.12) is not the
> > latest. And I was told PARQUET-327 would anyway not print those row
> > group stats because of the way Impala stores them.
> > We do confirm the expected behavior (exactly one row group read for
> properly
> > sorted data) when we create a similar table but explicitly filter out all
> > null values for that 'key' field. We also notice that the number of row
> > groups read (but zero records retained) is proportional to the number of
> > null values.
> >
> > Is this behavior expected?
> > Is there a fundamental reason those row groups can not be skipped?
> >
> > Thanks!
> > Bruno
> >
>


Re: Please hold off merging new code changes

2017-10-26 Thread Tim Armstrong
It looks like tests are more stable for me now so I think we can continue
merging changes. Please put in as much effort as reasonable to make sure
that your changes don't add to test flakiness.

- Tim

On Wed, Oct 25, 2017 at 2:51 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> A few recent changes have destabilised a lot of tests - see IMPALA-6106
> and IMPALA-6108 in particular. Until those are sorted out let's avoid
> merging any code changes that don't fix flaky or broken tests - it will
> just make it more difficult to root-cause any further failures.
>
> I'll send out another email once things are looking more stable.
>
> Thanks,
> Tim
>


Please hold off merging new code changes

2017-10-25 Thread Tim Armstrong
A few recent changes have destabilised a lot of tests - see IMPALA-6106 and
IMPALA-6108 in particular. Until those are sorted out let's avoid merging
any code changes that don't fix flaky or broken tests - it will just make
it more difficult to root-cause any further failures.

I'll send out another email once things are looking more stable.

Thanks,
Tim


Re: Does this mem-tracker.h assertion ring a bell?

2017-10-25 Thread Tim Armstrong
Will you file a JIRA for this bug? Sounds like something we don't want to
lose track of.

- Tim

On Wed, Oct 25, 2017 at 11:05 AM, Philip Zeyliger <phi...@cloudera.com>
wrote:

> Thanks. I'm beginning to think my patch is not causing these breakages. A
> different run was almost clean, with some TPC-DS query tests failing, which
> I think are also new.
>
> -- Philip
>
> On Wed, Oct 25, 2017 at 10:36 AM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > Yeah it's probably another consequence of
> > https://issues.apache.org/jira/browse/IMPALA-5789. Maybe your patch
> > changed
> > the timing enough to trigger it.
> >
> > I think the bug might be related to using directory.capacity() as the
> > argument to Release(). Calling directory.clear() after releasing the memory
> > in FilterState::Disable() won't necessarily deallocate the memory, so we
> > could end up releasing it twice.
> >
> > On Wed, Oct 25, 2017 at 10:11 AM, Mostafa Mokhtar <mmokh...@cloudera.com
> >
> > wrote:
> >
> > > Maybe related to https://issues.apache.org/jira/browse/IMPALA-6099?
> > >
> > > On Wed, Oct 25, 2017 at 10:02 AM, Philip Zeyliger <phi...@cloudera.com
> >
> > > wrote:
> > >
> > > > Hi folks,
> > > >
> > > > I'm debugging some test failures related to an LLVM/AvroCodegen patch
> > > I've
> > > > got going on. The failures are in the parallel EE tests, and most of
> > them
> > > > are complaining that Impala is out to lunch. It looks like the
> > following
> > > > assertion is firing, causing an impalad to fail, causing many tests
> to
> > > > start failing. (I've also got a minidump, but the build was on
> > > > jenkins.impala.io, so I don't think I have the symbols/binaries to
> use
> > > > it.)
> > > >
> > > > If this sort of thing rings a bell for anyone, please holler!
> > > >
> > > > Obviously I'll work on reproducing this locally to figure it out.
> > > >
> > > > F1025 02:20:43.786911 82485 mem-tracker.h:231] Check failed:
> > > > tracker->consumption_->current_value() >= 0 (-1052615 vs. 0)
> > > > Runtime Filter (Coordinator): Total=-1.00 MB Peak=1.00 MB
> > > > *** Check failure stack trace: ***
> > > > @  0x2f1e11d  google::LogMessage::Fail()
> > > > @  0x2f1f9c2  google::LogMessage::SendToLog()
> > > > @  0x2f1daf7  google::LogMessage::Flush()
> > > > @  0x2f210be  google::LogMessageFatal::~LogMessageFatal()
> > > > @  0x17425fb  impala::MemTracker::Release()
> > > > @  0x1fa7e8b  impala::Coordinator::UpdateFilter()
> > > > @  0x186e3cf  impala::ImpalaServer::UpdateFilter()
> > > > @  0x18d824f  impala::ImpalaInternalService::UpdateFilter()
> > > > @  0x1dda35a
> > > > impala::ImpalaInternalServiceProcessor::process_UpdateFilter()
> > > > @  0x1dd8308
> > > > impala::ImpalaInternalServiceProcessor::dispatchCall()
> > > > @  0x15410ea  apache::thrift::TDispatchProcessor::process()
> > > > @  0x171042b
> > > > apache::thrift::server::TAcceptQueueServer::Task::run()
> > > > @  0x170c307  impala::ThriftThread::RunRunnable()
> > > > @  0x170da13  boost::_mfi::mf2<>::operator()()
> > > > @  0x170d8a9  boost::_bi::list3<>::operator()<>()
> > > > @  0x170d5f5  boost::_bi::bind_t<>::operator()()
> > > > @  0x170d508
> > > > boost::detail::function::void_function_obj_invoker0<>::invoke()
> > > > @  0x171bdfc  boost::function0<>::operator()()
> > > > @  0x19f3393  impala::Thread::SuperviseThread()
> > > > @  0x19fbf26  boost::_bi::list4<>::operator()<>()
> > > > @  0x19fbe69  boost::_bi::bind_t<>::operator()()
> > > > @  0x19fbe2c  boost::detail::thread_data<>::run()
> > > > @  0x20a7c9a  thread_proxy
> > > > @ 0x7fe6536186ba  start_thread
> > > > @ 0x7fe65334e3dd  clone
> > > > r.java:81)
> > > >
> > >
> >
>


Re: Does this mem-tracker.h assertion ring a bell?

2017-10-25 Thread Tim Armstrong
Yeah it's probably another consequence of
https://issues.apache.org/jira/browse/IMPALA-5789. Maybe your patch changed
the timing enough to trigger it.

I think the bug might be related to using directory.capacity() as the
argument to Release(). Calling directory.clear() after releasing the memory
in FilterState::Disable() won't necessarily deallocate the memory, so we
could end up releasing it twice.
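
As a rough sketch of the pattern I mean (the surrounding code is assumed;
only the MemTracker::Release() call and the capacity()/clear() detail come
from the discussion above):

#include <cstdint>
#include <vector>

struct MemTracker {
  int64_t consumption = 0;
  // Impala DCHECKs that consumption stays >= 0, which is the check
  // failing in the stack trace.
  void Release(int64_t bytes) { consumption -= bytes; }
};

void Disable(MemTracker* tracker, std::vector<uint8_t>* directory) {
  tracker->Release(directory->capacity());
  // clear() does not shrink capacity(), so a later cleanup path that also
  // releases directory->capacity() subtracts the same bytes twice,
  // driving consumption negative.
  directory->clear();
}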

On Wed, Oct 25, 2017 at 10:11 AM, Mostafa Mokhtar 
wrote:

> Maybe related to https://issues.apache.org/jira/browse/IMPALA-6099?
>
> On Wed, Oct 25, 2017 at 10:02 AM, Philip Zeyliger 
> wrote:
>
> > Hi folks,
> >
> > I'm debugging some test failures related to an LLVM/AvroCodegen patch
> I've
> > got going on. The failures are in the parallel EE tests, and most of them
> > are complaining that Impala is out to lunch. It looks like the following
> > assertion is firing, causing an impalad to fail, causing many tests to
> > start failing. (I've also got a minidump, but the build was on
> > jenkins.impala.io, so I don't think I have the symbols/binaries to use
> > it.)
> >
> > If this sort of thing rings a bell for anyone, please holler!
> >
> > Obviously I'll work on reproducing this locally to figure it out.
> >
> > F1025 02:20:43.786911 82485 mem-tracker.h:231] Check failed:
> > tracker->consumption_->current_value() >= 0 (-1052615 vs. 0)
> > Runtime Filter (Coordinator): Total=-1.00 MB Peak=1.00 MB
> > *** Check failure stack trace: ***
> > @  0x2f1e11d  google::LogMessage::Fail()
> > @  0x2f1f9c2  google::LogMessage::SendToLog()
> > @  0x2f1daf7  google::LogMessage::Flush()
> > @  0x2f210be  google::LogMessageFatal::~LogMessageFatal()
> > @  0x17425fb  impala::MemTracker::Release()
> > @  0x1fa7e8b  impala::Coordinator::UpdateFilter()
> > @  0x186e3cf  impala::ImpalaServer::UpdateFilter()
> > @  0x18d824f  impala::ImpalaInternalService::UpdateFilter()
> > @  0x1dda35a
> > impala::ImpalaInternalServiceProcessor::process_UpdateFilter()
> > @  0x1dd8308
> > impala::ImpalaInternalServiceProcessor::dispatchCall()
> > @  0x15410ea  apache::thrift::TDispatchProcessor::process()
> > @  0x171042b
> > apache::thrift::server::TAcceptQueueServer::Task::run()
> > @  0x170c307  impala::ThriftThread::RunRunnable()
> > @  0x170da13  boost::_mfi::mf2<>::operator()()
> > @  0x170d8a9  boost::_bi::list3<>::operator()<>()
> > @  0x170d5f5  boost::_bi::bind_t<>::operator()()
> > @  0x170d508
> > boost::detail::function::void_function_obj_invoker0<>::invoke()
> > @  0x171bdfc  boost::function0<>::operator()()
> > @  0x19f3393  impala::Thread::SuperviseThread()
> > @  0x19fbf26  boost::_bi::list4<>::operator()<>()
> > @  0x19fbe69  boost::_bi::bind_t<>::operator()()
> > @  0x19fbe2c  boost::detail::thread_data<>::run()
> > @  0x20a7c9a  thread_proxy
> > @ 0x7fe6536186ba  start_thread
> > @ 0x7fe65334e3dd  clone
> > r.java:81)
> >
>


Re: Will there be a 2.12.0 release?

2017-10-24 Thread Tim Armstrong
It probably depends a lot on whether we have volunteers to do the releases
too. It sounds like it's probably worth creating a 2.12.0 release in JIRA -
we can always bump anything targeted for that release to 3.0 if it doesn't
happen.

On Tue, Oct 24, 2017 at 10:52 AM, Lars Volker <l...@cloudera.com> wrote:

> I like Jim's idea of having a process to scope out 3.0.
>
> Tim, can you think of features that are already lined up for 3.0 that we're
> currently holding off on? If there are no pressing issues, I think it'd be
> great to start working on scoping out 3.0 and produce 2.x releases until
> then.
>
> On Tue, Oct 24, 2017 at 10:19 AM, Jim Apple <jbap...@cloudera.com> wrote:
>
> > Do we want to have a 3.0 process, where one person tracks all of the open
> > breaking-change JIRAs and makes sure nothing gets accidentally left out?
> I
> > ask this because, if the answer is "yes", we might make the 2.12 decision
> > based on scope and quantity of 3.0 JIRAs.
> >
> > On Tue, Oct 24, 2017 at 10:07 AM, Tim Armstrong <tarmstr...@cloudera.com
> >
> > wrote:
> >
> > > I was just retargeting some JIRAs from 2.11 to a later release. I'm
> > > wondering if people had thoughts on whether we should have a 2.12
> release
> > > before 3.0?
> > >
> > > We have a lot of breaking changes queued up so I'm sure people are
> > looking
> > > forward to 3.0, but do we think there will be a minor release before
> > then?
> > >
> >
>


Will there be a 2.12.0 release?

2017-10-24 Thread Tim Armstrong
I was just retargeting some JIRAs from 2.11 to a later release. I'm
wondering if people had thoughts on whether we should have a 2.12 release
before 3.0?

We have a lot of breaking changes queued up so I'm sure people are looking
forward to 3.0, but do we think there will be a minor release before then?


Re: Using Gerrit drafts

2017-10-19 Thread Tim Armstrong
I thought subsequent drafts were only visible to reviewers (in general
drafts are visible to any reviewers you add to the patch). At least on
older versions of gerrit if you pushed out a draft to a published patchset,
a notification email was sent out but the updated patchset was invisible to
non-reviewers.

Another feature I find useful for organising related patches is the
"topic". If you push to refs/for/master%topic=buffer-pool then the patch is
associated with a topic.

On Thu, Oct 19, 2017 at 9:50 AM, Lars Volker  wrote:

> Note that publishing cannot be undone. In particular, after you publish
> a change, subsequent pushes to refs/drafts will be public patch sets, too.
>
> On Oct 19, 2017 09:34, "Philip Zeyliger"  wrote:
>
> Hey folks,
>
> This wasn't obvious for me, so I figured I'd share it. If you want to
> review your Gerrit changes on the Gerrit UI before sending e-mail to the
> community, you can run something like:
>
> git push asf-gerrit HEAD:refs/drafts/master
>
> This will give you a URL that you can browse to, and you can even run
> https://jenkins.impala.io/view/Utility/job/pre-review-test/ against it. No
> e-mails are sent!
>
> Once you've looked it over, you can hit 'Publish' on the web UI, and, boom,
> e-mails.
>
> Cheers,
>
> -- Philip
>


Re: [VOTE] Graduate to a TLP

2017-10-17 Thread Tim Armstrong
+1

On 17 Oct. 2017 8:38 pm, "Alexander Behm"  wrote:

> +1
>
> On Tue, Oct 17, 2017 at 8:18 PM, Taras Bobrovytsky 
> wrote:
>
> > +1
> >
> > On Tue, Oct 17, 2017 at 7:56 PM, Michael Ho  wrote:
> >
> > > +1
> > >
> > > On Tue, Oct 17, 2017 at 7:25 PM, Thomas Tauber-Marshall <
> > > tmarsh...@cloudera.com> wrote:
> > >
> > > > +1
> > > >
> > > > On Tue, Oct 17, 2017 at 9:12 PM Bharath Vissapragada <
> > > > bhara...@cloudera.com>
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Tue, Oct 17, 2017 at 7:10 PM, Mostafa Mokhtar <
> > > mmokh...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > Thanks
> > > > > > Mostafa
> > > > > >
> > > > > > > On Oct 17, 2017, at 7:09 PM, Brock Noland 
> > wrote:
> > > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > >> On Tue, Oct 17, 2017 at 9:07 PM, Lars Volker  >
> > > > wrote:
> > > > > > >> +1
> > > > > > >>
> > > > > > >>> On Oct 17, 2017 19:07, "Jim Apple" 
> > wrote:
> > > > > > >>>
> > > > > > >>> Following our discussion
> > > > > > >>> https://lists.apache.org/thread.html/
> > > > 2f5db4788aff9b0557354b9106c032
> > > > > > >>> 8a29c1f90c1a74a228163949d2@%3Cdev.impala.apache.org%3E
> > > > > > >>> , I propose that we graduate to a TLP. According to
> > > > > > >>> https://incubator.apache.org/guides/graduation.html#
> > > > > > >>> community_graduation_vote
> > > > > > >>> this is not required, and https://impala.apache.org/
> > bylaws.html
> > > > does
> > > > > > not
> > > > > > >>> say whose votes are "binding" in a graduation vote, so all
> > > > community
> > > > > > >>> members are welcome to vote.
> > > > > > >>>
> > > > > > >>> This will remain open 72 hours. I will be notifying
> > > > general@incubator
> > > > > > it
> > > > > > >>> is
> > > > > > >>> occurring.
> > > > > > >>>
> > > > > > >>> This is my +1.
> > > > > > >>>
> > > > > >
> > > > >
> > > >
> > >
> > >
> > >
> > > --
> > > Thanks,
> > > Michael
> > >
> >
>


Re: Time for graduation?

2017-10-13 Thread Tim Armstrong
I'd be very happy if we could graduate too. We still obviously have to
continue working on growing the community but we've made a huge amount of
progress in setting up the infrastructure and processes to be successful as
a top level project.

I think having users like Brock on the PMC is great so that we can get
input on whether the project is going in the right direction from their
point of view.

- Tim

On Thu, Oct 12, 2017 at 4:46 PM, Brock Noland  wrote:

> Hi all,
>
> I've been thinking about this as well and I feel Impala is ready.
>
> (more inline)
>
> On Thu, Oct 12, 2017 at 6:06 PM, Todd Lipcon  wrote:
>
> > On Thu, Oct 12, 2017 at 3:24 PM, Jim Apple  wrote:
> >
> > > Also, mentors are traditionally included in a graduating podling's PMC,
> > > right?
> >
> > That's often been done but I don't think there's any hard requirement.
> > Perhaps we could ask each mentor whether they would like to continue to
> be
> > involved?
> >
>
> For my part, I don't feel I contribute much to the PMC, but Impala is a
> project I use every day, and thus I have a strong interest in the project being
> successful. I would not be hurt in the *least* if I was not included on the
> PMC. However, I'd be more than happy to serve.
>
> Cheers,
> Brock
>


Re: "tests for tests" in gerrit-verify-dryrun?

2017-09-25 Thread Tim Armstrong
Seems like a good idea. Ideally we could set it up in a way such that if
you made changes across directories (e.g. docs + a code change), it
would run all applicable tests.

On Mon, Sep 25, 2017 at 5:31 PM, Jim Apple  wrote:

> That would also enable unifying docs pre-merge testing with code pre-merge
> testing.
>
> On Mon, Sep 25, 2017 at 4:41 PM Daniel Hecht  wrote:
>
> > +1 to branching based on e.g. files in commit.
> >
> > On Mon, Sep 25, 2017 at 4:33 PM, Philip Zeyliger 
> > wrote:
> >
> > > I'm not entirely familiar with the current complexity of the Jenkins
> jobs
> > > on the ASF infrastructure, but I think it's very sensible to look at
> the
> > > files in a commit (e.g., "git diff-tree --no-commit-id --name-only -r
> > > HEAD") and branch based on patterns in that data.
> > >
> > > -- Philip
> > >
> > > On Mon, Sep 25, 2017 at 4:27 PM, Taras Bobrovytsky <
> taras...@apache.org>
> > > wrote:
> > >
> > > > I like the idea of having tests for tests as part of GVD. This helps
> > > ensure
> > > > that the tests are always functional and are never broken by a
> commit.
> > > > Having tests in a functional state is arguably just as important as
> > > having
> > > > a functional product.
> > > >
> > > > On Mon, Sep 25, 2017 at 4:04 PM, Michael Brown 
> > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I'm about to start working on Impala's random query generator, a
> > > testing
> > > > > tool to help find test gaps in Impala's functional tests.
> > > > >
> > > > > The random query generator and infra code around it has some
> > functional
> > > > and
> > > > > pure unit tests [0] that are not part of GVD, but it wouldn't be
> hard
> > > to
> > > > > fold them into GVD's execution. As part of the upcoming work, I
> plan
> > to
> > > > add
> > > > > even more tests: we need quick unit or functional tests to ensure a
> > > test
> > > > > tool is working as expected.
> > > > >
> > > > > What are people's thoughts on having these "tests for tests", or
> > infra
> > > > > tests, be part of GVD?
> > > > >
> > > > > Pros:
> > > > > 1. Helps prevent regression in tools and infra
> > > > >
> > > > > 2. Verification procedure is the same as with the rest of Impala:
> run
> > > > > gerrit-verify-dryrun
> > > > >
> > > > > 3. Automatic Apache RAT verification
> > > > >
> > > > > Cons:
> > > > > 1. Patches to the random query generator tend to be self-contained.
> > > Ought
> > > > > we spend more AWS cycles and time building Impala and running these
> > > tests
> > > > > in order to run some ostensible (but growing) infra tests?
> > > > >
> > > > > 2. Flaky tests and failing builds can block test tool progress
> > > > >
> > > > > Other solutions if the cons win:
> > > > > 1. Separate Jenkins job for these tests (there's a separate job for
> > > > > submitting and verifying docs, for instance). A con of this is that
> > > this
> > > > > can lead to a proliferation of Jenkins jobs and confusion with
> > > > contributors
> > > > > on which jobs apply where. Also, if there is ever a patch where
> > Impala
> > > > > proper and query generator are both updated, which job wins?
> > > > >
> > > > > 2. Status quo and set Verified+1/Submitted by hand. This is much
> > easier
> > > > for
> > > > > committers than non-committers. I'm OK with status quo, but in the
> > > past,
> > > > > there have been requests to improve this situation [1]
> > > > >
> > > > > For a data point, I can cd to "tests/comparison/tests", run
> > > > > "impala-py.test", and 71 tests take about 10 seconds to run.
> > > > >
> > > > > Thanks for any feedback.
> > > > >
> > > > > [0]
> > > > > https://git-wip-us.apache.org/repos/asf?p=incubator-impala.
> > > > > git;a=tree;f=tests/comparison/tests;h=
> 49e3b5d7d9a6f5f716c135bda36292
> > > > > e05fb0e0d3;hb=HEAD
> > > > >
> > > > > [1] https://issues.apache.org/jira/browse/IMPALA-4756
> > > > >
> > > >
> > >
> >
>


Re: vim / Eclipse setups for new developers, on the C++ side

2017-09-13 Thread Tim Armstrong
For a long time I've just used GNU screen + VIM with syntax highlighting.
Then "git grep" or search in VIM as needed to find things. Obviously not
ideal for everyone.

I've tried YouCompleteMe recently and it works fairly well but hasn't been
a game-changer for me. Jumping to definitions is handy sometimes but I
haven't found that it's changed my workflow that much.

On Wed, Sep 13, 2017 at 2:18 PM, Philip Zeyliger 
wrote:

> Hi folks,
>
> I'm querying what folks use for working on the C++ side of the code base.
> I'm specifically interested in navigation tools for vim (better than
> ctags), error-highlighting tools for vim (showing syntax errors and such
> "live"), and Eclipse integration (yes, I've seen the wiki
>  Eclipse+Setup+for+Impala+Development>
> ).
>
> I'll be happy to collate and update
> https://cwiki.apache.org/confluence/display/IMPALA/
> Useful+Tips+for+New+Impala+Developers
> (or other appropriate pages) once I get some feedback!
>
> Thanks!
>
> -- Philip
>


Re: [RESULT] Vote on Impala 2.10.0 release candidate 2

2017-09-13 Thread Tim Armstrong
Thanks for your hard work on this Bharath.

On Tue, Sep 12, 2017 at 11:14 PM, Bharath Vissapragada <
bhara...@cloudera.com> wrote:

> The vote has passed with the following tally.
>
> +1 (binding)
>
> - Brock Noland
> - Carl Steinbach
> - John D. Ament
>
> -1 (binding) - None
> 0 - None
>
> Thanks everyone for testing and voting on the release.
>


Re: Build broken by b66af0357e - IMPALA-5854: Update external hadoop versions

2017-09-01 Thread Tim Armstrong
It works ok for me. Maybe you need to re-source impala-config.sh or
re-bootstrap to pull down the new components?

On Fri, Sep 1, 2017 at 10:17 AM, Lars Volker  wrote:

> It looks like the recent hadoop version update broke the build for me. I
> get this error; with the previous commit it still works. Anyone else seeing
> this?
>
> be/src/common/hdfs.h:30:18: fatal error: hdfs.h: No such file or directory
>  #include <hdfs.h>
>


Re: Question about the multi-thread scan node model

2017-08-31 Thread Tim Armstrong
I spoke to Alex Behm off-list about that JIRA a while ago. I don't think
it's a true ramp-up task. The code change is easy but I think we would want
to do performance validation and testing to make sure that the new
multithreaded scanners have similar performance and stability before making
them the default.

On Thu, Aug 31, 2017 at 12:34 AM, huangquanl...@gmail.com <
huangquanl...@gmail.com> wrote:

> Yeah, "compute stats" is really cpu bound. That sounds great!
>
> I noticed that one of the sub tasks of multithreading work is labeled with
> "ramp up": https://issues.apache.org/jira/browse/IMPALA-5802
> Is this in progress? If not, could you reassign it to me so I can get
> familiar with the latest framework?
>
> Thanks,
> Quanlong
>
> On 2017-08-31 07:16, Tim Armstrong <tarmstr...@cloudera.com> wrote:
> > Hi,
> >   The new scanner model is part of the multithreading work to support
> > running multiple instances of each fragment on each Impala daemon. The
> idea
> > there is that parallelisation is done at the fragment level so that all
> > execution including aggregations, sorts, joins is parallelised - not just
> > scans. This is enabled by setting mt_dop > 0. Currently it doesn't work
> for
> > plans including joins and HDFS inserts.
> >
> > We find that a lot of queries are compute bound, particularly by
> > aggregations and joins. In those cases we get big speedups from the newer
> > multithreading model. E.g. "compute stats" is a lot faster.
> >
> > On Wed, Aug 30, 2017 at 3:50 PM, 黄权隆 <huangquanl...@gmail.com> wrote:
> >
> > > Hi all,
> > >
> > >
> > > I’m working on applying our orc-support patch into the latest code
> bases (
> > > IMPALA-5717 <https://issues.apache.org/jira/browse/IMPALA-5717>).
> Since
> > > our
> > > patch is based on cdh-5.7.3-release which was released one year ago,
> > > there’re lots of work to merge it.
> > >
> > >
> > > One of the biggest changes from cdh-5.7.3-release I notice is the new
> scan
> > > node & scanner model introduced in IMPALA-3902
> > > <https://issues.apache.org/jira/browse/IMPALA-3902>. I think it’s
> inspired
> > > by the investigation task in IMPALA-2849
> > > <https://issues.apache.org/jira/browse/IMPALA-2849>, but I cannot
> find any
> > > performance report in this issue. Could you share some report about
> this
> > > multi-thread refactor?
> > >
> > >
> > > I’m wondering how much this can improve the performance, since the old
> > > single-thread scan node & multi-threaded scanner model already supplies
> > > concurrent IO for reading, and most of the queries in OLAP are IO
> bound.
> > >
> > >
> > > Thanks,
> > >
> > > Quanlong
> > >
> >
>


Re: [VOTE] 2.10.0 release candidate 1 (RC1)

2017-08-30 Thread Tim Armstrong
Tests passed, I've pushed to release-2.10.0.

- Tim

On Wed, Aug 30, 2017 at 2:59 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> There were a few other fixes for critical issues and usability bugs that
> seemed worth including. I've put together a branch here that I will run
> tests on. I was going to include these commits:
>
> 23d79462da5d0108709e8b1399c97606f4ebdf92 IMPALA-5855: reserve enough
> memory for preaggs
> a58394be7c7998a5dfea53d8a3dbf8beb3370a48 IMPALA-5850: Cast sender
> partition exprs under unions.
> 2912a0f9d9b32caf586b9383c7e027af3fe4c5c4 IMPALA-5857: avoid invalid free
> of hedged read metrics
> ebe8ddd451b3f14d3f778339978a76bcd14b2589 IMPALA-5830:
> SET_DENY_RESERVATION_PROBABILITY test
> 1faf89f047e7d78c3a1f3b518269a3ae21a4ddea IMPALA-5840: Don't write
> page-level statistics in Parquet files.
> 73cb9b8b0f6020fb90acf4fa12a00753a3120058 IMPALA-5852: improve
> MINIMUM_RESERVATION_UNAVAILABLE error
> 99fe9b3fd602180d63cbfe73ac2c9171c31ae455 IMPALA-5838: Improve errors on
> AC buffer mem rejection
>
> See: https://github.com/timarmstrong/incubator-impala/
> commits/release-2.10.0
>
>
> On Wed, Aug 30, 2017 at 2:24 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> I ran some release tests following the instructions
>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+
>> Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>> and https://cwiki.apache.org/confluence/display/IMPALA/How+to+
>> load+and+run+Impala+tests.
>> Everything passed.
>>
>> I would +1, but I notice downthread that there is going to be an rc2,
>> so: +0 for now.
>>
>> On Sun, Aug 27, 2017 at 10:32 PM, Bharath Vissapragada
>> <bhara...@cloudera.com> wrote:
>> > This is a vote to release Impala 2.10.0.
>> >
>> > - The artefacts for testing can be downloaded from <
>> > https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC1/>.
>> >
>> > - The git tag for this release candidate is 2.10.0-rc1 and tree hash is
>> > visible at
>> > <
>> > https://git-wip-us.apache.org/repos/asf?p=incubator-impala.g
>> it;a=tree;hb=2a7c8b9011905bfeb21b0610f0739f9df9daacef
>> >>
>> >
>> > Please vote +1 or -1. -1 votes should be accompanied by an explanation
>> of
>> > the reason. Only PPMC members and mentors have binding votes, but other
>> > community members are encouraged to cast non-binding votes. This vote
>> will
>> > pass if there are 3 binding +1 votes and more binding +1 votes than -1
>> > votes.
>> >
>> > This wiki page describes how to check the release before you vote:
>> > *https://cwiki.apache.org/confluence/display/IMPALA/How+to+
>> Release#HowtoRelease-HowtoVoteonaReleaseCandidate
>> > <https://cwiki.apache.org/confluence/display/IMPALA/How+to+
>> Release#HowtoRelease-HowtoVoteonaReleaseCandidate>*
>> >
>> > The vote will be open until the end of Wednesday, August 30, Pacific
>> time
>> > zone (UTC-08:00).
>> > Once the vote passes the Impala PPMC vote, it still must pass the
>> incubator
>> > PMC vote before a release is made.
>>
>
>


Re: Question about the multi-thread scan node model

2017-08-30 Thread Tim Armstrong
Hi,
  The new scanner model is part of the multithreading work to support
running multiple instances of each fragment on each Impala daemon. The idea
there is that parallelisation is done at the fragment level so that all
execution including aggregations, sorts, joins is parallelised - not just
scans. This is enabled by setting mt_dop > 0. Currently it doesn't work for
plans including joins and HDFS inserts.

We find that a lot of queries are compute bound, particularly by
aggregations and joins. In those cases we get big speedups from the newer
multithreading model. E.g. "compute stats" is a lot faster.

On Wed, Aug 30, 2017 at 3:50 PM, 黄权隆  wrote:

> Hi all,
>
>
> I’m working on applying our orc-support patch into the latest code bases (
> IMPALA-5717 ). Since
> our
> patch is based on cdh-5.7.3-release which was released one year ago,
> there's lots of work to merge it.
>
>
> One of the biggest changes from cdh-5.7.3-release I notice is the new scan
> node & scanner model introduced in IMPALA-3902
> . I think it’s inspired
> by the investigation task in IMPALA-2849
> , but I cannot find any
> performance report in this issue. Could you share some report about this
> multi-thread refactor?
>
>
> I’m wondering how much this can improve the performance, since the old
> single-thread scan node & multi-threaded scanner model already supplies
> concurrent IO for reading, and most of the queries in OLAP are IO bound.
>
>
> Thanks,
>
> Quanlong
>


Re: [VOTE] 2.10.0 release candidate 1 (RC1)

2017-08-30 Thread Tim Armstrong
There were a few other fixes for critical issues and usability bugs that
seemed worth including. I've put together a branch here that I will run
tests on. I was going to include these commits:

23d79462da5d0108709e8b1399c97606f4ebdf92 IMPALA-5855: reserve enough memory
for preaggs
a58394be7c7998a5dfea53d8a3dbf8beb3370a48 IMPALA-5850: Cast sender partition
exprs under unions.
2912a0f9d9b32caf586b9383c7e027af3fe4c5c4 IMPALA-5857: avoid invalid free of
hedged read metrics
ebe8ddd451b3f14d3f778339978a76bcd14b2589 IMPALA-5830:
SET_DENY_RESERVATION_PROBABILITY test
1faf89f047e7d78c3a1f3b518269a3ae21a4ddea IMPALA-5840: Don't write
page-level statistics in Parquet files.
73cb9b8b0f6020fb90acf4fa12a00753a3120058 IMPALA-5852: improve
MINIMUM_RESERVATION_UNAVAILABLE error
99fe9b3fd602180d63cbfe73ac2c9171c31ae455 IMPALA-5838: Improve errors on AC
buffer mem rejection

See: https://github.com/timarmstrong/incubator-impala/commits/release-2.10.0


On Wed, Aug 30, 2017 at 2:24 PM, Jim Apple  wrote:

> I ran some release tests following the instructions
> https://cwiki.apache.org/confluence/display/IMPALA/How+
> to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
> and https://cwiki.apache.org/confluence/display/IMPALA/How+
> to+load+and+run+Impala+tests.
> Everything passed.
>
> I would +1, but I notice downthread that there is going to be an rc2,
> so: +0 for now.
>
> On Sun, Aug 27, 2017 at 10:32 PM, Bharath Vissapragada
>  wrote:
> > This is a vote to release Impala 2.10.0.
> >
> > - The artefacts for testing can be downloaded from <
> > https://dist.apache.org/repos/dist/dev/incubator/impala/2.10.0/RC1/>.
> >
> > - The git tag for this release candidate is 2.10.0-rc1 and tree hash is
> > visible at
> > <
> > https://git-wip-us.apache.org/repos/asf?p=incubator-impala.
> git;a=tree;hb=2a7c8b9011905bfeb21b0610f0739f9df9daacef
> >>
> >
> > Please vote +1 or -1. -1 votes should be accompanied by an explanation of
> > the reason. Only PPMC members and mentors have binding votes, but other
> > community members are encouraged to cast non-binding votes. This vote
> will
> > pass if there are 3 binding +1 votes and more binding +1 votes than -1
> > votes.
> >
> > This wiki page describes how to check the release before you vote:
> > *https://cwiki.apache.org/confluence/display/IMPALA/How+
> to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate
> >  to+Release#HowtoRelease-HowtoVoteonaReleaseCandidate>*
> >
> > The vote will be open until the end of Wednesday, August 30, Pacific time
> > zone (UTC-08:00).
> > Once the vote passes the Impala PPMC vote, it still must pass the
> incubator
> > PMC vote before a release is made.
>


Re: Compile error in Redhat 7.x

2017-08-29 Thread Tim Armstrong
Impala definitely should build and run on RHEL6.x. We do most of our
testing on CentOS 6.

I'm not sure if this will help, but when building on Ubuntu we generally
need to set LD_LIBRARY_PATH to work around this problem

export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH




On Tue, Aug 29, 2017 at 8:15 PM, yu feng  wrote:

> Hi, I'm trying to compile Impala on Red Hat Enterprise Linux Server release
> 6.5 (Santiago), and I get the error:
>
> cmake: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by
> /home/hzfengyu/source/impala-kudu/toolchain/gcc-4.9.2/
> lib64/libstdc++.so.6)
> cmake: /lib64/libc.so.6: version `GLIBC_2.18' not found (required by
> /home/hzfengyu/source/impala-kudu/toolchain/gcc-4.9.2/
> lib64/libstdc++.so.6)
> cmake: /lib64/libc.so.6: version `GLIBC_2.17' not found (required by
> /home/hzfengyu/source/impala-kudu/toolchain/gcc-4.9.2/
> lib64/libstdc++.so.6)
> cmake: /lib64/libc.so.6: version `GLIBC_2.14' not found (required by
> /home/hzfengyu/source/impala-kudu/toolchain/gcc-4.9.2/lib64/libgcc_s.so.1)
> Error in /home/hzfengyu/source/impala-kudu/bin/make_impala.sh at line 125:
> cmake . ${CMAKE_ARGS[@]}
> Error in buildall.sh at line 319: "$IMPALA_HOME/bin/make_impala.sh"
> ${MAKE_IMPALA_ARGS}
>
> Changing GLIBC would have a big impact, so is there any other way to finish this?
>
> I just want to compile and run an Impala cluster on Red Hat Enterprise
> Linux Server release 6.5.
>
> Thanks a lot.
>


Re: [VOTE] 2.10.0 release candidate 1 (RC1)

2017-08-28 Thread Tim Armstrong
Matt Mulder just found a fairly nasty bug in RC1:
https://issues.apache.org/jira/browse/IMPALA-5855 . It seems like we should
probably generate a new RC once that is fixed.

On Mon, Aug 28, 2017 at 11:46 AM, Bharath Vissapragada <
bhara...@cloudera.com> wrote:

> Thanks Todd for the quick help. I read more about it and I found this link
> [1] interesting. So, looks like we need to grow our "web of trust" and one
> way I think is to trust the keys of RMs in the KEYS file, especially given
> they have write permission to the directory and could update that file. As
> per the link I mentioned, this doesn't look like a standard Apache
> practice, but I don't see any other way (please correct me if I'm wrong).
>
> [1] https://mirror-vm.apache.org/~henkp/trust/
>
> On Mon, Aug 28, 2017 at 11:14 AM, Todd Lipcon  wrote:
>
> > Hey Bharath,
> >
> > Take a look at https://www.apache.org/dev/release-signing.html#web-of-
> > trust -- it has some info on the GPG "web of trust". Basically, you need
> > to either directly trust Jim's key 6850196C, or you need to trust someone
> > who trusts him, etc. If you haven't yourself signed or trusted anyone's
> > keys, then no one's signature will be considered trusted for you.
> >
> > Typically projects also publish a KEYS file in their distribution
> > directory which would be able to verify that the signing key at least
> > matches the one that was uploaded via ASF infrastructure.
> >
> > -Todd
> >
> > On Mon, Aug 28, 2017 at 11:09 AM, Bharath Vissapragada <
> > bhara...@cloudera.com> wrote:
> >
> >> + mentors
> >>
> >> Thanks for testing the release Matt. I ran into the same issue while
> >> testing it myself. So I double checked older releases 2.9.0 and 2.8.0
> and I
> >> saw the same behavior.
> >>
> >> gpg --verify apache-impala-incubating-2.9.0.tar.gz.asc
> >> apache-impala-incubating-2.9.0.tar.gz
> >> gpg: Signature made Fri 02 Jun 2017 12:25:45 PM PDT using RSA key ID
> >> 9522D0F3
> >> gpg: Good signature from "Taras Bobrovytsky (CODE SIGNING KEY) <
> >> taras...@apache.org>"
> >> gpg: WARNING: This key is not certified with a trusted signature!
> >> gpg:  There is no indication that the signature belongs to the
> >> owner.
> >> Primary key fingerprint: 8B3E 3FC6 7005 4F52 2421  EEA9 8F3F 86FA 9522
> >> D0F3
> >>
> >> gpg --verify apache-impala-incubating-2.8.0.tar.gz.asc
> >> apache-impala-incubating-2.8.0.tar.gz
> >> gpg: Signature made Sat 07 Jan 2017 10:50:22 AM PST using RSA key ID
> >> 6850196C
> >> gpg: Good signature from "Jim Apple (CODE SIGNING KEY) <
> >> jbap...@apache.org>"
> >> gpg: WARNING: This key is not certified with a trusted signature!
> >> gpg:  There is no indication that the signature belongs to the
> >> owner.
> >> Primary key fingerprint: 11EA E1B3 F3D9 9D7F 897E  4601 91EE 4306 6850
> >> 196C
> >>
> >> I tried to dig into it and this looks like a pretty common problem [1].
> >> But, I'm not totally sure about the standard practices to make a key
> >> trusted. Does anyone else in the community know what the best
> >> practices are around this and how it works with other Apache projects?
> >>
> >> [1] https://serverfault.com/questions/569911/how-to-verify-
> >> an-imported-gpg-key
> >>
> >>
> >> On Mon, Aug 28, 2017 at 10:26 AM, Matthew Jacobs 
> wrote:
> >>
> >>> Bharath, is your key set up correctly?
> >>>
> >>> Running the script on
> >>> https://cwiki.apache.org/confluence/display/IMPALA/How+to+Re
> >>> lease#HowtoRelease-HowtoVoteonaReleaseCandidate
> >>> resulted in this warning indicating that your signature is not
> >>> trusted:
> >>>
> >>> gpg: WARNING: This key is not certified with a trusted signature!
> >>> gpg:  There is no indication that the signature belongs to the
> >>> owner.
> >>>
> >>> Maybe someone who has RM'd before can comment on this.
> >>>
> >>>
> >>> ...
> >>> gpg: key 6850196C: public key "Jim Apple (CODE SIGNING KEY)
> >>> " imported
> >>> gpg: key 9522D0F3: public key "Taras Bobrovytsky (CODE SIGNING KEY)
> >>> " imported
> >>> gpg: key 64DAB27C: public key "Bharath Vissapragada
> >>> " imported
> >>> gpg: Total number processed: 3
> >>> gpg:   imported: 3  (RSA: 3)
> >>> gpg: no ultimately trusted keys found
> >>> + echo 'If in an interactive shell, At the prompt, enter '\''5'\'' for
> >>> '\''I trust ultimately'\'', then '\''y'\'' for '\''yes'\'', then
> >>> '\''q'\'' for '\''quit'\'''
> >>> If in an interactive shell, At the prompt, enter '5' for 'I trust
> >>> ultimately', then 'y' for 'yes', then 'q' for 'quit'
> >>> + [[ ehuxB == *i* ]]
> >>> + echo 'Download the release artifacts:'
> >>> Download the release artifacts:
> >>> + for SUFFIX in gz gz.asc gz.md5 gz.sha512
> >>> + wget -q https://dist.apache.org/repos/dist/dev/incubator/impala/2.10
> >>> .0/RC1/apache-impala-incubating-2.10.0.tar.gz
> >>> + for SUFFIX in gz gz.asc gz.md5 gz.sha512
> >>> + wget -q 

Re: [DISCUSS] 2.10.0 release

2017-08-24 Thread Tim Armstrong
All of the IMPALA-3200 work is now in master!

On Wed, Aug 23, 2017 at 12:21 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> I was looking through open JIRAs to make sure I didn't drop the ball on
> any buffer pool changes and discovered we have 100+ open JIRAs targeted for
> 2.10: https://issues.apache.org/jira/issues/?filter=12341748
>
> It would be great to clean those up. I tried to clean up the ones that I
> know something about but most of them I'm not familiar with. It looks like
> a lot aren't being actively worked on so probably belong in the backlog -
> the target version seems to just be expressing a hope that someone else
> will fix it soon.
>
> You can check your own 2.10 JIRAs with this filter:
> https://issues.apache.org/jira/issues/?filter=12341563
>
> There are also a bunch of unassigned ones: https://issues.apache.org/
> jira/issues/?filter=12341750
>
>
>
> On Mon, Aug 14, 2017 at 11:09 AM, Bharath Vissapragada <
> bhara...@cloudera.com> wrote:
>
>> Agreed Tim.
>>
>> On Mon, Aug 14, 2017 at 9:13 AM, Tim Armstrong <tarmstr...@cloudera.com>
>> wrote:
>>
>> > Sounds good to me. We should coordinate to make sure that all of
>> > https://issues.apache.org/jira/browse/IMPALA-3200 (the buffer pool
>> > changes)
>> > and related fixes make it into the release.
>> >
>> > - Tim
>> >
>> > On Mon, Aug 14, 2017 at 5:52 AM, Jim Apple <jbap...@cloudera.com>
>> wrote:
>> >
>> > > This sounds like a good idea to me. Thank you for volunteering!
>> > >
>> > > On Mon, Aug 14, 2017 at 12:37 AM, Bharath Vissapragada
>> > > <bhara...@cloudera.com> wrote:
>> > > > Folks,
>> > > >
>> > > > It has been almost 2 months since we released Apache Impala
>> > (incubating)
>> > > > 2.9.0 and there have been new feature improvements and a good
>> number of
>> > > bug
>> > > > fixes checked in since then.
>> > > >
>> > > > I propose that we release 2.10.0 soon and I volunteer to be its
>> release
>> > > > manager. Please speak up and let the community know if anyone has
>> any
>> > > > objections to this.
>> > > >
>> > > > Thanks,
>> > > > Bharath
>> > >
>> >
>>
>
>


Re: jenkins.impala.io pre-existing workspace

2017-08-23 Thread Tim Armstrong
Maybe the workspace just got left in a weird state - I think in most cases
"git init" followed by checking out a branch and doing a clean would work.

Should we add the delete workspace post-build action?

On Wed, Aug 23, 2017 at 5:32 PM, Michael Brown <mi...@cloudera.com> wrote:

> Not a known issue. I noticed ubuntu-16.04-from-scratch is not set to clean
> up its workspace, and its config has not been touched since Aug 11. It
> seems strange we only saw this now.
>
> On Wed, Aug 23, 2017 at 5:25 PM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > Is this a known problem? My job failed because the Impala repo already
> > existed on the machine:
> >
> > https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/164/
> >
> > 23:00:24 + /usr/bin/git init /home/ubuntu/Impala
> > 23:00:24 Reinitialized existing Git repository in /home/ubuntu/Impala/.git/
> >
> > 23:02:18 + for ITER in '$(seq 1 10)'
> > 23:02:18 + echo 'ATTEMPT: 1'
> > 23:02:18 ATTEMPT: 1
> > 23:02:18 + /usr/bin/git checkout FETCH_HEAD
> > 23:02:18 + cat /home/ubuntu/Impala/tmp.3tYBn0GUga
> > 23:02:18 23:02:18.712300 git.c:344 trace: built-in: git 'checkout' 'FETCH_HEAD'
> > 23:02:18 error: The following untracked working tree files would be overwritten by checkout:
> > 23:02:18   .clang-format
> > 23:02:18   .clang-tidy
> > 23:02:18   .gitignore
> > 23:02:18   CMakeLists.txt
> > 23:02:18   DISCLAIMER
> > 23:02:18   EXPORT_CONTROL.md
> > 23:02:18   LICENSE.txt
> > 23:02:18   LOGS.md
> > 23:02:18   NOTICE.txt
> > 23:02:18   README.md
> > 23:02:18   be/.gitignore
> > 23:02:18   be/.impala.doxy
> > 23:02:18   be/CMakeLists.txt
> > 23:02:18   be/src/benchmarks/CMakeLists.txt
> > 23:02:18   be/src/benchmarks/atod-benchmark.cc
> > 23:02:18   be/src/benchmarks/atof-benchmark.cc
> > 23:02:18   be/src/benchmarks/atoi-benchmark.cc
> > 23:02:18   be/src/benchmarks/bit-packing-benchmark.cc
> > 23:02:18   be/src/benchmarks/bitmap-benchmark.cc
> > ...
> >
>


jenkins.impala.io pre-existing workspace

2017-08-23 Thread Tim Armstrong
Is this a known problem? My job failed because the Impala repo already
existed on the machine:

https://jenkins.impala.io/job/ubuntu-16.04-from-scratch/164/

*23:00:24* + /usr/bin/git init /home/ubuntu/Impala*23:00:24*
Reinitialized existing Git repository in /home/ubuntu/Impala/.git/

*23:02:18* + for ITER in '$(seq 1 10)'*23:02:18* + echo 'ATTEMPT:
1'*23:02:18* ATTEMPT: 1*23:02:18* + /usr/bin/git checkout
FETCH_HEAD*23:02:18* + cat
/home/ubuntu/Impala/tmp.3tYBn0GUga*23:02:18* 23:02:18.712300 git.c:344
  trace: built-in: git 'checkout' 'FETCH_HEAD'*23:02:18*
error: The following untracked working tree files would be overwritten
by checkout:*23:02:18*  .clang-format*23:02:18* .clang-tidy*23:02:18*
.gitignore*23:02:18*CMakeLists.txt*23:02:18*
DISCLAIMER*23:02:18*
EXPORT_CONTROL.md*23:02:18* LICENSE.txt*23:02:18*   
LOGS.md*23:02:18*
NOTICE.txt*23:02:18*README.md*23:02:18* be/.gitignore*23:02:18*
be/.impala.doxy*23:02:18*   be/CMakeLists.txt*23:02:18*
be/src/benchmarks/CMakeLists.txt*23:02:18*
be/src/benchmarks/atod-benchmark.cc*23:02:18*
be/src/benchmarks/atof-benchmark.cc*23:02:18*
be/src/benchmarks/atoi-benchmark.cc*23:02:18*
be/src/benchmarks/bit-packing-benchmark.cc*23:02:18*
be/src/benchmarks/bitmap-benchmark.cc
...


Re: [DISCUSS] 2.10.0 release

2017-08-23 Thread Tim Armstrong
I was looking through open JIRAs to make sure I didn't drop the ball on any
buffer pool changes and discovered we have 100+ open JIRAs targeted for
2.10: https://issues.apache.org/jira/issues/?filter=12341748

It would be great to clean those up. I tried to clean up the ones that I
know something about but most of them I'm not familiar with. It looks like
a lot aren't being actively worked on so probably belong in the backlog -
the target version seems to just be expressing a hope that someone else
will fix it soon.

You can check your own 2.10 JIRAs with this filter:
https://issues.apache.org/jira/issues/?filter=12341563

There are also a bunch of unassigned ones:
https://issues.apache.org/jira/issues/?filter=12341750



On Mon, Aug 14, 2017 at 11:09 AM, Bharath Vissapragada <
bhara...@cloudera.com> wrote:

> Agreed Tim.
>
> On Mon, Aug 14, 2017 at 9:13 AM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > Sounds good to me. We should coordinate to make sure that all of
> > https://issues.apache.org/jira/browse/IMPALA-3200 (the buffer pool
> > changes)
> > and related fixes make it into the release.
> >
> > - Tim
> >
> > On Mon, Aug 14, 2017 at 5:52 AM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> > > This sounds like a good idea to me. Thank you for volunteering!
> > >
> > > On Mon, Aug 14, 2017 at 12:37 AM, Bharath Vissapragada
> > > <bhara...@cloudera.com> wrote:
> > > > Folks,
> > > >
> > > > It has been almost 2 months since we released Apache Impala
> > (incubating)
> > > > 2.9.0 and there have been new feature improvements and a good number
> of
> > > bug
> > > > fixes checked in since then.
> > > >
> > > > I propose that we release 2.10.0 soon and I volunteer to be its
> release
> > > > manager. Please speak up and let the community know if anyone has any
> > > > objections to this.
> > > >
> > > > Thanks,
> > > > Bharath
> > >
> >
>


Re: how to pass constant parameter to Init() function of UDAF

2017-08-14 Thread Tim Armstrong
If you're running an older version of Impala, you could be hitting
https://issues.apache.org/jira/browse/IMPALA-2379
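
For reference, on versions where IMPALA-2379 is fixed, a UDA can read
literal arguments in Init() through FunctionContext::IsArgConstant() and
FunctionContext::GetConstantArg(). A minimal sketch, assuming the third
argument (index 2) carries the intermediate-buffer size - the argument
index, error messages, and buffer handling are illustrative, not Santanu's
actual code:

  #include <impala_udf/udf.h>

  using namespace impala_udf;

  // Hypothetical 0-based index of the constant memory-size parameter.
  static const int kIntermediateSizeArg = 2;

  void myUDAFInit(FunctionContext* ctx, StringVal* dst) {
    // GetConstantArg() only returns a value when the argument is a literal
    // constant in the query; otherwise it returns NULL.
    if (!ctx->IsArgConstant(kIntermediateSizeArg)) {
      ctx->SetError("third argument must be a constant");
      return;
    }
    const IntVal* size =
        static_cast<const IntVal*>(ctx->GetConstantArg(kIntermediateSizeArg));
    if (size == NULL || size->is_null) {
      ctx->SetError("third argument must be a non-null constant");
      return;
    }
    // Size the intermediate buffer from the constant argument.
    uint8_t* buf = ctx->Allocate(size->val);
    *dst = StringVal(buf, size->val);
  }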

On Mon, Aug 14, 2017 at 1:49 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> Hi Santanu,
>   Thanks for your interest. I can probably help you out given a bit more
> info. Whether the arguments are constant or not is determined based on
> analysis of the input expression to your function. In your case 1 and 100
> are definitely constant.
>
> What version of Impala are you running? Could you also show us the actual
> code for your UDAF (or a simplified reproduction of the problem), the SQL
> commands you ran and the output?
>
> - Tim
>
> On Mon, Aug 14, 2017 at 1:39 PM, Santanu Chatterjee <
> santanu.chat...@gmail.com> wrote:
>
>> I am trying to develop a UDAF which takes three parameters. The SQL syntax
>> would look like this:
>>
>> select myudaf(col1, 1, 100) from mytab;
>>
>> Here col1 is from table mytab and of a numeric type (double/int etc.). The
>> other two parameters are constants. The third parameter determines memory
>> allocation for intermediate results. Therefore, I need to access it from
>> the Init() function. Here is how I declared my update and init functions:
>>
>> void myUDAFInit(FunctionContext *ctx, StringVal *dst);
>> void myUDAFUpdate(FunctionContext *ctx, const DoubleVal& d, const IntVal&,
>> const IntVal&, StringVal* result);
>>
>> Also, I am defining my UDAF like this:
>>
>> create aggregate function myUDAF(double, int, int) returns... ;
>>
>> However, when I try to access function arguments in my Init() function, it
>> says the arguments are non-constant. Is there a different way to define
>> constant arguments?
>>
>> Thanks in Advance.
>>
>
>


Re: how to pass constant parameter to Init() function of UDAF

2017-08-14 Thread Tim Armstrong
Hi Santanu,
  Thanks for your interest. I can probably help you out given a bit more
info. Whether the arguments are constant or not is determined based on
analysis of the input expression to your function. In your case 1 and 100
are definitely constant.

What version of Impala are you running? Could you also show us the actual
code for your UDAF (or a simplified reproduction of the problem), the SQL
commands you ran and the output?

- Tim

On Mon, Aug 14, 2017 at 1:39 PM, Santanu Chatterjee <
santanu.chat...@gmail.com> wrote:

> I am trying to develop a UDAF which takes three parameters. The SQL syntax
> would look like this:
>
> select myudaf(col1, 1, 100) from mytab;
>
> Here col1 is from table mytab and of a numeric type (double/int etc.). The
> other two parameters are constants. The third parameter determines memory
> allocation for intermediate results. Therefore, I need to access it from
> the Init() function. Here is how I declared my update and init functions:
>
> void myUDAFInit(FunctionContext *ctx, StringVal *dst);
> void myUDAFUpdate(FunctionContext *ctx, const DoubleVal& d, const IntVal&,
> const IntVal&, StringVal* result);
>
> Also, I am defining my UDAF like this:
>
> create aggregate function myUDAF(double, int, int) returns... ;
>
> However, when I try to access function arguments in my Init() function, it
> says the arguments are non-constant. Is there a different way to define
> constant arguments?
>
> Thanks in Advance.
>


Re: [DISCUSS] 2.10.0 release

2017-08-14 Thread Tim Armstrong
Sounds good to me. We should coordinate to make sure that all of
https://issues.apache.org/jira/browse/IMPALA-3200 (the buffer pool changes)
and related fixes make it into the release.

- Tim

On Mon, Aug 14, 2017 at 5:52 AM, Jim Apple  wrote:

> This sounds like a good idea to me. Thank you for volunteering!
>
> On Mon, Aug 14, 2017 at 12:37 AM, Bharath Vissapragada
>  wrote:
> > Folks,
> >
> > It has been almost 2 months since we released Apache Impala (incubating)
> > 2.9.0 and there have been new feature improvements and a good number of
> bug
> > fixes checked in since then.
> >
> > I propose that we release 2.10.0 soon and I volunteer to be its release
> > manager. Please speak up and let the community know if anyone has any
> > objections to this.
> >
> > Thanks,
> > Bharath
>


Coordinated change to Impala and Impala-lzo - you may need to rebase Impala-lzo

2017-08-13 Thread Tim Armstrong
IMPALA-5412 required a
corresponding change in the Cloudera Impala-lzo plugin.

If you develop using both you may need to rebase Impala-lzo.


Re: Partition_id parameter of HdfsScanNode::AllocateScanRange

2017-08-08 Thread Tim Armstrong
Based on the comment of AllocateScanRange(), it seems like in some cases
the partition id is ignored so a dummy value can safely be passed in. I'm
honestly not sure why it was optional in those cases - it doesn't seem like
it would be hard to plumb through the valid partition ID.

So the change seems reasonable to me.


On Tue, Aug 8, 2017 at 9:10 AM, Gabor Kaszab 
wrote:

> Hey,
>
> I'm currently working on IMPALA-5412
>  where I enhanced
> FileDescMap to have a key of (partition_id, filename) pair instead of a
> single filename.
> With this I have to extend HdfsScanNode::GetFileDesc to expect the
> partition_id as well. In turn, one of the functions calling GetFileDesc,
> HdfsScanNode::AllocateScanRange, already has a partition_id as an input
> parameter, so it seemed obvious to pass that to GetFileDesc.
> Unfortunately, after some debugging of the Impala code that crashed during
> the "Loading Kudu TPCH" phase of testing, I observed that
> AllocateScanRange is sometimes given '-1' as the partition_id (on one
> occasion it is given scalar_reader->col_idx() as the partition_id).
> As a result no matching file descriptor is found by GetFileDesc where a
> DCHECK fails.
>
> I naively modified how AllocateScanRange is called to receive valid
> partition IDs instead of '-1' and col_idx(). This fixes the issues I
> encountered and mentioned above; however, I'm not sure that this change
> doesn't break some other logic. (Core tests are being run, but even if they
> pass I'm not sure this change is fine.)
>
> Was it intentional to give those invalid partition_ids to
> AllocateScanRange? Do you see any risk of passing valid IDs instead?
>
> Cheers,
> Gabor
>


Re: Impala Sorter just sort small partition?

2017-08-05 Thread Tim Armstrong
No problems at all, thanks for your interest.

On Fri, Aug 4, 2017 at 9:07 PM, cjjnj...@gmail.com <cjjnj...@gmail.com>
wrote:

> oh, there is an assignment to low. Thanks for patience :)
>
> ---Original---
> *From:* "Tim Armstrong "<tarmstr...@cloudera.com>
> *Date:* 2017/8/5 11:27:21
> *To:* "dev@impala"<dev@impala.incubator.apache.org>;
> *Subject:* Re: Impala Sorter just sort small partition?
>
> It does sort both left and right partitions - it just recurses on the small
> partition and the next iteration of the loop processes the large partition.
>
> This is a pretty common optimisation. This page has a nice explanation:
> http://www.geeksforgeeks.org/quicksort-tail-call-optimization-reducing-worst-case-space-log-n/
>
> On Fri, Aug 4, 2017 at 6:12 PM, 俊杰陈 <cjjnj...@gmail.com> wrote:
>
> > Thanks for your detailed description.
> >
> > My question is specifically about the quicksort part. This line
> > <https://github.com/apache/incubator-impala/blob/master/be/src/runtime/sorter.cc#L1258>
> > says to recurse on the small partition due to stack considerations, while
> > to my understanding quicksort should recurse on both the left and right
> > partitions, so I'm curious how it keeps each run sorted - does it sort in
> > a later merge sort or somewhere else? The merge process should take sorted
> > runs as input.
> >
> > 2017-08-05 0:18 GMT+08:00 Tim Armstrong <tarmstr...@cloudera.com>:
> >
> > > The Sorter does a 3-level hybrid sort with merge sort, quicksort and
> > > insertion sort.
> > >
> > > SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
> > > arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
> > > beginning and end of the sorted run, it will sort the full run. It does
> > > quicksort recursively then switches to insertion sort once the partitions
> > > are less than INSERTION_THRESHOLD = 16.
> > >
> > > Sorter also supports an external merge sort - if the full input doesn't
> > fit
> > > in memory, it sorts in-memory runs with SortHelper() then does merge sort
> > > with the sorted runs.
> > >
> > > On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈 <cjjnj...@gmail.com
> > wrote:
> > >
> > > > Hi
> > > > I'm looking at Sorter.cc and found that Sorter::SortHelper just sorts
> > > > the smaller partition. Is there anything I missed?
> > > >
> > > > --
> > > > Thanks & Best Regards
> > > >
> > >
> >
> >
> >
> > --
> > Thanks & Best Regards
> >
>
>


Re: Impala Sorter just sort small partition?

2017-08-04 Thread Tim Armstrong
It does sort both left and right partitions - it just recurses on the small
partition and the next iteration of the loop processes the large partition.

This is a pretty common optimisation. This page has a nice explanation:
http://www.geeksforgeeks.org/quicksort-tail-call-optimization-reducing-worst-case-space-log-n/
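
To make the trick concrete, here is a minimal sketch (illustrative code, not
the Sorter's actual implementation) of quicksort with the tail call
eliminated: recurse on the smaller partition, then loop on the larger one, so
both sides still get sorted while the stack depth stays O(log n):

  #include <utility>
  #include <vector>

  // Lomuto partition over v[lo..hi]; returns the pivot's final index.
  int Partition(std::vector<int>& v, int lo, int hi) {
    int pivot = v[hi];
    int i = lo;
    for (int j = lo; j < hi; ++j) {
      if (v[j] < pivot) std::swap(v[i++], v[j]);
    }
    std::swap(v[i], v[hi]);
    return i;
  }

  // Sorts v[lo..hi] inclusive.
  void Quicksort(std::vector<int>& v, int lo, int hi) {
    while (lo < hi) {
      int p = Partition(v, lo, hi);
      if (p - lo < hi - p) {
        Quicksort(v, lo, p - 1);  // recurse on the smaller, left side
        lo = p + 1;               // next loop iteration handles the right side
      } else {
        Quicksort(v, p + 1, hi);  // recurse on the smaller, right side
        hi = p - 1;               // next loop iteration handles the left side
      }
    }
  }

Each loop iteration shrinks [lo, hi] to the larger partition, so nothing is
left unsorted; only the recursion pattern changes.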

On Fri, Aug 4, 2017 at 6:12 PM, 俊杰陈 <cjjnj...@gmail.com> wrote:

> Thanks for your detailed description.
>
> My question is specifically about the quicksort part. This line
> <https://github.com/apache/incubator-impala/blob/master/be/src/runtime/sorter.cc#L1258>
> says to recurse on the small partition due to stack considerations, while
> to my understanding quicksort should recurse on both the left and right
> partitions, so I'm curious how it keeps each run sorted - does it sort in a
> later merge sort or somewhere else? The merge process should take sorted
> runs as input.
>
> 2017-08-05 0:18 GMT+08:00 Tim Armstrong <tarmstr...@cloudera.com>:
>
> > The Sorter does a 3-level hybrid sort with merge sort, quicksort and
> > insertion sort.
> >
> > SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
> > arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
> > beginning and end of the sorted run, it will sort the full run. It does
> > quicksort recursively then switches to insertion sort once the partitions
> > are less than INSERTION_THRESHOLD = 16.
> >
> > Sorter also supports an external merge sort - if the full input doesn't
> fit
> > in memory, it sorts in-memory runs with SortHelper() then does merge sort
> > with the sorted runs.
> >
> > On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈 <cjjnj...@gmail.com> wrote:
> >
> > > Hi
> > > I'm looking at Sorter.cc and found that Sorter::SortHelper just sorts
> > > the smaller partition. Is there anything I missed?
> > >
> > > --
> > > Thanks & Best Regards
> > >
> >
>
>
>
> --
> Thanks & Best Regards
>


Re: Impala Sorter just sort small partition?

2017-08-04 Thread Tim Armstrong
The Sorter does a 3-level hybrid sort with merge sort, quicksort and
insertion sort.

SortHelper implements a 2-level hybrid in-memory sort. It fully sorts an
arbitrarily sized in-memory input. E.g. if 'begin' and 'end' point to the
beginning and end of the sorted run, it will sort the full run. It does
quicksort recursively then switches to insertion sort once the partitions
are less than INSERTION_THRESHOLD = 16.

Sorter also supports an external merge sort - if the full input doesn't fit
in memory, it sorts in-memory runs with SortHelper() then does merge sort
with the sorted runs.
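
For a rough picture of that third level, here is a toy sketch of merging k
sorted runs with a min-heap (ints standing in for sorted runs of tuples; in
the real Sorter the runs stream from disk rather than living in vectors):

  #include <functional>
  #include <queue>
  #include <utility>
  #include <vector>

  // Merge k sorted runs into one sorted output using a min-heap keyed on
  // each run's current head element.
  std::vector<int> MergeRuns(const std::vector<std::vector<int> >& runs) {
    typedef std::pair<int, size_t> Item;  // (value, index of source run)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    std::vector<size_t> pos(runs.size(), 0);  // next unread index per run
    for (size_t r = 0; r < runs.size(); ++r) {
      if (!runs[r].empty()) heap.push(Item(runs[r][0], r));
    }
    std::vector<int> out;
    while (!heap.empty()) {
      Item top = heap.top();
      heap.pop();
      out.push_back(top.first);
      size_t r = top.second;
      // Refill the heap from the run that just supplied the minimum.
      if (++pos[r] < runs[r].size()) heap.push(Item(runs[r][pos[r]], r));
    }
    return out;
  }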

On Thu, Aug 3, 2017 at 11:13 PM, 俊杰陈  wrote:

> Hi
> I'm looking at Sorter.cc and found that Sorter::SortHelper just sorts the
> smaller partition. Is there anything I missed?
>
> --
> Thanks & Best Regards
>


Re: problem about buildall.sh

2017-08-02 Thread Tim Armstrong
It looks like the error is:

> -- --> Adding thirdparty library glog. <--
> -- Header files: /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/glog-0.3.4-p2/include
> -- Added shared library dependency glog: /home/hzfengyu/impala/apache-
> impala/incubator-impala/toolchain/glog-0.3.4-p2/lib/libglog.so
> CMake Error at CMakeLists.txt:178 (IMPALA_ADD_THIRDPARTY_LIB):
>   IMPALA_ADD_THIRDPARTY_LIB Function invoked with incorrect arguments for
>   function named: IMPALA_ADD_THIRDPARTY_LIB

Has anyone seen this before?

One thing you can try is deleting glog then rebuilding. E.g.

rm -r toolchain/glog-0.3.4-p2/

On Wed, Aug 2, 2017 at 9:34 PM, yu feng  wrote:

> CMakeOutput.log :
>
>
>
> The system is: Linux - 3.16.0-4-amd64 - x86_64
> Compiling the C compiler identification source file "CMakeCCompilerId.c"
> succeeded.
> Compiler:
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/gcc-4.9.2/bin/gcc
> Build flags:
> Id flags:
>
> The output was:
> 0
>
>
> Compilation of the C compiler identification source "CMakeCCompilerId.c"
> produced "a.out"
>
> The C compiler identification is GNU, found in
> "/home/hzfengyu/impala/apache-impala/incubator-impala/
> CMakeFiles/3.2.3/CompilerIdC/a.out"
>
> Compiling the CXX compiler identification source file
> "CMakeCXXCompilerId.cpp" succeeded.
> Compiler:
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/gcc-4.9.2/bin/g++
> Build flags:
> Id flags:
>
> The output was:
> 0
>
>
> Compilation of the CXX compiler identification source
> "CMakeCXXCompilerId.cpp" produced "a.out"
>
> The CXX compiler identification is GNU, found in
> "/home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/3.2.3/
> CompilerIdCXX/a.out"
>
> Determining if the C compiler works passed with the following output:
> Change Dir:
> /home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/CMakeTmp
>
> Run Build Command:"/usr/bin/make" "cmTryCompileExec2174981562/fast"
> /usr/bin/make -f CMakeFiles/cmTryCompileExec2174981562.dir/build.make
> CMakeFiles/cmTryCompileExec2174981562.dir/build
> make[1]: Entering directory
> '/home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/CMakeTmp'
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/cmake-3.2.3-p1/bin/cmake
> -E cmake_progress_report
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> CMakeFiles/CMakeTmp/CMakeFiles
> 1
> Building C object
> CMakeFiles/cmTryCompileExec2174981562.dir/testCCompiler.c.o
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/gcc-4.9.2/bin/gcc
>-o CMakeFiles/cmTryCompileExec2174981562.dir/testCCompiler.c.o   -c
> /home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/CMakeTmp/
> testCCompiler.c
> Linking C executable cmTryCompileExec2174981562
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/cmake-3.2.3-p1/bin/cmake
> -E cmake_link_script CMakeFiles/cmTryCompileExec2174981562.dir/link.txt
> --verbose=1
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/gcc-4.9.2/bin/gcc
>   CMakeFiles/cmTryCompileExec2174981562.dir/testCCompiler.c.o  -o
> cmTryCompileExec2174981562 -rdynamic
> make[1]: Leaving directory
> '/home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/CMakeTmp'
>
>
> Detecting C compiler ABI info compiled with the following output:
> Change Dir:
> /home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/CMakeTmp
>
> Run Build Command:"/usr/bin/make" "cmTryCompileExec3739174323/fast"
> /usr/bin/make -f CMakeFiles/cmTryCompileExec3739174323.dir/build.make
> CMakeFiles/cmTryCompileExec3739174323.dir/build
> make[1]: Entering directory
> '/home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/CMakeTmp'
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/cmake-3.2.3-p1/bin/cmake
> -E cmake_progress_report
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> CMakeFiles/CMakeTmp/CMakeFiles
> 1
> Building C object
> CMakeFiles/cmTryCompileExec3739174323.dir/CMakeCCompilerABI.c.o
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/gcc-4.9.2/bin/gcc
>-o CMakeFiles/cmTryCompileExec3739174323.dir/CMakeCCompilerABI.c.o   -c
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/cmake-3.2.3-p1/share/cmake-3.2/Modules/CMakeCCompilerABI.c
> Linking C executable cmTryCompileExec3739174323
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/cmake-3.2.3-p1/bin/cmake
> -E cmake_link_script CMakeFiles/cmTryCompileExec3739174323.dir/link.txt
> --verbose=1
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/gcc-4.9.2/bin/gcc
>  -v CMakeFiles/cmTryCompileExec3739174323.dir/CMakeCCompilerABI.c.o
> -o
> cmTryCompileExec3739174323 -rdynamic
> make[1]: Leaving directory
> '/home/hzfengyu/impala/apache-impala/incubator-impala/CMakeFiles/CMakeTmp'
> Using built-in specs.
> COLLECT_GCC=/home/hzfengyu/impala/apache-impala/
> 

Re: material for impala newbie

2017-08-02 Thread Tim Armstrong
There's also a wiki page with some pointers:
https://cwiki.apache.org/confluence/display/IMPALA/Codegen

On Wed, Aug 2, 2017 at 10:05 AM, Henry Robinson  wrote:

> We don't have a lot of in-depth documentation, partly because the
> implementation details change frequently.
>
> Have you read the Impala paper?
> http://cidrdb.org/cidr2015/Papers/CIDR15_Paper28.pdf
> (here's a summary:
> https://blog.acolyer.org/2015/02/05/impala-a-modern-open-
> source-sql-engine-for-hadoop/
> )
>
> There's also an old paper on code generation:
> https://pdfs.semanticscholar.org/bac4/169d6b6f713c76271b5ccf3d452933
> 51f785.pdf
>
> But the very best thing to read is the source code...
>
> On 2 August 2017 at 09:59, 俊杰陈  wrote:
>
> > Hi
> >
> > I'm learning the Impala code now; does anyone have any Impala docs/PPTs
> > covering the computing workflow (such as order by), vectorization, and
> > codegen? Thanks in advance.
> >
> > --
> > Thanks & Best Regards
> >
>


Re: problem about buildall.sh

2017-08-02 Thread Tim Armstrong
Hi,
  I don't see an error in the output you pasted. Maybe it would help to
include the full output of buildall.sh and CMakeOutput.log.

On Wed, Aug 2, 2017 at 5:32 AM, yu feng  wrote:

> Hi, I cloned Impala from
> https://git-wip-us.apache.org/repos/asf/incubator-impala.git and tried to
> run 'bash buildall.sh -noclean -skiptests -build_shared_libs -format'. I
> had finished building impalad two months ago; however, after I pulled the
> newest code, this error happened:
>
>
> -- Found JNI:
> /home/hzfengyu/impala/deploy/jdk1.7.0_79/jre/lib/amd64/
> libjawt.so;/home/hzfengyu/impala/deploy/jdk1.7.0_79/jre/
> lib/amd64/libjsig.so;/home/hzfengyu/impala/deploy/jdk1.7.
> 0_79/jre/lib/amd64/server/libjvm.so
>
> -- --> Adding thirdparty library java_jvm. <--
> -- Header files:
> /home/hzfengyu/impala/deploy/jdk1.7.0_79/include;/home/
> hzfengyu/impala/deploy/jdk1.7.0_79/include/linux;/home/
> hzfengyu/impala/deploy/jdk1.7.0_79/include
> -- Added static library dependency java_jvm:
> /home/hzfengyu/impala/deploy/jdk1.7.0_79/jre/lib/amd64/server/libjvm.so
> -- --> Adding thirdparty library breakpad_client. <--
> -- Header files:
> /home/hzfengyu/impala/apache-impala/incubator-impala/toolchain/breakpad-
> ffe3e478657dc7126fca6329dfcedc49f4c726d9-p2/include/breakpad
> -- Added static library dependency breakpad_client:
> /home/hzfengyu/impala/apache-impala/incubator-impala/toolchain/breakpad-
> ffe3e478657dc7126fca6329dfcedc49f4c726d9-p2/lib/libbreakpad_client.a
> -- Added shared library dependency rt: /usr/lib/x86_64-linux-gnu/librt.so
> -- Added shared library dependency dl: /usr/lib/x86_64-linux-gnu/libdl.so
> Using Thrift compiler:
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/thrift-0.9.0-p9/bin/thrift
> Found output dir:
> /home/hzfengyu/impala/apache-impala/incubator-impala/shell/
> Using FlatBuffers compiler:
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/flatbuffers-1.6.0/bin/flatc
> --java-o/home/hzfengyu/impala/apache-impala/incubator-
> impala/fe/generated-sources/gen-java-b
> --cpp-o/home/hzfengyu/impala/apache-impala/incubator-
> impala/be/generated-sources/gen-cpp-b
> -- Could NOT find Doxygen (missing:  DOXYGEN_EXECUTABLE)
> -- WARNING: Doxygen not found - Docs will not be created
> -- Looking for sched_getcpu
> -- Looking for sched_getcpu - found
> -- Looking for pipe2
> -- Looking for pipe2 - found
> -- Looking for fallocate
> -- Looking for fallocate - found
> -- Looking for preadv
> -- Looking for preadv - found
> -- Looking for include file linux/magic.h
> -- Looking for include file linux/magic.h - found
> -- Compiler Flags:  -Wall -Wno-sign-compare -Wno-unknown-pragmas -pthread
> -fno-strict-aliasing -std=c++14 -Wno-deprecated -Wno-vla
> -DBOOST_DATE_TIME_POSIX_TIME_STD_CONFIG -DBOOST_SYSTEM_NO_DEPRECATED -B
> /home/hzfengyu/impala/apache-impala/incubator-impala/
> toolchain/binutils-2.26.1/bin/
> -fuse-ld=gold -g -Wno-unused-local-typedefs -ggdb -gdwarf-2 -Werror
> -fverbose-asm -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS
> -D__STDC_LIMIT_MACROS
> -- Common
> /home/hzfengyu/impala/apache-impala/incubator-impala/be/build/debug/
> -- Configuring incomplete, errors occurred!
> See also
> "/home/hzfengyu/impala/apache-impala/incubator-impala/
> CMakeFiles/CMakeOutput.log".
> Error in
> /home/hzfengyu/impala/apache-impala/incubator-impala/bin/make_impala.sh at
> line 160: cmake . ${CMAKE_ARGS[@]}
>
>
>
> I tried to find something in CMakeOutput.log but could not find any error.
> Is there something I missed? Thanks a lot.
>


Re: Loading tpc-ds

2017-07-31 Thread Tim Armstrong
with errors
2017-07-31 23:55:38,631 ERROR exec.Task
(SessionState.java:printError(1103)) - Error during job, obtaining
debugging information...
2017-07-31 23:55:38,641 ERROR ql.Driver
(SessionState.java:printError(1103)) - FAILED: Execution Error, return code
2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask
2017-07-31 23:55:38,641 INFO  log.PerfLogger
(PerfLogger.java:PerfLogEnd(168)) - 


On Mon, Jul 31, 2017 at 8:03 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> I saw this on GVO: https://jenkins.impala.io/job/ubuntu-14.04-from-
> scratch/1807/
>
> I haven't pulled out the error from hive.log yet - for some reason that
> log is almost 500MB.
>
> On Thu, Jul 13, 2017 at 3:52 PM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
>> I'm not sure exactly what is going on, but I can confirm that I was able
>> to load data on Ubuntu 16.04 with OpenJDK 8 a while back.
>>
>> On Thu, Jul 13, 2017 at 2:58 PM, Jim Apple <jbap...@cloudera.com> wrote:
>>
>>> I also see this with the Oracle JDK. I have also now checked I am not
>>> running out of memory.
>>>
>>> Oracle JDK7 is harder to get one's hands on, and OpenJDK7 isn't packaged
>>> by
>>> Canonical for Ubuntu 16.04.
>>>
>>> On Wed, Jul 12, 2017 at 11:20 PM, Jim Apple <jbap...@cloudera.com>
>>> wrote:
>>>
>>> > I'm getting data loading errors on Ubuntu 16.04 in TPC-DS. The terminal
>>> > shows:
>>> >
>>> > ERROR : FAILED: Execution Error, return code 2 from
>>> > org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>>> >
>>> > logs/cluster/hive/hive.log shows the error below, which previous bugs
>>> have
>>> > called an issue with the disk being out of space, but my disk has at
>>> least
>>> > 45GB left on it
>>> >
>>> > IMPALA-3246, IMPALA-2856, IMPALA-2617
>>> >
>>> > I see this with openJDK8. I haven't tried Oracle's JDK yet.
>>> >
>>> > Has anyone else seen this and been able to diagnose it as something
>>> that
>>> > doesn't mean a full disk?
>>> >
>>> >
>>> > FATAL ExecReducer (ExecReducer.java:reduce(264)) -
>>> > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
>>> > while processing row (tag=0) {"key":{},"value":{"_col0":
>>> > 48147,"_col1":17805,"_col2":27944,"_col3":606992,"_col4":
>>> > 3193,"_col5":16641,"_col6":10,"_col7":209,"_col8":44757,"_
>>> > col9":20,"_col10":5.51,"_col11":9.36,"_col12":9.17,"_
>>> > col13":0,"_col14":183.4,"_col15":110.2,"_col16":187.2,"_
>>> > col17":3.66,"_col18":0,"_col19":183.4,"_col20":187.06,"
>>> > _col21":73.2,"_col22":2452013}}
>>> > at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(
>>> > ExecReducer.java:253)
>>> > at org.apache.hadoop.mapred.ReduceTask.runOldReducer(
>>> > ReduceTask.java:444)
>>> > at org.apache.hadoop.mapred.Reduc
>>> eTask.run(ReduceTask.java:392)
>>> > at org.apache.hadoop.mapred.LocalJobRunner$Job$
>>> > ReduceTaskRunnable.run(LocalJobRunner.java:346)
>>> > at java.util.concurrent.Executors$RunnableAdapter.
>>> > call(Executors.java:511)
>>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(
>>> > ThreadPoolExecutor.java:1142)
>>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>>> > ThreadPoolExecutor.java:617)
>>> > at java.lang.Thread.run(Thread.java:748)
>>> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>>> > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>>> > /test-warehouse/tpcds.store_sales/.hive-staging_hive_2017-
>>> > 07-12_22-51-18_139_3687815919405186455-760/_task_
>>> > tmp.-ext-1/ss_sold_date_sk=2452013/_tmp.01_0 could only be
>>> > replicated to 0 nodes instead of minReplication (=1).  There are 3
>>> > datanode(s) running and no node(s) are excluded in this operation.
>>> > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.
>>> > chooseTarget4Ne

Re: Loading tpc-ds

2017-07-31 Thread Tim Armstrong
I saw this on GVO:
https://jenkins.impala.io/job/ubuntu-14.04-from-scratch/1807/

I haven't pulled out the error from hive.log yet - for some reason that log
is almost 500MB.

On Thu, Jul 13, 2017 at 3:52 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> I'm not sure exactly what is going on, but I can confirm that I was able
> to load data on Ubuntu 16.04 with OpenJDK 8 a while back.
>
> On Thu, Jul 13, 2017 at 2:58 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> I also see this with the Oracle JDK. I have also now checked I am not
>> running out of memory.
>>
>> Oracle JDK7 is harder to get one's hands on, and OpenJDK7 isn't packaged
>> by
>> Canonical for Ubuntu 16.04.
>>
>> On Wed, Jul 12, 2017 at 11:20 PM, Jim Apple <jbap...@cloudera.com> wrote:
>>
>> > I'm getting data loading errors on Ubuntu 16.04 in TPC-DS. The terminal
>> > shows:
>> >
>> > ERROR : FAILED: Execution Error, return code 2 from
>> > org.apache.hadoop.hive.ql.exec.mr.MapRedTask
>> >
>> > logs/cluster/hive/hive.log shows the error below, which previous bugs
>> have
>> > called an issue with the disk being out of space, but my disk has at
>> least
>> > 45GB left on it
>> >
>> > IMPALA-3246, IMPALA-2856, IMPALA-2617
>> >
>> > I see this with openJDK8. I haven't tried Oracle's JDK yet.
>> >
>> > Has anyone else seen this and been able to diagnose it as something that
>> > doesn't mean a full disk?
>> >
>> >
>> > FATAL ExecReducer (ExecReducer.java:reduce(264)) -
>> > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
>> > while processing row (tag=0) {"key":{},"value":{"_col0":
>> > 48147,"_col1":17805,"_col2":27944,"_col3":606992,"_col4":
>> > 3193,"_col5":16641,"_col6":10,"_col7":209,"_col8":44757,"_
>> > col9":20,"_col10":5.51,"_col11":9.36,"_col12":9.17,"_
>> > col13":0,"_col14":183.4,"_col15":110.2,"_col16":187.2,"_
>> > col17":3.66,"_col18":0,"_col19":183.4,"_col20":187.06,"
>> > _col21":73.2,"_col22":2452013}}
>> > at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(
>> > ExecReducer.java:253)
>> > at org.apache.hadoop.mapred.ReduceTask.runOldReducer(
>> > ReduceTask.java:444)
>> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>> > at org.apache.hadoop.mapred.LocalJobRunner$Job$
>> > ReduceTaskRunnable.run(LocalJobRunner.java:346)
>> > at java.util.concurrent.Executors$RunnableAdapter.
>> > call(Executors.java:511)
>> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>> > at java.util.concurrent.ThreadPoolExecutor.runWorker(
>> > ThreadPoolExecutor.java:1142)
>> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
>> > ThreadPoolExecutor.java:617)
>> > at java.lang.Thread.run(Thread.java:748)
>> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
>> > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
>> > /test-warehouse/tpcds.store_sales/.hive-staging_hive_2017-
>> > 07-12_22-51-18_139_3687815919405186455-760/_task_
>> > tmp.-ext-1/ss_sold_date_sk=2452013/_tmp.01_0 could only be
>> > replicated to 0 nodes instead of minReplication (=1).  There are 3
>> > datanode(s) running and no node(s) are excluded in this operation.
>> > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.
>> > chooseTarget4NewBlock(BlockManager.java:1724)
>> > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
>> > getAdditionalBlock(FSNamesystem.java:3385)
>> > at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
>> > addBlock(NameNodeRpcServer.java:683)
>> > at org.apache.hadoop.hdfs.server.namenode.
>> > AuthorizationProviderProxyClientProtocol.addBlock(
>> > AuthorizationProviderProxyClientProtocol.java:214)
>> > at org.apache.hadoop.hdfs.protocolPB.
>> > ClientNamenodeProtocolServerSideTranslatorPB.addBlock(
>> > ClientNamenodeProtocolServerSideTranslatorPB.java:495)
>> > at org.apache.hadoop.hdfs.protocol.proto.
>> > ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBl
>>

Re: Reminder: "newbie" label on tickets

2017-07-31 Thread Tim Armstrong
Let's also make sure that everything with the "newbie" label is actually
straightforward and has a clear end-goal. Oh, and is reasonably easy to
test.

E.g. adding a built-in function is a good one if the semantics of the
function are clearly documented in the JIRA and there aren't any potential
compatibility issues.
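
For a sense of scale, a well-scoped built-in is often just a small function
against the UDF API plus tests - a hypothetical example (AddOne is made up
for illustration, not an actual JIRA):

  #include <impala_udf/udf.h>

  using namespace impala_udf;

  // Hypothetical built-in: increment an INT by one, with the usual
  // NULL-in/NULL-out semantics.
  IntVal AddOne(FunctionContext* ctx, const IntVal& v) {
    if (v.is_null) return IntVal::null();
    return IntVal(v.val + 1);
  }

The tricky part for new contributors is usually everything around the
function - NULL handling, overflow behavior, compatibility - which is why the
JIRA needs to spell those out.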

We've seen a few new contributors pick up JIRAs with the newbie label that
sounded easy but were actually tricky to get right - that's not a great
experience.



On Sun, Jul 30, 2017 at 1:30 PM, Jim Apple  wrote:

> As a reminder, when you file a ticket, you can label tickets that could be
> completed by a first-time Impala contributor "newbie". This can be a tool
> to help grow the community.
>


Re: IMPALA-5702 - disable shared linking on jenkins?

2017-07-24 Thread Tim Armstrong
I vote for changing Jenkins' linking strategy now and not changing it back
:). Static linking is the blessed configuration so I think we should be
running tests with that primarily.

On Mon, Jul 24, 2017 at 4:34 PM, Henry Robinson  wrote:

> On 24 July 2017 at 13:58, Todd Lipcon  wrote:
>
> > On Mon, Jul 24, 2017 at 1:47 PM, Henry Robinson 
> wrote:
> >
> > > Thanks for the asan pointer - I'll give it a go.
> > >
> > > My understanding of linking isn't deep, but my working theory has been
> > that
> > > the complications have been caused by glog getting linked twice - once
> > > statically (possibly into libkudu.so), and once dynamically (via
> everyone
> > > else).
> > >
> >
> > In libkudu_client.so, we use a linker script to ensure that we don't leak
> > glog/gflags/etc symbols. Those are all listed as 'local' in
> > src/kudu/client/symbols.map. We also have a unit test
> > 'client_symbol-test.sh' which uses nm to dump the list of symbols and
> make
> > sure that they all non-local non-weak symbols are under the 'kudu::'
> > namespace.
> >
> > So it's possible that something's getting linked twice but I'd be
> somewhat
> > surprised if it's from the Kudu client.
> >
> >
> Good to know, thanks.
>
> ASAN hasn't turned up anything yet - so does anyone have an opinion about
> changing Jenkins' linking strategy for now?
>
>
> > -Todd
> >
> >
> > >
> > > I would think that could lead to one or both of the issues you linked
> to.
> > >
> > >
> > > On 24 July 2017 at 13:39, Todd Lipcon  wrote:
> > >
> > > > Is it possible that the issue here is due to a "one definition rule"
> > > > violation? eg something like
> > > > https://github.com/google/sanitizers/wiki/AddressSanitizerOn
> > > > eDefinitionRuleViolation
> > > > Another similar thing is described here:
> > > > https://github.com/google/sanitizers/wiki/AddressSanitizerIn
> > > > itializationOrderFiasco
> > > >
> > > > ASAN with the appropriate flags might help expose if one of the above
> > is
> > > > related.
> > > >
> > > > I wonder whether it is a kind of coincidence that it is fine in a
> > static
> > > > build but causes problems in dynamic, and at some point the static
> link
> > > > order may slightly shift, causing another new subtle bug.
> > > >
> > > >
> > > >
> > > > On Mon, Jul 24, 2017 at 1:22 PM, Henry Robinson 
> > > wrote:
> > > >
> > > > > We've started seeing isolated instances of IMPALA-5702 during GVOs,
> > > > > where a custom cluster test fails by throwing an exception during
> > > > > locale handling.
> > > > >
> > > > > I've been able to reproduce this locally, but only with shared
> > linking
> > > > > enabled (which makes sense since the issue is symptomatic of a
> global
> > > > c'tor
> > > > > not getting called the right number of times).
> > > > >
> > > > > It's probable that my patch for IMPALA-5659 exposed this (since it
> > > > forced a
> > > > > more correct linking strategy for thirdparty libraries when dynamic
> > > > linking
> > > > > was enabled), but it looks to me at first glance like there were
> > latent
> > > > > dynamic linking bugs that we weren't getting hit by. Fixing
> > IMPALA-5702
> > > > > will probably take a while, and I don't think we should hold up
> GVOs
> > or
> > > > put
> > > > > them at risk.
> > > > >
> > > > > So there are two options:
> > > > >
> > > > > 1. Revert IMPALA-5659
> > > > >
> > > > > 2. Switch GVO to static linking
> > > > >
> > > > > IMPALA-5659 is important to commit the kudu util library, which is
> > > needed
> > > > > for the KRPC work. Without it, shared linking doesn't work *at all*
> > > when
> > > > > the kudu util library is committed.
> > > > >
> > > > > Static linking doesn't take much longer in my unscientific
> > > measurements,
> > > > > and is closer to how Impala is actually used. In the interest of
> > > forward
> > > > > progress I'd like to try switching ubuntu-14.04-from-scratch to use
> > > > static
> > > > > linking while I work on IMPALA-5702.
> > > > >
> > > > > What does everyone else think?
> > > > >
> > > > > Henry
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Todd Lipcon
> > > > Software Engineer, Cloudera
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>


Re: Error of impala query empty parquet file

2017-07-21 Thread Tim Armstrong
Hi Yu Feng,
  It looks like we already changed Impala to accept valid files with no row
groups: https://issues.apache.org/jira/browse/IMPALA-3943

That error should only be hit if the file metadata reports that it has rows:

  // IMPALA-3943: Do not throw an error for empty files for backwards compatibility.
  if (file_metadata_.num_rows == 0) return Status::OK();

  // Parse out the created by application version string
  if (file_metadata_.__isset.created_by) {
    file_version_ = ParquetFileVersion(file_metadata_.created_by);
  }
  if (file_metadata_.row_groups.empty()) {
    return Status(
        Substitute("Invalid file. This file: $0 has no row groups", filename()));
  }

On Sun, Jul 16, 2017 at 11:36 PM, yu feng  wrote:

> Hi all,
>
> I always have a query error when I query a parquet table and the table
> has an empty parquet file, which means the file only has footer
> information and does not have any row groups.
>
> I checked the code and found this:
>
>   if (file_metadata_.row_groups.empty()) {
>     return Status(
>         Substitute("Invalid file. This file: $0 has no row groups", filename()));
>   }
>
> I want to modify the logic: if a no-row-group file is found, I want to skip
> the scan range and not return any row batches from the parquet scanner. Is
> it right to do it like this, and do you have any other suggestions about
> this situation?
>
> Thanks a lot
>


Re: What is dictionary filter in Impala?

2017-07-19 Thread Tim Armstrong
Hi,
  The Parquet format supports various encodings that help compress columns
of data with different characteristics. Dictionary encoding is useful if
there are many repeats of the same value in the same column. E.g. if you
have a string column with country names - you might have "Australia",
"USA", "China" repeated many times. If there are <= 40,000 distinct values
a column can be encoded with a dictionary: at the start of the column there
is a dictionary with all of the distinct values, then the data is
represented as integer indexes into the dictionary.

 E.g. if the dictionary was ["Australia", "USA", "China"], then "China"
would be encoded as 2.

Dictionary filtering takes advantage of this to speed up scans. E.g. if I
have a query like "select * from my_table where country = 'Iceland'", then
we can check the dictionary for a Parquet row group before scanning the row
group. If no entries in the dictionary match the condition, then we can
skip the whole row group.
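
A simplified sketch of that check (illustrative only - the real scanner
evaluates the pushed-down conjuncts against the decoded dictionary values
before materializing any rows):

  #include <string>
  #include <vector>

  // Returns true if no dictionary entry satisfies the predicate, in which
  // case the whole row group can be skipped without reading its rows.
  bool CanSkipRowGroup(const std::vector<std::string>& dictionary,
                       bool (*predicate)(const std::string&)) {
    for (size_t i = 0; i < dictionary.size(); ++i) {
      if (predicate(dictionary[i])) return false;  // some row might match
    }
    return true;
  }

  // For "WHERE country = 'Iceland'" and the dictionary
  // ["Australia", "USA", "China"], no entry matches the predicate, so the
  // row group is skipped entirely.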

On Wed, Jul 19, 2017 at 3:22 AM, Wang Chunling 
wrote:

> Hi,
>
> I see there is a dictionary filter in Impala when doing a Parquet scan. The
> comment says a column that is 100% dictionary encoded can be dictionary
> filtered. Can you explain what kind of columns can be dictionary encoded?
> And is there any example of the dictionary filter? Thanks a lot.
>
>
> Chunling


Re: jenkins.impala.io switching to SSL

2017-07-18 Thread Tim Armstrong
Thanks for taking care of this Lars!

On Tue, Jul 18, 2017 at 1:43 PM, Lars Volker  wrote:

> Hi All,
>
> Jenkins has been running with SSL for the past few days and I haven't
> received any complaints. If no-one objects, tomorrow morning (Wednesday,
> PST) I will configure http://jenkins.impala.io:8080/ to redirect to
> https://jenkins.impala.io. From that point on, Jenkins will also post
> links
> to its https endpoint in code reviews.
>
> Let me know if you have any questions or concerns.
>
> Cheers, Lars
>
> On Fri, Jul 14, 2017 at 10:55 PM, Lars Volker  wrote:
>
> > Hi All,
> >
> > our Jenkins instance now has a proper SSL certificate and can be reached
> > at https://jenkins.impala.io. The old redirect from http://j.i.o now
> > points to the SSL endpoint instead of port 8080.
> >
> > If you run into any issues with the SSL setup, please let me know. As a
> > workaround you can still access Jenkins directly at
> > http://jenkins.impala.io:8080/. If no-one reports any issues in the next
> > few days, I will eventually make that URL redirect to SSL, too, so all
> > connections will be secured.
> >
> > Cheers, Lars
> >
>


Re: Re: Impala hadoop variable

2017-07-17 Thread Tim Armstrong
I'm not sure that I fully understand the question.

For the most part there isn't a way to override HADOOP_CONF_DIR - most scripts source
impala-config.sh.

On Sun, Jul 16, 2017 at 8:31 PM, sky  wrote:

> Hi Tim,
> I found that ./bin/create-test-configuration.sh generates the
> ./fe/src/test/resources configurations, and the HADOOP_CONFIG_DIR variable
> also points to this directory. But changing this variable does not take
> effect. Is this hard-coded?


Re: Heads up: buffer pool switchover coming soon

2017-07-14 Thread Tim Armstrong
Also, the old hash join and aggregations will be removed, and the
--enable_partitioned_aggregation=false and
--enable_partitioned_hash_join=false flags will become no-ops.

Another change that may be relevant for development is that the
max_block_mgr_memory query option is replaced with buffer_pool_limit. It
has essentially the same effect for now.

On Fri, Jul 14, 2017 at 12:16 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> Hi All,
>   I have +2s on the main patches to switch query execution over from the
> old BufferedBlockMgr to the new BufferPool. I wanted to let everyone know
> what to expect. I'll do it in a question and answer format.
>
> *When is the merge happening?*
> Once I've done enough testing (stress test, perf, etc) to be confident
> that merging the changes won't disrupt other people's workflow. Probably
> mid-to-late next week.
>
> *Is the new code production ready?*
> No. Although it is already more robust in many ways than the old code. We
> need time to test it and have people play around with it to find any bugs
> or usability problems before we release it.
>
> I also need to finish executing the full test plan
> <https://docs.google.com/document/d/10glhb7KKc_2JeSMQTxb0Zc_l7A-w1IqsmyqcgcJj_Ao/edit?usp=sharing>.
> I'm using IMPALA-3200 <https://issues.apache.org/jira/browse/IMPALA-3200>
> to track completion of the test plan and remaining issues.
>
>
> *Is this the last piece of the memory management/spill-to-disk work?*
> No, this is a big milestone, but there's a lot of other improvements that
> will be unblocked by this. E.g. HDFS scan memory improvements, admission
> control improvements, spill-to-disk performance improvements,
> simplification of memory transfer.
>
> I've tried to link all the JIRAs to IMPALA-3200.
> <https://issues.apache.org/jira/browse/IMPALA-3200>
>
> *How will the change affect production users of Impala?*
> In most circumstances the switch should be non-disruptive. We've put a lot
> of effort into this part of the design and testing.
>
> Query memory requirements will generally decrease, although they may
> increase somewhat in some circumstances because more memory is reserved
> upfront,
> instead of allocated best-effort. We now have more safety-valve tuning
> parameters that give some degree of control over this.
>
> "Memory limit exceeded" will become a lot less frequent - instead the
> query will fail at startup if it cannot get its initial memory reservation.
> IMPALA-4834 <https://issues.apache.org/jira/browse/IMPALA-4834> tracks
> fixing the remaining common cases of "memory limit exceeded"
>
> Spill-to-disk performance may change somewhat because so much code has
> changed, and the default spill buffer size is smaller (2MB vs. 8MB), but in
> my experiments the performance change has been within variance.
>
> In-memory performance of large aggs and joins will improve significantly
> if transparent huge pages are available due to reduced TLB misses.
>
> *How does this affect my development workflow?*
> Hopefully not much at all, unless you are working directly on
> spill-to-disk or memory management code. In the medium to long term this
> switch will unblock a lot of other work. It will also become easier to
> understand and test the spill-to-disk code - we'll catch more bugs in
> functional testing instead of stress testing.
>
> If you are working on adding new operators you should be thinking about
> how they can be made to operate in a memory constraint: IMPALA-4834
> <https://issues.apache.org/jira/browse/IMPALA-4834>
>
> If you see anything weird, check to see if there's a JIRA linked to
> IMPALA-3200, <https://issues.apache.org/jira/browse/IMPALA-3200> or just
> file a bug and assign it to me.
>
> *Can I switch back to the old code with a flag?*
> No, it would have been very difficult to keep both versions of the code
> enabled side-by-side. It would have been very difficult to test the old
> code as well. One of the main goals of the work is to reduce the volume of
> legacy code we have to maintain.
>
> *How can I learn about the new code?*
> I'm working on putting together slides with a high-level summary. The new
> APIs are documented in detail in the headers in be/src/runtime/bufferpool/.
> I'd also recommend looking at explain plans with explain_level=2, which has
> information about memory reservations.
>
> Thanks everyone, please reach out if you have any questions or concerns.
>
> - Tim
> <https://issues.apache.org/jira/browse/IMPALA-3200>
>


Heads up: buffer pool switchover coming soon

2017-07-14 Thread Tim Armstrong
Hi All,
  I have +2s on the main patches to switch query execution over from the
old BufferedBlockMgr to the new BufferPool. I wanted to let everyone know
what to expect. I'll do it in a question and answer format.

*When is the merge happening?*
Once I've done enough testing (stress test, perf, etc) to be confident that
merging the changes won't disrupt other people's workflow. Probably
mid-to-late next week.

*Is the new code production ready?*
No. Although it is already more robust in many ways than the old code. We
need time to test it and have people play around with it to find any bugs
or usability problems before we release it.

I also need to finish executing the full test plan. I'm using IMPALA-3200
to track completion of the test plan and remaining issues.


*Is this the last piece of the memory management/spill-to-disk work?*
No, this is a big milestone, but there's a lot of other improvements that
will be unblocked by this. E.g. HDFS scan memory improvements, admission
control improvements, spill-to-disk performance improvements,
simplification of memory transfer.

I've tried to link all the JIRAs to IMPALA-3200.


*How will the change affect production users of Impala?*
In most circumstances the switch should be non-disruptive. We've put a lot
of effort into this part of the design and testing.

Query memory requirements will generally decrease, although they may
increase somewhat in some circumstances because more memory is reserved upfront,
instead of allocated best-effort. We now have more safety-valve tuning
parameters that give some degree of control over this.

"Memory limit exceeded" will become a lot less frequent - instead the query
will fail at startup if it cannot get its initial memory reservation.
IMPALA-4834 tracks
fixing the remaining common cases of "memory limit exceeded"

Spill-to-disk performance may change somewhat because so much code has
changed, and the default spill buffer size is smaller (2MB vs. 8MB), but in
my experiments the performance change has been within variance.

In-memory performance of large aggs and joins will improve significantly if
transparent huge pages are available due to reduced TLB misses.

*How does this affect my development workflow?*
Hopefully not much at all, unless you are working directly on spill-to-disk
or memory management code. In the medium to long term this switch will
unblock a lot of other work. It will also become easier to understand and
test the spill-to-disk code - we'll catch more bugs in functional testing
instead of stress testing.

If you are working on adding new operators you should be thinking about how
they can be made to operate within a memory constraint: IMPALA-4834


If you see anything weird, check to see if there's a JIRA linked to
IMPALA-3200, or just
file a bug and assign it to me.

*Can I switch back to the old code with a flag?*
No, it would have been very difficult to keep both versions of the code
enabled side-by-side. It would have been very difficult to test the old
code as well. One of the main goals of the work is to reduce the volume of
legacy code we have to maintain.

*How can I learn about the new code?*
I'm working on putting together slides with a high-level summary. The new
APIs are documented in detail in the headers in be/src/runtime/bufferpool/.
I'd also recommend looking at explain plans with explain_level=2, which has
information about memory reservations.

Thanks everyone, please reach out if you have any questions or concerns.

- Tim



Re: Impala build issue

2017-07-14 Thread Tim Armstrong
Hi Sky,
 I'm not sure why that isn't working for you - it seems to work ok for me
on my machine. Maybe a maven expert knows how to change timeouts?

- Tim

On Thu, Jul 13, 2017 at 8:30 PM, sky  wrote:

> Hi all,
> I use this command (./buildall.sh -fe_only -notests -so) to compile
> Impala; the fe module fails with an error (Connection timed out). But on
> the host, using the "telnet" command to connect to
> repository.cloudera.com:443 succeeds. Why?
> The following is the detailed error:
> Scanning dependencies of target fe
> 
> Running mvn install -DskipTests
> Directory: /home/dreambase/dreambase_deploy/fe
> 
> [WARNING] Could not transfer metadata 
> com.cloudera.cdh:cdh-root:5.13.0-SNAPSHOT/maven-metadata.xml
> from/to ${distMgmtSnapshotsId} (${distMgmtSnapshotsUrl}): Cannot access
> ${distMgmtSnapshotsUrl} with type default using the available connector
> factories: BasicRepositoryConnectorFactory
>
>
> [WARNING] Could not transfer metadata org.apache.sentry:sentry-bindi
> ng-hive-conf:1.5.1-cdh5.13.0-SNAPSHOT/maven-metadata.xml from/to
> cdh.snapshots.repo (https://repository.cloudera.c
> om/content/repositories/snapshots): Connect to repository.cloudera.com:443
> [repository.cloudera.com/34.197.206.76, repository.cloudera.com/34.201
> .234.30] failed: Connection timed out
> [WARNING] Failure to transfer org.apache.sentry:sentry-bindi
> ng-hive-conf:1.5.1-cdh5.13.0-SNAPSHOT/maven-metadata.xml from
> https://repository.cloudera.com/content/repositories/snapshots was cached
> in the local repository, resolution will not be reattempted until the
> update interval of cdh.snapshots.repo has elapsed or updates are forced.
> Original error: Could not transfer metadata org.apache.sentry:sentry-bindi
> ng-hive-conf:1.5.1-cdh5.13.0-SNAPSHOT/maven-metadata.xml from/to
> cdh.snapshots.repo (https://repository.cloudera.c
> om/content/repositories/snapshots): Connect to repository.cloudera.com:443
> [repository.cloudera.com/34.197.206.76, repository.cloudera.com/34.201
> .234.30] failed: Connection timed out
> [WARNING] Could not transfer metadata org.apache.sentry:sentry-bindi
> ng:1.5.1-cdh5.13.0-SNAPSHOT/maven-metadata.xml from/to
> cdh.rcs.releases.repo (https://repository.cloudera.c
> om/content/groups/cdh-releases-rcs): Connect to
> repository.cloudera.com:443 [repository.cloudera.com/34.201.234.30,
> repository.cloudera.com/34.197.206.76] failed: Connection timed out


Re: Impala hadoop variable

2017-07-14 Thread Tim Armstrong
You need to run ./bin/create-test-configuration.sh to generate the test
configurations. That's automatically run by buildall.sh in some
circumstances but if you clean your workspace the configs get deleted. I
often run the following commands to clean my workspace:

./bin/clean.sh && ./bin/create-test-configuration.sh

On Thu, Jul 13, 2017 at 11:17 PM, sky  wrote:

> Hi all,
> When I used the command "./bin/start-impala-cluster.py" to start
> the impala cluster, it reported the following error:
>  impalad.INFO:
>   I0714 13:37:13.292771  9363 status.cc:122] Currently
> configured default filesystem: LocalFileSystem. fs.defaultFS (file:///) is
> no
> t supported.
>
>
> The HADOOP_CONF_DIR and DEFAULT_FS variables in
> ./bin/impala-config.sh are set correctly (hdfs://). The same is true for
> the hadoop configuration file.
> Executing ./bin/start-impala-cluster.py picks up the correct
> "hdfs://" configuration only from the hadoop configuration directory, not
> from any other directory. Why? How does the variable take effect?
>
>


Re: Loading tpc-ds

2017-07-13 Thread Tim Armstrong
I'm not sure exactly what is going on, but I can confirm that I was able to
load data on Ubuntu 16.04 with OpenJDK 8 a while back.

On Thu, Jul 13, 2017 at 2:58 PM, Jim Apple  wrote:

> I also see this with the Oracle JDK. I have also now checked I am not
> running out of memory.
>
> Oracle JDK7 is harder to get one's hands on, and OpenJDK7 isn't packaged by
> Canonical for Ubuntu 16.04.
>
> On Wed, Jul 12, 2017 at 11:20 PM, Jim Apple  wrote:
>
> > I'm getting data loading errors on Ubuntu 16.04 in TPC-DS. The terminal
> > shows:
> >
> > ERROR : FAILED: Execution Error, return code 2 from
> > org.apache.hadoop.hive.ql.exec.mr.MapRedTask
> >
> > logs/cluster/hive/hive.log shows the error below, which previous bugs
> have
> > called an issue with the disk being out of space, but my disk has at
> least
> > 45GB left on it
> >
> > IMPALA-3246, IMPALA-2856, IMPALA-2617
> >
> > I see this with openJDK8. I haven't tried Oracle's JDK yet.
> >
> > Has anyone else seen this and been able to diagnose it as something that
> > doesn't mean a full disk?
> >
> >
> > FATAL ExecReducer (ExecReducer.java:reduce(264)) -
> > org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error
> > while processing row (tag=0) {"key":{},"value":{"_col0":
> > 48147,"_col1":17805,"_col2":27944,"_col3":606992,"_col4":
> > 3193,"_col5":16641,"_col6":10,"_col7":209,"_col8":44757,"_
> > col9":20,"_col10":5.51,"_col11":9.36,"_col12":9.17,"_
> > col13":0,"_col14":183.4,"_col15":110.2,"_col16":187.2,"_
> > col17":3.66,"_col18":0,"_col19":183.4,"_col20":187.06,"
> > _col21":73.2,"_col22":2452013}}
> > at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(
> > ExecReducer.java:253)
> > at org.apache.hadoop.mapred.ReduceTask.runOldReducer(
> > ReduceTask.java:444)
> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> > at org.apache.hadoop.mapred.LocalJobRunner$Job$
> > ReduceTaskRunnable.run(LocalJobRunner.java:346)
> > at java.util.concurrent.Executors$RunnableAdapter.
> > call(Executors.java:511)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> > at java.util.concurrent.ThreadPoolExecutor.runWorker(
> > ThreadPoolExecutor.java:1142)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:748)
> > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException:
> > org.apache.hadoop.ipc.RemoteException(java.io.IOException): File
> > /test-warehouse/tpcds.store_sales/.hive-staging_hive_2017-
> > 07-12_22-51-18_139_3687815919405186455-760/_task_
> > tmp.-ext-1/ss_sold_date_sk=2452013/_tmp.01_0 could only be
> > replicated to 0 nodes instead of minReplication (=1).  There are 3
> > datanode(s) running and no node(s) are excluded in this operation.
> > at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.
> > chooseTarget4NewBlock(BlockManager.java:1724)
> > at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.
> > getAdditionalBlock(FSNamesystem.java:3385)
> > at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.
> > addBlock(NameNodeRpcServer.java:683)
> > at org.apache.hadoop.hdfs.server.namenode.
> > AuthorizationProviderProxyClientProtocol.addBlock(
> > AuthorizationProviderProxyClientProtocol.java:214)
> > at org.apache.hadoop.hdfs.protocolPB.
> > ClientNamenodeProtocolServerSideTranslatorPB.addBlock(
> > ClientNamenodeProtocolServerSideTranslatorPB.java:495)
> > at org.apache.hadoop.hdfs.protocol.proto.
> > ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.
> callBlockingMethod(
> > ClientNamenodeProtocolProtos.java)
> > at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$
> > ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
> > at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
> > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2217)
> > at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2213)
> > at java.security.AccessController.doPrivileged(Native Method)
> > at javax.security.auth.Subject.doAs(Subject.java:422)
> > at org.apache.hadoop.security.UserGroupInformation.doAs(
> > UserGroupInformation.java:1917)
> > at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2211)
> >
> > at org.apache.hadoop.hive.ql.exec.FileSinkOperator.
> > processOp(FileSinkOperator.java:751)
> > at org.apache.hadoop.hive.ql.exec.Operator.forward(
> > Operator.java:815)
> > at org.apache.hadoop.hive.ql.exec.SelectOperator.processOp(
> > SelectOperator.java:84)
> > at org.apache.hadoop.hive.ql.exec.mr.ExecReducer.reduce(
> > ExecReducer.java:244)
> >
>


Re: impala debug

2017-07-13 Thread Tim Armstrong
Removing the user list...

The Oracle docs say "The -Xcheck:jni option causes the VM to do additional
validation on the arguments passed to JNI functions".

I guess that means there's a bug in how we're using JNI?
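
If the goal is mainly to attach the debugger, it may be worth retrying
without the extra validation (a guess on my part, untested):

JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,address=localhost:9009,server=y,suspend=y"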

On Thu, Jul 13, 2017 at 5:44 AM, Wang Lei(北京研发中心.112456) <
alaleiw...@sohu-inc.com> wrote:

> Hi all:
>   I followed https://cwiki.apache.org/confluence/display/IMPALA/
> Impala+Debugging+Tips to debug impalad fe
>   My step:
>  JAVA_TOOL_OPTIONS="-agentlib:jdwp=transport=dt_socket,
> address=localhost:9009,server=y,suspend=y -Xcheck:jni"
>
>   When using idea attaching,impalad process exited,with following error
> msg:
> I0713 16:49:02.511322 32715 JniFrontend.java:637] checkConfiguration
> called.
> I0713 16:49:02.538199 32715 JniFrontend.java:638] fs.defaultFS=hdfs://
> nameservice1
> I0713 16:49:02.538336 32715 JniFrontend.java:639]
> dfs.nameservices=nameservice1
> I0713 16:49:02.538432 32715 JniFrontend.java:640] foo=null
> I0713 16:49:05.230675 32715 JniFrontend.java:679] Short-circuit reads are
> not enabled.
> I0713 16:49:05.231148 32715 tmp-file-mgr.cc:109] Using scratch directory
> /opt/impala/impalad/impala-scratch on disk 1
> I0713 16:49:05.231283 32715 simple-logger.cc:83] Logging to:
> /tmp/profiles//impala_profile_log_1.1-1499935745231
> I0713 16:49:05.233969 32715 impala-server.cc:492] Event logging is disabled
> I0713 16:49:05.234060 32715 simple-logger.cc:83] Logging to:
> /opt/var/log/impalad/lineage/impala_lineage_log_1.0-1499935745234
> I0713 16:49:05.284077#
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (javaCalls.cpp:550), pid=32715, tid=140494292248832
> #  guarantee(method->size_of_parameters() == size_of_parameters())
> failed: wrong no. of arguments pushed
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
> 1.7.0_67-b01)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
> linux-amd64 compressed oops)
> # Failed to write core dump. Core dumps have been disabled. To enable core
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /run/cloudera-scm-agent/process/5532-impala-IMPALAD/
> impala-conf/hs_err_pid32715.log
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> #
>  32715 impala-server.cc:1856] Impala Beeswax Service listening on 21000
> I0713 16:49:05.285711 32715 impala-server.cc:1878] Impala HiveServer2
> Service listening on 21050
> I0713 16:49:05.287331 32715 impala-server.cc:1897] ImpalaInternalService
> listening on 22000
> I0713 16:49:05.290220 32715 thrift-server.cc:449] ThriftServer 'backend'
> started on port: 22000
> I0713 16:49:05.504369 32715 thrift-server.cc:449] ThriftServer
> 'beeswax-frontend' started on port: 21000
> I0713 16:49:05.878450 32715 thrift-server.cc:449] ThriftServer
> 'hiveserver2-frontend' started on port: 21050
> I0713 16:49:05.878495 32715 exec-env.cc:241] Starting global services
>
> And /run/cloudera-scm-agent/process/5532-impala-IMPALAD/
> impala-conf/hs_err_pid32715.log showed:
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  Internal Error (javaCalls.cpp:550), pid=32715, tid=140494292248832
> #  guarantee(method->size_of_parameters() == size_of_parameters())
> failed: wrong no. of arguments pushed
> #
> # JRE version: Java(TM) SE Runtime Environment (7.0_67-b01) (build
> 1.7.0_67-b01)
> # Java VM: Java HotSpot(TM) 64-Bit Server VM (24.65-b04 mixed mode
> linux-amd64 compressed oops)
> # Failed to write core dump. Core dumps have been disabled. To enable core
> dumping, try "ulimit -c unlimited" before starting Java again
> #
> # If you would like to submit a bug report, please visit:
> #   http://bugreport.sun.com/bugreport/crash.jsp
> #
>
> ---  T H R E A D  ---
>
> Current thread (0x0662e000):  JavaThread "main" [_thread_in_vm,
> id=32715, stack(0x7ffe33568000,0x7ffe33668000)]
>
> Stack: [0x7ffe33568000,0x7ffe33668000],  sp=0x7ffe33660020,
> free space=992k
> Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native
> code)
> V  [libjvm.so+0x99eb8a]  VMError::report_and_die()+0x2ea
> V  [libjvm.so+0x497282]  report_vm_error(char const*, int, char const*,
> char const*)+0x62
> V  [libjvm.so+0x5ff69a]  JavaCallArguments::verify(methodHandle,
> BasicType, Thread*)+0x5a
> V  [libjvm.so+0x5ff9ed]  JavaCalls::call_helper(JavaValue*,
> methodHandle*, JavaCallArguments*, Thread*)+0x1ed
> V  [libjvm.so+0x5fe5c8]  JavaCalls::call(JavaValue*, methodHandle,
> JavaCallArguments*, Thread*)+0x28
> V  [libjvm.so+0x638dd4]  jni_invoke_nonstatic(JNIEnv_*, JavaValue*,
> _jobject*, JNICallType, _jmethodID*, JNI_ArgumentPusher*, Thread*)+0x2b4
> V  [libjvm.so+0x649ab9]  jni_CallObjectMethodV+0xe9
> V  [libjvm.so+0x67293e]  checked_jni_CallObjectMethodV+0x15e
> C  [impalad+0xdb9799]  

Re: Can't start minicluster

2017-07-09 Thread Tim Armstrong
Maybe the thrift be/generated-sources are out of sync with the source code?

We had some kind of metastore schema upgrade that caused the other one.
Dimitris' instructions to fix them were:

> To fix this without doing a full data reload, you can use the following
> command:
> ${IMPALA_TOOLCHAIN}/cdh_components/hive-1.1.0-cdh5.13.0-SNAPSHOT/bin/schematool \
>   -upgradeSchema -dbType {type}
> where type is one of 'postgres' or 'mysql', depending on your setup.
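
For example, for a local postgres metastore (the default dev setup) that
would be something like:

${IMPALA_TOOLCHAIN}/cdh_components/hive-1.1.0-cdh5.13.0-SNAPSHOT/bin/schematool \
  -upgradeSchema -dbType postgres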

On Sun, Jul 9, 2017 at 3:52 PM, Jim Apple  wrote:

> I am getting the following message in FATAL when I try to start a
> minicluster
>
> Check failed: _TImpalaQueryOptions_VALUES_TO_NAMES.size() ==
> TImpalaQueryOptions::DEFAULT_JOIN_DISTRIBUTION_MODE + 1 (57 vs. 56)
>
> Any ideas what is going on? I was actually trying to buildall.sh
> -format_metastore -format_sentry_policy_db because I was seeing messages
> like the following (in hive.log) when I tried to start the minicluster:
>
>  org.postgresql.util.PSQLException: ERROR: column A0.SCHEMA_VERSION_V2
> does
> not exist
>


Re: encounter errors when doing a fresh build

2017-07-05 Thread Tim Armstrong
Yeah that step is actually meant to use the system python - it's meant to
build a standalone tarball for impala-shell that you can use with the
system python. Maybe we shouldn't have it as part of buildall.sh for the
development process... most people don't need it.

On Wed, Jul 5, 2017 at 1:58 AM, Amos Bird  wrote:

>
> Ah, fixed it by updating setuptools and packaging.
>
> sudo pip install --upgrade setuptools
> sudo pip install --upgrade packaging
>
>
> Amos Bird  writes:
>
> > Dear Impala community,
> >
> > I just tried out the head commit
> >
> > ```
> > commit 9f678a74269250bf5c7ae2c5e8afd93c5b3734de (HEAD -> master,
> origin/master, origin/HEAD)
> > Author: Alex Behm 
> > Date:   Tue Jun 6 16:54:41 2017 -0700
> > ```
> >
> > in a centos 7 box.
> >
> > I did `./buildall.sh -notests -noclean -build_shared_libs` then got
> stuck at this following error:
> >
> > ```
> > # Generated version information from save-version.sh
> > VERSION: 2.10.0-SNAPSHOT
> > GIT_HASH: 9f678a74269250bf5c7ae2c5e8afd93c5b3734de
> > BUILD_TIME: Tue Jul  4 20:20:29 CST 2017
> > Deleting all files in /home/amos/softwares/impala/
> shell/build/impala-shell-2.10.0-SNAPSHOT/{gen-py,lib,ext-py}
> > Building all external modules into eggs
> > Cleaning up old build artifacts.
> > Creating an egg for /home/amos/softwares/impala/
> shell/ext-py/prettytable-0.7.1
> > Traceback (most recent call last):
> >   File "setup.py", line 2, in 
> > from setuptools import setup
> >   File "/usr/lib/python2.7/site-packages/setuptools/__init__.py", line
> 2, in 
> > from setuptools.extension import Extension, Library
> >   File "/usr/lib/python2.7/site-packages/setuptools/extension.py", line
> 5, in 
> > from setuptools.dist import _get_unpatched
> >   File "/usr/lib/python2.7/site-packages/setuptools/dist.py", line 7,
> in 
> > from setuptools.command.install import install
> >   File "/usr/lib/python2.7/site-packages/setuptools/command/__init__.py",
> line 8, in 
> > from setuptools.command import install_scripts
> >   File 
> > "/usr/lib/python2.7/site-packages/setuptools/command/install_scripts.py",
> line 3, in 
> > from pkg_resources import Distribution, PathMetadata,
> ensure_directory
> >   File "/usr/lib/python2.7/site-packages/pkg_resources/__init__.py",
> line 72, in 
> > import packaging.requirements
> >   File "/usr/lib/python2.7/site-packages/packaging/requirements.py",
> line 59, in 
> > MARKER_EXPR = originalTextFor(MARKER_EXPR())("marker")
> > TypeError: __call__() takes exactly 2 arguments (1 given)
> > Error in shell/make_shell_tarball.sh at line 96: python setup.py -q
> bdist_egg clean
> > make[3]: *** [CMakeFiles/shell_tarball] Error 1
> > make[2]: *** [CMakeFiles/shell_tarball.dir/all] Error 2
> > make[1]: *** [CMakeFiles/tarballs.dir/rule] Error 2
> > make: *** [tarballs] Error 2
> > Error in /home/amos/softwares/impala/bin/make_impala.sh at line 179:
> ${MAKE_CMD} ${MAKE_ARGS} ${MAKE_TARGETS}
> > ```
> >
> > It seems the script is trying to use system's python. Is that normal?
> >
> > regards,
> > Amos
>
>


Re: Re: Impala make install

2017-06-21 Thread Tim Armstrong
***** THIS IS NOT AN OFFICIALLY SUPPORTED DEPLOYMENT METHOD *****
I believe the artifacts that are actually used at runtime are impalad,
catalogd, statestored, libfesupport.so, impala-frontend-0.1-SNAPSHOT.jar,
impala-data-source-api-1.0-SNAPSHOT.jar, libstdc++.so*, and libkudu_client.so*.

You may be able to get something to work if you replace the versions
currently on the cluster with the output from your Impala build. This is a
purely experimental method of deployment and may or may not work as
expected. If your cluster catches on fire, don't blame me.
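
As a purely illustrative sketch (every path below is an assumption - check
where your build puts its output and where your cluster keeps the binaries
before copying anything):

# stop the Impala roles first, then overwrite the deployed artifacts
for f in impalad catalogd statestored libfesupport.so; do
  sudo cp "$IMPALA_HOME/be/build/latest/service/$f" /opt/cloudera/parcels/CDH/lib/impala/sbin/
done
sudo cp "$IMPALA_HOME/fe/target/impala-frontend-0.1-SNAPSHOT.jar" \
  /opt/cloudera/parcels/CDH/lib/impala/lib/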


On Wed, Jun 21, 2017 at 4:03 PM, 孙清孟 <sqm2...@gmail.com> wrote:

> Hi Jeszy,
>    Sorry that I didn't make it clear, I've already installed the impala
> service with CM.
>    I made some modifications in Impala, and built it successfully.
>    Now I want to replace the impala installed by CM.
>
> 2017-06-21 19:29 GMT+08:00 Jeszy <jes...@gmail.com>:
>
> > With CM you can just add Impala as a new service to your cluster. Use
> > the dropdown next to the cluster name. A binary version of impala is
> > shipped as part of the CDH 5.11 parcels that you have installed.
> >
> > On 21 June 2017 at 11:58, 孙清孟 <sqm2...@gmail.com> wrote:
> > > Hi Tim,
> > >   I've built Impala according to the describe here:
> > > https://cwiki.apache.org/confluence/display/IMPALA/Building+Impala
> > >   But how can I install the Impala to an already running
> > cdh-5.11.0-release
> > > cluster that managered by Cloudera Manager.
> > >   Build Debian packages and use `apt-get`?
> > >
> > > 2017-06-21 11:16 GMT+08:00 Henry Robinson <he...@apache.org>:
> > >
> > >> I don't think there's any plan for this work. The CMake documentation
> > would
> > >> be where I'd start looking for ideas:
> > >>
> > >> https://cmake.org/cmake/help/v3.2/command/install.html
> > >>
> > >> Best,
> > >> Henry
> > >>
> > >> On 20 June 2017 at 18:31, sky <x_h...@163.com> wrote:
> > >>
> > >> > Hi Tim,
> > >> >Is there a plan for this work? Could you provide a manual copy of
> > the
> > >> > example?Thanks.
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > At 2017-06-21 01:41:33, "Tim Armstrong" <tarmstr...@cloudera.com>
> > wrote:
> > >> > >Hi Sky,
> > >> > >  We have not implemented an install target yet - for deployment we
> > rely
> > >> > on
> > >> > >copying out the artifacts manually. I believe CMake has some
> support
> > for
> > >> > >implementing install targets but nobody has picked up that work
> yet.
> > >> > >
> > >> > >- Tim
> > >> > >
> > >> > >On Mon, Jun 19, 2017 at 8:45 PM, sky <x_h...@163.com> wrote:
> > >> > >
> > >> > >> Hi all,
> > >> > >> I am using cdh5.11.1-release. The compilation command is provided
> > >> > >> in the documentation (./buildall.sh -notests -so), but there is no
> > >> > >> command similar to 'make install'. The compiled output has a deep
> > >> > >> directory structure with many files that are not needed at runtime.
> > >> > >> Could you provide an "install" command that extracts the compiled
> > >> > >> files to other directories for easy management?
> > >> >
> > >>
> >
>


Re: Impala make install

2017-06-20 Thread Tim Armstrong
Hi Sky,
  We have not implemented an install target yet - for deployment we rely on
copying out the artifacts manually. I believe CMake has some support for
implementing install targets but nobody has picked up that work yet.

- Tim

On Mon, Jun 19, 2017 at 8:45 PM, sky  wrote:

> Hi all,
> I am using cdh5.11.1-release. The compilation command is provided in
> the documentation (./buildall.sh -notests -so), but there is no command
> similar to 'make install'. The compiled output has a deep directory
> structure with many files that are not needed at runtime. Could you
> provide an "install" command that extracts the compiled files to other
> directories for easy management?


Re: Broken build from Sentry

2017-06-20 Thread Tim Armstrong
Yeah my change should build against either version by design.

On 20 Jun. 2017 9:21 am, "Henry Robinson" <he...@apache.org> wrote:

> Yes, I did. AFAICT it worked fine.
>
> On 20 June 2017 at 09:19, Alexander Behm <alex.b...@cloudera.com> wrote:
>
> > Henry, did you try the revert on top of Tim's already-checked-in change?
> >
> > On Tue, Jun 20, 2017 at 9:18 AM, Alexander Behm <alex.b...@cloudera.com>
> > wrote:
> >
> > > Let's revert the version to buy us some time. That solution is a
> ticking
> > > time bomb though since that version will disappear soon.
> > >
> > > On Tue, Jun 20, 2017 at 8:56 AM, Henry Robinson <he...@apache.org>
> > wrote:
> > >
> > >> I was able to run a build with EE and FE tests with Sentry reverted to
> > >> 5.12
> > >> - unless there are objections I'm going to post a patch to revert the
> > >> version bump.
> > >>
> > >> On 20 June 2017 at 06:53, Thomas Tauber-Marshall <
> > tmarsh...@cloudera.com>
> > >> wrote:
> > >>
> > >> > So we've had a successful run of the nightlies now, and I've
> uploaded
> > >> the
> > >> > new jars to the s3 bucket, but Sentry still fails for some reason.
> > >> >
> > >> > I filed: https://issues.apache.org/jira/browse/IMPALA-5540 to track
> > >> this
> > >> >
> > >> > On Tue, Jun 20, 2017 at 1:25 AM Alexander Kolbasov <
> > ak...@cloudera.com>
> > >> > wrote:
> > >> >
> > >> > > Note that Apache upstream story is more complicated - there was a
> > >> change
> > >> > > done upstream that refactored a bunch of Sentry code that will
> cause
> > >> > > similar issue (I think it is SENTRY-1205). The change is present
> in
> > >> > Sentry
> > >> > > master but not in upstream sentry HA branch.
> > >> > >
> > >> > > On Mon, Jun 19, 2017 at 11:02 PM, Dimitris Tsirogiannis <
> > >> > > dtsirogian...@cloudera.com> wrote:
> > >> > >
> > >> > > > +Sasha, who I believe has more up-to-date information on this.
> > >> > > >
> > >> > > > On Mon, Jun 19, 2017 at 10:56 PM, Henry Robinson <
> > he...@apache.org>
> > >> > > wrote:
> > >> > > >
> > >> > > >> FWIW, I've been able to start Sentry by setting:
> > >> > > >>
> > >> > > >> export IMPALA_SENTRY_VERSION=1.5.1-cdh5.12.0-SNAPSHOT
> > >> > > >>
> > >> > > >> (i.e. rolling back to the previous version of Sentry). I
> haven't
> > >> yet
> > >> > > tried
> > >> > > >> to run tests - does anyone know an ETA for a fix coming out of
> > >> > Cloudera
> > >> > > >> for
> > >> > > >> the 5.13-SNAPSHOT? If it might be a while, we should consider
> > >> > regressing
> > >> > > >> the Sentry version to unblock checkins.
> > >> > > >>
> > >> > > >> On 19 June 2017 at 15:31, Tim Armstrong <
> tarmstr...@cloudera.com
> > >
> > >> > > wrote:
> > >> > > >>
> > >> > > >> > It's unfortunately not that simple. The API change has been
> in
> > >> > Apache
> > >> > > >> > sentry
> > >> > > >> >
> > >> > > >> > So rolling back the API change temporarily solves the problem
> > for
> > >> > > >> Cloudera,
> > >> > > >> > but we're going to have to deal with it at some point and get
> > >> Impala
> > >> > > >> > building against both versions of the API.
> > >> > > >> >
> > >> > > >> > On Mon, Jun 19, 2017 at 2:55 PM, Thomas Tauber-Marshall <
> > >> > > >> > tmarsh...@cloudera.com> wrote:
> > >> > > >> >
> > >> > > >> > > Yes, the Sentry team has been contacted and they're going
> to
> > be
> > >> > > >> rolling
> > >> > > >> > it
> > >> > > >> > > back.
> > >> > > >> > >

Re: Broken build from Sentry

2017-06-19 Thread Tim Armstrong
It's unfortunately not that simple. The API change has already landed in
Apache Sentry.

So rolling back the API change temporarily solves the problem for Cloudera,
but we're going to have to deal with it at some point and get Impala
building against both versions of the API.

On Mon, Jun 19, 2017 at 2:55 PM, Thomas Tauber-Marshall <
tmarsh...@cloudera.com> wrote:

> Yes, the Sentry team has been contacted and they're going to be rolling it
> back.
>
> On Mon, Jun 19, 2017 at 4:53 PM Todd Lipcon <t...@cloudera.com> wrote:
>
> > Quick question from a bystander: it seems like Sentry committed an
> > API-incompatible change. Instead of fixing on the Impala side, should the
> > Sentry project be notified that they may want to roll back such a change?
> > It seems like an error on their part to do such a thing within a minor
> > version.
> >
> > On Mon, Jun 19, 2017 at 1:56 PM, Thomas Tauber-Marshall <
> > tmarsh...@cloudera.com> wrote:
> >
> > > I'm working on getting the s3 jars updated, which presumably will fix
> > that.
> > >
> > > The problem (to my understanding) is that the nightlies haven't passed
> > > since the change went into Sentry and so the Jenkins job that normally
> > > produces the new jars is still pulling in old bits.
> > >
> > > I've been talking with releng and they expect the new jars to be
> > available
> > > later today.
> > >
> > > On Mon, Jun 19, 2017 at 3:48 PM Tim Armstrong <tarmstr...@cloudera.com
> >
> > > wrote:
> > >
> > > > Looks like the build still breaks when starting up sentry after my
> fix:
> > > >
> > > >
> > http://jenkins.impala.io:8080/job/ubuntu-14.04-from-scratch/1547/console
> > > >
> > > > *20:08:54*  --> Starting the Sentry Policy Server*20:08:59* Error in
> > > > /home/ubuntu/Impala/testdata/bin/run-all.sh at line 58:
> > > > $IMPALA_HOME/testdata/bin/run-sentry-service.sh > \*20:08:59* +
> > > > onexit*20:08:59* + df -m*20:08:59* Filesystem 1M-blocks  Used
> > > > Available Use% Mounted on*20:08:59* udev   15070 1
> > > > 15070   1% /dev*20:08:59* tmpfs   3015 1  3015
> > > > 1% /run*20:08:59* /dev/xvda1161129 22275132204  15%
> > > > /*20:08:59* none   1 0 1   0%
> > > > /sys/fs/cgroup*20:08:59* none   5 0 5
>  0%
> > > > /run/lock*20:08:59* none   15075 1 15075   1%
> > > > /run/shm*20:08:59* none 100 0   100   0%
> > > > /run/user*20:08:59* + free -m*20:08:59*  total   used
> > > >  free sharedbuffers cached*20:08:59* Mem:
> > > > 30148  19597  10550 11 91
> 14323*20:08:59*
> > > > -/+ buffers/cache:   5182  24965*20:08:59* Swap:0
> > > > 0  0*20:08:59* + uptime -p*20:08:59* up 45
> > > > minutes*20:08:59* + rm -rf /home/ubuntu/Impala/logs_static*20:08:59*
> +
> > > > mkdir -p /home/ubuntu/Impala/logs_static*20:08:59* + cp -r -L
> > > > /home/ubuntu/Impala/logs /home/ubuntu/Impala/logs_static*20:08:59*
> > > > Build step 'Execute shell' marked build as failure*20:08:59* Set
> build
> > > > name.*20:08:59* New build name is '#1547
> > > > refs/changes/22/7222/3'*20:08:59* Variable with name
> > > > 'BUILD_DISPLAY_NAME' already exists, current value: '#1547
> > > > refs/changes/22/7222/3', new value: '#1547
> > > > refs/changes/22/7222/3'*20:09:12* Archiving artifacts*20:09:21*
> > > > Finished: FAILURE
> > > >
> > > >
> > > > On Mon, Jun 19, 2017 at 12:23 PM, Tim Armstrong <
> > tarmstr...@cloudera.com
> > > >
> > > > wrote:
> > > >
> > > > > It's unclear if there will be incompatibility between the updated
> > > client
> > > > > and the version of sentry we use for the minicluster. I kicked off
> a
> > > test
> > > > > run to see if it works.
> > > > >
> > > > > On Mon, Jun 19, 2017 at 12:06 PM, Henry Robinson <he...@apache.org
> >
> > > > wrote:
> > > > >
> > > > >> Presumably this will break GVO jobs as well - should we commit
> Tim's
> > > > patch
> > > > >> to get us moving again while Alex works on the root cause?
> > > > >>
> > > > >> On 19 June 2017 at 09:23, Alexander Behm <alex.b...@cloudera.com>
> > > > wrote:
> > > > >>
> > > > >> > Meanwhile, I'll work on fixing the root cause:
> > > > >> > https://issues.apache.org/jira/browse/IMPALA-5530
> > > > >> >
> > > > >> > On Mon, Jun 19, 2017 at 9:20 AM, Tim Armstrong <
> > > > tarmstr...@cloudera.com
> > > > >> >
> > > > >> > wrote:
> > > > >> >
> > > > >> > > You may have noticed that Impala doesn't build this morning
> > > because
> > > > >> of a
> > > > >> > > sentry exception class no longer existing. I was able to
> unblock
> > > > >> myself
> > > > >> > > with this change, if you want to cherry-pick it:
> > > > >> > > https://gerrit.cloudera.org/#/c/7222/
> > > > >> > >
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>


Re: Broken build from Sentry

2017-06-19 Thread Tim Armstrong
Looks like the build still breaks when starting up sentry after my fix:

http://jenkins.impala.io:8080/job/ubuntu-14.04-from-scratch/1547/console

*20:08:54*  --> Starting the Sentry Policy Server*20:08:59* Error in
/home/ubuntu/Impala/testdata/bin/run-all.sh at line 58:
$IMPALA_HOME/testdata/bin/run-sentry-service.sh > \*20:08:59* +
onexit*20:08:59* + df -m*20:08:59* Filesystem 1M-blocks  Used
Available Use% Mounted on*20:08:59* udev   15070 1
15070   1% /dev*20:08:59* tmpfs   3015 1  3015
1% /run*20:08:59* /dev/xvda1161129 22275132204  15%
/*20:08:59* none   1 0 1   0%
/sys/fs/cgroup*20:08:59* none   5 0 5   0%
/run/lock*20:08:59* none   15075 1 15075   1%
/run/shm*20:08:59* none 100 0   100   0%
/run/user*20:08:59* + free -m*20:08:59*  total   used
 free sharedbuffers cached*20:08:59* Mem:
30148  19597  10550 11 91  14323*20:08:59*
-/+ buffers/cache:   5182  24965*20:08:59* Swap:0
0  0*20:08:59* + uptime -p*20:08:59* up 45
minutes*20:08:59* + rm -rf /home/ubuntu/Impala/logs_static*20:08:59* +
mkdir -p /home/ubuntu/Impala/logs_static*20:08:59* + cp -r -L
/home/ubuntu/Impala/logs /home/ubuntu/Impala/logs_static*20:08:59*
Build step 'Execute shell' marked build as failure*20:08:59* Set build
name.*20:08:59* New build name is '#1547
refs/changes/22/7222/3'*20:08:59* Variable with name
'BUILD_DISPLAY_NAME' already exists, current value: '#1547
refs/changes/22/7222/3', new value: '#1547
refs/changes/22/7222/3'*20:09:12* Archiving artifacts*20:09:21*
Finished: FAILURE


On Mon, Jun 19, 2017 at 12:23 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> It's unclear if there will be incompatibility between the updated client
> and the version of sentry we use for the minicluster. I kicked off a test
> run to see if it works.
>
> On Mon, Jun 19, 2017 at 12:06 PM, Henry Robinson <he...@apache.org> wrote:
>
>> Presumably this will break GVO jobs as well - should we commit Tim's patch
>> to get us moving again while Alex works on the root cause?
>>
>> On 19 June 2017 at 09:23, Alexander Behm <alex.b...@cloudera.com> wrote:
>>
>> > Meanwhile, I'll work on fixing the root cause:
>> > https://issues.apache.org/jira/browse/IMPALA-5530
>> >
>> > On Mon, Jun 19, 2017 at 9:20 AM, Tim Armstrong <tarmstr...@cloudera.com
>> >
>> > wrote:
>> >
>> > > You may have noticed that Impala doesn't build this morning because
>> of a
>> > > sentry exception class no longer existing. I was able to unblock
>> myself
>> > > with this change, if you want to cherry-pick it:
>> > > https://gerrit.cloudera.org/#/c/7222/
>> > >
>> >
>>
>
>


Re: Broken build from Sentry

2017-06-19 Thread Tim Armstrong
It's unclear if there will be incompatibility between the updated client
and the version of sentry we use for the minicluster. I kicked off a test
run to see if it works.

On Mon, Jun 19, 2017 at 12:06 PM, Henry Robinson <he...@apache.org> wrote:

> Presumably this will break GVO jobs as well - should we commit Tim's patch
> to get us moving again while Alex works on the root cause?
>
> On 19 June 2017 at 09:23, Alexander Behm <alex.b...@cloudera.com> wrote:
>
> > Meanwhile, I'll work on fixing the root cause:
> > https://issues.apache.org/jira/browse/IMPALA-5530
> >
> > On Mon, Jun 19, 2017 at 9:20 AM, Tim Armstrong <tarmstr...@cloudera.com>
> > wrote:
> >
> > > You may have noticed that Impala doesn't build this morning because of
> a
> > > sentry exception class no longer existing. I was able to unblock myself
> > > with this change, if you want to cherry-pick it:
> > > https://gerrit.cloudera.org/#/c/7222/
> > >
> >
>


Broken build from Sentry

2017-06-19 Thread Tim Armstrong
You may have noticed that Impala doesn't build this morning because of a
sentry exception class no longer existing. I was able to unblock myself
with this change, if you want to cherry-pick it:
https://gerrit.cloudera.org/#/c/7222/


Re: Getting additional permissions on Apache JIRA

2017-06-15 Thread Tim Armstrong
It looks like this is something I need to file a JIRA with Apache infra
for. Thanks to those of you who responded privately.

On Thu, Jun 15, 2017 at 4:40 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> I'm trying to create an agile board on JIRA but running into a permissions
> problem. Would someone with admin privileges be able to help me out?
>
> The error is:
>
>- You do not have permission to share. All shares are invalid.
>    - The user 'Tim Armstrong' does not have permission to share. The
>owner has not been changed.
>
>
> It looks like I need the "Create Shared Objects" permission (according to
> https://confluence.atlassian.com/adminjiracloud/managing-
> global-permissions-776636359.html)
>
> - Tim
>


Getting additional permissions on Apache JIRA

2017-06-15 Thread Tim Armstrong
I'm trying to create an agile board on JIRA but running into a permissions
problem. Would someone with admin privileges be able to help me out?

The error is:

   - You do not have permission to share. All shares are invalid.
   - The user 'Tim Armstrong' does not have permission to share. The owner
   has not been changed.


It looks like I need the "Create Shared Objects" permission (according to
https://confluence.atlassian.com/adminjiracloud/managing-global-permissions-776636359.html
)

- Tim


Re: how add lzma compression to Parquet in Impala

2017-06-12 Thread Tim Armstrong
You would need to add a new codec to the Impala source tree. The codecs are
implemented in be/src/util/codec.h,  be/src/util/compress.h  and
be/src/util/decompress.h. There are a few other places you may need to
change. I would just "git grep -i gzip" to see how the gzip codec is
implemented.
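
For example, to list every file that mentions the gzip codec in both the
backend and frontend:

git grep -i -l gzip -- be fe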

For compressed text files you would also need to add support to the
frontend, e.g. in
fe/src/main/java/org/apache/impala/catalog/HdfsCompression.java

I'm also not sure if there are any licensing issues here since the XZ
library is GPL licensed.

On Sat, Jun 10, 2017 at 5:41 PM, 孙清孟  wrote:

> I have added lzma codec (hadoop-xz) to parquet(modify the parquet-format
> and parquet-mr)  for hive, and get a higher compression ratio.
>
> But how add a new codec for Impala?
>


Re: How to start impalad

2017-06-09 Thread Tim Armstrong
The development minicluster should have a statestored, a catalogd and three
impalads. Is that what you're seeing? In the local development environment
if you run ./bin/start-impala-cluster.py with no arguments that's what it
does by default.
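
For example (flag spelling from memory - check
./bin/start-impala-cluster.py --help):

./bin/start-impala-cluster.py                  # statestored + catalogd + 3 impalads
./bin/start-impala-cluster.py --cluster_size=1 # same services, single impalad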

On Thu, Jun 8, 2017 at 7:50 PM, Miklos Szegedi 
wrote:

> Hello,
>
> I built impala using the instructions here:
> https://cwiki.apache.org/confluence/display/IMPALA/
> Bootstrapping+an+Impala+
> Development+Environment+From+Scratch
>
> My question is how to start it. When I launch impalad, it seems to start up
> as statestored:
>
> root@8682379ba09f:/incubator-impala/be/build/release/service# ps -Af
>
> UID PID   PPID  C STIME TTY  TIME CMD
>
> root  1  0  0 02:10 ?00:00:00 bash
>
> root 22  1  0 02:11 ?00:00:00 ./statestored
>
> root 97  1  0 02:13 ?00:00:01 ./catalogd
>
> root239  1  0 02:38 ?00:00:00 ./impalad
> --flagfile=/flags
>
> root279  1  0 02:44 ?00:00:00 ps -Af
>
> root@8682379ba09f:/incubator-impala/be/build/release/service# cat
> /proc/239/net/tcp
>
>   sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt
>   uid  timeout inode
>
>0: :5DC0 : 0A : 00: 
> 00 1072893 1 880757a626c0 100 0 0 10 0
>
>1: :61B2 : 0A : 00: 
> 00 922388 1 88075b7dbe00 100 0 0 10 0
>
> *5DC0 is port 24000 not 21000*
>
> Here are my flags:
>
> -beeswax_port=21000
>
> -fe_port=21000
>
> -be_port=22000
>
> -llama_callback_port=28000
>
> -hs2_port=21050
>
> -enable_webserver=true
>
> -mem_limit=21917574758
>
> -max_log_files=10
>
> -webserver_port=25000
>
> -max_result_cache_size=10
>
> -state_store_subscriber_port=23000
>
> -statestore_subscriber_timeout_seconds=30
>
> -scratch_dirs=/impala/impalad
>
> -default_query_options
>
> -load_auth_to_local_rules=false
>
> -log_filename=impalad
>
> -audit_event_log_dir=/var/log/impalad/audit
>
> -max_audit_event_log_file_size=5000
>
> -abort_on_failed_audit_event=false
>
> -minidump_path=/var/log/impala-minidumps
>
> -max_minidumps=9
>
> -lineage_event_log_dir=/var/log/impalad/lineage
>
> -max_lineage_log_file_size=5000
>
> -hostname=example.com
>
> -state_store_host=localhost
>
> -enable_rm=false
>
> -state_store_port=24000
>
> -catalog_service_host=localhost
>
> -catalog_service_port=26000
>
> -local_library_dir=/var/lib/impala/udfs
>
> -disable_admission_control=false
>
> -queue_wait_timeout_ms=6
>
> -disk_spill_encryption=false
>
> -abort_on_config_error=true
>
> -kudu_master_hosts=example.com
>
> End of logs:
> I0609 02:38:58.544764   239 init.cc:218] Cpu Info:
>
>   Model: Intel(R) Xeon(R) CPU E5-2680 v2 @ 2.80GHz
>
>   Cores: 16
>
>   Max Possible Cores: 16
>
>   L1 Cache: 32.00 KB (Line: 64.00 B)
>
>   L2 Cache: 256.00 KB (Line: 64.00 B)
>
>   L3 Cache: 25.00 MB (Line: 64.00 B)
>
>   Hardware Supports:
>
> ssse3
>
> sse4_1
>
> sse4_2
>
> popcnt
>
> avx
>
>   Numa Nodes: 1
>
>   Numa Nodes of Cores: 0->0 | 1->0 | 2->0 | 3->0 | 4->0 | 5->0 | 6->0 |
> 7->0 | 8->0 | 9->0 | 10->0 | 11->0 | 12->0 | 13->0 | 14->0 | 15->0 |
>
> I0609 02:38:58.544772   239 init.cc:219] Disk Info:
>
>   Num disks 3:
>
> xvda (rotational=false)
>
> xvdb (rotational=false)
>
> xvdc (rotational=false)
>
> I0609 02:38:58.544790   239 init.cc:220] Physical Memory: 29.03 GB
>
> I0609 02:38:58.544795   239 init.cc:221] OS version: Linux version
> 3.10.0-514.2.2.el7.x86_64 (buil...@kbuilder.dev.centos.org) (gcc version
> 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Tue Dec 6 23:06:41 UTC
> 2016
>
> Clock: clocksource: 'xen', clockid_t: CLOCK_MONOTONIC_COARSE
>
> I0609 02:38:58.544800   239 init.cc:222] Process ID: 239
>
> Thanks,
>
> Miklos
>


Re: Re: Failed to load test data about TPC-H

2017-06-01 Thread Tim Armstrong
We don't test with mixed versions like that unfortunately.

On Thu, Jun 1, 2017 at 8:02 AM, 黄权隆 <huang_quanl...@126.com> wrote:

> Hi Tim,
>
> Thanks for your reply! I'll try these scripts later. One more question.
> Is the latest Impala compatible with components in CDH-5.7.3?
> For example, Hadoop-2.6.0 and Hive-1.1.0?
>
> We use the old version cdh-5.7.3-release just due to the concern
> of incompatibility.
>
> Thanks
> 
> Quanlong
>
>
> At 2017-06-01 21:31:17, "Tim Armstrong" <tarmstr...@cloudera.com> wrote:
> >Hi Quanlong,
> >  It looks like you're missing the TPC-H data. In older versions of Impala
> >you had to generate the data manually and put it in that directory. We've
> >automated that in more recent versions (I think probably since a year ago).
> >If you can switch to a newer version, then this will just work. Data
> >loading is a lot more reliable now.
> >
> >Otherwise this is the script that generates the data. You can probably copy
> >this script to your repository and run it by hand:
> >
> >https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpch/preload
> >
> >You will also need to do the same for TPC-DS:
> >https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpcds/preload
> >
> >
> >Cheers,
> >Tim
> >
> >On Thu, Jun 1, 2017 at 12:54 AM, 黄权隆 <huang_quanl...@126.com> wrote:
> >
> >> Hi friends,
> >>
> >>
> >> I'm trying to run the impala tests. What I referred is the wiki 'How to
> >> load and run Impala tests'.
> >> Although I just want to run some end-to-end tests, I know I should load
> >> the test data first. So I use
> >> ./buildall.sh -noclean -testdata
> >> It succeeded to load the functional test data, but failed to load the tpch
> >> data set. Here are some related logs:
> >>
> >>
> >> /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-
> >> release/testdata/target
> >> SUCCESS, data generated into /home/CORP/quanlong.huang/
> >> workspace/Impala-cdh5.7.3-release/testdata/target
> >> Loading Hive Builtins (logging to load-hive-builtins.log)... OK
> >> Generating HBase data (logging to create-hbase.log)... OK
> >> Creating /test-warehouse HDFS directory (logging to
> >> create-test-warehouse-dir.log)... OK
> >> Starting Impala cluster (logging to start-impala-cluster.log)... OK
> >> Setting up HDFS environment (logging to setup-hdfs-env.log)... OK
> >> Loading custom schemas (logging to load-custom-schemas.log)... OK
> >> Loading functional-query data (logging to load-functional-query.log)... OK
> >> Loading TPC-H data (logging to load-tpch.log)... FAILED
> >> 'load-data tpch core' failed. Tail of log:
> >> Log for command 'load-data tpch core'
> >> Loading workload 'tpch' Using exploration strategy 'core'. Logging to
> >> /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-
> >> release/cluster_logs/data_loading/data-load-tpch-core.log
> >> Error loading data. The end of the log file is:
> >> at org.apache.thrift.ProcessFunction.process(
> >> ProcessFunction.java:39)
> >> at org.apache.thrift.TBaseProcessor.process(
> >> TBaseProcessor.java:39)
> >> at org.apache.hive.service.auth.TSetIpAddressProcessor.process(
> >> TSetIpAddressProcessor.java:56)
> >> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(
> >> TThreadPoolServer.java:285)
> >> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> >> ThreadPoolExecutor.java:1145)
> >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> >> ThreadPoolExecutor.java:615)
> >> at java.lang.Thread.run(Thread.java:745)
> >> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23
> >> Invalid path ''/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-
> >> release/testdata/impala-data/tpch/lineitem'': No files matching path
> >> file:/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.
> >> 3-release/testdata/impala-data/tpch/lineitem
> >> at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.
> >> applyConstraints(LoadSemanticAnalyzer.java:139)
> >> at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.
> >> analyzeInternal(LoadSemanticAnalyzer.java:230)
> >> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.
> >> analyze(BaseSemanticAnalyzer.java:222)

Re: Failed to load test data about TPC-H

2017-06-01 Thread Tim Armstrong
Hi Quanlong,
  It looks like you're missing the TPC-H data. In older versions of Impala
you had to generate the data manually and put it in that directory. We've
automated that in more recent versions (I think probably since a year ago).
If you can switch to a newer version, then this will just work. Data
loading is a lot more reliable now.

Otherwise this is the script that generates the data. You can probably copy
this script to your repository and run it by hand:

https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpch/preload

You will also need to do the same for TPC-DS:
https://github.com/apache/incubator-impala/blob/master/testdata/datasets/tpcds/preload
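
Something along these lines may work (a sketch only - read the script
before running it, since the environment variables and arguments it
expects may differ from what your older version provides):

cd $IMPALA_HOME/testdata/datasets/tpch
curl -O https://raw.githubusercontent.com/apache/incubator-impala/master/testdata/datasets/tpch/preload
less preload        # check its assumptions first
chmod +x preload && ./preload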


Cheers,
Tim

On Thu, Jun 1, 2017 at 12:54 AM, 黄权隆  wrote:

> Hi friends,
>
>
> I'm trying to run the impala tests. What I referred is the wiki 'How to
> load and run Impala tests'.
> Although I just want to run some end-to-end tests, I know I should load
> the test data first. So I use
> ./buildall.sh -noclean -testdata
> It succeeded to load the functional test data, but failed to load the tpch
> data set. Here are some related logs:
>
>
> /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-
> release/testdata/target
> SUCCESS, data generated into /home/CORP/quanlong.huang/
> workspace/Impala-cdh5.7.3-release/testdata/target
> Loading Hive Builtins (logging to load-hive-builtins.log)... OK
> Generating HBase data (logging to create-hbase.log)... OK
> Creating /test-warehouse HDFS directory (logging to
> create-test-warehouse-dir.log)... OK
> Starting Impala cluster (logging to start-impala-cluster.log)... OK
> Setting up HDFS environment (logging to setup-hdfs-env.log)... OK
> Loading custom schemas (logging to load-custom-schemas.log)... OK
> Loading functional-query data (logging to load-functional-query.log)... OK
> Loading TPC-H data (logging to load-tpch.log)... FAILED
> 'load-data tpch core' failed. Tail of log:
> Log for command 'load-data tpch core'
> Loading workload 'tpch' Using exploration strategy 'core'. Logging to
> /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-
> release/cluster_logs/data_loading/data-load-tpch-core.log
> Error loading data. The end of the log file is:
> at org.apache.thrift.ProcessFunction.process(
> ProcessFunction.java:39)
> at org.apache.thrift.TBaseProcessor.process(
> TBaseProcessor.java:39)
> at org.apache.hive.service.auth.TSetIpAddressProcessor.process(
> TSetIpAddressProcessor.java:56)
> at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(
> TThreadPoolServer.java:285)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.hadoop.hive.ql.parse.SemanticException: Line 1:23
> Invalid path ''/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-
> release/testdata/impala-data/tpch/lineitem'': No files matching path
> file:/home/CORP/quanlong.huang/workspace/Impala-cdh5.7.
> 3-release/testdata/impala-data/tpch/lineitem
> at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.
> applyConstraints(LoadSemanticAnalyzer.java:139)
> at org.apache.hadoop.hive.ql.parse.LoadSemanticAnalyzer.
> analyzeInternal(LoadSemanticAnalyzer.java:230)
> at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.
> analyze(BaseSemanticAnalyzer.java:222)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:445)
> at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:311)
> at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.
> java:1189)
> at org.apache.hadoop.hive.ql.Driver.compileAndRespond(
> Driver.java:1176)
> at org.apache.hive.service.cli.operation.SQLOperation.
> prepare(SQLOperation.java:134)
> ... 26 more
>
>
> Closing: 0: jdbc:hive2://localhost:11050/default;auth=none
> Error executing file from Hive: load-tpch-core-hive-generated.sql
> Error in /home/CORP/quanlong.huang/workspace/Impala-cdh5.7.3-
> release/testdata/bin/create-load-data.sh at line 41: while [ -n "$*" ]
> Error in ./buildall.sh at line 368: 
> ${IMPALA_HOME}/testdata/bin/create-load-data.sh
> ${CREATE_LOAD_DATA_ARGS} <<< Y
>
>
> I'm using version cdh5.7.3-release. The directory 
> ${IMPALA_HOME}/testdata/impala-data
> dose not exist.
>
>
> Could you tell me how to generate this data set? Or where can I download
> the snapshot file of test-warehouse so I can skip this step?
>
>
> Thanks
> 
> Quanlong


Re: connection refuseLeap status not available

2017-05-24 Thread Tim Armstrong
Yes it's also possible that my solution "works" because it distracts me for
long enough for NTP to sync :)

On Wed, May 24, 2017 at 12:20 PM, Zachary Amsden <zams...@cloudera.com>
wrote:

> This is similar to what I hit with NTP the other day after a restart.  I
> tried a number of things, and I think the only thing that worked was
> waiting for NTP to sync.  Pitfalls: ntpdate requires a host on the command
> line, and doesn't read the configuration file.
>
> There was some circumstantial evidence that a connection to the Ubuntu NTP
> pool failed during boot - overload or random connection failure, but I
> never got an exact cause.
>
> On Wed, May 24, 2017 at 12:09 PM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > I see that with some frequency when restarting my system.
> >
> > I usually manage to fix it with a non-scientific approach of running some
> > combination of these commands until ntp-wait works:
> >
> > sudo service ntp restart
> > sudo ntpdate -s ntp.ubuntu.com
> > ntp-wait -v
> >
> > On Wed, May 24, 2017 at 11:33 AM, Jim Apple <jbap...@cloudera.com>
> wrote:
> >
> > > testdata/cluster/admin
> > >
> > > calls ntp-wait which returns an error:
> > >
> > > "ntpq: read: Connection refuseLeap status not avalaible"
> > >
> > > Has anyone seen this? There are 0 Google hits for refuseLeap and the
> > > string is not present in my repository or my /etc/.
> > >
> >
>


Re: connection refuseLeap status not available

2017-05-24 Thread Tim Armstrong
I see that with some frequency when restarting my system.

I usually manage to fix it with a non-scientific approach of running some
combination of these commands until ntp-wait works:

sudo service ntp restart
sudo ntpdate -s ntp.ubuntu.com
ntp-wait -v

On Wed, May 24, 2017 at 11:33 AM, Jim Apple  wrote:

> testdata/cluster/admin
>
> calls ntp-wait which returns an error:
>
> "ntpq: read: Connection refuseLeap status not avalaible"
>
> Has anyone seen this? There are 0 Google hits for refuseLeap and the
> string is not present in my repository or my /etc/.
>


Re: Bootstrapping a development environment on Ubuntu 16.04

2017-05-21 Thread Tim Armstrong
Did you manage to run all of the tests and data loading? I remember a while
back I was having issues with Kudu crashing - never retried to see if it
got better.

On Sat, May 20, 2017 at 9:23 PM, Jim Apple  wrote:

> I am hoping to upgrade from Ubuntu 14.04 to 16.04. In order to check
> that I could still do Impala development work, I bootstrapped a
> development environment on a 16.04 cloud instance. The script that
> worked for me is here:
>
> https://cwiki.apache.org/confluence/display/IMPALA/Bootstrap
> ping+an+Impala+Development+Environment+From+Scratch
>
> While it might be useful to have this as a Jenkins job, I have no
> plans to create such a job.
>
> It might be a bit cleaner to get this done in the chef repo in
> https://github.com/awleblang/impala-setup, but I don't plan to do that
> at this time.
>


Re: [DISCUSS] Release 2.9.0 soon?

2017-05-19 Thread Tim Armstrong
+1 Thanks for volunteering. It would be great to get a release done - it's
been quite a while since the last one.

On Fri, May 19, 2017 at 5:53 PM, Bharath Vissapragada  wrote:

> +1. Good to have a new release with all the latest improvements.
>
> On Fri, May 19, 2017 at 3:51 PM, Alexander Behm 
> wrote:
>
> > +1 for doing a release
> >
> > On Fri, May 19, 2017 at 3:41 PM, Taras Bobrovytsky 
> > wrote:
> >
> > > This is not a [VOTE] thread. Everyone is encouraged to participate.
> > >
> > > I am volunteering to be a release manager for Impala 2.9.0. Are there
> any
> > > objections to releasing 2.9.0 soon?
> > > Keep in mind this is NOT your last chance to speak - there will be at
> > least
> > > two votes, one for PPMC releasing and one for IPMC releasing.
> > >
> > > See
> > > https://cwiki.apache.org/confluence/display/IMPALA/
> > DRAFT%3A+How+to+Release
> > >
> >
>


Re: IMPALA-5030

2017-05-15 Thread Tim Armstrong
Not sure - if you tell us your username on Apache JIRA we could just assign
it to you.

On Mon, May 15, 2017 at 6:13 PM, Vincent Tran <vtt...@cloudera.com> wrote:

> Thanks for the tips. I will go with option 3.
>
> Also, for some reasons, I don't have permissions to assign issues to myself
> in the Impala project. Do you guys know if I need to request additional
> privileges?
>
> I can assign Hive issues to myself without additional actions.
>
>
> Vincent
>
>
> On Mon, May 15, 2017 at 6:55 PM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > Also if you're starting to work on the JIRA, can assign it to yourself in
> > the Apache JIRA instance so we can see that it's assigned to someone.
> >
> > Cheers,
> > Tim
> >
> > On Mon, May 15, 2017 at 4:21 PM, Alexander Behm <alex.b...@cloudera.com>
> > wrote:
> >
> > > Hi Vincent,
> > >
> > > Jim is correct, we need to be able to handle invocations with those types:
> > > The 1st arg can be any type. The 2nd and 3rd types must be compatible.
> > > During function resolution the FE will pick the most appropriate
> > signature
> > > and add casts to the 2nd or 3rd arg as necessary.
> > > You do not need to create signatures like "nvl2(string, int,
> timestamp)".
> > >
> > > I have considered the following options for addressing this difficulty:
> > > 1. Accept an AnyType for the 1st argument and create 9 function
> > signatures.
> > > The FE currently does not have such a concept. Adding it would involve
> > > changing many tricky places.
> > > 2. Rewrite NVL2() as an IF in the FE using the ExprRewriteRule
> framework.
> > > You will need to make changes to FunctionCallExpr to accept NVL2().
> > > 3. Stamp out all 9*9 function signatures. Should be easy since you are
> in
> > > Python code there.
> > >
> > > Option 3 seems easiest/preferable to me at this point.
> > >
> > > Alex
> > >
> > >
> > >
> > > On Mon, May 15, 2017 at 1:05 PM, Jim Apple <jbap...@cloudera.com>
> wrote:
> > >
> > > > It seems to me like there should be version with type X, [P,X,X] for
> > > > all pair P and X. There might be a smart way to do this without the
> > > > quadratic number of cases; someone else probably knows better than I
> > > > do about that.
> > > >
> > > > On Sun, May 14, 2017 at 1:00 AM, Vincent Tran <vtt...@cloudera.com>
> > > wrote:
> > > > > Hey folks,
> > > > >
> > > > > I want to start contributing to learn the code base. I added this
> > > > function
> > > > > to my personal build:
> > > > >
> > > > >   [['nvl2'], 'TINYINT', ['TINYINT', 'TINYINT', 'TINYINT'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'SMALLINT', ['SMALLINT', 'SMALLINT', 'SMALLINT'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'INT', ['INT', 'INT', 'INT'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'BIGINT', ['BIGINT', 'BIGINT', 'BIGINT'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'FLOAT', ['FLOAT', 'FLOAT', 'FLOAT'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'DOUBLE', ['DOUBLE', 'DOUBLE', 'DOUBLE'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'DECIMAL', ['DECIMAL', 'DECIMAL', 'DECIMAL'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'STRING', ['STRING', 'STRING', 'STRING'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >   [['nvl2'], 'TIMESTAMP', ['TIMESTAMP', 'TIMESTAMP', 'TIMESTAMP'],
> > > > > 'impala::ConditionalFunctions::NVL2'],
> > > > >
> > > > >
> > > > > Do you think this is a sound approach? Should we allow mixed
> > > > > types for this function?
> > > > >
> > > > > i.e. nvl2(string, int, timestamp)
> > > > >
> > > > >
> > > > > https://issues.cloudera.org/browse/IMPALA-5030
> > > > >
> > > > >
> > > > > --
> > > > > Vincent T. Tran
> > > > > Customer Operations Engineer
> > > > > Cloudera, Inc.
> > > >
> > >
> >
>
>
>
> --
> Vincent T. Tran
> Customer Operations Engineer
> Cloudera, Inc.
>
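
If option 3 above is the route taken, the 9x9 cross-product doesn't need
to be typed out by hand; a throwaway sketch like this (assuming the
registry format shown above) generates the entries to paste into the
Python function list:

TYPES="TINYINT SMALLINT INT BIGINT FLOAT DOUBLE DECIMAL STRING TIMESTAMP"
for first in $TYPES; do
  for val in $TYPES; do
    echo "  [['nvl2'], '$val', ['$first', '$val', '$val'], 'impala::ConditionalFunctions::NVL2'],"
  done
done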


Re: IMPALA-5030

2017-05-15 Thread Tim Armstrong
Also if you're starting to work on the JIRA, can assign it to yourself in
the Apache JIRA instance so we can see that it's assigned to someone.

Cheers,
Tim

On Mon, May 15, 2017 at 4:21 PM, Alexander Behm 
wrote:

> Hi Vincent,
>
> Jim is correct, we need to be able to handle invocations with those types:
> The 1st arg can be any type. The 2nd and 3rd types must be compatible.
> During function resolution the FE will pick the most appropriate signature
> and add casts to the 2nd or 3rd arg as necessary.
> You do not need to create signatures like "nvl2(string, int, timestamp)".
>
> I have considered the following options for addressing this difficulty:
> 1. Accept an AnyType for the 1st argument and create 9 function signatures.
> The FE currently does not have such a concept. Adding it would involve
> changing many tricky places.
> 2. Rewrite NVL2() as an IF in the FE using the ExprRewriteRule framework.
> You will need to make changes to FunctionCallExpr to accept NVL2().
> 3. Stamp out all 9*9 function signatures. Should be easy since you are in
> Python code there.
>
> Option 3 seems easiest/preferable to me at this point.
>
> Alex
>
>
>
> On Mon, May 15, 2017 at 1:05 PM, Jim Apple  wrote:
>
> > It seems to me like there should be version with type X, [P,X,X] for
> > all pair P and X. There might be a smart way to do this without the
> > quadratic number of cases; someone else probably knows better than I
> > do about that.
> >
> > On Sun, May 14, 2017 at 1:00 AM, Vincent Tran 
> wrote:
> > > Hey folks,
> > >
> > > I want to start contributing to learn the code base. I added this
> > function
> > > to my personal build:
> > >
> > >   [['nvl2'], 'TINYINT', ['TINYINT', 'TINYINT', 'TINYINT'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'SMALLINT', ['SMALLINT', 'SMALLINT', 'SMALLINT'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'INT', ['INT', 'INT', 'INT'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'BIGINT', ['BIGINT', 'BIGINT', 'BIGINT'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'FLOAT', ['FLOAT', 'FLOAT', 'FLOAT'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'DOUBLE', ['DOUBLE', 'DOUBLE', 'DOUBLE'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'DECIMAL', ['DECIMAL', 'DECIMAL', 'DECIMAL'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'STRING', ['STRING', 'STRING', 'STRING'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >   [['nvl2'], 'TIMESTAMP', ['TIMESTAMP', 'TIMESTAMP', 'TIMESTAMP'],
> > > 'impala::ConditionalFunctions::NVL2'],
> > >
> > >
> > > Do you think this is a sound approach? Should we allow mixed types
> > > for this function?
> > >
> > > i.e. nvl2(string, int, timestamp)
> > >
> > >
> > > https://issues.cloudera.org/browse/IMPALA-5030
> > >
> > >
> > > --
> > > Vincent T. Tran
> > > Customer Operations Engineer
> > > Cloudera, Inc.
> >
>


Re: Should we change tests so they don't use single letter table names?

2017-05-12 Thread Tim Armstrong
Personally I'd prefer the frontend test to fail instead of dropping my
table without warning. I assume these tables are in the default database,
right?

On Fri, May 12, 2017 at 8:43 AM, Alexander Behm 
wrote:

> Michael, to keep them fast and self-contained the FE tests do not require a
> running Impala cluster, and as such cannot really execute any statements
> (e.g. DROP/ADD).
>
> The FE has limited mechanisms for setting up temporary tables which might
> suffice in most but not all cases.
>
> I agree with Lars that we should address this issue. We need to look at a
> few cases and see if there's a sledgehammer solution we can apply.
>
> On Fri, May 12, 2017 at 7:21 AM, Michael Brown  wrote:
>
> > Why not alter the frontend test to drop t if exists? Tests should
> generally
> > strive to set themselves up. Is there some trait of the frontend tests
> that
> > prevents that?
> >
> > On Fri, May 12, 2017 at 4:38 AM, Lars Volker  wrote:
> >
> > > Hi All,
> > >
> > > I frequently create test tables on my local system with names like "t"
> or
> > > "p". A couple of frontend tests use the same names and then fail with
> > > "Table already exists".
> > >
> > > Does anyone else hit this from time to time? Can we change the table
> > names
> > > in the tests to avoid single letter names? If there are no objections,
> > I'll
> > > open a JIRA.
> > >
> > > Thanks, Lars
> > >
> >
>


Need reviewer for distcc patches

2017-05-08 Thread Tim Armstrong
This will unblock distcc for a decent number of people:
https://gerrit.cloudera.org/#/c/6655/ . It would be good to get it in
sooner rather than later.

Cheers,
Tim


Re: impala support json format table

2017-04-18 Thread Tim Armstrong
+1 to starting with some smaller tasks

On 18 Apr. 2017 9:32 am, "Jim Apple" <jbap...@cloudera.com> wrote:

> That's a good point, Tim. Generally, new contributors might want to start
> with a newbie bug:
>
> https://issues.apache.org/jira/issues/?jql=project%20%
> 3D%20IMPALA%20AND%20status%20%3D%20Open%20AND%20labels%20%3D%20newbie
>
> JSON support is a larger project, and it might not be the one most amenable
> to learning about the Impala community's style and processes.
>
> On Tue, Apr 18, 2017 at 9:27 AM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > Seems like useful functionality that would be great to have in Impala.
> > There was an earlier attempt to do this that didn't make it in - I'm not
> > sure that the approach was quite right:
> > https://gerrit.cloudera.org/#/c/1201/1 . I'm not sure what the exact
> > problems were but I remember we didn't think it was quite the right
> > approach.
> >
> > I think we'd need to talk through a design first because there are a lot
> of
> > considerations and we want to make sure to get it right. I had some
> initial
> > questions that I'd want to think through before adding a JSON scanner.
> >
> >- What JSON does it accept?
> >- How do we declare a table schema and map it to the JSON
> >- How does it handle missing or extra fields - does it just return
> null
> >or drop the fields? What if the field type is wrong?
> >- How do the numeric types work? JSON only supports floating point,
> but
> >I think many people would like to store higher-precision decimal or
> > 64-bit
> >integer types (which is technically outside of the JSON standard).
> >- Will it support codegen? If not, is it written in a way that allows
> it
> >in future?
> >
> > Cheers,
> > Tim
> >
> > - Tim
> >
> > On Tue, Apr 18, 2017 at 8:52 AM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> > > On Mon, Apr 17, 2017 at 8:02 PM, yu feng <olaptes...@gmail.com> wrote:
> > >
> > > > Hi impala community:
> > > >   I am new to Impala,
> > >
> > >
> > > Welcome!
> > >
> > > I want to know the attitude of the Impala
> > > > community toward supporting the JSON format.
> > >
> > >
> > > I am in favor of it. I am only one person, though - anybody else object
> > to
> > > JSON support?
> > >
> > > If this matches the roadmap, maybe I
> > > > can make some contribution.
> > > >
> > >
> > > I do not recall much talk about Apache Impala's roadmap since we joined
> > the
> > > ASF. Perhaps I missed a thread about it?
> > >
> >
>


Re: impala support json format table

2017-04-18 Thread Tim Armstrong
Seems like useful functionality that would be great to have in Impala.
There was an earlier attempt to do this that didn't make it in:
https://gerrit.cloudera.org/#/c/1201/1 . I'm not sure what the exact
problems were, but I remember we didn't think the approach was quite
right.

I think we'd need to talk through a design first because there are a lot of
considerations and we want to make sure to get it right. I had some initial
questions that I'd want to think through before adding a JSON scanner.

   - What JSON does it accept?
   - How do we declare a table schema and map it to the JSON
   - How does it handle missing or extra fields - does it just return null
   or drop the fields? What if the field type is wrong?
   - How do the numeric types work? JSON only supports floating point, but
   I think many people would like to store higher-precision decimal or 64-bit
   integer types (which is technically outside of the JSON standard); see
   the quick sketch below.
   - Will it support codegen? If not, is it written in a way that allows it
   in future?
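
On the missing-field and numeric-type questions, here's a quick Python
sketch of the trade-offs (names and behaviour chosen just for illustration):

import json

# Missing or extra fields: one option is NULL for missing fields and
# silently dropping extras, as this toy row mapper does.
schema = ["id", "name", "price"]
def to_row(obj):
    return [obj.get(col) for col in schema]  # missing -> None (NULL)

print(to_row(json.loads('{"id": 1, "extra": true}')))  # [1, None, None]

# 64-bit integers: a parser that reads every JSON number as a double
# loses the low bit, since a double cannot represent 2**53 + 1.
big = 2 ** 53 + 1                     # 9007199254740993, a valid BIGINT
print(float(big))                     # 9007199254740992.0
print(int(float(big)) == big)         # False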

Cheers,
Tim

On Tue, Apr 18, 2017 at 8:52 AM, Jim Apple  wrote:

> On Mon, Apr 17, 2017 at 8:02 PM, yu feng  wrote:
>
> > Hi impala community:
> >   I am new to Impala,
>
>
> Welcome!
>
> I want to know the attitude of the Impala
> > community toward supporting the JSON format.
>
>
> I am in favor of it. I am only one person, though - anybody else object to
> JSON support?
>
> If this matches the roadmap, maybe I
> > can make some contribution.
> >
>
> I do not recall much talk about Apache Impala's roadmap since we joined the
> ASF. Perhaps I missed a thread about it?
>


Re: Upstreaming ppc64le patches for native-toolchain

2017-04-17 Thread Tim Armstrong
Makes sense to me

On Mon, Apr 17, 2017 at 3:11 PM, Jim Apple <jbap...@cloudera.com> wrote:

> I'm OK with us calling this "experimental" (once the patches are in and we
> think it works) and letting patches continue to land with our current
> testing regime.
>
> Later, if we add daily test reports with contributor-managed hardware or
> even pre-commit testing on jenkins.impala.io in a VM, maybe we could
> upgrade to names other than "experimental", like "beta" or "best-effort",
> if we still want to have caveats.
>
> How does that sound to everyone else?
>
> On Mon, Apr 17, 2017 at 2:54 PM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
>
> > Something like that makes sense - I think it should be an experimental or
> > unsupported feature until if/when we have a critical mass of committers
> to
> > maintain it.
> >
> > On Mon, Apr 17, 2017 at 2:46 PM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> > > That makes sense to me. It would be good for us to provide support
> > without
> > > completely focusing all development effort on a HW platform with fewer
> > > users.
> > >
> > > If ppc64le support lands in 2.10, and between 2.10 and 2.11 the ppc64le
> > > tests start breaking for reasons nobody with HW access can debug,
> should
> > we
> > > say in 2.11 release notes that ppc64le is no longer supported?
> > >
> > > Or perhaps, even in the first release where we think it works, we
> should
> > > spell it out as a platform with only "best-effort" support, that way we
> > > don't have to retract support?
> > >
> > > On Mon, Apr 17, 2017 at 2:41 PM, Marcel Kornacker <mar...@cloudera.com
> >
> > > wrote:
> > >
> > > > My main concern is that we don't unduly burden the development
> process.
> > > As
> > > > such, it makes a lot of sense to keep the PPC tests out of the
> regular
> > > > tests and have the party that's interested in the PPC tests create
> the
> > > > infrastructure and run those tests.
> > > >
> > > > On Mon, Apr 17, 2017 at 2:30 PM, Jim Apple <jbap...@cloudera.com>
> > wrote:
> > > >
> > > > > One concern I have is sustainability. If only one Impala
> contributor
> > > can
> > > > > work with ppc64le, and that contributor is not as seasoned as some
> of
> > > the
> > > > > other committers, what happens if ppc64le breaks and the one person
> > > with
> > > > VM
> > > > > access can't fix it?
> > > > >
> > > > > Part of my concern is just how flaky the current tests are, too. It
> > > takes
> > > > > some time to be able to distinguish broken tests that are flaky from
> > > > broken
> > > > > tests that are the result of a specific commit.
> > > > >
> > > > > My hope was that with a ppc64le VM (maybe through Qemu?) that runs
> on
> > > > > x86-64 Linux, the other contributors could help fix things that
> broke
> > > on
> > > > > that platform.
> > > > >
> > > > > On Mon, Apr 17, 2017 at 12:51 PM, Marcel Kornacker <
> > > mar...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > +1
> > > > > >
> > > > > > On Mon, Apr 17, 2017 at 11:56 AM, Henry Robinson <
> > he...@cloudera.com
> > > >
> > > > > > wrote:
> > > > > >
> > > > > > > +1
> > > > > > >
> > > > > > > On Mon, Apr 17, 2017 at 9:09 AM Tim Armstrong <
> > > > tarmstr...@cloudera.com
> > > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > I feel like we shouldn't make PPC part of pre-commit at least
> > > > > > initially -
> > > > > > > > it's an unreasonable barrier to ask contributors/committers to
> > debug
> > > > > issues
> > > > > > > on
> > > > > > > > a platform they don't have easy access to. Having the testing
> > > infra
> > > > > is
> > > > > > > > still important because we don't want to have code in there
> > that
> > > > will
> > > > > > > > gradually bit-rot without us noticing.
> > > > > 

Re: Upstreaming ppc64le patches for native-toolchain

2017-04-17 Thread Tim Armstrong
Something like that makes sense - I think it should be an experimental or
unsupported feature until if/when we have a critical mass of committers to
maintain it.

On Mon, Apr 17, 2017 at 2:46 PM, Jim Apple <jbap...@cloudera.com> wrote:

> That makes sense to me. It would be good for us to provide support without
> completely focusing all development effort on a HW platform with fewer
> users.
>
> If ppc64le support lands in 2.10, and between 2.10 and 2.11 the ppc64le
> tests start breaking for reasons nobody with HW access can debug, should we
> say in 2.11 release notes that ppc64le is no longer supported?
>
> Or perhaps, even in the first release where we think it works, we should
> spell it out as a platform with only "best-effort" support, that way we
> don't have to retract support?
>
> On Mon, Apr 17, 2017 at 2:41 PM, Marcel Kornacker <mar...@cloudera.com>
> wrote:
>
> > My main concern is that we don't unduly burden the development process.
> As
> > such, it makes a lot of sense to keep the PPC tests out of the regular
> > tests and have the party that's interested in the PPC tests create the
> > infrastructure and run those tests.
> >
> > On Mon, Apr 17, 2017 at 2:30 PM, Jim Apple <jbap...@cloudera.com> wrote:
> >
> > > One concern I have is sustainability. If only one Impala contributor
> can
> > > work with ppc64le, and that contributor is not as seasoned as some of
> the
> > > other committers, what happens if ppc64le breaks and the one person
> with
> > VM
> > > access can't fix it?
> > >
> > > Part of my concern is just how flaky the current tests are, too. It
> takes
> > > some time to be able to distinguish broken tests that are flaky from
> > broken
> > > tests that are the result of a specific commit.
> > >
> > > My hope was that with a ppc64le VM (maybe through Qemu?) that runs on
> > > x86-64 Linux, the other contributors could help fix things that broke
> on
> > > that platform.
> > >
> > > On Mon, Apr 17, 2017 at 12:51 PM, Marcel Kornacker <
> mar...@cloudera.com>
> > > wrote:
> > >
> > > > +1
> > > >
> > > > On Mon, Apr 17, 2017 at 11:56 AM, Henry Robinson <he...@cloudera.com
> >
> > > > wrote:
> > > >
> > > > > +1
> > > > >
> > > > > On Mon, Apr 17, 2017 at 9:09 AM Tim Armstrong <
> > tarmstr...@cloudera.com
> > > >
> > > > > wrote:
> > > > >
> > > > > > I feel like we shouldn't make PPC part of pre-commit at least
> > > > initially -
> > > > > > it's an unreasonable barrier to ask contributors/committers to debug
> > > issues
> > > > > on
> > > > > > a platform they don't have easy access to. Having the testing
> infra
> > > is
> > > > > > still important because we don't want to have code in there that
> > will
> > > > > > gradually bit-rot without us noticing.
> > > > > >
> > > > > > On Mon, Apr 17, 2017 at 8:51 AM, Silvius Rus <s...@cloudera.com>
> > > > wrote:
> > > > > >
> > > > > > > Would it make sense to _not_ run PPC tests as part of
> presubmit?
> > > > > Instead
> > > > > > > Valencia could set up nightly tests using in-house
> > infrastructure.
> > > > And
> > > > > > > share the test results, e.g., by sending them to a new email
> list
> > > > > > > te...@impala.incubator.apache.org (that we'd need to create)
> so
> > > > > everyone
> > > > > > > can see when there are failures or if coverage stops for
> whatever
> > > > > reason.
> > > > > > > GCC has been doing something like this for a long time,
> > > > > > > https://gcc.gnu.org/ml/gcc-testresults/2017-04/.
> > > > > > >
> > > > > > > On Tue, Apr 11, 2017 at 9:44 AM, Jim Apple <
> jbap...@cloudera.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > > > >
> > > > > > > > > Locally, I work on native-toolchain using a VM configured
> > with
> > > > > > > > > Ubuntu16.04ppc64le, 4GB RAM and 50GB of HDD. If we provide
> > > you a
> > > > > VM
> > > > > > > with
> > > > > > > > > this config, will it be sufficient ?
> > > > > > > > >
> > > > > > > >
> > > > > > > > What hypervisor/emulator will it use?
> > > > > > > >
> > > > > > > > What are the requirements of the host OS and host hardware?
> > > > > > > >
> > > > > > > > > Why is the config you have so important that you
> > > mention
> > > > it
> > > > > > in
> > > > > > > > your email - will the VM be locked into that config
> or
> > > can
> > > > > it
> > > > > > be
> > > > > > > > reconfigured later?
> > > > > > > >
> > > > > > > > How is the VM controlled from the host OS? Keep in mind that
> a
> > > GUI
> > > > > > cannot
> > > > > > > > be the only option for automated tests.
> > > > > > > >
> > > > > > > > FWIW, Impala's test suite probably cannot fully complete
> > without
> > > at
> > > > > > least
> > > > > > > > 8, and I suspect 16, GB of RAM, and we might need more disk
> > > space,
> > > > > too,
> > > > > > > but
> > > > > > > > these should be reconfigurable with most
> hypervisors/emulators.
> > > > > > > >
> > > > > > >
> > > > > >
> > > > > --
> > > > > Henry Robinson
> > > > > Software Engineer
> > > > > Cloudera
> > > > > 415-994-6679
> > > > >
> > > >
> > >
> >
>


Re: Upstreaming ppc64le patches for native-toolchain

2017-04-17 Thread Tim Armstrong
I feel like we shouldn't make PPC part of pre-commit, at least initially -
it's an unreasonable barrier to ask contributors/committers to debug issues
on a platform they don't have easy access to. Having the testing infra is
still important because we don't want to have code in there that will
gradually bit-rot without us noticing.

On Mon, Apr 17, 2017 at 8:51 AM, Silvius Rus  wrote:

> Would it make sense to _not_ run PPC tests as part of presubmit?  Instead
> Valencia could set up nightly tests using in-house infrastructure.  And
> share the test results, e.g., by sending them to a new email list
> te...@impala.incubator.apache.org (that we'd need to create) so everyone
> can see when there are failures or if coverage stops for whatever reason.
> GCC has been doing something like this for a long time,
> https://gcc.gnu.org/ml/gcc-testresults/2017-04/.
>
> On Tue, Apr 11, 2017 at 9:44 AM, Jim Apple  wrote:
>
> > >
> > > Locally, I work on native-toolchain using a VM configured with
> > > Ubuntu16.04ppc64le, 4GB RAM and 50GB of HDD. If we provide you a VM
> with
> > > this config, will it be sufficient ?
> > >
> >
> > What hypervisor/emulator will it use?
> >
> > What are the requirements of the host OS and host hardware?
> >
> > Why is the config you have so important that you mention it in your
> > email - will the VM be locked into that config or can it be reconfigured
> > later?
> >
> > How is the VM controlled from the host OS? Keep in mind that a GUI cannot
> > be the only option for automated tests.
> >
> > FWIW, Impala's test suite probably cannot fully complete without at least
> > 8, and I suspect 16, GB of RAM, and we might need more disk space, too,
> but
> > these should be reconfigurable with most hypervisors/emulators.
> >
>


Re: Upcoming changes to distcc scripts

2017-04-05 Thread Tim Armstrong
Did the local compilation succeed? Were the failures just intermittent?

My guess is that it's probably some kind of infrastructure flakiness or
just load on the servers.

I've seen that occasionally myself, but I don't believe it's related to the
changes - before and after the change, we're running distcc with the exact
same command line.



On Wed, Apr 5, 2017 at 9:36 AM, Marcel Kornacker <mar...@cloudera.com>
wrote:

> I'm now getting "failed to distribute, running locally instead" for
> every source file. Did anything else change?
>
> On Mon, Apr 3, 2017 at 11:44 AM, Tim Armstrong <tarmstr...@cloudera.com>
> wrote:
> > I'm about to start a merge for a patch that solves some of the issues
> with
> > distcc - switching between ASAN and non-ASAN builds and with ccache:
> > https://gerrit.cloudera.org/#/c/6493/ . Unfortunately it won't be
> > completely transparent. When you pull down the patch, you'll need to:
> >
> >- Open a new terminal or run "unset IMPALA_CXX_COMPILER" in your
> current
> >terminal
> >- Source bin/distcc/distcc_env.sh
> >- Rebuild with buildall.sh
> >
> > If you run into any problems, you can try cleaning out your
> cmake-generated
> > files.
> >
> >- ./bin/clean.sh && ./bin/create-test-configuration.sh
> >
> >
> > Cheers,
> > Tim
>


Re: Installation Problem?

2017-04-04 Thread Tim Armstrong
Didn't finish my thought:

Probably the most direct solution is to revert to the older version of
setuptools. Otherwise you could potentially try upgrading to a newer
version or setting up a virtualenv.

On Tue, Apr 4, 2017 at 10:13 AM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> Hi Katelyn,
>   It looks like the immediate cause of the problem is that you have a
> setuptools 0.7.x version installed. For whatever reason (it's arcane to me)
> that version is incompatible with the standard Python installation tools
> that we use for impala-shell. setuptools 0.6.x (which looks like what is
> provided by default on CentOS 6) or any setuptools version >= 0.8 should
> not have this problem.
>
> We use CentOS 6.4 pretty heavily and it looks like we have 0.6.10
> installed on those systems:
>
> $ yum info python-setuptools
> Name        : python-setuptools
> Arch        : noarch
> Version     : 0.6.10
> Release     : 3.el6
> Size        : 1.5 M
> Repo        : installed
> From repo   : centos-base
> Summary     : Easily build and distribute Python packages
> URL         : http://pypi.python.org/pypi/distribute
> License     : Python or ZPLv2.0
> Description : Setuptools is a collection of enhancements to the Python
>             : distutils that allow you to more easily build and distribute
>             : Python packages, especially ones that have dependencies on
>             : other packages.
>             :
>             : This package contains the runtime components of setuptools,
>             : necessary to execute the software that requires
>             : pkg_resources.py.
>
>
> Probably the most direct solution is to switch to
>
> Cheers,
> Tim
>
> On Tue, Apr 4, 2017 at 9:26 AM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> Are you planning to develop Impala or just use it? If just using it, this
>> question might be better suited for user@ rather than dev@.
>>
>> apache.org mailing lists automatically set reply-to to the list. I'm sure
>> people who reply will attempt to cc you, but they may forget. I'd suggest
>> that you will be more likely to get replies by subscribing to the list you
>> are sending it to.
>>
>> What version of Ubuntu are you on? What are the "necessary apt-gets" you
>> have done? How did you originally install Impala?
>>
>> On Tue, Apr 4, 2017 at 5:37 AM, Katelyn Mulgrew <katelyn.mulg...@bjss.com
>> >
>> wrote:
>>
>> > I am not part of the mailing list.  Could you send your response to me
>> at
>> > katelyn.mulg...@bjss.com<mailto:katelyn.mulg...@bjss.com>?  See the
>> > question below:
>> >
>> > From: Katelyn Mulgrew
>> > Sent: Monday, April 3, 2017 4:55 PM
>> > To: 'dev@impala.incubator.apache.org' <dev@impala.incubator.apache.org>
>> > Subject: Installation Problem?
>> >
>> > Impala experts,
>> >
>> > I am trying to run the impala shell (what I really want is to see if I
>> > installed impala properly).  I am on Ubuntu and have done all the
>> necessary
>> > apt-gets.  However, when I write "impala-shell" and hit return, I get a
>> > number of errors, as described here https://issues.apache.org/
>> > jira/browse/IMPALA-2863.  Can you help me?
>> >
>> > Sincerely,
>> >
>> > Katelyn Mulgrew
>> >
>>
>
>


Re: Installation Problem?

2017-04-04 Thread Tim Armstrong
Hi Katelyn,
  It looks like the immediate cause of the problem is that you have a
setuptools 0.7.x version installed. For whatever reason (it's arcane to me)
that version is incompatible with the standard Python installation tools
that we use for impala-shell. setuptools 0.6.x (which looks like what is
provided by default on CentOS 6) or any setuptools version >= 0.8 should
not have this problem.
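
A quick way to check which setuptools you have (a minimal sketch; run it
with the same Python that impala-shell uses):

import pkg_resources

# 0.7.x is the problematic range; 0.6.x or anything >= 0.8 is fine.
print(pkg_resources.get_distribution("setuptools").version)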

We use CentOS 6.4 pretty heavily and it looks like we have 0.6.10 installed
on those systems:

$ yum info python-setuptools
Name        : python-setuptools
Arch        : noarch
Version     : 0.6.10
Release     : 3.el6
Size        : 1.5 M
Repo        : installed
From repo   : centos-base
Summary     : Easily build and distribute Python packages
URL         : http://pypi.python.org/pypi/distribute
License     : Python or ZPLv2.0
Description : Setuptools is a collection of enhancements to the Python
            : distutils that allow you to more easily build and distribute
            : Python packages, especially ones that have dependencies on
            : other packages.
            :
            : This package contains the runtime components of setuptools,
            : necessary to execute the software that requires
            : pkg_resources.py.


Probably the most direct solution is to switch to

Cheers,
Tim

On Tue, Apr 4, 2017 at 9:26 AM, Jim Apple  wrote:

> Are you planning to develop Impala or just use it? If just using it, this
> question might be better suited for user@ rather than dev@.
>
> apache.org mailing lists automatically set reply-to to the list. I'm sure
> people who reply will attempt to cc you, but they may forget. I'd suggest
> that you will be more likely to get replies by subscribing to the list you
> are sending it to.
>
> What version of Ubuntu are you on? What are the "necessary apt-gets" you
> have done? How did you originally install Impala?
>
> On Tue, Apr 4, 2017 at 5:37 AM, Katelyn Mulgrew 
> wrote:
>
> > I am not part of the mailing list.  Could you send your response to me at
> > katelyn.mulg...@bjss.com?  See the
> > question below:
> >
> > From: Katelyn Mulgrew
> > Sent: Monday, April 3, 2017 4:55 PM
> > To: 'dev@impala.incubator.apache.org' 
> > Subject: Installation Problem?
> >
> > Impala experts,
> >
> > I am trying to run the impala shell (what I really want is to see if I
> > installed impala properly).  I am on Ubuntu and have done all the
> necessary
> > apt-gets.  However, when I write "impala-shell" and hit return, I get a
> > number of errors, as described here https://issues.apache.org/
> > jira/browse/IMPALA-2863.  Can you help me?
> >
> > Sincerely,
> >
> > Katelyn Mulgrew
> >
>


Upcoming changes to distcc scripts

2017-04-03 Thread Tim Armstrong
I'm about to start a merge for a patch that solves some of the issues with
distcc - switching between ASAN and non-ASAN builds and with ccache:
https://gerrit.cloudera.org/#/c/6493/ . Unfortunately it won't be
completely transparent. When you pull down the patch, you'll need to:

   - Open a new terminal or run "unset IMPALA_CXX_COMPILER" in your current
   terminal
   - Source bin/distcc/distcc_env.sh
   - Rebuild with buildall.sh

If you run into any problems, you can try cleaning out your cmake-generated
files.

   - ./bin/clean.sh && ./bin/create-test-configuration.sh


Cheers,
Tim


Re: Best practice when closing issues?

2017-04-03 Thread Tim Armstrong
I've always used "Resolve Issue".

I think there's only a distinction for more complex workflows. According to
JIRA, "Resolving an issue indicates that the developers are satisfied the
issue is finished" and "Closing an issue indicates that there is no more
work to be done on it, and that it has been verified as complete".


On Mon, Apr 3, 2017 at 2:32 AM, Lars Volker  wrote:

> Do we have a best practice for the Status of finished work? Currently we
> seem to use "Resolved" mostly, but sometimes also "Closed". When should I
> use which one?
>
> Thanks, Lars
>
> Status     Issues   Percentage
> Open         1491          29%
> Reopened       44           1%
> Resolved     3550          69%
> Closed         45           1%
>


Re: Kudu errors when using a custom toolchain

2017-03-28 Thread Tim Armstrong
I think the distcc_env.sh script went rogue and deleted all the *cmake*
files under your toolchain. I'd suggest blowing away Kudu in the toolchain
and bootstrapping again.
We fixed clean.sh to not do that a while ago but I just noticed that the
distcc script wasn't fixed.

On Tue, Mar 28, 2017 at 9:23 AM, Lars Volker  wrote:

> I followed the steps outlined here
> <https://cwiki.apache.org/confluence/display/IMPALA/Building+native-toolchain+from+scratch+and+using+with+Impala>
> but when trying to build against my local toolchain I get the error below.
> Does anyone know what I'm doing wrong? Thanks for the help, Lars.
>
> CMake Error at CMakeLists.txt:354 (find_package):
>   Could not find a package configuration file provided by "kuduClient" with
>   any of the following names:
>
> kuduClientConfig.cmake
> kuduclient-config.cmake
>
>   Add the installation prefix of "kuduClient" to CMAKE_PREFIX_PATH or set
>   "kuduClient_DIR" to a directory containing one of the above files.  If
>   "kuduClient" provides a separate development package or SDK, be sure it
> has
>   been installed.
>
>
> -- Configuring incomplete, errors occurred!
> See also "/home/lv/i5/CMakeFiles/CMakeOutput.log".
> Error in /home/lv/i5/bin/make_impala.sh at line 160: cmake .
> ${CMAKE_ARGS[@]}
>


Impala virtualenv bootstrapping

2017-03-07 Thread Tim Armstrong
Hi all,

I just landed a change in master (https://gerrit.cloudera.org/#/c/6218/)
that changes how the Python virtualenv is bootstrapped. This fixes a
problem people were hitting on Ubuntu 16.04 and also speeds up the
bootstrapping process.

I don't anticipate any problems but let me know if you see anything strange
as a result of the change.

- Tim


Re: Toolchain - versioning dependencies with the same version number

2017-02-28 Thread Tim Armstrong
I agree it's not too bad if you have a fat pipe to S3, but it's a pretty
bad regression in usability to make it the default, particularly with no
way to opt out.

The toolchain is almost 1GB though, which is pretty problematic to download
if a developer is on coffee-shop wifi, cellular wireless, airplane wifi,
etc. It'd also be pretty easy for a developer working offline to switch
branches, run buildall.sh, have gcc, etc., automatically deleted, and then be
stuck unable to build anything.


On Tue, Feb 28, 2017 at 9:07 AM, Henry Robinson <he...@apache.org> wrote:

> I'd prefer not to do that because it's something of a hack and generates
> too many artifacts if we make incremental build changes, not to mention the
> extra complexity required to make such a change because new tarballs might
> need to be uploaded.
>
>
>
>
> On Tue, Feb 28, 2017 at 8:55 AM Lars Volker <l...@cloudera.com> wrote:
>
> > Can we add another version string component like -1 or -impala1, or add a
> > dummy patch to the affected packages to allow for new versions with the
> > same upstream version? I think this is what Linux distributions commonly
> do
> > to have several versions of the same upstream version.
> >
> > On Feb 27, 2017 21:15, "Henry Robinson" <he...@cloudera.com> wrote:
> >
> > Yes, it would force re-downloading. At my office, downloading a toolchain
> takes a matter of seconds, so I'm not sure the cost is that great.
> > And if it turned out to be problematic, one could always change the
> > toolchain directory for different branches. Having something locally that
> sets IMPALA_TOOLCHAIN_DIR=${IMPALA_HOME}/${IMPALA_TOOLCHAIN_BUILD_ID}/
> would
> > work.
> >
> However I wouldn't want to force that behaviour into the toolchain
> scripts
> > because of the need for garbage collection it would raise - it wouldn't
> be
> clear when to delete old toolchains programmatically.
> >
> > On 27 February 2017 at 20:51, Tim Armstrong <tarmstr...@cloudera.com>
> > wrote:
> >
> > > Maybe I'm misunderstanding, but wouldn't that force re-downloading of
> the
> > > entire toolchain every time a developer switches between branches with
> > > different build IDs?
> > >
> > > I know some developers do that frequently, e.g. to try and reproduce
> bugs
> > > on older versions or backport patches.
> > >
> > > I agree it would be good to fix this, since I've run into this problem
> > > before, I'm just not quite sure what the best solution is. In the other
> > > case where I had this issue with LLVM I changed the version number (by
> > > appending noasserts-) to it, but that's really just a hack.
> > >
> > > -Tim
> > >
> > > On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson <he...@cloudera.com>
> > > wrote:
> > >
> > > > As Matt said, I have a patch that implements build ID-based
> versioning
> > at
> > > > https://gerrit.cloudera.org/#/c/6166/2.
> > > >
> > > > Does anyone want to take a look? If we could get this in soon it
> would
> > > help
> > > > smooth over the LZ4 change which is going in shortly.
> > > >
> > > > On 27 February 2017 at 14:21, Henry Robinson <he...@cloudera.com>
> > wrote:
> > > >
> > > > > I agree that that might be useful, and that it's a separately
> > > addressable
> > > > > problem.
> > > > >
> > > > > On 27 February 2017 at 14:18, Matthew Jacobs <m...@cloudera.com>
> > wrote:
> > > > >
> > > > >> Just catching up to this e-mail, though I had seen your code
> reviews
> > > > >> and I think this approach makes sense. An additional concern would
> > be
> > > > >> how to identify how a toolchain package was built, and AFAIK this
> is
> > > > >> tricky now if only the 'toolchain ID' is known. Before I saw this
> > > > >> e-mail I was thinking about this problem (which I think we can
> > address
> > > > >> separately), and that we might want to write the native-toolchain
> > git
> > > > >> hash with every toolchain build so that the exact build scripts
> are
> > > > >> associated with those build artifacts. I filed
> > > > >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> > > > >> problem.
> > > > >>
> > > > >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson <
&

Re: Toolchain - versioning dependencies with the same version number

2017-02-27 Thread Tim Armstrong
Maybe I'm misunderstanding, but wouldn't that force re-downloading of the
entire toolchain every time a developer switches between branches with
different build IDs?

I know some developers do that frequently, e.g. to try and reproduce bugs
on older versions or backport patches.

I agree it would be good to fix this, since I've run into this problem
before, I'm just not quite sure what the best solution is. In the other
case where I had this issue with LLVM, I changed the version number (by
appending noasserts-), but that's really just a hack.
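
For reference, my understanding is that the proposed check boils down to
something like the sketch below (the stamp file name is an assumption, not
what bootstrap_toolchain.py does today):

import os

def needs_redownload(dep_dir, current_build_id):
    # Re-fetch a cached dependency if it was downloaded under a different
    # IMPALA_TOOLCHAIN_BUILD_ID (the stamp file name here is hypothetical);
    # a missing stamp also forces a re-fetch.
    stamp = os.path.join(dep_dir, "toolchain-build-id")
    if not os.path.exists(stamp):
        return True
    with open(stamp) as f:
        return f.read().strip() != current_build_id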

-Tim

On Mon, Feb 27, 2017 at 4:35 PM, Henry Robinson  wrote:

> As Matt said, I have a patch that implements build ID-based versioning at
> https://gerrit.cloudera.org/#/c/6166/2.
>
> Does anyone want to take a look? If we could get this in soon it would help
> smooth over the LZ4 change which is going in shortly.
>
> On 27 February 2017 at 14:21, Henry Robinson  wrote:
>
> > I agree that that might be useful, and that it's a separately addressable
> > problem.
> >
> > On 27 February 2017 at 14:18, Matthew Jacobs  wrote:
> >
> >> Just catching up to this e-mail, though I had seen your code reviews
> >> and I think this approach makes sense. An additional concern would be
> >> how to identify how a toolchain package was built, and AFAIK this is
> >> tricky now if only the 'toolchain ID' is known. Before I saw this
> >> e-mail I was thinking about this problem (which I think we can address
> >> separately), and that we might want to write the native-toolchain git
> >> hash with every toolchain build so that the exact build scripts are
> >> associated with those build artifacts. I filed
> >> https://issues.cloudera.org/browse/IMPALA-5002 for this related
> >> problem.
> >>
> >> On Sat, Feb 25, 2017 at 10:22 PM, Henry Robinson 
> >> wrote:
> >> > As written, the toolchain can't apparently deal with the possibility
> of
> >> > build flags changing, but a dependency version remaining the same.
> >> >
> >> > LZ4 has never (afaict) been built with optimization enabled. I have a
> >> > commit that enables -O3, but that continues to produce artifacts for
> >> > lz4-1.7.5 with no version change. This is a problem because
> >> bootstrapping
> >> > the toolchain will fail to pick up the new binaries - because the
> >> > previously downloaded version is still in the local cache, and won't
> be
> >> > overwritten, because the version didn't change.
> >> >
> >> > I think the simplest way to fix this is to write the toolchain build
> ID
> >> to
> >> > the dependency version file (that's in the local cache only) when it's
> >> > downloaded. If that ID changes, the dependency will be re-downloaded.
> >> >
> >> > This has the disadvantage that any bump in IMPALA_TOOLCHAIN_BUILD_ID
> >> will
> >> > invalidate all dependencies, and bin/bootstrap_toolchain.py will
> >> > re-download all of them. My feeling is that that cost is better than
> >> trying
> >> > to individually determine whether a dependency has changed between
> >> > toolchain builds.
> >> >
> >> > Any thoughts on whether this is the right way to go?
> >> >
> >> > Henry
> >>
> >
> >
> >
> > --
> > Henry Robinson
> > Software Engineer
> > Cloudera
> > 415-994-6679 <(415)%20994-6679>
> >
>
>
>
> --
> Henry Robinson
> Software Engineer
> Cloudera
> 415-994-6679
>


Re: status-benchmark.cc compilation time

2017-02-27 Thread Tim Armstrong
I think for status-benchmark.cc we should just reduce the unrolling - I
don't see a valid reason to unroll a loop that many times unless you're
just testing the compiler. There's no reason we can't unroll the loop, say,
10 times and run that 100 times to get an equally valid result.

Todd's suggestion about just running the benchmark for a couple of
iterations is a reasonable idea, although I think it depends on whether the
benchmarks are once-off experiments (in which case it seems ok to let them
bit-rot) or are actually likely to be reused.

If we're going to maintain the benchmarks more actively, we should also
consider proactively disabling or removing once-off benchmarks that aren't
likely to be reused.

On Thu, Feb 23, 2017 at 10:26 AM, Henry Robinson  wrote:

> I think the main problem I want to avoid is paying the cost of linking,
> which is expensive for Impala as it often generates multi-hundred-MB
> binaries per benchmark or test.
>
> Building the benchmarks during GVO seems the best solution to that to me.
>
> On 23 February 2017 at 10:23, Todd Lipcon  wrote:
>
> > One thing we've found useful in Kudu to prevent bitrot of benchmarks is
> to
> > actually use gtest and gflags for the benchmark programs.
> >
> > We set some flag like --benchmark_num_rows or --benchmark_num_iterations
> > with a default that's low enough to only run for a second or two, and run
> > it as part of our normal test suite. Rarely catches any bugs, but serves
> to
> > make sure that the code keeps working. Then, when a developer wants to
> > actually test a change for performance, they can run it with
> > --num_iterations=.
> >
> > Doesn't help the weird case of status-benchmark where *compiling* takes
> 10
> > minutes... but I think the manual unrolling of 1000 status calls in there
> > is probably unrealistic anyway regarding how the different options
> perform
> > in a whole-program setting.
> >
> > -Todd
> >
> > On Thu, Feb 23, 2017 at 10:20 AM, Zachary Amsden 
> > wrote:
> >
> > > Yes.  If you take a look at the benchmark, you'll notice the JNI call
> to
> > > initialize the frontend doesn't even have the right signature anymore.
> > > That's one easy way to bitrot while still compiling.
> > >
> > > Even fixing that isn't enough to get it off the ground.
> > >
> > >  - Zach
> > >
> > > On Tue, Feb 21, 2017 at 11:44 AM, Henry Robinson 
> > > wrote:
> > >
> > > > Did you run . bin/set-classpath.sh before running expr-benchmark?
> > > >
> > > > On 21 February 2017 at 11:30, Zachary Amsden 
> > > wrote:
> > > >
> > > > > Unfortunately some of the benchmarks have actually bit-rotted.  For
> > > > > example, expr-benchmark compiles but immediately throws JNI
> > exceptions.
> > > > >
> > > > > On Tue, Feb 21, 2017 at 10:55 AM, Marcel Kornacker <
> > > mar...@cloudera.com>
> > > > > wrote:
> > > > >
> > > > > > I'm also in favor of not compiling it on the standard
> commandline.
> > > > > >
> > > > > > However, I'm very much against allowing the benchmarks to bitrot.
> > As
> > > > > > was pointed out, those benchmarks can be valuable tools during
> > > > > > development, and keeping them in working order shouldn't really
> > > impact
> > > > > > the development process.
> > > > > >
> > > > > > In other words, let's compile them as part of gvo.
> > > > > >
> > > > > > On Tue, Feb 21, 2017 at 10:50 AM, Alex Behm <
> > alex.b...@cloudera.com>
> > > > > > wrote:
> > > > > > > +1 for not compiling the benchmarks in -notests
> > > > > > >
> > > > > > > On Mon, Feb 20, 2017 at 7:55 PM, Jim Apple <
> jbap...@cloudera.com
> > >
> > > > > wrote:
> > > > > > >
> > > > > > >> > On which note, would anyone object if we disabled benchmark
> > > > > > compilation
> > > > > > >> by
> > > > > > >> > default when building the BE tests? I mean separating out
> > > -notests
> > > > > > into
> > > > > > >> > -notests and -build_benchmarks (the latter false by
> default).
> > > > > > >>
> > > > > > >> I think this is a great idea.
> > > > > > >>
> > > > > > >> > I don't mind if the benchmarks bitrot as a result, because
> we
> > > > don't
> > > > > > run
> > > > > > >> > them regularly or pay attention to their output except when
> > > > > > developing a
> > > > > > >> > feature. Of course, maybe an 'exhaustive' run should build
> the
> > > > > > benchmarks
> > > > > > >> > as well just to keep us honest, but I'd be happy if 95% of
> > > Jenkins
> > > > > > builds
> > > > > > >> > didn't bother.
> > > > > > >>
> > > > > > >> The pre-merge (aka GVM aka GVO) testing builds
> > > > > > >> http://jenkins.impala.io:8080/job/all-build-options, which
> > builds
> > > > > > >> without the "-notests" flag.
> > > > > > >>
> > > > > >
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Henry Robinson
> > > > Software Engineer
> > > > Cloudera
> > > > 415-994-6679
> > > >
> > >
> >
> >
> >
> > --
> > Todd Lipcon
> > Software Engineer, Cloudera
> >
>

Re: test_ddl_stress likely not getting regular, automated runs

2017-02-08 Thread Tim Armstrong
For what it's worth, there's a similar problem with TestSpillStress that I
came across. I was surprised when I realised it, too.
On Wed, Feb 8, 2017 at 3:38 PM, Michael Brown  wrote:

> Hello,
>
> Background: I am conducting an audit of metadata tests.
>
> A particular manifestation of IMPALA-3947 [0] is that the test module
> test_ddl_stress.py isn't runnable via any buildall.sh option. I expect
> this means test_ddl_stress.py isn't being run regularly except by
> conscientious developers running it locally. Note that this test was
> not disabled as part of the commit "IMPALA-2605: Omit the sort and
> mini stress tests".
>
> I'm interested to know any history around this.
>
> Could any community members who have been around for several years,
> especially those who work on the Catalog, shed light on this? Were you
> assuming test_ddl_stress.py was being run regularly? Or do you know to
> run it by hand? Do you know it to be reliable and useful, and it
> should be included in regular runs, or do you know it to be flaky and
> needing work?
>
> It's fine if no one knows any such answers, but they would lend
> context to getting test_ddl_stress running regularly.
>
> Thanks.
>
> [0] https://issues.cloudera.org/browse/IMPALA-3947
>


Re: Standards for committers and PPMC members

2017-02-07 Thread Tim Armstrong
I just wanted to follow up on this discussion so that non-PPMC members know
that this hasn't just stalled out. The PPMC has been actively working on
identifying and voting on committers. I think we converged more towards
Todd Lipcon and Jim Apple's view of things where we take a more inclusive
view of committership.

So watch this space.

- Tim

On Tue, Jan 31, 2017 at 8:52 AM, Jim Apple <jbap...@cloudera.com> wrote:

> Which kind of things do you think we should use for examples of the
> contributions of Larry, Mathilda, Nicholas, Omie, and Patrick? I was
> thinking things in tests/benchmark, tests/comparison, the rest of
> tests/, testdata/, bin/, and bug reports. Would that help clarify?
>
> In the first one, I wrote the examples and then I said how I would
> feel about them. Would it be more helpful if you wrote them and I
> (and, perhaps, other PPMC members) gave feedback?
>
> On Mon, Jan 30, 2017 at 10:52 AM, Michael Brown <mi...@cloudera.com>
> wrote:
> > I apologize for dropping the ball on this.
> >
> >> Would it help to have examples of candidates L, M, N, O, and P who
> focus on testing tools and infrastructure?
> >
> > Yes.
> >
> > On Wed, Jan 11, 2017 at 1:50 PM, Jim Apple <jbap...@cloudera.com> wrote:
> >> Do you have any thoughts about what specific type or format of
> >> feedback would help make it less of a black box? Would it help to have
> >> examples of candidates L, M, N, O, and P who focus on testing tools
> >> and infrastructure?
> >>
> >> On Wed, Jan 11, 2017 at 1:14 PM, Michael Brown <mi...@cloudera.com>
> wrote:
> >>>> What do you think, Michael?
> >>>
> >>> Thanks to you, Tim, and Todd for your thoughts. It still feels like a
> black
> >>> box, especially for those of us who tend to concentrate on testing
> tools
> >>> and infrastructure for Impala. Any feedback is appreciated.
> >>>
> >>> On Fri, Jan 6, 2017 at 8:53 AM, Jim Apple <jbap...@cloudera.com>
> wrote:
> >>>
> >>>> My feeling is similar to Tim's:
> >>>>
> >>>> It's the PPMC's responsibility, but a contributor is welcome to plead
> >>>> their case, ask for a mentor, and so on. I think we shouldn't consider
> >>>> it rude or pushy or aggressive to request committership. It is a
> >>>> compliment to Impala and the Impala community that the contributor
> >>>> want to be more involved.
> >>>>
> >>>> What do you think, Michael?
> >>>>
> >>>> On Fri, Jan 6, 2017 at 8:36 AM, Tim Armstrong <
> tarmstr...@cloudera.com>
> >>>> wrote:
> >>>> > Hi Michael,
> >>>> >   My two cents is that the PMC should be proactive about identifying
> >>>> > potential committers and working with them to address any gaps. We
> >>>> haven't
> >>>> > done a good job of that so far but we've started up some
> discussions on
> >>>> the
> >>>> > private list to get better at that.
> >>>> >
> >>>> > You should feel free to ask anyone on the PMC about any of the above
> >>>> > questions. Ideally that wouldn't be necessary, but in practice it
> may
> >>>> help
> >>>> > move things along, particularly if you have someone who will
> advocate for
> >>>> > you and wrangle the PMC to come to a consensus. It's definitely on
> us to
> >>>> > communicate to you what gaps (if any) there are - it shouldn't
> really be
> >>>> a
> >>>> > black box.
> >>>> >
> >>>> > - Tim
> >>>> >
> >>>> > On Fri, Jan 6, 2017 at 8:24 AM, Michael Brown <mi...@cloudera.com>
> >>>> wrote:
> >>>> >
> >>>> >> You've done a great job highlighting some example scenarios. Here
> are
> >>>> some
> >>>> >> questions that aren't addressed in your writeup.
> >>>> >>
> >>>> >> What are contributors' responsibilities to move toward
> committership? In
> >>>> >> particular, I'm talking about process, not the nuts and bolts of
> >>>> >> contributions (including patches, bugs, reviews).  For example:
> >>>> >>
> >>>> >> Should a contributor who wants to be a committer find a "mentor"?
> >>>> >>
> >>>> >> Should a 

Re: Can a committer please gvm this

2017-02-07 Thread Tim Armstrong
Started the merge

On Tue, Feb 7, 2017 at 8:13 AM, Bharath Vissapragada 
wrote:

> https://gerrit.cloudera.org/#/c/5828
>


Re: Impala Hbase Security

2017-02-02 Thread Tim Armstrong
Does anyone on dev@ know about this? I'm guessing we don't support
impersonation, but I have no idea if we support Kerberos - is that
automatically picked up by the HBase client?

On Thu, Feb 2, 2017 at 6:55 AM, Danny Morgan  wrote:

> Hi Everyone, any luck?
> --
> *From:* Danny Morgan 
> *Sent:* Friday, January 27, 2017 10:08:12 PM
> *To:* u...@impala.incubator.apache.org
> *Subject:* Impala Hbase Security
>
>
> Does Impala support HBase security? Can Impala impersonate end users when
> they access HBase?
>
>
> Does Impala work with Kerberized HBase?
>
>
> Thank You
>
>


Re: Can a Committer Please Carry the +2 and Submit this Change for me?

2017-02-01 Thread Tim Armstrong
Will do

On Wed, Feb 1, 2017 at 5:20 PM, Lars Volker  wrote:

> https://gerrit.cloudera.org/#/c/5611/12
>
> Patch Set 11 has a +2 from Marcel. In Patch Set 12 I rebased the change,
> replaced NULL with nullptr in one .cc file, and removed "// clang-format"
> control statements and TODOs.
>
> Thanks for the help, Lars
>


Re: Upgrade Snappy to 1.1.4?

2017-01-31 Thread Tim Armstrong
Nice! We should do this. Last time we upgraded it was an easy perf win -
we compress Parquet with Snappy by default and spend significant time
decompressing in scans.

I filed a JIRA: https://issues.cloudera.org/browse/IMPALA-4846

Is anyone interested in picking this up? It would require adding a new
version of snappy to the native-toolchain, then bumping the version in
Impala. Good way to learn about how we handle third-party dependencies.

- Tim

On Tue, Jan 31, 2017 at 10:28 AM, Todd Lipcon  wrote:

> I seem to recall that Impala uses snappy as the default codec for a lot of
> compression/decompression. May be worth upgrading to the latest release
> which claims a 20% improvement in decompression performance:
>
> https://github.com/google/snappy/blob/master/NEWS
>
> (just submitted a code review to Kudu to do the same, though we use LZ4
> more than snappy)
>
> -Todd
> --
> Todd Lipcon
> Software Engineer, Cloudera
>


Re: Hadoop Weekly #202

2017-01-29 Thread Tim Armstrong
But yeah, I don't know where they got that idea from; there wasn't an
official milestone where we claimed to support 16.04.

- Tim

On Sun, Jan 29, 2017 at 10:23 PM, Tim Armstrong <tarmstr...@cloudera.com>
wrote:

> We did a few things to fix the build on Ubuntu 16.04. AFAIK the build
artifacts can be deployed and will work on an Ubuntu 16.04 system. The tests
> still don't fully work because the way we handle dynamic library
> dependencies in the toolchain and impala-config with LD_LIBRARY_PATH is a
> bit broken.
>
> - Tim
>
> On Sun, Jan 29, 2017 at 10:00 PM, Jim Apple <jbap...@cloudera.com> wrote:
>
>> Hello, Impalas!
>>
>> This week, Hadoop Weekly said:
>>
>> -- Forwarded message --
>> From: Hadoop Weekly <i...@hadoopweekly.com>
>> Date: Sun, Jan 29, 2017 at 5:49 PM
>> Subject: Hadoop Weekly #202
>>
>> Hadoop Weekly
>> Issue #202
>> 29 January 2017
>>
>> ...
>>
>> Apache Impala (incubating) released version 2.8.0. The release fixes a
>> large number of bugs, adds support for Ubuntu 16.04, adds a number of
>> fixes to Kudu integration (support for ADD/DROP range partition,
>> completes support for ALTER commands), and more.
>>
>> https://lists.apache.org/thread.html/7ff6a68cbaab4133871adeb
>> 9851095d8dfe1d82ee6efb738e0fa9546@%3Cdev.impala.apache.org%3E
>>
>> ...
>>
>> 
>>
>>
>> Did we do that Ubuntu 16.04 thing? I don't see much about it:
>>
>> https://issues.cloudera.org/browse/IMPALA-4681?jql=project%
>> 20%3D%20IMPALA%20AND%20fixVersion%20%3D%20%22Impala%202.8.0%
>> 22%20AND%20text%20~%20%22ubuntu%22
>>
>> I also don't see this in
>>
>> git log 2.7.0..2.8.0
>>
>> or
>>
>> https://github.com/apache/incubator-impala/compare/2.7.0...2.8.0
>>
>> Where did Hadoop Weekly get this snippet from? Should we add a note
>> about promoting it to Hadoop Weekly on
>> https://cwiki.apache.org/confluence/display/IMPALA/DRAFT%3A+
>> How+to+Release
>> ?
>>
>
>

