from:"Philip Zeyliger \(Code Review\)"

[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard

2019-04-29 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13020 )

Change subject: IMPALA-8344: Add support for running the minicluster with 
S3Guard
..


Patch Set 2:

(6 comments)

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh@312
PS2, Line 312:   export 
S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore"
I looked into whether there were other interesting impls here and couldn't find 
any. 
hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java
 doesn't work across processes. If we had something that went cross-process, we 
wouldn't really need Dynamo in our world.


http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh
File bin/jenkins/release_cloud_resources.sh:

http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@29
PS2, Line 29: # This is currently only implemented for s3.
Consider asserting S3GUARD_DYNAMODB_TABLE and S3_BUCKET are non-empty...


http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@37
PS2, Line 37:   aws s3 rm --recursive --quiet 
s3://${S3_BUCKET}${TEST_WAREHOUSE_DIR}
This line and line 33 is useful enough that we should either do set +x or echo 
what we're about to do.


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py
File tests/common/impala_test_suite.py:

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@177
PS2, Line 177: cls.filesystem_client = S3Client(S3_BUCKET_NAME)
I think we could coalesce this to use HDFS/Hadoop always, thereby maybe 
removing the S3Client code?


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@178
PS2, Line 178: elif IS_ABFS:
 :   # ABFS is implemented via HDFS command line client
 :   cls.filesystem_client = HdfsCommandLineClient("ABFS")
 : elif IS_ADLS:
 :   cls.filesystem_client = ADLSClient(ADLS_STORE_NAME)
Do you know which, if any, of these are getting tested these days?


http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py
File tests/util/hdfs_util.py:

http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@149
PS2, Line 149: # This client is a wrapper around the hdfs command line. This is 
useful for filesystems
Use pydoc?

class Foo(bla):
  """bla bla bla"""



--
To view, visit http://gerrit.cloudera.org:8080/13020
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d
Gerrit-Change-Number: 13020
Gerrit-PatchSet: 2
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: David Knupp 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Mon, 29 Apr 2019 18:15:20 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0

2019-04-23 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13005 )

Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 
3.1.0
..


Patch Set 3:

(1 comment)

Nice work.

When I did the transitional work between Hive 1 and Hive 2, I introduced a 
variable that switched at build time between the two worlds. (See 
a203733fac3e1e37df8abeee39a88d187153a8c5 for the revert and "git log --grep 
IMPALA_MINICLUSTER_PROFILE")

If I'm understanding right, the approach here is to produce a single "binary" 
that works for both worlds? Or at run time do the "original" Hive jars get run? 
I think both approaches are plausible; just want to make sure we're clear about 
it.

(Is the shading slow? I've seen maven-shade-plugin be very slow...)

http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh
File bin/impala-config.sh:

http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@167
PS3, Line 167: export CDP_BUILD_NUMBER=994
This number looks like a CDH_BUILD_NUMBER, and is probably from the same name 
space, but looks like something you hand-wrote. You can ask the same oracle 
that CDH_BUILD_NUMBER uses for your own number which is guaranteed not to 
conflict in the future.



--
To view, visit http://gerrit.cloudera.org:8080/13005
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436
Gerrit-Change-Number: 13005
Gerrit-PatchSet: 3
Gerrit-Owner: Vihang Karajgaonkar 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sudhanshu Arora 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Comment-Date: Tue, 23 Apr 2019 17:10:25 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8288: avoid overflow causing DCHECK in PrettyPrint.

2019-03-08 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12702 )

Change subject: IMPALA-8288: avoid overflow causing DCHECK in PrettyPrint.
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12702
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I00d9769cf34e2ccd796ec1cf88797c8f8250f718
Gerrit-Change-Number: 12702
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 08 Mar 2019 21:09:40 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-07 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
..


Patch Set 3:

> "I've kicked off a suite of our tests with code coverage on. I'm
 > also going to kick off a set of tests with the JNI checking and
 > 'exhaustive' tests. Maybe there's coverage, but we're not running
 > the queries against HBase."
 >
 > Did anything new come up in this?
 >

Nothing new. I'm still chasing a sequence of these calls which lead to 
CallObjectMethod() not having its exception checked...

checked_jni_CallVoidMethodV
checked_jni_ExceptionOccurred
checked_jni_ExceptionClear
checked_jni_GetObjectClass
checked_jni_FindClass
checked_jni_GetMethodID
checked_jni_CallObjectMethod

I can trigger this with

set mem_limit=50m;use functional;insert into table alltypesinsert
partition (year, month) /* +noclustered */
select at1.id, at1.bool_col, at1.tinyint_col, at1.smallint_col, at1.int_col, 
at1.bigint_col,
  at1.float_col, at1.double_col, at1.date_string_col, at1.string_col, 
at1.timestamp_col,
  at1.year, at2.id as month
from  functional.alltypes at1, functional.alltypes at2;

when the sink gets closed after a failure.

I don't know where this stack of calls is at this point. I'll keep looking, but 
I don't want to block on it.


--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Thu, 07 Mar 2019 23:08:14 +
Gerrit-HasComments: No

[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@63
PS1, Line 63:   you'd actually want to integrate this with the general scratch 
space
> Currently we don't have global limits in TmpFileMgr, just per-query limits
It seems like spilling is going to be pretty sequential, so spinning cares less 
about SSD than caching does. That's another angle to reconcile.

Anyway--I agree that this is tractable. We don't want to run the user out of 
disk space, and evicting cache is a very reasonable thing to do.


http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@92
PS1, Line 92: useful. The buffer pool code currently provides buffers that are 
pinned
> I think having the pages pinned by default is the right thing for this case
Yep. I think the underlying question here is how much we want to "manage" this. 
In the traditional "local" case, we don't manage the OS buffer cache, and it 
definitely helps us out.

I could definitely imagine a world where Impala has a default cache size of 
~4GB that's always devoted to cache. The rest of Impala "managed" (as in, 
within memlimit) memory gets used by the cache if it's available, and evicted 
as necessary. The cache here is the "memory user of last resort".

Your point on observability is well-taken. Remote reads incur a write-to-cache 
cost that's kind of unexpected. Fortunately, we presume the disk is fairly idle 
given that the reads are happening over the network, but it'll be an important 
thing to surface.



--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 06 Mar 2019 20:14:26 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@28
PS1, Line 28: The code in this commit builds a cache with the simplest policy I 
could
Note that Kudu has an LRU cache that could be used pretty directly.



--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 06 Mar 2019 19:12:30 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12683 )

Change subject: Prototype for a remote read byte cache.
..


Patch Set 1:

I'm using Gerrit as a way to share this. It's certainly not ready for commit, 
but it may be worthy of discussion.


--
To view, visit http://gerrit.cloudera.org:8080/12683
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5
Gerrit-Change-Number: 12683
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:56:50 +
Gerrit-HasComments: No

[Impala-ASF-CR] Prototype for a remote read byte cache.

2019-03-06 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12683


Change subject: Prototype for a remote read byte cache.
..

Prototype for a remote read byte cache.

This code (which is far from ready for prime time) attempts to get to a
place where a byte cache for remote (either HDFS or S3 or ADLS) reads
can be evaluated. This commit message tries to write down what I've
figured out so far...

Hive LLAP found that it was useful to build such a cache.  (See
hive.git:llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCache.java
and neighbors.) In their case, they implemented an LRUF policy. They are also
caching the uncompressed ORC byte streams, which give them fairly consistent
chunk sizes (because the compression buffers are typically all the same size).
Alluxio also found it useful to build a cache, and it plugs in as a Hadoop
filesystem. Apache Ignite and Quobole Rubix are of interest. And, of course,
there's memcached, Redis, ehcache, and so on and on and on...

Hive LLAP found that SSDs work best for these caches, since they're
seek heavy. I think this implies that they'd be configured differently
than scratch dirs, since, in a machine with both SSDs and disk, you may
want to keep the cache off the disk. (Or you may want a three-tier policy..)

The code in this commit builds a cache with the simplest policy I could
implement (FIFO) and backs the cache with a large, memory-mapped file. There is
implicitly a two-level cache here: the OS page cache is handling which pages
are in memory, and the cache here is handling eviction from the cache
altogether. The metadata for the cache is stored in a map in memory,
though the linked list management of free space is directly in the buffers.
(I currently regret that, but I don't think it affects the evaluation goal;
a linked list without pointer arithmetic would have been more pleasant
to deal with.)

I hooked up the code through ReadFromPosInternal() which was sort of
easy.  Technically cached files don't need an hdfsFileOpen(), but I've
not optimized that.

I've not made the cache coalesce overlapping reads. I suspect that
in practice it may not be necessary, because how we read tables is pretty
consistent, but that's something that needs to be checked.

I've not made the cache do "compaction" because the FIFO policy means
that there's no fragmentation.

When breaking down the function of the cache, a few things have to happen:
* Impala has to read through the cache or store things in the cache.
  I've done this via wrapping ReadFromPosInternal()

* Impala has to specify the "value" of certain data; e.g., Parquet footers
  or index pages are more valuable than data pages of really large
  fact tables.

  I've not dealt with this.

* The cache has to be configured.

  I've exposed a pair of flags for directories and how much of
  the relevant free space to use. If this sort of thing is long-lived,
  you'd actually want to integrate this with the general scratch space
  management code in Impala: a spilling query could use disk space
  that's being used by a cache. (Though even spilling queries should
  have limits; the cache may be more valuable than the currently
  spilling query.)

* The cache has to expose metrics.

  I exposed hit/miss counters (in both count and bytes) on the query,
  but nothing global.

* The cache needs to have a notion of locality, so that files don't
  end up cached N times for an N-node cluster.

  I've taken advantage of the recent changes to make remote reads pin themselves
  to certain hosts (to make the remote file handle cache work). I don't know
  whether that currently leaves us with 3 or 1 copies of most files, but
  it's enough for evaluation.

* The cache needs to have decent threading, since we read from many threads.

  I've set it up so that you get multiple caches, one per dir, and,
  furthermore, each cache is split into 8. This roughly models
  having a pool of 8 threads reading from SSDs. I've not explored
  this tunable space.

I used neither the temporary file manager code nor the buffer pool code.
This is out of expedience/ignorance. I suspect the temporary file
manager code, and especially its ability to do encryption, would be very
useful. The buffer pool code currently provides buffers that are pinned
by default, and I didn't want to deal with pinning and unpinning. A more
complicated implementation would find much more re-use potential here.

I looked into using 2MB Transparent Huge Pages but found that mmap
doesn't do both "file-backed" and "THP." There's some performance
exploration to be done here, but I've not looked into it.

The cache doesn't survive restarts. This seems ok and certainly makes
state management easier, since we don't have to worry about consistency
for the metadata state in the face of a crash. The underlying file is
deleted at creation time. Users may end up getting confused

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-06 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
..


Patch Set 2:

(1 comment)

Yep, good catch. Thanks.

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc
File be/src/exec/hbase-table-scanner.cc:

http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc@562
PS2, Line 562:   RETURN_ERROR_IF_EXC(env);
> I think GetArrayLength does not throw an exception, so this check is probab
Done



--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 06 Mar 2019 18:13:58 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-06 Thread Philip Zeyliger (Code Review)

Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12660

to look at the new patch set (#3).

Change subject: IMPALA-8250: Clean up JNI warnings.
..

IMPALA-8250: Clean up JNI warnings.

Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp Tue Jan 15 10:43:31 2019 +
  +++ b/src/share/vm/prims/jniCheck.cpp Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) 
mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a 
non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int 
future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +thisone = true;
  +fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +frame fr = os::current_frame();
  +char buf[6000];
  +tty->print_cr("Thread: %s %d", thr->get_thread_name(), 
thr->osthread()->thread_id());
  +print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
 tty->print_cr("WARNING in native method: %s", msg);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
 tty->print_cr("WARNING in native method: JNI call made without 
checking exceptions when required to from %s",
   thr->get_pending_jni_exception_check());
 thr->print_stack();
  +  dump_native_stack(thr, true, 10);
   )
   thr->clear_pending_jni_exception_check(); // Just complain once
 }
   }

  +
  +
   /**
* Add to the planned number of handles. I.e. plus current live & warning 
threshold
*/
  @@ -254,9 +276,12 @@
 tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
 live_handles, planned_capacity);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   )
   // Complain just the once, reset to current + warn threshold
   add_planned_handle_capacity(handles, 0);
  +  } else {
  +dump_native_stack(thr, false, 0);
 }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M tests/query_test/test_insert.py
7 files changed, 55 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/3
--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon

[Impala-ASF-CR] Simplify JNI exception-handling macros.

2019-03-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12659 )

Change subject: Simplify JNI exception-handling macros.
..


Patch Set 3: Code-Review+2

Rebase; carrying +2.


--
To view, visit http://gerrit.cloudera.org:8080/12659
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c
Gerrit-Change-Number: 12659
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 06 Mar 2019 00:59:11 +
Gerrit-HasComments: No

[Impala-ASF-CR] Fix code coverage argument to bin/run-all-tests.sh

2019-03-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12673 )

Change subject: Fix code coverage argument to bin/run-all-tests.sh
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12673
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I0244cae9f67b184c931616c58c42c62cabeb0627
Gerrit-Change-Number: 12673
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 06 Mar 2019 00:50:06 +
Gerrit-HasComments: No

[native-toolchain-CR] Add ccache support

2019-03-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12574 )

Change subject: Add ccache support
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12574/2/docker/all/postinstall.sh
File docker/all/postinstall.sh:

http://gerrit.cloudera.org:8080/#/c/12574/2/docker/all/postinstall.sh@54
PS2, Line 54:   make -j
We often use -j$(nproc)... don't know that it matters.



-- 
To view, visit http://gerrit.cloudera.org:8080/12574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieef291d1294a204b0b8da7e7aa4fa642cdd5e144
Gerrit-Change-Number: 12574
Gerrit-PatchSet: 2
Gerrit-Owner: Hector Acosta 
Gerrit-Reviewer: Hector Acosta 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 05 Mar 2019 21:30:59 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Simplify JNI exception-handling macros.

2019-03-05 Thread Philip Zeyliger (Code Review)

Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12659

to look at the new patch set (#2).

Change subject: Simplify JNI exception-handling macros.
..

Simplify JNI exception-handling macros.

This simplifies the implementations of ABORT_IF_EXC() and
CLEAN_EXIT_IF_EXC() by reducing them to usages
of JniUtil::GetJniExceptionMsg() and our typical Status-handling
macros. The existing implementations didn't clear
the exception before calling additional Java methods,
whereas JniUtil::GetJniExceptionMsg() already handled
that correctly.

I tested this manually by renaming a class in catalogd's startup
sequence to a class that doesn't exist and checked that the crash
logging was useful, containing both the Java and C++ stack traces
with both macros.

This is part of IMPALA-8250.

Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c
---
M be/src/util/jni-util.h
1 file changed, 9 insertions(+), 27 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/12659/2
--
To view, visit http://gerrit.cloudera.org:8080/12659
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c
Gerrit-Change-Number: 12659
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-05 Thread Philip Zeyliger (Code Review)

Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12660

to look at the new patch set (#2).

Change subject: IMPALA-8250: Clean up JNI warnings.
..

IMPALA-8250: Clean up JNI warnings.

Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp Tue Jan 15 10:43:31 2019 +
  +++ b/src/share/vm/prims/jniCheck.cpp Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) 
mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a 
non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int 
future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +thisone = true;
  +fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +frame fr = os::current_frame();
  +char buf[6000];
  +tty->print_cr("Thread: %s %d", thr->get_thread_name(), 
thr->osthread()->thread_id());
  +print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
 tty->print_cr("WARNING in native method: %s", msg);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
 tty->print_cr("WARNING in native method: JNI call made without 
checking exceptions when required to from %s",
   thr->get_pending_jni_exception_check());
 thr->print_stack();
  +  dump_native_stack(thr, true, 10);
   )
   thr->clear_pending_jni_exception_check(); // Just complain once
 }
   }

  +
  +
   /**
* Add to the planned number of handles. I.e. plus current live & warning 
threshold
*/
  @@ -254,9 +276,12 @@
 tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
 live_handles, planned_capacity);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   )
   // Complain just the once, reset to current + warn threshold
   add_planned_handle_capacity(handles, 0);
  +  } else {
  +dump_native_stack(thr, false, 0);
 }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M be/src/service/frontend.h
M tests/query_test/test_insert.py
7 files changed, 56 insertions(+), 10 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/2
--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12660 )

Change subject: IMPALA-8250: Clean up JNI warnings.
..


Patch Set 1:

(3 comments)

> (3 comments)
 >
 > I'm seeing a lot of places in the HBase scanner where we call
 > GetByteArrayRegion or GetObjectArrayElement without checking for
 > exceptions (these can both throw ArrayIndexOutOfBoundsExceptions).
 > I'm not sure why -Xcheck:jni wouldn't detect these, but perhaps be
 > should do a scrub of the rest of the Impala code to make sure we
 > are checking for exceptions after each array call (there are
 > several other array-based JNI ops that throw ArrayIndexOutOfBoundsException
 > - 
 > https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html).
 > We can do that in a different patch though since it might be a
 > larger change.

Yeah: this is surprising. I read the jniCheck.cpp code and it should be 
handling the GetObjectArray... methods. The most likely scenario is that our 
tests don't have coverage here. Or jniCheck isn't doing its business.

I'm going to sprinkle in some of these where it's obvious.

I've kicked off a suite of our tests with code coverage on. I'm also going to 
kick off a set of tests with the JNI checking and 'exhaustive' tests. Maybe 
there's coverage, but we're not running the queries against HBase.


This discussion (from jniCheck.cpp) is also interesting. I didn't see the code 
that didn't warn about ArrayIndexOutOfBounds, but maybe that's what's happening.


/**
 * Check whether or not a programmer has actually checked for exceptions. 
According
 * to the JNI Specification ("jni/spec/design.html#java_exceptions"):
 *
 * There are two cases where the programmer needs to check for exceptions 
without
 * being able to first check an error code:
 *
 * - The JNI functions that invoke a Java method return the result of the Java 
method.
 * The programmer must call ExceptionOccurred() to check for possible exceptions
 * that occurred during the execution of the Java method.
 *
 * - Some of the JNI array access functions do not return an error code, but may
 * throw an ArrayIndexOutOfBoundsException or ArrayStoreException.
 *
 * In all other cases, a non-error return value guarantees that no exceptions 
have been thrown.
 *
 * Programmers often defend against ArrayIndexOutOfBoundsException, so warning
 * for these functions would be pedantic.
 */

http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG@9
PS1, Line 9: Using HDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings 
related to
> It's LIBHDFS_OPTS right?
Done


http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc
File be/src/runtime/hbase-table-factory.cc:

http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc@97
PS1, Line 97: if (!s.ok()) LOG(INFO) << "Exception when cleaning up HBase" 
<< s;
> nit: space after HBase
Done


http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc@116
PS1, Line 116: ABORT_IF_ERROR(JniUtil::LoadJniMethod(jni_env, fe_class_, 
&(methods[i])));
> Might be understanding the code wrong, but shouldn't these be global refs s
Based on 
https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#accessing_fields_and_methods
 , 
https://stackoverflow.com/questions/2093112/why-i-should-not-reuse-a-jclass-and-or-jmethodid-in-jni
 and 
http://www.latkin.org/blog/2016/02/01/jni-object-lifetimes-quick-reference/ 
(very nice!) and 
https://developer.android.com/training/articles/perf-jni.html#local_and_global_references
 say that jmethodid (which is the type here) doesn't need global references, 
but is invalidated if the class gets unloaded. Fortunately, the class is kept 
alive because fe_ stores it.

I thought fe_class_ had the problem you suggest (it's a local reference), but 
it turns out that it's not used outside this method. I removed it from the 
class to make that obvious.



--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 05 Mar 2019 20:34:15 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Remove unused THROW IF ERROR WITH LOGGING, THROW IF EXC, and RETURN IF EXC.

2019-03-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12658 )

Change subject: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and 
RETURN_IF_EXC.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12658/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12658/1//COMMIT_MSG@7
PS1, Line 7: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and 
RETURN_IF_EXC.
> nit: Add "IMPALA-8250: "
Done


http://gerrit.cloudera.org:8080/#/c/12658/1//COMMIT_MSG@9
PS1, Line 9: These macros have no references, so removing them. Found while
> So were these removed purely for cleanup purposes, or was there something w
Added to the commit message. Some of them were broken.

Note how ExceptionClear() is called after CallStaticObjectMethod()...

-jthrowable exc = (env)->ExceptionOccurred(); \
-if (exc != nullptr) { \
-  DCHECK((throwable_to_string_id_) != nullptr); \
-  jstring stack = (jstring) 
env->CallStaticObjectMethod(JniUtil::jni_util_class(), \
-  (JniUtil::throwable_to_stack_trace_id()), exc); \
-  jboolean is_copy; \
-  const char* c_stack = \
-  reinterpret_cast((env)->GetStringUTFChars(stack, 
_copy)); \
-  (env)->ExceptionClear(); \



--
To view, visit http://gerrit.cloudera.org:8080/12658
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96
Gerrit-Change-Number: 12658
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 05 Mar 2019 19:07:56 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8250: Remove unused THROW IF ERROR WITH LOGGING, THROW IF EXC, and RETURN IF EXC.

2019-03-05 Thread Philip Zeyliger (Code Review)

Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12658

to look at the new patch set (#2).

Change subject: IMPALA-8250: Remove unused THROW_IF_ERROR_WITH_LOGGING, 
THROW_IF_EXC, and RETURN_IF_EXC.
..

IMPALA-8250: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and 
RETURN_IF_EXC.

These macros have no references, so removing them. Found while
investigating JNI warnings in IMPALA-8250.

The following one-liner does a hap-hazard job of finding similar issues.
There are relatively few left, and none of them seemed problematic.

  (for x in $(git grep '#define' be/src | grep -v be/src/kudu | grep -v 
be/src/gutil | grep -v be/src/thirdparty | grep -v be/src/transport/config.h | 
awk '{ print $2 }' | sed -e 's,(.*,,' | sort | uniq); do echo -n "$x "; git 
grep -e $x | wc -l; done) | awk '$2 == 1 { print }'

Note that some of the macros were broken, because they
called other JNI functions before calling ExceptionClear().

Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96
---
M be/src/util/jni-util.h
1 file changed, 0 insertions(+), 41 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/12658/2
--
To view, visit http://gerrit.cloudera.org:8080/12658
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96
Gerrit-Change-Number: 12658
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Reviewer: Todd Lipcon

[Impala-ASF-CR] IMPALA-8100: Add initial support for Ranger

2019-03-04 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12632 )

Change subject: IMPALA-8100: Add initial support for Ranger
..


Patch Set 6:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12632/6/be/src/service/frontend.cc
File be/src/service/frontend.cc:

http://gerrit.cloudera.org:8080/#/c/12632/6/be/src/service/frontend.cc@41
PS6, Line 41: 
"org.apache.impala.authorization.sentry.SentryAuthorizationFactory",
> I'm actually having a second thought whether it's a good idea to make Autho
If we expose it as a pluggable thing, we have to maintain the pluggable API 
forever. Unless someone is clamoring for it, I'd rather we lock it down by 
default. Not a blocking concern.



--
To view, visit http://gerrit.cloudera.org:8080/12632
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8cad9e609d20aae1ff645c84fd58a02afee70276
Gerrit-Change-Number: 12632
Gerrit-PatchSet: 6
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 05 Mar 2019 05:07:08 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Simplify JNI exception-handling macros.

2019-03-04 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12659


Change subject: Simplify JNI exception-handling macros.
..

Simplify JNI exception-handling macros.

This simplifies the implementations of ABORT_IF_EXC() and
CLEAN_EXIT_IF_EXC() by reducing them to usages
of JniUtil::GetJniExceptionMsg() and our typical Status-handling
macros. The existing implementations didn't clear
the exception before calling additional Java methods,
whereas JniUtil::GetJniExceptionMsg() already handled
that correctly.

I tested this manually by renaming a class in catalogd's startup
sequence to a class that doesn't exist and checked that the crash
logging was useful, containing both the Java and C++ stack traces
with both macros.

This is part of IMPALA-8250.

Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c
---
M be/src/util/jni-util.h
1 file changed, 9 insertions(+), 27 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/12659/1
--
To view, visit http://gerrit.cloudera.org:8080/12659
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c
Gerrit-Change-Number: 12659
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] Remove unused THROW IF ERROR WITH LOGGING, THROW IF EXC, and RETURN IF EXC.

2019-03-04 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12658


Change subject: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and 
RETURN_IF_EXC.
..

Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and RETURN_IF_EXC.

These macros have no references, so removing them. Found while
investigating JNI warnings in IMPALA-8250.

The following one-liner does a hap-hazard job of finding similar issues.
There are relatively few left, and none of them seemed problematic.

  (for x in $(git grep '#define' be/src | grep -v be/src/kudu | grep -v 
be/src/gutil | grep -v be/src/thirdparty | grep -v be/src/transport/config.h | 
awk '{ print $2 }' | sed -e 's,(.*,,' | sort | uniq); do echo -n "$x "; git 
grep -e $x | wc -l; done) | awk '$2 == 1 { print }'

Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96
---
M be/src/util/jni-util.h
1 file changed, 0 insertions(+), 41 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/12658/1
--
To view, visit http://gerrit.cloudera.org:8080/12658
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96
Gerrit-Change-Number: 12658
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.

2019-03-04 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12660


Change subject: IMPALA-8250: Clean up JNI warnings.
..

IMPALA-8250: Clean up JNI warnings.

Using HDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to
(a) checking for exceptions and (b) leaking local references.

Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC
left and right. I chose not to expand the JniCall infrastructure
to handle this more generally at the moment.

The leaky local references are a bit harder. In the logs, they show up
as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A
few of these errors seem to be not in our code.  The ones that I've
found in our code stemmed from HBaseTableScanner::GetRowKey(): this
method uses local references and wasn't returning them. Using a
JniLocalFrame seems to have taken care of the warnings.

I have added code to skip test_large_strings when JNI checking is
enabled. This test takes forever (presumably because JNI is checking
bounds on strings very aggressively), and times out. The time out also
causes some metric-related checks to fail (since a query is still in
flight).

Debugging this required customizing my JDK to give stack traces
when these warnings occurred. The following diff facilitated
this.

  diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp
  --- a/src/share/vm/prims/jniCheck.cpp Tue Jan 15 10:43:31 2019 +
  +++ b/src/share/vm/prims/jniCheck.cpp Wed Feb 27 11:57:13 2019 -0800
  @@ -143,11 +143,30 @@
   static const char * fatal_instance_field_mismatch = "Field type (instance) 
mismatch in JNI get/set field operations";
   static const char * fatal_non_string = "JNI string operation received a 
non-string";

  +// thisone: whether to print every time, or maybe, depending on future
  +// how many future stacks we want printed (totally racy); helps catch
  +// missing exception handling if there's a way to tickle that code
  +// reliably.
  +static inline void dump_native_stack(JavaThread* thr, bool thisone, int 
future) {
  +  static int fut_stacks = 0; // racy!
  +  if (fut_stacks > 0) {
  +thisone = true;
  +fut_stacks--;
  +  }
  +  if (future > 0) fut_stacks = future;
  +  if (thisone) {
  +frame fr = os::current_frame();
  +char buf[6000];
  +tty->print_cr("Thread: %s %d", thr->get_thread_name(), 
thr->osthread()->thread_id());
  +print_native_stack(tty, fr, thr, buf, sizeof(buf));
  +  }
  +}

   // When in VM state:
   static void ReportJNIWarning(JavaThread* thr, const char *msg) {
 tty->print_cr("WARNING in native method: %s", msg);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   }

   // When in NATIVE state:
  @@ -199,11 +218,14 @@
 tty->print_cr("WARNING in native method: JNI call made without 
checking exceptions when required to from %s",
   thr->get_pending_jni_exception_check());
 thr->print_stack();
  +  dump_native_stack(thr, true, 10);
   )
   thr->clear_pending_jni_exception_check(); // Just complain once
 }
   }

  +
  +
   /**
* Add to the planned number of handles. I.e. plus current live & warning 
threshold
*/
  @@ -254,9 +276,12 @@
 tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu",
 live_handles, planned_capacity);
 thr->print_stack();
  +  dump_native_stack(thr, true, 0);
   )
   // Complain just the once, reset to current + warn threshold
   add_planned_handle_capacity(handles, 0);
  +  } else {
  +dump_native_stack(thr, false, 0);
 }
   }

Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
---
M be/src/catalog/catalog.cc
M be/src/exec/hbase-table-scanner.cc
M be/src/exprs/hive-udf-call.cc
M be/src/runtime/hbase-table-factory.cc
M be/src/service/frontend.cc
M tests/query_test/test_insert.py
6 files changed, 43 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/1
--
To view, visit http://gerrit.cloudera.org:8080/12660
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f
Gerrit-Change-Number: 12660
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks

2019-02-26 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12557 )

Change subject: IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d5bd93b4b09682a8c8248e7aa123d23d27cfeb4
Gerrit-Change-Number: 12557
Gerrit-PatchSet: 3
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Tue, 26 Feb 2019 23:30:51 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Fix a handful of JNI usage errors.

2019-02-26 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12582 )

Change subject: IMPALA-8250: Fix a handful of JNI usage errors.
..


Patch Set 2:

Added a JIRA. Yeah; I'm still working on this, but this is the lowest-hanging 
fruit.


--
To view, visit http://gerrit.cloudera.org:8080/12582
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6
Gerrit-Change-Number: 12582
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Sahil Takiar 
Gerrit-Comment-Date: Tue, 26 Feb 2019 21:43:20 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8250: Fix a handful of JNI usage errors.

2019-02-26 Thread Philip Zeyliger (Code Review)

Hello Lars Volker, Sahil Takiar, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12582

to look at the new patch set (#2).

Change subject: IMPALA-8250: Fix a handful of JNI usage errors.
..

IMPALA-8250: Fix a handful of JNI usage errors.

Running Impala with LIB_HDFS_OPTS=-Xcheck:jni revealed a case of "FATAL
ERROR in native method: Bad global or local ref passed to JNI" as well
as 'WARNING in native method: JNI FindClass received a bad class
descriptor "Lorg/apache/impala/common/Pair;"' This commit fixes these.

There are additional JNI warnings that I'm working through
as well as a fundamental question about how to avoid these
systematically.

Kudos to Lars Volker and Sahil Takiar who co-found these issues.

Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6
---
M be/src/catalog/catalog-util.cc
M be/src/util/logging-support.cc
2 files changed, 4 insertions(+), 4 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12582/2
--
To view, visit http://gerrit.cloudera.org:8080/12582
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6
Gerrit-Change-Number: 12582
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Sahil Takiar

[Impala-ASF-CR] Fix a handful of JNI usage errors.

2019-02-25 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12582


Change subject: Fix a handful of JNI usage errors.
..

Fix a handful of JNI usage errors.

Running Impala with LIB_HDFS_OPTS=-Xcheck:jni revealed a case of "FATAL
ERROR in native method: Bad global or local ref passed to JNI" as well
as 'WARNING in native method: JNI FindClass received a bad class
descriptor "Lorg/apache/impala/common/Pair;"' This commit fixes these.

There are additional JNI warnings that I'm working through
as well as a fundamental question about how to avoid these
systematically.

Kudos to Lars Volker and Sahil Takiar who co-found these issues.

Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6
---
M be/src/catalog/catalog-util.cc
M be/src/util/logging-support.cc
2 files changed, 4 insertions(+), 4 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12582/1
--
To view, visit http://gerrit.cloudera.org:8080/12582
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6
Gerrit-Change-Number: 12582
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] Allow CMake to find JNI libraries from JDK

2019-02-25 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12580


Change subject: Allow CMake to find JNI libraries from JDK
..

Allow CMake to find JNI libraries from JDK

JNI libraries can be in JAVA_HOME/jre/lib/amd64 or
JAVA_HOME/lib/amd64. We were missing one entry in
the list of places to look.

This came up when I built a custom OpenJDK for myself
and wanted to use it for building.

Change-Id: I6e9f9e5b96e2a1c3c0b0ad6cae1a34ca22c1ec19
---
M cmake_modules/FindJNI.cmake
1 file changed, 1 insertion(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/12580/1
--
To view, visit http://gerrit.cloudera.org:8080/12580
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I6e9f9e5b96e2a1c3c0b0ad6cae1a34ca22c1ec19
Gerrit-Change-Number: 12580
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[native-toolchain-CR] Add ccache support

2019-02-25 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12574 )

Change subject: Add ccache support
..


Patch Set 1: Code-Review+1

(2 comments)

Looks reasonable to me. Nice speed improvement.

http://gerrit.cloudera.org:8080/#/c/12574/1/docker/debian7.df
File docker/debian7.df:

http://gerrit.cloudera.org:8080/#/c/12574/1/docker/debian7.df@43
PS1, Line 43:   wget "https://www.samba.org/ftp/ccache/ccache-3.3.3.tar.gz; && \
I recommend moving this into a shell script.

We've also tended to sha-check our binaries. E.g.:

bin/bootstrap_system.sh:redhat sha512sum -c - <<< 
'c8321aa223f70d7e64d3d0274263000cfffb46fbea61488534e26f9f0245d99e9872d0888e35cd3274416392a13f80c748c07750caaeffa5f9cae1220020715f
  apache-ant-1.9.13-bin.tar.gz'


http://gerrit.cloudera.org:8080/#/c/12574/1/init.sh
File init.sh:

http://gerrit.cloudera.org:8080/#/c/12574/1/init.sh@138
PS1, Line 138: : ${USE_CCACHE=0}
 : export USE_CCACHE
 :
 : : ${CCACHE_MAXSIZE=50G}
 : export CCACHE_MAXSIZE
 :
 : : ${CCACHE_DIR=$SOURCE_DIR/ccache}
 : export CCACHE_DIR
 :
 : # Default ccache_compilercheck is mtime, which considers CC's 
mtime + size
 : # to determine if there's a hit. Setting CCACHE_COMPILERCHECK to 
'content'
 : # uses the hash of the compiler instead.
 : export CCACHE_COMPILERCHECK=${CCACHE_COMPILERCHECK:-content}
 :
 : export CCACHE_COMPRESS=1
Do you think these would be better encapsulated in a .ccache.conf file that we 
create when we bootstrap ccache?



--
To view, visit http://gerrit.cloudera.org:8080/12574
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ieef291d1294a204b0b8da7e7aa4fa642cdd5e144
Gerrit-Change-Number: 12574
Gerrit-PatchSet: 1
Gerrit-Owner: hector.aco...@cloudera.com 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 25 Feb 2019 17:36:38 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks

2019-02-22 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12557 )

Change subject: IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks
..


Patch Set 1:

Could you extract the function and add a unit test that the regexp does what 
it's supposed to?

Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/12557
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4d5bd93b4b09682a8c8248e7aa123d23d27cfeb4
Gerrit-Change-Number: 12557
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Fri, 22 Feb 2019 21:15:21 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8178: Disable file handle cache for HDFS erasure coded files

2019-02-21 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12552 )

Change subject: IMPALA-8178: Disable file handle cache for HDFS erasure coded 
files
..


Patch Set 1: Code-Review+2

(2 comments)

Thanks.

http://gerrit.cloudera.org:8080/#/c/12552/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12552/1//COMMIT_MSG@12
PS1, Line 12: unbuffer(), so these file handles are not suitable for long
Maybe mention that we used /jmx to get at at least part of the memory usage?


http://gerrit.cloudera.org:8080/#/c/12552/1/be/src/runtime/io/scan-range.cc
File be/src/runtime/io/scan-range.cc:

http://gerrit.cloudera.org:8080/#/c/12552/1/be/src/runtime/io/scan-range.cc@463
PS1, Line 463:   DCHECK(buffer_opts.client_buffer_ == nullptr ||
I think you could reasonably assert:

DCHECK(!(expected_local && is_erasure_coded))

I.e., if something is expected_local, it can't possibly be erasure coded.



--
To view, visit http://gerrit.cloudera.org:8080/12552
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I8c761e08aacc952de0033a4c91e07f15c8ec96da
Gerrit-Change-Number: 12552
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 21 Feb 2019 23:59:11 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8168: Disable Sentry HDFS sync on S3

2019-02-21 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12482 )

Change subject: IMPALA-8168: Disable Sentry HDFS sync on S3
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12482
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifa12f163a32bb6e597323149fbe565a6f66f5e69
Gerrit-Change-Number: 12482
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 21 Feb 2019 16:52:23 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8233: Do not re-download Ranger if it is already downloaded

2019-02-21 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12541 )

Change subject: IMPALA-8233: Do not re-download Ranger if it is already 
downloaded
..


Patch Set 1: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12541/1/bin/bootstrap_toolchain.py
File bin/bootstrap_toolchain.py:

http://gerrit.cloudera.org:8080/#/c/12541/1/bin/bootstrap_toolchain.py@425
PS1, Line 425:   if not version:
 : raise Exception("Could not find version for Ranger in 
environment var {0}"
 : .format(env_var_version))
While you're here, do you want to move this to between line 421 and 422, so 
it's closer?



--
To view, visit http://gerrit.cloudera.org:8080/12541
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iec3b200bda11d00bba6a250461b37c599d8d1adf
Gerrit-Change-Number: 12541
Gerrit-PatchSet: 1
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 21 Feb 2019 16:54:57 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8234: Fix ordering of Thrift enum, fix enum values, add warning

2019-02-20 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12543 )

Change subject: IMPALA-8234: Fix ordering of Thrift enum, fix enum values, add 
warning
..


Patch Set 3: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/12543
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: If215f16a636008757ceb439edbd6900a1be88c59
Gerrit-Change-Number: 12543
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 21 Feb 2019 04:00:01 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8185: Abstract out real/mock file system operations

2019-02-20 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12437 )

Change subject: IMPALA-8185: Abstract out real/mock file system operations
..


Patch Set 4:

(2 comments)

Just kibitzing: would be good to move the comments to the interface so people 
can tell what's going on?

http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/FileSystemFacade.java
File fe/src/main/java/org/apache/impala/analysis/FileSystemFacade.java:

http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/FileSystemFacade.java@64
PS4, Line 64:   ScanAllocation allocateScans(FeFsTable table, TScanRangeSpec 
scanRangeSpecs,
Where do we document what this does?


http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/HdfsFileSystemFacade.java
File fe/src/main/java/org/apache/impala/analysis/HdfsFileSystemFacade.java:

http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/HdfsFileSystemFacade.java@102
PS4, Line 102:   public ScanAllocation allocateScans(FeFsTable table, 
TScanRangeSpec scanRangeSpecs,
Do you know what this is used for?

It's kind of different from the C++ version of the same 
(be/src/scheduling/scheduler.h). I don't think it matters, but it's weirdo.

I'd basically be surprised of computeNumNodes() isn't just:

max(1, min(numExecutors, numBlocks))

i.e., always at least 1 and at most numExecutors, but determined by the number 
of blocks.



--
To view, visit http://gerrit.cloudera.org:8080/12437
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1a385923b64c9fb59cc6e700ee7ee14919398e6d
Gerrit-Change-Number: 12437
Gerrit-PatchSet: 4
Gerrit-Owner: Paul Rogers 
Gerrit-Reviewer: Bharath Vissapragada 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 22:43:04 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Logging current database context when logging analyzing queries.

2019-02-20 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12301 )

Change subject: Logging current database context when logging analyzing queries.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12301/1/fe/src/main/java/org/apache/impala/service/Frontend.java
File fe/src/main/java/org/apache/impala/service/Frontend.java:

http://gerrit.cloudera.org:8080/#/c/12301/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1243
PS1, Line 1243: + queryCtx.session.database);
> In Java, a general rule is to use accessor methods, but since this is Thrif
I'm not using accessor methods because that was the local style.

It's a little selfish, but if the query doesn't have newlines in it (as many 
queries don't), then you get one query per log line, which is super handy for 
grep.

I can certainly move the db up, like:

Analyzing query (default db: ...): 

if you think that looks better.



--
To view, visit http://gerrit.cloudera.org:8080/12301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I70886be4685e0d3ae7ca9f6e57e8159dc39c67c7
Gerrit-Change-Number: 12301
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 19:21:54 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Adding hostname to Disk I/O errors.

2019-02-20 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12402 )

Change subject: Adding hostname to Disk I/O errors.
..


Patch Set 1:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/12402/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12402/1//COMMIT_MSG@16
PS1, Line 16: sprinked
> nit: typo
Done


http://gerrit.cloudera.org:8080/#/c/12402/1/be/src/runtime/io/disk-io-mgr.cc
File be/src/runtime/io/disk-io-mgr.cc:

http://gerrit.cloudera.org:8080/#/c/12402/1/be/src/runtime/io/disk-io-mgr.cc@297
PS1, Line 297:
> nit: while you're here, remove double space?
Done



--
To view, visit http://gerrit.cloudera.org:8080/12402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
Gerrit-Change-Number: 12402
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 19:09:04 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Adding hostname to Disk I/O errors.

2019-02-20 Thread Philip Zeyliger (Code Review)

Hello Lars Volker, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12402

to look at the new patch set (#2).

Change subject: Adding hostname to Disk I/O errors.
..

Adding hostname to Disk I/O errors.

I recently ran into some queries that failed like so:

  WARNINGS: Disk I/O error: Could not open file: /data/...: Error(5): 
Input/output error

These warnings were in the profile, but I had to cross-reference impalad
logs to figure out which machine had the broken disk.

In this commit, I've sprinkled GetBackendString() to include it.

Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
---
M be/src/exprs/timezone_db.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/error-converter.cc
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-monitored-ops.cc
M be/src/runtime/io/local-file-reader.cc
M common/thrift/generate_error_codes.py
8 files changed, 23 insertions(+), 20 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/12402/2
--
To view, visit http://gerrit.cloudera.org:8080/12402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
Gerrit-Change-Number: 12402
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker

[Impala-ASF-CR] Adding hostname to Disk I/O errors.

2019-02-20 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12402 )

Change subject: Adding hostname to Disk I/O errors.
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
Gerrit-Change-Number: 12402
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 19:09:15 +
Gerrit-HasComments: No

[Impala-ASF-CR] Adding hostname to Disk I/O errors.

2019-02-20 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12402 )

Change subject: Adding hostname to Disk I/O errors.
..


Patch Set 2:

Trivial carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/12402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
Gerrit-Change-Number: 12402
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 19:09:12 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-19 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 2: Code-Review+2

I think get_pids() works better here. Thanks!


--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 2
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 20 Feb 2019 00:03:08 +
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh

2019-02-15 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12495 )

Change subject: IMPALA-8204: checkout to the right commit of Impala-lzo in 
buildall.sh
..


Patch Set 2:

> I still think we should write down the depended Impala-lzo version
 > somewhere. Once I want to build an older version of Impala, I don't
 > need to guess which Impala-lzo commit should I check out to.

it's in bin/bootstrap_development.sh, no?


 >
 > There's a benifit to do the checkout in buildall.sh. In the future
 > when I want to build an older version (but >2.13, >3.2), I just
 > need to checkout to the right tag in Impala and then run
 > buildall.sh. It'll bootstrap dependencies including the Impala-lzo
 > to the corresponding versions.
 >
 > If you're developing Impala-lzo, you can first commit it locally,
 > and then change the depended commit id in impala-config-branch.sh.
 > I think this is the same with other dependencies. For example, if
 > we want to bump the version of ORC lib, we need to upgrade it in
 > native-toolchain project and then change the corresponding version
 > in impala-config.sh.

I understand your argument, but I don't think it's acceptable for a "build" 
script to change branches of git repos. We could treat impala-lzo like we treat 
the toolchain (i.e., checked out in bootstrap_toolchain.sh), and then I'd sort 
of see it as a "build artifact" which can be overridden rather than my source 
code.

(BTW, if you choose to cherrypick the num_unqueued_files that drove this mess, 
we'll be back to using the same impala-lzo code base in both branches...)


--
To view, visit http://gerrit.cloudera.org:8080/12495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e4b53d695de04d31d39b6909572907944713ba4
Gerrit-Change-Number: 12495
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Sat, 16 Feb 2019 00:34:23 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8193: Fix python 2.6 issue in junit prune notrun.py

2019-02-15 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12479 )

Change subject: IMPALA-8193: Fix python 2.6 issue in junit_prune_notrun.py
..


Patch Set 1: Code-Review+2

Sure.


--
To view, visit http://gerrit.cloudera.org:8080/12479
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9ef8fb77b1ac8c51e3dfb6b04690ae9ccc490d62
Gerrit-Change-Number: 12479
Gerrit-PatchSet: 1
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Sat, 16 Feb 2019 00:29:33 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-15 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 1:

(3 comments)

Ok. Thanks for explaining. I'm fine with the tactical fix, but I think doing 
process.get_pids() is probably easy enough and will be less confusing.

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py
File tests/common/impala_cluster.py:

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py@321
PS1, Line 321: if set(self.cmd) == set(process.cmdline):
Do you know why this is set?


http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py@322
PS1, Line 322:   return pid
Instead of returning pid here, you could collect all matches, and then either 
pick the better one (parent) or fail? Would that have made this more obvious?


http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py
File tests/custom_cluster/test_breakpad.py:

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py@88
PS1, Line 88:   def wait_for_all_processes_dead(self, processes, timeout=300):
: # For every process in the list we might see the original 
Impala process plus a forked
: # off child that is writing the minidump. To catch both we go 
through the list twice.
: for process in processes * 2:
> These are Process objects (https://github.com/apache/impala/blob/master/tes
OMG.

I would never have thought that "Process" is really "output of pgrep".

How would you feel about changing this to process.get_pids() here, so that it's 
more obvious that a process can get represented multiple times in this case, 
and we have an answer?



--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Sat, 16 Feb 2019 00:27:02 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test

2019-02-15 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12501 )

Change subject: IMPALA-8191: Wait for additional breakpad processes during test
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py
File tests/custom_cluster/test_breakpad.py:

http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py@88
PS1, Line 88:   def wait_for_all_processes_dead(self, processes, timeout=300):
: # For every process in the list we might see the original 
Impala process plus a forked
: # off child that is writing the minidump. To catch both we go 
through the list twice.
: for process in processes * 2:
I don't understand this. processes here is a list of pids, and we wait() on 
each of them. Why does waiting on them multiple times help things?

It sounded from your description in the commit message is that the issue is 
where we enumarate which process to wait for (the impalad or its fork), but 
that's not what this code is doing.

I'm sure I'm confused; let me know where!



--
To view, visit http://gerrit.cloudera.org:8080/12501
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca
Gerrit-Change-Number: 12501
Gerrit-PatchSet: 1
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 15 Feb 2019 23:00:54 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh

2019-02-15 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12497 )

Change subject: IMPALA-8204: checkout to the right commit of Impala-lzo in 
buildall.sh
..


Patch Set 3:

Please see my reviews in https://gerrit.cloudera.org/#/c/12495/. I don't think 
it's appropriate for buildall to move git checkouts around.


--
To view, visit http://gerrit.cloudera.org:8080/12497
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e4b53d695de04d31d39b6909572907944713ba4
Gerrit-Change-Number: 12497
Gerrit-PatchSet: 3
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 15 Feb 2019 17:46:38 +
Gerrit-HasComments: No

[Impala-ASF-CR](2.x) IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh

2019-02-15 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12495 )

Change subject: IMPALA-8204: checkout to the right commit of Impala-lzo in 
buildall.sh
..


Patch Set 2:

(2 comments)

I don't think this approach is wise. We have two places where we check out 
Impala-lzo:

a) bin/bootstrap_system.sh:  git clone --branch master 
https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME"

b) Jenkins jobs.

I think it's ok to move a git branch when doing the checkout, but it's not so 
nice to move the branch when you're doing buildall. (For example, if I were to 
need to make a change to impala-lzo, this would confuse me incredibly, since 
buildall would be undoing my changes!)

http://gerrit.cloudera.org:8080/#/c/12495/2/bin/impala-config-branch.sh
File bin/impala-config-branch.sh:

http://gerrit.cloudera.org:8080/#/c/12495/2/bin/impala-config-branch.sh@26
PS2, Line 26: IMPALA_LZO_VERSION=8a984e0
Do you want to use a long git hash or a tag or a branch name instead? The short 
git hash is sort of the worst of all worlds, since it's not guaranteed to work 
if there's a collision.


http://gerrit.cloudera.org:8080/#/c/12495/2/buildall.sh
File buildall.sh:

http://gerrit.cloudera.org:8080/#/c/12495/2/buildall.sh@337
PS2, Line 337:   # Checkout to the right version of Impala-lzo. 
IMPALA_LZO_VERSION is set in
 :   # bin/impala-config-branch.sh
 :   if [[ -e "$IMPALA_LZO" ]]; then
 : echo "Checkout to commit $IMPALA_LZO_VERSION for Impala-lzo 
repo in $IMPALA_LZO"
 : (pushd "$IMPALA_LZO" && git checkout "$IMPALA_LZO_VERSION")
 :   fi
I don't think it's appropriate for buildall.sh to move a repo around like this.



--
To view, visit http://gerrit.cloudera.org:8080/12495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: 2.x
Gerrit-MessageType: comment
Gerrit-Change-Id: I5e4b53d695de04d31d39b6909572907944713ba4
Gerrit-Change-Number: 12495
Gerrit-PatchSet: 2
Gerrit-Owner: Quanlong Huang 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 15 Feb 2019 17:45:49 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-8099: Update the build scripts to support Apache Ranger

2019-02-15 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12469 )

Change subject: IMPALA-8099: Update the build scripts to support Apache Ranger
..


Patch Set 3: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b
Gerrit-Change-Number: 12469
Gerrit-PatchSet: 3
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Anonymous Coward (402)
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 15 Feb 2019 17:13:35 +
Gerrit-HasComments: No

[Impala-ASF-CR] Remove some noisier clang-tidy rules

2019-02-13 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12470 )

Change subject: Remove some noisier clang-tidy rules
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12470
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I12d61dccff0f9772165c10238ae178208a100706
Gerrit-Change-Number: 12470
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 13 Feb 2019 23:02:13 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8194: wait longer to detect JVM pause in TestPauseMonitor.

2019-02-13 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12475 )

Change subject: IMPALA-8194: wait longer to detect JVM pause in 
TestPauseMonitor.
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12475
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I735c0c0ecfd3a9099c9cef332c5e79854bec7b8d
Gerrit-Change-Number: 12475
Gerrit-PatchSet: 1
Gerrit-Owner: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 13 Feb 2019 22:56:08 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8099: Update the build scripts to support Apache Ranger

2019-02-13 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12469 )

Change subject: IMPALA-8099: Update the build scripts to support Apache Ranger
..


Patch Set 2:

(8 comments)

If you have a chance, I'd love for you to see that test-with-docker doesn't 
bail out with this change.

No major issues; just some questions below.

http://gerrit.cloudera.org:8080/#/c/12469/2/bin/create-test-configuration.sh
File bin/create-test-configuration.sh:

http://gerrit.cloudera.org:8080/#/c/12469/2/bin/create-test-configuration.sh@204
PS2, Line 204: p -f "${RANGER_TEST_CONF_DIR}/java_home.sh" 
"${RANGER_SERVER_CONF_DIR}"
 : cp -f 
"${RANGER_TEST_CONF_DIR}/ranger-admin-env-hadoopconfdir.sh" \
 : "${RANGER_SERVER_CONF_DIR}"
 : cp -f "${RANGER_TEST_CONF_DIR}/ranger-admin-env-logdir.sh" \
 : "${RANGER_SERVER_CONF_DIR}"
 : cp -f "${RANGER_TEST_CONF_DIR}/ranger-admin-env-piddir.sh" \
 : "${RANGER_SERVER_CONF_DIR}"
 : cp -f "${RANGER_TEST_CONF_DIR}/security-applicationContext.xml" \
 : "${RANGER_SERVER_CONF_DIR}"
 : cp -f "${RANGER_TEST_CONF_DIR}/core-site.xml" 
"${RANGER_SERVER_CONF_DIR}"
Is this "cp ${RANGE_TEST_CONF_DIR}/* $RANGER_SERVER_CONF_DIR"?


http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/bin/run-ranger-server.sh
File testdata/bin/run-ranger-server.sh:

http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/bin/run-ranger-server.sh@31
PS2, Line 31: JAVA_OPTS="-Xdebug 
-Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=30130" \
Is it intentional that this has debug ports open by default?


http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/bin/run-ranger-server.sh@32
PS2, Line 32: "${RANGER_HOME}"/ews/ranger-admin-services.sh restart
Does this have an XmX setting? If we don't, these things eat a ton of RAM on 
machines with a lot of RAM.


http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/install.properties.template
File testdata/cluster/ranger/install.properties.template:

http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/install.properties.template@37
PS2, Line 37: #SQL_CONNECTOR_JAR=/opt/sqlanywhere17/java/sajdbc4.jar
This is super noisy with a lot of things commented out.


http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/java_home.sh
File testdata/cluster/ranger/java_home.sh:

http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/java_home.sh@1
PS2, Line 1: export JAVA_HOME=${JAVA_HOME}
Why is this necessary? We have JAVA_HOME configured anyway...


http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-default-site.xml.template
File testdata/cluster/ranger/ranger-admin-default-site.xml.template:

http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-default-site.xml.template@510
PS2, Line 510: tzL1AKl5uc4NKYaoQ4P3WLGIBFPXWPWdu1fRm9004jtQiV
Is this a problem?


http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-hadoopconfdir.sh
File testdata/cluster/ranger/ranger-admin-env-hadoopconfdir.sh:

http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-hadoopconfdir.sh@1
PS2, Line 1: export RANGER_HADOOP_CONF_DIR=/etc/hadoop/conf
Don't we need to substitute this?


http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-logdir.sh
File testdata/cluster/ranger/ranger-admin-env-logdir.sh:

http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-logdir.sh@1
PS2, Line 1: export RANGER_ADMIN_LOG_DIR=${RANGER_HOME}/ews/logs
We put sentry logs somewhere else, I think...



--
To view, visit http://gerrit.cloudera.org:8080/12469
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b
Gerrit-Change-Number: 12469
Gerrit-PatchSet: 2
Gerrit-Owner: Fredy Wijaya 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 13 Feb 2019 19:46:52 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()

2019-02-13 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12267 )

Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST()
..


Patch Set 6:

I don't know if it's plausible, but it may make sense to get the query 
generator to generate some of these casts in a randomized fashion. (See 
tests/comparison.)


--
To view, visit http://gerrit.cloudera.org:8080/12267
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697
Gerrit-Change-Number: 12267
Gerrit-PatchSet: 6
Gerrit-Owner: Gabor Kaszab 
Gerrit-Reviewer: Attila Jeges 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Gabor Kaszab 
Gerrit-Reviewer: Greg Rahn 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 13 Feb 2019 17:24:57 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8186: script to configure docker network

2019-02-12 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12452 )

Change subject: IMPALA-8186: script to configure docker network
..


Patch Set 2:

(4 comments)

No objections here; just thinking out loud about how some of this stuff works. 
Fine by me to make incremental progress.

http://gerrit.cloudera.org:8080/#/c/12452/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12452/2//COMMIT_MSG@15
PS2, Line 15: "start-impala-cluster.py --docker_network=network-name"
Should the network name end up in the env variables too?


http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh
File docker/configure_test_network.sh:

http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh@23
PS2, Line 23: #  cluster configurations need to be regenerated, all minicluster 
processes restarted,
nit, extra space.


http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh@45
PS2, Line 45: GATEWAY=$(docker network inspect "$NETWORK_NAME" | grep -o 
'"Gateway": "[0-9\.]*"' | \
:   grep -o "[0-9.]*")
I don't know if the following makes you any happier:

$docker network inspect bridge -f '{{(index .IPAM.Config 0).Gateway}}'
10.250.0.1


http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh@50
PS2, Line 50: echo "export INTERNAL_LISTEN_HOST=${GATEWAY}" >> 
bin/impala-config-local.sh
This clearly persists across restarts. Does the "docker network create" command 
persist across reboots? Is Docker deterministic about the gateway?



--
To view, visit http://gerrit.cloudera.org:8080/12452
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icb4854aa951bcad7087a9653845b22ffd862057d
Gerrit-Change-Number: 12452
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Feb 2019 19:02:19 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Update toolchain to support ubuntu 18.04

2019-02-11 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12421 )

Change subject: Update toolchain to support ubuntu 18.04
..


Patch Set 1: Code-Review+2

This is fine by me.


--
To view, visit http://gerrit.cloudera.org:8080/12421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ie25c8cb129c6817a2e116f31853ae64c5a8acfe9
Gerrit-Change-Number: 12421
Gerrit-PatchSet: 1
Gerrit-Owner: hector.aco...@cloudera.com 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Mon, 11 Feb 2019 20:17:27 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-8175: improve tests minicluster obj

2019-02-08 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12412 )

Change subject: IMPALA-8175: improve tests_minicluster_obj
..


Patch Set 2: Code-Review+2

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12412/2/tests/comparison/cluster.py
File tests/comparison/cluster.py:

http://gerrit.cloudera.org:8080/#/c/12412/2/tests/comparison/cluster.py@846
PS2, Line 846:   def find_pid(self):
 : # Need to filter results to avoid pgrep picking up its 
parent bash script.
 : # Test with:
 : # sh -c "pgrep -l -f 'impala.*21050' | grep [i]mpalad | grep 
-o '^[0-9]*' || true"
 : pid = self.shell("pgrep -l -f 'impalad.*%s' | grep [i]mpalad 
| "
 :  "grep -o '^[0-9]*' || true" % self.hs2_port)
 : if pid:
 :   return int(pid)
I'm fine with this, but this is

$fuser 21050/tcp
21050/tcp:2820

or possibly

$pgrep impalad | head -n 1
2820



--
To view, visit http://gerrit.cloudera.org:8080/12412
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I558b3157bb16ef3d169c0d3e795e03700a17ffe4
Gerrit-Change-Number: 12412
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Andrew Sherman 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Brown 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 08 Feb 2019 17:47:54 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Adding hostname to Disk I/O errors.

2019-02-07 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12402


Change subject: Adding hostname to Disk I/O errors.
..

Adding hostname to Disk I/O errors.

I recently ran into some queries that failed like so:

  WARNINGS: Disk I/O error: Could not open file: /data/...: Error(5): 
Input/output error

These warnings were in the profile, but I had to cross-reference impalad
logs to figure out which machine had the broken disk.

In this commit, I've sprinked GetBackendString() to include it.

Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
---
M be/src/exprs/timezone_db.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/disk-io-mgr.cc
M be/src/runtime/io/error-converter.cc
M be/src/runtime/io/hdfs-file-reader.cc
M be/src/runtime/io/hdfs-monitored-ops.cc
M be/src/runtime/io/local-file-reader.cc
M common/thrift/generate_error_codes.py
8 files changed, 22 insertions(+), 19 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/12402/1
--
To view, visit http://gerrit.cloudera.org:8080/12402
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2
Gerrit-Change-Number: 12402
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] Add raw values to pretty-printed time profiles.

2019-02-07 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12403


Change subject: Add raw values to pretty-printed time profiles.
..

Add raw values to pretty-printed time profiles.

Impala text profiles pretty print time values, like "1h3m" and so on.
This is handy, but it makes it miserable to programmatically find, say,
large values without writing a parser for the pretty-printer. This
commit includes the raw values when printing, with the units those raw
values come in. (We don't put the units for "bytes", but whereas most
metrics measured in bytes are actually measured in bytes, we mix nano
and milli seconds in several places.)

Change-Id: I3034482b3643968ac9e931bb28692e68b5a709c2
---
M be/src/util/pretty-printer.h
1 file changed, 4 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/12403/1
--
To view, visit http://gerrit.cloudera.org:8080/12403
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3034482b3643968ac9e931bb28692e68b5a709c2
Gerrit-Change-Number: 12403
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] IMPALA-8147: part 1/2: make make *.sh redundant.

2019-02-07 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12368 )

Change subject: IMPALA-8147: part 1/2: make make_*.sh redundant.
..


Patch Set 2: Code-Review+2

(2 comments)

Looks good to me. I took a look at bin/jenkins/build-all-flag-combinations.sh 
and it's not obvious that we should add things in there.

http://gerrit.cloudera.org:8080/#/c/12368/2/buildall.sh
File buildall.sh:

http://gerrit.cloudera.org:8080/#/c/12368/2/buildall.sh@389
PS2, Line 389:   generate_cmake_files $build_type
Are we trying to make repeated invocations of buildall faster in any way? This 
re-creates all the cmake stuff, which maybe makes everything invalid? But maybe 
that was already happening before...


http://gerrit.cloudera.org:8080/#/c/12368/2/buildall.sh@427
PS2, Line 427:   if [[ ("$build_type" == "ADDRESS_SANITIZER") \
 : || ("$build_type" == "TIDY") \
 : || ("$build_type" == "UBSAN") \
 : || ("$build_type" == "UBSAN_FULL") \
 : || ("$build_type" == "TSAN") ]]; then
 : 
CMAKE_ARGS+=(-DCMAKE_TOOLCHAIN_FILE=$IMPALA_HOME/cmake_modules/clang_toolchain.cmake)
 :   else
 : 
CMAKE_ARGS+=(-DCMAKE_TOOLCHAIN_FILE=$IMPALA_HOME/cmake_modules/toolchain.cmake)
 :   fi
It's tempting to push this into CMake itself. Most of this file is passing 
arguments around, but this is really logic. When I worked on one of these 
sanitizers, I got super super confused that make_impala had this logic.

Anyway, fine not to do, but kibitzing.



--
To view, visit http://gerrit.cloudera.org:8080/12368
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I70e4f65712166348ca006bc68e1a1e18e853d3a0
Gerrit-Change-Number: 12368
Gerrit-PatchSet: 2
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 07 Feb 2019 23:22:08 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Enable full stacktrace logging with surefire

2019-02-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12370 )

Change subject: Enable full stacktrace logging with surefire
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12370
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I17fb7c72ff39dec1b0cbc7af9df3f61a9abc52b7
Gerrit-Change-Number: 12370
Gerrit-PatchSet: 1
Gerrit-Owner: Bharath Vissapragada 
Gerrit-Reviewer: Fredy Wijaya 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 05 Feb 2019 19:16:47 +
Gerrit-HasComments: No

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

2019-02-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
..


Patch Set 2:

> I think this change makes sense. Can you rebase?

done


--
To view, visit http://gerrit.cloudera.org:8080/11782
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michal Ostrowski 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 05 Feb 2019 18:16:21 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7928: Consistent remote read scheduling

2019-02-05 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12037 )

Change subject: IMPALA-7928: Consistent remote read scheduling
..


Patch Set 8: Code-Review+2

(3 comments)

I'm comfortable with this. Nice work!

I'm still thinking a bit about end-to-end testing here. I think we should keep 
thinking about Tim's docker work to see if we can have a style of test that 
create remotes read because of containers. We can delay that.

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h
File be/src/scheduling/hash-ring.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@106
PS5, Line 106:   /// impact the actual allocations.
> I think doing some work to reduce our use of strings in the scheduler would
I've not seen evidence that scheduling is slow at the moment; I think it's fine 
to leave this alone.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@111
PS5, Line 111: };
> Two thoughts here:
I thought about this in the previous review. I don't think it's a big deal. The 
scheduler tries to spread out the load across replicas. If a block has 1 or 2 
fake replicas instead of 3, the scheduler will be able to even it out.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h
File be/src/scheduling/scheduler.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h@191
PS5, Line 191: /// the file name / offset from 'hdfs_file_split' multiple 
times and look up the
> That is a good idea. We want big files to get split up, and this shouldn't
Great find, Lars.



--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 8
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 05 Feb 2019 17:38:10 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Revert "IMPALA-8146: Remove make {debug,release,asan}.sh"

2019-02-01 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12335 )

Change subject: Revert "IMPALA-8146: Remove make_{debug,release,asan}.sh"
..


Patch Set 1: Code-Review+2

I recommend self-verifying and pushing ahead.


--
To view, visit http://gerrit.cloudera.org:8080/12335
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ibcb921224071eef2dc77cf7b9f4c618f88aaa7c1
Gerrit-Change-Number: 12335
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 01 Feb 2019 19:29:02 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-02-01 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 9: Verified+1 Code-Review+2

Carrying +2. I've run tests on this manually, so I'm manually verifying. This 
needs a push to impala-lzo at the same time, so wouldn't pass the pre-commit 
anyway.


--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 9
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 01 Feb 2019 19:03:48 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-02-01 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Reviewed-on: http://gerrit.cloudera.org:8080/12097
Reviewed-by: Philip Zeyliger 
Tested-by: Philip Zeyliger 
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M

[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.

2019-01-31 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11782 )

Change subject: test-with-docker: decrease image size by "de-duping" HDFS.
..


Patch Set 1:

(1 comment)

> (1 comment)
 >
 > You know what somebody once said about shell scripts getting too
 > big?

Yes, I definitely think this is crossing that threshold, but don't want to 
re-write it quite yet.

http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh
File docker/entrypoint.sh:

http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh@228
PS1, Line 228:   set +x
> Do you need to keep this?
Yes, this avoids a lot of verbosity for every copy. It's undone in line 238.



--
To view, visit http://gerrit.cloudera.org:8080/11782
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1
Gerrit-Change-Number: 11782
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Michal Ostrowski 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Thu, 31 Jan 2019 22:47:12 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7928: Consistent remote read scheduling

2019-01-31 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12037 )

Change subject: IMPALA-7928: Consistent remote read scheduling
..


Patch Set 5:

(12 comments)

Some more comments. i think this is pretty close. I think we can make the 
hashing simpler.

http://gerrit.cloudera.org:8080/#/c/12037/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12037/5//COMMIT_MSG@12
PS5, Line 12: cache that is at the file level).
There is no other cache at the file level, so i think this parenthetical 
confuses more than helps?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.cc
File be/src/scheduling/backend-config.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.cc@22
PS5, Line 22: // TODO: determine the appropriate number of replicas for the 
hash-ring
Are we going to do this, or leave for the future? Fine to leave; just checking 
that you didn't forget about this accidentally.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc
File be/src/scheduling/hash-ring-test.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc@31
PS5, Line 31: class HashRingTest : public ::testing::Test {
I don't see a test for GetNode(n) here. Something pretty basic should work. If 
you're willing to peek at internals, checking that the "overflow" works 
correctly and that picking n around the biggest address works is working 
correctly.

If we want, we can write crash tests for Adding and Removing nodes that already 
exist or don't exist, though I'm ok without those.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc@39
PS5, Line 39: EXPECT_EQ(hash_ring.GetTotalReplicas(), num_nodes * 
hash_ring.GetNumReplicas());
This assumes no collisions?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h
File be/src/scheduling/hash-ring.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@42
PS5, Line 42: and expects the caller to do its own hashing for the item.
Does this conflict with the TODO about "function pointer that generates a hash 
value"?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@56
PS5, Line 56: : num_replicas_(num_replicas) {}
assert that num_replicas >= 1?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@58
PS5, Line 58:   /// Copy constructor
Why do we need a copy constructor? We need it to copy the backend, eh?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@64
PS5, Line 64: for (auto it = hash_ring.node_hashmap_.begin();
:  it != hash_ring.node_hashmap_.end();
:  it++) {
:   uint32_t hash_value = it->first;
:   NodeIterator old_node_it = it->second;
:   // Look up the equivalent node iterator in the new nodes_ 
set.
:   NodeIterator new_node_it = nodes_.find(*old_node_it);
:   DCHECK(new_node_it != nodes_.end());
:   node_hashmap_.insert(std::pair(hash_value, new_node_it));
: }
Should this be moved out of the header?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@90
PS5, Line 90:   /// value.
Add: The hash ring must not be empty.

(I do wonder if we can get in trouble if the executors haven't joined the 
coordinators yet...)


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc
File be/src/scheduling/hash-ring.cc:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@78
PS5, Line 78:   DCHECK_GT(num_removed, 0);
If we have just the right hash collisions, it's possible that num_removed == 0. 
I don't think this affects our correctness (the node just never gets any work 
scheduled to it, and it's unlikely). But am I understanding this right?


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@91
PS5, Line 91:   return &(*node_it);
If node_it == node_hashmap_.end(), should you return nullptr here?

In debug mode, DCHECK_GT will prevent this from reaching, so it probably 
doesn't matter.


http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/util/hash-util.h
File be/src/util/hash-util.h:

http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/util/hash-util.h@290
PS5, Line 290:   class MultipleHashGenerator {
I suspect we can just use Rehash32to32 above.



--
To view, visit http://gerrit.cloudera.org:8080/12037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45
Gerrit-Change-Number: 12037
Gerrit-PatchSet: 5
Gerrit-Owner: Joe McDonnell 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Lars Volker

[Impala-ASF-CR] IMPALA-6932: Speed up scans for sequence datasets with many files

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11517 )

Change subject: IMPALA-6932: Speed up scans for sequence datasets with many 
files
..


Patch Set 6: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 6
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 Jan 2019 23:29:38 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6932: Speed up scans for sequence datasets with many files

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/11517 )

Change subject: IMPALA-6932: Speed up scans for sequence datasets with many 
files
..

IMPALA-6932: Speed up scans for sequence datasets with many files

This change addresses the slow scans of sequence datasets with
many files by enqueueing the scan ranges to the head of the disk
IO queue instead of the tail. This ensures that the data ranges
get priority over headers of other files. Hence it produces
results earlier for limit queries.

Testing:
Added a unit test to verify that the expected elements are
dequeued from the front.

Tested the performance of this patch on S3 to emulate remote reads.
The following query was executed several times:
"SELECT * FROM TPCH_AVRO.LINEITEM LIMIT 1;"
The average timeline difference was 8.66s vs 5.87s. The scanner I/O
wait time went down from 2.37s to 9.85s.

Tested the patch with backend and end-to-end tests.
Single node performance test results:
+--++-++++
| Workload | File Format| Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
+--++-++++
| TPCH(50) | avro / none / none | 65.62   | -0.38% | 43.51  | -0.79%
 |
+--++-++++

Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Reviewed-on: http://gerrit.cloudera.org:8080/11517
Reviewed-by: Philip Zeyliger 
Tested-by: Philip Zeyliger 
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M be/src/runtime/io/disk-io-mgr-stress.cc
M be/src/runtime/io/disk-io-mgr-test.cc
M be/src/runtime/io/request-context.cc
M be/src/runtime/io/request-context.h
M be/src/util/internal-queue-test.cc
M be/src/util/internal-queue.h
13 files changed, 159 insertions(+), 115 deletions(-)

Approvals:
  Philip Zeyliger: Looks good to me, approved; Verified

--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 7
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 8: Code-Review+2

Carrying +2. Rebase was pretty straight-forward.


--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 30 Jan 2019 22:41:35 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6932: Speed up scans for sequence datasets with many files

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/11517 )

Change subject: IMPALA-6932: Speed up scans for sequence datasets with many 
files
..


Patch Set 6: Code-Review+2

Pooja--by pushing this rebase, I may have ruined your GVD job. If it passes, 
and you need help pushing this along, please let me know.

(I'm carrying the +2; I was doing this rebase to work out a conflict for my in 
progress change.)


--
To view, visit http://gerrit.cloudera.org:8080/11517
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965
Gerrit-Change-Number: 11517
Gerrit-PatchSet: 6
Gerrit-Owner: Pooja Nilangekar 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Pooja Nilangekar 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 30 Jan 2019 22:40:18 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-30 Thread Philip Zeyliger (Code Review)

Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#8).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 7:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node-base.h@279
PS7, Line 279: Issuances
> Nit: Can this match up as "Submissions"?
Done



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 7
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 30 Jan 2019 21:58:31 +
Gerrit-HasComments: Yes

[native-toolchain-CR] Initial support for building the toolchain in docker

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12285 )

Change subject: Initial support for building the toolchain in docker
..


Patch Set 2:

(15 comments)

Thanks! When you say we're pinning the OS, does this actually pin the packages? 
Does it look like the libc (or whatever) we're getting from pinning to 6.6 
actually move? Unfortunately, I bet it does, so we have to be careful about how 
to use it. That's the underlying reason we need to be able to build the 
toolchain against explicit images.

http://gerrit.cloudera.org:8080/#/c/12285/2/Makefile
File Makefile:

http://gerrit.cloudera.org:8080/#/c/12285/2/Makefile@2
PS2, Line 2: STAMP_DIR=$(BUILD_DIR)/stamp
copyright, etc.


http://gerrit.cloudera.org:8080/#/c/12285/2/Makefile@15
PS2, Line 15:   ./in-docker.py $(IN_DOCKER_ARGS) $(@F) -- ./buildall.sh |sed -s 
's/^/$(@F): /'
Does that sed lose error codes because of pipefail-equivalent issues? (My quick 
testing suggests that it does.)


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py
File docker/all/assert-dependencies-present.py:

http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@17
PS2, Line 17: # native toolchain. Must be python2.6 compatible.
Worth a note about why it must be 2.6 compatible.


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@25
PS2, Line 25: def run(cmd):
I think you can call this check_output() since it's essentailly a 
re-implementation of subprocess.check_output() if we're on 2.6, which doesn't 
have it, if I remember correctly...


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@34
PS2, Line 34:   for s in l:
: if re.match(regex, s):
:   return True
:   return False
I don't know if it's better, but this is basically any and map.

>>> import re
>>> r = re.compile('a')
>>> any(map(r.match, ["a", "b"]))
True
>>> any(map(r.match, ["c", "b"]))
False


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@41
PS2, Line 41:   patterns = ['libdb.*.so',
:   'libkrb.*.so',
:   'libncurses.so',
:   'libsasl.*.so',
:   'libcrypto.so',
:   'libz.so']
There are dots here that are wildcards and dots here that are real dots. Should 
we be correct about it?


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@47
PS2, Line 47:   out = run(['ldconfig', '-p'])[0]
:   libraries = []
:   for line in out.splitlines():
: libraries.append(line.split()[0])
I'm ambivalent, but:

libraries = [ line.split()[0] for line in check_output(["ldconfig", 
"-p"]).splitlines() ]


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@77
PS2, Line 77: print 'Checking program: %s' % p
use logging?


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@79
PS2, Line 79:   sys.exit('Unable to find %s in PATH' % p)
Use exceptions (and line 54)?


http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@85
PS2, Line 85: import distutils.core  # noqa: F401
Maybe just move this to top of the file and python will fail and it'll be clear 
enough?


http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py
File in-docker.py:

http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@42
PS2, Line 42: in-docker.py --docker-args "-t" impala-toolchain-centos6 -- bash
Don't you need the "-i"?


http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@70
PS2, Line 70:   '-v', '/etc/passwd:/etc/passwd:ro',
:   '-v', '/etc/group:/etc/group:ro',
We've maybe seen cases where this doesn't work, but I'm fine with it if it 
works.


http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@77
PS2, Line 77:   parser.add_argument('--docker-args', default='')
Add help?

What's the quoting convention here?


http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@125
PS2, Line 125:   xargs = subprocess.Popen(['xargs', '-i', 'cp', '--parents', 
'{}', distro_build_dir], stdin=git.stdout)
I had to look up -i and it looks like maybe it's deprecated?

  This option is a synonym for -Ireplace-str if replace-str is 
specified.  If the replace-str argument is missing, the effect is the same as 
-I{}.  This option is deprecated; use -I instead.


http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@147
PS2, Line 147: sys.stderr.write('Running: %s\n' % ' '.join(cmd))
use logging?



--
To view, visit http://gerrit.cloudera.org:8080/12285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType:

[Impala-ASF-CR] IMPALA-8146: Remove make {debug,release,asan}.sh

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12306 )

Change subject: IMPALA-8146: Remove make_{debug,release,asan}.sh
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12306
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Iac997917463b871769112834835cc3f99cff5954
Gerrit-Change-Number: 12306
Gerrit-PatchSet: 1
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Quanlong Huang 
Gerrit-Comment-Date: Wed, 30 Jan 2019 18:23:51 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-30 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 6:

(4 comments)

Thanks. I think we've converged here.

Annoyingly, I'll have to push this without the verifier job because I need to 
push the LZO change simultaneously.

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465
PS5, Line 465: remaining_scan_range_submission
> Thanks for the explanation. I suppose the initial source of confusion had t
I extended the comment a bit.


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79
PS5, Line 79:
: Status HdfsScanNode::GetNext(RuntimeState* state, RowBatch* 
row_batch, bool* eos) {
:   SCOPED_TIMER(runtime_profile_->total_time_counter());
:   Sco
> I was just wondering if the if-stmt body is empty as DCHECK is not compiled
I think we've converged here. The outer #ifndef NDEBUG in the current code 
removes the if in production, and in debug mode, it's all active.


http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc@196
PS6, Line 196: !(ranges_issued_barrier_.pending() != 0)
> Why not ranges_issued_barrier_.pending() == 0 ?
Done


http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc@197
PS6, Line 197:   && initial_ranges_issued_
 :   && progress_.done()) {
> nit: one line
I'm down to two lines here, but one line doesn't quite fit.



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 6
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 30 Jan 2019 18:09:00 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-30 Thread Philip Zeyliger (Code Review)

Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#7).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M

[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator

2019-01-29 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12299 )

Change subject: IMPALA-2990: timeout unresponsive queries in coordinator
..


Patch Set 1:

(5 comments)

Cool.

http://gerrit.cloudera.org:8080/#/c/12299/1//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12299/1//COMMIT_MSG@29
PS1, Line 29: - Write functional tests once the appropriate mechanisms are in 
place
I think you can "kill -STOP" an impalad running a query, and it will end up 
looking like this to the coordinator. Of course, maybe something else will 
timeout.


http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc@213
PS1, Line 213:   backend_states_init_ = true;
This could probably just use backend_states_.empty() instead. I don't mind the 
extra variable, though. We have .empty() usages elsewhere.


http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc@744
PS1, Line 744:   int64_t current_time = MonotonicMillis();
It's confusing, as be/src/runtime/coordinator-backend-state.h has similarly 
named variables that are in UnixMillis() and MonotonicMillis(). I think this is 
correct at the moment, which is good.


http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc
File be/src/service/impala-server.cc:

http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc@2212
PS1, Line 2212: true
Do we have a style to write
TerminatedByServer(, true /* expire */)
?


http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc@2209
PS1, Line 2209: int64_t lag_time_ms = 
coord->GetMaxBackendStateLagMs();
  : if (lag_time_ms > FLAGS_max_report_lag_s * 1000) {
  :   
cancellation_thread_pool_->Offer(CancellationWork::TerminatedByServer(
  :   request_state->query_id(), Status("hung 
query"), true));
If I were a user receiving this error, I'd immediately want to know which 
backend failed. Let's include more information here, including the values of 
the flags, the max lag, and where the max lag was observed.



--
To view, visit http://gerrit.cloudera.org:8080/12299
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987
Gerrit-Change-Number: 12299
Gerrit-PatchSet: 1
Gerrit-Owner: Thomas Marshall 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Tue, 29 Jan 2019 23:31:01 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Logging current database context when logging analyzing queries.

2019-01-29 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12301


Change subject: Logging current database context when logging analyzing queries.
..

Logging current database context when logging analyzing queries.

The "analyzing query" log message is a reliable way to correlate
queries with ther thread ids, but the session database name
is missing there. Adding it in.

Change-Id: I70886be4685e0d3ae7ca9f6e57e8159dc39c67c7
---
M fe/src/main/java/org/apache/impala/service/Frontend.java
1 file changed, 2 insertions(+), 1 deletion(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/12301/1
--
To view, visit http://gerrit.cloudera.org:8080/12301
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I70886be4685e0d3ae7ca9f6e57e8159dc39c67c7
Gerrit-Change-Number: 12301
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds.""

2019-01-29 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12300


Change subject: Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug 
builds.""
..

Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds.""

I believe IMPALA-7992 is addressed with "IMPALA-7992: Reduce iterations
for test_decimal_fuzz." which reduces the number of iterations.

Bringing back debug symbols allows us to go back to symbolizing debug
traces, which is very handy for logs that are unattached to the binary
that generated them.

This reverts commit 5bf81cdc2797f986189aec4e78ebff2c2d1ed1b6.

Change-Id: Idd9c444da4016eb6935119d03fec6b07c17dda53
---
M be/src/common/init.cc
1 file changed, 5 insertions(+), 0 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12300/1
--
To view, visit http://gerrit.cloudera.org:8080/12300
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Idd9c444da4016eb6935119d03fec6b07c17dda53
Gerrit-Change-Number: 12300
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-29 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 5:

(3 comments)

Thanks for the reviews. The non-assertion (but rather printing) caught one more 
bug, wherein the Avro stuff wasn't decrementing when skipping files. I've fixed 
that.

I also renamed a variable that was the same name in a base class and a 
subclass. I was using the file_type_counter_ business to debug something. It's 
very minor, so I snuck it in.

http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG@85
PS5, Line 85: adn
> Nit: and
Done


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465
PS5, Line 465: remaining_scan_range_issuances_
> Here is how it works (as I understand it, which might be incomplete):
I am now trying remaining_scan_range_submissions_. I'd argue that it's all 
better than num_unqueued_files_. :)


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79
PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load();
: if (remaining != 0) {
:   LOG(INFO) << "XXX Assertion Failed " << remaining << " " << 
GetStackTrace();
: }
> If you take the suggestion of using the above DCHECK, I don't think you nee
You were right and we can move this into Close() once the threads have been 
joined. I did switch to DCHECK, but I'm still using the ifdef. I could write a 
more complicated boolean expression rather than using the if(... && ... && ...) 
business, but I think it's clearer with the ifndef.



-- 
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 29 Jan 2019 23:11:31 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-29 Thread Philip Zeyliger (Code Review)

Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#6).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_submissions_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change and 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

I happened to rename HdfsScanNode::file_type_counts_ to
HdfsScanNode::file_type_counts_lock_ because
HdfsScanNodeBase::file_type_counts_ also exists, and
is totally different.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-28 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 5:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465
PS5, Line 465: remaining_scan_range_issuances_
> This feels a bit of a misnomer as parquet scanner will actually issue more
Do you have any suggestions?

Todd previously said:

If we want to make this reflect its actual numeric value, maybe something like 
remaining_scan_range_addition_steps ? ie the Parquet scanner has one "scan 
range addition step" whereas the text ones have one such step per file? Or 
perhaps remaining_scan_range_issuances_?


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc
File be/src/exec/hdfs-scan-node.cc:

http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@78
PS5, Line 78:   if (!(ranges_issued_barrier_.pending() != 0) && 
initial_ranges_issued_ && progress_.done()) {
> line too long (95 > 90)
Done


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79
PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load();
: if (remaining != 0) {
:   LOG(INFO) << "XXX Assertion Failed " << remaining << " " << 
GetStackTrace();
: }
> DCHECK_EQ(remaining_scan_range_issuances_.Load(), 0) << GetStackTrace();
D'oh. I don't know how this one got by me.


http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@212
PS5, Line 212: thread_state_.Close(this);
> I wonder if you can DCHECK remaining_scan_range_issuances_ to 0 after threa
I'll try it. I think the limit reach case is handled by progress_.done() being 
false.



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Tue, 29 Jan 2019 00:26:48 +
Gerrit-HasComments: Yes

[native-toolchain-CR] Initial support for building the toolchain in docker

2019-01-28 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12285 )

Change subject: Initial support for building the toolchain in docker
..


Patch Set 1:

(16 comments)

Thanks!

This is a huge improvement on the Jenkins-only version of this. in-docker.sh is 
borderline worthy of re-writing in Python, though it's a bit hard to tell 
whether or not it'll get shorter. I'll leave that up to you.

I think we should try to separate the docker image building versus the building 
a little bit more than you have. Historically, we've pinned our AMI's, so the 
build is fairly reproducible. As you've got it here, a change to RedHat's libc 
will tease its way into our stuff, and maybe we want that to be explicit. I 
don't think that's a big deal: it's a matter of defining a default place to 
pull the images from...

http://gerrit.cloudera.org:8080/#/c/12285/1/buildall-docker.sh
File buildall-docker.sh:

http://gerrit.cloudera.org:8080/#/c/12285/1/buildall-docker.sh@2
PS1, Line 2: # Copyright 2017 Cloudera Inc.
2019.


http://gerrit.cloudera.org:8080/#/c/12285/1/buildall-docker.sh@23
PS1, Line 23: for image in $(cd docker; ./buildall.sh); do
Perhaps add a TODO for parallelism?


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh
File docker/all/assert.sh:

http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@2
PS1, Line 2: # Copyright 2015 Cloudera Inc.
2019


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@16
PS1, Line 16: # Assert that a system has the correct libraries/binaries to 
build the
Perhaps rename assert to something more friendly? Like 
"assert-dependencies-present.sh"?


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@19
PS1, Line 19: set -u
: set -e
: set -o pipefail
It's temping to write this in python from the get-go. The parsing of ldconfig 
-p in particular is a bit tedious in shell.


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@35
PS1, Line 35: rm -f /tmp/lib
You seem like you use a temp dir on line 23; what's this about?


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/postinstall.sh
File docker/all/postinstall.sh:

http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/postinstall.sh@2
PS1, Line 2: # Copyright 2015 Cloudera Inc.
2019


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/postinstall.sh@18
PS1, Line 18: getent hosts github.com| xargs -n1 ssh-keyscan -t rsa,dsa >> 
/etc/ssh/ssh_known_hosts
Comment what this does?


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos/yum-install
File docker/centos/yum-install:

http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos/yum-install@23
PS1, Line 23: yum install $@ 2>&1 | tee $t
Does this preserve yum failures, given that you don't have pipefail set?

Should this be "$@" rather than $@?


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos/yum-install@25
PS1, Line 25:   >&2 echo "yum-install: One or more packages not available but 
specified as arguments"
avoid tabs?


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5.df
File docker/centos5.df:

http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5.df@47
PS1, Line 47: # XXX: Thrift 0.9.3 requires python2.6, and uses system python.
??? Can we manipulate this via PATH?


http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5/epel/epel-release-5-4.noarch.rpm
File docker/centos5/epel/epel-release-5-4.noarch.rpm:

http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5/epel/epel-release-5-4.noarch.rpm@a1
PS1, Line 1:
I always worry about binary files in git repos... Can we do this in a more 
text-y way? (perhaps by just editing yum.repos.d files?


http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh
File in-docker.sh:

http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@16
PS1, Line 16: # Script that sets up environment used in order to perform full 
(or partial)
Can you document how the file system mounting is set up here? i.e., where the 
output is.


http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@49
PS1, Line 49: TARGET_DIR=/mnt
Is this useful to be configured?


http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@66
PS1, Line 66:   while [[ $1 ]]; do
Can you do this with "-z" and avoid the "set +u" stuff?


http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@87
PS1, Line 87: # GCC_VERSION works here
Comment that this looks for subdirectories of source (like source/foo) and 
produces variables FOO_VERSION for them.

You don't really need the basename here. You can do something like:

find . -mindepth 1 -maxdepth 1 -type d -printf "%f\n"

The %f gives you the last component of the name.



--
To view, visit http://gerrit.cloudera.org:8080/12285
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id:

[native-toolchain-CR] Fix orc compilation failures in centos 5

2019-01-28 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12284 )

Change subject: Fix orc compilation failures in centos 5
..


Patch Set 1:

We shouldn't be doing anything that requires centos:5?


--
To view, visit http://gerrit.cloudera.org:8080/12284
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: native-toolchain
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I741d55b55d0dfdfd99bd393855b439166367ee10
Gerrit-Change-Number: 12284
Gerrit-PatchSet: 1
Gerrit-Owner: hector.aco...@cloudera.com 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 28 Jan 2019 22:03:00 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Philip Zeyliger (Code Review)

Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#5).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_issuances_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change adn 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc@452
PS3, Line 452:   SCOPED_CLEANUP({ UpdateRemainingScanRangeIssuances(-1); });
> We have MakeScopeExitTrigger in Impala, but it seems like we could just swi
Seems better to be consistent, so I switched to MakeScopeExitTrigger.



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 19:05:11 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-25 Thread Philip Zeyliger (Code Review)

Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#4).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_issuances_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change adn 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-24 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 3:

The build failure here is because this needs to be pushed simultaneously with 
an LZO change.

https://github.com/cloudera/impala-lzo/pull/4/files

Tim--would you mind doing another pass?


--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 3
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Fri, 25 Jan 2019 00:27:51 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.

2019-01-24 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12129 )

Change subject: IMPALA-6664: Tag log statements with fragment or query ids.
..


Patch Set 8: Code-Review+2

Carry +2. I had to make a minor change to the test because the log directory is 
actually in $IMPALA_EE_LOG_DIR (-ish). Also, functional.alltypes has more 
partitions/files, so serves as a more reliable test case. The tpcds table I was 
using was only running on two hosts, so had a bit more trouble.


--
To view, visit http://gerrit.cloudera.org:8080/12129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8
Gerrit-Change-Number: 12129
Gerrit-PatchSet: 8
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Thu, 24 Jan 2019 20:17:15 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.

2019-01-24 Thread Philip Zeyliger (Code Review)

Hello Paul Rogers, Zoltan Borok-Nagy, Zoram Thanga, Tim Armstrong, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12129

to look at the new patch set (#8).

Change subject: IMPALA-6664: Tag log statements with fragment or query ids.
..

IMPALA-6664: Tag log statements with fragment or query ids.

This implements much of the desire in IMPALA-6664 to tag all log
statements with their query ids. It re-uses the existing ThreadDebugInfo
infrastructure as well as the existing
InstallLogMessageListenerFunction() patch to glog (currently used for
log redaction) to prefix log messages with fragment ids or query ids,
when available. The fragment id is the query id with the last bits
incremented, so it's possible to correlate a given query's log messages.
For example:

  $ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO
  I0108 10:39:16.453958 14752 impala-server.cc:1052] 
85420d575b9ff4b9:402b8868] Registered query 
query_id=85420d575b9ff4b9:402b8868 
session_id=aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:16.454738 14752 Frontend.java:1242] 
85420d575b9ff4b9:402b8868] Analyzing query: select count(*) from 
tpcds.web_sales
  I0108 10:39:16.456627 14752 Frontend.java:1282] 
85420d575b9ff4b9:402b8868] Analysis finished.
  I0108 10:39:16.463538 14818 admission-controller.cc:598] 
85420d575b9ff4b9:402b8868] Schedule for 
id=85420d575b9ff4b9:402b8868 in pool_name=default-pool 
per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 
max_mem=-1.00 B
  I0108 10:39:16.463603 14818 admission-controller.cc:603] 
85420d575b9ff4b9:402b8868] Stats: agg_num_running=0, agg_num_queued=0, 
agg_mem_reserved=0,  local_host(local_mem_admitted=0, num_admitted_running=0, 
num_queued=0, backend_mem_reserved=0)
  I0108 10:39:16.463780 14818 admission-controller.cc:635] 
85420d575b9ff4b9:402b8868] Admitted query 
id=85420d575b9ff4b9:402b8868
  I0108 10:39:16.463896 14818 coordinator.cc:93] 
85420d575b9ff4b9:402b8868] Exec() 
query_id=85420d575b9ff4b9:402b8868 stmt=select count(*) from 
tpcds.web_sales
  I0108 10:39:16.464795 14818 coordinator.cc:356] 
85420d575b9ff4b9:402b8868] starting execution on 2 backends for 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:16.466384 24891 impala-internal-service.cc:49] 
ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b8868 
coord=pannier.sf.cloudera.com:22000 #instances=2
  I0108 10:39:16.467339 14818 coordinator.cc:370] 
85420d575b9ff4b9:402b8868] started execution on 2 backends for 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:16.467536 14823 query-state.cc:579] 
85420d575b9ff4b9:402b8868] Executing instance. 
instance_id=85420d575b9ff4b9:402b8868 fragment_idx=0 
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
  I0108 10:39:16.467627 14824 query-state.cc:579] 
85420d575b9ff4b9:402b88680001] Executing instance. 
instance_id=85420d575b9ff4b9:402b88680001 fragment_idx=1 
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
  I0108 10:39:16.820933 14824 query-state.cc:587] 
85420d575b9ff4b9:402b88680001] Instance completed. 
instance_id=85420d575b9ff4b9:402b88680001 #in-flight=1 status=OK
  I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 
85420d575b9ff4b9:402b8868] DeregisterRecvr(): 
fragment_instance_id=85420d575b9ff4b9:402b8868, node=2
  I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: 
host=pannier.sf.cloudera.com:22001 remaining=2 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] 
query_id=85420d575b9ff4b9:402b8868: first in-progress backend: 
pannier.sf.cloudera.com:22000
  I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 
85420d575b9ff4b9:402b8868] get_results_metadata(): 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.168762 14752 coordinator.cc:483] 
85420d575b9ff4b9:402b8868] ExecState: query 
id=85420d575b9ff4b9:402b8868 execution completed
  I0108 10:39:17.168808 14752 coordinator.cc:608] 
85420d575b9ff4b9:402b8868] Coordinator waiting for backends to finish, 
1 remaining. query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.168880 14823 query-state.cc:587] 
85420d575b9ff4b9:402b8868] Instance completed. 
instance_id=85420d575b9ff4b9:402b8868 #in-flight=0 status=OK
  I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): 
last report for 85420d575b9ff4b9:402b8868
  I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: 
host=pannier.sf.cloudera.com:22000 remaining=1 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.174513 14752 coordinator.cc:814] 
85420d575b9ff4b9:402b8868] Release admission control

[Impala-ASF-CR] Support centos:7 for test-with-docker.

2019-01-23 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12139 )

Change subject: Support centos:7 for test-with-docker.
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12139/1/bin/bootstrap_system.sh
File bin/bootstrap_system.sh:

http://gerrit.cloudera.org:8080/#/c/12139/1/bin/bootstrap_system.sh@253
PS1, Line 253: redhat6 sudo service ntpd start || grep docker /proc/1/cgroup
> Ignoring the nit, we stop ntpd on redhat7 notindocker. Don't we need to sta
Ah, I understand now. Adding.



--
To view, visit http://gerrit.cloudera.org:8080/12139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I7577949b6eaaa2239bcf0fadf64e1490c2106b08
Gerrit-Change-Number: 12139
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 23 Jan 2019 18:54:41 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] Support centos:7 for test-with-docker.

2019-01-23 Thread Philip Zeyliger (Code Review)

Hello Laszlo Gaal, Joe McDonnell, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12139

to look at the new patch set (#2).

Change subject: Support centos:7 for test-with-docker.
..

Support centos:7 for test-with-docker.

As a follow-on to IMPALA-7698, adds various incantations
so that centos:7 can build under test-with-docker.

The core issue is that the centos:7 image doesn't let you start sshd
(necessary for the HBase startup scripts, and probably could be worked
around) or postgresql (harder to work around) with systemctl, because
systemd isn't "running." To avoid this, we start them manually
with /usr/sbin/sshd and pg_ctl.

Change-Id: I7577949b6eaaa2239bcf0fadf64e1490c2106b08
---
M bin/bootstrap_system.sh
M docker/entrypoint.sh
2 files changed, 90 insertions(+), 12 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12139/2
--
To view, visit http://gerrit.cloudera.org:8080/12139
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I7577949b6eaaa2239bcf0fadf64e1490c2106b08
Gerrit-Change-Number: 12139
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-23 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 3:

(12 comments)

Thanks for the reviews. Here's another iteration.

The LZO stuff also needed handling, it turns out, so that's in progress.

http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG
Commit Message:

http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@7
PS2, Line 7: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
> In theory some of the lines in this commit message should be shorter
I tried to shorten some stuff. Didn't get all of it.


http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@21
PS2, Line 21: base-sequence-scanner.cc.
> did you consider an approach like adding something like this to the top lev
So, an HdfsScanNode can have multiple files, of different file formats. And 
then it uses static methods to call IssueInitialRanges(). So I wasn't able to 
shoe-horn in some object-oriented stuff in there. I didn't want to take on a 
bigger refactoring.

Here's the relevant snippet from hdfs-scan-node-base.cc.

  for (const auto& entry : matching_per_type_files) {
if (entry.second.empty()) continue;
switch (entry.first) {
  case THdfsFileFormat::PARQUET:
RETURN_IF_ERROR(HdfsParquetScanner::IssueInitialRanges(this, 
entry.second));
break;
  case THdfsFileFormat::TEXT:
RETURN_IF_ERROR(HdfsTextScanner::IssueInitialRanges(this, 
entry.second));
break;
  case THdfsFileFormat::SEQUENCE_FILE:
  case THdfsFileFormat::RC_FILE:
  case THdfsFileFormat::AVRO:
RETURN_IF_ERROR(BaseSequenceScanner::IssueInitialRanges(this, 
entry.second));
break;
  case THdfsFileFormat::ORC:
RETURN_IF_ERROR(HdfsOrcScanner::IssueInitialRanges(this, entry.second));
break;
  default:
DCHECK(false) << "Unexpected file type " << entry.first;
}
  }


http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@38
PS2, Line 38: required ~30 threads to be spinning in the kernel. We were able 
to use
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@74
PS2, Line 74: I se
> typo
Done


http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@266
PS2, Line 266: Must n
> Must?
Done


http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@277
PS2, Line 277: threads
> impala has plugins?
This is related to the Impala lzo plugin. The task here is to change the 
signature in that repo, and then delete this signature. There's no 
compatibility guarantee that I have to keep around; I just wanted to avoid 
having to teach some test infrastructure about multiple branches.

Anyway, I didn't see [[deprecated]] in use elsewhere, and this should be 
short-lived, so I'm going to ignore it at the moment.


http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@287
PS2, Line 287:   Tuple* InitTemplateTuple(const 
std::vector& value_evals,
> can you DCHECK that the result doesn't go below zero?
Done


http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@471
PS2, Line 471: m a slot's path to its index into ma
> yea, I was also confused by the inverted naming here.
Sorry about the confusing naming. As you can imagine, I went back and forth 
between booleans and ints and stuff here.

I went with "remaining_scan_range_issuances_".


http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.cc
File be/src/exec/hdfs-scan-node-base.cc:

http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.cc@453
PS2, Line 453:
> is it important to not decrement this in the error cases below on line 471-
It has to be decremented *after* IssueInitialRanges(), since, if you don't, all 
the scanner threads might exit and you'll hang. I was trying to catch all the 
error cases and always decrement in those.

I think the easiest way to do it is to wrap inside of another function, so I'll 
do that.


http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.h
File be/src/exec/hdfs-scan-node.h:

http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.h@139
PS2, Line 139:
 :   /// Number of times scanner thread didn't find work to do.
 :   RuntimeProfile::Counter* 
scanner_thread_workless_loops_counter_ = nullptr;
> hrm, I see the value of this for your test, but not entirely convinced it m
I'm curious whether or not this loop is hot in practice at our customers, and 
this was the easiest way to tell. I've seen this be ~10 in the benchmarking I 
was doing, and it'll be useful to check up on it in more cases.

I considered getting rid of this whole loop and using condition

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-23 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12097 )

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h
File be/src/exec/hdfs-scan-node-base.h:

http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@277
PS2, Line 277: plugins
> This is related to the Impala lzo plugin. The task here is to change the si
And now this is done...



--
To view, visit http://gerrit.cloudera.org:8080/12097
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
Gerrit-Change-Number: 12097
Gerrit-PatchSet: 2
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Joe McDonnell 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Todd Lipcon 
Gerrit-Comment-Date: Wed, 23 Jan 2019 18:01:09 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .

2019-01-23 Thread Philip Zeyliger (Code Review)

Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public 
Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12097

to look at the new patch set (#3).

Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.
..

IMPALA-7980: Fix spinning because of buggy num_unqueued_files_.

This commit removes num_unqueued_files_ and replaces it with a more
tightly scoped and easier to reason about
remaining_scan_range_issuances_ variable. This variable (and its
predecessor) are used as a way to signal to scanner threads they may
exit (instead of spinning) because there will never be a scan range
provided to them, because no more scan ranges will be added. In
practice, most scanner implementations can never call AddDiskIoRanges()
after IssueInitialRanges(). The exception is sequence files and Avro,
which share a common base class. Instead of incrementing and
decrementing this counter in a variety of paths, this commit makes the
common case simple (set to 1 initially; decrement at exit points of
IssueInitialRanges()) and the complicated, sequence-file case is treated
within base-sequence-scanner.cc.

Note that this is not the first instance of a subtle bug
in this code. The following two JIRAs (and corresponding
commits) are fundamentally similar bugs:
IMPALA-3798: Disable per-split filtering for sequence-based scanners
IMPALA-1730: reduce scanner thread spinning windows

We ran into this bug when running TPC-DS query 1 on scale factor 10,000
(10TB) on a 140-node cluster with replica_preference=remote, we observed
really high system CPU usage for some of the scan nodes:

  HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 
100.00%
- BytesRead: 80.50 MB (84408563)
- ScannerThreadsSysTime: 36m17s

Using 36 minutes of system time in only 1 minute of wall-clock time
required ~30 threads to be spinning in the kernel. We were able to use
perf to find a lot of usage of futex_wait() and pthread_cond_wait().
Eventually, we figured out that ScannerThreads, once started, loop
forever looking for work.  The case that there is no work is supposed to
be rare, and the scanner threads are supposed to exit based on
num_unqueued_files_ being 0, but, in some cases, that counter isn't
appropriately decremented.

The reproduction is any query that uses runtime filters to filter out
entire files. Something like:

  set RUNTIME_FILTER_WAIT_TIME_MS=1;
  select count(*)
  from customer
  join customer_address on c_current_addr_sk = ca_address_sk
  where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist";

triggers this behavior. This code path is covered by several existing
tests, most directly in test_runtime_filters.py:test_file_filtering().
Interestingly, though this wastes cycles, query results are unaffected.

I initially fixed this bug with a point fix that handled the case when
runtime filters caused files to be skipped and added assertions that
checked that num_unqueued_files_ was decremented to zero when queries
finished. Doing this led me, somewhat slowly, to both finding similar
bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had
the same bug if the entire file was skipped) and fighting with races on
the assertion itself. I eventually concluded that there's really no
shared synchronization between progress_.Done() and num_unqueued_files_.
The same conclusion is true for the current implementation, so there
aren't assertions.

I added a metric for how many times the scanners run through their
loop without doing any work and observed it to be non-zero
for a query from tests/query_test/test_runtime_filters.py:test_wait_time.

To measure the effect of this, I set up a cluster of 9 impalad's and
1 coordinator, running against an entirely remote HDFS. The machines
were r4.4xlarge and the remote disks were EBS st1's, though everything
was likely buffer cached. I ran
TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against
tpcds_1000_decimal_parquet 10 times. The big observable
thing is that ScannerThreadsSysTime went from 5.6 seconds per
query to 1.9 seconds per query. (I ran the text profiles through the 
old-fashioned:
  grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += 
$3/100 } END { print x }'
)
The query time effect was quite small (the fastest query was 3.373s
with the change adn 3.82s without the change, but the averages were
tighter), but the extra work was visible in the profiles.

This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip
Zeyliger.

Change-Id: I133de13238d3d05c510e2ff771d48979125735b1
---
M be/src/exec/base-sequence-scanner.cc
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/hdfs-scan-node-base.h
M be/src/exec/hdfs-scan-node.cc
M be/src/exec/hdfs-scan-node.h
M be/src/exec/hdfs-scanner.cc
M be/src/exec/hdfs-text-scanner.cc
M

[Impala-ASF-CR] Using 'master' branch of Impala-lzo and allowing test-with-docker to configure it.

2019-01-23 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/12259


Change subject: Using 'master' branch of Impala-lzo and allowing 
test-with-docker to configure it.
..

Using 'master' branch of Impala-lzo and allowing test-with-docker to configure 
it.

This updates bootstrap_system.sh to check out the 'master' branch of
Impala-lzo. (I've separately updated the 'master' branch to
be identical to today's cdh5-trunk branch; it had grown a few
years stale.) I've also added support to teasing the configuration
through test-with-docker.

This allows for Impala 2.x and 3.x to diverge here, and it allows
for testing changes to Impala-lzo.

Change-Id: Ieba45fc18d9e490f75d16c477cdc1cce26f41ce9
---
M bin/bootstrap_system.sh
M docker/entrypoint.sh
M docker/test-with-docker.py
3 files changed, 22 insertions(+), 2 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/12259/1
--
To view, visit http://gerrit.cloudera.org:8080/12259
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ieba45fc18d9e490f75d16c477cdc1cce26f41ce9
Gerrit-Change-Number: 12259
Gerrit-PatchSet: 1
Gerrit-Owner: Philip Zeyliger

[Impala-ASF-CR] IMPALA-8091: incremental improvements to NTP sync

2019-01-22 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12253 )

Change subject: IMPALA-8091: incremental improvements to NTP sync
..


Patch Set 1: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/12253
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ifc8cacf81765d132dd574993f8149231bdbb7bc6
Gerrit-Change-Number: 12253
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Brown 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Laszlo Gaal 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Wed, 23 Jan 2019 00:02:20 +
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.

2019-01-22 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12129 )

Change subject: IMPALA-6664: Tag log statements with fragment or query ids.
..


Patch Set 5:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/12129/2/be/src/common/thread-debug-info.h
File be/src/common/thread-debug-info.h:

http://gerrit.cloudera.org:8080/#/c/12129/2/be/src/common/thread-debug-info.h@a116
PS2, Line 116:
> Thanks! Do you plan to fix it in this change, or in a follow-up commit?
Thanks for catching that I didn't do this. I fixed it. Turned out to be nice 
and easy.


http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-hs2-server.cc
File be/src/service/impala-hs2-server.cc:

http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-hs2-server.cc@126
PS5, Line 126: ScopedThreadContext(GetThreadDebugInfo(), query_ctx.query_id);
> Currently it creates a temporary object that is deleted promptly.
Whoops. Thanks for catching this. Done.


http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-internal-service.cc
File be/src/service/impala-internal-service.cc:

http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-internal-service.cc@49
PS5, Line 49:   ScopedThreadContext(GetThreadDebugInfo(), 
params.query_ctx.query_id);
> It creates a temporary that is deleted promptly.
Done



--
To view, visit http://gerrit.cloudera.org:8080/12129
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8
Gerrit-Change-Number: 12129
Gerrit-PatchSet: 5
Gerrit-Owner: Philip Zeyliger 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Paul Rogers 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Reviewer: Zoltan Borok-Nagy 
Gerrit-Reviewer: Zoram Thanga 
Gerrit-Comment-Date: Tue, 22 Jan 2019 20:46:54 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-7731: Add Read/Exchange counters to profile

2019-01-18 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12229 )

Change subject: IMPALA-7731: Add Read/Exchange counters to profile
..


Patch Set 3: Code-Review+2

(4 comments)

Thanks!

http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator-backend-state.cc
File be/src/runtime/coordinator-backend-state.cc:

http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator-backend-state.cc@524
PS3, Line 524: //if (!p->metadata().__isset.plan_node_id) continue;
??


http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator-backend-state.cc@531
PS3, Line 531: RuntimeProfile::Counter* bytes_sent = 
p->GetCounter("TotalBytesSent");
This this be a constant somewhere?


http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator.cc
File be/src/runtime/coordinator.cc:

http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator.cc@790
PS3, Line 790:   // node.
Is it the case that

TotalBytesSent == TotalScanBytesSent + TotalInnerBytesSent

?

Do we have a place to DCHECK assert that and/or test it?


http://gerrit.cloudera.org:8080/#/c/12229/3/tests/query_test/test_observability.py
File tests/query_test/test_observability.py:

http://gerrit.cloudera.org:8080/#/c/12229/3/tests/query_test/test_observability.py@390
PS3, Line 390:   def test_exchange_scan_ratio_non_zero(self):
Should we add small tests for the other new metrics you have?



--
To view, visit http://gerrit.cloudera.org:8080/12229
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: Ife7ec78fe42558429c1cbe6e5eba79842bffd648
Gerrit-Change-Number: 12229
Gerrit-PatchSet: 3
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Csaba Ringhofer 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Comment-Date: Fri, 18 Jan 2019 22:40:15 +
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.

2019-01-18 Thread Philip Zeyliger (Code Review)

Hello Paul Rogers, Zoltan Borok-Nagy, Zoram Thanga, Tim Armstrong, Impala 
Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/12129

to look at the new patch set (#6).

Change subject: IMPALA-6664: Tag log statements with fragment or query ids.
..

IMPALA-6664: Tag log statements with fragment or query ids.

This implements much of the desire in IMPALA-6664 to tag all log
statements with their query ids. It re-uses the existing ThreadDebugInfo
infrastructure as well as the existing
InstallLogMessageListenerFunction() patch to glog (currently used for
log redaction) to prefix log messages with fragment ids or query ids,
when available. The fragment id is the query id with the last bits
incremented, so it's possible to correlate a given query's log messages.
For example:

  $ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO
  I0108 10:39:16.453958 14752 impala-server.cc:1052] 
85420d575b9ff4b9:402b8868] Registered query 
query_id=85420d575b9ff4b9:402b8868 
session_id=aa45e480434f0516:101ae5ac12679d94
  I0108 10:39:16.454738 14752 Frontend.java:1242] 
85420d575b9ff4b9:402b8868] Analyzing query: select count(*) from 
tpcds.web_sales
  I0108 10:39:16.456627 14752 Frontend.java:1282] 
85420d575b9ff4b9:402b8868] Analysis finished.
  I0108 10:39:16.463538 14818 admission-controller.cc:598] 
85420d575b9ff4b9:402b8868] Schedule for 
id=85420d575b9ff4b9:402b8868 in pool_name=default-pool 
per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 
max_mem=-1.00 B
  I0108 10:39:16.463603 14818 admission-controller.cc:603] 
85420d575b9ff4b9:402b8868] Stats: agg_num_running=0, agg_num_queued=0, 
agg_mem_reserved=0,  local_host(local_mem_admitted=0, num_admitted_running=0, 
num_queued=0, backend_mem_reserved=0)
  I0108 10:39:16.463780 14818 admission-controller.cc:635] 
85420d575b9ff4b9:402b8868] Admitted query 
id=85420d575b9ff4b9:402b8868
  I0108 10:39:16.463896 14818 coordinator.cc:93] 
85420d575b9ff4b9:402b8868] Exec() 
query_id=85420d575b9ff4b9:402b8868 stmt=select count(*) from 
tpcds.web_sales
  I0108 10:39:16.464795 14818 coordinator.cc:356] 
85420d575b9ff4b9:402b8868] starting execution on 2 backends for 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:16.466384 24891 impala-internal-service.cc:49] 
ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b8868 
coord=pannier.sf.cloudera.com:22000 #instances=2
  I0108 10:39:16.467339 14818 coordinator.cc:370] 
85420d575b9ff4b9:402b8868] started execution on 2 backends for 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:16.467536 14823 query-state.cc:579] 
85420d575b9ff4b9:402b8868] Executing instance. 
instance_id=85420d575b9ff4b9:402b8868 fragment_idx=0 
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1
  I0108 10:39:16.467627 14824 query-state.cc:579] 
85420d575b9ff4b9:402b88680001] Executing instance. 
instance_id=85420d575b9ff4b9:402b88680001 fragment_idx=1 
per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2
  I0108 10:39:16.820933 14824 query-state.cc:587] 
85420d575b9ff4b9:402b88680001] Instance completed. 
instance_id=85420d575b9ff4b9:402b88680001 #in-flight=1 status=OK
  I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 
85420d575b9ff4b9:402b8868] DeregisterRecvr(): 
fragment_instance_id=85420d575b9ff4b9:402b8868, node=2
  I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: 
host=pannier.sf.cloudera.com:22001 remaining=2 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] 
query_id=85420d575b9ff4b9:402b8868: first in-progress backend: 
pannier.sf.cloudera.com:22000
  I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 
85420d575b9ff4b9:402b8868] get_results_metadata(): 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.168762 14752 coordinator.cc:483] 
85420d575b9ff4b9:402b8868] ExecState: query 
id=85420d575b9ff4b9:402b8868 execution completed
  I0108 10:39:17.168808 14752 coordinator.cc:608] 
85420d575b9ff4b9:402b8868] Coordinator waiting for backends to finish, 
1 remaining. query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.168880 14823 query-state.cc:587] 
85420d575b9ff4b9:402b8868] Instance completed. 
instance_id=85420d575b9ff4b9:402b8868 #in-flight=0 status=OK
  I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): 
last report for 85420d575b9ff4b9:402b8868
  I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: 
host=pannier.sf.cloudera.com:22000 remaining=1 
query_id=85420d575b9ff4b9:402b8868
  I0108 10:39:17.174513 14752 coordinator.cc:814] 
85420d575b9ff4b9:402b8868] Release admission control

[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile

2019-01-18 Thread Philip Zeyliger (Code Review)

Philip Zeyliger has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/12069 )

Change subject: IMPALA-7694: Add host resource usage metrics to profile
..


Patch Set 10: Code-Review+2

What was the bug you found?


--
To view, visit http://gerrit.cloudera.org:8080/12069
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9
Gerrit-Change-Number: 12069
Gerrit-PatchSet: 10
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Philip Zeyliger 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 18 Jan 2019 20:32:36 +
Gerrit-HasComments: No

1 2 3 4 5 6 7 8 9 >

1 - 100 of 855 matches

Mail list logo