[Impala-ASF-CR] IMPALA-8344: Add support for running the minicluster with S3Guard
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/13020 ) Change subject: IMPALA-8344: Add support for running the minicluster with S3Guard .. Patch Set 2: (6 comments) http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/bin/impala-config.sh@312 PS2, Line 312: export S3GUARD_METADATASTORE_IMPL="org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore" I looked into whether there were other interesting impls here and couldn't find any. hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/s3guard/LocalMetadataStore.java doesn't work across processes. If we had something that went cross-process, we wouldn't really need Dynamo in our world. http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh File bin/jenkins/release_cloud_resources.sh: http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@29 PS2, Line 29: # This is currently only implemented for s3. Consider asserting S3GUARD_DYNAMODB_TABLE and S3_BUCKET are non-empty... http://gerrit.cloudera.org:8080/#/c/13020/2/bin/jenkins/release_cloud_resources.sh@37 PS2, Line 37: aws s3 rm --recursive --quiet s3://${S3_BUCKET}${TEST_WAREHOUSE_DIR} This line and line 33 is useful enough that we should either do set +x or echo what we're about to do. http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py File tests/common/impala_test_suite.py: http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@177 PS2, Line 177: cls.filesystem_client = S3Client(S3_BUCKET_NAME) I think we could coalesce this to use HDFS/Hadoop always, thereby maybe removing the S3Client code? http://gerrit.cloudera.org:8080/#/c/13020/2/tests/common/impala_test_suite.py@178 PS2, Line 178: elif IS_ABFS: : # ABFS is implemented via HDFS command line client : cls.filesystem_client = HdfsCommandLineClient("ABFS") : elif IS_ADLS: : cls.filesystem_client = ADLSClient(ADLS_STORE_NAME) Do you know which, if any, of these are getting tested these days? http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py File tests/util/hdfs_util.py: http://gerrit.cloudera.org:8080/#/c/13020/2/tests/util/hdfs_util.py@149 PS2, Line 149: # This client is a wrapper around the hdfs command line. This is useful for filesystems Use pydoc? class Foo(bla): """bla bla bla""" -- To view, visit http://gerrit.cloudera.org:8080/13020 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3c748529a494bb6e70fec96dc031523ff79bf61d Gerrit-Change-Number: 13020 Gerrit-PatchSet: 2 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: David Knupp Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Mon, 29 Apr 2019 18:15:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/13005 ) Change subject: IMPALA-8369 : Impala should be able to interoperate with Hive 3.1.0 .. Patch Set 3: (1 comment) Nice work. When I did the transitional work between Hive 1 and Hive 2, I introduced a variable that switched at build time between the two worlds. (See a203733fac3e1e37df8abeee39a88d187153a8c5 for the revert and "git log --grep IMPALA_MINICLUSTER_PROFILE") If I'm understanding right, the approach here is to produce a single "binary" that works for both worlds? Or at run time do the "original" Hive jars get run? I think both approaches are plausible; just want to make sure we're clear about it. (Is the shading slow? I've seen maven-shade-plugin be very slow...) http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh File bin/impala-config.sh: http://gerrit.cloudera.org:8080/#/c/13005/3/bin/impala-config.sh@167 PS3, Line 167: export CDP_BUILD_NUMBER=994 This number looks like a CDH_BUILD_NUMBER, and is probably from the same name space, but looks like something you hand-wrote. You can ask the same oracle that CDH_BUILD_NUMBER uses for your own number which is guaranteed not to conflict in the future. -- To view, visit http://gerrit.cloudera.org:8080/13005 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I45a4dadbdfe30a02f722dbd917a49bc182fc6436 Gerrit-Change-Number: 13005 Gerrit-PatchSet: 3 Gerrit-Owner: Vihang Karajgaonkar Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sudhanshu Arora Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Comment-Date: Tue, 23 Apr 2019 17:10:25 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8288: avoid overflow causing DCHECK in PrettyPrint.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12702 ) Change subject: IMPALA-8288: avoid overflow causing DCHECK in PrettyPrint. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12702 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I00d9769cf34e2ccd796ec1cf88797c8f8250f718 Gerrit-Change-Number: 12702 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 08 Mar 2019 21:09:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 ) Change subject: IMPALA-8250: Clean up JNI warnings. .. Patch Set 3: > "I've kicked off a suite of our tests with code coverage on. I'm > also going to kick off a set of tests with the JNI checking and > 'exhaustive' tests. Maybe there's coverage, but we're not running > the queries against HBase." > > Did anything new come up in this? > Nothing new. I'm still chasing a sequence of these calls which lead to CallObjectMethod() not having its exception checked... checked_jni_CallVoidMethodV checked_jni_ExceptionOccurred checked_jni_ExceptionClear checked_jni_GetObjectClass checked_jni_FindClass checked_jni_GetMethodID checked_jni_CallObjectMethod I can trigger this with set mem_limit=50m;use functional;insert into table alltypesinsert partition (year, month) /* +noclustered */ select at1.id, at1.bool_col, at1.tinyint_col, at1.smallint_col, at1.int_col, at1.bigint_col, at1.float_col, at1.double_col, at1.date_string_col, at1.string_col, at1.timestamp_col, at1.year, at2.id as month from functional.alltypes at1, functional.alltypes at2; when the sink gets closed after a failure. I don't know where this stack of calls is at this point. I'll keep looking, but I don't want to block on it. -- To view, visit http://gerrit.cloudera.org:8080/12660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f Gerrit-Change-Number: 12660 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Thu, 07 Mar 2019 23:08:14 + Gerrit-HasComments: No
[Impala-ASF-CR] Prototype for a remote read byte cache.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12683 ) Change subject: Prototype for a remote read byte cache. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@63 PS1, Line 63: you'd actually want to integrate this with the general scratch space > Currently we don't have global limits in TmpFileMgr, just per-query limits It seems like spilling is going to be pretty sequential, so spinning cares less about SSD than caching does. That's another angle to reconcile. Anyway--I agree that this is tractable. We don't want to run the user out of disk space, and evicting cache is a very reasonable thing to do. http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@92 PS1, Line 92: useful. The buffer pool code currently provides buffers that are pinned > I think having the pages pinned by default is the right thing for this case Yep. I think the underlying question here is how much we want to "manage" this. In the traditional "local" case, we don't manage the OS buffer cache, and it definitely helps us out. I could definitely imagine a world where Impala has a default cache size of ~4GB that's always devoted to cache. The rest of Impala "managed" (as in, within memlimit) memory gets used by the cache if it's available, and evicted as necessary. The cache here is the "memory user of last resort". Your point on observability is well-taken. Remote reads incur a write-to-cache cost that's kind of unexpected. Fortunately, we presume the disk is fairly idle given that the reads are happening over the network, but it'll be an important thing to surface. -- To view, visit http://gerrit.cloudera.org:8080/12683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5 Gerrit-Change-Number: 12683 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 06 Mar 2019 20:14:26 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Prototype for a remote read byte cache.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12683 ) Change subject: Prototype for a remote read byte cache. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12683/1//COMMIT_MSG@28 PS1, Line 28: The code in this commit builds a cache with the simplest policy I could Note that Kudu has an LRU cache that could be used pretty directly. -- To view, visit http://gerrit.cloudera.org:8080/12683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5 Gerrit-Change-Number: 12683 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 06 Mar 2019 19:12:30 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Prototype for a remote read byte cache.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12683 ) Change subject: Prototype for a remote read byte cache. .. Patch Set 1: I'm using Gerrit as a way to share this. It's certainly not ready for commit, but it may be worthy of discussion. -- To view, visit http://gerrit.cloudera.org:8080/12683 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic312b0f7ac7875e00a3855ef21dce5b8a9aa67c5 Gerrit-Change-Number: 12683 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 06 Mar 2019 18:56:50 + Gerrit-HasComments: No
[Impala-ASF-CR] Prototype for a remote read byte cache.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12683 Change subject: Prototype for a remote read byte cache. .. Prototype for a remote read byte cache. This code (which is far from ready for prime time) attempts to get to a place where a byte cache for remote (either HDFS or S3 or ADLS) reads can be evaluated. This commit message tries to write down what I've figured out so far... Hive LLAP found that it was useful to build such a cache. (See hive.git:llap-server/src/java/org/apache/hadoop/hive/llap/cache/LowLevelCache.java and neighbors.) In their case, they implemented an LRUF policy. They are also caching the uncompressed ORC byte streams, which give them fairly consistent chunk sizes (because the compression buffers are typically all the same size). Alluxio also found it useful to build a cache, and it plugs in as a Hadoop filesystem. Apache Ignite and Quobole Rubix are of interest. And, of course, there's memcached, Redis, ehcache, and so on and on and on... Hive LLAP found that SSDs work best for these caches, since they're seek heavy. I think this implies that they'd be configured differently than scratch dirs, since, in a machine with both SSDs and disk, you may want to keep the cache off the disk. (Or you may want a three-tier policy..) The code in this commit builds a cache with the simplest policy I could implement (FIFO) and backs the cache with a large, memory-mapped file. There is implicitly a two-level cache here: the OS page cache is handling which pages are in memory, and the cache here is handling eviction from the cache altogether. The metadata for the cache is stored in a map in memory, though the linked list management of free space is directly in the buffers. (I currently regret that, but I don't think it affects the evaluation goal; a linked list without pointer arithmetic would have been more pleasant to deal with.) I hooked up the code through ReadFromPosInternal() which was sort of easy. Technically cached files don't need an hdfsFileOpen(), but I've not optimized that. I've not made the cache coalesce overlapping reads. I suspect that in practice it may not be necessary, because how we read tables is pretty consistent, but that's something that needs to be checked. I've not made the cache do "compaction" because the FIFO policy means that there's no fragmentation. When breaking down the function of the cache, a few things have to happen: * Impala has to read through the cache or store things in the cache. I've done this via wrapping ReadFromPosInternal() * Impala has to specify the "value" of certain data; e.g., Parquet footers or index pages are more valuable than data pages of really large fact tables. I've not dealt with this. * The cache has to be configured. I've exposed a pair of flags for directories and how much of the relevant free space to use. If this sort of thing is long-lived, you'd actually want to integrate this with the general scratch space management code in Impala: a spilling query could use disk space that's being used by a cache. (Though even spilling queries should have limits; the cache may be more valuable than the currently spilling query.) * The cache has to expose metrics. I exposed hit/miss counters (in both count and bytes) on the query, but nothing global. * The cache needs to have a notion of locality, so that files don't end up cached N times for an N-node cluster. I've taken advantage of the recent changes to make remote reads pin themselves to certain hosts (to make the remote file handle cache work). I don't know whether that currently leaves us with 3 or 1 copies of most files, but it's enough for evaluation. * The cache needs to have decent threading, since we read from many threads. I've set it up so that you get multiple caches, one per dir, and, furthermore, each cache is split into 8. This roughly models having a pool of 8 threads reading from SSDs. I've not explored this tunable space. I used neither the temporary file manager code nor the buffer pool code. This is out of expedience/ignorance. I suspect the temporary file manager code, and especially its ability to do encryption, would be very useful. The buffer pool code currently provides buffers that are pinned by default, and I didn't want to deal with pinning and unpinning. A more complicated implementation would find much more re-use potential here. I looked into using 2MB Transparent Huge Pages but found that mmap doesn't do both "file-backed" and "THP." There's some performance exploration to be done here, but I've not looked into it. The cache doesn't survive restarts. This seems ok and certainly makes state management easier, since we don't have to worry about consistency for the metadata state in the face of a crash. The underlying file is deleted at creation time. Users may end up getting confused
[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 ) Change subject: IMPALA-8250: Clean up JNI warnings. .. Patch Set 2: (1 comment) Yep, good catch. Thanks. http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc File be/src/exec/hbase-table-scanner.cc: http://gerrit.cloudera.org:8080/#/c/12660/2/be/src/exec/hbase-table-scanner.cc@562 PS2, Line 562: RETURN_ERROR_IF_EXC(env); > I think GetArrayLength does not throw an exception, so this check is probab Done -- To view, visit http://gerrit.cloudera.org:8080/12660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f Gerrit-Change-Number: 12660 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 06 Mar 2019 18:13:58 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.
Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12660 to look at the new patch set (#3). Change subject: IMPALA-8250: Clean up JNI warnings. .. IMPALA-8250: Clean up JNI warnings. Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to (a) checking for exceptions and (b) leaking local references. Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC left and right. I chose not to expand the JniCall infrastructure to handle this more generally at the moment. The leaky local references are a bit harder. In the logs, they show up as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A few of these errors seem to be not in our code. The ones that I've found in our code stemmed from HBaseTableScanner::GetRowKey(): this method uses local references and wasn't returning them. Using a JniLocalFrame seems to have taken care of the warnings. I have added code to skip test_large_strings when JNI checking is enabled. This test takes forever (presumably because JNI is checking bounds on strings very aggressively), and times out. The time out also causes some metric-related checks to fail (since a query is still in flight). Debugging this required customizing my JDK to give stack traces when these warnings occurred. The following diff facilitated this. diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp --- a/src/share/vm/prims/jniCheck.cpp Tue Jan 15 10:43:31 2019 + +++ b/src/share/vm/prims/jniCheck.cpp Wed Feb 27 11:57:13 2019 -0800 @@ -143,11 +143,30 @@ static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations"; static const char * fatal_non_string = "JNI string operation received a non-string"; +// thisone: whether to print every time, or maybe, depending on future +// how many future stacks we want printed (totally racy); helps catch +// missing exception handling if there's a way to tickle that code +// reliably. +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) { + static int fut_stacks = 0; // racy! + if (fut_stacks > 0) { +thisone = true; +fut_stacks--; + } + if (future > 0) fut_stacks = future; + if (thisone) { +frame fr = os::current_frame(); +char buf[6000]; +tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id()); +print_native_stack(tty, fr, thr, buf, sizeof(buf)); + } +} // When in VM state: static void ReportJNIWarning(JavaThread* thr, const char *msg) { tty->print_cr("WARNING in native method: %s", msg); thr->print_stack(); + dump_native_stack(thr, true, 0); } // When in NATIVE state: @@ -199,11 +218,14 @@ tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s", thr->get_pending_jni_exception_check()); thr->print_stack(); + dump_native_stack(thr, true, 10); ) thr->clear_pending_jni_exception_check(); // Just complain once } } + + /** * Add to the planned number of handles. I.e. plus current live & warning threshold */ @@ -254,9 +276,12 @@ tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu", live_handles, planned_capacity); thr->print_stack(); + dump_native_stack(thr, true, 0); ) // Complain just the once, reset to current + warn threshold add_planned_handle_capacity(handles, 0); + } else { +dump_native_stack(thr, false, 0); } } Change-Id: Idd1709f749a764c1d947704bc64306493863b45f --- M be/src/catalog/catalog.cc M be/src/exec/hbase-table-scanner.cc M be/src/exprs/hive-udf-call.cc M be/src/runtime/hbase-table-factory.cc M be/src/service/frontend.cc M be/src/service/frontend.h M tests/query_test/test_insert.py 7 files changed, 55 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/3 -- To view, visit http://gerrit.cloudera.org:8080/12660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f Gerrit-Change-Number: 12660 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] Simplify JNI exception-handling macros.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12659 ) Change subject: Simplify JNI exception-handling macros. .. Patch Set 3: Code-Review+2 Rebase; carrying +2. -- To view, visit http://gerrit.cloudera.org:8080/12659 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c Gerrit-Change-Number: 12659 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 06 Mar 2019 00:59:11 + Gerrit-HasComments: No
[Impala-ASF-CR] Fix code coverage argument to bin/run-all-tests.sh
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12673 ) Change subject: Fix code coverage argument to bin/run-all-tests.sh .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12673 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I0244cae9f67b184c931616c58c42c62cabeb0627 Gerrit-Change-Number: 12673 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 06 Mar 2019 00:50:06 + Gerrit-HasComments: No
[native-toolchain-CR] Add ccache support
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12574 ) Change subject: Add ccache support .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12574/2/docker/all/postinstall.sh File docker/all/postinstall.sh: http://gerrit.cloudera.org:8080/#/c/12574/2/docker/all/postinstall.sh@54 PS2, Line 54: make -j We often use -j$(nproc)... don't know that it matters. -- To view, visit http://gerrit.cloudera.org:8080/12574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieef291d1294a204b0b8da7e7aa4fa642cdd5e144 Gerrit-Change-Number: 12574 Gerrit-PatchSet: 2 Gerrit-Owner: Hector Acosta Gerrit-Reviewer: Hector Acosta Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 05 Mar 2019 21:30:59 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Simplify JNI exception-handling macros.
Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12659 to look at the new patch set (#2). Change subject: Simplify JNI exception-handling macros. .. Simplify JNI exception-handling macros. This simplifies the implementations of ABORT_IF_EXC() and CLEAN_EXIT_IF_EXC() by reducing them to usages of JniUtil::GetJniExceptionMsg() and our typical Status-handling macros. The existing implementations didn't clear the exception before calling additional Java methods, whereas JniUtil::GetJniExceptionMsg() already handled that correctly. I tested this manually by renaming a class in catalogd's startup sequence to a class that doesn't exist and checked that the crash logging was useful, containing both the Java and C++ stack traces with both macros. This is part of IMPALA-8250. Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c --- M be/src/util/jni-util.h 1 file changed, 9 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/12659/2 -- To view, visit http://gerrit.cloudera.org:8080/12659 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c Gerrit-Change-Number: 12659 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.
Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12660 to look at the new patch set (#2). Change subject: IMPALA-8250: Clean up JNI warnings. .. IMPALA-8250: Clean up JNI warnings. Using LIBHDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to (a) checking for exceptions and (b) leaking local references. Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC left and right. I chose not to expand the JniCall infrastructure to handle this more generally at the moment. The leaky local references are a bit harder. In the logs, they show up as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A few of these errors seem to be not in our code. The ones that I've found in our code stemmed from HBaseTableScanner::GetRowKey(): this method uses local references and wasn't returning them. Using a JniLocalFrame seems to have taken care of the warnings. I have added code to skip test_large_strings when JNI checking is enabled. This test takes forever (presumably because JNI is checking bounds on strings very aggressively), and times out. The time out also causes some metric-related checks to fail (since a query is still in flight). Debugging this required customizing my JDK to give stack traces when these warnings occurred. The following diff facilitated this. diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp --- a/src/share/vm/prims/jniCheck.cpp Tue Jan 15 10:43:31 2019 + +++ b/src/share/vm/prims/jniCheck.cpp Wed Feb 27 11:57:13 2019 -0800 @@ -143,11 +143,30 @@ static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations"; static const char * fatal_non_string = "JNI string operation received a non-string"; +// thisone: whether to print every time, or maybe, depending on future +// how many future stacks we want printed (totally racy); helps catch +// missing exception handling if there's a way to tickle that code +// reliably. +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) { + static int fut_stacks = 0; // racy! + if (fut_stacks > 0) { +thisone = true; +fut_stacks--; + } + if (future > 0) fut_stacks = future; + if (thisone) { +frame fr = os::current_frame(); +char buf[6000]; +tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id()); +print_native_stack(tty, fr, thr, buf, sizeof(buf)); + } +} // When in VM state: static void ReportJNIWarning(JavaThread* thr, const char *msg) { tty->print_cr("WARNING in native method: %s", msg); thr->print_stack(); + dump_native_stack(thr, true, 0); } // When in NATIVE state: @@ -199,11 +218,14 @@ tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s", thr->get_pending_jni_exception_check()); thr->print_stack(); + dump_native_stack(thr, true, 10); ) thr->clear_pending_jni_exception_check(); // Just complain once } } + + /** * Add to the planned number of handles. I.e. plus current live & warning threshold */ @@ -254,9 +276,12 @@ tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu", live_handles, planned_capacity); thr->print_stack(); + dump_native_stack(thr, true, 0); ) // Complain just the once, reset to current + warn threshold add_planned_handle_capacity(handles, 0); + } else { +dump_native_stack(thr, false, 0); } } Change-Id: Idd1709f749a764c1d947704bc64306493863b45f --- M be/src/catalog/catalog.cc M be/src/exec/hbase-table-scanner.cc M be/src/exprs/hive-udf-call.cc M be/src/runtime/hbase-table-factory.cc M be/src/service/frontend.cc M be/src/service/frontend.h M tests/query_test/test_insert.py 7 files changed, 56 insertions(+), 10 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/2 -- To view, visit http://gerrit.cloudera.org:8080/12660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f Gerrit-Change-Number: 12660 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12660 ) Change subject: IMPALA-8250: Clean up JNI warnings. .. Patch Set 1: (3 comments) > (3 comments) > > I'm seeing a lot of places in the HBase scanner where we call > GetByteArrayRegion or GetObjectArrayElement without checking for > exceptions (these can both throw ArrayIndexOutOfBoundsExceptions). > I'm not sure why -Xcheck:jni wouldn't detect these, but perhaps be > should do a scrub of the rest of the Impala code to make sure we > are checking for exceptions after each array call (there are > several other array-based JNI ops that throw ArrayIndexOutOfBoundsException > - > https://docs.oracle.com/javase/7/docs/technotes/guides/jni/spec/functions.html). > We can do that in a different patch though since it might be a > larger change. Yeah: this is surprising. I read the jniCheck.cpp code and it should be handling the GetObjectArray... methods. The most likely scenario is that our tests don't have coverage here. Or jniCheck isn't doing its business. I'm going to sprinkle in some of these where it's obvious. I've kicked off a suite of our tests with code coverage on. I'm also going to kick off a set of tests with the JNI checking and 'exhaustive' tests. Maybe there's coverage, but we're not running the queries against HBase. This discussion (from jniCheck.cpp) is also interesting. I didn't see the code that didn't warn about ArrayIndexOutOfBounds, but maybe that's what's happening. /** * Check whether or not a programmer has actually checked for exceptions. According * to the JNI Specification ("jni/spec/design.html#java_exceptions"): * * There are two cases where the programmer needs to check for exceptions without * being able to first check an error code: * * - The JNI functions that invoke a Java method return the result of the Java method. * The programmer must call ExceptionOccurred() to check for possible exceptions * that occurred during the execution of the Java method. * * - Some of the JNI array access functions do not return an error code, but may * throw an ArrayIndexOutOfBoundsException or ArrayStoreException. * * In all other cases, a non-error return value guarantees that no exceptions have been thrown. * * Programmers often defend against ArrayIndexOutOfBoundsException, so warning * for these functions would be pedantic. */ http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12660/1//COMMIT_MSG@9 PS1, Line 9: Using HDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to > It's LIBHDFS_OPTS right? Done http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc File be/src/runtime/hbase-table-factory.cc: http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/runtime/hbase-table-factory.cc@97 PS1, Line 97: if (!s.ok()) LOG(INFO) << "Exception when cleaning up HBase" << s; > nit: space after HBase Done http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/12660/1/be/src/service/frontend.cc@116 PS1, Line 116: ABORT_IF_ERROR(JniUtil::LoadJniMethod(jni_env, fe_class_, &(methods[i]))); > Might be understanding the code wrong, but shouldn't these be global refs s Based on https://docs.oracle.com/javase/8/docs/technotes/guides/jni/spec/design.html#accessing_fields_and_methods , https://stackoverflow.com/questions/2093112/why-i-should-not-reuse-a-jclass-and-or-jmethodid-in-jni and http://www.latkin.org/blog/2016/02/01/jni-object-lifetimes-quick-reference/ (very nice!) and https://developer.android.com/training/articles/perf-jni.html#local_and_global_references say that jmethodid (which is the type here) doesn't need global references, but is invalidated if the class gets unloaded. Fortunately, the class is kept alive because fe_ stores it. I thought fe_class_ had the problem you suggest (it's a local reference), but it turns out that it's not used outside this method. I removed it from the class to make that obvious. -- To view, visit http://gerrit.cloudera.org:8080/12660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f Gerrit-Change-Number: 12660 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 05 Mar 2019 20:34:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Remove unused THROW IF ERROR WITH LOGGING, THROW IF EXC, and RETURN IF EXC.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12658 ) Change subject: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and RETURN_IF_EXC. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12658/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12658/1//COMMIT_MSG@7 PS1, Line 7: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and RETURN_IF_EXC. > nit: Add "IMPALA-8250: " Done http://gerrit.cloudera.org:8080/#/c/12658/1//COMMIT_MSG@9 PS1, Line 9: These macros have no references, so removing them. Found while > So were these removed purely for cleanup purposes, or was there something w Added to the commit message. Some of them were broken. Note how ExceptionClear() is called after CallStaticObjectMethod()... -jthrowable exc = (env)->ExceptionOccurred(); \ -if (exc != nullptr) { \ - DCHECK((throwable_to_string_id_) != nullptr); \ - jstring stack = (jstring) env->CallStaticObjectMethod(JniUtil::jni_util_class(), \ - (JniUtil::throwable_to_stack_trace_id()), exc); \ - jboolean is_copy; \ - const char* c_stack = \ - reinterpret_cast((env)->GetStringUTFChars(stack, _copy)); \ - (env)->ExceptionClear(); \ -- To view, visit http://gerrit.cloudera.org:8080/12658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96 Gerrit-Change-Number: 12658 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 05 Mar 2019 19:07:56 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8250: Remove unused THROW IF ERROR WITH LOGGING, THROW IF EXC, and RETURN IF EXC.
Hello Sahil Takiar, Todd Lipcon, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12658 to look at the new patch set (#2). Change subject: IMPALA-8250: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and RETURN_IF_EXC. .. IMPALA-8250: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and RETURN_IF_EXC. These macros have no references, so removing them. Found while investigating JNI warnings in IMPALA-8250. The following one-liner does a hap-hazard job of finding similar issues. There are relatively few left, and none of them seemed problematic. (for x in $(git grep '#define' be/src | grep -v be/src/kudu | grep -v be/src/gutil | grep -v be/src/thirdparty | grep -v be/src/transport/config.h | awk '{ print $2 }' | sed -e 's,(.*,,' | sort | uniq); do echo -n "$x "; git grep -e $x | wc -l; done) | awk '$2 == 1 { print }' Note that some of the macros were broken, because they called other JNI functions before calling ExceptionClear(). Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96 --- M be/src/util/jni-util.h 1 file changed, 0 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/12658/2 -- To view, visit http://gerrit.cloudera.org:8080/12658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96 Gerrit-Change-Number: 12658 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Reviewer: Todd Lipcon
[Impala-ASF-CR] IMPALA-8100: Add initial support for Ranger
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12632 ) Change subject: IMPALA-8100: Add initial support for Ranger .. Patch Set 6: (1 comment) http://gerrit.cloudera.org:8080/#/c/12632/6/be/src/service/frontend.cc File be/src/service/frontend.cc: http://gerrit.cloudera.org:8080/#/c/12632/6/be/src/service/frontend.cc@41 PS6, Line 41: "org.apache.impala.authorization.sentry.SentryAuthorizationFactory", > I'm actually having a second thought whether it's a good idea to make Autho If we expose it as a pluggable thing, we have to maintain the pluggable API forever. Unless someone is clamoring for it, I'd rather we lock it down by default. Not a blocking concern. -- To view, visit http://gerrit.cloudera.org:8080/12632 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8cad9e609d20aae1ff645c84fd58a02afee70276 Gerrit-Change-Number: 12632 Gerrit-PatchSet: 6 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 05 Mar 2019 05:07:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Simplify JNI exception-handling macros.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12659 Change subject: Simplify JNI exception-handling macros. .. Simplify JNI exception-handling macros. This simplifies the implementations of ABORT_IF_EXC() and CLEAN_EXIT_IF_EXC() by reducing them to usages of JniUtil::GetJniExceptionMsg() and our typical Status-handling macros. The existing implementations didn't clear the exception before calling additional Java methods, whereas JniUtil::GetJniExceptionMsg() already handled that correctly. I tested this manually by renaming a class in catalogd's startup sequence to a class that doesn't exist and checked that the crash logging was useful, containing both the Java and C++ stack traces with both macros. This is part of IMPALA-8250. Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c --- M be/src/util/jni-util.h 1 file changed, 9 insertions(+), 27 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/12659/1 -- To view, visit http://gerrit.cloudera.org:8080/12659 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I38e9e51d3180328935f41bb118647113e5cd409c Gerrit-Change-Number: 12659 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Remove unused THROW IF ERROR WITH LOGGING, THROW IF EXC, and RETURN IF EXC.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12658 Change subject: Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and RETURN_IF_EXC. .. Remove unused THROW_IF_ERROR_WITH_LOGGING, THROW_IF_EXC, and RETURN_IF_EXC. These macros have no references, so removing them. Found while investigating JNI warnings in IMPALA-8250. The following one-liner does a hap-hazard job of finding similar issues. There are relatively few left, and none of them seemed problematic. (for x in $(git grep '#define' be/src | grep -v be/src/kudu | grep -v be/src/gutil | grep -v be/src/thirdparty | grep -v be/src/transport/config.h | awk '{ print $2 }' | sed -e 's,(.*,,' | sort | uniq); do echo -n "$x "; git grep -e $x | wc -l; done) | awk '$2 == 1 { print }' Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96 --- M be/src/util/jni-util.h 1 file changed, 0 insertions(+), 41 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/58/12658/1 -- To view, visit http://gerrit.cloudera.org:8080/12658 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ic63b498e0e4da57c7ec15cf1ad8070a6afdb3d96 Gerrit-Change-Number: 12658 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-8250: Clean up JNI warnings.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12660 Change subject: IMPALA-8250: Clean up JNI warnings. .. IMPALA-8250: Clean up JNI warnings. Using HDFS_OPTS+="-Xcheck:jni" revealed a handful of warnings related to (a) checking for exceptions and (b) leaking local references. Checking for exceptions required sprinkling RETURN_ERROR_IF_EXC left and right. I chose not to expand the JniCall infrastructure to handle this more generally at the moment. The leaky local references are a bit harder. In the logs, they show up as "WARNING: JNI local refs: 2597, exceeds capacity: 35" or similar. A few of these errors seem to be not in our code. The ones that I've found in our code stemmed from HBaseTableScanner::GetRowKey(): this method uses local references and wasn't returning them. Using a JniLocalFrame seems to have taken care of the warnings. I have added code to skip test_large_strings when JNI checking is enabled. This test takes forever (presumably because JNI is checking bounds on strings very aggressively), and times out. The time out also causes some metric-related checks to fail (since a query is still in flight). Debugging this required customizing my JDK to give stack traces when these warnings occurred. The following diff facilitated this. diff -r 76a9c9cf14f1 src/share/vm/prims/jniCheck.cpp --- a/src/share/vm/prims/jniCheck.cpp Tue Jan 15 10:43:31 2019 + +++ b/src/share/vm/prims/jniCheck.cpp Wed Feb 27 11:57:13 2019 -0800 @@ -143,11 +143,30 @@ static const char * fatal_instance_field_mismatch = "Field type (instance) mismatch in JNI get/set field operations"; static const char * fatal_non_string = "JNI string operation received a non-string"; +// thisone: whether to print every time, or maybe, depending on future +// how many future stacks we want printed (totally racy); helps catch +// missing exception handling if there's a way to tickle that code +// reliably. +static inline void dump_native_stack(JavaThread* thr, bool thisone, int future) { + static int fut_stacks = 0; // racy! + if (fut_stacks > 0) { +thisone = true; +fut_stacks--; + } + if (future > 0) fut_stacks = future; + if (thisone) { +frame fr = os::current_frame(); +char buf[6000]; +tty->print_cr("Thread: %s %d", thr->get_thread_name(), thr->osthread()->thread_id()); +print_native_stack(tty, fr, thr, buf, sizeof(buf)); + } +} // When in VM state: static void ReportJNIWarning(JavaThread* thr, const char *msg) { tty->print_cr("WARNING in native method: %s", msg); thr->print_stack(); + dump_native_stack(thr, true, 0); } // When in NATIVE state: @@ -199,11 +218,14 @@ tty->print_cr("WARNING in native method: JNI call made without checking exceptions when required to from %s", thr->get_pending_jni_exception_check()); thr->print_stack(); + dump_native_stack(thr, true, 10); ) thr->clear_pending_jni_exception_check(); // Just complain once } } + + /** * Add to the planned number of handles. I.e. plus current live & warning threshold */ @@ -254,9 +276,12 @@ tty->print_cr("WARNING: JNI local refs: %zu, exceeds capacity: %zu", live_handles, planned_capacity); thr->print_stack(); + dump_native_stack(thr, true, 0); ) // Complain just the once, reset to current + warn threshold add_planned_handle_capacity(handles, 0); + } else { +dump_native_stack(thr, false, 0); } } Change-Id: Idd1709f749a764c1d947704bc64306493863b45f --- M be/src/catalog/catalog.cc M be/src/exec/hbase-table-scanner.cc M be/src/exprs/hive-udf-call.cc M be/src/runtime/hbase-table-factory.cc M be/src/service/frontend.cc M tests/query_test/test_insert.py 6 files changed, 43 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/60/12660/1 -- To view, visit http://gerrit.cloudera.org:8080/12660 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idd1709f749a764c1d947704bc64306493863b45f Gerrit-Change-Number: 12660 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12557 ) Change subject: IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d5bd93b4b09682a8c8248e7aa123d23d27cfeb4 Gerrit-Change-Number: 12557 Gerrit-PatchSet: 3 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 26 Feb 2019 23:30:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8250: Fix a handful of JNI usage errors.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12582 ) Change subject: IMPALA-8250: Fix a handful of JNI usage errors. .. Patch Set 2: Added a JIRA. Yeah; I'm still working on this, but this is the lowest-hanging fruit. -- To view, visit http://gerrit.cloudera.org:8080/12582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6 Gerrit-Change-Number: 12582 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Sahil Takiar Gerrit-Comment-Date: Tue, 26 Feb 2019 21:43:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8250: Fix a handful of JNI usage errors.
Hello Lars Volker, Sahil Takiar, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12582 to look at the new patch set (#2). Change subject: IMPALA-8250: Fix a handful of JNI usage errors. .. IMPALA-8250: Fix a handful of JNI usage errors. Running Impala with LIB_HDFS_OPTS=-Xcheck:jni revealed a case of "FATAL ERROR in native method: Bad global or local ref passed to JNI" as well as 'WARNING in native method: JNI FindClass received a bad class descriptor "Lorg/apache/impala/common/Pair;"' This commit fixes these. There are additional JNI warnings that I'm working through as well as a fundamental question about how to avoid these systematically. Kudos to Lars Volker and Sahil Takiar who co-found these issues. Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6 --- M be/src/catalog/catalog-util.cc M be/src/util/logging-support.cc 2 files changed, 4 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12582/2 -- To view, visit http://gerrit.cloudera.org:8080/12582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6 Gerrit-Change-Number: 12582 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Sahil Takiar
[Impala-ASF-CR] Fix a handful of JNI usage errors.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12582 Change subject: Fix a handful of JNI usage errors. .. Fix a handful of JNI usage errors. Running Impala with LIB_HDFS_OPTS=-Xcheck:jni revealed a case of "FATAL ERROR in native method: Bad global or local ref passed to JNI" as well as 'WARNING in native method: JNI FindClass received a bad class descriptor "Lorg/apache/impala/common/Pair;"' This commit fixes these. There are additional JNI warnings that I'm working through as well as a fundamental question about how to avoid these systematically. Kudos to Lars Volker and Sahil Takiar who co-found these issues. Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6 --- M be/src/catalog/catalog-util.cc M be/src/util/logging-support.cc 2 files changed, 4 insertions(+), 4 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/82/12582/1 -- To view, visit http://gerrit.cloudera.org:8080/12582 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Id3b8bccee20cc882cffa4439e550e1f48347d3a6 Gerrit-Change-Number: 12582 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Allow CMake to find JNI libraries from JDK
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12580 Change subject: Allow CMake to find JNI libraries from JDK .. Allow CMake to find JNI libraries from JDK JNI libraries can be in JAVA_HOME/jre/lib/amd64 or JAVA_HOME/lib/amd64. We were missing one entry in the list of places to look. This came up when I built a custom OpenJDK for myself and wanted to use it for building. Change-Id: I6e9f9e5b96e2a1c3c0b0ad6cae1a34ca22c1ec19 --- M cmake_modules/FindJNI.cmake 1 file changed, 1 insertion(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/80/12580/1 -- To view, visit http://gerrit.cloudera.org:8080/12580 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I6e9f9e5b96e2a1c3c0b0ad6cae1a34ca22c1ec19 Gerrit-Change-Number: 12580 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[native-toolchain-CR] Add ccache support
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12574 ) Change subject: Add ccache support .. Patch Set 1: Code-Review+1 (2 comments) Looks reasonable to me. Nice speed improvement. http://gerrit.cloudera.org:8080/#/c/12574/1/docker/debian7.df File docker/debian7.df: http://gerrit.cloudera.org:8080/#/c/12574/1/docker/debian7.df@43 PS1, Line 43: wget "https://www.samba.org/ftp/ccache/ccache-3.3.3.tar.gz; && \ I recommend moving this into a shell script. We've also tended to sha-check our binaries. E.g.: bin/bootstrap_system.sh:redhat sha512sum -c - <<< 'c8321aa223f70d7e64d3d0274263000cfffb46fbea61488534e26f9f0245d99e9872d0888e35cd3274416392a13f80c748c07750caaeffa5f9cae1220020715f apache-ant-1.9.13-bin.tar.gz' http://gerrit.cloudera.org:8080/#/c/12574/1/init.sh File init.sh: http://gerrit.cloudera.org:8080/#/c/12574/1/init.sh@138 PS1, Line 138: : ${USE_CCACHE=0} : export USE_CCACHE : : : ${CCACHE_MAXSIZE=50G} : export CCACHE_MAXSIZE : : : ${CCACHE_DIR=$SOURCE_DIR/ccache} : export CCACHE_DIR : : # Default ccache_compilercheck is mtime, which considers CC's mtime + size : # to determine if there's a hit. Setting CCACHE_COMPILERCHECK to 'content' : # uses the hash of the compiler instead. : export CCACHE_COMPILERCHECK=${CCACHE_COMPILERCHECK:-content} : : export CCACHE_COMPRESS=1 Do you think these would be better encapsulated in a .ccache.conf file that we create when we bootstrap ccache? -- To view, visit http://gerrit.cloudera.org:8080/12574 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ieef291d1294a204b0b8da7e7aa4fa642cdd5e144 Gerrit-Change-Number: 12574 Gerrit-PatchSet: 1 Gerrit-Owner: hector.aco...@cloudera.com Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 25 Feb 2019 17:36:38 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12557 ) Change subject: IMPALA-8188: Fix DiskInfo::GetDeviceNames() for NVME disks .. Patch Set 1: Could you extract the function and add a unit test that the regexp does what it's supposed to? Thanks! -- To view, visit http://gerrit.cloudera.org:8080/12557 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4d5bd93b4b09682a8c8248e7aa123d23d27cfeb4 Gerrit-Change-Number: 12557 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Fri, 22 Feb 2019 21:15:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8178: Disable file handle cache for HDFS erasure coded files
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12552 ) Change subject: IMPALA-8178: Disable file handle cache for HDFS erasure coded files .. Patch Set 1: Code-Review+2 (2 comments) Thanks. http://gerrit.cloudera.org:8080/#/c/12552/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12552/1//COMMIT_MSG@12 PS1, Line 12: unbuffer(), so these file handles are not suitable for long Maybe mention that we used /jmx to get at at least part of the memory usage? http://gerrit.cloudera.org:8080/#/c/12552/1/be/src/runtime/io/scan-range.cc File be/src/runtime/io/scan-range.cc: http://gerrit.cloudera.org:8080/#/c/12552/1/be/src/runtime/io/scan-range.cc@463 PS1, Line 463: DCHECK(buffer_opts.client_buffer_ == nullptr || I think you could reasonably assert: DCHECK(!(expected_local && is_erasure_coded)) I.e., if something is expected_local, it can't possibly be erasure coded. -- To view, visit http://gerrit.cloudera.org:8080/12552 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I8c761e08aacc952de0033a4c91e07f15c8ec96da Gerrit-Change-Number: 12552 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 21 Feb 2019 23:59:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8168: Disable Sentry HDFS sync on S3
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12482 ) Change subject: IMPALA-8168: Disable Sentry HDFS sync on S3 .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12482 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifa12f163a32bb6e597323149fbe565a6f66f5e69 Gerrit-Change-Number: 12482 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 21 Feb 2019 16:52:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8233: Do not re-download Ranger if it is already downloaded
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12541 ) Change subject: IMPALA-8233: Do not re-download Ranger if it is already downloaded .. Patch Set 1: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12541/1/bin/bootstrap_toolchain.py File bin/bootstrap_toolchain.py: http://gerrit.cloudera.org:8080/#/c/12541/1/bin/bootstrap_toolchain.py@425 PS1, Line 425: if not version: : raise Exception("Could not find version for Ranger in environment var {0}" : .format(env_var_version)) While you're here, do you want to move this to between line 421 and 422, so it's closer? -- To view, visit http://gerrit.cloudera.org:8080/12541 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iec3b200bda11d00bba6a250461b37c599d8d1adf Gerrit-Change-Number: 12541 Gerrit-PatchSet: 1 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 21 Feb 2019 16:54:57 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8234: Fix ordering of Thrift enum, fix enum values, add warning
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12543 ) Change subject: IMPALA-8234: Fix ordering of Thrift enum, fix enum values, add warning .. Patch Set 3: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/12543 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: If215f16a636008757ceb439edbd6900a1be88c59 Gerrit-Change-Number: 12543 Gerrit-PatchSet: 3 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 21 Feb 2019 04:00:01 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8185: Abstract out real/mock file system operations
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12437 ) Change subject: IMPALA-8185: Abstract out real/mock file system operations .. Patch Set 4: (2 comments) Just kibitzing: would be good to move the comments to the interface so people can tell what's going on? http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/FileSystemFacade.java File fe/src/main/java/org/apache/impala/analysis/FileSystemFacade.java: http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/FileSystemFacade.java@64 PS4, Line 64: ScanAllocation allocateScans(FeFsTable table, TScanRangeSpec scanRangeSpecs, Where do we document what this does? http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/HdfsFileSystemFacade.java File fe/src/main/java/org/apache/impala/analysis/HdfsFileSystemFacade.java: http://gerrit.cloudera.org:8080/#/c/12437/4/fe/src/main/java/org/apache/impala/analysis/HdfsFileSystemFacade.java@102 PS4, Line 102: public ScanAllocation allocateScans(FeFsTable table, TScanRangeSpec scanRangeSpecs, Do you know what this is used for? It's kind of different from the C++ version of the same (be/src/scheduling/scheduler.h). I don't think it matters, but it's weirdo. I'd basically be surprised of computeNumNodes() isn't just: max(1, min(numExecutors, numBlocks)) i.e., always at least 1 and at most numExecutors, but determined by the number of blocks. -- To view, visit http://gerrit.cloudera.org:8080/12437 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1a385923b64c9fb59cc6e700ee7ee14919398e6d Gerrit-Change-Number: 12437 Gerrit-PatchSet: 4 Gerrit-Owner: Paul Rogers Gerrit-Reviewer: Bharath Vissapragada Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 20 Feb 2019 22:43:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Logging current database context when logging analyzing queries.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12301 ) Change subject: Logging current database context when logging analyzing queries. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12301/1/fe/src/main/java/org/apache/impala/service/Frontend.java File fe/src/main/java/org/apache/impala/service/Frontend.java: http://gerrit.cloudera.org:8080/#/c/12301/1/fe/src/main/java/org/apache/impala/service/Frontend.java@1243 PS1, Line 1243: + queryCtx.session.database); > In Java, a general rule is to use accessor methods, but since this is Thrif I'm not using accessor methods because that was the local style. It's a little selfish, but if the query doesn't have newlines in it (as many queries don't), then you get one query per log line, which is super handy for grep. I can certainly move the db up, like: Analyzing query (default db: ...): if you think that looks better. -- To view, visit http://gerrit.cloudera.org:8080/12301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70886be4685e0d3ae7ca9f6e57e8159dc39c67c7 Gerrit-Change-Number: 12301 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 20 Feb 2019 19:21:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Adding hostname to Disk I/O errors.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12402 ) Change subject: Adding hostname to Disk I/O errors. .. Patch Set 1: (2 comments) http://gerrit.cloudera.org:8080/#/c/12402/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12402/1//COMMIT_MSG@16 PS1, Line 16: sprinked > nit: typo Done http://gerrit.cloudera.org:8080/#/c/12402/1/be/src/runtime/io/disk-io-mgr.cc File be/src/runtime/io/disk-io-mgr.cc: http://gerrit.cloudera.org:8080/#/c/12402/1/be/src/runtime/io/disk-io-mgr.cc@297 PS1, Line 297: > nit: while you're here, remove double space? Done -- To view, visit http://gerrit.cloudera.org:8080/12402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2 Gerrit-Change-Number: 12402 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 20 Feb 2019 19:09:04 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Adding hostname to Disk I/O errors.
Hello Lars Volker, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12402 to look at the new patch set (#2). Change subject: Adding hostname to Disk I/O errors. .. Adding hostname to Disk I/O errors. I recently ran into some queries that failed like so: WARNINGS: Disk I/O error: Could not open file: /data/...: Error(5): Input/output error These warnings were in the profile, but I had to cross-reference impalad logs to figure out which machine had the broken disk. In this commit, I've sprinkled GetBackendString() to include it. Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2 --- M be/src/exprs/timezone_db.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/error-converter.cc M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-monitored-ops.cc M be/src/runtime/io/local-file-reader.cc M common/thrift/generate_error_codes.py 8 files changed, 23 insertions(+), 20 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/12402/2 -- To view, visit http://gerrit.cloudera.org:8080/12402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2 Gerrit-Change-Number: 12402 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] Adding hostname to Disk I/O errors.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12402 ) Change subject: Adding hostname to Disk I/O errors. .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2 Gerrit-Change-Number: 12402 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 20 Feb 2019 19:09:15 + Gerrit-HasComments: No
[Impala-ASF-CR] Adding hostname to Disk I/O errors.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12402 ) Change subject: Adding hostname to Disk I/O errors. .. Patch Set 2: Trivial carry +2. -- To view, visit http://gerrit.cloudera.org:8080/12402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2 Gerrit-Change-Number: 12402 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 20 Feb 2019 19:09:12 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12501 ) Change subject: IMPALA-8191: Wait for additional breakpad processes during test .. Patch Set 2: Code-Review+2 I think get_pids() works better here. Thanks! -- To view, visit http://gerrit.cloudera.org:8080/12501 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca Gerrit-Change-Number: 12501 Gerrit-PatchSet: 2 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 20 Feb 2019 00:03:08 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12495 ) Change subject: IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh .. Patch Set 2: > I still think we should write down the depended Impala-lzo version > somewhere. Once I want to build an older version of Impala, I don't > need to guess which Impala-lzo commit should I check out to. it's in bin/bootstrap_development.sh, no? > > There's a benifit to do the checkout in buildall.sh. In the future > when I want to build an older version (but >2.13, >3.2), I just > need to checkout to the right tag in Impala and then run > buildall.sh. It'll bootstrap dependencies including the Impala-lzo > to the corresponding versions. > > If you're developing Impala-lzo, you can first commit it locally, > and then change the depended commit id in impala-config-branch.sh. > I think this is the same with other dependencies. For example, if > we want to bump the version of ORC lib, we need to upgrade it in > native-toolchain project and then change the corresponding version > in impala-config.sh. I understand your argument, but I don't think it's acceptable for a "build" script to change branches of git repos. We could treat impala-lzo like we treat the toolchain (i.e., checked out in bootstrap_toolchain.sh), and then I'd sort of see it as a "build artifact" which can be overridden rather than my source code. (BTW, if you choose to cherrypick the num_unqueued_files that drove this mess, we'll be back to using the same impala-lzo code base in both branches...) -- To view, visit http://gerrit.cloudera.org:8080/12495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I5e4b53d695de04d31d39b6909572907944713ba4 Gerrit-Change-Number: 12495 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Sat, 16 Feb 2019 00:34:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8193: Fix python 2.6 issue in junit prune notrun.py
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12479 ) Change subject: IMPALA-8193: Fix python 2.6 issue in junit_prune_notrun.py .. Patch Set 1: Code-Review+2 Sure. -- To view, visit http://gerrit.cloudera.org:8080/12479 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9ef8fb77b1ac8c51e3dfb6b04690ae9ccc490d62 Gerrit-Change-Number: 12479 Gerrit-PatchSet: 1 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Sat, 16 Feb 2019 00:29:33 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12501 ) Change subject: IMPALA-8191: Wait for additional breakpad processes during test .. Patch Set 1: (3 comments) Ok. Thanks for explaining. I'm fine with the tactical fix, but I think doing process.get_pids() is probably easy enough and will be less confusing. http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py File tests/common/impala_cluster.py: http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py@321 PS1, Line 321: if set(self.cmd) == set(process.cmdline): Do you know why this is set? http://gerrit.cloudera.org:8080/#/c/12501/1/tests/common/impala_cluster.py@322 PS1, Line 322: return pid Instead of returning pid here, you could collect all matches, and then either pick the better one (parent) or fail? Would that have made this more obvious? http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py@88 PS1, Line 88: def wait_for_all_processes_dead(self, processes, timeout=300): : # For every process in the list we might see the original Impala process plus a forked : # off child that is writing the minidump. To catch both we go through the list twice. : for process in processes * 2: > These are Process objects (https://github.com/apache/impala/blob/master/tes OMG. I would never have thought that "Process" is really "output of pgrep". How would you feel about changing this to process.get_pids() here, so that it's more obvious that a process can get represented multiple times in this case, and we have an answer? -- To view, visit http://gerrit.cloudera.org:8080/12501 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca Gerrit-Change-Number: 12501 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Sat, 16 Feb 2019 00:27:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8191: Wait for additional breakpad processes during test
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12501 ) Change subject: IMPALA-8191: Wait for additional breakpad processes during test .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py File tests/custom_cluster/test_breakpad.py: http://gerrit.cloudera.org:8080/#/c/12501/1/tests/custom_cluster/test_breakpad.py@88 PS1, Line 88: def wait_for_all_processes_dead(self, processes, timeout=300): : # For every process in the list we might see the original Impala process plus a forked : # off child that is writing the minidump. To catch both we go through the list twice. : for process in processes * 2: I don't understand this. processes here is a list of pids, and we wait() on each of them. Why does waiting on them multiple times help things? It sounded from your description in the commit message is that the issue is where we enumarate which process to wait for (the impalad or its fork), but that's not what this code is doing. I'm sure I'm confused; let me know where! -- To view, visit http://gerrit.cloudera.org:8080/12501 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia4dcc5fecb9b5f38ae1504aae40f099837cf1bca Gerrit-Change-Number: 12501 Gerrit-PatchSet: 1 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 15 Feb 2019 23:00:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12497 ) Change subject: IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh .. Patch Set 3: Please see my reviews in https://gerrit.cloudera.org/#/c/12495/. I don't think it's appropriate for buildall to move git checkouts around. -- To view, visit http://gerrit.cloudera.org:8080/12497 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I5e4b53d695de04d31d39b6909572907944713ba4 Gerrit-Change-Number: 12497 Gerrit-PatchSet: 3 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 15 Feb 2019 17:46:38 + Gerrit-HasComments: No
[Impala-ASF-CR](2.x) IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12495 ) Change subject: IMPALA-8204: checkout to the right commit of Impala-lzo in buildall.sh .. Patch Set 2: (2 comments) I don't think this approach is wise. We have two places where we check out Impala-lzo: a) bin/bootstrap_system.sh: git clone --branch master https://github.com/cloudera/impala-lzo.git "$IMPALA_LZO_HOME" b) Jenkins jobs. I think it's ok to move a git branch when doing the checkout, but it's not so nice to move the branch when you're doing buildall. (For example, if I were to need to make a change to impala-lzo, this would confuse me incredibly, since buildall would be undoing my changes!) http://gerrit.cloudera.org:8080/#/c/12495/2/bin/impala-config-branch.sh File bin/impala-config-branch.sh: http://gerrit.cloudera.org:8080/#/c/12495/2/bin/impala-config-branch.sh@26 PS2, Line 26: IMPALA_LZO_VERSION=8a984e0 Do you want to use a long git hash or a tag or a branch name instead? The short git hash is sort of the worst of all worlds, since it's not guaranteed to work if there's a collision. http://gerrit.cloudera.org:8080/#/c/12495/2/buildall.sh File buildall.sh: http://gerrit.cloudera.org:8080/#/c/12495/2/buildall.sh@337 PS2, Line 337: # Checkout to the right version of Impala-lzo. IMPALA_LZO_VERSION is set in : # bin/impala-config-branch.sh : if [[ -e "$IMPALA_LZO" ]]; then : echo "Checkout to commit $IMPALA_LZO_VERSION for Impala-lzo repo in $IMPALA_LZO" : (pushd "$IMPALA_LZO" && git checkout "$IMPALA_LZO_VERSION") : fi I don't think it's appropriate for buildall.sh to move a repo around like this. -- To view, visit http://gerrit.cloudera.org:8080/12495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: 2.x Gerrit-MessageType: comment Gerrit-Change-Id: I5e4b53d695de04d31d39b6909572907944713ba4 Gerrit-Change-Number: 12495 Gerrit-PatchSet: 2 Gerrit-Owner: Quanlong Huang Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 15 Feb 2019 17:45:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-8099: Update the build scripts to support Apache Ranger
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12469 ) Change subject: IMPALA-8099: Update the build scripts to support Apache Ranger .. Patch Set 3: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b Gerrit-Change-Number: 12469 Gerrit-PatchSet: 3 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Anonymous Coward (402) Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 15 Feb 2019 17:13:35 + Gerrit-HasComments: No
[Impala-ASF-CR] Remove some noisier clang-tidy rules
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12470 ) Change subject: Remove some noisier clang-tidy rules .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12470 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I12d61dccff0f9772165c10238ae178208a100706 Gerrit-Change-Number: 12470 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 13 Feb 2019 23:02:13 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8194: wait longer to detect JVM pause in TestPauseMonitor.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12475 ) Change subject: IMPALA-8194: wait longer to detect JVM pause in TestPauseMonitor. .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12475 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I735c0c0ecfd3a9099c9cef332c5e79854bec7b8d Gerrit-Change-Number: 12475 Gerrit-PatchSet: 1 Gerrit-Owner: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 13 Feb 2019 22:56:08 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8099: Update the build scripts to support Apache Ranger
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12469 ) Change subject: IMPALA-8099: Update the build scripts to support Apache Ranger .. Patch Set 2: (8 comments) If you have a chance, I'd love for you to see that test-with-docker doesn't bail out with this change. No major issues; just some questions below. http://gerrit.cloudera.org:8080/#/c/12469/2/bin/create-test-configuration.sh File bin/create-test-configuration.sh: http://gerrit.cloudera.org:8080/#/c/12469/2/bin/create-test-configuration.sh@204 PS2, Line 204: p -f "${RANGER_TEST_CONF_DIR}/java_home.sh" "${RANGER_SERVER_CONF_DIR}" : cp -f "${RANGER_TEST_CONF_DIR}/ranger-admin-env-hadoopconfdir.sh" \ : "${RANGER_SERVER_CONF_DIR}" : cp -f "${RANGER_TEST_CONF_DIR}/ranger-admin-env-logdir.sh" \ : "${RANGER_SERVER_CONF_DIR}" : cp -f "${RANGER_TEST_CONF_DIR}/ranger-admin-env-piddir.sh" \ : "${RANGER_SERVER_CONF_DIR}" : cp -f "${RANGER_TEST_CONF_DIR}/security-applicationContext.xml" \ : "${RANGER_SERVER_CONF_DIR}" : cp -f "${RANGER_TEST_CONF_DIR}/core-site.xml" "${RANGER_SERVER_CONF_DIR}" Is this "cp ${RANGE_TEST_CONF_DIR}/* $RANGER_SERVER_CONF_DIR"? http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/bin/run-ranger-server.sh File testdata/bin/run-ranger-server.sh: http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/bin/run-ranger-server.sh@31 PS2, Line 31: JAVA_OPTS="-Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=30130" \ Is it intentional that this has debug ports open by default? http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/bin/run-ranger-server.sh@32 PS2, Line 32: "${RANGER_HOME}"/ews/ranger-admin-services.sh restart Does this have an XmX setting? If we don't, these things eat a ton of RAM on machines with a lot of RAM. http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/install.properties.template File testdata/cluster/ranger/install.properties.template: http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/install.properties.template@37 PS2, Line 37: #SQL_CONNECTOR_JAR=/opt/sqlanywhere17/java/sajdbc4.jar This is super noisy with a lot of things commented out. http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/java_home.sh File testdata/cluster/ranger/java_home.sh: http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/java_home.sh@1 PS2, Line 1: export JAVA_HOME=${JAVA_HOME} Why is this necessary? We have JAVA_HOME configured anyway... http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-default-site.xml.template File testdata/cluster/ranger/ranger-admin-default-site.xml.template: http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-default-site.xml.template@510 PS2, Line 510: tzL1AKl5uc4NKYaoQ4P3WLGIBFPXWPWdu1fRm9004jtQiV Is this a problem? http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-hadoopconfdir.sh File testdata/cluster/ranger/ranger-admin-env-hadoopconfdir.sh: http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-hadoopconfdir.sh@1 PS2, Line 1: export RANGER_HADOOP_CONF_DIR=/etc/hadoop/conf Don't we need to substitute this? http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-logdir.sh File testdata/cluster/ranger/ranger-admin-env-logdir.sh: http://gerrit.cloudera.org:8080/#/c/12469/2/testdata/cluster/ranger/ranger-admin-env-logdir.sh@1 PS2, Line 1: export RANGER_ADMIN_LOG_DIR=${RANGER_HOME}/ews/logs We put sentry logs somewhere else, I think... -- To view, visit http://gerrit.cloudera.org:8080/12469 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I249cd64d74518946829e8588ed33d5ac454ffa7b Gerrit-Change-Number: 12469 Gerrit-PatchSet: 2 Gerrit-Owner: Fredy Wijaya Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 13 Feb 2019 19:46:52 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4018 Part1: Add FORMAT clause in CAST()
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12267 ) Change subject: IMPALA-4018 Part1: Add FORMAT clause in CAST() .. Patch Set 6: I don't know if it's plausible, but it may make sense to get the query generator to generate some of these casts in a randomized fashion. (See tests/comparison.) -- To view, visit http://gerrit.cloudera.org:8080/12267 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ia514aaa9e8f5487d396587d5ed24c7348a492697 Gerrit-Change-Number: 12267 Gerrit-PatchSet: 6 Gerrit-Owner: Gabor Kaszab Gerrit-Reviewer: Attila Jeges Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Gabor Kaszab Gerrit-Reviewer: Greg Rahn Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 13 Feb 2019 17:24:57 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8186: script to configure docker network
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12452 ) Change subject: IMPALA-8186: script to configure docker network .. Patch Set 2: (4 comments) No objections here; just thinking out loud about how some of this stuff works. Fine by me to make incremental progress. http://gerrit.cloudera.org:8080/#/c/12452/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12452/2//COMMIT_MSG@15 PS2, Line 15: "start-impala-cluster.py --docker_network=network-name" Should the network name end up in the env variables too? http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh File docker/configure_test_network.sh: http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh@23 PS2, Line 23: # cluster configurations need to be regenerated, all minicluster processes restarted, nit, extra space. http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh@45 PS2, Line 45: GATEWAY=$(docker network inspect "$NETWORK_NAME" | grep -o '"Gateway": "[0-9\.]*"' | \ : grep -o "[0-9.]*") I don't know if the following makes you any happier: $docker network inspect bridge -f '{{(index .IPAM.Config 0).Gateway}}' 10.250.0.1 http://gerrit.cloudera.org:8080/#/c/12452/2/docker/configure_test_network.sh@50 PS2, Line 50: echo "export INTERNAL_LISTEN_HOST=${GATEWAY}" >> bin/impala-config-local.sh This clearly persists across restarts. Does the "docker network create" command persist across reboots? Is Docker deterministic about the gateway? -- To view, visit http://gerrit.cloudera.org:8080/12452 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icb4854aa951bcad7087a9653845b22ffd862057d Gerrit-Change-Number: 12452 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Feb 2019 19:02:19 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Update toolchain to support ubuntu 18.04
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12421 ) Change subject: Update toolchain to support ubuntu 18.04 .. Patch Set 1: Code-Review+2 This is fine by me. -- To view, visit http://gerrit.cloudera.org:8080/12421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ie25c8cb129c6817a2e116f31853ae64c5a8acfe9 Gerrit-Change-Number: 12421 Gerrit-PatchSet: 1 Gerrit-Owner: hector.aco...@cloudera.com Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Mon, 11 Feb 2019 20:17:27 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-8175: improve tests minicluster obj
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12412 ) Change subject: IMPALA-8175: improve tests_minicluster_obj .. Patch Set 2: Code-Review+2 (1 comment) http://gerrit.cloudera.org:8080/#/c/12412/2/tests/comparison/cluster.py File tests/comparison/cluster.py: http://gerrit.cloudera.org:8080/#/c/12412/2/tests/comparison/cluster.py@846 PS2, Line 846: def find_pid(self): : # Need to filter results to avoid pgrep picking up its parent bash script. : # Test with: : # sh -c "pgrep -l -f 'impala.*21050' | grep [i]mpalad | grep -o '^[0-9]*' || true" : pid = self.shell("pgrep -l -f 'impalad.*%s' | grep [i]mpalad | " : "grep -o '^[0-9]*' || true" % self.hs2_port) : if pid: : return int(pid) I'm fine with this, but this is $fuser 21050/tcp 21050/tcp:2820 or possibly $pgrep impalad | head -n 1 2820 -- To view, visit http://gerrit.cloudera.org:8080/12412 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I558b3157bb16ef3d169c0d3e795e03700a17ffe4 Gerrit-Change-Number: 12412 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Brown Gerrit-Reviewer: Andrew Sherman Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Brown Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 08 Feb 2019 17:47:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Adding hostname to Disk I/O errors.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12402 Change subject: Adding hostname to Disk I/O errors. .. Adding hostname to Disk I/O errors. I recently ran into some queries that failed like so: WARNINGS: Disk I/O error: Could not open file: /data/...: Error(5): Input/output error These warnings were in the profile, but I had to cross-reference impalad logs to figure out which machine had the broken disk. In this commit, I've sprinked GetBackendString() to include it. Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2 --- M be/src/exprs/timezone_db.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/disk-io-mgr.cc M be/src/runtime/io/error-converter.cc M be/src/runtime/io/hdfs-file-reader.cc M be/src/runtime/io/hdfs-monitored-ops.cc M be/src/runtime/io/local-file-reader.cc M common/thrift/generate_error_codes.py 8 files changed, 22 insertions(+), 19 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/02/12402/1 -- To view, visit http://gerrit.cloudera.org:8080/12402 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ib977d2c0983ef81ab1338de090239ed57f3efde2 Gerrit-Change-Number: 12402 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Add raw values to pretty-printed time profiles.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12403 Change subject: Add raw values to pretty-printed time profiles. .. Add raw values to pretty-printed time profiles. Impala text profiles pretty print time values, like "1h3m" and so on. This is handy, but it makes it miserable to programmatically find, say, large values without writing a parser for the pretty-printer. This commit includes the raw values when printing, with the units those raw values come in. (We don't put the units for "bytes", but whereas most metrics measured in bytes are actually measured in bytes, we mix nano and milli seconds in several places.) Change-Id: I3034482b3643968ac9e931bb28692e68b5a709c2 --- M be/src/util/pretty-printer.h 1 file changed, 4 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/03/12403/1 -- To view, visit http://gerrit.cloudera.org:8080/12403 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I3034482b3643968ac9e931bb28692e68b5a709c2 Gerrit-Change-Number: 12403 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-8147: part 1/2: make make *.sh redundant.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12368 ) Change subject: IMPALA-8147: part 1/2: make make_*.sh redundant. .. Patch Set 2: Code-Review+2 (2 comments) Looks good to me. I took a look at bin/jenkins/build-all-flag-combinations.sh and it's not obvious that we should add things in there. http://gerrit.cloudera.org:8080/#/c/12368/2/buildall.sh File buildall.sh: http://gerrit.cloudera.org:8080/#/c/12368/2/buildall.sh@389 PS2, Line 389: generate_cmake_files $build_type Are we trying to make repeated invocations of buildall faster in any way? This re-creates all the cmake stuff, which maybe makes everything invalid? But maybe that was already happening before... http://gerrit.cloudera.org:8080/#/c/12368/2/buildall.sh@427 PS2, Line 427: if [[ ("$build_type" == "ADDRESS_SANITIZER") \ : || ("$build_type" == "TIDY") \ : || ("$build_type" == "UBSAN") \ : || ("$build_type" == "UBSAN_FULL") \ : || ("$build_type" == "TSAN") ]]; then : CMAKE_ARGS+=(-DCMAKE_TOOLCHAIN_FILE=$IMPALA_HOME/cmake_modules/clang_toolchain.cmake) : else : CMAKE_ARGS+=(-DCMAKE_TOOLCHAIN_FILE=$IMPALA_HOME/cmake_modules/toolchain.cmake) : fi It's tempting to push this into CMake itself. Most of this file is passing arguments around, but this is really logic. When I worked on one of these sanitizers, I got super super confused that make_impala had this logic. Anyway, fine not to do, but kibitzing. -- To view, visit http://gerrit.cloudera.org:8080/12368 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I70e4f65712166348ca006bc68e1a1e18e853d3a0 Gerrit-Change-Number: 12368 Gerrit-PatchSet: 2 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 07 Feb 2019 23:22:08 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Enable full stacktrace logging with surefire
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12370 ) Change subject: Enable full stacktrace logging with surefire .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12370 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I17fb7c72ff39dec1b0cbc7af9df3f61a9abc52b7 Gerrit-Change-Number: 12370 Gerrit-PatchSet: 1 Gerrit-Owner: Bharath Vissapragada Gerrit-Reviewer: Fredy Wijaya Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 05 Feb 2019 19:16:47 + Gerrit-HasComments: No
[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 ) Change subject: test-with-docker: decrease image size by "de-duping" HDFS. .. Patch Set 2: > I think this change makes sense. Can you rebase? done -- To view, visit http://gerrit.cloudera.org:8080/11782 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1 Gerrit-Change-Number: 11782 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 05 Feb 2019 18:16:21 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7928: Consistent remote read scheduling
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12037 ) Change subject: IMPALA-7928: Consistent remote read scheduling .. Patch Set 8: Code-Review+2 (3 comments) I'm comfortable with this. Nice work! I'm still thinking a bit about end-to-end testing here. I think we should keep thinking about Tim's docker work to see if we can have a style of test that create remotes read because of containers. We can delay that. http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h File be/src/scheduling/hash-ring.h: http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@106 PS5, Line 106: /// impact the actual allocations. > I think doing some work to reduce our use of strings in the scheduler would I've not seen evidence that scheduling is slow at the moment; I think it's fine to leave this alone. http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@111 PS5, Line 111: }; > Two thoughts here: I thought about this in the previous review. I don't think it's a big deal. The scheduler tries to spread out the load across replicas. If a block has 1 or 2 fake replicas instead of 3, the scheduler will be able to even it out. http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h File be/src/scheduling/scheduler.h: http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/scheduler.h@191 PS5, Line 191: /// the file name / offset from 'hdfs_file_split' multiple times and look up the > That is a good idea. We want big files to get split up, and this shouldn't Great find, Lars. -- To view, visit http://gerrit.cloudera.org:8080/12037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45 Gerrit-Change-Number: 12037 Gerrit-PatchSet: 8 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 05 Feb 2019 17:38:10 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Revert "IMPALA-8146: Remove make {debug,release,asan}.sh"
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12335 ) Change subject: Revert "IMPALA-8146: Remove make_{debug,release,asan}.sh" .. Patch Set 1: Code-Review+2 I recommend self-verifying and pushing ahead. -- To view, visit http://gerrit.cloudera.org:8080/12335 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ibcb921224071eef2dc77cf7b9f4c618f88aaa7c1 Gerrit-Change-Number: 12335 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 01 Feb 2019 19:29:02 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 9: Verified+1 Code-Review+2 Carrying +2. I've run tests on this manually, so I'm manually verifying. This needs a push to impala-lzo at the same time, so wouldn't pass the pre-commit anyway. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 9 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 01 Feb 2019 19:03:48 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_submissions_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change and 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. I happened to rename HdfsScanNode::file_type_counts_ to HdfsScanNode::file_type_counts_lock_ because HdfsScanNodeBase::file_type_counts_ also exists, and is totally different. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Reviewed-on: http://gerrit.cloudera.org:8080/12097 Reviewed-by: Philip Zeyliger Tested-by: Philip Zeyliger --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M
[Impala-ASF-CR] test-with-docker: decrease image size by "de-duping" HDFS.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11782 ) Change subject: test-with-docker: decrease image size by "de-duping" HDFS. .. Patch Set 1: (1 comment) > (1 comment) > > You know what somebody once said about shell scripts getting too > big? Yes, I definitely think this is crossing that threshold, but don't want to re-write it quite yet. http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh File docker/entrypoint.sh: http://gerrit.cloudera.org:8080/#/c/11782/1/docker/entrypoint.sh@228 PS1, Line 228: set +x > Do you need to keep this? Yes, this avoids a lot of verbosity for every copy. It's undone in line 238. -- To view, visit http://gerrit.cloudera.org:8080/11782 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I4a13910ba5e873c31893dbb810a8410547adb2f1 Gerrit-Change-Number: 11782 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Michal Ostrowski Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Thu, 31 Jan 2019 22:47:12 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7928: Consistent remote read scheduling
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12037 ) Change subject: IMPALA-7928: Consistent remote read scheduling .. Patch Set 5: (12 comments) Some more comments. i think this is pretty close. I think we can make the hashing simpler. http://gerrit.cloudera.org:8080/#/c/12037/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12037/5//COMMIT_MSG@12 PS5, Line 12: cache that is at the file level). There is no other cache at the file level, so i think this parenthetical confuses more than helps? http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.cc File be/src/scheduling/backend-config.cc: http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/backend-config.cc@22 PS5, Line 22: // TODO: determine the appropriate number of replicas for the hash-ring Are we going to do this, or leave for the future? Fine to leave; just checking that you didn't forget about this accidentally. http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc File be/src/scheduling/hash-ring-test.cc: http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc@31 PS5, Line 31: class HashRingTest : public ::testing::Test { I don't see a test for GetNode(n) here. Something pretty basic should work. If you're willing to peek at internals, checking that the "overflow" works correctly and that picking n around the biggest address works is working correctly. If we want, we can write crash tests for Adding and Removing nodes that already exist or don't exist, though I'm ok without those. http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring-test.cc@39 PS5, Line 39: EXPECT_EQ(hash_ring.GetTotalReplicas(), num_nodes * hash_ring.GetNumReplicas()); This assumes no collisions? http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h File be/src/scheduling/hash-ring.h: http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@42 PS5, Line 42: and expects the caller to do its own hashing for the item. Does this conflict with the TODO about "function pointer that generates a hash value"? http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@56 PS5, Line 56: : num_replicas_(num_replicas) {} assert that num_replicas >= 1? http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@58 PS5, Line 58: /// Copy constructor Why do we need a copy constructor? We need it to copy the backend, eh? http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@64 PS5, Line 64: for (auto it = hash_ring.node_hashmap_.begin(); : it != hash_ring.node_hashmap_.end(); : it++) { : uint32_t hash_value = it->first; : NodeIterator old_node_it = it->second; : // Look up the equivalent node iterator in the new nodes_ set. : NodeIterator new_node_it = nodes_.find(*old_node_it); : DCHECK(new_node_it != nodes_.end()); : node_hashmap_.insert(std::pair(hash_value, new_node_it)); : } Should this be moved out of the header? http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.h@90 PS5, Line 90: /// value. Add: The hash ring must not be empty. (I do wonder if we can get in trouble if the executors haven't joined the coordinators yet...) http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc File be/src/scheduling/hash-ring.cc: http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@78 PS5, Line 78: DCHECK_GT(num_removed, 0); If we have just the right hash collisions, it's possible that num_removed == 0. I don't think this affects our correctness (the node just never gets any work scheduled to it, and it's unlikely). But am I understanding this right? http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/scheduling/hash-ring.cc@91 PS5, Line 91: return &(*node_it); If node_it == node_hashmap_.end(), should you return nullptr here? In debug mode, DCHECK_GT will prevent this from reaching, so it probably doesn't matter. http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/util/hash-util.h File be/src/util/hash-util.h: http://gerrit.cloudera.org:8080/#/c/12037/5/be/src/util/hash-util.h@290 PS5, Line 290: class MultipleHashGenerator { I suspect we can just use Rehash32to32 above. -- To view, visit http://gerrit.cloudera.org:8080/12037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Icbf74088a8bd8c285ab7285ea3a01acd1bb53a45 Gerrit-Change-Number: 12037 Gerrit-PatchSet: 5 Gerrit-Owner: Joe McDonnell Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-6932: Speed up scans for sequence datasets with many files
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 6: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 6 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 30 Jan 2019 23:29:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6932: Speed up scans for sequence datasets with many files
Philip Zeyliger has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: IMPALA-6932: Speed up scans for sequence datasets with many files .. IMPALA-6932: Speed up scans for sequence datasets with many files This change addresses the slow scans of sequence datasets with many files by enqueueing the scan ranges to the head of the disk IO queue instead of the tail. This ensures that the data ranges get priority over headers of other files. Hence it produces results earlier for limit queries. Testing: Added a unit test to verify that the expected elements are dequeued from the front. Tested the performance of this patch on S3 to emulate remote reads. The following query was executed several times: "SELECT * FROM TPCH_AVRO.LINEITEM LIMIT 1;" The average timeline difference was 8.66s vs 5.87s. The scanner I/O wait time went down from 2.37s to 9.85s. Tested the patch with backend and end-to-end tests. Single node performance test results: +--++-++++ | Workload | File Format| Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +--++-++++ | TPCH(50) | avro / none / none | 65.62 | -0.38% | 43.51 | -0.79% | +--++-++++ Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Reviewed-on: http://gerrit.cloudera.org:8080/11517 Reviewed-by: Philip Zeyliger Tested-by: Philip Zeyliger --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M be/src/runtime/io/disk-io-mgr-stress.cc M be/src/runtime/io/disk-io-mgr-test.cc M be/src/runtime/io/request-context.cc M be/src/runtime/io/request-context.h M be/src/util/internal-queue-test.cc M be/src/util/internal-queue.h 13 files changed, 159 insertions(+), 115 deletions(-) Approvals: Philip Zeyliger: Looks good to me, approved; Verified -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 7 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 8: Code-Review+2 Carrying +2. Rebase was pretty straight-forward. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 30 Jan 2019 22:41:35 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6932: Speed up scans for sequence datasets with many files
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/11517 ) Change subject: IMPALA-6932: Speed up scans for sequence datasets with many files .. Patch Set 6: Code-Review+2 Pooja--by pushing this rebase, I may have ruined your GVD job. If it passes, and you need help pushing this along, please let me know. (I'm carrying the +2; I was doing this rebase to work out a conflict for my in progress change.) -- To view, visit http://gerrit.cloudera.org:8080/11517 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I211e2511ea3bb5edea29f1bd63e6b1fa4c4b1965 Gerrit-Change-Number: 11517 Gerrit-PatchSet: 6 Gerrit-Owner: Pooja Nilangekar Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Pooja Nilangekar Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 30 Jan 2019 22:40:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#8). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_submissions_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change and 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. I happened to rename HdfsScanNode::file_type_counts_ to HdfsScanNode::file_type_counts_lock_ because HdfsScanNodeBase::file_type_counts_ also exists, and is totally different. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 7: (1 comment) http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/7/be/src/exec/hdfs-scan-node-base.h@279 PS7, Line 279: Issuances > Nit: Can this match up as "Submissions"? Done -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 7 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 30 Jan 2019 21:58:31 + Gerrit-HasComments: Yes
[native-toolchain-CR] Initial support for building the toolchain in docker
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12285 ) Change subject: Initial support for building the toolchain in docker .. Patch Set 2: (15 comments) Thanks! When you say we're pinning the OS, does this actually pin the packages? Does it look like the libc (or whatever) we're getting from pinning to 6.6 actually move? Unfortunately, I bet it does, so we have to be careful about how to use it. That's the underlying reason we need to be able to build the toolchain against explicit images. http://gerrit.cloudera.org:8080/#/c/12285/2/Makefile File Makefile: http://gerrit.cloudera.org:8080/#/c/12285/2/Makefile@2 PS2, Line 2: STAMP_DIR=$(BUILD_DIR)/stamp copyright, etc. http://gerrit.cloudera.org:8080/#/c/12285/2/Makefile@15 PS2, Line 15: ./in-docker.py $(IN_DOCKER_ARGS) $(@F) -- ./buildall.sh |sed -s 's/^/$(@F): /' Does that sed lose error codes because of pipefail-equivalent issues? (My quick testing suggests that it does.) http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py File docker/all/assert-dependencies-present.py: http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@17 PS2, Line 17: # native toolchain. Must be python2.6 compatible. Worth a note about why it must be 2.6 compatible. http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@25 PS2, Line 25: def run(cmd): I think you can call this check_output() since it's essentailly a re-implementation of subprocess.check_output() if we're on 2.6, which doesn't have it, if I remember correctly... http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@34 PS2, Line 34: for s in l: : if re.match(regex, s): : return True : return False I don't know if it's better, but this is basically any and map. >>> import re >>> r = re.compile('a') >>> any(map(r.match, ["a", "b"])) True >>> any(map(r.match, ["c", "b"])) False http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@41 PS2, Line 41: patterns = ['libdb.*.so', : 'libkrb.*.so', : 'libncurses.so', : 'libsasl.*.so', : 'libcrypto.so', : 'libz.so'] There are dots here that are wildcards and dots here that are real dots. Should we be correct about it? http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@47 PS2, Line 47: out = run(['ldconfig', '-p'])[0] : libraries = [] : for line in out.splitlines(): : libraries.append(line.split()[0]) I'm ambivalent, but: libraries = [ line.split()[0] for line in check_output(["ldconfig", "-p"]).splitlines() ] http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@77 PS2, Line 77: print 'Checking program: %s' % p use logging? http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@79 PS2, Line 79: sys.exit('Unable to find %s in PATH' % p) Use exceptions (and line 54)? http://gerrit.cloudera.org:8080/#/c/12285/2/docker/all/assert-dependencies-present.py@85 PS2, Line 85: import distutils.core # noqa: F401 Maybe just move this to top of the file and python will fail and it'll be clear enough? http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py File in-docker.py: http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@42 PS2, Line 42: in-docker.py --docker-args "-t" impala-toolchain-centos6 -- bash Don't you need the "-i"? http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@70 PS2, Line 70: '-v', '/etc/passwd:/etc/passwd:ro', : '-v', '/etc/group:/etc/group:ro', We've maybe seen cases where this doesn't work, but I'm fine with it if it works. http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@77 PS2, Line 77: parser.add_argument('--docker-args', default='') Add help? What's the quoting convention here? http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@125 PS2, Line 125: xargs = subprocess.Popen(['xargs', '-i', 'cp', '--parents', '{}', distro_build_dir], stdin=git.stdout) I had to look up -i and it looks like maybe it's deprecated? This option is a synonym for -Ireplace-str if replace-str is specified. If the replace-str argument is missing, the effect is the same as -I{}. This option is deprecated; use -I instead. http://gerrit.cloudera.org:8080/#/c/12285/2/in-docker.py@147 PS2, Line 147: sys.stderr.write('Running: %s\n' % ' '.join(cmd)) use logging? -- To view, visit http://gerrit.cloudera.org:8080/12285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType:
[Impala-ASF-CR] IMPALA-8146: Remove make {debug,release,asan}.sh
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12306 ) Change subject: IMPALA-8146: Remove make_{debug,release,asan}.sh .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12306 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Iac997917463b871769112834835cc3f99cff5954 Gerrit-Change-Number: 12306 Gerrit-PatchSet: 1 Gerrit-Owner: Tim Armstrong Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Quanlong Huang Gerrit-Comment-Date: Wed, 30 Jan 2019 18:23:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 6: (4 comments) Thanks. I think we've converged here. Annoyingly, I'll have to push this without the verifier job because I need to push the LZO change simultaneously. http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465 PS5, Line 465: remaining_scan_range_submission > Thanks for the explanation. I suppose the initial source of confusion had t I extended the comment a bit. http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79 PS5, Line 79: : Status HdfsScanNode::GetNext(RuntimeState* state, RowBatch* row_batch, bool* eos) { : SCOPED_TIMER(runtime_profile_->total_time_counter()); : Sco > I was just wondering if the if-stmt body is empty as DCHECK is not compiled I think we've converged here. The outer #ifndef NDEBUG in the current code removes the if in production, and in debug mode, it's all active. http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc@196 PS6, Line 196: !(ranges_issued_barrier_.pending() != 0) > Why not ranges_issued_barrier_.pending() == 0 ? Done http://gerrit.cloudera.org:8080/#/c/12097/6/be/src/exec/hdfs-scan-node.cc@197 PS6, Line 197: && initial_ranges_issued_ : && progress_.done()) { > nit: one line I'm down to two lines here, but one line doesn't quite fit. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 6 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 30 Jan 2019 18:09:00 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#7). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_submissions_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change and 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. I happened to rename HdfsScanNode::file_type_counts_ to HdfsScanNode::file_type_counts_lock_ because HdfsScanNodeBase::file_type_counts_ also exists, and is totally different. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M
[Impala-ASF-CR] IMPALA-2990: timeout unresponsive queries in coordinator
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12299 ) Change subject: IMPALA-2990: timeout unresponsive queries in coordinator .. Patch Set 1: (5 comments) Cool. http://gerrit.cloudera.org:8080/#/c/12299/1//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12299/1//COMMIT_MSG@29 PS1, Line 29: - Write functional tests once the appropriate mechanisms are in place I think you can "kill -STOP" an impalad running a query, and it will end up looking like this to the coordinator. Of course, maybe something else will timeout. http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc@213 PS1, Line 213: backend_states_init_ = true; This could probably just use backend_states_.empty() instead. I don't mind the extra variable, though. We have .empty() usages elsewhere. http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/runtime/coordinator.cc@744 PS1, Line 744: int64_t current_time = MonotonicMillis(); It's confusing, as be/src/runtime/coordinator-backend-state.h has similarly named variables that are in UnixMillis() and MonotonicMillis(). I think this is correct at the moment, which is good. http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc File be/src/service/impala-server.cc: http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc@2212 PS1, Line 2212: true Do we have a style to write TerminatedByServer(, true /* expire */) ? http://gerrit.cloudera.org:8080/#/c/12299/1/be/src/service/impala-server.cc@2209 PS1, Line 2209: int64_t lag_time_ms = coord->GetMaxBackendStateLagMs(); : if (lag_time_ms > FLAGS_max_report_lag_s * 1000) { : cancellation_thread_pool_->Offer(CancellationWork::TerminatedByServer( : request_state->query_id(), Status("hung query"), true)); If I were a user receiving this error, I'd immediately want to know which backend failed. Let's include more information here, including the values of the flags, the max lag, and where the max lag was observed. -- To view, visit http://gerrit.cloudera.org:8080/12299 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I196c8c6a5633b1960e2c3a3884777be9b3824987 Gerrit-Change-Number: 12299 Gerrit-PatchSet: 1 Gerrit-Owner: Thomas Marshall Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Tue, 29 Jan 2019 23:31:01 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Logging current database context when logging analyzing queries.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12301 Change subject: Logging current database context when logging analyzing queries. .. Logging current database context when logging analyzing queries. The "analyzing query" log message is a reliable way to correlate queries with ther thread ids, but the session database name is missing there. Adding it in. Change-Id: I70886be4685e0d3ae7ca9f6e57e8159dc39c67c7 --- M fe/src/main/java/org/apache/impala/service/Frontend.java 1 file changed, 2 insertions(+), 1 deletion(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/01/12301/1 -- To view, visit http://gerrit.cloudera.org:8080/12301 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I70886be4685e0d3ae7ca9f6e57e8159dc39c67c7 Gerrit-Change-Number: 12301 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds.""
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12300 Change subject: Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds."" .. Revert "IMPALA-7992: Revert "Symbolize stacktraces in debug builds."" I believe IMPALA-7992 is addressed with "IMPALA-7992: Reduce iterations for test_decimal_fuzz." which reduces the number of iterations. Bringing back debug symbols allows us to go back to symbolizing debug traces, which is very handy for logs that are unattached to the binary that generated them. This reverts commit 5bf81cdc2797f986189aec4e78ebff2c2d1ed1b6. Change-Id: Idd9c444da4016eb6935119d03fec6b07c17dda53 --- M be/src/common/init.cc 1 file changed, 5 insertions(+), 0 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/00/12300/1 -- To view, visit http://gerrit.cloudera.org:8080/12300 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Idd9c444da4016eb6935119d03fec6b07c17dda53 Gerrit-Change-Number: 12300 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: (3 comments) Thanks for the reviews. The non-assertion (but rather printing) caught one more bug, wherein the Avro stuff wasn't decrementing when skipping files. I've fixed that. I also renamed a variable that was the same name in a base class and a subclass. I was using the file_type_counter_ business to debug something. It's very minor, so I snuck it in. http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12097/5//COMMIT_MSG@85 PS5, Line 85: adn > Nit: and Done http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465 PS5, Line 465: remaining_scan_range_issuances_ > Here is how it works (as I understand it, which might be incomplete): I am now trying remaining_scan_range_submissions_. I'd argue that it's all better than num_unqueued_files_. :) http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79 PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load(); : if (remaining != 0) { : LOG(INFO) << "XXX Assertion Failed " << remaining << " " << GetStackTrace(); : } > If you take the suggestion of using the above DCHECK, I don't think you nee You were right and we can move this into Close() once the threads have been joined. I did switch to DCHECK, but I'm still using the ifdef. I could write a more complicated boolean expression rather than using the if(... && ... && ...) business, but I think it's clearer with the ifndef. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 29 Jan 2019 23:11:31 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#6). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_submissions_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change and 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. I happened to rename HdfsScanNode::file_type_counts_ to HdfsScanNode::file_type_counts_lock_ because HdfsScanNodeBase::file_type_counts_ also exists, and is totally different. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 5: (4 comments) http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node-base.h@465 PS5, Line 465: remaining_scan_range_issuances_ > This feels a bit of a misnomer as parquet scanner will actually issue more Do you have any suggestions? Todd previously said: If we want to make this reflect its actual numeric value, maybe something like remaining_scan_range_addition_steps ? ie the Parquet scanner has one "scan range addition step" whereas the text ones have one such step per file? Or perhaps remaining_scan_range_issuances_? http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc File be/src/exec/hdfs-scan-node.cc: http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@78 PS5, Line 78: if (!(ranges_issued_barrier_.pending() != 0) && initial_ranges_issued_ && progress_.done()) { > line too long (95 > 90) Done http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@79 PS5, Line 79: auto remaining = remaining_scan_range_issuances_.Load(); : if (remaining != 0) { : LOG(INFO) << "XXX Assertion Failed " << remaining << " " << GetStackTrace(); : } > DCHECK_EQ(remaining_scan_range_issuances_.Load(), 0) << GetStackTrace(); D'oh. I don't know how this one got by me. http://gerrit.cloudera.org:8080/#/c/12097/5/be/src/exec/hdfs-scan-node.cc@212 PS5, Line 212: thread_state_.Close(this); > I wonder if you can DCHECK remaining_scan_range_issuances_ to 0 after threa I'll try it. I think the limit reach case is handled by progress_.done() being false. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Tue, 29 Jan 2019 00:26:48 + Gerrit-HasComments: Yes
[native-toolchain-CR] Initial support for building the toolchain in docker
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12285 ) Change subject: Initial support for building the toolchain in docker .. Patch Set 1: (16 comments) Thanks! This is a huge improvement on the Jenkins-only version of this. in-docker.sh is borderline worthy of re-writing in Python, though it's a bit hard to tell whether or not it'll get shorter. I'll leave that up to you. I think we should try to separate the docker image building versus the building a little bit more than you have. Historically, we've pinned our AMI's, so the build is fairly reproducible. As you've got it here, a change to RedHat's libc will tease its way into our stuff, and maybe we want that to be explicit. I don't think that's a big deal: it's a matter of defining a default place to pull the images from... http://gerrit.cloudera.org:8080/#/c/12285/1/buildall-docker.sh File buildall-docker.sh: http://gerrit.cloudera.org:8080/#/c/12285/1/buildall-docker.sh@2 PS1, Line 2: # Copyright 2017 Cloudera Inc. 2019. http://gerrit.cloudera.org:8080/#/c/12285/1/buildall-docker.sh@23 PS1, Line 23: for image in $(cd docker; ./buildall.sh); do Perhaps add a TODO for parallelism? http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh File docker/all/assert.sh: http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@2 PS1, Line 2: # Copyright 2015 Cloudera Inc. 2019 http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@16 PS1, Line 16: # Assert that a system has the correct libraries/binaries to build the Perhaps rename assert to something more friendly? Like "assert-dependencies-present.sh"? http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@19 PS1, Line 19: set -u : set -e : set -o pipefail It's temping to write this in python from the get-go. The parsing of ldconfig -p in particular is a bit tedious in shell. http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/assert.sh@35 PS1, Line 35: rm -f /tmp/lib You seem like you use a temp dir on line 23; what's this about? http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/postinstall.sh File docker/all/postinstall.sh: http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/postinstall.sh@2 PS1, Line 2: # Copyright 2015 Cloudera Inc. 2019 http://gerrit.cloudera.org:8080/#/c/12285/1/docker/all/postinstall.sh@18 PS1, Line 18: getent hosts github.com| xargs -n1 ssh-keyscan -t rsa,dsa >> /etc/ssh/ssh_known_hosts Comment what this does? http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos/yum-install File docker/centos/yum-install: http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos/yum-install@23 PS1, Line 23: yum install $@ 2>&1 | tee $t Does this preserve yum failures, given that you don't have pipefail set? Should this be "$@" rather than $@? http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos/yum-install@25 PS1, Line 25: >&2 echo "yum-install: One or more packages not available but specified as arguments" avoid tabs? http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5.df File docker/centos5.df: http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5.df@47 PS1, Line 47: # XXX: Thrift 0.9.3 requires python2.6, and uses system python. ??? Can we manipulate this via PATH? http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5/epel/epel-release-5-4.noarch.rpm File docker/centos5/epel/epel-release-5-4.noarch.rpm: http://gerrit.cloudera.org:8080/#/c/12285/1/docker/centos5/epel/epel-release-5-4.noarch.rpm@a1 PS1, Line 1: I always worry about binary files in git repos... Can we do this in a more text-y way? (perhaps by just editing yum.repos.d files? http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh File in-docker.sh: http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@16 PS1, Line 16: # Script that sets up environment used in order to perform full (or partial) Can you document how the file system mounting is set up here? i.e., where the output is. http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@49 PS1, Line 49: TARGET_DIR=/mnt Is this useful to be configured? http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@66 PS1, Line 66: while [[ $1 ]]; do Can you do this with "-z" and avoid the "set +u" stuff? http://gerrit.cloudera.org:8080/#/c/12285/1/in-docker.sh@87 PS1, Line 87: # GCC_VERSION works here Comment that this looks for subdirectories of source (like source/foo) and produces variables FOO_VERSION for them. You don't really need the basename here. You can do something like: find . -mindepth 1 -maxdepth 1 -type d -printf "%f\n" The %f gives you the last component of the name. -- To view, visit http://gerrit.cloudera.org:8080/12285 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id:
[native-toolchain-CR] Fix orc compilation failures in centos 5
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12284 ) Change subject: Fix orc compilation failures in centos 5 .. Patch Set 1: We shouldn't be doing anything that requires centos:5? -- To view, visit http://gerrit.cloudera.org:8080/12284 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: native-toolchain Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I741d55b55d0dfdfd99bd393855b439166367ee10 Gerrit-Change-Number: 12284 Gerrit-PatchSet: 1 Gerrit-Owner: hector.aco...@cloudera.com Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 28 Jan 2019 22:03:00 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#5). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_issuances_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change adn 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12097/3/be/src/exec/hdfs-scan-node-base.cc@452 PS3, Line 452: SCOPED_CLEANUP({ UpdateRemainingScanRangeIssuances(-1); }); > We have MakeScopeExitTrigger in Impala, but it seems like we could just swi Seems better to be consistent, so I switched to MakeScopeExitTrigger. -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 19:05:11 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#4). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_issuances_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change adn 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 3: The build failure here is because this needs to be pushed simultaneously with an LZO change. https://github.com/cloudera/impala-lzo/pull/4/files Tim--would you mind doing another pass? -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 3 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Fri, 25 Jan 2019 00:27:51 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12129 ) Change subject: IMPALA-6664: Tag log statements with fragment or query ids. .. Patch Set 8: Code-Review+2 Carry +2. I had to make a minor change to the test because the log directory is actually in $IMPALA_EE_LOG_DIR (-ish). Also, functional.alltypes has more partitions/files, so serves as a more reliable test case. The tpcds table I was using was only running on two hosts, so had a bit more trouble. -- To view, visit http://gerrit.cloudera.org:8080/12129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8 Gerrit-Change-Number: 12129 Gerrit-PatchSet: 8 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Thu, 24 Jan 2019 20:17:15 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.
Hello Paul Rogers, Zoltan Borok-Nagy, Zoram Thanga, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12129 to look at the new patch set (#8). Change subject: IMPALA-6664: Tag log statements with fragment or query ids. .. IMPALA-6664: Tag log statements with fragment or query ids. This implements much of the desire in IMPALA-6664 to tag all log statements with their query ids. It re-uses the existing ThreadDebugInfo infrastructure as well as the existing InstallLogMessageListenerFunction() patch to glog (currently used for log redaction) to prefix log messages with fragment ids or query ids, when available. The fragment id is the query id with the last bits incremented, so it's possible to correlate a given query's log messages. For example: $ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO I0108 10:39:16.453958 14752 impala-server.cc:1052] 85420d575b9ff4b9:402b8868] Registered query query_id=85420d575b9ff4b9:402b8868 session_id=aa45e480434f0516:101ae5ac12679d94 I0108 10:39:16.454738 14752 Frontend.java:1242] 85420d575b9ff4b9:402b8868] Analyzing query: select count(*) from tpcds.web_sales I0108 10:39:16.456627 14752 Frontend.java:1282] 85420d575b9ff4b9:402b8868] Analysis finished. I0108 10:39:16.463538 14818 admission-controller.cc:598] 85420d575b9ff4b9:402b8868] Schedule for id=85420d575b9ff4b9:402b8868 in pool_name=default-pool per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B I0108 10:39:16.463603 14818 admission-controller.cc:603] 85420d575b9ff4b9:402b8868] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0) I0108 10:39:16.463780 14818 admission-controller.cc:635] 85420d575b9ff4b9:402b8868] Admitted query id=85420d575b9ff4b9:402b8868 I0108 10:39:16.463896 14818 coordinator.cc:93] 85420d575b9ff4b9:402b8868] Exec() query_id=85420d575b9ff4b9:402b8868 stmt=select count(*) from tpcds.web_sales I0108 10:39:16.464795 14818 coordinator.cc:356] 85420d575b9ff4b9:402b8868] starting execution on 2 backends for query_id=85420d575b9ff4b9:402b8868 I0108 10:39:16.466384 24891 impala-internal-service.cc:49] ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b8868 coord=pannier.sf.cloudera.com:22000 #instances=2 I0108 10:39:16.467339 14818 coordinator.cc:370] 85420d575b9ff4b9:402b8868] started execution on 2 backends for query_id=85420d575b9ff4b9:402b8868 I0108 10:39:16.467536 14823 query-state.cc:579] 85420d575b9ff4b9:402b8868] Executing instance. instance_id=85420d575b9ff4b9:402b8868 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1 I0108 10:39:16.467627 14824 query-state.cc:579] 85420d575b9ff4b9:402b88680001] Executing instance. instance_id=85420d575b9ff4b9:402b88680001 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2 I0108 10:39:16.820933 14824 query-state.cc:587] 85420d575b9ff4b9:402b88680001] Instance completed. instance_id=85420d575b9ff4b9:402b88680001 #in-flight=1 status=OK I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 85420d575b9ff4b9:402b8868] DeregisterRecvr(): fragment_instance_id=85420d575b9ff4b9:402b8868, node=2 I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22001 remaining=2 query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] query_id=85420d575b9ff4b9:402b8868: first in-progress backend: pannier.sf.cloudera.com:22000 I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 85420d575b9ff4b9:402b8868] get_results_metadata(): query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.168762 14752 coordinator.cc:483] 85420d575b9ff4b9:402b8868] ExecState: query id=85420d575b9ff4b9:402b8868 execution completed I0108 10:39:17.168808 14752 coordinator.cc:608] 85420d575b9ff4b9:402b8868] Coordinator waiting for backends to finish, 1 remaining. query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.168880 14823 query-state.cc:587] 85420d575b9ff4b9:402b8868] Instance completed. instance_id=85420d575b9ff4b9:402b8868 #in-flight=0 status=OK I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): last report for 85420d575b9ff4b9:402b8868 I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22000 remaining=1 query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.174513 14752 coordinator.cc:814] 85420d575b9ff4b9:402b8868] Release admission control
[Impala-ASF-CR] Support centos:7 for test-with-docker.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12139 ) Change subject: Support centos:7 for test-with-docker. .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/12139/1/bin/bootstrap_system.sh File bin/bootstrap_system.sh: http://gerrit.cloudera.org:8080/#/c/12139/1/bin/bootstrap_system.sh@253 PS1, Line 253: redhat6 sudo service ntpd start || grep docker /proc/1/cgroup > Ignoring the nit, we stop ntpd on redhat7 notindocker. Don't we need to sta Ah, I understand now. Adding. -- To view, visit http://gerrit.cloudera.org:8080/12139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I7577949b6eaaa2239bcf0fadf64e1490c2106b08 Gerrit-Change-Number: 12139 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 23 Jan 2019 18:54:41 + Gerrit-HasComments: Yes
[Impala-ASF-CR] Support centos:7 for test-with-docker.
Hello Laszlo Gaal, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12139 to look at the new patch set (#2). Change subject: Support centos:7 for test-with-docker. .. Support centos:7 for test-with-docker. As a follow-on to IMPALA-7698, adds various incantations so that centos:7 can build under test-with-docker. The core issue is that the centos:7 image doesn't let you start sshd (necessary for the HBase startup scripts, and probably could be worked around) or postgresql (harder to work around) with systemctl, because systemd isn't "running." To avoid this, we start them manually with /usr/sbin/sshd and pg_ctl. Change-Id: I7577949b6eaaa2239bcf0fadf64e1490c2106b08 --- M bin/bootstrap_system.sh M docker/entrypoint.sh 2 files changed, 90 insertions(+), 12 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/39/12139/2 -- To view, visit http://gerrit.cloudera.org:8080/12139 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I7577949b6eaaa2239bcf0fadf64e1490c2106b08 Gerrit-Change-Number: 12139 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 3: (12 comments) Thanks for the reviews. Here's another iteration. The LZO stuff also needed handling, it turns out, so that's in progress. http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG Commit Message: http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@7 PS2, Line 7: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. > In theory some of the lines in this commit message should be shorter I tried to shorten some stuff. Didn't get all of it. http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@21 PS2, Line 21: base-sequence-scanner.cc. > did you consider an approach like adding something like this to the top lev So, an HdfsScanNode can have multiple files, of different file formats. And then it uses static methods to call IssueInitialRanges(). So I wasn't able to shoe-horn in some object-oriented stuff in there. I didn't want to take on a bigger refactoring. Here's the relevant snippet from hdfs-scan-node-base.cc. for (const auto& entry : matching_per_type_files) { if (entry.second.empty()) continue; switch (entry.first) { case THdfsFileFormat::PARQUET: RETURN_IF_ERROR(HdfsParquetScanner::IssueInitialRanges(this, entry.second)); break; case THdfsFileFormat::TEXT: RETURN_IF_ERROR(HdfsTextScanner::IssueInitialRanges(this, entry.second)); break; case THdfsFileFormat::SEQUENCE_FILE: case THdfsFileFormat::RC_FILE: case THdfsFileFormat::AVRO: RETURN_IF_ERROR(BaseSequenceScanner::IssueInitialRanges(this, entry.second)); break; case THdfsFileFormat::ORC: RETURN_IF_ERROR(HdfsOrcScanner::IssueInitialRanges(this, entry.second)); break; default: DCHECK(false) << "Unexpected file type " << entry.first; } } http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@38 PS2, Line 38: required ~30 threads to be spinning in the kernel. We were able to use > typo Done http://gerrit.cloudera.org:8080/#/c/12097/2//COMMIT_MSG@74 PS2, Line 74: I se > typo Done http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@266 PS2, Line 266: Must n > Must? Done http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@277 PS2, Line 277: threads > impala has plugins? This is related to the Impala lzo plugin. The task here is to change the signature in that repo, and then delete this signature. There's no compatibility guarantee that I have to keep around; I just wanted to avoid having to teach some test infrastructure about multiple branches. Anyway, I didn't see [[deprecated]] in use elsewhere, and this should be short-lived, so I'm going to ignore it at the moment. http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@287 PS2, Line 287: Tuple* InitTemplateTuple(const std::vector& value_evals, > can you DCHECK that the result doesn't go below zero? Done http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@471 PS2, Line 471: m a slot's path to its index into ma > yea, I was also confused by the inverted naming here. Sorry about the confusing naming. As you can imagine, I went back and forth between booleans and ints and stuff here. I went with "remaining_scan_range_issuances_". http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.cc File be/src/exec/hdfs-scan-node-base.cc: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.cc@453 PS2, Line 453: > is it important to not decrement this in the error cases below on line 471- It has to be decremented *after* IssueInitialRanges(), since, if you don't, all the scanner threads might exit and you'll hang. I was trying to catch all the error cases and always decrement in those. I think the easiest way to do it is to wrap inside of another function, so I'll do that. http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.h File be/src/exec/hdfs-scan-node.h: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node.h@139 PS2, Line 139: : /// Number of times scanner thread didn't find work to do. : RuntimeProfile::Counter* scanner_thread_workless_loops_counter_ = nullptr; > hrm, I see the value of this for your test, but not entirely convinced it m I'm curious whether or not this loop is hot in practice at our customers, and this was the easiest way to tell. I've seen this be ~10 in the benchmarking I was doing, and it'll be useful to check up on it in more cases. I considered getting rid of this whole loop and using condition
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12097 ) Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h File be/src/exec/hdfs-scan-node-base.h: http://gerrit.cloudera.org:8080/#/c/12097/2/be/src/exec/hdfs-scan-node-base.h@277 PS2, Line 277: plugins > This is related to the Impala lzo plugin. The task here is to change the si And now this is done... -- To view, visit http://gerrit.cloudera.org:8080/12097 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 Gerrit-Change-Number: 12097 Gerrit-PatchSet: 2 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Joe McDonnell Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Todd Lipcon Gerrit-Comment-Date: Wed, 23 Jan 2019 18:01:09 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7980: Fix spinning because of buggy num unqueued files .
Hello Michael Ho, Todd Lipcon, Tim Armstrong, Joe McDonnell, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12097 to look at the new patch set (#3). Change subject: IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. .. IMPALA-7980: Fix spinning because of buggy num_unqueued_files_. This commit removes num_unqueued_files_ and replaces it with a more tightly scoped and easier to reason about remaining_scan_range_issuances_ variable. This variable (and its predecessor) are used as a way to signal to scanner threads they may exit (instead of spinning) because there will never be a scan range provided to them, because no more scan ranges will be added. In practice, most scanner implementations can never call AddDiskIoRanges() after IssueInitialRanges(). The exception is sequence files and Avro, which share a common base class. Instead of incrementing and decrementing this counter in a variety of paths, this commit makes the common case simple (set to 1 initially; decrement at exit points of IssueInitialRanges()) and the complicated, sequence-file case is treated within base-sequence-scanner.cc. Note that this is not the first instance of a subtle bug in this code. The following two JIRAs (and corresponding commits) are fundamentally similar bugs: IMPALA-3798: Disable per-split filtering for sequence-based scanners IMPALA-1730: reduce scanner thread spinning windows We ran into this bug when running TPC-DS query 1 on scale factor 10,000 (10TB) on a 140-node cluster with replica_preference=remote, we observed really high system CPU usage for some of the scan nodes: HDFS_SCAN_NODE (id=6):(Total: 59s107ms, non-child: 59s107ms, % non- child: 100.00% - BytesRead: 80.50 MB (84408563) - ScannerThreadsSysTime: 36m17s Using 36 minutes of system time in only 1 minute of wall-clock time required ~30 threads to be spinning in the kernel. We were able to use perf to find a lot of usage of futex_wait() and pthread_cond_wait(). Eventually, we figured out that ScannerThreads, once started, loop forever looking for work. The case that there is no work is supposed to be rare, and the scanner threads are supposed to exit based on num_unqueued_files_ being 0, but, in some cases, that counter isn't appropriately decremented. The reproduction is any query that uses runtime filters to filter out entire files. Something like: set RUNTIME_FILTER_WAIT_TIME_MS=1; select count(*) from customer join customer_address on c_current_addr_sk = ca_address_sk where ca_street_name="DoesNotExist" and c_last_name="DoesNotExist"; triggers this behavior. This code path is covered by several existing tests, most directly in test_runtime_filters.py:test_file_filtering(). Interestingly, though this wastes cycles, query results are unaffected. I initially fixed this bug with a point fix that handled the case when runtime filters caused files to be skipped and added assertions that checked that num_unqueued_files_ was decremented to zero when queries finished. Doing this led me, somewhat slowly, to both finding similar bugs in other parts of the code (HdfsTextScanner::IssueInitialRanges had the same bug if the entire file was skipped) and fighting with races on the assertion itself. I eventually concluded that there's really no shared synchronization between progress_.Done() and num_unqueued_files_. The same conclusion is true for the current implementation, so there aren't assertions. I added a metric for how many times the scanners run through their loop without doing any work and observed it to be non-zero for a query from tests/query_test/test_runtime_filters.py:test_wait_time. To measure the effect of this, I set up a cluster of 9 impalad's and 1 coordinator, running against an entirely remote HDFS. The machines were r4.4xlarge and the remote disks were EBS st1's, though everything was likely buffer cached. I ran TPCDS-Q1 with RUNTIME_FILTER_WAIT_TIME_MS=2000 against tpcds_1000_decimal_parquet 10 times. The big observable thing is that ScannerThreadsSysTime went from 5.6 seconds per query to 1.9 seconds per query. (I ran the text profiles through the old-fashioned: grep ScannerThreadsSysTime profiles | awk '/ms/ { x += $3/1000 } /ns/ { x += $3/100 } END { print x }' ) The query time effect was quite small (the fastest query was 3.373s with the change adn 3.82s without the change, but the averages were tighter), but the extra work was visible in the profiles. This bug was co-debugged by Todd Lipcon, Joe McDonnell, and Philip Zeyliger. Change-Id: I133de13238d3d05c510e2ff771d48979125735b1 --- M be/src/exec/base-sequence-scanner.cc M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/hdfs-scan-node-base.h M be/src/exec/hdfs-scan-node.cc M be/src/exec/hdfs-scan-node.h M be/src/exec/hdfs-scanner.cc M be/src/exec/hdfs-text-scanner.cc M
[Impala-ASF-CR] Using 'master' branch of Impala-lzo and allowing test-with-docker to configure it.
Philip Zeyliger has uploaded this change for review. ( http://gerrit.cloudera.org:8080/12259 Change subject: Using 'master' branch of Impala-lzo and allowing test-with-docker to configure it. .. Using 'master' branch of Impala-lzo and allowing test-with-docker to configure it. This updates bootstrap_system.sh to check out the 'master' branch of Impala-lzo. (I've separately updated the 'master' branch to be identical to today's cdh5-trunk branch; it had grown a few years stale.) I've also added support to teasing the configuration through test-with-docker. This allows for Impala 2.x and 3.x to diverge here, and it allows for testing changes to Impala-lzo. Change-Id: Ieba45fc18d9e490f75d16c477cdc1cce26f41ce9 --- M bin/bootstrap_system.sh M docker/entrypoint.sh M docker/test-with-docker.py 3 files changed, 22 insertions(+), 2 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/12259/1 -- To view, visit http://gerrit.cloudera.org:8080/12259 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: Ieba45fc18d9e490f75d16c477cdc1cce26f41ce9 Gerrit-Change-Number: 12259 Gerrit-PatchSet: 1 Gerrit-Owner: Philip Zeyliger
[Impala-ASF-CR] IMPALA-8091: incremental improvements to NTP sync
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12253 ) Change subject: IMPALA-8091: incremental improvements to NTP sync .. Patch Set 1: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/12253 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ifc8cacf81765d132dd574993f8149231bdbb7bc6 Gerrit-Change-Number: 12253 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Brown Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Jim Apple Gerrit-Reviewer: Laszlo Gaal Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Wed, 23 Jan 2019 00:02:20 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12129 ) Change subject: IMPALA-6664: Tag log statements with fragment or query ids. .. Patch Set 5: (3 comments) http://gerrit.cloudera.org:8080/#/c/12129/2/be/src/common/thread-debug-info.h File be/src/common/thread-debug-info.h: http://gerrit.cloudera.org:8080/#/c/12129/2/be/src/common/thread-debug-info.h@a116 PS2, Line 116: > Thanks! Do you plan to fix it in this change, or in a follow-up commit? Thanks for catching that I didn't do this. I fixed it. Turned out to be nice and easy. http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-hs2-server.cc File be/src/service/impala-hs2-server.cc: http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-hs2-server.cc@126 PS5, Line 126: ScopedThreadContext(GetThreadDebugInfo(), query_ctx.query_id); > Currently it creates a temporary object that is deleted promptly. Whoops. Thanks for catching this. Done. http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-internal-service.cc File be/src/service/impala-internal-service.cc: http://gerrit.cloudera.org:8080/#/c/12129/5/be/src/service/impala-internal-service.cc@49 PS5, Line 49: ScopedThreadContext(GetThreadDebugInfo(), params.query_ctx.query_id); > It creates a temporary that is deleted promptly. Done -- To view, visit http://gerrit.cloudera.org:8080/12129 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I6634ef9d1a7346339f24f2d40a7a3aa36a535da8 Gerrit-Change-Number: 12129 Gerrit-PatchSet: 5 Gerrit-Owner: Philip Zeyliger Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Paul Rogers Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Reviewer: Zoltan Borok-Nagy Gerrit-Reviewer: Zoram Thanga Gerrit-Comment-Date: Tue, 22 Jan 2019 20:46:54 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-7731: Add Read/Exchange counters to profile
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12229 ) Change subject: IMPALA-7731: Add Read/Exchange counters to profile .. Patch Set 3: Code-Review+2 (4 comments) Thanks! http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator-backend-state.cc File be/src/runtime/coordinator-backend-state.cc: http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator-backend-state.cc@524 PS3, Line 524: //if (!p->metadata().__isset.plan_node_id) continue; ?? http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator-backend-state.cc@531 PS3, Line 531: RuntimeProfile::Counter* bytes_sent = p->GetCounter("TotalBytesSent"); This this be a constant somewhere? http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: http://gerrit.cloudera.org:8080/#/c/12229/3/be/src/runtime/coordinator.cc@790 PS3, Line 790: // node. Is it the case that TotalBytesSent == TotalScanBytesSent + TotalInnerBytesSent ? Do we have a place to DCHECK assert that and/or test it? http://gerrit.cloudera.org:8080/#/c/12229/3/tests/query_test/test_observability.py File tests/query_test/test_observability.py: http://gerrit.cloudera.org:8080/#/c/12229/3/tests/query_test/test_observability.py@390 PS3, Line 390: def test_exchange_scan_ratio_non_zero(self): Should we add small tests for the other new metrics you have? -- To view, visit http://gerrit.cloudera.org:8080/12229 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: Ife7ec78fe42558429c1cbe6e5eba79842bffd648 Gerrit-Change-Number: 12229 Gerrit-PatchSet: 3 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Csaba Ringhofer Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Philip Zeyliger Gerrit-Comment-Date: Fri, 18 Jan 2019 22:40:15 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-6664: Tag log statements with fragment or query ids.
Hello Paul Rogers, Zoltan Borok-Nagy, Zoram Thanga, Tim Armstrong, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/12129 to look at the new patch set (#6). Change subject: IMPALA-6664: Tag log statements with fragment or query ids. .. IMPALA-6664: Tag log statements with fragment or query ids. This implements much of the desire in IMPALA-6664 to tag all log statements with their query ids. It re-uses the existing ThreadDebugInfo infrastructure as well as the existing InstallLogMessageListenerFunction() patch to glog (currently used for log redaction) to prefix log messages with fragment ids or query ids, when available. The fragment id is the query id with the last bits incremented, so it's possible to correlate a given query's log messages. For example: $ grep 85420d575b9ff4b9:402b8868 logs/cluster/impalad.INFO I0108 10:39:16.453958 14752 impala-server.cc:1052] 85420d575b9ff4b9:402b8868] Registered query query_id=85420d575b9ff4b9:402b8868 session_id=aa45e480434f0516:101ae5ac12679d94 I0108 10:39:16.454738 14752 Frontend.java:1242] 85420d575b9ff4b9:402b8868] Analyzing query: select count(*) from tpcds.web_sales I0108 10:39:16.456627 14752 Frontend.java:1282] 85420d575b9ff4b9:402b8868] Analysis finished. I0108 10:39:16.463538 14818 admission-controller.cc:598] 85420d575b9ff4b9:402b8868] Schedule for id=85420d575b9ff4b9:402b8868 in pool_name=default-pool per_host_mem_estimate=180.02 MB PoolConfig: max_requests=-1 max_queued=200 max_mem=-1.00 B I0108 10:39:16.463603 14818 admission-controller.cc:603] 85420d575b9ff4b9:402b8868] Stats: agg_num_running=0, agg_num_queued=0, agg_mem_reserved=0, local_host(local_mem_admitted=0, num_admitted_running=0, num_queued=0, backend_mem_reserved=0) I0108 10:39:16.463780 14818 admission-controller.cc:635] 85420d575b9ff4b9:402b8868] Admitted query id=85420d575b9ff4b9:402b8868 I0108 10:39:16.463896 14818 coordinator.cc:93] 85420d575b9ff4b9:402b8868] Exec() query_id=85420d575b9ff4b9:402b8868 stmt=select count(*) from tpcds.web_sales I0108 10:39:16.464795 14818 coordinator.cc:356] 85420d575b9ff4b9:402b8868] starting execution on 2 backends for query_id=85420d575b9ff4b9:402b8868 I0108 10:39:16.466384 24891 impala-internal-service.cc:49] ExecQueryFInstances(): query_id=85420d575b9ff4b9:402b8868 coord=pannier.sf.cloudera.com:22000 #instances=2 I0108 10:39:16.467339 14818 coordinator.cc:370] 85420d575b9ff4b9:402b8868] started execution on 2 backends for query_id=85420d575b9ff4b9:402b8868 I0108 10:39:16.467536 14823 query-state.cc:579] 85420d575b9ff4b9:402b8868] Executing instance. instance_id=85420d575b9ff4b9:402b8868 fragment_idx=0 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=1 I0108 10:39:16.467627 14824 query-state.cc:579] 85420d575b9ff4b9:402b88680001] Executing instance. instance_id=85420d575b9ff4b9:402b88680001 fragment_idx=1 per_fragment_instance_idx=0 coord_state_idx=0 #in-flight=2 I0108 10:39:16.820933 14824 query-state.cc:587] 85420d575b9ff4b9:402b88680001] Instance completed. instance_id=85420d575b9ff4b9:402b88680001 #in-flight=1 status=OK I0108 10:39:17.122299 14823 krpc-data-stream-mgr.cc:294] 85420d575b9ff4b9:402b8868] DeregisterRecvr(): fragment_instance_id=85420d575b9ff4b9:402b8868, node=2 I0108 10:39:17.123500 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22001 remaining=2 query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.123509 24038 coordinator-backend-state.cc:265] query_id=85420d575b9ff4b9:402b8868: first in-progress backend: pannier.sf.cloudera.com:22000 I0108 10:39:17.167752 14752 impala-beeswax-server.cc:197] 85420d575b9ff4b9:402b8868] get_results_metadata(): query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.168762 14752 coordinator.cc:483] 85420d575b9ff4b9:402b8868] ExecState: query id=85420d575b9ff4b9:402b8868 execution completed I0108 10:39:17.168808 14752 coordinator.cc:608] 85420d575b9ff4b9:402b8868] Coordinator waiting for backends to finish, 1 remaining. query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.168880 14823 query-state.cc:587] 85420d575b9ff4b9:402b8868] Instance completed. instance_id=85420d575b9ff4b9:402b8868 #in-flight=0 status=OK I0108 10:39:17.168977 14821 query-state.cc:252] UpdateBackendExecState(): last report for 85420d575b9ff4b9:402b8868 I0108 10:39:17.174401 24038 coordinator.cc:709] Backend completed: host=pannier.sf.cloudera.com:22000 remaining=1 query_id=85420d575b9ff4b9:402b8868 I0108 10:39:17.174513 14752 coordinator.cc:814] 85420d575b9ff4b9:402b8868] Release admission control
[Impala-ASF-CR] IMPALA-7694: Add host resource usage metrics to profile
Philip Zeyliger has posted comments on this change. ( http://gerrit.cloudera.org:8080/12069 ) Change subject: IMPALA-7694: Add host resource usage metrics to profile .. Patch Set 10: Code-Review+2 What was the bug you found? -- To view, visit http://gerrit.cloudera.org:8080/12069 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I3aedc20c553ab8d7ed50f72a1a936eba151487d9 Gerrit-Change-Number: 12069 Gerrit-PatchSet: 10 Gerrit-Owner: Lars Volker Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Philip Zeyliger Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 18 Jan 2019 20:32:36 + Gerrit-HasComments: No