On Tue, Jun 9, 2020 at 3:59 AM Hashan Gayasri <hashan.gaya...@gmail.com> wrote:
> Thanks for the quick response Tim! Setting the classpath according to
> the file set-classpath.sh resolved both crashes.
>
> Regarding the set-classpath.sh file -
> In the Impala v3.3.0 release, in addition to jars from the Maven user home,
> the script adds the following classpath entries:
> * impala-3.3.0/fe/src/test/resources
> * impala-3.3.0/fe/target/classes
> * impala-3.3.0/fe/target/dependency
> * impala-3.3.0/fe/target/test-classes
> * $HIVE_HOME/lib/datanucleus-api-jdo-3.2.1.jar
> * $HIVE_HOME/lib/datanucleus-core-3.2.2.jar
> * $HIVE_HOME/lib/datanucleus-rdbms-3.2.1.jar
>
> But the "datanucleus" versions don't correspond to those of the actual
> jar files in that path. Do the correct datanucleus-*.jar files need to
> be added to the classpath? From what I noticed, only
> "fe/target/dependency" and "fe/target/classes" were actually needed out
> of the above. Is it okay to just keep those in the classpath?

Yeah, there is some cruft in the classpath - you can safely remove the
datanucleus stuff and the various references to fe/src and fe/target - those
are added for the purposes of various tests.
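If it's useful, here's a rough sketch of doing that with just the two entries
you found were needed. This is only a sketch - the install root is a
placeholder for wherever your build lives, and the exact jar set is whatever
your own build resolved. One gotcha to keep in mind: as far as I remember,
the JVM that libhdfs embeds via JNI does not expand classpath wildcards, so
list the jars explicitly (or generate the list with something like
"hadoop classpath --glob"):

    # Sketch only - IMPALA_HOME here is a placeholder for your build root.
    IMPALA_HOME=/path/to/impala-3.3.0

    # Start from the compiled frontend classes...
    CLASSPATH="$IMPALA_HOME/fe/target/classes"

    # ...and add each dependency jar explicitly. The JNI-embedded JVM
    # won't expand a bare fe/target/dependency/* wildcard for you.
    for jar in "$IMPALA_HOME"/fe/target/dependency/*.jar; do
      CLASSPATH="$CLASSPATH:$jar"
    done
    export CLASSPATH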
> Also, I couldn't figure out which source/release in github would
> correspond to "hadoop-3.0.0-cdh6.x-SNAPSHOT" as the "rel/release-3.0.0"
> tag didn't contain the source file you linked (even though the default
> branch does).

This would be a CDH release derived from Hadoop 3 - essentially the Hadoop 3
release with a bunch of patches from later versions on top of it. This is
kinda messy since the sources aren't published to a public place (they used
to be, but things have changed in various ways). There are some source
artifacts included in the tarballs we build against. This is something I'd
like to clean up - our current story around dependencies isn't great and is
just the way it is for historical reasons. We did do this decoupling for
Kudu recently; it would be nice to do it for more components.

> Is there a publicly available page that lists the version requirements
> of the dependencies? Specifically the Apache Kudu, Hadoop, Hive, and
> HBase version requirements, since I'm planning to use locally compiled
> versions of those components for the Impala build. I noticed that the
> "impala-config.sh" file contains the exact versions of the dependent
> components. But is there a version compatibility matrix or something
> similar?

Nothing formal - I think the general feeling in the community is that we
don't want to claim to support things unless we're thoroughly testing them
each release. It is genuinely a lot of work to pull together a set of
component versions that works well together, doesn't have security
vulnerabilities, etc. We were able to build against some fairly divergent
versions of dependencies (Hive 2 vs Hive 3, etc.), but that required a bunch
of shims and can be a bit brittle. I'd expect it's possible to build against
a wide range of source versions of the dependencies, but it might require
tweaks to work around minor issues (different versions of dependencies,
minor changes to APIs, etc.). The hardcoded CDH/CDP versions are definitely
the well-beaten path.

As far as wire compatibility goes, as a general rule the client/server
protocols of the various dependent services are forward compatible, i.e.
older clients can talk to newer servers. In practice they're also often
backward compatible; e.g. the HDFS client protocol is very stable.

> In the same file, I noted that there are sometimes even major version
> differences in the CDH version vs the CDP version. Which version should
> I use if I am to use github releases of the above mentioned dependent
> components?

We've been moving towards using the newer CDP dependencies - CDH was the
default for the Impala 3.x releases though. Probably the biggest difference
is the Hive version, because we integrate most closely with that - the CDH
set of dependencies is built around Hive 2, and the CDP set is built around
Hive 3.

> In order to use the locally built versions of Apache Kudu, Hadoop, Hive,
> and HBase, would it be sufficient to set the following variables, or are
> there more steps involved?
>
> * DOWNLOAD_CDH_COMPONENTS=false
> * KUDU_BUILD_DIR and KUDU_CLIENT_DIR
> * HIVE_SRC_DIR_OVERRIDE
> * HADOOP_INCLUDE_DIR_OVERRIDE and HADOOP_LIB_DIR_OVERRIDE

That looks right to me.
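For what it's worth, a rough sketch of how that could look before sourcing
bin/impala-config.sh - every path below is just a placeholder for wherever
your local builds of those components ended up:

    # Sketch only - all paths are placeholders for your local builds.
    export DOWNLOAD_CDH_COMPONENTS=false

    # Locally built Kudu.
    export KUDU_BUILD_DIR=/path/to/kudu/build/release
    export KUDU_CLIENT_DIR=/path/to/kudu/client-install

    # Locally built Hive and Hadoop.
    export HIVE_SRC_DIR_OVERRIDE=/path/to/hive/src
    export HADOOP_INCLUDE_DIR_OVERRIDE=/path/to/hadoop/include
    export HADOOP_LIB_DIR_OVERRIDE=/path/to/hadoop/lib

    # Then pick up the rest of the build environment as usual.
    . bin/impala-config.sh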
>
> Thank you.
>
> Regards,
> Hashan
>
>
> On Thu, Jun 4, 2020 at 11:12 PM Tim Armstrong <tarmstr...@cloudera.com> wrote:
> >
> > The first crash is a symptom of some classes being missing from the
> > classpath. If you look at the code where it crashed, it's loading a
> > bunch of HDFS classes -
> > https://github.com/apache/hadoop/blob/18c57cf0464f4d1fa95899d75b2f59cae33c7c33/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c#L69
> >
> > You need a lot of things on the classpath, so you really need something
> > automated to set it up correctly. In the dev environment we generate a
> > file that contains the classpath:
> > https://github.com/apache/impala/blob/master/bin/set-classpath.sh#L45
> >
> > On Thu, Jun 4, 2020 at 3:20 AM Hashan Gayasri <hashan.gaya...@gmail.com> wrote:
> >>
> >> Hi all,
> >>
> >> I've been trying to get the Impala (v3.3.0) binaries that I compiled
> >> running. Upon startup, Catalogd (the Impalad binary) seems to crash in
> >> native code. In a dynamically-linked, debug build, the stack trace was
> >> as follows.
> >>
> >> [1]
> >> ...
> >> #4  0x00007ffff1fd0a05 in JVM_handle_linux_signal () from
> >>     /home/hashan/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
> >> #5  0x00007ffff1fc3cd8 in signalHandler(int, siginfo*, void*) () from
> >>     /home/hashan/jdk1.8.0_191/jre/lib/amd64/server/libjvm.so
> >> #6  <signal handler called>
> >> #7  initCachedClass (cachedJclass=<optimized out>, className=<optimized out>, env=0x0)
> >>     at /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:54
> >> #8  initCachedClasses (env=0x0)
> >>     at /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jclasses.c:117
> >> #9  0x00007ffff26a3f62 in getJNIEnv ()
> >>     at /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/jni_helper.c:555
> >> #10 0x00007ffff26aa3b1 in hdfsBuilderConnect (bld=0x389b2c0)
> >>     at /container.redhat6/build/cdh/hadoop/3.0.0-cdh6.x-SNAPSHOT/rpm/BUILD/hadoop-3.0.0-cdh6.x-SNAPSHOT/hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs/hdfs.c:697
> >> #11 0x00007ffff77540e3 in impala::JniUtil::InitLibhdfs ()
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/util/jni-util.cc:215
> >> #12 0x00007ffff7753660 in impala::JniUtil::Init ()
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/util/jni-util.cc:132
> >> #13 0x00007ffff7e84146 in impala::InitCommonRuntime (argc=1, argv=0x7fffffff65c8, init_jvm=true, test_mode=impala::TestInfo::NON_TEST)
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/common/init.cc:364
> >> #14 0x00007ffff3c31bdc in CatalogdMain (argc=1, argv=0x7fffffff65c8)
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/catalog/catalogd-main.cc:62
> >> #15 0x00000000008c60ef in main (argc=1, argv=0x7fffffff65c8)
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/service/daemon-main.cc:41
> >> ...
> >>
> >> The loaded hdfs native library was:
> >> impala-3.3.0/toolchain/cdh_components-1173663/hadoop-3.0.0-cdh6.x-SNAPSHOT/lib/native/libhdfs.so.0.0.0
> >>
> >> After getting the same results when using the native hdfs library
> >> (libhdfs.so.0.0.0) shipped with the rpm package
> >> "hadoop-libhdfs-3.0.0+cdh6.3.0-1279813.el7.x86_64",
> >> I tried using the libhdfs.so.0.0.0 library compiled from the hadoop
> >> v3.1.3 github sources. This seemed to get past the previous stage.
> >>
> >> [2]
> >> This time the /tmp/catalogd.ERROR file contained:
> >>
> >> E0604 09:50:31.550122 69397 catalog.cc:91] NoClassDefFoundError:
> >> org/apache/hadoop/hive/metastore/api/Database
> >> CAUSED BY: ClassNotFoundException: org.apache.hadoop.hive.metastore.api.Database
> >> . Impalad exiting.
> >> loadFileSystems error:
> >> ClassNotFoundException: org.apache.hadoop.fs.FileSystem
> >> java.lang.NoClassDefFoundError: org/apache/hadoop/fs/FileSystem
> >> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FileSystem
> >> ...
> >> hdfsBuilderConnect(forceNewInstance=0, nn=default, port=0,
> >> kerbTicketCachePath=(NULL), userName=(NULL)) error:
> >> ClassNotFoundException: org.apache.hadoop.conf.Configuration
> >> java.lang.NoClassDefFoundError: org/apache/hadoop/conf/Configuration
> >> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
> >> ...
> >>
> >> [3]
> >> After adding the following jar files to the CLASSPATH,
> >> * hive-3.1.2/lib/hive-metastore-3.1.2.jar
> >> * hive-3.1.2/lib/hive-standalone-metastore-3.1.2.jar
> >> * hadoop-3.1.3/share/hadoop/client/hadoop-client-runtime-3.1.3.jar
> >>
> >> the /tmp/catalogd.ERROR file contained:
> >> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.fs.FileSystem
> >> Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.conf.Configuration
> >>
> >> In both of the latter cases, which used the newly compiled
> >> libhdfs.so.0.0.0 library, the /tmp/catalogd.ERROR output differed but
> >> the crash would occur at catalog-server.cc:252:
> >> (gdb) bt
> >> #0  impala::CatalogServer::Start (this=0x7fffffff6030)
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/catalog/catalog-server.cc:252
> >> #1  0x00007ffff3c3235e in CatalogdMain (argc=1, argv=0x7fffffff65c8)
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/catalog/catalogd-main.cc:87
> >> #2  0x00000000008c60ef in main (argc=1, argv=0x7fffffff65c8)
> >>     at /home/hashan/BUILD/impala-3.3.0/impala-3.3.0/be/src/service/daemon-main.cc:41
> >> (gdb) l
> >> 252        catalog_.reset(new Catalog());
> >> (gdb) p catalog_
> >> $2 = {px = 0x0}
> >>
> >>
> >> 1) Does anyone have some idea why the first issue arises when using
> >> the native hdfs library built as a part of the toolchain?
> >>
> >> 2) Does anyone know if the issue in the 2nd and 3rd runs (using
> >> locally built libhdfs) is actually related to missing JAR files, and
> >> if so, which JAR files are missing from the classpath?
> >>
> >> I'm sorry for the length of this mail. Any help in resolving these
> >> issues would be greatly appreciated.
> >>
> >> Thanks in advance.
> >>
> >> Regards,
> >> Hashan Gayasri
>
> --
> -Hashan Gayasri