Thanks for the replies :) I managed to get it working following the instructions here <https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#vendor-specific-versions>, but I found a few issues that I guess were specific to HDInsight, or at least to the HDP version it uses. Trying to summarize:
Hadoop version

After running "hadoop version", the result was "2.7.3.2.6.1.3-4". However, when building I was getting errors that some dependencies from the Hortonworks repo were not found, for instance zookeeper "3.4.6.2.6.1.3-4". I browsed the Hortonworks repo <http://repo.hortonworks.com/content/repositories/releases/org/apache/zookeeper/zookeeper/> to find a suitable version, so I ended up using 2.7.3.2.6.1.31-3 instead.

Scala version

I also had issues with dependencies when using Scala 2.11.11, so I compiled against 2.11.7 instead. So, the maven command I used was this:

mvn install -DskipTests -Dscala.version=2.11.7 -Pvendor-repos -Dhadoop.version=2.7.3.2.6.1.31-3

Azure Jars

With all that, I still had class-not-found errors when trying to start my Flink session, for instance "java.lang.ClassNotFoundException: Class org.apache.hadoop.fs.adl.HdiAdlFileSystem". To fix that, I had to find the Azure-specific jars I needed to use. For that, I checked which jars Spark was using and copied / symlinked them into Flink's "lib" directory:

/usr/hdp/current/spark2-client/jars/*azure*.jar
/usr/lib/hdinsight-datalake/adls2-oauth2-token-provider.jar

Guava Jar

Finally, my jobs were failing because the Cassandra driver was complaining about the Guava version being too old, even though I had the right version in my assembled jar. I just downloaded the version I needed (in my case, 23.0 <http://central.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar>) and also put that into Flink's lib directory.

I hope it helps other people trying to run Flink on Azure HDInsight :)

Kind regards,
Albert

> On Aug 31, 2017, at 8:18 PM, Banias H <banias4sp...@gmail.com> wrote:
>
> We had the same issue. Get the hdp version, from
> /usr/hdp/current/hadoop-client/hadoop-common-<version>.jar for example.
> Then rebuild flink from src:
> mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=<version>
>
> for example: mvn clean install -DskipTests -Pvendor-repos
> -Dhadoop.version=2.7.3.2.6.1.0-129
>
> Copy and setup build-target/ to the cluster. Export HADOOP_CONF_DIR and
> YARN_CONF_DIR according to your env. You should have no problem starting the
> session.
>
>
> On Wed, Aug 30, 2017 at 6:45 AM, Federico D'Ambrosio <fedex...@gmail.com
> <mailto:fedex...@gmail.com>> wrote:
> Hi,
> What is your "hadoop version" output? I'm asking because you said your hadoop
> distribution is in /usr/hdp so it looks like you're using Hortonworks HDP,
> just like myself. So, this would be a third party distribution and you'd need
> to build Flink from source according to this:
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#vendor-specific-versions
> <https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#vendor-specific-versions>
>
> Federico D'Ambrosio
>
> On 30 Aug 2017 13:33, "albert" <alb...@datacamp.com
> <mailto:alb...@datacamp.com>> wrote:
> Hi Chesnay,
>
> Thanks for your reply. I did download the binaries matching my Hadoop
> version (2.7), that's why I was wondering if the issue had something to do
> with the exact hadoop version flink is compiled against, or if there might be
> things that are missing in my environment.
>
>
>
> --
> Sent from:
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/>
>
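P.S. For anyone finding this in the archives: the whole procedure from my message above can be scripted roughly like this. FLINK_HOME is just a placeholder for wherever you put the build-target / Flink distribution, and the versions and paths are the ones from my cluster, so check "hadoop version" and the Hortonworks repo for the values that match yours:

```shell
# Rough sketch of the steps described above -- adjust versions and paths
# to your own HDInsight / HDP cluster before running.

FLINK_HOME=/path/to/flink  # placeholder: wherever your Flink distribution lives

# 1. Build Flink against the vendor Hadoop and a compatible Scala version
#    (run from the Flink source directory)
mvn install -DskipTests -Dscala.version=2.11.7 -Pvendor-repos \
    -Dhadoop.version=2.7.3.2.6.1.31-3

# 2. Symlink the Azure-specific jars that Spark already uses into Flink's lib/
ln -s /usr/hdp/current/spark2-client/jars/*azure*.jar "$FLINK_HOME/lib/"
ln -s /usr/lib/hdinsight-datalake/adls2-oauth2-token-provider.jar "$FLINK_HOME/lib/"

# 3. Add the Guava version your job needs (23.0 in my case)
curl -O http://central.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar
mv guava-23.0.jar "$FLINK_HOME/lib/"
```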