Thanks for the replies :)

I managed to get it working following the instructions here 
<https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#vendor-specific-versions>,
 but I found a few issues that I guess were specific to HDInsight, or at least 
to the HDP version it uses. Trying to summarize:

Hadoop version 
After running “hadoop version”, the result was “2.7.3.2.6.1.3-4”.
However, when building I was getting errors that some dependencies from the 
Hortonworks repo were not found, for instance zookeeper “3.4.6.2.6.1.3-4”.
I browsed the Hortonworks repo 
<http://repo.hortonworks.com/content/repositories/releases/org/apache/zookeeper/zookeeper/>
 to find a build that was actually published, so I ended up using 2.7.3.2.6.1.31-3 instead.
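In case it helps, the check can be sketched roughly like this (the sample version line stands in for real `hadoop version` output on your cluster, so adjust accordingly; the repo URL is the one above):

```shell
# On the cluster you'd use: hadoop version | head -1
VERSION_LINE="Hadoop 2.7.3.2.6.1.3-4"   # sample output, as in my case
FULL=${VERSION_LINE#Hadoop }            # strip the leading "Hadoop "
REPO="http://repo.hortonworks.com/content/repositories/releases"
# If this URL 404s, that exact build was never published; browse one directory
# level up for the closest build suffix that does exist.
echo "$REPO/org/apache/hadoop/hadoop-common/$FULL/"
```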

Scala version
I also ran into dependency issues when trying Scala 2.11.11, so I compiled 
against 2.11.7.

So, the maven command I used was this:
mvn install -DskipTests -Dscala.version=2.11.7 -Pvendor-repos 
-Dhadoop.version=2.7.3.2.6.1.31-3

Azure Jars
With all that, I still got class-not-found errors when trying to start my 
Flink session, for instance "java.lang.ClassNotFoundException: Class 
org.apache.hadoop.fs.adl.HdiAdlFileSystem".
To fix that, I had to track down the Azure-specific jars. I checked which 
jars Spark was using and copied / symlinked them into Flink's "lib" 
directory:
/usr/hdp/current/spark2-client/jars/*azure*.jar
/usr/lib/hdinsight-datalake/adls2-oauth2-token-provider.jar
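Roughly what I did, as a sketch (the real paths are the two above; the jar names below are made-up placeholders, and the sketch uses temp dirs so it runs anywhere):

```shell
# Simulated with temp dirs; on the cluster you'd set
#   SPARK_JARS=/usr/hdp/current/spark2-client/jars
#   FLINK_LIB=<your Flink installation>/lib
SPARK_JARS=$(mktemp -d)
FLINK_LIB=$(mktemp -d)
# Placeholder jar names just for the demo -- real names will differ.
touch "$SPARK_JARS/hadoop-azure-example.jar" "$SPARK_JARS/azure-storage-example.jar"

# Symlink every Azure-related jar into Flink's lib directory.
for jar in "$SPARK_JARS"/*azure*.jar; do
    ln -s "$jar" "$FLINK_LIB/$(basename "$jar")"
done
ls "$FLINK_LIB"
```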

Guava Jar
Finally, my jobs were failing: the Cassandra driver complained that the 
Guava version was too old, even though the right version was in my 
assembled jar.
I just downloaded the version I needed (in my case, 23.0 
<http://central.maven.org/maven2/com/google/guava/guava/23.0/guava-23.0.jar>) 
and also put that into Flink’s lib directory.
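For completeness, the download step was just something like the following, written as a dry run that only prints the command (Maven Central is also reachable at repo1.maven.org; `<flink>` is a placeholder for your Flink installation directory):

```shell
GUAVA_VERSION=23.0   # the version my Cassandra driver needed
URL="https://repo1.maven.org/maven2/com/google/guava/guava/${GUAVA_VERSION}/guava-${GUAVA_VERSION}.jar"
# Dry run: print the command instead of hitting the network.
echo "curl -fLO $URL && mv guava-${GUAVA_VERSION}.jar <flink>/lib/"
```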

I hope it helps other people trying to run Flink on Azure HDInsight :)

Kind regards,

Albert

> On Aug 31, 2017, at 8:18 PM, Banias H <banias4sp...@gmail.com> wrote:
> 
> We had the same issue. Get the hdp version, from 
> /usr/hdp/current/hadoop-client/hadoop-common-<version>.jar for example. Then 
> rebuild flink from src:
> mvn clean install -DskipTests -Pvendor-repos -Dhadoop.version=<version>
> 
> for example: mvn clean install -DskipTests -Pvendor-repos 
> -Dhadoop.version=2.7.3.2.6.1.0-129
> 
> Copy and setup build-target/ to the cluster. Export HADOOP_CONF_DIR and 
> YARN_CONF_DIR according to your env. You should have no problem starting the 
> session.
> 
> 
> On Wed, Aug 30, 2017 at 6:45 AM, Federico D'Ambrosio <fedex...@gmail.com 
> <mailto:fedex...@gmail.com>> wrote:
> Hi,
> What is your "hadoop version" output? I'm asking because you said your hadoop 
> distribution is in /usr/hdp so it looks like you're using Hortonworks HDP, 
> just like myself. So, this would be a third party distribution and you'd need 
> to build Flink from source according to this: 
> https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#vendor-specific-versions
>  
> <https://ci.apache.org/projects/flink/flink-docs-release-1.3/setup/building.html#vendor-specific-versions>
> 
> Federico D'Ambrosio
> 
> Il 30 ago 2017 13:33, "albert" <alb...@datacamp.com 
> <mailto:alb...@datacamp.com>> ha scritto:
> Hi Chesnay,
> 
> Thanks for your reply. I did download the binaries matching my Hadoop
> version (2.7), that's why I was wondering if the issue had something to do
> with the exact hadoop version flink is compiled against, or if there might be
> things that are missing in my environment.
> 
> 
> 
> --
> Sent from: 
> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/ 
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/>
> 
