I found that I can still write to S3 using my Flink build of 1.1-SNAPSHOT. For example, the word count example runs fine:

./bin/flink run ./examples/batch/WordCount.jar --input hdfs:///tmp/LICENSE --output s3://permutive-flink/wordcount-result.txt
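As a quick sanity check outside of Flink, the two URIs in that command can be inspected with plain `java.net.URI`; Flink picks the FileSystem implementation based on the URI scheme, so the first thing to confirm is that the URIs parse into the scheme/authority/path you expect. A minimal sketch (the paths are just the ones from the command above):

```java
import java.net.URI;

// Minimal sketch (not from the thread): confirm the input/output URIs parse
// into the scheme, host, and path that the FileSystem lookup will see.
class SchemeCheck {
    public static void main(String[] args) {
        URI in = URI.create("hdfs:///tmp/LICENSE");
        URI out = URI.create("s3://permutive-flink/wordcount-result.txt");

        System.out.println(in.getScheme() + " -> " + in.getPath());   // hdfs -> /tmp/LICENSE
        System.out.println(out.getScheme() + " -> " + out.getHost() + out.getPath());
        // s3 -> permutive-flink/wordcount-result.txt
    }
}
```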
This works fine - it's just the RocksDBStateBackend which is erroring with the s3 URI. I'm wondering if it could be an issue with RocksDBStateBackend?

On Fri, Jun 17, 2016 at 12:09 PM, Josh <jof...@gmail.com> wrote:

> Hi Gordon/Fabian,
>
> Thanks for helping with this! Downgrading the Maven version I was using
> to build Flink appears to have fixed that problem - I was using Maven
> 3.3.3 before and have downgraded to 3.2.5.
>
> Just for reference, I printed the loaded class at runtime and found that
> when I was using Flink built with Maven 3.3.3, it was pulling in:
>
> jar:file:/opt/flink/flink-1.1-SNAPSHOT/lib/flink-dist_2.11-1.1-SNAPSHOT.jar!/org/apache/http/params/HttpConnectionParams.class
>
> But after building with the older Maven version, it pulled in the class
> from my jar:
>
> jar:file:/tmp/my-assembly-1.0.jar!/org/apache/http/params/HttpConnectionParams.class
>
> Unfortunately, now that that problem is fixed, I've got a different
> classpath issue. It started with:
>
> java.lang.RuntimeException: java.lang.ClassNotFoundException: Class com.amazon.ws.emr.hadoop.fs.EmrFileSystem not found
>     at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2227)
>     at org.apache.flink.runtime.fs.hdfs.HadoopFileSystem.getHadoopWrapperClassNameForFileSystem(HadoopFileSystem.java:460)
>     at org.apache.flink.core.fs.FileSystem.getHadoopWrapperClassNameForFileSystem(FileSystem.java:352)
>     at org.apache.flink.core.fs.FileSystem.get(FileSystem.java:280)
>     at org.apache.flink.runtime.state.filesystem.FsStateBackend.validateAndNormalizeUri(FsStateBackend.java:383)
>     at org.apache.flink.runtime.state.filesystem.FsStateBackend.<init>(FsStateBackend.java:175)
>     at org.apache.flink.runtime.state.filesystem.FsStateBackend.<init>(FsStateBackend.java:144)
>     at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.<init>(RocksDBStateBackend.java:205)
>
> This is strange because I used an s3:// checkpoint directory when running
> Flink 1.0.3 on EMR and it
> worked fine. (According to
> https://ci.apache.org/projects/flink/flink-docs-master/setup/aws.html#provide-s3-filesystem-dependency
> no configuration should be needed to use S3 when running on EMR.)
>
> Anyway, I tried executing /etc/hadoop/conf/hadoop-env.sh before running
> my job, as this sets up the HADOOP_CLASSPATH env var. The exception then
> changed to:
>
> java.lang.NoClassDefFoundError: org/apache/hadoop/fs/common/Abortable
>
> I found that this class is related to a jar called s3-dist-cp, so then I
> tried copying that jar to Flink's lib directory from
> /usr/share/aws/emr/s3-dist-cp/lib/*
>
> And now I'm back to another Kinesis connector classpath error:
>
> java.lang.NoClassDefFoundError: org/apache/http/conn/ssl/SSLSocketFactory
>     at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:136)
>     at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:120)
>     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:157)
>     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:137)
>     at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.<init>(KinesisProxy.java:76)
>     at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.<init>(FlinkKinesisConsumer.java:166)
>     at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.<init>(FlinkKinesisConsumer.java:140)
>     at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.<init>(FlinkKinesisConsumer.java:123)
>
> I guess this is related to me adding a bunch of extra stuff to the
> classpath in an attempt to solve the EmrFileSystem error. Any ideas what
> caused that error in the first place?
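[When chasing ClassNotFoundException / NoClassDefFoundError chains like the ones quoted above, a small standalone probe can confirm whether a class is resolvable at all and, if it is, which jar it was actually loaded from. A minimal sketch, not from the thread; the class names in main() are only examples, and it should be run with the same classpath as the failing job:

```java
// Probe the classpath: resolve a class by name without initializing it, then
// report the URL of the .class resource it was loaded from (jar:file:... for
// classes inside a jar, file:... for classes in a directory).
class ClasspathProbe {
    static String locate(String fqcn) {
        try {
            Class<?> cls = Class.forName(fqcn, false, ClasspathProbe.class.getClassLoader());
            java.net.URL url = cls.getResource(cls.getSimpleName() + ".class");
            return url != null ? url.toString() : "found, but no resource URL";
        } catch (ClassNotFoundException | NoClassDefFoundError e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Example probes, matching the classes from the errors in the thread.
        System.out.println(locate("org.apache.http.conn.ssl.SSLSocketFactory"));
        System.out.println(locate("com.amazon.ws.emr.hadoop.fs.EmrFileSystem"));
    }
}
```

If a probe for HttpConnectionParams prints a path inside flink-dist, the class was shaded into the distribution jar, which matches the Maven 3.3 symptom described at the top of the thread.]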
>
> By the way, I built Flink with:
>
> mvn clean install -Pinclude-kinesis,vendor-repos -DskipTests -Dhadoop.version=2.7.1
>
> Josh
>
> On Fri, Jun 17, 2016 at 9:56 AM, Fabian Hueske <fhue...@gmail.com> wrote:
>
>> Hi Josh,
>>
>> I assume that you built the SNAPSHOT version yourself. I had similar
>> version conflicts for Apache HttpCore with Flink SNAPSHOT versions on
>> EMR. The problem is caused by a changed behavior in Maven 3.3 and later
>> versions: due to these changes, the dependency shading does not work
>> correctly. That's why we use Maven 3.2 to build the Flink release
>> artifacts.
>>
>> Can you check whether you used Maven 3.3, and try to downgrade to 3.2 if
>> that was the case?
>>
>> Cheers, Fabian
>>
>> 2016-06-17 8:12 GMT+02:00 Tai Gordon <tzuli...@gmail.com>:
>>
>>> Hi Josh,
>>>
>>> I'm looking into the problem. It seems like the connector is somehow
>>> using older versions of httpclient.
>>> Can you print the loaded class path at runtime, and check the path &
>>> version of the loaded httpclient / httpcore dependency?
>>> i.e. `classOf[HttpConnectionParams].getResource("HttpConnectionParams.class").toString`
>>>
>>> Also, on which commit was your Kinesis connector built?
>>>
>>> Regards,
>>> Gordon
>>>
>>> On June 17, 2016 at 1:08:37 AM, Josh (jof...@gmail.com) wrote:
>>>
>>> Hey,
>>>
>>> I've been running the Kinesis connector successfully now for a couple
>>> of weeks, on a Flink cluster running Flink 1.0.3 on EMR 2.7.1/YARN.
>>>
>>> Today I've been trying to get it working on a cluster running the
>>> current Flink master (1.1-SNAPSHOT), but am running into a classpath
>>> issue when starting the job.
>>> This only happens when running on EMR/YARN (it's fine when running
>>> 1.1-SNAPSHOT locally, and when running 1.0.3 on EMR).
>>>
>>> ----
>>> The program finished with the following exception:
>>>
>>> java.lang.NoSuchMethodError: org.apache.http.params.HttpConnectionParams.setSoKeepalive(Lorg/apache/http/params/HttpParams;Z)V
>>>     at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:96)
>>>     at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:187)
>>>     at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:136)
>>>     at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:120)
>>>     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:157)
>>>     at com.amazonaws.services.kinesis.AmazonKinesisClient.<init>(AmazonKinesisClient.java:137)
>>>     at org.apache.flink.streaming.connectors.kinesis.proxy.KinesisProxy.<init>(KinesisProxy.java:76)
>>>     at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.<init>(FlinkKinesisConsumer.java:166)
>>>     at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.<init>(FlinkKinesisConsumer.java:140)
>>>     at org.apache.flink.streaming.connectors.kinesis.FlinkKinesisConsumer.<init>(FlinkKinesisConsumer.java:123)
>>> ----
>>>
>>> Any ideas what's going on?
>>>
>>> The job I'm deploying has httpclient 4.3.6 and httpcore 4.3.3, which I
>>> believe are the libraries with the HttpConnectionParams class.
>>>
>>> Thanks,
>>> Josh
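[A NoSuchMethodError like the one in this last trace usually means the HttpConnectionParams class that actually got loaded comes from an older httpcore than the 4.3.3 the job was compiled against. One way to confirm this kind of binary-compatibility problem without triggering the error is to reflectively probe for the expected signature. A minimal sketch, demonstrated with java.lang.String since the real check would need httpcore on the classpath; there you would probe org.apache.http.params.HttpConnectionParams for setSoKeepalive(HttpParams, boolean):

```java
// Reflectively ask whether a given public method signature exists on the
// class that was actually loaded. Returns false exactly in the situation
// where a caller compiled against a newer version would hit NoSuchMethodError.
class MethodProbe {
    static boolean hasMethod(Class<?> cls, String name, Class<?>... paramTypes) {
        try {
            cls.getMethod(name, paramTypes);
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasMethod(String.class, "valueOf", char[].class));         // true
        System.out.println(hasMethod(String.class, "setSoKeepalive", boolean.class)); // false
    }
}
```

Run with the job's classpath, a `false` result for the httpcore probe pins the failure on the loaded dependency version rather than on the connector code itself.]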