Hi Kostas,

Thanks for the info. That error was caused by my building your code against an out-of-date baseline. After rebasing my branch and rebuilding, the issue no longer occurs. I have been testing since then, and so far I have the following questions/issues:
1. I am not able to write to S3 with the URI format s3://<path>; I have to use s3a://<path>. Is this behaviour expected? (I am running Flink on AWS EMR, and I thought EMR provides an HDFS-compatible wrapper over S3 called EMRFS.)

2. Occasionally/randomly I get the message below ( parquet_error1.log <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/parquet_error1.log> ). I am using the ParquetAvroWriters.forReflectRecord() method to write Scala case classes; a simplified sketch of my sink setup is in the P.S. below. Re-running the job does not hit the error at the same data location, so I don't think there is an issue with the data itself.

    java.lang.ArrayIndexOutOfBoundsException: <some random number>
        at org.apache.parquet.column.values.dictionary.DictionaryValuesWriter$PlainBinaryDictionaryValuesWriter.fallBackDictionaryEncodedData

3. Sometimes I get the error below when I use a parallelism of 8 for the sink ( parquet_error2.log <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/parquet_error2.log> ). Reducing it to 2 solves the issue, but is it possible to increase the pool size instead? I could not find any place where I can change the fs.s3.maxconnections parameter (see the config sketch after my signature).

    java.io.InterruptedIOException: initiate MultiPartUpload on Test/output/dt=2018-09-20/part-7-5: org.apache.flink.fs.s3base.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool

4. Where is the temporary folder in which you store the Parquet files before uploading them to S3?

Thanks a lot for your help.

Best regards,
Averell
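P.S. For questions 1 and 2, here is a simplified, self-contained sketch of how my sink is set up. The Event case class, its fields, and the bucket path are placeholders, not my actual job code:

    import org.apache.flink.core.fs.Path
    import org.apache.flink.formats.parquet.avro.ParquetAvroWriters
    import org.apache.flink.streaming.api.functions.sink.filesystem.StreamingFileSink
    import org.apache.flink.streaming.api.scala._

    // Placeholder record type standing in for my real case class.
    case class Event(id: String, dt: String, value: Double)

    object ParquetSinkSketch {
      def main(args: Array[String]): Unit = {
        val env = StreamExecutionEnvironment.getExecutionEnvironment
        // StreamingFileSink only finalizes part files on checkpoints.
        env.enableCheckpointing(60000)

        val events: DataStream[Event] = env.fromElements(
          Event("a", "2018-09-20", 1.0),
          Event("b", "2018-09-20", 2.0))

        // The reflect-based Parquet writer from question 2. Only the s3a://
        // scheme works for me here; s3://<path> fails (question 1).
        val sink: StreamingFileSink[Event] = StreamingFileSink
          .forBulkFormat(
            new Path("s3a://my-bucket/Test/output"),
            ParquetAvroWriters.forReflectRecord(classOf[Event]))
          .build()

        // A sink parallelism of 8 is what triggers the pool timeout (question 3).
        events.addSink(sink).setParallelism(8)

        env.execute("parquet-sink-sketch")
      }
    }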
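P.P.S. While chasing question 2, I also printed the Avro schema that the writer derives for my case class, to check how the fields are mapped. As far as I can tell, forReflectRecord() derives its schema via Avro's ReflectData; please correct me if that is wrong:

    import org.apache.avro.reflect.ReflectData

    // Pretty-print the Avro schema ReflectData derives for the placeholder
    // Event case class from the sketch above.
    println(ReflectData.get().getSchema(classOf[Event]).toString(true))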
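P.P.P.S. For question 3, the only knob I could find is the standard Hadoop S3A one. My (unverified) assumption is that Flink's shaded S3 filesystem forwards fs.s3a.* entries from flink-conf.yaml to the underlying S3A client, so something like the following might be the right place:

    # flink-conf.yaml -- assumption: flink-s3-fs-hadoop forwards fs.s3a.* keys
    # to the shaded S3A client; fs.s3a.connection.maximum is the standard
    # Hadoop property for the S3A HTTP connection pool size
    fs.s3a.connection.maximum: 64

Is that the right place, or is there a Flink-specific setting for it?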