I have a 4-data-node Apex cluster running Hadoop distribution CDH-5.6.0 and Apex 
version incubator-apex-core-3.3.0-incubating.  I have an application that 
streams a file from Amazon S3.  The file is a tar.gz archive containing a 
single file of newline-delimited JSON records.  If the file is small (around a 
thousand records) there are no issues.  If I try a 5 GB file, the stream works 
for a while and I am able to process roughly 200,000 of the 1,000,000 records 
(the count varies from run to run, sometimes more are processed, sometimes 
fewer), and then exceptions such as the following are thrown:

-com.esotericsoftware.kryo.KryoException: Class cannot be created (missing 
no-arg constructor): com.amazonaws.internal.StaticCredentialsProvider

-java.lang.IllegalStateException: Deploy request failed: [OperatorDeployInfo

-WARN com.datatorrent.netlet.OptimizedEventLoop: Exception on unattached 
SelectionKey sun.nio.ch.SelectionKeyImpl@29369cb9
java.io.IOException: Broken pipe

Once these exceptions come up, the KryoExceptions recur continuously and no 
more data is processed.  Is there something that needs to be done in the Apex 
operators to handle processing large files (streams)?
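For what it's worth, my reading of the first exception is that a checkpointed 
operator field holds an object of a class (here 
com.amazonaws.internal.StaticCredentialsProvider) that has no no-arg 
constructor, so Kryo cannot re-instantiate it on redeploy.  A minimal 
plain-reflection sketch of why that instantiation fails (CredentialsHolder is 
a hypothetical stand-in for the AWS class, not real code from my application):

```java
import java.lang.reflect.Constructor;

// Hypothetical stand-in for a class like StaticCredentialsProvider:
// its only constructor takes an argument, so no no-arg constructor exists.
class CredentialsHolder {
    private final String key;
    CredentialsHolder(String key) { this.key = key; }
}

public class NoArgCheck {
    public static void main(String[] args) {
        String result;
        try {
            // Kryo's default object-creation path needs essentially this:
            Constructor<CredentialsHolder> c =
                CredentialsHolder.class.getDeclaredConstructor();
            result = "no-arg constructor found";
        } catch (NoSuchMethodException e) {
            // Mirrors the "Class cannot be created (missing no-arg
            // constructor)" failure seen at operator deploy time.
            result = "missing no-arg constructor";
        }
        System.out.println(result);
    }
}
```

If that reading is right, would the fix be to mark the S3 client/credentials 
fields transient and rebuild them in the operator's setup(), so they are never 
checkpointed?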
