Make the AWS credentials-related objects transient and initialize them in the operator's setup() call.
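To illustrate the idea: Apex serializes the operator (with Kryo) when it deploys or recovers it, and a field like com.amazonaws.internal.StaticCredentialsProvider has no no-arg constructor, so serialization fails unless the field is transient and rebuilt in setup(). This sketch uses plain java.io serialization (not Kryo) to show the same principle; S3InputSketch and its fields are hypothetical stand-ins, not the actual Apex or AWS SDK classes:

```java
import java.io.*;

// Hypothetical operator-like class. The credentials field is transient,
// so serialization skips it; setup() rebuilds it after deploy/recovery,
// mirroring Operator.setup(OperatorContext) in Apex.
class S3InputSketch implements Serializable {
    private String accessKey = "my-access-key";   // plain fields serialize fine
    private transient Object credentialsProvider; // skipped by serialization

    // Called by the container once the operator is deployed; re-create the
    // non-serializable object here instead of in the constructor.
    void setup() {
        credentialsProvider = new Object(); // stand-in for StaticCredentialsProvider
    }

    Object getProvider() { return credentialsProvider; }

    // Serialize and deserialize, as happens on operator deploy/recovery.
    static S3InputSketch roundTrip(S3InputSketch in) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        new ObjectOutputStream(bos).writeObject(in);
        ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()));
        return (S3InputSketch) ois.readObject();
    }
}

public class Main {
    public static void main(String[] args) throws Exception {
        S3InputSketch op = new S3InputSketch();
        op.setup();
        S3InputSketch copy = S3InputSketch.roundTrip(op);
        System.out.println("after deserialize: " + copy.getProvider()); // null
        copy.setup(); // the container calls setup() after deploy
        System.out.println("provider rebuilt: " + (copy.getProvider() != null));
    }
}
```

The same pattern applies to any non-serializable member (clients, connections, buffers): keep it transient and construct it in setup(), so Kryo never tries to instantiate it.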
The troubleshooting guide has more details about the Kryo exceptions:
http://docs.datatorrent.com/troubleshooting/

On Thu, Jun 30, 2016 at 9:38 AM Doyle, Austin O. <[email protected]> wrote:

> I have a 4-data-node Apex system using Hadoop distribution CDH-5.6.0 and
> Apex version incubator-apex-core-3.3.0-incubating. I have an application
> that should input (stream) a file from Amazon S3. If the file (a tar.gz
> file containing a single file of newline-delimited JSON lines) is small
> (maybe a thousand records), there are no issues. If I try a 5 GB file,
> the stream works for a while and I am able to process maybe 200,000 of
> the 1,000,000 records (the amount changes every time, sometimes more
> processed, sometimes less), and then exceptions are thrown such as:
>
> - com.esotericsoftware.kryo.KryoException: Class cannot be created (missing
>   no-arg constructor): com.amazonaws.internal.StaticCredentialsProvider
>
> - java.lang.IllegalStateException: Deploy request failed: [OperatorDeployInfo
>
> - WARN com.datatorrent.netlet.OptimizedEventLoop: Exception on unattached
>   SelectionKey sun.nio.ch.SelectionKeyImpl@29369cb9
>   java.io.IOException: Broken pipe
>
> Once these exceptions come up, the KryoExceptions continuously come up and
> no more data is processed. Is there something that needs to be done in the
> Apex operators to handle processing large files (streams)?
