To add to what Sandesh said: it looks like your operator is dying or getting killed after some time, so you should look at the Application Master logs to find out why this is happening.
When it goes down, a new operator instance is created and its state is restored from an earlier checkpoint; for this to work, default (no-arg) constructors must be available for all non-transient, non-primitive fields. Otherwise, you get the Kryo exception.

Ram

On Thu, Jun 30, 2016 at 9:38 AM, Doyle, Austin O. <austin.o.doyl...@leidos.com> wrote:
> I have a 4-data-node Apex system using Hadoop distribution CDH-5.6.0 and
> Apex version incubator-apex-core-3.3.0-incubating. I have an application
> that should input (stream) a file from Amazon S3. If the file (which is a
> tar.gz file containing a single file with JSON lines delimited by
> newlines) is small (maybe a thousand records), there are no issues. If I
> try a 5 GB file, the stream works for a while and I am able to process
> maybe 200,000 of the 1,000,000 records (the amount changes every time,
> sometimes more processed, sometimes less), and then exceptions are thrown
> such as:
>
> - com.esotericsoftware.kryo.KryoException: Class cannot be created (missing
> no-arg constructor): com.amazonaws.internal.StaticCredentialsProvider
>
> - java.lang.IllegalStateException: Deploy request failed: [OperatorDeployInfo
>
> - WARN com.datatorrent.netlet.OptimizedEventLoop: Exception on unattached
> SelectionKey sun.nio.ch.SelectionKeyImpl@29369cb9
> java.io.IOException: Broken pipe
>
> Once these exceptions come up, the KryoExceptions continuously come up and
> no more data is processed. Is there something that needs to be done in the
> Apex operators to handle processing large files (streams)?
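To illustrate the constructor requirement concretely: when Kryo restores an object by default, it needs a reachable no-arg constructor, which is exactly what com.amazonaws.internal.StaticCredentialsProvider lacks. Below is a minimal, self-contained Java sketch (no Apex, Kryo, or AWS SDK dependency; the class names are made up for illustration) that uses reflection, the same mechanism Kryo relies on, to show how a field type with only a parameterized constructor trips this check, while a type with a default constructor does not. The usual fix in an Apex operator is to mark such fields transient and recreate them in setup(), so they are never checkpointed at all.

```java
import java.lang.reflect.Constructor;

public class NoArgCtorDemo {

    // Hypothetical stand-in for com.amazonaws.internal.StaticCredentialsProvider:
    // it only defines a parameterized constructor, so no implicit no-arg
    // constructor exists and default Kryo instantiation fails.
    static class CredentialsHolder {
        private final String accessKey;
        CredentialsHolder(String accessKey) { this.accessKey = accessKey; }
    }

    // A type with an explicit no-arg constructor restores fine.
    static class SafeHolder {
        private String accessKey;
        SafeHolder() { }
    }

    // Returns true if the class declares a no-arg constructor, i.e. whether
    // default reflective instantiation (as Kryo does) can succeed.
    static boolean hasNoArgConstructor(Class<?> cls) {
        try {
            Constructor<?> c = cls.getDeclaredConstructor();
            return c != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(CredentialsHolder.class)); // false
        System.out.println(hasNoArgConstructor(SafeHolder.class));        // true
    }
}
```

In practice this means: either make the AWS client/credentials field transient and rebuild it in the operator's setup() callback, or avoid holding such objects as operator fields entirely, so checkpointing never tries to serialize them.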