I tried to reproduce the error on a subset of the data: reducing the
available memory and putting a lot of GC pressure on the job (by allocating
many useless objects in one of the first UDFs, see the sketch below the
stack trace) caused this error:

Caused by: java.io.IOException: Thread 'SortMerger spilling thread'
terminated due to an exception: / by zero
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:800)
Caused by: java.lang.ArithmeticException: / by zero
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.getSegmentsForReaders(UnilateralSortMerger.java:1651)
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.mergeChannelList(UnilateralSortMerger.java:1565)
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$SpillingThread.go(UnilateralSortMerger.java:1417)
at
org.apache.flink.runtime.operators.sort.UnilateralSortMerger$ThreadBase.run(UnilateralSortMerger.java:796)
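
For reference, this is roughly the kind of UDF I added to create the GC
pressure; the map function below is only an illustration, not the real UDF
of the job (which does the actual work):

import org.apache.flink.api.common.functions.MapFunction;

// Illustration only: allocate and immediately discard objects on every
// record, so the JVM spends a lot of time in GC while the sorter spills.
public class GcPressureMapper implements MapFunction<String, String> {

    @Override
    public String map(String value) throws Exception {
        for (int i = 0; i < 1000; i++) {
            byte[] garbage = new byte[1024];
            garbage[0] = (byte) i; // keep the allocation from being optimized away
        }
        return value;
    }
}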

I hope this helps to narrow down the debugging area :)

Best,
Flavio

On Fri, May 27, 2016 at 8:21 PM, Stephan Ewen <se...@apache.org> wrote:

> Hi!
>
> That is a pretty thing indeed :-) Will try to look into this in a few
> days...
>
> Stephan
>
>
> On Fri, May 27, 2016 at 12:10 PM, Flavio Pompermaier <pomperma...@okkam.it
> > wrote:
>
>> Running the job with the log level set to DEBUG made it run
>> successfully... Is this meaningful? Maybe slowing down the threads a little
>> bit helps with the serialization?
>>
>>
>> On Thu, May 26, 2016 at 12:34 PM, Flavio Pompermaier <
>> pomperma...@okkam.it> wrote:
>>
>>> I'm still not able to reproduce the error locally, only remotely :)
>>> Any suggestions on how to reproduce it locally on a subset of the data?
>>> This time I had:
>>>
>>> com.esotericsoftware.kryo.KryoException: Unable to find class: ^Z^A
>>>         at
>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
>>>         at
>>> com.esotericsoftware.kryo.util.DefaultClassResolver.readClass(DefaultClassResolver.java:115)
>>>         at com.esotericsoftware.kryo.Kryo.readClass(Kryo.java:641)
>>>         at
>>> com.esotericsoftware.kryo.Kryo.readClassAndObject(Kryo.java:752)
>>>         at
>>> org.apache.flink.api.java.typeutils.runtime.kryo.KryoSerializer.deserialize(KryoSerializer.java:228)
>>>         at
>>> org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
>>>         at
>>> org.apache.flink.runtime.plugable.NonReusingDeserializationDelegate.read(NonReusingDeserializationDelegate.java:55)
>>>         at
>>> org.apache.flink.runtime.io.network.api.serialization.SpillingAdaptiveSpanningRecordDeserializer.getNextRecord(SpillingAdaptiveSpanningRecordDeserializer.java:124)
>>>         at
>>> org.apache.flink.runtime.io.network.api.reader.AbstractRecordReader.getNextRecord(AbstractRecordReader.java:65)
>>>         at
>>> org.apache.flink.runtime.io.network.api.reader.MutableRecordReader.next(MutableRecordReader.java:34)
>>>         at
>>> org.apache.flink.runtime.operators.util.ReaderIterator.next(ReaderIterator.java:73)
>>>         at
>>> org.apache.flink.runtime.operators.MapDriver.run(MapDriver.java:96)
>>>         at
>>> org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:480)
>>>         at
>>> org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:345)
>>>         at org.apache.flink.runtime.taskmanager.Task.run(Task.java:559)
>>>         at java.lang.Thread.run(Thread.java:745)
>>>
>>> Best,
>>> Flavio
>>>
>>>
>>> On Tue, May 24, 2016 at 5:47 PM, Flavio Pompermaier <
>>> pomperma...@okkam.it> wrote:
>>>
>>>> Do you have any suggestions about how to reproduce the error on a subset
>>>> of the data?
>>>> I've been changing the settings in the following method, but I can't find
>>>> a combination that causes the error :(
>>>>
>>>> private static ExecutionEnvironment getLocalExecutionEnv() {
>>>>     org.apache.flink.configuration.Configuration c =
>>>>         new org.apache.flink.configuration.Configuration();
>>>>     c.setString(ConfigConstants.TASK_MANAGER_TMP_DIR_KEY, "/tmp");
>>>>     c.setString(ConfigConstants.BLOB_STORAGE_DIRECTORY_KEY, "/tmp");
>>>>     c.setFloat(ConfigConstants.TASK_MANAGER_MEMORY_FRACTION_KEY, 0.9f);
>>>>     c.setLong(ConfigConstants.LOCAL_NUMBER_TASK_MANAGER, 4);
>>>>     c.setLong(ConfigConstants.TASK_MANAGER_NUM_TASK_SLOTS, 4);
>>>>     c.setString(ConfigConstants.AKKA_ASK_TIMEOUT, "10000 s");
>>>>     c.setLong(ConfigConstants.TASK_MANAGER_NETWORK_NUM_BUFFERS_KEY, 2048 * 12);
>>>>     ExecutionEnvironment env = ExecutionEnvironment.createLocalEnvironment(c);
>>>>     env.setParallelism(16);
>>>>     env.registerTypeWithKryoSerializer(DateTime.class, JodaDateTimeSerializer.class);
>>>>     return env;
>>>> }
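>>>>
>>>> (For completeness, a rough sketch of how the helper is used; the input
>>>> below is just a dummy placeholder, not the real dataset of the job:)
>>>>
>>>> public static void main(String[] args) throws Exception {
>>>>     ExecutionEnvironment env = getLocalExecutionEnv();
>>>>     // Dummy input, only to show the call; the real job reads the actual
>>>>     // data and runs its own UDFs.
>>>>     env.fromElements("a", "b", "c")
>>>>        .map(new MapFunction<String, String>() {
>>>>            @Override
>>>>            public String map(String value) {
>>>>                return value.toUpperCase();
>>>>            }
>>>>        })
>>>>        .print();
>>>> }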
>>>>
>>>> Best,
>>>> Flavio
>>>>
>>>>
>>>> On Tue, May 24, 2016 at 11:13 AM, Till Rohrmann <trohrm...@apache.org>
>>>> wrote:
>>>>
>>>>> The error looks really strange. Flavio, could you put together a test
>>>>> program with example data and configuration that reproduces the problem?
>>>>> Given that, we could try to debug it.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>
>>>>
>>>
>>
>
