After a second look at KryoSerializer I fear that Input and Output are
never closed... am I right?
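In plain Kryo usage I would expect something like the sketch below (just an
illustration of the general pattern, not Flink's actual code), where both
sides are closed explicitly:

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.io.Input;
import com.esotericsoftware.kryo.io.Output;
import java.io.FileInputStream;
import java.io.FileOutputStream;

public class KryoCloseSketch {
    public static void main(String[] args) throws Exception {
        Kryo kryo = new Kryo();
        // Output/Input extend OutputStream/InputStream, so try-with-resources
        // closes them and releases their internal buffers.
        try (Output out = new Output(new FileOutputStream("rec.bin"))) {
            kryo.writeClassAndObject(out, "some record");
        }
        try (Input in = new Input(new FileInputStream("rec.bin"))) {
            System.out.println(kryo.readClassAndObject(in));
        }
    }
}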
On Tue, Jun 7, 2016 at 3:06 PM, Flavio Pompermaier wrote:
Hi Aljoscha,
of course I can :)
Thanks for helping me... do you think it is the right thing to do calling
reset()?
Actually, I don't know whether this is meaningful or not, but I already ran
the job successfully once on the cluster (a second attempt is currently
running) after my accidental
That's nice. Can you try it on your cluster with an added "reset" call on
the buffer?
On Tue, 7 Jun 2016 at 14:35 Flavio Pompermaier wrote:
After "some" digging into this problem I'm quite convinced that the problem
is caused by a missing reset of the buffer during the Kryo deserialization,
similar to what was fixed by FLINK-2800 (
https://github.com/apache/flink/pull/1308/files).
That fix added an output.clear() in
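As far as I understand it, the shape of that fix is roughly the following (a
simplified sketch, not the actual Flink source):

import com.esotericsoftware.kryo.Kryo;
import com.esotericsoftware.kryo.KryoException;
import com.esotericsoftware.kryo.io.Output;

public class SerializeSketch {
    // If the write fails (e.g. the target buffer is full), clear the Output:
    // otherwise the partially written bytes stay in the buffer and corrupt
    // the next record written through it.
    static void serialize(Kryo kryo, Output output, Object record) {
        try {
            kryo.writeClassAndObject(output, record);
            output.flush();
        } catch (KryoException ke) {
            output.clear();
            throw ke;
        }
    }
}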
Unless someone really invests time into debugging this, I fear that the
different misspellings are not really helpful, Flavio.
On Mon, Jun 6, 2016 at 10:31 AM, Flavio Pompermaier wrote:
> This time I had the following exception (obviously
Last week I was able to run the job several times without any
error. Then I just recompiled it and the error reappeared :(
This time I had:
java.lang.Exception: The data preparation for task 'CHAIN CoGroup
(CoGroup at main(DataInference.java:372)) -> Map (Map at
Hi Flavio,
can you privately share the source code of your Flink job with me?
I'm wondering whether the issue might be caused by a mixup between
different versions on the cluster (different JVM versions? or different
files in the lib/ folder?). How are you deploying the Flink job?
I tried to reproduce the error on a subset of the data, and in fact
reducing the available memory and increasing GC pressure a lot (by creating
a lot of useless objects in one of the first UDFs) caused this error:
Caused by: java.io.IOException: Thread 'SortMerger spilling thread'
terminated due to an
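The UDF I used to crank up the GC pressure was roughly like this (class name
and allocation size are made up; the pointless allocation is the only thing
that matters):

import org.apache.flink.api.common.functions.MapFunction;

// Hypothetical mapper: allocates short-lived garbage per record so the GC
// runs much more often while the job is reproducing the error.
public class GcPressureMapper implements MapFunction<String, String> {
    @Override
    public String map(String value) {
        byte[] garbage = new byte[256 * 1024]; // allocated and dropped immediately
        garbage[0] = 1;
        return value;
    }
}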
Hi!
That is a pretty thing indeed :-) Will try to look into this in a few
days...
Stephan
On Fri, May 27, 2016 at 12:10 PM, Flavio Pompermaier wrote:
Running the job with log level set to DEBUG made the job run
successfully... Is this meaningful? Maybe slowing down the threads a little
bit could help the serialization?
On Thu, May 26, 2016 at 12:34 PM, Flavio Pompermaier wrote:
Still not able to reproduce the error locally, only remotely :)
Any suggestions about how to try to reproduce it locally on a subset of the
data?
This time I had:
com.esotericsoftware.kryo.KryoException: Unable to find class: ^Z^A
at
Do you have any suggestion about how to reproduce the error on a subset of
the data?
I'm trying to change the following, but I can't find a configuration that
causes the error :(
private static ExecutionEnvironment getLocalExecutionEnv() {
    org.apache.flink.configuration.Configuration c =
        new org.apache.flink.configuration.Configuration();
    // assumption: the method simply builds a local environment from c
    return ExecutionEnvironment.createLocalEnvironment(c);
}
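The kind of knobs I'm toggling in that Configuration look like this (a
sketch showing one of the combinations; the key strings are the same ones
used in flink-conf.yaml):

// Hypothetical example of one combination of settings I try:
c.setFloat("taskmanager.memory.fraction", 0.7f);
c.setBoolean("taskmanager.memory.off-heap", true);
c.setInteger("taskmanager.numberOfTaskSlots", 2);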
The error looks really strange. Flavio, could you put together a test program
with example data and configuration to reproduce the problem? Given that,
we could try to debug it.
Cheers,
Till
Till mentioned the fact that 'spilling to disk' was handled through an
exception catch. The last serialization error was related to bad management
of the Kryo buffer, which was not cleaned after spilling in the exception
handling. Is it possible we are dealing with an issue similar to this but caused by
You can try with this:
import org.apache.flink.api.java.ExecutionEnvironment;
import org.joda.time.DateTime;
import de.javakaffee.kryoserializers.jodatime.JodaDateTimeSerializer;

public class DateTimeError {
    public static void main(String[] args) throws Exception {
        // (body reconstructed, an assumption: register the Joda serializer and
        // push DateTime instances through Kryo in a minimal local job)
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        env.registerTypeWithKryoSerializer(DateTime.class, JodaDateTimeSerializer.class);
        env.fromElements(new DateTime(), new DateTime()).print();
    }
}
What error do you get when you don't register the Kryo serializer?
On Mon, May 23, 2016 at 11:57 AM, Flavio Pompermaier wrote:
> With these last settings I was able to terminate the job the second time I
> retried to run it, without restarting the cluster...
> If I don't
Hi Flavio,
These error messages are quite odd. It looks like an off-by-one error in the
serializer/deserializer. It must be somehow related to the Kryo serialization
stack, because it doesn't seem to occur with Flink's serialization system.
Does the job run fine if you don't register the custom Kryo serializer?
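Something along these lines to compare the two paths (a sketch;
enableForceKryo() additionally lets you push even POJO types through Kryo,
to check whether the error follows the Kryo stack):

import org.apache.flink.api.java.ExecutionEnvironment;

public class SerializerIsolation {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();
        // Run once WITHOUT any registerTypeWithKryoSerializer(...) call, and
        // once forcing all types through Kryo, and compare where it fails.
        env.getConfig().enableForceKryo();
        // ... build the same job as before here ...
        env.execute();
    }
}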
Changing
- taskmanager.memory.fraction, from 0.7 to 0.9
- taskmanager.memory.off-heap, from true to false
- decreasing the slots of each tm from 2 to 1
I had this Exception:
java.lang.Exception: The data preparation for task 'GroupReduce
(GroupReduce at main(AciDataInference.java:331))'
Changing
- taskmanager.memory.fraction, from 0.9 to 0.7
- taskmanager.memory.off-heap, from false to true
- decreasing the slots of each tm from 3 to 2
I had this error:
2016-05-23 09:55:42,534 ERROR
org.apache.flink.runtime.operators.BatchTask - Error in
task code:
I've slightly modified the program to shorten the length of the entire job,
and this time I had this Exception:
2016-05-23 09:26:51,438 ERROR
org.apache.flink.runtime.io.disk.iomanager.IOManager - IO Thread
'IOManager writer thread #1' terminated due to an exception. Shutting down
I/O
I think this bug comes from something in
SpillingAdaptiveSpanningRecordDeserializer... I've tried to find a common
point of failure in all those messages and I found that it also contains
this error message that I got once:
private static final String BROKEN_SERIALIZATION_ERROR_MESSAGE =
Right now I'm using Flink 1.0.2... to which version should I downgrade?
The hardware seems to be OK... how could I detect faulty hardware?
These errors appeared in every run of my job after I moved the temporary
directory from SSD to HDD and extended my pipeline with a dataset that
grows as the
The problem seems to occur quite often.
Did you update your Flink version recently? If so, could you try to
downgrade and see if the problem disappears?
Is it otherwise possible that it is caused by faulty hardware?
2016-05-20 18:05 GMT+02:00 Flavio Pompermaier:
This time (Europed instead of Europe):
java.lang.Exception: The data preparation for task 'CHAIN GroupReduce
(GroupReduce at createResult(PassaggioWithComprAndVend.java:132)) ->
Map (Key Extractor)' , caused an error: Error obtaining the sorted
input: Thread 'SortMerger spilling thread'
This time another error (rerialization instead of serialization):
com.esotericsoftware.kryo.KryoException: Unable to find class:
it.okkam.flink.entitons.*rerialization*.pojo.EntitonQuadPojo
at
com.esotericsoftware.kryo.util.DefaultClassResolver.readName(DefaultClassResolver.java:138)
Hi Ufuk,
my records can be quite large POJOs (I think some MBs).
The only thing I do to configure Kryo is:
env.registerTypeWithKryoSerializer(DateTime.class, JodaDateTimeSerializer.class);
Best,
Flavio
On Fri, May 20, 2016 at 12:38 PM, Ufuk Celebi wrote:
@Stefano: the records are serialized anyway for batch jobs. The
spilling deserializer is only relevant if single records are very
large. How large are your records? In any case, I don't expect this to
be the problem.
@Flavio: The class name "typo" errors (Vdhicle instead of Vehicle and
ttil
Today I got this other strange error... obviously I don't have a
VdhicleEvent class, only a VehicleEvent class :(
java.lang.RuntimeException: Cannot instantiate class.
at
org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:407)
at
That exception showed up just once, but the following one happens randomly
(if I re-run the job after stopping and restarting the cluster it usually
doesn't show up):
Caused by: java.io.IOException: Serializer consumed more bytes than the
record had. This indicates broken serialization. If you are using
Hi to all,
in my last run of a job I received this weird KryoException in one of the
TaskManagers... obviously this class is not mentioned anywhere, neither in
my project nor in Flink...
Any help is appreciated!
Best,
Flavio
INFO org.apache.flink.runtime.taskmanager.Task - CHAIN GroupReduce