Hi Ufuk, any news on this?

On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi <u...@apache.org> wrote:
> I guess that this is caused by a bug in the checksum calculation. Let
> me check that.
>
> On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> > I've run the job once more (always using the checksum branch) and this
> > time I got:
> >
> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
> >     at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83)
> >     at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:32)
> >     at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
> >     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:135)
> >     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
> >     at org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(ChannelReaderInputViewIterator.java:100)
> >     at org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:161)
> >     at org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:113)
> >     at org.apache.flink.runtime.operators.util.metrics.CountingMutableObjectIterator.next(CountingMutableObjectIterator.java:45)
> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.advanceToNext(NonReusingKeyGroupedIterator.java:130)
> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.access$300(NonReusingKeyGroupedIterator.java:32)
> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator$ValuesIterator.next(NonReusingKeyGroupedIterator.java:192)
> >     at org.okkam.entitons.mapping.flink.IndexMappingExecutor$TupleToEntitonJsonNode.reduce(IndexMappingExecutor.java:64)
> >     at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:131)
> >     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
> >     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
> >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
> >     at java.lang.Thread.run(Thread.java:745)
> >
> > On Thu, Oct 6, 2016 at 11:00 AM, Ufuk Celebi <u...@apache.org> wrote:
> >> Yes, if that's the case you should go with option (2) and run with the
> >> checksums, I think.
> >>
> >> On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier <pomperma...@okkam.it> wrote:
> >> > The problem is that the data is very large and usually cannot run on a
> >> > single machine :(
> >> >
> >> > On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi <u...@apache.org> wrote:
> >> >> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh <tarand...@gmail.com> wrote:
> >> >> > @Stephan my Flink cluster setup: 5 nodes, each running 1 TaskManager.
> >> >> > Slots per task manager: 2-4 (I tried varying this to see if it has
> >> >> > any impact).
> >> >> > Network buffers: 5k-20k (tried different values for it).
> >> >>
> >> >> Could you run the job first on a single task manager to see if the
> >> >> error occurs even when no network shuffle is involved? That should be
> >> >> less overhead for you than running the custom build (which might be
> >> >> buggy ;)).
> >> >>
> >> >> – Ufuk
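For context on the failure mode: in this Flink version, EnumSerializer.deserialize reads a 4-byte ordinal from the input view and indexes straight into the enum's constants array (the exact source line may differ by version), so a desynchronized or corrupted byte stream surfaces as an ArrayIndexOutOfBoundsException with a garbage index rather than a checksum-style error. Notably, 1953786112 is 0x74746900, i.e. the ASCII bytes "tti" followed by a NUL, which looks like string payload being consumed where an ordinal was expected. Below is a minimal sketch of that pattern; the Color enum, CorruptOrdinalDemo class, and hand-built byte array are illustrative stand-ins, not Flink code — only the values[readInt()] shape comes from the stack trace above:

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.IOException;

    public class CorruptOrdinalDemo {

        enum Color { RED, GREEN, BLUE }

        // Mirrors the pattern behind EnumSerializer.deserialize(...):
        // read a 4-byte big-endian ordinal, index into the constants array.
        static Color deserialize(DataInputStream in) throws IOException {
            Color[] values = Color.values();
            return values[in.readInt()]; // corrupted bytes -> huge index
        }

        public static void main(String[] args) throws IOException {
            // Simulate a desynchronized stream: ASCII text bytes sitting
            // where an ordinal was expected. 0x74 0x74 0x69 0x00 ("tti\0")
            // decodes to exactly the reported index 1953786112.
            byte[] corrupted = {0x74, 0x74, 0x69, 0x00};
            deserialize(new DataInputStream(new ByteArrayInputStream(corrupted)));
            // -> java.lang.ArrayIndexOutOfBoundsException: 1953786112
        }
    }

This is consistent with the checksum hypothesis discussed above: the deserializer itself has no way to detect that the stream is misaligned, so a checksum over the spilled data would catch the corruption before it reaches the serializers.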