Ok, thanks for the update, Ufuk! Let me know if you need tests or anything else!

Best,
Flavio
On Wed, Oct 12, 2016 at 11:26 AM, Ufuk Celebi <u...@apache.org> wrote:

> No, sorry. I was waiting for Tarandeep's feedback before looking into
> it further. I will do it over the next days in any case.
>
> On Wed, Oct 12, 2016 at 10:49 AM, Flavio Pompermaier
> <pomperma...@okkam.it> wrote:
> > Hi Ufuk,
> > any news on this?
> >
> > On Thu, Oct 6, 2016 at 1:30 PM, Ufuk Celebi <u...@apache.org> wrote:
> >>
> >> I guess that this is caused by a bug in the checksum calculation. Let
> >> me check that.
> >>
> >> On Thu, Oct 6, 2016 at 1:24 PM, Flavio Pompermaier <pomperma...@okkam.it>
> >> wrote:
> >> > I've run the job once more (always using the checksum branch) and this
> >> > time I got:
> >> >
> >> > Caused by: java.lang.ArrayIndexOutOfBoundsException: 1953786112
> >> >     at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:83)
> >> >     at org.apache.flink.api.common.typeutils.base.EnumSerializer.deserialize(EnumSerializer.java:32)
> >> >     at org.apache.flink.api.java.typeutils.runtime.PojoSerializer.deserialize(PojoSerializer.java:431)
> >> >     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:135)
> >> >     at org.apache.flink.api.java.typeutils.runtime.TupleSerializer.deserialize(TupleSerializer.java:30)
> >> >     at org.apache.flink.runtime.io.disk.ChannelReaderInputViewIterator.next(ChannelReaderInputViewIterator.java:100)
> >> >     at org.apache.flink.runtime.operators.sort.MergeIterator$HeadStream.nextHead(MergeIterator.java:161)
> >> >     at org.apache.flink.runtime.operators.sort.MergeIterator.next(MergeIterator.java:113)
> >> >     at org.apache.flink.runtime.operators.util.metrics.CountingMutableObjectIterator.next(CountingMutableObjectIterator.java:45)
> >> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.advanceToNext(NonReusingKeyGroupedIterator.java:130)
> >> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator.access$300(NonReusingKeyGroupedIterator.java:32)
> >> >     at org.apache.flink.runtime.util.NonReusingKeyGroupedIterator$ValuesIterator.next(NonReusingKeyGroupedIterator.java:192)
> >> >     at org.okkam.entitons.mapping.flink.IndexMappingExecutor$TupleToEntitonJsonNode.reduce(IndexMappingExecutor.java:64)
> >> >     at org.apache.flink.runtime.operators.GroupReduceDriver.run(GroupReduceDriver.java:131)
> >> >     at org.apache.flink.runtime.operators.BatchTask.run(BatchTask.java:486)
> >> >     at org.apache.flink.runtime.operators.BatchTask.invoke(BatchTask.java:351)
> >> >     at org.apache.flink.runtime.taskmanager.Task.run(Task.java:585)
> >> >     at java.lang.Thread.run(Thread.java:745)
> >> >
> >> > On Thu, Oct 6, 2016 at 11:00 AM, Ufuk Celebi <u...@apache.org> wrote:
> >> >>
> >> >> Yes, if that's the case you should go with option (2) and run with
> >> >> the checksums, I think.
> >> >>
> >> >> On Thu, Oct 6, 2016 at 10:32 AM, Flavio Pompermaier
> >> >> <pomperma...@okkam.it> wrote:
> >> >> > The problem is that the data is very large and usually cannot be
> >> >> > processed on a single machine :(
> >> >> >
> >> >> > On Thu, Oct 6, 2016 at 10:11 AM, Ufuk Celebi <u...@apache.org> wrote:
> >> >> >>
> >> >> >> On Wed, Oct 5, 2016 at 7:08 PM, Tarandeep Singh
> >> >> >> <tarand...@gmail.com> wrote:
> >> >> >> > @Stephan my flink cluster setup: 5 nodes, each running 1
> >> >> >> > TaskManager. Slots per task manager: 2-4 (I tried varying this
> >> >> >> > to see if this has any impact).
> >> >> >> > Network buffers: 5k - 20k (tried different values for it).
> >> >> >>
> >> >> >> Could you run the job first on a single task manager to see if the
> >> >> >> error occurs even if no network shuffle is involved? That should be
> >> >> >> less overhead for you than running the custom build (which might be
> >> >> >> buggy ;)).
> >> >> >>
> >> >> >> – Ufuk