Chris,

Are you referring to TEZ-2237, or is there some additional work that isn't reported yet? 2237 isn't related to Broadcast edges (not just yet anyway); I don't think memory is causing issues either - increasing it is an optimization.
If you're running into additional issues outside of TEZ-2237, can you please open another jira or post details.

Thanks
- Sid

On Fri, Apr 3, 2015 at 12:52 PM, Chris K Wensel <[email protected]> wrote:

> a quick update.
>
> we have been working to identify red herrings throughout the logs (one of which is the exception in the subject).
>
> outside of those, we have noticed trouble around a vertex broadcasting to two vertices. here is the edge definition (remember, there are two edges from the source vertex)
>
> edgeValues.outputClassName = UnorderedKVOutput.class.getName();
> edgeValues.inputClassName = UnorderedKVInput.class.getName();
>
> edgeValues.movementType = EdgeProperty.DataMovementType.BROADCAST;
> edgeValues.sourceType = EdgeProperty.DataSourceType.PERSISTED;
> edgeValues.schedulingType = EdgeProperty.SchedulingType.SEQUENTIAL;
>
> I don’t have the full logs (Cyrille may be able to follow up), but it seems the vertices receiving the broadcast are the ones having trouble.
>
> they are also HashJoins, so memory concerns are being looked at (the logs seem to be shouting something about that).
>
> but I wanted to double check whether broadcasting to two vertices from a single vertex has known issues.
>
> that said, I’m trying to see why these plans are being created and whether Cascading can prevent/minimize/not-aggravate this issue. from a quick look, in this context, I think there is some redundancy sneaking in that needs to be addressed.
>
> ckw
>
> On Mar 26, 2015, at 3:17 AM, Cyrille Chépélov <[email protected]> wrote:
>
> Hi,
>
> I'm the original victim :) just sent up TEZ-2237.
>
> Sent as much of the logs as was practical up to this point; I can supply on a direct basis as much as required to nail the issue.
>
> To give some context: these two failing DAGs are part of a meta-DAG comprising 20 distinct DAGs, all generated through scalding-cascading (in Cascading terms, there is one Cascade with 20 Jobs; when the same Cascade is run with the traditional "hadoop" fabric instead of the experimental TEZ backend, this results in 460 separate MR jobs).
>
> While the 20-legged meta-DAG monster hasn't ever completed under TEZ yet, the progress made in the last few weeks is very encouraging, hinting at very significant speedups compared to MR; we definitely want to help get to the point where we can compare the outputs.
>
> -- Cyrille
>
> -------- Forwarded message --------
>
> Reply-To: [email protected]
> Subject: Re: BufferTooSmallException
> From: Hitesh Shah <[email protected]>
> Date: March 23, 2015 at 1:11:45 PM PDT
> To: [email protected]
>
> Hi Chris,
>
> I don’t believe this issue has been seen before. Could you file a jira for this with the full application logs (obtained via bin/yarn logs -application) and the configuration used?
>
> thanks
> — Hitesh
>
> On Mar 23, 2015, at 1:01 PM, Chris K Wensel <[email protected]> wrote:
>
> Hey all
>
> We have a user running Scalding, on Cascading 3, on Tez. This exception tends to crop up for DAGs that hang indefinitely (this DAG has 140 vertices).
>
> It looks like the flag exception BufferTooSmallException isn’t being caught and forcing the buffer to reset. Nor is the exception, when passed up to the thread, causing the Node/DAG to fail.
>
> Or is this a misinterpretation?
>
> ckw
>
> 2015-03-23 11:32:40,445 INFO [TezChild] writers.UnorderedPartitionedKVWriter: Moving to next buffer and triggering spill
> 2015-03-23 11:32:40,496 INFO [UnorderedOutSpiller [E61683F3D94D46C2998CDC61CD112750]] writers.UnorderedPartitionedKVWriter: Finished spill 1
> 2015-03-23 11:32:40,496 INFO [UnorderedOutSpiller [E61683F3D94D46C2998CDC61CD112750]] writers.UnorderedPartitionedKVWriter: Spill# 1 complete.
> 2015-03-23 11:32:41,185 ERROR [TezChild] hadoop.TupleSerialization$SerializationElementWriter: failed serializing token: 181 with classname: scala.Tuple2
>
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$BufferTooSmallException
>     at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$ByteArrayOutputStream.write(UnorderedPartitionedKVWriter.java:651)
>     at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$ByteArrayOutputStream.write(UnorderedPartitionedKVWriter.java:646)
>     at java.io.DataOutputStream.write(DataOutputStream.java:88)
>     at java.io.DataOutputStream.writeInt(DataOutputStream.java:198)
>     at com.twitter.chill.hadoop.KryoSerializer.serialize(KryoSerializer.java:50)
>     at cascading.tuple.hadoop.TupleSerialization$SerializationElementWriter.write(TupleSerialization.java:705)
>     at cascading.tuple.io.TupleOutputStream.writeElement(TupleOutputStream.java:114)
>     at cascading.tuple.io.TupleOutputStream.write(TupleOutputStream.java:89)
>     at cascading.tuple.io.TupleOutputStream.writeTuple(TupleOutputStream.java:64)
>     at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:37)
>     at cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:28)
>     at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:212)
>     at org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:194)
>     at cascading.flow.tez.stream.element.OldOutputCollector.collect(OldOutputCollector.java:51)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     […]
>
> —
> Chris K Wensel
> [email protected]
>
> —
> Chris K Wensel
> [email protected]
>
> —
> Chris K Wensel
> [email protected]
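For anyone who wants to reproduce the shape of the DAG quoted above outside of Cascading, here is a minimal sketch of how a single vertex broadcasting over two PERSISTED/SEQUENTIAL edges to two consumers could be wired up directly against the Tez Java API. The vertex names, processor descriptors and parallelism values are hypothetical placeholders, and the input/output descriptors carry no configuration payloads, which real code would normally supply (for example via the UnorderedKVEdgeConfig helper in tez-runtime-library):

import org.apache.tez.dag.api.DAG;
import org.apache.tez.dag.api.Edge;
import org.apache.tez.dag.api.EdgeProperty;
import org.apache.tez.dag.api.EdgeProperty.DataMovementType;
import org.apache.tez.dag.api.EdgeProperty.DataSourceType;
import org.apache.tez.dag.api.EdgeProperty.SchedulingType;
import org.apache.tez.dag.api.InputDescriptor;
import org.apache.tez.dag.api.OutputDescriptor;
import org.apache.tez.dag.api.ProcessorDescriptor;
import org.apache.tez.dag.api.Vertex;
import org.apache.tez.runtime.library.input.UnorderedKVInput;
import org.apache.tez.runtime.library.output.UnorderedKVOutput;

public class BroadcastEdgeSketch {

  // One BROADCAST edge from `source` to `sink`, mirroring the edgeValues
  // fields quoted above (UnorderedKVOutput -> UnorderedKVInput,
  // BROADCAST / PERSISTED / SEQUENTIAL). No payloads are set here.
  static Edge broadcastEdge(Vertex source, Vertex sink) {
    OutputDescriptor output = OutputDescriptor.create(UnorderedKVOutput.class.getName());
    InputDescriptor input = InputDescriptor.create(UnorderedKVInput.class.getName());
    EdgeProperty prop = EdgeProperty.create(
        DataMovementType.BROADCAST,
        DataSourceType.PERSISTED,
        SchedulingType.SEQUENTIAL,
        output,
        input);
    return Edge.create(source, sink, prop);
  }

  // Hypothetical three-vertex DAG: one source broadcasting to two
  // downstream (HashJoin-like) vertices, i.e. two edges out of one vertex.
  static DAG buildDag(ProcessorDescriptor srcProc,
                      ProcessorDescriptor joinProcA,
                      ProcessorDescriptor joinProcB) {
    Vertex source = Vertex.create("source", srcProc, 1);
    Vertex hashJoinA = Vertex.create("hashJoinA", joinProcA, 10);
    Vertex hashJoinB = Vertex.create("hashJoinB", joinProcB, 10);

    return DAG.create("broadcast-to-two-vertices")
        .addVertex(source)
        .addVertex(hashJoinA)
        .addVertex(hashJoinB)
        .addEdge(broadcastEdge(source, hashJoinA))
        .addEdge(broadcastEdge(source, hashJoinB));
  }
}

In practice Cascading builds these EdgeProperty objects itself from the edgeValues shown in the quoted mail; the sketch only restates that structure in plain Tez terms so the two-broadcast-edges-from-one-vertex shape is easy to see and, if needed, to reproduce in a standalone test.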
