Chris,
Are you referring to TEZ-2237, or is there some additional work that isn't
reported yet?
TEZ-2237 isn't related to broadcast edges (not just yet, anyway), and I don't
think memory is causing the issues either; increasing it is an optimization.
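If you do want to experiment with the buffer size anyway, the relevant knob
is the unordered output buffer. A rough sketch (the value is only an example;
tune it to your workload):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.tez.runtime.library.api.TezRuntimeConfiguration;

  Configuration conf = new Configuration();
  // "tez.runtime.unordered.output.buffer.size-mb": records that don't fit
  // in this buffer hit the overflow path in the unordered writer
  conf.setInt(TezRuntimeConfiguration.TEZ_RUNTIME_UNORDERED_OUTPUT_BUFFER_SIZE_MB, 200);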

If you're running into additional issues outside of TEZ-2237, could you
please open another JIRA or post the details?

Thanks
- Sid

On Fri, Apr 3, 2015 at 12:52 PM, Chris K Wensel <[email protected]> wrote:

> a quick update.
>
> we have been working to identify red herrings throughout the logs (one of
> which is the exception in the subject).
>
> outside of those, we have noticed trouble around a vertex broadcasting to
> two vertices. here is the edge definition (remember, there are two edges
> from the source vertex):
>
> edgeValues.outputClassName = UnorderedKVOutput.class.getName();
> edgeValues.inputClassName = UnorderedKVInput.class.getName();
>
> edgeValues.movementType = EdgeProperty.DataMovementType.BROADCAST;
> edgeValues.sourceType = EdgeProperty.DataSourceType.PERSISTED;
> edgeValues.schedulingType = EdgeProperty.SchedulingType.SEQUENTIAL;
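>
> (for reference, here's roughly how those values become an actual Tez edge;
> a sketch assuming the EdgeProperty.create factory, with made-up vertex
> names:)
>
> EdgeProperty edgeProperty = EdgeProperty.create(
>     EdgeProperty.DataMovementType.BROADCAST,
>     EdgeProperty.DataSourceType.PERSISTED,
>     EdgeProperty.SchedulingType.SEQUENTIAL,
>     OutputDescriptor.create(UnorderedKVOutput.class.getName()),
>     InputDescriptor.create(UnorderedKVInput.class.getName()));
> // same property instance on both edges out of the source vertex
> dag.addEdge(Edge.create(sourceVertex, leftConsumer, edgeProperty));
> dag.addEdge(Edge.create(sourceVertex, rightConsumer, edgeProperty));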
>
> I don’t have the full logs (Cyrille may be able to follow up), but it
> seems the vertices receiving the broadcast are the ones having trouble.
>
> they are also HashJoins, so memory concerns are being looked at (the logs
> seem to be shouting something about that).
>
> but I wanted to double-check whether broadcasting to two vertices from a
> single vertex has any known issues.
>
> that said, I’m trying to see why these plans are being created and whether
> Cascading can prevent/minimize/not-aggravate this issue. from a quick look,
> in this context, I think there is some redundancy sneaking in that needs to
> be addressed.
>
> ckw
>
> On Mar 26, 2015, at 3:17 AM, Cyrille Chépélov <[email protected]>
> wrote:
>
>  Hi,
>
> I'm the original victim :) I just filed TEZ-2237.
>
> I sent as many logs as was practical up to this point; I can supply as
> much as needed, on a direct basis, to nail down the issue.
>
> To give some context: these two failing DAGs are part of a meta-DAG
> comprising 20 distinct DAGs, all generated through Scalding/Cascading (in
> Cascading terms, there is one Cascade with 20 Jobs; when the same Cascade
> is run with the traditional "hadoop" fabric instead of the experimental
> Tez backend, it results in 460 separate MR jobs).
>
> While the 20-legged meta-DAG monster hasn't ever completed under Tez yet,
> the progress made in the last few weeks is very encouraging, hinting at
> very significant speedups compared to MR; we definitely want to help get
> to the point where we can compare the outputs.
>
>     -- Cyrille
>
> -------- Forwarded message --------
>
>
> Reply-To: [email protected]
> Subject: Re: BufferTooSmallException
> From: Hitesh Shah <[email protected]>
> Date: March 23, 2015 at 1:11:45 PM PDT
> To: [email protected]
>
> Hi Chris,
>
> I don’t believe this issue has been seen before. Could you file a JIRA for
> this with the full application logs (obtained via bin/yarn logs
> -applicationId <application ID>) and the configuration used?
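>
> For example (the application ID below is just a placeholder):
>
>   bin/yarn logs -applicationId application_1427000000000_0001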
>
> thanks
> — Hitesh
>
> On Mar 23, 2015, at 1:01 PM, Chris K Wensel <[email protected]> wrote:
>
> Hey all
>
> We have a user running Scalding, on Cascading3, on Tez. This exception
> tends to crop up for DAGs that hang indefinitely (this DAG has 140
> vertices).
>
> It looks like the flag exception BufferTooSmallException isn’t being
> caught to force a buffer reset; nor is the exception, when it propagates
> up to the thread, causing the node/DAG to fail.
>
> Or is this a misinterpretation?
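>
> (what I’d expect, sketched; this is not the actual writer code, and
> writeRecord/spillAndResetBuffer are made-up names:)
>
> try {
>   writeRecord(key, value);   // serialize into the current in-memory buffer
> } catch (BufferTooSmallException e) {
>   spillAndResetBuffer();     // flush the full buffer to disk
>   writeRecord(key, value);   // retry; if it still doesn't fit, fail the task
> }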
>
> ckw
>
>
> 2015-03-23 11:32:40,445 INFO [TezChild]
> writers.UnorderedPartitionedKVWriter: Moving to next buffer and triggering
> spill
> 2015-03-23 11:32:40,496 INFO [UnorderedOutSpiller
> [E61683F3D94D46C2998CDC61CD112750]] writers.UnorderedPartitionedKVWriter:
> Finished spill 1
> 2015-03-23 11:32:40,496 INFO [UnorderedOutSpiller
> [E61683F3D94D46C2998CDC61CD112750]] writers.UnorderedPartitionedKVWriter:
> Spill# 1 complete.
> 2015-03-23 11:32:41,185 ERROR [TezChild]
> hadoop.TupleSerialization$SerializationElementWriter: failed serializing
> token: 181 with classname: scala.Tuple2
>
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$BufferTooSmallException
>        at
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$ByteArrayOutputStream.write(UnorderedPartitionedKVWriter.java:651)
>        at
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter$ByteArrayOutputStream.write(UnorderedPartitionedKVWriter.java:646)
>        at java.io.DataOutputStream.write(DataOutputStream.java:88)
>        at java.io.DataOutputStream.writeInt(DataOutputStream.java:198)
>        at
> com.twitter.chill.hadoop.KryoSerializer.serialize(KryoSerializer.java:50)
>        at
> cascading.tuple.hadoop.TupleSerialization$SerializationElementWriter.write(TupleSerialization.java:705)
>        at
> cascading.tuple.io.TupleOutputStream.writeElement(TupleOutputStream.java:114)
>        at
> cascading.tuple.io.TupleOutputStream.write(TupleOutputStream.java:89)
>        at
> cascading.tuple.io.TupleOutputStream.writeTuple(TupleOutputStream.java:64)
>        at
> cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:37)
>        at
> cascading.tuple.hadoop.io.TupleSerializer.serialize(TupleSerializer.java:28)
>        at
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:212)
>        at
> org.apache.tez.runtime.library.common.writers.UnorderedPartitionedKVWriter.write(UnorderedPartitionedKVWriter.java:194)
>        at
> cascading.flow.tez.stream.element.OldOutputCollector.collect(OldOutputCollector.java:51)
>        at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>        at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> […]
>
> —
> Chris K Wensel
> [email protected]
