>>> In the WordCount example, while creating the Tokenizer Vertex, neither the 
>>> parallelism nor the VertexLocation hints are specified. My guess is that at 
>>> runtime, based on InputInitializer, these values are populated.
Correct, the parallelism and VertexLocation are specified at runtime by the 
InputInitializer.
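
A minimal sketch of why the exception in this thread appears, assuming the check behaves the way its error message says. This is a simplified model, not the Tez source: the class name, method name, and parameters are hypothetical, and only the message text is taken from the stack trace.

```java
// Simplified, hypothetical model of the precondition reported in the stack
// trace. It is NOT Tez code; every name here is illustrative only.
final class VertexParallelismCheck {

    /** Sentinel meaning "let the InputInitializer decide the task count". */
    static final int UNSET = -1;

    /**
     * Returns the parallelism the vertex ends up with.
     *
     * @param declared        parallelism given when the vertex was created
     * @param fromInitializer task count supplied by the initializer, or null
     *                        if the initializer does not set parallelism
     */
    static int resolveParallelism(String vertexName, int declared, Integer fromInitializer) {
        if (fromInitializer == null) {
            // Nothing to reconcile: the declared value stands.
            return declared;
        }
        if (declared != UNSET) {
            // Both sides tried to set the task count, which is rejected.
            throw new IllegalStateException(
                "Parallelism for the vertex should be set to -1 if the InputInitializer"
                + " is setting parallelism, VertexName: " + vertexName);
        }
        return fromInitializer;
    }

    public static void main(String[] args) {
        // Declared as -1, so the initializer's value is accepted.
        System.out.println(resolveParallelism("Tokenizer", UNSET, 8));
        try {
            // Declared as 4 while the initializer also sets a value: rejected.
            resolveParallelism("Tokenizer", 4, 8);
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

In this model, a fixed parallelism is only accepted when no initializer sets one, which is why a data source whose initializer computes the splits conflicts with a parallelism chosen at DAG creation.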

>>> What should I do so that the task locations for the Tokenizer vertex 
>>> are not based on HDFS splits but can be configured arbitrarily at 
>>> creation time?
Do you mean that your input is not an HDFS file? In that case I think you need 
to create your own DataSourceDescriptor. If possible, tell us more about your 
context: what kind of data is your input, and how would you specify the 
VertexLocation for it? For reference, the WordCount example builds its 
DataSourceDescriptor as follows:


    DataSourceDescriptor dataSource = MRInput.createConfigBuilder(
            new Configuration(tezConf), TextInputFormat.class, inputPath)
        .groupSplits(!isDisableSplitGrouping())
        .build();



Best Regards,
Jeff Zhang


From: Raajay <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, September 10, 2015 at 1:10 PM
To: "[email protected]" <[email protected]>
Subject: Re: Error of setting vertex location hints

I am just getting started with understanding the Tez code, so bear with me; I 
might be wrong here.

In the WordCount example, while creating the Tokenizer Vertex, neither the 
parallelism nor the VertexLocation hints are specified. My guess is that at 
runtime, based on InputInitializer, these values are populated.

However, I do not want them to be populated at runtime, but rather want them 
specified while creating the DAG itself. When I do that, I get the exception 
mentioned in the previous mail.

What should I do so that the task locations for the Tokenizer vertex are not 
based on HDFS splits but can be configured arbitrarily at creation time?

Raajay



On Thu, Sep 10, 2015 at 12:01 AM, Jianfeng (Jeff) Zhang 
<[email protected]> wrote:

Actually, the Tokenizer vertex should already get its VertexLocationHints from 
the HDFS file split info at runtime. Did you see any unexpected behavior?



Best Regards,
Jeff Zhang


From: Raajay <[email protected]>
Reply-To: "[email protected]" <[email protected]>
Date: Thursday, September 10, 2015 at 12:35 PM
To: "[email protected]" <[email protected]>
Subject: Error of setting vertex location hints

In the WordCount example, I am trying to fix the location of map tasks by 
providing "VertexLocationHints" to the "tokenizer" vertex.

However, the application fails with an exception (stack trace below). I guess 
this is because the vertex manager expects the parallelism to be -1, so that 
it can compute it.


What minimal modification to the example would avoid invoking the VertexManager 
and allow me to use my own customized VertexLocationHint?


Thanks
Raajay



DAG diagnostics: [Vertex failed, vertexName=Tokenizer, 
vertexId=vertex_1441839249749_0017_1_00, diagnostics=[Vertex 
vertex_1441839249749_0017_1_00 [Tokenizer] killed/failed due 
to:AM_USERCODE_FAILURE, Exception in VertexManager, 
vertex:vertex_1441839249749_0017_1_00 [Tokenizer], 
java.lang.IllegalStateException: Parallelism for the vertex should be set to -1 
if the InputInitializer is setting parallelism, VertexName: Tokenizer
        at com.google.common.base.Preconditions.checkState(Preconditions.java:145)
        at org.apache.tez.dag.app.dag.impl.RootInputVertexManager.onRootVertexInitialized(RootInputVertexManager.java:60)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEventRootInputInitialized.invoke(VertexManager.java:610)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:631)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent$1.run(VertexManager.java:626)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:626)
        at org.apache.tez.dag.app.dag.impl.VertexManager$VertexManagerEvent.call(VertexManager.java:615)
        at java.util.concurrent.FutureTask.run(FutureTask.java:262)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
], Vertex killed, vertexName=Summation, 
vertexId=vertex_1441839249749_0017_1_01, diagnostics=[Vertex received Kill in 
INITED state., Vertex vertex_1441839249749_0017_1_01 [Summation] killed/failed 
due to:null], DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 
killedVertices:1]
DAG did not succeed

