If I understand this right, there is a jar with user code in it. The jar needs 
to be available during split creation but it is not available.

Is split creation happening on the client or on the AM? If it's happening on 
the AM, and the AM is not getting the jars, then how are you specifying the 
jars to be sent to the AM? There are different ways to do it.

1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy user jars 
   there.

2) Upload the user jar to HDFS and create a YARN local resource for it. 
   Then use either of the following to add the local resource to the AM/DAG 
   that needs it:

   a. TezClient#addAppMasterLocalFiles(…)

   b. DAG#addTaskLocalFiles(…)
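
For option 1 above, a minimal sketch of the tez-site.xml entry might look like 
the following. The HDFS directory shown is a hypothetical placeholder, and the 
sketch assumes the user jars have already been copied to that location.

```xml
<!-- tez-site.xml fragment (sketch). The HDFS path is a hypothetical
     placeholder; point it at wherever the user jars were copied. -->
<property>
  <name>tez.aux.uris</name>
  <value>hdfs:///apps/tez/aux-jars</value>
</property>
```

This route needs no code changes, but it applies to every job that reads this 
tez-site.xml, so option 2 may be a better fit when jars differ per application.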

I'm not sure what is meant by classic Hadoop-style jars.

Bikas

From: Chris K Wensel [mailto:[email protected]]
Sent: Wednesday, June 17, 2015 4:41 PM
To: [email protected]
Cc: [email protected]
Subject: Re: ClassNotFoundException with custom InputFormat.

cross posting down to dev… should continue the discussion there I believe.

as I understand it, any Cascading user who packages a Hadoop job jar with a 
lib folder containing the custom InputFormat (pulled from Maven, etc.) will 
have this issue.

this also expands to projects on top of Cascading including Scalding and 
Cascalog.

oddly the org.apache.tez.client.AMConfiguration has a

private Map<String, String> env;

but it is unused.

On Jun 17, 2015, at 4:32 PM, Andre Kelpe <[email protected]> wrote:

Hi,
We are currently running into a problem when a user of Cascading uses a custom 
InputFormat with Tez. The ApplicationMaster runs into a 
ClassNotFoundException when calculating the splits, since we are unable to 
control the environment/classpath visible to the ApplicationMaster. We have a 
workaround, where the users have to supply a fat jar to make it work, but we 
need to be able to support other ways as well.

When interacting with the DAG, we are able to pass along a custom 
environment/classpath, but that API is missing on the TezClient, causing the 
AppMaster to fail when the user is using classic Hadoop-style jars (embedded 
lib directory).

In order to get Lingual, our SQL layer on top of Cascading, to work correctly, 
we need a way to supply the environment more dynamically than with a single 
fat jar, so it would be great if the API could be extended to do that.
I have opened https://issues.apache.org/jira/browse/TEZ-2563
Thanks!

- André

--
André Kelpe
[email protected]
http://concurrentinc.com

—
Chris K Wensel
[email protected]


