If I understand this right, there is a jar with user code in it. The jar needs to be available during split creation but it is not available.
Is split creation happening on the client or on the AM? If it's happening on the AM, and the AM is not getting the jars, then how are you specifying the jars to be sent to the AM? There are different ways to do it:

1) Set tez.aux.uris in tez-site.xml to an HDFS location and copy the user jars there.
2) Upload the user jar to HDFS and create a YARN local resource for it. Then use either of the following to add the local resource to the AM/DAG that needs it:
   a. TezClient#addAppMasterLocalFiles(…)
   b. DAG#addTaskLocalFiles(…)

Not sure what is meant by classic Hadoop style jars?

Bikas

From: Chris K Wensel [mailto:[email protected]]
Sent: Wednesday, June 17, 2015 4:41 PM
To: [email protected]
Cc: [email protected]
Subject: Re: ClassNotFoundException with custom InputFormat.

cross posting down to dev… should continue the discussion there I believe.

as I understand it, all Cascading users familiar with packaging a Hadoop job jar with a lib folder, in which the packaged custom InputFormat is placed (pulled from maven etc.), will have this issue. this also extends to projects on top of Cascading, including Scalding and Cascalog.

oddly, org.apache.tez.client.AMConfiguration has a private Map<String, String> env; field, but it is unused.

On Jun 17, 2015, at 4:32 PM, Andre Kelpe <[email protected]> wrote:

Hi,

we are currently running into a problem when a user of Cascading uses a custom InputFormat with Tez. The ApplicationMaster is running into a ClassNotFoundException when calculating the splits, since we are unable to control the environment/classpath visible to the ApplicationMaster. We have a work-around, where the users have to supply a fat jar to make it work, but we need to be able to support other ways as well.

When interacting with the DAG, we are able to pass along a custom environment/classpath, but that API is missing on the TezClient, causing the AppMaster to fail when the user is using classic Hadoop style jars (embedded lib directory).
In order to get Lingual, our SQL layer on top of Cascading, to work correctly, we need a way to supply the environment in a more dynamic way than one fat jar, so it would be great if the API could be extended to do that. I have opened https://issues.apache.org/jira/browse/TEZ-2563

Thanks!

- André

--
André Kelpe
[email protected]
http://concurrentinc.com

--
Chris K Wensel
[email protected]
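
For anyone following the second option in Bikas's reply with a classic Hadoop style job jar, the first step is knowing which jars sit in the embedded lib directory, since each of them would have to be uploaded and registered as a local resource. Below is a minimal sketch, using only java.util.jar, that lists those nested jars. The class name JobJarLibs and the method nestedLibJars are hypothetical names chosen for illustration, and the sketch assumes the nested jars live directly under lib/ in the job jar. Uploading to HDFS and the TezClient#addAppMasterLocalFiles(…) / DAG#addTaskLocalFiles(…) calls themselves are not shown.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, not part of Tez or Cascading.
public class JobJarLibs {

    /**
     * Returns the entry names of all jars found under lib/ inside the
     * given job jar, sorted alphabetically. Assumes the "classic" layout
     * where dependencies are packaged in an embedded lib directory.
     */
    public static List<String> nestedLibJars(String jobJarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jobJarPath)) {
            Enumeration<JarEntry> entries = jar.entries();
            while (entries.hasMoreElements()) {
                JarEntry entry = entries.nextElement();
                String name = entry.getName();
                // Keep only regular entries such as lib/foo.jar.
                if (!entry.isDirectory() && name.startsWith("lib/") && name.endsWith(".jar")) {
                    names.add(name);
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: JobJarLibs <path-to-job-jar>");
            return;
        }
        // Each printed entry is a candidate for upload as a local resource.
        for (String name : nestedLibJars(args[0])) {
            System.out.println(name);
        }
    }
}
```

Each name returned is a jar that the AM would need on its classpath during split calculation if the custom InputFormat or its dependencies are packaged there.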
