Cross-posting to dev; I believe we should continue the discussion there.

As I understand it, any Cascading user who packages a Hadoop job jar with a lib folder holding the custom InputFormat (pulled from Maven, etc.) will have this issue. This also extends to projects built on top of Cascading, including Scalding and Cascalog.

Oddly, org.apache.tez.client.AMConfiguration has a field, private Map<String, String> env;, but it is unused.

> On Jun 17, 2015, at 4:32 PM, Andre Kelpe <[email protected]> wrote:
>
> Hi,
>
> we are currently running into a problem when a user of Cascading uses a
> custom InputFormat with Tez. The ApplicationMaster is running into a
> ClassNotFoundException when calculating the splits, since we are unable to
> control the environment/classpath visible to the ApplicationMaster. We have
> a work-around, where the users have to supply a fat-jar to make it work, but
> we need to be able to support other ways as well.
>
> When interacting with the DAG, we are able to pass along a custom
> environment/classpath, but that API is missing on the TezClient, causing the
> AppMaster to fail when the user is using classic Hadoop style jars (embedded
> lib directory).
>
> In order to get Lingual, our SQL layer on top of Cascading, to work correctly,
> we need a way to supply the environment in a more dynamic way than one
> fatjar, so it would be great if the API could be extended to do that.
>
> I have opened https://issues.apache.org/jira/browse/TEZ-2563
>
> Thanks!
>
> - André
>
> --
> André Kelpe
> [email protected]
> http://concurrentinc.com

--
Chris K Wensel
[email protected]
