That did it. Thanks Josh

On Mon, Jun 22, 2015 at 3:59 PM Josh Wills <[email protected]> wrote:
> The InputSplit on the MapContext implements the InputSupplier interface,
> which allows you to get the underlying FileSplit that the map task is
> processing. So you have to do a bunch of casting, but you can get at it.
>
> On Monday, June 22, 2015, David Ortiz <[email protected]> wrote:
>
>> Gave it a shot in the following MapFn, but it seems to always return null.
>>
>> new MapFn<String, Pair<String, String>>() {
>>
>>     private static final long serialVersionUID = 1L;
>>     int min = minColumns;
>>     int max = maxColumns;
>>
>>     @Override
>>     public Pair<String, String> map(String input) {
>>         //int columns = StringUtils.countMatches(input, "\t") + 1;
>>         int columns = input.split("\t").length;
>>         if (columns >= min && columns <= max) {
>>             StringBuilder output = new StringBuilder(input);
>>             output.append('\t');
>>             String loc = this.getContext().getConfiguration()
>>                     .get(TaskInputOutputContext.MAP_INPUT_FILE);
>>             output.append(loc);
>>             return new Pair<>(output.toString(), null);
>>         } else {
>>             return new Pair<>(null, input);
>>         }
>>     }
>> }
>>
>> Also tried setting crunch.disable.combine.file to true, figuring that
>> combining files might mess with it. No dice. Does anything look suspect
>> in that snippet?
>>
>> Thanks,
>>
>> Dave
>>
>> On Mon, Jun 22, 2015 at 2:41 PM Micah Whitacre <[email protected]> wrote:
>>
>>> The DoFn should give you access to the TaskInputOutputContext [1], which
>>> should contain that information. I believe the context then should hold
>>> the file as a config value like "MAP_INPUT_FILE". I haven't really tested
>>> this out, so definitely verify.
>>>
>>> [1] -
>>> https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapreduce/TaskInputOutputContext.html
>>>
>>> On Mon, Jun 22, 2015 at 1:28 PM, David Ortiz <[email protected]> wrote:
>>>
>>>> Hello,
>>>>
>>>> Is there a way in my crunch pipeline that I can retrieve the file name
>>>> of the input file for my MapFn? This function is definitely applied as
>>>> a Mapper, so I think it should be possible; I'm just having some
>>>> difficulty working through the exact method of doing so.
>>>>
>>>> Thanks,
>>>> Dave
>>>>
>>>
>>
>
> --
> Director of Data Science
> Cloudera <http://www.cloudera.com>
> Twitter: @josh_wills <http://twitter.com/josh_wills>
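The casting approach Josh describes might look roughly like the sketch below. This is an illustration only, not tested against a specific Crunch version: the class name `FileNameMapFn` is invented, and Crunch may wrap the split (for example when combine-file input is enabled), in which case the cast to `FileSplit` fails and the fallback value is returned.

```java
import org.apache.crunch.MapFn;
import org.apache.crunch.Pair;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.MapContext;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Hypothetical MapFn that tags each record with the name of the file
// it came from, by casting down from the task context to the split.
public class FileNameMapFn extends MapFn<String, Pair<String, String>> {

    private static final long serialVersionUID = 1L;

    @Override
    public Pair<String, String> map(String input) {
        String fileName = "unknown";
        // getContext() returns a TaskInputOutputContext; when the DoFn is
        // running as a Mapper it should actually be a MapContext, which
        // exposes the InputSplit for the current task.
        if (getContext() instanceof MapContext) {
            InputSplit split = ((MapContext<?, ?, ?, ?>) getContext()).getInputSplit();
            // Only a plain FileSplit carries a path directly; a wrapped or
            // combined split would need further unwrapping (version-specific).
            if (split instanceof FileSplit) {
                fileName = ((FileSplit) split).getPath().getName();
            }
        }
        return Pair.of(input, fileName);
    }
}
```

Note that this only works while the function executes inside an actual map task; if Crunch runs the fn in a reduce phase or in-memory, the context will not be a MapContext and the fallback applies.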
