Can you gist up a patch and/or post it to a JIRA so we can take a look?
On Fri, Apr 26, 2013 at 12:30 PM, Micah Whitacre <[email protected]> wrote:

> So as mentioned I'm currently trying out adding Avro Trevni support to
> Crunch. I think I've gotten everything working with the exception that my
> output is not being copied to the correct directory upon completion.
>
> I'm extending the FileTargetImpl and have the following in my
> implementation:
>
>   @Override
>   public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath, String name) {
>     .....
>     configureForMapReduce(job, AvroKey.class, NullWritable.class,
>         AvroTrevniKeyOutputFormat.class, outputPath, name);
>
>     // AvroTrevniKeyOutputFormat uses this set value to write content
>     // directly to this path. Therefore resetting the value with the
>     // named value.
>     if (name != null) {
>       FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
>     }
>
> This produces the following in the crunch tmp directory:
>
>   $ pwd
>   /var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6467712912178902519/tmp-crunch.tmp.dir/crunch-1902403831/p1/output/out0
>   $ ls
>   _SUCCESS part-m-00000
>   $ cd part-m-00000/
>   $ ls -l
>   total 8
>   -rwxrwxrwx 1 mw010351 staff 493 Apr 26 13:52 part-0.trv
>   -rw-r--r-- 1 mw010351 staff 0 Apr 26 13:52 part-m-00000
>
> The part-0.trv is the file of most interest, and ideally I'd be able to
> avoid the extra part-m-00000 directory (but I can work on that
> configuration because it is inside of Trevni, I think).
>
> Unfortunately the directories from the crunch tmp dir aren't getting copied
> to the expected output directory because the CrunchJobHooks for completion
> expects folders to be of the form "out#-*", and the directory that is
> getting created does not have the "-" or take the form like others
> ("out0-m-00000"). Am I missing some configuration in my target that would
> cause the directory to be created like that? Or should the pattern for
> finding directories to copy be loosened to not require the final "-"?
>
> Thoughts?
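
To make the pattern question concrete, here is a minimal sketch of the mismatch described above. It assumes the completion hook filters names with something equivalent to the "out#-*" form quoted in the message. The class name, the helper method, and both regular expressions are hypothetical and written only for illustration; none of this is taken from CrunchJobHooks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class OutputDirPatternSketch {
    // Hypothetical stand-in for the "out#-*" convention: requires the "-".
    static final Pattern STRICT = Pattern.compile("out\\d+-.*");

    // Hypothetical loosened form: the "-suffix" part becomes optional,
    // so a bare "out0" is accepted as well.
    static final Pattern RELAXED = Pattern.compile("out\\d+(-.*)?");

    // Returns the names that match the given pattern in full.
    static List<String> matching(Pattern pattern, List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (pattern.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // "out0-m-00000" is the usual form; "out0" is what the Trevni
        // target produced in the listing above.
        List<String> names = Arrays.asList("out0-m-00000", "out0", "_SUCCESS");
        System.out.println("strict:  " + matching(STRICT, names));
        System.out.println("relaxed: " + matching(RELAXED, names));
    }
}
```

With these assumed patterns, the strict form picks up only "out0-m-00000", while the relaxed form also picks up "out0". Whether loosening the real pattern is safe depends on what else can land in that directory, so treat this as a way to frame the question, not as a proposed fix.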

-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
