Logged CRUNCH-199. https://issues.apache.org/jira/browse/CRUNCH-199

On Fri, Apr 26, 2013 at 3:21 PM, Micah Whitacre <[email protected]> wrote:

> >> Can you gist up a patch and/or post it to a JIRA so we can take a look?
>
> I'll work on cleaning up my code a bit and attach it to a JIRA.
>
>
> On Fri, Apr 26, 2013 at 3:19 PM, Josh Wills <[email protected]> wrote:
>
>> Can you gist up a patch and/or post it to a JIRA so we can take a look?
>>
>>
>> On Fri, Apr 26, 2013 at 12:30 PM, Micah Whitacre <[email protected]> wrote:
>>
>>> As mentioned, I'm currently trying out adding Avro Trevni support to
>>> Crunch. I think I've gotten everything working, except that my
>>> output is not being copied to the correct directory upon completion.
>>>
>>> I'm extending FileTargetImpl and have the following in my
>>> implementation:
>>>
>>>   @Override
>>>   public void configureForMapReduce(Job job, PType<?> ptype, Path outputPath,
>>>       String name) {
>>>     .....
>>>     configureForMapReduce(job, AvroKey.class, NullWritable.class,
>>>         AvroTrevniKeyOutputFormat.class, outputPath, name);
>>>
>>>     // AvroTrevniKeyOutputFormat writes its content directly to the job's
>>>     // output path, so reset that path to the named subdirectory.
>>>     if (name != null) {
>>>       FileOutputFormat.setOutputPath(job, new Path(outputPath, name));
>>>     }
>>>   }
>>>
>>> This produces the following in the Crunch tmp directory:
>>>
>>> $ pwd
>>> /var/folders/0f/l_2w0gxd0p15k9410b18j8q40000gp/T/junit6467712912178902519/tmp-crunch.tmp.dir/crunch-1902403831/p1/output/out0
>>> $ ls
>>> _SUCCESS part-m-00000
>>> $ cd part-m-00000/
>>> $ ls -l
>>> total 8
>>> -rwxrwxrwx 1 mw010351 staff 493 Apr 26 13:52 part-0.trv
>>> -rw-r--r-- 1 mw010351 staff   0 Apr 26 13:52 part-m-00000
>>>
>>> The part-0.trv file is the one of most interest. Ideally I'd also be able
>>> to avoid the extra part-m-00000 directory, but I think that is Trevni
>>> configuration I can work on separately.
>>>
>>> Unfortunately, the directories in the Crunch tmp dir aren't getting copied
>>> to the expected output directory, because the completion hook in
>>> CrunchJobHooks expects folders of the form "out#-*", and the directory that
>>> is getting created has no "-" and doesn't follow the same form as the others
>>> ("out0-m-00000"). Am I missing some configuration in my target that would
>>> cause the directory to be created that way? Or should the pattern for
>>> finding directories to copy be relaxed so it no longer requires the final "-"?
>>>
>>> Thoughts?
>>>
>>
>>
>>
>> --
>> Director of Data Science
>> Cloudera <http://www.cloudera.com>
>> Twitter: @josh_wills <http://twitter.com/josh_wills>
>>
>
>
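A small, self-contained sketch of the naming mismatch described in the original message, in Java since that is the language of the thread. It is not Crunch code: java.nio's glob `PathMatcher` stands in for Hadoop's `FileSystem.globStatus`, on the assumption that both treat `*` as any run of characters within a single path segment. The names `out0` and `out0-m-00000` come from the message; `out1-m-00000` and `out10-m-00000` are made-up names used to show a side effect of simply dropping the `-`, and `exactOrDashed` is a hypothetical alternative filter, not an existing Crunch API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;

public class OutputDirMatch {

    // Glob of the shape described in the thread: "out" + output index + "-*".
    public static PathMatcher strictGlob(int index) {
        return FileSystems.getDefault().getPathMatcher("glob:out" + index + "-*");
    }

    // Naive relaxation: the same glob with the trailing "-" dropped.
    public static PathMatcher relaxedGlob(int index) {
        return FileSystems.getDefault().getPathMatcher("glob:out" + index + "*");
    }

    // Hypothetical stricter alternative: accept exactly "out<index>" or "out<index>-...".
    public static boolean exactOrDashed(int index, String name) {
        String prefix = "out" + index;
        return name.equals(prefix) || name.startsWith(prefix + "-");
    }

    // Keep only the names the glob matcher accepts, preserving input order.
    public static List<String> matching(PathMatcher matcher, List<String> names) {
        return names.stream()
                .filter(n -> matcher.matches(Paths.get(n)))
                .collect(Collectors.toList());
    }

    // Same filtering, using the exact-or-dashed rule instead of a glob.
    public static List<String> matchingExactOrDashed(int index, List<String> names) {
        return names.stream()
                .filter(n -> exactOrDashed(index, n))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // "out0" is the directory the Trevni output format produced;
        // the "-m-" names follow the usual form (out1/out10 are illustrative).
        List<String> names = List.of("out0", "out0-m-00000", "out1-m-00000", "out10-m-00000");

        System.out.println(matching(strictGlob(0), names));  // [out0-m-00000]
        System.out.println(matching(relaxedGlob(1), names)); // [out1-m-00000, out10-m-00000]
        System.out.println(matchingExactOrDashed(0, names)); // [out0, out0-m-00000]
        System.out.println(matchingExactOrDashed(1, names)); // [out1-m-00000]
    }
}
```

Dropping the trailing `-` does let the copy pick up `out0`, but under the same naming scheme `out1*` would also claim `out10-m-00000` once a job has eleven or more outputs, so a relaxed pattern probably needs to anchor on either the exact name or the dash.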
