I have a workflow that periodically executes a Pig script, concatenates the output into a single tab-separated value file, compresses the file with gzip, and then FTPs it to a remote server. I could do this pretty easily with a standard shell script, but I have been looking into whether this would be a good candidate for an Oozie workflow. I am new to Oozie and have spent the last few days learning how to write and debug Oozie workflows. I have run into a few issues and was wondering if anyone had some advice.
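For concreteness, here is a rough Python sketch of the steps that follow the Pig job: merge, compress, upload. It is only an illustration of what I am trying to automate, not a tested production script. The function names, paths, host, and credentials are all hypothetical placeholders, and it assumes the `hdfs` command-line client is on the PATH of whatever machine runs it.

```python
import gzip
import os
import shutil
import subprocess
from ftplib import FTP


def merge_from_hdfs(hdfs_dir, local_path):
    """Concatenate the part files under hdfs_dir into one local file.

    Assumes the `hdfs` CLI is installed and configured on this machine.
    """
    subprocess.run(["hdfs", "dfs", "-getmerge", hdfs_dir, local_path], check=True)


def gzip_file(src_path):
    """Write a gzip-compressed copy of src_path next to it and return its path."""
    dst_path = src_path + ".gz"
    with open(src_path, "rb") as src, gzip.open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return dst_path


def ftp_upload(local_path, host, user, password, remote_name):
    """Upload local_path to the FTP server in binary mode."""
    with FTP(host) as ftp:
        ftp.login(user=user, passwd=password)
        with open(local_path, "rb") as fh:
            ftp.storbinary("STOR " + remote_name, fh)


def export(hdfs_dir, local_path, host, user, password):
    """Run the three post-Pig steps in order (all arguments are placeholders)."""
    merge_from_hdfs(hdfs_dir, local_path)
    gz_path = gzip_file(local_path)
    ftp_upload(gz_path, host, user, password, os.path.basename(gz_path))
```

Something like `export("/user/shawn/pig_output", "/tmp/report.tsv", "ftp.example.com", "user", "secret")` would then be the whole job, which is why I am unsure whether wrapping it in Oozie buys me anything.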
Is there an easy way to concatenate output using Oozie? The Hadoop file system shell supports the getmerge command, but it appears this is not supported as an Oozie action. Would it make sense to execute this command using a shell or SSH action? Likewise, I would like to compress and FTP this output using shell or SSH actions.
I guess I have two basic questions. First, is there an easy way to do all of this in Oozie? Second, is this a good use case for Oozie? Reading through the Oozie use cases and working through the examples, this doesn't seem to be one of its primary use cases. Would this be better run as a standard cron job using a shell script? I would appreciate any experience or feedback you might have.
Thanks,
Shawn
