Hi there,

The current documentation says:

> By default, data is not compressed. You can compress your data by using the
> deflate (gzip) algorithm with the -z or --compress argument, or specify any
> Hadoop compression codec using the --compression-codec argument. This applies
> to both SequenceFiles or text files.

But I think this is a bit misleading.
Currently, if output compression is enabled in a cluster, the Sqooped data is always compressed, regardless of the setting of this flag.

It seems better to make compression actually controllable via --compress, which means changing ImportJobBase.configureOutputFormat():

    if (options.shouldUseCompression()) {
      FileOutputFormat.setCompressOutput(job, true);
      FileOutputFormat.setOutputCompressorClass(job, GzipCodec.class);
      SequenceFileOutputFormat.setOutputCompressionType(job, CompressionType.BLOCK);
    }
    // new stuff: explicitly turn compression off when --compress is not given
    else {
      FileOutputFormat.setCompressOutput(job, false);
    }

Thoughts?

-- Ken

--------------------------
Ken Krugler
+1 530-210-6378
http://bixolabs.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr