There is a bit of a difference between encryption and compression. You're better off using coprocessors to encrypt the data as it's being written than trying to encrypt the actual HFile.
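(For illustration only, a minimal sketch of what such a coprocessor could look like, assuming the HBase 0.94-era RegionObserver API. The class, the hard-coded AES key, and the cipher choice are all hypothetical; real key management and the matching read-side decryption hook are left out.)

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.List;
import java.util.Map;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/**
 * Sketch of a RegionObserver that encrypts cell values in prePut,
 * so plaintext never reaches the HFile writer. Key handling here is
 * deliberately naive; a real deployment would use a key manager and
 * a proper cipher mode instead of AES/ECB.
 */
public class EncryptingRegionObserver extends BaseRegionObserver {

  // Hypothetical 128-bit AES key, for illustration only.
  private static final byte[] KEY = "0123456789abcdef".getBytes();

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                     Put put, WALEdit edit, boolean writeToWAL)
      throws IOException {
    Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
    for (List<KeyValue> kvs : familyMap.values()) {
      for (int i = 0; i < kvs.size(); i++) {
        KeyValue kv = kvs.get(i);
        byte[] encrypted = encrypt(kv.getValue());
        // Rebuild the KeyValue with the encrypted value in place of the original.
        kvs.set(i, new KeyValue(kv.getRow(), kv.getFamily(), kv.getQualifier(),
                                kv.getTimestamp(), encrypted));
      }
    }
  }

  private byte[] encrypt(byte[] plain) throws IOException {
    try {
      Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"));
      return cipher.doFinal(plain);
    } catch (GeneralSecurityException e) {
      throw new IOException("Encryption failed", e);
    }
  }
}
```

A matching read-side hook (or client-side decryption) would be needed to get readable values back out.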
On Aug 7, 2012, at 3:31 AM, Harsh J <[email protected]> wrote:

> Farrokh,
>
> I do not know of a way to plug in a codec that applies to all files on
> HDFS transparently yet. Check out
> https://issues.apache.org/jira/browse/HDFS-2542 and friends for some
> work that may arrive in the future.
>
> For HBase, by default, your choices are limited. You get only what
> HBase has tested to offer (None, LZO, GZ, Snappy), and adding
> support for a new codec requires modification of the sources. This is
> because HBase uses an Enum of codec identifiers (to save space in its
> HFiles). But yes, it can be done, and there are hackier ways of doing
> this too (renaming your CryptoCodec to SnappyCodec, for instance, to
> have HBase unknowingly use it; ugly, ugly, ugly).
>
> So yes, it is indeed best to discuss this need with the HBase
> community rather than the Hadoop one here.
>
> On Tue, Aug 7, 2012 at 1:43 PM, Farrokh Shahriari
> <[email protected]> wrote:
>> Thanks,
>> What if I want to use this encryption in a cluster with HBase running on top
>> of Hadoop? Can't Hadoop be configured to automatically encrypt each file
>> that is going to be written to it?
>> If not, I should probably be asking how to enable encryption in HBase, and
>> asking this question on the HBase mailing list, right?
>>
>>
>> On Tue, Aug 7, 2012 at 12:32 PM, Harsh J <[email protected]> wrote:
>>>
>>> Farrokh,
>>>
>>> The codec org.apache.hadoop.io.compress.crypto.CyptoCodec needs to be
>>> used. What you've done so far is merely add it to be loaded by Hadoop
>>> at runtime, but you will need to use it in your programs if you wish
>>> for it to be applied.
>>>
>>> For example, for MapReduce outputs to be compressed, you may run an MR
>>> job with the following option set on its configuration:
>>>
>>> "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.crypto.CyptoCodec"
>>>
>>> And then you will notice that your output files were all properly
>>> encrypted with the above codec.
>>>
>>> Likewise, if you're doing direct HDFS writes, you will need to wrap
>>> your output stream with this codec. Look at the CompressionCodec API to
>>> see how:
>>> http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodec.html#createOutputStream(java.io.OutputStream)
>>> (where your CompressionCodec must be the
>>> org.apache.hadoop.io.compress.crypto.CyptoCodec instance).
>>>
>>> On Tue, Aug 7, 2012 at 1:11 PM, Farrokh Shahriari
>>> <[email protected]> wrote:
>>>>
>>>> Hello,
>>>> I am using the "Hadoop Crypto Compressor" from this site,
>>>> https://github.com/geisbruch/HadoopCryptoCompressor, to encrypt
>>>> HDFS files.
>>>> I've downloaded the complete code, created the jar file, and changed the
>>>> properties in core-site.xml as the site says.
>>>> But when I add a new file, nothing happens and encryption isn't
>>>> working.
>>>> What can I do to encrypt HDFS files? Does anyone know how I should
>>>> use this class?
>>>>
>>>> Tnx
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>
>
> --
> Harsh J
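(The MapReduce option Harsh mentions above can also be set programmatically. A minimal sketch using the old mapred property names from the thread; it assumes the CyptoCodec jar is on the job's classpath and omits the rest of the job setup.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class EncryptedOutputJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same properties as the -D flags quoted in the thread.
    conf.setBoolean("mapred.output.compress", true);
    conf.set("mapred.output.compression.codec",
             "org.apache.hadoop.io.compress.crypto.CyptoCodec");
    Job job = new Job(conf, "encrypted-output-job");
    // ... configure mapper, reducer, input/output formats and paths here ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```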
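(And the direct HDFS write case looks roughly like this: load the codec like any other CompressionCodec and wrap the raw output stream with createOutputStream. A minimal sketch; the path and payload are placeholders, and the codec class name is taken from the thread as-is.)

```java
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class EncryptedHdfsWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Instantiate the codec the same way Hadoop does for any CompressionCodec.
    Class<?> codecClass =
        conf.getClassByName("org.apache.hadoop.io.compress.crypto.CyptoCodec");
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/tmp/encrypted-example" + codec.getDefaultExtension());

    // Wrap the raw HDFS stream; bytes pass through the codec before hitting HDFS.
    OutputStream stream = codec.createOutputStream(fs.create(out));
    stream.write("some sensitive payload".getBytes("UTF-8"));
    stream.close();
  }
}
```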
