There is a bit of a difference between encryption and compression. You're better off using coprocessors to encrypt the data as it's being written than trying to encrypt the actual HFile.
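(For illustration only, a minimal sketch of what such a coprocessor could look like, assuming the HBase 0.94-era RegionObserver API. The class, the hard-coded AES key, and the cipher choice are all hypothetical; real key management and the matching read-side decryption hook are left out.)

```java
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.util.List;
import java.util.Map;

import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.regionserver.wal.WALEdit;

/**
 * Sketch of a RegionObserver that encrypts cell values in prePut,
 * so plaintext never reaches the HFile writer. Key handling here is
 * deliberately naive; a real deployment would use a key manager and
 * a proper cipher mode instead of AES/ECB.
 */
public class EncryptingRegionObserver extends BaseRegionObserver {

  // Hypothetical 128-bit AES key, for illustration only.
  private static final byte[] KEY = "0123456789abcdef".getBytes();

  @Override
  public void prePut(ObserverContext<RegionCoprocessorEnvironment> ctx,
                     Put put, WALEdit edit, boolean writeToWAL)
      throws IOException {
    Map<byte[], List<KeyValue>> familyMap = put.getFamilyMap();
    for (List<KeyValue> kvs : familyMap.values()) {
      for (int i = 0; i < kvs.size(); i++) {
        KeyValue kv = kvs.get(i);
        byte[] encrypted = encrypt(kv.getValue());
        // Rebuild the KeyValue with the encrypted value in place of the original.
        kvs.set(i, new KeyValue(kv.getRow(), kv.getFamily(), kv.getQualifier(),
                                kv.getTimestamp(), encrypted));
      }
    }
  }

  private byte[] encrypt(byte[] plain) throws IOException {
    try {
      Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
      cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(KEY, "AES"));
      return cipher.doFinal(plain);
    } catch (GeneralSecurityException e) {
      throw new IOException("Encryption failed", e);
    }
  }
}
```

A matching read-side hook (or client-side decryption) would be needed to get readable values back out.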
On Aug 7, 2012, at 3:31 AM, Harsh J <[email protected]> wrote:

> Farrokh,
>
> I do not know of a way to plug in a codec that applies to all files on
> HDFS transparently yet. Check out
> https://issues.apache.org/jira/browse/HDFS-2542 and friends for some
> work that may arrive in the future.
>
> For HBase, by default, your choices are limited. You get only what
> HBase has tested to offer (None, LZO, GZ, Snappy), and adding
> support for a new codec requires modification of the sources. This is
> because HBase uses an Enum of codec identifiers (to save space in its
> HFiles). But yes, it can be done, and there are hackier ways of doing
> this too (renaming your CryptoCodec to SnappyCodec, for instance, to
> have HBase unknowingly use it; ugly, ugly, ugly).
>
> So yes, it is indeed best to discuss this need with the HBase
> community rather than the Hadoop one here.
>
> On Tue, Aug 7, 2012 at 1:43 PM, Farrokh Shahriari
> <[email protected]> wrote:
>> Thanks,
>> What if I want to use this encryption in a cluster with HBase running on top
>> of Hadoop? Can't Hadoop be configured to automatically encrypt each file
>> that is going to be written to it?
>> If not, I should probably be asking how to enable encryption in HBase, and
>> asking this question on the HBase mailing list, right?
>>
>>
>> On Tue, Aug 7, 2012 at 12:32 PM, Harsh J <[email protected]> wrote:
>>>
>>> Farrokh,
>>>
>>> The codec org.apache.hadoop.io.compress.crypto.CyptoCodec needs to be
>>> used. What you've done so far is merely add it to be loaded by Hadoop
>>> at runtime, but you will need to use it in your programs if you wish
>>> for it to be applied.
>>>
>>> For example, for MapReduce outputs to be compressed, you may run an MR
>>> job with the following option set on its configuration:
>>>
>>> "-Dmapred.output.compression.codec=org.apache.hadoop.io.compress.crypto.CyptoCodec"
>>>
>>> And then you will notice that your output files were all properly
>>> encrypted with the above codec.
>>>
>>> Likewise, if you're doing direct HDFS writes, you will need to wrap
>>> your output stream with this codec. Look at the CompressionCodec API to
>>> see how:
>>> http://hadoop.apache.org/common/docs/stable/api/org/apache/hadoop/io/compress/CompressionCodec.html#createOutputStream(java.io.OutputStream)
>>> (where your CompressionCodec must be the
>>> org.apache.hadoop.io.compress.crypto.CyptoCodec instance).
>>>
>>> On Tue, Aug 7, 2012 at 1:11 PM, Farrokh Shahriari
>>> <[email protected]> wrote:
>>>>
>>>> Hello,
>>>> I am using the "Hadoop Crypto Compressor" from this site,
>>>> https://github.com/geisbruch/HadoopCryptoCompressor, to encrypt
>>>> HDFS files.
>>>> I've downloaded the complete code, created the jar file, and changed the
>>>> properties in core-site.xml as the site says.
>>>> But when I add a new file, nothing happens and encryption isn't
>>>> working.
>>>> What can I do to encrypt HDFS files? Does anyone know how I should
>>>> use this class?
>>>>
>>>> Tnx
>>>
>>>
>>> --
>>> Harsh J
>>
>>
>
>
> --
> Harsh J
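(The MapReduce option Harsh mentions above can also be set programmatically. A minimal sketch using the old mapred property names from the thread; it assumes the CyptoCodec jar is on the job's classpath and omits the rest of the job setup.)

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class EncryptedOutputJob {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Same properties as the -D flags quoted in the thread.
    conf.setBoolean("mapred.output.compress", true);
    conf.set("mapred.output.compression.codec",
             "org.apache.hadoop.io.compress.crypto.CyptoCodec");
    Job job = new Job(conf, "encrypted-output-job");
    // ... configure mapper, reducer, input/output formats and paths here ...
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```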
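(And the direct HDFS write case looks roughly like this: load the codec like any other CompressionCodec and wrap the raw output stream with createOutputStream. A minimal sketch; the path and payload are placeholders, and the codec class name is taken from the thread as-is.)

```java
import java.io.OutputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class EncryptedHdfsWrite {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Instantiate the codec the same way Hadoop does for any CompressionCodec.
    Class<?> codecClass =
        conf.getClassByName("org.apache.hadoop.io.compress.crypto.CyptoCodec");
    CompressionCodec codec =
        (CompressionCodec) ReflectionUtils.newInstance(codecClass, conf);

    FileSystem fs = FileSystem.get(conf);
    Path out = new Path("/tmp/encrypted-example" + codec.getDefaultExtension());

    // Wrap the raw HDFS stream; bytes pass through the codec before hitting HDFS.
    OutputStream stream = codec.createOutputStream(fs.create(out));
    stream.write("some sensitive payload".getBytes("UTF-8"));
    stream.close();
  }
}
```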
