If 'STORE' worked, LOAD should work fine too.

On Thu, Oct 13, 2011 at 6:29 PM, Cameron Gandevia <[email protected]> wrote:
> Hi
>
> I currently have a bunch of data in json format in hdfs. I would like to use
> pig to load it dedupe it and store it back using snappy compression.
>
> Currently I do something like this.
>
> raw = LOAD '$INPUT' USING PigJsonLoader();
> uniq = DISTINCT raw;
> STORE uniq INTO '$OUTPUT' USING PigStorage();
>
> If I add the following to the pig job it seems to write the files with a
> '.snappy' extension
>
> <property>
> <name>mapred.output.compress</name>
> <value>true</value>
> </property>
> <property>
> <name>mapred.output.compression.codec</name>
> <value>org.apache.hadoop.io.compress.SnappyCodec</value>
> </property>
> <property>
> <name>mapred.output.compression.type</name>
> <value>BLOCK</value>
> </property>
>
> Is this all I need to do? Or do I need to write it in a different format?
> and is there a way to load the snappy compressed json data or do I need to
> implement a new load function?
>
> any help is much appreciated.
>
> Thanks
