Hello,

We ran a test client that writes data into GZ- and LZO-compressed tables.
Both data sets are identical (1,008,000 rows, same table schema) and take
up ~7.78 GB of HDFS disk space uncompressed. With LZO the table is
~887 MB, whereas with GZ it is ~444 MB, i.e. roughly half the size of
the LZO table.
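
For reference, the compression ratios implied by these figures can be
worked out as below. This is only a sketch of the arithmetic; it assumes
"GB" here means GiB (1 GB = 1024 MB). With decimal units the ratios come
out slightly lower, but the GZ/LZO comparison is unaffected.

```python
# Sizes reported in the post. Assumption: 1 GB = 1024 MB.
UNCOMPRESSED_MB = 7.78 * 1024
SIZES_MB = {"LZO": 887, "GZ": 444}


def compression_ratio(compressed_mb: float) -> float:
    """Uncompressed size divided by compressed size (higher is better)."""
    return UNCOMPRESSED_MB / compressed_mb


for codec, size in SIZES_MB.items():
    print(f"{codec}: {compression_ratio(size):.1f}x smaller than uncompressed")

# Size of the GZ table relative to the LZO table.
gz_vs_lzo = SIZES_MB["GZ"] / SIZES_MB["LZO"]
print(f"GZ table is {gz_vs_lzo:.0%} the size of the LZO table")
```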

The data-generating client took 1373 seconds to write into the
uncompressed table, 3374 seconds into the LZO table, and 2198 seconds
into the GZ table. The client is based on HTablePool and uses batch
operations.
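
Expressed as write throughput and as slowdown relative to the
uncompressed run, the timings above work out as follows (a sketch of the
arithmetic only, using the row count and run times reported here).

```python
ROWS = 1_008_000
SECONDS = {"none": 1373, "LZO": 3374, "GZ": 2198}


def rows_per_second(seconds: float) -> float:
    """Average write throughput of the client for one run."""
    return ROWS / seconds


baseline = SECONDS["none"]
for codec, secs in SECONDS.items():
    slowdown = secs / baseline  # 1.00 means as fast as the uncompressed run
    print(f"{codec:>4}: {rows_per_second(secs):7.1f} rows/s, "
          f"{slowdown:.2f}x the uncompressed time")
```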

So in our (simple) test, GZ beats LZO in both disk usage and client
execution time. We haven't tested reads yet.

Is this an expected result? I thought LZO was the recommended
compression algorithm. Or does LZO outperform GZ as the amount of data
grows, or in read scenarios?

Regards,

Thomas
