Very little.
The only thing I could find was an "info" blog post on Hadoop + Spark as used at Twitter.
It does not contain the details, though.

A small LZO-compressed file (5 MB) with an index file works with my code,
so I know that the code itself must be working fine.
For a larger LZO file, though, the system chokes trying to uncompress a 50 GB file.

/*--------------------------------------------------------------*/
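import spark._
import spark.SparkContext._
import com.hadoop.mapred.DeprecatedLzoTextInputFormat
import org.apache.hadoop.io.{LongWritable, Text}

// The original set this property twice; the second value (LzoCodec) overrode
// the first (LzopCodec), so only the effective setting is kept.
System.setProperty("spark.io.compression.codec",
  "com.hadoop.compression.lzo.LzoCodec")

// Name the key, value and input format types explicitly. hadoop-lzo's
// DeprecatedLzoTextInputFormat is the format that consults the index file.
// Each record is (byte offset, line); keep only the line, as a String.
val input = sc.hadoopFile[LongWritable, Text, DeprecatedLzoTextInputFormat](
  "hdfs://hadoop00/tmp/lldpc.sstv3.lzo").map(_._2.toString)

// Take the timestamp after '#' in the first space-separated field
// and emit (timestamp, 1).
def parseInput(line: String): (Double, Int) = {
  val fields = line.split(" ")
  val ts = fields(0).split("#")(1).toDouble
  (ts, 1)
}

// Count records per timestamp, then find the smallest timestamp.
val KVPairs = input.map(parseInput _).reduceByKey(_ + _).cache()
val minTs = KVPairs.map { case (t, _) => t }.collect().min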

/*--------------------------------------------------------------*/
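
The parsing and counting steps can be checked locally, without Spark or the cluster. This is only a minimal sketch: the sample lines are made up (the real record layout is only implied by the split calls), and a plain Scala groupBy stands in for reduceByKey.

```scala
// Same parsing logic as in the job above, repeated so this runs on its own.
def parseInput(line: String): (Double, Int) = {
  val fields = line.split(" ")
  val ts = fields(0).split("#")(1).toDouble
  (ts, 1)
}

// Hypothetical sample lines; the real file's layout may differ.
val sample = Seq(
  "host#1386612345.5 a:b",
  "host#1386612340.0 c:d",
  "host#1386612345.5 e:f")

// Local stand-in for reduceByKey(_ + _): sum the counts per timestamp.
val counts: Map[Double, Int] =
  sample.map(parseInput).groupBy(_._1).map {
    case (ts, hits) => (ts, hits.map(_._2).sum)
  }

// Smallest timestamp seen, mirroring the minTs step of the job.
val minTs = counts.keys.min

println(counts)
println(minTs)
```

If this behaves as expected on a few real lines, the parsing is probably not what makes the 50 GB run choke, and the way the file is read and split is the more likely place to look.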
Hope this helps,
and that someone can help me read a large LZO file.

regards
Rajeev

Rajeev Srivastava
Silverline Design Inc
2118 Walsh ave, suite 204
Santa Clara, CA, 95050
cell : 408-409-0940


On Tue, Dec 10, 2013 at 9:06 AM, Andrew Ash <[email protected]> wrote:

> I'm interested in doing this too Rajeev. Did you make any progress?
>
>
> On Mon, Dec 9, 2013 at 1:57 PM, Rajeev Srivastava <
> [email protected]> wrote:
>
>> Hello experts,
>>      I would like to read a splittable LZO-compressed file into Spark.
>> I have followed the available material on the web on working with
>> LZO-compressed data.
>> I am able to create the index file needed by Hadoop.
>>
>> But I am unable to read the LZO file in Spark. I use Spark 0.7.2.
>>
>> I would like to know if anyone has had success reading a large
>> LZO-compressed file.
>>
>> regards
>> Rajeev Srivastava
>> Silverline Design Inc
>> 2118 Walsh ave, suite 204
>> Santa Clara, CA, 95050
>> cell : 408-409-0940
>>
>
>
