Basically, you need to convert it to a serializable format before doing the
collect/take.
You can fire up a Spark shell and paste this:

import org.apache.hadoop.io.{LongWritable, Text}

val sFile = sc.sequenceFile[LongWritable, Text]("/home/akhld/sequence/sigmoid")
  .map(_._2.toString)
sFile.take(5).foreach(println)
Use this in the Spark shell:
val x = sc.sequenceFile("/sys/edw/dw_lstg_item/snapshot/2015/06/01/00/part-r-00761",
  classOf[org.apache.hadoop.io.Text], classOf[org.apache.hadoop.io.Text])

OR

val x = sc.sequenceFile("/sys/edw/dw_lstg_item/snapshot/2015/06/01/00/part-r-00761",
  classOf[org.apache.hadoop.io.Text],
  classOf[org.apache.hadoop.io.Text])
I have a sequence file whose header starts with:

SEQ org.apache.hadoop.io.Text org.apache.hadoop.io.Text org.apache.hadoop.io.compress.GzipCodec
Key = Text
Value = Text
and it seems to be using GzipCodec.
How should I read it from Spark?
I am using
val x = sc.sequenceFile(dwTable, classOf[Text], classOf[Text]).parti
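Putting the earlier suggestions together, a minimal sketch for this case, assuming the same `dwTable` path: SequenceFile readers detect the compression codec (GzipCodec here) from the file header, so no extra handling is needed for the gzip part. The Hadoop `Text` objects, however, are reused by the reader and are not serializable, so copy them out as Strings before any take/collect.

```scala
import org.apache.hadoop.io.Text

// Read the (gzip-compressed) sequence file; the codec is picked up
// from the file header automatically.
val x = sc.sequenceFile(dwTable, classOf[Text], classOf[Text])
  // Copy out of the reused, non-serializable Writables before collecting.
  .map { case (k, v) => (k.toString, v.toString) }

x.take(5).foreach(println)
```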