I have the same question. I tried approach 1, but I get a compilation error:

[error] …. could not find implicit value for parameter kcf: () => org.apache.spark.WritableConverter[String]
[error] val t2 = sc.sequenceFile[String, Int]("/test/data", 20)

Yishu

On Mar 9, 2014, at 12:21 AM, Shixiong Zhu <zsxw...@gmail.com> wrote:

> Hi Kane,
>
> In the sequence file, the class is org.apache.hadoop.io.Text. You need to
> convert Text to String. There are two approaches:
>
> 1. Use implicit conversions to convert Text to String automatically. I
> recommend this one. E.g.,
>
> val t2 = sc.sequenceFile[String, String]("/user/hdfs/e1Mseq")
> t2.groupByKey().take(5)
>
> 2. Use "classOf[Text]" to specify the correct class in the sequence file and
> convert Text to String. E.g.,
>
> import org.apache.hadoop.io.Text
> val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[Text], classOf[Text])
> t2.map { case (k,v) => (k.toString, v.toString) }.groupByKey().take(5)
>
> Best Regards,
> Shixiong Zhu
>
> 2014-03-09 13:30 GMT+08:00 Kane <kane.ist...@gmail.com>:
> > when i try to open sequence file:
> > val t2 = sc.sequenceFile("/user/hdfs/e1Mseq", classOf[String],
> > classOf[String])
> > t2.groupByKey().take(5)
> >
> > I get:
> > org.apache.spark.SparkException: Job aborted: Task 25.0:0 had a not
> > serializable result: java.io.NotSerializableException:
> > org.apache.hadoop.io.Text
> >
> > another thing is:
> > t2.take(5) - returns 5 identical items, i guess I have to map/clone items,
> > but i get something like org.apache.hadoop.io.Text cannot be cast to
> > java.lang.String, how do i clone it?
> >
> > Thanks.
> >
> > --
> > View this message in context:
> > http://apache-spark-user-list.1001560.n3.nabble.com/sequenceFile-and-groupByKey-tp2428.html
> > Sent from the Apache Spark User List mailing list archive at Nabble.com.
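
For reference, here is a minimal sketch of how the two approaches quoted above might look in a compiled application that reads a [String, Int] sequence file. It rests on several assumptions that are not confirmed anywhere in this thread: that the Spark release is older than 1.3 (where the implicit WritableConverter instances live in the SparkContext companion object and are only found after `import org.apache.spark.SparkContext._`, which the spark-shell adds automatically), that the values in the file are stored as IntWritable, and that a local master is acceptable for a quick test. The object name, app name and master setting are hypothetical; the path is the one from the error message. If the "could not find implicit value for parameter kcf" error has a different cause, this sketch will not help.

```scala
import org.apache.hadoop.io.{IntWritable, Text}
import org.apache.spark.{SparkConf, SparkContext}
// Assumption: Spark older than 1.3. This import brings the implicit
// WritableConverters and the pair-RDD functions (groupByKey) into scope.
import org.apache.spark.SparkContext._

// Hypothetical object name, used only for illustration.
object SequenceFileSketch {
  def main(args: Array[String]): Unit = {
    // Hypothetical app name and local master, for a quick local test only.
    val conf = new SparkConf().setAppName("SequenceFileSketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      val path = "/test/data" // path taken from the error message

      // Approach 1: let the implicit converters turn Text/IntWritable
      // into String/Int while the file is being read.
      val viaImplicits = sc.sequenceFile[String, Int](path, 20)

      // Approach 2: name the Writable classes explicitly, then convert each
      // record. Assumes the values were written as IntWritable.
      val viaClasses = sc
        .sequenceFile(path, classOf[Text], classOf[IntWritable], 20)
        .map { case (k, v) => (k.toString, v.get) }

      viaImplicits.groupByKey().take(5).foreach(println)
      viaClasses.groupByKey().take(5).foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

In approach 2 the conversion in `map` also creates a fresh String and Int for every record, which is one way to avoid handing the non-serializable Text objects to `groupByKey()` or `take(5)`.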