Hi!

Total Scala and Spark noob here with a few questions.

I am trying to modify a few of the examples in the Spark repo to fit
my needs, but I'm running into a few problems.

I am making an RDD from Cassandra, which I've finally gotten to work,
and trying to do some operations on it. Specifically, I am trying to
group by key for some future calculations. I want the key to be the
column "navn" from a certain column family, but I don't think I
understand the returned types. Why does each record consist of two
Maps instead of one? I would have expected a collection of rows,
where each row is a single map from column name to value. So my first
question is: what do these two maps represent?
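
For context, here is roughly how I set up the job, adapted from the
CassandraCQLTest example in the repo (the host, port, keyspace, and
column family names below are placeholders for my own setup):

    import org.apache.hadoop.mapreduce.Job
    import org.apache.cassandra.hadoop.ConfigHelper
    import org.apache.cassandra.hadoop.cql3.CqlConfigHelper

    val job = new Job()
    // Placeholders: point these at the actual cluster and table.
    ConfigHelper.setInputInitialAddress(job.getConfiguration(), "localhost")
    ConfigHelper.setInputRpcPort(job.getConfiguration(), "9160")
    ConfigHelper.setInputColumnFamily(job.getConfiguration(), "myKeyspace", "myColumnFamily")
    ConfigHelper.setInputPartitioner(job.getConfiguration(), "Murmur3Partitioner")
    CqlConfigHelper.setInputCQLPageRowSize(job.getConfiguration(), "100")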

    import java.nio.ByteBuffer
    import org.apache.cassandra.hadoop.cql3.CqlPagingInputFormat
    import org.apache.cassandra.utils.ByteBufferUtil

    val casRdd = sc.newAPIHadoopRDD(job.getConfiguration(),
      classOf[CqlPagingInputFormat],
      classOf[java.util.Map[String, ByteBuffer]],
      classOf[java.util.Map[String, ByteBuffer]])
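
Out of curiosity I printed the first record to see what the two maps
hold (just a quick sketch, assuming both maps are String -> ByteBuffer
as declared above):

    // Peek at one record: each element of the RDD is a pair of maps,
    // and I'd like to know which columns end up in which map.
    casRdd.take(1).foreach { case (keyMap, valueMap) =>
      println("columns in key map:   " + keyMap.keySet())
      println("columns in value map: " + valueMap.keySet())
    }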

    val navnrevmap = casRdd.map({
      case (key, value) =>
        (ByteBufferUtil.string(value.get("navn")),
         ByteBufferUtil.toInt(value.get("revisjon")))
    }).groupByKey()

The second question (probably stemming from my not understanding the
first) is: why am I not allowed to do a groupByKey in the above code?
I understand that the type does not have that function, but I'm
unclear on what I have to do to make it work.
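
From what I can gather from the PairRDDFunctions docs, groupByKey is
added to RDDs of pairs through an implicit conversion, so my guess is
that I'm just missing an import, something like this:

    // Guess on my part: this import should bring the implicit
    // RDD-to-PairRDDFunctions conversion into scope, which is where
    // groupByKey is defined.
    import org.apache.spark.SparkContext._

Is that right, or am I misreading the docs?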

-- 
Best regards,
Martin Gammelsæter
