Martin,

1) The first map contains the columns in the primary key, which could be a compound primary key containing multiple columns. The second map contains all the non-key columns.

2) Try this fixed code (your version was missing a closing parenthesis after the toInt call):

    val navnrevmap = casRdd.map {
      case (key, value) =>
        (ByteBufferUtil.string(value.get("navn")),
         ByteBufferUtil.toInt(value.get("revisjon")))
    }.groupByKey()
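In case it helps, here is a minimal sketch of what the map + groupByKey step produces, using plain Scala collections so you can try it without a Spark or Cassandra setup. The sample rows are hypothetical stand-ins for the decoded ("navn", "revisjon") pairs; plain Scala's groupBy plus a mapValues-style pass behaves analogously to RDD.groupByKey for this purpose:

    // Sketch only: plain-Scala analogue of the map + groupByKey step.
    // The sample pairs are made-up stand-ins for decoded Cassandra rows.
    object GroupByKeySketch {
      def main(args: Array[String]): Unit = {
        // Pairs as they would come out of the map step: (navn, revisjon)
        val pairs = Seq(("doc-a", 1), ("doc-b", 1), ("doc-a", 2), ("doc-a", 3))

        // Group by the first element (the key), keeping only the
        // second elements as the grouped values.
        val grouped: Map[String, Seq[Int]] =
          pairs.groupBy(_._1).map { case (k, vs) => (k, vs.map(_._2)) }

        println(grouped("doc-a")) // List(1, 2, 3)
        println(grouped("doc-b")) // List(1)
      }
    }

Once the parenthesis is fixed, the map step yields an RDD of (String, Int) pairs, which is what gives you access to groupByKey.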
Mohammed

-----Original Message-----
From: Martin Gammelsæter [mailto:martingammelsae...@gmail.com]
Sent: Wednesday, July 2, 2014 4:36 AM
To: user@spark.apache.org
Subject: How to use groupByKey and CqlPagingInputFormat

Hi!

Total Scala and Spark noob here with a few questions. I am trying to modify a few of the examples in the spark repo to fit my needs, but am running into a few problems. I am making an RDD from Cassandra, which I've finally gotten to work, and trying to do some operations on it. Specifically I am trying to do a grouping by key for future calculations.

I want the key to be the column "navn" from a certain column family, but I don't think I understand the returned types. Why are two Maps returned instead of one? I would expect a list of some kind with every row, where every element in the list is a map from column name to value. So my first question is: what do these maps represent?

    val casRdd = sc.newAPIHadoopRDD(
      job.getConfiguration(),
      classOf[CqlPagingInputFormat],
      classOf[java.util.Map[String, ByteBuffer]],
      classOf[java.util.Map[String, ByteBuffer]])

    val navnrevmap = casRdd.map({
      case (key, value) =>
        (ByteBufferUtil.string(value.get("navn")),
         ByteBufferUtil.toInt(value.get("revisjon"))
    }).groupByKey()

The second question (probably stemming from my not understanding the first) is: why am I not allowed to do a groupByKey in the above code? I understand that the type does not have that function, but I'm unclear on what I have to do to make it work.

--
Best regards,
Martin Gammelsæter