Thanks, Matei. I expected something along these lines.

Robert

On Fri, Dec 13, 2013 at 5:28 AM, Matei Zaharia <[email protected]> wrote:

> The hadoopFile method reuses the Writable object between records that it
> reads by default, so you get back the same object. You should clone them if
> you need to cache them. This is kind of an unintuitive behavior that we’ll
> probably need to turn off by default; it’s helpful when you don’t need to
> cache the objects because it reduces allocation.
>
> Matei
>
> On Dec 11, 2013, at 8:51 PM, Robert Fink <[email protected]> wrote:
>
> > Hi,
> >
> > I have a file containing avro GenericRecords; for debug purposes, let's
> > read one particular field, "date_time" and print it to the screen:
> >
> > def sc = new SparkContext("local", "My Spark Context")
> > val job = new org.apache.hadoop.mapreduce.Job
> >
> > // input data:
> > def avrofile = "debug-data/records.avro"
> >
> > // Load
> > val rdd = sc.newAPIHadoopFile(
> >   avrofile,
> >   classOf[org.apache.avro.mapreduce.AvroKeyInputFormat[GenericRecord]],
> >   classOf[org.apache.avro.mapred.AvroKey[GenericRecord]],
> >   classOf[org.apache.hadoop.io.NullWritable],
> >   job.getConfiguration).map( x => x._1.datum )
> >
> > rdd.foreach( x => println( x.get("date_time")))
> >
> > This is all jolly good, the output is:
> > 2013-10-21T00:19:25-04:00
> > 2013-10-21T00:12:39-04:00
> > 2013-10-21T00:08:09-04:00
> > 2013-10-21T00:12:54-04:00
> > [...]
> >
> > When I change the loading statement to use caching like this
> >
> > val rdd = sc.newAPIHadoopFile(
> >   avrofile,
> >   classOf[org.apache.avro.mapreduce.AvroKeyInputFormat[GenericRecord]],
> >   classOf[org.apache.avro.mapred.AvroKey[GenericRecord]],
> >   classOf[org.apache.hadoop.io.NullWritable],
> >   job.getConfiguration).map( x => x._1.datum ).cache
> >
> > rdd.foreach( x => println( x.get("date_time")))
> >
> > , then all records have the same date_time, in fact, they are all
> > identical records:
> > 2013-10-21T00:01:29-04:00
> > 2013-10-21T00:01:29-04:00
> > 2013-10-21T00:01:29-04:00
> > 2013-10-21T00:01:29-04:00
> > [...]
> >
> > Any idea what's going on here?
> >
> > Best,
> > Robert
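
For reference, the "clone them before caching" advice quoted above could look roughly like the sketch below. Matei does not say how to clone, so using Avro's GenericData.deepCopy is an assumption on my part, as are the object name CachedAvroRecords and the key/datum variable names; the path and the rest of the setup are taken from the original snippet. Treat it as one possible approach, not a confirmed fix.

```scala
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.hadoop.mapreduce.Job
import org.apache.spark.SparkContext

// Hypothetical wrapper object; the name is only for illustration.
object CachedAvroRecords {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "My Spark Context")
    val job = new Job

    // input data (same path as in the original message)
    val avrofile = "debug-data/records.avro"

    val rdd = sc.newAPIHadoopFile(
      avrofile,
      classOf[AvroKeyInputFormat[GenericRecord]],
      classOf[AvroKey[GenericRecord]],
      classOf[NullWritable],
      job.getConfiguration
    ).map { case (key, _) =>
      val datum = key.datum()
      // Assumption: deepCopy is one way to "clone" the record. It gives each
      // element its own object, so the reader can keep reusing its instance
      // without overwriting what has already been cached.
      GenericData.get().deepCopy(datum.getSchema, datum)
    }.cache()

    rdd.foreach(x => println(x.get("date_time")))
    sc.stop()
  }
}
```

The copy has to happen inside the map, before cache; copying after the records have been cached would be too late, since the cache would already hold references to the one reused object. The price is an extra allocation per record, which is exactly the cost the reuse is meant to avoid when nothing is cached.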
