http://stackoverflow.com/questions/24450540/how-to-use-sparks-newapihadooprdd-from-java

This issue seems to indicate your analysis is correct: the compilation error
has to do with there being an intermediate AbstractInputFormat. I'd be
curious whether a different compiler would help or not, as they suggest.
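If it really is just javac's inference choking on the extra type parameters,
one possible workaround (my own untested sketch, not something from the
linked thread) is to defeat inference with a raw Class and an unchecked cast
on the result:

    import java.util.Map;

    import org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.accumulo.core.util.PeekingIterator;
    import org.apache.hadoop.io.Text;
    import org.apache.spark.api.java.JavaPairRDD;

    // Passing a raw Class makes this an unchecked invocation, so javac
    // never checks AccumuloRowInputFormat against the
    // F extends InputFormat<K, V> bound; the erased (raw) result is then
    // cast back to the parameterized RDD type.
    @SuppressWarnings({ "unchecked", "rawtypes" })
    JavaPairRDD<Text, PeekingIterator<Map.Entry<Key, Value>>> pairRDD =
        (JavaPairRDD<Text, PeekingIterator<Map.Entry<Key, Value>>>)
            sparkContext.newAPIHadoopRDD(
                job.getConfiguration(),
                (Class) AccumuloRowInputFormat.class,
                Text.class,
                PeekingIterator.class);

The usual caveat applies: this trades the compile-time error for a couple of
suppressed warnings, so any real type mismatch would only surface at runtime.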
On Mon, May 4, 2015 at 8:46 AM, Marc Reichman <[email protected]> wrote:

> Has anyone done any testing with Spark and AccumuloRowInputFormat? I have
> no problem doing this for AccumuloInputFormat:
>
> JavaPairRDD<Key, Value> pairRDD =
>     sparkContext.newAPIHadoopRDD(job.getConfiguration(),
>         AccumuloInputFormat.class,
>         Key.class, Value.class);
>
> But I run into a snag trying to do a similar thing:
>
> JavaPairRDD<Text, PeekingIterator<Map.Entry<Key, Value>>> pairRDD =
>     sparkContext.newAPIHadoopRDD(job.getConfiguration(),
>         AccumuloRowInputFormat.class,
>         Text.class, PeekingIterator.class);
>
> The compilation error is (big, sorry):
>
> Error:(141, 97) java: method newAPIHadoopRDD in class
> org.apache.spark.api.java.JavaSparkContext cannot be applied to given types;
>   required:
> org.apache.hadoop.conf.Configuration,java.lang.Class<F>,java.lang.Class<K>,java.lang.Class<V>
>   found:
> org.apache.hadoop.conf.Configuration,java.lang.Class<org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat>,java.lang.Class<org.apache.hadoop.io.Text>,java.lang.Class<org.apache.accumulo.core.util.PeekingIterator>
>   reason: inferred type does not conform to declared bound(s)
>     inferred: org.apache.accumulo.core.client.mapreduce.AccumuloRowInputFormat
>     bound(s):
> org.apache.hadoop.mapreduce.InputFormat<org.apache.hadoop.io.Text,org.apache.accumulo.core.util.PeekingIterator>
>
> I've tried a few things; the signature of the function is:
>
> public <K, V, F extends org.apache.hadoop.mapreduce.InputFormat<K, V>>
>     JavaPairRDD<K, V> newAPIHadoopRDD(Configuration conf, Class<F> fClass,
>         Class<K> kClass, Class<V> vClass)
>
> I guess it's having trouble with the format extending InputFormatBase with
> its own additional generic parameters (the Map.Entry inside
> PeekingIterator).
>
> This may be an issue to chase with Spark vs. Accumulo, unless something can
> be tweaked on the Accumulo side or I could wrap the InputFormat with my own
> somehow.
>
> Accumulo 1.6.1, Spark 1.3.1, JDK 7u71.
>
> Stopping short of this, can anyone think of a good way to use
> AccumuloInputFormat to get what I'm getting from the Row version in a
> performant way? It doesn't necessarily have to be an iterator approach, but
> I'd need all my values with the key in one consuming function. I'm looking
> into ways to do it in Spark functions while trying to avoid any major
> performance hits.
>
> Thanks,
>
> Marc
>
> P.S. The summit was absolutely great, thank you all for having it!
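On the fallback question at the end of your mail: one option I can think of
(again an untested sketch of my own, not something anyone has benchmarked)
is to stick with AccumuloInputFormat and regroup by row on the Spark side:

    import java.util.AbstractMap;
    import java.util.Map;

    import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat;
    import org.apache.accumulo.core.data.Key;
    import org.apache.accumulo.core.data.Value;
    import org.apache.spark.api.java.JavaPairRDD;
    import org.apache.spark.api.java.function.PairFunction;

    import scala.Tuple2;

    JavaPairRDD<Key, Value> entries = sparkContext.newAPIHadoopRDD(
        job.getConfiguration(), AccumuloInputFormat.class,
        Key.class, Value.class);

    JavaPairRDD<String, Iterable<Map.Entry<Key, Value>>> rows = entries
        .mapToPair(new PairFunction<Tuple2<Key, Value>, String,
                                    Map.Entry<Key, Value>>() {
            @Override
            public Tuple2<String, Map.Entry<Key, Value>> call(
                    Tuple2<Key, Value> kv) {
                // Copy first: Spark reuses the Writable objects handed out
                // by Hadoop record readers, so without a copy every entry
                // would end up pointing at the same mutated Key/Value.
                Key key = new Key(kv._1());
                Value value = new Value(kv._2());
                // Group on the row portion of the Key; a String key
                // sidesteps Text being Writable rather than Serializable
                // at shuffle time.
                return new Tuple2<String, Map.Entry<Key, Value>>(
                    key.getRow().toString(),
                    new AbstractMap.SimpleImmutableEntry<Key, Value>(
                        key, value));
            }
        })
        .groupByKey();

Two caveats: groupByKey shuffles everything and buffers each row in memory,
so you lose the streaming behavior of PeekingIterator, and Key/Value
themselves aren't Serializable, so you'd likely need Kryo registration (or a
projection to plain types) before the shuffle. If your splits align with
tablet boundaries so that a row never spans partitions, a mapPartitions that
groups consecutive entries sharing a row would avoid the shuffle entirely.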
