okay, thanks; the GeoMesaAccumuloInputFormat looks interesting; i just need
to make it more generic ... thanks!
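
for the original rfile question, it may be worth noting that Accumulo
1.8+ also ships a public RFile client API that can read rfiles without a
running instance. below is a rough, hedged sketch of pairing it with
Spark; the file paths are placeholders, and the builder options may vary
by Accumulo version:

    import scala.collection.JavaConverters._
    import org.apache.accumulo.core.client.rfile.RFile
    import org.apache.spark.SparkContext

    val sc: SparkContext = ??? // an existing SparkContext

    // placeholder rfile paths; any Hadoop-visible filesystem should work
    val files = Seq("hdfs://namenode/accumulo/tables/1/t-0001/F0000a.rf")

    // one scanner per file, opened on the executors
    val kvs = sc.parallelize(files).flatMap { path =>
      val scanner = RFile.newScanner().from(path).build()
      try {
        // materialize as Strings: Key/Value instances are reused by the
        // underlying iterator and are not Spark-serializer friendly
        scanner.iterator().asScala
          .map(e => (e.getKey.toString, e.getValue.toString))
          .toList
      } finally scanner.close()
    }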

On Mon, Aug 3, 2020 at 5:38 PM Jim Hughes <[email protected]> wrote:

> Good question.  As a very general note, one can leverage Hadoop
> InputFormats to create Spark RDDs.
>
> As a rather non-trivial example, you could check out GeoMesa's
> implementation of mapping Accumulo entries to geospatial data types.
>
> The basic strategy is to make a Hadoop Configuration object describing
> what to scan in Accumulo, and then call SparkContext.newAPIHadoopRDD to
> get an RDD.
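>
> As a minimal sketch (assuming the Accumulo 1.x mapreduce API; the
> instance, zookeepers, credentials, and table name below are all
> placeholders, and the setup calls differ somewhat in Accumulo 2.x):
>
>     import org.apache.accumulo.core.client.ClientConfiguration
>     import org.apache.accumulo.core.client.mapreduce.AccumuloInputFormat
>     import org.apache.accumulo.core.client.security.tokens.PasswordToken
>     import org.apache.accumulo.core.data.{Key, Value}
>     import org.apache.hadoop.mapreduce.Job
>     import org.apache.spark.SparkContext
>
>     // the Job is just a carrier for the Hadoop Configuration; the
>     // InputFormat's static setters describe what to scan
>     val job = Job.getInstance()
>     AccumuloInputFormat.setZooKeeperInstance(job,
>       ClientConfiguration.loadDefault()
>         .withInstance("myInstance")
>         .withZkHosts("zoo1:2181"))
>     AccumuloInputFormat.setConnectorInfo(job, "user", new PasswordToken("secret"))
>     AccumuloInputFormat.setInputTableName(job, "my_table")
>     // optionally narrow the scan: setRanges, fetchColumns, addIterator, ...
>
>     val sc: SparkContext = ??? // an existing SparkContext
>
>     // bridge the Hadoop InputFormat into Spark: yields an RDD[(Key, Value)]
>     val rdd = sc.newAPIHadoopRDD(
>       job.getConfiguration,
>       classOf[AccumuloInputFormat],
>       classOf[Key],
>       classOf[Value])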
>
> If you want a DataFrame/Dataset, you'll need to implement the Spark
> DataSource API.
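>
> (Short of a full DataSource implementation, one hedged stopgap is to
> map the (Key, Value) pairs to plain Scala types and call toDF; the
> column names here are only illustrative.)
>
>     import org.apache.spark.sql.SparkSession
>
>     val spark = SparkSession.builder().getOrCreate()
>     import spark.implicits._
>
>     // map to Strings first: Hadoop RDDs reuse Writable instances, and
>     // Key/Value do not play well with Spark's serializers
>     val df = rdd
>       .map { case (k, v) => (k.getRow.toString, k.getColumnFamily.toString,
>         k.getColumnQualifier.toString, v.toString) }
>       .toDF("row", "cf", "cq", "value")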
>
> Hope that helps!
>
> Cheers,
>
> Jim
>
> 1. Current implementation; decently refactored.
>
> https://github.com/locationtech/geomesa/blob/main/geomesa-accumulo/geomesa-accumulo-spark/src/main/scala/org/locationtech/geomesa/spark/accumulo/AccumuloSpatialRDDProvider.scala#L52-L82
>
> 2. Older implementation; less refactored, but may be clearer.
>
> https://github.com/locationtech/geomesa/blob/geomesa_2.11-1.3.0/geomesa-accumulo/geomesa-accumulo-spark/src/main/scala/org/locationtech/geomesa/spark/accumulo/AccumuloSpatialRDDProvider.scala#L51-L100
>
> p.s. Alternatively, if you just want to get a small amount of data out
> of Accumulo, you could query for it on the master and fan the data out
> on the cluster.  *shrugs*
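>
> (A hedged sketch of that route, with placeholder connection details:
> scan on the driver with the regular client API, materialize to simple
> types, then parallelize.)
>
>     import scala.collection.JavaConverters._
>     import org.apache.accumulo.core.client.ZooKeeperInstance
>     import org.apache.accumulo.core.client.security.tokens.PasswordToken
>     import org.apache.accumulo.core.security.Authorizations
>
>     val connector = new ZooKeeperInstance("myInstance", "zoo1:2181")
>       .getConnector("user", new PasswordToken("secret"))
>     val scanner = connector.createScanner("my_table", new Authorizations())
>     // materialize before closing; Strings avoid shipping non-serializable Keys
>     val rows = scanner.asScala
>       .map(e => (e.getKey.getRow.toString, e.getValue.toString))
>       .toList
>     scanner.close()
>
>     val rdd = sc.parallelize(rows) // sc: an existing SparkContext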
>
> On 8/3/20 4:46 PM, Bulldog20630405 wrote:
> >
> > we would like to read rfiles directly outside an active accumulo
> > instance using spark.  is there an example to do this?
> >
> > note: i know there is a utility to print rfiles and i could start
> > there and build my own; but was hoping to leverage something already
> > there.
