Distributed R-Trees are not very common. Most "big data" spatial solutions
collapse multi-dimensional data into a distributed one-dimensional index
using a space-filling curve. Many implementations exist outside of Spark,
e.g. for HBase or Accumulo. It's simple enough to write a map function that
takes a multi-dimensional point and produces its one-dimensional position
on the curve.
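A minimal sketch of such a map function for two dimensions, using a Z-order
(Morton) curve; this code is illustrative and not from the thread:

// Z-order (Morton) encoding: interleave the bits of two 32-bit
// coordinates into one 64-bit key, so nearby (x, y) points tend to
// land on nearby keys in a sorted store like HBase or Accumulo.
public final class ZOrder {
    // Spread the bits of x so they occupy the even bit positions.
    private static long spread(long x) {
        x &= 0xFFFFFFFFL;
        x = (x | (x << 16)) & 0x0000FFFF0000FFFFL;
        x = (x | (x << 8))  & 0x00FF00FF00FF00FFL;
        x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FL;
        x = (x | (x << 2))  & 0x3333333333333333L;
        x = (x | (x << 1))  & 0x5555555555555555L;
        return x;
    }

    // Interleave x (even bits) and y (odd bits) into one key.
    public static long encode(int x, int y) {
        return (spread(y) << 1) | spread(x);
    }
}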
Hi, David,
This is the code that I use to create a JavaPairRDD from an Accumulo table:
JavaSparkContext sc = new JavaSparkContext(conf);
Job hadoopJob = Job.getInstance(conf, "TestSparkJob");
hadoopJob.setInputFormatClass(AccumuloInputFormat.class);
AccumuloInputFormat.setZooKeeperInstance(hadoopJob,
    ClientConfiguration.loadDefault());  // truncated in the original; ClientConfiguration here is an assumption
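The snippet above is truncated; the remaining setup would look something
like the following sketch. The principal, token, and table name below are
placeholders, not from the original message:

AccumuloInputFormat.setConnectorInfo(hadoopJob, "user",
    new PasswordToken("secret"));                             // placeholder credentials
AccumuloInputFormat.setInputTableName(hadoopJob, "mytable");  // placeholder table
AccumuloInputFormat.setScanAuthorizations(hadoopJob, new Authorizations());

// newAPIHadoopRDD then yields a JavaPairRDD of Accumulo's Key and Value:
JavaPairRDD<Key, Value> rdd = sc.newAPIHadoopRDD(hadoopJob.getConfiguration(),
    AccumuloInputFormat.class, Key.class, Value.class);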
Hi, Tao,
When I used newAPIHadoopRDD (Accumulo not HBase) I found that I had to
specify executor-memory and num-executors explicitly on the command line or
else I didn't get any parallelism across the cluster.
I used --executor-memory 3G --num-executors 24, but obviously other
parameters will be better suited to other clusters.
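For reference, those flags go on the spark-submit command line; a minimal
sketch, where the class and jar names are placeholders:

spark-submit --class com.example.AccumuloSparkJob \
  --executor-memory 3G \
  --num-executors 24 \
  accumulo-spark-job.jar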
No, they do not implement Serializable. There are a couple of places where
I've had to do a Text->String conversion but generally it hasn't been a
problem.
-Russ
On Wed, Sep 24, 2014 at 10:27 AM, Steve Lewis wrote:
> Do your custom Writable classes implement Serializable - I think that is
> the
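A hedged sketch of the kind of conversion Russ describes, assuming rdd is
the JavaPairRDD<Key, Value> produced by newAPIHadoopRDD as in the other
snippets in this thread; the Writables are converted to plain Strings
before anything is shuffled or collected:

// Requires: import scala.Tuple2;
JavaPairRDD<String, String> strings = rdd.mapToPair(entry ->
    new Tuple2<>(entry._1().getRow().toString(),  // Text -> String
                 new String(entry._2().get())));  // Value bytes -> String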
I use newAPIHadoopRDD with AccumuloInputFormat. It produces a PairRDD using
Accumulo's Key and Value classes, both of which extend Writable. Works like
a charm. I use the same InputFormat for all my MR jobs.
-Russ
On Wed, Sep 24, 2014 at 9:33 AM, Steve Lewis wrote:
> I tried newAPIHadoopFile an
query time down to 30s from 18 minutes, and I'm seeing much better
utilization of my Accumulo tablet servers.
-Russ
On Tue, Sep 9, 2014 at 5:13 PM, Russ Weeks wrote:
> Hi,
>
> I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat.
> Not sure if I should be asking on the Spark list or the Accumulo list,
It's very straightforward to set up a Hadoop RDD to use
AccumuloInputFormat. Something like this will do the trick:
private JavaPairRDD<Key, Value> newAccumuloRDD(JavaSparkContext sc,
    AgileConf agileConf, String appName, Authorizations auths)
    throws IOException, AccumuloSecurityException {
  Job hadoopJob = Job.getInstance(agileConf, appName);
  // connector info, table and auths configured here (truncated in the original)
  return sc.newAPIHadoopRDD(hadoopJob.getConfiguration(),
      AccumuloInputFormat.class, Key.class, Value.class);
}
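Calling it is then a one-liner; the app name and variables here are
placeholders:

JavaPairRDD<Key, Value> rdd =
    newAccumuloRDD(sc, agileConf, "example-app", auths);
System.out.println("scanned " + rdd.count() + " entries");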
Hi,
I'm trying to execute Spark SQL queries on top of the AccumuloInputFormat.
Not sure if I should be asking on the Spark list or the Accumulo list, but
I'll try here. The problem is that the workload to process SQL queries
doesn't seem to be distributed across my cluster very well.
My Spark SQL
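For what it's worth, here is a hedged sketch of how the Accumulo pair RDD
can be exposed to Spark SQL, using the Spark 1.x-era JavaSQLContext API
that was current when this thread was written; the bean class and all
names are illustrative, not from the original message:

// Spark SQL infers a schema from a Serializable JavaBean.
public static class CellBean implements Serializable {
    private String row;
    private String value;
    public String getRow() { return row; }
    public void setRow(String row) { this.row = row; }
    public String getValue() { return value; }
    public void setValue(String value) { this.value = value; }
}

JavaSQLContext sqlContext = new JavaSQLContext(sc);
// Convert the non-Serializable Key/Value pairs into beans first.
JavaRDD<CellBean> cells = rdd.map(entry -> {
    CellBean b = new CellBean();
    b.setRow(entry._1().getRow().toString());
    b.setValue(new String(entry._2().get()));
    return b;
});
sqlContext.applySchema(cells, CellBean.class).registerTempTable("cells");
JavaSchemaRDD results = sqlContext.sql("SELECT row FROM cells WHERE value = 'x'");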