Re: clashing hbase queries

2016-10-19 Thread Pat Ferrel
Status: clashing HBase queries were indeed the problem. I went back to an older version of the template, which did not schedule the 2 tasks to run in parallel, and things work fine. That makes sense, but the DAG is a mysterious thing, so we need a way to serialize HBase access to stop it from being
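The idea of serializing HBase access can be sketched outside Spark with plain Scala `Future`s. Two reads started eagerly may overlap. Chaining them in a `for` comprehension starts the second read only after the first has completed. This is an analogy under stated assumptions, not PredictionIO or Spark code. `SerializedReads` and `fakeRead` are hypothetical stand-ins for the two `PEventStore` reads.

```scala
import java.util.concurrent.atomic.AtomicInteger
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

// Hypothetical illustration: serialize two reads by chaining Futures
// instead of starting both eagerly.
object SerializedReads {
  implicit val ec: ExecutionContext = ExecutionContext.global

  private val inFlight = new AtomicInteger(0) // reads currently running
  private val maxSeen = new AtomicInteger(0)  // highest overlap observed

  // Hypothetical stand-in for one HBase-backed read. It records how many
  // reads are running at the same time, then sleeps to simulate a scan.
  def fakeRead(name: String): Future[String] = Future {
    val now = inFlight.incrementAndGet()
    maxSeen.accumulateAndGet(now, (a: Int, b: Int) => math.max(a, b))
    Thread.sleep(50)
    inFlight.decrementAndGet()
    name
  }

  def maxConcurrent: Int = maxSeen.get

  // Both reads are started immediately, so they may overlap.
  def parallel(): (String, String) = {
    val a = fakeRead("items")
    val b = fakeRead("users")
    Await.result(a.zip(b), 5.seconds)
  }

  // The second read is created inside flatMap, so it only starts
  // after the first read has finished.
  def serialized(): (String, String) = {
    val both = for {
      a <- fakeRead("items")
      b <- fakeRead("users")
    } yield (a, b)
    Await.result(both, 5.seconds)
  }
}
```

Calling only `SerializedReads.serialized()` keeps `maxConcurrent` at 1. In Spark itself, one commonly used analogue is to `persist()` the first RDD and force it with an action such as `count()` before the second read is defined. Whether that helps here depends on how the template's DAG schedules the two closures, which this thread does not settle.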

Re: clashing hbase queries

2016-10-15 Thread Pat Ferrel
I may have been on the wrong track with the 2-parallel-tasks idea, which is a problem. The typical use with Spark is to get all the data out of HBase and work on it as RDDs, but getting it out may cause parallel tasks to access HBase at the same time. There must be a way to serialize the execution of

Re: clashing hbase queries

2016-10-13 Thread Andrew Purtell
This sounds like hotspotting. Ideally the workload over the keyspace can be better distributed, which is another avenue of attack: partitioning, keying strategy.

> On Oct 13, 2016, at 6:10 PM, Pat Ferrel wrote:
>
> The DAG for a template just happens to schedule 2 tasks that do something
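The "keying strategy" suggestion can be illustrated with row-key salting, a common HBase technique that is not itself proposed in this thread. A small hash-derived prefix spreads keys that would otherwise sort next to each other across several regions. This is a minimal Scala sketch under assumptions: `SaltedKeys`, `NumBuckets` (8), and the `"NN-key"` format are hypothetical choices for illustration.

```scala
// Hypothetical helper: prefix each row key with a salt bucket so that
// adjacent keys spread across regions instead of all landing on the
// same region server (hotspotting).
object SaltedKeys {
  // Number of salt buckets; an assumption, usually sized to the cluster.
  val NumBuckets: Int = 8

  // Deterministic bucket in [0, buckets), derived from the key's hash.
  def saltOf(key: String, buckets: Int = NumBuckets): Int =
    Math.floorMod(key.hashCode, buckets)

  // Salted key such as "03-item-42": zero-padded bucket, dash, original key.
  def saltedRowKey(key: String, buckets: Int = NumBuckets): String =
    f"${saltOf(key, buckets)}%02d-$key"

  // A full read must fan out over every bucket prefix and merge the results.
  def scanPrefixes(buckets: Int = NumBuckets): Seq[String] =
    (0 until buckets).map(b => f"$b%02d-")
}
```

The trade-off is on the read side. A scan over a logical key range now becomes one scan per bucket prefix, whose results must be merged. Point lookups stay cheap because the salt can be recomputed from the original key.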

clashing hbase queries

2016-10-13 Thread Pat Ferrel
The DAG for a template just happens to schedule 2 tasks that do something like this:

    val fieldsRDD: RDD[(ItemID, PropertyMap)] = PEventStore.aggregateProperties(
      appName = dsp.appName,
      entityType = "item")(sc)

to execute in parallel. The PEventStore calls from 2 separate closures start hitti