Re: Performance Benchmark Hbase vs Cassandra
You should think about using YCSB and writing an adapter for Spark perf tests against these databases, if one doesn't already exist. See here: https://github.com/brianfrankcooper/YCSB

Sent from my iPhone

On Jun 29, 2017, at 7:33 PM, Raj, Deepu wrote:
> Hi Team,
>
> I want to do a performance benchmark for some specific use cases with
> Spark --> HBase and Spark --> Cassandra.
>
> Can anyone provide inputs:
>
> 1. Scenarios / parameters to monitor?
> 2. Any automation tool to make this work?
> 3. Any previous learnings / blogs / environment setup?
>
> Thanks,
> Deepu
Re: Performance Benchmark Hbase vs Cassandra
For Cassandra, I found: https://www.instaclustr.com/multi-data-center-sparkcassandra-benchmark-round-2/

My coworker (on vacation at the moment) was doing a benchmark with HBase. When he comes back, the results can be published.

Note: it is hard to find comparison results with the same setup (hardware, number of nodes, etc.).

On Thu, Jun 29, 2017 at 7:33 PM, Raj, Deepu wrote:
> Hi Team,
>
> I want to do a performance benchmark for some specific use cases with
> Spark --> HBase and Spark --> Cassandra.
>
> Can anyone provide inputs:
>
> 1. Scenarios / parameters to monitor?
> 2. Any automation tool to make this work?
> 3. Any previous learnings / blogs / environment setup?
>
> Thanks,
> Deepu
Performance Benchmark Hbase vs Cassandra
Hi Team,

I want to do a performance benchmark for some specific use cases with Spark --> HBase and Spark --> Cassandra.

Can anyone provide inputs:

1. Scenarios / parameters to monitor?
2. Any automation tool to make this work?
3. Any previous learnings / blogs / environment setup?

Thanks,
Deepu
RE: Spark Hbase Connector
Thanks Weiqing / Ted

From: Weiqing Yang [mailto:yangweiqing...@gmail.com]
Sent: Friday, 30 June 2017 10:34 AM
To: Ted Yu
Cc: Raj, Deepu; dev@spark.apache.org
Subject: Re: Spark Hbase Connector

https://github.com/hortonworks-spark/shc/releases (v1.x.x-2.1 for Spark 2.1)
https://github.com/hortonworks-spark/shc/tree/branch-2.1 (for Spark 2.1)

On Thu, Jun 29, 2017 at 4:36 PM, Ted Yu wrote:
> Please take a look at HBASE-16179 (work in progress).
>
> On Thu, Jun 29, 2017 at 4:30 PM, Raj, Deepu wrote:
>> Hi Team,
>>
>> Is there a stable Spark HBase connector for Spark 2.0?
>>
>> Thanks,
>> Deepu Raj
Re: Spark Hbase Connector
https://github.com/hortonworks-spark/shc/releases (v1.x.x-2.1 for Spark 2.1)
https://github.com/hortonworks-spark/shc/tree/branch-2.1 (for Spark 2.1)

On Thu, Jun 29, 2017 at 4:36 PM, Ted Yu wrote:
> Please take a look at HBASE-16179 (work in progress).
>
> On Thu, Jun 29, 2017 at 4:30 PM, Raj, Deepu wrote:
>> Hi Team,
>>
>> Is there a stable Spark HBase connector for Spark 2.0?
>>
>> Thanks,
>> Deepu Raj
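For anyone landing on this thread later: below is a minimal read-path sketch using SHC's catalog API, assuming the shc-core artifact is on the classpath. The catalog JSON layout and the DataSource format string follow the SHC README; the table name, column family, and column names here are hypothetical, and the snippet needs a running Spark + HBase environment to actually execute.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

    object ShcReadSketch {
      // Hypothetical catalog mapping an HBase table "person" onto a DataFrame
      // schema: the row key becomes column "id", and info:name becomes "name".
      val catalog: String =
        """{
          |  "table":   {"namespace": "default", "name": "person"},
          |  "rowkey":  "key",
          |  "columns": {
          |    "id":   {"cf": "rowkey", "col": "key",  "type": "string"},
          |    "name": {"cf": "info",   "col": "name", "type": "string"}
          |  }
          |}""".stripMargin

      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("shc-read").getOrCreate()
        // Hand the catalog to the SHC DataSource and load the table as a DataFrame.
        val df = spark.read
          .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
          .format("org.apache.spark.sql.execution.datasources.hbase")
          .load()
        df.show()
      }
    }

Writes go through the same catalog via df.write with the same format string, per the SHC README.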
Re: PlanLater not being optimized out of Query Plan
Figured it out: it was in my Exec node. I hadn't defined it as a case class (just a normal class) and had left stubs in for the Product trait methods. This led to some... unwanted behaviors.

On Thu, Jun 29, 2017 at 4:26 PM Russell Spitzer wrote:
> I've been writing some toy experimental strategies which end up adding
> UnaryExec nodes to the plan. For some reason, though, my "PlanLater" nodes
> end up being ignored and remain in the final physical plan. Is there
> anything in general that I might be missing?
>
> I'm doing my sample work on 2.1.x and adding the strategy through
> experimentalMethods.
>
> Here is an abbreviated code sample:
>
> class CassandraDirectJoinStrategy extends Strategy with Serializable {
>   import CassandraDirectJoinStrategy._
>
>   override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
>     case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right)
>         if validJoinBranch(left, leftKeys) || validJoinBranch(right, rightKeys) =>
>
>       // TODO Check which side should be the target
>       val (otherBranch, joinTargetBranch, joinKeys, buildType) =
>         if (validJoinBranch(left, leftKeys)) {
>           (right, left, leftKeys, BuildLeft)
>         } else {
>           (left, right, rightKeys, BuildRight)
>         }
>
>       logDebug(s"Performing Direct Cassandra Join against $joinTargetBranch")
>
>       new CassandraDirectJoinExec(
>         leftKeys,
>         rightKeys,
>         joinType,
>         buildType,
>         condition,
>         planLater(otherBranch),
>         joinTargetBranch) :: Nil
>
>     case _ => Nil
>   }
> }
>
> == Parsed Logical Plan ==
> Join Inner, (k#267 = k#270)
> :- LocalRelation [k#267, v#268]
> +- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97
>
> == Analyzed Logical Plan ==
> k: int, v: int, k: int, v: int
> Join Inner, (k#267 = k#270)
> :- LocalRelation [k#267, v#268]
> +- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97
>
> == Optimized Logical Plan ==
> Join Inner, (k#267 = k#270)
> :- LocalRelation [k#267, v#268]
> +- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97
>
> == Physical Plan ==
> CassandraDirectJoin
> +- PlanLater LocalRelation [k#267, v#268]
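To illustrate why the case class mattered: Catalyst's tree-transformation machinery (which is what replaces PlanLater placeholders when the physical plan is finalized) walks a node's Product elements, so stubbed Product methods hide the node's children from it. Below is a minimal plain-Scala sketch of that failure mode, with no Spark dependency; the class and method names are hypothetical, and it only imitates the "discover children via productIterator" pattern rather than Catalyst itself.

```scala
// A plain class with stubbed Product methods, like the original broken Exec:
// productArity lies about the number of fields, so anything iterating the
// product elements sees an empty node and never visits the child.
class BadNode(val child: String) extends Product with Serializable {
  def canEqual(that: Any): Boolean = that.isInstanceOf[BadNode]
  def productArity: Int = 0 // stub: hides the `child` field
  def productElement(n: Int): Any =
    throw new IndexOutOfBoundsException(n.toString)
}

// The case class equivalent: the compiler generates a correct Product
// implementation, so the child is visible to product-based tree walks.
case class GoodNode(child: String)

object ProductDemo {
  // Imitates a tree transform discovering children through productIterator,
  // the way Catalyst copies and rewrites plan nodes.
  def visibleChildren(p: Product): Seq[Any] = p.productIterator.toSeq

  def main(args: Array[String]): Unit = {
    println(visibleChildren(new BadNode("planLater"))) // empty: child hidden
    println(visibleChildren(GoodNode("planLater")))    // child visible
  }
}
```

With the stubbed version, a placeholder child like PlanLater is simply never found, which matches the symptom in the physical plan above.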
Spark Hbase Connector
Hi Team,

Is there a stable Spark HBase connector for Spark 2.0?

Thanks,
Deepu Raj
PlanLater not being optimized out of Query Plan
I've been writing some toy experimental strategies which end up adding UnaryExec nodes to the plan. For some reason, though, my "PlanLater" nodes end up being ignored and remain in the final physical plan. Is there anything in general that I might be missing?

I'm doing my sample work on 2.1.x and adding the strategy through experimentalMethods.

Here is an abbreviated code sample:

class CassandraDirectJoinStrategy extends Strategy with Serializable {
  import CassandraDirectJoinStrategy._

  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right)
        if validJoinBranch(left, leftKeys) || validJoinBranch(right, rightKeys) =>

      // TODO Check which side should be the target
      val (otherBranch, joinTargetBranch, joinKeys, buildType) =
        if (validJoinBranch(left, leftKeys)) {
          (right, left, leftKeys, BuildLeft)
        } else {
          (left, right, rightKeys, BuildRight)
        }

      logDebug(s"Performing Direct Cassandra Join against $joinTargetBranch")

      new CassandraDirectJoinExec(
        leftKeys,
        rightKeys,
        joinType,
        buildType,
        condition,
        planLater(otherBranch),
        joinTargetBranch) :: Nil

    case _ => Nil
  }
}

== Parsed Logical Plan ==
Join Inner, (k#267 = k#270)
:- LocalRelation [k#267, v#268]
+- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97

== Analyzed Logical Plan ==
k: int, v: int, k: int, v: int
Join Inner, (k#267 = k#270)
:- LocalRelation [k#267, v#268]
+- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97

== Optimized Logical Plan ==
Join Inner, (k#267 = k#270)
:- LocalRelation [k#267, v#268]
+- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97

== Physical Plan ==
CassandraDirectJoin
+- PlanLater LocalRelation [k#267, v#268]