Re: Performance Benchmark Hbase vs Cassandra

2017-06-29 Thread Saikat Kanjilal
You should think about using ycsb and write an adapter for spark perf tests 
against these databases if it doesn't already exist.  See here:  
https://github.com/brianfrankcooper/YCSB

Sent from my iPhone

On Jun 29, 2017, at 7:33 PM, Raj, Deepu wrote:

Hi Team,

I want to do a performance benchmark for a specific use case with Spark --> HBase and Spark --> Cassandra.

Can anyone provide inputs on:

1.   Scenarios / Parameters to monitor?

2.   Any automation tool to make this work?

3.   Any previous learnings / blogs / environment setup?

Thanks,
Deepu
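The scenarios above could be timed with a small harness along these lines. This is an illustrative sketch, not an existing tool: run each backend's read path several times and compare medians, since single-run numbers are noisy. The `BenchHarness` name and the placeholder body are made up for illustration.

```scala
// Illustrative only: a tiny timing harness to wrap around each backend's
// read path (e.g. spark.read.format(...).load().count()). Pure Scala.
object BenchHarness {
  /** Run `body` `runs` times; return elapsed milliseconds per run. */
  def time(runs: Int)(body: => Unit): Seq[Double] =
    (1 to runs).map { _ =>
      val t0 = System.nanoTime()
      body
      (System.nanoTime() - t0) / 1e6
    }

  def median(xs: Seq[Double]): Double = {
    val s = xs.sorted
    if (s.size % 2 == 1) s(s.size / 2)
    else (s(s.size / 2 - 1) + s(s.size / 2)) / 2.0
  }

  def main(args: Array[String]): Unit = {
    // Substitute a real Spark action here for HBase and for Cassandra,
    // and compare medians rather than single runs.
    val samples = time(runs = 5) { Thread.sleep(10) }
    println(f"median: ${median(samples)}%.1f ms over ${samples.size} runs")
  }
}
```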


Re: Performance Benchmark Hbase vs Cassandra

2017-06-29 Thread Ted Yu
For Cassandra, I found:

https://www.instaclustr.com/multi-data-center-sparkcassandra-benchmark-round-2/

My coworker (on vacation at the moment) was doing a benchmark with HBase.
When he comes back, the results can be published.

Note: it is hard to find comparison results with the same setup (hardware,
number of nodes, etc.).

On Thu, Jun 29, 2017 at 7:33 PM, Raj, Deepu wrote:

> Hi Team,
>
>
>
> I want to do a performance benchmark for a specific use case with
> Spark --> HBase and Spark --> Cassandra.
>
>
>
> Can anyone provide inputs on:
>
> 1.   Scenarios / Parameters to monitor?
>
> 2.   Any automation tool to make this work?
>
> 3.   Any previous learnings / blogs / environment setup?
>
>
>
> Thanks,
>
> Deepu
>


Performance Benchmark Hbase vs Cassandra

2017-06-29 Thread Raj, Deepu
Hi Team,

I want to do a performance benchmark for a specific use case with Spark --> HBase and Spark --> Cassandra.

Can anyone provide inputs on:

1.   Scenarios / Parameters to monitor?

2.   Any automation tool to make this work?

3.   Any previous learnings / blogs / environment setup?

Thanks,
Deepu


RE: Spark Hbase Connector

2017-06-29 Thread Raj, Deepu
Thanks Weiqing / Ted

From: Weiqing Yang [mailto:yangweiqing...@gmail.com]
Sent: Friday, 30 June 2017 10:34 AM
To: Ted Yu 
Cc: Raj, Deepu ; dev@spark.apache.org
Subject: Re: Spark Hbase Connector

https://github.com/hortonworks-spark/shc/releases (v1.x.x-2.1 for Spark 2.1)
https://github.com/hortonworks-spark/shc/tree/branch-2.1 (for Spark 2.1)

On Thu, Jun 29, 2017 at 4:36 PM, Ted Yu wrote:
Please take a look at HBASE-16179 (work in progress).

On Thu, Jun 29, 2017 at 4:30 PM, Raj, Deepu wrote:
Hi Team,

Is there a stable Spark HBase connector for Spark 2.0?

Thanks,
Deepu Raj





Re: Spark Hbase Connector

2017-06-29 Thread Weiqing Yang
https://github.com/hortonworks-spark/shc/releases (v1.x.x-2.1 for Spark 2.1)
https://github.com/hortonworks-spark/shc/tree/branch-2.1 (for Spark 2.1)
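For reference, reading an HBase table through SHC generally looks like the sketch below. The catalog JSON and the table/column names ("t1", "cf1", etc.) are made up for illustration; the SHC README documents the exact catalog schema.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.execution.datasources.hbase.HBaseTableCatalog

object ShcReadSketch {
  // Hypothetical catalog: maps an HBase table "t1" onto a DataFrame schema.
  val catalog: String =
    s"""{
       |  "table": {"namespace": "default", "name": "t1"},
       |  "rowkey": "key",
       |  "columns": {
       |    "k": {"cf": "rowkey", "col": "key", "type": "int"},
       |    "v": {"cf": "cf1", "col": "v", "type": "int"}
       |  }
       |}""".stripMargin

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("shc-read").getOrCreate()
    // Load the HBase table as a DataFrame via the SHC data source.
    val df = spark.read
      .options(Map(HBaseTableCatalog.tableCatalog -> catalog))
      .format("org.apache.spark.sql.execution.datasources.hbase")
      .load()
    df.show()
  }
}
```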

On Thu, Jun 29, 2017 at 4:36 PM, Ted Yu wrote:

> Please take a look at HBASE-16179 (work in progress).
>
> On Thu, Jun 29, 2017 at 4:30 PM, Raj, Deepu wrote:
>
>> Hi Team,
>>
>>
>>
>> Is there a stable Spark HBase connector for Spark 2.0?
>>
>>
>>
>> Thanks,
>>
>> Deepu Raj
>>
>>
>>
>
>


Re: PlanLater not being optimized out of Query Plan

2017-06-29 Thread Russell Spitzer
Figured it out: it was in my Exec. I hadn't defined it as a case class
(just a normal class) and had left stubs in for the Product trait methods.
This led to some... unwanted behaviors.
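For anyone hitting the same thing: Catalyst's TreeNode machinery relies on a node's constructor parameters / Product elements to find and replace its children, so an exec node with stubbed-out Product methods effectively hides its PlanLater child from the prepare-for-execution pass. A minimal sketch of the case-class shape that works (heavily simplified; the single-child signature here is illustrative, not the real CassandraDirectJoinExec):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.expressions.Attribute
import org.apache.spark.sql.execution.{SparkPlan, UnaryExecNode}

// Because this is a case class, Catalyst can walk its product elements,
// see `child`, and substitute the real plan for a planLater placeholder.
case class CassandraDirectJoinExecSketch(child: SparkPlan) extends UnaryExecNode {
  override def output: Seq[Attribute] = child.output

  // Placeholder execution: just pass the child's rows through.
  override protected def doExecute(): RDD[InternalRow] = child.execute()
}
```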

On Thu, Jun 29, 2017 at 4:26 PM Russell Spitzer wrote:

> I've been writing some toy experimental strategies which end up adding
> UnaryExec nodes to the plan. For some reason, though, my "PlanLater" nodes
> never get planned and end up in the final physical plan. Is there
> anything in general that I might be missing?
>
> I'm doing my sample work on 2.1.X and adding the strategy through
> experimentalMethods
>
> Here is an abbreviated code sample
>
> class CassandraDirectJoinStrategy extends Strategy with Serializable {
>   import CassandraDirectJoinStrategy._
>
>   override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
> case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right)
>   if (validJoinBranch(left, leftKeys) || validJoinBranch(right, rightKeys)) =>
>
>   //TODO Check which side should be the target
>   val (otherBranch, joinTargetBranch, joinKeys, buildType) = {
> if (validJoinBranch(left, leftKeys)){
>   (right, left, leftKeys, BuildLeft)
> } else {
>   (left, right, rightKeys, BuildRight)
> }
>   }
>
>   logDebug(s"Performing Direct Cassandra Join against $joinTargetBranch")
>
>   new CassandraDirectJoinExec(
> leftKeys,
> rightKeys,
> joinType,
> buildType,
> condition,
> planLater(otherBranch),
> joinTargetBranch) :: Nil
> case _ => Nil
>   }
> }
>
> == Parsed Logical Plan ==
> Join Inner, (k#267 = k#270)
> :- LocalRelation [k#267, v#268]
> +- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97
>
> == Analyzed Logical Plan ==
> k: int, v: int, k: int, v: int
> Join Inner, (k#267 = k#270)
> :- LocalRelation [k#267, v#268]
> +- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97
>
> == Optimized Logical Plan ==
> Join Inner, (k#267 = k#270)
> :- LocalRelation [k#267, v#268]
> +- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97
>
> == Physical Plan ==
> CassandraDirectJoin
> +- PlanLater LocalRelation [k#267, v#268]
>
>


Spark Hbase Connector

2017-06-29 Thread Raj, Deepu
Hi Team,

Is there a stable Spark HBase connector for Spark 2.0?

Thanks,
Deepu Raj



PlanLater not being optimized out of Query Plan

2017-06-29 Thread Russell Spitzer
I've been writing some toy experimental strategies which end up adding
UnaryExec nodes to the plan. For some reason, though, my "PlanLater" nodes
never get planned and end up in the final physical plan. Is there
anything in general that I might be missing?

I'm doing my sample work on 2.1.X and adding the strategy through
experimentalMethods

Here is an abbreviated code sample

class CassandraDirectJoinStrategy extends Strategy with Serializable {
  import CassandraDirectJoinStrategy._

  override def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
    case ExtractEquiJoinKeys(joinType, leftKeys, rightKeys, condition, left, right)
      if (validJoinBranch(left, leftKeys) || validJoinBranch(right, rightKeys)) =>

  //TODO Check which side should be the target
  val (otherBranch, joinTargetBranch, joinKeys, buildType) = {
if (validJoinBranch(left, leftKeys)){
  (right, left, leftKeys, BuildLeft)
} else {
  (left, right, rightKeys, BuildRight)
}
  }

  logDebug(s"Performing Direct Cassandra Join against $joinTargetBranch")

  new CassandraDirectJoinExec(
leftKeys,
rightKeys,
joinType,
buildType,
condition,
planLater(otherBranch),
joinTargetBranch) :: Nil
case _ => Nil
  }
}

== Parsed Logical Plan ==
Join Inner, (k#267 = k#270)
:- LocalRelation [k#267, v#268]
+- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97

== Analyzed Logical Plan ==
k: int, v: int, k: int, v: int
Join Inner, (k#267 = k#270)
:- LocalRelation [k#267, v#268]
+- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97

== Optimized Logical Plan ==
Join Inner, (k#267 = k#270)
:- LocalRelation [k#267, v#268]
+- Relation[k#270,v#271] org.apache.spark.sql.cassandra.CassandraSourceRelation@1237db97

== Physical Plan ==
CassandraDirectJoin
+- PlanLater LocalRelation [k#267, v#268]
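For completeness, a strategy like the one above is attached through ExperimentalMethods before running the query. A sketch, assuming `spark` is the active SparkSession:

```scala
// Register the custom strategy; the planner consults extraStrategies
// before the built-in ones, so matching joins hit this strategy first.
spark.experimental.extraStrategies = Seq(new CassandraDirectJoinStrategy)
```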