[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2018-04-05 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427729#comment-16427729
 ] 

stack commented on HBASE-13992:
---

This go reverted from branch-2 via another issue.

> Integrate SparkOnHBase into HBase
> -
>
> Key: HBASE-13992
> URL: https://issues.apache.org/jira/browse/HBASE-13992
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Theodore michael Malaska
>Assignee: Theodore michael Malaska
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
> HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
> HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
> HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
> HBASE-13992.patch.5
>
>
> This Jira is to ask if SparkOnHBase can find a home in side HBase core.
> Here is the github: 
> https://github.com/cloudera-labs/SparkOnHBase
> I am the core author of this project and the license is Apache 2.0
> A blog explaining this project is here
> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
> A spark Streaming example is here
> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
> A real customer using this in produce is blogged here
> http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
> Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648186#comment-14648186
 ] 

Steve Loughran commented on HBASE-13992:


Coverage is an odd metric anyway, because as well as code there's state 
coverage : ipv6, windows, timezone=GMT0, locale=turkish, which can break things 
even in code which nominally had 100%. Having tests which generate failure 
conditions (done here) with test setups that explore the configuration space 
are about the best you can get.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14645127#comment-14645127
 ] 

Hudson commented on HBASE-13992:


FAILURE: Integrated in HBase-TRUNK #6683 (See 
[https://builds.apache.org/job/HBase-TRUNK/6683/])
HBASE-13992 Integrate SparkOnHBase into HBase (busbey: rev 
30f7d127c3974cff9e3058e13d7c50805ee4482f)
* 
hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala
* 
hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkGetExample.java
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseDistributedScanExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseMapPartitionExample.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseStreamingBulkPutExample.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExample.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/JavaHBaseContext.scala
* dev-support/test-patch.properties
* 
hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctionsSuite.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkGetExample.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkDeleteExample.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkGetExample.scala
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctions.scala
* 
hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkDeleteExample.java
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseForeachPartitionExample.scala
* 
hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseDistributedScan.java
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutTimestampExample.scala
* 
hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkPutExample.java
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkPutExample.scala
* 
hbase-spark/src/test/java/org/apache/hadoop/hbase/spark/JavaHBaseContextSuite.java
* 
hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctionsSuite.scala
* 
hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseMapGetPutExample.java
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
* hbase-spark/pom.xml
* pom.xml
* 
hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseStreamingBulkPutExample.java
* 
hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkDeleteExample.scala


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644461#comment-14644461
 ] 

Sean Busbey commented on HBASE-13992:
-

Looks good to me. As previously agreed I'll amend to up the javadoc warn count.

Things fine by you [~ste...@apache.org]?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644580#comment-14644580
 ] 

Steve Loughran commented on HBASE-13992:


LGTM : I only worry about testability, and that's a good start. More tests will 
no doubt come over time ... something in Bigtop would be good for the 
integration

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644910#comment-14644910
 ] 

Andrew Purtell commented on HBASE-13992:


Thanks so much for the contribution Ted M, this is a great start! 

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644914#comment-14644914
 ] 

Sean Busbey commented on HBASE-13992:
-

If folks have a target, e.g. some coverage %, I'm happy to add another jira to 
the list of gates for HBASE-14160

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644975#comment-14644975
 ] 

Andrew Purtell commented on HBASE-13992:


Not opposed to a coverage target but I don't think we need one, at least not 
outside of a larger effort to up coverage everywhere. I think Steve's 
suggestion to have tests covering the likely runtime failure scenarios would be 
a good start.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644907#comment-14644907
 ] 

Andrew Purtell commented on HBASE-13992:


I agree with [~steve_l] we should have more tests. Let's follow up on another 
issue before this gets into a release. 

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-27 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643213#comment-14643213
 ] 

Ted Malaska commented on HBASE-13992:
-

Thanks Sean.  Let me know if there r any more changes needed for this patch.  

Also if this meets the min required to go in it will allow me to multi thread.

I would like to have three thirds of development:
1. More testing
2. Dataframes
3. Bulk load



 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.5.patch, HBASE-13992.6.patch, HBASE-13992.7.patch, 
 HBASE-13992.8.patch, HBASE-13992.9.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-27 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643163#comment-14643163
 ] 

Sean Busbey commented on HBASE-13992:
-

all those zombie tests look unrelated and they pass locally.

How do the additions look, [~ste...@apache.org]?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.5.patch, HBASE-13992.6.patch, HBASE-13992.7.patch, 
 HBASE-13992.8.patch, HBASE-13992.9.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-27 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643354#comment-14643354
 ] 

Sean Busbey commented on HBASE-13992:
-

{quote}
its probably best to put artifact versions in the root pom.xml, not the spark 
one, for one single place for dependencies ... this will matter if 1 scala 
module goes in.
{quote}

We've had a couple of folks (myself, Andrew) request all spark and scala 
related bits be isolated to this module specifically because we don't have any 
other scala modules yet and we want to minimize the reach of those dependencies 
until we do. A top-level Scala version would require us to make a determination 
now about handling Scala version incompatibilities, e.g. with different module 
layouts.

{quote}
{{HBaseDStreamFunctionsSuite.scala}} has the wrong assumption.
{{assert(foo5.equals(bar), foo4 + !=bar)}}
Scalatest lets you use assertResult instead, for an auto-generated message
{code}assertResult(bar) { foo5 } {code}
And you can use == for a slightly less informative error message, but one which 
still includes the values on either side
{quote}

sounds like a reasonable fixup.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.5.patch, HBASE-13992.6.patch, HBASE-13992.7.patch, 
 HBASE-13992.8.patch, HBASE-13992.9.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-27 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643365#comment-14643365
 ] 

Ted Malaska commented on HBASE-13992:
-

Ok I will make the two code changes and get a patch to use guys soon.

Thanks for the review

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.5.patch, HBASE-13992.6.patch, HBASE-13992.7.patch, 
 HBASE-13992.8.patch, HBASE-13992.9.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643322#comment-14643322
 ] 

Steve Loughran commented on HBASE-13992:


New tests look good.

# its probably best to put artifact versions in the root pom.xml, not the spark 
one, for one single place for dependencies ... this will matter if 1 scala 
module goes in.

# {{HBaseDStreamFunctionsSuite.scala }} has the wrong assumption. 
{code}
{{assert(foo5.equals(bar), foo4 + !=bar)}}
{code}

Scalatest lets you use assertResult instead, for an auto-generated message

{code}
assertResult(bar) { foo5 } 
{code}

And you can use {{==}} for a slightly less informative error message, but one 
which still includes the values on either side




 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.5.patch, HBASE-13992.6.patch, HBASE-13992.7.patch, 
 HBASE-13992.8.patch, HBASE-13992.9.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-27 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643715#comment-14643715
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12747422/HBASE-13992.12.patch
  against master branch at commit 1566ec5fdc751897b2e931b2b0920c6d503c85ce.
  ATTACHMENT ID: 12747422

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 26 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScanBase.testScan(TestTableInputFormatScanBase.java:244)
at 
org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan2.testScanYYYToEmpty(TestTableInputFormatScan2.java:96)
at 
org.apache.hadoop.hbase.mapreduce.TestCellCounter.testCellCounterStartTimeRange(TestCellCounter.java:137)
at 
org.apache.hadoop.hbase.mapreduce.TestImportExport.testWithFilter(TestImportExport.java:447)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14901//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14901//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14901//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14901//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14901//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-25 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641429#comment-14641429
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12747159/HBASE-13992.11.patch
  against master branch at commit dad4cad30e5b0c69694ee90908ad8e74c592d821.
  ATTACHMENT ID: 12747159

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 26 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot.testExportFileSystemState(TestMobExportSnapshot.java:285)
at 
org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot.testExportFileSystemState(TestMobExportSnapshot.java:259)
at 
org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot.testExportWithTargetName(TestMobExportSnapshot.java:217)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:288)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:262)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testEmptyExportFileSystemState(TestExportSnapshot.java:206)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:288)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:262)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testSnapshotWithRefsExportFileSystemState(TestExportSnapshot.java:256)
at 
org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testSnapshotWithRefsExportFileSystemState(TestExportSnapshot.java:236)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14891//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14891//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14891//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14891//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14891//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.5.patch, HBASE-13992.6.patch, HBASE-13992.7.patch, 
 HBASE-13992.8.patch, HBASE-13992.9.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 

[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-24 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639984#comment-14639984
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12746933/HBASE-13992.10.patch
  against master branch at commit dad4cad30e5b0c69694ee90908ad8e74c592d821.
  ATTACHMENT ID: 12746933

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 26 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 3 zombie test(s):   
at 
org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes.testDeleteFamilyLatestTimeStampWithMulipleVersionsWithoutCellVisibilityInPuts(TestVisibilityLabelsWithDeletes.java:1511)
at 
org.apache.hadoop.hbase.coprocessor.TestOpenTableInCoprocessor.testCoprocessorCanCreateConnectionToRemoteTableWithCustomPool(TestOpenTableInCoprocessor.java:145)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14876//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14876//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14876//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14876//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14876//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.5.patch, 
 HBASE-13992.6.patch, HBASE-13992.7.patch, HBASE-13992.8.patch, 
 HBASE-13992.9.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-24 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14641228#comment-14641228
 ] 

Sean Busbey commented on HBASE-13992:
-

v10 passes for me locally now.

It looks like both of the failure cases are checking for SparkException. Could 
we check directly for TableNotFoundException and NoSuchColumnFamilyException?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.5.patch, 
 HBASE-13992.6.patch, HBASE-13992.7.patch, HBASE-13992.8.patch, 
 HBASE-13992.9.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639526#comment-14639526
 ] 

Lars Hofhansl commented on HBASE-13992:
---

Would like to meet up. Out of the country on vacation until August 11th, though.

Re: Tests. I think we can commit this, and file a sub jira to add tests. (On 
the other hand, why rush? If more tests are requested, why not add them now 
before commit?) I'm fine either way.


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639709#comment-14639709
 ] 

Sean Busbey commented on HBASE-13992:
-

{quote}
Were a project I was a committer on, I'd be mandating the failure tests, as 
they are the tests most likely to break things. As I'm not an HBase committer, 
I will leave the opinions to others. At the very least, there needs to be a 
followup JIRA for the extra tests.
{quote}

HBase works by consensus across our community, not just consensus from those 
who have been flagged as more established in the project.

Ted, let me know if you run into any trouble getting these set up or if you 
want a hand off on adding them.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639697#comment-14639697
 ] 

Steve Loughran commented on HBASE-13992:


Were a project I was a committer on, I'd be mandating the failure tests, as 
they are the tests most likely to break things. As I'm not an HBase committer, 
I will leave the opinions to others. At the very least, there needs to be a 
followup JIRA for the extra tests.

As ted notes, they should just throw the standard exceptions.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638800#comment-14638800
 ] 

Ted Malaska commented on HBASE-13992:
-

Thanks lars.  Will do there will be a lot of jiras after this one.  Also if u 
have time next week or the week after I would love to brain storm about the 
road map.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14638666#comment-14638666
 ] 

Lars Hofhansl commented on HBASE-13992:
---

+1 on V9. Thanks for your patience [~ted.m].

One more idea for a future improvement: Make BulkGet return things in exactly 
the same format as distributed scan.
Would be cool, since then one could plug ways to get data (bulk get, scan with 
filter, InputputFormat, etc) and leave all the rest of the code identical. Can 
do later.


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639734#comment-14639734
 ] 

Ted Malaska commented on HBASE-13992:
-

Hey Steve, no worries.  You r right those tests are important and I will try to 
get a patch in the next couple of hours.

But just note this is not a perfect patch.  After this there is about 4 to 6 
more patches to make this really great.  Once this patch is in more people can 
help and we can multi thread our development effort.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639471#comment-14639471
 ] 

Sean Busbey commented on HBASE-13992:
-

I agree that test handling around failures will be necessary for this to be 
robust and usable for downstream folks.

Steve, do you think we need those tests in place to land on master? Would you 
be fine with making those additions part of the gate for getting this 
backported to branch-1? The expectation is that backporting to branch-1 will 
mark when users end up interacting with this code. Similar to needing failure 
handling we'll definitely need to have user-facing docs before that happens, 
but those docs are part of the follow on work after this first jira.

I think right now the expectation is that master won't have a release before 
the end of the year and these changes in branch-1 would correspond to 1.3. 
Given that we're trying to get 1.2 out now, that would give us at least  3-4 
months for the follow on tickets.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639587#comment-14639587
 ] 

Ted Malaska commented on HBASE-13992:
-

I can add in those tests today or tomorrow.  There if nothing special about the 
Spark integration it just gives a connection to the executors.  So in these 
cases we will just see a normal HBase exception like if you did without spark.

test against non-existent database
attempt to work with a table that doesn't exist
attempt to read a column that doesn't exist

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639357#comment-14639357
 ] 

Steve Loughran commented on HBASE-13992:


There's not much in the way of tests here, in particular, not much in the way 
of generation of failure conditions and validation of outcome

Ideally, there'd be one test to generate each failure condition: the exception 
handling including those which downgrade a failure to a log message...the test 
should verify that such actions are the correct response.

At the very least, I'd recommend

# test against non-existent database
# attempt to work with a table that doesn't exist
# attempt to read a column that doesn't exist


I'd also make sure test teardown is robust, catching exceptions  downgrading 
to logs. That way, if something didn't get set up properly, the root cause of 
the failure isn't hidden by any exception generated in teardown.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639290#comment-14639290
 ] 

Sean Busbey commented on HBASE-13992:
-

I'm trying to do a final build to make sure I've got the patch applied 
correctly, and the new integration tests are failing.

{code}
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ hbase-spark ---
WARNING: -c has been deprecated and will be reused for a different (but still 
very cool) purpose in ScalaTest 2.0. Please change all uses of -c to -P.
Discovery starting.
Discovery completed in 284 milliseconds.
Run starting. Expected test count is: 11
HBaseDStreamFunctionsSuite:
HBaseContextSuite:
HBaseRDDFunctionsSuite:
2015-07-23 10:50:56.702 java[97585:6403] Unable to load realm info from 
SCDynamicStore
*** RUN ABORTED ***
  java.util.concurrent.ExecutionException: java.io.IOException: Shutting down
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:188)
  at 
org.scalatest.tools.ConcurrentDistributor.waitUntilDone(ConcurrentDistributor.scala:52)
  at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2549)
  at 
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
  at 
org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
  at 
org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
  at 
org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
  at org.scalatest.tools.Runner$.main(Runner.scala:860)
  at org.scalatest.tools.Runner.main(Runner.scala)
  ...
  Cause: java.io.IOException: Shutting down
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:232)
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:94)
  at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1040)
  at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1006)
  at 
org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:44)
  at 
org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
  at 
org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:30)
  at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
  at 
org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.run(HBaseDStreamFunctionsSuite.scala:30)
  at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
  ...
  Cause: java.lang.RuntimeException: Failed construction of Master: class 
org.apache.hadoop.hbase.master.HMasterAddress already in use
  at 
org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:143)
  at 
org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218)
  at 
org.apache.hadoop.hbase.LocalHBaseCluster.init(LocalHBaseCluster.java:154)
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214)
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:94)
  at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1040)
  at 
org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1006)
  at 
org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:44)
  at 
org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
  at 
org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:30)
  ...
  Cause: java.net.BindException: Port in use: 0.0.0.0:16010
  at org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1013)
  at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:949)
  at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91)
  at 
org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1789)
  at 
org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:604)
  at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:363)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  ...
  Cause: java.net.BindException: Address already in use
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:444)
  at sun.nio.ch.Net.bind(Net.java:436)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
  at 

[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-22 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637463#comment-14637463
 ] 

Ted Malaska commented on HBASE-13992:
-

Hold on that last patch.  I have a couple more changes left.  give me 10 minutes

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-22 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14636943#comment-14636943
 ] 

Ted Malaska commented on HBASE-13992:
-

So today from 3 to 6 I will have time to work on this.  So if this jira is 
closed by 3 I will make a new one for the HBaseRDD.collect other wise I will 
add an additional patch to this jira.

But after that next patch please help me close this out.  I do have a number of 
jiras I would like to start to add additional functionality to this original 
patch

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637720#comment-14637720
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12746620/HBASE-13992.9.patch
  against master branch at commit 20739542fdca185eb857813bf269d6262c11b652.
  ATTACHMENT ID: 12746620

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 26 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 1 zombie test(s):   
at 
org.apache.camel.test.spring.CamelSpringTestSupport.doCreateApplicationContext(CamelSpringTestSupport.java:90)
at 
org.apache.camel.test.spring.CamelSpringTestSupport.doPreSetup(CamelSpringTestSupport.java:80)
at 
org.apache.camel.test.junit4.CamelTestSupport.setUp(CamelTestSupport.java:237)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14864//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14864//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14864//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14864//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14864//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-22 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637736#comment-14637736
 ] 

Ted Malaska commented on HBASE-13992:
-

Ok patch 9 passed the build.  It has lar's hbaserdd.collect change.  Are we 
good for a commit?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-22 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637730#comment-14637730
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12746611/HBASE-13992.8.patch
  against master branch at commit 20739542fdca185eb857813bf269d6262c11b652.
  ATTACHMENT ID: 12746611

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 26 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1871 checkstyle errors (more than the master's current 1870 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+  private static class ScanConvertFunction implements 
FunctionTuple2ImmutableBytesWritable, Result, String {
+  private void populateTableWithMockData(Configuration conf, TableName 
tableName) throws IOException {

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.hadoop.hbase.client.TestAdmin1.testForceSplitMultiFamily(TestAdmin1.java:1021)
at 
org.apache.hadoop.hbase.client.TestReplicaWithCluster.testChangeTable(TestReplicaWithCluster.java:224)
at 
org.apache.hadoop.hbase.client.TestMetaWithReplicas.testShutdownHandling(TestMetaWithReplicas.java:141)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14863//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14863//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14863//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14863//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14863//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-22 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14637757#comment-14637757
 ] 

Sean Busbey commented on HBASE-13992:
-

I'm good with v9. I can amend to update the number of allowed javadoc warnings 
as previously discussed.

[~lhofhansl]?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635591#comment-14635591
 ] 

Andrew Purtell commented on HBASE-13992:


Latest patch lgtm 

There is a javadoc nit:

{noformat}
[INFO] Compiling 3 source files to 
/home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/hbase-spark/target/test-classes
 at 1437449685681
[WARNING] warning: Class org.apache.hadoop.mapred.MiniMRCluster not found - 
continuing with a stub.
[WARNING] one warning found
{noformat}

Not quite sure what to make of that. We could bump the expected count if 
there's no solution.



 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-21 Thread Matteo Bertozzi (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635598#comment-14635598
 ] 

Matteo Bertozzi commented on HBASE-13992:
-

{quote}Not quite sure what to make of that. We could bump the expected count if 
there's no solution.{quote}
yeah I was looking at that too, there is no use of MiniMRCluster in the patch. 
so it is probably from the hadoop-common dependency or something like that.
I'm +1 to bump the count unless [~busbey] or someone else knows how to make 
that warn go away.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-21 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635616#comment-14635616
 ] 

Sean Busbey commented on HBASE-13992:
-

+1 let's get it in and fix the warning as a follow-on. Ted, would you mind 
filing a jira for the minimrcluster warning and assigning it to me?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-21 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635813#comment-14635813
 ] 

Ted Malaska commented on HBASE-13992:
-

Oh now I understand the part that Lars is referencing.  I was think of the 
hbase rdd functions which is like what Lars is saying but there is a 
distributed scan that is not the same.

Can I do what sean said and fix it in anot her jira.

I can make and submit the jira tomorrow.  

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-21 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635823#comment-14635823
 ] 

Andrew Purtell commented on HBASE-13992:


I'm ok with getting this in to master either with or without the adjustment for 
HBaseRDD.collect for further refinement before it makes its way into a release. 
Hoping we can get more exposure of this among Spark app developers for further 
feedback. I know Ted also has a bunch of incremental improvements in mind. 

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-21 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635791#comment-14635791
 ] 

Sean Busbey commented on HBASE-13992:
-

I haven't done much day to day with Spark, so I don't know how inconvenient 
that difference is.

This is only targeting master initially. How about we add fixing the return 
from HBaseRDD.collect to the list of gatekeeper issues for pulling it back into 
branch-1?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-21 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14635783#comment-14635783
 ] 

Lars Hofhansl commented on HBASE-13992:
---

Sorry to harp the point... Is everybody OK that HBaseRDD.collect() returns 
ListTuple2byte[], ListTuple3byte[], byte[], byte[]?
Instead of perhaps ListTuple2byte[], Result?


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633537#comment-14633537
 ] 

Lars Hofhansl commented on HBASE-13992:
---

super minor nit: it's bet to have an attached patch end in .txt (which I 
prefer) or .patch, so that browser can open it directly. :)

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633547#comment-14633547
 ] 

Lars Hofhansl commented on HBASE-13992:
---

From the example:
{code}
ListTuple2byte[], ListTuple3byte[], byte[], byte[] results = 
javaRdd.collect();
{code}

Still does map byte[] - Result, and so will be different from the HadoopRDD 
would do. (unless I am missing something)


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633630#comment-14633630
 ] 

Ted Malaska commented on HBASE-13992:
-

[~lhofhansl] My bad.  That shouldn't be to hard to fix.  I will have that in 
the next patch.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633802#comment-14633802
 ] 

Ted Malaska commented on HBASE-13992:
-

[~ted_yu] thanks I will have a new patch hopefully tonight and I will name it 
with .patch at the end.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633796#comment-14633796
 ] 

Ted Yu commented on HBASE-13992:


[~ted.m]:
The patch ending in .5 was not picked up by QA bot.
Name your patch with .patch or .txt extension.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634141#comment-14634141
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12746162/HBASE-13992.5.patch
  against master branch at commit 0f614a1c44e1887ca7177b66bb6208b6e69db7e1.
  ATTACHMENT ID: 12746162

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 49 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 5 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1878 checkstyle errors (more than the master's current 1871 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the master's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+ xsi:schemaLocation=http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;
+new SparkConf().setAppName(JavaHBaseStreamingBulkPutExample  + 
tableName + : + port + : + tableName);
+configBroadcast: 
Broadcast[SerializableWritable[Configuration]],
+  private def getConf(configBroadcast: 
Broadcast[SerializableWritable[Configuration]]): Configuration = {
+   configBroadcast: 
Broadcast[SerializableWritable[Configuration]],
+ * or security issues. For instance, an Array[AnyRef] can hold any type T, 
but may lose primitive
+val sparkConf = new SparkConf().setAppName(HBaseBulkPutTimestampExample  
+ tableName +   + columnFamily)
+(Bytes.toBytes(6), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(1,
+(Bytes.toBytes(7), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(2,
+(Bytes.toBytes(8), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(3,

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14840//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633980#comment-14633980
 ] 

Ted Malaska commented on HBASE-13992:
-

[~ted_yu] I uploaded a new patch.  No code changes.  Just a new name.



 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633977#comment-14633977
 ] 

Ted Malaska commented on HBASE-13992:
-

[~lhofhansl] So maybe I read your comment wrong.

Patch 5 returns RDD[(ImmutableBytesWritable, Result)] from hbaseBulkGet

What would you like it to return?  Maybe I read it wrong but doesn't 
TableInputFormat return (ImmutableBytesWritable, Result)?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14633981#comment-14633981
 ] 

Lars Hofhansl commented on HBASE-13992:
---

No, that's right. Hmm. Why does HBaseRDD.collect() return something different? 
(I think that part confused me)

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634388#comment-14634388
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12746201/HBASE-13992.6.patch
  against master branch at commit 7ce318dd3be9df0ee1c025b4792ded0161aa2c9c.
  ATTACHMENT ID: 12746201

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 44 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+ * or security issues. For instance, an Array[AnyRef] can hold any 
type T, but may lose primitive
+  putRecord._2.foreach((putValue) = put.addColumn(putValue._1, 
putValue._2, timeStamp, putValue._3))
+ val sparkConf = new SparkConf().setAppName(HBaseBulkPutExample  + 
tableName +   + columnFamily)
+ (Bytes.toBytes(1), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(1,
+ (Bytes.toBytes(2), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(2,
+ (Bytes.toBytes(3), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(3,
+ (Bytes.toBytes(4), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(4,
+ (Bytes.toBytes(5), Array((Bytes.toBytes(columnFamily), 
Bytes.toBytes(1), Bytes.toBytes(5

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   
org.apache.hadoop.hbase.mapreduce.TestImportTSVWithOperationAttributes
  org.apache.hadoop.hbase.mapreduce.TestRowCounter

 {color:red}-1 core zombie tests{color}.  There are 5 zombie test(s):   
at 
org.apache.phoenix.hbase.index.covered.example.EndToEndCoveredIndexingIT.testMultipleTimestampsInSingleDelete(EndToEndCoveredIndexingIT.java:428)
at 
org.apache.phoenix.hbase.index.balancer.IndexLoadBalancerIT.testRandomAssignmentDuringIndexTableEnable(IndexLoadBalancerIT.java:265)

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14845//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14845//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14845//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14845//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14845//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate 

[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-20 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634491#comment-14634491
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12746246/HBASE-13992.7.patch
  against master branch at commit 7ce318dd3be9df0ee1c025b4792ded0161aa2c9c.
  ATTACHMENT ID: 12746246

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 9 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 26 javac compiler 
warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 2 
warning messages.

{color:green}+1 checkstyle{color}.  The applied patch does not increase the 
total number of checkstyle errors

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 lineLengths{color}.  The patch does not introduce lines 
longer than 100

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
 

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14849//testReport/
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14849//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14849//artifact/patchprocess/checkstyle-aggregate.html

  Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14849//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14849//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-18 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632420#comment-14632420
 ] 

Lars Hofhansl commented on HBASE-13992:
---

bq. As for the Result you do get the results but you need to convert it to 
something that can go into an RDD.

The TableInputFormat will return tuples of byte[] - Result, that's also what 
one would get when using the HadoopRDD with TableInputFormat.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-18 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14632589#comment-14632589
 ] 

Ted Malaska commented on HBASE-13992:
-

[~lhofhansl] great idea I will try to add that in to the next patch which 
should be out tomorrow.

Thanks for helping me through this

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-17 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631369#comment-14631369
 ] 

Ted Malaska commented on HBASE-13992:
-

[~lhofhansl]
* I will add the comments
* As for the Result you do get the results but you need to convert it to 
something that can go into an RDD.
* As for the memory needed.  Think of it as if you are reading a file from S3 
or HDFS.  It is the same thing here.
* Yes I tested 1.4 with -D and it seemed to work.  

And yes I'm super excited about this too, because this is only the beginning.

I will work through the weekend to get all these reviews into the next patch.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-17 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631283#comment-14631283
 ] 

Sean Busbey commented on HBASE-13992:
-

They were on review board

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631344#comment-14631344
 ] 

Lars Hofhansl commented on HBASE-13992:
---

Few nits:
* Comments in what's happening in the Scala and JavaHBase*Examples
* Might want to have a comment in *HBaseBulkDeleteExample.java explaining that 
it is probably better to use the built-in BulkDeleteEndpoint (which in turn we 
have to document better)
* Comment explaining in *HBaseDistributedScan as to why this is preferred over 
using Spark's built-in HadoopRDD with TableInputFormat or the included HBaseRDD 
(it's obvious, but hey, somebody might look at the classes and wonder why).
* I assume there's quite some heap needed to get the RDD resulting from a scan 
into a (RowKey, List[(columnFamily, columnQualifier, Value) (going by the 
article here). Can that be avoided if it is a problem? Or in other words, is 
there an easy to way to get a raw Result[] or ListResult? ... sorry if I'm 
missing something.

Very excited about having this in HBase proper.


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-17 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14631097#comment-14631097
 ] 

Lars Hofhansl commented on HBASE-13992:
---

bq. -Dspark.version=1.4.0

That's what I meant. Sorry I didn't say explicitly.

bq. thank you vary much for your reviews today

Did these happen offline?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-16 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630135#comment-14630135
 ] 

Ted Malaska commented on HBASE-13992:
-

[~lhofhansl] I would like to add 1.4 or 1.5 into another patch.  This patch is 
so large already.  My hope is to close this one out then start a couple more 
right off of this like:

1. Newer versions of Spark
2. Adding DataFrame Support
3. Documentation
4. UpdateStateBy Key that will work through HBase
5. Bulk load to HBase
6. Much More

Thanks

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-16 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630666#comment-14630666
 ] 

Ted Malaska commented on HBASE-13992:
-

Hey [~andrew.purt...@gmail.com], yes just tested -Dspark.version=1.4.0 and it 
worked fine.  I had one compile bug that I fixed.  I will send it in the next 
patch.

[~andrew.purt...@gmail.com] and [~busbey] thank you vary much for your reviews 
today.  I know it was a lot of code and I thank you for taking the time.

I will spend the next couple of days getting you a patch.  Hopefully by sunday 
night.



 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-16 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14630630#comment-14630630
 ] 

Andrew Purtell commented on HBASE-13992:


bq. can we at least add an optional build against 1.4.0 (as we do with Hadoop)?

The pom lets you do {{-Dspark.version=1.4.0}} (and also scala.version and 
scala.binary.version). Does this work for now? [~malaskat] does this compile 
with Spark 1.4? 

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3, 
 HBASE-13992.patch.4


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628028#comment-14628028
 ] 

Lars Hofhansl commented on HBASE-13992:
---

Belated +1 on having this in a hbase-spark module as part of HBase. Looking 
at the patch now.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-15 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14628041#comment-14628041
 ] 

Lars Hofhansl commented on HBASE-13992:
---

Also came across Spark version 1.3.0. Will SparkOnHBase currently not 
compile/work with 1.4.x? If not... Fine. But if it does, can we at least add an 
optional build against 1.4.0 (as we do with Hadoop)?


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch, HBASE-13992.patch.3


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-14 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14627231#comment-14627231
 ] 

Andrew Purtell commented on HBASE-13992:


bq. Should newer Spark release, such as 1.4.0, be used ?

See 
https://issues.apache.org/jira/browse/HBASE-13992?focusedCommentId=14621073page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14621073
 [~tedyu]

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-13 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625729#comment-14625729
 ] 

Sean Busbey commented on HBASE-13992:
-

Some help on the feedback from QA, since our current version is lacking some of 
the niceties of the soon-to-be version:

* to find the javac warnings, you can either run {{mvn -DSkipTests package | 
tee some_log.log}} before and after, then diff, or if you want to presume the 
new warnings are just in your new module do it after and search for lines 
starting with \[WARNING].
* to find the javadoc warnings you can either run {{mvn -DskipTests 
javadoc:javadoc | tee some_other.log}} before and after, then diff. or again 
you could just look in your module for the start of \[WARNING] Javadoc 
Warnings
* 10 of the checkstyle warnings will be the too long lines. to get the rest 
you'll need to run {{mvn -DskipTests checkstyle:aggregate}} and save the 
{{target/checkstyle-result.xml}} files to compare before/after.
* the release audit warning is the missing license header noted on the review
* the TestRemoteTable failure seems unrelated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-13 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625745#comment-14625745
 ] 

Ted Malaska commented on HBASE-13992:
-

[~busbey] will do, thank you and sorry for missing that on the first draft
[~ted_yu] Done.  Added HBase to the review board group



 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625743#comment-14625743
 ] 

Ted Yu commented on HBASE-13992:


There have been review comments on the review board while I haven't received 
notification.

Please add hbase to Groups field.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-13 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625691#comment-14625691
 ] 

Ted Yu commented on HBASE-13992:


Uploading onto reviewboard would make reviewing easier.
{code}
+spark.version1.3.0/spark.version
{code}
Should newer Spark release, such as 1.4.0, be used ?
{code}
+!-- scopetest/scope Return--
{code}
Uncomment the above ?
Please add short javadoc for the XXExample classes.
{code}
+  System.out
+  .println(JavaHBaseBulkGetExample  {master} {tableName});
{code}
Merge above two lines.

For JavaHBaseDistributedScan:
{code}
+results.size();
+  }
{code}
Did you intend to print the result size ?

For JavaHBaseMapGetPutExample, GetFunction isn't called.
{code}
+  .println(JavaHBaseBulkPutExample  {master} {host} {post} 
{tableName} {columnFamily});
{code}
post - port

For HBaseContext,
{code}
+  * serializable Configuration object
{code}
There're 3 parameters to HBaseContext. Above is one of them. Did you intend to 
provide scaladoc for all of them ?
{code}
+  def mapPartition[T, R: ClassTag](rdd: RDD[T],
{code}
Should the above method be called mapPartitions (to align with method of RDD) ?
{code}
+  def streamForeachRDD[T](dstream: DStream[T],
{code}
Should the method be called streamForeachPartition since there is foreachRDD 
method which accepts DStream already.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-13 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625702#comment-14625702
 ] 

Ted Malaska commented on HBASE-13992:
-

Hey [~ted_yu] the link for the review board is in the jira.

But here it is again.  https://reviews.apache.org/r/36457

Thank you for reviewing.

I will try to get a new cut out tomorrow

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-13 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625515#comment-14625515
 ] 

Ted Malaska commented on HBASE-13992:
-

Oh for help for the reviews.  All the magic is in HBaseContext.

Everything else is ether one of the following:
1. Examples
2. Tests
3. Implicit Scala Functions
4. Java port

Also this doesn't include the following, which will come in following patches:
1. Validation that the code will be able to accept new HBase Kerberos tickets 
given through Spark-Submit in Yarn-Cluster mode.
2. Integration with DataFrames.  This is easy to do I just wanted to separate 
it out into a different jira.
3. Better unit testing.  I'm testing every function with the HBase test 
cluster, but I'm not the best at unit test, so on a following patch I will work 
with others to add more tests.
4. More Examples.  I would like to build on common Spark Stream use cases with 
HBase.
5. Documentation.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-13 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14625643#comment-14625643
 ] 

Hadoop QA commented on HBASE-13992:
---

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12745137/HBASE-13992.patch
  against master branch at commit a3d30892b41f604ab5a62d4f612fa7c230267dfe.
  ATTACHMENT ID: 12745137

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+1 tests included{color}.  The patch appears to include 7 new 
or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all 
supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}.  The applied patch generated 42 javac compiler 
warnings (more than the master's current 20 warnings).

{color:green}+1 protoc{color}.  The applied patch does not increase the 
total number of protoc compiler warnings.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 4 
warning messages.

{color:red}-1 checkstyle{color}.  The applied patch generated 
1890 checkstyle errors (more than the master's current 1873 errors).

{color:green}+1 findbugs{color}.  The patch does not introduce any  new 
Findbugs (version 2.0.3) warnings.

{color:red}-1 release audit{color}.  The applied patch generated 1 release 
audit warnings (more than the master's current 0 warnings).

{color:red}-1 lineLengths{color}.  The patch introduces the following lines 
longer than 100:
+ xsi:schemaLocation=http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;
+argLine-Xmx1536m -XX:MaxPermSize=512m 
-XX:ReservedCodeCacheSize=512m/argLine
+  public static class CustomFunction implements 
VoidFunctionTuple2Iteratorbyte[], HConnection {
+  .println(JavaHBaseBulkPutExample  {master} {host} {post} 
{tableName} {columnFamily});
+JavaReceiverInputDStreamString javaDstream = jssc.socketTextStream(host, 
Integer.parseInt(port));
+  private def bulkMutation[T](rdd: RDD[T], tableName: TableName, f: (T) = 
Mutation, batchSize: Integer) {
+  def hbaseRDD[U: ClassTag](tableName: TableName, scan: Scan, f: 
((ImmutableBytesWritable, Result)) = U): RDD[U] = {
+TableMapReduceUtil.initTableMapperJob(tableName, scan, 
classOf[IdentityTableMapper], null, null, job)
+  list.add((CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell), 
CellUtil.cloneValue(cell)))
+configBroadcast: 
Broadcast[SerializableWritable[Configuration]],

  {color:green}+1 site{color}.  The mvn post-site goal succeeds with this patch.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.rest.client.TestRemoteTable

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14762//testReport/
Release audit warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3)warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/checkstyle-aggregate.html

Javadoc warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/patchJavadocWarnings.txt
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/14762//console

This message is automatically generated.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.patch


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-10 Thread Kostas Sakellis (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621993#comment-14621993
 ] 

Kostas Sakellis commented on HBASE-13992:
-

Just driving by and curious about
bq. gap: data frames are missing key functionality pre-1.4.
What specific features is this referring to?

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-09 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621073#comment-14621073
 ] 

Sean Busbey commented on HBASE-13992:
-

Notes from call today where [~apurtell] and I helped walk through a plan for 
this with [~ted.m].


h3. Overall Roadmap

* start with current capabilities refactored into a module with hbase packages
* initial support targets Spark 1.3
* later add RDD extension to expose hbase-specific functionality
* later add data frame support, likely as additional module
* start with master-only,  use backport patch for branch-1 post refactor and 
docs 

h3. Minimums for Landing on Master

* use an isolated top-level module to reduce impact on rest of project, modeled 
after the phoenix-spark module
** hbase-spark
** should help limit reach of dependency conflicts. look to phoenix-spark for 
any conflict resolution.
* target spark 1.3 as upstream dependency
** already works with 1.3
** 1.3 seen as able to reach a large existing user base and stable
** gap: data frames are missing key functionality pre-1.4.
** gap: unknown spark compatibility promises, history of scala version changes 
may limit how many future versions this can work with.
* move existing implementation classes into an hbase subpackage
**  org.apache.hadoop.hbase.spark
** there is currently a workaround to access a spark-package-private set of 
configs, needed for pre-1.3. since 1.3 is our minimum expected version, remove 
this workaround.
* leave examples as a part of the single module definition for first landing
* leave tests, including scala tests, as unit tests on the single module for 
first landing

h3. Follow on for branch-1

* documentation - need to add a ref guide section on using the spark bindings
* examples refactoring - examples should be moved out of the primary module 
into something like hbase-spark-examples
* tests refactoring - java tests should be categorized to match rest of 
project. Scala tests should be moved to an IT


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark
 Fix For: 2.0.0


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-09 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14621098#comment-14621098
 ] 

Sean Busbey commented on HBASE-13992:
-

I thought of another follow-on: we need to decide on and annotate the supported 
public API. Not sure how that or automated docs work for any Scala parts of the 
implementation.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-01 Thread Nicolas Liochon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610738#comment-14610738
 ] 

Nicolas Liochon commented on HBASE-13992:
-

+1 as well for me. How does it work for the binaries version, will we have to 
enter into the scala game, i.e. hbase-spark-2_10? What about the spark version? 
The spark-hadoop version?


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-01 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14610694#comment-14610694
 ] 

stack commented on HBASE-13992:
---

[~malaskat] ping if you have questions or need a hand doing up a patch. I'd be 
game.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-30 Thread Enis Soztutar (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14609073#comment-14609073
 ] 

Enis Soztutar commented on HBASE-13992:
---

This seems fine as a module. +1 for {{hbase-spark}} as a name. Phoenix also 
comes with a Spark module itself. 

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606559#comment-14606559
 ] 

Andrew Purtell commented on HBASE-13992:


I suggest giving it a day or two for more comment to come in. If nothing 
changes, I'd say we could proceed with the integration approach of a new Maven 
module, addressing Elliott's comments.

bq. To make the design I will need help from committers can I get assigned a 
Committer to give me guidance?

We don't assign committers but maybe someone will volunteer. Otherwise I 
suggest writing up what questions you have or how you'd like to proceed, or 
both, and post them here. I suspect you'll get guidance back in the responses.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606505#comment-14606505
 ] 

Andrew Purtell commented on HBASE-13992:


I think this would be a good thing to have in. We can set it up as a separate 
Maven module so the Spark and Scala dependencies only come in there, optionally 
by way of a profile that isn't turned on by default but is turned on for the 
release profile. hbase-spark sounds good as a module name to me.


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Elliott Clark (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606289#comment-14606289
 ] 

Elliott Clark commented on HBASE-13992:
---

I really like the idea. Spark streaming and HBase make some great sense 
together.

Thoughts on the code as it is currently:

* Need to scrub/remove cdh and cloudera references.
* Seems really messy as far as structure/layout.
** Why javatest/ and src/test/java
** Why java/ and src/main/java
** Single random scala file in root dir?
* Tons of ide files in here.
* Everything is named example. These should all be bases ( either extendable 
base classes or scala traits )
* Why java when most of spark is scala?
* Needs some serious work on java/scala docing.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606567#comment-14606567
 ] 

Sean Busbey commented on HBASE-13992:
-

It sounds like so far everyone agrees that as a feature it's a good idea. 
Feature additions are handled by lazy consensus; so long as someone wants to 
work on a feature it's up to those who oppose it to speak up to avoid moving 
forward.

Normally the way we'd proceed is you as the contributor would post a short 
document describing the feature: the problem, the overall approach, some light 
implementation details. In this case, I personally think the existing materials 
provide enough context to move forward.

The next step after that would be for you to post a patch that adds a new 
module with the implementation. If you ran into trouble as you went, you'd post 
a question either here or on dev@hbase and one or more committers would step up 
to help.

What kind of guidance are you looking for? I'm particularly interested in how 
our contributor guide could provide the needed groundwork for folks to work on 
complex features like this.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606347#comment-14606347
 ] 

Ted Malaska commented on HBASE-13992:
-

1. Yes CDH will be removed
2. Yes there is some clean up we can do
3. Yes I will take care of the ide file
4. Not everything is named example only the example.  I can walk you through 
the code through a webex later this week if you have time.
5. Why Java.  Well the main code is in Scala the only java is the wrapper so it 
will work in Java Spark and the Java examples
6. 100% agree


 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606575#comment-14606575
 ] 

Ted Malaska commented on HBASE-13992:
-

Cool thanks Sean and Andrew.

Sounds great.  I will wait a couple of days then I would love to talk with a 
committer or two about putting together an outline for package structure and 
naming conventions. 

After that I should be good to get a patch made.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606260#comment-14606260
 ] 

Ted Malaska commented on HBASE-13992:
-

You know Sean, if we do that we can rebuild the balk load in Spark.  That would 
be so cool.

I guess that would be a different jira, but it should be a jira.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606269#comment-14606269
 ] 

Sean Busbey commented on HBASE-13992:
-

One step at a time. :)

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606550#comment-14606550
 ] 

Ted Malaska commented on HBASE-13992:
-

So how should we move forward.

- Do we vote on weather this should be or are we past that?
- To make the design I will need help from committers can I get assigned a 
Committer to give me guidance?

Ted Malaska 

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-06-29 Thread Sean Busbey (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606238#comment-14606238
 ] 

Sean Busbey commented on HBASE-13992:
-

sweet. Adding this in as a module (hbase-spark ?) would also give us a good 
push to move the mapreduce libraries out of hbase-server finally. linking 
HBASE-11843.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska
  Labels: spark

 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)