[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16427729#comment-16427729 ]

stack commented on HBASE-13992:
---

This got reverted from branch-2 via another issue.

> Integrate SparkOnHBase into HBase
> -
>
> Key: HBASE-13992
> URL: https://issues.apache.org/jira/browse/HBASE-13992
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Theodore michael Malaska
> Assignee: Theodore michael Malaska
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch,
> HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch,
> HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch,
> HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4,
> HBASE-13992.patch.5
>
>
> This Jira is to ask if SparkOnHBase can find a home inside HBase core.
> Here is the github:
> https://github.com/cloudera-labs/SparkOnHBase
> I am the core author of this project and the license is Apache 2.0
> A blog explaining this project is here
> http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
> A Spark Streaming example is here
> http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
> A real customer using this in production is blogged here
> http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
> Please debate and let me know what I can do to make this happen.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648186#comment-14648186 ]

Steve Loughran commented on HBASE-13992:

Coverage is an odd metric anyway, because as well as code coverage there's state coverage: IPv6, Windows, timezone=GMT0, locale=turkish, any of which can break things even in code which nominally had 100% coverage. Having tests which generate failure conditions (done here), with test setups that explore the configuration space, is about the best you can get.
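To make the state-coverage point concrete, here is a minimal Java sketch. It is not taken from the HBase patch; the class and method names are hypothetical and chosen only for illustration. It shows how `locale=turkish` can break a method whose every line is covered by a test: in a Turkish locale, upper-case `I` lower-cases to the dotless `ı`, so a default-locale `toLowerCase()` no longer produces `title`.

```java
import java.util.Locale;

public class LocaleStateCoverage {
    // Hypothetical helper: lower-cases with the JVM default locale,
    // so the result depends on environment state, not just on the input.
    public static String normalizeWithDefaultLocale(String key) {
        return key.toLowerCase();
    }

    // Hypothetical helper: lower-cases with a fixed, locale-neutral rule set.
    public static String normalizeWithRootLocale(String key) {
        return key.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale original = Locale.getDefault();
        try {
            // Simulate running the same test suite on a Turkish-locale machine.
            Locale.setDefault(Locale.forLanguageTag("tr-TR"));
            // 'I' becomes dotless 'ı' (U+0131), so this comparison fails.
            System.out.println(normalizeWithDefaultLocale("TITLE").equals("title")); // prints "false"
            // The locale-neutral variant is unaffected by the default locale.
            System.out.println(normalizeWithRootLocale("TITLE").equals("title"));    // prints "true"
        } finally {
            // Restore the default so other code in the JVM is not affected.
            Locale.setDefault(original);
        }
    }
}
```

A test run only under an English default locale would execute every line of both methods and pass, which is why line coverage alone says little about this class of failure.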
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645127#comment-14645127 ]

Hudson commented on HBASE-13992:

FAILURE: Integrated in HBase-TRUNK #6683 (See [https://builds.apache.org/job/HBase-TRUNK/6683/])
HBASE-13992 Integrate SparkOnHBase into HBase (busbey: rev 30f7d127c3974cff9e3058e13d7c50805ee4482f)
* hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseContextSuite.scala
* hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkGetExample.java
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseDistributedScanExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseMapPartitionExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseStreamingBulkPutExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/JavaHBaseContext.scala
* dev-support/test-patch.properties
* hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctionsSuite.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkGetExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseRDDFunctions.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkDeleteExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkGetExample.scala
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctions.scala
* hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkDeleteExample.java
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseForeachPartitionExample.scala
* hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseDistributedScan.java
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutTimestampExample.scala
* hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseBulkPutExample.java
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkPutExample.scala
* hbase-spark/src/test/java/org/apache/hadoop/hbase/spark/JavaHBaseContextSuite.java
* hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/HBaseDStreamFunctionsSuite.scala
* hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseMapGetPutExample.java
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/hbasecontext/HBaseBulkPutExampleFromFile.scala
* hbase-spark/pom.xml
* pom.xml
* hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/example/hbasecontext/JavaHBaseStreamingBulkPutExample.java
* hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/example/rdd/HBaseBulkDeleteExample.scala
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644461#comment-14644461 ]

Sean Busbey commented on HBASE-13992:

Looks good to me. As previously agreed I'll amend to up the javadoc warn count. Things fine by you [~ste...@apache.org]?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644580#comment-14644580 ]

Steve Loughran commented on HBASE-13992:

LGTM: I only worry about testability, and that's a good start. More tests will no doubt come over time ... something in Bigtop would be good for the integration
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644910#comment-14644910 ]

Andrew Purtell commented on HBASE-13992:

Thanks so much for the contribution Ted M, this is a great start!
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644914#comment-14644914 ]

Sean Busbey commented on HBASE-13992:

If folks have a target, e.g. some coverage %, I'm happy to add another jira to the list of gates for HBASE-14160
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644975#comment-14644975 ]

Andrew Purtell commented on HBASE-13992:

Not opposed to a coverage target but I don't think we need one, at least not outside of a larger effort to up coverage everywhere. I think Steve's suggestion to have tests covering the likely runtime failure scenarios would be a good start.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14644907#comment-14644907 ]

Andrew Purtell commented on HBASE-13992:

I agree with [~steve_l] we should have more tests. Let's follow up on another issue before this gets into a release.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643213#comment-14643213 ]

Ted Malaska commented on HBASE-13992:

Thanks Sean. Let me know if there are any more changes needed for this patch. Also, if this meets the minimum required to go in, it will allow me to multi-thread. I would like to have three threads of development:
1. More testing
2. Dataframes
3. Bulk load
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643163#comment-14643163 ]

Sean Busbey commented on HBASE-13992:

all those zombie tests look unrelated and they pass locally. How do the additions look, [~ste...@apache.org]?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643354#comment-14643354 ]

Sean Busbey commented on HBASE-13992:

{quote}
It's probably best to put artifact versions in the root pom.xml, not the spark one, for one single place for dependencies ... this will matter if more than one Scala module goes in.
{quote}
We've had a couple of folks (myself, Andrew) request that all Spark and Scala related bits be isolated to this module, specifically because we don't have any other Scala modules yet and we want to minimize the reach of those dependencies until we do. A top-level Scala version would require us to make a determination now about handling Scala version incompatibilities, e.g. with different module layouts.
{quote}
{{HBaseDStreamFunctionsSuite.scala}} has the wrong assumption.
{{assert(foo5.equals(bar), foo4 + !=bar)}}
Scalatest lets you use assertResult instead, for an auto-generated message
{code}
assertResult(bar) { foo5 }
{code}
And you can use == for a slightly less informative error message, but one which still includes the values on either side
{quote}
sounds like a reasonable fixup.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643365#comment-14643365 ]

Ted Malaska commented on HBASE-13992:

OK, I will make the two code changes and get a patch to you guys soon. Thanks for the review.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643322#comment-14643322 ]

Steve Loughran commented on HBASE-13992:

New tests look good.
# It's probably best to put artifact versions in the root pom.xml, not the spark one, for one single place for dependencies ... this will matter if more than one Scala module goes in.
# {{HBaseDStreamFunctionsSuite.scala}} has the wrong assumption.
{code}
{{assert(foo5.equals(bar), foo4 + !=bar)}}
{code}
Scalatest lets you use assertResult instead, for an auto-generated message
{code}
assertResult(bar) { foo5 }
{code}
And you can use {{==}} for a slightly less informative error message, but one which still includes the values on either side
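The idea behind the `assertResult` suggestion can be sketched in plain Java. This is an illustration only: the helper below is hypothetical and is not the ScalaTest API, and the values are made up. The assertion quoted in the comment checks `foo5` but builds its message from `foo4`, so a hand-written message can drift away from the value actually compared. A helper that builds the message from the very values it compares avoids that.

```java
import java.util.Objects;

public class AssertResultSketch {
    // Hypothetical helper (not ScalaTest): compares expected and actual and
    // generates the failure message from the same two values it compared.
    public static void assertResult(Object expected, Object actual) {
        if (!Objects.equals(expected, actual)) {
            throw new AssertionError("Expected " + expected + ", but got " + actual);
        }
    }

    public static void main(String[] args) {
        String bar = "bar";   // illustrative expected value
        String foo5 = "baz";  // illustrative actual value
        try {
            assertResult(bar, foo5);
        } catch (AssertionError e) {
            // The message names both sides without any hand-written text.
            System.out.println(e.getMessage()); // prints "Expected bar, but got baz"
        }
    }
}
```

Because the message is derived inside the helper, there is no second variable name to keep in sync with the one under test.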
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14643715#comment-14643715 ]

Hadoop QA commented on HBASE-13992:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12747422/HBASE-13992.12.patch
against master branch at commit 1566ec5fdc751897b2e931b2b0920c6d503c85ce.
ATTACHMENT ID: 12747422

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:red}-1 javac{color}. The applied patch generated 26 javac compiler warnings (more than the master's current 24 warnings).
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 5 zombie test(s):
at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScanBase.testScan(TestTableInputFormatScanBase.java:244)
at org.apache.hadoop.hbase.mapreduce.TestTableInputFormatScan2.testScanYYYToEmpty(TestTableInputFormatScan2.java:96)
at org.apache.hadoop.hbase.mapreduce.TestCellCounter.testCellCounterStartTimeRange(TestCellCounter.java:137)
at org.apache.hadoop.hbase.mapreduce.TestImportExport.testWithFilter(TestImportExport.java:447)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14901//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14901//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14901//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14901//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14901//console

This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641429#comment-14641429 ] Hadoop QA commented on HBASE-13992:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12747159/HBASE-13992.11.patch
against master branch at commit dad4cad30e5b0c69694ee90908ad8e74c592d821.
ATTACHMENT ID: 12747159

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:red}-1 javac{color}. The applied patch generated 26 javac compiler warnings (more than the master's current 24 warnings).
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:

{color:red}-1 core zombie tests{color}. There are 5 zombie test(s):
  at org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot.testExportFileSystemState(TestMobExportSnapshot.java:285)
  at org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot.testExportFileSystemState(TestMobExportSnapshot.java:259)
  at org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot.testExportWithTargetName(TestMobExportSnapshot.java:217)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:288)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:262)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testEmptyExportFileSystemState(TestExportSnapshot.java:206)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:288)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testExportFileSystemState(TestExportSnapshot.java:262)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testSnapshotWithRefsExportFileSystemState(TestExportSnapshot.java:256)
  at org.apache.hadoop.hbase.snapshot.TestExportSnapshot.testSnapshotWithRefsExportFileSystemState(TestExportSnapshot.java:236)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14891//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14891//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14891//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14891//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14891//console

This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639984#comment-14639984 ] Hadoop QA commented on HBASE-13992:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12746933/HBASE-13992.10.patch
against master branch at commit dad4cad30e5b0c69694ee90908ad8e74c592d821.
ATTACHMENT ID: 12746933

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:red}-1 javac{color}. The applied patch generated 26 javac compiler warnings (more than the master's current 24 warnings).
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:

{color:red}-1 core zombie tests{color}. There are 3 zombie test(s):
  at org.apache.hadoop.hbase.security.visibility.TestVisibilityLabelsWithDeletes.testDeleteFamilyLatestTimeStampWithMulipleVersionsWithoutCellVisibilityInPuts(TestVisibilityLabelsWithDeletes.java:1511)
  at org.apache.hadoop.hbase.coprocessor.TestOpenTableInCoprocessor.testCoprocessorCanCreateConnectionToRemoteTableWithCustomPool(TestOpenTableInCoprocessor.java:145)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14876//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14876//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14876//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14876//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14876//console

This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14641228#comment-14641228 ] Sean Busbey commented on HBASE-13992:

v10 passes for me locally now. It looks like both of the failure cases are checking for SparkException. Could we check directly for TableNotFoundException and NoSuchColumnFamilyException?
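Editorial sketch for the question above: one way a test could look past a wrapping exception to the underlying HBase exception is to walk the cause chain. The classes WrapperException and MissingTableException below are hypothetical stand-ins (for SparkException and TableNotFoundException) so the snippet compiles without Spark or HBase on the classpath, and CauseChain.find is an illustrative helper, not part of the patch. Whether Spark actually preserves the original exception as a cause depends on the Spark version and is assumed here.

```java
import java.io.IOException;

// Hypothetical stand-in for SparkException: a wrapper raised by the job runner.
class WrapperException extends RuntimeException {
    WrapperException(String msg, Throwable cause) { super(msg, cause); }
}

// Hypothetical stand-in for TableNotFoundException: the failure we really care about.
class MissingTableException extends IOException {
    MissingTableException(String msg) { super(msg); }
}

class CauseChain {
    /** Returns the first throwable in the cause chain that is an instance of type, or null. */
    static <T extends Throwable> T find(Throwable t, Class<T> type) {
        int depth = 0;
        // Bound the walk so a cyclic cause chain cannot loop forever.
        while (t != null && depth++ < 32) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            t = t.getCause();
        }
        return null;
    }

    public static void main(String[] args) {
        Throwable failure = new WrapperException("Job aborted",
            new RuntimeException(new MissingTableException("t1")));
        MissingTableException root = find(failure, MissingTableException.class);
        System.out.println(root != null ? "found: " + root.getMessage() : "not found");
    }
}
```

A test written this way would assert on the specific exception type rather than on the generic wrapper, which is the tighter check being asked for.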
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639526#comment-14639526 ] Lars Hofhansl commented on HBASE-13992:

Would like to meet up. Out of the country on vacation until August 11th, though.

Re: Tests. I think we can commit this, and file a sub jira to add tests. (On the other hand, why rush? If more tests are requested, why not add them now before commit?) I'm fine either way.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639709#comment-14639709 ] Sean Busbey commented on HBASE-13992:

{quote}
Were a project I was a committer on, I'd be mandating the failure tests, as they are the tests most likely to break things. As I'm not an HBase committer, I will leave the opinions to others. At the very least, there needs to be a followup JIRA for the extra tests.
{quote}

HBase works by consensus across our community, not just consensus from those who have been flagged as more established in the project.

Ted, let me know if you run into any trouble getting these set up or if you want a hand off on adding them.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639697#comment-14639697 ] Steve Loughran commented on HBASE-13992:

Were a project I was a committer on, I'd be mandating the failure tests, as they are the tests most likely to break things. As I'm not an HBase committer, I will leave the opinions to others. At the very least, there needs to be a followup JIRA for the extra tests.

As Ted notes, they should just throw the standard exceptions.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638800#comment-14638800 ] Ted Malaska commented on HBASE-13992:

Thanks, Lars. Will do; there will be a lot of JIRAs after this one. Also, if you have time next week or the week after, I would love to brainstorm about the roadmap.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14638666#comment-14638666 ] Lars Hofhansl commented on HBASE-13992:

+1 on V9. Thanks for your patience [~ted.m].

One more idea for a future improvement: make BulkGet return things in exactly the same format as the distributed scan. That would be cool, since one could then plug in different ways to get data (bulk get, scan with filter, InputFormat, etc.) and leave all the rest of the code identical. Can do later.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639734#comment-14639734 ] Ted Malaska commented on HBASE-13992:

Hey Steve, no worries. You are right, those tests are important, and I will try to get a patch in within the next couple of hours. But just note this is not a perfect patch. After this there are about 4 to 6 more patches needed to make this really great. Once this patch is in, more people can help and we can parallelize our development effort.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639471#comment-14639471 ] Sean Busbey commented on HBASE-13992:

I agree that test handling around failures will be necessary for this to be robust and usable for downstream folks.

Steve, do you think we need those tests in place to land on master? Would you be fine with making those additions part of the gate for getting this backported to branch-1?

The expectation is that backporting to branch-1 will mark when users end up interacting with this code. Similar to needing failure handling we'll definitely need to have user-facing docs before that happens, but those docs are part of the follow on work after this first jira.

I think right now the expectation is that master won't have a release before the end of the year and these changes in branch-1 would correspond to 1.3. Given that we're trying to get 1.2 out now, that would give us at least 3-4 months for the follow on tickets.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639587#comment-14639587 ] Ted Malaska commented on HBASE-13992:

I can add in those tests today or tomorrow.

There is nothing special about the Spark integration; it just gives a connection to the executors. So in these cases we will just see a normal HBase exception, the same as you would get without Spark.

test against non-existent database
attempt to work with a table that doesn't exist
attempt to read a column that doesn't exist
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639357#comment-14639357 ] Steve Loughran commented on HBASE-13992:

There's not much in the way of tests here; in particular, not much in the way of generation of failure conditions and validation of the outcome.

Ideally, there'd be one test to generate each failure condition: the exception handling, including those paths which downgrade a failure to a log message. The test should verify that such actions are the correct response.

At the very least, I'd recommend
# test against non-existent database
# attempt to work with a table that doesn't exist
# attempt to read a column that doesn't exist

I'd also make sure test teardown is robust, catching exceptions and downgrading them to log messages. That way, if something didn't get set up properly, the root cause of the failure isn't hidden by any exception generated in teardown.
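Editorial sketch of the teardown advice above: each cleanup step runs inside its own try/catch, and a failure is logged instead of rethrown, so an exception from teardown cannot mask the exception that failed the test. The names QuietTeardown, Step, and runQuietly are hypothetical and are not taken from the patch; the sketch uses only java.util.logging so it runs without HBase or ScalaTest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;

class QuietTeardown {
    private static final Logger LOG = Logger.getLogger(QuietTeardown.class.getName());

    /** A cleanup step that may throw, e.g. dropping a table or stopping a mini cluster. */
    interface Step {
        void run() throws Exception;
    }

    /** Runs one cleanup step; logs and returns false if it threw, true otherwise. */
    static boolean runQuietly(String name, Step step) {
        try {
            step.run();
            return true;
        } catch (Exception e) {
            // Downgrade to a log message so the original test failure stays visible.
            LOG.log(Level.WARNING, "Teardown step '" + name + "' failed; continuing", e);
            return false;
        }
    }

    public static void main(String[] args) {
        List<String> done = new ArrayList<>();
        // The first step fails (say, the table was never created); the second still runs.
        runQuietly("delete table", () -> { throw new IllegalStateException("table was never created"); });
        runQuietly("stop cluster", () -> done.add("stopped"));
        System.out.println(done);
    }
}
```

The design choice is that later cleanup steps still execute after an earlier one fails, which also reduces the chance of leaking resources such as bound ports into the next suite.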
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14639290#comment-14639290 ] Sean Busbey commented on HBASE-13992:

I'm trying to do a final build to make sure I've got the patch applied correctly, and the new integration tests are failing.

{code}
[INFO] --- scalatest-maven-plugin:1.0:test (integration-test) @ hbase-spark ---
WARNING: -c has been deprecated and will be reused for a different (but still very cool) purpose in ScalaTest 2.0. Please change all uses of -c to -P.
Discovery starting.
Discovery completed in 284 milliseconds.
Run starting. Expected test count is: 11
HBaseDStreamFunctionsSuite:
HBaseContextSuite:
HBaseRDDFunctionsSuite:
2015-07-23 10:50:56.702 java[97585:6403] Unable to load realm info from SCDynamicStore
*** RUN ABORTED ***
  java.util.concurrent.ExecutionException: java.io.IOException: Shutting down
  at java.util.concurrent.FutureTask.report(FutureTask.java:122)
  at java.util.concurrent.FutureTask.get(FutureTask.java:188)
  at org.scalatest.tools.ConcurrentDistributor.waitUntilDone(ConcurrentDistributor.scala:52)
  at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:2549)
  at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1044)
  at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1043)
  at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:2722)
  at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1043)
  at org.scalatest.tools.Runner$.main(Runner.scala:860)
  at org.scalatest.tools.Runner.main(Runner.scala)
  ...
  Cause: java.io.IOException: Shutting down
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:232)
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:94)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1040)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1006)
  at org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:44)
  at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
  at org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:30)
  at org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:253)
  at org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.run(HBaseDStreamFunctionsSuite.scala:30)
  at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:55)
  ...
  Cause: java.lang.RuntimeException: Failed construction of Master: class org.apache.hadoop.hbase.master.HMasterAddress already in use
  at org.apache.hadoop.hbase.util.JVMClusterUtil.createMasterThread(JVMClusterUtil.java:143)
  at org.apache.hadoop.hbase.LocalHBaseCluster.addMaster(LocalHBaseCluster.java:218)
  at org.apache.hadoop.hbase.LocalHBaseCluster.init(LocalHBaseCluster.java:154)
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:214)
  at org.apache.hadoop.hbase.MiniHBaseCluster.init(MiniHBaseCluster.java:94)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1040)
  at org.apache.hadoop.hbase.HBaseTestingUtility.startMiniHBaseCluster(HBaseTestingUtility.java:1006)
  at org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:44)
  at org.scalatest.BeforeAndAfterAll$class.beforeAll(BeforeAndAfterAll.scala:187)
  at org.apache.hadoop.hbase.spark.HBaseDStreamFunctionsSuite.beforeAll(HBaseDStreamFunctionsSuite.scala:30)
  ...
  Cause: java.net.BindException: Port in use: 0.0.0.0:16010
  at org.apache.hadoop.hbase.http.HttpServer.openListeners(HttpServer.java:1013)
  at org.apache.hadoop.hbase.http.HttpServer.start(HttpServer.java:949)
  at org.apache.hadoop.hbase.http.InfoServer.start(InfoServer.java:91)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.putUpWebUI(HRegionServer.java:1789)
  at org.apache.hadoop.hbase.regionserver.HRegionServer.init(HRegionServer.java:604)
  at org.apache.hadoop.hbase.master.HMaster.init(HMaster.java:363)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
  at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
  at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
  at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
  ...
  Cause: java.net.BindException: Address already in use
  at sun.nio.ch.Net.bind0(Native Method)
  at sun.nio.ch.Net.bind(Net.java:444)
  at sun.nio.ch.Net.bind(Net.java:436)
  at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:214)
  at
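Editorial sketch: the innermost cause in the trace above is a java.net.BindException on port 16010, which is what the JVM raises when a second listener tries to bind a port that another socket already holds. The snippet below reproduces that condition with plain java.net sockets on the loopback interface and shows the common workaround of asking the operating system for a free port by binding to port 0. The class PortProbe and its methods are hypothetical helpers for illustration only; nothing here is taken from the patch or from HBase's test utilities.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

class PortProbe {
    /** Asks the OS for a free ephemeral port by binding to port 0, then releases it. */
    static int findFreePort() {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns true if binding the given loopback port fails because it is already taken. */
    static boolean isInUse(int port) {
        try (ServerSocket socket = new ServerSocket(port, 50, InetAddress.getLoopbackAddress())) {
            return false;
        } catch (BindException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hold a port open, then show that a second bind to it is rejected.
        try (ServerSocket holder = new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            int taken = holder.getLocalPort();
            System.out.println("port " + taken + " in use: " + isInUse(taken));
        }
        System.out.println("free port suggested by the OS: " + findFreePort());
    }
}
```

Note that a port returned by findFreePort can be claimed by another process before it is reused, so this narrows the race rather than removing it. For the HBase master web UI specifically, the default configuration documents `hbase.master.info.port` (default 16010, with -1 disabling the UI); whether the test setup here should change it is left to the patch authors.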
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637463#comment-14637463 ] Ted Malaska commented on HBASE-13992:

Hold on that last patch. I have a couple more changes left. Give me 10 minutes.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14636943#comment-14636943 ] Ted Malaska commented on HBASE-13992:

Today from 3 to 6 I will have time to work on this. If this jira is closed by 3, I will make a new one for the HBaseRDD.collect; otherwise I will add an additional patch to this jira. But after that next patch, please help me close this out. I do have a number of jiras I would like to start, to add additional functionality to this original patch.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637720#comment-14637720 ] Hadoop QA commented on HBASE-13992:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12746620/HBASE-13992.9.patch
  against master branch at commit 20739542fdca185eb857813bf269d6262c11b652.
  ATTACHMENT ID: 12746620

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:red}-1 javac{color}. The applied patch generated 26 javac compiler warnings (more than the master's current 24 warnings).
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
  at org.apache.camel.test.spring.CamelSpringTestSupport.doCreateApplicationContext(CamelSpringTestSupport.java:90)
  at org.apache.camel.test.spring.CamelSpringTestSupport.doPreSetup(CamelSpringTestSupport.java:80)
  at org.apache.camel.test.junit4.CamelTestSupport.setUp(CamelTestSupport.java:237)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14864//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14864//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14864//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14864//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14864//console

This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637736#comment-14637736 ] Ted Malaska commented on HBASE-13992: OK, patch 9 passed the build. It has Lars's HBaseRDD.collect change. Are we good for a commit?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637730#comment-14637730 ] Hadoop QA commented on HBASE-13992:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12746611/HBASE-13992.8.patch
  against master branch at commit 20739542fdca185eb857813bf269d6262c11b652.
  ATTACHMENT ID: 12746611

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:red}-1 javac{color}. The applied patch generated 26 javac compiler warnings (more than the master's current 24 warnings).
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1871 checkstyle errors (more than the master's current 1870 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
  + private static class ScanConvertFunction implements Function<Tuple2<ImmutableBytesWritable, Result>, String> {
  + private void populateTableWithMockData(Configuration conf, TableName tableName) throws IOException {
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 5 zombie test(s):
  at org.apache.hadoop.hbase.client.TestAdmin1.testForceSplitMultiFamily(TestAdmin1.java:1021)
  at org.apache.hadoop.hbase.client.TestReplicaWithCluster.testChangeTable(TestReplicaWithCluster.java:224)
  at org.apache.hadoop.hbase.client.TestMetaWithReplicas.testShutdownHandling(TestMetaWithReplicas.java:141)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14863//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14863//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14863//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14863//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14863//console

This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14637757#comment-14637757 ] Sean Busbey commented on HBASE-13992: I'm good with v9. I can amend to update the number of allowed javadoc warnings as previously discussed. [~lhofhansl]?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635591#comment-14635591 ] Andrew Purtell commented on HBASE-13992:

Latest patch lgtm.

There is a javadoc nit:
{noformat}
[INFO] Compiling 3 source files to /home/jenkins/jenkins-slave/workspace/PreCommit-HBASE-Build@2/hbase-spark/target/test-classes at 1437449685681
[WARNING] warning: Class org.apache.hadoop.mapred.MiniMRCluster not found - continuing with a stub.
[WARNING] one warning found
{noformat}
Not quite sure what to make of that. We could bump the expected count if there's no solution.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635598#comment-14635598 ] Matteo Bertozzi commented on HBASE-13992:

{quote}Not quite sure what to make of that. We could bump the expected count if there's no solution.{quote}
Yeah, I was looking at that too. There is no use of MiniMRCluster in the patch, so it is probably coming from the hadoop-common dependency or something like that. I'm +1 to bump the count unless [~busbey] or someone else knows how to make that warning go away.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635616#comment-14635616 ] Sean Busbey commented on HBASE-13992: +1, let's get it in and fix the warning as a follow-on. Ted, would you mind filing a jira for the MiniMRCluster warning and assigning it to me?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635813#comment-14635813 ] Ted Malaska commented on HBASE-13992: Oh, now I understand the part that Lars is referencing. I was thinking of the HBase RDD functions, which behave like what Lars is describing, but there is a distributed scan that does not. Can I do what Sean said and fix it in another jira? I can create and submit the jira tomorrow.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635823#comment-14635823 ] Andrew Purtell commented on HBASE-13992: I'm OK with getting this into master, with or without the adjustment for HBaseRDD.collect, and refining it further before it makes its way into a release. Hoping we can get more exposure of this among Spark app developers for further feedback. I know Ted also has a bunch of incremental improvements in mind.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635791#comment-14635791 ] Sean Busbey commented on HBASE-13992: I haven't done much day to day with Spark, so I don't know how inconvenient that difference is. This is only targeting master initially. How about we add fixing the return from HBaseRDD.collect to the list of gatekeeper issues for pulling it back into branch-1?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14635783#comment-14635783 ] Lars Hofhansl commented on HBASE-13992: Sorry to harp on the point... Is everybody OK that HBaseRDD.collect() returns List<Tuple2<byte[], List<Tuple3<byte[], byte[], byte[]>>>>, instead of perhaps List<Tuple2<byte[], Result>>?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633537#comment-14633537 ] Lars Hofhansl commented on HBASE-13992: Super minor nit: it's best to have an attached patch end in .txt (which I prefer) or .patch, so that a browser can open it directly. :)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633547#comment-14633547 ] Lars Hofhansl commented on HBASE-13992:

From the example:
{code}
List<Tuple2<byte[], List<Tuple3<byte[], byte[], byte[]>>>> results = javaRdd.collect();
{code}
Still does map byte[] -> Result, and so will be different from what the HadoopRDD would do. (unless I am missing something)
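For readers following the thread, the nested collect() return type quoted in the comment above can be pictured with a small self-contained Java sketch. This is only an illustration: Pair and Triple are hypothetical stand-ins for scala.Tuple2 and scala.Tuple3 so that nothing from Spark or HBase is needed, and the (column family, qualifier, value) layout of each triple is an assumption rather than something the comment states.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CollectShapeSketch {

    // Hypothetical stand-in for scala.Tuple2, so the sketch compiles without Spark.
    static final class Pair<A, B> {
        final A first;
        final B second;
        Pair(A first, B second) { this.first = first; this.second = second; }
    }

    // Hypothetical stand-in for scala.Tuple3.
    static final class Triple<A, B, C> {
        final A first;
        final B second;
        final C third;
        Triple(A first, B second, C third) {
            this.first = first;
            this.second = second;
            this.third = third;
        }
    }

    static byte[] bytes(String s) { return s.getBytes(StandardCharsets.UTF_8); }

    static String str(byte[] b) { return new String(b, StandardCharsets.UTF_8); }

    // Builds one row in the nested shape: (row key, list of cells).
    // Assumed cell layout: (column family, qualifier, value).
    static List<Pair<byte[], List<Triple<byte[], byte[], byte[]>>>> sampleRows() {
        List<Triple<byte[], byte[], byte[]>> cells = new ArrayList<>();
        cells.add(new Triple<>(bytes("cf"), bytes("q"), bytes("v")));
        List<Pair<byte[], List<Triple<byte[], byte[], byte[]>>>> rows = new ArrayList<>();
        rows.add(new Pair<>(bytes("row1"), cells));
        return rows;
    }

    // Walks the nested structure the way a caller of such a collect() would have to.
    static String describe(List<Pair<byte[], List<Triple<byte[], byte[], byte[]>>>> rows) {
        StringBuilder sb = new StringBuilder();
        for (Pair<byte[], List<Triple<byte[], byte[], byte[]>>> row : rows) {
            sb.append(str(row.first)).append(':');
            for (Triple<byte[], byte[], byte[]> cell : row.second) {
                sb.append(' ')
                  .append(str(cell.first)).append(':')
                  .append(str(cell.second)).append('=')
                  .append(str(cell.third));
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(describe(sampleRows()));
    }
}
```

A caller of this shape works only with positional byte arrays, whereas the List<Tuple2<byte[], Result>> alternative raised in the thread would hand back HBase's own Result object for each row.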
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633630#comment-14633630 ] Ted Malaska commented on HBASE-13992: [~lhofhansl] My bad. That shouldn't be too hard to fix. I will have that in the next patch.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633802#comment-14633802 ] Ted Malaska commented on HBASE-13992: [~ted_yu] Thanks. I will have a new patch, hopefully tonight, and I will name it with .patch at the end.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633796#comment-14633796 ] Ted Yu commented on HBASE-13992: [~ted.m]: The patch ending in .5 was not picked up by the QA bot. Name your patch with a .patch or .txt extension.
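The naming rule in the comment above can be checked locally before uploading an attachment. The following is a minimal sketch under stated assumptions: the class and method names are hypothetical, the check is a plain case-sensitive suffix test, and it is not the QA bot's actual matching logic, which this thread does not show.

```java
import java.util.Arrays;
import java.util.List;

public class PatchNameSketch {

    // Returns true when the attachment name ends in .patch or .txt,
    // the two extensions named in the comment above.
    static boolean hasRecognizedExtension(String fileName) {
        return fileName.endsWith(".patch") || fileName.endsWith(".txt");
    }

    public static void main(String[] args) {
        // Attachment names taken from this issue, plus a .txt variant for illustration.
        List<String> names = Arrays.asList(
            "HBASE-13992.patch.5",
            "HBASE-13992.5.patch",
            "HBASE-13992.5.txt");
        for (String name : names) {
            System.out.println(name + " -> " + hasRecognizedExtension(name));
        }
    }
}
```

Under this check, HBASE-13992.patch.5 is rejected because the version suffix comes after the extension, while HBASE-13992.5.patch is accepted.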
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14634141#comment-14634141 ] Hadoop QA commented on HBASE-13992: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12746162/HBASE-13992.5.patch against master branch at commit 0f614a1c44e1887ca7177b66bb6208b6e69db7e1. ATTACHMENT ID: 12746162 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:red}-1 javac{color}. The applied patch generated 49 javac compiler warnings (more than the master's current 24 warnings). {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 5 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1878 checkstyle errors (more than the master's current 1871 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings (more than the master's current 0 warnings). {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100: + xsi:schemaLocation=http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd; +new SparkConf().setAppName(JavaHBaseStreamingBulkPutExample + tableName + : + port + : + tableName); +configBroadcast: Broadcast[SerializableWritable[Configuration]], + private def getConf(configBroadcast: Broadcast[SerializableWritable[Configuration]]): Configuration = { + configBroadcast: Broadcast[SerializableWritable[Configuration]], + * or security issues. For instance, an Array[AnyRef] can hold any type T, but may lose primitive +val sparkConf = new SparkConf().setAppName(HBaseBulkPutTimestampExample + tableName + + columnFamily) +(Bytes.toBytes(6), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(1, +(Bytes.toBytes(7), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(2, +(Bytes.toBytes(8), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(3, {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14840//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/patchReleaseAuditWarnings.txt Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14840//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14840//console This message is automatically generated. 
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633980#comment-14633980 ]

Ted Malaska commented on HBASE-13992:

[~ted_yu] I uploaded a new patch. No code changes. Just a new name.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633977#comment-14633977 ]

Ted Malaska commented on HBASE-13992:

[~lhofhansl] So maybe I read your comment wrong.

Patch 5 returns RDD[(ImmutableBytesWritable, Result)] from hbaseBulkGet.

What would you like it to return? Maybe I read it wrong, but doesn't TableInputFormat return (ImmutableBytesWritable, Result)?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14633981#comment-14633981 ]

Lars Hofhansl commented on HBASE-13992:

No, that's right. Hmm. Why does HBaseRDD.collect() return something different? (I think that part confused me)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634388#comment-14634388 ]

Hadoop QA commented on HBASE-13992:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12746201/HBASE-13992.6.patch
against master branch at commit 7ce318dd3be9df0ee1c025b4792ded0161aa2c9c.
ATTACHMENT ID: 12746201

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}. The applied patch generated 44 javac compiler warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ * or security issues. For instance, an Array[AnyRef] can hold any type T, but may lose primitive
+ putRecord._2.foreach((putValue) = put.addColumn(putValue._1, putValue._2, timeStamp, putValue._3))
+ val sparkConf = new SparkConf().setAppName(HBaseBulkPutExample + tableName + + columnFamily)
+ (Bytes.toBytes(1), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(1,
+ (Bytes.toBytes(2), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(2,
+ (Bytes.toBytes(3), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(3,
+ (Bytes.toBytes(4), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(4,
+ (Bytes.toBytes(5), Array((Bytes.toBytes(columnFamily), Bytes.toBytes(1), Bytes.toBytes(5

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.mapreduce.TestImportTSVWithOperationAttributes
org.apache.hadoop.hbase.mapreduce.TestRowCounter

{color:red}-1 core zombie tests{color}. There are 5 zombie test(s):
at org.apache.phoenix.hbase.index.covered.example.EndToEndCoveredIndexingIT.testMultipleTimestampsInSingleDelete(EndToEndCoveredIndexingIT.java:428)
at org.apache.phoenix.hbase.index.balancer.IndexLoadBalancerIT.testRandomAssignmentDuringIndexTableEnable(IndexLoadBalancerIT.java:265)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14845//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14845//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14845//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14845//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14845//console

This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14634491#comment-14634491 ]

Hadoop QA commented on HBASE-13992:

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12746246/HBASE-13992.7.patch
against master branch at commit 7ce318dd3be9df0ee1c025b4792ded0161aa2c9c.
ATTACHMENT ID: 12746246

{color:green}+1 @author{color}. The patch does not contain any @author tags.

{color:green}+1 tests included{color}. The patch appears to include 9 new or modified tests.

{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)

{color:red}-1 javac{color}. The applied patch generated 26 javac compiler warnings (more than the master's current 24 warnings).

{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.

{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 2 warning messages.

{color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors

{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.

{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.

{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100

{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.

{color:red}-1 core tests{color}. The patch failed these unit tests:

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14849//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14849//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14849//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14849//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14849//console

This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632420#comment-14632420 ]

Lars Hofhansl commented on HBASE-13992:

bq. As for the Result you do get the results but you need to convert it to something that can go into an RDD.

The TableInputFormat will return tuples of byte[] - Result, that's also what one would get when using the HadoopRDD with TableInputFormat.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632589#comment-14632589 ]

Ted Malaska commented on HBASE-13992:

[~lhofhansl] Great idea. I will try to add that to the next patch, which should be out tomorrow.

Thanks for helping me through this.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631369#comment-14631369 ]

Ted Malaska commented on HBASE-13992:

[~lhofhansl]
* I will add the comments.
* As for the Result, you do get the results, but you need to convert them to something that can go into an RDD.
* As for the memory needed: think of it as if you are reading a file from S3 or HDFS. It is the same thing here.
* Yes, I tested 1.4 with -D and it seemed to work.

And yes, I'm super excited about this too, because this is only the beginning. I will work through the weekend to get all these reviews into the next patch.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631283#comment-14631283 ]

Sean Busbey commented on HBASE-13992:

They were on review board
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631344#comment-14631344 ]

Lars Hofhansl commented on HBASE-13992:

Few nits:
* Comments on what's happening in the Scala and JavaHBase*Examples.
* Might want to have a comment in *HBaseBulkDeleteExample.java explaining that it is probably better to use the built-in BulkDeleteEndpoint (which in turn we have to document better).
* Comment in *HBaseDistributedScan explaining why this is preferred over using Spark's built-in HadoopRDD with TableInputFormat or the included HBaseRDD (it's obvious, but hey, somebody might look at the classes and wonder why).
* I assume there's quite some heap needed to get the RDD resulting from a scan into a (RowKey, List[(columnFamily, columnQualifier, Value)]) (going by the article here). Can that be avoided if it is a problem? Or in other words, is there an easy way to get a raw Result[] or List<Result>?

... sorry if I'm missing something.

Very excited about having this in HBase proper.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14631097#comment-14631097 ]

Lars Hofhansl commented on HBASE-13992:

bq. -Dspark.version=1.4.0

That's what I meant. Sorry I didn't say so explicitly.

bq. thank you very much for your reviews today

Did these happen offline?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630135#comment-14630135 ]

Ted Malaska commented on HBASE-13992:

[~lhofhansl] I would like to add 1.4 or 1.5 in another patch. This patch is so large already.

My hope is to close this one out, then start a couple more right off of it, like:
1. Newer versions of Spark
2. Adding DataFrame support
3. Documentation
4. updateStateByKey that will work through HBase
5. Bulk load to HBase
6. Much more

Thanks
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630666#comment-14630666 ]

Ted Malaska commented on HBASE-13992:

Hey [~andrew.purt...@gmail.com], yes, I just tested -Dspark.version=1.4.0 and it worked fine. I had one compile bug that I fixed. I will send it in the next patch.

[~andrew.purt...@gmail.com] and [~busbey] thank you very much for your reviews today. I know it was a lot of code and I thank you for taking the time. I will spend the next couple of days getting you a patch. Hopefully by Sunday night.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14630630#comment-14630630 ]

Andrew Purtell commented on HBASE-13992:

bq. can we at least add an optional build against 1.4.0 (as we do with Hadoop)?

The pom lets you do {{-Dspark.version=1.4.0}} (and also scala.version and scala.binary.version). Does this work for now? [~malaskat] does this compile with Spark 1.4?
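The property override mentioned above could be wrapped in a small helper along these lines. This is only a sketch: the three property names (spark.version, scala.version, scala.binary.version) come from the comment, while the function name, the `-DskipTests` flag, the Maven goals in the usage comment and the example Scala values are illustrative assumptions, not something taken from the patch.

```shell
#!/bin/sh
# Hypothetical helper: print Maven arguments for building against a chosen
# Spark release. Only the property names come from the thread; the values
# are whatever the caller passes in.
spark_build_args() {
  spark_version="$1"
  scala_version="${2:-}"          # optional, e.g. 2.10.4 (illustrative)
  scala_binary_version="${3:-}"   # optional, e.g. 2.10 (illustrative)

  args="-DskipTests -Dspark.version=${spark_version}"
  if [ -n "$scala_version" ]; then
    args="$args -Dscala.version=${scala_version}"
  fi
  if [ -n "$scala_binary_version" ]; then
    args="$args -Dscala.binary.version=${scala_binary_version}"
  fi
  printf '%s\n' "$args"
}

# Possible usage (goals are an assumption; the unquoted expansion is
# deliberate so the arguments are split into separate words):
#   mvn $(spark_build_args 1.4.0) clean package
```

Leaving the Scala properties optional means the pom defaults still apply unless a Spark release needs a different Scala line.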
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628028#comment-14628028 ]

Lars Hofhansl commented on HBASE-13992:

Belated +1 on having this in a hbase-spark module as part of HBase. Looking at the patch now.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14628041#comment-14628041 ]

Lars Hofhansl commented on HBASE-13992:

Also came across Spark version 1.3.0. Will SparkOnHBase currently not compile/work with 1.4.x? If not... Fine. But if it does, can we at least add an optional build against 1.4.0 (as we do with Hadoop)?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14627231#comment-14627231 ]

Andrew Purtell commented on HBASE-13992:

bq. Should newer Spark release, such as 1.4.0, be used ?

See https://issues.apache.org/jira/browse/HBASE-13992?focusedCommentId=14621073&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14621073 [~tedyu]
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625729#comment-14625729 ]
Sean Busbey commented on HBASE-13992:
Some help on the feedback from QA, since our current version is lacking some of the niceties of the soon-to-be version:
* to find the javac warnings, you can either run {{mvn -DskipTests package | tee some_log.log}} before and after applying the patch and then diff the two logs, or, if you want to presume the new warnings are all in your new module, run it only afterwards and search for lines starting with \[WARNING].
* to find the javadoc warnings, you can either run {{mvn -DskipTests javadoc:javadoc | tee some_other.log}} before and after and then diff, or again you could just look in your module for the start of \[WARNING] Javadoc Warnings.
* 10 of the checkstyle warnings will be the too-long lines. To get the rest you'll need to run {{mvn -DskipTests checkstyle:aggregate}} and save the {{target/checkstyle-result.xml}} files to compare before/after.
* the release audit warning is the missing license header noted on the review.
* the TestRemoteTable failure seems unrelated.
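The before/after comparison described in the comment above can be sketched as a small shell helper. This is only an illustration and not part of the HBase build tooling: the function name `new_warnings` and the log file names `before.log` and `after.log` are assumptions made up for this example. It prints the \[WARNING] lines that occur in the second log but not in the first.

```shell
# Hypothetical helper (not an HBase script): list [WARNING] lines that are
# present in the "after" build log but absent from the "before" build log.
new_warnings() {
  nw_before=$1
  nw_after=$2
  nw_tmp=$(mktemp)
  # Collect the distinct warning lines of the baseline build, sorted for comm.
  grep -F '[WARNING]' "$nw_before" | sort -u > "$nw_tmp"
  # comm -13 suppresses lines unique to the baseline and lines common to both,
  # leaving only the warnings that are new in the patched build.
  grep -F '[WARNING]' "$nw_after" | sort -u | comm -13 "$nw_tmp" -
  rm -f "$nw_tmp"
}

# Assumed workflow, using the Maven invocation quoted in the comment:
#   mvn -DskipTests package | tee before.log   # clean checkout
#   mvn -DskipTests package | tee after.log    # with the patch applied
#   new_warnings before.log after.log
```

Maven warning lines often embed file paths and line numbers, so a warning that merely moved can show up as new; the output is best treated as a starting point for a manual look rather than an exact count.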
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625745#comment-14625745 ]
Ted Malaska commented on HBASE-13992:
[~busbey] will do. Thank you, and sorry for missing that on the first draft.
[~ted_yu] Done. Added HBase to the review board group.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625743#comment-14625743 ]
Ted Yu commented on HBASE-13992:
There have been review comments on the review board, but I haven't received any notifications. Please add hbase to the Groups field.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625691#comment-14625691 ]
Ted Yu commented on HBASE-13992:
Uploading onto reviewboard would make reviewing easier.
{code}
+<spark.version>1.3.0</spark.version>
{code}
Should newer Spark release, such as 1.4.0, be used ?
{code}
+<!-- <scope>test</scope> Return-->
{code}
Uncomment the above?
Please add short javadoc for the XXExample classes.
{code}
+ System.out
+ .println("JavaHBaseBulkGetExample {master} {tableName}");
{code}
Merge the above two lines.
For JavaHBaseDistributedScan:
{code}
+results.size();
+ }
{code}
Did you intend to print the result size?
For JavaHBaseMapGetPutExample, GetFunction isn't called.
{code}
+ .println("JavaHBaseBulkPutExample {master} {host} {post} {tableName} {columnFamily}");
{code}
post -> port
For HBaseContext,
{code}
+ * serializable Configuration object
{code}
There are 3 parameters to HBaseContext. The above is one of them. Did you intend to provide scaladoc for all of them?
{code}
+ def mapPartition[T, R: ClassTag](rdd: RDD[T],
{code}
Should the above method be called mapPartitions (to align with the method of RDD)?
{code}
+ def streamForeachRDD[T](dstream: DStream[T],
{code}
Should the method be called streamForeachPartition, since there is a foreachRDD method which accepts DStream already?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625702#comment-14625702 ]
Ted Malaska commented on HBASE-13992:
Hey [~ted_yu], the link for the review board is in the jira, but here it is again: https://reviews.apache.org/r/36457
Thank you for reviewing. I will try to get a new cut out tomorrow.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625515#comment-14625515 ]
Ted Malaska commented on HBASE-13992:
Oh, to help with the reviews: all the magic is in HBaseContext. Everything else is one of the following:
1. Examples
2. Tests
3. Implicit Scala Functions
4. Java port
Also, this doesn't include the following, which will come in later patches:
1. Validation that the code will be able to accept new HBase Kerberos tickets given through Spark-Submit in Yarn-Cluster mode.
2. Integration with DataFrames. This is easy to do; I just wanted to separate it out into a different jira.
3. Better unit testing. I'm testing every function with the HBase test cluster, but I'm not the best at unit testing, so in a later patch I will work with others to add more tests.
4. More examples. I would like to build on common Spark Streaming use cases with HBase.
5. Documentation.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625643#comment-14625643 ]
Hadoop QA commented on HBASE-13992:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12745137/HBASE-13992.patch
against master branch at commit a3d30892b41f604ab5a62d4f612fa7c230267dfe.
ATTACHMENT ID: 12745137
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 7 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:red}-1 javac{color}. The applied patch generated 42 javac compiler warnings (more than the master's current 20 warnings).
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 4 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1890 checkstyle errors (more than the master's current 1873 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:red}-1 release audit{color}. The applied patch generated 1 release audit warnings (more than the master's current 0 warnings).
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+<argLine>-Xmx1536m -XX:MaxPermSize=512m -XX:ReservedCodeCacheSize=512m</argLine>
+ public static class CustomFunction implements VoidFunction<Tuple2<Iterator<byte[]>, HConnection>> {
+ .println("JavaHBaseBulkPutExample {master} {host} {post} {tableName} {columnFamily}");
+JavaReceiverInputDStream<String> javaDstream = jssc.socketTextStream(host, Integer.parseInt(port));
+ private def bulkMutation[T](rdd: RDD[T], tableName: TableName, f: (T) => Mutation, batchSize: Integer) {
+ def hbaseRDD[U: ClassTag](tableName: TableName, scan: Scan, f: ((ImmutableBytesWritable, Result)) => U): RDD[U] = {
+TableMapReduceUtil.initTableMapperJob(tableName, scan, classOf[IdentityTableMapper], null, null, job)
+ list.add((CellUtil.cloneFamily(cell), CellUtil.cloneQualifier(cell), CellUtil.cloneValue(cell)))
+configBroadcast: Broadcast[SerializableWritable[Configuration]],
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.rest.client.TestRemoteTable
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14762//testReport/
Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/patchReleaseAuditWarnings.txt
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14762//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14762//console
This message is automatically generated.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621993#comment-14621993 ]
Kostas Sakellis commented on HBASE-13992:
Just driving by and curious about
bq. gap: data frames are missing key functionality pre-1.4.
What specific features is this referring to?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621073#comment-14621073 ]
Sean Busbey commented on HBASE-13992:
Notes from call today where [~apurtell] and I helped walk through a plan for this with [~ted.m].
h3. Overall Roadmap
* start with current capabilities refactored into a module with hbase packages
* initial support targets Spark 1.3
* later add RDD extension to expose hbase-specific functionality
* later add data frame support, likely as additional module
* start with master-only, use backport patch for branch-1 post refactor and docs
h3. Minimums for Landing on Master
* use an isolated top-level module to reduce impact on rest of project, modeled after the phoenix-spark module
** hbase-spark
** should help limit reach of dependency conflicts. look to phoenix-spark for any conflict resolution.
* target spark 1.3 as upstream dependency
** already works with 1.3
** 1.3 seen as able to reach a large existing user base and stable
** gap: data frames are missing key functionality pre-1.4.
** gap: unknown spark compatibility promises, history of scala version changes may limit how many future versions this can work with.
* move existing implementation classes into an hbase subpackage
** org.apache.hadoop.hbase.spark
** there is currently a workaround to access a spark-package-private set of configs, needed for pre-1.3. since 1.3 is our minimum expected version, remove this workaround.
* leave examples as a part of the single module definition for first landing
* leave tests, including scala tests, as unit tests on the single module for first landing
h3. Follow on for branch-1
* documentation - need to add a ref guide section on using the spark bindings
* examples refactoring - examples should be moved out of the primary module into something like hbase-spark-examples
* tests refactoring - java tests should be categorized to match rest of project. Scala tests should be moved to an IT
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14621098#comment-14621098 ]
Sean Busbey commented on HBASE-13992:
I thought of another follow-on: we need to decide on and annotate the supported public API. Not sure how that or automated docs work for any Scala parts of the implementation.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610738#comment-14610738 ]
Nicolas Liochon commented on HBASE-13992:
+1 as well for me. How does it work for the binaries version, will we have to enter into the scala game, i.e. hbase-spark-2_10? What about the spark version? The spark-hadoop version?
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14610694#comment-14610694 ]
stack commented on HBASE-13992:
[~malaskat] ping if you have questions or need a hand doing up a patch. I'd be game.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14609073#comment-14609073 ]
Enis Soztutar commented on HBASE-13992:
This seems fine as a module. +1 for {{hbase-spark}} as a name. Phoenix also comes with a Spark module itself.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606559#comment-14606559 ]
Andrew Purtell commented on HBASE-13992:
I suggest giving it a day or two for more comments to come in. If nothing changes, I'd say we could proceed with the integration approach of a new Maven module, addressing Elliott's comments.
bq. To make the design I will need help from committers can I get assigned a Committer to give me guidance?
We don't assign committers, but maybe someone will volunteer. Otherwise I suggest writing up what questions you have or how you'd like to proceed, or both, and posting them here. I suspect you'll get guidance back in the responses.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606505#comment-14606505 ]
Andrew Purtell commented on HBASE-13992:
I think this would be a good thing to have in. We can set it up as a separate Maven module so the Spark and Scala dependencies only come in there, optionally by way of a profile that isn't turned on by default but is turned on for the release profile. hbase-spark sounds good as a module name to me.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606289#comment-14606289 ]
Elliott Clark commented on HBASE-13992:
I really like the idea. Spark streaming and HBase make some great sense together.
Thoughts on the code as it is currently:
* Need to scrub/remove cdh and cloudera references.
* Seems really messy as far as structure/layout.
** Why javatest/ and src/test/java
** Why java/ and src/main/java
** Single random scala file in root dir?
* Tons of ide files in here.
* Everything is named example. These should all be bases ( either extendable base classes or scala traits )
* Why java when most of spark is scala?
* Needs some serious work on java/scala docing.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14606567#comment-14606567 ]
Sean Busbey commented on HBASE-13992:
It sounds like so far everyone agrees that as a feature it's a good idea. Feature additions are handled by lazy consensus; so long as someone wants to work on a feature it's up to those who oppose it to speak up to avoid moving forward. Normally the way we'd proceed is you as the contributor would post a short document describing the feature: the problem, the overall approach, some light implementation details. In this case, I personally think the existing materials provide enough context to move forward. The next step after that would be for you to post a patch that adds a new module with the implementation. If you ran into trouble as you went, you'd post a question either here or on dev@hbase and one or more committers would step up to help. What kind of guidance are you looking for? I'm particularly interested in how our contributor guide could provide the needed groundwork for folks to work on complex features like this.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606347#comment-14606347 ]

Ted Malaska commented on HBASE-13992:

1. Yes, the CDH references will be removed.
2. Yes, there is some cleanup we can do.
3. Yes, I will take care of the IDE files.
4. Not everything is named example, only the examples. I can walk you through the code over a WebEx later this week if you have time.
5. Why Java? The main code is in Scala; the only Java is the wrapper, so that it works from Java Spark, and the Java examples.
6. 100% agree.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606575#comment-14606575 ]

Ted Malaska commented on HBASE-13992:

Cool, thanks Sean and Andrew. Sounds great. I will wait a couple of days, then I would love to talk with a committer or two about putting together an outline for package structure and naming conventions. After that I should be good to get a patch made.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606260#comment-14606260 ]

Ted Malaska commented on HBASE-13992:

You know, Sean, if we do that we can rebuild the bulk load in Spark. That would be so cool. I guess that would be a different JIRA, but it should be a JIRA.
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606269#comment-14606269 ]

Sean Busbey commented on HBASE-13992:

One step at a time. :)
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606550#comment-14606550 ]

Ted Malaska commented on HBASE-13992:

So how should we move forward?
- Do we vote on whether this should go ahead, or are we past that?
- To make the design I will need help from committers. Can I get a committer assigned to give me guidance?

Ted Malaska
[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase
[ https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14606238#comment-14606238 ]

Sean Busbey commented on HBASE-13992:

Sweet. Adding this in as a module (hbase-spark?) would also give us a good push to finally move the mapreduce libraries out of hbase-server. Linking HBASE-11843.