[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16427751#comment-16427751 ] stack commented on HBASE-14181: --- Spark removed from branch-2/2.0.0 by HBASE-18817 > Add Spark DataFrame DataSource to HBase-Spark Module > > > Key: HBASE-14181 > URL: https://issues.apache.org/jira/browse/HBASE-14181 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Theodore michael Malaska >Assignee: Theodore michael Malaska >Priority: Minor > Fix For: 3.0.0 > > Attachments: HBASE-14181.1.patch, HBASE-14181.10.patch, > HBASE-14181.11.patch, HBASE-14181.2.patch, HBASE-14181.3.patch, > HBASE-14181.4.patch, HBASE-14181.5.patch, HBASE-14181.6.patch, > HBASE-14181.7.patch, HBASE-14181.8.patch, HBASE-14181.8.patch, > HBASE-14181.9.patch > > > Build a RelationProvider for HBase-Spark Module. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733950#comment-14733950 ] Hudson commented on HBASE-14181: FAILURE: Integrated in HBase-TRUNK #6786 (See [https://builds.apache.org/job/HBase-TRUNK/6786/]) HBASE-14181 Add Spark DataFrame DataSource to HBase-Spark Module (busbey: rev e95358a7fc3f554dcbb351c8b7295cafc01e8c23) * hbase-spark/src/main/java/org/apache/hadoop/hbase/spark/SparkSQLPushDownFilter.java * hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/DefaultSource.scala * hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/ColumnFamilyQualifierMapKeyWrapper.scala * hbase-spark/pom.xml * hbase-protocol/src/main/protobuf/Filter.proto * hbase-protocol/src/main/java/org/apache/hadoop/hbase/protobuf/generated/FilterProtos.java * hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/HBaseContext.scala * hbase-spark/src/test/scala/org/apache/hadoop/hbase/spark/DefaultSourceSuite.scala > Add Spark DataFrame DataSource to HBase-Spark Module > > > Key: HBASE-14181 > URL: https://issues.apache.org/jira/browse/HBASE-14181 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-14181.1.patch, HBASE-14181.10.patch, > HBASE-14181.11.patch, HBASE-14181.2.patch, HBASE-14181.3.patch, > HBASE-14181.4.patch, HBASE-14181.5.patch, HBASE-14181.6.patch, > HBASE-14181.7.patch, HBASE-14181.8.patch, HBASE-14181.8.patch, > HBASE-14181.9.patch > > > Build a RelationProvider for HBase-Spark Module. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14733227#comment-14733227 ] Hadoop QA commented on HBASE-14181: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12754417/HBASE-14181.10.patch against master branch at commit bada19bb54a358233db2b3e23c86d215ac2dc29b. ATTACHMENT ID: 12754417 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1836 checkstyle errors (more than the master's current 1834 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +FilterProtos.SQLPredicatePushDownFilter.Builder builder = FilterProtos.SQLPredicatePushDownFilter +new IllegalArgumentException("Invalid value for " + BATCHING_NUM_KEY +" '" + batchingNumStr + "'", e) +new IllegalArgumentException("Invalid value for " + CACHING_NUM_KEY +" '" + cachingNumStr + "'", e) {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.master.TestDistributedLogSplitting {color:red}-1 core zombie tests{color}. There are 3 zombie test(s): at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testNotCachingDataBlocksDuringCompactionInternals(TestCacheOnWrite.java:484) at org.apache.hadoop.hbase.io.hfile.TestCacheOnWrite.testNotCachingDataBlocksDuringCompaction(TestCacheOnWrite.java:509) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15449//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15449//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15449//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15449//console This message is automatically generated.
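Editor's note on the lineLengths entries in this report: the flagged lines suggest that the v10 patch reports a bad batching or caching value as an IllegalArgumentException that carries the original parse failure as its cause (v9 constructed a NumberFormatException instead). The following is a minimal sketch of that pattern, not the code from the patch. The class name, the method name, and the option key value are hypothetical; only the message format is taken from the report.

```java
import java.util.Map;

final class ScanOptionParser {
    // Hypothetical key value, for illustration only; the real constant is defined in the patch.
    static final String BATCHING_NUM_KEY = "example.batching.num";

    /** Parses an integer option, returning defaultValue when the key is absent. */
    static int parseIntOption(Map<String, String> params, String key, int defaultValue) {
        String raw = params.get(key);
        if (raw == null) {
            return defaultValue;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            // Pass the cause along so the stack trace still shows the parse failure.
            // Splitting the statement keeps each source line under 100 columns.
            throw new IllegalArgumentException(
                "Invalid value for " + key + " '" + raw + "'", e);
        }
    }
}
```

Breaking the constructor call after the opening parenthesis is one way to satisfy the 100-column check without changing the message text.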
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14731828#comment-14731828 ] Hadoop QA commented on HBASE-14181: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12754311/HBASE-14181.9.patch against master branch at commit 2969093b5b39cb950d8710cfffa7e55484d40259. ATTACHMENT ID: 12754311 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1836 checkstyle errors (more than the master's current 1834 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +FilterProtos.SQLPredicatePushDownFilter.Builder builder = FilterProtos.SQLPredicatePushDownFilter +new NumberFormatException("Invalid value for " + BATCHING_NUM_KEY +" '" + batchingNumStr + "'") +new NumberFormatException("Invalid value for " + CACHING_NUM_KEY +" '" + cachingNumStr + "'") {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestWALLockup org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat2.testMRIncrementalLoadWithSplit(TestHFileOutputFormat2.java:385) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15426//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15426//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15426//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15426//console This message is automatically generated.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14732133#comment-14732133 ]
Sean Busbey commented on HBASE-14181:
In addition to the notes on v9 for RB, the checkstyle warnings are legit additions: {code} {code}
Test failures are unrelated AFAICT.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14731742#comment-14731742 ]
Ted Malaska commented on HBASE-14181:
Thank you [~busbey] for the review and thank you for the ship it. I have made the changes and updated the patch. Let me know if anyone else has more comments.
Thanks again,
Ted Malaska
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14726078#comment-14726078 ]
Ted Malaska commented on HBASE-14181:
Hey guys, what is left to do on this ticket? Let me know. I have time today and tomorrow.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14724465#comment-14724465 ] Hadoop QA commented on HBASE-14181: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12753386/HBASE-14181.8.patch against master branch at commit 498c1845ab7b01710955153c27501fdc7492849d. ATTACHMENT ID: 12753386 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0 2.7.1) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1848 checkstyle errors (more than the master's current 1846 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: {color:red}-1 core zombie tests{color}. 
There are 5 zombie test(s): at org.apache.hadoop.hdfs.qjournal.client.TestEpochsAreUnique.testSingleThreaded(TestEpochsAreUnique.java:93) at org.apache.hadoop.hbase.client.TestRestoreSnapshotFromClient.testRestoreSchemaChange(TestRestoreSnapshotFromClient.java:202) at org.apache.hadoop.hbase.client.TestReplicasClient.testSmallScanWithReplicas(TestReplicasClient.java:606) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15367//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15367//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15367//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15367//console This message is automatically generated.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721718#comment-14721718 ]
Ted Malaska commented on HBASE-14181:
I just rechecked this patch and I had no trouble compiling and running the test. What am I missing? I did the following:
1. mvn clean package -DskipTests=true in the root directory
2. mvn -Dtest=NoUnitTests clean verify in the hbase-spark folder
Thanks for the help.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721725#comment-14721725 ]
Sean Busbey commented on HBASE-14181:
It's HBASE-14337. Specifically, when building with Hadoop 2.4 we're missing a license for some transitive dependency.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14721756#comment-14721756 ]
Ted Malaska commented on HBASE-14181:
What should I do? Let me know what else is needed on this one. This and the documentation are the two last big ones.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716944#comment-14716944 ]
Ted Malaska commented on HBASE-14181:
Thanks [~busbey], I missed that comment by [~apurtell]. I will look into it now.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716926#comment-14716926 ]
Sean Busbey commented on HBASE-14181:
From [~apurtell] on RB:
{quote}
Have you looked at the types library? See org.apache.hadoop.hbase.types in hbase-common. Should this share that code/representation for interoperability with types library users? Note the Raw* types are equivalent to encodings made by Bytes.toBytes(type). The DataType type in that package handles encoding of row keys and column qualifiers with typed data.
{quote}
Looking at the current patch, it's still relying on the Bytes utility directly rather than the HBase data types library. Did you get a chance to look at this as an alternative, Ted? Was there a particular rationale for picking one over the other?
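Editor's note on the encoding question raised in this comment: as a rough illustration using only the JDK (no HBase classes), the sketch below writes an int as four big-endian bytes, which is assumed here to match what Bytes.toBytes(int) emits, and then compares two encodings byte by byte as unsigned values. The class and method names are hypothetical. It shows one practical difference between a raw encoding and an order-preserving one: with raw bytes, negative numbers sort after positive ones.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class RawIntEncodingDemo {
    /** Four-byte big-endian encoding; assumed to match the output of Bytes.toBytes(int). */
    static byte[] encodeInt(int value) {
        // ByteBuffer is big-endian by default.
        return ByteBuffer.allocate(Integer.BYTES).putInt(value).array();
    }

    /** Lexicographic comparison of the two encodings, treating each byte as unsigned. */
    static int compareEncoded(int a, int b) {
        return Arrays.compareUnsigned(encodeInt(a), encodeInt(b));
    }
}
```

Here compareEncoded(1, 2) is negative as expected, but compareEncoded(-1, 1) is positive, because -1 encodes to FF FF FF FF. Whether that matters depends on whether the encoded values are ever compared as bytes, for example in row keys or pushed-down range filters.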
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717020#comment-14717020 ]
Ted Malaska commented on HBASE-14181:
I reviewed the Types code in [~apurtell]'s comment. My only concern is SimplePositionedByteRange. I would rather not create one on every call to encode, so I'm thinking about using ThreadLocal. Does anyone have any better solutions? In the meantime I will make a patch with the Types code and a ThreadLocal.
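Editor's note on the ThreadLocal idea in this comment: a minimal sketch of the pattern, assuming a per-thread scratch object that is reset before each use. A JDK ByteBuffer stands in for SimplePositionedByteRange so the example has no HBase dependency; the class and method names are hypothetical and this is not the code from the patch.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

final class ThreadLocalEncoder {
    // One scratch buffer per thread, created lazily on first use.
    // ByteBuffer is a stand-in for SimplePositionedByteRange.
    private static final ThreadLocal<ByteBuffer> SCRATCH =
        ThreadLocal.withInitial(() -> ByteBuffer.allocate(Long.BYTES));

    /** Encodes a long with the calling thread's reusable buffer and returns a copy. */
    static byte[] encodeLong(long value) {
        ByteBuffer buf = SCRATCH.get();
        buf.clear();          // reset position and limit before reuse
        buf.putLong(value);
        // Copy the written bytes out so callers never share the scratch array.
        return Arrays.copyOf(buf.array(), buf.position());
    }
}
```

The copy at the end still allocates the result array; what the pattern saves is constructing a new wrapper object on every call. Threads in a long-lived pool keep their scratch buffer for the lifetime of the thread.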
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14717493#comment-14717493 ]
Hadoop QA commented on HBASE-14181:
{color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12752846/HBASE-14181.8.patch against master branch at commit 8f95318f6252c1c0b7a073619525eae6d991f47b. ATTACHMENT ID: 12752846
{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests.
{color:red}-1 javac{color}. The patch appears to cause mvn compile goal to fail with Hadoop version 2.4.0. Compilation errors resume:
[ERROR] Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-remote-resources-plugin:1.5:process (default) on project hbase-assembly: Error rendering velocity resource. Error invoking method 'get(java.lang.Integer)' in java.util.ArrayList at META-INF/LICENSE.vm[line 1619, column 22]: InvocationTargetException: Index: 0, Size: 0 -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR] mvn <goals> -rf :hbase-assembly
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15305//console
This message is automatically generated.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14711494#comment-14711494 ]
Ted Malaska commented on HBASE-14181:
This JIRA is ready for review. We should be very close to closing this JIRA out at this point.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14700936#comment-14700936 ] Hadoop QA commented on HBASE-14181: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12750939/HBASE-14181.5.patch against master branch at commit 395ec5a9bb48324a8b7dd61790a954a2998a8f80. ATTACHMENT ID: 12750939 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1860 checkstyle errors (more than the master's current 1852 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100:
+ScanRange scanRange = new ScanRange(rowRange.getStopRow().toByteArray(), rowRange.getStopRowInclusive(),
+ val schemaMappingDefinition:java.util.HashMap[String, SchemaQualifierDefinition],
+requiredQualifierDefinitionArray.foreach( d => get.addColumn(d.columnFamilyBytes, d.qualifierBytes))
+requiredQualifierDefinitionArray.foreach( d => scan.addColumn(d.columnFamilyBytes, d.qualifierBytes))
+ requiredQualifierDefinitionArray.foreach( d => scan.addColumn(d.columnFamilyBytes, d.qualifierBytes))
+ var points:mutable.MutableList[Array[Byte]] = new mutable.MutableList[Array[Byte]](),
+ var ranges:mutable.MutableList[ScanRange] = new mutable.MutableList[ScanRange]() ) extends Serializable {
+ requiredQualifierDefinitionArray: mutable.MutableList[SchemaQualifierDefinition]):Unit = {
+ Map("hbase.columns.mapping" -> "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b,",
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.coprocessor.TestRegionObserverScannerOpenHook
org.apache.hadoop.hbase.client.TestMultiParallel
org.apache.hadoop.hbase.snapshot.TestSnapshotClientRetries
org.apache.hadoop.hbase.snapshot.TestFlushSnapshotFromClient
org.apache.hadoop.hbase.TestStochasticBalancerJmxMetrics
org.apache.hadoop.hbase.namespace.TestNamespaceAuditor
org.apache.hadoop.hbase.snapshot.TestMobExportSnapshot
org.apache.hadoop.hbase.snapshot.TestMobFlushSnapshotFromClient
org.apache.hadoop.hbase.snapshot.TestMobRestoreFlushSnapshotFromClient
org.apache.hadoop.hbase.snapshot.TestSecureExportSnapshot
org.apache.hadoop.hbase.client.TestScannersFromClientSide
org.apache.hadoop.hbase.snapshot.TestExportSnapshot
org.apache.hadoop.hbase.snapshot.TestRestoreFlushSnapshotFromClient
{color:red}-1 core zombie tests{color}.
There are 4 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15137//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15137//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15137//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15137//console This message is automatically generated.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14701847#comment-14701847 ] Hadoop QA commented on HBASE-14181: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12751033/HBASE-14181.6.patch against master branch at commit 71d3d24d8b5892121ac9c81606f3c71392a43e0b. ATTACHMENT ID: 12751033 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: org.apache.hadoop.hbase.regionserver.TestRegionReplicaFailover Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15144//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15144//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15144//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15144//console This message is automatically generated.
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14696644#comment-14696644 ]

Hadoop QA commented on HBASE-14181:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12750443/HBASE-14181.4.patch
against master branch at commit 4dd30ab019cfbf3691fd08f7941d33d8bbc37f05.
ATTACHMENT ID: 12750443

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1858 checkstyle errors (more than the master's current 1856 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ val schemaMappingDefinition:java.util.HashMap[String, SchemaQualifierDefinition],
+requiredQualifierDefinitionArray.foreach( d => get.addColumn(d.columnFamilyBytes, d.qualifierBytes))
+requiredQualifierDefinitionArray.foreach( d => scan.addColumn(d.columnFamilyBytes, d.qualifierBytes))
+ requiredQualifierDefinitionArray.foreach( d => scan.addColumn(d.columnFamilyBytes, d.qualifierBytes))
+ requiredQualifierDefinitionArray: mutable.MutableList[SchemaQualifierDefinition]):Unit = {
+ Map("hbase.columns.mapping" -> "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b,",
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.master.procedure.TestWALProcedureStoreOnHDFS

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15095//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15095//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15095//artifact/patchprocess/checkstyle-aggregate.html
Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15095//artifact/patchprocess/patchJavadocWarnings.txt
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15095//console

This message is automatically generated.

> Add Spark DataFrame DataSource to HBase-Spark Module
>
> Key: HBASE-14181
> URL: https://issues.apache.org/jira/browse/HBASE-14181
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Priority: Minor
> Attachments: HBASE-14181.1.patch, HBASE-14181.2.patch, HBASE-14181.3.patch, HBASE-14181.4.patch
>
> Build a RelationProvider for HBase-Spark Module.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14693880#comment-14693880 ]

Ted Malaska commented on HBASE-14181:
-

Making progress. Most of the comments are now in my copy of the code. Going to now add a bunch of unit testing logic and javadoc. Hopefully I will have a patch tomorrow.

> Add Spark DataFrame DataSource to HBase-Spark Module
>
> Key: HBASE-14181
> URL: https://issues.apache.org/jira/browse/HBASE-14181
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Priority: Minor
> Attachments: HBASE-14181.1.patch, HBASE-14181.2.patch, HBASE-14181.3.patch
>
> Build a RelationProvider for HBase-Spark Module.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682012#comment-14682012 ]

Ted Yu commented on HBASE-14181:
---

Understood. Just wanted to see if tests pass.

> Add Spark DataFrame DataSource to HBase-Spark Module
>
> Key: HBASE-14181
> URL: https://issues.apache.org/jira/browse/HBASE-14181
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Priority: Minor
> Attachments: HBASE-14181.1.patch, HBASE-14181.2.patch, HBASE-14181.3.patch
>
> Build a RelationProvider for HBase-Spark Module.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682007#comment-14682007 ]

Ted Malaska commented on HBASE-14181:
-

Thanks [~tedyu] for changing the status. Just a note to people watching: the last patch is totally a beta version that just scraped by working. I will spend the next two days cleaning it up and building good unit tests. There is just a lot going on to make the Spark SQL and HBase interaction work right, so I have to do a couple of passes. Thanks to all the reviewers; they will help me get through this process faster.

> Add Spark DataFrame DataSource to HBase-Spark Module
>
> Key: HBASE-14181
> URL: https://issues.apache.org/jira/browse/HBASE-14181
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Priority: Minor
> Attachments: HBASE-14181.1.patch, HBASE-14181.2.patch, HBASE-14181.3.patch
>
> Build a RelationProvider for HBase-Spark Module.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14682120#comment-14682120 ]

Hadoop QA commented on HBASE-14181:
---

{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12749725/HBASE-14181.3.patch
against master branch at commit 7d4de20cafd6b765bd5f33df72fc0e630d1731f7.
ATTACHMENT ID: 12749725

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:green}+1 tests included{color}. The patch appears to include 2 new or modified tests.
{color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0)
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 1863 checkstyle errors (more than the master's current 1858 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100:
+ public PushDownFilterJava(HashMap<ColumnFamilyQualifierWrapper, ColumnFilter> qualifierFilterTupleList) {
+ def generateSchemaMappingMap(schemaMappingString:String): mutable.Map[String, SchemaQualifierDefinition] = {
+ val pushDownFilterJava = new PushDownFilterJava(columnFilterCollection.generateFamilyQualifiterFilterMap(serializableMap))
+ def buildColumnFilterCollection(parentFilterCollection:ColumnFilterCollection, filter:Filter): Unit = {
+new ScanRange(null, true, Utils.getByteValue(attr,schemaMappingDefinition, value.toString), false)))
+new ScanRange(null, true, Utils.getByteValue(attr,schemaMappingDefinition, value.toString), true)))
+ println(Filter: + Bytes.toString(value) + : + upperBoundPass + , + lowerBoundPass + + result)
+ new ColumnFamilyQualifierWrapper(definition.columnFamilyBytes, definition.qualifierBytes), e._2)
+ Map("hbase.columns.mapping" -> "KEY_FIELD STRING :key, A_FIELD STRING c:a, B_FIELD STRING c:b,",
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
{color:red}-1 core zombie tests{color}. There are 1 zombie test(s):
at org.apache.hadoop.hbase.TestChoreService.testForceTrigger(TestChoreService.java:398)

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/15049//testReport/
Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/15049//artifact/patchprocess/newFindbugsWarnings.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/15049//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/15049//console

This message is automatically generated.

> Add Spark DataFrame DataSource to HBase-Spark Module
>
> Key: HBASE-14181
> URL: https://issues.apache.org/jira/browse/HBASE-14181
> Project: HBase
> Issue Type: New Feature
> Components: spark
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Priority: Minor
> Attachments: HBASE-14181.1.patch, HBASE-14181.2.patch, HBASE-14181.3.patch
>
> Build a RelationProvider for HBase-Spark Module.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14181) Add Spark DataFrame DataSource to HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14652731#comment-14652731 ]

Ted Malaska commented on HBASE-14181:
-

Note that a DataSource in Spark can have a lot of advanced functionality, like filter push down, scan range push down, and column filters. This Jira will try to get a base implementation down, but will leave room for more advanced functionality in additional Jiras.

Ted Malaska

> Add Spark DataFrame DataSource to HBase-Spark Module
>
> Key: HBASE-14181
> URL: https://issues.apache.org/jira/browse/HBASE-14181
> Project: HBase
> Issue Type: New Feature
> Reporter: Ted Malaska
> Assignee: Ted Malaska
> Priority: Minor
>
> Build a RelationProvider for HBase-Spark Module.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
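
For readers unfamiliar with the push-down features named in the comment above, here is a minimal, dependency-free sketch of the idea. It is not the hbase-spark or Spark SQL API: PushDownSketch, Predicate, ScanRange, Plan and plan() are hypothetical names invented for illustration, and the values "get2", "get4" and "foo1" are made up. Only the column names KEY_FIELD and A_FIELD are borrowed from the column mapping quoted in the QA reports. The sketch assumes the source turns range predicates on the row key into a scan range and hands every other predicate back to the query engine.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: these types are hypothetical and are NOT part of
// the Spark SQL or hbase-spark API.
final class PushDownSketch {

    /** A predicate "column op value", as a query engine might offer it to a source. */
    static final class Predicate {
        final String column;
        final String op;
        final String value;

        Predicate(String column, String op, String value) {
            this.column = column;
            this.op = op;
            this.value = value;
        }
    }

    /** Row-key range [start, stop); a null bound means "unbounded". */
    static final class ScanRange {
        final String start;
        final String stop;

        ScanRange(String start, String stop) {
            this.start = start;
            this.stop = stop;
        }
    }

    /** Planning result: the range to scan plus predicates the engine must still apply. */
    static final class Plan {
        final ScanRange range;
        final List<Predicate> residual;

        Plan(ScanRange range, List<Predicate> residual) {
            this.range = range;
            this.residual = residual;
        }
    }

    // Assumed name of the column mapped to the row key.
    static final String KEY_COLUMN = "KEY_FIELD";

    /**
     * Folds ">=" and "<" predicates on the key column into one scan range and
     * returns everything else as residual predicates. String comparison stands
     * in for byte-wise row-key ordering, which is a simplification.
     */
    static Plan plan(List<Predicate> predicates) {
        String start = null;
        String stop = null;
        List<Predicate> residual = new ArrayList<>();
        for (Predicate p : predicates) {
            boolean onKey = KEY_COLUMN.equals(p.column);
            if (onKey && ">=".equals(p.op)) {
                // Keep the largest lower bound seen so far.
                if (start == null || p.value.compareTo(start) > 0) {
                    start = p.value;
                }
            } else if (onKey && "<".equals(p.op)) {
                // Keep the smallest upper bound seen so far.
                if (stop == null || p.value.compareTo(stop) < 0) {
                    stop = p.value;
                }
            } else {
                // Not pushed down in this sketch; the engine filters these rows itself.
                residual.add(p);
            }
        }
        return new Plan(new ScanRange(start, stop), residual);
    }

    public static void main(String[] args) {
        Plan plan = plan(Arrays.asList(
                new Predicate("KEY_FIELD", ">=", "get2"),
                new Predicate("KEY_FIELD", "<", "get4"),
                new Predicate("A_FIELD", "=", "foo1")));
        System.out.println("scan start=" + plan.range.start + " stop=" + plan.range.stop);
        System.out.println("residual predicates=" + plan.residual.size());
    }
}
```

A fuller implementation would presumably also push value predicates down as server-side filters and request only the referenced columns; the sketch leaves both out to keep the range-narrowing step visible.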