[jira] [Commented] (HBASE-15333) Enhance the filter to handle short, integer, long, float and double
[ https://issues.apache.org/jira/browse/HBASE-15333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228495#comment-15228495 ] Ted Malaska commented on HBASE-15333: - I gave it a quick look and it looks good, but let's give [~jmhsieh] some time to look over it. I like that all the unit tests are reused. Have we tested the performance difference? Is there any reason to be concerned about performance? > Enhance the filter to handle short, integer, long, float and double > --- > > Key: HBASE-15333 > URL: https://issues.apache.org/jira/browse/HBASE-15333 > Project: HBase > Issue Type: Sub-task > Reporter: Zhan Zhang > Assignee: Zhan Zhang > Attachments: HBASE-15333-1.patch, HBASE-15333-2.patch, > HBASE-15333-3.patch, HBASE-15333-4.patch, HBASE-15333-5.patch > > > Currently, the range filter is based on the order of bytes. But for Java > primitive types, such as short, int, long, double, and float, their numeric order is > not consistent with their byte order, so extra manipulation has to be in place > to handle them correctly. > For example, for the integer range (-100, 100) with the filter <= 1, the current > filter will return only 0 and 1, while the right return value should be (-100, 1] -- This message was sent by Atlassian JIRA (v6.3.4#6332)
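The byte-order problem the description refers to can be demonstrated with plain Java. This is a self-contained illustration, not code from the patch; `enc` here mimics the big-endian encoding that HBase's `Bytes.toBytes(int)` produces, and `byteCompare` mimics the lexicographic unsigned-byte order that HBase filters use:

```java
import java.nio.ByteBuffer;

public class SignedByteOrder {
    // Big-endian 4-byte encoding of an int (same layout Bytes.toBytes(int) uses).
    static byte[] enc(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Lexicographic comparison of unsigned bytes, the order HBase sorts by.
    static int byteCompare(byte[] a, byte[] b) {
        for (int i = 0; i < 4; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) return d;
        }
        return 0;
    }

    public static void main(String[] args) {
        // -100 encodes with a leading 0xFF byte, so byte-wise it sorts AFTER 1,
        // which is why a byte-order range filter drops the negative part of (-100, 1].
        System.out.println(byteCompare(enc(-100), enc(1)) > 0);   // true: wrong numeric order

        // One standard fix: flip the sign bit before encoding, restoring numeric order.
        System.out.println(byteCompare(enc(-100 ^ Integer.MIN_VALUE),
                                       enc(1 ^ Integer.MIN_VALUE)) < 0);  // true
    }
}
```

The sign-bit flip works because it maps the signed range [-2^31, 2^31-1] onto the unsigned range [0, 2^32-1] monotonically; floats and doubles need a slightly different transform.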
[jira] [Commented] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures
[ https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186180#comment-15186180 ] Ted Malaska commented on HBASE-15271: - Hey [~busbey], the current tests verify that everything still works the same with the rename added. They don't, however, test that a rename happened. > Spark Bulk Load: Need to write HFiles to tmp location then rename to protect > from Spark Executor Failures > - > > Key: HBASE-15271 > URL: https://issues.apache.org/jira/browse/HBASE-15271 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0 > Reporter: Ted Malaska > Assignee: Ted Malaska > Fix For: 2.0.0 > > Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, > HBASE-15271.3.patch, HBASE-15271.4.patch > > > With the current code, if an executor fails before the HFile is closed, it > will cause problems. This jira will have the files first written out to a file > that starts with an underscore. Then, when the HFile is complete, it will be > renamed and the underscore will be removed. > The underscore is important because the bulk load functionality will skip > files with an underscore.
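The write-then-rename pattern described above can be sketched with plain `java.nio.file` in place of the actual HDFS/HFile writer classes. All names here are illustrative assumptions, not the patch's code:

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class TmpThenRename {
    // Scratch directory standing in for the bulk-load output directory.
    static Path tempDir() {
        try {
            return Files.createTempDirectory("hfiles");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Write under an underscore-prefixed name, then rename on completion.
    // The bulk load step skips files that start with an underscore, so a
    // half-written file left behind by a dead executor is never picked up.
    static Path writeHFile(Path dir, String name, byte[] data) {
        try {
            Path tmp = dir.resolve("_" + name);
            Files.write(tmp, data);                // may be abandoned if the writer dies
            Path done = dir.resolve(name);
            Files.move(tmp, done, StandardCopyOption.ATOMIC_MOVE); // only once complete
            return done;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The rename is the commit point: until it happens, the loader sees only an ignorable `_`-prefixed file.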
[jira] [Commented] (HBASE-14789) Enhance the current spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168157#comment-15168157 ] Ted Malaska commented on HBASE-14789: - This looks really cool. Can we add a couple more? 5. Add support for DECIMAL 6. Add support for Nested Types 7. Add support for write with Bulk Load vs Puts with SparkSQL 8. Add support for a pluggable cell format change (this is to be implemented for item 2) > Enhance the current spark-hbase connector > - > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Improvement > Reporter: Zhan Zhang > Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to optimize the RDD construction in the current connector > implementation.
[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166481#comment-15166481 ] Ted Malaska commented on HBASE-14801: - I'm having a full day; I will be able to look tomorrow. But before we move forward with this, I would like to see at least two more committers review this patch, because this patch includes two components: 1. whether the code is correct and tested, and 2. whether the new style of defining tables is what we want to commit to long term. The code review I can do, but for the style review I would like a couple more people to give their say, because if we make this change I would like to not have to change it again in the future. > Enhance the Spark-HBase connector catalog with json format > -- > > Key: HBASE-14801 > URL: https://issues.apache.org/jira/browse/HBASE-14801 > Project: HBase > Issue Type: Improvement > Reporter: Zhan Zhang > Assignee: Zhan Zhang > Attachments: HBASE-14801-1.patch, HBASE-14801-2.patch, > HBASE-14801-3.patch, HBASE-14801-4.patch
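For context on what "defining tables" means here: the catalog maps a SparkSQL table schema onto an HBase row key and column families as a JSON document. The fragment below is only a hedged illustration of the general shape such a catalog takes; the field names and structure are assumptions, and the attached patches are the authoritative reference:

```json
{
  "table": {"namespace": "default", "name": "my_table"},
  "rowkey": "key",
  "columns": {
    "id":   {"cf": "rowkey", "col": "key", "type": "string"},
    "col1": {"cf": "cf1",    "col": "q1",  "type": "int"}
  }
}
```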
[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
[ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159987#comment-15159987 ] Ted Malaska commented on HBASE-15184: - Thank you, Ted Yu. > SparkSQL Scan operation doesn't work on kerberos cluster > > > Key: HBASE-15184 > URL: https://issues.apache.org/jira/browse/HBASE-15184 > Project: HBase > Issue Type: Bug > Components: spark > Reporter: Ted Malaska > Assignee: Ted Malaska > Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15184.1.patch, HBaseSparkModule.zip > > > I was using the HBase Spark Module at a client with Kerberos and I ran into > an issue with the Scan. > I made a fix for the client, but we need to put it back into HBase. I will > attach my solution, but it has a major problem: I had to override a > protected class in Spark. I will need help to discover a better approach
[jira] [Updated] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
[ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15184: Attachment: HBASE-15184.1.patch This was tested both in unit tests and on a 10-node kerberos cluster > SparkSQL Scan operation doesn't work on kerberos cluster > > > Key: HBASE-15184 > URL: https://issues.apache.org/jira/browse/HBASE-15184 > Project: HBase > Issue Type: Bug > Components: spark > Reporter: Ted Malaska > Assignee: Ted Malaska > Priority: Critical > Fix For: 2.0.0 > > Attachments: HBASE-15184.1.patch, HBaseSparkModule.zip > > > I was using the HBase Spark Module at a client with Kerberos and I ran into > an issue with the Scan. > I made a fix for the client, but we need to put it back into HBase. I will > attach my solution, but it has a major problem: I had to override a > protected class in Spark. I will need help to discover a better approach
[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
[ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156351#comment-15156351 ] Ted Malaska commented on HBASE-15184: - OK, finally backported the code and ran it on my Kerberos cluster, and yup, it breaks. Going to make the changes tonight, and hopefully tomorrow we will have something. > SparkSQL Scan operation doesn't work on kerberos cluster > > > Key: HBASE-15184 > URL: https://issues.apache.org/jira/browse/HBASE-15184 > Project: HBase > Issue Type: Bug > Components: spark > Reporter: Ted Malaska > Assignee: Ted Malaska > Priority: Critical > Fix For: 2.0.0 > > Attachments: HBaseSparkModule.zip > > > I was using the HBase Spark Module at a client with Kerberos and I ran into > an issue with the Scan. > I made a fix for the client, but we need to put it back into HBase. I will > attach my solution, but it has a major problem: I had to override a > protected class in Spark. I will need help to discover a better approach
[jira] [Commented] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures
[ https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154871#comment-15154871 ] Ted Malaska commented on HBASE-15271: - Thank you, Jon H and Ted Y. > Spark Bulk Load: Need to write HFiles to tmp location then rename to protect > from Spark Executor Failures > - > > Key: HBASE-15271 > URL: https://issues.apache.org/jira/browse/HBASE-15271 > Project: HBase > Issue Type: Bug > Affects Versions: 2.0.0 > Reporter: Ted Malaska > Assignee: Ted Malaska > Fix For: 2.0.0 > > Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, > HBASE-15271.3.patch, HBASE-15271.4.patch > > > With the current code, if an executor fails before the HFile is closed, it > will cause problems. This jira will have the files first written out to a file > that starts with an underscore. Then, when the HFile is complete, it will be > renamed and the underscore will be removed. > The underscore is important because the bulk load functionality will skip > files with an underscore.
[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures
[ https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15271: Attachment: HBASE-15271.4.patch Made changes based on Jon H's comments > Spark Bulk Load: Need to write HFiles to tmp location then rename to protect > from Spark Executor Failures > - > > Key: HBASE-15271 > URL: https://issues.apache.org/jira/browse/HBASE-15271 > Project: HBase > Issue Type: Bug > Reporter: Ted Malaska > Assignee: Ted Malaska > Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, > HBASE-15271.3.patch, HBASE-15271.4.patch > > > With the current code, if an executor fails before the HFile is closed, it > will cause problems. This jira will have the files first written out to a file > that starts with an underscore. Then, when the HFile is complete, it will be > renamed and the underscore will be removed. > The underscore is important because the bulk load functionality will skip > files with an underscore.
[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures
[ https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15271: Attachment: HBASE-15271.3.patch Made changes for Ted Yu's comment > Spark Bulk Load: Need to write HFiles to tmp location then rename to protect > from Spark Executor Failures > - > > Key: HBASE-15271 > URL: https://issues.apache.org/jira/browse/HBASE-15271 > Project: HBase > Issue Type: Bug > Reporter: Ted Malaska > Assignee: Ted Malaska > Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, > HBASE-15271.3.patch > > > With the current code, if an executor fails before the HFile is closed, it > will cause problems. This jira will have the files first written out to a file > that starts with an underscore. Then, when the HFile is complete, it will be > renamed and the underscore will be removed. > The underscore is important because the bulk load functionality will skip > files with an underscore.
[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures
[ https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15271: Attachment: HBASE-15271.2.patch Added change based on Ted Yu's comment > Spark Bulk Load: Need to write HFiles to tmp location then rename to protect > from Spark Executor Failures > - > > Key: HBASE-15271 > URL: https://issues.apache.org/jira/browse/HBASE-15271 > Project: HBase > Issue Type: Bug > Reporter: Ted Malaska > Assignee: Ted Malaska > Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch > > > With the current code, if an executor fails before the HFile is closed, it > will cause problems. This jira will have the files first written out to a file > that starts with an underscore. Then, when the HFile is complete, it will be > renamed and the underscore will be removed. > The underscore is important because the bulk load functionality will skip > files with an underscore.
[jira] [Commented] (HBASE-15282) Bump Spark on Hbase to use Spark 1.6.
[ https://issues.apache.org/jira/browse/HBASE-15282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153484#comment-15153484 ] Ted Malaska commented on HBASE-15282: - +1 > Bump Spark on Hbase to use Spark 1.6. > - > > Key: HBASE-15282 > URL: https://issues.apache.org/jira/browse/HBASE-15282 > Project: HBase > Issue Type: Improvement > Components: spark > Affects Versions: 2.0.0 > Reporter: Jonathan Hsieh > Assignee: Jonathan Hsieh > Fix For: 2.0.0 > > Attachments: hbase-15282.patch > > > The latest stable Spark is spark 1.6. [1] > Let's bump the version. > [1] http://spark.apache.org/news/spark-1-6-0-released.html
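A version bump like this is normally a one-line change to the module's Maven build. A sketch only, assuming a `spark.version` property; the actual property name and location in the attached patch may differ:

```xml
<!-- hbase-spark/pom.xml (illustrative; see hbase-15282.patch for the real change) -->
<properties>
  <spark.version>1.6.0</spark.version>
</properties>
```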
[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures
[ https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15271: Attachment: HBASE-15271.1.patch First draft; built and ran tests > Spark Bulk Load: Need to write HFiles to tmp location then rename to protect > from Spark Executor Failures > - > > Key: HBASE-15271 > URL: https://issues.apache.org/jira/browse/HBASE-15271 > Project: HBase > Issue Type: Bug > Reporter: Ted Malaska > Assignee: Ted Malaska > Attachments: HBASE-15271.1.patch > > > With the current code, if an executor fails before the HFile is closed, it > will cause problems. This jira will have the files first written out to a file > that starts with an underscore. Then, when the HFile is complete, it will be > renamed and the underscore will be removed. > The underscore is important because the bulk load functionality will skip > files with an underscore.
[jira] [Created] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures
Ted Malaska created HBASE-15271: --- Summary: Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures Key: HBASE-15271 URL: https://issues.apache.org/jira/browse/HBASE-15271 Project: HBase Issue Type: Bug Reporter: Ted Malaska Assignee: Ted Malaska With the current code, if an executor fails before the HFile is closed, it will cause problems. This jira will have the files first written out to a file that starts with an underscore. Then, when the HFile is complete, it will be renamed and the underscore will be removed. The underscore is important because the bulk load functionality will skip files with an underscore.
[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
[ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147419#comment-15147419 ] Ted Malaska commented on HBASE-15184: - I'm also here on this Jira. I'm on a client application now and I want to make sure everything is solid before I submit the patch. I'm sorry this is taking longer than initially planned, but I want to finish my testing on the kerberos cluster first. > SparkSQL Scan operation doesn't work on kerberos cluster > > > Key: HBASE-15184 > URL: https://issues.apache.org/jira/browse/HBASE-15184 > Project: HBase > Issue Type: Bug > Components: spark > Reporter: Ted Malaska > Assignee: Ted Malaska > Priority: Critical > Fix For: 2.0.0 > > Attachments: HBaseSparkModule.zip > > > I was using the HBase Spark Module at a client with Kerberos and I ran into > an issue with the Scan. > I made a fix for the client, but we need to put it back into HBase. I will > attach my solution, but it has a major problem: I had to override a > protected class in Spark. I will need help to discover a better approach
[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException
[ https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136012#comment-15136012 ] Ted Malaska commented on HBASE-15225: - I have a version that works on kerberos clusters that my clients are using. I'm back from vacation this week and I will add it to HBase soon. Let me see if I can get it onto my GitHub. > Connecting to HBase via newAPIHadoopRDD in PySpark gives > org.apache.hadoop.hbase.client.RetriesExhaustedException > -- > > Key: HBASE-15225 > URL: https://issues.apache.org/jira/browse/HBASE-15225 > Project: HBase > Issue Type: Bug > Affects Versions: 0.98.4 > Environment: spark 1.6.0, HBase 0.98.4, kerberos, > hbase.rpc.protection set to authentication. > Reporter: Sanjay Kumar > > Unable to read HBase table into Spark with hbase security authentication set > to kerberos. Seeing the following error. > : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=31, exceptions: > Thu Feb 04 22:01:55 CST 2016, > org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, > java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed > on local exception: java.io.EOFException > [the same entry repeats for the remaining retry attempts through Thu Feb 04 22:09:46 CST 2016, with one "Connection reset by peer" in place of the EOFException]
[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException
[ https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136013#comment-15136013 ] Ted Malaska commented on HBASE-15225: - OK, I have updated the GitHub. This has everything that is in HBase, but with the scan kerberos fix, plus it is backported to CDH 5.5. If you are not using CDH then just make the needed changes and fork the code. https://github.com/tmalaska/SparkOnHBase This week I will work with the HBase committers to get the kerberos fix checked in. Let me know if there is anything else you need. > Connecting to HBase via newAPIHadoopRDD in PySpark gives > org.apache.hadoop.hbase.client.RetriesExhaustedException > -- > > Key: HBASE-15225 > URL: https://issues.apache.org/jira/browse/HBASE-15225 > Project: HBase > Issue Type: Bug > Affects Versions: 0.98.4 > Environment: spark 1.6.0, HBase 0.98.4, kerberos, > hbase.rpc.protection set to authentication. > Reporter: Sanjay Kumar > > Unable to read HBase table into Spark with hbase security authentication set > to kerberos. Seeing the following error. > : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=31, exceptions: > Thu Feb 04 22:01:55 CST 2016, > org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, > java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed > on local exception: java.io.EOFException > [the same entry repeats for the remaining retry attempts; the quoted log is truncated here in the original message]
[jira] [Commented] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value
[ https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136083#comment-15136083 ] Ted Malaska commented on HBASE-14340: - Thank you Andrew for your review. :) > Add second bulk load option to Spark Bulk Load to send puts as the value > > > Key: HBASE-14340 > URL: https://issues.apache.org/jira/browse/HBASE-14340 > Project: HBase > Issue Type: New Feature > Components: spark > Reporter: Ted Malaska > Assignee: Ted Malaska > Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch > > > The initial bulk load option for Spark bulk load sends values over one by one > through the shuffle. This is similar to how the original MR bulk load > worked. > However, the MR bulk loader has more than one bulk load option. There is a > second option that allows all the Column Families, Qualifiers, and Values > of a row to be combined on the map side. > This only works if the row is not super wide. > But if the row is not super wide, this method of sending values through the > shuffle will reduce the data and work the shuffle has to deal with.
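The map-side combine described above can be sketched in plain Java. This is illustrative only; the real patch works on Spark RDDs and HBase cell types, and the row/cell layout used here is an assumption:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class RowCombine {
    // Instead of shuffling one record per (rowKey, family, qualifier, value) cell,
    // group every cell of a row together so the shuffle moves one record per row.
    // Only safe when rows are not extremely wide, since a whole row must fit in memory.
    static Map<String, List<String[]>> combineByRow(List<String[]> cells) {
        // each element of cells is {rowKey, family, qualifier, value} (illustrative layout)
        Map<String, List<String[]>> byRow = new TreeMap<>();
        for (String[] c : cells) {
            byRow.computeIfAbsent(c[0], k -> new ArrayList<>())
                 .add(new String[] {c[1], c[2], c[3]});
        }
        return byRow;
    }
}
```

Three input cells across two rows become two shuffle records instead of three; the savings grow with the number of cells per row.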
[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
[ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136082#comment-15136082 ] Ted Malaska commented on HBASE-15184: - OK, I'm back from vacation; I will try to finish a patch by mid-week. [~asrabkin] the problem is really simple. If a scan operation is the first operation you do in your Spark context (you haven't done a map or foreach yet, for example), then the readers will not have the kerberos creds applied yet and the scan will fail. In the Zip file and in https://github.com/tmalaska/SparkOnHBase I solved this in one way that required an extension of a class that I shouldn't have extended. Since the creation of this Jira I had a vacation, and on that vacation I figured out how to implement this solution without doing that extension, which is a no-no. I have a busy week, but I will try to get this jira in. Thx > SparkSQL Scan operation doesn't work on kerberos cluster > > > Key: HBASE-15184 > URL: https://issues.apache.org/jira/browse/HBASE-15184 > Project: HBase > Issue Type: Bug > Components: spark > Reporter: Ted Malaska > Assignee: Ted Malaska > Priority: Critical > Fix For: 2.0.0 > > Attachments: HBaseSparkModule.zip > > > I was using the HBase Spark Module at a client with Kerberos and I ran into > an issue with the Scan. > I made a fix for the client, but we need to put it back into HBase. I will > attach my solution, but it has a major problem: I had to override a > protected class in Spark. I will need help to discover a better approach
[jira] [Assigned] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
[ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska reassigned HBASE-15184: --- Assignee: Ted Malaska > SparkSQL Scan operation doesn't work on kerberos cluster > > > Key: HBASE-15184 > URL: https://issues.apache.org/jira/browse/HBASE-15184 > Project: HBase > Issue Type: Bug > Components: spark > Reporter: Ted Malaska > Assignee: Ted Malaska > Priority: Critical > Fix For: 2.0.0 > > Attachments: HBaseSparkModule.zip > > > I was using the HBase Spark Module at a client with Kerberos and I ran into > an issue with the Scan. > I made a fix for the client, but we need to put it back into HBase. I will > attach my solution, but it has a major problem: I had to override a > protected class in Spark. I will need help to discover a better approach
[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException
[ https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135624#comment-15135624 ] Ted Malaska commented on HBASE-15225: - This should be marked as involved Jira. [~88.sanjay] don't use newAPIHadoopRDD use the functions defined in the HBaseContext object which is defined in the hbaseSpark Module. That will take care of all you Spark to HBase connection issues. Documentation can be found here: https://hbase.apache.org/book.html#spark > Connecting to HBase via newAPIHadoopRDD in PySpark gives > org.apache.hadoop.hbase.client.RetriesExhaustedException > -- > > Key: HBASE-15225 > URL: https://issues.apache.org/jira/browse/HBASE-15225 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.4 > Environment: spark 1.6.0 , Hbase 0.98.4, kerberos, > hbase.rpc.protection set to authentication. >Reporter: Sanjay Kumar > > Unable to read HBase table into Spark with hbase security authentication set > to kerberos. Seeing the following error. 
> : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=31, exceptions: > Thu Feb 04 22:01:55 CST 2016, > org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, > java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed > on local exception: java.io.EOFException > Thu Feb 04 22:01:56 CST 2016, > org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, > java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed > on local exception: java.io.EOFException > [the same call fails with java.io.EOFException on every retry from 22:01:56 through 22:04:44] > Thu Feb 04 22:05:04 CST 2016, > org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, > java.io.IOException: Call to
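The advice above (HBaseContext instead of newAPIHadoopRDD) can be sketched roughly as follows. This is an illustrative, non-runnable fragment modeled on the hbase-spark module's Java API as described in the referenced book chapter; the table name and Scan settings are assumptions, and it requires the hbase-spark dependency plus a live, kerberized HBase cluster:

```java
// Sketch only (assumed names/settings); not runnable without a cluster.
SparkConf sparkConf = new SparkConf().setAppName("HBaseRead");
JavaSparkContext jsc = new JavaSparkContext(sparkConf);
Configuration conf = HBaseConfiguration.create(); // picks up hbase-site.xml
// JavaHBaseContext manages the HBase connection (and credentials) on the
// executors, which is what newAPIHadoopRDD fails to do here.
JavaHBaseContext hbaseContext = new JavaHBaseContext(jsc, conf);
Scan scan = new Scan();
JavaRDD<Tuple2<ImmutableBytesWritable, Result>> rdd =
    hbaseContext.hbaseRDD(TableName.valueOf("someTable"), scan);
```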
[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException
[ https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135626#comment-15135626 ] Ted Malaska commented on HBASE-15225: - Also, if you are using PySpark, then use the Spark SQL to HBase access pattern, also found here: https://hbase.apache.org/book.html#_sparksql_dataframes > Connecting to HBase via newAPIHadoopRDD in PySpark gives > org.apache.hadoop.hbase.client.RetriesExhaustedException > -- > > Key: HBASE-15225 > URL: https://issues.apache.org/jira/browse/HBASE-15225 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.4 > Environment: Spark 1.6.0, HBase 0.98.4, Kerberos, > hbase.rpc.protection set to authentication. >Reporter: Sanjay Kumar > > Unable to read an HBase table into Spark with HBase security authentication set > to Kerberos. Seeing the following error. > : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after > attempts=31, exceptions: > [the same java.io.IOException ("Call to d-767tfz1.target.com/10.66.241.13:60020 failed on local exception: java.io.EOFException") repeats on every retry from 22:01:55 through 22:04:44, then] > Thu Feb 04 22:05:04 CST 2016, > org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, > java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed > on local exception: java.io.IOException: Connection reset by peer > . > . > . > Thu Feb 04 22:09:46 CST 2016, >
[jira] [Updated] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
[ https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15184: Attachment: HBaseSparkModule.zip Solution that worked on CDH 5.5 on a client's Kerberos cluster; note that it also includes a Spark package to override a protected class. > SparkSQL Scan operation doesn't work on kerberos cluster > > > Key: HBASE-15184 > URL: https://issues.apache.org/jira/browse/HBASE-15184 > Project: HBase > Issue Type: Bug >Reporter: Ted Malaska > Attachments: HBaseSparkModule.zip > > > I was using the HBase Spark module at a client with Kerberos and I ran into > an issue with the Scan. > I made a fix for the client, but we need to put it back into HBase. I will > attach my solution, but it has a major problem: I had to override a > protected class in Spark. I will need help to discover a better approach -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster
Ted Malaska created HBASE-15184: --- Summary: SparkSQL Scan operation doesn't work on kerberos cluster Key: HBASE-15184 URL: https://issues.apache.org/jira/browse/HBASE-15184 Project: HBase Issue Type: Bug Reporter: Ted Malaska I was using the HBase Spark module at a client with Kerberos and I ran into an issue with the Scan. I made a fix for the client, but we need to put it back into HBase. I will attach my solution, but it has a major problem: I had to override a protected class in Spark. I will need help to discover a better approach
[jira] [Commented] (HBASE-14796) Enhance the Gets in the connector
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072938#comment-15072938 ] Ted Malaska commented on HBASE-14796: - Zhan, good points. I agree; even if it is slower, it is better. Thanks > Enhance the Gets in the connector > - > > Key: HBASE-14796 > URL: https://issues.apache.org/jira/browse/HBASE-14796 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: HBASE-14976.patch > > > Currently, the Spark module's Spark SQL implementation gets records from HBase > from the driver if there is something like the following found in the SQL: > rowkey = 123 > The original reason for this was that normal SQL will not have many equals > operations in a single where clause. > Zhan had brought up two points that have value: > 1. The SQL may be generated and may have many equals statements in it, so > moving the work to an executor protects the driver from load > 2. In the current implementation the driver is connecting to HBase, and > exceptions may cause trouble with the Spark application and not just with > a single task execution
[jira] [Updated] (HBASE-15036) Update HBase Spark documentation to include bulk load with thin records
[ https://issues.apache.org/jira/browse/HBASE-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15036: Attachment: HBASE-15036.patch First Draft > Update HBase Spark documentation to include bulk load with thin records > --- > > Key: HBASE-15036 > URL: https://issues.apache.org/jira/browse/HBASE-15036 > Project: HBase > Issue Type: New Feature >Reporter: Ted Malaska >Assignee: Ted Malaska > Attachments: HBASE-15036.patch > >
[jira] [Updated] (HBASE-15036) Update HBase Spark documentation to include bulk load with thin records
[ https://issues.apache.org/jira/browse/HBASE-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-15036: Attachment: HBASE-15036.1.patch Removed extra spaces > Update HBase Spark documentation to include bulk load with thin records > --- > > Key: HBASE-15036 > URL: https://issues.apache.org/jira/browse/HBASE-15036 > Project: HBase > Issue Type: New Feature >Reporter: Ted Malaska >Assignee: Ted Malaska > Attachments: HBASE-15036.1.patch, HBASE-15036.patch > >
[jira] [Commented] (HBASE-14796) Enhance the Gets in the connector
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070177#comment-15070177 ] Ted Malaska commented on HBASE-14796: - I reviewed the code and I'm giving it a +1. Did you do the performance tests? As long as we are not going slower, I'm good here. The test should be done on a cluster, not in local mode. It can be done on warm YARN containers; we don't need to count the time to start YARN. I would like to see what the difference in time is when running the following tests: 1. a select statement with a single get 2. a select statement with 10 gets 3. a select statement with 1000 gets Maybe we should also test with different row sizes: 1. 300 bytes 2. 3 KB 3. 30 KB Let me know what you think. Thanks again Zhan > Enhance the Gets in the connector > - > > Key: HBASE-14796 > URL: https://issues.apache.org/jira/browse/HBASE-14796 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: HBASE-14976.patch > > > Currently, the Spark module's Spark SQL implementation gets records from HBase > from the driver if there is something like the following found in the SQL: > rowkey = 123 > The original reason for this was that normal SQL will not have many equals > operations in a single where clause. > Zhan had brought up two points that have value: > 1. The SQL may be generated and may have many equals statements in it, so > moving the work to an executor protects the driver from load > 2. In the current implementation the driver is connecting to HBase, and > exceptions may cause trouble with the Spark application and not just with > a single task execution
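The comparison requested above (time a select at 1, 10, and 1000 gets) could be driven by a small harness along these lines. This is a hypothetical sketch, not part of any patch; the workload lambda here is a CPU stand-in, and on a real cluster it would instead issue the SQL statement against the connector:

```java
import java.util.function.IntConsumer;

// Hypothetical timing harness for the test plan above: run a "query" for
// each get count and report wall-clock time in milliseconds.
public class GetTimingHarness {
    static long timeMillis(IntConsumer workload, int gets) {
        long start = System.nanoTime();
        workload.accept(gets);                 // the measured work
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        int[] getCounts = {1, 10, 1000};
        // Stand-in workload: on a cluster this would run the select
        // statement with n equals predicates through the connector.
        IntConsumer fakeQuery = n -> {
            long sum = 0;
            for (int i = 0; i < n * 1000; i++) sum += i;
        };
        for (int n : getCounts) {
            System.out.println(n + " gets: " + timeMillis(fakeQuery, n) + " ms");
        }
    }
}
```

The same loop could be repeated per row size (300 bytes, 3 KB, 30 KB) by parameterizing the workload.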
[jira] [Created] (HBASE-15036) Update HBase Spark documentation to include bulk load with thin records
Ted Malaska created HBASE-15036: --- Summary: Update HBase Spark documentation to include bulk load with thin records Key: HBASE-15036 URL: https://issues.apache.org/jira/browse/HBASE-15036 Project: HBase Issue Type: New Feature Reporter: Ted Malaska Assignee: Ted Malaska
[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064491#comment-15064491 ] Ted Malaska commented on HBASE-14849: - I reviewed your changes. I like what you did, so I give it a +1, but let one more person review. Thanks Zhan > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Zhan Zhang > Attachments: HBASE-14849-1.patch, HBASE-14849-2.patch, > HBASE-14849.patch > > > I was working at a client with a ported-down version of the Spark module for > HBase and realized we didn't add an option to turn off the block cache for the > scans. > At the client I just disabled all caching with Spark SQL; this is an easy but > very impactful fix. > The fix for this patch will make this configurable
[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062634#comment-15062634 ] Ted Malaska commented on HBASE-14849: - I added comments > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Zhan Zhang > Attachments: HBASE-14849-1.patch, HBASE-14849.patch > > > I was working at a client with a ported-down version of the Spark module for > HBase and realized we didn't add an option to turn off the block cache for the > scans. > At the client I just disabled all caching with Spark SQL; this is an easy but > very impactful fix. > The fix for this patch will make this configurable
[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062684#comment-15062684 ] Ted Malaska commented on HBASE-14849: - Thanks Zhan, I updated again. Thank you for the work > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Zhan Zhang > Attachments: HBASE-14849-1.patch, HBASE-14849.patch > > > I was working at a client with a ported-down version of the Spark module for > HBase and realized we didn't add an option to turn off the block cache for the > scans. > At the client I just disabled all caching with Spark SQL; this is an easy but > very impactful fix. > The fix for this patch will make this configurable
[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059253#comment-15059253 ] Ted Malaska commented on HBASE-14849: - Can you create the Review Board entry? Thanks > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Zhan Zhang > Attachments: HBASE-14849.patch > > > I was working at a client with a ported-down version of the Spark module for > HBase and realized we didn't add an option to turn off the block cache for the > scans. > At the client I just disabled all caching with Spark SQL; this is an easy but > very impactful fix. > The fix for this patch will make this configurable
[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14849: Assignee: Zhan Zhang (was: Ted Malaska) > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Zhan Zhang > > I was working at a client with a ported-down version of the Spark module for > HBase and realized we didn't add an option to turn off the block cache for the > scans. > At the client I just disabled all caching with Spark SQL; this is an easy but > very impactful fix. > The fix for this patch will make this configurable
[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057071#comment-15057071 ] Ted Malaska commented on HBASE-14849: - Thanks Zhan. I will be able to review as soon as you finish > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Zhan Zhang > > I was working at a client with a ported-down version of the Spark module for > HBase and realized we didn't add an option to turn off the block cache for the > scans. > At the client I just disabled all caching with Spark SQL; this is an easy but > very impactful fix. > The fix for this patch will make this configurable
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052639#comment-15052639 ] Ted Malaska commented on HBASE-14795: - Will do, but don't hold up this patch for my testing. If I find anything, we will connect through a new jira. Also, once this patch is in, I would like to get HBASE-14849 started and checked in. Let me know if you want to do HBASE-14849 or if you're ok with me doing it. We can have that chat on the HBASE-14849 jira. Thanks again Zhan > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, > HBASE-14795-1.patch, HBASE-14795-2.patch, HBASE-14795-3.patch, > HBASE-14795-4.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
[jira] [Commented] (HBASE-14929) There is a space missing from Table "foo" is not currently available.
[ https://issues.apache.org/jira/browse/HBASE-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053519#comment-15053519 ] Ted Malaska commented on HBASE-14929: - Great, let's get the patch file up on the jira and create a Review Board entry, and I will help review. > There is a space missing from Table "foo" is not currently available. > - > > Key: HBASE-14929 > URL: https://issues.apache.org/jira/browse/HBASE-14929 > Project: HBase > Issue Type: Bug >Reporter: Ted Malaska >Assignee: Carlos A. Morillo >Priority: Trivial > > Go to the following line in LoadIncrementalHFiles.java > throw new TableNotFoundException("Table " + table.getName() + "is not > currently available."); > and add a space before is and after '
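The one-character bug being patched above comes down to a missing space in a string concatenation. A minimal illustration (the class and method names here are made up for the example, not the actual patch to LoadIncrementalHFiles):

```java
// Illustrative only: shows the message before and after adding the space.
public class TableMessage {
    // Mirrors the buggy concatenation: "Table " + name + "is not..."
    static String broken(String name) {
        return "Table " + name + "is not currently available.";
    }
    // One possible fix: a space before "is".
    static String fixed(String name) {
        return "Table " + name + " is not currently available.";
    }
    public static void main(String[] args) {
        System.out.println(broken("foo")); // Table foois not currently available.
        System.out.println(fixed("foo"));  // Table foo is not currently available.
    }
}
```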
[jira] [Commented] (HBASE-14929) There is a space missing from Table "foo" is not currently available.
[ https://issues.apache.org/jira/browse/HBASE-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053531#comment-15053531 ] Ted Malaska commented on HBASE-14929: - +1 > There is a space missing from Table "foo" is not currently available. > - > > Key: HBASE-14929 > URL: https://issues.apache.org/jira/browse/HBASE-14929 > Project: HBase > Issue Type: Bug >Reporter: Ted Malaska >Assignee: Carlos A. Morillo >Priority: Trivial > Attachments: HBASE-14929.patch > > > Go to the following line in LoadIncrementalHFiles.java > throw new TableNotFoundException("Table " + table.getName() + "is not > currently available."); > and add a space before is and after '
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051936#comment-15051936 ] Ted Malaska commented on HBASE-14795: - That was a cool addition. I like the wrapping of the function to catch the exceptions. I'm +1; also, next week I'm going to run this on a 10-billion-record-plus dataset just to see it in action. Since I'm not a committer I don't know if my +1 means much, but you have it. Thanks Zhan > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, > HBASE-14795-1.patch, HBASE-14795-2.patch, HBASE-14795-3.patch, > HBASE-14795-4.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051244#comment-15051244 ] Ted Malaska commented on HBASE-14795: - Hey Zhan, I left one comment about the sync block, and I do see that you added a bunch of try catch blocks. But the problem still remains that the table and scanner can be left unclosed. I think we need to add something like the following: https://github.com/apache/spark/blob/f434f36d508eb4dcade70871611fc022ae0feb56/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L154 You will note this is given for free if you use an InputFormat, which raises the question: should these changes go back into TableInputFormat so that we just use TableInputFormat? This would allow us to maintain reading from a table in one location, and it would also mean you don't have to worry about the life cycle of anything. > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, > HBASE-14795-1.patch, HBASE-14795-2.patch, HBASE-14795-3.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
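The leak described above (a scanner left open when a task fails mid-scan on a long-lived executor) and the guard the linked NewHadoopRDD code provides can be illustrated in miniature. This is a hedged sketch with a fake scanner class standing in for an HBase Table/ResultScanner, not the actual connector code:

```java
import java.util.concurrent.atomic.AtomicBoolean;

// FakeScanner is an illustrative stand-in for an HBase ResultScanner;
// its next() throws to simulate a mid-task failure.
class FakeScanner implements AutoCloseable {
    final AtomicBoolean closed = new AtomicBoolean(false);
    @Override public void close() { closed.set(true); }
    Object next() { throw new RuntimeException("simulated region failure"); }
}

public class CloseOnCompletion {
    // The guard: close in finally, so the resource is released on success
    // AND on failure. Spark's NewHadoopRDD achieves the same effect with a
    // task-completion callback that closes the record reader.
    static boolean scanWithGuard(FakeScanner scanner) {
        try {
            scanner.next();          // may throw partway through the task
            return true;
        } catch (RuntimeException e) {
            return false;            // task failed, but we still clean up
        } finally {
            scanner.close();         // always runs
        }
    }

    public static void main(String[] args) {
        FakeScanner s = new FakeScanner();
        scanWithGuard(s);
        System.out.println(s.closed.get()); // true: closed despite the failure
    }
}
```

Without the finally (or a completion listener), the catch-only variant leaves `closed` false whenever next() throws, which is exactly the leak the review comment is about.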
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047701#comment-15047701 ] Ted Malaska commented on HBASE-14795: - Will review tonight. Long day :) > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, > HBASE-14795-1.patch, HBASE-14795-2.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047779#comment-15047779 ] Ted Malaska commented on HBASE-14795: - OK, finished my review. Please review my review; it seems to me that there are still failure paths where closing tables or the scanner doesn't take place, leaving open tables and scanners on long-lived executors. There is also a question about thread safety and a question about implicit methods that don't seem to be called. If you get a patch in tomorrow, I can review tomorrow night. > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, > HBASE-14795-1.patch, HBASE-14795-2.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046210#comment-15046210 ] Ted Malaska commented on HBASE-14795: - I just looked at the review board. I don't see the changes; I see the comment status updated, but the code doesn't appear to have changed. Am I missing something? Thanks > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, > HBASE-14795-1.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044226#comment-15044226 ] Ted Malaska commented on HBASE-14795: - I did a first-pass review and left comments. Mainly concerned about closing scanners and tables. > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
[jira] [Created] (HBASE-14929) There is a space missing from Table "foo" is not currently available.
Ted Malaska created HBASE-14929: --- Summary: There is a space missing from Table "foo" is not currently available. Key: HBASE-14929 URL: https://issues.apache.org/jira/browse/HBASE-14929 Project: HBase Issue Type: Bug Reporter: Ted Malaska Priority: Trivial Go to the following line in LoadIncrementalHFiles.java throw new TableNotFoundException("Table " + table.getName() + "is not currently available."); and add a space before is and after '
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039668#comment-15039668 ] Ted Malaska commented on HBASE-14795: - Can we open up a Review Board for this? Thx > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > Attachments: > 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch > > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat with a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case where you have multiple scan ranges on a single table within a single > query, TableInputFormat will scan the outer range of the scan start and > end range, where this implementation can be more pointed.
[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14849: Description: I was working at a client with a ported down version of the Spark module for HBase and realized we didn't add an option to turn off block cache for the scans. At the client I just disabled all caching with Spark SQL; this is an easy but very impactful fix. The fix for this patch will make this configurable. was: I was working at a client with a ported down version of the Spark module for HBase and realized we didn't add an option to turn off block cache for the scans. This is an easy but very impactful fix. > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska > > I was working at a client with a ported down version of the Spark module for > HBase and realized we didn't add an option to turn off block cache for the > scans. > At the client I just disabled all caching with Spark SQL; this is an easy but > very impactful fix. > The fix for this patch will make this configurable -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska reassigned HBASE-14849: --- Assignee: Ted Malaska > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska > > I was working at a client with a ported down version of the Spark module for > HBase and realized we didn't add an option to turn off block cache for the > scans. > This is an easy but very impactful fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
[ https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029922#comment-15029922 ] Ted Malaska commented on HBASE-14849: - If it is OK I'm going to start on this, with the hope of getting a patch in the next 5 days. It should be an easy pass. My only worry is how to unit test something like this. Hmm. > Add option to set block cache to false on SparkSQL executions > - > > Key: HBASE-14849 > URL: https://issues.apache.org/jira/browse/HBASE-14849 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska > > I was working at a client with a ported down version of the Spark module for > HBase and realized we didn't add an option to turn off block cache for the > scans. > This is an easy but very impactful fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
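One way the configurable option discussed above could be wired up, sketched in isolation. The option key, default, and helper names here are hypothetical, not the committed patch; in real code the resolved flag would be passed to `org.apache.hadoop.hbase.client.Scan#setCacheBlocks`:

```java
import java.util.Map;

// Hypothetical sketch of making block caching configurable for SparkSQL scans.
// The option name and defaulting logic are assumptions for illustration only.
public class BlockCacheOption {
    static final String BLOCK_CACHE_KEY = "hbase.spark.blockcache.enable"; // hypothetical key

    static boolean resolveCacheBlocks(Map<String, String> params) {
        // default to true so existing jobs keep today's caching behavior
        return Boolean.parseBoolean(params.getOrDefault(BLOCK_CACHE_KEY, "true"));
    }

    public static void main(String[] args) {
        System.out.println(resolveCacheBlocks(Map.of()));                         // true
        System.out.println(resolveCacheBlocks(Map.of(BLOCK_CACHE_KEY, "false"))); // false
    }
}
```

Defaulting to the current behavior when the option is absent is what makes this kind of change low risk for existing jobs, which is also what makes it easy to unit test: one test per value of the flag.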
[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014342#comment-15014342 ] Ted Malaska commented on HBASE-14795: - Hey Zhan, What is your ETA on this JIRA? I just opened HBASE-14849 and I wanted to know if I should do that now or wait until this jira is done, or if you want to include HBASE-14849 in this jira. Let me know. > Enhance the spark-hbase scan operations > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat for a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case you have multiple scan ranges on a single table within a single > query. TableInputFormat will scan the outer range of the scan start and > end range where this implementation can be more pointed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14849) Add option to set block cache to false on SparkSQL executions
Ted Malaska created HBASE-14849: --- Summary: Add option to set block cache to false on SparkSQL executions Key: HBASE-14849 URL: https://issues.apache.org/jira/browse/HBASE-14849 Project: HBase Issue Type: New Feature Reporter: Ted Malaska I was working at a client with a ported down version of the Spark module for HBase and realized we didn't add an option to turn off block cache for the scans. This is an easy but very impactful fix. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value
[ https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009797#comment-15009797 ] Ted Malaska commented on HBASE-14340: - Thank you Andrew. Let me know if there are any other jiras you would like me to look at. Thanks again. > Add second bulk load option to Spark Bulk Load to send puts as the value > > > Key: HBASE-14340 > URL: https://issues.apache.org/jira/browse/HBASE-14340 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch > > > The initial bulk load option for Spark bulk load sends values over one by one > through the shuffle. This is similar to how the original MR bulk load > worked. > However the MR bulk loader has more than one bulk load option. There is a > second option that allows for all the Column Families, Qualifiers, and Values > of a row to be combined on the map side. > This only works if the row is not super wide. > But if the row is not super wide this method of sending values through the > shuffle will reduce the data and work the shuffle has to deal with. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
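The map-side combining the issue description refers to can be sketched in miniature. This is a toy model, not the HBase-Spark module's actual classes: grouping all the cells of a row into one record before the shuffle reduces the shuffled record count from one per cell to one per row.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of map-side combining for bulk load (class names are illustrative).
// Instead of shuffling one record per (rowKey, family, qualifier, value) cell,
// all cells of a row are grouped into a single record keyed by rowKey, which
// shrinks what the shuffle must move and sort. As the description notes, this
// only pays off when rows are not super wide, since a whole row must fit in one
// map-side record.
public class MapSideCombine {
    record Cell(String rowKey, String family, String qualifier, String value) {}

    static Map<String, List<Cell>> combineByRow(List<Cell> cells) {
        Map<String, List<Cell>> byRow = new LinkedHashMap<>();
        for (Cell c : cells) {
            byRow.computeIfAbsent(c.rowKey(), k -> new ArrayList<>()).add(c);
        }
        return byRow;
    }

    public static void main(String[] args) {
        List<Cell> cells = List.of(
            new Cell("row1", "cf", "a", "1"),
            new Cell("row1", "cf", "b", "2"),
            new Cell("row2", "cf", "a", "3"));
        // 3 cell records collapse into 2 row records before the shuffle
        System.out.println(combineByRow(cells).size()); // 2
    }
}
```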
[jira] [Updated] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value
[ https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14340: Attachment: HBASE-14340.2.patch Fixed copy paste issue. It was my mistake. The code was right on my laptop but I had made the patch out of sync or something. Thanks for finding that. > Add second bulk load option to Spark Bulk Load to send puts as the value > > > Key: HBASE-14340 > URL: https://issues.apache.org/jira/browse/HBASE-14340 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch > > > The initial bulk load option for Spark bulk load sends values over one by one > through the shuffle. This is similar to how the original MR bulk load > worked. > However the MR bulk loader has more than one bulk load option. There is a > second option that allows for all the Column Families, Qualifiers, and Values > of a row to be combined on the map side. > This only works if the row is not super wide. > But if the row is not super wide this method of sending values through the > shuffle will reduce the data and work the shuffle has to deal with. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value
[ https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004362#comment-15004362 ] Ted Malaska commented on HBASE-14340: - Thank you Andrew for the review. I will get to this jira in the next couple of days. > Add second bulk load option to Spark Bulk Load to send puts as the value > > > Key: HBASE-14340 > URL: https://issues.apache.org/jira/browse/HBASE-14340 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska >Priority: Minor > Fix For: 2.0.0 > > Attachments: HBASE-14340.1.patch > > > The initial bulk load option for Spark bulk load sends values over one by one > through the shuffle. This is similar to how the original MR bulk load > worked. > However the MR bulk loader has more than one bulk load option. There is a > second option that allows for all the Column Families, Qualifiers, and Values > of a row to be combined on the map side. > This only works if the row is not super wide. > But if the row is not super wide this method of sending values through the > shuffle will reduce the data and work the shuffle has to deal with. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format
[ https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003057#comment-15003057 ] Ted Malaska commented on HBASE-14801: - I have no problem with this, I think it looks a lot prettier than what I did on the first draft. Does anyone else have any thoughts on this? We don't want to change this too many times once it gets in users' hands, so let's agree that this JSON format is what we want long term. > Enhance the Spark-HBase connector catalog with json format > -- > > Key: HBASE-14801 > URL: https://issues.apache.org/jira/browse/HBASE-14801 > Project: HBase > Issue Type: Improvement >Reporter: Zhan Zhang >Assignee: Zhan Zhang > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14789) Enhance the current spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003062#comment-15003062 ] Ted Malaska commented on HBASE-14789: - Adding Jira for Changing the table definition to JSON > Enhance the current spark-hbase connector > - > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Improvement >Reporter: Zhan Zhang >Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to optimize the RDD construction in the current connector > implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001362#comment-15001362 ] Ted Malaska commented on HBASE-14796: - Yeah agreed. It also depends on the time it takes to start a task. But yeah I'm very interested to see if there is a difference. It is a great science experiment :) > Provide an alternative spark-hbase SQL implementations for Gets > --- > > Key: HBASE-14796 > URL: https://issues.apache.org/jira/browse/HBASE-14796 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > Currently the Spark-Module Spark SQL implementation gets records from HBase > from the driver if there is something like the following found in the SQL. > rowkey = 123 > The reason for this originally was that normal SQL will not have many equal > operations in a single where clause. > Zhan had brought up two points that have value. > 1. The SQL may be generated and may have many, many equal statements in it so > moving the work to an executor protects the driver from load > 2. In the current implementation the driver is connecting to HBase and > exceptions may cause trouble with the Spark application and not just with > a single task execution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001136#comment-15001136 ] Ted Malaska commented on HBASE-14795: - I would like that. Thanks Zhan. > Provide an alternative spark-hbase SQL implementations for Scan > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat for a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case you have multiple scan ranges on a single table within a single > query. TableInputFormat will scan the outer range of the scan start and > end range where this implementation can be more pointed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001083#comment-15001083 ] Ted Malaska commented on HBASE-14789: - Hey Zhan, I'm not sure I understand the question. What I'm thinking is that the changes you are asking for should fit nicely into the existing code. And we can use the sub jiras to discuss the implementations of each. For example, with the Scan implementation I would like to ask if that functionality could be added to TableInputFormat, because it could be of value to more than just SparkSQL and because we can consolidate code. For the BulkGet implementation I would like to see some performance tests to make sure we are not introducing latency, and also whether we should use the existing BulkGet functionality in HBase-Spark, because we might want to execute the gets in more than one task. But let's have these discussions in the sub jiras, for they are completely different components that are not dependent on each other. Thanks > Provide an alternative spark-hbase connector > > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Improvement >Reporter: Zhan Zhang >Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to provide user an option to choose different Spark-HBase > implementation based on requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001156#comment-15001156 ] Ted Malaska commented on HBASE-14796: - My only concern here is if we are adding latency for the normal single row get query. Can you run some tests to see what impact there is from this? Not just a unit test but a test on a real cluster. If the latency difference is nothing big then I don't see any problem with the full change to the executor get design. If the latency change is huge, maybe we can make this configurable. > Provide an alternative spark-hbase SQL implementations for Gets > --- > > Key: HBASE-14796 > URL: https://issues.apache.org/jira/browse/HBASE-14796 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > Currently the Spark-Module Spark SQL implementation gets records from HBase > from the driver if there is something like the following found in the SQL. > rowkey = 123 > The reason for this originally was that normal SQL will not have many equal > operations in a single where clause. > Zhan had brought up two points that have value. > 1. The SQL may be generated and may have many, many equal statements in it so > moving the work to an executor protects the driver from load > 2. In the current implementation the driver is connecting to HBase and > exceptions may cause trouble with the Spark application and not just with > a single task execution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001301#comment-15001301 ] Ted Malaska commented on HBASE-14796: - I agree with point one. But the use case I'm thinking about is one like this: an HBase table with 100 million or a billion records (the number doesn't matter much, just make it a lot). Then the select looks like this Select * from hbase_table where rowkey = "foobar" I can see this being very common; not optimal, but common. > Provide an alternative spark-hbase SQL implementations for Gets > --- > > Key: HBASE-14796 > URL: https://issues.apache.org/jira/browse/HBASE-14796 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > Currently the Spark-Module Spark SQL implementation gets records from HBase > from the driver if there is something like the following found in the SQL. > rowkey = 123 > The reason for this originally was that normal SQL will not have many equal > operations in a single where clause. > Zhan had brought up two points that have value. > 1. The SQL may be generated and may have many, many equal statements in it so > moving the work to an executor protects the driver from load > 2. In the current implementation the driver is connecting to HBase and > exceptions may cause trouble with the Spark application and not just with > a single task execution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999385#comment-14999385 ] Ted Malaska commented on HBASE-14789: - Can you help me understand what components this has that don't already exist in the current HBase-Spark module and also what requirements are not met by the current Spark-Module implementation but are supported with this code? Thanks > Provide an alternative spark-hbase connector > > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Bug >Reporter: Zhan Zhang >Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to provide user an option to choose different Spark-HBase > implementation based on requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan
[ https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999621#comment-14999621 ] Ted Malaska commented on HBASE-14795: - There is no real negative to this proposed approach other than a second implementation of table scan. Too bad the existing TableInputFormat can not be updated to handle this, because then this would be in one place. As for implementation, there is no reason this can't just be invoked straight from line 330 of DefaultSource, or it could be an alternate implementation in hbaseRDD that takes multi scan objects. https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/DefaultSource.scala#L330 > Provide an alternative spark-hbase SQL implementations for Scan > --- > > Key: HBASE-14795 > URL: https://issues.apache.org/jira/browse/HBASE-14795 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > This is a sub-jira of HBASE-14789. This jira is to focus on the replacement > of TableInputFormat for a more custom scan implementation that will make the > following use case more effective. > Use case: > In the case you have multiple scan ranges on a single table within a single > query. TableInputFormat will scan the outer range of the scan start and > end range where this implementation can be more pointed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
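The inefficiency being discussed can be illustrated with a small model (not HBase code; row keys are modeled as ints): given several row-key ranges, a single TableInputFormat-style scan effectively reads from the smallest start to the largest stop, while a pointed implementation reads only the requested ranges.

```java
import java.util.List;

// Toy illustration of why one outer scan over multiple ranges reads far more
// rows than pointed per-range scans when the ranges are sparse.
public class OuterRangeWaste {
    record Range(int start, int stop) { // models [start, stop)
        int width() { return stop - start; }
    }

    // Rows a single outer scan covers: min start .. max stop across all ranges.
    static int outerScanWidth(List<Range> ranges) {
        int min = ranges.stream().mapToInt(Range::start).min().orElse(0);
        int max = ranges.stream().mapToInt(Range::stop).max().orElse(0);
        return max - min;
    }

    // Rows pointed per-range scans cover (assuming disjoint ranges).
    static int pointedScanWidth(List<Range> ranges) {
        return ranges.stream().mapToInt(Range::width).sum();
    }

    public static void main(String[] args) {
        List<Range> ranges = List.of(new Range(0, 10), new Range(1000, 1010));
        System.out.println(outerScanWidth(ranges));   // 1010 rows touched
        System.out.println(pointedScanWidth(ranges)); // 20 rows touched
    }
}
```

The wider the gap between the requested ranges, the bigger the win for the pointed implementation; for adjacent ranges the two approaches converge.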
[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999628#comment-14999628 ] Ted Malaska commented on HBASE-14789: - Put comments related to the bulk get implementation in jira HBASE-14796 > Provide an alternative spark-hbase connector > > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Improvement >Reporter: Zhan Zhang >Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to provide user an option to choose different Spark-HBase > implementation based on requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999635#comment-14999635 ] Ted Malaska commented on HBASE-14796: - If implemented, this code would fit great right around https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/DefaultSource.scala#L347 > Provide an alternative spark-hbase SQL implementations for Gets > --- > > Key: HBASE-14796 > URL: https://issues.apache.org/jira/browse/HBASE-14796 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > Currently the Spark-Module Spark SQL implementation gets records from HBase > from the driver if there is something like the following found in the SQL. > rowkey = 123 > The reason for this originally was that normal SQL will not have many equal > operations in a single where clause. > Zhan had brought up two points that have value. > 1. The SQL may be generated and may have many, many equal statements in it so > moving the work to an executor protects the driver from load > 2. In the current implementation the driver is connecting to HBase and > exceptions may cause trouble with the Spark application and not just with > a single task execution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
Ted Malaska created HBASE-14796: --- Summary: Provide an alternative spark-hbase SQL implementations for Gets Key: HBASE-14796 URL: https://issues.apache.org/jira/browse/HBASE-14796 Project: HBase Issue Type: Improvement Reporter: Ted Malaska Assignee: Zhan Zhang Priority: Minor Currently the Spark-Module Spark SQL implementation gets records from HBase from the driver if there is something like the following found in the SQL. rowkey = 123 The reason for this originally was that normal SQL will not have many equal operations in a single where clause. Zhan had brought up two points that have value. 1. The SQL may be generated and may have many, many equal statements in it so moving the work to an executor protects the driver from load 2. In the current implementation the driver is connecting to HBase and exceptions may cause trouble with the Spark application and not just with a single task execution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999422#comment-14999422 ] Ted Malaska commented on HBASE-14789: - Cool, I read the doc, so there are four points. * Bulk Get - Do bulk Gets on an executor * TableInputFormat - Don't use this because of the thought that only one can run at a time * Change the table description format - Add a more JSON-like definition * Add write support - For SparkSQL writes to HBase #First let's talk to each point: * Bulk Get: - As we have talked about in other jiras, executing this on the executor side really doesn't add much value. It would be very odd if people would have more than 1000 equals in a where clause. If they did then we need to figure out at what point 1000, 1, 5 it becomes faster to run the code on the executor. The normal use case is just a couple = per where clause, so this is not a real concern. Now if you want to do a real bulk get then use the bulk get command; that will be much better for a lot of reasons. * Not Using TableInputFormat: In the code today Spark is given the TableInputFormat in different requests so they are at different points on the DAG. So why does Spark not read from both? Also the locality is given and we are not reinventing the wheel. * Change the table description format: This is a preference thing; the current version is more like the HBase shell. Either way makes sense; it makes no real difference. * Add write support: Yes we should add this. #Summary First I think any and all changes would fit into the current implementation of the HBase-Spark module with little change. These are pretty pointed changes that affect a scoped area of the code. Second we should separate out this jira into 4 different jiras, each focusing on the different points, for these different points are not dependent or related. 
We should open up a jira to address each feature and then discuss the approach for each one, how it can be added, and/or if it should be added. Thanks Zhan. Let me know if I missed anything > Provide an alternative spark-hbase connector > > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Bug >Reporter: Zhan Zhang >Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to provide user an option to choose different Spark-HBase > implementation based on requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan
Ted Malaska created HBASE-14795: --- Summary: Provide an alternative spark-hbase SQL implementations for Scan Key: HBASE-14795 URL: https://issues.apache.org/jira/browse/HBASE-14795 Project: HBase Issue Type: Improvement Reporter: Ted Malaska Assignee: Zhan Zhang Priority: Minor This is a sub-jira of HBASE-14789. This jira is to focus on the replacement of TableInputFormat for a more custom scan implementation that will make the following use case more effective. Use case: In the case you have multiple scan ranges on a single table within a single query. TableInputFormat will scan the outer range of the scan start and end range where this implementation can be more pointed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999623#comment-14999623 ] Ted Malaska commented on HBASE-14789: - Put response to TableInputFormat design in HBASE-14795 > Provide an alternative spark-hbase connector > > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Improvement >Reporter: Zhan Zhang >Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to provide user an option to choose different Spark-HBase > implementation based on requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector
[ https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999622#comment-14999622 ] Ted Malaska commented on HBASE-14789: - This is a sub jira > Provide an alternative spark-hbase connector > > > Key: HBASE-14789 > URL: https://issues.apache.org/jira/browse/HBASE-14789 > Project: HBase > Issue Type: Improvement >Reporter: Zhan Zhang >Assignee: Zhan Zhang > Attachments: shc.pdf > > > This JIRA is to provide user an option to choose different Spark-HBase > implementation based on requirements. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets
[ https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999633#comment-14999633 ] Ted Malaska commented on HBASE-14796: - So there is value in this idea for the generated queries, but for normal SQL operations it may be overkill to use a task on an executor to get a single record from HBase. As for the argument about protecting the driver, there is some merit to this. I think there is more merit in the first argument, for distributing the get load to the executors to support multi-user environments. But honestly if the developer is using Spark SQL to do gets on HBase I question the approach. The user would be better off using the Spark-Module Bulk Get functionality that is already checked in. That implementation will distribute the gets across N number of tasks and executors. > Provide an alternative spark-hbase SQL implementations for Gets > --- > > Key: HBASE-14796 > URL: https://issues.apache.org/jira/browse/HBASE-14796 > Project: HBase > Issue Type: Improvement >Reporter: Ted Malaska >Assignee: Zhan Zhang >Priority: Minor > > Currently the Spark-Module Spark SQL implementation gets records from HBase > from the driver if there is something like the following found in the SQL. > rowkey = 123 > The reason for this originally was that normal SQL will not have many equal > operations in a single where clause. > Zhan had brought up two points that have value. > 1. The SQL may be generated and may have many, many equal statements in it so > moving the work to an executor protects the driver from load > 2. In the current implementation the driver is connecting to HBase and > exceptions may cause trouble with the Spark application and not just with > a single task execution -- This message was sent by Atlassian JIRA (v6.3.4#6332)
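The "distribute the gets across N number of tasks" idea can be sketched with a toy partitioner (this is a model, not the HBase-Spark bulk get API): each task receives a batch of row keys and would issue a multi-get for its batch, instead of the driver fetching every key itself.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how a bulk get spreads row keys over N tasks. Round-robin
// assignment keeps the batches balanced; a real implementation would also
// group keys by region to preserve locality.
public class BulkGetPartitioning {
    static List<List<String>> partition(List<String> rowKeys, int numTasks) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) batches.add(new ArrayList<>());
        for (int i = 0; i < rowKeys.size(); i++) {
            batches.get(i % numTasks).add(rowKeys.get(i)); // round-robin assignment
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("r1", "r2", "r3", "r4", "r5");
        List<List<String>> batches = partition(keys, 2);
        System.out.println(batches.get(0)); // [r1, r3, r5]
        System.out.println(batches.get(1)); // [r2, r4]
    }
}
```

This also makes the trade-off in the comment concrete: for a single row key there is only one batch, so spinning up an executor task buys nothing over a driver-side get.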
[jira] [Commented] (HBASE-14149) Add Data Frame support for HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982527#comment-14982527 ] Ted Malaska commented on HBASE-14149: - closing this jira because we got dataframe support with https://issues.apache.org/jira/browse/HBASE-14181 > Add Data Frame support for HBase-Spark Module > - > > Key: HBASE-14149 > URL: https://issues.apache.org/jira/browse/HBASE-14149 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska > > Add on to the work done in HBASE-13992 and add support for dataframes for > bulk puts, bulk gets, and scans -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-14149) Add Data Frame support for HBase-Spark Module
[ https://issues.apache.org/jira/browse/HBASE-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska resolved HBASE-14149. - Resolution: Duplicate This was done in https://issues.apache.org/jira/browse/HBASE-14181 With connection to Spark SQL > Add Data Frame support for HBase-Spark Module > - > > Key: HBASE-14149 > URL: https://issues.apache.org/jira/browse/HBASE-14149 > Project: HBase > Issue Type: New Feature > Components: spark >Reporter: Ted Malaska >Assignee: Ted Malaska > > Add on to the work done in HBASE-13992 and add support for dataframes for > bulk puts, bulk gets, and scans -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961155#comment-14961155 ] Ted Malaska commented on HBASE-14406: - own -> owe > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.11.patch, HBASE-14406.2.patch, HBASE-14406.3.patch, > HBASE-14406.4.patch, HBASE-14406.5.patch, HBASE-14406.6.patch, > HBASE-14406.7.patch, HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961150#comment-14961150 ] Ted Malaska commented on HBASE-14406: - OMG I own you guys a beer. That was a long patch. Thank you both. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.11.patch, HBASE-14406.2.patch, HBASE-14406.3.patch, > HBASE-14406.4.patch, HBASE-14406.5.patch, HBASE-14406.6.patch, > HBASE-14406.7.patch, HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959844#comment-14959844 ] Ted Malaska commented on HBASE-14406: - I just looked at https://issues.apache.org/jira/secure/attachment/12766912/HBASE-14406.10.patch and searched for diff --git a/hbase-spark/src/main/protobuf/Filter.proto b/hbase-spark/src/main/protobuf/Filter.proto It's there. Let me know if I missed something. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, > HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, > HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959872#comment-14959872 ] Ted Malaska commented on HBASE-14406: - Ohh [~ted_yu], so I need to add the generated file to the patch. Now I understand what you are saying. Sorry, I was reading too fast. Will make a new patch now. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, > HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, > HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: (was: TestSuite.txt) > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: HBASE-14406.10.patch Just double checking and uploading the newest version > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, > HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, > HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: (was: Surefile-reports.zip) > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959847#comment-14959847 ] Ted Malaska commented on HBASE-14406: - Also the diff number on the review board is off by one, which is my fault: I skipped version 8; it never got uploaded :) So version 9 on ReviewBoard is version 10 on JIRA. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, > HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, > HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: HBASE-14406.11.patch Added FilterProtos.java to git > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.11.patch, HBASE-14406.2.patch, HBASE-14406.3.patch, > HBASE-14406.4.patch, HBASE-14406.5.patch, HBASE-14406.6.patch, > HBASE-14406.7.patch, HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959866#comment-14959866 ] Ted Malaska commented on HBASE-14406: - grr, the build system didn't generate the proto classes. Let me do some research. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, > HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, > HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959791#comment-14959791 ] Ted Malaska commented on HBASE-14406: - OK I rebuilt everything and restarted my computer. And everything is fine. I'm not sure what caused the problem originally but unit tests in patch 9 work on my local. Sorry for the false alarm > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, > Surefile-reports.zip, TestSuite.txt > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959830#comment-14959830 ] Ted Malaska commented on HBASE-14406: - yup it is in there https://reviews.apache.org/r/38536/diff/9#4 Let me know if you don't see it > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, > HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, > HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, > HBASE-14406.9.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959641#comment-14959641 ] Ted Malaska commented on HBASE-14406: - I just did "mvn -Dtest=NoUnitTests clean verify" in the hbase-spark folder. It worked before I rebased; after rebasing it didn't. I also got a fresh copy of master (so without this patch), tried it on my box and two other people's boxes, and all three failed. Changing the hosts file fixed the problem on one of the boxes. I will try to repeat the fix when I get home. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, > Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: HBASE-14406.9.patch Rebasing pom. On a side note, something funky happened to HBase when I rebased. On my computer all the unit tests are broken in the master branch unless I make the host's IP reverse-resolvable. This issue is unrelated to my patch; it was something else that changed recently. On a HBaseTestingUtility startMiniCluster I get this error: java.io.IOException: java.lang.RuntimeException: Could not resolve Kerberos principal name: java.net.UnknownHostException: tmalaska-MBP-2.home: tmalaska-MBP-2.home: nodename nor servname provided, or not known. I have tested this on more than one computer with friends. So to repeat: the patch should be good, but something is not right with HBase in the latest master with respect to HBaseTestingUtility startMiniCluster. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, > Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
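The reverse-lookup workaround mentioned above amounts to mapping the unresolvable hostname to the loopback address in /etc/hosts. The following is a minimal, hypothetical sketch of that fix (not part of HBase); the hostname is taken from the stack trace, and a temp file stands in for /etc/hosts so the sketch is safe to run:

```python
# Sketch of the hosts-file workaround for the UnknownHostException above.
# Assumptions: the hostname comes from the stack trace; a temp file stands
# in for /etc/hosts (editing the real file requires root).
import os
import tempfile

hostname = "tmalaska-MBP-2.home"

# Stand-in for /etc/hosts with only the default loopback entry.
fd, hosts_path = tempfile.mkstemp(suffix="-hosts")
with os.fdopen(fd, "w") as f:
    f.write("127.0.0.1 localhost\n")

# The manual fix: map the unresolvable name to loopback so forward and
# reverse lookups of the local hostname both succeed.
with open(hosts_path, "a") as f:
    f.write(f"127.0.0.1 {hostname}\n")

# Verify the mapping is present, as the minicluster's name resolution needs.
with open(hosts_path) as f:
    lines = f.read().splitlines()
assert any(line.startswith("127.0.0.1") and hostname in line for line in lines)
os.unlink(hosts_path)
```

On a real machine the equivalent is appending `127.0.0.1 tmalaska-MBP-2.home` to /etc/hosts with root privileges.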
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: TestSuite.txt BTW here is the full stack trace when the host file is not updated to do a reverse look up. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, > Surefile-reports.zip, TestSuite.txt > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959772#comment-14959772 ] Ted Malaska commented on HBASE-14406: - I tried this build and it doesn't have the hbase unit testing problem tmalaska-MBP-2:hbase-spark ted.malaska$ git log | head -n 1 commit 8f95318f6252c1c0b7a073619525eae6d991f47b > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, > Surefile-reports.zip, TestSuite.txt > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959643#comment-14959643 ] Ted Malaska commented on HBASE-14406: - Also this is my version tmalaska-MBP-2:hbase-spark ted.malaska$ git log | head -n 1 commit d5ed46bc9f9285f75d2d906ec9c120cb408827df > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, > Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959648#comment-14959648 ] Ted Malaska commented on HBASE-14406: - The code that breaks is just var TEST_UTIL: HBaseTestingUtility = new HBaseTestingUtility TEST_UTIL.startMiniCluster() //BOOM Nothing else runs: no Spark stuff, nothing. Just HBaseTestingUtility. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, > Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959311#comment-14959311 ] Ted Malaska commented on HBASE-14406: - OK I will make this change in the next hour or so. Thanks Ted Yu > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957413#comment-14957413 ] Ted Malaska commented on HBASE-14406: - What went wrong with the build? > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: HBASE-14406.7.patch Moved ProtoBufs to hbase-spark and out of hbase-protocol > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, HBASE-14406.7.patch, Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955850#comment-14955850 ] Ted Malaska commented on HBASE-14406: - Well that does make sense. Let me look into that tomorrow. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, > HBASE-14406.6.patch, Surefile-reports.zip > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14158) Add documentation for Initial Release for HBase-Spark Module integration
[ https://issues.apache.org/jira/browse/HBASE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14158: Attachment: HBASE-14158.7.patch Removed the long lines and used the following command instead of git diff git format-patch --stdout origin/master > HBASE-14158.7.patch > Add documentation for Initial Release for HBase-Spark Module integration > - > > Key: HBASE-14158 > URL: https://issues.apache.org/jira/browse/HBASE-14158 > Project: HBase > Issue Type: Improvement > Components: documentation, spark >Reporter: Ted Malaska >Assignee: Ted Malaska > Fix For: 2.0.0 > > Attachments: HBASE-14158.1.patch, HBASE-14158.2.patch, > HBASE-14158.5.patch, HBASE-14158.5.patch, HBASE-14158.6.patch, > HBASE-14158.7.patch > > > Add documentation for Initial Release for HBase-Spark Module integration -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953360#comment-14953360 ] Ted Malaska commented on HBASE-14406: - I think the bug from last time was the following two: ( rowkey < 1 or col > 2 ) and ( colA < 1 or colB > 2 ) The functionality of (rowkey < 1 and col > 2) worked in the last patch. But here are some related tests that should cover both cases: test("Test SQL point and range combo") test("Test OR logic with a one RowKey and One column") test("Test two complete range non merge rowKey query") test("Test OR logic with a two columns") > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
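To illustrate the failure mode in the issue description (the AND and OR forms of "col1 > 4, col2 < 3" collapsing into the same pushed-down filter), here is a hypothetical Python sketch. It is not the HBase-Spark planner code; it only shows why treating the OR query as an intersection silently drops rows:

```python
# Hypothetical illustration of the bug class described in HBASE-14406 (not
# the actual HBase-Spark filter construction): if "col1 > 4 OR col2 < 3" is
# pushed down as if it were the AND form, rows matching only one predicate
# are silently lost.
rows = [
    {"col1": 5, "col2": 9},   # matches only col1 > 4
    {"col1": 0, "col2": 1},   # matches only col2 < 3
    {"col1": 0, "col2": 9},   # matches neither predicate
]

and_result = [r for r in rows if r["col1"] > 4 and r["col2"] < 3]
or_result = [r for r in rows if r["col1"] > 4 or r["col2"] < 3]

# A buggy planner that emits the same filter for both queries would return
# and_result in both cases, i.e. the empty intersection.
assert and_result == []        # intersection: no row satisfies both
assert len(or_result) == 2     # union: two rows must survive the OR filter
```

This is why the filter serialization has to preserve the AND/OR structure of the predicate tree rather than flattening it into a single range.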
[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953463#comment-14953463 ] Ted Malaska commented on HBASE-14406: - [~zhanzhang] np. Let me add it now. It will hopefully take less than an hour. > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: HBASE-14406.4.patch Applied worked for Zhan Zhang and Ted Yu > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior
[ https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Malaska updated HBASE-14406: Attachment: HBASE-14406.5.patch Then to Than > The dataframe datasource filter is wrong, and will result in data loss or > unexpected behavior > - > > Key: HBASE-14406 > URL: https://issues.apache.org/jira/browse/HBASE-14406 > Project: HBase > Issue Type: Bug > Components: spark >Affects Versions: 2.0.0 >Reporter: Zhan Zhang >Assignee: Ted Malaska >Priority: Blocker > Fix For: 2.0.0 > > Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, > HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch > > > Following condition will result in the same filter. It will have data loss > with the current filter construction. > col1 > 4 && col2 < 3 > col1 > 4 || col2 < 3 -- This message was sent by Atlassian JIRA (v6.3.4#6332)