[jira] [Commented] (HBASE-15333) Enhance the filter to handle short, integer, long, float and double

2016-04-06 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15228495#comment-15228495
 ] 

Ted Malaska commented on HBASE-15333:
-

I gave it a quick look and it looks good, but let's give [~jmhsieh] some time to 
look over it.

I like that all the unit tests are reused.  
Have we tested the performance difference?  Is there any reason to be concerned 
about performance?

> Enhance the filter to handle short, integer, long, float and double
> ---
>
> Key: HBASE-15333
> URL: https://issues.apache.org/jira/browse/HBASE-15333
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: HBASE-15333-1.patch, HBASE-15333-2.patch, 
> HBASE-15333-3.patch, HBASE-15333-4.patch, HBASE-15333-5.patch
>
>
> Currently, the range filter is based on the order of bytes. But for Java 
> primitive types such as short, int, long, float, and double, the numeric 
> order is not consistent with the byte order, so extra manipulation has to be 
> in place to handle them correctly.
> For example, for the integer range (-100, 100) with the filter <= 1, the 
> current filter will return 0 and 1, but the right return value should be 
> (-100, 1]
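The ordering problem described above can be reproduced outside HBase: big-endian two's-complement encodings of signed integers do not sort in numeric order, and flipping the sign bit is one common order-preserving fix. The following is an illustrative sketch (the function names are mine, not the patch's actual code):

```python
import struct

def to_bytes(n: int) -> bytes:
    """Big-endian two's-complement encoding of a signed 32-bit int."""
    return struct.pack(">i", n)

# Lexicographic byte order disagrees with numeric order for signed ints:
# -100 encodes as 0xFFFFFF9C, which sorts AFTER the encoding of 1.
assert to_bytes(-100) > to_bytes(1)

def order_preserving(n: int) -> bytes:
    """Flip the sign bit so unsigned byte order matches signed numeric order."""
    return struct.pack(">I", (n + (1 << 31)) % (1 << 32))

values = [100, -100, 1, 0, -1]
print(sorted(values, key=to_bytes))          # [0, 1, 100, -100, -1]
print(sorted(values, key=order_preserving))  # [-100, -1, 0, 1, 100]
```

Under raw byte order every negative number sorts after every non-negative one, which is exactly why a byte-order range filter over (-100, 100) with <= 1 returns only 0 and 1.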



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures

2016-03-08 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186180#comment-15186180
 ] 

Ted Malaska commented on HBASE-15271:
-

Hey [~busbey] 

the current tests will test that everything still works the same with the 
rename added.  

They don't, however, test that a rename happened.

> Spark Bulk Load: Need to write HFiles to tmp location then rename to protect 
> from Spark Executor Failures
> -
>
> Key: HBASE-15271
> URL: https://issues.apache.org/jira/browse/HBASE-15271
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Fix For: 2.0.0
>
> Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, 
> HBASE-15271.3.patch, HBASE-15271.4.patch
>
>
> With the current code, if an executor fails before the HFile is closed, it 
> will cause problems.  This jira will have the files first written out to a 
> name that starts with an underscore.  Then when the HFile is complete it will 
> be renamed and the underscore will be removed.
> The underscore is important because the bulk load functionality will skip 
> files with an underscore.
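The write-then-rename pattern above can be sketched with local files (the real patch operates on HDFS via the Hadoop FileSystem API; the function names here are illustrative only):

```python
import os
import tempfile

def write_hfile_atomically(out_dir: str, name: str, data: bytes) -> str:
    """Write to an underscore-prefixed temp name, then rename when complete.

    A partially written file left behind by a failed executor keeps its
    underscore and is therefore never picked up by the loader.
    """
    tmp_path = os.path.join(out_dir, "_" + name)
    final_path = os.path.join(out_dir, name)
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.rename(tmp_path, final_path)  # only complete files lose the underscore
    return final_path

def loadable_files(out_dir: str):
    """Mimic the bulk-load rule: ignore files whose names start with '_'."""
    return sorted(f for f in os.listdir(out_dir) if not f.startswith("_"))

d = tempfile.mkdtemp()
write_hfile_atomically(d, "hfile-0", b"cells")
# Simulate an executor that died mid-write, leaving its temp file behind:
open(os.path.join(d, "_hfile-1"), "wb").close()
print(loadable_files(d))  # ['hfile-0']
```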





[jira] [Commented] (HBASE-14789) Enhance the current spark-hbase connector

2016-02-25 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15168157#comment-15168157
 ] 

Ted Malaska commented on HBASE-14789:
-

This looks really cool.  Can we add a couple more?

5. Add support for DECIMAL
6. Add support for nested types
7. Add support for writes with bulk load vs. Puts with SparkSQL
8. Add support for a pluggable cell format (this is to be implemented for 
item 2)



> Enhance the current spark-hbase connector
> -
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to optimize the RDD construction in the current connector 
> implementation.





[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2016-02-24 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15166481#comment-15166481
 ] 

Ted Malaska commented on HBASE-14801:
-

I'm having a full day; I will be able to look tomorrow.

But before we move forward with this, I think I would like to see at least two 
more committers review this patch.

That's because this patch involves two questions: 1. whether the code is 
correct and tested, and 2. whether the new style of defining tables is what we 
want to commit to long term.

The code review I can do.  But for the style review I would like a couple more 
people to give their say, because if we make this change I would like to not 
change it again in the future.  


> Enhance the Spark-HBase connector catalog with json format
> --
>
> Key: HBASE-14801
> URL: https://issues.apache.org/jira/browse/HBASE-14801
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: HBASE-14801-1.patch, HBASE-14801-2.patch, 
> HBASE-14801-3.patch, HBASE-14801-4.patch
>
>






[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-02-23 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15159987#comment-15159987
 ] 

Ted Malaska commented on HBASE-15184:
-

Thank you, Ted Yu




-- 
Sent from Gmail Mobile


> SparkSQL Scan operation doesn't work on kerberos cluster
> 
>
> Key: HBASE-15184
> URL: https://issues.apache.org/jira/browse/HBASE-15184
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15184.1.patch, HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.  
> I made a fix for the client, but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach





[jira] [Updated] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-02-22 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15184:

Attachment: HBASE-15184.1.patch

This was tested both in unit tests and on a 10-node Kerberos cluster.

> SparkSQL Scan operation doesn't work on kerberos cluster
> 
>
> Key: HBASE-15184
> URL: https://issues.apache.org/jira/browse/HBASE-15184
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBASE-15184.1.patch, HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.  
> I made a fix for the client, but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach





[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-02-21 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15156351#comment-15156351
 ] 

Ted Malaska commented on HBASE-15184:
-

OK, I finally backported the code and ran it on my Kerberos cluster, and yup, 
it breaks.  Going to make the changes tonight, and hopefully tomorrow we will 
have something.

> SparkSQL Scan operation doesn't work on kerberos cluster
> 
>
> Key: HBASE-15184
> URL: https://issues.apache.org/jira/browse/HBASE-15184
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.  
> I made a fix for the client, but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach





[jira] [Commented] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures

2016-02-19 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15154871#comment-15154871
 ] 

Ted Malaska commented on HBASE-15271:
-

Thank you, Jon H and Ted Y

> Spark Bulk Load: Need to write HFiles to tmp location then rename to protect 
> from Spark Executor Failures
> -
>
> Key: HBASE-15271
> URL: https://issues.apache.org/jira/browse/HBASE-15271
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Fix For: 2.0.0
>
> Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, 
> HBASE-15271.3.patch, HBASE-15271.4.patch
>
>
> With the current code, if an executor fails before the HFile is closed, it 
> will cause problems.  This jira will have the files first written out to a 
> name that starts with an underscore.  Then when the HFile is complete it will 
> be renamed and the underscore will be removed.
> The underscore is important because the bulk load functionality will skip 
> files with an underscore.





[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures

2016-02-19 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15271:

Attachment: HBASE-15271.4.patch

Made changes based on Jon H's comments

> Spark Bulk Load: Need to write HFiles to tmp location then rename to protect 
> from Spark Executor Failures
> -
>
> Key: HBASE-15271
> URL: https://issues.apache.org/jira/browse/HBASE-15271
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, 
> HBASE-15271.3.patch, HBASE-15271.4.patch
>
>
> With the current code, if an executor fails before the HFile is closed, it 
> will cause problems.  This jira will have the files first written out to a 
> name that starts with an underscore.  Then when the HFile is complete it will 
> be renamed and the underscore will be removed.
> The underscore is important because the bulk load functionality will skip 
> files with an underscore.





[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures

2016-02-18 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15271:

Attachment: HBASE-15271.3.patch

Made changes for Ted Yu's comment

> Spark Bulk Load: Need to write HFiles to tmp location then rename to protect 
> from Spark Executor Failures
> -
>
> Key: HBASE-15271
> URL: https://issues.apache.org/jira/browse/HBASE-15271
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch, 
> HBASE-15271.3.patch
>
>
> With the current code, if an executor fails before the HFile is closed, it 
> will cause problems.  This jira will have the files first written out to a 
> name that starts with an underscore.  Then when the HFile is complete it will 
> be renamed and the underscore will be removed.
> The underscore is important because the bulk load functionality will skip 
> files with an underscore.





[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures

2016-02-18 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15271:

Attachment: HBASE-15271.2.patch

Added change based on Ted Yu's comment

> Spark Bulk Load: Need to write HFiles to tmp location then rename to protect 
> from Spark Executor Failures
> -
>
> Key: HBASE-15271
> URL: https://issues.apache.org/jira/browse/HBASE-15271
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Attachments: HBASE-15271.1.patch, HBASE-15271.2.patch
>
>
> With the current code, if an executor fails before the HFile is closed, it 
> will cause problems.  This jira will have the files first written out to a 
> name that starts with an underscore.  Then when the HFile is complete it will 
> be renamed and the underscore will be removed.
> The underscore is important because the bulk load functionality will skip 
> files with an underscore.





[jira] [Commented] (HBASE-15282) Bump Spark on Hbase to use Spark 1.6.

2016-02-18 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153484#comment-15153484
 ] 

Ted Malaska commented on HBASE-15282:
-

+1

> Bump Spark on Hbase to use Spark 1.6.
> -
>
> Key: HBASE-15282
> URL: https://issues.apache.org/jira/browse/HBASE-15282
> Project: HBase
>  Issue Type: Improvement
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Jonathan Hsieh
>Assignee: Jonathan Hsieh
> Fix For: 2.0.0
>
> Attachments: hbase-15282.patch
>
>
> The latest stable Spark is spark 1.6. [1] 
> Let's bump the version.
> [1] http://spark.apache.org/news/spark-1-6-0-released.html





[jira] [Updated] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures

2016-02-18 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15271:

Attachment: HBASE-15271.1.patch

First draft.  Built and ran tests.

> Spark Bulk Load: Need to write HFiles to tmp location then rename to protect 
> from Spark Executor Failures
> -
>
> Key: HBASE-15271
> URL: https://issues.apache.org/jira/browse/HBASE-15271
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Attachments: HBASE-15271.1.patch
>
>
> With the current code, if an executor fails before the HFile is closed, it 
> will cause problems.  This jira will have the files first written out to a 
> name that starts with an underscore.  Then when the HFile is complete it will 
> be renamed and the underscore will be removed.
> The underscore is important because the bulk load functionality will skip 
> files with an underscore.





[jira] [Created] (HBASE-15271) Spark Bulk Load: Need to write HFiles to tmp location then rename to protect from Spark Executor Failures

2016-02-15 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-15271:
---

 Summary: Spark Bulk Load: Need to write HFiles to tmp location 
then rename to protect from Spark Executor Failures
 Key: HBASE-15271
 URL: https://issues.apache.org/jira/browse/HBASE-15271
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Assignee: Ted Malaska


With the current code, if an executor fails before the HFile is closed, it will 
cause problems.  This jira will have the files first written out to a name that 
starts with an underscore.  Then when the HFile is complete it will be renamed 
and the underscore will be removed.

The underscore is important because the bulk load functionality will skip files 
with an underscore.





[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-02-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15147419#comment-15147419
 ] 

Ted Malaska commented on HBASE-15184:
-

I'm also here on this Jira.  I'm on a client application now and I want to make 
sure everything is solid before I submit the patch.  I'm sorry this is taking 
longer than initially planned, but I want to finish my testing on the Kerberos 
cluster first.

> SparkSQL Scan operation doesn't work on kerberos cluster
> 
>
> Key: HBASE-15184
> URL: https://issues.apache.org/jira/browse/HBASE-15184
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.  
> I made a fix for the client, but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach





[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException

2016-02-06 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136012#comment-15136012
 ] 

Ted Malaska commented on HBASE-15225:
-

I have a version that works on Kerberos clusters that my clients are using.  
I'm back from vacation this week and I will add it to HBase soon.

Let me see if I can get it on my GitHub.



> Connecting to HBase via newAPIHadoopRDD in PySpark gives  
> org.apache.hadoop.hbase.client.RetriesExhaustedException
> --
>
> Key: HBASE-15225
> URL: https://issues.apache.org/jira/browse/HBASE-15225
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.4
> Environment: spark 1.6.0 , Hbase 0.98.4, kerberos,  
> hbase.rpc.protection set to authentication.
>Reporter: Sanjay Kumar
>
> Unable to read HBase table into Spark with hbase security authentication set 
> to kerberos. Seeing the following error. 
> : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=31, exceptions:
> Thu Feb 04 22:01:55 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:57 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:59 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:03 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:13 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:23 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:34 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:05:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.IOException: Connection reset by peer
> .
> .
> .
> Thu Feb 04 22:09:46 CST 2016, 
> 

[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException

2016-02-06 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136013#comment-15136013
 ] 

Ted Malaska commented on HBASE-15225:
-

OK, I have updated the GitHub repo.  This has everything that is in HBase, but 
with the Kerberos scan fix, and it is backported to CDH 5.5.  If you are not 
using CDH then just fork the code and make the needed changes.

https://github.com/tmalaska/SparkOnHBase

This week I will work with the HBase committers to get the Kerberos fix checked 
in.

Let me know if there is anything else you need.

> Connecting to HBase via newAPIHadoopRDD in PySpark gives  
> org.apache.hadoop.hbase.client.RetriesExhaustedException
> --
>
> Key: HBASE-15225
> URL: https://issues.apache.org/jira/browse/HBASE-15225
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.4
> Environment: spark 1.6.0 , Hbase 0.98.4, kerberos,  
> hbase.rpc.protection set to authentication.
>Reporter: Sanjay Kumar
>
> Unable to read HBase table into Spark with hbase security authentication set 
> to kerberos. Seeing the following error. 
> : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=31, exceptions:
> Thu Feb 04 22:01:55 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:57 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:59 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:03 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:13 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:23 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:34 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:05:04 CST 2016, 
> 

[jira] [Commented] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value

2016-02-06 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136083#comment-15136083
 ] 

Ted Malaska commented on HBASE-14340:
-

Thank you Andrew for your review.

:)

> Add second bulk load option to Spark Bulk Load to send puts as the value
> 
>
> Key: HBASE-14340
> URL: https://issues.apache.org/jira/browse/HBASE-14340
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch
>
>
> The initial bulk load option for Spark bulk load sends values over one by one 
> through the shuffle.  This is similar to how the original MR bulk load 
> worked.
> However, the MR bulk loader has more than one bulk load option.  There is a 
> second option that allows all the column families, qualifiers, and values 
> of a row to be combined on the map side.
> This only works if the row is not super wide.
> But if the row is not super wide, this method of sending values through the 
> shuffle will reduce the data and work the shuffle has to deal with.
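The map-side combining described above can be sketched in miniature (in Spark this would be done with something like a map-side combine before the shuffle; the data and names here are purely illustrative):

```python
from collections import defaultdict

# Cells produced on the map side: (row_key, (family, qualifier, value)).
cells = [
    ("row1", ("cf", "a", b"1")),
    ("row1", ("cf", "b", b"2")),
    ("row2", ("cf", "a", b"3")),
    ("row1", ("cf", "c", b"4")),
]

# Option 1: every cell is its own shuffle record.
records_option1 = len(cells)

# Option 2: combine all cells of a row map-side, shuffling one record per row.
# This only helps when a whole row fits in memory, i.e. is not "super wide".
combined = defaultdict(list)
for row, cell in cells:
    combined[row].append(cell)
records_option2 = len(combined)

print(records_option1, records_option2)  # 4 2
```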





[jira] [Commented] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-02-06 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15136082#comment-15136082
 ] 

Ted Malaska commented on HBASE-15184:
-

OK, I'm back from vacation; I will try to finish a patch by midweek.

[~asrabkin] the problem is really simple.  If a scan operation is the first 
operation you are doing in your Spark context (you haven't done a map or 
foreach yet, for example), then the readers will not have the Kerberos creds 
applied yet and the scan will fail.  

In the zip file and in https://github.com/tmalaska/SparkOnHBase I solved this 
in a way that required extending a class that I shouldn't have extended.  
Since the creation of this Jira I had a vacation, and on that vacation I 
figured out how to implement this solution without doing that extension, which 
is a no-no.

I have a busy week, but I will try to get this jira in.

Thx

> SparkSQL Scan operation doesn't work on kerberos cluster
> 
>
> Key: HBASE-15184
> URL: https://issues.apache.org/jira/browse/HBASE-15184
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.  
> I made a fix for the client, but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach





[jira] [Assigned] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-02-06 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska reassigned HBASE-15184:
---

Assignee: Ted Malaska

> SparkSQL Scan operation doesn't work on kerberos cluster
> 
>
> Key: HBASE-15184
> URL: https://issues.apache.org/jira/browse/HBASE-15184
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.  
> I made a fix for the client, but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach





[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException

2016-02-05 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135624#comment-15135624
 ] 

Ted Malaska commented on HBASE-15225:
-

This should be linked as a related Jira.

[~88.sanjay] don't use newAPIHadoopRDD; use the functions defined in the 
HBaseContext object in the hbase-spark module.

That will take care of all your Spark-to-HBase connection issues.

Documentation can be found here: https://hbase.apache.org/book.html#spark
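One reason HBaseContext-style access avoids these connection problems is that it manages HBase connections on the executors, once per partition of work rather than once per record. The following toy Python sketch illustrates that pattern only; the class and function names are invented for illustration, and the real module is Scala:

```python
class FakeConnection:
    """Stand-in for an HBase connection; counts how many are opened."""
    opened = 0

    def __init__(self):
        FakeConnection.opened += 1

    def get(self, rowkey):
        # Pretend to fetch a row by key.
        return ("value-for", rowkey)

    def close(self):
        pass


def map_partitions_with_connection(partitions, fn):
    """Open one connection per partition, apply fn to each record,
    and always close the connection, even if fn raises."""
    results = []
    for partition in partitions:
        conn = FakeConnection()
        try:
            results.extend(fn(conn, record) for record in partition)
        finally:
            conn.close()
    return results


# Two partitions of rowkeys -> only two connections are ever opened.
partitions = [["row1", "row2"], ["row3"]]
out = map_partitions_with_connection(partitions, lambda c, r: c.get(r))
```

The point is the shape, not the API: one connection per partition keeps executors from exhausting the RegionServer with per-record connections, which is what repeated use of raw `newAPIHadoopRDD`-style access can lead to.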



> Connecting to HBase via newAPIHadoopRDD in PySpark gives  
> org.apache.hadoop.hbase.client.RetriesExhaustedException
> --
>
> Key: HBASE-15225
> URL: https://issues.apache.org/jira/browse/HBASE-15225
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.4
> Environment: spark 1.6.0 , Hbase 0.98.4, kerberos,  
> hbase.rpc.protection set to authentication.
>Reporter: Sanjay Kumar
>
> Unable to read HBase table into Spark with hbase security authentication set 
> to kerberos. Seeing the following error. 
> : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=31, exceptions:
> Thu Feb 04 22:01:55 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:57 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:59 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:03 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:13 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:23 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:34 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:05:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to 

[jira] [Commented] (HBASE-15225) Connecting to HBase via newAPIHadoopRDD in PySpark gives org.apache.hadoop.hbase.client.RetriesExhaustedException

2016-02-05 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-15225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15135626#comment-15135626
 ] 

Ted Malaska commented on HBASE-15225:
-

Also, if you are using PySpark, use the Spark SQL to HBase access pattern, 
also found here: https://hbase.apache.org/book.html#_sparksql_dataframes
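The SparkSQL/DataFrames pattern maps a DataFrame schema onto an HBase table through a JSON catalog. The sketch below builds such a catalog in Python; the exact key names are an assumption modeled on the HBase book and should be checked there, and the table and column names are made up for illustration:

```python
import json

# Hypothetical catalog mapping a DataFrame schema onto an HBase table,
# in the spirit of the JSON catalog described in the HBase book for the
# hbase-spark connector (treat the specific keys as assumptions).
catalog = {
    "table": {"namespace": "default", "name": "person"},
    "rowkey": "key",
    "columns": {
        "name": {"cf": "rowkey", "col": "key", "type": "string"},
        "age":  {"cf": "info",   "col": "age", "type": "int"},
    },
}

# The connector consumes the catalog as a JSON string option
# (per the HBase book, via the connector's data source format).
catalog_json = json.dumps(catalog)
```

In PySpark this string would then be passed as an option to the connector's data source when building the DataFrame, letting Spark SQL push scans and gets down to HBase instead of going through `newAPIHadoopRDD`.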

> Connecting to HBase via newAPIHadoopRDD in PySpark gives  
> org.apache.hadoop.hbase.client.RetriesExhaustedException
> --
>
> Key: HBASE-15225
> URL: https://issues.apache.org/jira/browse/HBASE-15225
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 0.98.4
> Environment: spark 1.6.0 , Hbase 0.98.4, kerberos,  
> hbase.rpc.protection set to authentication.
>Reporter: Sanjay Kumar
>
> Unable to read HBase table into Spark with hbase security authentication set 
> to kerberos. Seeing the following error. 
> : org.apache.hadoop.hbase.client.RetriesExhaustedException: Failed after 
> attempts=31, exceptions:
> Thu Feb 04 22:01:55 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:56 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:57 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:01:59 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:03 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:13 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:23 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:34 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:02:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:03:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:24 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:04:44 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.EOFException
> Thu Feb 04 22:05:04 CST 2016, 
> org.apache.hadoop.hbase.client.RpcRetryingCaller@395327da, 
> java.io.IOException: Call to d-767tfz1.target.com/10.66.241.13:60020 failed 
> on local exception: java.io.IOException: Connection reset by peer
> .
> .
> .
> Thu Feb 04 22:09:46 CST 2016, 
> 

[jira] [Updated] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-01-28 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15184:

Attachment: HBaseSparkModule.zip

Solution that worked on CDH 5.5 on a client's Kerberos cluster; it also includes 
a Spark package that overrides a protected class.

> SparkSQL Scan operation doesn't work on kerberos cluster
> 
>
> Key: HBASE-15184
> URL: https://issues.apache.org/jira/browse/HBASE-15184
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Malaska
> Attachments: HBaseSparkModule.zip
>
>
> I was using the HBase Spark Module at a client with Kerberos and I ran into 
> an issue with the Scan.
> I made a fix for the client, but we need to put it back into HBase.  I will 
> attach my solution, but it has a major problem: I had to override a 
> protected class in Spark.  I will need help to discover a better approach.





[jira] [Created] (HBASE-15184) SparkSQL Scan operation doesn't work on kerberos cluster

2016-01-28 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-15184:
---

 Summary: SparkSQL Scan operation doesn't work on kerberos cluster
 Key: HBASE-15184
 URL: https://issues.apache.org/jira/browse/HBASE-15184
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska


I was using the HBase Spark Module at a client with Kerberos and I ran into an 
issue with the Scan.

I made a fix for the client, but we need to put it back into HBase.  I will 
attach my solution, but it has a major problem: I had to override a protected 
class in Spark.  I will need help to discover a better approach.





[jira] [Commented] (HBASE-14796) Enhance the Gets in the connector

2015-12-28 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15072938#comment-15072938
 ] 

Ted Malaska commented on HBASE-14796:
-

Zhan, good points.  I agree: even if it is slower, it is better.

Thanks
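For illustration, the trade-off being agreed to here — pushing equality-predicate gets from the driver down to the executors (point 1 in the description below) — can be sketched as a toy Python model. This is not the connector's Scala code; the idea is just that the driver only partitions the rowkeys, while the actual HBase gets would run executor-side per batch:

```python
def partition_rowkeys(rowkeys, num_partitions):
    """Round-robin the rowkeys from many equality predicates into batches,
    one batch per executor partition, so the driver does no HBase I/O."""
    batches = [[] for _ in range(num_partitions)]
    for i, key in enumerate(rowkeys):
        batches[i % num_partitions].append(key)
    return batches


# Ten generated "rowkey = ..." predicates spread over three partitions.
batches = partition_rowkeys(["row%03d" % i for i in range(10)], 3)
```

Even if a single round trip is slower this way, the driver is protected from load and from HBase exceptions, which then only fail a task instead of the whole application.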

> Enhance the Gets in the connector
> -
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: HBASE-14976.patch
>
>
> Currently, the Spark module's Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL:
> rowkey = 123
> The original reason for this was that normal SQL will not have many equality 
> operations in a single where clause.
> Zhan brought up two points that have value:
> 1. The SQL may be generated and may have many, many equality statements in it, 
> so moving the work to an executor protects the driver from load.
> 2. In the current implementation the driver connects to HBase, and 
> exceptions may cause trouble for the whole Spark application and not just 
> a single task execution.





[jira] [Updated] (HBASE-15036) Update HBase Spark documentation to include bulk load with thin records

2015-12-23 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15036:

Attachment: HBASE-15036.patch

First Draft

> Update HBase Spark documentation to include bulk load with thin records
> ---
>
> Key: HBASE-15036
> URL: https://issues.apache.org/jira/browse/HBASE-15036
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Attachments: HBASE-15036.patch
>
>






[jira] [Updated] (HBASE-15036) Update HBase Spark documentation to include bulk load with thin records

2015-12-23 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-15036?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-15036:

Attachment: HBASE-15036.1.patch

Removed extra spaces

> Update HBase Spark documentation to include bulk load with thin records
> ---
>
> Key: HBASE-15036
> URL: https://issues.apache.org/jira/browse/HBASE-15036
> Project: HBase
>  Issue Type: New Feature
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Attachments: HBASE-15036.1.patch, HBASE-15036.patch
>
>






[jira] [Commented] (HBASE-14796) Enhance the Gets in the connector

2015-12-23 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15070177#comment-15070177
 ] 

Ted Malaska commented on HBASE-14796:
-

I reviewed the code and I'm giving it a +1

Did you do the performance tests?  As long as we are not going slower, I'm good 
here.

The test should be done on a cluster, not in local mode.  It can be done on 
warmed Yarn containers; we don't need to count the time to start Yarn.

I would like to see what the difference in time is when running the following 
tests:
1. a select statement with a single get
2. a select statement with 10 gets
3. a select statement with 1000 gets

Maybe we should also test with different row sizes:
1. 300 bytes
2. 3 KB
3. 30 KB

Let me know what you think.

Thanks again Zhan
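The proposed matrix (1, 10, and 1000 gets crossed with three row sizes) could be driven by a small harness like the hypothetical Python sketch below. Everything here is an assumption for illustration: `run_query` is a placeholder for issuing the actual SELECT on a warmed cluster.

```python
import itertools
import time


def run_query(num_gets, row_size_bytes):
    """Placeholder: would execute a SELECT with `num_gets` equality
    predicates against rows of roughly `row_size_bytes` each."""
    pass  # replace with the real SQL call on the cluster


def benchmark(get_counts=(1, 10, 1000), row_sizes=(300, 3_000, 30_000)):
    """Time every (get count, row size) combination in the test matrix."""
    timings = {}
    for gets, size in itertools.product(get_counts, row_sizes):
        start = time.perf_counter()
        run_query(gets, size)
        timings[(gets, size)] = time.perf_counter() - start
    return timings


timings = benchmark()
```

Comparing the resulting timings before and after the patch would show whether moving the gets to executors costs anything at each point of the matrix.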

> Enhance the Gets in the connector
> -
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: HBASE-14976.patch
>
>
> Currently, the Spark module's Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL:
> rowkey = 123
> The original reason for this was that normal SQL will not have many equality 
> operations in a single where clause.
> Zhan brought up two points that have value:
> 1. The SQL may be generated and may have many, many equality statements in it, 
> so moving the work to an executor protects the driver from load.
> 2. In the current implementation the driver connects to HBase, and 
> exceptions may cause trouble for the whole Spark application and not just 
> a single task execution.





[jira] [Created] (HBASE-15036) Update HBase Spark documentation to include bulk load with thin records

2015-12-23 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-15036:
---

 Summary: Update HBase Spark documentation to include bulk load 
with thin records
 Key: HBASE-15036
 URL: https://issues.apache.org/jira/browse/HBASE-15036
 Project: HBase
  Issue Type: New Feature
Reporter: Ted Malaska
Assignee: Ted Malaska








[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-18 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15064491#comment-15064491
 ] 

Ted Malaska commented on HBASE-14849:
-

I reviewed your changes.  I like what you did, so I give it a +1.

But let's have one more person review.

Thanks
Zhan

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
> Attachments: HBASE-14849-1.patch, HBASE-14849-2.patch, 
> HBASE-14849.patch
>
>
> I was working at a client with a backported version of the Spark module for 
> HBase and realized we didn't add an option to turn off the block cache for the 
> scans.
> At the client I just disabled all caching with Spark SQL; this is an easy but 
> very impactful fix.
> The fix for this patch will make this configurable.





[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-17 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062634#comment-15062634
 ] 

Ted Malaska commented on HBASE-14849:
-

I added comments

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
> Attachments: HBASE-14849-1.patch, HBASE-14849.patch
>
>
> I was working at a client with a backported version of the Spark module for 
> HBase and realized we didn't add an option to turn off the block cache for the 
> scans.
> At the client I just disabled all caching with Spark SQL; this is an easy but 
> very impactful fix.
> The fix for this patch will make this configurable.





[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-17 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15062684#comment-15062684
 ] 

Ted Malaska commented on HBASE-14849:
-

Thanks Zhan, I updated again.

Thank you for the work

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
> Attachments: HBASE-14849-1.patch, HBASE-14849.patch
>
>
> I was working at a client with a backported version of the Spark module for 
> HBase and realized we didn't add an option to turn off the block cache for the 
> scans.
> At the client I just disabled all caching with Spark SQL; this is an easy but 
> very impactful fix.
> The fix for this patch will make this configurable.





[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15059253#comment-15059253
 ] 

Ted Malaska commented on HBASE-14849:
-

Can you create the Review Board entry?

Thanks

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
> Attachments: HBASE-14849.patch
>
>
> I was working at a client with a backported version of the Spark module for 
> HBase and realized we didn't add an option to turn off the block cache for the 
> scans.
> At the client I just disabled all caching with Spark SQL; this is an easy but 
> very impactful fix.
> The fix for this patch will make this configurable.





[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-14 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14849:

Assignee: Zhan Zhang  (was: Ted Malaska)

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>
> I was working at a client with a backported version of the Spark module for 
> HBase and realized we didn't add an option to turn off the block cache for the 
> scans.
> At the client I just disabled all caching with Spark SQL; this is an easy but 
> very impactful fix.
> The fix for this patch will make this configurable.





[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-12-14 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15057071#comment-15057071
 ] 

Ted Malaska commented on HBASE-14849:
-

Thanks Zhan.

I will be able to review as soon as you finish

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>
> I was working at a client with a backported version of the Spark module for 
> HBase and realized we didn't add an option to turn off the block cache for the 
> scans.
> At the client I just disabled all caching with Spark SQL; this is an easy but 
> very impactful fix.
> The fix for this patch will make this configurable.





[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15052639#comment-15052639
 ] 

Ted Malaska commented on HBASE-14795:
-

Will do, but don't hold up this patch for my testing.  If I find anything we 
will connect through a new jira.

Also, once this patch is in, I would like to get HBASE-14849 started and checked 
in.

Let me know if you want to do HBASE-14849 or if you're ok with me doing it.  We 
can have that chat on the HBASE-14849 jira.

Thanks again Zhan

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, 
> HBASE-14795-1.patch, HBASE-14795-2.patch, HBASE-14795-3.patch, 
> HBASE-14795-4.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat with a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> When you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanning the scan start and 
> end ranges, where this implementation can be more pointed.
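The quoted use case can be made concrete with a toy Python model (row keys as sorted strings; the range arithmetic here is illustrative only, not the module's implementation). Scanning the single outer range touches far more rows than scanning each requested range individually:

```python
def outer_range(ranges):
    """The single [start, stop) range a TableInputFormat-style scan would
    cover when given several requested ranges."""
    return (min(start for start, _ in ranges),
            max(stop for _, stop in ranges))


def rows_scanned(rows, ranges):
    """Rows touched when each requested range is scanned individually."""
    return [r for r in rows if any(start <= r < stop for start, stop in ranges)]


# 100 rows; the query only wants two small ranges at opposite ends.
rows = ["row%02d" % i for i in range(100)]
ranges = [("row00", "row10"), ("row90", "row99")]

lo, hi = outer_range(ranges)
outer_rows = [r for r in rows if lo <= r < hi]     # one big scan
pointed_rows = rows_scanned(rows, ranges)          # two pointed scans
```

Here the outer scan reads 99 rows where the pointed scans read only 19, which is the efficiency gap this jira targets.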





[jira] [Commented] (HBASE-14929) There is a space missing from Table "foo" is not currently available.

2015-12-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053519#comment-15053519
 ] 

Ted Malaska commented on HBASE-14929:
-

Great, let's get the patch file up on the Jira and create a Review Board entry, 
and I will help review.
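The bug under review is just string concatenation with a missing space. A quick Python illustration of the broken and fixed messages (the original code is Java, in LoadIncrementalHFiles.java):

```python
def broken_message(table_name):
    # Mirrors: "Table " + table.getName() + "is not currently available."
    # The literal starts with "is", so the name and "is" run together.
    return "Table " + table_name + "is not currently available."


def fixed_message(table_name):
    # The patch adds the space before "is".
    return "Table " + table_name + " is not currently available."
```

For a table named `foo`, the broken version yields "Table foois not currently available." instead of "Table foo is not currently available."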

> There is a space missing from Table "foo" is not currently available.
> -
>
> Key: HBASE-14929
> URL: https://issues.apache.org/jira/browse/HBASE-14929
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Malaska
>Assignee: Carlos A. Morillo
>Priority: Trivial
>
> Go to the following line in LoadIncrementalHFiles.java
> throw new TableNotFoundException("Table " + table.getName() + "is not 
> currently available.");
> and add a space before "is" (right after the opening quote of the literal).





[jira] [Commented] (HBASE-14929) There is a space missing from Table "foo" is not currently available.

2015-12-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15053531#comment-15053531
 ] 

Ted Malaska commented on HBASE-14929:
-

+1

> There is a space missing from Table "foo" is not currently available.
> -
>
> Key: HBASE-14929
> URL: https://issues.apache.org/jira/browse/HBASE-14929
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Malaska
>Assignee: Carlos A. Morillo
>Priority: Trivial
> Attachments: HBASE-14929.patch
>
>
> Go to the following line in LoadIncrementalHFiles.java
> throw new TableNotFoundException("Table " + table.getName() + "is not 
> currently available.");
> and add a space before "is" (right after the opening quote of the literal).





[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051936#comment-15051936
 ] 

Ted Malaska commented on HBASE-14795:
-

That was a cool addition.  I like the wrapping of the function to catch the 
exceptions.  

I'm +1.  Also, next week I'm going to run this on a 10-billion-plus-record 
dataset just to see it in action.

Since I'm not a committer I don't know if my +1 means much, but you have it.

Thanks Zhan

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, 
> HBASE-14795-1.patch, HBASE-14795-2.patch, HBASE-14795-3.patch, 
> HBASE-14795-4.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat with a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> When you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanning the scan start and 
> end ranges, where this implementation can be more pointed.





[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15051244#comment-15051244
 ] 

Ted Malaska commented on HBASE-14795:
-

Hey Zhan

I left one comment about the sync block, and I do see that you added a bunch 
of try/catch blocks.  But the problem still remains that the table and scanner 
can be left unclosed.

I think we need to add something like the following: 

https://github.com/apache/spark/blob/f434f36d508eb4dcade70871611fc022ae0feb56/core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala#L154

You will note this comes for free if you use an InputFormat, which raises the 
question of whether these changes should go back into TableInputFormat and we 
should just use TableInputFormat.  This would allow us to keep reading from the 
table in one location, and it would also mean you don't have to worry about the 
life cycle of anything.
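The `NewHadoopRDD` code linked above registers a task-completion callback so the reader is closed even when the task fails partway through. A minimal Python analogue of that pattern (toy classes and names, not Spark's actual API) shows why it guarantees cleanup on long-lived executors:

```python
class TaskContext:
    """Toy stand-in for Spark's task context: completion listeners run
    whether the task body succeeds or raises."""

    def __init__(self):
        self._listeners = []

    def add_task_completion_listener(self, fn):
        self._listeners.append(fn)

    def run(self, body):
        try:
            return body()
        finally:
            # Runs on success AND on failure, like Spark's listeners.
            for fn in self._listeners:
                fn()


closed = []
ctx = TaskContext()
# Register cleanup for the scanner and table up front...
ctx.add_task_completion_listener(lambda: closed.append("scanner"))
ctx.add_task_completion_listener(lambda: closed.append("table"))

# ...then even a task that blows up mid-scan still closes both.
try:
    ctx.run(lambda: 1 / 0)
except ZeroDivisionError:
    pass
```

With try/catch blocks alone, any path that misses a `close()` leaks the scanner or table on the executor; registering the cleanup once, before the scan starts, removes that class of bug.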

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, 
> HBASE-14795-1.patch, HBASE-14795-2.patch, HBASE-14795-3.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table with in a single 
> query.  TableInputFormat will scan the the outer range of the scan start and 
> end range where this implementation can be more pointed.





[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-08 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047701#comment-15047701
 ] 

Ted Malaska commented on HBASE-14795:
-

Will review tonight.  Long day :)






> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, 
> HBASE-14795-1.patch, HBASE-14795-2.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat with a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> When you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanning the scan start and 
> end ranges, where this implementation can be more pointed.





[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-08 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047779#comment-15047779
 ] 

Ted Malaska commented on HBASE-14795:
-

OK, I finished my review.  Please review my review; it seems to me that there 
are still failure paths where closing tables or scanners doesn't take place, 
leaving open tables and scanners on long-lived executors.

There is also a question about thread safety and a question about implicit 
methods that don't seem to be called.

If you get a patch in tomorrow I can review tomorrow night.



> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, 
> HBASE-14795-1.patch, HBASE-14795-2.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanned by the scan start 
> and end keys, whereas this implementation can be more pointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-07 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046210#comment-15046210
 ] 

Ted Malaska commented on HBASE-14795:
-

I just looked at the review board.  I don't see the changes.  I see the comment 
status updated but the code doesn't look to have changed.  Am I missing 
something?

Thanks

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch, 
> HBASE-14795-1.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanned by the scan start 
> and end keys, whereas this implementation can be more pointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-06 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15044226#comment-15044226
 ] 

Ted Malaska commented on HBASE-14795:
-

I did a first-pass review and left comments.  I'm mainly concerned about 
closing scanners and tables.

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanned by the scan start 
> and end keys, whereas this implementation can be more pointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14929) There is a space missing from Table "foo" is not currently available.

2015-12-04 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-14929:
---

 Summary: There is a space missing from Table "foo" is not 
currently available.
 Key: HBASE-14929
 URL: https://issues.apache.org/jira/browse/HBASE-14929
 Project: HBase
  Issue Type: Bug
Reporter: Ted Malaska
Priority: Trivial


Go to the following line in LoadIncrementalHFiles.java

throw new TableNotFoundException("Table " + table.getName() + "is not currently 
available.");

and add a space before "is".
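
To make the fix concrete, here is a minimal sketch; the `MessageFix` class and `message` helper are hypothetical illustrations (the real line lives in `LoadIncrementalHFiles.java` and uses `table.getName()`), showing only the corrected string concatenation with the space before "is":

```java
public class MessageFix {
    // Corrected message construction: note the leading space in " is",
    // so the table name does not run into the rest of the sentence.
    public static String message(String tableName) {
        return "Table " + tableName + " is not currently available.";
    }

    public static void main(String[] args) {
        // Prints: Table foo is not currently available.
        System.out.println(message("foo"));
    }
}
```

Without the space, the original code produced "Table foois not currently available." for a table named "foo".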



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-12-03 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15039668#comment-15039668
 ] 

Ted Malaska commented on HBASE-14795:
-

Can we open up a review board for this? 

Thx

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
> Attachments: 
> 0001-HBASE-14795-Enhance-the-spark-hbase-scan-operations.patch
>
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanned by the scan start 
> and end keys, whereas this implementation can be more pointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-11-27 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14849:

Description: 
I was working at a client with a ported down version of the Spark module for 
HBase and realized we didn't add an option to turn off block cache for the 
scans.  

At the client I just disabled all caching with Spark SQL, this is an easy but 
very impactful fix.

The fix for this patch will make this configurable

  was:
I was working at a client with a ported down version of the Spark module for 
HBase and realized we didn't add an option to turn off block cache for the 
scans.  

This is an easy but very impactful fix.


> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>
> I was working at a client with a ported down version of the Spark module for 
> HBase and realized we didn't add an option to turn off block cache for the 
> scans.  
> At the client I just disabled all caching with Spark SQL, this is an easy but 
> very impactful fix.
> The fix for this patch will make this configurable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-11-27 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska reassigned HBASE-14849:
---

Assignee: Ted Malaska

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>
> I was working at a client with a ported down version of the Spark module for 
> HBase and realized we didn't add an option to turn off block cache for the 
> scans.  
> This is an easy but very impactful fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-11-27 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15029922#comment-15029922
 ] 

Ted Malaska commented on HBASE-14849:
-

If it is OK, I'm going to start this, with the hope of getting a patch in the 
next 5 days.

It should be an easy pass.  My only worry is how to unit test something like 
this.  hmm

> Add option to set block cache to false on SparkSQL executions
> -
>
> Key: HBASE-14849
> URL: https://issues.apache.org/jira/browse/HBASE-14849
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>
> I was working at a client with a ported down version of the Spark module for 
> HBase and realized we didn't add an option to turn off block cache for the 
> scans.  
> This is an easy but very impactful fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14795) Enhance the spark-hbase scan operations

2015-11-19 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15014342#comment-15014342
 ] 

Ted Malaska commented on HBASE-14795:
-

Hey Zhan,

What is your ETA on this JIRA?  I just opened HBASE-14849 and I wanted to know 
if I should do that now or wait until this jira is done, or if you want to 
include HBASE-14849 in this jira.

Let me know.

> Enhance the spark-hbase scan operations
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table within a single 
> query, TableInputFormat will scan the outer range spanned by the scan start 
> and end keys, whereas this implementation can be more pointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14849) Add option to set block cache to false on SparkSQL executions

2015-11-19 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-14849:
---

 Summary: Add option to set block cache to false on SparkSQL 
executions
 Key: HBASE-14849
 URL: https://issues.apache.org/jira/browse/HBASE-14849
 Project: HBase
  Issue Type: New Feature
Reporter: Ted Malaska


I was working at a client with a ported down version of the Spark module for 
HBase and realized we didn't add an option to turn off block cache for the 
scans.  

This is an easy but very impactful fix.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value

2015-11-17 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15009797#comment-15009797
 ] 

Ted Malaska commented on HBASE-14340:
-

Thank you Andrew.  Let me know if there are any other jiras you would like me to
look at.

Thanks again

On Tuesday, November 17, 2015, Andrew Purtell (JIRA) 





> Add second bulk load option to Spark Bulk Load to send puts as the value
> 
>
> Key: HBASE-14340
> URL: https://issues.apache.org/jira/browse/HBASE-14340
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch
>
>
> The initial bulk load option for Spark bulk load sends values over one by one 
> through the shuffle.  This is similar to how the original MR bulk load 
> worked.
> However, the MR bulk loader has more than one bulk load option.  There is a 
> second option that allows all the Column Families, Qualifiers, and Values 
> of a row to be combined on the map side.
> This only works if the row is not super wide.
> But if the row is not super wide, this method of sending values through the 
> shuffle will reduce the data and work the shuffle has to deal with.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value

2015-11-15 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14340:

Attachment: HBASE-14340.2.patch

Fixed copy paste issue.

It was my mistake.  The code was right on my laptop but I had made the patch 
out of sync or something.

Thanks for finding that.

> Add second bulk load option to Spark Bulk Load to send puts as the value
> 
>
> Key: HBASE-14340
> URL: https://issues.apache.org/jira/browse/HBASE-14340
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-14340.1.patch, HBASE-14340.2.patch
>
>
> The initial bulk load option for Spark bulk load sends values over one by one 
> through the shuffle.  This is similar to how the original MR bulk load 
> worked.
> However, the MR bulk loader has more than one bulk load option.  There is a 
> second option that allows all the Column Families, Qualifiers, and Values 
> of a row to be combined on the map side.
> This only works if the row is not super wide.
> But if the row is not super wide, this method of sending values through the 
> shuffle will reduce the data and work the shuffle has to deal with.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14340) Add second bulk load option to Spark Bulk Load to send puts as the value

2015-11-13 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15004362#comment-15004362
 ] 

Ted Malaska commented on HBASE-14340:
-

Thank you Andrew for the review.  I will get to this jira in the next couple of 
days.

> Add second bulk load option to Spark Bulk Load to send puts as the value
> 
>
> Key: HBASE-14340
> URL: https://issues.apache.org/jira/browse/HBASE-14340
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-14340.1.patch
>
>
> The initial bulk load option for Spark bulk load sends values over one by one 
> through the shuffle.  This is similar to how the original MR bulk load 
> worked.
> However, the MR bulk loader has more than one bulk load option.  There is a 
> second option that allows all the Column Families, Qualifiers, and Values 
> of a row to be combined on the map side.
> This only works if the row is not super wide.
> But if the row is not super wide, this method of sending values through the 
> shuffle will reduce the data and work the shuffle has to deal with.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14801) Enhance the Spark-HBase connector catalog with json format

2015-11-12 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003057#comment-15003057
 ] 

Ted Malaska commented on HBASE-14801:
-

I have no problem with this; I think it looks a lot prettier than what I did on 
the first draft.

Does anyone else have any thoughts on this?  We don't want to change this too 
many times once it gets into users' hands, so let's agree that this JSON format 
is what we want long term.

> Enhance the Spark-HBase connector catalog with json format
> --
>
> Key: HBASE-14801
> URL: https://issues.apache.org/jira/browse/HBASE-14801
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14789) Enhance the current spark-hbase connector

2015-11-12 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15003062#comment-15003062
 ] 

Ted Malaska commented on HBASE-14789:
-

Adding Jira for Changing the table definition to JSON

> Enhance the current spark-hbase connector
> -
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to optimize the RDD construction in the current connector 
> implementation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001362#comment-15001362
 ] 

Ted Malaska commented on HBASE-14796:
-

Yeah agreed. It also depends on the time it takes to start a task.  But yeah 
I'm very interested to see if there is a difference.  It is a great science 
experiment :)

> Provide an alternative spark-hbase SQL implementations for Gets
> ---
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> Currently the Spark-Module Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL.
> rowkey = 123
> The original reason for this was that normal SQL will not have many equals 
> operations in a single where clause.
> Zhan had brought up two points that have value.
> 1. The SQL may be generated and may have many equals statements in it, so 
> moving the work to an executor protects the driver from load
> 2. In the current implementation the driver is connecting to HBase, and 
> exceptions may cause trouble with the Spark application and not just with 
> a single task execution



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan

2015-11-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001136#comment-15001136
 ] 

Ted Malaska commented on HBASE-14795:
-

I would like that.  Thanks Zhan.

> Provide an alternative spark-hbase SQL implementations for Scan
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table with in a single 
> query.  TableInputFormat will scan the the outer range of the scan start and 
> end range where this implementation can be more pointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001083#comment-15001083
 ] 

Ted Malaska commented on HBASE-14789:
-

Hey Zhan,

I'm not sure I understand the question.

What I'm thinking is that the changes you are asking for should fit nicely into 
the existing code.

And we can use the sub-jiras to discuss the implementations of each.  For 
example, with the Scan implementation I would like to ask if that functionality 
could be added to TableInputFormat, because it could be of value to more than 
just SparkSQL and because we can consolidate code.  For the BulkGet 
implementation I would like to see some performance tests to make sure we are 
not introducing latency, and also whether we should use the existing BulkGet 
functionality in HBase-Spark, because we might want to execute the gets in more 
than one task.

But let's have these discussions in the sub-jiras, for they are completely 
different components that are not dependent on each other.

Thanks

> Provide an alternative spark-hbase connector
> 
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to provide users an option to choose different Spark-HBase 
> implementations based on requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001156#comment-15001156
 ] 

Ted Malaska commented on HBASE-14796:
-

My only concern here is whether we are adding latency for the normal single-row 
get query.

Can you run some tests to see what impact there is on this?  Not just a 
unit test but a test on a real cluster.

If the latency difference is nothing big then I don't see any problem with the 
full change to the executor get design.  If the latency change is huge, maybe 
we can make this configurable. 

> Provide an alternative spark-hbase SQL implementations for Gets
> ---
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> Currently the Spark-Module Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL.
> rowkey = 123
> The original reason for this was that normal SQL will not have many equals 
> operations in a single where clause.
> Zhan had brought up two points that have value.
> 1. The SQL may be generated and may have many equals statements in it, so 
> moving the work to an executor protects the driver from load
> 2. In the current implementation the driver is connecting to HBase, and 
> exceptions may cause trouble with the Spark application and not just with 
> a single task execution



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-11 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15001301#comment-15001301
 ] 

Ted Malaska commented on HBASE-14796:
-

I agree with point one.  But the use case I'm thinking about is one like this.

HBase table with 100 million or a billion records (the number doesn't matter 
much, just make it a lot)

Then the select looks like this

Select * from hbase_table where rowkey = "foobar"

I can see this being very common; not optimal, but common.

> Provide an alternative spark-hbase SQL implementations for Gets
> ---
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> Currently the Spark-Module Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL.
> rowkey = 123
> The original reason for this was that normal SQL will not have many equals 
> operations in a single where clause.
> Zhan had brought up two points that have value.
> 1. The SQL may be generated and may have many equals statements in it, so 
> moving the work to an executor protects the driver from load
> 2. In the current implementation the driver is connecting to HBase, and 
> exceptions may cause trouble with the Spark application and not just with 
> a single task execution



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999385#comment-14999385
 ] 

Ted Malaska commented on HBASE-14789:
-

Can you help me understand what components this has that don't already exist in 
the current HBase-Spark module and also what requirements are not met by the 
current Spark-Module implementation but are supported with this code?

Thanks

> Provide an alternative spark-hbase connector
> 
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to provide users an option to choose different Spark-HBase 
> implementations based on requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999621#comment-14999621
 ] 

Ted Malaska commented on HBASE-14795:
-

There is no real negative to this proposed approach other than a second 
implementation of table scan.  Too bad the existing TableInputFormat cannot be 
updated to handle this, because then this would be in one place.

As for implementation, there is no reason this can't just be invoked straight 
from line 330 of DefaultSource, or it could be an alternate implementation in 
HBaseRDD that takes multi-scan objects. 

https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/DefaultSource.scala#L330
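
The use case behind this jira (TableInputFormat scanning the single outer range spanned by several scan ranges, versus one pointed scan per range) can be sketched with a small self-contained example.  The row keys, class, and method names below are hypothetical illustrations of the row-count difference, not the HBase API:

```java
import java.util.Arrays;
import java.util.List;

public class ScanRanges {
    // Hypothetical row keys in a table, in lexicographic order as HBase stores them.
    static final List<String> ROWS = Arrays.asList(
        "a1", "a2", "b1", "b2", "c1", "c2", "d1", "d2");

    // Rows touched by a single scan over the half-open range [start, stop).
    static long envelopeScan(String start, String stop) {
        return ROWS.stream()
                   .filter(r -> r.compareTo(start) >= 0 && r.compareTo(stop) < 0)
                   .count();
    }

    // Rows touched when issuing one pointed scan per requested range.
    static long pointedScans(String[][] ranges) {
        long total = 0;
        for (String[] range : ranges) {
            total += envelopeScan(range[0], range[1]);
        }
        return total;
    }

    public static void main(String[] args) {
        // Two disjoint ranges: ["a","b") and ["d","e").
        String[][] ranges = { {"a", "b"}, {"d", "e"} };
        System.out.println(envelopeScan("a", "e"));  // 8: the outer envelope also reads b* and c*
        System.out.println(pointedScans(ranges));    // 4: only a* and d* rows
    }
}
```

The gap grows with the distance between the requested ranges, which is why a pointed multi-scan implementation can beat scanning the outer envelope.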


> Provide an alternative spark-hbase SQL implementations for Scan
> ---
>
> Key: HBASE-14795
> URL: https://issues.apache.org/jira/browse/HBASE-14795
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> This is a sub-jira of HBASE-14789.  This jira is to focus on the replacement 
> of TableInputFormat for a more custom scan implementation that will make the 
> following use case more effective.
> Use case:
> In the case you have multiple scan ranges on a single table with in a single 
> query.  TableInputFormat will scan the the outer range of the scan start and 
> end range where this implementation can be more pointed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999628#comment-14999628
 ] 

Ted Malaska commented on HBASE-14789:
-

Put comments related to the bulk get implementation in jira HBASE-14796

> Provide an alternative spark-hbase connector
> 
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to provide users an option to choose different Spark-HBase 
> implementations based on requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999635#comment-14999635
 ] 

Ted Malaska commented on HBASE-14796:
-

If implemented, this code would fit great right around

https://github.com/apache/hbase/blob/master/hbase-spark/src/main/scala/org/apache/hadoop/hbase/spark/DefaultSource.scala#L347

> Provide an alternative spark-hbase SQL implementations for Gets
> ---
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> Currently the Spark-Module Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL.
> rowkey = 123
> The original reason for this was that normal SQL will not have many equals 
> operations in a single where clause.
> Zhan had brought up two points that have value.
> 1. The SQL may be generated and may have many equals statements in it, so 
> moving the work to an executor protects the driver from load
> 2. In the current implementation the driver is connecting to HBase, and 
> exceptions may cause trouble with the Spark application and not just with 
> a single task execution



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-10 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-14796:
---

 Summary: Provide an alternative spark-hbase SQL implementations 
for Gets
 Key: HBASE-14796
 URL: https://issues.apache.org/jira/browse/HBASE-14796
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Malaska
Assignee: Zhan Zhang
Priority: Minor


Currently the Spark-Module Spark SQL implementation gets records from HBase from 
the driver if there is something like the following found in the SQL.

rowkey = 123

The original reason for this was that normal SQL will not have many equals 
operations in a single where clause.

Zhan had brought up two points that have value.
1. The SQL may be generated and may have many equals statements in it, so 
moving the work to an executor protects the driver from load
2. In the current implementation the driver is connecting to HBase, and 
exceptions may cause trouble with the Spark application and not just with a 
single task execution





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999422#comment-14999422
 ] 

Ted Malaska commented on HBASE-14789:
-

Cool, I read the doc, so there are four points.

* Bulk Get - Do bulk Gets on an executor
* TableInputFormat - Don't use this because of the thought that only one can 
run at a time
* Change the table description format - Add a more JSON-like definition
* Add write support - For SparkSQL writes to HBase

# First, let's talk to each point:
* Bulk Get: As we have talked about in other jiras, executing this on the 
executor side really doesn't add much value.  It would be very odd if people 
would have more than 1000 equals in a where clause.  If they did, then we need 
to figure out at what point 1000, 1, 5 does it become faster to run the 
code on the executor.  The normal use case is just a couple of equals per where 
clause, so this is not a real concern; now if you want to do a real bulk get, 
then use the bulk get command, which will be much better for a lot of reasons.

* Not Using TableInputFormat: In the code today, Spark is given the 
TableInputFormat in different requests so they are at different points on the 
DAG.  So why does Spark not read from both?  Also the locality is given, and we 
are not reinventing the wheel.

* Change the table description format: This is a preference thing; the current 
version is more like the HBase shell.  Either way makes sense; it makes no real 
difference.

* Add write support: Yes, we should add this.

# Summary
First, I think any and all changes would fit into the current implementation of 
the HBase-Spark module with little change.  These are pretty pointed changes 
that affect a scoped area of the code.

Second, we should separate out this jira into 4 different jiras, each focusing 
on the different points, for these different points are not dependent or 
related.  We should open up a jira to address each feature and then discuss the 
approach for each one and how it can be added and/or if it should be added.

Thanks Zhan

Let me know if I missed anything

 



> Provide an alternative spark-hbase connector
> 
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Bug
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to provide users an option to choose different Spark-HBase 
> implementations based on requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HBASE-14795) Provide an alternative spark-hbase SQL implementations for Scan

2015-11-10 Thread Ted Malaska (JIRA)
Ted Malaska created HBASE-14795:
---

 Summary: Provide an alternative spark-hbase SQL implementations 
for Scan
 Key: HBASE-14795
 URL: https://issues.apache.org/jira/browse/HBASE-14795
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Malaska
Assignee: Zhan Zhang
Priority: Minor


This is a sub-jira of HBASE-14789.  This jira is to focus on replacing 
TableInputFormat with a more custom scan implementation that will make the 
following use case more effective.

Use case:
In the case where you have multiple scan ranges on a single table within a single 
query, TableInputFormat will scan the outer range from the overall scan start to 
the overall scan end, where this implementation can be more pointed.
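To make the use case concrete, here is a minimal, self-contained sketch (plain Java over string keys, not actual HBase scan code; the data and range values are hypothetical) of why one scan over the outer envelope of several row-key ranges touches far more rows than pointed scans over each range:

```java
import java.util.*;

// Illustrative sketch: compare rows wanted by pointed range scans with rows
// a single scan over the outer envelope of those ranges would read.
public class OuterRangeDemo {
    // Count keys falling inside any requested [start, stop) range.
    static long pointedRowCount(List<String> keys, String[][] ranges) {
        long count = 0;
        for (String key : keys) {
            for (String[] r : ranges) {
                if (key.compareTo(r[0]) >= 0 && key.compareTo(r[1]) < 0) {
                    count++;
                    break;
                }
            }
        }
        return count;
    }

    // Count keys a single scan over [min(start), max(stop)) would read.
    static long envelopeRowCount(List<String> keys, String[][] ranges) {
        String lo = ranges[0][0], hi = ranges[0][1];
        for (String[] r : ranges) {
            if (r[0].compareTo(lo) < 0) lo = r[0];
            if (r[1].compareTo(hi) > 0) hi = r[1];
        }
        long count = 0;
        for (String key : keys) {
            if (key.compareTo(lo) >= 0 && key.compareTo(hi) < 0) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 100; i++) keys.add(String.format("row%03d", i));
        String[][] ranges = { { "row010", "row020" }, { "row080", "row090" } };
        System.out.println(pointedRowCount(keys, ranges));   // 20 rows actually wanted
        System.out.println(envelopeRowCount(keys, ranges));  // 80 rows scanned
    }
}
```

In this made-up data, two 10-row ranges are requested, yet the envelope scan reads 80 rows; that gap is the waste a more pointed implementation avoids.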




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999623#comment-14999623
 ] 

Ted Malaska commented on HBASE-14789:
-

Put response to TableInputFormat design in HBASE-14795 

> Provide an alternative spark-hbase connector
> 
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to provide user an option to choose different Spark-HBase 
> implementation based on requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14789) Provide an alternative spark-hbase connector

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14789?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999622#comment-14999622
 ] 

Ted Malaska commented on HBASE-14789:
-

This is a sub jira

> Provide an alternative spark-hbase connector
> 
>
> Key: HBASE-14789
> URL: https://issues.apache.org/jira/browse/HBASE-14789
> Project: HBase
>  Issue Type: Improvement
>Reporter: Zhan Zhang
>Assignee: Zhan Zhang
> Attachments: shc.pdf
>
>
> This JIRA is to provide user an option to choose different Spark-HBase 
> implementation based on requirements.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14796) Provide an alternative spark-hbase SQL implementations for Gets

2015-11-10 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14999633#comment-14999633
 ] 

Ted Malaska commented on HBASE-14796:
-

So there is value in this idea for generated queries, but for normal SQL 
operations it may be overkill to use a task on an executor to get 
a single record from HBase.

As for the argument about protecting the driver, there is some merit to this.

I think there is more merit to the first argument, distributing the get load 
to the executors to support multi-user environments. 

But honestly, if the developer is using Spark SQL to do gets on HBase, I question 
the approach.  The user would be better off using the Spark-Module Bulk Get 
functionality that is already checked in.  That implementation will distribute 
the gets across N number of tasks and executors.
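The distribution idea behind that bulk get functionality can be sketched in a few lines (plain Java illustrating the concept, not the actual HBase-Spark API; the class and method names here are invented for the example): split the row keys into batches so that each of N parallel tasks issues one multi-get against HBase instead of the driver issuing every get itself.

```java
import java.util.*;

// Illustrative sketch of bulk-get distribution: partition a key list into
// batches, one batch per parallel task.
public class BulkGetSketch {
    // Round-robin the keys into `parallelism` batches.
    static List<List<String>> partition(List<String> rowKeys, int parallelism) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < parallelism; i++) batches.add(new ArrayList<>());
        for (int i = 0; i < rowKeys.size(); i++) {
            batches.get(i % parallelism).add(rowKeys.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("r1", "r2", "r3", "r4", "r5");
        // In HBase-Spark each batch would become one task's multi-get.
        List<List<String>> batches = partition(keys, 2);
        System.out.println(batches); // [[r1, r3, r5], [r2, r4]]
    }
}
```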

> Provide an alternative spark-hbase SQL implementations for Gets
> ---
>
> Key: HBASE-14796
> URL: https://issues.apache.org/jira/browse/HBASE-14796
> Project: HBase
>  Issue Type: Improvement
>Reporter: Ted Malaska
>Assignee: Zhan Zhang
>Priority: Minor
>
> Current the Spark-Module Spark SQL implementation gets records from HBase 
> from the driver if there is something like the following found in the SQL.
> rowkey = 123
> The reason for this original was normal sql will not have many equal 
> operations in a single where clause.
> Zhan, had brought up too points that have value.
> 1. The SQL may be generated and may have many many equal statements in it so 
> moving the work to an executor protects the driver from load
> 2. In the correct implementation the drive is connecting to HBase and 
> exceptions may cause trouble with the Spark application and not just with the 
> a single task execution



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14149) Add Data Frame support for HBase-Spark Module

2015-10-30 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14982527#comment-14982527
 ] 

Ted Malaska commented on HBASE-14149:
-

Closing this jira because we got DataFrame support with 
https://issues.apache.org/jira/browse/HBASE-14181

> Add Data Frame support for HBase-Spark Module
> -
>
> Key: HBASE-14149
> URL: https://issues.apache.org/jira/browse/HBASE-14149
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>
> Add on to the work done in HBASE-13992 and add support for dataframes for 
> bulk puts, bulk gets, and scans



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-14149) Add Data Frame support for HBase-Spark Module

2015-10-30 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska resolved HBASE-14149.
-
Resolution: Duplicate

This was done in https://issues.apache.org/jira/browse/HBASE-14181

With connection to Spark SQL

> Add Data Frame support for HBase-Spark Module
> -
>
> Key: HBASE-14149
> URL: https://issues.apache.org/jira/browse/HBASE-14149
> Project: HBase
>  Issue Type: New Feature
>  Components: spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
>
> Add on to the work done in HBASE-13992 and add support for dataframes for 
> bulk puts, bulk gets, and scans



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-16 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961155#comment-14961155
 ] 

Ted Malaska commented on HBASE-14406:
-

own -> owe

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.11.patch, HBASE-14406.2.patch, HBASE-14406.3.patch, 
> HBASE-14406.4.patch, HBASE-14406.5.patch, HBASE-14406.6.patch, 
> HBASE-14406.7.patch, HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3
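The quoted conditions can be checked with a small self-contained sketch (plain Java over made-up rows, not the actual filter code): evaluating the two predicates over sample data shows their result sets genuinely differ, so any filter construction that collapses them into one filter must drop or add rows.

```java
import java.util.*;

// Illustrative sketch: "col1 > 4 && col2 < 3" and "col1 > 4 || col2 < 3"
// select different row sets, so they cannot share one scan filter.
public class AndOrDemo {
    // Each row is {col1, col2}; select rows matching the AND or OR predicate.
    static List<int[]> select(List<int[]> rows, boolean useAnd) {
        List<int[]> out = new ArrayList<>();
        for (int[] r : rows) {
            boolean a = r[0] > 4, b = r[1] < 3;
            if (useAnd ? (a && b) : (a || b)) out.add(r);
        }
        return out;
    }

    public static void main(String[] args) {
        List<int[]> rows = Arrays.asList(
            new int[] { 5, 2 },   // matches both AND and OR
            new int[] { 5, 9 },   // matches OR only
            new int[] { 1, 1 });  // matches OR only
        System.out.println(select(rows, true).size());  // 1
        System.out.println(select(rows, false).size()); // 3
    }
}
```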



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-16 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14961150#comment-14961150
 ] 

Ted Malaska commented on HBASE-14406:
-

OMG I own you guys a beer.  That was a long patch.  Thank you both.

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.11.patch, HBASE-14406.2.patch, HBASE-14406.3.patch, 
> HBASE-14406.4.patch, HBASE-14406.5.patch, HBASE-14406.6.patch, 
> HBASE-14406.7.patch, HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959844#comment-14959844
 ] 

Ted Malaska commented on HBASE-14406:
-

I just looked at 

https://issues.apache.org/jira/secure/attachment/12766912/HBASE-14406.10.patch

and searched for 

diff --git a/hbase-spark/src/main/protobuf/Filter.proto 
b/hbase-spark/src/main/protobuf/Filter.proto

It's there.  Let me know if I missed something

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, 
> HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, 
> HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959872#comment-14959872
 ] 

Ted Malaska commented on HBASE-14406:
-

Ohh [~ted_yu] so I need to add the generated file into the patch.  Now I 
understand what you are saying.

Sorry, I was reading too fast.

Will make new patch now

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, 
> HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, 
> HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: (was: TestSuite.txt)

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: HBASE-14406.10.patch

Just double checking and uploading the newest version

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, 
> HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, 
> HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: (was: Surefile-reports.zip)

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959847#comment-14959847
 ] 

Ted Malaska commented on HBASE-14406:
-

Also the diff number on the review board is off by one, which is my fault: I 
skipped version 8; it never got uploaded :)

So version 9 on reviewBoard is version 10 on jira

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, 
> HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, 
> HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: HBASE-14406.11.patch

Added FilterProtos.java to git

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.11.patch, HBASE-14406.2.patch, HBASE-14406.3.patch, 
> HBASE-14406.4.patch, HBASE-14406.5.patch, HBASE-14406.6.patch, 
> HBASE-14406.7.patch, HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959866#comment-14959866
 ] 

Ted Malaska commented on HBASE-14406:
-

grr the build system didn't generate the proto classes

Let me do some research

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, 
> HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, 
> HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959791#comment-14959791
 ] 

Ted Malaska commented on HBASE-14406:
-

OK I rebuilt everything and restarted my computer.  And everything is fine.

I'm not sure what caused the problem originally, but the unit tests in patch 9 work 
on my local machine.

Sorry for the false alarm


> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, 
> Surefile-reports.zip, TestSuite.txt
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959830#comment-14959830
 ] 

Ted Malaska commented on HBASE-14406:
-

yup it is in there

https://reviews.apache.org/r/38536/diff/9#4

Let me know if you don't see it


> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.10.patch, 
> HBASE-14406.2.patch, HBASE-14406.3.patch, HBASE-14406.4.patch, 
> HBASE-14406.5.patch, HBASE-14406.6.patch, HBASE-14406.7.patch, 
> HBASE-14406.9.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959641#comment-14959641
 ] 

Ted Malaska commented on HBASE-14406:
-

I just did "mvn -Dtest=NoUnitTests clean verify" in the hbase-spark folder.

It totally worked before I rebased; then I rebased and it didn't work.

I also got a fresh copy of master (so without this patch) and tried it on my 
box and two other people's boxes, and all three failed.

The change in the hosts file successfully fixed the problem on one of the boxes.  
I will try to repeat the fix when I get home.

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, 
> Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: HBASE-14406.9.patch

Rebasing pom.

On a side note, something funky happened to HBase when I rebased.  On my computer 
all the unit tests are broken in the master branch unless I make the machine's 
IP reverse-lookup-able.  This issue is unrelated to my patch; it was 
something else that changed recently.

I get this error 
java.io.IOException: java.lang.RuntimeException: Could not resolve Kerberos 
principal name: java.net.UnknownHostException: tmalaska-MBP-2.home: 
tmalaska-MBP-2.home: nodename nor servname provided, or not known

on a HBaseTestingUtility startMiniCluster.

I have tested this on more than one computer with friends.  

So to repeat: the patch should be good, but something is not right with HBase in 
the latest master with respect to HBaseTestingUtility startMiniCluster.
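For anyone hitting the same UnknownHostException, the hosts-file workaround mentioned above amounts to making the machine's hostname reverse-resolvable locally; a hypothetical /etc/hosts entry (the hostname below is taken from the error message and will differ on your machine) would look like:

```
# /etc/hosts -- map the local hostname back to the loopback address
127.0.0.1   localhost tmalaska-MBP-2.home
```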



> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, 
> Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: TestSuite.txt

BTW here is the full stack trace when the hosts file is not updated to do a 
reverse lookup.



> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, 
> Surefile-reports.zip, TestSuite.txt
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959772#comment-14959772
 ] 

Ted Malaska commented on HBASE-14406:
-

I tried this build and it doesn't have the hbase unit testing problem

tmalaska-MBP-2:hbase-spark ted.malaska$ git log | head -n 1
commit 8f95318f6252c1c0b7a073619525eae6d991f47b

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, 
> Surefile-reports.zip, TestSuite.txt
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959643#comment-14959643
 ] 

Ted Malaska commented on HBASE-14406:
-

Also this is my version

tmalaska-MBP-2:hbase-spark ted.malaska$  git log | head -n 1
commit d5ed46bc9f9285f75d2d906ec9c120cb408827df

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, 
> Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959648#comment-14959648
 ] 

Ted Malaska commented on HBASE-14406:
-

The code that breaks is just

var TEST_UTIL: HBaseTestingUtility = new HBaseTestingUtility
TEST_UTIL.startMiniCluster() // BOOM

There is nothing else that runs: no Spark stuff, nothing.  Just 
HBaseTestingUtility.

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, HBASE-14406.9.patch, 
> Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-15 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14959311#comment-14959311
 ] 

Ted Malaska commented on HBASE-14406:
-

OK I will make this change in the next hour or so.

Thanks Ted Yu

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-14 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14957413#comment-14957413
 ] 

Ted Malaska commented on HBASE-14406:
-

What went wrong with the build?

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3





[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-14 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: HBASE-14406.7.patch

Moved ProtoBufs to hbase-spark and out of hbase-protocol

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, HBASE-14406.7.patch, Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3





[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-13 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14955850#comment-14955850
 ] 

Ted Malaska commented on HBASE-14406:
-

Well that does make sense.  Let me look into that tomorrow.

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch, 
> HBASE-14406.6.patch, Surefile-reports.zip
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3





[jira] [Updated] (HBASE-14158) Add documentation for Initial Release for HBase-Spark Module integration

2015-10-12 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14158:

Attachment: HBASE-14158.7.patch

Removed the long lines and used the following command instead of git diff:

git format-patch --stdout origin/master > HBASE-14158.7.patch 

> Add documentation for Initial Release for HBase-Spark Module integration 
> -
>
> Key: HBASE-14158
> URL: https://issues.apache.org/jira/browse/HBASE-14158
> Project: HBase
>  Issue Type: Improvement
>  Components: documentation, spark
>Reporter: Ted Malaska
>Assignee: Ted Malaska
> Fix For: 2.0.0
>
> Attachments: HBASE-14158.1.patch, HBASE-14158.2.patch, 
> HBASE-14158.5.patch, HBASE-14158.5.patch, HBASE-14158.6.patch, 
> HBASE-14158.7.patch
>
>
> Add documentation for Initial Release for HBase-Spark Module integration 





[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-12 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953360#comment-14953360
 ] 

Ted Malaska commented on HBASE-14406:
-

I think the bug from last time was in the following two cases:

( rowkey < 1 or col > 2 )

and

( colA < 1 or colB > 2 )

The functionality of (rowkey < 1 and col > 2) worked in the last patch.

Here are some related tests that should cover both cases:
test("Test SQL point and range combo") 
test("Test OR logic with a one RowKey and One column")
test("Test two complete range non merge rowKey query")
test("Test OR logic with a two columns")


> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3





[jira] [Commented] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-12 Thread Ted Malaska (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14953463#comment-14953463
 ] 

Ted Malaska commented on HBASE-14406:
-

[~zhanzhang] np.  Let me add it now.  It will hopefully take less than an hour.

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3





[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-12 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: HBASE-14406.4.patch

Applied the changes; worked for Zhan Zhang and Ted Yu

> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3





[jira] [Updated] (HBASE-14406) The dataframe datasource filter is wrong, and will result in data loss or unexpected behavior

2015-10-12 Thread Ted Malaska (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-14406?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Malaska updated HBASE-14406:

Attachment: HBASE-14406.5.patch

Changed "then" to "than"


> The dataframe datasource filter is wrong, and will result in data loss or 
> unexpected behavior
> -
>
> Key: HBASE-14406
> URL: https://issues.apache.org/jira/browse/HBASE-14406
> Project: HBase
>  Issue Type: Bug
>  Components: spark
>Affects Versions: 2.0.0
>Reporter: Zhan Zhang
>Assignee: Ted Malaska
>Priority: Blocker
> Fix For: 2.0.0
>
> Attachments: HBASE-14406.1.patch, HBASE-14406.2.patch, 
> HBASE-14406.3.patch, HBASE-14406.4.patch, HBASE-14406.5.patch
>
>
> Following condition will result in the same filter. It will have data loss 
> with the current filter construction.
> col1 > 4 && col2 < 3
> col1 > 4 || col2 < 3




