[jira] [Work started] (HBASE-28056) [HBoss] add support for AWS v2 SDK

2023-09-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-28056 started by Steve Loughran.
--
> [HBoss] add support for AWS v2 SDK
> --
>
> Key: HBASE-28056
> URL: https://issues.apache.org/jira/browse/HBASE-28056
> Project: HBase
>  Issue Type: Bug
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, 
> which HADOOP-18703 will do on hadoop trunk within a few days.
> I think the solution here is probably some profile to build against different 
> sdk/hadoop versions





[jira] [Commented] (HBASE-28056) [HBoss] add support for AWS v2 SDK

2023-09-14 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17765230#comment-17765230
 ] 

Steve Loughran commented on HBASE-28056:


FYI, I have got things compiling, but the mock tests are failing as the API is 
being used differently. HADOOP-1 makes things slightly easier, but 
HADOOP-18877 should provide an easier plugin point to put the stub s3fs behind, 
as all calls to S3 will be behind an internal interface: what you need to 
implement becomes a lot clearer.
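
A minimal sketch of what a stub v2 client could look like in the meantime, 
assuming only that the v2 SDK's software.amazon.awssdk.services.s3.S3Client is 
an interface (it is, unlike the v1 client classes); this is illustrative, not 
the actual HBoss test code:

{code}
// Sketch only: a do-nothing v2 S3Client built with a dynamic proxy, playing
// the role AbstractAmazonS3 subclasses played with the v1 SDK. Every call
// fails loudly, so the mock tests show exactly which S3 operations the new
// hadoop code paths invoke.
import java.lang.reflect.Proxy;
import software.amazon.awssdk.services.s3.S3Client;

public final class StubV2S3Client {
  public static S3Client create() {
    return (S3Client) Proxy.newProxyInstance(
        S3Client.class.getClassLoader(),
        new Class<?>[] { S3Client.class },
        (proxy, method, args) -> {
          throw new UnsupportedOperationException(
              "stub S3Client: implement " + method.getName());
        });
  }
}
{code}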

> [HBoss] add support for AWS v2 SDK
> --
>
> Key: HBASE-28056
> URL: https://issues.apache.org/jira/browse/HBASE-28056
> Project: HBase
>  Issue Type: Bug
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, 
> which HADOOP-18703 will do on hadoop trunk within a few days.
> I think the solution here is probably some profile to build against different 
> sdk/hadoop versions





[jira] [Updated] (HBASE-28056) [HBoss] add support for AWS v2 SDK

2023-08-31 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran updated HBASE-28056:
---
Description: 
HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, 
which HADOOP-18703 will do on hadoop trunk within a few days.

I think the solution here is probably some profile to build against different 
sdk/hadoop versions

  was:
HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, 
which HADOOP-18703 will do on hadoop trunk within a few days.




> [HBoss] add support for AWS v2 SDK
> --
>
> Key: HBASE-28056
> URL: https://issues.apache.org/jira/browse/HBASE-28056
> Project: HBase
>  Issue Type: Bug
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, 
> which HADOOP-18703 will do on hadoop trunk within a few days.
> I think the solution here is probably some profile to build against different 
> sdk/hadoop versions





[jira] [Commented] (HBASE-28056) [HBoss] add support for AWS v2 SDK

2023-08-31 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-28056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17760881#comment-17760881
 ] 

Steve Loughran commented on HBASE-28056:



{code}
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] Failed to execute 
goal org.apache.maven.plugins:maven-compiler-plugin:3.7.0:compile 
(default-compile) on project hadoop-testutils: Compilation failure: Compilation 
failure:
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] 
/.../hbase_filesystem/hadoop-testutils/src/main/java/org/apache/hadoop/hbase/oss/Hadoop33EmbeddedS3ClientFactory.java:[33,8]
 org.apache.hadoop.hbase.oss.Hadoop33EmbeddedS3ClientFactory is not abstract 
and does not override abstract method 
createS3TransferManager(software.amazon.awssdk.services.s3.S3AsyncClient) in 
org.apache.hadoop.fs.s3a.S3ClientFactory
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] 
/.../hbase_filesystem/hadoop-testutils/src/main/java/org/apache/hadoop/hbase/oss/Hadoop33EmbeddedS3ClientFactory.java:[56,19]
 
createS3Client(java.net.URI,org.apache.hadoop.fs.s3a.S3ClientFactory.S3ClientCreationParameters)
 in org.apache.hadoop.hbase.oss.Hadoop33EmbeddedS3ClientFactory cannot 
implement 
createS3Client(java.net.URI,org.apache.hadoop.fs.s3a.S3ClientFactory.S3ClientCreationParameters)
 in org.apache.hadoop.fs.s3a.S3ClientFactory
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR]   return type 
com.amazonaws.services.s3.AmazonS3 is not compatible with 
software.amazon.awssdk.services.s3.S3Client
20:40:56 2023-08-30 19:40:56 - ERROR-root::util|431:: [ERROR] -> [Help 1]

{code}
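
The shape of a v2-compatible factory falls out of those signatures; a skeleton 
derived purely from the errors above, with placeholder bodies (not the eventual 
fix, and any further S3ClientFactory methods are elided):

{code}
// Skeleton inferred from the compile errors: the factory must now hand back
// the v2 SDK's S3Client and an S3TransferManager instead of a v1 AmazonS3.
import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.fs.s3a.S3ClientFactory;
import software.amazon.awssdk.services.s3.S3AsyncClient;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.transfer.s3.S3TransferManager;

public class EmbeddedV2S3ClientFactory implements S3ClientFactory {
  @Override
  public S3Client createS3Client(URI uri, S3ClientCreationParameters params)
      throws IOException {
    throw new UnsupportedOperationException("embedded v2 client goes here");
  }

  @Override
  public S3TransferManager createS3TransferManager(S3AsyncClient s3AsyncClient) {
    throw new UnsupportedOperationException("not needed by the mock tests");
  }
}
{code}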


> [HBoss] add support for AWS v2 SDK
> --
>
> Key: HBASE-28056
> URL: https://issues.apache.org/jira/browse/HBASE-28056
> Project: HBase
>  Issue Type: Bug
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, 
> which HADOOP-18703 will do on hadoop trunk within a few days.





[jira] [Created] (HBASE-28056) [HBoss] add support for AWS v2 SDK

2023-08-31 Thread Steve Loughran (Jira)
Steve Loughran created HBASE-28056:
--

 Summary: [HBoss] add support for AWS v2 SDK
 Key: HBASE-28056
 URL: https://issues.apache.org/jira/browse/HBASE-28056
 Project: HBase
  Issue Type: Bug
  Components: hboss
Reporter: Steve Loughran
Assignee: Steve Loughran


HBoss doesn't compile against a version of hadoop built with the AWS v2 SDK, 
which HADOOP-18703 will do on hadoop trunk within a few days.







[jira] [Work started] (HBASE-27900) [HBOSS] Open file fails with NumberFormatException for S3AFileSystem

2023-06-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-27900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-27900 started by Steve Loughran.
--
> [HBOSS] Open file fails with NumberFormatException for S3AFileSystem
> 
>
> Key: HBASE-27900
> URL: https://issues.apache.org/jira/browse/HBASE-27900
> Project: HBase
>  Issue Type: Bug
>  Components: Filesystem Integration
>Affects Versions: 1.0.0-alpha2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> In HADOOP-18724 it is shown that the new overloaded setters for double and 
> float can cause type mismatch and end up setting s3a integer values to floats 
> to see what breaks.
> The wrapper in HBASE-26483 should mimic the hadoop fix and, for all 
> double/float values passed in, cast to long and set as integers only. Nothing 
> has ever used the float/double values, so this isn't a regression.





[jira] [Created] (HBASE-27900) [HBOSS] Open file fails with NumberFormatException for S3AFileSystem

2023-06-01 Thread Steve Loughran (Jira)
Steve Loughran created HBASE-27900:
--

 Summary: [HBOSS] Open file fails with NumberFormatException for 
S3AFileSystem
 Key: HBASE-27900
 URL: https://issues.apache.org/jira/browse/HBASE-27900
 Project: HBase
  Issue Type: Bug
  Components: Filesystem Integration
Affects Versions: 1.0.0-alpha2
Reporter: Steve Loughran
Assignee: Steve Loughran


In HADOOP-18724 it is shown that the new overloaded setters for double and 
float can cause type mismatch and end up setting s3a integer values to floats 
to see what breaks.

The wrapper in HBASE-26483 should mimic the hadoop fix and, for all double/float 
values passed in, cast to long and set as integers only. Nothing has ever used 
the float/double values, so this isn't a regression.
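
A sketch of what that could look like in the HBASE-26483 builder wrapper; the 
wrapped field and the exact overload set are assumptions, and opt(String, 
String) is used as the one setter guaranteed everywhere:

{code}
// Sketch, not the committed fix: narrow float/double option values to long
// before forwarding, so s3a integer options never arrive as "4.0".
// "wrapped" is an assumed field holding the delegate builder.
@Override
public FutureDataInputStreamBuilder opt(String key, double value) {
  wrapped.opt(key, Long.toString((long) value));  // cast to long, set integral form
  return this;
}

@Override
public FutureDataInputStreamBuilder opt(String key, float value) {
  wrapped.opt(key, Long.toString((long) value));
  return this;
}
{code}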





[jira] [Created] (HBASE-27076) [HBOSS] compile against hadoop 3.3.2+ only

2022-05-30 Thread Steve Loughran (Jira)
Steve Loughran created HBASE-27076:
--

 Summary: [HBOSS] compile against hadoop 3.3.2+ only
 Key: HBASE-27076
 URL: https://issues.apache.org/jira/browse/HBASE-27076
 Project: HBase
  Issue Type: Improvement
  Components: hboss
Reporter: Steve Loughran


To get openFile() and other things to work safely, hboss needs to be changed so 
it builds against hadoop 3.3.2+ only.
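
For context, a usage sketch of the builder API this unlocks; openFile()/build() 
are the real hadoop 3.3 calls, and the read policy key is from the later 3.3.x 
openFile specification:

{code}
// What openFile() buys HBoss callers: per-file read policy hints.
// Error handling elided; build() returns a CompletableFuture.
FSDataInputStream in = fs.openFile(path)
    .opt("fs.option.openfile.read.policy", "random")  // e.g. HFile access
    .build()
    .get();
{code}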





[jira] [Work started] (HBASE-26483) [HBOSS] add lock around openFile operation

2022-05-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-26483 started by Steve Loughran.
--
> [HBOSS] add lock around openFile operation
> --
>
> Key: HBASE-26483
> URL: https://issues.apache.org/jira/browse/HBASE-26483
> Project: HBase
>  Issue Type: Improvement
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> The HBoss FS wrapper doesn't wrap the openFile(path) call with a lock, which 
> means anything using that builder isn't going to have access synchronized.
> adding a wrapper for this method will allow hbase to use the api call and so 
> request different read policies on different files, or other options





[jira] [Assigned] (HBASE-26483) [HBOSS] add lock around openFile operation

2022-05-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-26483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reassigned HBASE-26483:
--

Assignee: Steve Loughran

> [HBOSS] add lock around openFile operation
> --
>
> Key: HBASE-26483
> URL: https://issues.apache.org/jira/browse/HBASE-26483
> Project: HBase
>  Issue Type: Improvement
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>
> The HBoss FS wrapper doesn't wrap the openFile(path) call with a lock, which 
> means anything using that builder isn't going to have access synchronized.
> adding a wrapper for this method will allow hbase to use the api call and so 
> request different read policies on different files, or other options





[jira] [Commented] (HBASE-27042) hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut

2022-05-24 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17541673#comment-17541673
 ] 

Steve Loughran commented on HBASE-27042:


Thanks; I will follow up with the other issues next week/month.

> hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
> ---
>
> Key: HBASE-27042
> URL: https://issues.apache.org/jira/browse/HBASE-27042
> Project: HBase
>  Issue Type: Bug
>  Components: hboss
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: hbase-filesystem-1.0.0-alpha2
>
>
> HBoss doesn't compile against hadoop builds containing HADOOP-17409, "remove 
> s3guard", as test setup tries to turn it off.
> there's no need for s3guard any more, so hboss can just avoid all settings 
> and expect it to be disabled (hadoop 3.3.3 or earlier) or removed (3.4+)
> (hboss version is 1.0.0-alpha2-SNAPSHOT)





[jira] [Commented] (HBASE-27042) hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut

2022-05-16 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-27042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537531#comment-17537531
 ] 

Steve Loughran commented on HBASE-27042:


an update of the AWS SDK also breaks the tests, because the AWS client now 
requires another method to be implemented


{code}
[ERROR] 
testListFilesEmptyDirectoryNonrecursive(org.apache.hadoop.hbase.oss.contract.TestHBOSSContractGetFileStatus)
  Time elapsed: 4.248 s  <<< ERROR!
java.lang.UnsupportedOperationException: Extend AbstractAmazonS3 to provide an 
implementation

{code}

Once you tell maven to give you useful stack traces, you can track this down:


{code}

[ERROR] 
testListLocatedStatusEmptyDirectory(org.apache.hadoop.hbase.oss.contract.TestHBOSSContractGetFileStatus)
  Time elapsed: 1.365 s  <<< ERROR!
java.lang.UnsupportedOperationException: Extend AbstractAmazonS3 to provide an 
implementation
at 
com.amazonaws.services.s3.AbstractAmazonS3.deleteObject(AbstractAmazonS3.java:642)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$null$13(S3AFileSystem.java:2696)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDurationOfInvocation(IOStatisticsBinding.java:464)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$deleteObject$14(S3AFileSystem.java:2694)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:414)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:377)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObject(S3AFileSystem.java:2690)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjectAtPath(S3AFileSystem.java:2725)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.lambda$deleteObjectAtPath$0(S3AFileSystem.java:2055)
at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:135)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:117)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:133)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.deleteObjectAtPath(S3AFileSystem.java:2054)
 
{code}


> hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut
> ---
>
> Key: HBASE-27042
> URL: https://issues.apache.org/jira/browse/HBASE-27042
> Project: HBase
>  Issue Type: Bug
>  Components: hboss
>Reporter: Steve Loughran
>Priority: Minor
>
> HBoss doesn't compile against hadoop builds containing HADOOP-17409, "remove 
> s3guard", as test setup tries to turn it off.
> there's no need for s3guard any more, so hboss can just avoid all settings 
> and expect it to be disabled (hadoop 3.3.3 or earlier) or removed (3.4+)
> (hboss version is 1.0.0-alpha2-SNAPSHOT)





[jira] [Created] (HBASE-27042) hboss doesn't compile against hadoop branch-3.3 now that s3guard is cut

2022-05-16 Thread Steve Loughran (Jira)
Steve Loughran created HBASE-27042:
--

 Summary: hboss doesn't compile against hadoop branch-3.3 now that 
s3guard is cut
 Key: HBASE-27042
 URL: https://issues.apache.org/jira/browse/HBASE-27042
 Project: HBase
  Issue Type: Bug
  Components: hboss
Reporter: Steve Loughran


HBoss doesn't compile against hadoop builds containing HADOOP-17409, "remove 
s3guard", as test setup tries to turn it off.

there's no need for s3guard any more, so hboss can just avoid all settings and 
expect it to be disabled (hadoop 3.3.3 or earlier) or removed (3.4+)

(hboss version is 1.0.0-alpha2-SNAPSHOT)





[jira] [Created] (HBASE-26483) [HBOSS] add lock around openFile operation

2021-11-23 Thread Steve Loughran (Jira)
Steve Loughran created HBASE-26483:
--

 Summary: [HBOSS] add lock around openFile operation
 Key: HBASE-26483
 URL: https://issues.apache.org/jira/browse/HBASE-26483
 Project: HBase
  Issue Type: Improvement
  Components: hboss
Reporter: Steve Loughran


The HBoss FS wrapper doesn't wrap the openFile(path) call with a lock, which 
means anything using that builder isn't going to have access synchronized.

adding a wrapper for this method will allow hbase to use the api call and so 
request different read policies on different files, or other options
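
A sketch of the wrapper shape, with names assumed rather than quoted from HBoss 
(the lock manager call is modelled on its ZooKeeper tree locking):

{code}
// Sketch only: wrap the returned builder so the eventual build() runs under
// the HBoss path lock, the same way plain open() does. "sync" is the assumed
// tree lock manager; FutureDataInputStreamBuilderWrapper is hypothetical.
@Override
public FutureDataInputStreamBuilder openFile(final Path path) throws IOException {
  final FutureDataInputStreamBuilder inner = fs.openFile(path);
  return new FutureDataInputStreamBuilderWrapper(inner) {
    @Override
    public CompletableFuture<FSDataInputStream> build() throws IOException {
      try (AutoLock lock = sync.lock(path)) {  // hold the lock across the open
        return super.build();
      }
    }
  };
}
{code}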





[jira] [Commented] (HBASE-24989) [HBOSS] Some code cleanup

2021-11-23 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17448023#comment-17448023
 ] 

Steve Loughran commented on HBASE-24989:


HADOOP-17409 is going to break EmbeddedS3 as we cut all of the s3guard classes. 
hboss is going to need to remove the bit where s3guard is set up...not needed 
anyway
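
Concretely, that's test setup of this kind; the key and class below are the 
real S3Guard ones that HADOOP-17409 removes:

{code}
// The sort of line HBoss setup has to drop: a no-op once s3guard is gone,
// and any compile-time reference to the s3guard classes themselves fails
// against HADOOP-17409 builds.
conf.set("fs.s3a.metadatastore.impl",
    "org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore");
{code}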

> [HBOSS] Some code cleanup
> -
>
> Key: HBASE-24989
> URL: https://issues.apache.org/jira/browse/HBASE-24989
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: hbase-filesystem-1.0.0-alpha1
>Reporter: Wellington Chevreuil
>Assignee: Wellington Chevreuil
>Priority: Trivial
> Fix For: hbase-filesystem-1.0.0-alpha2
>
>
> This is a cleanup of unused methods/imported classes around several classes 
> of HBOSS project.





[jira] [Created] (HBASE-25900) HBoss tests compile/failure against Hadoop 3.3.1

2021-05-19 Thread Steve Loughran (Jira)
Steve Loughran created HBASE-25900:
--

 Summary: HBoss tests compile/failure against Hadoop 3.3.1
 Key: HBASE-25900
 URL: https://issues.apache.org/jira/browse/HBASE-25900
 Project: HBase
  Issue Type: Bug
  Components: Filesystem Integration
Affects Versions: 1.0.2
Reporter: Steve Loughran


Changes in Hadoop 3.3.x stop the tests compiling/working:

* changes in signature of nominally private classes (HADOOP-17497). Fix: update.
* HADOOP-16721: s3a rename throwing more exceptions, but no longer failing if 
the dest parent doesn't exist. Fix: change s3a.xml.
* HADOOP-17531/HADOOP-17620: distcp moving to listIterator; test failures.
* HADOOP-13327: tests on syncable which expect files being written to to be 
visible. Fix: skip that test.

The fix for HADOOP-17497 stops this compiling against Hadoop < 3.3.1. This is 
unfortunate, but I can't see an easy fix. The new signature takes a parameters 
class, so we can add new config options (and already do) without breaking the 
signature again. And I've tagged it as LimitedPrivate so that future 
developers will know it's used here.
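
The parameter-object idiom in miniature; names here are illustrative, not the 
actual hadoop declarations:

{code}
// Illustrative only: all future settings live on the parameters class, so
// the factory method signature never has to break again.
interface ClientFactory {
  Object createClient(java.net.URI uri, ClientCreationParameters params);

  final class ClientCreationParameters {
    private int maxConnections;

    ClientCreationParameters withMaxConnections(int n) {
      this.maxConnections = n;  // new option: add a field and a wither
      return this;
    }
  }
}
{code}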





[jira] [Commented] (HBASE-22149) HBOSS: A FileSystem implementation to provide HBase's required semantics

2019-04-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817543#comment-16817543
 ] 

Steve Loughran commented on HBASE-22149:


bq.  LimitedPrivate({"HBase"}),

Prefer @VisibleForTesting. I really hate the limited private stuff

> HBOSS: A FileSystem implementation to provide HBase's required semantics
> 
>
> Key: HBASE-22149
> URL: https://issues.apache.org/jira/browse/HBASE-22149
> Project: HBase
>  Issue Type: New Feature
>  Components: Filesystem Integration
>Reporter: Sean Mackrory
>Assignee: Sean Mackrory
>Priority: Critical
> Attachments: HBASE-22149-hadoop.patch, HBASE-22149-hbase-2.patch, 
> HBASE-22149-hbase-3.patch, HBASE-22149-hbase.patch
>
>
> (Have been using the name HBOSS for HBase / Object Store Semantics)
> I've had some thoughts about how to solve the problem of running HBase on 
> object stores. There has been some thought in the past about adding the 
> required semantics to S3Guard, but I have some concerns about that. First, 
> it's mixing complicated solutions to different problems (bridging the gap 
> between a flat namespace and a hierarchical namespace vs. solving 
> inconsistency). Second, it's S3-specific, whereas other objects stores could 
> use virtually identical solutions. And third, we can't do things like atomic 
> renames in a true sense. There would have to be some trade-offs specific to 
> HBase's needs and it's better if we can solve that in an HBase-specific 
> module without mixing all that logic in with the rest of S3A.
> Ideas to solve this above the FileSystem layer have been proposed and 
> considered (HBASE-20431, for one), and maybe that's the right way forward 
> long-term, but it certainly seems to be a hard problem and hasn't been done 
> yet. But I don't know enough of all the internal considerations to make much 
> of a judgment on that myself.
> I propose a FileSystem implementation that wraps another FileSystem instance 
> and provides locking of FileSystem operations to ensure correct semantics. 
> Locking could quite possibly be done on the same ZooKeeper ensemble as an 
> HBase cluster already uses (I'm sure there are some performance 
> considerations here that deserve more attention). I've put together a 
> proof-of-concept on which I've tested some aspects of atomic renames and 
> atomic file creates. Both of these tests fail reliably on a naked s3a 
> instance. I've also done a small YCSB run against a small cluster to sanity 
> check other functionality and was successful. I will post the patch, and my 
> laundry list of things that still need work. The WAL is still placed on HDFS, 
> but the HBase root directory is otherwise on S3.
> Note that my prototype is built on Hadoop's source tree right now. That's 
> purely for my convenience in putting it together quickly, as that's where I 
> mostly work. I actually think long-term, if this is accepted as a good 
> solution, it makes sense to live in HBase (or it's own repository). It only 
> depends on stable, public APIs in Hadoop and is targeted entirely at HBase's 
> needs, so it should be able to iterate on the HBase community's terms alone.
> Another idea [~ste...@apache.org] proposed to me is that of an inode-based 
> FileSystem that keeps hierarchical metadata in a more appropriate store that 
> would allow the required transactions (maybe a special table in HBase could 
> provide that store itself for other tables), and stores the underlying files 
> with unique identifiers on S3. This allows renames to actually become fast 
> instead of just large atomic operations. It does however place a strong 
> dependency on the metadata store. I have not explored this idea much. My 
> current proof-of-concept has been pleasantly simple, so I think it's the 
> right solution unless it proves unable to provide the required performance 
> characteristics.





[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block

2019-03-11 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16789558#comment-16789558
 ] 

Steve Loughran commented on HBASE-22005:


bq. I think 2.7.x is a stable release line, and lots of users are still on it, 
so it is not likely that we will drop the support for hadoop 2.7.x for our 
hbase 2.x releases.

license incompatibilities in libraries we distribute (the AWS SDK) and ASF 
policy mean that we aren't in a position to release new versions; it's built on 
a version of Java that's near-impossible to get hold of. We don't really have a 
choice in the matter.

> Use ByteBuff's refcnt to track the life cycle of data block
> ---
>
> Key: HBASE-22005
> URL: https://issues.apache.org/jira/browse/HBASE-22005
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-22005.HBASE-21879.v1.patch, 
> HBASE-22005.HBASE-21879.v2.patch
>
>






[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block

2019-03-08 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787773#comment-16787773
 ] 

Steve Loughran commented on HBASE-22005:


bq. And for high latency object storage, such as S3, I do not see any 
difference between passing a stream and a ByteBuffer?

I did raise that as an option in HADOOP-11867, but it's not what works for 
Parquet, ORC etc, and it's not what we get from the HTTP APIs anyway. So we'd 
be converting from stream to byte buffer, and they'd be converting back again.


bq. Anyway, we need the StreamCapabilities API to query whether a given stream 
has the ability, sadly it is only provided in hadoop-2.9+ and we still need to 
support 2.7...

It can't be backported to 2.7.x as no more releases are coming out there; it's 
EOL. 2.8.x, though, that could be done without much difficulty. Have you 
considered moving to later hadoop libraries? Or more specifically: is there 
something that's been done there which stops you, or is it just the inertia of 
the installed base?

> Use ByteBuff's refcnt to track the life cycle of data block
> ---
>
> Key: HBASE-22005
> URL: https://issues.apache.org/jira/browse/HBASE-22005
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-22005.HBASE-21879.v1.patch, 
> HBASE-22005.HBASE-21879.v2.patch
>
>






[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block

2019-03-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16787053#comment-16787053
 ] 

Steve Loughran commented on HBASE-22005:


I think you'll need a plan to continue to work with stores which don't support 
BB; that includes object stores which ship with HBase support today (hello 
Azure!) and whose users will be unhappy when things stop working.

bq. I still think those basic fs, such as LocalFileSystem/DistributedFileSystem 
need the ByteBuffer read/pread method, it's so common to use 

I see the world moving away from Posix in two directions

* near-RAM-speed solid state storage. Here memory access operations make a lot 
more sense than the stream API, because in hardware these can be part of the 
memory space of the application. Why copy it into  process memory at all, when 
it can just be memory mapped?

* object storage. Here we go the other way: high latency IO where the cost of 
a seek() is such that you can see the logs pause and you'll know "hey! it's a 
GET". There we're looking at async IO APIs, vectored IO ops etc. I don't expect 
stores to implement ByteBufferReadable; async vector reads, where you provide a 
list of file ranges, are the more likely direction.

> Use ByteBuff's refcnt to track the life cycle of data block
> ---
>
> Key: HBASE-22005
> URL: https://issues.apache.org/jira/browse/HBASE-22005
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-22005.HBASE-21879.v1.patch
>
>






[jira] [Commented] (HBASE-22005) Use ByteBuff's refcnt to track the life cycle of data block

2019-03-07 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-22005?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16786813#comment-16786813
 ] 

Steve Loughran commented on HBASE-22005:


A lot of filesystems don't implement the byte buffer operations, not just 
through laziness but because the underlying APIs used for data just work at the 
stream level (e.g. all the HTTP clients); it's pretty suboptimal to try: we'd 
be streaming into a byte buffer, and the app would be pulling data out thinking 
it was getting a performance boost when it wasn't.

See HADOOP-11867 for some discussion of this.

I'll accept a patch to let you use the StreamCapabilities to query all the way 
through a wrapped input stream to see if the feature is available, and once 
HADOOP-15691 is in, a check on a filesystem before even opening a file. This 
should let you decide when to switch to ByteBuffers.
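
A sketch of such a probe; hasCapability() on FSDataInputStream is the real API, 
while the "in:readbytebuffer" capability key is from later hadoop releases and 
is an assumption here:

{code}
// Probe a (possibly wrapped) stream before switching to ByteBuffer reads;
// FSDataInputStream passes the query down through wrapper layers.
import org.apache.hadoop.fs.FSDataInputStream;

final class ByteBufferProbe {
  static boolean useByteBuffers(FSDataInputStream in) {
    return in.hasCapability("in:readbytebuffer");  // assumed capability key
  }
}
{code}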

> Use ByteBuff's refcnt to track the life cycle of data block
> ---
>
> Key: HBASE-22005
> URL: https://issues.apache.org/jira/browse/HBASE-22005
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Zheng Hu
>Assignee: Zheng Hu
>Priority: Major
> Attachments: HBASE-22005.HBASE-21879.v1.patch
>
>






[jira] [Commented] (HBASE-20774) FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly.

2018-11-14 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16687155#comment-16687155
 ] 

Steve Loughran commented on HBASE-20774:


HADOOP-14556 will add a canonical name when you turn DTs on

> FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly.
> ---
>
> Key: HBASE-20774
> URL: https://issues.apache.org/jira/browse/HBASE-20774
> Project: HBase
>  Issue Type: Bug
>  Components: Filesystem Integration
>Reporter: Austin Heyne
>Priority: Major
>  Labels: S3, S3Native, s3
>
> FSHDFSUtils#isSameHdfs retrieves the Canonical Service Name from Hadoop to 
> determine if source and destination are on the same filesystem. 
> NativeS3FileSystem, S3FileSystem and presumably S3NativeFileSystem 
> (com.amazon) always return null in getCanonicalServiceName() which 
> incorrectly causes isSameHdfs to return false even when they could be the 
> same. 
> Error encountered while trying to perform bulk load from S3 to HBase on S3 
> backed by the same bucket. This is causing bulk loads from S3 to copy all the 
> data to the workers and back up to S3.





[jira] [Commented] (HBASE-21149) TestIncrementalBackupWithBulkLoad may fail due to file copy failure

2018-10-15 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650163#comment-16650163
 ] 

Steve Loughran commented on HBASE-21149:


I wouldn't blame distcp here, yet. This hints at a race condition in the distcp 
setup process: have you kicked off distcp while some of the source files were 
being written?

> TestIncrementalBackupWithBulkLoad may fail due to file copy failure
> ---
>
> Key: HBASE-21149
> URL: https://issues.apache.org/jira/browse/HBASE-21149
> Project: HBase
>  Issue Type: Test
>  Components: backuprestore
>Reporter: Ted Yu
>Assignee: Vladimir Rodionov
>Priority: Major
> Attachments: 21149.v2.txt, HBASE-21149-v1.patch, 
> testIncrementalBackupWithBulkLoad-output.txt
>
>
> From 
> https://builds.apache.org/job/HBase%20Nightly/job/master/471/testReport/junit/org.apache.hadoop.hbase.backup/TestIncrementalBackupWithBulkLoad/TestIncBackupDeleteTable/
>  :
> {code}
> 2018-09-03 11:54:30,526 ERROR [Time-limited test] 
> impl.TableBackupClient(235): Unexpected Exception : Failed copy from 
> hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_
>  to hdfs://localhost:53075/backupUT/backup_1535975655488
> java.io.IOException: Failed copy from 
> hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/0f626c66493649daaf84057b8dd71a30_SeqId_205_,hdfs://localhost:53075/user/jenkins/test-data/ecd40bd0-cb93-91e0-90b5-7bfd5bb2c566/data/default/test-1535975627781/773f5709b645b46bd3840f9cfb549c5a/f/ad8df6415bd9459d9b3df76c588d79df_SeqId_205_
>  to hdfs://localhost:53075/backupUT/backup_1535975655488
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:351)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.copyBulkLoadedFiles(IncrementalTableBackupClient.java:219)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.handleBulkLoad(IncrementalTableBackupClient.java:198)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:320)
>   at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605)
>   at 
> org.apache.hadoop.hbase.backup.TestIncrementalBackupWithBulkLoad.TestIncBackupDeleteTable(TestIncrementalBackupWithBulkLoad.java:104)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:498)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> {code}
> However, some part of the test output was lost:
> {code}
> 2018-09-03 11:53:36,793 DEBUG [RS:0;765c9ca5ea28:36357] regions
> ...[truncated 398396 chars]...
> 8)
>   at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:292)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> {code}





[jira] [Commented] (HBASE-20429) Support for mixed or write-heavy workloads on non-HDFS filesystems

2018-08-29 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16596423#comment-16596423
 ] 

Steve Loughran commented on HBASE-20429:


BTW, HADOOP-15691 is my latest iteration of having each FS declare its 
capabilities. As I've noted at the end, as well as through a new interface, we 
could expose this as new config options you can look for in 
fsInstance.getConf().get("option"), provided the FS instances clone their 
supplied configs and then patch them. This would let you check to see what an 
FS offered.
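
That is, something like the following, where the capability key is invented for 
illustration:

{code}
// Sketch of the config-probe idea: the FS clones and patches its Configuration,
// callers just read it back. The key below is not a real hadoop option.
boolean consistentListing = fsInstance.getConf()
    .getBoolean("fs.capability.consistent.listing", false);
{code}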

w.r.t. s3guard, you need to know what semantics you get. With S3Guard you get 
consistent listings, but rename is still non-atomic.

Thanks for promising to invite me to any discussions; as long as it's not via 
Amazon Chime or Skype for Business, I'm up for it.

> Support for mixed or write-heavy workloads on non-HDFS filesystems
> --
>
> Key: HBASE-20429
> URL: https://issues.apache.org/jira/browse/HBASE-20429
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Andrew Purtell
>Priority: Major
>
> We can support reasonably well use cases on non-HDFS filesystems, like S3, 
> where an external writer has loaded (and continues to load) HFiles via the 
> bulk load mechanism, and then we serve out a read only workload at the HBase 
> API.
> Mixed workloads or write-heavy workloads won't fare as well. In fact, data 
> loss seems certain. It will depend in the specific filesystem, but all of the 
> S3 backed Hadoop filesystems suffer from a couple of obvious problems, 
> notably a lack of atomic rename. 
> This umbrella will serve to collect some related ideas for consideration.





[jira] [Commented] (HBASE-20429) Support for mixed or write-heavy workloads on non-HDFS filesystems

2018-08-17 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584555#comment-16584555
 ] 

Steve Loughran commented on HBASE-20429:


One thing which would be good for you all to write down is: what are your 
expectations of an FS to work.

in particular
* create/read/update/delete consistency
* listing consistency
* which ops are required to be atomic and O(1)
* is it ok for create(path, overwrite=false) to be non-atomic?
* when you expect things to be written to store
* how long do you expect the final close() to take.

Identify these things and you can start to see what stores can work. And show 
you where you need to involve other things for the semantics you need. 

> Support for mixed or write-heavy workloads on non-HDFS filesystems
> --
>
> Key: HBASE-20429
> URL: https://issues.apache.org/jira/browse/HBASE-20429
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Andrew Purtell
>Priority: Major
>
> We can support reasonably well use cases on non-HDFS filesystems, like S3, 
> where an external writer has loaded (and continues to load) HFiles via the 
> bulk load mechanism, and then we serve out a read only workload at the HBase 
> API.
> Mixed workloads or write-heavy workloads won't fare as well. In fact, data 
> loss seems certain. It will depend in the specific filesystem, but all of the 
> S3 backed Hadoop filesystems suffer from a couple of obvious problems, 
> notably a lack of atomic rename. 
> This umbrella will serve to collect some related ideas for consideration.





[jira] [Comment Edited] (HBASE-20429) Support for mixed or write-heavy workloads on non-HDFS filesystems

2018-08-17 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16584555#comment-16584555
 ] 

Steve Loughran edited comment on HBASE-20429 at 8/18/18 1:02 AM:
-

One thing which would be good for you all to write down is: what are your 
expectations of an FS to work.

in particular
* create/read/update/delete consistency
* listing consistency
* which ops are required to be atomic and O(1)
* is it ok for create(path, overwrite=false) to be non-atomic?
* when you expect things to be written to store
* how long do you expect the final close() to take.

Identify these things and you can start to see what stores can work. And show 
you where you need to involve other things for the semantics you need. 


was (Author: ste...@apache.org):
One thing which would be good for you all to write down is: what are your 
expectations of an FS to work.

in particular
* create/read/update/delete consistency
* listing consistency
* which ops are required to be atomic and O(1)
* is it ok for create(path, overwrite=false) to be non-atomic?
* when you expect things to be written to store
* how long do you expect the final close() to take.

Identify these things and you can start to see what stores can work. And show 
you where you need to involve other thigs for the semantics you need. 

> Support for mixed or write-heavy workloads on non-HDFS filesystems
> --
>
> Key: HBASE-20429
> URL: https://issues.apache.org/jira/browse/HBASE-20429
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Andrew Purtell
>Priority: Major
>
> We can support reasonably well use cases on non-HDFS filesystems, like S3, 
> where an external writer has loaded (and continues to load) HFiles via the 
> bulk load mechanism, and then we serve out a read only workload at the HBase 
> API.
> Mixed workloads or write-heavy workloads won't fare as well. In fact, data 
> loss seems certain. It will depend in the specific filesystem, but all of the 
> S3 backed Hadoop filesystems suffer from a couple of obvious problems, 
> notably a lack of atomic rename. 
> This umbrella will serve to collect some related ideas for consideration.





[jira] [Commented] (HBASE-20774) FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly.

2018-06-22 Thread Steve Loughran (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-20774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16520735#comment-16520735
 ] 

Steve Loughran commented on HBASE-20774:


as the S3A javadocs say, "Override getCanonicalServiceName because we don't 
support token in S3A".

You will have to fall back to getting the fully qualified path of the root dir, 
fs.makeQualified(new Path("/")), and then compare by path equality.
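
That is, a comparison along these lines (FileSystem.makeQualified(Path) is the 
real API; the helper class is a sketch):

{code}
// Fallback when getCanonicalServiceName() returns null: two filesystems
// whose qualified roots match (same scheme + authority) are the same store.
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

final class FsPathCompare {
  static boolean sameFileSystem(FileSystem src, FileSystem dst) {
    return src.makeQualified(new Path("/"))
        .equals(dst.makeQualified(new Path("/")));
  }
}
{code}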

> FSHDFSUtils#isSameHdfs doesn't handle S3 filesystems correctly.
> ---
>
> Key: HBASE-20774
> URL: https://issues.apache.org/jira/browse/HBASE-20774
> Project: HBase
>  Issue Type: Bug
>  Components: Filesystem Integration
>Reporter: Austin Heyne
>Priority: Major
>  Labels: S3, S3Native, s3
>
> FSHDFSUtils#isSameHdfs retrieves the Canonical Service Name from Hadoop to 
> determine if source and destination are on the same filesystem. 
> NativeS3FileSystem, S3FileSystem and presumably S3NativeFileSystem 
> (com.amazon) always return null in getCanonicalServiceName() which 
> incorrectly causes isSameHdfs to return false even when they could be the 
> same. 
> Error encountered while trying to perform bulk load from S3 to HBase on S3 
> backed by the same bucket. This is causing bulk loads from S3 to copy all the 
> data to the workers and back up to S3.





[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename

2018-04-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456203#comment-16456203
 ] 

Steve Loughran commented on HBASE-20431:


FWIW, I've been discussing with Stephan Ewen on the Flink team about allowing 
an option to create files on s3a without doing the precursor checks (is this 
really a directory, etc), for people who know what they are doing. We'd always 
do an s3guard check (low cost, stops it becoming corrupted), but it'd avoid the 
caching of the 404 in the AWS load balancers. They are trying to defend against 
DoS attacks: nobody else has to. 

This would fix the GaPaG consistency problem by not doing the G before the P. 
It would be in a new FileSystem/FileContext create() call which returned a 
builder that supported custom fs-specific options, as hadoop-3 already does for 
open(). Something I'd like, but not in my schedule right now, though S3 Select 
depends on it.
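
For reference, a sketch of how that sits on the existing hadoop-3 builder; 
createFile() and overwrite() are the real API, while the option key shown is a 
hypothetical stand-in for the proposed "skip the precursor checks" switch:

{code}
// Sketch of the proposed usage: fs-specific options ride on the builder,
// so only stores that understand the key act on it.
FSDataOutputStream out = fs.createFile(path)
    .overwrite(false)
    .opt("fs.s3a.create.performance", true)  // hypothetical key at the time
    .build();
{code}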

> Store commit transaction for filesystems that do not support an atomic rename
> -
>
> Key: HBASE-20431
> URL: https://issues.apache.org/jira/browse/HBASE-20431
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Priority: Major
>
> HBase expects the Hadoop filesystem implementation to support an atomic 
> rename() operation. HDFS does. The S3 backed filesystems do not. The 
> fundamental issue is the non-atomic and eventually consistent nature of the 
> S3 service. A S3 bucket is not a filesystem. S3 is not always immediately 
> read-your-writes. Object metadata can be temporarily inconsistent just after 
> new objects are stored. There can be a settling period to ride over. 
> Renaming/moving objects from one path to another are copy operations with 
> O(file) complexity and O(data) time followed by a series of deletes with 
> O(file) complexity. Failures at any point prior to completion will leave the 
> operation in an inconsistent state. The missing atomic rename semantic opens 
> opportunities for corruption and data loss, which may or may not be 
> repairable with HBCK.
> Handling this at the HBase level could be done with a new multi-step 
> filesystem transaction framework. Call it StoreCommitTransaction. 
> SplitTransaction and MergeTransaction are well established cases where even 
> on HDFS we have non-atomic filesystem changes and are our implementation 
> template for the new work. In this new StoreCommitTransaction we'd be moving 
> flush and compaction temporaries out of the temporary directory into the 
> region store directory. On HDFS the implementation would be easy. We can rely 
> on the filesystem's atomic rename semantics. On S3 it would be work: First we 
> would build the list of objects to move, then copy each object into the 
> destination, and then finally delete all objects at the original path. We 
> must handle transient errors with retry strategies appropriate for the action 
> at hand. We must handle serious or permanent errors where the RS doesn't need 
> to be aborted with a rollback that cleans it all up. Finally, we must handle 
> permanent errors where the RS must be aborted with a rollback during region 
> open/recovery. Note that after all objects have been copied and we are 
> deleting obsolete source objects we must roll forward, not back. To support 
> recovery after an abort we must utilize the WAL to track transaction 
> progress. Put markers in for StoreCommitTransaction start and completion 
> state, with details of the store file(s) involved, so it can be rolled back 
> during region recovery at open. This will be significant work in HFile, 
> HStore, flusher, compactor, and HRegion. Wherever we use HDFS's rename now we 
> would substitute the running of this new multi-step filesystem transaction.
> We need to determine this for certain, but I believe on S3 the PUT or 
> multipart upload of an object must complete before the object is visible, so 
> we don't have to worry about the case where an object is visible before fully 
> uploaded as part of normal operations. So an individual object copy will 
> either happen entirely and the target will then become visible, or it won't 
> and the target won't exist.
> S3 has an optimization, PUT COPY 
> (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html), which 
> the AmazonClient embedded in S3A utilizes for moves. When designing the 
> StoreCommitTransaction be sure to allow for filesystem implementations that 
> leverage a server side copy operation. Doing a get-then-put should be 
> optional. (Not sure Hadoop has an interface that advertises this capability 
> yet; we can add one if not.)





[jira] [Commented] (HBASE-20433) HBase Export Snapshot utility does not close FileSystem instances

2018-04-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16456185#comment-16456185
 ] 

Steve Loughran commented on HBASE-20433:


Maybe we could do a special build of hadoop-aws which can be set to print a 
stack trace on creation if debug is set. Or actually do this in FileSystem for 
broader debugging of all FS leaks. I can see the value in that from time to 
time.

> HBase Export Snapshot utility does not close FileSystem instances
> -
>
> Key: HBASE-20433
> URL: https://issues.apache.org/jira/browse/HBASE-20433
> Project: HBase
>  Issue Type: Bug
>  Components: Client, fs, snapshots
>Affects Versions: 1.2.6, 1.4.3
>Reporter: Voyta
>Priority: Major
>
> It seems org.apache.hadoop.hbase.snapshot.ExportSnapshot disallows FileSystem 
> instance caching.
> When verifySnapshot method is being run it calls often methods like 
> org.apache.hadoop.hbase.util.FSUtils#getRootDir that instantiate FileSystem 
> but never calls org.apache.hadoop.fs.FileSystem#close method. This behaviour 
> allows allocation of unwanted objects potentially causing memory leaks.
> Related issue: https://issues.apache.org/jira/browse/HADOOP-15392
>  
> Expectation:
>  * HBase should properly release/close all objects, especially FileSystem 
> instances.





[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename

2018-04-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452436#comment-16452436
 ] 

Steve Loughran commented on HBASE-20431:


[~mbenjamin]: I don't want non-AWS stores to need S3Guard, and there's always 
the possibility that AWS S3 may itself become consistent. PUT, COPY or MPU 
should be all that's needed to commit a single file, which is all [~apurtell] 
thinks he needs

> Store commit transaction for filesystems that do not support an atomic rename
> -
>
> Key: HBASE-20431
> URL: https://issues.apache.org/jira/browse/HBASE-20431
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Priority: Major
>
> HBase expects the Hadoop filesystem implementation to support an atomic 
> rename() operation. HDFS does. The S3 backed filesystems do not. The 
> fundamental issue is the non-atomic and eventually consistent nature of the 
> S3 service. A S3 bucket is not a filesystem. S3 is not always immediately 
> read-your-writes. Object metadata can be temporarily inconsistent just after 
> new objects are stored. There can be a settling period to ride over. 
> Renaming/moving objects from one path to another are copy operations with 
> O(file) complexity and O(data) time followed by a series of deletes with 
> O(file) complexity. Failures at any point prior to completion will leave the 
> operation in an inconsistent state. The missing atomic rename semantic opens 
> opportunities for corruption and data loss, which may or may not be 
> repairable with HBCK.
> Handling this at the HBase level could be done with a new multi-step 
> filesystem transaction framework. Call it StoreCommitTransaction. 
> SplitTransaction and MergeTransaction are well established cases where even 
> on HDFS we have non-atomic filesystem changes and are our implementation 
> template for the new work. In this new StoreCommitTransaction we'd be moving 
> flush and compaction temporaries out of the temporary directory into the 
> region store directory. On HDFS the implementation would be easy. We can rely 
> on the filesystem's atomic rename semantics. On S3 it would be work: First we 
> would build the list of objects to move, then copy each object into the 
> destination, and then finally delete all objects at the original path. We 
> must handle transient errors with retry strategies appropriate for the action 
> at hand. We must handle serious or permanent errors where the RS doesn't need 
> to be aborted with a rollback that cleans it all up. Finally, we must handle 
> permanent errors where the RS must be aborted with a rollback during region 
> open/recovery. Note that after all objects have been copied and we are 
> deleting obsolete source objects we must roll forward, not back. To support 
> recovery after an abort we must utilize the WAL to track transaction 
> progress. Put markers in for StoreCommitTransaction start and completion 
> state, with details of the store file(s) involved, so it can be rolled back 
> during region recovery at open. This will be significant work in HFile, 
> HStore, flusher, compactor, and HRegion. Wherever we use HDFS's rename now we 
> would substitute the running of this new multi-step filesystem transaction.
> We need to determine this for certain, but I believe on S3 the PUT or 
> multipart upload of an object must complete before the object is visible, so 
> we don't have to worry about the case where an object is visible before fully 
> uploaded as part of normal operations. So an individual object copy will 
> either happen entirely and the target will then become visible, or it won't 
> and the target won't exist.
> S3 has an optimization, PUT COPY 
> (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html), which 
> the AmazonClient embedded in S3A utilizes for moves. When designing the 
> StoreCommitTransaction be sure to allow for filesystem implementations that 
> leverage a server side copy operation. Doing a get-then-put should be 
> optional. (Not sure Hadoop has an interface that advertises this capability 
> yet; we can add one if not.)





[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename

2018-04-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16452099#comment-16452099
 ] 

Steve Loughran commented on HBASE-20431:


[~mackrorysd] says: 

bq. one could modify S3Guard to prevent a destination directory from being 
visible until it's complete

we can, but it'd restrict the code to requiring a DDB, which would make the WDC 
and Ceph groups sad. I think Andrew could get by without it, if a single file 
is all that's needed for the commit.

> Store commit transaction for filesystems that do not support an atomic rename
> -
>
> Key: HBASE-20431
> URL: https://issues.apache.org/jira/browse/HBASE-20431
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Priority: Major
>
> HBase expects the Hadoop filesystem implementation to support an atomic 
> rename() operation. HDFS does. The S3 backed filesystems do not. The 
> fundamental issue is the non-atomic and eventually consistent nature of the 
> S3 service. A S3 bucket is not a filesystem. S3 is not always immediately 
> read-your-writes. Object metadata can be temporarily inconsistent just after 
> new objects are stored. There can be a settling period to ride over. 
> Renaming/moving objects from one path to another are copy operations with 
> O(file) complexity and O(data) time followed by a series of deletes with 
> O(file) complexity. Failures at any point prior to completion will leave the 
> operation in an inconsistent state. The missing atomic rename semantic opens 
> opportunities for corruption and data loss, which may or may not be 
> repairable with HBCK.
> Handling this at the HBase level could be done with a new multi-step 
> filesystem transaction framework. Call it StoreCommitTransaction. 
> SplitTransaction and MergeTransaction are well established cases where even 
> on HDFS we have non-atomic filesystem changes and are our implementation 
> template for the new work. In this new StoreCommitTransaction we'd be moving 
> flush and compaction temporaries out of the temporary directory into the 
> region store directory. On HDFS the implementation would be easy. We can rely 
> on the filesystem's atomic rename semantics. On S3 it would be work: First we 
> would build the list of objects to move, then copy each object into the 
> destination, and then finally delete all objects at the original path. We 
> must handle transient errors with retry strategies appropriate for the action 
> at hand. We must handle serious or permanent errors where the RS doesn't need 
> to be aborted with a rollback that cleans it all up. Finally, we must handle 
> permanent errors where the RS must be aborted with a rollback during region 
> open/recovery. Note that after all objects have been copied and we are 
> deleting obsolete source objects we must roll forward, not back. To support 
> recovery after an abort we must utilize the WAL to track transaction 
> progress. Put markers in for StoreCommitTransaction start and completion 
> state, with details of the store file(s) involved, so it can be rolled back 
> during region recovery at open. This will be significant work in HFile, 
> HStore, flusher, compactor, and HRegion. Wherever we use HDFS's rename now we 
> would substitute the running of this new multi-step filesystem transaction.
> We need to determine this for certain, but I believe on S3 the PUT or 
> multipart upload of an object must complete before the object is visible, so 
> we don't have to worry about the case where an object is visible before fully 
> uploaded as part of normal operations. So an individual object copy will 
> either happen entirely and the target will then become visible, or it won't 
> and the target won't exist.
> S3 has an optimization, PUT COPY 
> (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html), which 
> the AmazonClient embedded in S3A utilizes for moves. When designing the 
> StoreCommitTransaction be sure to allow for filesystem implementations that 
> leverage a server side copy operation. Doing a get-then-put should be 
> optional. (Not sure Hadoop has an interface that advertises this capability 
> yet; we can add one if not.)





[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename

2018-04-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450384#comment-16450384
 ] 

Steve Loughran commented on HBASE-20431:


S3Guard is only needed when you want consistency on S3A; Amazon have their own 
(the consistent EMRFS), and other people (WDC) sell products which are 
consistent out of the box. If Ceph is consistent, all is good and you don't 
need anything else. Trying to work with an inconsistent S3 is dangerous unless 
you explicitly put long delays in. For example, in a recovery, always wait a 
minute or more before listing.

bq.  in testing I noticed some times we'd get back (paraphrased) "200 Internal 
Error, please retry"

Not seen that; I assume it's handled in the AWS client. We do have retries on some 
throttles and transient errors, especially that final POST of an MPU, but 200 
isn't considered an error code. 503 is the throttle response, I believe (see 
S3AUtils.translateException() for our understanding there).

bq.  We also have in our design scope running against Ceph's radosgw so I don't 
know if we can rely on it totally, but we can take advantage of it if we detect 
we are running against S3 proper.

Raw AWS S3 *absolutely* keeps the output of an MPU invisible until the final 
POST of the ordered list of checksums of the uploaded parts. You get billed for 
all that data, so it's good to have code to list & purge it (the hadoop s3guard 
CLI does); a sketch of the underlying calls follows. Provided the other stores 
you work with have the same MPU visibility semantics, all will be well.
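
For illustration, a minimal sketch (AWS SDK v1; the method name is mine, and 
pagination of the listing is omitted) of what such list-and-purge code does 
under the covers:

{code}
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.ListMultipartUploadsRequest;
import com.amazonaws.services.s3.model.MultipartUpload;

public class MpuPurger {
  // Abort every pending (uncommitted) MPU in the bucket, freeing the
  // billed-but-invisible part data. Listing pagination omitted for brevity.
  static void purgePendingUploads(AmazonS3 s3, String bucket) {
    for (MultipartUpload u :
        s3.listMultipartUploads(new ListMultipartUploadsRequest(bucket))
            .getMultipartUploads()) {
      s3.abortMultipartUpload(
          new AbortMultipartUploadRequest(bucket, u.getKey(), u.getUploadId()));
    }
  }
}
{code}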

Who to ask about Ceph? 

# Maybe [~stevewatt]  has a suggestion? It's good to ask the developers to see 
what they think their system should do...
# [~iyonger] has been testing S3A and Ceph
# And I think now we should make sure there is an explicit test for s3a 
which verifies that uncommitted MPUs aren't visible (a sketch follows below). 
I'm sure that's done implicitly, but having it drawn out into a single method 
is easier to look at when there are failures.
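
Something like this, purely as a sketch: the wiring of the {{s3}} client, 
{{fs}} and {{bucket}} fields is assumed, and this is not an actual test from 
the S3A suite:

{code}
import static org.junit.Assert.assertFalse;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.AbortMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadRequest;
import com.amazonaws.services.s3.model.InitiateMultipartUploadResult;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class TestUncommittedMPUInvisible {
  private AmazonS3 s3;     // assumed: client bound to the test bucket
  private FileSystem fs;   // assumed: the s3a:// filesystem under test
  private String bucket;   // assumed: test bucket name

  @Test
  public void uncommittedMPUMustNotBeVisible() throws Exception {
    String key = "test/uncommitted-mpu";
    // Start a multipart upload but never issue the final completing POST.
    InitiateMultipartUploadResult upload =
        s3.initiateMultipartUpload(new InitiateMultipartUploadRequest(bucket, key));
    try {
      // Until completeMultipartUpload() the object must not exist.
      assertFalse("uncommitted MPU is visible", fs.exists(new Path("/" + key)));
    } finally {
      s3.abortMultipartUpload(
          new AbortMultipartUploadRequest(bucket, key, upload.getUploadId()));
    }
  }
}
{code}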

bq. I would not expect you to volunteer code, no worries! (That would be 
obnoxious... (smile))

Thanks. I'd volunteer Ewan and Thomas but (a) they don't listen to me and (b) 
they're going to do the API you need with a goal of having it work with other 
stores too.

FYI [~fabbri]

> Store commit transaction for filesystems that do not support an atomic rename
> -
>
> Key: HBASE-20431
> URL: https://issues.apache.org/jira/browse/HBASE-20431
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Priority: Major
>
> HBase expects the Hadoop filesystem implementation to support an atomic 
> rename() operation. HDFS does. The S3 backed filesystems do not. The 
> fundamental issue is the non-atomic and eventually consistent nature of the 
> S3 service. A S3 bucket is not a filesystem. S3 is not always immediately 
> read-your-writes. Object metadata can be temporarily inconsistent just after 
> new objects are stored. There can be a settling period to ride over. 
> Renaming/moving objects from one path to another are copy operations with 
> O(file) complexity and O(data) time followed by a series of deletes with 
> O(file) complexity. Failures at any point prior to completion will leave the 
> operation in an inconsistent state. The missing atomic rename semantic opens 
> opportunities for corruption and data loss, which may or may not be 
> repairable with HBCK.
> Handling this at the HBase level could be done with a new multi-step 
> filesystem transaction framework. Call it StoreCommitTransaction. 
> SplitTransaction and MergeTransaction are well established cases where even 
> on HDFS we have non-atomic filesystem changes and are our implementation 
> template for the new work. In this new StoreCommitTransaction we'd be moving 
> flush and compaction temporaries out of the temporary directory into the 
> region store directory. On HDFS the implementation would be easy. We can rely 
> on the filesystem's atomic rename semantics. On S3 it would be work: First we 
> would build the list of objects to move, then copy each object into the 
> destination, and then finally delete all objects at the original path. We 
> must handle transient errors with retry strategies appropriate for the action 
> at hand. We must handle serious or permanent errors where the RS doesn't need 
> to be aborted with a rollback that cleans it all up. Finally, we must handle 
> permanent errors where the RS must be aborted with a rollback during region 
> open/recovery. Note that after all objects have been copied and we are 
> deleting obsolete source objects we must roll forward, not back. To support 
> recovery after an abort we must utilize the WAL to track transaction 
> progress. Put markers in for StoreCommitTransaction start and completion 
> state, with details of 

[jira] [Commented] (HBASE-20431) Store commit transaction for filesystems that do not support an atomic rename

2018-04-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450053#comment-16450053
 ] 

Steve Loughran commented on HBASE-20431:


* you are correct, neither PUT nor multipart upload ("MPU") has any visible 
outcome until they are complete. MPUs can be completed in a POST from a 
different host than that/those uploading the blocks, which is how we implement 
the S3A committers. Talk to [~ehiggs]  & [~Thomas Demoor] about their ideas for 
making that public. If you could use a single MPU to commit the final output, 
you get a nice O(1) atomic operation.
* PUT-COPY is atomic, but it's a 6-10MB/s atomic operation; it's essentially 
what you get when you rename() a single file, though there we DELETE the source 
afterwards. We could expose it for S3 & the other stores which offer a similar 
operation. One thought to consider: although it's O(data), the client-side 
bandwidth is ~0, so you can do most of the copies in parallel (see the sketch 
after this list).
* You aren't worrying about S3 consistency here. For AWS S3, life is easier if 
you mandate using S3Guard for the consistency layer. Otherwise, you can turn on 
fault injection in the S3A connector and see what breaks...
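
A sketch of that parallel copy-then-delete pattern (AWS SDK v1; the pool size, 
method name and the handling of the 1000-key bulk-delete limit are my 
assumptions, not S3A internals):

{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.model.DeleteObjectsRequest;

public class ParallelCopyRename {
  // Copy every object under srcPrefix to dstPrefix in parallel, then
  // bulk-delete the sources. Batching above 1000 keys is omitted.
  static void renameByCopy(AmazonS3 s3, String bucket, List<String> srcKeys,
      String srcPrefix, String dstPrefix) throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(16);
    try {
      List<Future<?>> copies = new ArrayList<>();
      for (String key : srcKeys) {
        String dst = dstPrefix + key.substring(srcPrefix.length());
        // Server-side PUT-COPY: O(data) inside S3, ~0 bandwidth at the client.
        copies.add(pool.submit(() -> s3.copyObject(bucket, key, bucket, dst)));
      }
      for (Future<?> f : copies) {
        f.get(); // surface any copy failure before any source is deleted
      }
      s3.deleteObjects(new DeleteObjectsRequest(bucket)
          .withKeys(srcKeys.toArray(new String[0])));
    } finally {
      pool.shutdown();
    }
  }
}
{code}

Note the ordering: every copy must succeed before the first source delete goes 
out, because once deletes start the only safe direction is forward.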

Looking forward to see what you do here, offering some consultancy on design 
and test strategies, carefully not volunteering to provide any code...

> Store commit transaction for filesystems that do not support an atomic rename
> -
>
> Key: HBASE-20431
> URL: https://issues.apache.org/jira/browse/HBASE-20431
> Project: HBase
>  Issue Type: Sub-task
>Reporter: Andrew Purtell
>Priority: Major
>
> HBase expects the Hadoop filesystem implementation to support an atomic 
> rename() operation. HDFS does. The S3 backed filesystems do not. The 
> fundamental issue is the non-atomic and eventually consistent nature of the 
> S3 service. A S3 bucket is not a filesystem. S3 is not always immediately 
> read-your-writes. Object metadata can be temporarily inconsistent just after 
> new objects are stored. There can be a settling period to ride over. 
> Renaming/moving objects from one path to another are copy operations with 
> O(file) complexity and O(data) time followed by a series of deletes with 
> O(file) complexity. Failures at any point prior to completion will leave the 
> operation in an inconsistent state. The missing atomic rename semantic opens 
> opportunities for corruption and data loss, which may or may not be 
> repairable with HBCK.
> Handling this at the HBase level could be done with a new multi-step 
> filesystem transaction framework. Call it StoreCommitTransaction. 
> SplitTransaction and MergeTransaction are well established cases where even 
> on HDFS we have non-atomic filesystem changes and are our implementation 
> template for the new work. In this new StoreCommitTransaction we'd be moving 
> flush and compaction temporaries out of the temporary directory into the 
> region store directory. On HDFS the implementation would be easy. We can rely 
> on the filesystem's atomic rename semantics. On S3 it would be work: First we 
> would build the list of objects to move, then copy each object into the 
> destination, and then finally delete all objects at the original path. We 
> must handle transient errors with retry strategies appropriate for the action 
> at hand. We must handle serious or permanent errors where the RS doesn't need 
> to be aborted with a rollback that cleans it all up. Finally, we must handle 
> permanent errors where the RS must be aborted with a rollback during region 
> open/recovery. Note that after all objects have been copied and we are 
> deleting obsolete source objects we must roll forward, not back. To support 
> recovery after an abort we must utilize the WAL to track transaction 
> progress. Put markers in for StoreCommitTransaction start and completion 
> state, with details of the store file(s) involved, so it can be rolled back 
> during region recovery at open. This will be significant work in HFile, 
> HStore, flusher, compactor, and HRegion. Wherever we use HDFS's rename now we 
> would substitute the running of this new multi-step filesystem transaction.
> We need to determine this for certain, but I believe on S3 the PUT or 
> multipart upload of an object must complete before the object is visible, so 
> we don't have to worry about the case where an object is visible before fully 
> uploaded as part of normal operations. So an individual object copy will 
> either happen entirely and the target will then become visible, or it won't 
> and the target won't exist.
> S3 has an optimization, PUT COPY 
> (https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectCOPY.html), which 
> the AmazonClient embedded in S3A utilizes for moves. When designing the 
> StoreCommitTransaction 

[jira] [Commented] (HBASE-20226) Performance Improvement Taking Large Snapshots In Remote Filesystems

2018-03-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16411165#comment-16411165
 ] 

Steve Loughran commented on HBASE-20226:


Amazon throttle DELETEs to the same shard, so the speedup will be sublinear, even 
though the cost of a delete/bulk delete is low in terms of network traffic. 

If you are doing bulk deletes in more than one thread, it's probably best to do 
a bit of shuffling of the list of directories to delete before queuing the 
operations; a minimal sketch follows.
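
Something like this, as a sketch only (the manifest paths and pool size are 
assumed to be handed in; this is not the attached patch):

{code}
import java.io.IOException;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ManifestCleaner {
  // Shuffle first so neighbouring keys (which tend to share a shard) don't
  // all arrive at once, then delete with a bounded pool.
  static void deleteManifests(FileSystem fs, List<Path> manifests, int poolSize)
      throws InterruptedException {
    Collections.shuffle(manifests);
    ExecutorService pool = Executors.newFixedThreadPool(poolSize);
    for (Path p : manifests) {
      pool.submit(() -> {
        try {
          fs.delete(p, true);
        } catch (IOException e) {
          // log and continue: a leftover manifest is only redundant data
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(10, TimeUnit.MINUTES);
  }
}
{code}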


> Performance Improvement Taking Large Snapshots In Remote Filesystems
> 
>
> Key: HBASE-20226
> URL: https://issues.apache.org/jira/browse/HBASE-20226
> Project: HBase
>  Issue Type: Improvement
>  Components: snapshots
>Affects Versions: 1.4.0
> Environment: HBase 1.4.0 running on an AWS EMR cluster with the 
> hbase.rootdir set to point to a folder in S3 
>Reporter: Saad Mufti
>Priority: Minor
> Attachments: HBASE-20226..01.patch
>
>
> When taking a snapshot of any table, one of the last steps is to delete the 
> region manifests, which have already been rolled up into a larger overall 
> manifest and thus have redundant information.
> This proposal is to do the deletion in a thread pool bounded by 
> hbase.snapshot.thread.pool.max . For large tables with a lot of regions, the 
> current single threaded deletion is taking longer than all the rest of the 
> snapshot tasks when the Hbase data and the snapshot folder are both in a 
> remote filesystem like S3.
> I have a patch for this proposal almost ready and will submit it tomorrow for 
> feedback, although I haven't had a chance to write any tests yet.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-20123) Backup test fails against hadoop 3

2018-03-05 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-20123?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16386757#comment-16386757
 ] 

Steve Loughran commented on HBASE-20123:


That looks like a branch-2 stack trace; HADOOP-13626 changed 
CopyListingFileStatus to not be a subclass of FileStatus, instead explicitly 
marshalling the permissions.

At the same time, that getSymlink() in readFields() call is a branch-3 
operation; it's in an assert at the end
{code}
assert (isDirectory() && getSymlink() == null) || !isDirectory();
{code}

I believe that assertion is wrong. It assumes that getSymlink() returns null 
if there is no symlink, but instead it raises an exception.

And as it's an assert(), it's only going to show up in JVMs with assertions 
turned on.

I'd suggest that someone (you?) files a JIRA against Hadoop with a patch that 
changes the assertion to something like 

{code}
assert !(isDirectory() && isSymlink());
{code}

that is, you can't be both a dir and a symlink.




> Backup test fails against hadoop 3
> --
>
> Key: HBASE-20123
> URL: https://issues.apache.org/jira/browse/HBASE-20123
> Project: HBase
>  Issue Type: Bug
>Reporter: Ted Yu
>Priority: Major
>
> When running backup unit test against hadoop3, I saw:
> {code}
> [ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 
> 88.862 s <<< FAILURE! - in 
> org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes
> [ERROR] 
> testBackupMultipleDeletes(org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes)
>   Time elapsed: 86.206 s  <<< ERROR!
> java.io.IOException: java.io.IOException: Failed copy from 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to 
> hdfs://localhost:40578/backupUT
>   at 
> org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82)
> Caused by: java.io.IOException: Failed copy from 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 to 
> hdfs://localhost:40578/backupUT
>   at 
> org.apache.hadoop.hbase.backup.TestBackupMultipleDeletes.testBackupMultipleDeletes(TestBackupMultipleDeletes.java:82)
> {code}
> In the test output, I found:
> {code}
> 2018-03-03 14:46:10,858 ERROR [Time-limited test] 
> mapreduce.MapReduceBackupCopyJob$BackupDistCp(237): java.io.IOException: Path 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic 
> link
> java.io.IOException: Path 
> hdfs://localhost:40578/backupUT/.tmp/backup_1520088356047 is not a symbolic 
> link
>   at org.apache.hadoop.fs.FileStatus.getSymlink(FileStatus.java:338)
>   at org.apache.hadoop.fs.FileStatus.readFields(FileStatus.java:461)
>   at 
> org.apache.hadoop.tools.CopyListingFileStatus.readFields(CopyListingFileStatus.java:155)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.getCurrentValue(SequenceFile.java:2308)
>   at 
> org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:163)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:91)
>   at 
> org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
>   at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:84)
>   at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:382)
>   at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.createInputFileListing(MapReduceBackupCopyJob.java:297)
>   at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:181)
>   at org.apache.hadoop.tools.DistCp.execute(DistCp.java:153)
>   at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob$BackupDistCp.execute(MapReduceBackupCopyJob.java:196)
>   at org.apache.hadoop.tools.DistCp.run(DistCp.java:126)
>   at 
> org.apache.hadoop.hbase.backup.mapreduce.MapReduceBackupCopyJob.copy(MapReduceBackupCopyJob.java:408)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.incrementalCopyHFiles(IncrementalTableBackupClient.java:348)
>   at 
> org.apache.hadoop.hbase.backup.impl.IncrementalTableBackupClient.execute(IncrementalTableBackupClient.java:290)
>   at 
> org.apache.hadoop.hbase.backup.impl.BackupAdminImpl.backupTables(BackupAdminImpl.java:605)
> {code}
> It seems the failure was related to how we use distcp.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-7608) Considering Java 8

2018-02-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7608?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16361341#comment-16361341
 ] 

Steve Loughran commented on HBASE-7608:
---

Is it time to close this as done/worksforme?

> Considering Java 8
> --
>
> Key: HBASE-7608
> URL: https://issues.apache.org/jira/browse/HBASE-7608
> Project: HBase
>  Issue Type: Umbrella
>Reporter: Andrew Purtell
>Priority: Trivial
>
> Musings (as subtasks) on experimental ideas for when JRE8 is a viable runtime.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1

2017-12-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16283472#comment-16283472
 ] 

Steve Loughran commented on HBASE-19289:


What about giving the property some name to make clear it's experimental/risky? 
"hbase.experimental.stream.capability.enforce.disabled"

Then if people set it, well, "told you so"


> CommonFSUtils$StreamLacksCapabilityException: hflush when running test 
> against hadoop3 beta1
> 
>
> Key: HBASE-19289
> URL: https://issues.apache.org/jira/browse/HBASE-19289
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Mike Drob
> Attachments: 19289.v1.txt, 19289.v2.txt, HBASE-19289.patch, 
> HBASE-19289.v2.patch
>
>
> As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the 
> following exception when running unit test against hadoop3 beta1:
> {code}
> testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore)  Time 
> elapsed: 0.061 sec  <<< ERROR!
> java.io.IOException: cannot get log writer
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1

2017-12-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16280708#comment-16280708
 ] 

Steve Loughran commented on HBASE-19289:


looking at the patch

* it can take time to start and stop the server, good to see you making this a 
class rule.
* failures in stop should be caught & logged, in case raising them hides 
the underlying exception triggering a test failure; a sketch of the pattern 
follows. (I don't know enough about custom rules here to know for sure, just 
based on test teardown method experience)
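
Roughly this, as a sketch against JUnit's {{ExternalResource}} rule; 
{{MiniServer}} is a hypothetical stand-in for whatever the rule starts, not a 
class from the patch:

{code}
import org.junit.rules.ExternalResource;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ServerRule extends ExternalResource {
  private static final Logger LOG = LoggerFactory.getLogger(ServerRule.class);
  private MiniServer server; // hypothetical handle for whatever is started

  @Override
  protected void before() throws Throwable {
    server = new MiniServer();
    server.start();
  }

  @Override
  protected void after() {
    try {
      server.stop();
    } catch (Exception e) {
      // Log, don't rethrow: a failure in stop() must never mask the
      // exception that actually failed the test.
      LOG.warn("failed to stop server", e);
    }
  }
}
{code}

Declared as a {{@ClassRule}}, before()/after() then run once per class rather 
than per test.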

> CommonFSUtils$StreamLacksCapabilityException: hflush when running test 
> against hadoop3 beta1
> 
>
> Key: HBASE-19289
> URL: https://issues.apache.org/jira/browse/HBASE-19289
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
>Assignee: Mike Drob
> Attachments: 19289.v1.txt, 19289.v2.txt, HBASE-19289.patch
>
>
> As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the 
> following exception when running unit test against hadoop3 beta1:
> {code}
> testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore)  Time 
> elapsed: 0.061 sec  <<< ERROR!
> java.io.IOException: cannot get log writer
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1

2017-11-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261454#comment-16261454
 ] 

Steve Loughran commented on HBASE-19289:


ps: Are people using HBase against file:// today? If so, they've not been 
getting the persistence/durability HBase needs. Tell them to stop it.



> CommonFSUtils$StreamLacksCapabilityException: hflush when running test 
> against hadoop3 beta1
> 
>
> Key: HBASE-19289
> URL: https://issues.apache.org/jira/browse/HBASE-19289
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
> Attachments: 19289.v1.txt, 19289.v2.txt
>
>
> As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the 
> following exception when running unit test against hadoop3 beta1:
> {code}
> testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore)  Time 
> elapsed: 0.061 sec  <<< ERROR!
> java.io.IOException: cannot get log writer
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1

2017-11-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261449#comment-16261449
 ] 

Steve Loughran commented on HBASE-19289:


If people really want hbase -> file://

* they'd need a distributed file:// or some shared NFS server
* it'd presumably need its own RAID > 0 to do checksumming; so checksum fs is 
moot

I'd look at seeing whether checksumfs could actually bypass its checksum, say 
if we set the property "bytes per checksum" to 0 as the secret "no, turn me off" 
switch. But people would probably then use it for performance and then be upset 
when all their data got corrupted without anything noticing. It's too critical 
a layer under HDFS really.

I was thinking about what if we added a raw:// URL which bonded directly to the raw 
local fs, but RawLocalFileSystem has an expectation that file:// is its scheme 
and returns it in getURI(), forcing you back to ChecksumFileSystem

I believe the way to do this (sketched below) is
* subclass RawLocalFileSystem
* give it a new scheme, like say "raw"
* have it remember its URI in initialize() and return it in getURI()
* register it (statically, dynamically)
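
A sketch of those steps (the {{RawFs}} name and the {{raw:///}} URI are 
placeholders; note the actual Hadoop method is spelled {{getUri()}}):

{code}
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.RawLocalFileSystem;

public class RawFs extends RawLocalFileSystem {
  private URI uri;

  @Override
  public String getScheme() {
    return "raw";
  }

  @Override
  public void initialize(URI name, Configuration conf) throws IOException {
    super.initialize(name, conf);
    this.uri = URI.create("raw:///"); // remember our own URI...
  }

  @Override
  public URI getUri() {
    return uri; // ...and hand it back, instead of file://
  }
}
{code}

Registration would then be the usual {{fs.<scheme>.impl}} mechanism: statically 
in core-site.xml, or dynamically with 
{{conf.set("fs.raw.impl", RawFs.class.getName())}}.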


> CommonFSUtils$StreamLacksCapabilityException: hflush when running test 
> against hadoop3 beta1
> 
>
> Key: HBASE-19289
> URL: https://issues.apache.org/jira/browse/HBASE-19289
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
> Attachments: 19289.v1.txt, 19289.v2.txt
>
>
> As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the 
> following exception when running unit test against hadoop3 beta1:
> {code}
> testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore)  Time 
> elapsed: 0.061 sec  <<< ERROR!
> java.io.IOException: cannot get log writer
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-19289) CommonFSUtils$StreamLacksCapabilityException: hflush when running test against hadoop3 beta1

2017-11-19 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-19289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16258446#comment-16258446
 ] 

Steve Loughran commented on HBASE-19289:


Closed HADOOP-15051 as wontfix. LocalFS output streams don't declare their 
support for hflush/hsync for the following reason, as covered in HADOOP-13327 
(outstanding, reviews welcome):

h3. Output streams which do not implement the flush/persistence semantics of 
hflush/hsync MUST NOT declare that their streams have that capability.

LocalFileSystem is a subclass of ChecksumFileSystem; ChecksumFileSystem output 
streams don't implement hflush/hsync, therefore it's the correct behaviour in 
the Hadoop code.

If HBase requires the methods for the correct persistence of its data, then it 
cannot safely use localFS as the destination of its output. Its check is therefore 
also the correct behaviour.

In which case, "expressly tell folks not to run HBase on top of 
LocalFileSystem" is the correct action on your part. People must not be using 
the local FS as a direct destination of HBase output.

> CommonFSUtils$StreamLacksCapabilityException: hflush when running test 
> against hadoop3 beta1
> 
>
> Key: HBASE-19289
> URL: https://issues.apache.org/jira/browse/HBASE-19289
> Project: HBase
>  Issue Type: Test
>Reporter: Ted Yu
> Attachments: 19289.v1.txt
>
>
> As of commit d8fb10c8329b19223c91d3cda6ef149382ad4ea0 , I encountered the 
> following exception when running unit test against hadoop3 beta1:
> {code}
> testRefreshStoreFiles(org.apache.hadoop.hbase.regionserver.TestHStore)  Time 
> elapsed: 0.061 sec  <<< ERROR!
> java.io.IOException: cannot get log writer
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> Caused by: 
> org.apache.hadoop.hbase.util.CommonFSUtils$StreamLacksCapabilityException: 
> hflush
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.initHRegion(TestHStore.java:215)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:220)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:195)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:190)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:185)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:179)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.init(TestHStore.java:173)
>   at 
> org.apache.hadoop.hbase.regionserver.TestHStore.testRefreshStoreFiles(TestHStore.java:962)
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-18784) Use of filesystem that requires hflush / hsync / append / etc should query outputstream capabilities

2017-09-20 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-18784?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173353#comment-16173353
 ] 

Steve Loughran commented on HBASE-18784:


{{StreamCapabilities}} is in Hadoop 2.9, so you can start planning for it earlier. 
Also: I've been playing with using it for input stream capabilities too 
(CanUnbuffer, seek(), etc.); a minimal probe sketch for the output side follows.
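
A minimal probe sketch, assuming the "hflush" capability string from 
HDFS-11644 and (conservatively) treating a stream that can't be probed at all 
as unsafe:

{code}
import java.io.IOException;

import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.StreamCapabilities;

public class CapabilityCheck {
  // Fail fast if the stream can't promise hflush; "hflush" is the
  // capability name defined by HDFS-11644.
  static void ensureHflush(FSDataOutputStream out) throws IOException {
    if (!(out instanceof StreamCapabilities)
        || !((StreamCapabilities) out).hasCapability("hflush")) {
      throw new IOException("output stream lacks hflush; unsafe for the WAL");
    }
  }
}
{code}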

> Use of filesystem that requires hflush / hsync / append / etc should query 
> outputstream capabilities
> 
>
> Key: HBASE-18784
> URL: https://issues.apache.org/jira/browse/HBASE-18784
> Project: HBase
>  Issue Type: Improvement
>  Components: Filesystem Integration
>Affects Versions: 1.4.0, 2.0.0-alpha-2
>Reporter: Sean Busbey
>Assignee: Sean Busbey
>Priority: Blocker
> Fix For: 2.1.0, 1.5.0
>
>
> In places where we rely on the underlying filesystem holding up the promises 
> of hflush/hsync (most importantly the WAL), we should use the new interfaces 
> provided by HDFS-11644 to fail loudly when they are not present (e.g. on S3, 
> on EC mounts, etc).



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17125) Inconsistent result when use filter to read data

2017-06-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16057556#comment-16057556
 ] 

Steve Loughran commented on HBASE-17125:


bq. Then why don't you implement it by yourself? If you think it is easy, then 
please implement it.

Generally, as a committer it's actually more productive to nurture other 
developers into working towards what you believe to be the right answer than to do 
it yourself. As well as sharing some of your unrealistic set of deliverables 
with others, you can be the reviewer who gets the stuff in, instead of having a 
patch you have to chase other people to review. Long term: the more people you 
can get to collaborate, the better for the project all round.

No opinions on the patch, just making sure everyone works together on this. 
Thanks.




> Inconsistent result when use filter to read data
> 
>
> Key: HBASE-17125
> URL: https://issues.apache.org/jira/browse/HBASE-17125
> Project: HBase
>  Issue Type: Bug
>Reporter: Guanghao Zhang
>Assignee: Guanghao Zhang
>Priority: Critical
> Fix For: 2.0.0
>
> Attachments: example.diff, HBASE-17125.master.001.patch, 
> HBASE-17125.master.002.patch, HBASE-17125.master.002.patch, 
> HBASE-17125.master.003.patch, HBASE-17125.master.004.patch, 
> HBASE-17125.master.005.patch, HBASE-17125.master.006.patch, 
> HBASE-17125.master.007.patch, HBASE-17125.master.008.patch, 
> HBASE-17125.master.009.patch, HBASE-17125.master.009.patch, 
> HBASE-17125.master.010.patch, HBASE-17125.master.011.patch, 
> HBASE-17125.master.011.patch, HBASE-17125.master.no-specified-filter.patch
>
>
> Assume a column's max versions is 3, then we write 4 versions of this column. 
> The oldest version isn't removed immediately. But from the user's view, the 
> oldest version has gone. When the user queries with a filter, if the filter skips 
> a new version, then the oldest version will be seen again. But after compacting 
> the region, the oldest version will never be seen. So it is weird for the 
> user. The query will get inconsistent results before and after region 
> compaction.
> The reason is the matchColumn method of UserScanQueryMatcher. It first checks the 
> cell by filter, then checks the number of versions needed. So if the filter 
> skips the new version, then the oldest version will be seen again when it is 
> not yet removed.
> After a discussion offline with [~Apache9] and [~fenghh], we now have two 
> solutions for this problem. The first idea is to check the number of versions 
> first, then check the cell by filter. As the javadoc of setFilter says, the filter 
> is called after all tests for ttl, column match, deletes and max versions 
> have been run.
> {code}
>   /**
>* Apply the specified server-side filter when performing the Query.
>* Only {@link Filter#filterKeyValue(Cell)} is called AFTER all tests
>* for ttl, column match, deletes and max versions have been run.
>* @param filter filter to run on the server
>* @return this for invocation chaining
>*/
>   public Query setFilter(Filter filter) {
> this.filter = filter;
> return this;
>   }
> {code}
> But this idea has another problem: if a column's max versions is 5 and the 
> user query only needs 3 versions, it first checks the version count, then 
> checks the cell by filter. So the number of cells in the result may be less 
> than 3, even though there are 2 more versions which were never read.
> So the second idea has three steps:
> 1. check by the max versions of this column
> 2. check the kv by filter
> 3. check the versions which the user needs.
> But this will make the ScanQueryMatcher more complicated. And this will break 
> the javadoc of Query.setFilter.
> Now we don't have a final solution for this problem. Suggestions are welcome.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-17878) java.lang.NoSuchMethodError: org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter when starting HBase with hbase.rootdir on S3

2017-05-02 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993396#comment-15993396
 ] 

Steve Loughran commented on HBASE-17878:


The AWS SDK uses Joda Time to generate authentication strings; due to changes in 
JVMs, it needs a version >= 2.8.1, but should be fairly forgiving as to which 
version after that.

Hadoop trunk has switched to a fully-shaded version of the AWS SDK, more to 
deal with Jackson versions than with Joda Time, but again, it may work here. 
That is still stabilising: adding 50MB of .class files has its own unexpected 
consequences

> java.lang.NoSuchMethodError: 
> org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter
>  when starting HBase with hbase.rootdir on S3
> -
>
> Key: HBASE-17878
> URL: https://issues.apache.org/jira/browse/HBASE-17878
> Project: HBase
>  Issue Type: Bug
>  Components: master
>Reporter: Xiang Li
>Assignee: Xiang Li
>Priority: Minor
> Fix For: 2.0.0
>
> Attachments: HBASE-17878.master.000.patch, jruby-core-dep-tree.txt
>
>
> When setting up HBASE-17437 (Support specifying a WAL directory outside of 
> the root directory), we specify
> (1) hbase.rootdir on s3a
> (2) hbase.wal.dir on HDFS
> When starting HBase, the following exception is thrown:
> {code}
> Caused by: java.lang.NoSuchMethodError: 
> org.joda.time.format.DateTimeFormatter.withZoneUTC()Lorg/joda/time/format/DateTimeFormatter;
> at 
> com.amazonaws.auth.internal.AWS4SignerUtils.(AWS4SignerUtils.java:26)
> at 
> com.amazonaws.auth.internal.AWS4SignerRequestParams.(AWS4SignerRequestParams.java:85)
> at com.amazonaws.auth.AWS4Signer.sign(AWS4Signer.java:184)
> at 
> com.amazonaws.http.AmazonHttpClient.executeOneRequest(AmazonHttpClient.java:709)
> at 
> com.amazonaws.http.AmazonHttpClient.executeHelper(AmazonHttpClient.java:489)
> at 
> com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:310)
> at 
> com.amazonaws.services.s3.AmazonS3Client.invoke(AmazonS3Client.java:3785)
> at 
> com.amazonaws.services.s3.AmazonS3Client.headBucket(AmazonS3Client.java:1107)
> at 
> com.amazonaws.services.s3.AmazonS3Client.doesBucketExist(AmazonS3Client.java:1070)
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:232)
> at 
> org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2669)
> at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:94)
> at 
> org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2703)
> at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2685)
> at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:373)
> at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
> at org.apache.hadoop.hbase.util.FSUtils.getRootDir(FSUtils.java:1007)
> at 
> org.apache.hadoop.hbase.util.FSUtils.isValidWALRootDir(FSUtils.java:1050)
> at 
> org.apache.hadoop.hbase.util.FSUtils.getWALRootDir(FSUtils.java:1032)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.initializeFileSystem(HRegionServer.java:627)
> at 
> org.apache.hadoop.hbase.regionserver.HRegionServer.(HRegionServer.java:570)
> at org.apache.hadoop.hbase.master.HMaster.(HMaster.java:393)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
> at 
> org.apache.hadoop.hbase.master.HMaster.constructMaster(HMaster.java:2456)
> ... 5 more
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-17877) Replace/improve HBase's byte[] comparator

2017-04-18 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-17877?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15973578#comment-15973578
 ] 

Steve Loughran commented on HBASE-17877:


bq. Also while looking at some UnsignedBytes function for the guava version we 
were using I noticed that guava v14.0.1 also uses this implementation so 
probably hadoop borrowed it from there ..

I was about to say "no it doesn't", but then I saw the comment at the top, 
"This is borrowed and slightly modified from Guava's". It's been in there a long 
time though (since 2012), so its history is lost. x86 perf is what really 
matters, though we don't want to be pathologically antisocial to the other CPU 
arches, not just for the sake of PPC, but for when Arm goes mainstream in the 
DC.



> Replace/improve HBase's byte[] comparator
> -
>
> Key: HBASE-17877
> URL: https://issues.apache.org/jira/browse/HBASE-17877
> Project: HBase
>  Issue Type: Bug
>Reporter: Lars Hofhansl
>Assignee: Vikas Vishwakarma
> Attachments: 17877-1.2.patch, 17877-v2-1.3.patch, 17877-v3-1.3.patch, 
> 17877-v4-1.3.patch, ByteComparatorJiraHBASE-17877.pdf, 
> HBASE-17877.branch-1.3.001.patch, HBASE-17877.branch-1.3.002.patch, 
> HBASE-17877.master.001.patch, HBASE-17877.master.002.patch
>
>
> [~vik.karma] did some extensive tests and found that Hadoop's version is 
> faster - dramatically faster in some cases.
> Patch forthcoming.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive

2016-05-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299781#comment-15299781
 ] 

Steve Loughran commented on HBASE-11045:


BTW, given that HDFS has un-deprecated {{createNonRecursive()}}, what about 
closing this as a WONTFIX?

> Replace deprecated method FileSystem#createNonRecursive
> ---
>
> Key: HBASE-11045
> URL: https://issues.apache.org/jira/browse/HBASE-11045
> Project: HBase
>  Issue Type: Task
>Reporter: Gustavo Anatoly
>Assignee: Gustavo Anatoly
>Priority: Minor
> Fix For: 2.0.0
>
>
> This change affect directly ProtobufLogWriter#init() associated to 
> TestHLog#testFailedToCreateHLogIfParentRenamed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive

2016-05-25 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299778#comment-15299778
 ] 

Steve Loughran commented on HBASE-11045:


you *cannot* use Swift instead of HDFS. It isn't a real filesystem and things 
will fail dramatically, even if this method was implemented; there are too many 
other differences. The fact that your attempt is failing this early on, while 
frustrating, stops you getting deeper into trouble. Sorry. Note that you can't 
use S3 either, same problem.


see: [Object stores vs 
filesystems](http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/filesystem/introduction.html).


> Replace deprecated method FileSystem#createNonRecursive
> ---
>
> Key: HBASE-11045
> URL: https://issues.apache.org/jira/browse/HBASE-11045
> Project: HBase
>  Issue Type: Task
>Reporter: Gustavo Anatoly
>Assignee: Gustavo Anatoly
>Priority: Minor
> Fix For: 2.0.0
>
>
> This change affect directly ProtobufLogWriter#init() associated to 
> TestHLog#testFailedToCreateHLogIfParentRenamed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14648186#comment-14648186
 ] 

Steve Loughran commented on HBASE-13992:


Coverage is an odd metric anyway, because as well as code there's state 
coverage: IPv6, Windows, timezone=GMT0, locale=turkish, all of which can break 
things even in code which nominally had 100%. Having tests which generate 
failure conditions (done here) with test setups that explore the configuration 
space is about the best you can get.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-28 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14644580#comment-14644580
 ] 

Steve Loughran commented on HBASE-13992:


LGTM : I only worry about testability, and that's a good start. More tests will 
no doubt come over time ... something in Bigtop would be good for the 
integration

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.12.patch, HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14643322#comment-14643322
 ] 

Steve Loughran commented on HBASE-13992:


New tests look good.

# it's probably best to put artifact versions in the root pom.xml, not the spark 
one, so there's a single place for dependency versions ... this will matter if 
more than one Scala module goes in.

# {{HBaseDStreamFunctionsSuite.scala}} has the wrong assumption. 
{code}
assert(foo5.equals(bar), foo4 + " != " + bar)
{code}

Scalatest lets you use assertResult instead, for an auto-generated message

{code}
assertResult(bar) { foo5 } 
{code}

And you can use {{==}} for a slightly less informative error message, but one 
which still includes the values on either side




 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.10.patch, HBASE-13992.11.patch, 
 HBASE-13992.5.patch, HBASE-13992.6.patch, HBASE-13992.7.patch, 
 HBASE-13992.8.patch, HBASE-13992.9.patch, HBASE-13992.patch, 
 HBASE-13992.patch.3, HBASE-13992.patch.4, HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639697#comment-14639697
 ] 

Steve Loughran commented on HBASE-13992:


Were a project I was a committer on, I'd be mandating the failure tests, as 
they are the tests most likely to break things. As I'm not an HBase committer, 
I will leave the opinions to others. At the very least, there needs to be a 
followup JIRA for the extra tests.

As ted notes, they should just throw the standard exceptions.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13992) Integrate SparkOnHBase into HBase

2015-07-23 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14639357#comment-14639357
 ] 

Steve Loughran commented on HBASE-13992:


There's not much in the way of tests here, in particular, not much in the way 
of generation of failure conditions and validation of outcome

Ideally, there'd be one test to generate each failure condition: the exception 
handling including those which downgrade a failure to a log message...the test 
should verify that such actions are the correct response.

At the very least, I'd recommend

# test against non-existent database
# attempt to work with a table that doesn't exist
# attempt to read a column that doesn't exist


I'd also make sure test teardown is robust, catching exceptions  downgrading 
to logs. That way, if something didn't get set up properly, the root cause of 
the failure isn't hidden by any exception generated in teardown.

 Integrate SparkOnHBase into HBase
 -

 Key: HBASE-13992
 URL: https://issues.apache.org/jira/browse/HBASE-13992
 Project: HBase
  Issue Type: New Feature
  Components: spark
Reporter: Ted Malaska
Assignee: Ted Malaska
 Fix For: 2.0.0

 Attachments: HBASE-13992.5.patch, HBASE-13992.6.patch, 
 HBASE-13992.7.patch, HBASE-13992.8.patch, HBASE-13992.9.patch, 
 HBASE-13992.patch, HBASE-13992.patch.3, HBASE-13992.patch.4, 
 HBASE-13992.patch.5


 This Jira is to ask if SparkOnHBase can find a home in side HBase core.
 Here is the github: 
 https://github.com/cloudera-labs/SparkOnHBase
 I am the core author of this project and the license is Apache 2.0
 A blog explaining this project is here
 http://blog.cloudera.com/blog/2014/12/new-in-cloudera-labs-sparkonhbase/
 A spark Streaming example is here
 http://blog.cloudera.com/blog/2014/11/how-to-do-near-real-time-sessionization-with-spark-streaming-and-apache-hadoop/
 A real customer using this in produce is blogged here
 http://blog.cloudera.com/blog/2015/03/how-edmunds-com-used-spark-streaming-to-build-a-near-real-time-dashboard/
 Please debate and let me know what I can do to make this happen.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12006) [JDK 8] KeyStoreTestUtil#generateCertificate fails due to subject class type invalid

2014-09-24 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12006?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14146231#comment-14146231
 ] 

Steve Loughran commented on HBASE-12006:


{{sun.security.x509}} will go away in Java 9. This means the test utils may 
need some more work. 

 [JDK 8] KeyStoreTestUtil#generateCertificate fails due to subject class type 
 invalid
 --

 Key: HBASE-12006
 URL: https://issues.apache.org/jira/browse/HBASE-12006
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.99.0, 2.0.0
Reporter: Andrew Purtell
Assignee: Andrew Purtell
Priority: Minor

 Running tests on Java 8. All unit tests for branch 0.98 pass. On master 
 branch some variation in the security API is causing a failure in 
 TestSSLHttpServer:
 {noformat}
 Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.181 sec  
 FAILURE! - in org.apache.hadoop.hbase.http.TestSSLHttpServer
 org.apache.hadoop.hbase.http.TestSSLHttpServer  Time elapsed: 0.181 sec   
 ERROR!
 java.security.cert.CertificateException: Subject class type invalid.
   at sun.security.x509.X509CertInfo.setSubject(X509CertInfo.java:888)
   at sun.security.x509.X509CertInfo.set(X509CertInfo.java:415)
   at 
 org.apache.hadoop.hbase.http.ssl.KeyStoreTestUtil.generateCertificate(KeyStoreTestUtil.java:94)
   at 
 org.apache.hadoop.hbase.http.ssl.KeyStoreTestUtil.setupSSLConfig(KeyStoreTestUtil.java:246)
   at 
 org.apache.hadoop.hbase.http.TestSSLHttpServer.setup(TestSSLHttpServer.java:72)
 org.apache.hadoop.hbase.http.TestSSLHttpServer  Time elapsed: 0.181 sec   
 ERROR!
 java.lang.NullPointerException: null
   at 
 org.apache.hadoop.hbase.http.TestSSLHttpServer.cleanup(TestSSLHttpServer.java:100)
 Tests in error: 
   TestSSLHttpServer.setup:72 » Certificate Subject class type invalid.
   TestSSLHttpServer.cleanup:100 NullPointer
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive

2014-04-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976573#comment-13976573
 ] 

Steve Loughran commented on HBASE-11045:


{{FileSystem#createNonRecursive()}} isn't implemented by many filesystems; 
using it would run the risk of hitting implementations that don't.

Is there any barrier to using a check for the parent dir existing before 
calling create? That's essentially what most filesystems would end up doing:

{code}
static FSDataOutputStream createNonRecursive(FileSystem fs, Path p) throws IOException {
  if (!fs.exists(p.getParent())) {
    throw new FileNotFoundException(p.getParent().toString());
  }
  return fs.create(p);
}
{code}

It's not atomic, but if you look closely at the source, it's not atomic in most 
FS implementations anyway, including native (mkdirs() isn't atomic there).

 Replace deprecated method FileSystem#createNonRecursive
 ---

 Key: HBASE-11045
 URL: https://issues.apache.org/jira/browse/HBASE-11045
 Project: HBase
  Issue Type: Task
Reporter: Gustavo Anatoly
Assignee: Gustavo Anatoly
Priority: Minor
 Fix For: 0.99.0


 This change directly affects ProtobufLogWriter#init(), associated with 
 TestHLog#testFailedToCreateHLogIfParentRenamed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11045) Replace deprecated method FileSystem#createNonRecursive

2014-04-22 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977194#comment-13977194
 ] 

Steve Loughran commented on HBASE-11045:


Personally, I wouldn't have had {{create()}} create any parent directories at 
all, leaving that as the responsibility of the caller, but for reasons of 
history, that's not the case...

 Replace deprecated method FileSystem#createNonRecursive
 ---

 Key: HBASE-11045
 URL: https://issues.apache.org/jira/browse/HBASE-11045
 Project: HBase
  Issue Type: Task
Reporter: Gustavo Anatoly
Assignee: Gustavo Anatoly
Priority: Minor
 Fix For: 0.99.0


 This change directly affects ProtobufLogWriter#init(), associated with 
 TestHLog#testFailedToCreateHLogIfParentRenamed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10444) NPE seen in logs at tail of fatal shutdown

2014-01-31 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887589#comment-13887589
 ] 

Steve Loughran commented on HBASE-10444:


commit #50f5a7a, by the look of things, unless I've accidentally been using an 
older version

 NPE seen in logs at tail of fatal shutdown
 --

 Key: HBASE-10444
 URL: https://issues.apache.org/jira/browse/HBASE-10444
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
 Environment: in 0.98.0 RC1
Reporter: Steve Loughran
Priority: Minor

 hbase RS logs show an NPE in shutdown; no other info
 {code}
 14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186
 Exception in thread regionserver57186 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
   at java.lang.Thread.run(Thread.java:744)
 14/01/30 14:18:25 ERROR regionserver.HRegionServerCommand
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10444) NPE seen in logs at tail of fatal shutdown

2014-01-31 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887668#comment-13887668
 ] 

Steve Loughran commented on HBASE-10444:


I may have unintentionally deployed 0.96.0 instead; propose closing as cannot 
reproduce until I can see it again?

 NPE seen in logs at tail of fatal shutdown
 --

 Key: HBASE-10444
 URL: https://issues.apache.org/jira/browse/HBASE-10444
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
 Environment: in 0.98.0 RC1
Reporter: Steve Loughran
Priority: Minor

 hbase RS logs show an NPE in shutdown; no other info
 {code}
 14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186
 Exception in thread regionserver57186 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
   at java.lang.Thread.run(Thread.java:744)
 14/01/30 14:18:25 ERROR regionserver.HRegionServerCommand
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HBASE-10444) NPE seen in logs at tail of fatal shutdown

2014-01-30 Thread Steve Loughran (JIRA)
Steve Loughran created HBASE-10444:
--

 Summary: NPE seen in logs at tail of fatal shutdown
 Key: HBASE-10444
 URL: https://issues.apache.org/jira/browse/HBASE-10444
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
 Environment: in 0.98.0 RC1
Reporter: Steve Loughran
Priority: Minor


hbase RS logs show an NPE in shutdown; no other info

{code}
14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186
Exception in thread regionserver57186 java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
at java.lang.Thread.run(Thread.java:744)
14/01/30 14:18:25 ERROR regionserver.HRegionServerCommand
{code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10444) NPE seen in logs at tail of fatal shutdown

2014-01-30 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13886619#comment-13886619
 ] 

Steve Loughran commented on HBASE-10444:


more logs
{code}
14/01/30 14:18:22 FATAL regionserver.HRegionServer: RegionServer abort: loaded 
coprocessors are: []
14/01/30 14:18:22 INFO regionserver.HRegionServer: STOPPED: Unexpected 
exception during initialization, aborting
14/01/30 14:18:23 INFO zookeeper.ClientCnxn: Opening socket connection to 
server ubuntu/192.168.1.132:2181. Will not attempt to authenticate using SASL 
(unknown error)
14/01/30 14:18:23 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/01/30 14:18:24 INFO zookeeper.ClientCnxn: Opening socket connection to 
server ubuntu/192.168.1.132:2181. Will not attempt to authenticate using SASL 
(unknown error)
14/01/30 14:18:24 WARN zookeeper.ClientCnxn: Session 0x0 for server null, 
unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at 
org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186
14/01/30 14:18:25 FATAL regionserver.HRegionServer: ABORTING region server 
ubuntu,57186,1391091486549: Initialization of RS failed.  Hence aborting RS.
java.io.IOException: Received the shutdown message while waiting.
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.blockAndCheckIfStopped(HRegionServer.java:757)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.initializeZooKeeper(HRegionServer.java:706)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.preRegistrationInitialization(HRegionServer.java:678)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:806)
at java.lang.Thread.run(Thread.java:744)
14/01/30 14:18:25 FATAL regionserver.HRegionServer: RegionServer abort: loaded 
coprocessors are: []
14/01/30 14:18:25 INFO regionserver.HRegionServer: STOPPED: Initialization of 
RS failed.  Hence aborting RS.
14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186
Exception in thread regionserver57186 java.lang.NullPointerException
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
at java.lang.Thread.run(Thread.java:744)
14/01/30 14:18:25 ERROR regionserver.HRegionServerCommandLine: Region server 
exiting
java.lang.RuntimeException: HRegionServer Aborted
at 
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:66)
at 
org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:85)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
at 
org.apache.hadoop.hbase.util.ServerCommandLine.doMain(ServerCommandLine.java:126)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.main(HRegionServer.java:2336)
14/01/30 14:18:25 INFO regionserver.ShutdownHook: Shutdown hook starting; 
hbase.shutdown.hook=true; 
fsShutdownHook=org.apache.hadoop.fs.FileSystem$Cache$ClientFinalizer@1407408
14/01/30 14:18:25 INFO regionserver.HRegionServer: STOPPED: Shutdown hook
14/01/30 14:18:25 INFO regionserver.ShutdownHook: Starting fs shutdown hook 
thread.
14/01/30 14:18:25 INFO regionserver.ShutdownHook: Shutdown hook finished.
{code}

 NPE seen in logs at tail of fatal shutdown
 --

 Key: HBASE-10444
 URL: https://issues.apache.org/jira/browse/HBASE-10444
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0
 Environment: in 0.98.0 RC1
Reporter: Steve Loughran
Priority: Minor

 hbase RS logs show an NPE in shutdown; no other info
 {code}
 14/01/30 14:18:25 INFO ipc.RpcServer: Stopping server on 57186
 Exception in thread regionserver57186 java.lang.NullPointerException
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:897)
   at java.lang.Thread.run(Thread.java:744)
 14/01/30 14:18:25 ERROR regionserver.HRegionServerCommand
 {code}



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866467#comment-13866467
 ] 

Steve Loughran commented on HBASE-10296:


The Google Chubby paper goes into some detail about why they implemented a 
Paxos service and not a Paxos library.

Yet perhaps you could persuade the ZK team to rework the code enough that you 
could reuse it independently of ZK.

Implementing a consensus protocol is surprisingly hard, as you have to:
# understand Paxos
# implement it
# prove that your implementation is correct

Unit tests are not enough; talk to the ZK team about what they had to do to 
show that it works.
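
For context, the one-time watch semantics that the description below calls 
fragile look roughly like this from the client side; a hypothetical sketch 
with illustrative names, not HBase code:

{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

public class RegionStateWatcher implements Watcher {
  private final ZooKeeper zk;
  private final String path;

  public RegionStateWatcher(ZooKeeper zk, String path) {
    this.zk = zk;
    this.path = path;
  }

  /** Read the node and set a one-time watch on it. */
  public void watch() throws Exception {
    zk.getData(path, this, new Stat());
  }

  @Override
  public void process(WatchedEvent event) {
    try {
      // the watch has already fired and is gone; anything that changes
      // before this re-registration completes is never delivered
      watch();
    } catch (Exception e) {
      // session expiry / reconnect handling elided
    }
  }
}
{code}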

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor liveness 
 and store almost all of its state, such as region states, table info, 
 replication info and so on. ZK also acts as a channel for 
 master-regionserver communication (such as region assignment) and 
 client-regionserver communication (such as replication state/behavior changes). 
 But ZK as a communication channel is fragile due to its one-time watches and 
 asynchronous notification mechanism, which together can lead to missed 
 events (hence missed messages); for example, the master must rely on the 
 idempotence of the state transition logic to keep the region assignment state 
 machine correct. Almost all of the trickiest inconsistency issues can trace 
 their root cause back to the fragility of ZK as a communication channel.
 Replacing ZK with Paxos running within the master processes has the following benefits:
 1. Better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the newly 
 elected active master can immediately play its role without such failover 
 work as rebuilding its in-memory state by consulting the meta table and ZK.
 2. Better state consistency: the master's in-memory state is the only truth 
 about the system, which eliminates inconsistency from the very beginning; 
 and though the state is held by all masters, Paxos guarantees the copies are 
 identical at any time.
 3. A more direct and simpler communication pattern: clients change state by 
 sending requests to the master; master and regionservers talk directly to each 
 other by sending requests and responses. None of this needs to go through 
 third-party storage like ZK, which can introduce more uncertainty, worse 
 latency and more complexity.
 4. ZK would only be used as liveness monitoring to determine whether a 
 regionserver is dead, and later on we could eliminate ZK entirely once we 
 build heartbeats between master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves deep 
 thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13866846#comment-13866846
 ] 

Steve Loughran commented on HBASE-10296:


...but that ZK path is used to find the hbase master even if it moves around a 
cluster; what would happen there?

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor liveness 
 and store almost all of its state, such as region states, table info, 
 replication info and so on. ZK also acts as a channel for 
 master-regionserver communication (such as region assignment) and 
 client-regionserver communication (such as replication state/behavior changes). 
 But ZK as a communication channel is fragile due to its one-time watches and 
 asynchronous notification mechanism, which together can lead to missed 
 events (hence missed messages); for example, the master must rely on the 
 idempotence of the state transition logic to keep the region assignment state 
 machine correct. Almost all of the trickiest inconsistency issues can trace 
 their root cause back to the fragility of ZK as a communication channel.
 Replacing ZK with Paxos running within the master processes has the following benefits:
 1. Better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the newly 
 elected active master can immediately play its role without such failover 
 work as rebuilding its in-memory state by consulting the meta table and ZK.
 2. Better state consistency: the master's in-memory state is the only truth 
 about the system, which eliminates inconsistency from the very beginning; 
 and though the state is held by all masters, Paxos guarantees the copies are 
 identical at any time.
 3. A more direct and simpler communication pattern: clients change state by 
 sending requests to the master; master and regionservers talk directly to each 
 other by sending requests and responses. None of this needs to go through 
 third-party storage like ZK, which can introduce more uncertainty, worse 
 latency and more complexity.
 4. ZK would only be used as liveness monitoring to determine whether a 
 regionserver is dead, and later on we could eliminate ZK entirely once we 
 build heartbeats between master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves deep 
 thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10296) Replace ZK with a paxos running within master processes to provide better master failover performance and state consistency

2014-01-08 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13865230#comment-13865230
 ] 

Steve Loughran commented on HBASE-10296:


One aspect of ZK that is worth remembering is that it lets other apps keep an 
eye on what is going on

 Replace ZK with a paxos running within master processes to provide better 
 master failover performance and state consistency
 ---

 Key: HBASE-10296
 URL: https://issues.apache.org/jira/browse/HBASE-10296
 Project: HBase
  Issue Type: Brainstorming
  Components: master, Region Assignment, regionserver
Reporter: Feng Honghua

 Currently the master relies on ZK to elect the active master, monitor liveness 
 and store almost all of its state, such as region states, table info, 
 replication info and so on. ZK also acts as a channel for 
 master-regionserver communication (such as region assignment) and 
 client-regionserver communication (such as replication state/behavior changes). 
 But ZK as a communication channel is fragile due to its one-time watches and 
 asynchronous notification mechanism, which together can lead to missed 
 events (hence missed messages); for example, the master must rely on the 
 idempotence of the state transition logic to keep the region assignment state 
 machine correct. Almost all of the trickiest inconsistency issues can trace 
 their root cause back to the fragility of ZK as a communication channel.
 Replacing ZK with Paxos running within the master processes has the following benefits:
 1. Better master failover performance: all masters, whether active or 
 standby, have the same latest state in memory (except lagging ones, which 
 can eventually catch up later on). Whenever the active master dies, the newly 
 elected active master can immediately play its role without such failover 
 work as rebuilding its in-memory state by consulting the meta table and ZK.
 2. Better state consistency: the master's in-memory state is the only truth 
 about the system, which eliminates inconsistency from the very beginning; 
 and though the state is held by all masters, Paxos guarantees the copies are 
 identical at any time.
 3. A more direct and simpler communication pattern: clients change state by 
 sending requests to the master; master and regionservers talk directly to each 
 other by sending requests and responses. None of this needs to go through 
 third-party storage like ZK, which can introduce more uncertainty, worse 
 latency and more complexity.
 4. ZK would only be used as liveness monitoring to determine whether a 
 regionserver is dead, and later on we could eliminate ZK entirely once we 
 build heartbeats between master and regionservers.
 I know this might look like a very crazy re-architecture, but it deserves deep 
 thinking and serious discussion, right?



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-27 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13833639#comment-13833639
 ] 

Steve Loughran commented on HBASE-9892:
---

I had a quick look, and while the intricacies of the HBase code escape me, it 
looks like the masters get the port info via ZK. Does this propagate as far as 
the hbase status data you get with {{HBaseAdmin.getClusterStatus()}}? That's 
where I need to pick it up from, along with the infoserver port of the master 
itself.

-steve
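
PS: for reference, the read side I have in mind is roughly the following; a 
sketch against the 0.95-era client API, where {{ServerName}} exposes the host 
and RPC port but not the info port this issue adds:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class ListRegionServers {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName sn : status.getServers()) {
        // host and RPC port are here; the web UI (info) port is not
        System.out.println(sn.getHostname() + ":" + sn.getPort());
      }
    } finally {
      admin.close();
    }
  }
}
{code}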

 Add info port to ServerName to support multi instances in a node
 

 Key: HBASE-9892
 URL: https://issues.apache.org/jira/browse/HBASE-9892
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
 HBASE-9892-0.94-v3.diff, HBASE-9892-0.94-v4.diff


 The full GC time of a regionserver with a big heap (30G) usually can not be 
 kept under 30s. At the same time, servers with 64G of memory are the norm. 
 So we try to deploy multiple RS instances (2-3) on a single node, with the 
 heap of each RS at about 20G ~ 24G.
 Most things work fine, except the hbase web ui. The master gets the RS 
 info port from the conf, which is not suitable for this situation of multiple 
 RS instances on a node. So we add the info port to ServerName:
 a. At startup, the RS reports its info port to HMaster.
 b. For the root region, the RS writes the servername with info port to the 
 zookeeper root-region-server node.
 c. For meta regions, the RS writes the servername with info port to the root region.
 d. For user regions, the RS writes the servername with info port to the meta regions.
 So HMaster and clients can get the info port from the servername.
 To test this feature, I changed the RS num from 1 to 3 in standalone mode, so 
 we can test it in standalone mode.
 I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know 
 how Hoya handles this problem?
 PS: There are different formats for the servername in the zk node and the meta 
 table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9892) Add info port to ServerName to support multi instances in a node

2013-11-06 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13815174#comment-13815174
 ] 

Steve Loughran commented on HBASE-9892:
---

As Enis says, currently we know the problem is there but don't try to fix it. 

The issue we have there is not just that YARN may assign more than one region 
server to the same node (it doesn't currently support anti-affinity in 
allocation requests), but that someone else may be running their own 
application, HBase or otherwise, on the same machine. If you hard-code a port 
it can fail; any port. The sole advantage we have is that this will trigger a 
new container request/review.

Because this also affects the masters, we have to leave that UI at port 0 too, 
which is the worst issue. I would really like to get hold of that port via ZK, 
from where we can bootstrap the rest of the cluster information; a sketch of 
the port-0 pattern follows.
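
The port-0 pattern in question, as a minimal sketch: the port the OS actually 
allocates, not the configured one, is what would then need publishing via ZK 
({{bindInfoServer}} is an illustrative name):

{code}
import java.io.IOException;
import java.net.ServerSocket;

public class EphemeralPort {
  /** Bind to any free port and report what the OS actually allocated. */
  public static int bindInfoServer() throws IOException {
    ServerSocket socket = new ServerSocket(0); // 0 = OS picks a free port
    // a real server would keep the socket open and serve from it
    return socket.getLocalPort();              // the value to publish in ZK
  }
}
{code}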

 Add info port to ServerName to support multi instances in a node
 

 Key: HBASE-9892
 URL: https://issues.apache.org/jira/browse/HBASE-9892
 Project: HBase
  Issue Type: Improvement
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-9892-0.94-v1.diff, HBASE-9892-0.94-v2.diff, 
 HBASE-9892-0.94-v3.diff


 The full GC time of a regionserver with a big heap (30G) usually can not be 
 kept under 30s. At the same time, servers with 64G of memory are the norm. 
 So we try to deploy multiple RS instances (2-3) on a single node, with the 
 heap of each RS at about 20G ~ 24G.
 Most things work fine, except the hbase web ui. The master gets the RS 
 info port from the conf, which is not suitable for this situation of multiple 
 RS instances on a node. So we add the info port to ServerName:
 a. At startup, the RS reports its info port to HMaster.
 b. For the root region, the RS writes the servername with info port to the 
 zookeeper root-region-server node.
 c. For meta regions, the RS writes the servername with info port to the root region.
 d. For user regions, the RS writes the servername with info port to the meta regions.
 So HMaster and clients can get the info port from the servername.
 To test this feature, I changed the RS num from 1 to 3 in standalone mode, so 
 we can test it in standalone mode.
 I think Hoya (hbase on yarn) will encounter the same problem. Does anyone know 
 how Hoya handles this problem?
 PS: There are different formats for the servername in the zk node and the meta 
 table; I think we need to unify them and refactor the code.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9802) A new failover test framework for HBase

2013-10-18 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13799098#comment-13799098
 ] 

Steve Loughran commented on HBASE-9802:
---

This sounds interesting and potentially very useful beyond just HBase. Hadoop 
YARN applications are the obvious target, as they need to be written to expect 
failure, and if they don't get tested, well, they won't work. I ended up doing 
some basics of this with ssh and reboot operations, but I really wanted 
something that could talk to an OpenWrt base station and actually generate 
real network partitions, rather than just simulations. 

# Accumulo has something similar, though I've not seen it.
# Would it be possible to make this more generic? Even if it starts off in 
HBase, it could be good to have the option of branching off into its own 
project, and to allow people downstream to use it even earlier.

I'd propose making the core test framework a module that could be picked up and 
used downstream, precisely to get that cross-application testing; see the 
sketch below for the shape such a module's API might take.
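
A minimal sketch of the kind of pluggable action interface a shared module 
could expose; {{FaultAction}} and its methods are illustrative names, not the 
actual API of the framework described here:

{code}
/**
 * One fault-injection action, loosely modeled on ChaosMonkey's actions.
 */
public interface FaultAction {

  /** Inject the fault: kill a process, reboot a node, partition a network... */
  void perform() throws Exception;

  /** Undo the fault so the next action starts from a known-good state. */
  void restore() throws Exception;
}
{code}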

 A new failover test framework for HBase
 ---

 Key: HBASE-9802
 URL: https://issues.apache.org/jira/browse/HBASE-9802
 Project: HBase
  Issue Type: Improvement
  Components: test
Affects Versions: 0.94.3
Reporter: chendihao
Priority: Minor

 Currently HBase uses ChaosMonkey for IT tests and fault injection. It will 
 restart regionservers, force the balancer and perform other actions randomly 
 and periodically. However, we need a more extensible and full-featured 
 framework for our failover testing, and we find ChaosMonkey can't suit our 
 needs since it has the following drawbacks.
 1) Only process-level actions can be simulated; there is no support for 
 machine-level/hardware-level/network-level actions.
 2) There is no data validation before and after the test, so fatal bugs such 
 as those that can cause data inconsistency may be overlooked.
 3) When a failure occurs, we can't reproduce the problem, and it is hard to 
 figure out the reason.
 Therefore, we have developed a new framework to satisfy the needs of failover 
 testing. We extended ChaosMonkey and implemented functions to validate data 
 and to replay failed actions. Here are the features we added.
 1) Policy/Task/Action abstraction: separating Task from Policy and Action 
 makes it easier to manage and replay a set of actions.
 2) Actions are configurable. We have implemented some actions that cause 
 machine failures and defined the same interface as the original actions.
 3) We validate data consistency before and after the failover test to 
 ensure availability and data correctness.
 4) After performing a set of actions, we also check the consistency of the 
 table.
 5) The set of actions that caused a test failure can be replayed, and the 
 reproducibility of actions helps in fixing the exposed bugs.
 Our team has developed this framework and run it for a while. Some bugs were 
 exposed and fixed by running this test framework. Moreover, we have a monitor 
 program which shows the progress of the failover test and makes sure our 
 cluster is as stable as we want. Now we are trying to make it more general and 
 will open-source it later.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Resolved] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there

2013-09-17 Thread Steve Loughran (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HBASE-9545.
---

Resolution: Duplicate

 NPE when trying to get cluster status on an hbase cluster that isn't there
 --

 Key: HBASE-9545
 URL: https://issues.apache.org/jira/browse/HBASE-9545
 Project: HBase
  Issue Type: Bug
  Components: Client
 Environment: 0.95.3 snapshot, commit 943bffc
Reporter: Steve Loughran
Priority: Minor

 As part of some fault injection testing, I'm trying to talk to an 
 HBaseCluster that isn't there, opening a connection and expecting things to 
 fail. It turns out you can create an {{HBaseAdmin}} instance, but when you 
 ask for its cluster status the NPE surfaces

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there

2013-09-17 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13769346#comment-13769346
 ] 

Steve Loughran commented on HBASE-9545:
---

You are right; it goes away on trunk. Marking as duplicate.

 NPE when trying to get cluster status on an hbase cluster that isn't there
 --

 Key: HBASE-9545
 URL: https://issues.apache.org/jira/browse/HBASE-9545
 Project: HBase
  Issue Type: Bug
  Components: Client
 Environment: 0.95.3 snapshot, commit 943bffc
Reporter: Steve Loughran
Priority: Minor

 As part of some fault injection testing, I'm trying to talk to an 
 HBaseCluster that isn't there, opening a connection and expecting things to 
 fail. It turns out you can create an {{HBaseAdmin}} instance, but when you 
 ask for its cluster status the NPE surfaces

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there

2013-09-16 Thread Steve Loughran (JIRA)
Steve Loughran created HBASE-9545:
-

 Summary: NPE when trying to get cluster status on an hbase cluster 
that isn't there
 Key: HBASE-9545
 URL: https://issues.apache.org/jira/browse/HBASE-9545
 Project: HBase
  Issue Type: Bug
  Components: Client
 Environment: 0.95.3 snapshot, commit 943bffc
Reporter: Steve Loughran
Priority: Minor


As part of some fault injection testing, I'm trying to talk to an HBaseCluster 
that isn't there, opening a connection and expecting things to fail. It turns 
out you can create an {{HBaseAdmin}} instance, but when you ask for its cluster 
status the NPE surfaces



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9545) NPE when trying to get cluster status on an hbase cluster that isn't there

2013-09-16 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13768522#comment-13768522
 ] 

Steve Loughran commented on HBASE-9545:
---

Stack trace
{code}

java.lang.NullPointerException
at 
org.apache.hadoop.hbase.client.HBaseAdmin$MasterMonitorCallable.close(HBaseAdmin.java:3053)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.executeCallable(HBaseAdmin.java:3089)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.getClusterStatus(HBaseAdmin.java:2081)
at 
org.apache.hadoop.hoya.yarn.cluster.failures.TestKilledAM.testKilledAM(TestKilledAM.groovy:84)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at 
org.junit.internal.runners.statements.FailOnTimeout$StatementThread.run(FailOnTimeout.java:74)

{code}

It looks like {{MasterMonitorCallable.close()}} assumes its {{masterMonitor}} 
field is never null, but if there is no connection, that isn't true. The 
close() operation should be made a bit more robust, so as not to hide the 
underlying RPC failures I expect to see; something like the sketch below.
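
A hypothetical sketch of the null guard, not an actual patch:

{code}
@Override
public void close() throws IOException {
  // masterMonitor is only set once a connection is established,
  // so guard against a connect that never happened
  if (masterMonitor != null) {
    masterMonitor.close();
  }
}
{code}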

 NPE when trying to get cluster status on an hbase cluster that isn't there
 --

 Key: HBASE-9545
 URL: https://issues.apache.org/jira/browse/HBASE-9545
 Project: HBase
  Issue Type: Bug
  Components: Client
 Environment: 0.95.3 snapshot, commit 943bffc
Reporter: Steve Loughran
Priority: Minor

 As part of some fault injection testing, I'm trying to talk to an 
 HBaseCluster that isn't there, opening a connection and expecting things to 
 fail. It turns out you can create an {{HBaseAdmin}} instance, but when you 
 ask for its cluster status the NPE surfaces

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9294) NPE in /rs-status during RS shutdown

2013-08-21 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747055#comment-13747055
 ] 

Steve Loughran commented on HBASE-9294:
---

{code}
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmplImpl.renderNoFlush(RSStatusTmplImpl.java:163)
at 
org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmpl.renderNoFlush(RSStatusTmpl.java:172)
at 
org.apache.hadoop.hbase.tmpl.regionserver.RSStatusTmpl.render(RSStatusTmpl.java:163)
at 
org.apache.hadoop.hbase.regionserver.RSStatusServlet.doGet(RSStatusServlet.java:49)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
at 
org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at 
org.apache.hadoop.http.lib.StaticUserWebFilter$StaticUserFilter.doFilter(StaticUserWebFilter.java:109)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:1077)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.apache.hadoop.http.NoCacheFilter.doFilter(NoCacheFilter.java:45)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}

 NPE in /rs-status during RS shutdown
 

 Key: HBASE-9294
 URL: https://issues.apache.org/jira/browse/HBASE-9294
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.95.2
Reporter: Steve Loughran
Priority: Minor

 While hitting reload to see when a kill-initiated RS shutdown would make the 
 Web UI go away, I got a stack trace from an NPE

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9294) NPE in /rs-status during RS shutdown

2013-08-21 Thread Steve Loughran (JIRA)
Steve Loughran created HBASE-9294:
-

 Summary: NPE in /rs-status during RS shutdown
 Key: HBASE-9294
 URL: https://issues.apache.org/jira/browse/HBASE-9294
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.95.2
Reporter: Steve Loughran
Priority: Minor


While hitting reload to see when a kill-initiated RS shutdown would make the 
Web UI go away, I got a stack trace from an NPE

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9185) mvn site target fails when building with Maven 3.1

2013-08-12 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13736618#comment-13736618
 ] 

Steve Loughran commented on HBASE-9185:
---

[~stack] -thanks for the fix. 

bq. (What you doing messing w/ mvn!)

breaking your build. Next question?

 mvn site target fails when building with Maven 3.1
 --

 Key: HBASE-9185
 URL: https://issues.apache.org/jira/browse/HBASE-9185
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.95.2
 Environment: Apache Maven 3.1.0 
 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 19:15:32-0700)
 Java version: 1.6.0_51, vendor: Apple Inc.
 Java home: 
 /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
 Default locale: en_US, platform encoding: MacRoman
 OS name: mac os x, version: 10.8.4, arch: x86_64, family: mac
Reporter: Steve Loughran
Assignee: stack
Priority: Minor
 Fix For: 0.98.0, 0.95.2

 Attachments: 9185.txt


 mvn site fails when building with mvn 3.1 due to various class changes inside 
 maven. They promise that switching to new versions of some mvn modules will 
 result in builds that work in both 3.0.x and 3.1:
 [https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9185) mvn site target fails when building with Maven 3.1

2013-08-09 Thread Steve Loughran (JIRA)
Steve Loughran created HBASE-9185:
-

 Summary: mvn site target fails when building with Maven 3.1
 Key: HBASE-9185
 URL: https://issues.apache.org/jira/browse/HBASE-9185
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.95.2
 Environment: Apache Maven 3.1.0 
(893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 19:15:32-0700)
Java version: 1.6.0_51, vendor: Apple Inc.
Java home: /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
Default locale: en_US, platform encoding: MacRoman
OS name: mac os x, version: 10.8.4, arch: x86_64, family: mac
Reporter: Steve Loughran
Priority: Minor


mvn site fails when building with mvn 3.1 due to various class changes inside 
maven. They promise that switching to new versions of some mvn modules will 
result in builds that work in both 3.0.x and 3.1:

[https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9185) mvn site target fails when building with Maven 3.1

2013-08-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735606#comment-13735606
 ] 

Steve Loughran commented on HBASE-9185:
---

BTW, the command that failed was
{code}
mvn clean install -DskipTests javadoc:aggregate site assembly:single
{code}

 mvn site target fails when building with Maven 3.1
 --

 Key: HBASE-9185
 URL: https://issues.apache.org/jira/browse/HBASE-9185
 Project: HBase
  Issue Type: Bug
  Components: build
Affects Versions: 0.95.2
 Environment: Apache Maven 3.1.0 
 (893ca28a1da9d5f51ac03827af98bb730128f9f2; 2013-06-27 19:15:32-0700)
 Java version: 1.6.0_51, vendor: Apple Inc.
 Java home: 
 /Library/Java/JavaVirtualMachines/1.6.0_51-b11-457.jdk/Contents/Home
 Default locale: en_US, platform encoding: MacRoman
 OS name: mac os x, version: 10.8.4, arch: x86_64, family: mac
Reporter: Steve Loughran
Priority: Minor

 mvn site fails when building with mvn 3.1 due to various class changes inside 
 maven. They promise that switching to new versions of some mvn modules will 
 result in builds that work in both 3.0.x and 3.1:
 [https://cwiki.apache.org/confluence/display/MAVEN/AetherClassNotFound]

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9185) mvn site target fails when building with Maven 3.1

2013-08-09 Thread Steve Loughran (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13735601#comment-13735601
 ] 

Steve Loughran commented on HBASE-9185:
---

full log
{code}
[INFO] --- maven-site-plugin:3.2:site (default-site) @ hbase ---
[WARNING] Error injecting: 
org.apache.maven.reporting.exec.DefaultMavenReportExecutor
java.lang.NoClassDefFoundError: org/sonatype/aether/graph/DependencyFilter
at java.lang.Class.getDeclaredConstructors0(Native Method)
at java.lang.Class.privateGetDeclaredConstructors(Class.java:2437)
at java.lang.Class.getDeclaredConstructors(Class.java:1863)
at 
com.google.inject.spi.InjectionPoint.forConstructorOf(InjectionPoint.java:245)
at 
com.google.inject.internal.ConstructorBindingImpl.create(ConstructorBindingImpl.java:99)
at 
com.google.inject.internal.InjectorImpl.createUninitializedBinding(InjectorImpl.java:653)
at 
com.google.inject.internal.InjectorImpl.createJustInTimeBinding(InjectorImpl.java:863)
at 
com.google.inject.internal.InjectorImpl.createJustInTimeBindingRecursive(InjectorImpl.java:790)
at 
com.google.inject.internal.InjectorImpl.getJustInTimeBinding(InjectorImpl.java:278)
at 
com.google.inject.internal.InjectorImpl.getBindingOrThrow(InjectorImpl.java:210)
at 
com.google.inject.internal.InjectorImpl.getProviderOrThrow(InjectorImpl.java:986)
at 
com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:1019)
at 
com.google.inject.internal.InjectorImpl.getProvider(InjectorImpl.java:982)
at 
com.google.inject.internal.InjectorImpl.getInstance(InjectorImpl.java:1032)
at 
org.eclipse.sisu.reflect.AbstractDeferredClass.get(AbstractDeferredClass.java:44)
at 
com.google.inject.internal.ProviderInternalFactory.provision(ProviderInternalFactory.java:86)
at 
com.google.inject.internal.InternalFactoryToInitializableAdapter.provision(InternalFactoryToInitializableAdapter.java:55)
at 
com.google.inject.internal.ProviderInternalFactory$1.call(ProviderInternalFactory.java:70)
at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:100)
at 
org.eclipse.sisu.plexus.lifecycles.PlexusLifecycleManager.onProvision(PlexusLifecycleManager.java:134)
at 
com.google.inject.internal.ProvisionListenerStackCallback$Provision.provision(ProvisionListenerStackCallback.java:109)
at 
com.google.inject.internal.ProvisionListenerStackCallback.provision(ProvisionListenerStackCallback.java:55)
at 
com.google.inject.internal.ProviderInternalFactory.circularGet(ProviderInternalFactory.java:68)
at 
com.google.inject.internal.InternalFactoryToInitializableAdapter.get(InternalFactoryToInitializableAdapter.java:47)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter$1.call(ProviderToInternalFactoryAdapter.java:46)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1054)
at 
com.google.inject.internal.ProviderToInternalFactoryAdapter.get(ProviderToInternalFactoryAdapter.java:40)
at com.google.inject.Scopes$1$1.get(Scopes.java:59)
at 
com.google.inject.internal.InternalFactoryToProviderAdapter.get(InternalFactoryToProviderAdapter.java:41)
at 
com.google.inject.internal.InjectorImpl$2$1.call(InjectorImpl.java:997)
at 
com.google.inject.internal.InjectorImpl.callInContext(InjectorImpl.java:1047)
at com.google.inject.internal.InjectorImpl$2.get(InjectorImpl.java:993)
at 
org.eclipse.sisu.locators.LazyBeanEntry.getValue(LazyBeanEntry.java:82)
at 
org.eclipse.sisu.plexus.locators.LazyPlexusBean.getValue(LazyPlexusBean.java:52)
at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:259)
at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:239)
at 
org.codehaus.plexus.DefaultPlexusContainer.lookup(DefaultPlexusContainer.java:233)
at 
org.apache.maven.plugins.site.AbstractSiteRenderingMojo.getReports(AbstractSiteRenderingMojo.java:229)
at org.apache.maven.plugins.site.SiteMojo.execute(SiteMojo.java:121)
at 
org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:106)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:208)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
at 
org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
at 
org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
at