[jira] [Created] (HBASE-17722) Metrics subsystem stop/start messages add a lot of useless bulk to operational logging
Andrew Purtell created HBASE-17722:
--------------------------------------

             Summary: Metrics subsystem stop/start messages add a lot of useless bulk to operational logging
                 Key: HBASE-17722
                 URL: https://issues.apache.org/jira/browse/HBASE-17722
             Project: HBase
          Issue Type: Bug
          Components: metrics
    Affects Versions: 1.2.4, 1.3.0
            Reporter: Andrew Purtell

Metrics subsystem stop/start messages add a lot of useless bulk to operational logging. Say you are collecting logs from a fleet of thousands of servers and want to keep them around for about a month or longer. It adds up. I think these should at least be at DEBUG level and ideally at TRACE. They don't offer much utility.

{noformat}
INFO [] impl.MetricsSystemImpl: HBase metrics system started
INFO [] impl.MetricsSystemImpl: Stopping HBase metrics system...
INFO [] impl.MetricsSystemImpl: HBase metrics system stopped.
INFO [] impl.MetricsConfig: loaded properties from hadoop-metrics2-hbase.properties
INFO [] impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
{noformat}

--
This message was sent by Atlassian JIRA (v6.3.15#6346)
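To make the cost of these messages concrete, here is a minimal sketch of dropping them at log-collection time. This is not the fix proposed in the issue (which is to demote the messages to DEBUG or TRACE at the source); it is only a stopgap on the collector side. The class name MetricsLogNoiseFilter and its methods are hypothetical, and the patterns are derived solely from the five sample lines quoted in the issue.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Hypothetical helper: drops metrics start/stop chatter from collected log lines. */
class MetricsLogNoiseFilter {
    // Patterns mirror the sample messages quoted in the issue; nothing else is matched.
    private static final List<Pattern> NOISE = Arrays.asList(
        Pattern.compile("impl\\.MetricsSystemImpl: HBase metrics system started"),
        Pattern.compile("impl\\.MetricsSystemImpl: Stopping HBase metrics system"),
        Pattern.compile("impl\\.MetricsSystemImpl: HBase metrics system stopped"),
        Pattern.compile("impl\\.MetricsConfig: loaded properties from "),
        Pattern.compile("impl\\.MetricsSystemImpl: Scheduled snapshot period at \\d+ second"));

    /** Returns true if the line is one of the metrics lifecycle messages. */
    static boolean isNoise(String line) {
        return NOISE.stream().anyMatch(p -> p.matcher(line).find());
    }

    /** Returns the input lines with the metrics lifecycle messages removed. */
    static List<String> filter(List<String> lines) {
        return lines.stream().filter(l -> !isNoise(l)).collect(Collectors.toList());
    }
}
```

Filtering downstream still pays the cost of writing and shipping the lines, which is why lowering the level at the source remains the better outcome.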
[jira] [Created] (HBASE-17721) Provide streaming APIs with SSL/TLS
Alex Araujo created HBASE-17721:
-----------------------------------

             Summary: Provide streaming APIs with SSL/TLS
                 Key: HBASE-17721
                 URL: https://issues.apache.org/jira/browse/HBASE-17721
             Project: HBase
          Issue Type: Umbrella
            Reporter: Alex Araujo
            Assignee: Alex Araujo
             Fix For: 2.0.0

Umbrella to add optional client/server streaming capabilities to HBase. This would allow bandwidth to be used more efficiently for certain operations, and allow clients to use SSL/TLS for authentication and encryption.

Desired client/server scaffolding:
- HTTP/2 support
- Protocol negotiation (blocking vs streaming, auth, encryption, etc.)
- TLS/SSL support
- Streaming RPC support

Possibilities (and their tradeoffs):
- gRPC: Some initial work and discussion on HBASE-13467 (Prototype using GRPC as IPC mechanism)
-- Has most or all of the desired scaffolding
-- Adds additional g* dependencies. Compat story for g* dependencies not always ideal
- Custom HTTP/2 based client/server APIs
-- More control over compat story
-- Non-trivial to build scaffolding; might reinvent wheels along the way
- Others?

Related Jiras that might be rolled in as sub-tasks (or closed/replaced with new ones):
HBASE-17708 (Expose config to set two-way auth over TLS in HttpServer and add a test)
HBASE-8691 (High-Throughput Streaming Scan API)
HBASE-14899 (Create custom Streaming ReplicationEndpoint)
[jira] [Resolved] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure
     [ https://issues.apache.org/jira/browse/HBASE-17720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ben Lau resolved HBASE-17720.
-----------------------------
    Resolution: Duplicate

> Possible bug in FlushSnapshotSubprocedure
> -----------------------------------------
>
>                 Key: HBASE-17720
>                 URL: https://issues.apache.org/jira/browse/HBASE-17720
>             Project: HBase
>          Issue Type: Bug
>          Components: dataloss, snapshots
>            Reporter: Ben Lau
>
> I noticed that FlushSnapshotSubprocedure differs from MemStoreFlusher in that
> it does not appear to explicitly handle a DroppedSnapshotException. In the
> primary codepath when flushing memstores (see MemStoreFlusher.flushRegion()),
> there is a try/catch for DroppedSnapshotException that will abort the
> regionserver to replay WALs to avoid data loss. I don't see this in
> FlushSnapshotSubprocedure. Is this an accidental omission or is there a
> reason this isn't present?
> I'm not too familiar with procedure V1 or V2. I assume that if a participant
> dies, all other participants will terminate any outstanding operations for
> the procedure? If so, and if this lack of RS.abort() for
> DroppedSnapshotException is a bug, then it can't be fixed naively; otherwise,
> I assume a failed flush on one region server could cause a cascade of RS
> aborts on the cluster.
[jira] [Created] (HBASE-17720) Possible bug in FlushSnapshotSubprocedure
Ben Lau created HBASE-17720:
-------------------------------

             Summary: Possible bug in FlushSnapshotSubprocedure
                 Key: HBASE-17720
                 URL: https://issues.apache.org/jira/browse/HBASE-17720
             Project: HBase
          Issue Type: Bug
          Components: dataloss, snapshots
            Reporter: Ben Lau

I noticed that FlushSnapshotSubprocedure differs from MemStoreFlusher in that it does not appear to explicitly handle a DroppedSnapshotException. In the primary codepath when flushing memstores (see MemStoreFlusher.flushRegion()), there is a try/catch for DroppedSnapshotException that will abort the regionserver to replay WALs to avoid data loss. I don't see this in FlushSnapshotSubprocedure. Is this an accidental omission or is there a reason this isn't present?

I'm not too familiar with procedure V1 or V2. I assume that if a participant dies, all other participants will terminate any outstanding operations for the procedure? If so, and if this lack of RS.abort() for DroppedSnapshotException is a bug, then it can't be fixed naively; otherwise, I assume a failed flush on one region server could cause a cascade of RS aborts on the cluster.
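The try/catch pattern the report describes can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in MemStoreFlusher or FlushSnapshotSubprocedure: DroppedSnapshotException is redefined locally as a stand-in so the example compiles on its own, and the Abortable and FlushAction interfaces, the GuardedFlush class, and the abort message are all hypothetical.

```java
import java.io.IOException;

/** Local stand-in for HBase's DroppedSnapshotException, so the sketch compiles alone. */
class DroppedSnapshotException extends IOException {
    DroppedSnapshotException(String msg) { super(msg); }
}

/** Hypothetical abort hook, loosely modelled on a region server's abort(). */
interface Abortable {
    void abort(String why, Throwable cause);
}

/** Hypothetical unit of flush work that may fail with an IOException. */
interface FlushAction {
    void flush() throws IOException;
}

class GuardedFlush {
    /**
     * Runs the flush. If the memstore snapshot was dropped, asks the server to
     * abort so that the WAL is replayed on recovery instead of losing edits.
     * Other IOExceptions are left to propagate to the caller.
     *
     * @return true if the flush completed, false if an abort was requested
     */
    static boolean flushOrAbort(FlushAction action, Abortable server) throws IOException {
        try {
            action.flush();
            return true;
        } catch (DroppedSnapshotException e) {
            // Assumed message text; the point is only that abort is requested here.
            server.abort("Dropped snapshot during flush; WAL replay required", e);
            return false;
        }
    }
}
```

The cascade concern raised in the report is about what happens after this point: if the abort of one participant causes the other participants' flushes to fail in the same way, each of them would take the same branch.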
[jira] [Resolved] (HBASE-17714) Client heartbeats seems to be broken
     [ https://issues.apache.org/jira/browse/HBASE-17714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Samarth Jain resolved HBASE-17714.
----------------------------------
    Resolution: Not A Bug

> Client heartbeats seems to be broken
> ------------------------------------
>
>                 Key: HBASE-17714
>                 URL: https://issues.apache.org/jira/browse/HBASE-17714
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Samarth Jain
>
> We have a test in Phoenix where we introduce an artificial sleep of twice
> the RPC timeout in the preScannerNext() hook of a coprocessor.
> {code}
> public static class SleepingRegionObserver extends SimpleRegionObserver {
>     public SleepingRegionObserver() {}
>
>     @Override
>     public boolean preScannerNext(final ObserverContext<RegionCoprocessorEnvironment> c,
>             final InternalScanner s, final List<Result> results,
>             final int limit, final boolean hasMore) throws IOException {
>         try {
>             // Stall only scans against the table under test, and only when enabled.
>             if (SLEEP_NOW &&
>                     c.getEnvironment().getRegion().getRegionInfo().getTable().getNameAsString().equals(TABLE_NAME)) {
>                 // Sleep past the RPC timeout to simulate a slow scanner call.
>                 Thread.sleep(RPC_TIMEOUT * 2);
>             }
>         } catch (InterruptedException e) {
>             throw new IOException(e);
>         }
>         return super.preScannerNext(c, s, results, limit, hasMore);
>     }
> }
> {code}
> This test was passing fine till 1.1.3 but started failing sometime before
> 1.1.9 with an OutOfOrderScannerException. See PHOENIX-3702. [~lhofhansl]
> mentioned that we have client heartbeats enabled and that should prevent us
> from running into issues like this. FYI, this test fails with HBase 1.2.3
> too.
> CC [~apurtell], [~jamestaylor]
[jira] [Created] (HBASE-17719) Pre-Emptive Fast Fail does not apply to scanners
James Moore created HBASE-17719:
-----------------------------------

             Summary: Pre-Emptive Fast Fail does not apply to scanners
                 Key: HBASE-17719
                 URL: https://issues.apache.org/jira/browse/HBASE-17719
             Project: HBase
          Issue Type: Bug
          Components: Client
    Affects Versions: 1.2.0
            Reporter: James Moore
            Assignee: James Moore

Testing on CDH 5.9.0 revealed that scanners do not use pre-emptive fast fail.
Successful: HBase Generate Website
Build status: Successful

If successful, the website and docs have been generated. To update the live site, follow the instructions below. If failed, skip to the bottom of this email.

Use the following commands to download the patch and apply it to a clean branch based on origin/asf-site. If you prefer to keep the hbase-site repo around permanently, you can skip the clone step.

  git clone https://git-wip-us.apache.org/repos/asf/hbase-site.git
  cd hbase-site
  wget -O- https://builds.apache.org/job/hbase_generate_website/504/artifact/website.patch.zip | funzip > 697a55a8782d940aa4f1287c2ef4a45ba516cac1.patch
  git fetch
  git checkout -b asf-site-697a55a8782d940aa4f1287c2ef4a45ba516cac1 origin/asf-site
  git am --whitespace=fix 697a55a8782d940aa4f1287c2ef4a45ba516cac1.patch

At this point, you can preview the changes by opening index.html or any of the other HTML pages in your local asf-site-697a55a8782d940aa4f1287c2ef4a45ba516cac1 branch.

There are lots of spurious changes, such as timestamps and CSS styles in tables, so a generic git diff is not very useful. To see a list of files that have been added, deleted, renamed, changed type, or are otherwise interesting, use the following command:

  git diff --name-status --diff-filter=ADCRTXUB origin/asf-site

To see only files that had 100 or more lines changed:

  git diff --stat origin/asf-site | grep -E '[1-9][0-9]{2,}'

When you are satisfied, publish your changes to origin/asf-site using these commands:

  git commit --allow-empty -m "Empty commit" # to work around a current ASF INFRA bug
  git push origin asf-site-697a55a8782d940aa4f1287c2ef4a45ba516cac1:asf-site
  git checkout asf-site
  git branch -D asf-site-697a55a8782d940aa4f1287c2ef4a45ba516cac1

Changes take a couple of minutes to be propagated. You can verify whether they have been propagated by looking at the Last Published date at the bottom of http://hbase.apache.org/. It should match the date in the index.html on the asf-site branch in Git.

As a courtesy, reply-all to this email to let other committers know you pushed the site.

If failed, see https://builds.apache.org/job/hbase_generate_website/504/console
[jira] [Reopened] (HBASE-17595) Add partial result support for small/limited scan
     [ https://issues.apache.org/jira/browse/HBASE-17595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Duo Zhang reopened HBASE-17595:
-------------------------------

hasMoreCellsInRow is not stable, as KeyValueHeap.peek does not use the filter.

> Add partial result support for small/limited scan
> -------------------------------------------------
>
>                 Key: HBASE-17595
>                 URL: https://issues.apache.org/jira/browse/HBASE-17595
>             Project: HBase
>          Issue Type: Sub-task
>          Components: asyncclient, Client, scan
>    Affects Versions: 2.0.0, 1.4.0
>            Reporter: Duo Zhang
>            Assignee: Duo Zhang
>             Fix For: 2.0.0, 1.4.0
>
>         Attachments: HBASE-17595-branch-1.patch, HBASE-17595.patch,
> HBASE-17595-v1.patch
>
>
> The partial result support is marked as a 'TODO' when implementing
> HBASE-17045. And when implementing HBASE-17508, we found that if we make
> small scan share the same logic with general scan, the scan requests other
> than open scanner will not have the small flag, so the server may return a
> partial result to the client and cause some strange behavior. It is solved by
> modifying the logic at server side, but this means the 1.4.x client is not
> safe to use against an earlier 1.x server. So we'd better address the problem
> at client side. Marked as blocker as this issue should be finished before any
> 2.x and 1.4.x releases.
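For readers unfamiliar with partial results, the client-side problem can be sketched as follows: the server may return a row in several pieces, and the client has to stitch them back together before handing a row to the caller. This is a simplified illustration, not the logic in the attached patches. Chunk and RowAssembler are hypothetical types (not the HBase Result class), and the sketch detects the end of a row by a change of row key rather than by a "more cells in row" flag, which is only one possible design.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for one piece of a scan response; not the HBase Result class. */
class Chunk {
    final String row;
    final List<String> cells;

    Chunk(String row, List<String> cells) {
        this.row = row;
        this.cells = cells;
    }
}

/** Stitches consecutive chunks that share a row key into complete rows. */
class RowAssembler {
    private String currentRow;
    private List<String> currentCells = new ArrayList<>();
    private final List<List<String>> completed = new ArrayList<>();

    /** Adds the next chunk in scan order; a new row key closes the previous row. */
    void add(Chunk c) {
        if (currentRow != null && !currentRow.equals(c.row)) {
            flush();
        }
        currentRow = c.row;
        currentCells.addAll(c.cells);
    }

    /** Closes the row in progress, if any. Call once when the scan ends. */
    void flush() {
        if (currentRow != null) {
            completed.add(currentCells);
            currentCells = new ArrayList<>();
            currentRow = null;
        }
    }

    /** Returns the rows completed so far, each as its full list of cells. */
    List<List<String>> rows() {
        return completed;
    }
}
```

The trade-off of keying on the row change is that the last row is only known to be complete once the next row arrives or the scan ends, so a limited scan may need to read one chunk further than the limit suggests.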