[jira] [Commented] (HBASE-26553) OAuth Bearer authentication mech plugin for SASL

2023-12-15 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17797184#comment-17797184
 ] 

ramkrishna.s.vasudevan commented on HBASE-26553:


[~andor] - Any idea when this feature is going to land?

> OAuth Bearer authentication mech plugin for SASL
> 
>
> Key: HBASE-26553
> URL: https://issues.apache.org/jira/browse/HBASE-26553
> Project: HBase
>  Issue Type: New Feature
>  Components: security
>Reporter: Andor Molnar
>Assignee: Andor Molnar
>Priority: Major
> Fix For: HBASE-26553
>
>
> Implementation of a new SASL plugin to add support for OAuth Bearer token 
> authentication for HBase client RPC.
>  * The plugin supports secured (cryptographically signed) JSON Web Token 
> authentication as defined in 
> [RFC-7628|https://datatracker.ietf.org/doc/html/rfc7628] and the JWT format 
> in [RFC-7519|https://datatracker.ietf.org/doc/html/rfc7519].
>  * The implementation is inspired by [Apache Kafka's OAuth Bearer 
> token|https://docs.confluent.io/platform/current/kafka/authentication_sasl/authentication_sasl_oauth.html]
>  support, with the important difference that the HBase version is intended 
> for production usage. The two main differences are that Kafka supports 
> unsecured tokens only, and that it issues the tokens for itself, which 
> breaks the principle of OAuth token authentication.
>  * We use the [Nimbus JOSE + 
> JWT|https://bitbucket.org/connect2id/nimbus-jose-jwt/src/master/] Java 
> library for signature verification and token processing, and we add it as a 
> new dependency to HBase.
>  * We add secure JWT support and verification of digital signatures with 
> multiple algorithms, as supported by Nimbus. A JSON-formatted JWK set is 
> required for the signature verification, as defined in 
> [RFC-7517|https://datatracker.ietf.org/doc/html/rfc7517].
>  * The implementation is verified with Apache Knox issued tokens, because 
> that's the primary use case of this new feature.
>  * A new client example is added to the hbase-examples project to showcase 
> the feature.
>  * Note that this Jira does not cover a solution for obtaining a token from 
> Knox. The assumption is that the client already has a valid token as a 
> base64-encoded string, and we only provide a helper method for adding it to 
> the user's credentials.
>  * Renewing expired tokens is also the responsibility of the client. We 
> don't provide a mechanism for that in this Jira, but it's planned to be 
> covered in a follow-up ticket.
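The helper method for adding the token to the user's credentials is not spelled out in this Jira. As a rough sketch of the client-side idea only (the class name, method name and credential key below are hypothetical, not the actual HBase API):

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: models the described flow where the client
// already holds a valid base64-encoded JWT and merely registers it in its
// credentials before opening a connection. Names below are invented for
// illustration and are not the real HBase helper API.
public class OAuthTokenAttachSketch {
  // Hypothetical credential key; the real provider may use a different one.
  static final String OAUTH_TOKEN_KEY = "hbase.oauth.bearer.token";

  static Map<String, String> attachToken(Map<String, String> credentials,
                                         String base64Jwt) {
    if (base64Jwt == null || base64Jwt.isEmpty()) {
      throw new IllegalArgumentException("token must be a non-empty base64 string");
    }
    credentials.put(OAUTH_TOKEN_KEY, base64Jwt);
    return credentials;
  }

  public static void main(String[] args) {
    Map<String, String> creds =
        attachToken(new HashMap<>(), "eyJhbGciOiJSUzI1NiJ9.e30.sig");
    System.out.println(creds.containsKey(OAUTH_TOKEN_KEY)); // prints "true"
  }
}
```

Consistent with the last bullet, renewing an expired token would remain the client's job.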
> The following new parameters are introduced in hbase-site.xml:
>  * hbase.security.oauth.jwt.jwks.file - Path of a local file for JWK set. 
> (required if URL not specified)
>  * hbase.security.oauth.jwt.jwks.url - URL to download the JWK set. (required 
> if File not specified)
>  * hbase.security.oauth.jwt.audience - Required audience, "aud" claim of the 
> JWT. (optional)
>  * hbase.security.oauth.jwt.issuer - Required issuer, "iss" claim of the JWT. 
> (optional)
> The feature will be behind a feature flag. No code path is executed unless 
> the following configuration is set in hbase-site.xml:
> {noformat}
> <property>
>   <name>hbase.client.sasl.provider.extras</name>
>   <value>org.apache.hadoop.hbase.security.provider.OAuthBearerSaslClientAuthenticationProvider</value>
> </property>
> <property>
>   <name>hbase.server.sasl.provider.extras</name>
>   <value>org.apache.hadoop.hbase.security.provider.OAuthBearerSaslServerAuthenticationProvider</value>
> </property>
> <property>
>   <name>hbase.client.sasl.provider.class</name>
>   <value>org.apache.hadoop.hbase.security.provider.OAuthBearerSaslProviderSelector</value>
> </property>
> {noformat}
> Example of Knox provided JWKS file:
> {noformat}
> {
>   "keys":
>   [{
> "kty": "RSA",
> "e": "",
> "use": "sig",
> "kid": "",
> "alg": "RS256",
> "n": ""
>   }]
> }
> {noformat}
> Example of Knox issued JWT header:
> {noformat}
> {
>   "jku": "https://path/to/homepage/knoxtoken/api/v1/jwks.json",
>   "kid": "",
>   "alg": "RS256"
> }
> {noformat}
> And payload:
> {noformat}
> {
>   "sub": "user_andor",
>   "aud": "knox-proxy-token",
>   "jku": "https://path/to/homepage/knoxtoken/api/v1/jwks.json",
>   "kid": "",
>   "iss": "KNOXSSO",
>   "exp": 1636644029,
>   "managed.token": "true",
>   "knox.id": ""
> }
> {noformat}
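For readers unfamiliar with the format, the header and payload above are the first two base64url-encoded segments of the token. A minimal JDK-only sketch (no Nimbus, and deliberately no signature verification — in the feature itself Nimbus verifies the signature against the configured JWK set) that decodes the payload segment:

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// A JWT is three base64url segments separated by dots:
// header.payload.signature. This sketch decodes the payload segment so
// claims like "iss", "aud" and "exp" shown above can be inspected.
public class JwtPayloadSketch {
  static String decodePayload(String jwt) {
    String[] parts = jwt.split("\\.");
    if (parts.length < 2) {
      throw new IllegalArgumentException("expected header.payload.signature");
    }
    byte[] raw = Base64.getUrlDecoder().decode(parts[1]); // second segment
    return new String(raw, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // Build a token carrying two of the sample claims from the description.
    String payload = "{\"iss\":\"KNOXSSO\",\"aud\":\"knox-proxy-token\"}";
    String token = "hdr." + Base64.getUrlEncoder().withoutPadding()
        .encodeToString(payload.getBytes(StandardCharsets.UTF_8)) + ".sig";
    System.out.println(decodePayload(token)); // prints the JSON payload
  }
}
```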



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (HBASE-26981) The CPU usage of the regionserver node where the meta table is located is too high

2022-04-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-26981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17528870#comment-17528870
 ] 

ramkrishna.s.vasudevan commented on HBASE-26981:


Thanks for the analysis [~zhengsicheng]

> The CPU usage of the regionserver node where the meta table is located is too 
> high
> --
>
> Key: HBASE-26981
> URL: https://issues.apache.org/jira/browse/HBASE-26981
> Project: HBase
>  Issue Type: Bug
>  Components: hbase-connectors
>Affects Versions: 2.3.4
>Reporter: zhengsicheng
>Priority: Major
> Attachments: image-2022-04-27-20-22-17-089.png, 
> image-2022-04-27-20-24-33-252.png, image-2022-04-27-20-45-09-227.png, jstack
>
>
> When the read and write pressure is high, the CPU usage of the node hosting 
> the meta table is too high.
> !image-2022-04-27-20-22-17-089.png!
> !image-2022-04-27-20-24-33-252.png!



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (HBASE-25898) RS getting aborted due to NPE in Replication WALEntryStream

2021-05-19 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17347414#comment-17347414
 ] 

ramkrishna.s.vasudevan commented on HBASE-25898:


Good one.

> RS getting aborted due to NPE in Replication WALEntryStream
> ---
>
> Key: HBASE-25898
> URL: https://issues.apache.org/jira/browse/HBASE-25898
> Project: HBase
>  Issue Type: Bug
>  Components: Replication
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Critical
>
> The below sequence of events happened in a customer cluster:
> An empty WAL file got a roll request.
> The close of the file failed on the HDFS side, but as the file had all its 
> edits synced, we continued.
> A new WAL file was created and the old one rolled.
> This old WAL file got archived to oldWALs.
> {code}
> 2021-05-13 13:38:46.000   Riding over failed WAL close of 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678,
>  cause="Unexpected EOF while trying to read response from server", errors=1; 
> THIS FILE WAS NOT CLOSED BUT ALL EDITS SYNCED SO SHOULD BE OK
> 2021-05-13 13:38:46.000   Rolled WAL 
> /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678 
> with entries=0, filesize=90 B; new WAL 
> /xx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620913126549
> 2021-05-13 13:38:46.000Archiving 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
>  to hdfs://xxx/oldWALs/xxxt%2C16020%2C1620828102351.1620910673678
> 2021-05-13 13:38:46.000   Log 
> hdfs://xxx/WALs/xxx,16020,1620828102351/xxx%2C16020%2C1620828102351.1620910673678
>  was moved to hdfs://xxx/oldWALs/xxx%2C16020%2C1620828102351.1620910673678
> {code}
> As the file was moved, the WALEntryStream got an IOException and we 
> recreate the stream.
> {code}
> ReplicationSourceWALReader#run
> while (isReaderRunning()) {
>   try {
> entryStream =
>   new WALEntryStream(logQueue, conf, currentPosition, 
> source.getWALFileLengthProvider(),
> source.getServerWALsBelongTo(), source.getSourceMetrics(), 
> walGroupId);
> while (isReaderRunning()) { 
> ...
> ...
> } catch (IOException e) { // stream related
> if (handleEofException(e, batch)) {
>   sleepMultiplier = 1;
> } else {
>   LOG.warn("Failed to read stream of replication entries", e);
>   if (sleepMultiplier < maxRetriesMultiplier) {
> sleepMultiplier++;
>   }
>   Threads.sleep(sleepForRetries * sleepMultiplier);
> }
> }
> {code}
> eofAutoRecovery is turned off anyway, so it will go to the outer while loop 
> and create a new WALEntryStream object.
> Then we do readWALEntries:
> {code}
> protected WALEntryBatch readWALEntries(WALEntryStream entryStream,
>   WALEntryBatch batch) throws IOException, InterruptedException {
> Path currentPath = entryStream.getCurrentPath();
> if (!entryStream.hasNext()) {
> {code}
> Here currentPath will still be null.
> WALEntryStream#hasNext -> tryAdvanceEntry -> checkReader -> openNextLog
> {code}
> private boolean openNextLog() throws IOException {
> PriorityBlockingQueue queue = logQueue.getQueue(walGroupId);
> Path nextPath = queue.peek();
> if (nextPath != null) {
>   openReader(nextPath);
> 
> private void openReader(Path path) throws IOException {
> try {
>   // Detect if this is a new file, if so get a new reader else
>   // reset the current reader so that we see the new data
>   if (reader == null || !getCurrentPath().equals(path)) {
> closeReader();
> reader = WALFactory.createReader(fs, path, conf);
> seek();
> setCurrentPath(path);
>   } else {
> resetReader();
>   }
> } catch (FileNotFoundException fnfe) {
>   handleFileNotFound(path, fnfe);
> }  catch (RemoteException re) {
>   IOException ioe = re.unwrapRemoteException(FileNotFoundException.class);
>   if (!(ioe instanceof FileNotFoundException)) {
> throw ioe;
>   }
>   handleFileNotFound(path, (FileNotFoundException)ioe);
> } catch (LeaseNotRecoveredException lnre) {
>   // HBASE-15019 the WAL was not closed due to some hiccup.
>   LOG.warn("Try to recover the WAL lease " + currentPath, lnre);
>   recoverLease(conf, currentPath);
>   reader = null;
> } catch (NullPointerException npe) {
>   // Workaround for race condition in HDFS-4380
>   // which throws a NPE if we open a file before any data node has the 
> most recent block
>   // Just sleep and retry. Will require re-reading compressed WALs for 
> compressionContext.
> LOG.warn("Got NPE opening reader, will retry.");
> {code}
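The failure mode described above — currentPath still null after the stream is recreated, so a check written as `getCurrentPath().equals(path)` throws NPE — can be sketched together with a null-safe guard. This is one plausible guard for illustration, not necessarily the committed patch:

```java
import java.util.Objects;

// Sketch of the bug: after the WALEntryStream is recreated, currentPath is
// still null, so "!getCurrentPath().equals(path)" throws NPE. A null-safe
// comparison makes the reader simply treat the null state as "open a new
// file" instead of aborting the RS.
public class NullSafePathCheckSketch {
  static String currentPath = null; // state right after stream recreation

  // True when a brand-new reader must be opened for 'path'.
  static boolean needsNewReader(String path) {
    return !Objects.equals(currentPath, path); // never NPEs on null state
  }

  public static void main(String[] args) {
    System.out.println(needsNewReader("wal.1620913126549")); // prints "true"
    currentPath = "wal.1620913126549";
    System.out.println(needsNewReader("wal.1620913126549")); // prints "false"
  }
}
```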

[jira] [Commented] (HBASE-25698) Persistent IllegalReferenceCountException at scanner open

2021-03-26 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25698?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17309551#comment-17309551
 ] 

ramkrishna.s.vasudevan commented on HBASE-25698:


This might be in the encoder flow, where we might be missing something. Is 
there any other exception before this stack trace?

> Persistent IllegalReferenceCountException at scanner open
> -
>
> Key: HBASE-25698
> URL: https://issues.apache.org/jira/browse/HBASE-25698
> Project: HBase
>  Issue Type: Bug
>  Components: HFile, Scanners
>Affects Versions: 2.4.2
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0, 2.4.3
>
>
> Persistent scanner open failure. Not sure how it happened. Test scenario was 
> HBase 1 cluster replicating to HBase 2 cluster. ITBLL as data generator at 
> source, calm policy only. Scanner open errors on sink HBase 2 cluster later 
> during ITBLL verify phase. Sink schema settings bloom=ROW encoding=FAST_DIFF 
> compression=NONE.
> {noformat}
> Caused by: 
> org.apache.hbase.thirdparty.io.netty.util.IllegalReferenceCountException: 
> refCnt: 0, decrement: 1
> at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ReferenceCountUpdater.toLiveRealRefCnt(ReferenceCountUpdater.java:74)
> at 
> org.apache.hbase.thirdparty.io.netty.util.internal.ReferenceCountUpdater.release(ReferenceCountUpdater.java:138)
> at 
> org.apache.hbase.thirdparty.io.netty.util.AbstractReferenceCounted.release(AbstractReferenceCounted.java:76)
> at org.apache.hadoop.hbase.nio.ByteBuff.release(ByteBuff.java:79)
> at 
> org.apache.hadoop.hbase.io.hfile.HFileBlock.release(HFileBlock.java:429)
> at 
> org.apache.hadoop.hbase.io.hfile.CompoundBloomFilter.contains(CompoundBloomFilter.java:109)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileReader.checkGeneralBloomFilter(StoreFileReader.java:433)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileReader.passesGeneralRowBloomFilter(StoreFileReader.java:322)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileReader.passesBloomFilter(StoreFileReader.java:251)
> at 
> org.apache.hadoop.hbase.regionserver.StoreFileScanner.shouldUseScanner(StoreFileScanner.java:491)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.selectScannersFrom(StoreScanner.java:471)
> at 
> org.apache.hadoop.hbase.regionserver.StoreScanner.(StoreScanner.java:249)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.createScanner(HStore.java:2177)
> at 
> org.apache.hadoop.hbase.regionserver.HStore.getScanner(HStore.java:2168)
> at 
> org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.initializeScanners(HRegion.java:7172)
> {noformat}
> Bloom filter type on all files here is ROW, block encoding is FAST_DIFF:
> {noformat}
> hbase:017:0> describe "IntegrationTestBigLinkedList"
> Table IntegrationTestBigLinkedList is ENABLED
> IntegrationTestBigLinkedList
> COLUMN FAMILIES DESCRIPTION
> {NAME => 'big', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', 
> COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
> {NAME => 'meta', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', 
> COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
> {NAME => 'tiny', BLOOMFILTER => 'ROW', IN_MEMORY => 'false', VERSIONS => '1', 
> KEEP_DELETED_CELLS => 'FALSE', DATA_BLOCK_ENCODING => 'FAST_DIFF', 
> COMPRESSION => 'NONE', TTL => 'FOREVER', MIN_VERSIONS => '0', 
> BLOCKCACHE => 'true', BLOCKSIZE => '65536', REPLICATION_SCOPE => '1'}
> {noformat}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (HBASE-24850) CellComparator perf improvement

2020-12-28 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-24850.

Fix Version/s: 2.4.1
   2.3.4
   2.2.7
 Hadoop Flags: Reviewed
   Resolution: Fixed

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.2.7, 2.3.4, 2.5.0, 2.4.1
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754 and HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is 
> to address and optimize compares generally in CellComparatorImpl itself.
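The optimization direction discussed in the comments on this issue — replacing one large compare body with small per-component methods that each fit the JIT's inlining thresholds — can be sketched as follows. This is illustrative plain Java over byte[], not the actual CellComparatorImpl code:

```java
import java.util.Arrays;

// Illustrative only: break one large compare body into small per-component
// methods (row, family, qualifier) so each stays under the JIT's inlining
// thresholds and hot call sites can be inlined individually by runtime
// profiling.
public class SmallCompareSketch {
  static int compareRows(byte[] a, byte[] b)       { return Arrays.compare(a, b); }
  static int compareFamilies(byte[] a, byte[] b)   { return Arrays.compare(a, b); }
  static int compareQualifiers(byte[] a, byte[] b) { return Arrays.compare(a, b); }

  // The top-level method stays tiny: it only dispatches to the small pieces.
  static int compare(byte[][] left, byte[][] right) {
    int d = compareRows(left[0], right[0]);
    if (d != 0) return d;
    d = compareFamilies(left[1], right[1]);
    if (d != 0) return d;
    return compareQualifiers(left[2], right[2]);
  }

  public static void main(String[] args) {
    byte[][] a = { {1}, {2}, {3} };
    byte[][] b = { {1}, {2}, {4} };
    System.out.println(compare(a, b) < 0); // prints "true": differs in qualifier
  }
}
```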



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-28 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255648#comment-17255648
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


Pushed to master, branch-2, branch-2.2, branch-2.3 and branch-2.4.

Thanks to everyone who helped with testing and reviews of this patch. I 
tested the final patch with a single region server and a single-thread full 
scan with 30 cols each; I can see a 14% improvement.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754 and HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is 
> to address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25424) Find a way to config OpenTelemetry tracing without directly depending on opentelemetry-sdk

2020-12-23 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254010#comment-17254010
 ] 

ramkrishna.s.vasudevan commented on HBASE-25424:


Got it. Fine with me. 

> Find a way to config OpenTelemetry tracing without directly depending on 
> opentelemetry-sdk
> --
>
> Key: HBASE-25424
> URL: https://issues.apache.org/jira/browse/HBASE-25424
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies, tracing
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> According to the document of OpenTelemetry, for all the modules which could 
> be depended by downstream users, we should only depend on opentelemetry-api.
> But the open telemetry propagator must be initialized programmatically, so we 
> need to have a module to implement the code and introduce dependency on 
> opentelemetry-sdk, and we need to call it before doing anything when starting 
> master, regionserver, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25424) Find a way to config OpenTelemetry tracing without directly depending on opentelemetry-sdk

2020-12-23 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253990#comment-17253990
 ] 

ramkrishna.s.vasudevan commented on HBASE-25424:


bq. And on enable/disable trace on the fly, theoretically we could do this, as 
the OpenTelemetrySdk has a method to update trace config, and we could also 
reset the global OpenTelemetrySdk at runtime.

Right, this is what we did in our POC: we just exposed a shell command which 
goes and enables tracing on a given RS by changing the global tracer at 
runtime. However, in the POC we did not trace from client to server, just the 
server-side path.

bq.[https://github.com/open-telemetry/opentelemetry-java-instrumentation]

I see, I had not come across this one. Might our own module be better until 
this opentelemetry and its allied projects become GA?

> Find a way to config OpenTelemetry tracing without directly depending on 
> opentelemetry-sdk
> --
>
> Key: HBASE-25424
> URL: https://issues.apache.org/jira/browse/HBASE-25424
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies, tracing
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> According to the document of OpenTelemetry, for all the modules which could 
> be depended by downstream users, we should only depend on opentelemetry-api.
> But the open telemetry propagator must be initialized programmatically, so we 
> need to have a module to implement the code and introduce dependency on 
> opentelemetry-sdk, and we need to call it before doing anything when starting 
> master, regionserver, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25424) Find a way to config OpenTelemetry tracing without directly depending on opentelemetry-sdk

2020-12-22 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253897#comment-17253897
 ] 

ramkrishna.s.vasudevan commented on HBASE-25424:


I also believe that we should provide a way to plug in the type of exporters 
we need.

> Find a way to config OpenTelemetry tracing without directly depending on 
> opentelemetry-sdk
> --
>
> Key: HBASE-25424
> URL: https://issues.apache.org/jira/browse/HBASE-25424
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies, tracing
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> According to the document of OpenTelemetry, for all the modules which could 
> be depended by downstream users, we should only depend on opentelemetry-api.
> But the open telemetry propagator must be initialized programmatically, so we 
> need to have a module to implement the code and introduce dependency on 
> opentelemetry-sdk, and we need to call it before doing anything when starting 
> master, regionserver, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25424) Find a way to config OpenTelemetry tracing without directly depending on opentelemetry-sdk

2020-12-22 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253896#comment-17253896
 ] 

ramkrishna.s.vasudevan commented on HBASE-25424:


[~zhangduo]

One quick question on this: if we attach OpenTelemetry-based tracing to the 
region server on startup (or by some other means) and we do the tracing, how 
do we disable the tracing dynamically? Or, say, I need to enable it for a 
specified time and then disable it again?

Should we have a script which somehow replaces the tracing instance with 
either a NoOp tracer or the one that really collects the data points?
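The toggle being asked about can be sketched in plain Java (this is an illustration of the idea only, not the OpenTelemetry API): the server reads its tracer through an AtomicReference, and an admin command swaps a no-op implementation for a recording one at runtime, and back again, without a restart.

```java
import java.util.concurrent.atomic.AtomicReference;

// Plain-Java illustration of runtime enable/disable of tracing. The Tracer
// interface and both implementations here are stand-ins, not OpenTelemetry
// types.
public class TracerToggleSketch {
  interface Tracer { boolean record(String span); }

  static final Tracer NOOP = span -> false;      // drops every span
  static final Tracer RECORDING = span -> true;  // stands in for a real tracer

  static final AtomicReference<Tracer> TRACER = new AtomicReference<>(NOOP);

  static boolean trace(String span) { return TRACER.get().record(span); }

  public static void main(String[] args) {
    System.out.println(trace("scan")); // prints "false": disabled by default
    TRACER.set(RECORDING);             // "enable tracing" admin command
    System.out.println(trace("scan")); // prints "true": spans now collected
    TRACER.set(NOOP);                  // disable again after the window
  }
}
```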

> Find a way to config OpenTelemetry tracing without directly depending on 
> opentelemetry-sdk
> --
>
> Key: HBASE-25424
> URL: https://issues.apache.org/jira/browse/HBASE-25424
> Project: HBase
>  Issue Type: Sub-task
>  Components: dependencies, tracing
>Reporter: Duo Zhang
>Assignee: Duo Zhang
>Priority: Major
>
> According to the document of OpenTelemetry, for all the modules which could 
> be depended by downstream users, we should only depend on opentelemetry-api.
> But the open telemetry propagator must be initialized programmatically, so we 
> need to have a module to implement the code and introduce dependency on 
> opentelemetry-sdk, and we need to call it before doing anything when starting 
> master, regionserver, and so on.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-22 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17253349#comment-17253349
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


[~zghao], [~ndimiduk], [~apurtell]

We have approval for this PR. Do you want it in the branch-2.2, branch-2.3 
and master branches?

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754 and HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is 
> to address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-17 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17251506#comment-17251506
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


In my stand-alone tests I did not find much of a difference.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754 and HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is 
> to address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-17 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250987#comment-17250987
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


bq. I think the following method is too big to be inlined. Can we make it 
smaller to fit the default 325 bytes?

This does not happen any more, as we have split the methods individually.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754 and HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is 
> to address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-12-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250873#comment-17250873
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


[~huaxiangsun]

I did the inlining in HBASE-24850 for the compare() method. That, I believe, 
is not the only reason for this issue, because the branching seems to be the 
cause here. When we don't have branching, the inlining is much better. But we 
can give it a try.

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test input:
> 1: Table with 500 regions (300 column families)
> 2: Data = 2 TB
> Data sample:
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 masters + 5 region servers)
>  4: The number of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, where the test input is the same 
> for both clusters.
>  
> ||Feature||HBase 2.2.3 Time (sec)||HBase 1.3.1 Time (sec)||Diff %||Snappy lib||
> |BulkLoad|21837|19686.16|-10.93|HBase 2.2.3: 1.4; HBase 1.3.1: 1.4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250810#comment-17250810
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


So with this patch we are inlining the inner method calls that happen on the 
byte[] or BB. But yes, the parent method becomes too big. Let's see if we can 
honour that too, so that compare() as such is not so big.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754 and HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is 
> to address and optimize compares generally in CellComparatorImpl itself.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17250809#comment-17250809
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


bq.If there is not a single "happy path" then you want separate comparator 
methods for each case, each small enough to be inlined, and let the JIT figure 
out by runtime profiling which comparator method call can be predicted and 
inlined.

Thanks [~apurtell]. Let me see if we can reduce that too. 

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250806#comment-17250806
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


bq.I left comments on the JIRA... suggesting no need of the marker interface 
since Cell impls and comparator are all package private Thanks.

Am on it. Will update the PR accordingly.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-12-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250798#comment-17250798
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


[~stack] and [~bharathv]

Thanks for chiming in. I believe, at least in the MR case, the Puts that are 
generated here (in PutSortReducer) are anyway going to be KVs only, as in the 
client-facing Put API we expose Cells that are always KVs. If we can generate a 
KV comparator code path, I agree that would be the best way. I don't have much 
experience with this kind of code generation; I can look into options for that 
and see if it can be used here.

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test input: 
> 1: Table with 500 regions (300 column families)
> 2: Data = 2 TB
> Data sample:
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 masters + 5 region servers)
>  4: The number of containers launched is the same in both cases.
> HBase 2 took 10% more time than HBase 1.3, with the same test input for both 
> clusters.
>  
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|Snappy lib: HBase 2.2.3: 1.4; HBase 1.3.1: 1.4|





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-12-15 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250107#comment-17250107
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


[~stack]

What we see from our tests is that even with HBASE-24850 the bulk load 
performance improves by 15%, but it does not match or outperform 1.x. Without 
HBASE-24850 we are around 40 to 45% slower.

The reason is that with HBASE-24850 the inlining takes effect but we still have 
branching. If you look at the PutSortReducer, it handles a single row at a time: 
the reduce() API creates a map, adds the data to it, and writes it. Even if we 
have 300 cols we do that row by row, so the inlining optimization that we did in 
HBASE-24850 does not kick in fully. By contrast, with a KVComparator where there 
is no branching and inlining happens, we are able to outperform the 1.3 
performance. 

So this suggests that we might have to have a pure KVComparator for bulk-load 
types of use cases. Thoughts?
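The KVComparator point above can be sketched as follows; the types here are stand-ins, not HBase's. A comparator specialized to one heap-backed cell type has a single monomorphic code path with no type checks, which is what lets the JIT inline and branch-predict it, unlike a generic comparator that must branch on the cell's backing store on every call.

```java
import java.util.Comparator;

public final class KvOnlyComparatorSketch {

    // Stand-in for a heap-backed KeyValue: only the row matters for the sketch.
    static final class Kv {
        final byte[] row;
        Kv(byte[] row) { this.row = row; }
    }

    // KV-only comparator: no instanceof checks, one monomorphic path (apart
    // from the data-dependent compare itself), so the JIT can fully inline it.
    static final Comparator<Kv> KV_ONLY = (a, b) -> {
        int n = Math.min(a.row.length, b.row.length);
        for (int i = 0; i < n; i++) {
            int diff = (a.row[i] & 0xff) - (b.row[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return a.row.length - b.row.length;
    };
}
```

In a bulk-load path where every cell is known to be a heap-backed KV, this is the shape of comparator the comment argues for: the generic cell-type dispatch is hoisted out of the hot loop entirely.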

 

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test input: 
> 1: Table with 500 regions (300 column families)
> 2: Data = 2 TB
> Data sample:
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 masters + 5 region servers)
>  4: The number of containers launched is the same in both cases.
> HBase 2 took 10% more time than HBase 1.3, with the same test input for both 
> clusters.
>  
> |Feature|HBase 2.2.3 Time(Sec)|HBase 1.3.1 Time(Sec)|Diff%|Snappy lib|
> |BulkLoad|21837|19686.16|-10.93|Snappy lib: HBase 2.2.3: 1.4; HBase 1.3.1: 1.4|





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-15 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250105#comment-17250105
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


bq.Lets get this JIRA landed. Next would be fixing this stack trace of 
[~ram_krish]'s from the previous issue 
https://issues.apache.org/jira/browse/HBASE-24754?focusedCommentId=17172541=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17172541
 The difference in the stack trace is extreme between hbase1 and hbase2; if all 
is inlined, then they are the same; if they are not, there is an opportunity 
for speedup collapsing the #compare.

This patch already does that. In the other Jira we just tried to see the 
impact of CellComparator vs KVComparator. The inlining has been handled by 
this patch. 

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-15 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250103#comment-17250103
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


Attached two inlining outputs generated by adding -XX:+UnlockDiagnosticVMOptions 
-XX:+PrintInlining.

op_withopt -> with optimization (with patch); flamegraph_opt.svg -> with patch

op_withnoopt -> without the patch; flamegraph.svg -> without patch

You can see that inlining does not happen without the patch. With the patch 
everything is inlined and we don't see the 'inlining too deep' case.

I also tried adding the JVM flag -XX:MaxInlineLevel=20 without the patch, but 
that does not improve things as much as the patch does (though inlining happens).
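As a concrete (hypothetical) invocation, the diagnostic output above can be captured like this; the jar and benchmark class names are placeholders, only the JVM flags come from the comment:

```shell
# Capture JIT inlining decisions while running a workload. PrintInlining is a
# diagnostic flag, so UnlockDiagnosticVMOptions must be passed first.
# "hbase-bench.jar" and "CompareBenchmark" are placeholder names.
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining \
     -cp hbase-bench.jar CompareBenchmark > op_withopt 2>&1

# Inspect rejected candidates: look for "too large" or "inlining too deep".
grep -E "too large|inlining too deep" op_withopt
```

This is how files like op_withopt/op_withnoopt in the attachments are typically produced.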

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-15 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17250104#comment-17250104
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


[~stack] - fyi.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Updated] (HBASE-24850) CellComparator perf improvement

2020-12-15 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-24850:
---
Attachment: flamegraph_opt.svg

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, flamegraph_opt.svg, op_withnoopt, 
> op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Updated] (HBASE-24850) CellComparator perf improvement

2020-12-15 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-24850:
---
Attachment: flamegraph.svg

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: flamegraph.svg, op_withnoopt, op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Updated] (HBASE-24850) CellComparator perf improvement

2020-12-15 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-24850:
---
Attachment: op_withopt
op_withnoopt

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
> Attachments: op_withnoopt, op_withopt
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249534#comment-17249534
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


bq.Can we have compare(Cell) in ExtendedCell? 

Yes, I looked into this option. What it ends up with is this: say KV 
implements compare(Cell). If the current cell (as pointed to by 'this') and the 
incoming Cell are both KVs, we can handle it in KV. But if the incoming cell is 
a BBKV, then 'this' (a KV) needs to compare with a BBKV, and we would have to 
add that logic inside KV's compare(Cell) as well. So it would be better to move 
such things to a util method; that is why I felt the redirection from compare() 
to that util method is not needed. 
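A small sketch of that design choice, with stand-in types rather than HBase's real Cell hierarchy: the cross-type logic lives in one static util method instead of inside each cell implementation's own compare(Cell).

```java
import java.nio.ByteBuffer;

public final class CompareUtilSketch {

    // Stand-ins for the two cell flavours: heap-backed and ByteBuffer-backed.
    static final class Kv {
        final byte[] row;
        Kv(byte[] row) { this.row = row; }
    }

    static final class BbKv {
        final ByteBuffer row;
        BbKv(ByteBuffer row) { this.row = row; }
    }

    // One util method holds the cross-type dispatch, so Kv does not need to
    // know about BbKv inside its own compare(Cell).
    static int compareRows(Object a, Object b) {
        if (a instanceof Kv && b instanceof Kv) {
            // Fast path: both heap-backed, compare arrays directly.
            return compareBytes(((Kv) a).row, ((Kv) b).row);
        }
        // Mixed or BBKV pair: fall back to materializing the row bytes.
        return compareBytes(rowBytes(a), rowBytes(b));
    }

    static byte[] rowBytes(Object cell) {
        if (cell instanceof Kv) {
            return ((Kv) cell).row;
        }
        ByteBuffer bb = ((BbKv) cell).row.duplicate();
        byte[] out = new byte[bb.remaining()];
        bb.get(out);
        return out;
    }

    static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xff) - (b[i] & 0xff);
            if (d != 0) {
                return d;
            }
        }
        return a.length - b.length;
    }
}
```

The fallback path here copies bytes out of the buffer for brevity; a real implementation would compare against the ByteBuffer in place, but the dispatch shape is the point.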

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249533#comment-17249533
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


Both the branching and the parsing of the cell parts are reasons. 

In 1.3.x we don't do branching (as only one cell type is available), but we are 
not parsing the cell parts individually, and that causes a performance issue. 

In 2.x we do branching and also don't do the parsing, so it has more impact. 
Now in the recent patch we do the parsing, so it becomes faster, and in the 
branching we also remove branches that get created again, like inside 
compareRows, compareFam, compareCols etc. That helps. These factors are also 
the reason why the bulk load perf is slower. 

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-12-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17249532#comment-17249532
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


bq.I thought BBKV was default in branch-2? Sounds like it's not?

Ya, it is not. When we use a file-system-based cache, even there it is KV only. 
And anyway the keys that get added to the memstore are purely KVs. We 
generally don't turn on the offheap write path. 

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.5.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there 
> is some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue is to 
> address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-25346) hbase2.x the performance is lower than hbase 1.x ?

2020-12-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247692#comment-17247692
 ] 

ramkrishna.s.vasudevan commented on HBASE-25346:


From both the logs that you have attached (the PE output) of 2.x and 1.2.x, it 
seems the total run-time difference is about 2 secs. Do you mean this 2 secs is 
consistently observed (i.e. around 3%)?

> hbase2.x the performance is lower than hbase 1.x  ?
> ---
>
> Key: HBASE-25346
> URL: https://issues.apache.org/jira/browse/HBASE-25346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.2
>Reporter: nilonealex
>Priority: Critical
> Attachments: hbase-pe-performace-test.log, hbase-site.xml, 
> test_for_randomWrite.log, test_for_randomWrite_hbase1.2.1.log
>
>
> Recently we found that the newly built production HBase cluster is running a 
> bit slow. The version is HBase 2.0.2 (HDP 3.1.1) and it has 100 
> nodes. We then began to do load & query performance verification between 
> HBase 2.0.2 (HDP 3.1.1) and HBase 1.2.0 (CDH 5.13.3) test environments (4 nodes), 
> and found that putting data on HBase 2.0 is much slower than on HBase 1.x (the 
> former is almost half the speed of the latter). I use BufferedMutator and 
> BufferedMutatorParams for batch puts to improve efficiency. More 
> confusing is that the performance of the production environment is worse than my 
> test environment.
> Some of the codes are as follows:
> ---
> {color:#4C9AFF}List<Put> mutator = new ArrayList<>();
> BufferedMutator table = null;
> BufferedMutatorParams params = new 
> BufferedMutatorParams(TableName.valueOf(fileHbRule.getHbaseTableName()));
> params.writeBufferSize(fileHbRule.getFlushBuffer().intValue()*1024*1024);
> table = connection.getBufferedMutator(params);
>   
> mutator.add(p);
> if(totalCnts % 5000 == 0 ) {
>   table.mutate(mutator);
>   mutator.clear();
> }{color}
> ---
> The file to put is a text-format file: a 2-million-row comma-separated text 
> file, each row with 110 columns, total size about 1 GB. Apart from 
> the main parameter configuration such as heap memory, I kept the default 
> parameter values for most of the HBase services.
> The load program is single-threaded.
> The following is the progress information :
> --- Hbase1.2.0 ( CDH5.13.3 ) 
> 
> 2020-12-01 16:48:18 inserted:  10
> 2020-12-01 16:48:36 inserted:  20
> 2020-12-01 16:48:52 inserted:  30
> 2020-12-01 16:49:08 inserted:  40
> 2020-12-01 16:49:23 inserted:  50
> 2020-12-01 16:49:39 inserted:  60
> 2020-12-01 16:49:56 inserted:  70
> 2020-12-01 16:50:12 inserted:  80
> 2020-12-01 16:50:29 inserted:  90
> 2020-12-01 16:50:45 inserted:  100
> 2020-12-01 16:51:01 inserted:  110
> 2020-12-01 16:51:17 inserted:  120
> 2020-12-01 16:51:34 inserted:  130
> 2020-12-01 16:51:49 inserted:  140
> 2020-12-01 16:52:05 inserted:  150
> 2020-12-01 16:52:21 inserted:  160
> 2020-12-01 16:52:40 inserted:  170
> 2020-12-01 16:52:57 inserted:  180
> 2020-12-01 16:53:19 inserted:  190
> 2020-12-01 16:53:42 inserted:  200
> 2020-12-01 16:53:48 inserted:  200
> imp finished ok! 
> --job finished--
> ---Hbase.2.0.2 ( 
> HDP3.1.1)-
> 2020-12-01 17:25:24 inserted:  10
> 2020-12-01 17:26:03 inserted:  20
> 2020-12-01 17:26:39 inserted:  30
> 2020-12-01 17:27:13 inserted:  40
> 2020-12-01 17:27:47 inserted:  50
> 2020-12-01 17:28:23 inserted:  60
> 2020-12-01 17:29:03 inserted:  70
> 2020-12-01 17:29:40 inserted:  80
> 2020-12-01 17:30:15 inserted:  90
> 2020-12-01 17:30:51 inserted:  100
> 2020-12-01 17:31:27 inserted:  110
> 2020-12-01 17:32:03 inserted:  120
> 2020-12-01 17:32:39 inserted:  130
> 2020-12-01 17:33:14 inserted:  140
> 2020-12-01 17:33:50 inserted:  150
> 2020-12-01 17:34:25 inserted:  160
> 2020-12-01 17:35:01 inserted:  170
> 2020-12-01 17:35:38 inserted:  180
> 2020-12-01 17:36:14 inserted:  190
> 2020-12-01 17:36:51 inserted:  200
> 2020-12-01 17:36:55 inserted:  200
> imp finished ok! 
> --job finished--
> returnCode=0
> In addition, we also did some benchmark tests on the production cluster. The 
> delay seems to be a bit high. The detailed report is in the attachment.
> Are there any key configuration points that I have missed, or does this 
> version have performance defects?





[jira] [Commented] (HBASE-25346) hbase2.x the performance is lower than hbase 1.x ?

2020-12-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17247133#comment-17247133
 ] 

ramkrishna.s.vasudevan commented on HBASE-25346:


Here is the random write report with 2.0. 

What value do you see with 1.x-based HBase? The WAL system is AsyncFSWAL in 
2.x and the FileSystem-based WAL in 1.x. BTW, how many nodes are you testing with? 

In HBASE-24850 we have seen issues with the CellComparator performance when we 
add more columns per row. The addition to the memstore takes more time due to the 
comparisons. Can you try the patch there and see if it improves 
your write performance? I have a PR raised against branch-2.3. 

 

> hbase2.x the performance is lower than hbase 1.x  ?
> ---
>
> Key: HBASE-25346
> URL: https://issues.apache.org/jira/browse/HBASE-25346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.2
>Reporter: nilonealex
>Priority: Critical
> Attachments: hbase-pe-performace-test.log, hbase-site.xml, 
> test_for_randomWrite.log
>
>
> Recently we found that the newly built production HBase cluster is running a 
> bit slow. The version is HBase 2.0.2 (HDP 3.1.1) and it has 100 
> nodes. We then began to do load & query performance verification between 
> HBase 2.0.2 (HDP 3.1.1) and HBase 1.2.0 (CDH 5.13.3) test environments (4 nodes), 
> and found that putting data on HBase 2.0 is much slower than on HBase 1.x (the 
> former is almost half the speed of the latter). I use BufferedMutator and 
> BufferedMutatorParams for batch puts to improve efficiency. More 
> confusing is that the performance of the production environment is worse than my 
> test environment.
> Some of the codes are as follows:
> ---
> {color:#4C9AFF}List<Put> mutator = new ArrayList<>();
> BufferedMutator table = null;
> BufferedMutatorParams params = new 
> BufferedMutatorParams(TableName.valueOf(fileHbRule.getHbaseTableName()));
> params.writeBufferSize(fileHbRule.getFlushBuffer().intValue()*1024*1024);
> table = connection.getBufferedMutator(params);
>   
> mutator.add(p);
> if(totalCnts % 5000 == 0 ) {
>   table.mutate(mutator);
>   mutator.clear();
> }{color}
> ---
> The file to put is a text-format file: a 2-million-row comma-separated text 
> file, each row with 110 columns, total size about 1 GB. Apart from 
> the main parameter configuration such as heap memory, I kept the default 
> parameter values for most of the HBase services.
> The load program is single-threaded.
> The following is the progress information :
> --- Hbase1.2.0 ( CDH5.13.3 ) 
> 
> 2020-12-01 16:48:18 inserted:  10
> 2020-12-01 16:48:36 inserted:  20
> 2020-12-01 16:48:52 inserted:  30
> 2020-12-01 16:49:08 inserted:  40
> 2020-12-01 16:49:23 inserted:  50
> 2020-12-01 16:49:39 inserted:  60
> 2020-12-01 16:49:56 inserted:  70
> 2020-12-01 16:50:12 inserted:  80
> 2020-12-01 16:50:29 inserted:  90
> 2020-12-01 16:50:45 inserted:  100
> 2020-12-01 16:51:01 inserted:  110
> 2020-12-01 16:51:17 inserted:  120
> 2020-12-01 16:51:34 inserted:  130
> 2020-12-01 16:51:49 inserted:  140
> 2020-12-01 16:52:05 inserted:  150
> 2020-12-01 16:52:21 inserted:  160
> 2020-12-01 16:52:40 inserted:  170
> 2020-12-01 16:52:57 inserted:  180
> 2020-12-01 16:53:19 inserted:  190
> 2020-12-01 16:53:42 inserted:  200
> 2020-12-01 16:53:48 inserted:  200
> imp finished ok! 
> --job finished--
> ---Hbase.2.0.2 ( 
> HDP3.1.1)-
> 2020-12-01 17:25:24 inserted:  10
> 2020-12-01 17:26:03 inserted:  20
> 2020-12-01 17:26:39 inserted:  30
> 2020-12-01 17:27:13 inserted:  40
> 2020-12-01 17:27:47 inserted:  50
> 2020-12-01 17:28:23 inserted:  60
> 2020-12-01 17:29:03 inserted:  70
> 2020-12-01 17:29:40 inserted:  80
> 2020-12-01 17:30:15 inserted:  90
> 2020-12-01 17:30:51 inserted:  100
> 2020-12-01 17:31:27 inserted:  110
> 2020-12-01 17:32:03 inserted:  120
> 2020-12-01 17:32:39 inserted:  130
> 2020-12-01 17:33:14 inserted:  140
> 2020-12-01 17:33:50 inserted:  150
> 2020-12-01 17:34:25 inserted:  160
> 2020-12-01 17:35:01 inserted:  170
> 2020-12-01 17:35:38 inserted:  180
> 2020-12-01 17:36:14 inserted:  190
> 2020-12-01 17:36:51 inserted:  200
> 2020-12-01 17:36:55 inserted:  200
> imp finished ok! 
> --job finished--
> returnCode=0
> In addition, we also ran some benchmark tests on the production cluster. The 
> delay seems to be a bit 

[jira] [Commented] (HBASE-25378) Legacy comparator in Hfile trailer will fail to load

2020-12-09 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25378?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17246683#comment-17246683
 ] 

ramkrishna.s.vasudevan commented on HBASE-25378:


Yes Pankaj, sorry, I missed this part too. However, it is good to fix this in 
getComparatorClass() to handle the change. 

> Legacy comparator in Hfile trailer will fail to load
> 
>
> Key: HBASE-25378
> URL: https://issues.apache.org/jira/browse/HBASE-25378
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.4.0, 2.3.2
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.3.4
>
>
> HBASE-24968 moved MetaCellComparator out from CellComparatorImpl to avoid the 
> deadlock issue. But this introduced a compatibility issue: old HFiles with the 
> comparator class 
> "org.apache.hadoop.hbase.CellComparator$MetaCellComparator" will fail to open 
> due to ClassNotFoundException.
> We should also handle the case where the comparator class is 
> "org.apache.hadoop.hbase.CellComparatorImpl$MetaCellComparator", which was the 
> case before HBASE-24968.
>  
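A minimal sketch, with hypothetical method and target names (not the actual HBase code), of the name mapping the fix implies: when the HFile trailer names a legacy MetaCellComparator class, it is translated to the relocated class before loading, instead of being handed to Class.forName() and failing with ClassNotFoundException.

```java
// Illustrative name-mapping sketch: both the pre-HBASE-24968 name and the
// broken transient name map to the relocated top-level MetaCellComparator.
public class LegacyComparatorMapping {
    static String resolveComparatorClassName(String trailerName) {
        if ("org.apache.hadoop.hbase.CellComparatorImpl$MetaCellComparator".equals(trailerName)
                || "org.apache.hadoop.hbase.CellComparator$MetaCellComparator".equals(trailerName)) {
            return "org.apache.hadoop.hbase.MetaCellComparator";
        }
        return trailerName;  // modern names pass through unchanged
    }
}
```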



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-25346) hbase2.x the performance is lower than hbase 1.x ?

2020-12-08 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245725#comment-17245725
 ] 

ramkrishna.s.vasudevan commented on HBASE-25346:


[~nilone2]

From the report you have attached, I am not sure whether it is only the writes 
that are slower, or the reads as well.

Is it possible to attach the TPS/latency numbers with writes and reads? 

> hbase2.x the performance is lower than hbase 1.x  ?
> ---
>
> Key: HBASE-25346
> URL: https://issues.apache.org/jira/browse/HBASE-25346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.2
>Reporter: nilonealex
>Priority: Critical
> Attachments: hbase-pe-performace-test.log, hbase-site.xml
>
>
> Recently we found that our newly built production HBase cluster is running a 
> bit slow. The version is HBase 2.0.2 (HDP 3.1.1) and it has 100 nodes. We then 
> ran load and query performance verification between HBase 2.0.2 (HDP 3.1.1) 
> and HBase 1.2.0 (CDH 5.13.3) test environments (4 nodes) and found that 
> putting data via HBase 2.0 is much slower than via HBase 1.x (the former 
> reaches almost half the throughput of the latter). I use BufferedMutator and 
> BufferedMutatorParams for batch puts to improve efficiency. More confusing is 
> that the performance of the production environment is worse than my test 
> environment.
> Some of the code is as follows:
> ---
> {color:#4C9AFF}List<Put> mutator = new ArrayList<>();
> BufferedMutator table = null;
> BufferedMutatorParams params = new 
> BufferedMutatorParams(TableName.valueOf(fileHbRule.getHbaseTableName()));
> params.writeBufferSize(fileHbRule.getFlushBuffer().intValue()*1024*1024);
> table = connection.getBufferedMutator(params);
>   
> mutator.add(p);
> if(totalCnts % 5000 == 0 ) {
>   table.mutate(mutator);
>   mutator.clear();
> }{color}
> ---
> The input is a comma-separated text file with 2 million rows; each row has 110 
> columns, and the total size is about 1 GB. Apart from the main parameters such 
> as heap memory, I kept the default values for most of the HBase services.
> The load program is single-threaded.
> The following is the progress information :
> --- Hbase1.2.0 ( CDH5.13.3 ) 
> 
> 2020-12-01 16:48:18 inserted:  10
> 2020-12-01 16:48:36 inserted:  20
> 2020-12-01 16:48:52 inserted:  30
> 2020-12-01 16:49:08 inserted:  40
> 2020-12-01 16:49:23 inserted:  50
> 2020-12-01 16:49:39 inserted:  60
> 2020-12-01 16:49:56 inserted:  70
> 2020-12-01 16:50:12 inserted:  80
> 2020-12-01 16:50:29 inserted:  90
> 2020-12-01 16:50:45 inserted:  100
> 2020-12-01 16:51:01 inserted:  110
> 2020-12-01 16:51:17 inserted:  120
> 2020-12-01 16:51:34 inserted:  130
> 2020-12-01 16:51:49 inserted:  140
> 2020-12-01 16:52:05 inserted:  150
> 2020-12-01 16:52:21 inserted:  160
> 2020-12-01 16:52:40 inserted:  170
> 2020-12-01 16:52:57 inserted:  180
> 2020-12-01 16:53:19 inserted:  190
> 2020-12-01 16:53:42 inserted:  200
> 2020-12-01 16:53:48 inserted:  200
> imp finished ok! 
> --job finished--
> ---Hbase.2.0.2 ( 
> HDP3.1.1)-
> 2020-12-01 17:25:24 inserted:  10
> 2020-12-01 17:26:03 inserted:  20
> 2020-12-01 17:26:39 inserted:  30
> 2020-12-01 17:27:13 inserted:  40
> 2020-12-01 17:27:47 inserted:  50
> 2020-12-01 17:28:23 inserted:  60
> 2020-12-01 17:29:03 inserted:  70
> 2020-12-01 17:29:40 inserted:  80
> 2020-12-01 17:30:15 inserted:  90
> 2020-12-01 17:30:51 inserted:  100
> 2020-12-01 17:31:27 inserted:  110
> 2020-12-01 17:32:03 inserted:  120
> 2020-12-01 17:32:39 inserted:  130
> 2020-12-01 17:33:14 inserted:  140
> 2020-12-01 17:33:50 inserted:  150
> 2020-12-01 17:34:25 inserted:  160
> 2020-12-01 17:35:01 inserted:  170
> 2020-12-01 17:35:38 inserted:  180
> 2020-12-01 17:36:14 inserted:  190
> 2020-12-01 17:36:51 inserted:  200
> 2020-12-01 17:36:55 inserted:  200
> imp finished ok! 
> --job finished--
> returnCode=0
> In addition, we also ran some benchmark tests on the production cluster. The 
> delay seems to be a bit high; the detailed report is in the attachment.
> Are there any key configuration points that I have missed, or does this 
> version have performance defects?
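The batching pattern in the quoted snippet can be sketched without any HBase dependency. `BatchSink` below is a hypothetical stand-in for BufferedMutator; the flush-every-5000 logic mirrors the quoted loop, plus a final flush for the remainder that the snippet omits.

```java
import java.util.ArrayList;
import java.util.List;

// Dependency-free sketch of the batch-load loop: buffer records, flush every
// 5000, then flush the final partial batch so nothing is lost.
public class BatchLoad {
    interface BatchSink { void mutate(List<String> batch); }

    static int load(Iterable<String> rows, BatchSink sink) {
        List<String> buffer = new ArrayList<>();
        int total = 0;
        for (String row : rows) {
            buffer.add(row);
            total++;
            if (total % 5000 == 0) {        // flush threshold from the snippet
                sink.mutate(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
        if (!buffer.isEmpty()) {            // flush the remainder at the end
            sink.mutate(buffer);
        }
        return total;
    }
}
```

With a real BufferedMutator the same shape applies: the client-side write buffer (set via writeBufferSize) batches RPCs, so the explicit 5000-record flush mostly controls memory held in the local list.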





[jira] [Commented] (HBASE-24637) Reseek regression related to filter SKIP hinting

2020-12-07 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17245133#comment-17245133
 ] 

ramkrishna.s.vasudevan commented on HBASE-24637:


I have attached a PPT to explain what the patch basically tries to do, for 
better understanding. 

The HFileReaderImpl knows how next() and seek() have behaved and can indicate 
whether the block already reached by next() has been seeked to again. Hence the 
HFileReader's decision is used at the StoreScanner layer. 

> Reseek regression related to filter SKIP hinting
> 
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl, seeksVsSkip_ppt.pptx
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 





[jira] [Updated] (HBASE-24637) Reseek regression related to filter SKIP hinting

2020-12-07 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-24637:
---
Attachment: seeksVsSkip_ppt.pptx

> Reseek regression related to filter SKIP hinting
> 
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl, seeksVsSkip_ppt.pptx
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 





[jira] [Commented] (HBASE-23811) [OpenTracing] Add shaded JaegerTracing tracer to hbase-thirdparty

2020-12-06 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244954#comment-17244954
 ] 

ramkrishna.s.vasudevan commented on HBASE-23811:


[~zhangduo]

Yes, I agree with you. Let the HBase code have only the OpenTelemetry or 
OpenTracing APIs, and let the user decide on the implementation. 

In our POC we used the OpenTracing shim provided by OpenTelemetry, so that we 
could still use OpenTracing APIs while, at the end of the day, it is 
OpenTelemetry that does the work. The idea was that since the underlying APIs 
are OpenTelemetry-based, you still keep OpenTracing compatibility. But when I 
last checked there was no GA release of OpenTelemetry.

> [OpenTracing] Add shaded JaegerTracing tracer to hbase-thirdparty
> -
>
> Key: HBASE-23811
> URL: https://issues.apache.org/jira/browse/HBASE-23811
> Project: HBase
>  Issue Type: Sub-task
>  Components: thirdparty
>Reporter: Wei-Chiu Chuang
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> JaegerTracing pulls in lots of dependencies. Some, like libthrift (0.13.0), 
> conflict with the one shipped in HBase (0.12.0).
> Additionally, not everyone may want to use Jaeger.
> I propose to shade JaegerTracing and its dependencies into an uber jar and 
> publish it as an hbase-thirdparty artifact. As an added benefit, this makes 
> the management of tracers in HBase's dependency tree much easier. Finally, we 
> can follow suit and provide Zipkin tracer support in the future.





[jira] [Commented] (HBASE-22120) Replace HTrace with OpenTracing

2020-12-05 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-22120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17244539#comment-17244539
 ] 

ramkrishna.s.vasudevan commented on HBASE-22120:


We did a small POC with OpenTelemetry: we updated some existing tracing calls 
to the OpenTelemetry API and made a few new changes. We were able to make it 
work with OpenTelemetry. JFYI. 

> Replace HTrace with OpenTracing
> ---
>
> Key: HBASE-22120
> URL: https://issues.apache.org/jira/browse/HBASE-22120
> Project: HBase
>  Issue Type: New Feature
>  Components: tracing
>Affects Versions: 3.0.0-alpha-1
>Reporter: Sergey Shelukhin
>Assignee: Wei-Chiu Chuang
>Priority: Major
>
> h2. Deprecate HTrace usage in HBase
>  * HBase 1.x (branch-1)
>  * Declare HTrace (htrace 3.x) deprecated in the user doc.
>  * HBase 2.x (branch-2)
>  * Declare HTrace deprecated in the user doc. Furthermore, state that it is 
> known not working.
>  * Either fix the trace context propagation bug in HBase 2.x, or backport 
> OpenTracing support from the master branch. I am inclined to the latter.
>  * HBase 3.x (master branch)
>  * Remove HTrace entirely.
>  * Add OpenTracing APIs. Potentially backport to HBase 2.4.
>  * Replace OpenTracing API with OpenTelemetry when the latter stabilizes.
> h1. Milestones
>  # Doc -- deprecation notice
>  # Replace existing HTrace code with OpenTracing code in the master branch 
> (3.x) 
>  # Java (a [poc|https://github.com/jojochuang/hbase/tree/HBASE-22120] is 
> currently under way)
>  # HBase shell and scripts (Ruby, shell script)
>  # Doc 
>  # Add new trace instrumentation code for new features not instrumented by 
> the existing HTace code.
>  # Propagate the traces to other systems such as HDFS and MapReduce.
>  # Support other OpenTracing tracers.





[jira] [Commented] (HBASE-25346) hbase2.x the performance is lower than hbase 1.x ?

2020-12-01 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17241422#comment-17241422
 ] 

ramkrishna.s.vasudevan commented on HBASE-25346:


The WAL sits on HDFS, and that is the same in both clusters? 

> hbase2.x the performance is lower than hbase 1.x  ?
> ---
>
> Key: HBASE-25346
> URL: https://issues.apache.org/jira/browse/HBASE-25346
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.0.2
>Reporter: nilonealex
>Priority: Critical
> Attachments: hbase-site.xml
>
>
> Recently we found that our newly built production HBase cluster is running a 
> bit slow. The version is HBase 2.0.2 (HDP 3.1.1) and it has 100 nodes. We then 
> ran load and query performance verification between HBase 2.0.2 (HDP 3.1.1) 
> and HBase 1.2.0 (CDH 5.13.3) test environments (4 nodes) and found that 
> putting data via HBase 2.0 is much slower than via HBase 1.x (the former 
> reaches almost half the throughput of the latter). I use BufferedMutator and 
> BufferedMutatorParams for batch puts to improve efficiency. More confusing is 
> that the performance of the production environment is worse than my test 
> environment.
> Some of the code is as follows:
> ---
> {color:#4C9AFF}List<Put> mutator = new ArrayList<>();
> BufferedMutator table = null;
> BufferedMutatorParams params = new 
> BufferedMutatorParams(TableName.valueOf(fileHbRule.getHbaseTableName()));
> params.writeBufferSize(fileHbRule.getFlushBuffer().intValue()*1024*1024);
> table = connection.getBufferedMutator(params);
>   
> mutator.add(p);
> if(totalCnts % 5000 == 0 ) {
>   table.mutate(mutator);
>   mutator.clear();
> }{color}
> ---
> The input is a comma-separated text file with 2 million rows; each row has 110 
> columns, and the total size is about 1 GB. Apart from the main parameters such 
> as heap memory, I kept the default values for most of the HBase services.
> The load program is single-threaded.
> The following is the progress information :
> --- Hbase1.2.0 ( CDH5.13.3 ) 
> 
> 2020-12-01 16:48:18 inserted:  10
> 2020-12-01 16:48:36 inserted:  20
> 2020-12-01 16:48:52 inserted:  30
> 2020-12-01 16:49:08 inserted:  40
> 2020-12-01 16:49:23 inserted:  50
> 2020-12-01 16:49:39 inserted:  60
> 2020-12-01 16:49:56 inserted:  70
> 2020-12-01 16:50:12 inserted:  80
> 2020-12-01 16:50:29 inserted:  90
> 2020-12-01 16:50:45 inserted:  100
> 2020-12-01 16:51:01 inserted:  110
> 2020-12-01 16:51:17 inserted:  120
> 2020-12-01 16:51:34 inserted:  130
> 2020-12-01 16:51:49 inserted:  140
> 2020-12-01 16:52:05 inserted:  150
> 2020-12-01 16:52:21 inserted:  160
> 2020-12-01 16:52:40 inserted:  170
> 2020-12-01 16:52:57 inserted:  180
> 2020-12-01 16:53:19 inserted:  190
> 2020-12-01 16:53:42 inserted:  200
> 2020-12-01 16:53:48 inserted:  200
> imp finished ok! 
> --job finished--
> ---Hbase.2.0.2 ( 
> HDP3.1.1)-
> 2020-12-01 17:25:24 inserted:  10
> 2020-12-01 17:26:03 inserted:  20
> 2020-12-01 17:26:39 inserted:  30
> 2020-12-01 17:27:13 inserted:  40
> 2020-12-01 17:27:47 inserted:  50
> 2020-12-01 17:28:23 inserted:  60
> 2020-12-01 17:29:03 inserted:  70
> 2020-12-01 17:29:40 inserted:  80
> 2020-12-01 17:30:15 inserted:  90
> 2020-12-01 17:30:51 inserted:  100
> 2020-12-01 17:31:27 inserted:  110
> 2020-12-01 17:32:03 inserted:  120
> 2020-12-01 17:32:39 inserted:  130
> 2020-12-01 17:33:14 inserted:  140
> 2020-12-01 17:33:50 inserted:  150
> 2020-12-01 17:34:25 inserted:  160
> 2020-12-01 17:35:01 inserted:  170
> 2020-12-01 17:35:38 inserted:  180
> 2020-12-01 17:36:14 inserted:  190
> 2020-12-01 17:36:51 inserted:  200
> 2020-12-01 17:36:55 inserted:  200
> imp finished ok! 
> --job finished--
> returnCode=0
> In addition, we also ran some benchmark tests on the production cluster. The 
> delay seems to be a bit high; the detailed report is in the attachment.
> Are there any key configuration points that I have missed, or does this 
> version have performance defects?





[jira] [Commented] (HBASE-25050) We initialize Filesystems more than once.

2020-11-24 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17238124#comment-17238124
 ] 

ramkrishna.s.vasudevan commented on HBASE-25050:


Pushed to master and branch-2. Thanks [~apurtell].

 

> We initialize Filesystems more than once.
> -
>
> Key: HBASE-25050
> URL: https://issues.apache.org/jira/browse/HBASE-25050
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> In HFileSystem
> {code}
> // Create the default filesystem with checksum verification switched on.
> // By default, any operation to this FilterFileSystem occurs on
> // the underlying filesystem that has checksums switched on.
> this.fs = FileSystem.get(conf);
> this.useHBaseChecksum = useHBaseChecksum;
> fs.initialize(getDefaultUri(conf), conf);
> {code}
> We call fs.initialize(). Generally the FS would have been created and 
> initialized either in the FileSystem.get() call above or even when we check 
> {code}
>   FileSystem fs = p.getFileSystem(c);
> {code}
> The FS that gets cached in the hadoop-common layer does the init for us, so 
> doing it again is redundant. 
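A toy illustration of the caching behavior described above (FakeFs and the cache are hypothetical stand-ins, not Hadoop code): the cached filesystem is initialized exactly once, inside the cache miss, so a second initialize() call on the returned instance is pure redundant work.

```java
import java.net.URI;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a per-URI filesystem cache: get() initializes an instance only on
// the first lookup, mirroring how hadoop-common's FileSystem cache behaves.
public class FsCacheDemo {
    static class FakeFs {
        int initCount = 0;
        void initialize(URI uri) { initCount++; }   // counts (redundant) inits
    }

    static final Map<URI, FakeFs> CACHE = new ConcurrentHashMap<>();

    static FakeFs get(URI uri) {
        // Initialization happens exactly once, inside the cache miss.
        return CACHE.computeIfAbsent(uri, u -> {
            FakeFs fs = new FakeFs();
            fs.initialize(u);
            return fs;
        });
    }
}
```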





[jira] [Updated] (HBASE-25277) postScannerFilterRow impacts Scan performance a lot in HBase 2.x

2020-11-24 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-25277:
---
Labels: perfomance scanning  (was: )

> postScannerFilterRow impacts Scan performance a lot in HBase 2.x
> 
>
> Key: HBASE-25277
> URL: https://issues.apache.org/jira/browse/HBASE-25277
> Project: HBase
>  Issue Type: Bug
>  Components: Coprocessors, scan
>Reporter: Pankaj Kumar
>Assignee: Pankaj Kumar
>Priority: Critical
>  Labels: perfomance, scanning
> Fix For: 3.0.0-alpha-1
>
> Attachments: test_report.png
>
>
> In our test we observed that Scan performance is degraded by more than 60% in 
> HBase 2.x as compared to 1.3.x. As per the flame graph report, the RS spent 
> 31% of the time in postScannerFilterRow, although the coprocessors 
> (AccessController, VisibilityController & ConstraintProcessor) do nothing 
> in that hook.
> HBASE-14489 added logic to avoid the call to postScannerFilterRow when it is 
> not needed, but it is not working as expected in HBase 2.x. AccessController, 
> VisibilityController & ConstraintProcessor override postScannerFilterRow 
> with a dummy (same as RegionObserver) implementation, so 
> RegionCoprocessorHost.hasCustomPostScannerFilterRow will be TRUE and the hook 
> is called for all configured CPs while processing each row. Suppose we have 
> configured 5 region CPs and there are 1 M rows in the table; then there will 
> be 5 M dummy calls to postScannerFilterRow during a whole-table scan.
> We need to remove the postScannerFilterRow hook from these CPs as they are 
> not doing anything.
> Another problem is in the RegionCoprocessorHost.hasCustomPostScannerFilterRow 
> init logic: currently it is always TRUE even if we remove the 
> postScannerFilterRow hook from AccessController, VisibilityController & 
> ConstraintProcessor, because we keep searching for postScannerFilterRow until 
> it is found (we also look in the configured CP's super classes) or clazz is 
> NULL.
> https://github.com/apache/hbase/blob/035c192eb665469ce0c071db86c78f4a873c123b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java#L301
> The super class of Object (reached walking up from AccessController) will be 
> NULL, so RegionCoprocessorHost.hasCustomPostScannerFilterRow will be set to 
> TRUE.
> https://github.com/apache/hbase/blob/035c192eb665469ce0c071db86c78f4a873c123b/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/RegionCoprocessorHost.java#L279
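A hedged sketch of corrected override detection along the lines described above (illustrative names, not the actual RegionCoprocessorHost code): walk the class hierarchy looking for a declared hook method, but stop before reaching Object, so a class that never overrides the hook correctly yields false.

```java
// Reflection-based check: does this class (or an ancestor below Object)
// declare its own "hook" method? Interface default methods are not picked up
// by getDeclaredMethod on the class, which is exactly the desired behavior.
public class OverrideCheck {
    interface Observer { default void hook() { /* no-op default */ } }
    static class Custom implements Observer { public void hook() { } }
    static class Plain implements Observer { }

    static boolean declaresHook(Class<?> clazz) {
        for (Class<?> c = clazz; c != null && c != Object.class; c = c.getSuperclass()) {
            try {
                c.getDeclaredMethod("hook");
                return true;                 // found a real override
            } catch (NoSuchMethodException e) {
                // not declared here; keep walking up the hierarchy
            }
        }
        return false;                        // stopped at Object: no override
    }
}
```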





[jira] [Updated] (HBASE-25050) We initialize Filesystems more than once.

2020-11-24 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-25050:
---
Fix Version/s: 2.4.0
   3.0.0-alpha-1

> We initialize Filesystems more than once.
> -
>
> Key: HBASE-25050
> URL: https://issues.apache.org/jira/browse/HBASE-25050
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 3.0.0-alpha-1, 2.3.1, 2.4.0, 2.2.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> In HFileSystem
> {code}
> // Create the default filesystem with checksum verification switched on.
> // By default, any operation to this FilterFileSystem occurs on
> // the underlying filesystem that has checksums switched on.
> this.fs = FileSystem.get(conf);
> this.useHBaseChecksum = useHBaseChecksum;
> fs.initialize(getDefaultUri(conf), conf);
> {code}
> We call fs.initialize(). Generally the FS would have been created and 
> initialized either in the FileSystem.get() call above or even when we check 
> {code}
>   FileSystem fs = p.getFileSystem(c);
> {code}
> The FS that gets cached in the hadoop-common layer does the init for us, so 
> doing it again is redundant. 





[jira] [Commented] (HBASE-25187) Improve SizeCachedKV variants initialization

2020-11-24 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237995#comment-17237995
 ] 

ramkrishna.s.vasudevan commented on HBASE-25187:


[~ndimiduk] and [~zghao] thanks for the heads up.

[~apurtell] - Thanks for the ping here. I am on vacation and I just pushed it 
to branch-2, branch-2.2 and branch-2.3.

> Improve SizeCachedKV variants initialization
> 
>
> Key: HBASE-25187
> URL: https://issues.apache.org/jira/browse/HBASE-25187
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.4, 2.5.0
>
>
> Currently in SizeCachedKV we read the row length and key length from the 
> buffers. This can be optimized: we can pass the key length and row length 
> while actually creating the cell as it is read from the block. Sometimes we 
> see that SizeCachedKV takes the max width in a flame graph, considering that 
> we also do a sanity check on the created KV. 
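A minimal sketch, with hypothetical names, of the optimization described: the reader passes the already-known key and row lengths into the cell at construction time, so later accessors return cached values instead of decoding the backing buffer again.

```java
// Illustrative cell that caches its lengths at construction rather than
// re-parsing them from the serialized buffer on every accessor call.
public class SizeCachedCell {
    private final byte[] buf;     // backing serialized bytes (unused here)
    private final int keyLen;
    private final short rowLen;

    SizeCachedCell(byte[] buf, int keyLen, short rowLen) {
        this.buf = buf;
        this.keyLen = keyLen;     // cached: no buffer decode needed later
        this.rowLen = rowLen;
    }

    int getKeyLength() { return keyLen; }
    short getRowLength() { return rowLen; }
}
```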





[jira] [Commented] (HBASE-24637) Reseek regression related to filter SKIP hinting

2020-11-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232965#comment-17232965
 ] 

ramkrishna.s.vasudevan commented on HBASE-24637:


[~apurtell]

bq.Why does branch-2 do this and not branch-1?

Yes, branch-1 does not reseek because of the optimization added at the SQM 
layer, as [~larsh] pointed out. So there we don't do any reseek for the case 
where the tracker says we need to do SEEK_COL but the filter says SKIP. This 
only happens with addColumns.

But in a case where the filter says INCLUDE and the tracker says SEEK_COL 
(again with addColumns), there won't be any regression between branch-1.3 and 
branch-2. Even the number of comparisons and reseeks should be the same, except 
that branch-2 might suffer some Comparator-related perf cost, which does not 
appear to be impactful in these tests. 

bq.Is there a way to avoid the reseek per block?

I have not changed the SQM logic that was added as part of 
https://issues.apache.org/jira/browse/HBASE-17125. The approach in the attached 
PR is to check, for a few blocks, whether a reseek actually lands in a block 
other than the one already reached by next(). If so, continue the current way; 
if not, switch over to next() only for that scan query. 
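The fallback idea can be sketched as follows (hypothetical names and threshold, not the actual patch): count consecutive reseeks that land in the block next() already reached, and after a few of them switch the scan to pure next() mode.

```java
// Sketch of the reseek-vs-next heuristic: if reseeks keep landing in the block
// the scanner already reached via next(), they buy nothing but comparisons, so
// fall back to next()-only mode for the rest of the scan.
public class ReseekHeuristic {
    static final int THRESHOLD = 3;          // assumed sampling threshold
    private int sameBlockReseeks = 0;
    private boolean nextOnlyMode = false;

    void recordReseek(long blockReachedByNext, long blockReachedByReseek) {
        if (blockReachedByNext == blockReachedByReseek) {
            if (++sameBlockReseeks >= THRESHOLD) {
                nextOnlyMode = true;         // reseeks aren't skipping blocks
            }
        } else {
            sameBlockReseeks = 0;            // reseek skipped ahead: keep seeking
        }
    }

    boolean useNextOnly() { return nextOnlyMode; }
}
```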

 

> Reseek regression related to filter SKIP hinting
> 
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshot, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 





[jira] [Commented] (HBASE-25187) Improve SizeCachedKV variants initialization

2020-11-15 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17232540#comment-17232540
 ] 

ramkrishna.s.vasudevan commented on HBASE-25187:


[~ndimiduk] and [~zghao]

Do you want this in 2.3 and 2.2 branches?

 

> Improve SizeCachedKV variants initialization
> 
>
> Key: HBASE-25187
> URL: https://issues.apache.org/jira/browse/HBASE-25187
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.4.0, 2.3.4
>
>
> Currently in SizeCachedKV we read the row length and key length from the 
> buffers. This can be optimized: the reader already knows the key length and 
> row length when it creates the cell from the block, so it can pass them in 
> directly. Sometimes we see SizeCachedKV taking the widest frame in a flame 
> graph, especially considering we also do a sanity check on the created KV. 





[jira] [Commented] (HBASE-24637) Reseek regression related to filter SKIP hinting

2020-11-07 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17227769#comment-17227769
 ] 

ramkrishna.s.vasudevan commented on HBASE-24637:


[~apurtell]

I agree that there are seeks, but my observation is that tryToSkipNextRow and 
tryToSkipToNextCol are doing their best to avoid the seek. What adds to the CPU 
cycles is the extra comparisons, because in the PE case we add all columns of a 
given row. We only reseek when there is no nextIndexedKey as part of the next() 
call, so we seek once at the end of a block, not for every column.

However, I have a patch where we detect whether we keep reseeking to the very 
block that was already fetched via the next() call. If both blocks are the same 
for a certain number of blocks, we fall back to pure next() mode rather than 
deciding between next() and seek(), which involves a lot of comparisons. Will 
post a patch early next week. 
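The fallback idea described here can be sketched as below. This is a self-contained, hypothetical illustration of the heuristic, not the actual patch: the class name, the threshold constant, and the block-offset bookkeeping are all assumptions made for the example.

```java
// Hypothetical sketch of the proposed heuristic: if a reseek keeps landing in
// the block we already read via next(), stop issuing seeks and fall back to
// plain next() calls. Names are illustrative, not HBase APIs.
public class ReseekFallback {
    static final int SAME_BLOCK_THRESHOLD = 3; // illustrative tuning knob

    private long lastBlockOffset = -1;
    private int sameBlockReseeks = 0;
    private boolean pureNextMode = false;

    /** Record that a reseek resolved into the block at the given offset. */
    void onReseek(long targetBlockOffset) {
        if (pureNextMode) {
            return; // already gave up on seeking
        }
        if (targetBlockOffset == lastBlockOffset) {
            // The reseek did not move us to a new block: the compares were wasted.
            if (++sameBlockReseeks >= SAME_BLOCK_THRESHOLD) {
                pureNextMode = true; // stop trying to seek; just iterate
            }
        } else {
            sameBlockReseeks = 0;
            lastBlockOffset = targetBlockOffset;
        }
    }

    boolean shouldSeek() {
        return !pureNextMode;
    }
}
```

The point of the sketch is only the shape of the decision: seeking stays enabled as long as reseeks keep making progress into new blocks.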

 

> Reseek regression related to filter SKIP hinting
> 
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshotted, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 





[jira] [Commented] (HBASE-24637) Reseek regression related to filter SKIP hinting

2020-10-27 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17221309#comment-17221309
 ] 

ramkrishna.s.vasudevan commented on HBASE-24637:


[~larsh] and [~apurtell]
I have been trying to reproduce this issue and was finally able to. It is clear 
that full scans reproduce the issue whenever the scan uses addColumn() and the 
requested columns cover the majority of the columns in a row. If only, say, 3 
random columns out of 25 are requested, the impact is small; but if more than 
20 columns are covered, it is much more pronounced. With PE and 25 columns, all 
columns are added via addColumn() by default; if that were not the case, this 
perf issue would not be visible.

Coming to the issue itself: as [~larsh] rightly pointed out, when the filter 
says SKIP and the column tracker says SEEK, the ScanQueryMatcher now returns 
SEEK_NEXT_COL, and that is where we end up in this issue. But at the 
StoreScanner level (as per my observation) it is not the reseek itself that 
costs. We don't actually reseek; we do more comparisons inside 
tryToSkipToNextColumn(). In the 1.x branches we got a plain SKIP and simply 
skipped, whereas here we first have to decide whether to skip or to seek, and 
that decision is where the time goes. In a table with 10 rows and 20 columns 
(all added via addColumn) we do roughly 10*20 extra compares, with no seek 
happening at all. This is from a simple test case running on a mini DFS 
cluster, with filterAll set and all 20 columns added to the scan (all data in 
cache, one version).
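The per-cell decision cost described above can be illustrated with a self-contained sketch. This is not HBase code: the method name echoes tryToSkipToNextColumn(), but the integer "cells" and the comparison counter are stand-ins invented for the example, and the 10x20 loop mirrors the test setup mentioned in the comment.

```java
// Minimal sketch (not HBase code) of the 2.x-style "skip or seek" decision
// that replaces the 1.x plain SKIP: for every cell we now compare the next
// column's position against the block's nextIndexedKey before choosing.
public class SkipOrSeek {
    static int comparisons = 0;

    // Returns true -> SKIP (iterate within the block), false -> SEEK.
    static boolean trySkipToNextColumn(int nextColumnHint, int nextIndexedKey) {
        comparisons++; // this per-cell compare is the extra CPU cost observed
        // If the wanted column still lies inside the current block, skipping
        // cell by cell is cheaper than issuing a reseek.
        return nextColumnHint < nextIndexedKey;
    }

    public static void main(String[] args) {
        // 10 rows x 20 columns, all columns wanted: 200 decisions and 200
        // compares, yet (in the single cached-block case) no seek is issued.
        int seeks = 0;
        for (int row = 0; row < 10; row++) {
            for (int col = 0; col < 20; col++) {
                if (!trySkipToNextColumn(col + 1, Integer.MAX_VALUE)) {
                    seeks++;
                }
            }
        }
        System.out.println("comparisons=" + comparisons + " seeks=" + seeks);
    }
}
```

Running the sketch shows the asymmetry the comment describes: the compare count grows with rows x columns even when the seek count stays at zero.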

> Reseek regression related to filter SKIP hinting
> 
>
> Key: HBASE-24637
> URL: https://issues.apache.org/jira/browse/HBASE-24637
> Project: HBase
>  Issue Type: Bug
>  Components: Filters, Performance, Scanners
>Affects Versions: 2.2.5
>Reporter: Andrew Kyle Purtell
>Priority: Major
> Attachments: W-7665966-FAST_DIFF-FILTER_ALL.pdf, 
> W-7665966-Instrument-low-level-scan-details-branch-1.patch, 
> W-7665966-Instrument-low-level-scan-details-branch-2.2.patch, 
> parse_call_trace.pl
>
>
> I have been looking into reported performance regressions in HBase 2 relative 
> to HBase 1. Depending on the test scenario, HBase 2 can demonstrate 
> significantly better microbenchmarks in a number of cases, and usually shows 
> improvement in whole cluster benchmarks like YCSB.
> To assist in debugging I added methods to RpcServer for updating per-call 
> metrics that leverage the fact it puts a reference to the current Call into a 
> thread local and that all activity for a given RPC is processed by a single 
> thread context. I then instrumented ScanQueryMatcher (in branch-1) and its 
> various friends (in branch-2.2), StoreScanner, HFileReaderV2 and 
> HFileReaderV3 (in branch-1) and HFileReaderImpl (in branch-2.2), HFileBlock, 
> and DefaultMemStore (branch-1) and SegmentScanner (branch-2.2). Test tables 
> with one family and 1, 5, 10, 20, 50, and 100 distinct column-qualifiers per 
> row were created, snapshotted, dropped, and cloned from the snapshot. Both 1.6 
> and 2.2 versions under test operated on identical data files in HDFS. For 
> tests with 1.6 and 2.2 on the server side the same 1.6 PE client was used, to 
> ensure only the server side differed.
> The results for pe --filterAll were revealing. See attached. 
> It appears a refactor to ScanQueryMatcher and friends has disabled the 
> ability of filters to provide meaningful SKIP hints, which disables an 
> optimization that avoids reseeking, leading to a serious and proportional 
> regression in reseek activity and time spent in that code path. So for 
> queries that use filters, there can be a substantial regression.
> Other test cases that did not use filters did not show this regression. If 
> filters are not used the behavior of ScanQueryMatcher between 1.6 and 2.2 was 
> almost identical, as measured by counts of the hint types returned, whether 
> or not column or version trackers are called, and counts of store seeks or 
> reseeks. Regarding micro-timings, there was a 10% variance in my testing and 
> results generally fell within this range, except for the filter all case of 
> course. 





[jira] [Commented] (HBASE-25187) Improve SizeCachedKV variants initialization

2020-10-22 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17219459#comment-17219459
 ] 

ramkrishna.s.vasudevan commented on HBASE-25187:


This patch, along with the removal of the sanity check in the KVUtil, gives a 
12% improvement in scans reading 5G of data and filtering using SCVF. However, 
this patch does not remove the sanity check, so the improvement will probably 
be less than 12%.

> Improve SizeCachedKV variants initialization
> 
>
> Key: HBASE-25187
> URL: https://issues.apache.org/jira/browse/HBASE-25187
> Project: HBase
>  Issue Type: Improvement
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.3
>
>
> Currently in SizeCachedKV we read the row length and key length from the 
> buffers. This can be optimized: the reader already knows the key length and 
> row length when it creates the cell from the block, so it can pass them in 
> directly. Sometimes we see SizeCachedKV taking the widest frame in a flame 
> graph, especially considering we also do a sanity check on the created KV. 





[jira] [Commented] (HBASE-25191) JVMMetrics tag.processName regression between hbase-1.3 and hbase-2.x+ versions

2020-10-15 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17215204#comment-17215204
 ] 

ramkrishna.s.vasudevan commented on HBASE-25191:


I think even if we fix processName so it no longer appears as 'IO', 
tag.processName will be 'Server' rather than 'Master' or 'RegionServer'.

> JVMMetrics tag.processName regression between hbase-1.3 and hbase-2.x+ 
> versions
> ---
>
> Key: HBASE-25191
> URL: https://issues.apache.org/jira/browse/HBASE-25191
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> The regression is caused by 
> https://issues.apache.org/jira/browse/HBASE-15160.
> To monitor the FS and pread latencies we added MetricsIO to the metrics. 
> Since we account for this at the HFileBlock layer, we created a static 
> MetricsIO variable in HFile.java so that the metrics can be used in a static 
> way.
> Internally, MetricsIO creates a MetricsIOWrapperImpl that in turn registers 
> the metrics with the BaseSource. The flow is as follows:
> {code}
> this(CompatibilitySingletonFactory.getInstance(MetricsRegionServerSourceFactory.class)
> .createIO(wrapper), wrapper);
> {code}
> createIO() in turn creates a MetricsIOSourceImpl whose metrics name is 'IO'.
> BaseSourceImpl registers a singleton JvmMetrics:
> {code}
> synchronized void init(String name) {
>   ...
>   DefaultMetricsSystem.initialize(HBASE_METRICS_SYSTEM_NAME);
>   JvmMetrics.initSingleton(name, "");
> }
> {code}
> The name passed here is 'IO'. This is where processName gets set to 'IO'.
> All other metrics we create in the RegionServer and Master are 
> instance-level, not static-level. So previously, the first metrics created 
> were either the Master or RegionServer metrics, and processName would have 
> been 'Master' or 'RegionServer' accordingly.
> Note, however, that I am not sure whether we still create a metric based on 
> the actual process name as we did in hbase-1.x. In other words, my doubt is: 
> even if we solve this 'IO' case, do we really get processName back as 
> 'Master' or 'RegionServer' as in 
> https://issues.apache.org/jira/browse/HBASE-12328? 





[jira] [Created] (HBASE-25191) JVMMetrics tag.processName regression between hbase1.3 and hbase-2+ versions

2020-10-15 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25191:
--

 Summary: JVMMetrics tag.processName regression between hbase1.3 
and hbase-2+ versions
 Key: HBASE-25191
 URL: https://issues.apache.org/jira/browse/HBASE-25191
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.0.1
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


The regression is caused by 
https://issues.apache.org/jira/browse/HBASE-15160.
To monitor the FS and pread latencies we added MetricsIO to the metrics. Since 
we account for this at the HFileBlock layer, we created a static MetricsIO 
variable in HFile.java so that the metrics can be used in a static way.
Internally, MetricsIO creates a MetricsIOWrapperImpl that in turn registers 
the metrics with the BaseSource. The flow is as follows:
{code}
this(CompatibilitySingletonFactory.getInstance(MetricsRegionServerSourceFactory.class)
.createIO(wrapper), wrapper);
{code}
createIO() in turn creates a MetricsIOSourceImpl whose metrics name is 'IO'.
BaseSourceImpl registers a singleton JvmMetrics:
{code}
synchronized void init(String name) {
  ...
  DefaultMetricsSystem.initialize(HBASE_METRICS_SYSTEM_NAME);
  JvmMetrics.initSingleton(name, "");
}
{code}
The name passed here is 'IO'. This is where processName gets set to 'IO'.
All other metrics we create in the RegionServer and Master are instance-level, 
not static-level. So previously, the first metrics created were either the 
Master or RegionServer metrics, and processName would have been 'Master' or 
'RegionServer' accordingly.
Note, however, that I am not sure whether we still create a metric based on 
the actual process name as we did in hbase-1.x. In other words, my doubt is: 
even if we solve this 'IO' case, do we really get processName back as 'Master' 
or 'RegionServer' as in https://issues.apache.org/jira/browse/HBASE-12328? 
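The first-caller-wins behavior described above can be sketched in a self-contained way. The class below is a stand-in that mimics the shape of Hadoop's JvmMetrics.initSingleton(), not the real API; the field and method names are assumptions made for the illustration.

```java
// Self-contained sketch of the singleton-naming pitfall: the first caller of
// initSingleton() fixes the process name for everyone, so a static 'IO'
// metrics source created before the Master/RegionServer metrics stamps
// tag.processName as 'IO'.
public class JvmMetricsSketch {
    private static JvmMetricsSketch instance;

    final String processName;

    private JvmMetricsSketch(String processName) {
        this.processName = processName;
    }

    static synchronized JvmMetricsSketch initSingleton(String processName) {
        if (instance == null) {
            instance = new JvmMetricsSketch(processName); // first caller wins
        }
        return instance; // later names are silently ignored
    }

    public static void main(String[] args) {
        // Static HFile metrics initialize first with the name "IO"...
        initSingleton("IO");
        // ...so the RegionServer's later registration cannot change the tag.
        JvmMetricsSketch rs = initSingleton("RegionServer");
        System.out.println(rs.processName); // prints "IO"
    }
}
```

This matches the observed symptom: whichever metrics source happens to register first determines tag.processName for the whole process.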





[jira] [Updated] (HBASE-25191) JVMMetrics tag.processName regression between hbase-1.3 and hbase-2.x+ versions

2020-10-15 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-25191:
---
Summary: JVMMetrics tag.processName regression between hbase-1.3 and 
hbase-2.x+ versions  (was: JVMMetrics tag.processName regression between 
hbase1.3 and hbase-2+ versions)

> JVMMetrics tag.processName regression between hbase-1.3 and hbase-2.x+ 
> versions
> ---
>
> Key: HBASE-25191
> URL: https://issues.apache.org/jira/browse/HBASE-25191
> Project: HBase
>  Issue Type: Bug
>Affects Versions: 2.0.1
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> The regression is caused by 
> https://issues.apache.org/jira/browse/HBASE-15160.
> To monitor the FS and pread latencies we added MetricsIO to the metrics. 
> Since we account for this at the HFileBlock layer, we created a static 
> MetricsIO variable in HFile.java so that the metrics can be used in a static 
> way.
> Internally, MetricsIO creates a MetricsIOWrapperImpl that in turn registers 
> the metrics with the BaseSource. The flow is as follows:
> {code}
> this(CompatibilitySingletonFactory.getInstance(MetricsRegionServerSourceFactory.class)
> .createIO(wrapper), wrapper);
> {code}
> createIO() in turn creates a MetricsIOSourceImpl whose metrics name is 'IO'.
> BaseSourceImpl registers a singleton JvmMetrics:
> {code}
> synchronized void init(String name) {
>   ...
>   DefaultMetricsSystem.initialize(HBASE_METRICS_SYSTEM_NAME);
>   JvmMetrics.initSingleton(name, "");
> }
> {code}
> The name passed here is 'IO'. This is where processName gets set to 'IO'.
> All other metrics we create in the RegionServer and Master are 
> instance-level, not static-level. So previously, the first metrics created 
> were either the Master or RegionServer metrics, and processName would have 
> been 'Master' or 'RegionServer' accordingly.
> Note, however, that I am not sure whether we still create a metric based on 
> the actual process name as we did in hbase-1.x. In other words, my doubt is: 
> even if we solve this 'IO' case, do we really get processName back as 
> 'Master' or 'RegionServer' as in 
> https://issues.apache.org/jira/browse/HBASE-12328? 





[jira] [Created] (HBASE-25187) Improve SizeCachedKV variants initialization

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25187:
--

 Summary: Improve SizeCachedKV variants initialization
 Key: HBASE-25187
 URL: https://issues.apache.org/jira/browse/HBASE-25187
 Project: HBase
  Issue Type: Improvement
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 3.0.0-alpha-1, 2.3.3


Currently in SizeCachedKV we read the row length and key length from the 
buffers. This can be optimized: the reader already knows the key length and 
row length when it creates the cell from the block, so it can pass them in 
directly. Sometimes we see SizeCachedKV taking the widest frame in a flame 
graph, especially considering we also do a sanity check on the created KV. 
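The proposed change can be illustrated with a simplified stand-in. This is not the real SizeCachedKeyValue: the class name, the buffer layout (key length at offset 0, row length at offset 4), and the constructors are assumptions made to show the idea of passing already-known lengths instead of re-decoding them.

```java
import java.nio.ByteBuffer;

// Illustrative sketch of the optimization: instead of re-decoding key length
// and row length from the backing buffer in the constructor, accept them as
// parameters, since the block reader already parsed both.
public class SizeCachedCell {
    private final ByteBuffer buf;
    private final int offset;
    private final int keyLen;   // cached: no repeated buffer decode
    private final short rowLen; // cached: no repeated buffer decode

    // Old style: lengths re-read from the buffer on construction.
    SizeCachedCell(ByteBuffer buf, int offset) {
        this(buf, offset, buf.getInt(offset), buf.getShort(offset + 4));
    }

    // Proposed style: the reader passes the lengths it already knows.
    SizeCachedCell(ByteBuffer buf, int offset, int keyLen, short rowLen) {
        this.buf = buf;
        this.offset = offset;
        this.keyLen = keyLen;
        this.rowLen = rowLen;
    }

    int getKeyLength() {
        return keyLen;
    }

    short getRowLength() {
        return rowLen;
    }
}
```

Both constructors yield the same cached values; the second simply avoids the redundant buffer reads on the hot read path.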





[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213808#comment-17213808
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


[~zhangduo] I just raised an issue 
https://issues.apache.org/jira/browse/HBASE-25186. Marked it as blocker.

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 
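The proposal above can be sketched as follows. This is a hypothetical, simplified illustration, not the committed patch: the class name is invented, and the rename is represented by a comment since the real work would be a FileSystem.rename() call.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch of the proposal: hand the archival renames to a single background
// thread instead of performing them one by one inside rollWriterLock.
public class AsyncWalArchiver {
    private final ExecutorService archiveExecutor =
        Executors.newSingleThreadExecutor();

    // Called from the roller while still holding rollWriterLock: this only
    // queues the work, so the lock is released immediately after the roll.
    Future<Integer> archiveAsync(List<String> walsToArchive) {
        return archiveExecutor.submit(() -> {
            int archived = 0;
            for (String wal : walsToArchive) {
                // FileSystem.rename(wal, oldWALsDir) would go here; on cloud
                // stores each rename is a costly metadata operation, so this
                // loop is also the natural place to batch or throttle.
                archived++;
            }
            return archived;
        });
    }

    void shutdown() {
        archiveExecutor.shutdown();
    }
}
```

A single-threaded executor keeps the renames ordered while moving them off the roll path; throttling or batching could then be layered inside the submitted task.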





[jira] [Created] (HBASE-25186) TestMasterRegionOnTwoFileSystems is failing after HBASE-25065

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25186:
--

 Summary: TestMasterRegionOnTwoFileSystems is failing after 
HBASE-25065
 Key: HBASE-25186
 URL: https://issues.apache.org/jira/browse/HBASE-25186
 Project: HBase
  Issue Type: Bug
Affects Versions: 3.0.0-alpha-1, 2.4.0
Reporter: ramkrishna.s.vasudevan


After HBASE-25065, we have a test case failure in 
TestMasterRegionOnTwoFileSystems. 
The reason is that we manually trigger a WAL roll on the master region. As part 
of the WAL roll we expect the master region's WAL to be moved from the region 
oldWAL dir to the global oldWAL directory. This happens in the afterRoll() 
method in AbstractWALRoller. 
Since WAL archival is now asynchronous, the afterRoll() method does not find 
any WAL file to move in the local region oldWAL dir, so the move to the global 
oldWAL dir does not happen. 
The test case checks for the file in the oldWAL dir and, since it is not found, 
the test times out. We need a way to fix this. 
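One possible shape for a fix, sketched below as an assumption rather than the actual patch: run the afterRoll() callback from the archival task itself, so it only fires once the file has really been moved. The class and method names here are invented for the illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

// Hypothetical sketch: chain afterRoll() after the async archival completes,
// so the callback always observes the archived file.
public class ArchiveThenAfterRoll {
    static String lastArchived;

    static CompletableFuture<Void> archive(String wal, Runnable afterRoll,
                                           Executor executor) {
        return CompletableFuture
            // Stand-in for the rename of `wal` into the oldWAL dir.
            .runAsync(() -> lastArchived = wal, executor)
            // afterRoll() now runs only after the move has happened.
            .thenRun(afterRoll);
    }
}
```

Chaining the callback this way preserves the ordering the test relies on without forcing the roll path to wait for the rename.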





[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213679#comment-17213679
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


It is not a logical issue; it is just that, at the point in the flow where 
afterRoll() is called, the archival has not yet completed. 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 





[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213678#comment-17213678
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


The problem is related to the earlier test case failure. AbstractWALRoller in 
master and branch-2 has an afterRoll() method, mainly for the master region. 
It relies on the WAL roll and the archival happening synchronously: it expects 
the WAL file to be present in the walArchive path, and if it is, it moves it to 
the globalArchive path. Now that archival is async, the file is not there when 
afterRoll() runs, and that causes the test case failure. 
[~zhangduo] - Is it possible to move this afterRoll() into the async archive 
method itself? But to do that we would need the LogRoller to have access to the 
WAL implementation classes. 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 





[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-13 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17213604#comment-17213604
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


[~zhangduo] Thanks for pointing this out. I was not available yesterday; will 
have a look at this. I found a failing test in branch-2 and fixed it there, but 
not in master. Let me check. 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 





[jira] [Resolved] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-12 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-25065.

Fix Version/s: 2.4.0
   3.0.0-alpha-1
   Resolution: Fixed

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 





[jira] [Commented] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17211832#comment-17211832
 ] 

ramkrishna.s.vasudevan commented on HBASE-25065:


[~ndimiduk] and [~zghao]
Do you want this in 2.3 and 2.2 branches? 

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> Currently we do clean up of logs once we ensure that the region data has been 
> flushed. We track the sequence number and if we ensure that the seq number 
> has been flushed for any given region and the WAL that was rolled has that 
> seq number then those WAL can be archived.
> When we have around ~50 files to archive (per RS) - we do the archiving one 
> after the other. Since archiving is nothing but a rename operation it adds to 
> the meta operation load of Cloud based FS. 
> Not only that - the entire archival is done inside the rollWriterLock. Though 
> we have closed the writer and created a new writer and the writes are ongoing 
> - we never release the lock until we are done with the archiving. 
> What happens is that during that period our logs grow in size compared to the 
> default size configured (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and ensure 
> we can do some kind of throttling or batching so that we don't do archival at 
> one shot. 





[jira] [Updated] (HBASE-25065) WAL archival to be done by a separate thread

2020-10-06 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-25065:
---
Summary: WAL archival to be done by a separate thread  (was: WAL archival 
can be batched/throttled and also done by a separate thread)

> WAL archival to be done by a separate thread
> 
>
> Key: HBASE-25065
> URL: https://issues.apache.org/jira/browse/HBASE-25065
> Project: HBase
>  Issue Type: Improvement
>  Components: wal
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
>
> Currently we clean up logs once we ensure that the region data has been 
> flushed. We track the sequence number, and if the sequence number for a given 
> region has been flushed and the rolled WAL only carries entries up to that 
> sequence number, then that WAL can be archived.
> When we have around ~50 files to archive (per RS), we do the archiving one 
> after the other. Since archiving is nothing but a rename operation, it adds 
> to the metadata-operation load of cloud-based filesystems. 
> Not only that, the entire archival is done inside the rollWriterLock. Even 
> though we have closed the old writer, created a new one, and writes are 
> ongoing, we never release the lock until we are done with the archiving. 
> As a result, during that period our logs grow beyond the configured default 
> size (when we have consistent writes happening). 
> So the proposal is to move the log archival to a separate thread and add 
> some kind of throttling or batching so that we don't do the archival in one 
> shot. 





[jira] [Commented] (HBASE-25135) Convert the internal separator while emitting the memstore read metrics to #

2020-10-01 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17205505#comment-17205505
 ] 

ramkrishna.s.vasudevan commented on HBASE-25135:


[~zhangduo]
No. That is why I am only changing the internal separator that was added; the 
external metric is still the same. This is just a logic change. (I actually 
thought of changing it and raised a JIRA, but I closed it just because I did 
not want to change the actual metric.)

> Convert the internal separator while emitting the memstore read metrics to #
> 
>
> Key: HBASE-25135
> URL: https://issues.apache.org/jira/browse/HBASE-25135
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7
>
>
> Convert the '_' separator used when forming the metric for memstore reads and 
> mixed reads to '#'.
> This avoids ambiguity when the column family name itself contains '_'.





[jira] [Resolved] (HBASE-25135) Convert the internal separator while emitting the memstore read metrics to #

2020-10-01 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-25135.

Fix Version/s: 2.2.7
   2.4.0
   2.3.3
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Convert the internal separator while emitting the memstore read metrics to #
> 
>
> Key: HBASE-25135
> URL: https://issues.apache.org/jira/browse/HBASE-25135
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.7
>
>
> Convert the '_' separator used when forming the metric for memstore reads and 
> mixed reads to '#'.
> This avoids ambiguity when the column family name itself contains '_'.





[jira] [Created] (HBASE-25135) Convert the internal separator while emitting the memstore read metrics to #

2020-09-30 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25135:
--

 Summary: Convert the internal separator while emitting the 
memstore read metrics to #
 Key: HBASE-25135
 URL: https://issues.apache.org/jira/browse/HBASE-25135
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


Convert the '_' separator used when forming the metric for memstore reads and 
mixed reads to '#'.
This avoids ambiguity when the column family name itself contains '_'.
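The ambiguity can be shown with a tiny sketch. The helper below is hypothetical (MetricKey, key are illustrative names, not HBase code); it only demonstrates why '_' cannot be split back reliably while '#' can.

```java
// Hypothetical illustration: joining with '_' is ambiguous when the column
// family name itself contains '_'; '#' keeps the parts recoverable.
public class MetricKey {
  static String key(String table, String family, String metric, char sep) {
    return table + sep + family + sep + metric;
  }

  public static void main(String[] args) {
    // With '_' you cannot tell where the family name ends:
    System.out.println(key("t1", "cf_a", "memstoreGet", '_')); // t1_cf_a_memstoreGet
    // With '#' the three parts split back unambiguously:
    System.out.println(key("t1", "cf_a", "memstoreGet", '#')); // t1#cf_a#memstoreGet
  }
}
```

Splitting "t1#cf_a#memstoreGet" on '#' yields exactly three parts, while splitting the '_' form yields four and loses the family boundary.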





[jira] [Resolved] (HBASE-25131) While emitting the memstore read based metrics use # instead of _ while emitting the metric

2020-09-30 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-25131.

Hadoop Flags:   (was: Incompatible change)
  Resolution: Won't Fix

I think if we change the format it looks odd. Let me handle it in the code: if 
the table name has '_', the family name has to be retrieved correctly. 

> While emitting the memstore read based metrics use # instead of _ while 
> emitting the metric
> 
>
> Key: HBASE-25131
> URL: https://issues.apache.org/jira/browse/HBASE-25131
> Project: HBase
>  Issue Type: Bug
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.2.7
>
>
> We do have table names with '_' in them, so '#' should be used as the 
> separator when we emit the metric. 
> Usually '_' is used for all other metric details, but since here we group 
> the metric by table and store name, it is better to use a separator other 
> than '_' for these metrics. 





[jira] [Created] (HBASE-25131) While emitting the memstore read based metrics use # instead of _ while emitting the metric

2020-09-30 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25131:
--

 Summary: While emitting the memstore read based metrics use # 
instead of _ while emitting the metric
 Key: HBASE-25131
 URL: https://issues.apache.org/jira/browse/HBASE-25131
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 3.0.0-alpha-1, 2.3.3, 2.2.7


We do have table names with '_' in them, so '#' should be used as the 
separator when we emit the metric. 
Usually '_' is used for all other metric details, but since here we group the 
metric by table and store name, it is better to use a separator other than 
'_' for these metrics. 





[jira] [Created] (HBASE-25065) WAL archival can be batched/throttled and also done by a separate thread

2020-09-18 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25065:
--

 Summary: WAL archival can be batched/throttled and also done by a 
separate thread
 Key: HBASE-25065
 URL: https://issues.apache.org/jira/browse/HBASE-25065
 Project: HBase
  Issue Type: Improvement
  Components: wal
Affects Versions: 3.0.0-alpha-1, 2.4.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


Currently we clean up logs once we ensure that the region data has been 
flushed. We track the sequence number, and if the sequence number for a given 
region has been flushed and the rolled WAL only carries entries up to that 
sequence number, then that WAL can be archived.
When we have around ~50 files to archive (per RS), we do the archiving one 
after the other. Since archiving is nothing but a rename operation, it adds to 
the metadata-operation load of cloud-based filesystems. 
Not only that, the entire archival is done inside the rollWriterLock. Even 
though we have closed the old writer, created a new one, and writes are 
ongoing, we never release the lock until we are done with the archiving. 
As a result, during that period our logs grow beyond the configured default 
size (when we have consistent writes happening). 
So the proposal is to move the log archival to a separate thread and add some 
kind of throttling or batching so that we don't do the archival in one 
shot. 





[jira] [Commented] (HBASE-25015) PerformanceEvaluation with presplit randomWrite test has severe hotspotting

2020-09-16 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17197396#comment-17197396
 ] 

ramkrishna.s.vasudevan commented on HBASE-25015:


[~ndimiduk]
I am trying randomWrites on a 3-node cluster with 2 PE clients of 20 threads 
each; the PE clients work on 2 different tables, each with 200 presplits. The 
requests per second seem to be evenly distributed. The cluster is based on 
2.1, though. 

> PerformanceEvaluation with presplit randomWrite test has severe hotspotting
> ---
>
> Key: HBASE-25015
> URL: https://issues.apache.org/jira/browse/HBASE-25015
> Project: HBase
>  Issue Type: Bug
>  Components: test
>Affects Versions: 2.3.1
>Reporter: Nick Dimiduk
>Priority: Major
>
> I'm attempting to apply a load to a small distributed cluster (12 RS). I use 
> {{-presplit}} and specify 3 regions per region server. {{randomWrite}} test 
> with {{-nomapred}} and 30 client threads. The result is severe hot-spotting 
> on a single region region (10's of thousands of reqs/sec) and minimal load 
> (high 10's to low 100's reqs/sec) to the others. It seems the split algorithm 
> and the load generator do not agree on an even data distribution.





[jira] [Created] (HBASE-25050) We initialize Filesystems more than once.

2020-09-16 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25050:
--

 Summary: We initialize Filesystems more than once.
 Key: HBASE-25050
 URL: https://issues.apache.org/jira/browse/HBASE-25050
 Project: HBase
  Issue Type: Bug
Affects Versions: 2.2.6, 2.3.1, 3.0.0-alpha-1, 2.4.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


In HFileSystem
{code}
// Create the default filesystem with checksum verification switched on.
// By default, any operation to this FilterFileSystem occurs on
// the underlying filesystem that has checksums switched on.
this.fs = FileSystem.get(conf);
this.useHBaseChecksum = useHBaseChecksum;

fs.initialize(getDefaultUri(conf), conf);
{code}
We call fs.initialize(). Generally, the FS would already have been created and 
initialized, either in the FileSystem.get() call above or even earlier, when 
we check
{code}
  FileSystem fs = p.getFileSystem(c);
{code}
The FS that gets cached in the hadoop-common layer does the init for us, so 
doing it again is redundant. 
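The redundancy follows from the cache semantics, which can be sketched as below. This is a simplified stand-in for the hadoop-common FileSystem cache (FsCache, Fs, initCalls are invented names), not the real Hadoop implementation: get() initializes the instance exactly once inside the cache, so a second explicit initialize() on the returned object repeats work.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for the hadoop-common FileSystem cache: the first
// get() for a URI creates and initializes the instance; later callers get
// the already-initialized object, so calling initialize() again is redundant.
public class FsCache {
  static class Fs {
    int initCalls = 0;
    void initialize(String uri) { initCalls++; }
  }

  private final Map<String, Fs> cache = new ConcurrentHashMap<>();

  Fs get(String uri) {
    return cache.computeIfAbsent(uri, u -> {
      Fs fs = new Fs();
      fs.initialize(u); // done exactly once, inside the cache
      return fs;
    });
  }

  public static void main(String[] args) {
    FsCache c = new FsCache();
    Fs fs = c.get("hdfs://ns1");
    fs.initialize("hdfs://ns1"); // the redundant second init, as in HFileSystem
    System.out.println(fs.initCalls); // prints 2: one init would have sufficed
  }
}
```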





[jira] [Updated] (HBASE-25026) Create a metric to track scans that have no start row and/or stop row

2020-09-14 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25026?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-25026:
---
Description: 
A metric that indicates how many of the scan requests came without a start row 
and/or a stop row. Generally such queries may be wrongly written or may point 
to a need for better schema design; some may be sanity checks verifying that 
the application logic has done the necessary updates and that all the expected 
rows are processed. 

We do have some logs at the RPC layer to see which queries take time, but 
nothing exposed as a metric. 

  was:
A metric that indicates how many of the scan requests came without a start row 
and/or a stop row. Generally such queries may be wrongly written or may point 
to a need for better schema design; some may be sanity checks verifying that 
the application logic has done the necessary updates and that all the expected 
rows are processed. 

We do have some logs at the RPC layer to see which queries take time, but 
nothing exposed as a metric. 


> Create a metric to track scans that have no start row and/or stop row
> -
>
> Key: HBASE-25026
> URL: https://issues.apache.org/jira/browse/HBASE-25026
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 3.0.0-alpha-1, 2.4.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Minor
>
> A metric that indicates how many of the scan requests came without a start 
> row and/or a stop row. Generally such queries may be wrongly written or may 
> point to a need for better schema design; some may be sanity checks 
> verifying that the application logic has done the necessary updates and that 
> all the expected rows are processed. 
> We do have some logs at the RPC layer to see which queries take time, but 
> nothing exposed as a metric. 





[jira] [Created] (HBASE-25026) Create a metric to track scans that have no start row and/or stop row

2020-09-14 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25026:
--

 Summary: Create a metric to track scans that have no start row 
and/or stop row
 Key: HBASE-25026
 URL: https://issues.apache.org/jira/browse/HBASE-25026
 Project: HBase
  Issue Type: Improvement
Affects Versions: 3.0.0-alpha-1, 2.4.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan


A metric that indicates how many of the scan requests came without a start row 
and/or a stop row. Generally such queries may be wrongly written or may point 
to a need for better schema design; some may be sanity checks verifying that 
the application logic has done the necessary updates and that all the expected 
rows are processed. 

We do have some logs at the RPC layer to see which queries take time, but 
nothing exposed as a metric. 
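The counter itself is simple, as the sketch below shows. All names here (OpenScanMetric, onScan) are hypothetical, not the real region server metrics source; the only HBase-specific assumption is that an empty byte[] means "unbounded" on that side of a Scan.

```java
import java.util.concurrent.atomic.LongAdder;

// Hypothetical sketch: count scans that lack a start row and/or a stop row.
// In HBase's Scan API an empty byte[] means "unbounded" on that side.
public class OpenScanMetric {
  private final LongAdder unboundedScans = new LongAdder();

  void onScan(byte[] startRow, byte[] stopRow) {
    if (startRow.length == 0 || stopRow.length == 0) {
      unboundedScans.increment();
    }
  }

  long count() { return unboundedScans.sum(); }

  public static void main(String[] args) {
    OpenScanMetric m = new OpenScanMetric();
    byte[] empty = new byte[0];
    m.onScan(empty, empty);                   // no start, no stop: counted
    m.onScan("a".getBytes(), empty);          // no stop row: counted
    m.onScan("a".getBytes(), "z".getBytes()); // fully bounded: not counted
    System.out.println(m.count()); // prints 2
  }
}
```

LongAdder keeps the hot scan path cheap under contention, which matters if this check runs on every scan open.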





[jira] [Commented] (HBASE-25002) Create simple pattern matching query for retrieving metrics matching the pattern

2020-09-14 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-25002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17195484#comment-17195484
 ] 

ramkrishna.s.vasudevan commented on HBASE-25002:


Pushed to master and branch-2. Since it is an improvement, not pushing to 
branch-2.2 and branch-2.3. 

> Create simple pattern matching query for retrieving metrics matching the 
> pattern
> 
>
> Key: HBASE-25002
> URL: https://issues.apache.org/jira/browse/HBASE-25002
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we allow querying a metric by a specific metric name. But 
> generally, say under the MBean sub=Tables on a server, we might have 'n' 
> tables and we might be interested in a specific metric across all of them. 
> In such cases a simple pattern-based query helps, so that we can retrieve 
> all metrics matching the pattern.
> A side benefit is that it also reduces the size of the JSON we pull from the 
> server to the querying client, which is often a small script that may not be 
> able to process a very large response JSON.
> Thanks [~anoopsamjohn] for the suggestion. 





[jira] [Resolved] (HBASE-25002) Create simple pattern matching query for retrieving metrics matching the pattern

2020-09-14 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-25002.

Hadoop Flags: Reviewed
  Resolution: Fixed

> Create simple pattern matching query for retrieving metrics matching the 
> pattern
> 
>
> Key: HBASE-25002
> URL: https://issues.apache.org/jira/browse/HBASE-25002
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we allow querying a metric by a specific metric name. But 
> generally, say under the MBean sub=Tables on a server, we might have 'n' 
> tables and we might be interested in a specific metric across all of them. 
> In such cases a simple pattern-based query helps, so that we can retrieve 
> all metrics matching the pattern.
> A side benefit is that it also reduces the size of the JSON we pull from the 
> server to the querying client, which is often a small script that may not be 
> able to process a very large response JSON.
> Thanks [~anoopsamjohn] for the suggestion. 





[jira] [Updated] (HBASE-25002) Create simple pattern matching query for retrieving metrics matching the pattern

2020-09-14 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-25002:
---
Fix Version/s: (was: 2.2.6)
   (was: 2.3.3)

> Create simple pattern matching query for retrieving metrics matching the 
> pattern
> 
>
> Key: HBASE-25002
> URL: https://issues.apache.org/jira/browse/HBASE-25002
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.4.0
>
>
> Currently we allow querying a metric by a specific metric name. But 
> generally, say under the MBean sub=Tables on a server, we might have 'n' 
> tables and we might be interested in a specific metric across all of them. 
> In such cases a simple pattern-based query helps, so that we can retrieve 
> all metrics matching the pattern.
> A side benefit is that it also reduces the size of the JSON we pull from the 
> server to the querying client, which is often a small script that may not be 
> able to process a very large response JSON.
> Thanks [~anoopsamjohn] for the suggestion. 





[jira] [Updated] (HBASE-25002) Create simple pattern matching query for retrieving metrics matching the pattern

2020-09-08 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-25002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-25002:
---
Description: 
Currently we allow querying a metric by a specific metric name. But generally, 
say under the MBean sub=Tables on a server, we might have 'n' tables and we 
might be interested in a specific metric across all of them. 
In such cases a simple pattern-based query helps, so that we can retrieve all 
metrics matching the pattern.
A side benefit is that it also reduces the size of the JSON we pull from the 
server to the querying client, which is often a small script that may not be 
able to process a very large response JSON.

Thanks [~anoopsamjohn] for the suggestion. 

  was:
Currently we allow querying a metric by a specific metric name. But generally, 
say under the MBean sub=Tables on a server, we might have 'n' tables and we 
might be interested in a specific metric across all of them. 
In such cases a simple pattern-based query helps, so that we can retrieve all 
metrics matching the pattern.
A side benefit is that it also reduces the size of the JSON we pull from the 
server to the querying client, which is often a small script that may not be 
able to process a very large response JSON.


> Create simple pattern matching query for retrieving metrics matching the 
> pattern
> 
>
> Key: HBASE-25002
> URL: https://issues.apache.org/jira/browse/HBASE-25002
> Project: HBase
>  Issue Type: Improvement
>Affects Versions: 2.2.5
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.6
>
>
> Currently we allow querying a metric by a specific metric name. But 
> generally, say under the MBean sub=Tables on a server, we might have 'n' 
> tables and we might be interested in a specific metric across all of them. 
> In such cases a simple pattern-based query helps, so that we can retrieve 
> all metrics matching the pattern.
> A side benefit is that it also reduces the size of the JSON we pull from the 
> server to the querying client, which is often a small script that may not be 
> able to process a very large response JSON.
> Thanks [~anoopsamjohn] for the suggestion. 





[jira] [Created] (HBASE-25002) Create simple pattern matching query for retrieving metrics matching the pattern

2020-09-08 Thread ramkrishna.s.vasudevan (Jira)
ramkrishna.s.vasudevan created HBASE-25002:
--

 Summary: Create simple pattern matching query for retrieving 
metrics matching the pattern
 Key: HBASE-25002
 URL: https://issues.apache.org/jira/browse/HBASE-25002
 Project: HBase
  Issue Type: Improvement
Affects Versions: 2.2.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 3.0.0-alpha-1, 2.3.3, 2.4.0, 2.2.6


Currently we allow querying a metric by a specific metric name. But generally, 
say under the MBean sub=Tables on a server, we might have 'n' tables and we 
might be interested in a specific metric across all of them. 
In such cases a simple pattern-based query helps, so that we can retrieve all 
metrics matching the pattern.
A side benefit is that it also reduces the size of the JSON we pull from the 
server to the querying client, which is often a small script that may not be 
able to process a very large response JSON.
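Such a query can be sketched as a regex filter over a flat name-to-value metric map, as a JMX JSON dump would provide. Everything below (MetricQuery, the sample metric names) is illustrative, not the actual HBase JMX endpoint code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical sketch: given a flat name -> value metric map, keep only the
// entries whose name matches the requested pattern, so the client gets a
// small response instead of the full JSON dump.
public class MetricQuery {
  static Map<String, Long> query(Map<String, Long> metrics, String regex) {
    Pattern p = Pattern.compile(regex);
    Map<String, Long> out = new LinkedHashMap<>();
    for (Map.Entry<String, Long> e : metrics.entrySet()) {
      if (p.matcher(e.getKey()).find()) {
        out.put(e.getKey(), e.getValue());
      }
    }
    return out;
  }

  public static void main(String[] args) {
    Map<String, Long> metrics = new LinkedHashMap<>();
    metrics.put("Namespace_default_table_t1_metric_memstoreGetCount", 10L);
    metrics.put("Namespace_default_table_t2_metric_memstoreGetCount", 7L);
    metrics.put("Namespace_default_table_t1_metric_storeFileCount", 3L);
    // One metric across all tables, instead of the whole bean:
    System.out.println(query(metrics, "memstoreGetCount$").keySet());
  }
}
```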





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-09-03 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17189901#comment-17189901
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


[~brfrn169] - I am currently working on this. I am stuck with a few other 
things internally; I will resume this work shortly and post a patch ASAP.

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.4.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue 
> is to address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175256#comment-17175256
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


BTW thanks [~sreenivasulureddy] for your tests here. We can try to solve this 
problem with the CellComparator alone and check once again whether it helps 
the perf here. 

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test input: 
> 1: Table with 500 regions (300 column families)
> 2: data = 2 TB
> Data Sample
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 masters + 5 region servers)
>  4: No. of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, where the test input is the same 
> for both clusters
>  
> |Feature|HBase 2.2.3
>  Time(Sec)|HBase 1.3.1
>  Time(Sec)|Diff%|Snappy lib:
>   |
> |BulkLoad|21837|19686.16|-10.93|Snappy lib:
>  HBase 2.2.3: 1.4
>  HBase 1.3.1: 1.4|





[jira] [Commented] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175246#comment-17175246
 ] 

ramkrishna.s.vasudevan commented on HBASE-24850:


Thanks for raising this. I was waiting for [~sreenivasulureddy] to check the 
other MR-based change before I could raise a JIRA. 
I did add a KV-only path in the CellComparator, like the BBKV path, and that 
helps a lot. We get good numbers like the other patch in 
https://issues.apache.org/jira/browse/HBASE-24754
This is critical in the write-to-memstore case and in the flush case, where we 
read from the memstore. Currently the memstore is a pure KV-only path.  
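The idea of a type-specialized fast path can be sketched as below. The classes here (TypedCompare and its nested Cell/KeyValue) are hypothetical stand-ins, not HBase's actual Cell, KeyValue, or CellComparatorImpl; the sketch only shows the dispatch pattern: when both cells are the same concrete type, compare via direct field access instead of the generic interface.

```java
import java.util.Arrays;

// Hypothetical sketch of a "KV-only path": when both cells are the same
// concrete type, dispatch to a specialized compare that avoids the generic,
// potentially megamorphic accessor calls of the Cell interface.
public class TypedCompare {
  interface Cell { byte[] row(); }

  static class KeyValue implements Cell {
    final byte[] row;
    KeyValue(byte[] row) { this.row = row; }
    public byte[] row() { return row; }
  }

  static int compare(Cell a, Cell b) {
    if (a instanceof KeyValue && b instanceof KeyValue) {
      // Fast path: direct field access, monomorphic call sites.
      return Arrays.compare(((KeyValue) a).row, ((KeyValue) b).row);
    }
    // Generic slow path through the interface.
    return Arrays.compare(a.row(), b.row());
  }

  public static void main(String[] args) {
    System.out.println(compare(new KeyValue("a".getBytes()),
                               new KeyValue("b".getBytes())) < 0); // prints true
  }
}
```

The win in practice comes from the JIT being able to inline the monomorphic fast path on hot compare loops such as memstore inserts and flush reads.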




> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.4.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue 
> is to address and optimize compares generally in CellComparatorImpl itself.





[jira] [Assigned] (HBASE-24850) CellComparator perf improvement

2020-08-10 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-24850:
--

Assignee: ramkrishna.s.vasudevan

> CellComparator perf improvement
> ---
>
> Key: HBASE-24850
> URL: https://issues.apache.org/jira/browse/HBASE-24850
> Project: HBase
>  Issue Type: Improvement
>  Components: Performance, scan
>Affects Versions: 2.0.0
>Reporter: Anoop Sam John
>Assignee: ramkrishna.s.vasudevan
>Priority: Critical
> Fix For: 2.4.0
>
>
> We have multiple perf issues in 2.x versions compared to 1.x, e.g. 
> HBASE-24754, HBASE-24637.
> The pattern is clear: wherever we do more and more Cell compares, there is 
> some degradation. In HBASE-24754, with an old KVComparator-style comparator, 
> we see much better perf for the PutSortReducer. (Again, the gain is huge 
> because of the large number of compare ops that test is doing.) This issue 
> is to address and optimize compares generally in CellComparatorImpl itself.





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175244#comment-17175244
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


The patch here is basically to see whether the problem is due to the 
comparator. Yes, we can try to solve the CellComparator generically, but at 
least, I believe, we can add a KV-only path to cover the write cases, because 
the memstore and flush paths will use it heavily. My initial tests reveal that 
this helps solve the perf problem there (similar to how we have a BBKV path 
in the comparator).

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Attachments: Branc2_withComparator_atKeyValue.patch, 
> Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch, flamegraph_branch-1_new.svg, 
> flamegraph_branch-2.svg, flamegraph_branch-2_afterpatch.svg
>
>
> In our test, it is observed that bulk load performance is degraded in HBase 2.
>  Test input: 
> 1: Table with 500 regions (300 column families)
> 2: data = 2 TB
> Data Sample
> 186000120150205100068110,1860001,20150205,5,404,735412,2938,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,111,1
> 3: Cluster: 7 nodes (2 masters + 5 region servers)
>  4: No. of containers launched is the same in both cases
> HBase 2 took 10% more time than HBase 1.3, where the test input is the same 
> for both clusters
>  
> |Feature|HBase 2.2.3
>  Time(Sec)|HBase 1.3.1
>  Time(Sec)|Diff%|Snappy lib:
>   |
> |BulkLoad|21837|19686.16|-10.93|Snappy lib:
>  HBase 2.2.3: 1.4
>  HBase 1.3.1: 1.4|



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-10 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17175243#comment-17175243
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


bq. This can be an issue in the overall perf issue which deals with so many Cells 
and compares. (The other 2.x perf issue of filtering cells in a range scan - 
HBASE-24637)
Yes, I think so too; I added the same observation in one of my earlier comments.




[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-07 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17173035#comment-17173035
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


bq. Flame graphs for the branch-2 with the patch, some more calls are not there 
in the attached file like compareFamilies, compareColumns and compareTimestamps.
In the attached file, I have just pasted the flame graph generated for branch-2. 
See branch-1; even that is similar. It means we are not branching out more, and 
at the same time we don't spend CPU doing multiple Bytes.toXXX() calls.



[jira] [Comment Edited] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-07 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172941#comment-17172941
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-24754 at 8/7/20, 7:08 AM:
-

Thanks [~sreenivasulureddy]
Can you see the patch at 
https://issues.apache.org/jira/secure/attachment/13009270/Branc2_withComparator_atKeyValue.patch
We changed the comparator to use the KV comparator only, because we know we are 
dealing with KeyValues only, and avoided the hierarchy that branch-2 creates. 
Also attached are the flame graphs for branch-1, branch-2, and branch-2 after 
patching. branch-2 performance was very random rather than constant; maybe that 
is due to the hierarchy tree for the CellComparator. I did try the CellComparator 
approach, avoiding all the if/else conditions in the comparator code that 
accommodate different cell types, but that also did not give consistent 
performance. After I changed to use the KV comparator directly with the above 
changes, performance became consistent and on par with branch-1.3. 
[~sreenivasulureddy] - can you please check?
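The idea in the patch can be sketched as follows. This is a simplified illustration, not actual HBase code: `FlatKeySort` and its `FLAT` comparator are hypothetical stand-ins for PutSortReducer's TreeSet and the flat KeyValue comparator, showing what it means to pin the sort to one concrete byte-level comparator instead of the polymorphic CellComparator hierarchy.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.TreeSet;

// Simplified sketch (not HBase code): pin the TreeSet to one concrete,
// flat byte comparator, analogous to using the KV comparator directly
// instead of the CellComparatorImpl hierarchy on every put().
public class FlatKeySort {
    // Stand-in for a KV-style comparator: one lexicographic pass over
    // the whole serialized key.
    static final Comparator<byte[]> FLAT = Arrays::compare;

    public static void main(String[] args) {
        TreeSet<byte[]> sorted = new TreeSet<>(FLAT);
        sorted.add("row2/cf:q1".getBytes());
        sorted.add("row1/cf:q1".getBytes());
        sorted.add("row1/cf:q0".getBytes());
        // TreeSet iterates in comparator order; first() is the smallest key.
        System.out.println(new String(sorted.first())); // prints row1/cf:q0
    }
}
```

A single concrete comparator also gives the JIT a monomorphic call site inside TreeMap.put, which may be part of why the patched branch-2 numbers became consistent.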





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-07 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172941#comment-17172941
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


[~sreenivasulureddy]
Can you see the patch at 
https://issues.apache.org/jira/secure/attachment/13009270/Branc2_withComparator_atKeyValue.patch
We changed the comparator to use the KV comparator only, because we know we are 
dealing with KeyValues only, and avoided the hierarchy that branch-2 creates. 
Also attached are the flame graphs for branch-1, branch-2, and branch-2 after 
patching. branch-2 performance was very random rather than constant; maybe that 
is due to the hierarchy tree for the CellComparator. I did try the CellComparator 
approach, avoiding all the if/else conditions in the comparator code that 
accommodate different cell types, but that also did not give consistent 
performance. After I changed to use the KV comparator directly with the above 
changes, performance became consistent and on par with branch-1.3. 
[~sreenivasulureddy] - can you please check?



[jira] [Assigned] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-07 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan reassigned HBASE-24754:
--

Assignee: ramkrishna.s.vasudevan



[jira] [Updated] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-07 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-24754:
---
Attachment: flamegraph_branch-2_afterpatch.svg
flamegraph_branch-2.svg
flamegraph_branch-1_new.svg
Branc2_withComparator_atKeyValue.patch



[jira] [Comment Edited] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-06 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172541#comment-17172541
 ] 

ramkrishna.s.vasudevan edited comment on HBASE-24754 at 8/6/20, 5:16 PM:
-

I was able to verify this in my local Linux VM; the significant drop is due to 
the comparator. 

branch-1.3 consistently took ~11 to 12 secs, but branch-2 varies widely 
from 15 to 22 secs. 

See the stack traces, which explain the reason.
Branch-1.3
{code}
main" #1 prio=5 os_prio=0 tid=0x7f5ffc010800 nid=0x4b0b runnable 
[0x7f6003887000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1897)
at java.util.TreeMap.put(TreeMap.java:552)
at java.util.TreeSet.add(TreeSet.java:255)
at 
org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:104)
at 
org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:157)
{code}
Whereas in the branch-2 code base:
{code}
"main" #1 prio=5 os_prio=0 tid=0x7f4a48016000 nid=0x488a runnable 
[0x7f4a507bb000]
   java.lang.Thread.State: RUNNABLE
at 
org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.toShort(Bytes.java:1533)
at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1127)
at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:)
at org.apache.hadoop.hbase.KeyValue.getRowLength(KeyValue.java:1337)
at org.apache.hadoop.hbase.KeyValue.getFamilyOffset(KeyValue.java:1353)
at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1368)
at 
org.apache.hadoop.hbase.KeyValue.getQualifierLength(KeyValue.java:1406)
at 
org.apache.hadoop.hbase.CellComparatorImpl.compareQualifiers(CellComparatorImpl.java:169)
at 
org.apache.hadoop.hbase.CellComparatorImpl.compareColumns(CellComparatorImpl.java:105)
at 
org.apache.hadoop.hbase.CellComparatorImpl.compareWithoutRow(CellComparatorImpl.java:266)
at 
org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:86)
at 
org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:67)
at 
org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:45)
at java.util.TreeMap.put(TreeMap.java:552)
at java.util.TreeSet.add(TreeSet.java:255)
at 
org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:191)
at 
org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:242)
{code}
So we do more work per comparison when we have large rows. I think something 
similar is happening in the other issue where we try to filter out a large 
number of rows during a scan (just a guess; I have not spent time on that).
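The extra per-compare work visible in the branch-2 stack can be illustrated with a toy key layout. This is an assumption for illustration only, not HBase's actual KeyValue encoding: `compareDecoding` re-reads a length prefix on every call, much like the getRowLength/Bytes.toShort calls in the trace above, while `compareFlat` makes a single pass like branch-1's KVComparator.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Simplified sketch (not HBase code). Toy key layout:
// [2-byte row length][row bytes][family bytes]
public class CompareCost {
    static byte[] encode(String row, String family) {
        byte[] r = row.getBytes(), f = family.getBytes();
        return ByteBuffer.allocate(2 + r.length + f.length)
                .putShort((short) r.length).put(r).put(f).array();
    }

    // Decoding style: re-decode the length prefix on every call (analogous
    // to KeyValue.getRowLength -> Bytes.toShort in the branch-2 trace),
    // then compare row and family pieces separately.
    static int compareDecoding(byte[] a, byte[] b) {
        int la = ByteBuffer.wrap(a).getShort();
        int lb = ByteBuffer.wrap(b).getShort();
        int cmp = Arrays.compare(a, 2, 2 + la, b, 2, 2 + lb);
        if (cmp != 0) return cmp;
        return Arrays.compare(a, 2 + la, a.length, b, 2 + lb, b.length);
    }

    // Flat style: one pass over the serialized form (analogous to the single
    // Bytes.compareTo in branch-1's KVComparator). Order-equivalent here
    // because the sample rows have equal length.
    static int compareFlat(byte[] a, byte[] b) {
        return Arrays.compare(a, b);
    }

    public static void main(String[] args) {
        byte[] k1 = encode("row1", "cf1");
        byte[] k2 = encode("row1", "cf2");
        // Both orderings agree; the decoding variant just does more work.
        System.out.println(compareDecoding(k1, k2) < 0); // prints true
        System.out.println(compareFlat(k1, k2) < 0);     // prints true
    }
}
```

With wide rows, as in the reported test, that repeated prefix decoding is paid on every TreeMap.put comparison.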



[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-06 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172557#comment-17172557
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


Oh, the code for branch-1.3 I added was from another branch. Sorry about that. 



[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-06 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172541#comment-17172541
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


I was able to verify this in my local Linux VM, and the significant drop is due 
to the comparator. 

Branch-1.3 consistently took ~11 to 12 secs, but branch-2 varies widely, from 
15 to 22 secs. 

The stack traces explain the reason. 
Branch-1.3:
{code}
"main" #1 prio=5 os_prio=0 tid=0x7f5ffc010800 nid=0x4b0b runnable [0x7f6003887000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hbase.KeyValue$KVComparator.compare(KeyValue.java:1897)
    at java.util.TreeMap.put(TreeMap.java:552)
    at java.util.TreeSet.add(TreeSet.java:255)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:104)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:157)
{code}
The code there is:
{code}
return Bytes.compareTo(left, loffset + lfamilylength, llength - lfamilylength,
    right, roffset + rfamilylength, rlength - rfamilylength);
{code}
Whereas in the branch-2 code base:
{code}
"main" #1 prio=5 os_prio=0 tid=0x7f4a48016000 nid=0x488a runnable [0x7f4a507bb000]
   java.lang.Thread.State: RUNNABLE
    at org.apache.hadoop.hbase.util.Bytes$ConverterHolder$UnsafeConverter.toShort(Bytes.java:1533)
    at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:1127)
    at org.apache.hadoop.hbase.util.Bytes.toShort(Bytes.java:)
    at org.apache.hadoop.hbase.KeyValue.getRowLength(KeyValue.java:1337)
    at org.apache.hadoop.hbase.KeyValue.getFamilyOffset(KeyValue.java:1353)
    at org.apache.hadoop.hbase.KeyValue.getFamilyLength(KeyValue.java:1368)
    at org.apache.hadoop.hbase.KeyValue.getQualifierLength(KeyValue.java:1406)
    at org.apache.hadoop.hbase.CellComparatorImpl.compareQualifiers(CellComparatorImpl.java:169)
    at org.apache.hadoop.hbase.CellComparatorImpl.compareColumns(CellComparatorImpl.java:105)
    at org.apache.hadoop.hbase.CellComparatorImpl.compareWithoutRow(CellComparatorImpl.java:266)
    at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:86)
    at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:67)
    at org.apache.hadoop.hbase.CellComparatorImpl.compare(CellComparatorImpl.java:45)
    at java.util.TreeMap.put(TreeMap.java:552)
    at java.util.TreeSet.add(TreeSet.java:255)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.reduce1Row(PutSortReducer.java:191)
    at org.apache.hadoop.hbase.mapreduce.PutSortReducer.main(PutSortReducer.java:242)
{code}
So we do more work per comparison when we have large rows. I think something 
similar is happening in the other issue where we try to filter out a large 
number of rows during a scan (just saying; I have not spent time on that).
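The two stack traces can be boiled down to a standalone sketch. This is a toy key layout with a hand-rolled decode, not the real KeyValue wire format: branch-1.3 compares with precomputed offsets in one flat pass, while the branch-2 path re-decodes a length prefix on every compare, which is the extra work showing up in the trace above.

```java
import java.util.Arrays;

public class ComparatorCostSketch {
    // Toy key layout, NOT the real KeyValue format:
    // [2-byte big-endian row length][row bytes].

    // Branch-1.3 style: offsets and lengths were computed once up front,
    // so the hot path is a single flat byte compare (Bytes.compareTo analogue).
    static int flatCompare(byte[] l, int loff, int llen, byte[] r, int roff, int rlen) {
        return Arrays.compare(l, loff, loff + llen, r, roff, roff + rlen);
    }

    // Branch-2 style, simplified: every compare re-decodes the length prefix,
    // like the repeated Bytes.toShort() calls under KeyValue.getRowLength().
    static int decodingCompare(byte[] l, byte[] r) {
        int llen = ((l[0] & 0xff) << 8) | (l[1] & 0xff);
        int rlen = ((r[0] & 0xff) << 8) | (r[1] & 0xff);
        return Arrays.compare(l, 2, 2 + llen, r, 2, 2 + rlen);
    }
}
```

Both return the same ordering; the second simply pays the decode cost on every call, which adds up inside a TreeMap insert over millions of cells.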

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Priority: Major
> Attachments: Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch
>
>

[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-06 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172520#comment-17172520
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


[~sreenivasulureddy] -  are you working on this? If not I can check this and 
provide a patch.

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Priority: Major
> Attachments: Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch
>
>





[jira] [Commented] (HBASE-24713) RS startup with FSHLog throws NPE after HBASE-21751

2020-08-05 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171493#comment-17171493
 ] 

ramkrishna.s.vasudevan commented on HBASE-24713:


Pushed to branch-2 also. 

> RS startup with FSHLog throws NPE after HBASE-21751
> ---
>
> Key: HBASE-24713
> URL: https://issues.apache.org/jira/browse/HBASE-24713
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.1.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>
> Every RS startup creates this NPE
> {code}
> [sync.1] wal.FSHLog: UNEXPECTED
> java.lang.NullPointerException
> at 
> org.apache.hadoop.hbase.regionserver.wal.FSHLog$SyncRunner.run(FSHLog.java:582)
> at java.lang.Thread.run(Thread.java:748)
> 2020-07-07 10:51:23,208 WARN  [regionserver/x:16020] wal.FSHLog: Failed 
> sync-before-close but no outstanding appends; closing 
> WALjava.lang.NullPointerException
> {code}
> The reason is that the Disruptor framework starts the SyncRunner thread, but 
> the init of the writer happens after that. A simple null check in the 
> SyncRunner will help here.
> No major damage happens, though, since we handle Throwable. It would still be 
> good to solve this. 





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-05 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17171274#comment-17171274
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


[~sreenivasulureddy] - just replace the entire code with what we have in 
branch-1.3, where we don't check for any tag and its attributes and just do:
{code}
Put p = put;
for (List<Cell> cells : p.getFamilyCellMap().values()) {
  for (Cell cell : cells) {
    KeyValue kv = KeyValueUtil.ensureKeyValueType(cell);
    if (map.add(kv)) { // don't count duplicated kv into size
      curSize += kv.heapSize();
    }
  }
}
{code}
If this still does not help, then the only issue should be with the comparator, 
but at first glance I don't find anything there. 

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Priority: Major
> Attachments: Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch
>
>





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-04 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170747#comment-17170747
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


Thanks [~sreenivasulureddy]. Though tags are not there, I thought we still check 
cell.getTagsLength() inside PrivateCellUtil#tagsIterator, which will do a 
Bytes.toInt. 

> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Priority: Major
> Attachments: Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch
>
>





[jira] [Commented] (HBASE-24754) Bulk load performance is degraded in HBase 2

2020-08-04 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24754?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170710#comment-17170710
 ] 

ramkrishna.s.vasudevan commented on HBASE-24754:


Thanks [~sreenivasulureddy].
{code}
 TagUtil.carryForwardTags(tags, cell);
{code}
I think here we still find the length of the tags and then check 
tags.isEmpty().
[~sreenivasulureddy] - can you just replace
{code}
TagUtil.carryForwardTags(tags, cell);
if (!tags.isEmpty()) {
  kv = (KeyValue) kvCreator.create(cell.getRowArray(), cell.getRowOffset(),
      cell.getRowLength(), cell.getFamilyArray(), cell.getFamilyOffset(),
      cell.getFamilyLength(), cell.getQualifierArray(), cell.getQualifierOffset(),
      cell.getQualifierLength(), cell.getTimestamp(), cell.getValueArray(),
      cell.getValueOffset(), cell.getValueLength(), tags);
} else {
  kv = KeyValueUtil.ensureKeyValue(cell);
}
{code}
with
{code}
kv = KeyValueUtil.ensureKeyValue(cell);
{code}
like in the branch-1.3 code, and rerun the above experiment with branch-2? 
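The suggested experiment isolates exactly that difference. As a rough standalone sketch (FakeCell and both path names are hypothetical stand-ins, not HBase APIs), the tag-aware path pays an extra per-cell allocation and emptiness check even when the data carries no tags at all:

```java
import java.util.ArrayList;
import java.util.List;

public class TagPathSketch {
    // Hypothetical minimal stand-in for an HBase Cell; real cells expose
    // byte arrays plus offsets rather than a materialized tag list.
    static class FakeCell {
        final byte[] key;
        final List<byte[]> tags;
        FakeCell(byte[] key, List<byte[]> tags) { this.key = key; this.tags = tags; }
    }

    // Branch-2 style: materialize the tag list for every cell, then branch
    // on emptiness -- an extra allocation and check per cell.
    static byte[] tagAwarePath(FakeCell cell) {
        List<byte[]> tags = new ArrayList<>(cell.tags); // carryForwardTags analogue
        return tags.isEmpty() ? cell.key : cell.key.clone(); // tag-aware rebuild stand-in
    }

    // Branch-1.3 style: no tag handling at all (ensureKeyValue analogue).
    static byte[] plainPath(FakeCell cell) {
        return cell.key;
    }
}
```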


> Bulk load performance is degraded in HBase 2 
> -
>
> Key: HBASE-24754
> URL: https://issues.apache.org/jira/browse/HBASE-24754
> Project: HBase
>  Issue Type: Bug
>  Components: Performance
>Affects Versions: 2.2.3
>Reporter: Ajeet Rai
>Priority: Major
> Attachments: Branch1.3_putSortReducer_sampleCode.patch, 
> Branch2_putSortReducer_sampleCode.patch
>
>





[jira] [Resolved] (HBASE-24326) Removal from streamReaders can be done in finally

2020-08-02 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-24326.

Resolution: Won't Fix

Since branch-2.1 is EOLed, closing this JIRA as Won't Fix.

> Removal from streamReaders can be done in finally
> -
>
> Key: HBASE-24326
> URL: https://issues.apache.org/jira/browse/HBASE-24326
> Project: HBase
>  Issue Type: Bug
>  Components: regionserver
>Affects Versions: 2.1.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: Sambit Mohapatra
>Priority: Major
> Fix For: 2.1.10
>
>
> As part of the fix for https://issues.apache.org/jira/browse/HBASE-21551 we 
> removed the storeReaders from the set inside the try block.
> The code there seems to catch IOException. But if you drill down and see how 
> the actual readers are closed - FSDataInputStreamWrapper#close() uses 
> IOUtils.closeQuietly, where we swallow the IOException (so ideally IOException 
> will not be thrown). But there are cases where we end up getting other types of 
> RuntimeException which may fail the close(), and we end up not removing the 
> storeReader from the Set. So it is safe to always remove it in finally. 
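The proposed change is the classic remove-in-finally pattern. A minimal sketch, assuming a hypothetical Reader stand-in (not the actual FSDataInputStreamWrapper API):

```java
import java.util.HashSet;
import java.util.Set;

public class ReaderCloseSketch {
    // Hypothetical stand-in for the stream reader being tracked.
    interface Reader { void close(); }

    static final Set<Reader> streamReaders = new HashSet<>();

    // Remove the reader from the tracking set in finally, so that a
    // RuntimeException thrown by close() cannot leak it into the set
    // (IOExceptions are already swallowed by IOUtils.closeQuietly upstream).
    static void closeReader(Reader r) {
        try {
            r.close();
        } finally {
            streamReaders.remove(r);
        }
    }
}
```

With removal inside the try block, a close() that throws would leave a stale entry behind; the finally block makes the bookkeeping unconditional.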





[jira] [Resolved] (HBASE-24713) RS startup with FSHLog throws NPE after HBASE-21751

2020-08-02 Thread ramkrishna.s.vasudevan (Jira)


 [ 
https://issues.apache.org/jira/browse/HBASE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-24713.

Fix Version/s: 2.2.6
   2.3.1
   3.0.0-alpha-1
 Hadoop Flags: Reviewed
   Resolution: Fixed

Pushed to branch-2.2,  branch-2.3 and master. Thanks for all the reviews. 

> RS startup with FSHLog throws NPE after HBASE-21751
> ---
>
> Key: HBASE-24713
> URL: https://issues.apache.org/jira/browse/HBASE-24713
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.1.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
> Fix For: 3.0.0-alpha-1, 2.3.1, 2.2.6
>
>





[jira] [Commented] (HBASE-18070) Enable memstore replication for meta replica

2020-07-29 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-18070?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167616#comment-17167616
 ] 

ramkrishna.s.vasudevan commented on HBASE-18070:


Just checking the recent updates here. Yes, Anoop's concern is valid. META 
should always be in sync replication; a single view should be there no matter 
which replica we reach out to. 
Probably like a load balancer, and not like other user tables, where the goal is 
more to ensure that at least the cluster can do some reads (hopefully data will 
become consistent across replicas) rather than the entire region being available. 

> Enable memstore replication for meta replica
> 
>
> Key: HBASE-18070
> URL: https://issues.apache.org/jira/browse/HBASE-18070
> Project: HBase
>  Issue Type: New Feature
>Reporter: Hua Xiang
>Assignee: Huaxiang Sun
>Priority: Major
>
> Based on the current doc, memstore replication is not enabled for meta 
> replica. Memstore replication will be a good improvement for meta replica. 
> Create jira to track this effort (feasibility, design, implementation, etc).





[jira] [Commented] (HBASE-24713) RS startup with FSHLog throws NPE after HBASE-21751

2020-07-26 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-24713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17165445#comment-17165445
 ] 

ramkrishna.s.vasudevan commented on HBASE-24713:


I had commented a similar thing in the PR. IMO adding a null check in the sync 
runner is also fine. We try to create and publish a seq id to the ring buffer 
based on the roll writer that we call, hence it tries to sync it. Adding a null 
check will be sufficient there, as we don't break anything and it only happens 
during startup. 
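A minimal sketch of the guard being discussed (SyncRunnerSketch, Writer, and trySync are hypothetical names, not the actual FSHLog fields or methods):

```java
import java.util.concurrent.atomic.AtomicReference;

public class SyncRunnerSketch {
    // Hypothetical stand-in for the WAL writer; in FSHLog the Disruptor can
    // start the SyncRunner thread before the writer has been initialized.
    interface Writer { void sync(); }

    private final AtomicReference<Writer> writerRef = new AtomicReference<>();

    void setWriter(Writer w) { writerRef.set(w); }

    // The proposed guard: skip the sync (instead of throwing NPE) while the
    // writer is not yet set during startup.
    boolean trySync() {
        Writer w = writerRef.get();
        if (w == null) {
            return false; // startup race: nothing to sync yet
        }
        w.sync();
        return true;
    }
}
```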


> RS startup with FSHLog throws NPE after HBASE-21751
> ---
>
> Key: HBASE-24713
> URL: https://issues.apache.org/jira/browse/HBASE-24713
> Project: HBase
>  Issue Type: Bug
>  Components: wal
>Affects Versions: 2.1.6
>Reporter: ramkrishna.s.vasudevan
>Assignee: Gaurav Kanade
>Priority: Minor
>





[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-24 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164334#comment-17164334
 ] 

ramkrishna.s.vasudevan commented on HBASE-23634:


Ya, I was once again discussing with Anoop, and I think he has a point there. One 
more level of folder name within the WAL could be the easiest way to tackle it. I 
missed that flow in my testing. 

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1
>
>






[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-23 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164140#comment-17164140
 ] 

ramkrishna.s.vasudevan commented on HBASE-23634:


bq.Or else write directly under this but when the split attempt failed, the 
next one's 1st job will be to clean the existing result HFiles.
Ya this is what I meant. 

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1
>
>






[jira] [Commented] (HBASE-23634) Enable "Split WAL to HFile" by default

2020-07-23 Thread ramkrishna.s.vasudevan (Jira)


[ 
https://issues.apache.org/jira/browse/HBASE-23634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17164139#comment-17164139
 ] 

ramkrishna.s.vasudevan commented on HBASE-23634:


bq. Say if HFile was placed under region/cf/recovered.edits/ dir,
Instead, on a retry we can clear the files under region/cf/recovered.edits and 
then start over? That will also ensure that the partially written files are 
removed, correct?

> Enable "Split WAL to HFile" by default
> --
>
> Key: HBASE-23634
> URL: https://issues.apache.org/jira/browse/HBASE-23634
> Project: HBase
>  Issue Type: Task
>Affects Versions: 3.0.0-alpha-1, 2.3.0
>Reporter: Guanghao Zhang
>Priority: Blocker
> Fix For: 3.0.0-alpha-1
>
>






  1   2   3   4   5   6   7   8   9   10   >