[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186625#comment-15186625 ] stack commented on HBASE-15180: --- The horrid names are fine by me. They are explicit and for internal use only. Otherwise patch looks good. Have you looked at moving what is common in IPCUtil up into hbase-common and then shutting down the access on IPCUtil so only accessible by ipc client? Can move its guts to some codec util up in hbase common and the server-side could use that. Is that a load of work? Is it doable? Ok to file a separate issue to do this. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch, HBASE-15180_V7.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186406#comment-15186406 ] Anoop Sam John commented on HBASE-15180: Ya to make sure the client uses only which it supposed to, we can rename method at least. Any better name suggestion boss [~saint@gmail.com]? The best way would have been that, the codec instance know where am I. Whether it is in client context or server. We need to set it while constructing Codec instance. But ya at least a method name change we can do now. I will fix ur suggestion abt explicit javadoc statement on commit. Thanks for the review. Once Stack also good abt the patch, will commit. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch, HBASE-15180_V7.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186384#comment-15186384 ] Enis Soztutar commented on HBASE-15180: --- The patch looks much cleaner than v4 I think. Let's add a javadoc to this saying that the ownership / lifecycle of the passed BB is transferred to the Decoder and Cells returned from the decoder, so that the contract is explicit: {code} + Decoder getDecoder(ByteBuffer buf); {code} Again, the contract for the {{IPCUtil. createCellScanner()}} used by client side vs server side should be explicit. How do we even know that one is used only by the client or server? We can declare it in the method name explicitly, like {{createCellScannerReusingBuffers()}} (sorry for the horrible name). Otherwise looks good. [~saint@gmail.com] give it a quick glance? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch, HBASE-15180_V7.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15186348#comment-15186348 ] Enis Soztutar commented on HBASE-15180: --- bq. Dropped branch-1 based fixed versions. Adding new method into Codec is fine there? Even if, we may change BB to new type in 2.0 later. So better we won't add any thing. The only Codec that is used by outside is the Phoenix index codec as far as I know. The codec encodes some more cells for the index corresponding to the IndexedKeyValue's. We can modify Phoenix to compile with HBase-2.0 when time comes. Other than that, Codec's are not user-level classes, they are/should be framework level only. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch, HBASE-15180_V7.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173573#comment-15173573 ] Hadoop QA commented on HBASE-15180: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 34s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 26s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 15s {color} | {color:green} master passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 7m 7s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 43s {color} | {color:green} master passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 59s {color} | {color:red} hbase-server in master has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 15s {color} | {color:green} master passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 15s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 19s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 9s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 9s {color} | {color:red} Patch generated 1 new checkstyle issues in hbase-common (total was 60, now 61). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 29m 5s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 26s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 17s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 23s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 7s {color} | {color:green} hbase-client in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 7s {color} | {color:green} hbase-common in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 146m 47s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 23s {color} | {color:green} hbase-client in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 11s {color} | {color:green} hbase-common in the patch passed with JDK v1.7.0_95. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 80m 52s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 2m 13s
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15173243#comment-15173243 ] Anoop Sam John commented on HBASE-15180: So in this patch made sure that on client side we wont go with optimization of Cells created directly over the response read buffer. The cells will be handed over to user app and we can not be sure whether some of the cells will be cached or not. Caching even single cell will make it such that we can not GC the bigger sized response buffer. Server side we have MSLAB and copy to that before adding Cells to memstore. [~enis] Ping for ur review as well. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch, HBASE-15180_V7.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160284#comment-15160284 ] stack commented on HBASE-15180: --- Looks better to me. All the messing is kept internal to Codec. Hopefully we get to remove the tag-particular codecs and KeyValue ones one day (let the codec figure whether tags are present or not rather than upper layers -- make Tags first class). > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15160261#comment-15160261 ] Anoop Sam John commented on HBASE-15180: wdyt of the latest patch [~saint@gmail.com], [~enis] > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153947#comment-15153947 ] Hadoop QA commented on HBASE-15180: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 1 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 52s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 59s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 9s {color} | {color:green} master passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 6m 55s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 36s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 32s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 0s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 10s {color} | {color:green} master passed with JDK v1.7.0_95 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 16s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 7s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 8s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 1m 8s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 9s {color} | {color:red} Patch generated 1 new checkstyle issues in hbase-common (total was 71, now 72). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 22m 15s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 54s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 48s {color} | {color:green} hbase-client in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 34s {color} | {color:green} hbase-common in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 86m 3s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 3s {color} | {color:green} hbase-client in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 49s {color} | {color:green} hbase-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 80m 22s {color} | {color:green} hbase-server in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 44s {color} | {color:green} Patch does
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153715#comment-15153715 ] Anoop Sam John commented on HBASE-15180: Dropped branch-1 based fixed versions. Adding new method into Codec is fine there? Even if, we may change BB to new type in 2.0 later. So better we won't add any thing. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch, HBASE-15180_V6.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15153713#comment-15153713 ] Anoop Sam John commented on HBASE-15180: bq.maybe it is best to cut to BB? Or, your Interface that does BB – ByteBuff? Something to consider. Ok.. Add getDecoder() with BB as of now.. Later we need change to new interface type, we can change then. So this fix will go into 2.0 only. So before 2.0 is out, we will make the change of BB to new type as needed. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151398#comment-15151398 ] stack commented on HBASE-15180: --- bq. Said that, continue with IS instead of a data structure like byte[]/BB is giving us all sort of flexibility. One thought I had on this subsequently was that [~enis] probably made the BB suggestion because he noticed we were turning somersaults to make IS work. So, while IS might be more flexible, if it means we have to do messy stuff, maybe it is best to cut to BB? Or, your Interface that does BB -- ByteBuff? Something to consider. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15151327#comment-15151327 ] Enis Soztutar commented on HBASE-15180: --- bq. Yeah, best if could get away with not changing it but I'd say it is ok. Its 2.0 and you are changing a class only used by hbase internals (i'd say) Codec is LimitedPrivate, and should be used only by other frameworks (Phoenix, etc). It should be ok to change the interface in 2.0. But, if we feel that IS is more flexible, it is fine. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149962#comment-15149962 ] stack commented on HBASE-15180: --- bq. Codec being non private if we add new methods, users may have to change in their code when use new version.. Yeah, best if could get away with not changing it but I'd say it is ok. Its 2.0 and you are changing a class only used by hbase internals (i'd say) bq. Said that, continue with IS instead of a data structure like byte[]/BB is giving us all sort of flexibility. This is reasonable. Could be your Interface that does a subset of ByteBuffer too? But as you say, maybe while you are working stuff out, best to stay flexible. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149932#comment-15149932 ] Anoop Sam John commented on HBASE-15180: 2 reasons why did not go that way 1. Codec being non private if we add new methods, users may have to change in their code when use new version.. Can we add method to Codec interface in a minor/patch release? If we do fulfill our need with out changing the interface, I thought that is better, 2. Ya we have BB now.. Was thinking of doing PoC with reading data into more than one buffer when we use these buffers from a pool (We have BBBpool now). We know the size need to read the reqs. As in BBBPool we can make have running avg of size and create bigger buffers. But on demand creation of buffers off heap may create some -ve impact on full GC? Any way all these are to be tested. Said that, continue with IS instead of a data structure like byte[]/BB is giving us all sort of flexibility. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149319#comment-15149319 ] stack commented on HBASE-15180: --- bq. Maybe {{Codec}}s should be able to work on top of both IS and BBs instead of the BBIS wrapper. I like that. So add to the Codec Interface a getDecoder(BB)? Would that work [~anoop.hbase]? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149215#comment-15149215 ] Enis Soztutar commented on HBASE-15180: --- HBASE-14501 removed {{IS.available()}} dependency for the Codecs, but we are adding it back here because we can rely on BBIS.available()? Maybe {{Codec}}s should be able to work on top of both IS and BBs instead of the BBIS wrapper. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15149211#comment-15149211 ] stack commented on HBASE-15180: --- s/remainAwareIS/reliableAvailable/ ? Ok on the rest. Sounds good. If new patch, add to the EOF thrown how many bytes are missing. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148985#comment-15148985 ] Anoop Sam John commented on HBASE-15180: bq.Is this flag saying we can rely on available() giving a good answer? Yes. Any suggestion? The unsafe based BB read dont have boundary checks.. That is why thought of adding check in this Stream level. Yes in StreamUtils it is doing the read of 4 bytes at a time int n = in.read(readInBuf, bytesRead, readInBuf.length - bytesRead); The while loop is to make sure it tried it best to read 4 bytes from stream. Even if the 1st go could do only 2 bytes. Still tried. This was present in BaseDecoder. Just moved from there to here. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148960#comment-15148960 ] stack commented on HBASE-15180: --- bq. I think a better question would be why the test coverage in hbase-common is not sufficient to capture the behavior expected by the hbase-server module. This sounds good along w/ the general theme that modules should be self-contained (yeah, we need to break up hbase-server, move out mapreduce, etc.) Can we have a better data member name than remainAwareIS? Is this flag saying we can rely on available() giving a good answer? Why bother with this check? Just try reading? If EOF, it'll throw an EOFE anyways? Otherwise, if some other issue, we'll want to see that exception rather than the EOFE we create here: 66 if(this.available()Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148845#comment-15148845 ] Sean Busbey commented on HBASE-15180: - {quote} bq. UT case TestTag is in the hbase-server module which depends on hbase-common, but all changes in the latest patch lie in hbase-common, and it seems Yetus only checked UT of the hbase-common module and neglected the failure. We need to make sure we run all test in all modules? Because our tests are not really UTs. Some are (mainly those in hbase-server module) are FTs. So change in hbase-common can cause issues with tests in hbase-server {quote} precommit testing is meant to be a quick first-pass correctness filter (or at the least, that's [the ideology Yetus subscribes to|http://yetus.apache.org/documentation/0.1.0/precommit-architecture/#some-philosophy]). It's not a replacement for robust nightly checks nor committer reviews. Based on those principles, Yetus presumes that modules are relatively self-contained wrt unit tests. In general, I think a new feature in yetus that leveraged the same module-dependency detecting used to determine build order in Yetus 0.2.0+ could also be used for an opt-in "build everything that depends on module X" mode. In our case though, precommit tests already take an absurdly long time and hbase-server is the biggest offender (at > 1 hour per JVM). Always running the hbase-server unit tests is simply not sustainable for us. I think a better question would be why the test coverage in hbase-common is not sufficient to capture the behavior expected by the hbase-server module. All that said, if you want to force a build across all modules (on the presumption that the "change in X means also build Y" generalizes) you could touch the root pom.xml in your change, or really any file in the top level directory. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148222#comment-15148222 ] Anoop Sam John commented on HBASE-15180: Ya the test code change looks fine. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148203#comment-15148203 ] Anoop Sam John commented on HBASE-15180: Thanks for checking the test.. I will check that.. bq. UT case TestTag is in the hbase-server module which depends on hbase-common, but all changes in the latest patch lie in hbase-common, and it seems Yetus only checked UT of the hbase-common module and neglected the failure. We need to make sure we run all test in all modules? Because our tests are not really UTs. Some are (mainly those in hbase-server module) are FTs. So change in hbase-common can cause issues with tests in hbase-server > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15148194#comment-15148194 ] Yu Li commented on HBASE-15180: --- {{TestTags#testFlushAndCompactionWithoutTags}} will fail with the latest patch. After some investigation, it's caused by the bad design of the test case, that it assumes the KeyValue offset in the backup array is always 0, while after the below change in {{KeyValueCodec}} in this patch, the assumption no longer stands, thus causing the assertion to fail. {code} +Cell c = createCell(buf.array(), buf.arrayOffset() + buf.position(), len); {code} The fix should be straight forward, that to change: {code} assertTrue(current.getValueOffset() + current.getValueLength() == current.getLength()); {code} into: {code} assertTrue(current.getValueOffset() + current.getValueLength() == current.getOffset() + current.getLength()); {code} in both line 222 and 240 of {{TestTags}}. [~anoop.hbase] mind take a look here? What's more, the UT case TestTag is in the hbase-server module which depends on hbase-common, but all changes in the latest patch lie in hbase-common, and it seems Yetus only checked UT of the hbase-common module and neglected the failure. Is this something new to fix or a known issue [~busbey]? Thanks. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146976#comment-15146976 ] Anoop Sam John commented on HBASE-15180: Ya my mistake in selecting Eclipse import suggestion .. Will correct. Thanks for the good eye. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15146768#comment-15146768 ] Yu Li commented on HBASE-15180: --- Below change in the v4 patch seems like some typo {code} +import com.sun.xml.internal.rngom.parse.compact.EOFException {code} and we should change to use {{java.io.EOFException}}? [~anoop.hbase] > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Affects Versions: 0.98.0 >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0, 1.3.0, 1.2.1 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch, > HBASE-15180_V4.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15143139#comment-15143139 ] Hadoop QA commented on HBASE-15180: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:red}-1{color} | {color:red} test4tests {color} | {color:red} 0m 0s {color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 37s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 19s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 18s {color} | {color:green} master passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 1m 10s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 0m 49s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 21s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 19s {color} | {color:green} master passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 0m 19s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} javac {color} | {color:green} 0m 20s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 13s {color} | {color:red} Patch generated 3 new checkstyle issues in hbase-common (total was 45, now 47). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 11s {color} | {color:green} the patch passed {color} | | {color:red}-1{color} | {color:red} whitespace {color} | {color:red} 0m 0s {color} | {color:red} The patch has 2 line(s) that end in whitespace. Use git apply --whitespace=fix. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 24m 59s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 0s {color} | {color:red} hbase-common introduced 1 new FindBugs issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 22s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 20s {color} | {color:green} the patch passed with JDK v1.7.0_95 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 57s {color} | {color:green} hbase-common in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 50s {color} | {color:green} hbase-common in the patch passed with JDK v1.7.0_95. {color} | | {color:green}+1{color} | {color:green} asflicense {color} | {color:green} 0m 7s {color} | {color:green} Patch does not generate ASF License warnings. {color} | | {color:black}{color} | {color:black} {color} | {color:black} 39m 27s {color} | {color:black} {color} | \\ \\ || Reason || Tests || | FindBugs | module:hbase-common | | | org.apache.hadoop.hbase.codec.KeyValueCodec$KeyValueDecoder.parseCell() ignores result of org.apache.hadoop.hbase.io.ByteBufferInputStream.skip(long) At KeyValueCodec.java: At KeyValueCodec.java:[line 81] | \\ \\ || Subsystem || Report/Notes ||
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15130258#comment-15130258 ] Anoop Sam John commented on HBASE-15180: Have a new patch ready but need change once HBASE-15177 goes in. We will be using BBIS instead BAIS then. I dont need a new extension for BAIS to gran the byte[] it uses. I can add required APIs in BBIS. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127259#comment-15127259 ] stack commented on HBASE-15180: --- bq. Codec and its Decoder being non private, we can not directly change the params/return types. Codec was done long time ago. It was kept minimal until use cases were better known. We are probably only user. Lets look at a case-by-case basis. At least we can add methods and deprecate the old. bq. And same Decoder used in WAL reading as well. Good. That was intent that it would be used everywhere. Pity it is not used reading and writing hfiles. bq. Now we are returning a BAIS object from this IpcUtil method. I see what you are saying. Rather than BAIS, instead a CIS, one that does Cells more natively. That sounds good. As long as the CellIS is an IS, we can use Codec Interfaces. Rather than pass a boolean to the method to do direct or tags, could you return a different Codec implementation? A server-side Codec and/or tags-capable (would be better if seriialization figured if tags were present rather than a meta boolean passed in by the server)? Would we still need to do context if serverside codec and clientside codec? Looking at CellUtil#createCell, its content is same as what is in CellReadableByteArrayInputStream; its duplicated code? We are pivoting on the underlying Stream being a BAIS. Will it always be a BAIS? Will it ever be a DBB? bq. While reading from WAL, same path of flow is executed and then the IS wont be CellInputStream type. Why not and should it? bq. Only diff is instead of BAIS we make CellBAIS object so we save copy. Could the createCell save a copy in same way? Look see if a BAIS and if so, use its buffer and offset creating the Cell? bq. As u suggest we can have new config which can be turned on at server side and off at client side/ Pardon me. Config is a bad idea. Does the caller have enough context to make a server vs client Decoder? How did we do this for tags? It had to know if server vs client? We could also pull createCellScanner out of IPCUtil. Maybe createCellScanner does not belong in IPCUtil? We can then in RPCClient or RPCServer create different cellscanners. Or, each creates an IPCUtil. We could pass in a 'Context' on construction of IPCUtil and in it it would say if Client or Server. The logic in CellReadablePBIS is same as for PBIS? We use PBIS just so we can figure if EOF. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126122#comment-15126122 ] Jingcheng Du commented on HBASE-15180: -- To supplement more information. +--+--+-+-+ ||mslab-pool-on | mslab-on | mslab-off | +--+--+-+-+ | throughput | 242801 | 240464| 225679 | +--+--+-+-+ | latency(us) | 486| 483 | 704 | +--+--+-+-+ | gc pause(ms) | 8440 | 12837 | 17131 | +--+--+-+-+ I used 2MB for the chunk size during the test. Speaking of G1GC, G1GC splits the heap into 2000 regions. In my test (the heap size is 64GB), each region will be 32MB. If the chunk pool is off, there might be fragments after minor gc (2MB is a fragment comparing to 32MB, and the fragments are compacted in mixed gc and full gc). If the chunk size is increased to 16MB or larger, I believe the performance can be improved comparing to 2MB chunk size. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126131#comment-15126131 ] Jingcheng Du commented on HBASE-15180: -- And the gc pause was measured per minute. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15127620#comment-15127620 ] Anoop Sam John commented on HBASE-15180: bq.I see what you are saying. Rather than BAIS, instead a CIS, one that does Cells more natively. That sounds good. As long as the CellIS is an IS, we can use Codec Interfaces. Yes bq.Rather than pass a boolean to the method to do direct or tags, could you return a different Codec implementation? A server-side Codec and/or tags-capable (would be better if seriialization figured if tags were present rather than a meta boolean passed in by the server)? Would we still need to do context if serverside codec and clientside codec? Ya let me see.. I agree that it is ugly passing the boolean throughout. As of now we dont support passing tags from client to server and reverse. Codec has to serialize tags when it is Replication. So we have a new Codec (KVCodecWithTags). But both these Codecs were using same BaseDecoder and Cell create paths. Let me see how we can solve this. bq.We are pivoting on the underlying Stream being a BAIS. Will it always be a BAIS? Will it ever be a DBB? It can be DBB later. Once we start reading the request into DBB (pooled) - yes. Said that we are not having any hard need for underlying IS to be BAIS. That is the reason why I did not add some thing like a new API in Codec where asking Decoder to work on a byte[] or so. COntinue that to be an IS based gives us the freedom to change the underlying data structure. We can make ByteBufferIS which is a Cell readable. We can direct create cell (with out copy) over the underlying DBB. (We have OffheapCell now) bq.Could the createCell save a copy in same way? Look see if a BAIS and if so, use its buffer and offset creating the Cell? You mean the createCell in CellUtil? No. even if the IS is BAIS, we can not directly make a Cell (with out any copy). BAIS is not exposing its backing byte[] buffer. We will need indirect way of grabbing the byte[] from it. That is why I made the CellBAIS extending BAIS which is having extra API to create a Cell directly from its underlying buffer(with out any copy) {quote} bq.While reading from WAL, same path of flow is executed and then the IS wont be CellInputStream type. Why not and should it? {quote} >From that stream we can not make Cell directly with out any copy. The >underlying stream is the one from DFS. Said that CellInputStream is a stream >from which we can make cell DIRECTLY WITH OUT COPY. The name is confusing? >From other streams also we can read cell but with copy. Ya I agree new configs I also dont prefer. Actually we can avoid this and any context. We can have 2 paths of Decoder make in client and server. Ya we can even move it from IPCUtil also. Let me see. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15126709#comment-15126709 ] Esteban Gutierrez commented on HBASE-15180: --- Good stuff [~jingcheng...@intel.com]. Very interesting numbers, I expected further tuning to provide a throughput, but didn't tried with larger chucks (my test was around 24GB). Once I have a chance, I'm going to re-run my tests with JDK8 u80 and I let you know to let you know if I'm seeing the same behavior. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15125780#comment-15125780 ] Jingcheng Du commented on HBASE-15180: -- bq. I believe Jingcheng Du done some YCSB test with writes (G1GC we have) and he said it was giving bad results. Thanks [~anoop.hbase]. I did some YCSB tests with HBase 1.0.0 release and G1GC (JDK is 1.8.0_51), and I used 64GB for the JVM heap in each RS, and test the cases with/without MSLAB. I ran YCSB with the command "ycsb load hbase-10 -P 1T_workload -threads 200 -p columnfamily=family -p clientbuffering=true -s > 1T_workload.dat". I observed the performance (throughput, latency and gc pause time), the performance of the case when enabling MSLAB with chunk pool is better than the case when enabling MSLAB without chunk pool, and the case without MSLAB is worst. The performance might be different with G1GC in different jre versions and different heap sizes. According to my tests, MSLAB with chunk was best. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124875#comment-15124875 ] stack commented on HBASE-15180: --- bq. My plan is to make a Cell aware ByteArrayInputStream which can read Cells directly from it. Where do we need this (trying to follow along). In current patch I see it being used inside in IPCUtils method that returns a CellScanner -- seems odd to use this new Stream in this method to give to the Codec which then does the CellScanner Interface. bq. Plan to introduce some thing like a CodecContext associated with every Codec instance which can say the server/client context. Why we need a Context? Don't we currently make a decoder per Cell type and/or context? Then we keep simple Codec API and any mess parsing is internal to the Codec implementation? bq. SO u suggest renaming of the interface. That should be fine and looks better. Yeah, I think suggested name is better but, lets spend some time on how this stuff will be used first. I remember being here with this Codec stuff and I kept bumping into need for a CellInputStream but in end was able to make do with CellScanner; that was then and stuff may be different now. bq. To avoid the overhead of parsing tagsLength every time this was done. Yeah. Lets move away from passing these withTags flags in the code base.. When we decode, we should be able to cheaply figure if tags present or not; lets fix that rather than pass extra flag all over. bq. This was needed because of the way we have this PushbackIS. Shouldn't we pass the length when we create the PBIS derivative? bq. Now any way you suggest add a new config to decide this copy or not rather than rely on MSLAB. Can we ask our environment if we are on the serverside and if so, just do the non-copy and presume that MSLAB or something else, if MSLAB is off, will assume ownership of the Cells so we can let go of the buffer? Doing this is a little more indirect but better I think than having MSLAB reference in RPC. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124905#comment-15124905 ] Anoop Sam John commented on HBASE-15180: Codec and its Decoder being non private, we can not directly change the params/return types.And same Decoder used in WAL reading as well. You can note that the getDecoder still takes an InputStream. Now we are returning a BAIS object from this IpcUtil method. So for reading a Cell, we need copy cell bytes into new byte[]. Here this actual Object type which we return only is getting Changed.. We extend BAIS and make a CellBAIS which implements CellInputStream. And while we read cells from Decoder, we have check like if it is CellInputStream, we read Cells directly from it. While reading from WAL, same path of flow is executed and then the IS wont be CellInputStream type. Then we will be doing copy from Stream and make Cells.. So Codec gives a Decoder (which is of type CellScanner) by taking an InputStream. There is no change am making in this area. Only diff is instead of BAIS we make CellBAIS object so we save copy. I thought of making the Context so as to know the codec context where it is running (client or server).. We use diff IpcUtil methods (extra boolean param for copy or not) and that is why I thought now Context is not needed. As u suggest we can have new config which can be turned on at server side and off at client side/ bq.lets fix that rather than pass extra flag all over. Let me try. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124906#comment-15124906 ] Anoop Sam John commented on HBASE-15180: bq.Can we ask our environment if we are on the serverside and if so IpcUtil we have the conf. How we can know server/client side? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124908#comment-15124908 ] Anoop Sam John commented on HBASE-15180: bq.Yes, I noticed about 5-10% improvement on GC times and CPU utilization after disabling MSLAB only if using G1GC. Tuning MSLAB helps a little but I don't see to much advantage to have it enabled when G1GC is there. I believe [~jingcheng...@intel.com] done some YCSB test with writes (G1GC we have) and he said it was giving bad results. bq.MSLAB is like an HBASE-14918 Segment. Would it make sense to make it a Segment? Segment is a part of the memstore right? MSLAB is a place for only keeping the Cell data. So I am not sure whether MSLAB can be a Segment. Any way our work in this umbrella Jira is to make the write path off heap and make an off heap MSLAB impl. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15123913#comment-15123913 ] Hadoop QA commented on HBASE-15180: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 39s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 2m 5s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 37s {color} | {color:green} master passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 8m 12s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 50s {color} | {color:green} master passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 0s {color} | {color:red} hbase-common in master has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 47s {color} | {color:red} hbase-server in master has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 53s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 46s {color} | {color:green} master passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 21s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 37s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 43s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 43s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 35s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 35s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_91. {color} | | {color:black}{color} | {color:black} checkstyle {color} | {color:black} 1m 23s {color} | {color:black} Patch generated 2 new checkstyle issues in hbase-common (total was 100, now 102). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 57s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 30m 36s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 29s {color} | {color:red} hbase-common introduced 1 new FindBugs issues. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 15s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 2m 3s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 37s {color} | {color:green} hbase-client in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 2m 53s {color} | {color:green} hbase-common in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 49s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 26s {color} | {color:green} hbase-client in the patch passed with JDK v1.7.0_91. {color} |
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124586#comment-15124586 ] Anoop Sam John commented on HBASE-15180: CellScanner - Ya we made it like normal our scanner way of call advance and then get current. You mean we should have been doing just getting next Cell from the Decoder? bq. If so, we have CellOutputStream, should your CellReadable be a CellInputStream with read methods that return Cells to mirror the write methods we have in CellOutputStream. Your CellReadableByteArrayInputStream would become CellByteArrayInputStream and would implement CellInputStream. SO u suggest renaming of the interface. That should be fine and looks better. bq.I've asked this before I know but do we have to flag when tags and when without? Internally, when we read, the Cell will know if it has tags or not? To avoid the overhead of parsing tagsLength every time this was done. The old method we used to read cells from InputStream was iscreate(final InputStream in, boolean withTags) So KVCodecWithTags only pass this as true and we make KeyValue instance. By default we have KVCodec which pass false and we make NoTagsKV on which getTagsLength is just return 0. But there is still an issue more. When we copy the Cell to MSLAB before adding to Memstore, we do copy and make a new KeyValue only. So this NoTagsKV becomes KV there !!!Need handle but another issue. Ya as u said, we can avoid passing this boolean and after the Cell is been read out in new CellInputStream (KeyValue object then), we need to parse tags length once and see it is 0 or >0 and based on that recreate NoTagsKV if needed. That is one parsing op extra and one Object creation extra. For that gone with passing the boolean at that time I believe. What do u say? {quote} What is the length in the below? Cell readCell(int length, boolean withTags) throws IOException; Do we have to pass this in each time? {quote} This was needed because of the way we have this PushbackIS. In Decoder impl, we wrap the actual IS with this PBIS. The Decoder advance operation, reads one byte to know whether it is end. And now the remaining 3 bytes and old already read byte to be considered to get the Cell's total length. Said that the read of the Cell's length has to happen in the PBIS impl. (there only we know the old already read single byte).. And the read of the Cell (Construction of Cell) to be done in CellByteArrayInputStream. That is why we have to pass the length. I agree it looks ugly... I had no other way.. This PBIS thing was done to read Cells in WAL decoder properly. In case of reading Cells from byte[] from RPC, we dont really have such an issue. Let me see some way we can solve this. May be we need special treatment for both cases. bq.IPCUtil takes a Configuration? Can we not just read the Configuration on construction rather than pass this flag per call? IPCUtil #createCellScanner(final Codec codec, final CompressionCodec compressor, final byte [] cellBlock) - Used from RpcClient new method called from RpcServer alone. Yes I dont want this direct Cell create to happen in client side where we dont have an MSLAB copy. Still as u said, if we read the conf property and set in IPCUtil and use that while construction of Stream, I can avoid this passing of boolean. If any chance the config xml in server is used in the client and that says use MSLAB (Still we dont have as we dont have Memstore and all there), we will do it wrongly no. That is I was afraid to do that. Now any way you suggest add a new config to decide this copy or not rather than rely on MSLAB. I think ya it is good. Only downside is again a new config! But ya it looks better.. Let me do that.. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124771#comment-15124771 ] ramkrishna.s.vasudevan commented on HBASE-15180: Renaming the interface is better +1 on that. G1GC and MSLAB is interesting to see. We need a handoff from the incoming request handlers to the Server. Going based on MSLAB on or off makes sense but if it is 'off' then creating a BB pool on the Rpcserver request processing side may have to see for issues still the BBs are flushed to a hfile? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124005#comment-15124005 ] Enis Soztutar commented on HBASE-15180: --- bq. May be BB based then. Again am trying to experiment with reading the req not just into one single large BB. From the pool we might be getting fixed sized smaller BBs (Say 64 KB or so) And we can read in to those many BBs. And the CellScanner need to work on a set of BBs (Like the MultiByteBuff stuff) Then again even this BB based API is an issue.. Continue with an InputStream based API gives us the freedom of experimenting with this different data structures. We are still creating an IS per request. It is not the end of the world though. At the time of the request we do know the RPC buffer size required. I was trying to reuse the BBPool from Hadoop which can return a buffer at least as large as the request size. bq. Ya we have MSALB enabled by default. I agree that doing the MSLAB check in RPC layer looks ugly. Wanted to avoid we refer to the req read byte[] (from memstore cells) when some one turns MSLAB off. So what do you say? Remove this check? We need to do a separate issue and remove the option for disabling MSLAB. Then we can assume in this patch that MSLAB is always enabled. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124024#comment-15124024 ] stack commented on HBASE-15180: --- On MSLAB on all the time, there is talk of turning it off in G1GC because there it does not help (this you [~esteban]?). TBD. On other hand, I can see need for a 'terminus', a 'handoff' from the reading handler so no longer a reference and so something like a copy into MSLAB makes some sense. MSLAB is like an HBASE-14918 Segment. Would it make sense to make it a Segment? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124036#comment-15124036 ] Esteban Gutierrez commented on HBASE-15180: --- Yes, I noticed about 5-10% improvement on GC times and CPU utilization after disabling MSLAB only if using G1GC. Tuning MSLAB helps a little but I don't see to much advantage to have it enabled when G1GC is there. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124091#comment-15124091 ] Enis Soztutar commented on HBASE-15180: --- bq. Yes, I noticed about 5-10% improvement on GC times and CPU utilization after disabling MSLAB only if using G1GC. Even with G1, this is unexpected. Is there a theoretical explanation? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124141#comment-15124141 ] Esteban Gutierrez commented on HBASE-15180: --- My guess at that time was that the rate the RSs where flushing the memstores aggressively, MSLAB was only creating extra work for the collector. If I remember correctly, I remember I started to notice the improvement of disabling MSLAB after going to higher throughputs (+100K puts/sec). At lower throughputs I don't think the contribution with our without was noticeable. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124145#comment-15124145 ] stack commented on HBASE-15180: --- bq. Even with G1, this is unexpected. Is there a theoretical explanation? Why you say [~enis]? Here's a SWAG. MSLAB is to prevent fragmentation in CMS. Every Cell gets copied and there is the allocation of the SLABs themselves (no reuse). G1GC avoids fragmentation by copying to a new region if fragmention. Many small copies and SLAB allocations cost more than the relatively macro copies G1GC does. To be verified... > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124076#comment-15124076 ] Hadoop QA commented on HBASE-15180: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 1s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 2m 40s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 0s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 7s {color} | {color:green} master passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 7m 10s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 37s {color} | {color:green} master passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 42s {color} | {color:red} hbase-common in master has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 1m 50s {color} | {color:red} hbase-server in master has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 0m 58s {color} | {color:green} master passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 11s {color} | {color:green} master passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 14s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 24s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 20s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 20s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 25s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 25s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 10s {color} | {color:red} Patch generated 2 new checkstyle issues in hbase-common (total was 100, now 102). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 36s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 21m 55s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 3m 56s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 1s {color} | {color:green} the patch passed with JDK v1.8.0_72 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 14s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 57s {color} | {color:green} hbase-client in the patch passed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 44s {color} | {color:green} hbase-common in the patch passed with JDK v1.8.0_72. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 20s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_72. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 56s {color} | {color:green} hbase-client in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} |
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15124131#comment-15124131 ] stack commented on HBASE-15180: --- Why not use Cell Codec for CellReadableByteArrayInputStream? CellCodec.Decoded takes an InputStream and returns Cells via CellScanner implementation. What is difference between a CellReadable and a CellScanner? You have to do advance and then current each time which is a little more awkward but it is a common pattern we want to institute throughout. I suppose it is awkward. That'd be your argument. If so, we have CellOutputStream, should your CellReadable be a CellInputStream with read methods that return Cells to mirror the write methods we have in CellOutputStream. Your CellReadableByteArrayInputStream would become CellByteArrayInputStream and would implement CellInputStream. I've asked this before I know but do we have to flag when tags and when without? Internally, when we read, the Cell will know if it has tags or not? What is the length in the below? Cell readCell(int length, boolean withTags) throws IOException; Do we have to pass this in each time? 192* @param directCellRead 193* Whether to make Cells directly from the cellBlock bytes or need to copy. Pass false 194* while using from client side. IPCUtil takes a Configuration? Can we not just read the Configuration on construction rather than pass this flag per call? Seems like you want server-side and client-side to act different. Having RPCServer 'know' about MSLAB don't seem right. It is pollution of rpc with internals on how we do memstore. Can we have another property for when we should lean on the rpc buffer (we 'know' it safe when mslab is going on. perhaps a method that obscures the rationale for when to copy ) Patch looks great otherwise. > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch, HBASE-15180_V2.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122488#comment-15122488 ] Enis Soztutar commented on HBASE-15180: --- This is a good idea. Having all cells in the same RPC share the same byte[]. Is {{CellReadable}} really necessary? Isn't this the same thing as Codec.Decoder. I mean, from a layering perspective, I thought that we would instead change the Codec to be aware of byte[] directly, and return a CellScanner that can return KV's from the same buffer. I was thinking of doing a Codec at the RPC layer to do something like FAST_DIFF. Can that still be done with this patch? Should we default to MSLAB for good? I don't think anybody runs with MSLAB off. RPCServer reaching this is not right: {code} +this.mslabEnabled = conf.getBoolean(HConstants.USEMSLAB_KEY, HConstants.USEMSLAB_DEFAULT); {code} Can the byte[4]'s be statically allocated? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15122878#comment-15122878 ] Anoop Sam John commented on HBASE-15180: You mean to make Codec decoder to work over a byte[] also rather than on InputStream? That is nice. Why I did not go through that path is my next work is in E2E off heap in write path also. So when the write req comes, we might be reading it into an off heap BB (from a pool). So then byte[] based decoding is not possible. May be BB based then. Again am trying to experiment with reading the req not just into one single large BB. From the pool we might be getting fixed sized smaller BBs (Say 64 KB or so) And we can read in to those many BBs. And the CellScanner need to work on a set of BBs (Like the MultiByteBuff stuff)Then again even this BB based API is an issue.. Continue with an InputStream based API gives us the freedom of experimenting with this different data structures. bq.Should we default to MSLAB for good? I don't think anybody runs with MSLAB off. Ya we have MSALB enabled by default. I agree that doing the MSLAB check in RPC layer looks ugly. Wanted to avoid we refer to the req read byte[] (from memstore cells) when some one turns MSLAB off.So what do you say? Remove this check? bq.Can the byte[4]'s be statically allocated? No we can not. There will be many parallel req processing and many instances of this InputStream in action. On one instance only we dont have multi threaded reads. Making it static make all IS instances refer the same byte [] ! > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120788#comment-15120788 ] Hadoop QA commented on HBASE-15180: --- | (x) *{color:red}-1 overall{color}* | \\ \\ || Vote || Subsystem || Runtime || Comment || | {color:blue}0{color} | {color:blue} reexec {color} | {color:blue} 0m 0s {color} | {color:blue} Docker mode activated. {color} | | {color:green}+1{color} | {color:green} hbaseanti {color} | {color:green} 0m 0s {color} | {color:green} Patch does not have any anti-patterns. {color} | | {color:green}+1{color} | {color:green} @author {color} | {color:green} 0m 0s {color} | {color:green} The patch does not contain any @author tags. {color} | | {color:green}+1{color} | {color:green} test4tests {color} | {color:green} 0m 0s {color} | {color:green} The patch appears to include 3 new or modified test files. {color} | | {color:green}+1{color} | {color:green} mvninstall {color} | {color:green} 3m 10s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 13s {color} | {color:green} master passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} compile {color} | {color:green} 1m 17s {color} | {color:green} master passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} checkstyle {color} | {color:green} 7m 56s {color} | {color:green} master passed {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 42s {color} | {color:green} master passed {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 0m 50s {color} | {color:red} hbase-common in master has 1 extant Findbugs warnings. {color} | | {color:red}-1{color} | {color:red} findbugs {color} | {color:red} 2m 6s {color} | {color:red} hbase-server in master has 1 extant Findbugs warnings. {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 9s {color} | {color:green} master passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 18s {color} | {color:green} master passed with JDK v1.7.0_91 {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 16s {color} | {color:red} hbase-client in the patch failed. {color} | | {color:red}-1{color} | {color:red} mvninstall {color} | {color:red} 0m 29s {color} | {color:red} hbase-server in the patch failed. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 26s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 26s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} compile {color} | {color:red} 0m 29s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} javac {color} | {color:red} 0m 29s {color} | {color:red} hbase-server in the patch failed with JDK v1.7.0_91. {color} | | {color:red}-1{color} | {color:red} checkstyle {color} | {color:red} 1m 11s {color} | {color:red} Patch generated 2 new checkstyle issues in hbase-common (total was 100, now 102). {color} | | {color:green}+1{color} | {color:green} mvneclipse {color} | {color:green} 0m 40s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} whitespace {color} | {color:green} 0m 0s {color} | {color:green} Patch has no whitespace issues. {color} | | {color:green}+1{color} | {color:green} hadoopcheck {color} | {color:green} 25m 35s {color} | {color:green} Patch does not cause any errors with Hadoop 2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.1 2.6.2 2.6.3 2.7.1. {color} | | {color:green}+1{color} | {color:green} findbugs {color} | {color:green} 4m 18s {color} | {color:green} the patch passed {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 8s {color} | {color:green} the patch passed with JDK v1.8.0_66 {color} | | {color:green}+1{color} | {color:green} javadoc {color} | {color:green} 1m 19s {color} | {color:green} the patch passed with JDK v1.7.0_91 {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 52s {color} | {color:green} hbase-client in the patch passed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 1m 43s {color} | {color:green} hbase-common in the patch passed with JDK v1.8.0_66. {color} | | {color:red}-1{color} | {color:red} unit {color} | {color:red} 0m 29s {color} | {color:red} hbase-server in the patch failed with JDK v1.8.0_66. {color} | | {color:green}+1{color} | {color:green} unit {color} | {color:green} 0m 59s {color} | {color:green} hbase-client in the patch passed with JDK v1.7.0_91. {color} | | {color:green}+1{color} |
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120736#comment-15120736 ] Ted Yu commented on HBASE-15180: {code} 48public Cell readCell(boolean withTags) throws IOException { 49 if (intBuf == null) { 50// Lazy init. In real flow, we will use the readCell(int, boolean) API only 51intBuf = new byte[Bytes.SIZEOF_INT]; {code} There are two places where readCell() is implemented in the same manner. Can code duplication be reduced ? {code} 56protected static class CellReadablePBIS extends PBIS implements CellReadable { {code} What does PB mean above ? > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-15180) Reduce garbage created while reading Cells from Codec Decoder
[ https://issues.apache.org/jira/browse/HBASE-15180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120791#comment-15120791 ] Anoop Sam John commented on HBASE-15180: bq.What does PB mean above ? PushbackInputStream. We have PBIS already here. Let me see how can we avoid the duplication > Reduce garbage created while reading Cells from Codec Decoder > - > > Key: HBASE-15180 > URL: https://issues.apache.org/jira/browse/HBASE-15180 > Project: HBase > Issue Type: Sub-task > Components: regionserver >Reporter: Anoop Sam John >Assignee: Anoop Sam John > Fix For: 2.0.0 > > Attachments: HBASE-15180.patch > > > In KeyValueDecoder#parseCell (Default Codec decoder) we use > KeyValueUtil#iscreate to read cells from the InputStream. Here we 1st create > a byte[] of length 4 and read the cell length and then an array of Cell's > length and read in cell bytes into it and create a KV. > Actually in server we read the reqs into a byte[] and CellScanner is created > on top of a ByteArrayInputStream on top of this. By default in write path, we > have MSLAB usage ON. So while adding Cells to memstore, we will copy the Cell > bytes to MSLAB memory chunks (default 2 MB size) and recreate Cells over that > bytes. So there is no issue if we create Cells over the RPC read byte[] > directly here in Decoder. No need for 2 byte[] creation and copy for every > Cell in request. > My plan is to make a Cell aware ByteArrayInputStream which can read Cells > directly from it. > Same Codec path is used in client side also. There better we can avoid this > direct Cell create and continue to do the copy to smaller byte[]s path. Plan > to introduce some thing like a CodecContext associated with every Codec > instance which can say the server/client context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)