[jira] [Comment Edited] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622478#comment-14622478 ]

Ted Yu edited comment on HBASE-11339 at 7/10/15 4:41 PM:

For Compactor.java:
{code}
// TODO mob introduced the fd parameter; can we make this cleaner and easier to extend in future?
{code}
Edit: DefaultMobStoreCompactor uses the fd parameter, so we can leave the API change as is.

was (Author: yuzhih...@gmail.com):
For Compactor.java :
{code}
// TODO mob introduced the fd parameter; can we make this cleaner and easier to extend in future?
{code}
parameter fd is not used in performCompaction(). Can this parameter be omitted ?

> HBase MOB
> ---------
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
> Issue Type: Umbrella
> Components: regionserver, Scanners
> Affects Versions: 2.0.0
> Reporter: Jingcheng Du
> Assignee: Jingcheng Du
> Fix For: hbase-11339
>
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, merge.150212b.patch, merge.150212c.patch, merge.150710.patch
>
>
> It's quite useful to save medium-sized binary data, such as images and documents, in Apache HBase. Unfortunately, saving binary MOBs (medium objects) directly in HBase leads to worse performance because of frequent splits and compactions.
> In this design, MOB data are stored in a more efficient way that keeps write/read performance high and guarantees data consistency in Apache HBase.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
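The Compactor.java exchange above turns on a common API question: should a base-class hook keep a parameter that only one subclass reads? The sketch below is a hypothetical illustration of that trade-off, not HBase's actual Compactor code. FileDetailsSketch, CompactorSketch and both subclasses are invented names, their fields are chosen only for illustration, and the real performCompaction() signature differs.

```java
import java.util.List;

// Hypothetical stand-in for the per-compaction file metadata passed as "fd".
// Not HBase's FileDetails class; fields are illustrative only.
final class FileDetailsSketch {
    final long maxKeyCount;
    final long earliestPutTs;

    FileDetailsSketch(long maxKeyCount, long earliestPutTs) {
        this.maxKeyCount = maxKeyCount;
        this.earliestPutTs = earliestPutTs;
    }
}

// Hypothetical base compactor: fd stays in the shared hook signature
// even though the default implementation never reads it.
abstract class CompactorSketch {
    abstract String performCompaction(FileDetailsSketch fd, List<String> cells);
}

// Default path: fd is intentionally ignored.
class DefaultCompactorSketch extends CompactorSketch {
    @Override
    String performCompaction(FileDetailsSketch fd, List<String> cells) {
        return "default: wrote " + cells.size() + " cells";
    }
}

// MOB-aware path: reads fd (here, to size its side output).
class MobCompactorSketch extends CompactorSketch {
    @Override
    String performCompaction(FileDetailsSketch fd, List<String> cells) {
        return "mob: wrote " + cells.size() + " cells, sized for up to " + fd.maxKeyCount + " keys";
    }
}
```

Keeping fd in the shared signature lets a MOB-aware subclass use it without a second, MOB-only entry point, which matches the reasoning in the edited comment; the cost is one argument that the default path ignores.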
[jira] [Comment Edited] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487320#comment-14487320 ]

Jonathan Hsieh edited comment on HBASE-11339 at 4/9/15 1:01 PM:

I am in the process of merging with master in order to call a merge in the next week or so. Currently I'm working through some unit test problems.

was (Author: jmhsieh):
I am in the process of rebasing in order to call a merge in the next week or so. Currently I'm working through are some unit test problems.
[jira] [Comment Edited] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120908#comment-14120908 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/4/14 3:40 AM:

Thanks Lars. Justifying features and implementations is a worthwhile exercise, especially since it leaves a record of the alternatives considered. A feature branch sounds good to me -- I've been a general fan of these. We'll call it the hbase-11339 branch. Along the way we'll likely commit the MR-managed code, but we will refactor it or replace it with the new mechanism, and have metrics and snapshot support, before we call a merge vote.

was (Author: jmhsieh):
Thanks Lars. Justify features and implementations is a worthwhile exercise especially since it leaves a record of alternatives considered. Feature branch sounds good to me -- i've a general fan of these. We'll call it the hbase-11339 branch. Along the way we'll likely commit the mr managed code, but refactor/remove it with the new mechanism and have metrics and snapshots support before we call a merge vote.
[jira] [Comment Edited] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120250#comment-14120250 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/4/14 3:32 AM:

bq. The master's built in splitting was still available even if there was no MR runtime that could run the replay tool.

If you were OK with 10-hour downtimes due to recovery (back then there was no meta-first recovery), then sure. For large deployments, MR for this was critical and not really optional.

bq. Stage = JIRA issue.

sgtm.

bq. If I read the above correctly we are looking at 2.0 as a possible release for shipping this feature? I suggest we communicate the feature status as experimental for the whole release line, i.e. until 2.1, like what we have done with the cell security features in the 0.98 line.

Yes -- trunk is 2.0 and new features should only land in trunk, and yes, we would note it as experimental until all pieces are in and some hardening has taken place. Ideally, all major features would be experimental in their first release. If we follow through with having 2.0 \-> 2.1 be like 0.92 \-> 0.94 or 0.96 \-> 0.98, then following the cell security approach for experimental status sounds good to me.

(edit fixed some formatting with accidental -strikethroughs-)

was (Author: jmhsieh):
bq. The master's built in splitting was still available even if there was no MR runtime that could run the replay tool.
If you were ok with 10 hr downtimes due to recovery (back then no meta first recovery), the sure. For large deployments that MR for this was critical and not really optional.
bq. Stage = JIRA issue.
sgtm.
bq. If I read the above correctly we are looking at 2.0 as a possible release for shipping this feature? I suggest we communicate the feature status as experimental for the whole release line, i.e. until 2.1, like what we have done with the cell security features in the 0.98 line.
Yes -- trunk is 2.0 and new features should only land in trunk and yes, we would note it as experimental until all pieces are in and some hardening as taken place. . Ideally, all major features would be experimental in their first release. If we follow through with having 2.0 -> 2.1 be like will be like 0.92 -> 0.94 or 0.96->0.98, then following the cell security approach for experimental status sounds good to me.
[jira] [Comment Edited] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118911#comment-14118911 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/2/14 11:02 PM:

bq. Related, the MOB design also attempts to avoid write amplification of large cells during compaction, by segregating large values into separate files set outside the normal compaction process. Rather than normal compaction, an external MapReduce based tool is used for compacting MOB files.
HBase has never required MapReduce before and we should really think hard before introducing such a change. Are we sure the desired objectives cannot be met with a pluggable compaction policy?

Removing the external processes that perform "mob compaction" is one of the follow-up goals and is noted at HBASE-11861. We want to get rid of the MR dependencies because they introduce a new piece of operational complexity, and I don't want that. I don't consider the MOB feature to be production ready if it still requires an external process to manage this. The MOB feature, like other experimental features that require external tooling, will be experimental until it is simplified operationally. We've done this before -- for example, favored nodes (HBASE-7932) is experimental because it is not "set-it-and-forget"; it requires extra processes such as an external balancer. For MOB, after we get the other blockers in (snapshot support, metrics), we'll revamp the MOB compaction and then remove the experimental tag. Our goal would be to get this all in by the end of the year.

was (Author: jmhsieh):
bq. Related, the MOB design also attempts to avoid write amplification of large cells during compaction, by segregating large values into separate files set outside the normal compaction process. Rather than normal compaction, an external MapReduce based tool is used for compacting MOB files.
HBase has never required MapReduce before and we should really think hard before introducing such a change. Are we sure the desired objectives cannot be met with a pluggable compaction policy? Removing the external processes that perform "mob compaction" is one of the follow up goals and is noted at HBASE-11861. We want to get rid of the MR dependencies because it introduces a new piece of operational complexity and I don't want that. I don't consider the MOB feature to be production ready if it still requires the external process to manage this. The mob feature like other features that require external tooling be experimental until simplified operationally. We've done this before -- for example,, I call favored nodes HBASE-7932 experimental becuase it is not "set-it-and-forget"; it requires extra processes such as an external balancer. For MOB, after we get the other blockers in (snapshot support, metrics) we'll revamp the mob compaction and then remove the experimental tag. Our goal would be to get this all in by the end of the year.
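The write-amplification concern quoted in this comment can be made concrete with a small cost model. This is a sketch under stated assumptions, not the MOB implementation: each cell is reduced to its value size, cells above a threshold are assumed to be written once to a separate MOB file, the regular store keeps a fixed-size reference in their place, and every compaction is assumed to rewrite every cell in the regular store. MobWriteAmpSketch, REFERENCE_SIZE (100 bytes) and the 100 KB threshold used below are all hypothetical values chosen for illustration.

```java
import java.util.List;

// Hypothetical cost model: compares bytes rewritten by regular compactions
// when large values are kept inline vs. segregated into MOB files.
final class MobWriteAmpSketch {
    // Assumed size of the reference cell left in the regular store, in bytes.
    static final long REFERENCE_SIZE = 100L;

    private MobWriteAmpSketch() {}

    // Bytes the regular store holds for one cell: a reference if the value
    // exceeds the threshold, otherwise the value itself.
    static long storedSize(long valueSize, long mobThreshold) {
        return valueSize > mobThreshold ? REFERENCE_SIZE : valueSize;
    }

    // Bytes rewritten by `compactions` rounds, assuming (pessimistically)
    // that each round rewrites every cell in the regular store.
    static long bytesRewritten(List<Long> valueSizes, long mobThreshold, int compactions) {
        long perCompaction = 0L;
        for (long size : valueSizes) {
            perCompaction += storedSize(size, mobThreshold);
        }
        return perCompaction * compactions;
    }

    // Bytes written once to MOB files; regular compactions do not touch them.
    static long mobBytesWrittenOnce(List<Long> valueSizes, long mobThreshold) {
        long total = 0L;
        for (long size : valueSizes) {
            if (size > mobThreshold) {
                total += size;
            }
        }
        return total;
    }
}
```

With ten 1,000,000-byte values and a thousand 100-byte values compacted five times, passing Long.MAX_VALUE as the threshold (nothing segregated) gives 50,500,000 rewritten bytes, while a 102,400-byte threshold gives 505,000 rewritten bytes plus 10,000,000 bytes written once to MOB files. The model deliberately leaves out compaction of the MOB files themselves, which is the part the comment says is currently done by an external MapReduce tool.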
[jira] [Comment Edited] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118911#comment-14118911 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/2/14 10:58 PM:

bq. Related, the MOB design also attempts to avoid write amplification of large cells during compaction, by segregating large values into separate files set outside the normal compaction process. Rather than normal compaction, an external MapReduce based tool is used for compacting MOB files.
HBase has never required MapReduce before and we should really think hard before introducing such a change. Are we sure the desired objectives cannot be met with a pluggable compaction policy?

Removing the external processes that perform "mob compaction" is one of the follow-up goals and is noted at HBASE-11861. We want to get rid of the MR dependencies because they introduce a new piece of operational complexity, and I don't want that. I don't consider the MOB feature to be production ready if it still requires an external process to manage this. The MOB feature, like other features that require external tooling, will be experimental until it is simplified operationally. We've done this before -- for example, I call favored nodes (HBASE-7932) experimental because it is not "set-it-and-forget"; it requires extra processes such as an external balancer. For MOB, after we get the other blockers in (snapshot support, metrics), we'll revamp the MOB compaction and then remove the experimental tag. Our goal would be to get this all in by the end of the year.

was (Author: jmhsieh):
bq. Related, the MOB design also attempts to avoid write amplification of large cells during compaction, by segregating large values into separate files set outside the normal compaction process. Rather than normal compaction, an external MapReduce based tool is used for compacting MOB files.
HBase has never required MapReduce before and we should really think hard before introducing such a change. Are we sure the desired objectives cannot be met with a pluggable compaction policy? Removing the external processes that perform "mob compaction" is one of the follow up goals and is noted at HBASE-11861. We want to get rid of the MR dependencies because it introduces a new piece of operational complexity and I don't want that. I don't consider the MOB feature to be production ready if it still requires the external process to manage this. The mob feature like other features that require external tooling be experimental until simplified operationally. We've done this before -- for example,, I call favored nodes HBASE-7932 experimental becuase it is not "set-it-and-forget"; it requires extra processes such as an external balancer. For MOB, after we get the other blockers in (snapshot support, metrics) we'll revamp the mob compaction and then remove the experimental tag. My goal would be to get this all in by the end of the year.
[jira] [Comment Edited] (HBASE-11339) HBase MOB
[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115526#comment-14115526 ]

Jonathan Hsieh edited comment on HBASE-11339 at 8/29/14 5:24 PM:

[~jiajia], the new version of the docs looks good. I think it is done for now unless we make changes to it. Nits: I found two typos, "provinding" \-> "providing" and "handers" \-> "handlers". Don't worry about fixing these for now -- we'll have [~misty] convert them into a chapter or section in the ref guide. Also, in the future, please do not delete attachments -- just provide a new version with a v2 suffix or something like that so we can keep track of the evolution.

was (Author: jmhsieh):
[~jiajia], new version of docs look good, I think it is done for now unless we make changes to it. nits: I found there are two typos, "provinding" -> "providing" and "handers"->"handlers".Don't worry about fixing this for now -- we'll have [~misty] convert them into a chapter or section in the ref guide. Also, in the future, please do not delete attachments -- just provide a new version with a v2 or some think like that so we can keep track of the evolution.