[jira] [Comment Edited] (HBASE-11339) HBase MOB

2015-07-10 Thread Ted Yu (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14622478#comment-14622478 ]

Ted Yu edited comment on HBASE-11339 at 7/10/15 4:41 PM:
-

For Compactor.java :
{code}
  // TODO mob introduced the fd parameter; can we make this cleaner and easier to extend in future?
{code}

Edit: DefaultMobStoreCompactor uses the fd parameter, so we can leave the API change as is.
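A minimal Java sketch of the reasoning in the edit: a base-class method can declare a parameter that its default implementation ignores, as long as an override depends on it. `CompactorSketch`, `BaseCompactor`, `MobCompactor`, `FileDetails`, and this `performCompaction` signature are simplified hypothetical stand-ins, not the actual HBase classes or signatures.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins; these are NOT the real HBase Compactor signatures.
class CompactorSketch {

    // Hypothetical summary of the input files, playing the role of the "fd" argument.
    static final class FileDetails {
        final int maxKeyCount;

        FileDetails(int maxKeyCount) {
            this.maxKeyCount = maxKeyCount;
        }
    }

    static class BaseCompactor {
        // Default behaviour: fd is accepted but unused, and every value stays inline.
        // Returns the number of values moved out of line (always 0 here).
        int performCompaction(FileDetails fd, List<byte[]> values) {
            return 0;
        }
    }

    static class MobCompactor extends BaseCompactor {
        private final int mobThreshold;

        MobCompactor(int mobThreshold) {
            this.mobThreshold = mobThreshold;
        }

        // Override: reads fd to pre-size the buffer of values destined for a side file,
        // so dropping fd from the base signature would break this subclass.
        @Override
        int performCompaction(FileDetails fd, List<byte[]> values) {
            List<byte[]> mobValues = new ArrayList<>(Math.max(0, fd.maxKeyCount));
            for (byte[] value : values) {
                if (value.length >= mobThreshold) {
                    mobValues.add(value);
                }
            }
            return mobValues.size();
        }
    }

    public static void main(String[] args) {
        List<byte[]> values = List.of(new byte[10], new byte[200], new byte[500]);
        FileDetails fd = new FileDetails(values.size());
        BaseCompactor base = new BaseCompactor();
        BaseCompactor mob = new MobCompactor(100);
        System.out.println(base.performCompaction(fd, values)); // prints 0
        System.out.println(mob.performCompaction(fd, values));  // prints 2
    }
}
```

Keeping the parameter on the base method is the simplest option; bundling the compaction inputs into one parameter object would be one possible answer to the TODO about extensibility, but the thread does not settle on that.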


was (Author: yuzhih...@gmail.com):
For Compactor.java :
{code}
  // TODO mob introduced the fd parameter; can we make this cleaner and easier to extend in future?
{code}
Parameter fd is not used in performCompaction(). Can this parameter be omitted?

> HBase MOB
> -
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver, Scanners
>Affects Versions: 2.0.0
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: hbase-11339
>
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design-v5.pdf, HBase MOB Design.pdf, MOB user 
> guide.docx, MOB user guide_v2.docx, MOB user guide_v3.docx, MOB user 
> guide_v4.docx, MOB user guide_v5.docx, hbase-11339-150519.patch, 
> hbase-11339-in-dev.patch, hbase-11339.150417.patch, merge-150212.patch, 
> merge.150212b.patch, merge.150212c.patch, merge.150710.patch
>
>
>   It's quite useful to save medium-sized binary data, such as images and 
> documents, in Apache HBase. Unfortunately, saving binary MOBs (medium 
> objects) directly in HBase degrades performance because of frequent splits 
> and compactions.
>   In this design, MOB data is stored in a more efficient way, which keeps 
> write/read performance high and guarantees data consistency in Apache HBase.
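The summary above describes the MOB idea only at a high level. A minimal Java sketch of it, assuming a simple size threshold: values at or above the threshold go to a separate append-only MOB file, and the main store keeps only a small reference, so splits and compactions of the main store no longer copy the large bytes. `MobStoreSketch`, `StoredCell`, the in-memory "files", and the 100 KB threshold are hypothetical illustrations, not HBase's actual classes or on-disk format.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of threshold-based MOB routing; not HBase code.
class MobStoreSketch {

    // What the main store keeps for a row: either the value itself or a reference to it.
    static final class StoredCell {
        final byte[] inlineValue; // non-null for small values
        final int mobIndex;       // index into the MOB file for large values, else -1

        StoredCell(byte[] inlineValue, int mobIndex) {
            this.inlineValue = inlineValue;
            this.mobIndex = mobIndex;
        }

        boolean isReference() {
            return mobIndex >= 0;
        }
    }

    private final int threshold;
    private final List<byte[]> mobFile = new ArrayList<>();       // hypothetical side file
    private final Map<String, StoredCell> store = new HashMap<>(); // hypothetical main store

    MobStoreSketch(int threshold) {
        this.threshold = threshold;
    }

    // Large values are appended to the MOB file; the main store records only a reference.
    void put(String row, byte[] value) {
        if (value.length >= threshold) {
            mobFile.add(value);
            store.put(row, new StoredCell(null, mobFile.size() - 1));
        } else {
            store.put(row, new StoredCell(value, -1));
        }
    }

    // Reads follow the reference when the value was moved out of line.
    byte[] get(String row) {
        StoredCell cell = store.get(row);
        if (cell == null) {
            return null;
        }
        return cell.isReference() ? mobFile.get(cell.mobIndex) : cell.inlineValue;
    }

    boolean isStoredAsReference(String row) {
        StoredCell cell = store.get(row);
        return cell != null && cell.isReference();
    }

    public static void main(String[] args) {
        MobStoreSketch s = new MobStoreSketch(100 * 1024); // illustrative 100 KB threshold
        s.put("thumb", new byte[2 * 1024]);
        s.put("photo", new byte[512 * 1024]);
        System.out.println(s.isStoredAsReference("thumb")); // prints false
        System.out.println(s.isStoredAsReference("photo")); // prints true
        System.out.println(s.get("photo").length);          // prints 524288
    }
}
```

Reads of large values pay one extra indirection through the reference; the payoff is that the MOB file is not rewritten by normal compactions, which is why it needs its own cleanup (the "mob compaction" discussed elsewhere in this thread).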



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (HBASE-11339) HBase MOB

2015-04-09 Thread Jonathan Hsieh (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14487320#comment-14487320 ]

Jonathan Hsieh edited comment on HBASE-11339 at 4/9/15 1:01 PM:


I am in the process of merging with master in order to call a merge in the 
next week or so. Currently I'm working through some unit test problems.


was (Author: jmhsieh):
I am in the process of rebasing in order to call a merge in the next week or 
so. Currently I'm working through some unit test problems.

> HBase MOB
> -
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver, Scanners
>Affects Versions: 2.0.0
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Fix For: hbase-11339
>
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user 
> guide_v2.docx, MOB user guide_v3.docx, MOB user guide_v4.docx, MOB user 
> guide_v5.docx, hbase-11339-in-dev.patch, merge-150212.patch, 
> merge.150212b.patch, merge.150212c.patch
>


[jira] [Comment Edited] (HBASE-11339) HBase MOB

2014-09-03 Thread Jonathan Hsieh (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120908#comment-14120908 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/4/14 3:40 AM:


Thanks, Lars. Justifying features and implementations is a worthwhile exercise, 
especially since it leaves a record of the alternatives considered.

A feature branch sounds good to me -- I've been a general fan of these. We'll 
call it the hbase-11339 branch. Along the way we'll likely commit the MR-managed 
code, but we'll refactor or remove it in favor of the new mechanism, and have 
metrics and snapshot support, before we call a merge vote.




was (Author: jmhsieh):
Thanks, Lars. Justifying features and implementations is a worthwhile exercise, 
especially since it leaves a record of the alternatives considered.

A feature branch sounds good to me -- I've been a general fan of these. We'll call 
it the hbase-11339 branch. Along the way we'll likely commit the MR-managed code, 
but we'll refactor or remove it in favor of the new mechanism, and have metrics 
and snapshot support, before we call a merge vote.



> HBase MOB
> -
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver, Scanners
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user 
> guide_v2.docx, MOB user guide_v3.docx, hbase-11339-in-dev.patch
>


[jira] [Comment Edited] (HBASE-11339) HBase MOB

2014-09-03 Thread Jonathan Hsieh (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14120250#comment-14120250 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/4/14 3:32 AM:


bq. The master's built-in splitting was still available even if there was no MR 
runtime that could run the replay tool.

If you were OK with 10-hour downtimes due to recovery (back then there was no 
meta-first recovery), then sure. For large deployments, MR for this was critical 
and not really optional.

bq. Stage = JIRA issue.

sgtm.

bq. If I read the above correctly, we are looking at 2.0 as a possible release 
for shipping this feature? I suggest we communicate the feature status as 
experimental for the whole release line, i.e. until 2.1, as we have done 
with the cell security features in the 0.98 line.

Yes -- trunk is 2.0, and new features should only land in trunk. And yes, we 
would note it as experimental until all pieces are in and some hardening has 
taken place. Ideally, all major features would be experimental in their 
first release. If we follow through with having 2.0 -> 2.1 be like 
0.92 -> 0.94 or 0.96 -> 0.98, then following the cell security approach 
for experimental status sounds good to me.

(Edit: fixed some formatting with accidental -strikethroughs-.)



was (Author: jmhsieh):
bq. The master's built-in splitting was still available even if there was no MR 
runtime that could run the replay tool.

If you were OK with 10-hour downtimes due to recovery (back then there was no 
meta-first recovery), then sure. For large deployments, MR for this was critical 
and not really optional.

bq. Stage = JIRA issue.

sgtm.

bq. If I read the above correctly, we are looking at 2.0 as a possible release 
for shipping this feature? I suggest we communicate the feature status as 
experimental for the whole release line, i.e. until 2.1, as we have done 
with the cell security features in the 0.98 line.

Yes -- trunk is 2.0, and new features should only land in trunk. And yes, we 
would note it as experimental until all pieces are in and some hardening has 
taken place. Ideally, all major features would be experimental in their 
first release. If we follow through with having 2.0 -> 2.1 be like 
0.92 -> 0.94 or 0.96 -> 0.98, then following the cell security approach for 
experimental status sounds good to me.



> HBase MOB
> -
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver, Scanners
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user 
> guide_v2.docx, MOB user guide_v3.docx, hbase-11339-in-dev.patch
>


[jira] [Comment Edited] (HBASE-11339) HBase MOB

2014-09-02 Thread Jonathan Hsieh (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118911#comment-14118911 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/2/14 11:02 PM:
-

bq. Related, the MOB design also attempts to avoid write amplification of large 
cells during compaction by segregating large values into separate files kept 
outside the normal compaction process. Rather than normal compaction, an 
external MapReduce-based tool is used for compacting MOB files. HBase has never 
required MapReduce before, and we should really think hard before introducing 
such a change. Are we sure the desired objectives cannot be met with a 
pluggable compaction policy?

Removing the external processes that perform "mob compaction" is one of the 
follow-up goals and is tracked in HBASE-11861. We want to get rid of the MR 
dependency because it introduces a new piece of operational complexity, and I 
don't want that. I don't consider the MOB feature production-ready if it 
still requires an external process to manage this.

The mob feature, like other experimental features that require external 
tooling, will be experimental until it is simplified operationally. We've done 
this before -- for example, favored nodes (HBASE-7932) is experimental because it is 
not "set-it-and-forget"; it requires extra processes such as an external 
balancer. For MOB, after we get the other blockers in (snapshot support, 
metrics), we'll revamp the mob compaction and then remove the experimental tag. 
Our goal would be to get this all in by the end of the year.
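To make the write-amplification concern in the quoted question concrete, here is a back-of-the-envelope Java model, assuming every value is flushed once and then rewritten once per compaction it takes part in, and that a MOB reference costs a fixed number of bytes. `WriteAmplificationSketch` and all of the numbers (1 MB value, 100-byte reference, 5 compactions) are illustrative assumptions, not measurements from the MOB patches.

```java
// Back-of-the-envelope model; all sizes and counts below are illustrative assumptions.
class WriteAmplificationSketch {

    // Inline layout: the value is flushed once and then rewritten by every compaction.
    static long inlineTotal(long valueBytes, int compactions) {
        return valueBytes * (1L + compactions);
    }

    // MOB layout: the value is written once to a MOB file; only the small reference
    // is flushed and rewritten by compactions of the regular store files.
    // Cleanup of the MOB files themselves ("mob compaction") is deliberately ignored.
    static long mobTotal(long valueBytes, long referenceBytes, int compactions) {
        return valueBytes + referenceBytes * (1L + compactions);
    }

    public static void main(String[] args) {
        long valueBytes = 1_000_000L; // assumed ~1 MB cell
        long referenceBytes = 100L;   // assumed reference size
        int compactions = 5;          // assumed number of times the cell is compacted
        System.out.println(inlineTotal(valueBytes, compactions));              // prints 6000000
        System.out.println(mobTotal(valueBytes, referenceBytes, compactions)); // prints 1000600
    }
}
```

The model leaves out the cost of cleaning up the MOB files themselves, which is exactly the part that the external MR tool discussed here currently handles, so real MOB numbers would sit somewhat above this lower bound.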


was (Author: jmhsieh):
bq. Related, the MOB design also attempts to avoid write amplification of large 
cells during compaction by segregating large values into separate files kept 
outside the normal compaction process. Rather than normal compaction, an 
external MapReduce-based tool is used for compacting MOB files. HBase has never 
required MapReduce before, and we should really think hard before introducing 
such a change. Are we sure the desired objectives cannot be met with a 
pluggable compaction policy?

Removing the external processes that perform "mob compaction" is one of the 
follow-up goals and is tracked in HBASE-11861. We want to get rid of the MR 
dependency because it introduces a new piece of operational complexity, and I 
don't want that. I don't consider the MOB feature production-ready if it 
still requires an external process to manage this.

The mob feature, like other features that require external tooling, will be 
experimental until it is simplified operationally. We've done this before -- for 
example, I call favored nodes (HBASE-7932) experimental because it is not 
"set-it-and-forget"; it requires extra processes such as an external balancer. 
For MOB, after we get the other blockers in (snapshot support, metrics), we'll 
revamp the mob compaction and then remove the experimental tag. Our goal would 
be to get this all in by the end of the year.

> HBase MOB
> -
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver, Scanners
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user 
> guide_v2.docx, hbase-11339-in-dev.patch
>


[jira] [Comment Edited] (HBASE-11339) HBase MOB

2014-09-02 Thread Jonathan Hsieh (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118911#comment-14118911 ]

Jonathan Hsieh edited comment on HBASE-11339 at 9/2/14 10:58 PM:
-

bq. Related, the MOB design also attempts to avoid write amplification of large 
cells during compaction by segregating large values into separate files kept 
outside the normal compaction process. Rather than normal compaction, an 
external MapReduce-based tool is used for compacting MOB files. HBase has never 
required MapReduce before, and we should really think hard before introducing 
such a change. Are we sure the desired objectives cannot be met with a 
pluggable compaction policy?

Removing the external processes that perform "mob compaction" is one of the 
follow-up goals and is tracked in HBASE-11861. We want to get rid of the MR 
dependency because it introduces a new piece of operational complexity, and I 
don't want that. I don't consider the MOB feature production-ready if it 
still requires an external process to manage this.

The mob feature, like other features that require external tooling, will be 
experimental until it is simplified operationally. We've done this before -- for 
example, I call favored nodes (HBASE-7932) experimental because it is not 
"set-it-and-forget"; it requires extra processes such as an external balancer. 
For MOB, after we get the other blockers in (snapshot support, metrics), we'll 
revamp the mob compaction and then remove the experimental tag. Our goal would 
be to get this all in by the end of the year.


was (Author: jmhsieh):
bq. Related, the MOB design also attempts to avoid write amplification of large 
cells during compaction by segregating large values into separate files kept 
outside the normal compaction process. Rather than normal compaction, an 
external MapReduce-based tool is used for compacting MOB files. HBase has never 
required MapReduce before, and we should really think hard before introducing 
such a change. Are we sure the desired objectives cannot be met with a 
pluggable compaction policy?

Removing the external processes that perform "mob compaction" is one of the 
follow-up goals and is tracked in HBASE-11861. We want to get rid of the MR 
dependency because it introduces a new piece of operational complexity, and I 
don't want that. I don't consider the MOB feature production-ready if it 
still requires an external process to manage this.

The mob feature, like other features that require external tooling, will be 
experimental until it is simplified operationally. We've done this before -- for 
example, I call favored nodes (HBASE-7932) experimental because it is not 
"set-it-and-forget"; it requires extra processes such as an external balancer. 
For MOB, after we get the other blockers in (snapshot support, metrics), we'll 
revamp the mob compaction and then remove the experimental tag. My goal would 
be to get this all in by the end of the year.

> HBase MOB
> -
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver, Scanners
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, MOB user 
> guide_v2.docx, hbase-11339-in-dev.patch
>


[jira] [Comment Edited] (HBASE-11339) HBase MOB

2014-08-29 Thread Jonathan Hsieh (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-11339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14115526#comment-14115526 ]

Jonathan Hsieh edited comment on HBASE-11339 at 8/29/14 5:24 PM:
-

[~jiajia], the new version of the docs looks good. I think it is done for now 
unless we make changes to it.

Nits: I found two typos, "provinding" -> "providing" and 
"handers" -> "handlers". Don't worry about fixing these for now -- we'll have 
[~misty] convert the docs into a chapter or section in the ref guide.

Also, in the future, please do not delete attachments -- just provide a new 
version with a v2 suffix or something like that so we can keep track of the 
evolution.




was (Author: jmhsieh):
[~jiajia], the new version of the docs looks good. I think it is done for now 
unless we make changes to it.

Nits: I found two typos, "provinding" -> "providing" and 
"handers" -> "handlers". Don't worry about fixing these for now -- we'll have 
[~misty] convert the docs into a chapter or section in the ref guide.

Also, in the future, please do not delete attachments -- just provide a new 
version with a v2 suffix or something like that so we can keep track of the 
evolution.



> HBase MOB
> -
>
> Key: HBASE-11339
> URL: https://issues.apache.org/jira/browse/HBASE-11339
> Project: HBase
>  Issue Type: Umbrella
>  Components: regionserver, Scanners
>Reporter: Jingcheng Du
>Assignee: Jingcheng Du
> Attachments: HBase MOB Design-v2.pdf, HBase MOB Design-v3.pdf, HBase 
> MOB Design-v4.pdf, HBase MOB Design.pdf, MOB user guide.docx, 
> hbase-11339-in-dev.patch
>



--
This message was sent by Atlassian JIRA
(v6.2#6252)