[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897820#comment-13897820 ] Lukas Nalezenec commented on HBASE-10413:
One more thing: there is some versioning in class TableSplit (methods write() and read()). Don't we need to increment it? (I am just asking.)

> Tablesplit.getLength returns 0
> --
>
> Key: HBASE-10413
> URL: https://issues.apache.org/jira/browse/HBASE-10413
> Project: HBase
> Issue Type: Bug
> Components: Client, mapreduce
> Affects Versions: 0.96.1.1
> Reporter: Lukas Nalezenec
> Assignee: Lukas Nalezenec
> Fix For: 0.98.1, 0.99.0
>
> Attachments: 10413-7.patch, HBASE-10413-2.patch, HBASE-10413-3.patch,
> HBASE-10413-4.patch, HBASE-10413-5.patch, HBASE-10413-6.patch,
> HBASE-10413.patch
>
>
> InputSplits should be sorted by length, but TableSplit does not contain a real
> getLength() implementation:
>
>     @Override
>     public long getLength() {
>       // Not clear how to obtain this... seems to be used only for sorting splits
>       return 0;
>     }
>
> This is causing us problems with scheduling: we have jobs that are
> supposed to finish in a limited time, but they often get stuck in the last mapper
> working on a large region.
> Can we implement this method?
> What is the best way?
> We were thinking about estimating the size from the size of the files on HDFS.
> We would like to get a Scanner from the TableSplit, use startRow, stopRow and
> column families to find the corresponding region, then compute the HDFS size for
> the given region and column family.
> Update:
> This ticket was about a production issue. I talked with the person who worked on this,
> and he said our production issue was probably not directly caused by
> getLength() returning 0.

--
This message was sent by Atlassian JIRA (v6.1.5#6160)
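On the versioning question: the usual pattern is to write a version code first and branch on it when reading, so that a reader can still parse bytes written before a field existed. The sketch below is a hypothetical, simplified split (`VersionedSplit`, its version numbers and its field layout are made up for illustration; it is not HBase's `TableSplit`) that adds a `length` field behind a version bump. Note what a bump does and does not buy: a new reader can recognise old bytes and default the length to 0, but an old reader still cannot parse the new layout. Whether `TableSplit` needs this therefore depends on whether mixed-version clients ever deserialize each other's splits.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified split used only to illustrate format versioning;
// it is NOT HBase's TableSplit, and the version codes are invented.
final class VersionedSplit {
  static final int VERSION_INITIAL = 1;      // layout: version, regionLocation
  static final int VERSION_WITH_LENGTH = 2;  // layout: version, regionLocation, length
  static final int CURRENT_VERSION = VERSION_WITH_LENGTH;

  String regionLocation = "";
  long length;

  void write(DataOutput out) throws IOException {
    out.writeInt(CURRENT_VERSION);  // written first so readers can branch on it
    out.writeUTF(regionLocation);
    out.writeLong(length);
  }

  void readFields(DataInput in) throws IOException {
    int version = in.readInt();
    regionLocation = in.readUTF();
    // Bytes from an older writer carry no length; fall back to 0,
    // which is what getLength() used to return anyway.
    length = (version >= VERSION_WITH_LENGTH) ? in.readLong() : 0L;
  }

  // Emulates the bytes an older (VERSION_INITIAL) writer would have produced.
  static byte[] legacyBytes(String regionLocation) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      out.writeInt(VERSION_INITIAL);
      out.writeUTF(regionLocation);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static byte[] serialize(VersionedSplit split) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      DataOutputStream out = new DataOutputStream(bytes);
      split.write(out);
      out.flush();
      return bytes.toByteArray();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  static VersionedSplit deserialize(byte[] data) {
    try {
      VersionedSplit split = new VersionedSplit();
      split.readFields(new DataInputStream(new ByteArrayInputStream(data)));
      return split;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

A round trip keeps the length, while bytes produced by `legacyBytes(...)` deserialize with a length of 0.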
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13897700#comment-13897700 ] Lukas Nalezenec commented on HBASE-10413:
Hi, thank you very much for your time. I need one small change. It's not critical, but it will make a considerable difference in user experience. My line

    LOG.info(MessageFormat.format("Input split length: {0} bytes.", tSplit.getLength()));

was changed to

    LOG.info("Input split length: " + tSplit.getLength() + " bytes.");

in the last code review. The reason I used MessageFormat.format is that the length is a large number and it needs to be printed with thousands separators. It takes a few seconds to read the number 54798765321: how quickly can you tell whether it is in the gigabytes or the terabytes? If you print it with separators, you can read it correctly in a moment: 54,798,765,321. Can we add some formatting consistent with the HBase coding standards? Maybe String.format, I don't know.
Lukas
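For the formatting question, `String.format` with the `,` flag is one standard-library option. A minimal sketch, assuming US-style separators are wanted regardless of the JVM's default locale; the class and method names are hypothetical, and the `humanReadable` helper (binary units) is an optional extra not requested in the thread:

```java
import java.util.Locale;

// Hypothetical helper names; only String.format and Locale are real JDK APIs here.
final class SplitLengthFormat {
  // The ',' flag in %,d inserts grouping separators; Locale.US pins them to commas.
  static String describe(long lengthBytes) {
    return String.format(Locale.US, "Input split length: %,d bytes.", lengthBytes);
  }

  // Optional: scale to the largest binary unit that keeps the value >= 1.
  static String humanReadable(long bytes) {
    String[] units = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
    double value = bytes;
    int unit = 0;
    while (value >= 1024 && unit < units.length - 1) {
      value /= 1024;
      unit++;
    }
    return String.format(Locale.US, "%.1f %s", value, units[unit]);
  }
}
```

With this, `describe(54798765321L)` yields `Input split length: 54,798,765,321 bytes.` and `humanReadable(54798765321L)` yields `51.0 GiB`.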
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896865#comment-13896865 ] Lukas Nalezenec commented on HBASE-10413:
It would be great.
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896438#comment-13896438 ] Lukas Nalezenec commented on HBASE-10413:
I have removed setLength() from TableSplit. Unit tests pass, so I would like to resolve this ticket.
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Attachment: HBASE-10413-6.patch
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Attachment: HBASE-10413-5.patch
Fix after code review. TableSplit still contains setLength().
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13895729#comment-13895729 ] Lukas Nalezenec commented on HBASE-10413:
Let's make RegionSizeCalculator @InterfaceAudience.Private. Users are not expected to call this directly, right?
- I am not sure; I have no experience with the InterfaceAudience annotations. A lot of developers use a heavily customized TableInputFormat, and they may want to use this class. I have changed it to Private anyway. (By the way, I was told to change it from Private to Public in a previous code review.)

Instead of TableSplit.setLength(), you can override the ctor. TableSplit acts like an immutable, data-bean-like object.
- That means there will be a constructor with 6 parameters. IMO that is too many, but if you really want me to do it, I will.

In some cases, the regions might split or merge concurrently between getting the startEndKeys and asking the cluster for the regions. In that case, for that range, we might default to 0, but it should be OK I think. We are just estimating the region sizes here.
- I think it's not worth handling; it will be rare and the difference will be insignificant most of the time.
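The reviewer's constructor suggestion can be sketched as follows. `ImmutableSplit` is a hypothetical, simplified stand-in with fewer fields than the real `TableSplit`: the length is fixed at construction and there is no setter, and a shorter constructor that defaults the length to 0 keeps existing callers compiling, which softens the long-parameter-list concern. Byte arrays are copied defensively so callers cannot mutate the split afterwards.

```java
// Hypothetical, simplified stand-in for an immutable split; not HBase's TableSplit.
final class ImmutableSplit {
  private final String tableName;
  private final byte[] startRow;
  private final byte[] endRow;
  private final String regionLocation;
  private final long length;

  // Old-style constructor: length unknown, so report 0 as before.
  ImmutableSplit(String tableName, byte[] startRow, byte[] endRow, String regionLocation) {
    this(tableName, startRow, endRow, regionLocation, 0L);
  }

  ImmutableSplit(String tableName, byte[] startRow, byte[] endRow,
                 String regionLocation, long length) {
    if (length < 0) {
      throw new IllegalArgumentException("length must be >= 0, got " + length);
    }
    this.tableName = tableName;
    this.startRow = startRow.clone();  // defensive copies keep the split immutable
    this.endRow = endRow.clone();
    this.regionLocation = regionLocation;
    this.length = length;
  }

  String getTableName() { return tableName; }
  byte[] getStartRow() { return startRow.clone(); }
  byte[] getEndRow() { return endRow.clone(); }
  String getRegionLocation() { return regionLocation; }
  long getLength() { return length; }
}
```

Because nothing can change the length after construction, the value used for sorting is the value the split was created with.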
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413: Attachment: HBASE-10413-4.patch Calculator works only with store file size, not memstore size > Tablesplit.getLength returns 0 > -- > > Key: HBASE-10413 > URL: https://issues.apache.org/jira/browse/HBASE-10413 > Project: HBase > Issue Type: Bug > Components: Client, mapreduce >Affects Versions: 0.96.1.1 >Reporter: Lukas Nalezenec >Assignee: Lukas Nalezenec > Attachments: HBASE-10413-2.patch, HBASE-10413-3.patch, > HBASE-10413-4.patch, HBASE-10413.patch > > > InputSplits should be sorted by length but TableSplit does not contain real > getLength implementation: > @Override > public long getLength() { > // Not clear how to obtain this... seems to be used only for sorting > splits > return 0; > } > This is causing us problem with scheduling - we have got jobs that are > supposed to finish in limited time but they get often stuck in last mapper > working on large region. > Can we implement this method ? > What is the best way ? > We were thinking about estimating size by size of files on HDFS. > We would like to get Scanner from TableSplit, use startRow, stopRow and > column families to get corresponding region than computing size of HDFS for > given region and column family. > Update: > This ticket was about production issue - I talked with guy who worked on this > and he said our production issue was probably not directly caused by > getLength() returning 0. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894845#comment-13894845 ] Lukas Nalezenec commented on HBASE-10413:
ok, memstore size removed.
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894790#comment-13894790 ] Lukas Nalezenec commented on HBASE-10413:
Re:

    + long regionSizeBytes = (memSize + fileSize) * megaByte;

Does the memstore size have to be included? I am not sure. What are the pros and cons?
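Whichever way the memstore question is settled, the MB-to-bytes conversion should be done in `long` arithmetic. Assuming the per-region figures arrive as `int` megabytes (as `RegionLoad.getStorefileSizeMB()` and `getMemStoreSizeMB()` do, to our understanding), a product like `1024 * 1024 * sizeMB` is evaluated entirely in `int` and wraps once a region reaches 2048 MB. A minimal sketch with hypothetical names; on the design question, memstore data has not yet been flushed to store files, so including it tracks what a scan will read slightly more closely, while excluding it gives a figure that changes less often.

```java
// Hypothetical helper; the int-MB inputs mirror per-region load metrics.
final class RegionSizeMath {
  static final long MEGABYTE = 1024L * 1024L;  // long constant forces long arithmetic

  static long regionSizeBytes(int storefileSizeMB, int memStoreSizeMB, boolean includeMemStore) {
    // Widen to long before adding, so even the sum cannot overflow int.
    long sizeMB = includeMemStore ? (long) storefileSizeMB + memStoreSizeMB : storefileSizeMB;
    return sizeMB * MEGABYTE;
  }

  // The pitfall, for comparison: every operand is int, so the product wraps
  // before being widened to long on return.
  static long regionSizeBytesIntOverflow(int storefileSizeMB) {
    return 1024 * 1024 * storefileSizeMB;
  }
}
```

For a 4096 MB region, `regionSizeBytes(4096, 0, false)` gives 4294967296, while the all-`int` version gives 0.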
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Attachment: HBASE-10413-3.patch
Changes from code review.
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Release Note: TableSplit.getLength() now returns an estimate of the region's size in bytes. The MapReduce framework uses it for better scheduling.
Status: Patch Available (was: In Progress)
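To illustrate why a real length helps scheduling: as we understand it, Hadoop's job submission orders splits largest-first by `getLength()`, so the biggest regions start early instead of being left for a lone last mapper. The sketch below only mimics that ordering with a hypothetical `SplitOrdering.Split` type; it does not call Hadoop. Since `List.sort` is stable, splits that all report 0 keep their original order, which is how a large region can end up scheduled last.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical types; mimics a largest-first ordering of input splits.
final class SplitOrdering {
  static final class Split {
    private final String region;
    private final long length;

    Split(String region, long length) {
      this.region = region;
      this.length = length;
    }

    String getRegion() { return region; }
    long getLength() { return length; }
  }

  // Returns a new list sorted by length, descending; ties keep input order.
  static List<Split> largestFirst(List<Split> splits) {
    List<Split> sorted = new ArrayList<>(splits);
    sorted.sort(Comparator.comparingLong(Split::getLength).reversed());
    return sorted;
  }
}
```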
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Attachment: HBASE-10413-2.patch
Latest patch, with unit test category added.
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Attachment: HBASE-10413.patch
[jira] [Work started] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on HBASE-10413 started by Lukas Nalezenec.
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890559#comment-13890559 ] Lukas Nalezenec commented on HBASE-10413:
New version with per table region filtering: https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R76
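One possible way to do per-table filtering of cluster-wide region sizes is to match on the region name. This sketch assumes region names have the form `<table>,<startKey>,<regionId>.<encodedName>.` and keeps only entries prefixed by `<table>,`; the trailing comma keeps table `t10` from matching `t1`. The class and method names are hypothetical, and the linked change may filter differently (for example, by asking the table for its own region list).

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper; assumes "<table>,<startKey>,<regionId>.<encoded>." region names.
final class TableRegionFilter {
  static Map<String, Long> sizesForTable(Map<String, Long> clusterSizes, String tableName) {
    String prefix = tableName + ",";  // the comma separates the table name from the start key
    Map<String, Long> result = new TreeMap<>();
    for (Map.Entry<String, Long> entry : clusterSizes.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        result.put(entry.getKey(), entry.getValue());
      }
    }
    return result;
  }
}
```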
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13890550#comment-13890550 ] Lukas Nalezenec commented on HBASE-10413:
Hi, I know it is hacky. It is my first HBase commit; I was not sure how to do it, so I asked 3 people and then published a first draft as soon as possible. Everybody was fine with the solution :( . The hacky solution is good enough for us; I already deployed it yesterday. I can't spend much more time on this; I need to close it by tomorrow.

How about this solution? I am not sure if it is the best way; it does not work with Scan ranges.
ToDos:
- We need to filter regions by table.
- It would be nice if we could filter size by column families.

https://github.com/apache/hbase/pull/8/files#diff-46ff60f1e27e3d77131acb7873050990R68

    HBaseAdmin admin = new HBaseAdmin(configuration);
    ClusterStatus clusterStatus = admin.getClusterStatus();
    Collection<ServerName> servers = clusterStatus.getServers();
    for (ServerName serverName : servers) {
      ServerLoad serverLoad = clusterStatus.getLoad(serverName);
      for (Map.Entry<byte[], RegionLoad> regionEntry : serverLoad.getRegionsLoad().entrySet()) {
        byte[] regionId = regionEntry.getKey();
        RegionLoad regionLoad = regionEntry.getValue();
        // 1024L forces long arithmetic; with int operands the product overflows for regions >= 2048 MB.
        long regionSize = 1024L * 1024L * (regionLoad.getMemStoreSizeMB() + regionLoad.getStorefileSizeMB());
        sizeMap.put(regionId, regionSize);
      }
    }
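The code above puts `byte[]` region names into `sizeMap`. If that map is a `HashMap`, a lookup with a different array holding the same bytes will miss, because Java arrays use identity `equals`/`hashCode`; HBase code usually avoids this with `new TreeMap<byte[], ...>(Bytes.BYTES_COMPARATOR)`. The sketch below uses the JDK's `Arrays::compareUnsigned` (Java 9+) as a dependency-free stand-in, and also shows the default-to-0 lookup mentioned in review for regions that split or merged after the sizes were collected. `RegionSizeMap` and its methods are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical region-size lookup keyed by raw region names.
final class RegionSizeMap {
  // Compares key contents (unsigned lexicographic), unlike byte[].equals/hashCode.
  private final Map<byte[], Long> sizes = new TreeMap<byte[], Long>(Arrays::compareUnsigned);

  void put(byte[] regionName, long sizeBytes) {
    sizes.put(regionName, sizeBytes);
  }

  // Regions unknown to the map (e.g. split or merged meanwhile) report 0.
  long getRegionSize(byte[] regionName) {
    return sizes.getOrDefault(regionName, 0L);
  }

  // Demonstrates the pitfall: a HashMap does not find an equal-content copy of the key.
  static boolean hashMapFindsEqualCopy() {
    Map<byte[], Long> naive = new HashMap<>();
    naive.put(new byte[] {1, 2, 3}, 1L);
    return naive.containsKey(new byte[] {1, 2, 3});
  }
}
```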
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13889593#comment-13889593 ] Lukas Nalezenec commented on HBASE-10413: - I made big changes in the code. You can check and discuss them in https://github.com/apache/hbase/pull/8/files . I have to write unit tests before making the patch.
- I need help with the unit test. Is there a simple unit test helper/utility I can use? I need to create a table with some regions and then work with their sizes. It should be local, and there should be some level of abstraction.
- I have added a configuration option for disabling this feature. Is there a policy about new configuration options? Should I move the configuration key constant somewhere else? Should the feature be disabled or enabled by default?
- Computation of region sizes might be slow. We might need some parallelization.
From mail:
> + public void setLength(long length) {
> This method in TableSplit can be package private.
I think a lot of people use TableSplit in their custom InputFormat. IMHO this method should be part of the API.
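The sketch below covers two of the points above under simplifying assumptions. SketchSplit, ParallelSizer and the sizeOf callback are hypothetical names, not HBase API. The split exposes a public setLength so that a custom InputFormat could fill it in. Region sizes are computed on a fixed thread pool, since computing them one region at a time might be slow.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.ToLongFunction;

// Hypothetical, simplified stand-in for TableSplit: only the fields this sketch needs.
class SketchSplit {
  private final String regionName;
  private long length; // estimated size in bytes; 0 means "unknown"

  SketchSplit(String regionName) { this.regionName = regionName; }

  String getRegionName() { return regionName; }

  public long getLength() { return length; }

  // Public rather than package private, so custom InputFormats can set it as well.
  public void setLength(long length) { this.length = length; }
}

class ParallelSizer {
  /**
   * Computes every split's length concurrently. sizeOf is a hypothetical callback
   * returning the on-disk size of one region (e.g. a sum of HDFS file sizes).
   */
  static void fillLengths(List<SketchSplit> splits, ToLongFunction<String> sizeOf, int threads)
      throws Exception {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      List<Future<Long>> futures = new ArrayList<>();
      for (SketchSplit s : splits) {
        Callable<Long> task = () -> sizeOf.applyAsLong(s.getRegionName());
        futures.add(pool.submit(task));
      }
      // Results are read back in submission order, so future i belongs to split i.
      for (int i = 0; i < splits.size(); i++) {
        splits.get(i).setLength(futures.get(i).get());
      }
    } finally {
      pool.shutdown();
    }
  }
}
```

A fixed pool caps how many size lookups run at once. The thread count would likely need to be configurable, like the on/off switch mentioned above.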
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13887817#comment-13887817 ] Lukas Nalezenec commented on HBASE-10413: - first draft: https://github.com/lukasnalezenec/hbase/commit/bf560b3c19b15cefb114132ac86664ffc44dad32
[jira] [Commented] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13882737#comment-13882737 ] Lukas Nalezenec commented on HBASE-10413: - I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0. Anyway, we are interested in fixing this. See the updated ticket description.
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Description:
InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region.
Can we implement this method? What is the best way?
We were thinking about estimating the size from the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow and the column families to get the corresponding region, and then compute the HDFS size for the given region and column family.
Update: This ticket was about a production issue. I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0.

was:
InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region.
Can we implement this method? What is the best way?
We were thinking about estimating the size from the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow and the column families to get the corresponding region, and then compute the HDFS size for the given region and column family.
Update: This ticket talked about a production issue. I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0.
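The scheduling problem in the description can be illustrated with a toy model. It assumes each mapper's run time is proportional to its split's size, and that free map slots take splits in list order. SplitScheduling is a hypothetical helper, not Hadoop code. It compares when the last mapper finishes for the order the splits happen to arrive in versus largest-first order.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

class SplitScheduling {
  /**
   * Greedy list scheduling: each split (given by its size) goes to the slot that
   * becomes free first. Returns the time at which the last slot finishes.
   */
  static long makespan(List<Long> sizesInOrder, int slots) {
    PriorityQueue<Long> finishTimes = new PriorityQueue<>();
    for (int i = 0; i < slots; i++) {
      finishTimes.add(0L);
    }
    for (long size : sizesInOrder) {
      long free = finishTimes.poll(); // earliest-available slot
      finishTimes.add(free + size);
    }
    long last = 0;
    for (long t : finishTimes) {
      last = Math.max(last, t);
    }
    return last;
  }

  /** Returns a copy of the sizes ordered largest first. */
  static List<Long> largestFirst(List<Long> sizes) {
    List<Long> sorted = new ArrayList<>(sizes);
    sorted.sort(Comparator.reverseOrder());
    return sorted;
  }
}
```

Take two slots and sizes 1, 1, 1, 1, 4. In the given order the last mapper finishes at 6; in largest-first order it finishes at 4, because the large region starts first instead of last. When every getLength() returns 0, there is nothing to sort on, so the large region can end up at the tail.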
[jira] [Updated] (HBASE-10413) Tablesplit.getLength returns 0
[ https://issues.apache.org/jira/browse/HBASE-10413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lukas Nalezenec updated HBASE-10413:
Description:
InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
This is causing us problems with scheduling: we have jobs that are supposed to finish in a limited time, but they often get stuck in the last mapper working on a large region.
Can we implement this method? What is the best way?
We were thinking about estimating the size from the size of the files on HDFS. We would like to get a Scanner from the TableSplit, use startRow, stopRow and the column families to get the corresponding region, and then compute the HDFS size for the given region and column family.
Update: This ticket talked about a production issue. I talked with the guy who worked on this, and he said our production issue was probably not directly caused by getLength() returning 0.

was:
We had a serious issue in our production today.
InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
Can we implement this method? What is the best way?

Summary: Tablesplit.getLength returns 0 (was: TableSplits are not sorted by size.)
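The sketch below is one possible reading of the HDFS-size estimate proposed in the description. It assumes each split covers exactly one region, so the region can be found from startRow alone. It also assumes per-region, per-family store sizes have already been collected somewhere. RegionSizeEstimator and its in-memory map are hypothetical, and String row keys stand in for HBase's byte[] keys.

```java
import java.util.Collection;
import java.util.Map;
import java.util.TreeMap;

class RegionSizeEstimator {
  // Hypothetical snapshot: region start key -> (column family -> bytes on HDFS).
  // The first region's start key is "", so every row maps to some region.
  private final TreeMap<String, Map<String, Long>> regionsByStartKey = new TreeMap<>();

  void addRegion(String startKey, Map<String, Long> familySizes) {
    regionsByStartKey.put(startKey, familySizes);
  }

  /** Finds the region holding startRow and sums the sizes of the requested families. */
  long estimate(String startRow, Collection<String> families) {
    // floorEntry: the region with the greatest start key <= startRow.
    Map.Entry<String, Map<String, Long>> region = regionsByStartKey.floorEntry(startRow);
    if (region == null) {
      return 0L; // no matching region: fall back to "unknown"
    }
    long total = 0;
    for (String family : families) {
      total += region.getValue().getOrDefault(family, 0L);
    }
    return total;
  }
}
```

Restricting the sum to the scanned column families matters when a job reads only a small family of a wide table. Summing whole regions would overstate the work.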
[jira] [Created] (HBASE-10413) TableSplits are not sorted by size.
Lukas Nalezenec created HBASE-10413:
---
Summary: TableSplits are not sorted by size.
Key: HBASE-10413
URL: https://issues.apache.org/jira/browse/HBASE-10413
Project: HBase
Issue Type: Bug
Components: Client
Affects Versions: 0.96.1.1
Reporter: Lukas Nalezenec

We had a serious issue in our production today.
InputSplits should be sorted by length, but TableSplit does not contain a real getLength implementation:
@Override
public long getLength() {
  // Not clear how to obtain this... seems to be used only for sorting splits
  return 0;
}
Can we implement this method? What is the best way?
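A small example shows why a constant getLength() defeats sorting. It assumes splits are ordered largest-first by a stable sort; java.util.List.sort on an ArrayList is documented as stable. ZeroLengthSort and its Split class are hypothetical stand-ins. When every split reports 0, all comparisons are ties and the order comes back unchanged. With real lengths, the largest region moves to the front.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ZeroLengthSort {
  // Minimal stand-in for an input split: a name and the length it reports.
  static final class Split {
    final String name;
    final long length;

    Split(String name, long length) {
      this.name = name;
      this.length = length;
    }
  }

  /** Sorts largest-first by reported length; ties keep their original order. */
  static List<String> sortedNames(List<Split> splits) {
    List<Split> copy = new ArrayList<>(splits);
    copy.sort(Comparator.comparingLong((Split s) -> s.length).reversed());
    List<String> names = new ArrayList<>();
    for (Split s : copy) {
      names.add(s.name);
    }
    return names;
  }
}
```

With lengths 0, 0, 0 the names stay in region order. With lengths 10, 500, 40 the 500-byte split is scheduled first.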