[jira] [Commented] (HBASE-13890) Get/Scan from MemStore only (Client API)
[ https://issues.apache.org/jira/browse/HBASE-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616234#comment-14616234 ] ramkrishna.s.vasudevan commented on HBASE-13890: bq.Yes, of course, but I think this should be an additional hint/attribute of an operation. Having an additional hint or option would be better than another RPC call. bq.Data can be partial So is the result getting marked as partial? Get/Scan from MemStore only (Client API) Key: HBASE-13890 URL: https://issues.apache.org/jira/browse/HBASE-13890 Project: HBase Issue Type: New Feature Components: API, Client, Scanners Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-13890-v1.patch This is a short-circuit read for get/scan when the recent data (version) of a cell can be found only in the MemStore (with very high probability). Good examples are atomic counters and appends. This feature will allow get/scan to bypass store file scanners completely and improve performance and latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-13858) RS/MasterDumpServlet dumps threads before its “Stacks” header
[ https://issues.apache.org/jira/browse/HBASE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunhaitao reassigned HBASE-13858: - Assignee: sunhaitao RS/MasterDumpServlet dumps threads before its “Stacks” header - Key: HBASE-13858 URL: https://issues.apache.org/jira/browse/HBASE-13858 Project: HBase Issue Type: Bug Components: master, regionserver, UI Affects Versions: 1.1.0 Reporter: Lars George Assignee: sunhaitao Priority: Trivial Labels: beginner Fix For: 2.0.0, 1.3.0 The stack traces are captured using a Hadoop helper method, and its output is then merged with the current output. I presume a simple flush is missing after outputting the “Stacks” header, before the captured output is dumped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
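The presumed fix above is a one-line flush: the servlet writes through a buffered PrintWriter while the Hadoop helper writes to the raw stream underneath it, so the header must be flushed before the helper runs. A minimal pure-Java sketch of that ordering problem (the class and method names are illustrative, not the servlet's actual code):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintWriter;

public class FlushOrdering {
    // Simulates the dump servlet: a buffered PrintWriter layered over the
    // raw stream that a helper (here, rawDump) writes to directly.
    public static String dump(boolean flushHeader) {
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(raw);
        out.println("Stacks:");
        if (flushHeader) {
            out.flush(); // without this, the header stays buffered...
        }
        rawDump(raw);    // ...and the helper's output reaches the stream first
        out.flush();
        return raw.toString();
    }

    // Stand-in for the Hadoop helper that dumps thread stacks to the raw stream.
    static void rawDump(ByteArrayOutputStream raw) {
        byte[] threads = "thread-dump...\n".getBytes();
        raw.write(threads, 0, threads.length);
    }

    public static void main(String[] args) {
        System.out.print(dump(true));
    }
}
```

Without the flush, the thread dump lands in the output before the buffered "Stacks:" header, which is exactly the symptom reported.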
[jira] [Commented] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616959#comment-14616959 ] Gaurav Bhardwaj commented on HBASE-13867: - Patch corrected. Add endpoint coprocessor guide to HBase book Key: HBASE-13867 URL: https://issues.apache.org/jira/browse/HBASE-13867 Project: HBase Issue Type: Task Components: Coprocessors, documentation Reporter: Vladimir Rodionov Assignee: Gaurav Bhardwaj Attachments: HBASE-13867.1.patch Endpoint coprocessors are very poorly documented. The Coprocessor section of the HBase book must be updated either with its own endpoint coprocessor HOW-TO guide or, at least, with link(s) to some other guides. There is a good description here: http://www.3pillarglobal.com/insights/hbase-coprocessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616987#comment-14616987 ] Josh Elser commented on HBASE-13561: If you didn't get to it: I deleted {{\x00\x02\x91\x1E\xA5U\x97\xC9x\xA0\xAE\xCD\xED*C\x92}}, then re-{{Verify}}ied. {noformat} org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts REFERENCED=3998 UNDEFINED=1 UNREFERENCED=1 undef \x00\x02\x91\x1E\xA5U\x97\xC9x\xA0\xAE\xCD\xED*C\x92=1 unref \xD7\xFE{~0\x1C\x91#\xA5\xE1\x01T\xA7UwY=1 2015-07-07 12:52:12,362 ERROR [main] test.IntegrationTestBigLinkedList$Verify: Found an undefined node. Undefined count=1 % echo $? 1 {noformat} I did notice that I could update the usage for ITBLL to be a bit more accurate after these changes. Will put up a v2 shortly. ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-v1.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whoever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
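The counter evaluation the patch adds boils down to: fail the run if any UNDEFINED or UNREFERENCED nodes were counted, and (when known) if REFERENCED doesn't match the expected total. An illustrative pure-Java sketch of that check, with counter names mirroring the Counts output above (the method itself is not ITBLL's actual code):

```java
import java.util.Map;

public class CountsCheck {
    // Sketch of Verify#verify-style counter evaluation: any dangling
    // (UNDEFINED) or orphaned (UNREFERENCED) node fails the job, as does
    // a REFERENCED count that disagrees with the expected total.
    public static boolean verify(Map<String, Long> counts, long expectedReferenced) {
        long undefined = counts.getOrDefault("UNDEFINED", 0L);
        long unreferenced = counts.getOrDefault("UNREFERENCED", 0L);
        long referenced = counts.getOrDefault("REFERENCED", 0L);
        if (undefined > 0 || unreferenced > 0) {
            return false; // broken links in the list: report failure
        }
        return referenced == expectedReferenced;
    }
}
```

Applied to the run above (REFERENCED=3998, UNDEFINED=1, UNREFERENCED=1), this returns false, matching the nonzero exit code shown.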
[jira] [Commented] (HBASE-13415) Procedure V2 - Use nonces for double submits from client
[ https://issues.apache.org/jira/browse/HBASE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616904#comment-14616904 ] Matteo Bertozzi commented on HBASE-13415: - I think it just needs another round with the fixes pointed out on reviewboard, nothing too big. If [~syuanjiang] does not have time I may be able to complete it. Procedure V2 - Use nonces for double submits from client Key: HBASE-13415 URL: https://issues.apache.org/jira/browse/HBASE-13415 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 The client can submit a procedure, but before getting the procId back, the master might fail. In this case, the client request will fail and the client will re-submit the request. With a 1.1 client, or if there is no contention for the table lock, the time window is pretty small, but it still might happen. If the proc was accepted and stored in the procedure store, a re-submit from the client will add another procedure, which will execute after the first one. The first one will likely succeed, and the second one will fail (for example, in the case of create table, the second one will throw TableExistsException). One idea is to use client-generated nonces (that we already have) to guard against these cases. The client will submit the request with the nonce and the nonce will be saved together with the procedure in the store. In case of a double submit, the nonce-cache is checked and the procId of the original request is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
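The guard described above amounts to a nonce-to-procId cache consulted atomically before a new procedure is created: a resubmitted request carrying the same client nonce gets the original procId back instead of spawning a second procedure. A minimal pure-Java model of that idea (names are made up, not the Procedure V2 code):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class NonceCache {
    // Maps a client-generated nonce to the procId of the first submit
    // that carried it; in the real design this mapping is also persisted
    // in the procedure store so it survives master failover.
    private final ConcurrentHashMap<Long, Long> nonceToProcId = new ConcurrentHashMap<>();
    private final AtomicLong procIdGen = new AtomicLong();

    public long submit(long nonce) {
        // computeIfAbsent makes check-and-register atomic, so two
        // concurrent resubmits cannot both create a procedure.
        return nonceToProcId.computeIfAbsent(nonce, n -> procIdGen.incrementAndGet());
    }
}
```

A double submit with the same nonce then returns the same procId, so the client's retry observes the original procedure rather than a TableExistsException from a duplicate.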
[jira] [Updated] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Bhardwaj updated HBASE-13867: Attachment: HBASE-13867.1.patch Uploading correct patch. Add endpoint coprocessor guide to HBase book Key: HBASE-13867 URL: https://issues.apache.org/jira/browse/HBASE-13867 Project: HBase Issue Type: Task Components: Coprocessors, documentation Reporter: Vladimir Rodionov Assignee: Gaurav Bhardwaj Attachments: HBASE-13867.1.patch Endpoint coprocessors are very poorly documented. Coprocessor section of HBase book must be updated either with its own endpoint coprocessors HOW-TO guide or, at least, with the link(s) to some other guides. There is good description here: http://www.3pillarglobal.com/insights/hbase-coprocessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-8642) [Snapshot] List and delete snapshot by table
[ https://issues.apache.org/jira/browse/HBASE-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617007#comment-14617007 ] Ashish Singhi commented on HBASE-8642: -- Ping for more reviews! [Snapshot] List and delete snapshot by table Key: HBASE-8642 URL: https://issues.apache.org/jira/browse/HBASE-8642 Project: HBase Issue Type: Improvement Components: snapshots Affects Versions: 0.98.0, 0.95.0, 0.95.1, 0.95.2 Reporter: Julian Zhou Assignee: Ashish Singhi Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: 8642-trunk-0.95-v0.patch, 8642-trunk-0.95-v1.patch, 8642-trunk-0.95-v2.patch, HBASE-8642-0.98.patch, HBASE-8642-v1.patch, HBASE-8642-v2.patch, HBASE-8642-v3.patch, HBASE-8642-v4.patch, HBASE-8642.patch Support listing and deleting snapshots by table name. User scenario: A user wants to delete all the snapshots which were taken in January for a table 't' where snapshot names start with 'Jan'. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
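The user scenario above is essentially a filter over (table, snapshot-name) pairs. A hedged pure-Java sketch of that selection step (the real admin API works against SnapshotDescription objects; this stand-in class is purely illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SnapshotFilter {
    // Minimal stand-in for a snapshot descriptor.
    public static class Snap {
        public final String table;
        public final String name;
        public Snap(String table, String name) { this.table = table; this.name = name; }
    }

    // Select the snapshots of a given table whose name matches a regex,
    // e.g. table "t" with pattern "Jan.*" for the scenario above;
    // a delete-by-table operation would then just iterate this list.
    public static List<Snap> select(List<Snap> all, String table, String nameRegex) {
        List<Snap> out = new ArrayList<>();
        for (Snap s : all) {
            if (s.table.equals(table) && s.name.matches(nameRegex)) {
                out.add(s);
            }
        }
        return out;
    }
}
```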
[jira] [Updated] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gaurav Bhardwaj updated HBASE-13867: Status: Patch Available (was: In Progress) Please use patch [HBASE-13867.1.patch|https://issues.apache.org/jira/secure/attachment/12743985/HBASE-13867.1.patch] Add endpoint coprocessor guide to HBase book Key: HBASE-13867 URL: https://issues.apache.org/jira/browse/HBASE-13867 Project: HBase Issue Type: Task Components: Coprocessors, documentation Reporter: Vladimir Rodionov Assignee: Gaurav Bhardwaj Attachments: HBASE-13867.1.patch Endpoint coprocessors are very poorly documented. Coprocessor section of HBase book must be updated either with its own endpoint coprocessors HOW-TO guide or, at least, with the link(s) to some other guides. There is good description here: http://www.3pillarglobal.com/insights/hbase-coprocessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13415) Procedure V2 - Use nonces for double submits from client
[ https://issues.apache.org/jira/browse/HBASE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616972#comment-14616972 ] Stephen Yuan Jiang commented on HBASE-13415: [~busbey] I should be able to complete it soon. Procedure V2 - Use nonces for double submits from client Key: HBASE-13415 URL: https://issues.apache.org/jira/browse/HBASE-13415 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 The client can submit a procedure, but before getting the procId back, the master might fail. In this case, the client request will fail and the client will re-submit the request. With a 1.1 client, or if there is no contention for the table lock, the time window is pretty small, but it still might happen. If the proc was accepted and stored in the procedure store, a re-submit from the client will add another procedure, which will execute after the first one. The first one will likely succeed, and the second one will fail (for example, in the case of create table, the second one will throw TableExistsException). One idea is to use client-generated nonces (that we already have) to guard against these cases. The client will submit the request with the nonce and the nonce will be saved together with the procedure in the store. In case of a double submit, the nonce-cache is checked and the procId of the original request is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616958#comment-14616958 ] Hadoop QA commented on HBASE-13387: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743962/HBASE-13387_v2.patch against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743962 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 18 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:red}-1 javac{color}. The applied patch generated 20 javac compiler warnings (more than the master's current 16 warnings). {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 1 warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1899 checkstyle errors (more than the master's current 1898 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14691//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14691//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14691//artifact/patchprocess/checkstyle-aggregate.html Javadoc warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14691//artifact/patchprocess/patchJavadocWarnings.txt Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14691//console This message is automatically generated. Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: ByteBufferedCell.docx, HBASE-13387_v1.patch, HBASE-13387_v2.patch, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch, benchmark.zip This came up in the discussion about the parent Jira, and recently Stack added it as a comment on the E2E patch on the parent Jira. The idea is to add a new interface 'ByteBufferedCell' in which we can add new buffer-based getter APIs and getters for the position of components in the BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an object implementing this new interface. The comparators have to be aware of this new Cell extension and have to use the BB-based APIs rather than getXXXArray(). Also give util APIs in CellUtil to abstract the checks for the new Cell type (like matchingXXX APIs, getValueAstype APIs, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
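The interface-extension idea in the description can be sketched in a few lines: an extension of Cell adds ByteBuffer-based getters, and comparison code dispatches on the extension type instead of always calling the byte[] getter. This is a hypothetical rendering with made-up names, not HBase's actual ByteBufferedCell API:

```java
import java.nio.ByteBuffer;

public class ByteBufferedCellSketch {
    // Simplified stand-in for the Cell interface: only the value accessor.
    public interface Cell {
        byte[] getValueArray();
    }

    // The proposed extension: BB-based getters plus the position of the
    // value component inside the buffer.
    public interface ByteBufferedCell extends Cell {
        ByteBuffer getValueByteBuffer();
        int getValuePosition();
        int getValueLength();
    }

    // Comparator-style helper that is "aware of the new Cell extension":
    // it reads through the ByteBuffer when available, avoiding a copy.
    public static int compareValues(Cell a, byte[] value) {
        if (a instanceof ByteBufferedCell) {
            ByteBufferedCell bb = (ByteBufferedCell) a;
            ByteBuffer buf = bb.getValueByteBuffer();
            int len = Math.min(bb.getValueLength(), value.length);
            for (int i = 0; i < len; i++) {
                int d = (buf.get(bb.getValuePosition() + i) & 0xff) - (value[i] & 0xff);
                if (d != 0) return d;
            }
            return bb.getValueLength() - value.length;
        }
        // on-heap path: fall back to the byte[] getter
        byte[] v = a.getValueArray();
        int len = Math.min(v.length, value.length);
        for (int i = 0; i < len; i++) {
            int d = (v[i] & 0xff) - (value[i] & 0xff);
            if (d != 0) return d;
        }
        return v.length - value.length;
    }

    // Convenience factory wrapping an on-heap array as a buffer-backed cell.
    public static ByteBufferedCell wrap(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        return new ByteBufferedCell() {
            public byte[] getValueArray() { return bytes; }
            public ByteBuffer getValueByteBuffer() { return buf; }
            public int getValuePosition() { return 0; }
            public int getValueLength() { return bytes.length; }
        };
    }
}
```

The instanceof dispatch shown here is the kind of check the description proposes hiding behind CellUtil helper APIs.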
[jira] [Updated] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-7912: - Attachment: HBaseBackupRestore-Jira-7912-v6.pdf Updated version of the design document. Added a section for KV deduplication and some other stuff. HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share it with the community through this jira. We are leveraging the existing hbase snapshot feature and provide a general solution for common users. Our full backup uses snapshot to capture metadata locally and exportsnapshot to move data to another cluster; the incremental backup uses an offline WALPlayer to back up HLogs; we also leverage distributed log roll and flush to improve performance; plus other added value such as convert, merge, progress report, and CLI commands, so that a common user can back up hbase data without in-depth knowledge of hbase. Our solution also contains some usability features for enterprise users. The detailed design document and CLI commands will be attached to this jira.
We plan to use 10~12 subtasks to share each of the following features, and document the detailed implementation in the subtasks: * *Full Backup* : provide local and remote backup/restore for a list of tables * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup) * *distributed* Logroll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into hfiles * *Merge* several backup images into one (like merging weekly into monthly) * *add and remove* tables to and from a Backup image * *Cancel* a backup process * backup progress *status* * full backup based on *existing snapshot* *-* *Below is the original description, kept here as the history for the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. A common practice of backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data. Combining full backups with incremental backups has tremendous benefits for HBase as well. The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. # The user schedules periodic incremental backups to capture the changes from the full backup, or from the last incremental backup. # The user needs to restore table data to a past point in time. # The full backup is restored to the table(s) or to different table name(s).
Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshot to capture HFiles. * Use HBase WALs to capture incremental changes, but we use bulk load of HFiles for fast incremental restore. * Support single table or a set of tables, and column family level backup and restore. * Restore to different table names. * Support adding additional tables or CF to backup set without interruption of incremental backup schedule. * Support rollup/combining of incremental backups into longer period and bigger incremental backups. * Unified command line interface for all the above. The solution will support HBase backup to FileSystem, either on the same cluster or across clusters. It has the flexibility to support backup to other devices and servers in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
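The point-in-time restore flow described above (restore the full backup, then apply incrementals up to the target time) can be sketched as a simple timestamp selection. This is an illustrative pure-Java model under assumed names, not the proposed implementation:

```java
import java.util.ArrayList;
import java.util.List;

public class RestorePlan {
    // Given the timestamp of the restored full backup, a sorted list of
    // incremental-backup timestamps, and the desired target time T,
    // return the incrementals to apply on top of the full backup, in order.
    public static List<Long> incrementalsToApply(long fullBackupTs,
                                                 List<Long> incrementalTs,
                                                 long targetTs) {
        List<Long> plan = new ArrayList<>();
        for (long ts : incrementalTs) {          // assumed sorted ascending
            if (ts > fullBackupTs && ts <= targetTs) {
                plan.add(ts);                    // taken after the full backup, before T
            }
        }
        return plan;
    }
}
```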
[jira] [Commented] (HBASE-13337) Table regions are not assigning back, after restarting all regionservers at once.
[ https://issues.apache.org/jira/browse/HBASE-13337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14617056#comment-14617056 ] Lars Hofhansl commented on HBASE-13337: --- v3 looks reasonable to me. At the very least it won't hurt :) Nice find. +1 Table regions are not assigning back, after restarting all regionservers at once. - Key: HBASE-13337 URL: https://issues.apache.org/jira/browse/HBASE-13337 Project: HBase Issue Type: Bug Components: Region Assignment Affects Versions: 2.0.0 Reporter: Y. SREENIVASULU REDDY Assignee: Samir Ahmic Priority: Blocker Fix For: 2.0.0 Attachments: HBASE-13337-v2.patch, HBASE-13337-v3.patch, HBASE-13337.patch Regions of the table are continuously in state=FAILED_CLOSE. {noformat} RegionState RIT time (ms) 8f62e819b356736053e06240f7f7c6fd t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113929 caf59209ae65ea80fca6bdc6996a7d68 t1,,1427362431330.caf59209ae65ea80fca6bdc6996a7d68. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM2,16040,1427362533691 113929 db52a74988f71e5cf257bbabf31f26f3 t1,,1427362431330.db52a74988f71e5cf257bbabf31f26f3. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM3,16040,1427362533691 113920 43f3a65b9f9ff283f598c5450feab1f8 t1,,1427362431330.43f3a65b9f9ff283f598c5450feab1f8. state=FAILED_CLOSE, ts=Thu Mar 26 15:05:36 IST 2015 (113s ago), server=VM1,16040,1427362531818 113920 {noformat} *Steps to reproduce:* 1. Start an HBase cluster with more than one regionserver. 2. Create a table with precreated regions (let's say 15 regions). 3. Make sure the regions are well balanced. 4. Restart all the RegionServer processes at once across the cluster, except the HMaster process. 5. After restarting, the RegionServers successfully connect to the HMaster. *Bug:* But no regions are assigned back to the RegionServers.
*Master log shows as follows:* {noformat} 2015-03-26 15:05:36,201 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=OFFLINE, ts=1427362536106, server=VM2,16040,1427362242602} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,202 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. with state=PENDING_OPENsn=VM1,16040,1427362531818 2015-03-26 15:05:36,244 DEBUG [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Force region state offline {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStates: Transition {8f62e819b356736053e06240f7f7c6fd state=PENDING_OPEN, ts=1427362536201, server=VM1,16040,1427362531818} to {8f62e819b356736053e06240f7f7c6fd state=PENDING_CLOSE, ts=1427362536244, server=VM1,16040,1427362531818} 2015-03-26 15:05:36,244 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.RegionStateStore: Updating row t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd. 
with state=PENDING_CLOSE 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=1 of 10 2015-03-26 15:05:36,248 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=2 of 10 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd., try=3 of 10 2015-03-26 15:05:36,249 INFO [VM2,16020,1427362216887-GeneralBulkAssigner-0] master.AssignmentManager: Server VM1,16040,1427362531818 returned java.nio.channels.ClosedChannelException for t1,,1427362431330.8f62e819b356736053e06240f7f7c6fd.,
[jira] [Updated] (HBASE-13897) OOM may occur when Import imports a row with too many KeyValues
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13897: --- Hadoop Flags: Reviewed Fix Version/s: 1.3.0 2.0.0 OOM may occur when Import imports a row with too many KeyValues --- Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Liu Junhong Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: 13897-v2.txt, HBASE-13897-0.98.patch, HBASE-13897-master-20150629.patch, HBASE-13897-master-20150630.patch, HBASE-13897-master-20150707.patch, HBASE-13897-master.patch When importing a row with too many KeyValues (it may have too many columns or versions), KeyValueReducer may incur an OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
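The OOM comes from accumulating every KeyValue of a very wide row in memory before emitting. One common remedy, sketched here as a hedged pure-Java model (the class is made up; the real patch operates inside Import's reducer), is to flush in fixed-size batches so memory stays bounded by the batch size rather than the row width:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchedRowWriter {
    // Instead of buffering all cells of a row, flush every batchSize cells.
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();
    public int flushes = 0; // visible for the usage check below

    public BatchedRowWriter(int batchSize) {
        this.batchSize = batchSize;
    }

    public void write(String cell) {
        buffer.add(cell);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    public void flush() {
        if (!buffer.isEmpty()) {
            flushes++;      // stand-in for emitting the batch downstream
            buffer.clear(); // memory bounded by batchSize, not row width
        }
    }
}
```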
[jira] [Updated] (HBASE-14027) Clean up netty dependencies
[ https://issues.apache.org/jira/browse/HBASE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-14027: Attachment: HBASE-14027.3.patch -03 * adds a test dependency for hbase-it so that integration tests run via maven will work * unifies the netty 3.x version used in tests between hbase-server and hbase-it Given the above, we need to test that keeping the jar out of the assembly doesn't prevent the ITs from running on a deployed cluster. Clean up netty dependencies --- Key: HBASE-14027 URL: https://issues.apache.org/jira/browse/HBASE-14027 Project: HBase Issue Type: Improvement Components: build Affects Versions: 1.0.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14027.1.patch, HBASE-14027.2.patch, HBASE-14027.3.patch We have multiple copies of Netty (3?) getting shipped around. Clean some up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617050#comment-14617050 ] stack commented on HBASE-13561: --- I tried it... v1. Looks good. {code} ... org.apache.hadoop.hbase.test.IntegrationTestBigLinkedList$Verify$Counts REFERENCED=499395826 UNDEFINED=302087 UNREFERENCED=302087 File Input Format Counters Bytes Read=0 File Output Format Counters Bytes Written=20746296 undef \x00\x00$\xA6\x96W\x9F\xC5\x81\x83r\x5Co[;3=1 \x00\x005\xBC\x06\x0D\xE4'\xDD\xA6l\xA0\xB1c6~=1 \x00\x007\xB3\x0D^\x11\xF2\xC4\xD74\xAA\xC5!\xA8o=1 \x00\x008 \xD3\xD6Z\xCD\xD0\xBC\x9C\xE7\x1F\xEE\x11.=1 \x00\x00=+\xD2\xB4\x91h\xCFJ8`\xF8\x82\xA5\xE7=1 \x00\x00S\xE1\xD5\xC5n\xB9Y\xA3\xB8\xB9`\xA1\xCF\xB9=1 \x00\x00\x0D\xC3)\xCB\x85@t\x0E\x8EZ\xBAy6\x8E=1 \x00\x00\x16\xFF\x8E\x94\x0F\xFC\x13\xC1m~\xB9\xA8!\x85=1 \x00\x00\x82@Z\xB1N\x1B\xA05\xFB\xBC\xDB\xD0\x0D\x04=1 \x00\x00\x96z\xB9\x18\xE5\x9B\xC7\x14\xB1\xA6\x0Bf\x1F\xE7=1 \x00\x00\x98*U(\x8Fqi\x04\xD8A\x13\x0E8j=1 \x00\x00\xA3\xD8\x0F\x02\x13\x06n5\xD45.Y\xB3\x81=1 \x00\x00\xC4ItJ\x0BX\x9F\x8A\x0D\xB5\xDDn\xAE=1 \x00\x00\xEB\xA7\x902X\xB0\xDD\xE1\x17\x83\xAD\x0C\xD0\x9F=1 \x00\x00a\xF2k6\xBC;\xDD)5\xB2\xAD\xA7\xBA(=1 \x00\x00g\xBB\xF5\xD2\xBE\x9Dm\xE1L\x8F\xB1\xAB/=1 \x00\x00{\x8E\x12\xE0\x1Des!\xF3I\xC7}Zn=1 \x00\x00}k\x89\xF8b\x970\xC0\x07Xu\xAF\xDA\xC5=1 \x00\x01+\xBE+\x10/\x87\xA4\xB5\xF8aEDdU=1 \x00\x01.8\x9FiBj\xD3\x8E6e\xCF\xEC\xF8\xC9=1 \x00\x010F\xCB\x0B \xA3\x07\x0B\x8D^X\xC7\x5C\x5C=1 \x00\x016\xB0\x17\xD9\xE2\xF6S\xE9v\xB4u\xDD\xBF}=1 \x00\x01?\xB3\xB8\x88\x1A\xF4\xA4\xAF\xFA!\xA8\xA1\x93\x8B=1 \x00\x01J{tXz\x92\xDAI1\x96\x98E\x0E\x97=1 \x00\x01X\xB1[C]0\xEAP\x90\xDF\xBE\xD87\xBD=1 \x00\x01\x07\x93\x88/c3h\xA2i\xBAs\xB8\xB9\x5C=1 \x00\x01\x8C\xAB\xED+\x95\xD4\x07\x178\xA4m2\xCE*=1 \x00\x01\xAFGX\x0C\xFBi\xEB\xA4\xCB\x0D\x9B\xA3=1 \x00\x01\xD5i\xF3\x95\x8Bn\xFEx{\xEC\x13\xFE\xE5\xBB=1 \x00\x01\xDC\xF1\xE3FXZ\xE9\x00\xB4i.\x01\xFD\xC1=1 
\x00\x01\xE5?\xB8eB\xDEM\x01\x90\xF3\xC8\x04\xB0=1 \x00\x01\xF0\xD1\x14\x1DK\xAB\xE6\xADZ\xBC\xE5y\x12o=1 \x00\x01c\xD1`\x00\x871qK \xB0\x88z;\x86=1 \x00\x02 \x1C\x0E\xF3\xBD\xCDSb3\xEB\x8E\xA7\xFAs=1 \x00\x020\xFA\x1F4\xAD\xA2K\xF7\xC2\xF5\xD9=\x86\x84=1 \x00\x02\x09\x00xb\x06\x0B\xFB\x89\xC9\xDF\xEB9\xB7\xC7=1 \x00\x02\xFE\xC6o\x91z\x85\xA6\xC1\xA2\xFDH\x05EK=1 \x00\x02{\x1F\xD9{5\x06\x06H\xC5ql\xB0\x93\xF8=1 \x00\x03`\xC0\xD1\xA1)\x8B\x18\x99=|\xCAk\x88\x88=1 \x00\x03es\xA0\xC9h\xEEd\xCFL\xDFB\x9A\x92C=1 unref \x00\x00\x87\xFE\xFA\xFF`\xD7\x8B\x0D#\xD9\xE2\xEFy\x89=1 \x00\x00\xC5_\x9F\xFC\xBB\x969\xBE%\x89\xAB\xC7\x94W=1 \x00\x00\xEF\xB0\xFC\xFD\x025$\xF9\x14\xC48\xA95\x8A=1 \x00\x00wb?\xA1=\xEA\xDC\x19\xBD\xD6\xEC\x09\xEE=1 \x00\x01RK\x86\x18|0\xB8\xE3\xA2C\xA1\x07\xA4\x0C=1 15/07/07 10:26:21 ERROR test.IntegrationTestBigLinkedList$Verify: Found an undefined node. Undefined count=302087 [stack@c2020 ~]$ [stack@c2020 ~]$ [stack@c2020 ~]$ [stack@c2020 ~]$ [stack@c2020 ~]$ [stack@c2020 ~]$ [stack@c2020 ~]$ [stack@c2020 ~]$ echo $? 1 {code} Let me commit [~elserj] ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, it's not even checking to see if there things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just
[jira] [Commented] (HBASE-14024) ImportTsv is not loading hbase-default.xml
[ https://issues.apache.org/jira/browse/HBASE-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14616994#comment-14616994 ] Ashish Singhi commented on HBASE-14024: --- Ping for reviews! ImportTsv is not loading hbase-default.xml -- Key: HBASE-14024 URL: https://issues.apache.org/jira/browse/HBASE-14024 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 2.0.0 Reporter: Ashish Singhi Assignee: Ashish Singhi Priority: Critical Fix For: 2.0.0 Attachments: HBASE-14024.patch ImportTsv job is failing with the below exception {noformat} Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from a null string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:123) at org.apache.hadoop.fs.Path.<init>(Path.java:135) at org.apache.hadoop.fs.Path.<init>(Path.java:89) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configurePartitioner(HFileOutputFormat2.java:591) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:441) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:406) at org.apache.hadoop.hbase.mapreduce.ImportTsv.createSubmittableJob(ImportTsv.java:555) at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:763) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:772) {noformat} {{hbase.fs.tmp.dir}} is set to a default value in hbase-default.xml. I found that the hbase configuration resources from its xml are not loaded into the conf object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
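The root cause is the layering of configuration resources: hbase-default.xml supplies defaults (such as {{hbase.fs.tmp.dir}}) and hbase-site.xml overrides them, so a conf object created without those resources returns null for the lookup above. A small pure-Java illustration of that defaults-then-overrides layering (using Properties as a stand-in; the property value here is illustrative, not HBase's actual default):

```java
import java.util.Properties;

public class ConfLayering {
    // Models how HBase's configuration is assembled: defaults first
    // (hbase-default.xml), then site overrides (hbase-site.xml). A conf
    // built without the defaults layer has no hbase.fs.tmp.dir at all,
    // which is how the null Path in the stack trace above arises.
    public static Properties layered(Properties defaults, Properties site) {
        Properties conf = new Properties();
        conf.putAll(defaults); // hbase-default.xml first
        conf.putAll(site);     // hbase-site.xml wins on conflicts
        return conf;
    }
}
```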
[jira] [Updated] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HBASE-13561: --- Attachment: HBASE-13561-v2.patch Verified that a failure to Verify is reported as expected. Updated the usage on ITBLL to advise that the return code is checked in addition to counters. ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whoever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12848) Utilize Flash storage for WAL
[ https://issues.apache.org/jira/browse/HBASE-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617045#comment-14617045 ] Lars Hofhansl commented on HBASE-12848: --- Hmm... Interesting. Only renames are atomic. In theory we can rename (move the inode) in the NameNode and move the blocks lazily at the DataNodes, but that'd need more HDFS (presumably). Utilize Flash storage for WAL - Key: HBASE-12848 URL: https://issues.apache.org/jira/browse/HBASE-12848 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.1.0 Attachments: 12848-v1.patch, 12848-v2.patch, 12848-v3.patch, 12848-v4.patch, 12848-v4.patch One way to improve data ingestion rate is to make use of Flash storage. HDFS is doing the heavy lifting - see HDFS-7228. We assume an environment where: 1. Some servers have a mix of flash, e.g. 2 flash drives and 4 traditional drives. 2. Some servers have all traditional storage. 3. RegionServers are deployed on both profiles within one HBase cluster. This JIRA allows WAL to be managed on flash in a mixed-profile environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13890) Get/Scan from MemStore only (Client API)
[ https://issues.apache.org/jira/browse/HBASE-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616309#comment-14616309 ] Anoop Sam John commented on HBASE-13890: {code} if (results.size() == 0 && get.isMemstoreOnly()) { // memory store mode // Nothing was found - return empty result or null return increment.isReturnResults() ? Result.create(results) : null; } {code} I see.. Checking the patch now. So this will fail it back to the client.. Can the get op be repeated (without the memstore-only setting) at the server side only? Get/Scan from MemStore only (Client API) Key: HBASE-13890 URL: https://issues.apache.org/jira/browse/HBASE-13890 Project: HBase Issue Type: New Feature Components: API, Client, Scanners Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-13890-v1.patch This is a short-circuit read for get/scan when the recent data (version) of a cell can be found only in the MemStore (with very high probability). Good examples are: atomic counters and appends. This feature will allow store file scanners to be bypassed completely, improving performance and latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
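The server-side retry Anoop asks about can be sketched as plain Java: probe the MemStore first and, on a miss, silently repeat the read without the hint rather than failing back to the client. All names here are illustrative, not the HBase API or the actual patch.

```java
import java.util.Optional;

class MemstoreFirstRead {
    // Stand-in for a region's store; readMemstoreOnly() skips store file scanners.
    interface Store {
        Optional<String> readMemstoreOnly(String row);
        Optional<String> readFull(String row);
    }

    // Probe the MemStore; on a miss, fall through to a full read server-side
    // instead of returning an empty/partial result to the client.
    static Optional<String> get(Store store, String row, boolean memstoreOnlyHint) {
        if (memstoreOnlyHint) {
            Optional<String> fromMemstore = store.readMemstoreOnly(row);
            if (fromMemstore.isPresent()) {
                return fromMemstore;  // short-circuit hit: no store file scanners opened
            }
        }
        return store.readFull(row);   // fallback covers the miss case in one RPC
    }
}
```

The point of the fallback shape is that the client never sees a spurious miss, at the cost of the occasional double read on the server.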
[jira] [Commented] (HBASE-14027) Clean up netty dependencies
[ https://issues.apache.org/jira/browse/HBASE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616319#comment-14616319 ] Hadoop QA commented on HBASE-14027: --- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743885/HBASE-14027.2.patch against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743885 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . 
Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14688//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14688//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14688//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14688//console This message is automatically generated. Clean up netty dependencies --- Key: HBASE-14027 URL: https://issues.apache.org/jira/browse/HBASE-14027 Project: HBase Issue Type: Improvement Components: build Affects Versions: 1.0.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14027.1.patch, HBASE-14027.2.patch We have multiple copies of Netty (3?) getting shipped around. Clean some up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13988) Add exception handler for lease thread
[ https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616330#comment-14616330 ] Hudson commented on HBASE-13988: SUCCESS: Integrated in HBase-0.98 #1049 (See [https://builds.apache.org/job/HBase-0.98/1049/]) HBASE-13988 Add exception handler for lease thread (Liu Shaohui) (enis: rev 9894497e7158905b3a8091e6ec8454e699be3e72) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Add exception handler for lease thread -- Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0, 0.98.15 Attachments: HBASE-13988-v001.diff, HBASE-13988-v002.diff In a prod cluster, a region server exited because some important threads were no longer alive. After excluding other threads from the log, we suspected the lease thread was the root cause. So we need to add an exception handler to the lease thread to debug why it exited in the future. {quote} 2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ... 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting 2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
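The JDK mechanism the patch applies to the lease thread is `Thread.setUncaughtExceptionHandler`: a dying thread invokes the handler before terminating, so its exit reason gets recorded instead of vanishing silently. A minimal standalone sketch (the handler body is illustrative; a server would log fatally and trigger an orderly abort):

```java
class LeaseThreadHandlerDemo {
    // Captured by the handler so the failure can be inspected after the thread dies.
    static volatile String lastFailure;

    static Thread newHandledThread(Runnable task, String name) {
        Thread t = new Thread(task, name);
        // The dying thread itself runs this handler, so join() guarantees it completed.
        t.setUncaughtExceptionHandler((thread, cause) -> {
            lastFailure = thread.getName() + ": " + cause.getMessage();
            // A real server would LOG.fatal(...) here and stop the process cleanly.
        });
        return t;
    }
}
```

Without the handler, the only evidence is the generic "One or more threads are no longer alive" message seen in the log above.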
[jira] [Commented] (HBASE-13988) Add exception handler for lease thread
[ https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616384#comment-14616384 ] Hudson commented on HBASE-13988: SUCCESS: Integrated in HBase-1.3-IT #25 (See [https://builds.apache.org/job/HBase-1.3-IT/25/]) HBASE-13988 Add exception handler for lease thread (Liu Shaohui) (enis: rev 3da5058337579d72ef046166ac0c979dda5eb74b) * hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java Add exception handler for lease thread -- Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0, 0.98.15 Attachments: HBASE-13988-v001.diff, HBASE-13988-v002.diff In a prod cluster, a region server exited because some important threads were no longer alive. After excluding other threads from the log, we suspected the lease thread was the root cause. So we need to add an exception handler to the lease thread to debug why it exited in the future. {quote} 2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ... 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting 2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13858) RS/MasterDumpServlet dumps threads before its “Stacks” header
[ https://issues.apache.org/jira/browse/HBASE-13858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sunhaitao updated HBASE-13858: -- Assignee: (was: sunhaitao) RS/MasterDumpServlet dumps threads before its “Stacks” header - Key: HBASE-13858 URL: https://issues.apache.org/jira/browse/HBASE-13858 Project: HBase Issue Type: Bug Components: master, regionserver, UI Affects Versions: 1.1.0 Reporter: Lars George Priority: Trivial Labels: beginner Fix For: 2.0.0, 1.3.0 The stack traces are captured using a Hadoop helper method, and its output is then merged with the current output. I presume a simple flush is missing after outputting the “Stacks” header, before the captured output is dumped. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-11182) Store backup information in a manifest file using protobuff format
[ https://issues.apache.org/jira/browse/HBASE-11182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov resolved HBASE-11182. --- Resolution: Won't Fix Closing this JIRA as no longer relevant to the current Backup/Restore roadmap. Store backup information in a manifest file using protobuff format -- Key: HBASE-11182 URL: https://issues.apache.org/jira/browse/HBASE-11182 Project: HBase Issue Type: New Feature Affects Versions: 0.99.0 Reporter: Jerry He Assignee: Enoch Hsu A manifest file is used to store information about a backup image, such as: Table Name Type: Full or Incremental Size Timestamp Info State Info: Converted, Merged, Compacted, etc. Dependency Lineage -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14030) Backup/Restore Phase 1
Vladimir Rodionov created HBASE-14030: - Summary: Backup/Restore Phase 1 Key: HBASE-14030 URL: https://issues.apache.org/jira/browse/HBASE-14030 Project: HBase Issue Type: Umbrella Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov This is the umbrella ticket for Backup/Restore Phase 1. See the HBASE-7912 design doc for the phase description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13897) OOM may occur when Import imports a row with too many KeyValues
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617131#comment-14617131 ] Hadoop QA commented on HBASE-13897: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743973/13897-v2.txt against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743973 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14693//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14693//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14693//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14693//console This message is automatically generated. OOM may occur when Import imports a row with too many KeyValues --- Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Liu Junhong Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: 13897-v2.txt, HBASE-13897-0.98.patch, HBASE-13897-master-20150629.patch, HBASE-13897-master-20150630.patch, HBASE-13897-master-20150707.patch, HBASE-13897-master.patch When importing a row with too many KeyValues (it may have too many columns or versions), the KeyValueReducer will incur an OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14034) HBase Backup/Restore Phase 1: Abstract Coordination manager (Zk) operations
Vladimir Rodionov created HBASE-14034: - Summary: HBase Backup/Restore Phase 1: Abstract Coordination manager (Zk) operations Key: HBASE-14034 URL: https://issues.apache.org/jira/browse/HBASE-14034 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Abstract Coordination manager (Zk) operations. See org.apache.hadoop.hbase.coordination package for references. Provide Zookeeper implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14036) HBase Backup/Restore Phase 1: Custom WAL archive cleaner
Vladimir Rodionov created HBASE-14036: - Summary: HBase Backup/Restore Phase 1: Custom WAL archive cleaner Key: HBASE-14036 URL: https://issues.apache.org/jira/browse/HBASE-14036 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Custom WAL archive cleaner (BackupLogCleaner). We need to keep WAL files in the archive until they either get copied over to the backup destination during an incremental backup, or a full backup (for ALL tables) happens. This is tricky, but doable. A backup-aware WAL archive cleaner should consult hbase:backup to determine if a WAL file is safe to purge. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
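The purge rule described above can be sketched as a plain-Java predicate: an archived WAL is safe to delete only once every table's recorded backup timestamp is at or beyond the WAL's creation time. This is an illustrative model of the logic, not the actual BackupLogCleaner API or the hbase:backup schema.

```java
import java.util.Collection;

class WalPurgePredicate {
    // walTs: creation timestamp of the archived WAL file.
    // tableBackupTs: per-table "backed up through" timestamps, as hbase:backup
    // might record them (hypothetical representation).
    static boolean safeToPurge(long walTs, Collection<Long> tableBackupTs) {
        if (tableBackupTs.isEmpty()) {
            return false;  // no backups recorded yet: keep every archived WAL
        }
        for (long ts : tableBackupTs) {
            if (ts < walTs) {
                return false;  // some table still needs edits from this WAL
            }
        }
        return true;  // every table's backup already covers this WAL
    }
}
```

The conservative default (keep when no backup state exists) matches the ticket's concern: deleting too early silently breaks incremental restore.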
[jira] [Commented] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617158#comment-14617158 ] ramkrishna.s.vasudevan commented on HBASE-12295: Will correct the long lines in my next revision based on the comments from RB. Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, HBASE-12295_10.patch, HBASE-12295_2.patch, HBASE-12295_4.patch, HBASE-12295_4.pdf, HBASE-12295_5.pdf, HBASE-12295_9.patch, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks do not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14024) ImportTsv is not loading hbase-default.xml
[ https://issues.apache.org/jira/browse/HBASE-14024?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617287#comment-14617287 ] Ted Yu commented on HBASE-14024: lgtm ImportTsv is not loading hbase-default.xml -- Key: HBASE-14024 URL: https://issues.apache.org/jira/browse/HBASE-14024 Project: HBase Issue Type: Bug Components: mapreduce Affects Versions: 2.0.0 Reporter: Ashish Singhi Assignee: Ashish Singhi Priority: Critical Fix For: 2.0.0 Attachments: HBASE-14024.patch The ImportTsv job is failing with the below exception {noformat} Exception in thread "main" java.lang.IllegalArgumentException: Can not create a Path from a null string at org.apache.hadoop.fs.Path.checkPathArg(Path.java:123) at org.apache.hadoop.fs.Path.<init>(Path.java:135) at org.apache.hadoop.fs.Path.<init>(Path.java:89) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configurePartitioner(HFileOutputFormat2.java:591) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:441) at org.apache.hadoop.hbase.mapreduce.HFileOutputFormat2.configureIncrementalLoad(HFileOutputFormat2.java:406) at org.apache.hadoop.hbase.mapreduce.ImportTsv.createSubmittableJob(ImportTsv.java:555) at org.apache.hadoop.hbase.mapreduce.ImportTsv.run(ImportTsv.java:763) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70) at org.apache.hadoop.hbase.mapreduce.ImportTsv.main(ImportTsv.java:772) {noformat} {{hbase.fs.tmp.dir}} is set to a default value in hbase-default.xml. I found that the hbase configuration resources from its xml are not loaded into the conf object. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14032) HBase Backup/Restore Phase 1: Abstract SnapshotCopy (full backup)
[ https://issues.apache.org/jira/browse/HBASE-14032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14032: -- Summary: HBase Backup/Restore Phase 1: Abstract SnapshotCopy (full backup) (was: Backup/Restore Phase 1: Abstract SnapshotCopy (full backup)) HBase Backup/Restore Phase 1: Abstract SnapshotCopy (full backup) - Key: HBASE-14032 URL: https://issues.apache.org/jira/browse/HBASE-14032 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Abstract SnapshotCopy (full backup) to support non-M/R based implementations. Provide M/R implementation. SnapshotCopy is used to copy snapshot’s data during full backup operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14033) HBase Backup/Restore Phase1: Abstract WALPlayer (incremental restore)
[ https://issues.apache.org/jira/browse/HBASE-14033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14033: -- Summary: HBase Backup/Restore Phase1: Abstract WALPlayer (incremental restore) (was: Backup/Restore Phase1: Abstract WALPlayer (incremental restore)) HBase Backup/Restore Phase1: Abstract WALPlayer (incremental restore) - Key: HBASE-14033 URL: https://issues.apache.org/jira/browse/HBASE-14033 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Abstract WALPlayer (incremental restore) to support non-M/R based implementations. Provide M/R implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14027) Clean up netty dependencies
[ https://issues.apache.org/jira/browse/HBASE-14027?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617168#comment-14617168 ] Hadoop QA commented on HBASE-14027: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743982/HBASE-14027.3.patch against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743982 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+0 tests included{color}. The patch appears to be a documentation, build, or dev-support patch that doesn't require tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:red}-1 core tests{color}. 
The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14694//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14694//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14694//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14694//console This message is automatically generated. Clean up netty dependencies --- Key: HBASE-14027 URL: https://issues.apache.org/jira/browse/HBASE-14027 Project: HBase Issue Type: Improvement Components: build Affects Versions: 1.0.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 2.0.0, 1.2.0 Attachments: HBASE-14027.1.patch, HBASE-14027.2.patch, HBASE-14027.3.patch We have multiple copies of Netty (3?) getting shipped around. Clean some up. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14038) Incremental backup list set is ignored during backup
Vladimir Rodionov created HBASE-14038: - Summary: Incremental backup list set is ignored during backup Key: HBASE-14038 URL: https://issues.apache.org/jira/browse/HBASE-14038 Project: HBase Issue Type: Bug Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 BUG: during incremental backup, the provided table list is ignored and replaced with the set of tables that have already been backed up before. Test case: backup T1, T2, T3, then request an incremental backup for T1 and T2: T3 will be included as well. See: BackupClient.requestBackup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
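The fix shape this bug calls for can be sketched in plain Java: honor the caller's table list, intersecting it with the previously-backed-up set rather than replacing it. Names are illustrative, not the BackupClient internals.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class IncrementalTableFilter {
    // Only tables that were both requested AND seen in a prior full backup
    // qualify for an incremental pass; the rest of the requested set would
    // need a full backup first.
    static Set<String> tablesForIncremental(Set<String> requested,
                                            Set<String> previouslyBackedUp) {
        Set<String> result = new LinkedHashSet<>(requested);   // start from the request
        result.retainAll(previouslyBackedUp);                  // never add unrequested tables
        return result;
    }
}
```

With this shape, the test case from the description (backup T1, T2, T3; request incremental for T1, T2) yields exactly {T1, T2} and never silently pulls in T3.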
[jira] [Commented] (HBASE-14025) Update CHANGES.txt for 1.2
[ https://issues.apache.org/jira/browse/HBASE-14025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617122#comment-14617122 ] Sean Busbey commented on HBASE-14025: - {quote} I think I get your point, but not sure how easily we can follow this practice. It gets complicated for committers to think about what fixVersions to set at the time of the commit. The thing we have where we just mark the next scheduled version from that branch is simple and worked so far. {quote} That's fair. I think this is the kind of thing that release managers will have to do, since they know if something actually made it into the release branch. Since it's one time work on non-patch releases it doesn't seem so bad. I see it the same as how when the RCs get close the RM needs to move out things that aren't going to make it in (e.g. replacing a 1.2.0 version with 1.2.1 and 1.3.0) or move back in things that got committed between candidates (e.g. by replacing 1.2.1 with 1.2.0). {quote} Can we do a middle ground where we keep the fixVersions in jira, and filter them out in CHANGES.txt for convenience if not needed? {quote} The problem with this is that then the release notes in Jira will be less accurate than the CHANGES.txt file, and I'd very much like to point to the jira data as authoritative. What about having the committers continue with their current habit and then leaving it to the release managers to make things consistent around release time? Update CHANGES.txt for 1.2 -- Key: HBASE-14025 URL: https://issues.apache.org/jira/browse/HBASE-14025 Project: HBase Issue Type: Sub-task Components: documentation Affects Versions: 1.2.0 Reporter: Sean Busbey Assignee: Sean Busbey Fix For: 1.2.0 Since it's more effort than I expected, making a ticket to track actually updating CHANGES.txt so that new RMs have an idea what to expect. Maybe will make doc changes if there's enough here. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13446) Add docs warning about missing data for downstream on versions prior to HBASE-13262
[ https://issues.apache.org/jira/browse/HBASE-13446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Enis Soztutar updated HBASE-13446: -- Fix Version/s: (was: 1.0.2) 1.0.3 Add docs warning about missing data for downstream on versions prior to HBASE-13262 --- Key: HBASE-13446 URL: https://issues.apache.org/jira/browse/HBASE-13446 Project: HBase Issue Type: Task Components: documentation Affects Versions: 0.98.0, 1.0.0 Reporter: Sean Busbey Priority: Critical Fix For: 2.0.0, 0.98.14, 1.0.3 From conversation at the end of HBASE-13262: [~davelatham] {quote} Should we put a warning somewhere (mailing list? book?) about this? Something like: IF (client OR server is <= 0.98.11/1.0.0) AND server has a smaller value for hbase.client.scanner.max.result.size than client does, THEN scan requests that reach the server's hbase.client.scanner.max.result.size are likely to miss data. In particular, 0.98.11 defaults hbase.client.scanner.max.result.size to 2MB but other versions default to larger values, so be very careful using 0.98.11 servers with any other client version. {quote} [~busbey] {quote} How about we add a note in the ref guide for upgrades and for troubleshooting? {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14040) Small refactoring in BackupHandler
Vladimir Rodionov created HBASE-14040: - Summary: Small refactoring in BackupHandler Key: HBASE-14040 URL: https://issues.apache.org/jira/browse/HBASE-14040 Project: HBase Issue Type: Improvement Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Move distributed log roll procedure call to BackupHandler.call from IncrementalBackupManager.getLogFilesForNewBackup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14000) Region server failed to report Master and stuck in reportForDuty retry loop
[ https://issues.apache.org/jira/browse/HBASE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617210#comment-14617210 ] Jerry He commented on HBASE-14000: -- In HBASE-13317, we try to be conservative if the region server gets ServerNotRunningYetException when reportForDuty. ServerNotRunningYetException means the master may still be initializing, so there may not be an immediate need to try a new RPC connection. In your case, do you see the loop stuck for a long time, meaning that the old master continued to return ServerNotRunningYetException for a long time? Region server failed to report Master and stuck in reportForDuty retry loop --- Key: HBASE-14000 URL: https://issues.apache.org/jira/browse/HBASE-14000 Project: HBase Issue Type: Bug Reporter: Pankaj Kumar Assignee: Pankaj Kumar Attachments: HBASE-14000.patch In an HA cluster, the region server got stuck in the reportForDuty retry loop if the active master is restarting and a master switch happens later on, before it reports successfully. The root cause is the same as HBASE-13317, but the region server tried to connect to the master while it was starting, so the rssStub reset didn't happen, as {code} if (ioe instanceof ServerNotRunningYetException) { LOG.debug("Master is not running yet"); } {code} While the master was starting, a master switch happened, so the RS always tried to connect to the standby master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
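The failure mode above can be modeled in plain Java: treating ServerNotRunningYetException as "master still initializing" is fine once, but after a master switch the cached stub points at a standby forever. One possible fix shape, re-resolving the master stub after N consecutive such failures, is sketched below; all names are hypothetical, not the HRegionServer internals or the attached patch.

```java
class ReportForDutyLoop {
    interface MasterStub { void reportForDuty() throws Exception; }
    interface StubFactory { MasterStub freshStub(); }  // re-resolves the active master

    // Returns the attempt number that succeeded, or -1 if maxAttempts is exhausted.
    static int run(StubFactory factory, int resetAfter, int maxAttempts) {
        MasterStub stub = factory.freshStub();
        int consecutiveFailures = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                stub.reportForDuty();
                return attempt;
            } catch (Exception e) {
                // Conservative once, but don't trust a stale stub forever:
                // after resetAfter consecutive failures, re-resolve the master.
                if (++consecutiveFailures >= resetAfter) {
                    stub = factory.freshStub();
                    consecutiveFailures = 0;
                }
            }
        }
        return -1;
    }
}
```

The trade-off mirrors Jerry's comment: resetting on every failure wastes RPC connections while a master is merely initializing, while never resetting reproduces this bug.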
[jira] [Updated] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13561: -- Resolution: Fixed Hadoop Flags: Reviewed Fix Version/s: (was: 1.0.3) (was: 0.98.14) Status: Resolved (was: Patch Available) Pushed to branch-1, branch-1.1, branch-1.2, and master. Did not push to branch-1.0 or 0.98 (if you put up a backport, I'll apply to these branches too [~elserj]) ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whoever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14032) Backup/Restore Phase 1: Abstract SnapshotCopy (full backup)
Vladimir Rodionov created HBASE-14032: - Summary: Backup/Restore Phase 1: Abstract SnapshotCopy (full backup) Key: HBASE-14032 URL: https://issues.apache.org/jira/browse/HBASE-14032 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Abstract SnapshotCopy (full backup) to support non-M/R based implementations. Provide M/R implementation. SnapshotCopy is used to copy snapshot’s data during full backup operation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617085#comment-14617085 ] Josh Elser commented on HBASE-13561: Awesome. Thanks for the shepherding, [~stack]. LMK if you have any issues backporting, I can rebase easily enough if desired. ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whoever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13897) OOM may occur when Import imports a row with too many KeyValues
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617136#comment-14617136 ] Ted Yu commented on HBASE-13897: QA run passed: {code} [INFO] HBase - Shaded - Server ... SUCCESS [ 0.488 s] [INFO] [INFO] BUILD SUCCESS [INFO] [INFO] Total time: 02:09 h [INFO] Finished at: 2015-07-07T18:24:00+00:00 {code} OOM may occur when Import imports a row with too many KeyValues --- Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Liu Junhong Fix For: 2.0.0, 0.98.14, 1.3.0 Attachments: 13897-v2.txt, HBASE-13897-0.98.patch, HBASE-13897-master-20150629.patch, HBASE-13897-master-20150630.patch, HBASE-13897-master-20150707.patch, HBASE-13897-master.patch When importing a row with too many KeyValues (may have too many columns or versions), KeyValueReducer will incur OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14031) Backup/Restore Phase 1: Abstract DistCp in incremental backup
Vladimir Rodionov created HBASE-14031: - Summary: Backup/Restore Phase 1: Abstract DistCp in incremental backup Key: HBASE-14031 URL: https://issues.apache.org/jira/browse/HBASE-14031 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Abstract DistCp (incremental backup) to support non-M/R based implementations. Provide M/R implementation. DistCp is used to copy WAL files during incremental backup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12848) Utilize Flash storage for WAL
[ https://issues.apache.org/jira/browse/HBASE-12848?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617170#comment-14617170 ] ramkrishna.s.vasudevan commented on HBASE-12848: Reading the intent of the JIRA, it is ideally to specify where the active WAL is going to be, not the archived WALs or any other file. The movement of archived files should happen in the back end, something like a copy option, so it should be a back-end option. Moving some files to SSDs should also be a back-end option, more like HDFS doing it on receiving an instruction to move the files. Utilize Flash storage for WAL - Key: HBASE-12848 URL: https://issues.apache.org/jira/browse/HBASE-12848 Project: HBase Issue Type: Sub-task Reporter: Ted Yu Assignee: Ted Yu Fix For: 2.0.0, 1.1.0 Attachments: 12848-v1.patch, 12848-v2.patch, 12848-v3.patch, 12848-v4.patch, 12848-v4.patch One way to improve the data ingestion rate is to make use of Flash storage. HDFS does the heavy lifting - see HDFS-7228. We assume an environment where: 1. Some servers have a mix of flash, e.g. 2 flash drives and 4 traditional drives. 2. Some servers have all traditional storage. 3. RegionServers are deployed on both profiles within one HBase cluster. This JIRA allows the WAL to be managed on flash in a mixed-profile environment. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
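For reference, the change discussed in this JIRA wires a WAL storage-policy setting through to HDFS's storage policies (HDFS-7228). A hedged hbase-site.xml sketch of that knob as I understand it; verify the exact property name, default, and accepted values against your HBase release:

```xml
<!-- Sketch only: place one WAL replica on flash (ONE_SSD), leaving the
     other replicas on traditional drives. ALL_SSD would pin every
     replica to flash on servers that have it. -->
<property>
  <name>hbase.wal.storage.policy</name>
  <value>ONE_SSD</value>
</property>
```

On servers without flash, HDFS falls back to whatever storage is available, which is what makes the mixed-profile deployment described above workable.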
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617250#comment-14617250 ] Hadoop QA commented on HBASE-13561: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743994/HBASE-13561-v2.patch against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743994 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 4 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. 
There are 1 zombie test(s): at org.apache.phoenix.mapreduce.IndexToolIT.testMutalbleIndexWithUpdates(IndexToolIT.java:228) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14696//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14696//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14696//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14696//console This message is automatically generated. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14033) Backup/Restore Phase1: Abstract WALPlayer (incremental restore)
Vladimir Rodionov created HBASE-14033: - Summary: Backup/Restore Phase1: Abstract WALPlayer (incremental restore) Key: HBASE-14033 URL: https://issues.apache.org/jira/browse/HBASE-14033 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Abstract WALPlayer (incremental restore) to support non-M/R based implementations. Provide M/R implementation. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14030) HBase Backup/Restore Phase 1
[ https://issues.apache.org/jira/browse/HBASE-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14030: -- Summary: HBase Backup/Restore Phase 1 (was: Backup/Restore Phase 1) HBase Backup/Restore Phase 1 Key: HBASE-14030 URL: https://issues.apache.org/jira/browse/HBASE-14030 Project: HBase Issue Type: Umbrella Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov This is the umbrella ticket for Backup/Restore Phase 1. See the HBASE-7912 design doc for the phase description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-14031) HBase Backup/Restore Phase 1: Abstract DistCp in incremental backup
[ https://issues.apache.org/jira/browse/HBASE-14031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14031: -- Summary: HBase Backup/Restore Phase 1: Abstract DistCp in incremental backup (was: Backup/Restore Phase 1: Abstract DistCp in incremental backup) HBase Backup/Restore Phase 1: Abstract DistCp in incremental backup --- Key: HBASE-14031 URL: https://issues.apache.org/jira/browse/HBASE-14031 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Abstract DistCp (incremental backup) to support non-M/R based implementations. Provide M/R implementation. DistCp is used to copy WAL files during incremental backup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14039) BackupHandler.deleteSnapshot MUST use HBase Snapshot API
Vladimir Rodionov created HBASE-14039: - Summary: BackupHandler.deleteSnapshot MUST use HBase Snapshot API Key: HBASE-14039 URL: https://issues.apache.org/jira/browse/HBASE-14039 Project: HBase Issue Type: Improvement Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 BackupHandler.deleteSnapshot MUST use the HBase API for that (HBaseAdmin), not direct FS access (deleting the snapshot folder may not be enough?). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617215#comment-14617215 ] Hadoop QA commented on HBASE-13867: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743985/HBASE-13867.1.patch against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743985 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 1 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 checkstyle{color}. The applied patch does not increase the total number of checkstyle errors {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. The patch introduces the following lines longer than 100: +HBase Coprocessors are modeled after the Coprocessors which are part of Google's BigTable (http://static.googleusercontent.com/media/research.google.com/en//people/jeff/SOCC2010-keynote-slides.pdf, pages 41-42.). + +Coprocessor is a framework that provides an easy way to run your custom code directly on Region Server. +. Mingjie Lai's blog post link:https://blogs.apache.org/hbase/entry/coprocessor_introduction[Coprocessor Introduction]. +. 
Gaurav Bhardwaj's blog post link:http://www.3pillarglobal.com/insights/hbase-coprocessors[The How To Of HBase Coprocessors]. +When working with any data store (like RDBMS or HBase) you fetch the data (in case of RDBMS you might use SQL query and in case of HBase you use either Get or Scan). To fetch only relevant data you filter it (for RDBMS you put conditions in 'WHERE' clause and in HBase you use Filters). After fetching the desired data, you perform your business computation on the data. + +This scenario is close to ideal for small data, where few thousand rows and a bunch of columns are returned from the data store. Now imagine a scenario where there are billions of rows and millions of columns and you want to perform some computation which requires all the data, like calculating average or sum. Even if you are interested in just few columns, you still have to fetch all the rows. There are a few drawbacks in this approach as described below: +. In this approach the data transfer (from data store to client side) will become the bottleneck, and the time required to complete the operation is limited by the rate at which data transfer is taking place. +. Bandwidth is one of the most precious resources in any data center. Operations like this will severely impact the performance of your cluster. +. Your client code is becoming thick as you are maintaining the code for calculating average or summation on client side. Not a major drawback when talking of severe issues like performance/bandwidth but still worth giving consideration. +In a scenario like this it's better to move the computation (i.e. user's custom code) to the data itself (Region Server). Coprocessor helps you achieve this but you can do more than that. There is another advantage that your code runs in parallel (i.e. on all Regions). To give an idea of Coprocessor's capabilities, different people give different analogies. 
The three most famous analogies for Coprocessor present in the industry are: {color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . {color:red}-1 core zombie tests{color}. There are 1 zombie test(s): at org.apache.oozie.test.MiniHCatServer$1.run(MiniHCatServer.java:137) at org.apache.oozie.test.XTestCase$MiniClusterShutdownMonitor.run(XTestCase.java:1071) at org.apache.oozie.test.XTestCase.waitFor(XTestCase.java:692) at org.apache.oozie.action.hadoop.TestMapReduceActionExecutor.testSetExecutionStats_when_user_has_specified_stats_write_TRUE(TestMapReduceActionExecutor.java:976) Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//testReport/ Release Findbugs (version 2.0.3)warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14695//artifact/patchprocess/newFindbugsWarnings.html
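The patch text quoted above motivates endpoint coprocessors: push the computation (sum, average) to where the data lives and ship back only small partial results. A toy, self-contained Java sketch of that idea; plain executor threads stand in for per-region endpoints, and this is not the real HBase coprocessor API:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Each "region" computes its partial sum in parallel; the client only
// combines the tiny partials -- the raw rows never cross the network.
public class EndpointSumSketch {

    public static long serverSideSum(List<long[]> regions) {
        ExecutorService pool = Executors.newFixedThreadPool(regions.size());
        try {
            List<Future<Long>> partials = new ArrayList<>();
            for (long[] region : regions) {
                // This is the work an endpoint coprocessor would do on the RS.
                partials.add(pool.submit(() -> Arrays.stream(region).sum()));
            }
            long total = 0;
            for (Future<Long> p : partials) {
                try {
                    total += p.get(); // client combines partial results
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<long[]> regions = List.of(new long[]{1, 2, 3}, new long[]{4, 5});
        System.out.println(serverSideSum(regions)); // 15
    }
}
```

The parallelism across regions is the second advantage the patch text calls out: every region works on its own slice at the same time.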
[jira] [Created] (HBASE-14035) HBase Backup/Restore Phase 1: hbase:backup - backup system table
Vladimir Rodionov created HBASE-14035: - Summary: HBase Backup/Restore Phase 1: hbase:backup - backup system table Key: HBASE-14035 URL: https://issues.apache.org/jira/browse/HBASE-14035 Project: HBase Issue Type: Task Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 *hbase:backup* - move all backup meta info from Zk (coordination manager) to an hbase system table. Do not use Zk (coordination manager) as persistent storage. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (HBASE-14037) Deletion of a table from backup set results in RTE during next backup
Vladimir Rodionov created HBASE-14037: - Summary: Deletion of a table from backup set results in RTE during next backup Key: HBASE-14037 URL: https://issues.apache.org/jira/browse/HBASE-14037 Project: HBase Issue Type: Bug Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 Deletion of a table with backup history (has a Zk node) results in a RuntimeException on all subsequent backup requests. See: BackupClient.requestBackup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13890) Get/Scan from MemStore only (Client API)
[ https://issues.apache.org/jira/browse/HBASE-13890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616950#comment-14616950 ] Vladimir Rodionov commented on HBASE-13890: --- {quote} I see.. Checking the patch now. So this will fail it to client.. Can the get op be repeated (with out memstore only setting) at server side only? {quote} Yes, I think it can be improved. I am working on patch #2. I want to clarify a little bit what this patch is for. It is mostly to improve high-performance counters (HPC), not Get, not Append (is anybody using them anyway?) and not Scan operations. The most recent version of an HPC is almost always in the MemStore (99.99% of the time), but each store file in this region/cf has its version as well (before major compaction). When HBase reads a counter it has to go through all store files and compare results, which is very inefficient. This patch allows store files to be bypassed completely most of the time. Get/Scan from MemStore only (Client API) Key: HBASE-13890 URL: https://issues.apache.org/jira/browse/HBASE-13890 Project: HBase Issue Type: New Feature Components: API, Client, Scanners Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-13890-v1.patch This is a short-circuit read for get/scan when the recent data (version) of a cell can be found only in the MemStore (with very high probability). Good examples are: atomic counters and appends. This feature will allow store file scanners to be bypassed completely, improving performance and latency. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
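Vladimir's comment above describes the read path this patch short-circuits. A toy, self-contained Java sketch of that logic; the Version class and method names are illustrative, not HBase internals: a normal counter read merges the MemStore with every store file and keeps the newest cell version, while the hinted read consults the MemStore alone.

```java
import java.util.List;

// Sketch of "merge everything" vs "MemStore only" counter reads.
public class MemStoreShortCircuit {

    public static final class Version {
        final long timestamp;
        final long value;
        public Version(long timestamp, long value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    // Normal read: compare the MemStore's newest version against the
    // newest version in every store file (one seek per file).
    public static long readMerged(List<Version> memstore,
                                  List<List<Version>> storeFiles) {
        Version best = latest(memstore);
        for (List<Version> file : storeFiles) {
            Version v = latest(file);
            if (best == null || (v != null && v.timestamp > best.timestamp)) {
                best = v;
            }
        }
        return best.value;
    }

    // Short-circuit read: trust that a hot counter's newest version is
    // in the MemStore and never open the store files.
    public static long readMemStoreOnly(List<Version> memstore) {
        return latest(memstore).value;
    }

    private static Version latest(List<Version> versions) {
        Version best = null;
        for (Version v : versions) {
            if (best == null || v.timestamp > best.timestamp) best = v;
        }
        return best;
    }

    public static void main(String[] args) {
        List<Version> memstore = List.of(new Version(300, 42));
        List<List<Version>> files = List.of(
            List.of(new Version(100, 40)), List.of(new Version(200, 41)));
        System.out.println(readMerged(memstore, files));   // 42
        System.out.println(readMemStoreOnly(memstore));    // 42
    }
}
```

When the two paths agree, as they do for a frequently incremented counter, skipping the store files saves all of the per-file work, which is the whole point of the hint.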
[jira] [Updated] (HBASE-14030) HBase Backup/Restore Phase 1
[ https://issues.apache.org/jira/browse/HBASE-14030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vladimir Rodionov updated HBASE-14030: -- Attachment: HBASE-14030-v0.patch The current version (patch), derived from HBASE-11085, is attached. HBase Backup/Restore Phase 1 Key: HBASE-14030 URL: https://issues.apache.org/jira/browse/HBASE-14030 Project: HBase Issue Type: Umbrella Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Attachments: HBASE-14030-v0.patch This is the umbrella ticket for Backup/Restore Phase 1. See the HBASE-7912 design doc for the phase description. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-7912) HBase Backup/Restore Based on HBase Snapshot
[ https://issues.apache.org/jira/browse/HBASE-7912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617316#comment-14617316 ] Vladimir Rodionov commented on HBASE-7912: -- For the most recent updates please go to HBASE-14030. HBase Backup/Restore Based on HBase Snapshot Key: HBASE-7912 URL: https://issues.apache.org/jira/browse/HBASE-7912 Project: HBase Issue Type: Sub-task Reporter: Richard Ding Assignee: Vladimir Rodionov Labels: backup Fix For: 2.0.0 Attachments: HBaseBackupRestore-Jira-7912-DesignDoc-v1.pdf, HBaseBackupRestore-Jira-7912-DesignDoc-v2.pdf, HBaseBackupRestore-Jira-7912-v4.pdf, HBaseBackupRestore-Jira-7912-v5 .pdf, HBaseBackupRestore-Jira-7912-v6.pdf, HBase_BackupRestore-Jira-7912-CLI-v1.pdf Finally, we completed the implementation of our backup/restore solution, and would like to share it with the community through this jira. We are leveraging the existing hbase snapshot feature, and provide a general solution for common users. Our full backup uses snapshot to capture metadata locally and exportsnapshot to move data to another cluster; the incremental backup uses the offline WALPlayer to back up HLogs; we also leverage globally distributed log roll and flush to improve performance; there are other add-on values such as convert, merge, progress report, and CLI commands, so that a common user can back up hbase data without in-depth knowledge of hbase. Our solution also contains some usability features for enterprise users. The detailed design document and CLI commands will be attached in this jira. 
We plan to use 10~12 subtasks to share each of the following features, and document the detailed implementation in the subtasks: * *Full Backup* : provide local and remote back/restore for a list of tables * *offline-WALPlayer* to convert HLog to HFiles offline (for incremental backup) * *distributed* Logroll and distributed flush * Backup *Manifest* and history * *Incremental* backup: to build on top of full backup as daily/weekly backup * *Convert* incremental backup WAL files into hfiles * *Merge* several backup images into one (like merging weekly into monthly) * *add and remove* table to and from Backup image * *Cancel* a backup process * backup progress *status* * full backup based on *existing snapshot* *-* *Below is the original description, kept here as the history of the design and discussion back in 2013* There have been attempts in the past to come up with a viable HBase backup/restore solution (e.g., HBASE-4618). Recently, there have been many advancements and new features in HBase, for example, FileLink, Snapshot, and Distributed Barrier Procedure. This is a proposal for a backup/restore solution that utilizes these new features to achieve better performance and consistency. A common practice of backup and restore in databases is to first take a full baseline backup, and then periodically take incremental backups that capture the changes since the full baseline backup. An HBase cluster can store a massive amount of data. The combination of full backups with incremental backups has tremendous benefits for HBase as well. The following is a typical scenario for full and incremental backup. # The user takes a full backup of a table or a set of tables in HBase. # The user schedules periodical incremental backups to capture the changes from the full backup, or from the last incremental backup. # The user needs to restore table data to a past point in time. # The full backup is restored to the table(s) or to different table name(s). 
Then the incremental backups that are up to the desired point in time are applied on top of the full backup. We would support the following key features and capabilities. * Full backup uses HBase snapshot to capture HFiles. * Use HBase WALs to capture incremental changes, but we use bulk load of HFiles for fast incremental restore. * Support single table or a set of tables, and column family level backup and restore. * Restore to different table names. * Support adding additional tables or CF to backup set without interruption of incremental backup schedule. * Support rollup/combining of incremental backups into longer period and bigger incremental backups. * Unified command line interface for all the above. The solution will support HBase backup to FileSystem, either on the same cluster or across clusters. It has the flexibility to support backup to other devices and servers in the future. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14038) Incremental backup list set is ignored during backup
[ https://issues.apache.org/jira/browse/HBASE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617416#comment-14617416 ] Jerry He commented on HBASE-14038: -- Hi, [~vrodionov] In the original design, this is intended. The incremental backup is controlled by the 'incremental backup table set'. For example, if the table set contains (table1, table2, table3), incremental backup will back up the WALs, which cover all the tables in the table set. It is to avoid copying the same set of WALs, which would be the likely case if you back up table1, then back up table2. Incremental backup list set is ignored during backup Key: HBASE-14038 URL: https://issues.apache.org/jira/browse/HBASE-14038 Project: HBase Issue Type: Bug Reporter: Vladimir Rodionov Assignee: Vladimir Rodionov Fix For: 2.0.0 BUG: during incremental backup, the provided table list is ignored and replaced with the set of tables which have already been backed up before. Test case: backup T1, T2, T3, then request incremental backup for T1, T2 = T3 will be included as well. See: BackupClient.requestBackup. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14038) Incremental backup list set is ignored during backup
[ https://issues.apache.org/jira/browse/HBASE-14038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617454#comment-14617454 ] Vladimir Rodionov commented on HBASE-14038: --- OK, this is not a bug, but a feature, at least until we implement WAL filtering (by table(s)) on backup. I will leave it for now, but in the future, when Phase 2 begins, I will link this JIRA to Phase 2. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617464#comment-14617464 ] Ted Yu commented on HBASE-13965: Loaded patch v5 on a small cluster and obtained the following: {code} }, { name : Hadoop:service=HBase,name=Master,sub=Balancer, modelerType : Master,sub=Balancer, tag.Context : master, tag.Hostname : cn013.l42scl.hortonworks.com, IntegrationTestBigLinkedList_StoreFileCostFunction : 3.262317387568032, IntegrationTestBigLinkedList_LocalityCostFunction : 2.4739584, IntegrationTestBigLinkedList_TableSkewCostFunction : 5.60546874999, IntegrationTestBigLinkedList_Overall : 35.09174447090137, IntegrationTestBigLinkedList_WriteRequestCostFunction : 0.0, IntegrationTestBigLinkedList_RegionCountSkewCostFunction : 0.0, IntegrationTestBigLinkedList_ReadRequestCostFunction : 5.0, IntegrationTestBigLinkedList_MemstoreSizeCostFunction : 0.0, IntegrationTestBigLinkedList_RegionReplicaHostCostFunction : 0.0, IntegrationTestBigLinkedList_RegionReplicaRackCostFunction : 0.0, IntegrationTestBigLinkedList_MoveCostFunction : 18.75, {code} Do you think it makes sense to expose each cost (other than Overall) as percentage ? This way, it is easier for user to figure out which cost is the dominant factor. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. 
A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
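Ted's suggestion in the comment above can be sketched in a few lines of plain Java; the metric names and values are taken from the JMX dump he posted, and this is not HBase code: dividing each cost function's value by the sum of the non-Overall costs makes the dominant factor obvious at a glance.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Convert raw per-cost-function values into percentages of their sum.
public class CostPercentages {

    public static Map<String, Double> toPercentages(Map<String, Double> costs) {
        double sum = costs.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> pct = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : costs.entrySet()) {
            pct.put(e.getKey(), sum == 0 ? 0.0 : 100.0 * e.getValue() / sum);
        }
        return pct;
    }

    public static void main(String[] args) {
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("MoveCostFunction", 18.75);
        costs.put("TableSkewCostFunction", 5.60546875);
        costs.put("ReadRequestCostFunction", 5.0);
        costs.put("StoreFileCostFunction", 3.262317387568032);
        costs.put("LocalityCostFunction", 2.4739584);
        toPercentages(costs).forEach((k, v) ->
            System.out.printf("%s : %.1f%%%n", k, v));
    }
}
```

Run against the dump above, MoveCostFunction immediately stands out as the dominant contributor, which is exactly the visibility the percentages are meant to provide.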
[jira] [Commented] (HBASE-13897) OOM may occur when Import imports a row with too many KeyValues
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617377#comment-14617377 ] Hudson commented on HBASE-13897: SUCCESS: Integrated in HBase-TRUNK #6635 (See [https://builds.apache.org/job/HBase-TRUNK/6635/]) HBASE-13897 OOM may occur when Import imports a row with too many KeyValues (Liu Junhong) (tedyu: rev 1162cbdf15acfc63b64835cb9e7ef29d5b9c6494) * hbase-server/src/main/java/org/apache/hadoop/hbase/mapreduce/Import.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617475#comment-14617475 ] Hudson commented on HBASE-13561: SUCCESS: Integrated in HBase-1.2-IT #42 (See [https://builds.apache.org/job/HBase-1.2-IT/42/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev 5a11c80aa0fe6e19f16abc9346467e4eef179526) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617470#comment-14617470 ] Hudson commented on HBASE-13561: SUCCESS: Integrated in HBase-1.3-IT #26 (See [https://builds.apache.org/job/HBase-1.3-IT/26/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev 4e84ac7924a4f09be05c57ec018c796b960d3760) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13897) OOM may occur when Import imports a row with too many KeyValues
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616847#comment-14616847 ] Hadoop QA commented on HBASE-13897: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743970/HBASE-13897-master-20150707.patch against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743970 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14692//console This message is automatically generated. OOM may occur when Import imports a row with too many KeyValues --- Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Liu Junhong Fix For: 0.98.14 Attachments: HBASE-13897-0.98.patch, HBASE-13897-master-20150629.patch, HBASE-13897-master-20150630.patch, HBASE-13897-master-20150707.patch, HBASE-13897-master.patch When importing a row with too many KeyValues (may have too many columns or versions),KeyValueReducer will incur OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13897) OOM may occur when Import imports a row with too many KeyValues
[ https://issues.apache.org/jira/browse/HBASE-13897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13897: --- Attachment: 13897-v2.txt OOM may occur when Import imports a row with too many KeyValues --- Key: HBASE-13897 URL: https://issues.apache.org/jira/browse/HBASE-13897 Project: HBase Issue Type: Bug Affects Versions: 0.98.13 Reporter: Liu Junhong Assignee: Liu Junhong Fix For: 0.98.14 Attachments: 13897-v2.txt, HBASE-13897-0.98.patch, HBASE-13897-master-20150629.patch, HBASE-13897-master-20150630.patch, HBASE-13897-master-20150707.patch, HBASE-13897-master.patch When importing a row with too many KeyValues (may have too many columns or versions),KeyValueReducer will incur OOM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14023) HBase Stores NULL Value from delimited File Input
[ https://issues.apache.org/jira/browse/HBASE-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616863#comment-14616863 ] Soumendra Kumar Mishra commented on HBASE-14023: Can you provide the mail ID where I need to post the Issue? HBase Srores NULL Value from delimited File Input - Key: HBASE-14023 URL: https://issues.apache.org/jira/browse/HBASE-14023 Project: HBase Issue Type: Bug Reporter: Soumendra Kumar Mishra Data: 101,SMITH,41775,,1000,,100,10 102,ALLEN,,77597,2000,,,20 103,WARD,,,2000,500,,30 Result: ROW COLUMN+CELL 101 column=info:dept, timestamp=1435992182400, value=10 101 column=info:ename, timestamp=1435992182400, value=SMITH 101 column=pay:bonus, timestamp=1435992182400, value=100 101 column=pay:comm, timestamp=1435992182400, value= 101 column=pay:sal, timestamp=1435992182400, value=1000 101 column=tel:mobile, timestamp=1435992182400, value= 101 column=tel:telephone, timestamp=1435992182400, value=41775 I am using PIG to Write Data into HBase. Same issue happened when Data Inserted from TextFile to HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
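The behavior quoted above — empty CSV fields ending up as cells with empty values (e.g. pay:comm for row 101) — follows from loaders that emit a cell for every column position, empty or not. A sketch of the skip-empty-fields variant, over plain strings (the column names and parsing are illustrative, not the PIG storage handler's actual code):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DelimitedRowParser {
    // Map each non-empty field to its column; skip empty fields entirely so
    // no empty-valued cell is ever written.
    public static Map<String, String> parse(String line, String[] columns) {
        Map<String, String> cells = new LinkedHashMap<>();
        String[] fields = line.split(",", -1); // -1 keeps trailing empty fields
        for (int i = 0; i < columns.length && i < fields.length; i++) {
            if (!fields[i].isEmpty()) {
                cells.put(columns[i], fields[i]);
            }
        }
        return cells;
    }

    public static void main(String[] args) {
        String[] cols = {"rowkey", "info:ename", "tel:telephone", "tel:mobile",
                         "pay:sal", "pay:comm", "pay:bonus", "info:dept"};
        // Row 101 from the report: two empty fields that should not become cells.
        System.out.println(parse("101,SMITH,41775,,1000,,100,10", cols));
    }
}
```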
[jira] [Updated] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack updated HBASE-13561: -- Fix Version/s: 1.0.3 0.98.14 Pushed the 0.98 and branch-1.0 patches. Thanks [~elserj] ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-0.98-v2.patch, HBASE-13561-branch-1.0-v2.patch, HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, it's not even checking to see if there things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617550#comment-14617550 ] Hudson commented on HBASE-13561: FAILURE: Integrated in HBase-1.1 #577 (See [https://builds.apache.org/job/HBase-1.1/577/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev 555c42a3f1c89196d9d9f0a70cd73fa7464fa42c) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, it's not even checking to see if there things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13971: --- Attachment: 13971-v1.txt Flushes stuck since 6 hours on a regionserver. -- Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on cluster with 32 region servers each with max heap size of 24GBs. Reporter: Abhilash Priority: Critical Attachments: 13971-v1.txt, jstack.1, jstack.2, jstack.3, jstack.4, jstack.5, rsDebugDump.txt, screenshot-1.png One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 Million writes on each regionserver till now but no writes happening on any regionserver from past 6 hours and their memstore size is zero(I dont know if this is related). But this particular regionserver has memstore size of 9GBs from past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
Thread 139 (MemStoreFlusher.1):
  State: WAITING
  Blocked count: 139711
  Waited count: 239212
  Waiting on java.util.concurrent.CountDownLatch$Sync@b9c094a
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
    org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
    org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
    org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
    org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
    org.apache.hadoop.hbase.regionserver.HRegion.flushcache(HRegion.java:1902)
    org.apache.hadoop.hbase.regionserver.HRegion.flush(HRegion.java:1828)
    org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:510)
    org.apache.hadoop.hbase.regionserver.MemStoreFlusher.flushRegion(MemStoreFlusher.java:471)
    org.apache.hadoop.hbase.regionserver.MemStoreFlusher.access$900(MemStoreFlusher.java:75)
    org.apache.hadoop.hbase.regionserver.MemStoreFlusher$FlushHandler.run(MemStoreFlusher.java:259)
    java.lang.Thread.run(Thread.java:745)
Thread 137 (MemStoreFlusher.0):
  State: WAITING
  Blocked count: 138931
  Waited count: 237448
  Waiting on java.util.concurrent.CountDownLatch$Sync@53f41f76
  Stack:
    sun.misc.Unsafe.park(Native Method)
    java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
    java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
    java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
    org.apache.hadoop.hbase.wal.WALKey.getSequenceId(WALKey.java:305)
    org.apache.hadoop.hbase.regionserver.HRegion.getNextSequenceId(HRegion.java:2422)
    org.apache.hadoop.hbase.regionserver.HRegion.internalPrepareFlushCache(HRegion.java:2168)
    org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2047)
    org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2011)
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13971: --- Attachment: (was: 13971-v1.txt) Flushes stuck since 6 hours on a regionserver. -- Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on cluster with 32 region servers each with max heap size of 24GBs. Reporter: Abhilash Priority: Critical Attachments: 13971-v1.txt, jstack.1, jstack.2, jstack.3, jstack.4, jstack.5, rsDebugDump.txt, screenshot-1.png One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 Million writes on each regionserver till now but no writes happening on any regionserver from past 6 hours and their memstore size is zero(I dont know if this is related). But this particular regionserver has memstore size of 9GBs from past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617627#comment-14617627 ] Hudson commented on HBASE-13561: SUCCESS: Integrated in HBase-1.3 #41 (See [https://builds.apache.org/job/HBase-1.3/41/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev 4e84ac7924a4f09be05c57ec018c796b960d3760) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13561-0.98-v2.patch, HBASE-13561-branch-1.0-v2.patch, HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, it's not even checking to see if there things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617641#comment-14617641 ] Hudson commented on HBASE-13561: FAILURE: Integrated in HBase-TRUNK #6636 (See [https://builds.apache.org/job/HBase-TRUNK/6636/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev f5ad736282c8c9c27b14131919d60b72834ec9e4) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13561-0.98-v2.patch, HBASE-13561-branch-1.0-v2.patch, HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, it's not even checking to see if there things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13971) Flushes stuck since 6 hours on a regionserver.
[ https://issues.apache.org/jira/browse/HBASE-13971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ted Yu updated HBASE-13971: --- Attachment: 13971-v1.txt First attempt. Set upper limit for the duration for which HRegion waits for sequence number to be assigned. Flushes stuck since 6 hours on a regionserver. -- Key: HBASE-13971 URL: https://issues.apache.org/jira/browse/HBASE-13971 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 1.3.0 Environment: Caused while running IntegrationTestLoadAndVerify for 20 M rows on cluster with 32 region servers each with max heap size of 24GBs. Reporter: Abhilash Priority: Critical Attachments: 13971-v1.txt, jstack.1, jstack.2, jstack.3, jstack.4, jstack.5, rsDebugDump.txt, screenshot-1.png One region server stuck while flushing(possible deadlock). Its trying to flush two regions since last 6 hours (see the screenshot). Caused while running IntegrationTestLoadAndVerify for 20 M rows with 600 mapper jobs and 100 back references. ~37 Million writes on each regionserver till now but no writes happening on any regionserver from past 6 hours and their memstore size is zero(I dont know if this is related). But this particular regionserver has memstore size of 9GBs from past 6 hours. Relevant snaps from debug dump: Tasks: === Task: Flushing IntegrationTestLoadAndVerify,R\x9B\x1B\xBF\xAE\x08\xD1\xA2,1435179555993.8e2d075f94ce7699f416ec4ced9873cd. Status: RUNNING:Preparing to flush by snapshotting stores in 8e2d075f94ce7699f416ec4ced9873cd Running for 22034s Task: Flushing IntegrationTestLoadAndVerify,\x93\xA385\x81Z\x11\xE6,1435179555993.9f8d0e01a40405b835bf6e5a22a86390. Status: RUNNING:Preparing to flush by snapshotting stores in 9f8d0e01a40405b835bf6e5a22a86390 Running for 22033s Executors: === ... 
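The fix description above ("set upper limit for the duration for which HRegion waits for sequence number to be assigned") amounts to replacing an unbounded CountDownLatch.await() with the timed overload, so a stuck WAL can no longer park the flusher threads forever. A self-contained sketch of that change (the method name and timeout value are illustrative; the real wait happens in WALKey.getSequenceId):

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class BoundedSequenceIdWait {
    // Wait for the latch to open, but give up after maxWaitMs instead of
    // blocking forever — the hang shown in the jstacks in this issue.
    public static void awaitSequenceId(CountDownLatch assignedLatch, long maxWaitMs)
            throws IOException, InterruptedException {
        if (!assignedLatch.await(maxWaitMs, TimeUnit.MILLISECONDS)) {
            throw new IOException("Sequence id not assigned within " + maxWaitMs + "ms");
        }
    }

    public static void main(String[] args) throws Exception {
        CountDownLatch assigned = new CountDownLatch(1);
        assigned.countDown(); // simulate the WAL assigning the sequence id
        awaitSequenceId(assigned, 100L);
        System.out.println("sequence id assigned in time");
    }
}
```

Surfacing the timeout as an IOException lets the flush fail and be retried rather than wedging the region server.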
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617655#comment-14617655 ] Hudson commented on HBASE-13561: FAILURE: Integrated in HBase-1.2 #57 (See [https://builds.apache.org/job/HBase-1.2/57/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev 5a11c80aa0fe6e19f16abc9346467e4eef179526) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-0.98-v2.patch, HBASE-13561-branch-1.0-v2.patch, HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, it's not even checking to see if there things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-6028) Implement a cancel for in-progress compactions
[ https://issues.apache.org/jira/browse/HBASE-6028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617698#comment-14617698 ] Ishan Chhabra commented on HBASE-6028: -- [~esteban], are you working on this? Implement a cancel for in-progress compactions -- Key: HBASE-6028 URL: https://issues.apache.org/jira/browse/HBASE-6028 Project: HBase Issue Type: Bug Components: regionserver Reporter: Derek Wollenstein Assignee: Esteban Gutierrez Priority: Minor Labels: beginner Depending on current server load, it can be extremely expensive to run periodic minor / major compactions. It would be helpful to have a feature where a user could use the shell or a client tool to explicitly cancel an in-progress compactions. This would allow a system to recover when too many regions became eligible for compactions at once -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617701#comment-14617701 ] Gaurav Bhardwaj commented on HBASE-13867: - Is there any restriction that the length of a line should be less than 100 characters? By line I mean number of characters (including white spaces) between two carriage return. Note: http://hbase.apache.org/book.html, has many occurrences where line length is greater than 100. [~vrodionov], [~gliptak] please suggest. Add endpoint coprocessor guide to HBase book Key: HBASE-13867 URL: https://issues.apache.org/jira/browse/HBASE-13867 Project: HBase Issue Type: Task Components: Coprocessors, documentation Reporter: Vladimir Rodionov Assignee: Gaurav Bhardwaj Attachments: HBASE-13867.1.patch Endpoint coprocessors are very poorly documented. Coprocessor section of HBase book must be updated either with its own endpoint coprocessors HOW-TO guide or, at least, with the link(s) to some other guides. There is good description here: http://www.3pillarglobal.com/insights/hbase-coprocessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Elser updated HBASE-13561: --- Attachment: HBASE-13561-branch-1.0-v2.patch HBASE-13561-0.98-v2.patch v2 patches for 0.98 and branch-1.0 ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 1.1.2, 1.3.0, 1.2.1 Attachments: HBASE-13561-0.98-v2.patch, HBASE-13561-branch-1.0-v2.patch, HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: The {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, it's not even checking to see if there things like UNDEFINED records found. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whomever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (HBASE-14023) HBase Stores NULL Value from delimited File Input
[ https://issues.apache.org/jira/browse/HBASE-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar reassigned HBASE-14023: Assignee: Pankaj Kumar HBase Srores NULL Value from delimited File Input - Key: HBASE-14023 URL: https://issues.apache.org/jira/browse/HBASE-14023 Project: HBase Issue Type: Bug Reporter: Soumendra Kumar Mishra Assignee: Pankaj Kumar Data: 101,SMITH,41775,,1000,,100,10 102,ALLEN,,77597,2000,,,20 103,WARD,,,2000,500,,30 Result: ROW COLUMN+CELL 101 column=info:dept, timestamp=1435992182400, value=10 101 column=info:ename, timestamp=1435992182400, value=SMITH 101 column=pay:bonus, timestamp=1435992182400, value=100 101 column=pay:comm, timestamp=1435992182400, value= 101 column=pay:sal, timestamp=1435992182400, value=1000 101 column=tel:mobile, timestamp=1435992182400, value= 101 column=tel:telephone, timestamp=1435992182400, value=41775 I am using PIG to Write Data into HBase. Same issue happened when Data Inserted from TextFile to HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616633#comment-14616633 ] Hadoop QA commented on HBASE-12295: --- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12743926/HBASE-12295_10.patch against master branch at commit 7acb061e63614ad957da654f920f54ac7a02edd6. ATTACHMENT ID: 12743926 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 41 new or modified tests. {color:green}+1 hadoop versions{color}. The patch compiles with all supported hadoop versions (2.4.0 2.4.1 2.5.0 2.5.1 2.5.2 2.6.0 2.7.0) {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 protoc{color}. The applied patch does not increase the total number of protoc compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:red}-1 checkstyle{color}. The applied patch generated 1908 checkstyle errors (more than the master's current 1898 errors). {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:red}-1 lineLengths{color}. 
The patch introduces the following lines longer than 100:
+ ret = new SharedMemorySizeCachedNoTagsKeyValue(blockBuffer.array(), blockBuffer.arrayOffset()
+// those cells are referring to a shared memory area which if evicted by the BucketCache would lead
+// readers using this block are aware of this fact and do the necessary action to prevent eviction
+// An RpcCallBack that creates a list of scanners that needs to perform callBack operation on completion of multiGets
+return Result.create(results, get.isCheckExistenceOnly() ? !results.isEmpty() : null, stale);
+ // HBaseAdmin only waits for regions to appear in hbase:meta we should wait until they are assigned
+ public void testGetsWithMultiColumnsAndExplicitTracker() throws IOException, InterruptedException {
+private void slowdownCode(final ObserverContextRegionCoprocessorEnvironment e, boolean isGet) {
+// call return twice because for the isCache cased the counter would have got incremented twice
{color:green}+1 site{color}. The mvn post-site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/14690//testReport/ Release Findbugs (version 2.0.3) warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/14690//artifact/patchprocess/newFindbugsWarnings.html Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/14690//artifact/patchprocess/checkstyle-aggregate.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/14690//console This message is automatically generated. 
Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, HBASE-12295_10.patch, HBASE-12295_2.patch, HBASE-12295_4.patch, HBASE-12295_4.pdf, HBASE-12295_5.pdf, HBASE-12295_9.patch, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks does not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
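The eviction problem described above is at heart a reference-counting problem: a cached block must not be reclaimed while a scanner is still reading from its backing buffer. A minimal sketch of that invariant (names are invented for illustration; the actual changes live in the HBASE-12295 patches against the BucketCache):

```java
import java.util.concurrent.atomic.AtomicInteger;

// A cached block that can only be evicted once no reader holds a reference.
public class RefCountedBlock {
    private final AtomicInteger refCount = new AtomicInteger(0);
    private volatile boolean evicted = false;

    /** A reader pins the block before reading from its buffer. */
    public boolean retain() {
        if (evicted) {
            return false; // too late; the reader must re-fetch the block
        }
        refCount.incrementAndGet();
        return true;
    }

    /** The reader releases its pin when it is done with the buffer. */
    public void release() {
        refCount.decrementAndGet();
    }

    /** Eviction succeeds only when no read is in progress. */
    public boolean tryEvict() {
        if (refCount.get() == 0) {
            evicted = true;
            return true;
        }
        return false;
    }
}
```

A production version has to close the race between retain() and tryEvict() — typically with a CAS over a single combined state word — the sketch only shows the accounting that makes eviction wait out in-flight readers.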
[jira] [Assigned] (HBASE-2236) Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)
[ https://issues.apache.org/jira/browse/HBASE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen reassigned HBASE-2236: Assignee: Heng Chen Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053) -- Key: HBASE-2236 URL: https://issues.apache.org/jira/browse/HBASE-2236 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: stack Assignee: Heng Chen Priority: Critical Labels: moved_from_0_20_5 So hbase-2053 is not aggressive enough. WALs can still overwhelm the upper limit on log count. While the code added by HBASE-2053, when done, will ensure we let go of the oldest WAL, to do it, we might have to flush many regions. E.g: {code} 2010-02-15 14:20:29,351 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=45, maxlogs=32; forcing flush of 5 regions(s): test1,193717,1266095474624, test1,194375,1266108228663, test1,195690,1266095539377, test1,196348,1266095539377, test1,197939,1266069173999 {code} This takes time. If we are taking on edits at a furious rate, we might have rolled the log again in the meantime, maybe more than once. Also, log rolls happen inline with a put/delete as soon as it hits the 64MB (default) boundary, whereas the necessary flushing is done in the background by a single thread, and the memstore can overrun the (default) 64MB size. Flushes needed to release logs will be mixed in with natural flushes as memstores fill. Flushes may take longer than the writing of an HLog because they can be larger. So, on an RS that is struggling, the tendency would seem to be a slight rise in WALs. Only if the RS gets a breather will the flusher catch up. If HBASE-2087 happens, then the count of WALs gets a boost. 
Ideas to fix this for good would be:
+ Priority queue for queuing up flushes, with those that are queued to free up WALs having priority
+ Improve the HBASE-2053 code so that it frees more than just the last WAL, maybe even queuing flushes so we clear all WALs such that we are back under the maximum WALs threshold again.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
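The first idea above can be sketched with a plain JDK priority queue. This is an illustrative sketch only, not HBase's flush queue; the `FlushQueueSketch` and `FlushRequest` names and the `freesWal` flag are hypothetical stand-ins for whatever the real flush request would carry.

```java
import java.util.PriorityQueue;

// Hypothetical sketch: flush requests that are queued to free up old WALs
// sort ahead of ordinary memstore-pressure flushes.
public class FlushQueueSketch {
    public static final class FlushRequest implements Comparable<FlushRequest> {
        public final String region;
        public final boolean freesWal; // true if this flush releases an old WAL

        public FlushRequest(String region, boolean freesWal) {
            this.region = region;
            this.freesWal = freesWal;
        }

        @Override
        public int compareTo(FlushRequest other) {
            // WAL-freeing flushes compare as "smaller" so the queue serves them first.
            return Boolean.compare(other.freesWal, this.freesWal);
        }
    }

    public static void main(String[] args) {
        PriorityQueue<FlushRequest> queue = new PriorityQueue<>();
        queue.add(new FlushRequest("test1,193717", false));
        queue.add(new FlushRequest("test1,194375", true));
        queue.add(new FlushRequest("test1,195690", false));
        // The WAL-freeing flush is polled before the two ordinary ones.
        System.out.println(queue.poll().region);
    }
}
```

A real implementation would also need a tiebreaker (e.g. request time) so same-priority flushes stay FIFO.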
[jira] [Created] (HBASE-14029) getting started for standalone still references hadoop-version-specific binary artifacts
Sean Busbey created HBASE-14029: --- Summary: getting started for standalone still references hadoop-version-specific binary artifacts Key: HBASE-14029 URL: https://issues.apache.org/jira/browse/HBASE-14029 Project: HBase Issue Type: Bug Components: documentation Affects Versions: 1.0.0 Reporter: Sean Busbey As of HBase 1.0 we no longer have binary artifacts that are tied to a particular hadoop release. The current section of the ref guide for getting started with standalone mode still refers to them: {quote} Choose a download site from this list of Apache Download Mirrors. Click on the suggested top link. This will take you to a mirror of HBase Releases. Click on the folder named stable and then download the binary file that ends in .tar.gz to your local filesystem. Be sure to choose the version that corresponds with the version of Hadoop you are likely to use later. In most cases, you should choose the file for Hadoop 2, which will be called something like hbase-0.98.3-hadoop2-bin.tar.gz. Do not download the file ending in src.tar.gz for now. {quote} Either remove the reference or turn it into a note call-out for versions 0.98 and earlier. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-14000) Region server failed to report Master and stuck in reportForDuty retry loop
[ https://issues.apache.org/jira/browse/HBASE-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616483#comment-14616483 ] Pankaj Kumar commented on HBASE-14000: -- [~tedyu], I tried, but I am not able to simulate this scenario in a test case. Region server failed to report Master and stuck in reportForDuty retry loop --- Key: HBASE-14000 URL: https://issues.apache.org/jira/browse/HBASE-14000 Project: HBase Issue Type: Bug Reporter: Pankaj Kumar Assignee: Pankaj Kumar Attachments: HBASE-14000.patch In an HA cluster, the region server got stuck in the reportForDuty retry loop when the active master was restarting and a master switch happened before the region server reported successfully. The root cause is the same as HBASE-13317, but the region server tried to connect to the master while it was starting, so the rssStub reset didn't happen, as {code} if (ioe instanceof ServerNotRunningYetException) { LOG.debug("Master is not running yet"); } {code} While the master was starting, a master switch happened, so the RS always tried to connect to the standby master. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-2236) Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053)
[ https://issues.apache.org/jira/browse/HBASE-2236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Heng Chen updated HBASE-2236: - Assignee: (was: Heng Chen) Upper bound of outstanding WALs can be overrun; take 2 (take 1 was hbase-2053) -- Key: HBASE-2236 URL: https://issues.apache.org/jira/browse/HBASE-2236 Project: HBase Issue Type: Bug Components: regionserver, wal Reporter: stack Priority: Critical Labels: moved_from_0_20_5 So hbase-2053 is not aggressive enough. WALs can still overwhelm the upper limit on log count. While the code added by HBASE-2053, when done, will ensure we let go of the oldest WAL, to do it, we might have to flush many regions. E.g: {code} 2010-02-15 14:20:29,351 INFO org.apache.hadoop.hbase.regionserver.HLog: Too many hlogs: logs=45, maxlogs=32; forcing flush of 5 regions(s): test1,193717,1266095474624, test1,194375,1266108228663, test1,195690,1266095539377, test1,196348,1266095539377, test1,197939,1266069173999 {code} This takes time. If we are taking on edits at a furious rate, we might have rolled the log again in the meantime, maybe more than once. Also, log rolls happen inline with a put/delete as soon as it hits the 64MB (default) boundary, whereas the necessary flushing is done in the background by a single thread, and the memstore can overrun the (default) 64MB size. Flushes needed to release logs will be mixed in with natural flushes as memstores fill. Flushes may take longer than the writing of an HLog because they can be larger. So, on an RS that is struggling, the tendency would seem to be a slight rise in WALs. Only if the RS gets a breather will the flusher catch up. If HBASE-2087 happens, then the count of WALs gets a boost. 
Ideas to fix this for good would be:
+ Priority queue for queuing up flushes, with those that are queued to free up WALs having priority
+ Improve the HBASE-2053 code so that it frees more than just the last WAL, maybe even queuing flushes so we clear all WALs such that we are back under the maximum WALs threshold again.
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13988) Add exception handler for lease thread
[ https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616541#comment-14616541 ] Liu Shaohui commented on HBASE-13988: - The addendum for patch v001 just removes the out-of-date comments about the lease thread. I will commit it tomorrow if there is no objection. Add exception handler for lease thread -- Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0, 0.98.15 Attachments: HBASE-13988-addendum.diff, HBASE-13988-v001.diff, HBASE-13988-v002.diff In a prod cluster, a region server exited because some important threads were not alive. After excluding other threads from the log, we suspected the lease thread was the root cause. So we need to add an exception handler to the lease thread to debug why it exits in the future. {quote} 2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ... 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting 2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
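The idea of the fix can be sketched with the JDK's standard uncaught-exception hook. This is a minimal illustration assuming nothing about HBase internals; the `LeaseThreadSketch` class, thread name, and log list are hypothetical, not the actual patch.

```java
import java.util.List;

// Illustrative only: attach an UncaughtExceptionHandler so a dying
// lease-like thread records why it died instead of disappearing silently,
// leaving only "One or more threads are no longer alive" behind.
public class LeaseThreadSketch {
    public static Thread newLeaseThread(Runnable leaseChecker, List<String> log) {
        Thread t = new Thread(leaseChecker, "regionserver.leaseChecker");
        t.setDaemon(true);
        // The handler fires when the runnable exits via an uncaught throwable,
        // capturing the exception that killed the thread.
        t.setUncaughtExceptionHandler((thread, e) ->
            log.add("Thread " + thread.getName() + " died: " + e));
        return t;
    }
}
```

With this in place, the server log would show the fatal exception next to the thread-death message, which is exactly the debugging information the report says was missing.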
[jira] [Updated] (HBASE-13988) Add exception handler for lease thread
[ https://issues.apache.org/jira/browse/HBASE-13988?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Liu Shaohui updated HBASE-13988: Attachment: HBASE-13988-addendum.diff addendum for patch v001 Add exception handler for lease thread -- Key: HBASE-13988 URL: https://issues.apache.org/jira/browse/HBASE-13988 Project: HBase Issue Type: Bug Affects Versions: 2.0.0 Reporter: Liu Shaohui Assignee: Liu Shaohui Priority: Minor Fix For: 2.0.0, 1.0.2, 1.2.0, 1.1.2, 1.3.0, 0.98.15 Attachments: HBASE-13988-addendum.diff, HBASE-13988-v001.diff, HBASE-13988-v002.diff In a prod cluster, a region server exited because some important threads were not alive. After excluding other threads from the log, we suspected the lease thread was the root cause. So we need to add an exception handler to the lease thread to debug why it exits in the future. {quote} 2015-06-29,12:46:09,222 INFO org.apache.hadoop.hbase.regionserver.HRegionServer: STOPPED: One or more threads are no longer alive -- stop 2015-06-29,12:46:09,223 INFO org.apache.hadoop.ipc.HBaseServer: Stopping server on 21600 ... 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.LogRoller: LogRoller exiting. 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Thread-37 exiting 2015-06-29,12:46:09,330 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$CompactionChecker: regionserver21600.compactionChecker exiting 2015-06-29,12:46:12,403 INFO org.apache.hadoop.hbase.regionserver.HRegionServer$PeriodicMemstoreFlusher: regionserver21600.periodicFlusher exiting {quote} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12295: --- Status: Open (was: Patch Available) Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, HBASE-12295_2.patch, HBASE-12295_4.patch, HBASE-12295_4.pdf, HBASE-12295_5.pdf, HBASE-12295_9.patch, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks do not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12295: --- Status: Patch Available (was: Open) Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, HBASE-12295_10.patch, HBASE-12295_2.patch, HBASE-12295_4.patch, HBASE-12295_4.pdf, HBASE-12295_5.pdf, HBASE-12295_9.patch, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks do not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12295) Prevent block eviction under us if reads are in progress from the BBs
[ https://issues.apache.org/jira/browse/HBASE-12295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ramkrishna.s.vasudevan updated HBASE-12295: --- Attachment: HBASE-12295_10.patch Updated patch based on RB comments. Prevent block eviction under us if reads are in progress from the BBs - Key: HBASE-12295 URL: https://issues.apache.org/jira/browse/HBASE-12295 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: ramkrishna.s.vasudevan Assignee: ramkrishna.s.vasudevan Fix For: 2.0.0 Attachments: HBASE-12295.pdf, HBASE-12295_1.patch, HBASE-12295_1.pdf, HBASE-12295_10.patch, HBASE-12295_2.patch, HBASE-12295_4.patch, HBASE-12295_4.pdf, HBASE-12295_5.pdf, HBASE-12295_9.patch, HBASE-12295_trunk.patch While we try to serve the reads from the BBs directly from the block cache, we need to ensure that the blocks do not get evicted under us while reading. This JIRA is to discuss and implement a strategy for the same. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617754#comment-14617754 ] Hudson commented on HBASE-13561: FAILURE: Integrated in HBase-1.0 #989 (See [https://builds.apache.org/job/HBase-1.0/989/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev 4fcc3103bda026a9b89414191896a6042af6e01d) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-0.98-v2.patch, HBASE-13561-branch-1.0-v2.patch, HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: the {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whoever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
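The kind of post-job counter check being asked for can be sketched as below. The counter names mirror ITBLL's REFERENCED/UNREFERENCED/UNDEFINED counters, but this is a hedged illustration over a plain map, not the actual MapReduce `Counters` API or the committed patch.

```java
import java.util.Map;

// Illustrative sketch: after the Verify job completes, fail fast if any
// dangling (UNDEFINED) or orphaned (UNREFERENCED) links were counted,
// rather than leaving it to visual inspection of the job output.
public class VerifyCountersSketch {
    public static boolean verify(Map<String, Long> counters, long expectedReferenced) {
        long referenced = counters.getOrDefault("REFERENCED", 0L);
        long undefined = counters.getOrDefault("UNDEFINED", 0L);
        long unreferenced = counters.getOrDefault("UNREFERENCED", 0L);
        // Any dangling or orphaned link means the big linked list was corrupted.
        return undefined == 0 && unreferenced == 0 && referenced == expectedReferenced;
    }
}
```

When the expected KV count is unknown (as the report notes for `Verify` alone), the same check can still be run with only the zero-counter conditions.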
[jira] [Commented] (HBASE-13561) ITBLL.Verify doesn't actually evaluate counters after job completes
[ https://issues.apache.org/jira/browse/HBASE-13561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617803#comment-14617803 ] Hudson commented on HBASE-13561: FAILURE: Integrated in HBase-0.98-on-Hadoop-1.1 #1004 (See [https://builds.apache.org/job/HBase-0.98-on-Hadoop-1.1/1004/]) HBASE-13561 ITBLL.Verify doesn't actually evaluate counters after job completes (Josh Elser) (stack: rev 2f6ef83adc203d6979e11f9527efe242d59ae04d) * hbase-it/src/test/java/org/apache/hadoop/hbase/test/IntegrationTestBigLinkedList.java ITBLL.Verify doesn't actually evaluate counters after job completes --- Key: HBASE-13561 URL: https://issues.apache.org/jira/browse/HBASE-13561 Project: HBase Issue Type: Bug Components: integration tests Affects Versions: 1.0.0, 2.0.0, 1.1.0, 0.98.12 Reporter: Josh Elser Assignee: Josh Elser Fix For: 2.0.0, 0.98.14, 1.1.2, 1.3.0, 1.2.1, 1.0.3 Attachments: HBASE-13561-0.98-v2.patch, HBASE-13561-branch-1.0-v2.patch, HBASE-13561-v1.patch, HBASE-13561-v2.patch Was digging through ITBLL and noticed this oddity: the {{Verify}} Tool doesn't actually call {{Verify#verify(long)}} like the {{Loop}} Tool does. Granted, it doesn't know the total number of KVs that were written given the current arguments, but it's not even checking to see if there are things like UNDEFINED records. It seems to me that {{Verify}} should really be doing *some* checking on the counters like {{Loop}} does and not just leaving it up to the visual inspection of whoever launched the task. Am I missing something? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13867) Add endpoint coprocessor guide to HBase book
[ https://issues.apache.org/jira/browse/HBASE-13867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617832#comment-14617832 ] Gabor Liptak commented on HBASE-13867: -- [~ndimiduk] Would the 100 character length limit apply to adoc files too? Thanks Add endpoint coprocessor guide to HBase book Key: HBASE-13867 URL: https://issues.apache.org/jira/browse/HBASE-13867 Project: HBase Issue Type: Task Components: Coprocessors, documentation Reporter: Vladimir Rodionov Assignee: Gaurav Bhardwaj Attachments: HBASE-13867.1.patch Endpoint coprocessors are very poorly documented. Coprocessor section of HBase book must be updated either with its own endpoint coprocessors HOW-TO guide or, at least, with the link(s) to some other guides. There is good description here: http://www.3pillarglobal.com/insights/hbase-coprocessors -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13832) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low
[ https://issues.apache.org/jira/browse/HBASE-13832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13832: Priority: Blocker (was: Critical) Procedure V2: master fail to start due to WALProcedureStore sync failures when HDFS data nodes count is low --- Key: HBASE-13832 URL: https://issues.apache.org/jira/browse/HBASE-13832 Project: HBase Issue Type: Sub-task Components: master, proc-v2 Affects Versions: 2.0.0, 1.1.0, 1.2.0 Reporter: Stephen Yuan Jiang Assignee: Matteo Bertozzi Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.1.2, 1.3.0 Attachments: HBASE-13832-v0.patch, HBASE-13832-v1.patch, HBASE-13832-v2.patch, HBASE-13832-v4.patch, HBASE-13832-v5.patch, HDFSPipeline.java, hbase-13832-test-hang.patch, hbase-13832-v3.patch When the data node count is 3, we got a failure in WALProcedureStore#syncLoop() during master start. The failure prevents the master from starting. {noformat} 2015-05-29 13:27:16,625 ERROR [WALProcedureStoreSyncThread] wal.WALProcedureStore: Sync slot failed, abort. java.io.IOException: Failed to replace a bad datanode on the existing pipeline due to no more good datanodes being available to try. (Nodes: current=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983-490ece56c772,DISK]], original=[DatanodeInfoWithStorage[10.333.444.555:50010,DS-3ced-93f4-47b6-9c23-1426f7a6acdc,DISK], DatanodeInfoWithStorage[10.222.666.777:50010,DS-f9c983b4-1f10-4d5e-8983- 490ece56c772,DISK]]). The current failed datanode replacement policy is DEFAULT, and a client may configure this via 'dfs.client.block.write.replace-datanode-on-failure.policy' in its configuration. 
at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.findNewDatanode(DFSOutputStream.java:951) {noformat} One proposal is to implement some similar logic as FSHLog: if IOException is thrown during syncLoop in WALProcedureStore#start(), instead of immediate abort, we could try to roll the log and see whether this resolve the issue; if the new log cannot be created or more exception from rolling the log, we then abort. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
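The proposed recovery logic can be sketched as follows. This is a minimal illustration of the "roll before abort" idea under stated assumptions: the `Store` interface and its `sync`/`rollWriter` methods are hypothetical stand-ins, not the actual WALProcedureStore or FSHLog API.

```java
import java.io.IOException;

// Sketch of the proposal: on a sync failure, try rolling to a new log
// before giving up, and only abort if rolling does not help.
public class SyncRecoverySketch {
    public interface Store {
        void sync() throws IOException;
        boolean rollWriter(); // returns false if a new log cannot be created
    }

    /** Returns true if the sync eventually succeeded, false if we must abort. */
    public static boolean syncWithRoll(Store store, int maxRolls) {
        for (int attempt = 0; ; attempt++) {
            try {
                store.sync();
                return true;
            } catch (IOException e) {
                // Rolling replaces the bad HDFS pipeline with a fresh one; if
                // we have exhausted rolls or the roll itself fails, abort.
                if (attempt >= maxRolls || !store.rollWriter()) {
                    return false;
                }
            }
        }
    }
}
```

This mirrors how FSHLog recovers from a bad datanode pipeline by rolling, as the description suggests, while still bounding the retries so a truly dead filesystem leads to an abort.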
[jira] [Updated] (HBASE-13415) Procedure V2 - Use nonces for double submits from client
[ https://issues.apache.org/jira/browse/HBASE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Busbey updated HBASE-13415: Priority: Blocker (was: Major) Procedure V2 - Use nonces for double submits from client Key: HBASE-13415 URL: https://issues.apache.org/jira/browse/HBASE-13415 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 The client can submit a procedure, but before getting the procId back, the master might fail. In this case, the client request will fail and the client will re-submit the request. With a 1.1 client, or if there is no contention for the table lock, the time window is pretty small, but it still might happen. If the proc was accepted and stored in the procedure store, a re-submit from the client will add another procedure, which will execute after the first one. The first one will likely succeed, and the second one will fail (for example, in the case of create table, the second one will throw TableExistsException). One idea is to use client-generated nonces (that we already have) to guard against these cases. The client will submit the request with the nonce, and the nonce will be saved together with the procedure in the store. In case of a double submit, the nonce-cache is checked and the procId of the original request is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
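The nonce-cache idea described above can be sketched with a concurrent map. This is an illustrative sketch only; `NonceSubmitSketch` and its fields are hypothetical and ignore the nonce-group and persistence aspects of the real master code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Minimal sketch of nonce-based dedup for procedure submission: a re-submit
// with the same nonce returns the original procId instead of queuing a
// second procedure that would later fail (e.g. with TableExistsException).
public class NonceSubmitSketch {
    private final Map<Long, Long> nonceToProcId = new ConcurrentHashMap<>();
    private final AtomicLong nextProcId = new AtomicLong(1);

    public long submit(long nonce) {
        // computeIfAbsent makes the check-and-register step atomic, so two
        // concurrent retries with the same nonce cannot both allocate a procId.
        return nonceToProcId.computeIfAbsent(nonce, n -> nextProcId.getAndIncrement());
    }
}
```

In the real design the nonce would also be persisted with the procedure so the cache can be rebuilt after a master failover, which is exactly the window the issue describes.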
[jira] [Updated] (HBASE-14023) HBase Stores NULL Value from delimited File Input
[ https://issues.apache.org/jira/browse/HBASE-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pankaj Kumar updated HBASE-14023: - Assignee: (was: Pankaj Kumar) HBase Stores NULL Value from delimited File Input - Key: HBASE-14023 URL: https://issues.apache.org/jira/browse/HBASE-14023 Project: HBase Issue Type: Bug Reporter: Soumendra Kumar Mishra
Data:
101,SMITH,41775,,1000,,100,10
102,ALLEN,,77597,2000,,,20
103,WARD,,,2000,500,,30
Result:
ROW COLUMN+CELL
101 column=info:dept, timestamp=1435992182400, value=10
101 column=info:ename, timestamp=1435992182400, value=SMITH
101 column=pay:bonus, timestamp=1435992182400, value=100
101 column=pay:comm, timestamp=1435992182400, value=
101 column=pay:sal, timestamp=1435992182400, value=1000
101 column=tel:mobile, timestamp=1435992182400, value=
101 column=tel:telephone, timestamp=1435992182400, value=41775
I am using Pig to write data into HBase. The same issue happened when data was inserted from a text file into HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-13387) Add ByteBufferedCell an extension to Cell
[ https://issues.apache.org/jira/browse/HBASE-13387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Anoop Sam John updated HBASE-13387: --- Attachment: HBASE-13387_v2.patch Add ByteBufferedCell an extension to Cell - Key: HBASE-13387 URL: https://issues.apache.org/jira/browse/HBASE-13387 Project: HBase Issue Type: Sub-task Components: regionserver, Scanners Reporter: Anoop Sam John Assignee: Anoop Sam John Fix For: 2.0.0 Attachments: ByteBufferedCell.docx, HBASE-13387_v1.patch, HBASE-13387_v2.patch, WIP_HBASE-13387_V2.patch, WIP_ServerCell.patch, benchmark.zip This came up in the discussion about the parent JIRA, and recently Stack added it as a comment on the E2E patch on the parent JIRA. The idea is to add a new interface, 'ByteBufferedCell', in which we can add new buffer-based getter APIs and getters for the positions of components in the BB. We will keep this interface @InterfaceAudience.Private. When the Cell is backed by a DBB, we can create an object implementing this new interface. The comparators have to be aware of this new Cell extension and have to use the BB-based APIs rather than getXXXArray(). Also give util APIs in CellUtil to abstract the checks for the new Cell type (like matchingXXX APIs, getValueAsType APIs, etc.). -- This message was sent by Atlassian JIRA (v6.3.4#6332)
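The shape of the proposal can be sketched as below. The interface and method names approximate the idea described (a Cell extension exposing ByteBuffer-backed getters plus positions); they are illustrative assumptions, not the final HBase API.

```java
import java.nio.ByteBuffer;

// Illustrative sketch: a buffer-backed cell exposes its row as a ByteBuffer
// plus a position and length, so comparators can read it in place instead of
// copying to a byte[] via getRowArray().
public class ByteBufferedCellSketch {
    public interface BufferBackedCell {
        ByteBuffer getRowByteBuffer();
        int getRowPosition();
        int getRowLength();
    }

    /** Compare a cell's row to a byte[] lexicographically without materializing the row. */
    public static int compareRow(BufferBackedCell cell, byte[] row) {
        ByteBuffer bb = cell.getRowByteBuffer();
        int pos = cell.getRowPosition();
        int len = cell.getRowLength();
        int common = Math.min(len, row.length);
        for (int i = 0; i < common; i++) {
            // Compare as unsigned bytes, matching byte[]-based comparators.
            int diff = (bb.get(pos + i) & 0xff) - (row[i] & 0xff);
            if (diff != 0) return diff;
        }
        return len - row.length;
    }
}
```

This is the kind of `matchingXXX`/compare helper that, per the description, would live in CellUtil behind a check for the new cell type.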
[jira] [Resolved] (HBASE-14023) HBase Stores NULL Value from delimited File Input
[ https://issues.apache.org/jira/browse/HBASE-14023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] stack resolved HBASE-14023. --- Resolution: Invalid Please ask the question on the mailing list rather than filing an issue. HBase Stores NULL Value from delimited File Input - Key: HBASE-14023 URL: https://issues.apache.org/jira/browse/HBASE-14023 Project: HBase Issue Type: Bug Reporter: Soumendra Kumar Mishra
Data:
101,SMITH,41775,,1000,,100,10
102,ALLEN,,77597,2000,,,20
103,WARD,,,2000,500,,30
Result:
ROW COLUMN+CELL
101 column=info:dept, timestamp=1435992182400, value=10
101 column=info:ename, timestamp=1435992182400, value=SMITH
101 column=pay:bonus, timestamp=1435992182400, value=100
101 column=pay:comm, timestamp=1435992182400, value=
101 column=pay:sal, timestamp=1435992182400, value=1000
101 column=tel:mobile, timestamp=1435992182400, value=
101 column=tel:telephone, timestamp=1435992182400, value=41775
I am using Pig to write data into HBase. The same issue happened when data was inserted from a text file into HBase. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13415) Procedure V2 - Use nonces for double submits from client
[ https://issues.apache.org/jira/browse/HBASE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14616736#comment-14616736 ] Sean Busbey commented on HBASE-13415: - bumping priority to Blocker for 1.2 per request from [~enis]. How's this coming along? Procedure V2 - Use nonces for double submits from client Key: HBASE-13415 URL: https://issues.apache.org/jira/browse/HBASE-13415 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 The client can submit a procedure, but before getting the procId back, the master might fail. In this case, the client request will fail and the client will re-submit the request. With a 1.1 client, or if there is no contention for the table lock, the time window is pretty small, but it still might happen. If the proc was accepted and stored in the procedure store, a re-submit from the client will add another procedure, which will execute after the first one. The first one will likely succeed, and the second one will fail (for example, in the case of create table, the second one will throw TableExistsException). One idea is to use client-generated nonces (that we already have) to guard against these cases. The client will submit the request with the nonce, and the nonce will be saved together with the procedure in the store. In case of a double submit, the nonce-cache is checked and the procId of the original request is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12015) Not cleaning Mob data when Mob CF is removed from table
[ https://issues.apache.org/jira/browse/HBASE-12015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617966#comment-14617966 ] Anoop Sam John commented on HBASE-12015: [~jingchengdu] Any chance for a patch? This is the only pending issue in MOB branch. Any of our Huawei friends ready to take this up? cc [~ashish singhi], [~ashutosh_jindal] Not cleaning Mob data when Mob CF is removed from table --- Key: HBASE-12015 URL: https://issues.apache.org/jira/browse/HBASE-12015 Project: HBase Issue Type: Bug Affects Versions: hbase-11339 Reporter: Anoop Sam John Fix For: hbase-11339 During modifyTable, if a MOB CF is removed from a table, the corresponding mob data also should get removed. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13415) Procedure V2 - Use nonces for double submits from client
[ https://issues.apache.org/jira/browse/HBASE-13415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617994#comment-14617994 ] Stephen Yuan Jiang commented on HBASE-13415: I ran the TestModifyTableProcedure test suite under both Maven (mvn) and Eclipse; everything is normal. Procedure V2 - Use nonces for double submits from client Key: HBASE-13415 URL: https://issues.apache.org/jira/browse/HBASE-13415 Project: HBase Issue Type: Sub-task Components: master Reporter: Enis Soztutar Assignee: Stephen Yuan Jiang Priority: Blocker Fix For: 2.0.0, 1.2.0, 1.3.0 Attachments: HBASE-13415.v1-master.patch The client can submit a procedure, but before getting the procId back, the master might fail. In this case, the client request will fail and the client will re-submit the request. With a 1.1 client, or if there is no contention for the table lock, the time window is pretty small, but it still might happen. If the proc was accepted and stored in the procedure store, a re-submit from the client will add another procedure, which will execute after the first one. The first one will likely succeed, and the second one will fail (for example, in the case of create table, the second one will throw TableExistsException). One idea is to use client-generated nonces (that we already have) to guard against these cases. The client will submit the request with the nonce, and the nonce will be saved together with the procedure in the store. In case of a double submit, the nonce-cache is checked and the procId of the original request is returned. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12596) bulkload needs to follow locality
[ https://issues.apache.org/jira/browse/HBASE-12596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Xu updated HBASE-12596: -- Attachment: HBASE-12596-master-v5.patch HBASE-12596-0.98-v5.patch Address Ashish's comments and add a unit test for hbase.bulkload.locality.sensitive.enabled = true. bulkload needs to follow locality - Key: HBASE-12596 URL: https://issues.apache.org/jira/browse/HBASE-12596 Project: HBase Issue Type: Improvement Components: HFile, regionserver Affects Versions: 0.98.8 Environment: hadoop-2.3.0, hbase-0.98.8, jdk1.7 Reporter: Victor Xu Assignee: Victor Xu Fix For: 0.98.14 Attachments: HBASE-12596-0.98-v1.patch, HBASE-12596-0.98-v2.patch, HBASE-12596-0.98-v3.patch, HBASE-12596-0.98-v4.patch, HBASE-12596-0.98-v5.patch, HBASE-12596-master-v1.patch, HBASE-12596-master-v2.patch, HBASE-12596-master-v3.patch, HBASE-12596-master-v4.patch, HBASE-12596-master-v5.patch, HBASE-12596.patch Normally, we have 2 steps to perform a bulkload: 1. use a job to write the HFiles to be loaded; 2. move these HFiles to the right hdfs directory. However, the locality could be lost during the first step. Why not just write the HFiles directly into the right place? We can do this easily because StoreFile.WriterBuilder has the withFavoredNodes method, and we just need to call it in HFileOutputFormat's getNewWriter(). This feature is enabled by default, and we could use 'hbase.bulkload.locality.sensitive.enabled=false' to disable it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
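The locality lookup at the heart of the idea can be sketched as below: for the first row key an HFile writer will receive, find the region containing it (regions are keyed by start key) and use that region's hosting server as the favored node. This is a simplified, hypothetical sketch over a `TreeMap`; the real code would consult hbase:meta and pass the host to `withFavoredNodes`.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of the bulkload-locality lookup. The map holds
// regionStartKey -> hostname of the RegionServer hosting that region.
public class FavoredNodeSketch {
    public static String favoredNodeFor(TreeMap<String, String> regionsByStartKey,
                                        String firstRowKey) {
        // The containing region is the one with the greatest start key <= row key.
        Map.Entry<String, String> region = regionsByStartKey.floorEntry(firstRowKey);
        return region == null ? null : region.getValue();
    }
}
```

Writing each HFile with its region's server as a favored node means the HDFS blocks land on the same host the region is served from, so locality survives the bulkload's rename step.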
[jira] [Commented] (HBASE-14028) DistributedLogReplay drops edits when ITBLL 125M
[ https://issues.apache.org/jira/browse/HBASE-14028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617997#comment-14617997 ] stack commented on HBASE-14028: --- I have been playing more with this. Losing data is pretty easy to do. Trying to find why the end of a WAL goes missing during replay; there is not enough info to debug and it is a little tough to trace where we're at at any one time. Trying to back fill. DistributedLogReplay drops edits when ITBLL 125M Key: HBASE-14028 URL: https://issues.apache.org/jira/browse/HBASE-14028 Project: HBase Issue Type: Bug Components: Recovery Affects Versions: 1.2.0 Reporter: stack Testing DLR before 1.2.0RC gets cut, we are dropping edits. Issue seems to be around replay into a deployed region that is on a server that dies before all edits have finished replaying. Logging is sparse on sequenceid accounting so can't tell for sure how it is happening (and if our now accounting by Store is messing up DLR). Digging. I notice also that DLR does not refresh its cache of region location on error -- it just keeps trying till whole WAL fails 8 retries...about 30 seconds. We could do a bit of refactor and have the replay find region in new location if moved during DLR replay. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13965) Stochastic Load Balancer JMX Metrics
[ https://issues.apache.org/jira/browse/HBASE-13965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14617922#comment-14617922 ] Lei Chen commented on HBASE-13965: -- Thanks for testing the patch and posting the result metrics. I agree that using percentage is easier for quick look. I will update the patch. Stochastic Load Balancer JMX Metrics Key: HBASE-13965 URL: https://issues.apache.org/jira/browse/HBASE-13965 Project: HBase Issue Type: Improvement Components: Balancer, metrics Reporter: Lei Chen Assignee: Lei Chen Attachments: HBASE-13965-v3.patch, HBASE-13965-v4.patch, HBASE-13965-v5.patch, HBASE-13965-v6.patch, HBASE-13965_v2.patch, HBase-13965-v1.patch, stochasticloadbalancerclasses_v2.png Today’s default HBase load balancer (the Stochastic load balancer) is cost function based. The cost function weights are tunable but no visibility into those cost function results is directly provided. A driving example is a cluster we have been tuning which has skewed rack size (one rack has half the nodes of the other few racks). We are tuning the cluster for uniform response time from all region servers with the ability to tolerate a rack failure. Balancing LocalityCost, RegionReplicaRack Cost and RegionCountSkew Cost is difficult without a way to attribute each cost function’s contribution to overall cost. What this jira proposes is to provide visibility via JMX into each cost function of the stochastic load balancer, as well as the overall cost of the balancing plan. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
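The percentage reporting discussed in the comment can be sketched as below: given each cost function's weighted contribution, normalize to percent of the overall plan cost before exposing it (e.g. via a JMX attribute). The class and map shape here are illustrative assumptions, not the patch's actual metrics classes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: turn weighted cost-function values into percentages of the total,
// which is easier to read at a glance than raw weight * rawCost products.
public class BalancerMetricsSketch {
    /** Input: costName -> weight * rawCost. Output: costName -> percent of total. */
    public static Map<String, Double> asPercentages(Map<String, Double> weightedCosts) {
        double total = weightedCosts.values().stream()
                .mapToDouble(Double::doubleValue).sum();
        Map<String, Double> out = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : weightedCosts.entrySet()) {
            // Guard against a zero total so an idle balancer reports 0%, not NaN.
            out.put(e.getKey(), total == 0 ? 0.0 : 100.0 * e.getValue() / total);
        }
        return out;
    }
}
```

With this normalization, an operator tuning the skewed-rack cluster described above can see directly which of LocalityCost, RegionReplicaRackCost, or RegionCountSkewCost dominates the plan's overall cost.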