[jira] [Commented] (HIVE-14925) MSCK repair table hang while running with multi threading enabled
[ https://issues.apache.org/jira/browse/HIVE-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570328#comment-15570328 ]

Ratheesh Kamoor commented on HIVE-14925:

Are you trying with partitions in HDFS? You may not run into the issue if the threads are fast enough to finish execution before the recursive call happens. File systems like S3 will clearly show the error because of network latency.

> MSCK repair table hang while running with multi threading enabled
> -----------------------------------------------------------------
>
> Key: HIVE-14925
> URL: https://issues.apache.org/jira/browse/HIVE-14925
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 2.2.0
> Reporter: Ratheesh Kamoor
> Assignee: Ratheesh Kamoor
> Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-14925.patch
>
>
> MSCK REPAIR TABLE hangs when run with multi-threading enabled
> (the default). I think this is caused by a major design flaw in how the
> thread pool is used in the checkPartitionDirs method of the
> HiveMetaStoreChecker class. The method registers Callables with a thread
> pool, but each Callable makes a recursive call to checkPartitionDirs. The
> code hangs when the number of directories exceeds the thread pool size.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
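The hang described in the issue can be reproduced outside Hive with a small, self-contained sketch. Nothing below is taken from HiveMetaStoreChecker: the Dir type, the tree() helper, countDirs() and hangs() are hypothetical stand-ins, and the sketch only assumes the pattern the report names (a pooled Callable that submits child work to the same pool and then blocks on it). A timeout is used so the demo itself terminates.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class PoolStarvationDemo {

    // Hypothetical stand-in for a partition directory: it only knows its subdirectories.
    static final class Dir {
        final List<Dir> children = new ArrayList<>();
        Dir add(Dir child) { children.add(child); return this; }
    }

    // Builds a root with `topLevel` directories, each holding one subdirectory.
    static Dir tree(int topLevel) {
        Dir root = new Dir();
        for (int i = 0; i < topLevel; i++) {
            root.add(new Dir().add(new Dir()));
        }
        return root;
    }

    // The pattern from the report: each child is submitted to the SAME pool,
    // and the submitting thread then blocks until that child has finished.
    static int countDirs(ExecutorService pool, Dir dir) throws Exception {
        List<Future<Integer>> pending = new ArrayList<>();
        for (Dir child : dir.children) {
            Callable<Integer> task = () -> countDirs(pool, child);
            pending.add(pool.submit(task));
        }
        int total = 1; // count this directory itself
        for (Future<Integer> f : pending) {
            total += f.get(); // a pool thread waits here for work that needs a pool thread
        }
        return total;
    }

    // Returns true if the walk does not finish within the timeout.
    static boolean hangs(int poolSize, Dir root, long timeoutMillis) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        ExecutorService driver = Executors.newSingleThreadExecutor();
        try {
            Callable<Integer> top = () -> countDirs(pool, root);
            driver.submit(top).get(timeoutMillis, TimeUnit.MILLISECONDS);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            // Interrupt any blocked workers so the JVM can exit.
            pool.shutdownNow();
            driver.shutdownNow();
        }
    }

    public static void main(String[] args) {
        Dir root = tree(3);
        System.out.println("pool=2 hangs: " + hangs(2, root, 1000));
        System.out.println("pool=8 hangs: " + hangs(8, root, 5000));
    }
}
```

With two threads and three top-level directories, both workers end up blocked waiting for subdirectory tasks that are still in the queue, so the walk should time out. With enough threads to cover every level, the same walk completes.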
[jira] [Commented] (HIVE-14925) MSCK repair table hang while running with multi threading enabled
[ https://issues.apache.org/jira/browse/HIVE-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15570225#comment-15570225 ]

Ratheesh Kamoor commented on HIVE-14925:

Done. This is the first time I am using the RB tool, so please let me know if I need to provide more info. Thx

> MSCK repair table hang while running with multi threading enabled
> -----------------------------------------------------------------
>
> Key: HIVE-14925
> URL: https://issues.apache.org/jira/browse/HIVE-14925
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 2.2.0
> Reporter: Ratheesh Kamoor
> Assignee: Rajesh Balamohan
> Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-14925.patch
[jira] [Commented] (HIVE-14925) MSCK repair table hang while running with multi threading enabled
[ https://issues.apache.org/jira/browse/HIVE-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15569746#comment-15569746 ]

Ratheesh Kamoor commented on HIVE-14925:

[~pxiong] I moved the logic from the inline Callable to an external class so that the code can be reused in both the multi-threaded and the single-threaded scenarios. This also fixes the thread lock issue. Could you please review? Tested against the very large partition sets we have (5K+) and it worked fine.

> MSCK repair table hang while running with multi threading enabled
> -----------------------------------------------------------------
>
> Key: HIVE-14925
> URL: https://issues.apache.org/jira/browse/HIVE-14925
> Project: Hive
> Issue Type: Bug
> Components: CLI
> Affects Versions: 2.2.0
> Reporter: Ratheesh Kamoor
> Assignee: Pengcheng Xiong
> Priority: Critical
> Fix For: 2.2.0
>
> Attachments: HIVE-14925.patch
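One way to read "move the logic to an external class that both modes reuse" is sketched below. This is not the content of HIVE-14925.patch: PathProcessor, Dir, tree(), countDirs() and countWithPool() are hypothetical names, and the level-by-level driver is an assumption about how blocking inside pool threads could be avoided. The processor handles exactly one directory and never submits or waits; only the calling thread waits on futures.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class LevelWalkSketch {

    // Hypothetical stand-in for a partition directory.
    static final class Dir {
        final List<Dir> children = new ArrayList<>();
        Dir add(Dir child) { children.add(child); return this; }
    }

    // Builds a root with `topLevel` directories, each holding one subdirectory.
    static Dir tree(int topLevel) {
        Dir root = new Dir();
        for (int i = 0; i < topLevel; i++) {
            root.add(new Dir().add(new Dir()));
        }
        return root;
    }

    // Hypothetical external processor: inspects ONE directory and returns its
    // subdirectories. It never touches the pool, so it cannot starve it.
    static final class PathProcessor implements Callable<List<Dir>> {
        private final Dir dir;
        PathProcessor(Dir dir) { this.dir = dir; }
        @Override public List<Dir> call() { return dir.children; }
    }

    // Walks the tree one level at a time. A null pool selects the
    // single-threaded path; both paths run the same PathProcessor.
    static int countDirs(ExecutorService pool, Dir root) {
        int total = 0;
        List<Dir> level = new ArrayList<>();
        level.add(root);
        try {
            while (!level.isEmpty()) {
                total += level.size();
                List<PathProcessor> tasks = new ArrayList<>();
                for (Dir d : level) {
                    tasks.add(new PathProcessor(d));
                }
                List<Dir> next = new ArrayList<>();
                if (pool == null) {
                    for (PathProcessor t : tasks) {
                        next.addAll(t.call());
                    }
                } else {
                    // Only the caller blocks here; workers always run to completion.
                    for (Future<List<Dir>> f : pool.invokeAll(tasks)) {
                        next.addAll(f.get());
                    }
                }
                level = next;
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return total;
    }

    // Convenience wrapper that owns the pool's lifecycle.
    static int countWithPool(int poolSize, Dir root) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            return countDirs(pool, root);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Dir root = tree(5);
        System.out.println("single-threaded: " + countDirs(null, root));
        System.out.println("pool of 2:       " + countWithPool(2, root));
    }
}
```

Because no task waits on another task, the pool size only affects throughput, not whether the walk terminates, even when the number of directories is far larger than the number of threads.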
[jira] [Updated] (HIVE-14925) MSCK repair table hang while running with multi threading enabled
[ https://issues.apache.org/jira/browse/HIVE-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ratheesh Kamoor updated HIVE-14925:
-----------------------------------
    Attachment: HIVE-14925.patch
[jira] [Updated] (HIVE-14925) MSCK repair table hang while running with multi threading enabled
[ https://issues.apache.org/jira/browse/HIVE-14925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ratheesh Kamoor updated HIVE-14925:
-----------------------------------
    Fix Version/s: 2.2.0
    Release Note:
        Issue: MSCK fails in multi-threaded execution.
        Solution:
        - Moved the path processor logic to an external class, which avoids code duplication and is used in both multi-threaded and single-threaded execution.
    Status: Patch Available (was: Open)