[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775834#comment-13775834 ]

Shuaishuai Nie commented on HIVE-4773:
--------------------------------------

Thanks [~ekoifman] and [~hari.s] for the comments.

- Eugene,
I don't think it is a problem that Watcher first assigns stderr/stdout to 'out' and then reassigns 'out' to 'statusdir'. Only the last assignment to 'out' matters. The fix overrides close() so that stdout/stderr are not closed by writer.close() when 'out' actually points to stdout/stderr at the time writer.close() is called.

- Hari,
> 1. I am not sure why close() should immediately close if flush() does not perform the same thing.
As I mentioned in the earlier comment, according to the book "Hadoop: The Definitive Guide", flush() does not ensure that the content of the stream is written to the file. In a distributed file system the content is not written to the file until a block is filled.

> 2. Inside run() of Watcher why do you need to create a new object using PrintWriter writer = new PrintWriter(out);
I didn't change this in my patch; it is in the original code base. I think it is needed for the format of the log in the output file.

> 3. Even if you add CustomFilterOutputStream class, why do you need to add flush() inside close(). This looks like you are flushing twice.
This flush() is not necessary here. It is there just in case the class is used somewhere else where the flush does matter.

> 4. Do you necessarily need to make CustomFilterOutputStream class public. It doesnt look like its used elsewhere.
For now it is not used anywhere else, so I think it is OK to change it to protected.
> Templeton intermittently fail to commit output to file system
> -------------------------------------------------------------
>
> Key: HIVE-4773
> URL: https://issues.apache.org/jira/browse/HIVE-4773
> Project: Hive
> Issue Type: Bug
> Components: WebHCat
> Reporter: Shuaishuai Nie
> Assignee: Shuaishuai Nie
> Attachments: HIVE-4773.1.patch, HIVE-4773.2.patch
>
> With ASV as a default FS, we saw instances where output is not fully flushed to storage before the Templeton controller process exits. This results in stdout and stderr being empty even though the job completed successfully.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775848#comment-13775848 ]

Eugene Koifman commented on HIVE-4773:
--------------------------------------

[~shuainie] OK, I misread your code. You only use CustomFilterOutputStream to wrap System.out/System.err, not when 'out' = statusdir. I get it now, so your changes do the same thing I was suggesting in my previous comment. I would suggest calling this wrapper class NonClosableStream and making its close() method do nothing (also make the class private). I think this will make it easier to understand.
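The wrapper discussed in this comment can be sketched in a few lines. This is only an illustration of the idea, not the code from any of the attached patches: NonClosableStream takes the name Eugene proposes, while NonClosableStreamDemo and TrackingStream are hypothetical names added here so the effect can be observed. close() flushes instead of doing nothing at all, which is a harmless variation on the suggestion.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.PrintWriter;

public class NonClosableStreamDemo {

    /** Wrapper whose close() leaves the wrapped stream open (sketch only). */
    static final class NonClosableStream extends FilterOutputStream {
        NonClosableStream(OutputStream out) {
            super(out);
        }

        @Override
        public void write(byte[] b, int off, int len) throws IOException {
            // Delegate in bulk; FilterOutputStream's default writes byte by byte.
            out.write(b, off, len);
        }

        @Override
        public void close() throws IOException {
            // Push pending bytes down, but do NOT close the wrapped stream.
            flush();
        }
    }

    /** Hypothetical stand-in for System.out that records whether it was closed. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) {
        TrackingStream sink = new TrackingStream();
        PrintWriter writer = new PrintWriter(new NonClosableStream(sink));
        writer.println("job output");
        writer.close(); // closes the writer chain, stops at the wrapper
        System.out.println("underlying closed: " + sink.closed);
        System.out.println("bytes delivered:   " + sink.size());
    }
}
```

With a wrapper like this, writer.close() can be called unconditionally: a statusdir file is really closed because it is not wrapped, while a wrapped System.out/System.err stays usable by the rest of the process.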
[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13775884#comment-13775884 ]

Hari Sankar Sivarama Subramaniyan commented on HIVE-4773:
---------------------------------------------------------

+1

> Attachments: HIVE-4773.1.patch, HIVE-4773.2.patch, HIVE-4773.3.patch
[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13772927#comment-13772927 ]

Hari Sankar Sivarama Subramaniyan commented on HIVE-4773:
---------------------------------------------------------

Hi Shuaishuai,

I have some questions:
1. I am not sure why close() should immediately close if flush() does not perform the same thing. (Eugene's question)
2. Inside run() of Watcher, why do you need to create a new object using PrintWriter writer = new PrintWriter(out)? Can't you use 'out' directly instead, which will call the corresponding functions depending on the underlying class? Will this not fix the issue?
3. Even if you add the CustomFilterOutputStream class, why do you need to add flush() inside close()? This looks like you are flushing twice.
4. Do you necessarily need to make the CustomFilterOutputStream class public? It doesn't look like it's used elsewhere.

Thanks
Hari
[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13773284#comment-13773284 ]

Eugene Koifman commented on HIVE-4773:
--------------------------------------

Here is a possible compromise: the c'tor of Watcher first assigns stderr/stdout to 'out', but then it reassigns 'out' to point to the 'statusdir' file (assuming that parameter is set). So we could add a flag that says

if(out uses a user specified file) {
  then call writer.close() as this is the only entity writing to it
}
else {
  just call writer.flush() as before and hope for the best
}
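The flag-based compromise above could look roughly like the following. It is a sketch under assumptions: the real Watcher class is not shown in this thread, so finish(), ownsTarget, WatcherCloseSketch and TrackingStream are hypothetical names, and the in-memory streams stand in for a statusdir file and for stdout/stderr.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.PrintWriter;

public class WatcherCloseSketch {

    /**
     * Finish writing. Close only when this writer is the sole owner of its
     * target (a user-specified statusdir file); otherwise just flush so a
     * shared stream such as stdout/stderr stays open.
     */
    static void finish(PrintWriter writer, boolean ownsTarget) {
        if (ownsTarget) {
            writer.close();
        } else {
            writer.flush();
        }
    }

    /** Hypothetical in-memory target that records whether it was closed. */
    static final class TrackingStream extends ByteArrayOutputStream {
        boolean closed = false;

        @Override
        public void close() throws IOException {
            closed = true;
            super.close();
        }
    }

    public static void main(String[] args) {
        TrackingStream statusFile = new TrackingStream(); // stands in for a statusdir file
        TrackingStream sharedOut = new TrackingStream();  // stands in for stdout/stderr

        PrintWriter fileWriter = new PrintWriter(statusFile);
        fileWriter.print("to file");
        finish(fileWriter, true);

        PrintWriter sharedWriter = new PrintWriter(sharedOut);
        sharedWriter.print("to shared stream");
        finish(sharedWriter, false);

        System.out.println("status file closed:   " + statusFile.closed);
        System.out.println("shared stream closed: " + sharedOut.closed);
    }
}
```

The cost of this approach is that every call site has to carry the flag; wrapping the shared stream so that close() is harmless moves that decision to the point where 'out' is assigned.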
[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757365#comment-13757365 ]

Shuaishuai Nie commented on HIVE-4773:
--------------------------------------

The problem does not seem to be exclusive to ASV. According to "Hadoop: The Definitive Guide", 3rd edition, p. 75, "HDFS trades off some POSIX requirements for performance, so some operations may behave differently than you expect them to", and "any content written to the file is not guaranteed to be visible, even if the stream is flushed."

Not sure if this will break Yarn if it does container reuse. A safer way is to use FSDataOutputStream instead of PrintWriter. FSDataOutputStream implements sync(), which ensures that data written up to that point in the file is visible to users in HDFS.

> Attachments: HIVE-4773.1.patch
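The gap between "flushed" and "visible" described here can be shown with a small simulation. Nothing below touches HDFS or ASV: CommitOnCloseStream is a hypothetical stand-in for a store that publishes data only when the stream is closed, and FlushVisibilityDemo is likewise an invented name. Real file systems have their own, more nuanced visibility rules.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.PrintWriter;

public class FlushVisibilityDemo {

    /** Simulated remote file: bytes become "visible" to readers only on close(). */
    static final class CommitOnCloseStream extends OutputStream {
        private final ByteArrayOutputStream pending = new ByteArrayOutputStream();
        private byte[] visible = new byte[0];

        @Override
        public void write(int b) {
            pending.write(b); // held client-side, not yet published
        }

        @Override
        public void flush() {
            // Accepted, but in this simulation nothing is committed yet.
        }

        @Override
        public void close() {
            visible = pending.toByteArray(); // publish everything written so far
        }

        int visibleLength() {
            return visible.length;
        }
    }

    public static void main(String[] args) {
        CommitOnCloseStream file = new CommitOnCloseStream();
        PrintWriter writer = new PrintWriter(file);

        writer.print("stdout of the job");
        writer.flush();
        int afterFlush = file.visibleLength(); // still nothing visible

        writer.close();
        int afterClose = file.visibleLength(); // all bytes visible now

        System.out.println("visible after flush(): " + afterFlush);
        System.out.println("visible after close(): " + afterClose);
    }
}
```

If the process exits after flush() but before close(), a reader of such a store sees an empty file, which matches the empty stdout/stderr symptom in the issue description.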
[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755443#comment-13755443 ]

Hive QA commented on HIVE-4773:
-------------------------------

{color:green}Overall{color}: +1 all checks pass

Here are the results of testing the latest attachment:
https://issues.apache.org/jira/secure/attachment/12588951/HIVE-4773.1.patch

{color:green}SUCCESS:{color} +1 2902 tests passed

Test results: https://builds.apache.org/job/PreCommit-HIVE-Build/583/testReport
Console output: https://builds.apache.org/job/PreCommit-HIVE-Build/583/console

Messages:
{noformat}
Executing org.apache.hive.ptest.execution.PrepPhase
Executing org.apache.hive.ptest.execution.ExecutionPhase
Executing org.apache.hive.ptest.execution.ReportingPhase
{noformat}

This message is automatically generated.

> Attachments: HIVE-4773.1.patch
[jira] [Commented] (HIVE-4773) Templeton intermittently fail to commit output to file system
[ https://issues.apache.org/jira/browse/HIVE-4773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13755385#comment-13755385 ]

Eugene Koifman commented on HIVE-4773:
--------------------------------------

It's not clear why this solves the described problem. The code already called writer.flush(), which is now also followed by writer.close(). Why would flush() not be sufficient?

Also, 'writer' is connected to System.err (or System.out). Is it really safe/OK to close this stream? What about when this runs on Yarn, which I believe does some container reuse?

> Attachments: HIVE-4773.1.patch