[jira] [Updated] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2755:
--------------------------------------

             Priority: Minor (was: Major)
    Affects Version/s: 0.7.0 (was: 0.8.0)
        Fix Version/s: 0.8.2, 0.7.7

ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
---

                 Key: CASSANDRA-2755
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2755
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 0.7.0
            Reporter: Greg Katz
            Assignee: Mck SembWever
            Priority: Minor
             Fix For: 0.7.7, 0.8.2
         Attachments: 2755-v2.txt, CASSANDRA-2755.patch

There appears to be a race condition in {{ColumnFamilyRecordWriter}} that can result in the loss of an exception. Here is how it can happen (W stands for the {{RangeClient}}'s worker thread; U stands for the {{ColumnFamilyRecordWriter}} user's thread):

# W: {{RangeClient}}'s {{run}} method catches an exception originating in the Thrift client/socket, but doesn't get a chance to set it on the {{lastException}} field before the thread is preempted.
# U: The user calls {{close}}, which calls {{stopNicely}}. Because the {{lastException}} field is null, {{stopNicely}} does not throw anything. {{close}} then joins on the worker thread.
# W: The {{RangeClient}}'s {{run}} method sets the {{lastException}} field and exits.
# U: Although the thread in {{close}} is waiting for the worker thread to exit, it has already checked the {{lastException}} field, so it doesn't detect the presence of the last exception. Instead, {{close}} returns without throwing anything.

This race condition means that write failures will intermittently go undetected.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis updated CASSANDRA-2755:
--------------------------------------

    Attachment: 2755-v2.txt

It looks to me as though, as long as we check for the exception before calling join, there will be a window in which one can be missed. v2 encapsulates RangeClient.close better to avoid this.

ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
---

                 Key: CASSANDRA-2755
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2755
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 0.8.0
            Reporter: Greg Katz
            Assignee: Mck SembWever
         Attachments: 2755-v2.txt, CASSANDRA-2755.patch
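The ordering concern raised in the comment above (checking for the exception before calling join leaves a window to miss one) can be illustrated with a minimal, self-contained sketch. This is not the contents of 2755-v2.txt or the actual Cassandra code: {{SketchRangeClient}}, {{CloseRaceDemo}}, the two close method names, and the {{closeStarted}} latch are hypothetical names introduced here, and the latch exists only to force the unlucky interleaving from the issue description deterministically.

```java
import java.io.IOException;
import java.util.concurrent.CountDownLatch;

/** Hypothetical stand-in for RangeClient; not the real Cassandra class. */
class SketchRangeClient extends Thread {
    // Failure recorded by the worker thread, read by the closing thread.
    private volatile IOException lastException;

    // Demo-only hook (an assumption of this sketch): holds the worker back so
    // that it records its failure only after close has started, mimicking the
    // preemption described in the issue.
    private final CountDownLatch closeStarted = new CountDownLatch(1);

    @Override
    public void run() {
        try {
            closeStarted.await(); // "preempted" before recording the failure
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        lastException = new IOException("simulated Thrift write failure");
    }

    /** Racy order from the report: check the field, then join. */
    void closeCheckThenJoin() throws IOException, InterruptedException {
        IOException seen = lastException; // still null at this point
        if (seen != null) {
            throw seen;
        }
        closeStarted.countDown();
        join(); // worker records the failure now, but nobody looks again
    }

    /** Safer order: join first, then check the field. */
    void closeJoinThenCheck() throws IOException, InterruptedException {
        closeStarted.countDown();
        join(); // after join returns, the worker can no longer write the field
        IOException seen = lastException;
        if (seen != null) {
            throw seen;
        }
    }
}

class CloseRaceDemo {
    /** Returns true if closing the client surfaced the worker's failure. */
    static boolean closeThrows(boolean joinFirst) {
        SketchRangeClient client = new SketchRangeClient();
        client.start();
        try {
            if (joinFirst) {
                client.closeJoinThenCheck();
            } else {
                client.closeCheckThenJoin();
            }
            return false;
        } catch (IOException e) {
            return true;
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("check-then-join surfaces failure: " + closeThrows(false));
        System.out.println("join-then-check surfaces failure: " + closeThrows(true));
    }
}
```

In this sketch the first call reports {{false}} (the failure is lost) and the second reports {{true}}. Because {{Thread.join}} returns only after the worker has finished, a check made after it cannot miss an exception the worker recorded before exiting.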
[jira] [Updated] (CASSANDRA-2755) ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
[ https://issues.apache.org/jira/browse/CASSANDRA-2755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mck SembWever updated CASSANDRA-2755:
-------------------------------------

    Attachment: CASSANDRA-2755.patch

In RangeClient I cannot see why close() needs to be called before lastException is assigned. The attached patch should work. I have tested it against various jobs, but I have no reproducible test case to confirm this bug against.

Also in the patch is a slight cleanup of ColumnFamilyRecordWriter's close() methods: keeping implementation out of deprecated methods.

ColumnFamilyRecordWriter fails to throw a write exception encountered after the user begins to close the writer
---

                 Key: CASSANDRA-2755
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2755
             Project: Cassandra
          Issue Type: Bug
          Components: Hadoop
    Affects Versions: 0.8.0
            Reporter: Greg Katz
            Assignee: Mck SembWever
         Attachments: CASSANDRA-2755.patch