[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-19 Thread Ben-Zvi
Github user Ben-Zvi closed the pull request at:

https://github.com/apache/drill/pull/585


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-16 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/585#discussion_r79267883
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/xsort/ExternalSortBatch.java ---
@@ -592,11 +592,14 @@ public BatchGroup mergeAndSpill(LinkedList batchGroups) throws Schem
   }
   injector.injectChecked(context.getExecutionControls(), INTERRUPTION_WHILE_SPILLING, IOException.class);
   newGroup.closeOutputStream();
-} catch (Exception e) {
+} catch (Throwable e) {
   // we only need to cleanup newGroup if spill failed
-  AutoCloseables.close(e, newGroup);
+  try {
+    AutoCloseables.close(e, newGroup);
+  } catch (Throwable t) { /* close() may hit the same IO issue; just ignore */ }
--- End diff --

The root cause for the whole bug is in Hadoop's RawLocalFileSystem.java:

package org.apache.hadoop.fs;
...
public void write(byte[] b, int off, int len) throws IOException {
  try {
    fos.write(b, off, len);
  } catch (IOException e) {  // unexpected exception
    throw new FSError(e);    // assume native fs error
  }
}

And FSError is not a subclass of IOException! Its class hierarchy is:

java.lang.Object
  java.lang.Throwable
    java.lang.Error
      org.apache.hadoop.fs.FSError

So the only common ancestor of FSError and IOException is Throwable, and any part of the Drill code that catches only IOException (or even Exception, since Error is not a subclass of Exception) will not catch it!
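A minimal, self-contained sketch of why the catch clause had to be widened. FakeFSError is a hypothetical stand-in for org.apache.hadoop.fs.FSError (only its superclass, java.lang.Error, is taken from the hierarchy above), and write() merely mimics the wrapping done in RawLocalFileSystem; neither is Hadoop's actual code.

```java
import java.io.IOException;

public class CatchHierarchyDemo {

    // Hypothetical stand-in for org.apache.hadoop.fs.FSError: like FSError,
    // it extends java.lang.Error, so it is neither an IOException nor an Exception.
    static class FakeFSError extends Error {
        FakeFSError(Throwable cause) { super(cause); }
    }

    // Mimics the shape of RawLocalFileSystem's write(): the IOException from the
    // underlying stream is rethrown wrapped in an Error subclass.
    static void write() throws IOException {
        try {
            throw new IOException("No space left on device");
        } catch (IOException e) {
            throw new FakeFSError(e);
        }
    }

    // Returns the name of the catch clause that actually handled the failure.
    static String handledBy() {
        try {
            try {
                try {
                    write();
                } catch (IOException e) {
                    return "IOException";   // never reached: FakeFSError is not an IOException
                }
            } catch (Exception e) {
                return "Exception";         // never reached: Error is not an Exception
            }
        } catch (Throwable t) {
            return "Throwable";             // the only clause that sees the error
        }
        return "none";
    }

    public static void main(String[] args) {
        System.out.println(handledBy()); // prints "Throwable"
    }
}
```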







[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-16 Thread Ben-Zvi
Github user Ben-Zvi commented on a diff in the pull request:

https://github.com/apache/drill/pull/585#discussion_r79255636
  

When there is no disk space left to spill, close() tries to clean up by calling flushBuffer(), which eventually throws the same exception because there is still no space:

  at java.io.FileOutputStream.write(FileOutputStream.java:326)
  at org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:246)
  at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
  at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
  - locked <0x24e5> (a java.io.BufferedOutputStream)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.write(FSDataOutputStream.java:58)
  at java.io.DataOutputStream.write(DataOutputStream.java:107)
  - locked <0x24e7> (a org.apache.hadoop.fs.FSDataOutputStream)
  at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.writeChunk(ChecksumFileSystem.java:419)
  at org.apache.hadoop.fs.FSOutputSummer.writeChecksumChunks(FSOutputSummer.java:206)
  at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:163)
  - locked <0x24e8> (a org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer)
  at org.apache.hadoop.fs.FSOutputSummer.flushBuffer(FSOutputSummer.java:144)
  at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:407)
  at org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:72)
  at org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:106)
  at org.apache.drill.exec.physical.impl.xsort.BatchGroup.close(BatchGroup.java:169)
  at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76)
  at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:53)
  at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:43)
  at org.apache.drill.exec.physical.impl.xsort.ExternalSortBatch.mergeAndSpill(ExternalSortBatch.java:598)
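To illustrate how close() can fail a second time for the same reason, here is a sketch using only java.io. FullDiskStream and DiskFullError are hypothetical: the sink rejects every write, much like a full disk, and a small BufferedOutputStream plays the role of the buffering layers in the trace above. The bytes that failed to flush are still pending in the buffer, so close() tries to flush them again.

```java
import java.io.BufferedOutputStream;
import java.io.OutputStream;

public class CloseRethrowDemo {

    // Hypothetical Error subclass standing in for Hadoop's FSError.
    static class DiskFullError extends Error {
        DiskFullError(String msg) { super(msg); }
    }

    // Hypothetical sink that simulates a full disk: every write fails.
    static class FullDiskStream extends OutputStream {
        int writeAttempts = 0;

        @Override
        public void write(int b) {
            writeAttempts++;
            throw new DiskFullError("No space left on device");
        }

        @Override
        public void write(byte[] b, int off, int len) {
            writeAttempts++;
            throw new DiskFullError("No space left on device");
        }
    }

    // Returns how many times the underlying write failed: once during the
    // spill itself and once more when close() flushes the leftover buffer.
    static int failedWrites() {
        FullDiskStream sink = new FullDiskStream();
        OutputStream out = new BufferedOutputStream(sink, 8);
        try {
            out.write(new byte[4]); // fits in the 8-byte buffer; nothing reaches the sink
            out.write(new byte[6]); // overflows the buffer; flushing the 4 pending bytes fails
        } catch (Throwable spillFailure) {
            try {
                out.close();        // close() flushes the same 4 bytes and fails again
            } catch (Throwable closeFailure) {
                // Same root cause as the spill failure; ignored, as in the PR.
            }
        }
        return sink.writeAttempts;
    }

    public static void main(String[] args) {
        System.out.println(failedWrites()); // prints 2
    }
}
```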





[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-15 Thread amansinha100
Github user amansinha100 commented on a diff in the pull request:

https://github.com/apache/drill/pull/585#discussion_r79096107
  

It looks like close(Throwable t, AutoCloseable) suppresses the exception; did you get an exception during testing? Otherwise, you could remove this second try-catch.
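Whether the second try/catch is redundant depends on what the suppressing helper actually catches. The sketch below uses a hypothetical closeSuppressing() helper, not Drill's AutoCloseables, and assumes it attaches only Exceptions as suppressed. Under that assumption, a checked IOException from close() is absorbed, but an Error such as Hadoop's FSError still escapes the helper, which is the situation the stack trace in Ben-Zvi's reply shows.

```java
import java.io.IOException;

public class SuppressHelperDemo {

    // Hypothetical helper, NOT Drill's AutoCloseables: closes each resource and
    // attaches any Exception thrown by close() to t as a suppressed exception.
    // ASSUMPTION: only Exception is caught, so an Error from close() escapes.
    static void closeSuppressing(Throwable t, AutoCloseable... closeables) {
        for (AutoCloseable c : closeables) {
            try {
                c.close();
            } catch (Exception e) {
                t.addSuppressed(e);
            }
        }
    }

    // Returns true if the failure from close() propagates out of the helper.
    static boolean escapes(boolean closeThrowsError) {
        Throwable primary = new RuntimeException("spill failed");
        AutoCloseable resource = () -> {
            if (closeThrowsError) {
                throw new Error("No space left on device");   // an Error, like FSError
            }
            throw new IOException("No space left on device"); // a checked exception
        };
        try {
            closeSuppressing(primary, resource);
            return false; // absorbed: recorded in primary.getSuppressed()
        } catch (Throwable escaped) {
            return true;  // propagated past the helper to the caller
        }
    }

    public static void main(String[] args) {
        System.out.println(escapes(false)); // prints false
        System.out.println(escapes(true));  // prints true
    }
}
```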




[GitHub] drill pull request #585: DRILL-3898 : Sort spill was modified to catch all e...

2016-09-09 Thread Ben-Zvi
GitHub user Ben-Zvi opened a pull request:

https://github.com/apache/drill/pull/585

DRILL-3898: Sort spill was modified to catch all errors, ignore repeated errors while closing the new group and issue a more detailed error message.

It seems that the spilling I/O can run into various kinds of errors (no disk space, failure to create a file, ...), which are thrown as different exception classes. Hence the catch clause was changed to catch the more general Throwable, and the exception's message was added to the error report for more detail (e.g., no disk space).

Before this change, the "no disk space" Throwable was not caught, so execution continued.

Also, closing newGroup could hit I/O errors of its own (e.g., when flushing), so a try/catch was added to ignore those.
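A hedged sketch of the overall pattern described above: catch any Throwable from the spill, make a best-effort close of the new spill target while ignoring any repeated failure, and rethrow with the cause's message included. SpillTarget, FullDiskTarget, and the message text are hypothetical, and the plain RuntimeException stands in for whatever error type Drill actually reports; this is not the code from the patch.

```java
import java.io.IOException;

public class SpillErrorDemo {

    // Hypothetical spill target; plays the role of the new BatchGroup.
    interface SpillTarget extends AutoCloseable {
        void write(byte[] data) throws IOException;
    }

    // Hypothetical target simulating a full disk: the spill write fails with an
    // Error (as Hadoop's FSError would), and close() fails again the same way.
    static final class FullDiskTarget implements SpillTarget {
        @Override
        public void write(byte[] data) {
            throw new Error("No space left on device");
        }

        @Override
        public void close() {
            throw new Error("No space left on device");
        }
    }

    // On success the caller keeps the target; it is only closed here on failure.
    static void spill(SpillTarget target, byte[] data) {
        try {
            target.write(data);
        } catch (Throwable e) {
            try {
                target.close();      // best-effort cleanup of the failed spill
            } catch (Throwable ignored) {
                // close() may hit the same I/O issue; deliberately ignored
            }
            throw new RuntimeException(
                "External sort failed to spill to disk: " + e.getMessage(), e);
        }
    }

    static String spillMessage() {
        try {
            spill(new FullDiskTarget(), new byte[16]);
            return "no error";
        } catch (RuntimeException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // prints "External sort failed to spill to disk: No space left on device"
        System.out.println(spillMessage());
    }
}
```

Catching Throwable also traps errors such as OutOfMemoryError; the sketch keeps the original failure as the cause and rethrows it rather than swallowing it.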

Note that this change should also fix DRILL-4542 ("if external sort fails to spill to disk, memory is leaked and wrong error message is displayed").

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ben-Zvi/drill DRILL-3898

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/585.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #585


commit e988f1644be1d9fde24a489d94c7dbc54f8e82d8
Author: Boaz Ben-Zvi 
Date:   2016-09-09T23:36:03Z

DRILL-3898 :  Sort spill was modified to catch all errors, ignore repeated 
errors while closing the new group and issue a more detailed error message.



