[
https://issues.apache.org/jira/browse/PIG-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12565733#action_12565733
]
Craig Macdonald commented on PIG-89:
------------------------------------
Thanks Benjamin. I can no longer produce the exception using the patch. I agree
that the close() was missing and this is the likely cause of the Too many open
files, as they were not being closed until the finalizer ran on those output
streams. Explicitly closing them as soon as they are written will resolve the
issue.
To increase the memory available to Pig, add
push @javaArgs, "-Xmx512M";
to your pigclient.conf
Cheers
C
> Too many spills to files causes ArrayIndexOutOfBoundsException if new temp
> file cant be created
> -----------------------------------------------------------------------------------------------
>
> Key: PIG-89
> URL: https://issues.apache.org/jira/browse/PIG-89
> Project: Pig
> Issue Type: Bug
> Components: data
> Environment: Linux, Local execution Mode, JDK 1.6
> Reporter: Craig Macdonald
> Attachments: patch-v2.defaultdatabag, patch.defaultdatabag
>
>
> Hello,
> I am experimenting, trying to perform a DISTINCT on a medium sized set of
> URLs - about 3million (same set as I discussed previously - Utkarsh has a
> copy), this time in local execution mode.
> Pig script:
> {{
> A = LOAD 'all_13122007.txt';
> B = DISTINCT A;
> store B into 'bla;
> }}
> Bring these errors (two lines swapped in DefaultDatabag) to find real error.
> {{
> 2008-02-04 18:09:44,756 [Low Memory Detector] INFO org.apache.pig - low
> memory handler called init = 29491200(28800K) used = 269834064(263509K)
> committed = 307036160(299840K) max = 471662592(460608K)
> 2008-02-04 18:09:45,355 [Low Memory Detector] ERROR org.apache.pig - Unable
> to spill contents to disk
> java.io.IOException: Too many open files
> at java.io.UnixFileSystem.createFileExclusively(Native Method)
> at java.io.File.checkAndCreate(File.java:1704)
> at java.io.File.createTempFile(File.java:1793)
> at java.io.File.createTempFile(File.java:1830)
> at org.apache.pig.data.DataBag.getSpillFile(DataBag.java:367)
> at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:69)
> at
> org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
> at
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> at
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
> at sun.management.Sensor.trigger(Sensor.java:120)
> java.lang.ArrayIndexOutOfBoundsException: -1
> at java.util.ArrayList.remove(ArrayList.java:390)
> at org.apache.pig.data.DefaultDataBag.spill(DefaultDataBag.java:84)
> at
> org.apache.pig.impl.util.SpillableMemoryManager.handleNotification(SpillableMemoryManager.java:123)
> at
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:138)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> at
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
> at sun.management.Sensor.trigger(Sensor.java:120)
> Exception in thread "Low Memory Detector" java.lang.InternalError: Error in
> invoking listener
> at
> sun.management.NotificationEmitterSupport.sendNotification(NotificationEmitterSupport.java:141)
> at sun.management.MemoryImpl.createNotification(MemoryImpl.java:171)
> at
> sun.management.MemoryPoolImpl$CollectionSensor.triggerAction(MemoryPoolImpl.java:300)
> at sun.management.Sensor.trigger(Sensor.java:120)
> }}
> There are a two sub-issues here:
> 1. Pig spills too much using a default JVM (64MB) size - expected?
> Perhaps pig.pl should set a default JVM size of more than 64MB?
> 2. the line DefaultDataBag.java:84
> {{{
> mSpillFiles.remove(mSpillFiles.size() - 1);
> }}}
> line should check that mSpillFiles.size() > 0, because if
> File.createTempFile( ) in Databag.getSpillFile() fails, the mSpillFiles will
> not yet have been updated. My preference would be to split try{ } catch
> (IOException ioe) { } within DefaultDatabag.spill() into two exception
> handlers - one for getSpillFile() errors, and one for actual writing errors
> (when we know mSpillFiles has been added to).
> If this latter point isnt coherent, I can create patch.
> Ta muchly.
> C
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.