[jira] Created: (PIG-1427) Monitor and kill runaway UDFs
Monitor and kill runaway UDFs - Key: PIG-1427 URL: https://issues.apache.org/jira/browse/PIG-1427 Project: Pig Issue Type: New Feature Affects Versions: 0.8.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy As a safety measure, it is sometimes useful to monitor UDFs as they execute. It is often preferable to time out a runaway evaluation and return null or some other default value rather than let it kill the job. We have in the past seen complex regular expressions lead to job failures due to just half a dozen (out of millions) particularly obnoxious strings. It would be great to give Pig users a lightweight way of enabling UDF monitoring. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-766) java.lang.OutOfMemoryError: Java heap space
[ https://issues.apache.org/jira/browse/PIG-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871719#action_12871719 ] Dirk Schmid commented on PIG-766: - {quote}1. Are you getting the exact same stack trace as mentioned in the jira?{quote} Yes the same and some similar traces:
{noformat}
java.lang.OutOfMemoryError: Java heap space
    at java.util.Arrays.copyOf(Arrays.java:2786)
    at java.io.ByteArrayOutputStream.write(ByteArrayOutputStream.java:94)
    at java.io.DataOutputStream.write(DataOutputStream.java:90)
    at java.io.FilterOutputStream.write(FilterOutputStream.java:80)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:279)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.data.DefaultAbstractBag.write(DefaultAbstractBag.java:249)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:214)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.data.DataReaderWriter.writeDatum(DataReaderWriter.java:209)
    at org.apache.pig.data.DefaultTuple.write(DefaultTuple.java:264)
    at org.apache.pig.impl.io.PigNullableWritable.write(PigNullableWritable.java:123)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:90)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableSerializer.serialize(WritableSerialization.java:77)
    at org.apache.hadoop.mapred.IFile$Writer.append(IFile.java:179)
    at org.apache.hadoop.mapred.Task$CombineOutputCollector.collect(Task.java:880)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner$OutputConverter.write(Task.java:1201)
    at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:199)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:51)
    at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:176)
    at org.apache.hadoop.mapred.Task$NewCombinerRunner.combine(Task.java:1222)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2563)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2501)
java.lang.OutOfMemoryError: Java heap space
    at org.apache.pig.data.DefaultTuple.<init>(DefaultTuple.java:58)
    at org.apache.pig.data.DefaultTupleFactory.newTuple(DefaultTupleFactory.java:35)
    at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:61)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DefaultAbstractBag.readFields(DefaultAbstractBag.java:263)
    at org.apache.pig.data.DataReaderWriter.bytesToBag(DataReaderWriter.java:71)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:145)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DataReaderWriter.bytesToTuple(DataReaderWriter.java:63)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:142)
    at org.apache.pig.data.DataReaderWriter.readDatum(DataReaderWriter.java:136)
    at org.apache.pig.data.DefaultTuple.readFields(DefaultTuple.java:284)
    at org.apache.pig.impl.io.PigNullableWritable.readFields(PigNullableWritable.java:114)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:67)
    at org.apache.hadoop.io.serializer.WritableSerialization$WritableDeserializer.deserialize(WritableSerialization.java:40)
    at org.apache.hadoop.mapreduce.ReduceContext.nextKeyValue(ReduceContext.java:116)
    at org.apache.hadoop.mapreduce.ReduceContext$ValueIterator.next(ReduceContext.java:163)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POCombinerPackage.getNext(POCombinerPackage.java:155)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POMultiQueryPackage.getNext(POMultiQueryPackage.java:242)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:170)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:161) at
{noformat}
[jira] Updated: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword
[ https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jeff Zhang updated PIG-1249: Attachment: PIG_1249_3.patch Updated the patch to include a test case for non-DFS input and to do path checking when estimating. Safe-guards against misconfigured Pig scripts without PARALLEL keyword -- Key: PIG-1249 URL: https://issues.apache.org/jira/browse/PIG-1249 Project: Pig Issue Type: Improvement Affects Versions: 0.8.0 Reporter: Arun C Murthy Assignee: Jeff Zhang Priority: Critical Fix For: 0.8.0 Attachments: PIG-1249.patch, PIG_1249_2.patch, PIG_1249_3.patch It would be *very* useful for Pig to have safe-guards against naive scripts which process a *lot* of data without the use of the PARALLEL keyword. We've seen a fair number of instances where naive users process huge data-sets (10TB) with a badly mis-configured number of reduces, e.g. 1 reduce.
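The estimation mentioned in the patch can be illustrated with a size-based heuristic; the 1 GB-per-reducer figure and the 999-reducer cap below are illustrative assumptions for the sketch, not numbers taken from PIG-1249:

```java
// Sketch of size-based reducer estimation for scripts that omit PARALLEL.
// BYTES_PER_REDUCER and MAX_REDUCERS are assumed defaults, chosen only to
// make the idea concrete.
public final class ReducerEstimator {
    static final long BYTES_PER_REDUCER = 1L << 30; // 1 GB per reducer (assumed)
    static final int MAX_REDUCERS = 999;            // safety cap (assumed)

    public static int estimate(long totalInputBytes) {
        // Round up, then clamp into [1, MAX_REDUCERS].
        long n = (totalInputBytes + BYTES_PER_REDUCER - 1) / BYTES_PER_REDUCER;
        return (int) Math.min(Math.max(1L, n), MAX_REDUCERS);
    }

    public static void main(String[] args) {
        // A 10 TB input would want far more than 1 reducer.
        System.out.println(estimate(10L << 40)); // 999 (capped)
    }
}
```

With such a guard in place, the misconfigured "10 TB through 1 reduce" case from the description would be caught automatically whenever the user leaves PARALLEL unset.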
[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1381: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Manual test passes. Patch committed. Thanks V.V.Chaitanya! Need a way for Pig to take an alternative property file --- Key: PIG-1381 URL: https://issues.apache.org/jira/browse/PIG-1381 Project: Pig Issue Type: Improvement Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: V.V.Chaitanya Krishna Fix For: 0.8.0 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, PIG-1381-4.patch, PIG-1381-5.patch, PIG-1381_cli_1.patch, PIG-1381_cli_2.patch Currently, Pig reads the first pig.properties found in the CLASSPATH. Pig ships a default pig.properties, and if the user has a different pig.properties there will be a conflict, since we can only read one. There are a couple of ways to solve it: 1. Give a command line option for the user to pass an additional property file. 2. Rename the default pig.properties to pig-default.properties, so the user can supply a pig.properties to override it. 3. Further, we could consider using pig-default.xml/pig-site.xml, which seems more natural for the hadoop community. If so, we shall provide backward compatibility to also read pig.properties and pig-cluster-hadoop-site.xml.
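Option 2 above (a shipped pig-default.properties overridden by a user pig.properties) mirrors Hadoop's default/site layering; a minimal sketch of that behavior, with the file names taken from the proposal and the class itself purely illustrative:

```java
import java.io.InputStream;
import java.util.Properties;

// Sketch of layered property loading: defaults load first, then an
// optional user file overrides them. Not Pig code; just the shape of
// option 2 from the issue description.
public final class LayeredProps {

    // Pure merge step: user entries shadow default entries.
    public static Properties merge(Properties defaults, Properties user) {
        Properties merged = new Properties();
        merged.putAll(defaults);
        merged.putAll(user); // later put wins, so user values override
        return merged;
    }

    // Load both files from the classpath; either may be absent.
    public static Properties load(ClassLoader cl) throws Exception {
        Properties defaults = new Properties();
        Properties user = new Properties();
        try (InputStream in = cl.getResourceAsStream("pig-default.properties")) {
            if (in != null) defaults.load(in);
        }
        try (InputStream in = cl.getResourceAsStream("pig.properties")) {
            if (in != null) user.load(in);
        }
        return merge(defaults, user);
    }
}
```

This keeps the "only one pig.properties on the CLASSPATH" limitation from mattering: the default file has a distinct name, so a user-supplied pig.properties can never collide with it.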
[jira] Updated: (PIG-1419) Remove user.name from JobConf
[ https://issues.apache.org/jira/browse/PIG-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1419: Status: Resolved (was: Patch Available) Hadoop Flags: [Reviewed] Resolution: Fixed Manual test passes. Patch committed. Remove user.name from JobConf --- Key: PIG-1419 URL: https://issues.apache.org/jira/browse/PIG-1419 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1419-1.patch, PIG-1419-2.patch With Hadoop security, Hadoop uses the Kerberos id instead of the Unix id. Pig should not set the user.name entry in the jobconf; that should be decided by Hadoop.
[jira] Updated: (PIG-1347) Clear up output directory for a failed job
[ https://issues.apache.org/jira/browse/PIG-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1347: Attachment: PIG-1347-1.patch Removed redundant code. Clear up output directory for a failed job -- Key: PIG-1347 URL: https://issues.apache.org/jira/browse/PIG-1347 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Ashitosh Darbarwar Fix For: 0.8.0 Attachments: PIG-1347-1.patch FileLocalizer.deleteOnFail is supposed to track the output files that need to be deleted in case the job fails. However, in the current code base, deleteOnFail is dangling: registerDeleteOnFail and triggerDeleteOnFail are called by nobody. We need to bring it back.
[jira] Updated: (PIG-1347) Clear up output directory for a failed job
[ https://issues.apache.org/jira/browse/PIG-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1347: Attachment: (was: PIG-1347-1.patch) Clear up output directory for a failed job -- Key: PIG-1347 URL: https://issues.apache.org/jira/browse/PIG-1347 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Ashitosh Darbarwar Fix For: 0.8.0 Attachments: PIG-1347-1.patch FileLocalizer.deleteOnFail is supposed to track the output files that need to be deleted in case the job fails. However, in the current code base, deleteOnFail is dangling: registerDeleteOnFail and triggerDeleteOnFail are called by nobody. We need to bring it back.
[jira] Updated: (PIG-1347) Clear up output directory for a failed job
[ https://issues.apache.org/jira/browse/PIG-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1347: Attachment: PIG-1347-1.patch Clear up output directory for a failed job -- Key: PIG-1347 URL: https://issues.apache.org/jira/browse/PIG-1347 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Ashitosh Darbarwar Fix For: 0.8.0 Attachments: PIG-1347-1.patch FileLocalizer.deleteOnFail is supposed to track the output files that need to be deleted in case the job fails. However, in the current code base, deleteOnFail is dangling: registerDeleteOnFail and triggerDeleteOnFail are called by nobody. We need to bring it back.
[jira] Commented: (PIG-1419) Remove user.name from JobConf
[ https://issues.apache.org/jira/browse/PIG-1419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871845#action_12871845 ] Pradeep Kamath commented on PIG-1419: - +1 Remove user.name from JobConf --- Key: PIG-1419 URL: https://issues.apache.org/jira/browse/PIG-1419 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.8.0 Attachments: PIG-1419-1.patch, PIG-1419-2.patch With Hadoop security, Hadoop uses the Kerberos id instead of the Unix id. Pig should not set the user.name entry in the jobconf; that should be decided by Hadoop.
[jira] Commented: (PIG-1347) Clear up output directory for a failed job
[ https://issues.apache.org/jira/browse/PIG-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871862#action_12871862 ] Ashutosh Chauhan commented on PIG-1347: --- The patch is pretty straightforward and harmless, as it only removes code and does not add anything new. The only concern I have is that FileLocalizer.registerDeleteOnFail() is a public method, so it's possible that someone using Pig's Java API was previously using it to do the cleanup himself; this could be considered a backward-incompatible change. But Daniel explained to me that this method was meant for Pig's internal usage, and cleanup was in any case taken care of by Pig before the recent store func changes, so users did not need to worry about it. It's therefore extremely unlikely that someone is using it. +1 on committing. Clear up output directory for a failed job -- Key: PIG-1347 URL: https://issues.apache.org/jira/browse/PIG-1347 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Ashitosh Darbarwar Fix For: 0.8.0 Attachments: PIG-1347-1.patch FileLocalizer.deleteOnFail is supposed to track the output files that need to be deleted in case the job fails. However, in the current code base, deleteOnFail is dangling: registerDeleteOnFail and triggerDeleteOnFail are called by nobody. We need to bring it back.
[jira] Resolved: (PIG-1347) Clear up output directory for a failed job
[ https://issues.apache.org/jira/browse/PIG-1347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-1347. - Hadoop Flags: [Reviewed] Resolution: Fixed Patch committed. Clear up output directory for a failed job -- Key: PIG-1347 URL: https://issues.apache.org/jira/browse/PIG-1347 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Ashitosh Darbarwar Fix For: 0.8.0 Attachments: PIG-1347-1.patch FileLocalizer.deleteOnFail is supposed to track the output files that need to be deleted in case the job fails. However, in the current code base, deleteOnFail is dangling: registerDeleteOnFail and triggerDeleteOnFail are called by nobody. We need to bring it back.
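The registerDeleteOnFail/triggerDeleteOnFail pair described in this issue amounts to a registry of output paths that gets flushed when a job fails. A self-contained sketch using local files; the method names echo FileLocalizer's, but the class and mechanics here are illustrative, not the committed patch:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Sketch: output locations registered before the job runs are deleted
// if the job fails, so a failed run leaves no partial output behind.
public final class DeleteOnFail {
    private final Set<Path> outputs = new HashSet<>();

    // Called when an output location is created for a job.
    public void registerDeleteOnFail(Path p) {
        outputs.add(p);
    }

    // Called from the failure path of job execution.
    public void triggerDeleteOnFail() {
        for (Path p : outputs) {
            try {
                Files.deleteIfExists(p);
            } catch (IOException e) {
                // best-effort cleanup; a leftover file is better than a crash here
            }
        }
        outputs.clear();
    }
}
```

The bug in the issue is precisely that nothing on the failure path ever calls the trigger, so the registry (however it is implemented) is dead code.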
Does EvalFunc always generate the entire bag?
Hey guys, how are bags passed to EvalFunc stored? I was looking at the Accumulator interface, and it says the reason it is needed for COUNT and SUM is that EvalFunc always gives you the entire bag when it is run on a bag. I always thought that if I did COUNT(TABLE) or SUM(TABLE.FIELD), the code inside that does for (Tuple entry : inputDataBag) { stuff } was an actual iterator that walked the bag sequentially, without necessarily holding the entire bag in memory at once. It's an iterator, after all, so there's no way to do anything other than stream through it. I'm looking at this because Accumulator has no way of telling Pig "I've seen enough": it streams through the entire bag no matter what happens. Hypothetically speaking, if I were writing a "5th item of a sorted bag" UDF, then after I see the 5th item of a 5-million-entry bag I want to stop executing if possible. Is there an easy way to make this happen?
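The "stop after the 5th item" idea from the question can be sketched over a plain Java iterator. Unlike the Accumulator contract described above, pull-style iteration lets the consumer simply stop pulling; the class below is illustrative (a real Pig UDF would iterate a DataBag of Tuples instead of a generic Iterable):

```java
import java.util.Iterator;
import java.util.List;

// Sketch of a "5th item of a bag" computation that stops early:
// once the fifth element is seen, the rest of the bag is never read.
public final class FifthItem {
    public static <T> T fifth(Iterable<T> bag) {
        int seen = 0;
        for (Iterator<T> it = bag.iterator(); it.hasNext(); ) {
            T t = it.next();
            if (++seen == 5) {
                return t; // early exit: no need to stream the remaining items
            }
        }
        return null; // bag had fewer than five items
    }

    public static void main(String[] args) {
        System.out.println(fifth(List.of(10, 20, 30, 40, 50, 60))); // 50
    }
}
```

The catch, as the question notes, is that Accumulator's accumulate/getValue/cleanup cycle offers no equivalent of this early return: Pig keeps feeding batches until the bag is exhausted.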
[jira] Commented: (PIG-1424) Error logs of streaming should not be placed in output location
[ https://issues.apache.org/jira/browse/PIG-1424?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871902#action_12871902 ] Ashutosh Chauhan commented on PIG-1424: --- Till we figure out a proper solution for this, one possibility is to wrap the code in my previous comment in a try-catch block. That will unblock PIG-1229 for commit. We can leave this ticket open if we feel there is a need for a better solution. Error logs of streaming should not be placed in output location --- Key: PIG-1424 URL: https://issues.apache.org/jira/browse/PIG-1424 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.7.0 Reporter: Ashutosh Chauhan Fix For: 0.8.0 This becomes a problem when the output location is anything other than a filesystem. Output will be written to the DB, but where should the logs generated by streaming go? Clearly, they can't be written into the DB. This blocks PIG-1229, which introduces writing to a DB from Pig.
[jira] Commented: (PIG-928) UDFs in scripting languages
[ https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872007#action_12872007 ] Arnab Nandi commented on PIG-928: - Thanks for looking into the patch, Ashutosh! Very good question; short answer: I couldn't come up with an elegant solution using {{define}} :) I spent a bunch of time thinking about the right thing to do before going this way. As Woody mentioned, my initial instinct was to do this in {{define}}, but I kept hitting roadblocks when working with {{define}}: # I came up with the analogy that register is like import in java, and define is like alias in bash. In this interpretation, whenever you want to introduce new code, you {{register}} it with Pig. Whenever you want to alias anything for convenience or to add meta-information, you {{define}} it. # Define is not amenable to multiple functions in the same script. #* For example, to follow the {{stream}} convention, {quote} \{define X 'x.py' [inputoutputspec][schemaspec];\}. {quote} Which function is the input/output spec for? A solution like {quote} \{[func1():schemaspec1,func2:schemaspec2]} {quote} is... ugly. #* Further, how do we access these functions? One solution is to have the namespace as a codeblock, e.g. X.func1(), which is doable by registering functions as X.func1, but we're (mis)leading the user to believe there is some sort of real namespacing going on. I foresee multi-function files as a very common use case; people could have a util.py with their commonly used suite of functions instead of being forced into 1 file per 2-3 line function. #* Note that Julien's @decorator idea cleanly solves this problem, and I think it'll work for all languages. 
# With inline {{define}}, most languages already have a convention for declaring a function: the function name, its inputs, and its return value. It seems redundant to force the user to break this convention and have something like {quote} \{define x as script('def X(a,b): return a + b;');}, {quote} and have x.X(). Lambdas can solve this problem halfway, but you'll then need to worry about the schema spec, and we're back at a kludgy solution! # My plan for inline functions is to write them all to a temp file (1 per script engine) and then deal with them as a registered file. # Jython code runs in its own interpreter because I couldn't figure out how to load Jython bytecode into Java; this has something to do with the lack of a jythonc, afaik (I may be wrong). There will be one interpreter per non-compilable script engine; for others (Janino, Groovy), we load the class directly into the runtime. # From a code-writing perspective, overloading {{define}} to tack on a third use-case would involve an overhaul of the POStream physical operator and felt very inelegant; register, on the other hand, is well contained to a single purpose -- including files for UDFs. # Consider the use of Janino as a ScriptEngine. Unlike the Jython script engine, this loads java UDFs into the native runtime and doesn't translate objects, so we're looking at potentially _zero_ loss of performance for inline UDFs (or register 'UDF.java'; ). The difference between native and script code gets blurry here... [tl;dr] ...and then I thought fair enough, let's just go with {{register}}! :D UDFs in scripting languages --- Key: PIG-928 URL: https://issues.apache.org/jira/browse/PIG-928 Project: Pig Issue Type: New Feature Reporter: Alan Gates Fix For: 0.8.0 Attachments: calltrace.png, package.zip, pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, scripting.tgz, scripting.tgz, test.zip It should be possible to write UDFs in scripting languages such as python, ruby, etc. 
This frees users from needing to compile Java, generate a jar, etc. It also opens Pig to programmers who prefer scripting languages over Java.
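The "X.func1()" pseudo-namespacing discussed in the comment above can be sketched as a flat registry keyed by dotted names, which makes the "(mis)leading the user" point concrete: there is no real namespace, just a naming convention. The class and API below are illustrative, not Pig code:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Sketch: functions from one script file are registered under
// "<file-alias>.<name>" keys. This is a flat map dressed up with dots,
// not true namespacing; looking up an unknown name just fails.
public final class UdfRegistry {
    private final Map<String, Function<Object[], Object>> fns = new HashMap<>();

    public void register(String alias, String name, Function<Object[], Object> fn) {
        fns.put(alias + "." + name, fn); // e.g. "util.concat"
    }

    public Object call(String qualifiedName, Object... args) {
        Function<Object[], Object> fn = fns.get(qualifiedName);
        if (fn == null) {
            throw new IllegalArgumentException("no UDF registered as " + qualifiedName);
        }
        return fn.apply(args);
    }
}
```

Under this scheme a util.py with many small functions registers each one under the same alias prefix, which is the multi-function-file use case the comment predicts will be common.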
[jira] Commented: (PIG-1427) Monitor and kill runaway UDFs
[ https://issues.apache.org/jira/browse/PIG-1427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12872031#action_12872031 ] Ashutosh Chauhan commented on PIG-1427: --- A useful feature. A couple of comments: 1. Currently, in case of timeouts and errors you always return null. It would be useful if the user could specify, in the definition of his annotation, a default return value to be returned in those cases. For example, if my regex fails on an input String, I want to get an empty String back. Something like: {code} @MonitoredUDF(timeUnit = TimeUnit.MILLISECONDS, duration = 500, defaultReturnValue = ) {code} 2. It seems that the PigHadoopLogger.getReporter() method accidentally got removed in 0.7 and trunk. This needs to be restored. It would be really cool to see on the UI how many of my input records are faulty. Since it is a small change, I think you can add that getter method back in and then update the appropriate counters. Monitor and kill runaway UDFs - Key: PIG-1427 URL: https://issues.apache.org/jira/browse/PIG-1427 Project: Pig Issue Type: New Feature Affects Versions: 0.8.0 Reporter: Dmitriy V. Ryaboy Assignee: Dmitriy V. Ryaboy Attachments: monitoredUdf.patch As a safety measure, it is sometimes useful to monitor UDFs as they execute. It is often preferable to time out a runaway evaluation and return null or some other default value rather than let it kill the job. We have in the past seen complex regular expressions lead to job failures due to just half a dozen (out of millions) particularly obnoxious strings. It would be great to give Pig users a lightweight way of enabling UDF monitoring.
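The monitoring-plus-default-value idea can be sketched with plain JDK concurrency primitives, independent of the actual monitoredUdf.patch: run the evaluation in an executor and substitute the caller's default on timeout. Names below are illustrative; the real patch would wrap the UDF's exec call rather than take a Callable:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of timeout-guarded evaluation with a caller-supplied default,
// the behavior suggested in comment 1 above.
public final class MonitoredCall {
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r);
        t.setDaemon(true); // runaway evaluations must not keep the JVM alive
        return t;
    });

    public static <T> T callWithTimeout(Callable<T> udf, long millis, T defaultValue) {
        Future<T> f = POOL.submit(udf);
        try {
            return f.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            f.cancel(true);      // interrupt the runaway evaluation
            return defaultValue; // e.g. "" for a regex that never finishes
        } catch (Exception e) {
            return defaultValue; // errors also map to the default, per the comment
        }
    }
}
```

With this shape, the half-dozen pathological strings from the issue description cost one timeout each instead of a failed job.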