[jira] Commented: (PIG-1249) Safe-guards against misconfigured Pig scripts without PARALLEL keyword

2010-07-12 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887283#action_12887283
 ] 

Ashutosh Chauhan commented on PIG-1249:
---

The Map-reduce framework has a jira related to this issue: 
https://issues.apache.org/jira/browse/MAPREDUCE-1521. It has two implications 
for Pig:

1) We need to reconsider whether we still want Pig to set the number of 
reducers on the user's behalf. We can choose not to guess the number of 
reducers and instead let the framework fail any job that doesn't specify the 
number of reducers correctly. Pig would then be out of this guessing game, and 
users would be forced by the framework to specify the number of reducers 
correctly. 

2) Now that the MR framework will fail jobs based on configured limits, 
operators where Pig does compute and set the number of reducers (such as 
skewed join) should now be aware of those limits, so that the number of 
reducers they compute falls within them.
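A minimal sketch of the second point, with hypothetical names (this is not Pig's actual API): any operator that computes its own parallelism would clamp the estimate to the framework's configured limit before submitting the job.

```python
# Hypothetical sketch: if the MR framework enforces a reducer limit
# (MAPREDUCE-1521), operators that compute their own parallelism
# (e.g. skewed join) would clamp their estimate to that limit.
# Function and parameter names here are illustrative, not Pig's.

def clamp_reducers(estimated, framework_max):
    """Keep a computed reducer count within the framework's limit."""
    if framework_max is not None and estimated > framework_max:
        return framework_max
    return max(1, estimated)  # never request fewer than one reducer
```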

 Safe-guards against misconfigured Pig scripts without PARALLEL keyword
 --

 Key: PIG-1249
 URL: https://issues.apache.org/jira/browse/PIG-1249
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Arun C Murthy
Assignee: Jeff Zhang
Priority: Critical
 Fix For: 0.8.0

 Attachments: PIG-1249-4.patch, PIG-1249.patch, PIG_1249_2.patch, 
 PIG_1249_3.patch


 It would be *very* useful for Pig to have safe-guards against naive scripts 
 which process a *lot* of data without the use of the PARALLEL keyword.
 We've seen a fair number of instances where naive users process huge 
 data-sets (10TB) with a badly mis-configured number of reduces, e.g. 1 reduce. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-12 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Patch Available  (was: Open)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-12 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Attachment: (was: RegisterPythonUDF2.patch)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1490) Make Pig storers work with remote HDFS in secure mode

2010-07-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887424#action_12887424
 ] 

Daniel Dai commented on PIG-1490:
-

+1

 Make Pig storers work with remote HDFS in secure mode
 -

 Key: PIG-1490
 URL: https://issues.apache.org/jira/browse/PIG-1490
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1490.patch


 PIG-1403 fixed the problem for Pig loaders. We need to do the same for Pig 
 storers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter

2010-07-12 Thread Swati Jain (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887425#action_12887425
 ] 

Swati Jain commented on PIG-1494:
-

Reply from Yan Zhou:

The filter logic split problem can be divided into 2 parts:
1) the filtering logic that can be applied to individual input sources;
and 2) the filtering logic that has to be applied when merged(or joined)
inputs are processed.

The benefits for 1) are any benefits from the underlying storage supporting
predicate pushdown, plus the memory/CPU savings in Pig from not
processing the unqualified rows.

For 2), the purpose is to avoid paying higher evaluation costs than
necessary.

For 1), no normal form is necessary. The original logical expression
tree can be trimmed of any sub-expressions that are neither constants
nor drawn from a single input source. The complexity is linear in the
tree size, while the use of a normal form could potentially lead to
exponential complexity. The difficulty with this approach is how to
generate the filtering logic for 2), while CNF can be used to easily
figure out the logic for 2). However, the exact logic in 2) might not be
cheaper to evaluate than the original logical expression. An example is
Filter J2 by ((C1 > 10) AND (a3+b3 > 10)) OR ((C2 == 5) AND (a2+b2 > 5)).
In 2) the filtering logic after CNF will be ((C1 > 10) OR (a2+b2 > 5)) AND
((a3+b3 > 10) OR (C2 == 5)) AND ((a3+b3 > 10) OR (a2+b2 > 5)). The cost
will be 5 logical evaluations (3 ORs plus 2 ANDs), which could be reduced
to 4, compared with 3 logical evaluations in the original form.

In summary, if only 1) is desired, the tree trimming is enough. If 2) is
desired too, then CNF could be used but its complexity should be
controlled and the cost of the filtering logic evaluation in 2) should
be computed and compared with the original expression evaluation cost.
Further optimization is possible in this direction.

Another potential optimization to consider is to support logical
expression trees with multiple children, as opposed to the binary tree,
by taking into consideration the commutative property of the OR and AND
operations. The advantages are lower tree-traversal costs and easier
reordering of the evaluation within the same sub-tree, in order to
maximize the opportunities to short-circuit the evaluation. Although
this is general for all logical expressions, it tends to be more
suitable for normal-form handling, since normal forms group the
sub-expressions by the operators that act on them.
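The multi-child idea can be sketched as follows; this is an illustrative Python model (not Pig's LogicalExpression classes), representing expressions as nested tuples and absorbing same-operator children using associativity:

```python
# Illustrative sketch (not Pig code): flatten a binary AND/OR expression
# tree into multi-child nodes so that fewer nodes are traversed and
# children can later be reordered to maximize short-circuiting.
# Expressions are tuples: ('and', a, b), ('or', a, b), or a leaf string.

def flatten(expr):
    if not isinstance(expr, tuple):
        return expr  # leaf predicate
    op = expr[0]
    children = []
    for child in expr[1:]:
        c = flatten(child)
        if isinstance(c, tuple) and c[0] == op:
            children.extend(c[1:])  # absorb a same-operator child
        else:
            children.append(c)
    return (op,) + tuple(children)
```

For example, `('and', ('and', 'p', 'q'), 'r')` flattens to the three-child node `('and', 'p', 'q', 'r')`.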

 PIG Logical Optimization: Use CNF in PushUpFilter
 -

 Key: PIG-1494
 URL: https://issues.apache.org/jira/browse/PIG-1494
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Swati Jain
Priority: Minor
 Fix For: 0.8.0


 The PushUpFilter rule is not able to handle complicated boolean expressions.
 For example, the SplitFilter rule splits one LOFilter into two by AND. 
 However, it will not be able to split an LOFilter if the top-level operator 
 is OR. For example:
 *ex script:*
 A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
 B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
 C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
 J1 = JOIN B by b1, C by c1;
 J2 = JOIN J1 by $0, A by a1;
 D = *Filter J2 by ( (c1 > 10) AND (a3+b3 > 10) ) OR (c2 == 5);*
 explain D;
 In the above example, the PushUpFilter is not able to push any filter 
 condition across any join, as the condition contains columns from all 
 branches (inputs). But if we convert this expression into Conjunctive Normal 
 Form (CNF), then we would be able to push the filter conditions c1 > 10 and 
 c2 == 5 below both join conditions. Here is the CNF expression for the 
 highlighted line:
 ( (c1 > 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )
 *Suggestion:* It would be a good idea to convert LOFilter's boolean 
 expression into CNF; it would then be easy to push parts (conjuncts) of the 
 LOFilter boolean expression selectively. We would also no longer need the 
 SplitFilter rule if we were to add this utility to the PushUpFilter rule 
 itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1295) Binary comparator for secondary sort

2010-07-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887420#action_12887420
 ] 

Daniel Dai commented on PIG-1295:
-

More clarification on custom Tuples. There are two cases for a custom tuple:
1. The user creates a custom tuple inside a UDF. In this case, we do not have 
a special serialized format for the custom tuple. After serialization, we 
cannot tell whether it is a custom tuple. That is to say, we lose track of the 
tuple implementation after se/des. Since the serialized format is the same, we 
can still use the same raw comparator.
2. If the user uses a custom tuple factory (by overriding 
pig.data.tuple.factory.name), then the serialized format may change. If we 
detect that the tuple factory is not BinSedesTupleFactory, we shall not use 
this raw comparator.

 Binary comparator for secondary sort
 

 Key: PIG-1295
 URL: https://issues.apache.org/jira/browse/PIG-1295
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Gianmarco De Francisci Morales
 Fix For: 0.8.0

 Attachments: PIG-1295_0.1.patch, PIG-1295_0.2.patch, 
 PIG-1295_0.3.patch, PIG-1295_0.4.patch, PIG-1295_0.5.patch, 
 PIG-1295_0.6.patch, PIG-1295_0.7.patch, PIG-1295_0.8.patch


 When the hadoop framework does the sorting, it will try to use the binary 
 version of the comparator if available. The benefit of a binary comparator is 
 that we do not need to instantiate the object before we compare. We saw a 
 ~30% speedup after switching to a binary comparator. Currently, Pig uses a 
 binary comparator in the following cases:
 1. When the semantics of the order don't matter. For example, in distinct, we 
 need to sort in order to filter out duplicate values; however, we do not care 
 how the comparator sorts keys. Group-by also shares this characteristic. In 
 this case, we rely on hadoop's default binary comparator.
 2. When the semantics of the order matter, but the key is of a simple type. 
 In this case, we have implementations for simple types such as integer, long, 
 float, chararray, databytearray, and string.
 However, if the key is a tuple and the sort semantics matter, we do not have 
 a binary comparator implementation. This especially matters when we switch to 
 secondary sort. In secondary sort, we convert the inner sort of a nested 
 foreach into the secondary key and rely on hadoop to sort on both the main 
 key and the secondary key. The sorting key becomes a two-item tuple. Since 
 the secondary key is the sorting key of the nested foreach, the sorting 
 semantics matter. It turns out we do not have a binary comparator once we use 
 secondary sort, and we see a significant slowdown.
 A binary comparator for tuples should be doable once we understand the binary 
 structure of the serialized tuple. We can focus on the most common use case 
 first, which is a group-by followed by a nested sort. In this case, we will 
 use secondary sort. The semantics of the first key do not matter, but the 
 semantics of the secondary key do. We need to identify the boundary between 
 the main key and the secondary key in the binary tuple buffer without 
 instantiating the tuple itself. Then, if the first keys are equal, we use a 
 binary comparator to compare the secondary keys. The secondary key can also 
 be a complex data type, but for the first step, we focus on simple secondary 
 keys, which are the most common use case.
 We mark this issue as a candidate project for the Google Summer of Code 2010 
 program. 
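A toy model of the idea, under the assumption of a simplified serialization format (two big-endian 32-bit integer keys with the sign bit flipped at write time; this is not Pig's actual BinSedes layout): the comparator orders records by main key, then secondary key, directly on the raw bytes, without instantiating any tuple.

```python
# Illustrative sketch (not Pig's implementation): compare two serialized
# (main_key, secondary_key) records without deserializing them. Flipping
# the sign bit at write time makes unsigned byte-wise order match signed
# numeric order, so slices of the buffer can be compared directly.
import struct

def serialize(main_key, secondary_key):
    # Flip the sign bit so that byte-wise order == numeric order.
    return struct.pack(">II",
                       (main_key ^ 0x80000000) & 0xFFFFFFFF,
                       (secondary_key ^ 0x80000000) & 0xFFFFFFFF)

def raw_compare(a, b):
    # Main key occupies bytes 0..3; compare it first, then the secondary key.
    if a[:4] != b[:4]:
        return -1 if a[:4] < b[:4] else 1
    if a[4:8] != b[4:8]:
        return -1 if a[4:8] < b[4:8] else 1
    return 0
```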

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-928) UDFs in scripting languages

2010-07-12 Thread Aniket Mokashi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aniket Mokashi updated PIG-928:
---

Status: Open  (was: Patch Available)

 UDFs in scripting languages
 ---

 Key: PIG-928
 URL: https://issues.apache.org/jira/browse/PIG-928
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Aniket Mokashi
 Fix For: 0.8.0

 Attachments: calltrace.png, package.zip, PIG-928.patch, 
 pig-greek.tgz, pig.scripting.patch.arnab, pyg.tgz, RegisterPythonUDF3.patch, 
 RegisterPythonUDF4.patch, RegisterPythonUDF_Final.patch, 
 RegisterPythonUDFFinale.patch, RegisterPythonUDFFinale3.patch, 
 RegisterPythonUDFFinale4.patch, RegisterPythonUDFFinale5.patch, 
 RegisterScriptUDFDefineParse.patch, scripting.tgz, scripting.tgz, test.zip


 It should be possible to write UDFs in scripting languages such as python, 
 ruby, etc.  This frees users from needing to compile Java, generate a jar, 
 etc.  It also opens Pig to programmers who prefer scripting languages over 
 Java.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter

2010-07-12 Thread Swati Jain (JIRA)
PIG Logical Optimization: Use CNF in PushUpFilter
-

 Key: PIG-1494
 URL: https://issues.apache.org/jira/browse/PIG-1494
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Swati Jain
Priority: Minor
 Fix For: 0.8.0


The PushUpFilter rule is not able to handle complicated boolean expressions.

For example, the SplitFilter rule splits one LOFilter into two by AND. 
However, it will not be able to split an LOFilter if the top-level operator 
is OR. For example:

*ex script:*
A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
J1 = JOIN B by b1, C by c1;
J2 = JOIN J1 by $0, A by a1;
D = *Filter J2 by ( (c1 > 10) AND (a3+b3 > 10) ) OR (c2 == 5);*
explain D;
In the above example, the PushUpFilter is not able to push any filter 
condition across any join, as the condition contains columns from all branches 
(inputs). But if we convert this expression into Conjunctive Normal Form (CNF), 
then we would be able to push the filter conditions c1 > 10 and c2 == 5 below 
both join conditions. Here is the CNF expression for the highlighted line:

( (c1 > 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )

*Suggestion:* It would be a good idea to convert LOFilter's boolean expression 
into CNF; it would then be easy to push parts (conjuncts) of the LOFilter 
boolean expression selectively. We would also no longer need the SplitFilter 
rule if we were to add this utility to the PushUpFilter rule itself.
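The proposed CNF rewrite can be sketched in miniature; this is an illustrative Python model (not the Pig implementation), representing predicates as strings and distributing OR over AND:

```python
# Illustrative sketch (not Pig code): convert a boolean expression into
# CNF by distributing OR over AND. Expressions are nested tuples:
# ('and', l, r), ('or', l, r), or a leaf predicate string.

def to_cnf(expr):
    if not isinstance(expr, tuple):
        return expr
    op, left, right = expr[0], to_cnf(expr[1]), to_cnf(expr[2])
    if op == 'and':
        return ('and', left, right)
    # op == 'or': distribute OR over any AND child.
    if isinstance(left, tuple) and left[0] == 'and':
        return ('and', to_cnf(('or', left[1], right)),
                       to_cnf(('or', left[2], right)))
    if isinstance(right, tuple) and right[0] == 'and':
        return ('and', to_cnf(('or', left, right[1])),
                       to_cnf(('or', left, right[2])))
    return ('or', left, right)

# The filter condition from the example script:
# ((c1 > 10) AND (a3+b3 > 10)) OR (c2 == 5)
expr = ('or', ('and', 'c1 > 10', 'a3+b3 > 10'), 'c2 == 5')
# to_cnf(expr) yields ((c1 > 10) OR (c2 == 5)) AND ((a3+b3 > 10) OR (c2 == 5))
```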

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1490) Make Pig storers work with remote HDFS in secure mode

2010-07-12 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1490:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
Release Note: Committed to both trunk and 0.7 branch
  Resolution: Fixed

 Make Pig storers work with remote HDFS in secure mode
 -

 Key: PIG-1490
 URL: https://issues.apache.org/jira/browse/PIG-1490
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0, 0.7.0

 Attachments: PIG-1490.patch


 PIG-1403 fixed the problem for Pig loaders. We need to do the same for Pig 
 storers. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: PIG Logical Optimization: Use CNF in SplitFilter

2010-07-12 Thread Yan Zhou
Yes, I already implemented the NOT push down upfront, so you do not
need to do that.

 

The support of CNF will probably be the most difficult part. But as I
mentioned last time, you should compare the costs after trimming the CNF
to get the post-split filtering logic. Given the complexity of
manipulating CNF and the undetermined benefits, I am not sure whether it
should be in scope at this moment.

 

To handle CNF, I think it's a good idea to create a new plan and connect
the nodes in the new plan to the base plan as you envisioned. In my
changes, which use DNF instead of CNF but are otherwise similar in
processing, I use a LogicalExpressionProxy, which contains a source
member that is just the node in the original plan, to link the nodes in
the new plan and the old plan. The original LogicalExpression is
enhanced with a counter to track the number of proxies of the original
node, since normal-form creation will spread the nodes of the original
tree across many normalized nodes. The benefit, aside from not setting
the plan, is that the original expression is trimmed according to the
processing results from the DNF, while the DNF is created separately, as
a kind of utility, so that complex features can be used. In my changes,
I used a multiple-child tree in the DNF while not changing the original
binary expression tree structure. Another benefit is that the original
tree is kept as close as possible to what it was at the start, i.e., I
do not attempt to optimize its overall structure beyond trimming based
on the simplification logic. (I also limit the size of the DNF to 100
nodes.) The downside of this is added complexity.

 

But in your case, for scenario 2, which is the whole point of using CNF,
you would need to change the original expression tree structurally,
beyond trimming, to obtain the post-split filtering logic. The other
benefit of using a multiple-child expression depends on whether you plan
to support such expressions as a replacement for the current binary tree
in the final plan. Even though I think it's a good idea to support that,
it is not in my scope now.

 

I'll add my algorithm details soon to my jira. Please take a look and
comment as you see appropriate.

 

Thanks,

 

Yan

 

 



From: Swati Jain [mailto:swat...@aggiemail.usu.edu] 
Sent: Friday, July 09, 2010 11:00 PM
To: Yan Zhou
Cc: pig-dev@hadoop.apache.org
Subject: Re: PIG Logical Optimization: Use CNF in SplitFilter

 

Hi Yan,

I agree that the first scenario (filter logic applied to individual
input sources) doesn't need conversion to CNF and that it will be a good
idea to add CNF functionality for the second scenario. I was also
planning to provide a configurable threshold value to control the
complexity of CNF conversion.

As part of the above, I wrote a utility to push the NOT operator in
predicates below the AND and OR operators (Scenario 2 in PIG-1399).
I am considering making this NOT push-down utility a separate rule in
itself. Let me know if you have already implemented this.

While implementing this utility I am facing some trouble keeping the
OperatorPlan consistent as I rewrite the expression. This is because
each operator references the main filter logical plan. Here is my
current implementation approach:

1. I am creating a new LogicalExpressionPlan for the converted boolean
expression.
2. I am creating new logical expressions while pushing the NOT operation
down, converting AND into OR and OR into AND, and eliminating NOT NOT
pairs.
3. However, I am having trouble updating the LogicalExpressionPlan when
it reaches the base case (i.e. the root operator is not NOT, AND, or OR).

D = Filter J2 by ( (c2 == 5) OR ( NOT( (c1 > 10) AND (c3+b3 > 10) ) ) );

In the above, for example, I am not sure how to integrate the base
expression (c2 == 5) into the new LogicalExpressionPlan. There is no
routine to set the plan for a given operator and its children. Also,
there is currently no way to deepCopy an expression into a new
OperatorPlan. It would be great if you could give me some suggestions on
what approach to take for this.

One approach I thought of is to visit the base expression and create and
connect its nodes to the new LogicalExpressionPlan as I visit them.

Thoughts?
Swati

ps: About your other point regarding binary vs. multi-way trees: the way
I am creating the normal form is as a list of conjuncts, where each
conjunct is a list of disjuncts. This is logically similar to a
multi-way tree. However, the current modeling of boolean expressions (as
binary expressions) requires a conversion back to the binary tree model
when adding back to the main plan.
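For reference, the NOT push-down discussed above can be sketched on a toy tuple-based expression model (illustrative only; this is not the actual Pig utility or its LogicalExpressionPlan API):

```python
# Illustrative sketch: push NOT below AND/OR using De Morgan's laws and
# eliminate double negation. Expressions are nested tuples:
# ('not', e), ('and', l, r), ('or', l, r), or a leaf predicate string.

def push_not(expr, negate=False):
    if not isinstance(expr, tuple):
        return ('not', expr) if negate else expr  # base case: a predicate
    op = expr[0]
    if op == 'not':
        return push_not(expr[1], not negate)      # NOT NOT cancels out
    if negate:
        op = 'and' if op == 'or' else 'or'        # De Morgan: swap AND/OR
    return (op, push_not(expr[1], negate), push_not(expr[2], negate))
```

For example, `NOT(p AND q)` becomes `(NOT p) OR (NOT q)`, with NOT appearing only directly above leaf predicates.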

On Tue, Jul 6, 2010 at 12:46 PM, Yan Zhou y...@yahoo-inc.com wrote:

Swati,

I happen to be working on the logical expression simplification effort
(https://issues.apache.org/jira/browse/PIG-1399), but not on the filter
split front. So I guess our interests will have some overlaps.

I think the filter logic split problem can be divided into 2 parts:
1) the filtering logic that can be applied to individual input sources;

[jira] Commented: (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2010-07-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887441#action_12887441
 ] 

Thejas M Nair commented on PIG-1472:


bq. 1. The following code are never used in BinStorage and InterStorage, should 
be removed. 
I will remove that.

bq. 3. Seems InterStorage is a replacement for BinStorage, why do we make it 
private? Shall we encourage user use InterStorage in the place of BinStorage, 
and make BinStorage deprecate?
In the future, we are likely to find better ways to serialize data between the 
MR jobs of a pig query, i.e. the InterSedes serialization format is likely to 
change, and the change is not likely to be compatible with its old format. So 
it will not be suitable for storing persistent data. 
This replaces BinStorage only for its use within pig. Since BinStorage is used 
in pig queries and the code should be easy to maintain, I think we don't have 
to deprecate BinStorage.



 Optimize serialization/deserialization between Map and Reduce and between MR 
 jobs
 -

 Key: PIG-1472
 URL: https://issues.apache.org/jira/browse/PIG-1472
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.patch


 In certain types of pig queries, most of the execution time is spent 
 serializing/deserializing (sedes) records between Map and Reduce and between 
 MR jobs. 
 For example, if PigMix queries are modified to specify types for all the 
 fields in the load statement schema, some of the queries (L2, L3, L9, L10 in 
 pigmix v1) that have records with bags and maps transmitted across map or 
 reduce boundaries run a lot longer (a runtime increase of a few times has 
 been seen).
 There are a few optimizations that have been shown to improve the performance 
 of sedes in my tests:
 1. Use a smaller number of bytes to store the length of a column. For 
 example, if a bytearray is smaller than 255 bytes, a single byte can be used 
 to store the length instead of the integer that is currently used.
 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and 
 DataInput.readUTF. This reduces the cost of serialization by more than half. 
 Zebra and BinStorage are known to use the DefaultTuple sedes functionality. 
 The serialization format that these loaders use cannot change, so after the 
 optimization their format is going to be different from the format used 
 between M/R boundaries.
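Optimization 1 can be illustrated with a toy encoder (the type codes and layout here are made up, not the real InterSedes format): a one-byte length is used whenever the array is short enough, falling back to a 4-byte int otherwise.

```python
# Illustrative sketch (not Pig's InterSedes format): store the length of
# a byte array in one byte when it fits, instead of always spending four.
import struct

TINY = 0   # made-up type code: length fits in one unsigned byte
LONG = 1   # made-up type code: length needs a 4-byte int

def write_bytearray(data):
    if len(data) < 255:
        return struct.pack(">BB", TINY, len(data)) + data
    return struct.pack(">BI", LONG, len(data)) + data

def read_bytearray(buf):
    if buf[0] == TINY:
        n = buf[1]
        return buf[2:2 + n]
    n = struct.unpack(">I", buf[1:5])[0]
    return buf[5:5 + n]
```

A 3-byte array costs 5 bytes on the wire instead of 8; the saving compounds for records with many small columns.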

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2010-07-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1472:
---

Attachment: PIG-1472.4.patch

Removed unused static constants from InterStorage and BinStorage , addressing 
comment#1 from Daniel. 


 Optimize serialization/deserialization between Map and Reduce and between MR 
 jobs
 -

 Key: PIG-1472
 URL: https://issues.apache.org/jira/browse/PIG-1472
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.4.patch, 
 PIG-1472.patch


 In certain types of pig queries, most of the execution time is spent 
 serializing/deserializing (sedes) records between Map and Reduce and between 
 MR jobs. 
 For example, if PigMix queries are modified to specify types for all the 
 fields in the load statement schema, some of the queries (L2, L3, L9, L10 in 
 pigmix v1) that have records with bags and maps transmitted across map or 
 reduce boundaries run a lot longer (a runtime increase of a few times has 
 been seen).
 There are a few optimizations that have been shown to improve the performance 
 of sedes in my tests:
 1. Use a smaller number of bytes to store the length of a column. For 
 example, if a bytearray is smaller than 255 bytes, a single byte can be used 
 to store the length instead of the integer that is currently used.
 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and 
 DataInput.readUTF. This reduces the cost of serialization by more than half. 
 Zebra and BinStorage are known to use the DefaultTuple sedes functionality. 
 The serialization format that these loaders use cannot change, so after the 
 optimization their format is going to be different from the format used 
 between M/R boundaries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1436) Print number of records outputted at each step of a Pig script

2010-07-12 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12887446#action_12887446
 ] 

Richard Ding commented on PIG-1436:
---

Russell,

PIG-1478 implemented a callback mechanism that allows users to retrieve stats 
after each job. Will this meet your needs? 

 Print number of records outputted at each step of a Pig script
 --

 Key: PIG-1436
 URL: https://issues.apache.org/jira/browse/PIG-1436
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.8.0


 I often run a script multiple times, or have to go and look through Hadoop 
 task logs, to figure out where I broke a long script in such a way that I get 
 0 records out of it.  I think this is a common problem.
 If someone can point me in the right direction, I can make a pass at this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-884) Have a way to export RulePlan and other kinds of OperatorPlan to common representation (dot?) and import from dot to RulePlan

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-884.


Resolution: Fixed

The dot notation for explain was implemented as part of the Pig 0.3.0 work.

 Have a way to export RulePlan and other kinds of OperatorPlan to common 
 representation (dot?) and import from dot to RulePlan
 -

 Key: PIG-884
 URL: https://issues.apache.org/jira/browse/PIG-884
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Pradeep Kamath

 Have a way to export RulePlan and other kinds of OperatorPlan to common 
 representation (dot?) and import from dot to RulePlan

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-886) clone should be updated in LogicalOperators to include cloning of projection map information and any other information used by LogicalOptimizer

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-886.


Resolution: Fixed

This is no longer relevant after the optimizer re-work.

 clone should be updated in LogicalOperators to include cloning of projection 
 map information and any other information used by LogicalOptimizer
 ---

 Key: PIG-886
 URL: https://issues.apache.org/jira/browse/PIG-886
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Pradeep Kamath

 clone should be updated in LogicalOperators to include cloning of projection 
 map information and any other information used by LogicalOptimizer

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-900) ORDER BY syntax wrt parentheses is somewhat different than GROUP BY and FILTER BY

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-900:
---

Fix Version/s: 0.9.0

 ORDER BY syntax wrt parentheses is somewhat different than GROUP BY and 
 FILTER BY
 -

 Key: PIG-900
 URL: https://issues.apache.org/jira/browse/PIG-900
 Project: Pig
  Issue Type: Bug
Reporter: David Ciemiewicz
 Fix For: 0.9.0


 With GROUP BY, you must put parentheses around the aliases in the BY clause:
 {code}
 B = group A by ( a, b, c );
 {code}
 With FILTER BY, you can optionally put parentheses around the aliases in the 
 BY clause:
 {code}
 B = filter A by ( a is not null and b is not null and c is not null );
 {code}
 However, with ORDER BY, if you put parentheses around the BY clause, you get 
 a syntax error:
 {code}
  A = order A by ( a, b, c );
 {code}
 Produces the error:
 {code}
 2009-08-03 18:26:29,544 [main] ERROR org.apache.pig.tools.grunt.Grunt -
 ERROR 1000: Error during parsing. Encountered  , ,  at line 3, column 
 19.
 Was expecting:
 ) ...
 {code}
 This is an annoyance really.
 Here's my full code example ...
 {code}
 A = load 'data.txt' using PigStorage as (a: chararray, b: chararray, c: 
 chararray );
 A = order A by ( a, b, c );
 dump A;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-902) Allow schema matching for UDF with variable length arguments

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-902:
---

Fix Version/s: 0.9.0

 Allow schema matching for UDF with variable length arguments
 

 Key: PIG-902
 URL: https://issues.apache.org/jira/browse/PIG-902
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
 Fix For: 0.9.0


 Pig picks the right version of a UDF using a similarity measurement. This 
 mechanism picks the UDF whose declared input schema matches the actual input. 
 However, some UDFs take a variable number of inputs, and currently there is no 
 way to declare such an input schema in a UDF; the similarity measurement does 
 not match against a variable number of inputs. We can still write 
 variable-input UDFs, but we cannot rely on schema matching to pick the right 
 UDF version and do the automatic data type conversion.
 Eg:
 If we have:
 Integer udf1(Integer, ..);
 Integer udf1(String, ..);
 Currently we cannot do this:
 a: {chararray, chararray}
 b = foreach a generate udf1(a.$0, a.$1);  // Pig cannot pick the udf1(String, 
 ..) version automatically; currently, this statement fails
 Eg:
 If we have:
 Integer udf2(Integer, ..);
 Currently, this script fails:
 a: {chararray, chararray}
 b = foreach a generate udf2(a.$0, a.$1);  // Currently, Pig cannot convert 
 a.$0 into Integer automatically
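The gap can be illustrated with a small sketch. This is not Pig's actual matching code — the schema representation and the "..." varargs marker are made up for illustration — but it shows how a similarity score could handle a variable-length declaration:

```java
import java.util.List;

// Hypothetical sketch (not Pig's actual matcher): score each candidate input
// schema against the actual argument types, treating a trailing "..." entry
// as a variable-length tail that accepts any remaining arguments.
public class VarargMatcher {
    /** Index of the best-matching candidate schema for the given argument types, or -1. */
    public static int bestMatch(List<List<String>> candidates, List<String> args) {
        int best = -1, bestScore = -1;
        for (int i = 0; i < candidates.size(); i++) {
            int score = score(candidates.get(i), args);
            if (score > bestScore) { bestScore = score; best = i; }
        }
        return best;
    }

    private static int score(List<String> decl, List<String> args) {
        boolean varargs = !decl.isEmpty() && decl.get(decl.size() - 1).equals("...");
        int fixed = varargs ? decl.size() - 1 : decl.size();
        if (varargs ? args.size() < fixed : args.size() != fixed) return -1;
        int score = 0;
        for (int i = 0; i < fixed; i++) {
            if (decl.get(i).equals(args.get(i))) score++;          // exact type match
            else if (!decl.get(i).equals("bytearray")) return -1;  // only bytearray coerces
        }
        return score;
    }
}
```

With candidates declared as ("int", "...") and ("chararray", "..."), two chararray arguments would select the chararray version.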




[jira] Updated: (PIG-903) ILLUSTRATE fails on 'Distinct' operator

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-903:
---

Fix Version/s: 0.9.0

 ILLUSTRATE fails on 'Distinct' operator
 ---

 Key: PIG-903
 URL: https://issues.apache.org/jira/browse/PIG-903
 Project: Pig
  Issue Type: Bug
Reporter: Dmitriy V. Ryaboy
 Fix For: 0.9.0


 Using the latest Pig from trunk (0.3+) in mapreduce mode, running through the 
 tutorial script script1-hadoop.pig works fine.
 However, executing the following illustrate command throws an exception:
 illustrate ngramed2
 Pig Stack Trace
 ---
 ERROR 2999: Unexpected internal error. Unrecognized logical operator.
 java.lang.RuntimeException: Unrecognized logical operator.
 at 
 org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(EquivalenceClasses.java:60)
 at 
 org.apache.pig.pen.DerivedDataVisitor.evaluateOperator(DerivedDataVisitor.java:368)
 at 
 org.apache.pig.pen.DerivedDataVisitor.visit(DerivedDataVisitor.java:226)
 at 
 org.apache.pig.impl.logicalLayer.LODistinct.visit(LODistinct.java:104)
 at 
 org.apache.pig.impl.logicalLayer.LODistinct.visit(LODistinct.java:37)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:98)
 at 
 org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:90)
 at 
 org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:106)
 at org.apache.pig.PigServer.getExamples(PigServer.java:724)
 at 
 org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:541)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:195)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:361)
 
 This works:
 illustrate ngramed1;
 Although it does throw a few NPEs:
 java.lang.NullPointerException
   at 
 org.apache.pig.pen.util.DisplayExamples.ShortenField(DisplayExamples.java:205)
   at 
 org.apache.pig.pen.util.DisplayExamples.MakeArray(DisplayExamples.java:190)
   at 
 org.apache.pig.pen.util.DisplayExamples.PrintTabular(DisplayExamples.java:86)
 [...]
 (illustrate also doesn't work on bzipped input, but that's a separate issue)




[jira] Resolved: (PIG-898) TextDataParser does not handle delimiters from one complex type in another

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-898.


Fix Version/s: 0.7.0
   Resolution: Fixed

This has been addressed as part of 613

 TextDataParser does not handle delimiters from one complex type in another
 --

 Key: PIG-898
 URL: https://issues.apache.org/jira/browse/PIG-898
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.7.0


 Currently, TextDataParser does not handle delimiters of one complex type inside 
 another. For example, a value such as key1(#value1} will not be parsed 
 correctly. The production for strings matches any sequence of characters that 
 does not contain any delimiters for the complex types.




Re: PIG Logical Optimization: Use CNF in SplitFilter

2010-07-12 Thread Swati Jain
If you are not going to check in your patch soon, it would be great if you
could share it with me. I believe I might be able to reuse some of your
(utility) functionality directly, or get some ideas from it.

About your cost-benefit question:
1) I will control the complexity of CNF conversion by providing a
configurable threshold value which will limit the OR-nesting.
2) One benefit of this conversion is that it will allow pushing parts of a
filter (conjuncts) across joins, which does not happen in the current
PushUpFilter optimization. Moreover, it may have a cascading effect, as other
rules fired as a result may push the conjuncts below other operators. The
benefit from this is really data dependent, but in big-data workloads, any
kind of predicate pushdown may eventually lead to big savings in the amount of
data read or transferred/shuffled across the network (I need to understand the
LogicalPlan to PhysicalPlan conversion better to give concrete examples).

Thanks!
Swati
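
The threshold idea in point 1 can be sketched with a clause-count estimate (illustrative only, not part of any patch): computed bottom-up over the expression tree, an OR node multiplies the clause counts of its children's CNFs, while an AND node adds them, and the rewrite is skipped when the estimate exceeds the configured limit.

```java
// Sketch of a configurable CNF threshold (illustrative, not a Pig patch):
// estimate the clause count of the CNF bottom-up.  For an OR node the clause
// counts of the children's CNFs multiply ((A1 AND A2) OR (B1 AND B2)
// distributes into 4 clauses); for an AND node they add.  Any leaf counts as
// one clause.  Skip the conversion when the estimate exceeds the limit.
public class CnfCost {
    public static long cnfClauses(String op, long leftClauses, long rightClauses) {
        return op.equals("OR") ? leftClauses * rightClauses : leftClauses + rightClauses;
    }

    public static boolean withinThreshold(long clauses, long limit) {
        return clauses <= limit;
    }
}
```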

On Mon, Jul 12, 2010 at 10:36 AM, Yan Zhou y...@yahoo-inc.com wrote:

  Yes, I already implemented the “NOT push down” upfront, so you do not
 need to do that.



 The support of CNF will probably be the most difficult part. But as I
 mentioned last time, you should compare the cost after trimming the CNF to
 get the post-split filtering logic. Given the complexity of manipulating CNF
 and the undetermined benefits, I am not sure whether it should be in scope at
 this moment.



 To handle CNF, I think it’s a good idea to create a new plan and connect
 the nodes in the new plan to the base plan as you envisioned. In my changes,
 which use DNF instead of CNF but are otherwise similar in processing, I use a
 LogicalExpressionProxy, which contains a “source” member that is just the
 node in the original plan, to link the nodes in the new plan and the old plan.
 The original LogicalExpression is enhanced with a counter to track the number
 of proxies of the original nodes, since normal-form creation will “spread” the
 nodes of the original tree across many normalized nodes. The benefit, aside
 from not setting the plan, is that the original expression is trimmed
 according to the processing results from the DNF, while the DNF is created
 separately, as a kind of utility, so that complex features can be used. In
 my changes, I used a multiple-child tree in the DNF while leaving the
 original binary expression tree structure unchanged. Another benefit is that
 the original tree is kept as close as possible to its initial form, i.e., I do
 not attempt to optimize its overall structure beyond trimming based upon the
 simplification logic. (I also cap the size of the DNF at 100 nodes.) The
 downside of this is added complexity.



 But in your case, for scenario 2, which is the whole point of using CNF, you
 would need to change the original expression tree structurally, beyond
 trimming, to obtain the post-split filtering logic. The other benefit of using
 a multiple-child expression depends on whether you plan to support such
 expressions as a replacement for the current binary tree
 in the final plan. Even though I think it’s a good idea to support that,
 it is not in my scope now.



 I’ll add my algorithm details soon to my jira. Please take a look and
 comment as you see appropriate.



 Thanks,



 Yan




  --

 *From:* Swati Jain [mailto:swat...@aggiemail.usu.edu]
 *Sent:* Friday, July 09, 2010 11:00 PM
 *To:* Yan Zhou
 *Cc:* pig-dev@hadoop.apache.org
 *Subject:* Re: PIG Logical Optimization: Use CNF in SplitFilter



 Hi Yan,

 I agree that the first scenario (filter logic applied to individual input
 sources) doesn't need conversion to CNF and that it will be a good idea to
 add CNF functionality for the second scenario. I was also planning to
 provide a configurable threshold value to control the complexity of CNF
 conversion.

 As part of the above, I wrote a utility to push the NOT operator in
 predicates below the AND and OR operators (Scenario 2 in PIG-1399). I am
 considering making this NOT push-down utility a separate rule in itself. Let
 me know if you have already implemented this.

 While implementing this utility I am having some trouble keeping the
 OperatorPlan consistent as I rewrite the expression, because each
 operator references the main filter logical plan. Here is my current
 implementation approach:

 1. I am creating a new LogicalExpressionPlan for the converted boolean
 expression.
 2. I am creating new logical expressions while pushing the NOT operation:
 converting AND into OR and OR into AND, and eliminating NOT NOT pairs.
 3. However, I am having trouble updating the LogicalExpressionPlan if it
 reaches the base case ( i.e. root operator is not NOT,AND,OR).

 D = Filter J2 by ( (c2 == 5) OR ( NOT( (c1 > 10) AND (c3+b3 > 10) ) ) );

 In the above, for example, I am not sure how to integrate base expression
 (c2 == 5) into the new LogicalExpressionPlan. There is no routine to set the
 plan for a given 
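
The NOT push-down utility being discussed (De Morgan conversion plus NOT NOT elimination) can be sketched on a toy expression tree; the Expr class below is illustrative, not Pig's LogicalExpression API:

```java
// Illustrative sketch of the NOT push-down rewrite discussed in this thread.
// The Expr class is hypothetical, not Pig's LogicalExpression hierarchy.
public class NotPushdown {
    public static final class Expr {
        public final String op;        // "AND", "OR", "NOT", or a leaf predicate string
        public final Expr left, right; // NOT uses only 'left'
        public Expr(String op, Expr left, Expr right) {
            this.op = op; this.left = left; this.right = right;
        }
        public static Expr leaf(String predicate) { return new Expr(predicate, null, null); }
        @Override public String toString() {
            if (left == null) return op;                        // leaf predicate
            if (op.equals("NOT")) return "NOT(" + left + ")";
            return "(" + left + " " + op + " " + right + ")";
        }
    }

    /** Push NOT below AND/OR via De Morgan and eliminate NOT(NOT(x)) pairs. */
    public static Expr pushNot(Expr e, boolean negated) {
        if (e.op.equals("NOT")) return pushNot(e.left, !negated);  // NOT NOT cancels
        if (e.op.equals("AND") || e.op.equals("OR")) {
            String op = negated ? (e.op.equals("AND") ? "OR" : "AND") : e.op;
            return new Expr(op, pushNot(e.left, negated), pushNot(e.right, negated));
        }
        // Base case: a comparison leaf; keep a NOT wrapper if still negated.
        return negated ? new Expr("NOT", e, null) : e;
    }
}
```

For the filter above, pushNot turns NOT((c1 > 10) AND (c3+b3 > 10)) into NOT(c1 > 10) OR NOT(c3+b3 > 10).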

[jira] Commented: (PIG-914) Change the PIG hbase interface to use bytes along with strings

2010-07-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887479#action_12887479
 ] 

Olga Natkovich commented on PIG-914:


Alex, are you still planning to work on this?

 Change the PIG hbase interface to use bytes along with strings
 --

 Key: PIG-914
 URL: https://issues.apache.org/jira/browse/PIG-914
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Minor

 Currently start rows, table names, and column names are all strings. Since 
 HBase supports bytes, we might want to change the Pig interface to support 
 bytes along with strings.




[jira] Commented: (PIG-916) Change the pig hbase interface to get more than one row at a time when scanning

2010-07-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887480#action_12887480
 ] 

Olga Natkovich commented on PIG-916:


Alex, are you still planning to work on this?

 Change the pig hbase interface to get more than one row at a time when 
 scanning
 ---

 Key: PIG-916
 URL: https://issues.apache.org/jira/browse/PIG-916
 Project: Pig
  Issue Type: Improvement
Reporter: Alex Newman
Priority: Trivial

 It should be significantly faster to get numerous rows at the same time 
 rather than one row at a time for large table extraction processes.




[jira] Commented: (PIG-909) Allow Pig executable to use hadoop jars not bundled with pig

2010-07-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887476#action_12887476
 ] 

Olga Natkovich commented on PIG-909:


Did this actually get checked in? Should this be resurrected for Pig 0.8.0 or 
closed?

 Allow Pig executable to use hadoop jars not bundled with pig
 

 Key: PIG-909
 URL: https://issues.apache.org/jira/browse/PIG-909
 Project: Pig
  Issue Type: Improvement
Reporter: Dmitriy V. Ryaboy
Assignee: Dmitriy V. Ryaboy
Priority: Minor
 Attachments: pig_909.patch


 The current pig executable (bin/pig) looks for a file named 
 hadoop${PIG_HADOOP_VERSION}.jar that comes bundled with Pig.
 The proposed change will allow Pig to look in $HADOOP_HOME for the hadoop 
 jars, if that variable is set.




[jira] Updated: (PIG-932) Required fields projection in Loader: nested fields in bag/tuple, map key lookup more than two levels

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-932:
---

Fix Version/s: 0.8.0

 Required fields projection in Loader: nested fields in bag/tuple, map key 
 lookup more than two levels
 -

 Key: PIG-932
 URL: https://issues.apache.org/jira/browse/PIG-932
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 To leverage the performance features provided by Zebra, Pig should be able to 
 figure out which input fields are actually used in a Pig script, and prune 
 unnecessary inputs. This feature is being implemented in 
 [PIG-922|https://issues.apache.org/jira/browse/PIG-922]. However, there are 
 two limitations currently:
 1. Pruning nested fields applies only to maps. We do not prune sub-fields 
 inside a bag or tuple.
 2. For maps, we currently only go one level deep. E.g., if in a Pig script the 
 user uses a#'key0'#'key1', only a#'key0' will be requested.
 These two limitations are in line with the current limitations of the Zebra 
 loader. Once the Zebra loader can handle these cases, we need to lift them.
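
The second limitation amounts to truncating a lookup path after the first map key; a minimal sketch follows (the '#'-separated string form of the path is an assumption for illustration, not Pig's RequiredFieldList API):

```java
// Sketch of the one-level map-key limitation described above: a required
// projection like a#'key0'#'key1' is truncated so only a#'key0' is requested
// from the loader.  The '#'-separated string form of the path is illustrative.
public class MapKeyPruning {
    /** Keep only the field name plus the first map key of a lookup path. */
    public static String requiredProjection(String path) {
        String[] parts = path.split("#");
        return parts.length <= 2 ? path : parts[0] + "#" + parts[1];
    }
}
```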




[jira] Updated: (PIG-931) Samples Syntax Error in Pig UDF Manual

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-931:
---

 Assignee: Corinne Chandel
Fix Version/s: 0.8.0

 Samples Syntax Error in Pig UDF Manual
 --

 Key: PIG-931
 URL: https://issues.apache.org/jira/browse/PIG-931
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.2.0, 0.3.0
 Environment: Windows XP, firefox 3.5.2
Reporter: Yiwei Chen
Assignee: Corinne Chandel
Priority: Trivial
 Fix For: 0.8.0


 All samples with 'extends EvalFunc' have syntax errors in 
 http://hadoop.apache.org/pig/docs/r0.3.0/udf.html .
 There shouldn't be parentheses; they should be angle brackets.
 For example, in the How to Write a Simple Eval Function section:
   public class UPPER extends EvalFunc (String)
 should be 
   public class UPPER extends EvalFunc<String>




[jira] Updated: (PIG-930) merge join should handle compressed bz2 sorted files

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-930:
---

Fix Version/s: 0.8.0

Likely, this is no longer an issue in 0.7.0. Need to verify and add unit tests

 merge join should handle compressed bz2 sorted files
 

 Key: PIG-930
 URL: https://issues.apache.org/jira/browse/PIG-930
 Project: Pig
  Issue Type: Bug
Reporter: Pradeep Kamath
 Fix For: 0.8.0


 There are two issues. First, POLoad, which is used to read the right-side 
 input, does not handle bz2 files right now; this needs to be fixed.
 Second, in the index map job we bindTo(startOfBlockOffSet) (this will 
 internally discard the first tuple if offset > 0). Then we do the following:
 {noformat}
 While(tuple survives pipeline) {
   Pos =  getPosition()
   getNext() 
   run the tuple  through pipeline in the right side which could have filter
 }
 Emit(key, pos, filename).
 {noformat}
  
 Then, in the map job which does the join, we bindTo(pos > 0 ? pos - 1 : pos) 
 (we do pos - 1 because bindTo will discard the first tuple for pos > 0). Then 
 we do getNext().
 Now, in bz2-compressed files, getPosition() returns a position which is not 
 really accurate: it could be a position in the middle of a compressed bz2 
 block. When we use that position to bindTo() in the final map job, the code 
 first hunts for a bz2 block header, thus skipping the whole current bz2 
 block. 
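
The offset adjustment described above can be sketched as a small helper (illustrative, not Pig's actual code): back up one byte for any positive recorded position, so that bindTo()'s discard-first-tuple behavior lands exactly on the indexed tuple.

```java
// Sketch of the merge-join rebinding adjustment (illustrative helper, not
// Pig's code): bindTo() discards the first tuple whenever the offset is > 0,
// so the join map job rebinds one byte earlier to keep the indexed tuple.
public class MergeJoinSeek {
    public static long adjustedBindOffset(long recordedPos) {
        return recordedPos > 0 ? recordedPos - 1 : recordedPos;
    }
}
```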




[jira] Updated: (PIG-932) Required fields projection in Loader: nested fields in bag/tuple, map key lookup more than two levels

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-932:
---

Assignee: Daniel Dai

Possible work for 0.8.0. Need to see if we have time

 Required fields projection in Loader: nested fields in bag/tuple, map key 
 lookup more than two levels
 -

 Key: PIG-932
 URL: https://issues.apache.org/jira/browse/PIG-932
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 To leverage the performance features provided by Zebra, Pig should be able to 
 figure out which input fields are actually used in a Pig script, and prune 
 unnecessary inputs. This feature is being implemented in 
 [PIG-922|https://issues.apache.org/jira/browse/PIG-922]. However, there are 
 two limitations currently:
 1. Pruning nested fields applies only to maps. We do not prune sub-fields 
 inside a bag or tuple.
 2. For maps, we currently only go one level deep. E.g., if in a Pig script the 
 user uses a#'key0'#'key1', only a#'key0' will be requested.
 These two limitations are in line with the current limitations of the Zebra 
 loader. Once the Zebra loader can handle these cases, we need to lift them.




[jira] Updated: (PIG-947) Parsing Bags by PigStorage is not handled correctly if whitespace before start of tuple.

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-947:
---

Fix Version/s: 0.8.0

 Parsing Bags by PigStorage is not handled correctly if whitespace before 
 start of tuple.
 

 Key: PIG-947
 URL: https://issues.apache.org/jira/browse/PIG-947
 Project: Pig
  Issue Type: Bug
  Components: data
 Environment: Pig on Hadoop 18
Reporter: Gandul Azul
 Fix For: 0.8.0


 The PigStorage parser for bags does not work correctly when a tuple in a bag 
 is preceded by a space. For example, the following is parsed correctly:
 {(-5.243084,3.142401,0.000138,2.071200,0),(-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
 while this is not: (Note the space before the second tuple)
 {(-5.243084,3.142401,0.000138,2.071200,0), 
 (-6.021349,0.992683,0.44,0.992683,0),(-10.426160,20.251774,0.000892,5.691086,0)}
 It seems that when the parser encounters the space, it treats the rest of the 
 line as a string. With a schema, this results in a typecast of string to 
 databag, which results in an exception. 
 |WARN builtin.PigStorage: Unable to interpret value [...@2c9b42e6 in field 
 being converted to type bag, caught ParseException Encountered " STRING " 
 at line 1, column 43.
 |Was expecting:
 |"(" ...
 | field discarded
 Below is the parser debug output for the parsing of the error sequence 
 "2.071200,0), (" from above:
 ** FOUND A DOUBLENUMBER MATCH (2.071200) **
   Call:   AtomDatum
 Consumed token: DOUBLENUMBER: 2.071200 at line 1 column 31
   Return: AtomDatum
 Return: Datum
Matched the empty string as STRING token.
 Current character : , (44) at line 1 column 39
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
 ** FOUND A , MATCH (,) **
 Consumed token: , at line 1 column 39
 Call:   Datum
Matched the empty string as STRING token.
 Current character : 0 (48) at line 1 column 40
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER 
 }
 Current character : 0 (48) at line 1 column 40
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, 
 DOUBLENUMBER, LONGINTEGER, 
  FLOATNUMBER }
 Current character : ) (41) at line 1 column 41
Currently matched the first 1 characters as a SIGNEDINTEGER token.
Putting back 1 characters into the input stream.
 ** FOUND A SIGNEDINTEGER MATCH (0) **
   Call:   AtomDatum
 Consumed token: SIGNEDINTEGER: 0 at line 1 column 40
   Return: AtomDatum
 Return: Datum
Matched the empty string as STRING token.
 Current character : ) (41) at line 1 column 41
No more string literal token matches are possible.
Currently matched the first 1 characters as a ) token.
 ** FOUND A ) MATCH ()) **
   Return: Tuple
   Consumed token: ) at line 1 column 41
Matched the empty string as STRING token.
 Current character : , (44) at line 1 column 42
No more string literal token matches are possible.
Currently matched the first 1 characters as a , token.
 ** FOUND A , MATCH (,) **
   Consumed token: , at line 1 column 42
Matched the empty string as STRING token.
 Current character :   (32) at line 1 column 43
No string literal matches possible.
Starting NFA to match one of : { STRING, SIGNEDINTEGER, DOUBLENUMBER 
 }
 Current character :   (32) at line 1 column 43
Currently matched the first 1 characters as a STRING token.
Possible kinds of longer matches : { STRING, SIGNEDINTEGER, 
 DOUBLENUMBER }
 Current character : ( (40) at line 1 column 44
Currently matched the first 1 characters as a STRING token.
Putting back 1 characters into the input stream.
 ** FOUND A STRING MATCH ( ) **
 Return: Bag
   Return: Datum
 Return: Parse




[jira] Updated: (PIG-969) Default constructor of UDF gets called for UDF with parameterised constructor , if the udf has a getArgToFuncMapping function defined

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-969?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-969:
---

Fix Version/s: 0.9.0
  Description: 
This issue is discussed in 
http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg00524.html . I am 
able to reproduce it. While it is easy to fix the udf, it can take a lot of 
time to figure out the problem (until users find this email conversation!).

The root cause is that when getArgToFuncMapping is defined in the udf, the 
FuncSpec returned by the method replaces the one set by the define statement, 
and the constructor arguments get lost. We can handle this in the following 
ways:

1. Preserve the constructor arguments, and use them with the class name of the 
matching FuncSpec from getArgToFuncMapping.
2. Give an error if constructor parameters are given for a udf which has 
FuncSpecs returned from getArgToFuncMapping.

The problem with approach 1 is that we are letting the user define the 
FuncSpec, so the user could have defined a FuncSpec with constructor arguments 
(though they don't have a valid reason to do so). It is also possible that the 
constructor of the different class that matched might not support the same 
constructor parameters. The use of this function outside builtin udfs is also 
probably not common.

With option 2, we are telling the user that this is not a supported use case; 
the user can easily change the udf to fix the issue, or use the udf which would 
have matched the given parameters (which is unlikely to have the 
getArgToFuncMapping method defined).

I am proposing that we go with option 2.




 Default constructor of UDF gets called for UDF with parameterised constructor 
 , if the udf has a getArgToFuncMapping function defined
 -

 Key: PIG-969
 URL: https://issues.apache.org/jira/browse/PIG-969
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
 Fix For: 0.9.0


 This issue is discussed in  
 http://www.mail-archive.com/pig-u...@hadoop.apache.org/msg00524.html . I am 
 able to reproduce the issue. While it is easy to fix the udf, it can take a 
 lot of time to figure out the problem (until they find this email 
 conversation!).
 The root cause is that when getArgToFuncMapping is defined in the udf , the 
 FuncSpec returned by the method replaces one set by define statement . The 
 constructor arguments get lost.  We can handle this in following ways -
 1. Preserve the constructor arguments, and use it with the class name of the 
 matching FuncSpec from getArgToFuncMapping . 
 2. Give an error if constructor parameters are given for a udf which has 
 FuncSpecs returned from getArgToFuncMapping .
 The problem with  approach 1 is that we are letting the user define the 
 FuncSpec , so user could have defined a FuncSpec with constructor (though 
 they don't have a valid reason to do so). It is also possible that the 
 constructor of the different class that matched might not support same 
 constructor parameters. The use of this function outside builtin udfs are 
 also probably not common.
 With option 2, we are telling the user that this is not a supported use case, 
 and user can easily change the udf to fix the issue, or use the udf which 
 would have matched given parameters (which unlikely to have the 
 

[jira] Resolved: (PIG-1182) Pig reference manual does not mention syntax for comments

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1182.
-

Resolution: Fixed

Closing. If we do want to create a comprehensive index, please create a 
separate JIRA.

 Pig reference manual does not mention syntax for comments
 -

 Key: PIG-1182
 URL: https://issues.apache.org/jira/browse/PIG-1182
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.5.0
Reporter: David Ciemiewicz

 The Pig 0.5.0 reference manual does not mention how to write comments in your 
 pig code using -- (two dashes).
 http://hadoop.apache.org/pig/docs/r0.5.0/piglatin_reference.html
 Also, does /* */ also work?




[jira] Updated: (PIG-999) sorting on map-value fails if map-value is not of bytearray type

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-999:
---

Fix Version/s: 0.9.0

 sorting on map-value fails if map-value is not of bytearray type
 

 Key: PIG-999
 URL: https://issues.apache.org/jira/browse/PIG-999
 Project: Pig
  Issue Type: Sub-task
Reporter: Thejas M Nair
 Fix For: 0.9.0


 When the query execution plan is created by Pig, it assumes the type to be 
 bytearray because there is no schema information associated with map fields.
 But at run time, the loader might return the actual type. This results in a 
 ClassCastException.
 This points to the larger issue of the way Pig handles types for map values. 
 It should be fixed in the context of revisiting the frontend logic and 
 Pig Latin semantics.
 This is related to PIG-880. The patch in PIG-880 changed PigStorage to always 
 return bytearray for map values to work around this, but other loaders like 
 BinStorage can return the actual type, causing this issue.




[jira] Updated: (PIG-998) revisit frontend logic and pig-latin semantics

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-998:
---

Fix Version/s: 0.9.0

 revisit frontend logic and pig-latin semantics
 --

 Key: PIG-998
 URL: https://issues.apache.org/jira/browse/PIG-998
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
 Fix For: 0.9.0


 This jira has been created to keep track of issues with the current frontend 
 logic and Pig Latin semantics.
 One example is the handling of type information for map values. At query plan 
 generation time, Pig does not know the type of map values and assumes it is 
 bytearray. This leads to problems when the loader returns map values of other 
 types.




[jira] Resolved: (PIG-967) Proposal for adding a metadata interface to Pig

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-967.


Resolution: Won't Fix

This is an obsolete proposal

 Proposal for adding a metadata interface to Pig
 ---

 Key: PIG-967
 URL: https://issues.apache.org/jira/browse/PIG-967
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Alan Gates
Assignee: Alan Gates

 Pig needs to have an interface to connect to metadata systems.  
 http://wiki.apache.org/pig/MetadataInterfaceProposal proposes an interface 
 for this.




[jira] Updated: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1065:


Fix Version/s: 0.9.0

 In-determinate behaviour of Union when there are 2 non-matching schema's
 

 Key: PIG-1065
 URL: https://issues.apache.org/jira/browse/PIG-1065
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.9.0


 I have a script which first does a union of these schemas and then does a 
 ORDER BY of this result.
 {code}
 f1 = LOAD '1.txt' as (key:chararray, v:chararray);
 f2 = LOAD '2.txt' as (key:chararray);
 u0 = UNION f1, f2;
 describe u0;
 dump u0;
 u1 = ORDER u0 BY $0;
 dump u1;
 {code}
 When I run in Map Reduce mode I get the following result:
 $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
 
 Schema for u0 unknown.
 
 (1,2)
 (2,3)
 (1)
 (2)
 
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
 open iterator for alias u1
 at org.apache.pig.PigServer.openIterator(PigServer.java:475)
 at 
 org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:397)
 
 Caused by: java.io.IOException: Type mismatch in key from map: expected 
 org.apache.pig.impl.io.NullableBytesWritable, recieved 
 org.apache.pig.impl.io.NullableText
 at 
 org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
 at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
 
 When I run the same script in local mode I get a different result, since 
 local mode does not use any Hadoop classes.
 $java -cp pig.jar org.apache.pig.Main -x local broken.pig
 
 Schema for u0 unknown
 
 (1,2)
 (1)
 (2,3)
 (2)
 
 (1,2)
 (1)
 (2,3)
 (2)
 
 Here are some questions:
 1) Why do we allow UNION if the schemas do not match?
 2) Should we not print an error message or warning so that the user knows 
 this is not allowed, or that unexpected results may occur?
 Viraj




[jira] Updated: (PIG-1066) ILLUSTRATE called after DESCRIBE results in Grunt: ERROR 2999: Unexpected internal error. null

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1066:


Fix Version/s: 0.9.0

 ILLUSTRATE called after DESCRIBE results in Grunt: ERROR 2999: Unexpected 
 internal error. null
 

 Key: PIG-1066
 URL: https://issues.apache.org/jira/browse/PIG-1066
 Project: Pig
  Issue Type: Bug
  Components: grunt
Affects Versions: 0.4.0
Reporter: Bogdan Dorohonceanu
 Fix For: 0.9.0


 -- load the QID_CT_QP20 data
 x = LOAD '$FS_TD/$QID_IN_FILES' USING PigStorage('\t') AS 
 (unstem_qid:chararray, jid_score_pairs:chararray);
 DESCRIBE x;
 --ILLUSTRATE x;
 -- load the ID_RQ data
 y0 = LOAD '$FS_USER/$ID_RQ_IN_FILE' USING PigStorage('\t') AS (sid:chararray, 
 query:chararray);
 -- force parallelization
 -- y1 = ORDER y0 BY sid PARALLEL $NUM;
 -- compute unstem_qid
 DEFINE f `text_streamer_query j3_unicode.dat prop.dat normal.txt TAB TAB 
 1:yes:UNSTEM_ID:%llx` INPUT(stdin USING PigStorage('\t')) 
 OUTPUT(stdout USING PigStorage('\t')) SHIP('$USER/text_streamer_query', 
 '$USER/j3_unicode.dat', '$USER/prop.dat', '$USER/normal.txt');
 y = STREAM y0 THROUGH f AS (sid:chararray, query:chararray, 
 unstem_qid:chararray);
 DESCRIBE y;
 --ILLUSTRATE y;
 rmf /user/vega/zoom/y_debug
 STORE y INTO '/user/vega/zoom/y_debug' USING PigStorage('\t');
 2009-10-30 13:36:48,437 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop file system at: 
 hdfs://dd-9c32d03:8887/,/teoma/dd-9c34d04/middleware/hadoop.test.data/dfs/name
 09/10/30 13:36:48 INFO executionengine.HExecutionEngine: Connecting to hadoop 
 file system at: 
 hdfs://dd-9c32d03:8887/,/teoma/dd-9c34d04/middleware/hadoop.test.data/dfs/name
 2009-10-30 13:36:48,495 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce job tracker at: dd-9c32d04:8889
 09/10/30 13:36:48 INFO executionengine.HExecutionEngine: Connecting to 
 map-reduce job tracker at: dd-9c32d04:8889
 2009-10-30 13:36:49,242 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2999: Unexpected internal error. null
 09/10/30 13:36:49 ERROR grunt.Grunt: ERROR 2999: Unexpected internal error. 
 null
 Details at logfile: /disk1/vega/zoom/pig_1256909801304.log




[jira] Resolved: (PIG-1056) table can not be loaded after store

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1056.
-

Resolution: Invalid

The script is invalid and that's why you see the error

 table can not be loaded after store
 ---

 Key: PIG-1056
 URL: https://issues.apache.org/jira/browse/PIG-1056
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Jing Huang

 Pig Stack Trace
 ---
 ERROR 1018: Problem determining schema during load
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
 parsing. Problem determining schema during load
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1023)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:967)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:716)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:397)
 Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem 
 determining schema during load
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017)
 ... 8 more
 Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: 
 Problem determining schema during load
 at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
 ... 10 more
 Caused by: java.io.IOException: No table specified for input
 at 
 org.apache.hadoop.zebra.pig.TableLoader.checkConf(TableLoader.java:238)
 at 
 org.apache.hadoop.zebra.pig.TableLoader.determineSchema(TableLoader.java:258)
 at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148)
 ... 11 more
 
 ~ 
 
 script:
 register /grid/0/dev/hadoopqa/hadoop/lib/zebra.jar;
 A = load 'filter.txt' as (name:chararray, age:int);
 B = filter A by age > 20;
 --dump B;
 store B into 'filter1' using 
 org.apache.hadoop.zebra.pig.TableStorer('[name];[age]');
 rec1 = load 'B' using org.apache.hadoop.zebra.pig.TableLoader();
 dump rec1;




[jira] Updated: (PIG-1092) Pig Latin Parser fails to recognize \n as a whitespace

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1092:


Fix Version/s: 0.9.0

 Pig Latin Parser fails to recognize \n as a whitespace
 

 Key: PIG-1092
 URL: https://issues.apache.org/jira/browse/PIG-1092
 Project: Pig
  Issue Type: Bug
  Components: grunt
 Environment: RHEL linux
Reporter: Yang Yang
Priority: Minor
 Fix For: 0.9.0


 The following Pig Latin script fails to parse:
 a = load 'input_file' as
 ( field1 : int );
 Note that there is no character after the 'as', so there is only one '\n' 
 character between the 'as' and the '(' on the next line.
 Adding a whitespace after 'as' solves it.




[jira] Updated: (PIG-1112) FLATTEN eliminates the alias

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1112:


Fix Version/s: 0.9.0

 FLATTEN eliminates the alias
 

 Key: PIG-1112
 URL: https://issues.apache.org/jira/browse/PIG-1112
 Project: Pig
  Issue Type: Bug
Reporter: Ankur
Assignee: Daniel Dai
 Fix For: 0.9.0


 If the schema for a field of type 'bag' is only partially defined, then 
 FLATTEN() incorrectly eliminates the field and throws an error. 
 Consider the following example:
 A = LOAD 'sample' using PigStorage() as (first:chararray, second:chararray, 
 ladder:bag{});  
 B = FOREACH A GENERATE first,FLATTEN(ladder) as third,second; 
   
 C = GROUP B by (first,third);
 This throws the error
  ERROR org.apache.pig.tools.grunt.Grunt - ERROR 1000: Error during parsing. 
 Invalid alias: third in {first: chararray,second: chararray}




[jira] Updated: (PIG-1017) Converts strings to text in Pig

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1017?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1017:


Fix Version/s: 0.9.0

 Converts strings to text in Pig
 ---

 Key: PIG-1017
 URL: https://issues.apache.org/jira/browse/PIG-1017
 Project: Pig
  Issue Type: Improvement
Reporter: Sriranjan Manjunath
Assignee: Sriranjan Manjunath
 Fix For: 0.9.0

 Attachments: stotext.patch


 Strings in Java are UTF-16 and take 2 bytes per character. Text 
 (org.apache.hadoop.io.Text) stores the data in UTF-8 and could show 
 significant reductions in memory usage.
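 As a rough illustration of the memory argument (a standalone sketch, not Pig
 code), an ASCII string's UTF-8 encoding is half the size of its in-memory
 UTF-16 representation:

 ```java
 import java.nio.charset.StandardCharsets;

 public class EncodingSizes {
     // Returns {utf16Bytes, utf8Bytes}: Java's char[] backing is UTF-16
     // (2 bytes per char for BMP text), while Text stores UTF-8.
     static int[] sizes(String s) {
         return new int[] {
             s.length() * 2,
             s.getBytes(StandardCharsets.UTF_8).length
         };
     }

     public static void main(String[] args) {
         int[] r = sizes("hello");
         System.out.println(r[0] + " vs " + r[1]); // 10 vs 5 for ASCII text
     }
 }
 ```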




[jira] Updated: (PIG-1152) bincond operator throws parser error

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1152:


Fix Version/s: 0.9.0

 bincond operator throws parser error
 

 Key: PIG-1152
 URL: https://issues.apache.org/jira/browse/PIG-1152
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.6.0
Reporter: Ankur
 Fix For: 0.9.0


 The bincond operator throws a parser error when the true branch contains a 
 constant bag with one tuple containing a single int field with a negative 
 value. 
 Here is the script to reproduce the issue:
 A = load 'A' as (s: chararray, x: int, y: int);
 B = group A by s;
 C = foreach B generate group, flatten(((COUNT(A) > 1L) ? {(-1)} : A.x));
 dump C;




[jira] Updated: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1178:


Fix Version/s: 0.8.0

 LogicalPlan and Optimizer are too complex and hard to work with
 ---

 Key: PIG-1178
 URL: https://issues.apache.org/jira/browse/PIG-1178
 Project: Pig
  Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: expressions-2.patch, expressions.patch, lp.patch, 
 lp.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, 
 pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, pig_1178_3.patch


 The current implementation of the logical plan and the logical optimizer in 
 Pig has proven to not be easily extensible. Developer feedback has indicated 
 that adding new rules to the optimizer is quite burdensome. In addition, the 
 logical plan has been an area of numerous bugs, many of which have been 
 difficult to fix. Developers also feel that the logical plan is difficult to 
 understand and maintain. The root cause of these issues is that a number of 
 design decisions made as part of the 0.2 rewrite of the front end have since 
 proven sub-optimal. The heart of this proposal is to revisit a number of 
 those decisions and rebuild the logical plan with a simpler design that will 
 make it much easier to maintain the logical plan as well as extend the 
 logical optimizer. 
 See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full 
 details.




[jira] Resolved: (PIG-1235) OptimizerException: Problem while rebuilding projection map or schema in logical optimizer

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1235.
-

Resolution: Won't Fix

This is not relevant with new optimizer

 OptimizerException: Problem while rebuilding projection map or schema in 
 logical optimizer
 --

 Key: PIG-1235
 URL: https://issues.apache.org/jira/browse/PIG-1235
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding

 Here is the script that throws this exception:
 {code}
 A = load '1.txt' as (x, y, z);
 B = group A by (x > 0 ? x : 0);
 C = filter B by group > 10;
 explain C;
 {code}
 Pig Stack Trace
 ---
 ERROR 2157: Error while fixing projections. No mapping available in old 
 predecessor to replace column.
 org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1067: Unable to 
 explain alias C
 at org.apache.pig.PigServer.explain(PigServer.java:593)
 at 
 org.apache.pig.tools.grunt.GruntParser.explainCurrentBatch(GruntParser.java:315)
 at 
 org.apache.pig.tools.grunt.GruntParser.processExplain(GruntParser.java:268)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.Explain(PigScriptParser.java:517)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:265)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:75)
 at org.apache.pig.Main.main(Main.java:352)
 Caused by: org.apache.pig.impl.plan.optimizer.OptimizerException: ERROR 2145: 
 Problem while rebuilding projection map or schema in logical optimizer.
 at 
 org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:215)
 at org.apache.pig.PigServer.compileLp(PigServer.java:856)
 at org.apache.pig.PigServer.compileLp(PigServer.java:792)
 at org.apache.pig.PigServer.getStorePlan(PigServer.java:734)
 at org.apache.pig.PigServer.explain(PigServer.java:576)
 ... 8 more




[jira] Updated: (PIG-1247) Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. org.apache.pig.backend.datastorage.DataStorageException cannot be cast to java.lang.Error

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1247?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1247:


Fix Version/s: 0.9.0

 Error Number makes it hard to debug: ERROR 2999: Unexpected internal error. 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 -

 Key: PIG-1247
 URL: https://issues.apache.org/jira/browse/PIG-1247
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
 Fix For: 0.9.0


 I have a large script in which there are intermediate store statements; one 
 of them writes to a directory I do not have permission to write to. 
 The stack trace I get from Pig is this:
 2010-02-20 02:16:32,055 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 2999: Unexpected internal error. 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 Details at logfile: /home/viraj/pig_1266632145355.log
 Pig Stack Trace
 ---
 ERROR 2999: Unexpected internal error. 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 java.lang.ClassCastException: 
 org.apache.pig.backend.datastorage.DataStorageException cannot be cast to 
 java.lang.Error
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.StoreClause(QueryParser.java:3583)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.BaseExpr(QueryParser.java:1407)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Expr(QueryParser.java:949)
 at 
 org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:762)
 at 
 org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
 at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1036)
 at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:986)
 at org.apache.pig.PigServer.registerQuery(PigServer.java:386)
 at 
 org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:720)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
 at org.apache.pig.Main.main(Main.java:386)
 
 The only way to find the error was to look at the javacc-generated 
 QueryParser.java code and add a System.out.println().
 Here is a script to reproduce the problem:
 {code}
 A = load '/user/viraj/three.txt' using PigStorage();
 B = foreach A generate ['a'#'12'] as b:map[] ;
 store B into '/user/secure/pigtest' using PigStorage();
 {code}
 three.txt has 3 lines which contain nothing but the number 1.
 {code}
 $ hadoop fs -ls /user/secure/
 ls: could not get listing for 'hdfs://mynamenode/user/secure' : 
 org.apache.hadoop.security.AccessControlException: Permission denied: 
 user=viraj, access=READ_EXECUTE, inode=secure:secure:users:rwx--
 {code}
 Viraj




[jira] Updated: (PIG-1277) Pig should give error message when cogroup on tuple keys of different inner type

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1277:


Fix Version/s: 0.9.0

 Pig should give error message when cogroup on tuple keys of different inner 
 type
 

 Key: PIG-1277
 URL: https://issues.apache.org/jira/browse/PIG-1277
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
 Fix For: 0.9.0


 When we cogroup on a tuple key, if the inner types of the tuple do not 
 match, we treat them as different keys. This is confusing; it is desirable 
 to give an error or warning when this happens.
 Here is one example:
 UDF:
 {code}
 public class MapGenerate extends EvalFunc<Map> {
 @Override
 public Map exec(Tuple input) throws IOException {
 Map m = new HashMap();
 m.put("key", new Integer(input.size()));
 return m;
 }
 
 @Override
 public Schema outputSchema(Schema input) {
 return new Schema(new Schema.FieldSchema(null, DataType.MAP));
 }
 }
 {code}
 Pig script: 
 {code}
 a = load '1.txt' as (a0);
 b = foreach a generate a0, MapGenerate(*) as m:map[];
 c = foreach b generate a0, m#'key' as key;
 d = load '2.txt' as (c0, c1);
 e = cogroup c by (a0, key), d by (c0, c1);
 dump e;
 {code}
 1.txt
 {code}
 1
 {code}
 2.txt
 {code}
 1 1
 {code}
 User expected result (which is not right):
 {code}
 ((1,1),{(1,1)},{(1,1)})
 {code}
 Real result:
 {code}
 ((1,1),{(1,1)},{})
 ((1,1),{},{(1,1)})
 {code}
 We should give the user a message that the keys cannot be merged due to the 
 type mismatch.
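 The behavior can be illustrated with plain Java equality (a toy sketch, not 
 Pig internals): grouping relies on key equality, and values of different 
 types never compare equal even when they print the same.

 ```java
 public class KeyTypeMismatch {
     // Type-sensitive key equality, as used when grouping rows by key.
     static boolean sameKey(Object left, Object right) {
         return left.equals(right);
     }

     public static void main(String[] args) {
         System.out.println(sameKey("1", "1"));                // true: same type
         System.out.println(sameKey(Integer.valueOf(1), "1")); // false: int vs chararray
     }
 }
 ```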




[jira] Updated: (PIG-1319) New logical optimization rules

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1319:


Fix Version/s: 0.8.0

 New logical optimization rules
 --

 Key: PIG-1319
 URL: https://issues.apache.org/jira/browse/PIG-1319
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


 In [PIG-1178|https://issues.apache.org/jira/browse/PIG-1178], we build a new 
 logical optimization framework. One design goal for the new logical optimizer 
 is to make it easier to add new logical optimization rules. In this Jira, we 
 keep track of the development of these new logical optimization rules.




[jira] Resolved: (PIG-1328) pigtest ant target fails pigtrunk builds

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1328.
-

Resolution: Fixed

I believe all tests are running now. Please re-open and clarify if this is 
still an issue.

 pigtest ant target fails pigtrunk builds
 

 Key: PIG-1328
 URL: https://issues.apache.org/jira/browse/PIG-1328
 Project: Pig
  Issue Type: Bug
  Components: build
Reporter: Giridharan Kesavan

 java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
 [junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.154 sec
 [junit] Test org.apache.hadoop.zebra.pig.TestTableSortStorer FAILED
 [junit] Running org.apache.hadoop.zebra.pig.TestTableSortStorerDesc
 [junit] log4j:WARN No appenders could be found for logger 
 (org.apache.hadoop.conf.Configuration).
 [junit] log4j:WARN Please initialize the log4j system properly.
 [junit] [CLOVER] FATAL ERROR: Clover could not be initialised. Are you 
 sure you have Clover in the runtime classpath? (class 
 java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
 [junit] Tests run: 0, Failures: 0, Errors: 2, Time elapsed: 0.164 sec
 [junit] Test org.apache.hadoop.zebra.pig.TestTableSortStorerDesc FAILED




[jira] Updated: (PIG-1188) Padding nulls to the input tuple according to input schema

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1188:


Fix Version/s: 0.9.0

 Padding nulls to the input tuple according to input schema
 --

 Key: PIG-1188
 URL: https://issues.apache.org/jira/browse/PIG-1188
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Richard Ding
 Fix For: 0.9.0


 Currently, the number of fields in the input tuple is determined by the data. 
 When we have a schema, we should generate the input data according to the 
 schema, padding nulls if necessary. Here is one example:
 Pig script:
 {code}
 a = load '1.txt' as (a0, a1);
 dump a;
 {code}
 Input file:
 {code}
 1   2
 1   2   3
 1
 {code}
 Current result:
 {code}
 (1,2)
 (1,2,3)
 (1)
 {code}
 Desired result:
 {code}
 (1,2)
 (1,2)
 (1, null)
 {code}
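 The desired padding behavior can be sketched in plain Java (an illustration 
 under the assumption of a simple list-of-fields tuple, not Pig's actual 
 Tuple API):

 ```java
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.List;

 public class PadToSchema {
     // Truncate or null-pad a row so it matches the declared schema width.
     static List<Object> pad(List<Object> row, int schemaWidth) {
         List<Object> out =
             new ArrayList<>(row.subList(0, Math.min(row.size(), schemaWidth)));
         while (out.size() < schemaWidth) {
             out.add(null);
         }
         return out;
     }

     public static void main(String[] args) {
         System.out.println(pad(Arrays.asList((Object) 1, 2, 3), 2)); // [1, 2]
         System.out.println(pad(Arrays.asList((Object) 1), 2));       // [1, null]
     }
 }
 ```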




[jira] Updated: (PIG-1452) to remove hadoop20.jar from lib and use hadoop from the apache maven repo.

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1452:


Fix Version/s: 0.8.0

 to remove hadoop20.jar from lib and use hadoop from the apache maven repo.
 --

 Key: PIG-1452
 URL: https://issues.apache.org/jira/browse/PIG-1452
 Project: Pig
  Issue Type: Improvement
  Components: build
Affects Versions: 0.8.0
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Fix For: 0.8.0

 Attachments: PIG-1452.PATCH


 Pig uses Ivy for dependency management, but it still uses hadoop20.jar from 
 the lib folder. 
 Now that the hadoop-0.20.2 artifacts are available in the Maven repo, Pig 
 should leverage Ivy for resolving/retrieving the Hadoop artifacts.




[jira] Updated: (PIG-1387) Syntactical Sugar for PIG-1385

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1387?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1387:


Fix Version/s: 0.9.0

 Syntactical Sugar for PIG-1385
 --

 Key: PIG-1387
 URL: https://issues.apache.org/jira/browse/PIG-1387
 Project: Pig
  Issue Type: Wish
  Components: grunt
Affects Versions: 0.6.0
Reporter: hc busy
 Fix For: 0.9.0


 From this conversation: extend PIG-1385 so that, instead of calling a UDF, 
 built-in behavior is used when the (), {}, [] groupings are encountered.
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig
  allowed
   users to define grouping functions (0.1).  Functions like these should
  go in
   evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
These are things everyone asks for and they seem like a reasonable
  addition
   to the core engine.  This will be more of a burden to write (as we'll
  hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
Some times I wonder... I mean, somebody went to the trouble of making a
   path
   called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belong), but didn't check in any java
  code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
Ok. I place my current code here, may be later I make a patch (if
  such
   implementation is acceptable of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Convert any sequence of fields to a bag with the specified count of
    * fields<br>
   * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
   * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
   *
   * @author astepachev
   */
    public class ToBag extends EvalFunc<DataBag> {
public BagFactory bagFactory;
public TupleFactory tupleFactory;
  
public ToBag() {
bagFactory = BagFactory.getInstance();
tupleFactory = TupleFactory.getInstance();
}
  
@Override
public DataBag exec(Tuple input) throws IOException {
if (input.isNull())
return null;
final DataBag bag = bagFactory.newDefaultBag();
final Integer couter = (Integer) input.get(0);
if (couter == null)
return null;
Tuple tuple = tupleFactory.newTuple();
 for (int i = 0; i < input.size() - 1; i++) {
if (i % couter == 0) {
tuple = tupleFactory.newTuple();
bag.add(tuple);
}
tuple.append(input.get(i + 1));
}
return bag;
}
   }
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
  
   import java.io.IOException;
   import java.net.URISyntaxException;
   import java.net.URL;
  
   import static org.junit.Assert.assertTrue;
  
   /**
   * @author astepachev
   */
   public class ToBagTest {
PigServer pigServer;
URL inputTxt;
  
@Before
public void init() throws IOException, URISyntaxException {
pigServer = new PigServer(ExecType.LOCAL);
 inputTxt =
    this.getClass().getResource("bagTest.txt").toURI().toURL();
}
  
@Test
public void testSimple() throws IOException {
 pigServer.registerQuery("a = load '" + inputTxt.toExternalForm() +
    "' using PigStorage(',') " +
    "as (id:int, a:chararray, b:chararray, c:chararray, d:chararray);");
 pigServer.registerQuery("last = 

[jira] Updated: (PIG-1341) BinStorage cannot convert DataByteArray to Chararray and results in FIELD_DISCARDED_TYPE_CONVERSION_FAILED

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1341:


Fix Version/s: 0.9.0

 BinStorage cannot convert DataByteArray to Chararray and results in 
 FIELD_DISCARDED_TYPE_CONVERSION_FAILED
 --

 Key: PIG-1341
 URL: https://issues.apache.org/jira/browse/PIG-1341
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Viraj Bhat
Assignee: Richard Ding
 Fix For: 0.9.0

 Attachments: PIG-1341.patch


 The script reads in BinStorage data and tries to convert a column that is a 
 DataByteArray to chararray. 
 {code}
 raw = load 'sampledata' using BinStorage() as (col1,col2, col3);
 --filter out null columns
 A = filter raw by col1#'bcookie' is not null;
 B = foreach A generate col1#'bcookie'  as reqcolumn;
 describe B;
 --B: {reqcolumn: bytearray}
 X = limit B 5;
 dump X;
 B = foreach A generate (chararray)col1#'bcookie'  as convertedcol;
 describe B;
 --B: {convertedcol: chararray}
 X = limit B 5;
 dump X;
 {code}
 The first dump produces:
 (36co9b55onr8s)
 (36co9b55onr8s)
 (36hilul5oo1q1)
 (36hilul5oo1q1)
 (36l4cj15ooa8a)
 The second dump produces:
 ()
 ()
 ()
 ()
 ()
 It also throws an error message: FIELD_DISCARDED_TYPE_CONVERSION_FAILED 5 
 time(s).
 Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1358) [piggybank] String functions should handle exceptions in a consistent manner

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1358:


Fix Version/s: 0.9.0

 [piggybank] String functions should handle exceptions in a consistent manner 
 -

 Key: PIG-1358
 URL: https://issues.apache.org/jira/browse/PIG-1358
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: Richard Ding
 Fix For: 0.9.0


 The String functions in piggybank handle exceptions differently. Some 
 catch all exceptions, some catch only ClassCastException, while others 
 catch only ExecException. The exception handling code in these functions 
 should be consistent.
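One consistent policy (assumed here for illustration: treat null input, bad casts, and any runtime failure alike by returning null) can be sketched in plain Java, independent of the actual piggybank classes. The class and helper names are hypothetical:

```java
import java.util.function.Function;

// Hypothetical helper: every string function routes its work through one
// wrapper, so null input, a bad cast, and any runtime failure all behave
// the same way -- the result is null rather than a propagated exception.
public class SafeStringOp {
    static String apply(Object input, Function<String, String> op) {
        if (input == null) {
            return null; // null propagates as null
        }
        try {
            return op.apply((String) input);
        } catch (RuntimeException e) {
            // ClassCastException and all other runtime errors handled alike
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(apply("hello", String::toUpperCase)); // HELLO
        System.out.println(apply(42, String::toUpperCase));      // null (not a String)
    }
}
```

Whatever policy is chosen (return null, or rethrow as a single exception type), funneling all functions through one wrapper is what makes the behavior uniform.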

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1399) Logical Optimizer: Expression optimizor rule

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1399:


Fix Version/s: 0.8.0

 Logical Optimizer: Expression optimizor rule
 

 Key: PIG-1399
 URL: https://issues.apache.org/jira/browse/PIG-1399
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Yan Zhou
 Fix For: 0.8.0


 We can optimize expression in several ways:
 1. Constant pre-calculation
 Example:
 B = filter A by a0 > 5+7;
 => B = filter A by a0 > 12;
 2. Boolean expression optimization
 Example:
 B = filter A by not (not(a0>5) or a1>0);
 => B = filter A by a0>5 and a1<=0;
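The boolean rewrite is an application of De Morgan's laws: not(not(X) or Y) = X and not(Y). A minimal plain-Java check of the equivalence (illustrative class name, not Pig code):

```java
// Verify that not(not(a0 > 5) or a1 > 0) is the same predicate as
// (a0 > 5 and a1 <= 0) for every input, per De Morgan's laws.
public class BoolRewriteCheck {
    static boolean original(int a0, int a1)  { return !(!(a0 > 5) || a1 > 0); }
    static boolean rewritten(int a0, int a1) { return (a0 > 5) && (a1 <= 0); }

    public static void main(String[] args) {
        for (int a0 = 0; a0 <= 10; a0++) {
            for (int a1 = -3; a1 <= 3; a1++) {
                if (original(a0, a1) != rewritten(a0, a1)) {
                    throw new AssertionError("mismatch at a0=" + a0 + ", a1=" + a1);
                }
            }
        }
        System.out.println("rewrite is equivalent");
    }
}
```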

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1459) Need a standard way to communicate the requested fields between front and back end for loaders

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1459:


Fix Version/s: 0.9.0

 Need a standard way to communicate the requested fields between front and 
 back end for loaders
 --

 Key: PIG-1459
 URL: https://issues.apache.org/jira/browse/PIG-1459
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Alan Gates
 Fix For: 0.9.0


 Pig currently provides no mechanism for loader writers to communicate which 
 fields have been requested between the front and back end.  Since any loader 
 that accepts pushed projections has to deal with this issue it would make 
 sense for Pig to provide a standard mechanism for it.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1477) Syntax error in tutorial Pig Script 1: Query Phrase Popularity (ORDER operator)

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1477:


Assignee: Corinne Chandel

 Syntax error in tutorial Pig Script 1: Query Phrase Popularity (ORDER 
 operator)
 ---

 Key: PIG-1477
 URL: https://issues.apache.org/jira/browse/PIG-1477
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Brian Mansell
Assignee: Corinne Chandel
Priority: Trivial
 Fix For: 0.8.0


 Documentation syntax should reflect the correct code indicated in the 
 tutorial script.
 Documentation syntax 
 {code}
 ordered_uniq_frequency = ORDER filtered_uniq_frequency BY (hour, score);
 {code}
 Above syntax results in this error:
 {code}
 2010-06-30 22:12:16,412 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
 1000: Error during parsing. Encountered "," at line 1, column 64.
 Was expecting:
 ")" ...
 {code}
 (Correct) Tutorial script syntax
 {code}
 ordered_uniq_frequency = ORDER filtered_uniq_frequency BY hour, score;
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1436) Print number of records outputted at each step of a Pig script

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1436.
-

Resolution: Duplicate

This looks like a duplicate of PIG-1478. Please re-open if this is not the case.

 Print number of records outputted at each step of a Pig script
 --

 Key: PIG-1436
 URL: https://issues.apache.org/jira/browse/PIG-1436
 Project: Pig
  Issue Type: New Feature
  Components: grunt
Affects Versions: 0.7.0
Reporter: Russell Jurney
Assignee: Richard Ding
Priority: Minor
 Fix For: 0.8.0


 I often run a script multiple times, or have to go and look through Hadoop 
 task logs, to figure out where I broke a long script in such a way that I get 
 0 records out of it.  I think this is a common problem.
 If someone can point me in the right direction, I can make a pass at this.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1465) Filter inside foreach is broken

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1465:


Fix Version/s: 0.8.0

 Filter inside foreach is broken
 ---

 Key: PIG-1465
 URL: https://issues.apache.org/jira/browse/PIG-1465
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0
Reporter: hc busy
 Fix For: 0.8.0


 {quote}
 % cat data.txt
 x,a,1,a
 x,a,2,a
 x,a,3,b
 x,a,4,b
 y,a,1,a
 y,a,2,a
 y,a,3,b
 y,a,4,b
 % cat script.pig
 a = load 'data' as (ind:chararray, f1:chararray, num:int, f2:chararray);
 b = group a by ind;
 describe b;
 f = foreach b {
 all_total = SUM(a.num);
 fed = filter a by (f1==f2);
 some_total = (int)SUM(fed.num);
 generate group as ind, all_total, some_total;
 }
 describe f;
 dump f;
 % pig -f script.pig
 (x,a,1,a,,)
 (x,a,2,a,,)
 (x,a,3,b,,)
 (x,a,4,b,,)
 (y,a,1,a,,)
 (y,a,2,a,,)
 (y,a,3,b,,)
 (y,a,4,b,,)
 % cat what_I_expected
 (x,10,3)
 (y,10,3)
 {quote}
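The expected totals per group can be checked by hand: all_total = 1+2+3+4 = 10, and some_total sums only the rows where f1 == f2 (num 1 and 2), giving 3. A plain-Java sketch of that arithmetic (illustrative only, not the Pig execution path):

```java
public class NestedFilterExpectation {
    // rows: {num, 1 if f1 == f2 else 0}; returns "(all_total,some_total)".
    static String totals(int[][] rows) {
        int allTotal = 0, someTotal = 0;
        for (int[] r : rows) {
            allTotal += r[0];                  // SUM(a.num)
            if (r[1] == 1) someTotal += r[0];  // SUM over the filtered rows
        }
        return "(" + allTotal + "," + someTotal + ")";
    }

    public static void main(String[] args) {
        // The four rows of group x (group y is identical in the example data).
        int[][] groupX = { {1, 1}, {2, 1}, {3, 0}, {4, 0} };
        System.out.println(totals(groupX)); // (10,3)
    }
}
```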

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1470) map/red jobs fail using G1 GC (Couldn't find heap)

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1470.
-

Resolution: Won't Fix

Closing since there is no fix in Pig required. Feel free to continue the 
discussion on the mailing lists.

 map/red jobs fail using G1 GC (Couldn't find heap)
 --

 Key: PIG-1470
 URL: https://issues.apache.org/jira/browse/PIG-1470
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
 Environment: OS: 2.6.27.19-5-default #1 SMP 2009-02-28 04:40:21 +0100 
 x86_64 x86_64 x86_64 GNU/Linux
 Java: Java(TM) SE Runtime Environment (build 1.6.0_18-b07)
 Hadoop: 0.20.1
Reporter: Randy Prager

 Here is the hadoop map/red configuration (conf/mapred-site.xml) that fails
 {noformat}
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx300m -XX:+DoEscapeAnalysis -XX:+UseCompressedOops
    -XX:+UnlockExperimentalVMOptions -XX:+UseG1GC</value>
  </property>
 {noformat}
 Here is the hadoop map/red configuration that succeeds
 {noformat}
  <property>
    <name>mapred.child.java.opts</name>
    <value>-Xmx300m -XX:+DoEscapeAnalysis -XX:+UseCompressedOops</value>
  </property>
 {noformat}
 Here is the exception from the pig script.
 {noformat}
 Backend error message
 -
 org.apache.pig.backend.executionengine.ExecException: ERROR 2081: Unable to 
 set up the load function.
 at 
 org.apache.pig.backend.executionengine.PigSlice.init(PigSlice.java:89)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.SliceWrapper.makeReader(SliceWrapper.java:144)
 at 
 org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getRecordReader(PigInputFormat.java:282)
 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:338)
 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
 at org.apache.hadoop.mapred.Child.main(Child.java:170)
 Caused by: java.lang.RuntimeException: could not instantiate 'PigStorage' 
 with arguments '[,]'
 at 
 org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:519)
 at 
 org.apache.pig.backend.executionengine.PigSlice.init(PigSlice.java:85)
 ... 5 more
 Caused by: java.lang.reflect.InvocationTargetException
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
 Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.pig.impl.PigContext.instantiateFuncFromSpec(PigContext.java:487)
 ... 6 more
 Caused by: java.lang.RuntimeException: Couldn't find heap
 at 
 org.apache.pig.impl.util.SpillableMemoryManager.init(SpillableMemoryManager.java:95)
 at org.apache.pig.data.BagFactory.init(BagFactory.java:106)
 at 
 org.apache.pig.data.DefaultBagFactory.init(DefaultBagFactory.java:71)
 at org.apache.pig.data.BagFactory.getInstance(BagFactory.java:76)
 at 
 org.apache.pig.builtin.Utf8StorageConverter.init(Utf8StorageConverter.java:49)
 at org.apache.pig.builtin.PigStorage.init(PigStorage.java:69)
 at org.apache.pig.builtin.PigStorage.init(PigStorage.java:79)
 ... 11 more
 {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1492) DefaultTuple and DefaultMemory understimate their memory footprint

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1492:


 Assignee: Thejas M Nair
Fix Version/s: 0.8.0

 DefaultTuple and DefaultMemory understimate their memory footprint
 --

 Key: PIG-1492
 URL: https://issues.apache.org/jira/browse/PIG-1492
 Project: Pig
  Issue Type: Bug
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0


 There are several places where we highly underestimate the memory footprint. 
 For example, for map datatypes, we don't account for the per-entry cost of 
 the map container data structures. The estimated size of a tuple having a map 
 with 100 integer key-value entries, as per the current version of the code, is 
 3260 bytes, while what is observed is around 6775 bytes. To verify the memory 
 footprint, I checked free memory before and after creating multiple instances 
 of the object, using code along the lines of 
 http://www.javaspecialists.eu/archive/Issue029.html . 
 In PIG-1443 a similar change was done to fix this for CHARARRAY.
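The free-memory-differencing technique referenced above can be sketched as follows. This is a rough approximation only: exact numbers vary by JVM and heap state, and the HashMap payload here is just a stand-in for Pig's map type.

```java
import java.util.HashMap;
import java.util.Map;

public class FootprintProbe {
    static long used() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        final int count = 50_000;
        Object[] hold = new Object[count]; // keep instances reachable

        System.gc();
        Thread.sleep(200);
        long before = used();

        for (int i = 0; i < count; i++) {
            Map<Integer, Integer> m = new HashMap<>();
            m.put(i, i);                   // one entry, like one map field
            hold[i] = m;
        }

        System.gc();
        Thread.sleep(200);
        long after = used();

        // The per-instance figure includes HashMap internals that a naive
        // "payload only" estimate would miss -- the point of the bug report.
        System.out.println("approx bytes/instance: " + (after - before) / count);
    }
}
```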

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-523) help in grunt should show all commands

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-523:
--

Assignee: Olga Natkovich

 help in grunt should show all commands
 --

 Key: PIG-523
 URL: https://issues.apache.org/jira/browse/PIG-523
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Olga Natkovich
Priority: Minor
 Fix For: 0.8.0


 currently, it only shows commands directly supported by the grunt parser and not 
 commands supported by the pig parser.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-347) Pig (help) Commands

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-347:
--

Assignee: Olga Natkovich

 Pig (help) Commands
 ---

 Key: PIG-347
 URL: https://issues.apache.org/jira/browse/PIG-347
 Project: Pig
  Issue Type: Bug
Reporter: Corinne Chandel
Assignee: Olga Natkovich
Priority: Minor
 Fix For: 0.8.0


 Pig help can be specified 2 ways: $pig -help and $pig -h
 I. $pig -help (seen by external/internal users)
 (1) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default
 (2) change 
 -x, -exectype local|mapreduce, mapreduce is default 
  change mapreduce to hadoop (maintain backward compatibility)
 II. $pig -h (seen by internal users users only)
 (1) fix typos
 -l, --latest   use latest, untested, unsupported version of pig.jar instaed 
 of relased, tested, supported version.
instead of released 
 (2) fix
 -c, -cluster clustername, kryptonite is default 
  remove kryptonite is default 
 (same as above)
 (3) change:  -x, -exectype local|mapreduce, mapreduce is default ... 
  change mapreduce to hadoop (maintain backward compatibility)
 (same as above)
  

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter

2010-07-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887538#action_12887538
 ] 

Olga Natkovich commented on PIG-1494:
-

Swati, I am assigning it to you since I am assuming you plan to work on it for 
0.8. Otherwise, it is unlikely to happen in the 0.8 timeframe. Feel free to 
unassign and unlink from this release if this is not the case.

 PIG Logical Optimization: Use CNF in PushUpFilter
 -

 Key: PIG-1494
 URL: https://issues.apache.org/jira/browse/PIG-1494
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Swati Jain
Priority: Minor
 Fix For: 0.8.0


 The PushUpFilter rule is not able to handle complicated boolean expressions.
 For example, SplitFilter rule is splitting one LOFilter into two by AND. 
 However it will not be able to split LOFilter if the top level operator is 
 OR. For example:
 *ex script:*
 A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
 B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
 C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
 J1 = JOIN B by b1, C by c1;
 J2 = JOIN J1 by $0, A by a1;
 D = *Filter J2 by ( (c1 > 10) AND (a3+b3 > 10) ) OR (c2 == 5);*
 explain D;
 In the above example, the PushUpFilter is not able to push any filter 
 condition across any join as it contains columns from all branches (inputs). 
 But if we convert this expression into Conjunctive Normal Form (CNF) then 
 we would be able to push filter condition c1 > 10 and c2 == 5 below both join 
 conditions. Here is the CNF expression for the highlighted line:
 ( (c1 > 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )
 *Suggestion:* It would be a good idea to convert LOFilter's boolean 
 expression into CNF, it would then be easy to push parts (conjuncts) of the 
 LOFilter boolean expression selectively. We would also not require rule 
 SplitFilter anymore if we were to add this utility to rule PushUpFilter 
 itself.
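The CNF rewrite above relies on distributing OR over AND: with p = (c1 > 10), q = (a3+b3 > 10), r = (c2 == 5), it is the identity (p AND q) OR r = (p OR r) AND (q OR r). A plain-Java truth-table check of that identity (illustrative naming):

```java
public class CnfDistributivityCheck {
    public static void main(String[] args) {
        // Enumerate all 8 assignments of (p, q, r).
        for (int bits = 0; bits < 8; bits++) {
            boolean p = (bits & 1) != 0; // c1 > 10
            boolean q = (bits & 2) != 0; // a3 + b3 > 10
            boolean r = (bits & 4) != 0; // c2 == 5
            boolean originalForm = (p && q) || r;
            boolean cnfForm = (p || r) && (q || r);
            if (originalForm != cnfForm) {
                throw new AssertionError("not equivalent at bits=" + bits);
            }
        }
        System.out.println("CNF form is equivalent");
    }
}
```

Because each conjunct of the CNF can be pushed independently, any conjunct mentioning only one input's columns can move below the joins.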

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1494) PIG Logical Optimization: Use CNF in PushUpFilter

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1494:
---

Assignee: Swati Jain

 PIG Logical Optimization: Use CNF in PushUpFilter
 -

 Key: PIG-1494
 URL: https://issues.apache.org/jira/browse/PIG-1494
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Swati Jain
Assignee: Swati Jain
Priority: Minor
 Fix For: 0.8.0


 The PushUpFilter rule is not able to handle complicated boolean expressions.
 For example, SplitFilter rule is splitting one LOFilter into two by AND. 
 However it will not be able to split LOFilter if the top level operator is 
 OR. For example:
 *ex script:*
 A = load 'file_a' USING PigStorage(',') as (a1:int,a2:int,a3:int);
 B = load 'file_b' USING PigStorage(',') as (b1:int,b2:int,b3:int);
 C = load 'file_c' USING PigStorage(',') as (c1:int,c2:int,c3:int);
 J1 = JOIN B by b1, C by c1;
 J2 = JOIN J1 by $0, A by a1;
 D = *Filter J2 by ( (c1 > 10) AND (a3+b3 > 10) ) OR (c2 == 5);*
 explain D;
 In the above example, the PushUpFilter is not able to push any filter 
 condition across any join as it contains columns from all branches (inputs). 
 But if we convert this expression into Conjunctive Normal Form (CNF) then 
 we would be able to push filter condition c1 > 10 and c2 == 5 below both join 
 conditions. Here is the CNF expression for the highlighted line:
 ( (c1 > 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )
 *Suggestion:* It would be a good idea to convert LOFilter's boolean 
 expression into CNF, it would then be easy to push parts (conjuncts) of the 
 LOFilter boolean expression selectively. We would also not require rule 
 SplitFilter anymore if we were to add this utility to rule PushUpFilter 
 itself.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2010-07-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887545#action_12887545
 ] 

Daniel Dai commented on PIG-1472:
-

+1 for commit.

 Optimize serialization/deserialization between Map and Reduce and between MR 
 jobs
 -

 Key: PIG-1472
 URL: https://issues.apache.org/jira/browse/PIG-1472
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.4.patch, 
 PIG-1472.patch


 In certain types of pig queries most of the execution time is spent in 
 serializing/deserializing (sedes) records between Map and Reduce and between 
 MR jobs. 
 For example, if PigMix queries are modified to specify types for all the 
 fields in the load statement schema, some of the queries (L2,L3,L9, L10 in 
 pigmix v1) that have records with bags and maps being transmitted across map 
 or reduce boundaries run a lot longer (a runtime increase of a few times has 
 been seen).
 There are a few optimizations that have shown to improve the performance of 
 sedes in my tests -
 1. Use a smaller number of bytes to store the length of the column. For example, 
 if a bytearray is smaller than 255 bytes, a single byte can be used to store the 
 length instead of the integer that is currently used.
 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and 
 DataInput.readUTF.  This reduces the cost of serialization by more than 1/2. 
 Zebra and BinStorage are known to use DefaultTuple sedes functionality. The 
 serialization format that these loaders use cannot change, so after the 
 optimization their format is going to be different from the format used 
 between M/R boundaries.
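Optimization 1 amounts to a variable-length length prefix. The sketch below illustrates the idea only; it is not Pig's actual wire format, and the 255 escape-marker value is an assumption:

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class VarLengthPrefix {
    // Encode payload with a 1-byte length when it fits, else a marker
    // byte (255) followed by a 4-byte int length.
    static byte[] encode(byte[] payload) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bos);
        if (payload.length < 255) {
            out.writeByte(payload.length); // 1 byte of overhead
        } else {
            out.writeByte(255);            // escape marker
            out.writeInt(payload.length);  // 4 more bytes for the real length
        }
        out.write(payload);
        return bos.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(encode(new byte[100]).length);     // 101 = 1 + 100
        System.out.println(encode(new byte[100_000]).length); // 100005 = 1 + 4 + 100000
    }
}
```

For the common case of short bytearrays this saves 3 of the 4 length bytes per field, which adds up across billions of records.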

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1430) ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix Times in All Operations

2010-07-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887546#action_12887546
 ] 

Alan Gates commented on PIG-1430:
-

I think it's fine to start with just putting conversion functions into Pig 
Latin.  What I'd like to clarify though is what is the desired end state?  Does 
Pig eventually have a datetime type that does all the datetime stuff you can 
dream of (timezones, etc.)?  Or does Pig only ever have longs or strings to 
represent times and a set of functions to work with those?  Are you proposing 
that latter, or delaying the former in interest of getting something into 0.8?  

 ISODateTime - DateTime: DateTime UDFs Should Also Support int/second Unix 
 Times in All Operations
 --

 Key: PIG-1430
 URL: https://issues.apache.org/jira/browse/PIG-1430
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


 All functions in 
 contrib.piggybank.java.src.main.java.org.apache.pig.piggybank.evaluation.datetime
  should seamlessly accept integer Unix/POSIX times, and return Unix time 
 output when given an int, and ISO output when given a chararray.
 Note: Unix/POSIX times are the number of seconds elapsed since midnight 
 proleptic Coordinated Universal Time (UTC) of January 1, 1970, not counting 
 leap seconds.  See http://en.wikipedia.org/wiki/Unix_time
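A conversion between the two representations can be sketched with java.time; the helper names are illustrative, and the actual piggybank UDF wrappers would delegate to something like this:

```java
import java.time.Instant;

public class UnixIsoConversion {
    // Unix seconds -> ISO-8601 instant string (UTC).
    static String unixToIso(long seconds) {
        return Instant.ofEpochSecond(seconds).toString();
    }

    // ISO-8601 instant string -> Unix seconds.
    static long isoToUnix(String iso) {
        return Instant.parse(iso).getEpochSecond();
    }

    public static void main(String[] args) {
        System.out.println(unixToIso(0L)); // 1970-01-01T00:00:00Z
        // Round trip preserves the value.
        long t = 1278892800L;
        System.out.println(isoToUnix(unixToIso(t)) == t); // true
    }
}
```

The type-dispatch the issue asks for would then be a matter of checking whether the UDF input arrived as an int/long (Unix time) or a chararray (ISO string) and picking the direction accordingly.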

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)
Add -q command line option to set queue name for Pig jobs from command line
---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0


rjurney$ pig -q default

This sets the mapred.job.queue.name property in the execution engine from the 
pig properties for MAPRED type jobs.  

Patch attached.
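The effect of the proposed flag amounts to a single property write before job submission. A minimal sketch (the helper name is hypothetical, not the patch's actual code):

```java
import java.util.Properties;

public class QueueOption {
    // Copy the queue name given on the command line into the job
    // properties, where the execution engine picks it up for MAPRED jobs.
    static Properties withQueue(Properties props, String queueName) {
        props.setProperty("mapred.job.queue.name", queueName);
        return props;
    }

    public static void main(String[] args) {
        Properties p = withQueue(new Properties(), "default");
        System.out.println(p.getProperty("mapred.job.queue.name")); // default
    }
}
```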

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1321) Logical Optimizer: Merge cascading foreach

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-1321:


Fix Version/s: 0.8.0

 Logical Optimizer: Merge cascading foreach
 --

 Key: PIG-1321
 URL: https://issues.apache.org/jira/browse/PIG-1321
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Xuefu Zhang
 Fix For: 0.8.0


 We can merge consecutive foreach statement.
 Eg:
 b = foreach a generate a0#'key1' as b0, a0#'key2' as b1, a1;
 c = foreach b generate b0#'kk1', b0#'kk2', b1, a1;
 => c = foreach a generate a0#'key1'#'kk1', a0#'key1'#'kk2', a0#'key2', a1;
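The merge is sound because map dereferences compose. Checking the claim with plain Java maps standing in for Pig's map type (illustrative only):

```java
import java.util.HashMap;
import java.util.Map;

public class ForeachMergeCheck {
    // Build the nested map a0 = ['key1'#['kk1'#7, 'kk2'#8]].
    static Map<String, Map<String, Integer>> sampleA0() {
        Map<String, Integer> inner = new HashMap<>();
        inner.put("kk1", 7);
        inner.put("kk2", 8);
        Map<String, Map<String, Integer>> a0 = new HashMap<>();
        a0.put("key1", inner);
        return a0;
    }

    public static void main(String[] args) {
        Map<String, Map<String, Integer>> a0 = sampleA0();

        // Two foreach steps: first b0 = a0#'key1', then b0#'kk1'.
        Map<String, Integer> b0 = a0.get("key1");
        int twoStep = b0.get("kk1");

        // Merged single step: a0#'key1'#'kk1'.
        int merged = a0.get("key1").get("kk1");

        System.out.println(twoStep == merged); // true
    }
}
```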

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: PIG Logical Optimization: Use CNF in SplitFilter

2010-07-12 Thread Yan Zhou
Hopefully by this week. I'm still in the debugging phase of the work.
While you are welcome to reuse some of my algorithms, I doubt you can
reuse the code as much as you want. It's basically for my DNF use. You
might need to factor out some general code that you can find reusable.

 

I fully understand the I/O benefits as I put in my first message. And it
is classified as Scenario 1. There is no doubt that it should be
considered as part of your work. However, for this, CNF is not
necessary.

 

For scenario 2, the benefit will come from lower in-core logical
expression evaluation costs, with no I/O benefit as far as I can see. The
use of CNF may or may not lead to cheaper evaluation, as the example in my
first message shows. In other words, after converting to CNF, you should
compare the eval cost with that of the original expression before
deciding whether the CNF or the original form should be evaluated.

 

Please let me know if I miss any of your points.

 

Thanks,

 

Yan



From: Swati Jain [mailto:swat...@aggiemail.usu.edu] 
Sent: Monday, July 12, 2010 11:52 AM
To: Yan Zhou
Cc: pig-dev@hadoop.apache.org
Subject: Re: PIG Logical Optimization: Use CNF in SplitFilter

 

I was wondering if you are not going to check in your patch soon then it
would be great if you could share it with me. I believe I might be able
to reuse some of your (utility) functionality directly or get some
ideas. 

About your cost-benefit question:
1) I will control the complexity of CNF conversion by providing a
configurable threshold value which will limit the OR-nesting.
2) One benefit of this conversion is that it will allow pushing parts of
a filter (conjuncts) across the joins which is not happening in the
current PushUpFilter optimization. Moreover, it may result in a
cascading effect to push the conjuncts below other operators by other
rules that may be fired as a result. The benefit from this is really
data dependent, but in big-data workloads, any kind of predicate
pushdown may eventually lead to big savings in amount of data read or
amount of data transferred/shuffled across the network (I need to
understand the LogicalPlan to PhysicalPlan conversion better to give
concrete examples).

Thanks!
Swati

On Mon, Jul 12, 2010 at 10:36 AM, Yan Zhou y...@yahoo-inc.com wrote:

Yes, I already implemented the NOT push down upfront, so you do not
need to do that.

 

The support of CNF will probably be the most difficult part. But as I
mentioned last time, you should compare the cost after trimming the CNF
to get the post-split filtering logic. Given the complexity of
manipulating CNF and the undetermined benefits, I am not sure whether it
should be in scope at this moment.

 

To handle CNF, I think it's a good idea to create a new plan and connect
the nodes in the new plan to the base plan as you envisioned. In my
changes, which uses DNF instead of CNF but processing is similar
otherwise, I use a LogicalExpressionProxy, which contains a source
member that is just the node in the original plan, to link the nodes in
the new plan and old plan.  The original LogicalExpression is enhanced
with a counter to trace the # of proxies of the original nodes since
normal form creation will spread the nodes in the original tree across
many normalized nodes. The benefit, aside from not setting the plan, is
that the original expression is trimmed according to the processing
results from DNF; while DNF is created separately and as a kinda utility
so that complex features can be used. In my changes, I used
multiple-child tree in DNF while not changing the original binary
expression tree structure. Another benefit is that the original tree is
kept as much as it is at the start, i.e., I do not attempt to optimize
its overall structure beyond trimming based upon the simplification
logics. (I also control the size of DNF to 100 nodes.) The down side of
this is added complexity.

 

But in your case, for scenario 2 which is the whole point to use CNF,
you would need to change the original expression tree structurally
beyond trimming for post-split filtering logic. The other benefit of
using a multiple-child expression depends on whether you plan to support
such expressions, replacing the current binary tree,
in the final plan. I think it's a good idea to support that,
but it is not in my scope now.

 

I'll add my algorithm details soon to my jira. Please take a look and
comment as you see appropriate.

 

Thanks,

 

Yan

 

 



From: Swati Jain [mailto:swat...@aggiemail.usu.edu] 
Sent: Friday, July 09, 2010 11:00 PM
To: Yan Zhou
Cc: pig-dev@hadoop.apache.org
Subject: Re: PIG Logical Optimization: Use CNF in SplitFilter

 

Hi Yan,

I agree that the first scenario (filter logic applied to individual
input sources) doesn't need conversion to CNF and that it will be a good
idea to add CNF functionality for the second scenario. I was also
planning to provide a configurable threshold value to 

[jira] Updated: (PIG-1472) Optimize serialization/deserialization between Map and Reduce and between MR jobs

2010-07-12 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair updated PIG-1472:
---

Status: Resolved  (was: Patch Available)
Resolution: Fixed

Patch committed to trunk.

 Optimize serialization/deserialization between Map and Reduce and between MR 
 jobs
 -

 Key: PIG-1472
 URL: https://issues.apache.org/jira/browse/PIG-1472
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.8.0
Reporter: Thejas M Nair
Assignee: Thejas M Nair
 Fix For: 0.8.0

 Attachments: PIG-1472.2.patch, PIG-1472.3.patch, PIG-1472.4.patch, 
 PIG-1472.patch


 In certain types of pig queries most of the execution time is spent in 
 serializing/deserializing (sedes) records between Map and Reduce and between 
 MR jobs. 
 For example, if PigMix queries are modified to specify types for all the 
 fields in the load statement schema, some of the queries (L2,L3,L9, L10 in 
 pigmix v1) that have records with bags and maps being transmitted across map 
 or reduce boundaries run a lot longer (a runtime increase of a few times has 
 been seen).
 There are a few optimizations that have shown to improve the performance of 
 sedes in my tests -
 1. Use a smaller number of bytes to store the length of the column. For example, 
 if a bytearray is smaller than 255 bytes, a single byte can be used to store the 
 length instead of the integer that is currently used.
 2. Instead of custom code to do sedes on Strings, use DataOutput.writeUTF and 
 DataInput.readUTF.  This reduces the cost of serialization by more than 1/2. 
 Zebra and BinStorage are known to use DefaultTuple sedes functionality. The 
 serialization format that these loaders use cannot change, so after the 
 optimization their format is going to be different from the format used 
 between M/R boundaries.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1495:


Status: Patch Available  (was: Open)

 Add -q command line option to set queue name for Pig jobs from command line
 ---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

 Attachments: set_queue.patch


 rjurney$ pig -q default
 This sets the mapred.job.queue.name property in the execution engine from the 
 pig properties for MAPRED type jobs.  
 Patch attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1368) Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be tightened for corner cases

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-1368.
-

Resolution: Duplicate

This will be addressed as part of PIG-1271

 Utf8StorageConvertor's bytesToTuple and bytesToBag methods need to be 
 tightened for corner cases
 

 Key: PIG-1368
 URL: https://issues.apache.org/jira/browse/PIG-1368
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Pradeep Kamath

 Consider the following data:
 1\t ( hello , bye ) \n
 1\t( hello , bye )a\n
 2 \t (good , bye)\n
 The following script gives the results below:
 a = load 'junk' as (i:int, t:tuple(s:chararray, r:chararray)); dump a;
 (1,( hello , bye ))
 (1,( hello , bye ))
 (2,(good , bye))
 The current bytesToTuple implementation discards leading and trailing 
 characters before the tuple delimiters and parses the tuple out - I think 
 instead it should treat any leading and trailing characters (including space) 
 near the delimiters as an indication of a malformed tuple and return null.
 Also in the code, consumeBag() should handle the special case of {} and not 
 delegate the handling to consumeTuple(). 
 In consumeBag() null tuples should not be skipped.
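The stricter behavior proposed above can be sketched as follows. This is a hedged illustration of the parsing policy, not Pig's actual Utf8StorageConverter code; the class and method names are invented for the example.

```java
public class StrictTupleParse {
    // Returns the inner text of the tuple, or null if malformed.
    // Deliberately no trim(): any leading or trailing characters
    // (including spaces) around the parentheses mean a malformed tuple.
    static String parseTuple(String field) {
        if (field.isEmpty()
                || field.charAt(0) != '('
                || field.charAt(field.length() - 1) != ')')
            return null;
        return field.substring(1, field.length() - 1);
    }

    public static void main(String[] args) {
        System.out.println(parseTuple("(hello,bye)"));   // prints "hello,bye"
        System.out.println(parseTuple(" (hello,bye) ")); // prints "null": stray spaces
        System.out.println(parseTuple("(hello,bye)a"));  // prints "null": trailing char
    }
}
```

Under this policy, rows 1 and 3 of the sample data above would yield a null tuple rather than a silently "repaired" one.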

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1466) Improve log messages for memory usage

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1466:
---

Assignee: Thejas M Nair

Thejas, can you update the messages since you are already looking at the memory 
stuff, thanks

 Improve log messages for memory usage
 -

 Key: PIG-1466
 URL: https://issues.apache.org/jira/browse/PIG-1466
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Ashutosh Chauhan
Assignee: Thejas M Nair
Priority: Minor
 Fix For: 0.8.0


 For anything more than a moderately sized dataset, Pig usually prints the following 
 messages:
 {code}
 2010-05-27 18:28:31,659 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
 low memory handler called (Usage
 threshold exceeded) init = 4194304(4096K) used = 672012960(656262K) committed 
 = 954466304(932096K) max =
 954466304(932096K)
 2010-05-27 18:10:52,653 INFO org.apache.pig.impl.util.SpillableMemoryManager: 
 low memory handler called (Collection
 threshold exceeded) init = 4194304(4096K) used = 954466304(932096K) committed 
 = 954466304(932096K) max =
 954466304(932096K)
 {code}
 This seems to confuse users a lot. Once these messages are printed, users 
 tend to believe that Pig is having hard time with memory, is spilling to disk 
 etc., but in fact Pig might be cruising along at ease. We should be a little 
 more careful about what we print in logs. Currently these are printed when a 
 notification is sent by JVM and some other conditions are met which may not 
 necessarily indicate low memory condition. Furthermore, with 
 {{InternalCachedBag}} embraced everywhere in favor of {{DefaultBag}}, these 
 messages have lost their usefulness. At the very least, we should lower the 
 log level at which these are printed. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Jurney updated PIG-1495:


Status: Open  (was: Patch Available)

 Add -q command line option to set queue name for Pig jobs from command line
 ---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

 Attachments: set_queue.patch


 rjurney$ pig -q default
 This sets the mapred.job.queue.name property in the execution engine from the 
 pig properties for MAPRED type jobs.  
 Patch attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



PIG Logical Optimization: Use CNF in SplitFilter

2010-07-12 Thread Swati Jain
Yan,

What I meant in my last email was that scenario 2 optimizations would lead
to more opportunities for scenario 1 kind of optimizations.

Consider the conjunct list [C1;C2;C3] as the source of a JOIN.

(a)  Suppose none of these are computable on a join input, in this case we
retain the original expression and discard the CNF.

(b)  Suppose C1 is computable on join input J1 and C2 is computable on join
input J2 but C3 requires a combination of both join inputs. In this case, we
push C1 above J1, C2 above J2 and leave C3 as is below the JOIN. Note that
C1 and C2 may be further pushed up (with additional iterations of the
optimizer). If they are now the source of single input operators, it is
similar to scenario 1.

Thanks,
Swati


On Mon, Jul 12, 2010 at 3:14 PM, Yan Zhou y...@yahoo-inc.com wrote:

  Hopefully by this week. I’m still in the debugging phase of the work.
 While you are welcome to reuse some of my algorithms, I doubt you can reuse
 the code as much as you want. It’s basically for my DNF use. You might need
 to factor out some general code which you can find reusable.



 I fully understand the I/O benefits as I put in my first message. And it is
 classified as “Scenario 1”. There is no doubt that it should be considered
 as part of your work. However, for this, CNF is not necessary.



 For scenario 2, the benefits will be from less in-core logical expression
 evaluation costs and no I/O benefits as I can see. And use of CNF may or may
 not lead to cheaper evaluations as the example in my first message shows. In
 other words, after use of CNF, you should

 compare the eval cost with that in the original expression eval before
 deciding either the CNF or the original form should be evaluated.



 Please let me know if I miss any of your points.



 Thanks,



 Yan
  --

 *From:* Swati Jain [mailto:swat...@aggiemail.usu.edu]
 *Sent:* Monday, July 12, 2010 11:52 AM

 *To:* Yan Zhou
 *Cc:* pig-dev@hadoop.apache.org
 *Subject:* Re: PIG Logical Optimization: Use CNF in SplitFilter



 I was wondering if you are not going to check in your patch soon then it
 would be great if you could share it with me. I believe I might be able to
 reuse some of your (utility) functionality directly or get some ideas.

 About your cost-benefit question:
 1) I will control the complexity of CNF conversion by providing a
 configurable threshold value which will limit the OR-nesting.
 2) One benefit of this conversion is that it will allow pushing parts of a
 filter (conjuncts) across the joins which is not happening in the current
 PushUpFilter optimization. Moreover, it may result in a cascading effect to
 push the conjuncts below other operators by other rules that may be fired as
 a result. The benefit from this is really data dependent, but in big-data
 workloads, any kind of predicate pushdown may eventually lead to big savings
 in amount of data read or amount of data transfered/shuffled across the
 network (I need to understand the LogicalPlan to PhysicalPlan conversion
 better to give concrete examples).

 Thanks!
 Swati

 On Mon, Jul 12, 2010 at 10:36 AM, Yan Zhou y...@yahoo-inc.com wrote:

 Yes, I already implemented the “NOT push down” upfront, so you do not need
 to do that.



 The support of CNF will probably be the most difficult part. But as I
 mentioned last time, you should compare the cost after trimming the CNF to
 get the post-split filtering logic. Given the complexity of manipulating CNF
 and undetermined benefits, I am not sure it should be in scope at this
 moment or not.



 To handle CNF, I think it’s a good idea to create a new plan and connect
 the nodes in the new plan to the base plan as you envisioned. In my changes,
 which uses DNF instead of CNF but processing is similar otherwise, I use a
 LogicalExpressionProxy, which contains a “source” member that is just the
 node in the original plan, to link the nodes in the new plan and old plan.
  The original LogicalExpression is enhanced with a counter to trace the # of
 proxies of the original nodes since normal form creation will “spread” the
 nodes in the original tree across many normalized nodes. The benefit, aside
 from not setting the plan, is that the original expression is trimmed
 according to the processing results from DNF; while DNF is created
 separately and as a kind of utility so that complex features can be used. In
 my changes, I used multiple-child tree in DNF while not changing the
 original binary expression tree structure. Another benefit is that the
 original tree is kept as much as it is at the start, i.e., I do not attempt
 to optimize its overall structure beyond trimming based upon the
 simplification logics. (I also control the size of DNF to 100 nodes.) The
 down side of this is added complexity.



 But in your case, for scenario 2 which is the whole point to use CNF, you
 would need to change the original expression tree structurally beyond
 trimming for post-split filtering 

[jira] Commented: (PIG-1478) Add progress notification listener to PigRunner API

2010-07-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887578#action_12887578
 ] 

Alan Gates commented on PIG-1478:
-

I don't understand the difference between launchStartedNotification() and 
jobsSubmittedNotification().

When will outputCompletedNotification() be called?  Only after the job is 
completely done?  What, if any, guarantees are we making on the order of this 
relative to when PigRunner.run returns?

It isn't clear to me that launchCompleteNotification() is useful.  Once the 
launch has completed the user will start getting jobStartedNotification() calls.


 Add progress notification listener to PigRunner API
 ---

 Key: PIG-1478
 URL: https://issues.apache.org/jira/browse/PIG-1478
 Project: Pig
  Issue Type: Improvement
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1478.patch


 PIG-1333 added PigRunner API to allow Pig users and tools to get a 
 status/stats object back after executing a Pig script. The new API, however, 
  is synchronous (blocking). It's known that a Pig script can spawn tens (even 
  hundreds) of MR jobs and take hours to complete. Therefore it'll be nice to give 
 progress feedback to the callers during the execution.
 The proposal is to add an optional parameter to the API:
 {code}
 public abstract class PigRunner {
 public static PigStats run(String[] args, PigProgressNotificationListener 
 listener) {...}
 }
 {code} 
  The new listener is defined as follows:
 {code}
 package org.apache.pig.tools.pigstats;
 public interface PigProgressNotificationListener extends 
 java.util.EventListener {
 // just before the launch of MR jobs for the script
  public void launchStartedNotification(int numJobsToLaunch);
 // number of jobs submitted in a batch
 public void jobsSubmittedNotification(int numJobsSubmitted);
 // a job is started
 public void jobStartedNotification(String assignedJobId);
 // a job is completed successfully
 public void jobFinishedNotification(JobStats jobStats);
 // a job is failed
 public void jobFailedNotification(JobStats jobStats);
 // a user output is completed successfully
 public void outputCompletedNotification(OutputStats outputStats);
 // updates the progress as percentage
 public void progressUpdatedNotification(int progress);
 // the script execution is done
 public void launchCompletedNotification(int numJobsSucceeded);
 }
 {code}
 Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-1460) UDF manual and javadocs should make clear how to use RequiredFieldList

2010-07-12 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reassigned PIG-1460:
---

Assignee: Pradeep Kamath

Pradeep, could you provide the information needed and also update the javadoc. 
Then, please, re-assign to Corinne so that she can update the UDF manual, 
thanks.

 UDF manual and javadocs should make clear how to use RequiredFieldList
 --

 Key: PIG-1460
 URL: https://issues.apache.org/jira/browse/PIG-1460
 Project: Pig
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.7.0
Reporter: Alan Gates
Assignee: Pradeep Kamath
Priority: Minor
 Fix For: 0.8.0


 The UDF manual mentions that load function writers need to handle 
 RequiredFieldList passed to LoadPushDown.pushProjection, but it does not 
 specify how the writer should interpret the contents of that list.  The 
 javadoc is similarly vague. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1373) We need to add jdiff output to docs on the website

2010-07-12 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887582#action_12887582
 ] 

Daniel Dai commented on PIG-1373:
-

All the changes are made; we need to verify the API changes link when 0.8 is released.

 We need to add jdiff output to docs on the website
 --

 Key: PIG-1373
 URL: https://issues.apache.org/jira/browse/PIG-1373
 Project: Pig
  Issue Type: Bug
Reporter: Alan Gates
Assignee: Daniel Dai
Priority: Minor
 Fix For: 0.8.0

 Attachments: PIG-1373-1.patch, PIG-1373-2.patch


 Our build process constructs a jdiff between APIs for different versions.  
 But we don't post the results of that to the website when we deploy the docs. 
  We should, in order to help users understand changes across versions of pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1496) Mandatory rule ImplicitSplitInserter

2010-07-12 Thread Daniel Dai (JIRA)
Mandatory rule ImplicitSplitInserter


 Key: PIG-1496
 URL: https://issues.apache.org/jira/browse/PIG-1496
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0


Need to migrate ImplicitSplitInserter to new logical optimizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1497) Mandatory rule PartitionFilterOptimizer

2010-07-12 Thread Daniel Dai (JIRA)
Mandatory rule PartitionFilterOptimizer
---

 Key: PIG-1497
 URL: https://issues.apache.org/jira/browse/PIG-1497
 Project: Pig
  Issue Type: Sub-task
  Components: impl
Affects Versions: 0.8.0
Reporter: Daniel Dai
 Fix For: 0.8.0


Need to migrate PartitionFilterOptimizer to new logical optimizer.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1495) Add -q command line option to set queue name for Pig jobs from command line

2010-07-12 Thread Russell Jurney (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887585#action_12887585
 ] 

Russell Jurney commented on PIG-1495:
-

This doesn't work yet.  Doh!

 Add -q command line option to set queue name for Pig jobs from command line
 ---

 Key: PIG-1495
 URL: https://issues.apache.org/jira/browse/PIG-1495
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.7.0
Reporter: Russell Jurney
 Fix For: 0.8.0

 Attachments: set_queue.patch


 rjurney$ pig -q default
 This sets the mapred.job.queue.name property in the execution engine from the 
 pig properties for MAPRED type jobs.  
 Patch attached.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: PIG Logical Optimization: Use CNF in SplitFilter

2010-07-12 Thread Yan Zhou
I see. There looks like some disconnect about Scenario 1. To me, all
filtering logics that can be pushed above JOIN can be figured out
without use of CNF, which is scenario 1; while CNF helps to derive the
filtering logic after (or, in your example, below) JOIN, which is
Scenario 2.

 

In your example, C1 and C2, or their equivalent, above JOIN can be
easily figured out without resorting to CNF; C3 may have to be figured
out with CNF, but evaluation cost of the post-Join filtering logic thus
generated may not be cheaper than the original one before pushing up.

 

In summary, if we want to support scenario 2(and 1), we should use CNF;
if we JUST want to support scenario 1, which will push up all possible
filters closer to source and have all benefits on pruned I/O, we should
not use CNF.

 

Thanks,

 

Yan

 

-Original Message-
From: Swati Jain [mailto:swat...@aggiemail.usu.edu] 
Sent: Monday, July 12, 2010 4:04 PM
To: pig-dev@hadoop.apache.org
Subject: PIG Logical Optimization: Use CNF in SplitFilter

 

Yan,

 

What I meant in my last email was that scenario 2 optimizations would
lead

to more opportunities for scenario 1 kind of optimizations.

 

Consider the conjunct list [C1;C2;C3] as the source of a JOIN.

 

(a)  Suppose none of these are computable on a join input, in this case
we

retain the original expression and discard the CNF.

 

(b)  Suppose C1 is computable on join input J1 and C2 is computable on
join

input J2 but C3 requires a combination of both join inputs. In this
case, we

push C1 above J1, C2 above J2 and leave C3 as is below the JOIN. Note
that

C1 and C2 may be further pushed up (with additional iterations of the

optimizer). If they are now the source of single input operators, it is

similar to scenario 1.

 

Thanks,

Swati

 

 

On Mon, Jul 12, 2010 at 3:14 PM, Yan Zhou y...@yahoo-inc.com wrote:

 

  Hopefully by this week. I'm still in the debugging phase of the work.

 While you are welcome to reuse some of my algorithms, I doubt you can
reuse

 the code as much as you want. It's basically for my DNF use. You might
need

 to factor out some general codes which you can find

 

 reusable.

 

 

 

 I fully understand the I/O benefits as I put in my first message. And
it is

 classified as "Scenario 1". There is no doubt that it should be
considered

 as part of your work. However, for this, CNF is not necessary.

 

 

 

 For scenario 2, the benefits will be from less in-core logical
expression

 evaluation costs and no I/O benefits as I can see. And use of CNF may
or may

 not lead to cheaper evaluations as the example in my first message
shows. In

 other words, after use of CNF, you should

 

 compare the eval cost with that in the original expression eval before

 deciding either the CNF or the original form should be evaluated.

 

 

 

 Please let me know if I miss any of your points.

 

 

 

 Thanks,

 

 

 

 Yan

  --

 

 *From:* Swati Jain [mailto:swat...@aggiemail.usu.edu]

 *Sent:* Monday, July 12, 2010 11:52 AM

 

 *To:* Yan Zhou

 *Cc:* pig-dev@hadoop.apache.org

 *Subject:* Re: PIG Logical Optimization: Use CNF in SplitFilter

 

 

 

 I was wondering if you are not going to check in your patch soon then
it

 would be great if you could share it with me. I believe I might be
able to

 reuse some of your (utility) functionality directly or get some ideas.

 

 About your cost-benefit question:

 1) I will control the complexity of CNF conversion by providing a

 configurable threshold value which will limit the OR-nesting.

 2) One benefit of this conversion is that it will allow pushing parts
of a

 filter (conjuncts) across the joins which is not happening in the
current

 PushUpFilter optimization. Moreover, it may result in a cascading
effect to

 push the conjuncts below other operators by other rules that may be
fired as

 a result. The benefit from this is really data dependent, but in
big-data

 workloads, any kind of predicate pushdown may eventually lead to big
savings

 in amount of data read or amount of data transfered/shuffled across
the

 network (I need to understand the LogicalPlan to PhysicalPlan
conversion

 better to give concrete examples).

 

 Thanks!

 Swati

 

 On Mon, Jul 12, 2010 at 10:36 AM, Yan Zhou y...@yahoo-inc.com wrote:

 

 Yes, I already implemented the "NOT push down" upfront, so you do not need
 to do that.

 

 

 

 The support of CNF will probably be the most difficult part. But as I
 mentioned last time, you should compare the cost after trimming the CNF to

 get the post-split filtering logic. Given the complexity of
manipulating CNF

 and undetermined benefits, I am not sure it should be in scope at this

 moment or not.

 

 

 

 To handle CNF, I think it's a good idea to create a new plan and
connect

 the nodes in the new plan to the base plan as you envisioned. In my
changes,

 which uses DNF instead of CNF but processing is similar otherwise, I
use a

 

[jira] Commented: (PIG-1478) Add progress notification listener to PigRunner API

2010-07-12 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12887602#action_12887602
 ] 

Richard Ding commented on PIG-1478:
---

bq. I don't understand the difference between launchStartedNotification() and 
jobsSubmittedNotification().

launchStartedNotification() tells the listeners the total number of jobs ready 
to submit for the script. jobsSubmittedNotification() tells the listeners the 
number of jobs submitted in a batch. Because of the dependency between jobs, 
Pig may not be able to submit all the jobs together. So the numJobsToLaunch 
passed to launchStartedNotification() should equal the sum of the 
numJobsSubmitted values of all jobsSubmittedNotification() calls.

bq. When will outputCompletedNotification() be called? Only after the job is 
completely done? What, if any, guarantees are we making on the order of this 
relative to when PigRunner.run returns?

outputCompletedNotification() is called after the job that writes this output 
is done. This is only called for user outputs. As a script can have multiple 
user outputs, some outputs may be written before all jobs are done. 

bq. It isn't clear to me that launchCompleteNotification() is useful. Once the 
launch has completed the user will start getting jobStartedNotification() calls.

Just trying to be complete. launchCompletedNotification() is called when all jobs 
are done. If a script is executed successfully, numJobsSucceeded should 
equal the numJobsToLaunch from launchStartedNotification().

An example log trace looks like this:

{code}
 numJobsToLaunch: 3
 jobs submitted: 1
 progress: 0%
 job started: job_20100702195434153_0002
 progress: 16%
 progress: 33%
 job finished: job_20100702195434153_0002
 jobs submitted: 1
 job started: job_20100702195434153_0003
 progress: 50%
 progress: 66%
 job finished: job_20100702195434153_0003
 jobs submitted: 1
 job started: job_20100702195434153_0004
 progress: 83%
 output done: hdfs://localhost.localdomain:52083/user/pig/myoutput
 job finished: job_20100702195434153_0004
 progress: 100%
 numJobsSucceeded: 3
{code}
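A listener producing a console trace like the one above could look roughly like this. This is a hedged sketch: it shows only the String- and int-typed callbacks, omitting the JobStats/OutputStats-typed ones so the example stays self-contained, and it does not implement the real PigProgressNotificationListener interface.

```java
public class ConsoleListener {
    private int jobsToLaunch;

    // Called once, just before the first batch of MR jobs is launched.
    public void launchStartedNotification(int numJobsToLaunch) {
        jobsToLaunch = numJobsToLaunch;
        System.out.println("numJobsToLaunch: " + numJobsToLaunch);
    }

    // Called per batch; the batch sizes sum to numJobsToLaunch.
    public void jobsSubmittedNotification(int numJobsSubmitted) {
        System.out.println("jobs submitted: " + numJobsSubmitted);
    }

    public void jobStartedNotification(String assignedJobId) {
        System.out.println("job started: " + assignedJobId);
    }

    public void progressUpdatedNotification(int progress) {
        System.out.println("progress: " + progress + "%");
    }

    // On full success, numJobsSucceeded == jobsToLaunch.
    public void launchCompletedNotification(int numJobsSucceeded) {
        System.out.println("numJobsSucceeded: " + numJobsSucceeded
                + " of " + jobsToLaunch);
    }

    public static void main(String[] args) {
        ConsoleListener l = new ConsoleListener();
        l.launchStartedNotification(3);
        l.jobsSubmittedNotification(1);
        l.jobStartedNotification("job_20100702195434153_0002");
        l.progressUpdatedNotification(33);
        l.launchCompletedNotification(3);
    }
}
```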

 Add progress notification listener to PigRunner API
 ---

 Key: PIG-1478
 URL: https://issues.apache.org/jira/browse/PIG-1478
 Project: Pig
  Issue Type: Improvement
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1478.patch


 PIG-1333 added PigRunner API to allow Pig users and tools to get a 
 status/stats object back after executing a Pig script. The new API, however, 
  is synchronous (blocking). It's known that a Pig script can spawn tens (even 
  hundreds) of MR jobs and take hours to complete. Therefore it'll be nice to give 
 progress feedback to the callers during the execution.
 The proposal is to add an optional parameter to the API:
 {code}
 public abstract class PigRunner {
 public static PigStats run(String[] args, PigProgressNotificationListener 
 listener) {...}
 }
 {code} 
  The new listener is defined as follows:
 {code}
 package org.apache.pig.tools.pigstats;
 public interface PigProgressNotificationListener extends 
 java.util.EventListener {
 // just before the launch of MR jobs for the script
  public void launchStartedNotification(int numJobsToLaunch);
 // number of jobs submitted in a batch
 public void jobsSubmittedNotification(int numJobsSubmitted);
 // a job is started
 public void jobStartedNotification(String assignedJobId);
 // a job is completed successfully
 public void jobFinishedNotification(JobStats jobStats);
 // a job is failed
 public void jobFailedNotification(JobStats jobStats);
 // a user output is completed successfully
 public void outputCompletedNotification(OutputStats outputStats);
 // updates the progress as percentage
 public void progressUpdatedNotification(int progress);
 // the script execution is done
 public void launchCompletedNotification(int numJobsSucceeded);
 }
 {code}
 Any thoughts?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: PIG Logical Optimization: Use CNF in SplitFilter

2010-07-12 Thread Swati Jain
Hi Yan

Thanks for your prompt reply. I did not understand your statement “C1 and
C2, or their equivalent, above JOIN can be easily figured out without
resorting to CNF”.

Consider a LOFilter above a LOJoin. The predicate of LOFilter: ( (c1 > 10)
AND (a3+b3 > 10) ) OR (c2 == 5)

The schema for LOJoin:

A = (a1:int,a2:int,a3:int);
B = (b1:int,b2:int,b3:int);
C = (c1:int,c2:int,c3:int);

After CNF: ( (c1 > 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )

Now we can push ( (c1 > 10) OR (c2 == 5) ) above the JOIN (in the branch
leading up to the source C) while ( (a3+b3 > 10) OR (c2 == 5) ) stays put
below the JOIN.
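The equivalence behind this rewrite (distributing OR over AND) can be checked mechanically. The sketch below uses placeholder boolean atoms p = (c1 > 10), q = (a3+b3 > 10), r = (c2 == 5); it is an illustration of the logic, not Pig code.

```java
public class CnfCheck {
    // Original filter predicate: (p AND q) OR r
    static boolean original(boolean p, boolean q, boolean r) {
        return (p && q) || r;
    }

    // CNF form: (p OR r) AND (q OR r), obtained by distributing OR over AND
    static boolean cnf(boolean p, boolean q, boolean r) {
        return (p || r) && (q || r);
    }

    public static void main(String[] args) {
        // Brute-force all 8 assignments of the three atoms.
        for (int i = 0; i < 8; i++) {
            boolean p = (i & 1) != 0, q = (i & 2) != 0, r = (i & 4) != 0;
            if (original(p, q, r) != cnf(p, q, r))
                throw new AssertionError("mismatch at assignment " + i);
        }
        System.out.println("original and CNF agree on all 8 assignments");
    }
}
```

The conjunct (p OR r) mentions only columns of C, which is what makes it pushable above the JOIN, while (q OR r) spans two inputs and must stay below.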

Please let me know if there is a way of doing the above optimization without
converting the original expression to CNF.

Thanks,

Swati


On Mon, Jul 12, 2010 at 4:26 PM, Yan Zhou y...@yahoo-inc.com wrote:

 I see. There looks like some disconnect about Scenario 1. To me, all
 filtering logics that can be pushed above JOIN can be figured out
 without use of CNF, which is scenario 1; while CNF helps to derive the
 filtering logic after (or, in your example, below) JOIN, which is
 Scenario 2.



 In your example, C1 and C2, or their equivalent, above JOIN can be
 easily figured out without resorting to CNF; C3 may have to be figured
 out with CNF, but evaluation cost of the post-Join filtering logic thus
 generated may not be cheaper than the original one before pushing up.



 In summary, if we want to support scenario 2(and 1), we should use CNF;
 if we JUST want to support scenario 1, which will push up all possible
 filters closer to source and have all benefits on pruned I/O, we should
 not use CNF.



 Thanks,



 Yan



 -Original Message-
 From: Swati Jain [mailto:swat...@aggiemail.usu.edu]
 Sent: Monday, July 12, 2010 4:04 PM
 To: pig-dev@hadoop.apache.org
 Subject: PIG Logical Optimization: Use CNF in SplitFilter



 Yan,



 What I meant in my last email was that scenario 2 optimizations would
 lead

 to more opportunities for scenario 1 kind of optimizations.



 Consider the conjunct list [C1;C2;C3] as the source of a JOIN.



 (a)  Suppose none of these are computable on a join input, in this case
 we

 retain the original expression and discard the CNF.



 (b)  Suppose C1 is computable on join input J1 and C2 is computable on
 join

 input J2 but C3 requires a combination of both join inputs. In this
 case, we

 push C1 above J1, C2 above J2 and leave C3 as is below the JOIN. Note
 that

 C1 and C2 may be further pushed up (with additional iterations of the

 optimizer). If they are now the source of single input operators, it is

 similar to scenario 1.



 Thanks,

 Swati





 On Mon, Jul 12, 2010 at 3:14 PM, Yan Zhou y...@yahoo-inc.com wrote:



   Hopefully by this week. I'm still in the debugging phase of the work.

  While you are welcome to reuse some of my algorithms, I doubt you can
 reuse

  the code as much as you want. It's basically for my DNF use. You might
 need

  to factor out some general codes which you can find

 

  reusable.

 

 

 

  I fully understand the I/O benefits as I put in my first message. And
 it is

  classified as "Scenario 1". There is no doubt that it should be
 considered

  as part of your work. However, for this, CNF is not necessary.

 

 

 

  For scenario 2, the benefits will be from less in-core logical
 expression

  evaluation costs and no I/O benefits as I can see. And use of CNF may
 or may

  not lead to cheaper evaluations as the example in my first message
 shows. In

  other words, after use of CNF, you should

 

  compare the eval cost with that in the original expression eval before

  deciding either the CNF or the original form should be evaluated.

 

 

 

  Please let me know if I miss any of your points.

 

 

 

  Thanks,

 

 

 

  Yan

   --

 

  *From:* Swati Jain [mailto:swat...@aggiemail.usu.edu]

  *Sent:* Monday, July 12, 2010 11:52 AM

 

  *To:* Yan Zhou

  *Cc:* pig-dev@hadoop.apache.org

  *Subject:* Re: PIG Logical Optimization: Use CNF in SplitFilter

 

 

 

  I was wondering if you are not going to check in your patch soon then
 it

  would be great if you could share it with me. I believe I might be
 able to

  reuse some of your (utility) functionality directly or get some ideas.

 

  About your cost-benefit question:

  1) I will control the complexity of CNF conversion by providing a

  configurable threshold value which will limit the OR-nesting.

  2) One benefit of this conversion is that it will allow pushing parts
 of a

  filter (conjuncts) across the joins which is not happening in the
 current

  PushUpFilter optimization. Moreover, it may result in a cascading
 effect to

  push the conjuncts below other operators by other rules that may be
 fired as

  a result. The benefit from this is really data dependent, but in
 big-data

  workloads, any kind of predicate pushdown may eventually lead to big
 savings

  in amount of data read or amount of data 

RE: PIG Logical Optimization: Use CNF in SplitFilter

2010-07-12 Thread Yan Zhou
In the original expression, let (a3+b3 > 10) be true; the expression then
transforms to (c1 > 10) OR (c2 == 5), since TRUE OR anything is
still TRUE, and TRUE AND anything is that anything. You can write a
visitor to easily do this type of partial evaluation. (a3+b3 > 10) is
chosen because it cannot be determined from alias 'C'.
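The partial-evaluation trick can be sketched as follows. Assuming the atom undecidable from C is TRUE yields a relaxed filter that can be pushed above the JOIN; the key property is that the relaxed filter never drops a row the full predicate would keep. Names below are illustrative, not Pig's visitor API.

```java
public class PushablePart {
    // Full predicate: ((c1 > 10) AND (a3+b3 > 10)) OR (c2 == 5)
    static boolean full(int c1, int c2, int a3, int b3) {
        return ((c1 > 10) && (a3 + b3 > 10)) || (c2 == 5);
    }

    // Relaxation pushable above the JOIN on input C: obtained by setting
    // (a3+b3 > 10) to TRUE and simplifying, giving (c1 > 10) OR (c2 == 5).
    static boolean pushableForC(int c1, int c2) {
        return (c1 > 10) || (c2 == 5);
    }

    public static void main(String[] args) {
        // Soundness check: whenever the full predicate holds, the pushed
        // filter also holds, so pushing it up never filters out valid rows.
        for (int c1 : new int[]{5, 15})
            for (int c2 : new int[]{5, 6})
                for (int sum : new int[]{0, 20})
                    if (full(c1, c2, sum, 0) && !pushableForC(c1, c2))
                        throw new AssertionError("pushed filter dropped a row");
        System.out.println("pushed filter is a sound relaxation");
    }
}
```

Note the relaxation is one-directional: rows passing the pushed filter must still be re-checked by the residual predicate below the JOIN.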

Thanks,

Yan

-Original Message-
From: Swati Jain [mailto:swat...@aggiemail.usu.edu] 
Sent: Monday, July 12, 2010 5:40 PM
To: pig-dev@hadoop.apache.org
Subject: Re: PIG Logical Optimization: Use CNF in SplitFilter

Hi Yan

Thanks for your prompt reply. I did not understand your statement "C1 and
C2, or their equivalent, above JOIN can be easily figured out without
resorting to CNF".

Consider a LOFilter above a LOJoin. The predicate of LOFilter: ( (c1 > 10)
AND (a3+b3 > 10) ) OR (c2 == 5)

The schema for LOJoin:

A = (a1:int,a2:int,a3:int);
B = (b1:int,b2:int,b3:int);
C = (c1:int,c2:int,c3:int);

After CNF: ( (c1 > 10) OR (c2 == 5) ) AND ( (a3+b3 > 10) OR (c2 == 5) )

Now we can push ( (c1 > 10) OR (c2 == 5) ) above the JOIN (in the branch
leading up to the source C) while ( (a3+b3 > 10) OR (c2 == 5) ) stays put
below the JOIN.

Please let me know if there is a way of doing the above optimization
without
converting the original expression to CNF.

Thanks,

Swati


On Mon, Jul 12, 2010 at 4:26 PM, Yan Zhou y...@yahoo-inc.com wrote:

 I see. There looks like some disconnect about Scenario 1. To me, all
 filtering logics that can be pushed above JOIN can be figured out without
 use of CNF, which is scenario 1; while CNF helps to derive the filtering
 logic after (or, in your example, below) JOIN, which is Scenario 2.

 In your example, C1 and C2, or their equivalent, above JOIN can be easily
 figured out without resorting to CNF; C3 may have to be figured out with
 CNF, but the evaluation cost of the post-Join filtering logic thus
 generated may not be cheaper than the original one before pushing up.

 In summary, if we want to support scenario 2 (and 1), we should use CNF;
 if we JUST want to support scenario 1, which will push up all possible
 filters closer to the source and have all the benefits of pruned I/O, we
 should not use CNF.



 Thanks,



 Yan



 -Original Message-
 From: Swati Jain [mailto:swat...@aggiemail.usu.edu]
 Sent: Monday, July 12, 2010 4:04 PM
 To: pig-dev@hadoop.apache.org
 Subject: PIG Logical Optimization: Use CNF in SplitFilter



 Yan,



  What I meant in my last email was that scenario 2 optimizations would
  lead to more opportunities for scenario 1 kind of optimizations.

  Consider the conjunct list [C1;C2;C3] as the source of a JOIN.

  (a) Suppose none of these are computable on a join input; in this case
  we retain the original expression and discard the CNF.

  (b) Suppose C1 is computable on join input J1 and C2 is computable on
  join input J2, but C3 requires a combination of both join inputs. In
  this case, we push C1 above J1, C2 above J2, and leave C3 as is below
  the JOIN. Note that C1 and C2 may be further pushed up (with additional
  iterations of the optimizer). If they are now the source of single-input
  operators, it is similar to scenario 1.
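Case (b) amounts to partitioning the conjuncts by which join input's columns they reference. A hypothetical sketch (input names, schemas, and the regex-based column extraction are all made up for illustration):

```python
import re

# Columns provided by each (hypothetical) join input.
SCHEMAS = {"J1": {"a1", "a2", "a3"}, "J2": {"b1", "b2", "b3"}}

def columns(conjunct):
    """Extract column names (letter + digit) referenced by a conjunct string."""
    return set(re.findall(r"[abc]\d", conjunct))

def partition(conjuncts):
    """Map each conjunct to the single join input that can evaluate it,
    or to 'JOIN' if it needs columns from more than one input."""
    placement = {}
    for c in conjuncts:
        homes = [inp for inp, cols in SCHEMAS.items() if columns(c) <= cols]
        placement[c] = homes[0] if homes else "JOIN"
    return placement

print(partition(["a1 > 10", "b2 == 5", "a3 + b3 > 10"]))
# → {'a1 > 10': 'J1', 'b2 == 5': 'J2', 'a3 + b3 > 10': 'JOIN'}
```

Conjuncts mapped to a single input are the ones SplitFilter-style rules can push above that input; the rest stay below the JOIN.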



 Thanks,

 Swati





 On Mon, Jul 12, 2010 at 3:14 PM, Yan Zhou y...@yahoo-inc.com wrote:



  Hopefully by this week. I'm still in the debugging phase of the work.
  While you are welcome to reuse some of my algorithms, I doubt you can
  reuse the code as much as you want. It's basically for my DNF use. You
  might need to factor out some general code which you can find reusable.

  I fully understand the I/O benefits, as I put in my first message. And
  it is classified as Scenario 1. There is no doubt that it should be
  considered as part of your work. However, for this, CNF is not necessary.

  For scenario 2, the benefits will come from lower in-core logical
  expression evaluation costs, with no I/O benefits as far as I can see.
  And use of CNF may or may not lead to cheaper evaluations, as the example
  in my first message shows. In other words, after use of CNF, you should
  compare the eval cost with that of the original expression before
  deciding whether the CNF or the original form should be evaluated.

  Please let me know if I miss any of your points.

  Thanks,

  Yan
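The cost comparison suggested above can be approximated by counting leaf predicates in each form (an illustrative heuristic on the same tuple-based expression tree, not Pig's cost model):

```python
def eval_cost(expr):
    """Count leaf predicates in an expression tree as a rough eval cost."""
    if isinstance(expr, str):
        return 1
    # Internal node: ("and"/"or", left, right)
    return eval_cost(expr[1]) + eval_cost(expr[2])

original = ("or", ("and", "c1 > 10", "a3+b3 > 10"), "c2 == 5")
cnf = ("and", ("or", "c1 > 10", "c2 == 5"), ("or", "a3+b3 > 10", "c2 == 5"))

# CNF duplicates (c2 == 5), so it costs more to evaluate here:
print(eval_cost(original), eval_cost(cnf))   # → 3 4
```

When no conjunct ends up pushable, keeping the original form (as in case (a) earlier in the thread) avoids paying the duplicated-predicate cost for nothing.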

   --

 

  *From:* Swati Jain [mailto:swat...@aggiemail.usu.edu]
  *Sent:* Monday, July 12, 2010 11:52 AM
  *To:* Yan Zhou
  *Cc:* pig-dev@hadoop.apache.org
  *Subject:* Re: PIG Logical Optimization: Use CNF in SplitFilter

 

 

 

  If you are not going to check in your patch soon, it would be great if
  you could share it with me. I believe I might be able to reuse some of
  your (utility) functionality directly or get some ideas.
 

  About your cost-benefit question:

  1) I will control the complexity of CNF conversion by providing a
  configurable threshold value which will limit the