[jira] Updated: (PIG-1408) Annotate explain plans with aliases

2010-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1408:
--

  Status: Resolved  (was: Patch Available)
Hadoop Flags: [Reviewed]
  Resolution: Fixed

 Annotate explain plans with aliases
 ---

 Key: PIG-1408
 URL: https://issues.apache.org/jira/browse/PIG-1408
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1408.patch


 PIG-1156 added aliases in Pig scripts to the corresponding LogicalOperators 
 and PhysicalOperators. The aliases in the operators, however, are not 
 displayed in the output created by the explain command. 
 Since a Pig script can generate many MR jobs, it will be helpful, for 
 debugging purposes, to annotate the explain output plans with aliases, so 
 that users can correlate the jobs with the statements in their scripts.
 Here is an example: given the following script
 {code}
 A = load 'input';
 B = group A by $0;
 C = foreach B generate group, flatten(A);
 explain C;
 {code}
 The output without alias annotation is 
 {code}
 MapReduce node 1-28
 Map Plan
 Local Rearrange[tuple]{bytearray}(false) - 1-22
 |   |
 |   Project[bytearray][0] - 1-23
 |
 |---Load(file:///test/input:org.apache.pig.builtin.PigStorage) - 1-19
 Reduce Plan
 Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-27
 |
 |---New For Each(false,true)[bag] - 1-26
 |   |
 |   Project[bytearray][0] - 1-24
 |   |
 |   Project[bag][1] - 1-25
 |
 |---Package[tuple]{bytearray} - 1-21
 Global sort: false
 {code} 

 While the output with alias annotation will be
 {code}
 MapReduce node 1-28
 Map Plan
 B: Local Rearrange[tuple]{bytearray}(false) - 1-22
 |   |
 |   Project[bytearray][0] - 1-23
 |
 |---A: Load(file:///test/input:org.apache.pig.builtin.PigStorage) - 1-19
 Reduce Plan
 C: Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-27
 |
 |---C: New For Each(false,true)[bag] - 1-26
 |   |
 |   Project[bytearray][0] - 1-24
 |   |
 |   Project[bag][1] - 1-25
 |
 |---B: Package[tuple]{bytearray} - 1-21
 Global sort: false
 {code}
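
The annotation itself amounts to prefixing each operator line with its alias when one is known. A minimal sketch of that idea follows; PlanOp and AliasAnnotator are hypothetical stand-ins invented for illustration and are not Pig classes, and the attached patch may do this differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a plan operator; not a Pig class. */
class PlanOp {
    final String alias;   // script alias, or null when none was recorded
    final String name;    // operator description as explain prints it today
    final String key;     // operator key such as "1-22"
    final List<PlanOp> inputs = new ArrayList<>();

    PlanOp(String alias, String name, String key) {
        this.alias = alias;
        this.name = name;
        this.key = key;
    }

    /** One plan line: "alias: name - key" when an alias exists, else "name - key". */
    String label() {
        String prefix = (alias == null || alias.isEmpty()) ? "" : alias + ": ";
        return prefix + name + " - " + key;
    }
}

class AliasAnnotator {
    /** Renders an operator followed by its inputs, each input behind a "|---" marker. */
    static String render(PlanOp op) {
        StringBuilder sb = new StringBuilder();
        render(op, "", sb);
        return sb.toString();
    }

    private static void render(PlanOp op, String prefix, StringBuilder sb) {
        sb.append(prefix).append(op.label()).append('\n');
        for (PlanOp input : op.inputs) {
            sb.append("|\n");
            render(input, "|---", sb);
        }
    }

    public static void main(String[] args) {
        PlanOp load = new PlanOp("A", "Load(file:///test/input:org.apache.pig.builtin.PigStorage)", "1-19");
        PlanOp rearrange = new PlanOp("B", "Local Rearrange[tuple]{bytearray}(false)", "1-22");
        rearrange.inputs.add(load);
        System.out.print(render(rearrange));
    }
}
```

Operators without an alias (the inner Project lines in the example) keep their current form, so existing output only changes where an alias is available.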

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-13 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-566:
---

Attachment: PIG-566.patch

Addressed the issues highlighted in Daniel's comment

 Dump and store outputs do not match for PigStorage
 --

 Key: PIG-566
 URL: https://issues.apache.org/jira/browse/PIG-566
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Santhosh Srinivasan
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-566.patch, PIG-566.patch, PIG-566.patch, 
 PIG-566.patch, PIG-566.patch


 The dump and store formats for PigStorage do not match for longs and floats.
 {code}
 grunt> y = foreach x generate {(2985671202194220139L)};
 grunt> describe y;
 y: {{(long)}}
 grunt> dump y;
 ({(2985671202194220139L)})
 grunt> store y into 'y';
 grunt> cat y
 {(2985671202194220139)}
 {code}
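
The mismatch can be pictured as two renderers that disagree on type suffixes. The sketch below is an illustration only, not Pig's formatting code; the class and method names are made up, and the "F" suffix for floats is an assumption by analogy with the "L" shown for longs above.

```java
class DumpStoreFormatDemo {
    /** Dump-style rendering: longs carry an "L" suffix, floats an "F" suffix (assumed). */
    static String dumpStyle(Object value) {
        if (value instanceof Long) {
            return value + "L";
        }
        if (value instanceof Float) {
            return value + "F";
        }
        return String.valueOf(value);
    }

    /** Store-style rendering: the bare value with no type suffix. */
    static String storeStyle(Object value) {
        return String.valueOf(value);
    }

    public static void main(String[] args) {
        Object v = 2985671202194220139L;
        System.out.println("dump : ({(" + dumpStyle(v) + ")})");
        System.out.println("store: {(" + storeStyle(v) + ")}");
    }
}
```

Making the two outputs agree means picking one of these conventions and using it in both code paths.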




[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867220#action_12867220
 ] 

Ashutosh Chauhan commented on PIG-1381:
---

+1 on the changes. 
For completeness, we can also check in an empty pig.properties in the conf dir 
and add comments to both pig.properties and pig-default.properties explaining 
that passing properties through pig-default.properties has no effect, and that 
users should instead put any properties they want to add or override in 
pig.properties.

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch


 Currently, Pig reads the first pig.properties it finds on the CLASSPATH. Pig 
 ships a default pig.properties, so if a user has a different pig.properties 
 there is a conflict, since only one can be read. There are a couple of ways 
 to solve it:
 1. Provide a command-line option for the user to pass an additional property file
 2. Rename the default pig.properties to pig-default.properties, and let the 
 user supply a pig.properties to override it
 3. Further, we could consider using pig-default.xml/pig-site.xml, which seems 
 more natural for the Hadoop community. If so, we should provide backward 
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml. 
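
Option 2 maps naturally onto the defaults mechanism of java.util.Properties. The sketch below shows that layering under stated assumptions: the class name and the "example.*" keys are hypothetical, and the file contents are passed in as strings so the example stays self-contained. It is not taken from any of the attached patches.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class LayeredProperties {
    /**
     * Loads the default settings first, then the user settings on top.
     * A key present in the user text wins; otherwise the default is used.
     */
    static Properties load(String defaultsText, String userText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));
            // The defaults object is consulted only when a key is missing here.
            Properties merged = new Properties(defaults);
            merged.load(new StringReader(userText));
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Properties p = load("example.verbose=false\nexample.tmpdir=/tmp\n",
                            "example.verbose=true\n");
        System.out.println(p.getProperty("example.verbose"));
        System.out.println(p.getProperty("example.tmpdir"));
    }
}
```

With this arrangement the default file never needs to be edited by users, which is the point made in the comment above about where overrides belong.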




[jira] Updated: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-13 Thread Gianmarco De Francisci Morales (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gianmarco De Francisci Morales updated PIG-566:
---

Status: Patch Available  (was: In Progress)




[jira] Commented: (PIG-1385) UDF to create tuples and bags

2010-05-13 Thread Daniel Lescohier (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867244#action_12867244
 ] 

Daniel Lescohier commented on PIG-1385:
---

The Test file in PIG-1385-trunk.patch has a typo: 'org.paache' instead of 
'org.apache'.

+++ 
contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/util/TestToBagToTuple.java
(revision 0)
@@ -0,0 +1,51 @@
+package org.paache.pig.piggybank.util;


 UDF to create tuples and bags
 -

 Key: PIG-1385
 URL: https://issues.apache.org/jira/browse/PIG-1385
 Project: Pig
  Issue Type: New Feature
  Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
 Fix For: 0.8.0

 Attachments: PIG-1385-trunk.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 Based on this conversation:
  On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:
 
   What about making them part of the language using symbols?
  
   instead of
  
   foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;
  
   have language support
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;
  
   or even:
  
   foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;
  
  
   Is there reason not to do the second or third other than being more
   complicated?
  
   Certainly I'd volunteer to put the top implementation in to the util
   package and submit them for builtin's, but the latter syntactic candies
   seems more natural..
  
  
  
   On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:
  
   The grouping package in piggybank is left over from back when Pig allowed
   users to define grouping functions (0.1).  Functions like these should go
   in evaluation.util.
  
   However, I'd consider putting these in builtin (in main Pig) instead.
   These are things everyone asks for and they seem like a reasonable addition
   to the core engine.  This will be more of a burden to write (as we'll hold
   them to a higher standard) but of more use to people as well.
  
   Alan.
  
  
   On Apr 19, 2010, at 12:53 PM, hc busy wrote:
  
   Some times I wonder... I mean, somebody went to the trouble of making a
   path called
  
   org.apache.pig.piggybank.grouping
  
   (where it seems like this code belongs), but didn't check in any java code
   into that package.
  
  
   Any comment about where to put this kind of utility classes?
  
  
  
   On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:
  
2010/4/19 hc busy hc.b...@gmail.com
  
That's just the way it is right now, you can't make bags or tuples
   directly... Maybe we should have some UDF's in piggybank for these:
  
   toBag()
   toTuple(); --which is kinda like exec(Tuple in){return in;}
   TupleToBag(); --some times you need it this way for some reason.
  
  
   Ok. I place my current code here; maybe later I will make a patch (if such
   an implementation is acceptable, of course).
  
   import org.apache.pig.EvalFunc;
   import org.apache.pig.data.BagFactory;
   import org.apache.pig.data.DataBag;
   import org.apache.pig.data.Tuple;
   import org.apache.pig.data.TupleFactory;
  
   import java.io.IOException;
  
   /**
    * Convert any sequence of fields to a bag with the specified count of
    * fields<br>
    * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
    * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
    *
    * @author astepachev
    */
   public class ToBag extends EvalFunc<DataBag> {
       public BagFactory bagFactory;
       public TupleFactory tupleFactory;
  
       public ToBag() {
           bagFactory = BagFactory.getInstance();
           tupleFactory = TupleFactory.getInstance();
       }
  
       @Override
       public DataBag exec(Tuple input) throws IOException {
           if (input.isNull())
               return null;
           final DataBag bag = bagFactory.newDefaultBag();
           // Field 0 holds the number of fields to place in each tuple.
           final Integer couter = (Integer) input.get(0);
           if (couter == null)
               return null;
           Tuple tuple = tupleFactory.newTuple();
           // Walk the remaining fields, starting a new tuple every couter fields.
           for (int i = 0; i < input.size() - 1; i++) {
               if (i % couter == 0) {
                   tuple = tupleFactory.newTuple();
                   bag.add(tuple);
               }
               tuple.append(input.get(i + 1));
           }
           return bag;
       }
   }
  
   import org.apache.pig.ExecType;
   import org.apache.pig.PigServer;
   import org.junit.Before;
   import org.junit.Test;
  
   import java.io.IOException;
   import java.net.URISyntaxException;
   import java.net.URL;
  
   import static org.junit.Assert.assertTrue;
  
   /**
   * @author astepachev
   */
   public class ToBagTest {
PigServer pigServer;
URL inputTxt;
  
@Before
public void init() throws IOException, URISyntaxException {
pigServer = new 

[jira] Commented: (PIG-1414) Problem with parameter substitution

2010-05-13 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867251#action_12867251
 ] 

Olga Natkovich commented on PIG-1414:
-

+1

 Problem with parameter substitution
 ---

 Key: PIG-1414
 URL: https://issues.apache.org/jira/browse/PIG-1414
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
 Attachments: PIG-1414.patch


 The following script:
 {code}
 L = load 'input';
 store L into 'output' using MyClass$StorerAsInnerClass();
 {code}
 causes Pig to fail with this error message:
 {code}
 ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Undefined parameter : StorerAsInnerClass
 java.lang.RuntimeException: Undefined parameter : StorerAsInnerClass
 at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:232)
 at org.apache.pig.tools.parameters.PigFileParser.input(PigFileParser.java:60)
 at org.apache.pig.tools.parameters.PigFileParser.Parse(PigFileParser.java:42)
 at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:105)
 at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:98)
 at org.apache.pig.Main.runParamPreprocessor(Main.java:576)
 at org.apache.pig.Main.main(Main.java:418)
 {code} 
 even though no parameter substitution is specified from the command line. 
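
The failure is what one would expect from a preprocessor that treats every $name as a parameter reference. The sketch below reproduces that behaviour with a regular expression and then shows one possible way around it: leaving a '$' alone when it is glued to a preceding identifier character, as in an inner class name. Both the class and the skipping rule are assumptions made for illustration; this is not Pig's preprocessor and not necessarily what PIG-1414.patch does.

```java
import java.util.Collections;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ParamSubstitutionDemo {
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    /** Naive rule: every $name is a parameter, and an unknown name is an error. */
    static String substituteNaive(String line, Map<String, String> params) {
        return substitute(line, params, false);
    }

    /** Assumed alternative: "$Name" directly after an identifier character is left untouched. */
    static String substituteSkippingInnerClasses(String line, Map<String, String> params) {
        return substitute(line, params, true);
    }

    private static String substitute(String line, Map<String, String> params, boolean skipGluedDollar) {
        Matcher m = PARAM.matcher(line);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            boolean glued = m.start() > 0
                    && Character.isJavaIdentifierPart(line.charAt(m.start() - 1));
            String replacement;
            if (skipGluedDollar && glued) {
                replacement = m.group(); // keep "$Name" verbatim, it belongs to a class name
            } else {
                replacement = params.get(m.group(1));
                if (replacement == null) {
                    throw new RuntimeException("Undefined parameter : " + m.group(1));
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String script = "store L into 'output' using MyClass$StorerAsInnerClass();";
        System.out.println(substituteSkippingInnerClasses(script, Collections.<String, String>emptyMap()));
    }
}
```

A quoted reference such as '$input' is still substituted under the second rule, because the character before the '$' is a quote rather than part of an identifier.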




[jira] Resolved: (PIG-1414) Problem with parameter substitution

2010-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding resolved PIG-1414.
---

 Hadoop Flags: [Reviewed]
Fix Version/s: 0.8.0
   Resolution: Fixed

This patch fixed the failed unit tests due to parameter substitution.




[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1280:
--

Attachment: PIG-1280.patch

New patch adding a Pig property that allows users to turn off this feature.

 Add a pig-script-id to the JobConf of all jobs run in a pig-script
 --

 Key: PIG-1280
 URL: https://issues.apache.org/jira/browse/PIG-1280
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Arun C Murthy
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1280.patch, PIG-1280.patch


 It would be very useful for tools like gridmix if pig could add a 
 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. 
 Potentially we could use this to re-construct the DAG of jobs in gridmix and 
 so on.
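
As a rough picture of the request, one id is generated per script run and written into the configuration of every job that run spawns, with a switch to disable it as mentioned in the update above. In this sketch java.util.Properties stands in for JobConf, and the class name and both property keys are hypothetical; the real names are whatever the patch defines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.UUID;

class ScriptIdTagger {
    // Hypothetical property names, chosen for this sketch only.
    static final String SCRIPT_ID_KEY = "example.script.id";
    static final String ENABLED_KEY = "example.script.id.enabled";

    /**
     * Stamps one freshly generated id on every job configuration of a script run.
     * Returns the id, or null when the feature is switched off in the settings.
     */
    static String tagAll(List<Properties> jobConfs, Properties settings) {
        if (!Boolean.parseBoolean(settings.getProperty(ENABLED_KEY, "true"))) {
            return null; // feature disabled: leave the job configurations untouched
        }
        String id = UUID.randomUUID().toString();
        for (Properties conf : jobConfs) {
            conf.setProperty(SCRIPT_ID_KEY, id);
        }
        return id;
    }

    public static void main(String[] args) {
        List<Properties> jobs = new ArrayList<>();
        jobs.add(new Properties());
        jobs.add(new Properties());
        String id = tagAll(jobs, new Properties());
        System.out.println(id.equals(jobs.get(1).getProperty(SCRIPT_ID_KEY)));
    }
}
```

A tool reading the job configurations can then group jobs by this value to recover which ones came from the same script.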




[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1280:
--

Status: Patch Available  (was: Open)




[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-05-13 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1280:
--

Status: Open  (was: Patch Available)




[jira] Created: (PIG-1415) LoadFunc signature is not correct in LoadFunc.getSchema sometimes

2010-05-13 Thread Daniel Dai (JIRA)
LoadFunc signature is not correct in LoadFunc.getSchema sometimes
-

 Key: PIG-1415
 URL: https://issues.apache.org/jira/browse/PIG-1415
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0


The following script does not set the signature correctly when we call 
LoadFunc.getSchema.

a = load 'xxx' using TableLoader('xxx') as (a, b, c);

However, if we don't give a schema to a, we get the right signature:

a = load 'xxx' using TableLoader('xxx');

Diagnosis:
The parser generates the LoadClause before it reaches the production Alias = 
LoadClause, which is where the signature is actually set on the LOLoad. When 
we give a schema, the parser calls LOLoad.setSchema(), which internally calls 
LoadFunc.determineSchema. At that time, the signature has not been set yet. 

Solution:
We should not call LoadFunc.determineSchema inside LOLoad.setSchema().




[jira] Updated: (PIG-1415) LoadFunc signature is not correct in LoadFunc.getSchema sometimes

2010-05-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1415:


Description: 
The following script does not set the signature correctly when we call 
LoadFunc.getSchema.

a = load 'xxx' using TableLoader('xxx') as (a, b, c);

However, if we don't give a schema to a, we get the right signature:

a = load 'xxx' using TableLoader('xxx');

Diagnosis:
The parser generates the LoadClause before it reaches the production Alias = 
LoadClause, which is where the signature is actually set on the LOLoad. When 
we give a schema, the parser calls LOLoad.setSchema(), which internally calls 
LoadFunc.determineSchema. At that time, the signature has not been set yet. 

This relates to the change that caches determinedSchema in LOLoad 
[PIG-1317|https://issues.apache.org/jira/browse/PIG-1317]. Before that change, 
we would later call LoadFunc.getSchema() again with the right signature. Now 
that determinedSchema is cached, LoadFunc never gets a chance to see the right 
signature inside LoadFunc.getSchema().

Solution:
We should not call LoadFunc.determineSchema inside LOLoad.setSchema().
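
The ordering problem in the diagnosis can be replayed with small stand-in classes. Everything in the sketch below (StubLoader, LoadNode, the "a_sig" value) is hypothetical and only mimics the sequence described: setSchema runs first, the signature is set second, and a cached schema prevents a second lookup. It is not Pig's LOLoad or LoadFunc code.

```java
class SignatureOrderDemo {
    /** Hypothetical loader stub: remembers which signature was visible during schema lookup. */
    static class StubLoader {
        String signature;      // assigned by the "Alias = LoadClause" step
        String signatureSeen;  // what determineSchema observed when it ran

        String determineSchema() {
            signatureSeen = signature;
            return "(a, b, c)";
        }
    }

    /** Hypothetical load operator that caches the schema it determined. */
    static class LoadNode {
        private final StubLoader loader;
        private final boolean eager;
        private String cachedSchema;

        LoadNode(StubLoader loader, boolean eager) {
            this.loader = loader;
            this.eager = eager;
        }

        void setSchema(String userSchema) {
            // userSchema is ignored here; only the timing of the lookup matters.
            if (eager) {
                cachedSchema = loader.determineSchema(); // runs before the signature exists
            }
        }

        void setSignature(String signature) {
            loader.signature = signature;
        }

        String getSchema() {
            if (cachedSchema == null) {
                cachedSchema = loader.determineSchema();
            }
            return cachedSchema;
        }
    }

    /** Replays the parser order: schema first, signature second, then a schema read. */
    static String signatureSeenBy(boolean eager) {
        StubLoader loader = new StubLoader();
        LoadNode node = new LoadNode(loader, eager);
        node.setSchema("(a, b, c)");
        node.setSignature("a_sig");
        node.getSchema();
        return loader.signatureSeen;
    }

    public static void main(String[] args) {
        System.out.println("eager lookup saw:    " + signatureSeenBy(true));
        System.out.println("deferred lookup saw: " + signatureSeenBy(false));
    }
}
```

In the eager case the loader never observes the signature because the cached value short-circuits the later call, which is the effect the proposed solution avoids by not looking up the schema inside setSchema.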




[jira] Commented: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-05-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867281#action_12867281
 ] 

Daniel Dai commented on PIG-1280:
-

+1 for the new patch if tests pass.




[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db

2010-05-13 Thread Ashutosh Chauhan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ashutosh Chauhan updated PIG-1229:
--

Attachment: pig-1229.patch

Ankur,

Sorry for getting back late on this. I fiddled with your latest patch and was 
able to make some progress on it. I was able to get rid of those Path problems 
(it looks like Pig itself is not dealing with them correctly in one place). I 
think the patch I attached should work, but I cannot get the test case to 
pass because of an hsqldb problem that I have not been able to resolve. I keep 
getting this error from it:
{noformat}
Caused by: java.sql.SQLException: The database is already in use by another process: org.hsqldb.persist.niolockf...@4abea04e[file =/private/tmp/batchtest.lck, exists=true, locked=false, valid=false, fl =null]: java.lang.Exception: checkHeartbeat(): lock file [/private/tmp/batchtest.lck] is presumably locked by another process.
at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
at org.hsqldb.jdbc.jdbcConnection.<init>(Unknown Source)
at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
at org.hsqldb.jdbcDriver.connect(Unknown Source)
at java.sql.DriverManager.getConnection(DriverManager.java:582)
at java.sql.DriverManager.getConnection(DriverManager.java:185)
at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:274)

{noformat}
Anyways here are the changes I made:
1.
{code}
Index: src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
===
-conf.set("pig.streaming.log.dir", 
-new Path(outputPath, LOG_DIR).toString());
+//conf.set("pig.streaming.log.dir", 
+//new Path(outputPath, LOG_DIR).toString());
 conf.set("pig.streaming.task.output.dir", outputPath);
 }
{code}
This looks like a problem in Pig. Here Pig incorrectly assumes that it can 
put logs generated by the stream command in the output location, which is 
wrong if the output location is something like a DB. Since this needs changes 
in the main Pig code, I suggest opening a new jira and tracking it there.

2. Then in DBStorage.java
{code}
@Override
public void setStoreLocation(String location, Job job) throws IOException {
  job.getConfiguration().set("pig.db.conn.string", location);
}
@Override
public RecordWriter<NullWritable, NullWritable> getRecordWriter(
TaskAttemptContext context) throws IOException, InterruptedException {
  jdbcURL = context.getConfiguration().get("pig.db.conn.string");
  return null;
}
{code} 
We need to save the db connection string in the job in setStoreLocation() and 
then retrieve it on the backend in getRecordWriter(). 

3. In DBStorage.java
{code}
@Override
public void cleanupOnFailure(String location, Job job) throws 
IOException {
  log.error("Job has failed.");
}
{code}
You must override this function of StoreFunc, as the default 
implementation assumes a FileSystem as the output location. Currently, I left 
it as a no-op, but it can be improved to do rollbacks, release db connections, 
etc. 


 allow pig to write output into a JDBC db
 

 Key: PIG-1229
 URL: https://issues.apache.org/jira/browse/PIG-1229
 Project: Pig
  Issue Type: New Feature
  Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
 Fix For: 0.8.0

 Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.patch


 UDF to store data into a DB




[jira] Created: (PIG-1416) explain does not show the inner plans of the MapReduce plan

2010-05-13 Thread Thejas M Nair (JIRA)
explain does not show the inner plans of the MapReduce plan
---

 Key: PIG-1416
 URL: https://issues.apache.org/jira/browse/PIG-1416
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
 Fix For: 0.8.0


The inner plan in the MR plan is very useful for understanding query plans 
where the inner plans are not present in the Physical or Logical plans. For 
example, in the case of order by, the sampling MR job is not part of the 
physical plan, and the reduce has a POSort which is not shown in explain. 

In the following example, notice that POSort is not shown in the MR plan.

{code}

grunt> l = load 'file.txt' as (a, b, c);
grunt> g = group l by a;
grunt> f = foreach g { s = order l by $0; generate s; }
grunt> explain f;

grunt> explain f;
#---
# Logical Plan:
#---
Store 1-137 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: Unknown
|
|---ForEach 1-136 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: bag
|   |
|   Project 1-132 Projections:  [*]  Overloaded: false FieldSchema: s: bag({a: bytearray,b: bytearray,c: bytearray}) Type: bag
|   Input: SORT 1-133|
|   |---SORT 1-133 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
|   |   |
|   |   Project 1-134 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
|   |   Input: Project 1-135 Projections: [1] Overloaded: true
|   |
|   |---Project 1-135 Projections: [1] Overloaded: true FieldSchema: l: tuple({a: bytearray,b: bytearray,c: bytearray}) Type: tuple
|   Input: CoGroup 1-126
|
|---CoGroup 1-126 Schema: {group: bytearray,l: {a: bytearray,b: bytearray,c: bytearray}} Type: bag
|   |
|   Project 1-125 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
|   Input: Load 1-124
|
|---Load 1-124 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag

#---
# Physical Plan:
#---
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148
|
|---New For Each(false)[bag] - 1-147
|   |
|   RelationToExpressionProject[bag][*] - 1-146
|   |
|   |---POSort[bag]() - 1-145
|   |   |
|   |   Project[bytearray][0] - 1-144
|   |
|   |---Project[tuple][1] - 1-143
|
|---Package[tuple]{bytearray} - 1-140
|
|---Global Rearrange[tuple] - 1-139
|
|---Local Rearrange[tuple]{bytearray}(false) - 1-141
|   |
|   Project[bytearray][0] - 1-142
|

|---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138

2010-05-13 15:47:32,102 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
2010-05-13 15:47:32,102 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
#--
# Map Reduce Plan
#--
MapReduce node 1-149
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-141
|   |
|   Project[bytearray][0] - 1-142
|
|---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148
|
|---New For Each(false)[bag] - 1-147
|   |
|   RelationToExpressionProject[bag][*] - 1-146
|   |
|   |---Project[tuple][1] - 1-143
|
|---Package[tuple]{bytearray} - 1-140
Global sort: false


{code}




[jira] Created: (PIG-1417) Site changes for 0.7

2010-05-13 Thread Daniel Dai (JIRA)
Site changes for 0.7


 Key: PIG-1417
 URL: https://issues.apache.org/jira/browse/PIG-1417
 Project: Pig
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0







[jira] Commented: (PIG-1417) Site changes for 0.7

2010-05-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867323#action_12867323
 ] 

Daniel Dai commented on PIG-1417:
-

It's too big to attach in Jira. I put it in 
http://people.apache.org/~daijy/site.patch




[jira] Resolved: (PIG-1416) explain does not show the inner plans of the MapReduce plan

2010-05-13 Thread Thejas M Nair (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thejas M Nair resolved PIG-1416.


Resolution: Invalid

The secondary key optimizer removes the POSort, which is why it does not 
appear in the MR plan. Closing as invalid. 


 explain does not show the inner plans of the MapReduce plan
 ---

 Key: PIG-1416
 URL: https://issues.apache.org/jira/browse/PIG-1416
 Project: Pig
  Issue Type: Bug
  Components: impl
Reporter: Thejas M Nair
 Fix For: 0.8.0


 The inner plan in the MR plan is very useful for understanding query plans 
 where the inner plans are not present in the Physical or Logical plans. For 
 example, in the case of order by, the sampling MR job is not part of the 
 physical plan, and the reduce has a POSort which is not shown in explain. 
 In the following example, notice that POSort is not shown in the MR plan.
 {code}
 grunt> l = load 'file.txt' as (a, b, c);
 grunt> g = group l by a;
 grunt> f = foreach g { s = order l by $0; generate s; }
 grunt> explain f;
 #---
 # Logical Plan:
 #---
 Store 1-137 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: Unknown
 |
 |---ForEach 1-136 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: bag
 |   |
 |   Project 1-132 Projections:  [*]  Overloaded: false FieldSchema: s: bag({a: bytearray,b: bytearray,c: bytearray}) Type: bag
 |   Input: SORT 1-133|
 |   |---SORT 1-133 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
 |   |   |
 |   |   Project 1-134 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
 |   |   Input: Project 1-135 Projections: [1] Overloaded: true
 |   |
 |   |---Project 1-135 Projections: [1] Overloaded: true FieldSchema: l: tuple({a: bytearray,b: bytearray,c: bytearray}) Type: tuple
 |   Input: CoGroup 1-126
 |
 |---CoGroup 1-126 Schema: {group: bytearray,l: {a: bytearray,b: bytearray,c: bytearray}} Type: bag
 |   |
 |   Project 1-125 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray
 |   Input: Load 1-124
 |
 |---Load 1-124 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag
 #---
 # Physical Plan:
 #---
 Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148
 |
 |---New For Each(false)[bag] - 1-147
 |   |
 |   RelationToExpressionProject[bag][*] - 1-146
 |   |
 |   |---POSort[bag]() - 1-145
 |   |   |
 |   |   Project[bytearray][0] - 1-144
 |   |
 |   |---Project[tuple][1] - 1-143
 |
 |---Package[tuple]{bytearray} - 1-140
 |
 |---Global Rearrange[tuple] - 1-139
 |
 |---Local Rearrange[tuple]{bytearray}(false) - 1-141
 |   |
 |   Project[bytearray][0] - 1-142
 |
 |---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138
 2010-05-13 15:47:32,102 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1
 2010-05-13 15:47:32,102 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1
 #--
 # Map Reduce Plan
 #--
 MapReduce node 1-149
 Map Plan
 Local Rearrange[tuple]{bytearray}(false) - 1-141
 |   |
 |   Project[bytearray][0] - 1-142
 |
 |---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138
 Reduce Plan
 Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148
 |
 |---New For Each(false)[bag] - 1-147
 |   |
 |   RelationToExpressionProject[bag][*] - 1-146
 |   |
 |   |---Project[tuple][1] - 1-143
 |
 |---Package[tuple]{bytearray} - 1-140
 Global sort: false
 
 {code}




[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867335#action_12867335
 ] 

Daniel Dai commented on PIG-1381:
-

Reattached the patch to address Ashutosh's comment.

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads only the first pig.properties it finds on the CLASSPATH.
 Pig ships a default pig.properties, so if a user has a different
 pig.properties, there is a conflict because only one can be read. There are
 a couple of ways to solve this:
 1. Provide a command-line option that lets the user pass an additional
 property file.
 2. Rename the default pig.properties to pig-default.properties, so that the
 user can supply a pig.properties to override it.
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more
 natural for the Hadoop community. If so, we should keep backward
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.
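
 Option 2 in the list above amounts to layering two property sources: the
 shipped defaults are read first, and any user-supplied entries override them.
 The following is a minimal Java sketch of that layering with
 java.util.Properties. The class name, the method name, and the sample keys are
 hypothetical, and this is not Pig's actual loading code; it only illustrates
 the override order.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of option 2; not Pig's actual implementation.
class LayeredProperties {

    // Reads the defaults first, then lets user-supplied entries override them.
    // Both arguments are the text of a .properties file; userText may be null.
    static Properties load(String defaultsText, String userText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Properties(defaults) falls back to 'defaults' for any key
            // that is not set in the user-level object.
            Properties merged = new Properties(defaults);
            if (userText != null) {
                merged.load(new StringReader(userText));
            }
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-ins for the contents of pig-default.properties and
        // pig.properties; the keys are made up for this example.
        String defaultsText = "example.log.level=INFO\nexample.temp.dir=/tmp\n";
        String userText = "example.log.level=DEBUG\n";

        Properties props = load(defaultsText, userText);
        System.out.println(props.getProperty("example.log.level")); // DEBUG
        System.out.println(props.getProperty("example.temp.dir"));  // /tmp
    }
}
```

 A key present in both sources resolves to the user value, while a key present
 only in the defaults is still visible through getProperty.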




[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1381:


Attachment: PIG-1381-5.patch

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads only the first pig.properties it finds on the CLASSPATH.
 Pig ships a default pig.properties, so if a user has a different
 pig.properties, there is a conflict because only one can be read. There are
 a couple of ways to solve this:
 1. Provide a command-line option that lets the user pass an additional
 property file.
 2. Rename the default pig.properties to pig-default.properties, so that the
 user can supply a pig.properties to override it.
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more
 natural for the Hadoop community. If so, we should keep backward
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.




[jira] Updated: (PIG-1415) LoadFunc signature is not correct in LoadFunc.getSchema sometimes

2010-05-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1415:


Attachment: PIG-1415-1.patch

 LoadFunc signature is not correct in LoadFunc.getSchema sometimes
 -

 Key: PIG-1415
 URL: https://issues.apache.org/jira/browse/PIG-1415
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.8.0

 Attachments: PIG-1415-1.patch


 With the following script, the signature is not set correctly when
 LoadFunc.getSchema is called:
 a = load 'xxx' using TableLoader('xxx') as (a, b, c);
 However, if we do not give a schema to a, we get the right signature:
 a = load 'xxx' using TableLoader('xxx');
 Diagnosis:
 The parser generates LoadClause before it reaches the rule Alias =
 LoadClause, which is where the signature is actually set on the LOLoad. When
 we give a schema, the parser calls LOLoad.setSchema(), which internally calls
 LoadFunc.determineSchema. At that time, the signature has not been set yet.
 This relates to the change that caches determinedSchema in LOLoad
 [PIG-1317|https://issues.apache.org/jira/browse/PIG-1317]. Before that
 change, we would later call LoadFunc.getSchema() again with the right
 signature. Now that determinedSchema is cached, LoadFunc never gets a chance
 to see the right signature inside LoadFunc.getSchema().
 Solution:
 We should not call LoadFunc.determineSchema inside LOLoad.setSchema().
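
 The ordering problem in the diagnosis above can be reproduced in miniature.
 The sketch below uses hypothetical stand-in classes (DemoLoader and DemoLoad
 are not Pig's LoadFunc or LOLoad, and "a_1" is a made-up signature value). It
 assumes only what the description states: the schema is computed once and
 cached, and the signature is assigned after the load clause has been parsed.

```java
// Hypothetical stand-ins for the loader and the load operator;
// the names and methods here are not Pig's API.
class SignatureOrderingDemo {

    // A loader whose schema lookup depends on a signature set by the front end.
    static class DemoLoader {
        private String signature; // stays null until the front end assigns it

        void setSignature(String signature) {
            this.signature = signature;
        }

        String getSchema() {
            return "schema-for-" + signature;
        }
    }

    // A load operator that caches the loader-determined schema on first use.
    static class DemoLoad {
        private final DemoLoader loader;
        private String cachedSchema;

        DemoLoad(DemoLoader loader) {
            this.loader = loader;
        }

        String determinedSchema() {
            if (cachedSchema == null) {
                cachedSchema = loader.getSchema();
            }
            return cachedSchema;
        }

        // Problematic variant: asks the loader for its schema while the
        // user-declared schema is being recorded, before the signature exists.
        void setSchemaEagerly(String userSchema) {
            determinedSchema();
        }

        // Variant matching the proposed solution: does not touch the loader.
        void setSchemaLazily(String userSchema) {
            // intentionally leaves the cache empty
        }
    }

    // Replays the parse order: schema handling first, signature afterwards.
    static String run(boolean eager) {
        DemoLoader loader = new DemoLoader();
        DemoLoad load = new DemoLoad(loader);
        if (eager) {
            load.setSchemaEagerly("(a, b, c)");
        } else {
            load.setSchemaLazily("(a, b, c)");
        }
        loader.setSignature("a_1"); // assigned only after the load clause
        return load.determinedSchema();
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // schema-for-null
        System.out.println(run(false)); // schema-for-a_1
    }
}
```

 In the eager variant the cache is filled before the signature is assigned, so
 later calls keep returning the value computed without it.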




[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1381:


Attachment: (was: PIG-1381-5.patch)

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads only the first pig.properties it finds on the CLASSPATH.
 Pig ships a default pig.properties, so if a user has a different
 pig.properties, there is a conflict because only one can be read. There are
 a couple of ways to solve this:
 1. Provide a command-line option that lets the user pass an additional
 property file.
 2. Rename the default pig.properties to pig-default.properties, so that the
 user can supply a pig.properties to override it.
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more
 natural for the Hadoop community. If so, we should keep backward
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.




[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1381:


Attachment: PIG-1381-5.patch

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads only the first pig.properties it finds on the CLASSPATH.
 Pig ships a default pig.properties, so if a user has a different
 pig.properties, there is a conflict because only one can be read. There are
 a couple of ways to solve this:
 1. Provide a command-line option that lets the user pass an additional
 property file.
 2. Rename the default pig.properties to pig-default.properties, so that the
 user can supply a pig.properties to override it.
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more
 natural for the Hadoop community. If so, we should keep backward
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.




[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867352#action_12867352
 ] 

Daniel Dai commented on PIG-1381:
-

Option 2 committed to both trunk and branch 0.7.

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads only the first pig.properties it finds on the CLASSPATH.
 Pig ships a default pig.properties, so if a user has a different
 pig.properties, there is a conflict because only one can be read. There are
 a couple of ways to solve this:
 1. Provide a command-line option that lets the user pass an additional
 property file.
 2. Rename the default pig.properties to pig-default.properties, so that the
 user can supply a pig.properties to override it.
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more
 natural for the Hadoop community. If so, we should keep backward
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.




[jira] Commented: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script

2010-05-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867371#action_12867371
 ] 

Daniel Dai commented on PIG-1280:
-

With PIG-1381 checked in, the config entry needs to be added to 
pig-default.properties instead of pig.properties. Note this change when 
committing.

 Add a pig-script-id to the JobConf of all jobs run in a pig-script
 --

 Key: PIG-1280
 URL: https://issues.apache.org/jira/browse/PIG-1280
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Arun C Murthy
Assignee: Richard Ding
 Fix For: 0.8.0

 Attachments: PIG-1280.patch, PIG-1280.patch


 It would be very useful for tools like gridmix if pig could add a 
 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. 
 Potentially we could use this to re-construct the DAG of jobs in gridmix and 
 so on.




[jira] Commented: (PIG-566) Dump and store outputs do not match for PigStorage

2010-05-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867373#action_12867373
 ] 

Daniel Dai commented on PIG-566:


+1 once the Hudson tests pass.

 Dump and store outputs do not match for PigStorage
 --

 Key: PIG-566
 URL: https://issues.apache.org/jira/browse/PIG-566
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Santhosh Srinivasan
Assignee: Gianmarco De Francisci Morales
Priority: Minor
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-566.patch, PIG-566.patch, PIG-566.patch, 
 PIG-566.patch, PIG-566.patch


 The dump and store formats for PigStorage do not match for longs and floats.
 {code}
 grunt> y = foreach x generate {(2985671202194220139L)};
 grunt> describe y;
 y: {{(long)}}
 grunt> dump y;
 ({(2985671202194220139L)})
 grunt> store y into 'y';
 grunt> cat y
 {(2985671202194220139)}
 {code}




[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread V.V.Chaitanya Krishna (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867376#action_12867376
 ] 

V.V.Chaitanya Krishna commented on PIG-1381:


Apologies for coming in so late. I was on vacation and didn't have access to 
the internet.
I have done the coding part for option 1. I still need to write unit test 
cases for it. I should definitely be able to submit the patch by Saturday.

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads only the first pig.properties it finds on the CLASSPATH.
 Pig ships a default pig.properties, so if a user has a different
 pig.properties, there is a conflict because only one can be read. There are
 a couple of ways to solve this:
 1. Provide a command-line option that lets the user pass an additional
 property file.
 2. Rename the default pig.properties to pig-default.properties, so that the
 user can supply a pig.properties to override it.
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more
 natural for the Hadoop community. If so, we should keep backward
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.




[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file

2010-05-13 Thread Daniel Dai (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867378#action_12867378
 ] 

Daniel Dai commented on PIG-1381:
-

Great. Thanks V.V.Chaitanya!

 Need a way for Pig to take an alternative property file
 ---

 Key: PIG-1381
 URL: https://issues.apache.org/jira/browse/PIG-1381
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
 Fix For: 0.7.0, 0.8.0

 Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, 
 PIG-1381-4.patch, PIG-1381-5.patch


 Currently, Pig reads only the first pig.properties it finds on the CLASSPATH.
 Pig ships a default pig.properties, so if a user has a different
 pig.properties, there is a conflict because only one can be read. There are
 a couple of ways to solve this:
 1. Provide a command-line option that lets the user pass an additional
 property file.
 2. Rename the default pig.properties to pig-default.properties, so that the
 user can supply a pig.properties to override it.
 3. Further, consider using pig-default.xml/pig-site.xml, which seems more
 natural for the Hadoop community. If so, we should keep backward
 compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.
