[jira] Updated: (PIG-1408) Annotate explain plans with aliases
[ https://issues.apache.org/jira/browse/PIG-1408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-1408:
------------------------------
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed

Annotate explain plans with aliases
-----------------------------------
Key: PIG-1408
URL: https://issues.apache.org/jira/browse/PIG-1408
Project: Pig
Issue Type: Improvement
Affects Versions: 0.7.0
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.8.0
Attachments: PIG-1408.patch

PIG-1156 added aliases in Pig scripts to the corresponding LogicalOperators and PhysicalOperators. The aliases in the operators, however, are not displayed in the output created by the explain command. Since a Pig script can generate many MR jobs, it will be helpful, for debugging purposes, to annotate the explain output plans with aliases, so that users can correlate the jobs with the statements in their scripts.

Here is an example: given the following script

{code}
A = load 'input';
B = group A by $0;
C = foreach B generate group, flatten(A);
explain C
{code}

The output without alias annotation is

{code}
MapReduce node 1-28
Map Plan
Local Rearrange[tuple]{bytearray}(false) - 1-22
|   |
|   Project[bytearray][0] - 1-23
|
|---Load(file:///test/input:org.apache.pig.builtin.PigStorage) - 1-19
Reduce Plan
Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-27
|
|---New For Each(false,true)[bag] - 1-26
    |   |
    |   Project[bytearray][0] - 1-24
    |   |
    |   Project[bag][1] - 1-25
    |
    |---Package[tuple]{bytearray} - 1-21
Global sort: false
{code}

While the output with alias annotation will be

{code}
MapReduce node 1-28
Map Plan
B: Local Rearrange[tuple]{bytearray}(false) - 1-22
|   |
|   Project[bytearray][0] - 1-23
|
|---A: Load(file:///test/input:org.apache.pig.builtin.PigStorage) - 1-19
Reduce Plan
C: Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-27
|
|---C: New For Each(false,true)[bag] - 1-26
    |   |
    |   Project[bytearray][0] - 1-24
    |   |
    |   Project[bag][1] - 1-25
    |
    |---B: Package[tuple]{bytearray} - 1-21
Global sort: false
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
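The annotation in the PIG-1408 example amounts to prefixing an operator's display name with its script alias whenever an alias is known. A minimal sketch of that idea in Java follows; `AliasAnnotator` and `annotate` are hypothetical names used only for illustration, not part of Pig, and the actual patch may be structured quite differently.

```java
// Hypothetical helper, not part of Pig: shows only the "alias: operator" prefixing.
final class AliasAnnotator {
    private AliasAnnotator() {}

    /** Returns "alias: name" when an alias is known, otherwise the bare operator name. */
    static String annotate(String alias, String operatorName) {
        if (alias == null || alias.isEmpty()) {
            return operatorName; // operators without an alias are printed unchanged
        }
        return alias + ": " + operatorName;
    }

    public static void main(String[] args) {
        System.out.println(annotate("B", "Local Rearrange[tuple]{bytearray}(false) - 1-22"));
        System.out.println(annotate(null, "Project[bytearray][0] - 1-23"));
    }
}
```

Operators that carry no alias, such as the inner Project nodes in the example, are left as they are.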
[jira] Updated: (PIG-566) Dump and store outputs do not match for PigStorage
[ https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-566:
-----------------------------------------------
Attachment: PIG-566.patch

Addressed the issues highlighted in Daniel's comment.

Dump and store outputs do not match for PigStorage
--------------------------------------------------
Key: PIG-566
URL: https://issues.apache.org/jira/browse/PIG-566
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Santhosh Srinivasan
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.7.0, 0.8.0
Attachments: PIG-566.patch, PIG-566.patch, PIG-566.patch, PIG-566.patch, PIG-566.patch

The dump and store formats for PigStorage do not match for longs and floats.

{code}
grunt> y = foreach x generate {(2985671202194220139L)};
grunt> describe y;
y: {{(long)}}
grunt> dump y;
({(2985671202194220139L)})
grunt> store y into 'y';
grunt> cat y
{(2985671202194220139)}
{code}
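The mismatch reported in PIG-566 is that dump prints a type suffix for longs while store does not. One way to keep two output paths consistent is to route both through a single formatting routine. The sketch below assumes a hypothetical `FieldFormatter` class (not part of Pig) and does not claim which form the patch actually chose.

```java
// Hypothetical illustration, not Pig code: one formatter shared by every output path.
final class FieldFormatter {
    private FieldFormatter() {}

    /**
     * Formats a scalar field. When withTypeSuffix is true, longs get "L" and
     * floats get "F"; when false, the bare number is written.
     */
    static String format(Object value, boolean withTypeSuffix) {
        if (value instanceof Long) {
            return value + (withTypeSuffix ? "L" : "");
        }
        if (value instanceof Float) {
            return value + (withTypeSuffix ? "F" : "");
        }
        return String.valueOf(value);
    }
}
```

If dump and store both call `format` with the same flag, their outputs cannot drift apart for these types.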
[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867220#action_12867220 ]

Ashutosh Chauhan commented on PIG-1381:
---------------------------------------
+1 on the changes. For completeness, we can also check in an empty pig.properties in the conf dir and add comments in both pig.properties and pig-default.properties explaining that passing properties through pig-default.properties has no effect; users should instead put any properties they want to add or override in pig.properties.

Need a way for Pig to take an alternative property file
-------------------------------------------------------
Key: PIG-1381
URL: https://issues.apache.org/jira/browse/PIG-1381
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
Fix For: 0.7.0, 0.8.0
Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, PIG-1381-4.patch

Currently, Pig reads the first pig.properties it finds on the CLASSPATH. Pig has a default pig.properties, and if the user has a different pig.properties there will be a conflict, since only one can be read. There are a couple of ways to solve it:

1. Give a command line option for the user to pass an additional property file.
2. Rename the default pig.properties to pig-default.properties, so that the user can supply a pig.properties to override it.
3. Further, we could consider using pig-default.xml/pig-site.xml, which seems more natural for the Hadoop community. If so, we should provide backward compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.
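Option 2 above (defaults in pig-default.properties, user overrides in pig.properties) can be sketched with the standard `java.util.Properties` defaults mechanism. This is only an illustration under stated assumptions: `LayeredProperties` is a hypothetical class, and the two files are passed in as strings so the example is self-contained, whereas real code would read them from the classpath.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical sketch, not Pig code: user properties layered over shipped defaults.
final class LayeredProperties {
    private LayeredProperties() {}

    /**
     * defaultsText stands in for the contents of pig-default.properties and
     * overridesText for the user's pig.properties (may be null if absent).
     */
    static Properties load(String defaultsText, String overridesText) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(defaultsText));

            // Lookups that miss in 'merged' fall back to 'defaults'.
            Properties merged = new Properties(defaults);
            if (overridesText != null) {
                merged.load(new StringReader(overridesText));
            }
            return merged;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With this layering, a key set in the user file wins, and a key present only in the defaults file is still visible through `getProperty`.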
[jira] Updated: (PIG-566) Dump and store outputs do not match for PigStorage
[ https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gianmarco De Francisci Morales updated PIG-566:
-----------------------------------------------
Status: Patch Available (was: In Progress)

Dump and store outputs do not match for PigStorage
--------------------------------------------------
Key: PIG-566
URL: https://issues.apache.org/jira/browse/PIG-566
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Santhosh Srinivasan
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.7.0, 0.8.0
Attachments: PIG-566.patch, PIG-566.patch, PIG-566.patch, PIG-566.patch, PIG-566.patch

The dump and store formats for PigStorage do not match for longs and floats.

{code}
grunt> y = foreach x generate {(2985671202194220139L)};
grunt> describe y;
y: {{(long)}}
grunt> dump y;
({(2985671202194220139L)})
grunt> store y into 'y';
grunt> cat y
{(2985671202194220139)}
{code}
[jira] Commented: (PIG-1385) UDF to create tuples and bags
[ https://issues.apache.org/jira/browse/PIG-1385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867244#action_12867244 ]

Daniel Lescohier commented on PIG-1385:
---------------------------------------
The Test file in PIG-1385-trunk.patch has a typo: 'org.paache' instead of 'org.apache'.

+++ contrib/piggybank/java/src/test/java/org/apache/pig/piggybank/util/TestToBagToTuple.java (revision 0)
@@ -0,0 +1,51 @@
+package org.paache.pig.piggybank.util;

UDF to create tuples and bags
-----------------------------
Key: PIG-1385
URL: https://issues.apache.org/jira/browse/PIG-1385
Project: Pig
Issue Type: New Feature
Components: tools
Affects Versions: 0.6.0
Reporter: hc busy
Assignee: hc busy
Fix For: 0.8.0
Attachments: PIG-1385-trunk.patch
Original Estimate: 24h
Remaining Estimate: 24h

Based on this conversation:

On Tue, Apr 20, 2010 at 6:34 PM, hc busy hc.b...@gmail.com wrote:

What about making them part of the language using symbols? instead of

foreach T generate Tuple($0, $1, $2), Bag($3, $4, $5), $6, $7;

have language support

foreach T generate ($0, $1, $2), {$3, $4, $5}, $6, $7;

or even:

foreach T generate ($0, $1, $2), {$3, $4, $5}, [$6#$7, $8#$9], $10, $11;

Is there reason not to do the second or third other than being more complicated? Certainly I'd volunteer to put the top implementation in to the util package and submit them for builtin's, but the latter syntactic candies seems more natural..

On Tue, Apr 20, 2010 at 5:24 PM, Alan Gates ga...@yahoo-inc.com wrote:

The grouping package in piggybank is left over from back when Pig allowed users to define grouping functions (0.1). Functions like these should go in evaluation.util. However, I'd consider putting these in builtin (in main Pig) instead. These are things everyone asks for and they seem like a reasonable addition to the core engine. This will be more of a burden to write (as we'll hold them to a higher standard) but of more use to people as well.

Alan.

On Apr 19, 2010, at 12:53 PM, hc busy wrote:

Some times I wonder...
I mean, somebody went to the trouble of making a path called org.apache.pig.piggybank.grouping (where it seems like this code belong), but didn't check in any java code into that package. Any comment about where to put this kind of utility classes?

On Mon, Apr 19, 2010 at 12:07 PM, Andrey S oct...@gmail.com wrote:

2010/4/19 hc busy hc.b...@gmail.com

That's just the way it is right now, you can't make bags or tuples directly... Maybe we should have some UDF's in piggybank for these:

toBag()
toTuple(); --which is kinda like exec(Tuple in){return in;}
TupleToBag(); --some times you need it this way for some reason.

Ok. I place my current code here, may be later I make a patch (if such implementation is acceptable of course).

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;
import org.apache.pig.data.TupleFactory;

import java.io.IOException;

/**
 * Convert any sequence of fields to bag with specified count of fields<br>
 * Schema: count:int, fld1 [, fld2, fld3, fld4... ].
 * Output: count=2, then { (fld1, fld2) , (fld3, fld4) ... }
 *
 * @author astepachev
 */
public class ToBag extends EvalFunc<DataBag> {
    public BagFactory bagFactory;
    public TupleFactory tupleFactory;

    public ToBag() {
        bagFactory = BagFactory.getInstance();
        tupleFactory = TupleFactory.getInstance();
    }

    @Override
    public DataBag exec(Tuple input) throws IOException {
        if (input.isNull())
            return null;
        final DataBag bag = bagFactory.newDefaultBag();
        final Integer couter = (Integer) input.get(0);
        if (couter == null)
            return null;
        Tuple tuple = tupleFactory.newTuple();
        for (int i = 0; i < input.size() - 1; i++) {
            if (i % couter == 0) {
                tuple = tupleFactory.newTuple();
                bag.add(tuple);
            }
            tuple.append(input.get(i + 1));
        }
        return bag;
    }
}

import org.apache.pig.ExecType;
import org.apache.pig.PigServer;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.URISyntaxException;
import java.net.URL;

import static org.junit.Assert.assertTrue;

/**
 * @author astepachev
 */
public class ToBagTest {
    PigServer pigServer;
    URL inputTxt;

    @Before
    public void init() throws IOException, URISyntaxException {
        pigServer = new
[jira] Commented: (PIG-1414) Problem with parameter substitution
[ https://issues.apache.org/jira/browse/PIG-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867251#action_12867251 ]

Olga Natkovich commented on PIG-1414:
-------------------------------------
+1

Problem with parameter substitution
-----------------------------------
Key: PIG-1414
URL: https://issues.apache.org/jira/browse/PIG-1414
Project: Pig
Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
Attachments: PIG-1414.patch

The following script:

{code}
L = load 'input';
store L into 'output' using MyClass$StorerAsInnerClass();
{code}

causes Pig to fail with this error message:

{code}
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Undefined parameter : StorerAsInnerClass
java.lang.RuntimeException: Undefined parameter : StorerAsInnerClass
    at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:232)
    at org.apache.pig.tools.parameters.PigFileParser.input(PigFileParser.java:60)
    at org.apache.pig.tools.parameters.PigFileParser.Parse(PigFileParser.java:42)
    at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:105)
    at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:98)
    at org.apache.pig.Main.runParamPreprocessor(Main.java:576)
    at org.apache.pig.Main.main(Main.java:418)
{code}

even though no parameter substitution is specified from the command line.
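The failure above arises because the `$StorerAsInnerClass` part of an inner-class name looks like a `$name` parameter reference. The sketch below shows one possible guard, which is an assumption for illustration and not necessarily what PIG-1414.patch does: skip substitution entirely when no parameters were supplied. `ParamSubstitution` is a hypothetical class, not Pig's preprocessor.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Pig's preprocessor.
final class ParamSubstitution {
    // $name where name starts with a letter or underscore, so $0, $1, ... are never matched.
    private static final Pattern PARAM = Pattern.compile("\\$([A-Za-z_][A-Za-z0-9_]*)");

    private ParamSubstitution() {}

    static String substitute(String script, Map<String, String> params) {
        if (params == null || params.isEmpty()) {
            return script; // no parameters given: leave the script untouched
        }
        Matcher m = PARAM.matcher(script);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = params.get(m.group(1));
            if (value == null) {
                throw new RuntimeException("Undefined parameter : " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value from being reinterpreted
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Note that this guard only covers the case described in the report; a script that both passes parameters and references an inner class would still need some form of escaping.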
[jira] Resolved: (PIG-1414) Problem with parameter substitution
[ https://issues.apache.org/jira/browse/PIG-1414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding resolved PIG-1414.
-------------------------------
Hadoop Flags: [Reviewed]
Fix Version/s: 0.8.0
Resolution: Fixed

This patch fixed the failed unit tests due to parameter substitution.

Problem with parameter substitution
-----------------------------------
Key: PIG-1414
URL: https://issues.apache.org/jira/browse/PIG-1414
Project: Pig
Issue Type: Bug
Reporter: Richard Ding
Assignee: Richard Ding
Fix For: 0.8.0
Attachments: PIG-1414.patch

The following script:

{code}
L = load 'input';
store L into 'output' using MyClass$StorerAsInnerClass();
{code}

causes Pig to fail with this error message:

{code}
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Undefined parameter : StorerAsInnerClass
java.lang.RuntimeException: Undefined parameter : StorerAsInnerClass
    at org.apache.pig.tools.parameters.PreprocessorContext.substitute(PreprocessorContext.java:232)
    at org.apache.pig.tools.parameters.PigFileParser.input(PigFileParser.java:60)
    at org.apache.pig.tools.parameters.PigFileParser.Parse(PigFileParser.java:42)
    at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.parsePigFile(ParameterSubstitutionPreprocessor.java:105)
    at org.apache.pig.tools.parameters.ParameterSubstitutionPreprocessor.genSubstitutedFile(ParameterSubstitutionPreprocessor.java:98)
    at org.apache.pig.Main.runParamPreprocessor(Main.java:576)
    at org.apache.pig.Main.main(Main.java:418)
{code}

even though no parameter substitution is specified from the command line.
[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1280: -- Attachment: PIG-1280.patch New patch adding a Pig property that allows user to turn off this feature. Add a pig-script-id to the JobConf of all jobs run in a pig-script -- Key: PIG-1280 URL: https://issues.apache.org/jira/browse/PIG-1280 Project: Pig Issue Type: Improvement Components: impl Reporter: Arun C Murthy Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1280.patch, PIG-1280.patch It would be very useful for tools like gridmix if pig could add a 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. Potentially we could use this to re-construct the DAG of jobs in gridmix and so on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
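The request in PIG-1280, together with the property mentioned in the update for turning the feature off, can be illustrated roughly as follows. This is a sketch under stated assumptions: `ScriptIdTagger` and both property keys are hypothetical names, and plain `java.util.Properties` objects stand in for Hadoop's JobConf so the example stays self-contained.

```java
import java.util.Properties;
import java.util.UUID;

// Hypothetical sketch, not Pig code: one id per script run, stamped on every job.
final class ScriptIdTagger {
    static final String SCRIPT_ID_KEY = "pig.script.id";         // assumed key name
    static final String ENABLE_KEY = "pig.script.id.enabled";    // assumed key name

    // Generated once per script, so all jobs of that script share it.
    private final String scriptId = UUID.randomUUID().toString();

    /** Stamps the script id on a job's configuration unless the feature is disabled. */
    void tag(Properties jobConf, Properties pigProperties) {
        boolean enabled = Boolean.parseBoolean(pigProperties.getProperty(ENABLE_KEY, "true"));
        if (enabled) {
            jobConf.setProperty(SCRIPT_ID_KEY, scriptId);
        }
    }
}
```

A tool reading job configurations could then group jobs by the shared id to recover which jobs came from the same script.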
[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1280: -- Status: Patch Available (was: Open) Add a pig-script-id to the JobConf of all jobs run in a pig-script -- Key: PIG-1280 URL: https://issues.apache.org/jira/browse/PIG-1280 Project: Pig Issue Type: Improvement Components: impl Reporter: Arun C Murthy Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1280.patch, PIG-1280.patch It would be very useful for tools like gridmix if pig could add a 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. Potentially we could use this to re-construct the DAG of jobs in gridmix and so on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1280: -- Status: Open (was: Patch Available) Add a pig-script-id to the JobConf of all jobs run in a pig-script -- Key: PIG-1280 URL: https://issues.apache.org/jira/browse/PIG-1280 Project: Pig Issue Type: Improvement Components: impl Reporter: Arun C Murthy Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1280.patch, PIG-1280.patch It would be very useful for tools like gridmix if pig could add a 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. Potentially we could use this to re-construct the DAG of jobs in gridmix and so on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1415) LoadFunc signature is not correct in LoadFunc.getSchema sometimes
LoadFunc signature is not correct in LoadFunc.getSchema sometimes
-----------------------------------------------------------------
Key: PIG-1415
URL: https://issues.apache.org/jira/browse/PIG-1415
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.8.0

The following script does not set the signature correctly when we call LoadFunc.getSchema:

a = load 'xxx' using TableLoader('xxx') as (a, b, c);

However, if we don't give a schema to a, we get the right signature:

a = load 'xxx' using TableLoader('xxx');

Diagnosis: The parser generates LoadClause before it reaches the production Alias = LoadClause, which is where the signature is actually set on the LOLoad. When we give a schema, the parser calls LOLoad.setSchema(), which internally calls LoadFunc.determineSchema. At that time, the signature has not been set yet.

Solution: We should not call LoadFunc.determineSchema inside LOLoad.setSchema().
[jira] Updated: (PIG-1415) LoadFunc signature is not correct in LoadFunc.getSchema sometimes
[ https://issues.apache.org/jira/browse/PIG-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1415:
----------------------------
Description:

The following script does not set the signature correctly when we call LoadFunc.getSchema:

a = load 'xxx' using TableLoader('xxx') as (a, b, c);

However, if we don't give a schema to a, we get the right signature:

a = load 'xxx' using TableLoader('xxx');

Diagnosis: The parser generates LoadClause before it reaches the production Alias = LoadClause, which is where the signature is actually set on the LOLoad. When we give a schema, the parser calls LOLoad.setSchema(), which internally calls LoadFunc.determineSchema. At that time, the signature has not been set yet.

This relates to the change that caches determinedSchema in LOLoad [PIG-1317|https://issues.apache.org/jira/browse/PIG-1317]. Before that change, we would later call LoadFunc.getSchema() again using the right signature. Now that determinedSchema is cached, LoadFunc does not get a chance to see the right signature inside LoadFunc.getSchema().

Solution: We should not call LoadFunc.determineSchema inside LOLoad.setSchema().

was:

The following script does not set the signature correctly when we call LoadFunc.getSchema:

a = load 'xxx' using TableLoader('xxx') as (a, b, c);

However, if we don't give a schema to a, we get the right signature:

a = load 'xxx' using TableLoader('xxx');

Diagnosis: The parser generates LoadClause before it reaches the production Alias = LoadClause, which is where the signature is actually set on the LOLoad. When we give a schema, the parser calls LOLoad.setSchema(), which internally calls LoadFunc.determineSchema. At that time, the signature has not been set yet.

Solution: We should not call LoadFunc.determineSchema inside LOLoad.setSchema().

LoadFunc signature is not correct in LoadFunc.getSchema sometimes
-----------------------------------------------------------------
Key: PIG-1415
URL: https://issues.apache.org/jira/browse/PIG-1415
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.8.0

The following script does not set the signature correctly when we call LoadFunc.getSchema:

a = load 'xxx' using TableLoader('xxx') as (a, b, c);

However, if we don't give a schema to a, we get the right signature:

a = load 'xxx' using TableLoader('xxx');

Diagnosis: The parser generates LoadClause before it reaches the production Alias = LoadClause, which is where the signature is actually set on the LOLoad. When we give a schema, the parser calls LOLoad.setSchema(), which internally calls LoadFunc.determineSchema. At that time, the signature has not been set yet.

This relates to the change that caches determinedSchema in LOLoad [PIG-1317|https://issues.apache.org/jira/browse/PIG-1317]. Before that change, we would later call LoadFunc.getSchema() again using the right signature. Now that determinedSchema is cached, LoadFunc does not get a chance to see the right signature inside LoadFunc.getSchema().

Solution: We should not call LoadFunc.determineSchema inside LOLoad.setSchema().
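The ordering problem in the PIG-1415 diagnosis (a schema being determined and cached before the signature is set) can be illustrated with a small stand-alone model. The class below is hypothetical and does not use Pig's LOLoad or LoadFunc; it only shows the general pattern of deferring a cached lookup until the value it depends on is available.

```java
import java.util.function.Function;

// Hypothetical model, not Pig code: the schema lookup is deferred until first use,
// so it can never run (and be cached) before the signature has been set.
final class DeferredSchema {
    private final Function<String, String> schemaLookup; // signature -> schema
    private String signature;
    private String cachedSchema;

    DeferredSchema(Function<String, String> schemaLookup) {
        this.schemaLookup = schemaLookup;
    }

    void setSignature(String signature) {
        this.signature = signature;
    }

    String getSchema() {
        if (cachedSchema == null) {
            if (signature == null) {
                throw new IllegalStateException("signature has not been set yet");
            }
            cachedSchema = schemaLookup.apply(signature); // computed once, then cached
        }
        return cachedSchema;
    }
}
```

In this model, asking for the schema too early fails loudly instead of silently caching a result computed with the wrong signature.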
[jira] Commented: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867281#action_12867281 ] Daniel Dai commented on PIG-1280: - +1 for the new patch if tests pass. Add a pig-script-id to the JobConf of all jobs run in a pig-script -- Key: PIG-1280 URL: https://issues.apache.org/jira/browse/PIG-1280 Project: Pig Issue Type: Improvement Components: impl Reporter: Arun C Murthy Assignee: Richard Ding Fix For: 0.8.0 Attachments: PIG-1280.patch, PIG-1280.patch It would be very useful for tools like gridmix if pig could add a 'pig-script-id' to all Map-Reduce jobs spawned by a single pig-script. Potentially we could use this to re-construct the DAG of jobs in gridmix and so on. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1229) allow pig to write output into a JDBC db
[ https://issues.apache.org/jira/browse/PIG-1229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-1229:
----------------------------------
Attachment: pig-1229.patch

Ankur,

Sorry for getting back late on this. I fiddled with your latest patch and was able to make some progress on it. I am able to get rid of those Path problems (it looks like Pig itself is not dealing with them correctly in one place). I think the patch I attached should work, but I cannot get the test case to pass because of an hsqldb problem that I am not able to resolve. I keep getting this error from it:

{noformat}
Caused by: java.sql.SQLException: The database is already in use by another process: org.hsqldb.persist.niolockf...@4abea04e[file =/private/tmp/batchtest.lck, exists=true, locked=false, valid=false, fl =null]: java.lang.Exception: checkHeartbeat(): lock file [/private/tmp/batchtest.lck] is presumably locked by another process.
    at org.hsqldb.jdbc.Util.sqlException(Unknown Source)
    at org.hsqldb.jdbc.jdbcConnection.<init>(Unknown Source)
    at org.hsqldb.jdbcDriver.getConnection(Unknown Source)
    at org.hsqldb.jdbcDriver.connect(Unknown Source)
    at java.sql.DriverManager.getConnection(DriverManager.java:582)
    at java.sql.DriverManager.getConnection(DriverManager.java:185)
    at org.apache.pig.piggybank.storage.DBStorage.prepareToWrite(DBStorage.java:274)
{noformat}

Anyway, here are the changes I made:

1.
{code}
Index: src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/JobControlCompiler.java
===
-conf.set("pig.streaming.log.dir",
-new Path(outputPath, LOG_DIR).toString());
+//conf.set("pig.streaming.log.dir",
+//new Path(outputPath, LOG_DIR).toString());
 conf.set("pig.streaming.task.output.dir", outputPath);
 }
{code}
This looks like a problem in Pig. Here Pig assumes that it can put the logs generated during a stream command in the output location, which is incorrect if the output location is something like a DB. Since this needs changes in the main Pig code, I suggest opening a new JIRA for it and tracking it there.

2. Then in DBStorage.java
{code}
@Override
public void setStoreLocation(String location, Job job) throws IOException {
    job.getConfiguration().set("pig.db.conn.string", location);
}

@Override
public RecordWriter<NullWritable, NullWritable> getRecordWriter(
        TaskAttemptContext context) throws IOException, InterruptedException {
    jdbcURL = context.getConfiguration().get("pig.db.conn.string");
    return null;
}
{code}
We need to save the DB connection string in the job in setStoreLocation() and then retrieve it on the backend in getRecordWriter().

3. In DBStorage.java
{code}
@Override
public void cleanupOnFailure(String location, Job job) throws IOException {
    log.error("Job has failed.");
}
{code}
You must override this StoreFunc method, because the default implementation assumes a FileSystem as the output location. Currently I left it as a no-op apart from the log message, but it can be improved to do rollbacks, release DB connections, etc.

allow pig to write output into a JDBC db
----------------------------------------
Key: PIG-1229
URL: https://issues.apache.org/jira/browse/PIG-1229
Project: Pig
Issue Type: New Feature
Components: impl
Reporter: Ian Holsman
Assignee: Ankur
Priority: Minor
Fix For: 0.8.0
Attachments: jira-1229-v2.patch, jira-1229-v3.patch, pig-1229.patch

UDF to store data into a DB
[jira] Created: (PIG-1416) explain does not show the inner plans of the MapReduce plan
explain does not show the inner plans of the MapReduce plan --- Key: PIG-1416 URL: https://issues.apache.org/jira/browse/PIG-1416 Project: Pig Issue Type: Bug Components: impl Reporter: Thejas M Nair Fix For: 0.8.0 The inner plan in MR plan is very useful in understanding query plans where the inner plans are not present in the Physical or Logical plans. For example, in case of order by, the sampling MR job is not part of the physical plan, and the reduce has a POSort which is not shown in explain . In following example, notice that POSort is not shown the MR plan . {code} grunt l = load 'file.txt' as (a, b, c); grunt g = group l by a; grunt f = foreach g { s = order l by $0; generate s; } grunt explain f; grunt explain f; #--- # Logical Plan: #--- Store 1-137 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: Unknown | |---ForEach 1-136 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: bag | | | Project 1-132 Projections: [*] Overloaded: false FieldSchema: s: bag({a: bytearray,b: bytearray,c: bytearray}) Type: bag | Input: SORT 1-133| | |---SORT 1-133 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag | | | | | Project 1-134 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray | | Input: Project 1-135 Projections: [1] Overloaded: true | | | |---Project 1-135 Projections: [1] Overloaded: true FieldSchema: l: tuple({a: bytearray,b: bytearray,c: bytearray}) Type: tuple | Input: CoGroup 1-126 | |---CoGroup 1-126 Schema: {group: bytearray,l: {a: bytearray,b: bytearray,c: bytearray}} Type: bag | | | Project 1-125 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray | Input: Load 1-124 | |---Load 1-124 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag #--- # Physical Plan: #--- Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148 | |---New For Each(false)[bag] - 1-147 | | | RelationToExpressionProject[bag][*] - 1-146 | | | |---POSort[bag]() - 1-145 | | | | | Project[bytearray][0] - 
1-144 | | | |---Project[tuple][1] - 1-143 | |---Package[tuple]{bytearray} - 1-140 | |---Global Rearrange[tuple] - 1-139 | |---Local Rearrange[tuple]{bytearray}(false) - 1-141 | | | Project[bytearray][0] - 1-142 | |---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138 2010-05-13 15:47:32,102 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2010-05-13 15:47:32,102 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 #-- # Map Reduce Plan #-- MapReduce node 1-149 Map Plan Local Rearrange[tuple]{bytearray}(false) - 1-141 | | | Project[bytearray][0] - 1-142 | |---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138 Reduce Plan Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148 | |---New For Each(false)[bag] - 1-147 | | | RelationToExpressionProject[bag][*] - 1-146 | | | |---Project[tuple][1] - 1-143 | |---Package[tuple]{bytearray} - 1-140 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1417) Site changes for 0.7
Site changes for 0.7 Key: PIG-1417 URL: https://issues.apache.org/jira/browse/PIG-1417 Project: Pig Issue Type: Improvement Components: documentation Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1417) Site changes for 0.7
[ https://issues.apache.org/jira/browse/PIG-1417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12867323#action_12867323 ] Daniel Dai commented on PIG-1417: - It's too big to attach in Jira. I put it in http://people.apache.org/~daijy/site.patch Site changes for 0.7 Key: PIG-1417 URL: https://issues.apache.org/jira/browse/PIG-1417 Project: Pig Issue Type: Improvement Components: documentation Affects Versions: 0.7.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0
[jira] Resolved: (PIG-1416) explain does not show the inner plans of the MapReduce plan
[ https://issues.apache.org/jira/browse/PIG-1416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejas M Nair resolved PIG-1416. Resolution: Invalid The secondary key optimizer is removing the POSort, that is why it does not appear in MR plan. Closing as invalid. explain does not show the inner plans of the MapReduce plan --- Key: PIG-1416 URL: https://issues.apache.org/jira/browse/PIG-1416 Project: Pig Issue Type: Bug Components: impl Reporter: Thejas M Nair Fix For: 0.8.0 The inner plan in MR plan is very useful in understanding query plans where the inner plans are not present in the Physical or Logical plans. For example, in case of order by, the sampling MR job is not part of the physical plan, and the reduce has a POSort which is not shown in explain . In following example, notice that POSort is not shown the MR plan . {code} grunt l = load 'file.txt' as (a, b, c); grunt g = group l by a; grunt f = foreach g { s = order l by $0; generate s; } grunt explain f; grunt explain f; #--- # Logical Plan: #--- Store 1-137 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: Unknown | |---ForEach 1-136 Schema: {s: {a: bytearray,b: bytearray,c: bytearray}} Type: bag | | | Project 1-132 Projections: [*] Overloaded: false FieldSchema: s: bag({a: bytearray,b: bytearray,c: bytearray}) Type: bag | Input: SORT 1-133| | |---SORT 1-133 Schema: {a: bytearray,b: bytearray,c: bytearray} Type: bag | | | | | Project 1-134 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray | | Input: Project 1-135 Projections: [1] Overloaded: true | | | |---Project 1-135 Projections: [1] Overloaded: true FieldSchema: l: tuple({a: bytearray,b: bytearray,c: bytearray}) Type: tuple | Input: CoGroup 1-126 | |---CoGroup 1-126 Schema: {group: bytearray,l: {a: bytearray,b: bytearray,c: bytearray}} Type: bag | | | Project 1-125 Projections: [0] Overloaded: false FieldSchema: a: bytearray Type: bytearray | Input: Load 1-124 | |---Load 1-124 Schema: {a: 
bytearray,b: bytearray,c: bytearray} Type: bag #--- # Physical Plan: #--- Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148 | |---New For Each(false)[bag] - 1-147 | | | RelationToExpressionProject[bag][*] - 1-146 | | | |---POSort[bag]() - 1-145 | | | | | Project[bytearray][0] - 1-144 | | | |---Project[tuple][1] - 1-143 | |---Package[tuple]{bytearray} - 1-140 | |---Global Rearrange[tuple] - 1-139 | |---Local Rearrange[tuple]{bytearray}(false) - 1-141 | | | Project[bytearray][0] - 1-142 | |---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138 2010-05-13 15:47:32,102 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size before optimization: 1 2010-05-13 15:47:32,102 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MultiQueryOptimizer - MR plan size after optimization: 1 #-- # Map Reduce Plan #-- MapReduce node 1-149 Map Plan Local Rearrange[tuple]{bytearray}(false) - 1-141 | | | Project[bytearray][0] - 1-142 | |---Load(file:///Users/tejas/trunk_oby/file.txt:org.apache.pig.builtin.PigStorage) - 1-138 Reduce Plan Store(fakefile:org.apache.pig.builtin.PigStorage) - 1-148 | |---New For Each(false)[bag] - 1-147 | | | RelationToExpressionProject[bag][*] - 1-146 | | | |---Project[tuple][1] - 1-143 | |---Package[tuple]{bytearray} - 1-140 Global sort: false {code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867335#action_12867335 ]

Daniel Dai commented on PIG-1381:
-
Reattached the patch to address Ashutosh's comment.

Need a way for Pig to take an alternative property file
---
Key: PIG-1381
URL: https://issues.apache.org/jira/browse/PIG-1381
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: V.V.Chaitanya Krishna
Fix For: 0.7.0, 0.8.0
Attachments: PIG-1381-1.patch, PIG-1381-2.patch, PIG-1381-3.patch, PIG-1381-4.patch, PIG-1381-5.patch

Currently, Pig reads the first pig.properties it finds on the CLASSPATH. Pig ships a default pig.properties, and if a user has a different pig.properties the two conflict, since only one can be read. There are a couple of ways to solve this:
1. Provide a command-line option for the user to pass an additional property file.
2. Rename the default pig.properties to pig-default.properties, so that the user can supply a pig.properties to override it.
3. Further, consider using pig-default.xml/pig-site.xml, which seems more natural for the Hadoop community. If so, we should provide backward compatibility by also reading pig.properties and pig-cluster-hadoop-site.xml.
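Option 2 in the description above amounts to a two-layer lookup: load the defaults first, then let a user-supplied file override individual keys. The sketch below models that idea in Python under stated assumptions; it is not Pig's implementation, the parser handles only simple key=value lines, and the property keys shown are hypothetical.

```python
def parse_properties(text):
    """Parse simple 'key=value' lines, skipping blank lines and '#' comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def load_layered(default_text, user_text=None):
    """Start from the defaults, then apply user overrides on top."""
    props = parse_properties(default_text)
    if user_text is not None:
        props.update(parse_properties(user_text))
    return props


# Hypothetical contents standing in for pig-default.properties and pig.properties.
DEFAULTS = "# shipped defaults\nexample.verbose=false\nexample.tmpdir=/tmp\n"
USER = "example.verbose=true\n"
```

With this layering, a user file only needs to list the keys it changes; everything else falls through to the defaults.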
[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1381: Attachment: PIG-1381-5.patch
[jira] Updated: (PIG-1415) LoadFunc signature is not correct in LoadFunc.getSchema sometimes
[ https://issues.apache.org/jira/browse/PIG-1415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-1415:
Attachment: PIG-1415-1.patch

LoadFunc signature is not correct in LoadFunc.getSchema sometimes
-
Key: PIG-1415
URL: https://issues.apache.org/jira/browse/PIG-1415
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.7.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.8.0
Attachments: PIG-1415-1.patch

The following script does not set the signature correctly when LoadFunc.getSchema is called:

a = load 'xxx' using TableLoader('xxx') as (a, b, c);

However, if we do not give a schema to a, we get the right signature:

a = load 'xxx' using TableLoader('xxx');

Diagnosis: The parser generates the LoadClause before it reaches the production Alias = LoadClause, which is what actually sets the signature on the LOLoad. When a schema is given, the parser calls LOLoad.setSchema(), which internally calls LoadFunc.determineSchema; at that point the signature has not been set yet. This relates to the change that caches determinedSchema in LOLoad [PIG-1317|https://issues.apache.org/jira/browse/PIG-1317]. Before that change, LoadFunc.getSchema() would later be called again with the right signature. Now that determinedSchema is cached, LoadFunc never gets a chance to see the right signature inside LoadFunc.getSchema().

Solution: Do not call LoadFunc.determineSchema inside LOLoad.setSchema().
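The ordering problem in the diagnosis above can be modelled in a few lines. This is a simplified sketch in Python, not Pig's code: the class and method names are hypothetical stand-ins for LOLoad and LoadFunc, and the loader is assumed to look up its schema by signature. It shows how a cached result computed before the signature is set stays wrong, and how skipping the lookup inside set_schema avoids that.

```python
class SignatureLoader:
    """Hypothetical loader whose schema lookup depends on its signature."""

    def __init__(self, schemas_by_signature):
        self.schemas_by_signature = schemas_by_signature
        self.signature = None

    def get_schema(self):
        return self.schemas_by_signature.get(self.signature)


class EagerLoadNode:
    """Determines and caches the schema inside set_schema (problematic order)."""

    def __init__(self, loader):
        self.loader = loader
        self.declared = None
        self.determined = None
        self.cached = False

    def determine_schema(self):
        # The first result is cached and reused, even if it was computed too early.
        if not self.cached:
            self.determined = self.loader.get_schema()
            self.cached = True
        return self.determined

    def set_schema(self, declared):
        self.declared = declared
        self.determine_schema()  # runs before the signature is known

    def set_signature(self, signature):
        self.loader.signature = signature


class LazyLoadNode(EagerLoadNode):
    """Same node, but set_schema no longer triggers the schema lookup."""

    def set_schema(self, declared):
        self.declared = declared


def parse_load(node_cls):
    """Mimic the parse order: schema from the 'as' clause first, signature second."""
    loader = SignatureLoader({"a_sig": ["a", "b", "c"]})
    node = node_cls(loader)
    node.set_schema(["a", "b", "c"])
    node.set_signature("a_sig")
    return node.determine_schema()
```

With the eager node the loader is asked for its schema while the signature is still unset, and the cache then hides the later, correct signature; the lazy node defers the call until the signature is available.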
[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1381: Attachment: (was: PIG-1381-5.patch)
[jira] Updated: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1381: Attachment: PIG-1381-5.patch
[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867352#action_12867352 ] Daniel Dai commented on PIG-1381: - Option 2 committed to both trunk and branch 0.7.
[jira] Commented: (PIG-1280) Add a pig-script-id to the JobConf of all jobs run in a pig-script
[ https://issues.apache.org/jira/browse/PIG-1280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867371#action_12867371 ]

Daniel Dai commented on PIG-1280:
-
Now that PIG-1381 is checked in, the config entry needs to be added to pig-default.properties instead of pig.properties. Please note this change when committing.

Add a pig-script-id to the JobConf of all jobs run in a pig-script
--
Key: PIG-1280
URL: https://issues.apache.org/jira/browse/PIG-1280
Project: Pig
Issue Type: Improvement
Components: impl
Reporter: Arun C Murthy
Assignee: Richard Ding
Fix For: 0.8.0
Attachments: PIG-1280.patch, PIG-1280.patch

It would be very useful for tools like gridmix if Pig could add a 'pig-script-id' to all Map-Reduce jobs spawned by a single Pig script. Potentially we could use this to reconstruct the DAG of jobs in gridmix, and so on.
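The idea in the description above can be sketched as follows: every job configuration spawned by one script receives the same identifier, so a downstream tool can regroup jobs by script. This is an illustrative Python model, not Pig's or Hadoop's API; job configurations are plain dicts, the "job.name" key and the use of a random UUID are assumptions, and only the 'pig-script-id' name comes from the issue.

```python
import uuid

SCRIPT_ID_KEY = "pig-script-id"  # key name taken from the issue text


def tag_jobs(job_confs, script_id=None):
    """Stamp every job configuration of one script with a shared identifier."""
    script_id = script_id or str(uuid.uuid4())  # assumption: any unique string works
    for conf in job_confs:
        conf[SCRIPT_ID_KEY] = script_id
    return script_id


def group_by_script(job_confs):
    """Regroup job names by script id, as a tool such as gridmix might."""
    groups = {}
    for conf in job_confs:
        # "job.name" is a hypothetical key used only in this sketch.
        groups.setdefault(conf.get(SCRIPT_ID_KEY), []).append(conf["job.name"])
    return groups
```

Grouping by the shared id is the first step towards rebuilding the per-script job DAG; the dependency edges themselves would need additional information that this sketch does not model.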
[jira] Commented: (PIG-566) Dump and store outputs do not match for PigStorage
[ https://issues.apache.org/jira/browse/PIG-566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867373#action_12867373 ]

Daniel Dai commented on PIG-566:

+1 once the Hudson tests pass.

Dump and store outputs do not match for PigStorage
--
Key: PIG-566
URL: https://issues.apache.org/jira/browse/PIG-566
Project: Pig
Issue Type: Bug
Affects Versions: 0.7.0, 0.8.0
Reporter: Santhosh Srinivasan
Assignee: Gianmarco De Francisci Morales
Priority: Minor
Fix For: 0.7.0, 0.8.0
Attachments: PIG-566.patch, PIG-566.patch, PIG-566.patch, PIG-566.patch, PIG-566.patch

The dump and store formats for PigStorage do not match for longs and floats.

{code}
grunt> y = foreach x generate {(2985671202194220139L)};
grunt> describe y;
y: {{(long)}}
grunt> dump y;
({(2985671202194220139L)})
grunt> store y into 'y';
grunt> cat y
{(2985671202194220139)}
{code}
[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867376#action_12867376 ] V.V.Chaitanya Krishna commented on PIG-1381: Apologies for coming in so late. I was on vacation and didn't have internet access. I have finished the coding for option 1 and still need to write unit test cases for it. I should be able to submit the patch by Saturday.
[jira] Commented: (PIG-1381) Need a way for Pig to take an alternative property file
[ https://issues.apache.org/jira/browse/PIG-1381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12867378#action_12867378 ] Daniel Dai commented on PIG-1381: - Great. Thanks V.V.Chaitanya!