[jira] Commented: (PIG-992) [zebra] Separate Schema-related files into a Schema package
[ https://issues.apache.org/jira/browse/PIG-992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763439#action_12763439 ]

Hong Tang commented on PIG-992:
-------------------------------

Comments:
- In many places, both types.ParseException and schema.ParseException are thrown. Do you really want both?
- In the following
{noformat}
+public enum ColumnType implements Writable {
{noformat}
Is the Writable interface actually used? You have a rather odd pattern of asymmetric readFields and write:
{noformat}
+  @Override
+  public void readFields(DataInput in) throws IOException {
+    // no op, instantiated by the caller
+  }
+
+  @Override
+  public void write(DataOutput out) throws IOException {
+    Utils.writeString(out, name);
+  }
{noformat}
- In the following code
{noformat}
+  public static class ColumnSchema {
+    public String name;
+    public ColumnType type;
+    public Schema schema;
+    public int index; // field index in schema
{noformat}
Exposing fields as all-public seems like a bad idea.
- Is there a specific use case that requires the schema to be mutable at any time? (minor nit: the comment says add a field, but the code seems to add a column to the schema).
{noformat}
+  /**
+   * add a field
+   */
+  public void add(ColumnSchema f) throws ParseException
+  {
+    add(f, false);
+  }
{noformat}
- Why is Schema.equals(Object) not implemented on top of the static version of the method (or vice versa)?
- In Schema.readFields(), the Version string from the input is not checked for compatibility.
- In the following
{noformat}
+  private void init(String[] columnNames) throws ParseException {
+    // the arg must be of type or they will be treated as the default type
+    // TODO: verify column names don't contain COLUMN_DELIMITER
{noformat}
It seems that the TODO should not involve too much work; please consider not deferring it.
- Need more detailed documentation on the spec of the parameter for Schema.getColumnSchema(String name)
{noformat}
+  /**
+   * Get a column's schema
+   */
+  public ColumnSchema getColumnSchema(String name) throws ParseException
+  {
{noformat}
- Schema.getColumnSchemaOnParsedName and Schema.getColumnSchema seem to be copy/paste code.
- Schema.getColumnSchema(ParsedName pn) has the side effect of modifying the parameter pn. The javadoc reads cryptic to me.
- There are many classes generated by JavaCC. It is probably better not to include them in the patch (and to put the generated source under build/src).

Other minor issues:
- Typically contrib projects should use the same version string as the parent project.
- Style: there are some very long lines.
- There are a few white space changes. That should be avoided if possible.
- In the following
{noformat}
+} catch (org.apache.hadoop.zebra.schema.ParseException e) {
+  throw new AssertionError("Invalid Projection: " + e.getMessage());
{noformat}
consider changing AssertionError to IllegalArgumentException.
- In the following:
{noformat}
+  /*
+   * helper class to parse a column name string one section at a time and find the required
+   * type for the parsed part.
+   */
+  public static class ParsedName {
+    public String mName;
+    int mKeyOffset; // the offset where the keysstring starts
+    public ColumnType mDT = ColumnType.ANY; // parent's type
{noformat}
The description seems to indicate that this should not be a public class. I tried to understand the body of the class and do not feel that it serves a general purpose.
- The following seems like a useless assignment:
{noformat}
+  private long mVersion = schemaVersion;
{noformat}
- In the following
{noformat}
   /**
+   * Normalize the schema string.
+   *
+   * @param value
+   *          the input string representation of the schema.
+   * @return the normalized string representation.
+   */
+  public static String normalize(String value) {
+    String result = new String();
+
+    if (value == null || value.trim().isEmpty())
+      return result;
+
+    StringBuilder sb = new StringBuilder();
+    String[] parts = value.trim().split(COLUMN_DELIMITER);
+    for (int nx = 0; nx < parts.length; nx++) {
+      if (nx > 0) sb.append(COLUMN_DELIMITER);
+      sb.append(parts[nx].trim());
+    }
+    return sb.toString();
+  }
{noformat}
there is a wasted value.trim().
- In Schema.equals(Object), instead of comparing class equality, using instanceof is typically better.
- Use StringBuilder instead in the following code:
{noformat}
+    String merged = new String();
+    for (int i = 0; i < columnNames.length; i++) {
+      if (i > 0) merged += ",";
+      merged += columnNames[i];
+    }
{noformat}
- There are a few indentation problems.

[zebra] Separate Schema-related files into a Schema package
-----------------------------------------------------------

Key: PIG-992
URL: https://issues.apache.org/jira/browse/PIG-992
Project: Pig
Issue Type: Improvement
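For illustration, the two string-handling suggestions in the review above (trim the input only once in normalize, and build the merged column list with a StringBuilder) could look roughly like the following. This is a minimal sketch, not code from the patch: the class name SchemaStrings, the method name merge, and the "," value of COLUMN_DELIMITER are assumptions made for the example. It also quotes the delimiter before splitting, which the reviewed code does not do.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative and not taken from the patch.
final class SchemaStrings {
    // Assumed value: the reviewed patch does not show the real constant.
    static final String COLUMN_DELIMITER = ",";

    /** Normalizes a schema string: trims the input once, then trims and re-joins each part. */
    static String normalize(String value) {
        if (value == null) {
            return "";
        }
        String trimmed = value.trim(); // single trim, reused for the emptiness check and the split
        if (trimmed.isEmpty()) {
            return "";
        }
        // Pattern.quote keeps the delimiter literal even if it contains regex metacharacters.
        String[] parts = trimmed.split(Pattern.quote(COLUMN_DELIMITER));
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(COLUMN_DELIMITER);
            }
            sb.append(parts[i].trim());
        }
        return sb.toString();
    }

    /** Joins column names with the delimiter using a StringBuilder instead of repeated +=. */
    static String merge(String[] columnNames) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < columnNames.length; i++) {
            if (i > 0) {
                sb.append(COLUMN_DELIMITER);
            }
            sb.append(columnNames[i]);
        }
        return sb.toString();
    }
}
```

With these assumptions, SchemaStrings.normalize(" a , b ,c ") yields "a,b,c", and a null or blank input yields the empty string.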
[jira] Updated: (PIG-976) Multi-query optimization throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Richard Ding updated PIG-976:
-----------------------------

Status: Patch Available (was: Open)

Multi-query optimization throws ClassCastException
--------------------------------------------------

Key: PIG-976
URL: https://issues.apache.org/jira/browse/PIG-976
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.4.0
Reporter: Ankur
Assignee: Richard Ding
Attachments: PIG-976.patch, PIG-976.patch

Multi-query optimization fails to merge 2 branches when 1 is a result of Group By ALL and another is a result of Group By field1 where field 1 is of type long. Here is the script that fails with multi-query on.

data = LOAD 'test' USING PigStorage('\t') AS (a:long, b:double, c:double);
A = GROUP data ALL;
B = FOREACH A GENERATE SUM(data.b) AS sum1, SUM(data.c) AS sum2;
C = FOREACH B GENERATE (sum1/sum2) AS rate;
STORE C INTO 'result1';
D = GROUP data BY a;
E = FOREACH D GENERATE group AS a, SUM(data.b), SUM(data.c);
STORE E into 'result2';

Here is the exception from the logs

java.lang.ClassCastException: org.apache.pig.data.DefaultTuple cannot be cast to org.apache.pig.data.DataBag
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.processInputBag(POProject.java:399)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject.getNext(POProject.java:180)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.processInput(POUserFunc.java:145)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:197)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POUserFunc.getNext(POUserFunc.java:235)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.processPlan(POForEach.java:254)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POForEach.getNext(POForEach.java:204)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.PhysicalOperator.processInput(PhysicalOperator.java:231)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POLocalRearrange.getNext(POLocalRearrange.java:240)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.runPipeline(PODemux.java:264)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.PODemux.getNext(PODemux.java:254)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.processOnePackageOutput(PigCombiner.java:196)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:174)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigCombiner$Combine.reduce(PigCombiner.java:63)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.combineAndSpill(MapTask.java:906)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill(MapTask.java:786)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush(MapTask.java:698)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:228)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2206)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-989) Allow type merge between numerical type and non-numerical type
[ https://issues.apache.org/jira/browse/PIG-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763592#action_12763592 ]

Olga Natkovich commented on PIG-989:
------------------------------------

+1, looks good. Please, commit

Allow type merge between numerical type and non-numerical type
--------------------------------------------------------------

Key: PIG-989
URL: https://issues.apache.org/jira/browse/PIG-989
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.5.0
Reporter: Daniel Dai
Attachments: PIG-989-1.patch, PIG-989-2.patch

Currently, we do not allow type merge between numerical type and non-numerical type. And the error message is confusing. E.g., if you run:

a = load '1.txt' as (a0:chararray, a1:chararray);
b = load '2.txt' as (b0:long, b1:chararray);
c = join a by a0, b by b0;
dump c;

And the error message is

ERROR 1051: Cannot cast to Unknown

We shall:
1. Allow the type merge between numerical type and non-numerical type
2. Or at least, provide more meaningful error message to the user
[jira] Commented: (PIG-948) [Usability] Relating pig script with MR jobs
[ https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763593#action_12763593 ]

Olga Natkovich commented on PIG-948:
------------------------------------

+1, please, commit

[Usability] Relating pig script with MR jobs
--------------------------------------------

Key: PIG-948
URL: https://issues.apache.org/jira/browse/PIG-948
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.4.0
Reporter: Ashutosh Chauhan
Assignee: Ashutosh Chauhan
Priority: Minor
Fix For: 0.6.0
Attachments: pig-948-2.patch, pig-948-3.patch, PIG-948-4.patch, pig-948.patch

Currently it's hard to find a way to relate a pig script to a specific MR job. In a loaded cluster with multiple simultaneous job submissions, it's not easy to figure out which specific MR jobs were launched for a given pig script. If Pig can provide this info, it will be useful to debug and monitor the jobs resulting from a pig script.

At the very least, Pig should be able to provide the user with the following information:
1) Job id of the launched job.
2) Complete web url of the jobtracker running this job.
[jira] Updated: (PIG-989) Allow type merge between numerical type and non-numerical type
[ https://issues.apache.org/jira/browse/PIG-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-989:
---------------------------

Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)

Patch committed
[jira] Updated: (PIG-948) [Usability] Relating pig script with MR jobs
[ https://issues.apache.org/jira/browse/PIG-948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-948:
---------------------------

Resolution: Fixed
Status: Resolved (was: Patch Available)

Patch committed.
[jira] Commented: (PIG-976) Multi-query optimization throws ClassCastException
[ https://issues.apache.org/jira/browse/PIG-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763645#action_12763645 ]

Pradeep Kamath commented on PIG-976:
------------------------------------

Reviewed the new patch - one comment is on POMultiQueryPackage:
{code}
Object obj = tuple.get(0);
if (obj instanceof PigNullableWritable) {
    ((PigNullableWritable)obj).setIndex(origIndex);
}
else {
    PigNullableWritable myObj = HDataType.getWritableComparableTypes(obj, (byte)0);
    myObj.setIndex(origIndex);
    tuple.set(0, myObj);
}
{code}
If obj is null then the above code in the else would give an exception - I think the code should check for obj == null and if so create a NullWritable object where NullWritable is a subclass of PigNullableWritable representing a null. Since only the getValueAsPigType() method is used in PODemux, that would always return null for this use case.
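The null check suggested in the review comment above could be sketched as follows. This is only an illustration under stated assumptions: NullableValue and NullMarker are hypothetical stand-ins for PigNullableWritable and the proposed null-representing subclass, a plain List models the tuple, and the constructor call in the last branch stands in for the HDataType conversion. None of these are the real Pig classes.

```java
import java.util.List;

// Hypothetical stand-in for PigNullableWritable; not the real Pig class.
class NullableValue {
    private final Object value;
    private byte index;

    NullableValue(Object value) {
        this.value = value;
    }

    void setIndex(byte index) {
        this.index = index;
    }

    byte getIndex() {
        return index;
    }

    Object getValueAsPigType() {
        return value;
    }
}

// Hypothetical subclass that represents a null key, as proposed in the review.
class NullMarker extends NullableValue {
    NullMarker() {
        super(null);
    }
}

final class KeyWrapper {
    /** Wraps field 0 of the tuple (modelled as a List) and stamps it with origIndex. */
    static void wrapKey(List<Object> tuple, byte origIndex) {
        Object obj = tuple.get(0);
        NullableValue wrapped;
        if (obj instanceof NullableValue) {
            wrapped = (NullableValue) obj;    // already wrapped: reuse it
        } else if (obj == null) {
            wrapped = new NullMarker();       // explicit null case, avoids converting a null
        } else {
            wrapped = new NullableValue(obj); // stands in for the type-specific conversion
        }
        wrapped.setIndex(origIndex);
        tuple.set(0, wrapped);
    }
}
```

Because instanceof is false for null, the first branch never sees a null key; the second branch handles it before any conversion is attempted.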
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-922:
---------------------------

Attachment: PIG-922-p3_9.patch

Fix the unit test

Logical optimizer: push up project
----------------------------------

Key: PIG-922
URL: https://issues.apache.org/jira/browse/PIG-922
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, PIG-922-p1_2.patch, PIG-922-p1_3.patch, PIG-922-p1_4.patch, PIG-922-p2_preview.patch, PIG-922-p2_preview2.patch, PIG-922-p3_1.patch, PIG-922-p3_2.patch, PIG-922-p3_3.patch, PIG-922-p3_4.patch, PIG-922-p3_5.patch, PIG-922-p3_6.patch, PIG-922-p3_7.patch, PIG-922-p3_8.patch, PIG-922-p3_9.patch

This is a continuation work of [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add another rule to the logical optimizer: Push up project, i.e., prune columns as early as possible.
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-922:
---------------------------

Status: Open (was: Patch Available)
[jira] Updated: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-922:
---------------------------

Status: Patch Available (was: Open)
Pig 0.4.0 is released!
The Pig team is happy to announce the Pig 0.4.0 release!

Pig is a Hadoop subproject that provides a high-level data-flow language and an execution framework for parallel computation on a Hadoop cluster. More details about Pig can be found at http://hadoop.apache.org/pig/.

This release introduces two new types of join. The skewed join improves join performance for data with large skew in the join key. The merge join improves performance when both inputs are sorted on the join key. The release also includes support for outer join.

The details of the release can be found at http://hadoop.apache.org/pig/releases.html

The publishing of this release has been delayed due to problems with Apache infrastructure that prevented us from publishing the updated site.

Olga
[jira] Commented: (PIG-995) Limit Optimizer throw exception ERROR 2156: Error while fixing projections
[ https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763694#action_12763694 ]

Pradeep Kamath commented on PIG-995:
------------------------------------

+1

Limit Optimizer throw exception ERROR 2156: Error while fixing projections
--------------------------------------------------------------------------

Key: PIG-995
URL: https://issues.apache.org/jira/browse/PIG-995
Project: Pig
Issue Type: Bug
Components: impl
Affects Versions: 0.3.0
Reporter: Daniel Dai
Assignee: Daniel Dai
Fix For: 0.6.0
Attachments: PIG-995-1.patch

The following script fails:

A = load '1.txt' AS (a0, a1, a2);
B = order A by a1;
C = limit B 10;
D = foreach C generate $0;
dump D;

Error log:

Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2156: Error while fixing projections. Projection map of node to be replaced is null.
    at org.apache.pig.impl.logicalLayer.ProjectFixerUpper.visit(ProjectFixerUpper.java:138)
    at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:408)
    at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58)
    at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
    at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
    at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
    at org.apache.pig.impl.logicalLayer.LOForEach.rewire(LOForEach.java:761)
[jira] Commented: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763705#action_12763705 ]

Pradeep Kamath commented on PIG-922:
------------------------------------

Reviewed changes per my last review comments - looks good - +1
[jira] Updated: (PIG-995) Limit Optimizer throw exception ERROR 2156: Error while fixing projections
[ https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-995:
---------------------------

Attachment: PIG-995-2.patch

After discussion with Santhosh, I have a better patch. The problem is that we do not generate the projection map before applying optimization rules. If the optimization rules change the structure of the logical plan and only then generate the projection map, we end up using a wrong projection map. In the new patch, we regenerate the projection map before applying each optimization rule.
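The idea described in the update above, regenerating the projection map before each optimization rule runs, can be sketched in a few lines. This is a deliberately simplified model and not the patch itself: Plan, Rule, Optimizer, and rebuildProjectionMap are hypothetical names, and the "projection map" here is just a copy of the column list.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, heavily simplified model; none of these types are Pig classes.
final class Plan {
    final List<String> columns = new ArrayList<>();
    List<String> projectionMap; // derived from columns; goes stale after a rewrite
    int rebuilds = 0;

    void rebuildProjectionMap() {
        projectionMap = new ArrayList<>(columns);
        rebuilds++;
    }
}

interface Rule {
    void apply(Plan plan); // may restructure the plan
}

final class Optimizer {
    static void optimize(Plan plan, List<Rule> rules) {
        for (Rule rule : rules) {
            // Regenerate first, so a rule never sees a map computed for an older plan shape.
            plan.rebuildProjectionMap();
            rule.apply(plan);
        }
    }
}
```

In this model, a rule that runs after a restructuring rule always reads a map that reflects the restructured plan, at the cost of one rebuild per rule.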
[jira] Updated: (PIG-995) Limit Optimizer throw exception ERROR 2156: Error while fixing projections
[ https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-995:
---------------------------

Status: Patch Available (was: Open)
[jira] Updated: (PIG-995) Limit Optimizer throw exception ERROR 2156: Error while fixing projections
[ https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-995:
---------------------------

Status: Open (was: Patch Available)
[jira] Commented: (PIG-922) Logical optimizer: push up project
[ https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12763728#action_12763728 ]

Hadoop QA commented on PIG-922:
-------------------------------

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12421651/PIG-922-p3_9.patch
  against trunk revision 823257.

    +1 @author. The patch does not contain any @author tags.
    +1 tests included. The patch appears to include 30 new or modified tests.
    +1 javadoc. The javadoc tool did not generate any warning messages.
    +1 javac. The applied patch does not increase the total number of javac compiler warnings.
    +1 findbugs. The patch does not introduce any new Findbugs warnings.
    -1 release audit. The applied patch generated 287 release audit warnings (more than the trunk's current 280 warnings).
    +1 core tests. The patch passed core unit tests.
    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/15/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/15/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/15/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/15/console

This message is automatically generated.
[jira] Created: (PIG-1000) InternalCachedBag.java generates javac warning and findbug warning
InternalCachedBag.java generates javac warning and findbug warning
------------------------------------------------------------------

Key: PIG-1000
URL: https://issues.apache.org/jira/browse/PIG-1000
Project: Pig
Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Ying He
Assignee: Ying He
Fix For: 0.6.0

POPackage uses DefaultDataBag during the reduce process to hold data. It is registered with SpillableMemoryManager and is prone to OutOfMemoryException. It's better to proactively manage the usage of the memory. The bag fills in memory up to a specified amount, and dumps the rest to disk. The amount of memory to hold tuples is configurable. This can avoid out of memory errors.
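The behaviour described above, keeping tuples in memory up to a configurable limit and writing the rest to disk, can be illustrated with a small sketch. It is not Pig's InternalCachedBag: BoundedBag is a hypothetical class, tuples are modelled as strings, and the limit is a tuple count rather than a number of bytes, which is a simplifying assumption.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not Pig's InternalCachedBag.
final class BoundedBag {
    private final int maxInMemory; // configurable limit, counted in tuples for simplicity
    private final List<String> inMemory = new ArrayList<>();
    private File spillFile;        // created lazily on the first overflow
    private DataOutputStream spillOut;
    private int spilled = 0;

    BoundedBag(int maxInMemory) {
        this.maxInMemory = maxInMemory;
    }

    /** Keeps the tuple in memory while there is room, otherwise appends it to the spill file. */
    void add(String tuple) {
        if (inMemory.size() < maxInMemory) {
            inMemory.add(tuple);
            return;
        }
        try {
            if (spillOut == null) {
                spillFile = File.createTempFile("bag", ".spill");
                spillFile.deleteOnExit();
                spillOut = new DataOutputStream(new FileOutputStream(spillFile));
            }
            spillOut.writeUTF(tuple);
            spilled++;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    int inMemoryCount() {
        return inMemory.size();
    }

    int spilledCount() {
        return spilled;
    }

    /** Returns all tuples: the in-memory ones first, then those read back from disk. */
    List<String> readAll() {
        List<String> all = new ArrayList<>(inMemory);
        if (spillOut == null) {
            return all;
        }
        try {
            spillOut.flush();
            try (DataInputStream in = new DataInputStream(new FileInputStream(spillFile))) {
                for (int i = 0; i < spilled; i++) {
                    all.add(in.readUTF());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return all;
    }
}
```

A real implementation would bound memory by estimated size rather than by count and would close and delete its spill files explicitly; the sketch only shows the fill-then-spill control flow.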
[jira] Updated: (PIG-1000) InternalCachedBag.java generates javac warning and findbug warning
[ https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ying He updated PIG-1000:
-------------------------

Attachment: PIG-1000.patch

fix javac warning and findbug warning
[jira] Updated: (PIG-1000) InternalCachedBag.java generates javac warning and findbug warning
[ https://issues.apache.org/jira/browse/PIG-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-1000: - Description: patch submitted by PIG-975 generates javac warning and findbug warning (was: POPackage uses DefaultDataBag during reduce process to hold data. It is registered with SpillableMemoryManager and prone to OutOfMemoryException. It's better to pro-actively managers the usage of the memory. The bag fills in memory to a specified amount, and dump the rest the disk. The amount of memory to hold tuples is configurable. This can avoid out of memory error.) Patch Info: [Patch Available] InternalCachedBag.java generates javac warning and findbug warning -- Key: PIG-1000 URL: https://issues.apache.org/jira/browse/PIG-1000 Project: Pig Issue Type: Improvement Affects Versions: 0.4.0 Reporter: Ying He Assignee: Ying He Fix For: 0.6.0 Attachments: PIG-1000.patch patch submitted by PIG-975 generates javac warning and findbug warning -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai reassigned PIG-894: -- Assignee: Daniel Dai order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Daniel Dai Attachments: PIG-894-1.patch
grunt> l = load 'students.txt';
grunt> f = filter l by 1 == 2;
grunt> o = order f by $0;
grunt> dump o;
This results in 3 MR jobs. The 2nd (sampling) MR job creates an empty sample file, and the 3rd MR job (order-by) fails with the following error in the map task:
java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:348)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
Caused by: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89)
    ... 5 more
[jira] Updated: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-894: --- Attachment: PIG-894-1.patch order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Attachments: PIG-894-1.patch grunt l = load 'students.txt' ; grunt f = filter l by 1 == 2; grunt o = order f by $0 ; grunt dump o; This results in 3 MR jobs . The 2nd (sampling) MR creates empty sample file, and 3rd MR (order-by) fails with following error in Map job - java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:348) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) Caused by: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89) ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-894: --- Status: Patch Available (was: Open) order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Daniel Dai Attachments: PIG-894-1.patch grunt l = load 'students.txt' ; grunt f = filter l by 1 == 2; grunt o = order f by $0 ; grunt dump o; This results in 3 MR jobs . The 2nd (sampling) MR creates empty sample file, and 3rd MR (order-by) fails with following error in Map job - java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:348) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) Caused by: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89) ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1001) Generate more meaningful error message when one input file does not exist
Generate more meaningful error message when one input file does not exist - Key: PIG-1001 URL: https://issues.apache.org/jira/browse/PIG-1001 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Daniel Dai Fix For: 0.6.0 In the following query, if 2.txt does not exist:
a = load '1.txt';
b = order a by $0;
c = load '2.txt';
d = order c by $0;
e = join b by $0, d by $0;
dump e;
Pig throws the error message "ERROR 2100: file:/tmp/temp155054664/tmp1144108421 does not exist." Pig should instead report an error message such as "Input file 2.txt does not exist" rather than this confusing one.
Hudson build is back to normal: Pig-trunk #581
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/581/changes
[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-987: - Attachment: ColumnGroupSecurity.patch [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by the corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: when a user selects only columns that they do not have permission to access, Zebra should return an error with the message "Error #: Permission denied for accessing column <column name or names>". Access control applies to an entire column group, so all columns in a column group have the same permissions.
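The expected behaviour can be sketched in Java as follows. This is only an illustration of the rule stated above, not Zebra code: the class ColumnGroupAccess, the column-to-group map and the set of readable groups are hypothetical, and returning the readable subset when only some columns are denied is an assumption the description does not cover. The error number is left out because the description does not give one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical check: permissions are per column group, so a column is readable iff its group is. */
final class ColumnGroupAccess {
    static List<String> readableColumns(List<String> requested,
                                        Map<String, String> columnToGroup,
                                        Set<String> readableGroups) {
        List<String> allowed = new ArrayList<>();
        for (String column : requested) {
            String group = columnToGroup.get(column);
            if (group != null && readableGroups.contains(group)) {
                allowed.add(column);
            }
        }
        // The user selected only columns they cannot read: fail with the message from the description.
        if (!requested.isEmpty() && allowed.isEmpty()) {
            throw new SecurityException(
                    "Permission denied for accessing column " + String.join(", ", requested));
        }
        // Assumption: with a mix of readable and unreadable columns, return the readable ones.
        return allowed;
    }
}
```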
[jira] Commented: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763776#action_12763776 ] Pradeep Kamath commented on PIG-894: The patch uses the pig.inputs property from the jobconf, which does not directly contain the input file name. It actually holds a serialized ArrayList<Pair<FileSpec, Boolean>> in string form, containing the filespec and the issplittable flag for each input of the job. This serialized string needs to be deserialized using ObjectSerializer.deserialize, and the file name then retrieved from the filespec. order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Daniel Dai Attachments: PIG-894-1.patch
grunt> l = load 'students.txt';
grunt> f = filter l by 1 == 2;
grunt> o = order f by $0;
grunt> dump o;
This results in 3 MR jobs. The 2nd (sampling) MR job creates an empty sample file, and the 3rd MR job (order-by) fails with the following error in the map task:
java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:348)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
Caused by: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89)
    ... 5 more
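The deserialize-then-extract step in the comment above might look roughly like the following self-contained Java sketch. It does not use Pig's classes: FileSpecStub and InputEntry are hypothetical stand-ins for FileSpec and Pair<FileSpec, Boolean>, and the Base64-over-Java-serialization encoding is an assumption standing in for whatever ObjectSerializer actually produces.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

final class PigInputsSketch {
    /** Hypothetical stand-in for Pig's FileSpec; only the file name is modeled. */
    static final class FileSpecStub implements Serializable {
        private static final long serialVersionUID = 1L;
        final String fileName;
        FileSpecStub(String fileName) { this.fileName = fileName; }
    }

    /** Hypothetical stand-in for Pair<FileSpec, Boolean>: a filespec plus its splittable flag. */
    static final class InputEntry implements Serializable {
        private static final long serialVersionUID = 1L;
        final FileSpecStub spec;
        final boolean splittable;
        InputEntry(FileSpecStub spec, boolean splittable) {
            this.spec = spec;
            this.splittable = splittable;
        }
    }

    /** Assumed encoding: Java serialization wrapped in Base64 so it fits in a string property. */
    static String serialize(ArrayList<InputEntry> inputs) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(inputs);
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deserializes the property value and pulls the file name out of each filespec. */
    @SuppressWarnings("unchecked")
    static List<String> fileNames(String serialized) {
        byte[] raw = Base64.getDecoder().decode(serialized);
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            ArrayList<InputEntry> inputs = (ArrayList<InputEntry>) in.readObject();
            List<String> names = new ArrayList<>();
            for (InputEntry entry : inputs) {
                names.add(entry.spec.fileName);
            }
            return names;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The point of the sketch is that the property value cannot be treated as a path: it has to be decoded back into the list before any file name is available.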
[jira] Updated: (PIG-986) [zebra] Zebra Column Group Naming Support
[ https://issues.apache.org/jira/browse/PIG-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yan Zhou updated PIG-986: - Attachment: ColumnGroupName.patch Removed hard-coded group in a few test cases. [zebra] Zebra Column Group Naming Support - Key: PIG-986 URL: https://issues.apache.org/jira/browse/PIG-986 Project: Pig Issue Type: New Feature Components: impl Affects Versions: 0.4.0 Reporter: Chao Wang Assignee: Chao Wang Fix For: 0.6.0 Attachments: ColumnGroupName.patch, ColumnGroupName.patch, ColumnGroupName.patch We introduce column group names to Zebra and make them first-class citizens in Zebra. This can ease management of column groups. We plan to introduce an AS clause for column group names in Zebra's syntax. Functional Specifications:
1) Column group names are optional. For column groups that do not have a user-provided name, Zebra will internally assign default column group names that are unique for that table: CG0, CG1, CG2, ... Note: if CGx is used by the user, then it cannot be used as an internal name.
2) We introduce an AS clause in Zebra's syntax for column group names. If it occurs, it has to immediately follow [ ]. For example: [a1, a2] as PI secure by user:joe group:secure perm:640; [a3, a4] as General compress by lzo. Note that the keyword AS is case-insensitive.
3) Column group names are unique within one table and are case-sensitive, i.e., c1 and C1 are different.
4) Column group names will be used as the physical column group directory path names.
5) Zebra V2 will support dropColumnGroup by column group names (will integrate with Raghu's A29 drop column work).
6) Zebra V2 can support backward compatibility (if there are Zebra V1 created tables in production when V2 is released). More specifically, this means that Zebra V2 can load from V1-created tables and do dropColumnGroup on them.
7) Renaming is NOT supported.
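Rules 1) and 3) of the specification above can be sketched as a small Java helper. The class and method names are hypothetical, and representing an unnamed column group as null is an assumption made for this example; it is not taken from the patch.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: fills in default names CG0, CG1, ... for unnamed column groups. */
final class ColumnGroupNamer {
    /**
     * @param userNames one entry per column group; null means the user gave no name
     * @return names in the same order, with defaults substituted for the nulls
     */
    static List<String> assignNames(List<String> userNames) {
        // Names are case-sensitive, so a plain HashSet treats c1 and C1 as different.
        Set<String> used = new HashSet<>();
        for (String name : userNames) {
            if (name != null && !used.add(name)) {
                throw new IllegalArgumentException("Duplicate column group name: " + name);
            }
        }
        List<String> result = new ArrayList<>();
        int next = 0;
        for (String name : userNames) {
            if (name != null) {
                result.add(name);
                continue;
            }
            // Skip any CGx the user already took, as required by the note in rule 1).
            String candidate;
            do {
                candidate = "CG" + next++;
            } while (!used.add(candidate));
            result.add(candidate);
        }
        return result;
    }
}
```

For instance, with the groups named PI, unnamed, CG0 and unnamed, this sketch keeps PI and CG0 and assigns CG1 and CG2 to the two unnamed groups.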
[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763781#action_12763781 ] Yan Zhou commented on PIG-987: -- Removed the hardcoded group name from a few test scripts. This patch and the ones in PIG-991 and PIG-986 are ready to be committed, but please hold off on committing PIG-992 and later patches. [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by the corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: when a user selects only columns that they do not have permission to access, Zebra should return an error with the message "Error #: Permission denied for accessing column <column name or names>". Access control applies to an entire column group, so all columns in a column group have the same permissions.
[jira] Updated: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-894: --- Status: Open (was: Patch Available) order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Reporter: Thejas M Nair Assignee: Daniel Dai Attachments: PIG-894-1.patch, PIG-894-2.patch grunt l = load 'students.txt' ; grunt f = filter l by 1 == 2; grunt o = order f by $0 ; grunt dump o; This results in 3 MR jobs . The 2nd (sampling) MR creates empty sample file, and 3rd MR (order-by) fails with following error in Map job - java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:348) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) Caused by: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89) ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-894: --- Fix Version/s: 0.6.0 Affects Version/s: 0.4.0 Status: Patch Available (was: Open) order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-894-1.patch, PIG-894-2.patch grunt l = load 'students.txt' ; grunt f = filter l by 1 == 2; grunt o = order f by $0 ; grunt dump o; This results in 3 MR jobs . The 2nd (sampling) MR creates empty sample file, and 3rd MR (order-by) fails with following error in Map job - java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:348) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) Caused by: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89) ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-894: --- Attachment: PIG-894-2.patch Fixes the issue Pradeep found. order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-894-1.patch, PIG-894-2.patch
grunt> l = load 'students.txt';
grunt> f = filter l by 1 == 2;
grunt> o = order f by $0;
grunt> dump o;
This results in 3 MR jobs. The 2nd (sampling) MR job creates an empty sample file, and the 3rd MR job (order-by) fails with the following error in the map task:
java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82)
    at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.<init>(MapTask.java:348)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207)
Caused by: java.lang.RuntimeException: Empty samples file
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89)
    ... 5 more
[jira] Commented: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763786#action_12763786 ] Hadoop QA commented on PIG-894: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421681/PIG-894-1.patch against trunk revision 823257. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/16/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/16/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/16/console This message is automatically generated. order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-894-1.patch, PIG-894-2.patch grunt l = load 'students.txt' ; grunt f = filter l by 1 == 2; grunt o = order f by $0 ; grunt dump o; This results in 3 MR jobs . 
The 2nd (sampling) MR creates empty sample file, and 3rd MR (order-by) fails with following error in Map job - java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:348) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) Caused by: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89) ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-995) Limit Optimizer throw exception ERROR 2156: Error while fixing projections
[ https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763797#action_12763797 ] Hadoop QA commented on PIG-995: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421671/PIG-995-2.patch against trunk revision 823257. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/67/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/67/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/67/console This message is automatically generated. Limit Optimizer throw exception ERROR 2156: Error while fixing projections Key: PIG-995 URL: https://issues.apache.org/jira/browse/PIG-995 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.3.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-995-1.patch, PIG-995-2.patch The following script fail: A = load '1.txt' AS (a0, a1, a2); B = order A by a1; C = limit B 10; D = foreach C generate $0; dump D; Error log: Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2156: Error while fixing projections. Projection map of node to be replaced is null. 
at org.apache.pig.impl.logicalLayer.ProjectFixerUpper.visit(ProjectFixerUpper.java:138) at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:408) at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58) at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65) at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50) at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51) at org.apache.pig.impl.logicalLayer.LOForEach.rewire(LOForEach.java:761) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763836#action_12763836 ] Raghu Angadi commented on PIG-987: -- Thanks Yan. It might be better to remove gauravj also since it is ignored anyway. This implies column access control is not tested in this patch, right? [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by the corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: when a user selects only columns that they do not have permission to access, Zebra should return an error with the message "Error #: Permission denied for accessing column <column name or names>". Access control applies to an entire column group, so all columns in a column group have the same permissions.
[jira] Commented: (PIG-894) order-by fails when input is empty
[ https://issues.apache.org/jira/browse/PIG-894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12763844#action_12763844 ] Hadoop QA commented on PIG-894: --- +1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12421694/PIG-894-2.patch against trunk revision 823257. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 3 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/17/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/17/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/17/console This message is automatically generated. order-by fails when input is empty -- Key: PIG-894 URL: https://issues.apache.org/jira/browse/PIG-894 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Thejas M Nair Assignee: Daniel Dai Fix For: 0.6.0 Attachments: PIG-894-1.patch, PIG-894-2.patch grunt l = load 'students.txt' ; grunt f = filter l by 1 == 2; grunt o = order f by $0 ; grunt dump o; This results in 3 MR jobs . 
The 2nd (sampling) MR creates empty sample file, and 3rd MR (order-by) fails with following error in Map job - java.lang.RuntimeException: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:104) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:58) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:82) at org.apache.hadoop.mapred.MapTask$MapOutputBuffer.(MapTask.java:348) at org.apache.hadoop.mapred.MapTask.run(MapTask.java:193) at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2207) Caused by: java.lang.RuntimeException: Empty samples file at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:89) ... 5 more -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section
[ https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-991: - Attachment: Bugs-2.patch I am committing a slightly modified patch. I removed the following lines that modified build.xml at the top level. Please ask one of the PIG committers to commit that change. The part that is removed:
{noformat}
@@ -940,4 +942,13 @@
 <target name="published" depends="ivy-publish-local, maven-artifacts"/>
 
+<target name="pig-test">
+<jar
+ jarfile="${build.dir}/pig-test-${version}.jar"
+ basedir="${build.dir}/test/classes"
+ excludes="**/Test*.class"
+ >
+</jar>
+</target>
+
 </project>
{noformat}
[zebra] A few minor bugs as described in the Description section Key: PIG-991 URL: https://issues.apache.org/jira/browse/PIG-991 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0 Attachments: Bugs-2.patch, Bugs.patch 1) lzo2 was used as the compressor name for the LZO compression algorithm; it should be lzo instead; 2) the default compression is changed from lzo to gz for gzip; 3) In JAVACC file SchemaParser.jjt, the package name was wrong using the old package org.apache.pig.table.types; 4) in build.xml, two new javacc targets are added to generate TableSchemaParser and TableStorageParser java codes; 5) Support of column group security ( https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the dumpinfo method: the groups and permissions were not displayed. Note that as a consequence, the patch herein must be applied after that of JIRA987. 6) and 7) a couple of issues reported in Jira917. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-987) [zebra] Zebra Column Group Access Control
[ https://issues.apache.org/jira/browse/PIG-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-987: - Resolution: Fixed Fix Version/s: 0.6.0 Status: Resolved (was: Patch Available) I just committed this. Thanks Yan! [zebra] Zebra Column Group Access Control - Key: PIG-987 URL: https://issues.apache.org/jira/browse/PIG-987 Project: Pig Issue Type: New Feature Affects Versions: 0.6.0 Reporter: Yan Zhou Assignee: Yan Zhou Fix For: 0.6.0 Attachments: ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, ColumnGroupSecurity.patch, TEST-org.apache.hadoop.zebra.io.TestCheckin.txt, TEST-org.apache.hadoop.zebra.mapred.TestCheckin.txt, tmp-987-plus-991.patch Access Control: when processes try to read from the column groups, Zebra should be able to handle allowed vs. disallowed user/application accesses. The security is eventually granted by the corresponding HDFS security of the data stored. Expected behavior when column group permissions are set: when a user selects only columns that they do not have permission to access, Zebra should return an error with the message "Error #: Permission denied for accessing column <column name or names>". Access control applies to an entire column group, so all columns in a column group have the same permissions.
[jira] Updated: (PIG-991) [zebra] A few minor bugs as described in the Description section
[ https://issues.apache.org/jira/browse/PIG-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi updated PIG-991: - Resolution: Fixed Status: Resolved (was: Patch Available) I just committed this. Thanks Yan. [zebra] A few minor bugs as described in the Description section Key: PIG-991 URL: https://issues.apache.org/jira/browse/PIG-991 Project: Pig Issue Type: Bug Affects Versions: 0.4.0 Reporter: Yan Zhou Assignee: Yan Zhou Priority: Minor Fix For: 0.6.0 Attachments: Bugs-2.patch, Bugs.patch 1) lzo2 was used as the compressor name for the LZO compression algorithm; it should be lzo instead; 2) the default compression is changed from lzo to gz for gzip; 3) In JAVACC file SchemaParser.jjt, the package name was wrong using the old package org.apache.pig.table.types; 4) in build.xml, two new javacc targets are added to generate TableSchemaParser and TableStorageParser java codes; 5) Support of column group security ( https://issues.apache.org/jira/browse/PIG-987 ) lacked support of the dumpinfo method: the groups and permissions were not displayed. Note that as a consequence, the patch herein must be applied after that of JIRA987. 6) and 7) a couple of issues reported in Jira917. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.