[jira] Commented: (PIG-1602) The .classpath of eclipse template still use hbase-0.20.0
[ https://issues.apache.org/jira/browse/PIG-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906451#action_12906451 ]

Dmitriy V. Ryaboy commented on PIG-1602:

+1

The .classpath of eclipse template still use hbase-0.20.0

Key: PIG-1602
URL: https://issues.apache.org/jira/browse/PIG-1602
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
Fix For: 0.8.0
Attachments: PIG_1602.patch

The .classpath of the Eclipse template still uses hbase-0.20.0; it should be updated to hbase-0.20.6.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1602) The .classpath of eclipse template still use hbase-0.20.0
[ https://issues.apache.org/jira/browse/PIG-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906467#action_12906467 ]

Jeff Zhang commented on PIG-1602:

Patch committed to both trunk and branch-0.8.

The .classpath of eclipse template still use hbase-0.20.0

Key: PIG-1602
URL: https://issues.apache.org/jira/browse/PIG-1602
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
Fix For: 0.8.0
Attachments: PIG_1602.patch

The .classpath of the Eclipse template still uses hbase-0.20.0; it should be updated to hbase-0.20.6.
[jira] Resolved: (PIG-1602) The .classpath of eclipse template still use hbase-0.20.0
[ https://issues.apache.org/jira/browse/PIG-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeff Zhang resolved PIG-1602.

Resolution: Fixed

The .classpath of eclipse template still use hbase-0.20.0

Key: PIG-1602
URL: https://issues.apache.org/jira/browse/PIG-1602
Project: Pig
Issue Type: Bug
Affects Versions: 0.8.0
Reporter: Jeff Zhang
Assignee: Jeff Zhang
Priority: Minor
Fix For: 0.8.0
Attachments: PIG_1602.patch

The .classpath of the Eclipse template still uses hbase-0.20.0; it should be updated to hbase-0.20.6.
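For illustration only: the change described in PIG-1602 amounts to replacing the HBase version string in the Eclipse .classpath template. The sketch below is not taken from PIG_1602.patch. The XML fragment, its entry paths, and the helper name `bump_hbase` are assumptions made up for the example; only the two version strings come from the issue.

```python
import xml.etree.ElementTree as ET

# Hypothetical .classpath fragment. The real template's entries and
# paths may differ; only the version strings come from PIG-1602.
TEMPLATE = """<classpath>
  <classpathentry kind="lib" path="lib/hbase-0.20.0.jar"/>
  <classpathentry kind="lib" path="lib/hbase-0.20.0-test.jar"/>
  <classpathentry kind="src" path="src"/>
</classpath>"""


def bump_hbase(xml_text, old="hbase-0.20.0", new="hbase-0.20.6"):
    """Return the XML with `old` replaced by `new` in every lib entry path."""
    root = ET.fromstring(xml_text)
    for entry in root.iter("classpathentry"):
        path = entry.get("path", "")
        # Only library entries are touched; source entries stay as they are.
        if entry.get("kind") == "lib" and old in path:
            entry.set("path", path.replace(old, new))
    return ET.tostring(root, encoding="unicode")


updated = bump_hbase(TEMPLATE)
print(updated)
```

Parsing the file as XML rather than doing a blind text substitution keeps the rewrite limited to `lib` entries, which is one possible design choice; a plain search-and-replace would work just as well for a file this small.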
[jira] Commented: (PIG-1178) LogicalPlan and Optimizer are too complex and hard to work with
[ https://issues.apache.org/jira/browse/PIG-1178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906592#action_12906592 ]

Daniel Dai commented on PIG-1178:

Patch PIG-1178-10.patch committed.

LogicalPlan and Optimizer are too complex and hard to work with

Key: PIG-1178
URL: https://issues.apache.org/jira/browse/PIG-1178
Project: Pig
Issue Type: Improvement
Reporter: Alan Gates
Assignee: Daniel Dai
Fix For: 0.8.0
Attachments: expressions-2.patch, expressions.patch, lp.patch, lp.patch, PIG-1178-10.patch, PIG-1178-4.patch, PIG-1178-5.patch, PIG-1178-6.patch, PIG-1178-7.patch, PIG-1178-8.patch, PIG-1178-9.patch, pig_1178.patch, pig_1178.patch, PIG_1178.patch, pig_1178_2.patch, pig_1178_3.2.patch, pig_1178_3.3.patch, pig_1178_3.4.patch, pig_1178_3.patch

The current implementation of the logical plan and the logical optimizer in Pig has proven to not be easily extensible. Developer feedback has indicated that adding new rules to the optimizer is quite burdensome. In addition, the logical plan has been an area of numerous bugs, many of which have been difficult to fix. Developers also feel that the logical plan is difficult to understand and maintain. The root cause for these issues is that a number of design decisions that were made as part of the 0.2 rewrite of the front end have now proven to be sub-optimal. The heart of this proposal is to revisit a number of those proposals and rebuild the logical plan with a simpler design that will make it much easier to maintain the logical plan as well as extend the logical optimizer.

See http://wiki.apache.org/pig/PigLogicalPlanOptimizerRewrite for full details.
[jira] Resolved: (PIG-1594) NullPointerException in new logical planner
[ https://issues.apache.org/jira/browse/PIG-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai resolved PIG-1594.

Resolution: Fixed

This issue is fixed by PIG-1178-10.patch.

NullPointerException in new logical planner

Key: PIG-1594
URL: https://issues.apache.org/jira/browse/PIG-1594
Project: Pig
Issue Type: Bug
Reporter: Andrew Hitchcock
Assignee: Daniel Dai
Fix For: 0.8.0

I've been testing the trunk version of Pig on Elastic MapReduce against our log processing sample application(1). When I try to run the query it throws a NullPointerException and suggests I disable the new logical plan. Disabling it works and the script succeeds. Here is the query I'm trying to run:

{code}
register file:/home/hadoop/lib/pig/piggybank.jar
DEFINE EXTRACT org.apache.pig.piggybank.evaluation.string.EXTRACT();
RAW_LOGS = LOAD '$INPUT' USING TextLoader as (line:chararray);
LOGS_BASE= foreach RAW_LOGS generate FLATTEN(EXTRACT(line, '^(\\S+) (\\S+) (\\S+) \\[([\\w:/]+\\s[+\\-]\\d{4})\\] (.+?) (\\S+) (\\S+) ([^]*) ([^]*)')) as (remoteAddr:chararray, remoteLogname:chararray, user:chararray, time:chararray, request:chararray, status:int, bytes_string:chararray, referrer:chararray, browser:chararray);
REFERRER_ONLY = FOREACH LOGS_BASE GENERATE referrer;
FILTERED = FILTER REFERRER_ONLY BY referrer matches '.*bing.*' OR referrer matches '.*google.*';
SEARCH_TERMS = FOREACH FILTERED GENERATE FLATTEN(EXTRACT(referrer, '.*[\\?]q=([^]+).*')) as terms:chararray;
SEARCH_TERMS_FILTERED = FILTER SEARCH_TERMS BY NOT $0 IS NULL;
SEARCH_TERMS_COUNT = FOREACH (GROUP SEARCH_TERMS_FILTERED BY $0) GENERATE $0, COUNT($1) as num;
SEARCH_TERMS_COUNT_SORTED = LIMIT(ORDER SEARCH_TERMS_COUNT BY num DESC) 50;
STORE SEARCH_TERMS_COUNT_SORTED into '$OUTPUT';
{code}

And here is the stack trace that results:

{code}
ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
org.apache.pig.backend.executionengine.ExecException: ERROR 2042: Error in new logical plan. Try -Dpig.usenewlogicalplan=false.
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.compile(HExecutionEngine.java:285)
    at org.apache.pig.PigServer.compilePp(PigServer.java:1301)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:1154)
    at org.apache.pig.PigServer.execute(PigServer.java:1148)
    at org.apache.pig.PigServer.access$100(PigServer.java:123)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:1464)
    at org.apache.pig.PigServer.executeBatchEx(PigServer.java:350)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:324)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:111)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:90)
    at org.apache.pig.Main.run(Main.java:491)
    at org.apache.pig.Main.main(Main.java:107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
Caused by: java.lang.NullPointerException
    at org.apache.pig.EvalFunc.getSchemaName(EvalFunc.java:76)
    at org.apache.pig.piggybank.impl.ErrorCatchingBase.outputSchema(ErrorCatchingBase.java:76)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.getFieldSchema(UserFuncExpression.java:111)
    at org.apache.pig.newplan.logical.optimizer.FieldSchemaResetter.execute(SchemaResetter.java:175)
    at org.apache.pig.newplan.logical.expression.AllSameExpressionVisitor.visit(AllSameExpressionVisitor.java:143)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:55)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:69)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:50)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:87)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:149)
    at org.apache.pig.newplan.DependencyOrderWalker.walk(DependencyOrderWalker.java:74)
    at org.apache.pig.newplan.logical.optimizer.SchemaResetter.visit(SchemaResetter.java:76)
    at
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906671#action_12906671 ]

Jeff Zhang commented on PIG-794:

Dmitriy,

In my patch I turn InternalMap into an Avro array whose elements are records with two datums (one is the key and the other is the value). But it throws a weird exception, and I don't know what's wrong with my code:

{code}
Exception in thread "main" java.lang.NullPointerException
    at org.apache.avro.io.parsing.Parser.advance(Parser.java:86)
    at org.apache.avro.io.ResolvingDecoder.readFieldOrder(ResolvingDecoder.java:121)
    at org.apache.pig.impl.io.avro.PigDataRecordReader.readRecord(PigDataRecordReader.java:77)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:106)
    at org.apache.pig.impl.io.avro.PigDataRecordReader.readRecord(PigDataRecordReader.java:66)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:106)
    at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:184)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:108)
    at org.apache.pig.impl.io.avro.PigDataRecordReader.readRecord(PigDataRecordReader.java:81)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:106)
    at org.apache.avro.generic.GenericDatumReader.readArray(GenericDatumReader.java:184)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:108)
    at org.apache.pig.impl.io.avro.PigDataRecordReader.readRecord(PigDataRecordReader.java:83)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:106)
    at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:97)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:198)
    at org.apache.avro.file.DataFileStream.next(DataFileStream.java:185)
    at org.apache.pig.impl.io.avro.PigData.main(PigData.java:224)
{code}

Use Avro serialization in Pig

Key: PIG-794
URL: https://issues.apache.org/jira/browse/PIG-794
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
Assignee: Dmitriy V. Ryaboy
Attachments: avro-0.1-dev-java_r765402.jar, AvroStorage.patch, AvroStorage_2.patch, AvroStorage_3.patch, AvroStorage_4.patch, AvroTest.java, jackson-asl-0.9.4.jar, PIG-794.patch

We would like to use Avro serialization in Pig to pass data between MR jobs instead of the current BinStorage. Attached is an implementation of AvroBinStorage which performs significantly better compared to BinStorage on our benchmarks.
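As a rough illustration of the encoding described in the PIG-794 comment above (a map stored as an Avro array whose elements are key/value records), the sketch below builds such a schema as plain JSON and converts a dictionary to and from that shape. It is not the code from the patch. The record name `MapEntry`, the helper names, and the choice of a nullable string for the value type are assumptions for the example, and it uses only the Python standard library rather than an Avro binding.

```python
import json


def map_as_array_schema(value_schema):
    """Avro schema (as a JSON-compatible dict) for a map stored as an
    array of key/value records. 'MapEntry' is a hypothetical name."""
    entry = {
        "type": "record",
        "name": "MapEntry",
        "fields": [
            {"name": "key", "type": "string"},
            {"name": "value", "type": value_schema},
        ],
    }
    return {"type": "array", "items": entry}


def encode_map(mapping):
    """Turn a dict into the list-of-records shape the schema describes."""
    items = sorted(mapping.items(), key=lambda kv: kv[0])
    return [{"key": k, "value": v} for k, v in items]


def decode_map(entries):
    """Rebuild the dict from the list of key/value records."""
    return {e["key"]: e["value"] for e in entries}


# Assumed value type for illustration: a nullable string.
schema = map_as_array_schema(["null", "string"])
print(json.dumps(schema, indent=2))

encoded = encode_map({"b": None, "a": "1"})
assert decode_map(encoded) == {"a": "1", "b": None}
```

One thing this shape makes visible is that the same record type is reused for every array element, so the reader and writer must agree on the field order of `key` and `value`; whether that is related to the exception reported above is not something the thread establishes.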