[jira] Updated: (PIG-513) PERFORMANCE: optimize some of the code in DefaultTuple
[ https://issues.apache.org/jira/browse/PIG-513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashutosh Chauhan updated PIG-513:
    Status: Patch Available (was: Reopened)

PERFORMANCE: optimize some of the code in DefaultTuple
    Key: PIG-513
    URL: https://issues.apache.org/jira/browse/PIG-513
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.2.0
    Reporter: Pradeep Kamath
    Assignee: Pradeep Kamath
    Attachments: PIG-513.patch, pig-513_2.patch

The following areas in DefaultTuple.java can be changed:

The member methods get(), set(), getType() and isNull() all call checkBounds(), which is a redundant call since all four of these methods throw ExecException. Instead of doing a bounds check, we can catch the IndexOutOfBoundsException in a try-catch and rethrow it as an ExecException.

The write() method has the following unused object (d in the code below):
{code}
for (int i = 0; i < sz; i++) {
    try {
        Object d = get(i);
    } catch (ExecException ee) {
        throw new RuntimeException(ee);
    }
    DataReaderWriter.writeDatum(out, mFields.get(i));
}
{code}
{noformat}
The get(i) call in the try should be replaced by the writeDatum call directly, since d is never used and there is an unnecessary call to get().
{noformat}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
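The first suggestion in PIG-513 (drop the explicit bounds check and translate the list's own exception) can be sketched as below. This is only an illustration under assumptions: SketchTuple and SketchExecException are hypothetical stand-ins, not Pig's DefaultTuple or ExecException, and the error message text is made up.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a checked exception such as Pig's ExecException.
class SketchExecException extends Exception {
    SketchExecException(String message, Throwable cause) {
        super(message, cause);
    }
}

// Hypothetical, simplified tuple backed by a list; not the real DefaultTuple.
class SketchTuple {
    private final List<Object> mFields = new ArrayList<>();

    void append(Object value) {
        mFields.add(value);
    }

    // No separate checkBounds() call: ArrayList.get already validates the
    // index, so we only translate its exception into the checked one.
    Object get(int fieldNum) throws SketchExecException {
        try {
            return mFields.get(fieldNum);
        } catch (IndexOutOfBoundsException e) {
            throw new SketchExecException(
                "Field " + fieldNum + " is out of range for a tuple of size " + mFields.size(), e);
        }
    }
}
```

On the common path (a valid index) this performs one range check instead of two; the cost of building the exception is only paid when the index is actually wrong.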
[jira] Updated: (PIG-882) log level not propagated to loggers
[ https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-882:
    Attachment: PIG-882-4.patch

Sync with latest trunk.

log level not propagated to loggers
    Key: PIG-882
    URL: https://issues.apache.org/jira/browse/PIG-882
    Project: Pig
    Issue Type: Bug
    Components: impl
    Reporter: Thejas M Nair
    Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, PIG-882-4.patch

Pig accepts a log level as a parameter, but the level it captures is not applied to the loggers in the different classes, so they do not log at the specified level.
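To illustrate what "propagating" a level means, here is a minimal sketch using java.util.logging from the JDK. This is an assumption made for the sake of a self-contained example: the message does not say which logging library the patch touches, and LogLevelSketch and applyLevel are hypothetical names.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper: set one level in one place so per-class loggers pick it up.
class LogLevelSketch {
    static void applyLevel(Level level) {
        // The root logger is the ancestor of every named logger. Loggers that
        // have no level of their own inherit the effective level from it.
        Logger root = Logger.getLogger("");
        root.setLevel(level);
        // Handlers filter independently of loggers, so they need the level too.
        for (Handler handler : root.getHandlers()) {
            handler.setLevel(level);
        }
    }
}
```

After LogLevelSketch.applyLevel(Level.FINE), a logger obtained with Logger.getLogger("some.Class") reports isLoggable(Level.FINE) as true, provided nothing has set an explicit level on that logger or one of its named parents.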
[jira] Updated: (PIG-885) New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)
[ https://issues.apache.org/jira/browse/PIG-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-885:
    Attachment: PIG-885-7.patch

Add null checking to all applicable UDFs.

New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)
    Key: PIG-885
    URL: https://issues.apache.org/jira/browse/PIG-885
    Project: Pig
    Issue Type: New Feature
    Affects Versions: 0.3.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Priority: Minor
    Fix For: 0.4.0
    Attachments: PIG-885-2.patch, PIG-885-3.patch, PIG-885-4.patch, PIG-885-5.patch, PIG-885-6.patch, PIG-885-7.patch, PIG-885.patch

Bunch of UDFs:
1. Bin -- Converts a continuous value into discrete values
2. Decode -- Converts a given attribute or expression into another string value, based on the value of the source attribute
3. LookupInFiles -- Checks for the existence of an expression in a series of text files
4. RegexExtract and RegexMatch -- Similar to Perl regexes
5. HashFNV -- An implementation of the FNV hash
6. DiffDate -- Calculates the number of days in between
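For readers unfamiliar with item 5, the following is a minimal sketch of the 32-bit FNV-1a variant. The message does not say which FNV variant or width the HashFNV UDF uses, so the variant, the class name Fnv1aSketch and the UTF-8 encoding are all assumptions; this is not the piggybank code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of 32-bit FNV-1a; not the piggybank HashFNV UDF.
class Fnv1aSketch {
    private static final int OFFSET_BASIS = 0x811C9DC5; // standard 32-bit FNV offset basis
    private static final int PRIME = 0x01000193;        // standard 32-bit FNV prime

    static int hash32(byte[] data) {
        int hash = OFFSET_BASIS;
        for (byte b : data) {
            hash ^= (b & 0xFF); // FNV-1a: xor the byte in first...
            hash *= PRIME;      // ...then multiply (int overflow wraps mod 2^32)
        }
        return hash;
    }

    static int hash32(String text) {
        return hash32(text.getBytes(StandardCharsets.UTF_8));
    }
}
```

The empty input hashes to the offset basis, and Fnv1aSketch.hash32("a") gives 0xE40C292C, which matches the published FNV-1a test vector for that input.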
[jira] Updated: (PIG-792) PERFORMANCE: Support skewed join in pig
[ https://issues.apache.org/jira/browse/PIG-792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-792:
    Resolution: Fixed
    Status: Resolved (was: Patch Available)

The code has been committed. Thanks, Sri and Ying, for this important contribution.

PERFORMANCE: Support skewed join in pig
    Key: PIG-792
    URL: https://issues.apache.org/jira/browse/PIG-792
    Project: Pig
    Issue Type: Improvement
    Reporter: Sriranjan Manjunath
    Attachments: skewedjoin.patch

Fragmented replicated join has a few limitations:
 - One of the tables needs to be loaded into memory
 - Join is limited to two tables

Skewed join partitions the table and joins the records in the reduce phase. It computes a histogram of the key space to account for skewing in the input records. Further, it adjusts the number of reducers depending on the key distribution.

We need to implement the skewed join in pig.
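The histogram idea in the description can be sketched roughly as follows. This is a toy under stated assumptions, not the committed patch: SkewSketch, histogram, reducersPerKey and the maxRecordsPerReducer threshold are hypothetical, and the message does not describe how Pig actually samples keys or assigns reducers.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration: count keys, then give heavy keys more reducers.
class SkewSketch {
    // Build a histogram (key -> number of occurrences) from sampled join keys.
    static Map<String, Integer> histogram(List<String> sampledKeys) {
        Map<String, Integer> counts = new HashMap<>();
        for (String key : sampledKeys) {
            counts.merge(key, 1, Integer::sum);
        }
        return counts;
    }

    // Spread each key over ceil(count / maxRecordsPerReducer) reducers,
    // so a heavily skewed key no longer lands on a single reducer.
    static Map<String, Integer> reducersPerKey(Map<String, Integer> hist, int maxRecordsPerReducer) {
        Map<String, Integer> reducers = new HashMap<>();
        for (Map.Entry<String, Integer> entry : hist.entrySet()) {
            int needed = (entry.getValue() + maxRecordsPerReducer - 1) / maxRecordsPerReducer;
            reducers.put(entry.getKey(), needed);
        }
        return reducers;
    }
}
```

With sampled keys a, a, a, a, a, b and a threshold of 2 records per reducer, key a is spread over 3 reducers while key b stays on 1.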
[jira] Updated: (PIG-892) Make COUNT and AVG deal with nulls according to the SQL standard
[ https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-892:
    Attachment: PIG-892_v3.patch

Patch that addresses the comments from Santhosh.

Make COUNT and AVG deal with nulls according to the SQL standard
    Key: PIG-892
    URL: https://issues.apache.org/jira/browse/PIG-892
    Project: Pig
    Issue Type: Improvement
    Affects Versions: 0.3.0
    Reporter: Olga Natkovich
    Assignee: Olga Natkovich
    Fix For: 0.4.0
    Attachments: PIG-892.patch, PIG-892_v2.patch, PIG-892_v3.patch

Both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match COUNT(*) in SQL.
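The requested semantics can be shown with a small plain-Java sketch. It is only meant to pin down the behaviour and is not the patch: NullAwareAggSketch and its method names are hypothetical, and returning null for an all-null AVG follows SQL rather than anything stated in this message.

```java
import java.util.List;

// Hypothetical illustration of SQL-style null handling in aggregates.
class NullAwareAggSketch {
    // COUNT(col): counts only the non-null values.
    static long count(List<Double> column) {
        long n = 0;
        for (Double value : column) {
            if (value != null) {
                n++;
            }
        }
        return n;
    }

    // COUNT(*) / COUNT_STAR: counts every row, null or not.
    static long countStar(List<Double> column) {
        return column.size();
    }

    // AVG(col): nulls contribute to neither the sum nor the divisor.
    // Returns null when there is no non-null value (assumed, as in SQL).
    static Double avg(List<Double> column) {
        double sum = 0.0;
        long n = 0;
        for (Double value : column) {
            if (value != null) {
                sum += value;
                n++;
            }
        }
        return n == 0 ? null : sum / n;
    }
}
```

For the column (1.0, null, 3.0), count gives 2, countStar gives 3, and avg gives 2.0 rather than 4.0 / 3.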
[jira] Updated: (PIG-885) New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)
[ https://issues.apache.org/jira/browse/PIG-885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-885:
    Attachment: PIG-885-8.patch

Add NullPointerException check.

New UDFs for piggybank (Bin, Decode, LookupInFiles, RegexExtract, RegexMatch, HashFNV, DiffDate)
    Key: PIG-885
    URL: https://issues.apache.org/jira/browse/PIG-885
    Project: Pig
    Issue Type: New Feature
    Affects Versions: 0.3.0
    Reporter: Daniel Dai
    Assignee: Daniel Dai
    Priority: Minor
    Fix For: 0.4.0
    Attachments: PIG-885-2.patch, PIG-885-3.patch, PIG-885-4.patch, PIG-885-5.patch, PIG-885-6.patch, PIG-885-7.patch, PIG-885-8.patch, PIG-885.patch

Bunch of UDFs:
1. Bin -- Converts a continuous value into discrete values
2. Decode -- Converts a given attribute or expression into another string value, based on the value of the source attribute
3. LookupInFiles -- Checks for the existence of an expression in a series of text files
4. RegexExtract and RegexMatch -- Similar to Perl regexes
5. HashFNV -- An implementation of the FNV hash
6. DiffDate -- Calculates the number of days in between
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736964#action_12736964 ]

Jeff Hammerbacher commented on PIG-833:

Hey Raghu,

Good stuff! Do you guys have any internal benchmarks that you could add to the docs on design and usage?

Thanks,
Jeff

Storage access layer
    Key: PIG-833
    URL: https://issues.apache.org/jira/browse/PIG-833
    Project: Pig
    Issue Type: New Feature
    Reporter: Jay Tang
    Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz

A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on.
[jira] Commented: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func
[ https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736990#action_12736990 ]

Santhosh Srinivasan commented on PIG-889:

PigHadoopLogger implements the PigLogger interface. As part of the implementation it uses the Hadoop reporter for aggregating the warning messages.

Pig can not access reporter of PigHadoopLog in Load Func
    Key: PIG-889
    URL: https://issues.apache.org/jira/browse/PIG-889
    Project: Pig
    Issue Type: Improvement
    Components: impl
    Affects Versions: 0.4.0
    Reporter: Jeff Zhang
    Assignee: Jeff Zhang
    Fix For: 0.4.0
    Attachments: Pig_889_Patch.txt

I'd like to increment a Counter in my own LoadFunc, but doing so throws a NullPointerException. It seems that the reporter is not initialized. I looked into this problem and found that PigInputFormat needs to call PigHadoopLogger.getInstance().setReporter(reporter).
[jira] Created: (PIG-897) Pig should support counters
Pig should support counters
    Key: PIG-897
    URL: https://issues.apache.org/jira/browse/PIG-897
    Project: Pig
    Issue Type: New Feature
    Components: impl
    Affects Versions: 0.4.0
    Reporter: Santhosh Srinivasan
    Fix For: 0.4.0

Pig should support the use of counters. Counters could be made available either through the script or through Java APIs.
[jira] Assigned: (PIG-880) Order by is broken with complex fields
[ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan reassigned PIG-880:
    Assignee: Santhosh Srinivasan

Order by is broken with complex fields
    Key: PIG-880
    URL: https://issues.apache.org/jira/browse/PIG-880
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.3.0
    Reporter: Olga Natkovich
    Assignee: Santhosh Srinivasan
    Fix For: 0.4.0
    Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch

Pig script:

a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
f = foreach a generate smap#'name', smap#'age', smap#'gpa';
s = order f by $0;
store s into 'sc.out';

Stack:

Caused by: java.lang.ArrayStoreException
    at java.lang.System.arraycopy(Native Method)
    at java.util.Arrays.copyOf(Arrays.java:2763)
    at java.util.ArrayList.toArray(ArrayList.java:305)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
    ... 5 more
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
    at org.apache.pig.PigServer.execute(PigServer.java:762)
    at org.apache.pig.PigServer.access$100(PigServer.java:91)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
    at org.apache.pig.Main.main(Main.java:389)
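The top of the trace (ArrayList.toArray, then Arrays.copyOf, then System.arraycopy) is the path the JDK takes when a list is copied into a typed array whose component type cannot hold one of the elements. The sketch below reproduces that generic JDK behaviour only; ArrayStoreSketch is a hypothetical name, and what WeightedRangePartitioner.convertToArray actually stores is not shown in this message.

```java
import java.util.List;

// Hypothetical illustration of how ArrayList.toArray can raise ArrayStoreException.
class ArrayStoreSketch {
    // Tries to copy the list into a String[]; reports whether the copy failed.
    static boolean toArrayFails(List<Object> values) {
        try {
            // The target array is String[], so every element must be a String.
            values.toArray(new String[0]);
            return false;
        } catch (ArrayStoreException e) {
            // Raised by the array copy when an element has an incompatible type.
            return true;
        }
    }
}
```

A list holding only strings copies cleanly, while a list that mixes in an Integer fails, which is consistent with the sort key values in the script above not all having the type the partitioner expects.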
[jira] Updated: (PIG-880) Order by is broken with complex fields
[ https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Santhosh Srinivasan updated PIG-880:
    Status: Patch Available (was: Open)

Order by is broken with complex fields
    Key: PIG-880
    URL: https://issues.apache.org/jira/browse/PIG-880
    Project: Pig
    Issue Type: Bug
    Affects Versions: 0.3.0
    Reporter: Olga Natkovich
    Assignee: Santhosh Srinivasan
    Fix For: 0.4.0
    Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, PIG-880.patch

Pig script:

a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
f = foreach a generate smap#'name', smap#'age', smap#'gpa';
s = order f by $0;
store s into 'sc.out';

Stack:

Caused by: java.lang.ArrayStoreException
    at java.lang.System.arraycopy(Native Method)
    at java.util.Arrays.copyOf(Arrays.java:2763)
    at java.util.ArrayList.toArray(ArrayList.java:305)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
    ... 5 more
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
    at org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
    at org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
    at org.apache.pig.PigServer.execute(PigServer.java:762)
    at org.apache.pig.PigServer.access$100(PigServer.java:91)
    at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
    at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
    at org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
    at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
    at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
    at org.apache.pig.Main.main(Main.java:389)
[jira] Commented: (PIG-833) Storage access layer
[ https://issues.apache.org/jira/browse/PIG-833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736998#action_12736998 ]

Raghu Angadi commented on PIG-833:

There will be benchmark results attached either to this jira or to a subsequent jira. I would like to compare against SequenceFiles and the new format in Hive. We should see on-par performance. The major performance benefits come from commonly used projections (through column groups) and map-side joins of sorted tables. An important part of the motivation is features such as column security and the ability to delete entire columns.

We are running some larger scale benchmarks internally, but these run on Yahoo's internal data sources.

Storage access layer
    Key: PIG-833
    URL: https://issues.apache.org/jira/browse/PIG-833
    Project: Pig
    Issue Type: New Feature
    Reporter: Jay Tang
    Attachments: hadoop20.jar.bz2, PIG-833-zebra.patch, zebra-javadoc.tgz

A layer is needed to provide a high level data access abstraction and a tabular view of data in Hadoop, and could free Pig users from implementing their own data storage/retrieval code. This layer should also include a columnar storage format in order to provide fast data projection, CPU/space-efficient data serialization, and a schema language to manage physical storage metadata. Eventually it could also support predicate pushdown for further performance improvement. Initially, this layer could be a contrib project in Pig and become a hadoop subproject later on.
[jira] Commented: (PIG-882) log level not propagated to loggers
[ https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737006#action_12737006 ]

Hadoop QA commented on PIG-882:

-1 overall. Here are the results of testing the latest attachment
  http://issues.apache.org/jira/secure/attachment/12414928/PIG-882-4.patch
  against trunk revision 799141.

    +1 @author. The patch does not contain any @author tags.

    -1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch.

    +1 javadoc. The javadoc tool did not generate any warning messages.

    +1 javac. The applied patch does not increase the total number of javac compiler warnings.

    -1 findbugs. The patch appears to cause Findbugs to fail.

    +1 release audit. The applied patch does not increase the total number of release audit warnings.

    -1 core tests. The patch failed core unit tests.

    +1 contrib tests. The patch passed contrib unit tests.

Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/145/testReport/
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/145/console

This message is automatically generated.

log level not propagated to loggers
    Key: PIG-882
    URL: https://issues.apache.org/jira/browse/PIG-882
    Project: Pig
    Issue Type: Bug
    Components: impl
    Reporter: Thejas M Nair
    Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, PIG-882-4.patch

Pig accepts a log level as a parameter, but the level it captures is not applied to the loggers in the different classes, so they do not log at the specified level.
[jira] Updated: (PIG-882) log level not propagated to loggers
[ https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-882:
    Attachment: PIG-882-5.patch

log level not propagated to loggers
    Key: PIG-882
    URL: https://issues.apache.org/jira/browse/PIG-882
    Project: Pig
    Issue Type: Bug
    Components: impl
    Reporter: Thejas M Nair
    Attachments: PIG-882-1.patch, PIG-882-2.patch, PIG-882-3.patch, PIG-882-4.patch, PIG-882-5.patch

Pig accepts a log level as a parameter, but the level it captures is not applied to the loggers in the different classes, so they do not log at the specified level.
zookeeper patch builds
Looks like the Hudson space issue is resolved; I've restarted the ZooKeeper patch build jobs.

-Giri