[jira] Created: (PIG-805) removing dependency on consolidated hadoop.jar from pig.jar
removing dependency on consolidated hadoop.jar from pig.jar
-----------------------------------------------------------

Key: PIG-805
URL: https://issues.apache.org/jira/browse/PIG-805
Project: Pig
Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Giridharan Kesavan

The proposal is to
- for compilation, always use dependencies from Ivy
- for packaging, have 2 targets:
(1) The current jar target will not package hadoop.jar and will rely on having it available at runtime
(2) Add a new target jar_with_hadoop that does what the current jar target does
(3) Add a property that allows building the release tar with or without hadoop.jar embedded into pig.

--
This message is automatically generated by JIRA.
- You can reply to this email to add a comment to the issue online.
[jira] Reopened: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Olga Natkovich reopened PIG-794: I think it was closed by mistake. The final patch has not been reviewed or committed yet > Use Avro serialization in Pig > - > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.2.0 >Reporter: Rakesh Setty > Fix For: 0.2.0 > > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707576#action_12707576 ] Rakesh Setty commented on PIG-794: -- I had to make one important change to the Avro format in AvroStorage to get it working. The map keys were stored as String objects; I changed this so that both key and value can be Object instances. Please let me know if this is an issue. Thanks, Rakesh > Use Avro serialization in Pig > - > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.2.0 >Reporter: Rakesh Setty > Fix For: 0.2.0 > > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks.
[jira] Commented: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707570#action_12707570 ] Rakesh Setty commented on PIG-794: -- The new patch has unit tests. The comments are already in javadoc format. Please let me know if I have missed anything. > Use Avro serialization in Pig > - > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.2.0 >Reporter: Rakesh Setty > Fix For: 0.2.0 > > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks.
[jira] Updated: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Setty updated PIG-794: - Resolution: Fixed Fix Version/s: 0.2.0 Status: Resolved (was: Patch Available) > Use Avro serialization in Pig > - > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.2.0 >Reporter: Rakesh Setty > Fix For: 0.2.0 > > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Setty updated PIG-794: - Attachment: avro-0.1-dev-java.jar AvroStorage.patch Attaching the new patch along with the latest avro jar. > Use Avro serialization in Pig > - > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.2.0 >Reporter: Rakesh Setty > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Setty updated PIG-794: - Attachment: (was: avro-0.1-dev-java.jar) > Use Avro serialization in Pig > - > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.2.0 >Reporter: Rakesh Setty > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-794) Use Avro serialization in Pig
[ https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rakesh Setty updated PIG-794: - Attachment: (was: AvroStorage.patch) > Use Avro serialization in Pig > - > > Key: PIG-794 > URL: https://issues.apache.org/jira/browse/PIG-794 > Project: Pig > Issue Type: Improvement > Components: impl >Affects Versions: 0.2.0 >Reporter: Rakesh Setty > Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, > jackson-asl-0.9.4.jar > > > We would like to use Avro serialization in Pig to pass data between MR jobs > instead of the current BinStorage. Attached is an implementation of > AvroBinStorage which performs significantly better compared to BinStorage on > our benchmarks. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)
[ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-626:
---------------------------
    Attachment: PIG-626.patch

A version of the patch that deals with the findbugs and javac warnings.

> Statistics (records read by each mapper and reducer)
> ----------------------------------------------------
>
> Key: PIG-626
> URL: https://issues.apache.org/jira/browse/PIG-626
> Project: Pig
> Issue Type: New Feature
> Components: impl
> Affects Versions: 0.2.0
> Reporter: Shubham Chopra
> Assignee: Shubham Chopra
> Priority: Minor
> Fix For: 0.3.0
>
> Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, pigStats.patch, pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt
>
> This uses the counters framework that hadoop has. Initially, I am just
> interested in finding out the number of records read by each mapper/reducer
> particularly for the last job in any script. A sample code to access the
> statistics for the last job:
>
> String reducePlan = stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
> if(reducePlan == null) {
>     System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
> } else {
>     System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
> }
>
> The patch contains 7 test cases. These include tests PigStorage and
> BinStorage along with one for multiple MR jobs case.
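The lookup quoted in the issue description above can be tried out without a Pig installation. The following is only a sketch under stated assumptions: the class LastJobRecords, the method recordsWritten, the job ID and the counter values are all made up for illustration, and the nested map (job ID to statistic name to String value) is inferred from the way the quoted snippet chains get() calls, not taken from the patch itself. Only the three PIG_STATS_* keys come from the issue.

```java
import java.util.HashMap;
import java.util.Map;

public class LastJobRecords {

    // Assumption: the nested map mirrors what the issue's snippet reads via
    // stats.getPigStats(): job ID -> (statistic name -> value as a String).
    public static String recordsWritten(Map<String, Map<String, String>> pigStats,
                                        String lastJobId) {
        Map<String, String> jobStats = pigStats.get(lastJobId);
        if (jobStats == null) {
            return null; // no statistics recorded for this job ID
        }
        // Same branch as the quoted snippet: a missing reduce plan is treated
        // as a map-only job, so the map output counter is the one to report.
        boolean mapOnly = jobStats.get("PIG_STATS_REDUCE_PLAN") == null;
        return jobStats.get(mapOnly
                ? "PIG_STATS_MAP_OUTPUT_RECORDS"
                : "PIG_STATS_REDUCE_OUTPUT_RECORDS");
    }

    public static void main(String[] args) {
        Map<String, String> jobStats = new HashMap<>();
        jobStats.put("PIG_STATS_MAP_OUTPUT_RECORDS", "3"); // made-up value

        Map<String, Map<String, String>> pigStats = new HashMap<>();
        pigStats.put("job_0001", jobStats); // made-up job ID

        System.out.println("Records written : " + recordsWritten(pigStats, "job_0001"));
    }
}
```

Compared with the quoted snippet, the helper looks up the per-job map once and returns null for an unknown job ID instead of throwing a NullPointerException; whether the real API needs that guard is not stated in the issue.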
[jira] Reopened: (PIG-626) Statistics (records read by each mapper and reducer)
[ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates reopened PIG-626: I should have checked my other window before I marked the bug as fixed. The commit failed; I can't seem to contact Apache's SVN at the moment. I'll commit the patch once I can. > Statistics (records read by each mapper and reducer) > > > Key: PIG-626 > URL: https://issues.apache.org/jira/browse/PIG-626 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.2.0 >Reporter: Shubham Chopra >Assignee: Shubham Chopra >Priority: Minor > Fix For: 0.3.0 > > Attachments: pigStats.patch, pigStats.patch, pigStats.patch, > pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt > > > This uses the counters framework that hadoop has. Initially, I am just > interested in finding out the number of records read by each mapper/reducer > particularly for the last job in any script. A sample code to access the > statistics for the last job: > String reducePlan = > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN"); > if(reducePlan == null) { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS")); > } else { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS")); > } > The patch contains 7 test cases. These include tests PigStorage and > BinStorage along with one for multiple MR jobs case.
[jira] Commented: (PIG-175) Reading compressed files in local mode + MiniMRCluster
[ https://issues.apache.org/jira/browse/PIG-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707468#action_12707468 ]

Craig Macdonald commented on PIG-175:
-------------------------------------

Enclosed are updated results for Pig 0.2.0. In this version, MapReduce mode can now always parse gzip and bzip2 files; however, local mode cannot.

{noformat}
== Bashs good friend: cat ==
Normal
A
B
C
bz2
A
B
C
gzip
A
B
C
== MiniMRCluster ==
test.all.pig
2009-05-08 19:56:51,715 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2009-05-08 19:56:52,034 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2009-05-08 19:56:54,686 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-05-08 19:56:54,717 [Thread-3] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-05-08 19:56:55,718 [Thread-9] INFO org.apache.hadoop.mapred.MapTask - numReduceTasks: 0
2009-05-08 19:56:56,015 [Thread-9] INFO org.apache.hadoop.mapred.LocalJobRunner -
2009-05-08 19:56:56,020 [Thread-9] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_00_0' done.
2009-05-08 19:56:56,030 [Thread-9] INFO org.apache.hadoop.mapred.TaskRunner - Saved output of task 'attempt_local_0001_m_00_0' to file:/tmp/temp442336691/tmp1233577046
2009-05-08 19:56:59,714 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-05-08 19:57:04,720 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-05-08 19:57:04,720 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(A)
(B)
(C)
2009-05-08 19:57:06,148 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-05-08 19:57:06,153 [Thread-10] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-05-08 19:57:06,450 [Thread-16] INFO org.apache.hadoop.mapred.MapTask - numReduceTasks: 0
2009-05-08 19:57:06,512 [Thread-16] INFO org.apache.hadoop.mapred.LocalJobRunner -
2009-05-08 19:57:06,514 [Thread-16] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_00_0' done.
2009-05-08 19:57:06,519 [Thread-16] INFO org.apache.hadoop.mapred.TaskRunner - Saved output of task 'attempt_local_0002_m_00_0' to file:/tmp/temp442336691/tmp-1848149730
2009-05-08 19:57:11,152 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-05-08 19:57:16,154 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-05-08 19:57:16,154 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(A)
(B)
(C)
2009-05-08 19:57:17,114 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-05-08 19:57:17,118 [Thread-17] WARN org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-05-08 19:57:17,359 [Thread-23] INFO org.apache.hadoop.mapred.MapTask - numReduceTasks: 0
2009-05-08 19:57:17,520 [Thread-23] INFO org.apache.hadoop.mapred.LocalJobRunner -
2009-05-08 19:57:17,523 [Thread-23] INFO org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0003_m_00_0' done.
2009-05-08 19:57:17,528 [Thread-23] INFO org.apache.hadoop.mapred.TaskRunner - Saved output of task 'attempt_local_0003_m_00_0' to file:/tmp/temp442336691/tmp97423898
2009-05-08 19:57:22,119 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-05-08 19:57:27,122 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-05-08 19:57:27,122 [main] INFO org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(A)
(B)
(C)
test.bz2.pig
2009-05-08 19:57:28,096 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2009-05-08 19:57:28,401 [main] INFO org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, session
[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)
[ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-626: --- Resolution: Fixed Fix Version/s: 0.3.0 Status: Resolved (was: Patch Available) Patch checked in. Thanks Shubham for your patience on this one. > Statistics (records read by each mapper and reducer) > > > Key: PIG-626 > URL: https://issues.apache.org/jira/browse/PIG-626 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.2.0 >Reporter: Shubham Chopra >Assignee: Shubham Chopra >Priority: Minor > Fix For: 0.3.0 > > Attachments: pigStats.patch, pigStats.patch, pigStats.patch, > pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt > > > This uses the counters framework that hadoop has. Initially, I am just > interested in finding out the number of records read by each mapper/reducer > particularly for the last job in any script. A sample code to access the > statistics for the last job: > String reducePlan = > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN"); > if(reducePlan == null) { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS")); > } else { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS")); > } > The patch contains 7 test cases. These include tests PigStorage and > BinStorage along with one for multiple MR jobs case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-626) Statistics (records read by each mapper and reducer)
[ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707432#action_12707432 ] Hadoop QA commented on PIG-626: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12407613/pigStats.patch against trunk revision 772750. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 12 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. -1 javac. The applied patch generated 226 javac compiler warnings (more than the trunk's current 225 warnings). -1 findbugs. The patch appears to introduce 2 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. +1 contrib tests. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/console This message is automatically generated. > Statistics (records read by each mapper and reducer) > > > Key: PIG-626 > URL: https://issues.apache.org/jira/browse/PIG-626 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.2.0 >Reporter: Shubham Chopra >Assignee: Shubham Chopra >Priority: Minor > Attachments: pigStats.patch, pigStats.patch, pigStats.patch, > pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt > > > This uses the counters framework that hadoop has. Initially, I am just > interested in finding out the number of records read by each mapper/reducer > particularly for the last job in any script. 
A sample code to access the > statistics for the last job: > String reducePlan = > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN"); > if(reducePlan == null) { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS")); > } else { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS")); > } > The patch contains 7 test cases. These include tests PigStorage and > BinStorage along with one for multiple MR jobs case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)
[ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-626: --- Assignee: Shubham Chopra Status: Patch Available (was: Open) > Statistics (records read by each mapper and reducer) > > > Key: PIG-626 > URL: https://issues.apache.org/jira/browse/PIG-626 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.2.0 >Reporter: Shubham Chopra >Assignee: Shubham Chopra >Priority: Minor > Attachments: pigStats.patch, pigStats.patch, pigStats.patch, > pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt > > > This uses the counters framework that hadoop has. Initially, I am just > interested in finding out the number of records read by each mapper/reducer > particularly for the last job in any script. A sample code to access the > statistics for the last job: > String reducePlan = > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN"); > if(reducePlan == null) { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS")); > } else { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS")); > } > The patch contains 7 test cases. These include tests PigStorage and > BinStorage along with one for multiple MR jobs case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-734) Non-string keys in maps
[ https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707381#action_12707381 ] Alan Gates commented on PIG-734: I wasn't planning on making mymap#1 translate to mymap#'1'. The issue I see with that is if that works, why doesn't mymap#intcol work? I'm concerned that saying keys need to be strings but then cheating in this one case will make the semantics confusing. > Non-string keys in maps > --- > > Key: PIG-734 > URL: https://issues.apache.org/jira/browse/PIG-734 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Minor > Fix For: 0.3.0 > > Attachments: PIG-734.patch > > > With the addition of types to pig, maps were changed to allow any atomic type > to be a key. However, in practice we do not see people using keys other than > strings. And allowing multiple types is causing us issues in serializing > data (we have to check what every key type is) and in the design for non-java > UDFs (since many scripting languages include associative arrays such as > Perl's hash). > So I propose we scope back maps to only have string keys. This would be a > non-compatible change. But I am not aware of anyone using non-string keys, > so hopefully it would have little or no impact.
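The semantic question in the comment above (should mymap#1 silently mean mymap#'1'?) can be illustrated with plain Java maps. This is only a sketch of the two possible behaviours, not Pig's implementation: the class StringKeyLookup and the methods strictLookup and lenientLookup are hypothetical names, and the map contents are made up.

```java
import java.util.HashMap;
import java.util.Map;

public class StringKeyLookup {

    // Strict semantics: keys are strings, so a non-string key never matches.
    public static Object strictLookup(Map<String, Object> map, Object key) {
        return (key instanceof String) ? map.get(key) : null;
    }

    // Lenient semantics: any key is first converted to its string form,
    // which is what translating mymap#1 into mymap#'1' would amount to.
    public static Object lenientLookup(Map<String, Object> map, Object key) {
        return map.get(String.valueOf(key));
    }

    public static void main(String[] args) {
        Map<String, Object> mymap = new HashMap<>();
        mymap.put("1", "one"); // made-up entry stored under the string key "1"

        System.out.println(strictLookup(mymap, "1"));  // string key
        System.out.println(strictLookup(mymap, 1));    // integer key, strict rule
        System.out.println(lenientLookup(mymap, 1));   // integer key, converted
    }
}
```

Under the strict rule the integer key simply misses, whereas the lenient rule makes an integer constant and the string '1' interchangeable. A lenient rule that applied to constants but not to integer columns is the inconsistency the comment is worried about.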
[jira] Commented: (PIG-734) Non-string keys in maps
[ https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707377#action_12707377 ] David Ciemiewicz commented on PIG-734: -- Alan, I don't think this is going to be that problematic. If I try to pass in a map dereference with an integer such as mymap#1, would Pig automagically convert the 1 to a string, equivalent to mymap#'1'? If so, I think this would be quite acceptable. > Non-string keys in maps > --- > > Key: PIG-734 > URL: https://issues.apache.org/jira/browse/PIG-734 > Project: Pig > Issue Type: Bug >Affects Versions: 0.2.0 >Reporter: Alan Gates >Assignee: Alan Gates >Priority: Minor > Fix For: 0.3.0 > > Attachments: PIG-734.patch > > > With the addition of types to pig, maps were changed to allow any atomic type > to be a key. However, in practice we do not see people using keys other than > strings. And allowing multiple types is causing us issues in serializing > data (we have to check what every key type is) and in the design for non-java > UDFs (since many scripting languages include associative arrays such as > Perl's hash). > So I propose we scope back maps to only have string keys. This would be a > non-compatible change. But I am not aware of anyone using non-string keys, > so hopefully it would have little or no impact.
[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)
[ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shubham Chopra updated PIG-626: --- Attachment: pigStats.patch Patch compatible with the latest trunk attached. > Statistics (records read by each mapper and reducer) > > > Key: PIG-626 > URL: https://issues.apache.org/jira/browse/PIG-626 > Project: Pig > Issue Type: New Feature > Components: impl >Affects Versions: 0.2.0 >Reporter: Shubham Chopra >Priority: Minor > Attachments: pigStats.patch, pigStats.patch, pigStats.patch, > pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt > > > This uses the counters framework that hadoop has. Initially, I am just > interested in finding out the number of records read by each mapper/reducer > particularly for the last job in any script. A sample code to access the > statistics for the last job: > String reducePlan = > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN"); > if(reducePlan == null) { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS")); > } else { > System.out.println("Records written : " + > stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS")); > } > The patch contains 7 test cases. These include tests PigStorage and > BinStorage along with one for multiple MR jobs case. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.