[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables
[ https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797183#action_12797183 ] Gerrit Jansen van Vuuren commented on PIG-1117: --- If you have time, yes please; that way I can correct anything if need be. I'll try to implement the fieldsToRead soon, and it should not be that difficult; I just have to get around to it :). Thanks for the heads up, I'll do some reading up; this change from Slicer to InputFormat will be great though. I don't think Hive has an InputFormat, but this isn't a problem. Pig reading hive columnar rc tables --- Key: PIG-1117 URL: https://issues.apache.org/jira/browse/PIG-1117 Project: Pig Issue Type: New Feature Affects Versions: 0.7.0 Reporter: Gerrit Jansen van Vuuren Assignee: Gerrit Jansen van Vuuren Fix For: 0.7.0 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch I've coded a LoadFunc implementation that can read from Hive Columnar RC tables; this is needed for a project I'm working on because all our data is stored using the Hive thrift-serialized Columnar RC format. I have looked at the piggybank but did not find any implementation that could do this. We've been running it on our cluster for the last week and have worked out most bugs. There are still some improvements I would like to make, such as setting the number of mappers based on date partitioning. It's been optimized to read only specific columns, and with this improvement it can churn through a data set almost 8 times faster because not all column data is read. I would like to contribute the class to the piggybank; can you guide me on what I need to do? I've used Hive-specific classes to implement this; is it possible to add this to the piggybank build ivy for automatic download of the dependencies? -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
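For context, a Pig Latin usage sketch of such a loader might look like the following. The package name and the column-spec constructor argument are my assumptions for illustration, not taken from the attached patch:
{code}
REGISTER piggybank.jar;
-- hypothetical invocation: the class name comes from the patch title,
-- the constructor's column-spec string is assumed
A = LOAD '/user/hive/warehouse/mytable'
    USING org.apache.pig.piggybank.storage.HiveColumnarLoader('uid long,ts string,url string');
-- only the columns actually referenced are read from the RC files,
-- which is where the roughly 8x speedup on wide tables comes from
B = FOREACH A GENERATE uid, url;
{code}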
[jira] Created: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory
Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory Key: PIG-1180 URL: https://issues.apache.org/jira/browse/PIG-1180 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.6.0 Piggybank depends on pig.jar to compile. If we build pig using the option ant jar-withouthadoop, we only get pig-withouthadoop.jar. In this case, piggybank should look for pig-withouthadoop.jar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory
[ https://issues.apache.org/jira/browse/PIG-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai updated PIG-1180: Attachment: PIG-1180-1.patch pig.jar has higher priority than pig-withouthadoop.jar: if pig.jar exists, use it; if not, look for pig-withouthadoop.jar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
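The lookup order described above is simple to state precisely. The sketch below is a hypothetical illustration in Python of that priority rule (the real fix lives in build.xml; the function name here is made up):

```python
import tempfile
from pathlib import Path

def find_pig_jar(pig_home: Path) -> Path:
    """Prefer pig.jar; only if it is absent, fall back to pig-withouthadoop.jar."""
    for name in ("pig.jar", "pig-withouthadoop.jar"):
        candidate = pig_home / name
        if candidate.is_file():
            return candidate
    raise FileNotFoundError("no pig jar found in " + str(pig_home))

# demo against a temporary pig home directory
pig_home = Path(tempfile.mkdtemp())
(pig_home / "pig-withouthadoop.jar").touch()
print(find_pig_jar(pig_home).name)   # falls back to pig-withouthadoop.jar
(pig_home / "pig.jar").touch()
print(find_pig_jar(pig_home).name)   # pig.jar now takes priority
```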
[jira] Commented: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory
[ https://issues.apache.org/jira/browse/PIG-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797231#action_12797231 ] Olga Natkovich commented on PIG-1180: - +1 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory
[ https://issues.apache.org/jira/browse/PIG-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Dai resolved PIG-1180. - Resolution: Fixed Hadoop Flags: [Reviewed] This patch applies to build.xml only; no need for the hudson process. Patch committed to trunk and the 0.6 branch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (PIG-1181) Need to find the right place to store streaming error messages when running multi-query scripts
Need to find the right place to store streaming error messages when running multi-query scripts Key: PIG-1181 URL: https://issues.apache.org/jira/browse/PIG-1181 Project: Pig Issue Type: Bug Reporter: Richard Ding Pig Latin allows users to specify an HDFS directory to store streaming stderr output (if necessary). For instance, the following script
{code}
DEFINE Y `stream.pl` stderr('stream_err' limit 100);
X = STREAM A THROUGH Y;
STORE X INTO '/tmp/stream_out';
{code}
will put streaming stderr into the directory _/tmp/stream_out/_logs/stream_err_, namely the _logs directory of the job's output directory. But a problem occurs with multi-query scripts, where a single job can have multiple output directories. The current implementation stores streaming stderr in the _logs directory of a randomly generated tmp directory, which would be hard for the user to find if she needs to look into the streaming stderr messages. A better solution is needed for storing streaming stderr in HDFS for multi-query scripts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-502) Limit and Illustrate do not work together
[ https://issues.apache.org/jira/browse/PIG-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797260#action_12797260 ] Richard Ding commented on PIG-502: -- As of now, illustrate also doesn't support the operators _CROSS_, _DISTINCT_, and _STREAM_. We need to look into these (and the above) operators and see if we can make illustrate work with them. Limit and Illustrate do not work together - Key: PIG-502 URL: https://issues.apache.org/jira/browse/PIG-502 Project: Pig Issue Type: Improvement Components: tools Affects Versions: 0.2.0 Environment: Hadoop 18 Reporter: Viraj Bhat Suppose a user wants to run an illustrate command after limiting his data to a certain number of records; it does not seem to work. --
{code}
MYDATA = load 'testfilelarge.txt' as (f1, f2, f3, f4, f5);
MYDATA = limit MYDATA 10;
describe MYDATA;
illustrate MYDATA;
{code}
-- Running this script produces the following output and error --
MYDATA: {f1: bytearray,f2: bytearray,f3: bytearray,f4: bytearray,f5: bytearray}
2008-10-18 02:14:26,900 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: hdfs://localhost:9000
2008-10-18 02:14:27,013 [main] INFO org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to map-reduce job tracker at: localhost:9001
java.lang.RuntimeException: Unrecognized logical operator. 
at org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(EquivalenceClasses.java:60)
at org.apache.pig.pen.DerivedDataVisitor.evaluateOperator(DerivedDataVisitor.java:368)
at org.apache.pig.pen.DerivedDataVisitor.visit(DerivedDataVisitor.java:273)
at org.apache.pig.impl.logicalLayer.LOLimit.visit(LOLimit.java:71)
at org.apache.pig.impl.logicalLayer.LOLimit.visit(LOLimit.java:10)
at org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
at org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:98)
at org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:90)
at org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:106)
at org.apache.pig.PigServer.getExamples(PigServer.java:630)
at org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:279)
at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:183)
at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
at org.apache.pig.Main.main(Main.java:306)
-- If I remove the illustrate and replace it with dump MYDATA; it works. --
{code}
MYDATA = load 'testfilelarge.txt' as (f1, f2, f3, f4, f5);
MYDATA = limit MYDATA 10;
describe MYDATA;
-- illustrate MYDATA;
dump MYDATA;
{code}
-- -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Richard Ding updated PIG-1090: -- Attachment: PIG-1090-10.patch Remove FIXME comments from the JobControlCompiler class and open JIRA PIG-1181 to track the issue. Update sources to reflect recent changes in load-store interfaces - Key: PIG-1090 URL: https://issues.apache.org/jira/browse/PIG-1090 Project: Pig Issue Type: Sub-task Reporter: Pradeep Kamath Assignee: Pradeep Kamath Attachments: PIG-1090-10.patch, PIG-1090-2.patch, PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch There have been some changes (as recorded in the Changes Section, Nov 2 2009 sub-section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the load/store interfaces; this jira is to track the task of making those changes under src. Changes under test will be addressed in a different jira. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-1176) Column Pruner issues in union of loader with and without schema
[ https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797405#action_12797405 ] Pradeep Kamath commented on PIG-1176: - +1 Column Pruner issues in union of loader with and without schema --- Key: PIG-1176 URL: https://issues.apache.org/jira/browse/PIG-1176 Project: Pig Issue Type: Bug Components: impl Affects Versions: 0.6.0 Reporter: Daniel Dai Assignee: Daniel Dai Fix For: 0.7.0 Attachments: PIG-1176-1.patch The column pruner for union could fail if one source of the union has a schema and the other does not. For example, the following script fails:
{code}
a = load '1.txt' as (a0, a1, a2);
b = foreach a generate a0;
c = load '2.txt';
d = foreach c generate $0;
e = union b, d;
dump e;
{code}
However, this issue is in trunk only and does not apply to the 0.6 branch. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs
[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-480: Status: Open (was: Patch Available) Cancelling this patch to add a new patch that supports the combiner. PERFORMANCE: Use identity mapper in a chain of M-R jobs --- Key: PIG-480 URL: https://issues.apache.org/jira/browse/PIG-480 Project: Pig Issue Type: Improvement Affects Versions: 0.2.0 Reporter: Olga Natkovich Assignee: Ying He Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch For jobs with two or more MR jobs, use identity mapper wherever possible in the second and subsequent MR jobs. Identity mapper is about 50% faster than a pig empty map job because it doesn't parse the data. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs
[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-480: Attachment: PIG_480.patch Add support for the combiner. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs
[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797412#action_12797412 ] Ying He commented on PIG-480: - I did more performance tests. They show that performance is related to the nature of the data. If the data is skewed, performance is very bad in the combiner case; if the data is uniform, the combiner case gets the largest performance gain. The test runs a join followed by a group-by statement. For skewed data, if I use skewed join, the result is much better. I think the reason for the bad performance on skewed data is that the map plan of the second job is moved into the reducer of the first job. If the data is skewed, a single reducer has to execute the extra logic for all of its tuples, while without this patch that part of the logic would be executed across multiple mappers, so we lose parallelism. The more skewed the data is, the worse the performance would be.

1. skewed data

                                       job 1       job 2       total
 combiner                  patch  7min 53sec   1min 1sec   8min 54sec
                           trunk  4min 43sec  1min 37sec   6min 20sec
 combiner, skewed join     patch  1min 55sec   1min 1sec   2min 56sec
                           trunk  1min 44sec  1min 40sec   3min 24sec
 no combiner               patch  2min 26sec  2min 28sec   4min 54sec
                           trunk  1min 25sec  3min 24sec   4min 49sec
 no combiner, skewed join  patch  1min 17sec   3min 5sec   4min 22sec
                           trunk       59sec   3min 7sec    4min 6sec

2. uniform data

 combiner                  patch  6min 48sec  3min 43sec  10min 31sec
                           trunk  7min 32sec   7min 3sec  14min 35sec
 no combiner               patch  1min 25sec  2min 25sec   3min 50sec
                           trunk  1min 24sec  2min 28sec   3min 52sec

Each group of tests may use different data, so don't make cross-group comparisons. 
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
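The "join then a group by" test described above can be sketched in Pig Latin roughly as follows; the relation and field names are made up for illustration, and the USING 'skewed' clause corresponds to the skewed-join rows in the timings:
{code}
A = LOAD 'left.txt' AS (k, v1);
B = LOAD 'right.txt' AS (k, v2);
-- plain JOIN for the baseline runs; add USING 'skewed' for the skewed-join runs
J = JOIN A BY k, B BY k USING 'skewed';
G = GROUP J BY A::k;
C = FOREACH G GENERATE group, COUNT(J);
STORE C INTO 'out';
{code}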
[jira] Updated: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs
[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ying He updated PIG-480: Status: Patch Available (was: Open) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces
[ https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pradeep Kamath updated PIG-1090: Attachment: PIG-1090-11.patch Patch to fix an issue in TextLoader wherein the output would be scrambled if different lines in the input are of different sizes. The issue was that TextInputFormat, which TextLoader uses, reuses its memory buffer to provide each line of the input, so TextLoader needs to make a copy for its own use. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
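The buffer-reuse bug described above is easy to reproduce in miniature. The sketch below is a language-neutral illustration in Python of why the defensive copy is needed; the names are invented and this is not the TextLoader code:

```python
# Stand-in for the single line buffer that the record reader reuses.
shared = bytearray(16)

def next_line(data: bytes) -> bytearray:
    """Mimic the record reader: overwrite the shared buffer with the new line."""
    shared[: len(data)] = data
    return shared  # a buggy consumer keeps this reference

first = next_line(b"first")          # keeping the raw reference (the bug)
first_copy = bytes(first[:5])        # making a copy right away (the fix)
next_line(b"SECON")                  # the next input line reuses the buffer

print(bytes(first[:5]))   # the kept reference now shows the second line: b'SECON'
print(first_copy)         # the copy still holds the first line: b'first'
```

The same effect explains the "scrambled" output: records of different lengths leave stale bytes from longer earlier lines in the shared buffer as well.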
Build failed in Hudson: Pig-trunk #660
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/660/changes
Changes:
[daijy] PIG-1180: Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory
[pradeepkth] This is to cleanup the local mode code after switching to using hadoop local mode
--
[...truncated 240452 lines...]
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Receiving block blk_4529900263997112706_1017 src: /127.0.0.1:38624 dest: /127.0.0.1:50773
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Receiving block blk_4529900263997112706_1017 src: /127.0.0.1:55172 dest: /127.0.0.1:34653
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Receiving block blk_4529900263997112706_1017 src: /127.0.0.1:43946 dest: /127.0.0.1:32866
[junit] 10/01/07 02:23:09 INFO DataNode.clienttrace: src: /127.0.0.1:43946, dest: /127.0.0.1:32866, bytes: 48857, op: HDFS_WRITE, cliID: DFSClient_1941479077, srvID: DS-168095-127.0.1.1-32866-1262830956384, blockid: blk_4529900263997112706_1017
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: PacketResponder 0 for block blk_4529900263997112706_1017 terminating
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:32866 is added to blk_4529900263997112706_1017 size 48857
[junit] 10/01/07 02:23:09 INFO DataNode.clienttrace: src: /127.0.0.1:55172, dest: /127.0.0.1:34653, bytes: 48857, op: HDFS_WRITE, cliID: DFSClient_1941479077, srvID: DS-75923346-127.0.1.1-34653-1262830957336, blockid: blk_4529900263997112706_1017
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: PacketResponder 1 for block blk_4529900263997112706_1017 terminating
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:34653 is added to blk_4529900263997112706_1017 size 48857
[junit] 10/01/07 02:23:09 INFO DataNode.clienttrace: src: /127.0.0.1:38624, dest: /127.0.0.1:50773, bytes: 48857, op: HDFS_WRITE, cliID: DFSClient_1941479077, srvID: DS-1435789546-127.0.1.1-50773-1262830956883, blockid: blk_4529900263997112706_1017
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: PacketResponder 2 for block blk_4529900263997112706_1017 terminating
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: BLOCK* NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50773 is added to blk_4529900263997112706_1017 size 48857
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: DIR* NameSystem.completeFile: file /tmp/temp-1306912187/tmp-891082635/_logs/history/localhost_1262830957897_job_20100107022237874_0002_conf.xml is closed by DFSClient_1941479077
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Deleting block blk_-4766788746537872722_1006 file build/test/data/dfs/data/data3/current/blk_-4766788746537872722
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Deleting block blk_6737284167407333489_1007 file build/test/data/dfs/data/data4/current/blk_6737284167407333489
[junit] 10/01/07 02:23:09 INFO mapReduceLayer.MapReduceLauncher: Submitting job: job_20100107022237874_0002 to execution engine.
[junit] 10/01/07 02:23:09 INFO mapReduceLayer.MapReduceLauncher: More information at: http://localhost:54010/jobdetails.jsp?jobid=job_20100107022237874_0002
[junit] 10/01/07 02:23:09 INFO mapReduceLayer.MapReduceLauncher: To kill this job, use: kill job_20100107022237874_0002
[junit] 10/01/07 02:23:10 INFO FSNamesystem.audit: ugi=hudson,hudson ip=/127.0.0.1 cmd=open src=/tmp/hadoop-hudson/mapred/system/job_20100107022237874_0002/job.split dst=null perm=null
[junit] 10/01/07 02:23:10 INFO DataNode.clienttrace: src: /127.0.0.1:45812, dest: /127.0.0.1:34580, bytes: 1605, op: HDFS_READ, cliID: DFSClient_1941479077, srvID: DS-1109484639-127.0.1.1-45812-1262830957804, blockid: blk_-124486616345040_1014
[junit] 10/01/07 02:23:10 INFO mapred.JobInProgress: Input size for job job_20100107022237874_0002 = 12. Number of splits = 2
[junit] 10/01/07 02:23:10 INFO mapred.JobInProgress: tip:task_20100107022237874_0002_m_00 has split on node:/default-rack/h7.grid.sp2.yahoo.net
[junit] 10/01/07 02:23:10 INFO mapred.JobInProgress: tip:task_20100107022237874_0002_m_01 has split on node:/default-rack/h7.grid.sp2.yahoo.net
[junit] 10/01/07 02:23:10 INFO mapred.JobTracker: Adding task 'attempt_20100107022237874_0002_m_03_0' to tip task_20100107022237874_0002_m_03, for tracker 'tracker_host2.foo.com:localhost/127.0.0.1:40200'
[junit] 10/01/07 02:23:10 INFO mapred.TaskTracker: LaunchTaskAction (registerTask): attempt_20100107022237874_0002_m_03_0 task's state:UNASSIGNED
[junit] 10/01/07 02:23:10 INFO mapred.TaskTracker: Trying to launch : attempt_20100107022237874_0002_m_03_0
[junit] 10/01/07 02:23:10 INFO mapred.TaskTracker: In TaskLauncher, current free slots : 2 and trying to launch
[jira] Commented: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs
[ https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797518#action_12797518 ] Hadoop QA commented on PIG-480: --- -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12429598/PIG_480.patch against trunk revision 896606.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 3 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 230 javac compiler warnings (more than the trunk's current 212 warnings).
+1 findbugs. The patch does not introduce any new Findbugs warnings.
-1 release audit. The applied patch generated 482 release audit warnings (more than the trunk's current 481 warnings).
-1 core tests. The patch failed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/testReport/
Release audit warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/console
This message is automatically generated. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.