[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-06 Thread Gerrit Jansen van Vuuren (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797183#action_12797183
 ] 

Gerrit Jansen van Vuuren commented on PIG-1117:
---

If you have time yes please, that way I can correct anything if need be.

I'll try to implement fieldsToRead soon; it should not be that difficult, I 
just have to get around to it :).
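The column-pruning idea behind fieldsToRead can be sketched in plain Java. The names and shapes below are hypothetical illustrations, not the actual Pig LoadFunc API: given the field indices the caller requests, the loader materializes only those columns of each row instead of the whole tuple.

```java
import java.util.Arrays;
import java.util.List;

public class FieldProjection {
    // Hypothetical sketch: emit only the requested column indices of a row,
    // skipping deserialization of everything else.
    static List<Object> project(List<?> row, int[] fieldsToRead) {
        Object[] out = new Object[fieldsToRead.length];
        for (int i = 0; i < fieldsToRead.length; i++) {
            out[i] = row.get(fieldsToRead[i]);
        }
        return Arrays.asList(out);
    }
}
```

With a columnar format like RCFile, the real win is that the skipped columns are never read off disk at all, which is where the speedup described below comes from.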

Thanks for the heads up, I'll do some reading up. This change from Slicer to 
InputFormat will be great though. 
I don't think Hive has an InputFormat, but this isn't a problem.



 Pig reading hive columnar rc tables
 ---

 Key: PIG-1117
 URL: https://issues.apache.org/jira/browse/PIG-1117
 Project: Pig
  Issue Type: New Feature
Affects Versions: 0.7.0
Reporter: Gerrit Jansen van Vuuren
Assignee: Gerrit Jansen van Vuuren
 Fix For: 0.7.0

 Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
 PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch


 I've coded a LoadFunc implementation that can read from Hive Columnar RC 
 tables; this is needed for a project I'm working on, because all our data 
 is stored using the Hive thrift-serialized Columnar RC format. I looked at 
 the piggybank but did not find any implementation that could do this. 
 We've been running it on our cluster for the last week and have worked out 
 most bugs.
  
 There are still some improvements to be done, like setting the number of 
 mappers based on date partitioning. It has been optimized to read only 
 specific columns, and with this improvement it can churn through a data set 
 almost 8 times faster because not all column data is read.
 I would like to contribute the class to the piggybank; can you guide me on 
 what I need to do?
 I've used Hive-specific classes to implement this; is it possible to add 
 these dependencies to the piggybank ivy build for automatic download?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory

2010-01-06 Thread Daniel Dai (JIRA)
Piggybank should compile even if we only have pig-withouthadoop.jar but no 
pig.jar in the pig home directory


 Key: PIG-1180
 URL: https://issues.apache.org/jira/browse/PIG-1180
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0


Piggybank depends on pig.jar to compile. If we build pig using the option ant 
jar-withouthadoop, we only get pig-withouthadoop.jar. In this case, piggybank 
should look for pig-withouthadoop.jar. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory

2010-01-06 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-1180:


Attachment: PIG-1180-1.patch

pig.jar has higher priority than pig-withouthadoop.jar: if pig.jar exists, use 
pig.jar first; if not, look for pig-withouthadoop.jar.
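The actual fix is a build.xml change, but the lookup order it encodes can be sketched as a small (hypothetical) helper:

```java
import java.io.File;

public class PigJarLocator {
    // Hypothetical sketch of the jar-selection rule the patch encodes in
    // piggybank's build.xml: prefer pig.jar if it exists in the pig home
    // directory, otherwise fall back to pig-withouthadoop.jar.
    static String selectPigJar(String pigHome) {
        File pigJar = new File(pigHome, "pig.jar");
        return pigJar.exists()
                ? pigJar.getName()
                : new File(pigHome, "pig-withouthadoop.jar").getName();
    }
}
```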

 Piggybank should compile even if we only have pig-withouthadoop.jar but no 
 pig.jar in the pig home directory
 

 Key: PIG-1180
 URL: https://issues.apache.org/jira/browse/PIG-1180
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1180-1.patch


 Piggybank depends on pig.jar to compile. If we build pig using the option 
 ant jar-withouthadoop, we only get pig-withouthadoop.jar. In this case, 
 piggybank should look for pig-withouthadoop.jar. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory

2010-01-06 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797231#action_12797231
 ] 

Olga Natkovich commented on PIG-1180:
-

+1

 Piggybank should compile even if we only have pig-withouthadoop.jar but no 
 pig.jar in the pig home directory
 

 Key: PIG-1180
 URL: https://issues.apache.org/jira/browse/PIG-1180
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1180-1.patch


 Piggybank depends on pig.jar to compile. If we build pig using the option 
 ant jar-withouthadoop, we only get pig-withouthadoop.jar. In this case, 
 piggybank should look for pig-withouthadoop.jar. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-1180) Piggybank should compile even if we only have pig-withouthadoop.jar but no pig.jar in the pig home directory

2010-01-06 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai resolved PIG-1180.
-

  Resolution: Fixed
Hadoop Flags: [Reviewed]

This patch is applied to build.xml only. No need for hudson process. Patch 
committed to trunk and 0.6 branch. 

 Piggybank should compile even if we only have pig-withouthadoop.jar but no 
 pig.jar in the pig home directory
 

 Key: PIG-1180
 URL: https://issues.apache.org/jira/browse/PIG-1180
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.6.0

 Attachments: PIG-1180-1.patch


 Piggybank depends on pig.jar to compile. If we build pig using the option 
 ant jar-withouthadoop, we only get pig-withouthadoop.jar. In this case, 
 piggybank should look for pig-withouthadoop.jar. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-1181) Need to find the right place to store streaming error messages when running multi-query scripts

2010-01-06 Thread Richard Ding (JIRA)
Need to find the right place to store streaming error messages when running 
multi-query scripts 


 Key: PIG-1181
 URL: https://issues.apache.org/jira/browse/PIG-1181
 Project: Pig
  Issue Type: Bug
Reporter: Richard Ding



Pig Latin allows the user to specify an HDFS directory to store the streaming 
stderr output (if necessary). For instance, the following script

{code}
DEFINE Y `stream.pl` stderr('stream_err' limit 100);
X = STREAM A THROUGH Y;
STORE X INTO '/tmp/stream_out';
{code} 

will put the streaming stderr into the directory 
_/tmp/stream_out/_logs/stream_err_, namely the _logs directory of the 
job's output directory.

But a problem occurs with multiquery scripts, where a single job can have 
multiple output directories. The current implementation stores streaming stderr 
in the _logs directory of a randomly generated tmp directory, which makes it 
hard for the user to find if she needs to look into the streaming stderr 
messages.

A better solution is needed to store the streaming stderr in HDFS for 
multiquery scripts.




-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-502) Limit and Illustrate do not work together

2010-01-06 Thread Richard Ding (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797260#action_12797260
 ] 

Richard Ding commented on PIG-502:
--

As of now, illustrate also doesn't support the operators _CROSS_, _DISTINCT_, 
and _STREAM_. We need to look into these (and the above) operators and see if 
we can make illustrate work with them.

 Limit and Illustrate do not work together
 -

 Key: PIG-502
 URL: https://issues.apache.org/jira/browse/PIG-502
 Project: Pig
  Issue Type: Improvement
  Components: tools
Affects Versions: 0.2.0
 Environment: Hadoop 18
Reporter: Viraj Bhat

 Suppose a user wants to run an illustrate command after limiting his data to 
 a certain number of records; it does not seem to work.
 --
 {code}
 MYDATA = load 'testfilelarge.txt' as (f1, f2, f3, f4, f5);
 MYDATA  = limit MYDATA 10;
 describe MYDATA;
 illustrate MYDATA;
 {code}
 --
 Running this script produces the following output and error
 --
 MYDATA: {f1: bytearray,f2: bytearray,f3: bytearray,f4: bytearray,f5: 
 bytearray}
 2008-10-18 02:14:26,900 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to hadoop fil
 e system at: hdfs://localhost:9000
 2008-10-18 02:14:27,013 [main] INFO  
 org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
 to map-reduce
  job tracker at: localhost:9001
 java.lang.RuntimeException: Unrecognized logical operator.
 at 
 org.apache.pig.pen.EquivalenceClasses.GetEquivalenceClasses(EquivalenceClasses.java:60)
 at 
 org.apache.pig.pen.DerivedDataVisitor.evaluateOperator(DerivedDataVisitor.java:368)
 at 
 org.apache.pig.pen.DerivedDataVisitor.visit(DerivedDataVisitor.java:273)
 at org.apache.pig.impl.logicalLayer.LOLimit.visit(LOLimit.java:71)
 at org.apache.pig.impl.logicalLayer.LOLimit.visit(LOLimit.java:10)
 at 
 org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
 at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
 at 
 org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:98)
 at 
 org.apache.pig.pen.LineageTrimmingVisitor.init(LineageTrimmingVisitor.java:90)
 at 
 org.apache.pig.pen.ExampleGenerator.getExamples(ExampleGenerator.java:106)
 at org.apache.pig.PigServer.getExamples(PigServer.java:630)
 at 
 org.apache.pig.tools.grunt.GruntParser.processIllustrate(GruntParser.java:279)
 at 
 org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:183)
 at 
 org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:84)
 at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:64)
 at org.apache.pig.Main.main(Main.java:306)
 --
 If I remove the illustrate and replace it with dump MYDATA; it works.
 --
 {code}
 MYDATA = load 'testfilelarge.txt' as (f1, f2, f3, f4, f5);
 MYDATA  = limit MYDATA 10;
 describe MYDATA;
 -- illustrate MYDATA;
 dump MYDATA;
 {code}
 --

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-06 Thread Richard Ding (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Richard Ding updated PIG-1090:
--

Attachment: PIG-1090-10.patch

Removed FIXME comments from the JobControlCompiler class and opened JIRA 
PIG-1181 to track the issue.

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-2.patch, PIG-1090-3.patch, 
 PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, PIG-1090-8.patch, 
 PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub-section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1176) Column Pruner issues in union of loader with and without schema

2010-01-06 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1176?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797405#action_12797405
 ] 

Pradeep Kamath commented on PIG-1176:
-

+1

 Column Pruner issues in union of loader with and without schema
 ---

 Key: PIG-1176
 URL: https://issues.apache.org/jira/browse/PIG-1176
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.6.0
Reporter: Daniel Dai
Assignee: Daniel Dai
 Fix For: 0.7.0

 Attachments: PIG-1176-1.patch


 Column pruning for union can fail if one source of the union has a schema and 
 the other does not. For example, the following script fails:
 {code}
 a = load '1.txt' as (a0, a1, a2);
 b = foreach a generate a0;
 c = load '2.txt';
 d = foreach c generate $0;
 e = union b, d;
 dump e;
 {code}
 However, this issue is in trunk only and is not applicable to 0.6 branch.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs

2010-01-06 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-480:


Status: Open  (was: Patch Available)

Cancelling this patch to add a new patch that supports the combiner.

 PERFORMANCE: Use identity mapper in a chain of M-R jobs
 ---

 Key: PIG-480
 URL: https://issues.apache.org/jira/browse/PIG-480
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Ying He
 Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch


 For jobs with two or more MR jobs, use an identity mapper wherever possible in 
 the second and subsequent MR jobs. The identity mapper is about 50% faster 
 than the Pig empty map job because it doesn't parse the data. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs

2010-01-06 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-480:


Attachment: PIG_480.patch

Added support for the combiner.

 PERFORMANCE: Use identity mapper in a chain of M-R jobs
 ---

 Key: PIG-480
 URL: https://issues.apache.org/jira/browse/PIG-480
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Ying He
 Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch


 For jobs with two or more MR jobs, use an identity mapper wherever possible in 
 the second and subsequent MR jobs. The identity mapper is about 50% faster 
 than the Pig empty map job because it doesn't parse the data. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs

2010-01-06 Thread Ying He (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797412#action_12797412
 ] 

Ying He commented on PIG-480:
-

I did more performance tests. They show that performance is related to the 
nature of the data: if the data is skewed, performance is very bad in the 
combiner case; if the data is uniform, the combiner case gets the most 
performance gain. The test is done using a join followed by a group-by 
statement.

For skewed data, if I use a skewed join, the result is much better. I think 
the reason for the bad performance on skewed data is that the map plan of the 
second job is moved to the reducer of the first job. If the data is skewed, a 
single reducer has to execute the extra logic for all of its tuples, whereas 
without this patch that part of the logic would be executed across multiple 
mappers. So we lose parallelism there. The more skewed the data is, the worse 
the performance.

1. skewed data

combiner
         job 1       job 2       total
patch    7min 53sec  1min 1sec   8min 54sec
trunk    4min 43sec  1min 37sec  6min 20sec

combiner and using skewed join
patch    1min 55sec  1min 1sec   2min 56sec
trunk    1min 44sec  1min 40sec  3min 24sec

no combiner
patch    2min 26sec  2min 28sec  4min 54sec
trunk    1min 25sec  3min 24sec  4min 49sec

no combiner and using skewed join
patch    1min 17sec  3min 5sec   4min 22sec
trunk    59sec       3min 7sec   4min 6sec

2. uniform data

combiner
patch    6min 48sec  3min 43sec  10min 31sec
trunk    7min 32sec  7min 3sec   14min 35sec

no combiner
patch    1min 25sec  2min 25sec  3min 50sec
trunk    1min 24sec  2min 28sec  3min 52sec

Each group of tests may use different data, so don't make cross-group 
comparisons.


 PERFORMANCE: Use identity mapper in a chain of M-R jobs
 ---

 Key: PIG-480
 URL: https://issues.apache.org/jira/browse/PIG-480
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Ying He
 Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch


 For jobs with two or more MR jobs, use an identity mapper wherever possible in 
 the second and subsequent MR jobs. The identity mapper is about 50% faster 
 than the Pig empty map job because it doesn't parse the data. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs

2010-01-06 Thread Ying He (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ying He updated PIG-480:


Status: Patch Available  (was: Open)

 PERFORMANCE: Use identity mapper in a chain of M-R jobs
 ---

 Key: PIG-480
 URL: https://issues.apache.org/jira/browse/PIG-480
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Ying He
 Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch


 For jobs with two or more MR jobs, use an identity mapper wherever possible in 
 the second and subsequent MR jobs. The identity mapper is about 50% faster 
 than the Pig empty map job because it doesn't parse the data. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-1090) Update sources to reflect recent changes in load-store interfaces

2010-01-06 Thread Pradeep Kamath (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pradeep Kamath updated PIG-1090:


Attachment: PIG-1090-11.patch

Patch to fix an issue in TextLoader wherein the output would be scrambled if 
different lines in the input are of different sizes. The issue is that 
TextInputFormat, which is used by TextLoader, reuses its memory buffer to 
provide each line of the input, so in TextLoader we need to make a copy for 
our use.
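The failure mode can be sketched outside of Hadoop. The classes below are hypothetical stand-ins, not Pig's or Hadoop's actual classes: a reader that hands back the same buffer for every record scrambles any consumer that keeps a reference to it, which is why the fix takes a defensive copy.

```java
public class BufferReuseDemo {
    // A reader that, like a record reader reusing its line buffer, returns
    // the SAME backing array for every record. (Hypothetical stand-in.)
    static class ReusingReader {
        private byte[] buf = new byte[0];
        private int len;

        byte[] next(String line) {
            byte[] bytes = line.getBytes();
            if (bytes.length > buf.length) buf = new byte[bytes.length];
            System.arraycopy(bytes, 0, buf, 0, bytes.length);
            len = bytes.length;
            return buf;              // same array object on every call
        }

        int length() { return len; } // valid bytes; the tail may be stale
    }

    // The fix: copy the valid region before holding on to the record.
    static byte[] copyOf(byte[] buf, int len) {
        byte[] out = new byte[len];
        System.arraycopy(buf, 0, out, 0, len);
        return out;
    }
}
```

Without the copy, reading a shorter line after a longer one leaves the earlier "record" pointing at a buffer whose contents have changed underneath it.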

 Update sources to reflect recent changes in load-store interfaces
 -

 Key: PIG-1090
 URL: https://issues.apache.org/jira/browse/PIG-1090
 Project: Pig
  Issue Type: Sub-task
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-1090-10.patch, PIG-1090-11.patch, PIG-1090-2.patch, 
 PIG-1090-3.patch, PIG-1090-4.patch, PIG-1090-6.patch, PIG-1090-7.patch, 
 PIG-1090-8.patch, PIG-1090-9.patch, PIG-1090.patch, PIG-1190-5.patch


 There have been some changes (as recorded in the Changes Section, Nov 2 2009 
 sub-section of http://wiki.apache.org/pig/LoadStoreRedesignProposal) in the 
 load/store interfaces - this jira is to track the task of making those 
 changes under src. Changes under test will be addressed in a different jira.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Build failed in Hudson: Pig-trunk #660

2010-01-06 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/660/changes

Changes:

[daijy] PIG-1180: Piggybank should compile even if we only have 
pig-withouthadoop.jar but no pig.jar in the pig home directory

[pradeepkth] This is to cleanup the local mode code after switching to using 
hadoop local mode

--
[...truncated 240452 lines...]
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Receiving block 
blk_4529900263997112706_1017 src: /127.0.0.1:38624 dest: /127.0.0.1:50773
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Receiving block 
blk_4529900263997112706_1017 src: /127.0.0.1:55172 dest: /127.0.0.1:34653
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Receiving block 
blk_4529900263997112706_1017 src: /127.0.0.1:43946 dest: /127.0.0.1:32866
[junit] 10/01/07 02:23:09 INFO DataNode.clienttrace: src: /127.0.0.1:43946, 
dest: /127.0.0.1:32866, bytes: 48857, op: HDFS_WRITE, cliID: 
DFSClient_1941479077, srvID: DS-168095-127.0.1.1-32866-1262830956384, 
blockid: blk_4529900263997112706_1017
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: PacketResponder 0 for 
block blk_4529900263997112706_1017 terminating
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:32866 is added to 
blk_4529900263997112706_1017 size 48857
[junit] 10/01/07 02:23:09 INFO DataNode.clienttrace: src: /127.0.0.1:55172, 
dest: /127.0.0.1:34653, bytes: 48857, op: HDFS_WRITE, cliID: 
DFSClient_1941479077, srvID: DS-75923346-127.0.1.1-34653-1262830957336, 
blockid: blk_4529900263997112706_1017
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: PacketResponder 1 for 
block blk_4529900263997112706_1017 terminating
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:34653 is added to 
blk_4529900263997112706_1017 size 48857
[junit] 10/01/07 02:23:09 INFO DataNode.clienttrace: src: /127.0.0.1:38624, 
dest: /127.0.0.1:50773, bytes: 48857, op: HDFS_WRITE, cliID: 
DFSClient_1941479077, srvID: DS-1435789546-127.0.1.1-50773-1262830956883, 
blockid: blk_4529900263997112706_1017
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: PacketResponder 2 for 
block blk_4529900263997112706_1017 terminating
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: BLOCK* 
NameSystem.addStoredBlock: blockMap updated: 127.0.0.1:50773 is added to 
blk_4529900263997112706_1017 size 48857
[junit] 10/01/07 02:23:09 INFO hdfs.StateChange: DIR* 
NameSystem.completeFile: file 
/tmp/temp-1306912187/tmp-891082635/_logs/history/localhost_1262830957897_job_20100107022237874_0002_conf.xml
 is closed by DFSClient_1941479077
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Deleting block 
blk_-4766788746537872722_1006 file 
build/test/data/dfs/data/data3/current/blk_-4766788746537872722
[junit] 10/01/07 02:23:09 INFO datanode.DataNode: Deleting block 
blk_6737284167407333489_1007 file 
build/test/data/dfs/data/data4/current/blk_6737284167407333489
[junit] 10/01/07 02:23:09 INFO mapReduceLayer.MapReduceLauncher: Submitting 
job: job_20100107022237874_0002 to execution engine.
[junit] 10/01/07 02:23:09 INFO mapReduceLayer.MapReduceLauncher: More 
information at: 
http://localhost:54010/jobdetails.jsp?jobid=job_20100107022237874_0002
[junit] 10/01/07 02:23:09 INFO mapReduceLayer.MapReduceLauncher: To kill 
this job, use: kill job_20100107022237874_0002
[junit] 10/01/07 02:23:10 INFO FSNamesystem.audit: ugi=hudson,hudson
ip=/127.0.0.1   cmd=open
src=/tmp/hadoop-hudson/mapred/system/job_20100107022237874_0002/job.split   
dst=null    perm=null
[junit] 10/01/07 02:23:10 INFO DataNode.clienttrace: src: /127.0.0.1:45812, 
dest: /127.0.0.1:34580, bytes: 1605, op: HDFS_READ, cliID: 
DFSClient_1941479077, srvID: DS-1109484639-127.0.1.1-45812-1262830957804, 
blockid: blk_-124486616345040_1014
[junit] 10/01/07 02:23:10 INFO mapred.JobInProgress: Input size for job 
job_20100107022237874_0002 = 12. Number of splits = 2
[junit] 10/01/07 02:23:10 INFO mapred.JobInProgress: 
tip:task_20100107022237874_0002_m_00 has split on 
node:/default-rack/h7.grid.sp2.yahoo.net
[junit] 10/01/07 02:23:10 INFO mapred.JobInProgress: 
tip:task_20100107022237874_0002_m_01 has split on 
node:/default-rack/h7.grid.sp2.yahoo.net
[junit] 10/01/07 02:23:10 INFO mapred.JobTracker: Adding task 
'attempt_20100107022237874_0002_m_03_0' to tip 
task_20100107022237874_0002_m_03, for tracker 
'tracker_host2.foo.com:localhost/127.0.0.1:40200'
[junit] 10/01/07 02:23:10 INFO mapred.TaskTracker: LaunchTaskAction 
(registerTask): attempt_20100107022237874_0002_m_03_0 task's 
state:UNASSIGNED
[junit] 10/01/07 02:23:10 INFO mapred.TaskTracker: Trying to launch : 
attempt_20100107022237874_0002_m_03_0
[junit] 10/01/07 02:23:10 INFO mapred.TaskTracker: In TaskLauncher, current 
free slots : 2 and trying to launch 

[jira] Commented: (PIG-480) PERFORMANCE: Use identity mapper in a chain of M-R jobs

2010-01-06 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797518#action_12797518
 ] 

Hadoop QA commented on PIG-480:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12429598/PIG_480.patch
  against trunk revision 896606.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 230 javac compiler warnings (more 
than the trunk's current 212 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

-1 release audit.  The applied patch generated 482 release audit warnings 
(more than the trunk's current 481 warnings).

-1 core tests.  The patch failed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/168/console

This message is automatically generated.

 PERFORMANCE: Use identity mapper in a chain of M-R jobs
 ---

 Key: PIG-480
 URL: https://issues.apache.org/jira/browse/PIG-480
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Olga Natkovich
Assignee: Ying He
 Attachments: PIG_480.patch, PIG_480.patch, PIG_480.patch


 For jobs with two or more MR jobs, use an identity mapper wherever possible in 
 the second and subsequent MR jobs. The identity mapper is about 50% faster 
 than the Pig empty map job because it doesn't parse the data. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.