[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Status: Patch Available  (was: Reopened)

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, pigStats.patch, 
 TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Shubham Chopra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chopra updated PIG-626:
---

Attachment: pigStats.patch

Patch compatible with the latest trunk attached.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Assignee: Shubham Chopra
  Status: Patch Available  (was: Open)

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

   Resolution: Fixed
Fix Version/s: 0.3.0
   Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Shubham for your patience on this one.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Attachment: PIG-626.patch

A version of the patch that deals with the findbugs and javac warnings.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, pigStats.patch, 
 TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-02-24 Thread Shubham Chopra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chopra updated PIG-626:
---

Attachment: pigStats.patch

PigStats patch.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: types_branch
Reporter: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-01-29 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Attachment: TEST-org.apache.pig.test.TestBZip.txt

Results of TestBZIp

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: types_branch
Reporter: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch, 
 TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-01-20 Thread Shubham Chopra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chopra updated PIG-626:
---

Attachment: pigStats.patch


1. The initial goal is to just get the pig query result sizes. I try to get the 
no. of records written by the last job which gives me the required metric. In 
addition to this I am making the stats for all the jobs available. I am not 
adding any specific stat to map/reduce.
2. I am only using the hadoop counters here. I am not registering any 
pig-specific counters. Olga had suggested that I make the counters available in 
the form of a map with the keys starting with PIG_STATS to reference the 
required counter. 
Hadoop has a bunch of counters that it uses to display stats in its map-red 
webUI. These counters give the no. of records and bytes read/written by 
map/reduce of each job. I am making all the hadoop stats, at the granularity of 
a record, available.
3. I have rectified this in the new patch.
4. These counters are incremented within hadoop.


 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: types_branch
Reporter: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch


 This uses the counters framework that hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer 
 particularly for the last job in any script. A sample code to access the 
 statistics for the last job:
 String reducePlan = 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if(reducePlan == null) {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
 System.out.println(Records written :  + 
 stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests PigStorage and 
 BinStorage along with one for multiple MR jobs case.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.