[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Status: Patch Available  (was: Reopened)

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, pigStats.patch, 
 TEST-org.apache.pig.test.TestBZip.txt


 This uses Hadoop's counters framework. Initially, I am just interested in 
 finding out the number of records read by each mapper/reducer, particularly 
 for the last job in any script. Sample code to access the statistics for 
 the last job:
     String reducePlan =
         stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
     if (reducePlan == null) {
         // Map-only job: the output records are the map output records.
         System.out.println("Records written: " +
             stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
     } else {
         // The job has a reduce phase: report the reduce output records.
         System.out.println("Records written: " +
             stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
     }
 The patch contains 7 test cases. These include tests for PigStorage and 
 BinStorage, along with one for the multiple-MR-job case.
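 To make the lookup logic in the sample concrete, here is a small self-contained sketch. It models the statistics as a hypothetical map from job ID to a map of statistic name to value; the patch's real types and the actual values of the PIG_STATS_* constants may differ, so the constants below are placeholders. It reports the reduce output record count when the job has a reduce plan and the map output record count otherwise.

```java
import java.util.HashMap;
import java.util.Map;

class LastJobRecords {
    // Placeholder keys: assumed for illustration, not Pig's actual constant values.
    static final String PIG_STATS_REDUCE_PLAN = "PIG_STATS_REDUCE_PLAN";
    static final String PIG_STATS_MAP_OUTPUT_RECORDS = "PIG_STATS_MAP_OUTPUT_RECORDS";
    static final String PIG_STATS_REDUCE_OUTPUT_RECORDS = "PIG_STATS_REDUCE_OUTPUT_RECORDS";

    // Returns the output record count of the given job, or null if the job is unknown.
    static String recordsWritten(Map<String, Map<String, String>> pigStats, String jobId) {
        Map<String, String> jobStats = pigStats.get(jobId);
        if (jobStats == null) {
            return null;
        }
        // A map-only job has no reduce plan, so its output is the map output.
        String key = jobStats.get(PIG_STATS_REDUCE_PLAN) == null
                ? PIG_STATS_MAP_OUTPUT_RECORDS
                : PIG_STATS_REDUCE_OUTPUT_RECORDS;
        return jobStats.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> mapOnly = new HashMap<>();
        mapOnly.put(PIG_STATS_MAP_OUTPUT_RECORDS, "42");

        Map<String, String> withReduce = new HashMap<>();
        withReduce.put(PIG_STATS_REDUCE_PLAN, "reduce plan");
        withReduce.put(PIG_STATS_MAP_OUTPUT_RECORDS, "100");
        withReduce.put(PIG_STATS_REDUCE_OUTPUT_RECORDS, "7");

        Map<String, Map<String, String>> pigStats = new HashMap<>();
        pigStats.put("job_0001", mapOnly);
        pigStats.put("job_0002", withReduce);

        String lastJobId = "job_0002"; // hypothetical ID of the last job
        System.out.println("Records written: " + recordsWritten(pigStats, lastJobId)); // Records written: 7
    }
}
```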

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-799) Unit tests on windows are failing after multiquery commit

2009-05-11 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-799:
---

Attachment: PIG-799.patch

The failure is caused by the changed logic of QueryParser.massageFilename in 
the multi-query patch. I have attached a patch; please review.

 Unit tests on windows are failing after multiquery commit
 -

 Key: PIG-799
 URL: https://issues.apache.org/jira/browse/PIG-799
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Attachments: PIG-799.patch


 Daniel, could you take a look? It should be reproducible with the latest 
 trunk. Thanks.




[jira] Updated: (PIG-799) Unit tests on windows are failing after multiquery commit

2009-05-11 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-799:
---

Status: Patch Available  (was: Open)




[jira] Updated: (PIG-781) Error reporting for failed MR jobs

2009-05-11 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-781:
---

Status: Patch Available  (was: Open)

 Error reporting for failed MR jobs
 --

 Key: PIG-781
 URL: https://issues.apache.org/jira/browse/PIG-781
 Project: Pig
  Issue Type: Improvement
Reporter: Gunther Hagleitner
 Attachments: partial_failure.patch, partial_failure.patch


 If we have multiple MR jobs to run and some of them fail, the system does not 
 stop at the first failure but keeps going. That way, jobs that do not depend 
 on the failed job might still succeed.
 The question is how best to report this scenario to the user: how do we tell 
 which jobs failed and which didn't?
 One way could be to tie jobs to stores and report which store locations will 
 have data and which won't.
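 A rough sketch of the "tie jobs to stores" idea, under assumptions of our own: jobs are given in dependency (topological) order, each job lists the jobs it depends on and the store locations it writes, and we know which jobs failed when run. A job whose dependency failed never runs, while independent jobs still can, so a store location is reported as having data only if its job succeeded. The Job class and storeStatus method are hypothetical, not Pig classes.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class StoreFailureReport {
    // Hypothetical description of one MR job: its name, the jobs it depends on,
    // and the store locations it writes.
    static final class Job {
        final String name;
        final List<String> dependsOn;
        final List<String> stores;

        Job(String name, List<String> dependsOn, List<String> stores) {
            this.name = name;
            this.dependsOn = dependsOn;
            this.stores = stores;
        }
    }

    // Maps each store location to true if it will have data, false otherwise.
    // Assumes jobsInOrder is topologically sorted (dependencies come first).
    static Map<String, Boolean> storeStatus(List<Job> jobsInOrder, Set<String> failedJobs) {
        Set<String> succeeded = new HashSet<>();
        Map<String, Boolean> status = new LinkedHashMap<>();
        for (Job job : jobsInOrder) {
            // A job runs only if all of its dependencies succeeded;
            // independent jobs keep going after an unrelated failure.
            boolean ran = succeeded.containsAll(job.dependsOn);
            boolean ok = ran && !failedJobs.contains(job.name);
            if (ok) {
                succeeded.add(job.name);
            }
            for (String store : job.stores) {
                status.put(store, ok);
            }
        }
        return status;
    }

    public static void main(String[] args) {
        List<Job> jobs = List.of(
                new Job("A", List.of(), List.of("/out/a")),
                new Job("B", List.of("A"), List.of("/out/b")),
                new Job("C", List.of(), List.of("/out/c")));
        Map<String, Boolean> status = storeStatus(jobs, Set.of("A"));
        for (Map.Entry<String, Boolean> e : status.entrySet()) {
            System.out.println(e.getKey() + (e.getValue() ? ": has data" : ": no data"));
        }
    }
}
```

 With job A failing, this reports /out/a and /out/b as having no data (B depends on A) and /out/c as having data, since C is independent of A.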




[jira] Assigned: (PIG-788) Proposal to remove float from Pig data types

2009-05-11 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reassigned PIG-788:
--

Assignee: Alan Gates

 Proposal to remove float from Pig data types
 

 Key: PIG-788
 URL: https://issues.apache.org/jira/browse/PIG-788
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates

 Pig would like to use the new Hadoop Avro serialization package to pass data 
 between MR jobs, and eventually between Pig and UDFs that are not written in 
 Java. Avro will support only double, not float (see AVRO-17). Pig currently 
 supports both float and double. Double is the default floating-point type (so 
 if the user writes x + 1.0, 1.0 is taken to be a double, not a float). Float 
 was initially included in the list of Pig types because Hadoop supported it as 
 one of the Writable types, and we were trying to make sure all of Hadoop's 
 Writable types could be represented in Pig.
 In practice we do not see anyone using the float type. To make it easy to use 
 Avro, I propose dropping the float type.
 Please speak up if you are using the float type and have a compelling reason 
 not to use double.
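 For illustration only (not part of the proposal): Java follows the same literal rule as Pig, so an unsuffixed 1.0 is a double. Widening a float to a double is exact, which is what would let existing float values be carried as doubles without losing information; only their decimal rendering changes. The class and method names below are hypothetical.

```java
class FloatToDouble {
    // Returns true if widening f to double and narrowing back gives the same float.
    static boolean roundTrips(float f) {
        double widened = f; // float -> double widening conversion is exact
        // Compare bit patterns so that NaN also counts as round-tripping.
        return Float.floatToIntBits((float) widened) == Float.floatToIntBits(f);
    }

    public static void main(String[] args) {
        Object literal = 1.0; // an unsuffixed literal is a double, boxed as Double
        System.out.println(literal.getClass().getSimpleName()); // Double

        float x = 0.1f;
        double widened = x;
        System.out.println(x);             // 0.1
        System.out.println(widened);       // 0.10000000149011612
        System.out.println(roundTrips(x)); // true
    }
}
```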




[jira] Commented: (PIG-788) Proposal to remove float from Pig data types

2009-05-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708274#action_12708274
 ] 

Santhosh Srinivasan commented on PIG-788:
-

-1 on this jira for the following reasons:

1. Floats take 4 bytes, whereas doubles take 8 bytes.
2. Operations on floats are much faster than operations on doubles.
3. Dropping float breaks backward compatibility, and the price is slower 
performance, not faster performance.
4. A storage layer should not dictate how a higher layer evolves.
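Point 1 can be checked directly. This minimal sketch writes the same values as raw binary floats and doubles into a java.nio.ByteBuffer and compares the sizes. It assumes a plain fixed-width encoding, which is not necessarily how Pig or Avro serialize these types, and it says nothing about point 2 (speed), which would need a proper benchmark. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;

class FloatVsDoubleSize {
    // Bytes needed to store n values of each type in raw fixed-width binary form.
    static int floatBytes(int n) { return n * Float.BYTES; }   // 4 bytes per float
    static int doubleBytes(int n) { return n * Double.BYTES; } // 8 bytes per double

    public static void main(String[] args) {
        int n = 1_000_000;
        ByteBuffer floats = ByteBuffer.allocate(floatBytes(n));
        ByteBuffer doubles = ByteBuffer.allocate(doubleBytes(n));
        for (int i = 0; i < n; i++) {
            floats.putFloat(i * 0.5f);
            doubles.putDouble(i * 0.5);
        }
        // position() is the number of bytes written so far.
        System.out.println("float bytes:  " + floats.position());  // 4000000
        System.out.println("double bytes: " + doubles.position()); // 8000000
    }
}
```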
