[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)
[ https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates updated PIG-626:
---
Status: Patch Available (was: Reopened)

Statistics (records read by each mapper and reducer)
Key: PIG-626
URL: https://issues.apache.org/jira/browse/PIG-626
Project: Pig
Issue Type: New Feature
Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
Fix For: 0.3.0
Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, pigStats.patch, pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt

This uses the counters framework that Hadoop has. Initially, I am just interested in finding out the number of records read by each mapper/reducer, particularly for the last job in any script. Sample code to access the statistics for the last job:

    String reducePlan = stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
    if (reducePlan == null) {
        // map-only job: the map output is what gets written
        System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
    } else {
        System.out.println("Records written : " + stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
    }

The patch contains 7 test cases. These include tests for PigStorage and BinStorage, along with one for the multiple-MR-jobs case.

--
This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
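The selection logic in the PIG-626 sample (use the map output count for a map-only job, otherwise the reduce output count) can be tried outside Pig with a minimal, self-contained Java sketch. It assumes the statistics are a map from job ID to a map of string-valued counters; the key strings, job IDs, and counts are made up for illustration and are not Pig's actual constants or API.

```java
import java.util.HashMap;
import java.util.Map;

public class LastJobRecords {
    // Hypothetical stand-ins for the PIG_STATS_* constants; the real values may differ.
    static final String REDUCE_PLAN = "PIG_STATS_REDUCE_PLAN";
    static final String MAP_OUTPUT_RECORDS = "PIG_STATS_MAP_OUTPUT_RECORDS";
    static final String REDUCE_OUTPUT_RECORDS = "PIG_STATS_REDUCE_OUTPUT_RECORDS";

    // Records written by one job: a map-only job (no reduce plan) writes its
    // map output; otherwise the reduce output is what reaches the store.
    static long recordsWritten(Map<String, Map<String, String>> allStats, String jobId) {
        Map<String, String> job = allStats.get(jobId);
        if (job == null) {
            throw new IllegalArgumentException("no statistics for job " + jobId);
        }
        String key = (job.get(REDUCE_PLAN) == null) ? MAP_OUTPUT_RECORDS : REDUCE_OUTPUT_RECORDS;
        return Long.parseLong(job.get(key));
    }

    // Made-up statistics for two jobs, for illustration only.
    static Map<String, Map<String, String>> demoStats() {
        Map<String, String> mapOnly = new HashMap<>();
        mapOnly.put(MAP_OUTPUT_RECORDS, "42");

        Map<String, String> mapReduce = new HashMap<>();
        mapReduce.put(REDUCE_PLAN, "reduce plan");
        mapReduce.put(MAP_OUTPUT_RECORDS, "100");
        mapReduce.put(REDUCE_OUTPUT_RECORDS, "7");

        Map<String, Map<String, String>> all = new HashMap<>();
        all.put("job_1", mapOnly);
        all.put("job_2", mapReduce);
        return all;
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> stats = demoStats();
        String lastJobId = "job_2"; // stands in for stats.getLastJobID()
        System.out.println("Records written : " + recordsWritten(stats, lastJobId));
    }
}
```

Looking the job's statistics up once, instead of three times as in the sample, also makes the "no statistics for this job" case explicit.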
[jira] Updated: (PIG-799) Unit tests on windows are failing after multiquery commit
[ https://issues.apache.org/jira/browse/PIG-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Daniel Dai updated PIG-799:
---
Attachment: PIG-799.patch

The failure is caused by the changed logic of QueryParser.massageFilename in the multi-query patch. I have attached a patch; please review.

Unit tests on windows are failing after multiquery commit
Key: PIG-799
URL: https://issues.apache.org/jira/browse/PIG-799
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Daniel Dai
Attachments: PIG-799.patch

Daniel, could you take a look? It should be reproducible with the latest trunk. Thanks
[jira] Updated: (PIG-799) Unit tests on windows are failing after multiquery commit
[ https://issues.apache.org/jira/browse/PIG-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-799:
---
Status: Patch Available (was: Open)

Unit tests on windows are failing after multiquery commit
Key: PIG-799
URL: https://issues.apache.org/jira/browse/PIG-799
Project: Pig
Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Daniel Dai
Attachments: PIG-799.patch

Daniel, could you take a look? It should be reproducible with the latest trunk. Thanks
[jira] Updated: (PIG-781) Error reporting for failed MR jobs
[ https://issues.apache.org/jira/browse/PIG-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Olga Natkovich updated PIG-781:
---
Status: Patch Available (was: Open)

Error reporting for failed MR jobs
Key: PIG-781
URL: https://issues.apache.org/jira/browse/PIG-781
Project: Pig
Issue Type: Improvement
Reporter: Gunther Hagleitner
Attachments: partial_failure.patch, partial_failure.patch

If we have multiple MR jobs to run and some of them fail, the system does not stop at the first failure but keeps going. That way, jobs that do not depend on the failed job might still succeed. The question is how best to report this scenario to the user. How do we tell which jobs failed and which didn't? One way could be to tie jobs to stores and report which store locations won't have data and which ones will.
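The "tie jobs to stores" idea from PIG-781 can be sketched in Java under stated assumptions: each job lists the jobs whose output it reads and the store locations it writes, and any job that depends, directly or transitively, on a failed job is treated as unsuccessful. The Job class, job IDs, and store paths are hypothetical; this is an illustration, not Pig's implementation.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class StoreReport {
    // Hypothetical job description: jobs it depends on and store locations it writes.
    static final class Job {
        final String id;
        final List<String> dependsOn;
        final List<String> stores;

        Job(String id, List<String> dependsOn, List<String> stores) {
            this.id = id;
            this.dependsOn = dependsOn;
            this.stores = stores;
        }
    }

    // Failed jobs plus every job that depends, directly or transitively, on one of them.
    static Set<String> unsuccessfulJobs(Map<String, Job> jobs, Set<String> failed) {
        Set<String> bad = new HashSet<>(failed);
        boolean changed = true;
        while (changed) { // repeat until no new job gets marked
            changed = false;
            for (Job job : jobs.values()) {
                if (!bad.contains(job.id) && job.dependsOn.stream().anyMatch(bad::contains)) {
                    bad.add(job.id);
                    changed = true;
                }
            }
        }
        return bad;
    }

    // Store location -> true if it should contain data.
    // Assumes each store location is written by exactly one job.
    static Map<String, Boolean> storeStatus(Map<String, Job> jobs, Set<String> failed) {
        Set<String> bad = unsuccessfulJobs(jobs, failed);
        Map<String, Boolean> status = new TreeMap<>();
        for (Job job : jobs.values()) {
            for (String store : job.stores) {
                status.put(store, !bad.contains(job.id));
            }
        }
        return status;
    }

    // Made-up plan: B reads A's output, C is independent of both.
    static Map<String, Job> demoJobs() {
        Map<String, Job> jobs = new HashMap<>();
        jobs.put("A", new Job("A", List.of(), List.of("out/a")));
        jobs.put("B", new Job("B", List.of("A"), List.of("out/b")));
        jobs.put("C", new Job("C", List.of(), List.of("out/c")));
        return jobs;
    }

    public static void main(String[] args) {
        Map<String, Boolean> status = storeStatus(demoJobs(), Set.of("A"));
        status.forEach((store, ok) ->
            System.out.println(store + (ok ? ": has data" : ": no data (failed or depends on a failed job)")));
    }
}
```

Propagating failure through the dependencies is what lets the report say that out/c has data even though job A failed, while out/b does not. A store written by several jobs would need a different rule than the one assumed here.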
[jira] Assigned: (PIG-788) Proposal to remove float from Pig data types
[ https://issues.apache.org/jira/browse/PIG-788?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Gates reassigned PIG-788:
---
Assignee: Alan Gates

Proposal to remove float from Pig data types
Key: PIG-788
URL: https://issues.apache.org/jira/browse/PIG-788
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates

Pig would like to use the new Hadoop Avro serialization package to pass data between MR jobs, and eventually between Pig and UDFs that are not written in Java. Avro will not support the float data type, only double (see AVRO-17). Pig currently supports both float and double. Double is the default floating-point type (if the user writes x + 1.0, 1.0 is taken to be a double, not a float). Float was initially included in the list of Pig types because Hadoop supported it as one of its Writable types, and we were trying to make sure all of Hadoop's Writable types could be represented in Pig. In practice, we do not see anyone using the float type. To make it easy to use Avro, I propose dropping the float type. Please speak up if you are using the float type and have a compelling reason not to use double.
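Whether dropping float loses information depends on how existing float values would be carried as doubles. The small Java sketch below uses Java semantics, not Pig's, and its class and method names are illustrative. It checks that widening a float to a double and back is lossless, and shows that Java, like the Pig rule quoted above, types an unsuffixed literal such as 1.0 as a double.

```java
public class FloatWidening {
    // Widening float -> double is an exact conversion in Java.
    static double widen(float f) {
        return f;
    }

    // True if a float survives float -> double -> float with identical bits
    // (floatToIntBits maps every NaN to one canonical value).
    static boolean roundTrips(float f) {
        return Float.floatToIntBits((float) widen(f)) == Float.floatToIntBits(f);
    }

    // An unsuffixed literal like 1.0 is a double, so float + 1.0 is promoted to double.
    static boolean sumIsDouble(float x) {
        Object sum = x + 1.0; // the double result is autoboxed to Double
        return sum instanceof Double;
    }

    public static void main(String[] args) {
        float[] samples = {0.1f, -0.0f, Float.MIN_VALUE, Float.MAX_VALUE, Float.NaN, Float.POSITIVE_INFINITY};
        for (float f : samples) {
            System.out.println(f + " -> " + widen(f) + ", round-trips: " + roundTrips(f));
        }
        System.out.println("2.5f + 1.0 is a double: " + sumIsDouble(2.5f));
    }
}
```

Widening keeps every value, but the printed decimal form changes: 0.1f widened to double is the float nearest to 0.1, not the double nearest to 0.1, so converted data may look different to users even though nothing is lost.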
[jira] Commented: (PIG-788) Proposal to remove float from Pig data types
[ https://issues.apache.org/jira/browse/PIG-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12708274#action_12708274 ]

Santhosh Srinivasan commented on PIG-788:
---
-1 on this jira for the following reasons:
1. Floats take 4 bytes, whereas doubles take 8 bytes.
2. Operations on floats are much faster than operations on doubles.
3. Removing float breaks backward compatibility, and the price is slower performance, not faster.
4. A storage layer should not dictate how a higher layer evolves.

Proposal to remove float from Pig data types
Key: PIG-788
URL: https://issues.apache.org/jira/browse/PIG-788
Project: Pig
Issue Type: Improvement
Components: impl
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates

Pig would like to use the new Hadoop Avro serialization package to pass data between MR jobs, and eventually between Pig and UDFs that are not written in Java. Avro will not support the float data type, only double (see AVRO-17). Pig currently supports both float and double. Double is the default floating-point type (if the user writes x + 1.0, 1.0 is taken to be a double, not a float). Float was initially included in the list of Pig types because Hadoop supported it as one of its Writable types, and we were trying to make sure all of Hadoop's Writable types could be represented in Pig. In practice, we do not see anyone using the float type. To make it easy to use Avro, I propose dropping the float type. Please speak up if you are using the float type and have a compelling reason not to use double.
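Point 1 of the comment is easy to quantify for a fixed-width encoding. The Java sketch below (illustrative names; it uses plain java.nio.ByteBuffer, not Avro) serializes the same values as 4-byte floats and as 8-byte doubles. Actual on-disk sizes depend on the serialization format and on any compression, so this only shows the raw per-value cost.

```java
import java.nio.ByteBuffer;

public class FloatVsDoubleSize {
    // Fixed-width IEEE 754 encoding: 4 bytes per float.
    static byte[] encodeAsFloats(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Float.BYTES);
        for (float v : values) {
            buf.putFloat(v);
        }
        return buf.array();
    }

    // The same values widened to double: 8 bytes each.
    static byte[] encodeAsDoubles(float[] values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Double.BYTES);
        for (float v : values) {
            buf.putDouble(v); // implicit float -> double widening
        }
        return buf.array();
    }

    public static void main(String[] args) {
        float[] values = new float[1_000_000];
        for (int i = 0; i < values.length; i++) {
            values[i] = i * 0.5f;
        }
        System.out.println("as float:  " + encodeAsFloats(values).length + " bytes");
        System.out.println("as double: " + encodeAsDoubles(values).length + " bytes");
    }
}
```

With a million values this prints 4000000 and 8000000 bytes: the doubled size is the storage cost of carrying float data as double.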