[jira] Created: (PIG-805) removing dependency on consolidated hadoop.jar from pig.jar

2009-05-08 Thread Olga Natkovich (JIRA)
removing dependency on consolidated hadoop.jar from pig.jar
---

 Key: PIG-805
 URL: https://issues.apache.org/jira/browse/PIG-805
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Giridharan Kesavan


The proposal is to

- for compilation, always use dependencies from Ivy
- for packaging, have 2 targets:

(1) The current jar target will no longer package hadoop.jar and will rely on 
having it available at runtime
(2) Add a new target, jar_with_hadoop, that does what the current jar target does
(3) Add a property that allows building the release tar with or without 
hadoop.jar embedded into pig.
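
Since option (1) relies on hadoop.jar being available at runtime, a launcher 
could check for it up front and fail with a clear message. The following is 
only a rough sketch of that idea, not part of the proposal: the class name 
HadoopOnClasspath and the helper isClassAvailable are hypothetical, and 
org.apache.hadoop.conf.Configuration is used purely as a probe class.

```java
/** Sketch: detect whether Hadoop classes are reachable at runtime. */
final class HadoopOnClasspath {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isClassAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run static initializers.
            Class.forName(className, false, HadoopOnClasspath.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Probe class chosen for illustration; any core Hadoop class would do.
        String probe = "org.apache.hadoop.conf.Configuration";
        if (isClassAvailable(probe)) {
            System.out.println("Hadoop found on the classpath.");
        } else {
            System.err.println("Hadoop not found; add hadoop.jar to the classpath "
                    + "or use a pig.jar built with the proposed jar_with_hadoop target.");
        }
    }
}
```

A check like this would only matter for the slimmer jar; a jar built with the 
proposed jar_with_hadoop target would carry the classes itself.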

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Reopened: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reopened PIG-794:



I think it was closed by mistake. The final patch has not been reviewed or 
committed yet.

> Use Avro serialization in Pig
> -
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Rakesh Setty
> Fix For: 0.2.0
>
> Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
> jackson-asl-0.9.4.jar
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.




[jira] Commented: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707576#action_12707576
 ] 

Rakesh Setty commented on PIG-794:
--

There was one important change I had to make to the Avro format in AvroStorage 
to get it working. The map keys were stored as String objects; I had to change 
this so that both keys and values can be Object instances. Please let me know 
if this is an issue.
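
To illustrate what allowing Object keys implies for a serializer: each key has 
to carry its type alongside its value, which is not needed when keys are always 
strings. The sketch below is not the AvroStorage or Avro format; the class 
TaggedMapCodec, its tag values, and the set of supported types are all 
assumptions made up for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: a map codec where every key and value is prefixed with a type tag. */
final class TaggedMapCodec {
    // Hypothetical tag values, chosen only for this example.
    private static final byte TAG_STRING = 0;
    private static final byte TAG_INT = 1;
    private static final byte TAG_LONG = 2;

    /** Writes one atom as a tag byte followed by its payload. */
    static void writeAtom(DataOutputStream out, Object o) throws IOException {
        if (o instanceof String) {
            out.writeByte(TAG_STRING);
            out.writeUTF((String) o);
        } else if (o instanceof Integer) {
            out.writeByte(TAG_INT);
            out.writeInt((Integer) o);
        } else if (o instanceof Long) {
            out.writeByte(TAG_LONG);
            out.writeLong((Long) o);
        } else {
            throw new IOException("unsupported type: "
                    + (o == null ? "null" : o.getClass().getName()));
        }
    }

    /** Reads one atom, using the tag byte to decide how to decode the payload. */
    static Object readAtom(DataInputStream in) throws IOException {
        byte tag = in.readByte();
        switch (tag) {
            case TAG_STRING: return in.readUTF();
            case TAG_INT:    return in.readInt();
            case TAG_LONG:   return in.readLong();
            default: throw new IOException("unknown tag: " + tag);
        }
    }

    /** Encodes the entry count, then each key and value as tagged atoms. */
    static byte[] encode(Map<Object, Object> map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(map.size());
        for (Map.Entry<Object, Object> e : map.entrySet()) {
            writeAtom(out, e.getKey());   // the key type must be recorded per entry
            writeAtom(out, e.getValue());
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Decodes a byte array produced by encode back into a map. */
    static Map<Object, Object> decode(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int n = in.readInt();
        Map<Object, Object> map = new LinkedHashMap<>();
        for (int i = 0; i < n; i++) {
            Object key = readAtom(in);
            Object value = readAtom(in);
            map.put(key, value);
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Map<Object, Object> m = new LinkedHashMap<>();
        m.put("name", "pig");
        m.put(7, 42L);
        System.out.println(decode(encode(m)).equals(m));
    }
}
```

With string-only keys, the tag on the key side could be dropped and the key 
written directly with writeUTF.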

Thanks,
Rakesh

> Use Avro serialization in Pig
> -
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Rakesh Setty
> Fix For: 0.2.0
>
> Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
> jackson-asl-0.9.4.jar
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.




[jira] Commented: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707570#action_12707570
 ] 

Rakesh Setty commented on PIG-794:
--

The new patch has unit tests. The comments are already in javadoc format. 
Please let me know if I have missed anything.

> Use Avro serialization in Pig
> -
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Rakesh Setty
> Fix For: 0.2.0
>
> Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
> jackson-asl-0.9.4.jar
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.




[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

   Resolution: Fixed
Fix Version/s: 0.2.0
   Status: Resolved  (was: Patch Available)

> Use Avro serialization in Pig
> -
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Rakesh Setty
> Fix For: 0.2.0
>
> Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
> jackson-asl-0.9.4.jar
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.




[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

Attachment: avro-0.1-dev-java.jar
AvroStorage.patch

Attaching the new patch along with the latest avro jar.

> Use Avro serialization in Pig
> -
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Rakesh Setty
> Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
> jackson-asl-0.9.4.jar
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.




[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

Attachment: (was: avro-0.1-dev-java.jar)

> Use Avro serialization in Pig
> -
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Rakesh Setty
> Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
> jackson-asl-0.9.4.jar
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.




[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

Attachment: (was: AvroStorage.patch)

> Use Avro serialization in Pig
> -
>
> Key: PIG-794
> URL: https://issues.apache.org/jira/browse/PIG-794
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Rakesh Setty
> Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
> jackson-asl-0.9.4.jar
>
>
> We would like to use Avro serialization in Pig to pass data between MR jobs 
> instead of the current BinStorage. Attached is an implementation of 
> AvroBinStorage which performs significantly better compared to BinStorage on 
> our benchmarks.




[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Attachment: PIG-626.patch

A version of the patch that deals with the findbugs and javac warnings.

> Statistics (records read by each mapper and reducer)
> 
>
> Key: PIG-626
> URL: https://issues.apache.org/jira/browse/PIG-626
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, 
> pigStats.patch, pigStats.patch, pigStats.patch, 
> TEST-org.apache.pig.test.TestBZip.txt
>
>
> This uses the counters framework that hadoop has. Initially, I am just 
> interested in finding out the number of records read by each mapper/reducer, 
> particularly for the last job in any script. Sample code to access the 
> statistics for the last job:
>
>   String reducePlan =
>       stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
>   if (reducePlan == null) {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
>   } else {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
>   }
>
> The patch contains 7 test cases. These include tests for PigStorage and 
> BinStorage, along with one for the multiple-MR-jobs case.
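
The lookup in the quoted sample can be tried without a cluster by standing in a 
plain nested map for the statistics object. This is only a sketch: it assumes 
that getPigStats() yields a map from job ID to a map of counter name to string 
value, which the sample suggests but does not state. The class LastJobRecords, 
the helper recordsWritten, the job ID job_0001, and the counter value are all 
made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: choose the map or reduce output counter for the last job. */
final class LastJobRecords {

    /**
     * Returns the output-record counter for the given job, or null if the job
     * is unknown. A map-only job has no reduce plan, so its map output counter
     * is used; otherwise the reduce output counter is used.
     */
    static String recordsWritten(Map<String, Map<String, String>> pigStats,
                                 String lastJobId) {
        Map<String, String> jobStats = pigStats.get(lastJobId);
        if (jobStats == null) {
            return null;
        }
        String key = jobStats.get("PIG_STATS_REDUCE_PLAN") == null
                ? "PIG_STATS_MAP_OUTPUT_RECORDS"
                : "PIG_STATS_REDUCE_OUTPUT_RECORDS";
        return jobStats.get(key);
    }

    public static void main(String[] args) {
        // Stand-in data: a map-only job with a made-up ID and counter value.
        Map<String, String> jobStats = new HashMap<>();
        jobStats.put("PIG_STATS_MAP_OUTPUT_RECORDS", "3");
        Map<String, Map<String, String>> pigStats = new HashMap<>();
        pigStats.put("job_0001", jobStats);

        System.out.println("Records written : " + recordsWritten(pigStats, "job_0001"));
    }
}
```

Fetching the per-job map once also avoids repeating the 
getPigStats().get(getLastJobID()) chain for every counter.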




[jira] Reopened: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reopened PIG-626:



I should have checked my other window before I marked the bug as fixed.  The 
commit failed; I can't seem to contact Apache's SVN at the moment.  I'll commit 
the patch once I can.

> Statistics (records read by each mapper and reducer)
> 
>
> Key: PIG-626
> URL: https://issues.apache.org/jira/browse/PIG-626
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
> pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt
>
>
> This uses the counters framework that hadoop has. Initially, I am just 
> interested in finding out the number of records read by each mapper/reducer, 
> particularly for the last job in any script. Sample code to access the 
> statistics for the last job:
>
>   String reducePlan =
>       stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
>   if (reducePlan == null) {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
>   } else {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
>   }
>
> The patch contains 7 test cases. These include tests for PigStorage and 
> BinStorage, along with one for the multiple-MR-jobs case.




[jira] Commented: (PIG-175) Reading compressed files in local mode + MiniMRCluster

2009-05-08 Thread Craig Macdonald (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707468#action_12707468
 ] 

Craig Macdonald commented on PIG-175:
-

Enclosed are updated results for Pig 0.2.0. In this version, MapReduce mode can 
now always parse gzip and bzip2 files; however, local mode cannot.


{noformat}
==
Bashs good friend: cat
==
Normal
A
B
C
bz2
A
B
C
gzip
A
B
C
==
MiniMRCluster
==
test.all.pig
2009-05-08 19:56:51,715 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: file:///
2009-05-08 19:56:52,034 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - 
Initializing JVM Metrics with processName=JobTracker, sessionId=
2009-05-08 19:56:54,686 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already 
initialized
2009-05-08 19:56:54,717 [Thread-3] WARN  org.apache.hadoop.mapred.JobClient - 
Use GenericOptionsParser for parsing the arguments. Applications should 
implement Tool for the same.
2009-05-08 19:56:55,718 [Thread-9] INFO  org.apache.hadoop.mapred.MapTask - 
numReduceTasks: 0
2009-05-08 19:56:56,015 [Thread-9] INFO  
org.apache.hadoop.mapred.LocalJobRunner - 
2009-05-08 19:56:56,020 [Thread-9] INFO  org.apache.hadoop.mapred.TaskRunner - 
Task 'attempt_local_0001_m_00_0' done.
2009-05-08 19:56:56,030 [Thread-9] INFO  org.apache.hadoop.mapred.TaskRunner - 
Saved output of task 'attempt_local_0001_m_00_0' to 
file:/tmp/temp442336691/tmp1233577046
2009-05-08 19:56:59,714 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2009-05-08 19:57:04,720 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 100% complete
2009-05-08 19:57:04,720 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Success!
(A)
(B)
(C)
2009-05-08 19:57:06,148 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already 
initialized
2009-05-08 19:57:06,153 [Thread-10] WARN  org.apache.hadoop.mapred.JobClient - 
Use GenericOptionsParser for parsing the arguments. Applications should 
implement Tool for the same.
2009-05-08 19:57:06,450 [Thread-16] INFO  org.apache.hadoop.mapred.MapTask - 
numReduceTasks: 0
2009-05-08 19:57:06,512 [Thread-16] INFO  
org.apache.hadoop.mapred.LocalJobRunner - 
2009-05-08 19:57:06,514 [Thread-16] INFO  org.apache.hadoop.mapred.TaskRunner - 
Task 'attempt_local_0002_m_00_0' done.
2009-05-08 19:57:06,519 [Thread-16] INFO  org.apache.hadoop.mapred.TaskRunner - 
Saved output of task 'attempt_local_0002_m_00_0' to 
file:/tmp/temp442336691/tmp-1848149730
2009-05-08 19:57:11,152 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2009-05-08 19:57:16,154 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 100% complete
2009-05-08 19:57:16,154 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Success!
(A)
(B)
(C)
2009-05-08 19:57:17,114 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - 
Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already 
initialized
2009-05-08 19:57:17,118 [Thread-17] WARN  org.apache.hadoop.mapred.JobClient - 
Use GenericOptionsParser for parsing the arguments. Applications should 
implement Tool for the same.
2009-05-08 19:57:17,359 [Thread-23] INFO  org.apache.hadoop.mapred.MapTask - 
numReduceTasks: 0
2009-05-08 19:57:17,520 [Thread-23] INFO  
org.apache.hadoop.mapred.LocalJobRunner - 
2009-05-08 19:57:17,523 [Thread-23] INFO  org.apache.hadoop.mapred.TaskRunner - 
Task 'attempt_local_0003_m_00_0' done.
2009-05-08 19:57:17,528 [Thread-23] INFO  org.apache.hadoop.mapred.TaskRunner - 
Saved output of task 'attempt_local_0003_m_00_0' to 
file:/tmp/temp442336691/tmp97423898
2009-05-08 19:57:22,119 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 0% complete
2009-05-08 19:57:27,122 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- 100% complete
2009-05-08 19:57:27,122 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher 
- Success!
(A)
(B)
(C)
test.bz2.pig
2009-05-08 19:57:28,096 [main] INFO  
org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to 
hadoop file system at: file:///
2009-05-08 19:57:28,401 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - 
Initializing JVM Metrics with processName=JobTracker, session

[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

   Resolution: Fixed
Fix Version/s: 0.3.0
   Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Shubham for your patience on this one.

> Statistics (records read by each mapper and reducer)
> 
>
> Key: PIG-626
> URL: https://issues.apache.org/jira/browse/PIG-626
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
> pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt
>
>
> This uses the counters framework that hadoop has. Initially, I am just 
> interested in finding out the number of records read by each mapper/reducer, 
> particularly for the last job in any script. Sample code to access the 
> statistics for the last job:
>
>   String reducePlan =
>       stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
>   if (reducePlan == null) {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
>   } else {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
>   }
>
> The patch contains 7 test cases. These include tests for PigStorage and 
> BinStorage, along with one for the multiple-MR-jobs case.




[jira] Commented: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707432#action_12707432
 ] 

Hadoop QA commented on PIG-626:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12407613/pigStats.patch
  against trunk revision 772750.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 226 javac compiler warnings (more 
than the trunk's current 225 warnings).

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/console

This message is automatically generated.

> Statistics (records read by each mapper and reducer)
> 
>
> Key: PIG-626
> URL: https://issues.apache.org/jira/browse/PIG-626
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
> pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt
>
>
> This uses the counters framework that hadoop has. Initially, I am just 
> interested in finding out the number of records read by each mapper/reducer, 
> particularly for the last job in any script. Sample code to access the 
> statistics for the last job:
>
>   String reducePlan =
>       stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
>   if (reducePlan == null) {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
>   } else {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
>   }
>
> The patch contains 7 test cases. These include tests for PigStorage and 
> BinStorage, along with one for the multiple-MR-jobs case.




[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Assignee: Shubham Chopra
  Status: Patch Available  (was: Open)

> Statistics (records read by each mapper and reducer)
> 
>
> Key: PIG-626
> URL: https://issues.apache.org/jira/browse/PIG-626
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Shubham Chopra
>Assignee: Shubham Chopra
>Priority: Minor
> Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
> pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt
>
>
> This uses the counters framework that hadoop has. Initially, I am just 
> interested in finding out the number of records read by each mapper/reducer, 
> particularly for the last job in any script. Sample code to access the 
> statistics for the last job:
>
>   String reducePlan =
>       stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
>   if (reducePlan == null) {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
>   } else {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
>   }
>
> The patch contains 7 test cases. These include tests for PigStorage and 
> BinStorage, along with one for the multiple-MR-jobs case.




[jira] Commented: (PIG-734) Non-string keys in maps

2009-05-08 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707381#action_12707381
 ] 

Alan Gates commented on PIG-734:


I wasn't planning on making mymap#1 translate to mymap#'1'.  The issue I see 
with that is if that works, why doesn't mymap#intcol work?  I'm concerned that 
saying keys need to be strings but then cheating in this one case will make 
the semantics confusing.
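
The distinction between the key 1 and the key '1' can be seen in plain Java, 
independent of Pig. The sketch below is an illustration only and says nothing 
about how Pig implements map dereference: the class MapKeyTypes and both 
helpers are hypothetical. With mixed key types, the integer 1 and the string 
"1" are separate entries; with string-only keys, treating an integer as its 
string form is an explicit conversion step.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: integer and string keys are distinct unless explicitly converted. */
final class MapKeyTypes {

    /** Builds a map that holds both the Integer key 1 and the String key "1". */
    static Map<Object, Object> mixedKeyMap() {
        Map<Object, Object> m = new HashMap<>();
        m.put(1, "int key");      // autoboxed to Integer
        m.put("1", "string key"); // a different key: Integer 1 never equals "1"
        return m;
    }

    /** Looks up a key in a string-keyed map after converting it to a string. */
    static Object lookupAsString(Map<String, Object> m, Object key) {
        return m.get(String.valueOf(key));
    }

    public static void main(String[] args) {
        Map<Object, Object> mixed = mixedKeyMap();
        System.out.println("entries in mixed map: " + mixed.size());

        Map<String, Object> stringKeyed = new HashMap<>();
        stringKeyed.put("1", "string key");
        System.out.println("lookup with integer 1: " + lookupAsString(stringKeyed, 1));
    }
}
```

The conversion in lookupAsString works for a literal, but applying the same 
step to a column value is exactly the mymap#intcol case raised above.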

> Non-string keys in maps
> ---
>
> Key: PIG-734
> URL: https://issues.apache.org/jira/browse/PIG-734
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: PIG-734.patch
>
>
> With the addition of types to pig, maps were changed to allow any atomic type 
> to be a key.  However, in practice we do not see people using keys other than 
> strings.  And allowing multiple types is causing us issues in serializing 
> data (we have to check what every key type is) and in the design for non-java 
> UDFs (since many scripting languages include associative arrays such as 
> Perl's hash).
> So I propose we scope back maps to only have string keys.  This would be a 
> non-compatible change.  But I am not aware of anyone using non-string keys, 
> so hopefully it would have little or no impact.




[jira] Commented: (PIG-734) Non-string keys in maps

2009-05-08 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12707377#action_12707377
 ] 

David Ciemiewicz commented on PIG-734:
--

Alan, I don't think this is going to be that problematic.

If I try to pass in a map dereference with an integer, such as mymap#1, would 
pig automagically convert the 1 to a string, making it equivalent to 
mymap#'1'?  If so, I think this would be quite acceptable.

> Non-string keys in maps
> ---
>
> Key: PIG-734
> URL: https://issues.apache.org/jira/browse/PIG-734
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.2.0
>Reporter: Alan Gates
>Assignee: Alan Gates
>Priority: Minor
> Fix For: 0.3.0
>
> Attachments: PIG-734.patch
>
>
> With the addition of types to pig, maps were changed to allow any atomic type 
> to be a key.  However, in practice we do not see people using keys other than 
> strings.  And allowing multiple types is causing us issues in serializing 
> data (we have to check what every key type is) and in the design for non-java 
> UDFs (since many scripting languages include associative arrays such as 
> Perl's hash).
> So I propose we scope back maps to only have string keys.  This would be a 
> non-compatible change.  But I am not aware of anyone using non-string keys, 
> so hopefully it would have little or no impact.




[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Shubham Chopra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chopra updated PIG-626:
---

Attachment: pigStats.patch

Patch compatible with the latest trunk attached.

> Statistics (records read by each mapper and reducer)
> 
>
> Key: PIG-626
> URL: https://issues.apache.org/jira/browse/PIG-626
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.2.0
>Reporter: Shubham Chopra
>Priority: Minor
> Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
> pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt
>
>
> This uses the counters framework that hadoop has. Initially, I am just 
> interested in finding out the number of records read by each mapper/reducer, 
> particularly for the last job in any script. Sample code to access the 
> statistics for the last job:
>
>   String reducePlan =
>       stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_PLAN");
>   if (reducePlan == null) {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_MAP_OUTPUT_RECORDS"));
>   } else {
>       System.out.println("Records written : " +
>           stats.getPigStats().get(stats.getLastJobID()).get("PIG_STATS_REDUCE_OUTPUT_RECORDS"));
>   }
>
> The patch contains 7 test cases. These include tests for PigStorage and 
> BinStorage, along with one for the multiple-MR-jobs case.
