[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Shubham Chopra (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chopra updated PIG-626:
---

Attachment: pigStats.patch

Patch compatible with the latest trunk attached.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that Hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer, 
 particularly for the last job in any script. Sample code to access the 
 statistics for the last job:

   String reducePlan =
       stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
   if (reducePlan == null) {
       // No reduce plan: the job is map-only, so report the map output count
       System.out.println("Records written : " +
           stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
   } else {
       System.out.println("Records written : " +
           stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
   }

 The patch contains 7 test cases. These include tests for PigStorage and 
 BinStorage, along with one for the multiple-MR-jobs case.
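The lookup in the sample above can be tried without a cluster. This is a minimal sketch, assuming (as the chained get() calls suggest) that the statistics are a map from job ID to a map of statistic name to string value. The class LastJobOutputRecords, the job IDs, and the string values given to the PIG_STATS_* constants are hypothetical placeholders, not Pig's real definitions. Unlike the original snippet, it also returns null for an unknown job instead of throwing a NullPointerException.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the statistics structure: job ID -> (stat name -> value).
// The constant values are placeholders, not the real PIG_STATS_* definitions.
class LastJobOutputRecords {
    static final String PIG_STATS_REDUCE_PLAN = "PIG_STATS_REDUCE_PLAN";
    static final String PIG_STATS_MAP_OUTPUT_RECORDS = "PIG_STATS_MAP_OUTPUT_RECORDS";
    static final String PIG_STATS_REDUCE_OUTPUT_RECORDS = "PIG_STATS_REDUCE_OUTPUT_RECORDS";

    /** Returns the output record count of a job, or null if nothing was recorded. */
    static String recordsWritten(Map<String, Map<String, String>> pigStats, String jobId) {
        Map<String, String> jobStats = pigStats.get(jobId);
        if (jobStats == null) {
            return null; // no statistics for this job ID
        }
        // A job without a reduce plan is map-only, so its map output is its final output.
        String key = jobStats.get(PIG_STATS_REDUCE_PLAN) == null
                ? PIG_STATS_MAP_OUTPUT_RECORDS
                : PIG_STATS_REDUCE_OUTPUT_RECORDS;
        return jobStats.get(key);
    }

    public static void main(String[] args) {
        Map<String, Map<String, String>> pigStats = new HashMap<>();

        Map<String, String> mapOnly = new HashMap<>();
        mapOnly.put(PIG_STATS_MAP_OUTPUT_RECORDS, "3");
        pigStats.put("job_0001", mapOnly);

        Map<String, String> withReduce = new HashMap<>();
        withReduce.put(PIG_STATS_REDUCE_PLAN, "<reduce plan>");
        withReduce.put(PIG_STATS_MAP_OUTPUT_RECORDS, "10");
        withReduce.put(PIG_STATS_REDUCE_OUTPUT_RECORDS, "4");
        pigStats.put("job_0002", withReduce);

        System.out.println("Records written : " + recordsWritten(pigStats, "job_0001")); // prints "Records written : 3"
        System.out.println("Records written : " + recordsWritten(pigStats, "job_0002")); // prints "Records written : 4"
    }
}
```

In a multi-job script, the job ID passed in would be the last job's ID, mirroring getLastJobID() in the original sample.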

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-734) Non-string keys in maps

2009-05-08 Thread David Ciemiewicz (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-734?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707377#action_12707377
 ] 

David Ciemiewicz commented on PIG-734:
--

Alan, I don't think this is going to be that problematic.

Even if I pass in a map dereference with an integer, such as mymap#1, would 
Pig automagically convert the 1 to a string, equivalent to mymap#'1'? If so, 
I think this would be quite acceptable.
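The coercion asked about above can be sketched as follows, assuming maps hold only string keys and that a dereference with a non-string atom is converted with String.valueOf before the lookup. StringKeyedMap and deref are hypothetical names for illustration, not Pig classes or the behavior Pig actually adopted. One consequence of this choice: a double key such as 1.0 becomes '1.0' and would not match an entry stored under '1'.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical string-keyed map: any atomic key used in a dereference is
// converted to its string form before lookup (mymap#1 behaves like mymap#'1').
class StringKeyedMap {
    private final Map<String, Object> entries = new HashMap<>();

    void put(String key, Object value) {
        entries.put(key, value);
    }

    /** Looks up an atomic key (String, Integer, Long, ...) by its string form. */
    Object deref(Object key) {
        // String.valueOf(Integer 1) is "1", so the integer finds the entry stored under "1".
        return entries.get(String.valueOf(key));
    }

    public static void main(String[] args) {
        StringKeyedMap mymap = new StringKeyedMap();
        mymap.put("1", "one");
        System.out.println(mymap.deref(1));   // prints one
        System.out.println(mymap.deref("1")); // prints one
    }
}
```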

 Non-string keys in maps
 ---

 Key: PIG-734
 URL: https://issues.apache.org/jira/browse/PIG-734
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-734.patch


 With the addition of types to Pig, maps were changed to allow any atomic type 
 to be a key.  However, in practice we do not see people using keys other than 
 strings.  And allowing multiple types is causing us issues in serializing 
 data (we have to check what every key type is) and in the design for non-Java 
 UDFs (since many scripting languages include associative arrays such as 
 Perl's hash).
 So I propose we scope back maps to only have string keys.  This would be a 
 backward-incompatible change.  But I am not aware of anyone using non-string 
 keys, so hopefully it would have little or no impact.
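The serialization argument in the description can be sketched as follows. This is a hypothetical illustration, not BinStorage's actual format: it assumes a layout of an entry count followed by key/value pairs, and restricts values to strings for brevity. Because every key is known to be a String, the writer emits no per-key type marker and the reader never branches on key type; with keys of any atomic type, each key would need such a marker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical codec for a string-keyed map (not Pig's real on-disk format).
class StringKeyMapCodec {

    static byte[] write(Map<String, String> map) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(map.size());
        for (Map.Entry<String, String> e : map.entrySet()) {
            // Keys are always Strings, so no type tag precedes the key.
            // With mixed key types, a type byte would be written here and
            // the reader would have to switch on it.
            out.writeUTF(e.getKey());
            out.writeUTF(e.getValue());
        }
        out.flush();
        return bytes.toByteArray();
    }

    static Map<String, String> read(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        int size = in.readInt();
        Map<String, String> map = new LinkedHashMap<>();
        for (int i = 0; i < size; i++) {
            String key = in.readUTF();
            String value = in.readUTF();
            map.put(key, value);
        }
        return map;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> m = new LinkedHashMap<>();
        m.put("name", "pig");
        m.put("version", "0.3.0");
        System.out.println(read(write(m))); // prints {name=pig, version=0.3.0}
    }
}
```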



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Assignee: Shubham Chopra
  Status: Patch Available  (was: Open)

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt



[jira] Commented: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707432#action_12707432
 ] 

Hadoop QA commented on PIG-626:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12407613/pigStats.patch
  against trunk revision 772750.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 226 javac compiler warnings (more 
than the trunk's current 225 warnings).

-1 findbugs.  The patch appears to introduce 2 new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/33/console

This message is automatically generated.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

   Resolution: Fixed
Fix Version/s: 0.3.0
   Status: Resolved  (was: Patch Available)

Patch checked in.  Thanks Shubham for your patience on this one.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt



[jira] Commented: (PIG-175) Reading compressed files in local mode + MiniMRCluster

2009-05-08 Thread Craig Macdonald (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-175?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707468#action_12707468
 ] 

Craig Macdonald commented on PIG-175:
-

Enclosed are updated results for Pig 0.2.0. In this version, MapReduce mode can 
now always parse gzip and bzip2 files; however, local mode cannot.
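For comparison with the local-mode failure described above: Hadoop's CompressionCodecFactory selects a decompression codec from the file suffix (such as .gz or .bz2). The sketch below applies the same idea to a plain local read using only the JDK. The class SuffixAwareReader is hypothetical and is not how Pig's local mode is implemented; it handles only gzip, because the JDK has no bzip2 decoder.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical local reader that picks a decoder from the file suffix,
// in the spirit of Hadoop's CompressionCodecFactory. Not Pig's implementation.
class SuffixAwareReader {

    /** Reads all lines of a plain or gzip-compressed text file. */
    static List<String> readLines(String path) throws IOException {
        if (path.endsWith(".bz2")) {
            // The JDK ships no bzip2 decoder; an external library would be needed.
            throw new IOException("bzip2 is not supported in this sketch: " + path);
        }
        List<String> lines = new ArrayList<>();
        try (InputStream raw = new FileInputStream(path);
             InputStream in = path.endsWith(".gz") ? new GZIPInputStream(raw) : raw;
             BufferedReader reader =
                     new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java SuffixAwareReader <file>
        for (String line : readLines(args[0])) {
            System.out.println(line);
        }
    }
}
```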


{noformat}
==
Bash's good friend: cat
==
Normal
A
B
C
bz2
A
B
C
gzip
A
B
C
==
MiniMRCluster
==
test.all.pig
2009-05-08 19:56:51,715 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2009-05-08 19:56:52,034 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, sessionId=
2009-05-08 19:56:54,686 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-05-08 19:56:54,717 [Thread-3] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-05-08 19:56:55,718 [Thread-9] INFO  org.apache.hadoop.mapred.MapTask - numReduceTasks: 0
2009-05-08 19:56:56,015 [Thread-9] INFO  org.apache.hadoop.mapred.LocalJobRunner - 
2009-05-08 19:56:56,020 [Thread-9] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0001_m_00_0' done.
2009-05-08 19:56:56,030 [Thread-9] INFO  org.apache.hadoop.mapred.TaskRunner - Saved output of task 'attempt_local_0001_m_00_0' to file:/tmp/temp442336691/tmp1233577046
2009-05-08 19:56:59,714 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-05-08 19:57:04,720 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-05-08 19:57:04,720 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(A)
(B)
(C)
2009-05-08 19:57:06,148 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-05-08 19:57:06,153 [Thread-10] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-05-08 19:57:06,450 [Thread-16] INFO  org.apache.hadoop.mapred.MapTask - numReduceTasks: 0
2009-05-08 19:57:06,512 [Thread-16] INFO  org.apache.hadoop.mapred.LocalJobRunner - 
2009-05-08 19:57:06,514 [Thread-16] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0002_m_00_0' done.
2009-05-08 19:57:06,519 [Thread-16] INFO  org.apache.hadoop.mapred.TaskRunner - Saved output of task 'attempt_local_0002_m_00_0' to file:/tmp/temp442336691/tmp-1848149730
2009-05-08 19:57:11,152 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-05-08 19:57:16,154 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-05-08 19:57:16,154 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(A)
(B)
(C)
2009-05-08 19:57:17,114 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Cannot initialize JVM Metrics with processName=JobTracker, sessionId= - already initialized
2009-05-08 19:57:17,118 [Thread-17] WARN  org.apache.hadoop.mapred.JobClient - Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
2009-05-08 19:57:17,359 [Thread-23] INFO  org.apache.hadoop.mapred.MapTask - numReduceTasks: 0
2009-05-08 19:57:17,520 [Thread-23] INFO  org.apache.hadoop.mapred.LocalJobRunner - 
2009-05-08 19:57:17,523 [Thread-23] INFO  org.apache.hadoop.mapred.TaskRunner - Task 'attempt_local_0003_m_00_0' done.
2009-05-08 19:57:17,528 [Thread-23] INFO  org.apache.hadoop.mapred.TaskRunner - Saved output of task 'attempt_local_0003_m_00_0' to file:/tmp/temp442336691/tmp97423898
2009-05-08 19:57:22,119 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 0% complete
2009-05-08 19:57:27,122 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - 100% complete
2009-05-08 19:57:27,122 [main] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher - Success!
(A)
(B)
(C)
test.bz2.pig
2009-05-08 19:57:28,096 [main] INFO  org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting to hadoop file system at: file:///
2009-05-08 19:57:28,401 [main] INFO  org.apache.hadoop.metrics.jvm.JvmMetrics - Initializing JVM Metrics with processName=JobTracker, 

[jira] Reopened: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates reopened PIG-626:



I should have checked my other window before I marked the bug as fixed.  The 
commit failed; I can't seem to contact Apache's SVN at the moment.  I'll commit 
the patch once I can.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: pigStats.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, TEST-org.apache.pig.test.TestBZip.txt



[jira] Updated: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-08 Thread Alan Gates (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Gates updated PIG-626:
---

Attachment: PIG-626.patch

A version of the patch that deals with the findbugs and javac warnings.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, pigStats.patch, 
 TEST-org.apache.pig.test.TestBZip.txt



[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

Attachment: avro-0.1-dev-java.jar
            AvroStorage.patch

Attaching the new patch along with the latest avro jar.

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar


 We would like to use Avro serialization in Pig to pass data between MR jobs 
 instead of the current BinStorage. Attached is an implementation of 
 AvroBinStorage, which performs significantly better than BinStorage on our 
 benchmarks.



[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

Attachment: (was: avro-0.1-dev-java.jar)

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar



[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

Attachment: (was: AvroStorage.patch)

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar



[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rakesh Setty updated PIG-794:
-

   Resolution: Fixed
Fix Version/s: 0.2.0
   Status: Resolved  (was: Patch Available)

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Fix For: 0.2.0

 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar



[jira] Commented: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707570#action_12707570
 ] 

Rakesh Setty commented on PIG-794:
--

The new patch has unit tests. The comments are already in javadoc format. 
Please let me know if I have missed anything.

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Fix For: 0.2.0

 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar



[jira] Commented: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Rakesh Setty (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707576#action_12707576
 ] 

Rakesh Setty commented on PIG-794:
--

There was one important change I had to make to the Avro format in AvroStorage 
to get it working. The map keys were stored as String objects; I had to change 
it so that both key and value can be Object instances. Please let me know if 
this is an issue.

Thanks,
Rakesh

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Fix For: 0.2.0

 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar



[jira] Reopened: (PIG-794) Use Avro serialization in Pig

2009-05-08 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich reopened PIG-794:



I think it was closed by mistake. The final patch has not been reviewed or 
committed yet.

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Fix For: 0.2.0

 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar



[jira] Created: (PIG-805) removing dependency on consolidated hadoop.jar from pig.jar

2009-05-08 Thread Olga Natkovich (JIRA)
removing dependency on consolidated hadoop.jar from pig.jar
---

 Key: PIG-805
 URL: https://issues.apache.org/jira/browse/PIG-805
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich
Assignee: Giridharan Kesavan


The proposal is:

- for compilation, always use dependencies from Ivy
- for packaging, have 2 targets and a property:

(1) The current jar target will no longer package hadoop.jar and will rely on 
having it available at runtime.
(2) Add a new target, jar_with_hadoop, that does what the current jar target does.
(3) Add a property that allows building the release tar with or without 
hadoop.jar embedded into pig.jar.
