[jira] Commented: (PIG-788) Proposal to remove float from Pig data types

2009-05-12 Thread Mridul Muralidharan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708341#action_12708341
 ] 

Mridul Muralidharan commented on PIG-788:
-

We do use floats quite a bit in our projects, so the assertion that "we do not 
see anyone using the float type" is not correct.
Even the webdata (and webmap too, iirc) uses float for some of its fields.

I agree with the rest of Santhosh's comments above (11/May/09 05:52 PM) too.

 Proposal to remove float from Pig data types
 

 Key: PIG-788
 URL: https://issues.apache.org/jira/browse/PIG-788
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates

 Pig would like to use the new Hadoop Avro serialization package to pass data 
 between MR jobs, and eventually between Pig and UDFs that are not written in 
 Java.  Avro will not be supporting the float data type, but only double (see 
 AVRO-17).  Pig currently supports both float and double.  Double is the 
 default floating point type (so if the user says x + 1.0, 1.0 is taken to be 
 a double, not a float).  Float was initially included in the list of Pig 
 types because Hadoop supported it as one of the Writable types, and we were 
 trying to make sure all of Hadoop's writable types could be represented in 
 Pig.  
 In practice we do not see anyone using the float type.   In order to be able 
 to easily use Avro I propose dropping the float type.  
 Please speak up if you are using the float type and you have a compelling 
 reason not to use double.
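
For readers weighing the float-versus-double question above, here is a minimal plain-Java sketch. It is not Pig code, and the class and method names are made up for illustration. It shows the two things at stake: float takes half the storage of double, and a value that passes through float generally does not come back as the same double.

```java
// Illustrative only: plain Java, not Pig internals. FloatVsDouble is a hypothetical name.
final class FloatVsDouble {
    // Bytes needed to store n values of each type.
    static long floatBytes(long n)  { return n * Float.BYTES; }   // 4 bytes per value
    static long doubleBytes(long n) { return n * Double.BYTES; }  // 8 bytes per value

    // Round-trips a double through float, as would happen if the field were stored as float.
    static double throughFloat(double d) { return (double) (float) d; }

    // True when the value survives the round trip unchanged.
    static boolean survivesFloat(double d) { return throughFloat(d) == d; }
}
```

Under these assumptions, `survivesFloat(0.5)` holds because 0.5 is exactly representable in both types, while `survivesFloat(0.1)` does not. Users with large float-valued fields would pay the doubled storage cost if float were dropped.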

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Giridharan Kesavan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708422#action_12708422
 ] 

Giridharan Kesavan commented on PIG-806:


This issue blocks:
https://issues.apache.org/jira/browse/PIG-765

 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Reporter: Giridharan Kesavan

 The following Java source files have @author tags in them, which need to be 
 cleaned up. 
 src/org/apache/pig/Algebraic.java
 src/org/apache/pig/backend/local/executionengine/physicalLayer/relationalOperators/POCross.java
 src/org/apache/pig/backend/local/executionengine/physicalLayer/relationalOperators/POCogroup.java
 src/org/apache/pig/impl/io/FileSpec.java
 src/org/apache/pig/impl/streaming/StreamingCommand.java
 src/org/apache/pig/StoreFunc.java
 src/org/apache/pig/tools/cmdline/CmdLineParser.java
 src/org/apache/pig/tools/timer/PerformanceTimer.java
 src/org/apache/pig/tools/timer/PerformanceTimerFactory.java
 Thanks,




Hudson build is back to normal: Pig-trunk #432

2009-05-12 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Pig-trunk/432/changes




[jira] Updated: (PIG-765) to implement jdiff

2009-05-12 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-765:
---

Attachment: pig-765.patch

Ported the patch to resolve the jdiff dependencies using Ivy.
Thanks!

 to implement jdiff
 --

 Key: PIG-765
 URL: https://issues.apache.org/jira/browse/PIG-765
 Project: Pig
  Issue Type: Improvement
  Components: build
Reporter: Giridharan Kesavan
Assignee: Giridharan Kesavan
 Attachments: pig-765.patch, pig-765.patch







[jira] Commented: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708462#action_12708462
 ] 

Olga Natkovich commented on PIG-806:


What do author tags mean? Are you talking about control characters?

 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Reporter: Giridharan Kesavan




[jira] Commented: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708464#action_12708464
 ] 

Thejas M Nair commented on PIG-806:
---

Example of an author tag in a Java file:

/*
* @author xyz
*/


 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Reporter: Giridharan Kesavan




[jira] Issue Comment Edited: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Thejas M Nair (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708464#action_12708464
 ] 

Thejas M Nair edited comment on PIG-806 at 5/12/09 8:28 AM:


Example of an author tag in a Java file:

{code}

/*
* @author xyz
*/
{code}

  was (Author: thejas):
Example of author tag in a java file -

/*
* @author xyz
*/

  
 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Reporter: Giridharan Kesavan




[jira] Commented: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708469#action_12708469
 ] 

Olga Natkovich commented on PIG-806:


Thanks, Thejas. What is wrong with the author tag? The error on PIG-765 says 
that the Pig community agreed to disallow it, but I don't remember that.

 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Reporter: Giridharan Kesavan




[jira] Commented: (PIG-799) Unit tests on windows are failing after multiquery commit

2009-05-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708471#action_12708471
 ] 

Olga Natkovich commented on PIG-799:


Daniel, thanks for the patch!

It looks like the automated patch testing is not working. If the tests pass on 
both Windows and Unix, please commit the patch.

 Unit tests on windows are failing after multiquery commit
 -

 Key: PIG-799
 URL: https://issues.apache.org/jira/browse/PIG-799
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Attachments: PIG-799.patch


 Daniel, could you take a look? It should be reproducible with the latest 
 trunk. Thanks.




Re: [jira] Commented: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread nitesh bhatia
Hi

@author tags are not allowed in Pig (or in any Apache project, I suppose).
Refer to the How to Contribute page: http://wiki.apache.org/pig/HowToContribute

--nitesh

On Tue, May 12, 2009 at 9:09 PM, Olga Natkovich (JIRA) j...@apache.org wrote:


[
 https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708469#action_12708469]

 Olga Natkovich commented on PIG-806:
 

 Thanks, Thejas. What is wrong with the author tag? The error on PIG-765 says
 that the Pig community agreed to disallow it, but I don't remember that.




-- 
Nitesh Bhatia
Dhirubhai Ambani Institute of Information and Communication Technology
Gandhinagar
Gujarat

Life is never perfect. It just depends where you draw the line.

visit:
http://www.awaaaz.com - connecting through music
http://www.volstreet.com - lets volunteer for better tomorrow
http://www.instibuzz.com - Voice opinions, Transact easily, Have fun


[jira] Commented: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708508#action_12708508
 ] 

Alan Gates commented on PIG-806:


See the section on Making Changes at http://wiki.apache.org/pig/HowToContribute

 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Reporter: Giridharan Kesavan




[jira] Commented: (PIG-788) Proposal to remove float from Pig data types

2009-05-12 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-788?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708512#action_12708512
 ] 

Alan Gates commented on PIG-788:


Reading the latest comments on AVRO-17 it looks like they are leaning towards 
keeping float, so this may be becoming a non-issue.

 Proposal to remove float from Pig data types
 

 Key: PIG-788
 URL: https://issues.apache.org/jira/browse/PIG-788
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Alan Gates
Assignee: Alan Gates




[jira] Updated: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-806:


Affects Version/s: 0.3.0
Fix Version/s: 0.3.0
 Assignee: Santhosh Srinivasan

 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Giridharan Kesavan
Assignee: Santhosh Srinivasan
 Fix For: 0.3.0





[jira] Resolved: (PIG-806) to remove author tags in the pig source code

2009-05-12 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-806?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-806.
-

Resolution: Fixed

Committed the changes. Except for StreamingCommand.java, all the files noted 
in the bug report were modified to remove the @author tag.
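
As an illustration of the kind of cleanup described here, the following is a rough sketch of detecting and dropping @author lines from Java source text. It is not the change that was committed and not part of Pig's build; `AuthorTagCleaner` and its methods are hypothetical names, and the pattern only covers the simple case where the tag sits on its own comment line.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig or its build tooling.
final class AuthorTagCleaner {
    // Matches the start of a comment line such as " * @author xyz"
    // (leading whitespace and the '*' are optional).
    private static final Pattern AUTHOR_LINE = Pattern.compile("^\\s*\\*?\\s*@author\\b");

    // True if any line of the source starts with an @author tag.
    static boolean hasAuthorTag(String source) {
        for (String line : source.split("\n", -1)) {
            if (AUTHOR_LINE.matcher(line).find()) return true;
        }
        return false;
    }

    // Returns the source with every @author line dropped; all other lines are kept as-is.
    static String stripAuthorTags(String source) {
        List<String> kept = new ArrayList<>();
        for (String line : source.split("\n", -1)) {
            if (!AUTHOR_LINE.matcher(line).find()) kept.add(line);
        }
        return String.join("\n", kept);
    }
}
```

A tag that shares a line with other comment text (for example a one-line Javadoc comment) is deliberately left alone by this sketch and would need manual review.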

 to remove author tags in the pig source code
 

 Key: PIG-806
 URL: https://issues.apache.org/jira/browse/PIG-806
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.3.0
Reporter: Giridharan Kesavan
Assignee: Santhosh Srinivasan
 Fix For: 0.3.0





[jira] Created: (PIG-807) PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the Hadoop values iterator)

2009-05-12 Thread Pradeep Kamath (JIRA)
PERFORMANCE: Provide a way for UDFs to use read-once bags (backed by the Hadoop 
values iterator)


 Key: PIG-807
 URL: https://issues.apache.org/jira/browse/PIG-807
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.1
Reporter: Pradeep Kamath
 Fix For: 0.3.0


Currently all bags resulting from a group or cogroup are materialized as bags 
containing all of the contents. The issue with this is that if a particular key 
has many corresponding values, all of these values get stuffed into a bag, which 
may run out of memory and hence spill, causing a slowdown in performance and 
sometimes memory exceptions. In many cases, the UDFs that use the bags coming 
out of a group or cogroup only need to iterate over the bag in a 
unidirectional, read-once manner. This can be implemented by having the bag 
implement its iterator by simply iterating over the underlying Hadoop iterator 
provided in the reduce. This kind of bag is also needed in 
http://issues.apache.org/jira/browse/PIG-802, so the code can be reused for 
that issue too. The other part of this issue is to have some way for the UDFs 
to communicate to Pig that any input bags they need are read-once bags. 
This can be achieved by having an interface, say UsesReadOnceBags, which 
serves as a tag to indicate the intent to Pig. Pig can then rewire its 
execution plan to use ReadOnceBags if feasible.
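
The proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: UsesReadOnceBags is the name suggested above, but its shape as an empty marker interface, the `ReadOnceBag` class, `CountUdf`, and `BagPlanner` are all hypothetical and do not reflect Pig's actual DataBag or EvalFunc APIs.

```java
import java.util.Iterator;

// Marker interface as suggested in the issue; assumed here to declare no methods.
interface UsesReadOnceBags {}

// Hypothetical read-once bag: wraps an iterator instead of copying values into memory.
final class ReadOnceBag<T> implements Iterable<T> {
    private final Iterator<T> source;   // stand-in for the Hadoop values iterator
    private boolean consumed = false;

    ReadOnceBag(Iterator<T> source) { this.source = source; }

    // The underlying iterator can be handed out only once.
    @Override
    public Iterator<T> iterator() {
        if (consumed) throw new IllegalStateException("read-once bag already iterated");
        consumed = true;
        return source;
    }
}

// A UDF-like class that opts in by implementing the marker interface.
final class CountUdf implements UsesReadOnceBags {
    long exec(Iterable<?> bag) {
        long n = 0;
        for (Object ignored : bag) n++;   // single forward pass, nothing retained
        return n;
    }
}

final class BagPlanner {
    // The planner's check: use a read-once bag only when the UDF has declared it is safe.
    static boolean canUseReadOnceBag(Object udf) {
        return udf instanceof UsesReadOnceBags;
    }
}
```

The marker-interface approach keeps the opt-in explicit: a UDF that needs to traverse its input twice simply does not implement the interface and continues to receive a materialized bag.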




[jira] Commented: (PIG-802) PERFORMANCE: not creating bags for ORDER BY

2009-05-12 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708551#action_12708551
 ] 

Pradeep Kamath commented on PIG-802:


Adding some more details:
A new kind of bag, ReadOnceBag, needs to be implemented. This bag will hold a 
reference to the key currently being processed and to the iterator over values 
provided by Hadoop in reduce(). The ReadOnceBag's iterator will simply advance 
the Hadoop iterator at each call and construct a tuple from the key 
and value (see POPackage.java for details on how this is done). POPackage 
should also be changed, or a new class introduced, to create ReadOnceBags 
instead of regular bags. Creating the bag should only initialize it with the 
key and iterator.
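
A minimal sketch of that iterator, assuming a tuple is simply a (key, value) pair. The class name `KeyedReadOnceBag` is hypothetical, a plain `List<Object>` stands in for Pig's Tuple, and the real tuple construction in POPackage is more involved than what is shown here.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch; the real Pig Tuple and POPackage types are not used here.
final class KeyedReadOnceBag<K, V> implements Iterable<List<Object>> {
    private final K key;               // key currently being processed in reduce()
    private final Iterator<V> values;  // stand-in for the Hadoop values iterator

    // Creation only stores the key and the iterator; no values are read or copied.
    KeyedReadOnceBag(K key, Iterator<V> values) {
        this.key = key;
        this.values = values;
    }

    @Override
    public Iterator<List<Object>> iterator() {
        return new Iterator<List<Object>>() {
            @Override
            public boolean hasNext() { return values.hasNext(); }

            // Each call pulls one value from the source and builds a (key, value) tuple.
            @Override
            public List<Object> next() {
                return Arrays.<Object>asList(key, values.next());
            }
        };
    }
}
```

Because every iterator returned shares the one underlying values iterator, a second pass over the bag sees nothing, which is the read-once behavior the comment describes.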

 PERFORMANCE: not creating bags for ORDER BY
 ---

 Key: PIG-802
 URL: https://issues.apache.org/jira/browse/PIG-802
 Project: Pig
  Issue Type: Improvement
Reporter: Olga Natkovich

 Order by should be changed to not use POPackage to put all of the tuples in a 
 bag on the reduce side, as the bag is just immediately flattened. It can 
 instead work like join does for the last input in the join. 




[jira] Commented: (PIG-794) Use Avro serialization in Pig

2009-05-12 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708701#action_12708701
 ] 

Olga Natkovich commented on PIG-794:


I integrated the latest patch and ran the unit tests. All the Avro unit tests 
failed with the following stack trace:

Could not initialize class org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AvroTupleSchema
java.lang.NoClassDefFoundError: Could not initialize class org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AvroTupleSchema
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.TupleAvroWriter.writeDatum(AvroStorage.java:359)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.TupleAvroWriter.writeTuple(AvroStorage.java:408)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.TupleAvroWriter.write(AvroStorage.java:353)
    at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.AvroStorage.putNext(AvroStorage.java:571)
    at org.apache.pig.backend.hadoop.executionengine.physicalLayer.relationalOperators.POStore.getNext(POStore.java:121)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.runPipeline(LocalPigLauncher.java:129)
    at org.apache.pig.backend.local.executionengine.LocalPigLauncher.launchPig(LocalPigLauncher.java:102)
    at org.apache.pig.test.TestAvroStorage.store(TestAvroStorage.java:117)
    at org.apache.pig.test.TestAvroStorage.testLoadStoreComplexDataWithNull(TestAvroStorage.java:178)
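
For anyone reading this trace: in Java, a NoClassDefFoundError with the message "Could not initialize class" is what the JVM reports when a class's static initialization already failed on an earlier access, so the root cause is usually in the first failure rather than in this one. Whether that is what happened with AvroTupleSchema is not established by the trace above. The plain-Java sketch below reproduces the general behavior with made-up class names and is unrelated to the patch itself.

```java
// Plain-Java illustration; InitFailureDemo and BrokenSchema are hypothetical names.
final class InitFailureDemo {
    // A class whose static initializer always fails.
    static final class BrokenSchema {
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("schema setup failed");
        }
    }

    // Touches the class and returns the simple name of whatever Throwable results.
    static String touch() {
        try {
            return String.valueOf(BrokenSchema.VALUE);
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }
}
```

The first call to `touch()` yields ExceptionInInitializerError, which carries the real cause; later calls yield NoClassDefFoundError. When many tests run in one JVM, only the first failing test shows the underlying exception.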

~


 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Fix For: 0.2.0

 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar


 We would like to use Avro serialization in Pig to pass data between MR jobs 
 instead of the current BinStorage. Attached is an implementation of 
 AvroBinStorage which performs significantly better compared to BinStorage on 
 our benchmarks.




[jira] Updated: (PIG-799) Unit tests on windows are failing after multiquery commit

2009-05-12 Thread Daniel Dai (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Daniel Dai updated PIG-799:
---

   Resolution: Fixed
Fix Version/s: 0.3.0
   Status: Resolved  (was: Patch Available)

 Unit tests on windows are failing after multiquery commit
 -

 Key: PIG-799
 URL: https://issues.apache.org/jira/browse/PIG-799
 Project: Pig
  Issue Type: Bug
Reporter: Olga Natkovich
Assignee: Daniel Dai
 Fix For: 0.3.0

 Attachments: PIG-799.patch





[jira] Commented: (PIG-626) Statistics (records read by each mapper and reducer)

2009-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708737#action_12708737
 ] 

Hadoop QA commented on PIG-626:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12407672/PIG-626.patch
  against trunk revision 774167.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 12 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/34/console

This message is automatically generated.

 Statistics (records read by each mapper and reducer)
 

 Key: PIG-626
 URL: https://issues.apache.org/jira/browse/PIG-626
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.2.0
Reporter: Shubham Chopra
Assignee: Shubham Chopra
Priority: Minor
 Fix For: 0.3.0

 Attachments: PIG-626.patch, pigStats.patch, pigStats.patch, 
 pigStats.patch, pigStats.patch, pigStats.patch, 
 TEST-org.apache.pig.test.TestBZip.txt


 This uses the counters framework that Hadoop has. Initially, I am just 
 interested in finding out the number of records read by each mapper/reducer, 
 particularly for the last job in any script. Sample code to access the 
 statistics for the last job:
 String reducePlan = 
     stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_PLAN);
 if (reducePlan == null) {
     System.out.println("Records written : " + 
         stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_MAP_OUTPUT_RECORDS));
 } else {
     System.out.println("Records written : " + 
         stats.getPigStats().get(stats.getLastJobID()).get(PIG_STATS_REDUCE_OUTPUT_RECORDS));
 }
 The patch contains 7 test cases. These include tests for PigStorage and 
 BinStorage, along with one for the multiple-MR-jobs case.




[jira] Commented: (PIG-781) Error reporting for failed MR jobs

2009-05-12 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-781?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708752#action_12708752
 ] 

Hadoop QA commented on PIG-781:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12407727/partial_failure.patch
  against trunk revision 774167.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 14 new or modified tests.

-1 patch.  The patch command could not apply the patch.

Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-minerva.apache.org/37/console

This message is automatically generated.

 Error reporting for failed MR jobs
 --

 Key: PIG-781
 URL: https://issues.apache.org/jira/browse/PIG-781
 Project: Pig
  Issue Type: Improvement
Reporter: Gunther Hagleitner
 Attachments: partial_failure.patch, partial_failure.patch


 If we have multiple MR jobs to run and some of them fail the behavior of the 
 system is to not stop on the first failure but to keep going. That way jobs 
 that do not depend on the failed job might still succeed.
 The question is how best to report this scenario to the user. How do we tell 
 which jobs failed and which didn't?
 One way could be to tie jobs to stores and report which store locations won't 
 have data and which ones do.
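
One possible shape for the "tie jobs to stores" idea is sketched below. It is purely illustrative: `StoreReport`, its method, and the output format are hypothetical and are not taken from the attached patch. It also assumes the set of failed job ids is already known and ignores jobs that were skipped because a dependency failed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these names come from Pig.
final class StoreReport {
    // jobToStores: job id -> store locations that job writes.
    // failedJobs: ids of the jobs that failed.
    // Returns one line per store location saying whether it should contain data.
    static List<String> summarize(Map<String, List<String>> jobToStores, List<String> failedJobs) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : jobToStores.entrySet()) {
            boolean failed = failedJobs.contains(e.getKey());
            for (String store : e.getValue()) {
                lines.add((failed ? "FAILED  " : "SUCCESS ") + store + " (job " + e.getKey() + ")");
            }
        }
        return lines;
    }
}
```

Reporting per store location rather than per job speaks to users in terms of the outputs they asked for, which is the motivation given above.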




[jira] Updated: (PIG-794) Use Avro serialization in Pig

2009-05-12 Thread Giridharan Kesavan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giridharan Kesavan updated PIG-794:
---

Attachment: PIG-794.patch

This patch resolves jackson-asl.jar from the Maven repository through Ivy, and 
Avro from the local lib directory.
When committing this patch to SVN, we have to add the Avro jar to the lib 
directory.

Thanks!

 Use Avro serialization in Pig
 -

 Key: PIG-794
 URL: https://issues.apache.org/jira/browse/PIG-794
 Project: Pig
  Issue Type: Improvement
  Components: impl
Affects Versions: 0.2.0
Reporter: Rakesh Setty
 Fix For: 0.2.0

 Attachments: avro-0.1-dev-java.jar, AvroStorage.patch, 
 jackson-asl-0.9.4.jar, PIG-794.patch

