Re: High(er) res Pig logo?

2009-09-28 Thread Alan Gates
I have a couple of higher resolution pigs in overalls and a pig on the
Hadoop elephant.  I've checked them into
src/docs/src/documentation/resources/images/ so all can use them.


Also, we're working on cleaning up the Pig with Y! logo issue.

Alan.

On Sep 27, 2009, at 9:59 AM, Dmitriy Ryaboy wrote:

Where can one find the Pig logo in a size/resolution suitable for  
presentations?


Also, I went on the website and noticed that the Y! reappeared on  
Pig's chest.


-D




Re: High(er) res Pig logo?

2009-09-28 Thread Dmitriy Ryaboy
Thanks Alan, got 'em. Much appreciated.

-D

On Mon, Sep 28, 2009 at 3:44 PM, Alan Gates ga...@yahoo-inc.com wrote:
 I have a couple of higher resolution pigs in overalls and a pig on the
 Hadoop elephant.  I've checked them into
 src/docs/src/documentation/resources/images/ so all can use them.

 Also, we're working on cleaning up the Pig with Y! logo issue.

 Alan.

 On Sep 27, 2009, at 9:59 AM, Dmitriy Ryaboy wrote:

 Where can one find the Pig logo in a size/resolution suitable for
 presentations?

 Also, I went on the website and noticed that the Y! reappeared on Pig's
 chest.

 -D




RE: High(er) res Pig logo?

2009-09-28 Thread Olga Natkovich
I have cleaned up the logo.

Olga

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Sunday, September 27, 2009 10:00 AM
To: pig-dev@hadoop.apache.org
Subject: High(er) res Pig logo?

Where can one find the Pig logo in a size/resolution suitable for
presentations?

Also, I went on the website and noticed that the Y! reappeared on Pig's
chest.

-D


[jira] Commented: (PIG-979) Acummulator Interface for UDFs

2009-09-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760389#action_12760389
 ] 

Alan Gates commented on PIG-979:


Jeff, thanks for the paper.  I looked over it and I'm not certain it directly 
applies.  They are measuring both the aggregation time (sort or hash) and how 
it is passed to the user defined aggregate (iterate or accumulate).  Being in 
Hadoop we already have the aggregation done.  So it's just a question of the 
fastest way to make the data available to the UDF.  As I said above, we want to 
test the performance of this and prove its worth before we add it.

As a general complaint, they used a fairly old revision of Pig code in their 
paper, even though it appears it was published in the last few months.

 Acummulator Interface for UDFs
 --

 Key: PIG-979
 URL: https://issues.apache.org/jira/browse/PIG-979
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Ying He

 Add an accumulator interface for UDFs that would allow them to take a set 
 number of records at a time instead of the entire bag.
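
A minimal sketch of the idea, written in Python rather than Pig's Java and not taken from any patch on this issue. The class and method names (SumAccumulator, accumulate, get_value, cleanup) and the batch size are assumptions made for illustration only. The point is that the UDF sees the bag a few records at a time and keeps only a running state:

```python
from typing import Iterable, Iterator, List


class SumAccumulator:
    """Hypothetical accumulator-style UDF: it sees the bag in batches."""

    def __init__(self) -> None:
        self._total = 0

    def accumulate(self, batch: List[int]) -> None:
        # Called once per batch; only the running total stays in memory.
        self._total += sum(batch)

    def get_value(self) -> int:
        return self._total

    def cleanup(self) -> None:
        # Reset state so the same instance can serve the next group.
        self._total = 0


def batches(records: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` records."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def run_udf(udf: SumAccumulator, records: Iterable[int], batch_size: int) -> int:
    """Feed the records to the UDF batch by batch, never as one full bag."""
    for batch in batches(records, batch_size):
        udf.accumulate(batch)
    result = udf.get_value()
    udf.cleanup()
    return result
```

Whether this beats handing the UDF the entire bag is exactly the performance question raised in the comment above; the sketch says nothing about that.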

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-981) Merge join should restrict join key expressions to simple projects

2009-09-28 Thread Pradeep Kamath (JIRA)
Merge join should restrict join key expressions to simple projects
--

 Key: PIG-981
 URL: https://issues.apache.org/jira/browse/PIG-981
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath


Currently merge join allows join key expressions to be arbitrary expressions, 
on the assumption that the expressions keep the sort order. Since only 
ascending sort order is currently supported, the code checks the sort order at 
run time and catches the case where it is broken because the join key 
expression is not order preserving. However, there is a reason to restrict the 
join keys to projections of columns only:
 PIG-953 will enable merge join in Pig to work with loaders and store 
functions that can internally index sorted data. These store functions can 
only create an index (and hence look up on the index) on raw data columns (and 
not on expressions over the columns).
Hopefully this does not reduce the usability of merge join much, since the 
expressions can always be applied post join on the join columns, and since the 
expressions are order preserving they do not affect the outcome of the join. 
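
The last point can be illustrated with a small Python sketch. It is not Pig code; merge_join, the sample rows, and the expression 2*x+1 are all made up for the example. It assumes both inputs are sorted ascending on the key column and that the expression is strictly increasing, in which case joining on the raw column and joining on the expression select the same row pairs:

```python
from typing import Callable, List, Tuple

Row = Tuple[int, str]


def merge_join(left: List[Row], right: List[Row],
               key: Callable[[Row], int]) -> List[Tuple[Row, Row]]:
    """Join two inputs that are already sorted ascending on key(row)."""
    out: List[Tuple[Row, Row]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = key(left[i]), key(right[j])
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Emit every right row sharing this key, then advance left only,
            # so a repeated left key is paired with the same right rows.
            k = j
            while k < len(right) and key(right[k]) == lk:
                out.append((left[i], right[k]))
                k += 1
            i += 1
    return out


left = [(1, "a"), (2, "b"), (4, "c")]
right = [(2, "x"), (2, "y"), (3, "z"), (4, "w")]

# Join on the raw column, then on a strictly increasing expression of it.
by_column = merge_join(left, right, key=lambda r: r[0])
by_expr = merge_join(left, right, key=lambda r: 2 * r[0] + 1)

# The expression can instead be applied after joining on the raw column.
post_join = [(2 * l[0] + 1, l, r) for l, r in by_column]
```

Here by_column and by_expr contain the same pairs, so restricting the key to a projection and computing 2*x+1 afterwards loses nothing in this case.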




[jira] Commented: (PIG-979) Acummulator Interface for UDFs

2009-09-28 Thread Alan Gates (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760396#action_12760396
 ] 

Alan Gates commented on PIG-979:


Ciemo,

In your comment above, you indicate you'd like functions like cumulative sum to 
be able to emit a value each time a record is added.  But how does that work 
with something like:

{code}
A = load 'bla';
B = group A by $0;
C = foreach B {
   D = order A by $1;
   generate CUMULATIVE_SUM(D.$2), SUM(D.$2);
}
{code}

SUM can't output a value until it's seen everything, but CUMULATIVE_SUM will 
have an output on every record.  The way Pig's data model handles this is with 
bags.  The other possibility I can see is that Pig handles this as having an 
implicit flatten, so output from above would look like:

1   10
3   10
6   10
10  10

Are you proposing that we create a way to streamline output of these types of 
functions to STORE (or DUMP) so that the bag never need be materialized?  Or do 
you want a UDF type that takes a bag and produces multiple outputs along with 
an implicit flatten?  Or are you suggesting a change in the data model?  
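
To make the implicit-flatten reading concrete, here is a small Python sketch. It is not Pig code; the function names are hypothetical, and the input values 1, 2, 3, 4 are an assumption chosen because they reproduce the four rows shown above:

```python
from itertools import accumulate
from typing import List, Tuple


def cumulative_sum(bag: List[int]) -> List[int]:
    """One output per input record: the running total so far."""
    return list(accumulate(bag))


def implicit_flatten(bag: List[int]) -> List[Tuple[int, int]]:
    """Pair each running total with the single SUM value, one row per record."""
    total = sum(bag)  # only known once the whole bag has been seen
    return [(running, total) for running in cumulative_sum(bag)]


rows = implicit_flatten([1, 2, 3, 4])
for running, total in rows:
    print(running, total)
```

Note that the SUM column cannot be filled in until every record has been read, so in this sketch no row is emitted before the bag is fully consumed.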

 Acummulator Interface for UDFs
 --

 Key: PIG-979
 URL: https://issues.apache.org/jira/browse/PIG-979
 Project: Pig
  Issue Type: New Feature
Reporter: Alan Gates
Assignee: Ying He

 Add an accumulator interface for UDFs that would allow them to take a set 
 number of records at a time instead of the entire bag.




[jira] Commented: (PIG-981) Merge join should restrict join key expressions to simple projects

2009-09-28 Thread Ashutosh Chauhan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760406#action_12760406
 ] 

Ashutosh Chauhan commented on PIG-981:
--

The default Merge Join implementation can handle order-preserving join 
expressions, that is, when merge join itself builds the index and doesn't rely 
on the underlying storage for the index. When Merge Join doesn't build the 
index itself, this can't be guaranteed, but that is no reason to limit all 
possible uses of merge join. Rather, we should check whether Merge Join is 
building its own index: if it is, allow order-preserving expressions; if it is 
not, only *then* restrict expressions to projections.
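
The check suggested here could look roughly like the following Python sketch. The names JoinKey, key_allowed, and join_builds_own_index are invented for illustration and do not correspond to anything in the Pig code base:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinKey:
    """Hypothetical description of one merge-join key expression."""
    is_projection: bool        # plain column reference such as $0
    is_order_preserving: bool  # e.g. a monotonic function of a sorted column


def key_allowed(key: JoinKey, join_builds_own_index: bool) -> bool:
    """Decide whether a key expression is acceptable for merge join."""
    if key.is_projection:
        # Plain projections work with either kind of index.
        return True
    if join_builds_own_index:
        # Pig builds the index itself, so an order-preserving expression is fine.
        return key.is_order_preserving
    # The loader owns the index and can only index raw columns.
    return False
```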

 Merge join should restrict join key expressions to simple projects
 --

 Key: PIG-981
 URL: https://issues.apache.org/jira/browse/PIG-981
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath

 Currently merge join allows join key expressions to be arbitrary expressions, 
 on the assumption that the expressions keep the sort order. Since only 
 ascending sort order is currently supported, the code checks the sort order at 
 run time and catches the case where it is broken because the join key 
 expression is not order preserving. However, there is a reason to restrict the 
 join keys to projections of columns only:
  PIG-953 will enable merge join in Pig to work with loaders and store 
 functions that can internally index sorted data. These store functions can 
 only create an index (and hence look up on the index) on raw data columns 
 (and not on expressions over the columns).
 Hopefully this does not reduce the usability of merge join much, since the 
 expressions can always be applied post join on the join columns, and since 
 the expressions are order preserving they do not affect the outcome of the 
 join. 




[jira] Commented: (PIG-953) Enable merge join in pig to work with loaders and store functions which can internally index sorted data

2009-09-28 Thread Pradeep Kamath (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760423#action_12760423
 ] 

Pradeep Kamath commented on PIG-953:


Here is a proposal for dealing with sort column information in SortInfo. Rather 
than giving an ArrayList of column names and a separate ArrayList of asc/desc 
flags, it would be good to have a unified structure containing both pieces of 
information per sort column. Also, there are use cases for providing column 
names (zebra), and for names being optional with column positions provided 
instead, which some other loader/optimizer might find useful. The type of the 
column might also be useful if available. Hence, the proposal is to have a 
SortColumn class with the following attributes: column name, column position 
(zero-based index), column type, asc/desc flag. Then in SortInfo there would be 
a List<SortColumn> which would be available through a getter. This should 
address both the concerns above. Callers will need to explicitly check for null 
column names and UNKNOWN column type, since these two scenarios may occur if a 
schema is not available for the Pig runtime to provide the information.
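
As a language-neutral illustration of the shape being proposed (the real classes would be Java inside Pig), here is a Python sketch. Only the four attributes and the getter come from the proposal; the field names, the Order enum, the UNKNOWN_TYPE constant, and the describe helper are assumptions:

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Order(Enum):
    ASCENDING = "asc"
    DESCENDING = "desc"


UNKNOWN_TYPE = "UNKNOWN"  # stand-in for a column whose type is not known


@dataclass(frozen=True)
class SortColumn:
    position: int                 # zero-based column index
    order: Order                  # asc/desc flag
    name: Optional[str] = None    # None when no schema is available
    col_type: str = UNKNOWN_TYPE


class SortInfo:
    def __init__(self, columns: List[SortColumn]) -> None:
        self._columns = list(columns)

    def get_sort_columns(self) -> List[SortColumn]:
        # Return a copy so callers cannot mutate the internal list.
        return list(self._columns)


def describe(col: SortColumn) -> str:
    """Caller-side handling of the two 'missing' cases: no name, unknown type."""
    label = col.name if col.name is not None else f"${col.position}"
    suffix = "" if col.col_type == UNKNOWN_TYPE else f":{col.col_type}"
    return f"{label}{suffix} {col.order.value}"
```

The describe helper shows the explicit checks a caller would need: it falls back to the position when the name is missing and omits the type when it is unknown.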

Thoughts?

 Enable merge join in pig to work with loaders and store functions which can 
 internally index sorted data 
 -

 Key: PIG-953
 URL: https://issues.apache.org/jira/browse/PIG-953
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.3.0
Reporter: Pradeep Kamath
Assignee: Pradeep Kamath
 Attachments: PIG-953-2.patch, PIG-953.patch


 Currently merge join implementation in pig includes construction of an index 
 on sorted data and use of that index to seek into the right input to 
 efficiently perform the join operation. Some loaders (notably the zebra 
 loader) internally implement an index on sorted data and can perform this 
 seek efficiently using their index. So the use of the index needs to be 
 abstracted in such a way that when the loader supports indexing, pig uses it 
 (indirectly through the loader) and does not construct an index. 




[jira] Updated: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-28 Thread Ankit Modi (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ankit Modi updated PIG-960:
---

Attachment: pig_rlr.patch

Added a new patch with Apache license and SVN Trunk Revision 819662

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch


 PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
 {{LineRecordReader}}.
 This can help in the following areas:
 - Improving the performance of reading Tuples (lines) in {{PigStorage}}
 - Any future improvements to line reading in Hadoop's 
 {{LineRecordReader}} are automatically carried over to Pig
 Issues that are handled by this patch:
 - BZip uses internal buffers and positioning for determining the number of 
 bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off
 - The current implementation of {{LocalSeekableInputStream}} does not implement 
 the {{available}} method. This method has to be implemented.




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-975:
---

Status: Open  (was: Patch Available)

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.2.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.2.0

 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
 PIG-975.patch3, PIG-975.patch4


 POPackage uses DefaultDataBag during the reduce process to hold data. It is 
 registered with SpillableMemoryManager and prone to OutOfMemoryException.  
 It's better to pro-actively manage the usage of memory. The bag fills 
 memory up to a specified amount and dumps the rest to disk.  The amount of 
 memory to hold tuples is configurable. This can avoid out of memory errors.
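
A rough Python sketch of the behaviour described, not of the attached patch. SpillingBag and its members are hypothetical names, and the configurable limit is expressed as a tuple count rather than a number of bytes, purely to keep the example short:

```python
import pickle
import tempfile
from typing import Any, Iterator, List


class SpillingBag:
    """Hypothetical bag: keeps up to `max_in_memory` tuples, spills the rest."""

    def __init__(self, max_in_memory: int) -> None:
        self.max_in_memory = max_in_memory
        self._memory: List[Any] = []
        self._spill = None      # temporary file, created on the first spill
        self.spilled_count = 0

    def add(self, item: Any) -> None:
        if len(self._memory) < self.max_in_memory:
            self._memory.append(item)
            return
        if self._spill is None:
            self._spill = tempfile.TemporaryFile()
        self._spill.seek(0, 2)  # always append at the end of the spill file
        pickle.dump(item, self._spill)
        self.spilled_count += 1

    def __iter__(self) -> Iterator[Any]:
        # In-memory tuples first, then the spilled ones in insertion order.
        yield from self._memory
        if self._spill is not None:
            self._spill.flush()
            self._spill.seek(0)
            for _ in range(self.spilled_count):
                yield pickle.load(self._spill)
```

Because the bag decides for itself when to write to disk, it does not depend on an external memory manager noticing pressure in time, which is the motivation given in the description above.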




[jira] Updated: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-975:
---

Fix Version/s: (was: 0.2.0)
   0.6.0
Affects Version/s: (was: 0.2.0)
   0.4.0
   Status: Patch Available  (was: Open)

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.6.0

 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
 PIG-975.patch3, PIG-975.patch4


 POPackage uses DefaultDataBag during the reduce process to hold data. It is 
 registered with SpillableMemoryManager and prone to OutOfMemoryException.  
 It's better to pro-actively manage the usage of memory. The bag fills 
 memory up to a specified amount and dumps the rest to disk.  The amount of 
 memory to hold tuples is configurable. This can avoid out of memory errors.




[jira] Updated: (PIG-752) local mode doesn't read bzip2 and gzip compressed data files

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-752:
---

Fix Version/s: (was: 0.4.0)
   0.6.0

 local mode doesn't read bzip2 and gzip compressed data files
 

 Key: PIG-752
 URL: https://issues.apache.org/jira/browse/PIG-752
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: David Ciemiewicz
Assignee: Jeff Zhang
 Fix For: 0.6.0

 Attachments: Pig_752.Patch


 Problem 1)  use of .bz2 file extension does not store results bzip2 
 compressed in Local mode (-exectype local)
 If I use the .bz2 filename extension in a STORE statement on HDFS, the 
 results are stored with bzip2 compression.
 If I use the .bz2 filename extension in a STORE statement on local file 
 system, the results are NOT stored with bzip2 compression.
 compact.bz2.pig:
 {code}
 A = load 'events.test' using PigStorage();
 store A into 'events.test.bz2' using PigStorage();
 C = load 'events.test.bz2' using PigStorage();
 C = limit C 10;
 dump C;
 {code}
 {code}
 -bash-3.00$ pig -exectype local compact.bz2.pig
 -bash-3.00$ file events.test
 events.test: ASCII English text, with very long lines
 -bash-3.00$ file events.test.bz2
 events.test.bz2: ASCII English text, with very long lines
 -bash-3.00$ cat events.test | bzip2 > events.test.bz2
 -bash-3.00$ file events.test.bz2
 events.test.bz2: bzip2 compressed data, block size = 900k
 {code}
 The output format in local mode is definitely not bzip2, but it should be.
 Problem 2) pig in local mode does not decompress bzip2 compressed files, but 
 should, to be consistent with HDFS
 read.bz2.pig:
 {code}
 A = load 'events.test.bz2' using PigStorage();
 A = limit A 10;
 dump A;
 {code}
 The output should be human readable but is instead garbage, indicating no 
 decompression took place during the load:
 {code}
 -bash-3.00$ pig -exectype local read.bz2.pig
 USING: /grid/0/gs/pig/current
 2009-04-03 18:26:30,455 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - 100% complete!
 2009-04-03 18:26:30,456 [main] INFO  
 org.apache.pig.backend.local.executionengine.LocalPigLauncher - Success!!
 (BZh91AYsyoz?u?...@{x_?d?|u-??mK???;??4?C??)
 ((R? 6?*mg, 
 ?6?Zj?k,???0?QT?d???hY?#mJ?[j???z?m?t?u?K)??K5+??)?m?E7j?X?8a??
 ??U?p@@MT?$?B?P??N??=???(z}gk...@c$\??i]?g:?J)
 a(R?,?u?v???...@?i@??J??!D?)???A?PP?IY??m?
 (mP(i?4,#F[?I)@?...@??|7^?}U??wwg,?u?$?T???((Q!D?=`*?}hP??_|??=?(??2???m=?xG?(?rC?B?(33??:4?N???t|??T?*??k??NT?x???=?fyv?wf??4z???4t?)
 (?oou?t???Kwl?3?nCM?WS?;l???P?s?x
 a???e)B??9?  ?44
 ((?...@4?)
 (f)
 (?...@+?d?0@?U)
 (Q?SR)
 -bash-3.00$ 
 {code}
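
For illustration only, the extension-driven behaviour the report asks for can be sketched in Python with the standard bz2 and gzip modules. This says nothing about how Pig implements it; open_by_extension and round_trip are hypothetical helpers:

```python
import bz2
import gzip
import os
import tempfile
from typing import IO, Tuple


def open_by_extension(path: str, mode: str = "rt") -> IO[str]:
    """Open `path` as text, picking a codec from the filename extension."""
    if path.endswith(".bz2"):
        return bz2.open(path, mode)
    if path.endswith(".gz"):
        return gzip.open(path, mode)
    return open(path, mode)


def round_trip(filename: str, text: str) -> Tuple[bytes, str]:
    """Store `text` under `filename`, then return (raw file prefix, text read back)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, filename)
        with open_by_extension(path, "wt") as out:
            out.write(text)
        with open(path, "rb") as raw:
            prefix = raw.read(3)  # bzip2 files start with the bytes "BZh"
        with open_by_extension(path, "rt") as src:
            return prefix, src.read()
```

With this, a file named events.test.bz2 is written compressed (its raw bytes begin with BZh, as in the garbage output above) and is read back as plain text, which is the behaviour both problems ask for in local mode.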




[jira] Resolved: (PIG-660) Integration with Hadoop 0.20

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich resolved PIG-660.


Resolution: Fixed

patch was committed a while back

 Integration with Hadoop 0.20
 

 Key: PIG-660
 URL: https://issues.apache.org/jira/browse/PIG-660
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
 Environment: Hadoop 0.20
Reporter: Santhosh Srinivasan
Assignee: Santhosh Srinivasan
 Fix For: 0.5.0

 Attachments: hadoop20.jar.gz, PIG-660-for-branch-0.3.patch, 
 PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, PIG-660_3.patch, 
 PIG-660_4.patch, PIG-660_5.patch, PIG-660_trunk.patch, PIG-660_trunk_2.patch, 
 pig_660_shims.patch, pig_660_shims_2.patch, pig_660_shims_3.patch


 With Hadoop 0.20, it will be possible to query the status of each map and 
 reduce in a map reduce job. This will allow better error reporting. Some of 
 the other items that could be on Hadoop's feature requests/bugs are 
 documented here for tracking.
 1. Hadoop should return objects instead of strings when exceptions are thrown
 2. The JobControl should handle all exceptions and report them appropriately. 
 For example, when the JobControl fails to launch jobs, it should handle 
 exceptions appropriately and should support APIs that query this state, i.e., 
 failure to launch jobs.




[jira] Updated: (PIG-592) schema inferred incorrectly

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-592:
---

Fix Version/s: (was: 0.5.0)
   0.6.0

 schema inferred incorrectly
 ---

 Key: PIG-592
 URL: https://issues.apache.org/jira/browse/PIG-592
 Project: Pig
  Issue Type: Bug
Affects Versions: 0.4.0
Reporter: Christopher Olston
 Fix For: 0.6.0

 Attachments: PIG-592-1.patch


 A simple pig script, that never introduces any schema information:
 A = load 'foo';
 B = foreach (group A by $8) generate group, COUNT($1);
 C = load 'bar';   // ('bar' has two columns)
 D = join B by $0, C by $0;
 E = foreach D generate $0, $1, $3;
 Fails, complaining that $3 does not exist:
 java.io.IOException: Out of bound access. Trying to access non-existent 
 column: 3. Schema {B::group: bytearray,long,bytearray} has 3 column(s).
 Apparently Pig gets confused, and thinks it knows the schema for C (a single 
 bytearray column).




[jira] Updated: (PIG-956) Reduce patch testing time

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-956:
---

Attachment: PIG-956.patch

 Reduce patch testing time
 -

 Key: PIG-956
 URL: https://issues.apache.org/jira/browse/PIG-956
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-956.patch


 The proposal is to split the tests into 2 groups:
 (1) Ten-minute tests - this is a set of tests that runs with every patch 
 submission and takes approximately 10 minutes
 (2) All tests - these include all tests and they will run nightly
 This is similar to work done in Hadoop: 
 http://issues.apache.org/jira/browse/HDFS-458




[jira] Updated: (PIG-956) Reduce patch testing time

2009-09-28 Thread Olga Natkovich (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-956?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Olga Natkovich updated PIG-956:
---

Status: Patch Available  (was: Open)

 Reduce patch testing time
 -

 Key: PIG-956
 URL: https://issues.apache.org/jira/browse/PIG-956
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-956.patch


 The proposal is to split the tests into 2 groups:
 (1) Ten-minute tests - this is a set of tests that runs with every patch 
 submission and takes approximately 10 minutes
 (2) All tests - these include all tests and they will run nightly
 This is similar to work done in Hadoop: 
 http://issues.apache.org/jira/browse/HDFS-458




[jira] Commented: (PIG-956) Reduce patch testing time

2009-09-28 Thread Olga Natkovich (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760454#action_12760454
 ] 

Olga Natkovich commented on PIG-956:


The test-commit target runs in 7-8 minutes and has coverage of 53% (compared to 
70% for the entire set of tests)

 Reduce patch testing time
 -

 Key: PIG-956
 URL: https://issues.apache.org/jira/browse/PIG-956
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Olga Natkovich
Assignee: Olga Natkovich
 Fix For: 0.6.0

 Attachments: PIG-956.patch


 The proposal is to split the tests into 2 groups:
 (1) Ten-minute tests - this is a set of tests that runs with every patch 
 submission and takes approximately 10 minutes
 (2) All tests - these include all tests and they will run nightly
 This is similar to work done in Hadoop: 
 http://issues.apache.org/jira/browse/HDFS-458




[jira] Commented: (PIG-960) Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage

2009-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-960?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760484#action_12760484
 ] 

Hadoop QA commented on PIG-960:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420748/pig_rlr.patch
  against trunk revision 819691.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 6 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 406 javac compiler warnings (more 
than the trunk's current 403 warnings).

+1 findbugs.  The patch does not introduce any new Findbugs warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/48/testReport/
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/48/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h7.grid.sp2.yahoo.net/48/console

This message is automatically generated.

 Using Hadoop's optimized LineRecordReader for reading Tuples in PigStorage 
 ---

 Key: PIG-960
 URL: https://issues.apache.org/jira/browse/PIG-960
 Project: Pig
  Issue Type: Improvement
  Components: impl
Reporter: Ankit Modi
 Attachments: pig_rlr.patch


 PigStorage's reading of Tuples (lines) can be optimized using Hadoop's 
 {{LineRecordReader}}.
 This can help in the following areas:
 - Improving the performance of reading Tuples (lines) in {{PigStorage}}
 - Any future improvements to line reading in Hadoop's 
 {{LineRecordReader}} are automatically carried over to Pig
 Issues that are handled by this patch:
 - BZip uses internal buffers and positioning for determining the number of 
 bytes read. Hence buffering done by {{LineRecordReader}} has to be turned off
 - The current implementation of {{LocalSeekableInputStream}} does not implement 
 the {{available}} method. This method has to be implemented.




[jira] Commented: (PIG-975) Need a databag that does not register with SpillableMemoryManager and spill data pro-actively

2009-09-28 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12760485#action_12760485
 ] 

Hadoop QA commented on PIG-975:
---

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12420603/PIG-975.patch4
  against trunk revision 819691.

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 3 new or modified tests.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 406 javac compiler warnings (more 
than the trunk's current 403 warnings).

-1 findbugs.  The patch appears to introduce 1 new Findbugs warnings.

-1 release audit.  The applied patch generated 278 release audit warnings 
(more than the trunk's current 277 warnings).

+1 core tests.  The patch passed core unit tests.

+1 contrib tests.  The patch passed contrib unit tests.

Test results: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/testReport/
Release audit warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/patchprocess/releaseAuditDiffWarnings.txt
Findbugs warnings: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Console output: 
http://hudson.zones.apache.org/hudson/job/Pig-Patch-h8.grid.sp2.yahoo.net/10/console

This message is automatically generated.

 Need a databag that does not register with SpillableMemoryManager and spill 
 data pro-actively
 -

 Key: PIG-975
 URL: https://issues.apache.org/jira/browse/PIG-975
 Project: Pig
  Issue Type: Improvement
Affects Versions: 0.4.0
Reporter: Ying He
Assignee: Ying He
 Fix For: 0.6.0

 Attachments: internalbag.xls, PIG-975.patch, PIG-975.patch2, 
 PIG-975.patch3, PIG-975.patch4


 POPackage uses DefaultDataBag during the reduce process to hold data. It is 
 registered with SpillableMemoryManager and prone to OutOfMemoryException.  
 It's better to pro-actively manage the usage of memory. The bag fills 
 memory up to a specified amount and dumps the rest to disk.  The amount of 
 memory to hold tuples is configurable. This can avoid out of memory errors.
