[jira] Commented: (PIG-1661) Add alternative search-provider to Pig site

2010-10-02 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12917246#action_12917246
 ] 

Santhosh Srinivasan commented on PIG-1661:
--

Sure, worth a try.

> Add alternative search-provider to Pig site
> ---
>
> Key: PIG-1661
> URL: https://issues.apache.org/jira/browse/PIG-1661
> Project: Pig
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Alex Baranau
>Priority: Minor
> Attachments: PIG-1661.patch
>
>
> Use the search-hadoop.com service to make search available over Pig sources, 
> mailing lists, wiki, etc.
> This was initially proposed on the user mailing list. The search service was 
> already added to the site's skin (common for all Hadoop-related projects) via 
> AVRO-626, so this issue is about enabling it for Pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: [VOTE] Pig to become a top level Apache project

2010-08-18 Thread Santhosh Srinivasan
+1

Kudos to Alan and Olga for this amazing milestone.

Santhosh 

-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com] 
Sent: Wednesday, August 18, 2010 10:34 AM
To: pig-dev@hadoop.apache.org
Subject: [VOTE] Pig to become a top level Apache project

Earlier this week I began a discussion on Pig becoming a TLP 
(http://bit.ly/byD7L8).  All of the received feedback was positive.  So, let's 
have a formal vote.

I propose we move Pig to a top level Apache project.

I propose that the initial PMC of this project be the list of all currently 
active Pig committers (http://hadoop.apache.org/pig/whoweare.html) as of 
18 August 2010.

I nominate Olga Natkovich as the chair of the PMC.  (PMC chairs have no more 
power than other PMC members, but they are responsible for writing regular 
reports for the Apache board, assigning rights to new committers, etc.)

I propose that as part of the resolution that will be forwarded to the Apache 
board we include that one of the first tasks of the new Pig PMC will be to 
adopt bylaws for the governance of the project.

Alan.

P.S.
If this vote passes, the next step is that the proposal will be forwarded to 
the Hadoop PMC for discussion and vote.
If the Hadoop PMC vote passes, a formal resolution is then drafted (see 
http://bit.ly/bvOTRq for an example resolution) and sent to the Apache board.
The Apache board will then vote on whether to make Pig a TLP.


RE: Restarting discussion on Pig as a TLP

2010-08-17 Thread Santhosh Srinivasan
+1

Very nice! I see that opinion has changed within a couple of quarters :) I am 
pasting references to my responses to Alan's initial proposal to make Pig a 
TLP. I am all for making Pig a TLP.

References:

1. 
http://mail-archives.apache.org/mod_mbox/hadoop-pig-dev/201004.mbox/%3c088a0b616c8c1d4787dd686c6922a72a03161...@snv-exvs10.ds.corp.yahoo.com%3e
2. 
http://mail-archives.apache.org/mod_mbox/hadoop-pig-dev/201004.mbox/%3c088a0b616c8c1d4787dd686c6922a72a03161...@snv-exvs10.ds.corp.yahoo.com%3e
 

-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com] 
Sent: Monday, August 16, 2010 1:46 PM
To: pig-dev@hadoop.apache.org
Subject: Restarting discussion on Pig as a TLP

Five months ago I started a discussion on whether Pig should become a top level 
project (TLP) at Apache instead of remaining a subproject of Hadoop 
(http://mail-archives.apache.org/mod_mbox/hadoop-pig-dev/201003.mbox/%3c006aea7c-8829-4788-ad7b-822396fa2...@yahoo-inc.com%3e
).  At the time I voted against it 
(http://mail-archives.apache.org/mod_mbox/hadoop-pig-dev/201003.mbox/%3cf1484964-e774-48b7-9d45-6e57c7b09...@yahoo-inc.com%3e
), as did many others.  However, I would like to restart that discussion now.

I gave several reasons for voting against it:

First, I was worried that by losing our connection to Hadoop, Pig would lose 
its source of new users.  I have since been assured by Hadoop members that Pig 
would be free to keep our tab on their page (as HBase has).  Also, obviously we 
would still be welcomed at Hadoop get-togethers such as the various HUGs, 
Hadoop Summits, etc.  So our connection does not seem in danger.

Two, I was concerned that by not being members of the Hadoop community we would 
lose influence with Hadoop.  It is true that Pig developers will have to stay 
active in the Hadoop community, which will put a slight extra burden on them. 
 But they are already bearing this burden, and whether or not the communities 
are governed by the same or separate PMCs will not affect this.

Finally, I said that philosophically it makes sense to me that all 
Hadoop-related projects should stay under one umbrella.  This still makes sense 
to me, and I do see this as a downside of Pig moving out of Hadoop.

In addition to the above, a few other things have happened over the intervening 
months to cause me to reconsider.  Most importantly, it has become clear to me 
that Pig is operating as if it were a TLP inside Hadoop.  We have four members 
on the Hadoop PMC, which means we have sufficient votes to elect our committers 
and release our products.

Also, several Hadoop PMC members who have long experience in Apache projects 
have made clear to me that they believe Pig is ready to be a TLP.

I was also concerned about diversity in our PMC, since our project is Yahoo 
heavy.  Given that 10 out of 12 committers are Yahoo employees, we need to work 
on this.  But we do have experienced committers in three different 
organizations, and I think this gives us a sufficient base to work on it as a 
TLP.

So, in summary, I have switched my view on this from "not yet" to "now is a 
good time".  I think Pig is ready to be a TLP.  We have a community of 
contributors and users that is growing both in numbers and in diversity.  We 
have a strong group of committers who I believe are ready to take on leadership 
of the project and who will benefit from being mentored by the larger Apache 
community.

Thoughts?

Alan.


RE: Begin a discussion about Pig as a top level project

2010-04-05 Thread Santhosh Srinivasan
"Given that, do you think it makes  
sense to say that Pig stays a subproject for now, but if it someday  
grows beyond Hadoop only it becomes a TLP?  I could agree to that  
stance."

Bingo!

Santhosh 

-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com] 
Sent: Monday, April 05, 2010 11:37 AM
To: pig-dev@hadoop.apache.org
Subject: Re: Begin a discussion about Pig as a top level project

Prognostication is a difficult business.  Of course I'd love it if  
someday there is an ISO Pig Latin committee (with meetings in cool  
exotic places) deciding the official standard for Pig Latin.  But that  
seems like saying in your start-up's business plan, "When we reach 
Google's size, then we'll do x".  If there ever is an ISO Pig Latin 
standard it will be years off.

As others have noted, staying tight to Hadoop now has many advantages,  
both in technical and adoption terms.  Hence my advocacy of keeping  
Pig Latin Hadoop agnostic while tightly integrating the backend.   
Which is to say that in my view, Pig is Hadoop specific now, but there  
may come a day when that is no longer true.   Whether Pig will ever  
move past just running on Hadoop to running in other parallel systems  
won't be known for years to come.  Given that, do you think it makes  
sense to say that Pig stays a subproject for now, but if it someday  
grows beyond Hadoop only it becomes a TLP?  I could agree to that  
stance.

Alan.

On Apr 3, 2010, at 12:43 PM, Santhosh Srinivasan wrote:

> I see this as a multi-part question. Looking back at some of the
> significant roadmap/existential questions asked in the last 12  
> months, I
> see the following:
>
> 1. With the introduction of SQL, what is the philosophy of Pig (I sent
> an email about this approximately 9 months ago)
> 2. What is the approach to support backward compatibility in Pig (Alan
> had sent an email about this 3 months ago)
> 3. Should Pig be a TLP (the current email thread).
>
> Here is my take on answering the aforementioned questions.
>
> The initial philosophy of Pig was to be backend agnostic. It was
> designed as a data flow language. Whenever a new language is designed,
> the syntax and semantics of the language have to be laid out. The  
> syntax
> is usually captured in the form of a BNF grammar. The semantics are
> defined by the language creators. Backward compatibility is then a
> question of holding true to the syntax and semantics. With Pig, in
> addition to the language, the Java APIs were exposed to customers to
> implement UDFs (load/store/filter/grouping/row transformation, etc.),
> to provision looping (since the language does not support looping
> constructs), and to support a programmatic mode of access. Backward
> compatibility in this context means supporting API versioning.
>
> Do we still intend to position Pig as a data flow language that is backend
> agnostic? If the answer is yes, then there is a strong case for making
> Pig a TLP.
>
> Are we influenced by Hadoop? A big YES! The reason Pig chose to  
> become a
> Hadoop sub-project was to ride the Hadoop popularity wave. As a
> consequence, we chose to be heavily influenced by the Hadoop roadmap.
>
> Like a good lawyer, I also have rebuttals to Alan's questions :)
>
> 1. Search engine popularity - We can discuss this with the Hadoop team
> and still retain links to TLPs that are coupled (loosely or tightly).
> 2. Explicit connection to Hadoop - I see this as logical connection vs.
> physical connection. Today, we are physically connected as a
> sub-project. Becoming a TLP will not increase/decrease our influence on
> the Hadoop community (think Logical, Physical and MR Layers :)
> 3. Philosophy - I have already talked about this. The tight coupling  
> is
> by choice. If Pig continues to be a data flow language with clear  
> syntax
> and semantics then someone can implement Pig on top of a different
> backend. Do we intend to take this approach?
>
> I just wanted to offer a different opinion to this thread. I strongly
> believe that we should think about the original philosophy. Will we  
> have
> a Pig standards committee that will decide on the changes to the
> language (think C/C++) if there are multiple backend implementations?
>
> I will reserve my vote based on the outcome of the philosophy and
> backward compatibility discussions. If we decide that Pig will be
> treated and maintained like a true language with clear syntax and
> semantics then we have a strong case to make it into a TLP. If not, we
> should retain our existing ties to Hadoop and make Pig into a data  
> flow
> language for Hadoop.
>
> Santhosh
>
> -Original Message-
> From: Thejas Nair [mailto:te...@yahoo-inc.com]
> Sent:

RE: Begin a discussion about Pig as a top level project

2010-04-03 Thread Santhosh Srinivasan
I see this as a multi-part question. Looking back at some of the
significant roadmap/existential questions asked in the last 12 months, I
see the following:

1. With the introduction of SQL, what is the philosophy of Pig (I sent
an email about this approximately 9 months ago)
2. What is the approach to support backward compatibility in Pig (Alan
had sent an email about this 3 months ago)
3. Should Pig be a TLP (the current email thread).

Here is my take on answering the aforementioned questions.

The initial philosophy of Pig was to be backend agnostic. It was
designed as a data flow language. Whenever a new language is designed,
the syntax and semantics of the language have to be laid out. The syntax
is usually captured in the form of a BNF grammar. The semantics are
defined by the language creators. Backward compatibility is then a
question of holding true to the syntax and semantics. With Pig, in
addition to the language, the Java APIs were exposed to customers to
implement UDFs (load/store/filter/grouping/row transformation, etc.),
to provision looping (since the language does not support looping
constructs), and to support a programmatic mode of access. Backward
compatibility in this context means supporting API versioning.

Do we still intend to position Pig as a data flow language that is backend
agnostic? If the answer is yes, then there is a strong case for making
Pig a TLP.

Are we influenced by Hadoop? A big YES! The reason Pig chose to become a
Hadoop sub-project was to ride the Hadoop popularity wave. As a
consequence, we chose to be heavily influenced by the Hadoop roadmap.

Like a good lawyer, I also have rebuttals to Alan's questions :)

1. Search engine popularity - We can discuss this with the Hadoop team
and still retain links to TLPs that are coupled (loosely or tightly).
2. Explicit connection to Hadoop - I see this as logical connection vs.
physical connection. Today, we are physically connected as a
sub-project. Becoming a TLP will not increase/decrease our influence on
the Hadoop community (think Logical, Physical and MR Layers :)
3. Philosophy - I have already talked about this. The tight coupling is
by choice. If Pig continues to be a data flow language with clear syntax
and semantics then someone can implement Pig on top of a different
backend. Do we intend to take this approach?

I just wanted to offer a different opinion to this thread. I strongly
believe that we should think about the original philosophy. Will we have
a Pig standards committee that will decide on the changes to the
language (think C/C++) if there are multiple backend implementations?

I will reserve my vote based on the outcome of the philosophy and
backward compatibility discussions. If we decide that Pig will be
treated and maintained like a true language with clear syntax and
semantics then we have a strong case to make it into a TLP. If not, we
should retain our existing ties to Hadoop and make Pig into a data flow
language for Hadoop.

Santhosh

-Original Message-
From: Thejas Nair [mailto:te...@yahoo-inc.com] 
Sent: Friday, April 02, 2010 4:08 PM
To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
Subject: Re: Begin a discussion about Pig as a top level project

I agree with Alan and Dmitriy - Pig is tightly coupled with Hadoop, and
heavily influenced by its roadmap. I think it makes sense to continue as
a sub-project of Hadoop.

-Thejas



On 3/31/10 4:04 PM, "Dmitriy Ryaboy"  wrote:

> Over time, Pig is increasing its coupling to Hadoop (for good 
> reasons), rather than decreasing it. If and when Pig becomes a viable 
> entity without Hadoop around, it might make sense as a TLP. As is, I 
> think becoming a TLP will only introduce unnecessary administrative 
> and bureaucratic headaches.
> So my vote is also -1.
> 
> -Dmitriy
> 
> 
> 
> On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates wrote:
> 
>> So far I haven't seen any feedback on this.  Apache has asked the 
>> Hadoop PMC to submit input in April on whether some subprojects 
>> should be promoted to TLPs.  We, the Pig community, need to give 
>> feedback to the Hadoop PMC on how we feel about this.  Please make 
>> your voice heard.
>> 
>> So now I'll heed my own call and give my thoughts on it.
>> 
>> The biggest advantage I see to being a TLP is a direct connection to 
>> Apache.  Right now all of the Pig team's interaction with Apache is 
>> through the Hadoop PMC.  Being directly connected to Apache would 
>> benefit Pig team members, who would have a better view into Apache.  
>> It would also raise our profile in Apache and thus make other 
>> projects more aware of us.
>> 
>> However, I am concerned about losing Pig's explicit connection to 
>> Hadoop.  This concern has a couple of dimensions.  One, Hadoop and 
>> MapReduce are the current flavor of the month in computing.  Given 
>> that Pig shares a name with the common farm animal, it's hard to be 
>> sure based on search statistics.  But Google Trends shows that 
>> "hadoop" is searched on much more frequently

[jira] Created: (PIG-1344) PigStorage should be able to read back complex data containing delimiters created by PigStorage

2010-03-30 Thread Santhosh Srinivasan (JIRA)
PigStorage should be able to read back complex data containing delimiters 
created by PigStorage
---

 Key: PIG-1344
 URL: https://issues.apache.org/jira/browse/PIG-1344
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.7.0
Reporter: Santhosh Srinivasan
Assignee: Daniel Dai
 Fix For: 0.8.0


With Pig 0.7, the TextDataParser has been removed and the logic to parse 
complex data types has moved to Utf8StorageConverter. However, this does not 
handle the case where the complex data types themselves contain delimiters 
('{', '}', ',', '(', ')', '[', ']', '#'). Fixing this issue will make 
PigStorage self-contained and more usable.
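
A minimal, self-contained illustration of the ambiguity (the sample record and 
the naive splitting logic below are hypothetical, not Pig's actual parser): a 
value that contains '}' cannot be told apart from the bag's closing delimiter 
once serialized as plain text.

{code}
// Hypothetical sketch: a bag holding one tuple whose single chararray
// field is the string "a}b", printed the way PigStorage prints complex data.
public class DelimiterAmbiguity {
    public static void main(String[] args) {
        String serialized = "{(a}b)}";
        // A reader that treats the first '}' as end-of-bag stops too early.
        int naiveEnd = serialized.indexOf('}');
        // Prints "(a" instead of the intended "(a}b)".
        System.out.println("naive bag content: " + serialized.substring(1, naiveEnd));
    }
}
{code}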

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-26 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850355#action_12850355
 ] 

Santhosh Srinivasan commented on PIG-1331:
--

Thanks for the information. Looking at the Hive design at 
http://wiki.apache.org/hadoop/Hive/Design , it looks like there is no 
significant difference between Owl and Hive. As you indicate, I hope we 
converge to a common metastore for Hadoop.



> Owl Hadoop Table Management Service
> ---
>
> Key: PIG-1331
> URL: https://issues.apache.org/jira/browse/PIG-1331
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
>
> This JIRA is a proposal to create a Hadoop table management service: Owl. 
> Today, MapReduce and Pig applications interact directly with HDFS 
> directories and files and must deal with low-level data management issues 
> such as storage format, serialization/compression schemes, data layout, and 
> efficient data access, etc., often with different solutions. Owl aims to 
> provide a standard way to address this issue and abstract away the 
> complexities of reading/writing huge amounts of data from/to HDFS.
> Owl has a data access API that is modeled after the traditional Hadoop 
> InputFormat and a management API to manipulate Owl objects.  This JIRA is 
> related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata 
> store.  Owl integrates with different storage modules like Zebra via a 
> pluggable architecture.
>  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
> time, it makes sense to move it to a Hadoop subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1331) Owl Hadoop Table Management Service

2010-03-26 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850342#action_12850342
 ] 

Santhosh Srinivasan commented on PIG-1331:
--

Jay, 

In PIG-823 there was a discussion around how Owl is different from Hive's 
metastore. Is that still true today? If so, can you elaborate on the key 
differences between the two systems?

Thanks,
Santhosh

> Owl Hadoop Table Management Service
> ---
>
> Key: PIG-1331
> URL: https://issues.apache.org/jira/browse/PIG-1331
> Project: Pig
>  Issue Type: New Feature
>Reporter: Jay Tang
>
> This JIRA is a proposal to create a Hadoop table management service: Owl. 
> Today, MapReduce and Pig applications interact directly with HDFS 
> directories and files and must deal with low-level data management issues 
> such as storage format, serialization/compression schemes, data layout, and 
> efficient data access, etc., often with different solutions. Owl aims to 
> provide a standard way to address this issue and abstract away the 
> complexities of reading/writing huge amounts of data from/to HDFS.
> Owl has a data access API that is modeled after the traditional Hadoop 
> InputFormat and a management API to manipulate Owl objects.  This JIRA is 
> related to PIG-823 (Hadoop Metadata Service) as Owl has an internal metadata 
> store.  Owl integrates with different storage modules like Zebra via a 
> pluggable architecture.
>  Initially, the proposal is to submit Owl as a Pig contrib project.  Over 
> time, it makes sense to move it to a Hadoop subproject.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1117) Pig reading hive columnar rc tables

2010-01-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798917#action_12798917
 ] 

Santhosh Srinivasan commented on PIG-1117:
--

+1 on making it part of the main piggybank. We should not be creating a 
separate directory just to handle Hive.

> Pig reading hive columnar rc tables
> ---
>
> Key: PIG-1117
> URL: https://issues.apache.org/jira/browse/PIG-1117
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.7.0
>Reporter: Gerrit Jansen van Vuuren
>Assignee: Gerrit Jansen van Vuuren
> Fix For: 0.7.0
>
> Attachments: HiveColumnarLoader.patch, HiveColumnarLoaderTest.patch, 
> PIG-1117.patch, PIG-117-v.0.6.0.patch, PIG-117-v.0.7.0.patch
>
>
> I've coded a LoadFunc implementation that can read from Hive Columnar RC 
> tables; this is needed for a project that I'm working on because all our data 
> is stored using the Hive thrift-serialized Columnar RC format. I have looked 
> at the piggybank but did not find any implementation that could do this. 
> We've been running it on our cluster for the last week and have worked out 
> most bugs.
>  
> There are still some improvements to be done that I would need, like setting 
> the number of mappers based on date partitioning. It's been optimized to 
> read only specific columns, and it can churn through a data set almost 8 times 
> faster because not all column data is read.
> I would like to contribute the class to the piggybank; can you guide me on 
> what I need to do?
> I've used Hive-specific classes to implement this; is it possible to add this 
> to the piggybank Ivy build for automatic download of the dependencies?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-10 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12776098#action_12776098
 ] 

Santhosh Srinivasan commented on PIG-1065:
--

bq. Aliasing inside foreach is hugely useful for readability. Are you 
suggesting removing the ability to assign aliases inside a foreach, or just to 
change/assign schemas?

For consistency, all relational operators should support the AS clause. 
Gradually, the per-column aliasing in foreach should be removed from the 
documentation, deprecated, and eventually removed. This is a long-term 
recommendation.

> In-determinate behaviour of Union when there are 2 non-matching schema's
> 
>
> Key: PIG-1065
> URL: https://issues.apache.org/jira/browse/PIG-1065
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
> Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> 
> Schema for u0 unknown.
> 
> (1,2)
> (2,3)
> (1)
> (2)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
> at org.apache.pig.PigServer.openIterator(PigServer.java:475)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> 
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> 
> Schema for u0 unknown
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or he can get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-10 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12775968#action_12775968
 ] 

Santhosh Srinivasan commented on PIG-1065:
--

The schema will then correspond to the prefix, as implemented today. For 
example, if the AS clause is defined for flatten($1), and $1 flattens to 
10 columns while the AS clause has 3 columns, then the prefix is used and the 
remaining columns are left undefined.
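
A toy sketch of that prefix rule (plain Java with hypothetical column names; 
an illustration of the behavior, not Pig's implementation):

{code}
import java.util.Arrays;

public class AsClausePrefix {
    public static void main(String[] args) {
        // flatten($1) produced 10 columns; the AS clause named only 3.
        String[] asClause = {"a", "b", "c"};
        String[] schema = new String[10]; // unnamed columns stay null
        System.arraycopy(asClause, 0, schema, 0, asClause.length);
        // Prints [a, b, c, null, null, null, null, null, null, null]
        System.out.println(Arrays.toString(schema));
    }
}
{code}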

> In-determinate behaviour of Union when there are 2 non-matching schema's
> 
>
> Key: PIG-1065
> URL: https://issues.apache.org/jira/browse/PIG-1065
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
> Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> 
> Schema for u0 unknown.
> 
> (1,2)
> (2,3)
> (1)
> (2)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
> at org.apache.pig.PigServer.openIterator(PigServer.java:475)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> 
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> 
> Schema for u0 unknown
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or he can get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: RequiredFields contents

2009-11-05 Thread Santhosh Srinivasan
The first element in the pair is the input number. It is 0 for most
operators. For multi-input operators like join and cogroup, it will
range from 0 to (n - 1), where n is the number of inputs.

Santhosh
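
As a self-contained sketch (the Pair type below is a local stand-in, not Pig's
actual class), this is how a list of (input number, column number) pairs
reads for a two-input join that needs field 0 of the first input and field 1
of the second:

import java.util.Arrays;
import java.util.List;

// Local stand-in for a pair of (input number, column number); defined
// here only to keep the sketch compilable without the Pig jar.
class Pair {
    final int input, column;
    Pair(int input, int column) { this.input = input; this.column = column; }
}

public class RequiredFieldsDemo {
    public static void main(String[] args) {
        // Hypothetical plan 'C = join A by $0, B by $1;': the first element
        // of each pair selects the input, the second selects the column.
        List<Pair> required = Arrays.asList(new Pair(0, 0), new Pair(1, 1));
        for (Pair p : required)
            System.out.println("input " + p.input + ", column " + p.column);
    }
}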

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Thursday, November 05, 2009 9:05 PM
To: pig-dev@hadoop.apache.org
Subject: RequiredFields contents

Hi all,

I am looking at the RequiredFields class and it has this explanation of
what getFields() returns:

/**
 * List of fields required from the input. This includes fields that are
 * transformed, and thus are no longer the same fields. Using the example
 * 'B = foreach A generate $0, $2, $3, udf($1)' would produce the list
 * (0, 0), (0, 2), (0, 3), (0, 1). Note that the order is not guaranteed.
 */


The second element of the pair is self-explanatory -- but what is the
first element in the pair?

Thanks,
-Dmitriy


[jira] Commented: (PIG-1065) In-determinate behaviour of Union when there are 2 non-matching schema's

2009-11-05 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774153#action_12774153
 ] 

Santhosh Srinivasan commented on PIG-1065:
--

Answer to Question 1: Pig 1.0 had that syntax and it was retained for backward 
compatibility. Paolo suggested that for uniformity, the 'AS' clause for the 
load statements should be extended to all relational operators. Gradually, the 
column aliasing in the foreach should be removed from the documentation and 
eventually removed from the language.

> In-determinate behaviour of Union when there are 2 non-matching schema's
> 
>
> Key: PIG-1065
> URL: https://issues.apache.org/jira/browse/PIG-1065
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Viraj Bhat
> Fix For: 0.6.0
>
>
> I have a script which first does a union of these schemas and then does a 
> ORDER BY of this result.
> {code}
> f1 = LOAD '1.txt' as (key:chararray, v:chararray);
> f2 = LOAD '2.txt' as (key:chararray);
> u0 = UNION f1, f2;
> describe u0;
> dump u0;
> u1 = ORDER u0 BY $0;
> dump u1;
> {code}
> When I run in Map Reduce mode I get the following result:
> $java -cp pig.jar:$HADOOP_HOME/conf org.apache.pig.Main broken.pig
> 
> Schema for u0 unknown.
> 
> (1,2)
> (2,3)
> (1)
> (2)
> 
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias u1
> at org.apache.pig.PigServer.openIterator(PigServer.java:475)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:532)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:166)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:142)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> 
> Caused by: java.io.IOException: Type mismatch in key from map: expected 
> org.apache.pig.impl.io.NullableBytesWritable, recieved 
> org.apache.pig.impl.io.NullableText
> at 
> org.apache.hadoop.mapred.MapTask$MapOutputBuffer.collect(MapTask.java:415)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.collect(PigMapReduce.java:108)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.runPipeline(PigMapBase.java:251)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapBase.map(PigMapBase.java:240)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapReduce$Map.map(PigMapReduce.java:93)
> at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:47)
> at org.apache.hadoop.mapred.MapTask.run(MapTask.java:227)
> 
> When I run the same script in local mode I get a different result, as we know 
> that local mode does not use any Hadoop Classes.
> $java -cp pig.jar org.apache.pig.Main -x local broken.pig
> 
> Schema for u0 unknown
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> (1,2)
> (1)
> (2,3)
> (2)
> 
> Here are some questions
> 1) Why do we allow union if the schemas do not match
> 2) Should we not print an error message/warning so that the user knows that 
> this is not allowed or he can get unexpected results?
> Viraj

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1073) LogicalPlanCloner can't clone plan containing LOJoin

2009-11-05 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12774147#action_12774147
 ] 

Santhosh Srinivasan commented on PIG-1073:
--

If my memory serves me correctly, the logical plan cloning was implemented (by 
me) for cloning inner plans for foreach. As such, the top level plan cloning 
was never tested and some items are marked as TODO (see visit methods for 
LOLoad, LOStore and LOStream).

If you want to use it as you mention in your test cases, then you need to add 
code for cloning the LOLoad, LOStore, LOStream and LOJoin operators.
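
As a toy, hedged sketch of why the clone fails (the node and visitor classes 
below are hypothetical stand-ins, not Pig's operators): a cloning visitor that 
only supports some node types returns null for the rest, and the next 
dereference raises a NullPointerException much like the one reported.

{code}
// Hypothetical miniature of the problem: only ForEach-style nodes are cloned.
abstract class Op { }
class ForEachOp extends Op { }
class JoinOp extends Op { }

class PlanCloner {
    Op copy(Op op) {
        if (op instanceof ForEachOp) return new ForEachOp(); // supported case
        return null; // LOLoad/LOStore/LOStream/LOJoin analogue: still a TODO
    }
}

public class ClonerDemo {
    public static void main(String[] args) {
        Op cloned = new PlanCloner().copy(new JoinOp());
        System.out.println(cloned.toString()); // NullPointerException here
    }
}
{code}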


> LogicalPlanCloner can't clone plan containing LOJoin
> 
>
> Key: PIG-1073
> URL: https://issues.apache.org/jira/browse/PIG-1073
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Ashutosh Chauhan
>
> Add the following test case in LogicalPlanBuilder.java:
> public void testLogicalPlanCloner() throws CloneNotSupportedException {
> LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
> LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
> cloner.getClonedPlan();
> }
> and this fails with the following stacktrace:
> java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
> at 
> org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
> at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
> at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
> at 
> org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: How to clone a logical plan ?

2009-11-05 Thread Santhosh Srinivasan
If my memory serves me correctly, the logical plan cloning was
implemented (by me) for cloning inner plans for foreach. As such, the
top level plan cloning was never tested and some items are marked as
TODO (see visit methods for LOLoad, LOStore and LOStream).

If you want to use it as you mention in your test cases, then you need
to add code for cloning the LOLoad, LOStore, LOStream and LOJoin.

Santhosh


-Original Message-
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] 
Sent: Thursday, November 05, 2009 4:04 PM
To: pig-dev@hadoop.apache.org
Subject: RE: How to clone a logical plan ?

You have hit a bug. I think LOJoin has to be added to
LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh

-Original Message-
From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com]
Sent: Thursday, November 05, 2009 3:28 PM
To: pig-dev@hadoop.apache.org
Subject: How to clone a logical plan ?

Hi,

For our cost-based optimizer, we need to generate alternative query plans for
a given query plan and evaluate them based on their estimated cost.
As a result, I want to clone a logical plan. I thought
LogicalPlanCloner is meant for that, but it doesn't seem to work. I added
this simple test case in TestLogicalPlanBuilder.java:

public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
        at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
        at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
        at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
        at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I
am doing something wrong?

Thanks,
Ashutosh


RE: How to clone a logical plan ?

2009-11-05 Thread Santhosh Srinivasan
You have hit a bug. I think LOJoin has to be added to
LogicalPlanCloneHelper.java. Can you file a jira?

Thanks,
Santhosh

-Original Message-
From: Ashutosh Chauhan [mailto:ashutosh.chau...@gmail.com] 
Sent: Thursday, November 05, 2009 3:28 PM
To: pig-dev@hadoop.apache.org
Subject: How to clone a logical plan ?

Hi,

For our cost-based optimizer, we need to generate alternative query plans for
a given query plan and evaluate them based on their estimated cost.
As a result, I want to clone a logical plan. I thought
LogicalPlanCloner is meant for that, but it doesn't seem to work. I added
this simple test case in TestLogicalPlanBuilder.java:

public void testLogicalPlanCloneHelper() throws CloneNotSupportedException {
    LogicalPlan lp = buildPlan("C = join (load 'A') by $0, (load 'B') by $0;");
    LogicalPlanCloner cloner = new LogicalPlanCloner(lp);
    cloner.getClonedPlan();
}

and this fails with the following stacktrace:

java.lang.NullPointerException
        at org.apache.pig.impl.logicalLayer.LOVisitor.visit(LOVisitor.java:171)
        at org.apache.pig.impl.logicalLayer.PlanSetter.visit(PlanSetter.java:63)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:213)
        at org.apache.pig.impl.logicalLayer.LOJoin.visit(LOJoin.java:45)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:67)
        at org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:69)
        at org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
        at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
        at org.apache.pig.impl.logicalLayer.LogicalPlanCloneHelper.getClonedPlan(LogicalPlanCloneHelper.java:73)
        at org.apache.pig.impl.logicalLayer.LogicalPlanCloner.getClonedPlan(LogicalPlanCloner.java:46)
        at org.apache.pig.test.TestLogicalPlanBuilder.testLogicalPlanCloneHelper(TestLogicalPlanBuilder.java:2110)

I am debugging this, but wanted to ask if I have hit a bug here or if I
am doing something wrong?

Thanks,
Ashutosh


[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-29 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771442#action_12771442
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

Hc Busy, thanks for taking the time to contribute the patch, explaining the 
details, and especially for being patient. A few more questions and details 
have to be cleared up before we commit this patch.

IMHO, the right comparison should be along the lines of checking whether o1 and 
o2 are NullableBytesWritable, followed by a check for PigNullableWritable, and 
then by error handling code.
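
A compilable sketch of that check order (the two classes below are local stubs 
standing in for Pig's NullableBytesWritable and PigNullableWritable, which have 
the same subclass relationship), showing why the subclass test has to come 
first:

{code}
// Stub hierarchy mirroring Pig's: the bytes flavor is a subclass.
class PigNullableWritable { }
class NullableBytesWritable extends PigNullableWritable { }

public class CompareOrder {
    static int compare(Object o1, Object o2) {
        if (o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable) {
            return 0; // raw-bytes comparison path would go here
        } else if (o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable) {
            return 0; // generic compareTo path would go here
        } else {
            throw new IllegalArgumentException("unexpected types"); // error handling
        }
    }

    public static void main(String[] args) {
        // Both objects are the subclass, so the first branch is taken.
        System.out.println(compare(new NullableBytesWritable(), new NullableBytesWritable()));
    }
}
{code}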

Alan, can you comment on this approach?

There is a more important semantic issue. If the map value types are strings 
and some of the strings are numeric, then the values in the maps will be of 
different types. In that case, the load function will break. In addition, 
conversion routines might fail when the compareTo method is invoked. An example 
illustrates this issue.

Suppose the record is ['key'#1234567890124567]. PIG-880 would treat the value 
as a string and there would be no problem. Now, with the changes reverted, the 
type is inferred as integer and the parsing will fail, as the value is too big 
to fit into an integer.

Secondly, assuming that the integer was small enough to be converted, the 
comparison method in DataType.java will return the wrong results when an 
integer and a string are compared. For example, if the records are:

[key#*$]
[key#123]

The first value is treated as a string and the second value is treated as an 
integer. The compareTo method will return 1 to indicate that string > integer, 
while in reality 123 > *$.

Please correct me if the last statement is incorrect or let me know if it needs 
more explanation.
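
A plain-Java check of the first failure mode described above (the literal is 
taken from the example record; the class itself is just an illustration):

{code}
public class MapValueTypes {
    public static void main(String[] args) {
        try {
            // 1234567890124567 exceeds Integer.MAX_VALUE, so int parsing fails.
            Integer.parseInt("1234567890124567");
        } catch (NumberFormatException e) {
            System.out.println("parse fails: " + e.getMessage());
        }
        // If ordering is decided by type tags (string > integer), "*$" compares
        // greater than 123 even though the intended value order is 123 > "*$".
    }
}
{code}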

Thoughts/comments from other committers?

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Fix For: 0.5.0
>
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of the map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12771287#action_12771287
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

I am summarizing my understanding of the patch that has been submitted by hc 
busy.

Root cause: PIG-880 changed the value type of maps in PigStorage from native 
Java types to DataByteArray. As a result of this change, parsing of complex 
types as map values was disabled.

Proposed fix: Revert the changes made as part of PIG-880 to interpret map 
values as Java types. In addition, change the comparison method to check for 
the object type and call the appropriate compareTo method. The latter is 
required to work around the fact that the front-end assigns the value type to 
be DataByteArray whereas the backend sees the actual type (Integer, Long, 
Tuple, DataBag, etc.).

Based on this understanding I have the following review comment(s).

Index: 
src/org/apache/pig/backend/hadoop/executionengine/mapReduceLayer/PigBytesRawComparator.java
===

Can you explain the checks in the if and the else? Specifically, 
NullableBytesWritable is a subclass of PigNullableWritable. As a result, in the 
if part, the check for both o1 and o2 not being PigNullableWritable is 
confusing as nbw1 and nbw2 are cast to NullableBytesWritable if o1 and o2 are 
not PigNullableWritable.  

{code}
+// find bug is complaining about nulls. This check sequence will prevent nulls from being dereferenced.
+if (o1 != null && o2 != null) {
+
+    // In case the objects are comparable
+    if ((o1 instanceof NullableBytesWritable && o2 instanceof NullableBytesWritable) ||
+        !(o1 instanceof PigNullableWritable && o2 instanceof PigNullableWritable)) {
+
+        NullableBytesWritable nbw1 = (NullableBytesWritable) o1;
+        NullableBytesWritable nbw2 = (NullableBytesWritable) o2;
+
+        // If either are null, handle differently.
+        if (!nbw1.isNull() && !nbw2.isNull()) {
+            rc = ((DataByteArray) nbw1.getValueAsPigType()).compareTo((DataByteArray) nbw2.getValueAsPigType());
+        } else {
+            // For sorting purposes two nulls are equal.
+            if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+            else if (nbw1.isNull()) rc = -1;
+            else rc = 1;
+        }
+    } else {
+        // enter here only if both o1 and o2 are non-NullableBytesWritable PigNullableWritable's
+        PigNullableWritable nbw1 = (PigNullableWritable) o1;
+        PigNullableWritable nbw2 = (PigNullableWritable) o2;
+        // If either are null, handle differently.
+        if (!nbw1.isNull() && !nbw2.isNull()) {
+            rc = nbw1.compareTo(nbw2);
+        } else {
+            // For sorting purposes two nulls are equal.
+            if (nbw1.isNull() && nbw2.isNull()) rc = 0;
+            else if (nbw1.isNull()) rc = -1;
+            else rc = 1;
+        }
+    }
+} else {
+    if (o1 == null && o2 == null) { rc = 0; }
+    else if (o1 == null) { rc = -1; }
+    else { rc = 1; }
{code}

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Fix For: 0.5.0
>
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of the map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1056) table can not be loaded after store

2009-10-27 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12770743#action_12770743
 ] 

Santhosh Srinivasan commented on PIG-1056:
--

Do you have the right load statement? I don't see the using clause that 
specifies the zebra loader.

> table can not be loaded after store
> ---
>
> Key: PIG-1056
> URL: https://issues.apache.org/jira/browse/PIG-1056
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.6.0
>Reporter: Jing Huang
>
> Pig Stack Trace
> ---
> ERROR 1018: Problem determining schema during load
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1000: Error during 
> parsing. Problem determining schema during load
> at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1023)
> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:967)
> at org.apache.pig.PigServer.registerQuery(PigServer.java:383)
> at 
> org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:716)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:324)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:144)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.parser.ParseException: Problem 
> determining schema during load
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:734)
> at 
> org.apache.pig.impl.logicalLayer.LogicalPlanBuilder.parse(LogicalPlanBuilder.java:63)
> at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1017)
> ... 8 more
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1018: 
> Problem determining schema during load
> at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:155)
> at 
> org.apache.pig.impl.logicalLayer.parser.QueryParser.Parse(QueryParser.java:732)
> ... 10 more
> Caused by: java.io.IOException: No table specified for input
> at 
> org.apache.hadoop.zebra.pig.TableLoader.checkConf(TableLoader.java:238)
> at 
> org.apache.hadoop.zebra.pig.TableLoader.determineSchema(TableLoader.java:258)
> at org.apache.pig.impl.logicalLayer.LOLoad.getSchema(LOLoad.java:148)
> ... 11 more
> 
> ~ 
> 
> script:
> register /grid/0/dev/hadoopqa/hadoop/lib/zebra.jar;
> A = load 'filter.txt' as (name:chararray, age:int);
> B = filter A by age < 20;
> --dump B;
> store B into 'filter1' using 
> org.apache.hadoop.zebra.pig.TableStorer('[name];[age]');
> rec1 = load 'B' using org.apache.hadoop.zebra.pig.TableLoader();
> dump rec1;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1012) FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in serializable class

2009-10-21 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12768368#action_12768368
 ] 

Santhosh Srinivasan commented on PIG-1012:
--

I just looked at the first patch. It was setting generate to true in 
TestMRCompiler.java. It should be set to false in order to run the test case 
correctly.

+++ test/org/apache/pig/test/TestMRCompiler.java

-private boolean generate = false;
+private boolean generate = true;

> FINDBUGS: SE_BAD_FIELD: Non-transient non-serializable instance field in 
> serializable class
> ---
>
> Key: PIG-1012
> URL: https://issues.apache.org/jira/browse/PIG-1012
> Project: Pig
>  Issue Type: Bug
>Reporter: Olga Natkovich
> Attachments: PIG-1012-2.patch, PIG-1012.patch
>
>
> Se: Class org.apache.pig.backend.executionengine.PigSlice defines 
> non-transient non-serializable instance field is
> Se: Class org.apache.pig.backend.executionengine.PigSlice defines 
> non-transient non-serializable instance field loader
> Se: java.util.zip.GZIPInputStream stored into non-transient field 
> PigSlice.is
> Se: org.apache.pig.backend.datastorage.SeekableInputStream stored into 
> non-transient field PigSlice.is
> Se: org.apache.tools.bzip2r.CBZip2InputStream stored into non-transient 
> field PigSlice.is
> Se: org.apache.pig.builtin.PigStorage stored into non-transient field 
> PigSlice.loader
> Se: org.apache.pig.backend.hadoop.DoubleWritable$Comparator implements 
> Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigBagWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigCharArrayWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDBAWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigDoubleWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigFloatWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigIntWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigLongWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigTupleWritableComparator
> implements Comparator but not Serializable
> Se: 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.JobControlCompiler$PigWritableComparator
> implements Comparator but not Serializable
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceOper 
> defines non-transient non-serializable instance field nig
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.EqualToExpr
> defines non-transient non-serializable instance field log
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GreaterThanExpr
> defines non-transient non-serializable instance field log
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.GTOrEqualToExpr
> defines non-transient non-serializable instance field log
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LessThanExpr
> defines non-transient non-serializable instance field log
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.LTOrEqualToExpr
> defines non-transient non-serializable instance field log
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.NotEqualToExpr
> defines non-transient non-serializable instance field log
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POCast
> defines non-transient non-serializable instance field log
> Se: Class 
> org.apache.pig.backend.hadoop.executionengine.physicalLayer.expressionOperators.POProject
> defines non-tr

[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-15 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766382#action_12766382
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

hc busy,

From your example snippet, I was not able to understand if Pig is preventing 
you from doing that based on the current code base. If not, what is the error 
that you are seeing?

Santhosh

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for a value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser, whereas almost all 
> documentation states that the value of the map can be any type.
> I've attached a patch that allows us to read in complex objects as values, as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-14 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765779#action_12765779
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

Another option is to change the implementation of COUNT itself to reflect the 
proposed semantics. If the underlying UDF is changed then the user should be 
notified via an informational message. With the silent conversion, if the user 
checks the explain output then (s)he will notice COUNT_STAR and will be confused.

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1016) Reading in map data seems broken

2009-10-14 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765695#action_12765695
 ] 

Santhosh Srinivasan commented on PIG-1016:
--

The fix proposed in this JIRA reverts the changes made as part of PIG-880. Can 
you explain in more detail about the issue that you are facing currently? 
Specifically, can you provide a test case that reproduces this bug.

> Reading in map data seems broken
> 
>
> Key: PIG-1016
> URL: https://issues.apache.org/jira/browse/PIG-1016
> Project: Pig
>  Issue Type: Improvement
>  Components: data
>Affects Versions: 0.4.0
>Reporter: hc busy
> Attachments: PIG-1016.patch
>
>
> Hi, I'm trying to load a map that has a tuple for value. The read fails in 
> 0.4.0 because of a misconfiguration in the parser. Whereas in almost all 
> documentation it is stated that the value of the map can be any type.
> I've attached a patch that allows us to read in complex objects as value as 
> documented. I've done simple verification of loading in maps with tuple/map 
> values and writing them back out using LOAD and STORE. All seems to work fine.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-13 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765357#action_12765357
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

After a discussion with Pradeep who also graciously ran SQL queries to verify 
semantics, we have the following proposal:

The semantics of COUNT could be defined as:

1. COUNT( A ) is equivalent to COUNT( A.* ) and the result of COUNT( A ) will 
count null tuples in the relation
2. COUNT( A.$0) will not count null tuples in the relation

3. COUNT(A.($0, $1)) is equivalent to COUNT( A1.* ) where A1 is the relation 
containing tuples with two columns and will exhibit the behavior of statement 1

OR 

3. COUNT(A.($0, $1)) is equivalent to COUNT( A1.* ) where A1 is the relation 
containing tuples with two columns and will exhibit the behavior of statement 2

Point 3 needs more discussion.
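
To make the proposal concrete, here is a sketch of how the proposed semantics 
would read in a script (the data and names are illustrative only):

{code}
A = load 'data' as (a0:int, a1:int);
B = group A all;
C = foreach B generate COUNT(A);     -- statement 1: null tuples in A are counted
D = foreach B generate COUNT(A.$0);  -- statement 2: tuples where a0 is null are not counted
{code}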

Comments/thoughts/suggestions/anything else welcome.


> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-13 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12765194#action_12765194
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

Essentially, Pradeep is pointing out an issue in the implementation of COUNT. 
If that is the case then COUNT has to be fixed or the semantics of COUNT has to 
be documented to explain the current implementation. I would vote for fixing 
COUNT to have the correct semantics.

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-12 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764846#action_12764846
 ] 

Santhosh Srinivasan commented on PIG-984:
-

Very quick comment: the parser has a log.info which should be converted to a 
log.debug.

Index: src/org/apache/pig/impl/logicalLayer/parser/QueryParser.jjt
===


+[ ("\"collected\"" { 
+log.info("Using mapside");


> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>Assignee: Richard Ding
> Attachments: PIG-984.patch, PIG-984_1.patch
>
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads  between 
> mappers and reducers.
> However, in the cases where the input data has the following properties
>1. The records with the same key are grouped together (such as the data is 
> sorted by the keys).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only and thus remove 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed addition of using "mapside" to group will be a mapside group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change it will emit the key and bag for records it had buffered. 
> It will assume that all keys for a given record are collected together and 
> thus there is not need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-12 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764792#action_12764792
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

If the user wants to count all records without considering nullness then the 
user should use COUNT_STAR explicitly. One of the philosophies of Pig has been 
to allow users to do exactly what they want. Here, we are violating that 
philosophy and, secondly, we are second-guessing the user's intention.
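
For illustration, the user can already express either behavior explicitly (a 
minimal sketch; the names and data are assumed):

{code}
A = load 'data' as (a0:int);
B = group A all;
C = foreach B generate COUNT(A);       -- skips records whose first field is null
D = foreach B generate COUNT_STAR(A);  -- counts every record, nulls included
{code}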

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-12 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764771#action_12764771
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

Is Pig trying to guess the user's intent? What if the user wanted to count 
without nulls?

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-1014) Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all records are counted without considering nullness of the fields in the records

2009-10-10 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-1014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764368#action_12764368
 ] 

Santhosh Srinivasan commented on PIG-1014:
--

When the semantics of COUNT was changed, I thought this was communicated to 
the users. What is the intention of this JIRA?

> Pig should convert COUNT(relation) to COUNT_STAR(relation) so that all 
> records are counted without considering nullness of the fields in the records
> 
>
> Key: PIG-1014
> URL: https://issues.apache.org/jira/browse/PIG-1014
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Pradeep Kamath
>


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-995) Limit Optimizer throw exception "ERROR 2156: Error while fixing projections"

2009-10-09 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764119#action_12764119
 ] 

Santhosh Srinivasan commented on PIG-995:
-

Review comments:

The initialization code is fine. However, the try-catch block is shared between 
the rebuildSchemas() and rebuildProjectionMaps() method invocations. This could 
lead to misleading error messages: specifically, if rebuildSchemas() throws an 
exception, the error message will indicate that rebuilding the projection maps 
failed.

> Limit Optimizer throw exception "ERROR 2156: Error while fixing projections"
> 
>
> Key: PIG-995
> URL: https://issues.apache.org/jira/browse/PIG-995
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.6.0
>
> Attachments: PIG-995-1.patch, PIG-995-2.patch, PIG-995-3.patch
>
>
> The following script fail:
> A = load '1.txt' AS (a0, a1, a2);
> B = order A by a1;
> C = limit B 10;
> D = foreach C generate $0;
> dump D;
> Error log:
> Caused by: org.apache.pig.impl.plan.VisitorException: ERROR 2156: Error while 
> fixing projections. Projection map of node to be replaced is null.
> at 
> org.apache.pig.impl.logicalLayer.ProjectFixerUpper.visit(ProjectFixerUpper.java:138)
> at 
> org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:408)
> at org.apache.pig.impl.logicalLayer.LOProject.visit(LOProject.java:58)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.depthFirst(DepthFirstWalker.java:65)
> at 
> org.apache.pig.impl.plan.DepthFirstWalker.walk(DepthFirstWalker.java:50)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.LOForEach.rewire(LOForEach.java:761)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-10-01 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761270#action_12761270
 ] 

Santhosh Srinivasan commented on PIG-984:
-

bq. But this is in line with what we've done for joins, philosophically, 
semantically, and syntacticly.

Not exactly; with joins we are exposing different kinds of joins. Here we are 
exposing the underlying aspects of the framework (mapside). If there is a 
parallel framework that does not do map-reduce then having mapside in the 
language is philosophically and semantically not correct.

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads  between 
> mappers and reducers.
> However, in the cases where the input data has the following properties
>1. The records with the same key are grouped together (such as the data is 
> sorted by the keys).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only and thus remove 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed addition of using "mapside" to group will be a mapside group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change it will emit the key and bag for records it had buffered. 
> It will assume that all keys for a given record are collected together and 
> thus there is not need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-09-30 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761073#action_12761073
 ] 

Santhosh Srinivasan commented on PIG-984:
-

bq. This is something that can be inferred looking at the schema and 
distribution key. I understand wanting a manual handle to turn on the behavior 
while developing, but the production version of this can be done automatically 
( "if distributed by and sorted on a subset of group keys, apply map-side 
group" rule in the optimizer).

+1. That's what I meant when I said:

bq. 1. I am concerned about extending the language for supporting features that 
can be handled internally. The scope of the language has not been defined but 
the language continues to evolve.
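
For illustration, under such a rule the user would not need a hint at all (a 
sketch that reuses SomeLoader from the issue description):

{code}
A = load 'input' using SomeLoader(...) as (k, v);  -- loader guarantees properties (1) and (2)
B = group A by k;  -- the optimizer infers the properties and applies the map-side group rule
{code}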

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads  between 
> mappers and reducers.
> However, in the cases where the input data has the following properties
>1. The records with the same key are grouped together (such as the data is 
> sorted by the keys).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only and thus remove 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed addition of using "mapside" to group will be a mapside group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change it will emit the key and bag for records it had buffered. 
> It will assume that all keys for a given record are collected together and 
> thus there is not need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-984) PERFORMANCE: Implement a map-side group operator to speed up processing of ordered data

2009-09-30 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12761028#action_12761028
 ] 

Santhosh Srinivasan commented on PIG-984:
-

A couple of things:

1. I am concerned about extending the language for supporting features that can 
be handled internally. The scope of the language has not been defined but the 
language continues to evolve.

2. I agree with Thejas' comment about allowing expressions that do not alter 
the property. Pig will not be able to check that, but that is no different from 
Pig not being able to check whether the data is sorted or not.
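
To illustrate the distinction (a hypothetical sketch, since the proposal as 
written restricts group keys to columns):

{code}
A = load 'input' using SomeLoader(...) as (k:chararray, v);
B = group A by CONCAT(k, '') using "mapside";  -- injective in k, so the property is preserved
C = group A by SIZE(k) using "mapside";        -- distinct keys can collide, so the property
                                               -- is broken and Pig cannot detect it
{code}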

> PERFORMANCE: Implement a map-side group operator to speed up processing of 
> ordered data 
> 
>
> Key: PIG-984
> URL: https://issues.apache.org/jira/browse/PIG-984
> Project: Pig
>  Issue Type: New Feature
>Reporter: Richard Ding
>
> The general group by operation in Pig needs both mappers and reducers (the 
> aggregation is done in reducers). This incurs disk writes/reads  between 
> mappers and reducers.
> However, in the cases where the input data has the following properties
>1. The records with the same key are grouped together (such as the data is 
> sorted by the keys).
>2. The records with the same key are in the same mapper input.
> the group by operation can be performed in the mappers only and thus remove 
> the overhead of disk writes/reads.
> Alan proposed adding a hint to the group by clause like this one:
> {code}
> A = load 'input' using SomeLoader(...);
> B = group A by $0 using "mapside";
> C = foreach B generate ...
> {code}
> The proposed addition of using "mapside" to group will be a mapside group 
> operator that collects all records for a given key into a buffer. When it 
> sees a key change it will emit the key and bag for records it had buffered. 
> It will assume that all keys for a given record are collected together and 
> thus there is not need to buffer across keys. 
> It is expected that "SomeLoader" will be implemented by data systems such as 
> Zebra to ensure the data emitted by the loader satisfies the above properties 
> (1) and (2).
> It will be the responsibility of the user (or the loader) to guarantee these 
> properties (1) & (2) before invoking the mapside hint for the group by 
> clause. The Pig runtime can't check for the errors in the input data.
> For the group by clauses with mapside hint, Pig Latin will only support group 
> by columns (including *), not group by expressions nor group all. 
>   

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Revisit Pig Philosophy?

2009-09-20 Thread Santhosh Srinivasan
Hey Milind,

Varaha is a boar and not a pig :) I agree with you on the point that Pig
and Pig Latin have not been clearly defined and most times they are used
interchangeably.

Santhosh 

-Original Message-
From: Milind A Bhandarkar [mailto:mili...@yahoo-inc.com] 
Sent: Friday, September 18, 2009 8:02 PM
To: pig-dev@hadoop.apache.org
Cc: pig-dev@hadoop.apache.org
Subject: Re: Revisit Pig Philosophy?

It's Friday evening, so I have some time to discuss philosophy ;-)

Before we discuss any question about revisiting pig philosophy, the
first question that needs to be answered is "what is pig" ? (this
corresponds to the Hindu philosophy's basic argument, that any deep
personal philosophical investigations need to start with a question
"koham?" (in Sanskrit, it means 'who am I?'))

So, coming back to approx 4000 years after the origin of that
philosophy, we need to ask "what is pig?" (incidentally, pig, or varaaha
in Sanskrit, was the second incarnation of lord Vishnu in hindu
scriptures, but that's not relevant here.)

What we need to decide is, is pig is a dataflow language ? I think not.
"Pig Latin" is the language. Pig is referred to in countless slide decks
( aka pig scriptures, btw I own 50% of these scriptures) as a runtime
system that interprets pig Latin, kind of like java and jvm. (Duality of
nature, called "dwaita" philosophy in sanskrit is applicable here. But I
won't go deeper than that.)

So, pig-Latin-the-language's stance  could still be that it could be
implemented on any runtime. But pig the runtime's philosophy could be
that it is a thin layer on top of hadoop. And all the world could
breathe a sigh of relief. (mostly, by not having to answer these
philosophical questions.)

So, 'koham' is the 4000 year old question this project needs to answer.
That's all.

AUM.. (it's Friday.)

- (swami) Milind ;-)

On Sep 18, 2009, at 19:05, "Jeff Hammerbacher" 
wrote:

> Hey,
>
>> 2. Local mode and other parallel frameworks
>>
>> 
>> Pigs Live Anywhere
>>
>> Pig is intended to be a language for parallel data processing. It is 
>> not tied to one particular parallel framework. It has been 
>> implemented first on hadoop, but we do not intend that to be only on 
>> hadoop.
>> 
>>
>> Are we still holding onto this? What about local mode? Local mode is 
>> not being treated on equal footing with that of Hadoop for practical 
>> reasons. However, users expect things that work on local mode to work

>> without any hitches on Hadoop.
>>
>> Are we still designing the system assuming that Pig will be stacked 
>> on top of other parallel frameworks?
>>
>
> FWIW, I appreciate this philosophical stance from Pig. Allowing 
> locally tested scripts to be migrated to the cluster without breakage 
> is a noble goal, and keeping the option of (one day) developing an 
> alternative execution environment for Pig that runs over HDFS but uses

> a richer physical set of operators than MapReduce would be great.
>
> Of course, those of you who are running Pig in production will have a 
> much better sense of the feasibility, rather than desirability, of 
> this philosophical stance.
>
> Later,
> Jeff


Revisit Pig Philosophy?

2009-09-18 Thread Santhosh Srinivasan
Pig Developers,

I looked at the Pig philosophy page as it serves as a guideline for
accepting changes to Pig. Is it time to revisit the overall philosophy? 

Reference: http://hadoop.apache.org/pig/philosophy.html

Some items of interest:

1. SQL semantics and Pig

With the recent addition of SQL on top of Pig, we are making changes to
accommodate SQL semantics. Should this be part of Pig's philosophy?

2. Local mode and other parallel frameworks


Pigs Live Anywhere

Pig is intended to be a language for parallel data processing. It is not
tied to one particular parallel framework. It has been implemented first
on hadoop, but we do not intend that to be only on hadoop.


Are we still holding onto this? What about local mode? Local mode is not
being treated on equal footing with that of Hadoop for practical
reasons. However, users expect things that work on local mode to work
without any hitches on Hadoop.

Are we still designing the system assuming that Pig will be stacked on
top of other parallel frameworks?

Thanks,
Santhosh


[jira] Commented: (PIG-955) Skewed join generates incorrect results

2009-09-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12754349#action_12754349
 ] 

Santhosh Srinivasan commented on PIG-955:
-

Hi Ying,

How are Fragment Replicate Join and Skewed Join related as you mention in your 
bug description? Also, skewed join has been part of trunk for more than a month 
now. Your bug description states that Pig needs skewed join.
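
For reference, my understanding of the syntax already in trunk is roughly the 
following (a sketch, not necessarily the exact committed form):

{code}
A = load 'big' as (a0, a1);
B = load 'skewed' as (b0, b1);
C = join A by a0, B by b0 using "skewed";
{code}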

Thanks,
Santhosh

> Skewed join generates  incorrect results 
> -
>
> Key: PIG-955
> URL: https://issues.apache.org/jira/browse/PIG-955
> Project: Pig
>  Issue Type: Improvement
>Reporter: Ying He
> Attachments: PIG-955.patch
>
>
> Fragmented replicated join has a few limitations:
>  - One of the tables needs to be loaded into memory
>  - Join is limited to two tables
> Skewed join partitions the table and joins the records in the reduce phase. 
> It computes a histogram of the key space to account for skewing in the input 
> records. Further, it adjusts the number of reducers depending on the key 
> distribution.
> We need to implement the skewed join in pig.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Request for feedback: cost-based optimizer

2009-09-01 Thread Santhosh Srinivasan
Dmitriy and Gang,

The mailing list does not allow attachments. Can you post it on a
website and just send the URL?

Thanks,
Santhosh 

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com] 
Sent: Tuesday, September 01, 2009 9:48 AM
To: pig-dev@hadoop.apache.org
Subject: Request for feedback: cost-based optimizer

Hi everyone,
Attached is a (very) preliminary document outlining a rough design we
are proposing for a cost-based optimizer for Pig.
This is being done as a capstone project by three CMU Master's students
(myself, Ashutosh Chauhan, and Tejal Desai). As such, it is not
necessarily meant for immediate incorporation into the Pig codebase,
although it would be nice if it, or parts of it, are found to be useful
in the mainline.

We would love to get some feedback from the developer community
regarding the ideas expressed in the document, any concerns about the
design, suggestions for improvement, etc.

Thanks,
Dmitriy, Ashutosh, Tejal


[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-08-25 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747560#action_12747560
 ] 

Santhosh Srinivasan commented on PIG-922:
-

For relational operators that require multiple inputs, the list will correspond 
to each of their inputs. If you look at getRequiredFields, the list is populated 
on a per-input basis. In the case of getRequiredInputs, I see that the use of 
the list is not consistent for LOJoin, LOUnion, LOCogroup and LOCross.

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
> PIG-922-p1_2.patch
>
>
> This is a continuation work of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: Push up project, ie, prune columns as 
> early as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: switching to different parser in Pig

2009-08-25 Thread Santhosh Srinivasan
It's been 6 months since this topic was discussed but we don't have
closure on it.

For SQL on top of Pig, we are using Jflex and CUP
(https://issues.apache.org/jira/browse/PIG-824). If we have decided on
the right parser, can we have a plan to move the other parsers in Pig to
the same technology?

Thanks,
Santhosh

PS: I am assuming we are not moving to Antlr.


-Original Message-
From: Alan Gates [mailto:ga...@yahoo-inc.com] 
Sent: Tuesday, February 24, 2009 10:17 AM
To: pig-dev@hadoop.apache.org; pi.so...@gmail.com
Subject: Re: switching to different parser in Pig

Sorry, after I sent that email yesterday I realized I was not very  
clear.  I did not mean to imply that antlr didn't have good  
documentation or good error handling.  What I wanted to say was we  
want all three of those things, and it didn't appear that antlr  
provided all three, since it doesn't separate out scanner and parser.   
Also, from my viewpoint, I prefer bottom up LALR(1) parsers like yacc  
to top down parsers like javacc.  My understanding is that antlr is  
top down like javacc.  My reasoning for this preference is that parser  
books and classes have used those for decades, so there are a large  
number of engineers out there (including me :) ) who know how to work  
with them.  But maybe antlr is close enough to what we need.  I'll  
take a deeper look at it before I vote officially on which way we  
should go.

As for loops and branches, I'm not saying we need those in Pig Latin.   
We need them somehow.  Whether it's better to put them in Pig Latin or  
imbed pig in a existing script language is an ongoing debate.  I don't  
want to make a decision now that effectively ends that debate without  
buy in from those who feel strongly that Pig Latin should include  
those constructs.

I agree with you that we should modify the logical plan to support  
this rather than add another layer.  As for active development, the  
only thing I'm aware of is we hope to start working on a more robust  
optimizer for pig soon, and that will require some additional  
functionality out of the logical operators, but it shouldn't cause any  
fundamental architectural changes.

Alan.


On Feb 24, 2009, at 1:27 AM, pi song wrote:

> (1) Lack of good documentation which makes it hard to and time  
> consuming
> to learn javacc and make changes to Pig grammar
> <== ANTLR is very very well documented.
> http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference
> http://media.pragprog.com/titles/tpantlr/toc.pdf
> http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home
>
> (2) No easy way to customize error handling and error messages
> <== ANTLR has very extensive error handling support
> http://media.pragprog.com/titles/tpantlr/errors.pdf
>
> (3) Single path that performs both tokenizing and parsing
> <== What is the advantage of decoupling tokenizer and parsing ?
>
> In addition, "Composite Grammar" is very useful for keeping the parser
> modular. Things that can be treated as sub-languages such as bag  
> schema
> definition can be done and unit tested separately.
>
> ANTLRWorks http://www.antlr.org/works/index.html
> also
> makes grammar development very efficient. Think about IDE that helps  
> you
> debug your code (which is grammar).
>
> One question, is there any use case for branching and loops? The  
> current Pig
> is more like a query (declarative) language. I don't really see how  
> loop
> constructs would fit. I think what Ted mentioned is more embedding  
> Pig in
> other languages and use those languages to do loops.
>
> We should think about how the logical plan layer can be made simpler  
> for
> external use so don't have to introduce a new layer. Is there any  
> major
> active development on it? Currently I have more spare time and  
> should be
> able to help out. (BTW, I'm slow because this is just my hobby. I  
> don't want
> to drag you guys)
>
> Pi Song
>
> On Tue, Feb 24, 2009 at 6:23 AM, nitesh bhatia
 >wrote:
>
>> Hi
>> I got this info from javacc mailing lists. This may prove helpful:
>>
>>
>>



>> -Original Message- From: Ken Beesley
>> [mailto:ken@xrce.xerox.com] Sent: Wednesday, August 18, 2004 2:56
>> PM To: javacc Subject: [JavaCC] Alternatives to JavaCC (was Hello  
>> All)
>>
>> Vicas wrote:
>>
>> Hello All
>>
>> Kindly let me know other parsers available which does the same job as
>> javacc.
>>
>> It would be very nice of you if you can send me some documentation
>> related to this.
>>
>> Thanks Vikas
>>
>> (Correction and clarifications to the following would be _very_
>> welcome. I'm very likely out of date.)
>>
>> Of course, no two software tools are likely to do _exactly_ the same
>> job. Someone already pointed you to ANTLR, which is probably the
>> best-known alternative 

[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-08-25 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12747521#action_12747521
 ] 

Santhosh Srinivasan commented on PIG-922:
-

I am not sure about the logic for handling cogroup. Let me take another example 
of an operator with multiple inputs - union. If you look at the code below, the 
method returns a single required fields element. The required fields element 
contains a reference to all the inputs that are required to compute that 
particular column. However, wrt cogroup you are returning a list of required 
fields that contains nulls for all the positions that are of no interest.

{code}
+ArrayList<Pair<Integer, Integer>> inputList = new ArrayList<Pair<Integer, Integer>>();
+for (int i=0;i<...;i++)
+    inputList.add(new Pair<Integer, Integer>(i, column));
+List<RequiredFields> result = new ArrayList<RequiredFields>();
+result.add(new RequiredFields(inputList));
+return result;
{code}
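
For concreteness, the union case I have in mind looks like this (a sketch; the 
aliases are illustrative):

{code}
A = load 'a' as (a0, a1);
B = load 'b' as (b0, b1);
C = union A, B;  -- C.$0 is computed from both A.a0 and B.b0
{code}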

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch, 
> PIG-922-p1_2.patch
>
>
> This is a continuation work of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: Push up project, ie, prune columns as 
> early as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-922) Logical optimizer: push up project

2009-08-21 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12746387#action_12746387
 ] 

Santhosh Srinivasan commented on PIG-922:
-

Review comments for patch p1_1

I have not reviewed the test cases. I have reviewed all the sources.

Index: src/org/apache/pig/impl/logicalLayer/RelationalOperator.java
===

In the second example, why are A.a0 and B.b0 relevant input columns for C.$0? 
I don't see this logic in LOJoin.getRelevantInputs().

{code}
+ * eg2:
+ * A = load 'a' AS (a0, a1);
+ * B = load 'b' AS (b0, b1);
+ * C = join A by a0, B by b0;
+ * 
+ * Relevant input columns for C.$0 is A.a0, B.b0. Relevant input columns 
for C.$1 is A.a1.
{code}

Index: src/org/apache/pig/impl/logicalLayer/LOForEach.java
===

I am not sure about the logic for the computation of the inner plan number that 
produces the output column in getRelevantInputs. I would recommend that you 
cache the schema generated by the inner plan (as part of getSchema()) and use 
that information here.

{code}
+// find the index of foreach inner plan for this particular output column
+LogicalOperator pOp = null;
+int planIndex = 0;
+try {
+    pOp = mSchema.getField(0).getReverseCanonicalMap().keySet().iterator().next();
+
+    for (int i=1;i<=column;i++)
+    {
+        if (mSchema.getField(i).getReverseCanonicalMap().keySet().iterator().next()!=pOp)
+        {
+            planIndex++;
+            pOp = mSchema.getField(i).getReverseCanonicalMap().keySet().iterator().next();
+        }
+    }
+} catch (FrontendException e) {
+    log.warn("Cannot retrieve field schema from "+mSchema.toString());
+    return null;
+}
{code}

Index: src/org/apache/pig/impl/logicalLayer/LOCogroup.java
===

Why are we adding null to the list of required fields while iterating over the 
inputs?

{code}
+if(inputNum == column-1) {
+result.add(new RequiredFields(true));
+} else {
+result.add(null);
+}
{code}


Index: src/org/apache/pig/impl/plan/RequiredFields.java
===

Where are the following methods used? I did not see any calls to them.

{code}
+
+// return true if this merge modify the object itself 
+public boolean merge(RequiredFields r2)
+{
+    boolean newRequiredFields = false;
+    if (r2==null)
+        return newRequiredFields;
+    if (r2.getNeedAllFields())
+    {
+        mNeedAllFields = true;
+    }
+    if (!r2.getNeedNoFields())
+    {
+        mNeedNoFields = false;
+    }
+    if (r2.getFields()==null)
+        return newRequiredFields;
+    for (Pair<Integer, Integer> f:r2.getFields())
+    {
+        if (mFields==null)
+            mFields = new ArrayList<Pair<Integer, Integer>>();
+        if (!mFields.contains(f))
+        {
+            mFields.add(f);
+            mNeedNoFields = false;
+            newRequiredFields = true;
+        }
+    }
+    return newRequiredFields;
+}
+
+public void reIndex(int i)
+{
+    for (Pair<Integer, Integer> p:mFields)
+    {
+        p.first = i;
+    }
+}
{code}

> Logical optimizer: push up project
> --
>
> Key: PIG-922
> URL: https://issues.apache.org/jira/browse/PIG-922
> Project: Pig
>  Issue Type: New Feature
>  Components: impl
>Affects Versions: 0.3.0
>Reporter: Daniel Dai
>Assignee: Daniel Dai
> Fix For: 0.4.0
>
> Attachments: PIG-922-p1_0.patch, PIG-922-p1_1.patch
>
>
> This is a continuation work of 
> [PIG-697|https://issues.apache.org/jira/browse/PIG-697]. We need to add 
> another rule to the logical optimizer: Push up project, ie, prune columns as 
> early as possible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-924) Make Pig work with multiple versions of Hadoop

2009-08-20 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12745568#action_12745568
 ] 

Santhosh Srinivasan commented on PIG-924:
-

Hadoop has promised "APIs in stone" forever and has not delivered on that 
promise yet. Higher layers in the stack have to learn how to cope with an 
ever-changing lower layer. How this change is managed is a matter of 
convenience to the owners of the higher layer. I really like the Shims 
approach, which avoids the cost of branching out Pig every time we make a 
compatible release. The cost of creating a branch for each version of Hadoop 
seems too high compared to the cost of the Shims approach.

Of course, there are pros and cons to each approach. The question here is when 
will Hadoop set its APIs in stone and how many more releases will we have 
before this happens. If the answer to the question is 12 months and 2 more 
releases, then we should go with the Shims approach. If the answer is 3-6 
months and one more release then we should stick with our current approach and 
pay the small penalty of patches supplied to work with the specific release of 
Hadoop.

Summary: Use the shims patch if APIs are not set in stone within a quarter or 
two and if there is more than one release of Hadoop.

> Make Pig work with multiple versions of Hadoop
> --
>
> Key: PIG-924
> URL: https://issues.apache.org/jira/browse/PIG-924
> Project: Pig
>  Issue Type: Bug
>Reporter: Dmitriy V. Ryaboy
> Attachments: pig_924.2.patch, pig_924.3.patch, pig_924.patch
>
>
> The current Pig build scripts package hadoop and other dependencies into the 
> pig.jar file.
> This means that if users upgrade Hadoop, they also need to upgrade Pig.
> Pig has relatively few dependencies on Hadoop interfaces that changed between 
> 18, 19, and 20.  It is possibly to write a dynamic shim that allows Pig to 
> use the correct calls for any of the above versions of Hadoop. Unfortunately, 
> the building process precludes us from the ability to do this at runtime, and 
> forces an unnecessary Pig rebuild even if dynamic shims are created.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Proposal to create a branch for contrib project Zebra

2009-08-18 Thread Santhosh Srinivasan
I would recommend that zebra wait for Pig 0.4.0 (a couple of weeks?). A
branch will be created for the 0.4.0 release and zebra will
automatically benefit.

Santhosh

-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Tuesday, August 18, 2009 9:49 AM
To: pig-dev@hadoop.apache.org
Subject: Re: Proposal to create a branch for contrib project Zebra

Milind A Bhandarkar wrote:
> 
> Since zebra.jar is not included in pig.jar (I hope not), I can still
use
> stable zebra jar (binary) with latest pig compiled in trunk.

The problem is that though the current version is "expected to be" 
stable, it would still require some bug fixes. We essentially need to 
maintain another branch (official or a private git) to provide version 
0.1 jar with critical bug fixes.

In that sense, would it be better if we created a "zebra-v1" branch and 
commit the new features to trunk? May be for regular users we can create

Pig.jar and zebra.jar from different lines.

Raghu.

> Also, build failure in zebra need not impact pig release, since the
other
> contrib, i.e. Piggybank is also "build-optional".
> 
> I think that creating a branch results in too many changes on that
branch
> before a mainline merge happens. Each of the feature additions you
mention
> would be very highly desirable even in the absence of others.
> 
> Just my 2 non-binding cents.
> 
> - milind
> 


RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
After a lot of back and forth and information sharing, it's clear in my
mind that branches are not required for contrib projects.

My vote remains -1

Thanks,
Santhosh

-Original Message-
From: Milind A Bhandarkar [mailto:mili...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 5:32 PM
To: pig-dev@hadoop.apache.org
Subject: Re: Proposal to create a branch for contrib project Zebra

IANAC, but my (non-binding) vote is also -1. I think all the
improvements
and feature addition to zebra should be available through pig trunk. The
codebase is not big enough to justify creating a branch. If the reason
is
Pig's dependence on a checked in hadoop jar, the shims proposal by
Dmitry
should be taken up asap, so that those who want to use zebra can use pig
trunk with hadoop 0.20

- milind


On 8/17/09 5:14 PM, "Yiping Han"  wrote:

> +1
> 
> 
> On 8/18/09 7:11 AM, "Olga Natkovich"  wrote:
> 
>> +1
>> 
>> -Original Message-
>> From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
>> Sent: Monday, August 17, 2009 4:06 PM
>> To: pig-dev@hadoop.apache.org
>> Subject: Proposal to create a branch for contrib project Zebra
>> 
>> 
>> Thanks to the PIG team, The first version of contrib project Zebra
>> (PIG-833) is committed to PIG trunk.
>> 
>> In short, Zebra is a table storage layer built for use in PIG and
other
>> Hadoop applications.
>> 
>> While we are stabilizing current version V1 in the trunk, we plan to
add
>> 
>> more new features to it. We would like to create an svn branch for
the
>> new features. We will be responsible for managing zebra in PIG trunk
and
>> 
>> in the new branch. We will merge the branch when it is ready. We
expect
>> the changes to affect only 'contrib/zebra' directory.
>> 
>> As a regular contributor to Hadoop, I will be the initial committer
for
>> Zebra. As more patches are contributed by other Zebra developers,
there
>> might be more commiters added through normal Hadoop/Apache procedure.
>> 
>> I would like to create a branch called 'zebra-v2' with approval from
PIG
>> 
>> team.
>> 
>> Thanks,
>> Raghu.
> 
> --
> Yiping Han
> F-3140 
> (408)349-4403
> y...@yahoo-inc.com
> 


-- 
Milind Bhandarkar
Y!IM: GridSolutions
Tel: 408-349-2136 
(mili...@yahoo-inc.com)



RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
Giridharan Kesavan's omission as a committer is an oversight on the part of
the Hadoop team. Ideally, he should be listed as a release engineer with
committer privileges.

Secondly, QA/Release/etc. are necessary evils for shipping a high quality
product, while contrib projects are not.

That leaves us with contrib committers.

Can you point to earlier email threads that cover the topic of giving
committer access to contrib projects? Specifically, what does it mean to
award someone committer privileges to a contrib project, what are the
access privileges that come with such rights, what are the dos/don'ts,
etc.

Thirdly, are there instances of contrib committers creating branches?

Thanks,
Santhosh

-Original Message-
From: Arun C Murthy [mailto:a...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 6:18 PM
To: pig-dev@hadoop.apache.org
Subject: Re: Proposal to create a branch for contrib project Zebra


On Aug 17, 2009, at 4:38 PM, Santhosh Srinivasan wrote:

> Is there any precedent for such proposals? I am not comfortable with
> extending committer access to contrib teams. I would suggest that  
> Zebra
> be made a sub-project of Hadoop and have a life of its own.
>

There has been sufficient precedence for 'contrib committers' in  
Hadoop (e.g. Chukwa vis-a-vis the former 'Hadoop Core' sub-project)  
and is normal within the Apache world for committers with specific  
'roles' e.g specific Contrib modules, QA, Release/Build etc.
(http://hadoop.apache.org/common/credits.html 
  - in fact, Giridharan Kesavan is an unlisted 'release' committer for  
Apache Hadoop)

I believe it's a desired, nay stated,  goal for Zebra to graduate as a  
Hadoop sub-project eventually, based on which it was voted-in as a  
contrib module by the Apache Pig.

Given these, I don't see  any cause for concern here.

Arun

> Santhosh
>
> -Original Message-
> From: Raghu Angadi [mailto:rang...@yahoo-inc.com]
> Sent: Monday, August 17, 2009 4:06 PM
> To: pig-dev@hadoop.apache.org
> Subject: Proposal to create a branch for contrib project Zebra
>
>
> Thanks to the PIG team, The first version of contrib project Zebra
> (PIG-833) is committed to PIG trunk.
>
> In short, Zebra is a table storage layer built for use in PIG and  
> other
> Hadoop applications.
>
> While we are stabilizing current version V1 in the trunk, we plan to  
> add
>
> more new features to it. We would like to create an svn branch for the
> new features. We will be responsible for managing zebra in PIG trunk  
> and
>
> in the new branch. We will merge the branch when it is ready. We  
> expect
> the changes to affect only 'contrib/zebra' directory.
>
> As a regular contributor to Hadoop, I will be the initial committer  
> for
> Zebra. As more patches are contributed by other Zebra developers,  
> there
> might be more commiters added through normal Hadoop/Apache procedure.
>
> I would like to create a branch called 'zebra-v2' with approval from  
> PIG
>
> team.
>
> Thanks,
> Raghu.



RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
"Efficiently" is a subjective term. When Zebra was made a contrib project,
it was very clear that they would have growing pains. If efficiency were a
top priority then Zebra should have chosen the incubation route.

There will be no oversight of or control over what goes into contrib. This
is a very bad precedent.

Santhosh

-Original Message-
From: Olga Natkovich 
Sent: Monday, August 17, 2009 5:37 PM
To: Santhosh Srinivasan; 'pig-dev@hadoop.apache.org'
Subject: RE: Proposal to create a branch for contrib project Zebra

Over time the plan is to move Zebra to a subproject. Until this is done,
they need to have an environment where they can do their work
efficiently. I am not sure what is the concern with allowing them to
have a dev branch.

Olga

-Original Message-----
From: Santhosh Srinivasan 
Sent: Monday, August 17, 2009 5:27 PM
To: Olga Natkovich; 'pig-dev@hadoop.apache.org'
Subject: RE: Proposal to create a branch for contrib project Zebra

It's good to know that Raghu Angadi is a PMC member and that he has
committer rights to all subprojects. That's beside the point.

The example of a branch for multi-query is not quite right. Multi-query
was part of the pig development efforts and not a contrib project.

Raghu is suggesting that he will be the first of many more committers.
If that's the case then Zebra is clearly better off being a subproject
under Hadoop. That way, Raghu need to ask for permission and the Pig
team need not deal with committers for a contrib project.

Tomorrow, there will be requests from other contrib projects for similar
reasons. I don't see this as a good enough reason to grant committer
rights to contrib projects.

Santhosh


-Original Message-
From: Olga Natkovich 
Sent: Monday, August 17, 2009 5:12 PM
To: pig-dev@hadoop.apache.org; Santhosh Srinivasan
Subject: RE: Proposal to create a branch for contrib project Zebra

Raghu is PMC member and as such already has committer rights to all
subprojects. So we are not breaking any new grounds here. The reasoning
is the same as for creating branches for Pig multiquery work that we did
in Pig.

Olga

-Original Message-
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:39 PM
To: Santhosh Srinivasan; pig-dev@hadoop.apache.org
Subject: RE: Proposal to create a branch for contrib project Zebra

My vote is -1 

-Original Message-
From: Santhosh Srinivasan 
Sent: Monday, August 17, 2009 4:38 PM
To: 'pig-dev@hadoop.apache.org'
Subject: RE: Proposal to create a branch for contrib project Zebra

Is there any precedent for such proposals? I am not comfortable with
extending committer access to contrib teams. I would suggest that Zebra
be made a sub-project of Hadoop and have a life of its own.

Santhosh 

-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra


Thanks to the PIG team, The first version of contrib project Zebra 
(PIG-833) is committed to PIG trunk.

In short, Zebra is a table storage layer built for use in PIG and other 
Hadoop applications.

While we are stabilizing current version V1 in the trunk, we plan to add

more new features to it. We would like to create an svn branch for the 
new features. We will be responsible for managing zebra in PIG trunk and

in the new branch. We will merge the branch when it is ready. We expect 
the changes to affect only 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for 
Zebra. As more patches are contributed by other Zebra developers, there 
might be more commiters added through normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from PIG

team.

Thanks,
Raghu.


RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
It's good to know that Raghu Angadi is a PMC member and that he has
committer rights to all subprojects. That's beside the point.

The example of a branch for multi-query is not quite right. Multi-query
was part of the pig development efforts and not a contrib project.

Raghu is suggesting that he will be the first of many more committers.
If that's the case then Zebra is clearly better off being a subproject
under Hadoop. That way, Raghu need not ask for permission and the Pig
team need not deal with committers for a contrib project.

Tomorrow, there will be requests from other contrib projects for similar
reasons. I don't see this as a good enough reason to grant committer
rights to contrib projects.

Santhosh


-Original Message-
From: Olga Natkovich 
Sent: Monday, August 17, 2009 5:12 PM
To: pig-dev@hadoop.apache.org; Santhosh Srinivasan
Subject: RE: Proposal to create a branch for contrib project Zebra

Raghu is PMC member and as such already has committer rights to all
subprojects. So we are not breaking any new grounds here. The reasoning
is the same as for creating branches for Pig multiquery work that we did
in Pig.

Olga

-Original Message-
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:39 PM
To: Santhosh Srinivasan; pig-dev@hadoop.apache.org
Subject: RE: Proposal to create a branch for contrib project Zebra

My vote is -1 

-Original Message-
From: Santhosh Srinivasan 
Sent: Monday, August 17, 2009 4:38 PM
To: 'pig-dev@hadoop.apache.org'
Subject: RE: Proposal to create a branch for contrib project Zebra

Is there any precedent for such proposals? I am not comfortable with
extending committer access to contrib teams. I would suggest that Zebra
be made a sub-project of Hadoop and have a life of its own.

Santhosh 

-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra


Thanks to the PIG team, The first version of contrib project Zebra 
(PIG-833) is committed to PIG trunk.

In short, Zebra is a table storage layer built for use in PIG and other 
Hadoop applications.

While we are stabilizing current version V1 in the trunk, we plan to add

more new features to it. We would like to create an svn branch for the 
new features. We will be responsible for managing zebra in PIG trunk and

in the new branch. We will merge the branch when it is ready. We expect 
the changes to affect only 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for 
Zebra. As more patches are contributed by other Zebra developers, there 
might be more commiters added through normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from PIG

team.

Thanks,
Raghu.


RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
Is there any precedent for such proposals? I am not comfortable with
extending committer access to contrib teams. I would suggest that Zebra
be made a sub-project of Hadoop and have a life of its own.

Santhosh 

-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra


Thanks to the PIG team, The first version of contrib project Zebra 
(PIG-833) is committed to PIG trunk.

In short, Zebra is a table storage layer built for use in PIG and other 
Hadoop applications.

While we are stabilizing current version V1 in the trunk, we plan to add

more new features to it. We would like to create an svn branch for the 
new features. We will be responsible for managing zebra in PIG trunk and

in the new branch. We will merge the branch when it is ready. We expect 
the changes to affect only 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for 
Zebra. As more patches are contributed by other Zebra developers, there 
might be more commiters added through normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from the PIG
team.

Thanks,
Raghu.


RE: Proposal to create a branch for contrib project Zebra

2009-08-17 Thread Santhosh Srinivasan
My vote is -1 

-Original Message-
From: Santhosh Srinivasan 
Sent: Monday, August 17, 2009 4:38 PM
To: 'pig-dev@hadoop.apache.org'
Subject: RE: Proposal to create a branch for contrib project Zebra

Is there any precedent for such proposals? I am not comfortable with
extending committer access to contrib teams. I would suggest that Zebra
be made a sub-project of Hadoop and have a life of its own.

Santhosh 

-Original Message-
From: Raghu Angadi [mailto:rang...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 4:06 PM
To: pig-dev@hadoop.apache.org
Subject: Proposal to create a branch for contrib project Zebra


Thanks to the PIG team, the first version of contrib project Zebra 
(PIG-833) is committed to PIG trunk.

In short, Zebra is a table storage layer built for use in PIG and other 
Hadoop applications.

While we are stabilizing the current version V1 in the trunk, we plan to add
more new features to it. We would like to create an svn branch for the
new features. We will be responsible for managing zebra in PIG trunk and
in the new branch. We will merge the branch when it is ready. We expect
the changes to affect only the 'contrib/zebra' directory.

As a regular contributor to Hadoop, I will be the initial committer for 
Zebra. As more patches are contributed by other Zebra developers, there 
might be more committers added through the normal Hadoop/Apache procedure.

I would like to create a branch called 'zebra-v2' with approval from the PIG
team.

Thanks,
Raghu.


RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
Rephrasing my question:

Till we release 0.5.0, will zebra's requirement on hadoop-0.20 prevent fixing 
any bugs/issues with Piggybank? 

Santhosh

-Original Message-
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 1:47 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

Till we release 0.5.0, will zebra's requirement on 0.20 prevent any bugs/issues 
with Piggybank?

Santhosh

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 1:43 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

Hi Santhosh,

What do you mean by "fixing piggybank"?

Olga

-Original Message-
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 1:37 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

I have a question:

Will we be able to fix piggybank sources given that Zebra needs 0.20 and the 
rest of Pig requires 0.18? 

If the answer is yes then, +1 for the release. I agree with the plan of making 
0.4.0 with Hadoop-0.18 and a later release (0.5.0) for Hadoop-0.20.1.

Thanks,
Santhosh

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 12:57 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

Hi Dmitry,

Non-committers get a non-binding vote.

Zebra needs Hadoop 20.1 because it is relying on TFile functionality that is 
not available in Hadoop 20. In general, the recommendation from the Hadoop team 
is to wait till hadoop 20.1 is released.

For the remainder of the issues, while I see that it would be nice to resolve 
them, I don't see them as blockers for Pig 0.4.0.

My plan was to release what's currently in the trunk and have follow-up patch 
releases if needed.

Olga

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@cloudera.com] 
Sent: Monday, August 17, 2009 12:04 PM
To: pig-dev@hadoop.apache.org
Subject: Re: Pig 0.4.0 release

Olga,

Do non-committers get a vote?

Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent
even if it's in contrib/

Would love to see dynamic (or at least static) shims incorporated into
the 0.4 release (see PIG-660, PIG-924)

There are a couple of bugs still outstanding that I think would need
to get fixed before a release:

https://issues.apache.org/jira/browse/PIG-859
https://issues.apache.org/jira/browse/PIG-925

 I think all of these can be solved within a week; assuming we are
talking about a release after these go into trunk, +1.

-D


On Mon, Aug 17, 2009 at 11:46 AM, Olga Natkovich wrote:
> Pig Developers,
>
>
>
> We have made several significant performance and other improvements over
> the last couple of months:
>
>
>
> (1)     Added an optimizer with several rules
>
> (2)     Introduced skew and merge joins
>
> (3)     Cleaned COUNT and AVG semantics
>
>
>
> I think it is time for another release to make this functionality
> available to users.
>
>
>
> I propose that Pig 0.4.0 is released against Hadoop 18 since most users
> are still using this version. Once Hadoop 20.1 is released, we will roll
> Pig 0.5.0 based on Hadoop 20.
>
>
>
> Please, vote on the proposal by Thursday.
>
>
>
> Olga
>
>


RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
Till we release 0.5.0, will zebra's requirement on 0.20 prevent any bugs/issues 
with Piggybank?

Santhosh

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 1:43 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

Hi Santhosh,

What do you mean by "fixing piggybank"?

Olga

-Original Message-
From: Santhosh Srinivasan [mailto:s...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 1:37 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

I have a question:

Will we be able to fix piggybank sources given that Zebra needs 0.20 and the 
rest of Pig requires 0.18? 

If the answer is yes then, +1 for the release. I agree with the plan of making 
0.4.0 with Hadoop-0.18 and a later release (0.5.0) for Hadoop-0.20.1.

Thanks,
Santhosh

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 12:57 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

Hi Dmitry,

Non-committers get a non-binding vote.

Zebra needs Hadoop 20.1 because it is relying on TFile functionality that is 
not available in Hadoop 20. In general, the recommendation from the Hadoop team 
is to wait till hadoop 20.1 is released.

For the remainder of the issues, while I see that it would be nice to resolve 
them, I don't see them as blockers for Pig 0.4.0.

My plan was to release what's currently in the trunk and have follow-up patch 
releases if needed.

Olga

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@cloudera.com] 
Sent: Monday, August 17, 2009 12:04 PM
To: pig-dev@hadoop.apache.org
Subject: Re: Pig 0.4.0 release

Olga,

Do non-committers get a vote?

Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent
even if it's in contrib/

Would love to see dynamic (or at least static) shims incorporated into
the 0.4 release (see PIG-660, PIG-924)

There are a couple of bugs still outstanding that I think would need
to get fixed before a release:

https://issues.apache.org/jira/browse/PIG-859
https://issues.apache.org/jira/browse/PIG-925

 I think all of these can be solved within a week; assuming we are
talking about a release after these go into trunk, +1.

-D


On Mon, Aug 17, 2009 at 11:46 AM, Olga Natkovich wrote:
> Pig Developers,
>
>
>
> We have made several significant performance and other improvements over
> the last couple of months:
>
>
>
> (1)     Added an optimizer with several rules
>
> (2)     Introduced skew and merge joins
>
> (3)     Cleaned COUNT and AVG semantics
>
>
>
> I think it is time for another release to make this functionality
> available to users.
>
>
>
> I propose that Pig 0.4.0 is released against Hadoop 18 since most users
> are still using this version. Once Hadoop 20.1 is released, we will roll
> Pig 0.5.0 based on Hadoop 20.
>
>
>
> Please, vote on the proposal by Thursday.
>
>
>
> Olga
>
>


RE: Pig 0.4.0 release

2009-08-17 Thread Santhosh Srinivasan
I have a question:

Will we be able to fix piggybank sources given that Zebra needs 0.20 and the 
rest of Pig requires 0.18? 

If the answer is yes then, +1 for the release. I agree with the plan of making 
0.4.0 with Hadoop-0.18 and a later release (0.5.0) for Hadoop-0.20.1.

Thanks,
Santhosh

-Original Message-
From: Olga Natkovich [mailto:ol...@yahoo-inc.com] 
Sent: Monday, August 17, 2009 12:57 PM
To: pig-dev@hadoop.apache.org
Subject: RE: Pig 0.4.0 release

Hi Dmitry,

Non-committers get a non-binding vote.

Zebra needs Hadoop 20.1 because it is relying on TFile functionality that is 
not available in Hadoop 20. In general, the recommendation from the Hadoop team 
is to wait till hadoop 20.1 is released.

For the remainder of the issues, while I see that it would be nice to resolve 
them, I don't see them as blockers for Pig 0.4.0.

My plan was to release what's currently in the trunk and have follow-up patch 
releases if needed.

Olga

-Original Message-
From: Dmitriy Ryaboy [mailto:dvrya...@cloudera.com] 
Sent: Monday, August 17, 2009 12:04 PM
To: pig-dev@hadoop.apache.org
Subject: Re: Pig 0.4.0 release

Olga,

Do non-committers get a vote?

Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent
even if it's in contrib/

Would love to see dynamic (or at least static) shims incorporated into
the 0.4 release (see PIG-660, PIG-924)

There are a couple of bugs still outstanding that I think would need
to get fixed before a release:

https://issues.apache.org/jira/browse/PIG-859
https://issues.apache.org/jira/browse/PIG-925

 I think all of these can be solved within a week; assuming we are
talking about a release after these go into trunk, +1.

-D


On Mon, Aug 17, 2009 at 11:46 AM, Olga Natkovich wrote:
> Pig Developers,
>
>
>
> We have made several significant performance and other improvements over
> the last couple of months:
>
>
>
> (1)     Added an optimizer with several rules
>
> (2)     Introduced skew and merge joins
>
> (3)     Cleaned COUNT and AVG semantics
>
>
>
> I think it is time for another release to make this functionality
> available to users.
>
>
>
> I propose that Pig 0.4.0 is released against Hadoop 18 since most users
> are still using this version. Once Hadoop 20.1 is released, we will roll
> Pig 0.5.0 based on Hadoop 20.
>
>
>
> Please, vote on the proposal by Thursday.
>
>
>
> Olga
>
>


[jira] Commented: (PIG-913) Error in Pig script when grouping on chararray column

2009-08-11 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-913?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742150#action_12742150
 ] 

Santhosh Srinivasan commented on PIG-913:
-

+1 for the fix. As Dmitriy indicates, we need new unit test cases after Hudson 
verifies the patch.

> Error in Pig script when grouping on chararray column
> -
>
> Key: PIG-913
> URL: https://issues.apache.org/jira/browse/PIG-913
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.4.0
>Reporter: Viraj Bhat
>Priority: Critical
> Fix For: 0.4.0
>
> Attachments: PIG-913.patch
>
>
> I have a very simple script which fails at parse time due to the schema I 
> specified in the loader.
> {code}
> data = LOAD '/user/viraj/studenttab10k' AS (s:chararray);
> dataSmall = limit data 100;
> bb = GROUP dataSmall by $0;
> dump bb;
> {code}
> =
> 2009-08-06 18:47:56,297 [main] INFO  org.apache.pig.Main - Logging error 
> messages to: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 09/08/06 18:47:56 INFO pig.Main: Logging error messages to: 
> /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> 2009-08-06 18:47:56,459 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to hadoop file system at: hdfs://localhost:9000
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to hadoop 
> file system at: hdfs://localhost:9000
> 2009-08-06 18:47:56,694 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine - Connecting 
> to map-reduce job tracker at: localhost:9001
> 09/08/06 18:47:56 INFO executionengine.HExecutionEngine: Connecting to 
> map-reduce job tracker at: localhost:9001
> 2009-08-06 18:47:57,008 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1002: Unable to store alias bb
> 09/08/06 18:47:57 ERROR grunt.Grunt: ERROR 1002: Unable to store alias bb
> Details at logfile: /homes/viraj/pig-svn/trunk/pig_1249609676296.log
> =
> =
> Pig Stack Trace
> ---
> ERROR 1002: Unable to store alias bb
> org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1066: Unable to 
> open iterator for alias bb
> at org.apache.pig.PigServer.openIterator(PigServer.java:481)
> at 
> org.apache.pig.tools.grunt.GruntParser.processDump(GruntParser.java:531)
> at 
> org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:190)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:165)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:141)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:89)
> at org.apache.pig.Main.main(Main.java:397)
> Caused by: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1002: 
> Unable to store alias bb
> at org.apache.pig.PigServer.store(PigServer.java:536)
> at org.apache.pig.PigServer.openIterator(PigServer.java:464)
> ... 6 more
> Caused by: java.lang.NullPointerException
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.unsetSchema(LOCogroup.java:359)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.SchemaRemover.visit(SchemaRemover.java:64)
> at 
> org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:335)
> at org.apache.pig.impl.logicalLayer.LOCogroup.visit(LOCogroup.java:46)
> at 
> org.apache.pig.impl.plan.DependencyOrderWalker.walk(DependencyOrderWalker.java:68)
> at org.apache.pig.impl.plan.PlanVisitor.visit(PlanVisitor.java:51)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalTransformer.rebuildSchemas(LogicalTransformer.java:67)
> at 
> org.apache.pig.impl.logicalLayer.optimizer.LogicalOptimizer.optimize(LogicalOptimizer.java:187)
> at org.apache.pig.PigServer.compileLp(PigServer.java:854)
> at org.apache.pig.PigServer.compileLp(PigServer.java:791)
> at org.apache.pig.PigServer.store(PigServer.java:509)
> ... 7 more
> =

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-561) Need to generate empty tuples and bags as a part of Pig Syntax

2009-08-09 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-561.
-

Resolution: Duplicate

Duplicate of PIG-773

> Need to generate empty tuples and bags as a part of Pig Syntax
> --
>
> Key: PIG-561
> URL: https://issues.apache.org/jira/browse/PIG-561
> Project: Pig
>  Issue Type: New Feature
>Affects Versions: 0.2.0
>Reporter: Viraj Bhat
>
> There is a need to sometimes generate empty tuples and bags as a part of the 
> Pig syntax rather than using UDFs.
> {code}
> a = load 'mydata.txt' using PigStorage();
> b = foreach a generate ( ) as emptytuple;
> c = foreach a generate { } as emptybag;
> dump c;
> {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
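
A note on the status quo: the UDF route this request wants to avoid is small
but boilerplate-heavy. A minimal sketch of such a workaround UDF, using Pig's
public EvalFunc/BagFactory API (the class name EmptyBag is ours, not an
existing builtin):

{code}
import java.io.IOException;

import org.apache.pig.EvalFunc;
import org.apache.pig.data.BagFactory;
import org.apache.pig.data.DataBag;
import org.apache.pig.data.Tuple;

// Workaround for the missing '{ }' literal: an EvalFunc that always
// returns a fresh empty bag, regardless of its input.
public class EmptyBag extends EvalFunc<DataBag> {
    @Override
    public DataBag exec(Tuple input) throws IOException {
        return BagFactory.getInstance().newDefaultBag();
    }
}
{code}

With '( )' and '{ }' literals in the language, this class and the
REGISTER/DEFINE ceremony around it would become unnecessary.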



[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-08-07 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

All optimizer-related patches have been committed.

> Proposed improvements to pig's optimizer
> 
>
> Key: PIG-697
> URL: https://issues.apache.org/jira/browse/PIG-697
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Alan Gates
>    Assignee: Santhosh Srinivasan
> Attachments: Optimizer_Phase5.patch, OptimizerPhase1.patch, 
> OptimizerPhase1_part2.patch, OptimizerPhase2.patch, 
> OptimizerPhase3_parrt1-1.patch, OptimizerPhase3_parrt1.patch, 
> OptimizerPhase3_part2_3.patch, OptimizerPhase4_part1-1.patch, 
> OptimizerPhase4_part2.patch
>
>
> I propose the following changes to pig optimizer, plan, and operator 
> functionality to support more robust optimization:
> 1) Remove the required array from Rule.  This will change rules so that they 
> only match exact patterns instead of allowing missing elements in the pattern.
> This has the downside that if a given rule applies to two patterns (say 
> Load->Filter->Group, Load->Group) you have to write two rules.  But it has 
> the upside that
> the resulting rules know exactly what they are getting.  The original intent 
> of this was to reduce the number of rules that needed to be written.  But the
> resulting rules have to do a lot of work to understand the operators they are 
> working with.  With exact matches only, each rule will know exactly the 
> operators it
> is working on and can apply the logic of shifting the operators around.  All 
> four of the existing rules set all entries of required to true, so removing 
> this
> will have no effect on them.
> 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
> conversions or a certain number of iterations has been reached.  Currently the
> function is:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> for (Rule rule : mRules) {
> if (matcher.match(rule)) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches)
> {
>   if (rule.transformer.check(match)) {
>   // The transformer approves.
>   rule.transformer.transform(match);
>   }
> }
> }
> }
> }
> {code}
> It would change to be:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> boolean sawMatch;
> int numIterations = 0;
> do {
> sawMatch = false;
> for (Rule rule : mRules) {
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> if (rule.transformer.check(match)) {
> // The transformer approves.
> sawMatch = true;
> rule.transformer.transform(match);
> }
> }
> }
> // Not sure if 1000 is the right number of iterations, maybe it
> // should be configurable so that large scripts don't stop too 
> // early.
> } while (sawMatch && numIterations++ < 1000);
> }
> {code}
> The reason for limiting the number of iterations is to avoid infinite loops.  
> The reason for iterating over the rules is so that each rule can be applied 
> multiple
> times as necessary.  This allows us to write simple rules, mostly swaps 
> between neighboring operators, without worrying that we get the plan right in 
> one pass.
> For example, we might have a plan that looks like:  
> Load->Join->Filter->Foreach, and we want to optimize it to 
> Load->Foreach->Filter->Join.  With two simple
> rules (swap filter and join and swap foreach and filter), applied 
> iteratively, we can get from the initial to final plan, without needing to 
> understand the
> big picture of the entire plan.
> 3) Add three calls to OperatorPlan:

[jira] Commented: (PIG-912) Rename/Add 'string' as a type in place of chararray - and deprecate (and probably eventually remove) the use of 'chararray'

2009-08-06 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12740357#action_12740357
 ] 

Santhosh Srinivasan commented on PIG-912:
-

+1

> Rename/Add 'string' as a type in place of chararray - and deprecate (and 
> probably eventually remove) the use of 'chararray'
> ---
>
> Key: PIG-912
> URL: https://issues.apache.org/jira/browse/PIG-912
> Project: Pig
>  Issue Type: Bug
>Reporter: Mridul Muralidharan
>
> The type 'chararray' in pig does not refer to an array of characters (char 
> []) but rather to java.lang.String.
> This is inconsistent and confusing naming; and additionally, it will be an 
> interoperability issue with other systems which support schemas (zebra among 
> others).
> It would be good to have a consistent naming across projects, while also 
> having appropriate names for the various types.
> Since use of 'chararray' is already widely deployed, it would be good to:
> a) Add a type 'string' (or equivalent) which is an alias for 'chararray'.
> Additionally, it is possible to envision these too (if deemed necessary - not 
> a main requirement):
> b) Modify documentation and example scripts to use this new type.
> c) Emit warnings about chararray being deprecated.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
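
Item a) is cheap to implement: both spellings can resolve to the same internal
type tag in the parser's type-name lookup. A self-contained sketch of the idea
(not the actual QueryParser code; the constant mirrors the value of
DataType.CHARARRAY):

{code}
import java.util.HashMap;
import java.util.Map;

// Sketch: 'string' as a pure alias for 'chararray'. Both names map to the
// same type tag, so nothing downstream of the parser has to change.
public class TypeNameTable {
    private static final byte CHARARRAY = 55; // same value as DataType.CHARARRAY

    private static final Map<String, Byte> NAMES = new HashMap<String, Byte>();
    static {
        NAMES.put("chararray", CHARARRAY); // legacy spelling, kept working
        NAMES.put("string", CHARARRAY);    // proposed alias from this issue
    }

    public static byte resolve(String name) {
        Byte tag = NAMES.get(name.toLowerCase());
        if (tag == null) {
            throw new IllegalArgumentException("unknown type: " + name);
        }
        // item c) would emit a deprecation warning here when name is "chararray"
        return tag;
    }
}
{code}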



[jira] Commented: (PIG-908) Need a way to correlate MR jobs with Pig statements

2009-08-04 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12739147#action_12739147
 ] 

Santhosh Srinivasan commented on PIG-908:
-

+1

This approach has been discussed but not documented.

> Need a way to correlate MR jobs with Pig statements
> ---
>
> Key: PIG-908
> URL: https://issues.apache.org/jira/browse/PIG-908
> Project: Pig
>  Issue Type: Wish
>Reporter: Dmitriy V. Ryaboy
>
> Complex Pig Scripts often generate many Map-Reduce jobs, especially with the 
> recent introduction of multi-store capabilities.
> For example, the first script in the Pig tutorial produces 5 MR jobs.
> There is currently very little support for debugging resulting jobs; if one 
> of the MR jobs fails, it is hard to figure out which part of the script it 
> was responsible for. Explain plans help, but even with the explain plan, a 
> fair amount of effort (and sometimes, experimentation) is required to 
> correlate the failing MR job with the corresponding PigLatin statements.
> This ticket is created to discuss approaches to alleviating this problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
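
One low-tech approach is to stamp every generated job with the aliases it
computes, so the JobTracker UI itself shows the correlation. A sketch under
that assumption (the pig.aliases property name is hypothetical, not an
existing Pig key):

{code}
import org.apache.hadoop.mapred.JobConf;

// Sketch: when the compiler emits one Hadoop job for a slice of the plan,
// record which Pig Latin aliases that job computes.
public class JobAliasStamper {
    public static void stamp(JobConf conf, String scriptName, String aliases) {
        conf.setJobName(scriptName + " [" + aliases + "]"); // visible in the JT UI
        conf.set("pig.aliases", aliases); // hypothetical key, for log scrapers
    }
}
{code}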



[jira] Resolved: (PIG-724) Treating map values in PigStorage

2009-07-31 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-724.
-

  Resolution: Fixed
Assignee: Santhosh Srinivasan
Hadoop Flags: [Incompatible change]

Issue fixed as part of PIG-880

> Treating map values in PigStorage
> -
>
> Key: PIG-724
> URL: https://issues.apache.org/jira/browse/PIG-724
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.1
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: 0.2.1
>
>
> Currently, PigStorage treats the materialized string 123 as an integer 
> with the value 123. If the user intended this to be the string 123, 
> PigStorage cannot deal with it. This reasoning also applies to doubles. Due 
> to this issue, for maps that contain values which are of the same type but 
> manifest the issue discussed at the beginning of the paragraph, Pig throws 
> its hands up at runtime.  An example to illustrate the problem will help.
> In the example below a sample row in the data (map.txt) contains the 
> following:
> [key01#35,key02#value01]
> When Pig tries to convert the stream to a map, it creates a Map<String, 
> Object> where the key is a string and the value is an integer. Running the 
> script shown below results in a run-time error.
> {code}
> grunt> a = load 'map.txt' as (themap: map[]);
> grunt> b = filter a by (chararray)(themap#'key01') == 'hello';
>   
> grunt> dump b;
> 2009-03-18 15:19:03,773 [main] INFO  
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - 0% complete
> 2009-03-18 15:19:28,797 [main] ERROR 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher
>  - Map reduce job failed
> 2009-03-18 15:19:28,817 [main] ERROR org.apache.pig.tools.grunt.Grunt - ERROR 
> 1081: Cannot cast to chararray. Expected bytearray but received: int
> {code} 
> There are two ways to resolve this issue:
> 1. Change the conversion routine for bytesToMap to return a map where the 
> value is a bytearray and not the actual type. This change breaks backward 
> compatibility
> 2. Introduce checks in POCast where conversions that are legal in the type 
> checking world are allowed, i.e., run time checks will be made to check for 
> compatible casts. In the above example, an int can be converted to a 
> chararray and the cast will be made. If, on the other hand, it was a chararray 
> to int conversion, then an exception will be thrown.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
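
Option 2 amounts to consulting a small compatibility table at cast time
instead of failing on the declared source type. A self-contained sketch of
that check (names are ours, not the actual POCast code):

{code}
// Sketch of option 2: at runtime, allow casts the type checker would have
// allowed, e.g. int -> chararray, and fail with a clear error otherwise.
public class RuntimeCast {
    public static String toCharArray(Object value) {
        if (value == null || value instanceof String) {
            return (String) value;
        }
        if (value instanceof Integer || value instanceof Long
                || value instanceof Float || value instanceof Double) {
            return value.toString(); // numeric -> chararray is a legal widening
        }
        throw new ClassCastException("Cannot cast "
                + value.getClass().getName() + " to chararray");
    }

    public static void main(String[] args) {
        System.out.println(toCharArray(Integer.valueOf(123))); // prints 123
    }
}
{code}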



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-31 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Tags: MapValues Bytearray
  Resolution: Fixed
Hadoop Flags: [Incompatible change, Reviewed]
  Status: Resolved  (was: Patch Available)

Patch has been committed. This fix breaks backward compatibility when 
PigStorage reads maps. The type of the map values will now be bytearray instead 
of the actual type.

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880_1.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
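
Concretely, the incompatibility means a script or UDF that used to see typed
map values now sees bytearray and must convert explicitly. A standalone
illustration (the hand-built map stands in for what PigStorage now produces):

{code}
import java.util.HashMap;
import java.util.Map;

import org.apache.pig.data.DataByteArray;

// After this fix, PigStorage materializes map values as bytearray: the
// value for 'age' arrives as DataByteArray("35"), not Integer 35.
public class MapValueAfterFix {
    public static void main(String[] args) {
        Map<String, Object> smap = new HashMap<String, Object>();
        smap.put("age", new DataByteArray("35")); // was: Integer.valueOf(35)

        DataByteArray raw = (DataByteArray) smap.get("age");
        int age = Integer.parseInt(raw.toString()); // explicit conversion needed
        System.out.println(age);
    }
}
{code}

In Pig Latin the same change surfaces as an explicit cast, e.g.
(int)smap#'age'.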



[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-07-31 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: Patch Available  (was: In Progress)

> Proposed improvements to pig's optimizer
> 
>
> Key: PIG-697
> URL: https://issues.apache.org/jira/browse/PIG-697
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Alan Gates
>    Assignee: Santhosh Srinivasan
> Attachments: Optimizer_Phase5.patch, OptimizerPhase1.patch, 
> OptimizerPhase1_part2.patch, OptimizerPhase2.patch, 
> OptimizerPhase3_parrt1-1.patch, OptimizerPhase3_parrt1.patch, 
> OptimizerPhase3_part2_3.patch, OptimizerPhase4_part1-1.patch, 
> OptimizerPhase4_part2.patch
>
>
> I propose the following changes to pig optimizer, plan, and operator 
> functionality to support more robust optimization:
> 1) Remove the required array from Rule.  This will change rules so that they 
> only match exact patterns instead of allowing missing elements in the pattern.
> This has the downside that if a given rule applies to two patterns (say 
> Load->Filter->Group, Load->Group) you have to write two rules.  But it has 
> the upside that
> the resulting rules know exactly what they are getting.  The original intent 
> of this was to reduce the number of rules that needed to be written.  But the
> resulting rules have to do a lot of work to understand the operators they are 
> working with.  With exact matches only, each rule will know exactly the 
> operators it
> is working on and can apply the logic of shifting the operators around.  All 
> four of the existing rules set all entries of required to true, so removing 
> this
> will have no effect on them.
> 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
> conversions or a certain number of iterations has been reached.  Currently the
> function is:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> for (Rule rule : mRules) {
> if (matcher.match(rule)) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches)
> {
>   if (rule.transformer.check(match)) {
>   // The transformer approves.
>   rule.transformer.transform(match);
>   }
> }
> }
> }
> }
> {code}
> It would change to be:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> boolean sawMatch;
> int numIterations = 0;
> do {
> sawMatch = false;
> for (Rule rule : mRules) {
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> if (rule.transformer.check(match)) {
> // The transformer approves.
> sawMatch = true;
> rule.transformer.transform(match);
> }
> }
> }
> // Not sure if 1000 is the right number of iterations, maybe it
> // should be configurable so that large scripts don't stop too 
> // early.
> } while (sawMatch && numIterations++ < 1000);
> }
> {code}
> The reason for limiting the number of iterations is to avoid infinite loops.  
> The reason for iterating over the rules is so that each rule can be applied 
> multiple
> times as necessary.  This allows us to write simple rules, mostly swaps 
> between neighboring operators, without worrying that we get the plan right in 
> one pass.
> For example, we might have a plan that looks like:  
> Load->Join->Filter->Foreach, and we want to optimize it to 
> Load->Foreach->Filter->Join.  With two simple
> rules (swap filter and join and swap foreach and filter), applied 
> iteratively, we can get from the initial to final plan, without needing to 
> understand the
> big picture of the entire plan.
> 3) Add three calls to OperatorPlan:
> {code}
> /**
>  * Swap two operators in a plan.  Both of the operators must have single
>  

[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-07-31 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Attachment: Optimizer_Phase5.patch

The attached patch removes references to LOFRJoin and replaces them with 
LOJoin. All the optimization rules and test cases now use LOJoin.

> Proposed improvements to pig's optimizer
> 
>
> Key: PIG-697
> URL: https://issues.apache.org/jira/browse/PIG-697
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Alan Gates
>    Assignee: Santhosh Srinivasan
> Attachments: Optimizer_Phase5.patch, OptimizerPhase1.patch, 
> OptimizerPhase1_part2.patch, OptimizerPhase2.patch, 
> OptimizerPhase3_parrt1-1.patch, OptimizerPhase3_parrt1.patch, 
> OptimizerPhase3_part2_3.patch, OptimizerPhase4_part1-1.patch, 
> OptimizerPhase4_part2.patch
>
>
> I propose the following changes to pig optimizer, plan, and operator 
> functionality to support more robust optimization:
> 1) Remove the required array from Rule.  This will change rules so that they 
> only match exact patterns instead of allowing missing elements in the pattern.
> This has the downside that if a given rule applies to two patterns (say 
> Load->Filter->Group, Load->Group) you have to write two rules.  But it has 
> the upside that
> the resulting rules know exactly what they are getting.  The original intent 
> of this was to reduce the number of rules that needed to be written.  But the
> resulting rules have to do a lot of work to understand the operators they are 
> working with.  With exact matches only, each rule will know exactly the 
> operators it
> is working on and can apply the logic of shifting the operators around.  All 
> four of the existing rules set all entries of required to true, so removing 
> this
> will have no effect on them.
> 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
> conversions or a certain number of iterations has been reached.  Currently the
> function is:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> for (Rule rule : mRules) {
> if (matcher.match(rule)) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches)
> {
>   if (rule.transformer.check(match)) {
>   // The transformer approves.
>   rule.transformer.transform(match);
>   }
> }
> }
> }
> }
> {code}
> It would change to be:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> boolean sawMatch;
> int numIterations = 0;
> do {
> sawMatch = false;
> for (Rule rule : mRules) {
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> if (rule.transformer.check(match)) {
> // The transformer approves.
> sawMatch = true;
> rule.transformer.transform(match);
> }
> }
> }
> // Not sure if 1000 is the right number of iterations, maybe it
> // should be configurable so that large scripts don't stop too 
> // early.
> } while (sawMatch && numIterations++ < 1000);
> }
> {code}
> The reason for limiting the number of iterations is to avoid infinite loops.  
> The reason for iterating over the rules is so that each rule can be applied 
> multiple
> times as necessary.  This allows us to write simple rules, mostly swaps 
> between neighboring operators, without worrying that we get the plan right in 
> one pass.
> For example, we might have a plan that looks like:  
> Load->Join->Filter->Foreach, and we want to optimize it to 
> Load->Foreach->Filter->Join.  With two simple
> rules (swap filter and join and swap foreach and filter), applied 
> iteratively, we can get from the initial to final plan, without needing to 
> understand the
> big picture of the entire plan.
> 3) Add three calls to OperatorPlan:

[jira] Updated: (PIG-697) Proposed improvements to pig's optimizer

2009-07-31 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-697?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-697:


Status: In Progress  (was: Patch Available)

> Proposed improvements to pig's optimizer
> 
>
> Key: PIG-697
> URL: https://issues.apache.org/jira/browse/PIG-697
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Alan Gates
>    Assignee: Santhosh Srinivasan
> Attachments: OptimizerPhase1.patch, OptimizerPhase1_part2.patch, 
> OptimizerPhase2.patch, OptimizerPhase3_parrt1-1.patch, 
> OptimizerPhase3_parrt1.patch, OptimizerPhase3_part2_3.patch, 
> OptimizerPhase4_part1-1.patch, OptimizerPhase4_part2.patch
>
>
> I propose the following changes to pig optimizer, plan, and operator 
> functionality to support more robust optimization:
> 1) Remove the required array from Rule.  This will change rules so that they 
> only match exact patterns instead of allowing missing elements in the pattern.
> This has the downside that if a given rule applies to two patterns (say 
> Load->Filter->Group, Load->Group) you have to write two rules.  But it has 
> the upside that
> the resulting rules know exactly what they are getting.  The original intent 
> of this was to reduce the number of rules that needed to be written.  But the
> resulting rules have to do a lot of work to understand the operators they are 
> working with.  With exact matches only, each rule will know exactly the 
> operators it
> is working on and can apply the logic of shifting the operators around.  All 
> four of the existing rules set all entries of required to true, so removing 
> this
> will have no effect on them.
> 2) Change PlanOptimizer.optimize to iterate over the rules until there are no 
> conversions or a certain number of iterations has been reached.  Currently the
> function is:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> for (Rule rule : mRules) {
> if (matcher.match(rule)) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches)
> {
>   if (rule.transformer.check(match)) {
>   // The transformer approves.
>   rule.transformer.transform(match);
>   }
> }
> }
> }
> }
> {code}
> It would change to be:
> {code}
> public final void optimize() throws OptimizerException {
> RuleMatcher matcher = new RuleMatcher();
> boolean sawMatch;
> int numIterations = 0;
> do {
> sawMatch = false;
> for (Rule rule : mRules) {
> List<List<O>> matches = matcher.getAllMatches();
> for (List<O> match : matches) {
> // It matches the pattern.  Now check if the transformer
> // approves as well.
> if (rule.transformer.check(match)) {
> // The transformer approves.
> sawMatch = true;
> rule.transformer.transform(match);
> }
> }
> }
> // Not sure if 1000 is the right number of iterations, maybe it
> // should be configurable so that large scripts don't stop too 
> // early.
> } while (sawMatch && numIterations++ < 1000);
> }
> {code}
> The reason for limiting the number of iterations is to avoid infinite loops.  
> The reason for iterating over the rules is so that each rule can be applied 
> multiple
> times as necessary.  This allows us to write simple rules, mostly swaps 
> between neighboring operators, without worrying that we get the plan right in 
> one pass.
> For example, we might have a plan that looks like:  
> Load->Join->Filter->Foreach, and we want to optimize it to 
> Load->Foreach->Filter->Join.  With two simple
> rules (swap filter and join and swap foreach and filter), applied 
> iteratively, we can get from the initial to final plan, without needing to 
> understand the
> big picture of the entire plan.
> 3) Add three calls to OperatorPlan:
> {code}
> /**
>  * Swap two operators in a plan.  Both of the operators must have single
>  * inputs and single outputs.
>  *
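
The swap primitive that the truncated javadoc above begins to describe is the
workhorse of these simple rules. A toy sketch over a linear plan (list-backed;
the real OperatorPlan is a DAG that rewires edges and uses its own error
types):

{code}
import java.util.Arrays;
import java.util.List;

// Toy OperatorPlan.swap: exchange two adjacent operators, legal only when
// both have a single input and a single output (trivially true in a list).
public class SwapSketch {
    static <O> void swap(List<O> plan, O a, O b) {
        int i = plan.indexOf(a);
        int j = plan.indexOf(b);
        if (i < 0 || j < 0 || Math.abs(i - j) != 1) {
            throw new IllegalArgumentException("operators must be adjacent");
        }
        plan.set(i, b); // in the real DAG this is where edges get rewired
        plan.set(j, a);
    }

    public static void main(String[] args) {
        List<String> plan = Arrays.asList("Load", "Join", "Filter", "Foreach");
        swap(plan, "Join", "Filter"); // one rule firing: Load, Filter, Join, Foreach
        System.out.println(plan);
    }
}
{code}

Iterating such local exchanges under the driver loop above is what walks a
plan from Load->Join->Filter->Foreach toward Load->Foreach->Filter->Join
without any rule seeing the whole picture.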

[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-30 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Status: In Progress  (was: Patch Available)

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880_1.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-30 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Status: Patch Available  (was: In Progress)

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880_1.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-30 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Attachment: PIG-880_1.patch

Attaching a new patch that fixes a couple of unit tests.

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880_1.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-30 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Attachment: (was: PIG-880.patch)

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880_1.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-898) TextDataParser does not handle delimiters from one complex type in another

2009-07-30 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737319#action_12737319
 ] 

Santhosh Srinivasan commented on PIG-898:
-

In addition, empty bags, tuples, constants, and nulls are not handled.

> TextDataParser does not handle delimiters from one complex type in another
> --
>
> Key: PIG-898
> URL: https://issues.apache.org/jira/browse/PIG-898
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.4.0
>    Reporter: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.4.0
>
>
> Currently, TextDataParser does not handle delimiters of one complex type in 
> another. An example of such a case is that key1(#value1} will not be parsed 
> correctly. The production for strings matches any sequence of characters that 
> does not contain any delimiters for the complex types.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-30 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Status: Patch Available  (was: In Progress)

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Work started: (PIG-880) Order by is broken with complex fields

2009-07-30 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-880 started by Santhosh Srinivasan.

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-30 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Status: Open  (was: Patch Available)

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-898) TextDataParser does not handle delimiters from one complex type in another

2009-07-29 Thread Santhosh Srinivasan (JIRA)
TextDataParser does not handle delimiters from one complex type in another
--

 Key: PIG-898
 URL: https://issues.apache.org/jira/browse/PIG-898
 Project: Pig
  Issue Type: Bug
  Components: impl
Affects Versions: 0.4.0
Reporter: Santhosh Srinivasan
Priority: Minor
 Fix For: 0.4.0


Currently, TextDataParser does not handle delimiters of one complex type 
appearing inside another. An example of such a case is key1(#value1}, which 
will not be parsed correctly. The production for strings matches any sequence 
of characters that does not contain any delimiters for the complex types.
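
A minimal sketch of this failure mode, assuming a parser that treats every 
delimiter character as a field boundary (illustrative only, not the actual 
TextDataParser code):

{code}
// Naive delimiter handling: splitting a tuple body on every comma
// breaks when a nested tuple carries its own commas.
public class DelimiterDemo {
    public static void main(String[] args) {
        String tupleBody = "a,(b,c),d";           // intended: three fields
        String[] fields = tupleBody.split(",");   // "a", "(b", "c)", "d"
        System.out.println(fields.length);        // prints 4, not 3
    }
}
{code}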

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Status: Patch Available  (was: Open)

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-880:


Attachment: PIG-880.patch

The attached patch creates maps with the value type set to DataByteArray (i.e., 
bytearray) for text data parsed by PigStorage. This change is consistent with 
the language semantics of treating map value types as bytearray. New test cases 
have been added.
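
For illustration, the effect of the change on parsed map data looks like this 
(a sketch, not code from the patch):

{code}
import java.util.HashMap;
import java.util.Map;
import org.apache.pig.data.DataByteArray;

// Sketch: after the patch, a map value parsed from text lands as
// DataByteArray (bytearray) instead of a guessed concrete type.
public class MapValueDemo {
    public static void main(String[] args) {
        Map<Object, Object> m = new HashMap<Object, Object>();
        m.put("age", new DataByteArray("42"));  // not new Integer(42)
        System.out.println(m.get("age"));       // prints 42, typed as bytearray
    }
}
{code}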

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch, 
> PIG-880.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (PIG-880) Order by is broken with complex fields

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan reassigned PIG-880:
---

Assignee: Santhosh Srinivasan

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (PIG-897) Pig should support counters

2009-07-29 Thread Santhosh Srinivasan (JIRA)
Pig should support counters
---

 Key: PIG-897
 URL: https://issues.apache.org/jira/browse/PIG-897
 Project: Pig
  Issue Type: New Feature
  Components: impl
Affects Versions: 0.4.0
Reporter: Santhosh Srinivasan
 Fix For: 0.4.0


Pig should support the use of counters. The counters could be exposed via the 
script or via Java APIs.
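
For context, this is the Hadoop counter mechanism Pig would need to surface 
(an illustrative sketch; the mapper class and counter enum below are made up):

{code}
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Sketch: a raw map/reduce task increments a counter through the
// Reporter handed to map(); Pig would need to expose something similar.
public class CounterDemoMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, LongWritable> {

    public enum RecordCounter { EMPTY_LINES }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, LongWritable> out, Reporter reporter)
            throws IOException {
        if (value.getLength() == 0) {
            reporter.incrCounter(RecordCounter.EMPTY_LINES, 1);
        }
        out.collect(new Text(value), new LongWritable(1));
    }
}
{code}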

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-29 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan resolved PIG-889.
-

  Resolution: Won't Fix
Release Note: As per the discussion with Jeff, closing the bug as won't fix

> Pig can not access reporter of PigHadoopLog in Load Func
> 
>
> Key: PIG-889
> URL: https://issues.apache.org/jira/browse/PIG-889
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_889_Patch.txt
>
>
> I'd like to increment a Counter in my own LoadFunc, but it will throw a 
> NullPointerException. It seems that the reporter is not initialized.  
> I looked into this problem and found that it needs to call 
> PigHadoopLogger.getInstance().setReporter(reporter) in PigInputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-29 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736990#action_12736990
 ] 

Santhosh Srinivasan commented on PIG-889:
-

PigHadoopLogger implements the PigLogger interface. As part of the 
implementation, it uses the Hadoop reporter to aggregate warning messages.

> Pig can not access reporter of PigHadoopLog in Load Func
> 
>
> Key: PIG-889
> URL: https://issues.apache.org/jira/browse/PIG-889
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_889_Patch.txt
>
>
> I'd like to increment a Counter in my own LoadFunc, but it will throw a 
> NullPointerException. It seems that the reporter is not initialized.  
> I looked into this problem and found that it needs to call 
> PigHadoopLogger.getInstance().setReporter(reporter) in PigInputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-882) log level not propogated to loggers

2009-07-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736359#action_12736359
 ] 

Santhosh Srinivasan commented on PIG-882:
-

Minor comment:

Index: src/org/apache/pig/Main.java
===

Instead of printing the warning message to stdout, it should be printed to 
stderr.

{code}
+catch (IOException e)
+{
+System.out.println("Warn: Cannot open log4j properties file, use default");
+}
{code}
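
For illustration, the suggested change would read (a sketch of the same hunk, 
redirected to stderr):

{code}
+catch (IOException e)
+{
+System.err.println("Warn: Cannot open log4j properties file, use default");
+}
{code}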


The rest of the patch looks fine.

> log level not propogated to loggers 
> 
>
> Key: PIG-882
> URL: https://issues.apache.org/jira/browse/PIG-882
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Reporter: Thejas M Nair
> Attachments: PIG-882-1.patch, PIG-882-2.patch
>
>
> Pig accepts a log level as a parameter, but the log level it captures is not 
> propagated, so loggers in different classes do not log at the specified 
> level.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-660) Integration with Hadoop 0.20

2009-07-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736283#action_12736283
 ] 

Santhosh Srinivasan commented on PIG-660:
-

The build.xml in the patch(es) has a reference to hadoop20.jar. The missing 
piece is a hadoop20.jar that Pig can use to build its sources; Pig cannot use 
the hadoop20.jar coming from the Hadoop release.

> Integration with Hadoop 0.20
> 
>
> Key: PIG-660
> URL: https://issues.apache.org/jira/browse/PIG-660
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
> Environment: Hadoop 0.20
>Reporter: Santhosh Srinivasan
>    Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-660.patch, PIG-660_1.patch, PIG-660_2.patch, 
> PIG-660_3.patch, PIG-660_4.patch, PIG-660_5.patch
>
>
> With Hadoop 0.20, it will be possible to query the status of each map and 
> reduce in a map reduce job. This will allow better error reporting. Some of 
> the other items that could be on Hadoop's feature requests/bugs are 
> documented here for tracking.
> 1. Hadoop should return objects instead of strings when exceptions are thrown
> 2. The JobControl should handle all exceptions and report them appropriately. 
> For example, when the JobControl fails to launch jobs, it should handle 
> exceptions appropriately and should support APIs that query this state, i.e., 
> failure to launch jobs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-28 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12736201#action_12736201
 ] 

Santhosh Srinivasan commented on PIG-889:
-

The PigHadoopLogger implements the PigLogger interface. The only supported 
method for this interface is warn(). Supporting counters as part of Pig will 
involve part of what you suggest. While your implementation extends 
PigHadoopLogger, it is not generic enough to support counters in Pig. Other 
load functions would have to hold direct references to PigHadoopLogger, which 
is not the correct way of accessing and updating counters. Pig needs to extend 
the load function interface (and store function interface?) to allow access to 
counters.

Summary: Pig needs to support counters, and it's a slightly bigger topic. 
Extending the functionality of existing classes that are meant for a different 
purpose will make support difficult in the future.

If you agree, we can mark this issue as invalid and open a new jira to capture 
the requirements for supporting counters in Pig.

> Pig can not access reporter of PigHadoopLog in Load Func
> 
>
> Key: PIG-889
> URL: https://issues.apache.org/jira/browse/PIG-889
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_889_Patch.txt
>
>
> I'd like to increment a Counter in my own LoadFunc, but it will throw a 
> NullPointerException. It seems that the reporter is not initialized.  
> I looked into this problem and found that it needs to call 
> PigHadoopLogger.getInstance().setReporter(reporter) in PigInputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-24 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12735129#action_12735129
 ] 

Santhosh Srinivasan commented on PIG-889:
-

The issue here is the lack of support for counters within Pig.

The intention of the warn method in the PigLogger interface was to allow 
warning aggregation for sources within Pig and for UDFs. Your use of the 
reporter within the logger is not supported. An implementation detail prevents 
the correct use of this interface for load functions: the Hadoop reporter 
object is provided in the getRecordReader, map and reduce calls, but Pig 
provides only an interface for load functions (and an abstract class for 
UDFs), so the logger instance cannot be initialized in the loaders until we 
decide to add a method to support it.

Will having the code from PigMapBase.map() in 
PigInputFormat.getRecordReader() work for you?

{code}
// Wire the Hadoop reporter into the backend's shared logger instance so
// that warn() aggregation (and this workaround) can reach the reporter:
PigHadoopLogger pigHadoopLogger = PigHadoopLogger.getInstance();
pigHadoopLogger.setAggregate(aggregateWarning);
pigHadoopLogger.setReporter(reporter);
PhysicalOperator.setPigLogger(pigHadoopLogger);
{code}

Note that this is a workaround for your situation. I would highly recommend 
that you move to the use of counters when they are supported.

> Pig can not access reporter of PigHadoopLog in Load Func
> 
>
> Key: PIG-889
> URL: https://issues.apache.org/jira/browse/PIG-889
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_889_Patch.txt
>
>
> I'd like to increment a Counter in my own LoadFunc, but it will throw a 
> NullPointerException. It seems that the reporter is not initialized.  
> I looked into this problem and found that it needs to call 
> PigHadoopLogger.getInstance().setReporter(reporter) in PigInputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-773) Empty complex constants (empty bag, empty tuple and empty map) should be supported

2009-07-23 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-773?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-773:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch has been committed. Thanks for the fix, Ashutosh.

> Empty complex constants (empty bag, empty tuple and empty map) should be 
> supported
> --
>
> Key: PIG-773
> URL: https://issues.apache.org/jira/browse/PIG-773
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: pig-773.patch, pig-773_v2.patch, pig-773_v3.patch, 
> pig-773_v4.patch, pig-773_v5.patch
>
>
> We should be able to create an empty bag constant using {}, an empty tuple 
> constant using (), and an empty map constant using [] within a pig script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-773) Empty complex constants (empty bag, empty tuple and empty map) should be supported

2009-07-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734810#action_12734810
 ] 

Santhosh Srinivasan commented on PIG-773:
-

+1 for the changes.

> Empty complex constants (empty bag, empty tuple and empty map) should be 
> supported
> --
>
> Key: PIG-773
> URL: https://issues.apache.org/jira/browse/PIG-773
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Pradeep Kamath
>Assignee: Ashutosh Chauhan
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: pig-773.patch, pig-773_v2.patch, pig-773_v3.patch, 
> pig-773_v4.patch, pig-773_v5.patch
>
>
> We should be able to create an empty bag constant using {}, an empty tuple 
> constant using (), and an empty map constant using [] within a pig script

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-892) Make COUNT and AVG deal with nulls in accordance with the SQL standard

2009-07-23 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734806#action_12734806
 ] 

Santhosh Srinivasan commented on PIG-892:
-

1. Index: src/org/apache/pig/builtin/FloatAvg.java
===

The size of 't' is not checked before t.get(0) in the count() method.


{code}
+if (t != null && t.get(0) != null)
+cnt++;
+}
{code}
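
For illustration, a guarded version of that check might read (a sketch; the 
actual fix is up to the patch author):

{code}
+if (t != null && t.size() > 0 && t.get(0) != null)
+    cnt++;
{code}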

2. Index: src/org/apache/pig/builtin/IntAvg.java
===

Same comment as FloatAvg.java

3. Index: src/org/apache/pig/builtin/DoubleAvg.java
===

Same comment as FloatAvg.java

4. Index: src/org/apache/pig/builtin/AVG.java
===

Same comment as FloatAvg.java

5. Index: src/org/apache/pig/builtin/LongAvg.java
===

Same comment as FloatAvg.java


6. Index: src/org/apache/pig/builtin/COUNT_STAR.java
===

I am not sure about the naming convention here. None of the built-in functions 
have a special character in the class name. COUNTSTAR would be better than 
COUNT_STAR.


> Make COUNT and AVG deal with nulls in accordance with the SQL standard
> ---
>
> Key: PIG-892
> URL: https://issues.apache.org/jira/browse/PIG-892
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.4.0
>
> Attachments: PIG-892.patch, PIG-892_v2.patch
>
>
> both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match 
> COUNT(*) in SQL

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-892) Make COUNT and AVG deal with nulls in accordance with the SQL standard

2009-07-22 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12734417#action_12734417
 ] 

Santhosh Srinivasan commented on PIG-892:
-

I am reviewing the patch.

> Make COUNT and AVG deal with nulls in accordance with the SQL standard
> ---
>
> Key: PIG-892
> URL: https://issues.apache.org/jira/browse/PIG-892
> Project: Pig
>  Issue Type: Improvement
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
>Assignee: Olga Natkovich
> Fix For: 0.4.0
>
> Attachments: PIG-892.patch, PIG-892_v2.patch
>
>
> both COUNT and AVG need to ignore nulls. Also add COUNT_STAR to match 
> COUNT(*) in SQL

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-889) Pig can not access reporter of PigHadoopLog in Load Func

2009-07-21 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733825#action_12733825
 ] 

Santhosh Srinivasan commented on PIG-889:
-

Comments:

The reporter inside the logger is set up correctly in PigInputFormat for Hadoop. 
However, the usage of the logger to retrieve the reporter and then increment 
counters is flawed for the following reasons:

1. In the test case, the new loader uses PigHadoopLogger directly. When the 
loader is used in local mode, the notion of Hadoop disappears and the reference 
to PigHadoopLogger is not usable (i.e., will result in a NullPointerException).

{code}
+   @Override
+   public Tuple getNext() throws IOException {
+   PigHadoopLogger.getInstance().getReporter().incrCounter(
+   MyCounter.TupleCounter, 1);
+   return super.getNext();
+   }
{code}

2. The loggers were meant for warning aggregation. Here, there is a case being 
made to expand the capabilities to allow user-defined counter aggregation. If 
that's the case, then new methods have to be added to the PigLogger interface.
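
For example, such an expanded interface might look like the following (purely 
hypothetical; the existing warn() signature is paraphrased and the counter 
method does not exist):

{code}
// Hypothetical sketch only -- not the actual Pig interface.
public interface PigLogger {
    // existing warning-aggregation entry point (signature paraphrased)
    void warn(Object o, String msg, Enum warningEnum);
    // hypothetical addition for user-defined counter aggregation
    void incrCounter(Enum counter, long amount);
}
{code}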

> Pig can not access reporter of PigHadoopLog in Load Func
> 
>
> Key: PIG-889
> URL: https://issues.apache.org/jira/browse/PIG-889
> Project: Pig
>  Issue Type: Improvement
>  Components: impl
>Affects Versions: 0.4.0
>Reporter: Jeff Zhang
>Assignee: Jeff Zhang
> Fix For: 0.4.0
>
> Attachments: Pig_889_Patch.txt
>
>
> I'd like to increment a Counter in my own LoadFunc, but it will throw a 
> NullPointerException. It seems that the reporter is not initialized.  
> I looked into this problem and found that it needs to call 
> PigHadoopLogger.getInstance().setReporter(reporter) in PigInputFormat.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-893) support cast of chararray to other simple types

2009-07-21 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-893?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733773#action_12733773
 ] 

Santhosh Srinivasan commented on PIG-893:
-

What are the semantics of casting chararray (string) to numeric types? 

Pig does not support conversion of any non-bytearray type to bytearray. The 
proposal in the jira description is minimalistic. Does it match that of SQL? 

Without a clear articulation of what these conversions mean, we cannot/should 
not support chararray to numeric type conversions. PiggyBank already supports 
UDFs that convert strings to int, double, etc.

It's a nice-to-have as part of the language, but it's better positioned as a 
UDF. If clear semantics are laid out, then making it part of the language will 
be a matter of consensus.
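
For illustration, the UDF route looks roughly like this (a sketch in the 
spirit of the piggybank converters; the class name is made up):

{code}
import java.io.IOException;
import org.apache.pig.EvalFunc;
import org.apache.pig.data.Tuple;

// Sketch of a chararray-to-int conversion UDF; returns null on failure,
// matching the semantics proposed in this jira.
public class StringToInt extends EvalFunc<Integer> {
    @Override
    public Integer exec(Tuple input) throws IOException {
        if (input == null || input.size() == 0 || input.get(0) == null)
            return null;
        try {
            return Integer.parseInt((String) input.get(0));
        } catch (NumberFormatException e) {
            return null;   // overflow or bad format -> null, per proposal
        }
    }
}
{code}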

> support cast of chararray to other simple types
> ---
>
> Key: PIG-893
> URL: https://issues.apache.org/jira/browse/PIG-893
> Project: Pig
>  Issue Type: New Feature
>Reporter: Thejas M Nair
>
> Pig should support casting of chararray to 
> integer, long, float, double, and bytearray. If the conversion fails for 
> reasons such as overflow, the cast should return null and log a warning.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-21 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-695:


Fix Version/s: 0.4.0

> Pig should not fail when error logs cannot be created
> -
>
> Key: PIG-695
> URL: https://issues.apache.org/jira/browse/PIG-695
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Fix For: 0.4.0
>
> Attachments: PIG-695.patch
>
>
> Currently, PIG validates the log file location and fails/exits when the log 
> file cannot be created. Instead, it should print a warning and continue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-21 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-695:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Patch has been committed.

> Pig should not fail when error logs cannot be created
> -
>
> Key: PIG-695
> URL: https://issues.apache.org/jira/browse/PIG-695
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Attachments: PIG-695.patch
>
>
> Currently, PIG validates the log file location and fails/exits when the log 
> file cannot be created. Instead, it should print a warning and continue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-20 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733416#action_12733416
 ] 

Santhosh Srinivasan commented on PIG-695:
-

No unit tests were added for this fix, as it is part of testing Main. 
Currently there are no unit tests for Main.

> Pig should not fail when error logs cannot be created
> -
>
> Key: PIG-695
> URL: https://issues.apache.org/jira/browse/PIG-695
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Attachments: PIG-695.patch
>
>
> Currently, PIG validates the log file location and fails/exits when the log 
> file cannot be created. Instead, it should print a warning and continue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-880) Order by is broken with complex fields

2009-07-19 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12733018#action_12733018
 ] 

Santhosh Srinivasan commented on PIG-880:
-

Review Comment:

PigStorage should read map values as strings instead of interpreting the types. 
This way, values such as integers that are too large to fit into an Integer 
will still be interpreted as bytearray.

> Order by is broken with complex fields
> --
>
> Key: PIG-880
> URL: https://issues.apache.org/jira/browse/PIG-880
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>Reporter: Olga Natkovich
> Fix For: 0.4.0
>
> Attachments: PIG-880-bytearray-mapvalue-code-without-tests.patch
>
>
> Pig script:
> a = load 'studentcomplextab10k' as (smap:map[],c2,c3);
> f = foreach a generate smap#'name', smap#'age', smap#'gpa';
> s = order f by $0;
> store s into 'sc.out';
> Stack:
> Caused by: java.lang.ArrayStoreException
> at java.lang.System.arraycopy(Native Method)
> at java.util.Arrays.copyOf(Arrays.java:2763)
> at java.util.ArrayList.toArray(ArrayList.java:305)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.convertToArray(WeightedRangePartitioner.java:154)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.partitioners.WeightedRangePartitioner.configure(WeightedRangePartitioner.java:96)
> ... 5 more
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getErrorMessages(Launcher.java:230)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.Launcher.getStats(Launcher.java:179)
> at 
> org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.MapReduceLauncher.launchPig(MapReduceLauncher.java:204)
> at 
> org.apache.pig.backend.hadoop.executionengine.HExecutionEngine.execute(HExecutionEngine.java:265)
> at 
> org.apache.pig.PigServer.executeCompiledLogicalPlan(PigServer.java:769)
> at org.apache.pig.PigServer.execute(PigServer.java:762)
> at org.apache.pig.PigServer.access$100(PigServer.java:91)
> at org.apache.pig.PigServer$Graph.execute(PigServer.java:933)
> at org.apache.pig.PigServer.executeBatch(PigServer.java:245)
> at 
> org.apache.pig.tools.grunt.GruntParser.executeBatch(GruntParser.java:112)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:168)
> at 
> org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:140)
> at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:88)
> at org.apache.pig.Main.main(Main.java:389)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Work stopped: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-17 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-695 stopped by Santhosh Srinivasan.

> Pig should not fail when error logs cannot be created
> -
>
> Key: PIG-695
> URL: https://issues.apache.org/jira/browse/PIG-695
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Attachments: PIG-695.patch
>
>
> Currently, PIG validates the log file location and fails/exits when the log 
> file cannot be created. Instead, it should print a warning and continue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-17 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-695:


Status: Patch Available  (was: Open)

> Pig should not fail when error logs cannot be created
> -
>
> Key: PIG-695
> URL: https://issues.apache.org/jira/browse/PIG-695
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Attachments: PIG-695.patch
>
>
> Currently, PIG validates the log file location and fails/exits when the log 
> file cannot be created. Instead, it should print a warning and continue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Work started: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-17 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on PIG-695 started by Santhosh Srinivasan.

> Pig should not fail when error logs cannot be created
> -
>
> Key: PIG-695
> URL: https://issues.apache.org/jira/browse/PIG-695
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Attachments: PIG-695.patch
>
>
> Currently, PIG validates the log file location and fails/exits when the log 
> file cannot be created. Instead, it should print a warning and continue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-695) Pig should not fail when error logs cannot be created

2009-07-17 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-695?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-695:


Attachment: PIG-695.patch

The attached patch ensures that Pig does not error out when the error log file 
is not writable.

> Pig should not fail when error logs cannot be created
> -
>
> Key: PIG-695
> URL: https://issues.apache.org/jira/browse/PIG-695
> Project: Pig
>  Issue Type: Bug
>  Components: impl
>Affects Versions: 0.2.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
> Attachments: PIG-695.patch
>
>
> Currently, PIG validates the log file location and fails/exits when the log 
> file cannot be created. Instead, it should print a warning and continue.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-728) All backend error messages must be logged to preserve the original error messages

2009-07-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-728:


  Resolution: Fixed
Hadoop Flags: [Reviewed]
  Status: Resolved  (was: Patch Available)

Issue has been resolved.

> All backend error messages must be logged to preserve the original error 
> messages
> -
>
> Key: PIG-728
> URL: https://issues.apache.org/jira/browse/PIG-728
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-728_1.patch
>
>
> The current error handling framework logs backend error messages only when 
> Pig is not able to parse the error message. Instead, Pig should log the 
> backend error message irrespective of Pig's ability to parse backend error 
> messages. On a side note, the use of instantiateFuncFromSpec in Launcher.java 
> is not consistent and should avoid the use of class_name + "(" + 
> string_constructor_args + ")".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (PIG-728) All backend error messages must be logged to preserve the original error messages

2009-07-16 Thread Santhosh Srinivasan (JIRA)

[ 
https://issues.apache.org/jira/browse/PIG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12732281#action_12732281
 ] 

Santhosh Srinivasan commented on PIG-728:
-

Patch has been committed.

> All backend error messages must be logged to preserve the original error 
> messages
> -
>
> Key: PIG-728
> URL: https://issues.apache.org/jira/browse/PIG-728
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-728_1.patch
>
>
> The current error handling framework logs backend error messages only when 
> Pig is not able to parse the error message. Instead, Pig should log the 
> backend error message irrespective of Pig's ability to parse backend error 
> messages. On a side note, the use of instantiateFuncFromSpec in Launcher.java 
> is not consistent and should avoid the use of class_name + "(" + 
> string_constructor_args + ")".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-728) All backend error messages must be logged to preserve the original error messages

2009-07-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-728:


Attachment: PIG-728_1.patch

Attaching a new patch that fixes the findbugs issue.

> All backend error messages must be logged to preserve the original error 
> messages
> -
>
> Key: PIG-728
> URL: https://issues.apache.org/jira/browse/PIG-728
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-728_1.patch
>
>
> The current error handling framework logs backend error messages only when 
> Pig is not able to parse the error message. Instead, Pig should log the 
> backend error message irrespective of Pig's ability to parse backend error 
> messages. On a side note, the use of instantiateFuncFromSpec in Launcher.java 
> is not consistent and should avoid the use of class_name + "(" + 
> string_constructor_args + ")".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-728) All backend error messages must be logged to preserve the original error messages

2009-07-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-728:


Status: Patch Available  (was: In Progress)

> All backend error messages must be logged to preserve the original error 
> messages
> -
>
> Key: PIG-728
> URL: https://issues.apache.org/jira/browse/PIG-728
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.4.0
>
> Attachments: PIG-728_1.patch
>
>
> The current error handling framework logs backend error messages only when 
> Pig is not able to parse the error message. Instead, Pig should log the 
> backend error message irrespective of Pig's ability to parse backend error 
> messages. On a side note, the use of instantiateFuncFromSpec in Launcher.java 
> is not consistent and should avoid the use of class_name + "(" + 
> string_constructor_args + ")".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (PIG-728) All backend error messages must be logged to preserve the original error messages

2009-07-16 Thread Santhosh Srinivasan (JIRA)

 [ 
https://issues.apache.org/jira/browse/PIG-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Santhosh Srinivasan updated PIG-728:


Attachment: (was: PIG-728.patch)

> All backend error messages must be logged to preserve the original error 
> messages
> -
>
> Key: PIG-728
> URL: https://issues.apache.org/jira/browse/PIG-728
> Project: Pig
>  Issue Type: Bug
>Affects Versions: 0.3.0
>    Reporter: Santhosh Srinivasan
>Assignee: Santhosh Srinivasan
>Priority: Minor
> Fix For: 0.4.0
>
>
> The current error handling framework logs backend error messages only when 
> Pig is not able to parse the error message. Instead, Pig should log the 
> backend error message irrespective of Pig's ability to parse backend error 
> messages. On a side note, the use of instantiateFuncFromSpec in Launcher.java 
> is not consistent and should avoid the use of class_name + "(" + 
> string_constructor_args + ")".

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


