What is a relation?

2008-12-05 Thread Alan Gates
All, A question on types in pig. When you say: A = load 'myfile'; what exactly is A? For the moment let us call A a relation, since it is a set of records, and we can pass it to a relational operator, such as FILTER, ORDER, etc. To clarify the question, is a relation equivalent to a

Re: Pig Team now has two new committers!

2008-12-09 Thread Alan Gates
Congrats to both of you, an honor well earned. Alan. On Dec 9, 2008, at 8:51 AM, Olga Natkovich wrote: Hi, I am happy to announce that the Hadoop PMC voted to make Pradeep Kamath and Santhosh Srinivasan Pig committers to acknowledge their significant contribution to the project!

Re: Pig performance

2008-12-20 Thread Alan Gates
I left a comment on the blog addressing some of the issues he brought up. Alan. On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote: Hey Pig team, Did anyone check out the recent claims about Pig's poor performance versus Cascading? Though I haven't worked extensively with either

Re: Pig performance

2008-12-31 Thread Alan Gates
is paradoxically not the most current and that storing intermediate results can decrease the scope of optimizations. On Dec 20, 2008, at 10:16, Alan Gates ga...@yahoo-inc.com wrote: I left a comment on the blog addressing some of the issues he brought up. Alan. On Dec 20, 2008, at 1:00 AM

Re: Adaptive Query Optimization

2009-01-20 Thread Alan Gates
There is no concept of costing in pig at this point. Currently we let the script writer decide when to choose an FR Join over a symmetric hash join. We certainly welcome any work on an optimizer in pig. Be sure to take a look at https://issues.apache.org/jira/browse/PIG-360 where some
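To make the trade-off the script writer is choosing between more concrete: an FR (fragment-replicate) join loads the smaller input into an in-memory hash table and streams the larger input past it, instead of shuffling both inputs to reducers. The single-process Java sketch below illustrates only that idea; `ReplicatedJoinSketch` and `Row` are hypothetical names, and this is not Pig's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Single-process illustration of a fragment-replicate (FR) join: the small input is
// loaded into an in-memory hash table and the large input is streamed past it.
final class ReplicatedJoinSketch {

    // Hypothetical record: a join key plus one payload field.
    static final class Row {
        final String key;
        final String value;

        Row(String key, String value) {
            this.key = key;
            this.value = value;
        }
    }

    // Inner join; output rows are "key,largeValue,smallValue".
    static List<String> join(List<Row> large, List<Row> small) {
        // Build phase: index the small (replicated) input by join key.
        Map<String, List<String>> index = new HashMap<>();
        for (Row r : small) {
            index.computeIfAbsent(r.key, k -> new ArrayList<>()).add(r.value);
        }
        // Probe phase: one pass over the large input; it never needs to be shuffled.
        List<String> out = new ArrayList<>();
        for (Row r : large) {
            List<String> matches = index.get(r.key);
            if (matches == null) {
                continue; // no partner in the small input: dropped by an inner join
            }
            for (String m : matches) {
                out.add(r.key + "," + r.value + "," + m);
            }
        }
        return out;
    }
}
```

The catch, and one reason a cost model would matter, is memory: the whole small input has to fit in the hash table, which is exactly the judgment currently left to the script writer.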

Re: switching to different parser in Pig

2009-02-23 Thread Alan Gates
We looked into antlr. It appears to be very similar to javacc, with the added feature that the java code it generates is humanly readable. That isn't why we want to switch off of javacc. Olga listed the 3 things we want out of a parser that javacc isn't giving us (lack of docs, no easy

Re: switching to different parser in Pig

2009-02-24 Thread Alan Gates
ASTs, with JavaCC you have the choice between JJTree and JTB. With SableCC, when I last looked at it, you only get the JTB-like option. -- On Mon, Feb 23, 2009 at 10:06 PM, Alan Gates ga...@yahoo-inc.com wrote: We looked into antlr. It appears to be very similar to javacc

Fwd: Core For Paper ---- Grid and Cloud Middleware Workshop, in conjunction with GCC2009

2009-03-03 Thread Alan Gates
Begin forwarded message: From: Yongqiang He heyongqi...@software.ict.ac.cn Date: March 1, 2009 10:18:03 PM PST To: core-u...@hadoop.apache.org core-u...@hadoop.apache.org, core-...@hadoop.apache.org core-...@hadoop.apache.org, hbase-u...@hadoop.apache.org hbase-u...@hadoop.apache.org ,

Re: scope string in OperatorKey

2009-03-11 Thread Alan Gates
The purpose of the scope string is to allow us to have multiple sessions of pig running and distinguish the operators. It's one of those things that was put in before an actual requirement, so whether it will prove useful or not remains to be seen. As for removing it from explain, is it

Re: [VOTE] Release Pig 1.0.0 (candidate 0)

2009-03-20 Thread Alan Gates
README.txt still has the incubator text in it. This needs to be removed. I'll roll a new package and call a new vote. Alan. On Mar 17, 2009, at 3:21 PM, Olga Natkovich wrote: Pig Committers, I have created a candidate build for Pig 1.0.0. This release represents a major rewrite of Pig

Re: [VOTE] Release Pig 1.0.0 (candidate 0)

2009-03-23 Thread Alan Gates
To address Santhosh's concerns that 1.0 is not an appropriate release number, I propose that we release the same code under the name 0.2.0. Alan. On Mar 20, 2009, at 11:54 AM, Santhosh Srinivasan wrote: -1 on the 1.0.0 release. IMHO, Pig is relatively stable but not quite there. I would

Re: [VOTE] Release Pig 0.2.0 (candidate 2)

2009-04-01 Thread Alan Gates
From: Santhosh Srinivasan [...@yahoo-inc.com] Sent: Tuesday, March 31, 2009 10:50 AM To: pig-dev@hadoop.apache.org Subject: RE: [VOTE] Release Pig 0.2.0 (candidate 2) +1 Santhosh -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Friday, March 27

Re: Ajax library for Pig

2009-04-08 Thread Alan Gates
Sorry if these are silly questions, but I'm not very familiar with some of these technologies. So what you propose is that Pig would be installed on some dedicated server machine and a web server would be placed in front of it. Then client libraries would be developed that made calls to

Pig release 0.2.0

2009-04-09 Thread Alan Gates
The Pig team is happy to announce Pig 0.2.0 has been released. This release includes the addition of types, better error detection and handling, and a 5x performance improvement over 0.1.1. The details of the release can be found at http://hadoop.apache.org/pig/releases.html . Pig is a

Re: Ajax library for Pig

2009-04-14 Thread Alan Gates
calls (i.e. async call from server to browser, an inbuilt feature in DWR). DWR is under Apache License V2. --nitesh On Wed, Apr 8, 2009 at 9:11 PM, Alan Gates ga...@yahoo-inc.com wrote: Sorry if these are silly questions, but I'm not very familiar with some of these technologies. So what

Re: [Pig Wiki] Update of HowToContribute by AlanGates

2009-04-16 Thread Alan Gates
At this point these are all proposed; none are yet realized. So there is no code for any of them. The place to track these proposals is in the referenced JIRAs. Alan. On Apr 15, 2009, at 6:44 PM, zhang jianfeng wrote: Hi Alan, Thank you for your guideline. So where's code of these

Re: [Pig Wiki] Update of ProposedProjects by AlanGates

2009-04-16 Thread Alan Gates
Your understanding of the proposal is correct. The goal would be to produce Java code rather than a pipeline configuration. But the reasoning is not so that users can then take that and modify it themselves. There's nothing preventing them from doing it, but it has a couple of major

A proposal for changing pig's memory management

2009-05-14 Thread Alan Gates
http://wiki.apache.org/pig/PigMemory Alan.

Re: A proposal for changing pig's memory management

2009-05-19 Thread Alan Gates
are there to provide? Wouldn't a virtual tuple type that was nothing more than a byte buffer, type and an offset do almost all of what is proposed here? On Thu, May 14, 2009 at 5:33 PM, Alan Gates ga...@yahoo-inc.com wrote: http://wiki.apache.org/pig/PigMemory Alan.
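The quoted suggestion, a virtual tuple that is little more than a byte buffer plus a type and an offset, amounts to lazy deserialization: a field is decoded only when it is read. A minimal sketch using `java.nio.ByteBuffer`, assuming for simplicity that every field is a 4-byte big-endian int (so the per-field type is implicit); `LazyIntTuple` is a hypothetical name, not a Pig class.

```java
import java.nio.ByteBuffer;

// Hypothetical "virtual tuple": wraps already-serialized bytes and decodes a field
// only when it is accessed. Assumes every field is a 4-byte big-endian int.
final class LazyIntTuple {
    private final ByteBuffer buf;
    private final int offset; // byte position where this tuple's fields start
    private final int size;   // number of int fields in the tuple

    LazyIntTuple(ByteBuffer buf, int offset, int size) {
        this.buf = buf;
        this.offset = offset;
        this.size = size;
    }

    int size() {
        return size;
    }

    // Absolute read: decodes one field without copying and without moving
    // the buffer's position, so many tuples can share one buffer.
    int get(int i) {
        if (i < 0 || i >= size) {
            throw new IndexOutOfBoundsException("field " + i);
        }
        return buf.getInt(offset + 4 * i);
    }
}
```

No Java objects exist per field until `get` is called, which is the garbage-collection saving the proposal is after; the cost is that every access re-decodes bytes.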

Re: A proposal for changing pig's memory management

2009-05-19 Thread Alan Gates
similar (treating chararray the same way as bytearray) until String operations need to be done. Thanks, Thejas On 5/14/09 5:33 PM, Alan Gates ga...@yahoo-inc.com wrote: http://wiki.apache.org/pig/PigMemory Alan.

Re: A proposal for changing pig's memory management

2009-05-20 Thread Alan Gates
On May 19, 2009, at 10:30 PM, Mridul Muralidharan wrote: I am still not very convinced about the value of this implementation - particularly considering the advances made since 1.3 in memory allocators and garbage collection. My fundamental concern is not with the slowness of garbage

Re: UDF with parameters?

2009-05-22 Thread Alan Gates
Yes, it is possible. The UDF should take the percentage you want as a constructor argument. It will have to be passed as a string and converted. Then in your Pig Latin, you will use the DEFINE statement to pass the argument to the constructor. REGISTER /src/myfunc.jar DEFINE percentile
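The pattern described above, where the UDF receives its parameter as a constructor string and converts it once, can be sketched in plain Java. This deliberately avoids Pig's `EvalFunc` API: the class `Percentile` and its `exec(double[])` method are hypothetical stand-ins, and nearest-rank is just one assumed definition of percentile.

```java
import java.util.Arrays;

// Hypothetical stand-in for a UDF configured through its constructor.
// The argument arrives as a string (as it would via DEFINE) and is parsed once.
final class Percentile {
    private final double fraction; // e.g. 0.5 for the 50th percentile

    Percentile(String percent) {
        double p = Double.parseDouble(percent.trim());
        // Written as !(in range) so that NaN is rejected too.
        if (!(p >= 0 && p <= 100)) {
            throw new IllegalArgumentException("percent must be in [0, 100]: " + percent);
        }
        this.fraction = p / 100.0;
    }

    // Nearest-rank percentile of the input values.
    double exec(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("empty input");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(fraction * sorted.length); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }
}
```

Parsing in the constructor means a malformed argument fails once, when the function is instantiated, rather than on every record.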

Proposed design for new merge join in pig

2009-05-28 Thread Alan Gates
http://wiki.apache.org/pig/PigMergeJoin Alan.

Updated PigMix numbers for latest top of trunk

2009-05-28 Thread Alan Gates
http://wiki.apache.org/pig/PigMix Alan.

Re: PigPen Source

2009-06-15 Thread Alan Gates
It has not yet been integrated into contrib because it requires the eclipse libraries to build, and those weren't integrated. The ivy stuff used by pig's build should be configured to pick up the appropriate eclipse jars so that this can be added to contrib. Alan. On Jun 15, 2009, at

Re: Rewire and multi-query load/store optimization

2009-06-16 Thread Alan Gates
+1 on option one. The use of store-load was only to overcome a temporary problem in Pig. We've fixed the problem, so let's not propagate it. We will need to document this very clearly (maybe even to the point of issuing warnings in the parser when we see this combo) so users understand

Re: [VOTE] Release Pig 0.3.0 (candidate 0)

2009-06-22 Thread Alan Gates
Downloaded, ran, ran tutorial, built piggybank. All looks good. +1 Alan. On Jun 18, 2009, at 12:30 PM, Olga Natkovich wrote: Hi, I created a candidate build for Pig 0.3.0 release. The main feature of this release is support for multiquery which allows sharing computation across

Re: asking for comments on benchmark queries

2009-06-23 Thread Alan Gates
Zheng, I don't think you're subscribed to pig-dev (your emails have been bouncing to the moderator). So I've cc'd you explicitly on this. I don't think we need a Pig JIRA, it's probably easier if we all work on the hive one. I'll post my comments on the various scripts to that bug.

Re: requirements for Pig 1.0?

2009-06-24 Thread Alan Gates
Jurney wrote: For 1.0 - complete Owl? http://wiki.apache.org/pig/Metadata Russell Jurney rjur...@cloudstenography.com On Jun 23, 2009, at 4:40 PM, Alan Gates wrote: I don't believe there's a solid list of want-to-haves for 1.0. The big issue I see is that there are too many interfaces

Re: requirements for Pig 1.0?

2009-06-24 Thread Alan Gates
To be clear, going to 1.0 is not about having a certain set of features. It is about stability and usability. When a project declares itself 1.0 it is making some guarantees regarding the stability of its interfaces (in Pig's case this is Pig Latin, UDFs, and command line usage). It is

Re: Is it a bug ?

2009-07-23 Thread Alan Gates
It looks wrong to me, but I don't have a deep understanding of that code. Alan. On Jul 15, 2009, at 6:03 PM, zhang jianfeng wrote: Hi all, Today, when I read the source code, I find a piece of suspicious code: (PigServer.java Line 1047) graph.ignoreNumStores =

Re: Pig 0.4.0 release

2009-08-18 Thread Alan Gates
Non-committers certainly get a vote, it just isn't binding. I agree on PIG-925 as a blocker. I don't see PIG-859 as a blocker since there is a simple work around. If we want to release 0.4.0 within a week or so, dynamic shims won't be an option because we won't be able to solve the

Re: Pig 0.4.0 release

2009-08-18 Thread Alan Gates
On Aug 18, 2009, at 10:05 AM, Dmitriy Ryaboy wrote: I am about to submit a cleaned up patch for 924. It works fine as a static patch (in fact I can attach it to 660 as well) -- compiling with -Dhadoop.version=XX works as proposed for the static shims. It does the necessary prep for the code to

Re: questions about integration of pig and HBase

2009-09-09 Thread Alan Gates
Alan Gates wrote: Pig supports reading from Hbase (in Hadoop/Hbase 0.18 only). Hello, Do you have any link to the documentation about how to do that? I can't find any example... Thanks,

Re: Request for feedback: cost-based optimizer

2009-09-11 Thread Alan Gates
This is a good start at adding a cost based optimizer to Pig. I have a number of comments: 1) Your argument for putting it in the physical layer rather than the logical is that the logical layer does not know physical statistics. This need not be true. You suggest adding a getStatistics

Re: [VOTE] Release Pig 0.4.0 (candidate 0)

2009-09-16 Thread Alan Gates
When I run this against a Hadoop 0.18.3 instance I can do DFS operations, but MR operations fail with: Error message from job controller - java.lang.AbstractMethodError: org.apache.xerces.dom.DocumentImpl.getXmlStandalone()Z at com.sun.org

Re: [VOTE] Release Pig 0.4.0 (candidate 0)

2009-09-16 Thread Alan Gates
/Checkin_2.pig it fails with the stack given earlier. Alan. On Sep 16, 2009, at 12:46 PM, Olga Natkovich wrote: Alan, I tried the jar packaged in the release and I am able to successfully run tests. Could you give it another try? Thanks, Olga -Original Message- From: Alan Gates

Re: [VOTE] Release Pig 0.4.0 (candidate 1)

2009-09-17 Thread Alan Gates
Now the code won't build because there's no hadoop jar in the lib directory. Alan. On Sep 17, 2009, at 12:09 PM, Olga Natkovich wrote: Hi, I have fixed the issue causing the failure that Alan reported. Please test the new release: http://people.apache.org/~olga/pig-0.4.0-candidate-1/.

Re: Revisit Pig Philosophy?

2009-09-21 Thread Alan Gates
I agree with Milind that we should move to saying that Pig Latin is a data flow language independent of any particular platform, while the current implementation of Pig is tied to Hadoop. I'm not sure how thin that implementation will be, but I'm in favor of making it thin where possible

Re: [VOTE] Release Pig 0.4.0 (candidate 2)

2009-09-22 Thread Alan Gates
private is the pmc list. Releases need pmc votes, hence we send to private. Alan. On Sep 21, 2009, at 7:46 PM, Milind A Bhandarkar wrote: Unrelated to the message content: why is there a priv...@hadoop.apache.org on the cc here? Is this even a valid alias? An open source project needs to

Re: [VOTE] Release Pig 0.4.0 (candidate 2)

2009-09-22 Thread Alan Gates
+1. Tested local mode and tutorial on my mac. Tested hadoop mode on linux. Alan. On Sep 21, 2009, at 5:54 PM, Olga Natkovich wrote: Hi, The new version is available in http://people.apache.org/~olga/pig-0.4.0-candidate-2/. I see one failure in a unit test in piggybank (contrib.) but it

Re: High(er) res Pig logo?

2009-09-28 Thread Alan Gates
I have a couple of higher resolution pigs in overalls and a pig on the Hadoop elephant. I've checked them into src/docs/src/documentation/resources/images/ so all can use them. Also, we're working on cleaning up the Pig with Y! logo issue. Alan. On Sep 27, 2009, at 9:59 AM, Dmitriy Ryaboy

Re: LocalRearrange out of bounds exception - tips for debugging?

2009-10-13 Thread Alan Gates
Have you checked that each record in your input data has at least the number of fields you specify? Have you checked that the field separator in your data matches the default for PigPerformanceLoader (^A I think)? Alan. On Oct 13, 2009, at 10:28 AM, Dmitriy Ryaboy wrote: We ran into what
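Both checks suggested above can be run on a sample of the input before it ever reaches Pig. A minimal standalone sketch, assuming the `^A` (0x01) separator that the message only guesses is PigPerformanceLoader's default, and an expected field count supplied by the caller; `FieldCountCheck` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Standalone check: report input lines with fewer fields than the script expects.
final class FieldCountCheck {
    // Assumed separator: Ctrl-A (0x01), written as a regex hex escape.
    private static final Pattern CTRL_A = Pattern.compile("\\x01");

    // Returns 1-based line numbers whose field count is below `expected`.
    static List<Integer> shortLines(List<String> lines, int expected) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 0; i < lines.size(); i++) {
            // Limit -1 keeps trailing empty fields, so "x<^A>y<^A>" counts as 3 fields.
            int fields = CTRL_A.split(lines.get(i), -1).length;
            if (fields < expected) {
                bad.add(i + 1);
            }
        }
        return bad;
    }
}
```

If every line comes back as short, the separator is the likelier culprit: a line with no `^A` at all splits into a single field.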

Hudson testing of patches

2009-10-22 Thread Alan Gates
We've had many questions on this, so I'm sending this to everyone on the dev list in hopes of clarifying the situation. Our Hudson setup for testing patches is falsely returning failures on all or most unit tests for all patches. So if you submit a patch and all the unit tests fail,

Re: [VOTE] Release Pig 0.5.0 (candidate 0)

2009-10-26 Thread Alan Gates
+1 On my laptop (mac) ran tutorial in both local and hadoop modes, ran a join/group/sort/limit script in both local and hadoop modes, did build of pig and contrib. On linux box did build of both pig and contrib, ran a join/group/sort/limit script in both local and hadoop modes. Alan. On

Re: LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Alan Gates
We definitely want to avoid parsing every tuple when sampling. But do we need to implement a special function for it? Pig will have access to the InputFormat instance, correct? Can it not call InputFormat.getNext the desired number of times (which will not parse the tuple) and then call
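The idea in the message, advancing the underlying reader without parsing and parsing only the records that are actually kept, can be sketched independently of Hadoop's classes. This is a hypothetical illustration: an `Iterator<String>` stands in for the raw record reader and the `parse` function for the load function's tuple construction; neither is the real `InputFormat` or `LoadFunc` API, and fixed-stride sampling is an assumed policy.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch: keep every `stride`-th raw record and parse only those.
final class SkipSampler {
    static <T> List<T> sample(Iterator<String> raw, int stride, Function<String, T> parse) {
        if (stride < 1) {
            throw new IllegalArgumentException("stride must be >= 1");
        }
        List<T> kept = new ArrayList<>();
        long index = 0;
        while (raw.hasNext()) {
            String rec = raw.next();         // cheap: just advance past the raw record
            if (index++ % stride == 0) {
                kept.add(parse.apply(rec));  // expensive parse only for sampled records
            }
        }
        return kept;
    }
}
```

The parse cost scales with the number of sampled records rather than the number of records read, which is the saving the thread is after.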

Re: [VOTE] Branch for Pig 0.6.0 release

2009-11-09 Thread Alan Gates
+1. In addition to the new features we've added, our change to use Hadoop's LineRecordReader brought Pig to parity with Hadoop in the PigMix tests, about a 30% average performance improvement. This should be huge for our users. Alan. On Nov 9, 2009, at 12:26 PM, Olga Natkovich wrote:

Re: package org.apache.hadoop.zebra.parse missing

2009-11-11 Thread Alan Gates
The parser package is generated as part of the build. Invoking ant in the contrib/zebra directory should result in the parser package being created at ./src-gen/org/apache/hadoop/zebra/parser Alan. On Nov 11, 2009, at 12:54 AM, Min Zhou wrote: Hi guys, I checked out pig from

Re: FYI - forking TFile off Hadoop into Zebra

2009-11-13 Thread Alan Gates
On Nov 11, 2009, at 4:13 PM, Ashutosh Chauhan wrote: On Wed, Nov 11, 2009 at 18:26, Chao Wang ch...@yahoo-inc.com wrote: Last, we would like to point out that this is a short term solution for Zebra and we plan to: 1) port all changes to Zebra TFile back into Hadoop TFile. 2) in the long

Re: optimizer hints in Pig

2009-11-16 Thread Alan Gates
In general I think optimizer hints fit well with Pig's approach to data processing, as expressed in our philosophic statement that Pigs are domestic animals (see http://hadoop.apache.org/pig/philosophy.html ). At least in the examples you give, I don't see 'with' as binding. The user is

Welcome Jeff Zhang

2009-11-19 Thread Alan Gates
All, I would like to welcome Jeff Zhang as our newest Pig committer. Jeff has been contributing to Pig for about nine months now. He's been active on the mailing lists, in contributing patches, and in helping other users with their patches. Congratulations Jeff, and thanks for your

Yahoo is hiring for Hadoop development

2009-11-20 Thread Alan Gates
All, Yahoo has a number of Hadoop development positions open. There are engineering, architect, management, and QA positions all open. See http://developer.yahoo.net/blogs/hadoop/2009/11/updated_do_you_have_what_it_ta.html for details. Alan.

Re: TPC-H benchmark

2009-11-23 Thread Alan Gates
I don't know of any. Officially Pig cannot publish a TPC-H number because it is not a transaction based store. But I still think it would be very interesting to see the results if someone took the time to translate the queries. Alan. On Nov 22, 2009, at 6:20 PM, RichardGUO Fei wrote:

Re: Why we name it zebra ?

2009-11-30 Thread Alan Gates
On Nov 26, 2009, at 7:39 AM, Jeff Zhang wrote: Hi all, I'd like to know where the name zebra comes from. Does it convey the meaning of this metadata system, that the columnar storage format is like the lines on a zebra's skin? Pretty much, yes. We've fallen into the habit of giving

Re: Pig reading hive columnar rc tables

2009-11-30 Thread Alan Gates
On Nov 30, 2009, at 12:18 PM, Dmitriy Ryaboy wrote: That's awesome, I've been itching to do that but never got around to it.. Garrit, do you have any benchmarks on read speeds? I don't know about putting this in piggybank, as it carries with it pretty significant dependencies, increasing

Re: SQL in Pig?

2010-01-19 Thread Alan Gates
We are still actively working on adding SQL to Pig. We hope to have an updated patch posted to that JIRA in February or March. Alan. On Jan 18, 2010, at 4:15 PM, Michael Dalton wrote: Hi, What's the current status of SQL support in Pig? I looked at the JIRA (

Backward compatibility

2010-01-25 Thread Alan Gates
Over the last year the number of Pig users has grown, both in terms of absolute number and the number of different companies using it. However, it is going to be a little while yet before Pig reaches a maturity level at which it can declare a 1.0 release and promise it won't break backward

Re: reading/writing HBase in Pig

2010-01-25 Thread Alan Gates
On Jan 18, 2010, at 10:14 PM, Michael Dalton wrote: I took a look at the load-store branch and that definitely seems like the right place to do this. So the right thing to do would be to just open up a JIRA and then post a patch against the load-store rewrite tree, correct? Yes. You

Begin a discussion about Pig as a top level project

2010-03-19 Thread Alan Gates
You have probably heard by now that there is a discussion going on in the Hadoop PMC as to whether a number of the subprojects (Hbase, Avro, Zookeeper, Hive, and Pig) should move out from under the Hadoop umbrella and become top level Apache projects (TLP). This discussion has picked up

JIRA Fix Version

2010-03-24 Thread Alan Gates
A reminder to Pig committers: When closing a JIRA issue as Resolved/Fixed please make sure to set the Fix Version field. This helps our users know what versions they need to use to get fixes for their issues. And it helps release managers when they build releases to know what is and

Re: Begin a discussion about Pig as a top level project

2010-03-31 Thread Alan Gates
-1. I'm open to being persuaded that I'm wrong or my concerns can be addressed while still having Pig as a TLP. Alan. On Mar 19, 2010, at 10:59 AM, Alan Gates wrote: You have probably heard by now that there is a discussion going on in the Hadoop PMC as to whether a number

Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Alan Gates
unnecessary administrative and bureaucratic headaches. So my vote is also -1. -Dmitriy On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates ga...@yahoo-inc.com wrote: So far I haven't seen any feedback on this. Apache has asked the Hadoop PMC to submit input in April on whether some subprojects should

Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Alan Gates
becoming a TLP will only introduce unnecessary administrative and bureaucratic headaches. So my vote is also -1. -Dmitriy On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates ga...@yahoo-inc.com wrote: So far I haven't seen any feedback on this. Apache has asked the Hadoop PMC to submit input in April

Re: TypeCheckingVisitor and casting to less precise numeric types

2010-04-15 Thread Alan Gates
You are correct that all of these casts can be done. We omitted them explicitly because, as you said, we did not want to lose precision. We should be able to downcast when users ask explicitly for it, but we don't want to do this implicitly. Alan. On Mar 24, 2010, at 2:47 PM,
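Java's own conversion rules draw the same line the message describes for Pig's type checker: lossless widening conversions happen implicitly, while narrowing conversions that can lose information compile only with an explicit cast. A small plain-Java illustration (not Pig's `TypeCheckingVisitor`):

```java
// Widening conversions below are implicit and lossless; narrowing conversions
// need an explicit cast because they can silently lose information.
final class NarrowingDemo {
    // int -> long: implicit; every int value is representable as a long.
    static long widenToLong(int x) {
        return x;
    }

    // float -> double: implicit; every float value is representable as a double.
    static double widenToDouble(float x) {
        return x;
    }

    // long -> int: explicit cast required; only the low 32 bits survive.
    static int narrowToInt(long x) {
        return (int) x;
    }

    // double -> float: explicit cast required; the value is rounded to float precision.
    static float narrowToFloat(double x) {
        return (float) x;
    }
}
```

Requiring the cast makes the loss visible in the source, which is the same argument for only downcasting in Pig when the user asks for it.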

Re: Shouldn't hadoop18.jar be removed from lib of trunk?

2010-04-22 Thread Alan Gates
It should be removed. I filed https://issues.apache.org/jira/browse/PIG-1388 so we'll remember to remove it in 0.8. Alan. On Apr 21, 2010, at 10:24 PM, chaitanya krishna wrote: Hi, Since pig-trunk now supports hadoop-0.20 and as it already has hadoop20.jar, shouldn't the hadoop18.jar be

Re: Consider cleaning up backend code

2010-04-22 Thread Alan Gates
A couple of years ago we had this concept that Pig as is should be able to run on other backends (like say Dryad if it were open source). So we built this whole backend interface and (mostly) kept Hadoop specific objects out of the front end. Recently we have modified that stance and said

Re: When is the pig-0.7.0 and pig-0.8.0 scheduled to be released?

2010-04-23 Thread Alan Gates
We've already branched for 0.7, which means we're not putting any new features in there, just critical bug fixes. We're extensively testing it now and hope to release it soon. We don't have a date for 0.8 yet. Alan. On Apr 23, 2010, at 2:08 AM, chaitanya krishna wrote: Hi, Can someone

Re: [VOTE] Release Pig 0.7.0 (candidate 0)

2010-05-07 Thread Alan Gates
+1. Ran the tutorial and some simple smoke tests on my mac and on linux. Checked that the signature keys are good. Alan. On May 5, 2010, at 11:44 AM, Daniel Dai wrote: Hi, I have created a candidate build for Pig 0.7.0. A description of what is new and different is included in the

[Travel Assistance] - Applications Open for ApacheCon NA 2010

2010-05-17 Thread Alan Gates
The Travel Assistance Committee is now taking in applications for those wanting to attend ApacheCon North America (NA) 2010, which is taking place between the 1st and 5th November in Atlanta. The Travel Assistance Committee is looking for people who would like to be able to attend

Re: Code Repository

2010-05-21 Thread Alan Gates
http://wiki.apache.org/pig/HowToContribute Alan. On May 20, 2010, at 9:15 PM, Renato Marroquín Mogrovejo wrote: Hi, is there a PIG coding standard? or any type of documentation I could follow? Thanks. Renato M.

Re: About PigPen

2010-05-24 Thread Alan Gates
The one on the JIRA is more up to date. However, be aware that PigPen has not been updated since Pig 0.2 and does not work with new versions of Pig. Alan. On May 23, 2010, at 11:25 PM, Renato Marroquín Mogrovejo wrote: Hi, does anybody know which the PigPen release is? I found two

Re: does EvalFunc generate the entire bag always ?

2010-05-27 Thread Alan Gates
The default case is that UDFs that take bags (such as COUNT, etc.) are handed the entire bag at once. In the case where all UDFs in a foreach implement the algebraic interface and the expression itself is algebraic, the combiner will be used, thus significantly limiting the size of
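COUNT shows why the algebraic case keeps bags small: each combiner-side call reduces its fragment of the bag to a partial count, and later stages only add partials, so no single call needs the whole bag. The plain-Java sketch below models that initial/intermediate/final split; `AlgebraicCount` and its method names are illustrative and are not the signatures of Pig's `Algebraic` interface.

```java
import java.util.List;

// Illustrative three-phase decomposition of COUNT, the shape that lets a
// function run partly in the combiner.
final class AlgebraicCount {
    // Initial: runs on one fragment of the bag (e.g. map side) and emits a partial count.
    static long initial(List<?> fragment) {
        return fragment.size();
    }

    // Intermediate: merges partial counts; may run zero or more times.
    static long intermediate(List<Long> partials) {
        long sum = 0;
        for (long p : partials) {
            sum += p;
        }
        return sum;
    }

    // Final: produces the result from the remaining partials (for COUNT, another sum).
    static long finalValue(List<Long> partials) {
        return intermediate(partials);
    }
}
```

The decomposition only works because addition is associative and commutative; a function whose result depends on seeing every element together cannot be split this way and falls back to receiving the full bag.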

Hudson returning -1 on javadoc

2010-05-27 Thread Alan Gates
Since its return from the hospital, Hudson has been returning -1 on all patches submitted, complaining about a broken javadoc tag. It turns out the bad tag snuck into the code whilst Hudson was away. I've checked in a fix, so Hudson should be happy again. Any patches that were flunked

Re: does EvalFunc generate the entire bag always ?

2010-06-02 Thread Alan Gates
... A practical application escapes me right now, but if I do C = foreach B{ C1 = MyUdf(B.bag_on_b); C2 = limit C1 5; } does it know to push limit in this case? On Thu, May 27, 2010 at 2:32 PM, Alan Gates ga...@yahoo-inc.com wrote: The default case is that UDFs that take bags (such as COUNT

Re: algebraic optimization not invoked for filter following group?

2010-06-15 Thread Alan Gates
For at least simple cases what's in the pseudocode should work. I hope someday soon we can start using the new logical optimizer work (in the experimental package) to build rules for the MR optimizer (like this combiner stuff) as well, which should be much easier to code. But it will be

Re: SIZE() of relation

2010-06-15 Thread Alan Gates
There have been several requests for this. I'm not a fan of it, because it makes it too easy to forget that you're forcing a single reducer MR job to accomplish this. But I'm open to persuasion if everyone else disagrees. Alan. On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote: This

Re: the last job in the mapreduce plan

2010-06-15 Thread Alan Gates
I've never seen a case where this happens. Is this a theoretical question or are you seeing this issue? Alan. On Jun 15, 2010, at 8:49 AM, Gang Luo wrote: Hi, Is it possible the last MapReduce job in the MR plan only loads something and stores it without any other processing in between?

Re: skew join in pig

2010-06-16 Thread Alan Gates
On Jun 16, 2010, at 8:36 AM, Gang Luo wrote: Hi, there is something confusing me in the skew join (http://wiki.apache.org/pig/PigSkewedJoinSpec ) 1. does the sampling job sample and build histogram on both tables, or just one table (in this case, which one) ? Just the left one. 2. the
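As the reply says, only the left input is sampled. A rough sketch of turning such a sample into a key histogram and flagging keys too large for a single reducer follows; the per-reducer capacity parameter and the ceiling rule are assumptions for illustration and are not taken from the PigSkewedJoinSpec page. `SkewSampleSketch` is a hypothetical name.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: build a histogram from sampled left-side join keys and estimate
// how many reducers each hot key would need under an assumed per-reducer capacity.
final class SkewSampleSketch {
    static Map<String, Integer> reducersPerHotKey(Iterable<String> sampledKeys,
                                                  long totalRows, long rowsPerReducer) {
        Map<String, Integer> histogram = new HashMap<>();
        int sampleSize = 0;
        for (String k : sampledKeys) {
            histogram.merge(k, 1, Integer::sum);
            sampleSize++;
        }
        Map<String, Integer> hot = new TreeMap<>();
        if (sampleSize == 0) {
            return hot;
        }
        for (Map.Entry<String, Integer> e : histogram.entrySet()) {
            // Scale the sample frequency up to an estimated row count for the whole input.
            double estRows = (double) e.getValue() / sampleSize * totalRows;
            int reducers = (int) Math.ceil(estRows / rowsPerReducer);
            if (reducers > 1) {
                hot.put(e.getKey(), reducers); // key too big for one reducer
            }
        }
        return hot;
    }
}
```

Keys not returned here would be routed as in an ordinary join; only the estimated hot keys are spread over several reducers.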

Re: skew join in pig

2010-06-18 Thread Alan Gates
. It is much clearer now. One more thing to ask about the third question is, how to allocate reducers to several hot keys? Hashing? Further, Pig doesn't divide the reducers into hot-key reducers and non-hot-key reducers, is that right? Thanks, -Gang ----- Original Message ----- From: Alan Gates ga...@yahoo-inc.com

Re: Bug in new logical optimizer framework?

2010-06-28 Thread Alan Gates
On Jun 28, 2010, at 12:36 AM, Swati Jain wrote: Thanks for the prompt reply. As you mentioned optimization is in its developing stage, does it mean optimization framework is not complete or only rules are in developing stage? In addition to that, I would really appreciate if you could give

Re: Avoiding serialization/de-serialization in pig

2010-06-30 Thread Alan Gates
On Jun 28, 2010, at 5:51 PM, Dmitriy Ryaboy wrote: For what it's worth, I saw very significant speed improvements (order of magnitude for wide tables with few projected columns) when I implemented (2) for our protocol-buffer-based loaders. I have a feeling that propagating schemas when

Re: Add deepCopy in LogicalExpression

2010-07-13 Thread Alan Gates
How does deepCopy differ from clone? Alan. On Jul 12, 2010, at 11:19 PM, Swati Jain wrote: Hi, I am working on ticket PIG-1494 ( https://issues.apache.org/jira/browse/PIG-1494 ). While implementing this functionality (conversion of logical expression into CNF), I need to construct the
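The distinction behind the question can be shown on a toy expression tree. A shallow copy (which is what a default field-by-field `Object.clone()` produces) shares child nodes with the original, so rewriting the copy during a transformation such as CNF conversion would also mutate the original expression; a deep copy duplicates the whole subtree. The `Expr` class below is hypothetical and is not Pig's `LogicalExpression`.

```java
import java.util.ArrayList;
import java.util.List;

// Toy expression node used to contrast shallow and deep copies.
final class Expr {
    String op; // e.g. "AND", "OR", or a leaf name
    final List<Expr> children = new ArrayList<>();

    Expr(String op, Expr... kids) {
        this.op = op;
        for (Expr k : kids) {
            children.add(k);
        }
    }

    // Shallow copy: a new node, but its children list holds the SAME child objects.
    Expr shallowCopy() {
        Expr c = new Expr(op);
        c.children.addAll(children);
        return c;
    }

    // Deep copy: recursively duplicates every node in the subtree.
    Expr deepCopy() {
        Expr c = new Expr(op);
        for (Expr k : children) {
            c.children.add(k.deepCopy());
        }
        return c;
    }
}
```

Whether an existing `clone` is shallow or deep is therefore the thing to check before adding a separate deepCopy method.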

Notes from Pig contributor workshop

2010-07-13 Thread Alan Gates
optimizer framework in the MR optimizer. Alan Gates indicated that while he does not believe we should translate the entire set of MR optimizer visitors into the new framework until we've further tested the framework, this might be a good first test for the new optimizer in the MR optimizer

Announcing Howl development list

2010-07-20 Thread Alan Gates
On Jul 14, 2010, at 2:11 AM, Jeff Hammerbacher wrote: Hey, Thanks for writing up these notes, they're very useful. Pradeep Kamath gave a short presentation on Howl, the work he is leading to create a shared metadata system between Pig, Hive, and Map Reduce. Dmitriy noted that we need

Restarting discussion on Pig as a TLP

2010-08-16 Thread Alan Gates
Five months ago I started a discussion on whether Pig should become a top level project (TLP) at Apache instead of remaining a subproject of Hadoop (http://mail-archives.apache.org/mod_mbox/hadoop-pig-dev/201003.mbox/%3c006aea7c-8829-4788-ad7b-822396fa2...@yahoo-inc.com%3e ). At the time I

August Pig contributor workshop

2010-08-17 Thread Alan Gates
All, We will be holding the next Pig contributor workshop at Twitter on Wednesday, August 25 from 4-6. The tentative agenda is to discuss: Making Piggybank better Pig and Azkaban integration Plans for features in 0.9 An update on the Howl project Anyone contributing to or interested in

[VOTE] Pig to become a top level Apache project

2010-08-18 Thread Alan Gates
Earlier this week I began a discussion on Pig becoming a TLP (http://bit.ly/byD7L8 ). All of the received feedback was positive. So, let's have a formal vote. I propose we move Pig to a top level Apache project. I propose that the initial PMC of this project be the list of all currently

Re: August Pig contributor workshop

2010-08-18 Thread Alan Gates
Confirming Olga and I will be there. Alan. On Aug 18, 2010, at 4:45 PM, Dmitriy Ryaboy wrote: Hi folks, Please do RSVP so that we know how many people are coming. Thanks, -Dmitriy On Tue, Aug 17, 2010 at 4:04 PM, Alan Gates ga...@yahoo-inc.com wrote: All, We will be holding the next

Re: release notes in JIRA

2010-08-20 Thread Alan Gates
+1 Backloading documentation is error prone and leads to not getting documentation done. Alan. On Aug 20, 2010, at 4:11 PM, Olga Natkovich wrote: Guys, After spending the last couple of days collecting information for Pig 0.8.0 documentation, I would like to propose a change for our

Re: [VOTE] Pig to become a top level Apache project

2010-08-23 Thread Alan Gates
With 9 +1 votes and no -1s the vote passes. I will begin a vote on Hadoop general. Alan. On Aug 18, 2010, at 10:34 AM, Alan Gates wrote: Earlier this week I began a discussion on Pig becoming a TLP (http://bit.ly/byD7L8 ). All of the received feedback was positive. So, let's have

Re: Caster interface and byte conversion

2010-08-24 Thread Alan Gates
This seems fine. Is the Pig engine at any point testing to see if the interface is implemented and if so calling toBytes, or is this totally for use inside the store functions themselves to serialize Pig data types? Alan. On Aug 22, 2010, at 1:40 AM, Dmitriy Ryaboy wrote: The current

Re: is Hudson awol?

2010-08-24 Thread Alan Gates
Yes, our friend Hudson is ill again. Giri, Hudson's doctor, should get a chance to look at it in a few days. Alan. On Aug 23, 2010, at 3:31 PM, Dmitriy Ryaboy wrote: Haven't heard anything from Hudson in a while... -D

Re: Caster interface and byte conversion

2010-08-24 Thread Alan Gates
, Alan Gates wrote: This seems fine. Is the Pig engine at any point testing to see if the interface is implemented and if so calling toBytes, or is this totally for use inside the store functions themselves to serialize Pig data types? Alan. On Aug 22, 2010, at 1:40 AM, Dmitriy Ryaboy wrote

Re: Caster interface and byte conversion

2010-08-24 Thread Alan Gates
? Yeah, makes sense. Alan. -D On Tue, Aug 24, 2010 at 10:01 AM, Alan Gates ga...@yahoo-inc.com wrote: One other comment. By making this part of an interface that extends LoadCaster you are assuming the implementing class is both a load and store function. It makes more sense to have

Fwd: hudson patch test jobs : hadoop pig and zookeeper

2010-08-24 Thread Alan Gates
Begin forwarded message: From: Giridharan Kesavan gkesa...@yahoo-inc.com Date: August 24, 2010 4:38:46 PM PDT To: gene...@hadoop.apache.org gene...@hadoop.apache.org Subject: hudson patch test jobs : hadoop pig and zookeeper Reply-To: gene...@hadoop.apache.org gene...@hadoop.apache.org Hi,

Re: Pig Contributor meeting notes

2010-08-26 Thread Alan Gates
On Aug 26, 2010, at 12:55 AM, Jeff Zhang wrote: Wonderful, Dmitriy. It's a pity I missed the contributor meeting. Were any slides shared? Jeff, We don't want to exclude our contributors who don't happen to live in the San Francisco Bay Area. If we could include you via Skype or some

Re: Does Pig Re-Use FileInputLoadFuncs Objects?

2010-09-07 Thread Alan Gates
I'm not 100% sure I understand the question. Are you asking if it re-uses instances of a given load or store function? It should not. Alan. On Aug 31, 2010, at 7:28 PM, Russell Jurney wrote: Pardon the cross-post: Does Pig ever re-use FileInputLoadFunc objects? We suspect state is

Re: help : error run pig

2010-09-27 Thread Alan Gates
Pig is failing to connect to your namenode. Is the address Pig is trying to use (hdfs://master:54310/) correct? Can you connect using that string from the same machine using bin/hadoop? Alan. On Sep 27, 2010, at 8:45 AM, Ngô Văn Vĩ wrote: I run Pig at Hadoop Mode (Pig-0.7.0 and

[jira] Updated: (PIG-519) allow for '#' to signify a comment in a PIG script

2008-11-14 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alan Gates updated PIG-519: --- Resolution: Fixed Fix Version/s: types_branch Hadoop Flags: [Reviewed] Status: Resolved

[jira] Commented: (PIG-512) Expressions in foreach lead to errors

2008-11-14 Thread Alan Gates (JIRA)
[ https://issues.apache.org/jira/browse/PIG-512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12647700#action_12647700 ] Alan Gates commented on PIG-512: In LogicalPlanCloneHelper, why do you need this: {code
