Restarting discussion on Pig as a TLP

2010-08-16 Thread Alan Gates
Five months ago I started a discussion on whether Pig should become a top level project (TLP) at Apache instead of remaining a subproject of Hadoop (http://mail-archives.apache.org/mod_mbox/hadoop-pig-dev/201003.mbox/%3c006aea7c-8829-4788-ad7b-822396fa2...@yahoo-inc.com%3e ). At the time I vo

August Pig contributor workshop

2010-08-17 Thread Alan Gates
All, We will be holding the next Pig contributor workshop at Twitter on Wednesday, August 25 from 4-6. The tentative agenda is to discuss: making Piggybank better; Pig and Azkaban integration; plans for features in 0.9; an update on the Howl project. Anyone contributing to or interested in cont

[VOTE] Pig to become a top level Apache project

2010-08-18 Thread Alan Gates
Earlier this week I began a discussion on Pig becoming a TLP (http://bit.ly/byD7L8 ). All of the received feedback was positive. So, let's have a formal vote. I propose we move Pig to a top level Apache project. I propose that the initial PMC of this project be the list of all currently

Re: August Pig contributor workshop

2010-08-18 Thread Alan Gates
Confirming Olga and I will be there. Alan. On Aug 18, 2010, at 4:45 PM, Dmitriy Ryaboy wrote: Hi folks, Please do RSVP so that we know how many people are coming. Thanks, -Dmitriy On Tue, Aug 17, 2010 at 4:04 PM, Alan Gates wrote: All, We will be holding the next Pig contributor

Re: release notes in JIRA

2010-08-20 Thread Alan Gates
+1 Backloading documentation is error prone and leads to not getting documentation done. Alan. On Aug 20, 2010, at 4:11 PM, Olga Natkovich wrote: Guys, After spending the last couple of days collecting information for Pig 0.8.0 documentation, I would like to propose a change for our p

Re: [VOTE] Pig to become a top level Apache project

2010-08-23 Thread Alan Gates
With 9 +1 votes and no -1s the vote passes. I will begin a vote on Hadoop general. Alan. On Aug 18, 2010, at 10:34 AM, Alan Gates wrote: Earlier this week I began a discussion on Pig becoming a TLP (http://bit.ly/byD7L8 ). All of the received feedback was positive. So, let's h

Re: Caster interface and byte conversion

2010-08-24 Thread Alan Gates
This seems fine. Is the Pig engine at any point testing to see if the interface is implemented and if so calling toBytes, or is this totally for use inside the store functions themselves to serialize Pig data types? Alan. On Aug 22, 2010, at 1:40 AM, Dmitriy Ryaboy wrote: The current HBa

Re: is Hudson awol?

2010-08-24 Thread Alan Gates
Yes, our friend Hudson is ill again. Giri, Hudson's doctor, should get a chance to look at it in a few days. Alan. On Aug 23, 2010, at 3:31 PM, Dmitriy Ryaboy wrote: Haven't heard anything from Hudson in a while... -D

Re: Caster interface and byte conversion

2010-08-24 Thread Alan Gates
, Alan Gates wrote: This seems fine. Is the Pig engine at any point testing to see if the interface is implemented and if so calling toBytes, or is this totally for use inside the store functions themselves to serialize Pig data types? Alan. On Aug 22, 2010, at 1:40 AM, Dmitriy Ryaboy wrote: The

Re: Caster interface and byte conversion

2010-08-24 Thread Alan Gates
em ok? Yeah, makes sense. Alan. -D On Tue, Aug 24, 2010 at 10:01 AM, Alan Gates wrote: One other comment. By making this part of an interface that extends LoadCaster you are assuming the implementing class is both a load and store function. It makes more sense to have a sep

Fwd: hudson patch test jobs : hadoop pig and zookeeper

2010-08-24 Thread Alan Gates
Begin forwarded message: From: "Giridharan Kesavan" Date: August 24, 2010 4:38:46 PM PDT To: "gene...@hadoop.apache.org" Subject: hudson patch test jobs : hadoop pig and zookeeper Reply-To: "gene...@hadoop.apache.org" Hi, We have a new hudson master hudson.apache.org and hudson.zones.a

Re: Pig Contributor meeting notes

2010-08-26 Thread Alan Gates
On Aug 26, 2010, at 12:55 AM, Jeff Zhang wrote: Wonderful, Dmitriy, It's pity for me missing the contributor meeting. And any ppt shared ? Jeff, We don't want to exclude our contributors who don't happen to live in the San Francisco Bay Area. If we could include you via Skype or some ot

Re: Does Pig Re-Use FileInputLoadFuncs Objects?

2010-09-07 Thread Alan Gates
I'm not 100% sure I understand the question. Are you asking if it re-uses instances of a given load or store function? It should not. Alan. On Aug 31, 2010, at 7:28 PM, Russell Jurney wrote: Pardon the cross-post: Does Pig ever re-use FileInputLoadFunc objects? We suspect state is being

Re: help : error run pig

2010-09-27 Thread Alan Gates
Pig is failing to connect to your namenode. Is the address Pig is trying to use (hdfs://master:54310/) correct? Can you connect using that string from the same machine using bin/hadoop? Alan. On Sep 27, 2010, at 8:45 AM, Ngô Văn Vĩ wrote: I run Pig at Hadoop Mode (Pig-0.7.0 and hadoop-0.

Re: LoadFunc.skipNext() function for faster sampling ?

2009-11-03 Thread Alan Gates
We definitely want to avoid parsing every tuple when sampling. But do we need to implement a special function for it? Pig will have access to the InputFormat instance, correct? Can it not call InputFormat.getNext the desired number of times (which will not parse the tuple) and then call

Re: [jira] Commented: (PIG-970) Support of HBase 0.20.0

2009-11-05 Thread Alan Gates
Switching to pig-dev since the JIRA need not record discussions on release planning. I don't know if there will be a 0.5.1 or not. We don't currently have a proposed release date for 0.6.0. PIG-1048 is a fairly serious bug in the skew join stuff. We may want to consider a 0.5.1 release

Re: [VOTE] Branch for Pig 0.6.0 release

2009-11-09 Thread Alan Gates
+1. In addition to the new features we've added, our change to use Hadoop's LineRecordReader brought Pig to parity with Hadoop in the PigMix tests, about a 30% average performance improvement. This should be huge for our users. Alan. On Nov 9, 2009, at 12:26 PM, Olga Natkovich wrote: H

Re: package org.apache.hadoop.zebra.parse missing

2009-11-11 Thread Alan Gates
The parser package is generated as part of the build. Invoking ant in the contrib/zebra directory should result in the parser package being created at ./src-gen/org/apache/hadoop/zebra/parser Alan. On Nov 11, 2009, at 12:54 AM, Min Zhou wrote: Hi guys, I checked out pig from trunk,

Re: FYI - forking TFile off Hadoop into Zebra

2009-11-13 Thread Alan Gates
On Nov 11, 2009, at 4:13 PM, Ashutosh Chauhan wrote: On Wed, Nov 11, 2009 at 18:26, Chao Wang wrote: Last, we would like to point out that this is a short term solution for Zebra and we plan to: 1) port all changes to Zebra TFile back into Hadoop TFile. 2) in the long run have a single un

Re: optimizer hints in Pig

2009-11-16 Thread Alan Gates
In general I think optimizer hints fit well with Pig's approach to data processing, as expressed in our philosophic statement that Pigs are domestic animals (see http://hadoop.apache.org/pig/philosophy.html ). At least in the examples you give, I don't see 'with' as binding. The user is

Welcome Jeff Zhang

2009-11-19 Thread Alan Gates
All, I would like to welcome Jeff Zhang as our newest Pig committer. Jeff has been contributing to Pig for about nine months now. He's been active on the mailing lists, in contributing patches, and in helping other users with their patches. Congratulations Jeff, and thanks for your con

Yahoo is hiring for Hadoop development

2009-11-20 Thread Alan Gates
All, Yahoo has a number of Hadoop development positions open. There are engineering, architect, management, and QA positions all open. See http://developer.yahoo.net/blogs/hadoop/2009/11/updated_do_you_have_what_it_ta.html for details. Alan.

Re: TPC-H benchmark

2009-11-23 Thread Alan Gates
I don't know of any. Officially Pig cannot publish a TPC-H number because it is not a transaction based store. But I still think it would be very interesting to see the results if someone took the time to translate the queries. Alan. On Nov 22, 2009, at 6:20 PM, RichardGUO Fei wrote:

Re: Why we name it zebra ?

2009-11-30 Thread Alan Gates
On Nov 26, 2009, at 7:39 AM, Jeff Zhang wrote: Hi all, I'd like to know where's the name zebra come from ? does it convey the meaning of this meta data system that the columnar storage format is like the lines on the zebra's skin. Pretty much, yes. We've fallen into the habit of giving a

Re: Pig reading hive columnar rc tables

2009-11-30 Thread Alan Gates
On Nov 30, 2009, at 12:18 PM, Dmitriy Ryaboy wrote: That's awesome, I've been itching to do that but never got around to it.. Garrit, do you have any benchmarks on read speeds? I don't know about putting this in piggybank, as it carries with it pretty significant dependencies, increasing

Re: SQL in Pig?

2010-01-19 Thread Alan Gates
We are still actively working on adding SQL to Pig. We hope to have an updated patch posted to that JIRA in February or March. Alan. On Jan 18, 2010, at 4:15 PM, Michael Dalton wrote: Hi, What's the current status of SQL support in Pig? I looked at the JIRA ( http://issues.apache.org/jir

Backward compatibility

2010-01-25 Thread Alan Gates
Over the last year the number of Pig users has grown, both in terms of absolute number and the number of different companies using it. However, it is going to be a little while yet before Pig reaches a maturity level that it can declare a 1.0 release and promise it won't break backward comp

Re: reading/writing HBase in Pig

2010-01-25 Thread Alan Gates
On Jan 18, 2010, at 10:14 PM, Michael Dalton wrote: I took a look at the load-store branch and that definitely seems like the right place to do this. So the right thing to do would be to just open up a JIRA and then post a patch against the load-store rewrite tree, correct? Yes. You sho

Begin a discussion about Pig as a top level project

2010-03-19 Thread Alan Gates
You have probably heard by now that there is a discussion going on in the Hadoop PMC as to whether a number of the subprojects (Hbase, Avro, Zookeeper, Hive, and Pig) should move out from under the Hadoop umbrella and become top level Apache projects (TLP). This discussion has picked up re

JIRA Fix Version

2010-03-24 Thread Alan Gates
A reminder to Pig committers: When closing a JIRA issue as Resolved/Fixed please make sure to set the Fix Version field. This helps our users know what versions they need to use to get fixes for their issues. And it helps release managers when they build releases to know what is and isn'

Re: Begin a discussion about Pig as a top level project

2010-03-31 Thread Alan Gates
elative to Hadoop. So, I'm -1 on Pig moving out. But this is a soft -1. I'm open to being persuaded that I'm wrong or my concerns can be addressed while still having Pig as a TLP. Alan. On Mar 19, 2010, at 10:59 AM, Alan Gates wrote: You have probably heard by now that

Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Alan Gates
is, I think becoming a TLP will only introduce unnecessary administrative and bureaucratic headaches. So my vote is also -1. -Dmitriy On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates wrote: So far I haven't seen any feedback on this. Apache has asked the Hadoop PMC to submit input in April o

Re: Begin a discussion about Pig as a top level project

2010-04-05 Thread Alan Gates
make sense as a TLP. As is, I think becoming a TLP will only introduce unnecessary administrative and bureaucratic headaches. So my vote is also -1. -Dmitriy On Wed, Mar 31, 2010 at 2:38 PM, Alan Gates wrote: So far I haven't seen any feedback on this. Apache has asked the Hadoop PMC

Re: TypeCheckingVisitor and casting to less precise numeric types

2010-04-15 Thread Alan Gates
You are correct that all of these casts can be done. We omitted them explicitly because, as you said, we did not want to lose precision. We should be able to downcast when users ask explicitly for it, but we don't want to do this implicitly. Alan. On Mar 24, 2010, at 2:47 PM, An

Re: Shouldn't hadoop18.jar be removed from lib of trunk?

2010-04-22 Thread Alan Gates
It should be removed. I filed https://issues.apache.org/jira/browse/PIG-1388 so we'll remember to remove it in 0.8. Alan. On Apr 21, 2010, at 10:24 PM, chaitanya krishna wrote: Hi, Since pig-trunk now supports hadoop-0.20 and as it already has hadoop20.jar, shouldn't the hadoop18.jar be re

Re: Consider cleaning up backend code

2010-04-22 Thread Alan Gates
A couple of years ago we had this concept that Pig as is should be able to run on other backends (like say Dryad if it were open source). So we built this whole backend interface and (mostly) kept Hadoop specific objects out of the front end. Recently we have modified that stand and said t

Re: When is the pig-0.7.0 and pig-0.8.0 scheduled to be released?

2010-04-23 Thread Alan Gates
We've already branched for 0.7, which means we're not putting any new features in there, just critical bug fixes. We're extensively testing it now and hope to release it soon. We don't have a date for 0.8 yet. Alan. On Apr 23, 2010, at 2:08 AM, chaitanya krishna wrote: Hi, Can someone

Re: [VOTE] Release Pig 0.7.0 (candidate 0)

2010-05-07 Thread Alan Gates
+1. Ran the tutorial and some simple smoke tests on my mac and on linux. Checked that the signature keys are good. Alan. On May 5, 2010, at 11:44 AM, Daniel Dai wrote: Hi, I have created a candidate build for Pig 0.7.0. A description of what is new and different is included in the relea

[Travel Assistance] - Applications Open for ApacheCon NA 2010

2010-05-17 Thread Alan Gates
The Travel Assistance Committee is now taking in applications for those wanting to attend ApacheCon North America (NA) 2010, which is taking place between the 1st and 5th November in Atlanta. The Travel Assistance Committee is looking for people who would like to be able to attend ApacheCon,

Re: Code Repository

2010-05-21 Thread Alan Gates
http://wiki.apache.org/pig/HowToContribute Alan. On May 20, 2010, at 9:15 PM, Renato Marroquín Mogrovejo wrote: Hi, is there a PIG coding standard? or any type of documentation I could follow? Thanks. Renato M.

Re: About PigPen

2010-05-24 Thread Alan Gates
The one on the JIRA is more up to date. However, be aware that PigPen has not been updated since Pig 0.2 and does not work with new versions of Pig. Alan. On May 23, 2010, at 11:25 PM, Renato Marroquín Mogrovejo wrote: Hi, does anybody know which the PigPen release is? I found two links.

Re: does EvalFunc generate the entire bag always ?

2010-05-27 Thread Alan Gates
The default case is that UDFs that take bags (such as COUNT) are handed the entire bag at once. In the case where all UDFs in a foreach implement the algebraic interface and the expression itself is algebraic, then the combiner will be used, thus significantly limiting the size of t
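The difference between handing a UDF the whole bag and using the combiner can be sketched as follows. This is a minimal Python model, not Pig code: the function names initial, intermediate, and final are hypothetical stand-ins for the stages of an algebraic function, and the chunking only simulates how map tasks and the combiner would each see part of the bag.

```python
from typing import List, Tuple

Bag = List[Tuple]


def count_whole_bag(bag: Bag) -> int:
    """Default case: the function receives the entire bag at once."""
    return len(bag)


def initial(chunk: Bag) -> int:
    """Map-side stage: count only the tuples in this chunk."""
    return len(chunk)


def intermediate(partials: List[int]) -> int:
    """Combiner stage: merge partial counts into a smaller partial."""
    return sum(partials)


def final(partials: List[int]) -> int:
    """Reduce-side stage: merge the remaining partials into the answer."""
    return sum(partials)


def count_with_combiner(bag: Bag, chunk_size: int) -> int:
    """Simulate an algebraic COUNT: no stage ever holds the full bag."""
    chunks = [bag[i:i + chunk_size] for i in range(0, len(bag), chunk_size)]
    map_side = [initial(chunk) for chunk in chunks]
    # The combiner may run on any grouping of partials; pairs are used here.
    combined = [intermediate(map_side[i:i + 2])
                for i in range(0, len(map_side), 2)]
    return final(combined)
```

Both paths give the same count, but in the second only small integers travel between stages, which is why the algebraic path limits how much data reaches the reducer.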

Hudson returning -1 on javadoc

2010-05-27 Thread Alan Gates
Since its return from the hospital, Hudson has been returning -1 on all submitted patches, complaining about a broken javadoc tag. It turns out the bad tag snuck into the code whilst Hudson was away. I've checked in a fix, so Hudson should be happy again. Any patches that were flunked jus

Re: does EvalFunc generate the entire bag always ?

2010-06-02 Thread Alan Gates
orks in conjunction with UDF's... A practical application escapes me right now, But if I do C = foreach B{ C1 = MyUdf(B.bag_on_b); C2 = limit C1 5; } does it know to push limit in this case? On Thu, May 27, 2010 at 2:32 PM, Alan Gates wrote: The default case is that a UDFs that take bag

Re: algebraic optimization not invoked for filter following group?

2010-06-15 Thread Alan Gates
For at least simple cases what's in the pseudocode should work. I hope someday soon we can start using the new logical optimizer work (in the experimental package) to build rules for the MR optimizer (like this combiner stuff) as well, which should be much easier to code. But it will be

Re: SIZE() of relation

2010-06-15 Thread Alan Gates
There have been several requests for this. I'm not a fan of it, because it makes it too easy to forget that you're forcing a single reducer MR job to accomplish this. But I'm open to persuasion if everyone else disagrees. Alan. On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote: This wou

Re: the last job in the mapreduce plan

2010-06-15 Thread Alan Gates
I've never seen a case where this happens. Is this a theoretical question or are you seeing this issue? Alan. On Jun 15, 2010, at 8:49 AM, Gang Luo wrote: Hi, Is it possible the last MapReduce job in the MR plan only loads something and stores it without any other processing in between? F

Re: skew join in pig

2010-06-16 Thread Alan Gates
On Jun 16, 2010, at 8:36 AM, Gang Luo wrote: Hi, there is something confusing me in the skew join (http://wiki.apache.org/pig/PigSkewedJoinSpec ) 1. does the sampling job sample and build histogram on both tables, or just one table (in this case, which one) ? Just the left one. 2. the join

Re: skew join in pig

2010-06-18 Thread Alan Gates
much clear now. One more thing to ask about the third question is, how to allocate reducers to several hot keys? Hashing? Further, Pig doesn't divide the reducers into hot-key reducers and non-hot-key reducers, is it right? Thanks, -Gang - Original Message From: Alan Gates To: pi

Re: Bug in new logical optimizer framework?

2010-06-28 Thread Alan Gates
On Jun 28, 2010, at 12:36 AM, Swati Jain wrote: Thanks for the prompt reply. As you mentioned optimization is in its developing stage, does it mean optimization framework is not complete or only rules are in developing stage? In addition to that, I would really appreciate if you could give

Re: Avoiding serialization/de-serialization in pig

2010-06-29 Thread Alan Gates
On Jun 28, 2010, at 5:51 PM, Dmitriy Ryaboy wrote: For what it's worth, I saw very significant speed improvements (order of magnitude for wide tables with few projected columns) when I implemented (2) for our protocol buffer - based loaders. I have a feeling that propagating schemas when k

Re: Add "deepCopy" in LogicalExpression

2010-07-13 Thread Alan Gates
How does deepCopy differ from clone? Alan. On Jul 12, 2010, at 11:19 PM, Swati Jain wrote: Hi, I am working on ticket PIG-1494 ( https://issues.apache.org/jira/browse/PIG-1494 ). While implementing this functionality (conversion of logical expression into CNF), I need to construct the Ope

Notes from Pig contributor workshop

2010-07-13 Thread Alan Gates
e new optimizer framework in the MR optimizer. Alan Gates indicated that while he does not believe we should translate the entire set of MR optimizer visitors into the new framework until we've further tested the framework, this might be a good first test for the new optimizer in the MR

Announcing Howl development list

2010-07-20 Thread Alan Gates
On Jul 14, 2010, at 2:11 AM, Jeff Hammerbacher wrote: Hey, Thanks for writing up these notes, they're very useful. Pradeep Kamath gave a short presentation on Howl, the work he is leading to create a shared metadata system between Pig, Hive, and Map Reduce. Dmitriy noted that we need to

Semantics of empty bags in Foreach

2008-11-10 Thread Alan Gates
The JIRA https://issues.apache.org/jira/browse/PIG-514 has brought up an interesting issue of how we handle empty bags in foreach statements. The current pig semantic for foreach is that it always produces a cross product of all of the fields in its projection list. So: B = foreach A ge
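Why a cross-product semantic makes empty bags interesting can be shown with a small model. This is a Python sketch, not Pig code: the helper name foreach_cross is hypothetical, and it assumes each projected field contributes a list of values (a scalar contributes one value, a flattened bag contributes one value per tuple).

```python
from itertools import product
from typing import Any, List, Tuple


def foreach_cross(fields: List[List[Any]]) -> List[Tuple]:
    """Model one input record: emit the cross product of its projected fields.

    A scalar field is passed as a one-element list; a flattened bag is
    passed as the list of its tuples.
    """
    return [tuple(row) for row in product(*fields)]


# A scalar crossed with a two-tuple bag yields two output rows.
two_rows = foreach_cross([["alice"], [1, 2]])

# A scalar crossed with an EMPTY bag yields no rows at all, because a
# cross product with an empty set is empty.
no_rows = foreach_cross([["alice"], []])
```

Under this model a record whose bag is empty disappears from the output entirely, which is the kind of behavior the thread is questioning.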

Re: Implmenting chown/chmod/chgrp in grunt

2008-11-13 Thread Alan Gates
Take a look at org.apache.pig.tools.grunt.GruntParser.java. All of the file system commands are implemented in there. That's probably also where your chmod et al. should go. Alan. On Nov 9, 2008, at 5:22 PM, Ian Holsman wrote: hi. I was about to add these commands into the grunt command s

Re: Implmenting chown/chmod/chgrp in grunt

2008-11-18 Thread Alan Gates
new FsPermission((short)newperms)); } catch (IOException e) { System.err.println(getName() + ": changing permissions of '" + file.getPath() + "':" + e.getMessage()); } } } which I presume is the only way to do it. Al

Re: [VOTE] Release Pig 0.1.1 (candidate 0)

2008-12-02 Thread Alan Gates
+1 Alan. On Nov 25, 2008, at 3:58 PM, Olga Natkovich wrote: Hi, I have created a candidate build for Pig 0.1.1. This release is almost identical to Pig 0.1.0 with a couple of exceptions: (1) It is integrated with hadoop 18 (2) It has one small bug fix (PIG-253) (3) Several UDF were added

What is a relation?

2008-12-05 Thread Alan Gates
All, A question on types in pig. When you say: A = load 'myfile'; what exactly is A? For the moment let us call A a relation, since it is a set of records, and we can pass it to a relational operator, such as FILTER, ORDER, etc. To clarify the question, is a relation equivalent to a bag

Re: Pig Team now has two new committers!

2008-12-09 Thread Alan Gates
Congrats to both of you, an honor well earned. Alan. On Dec 9, 2008, at 8:51 AM, Olga Natkovich wrote: Hi, I am happy to announce that Hadoop PMC voted to make Pradeep Kamath and Santhosh Srinivasan Pig Committer to acknowledge their significant contribution to the project! Congratulation

Re: What is a relation?

2008-12-11 Thread Alan Gates
s to be the same so that we can handle processing in the outer script as well as inside of nested foreach the same and make it easier to extend the set of operators allowed inside of foreach block. Olga -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Friday, Dec

Re: Pig performance

2008-12-20 Thread Alan Gates
I left a comment on the blog addressing some of the issues he brought up. Alan. On Dec 20, 2008, at 1:00 AM, Jeff Hammerbacher wrote: Hey Pig team, Did anyone check out the recent claims about Pig's poor performance versus Cascading? Though I haven't worked extensively with either system,

Re: Pig performance

2008-12-31 Thread Alan Gates
in his blog comment were that trunk pig is paradoxically not the most current and that storing intermediate results can decrease the scope of optimizations. On Dec 20, 2008, at 10:16, Alan Gates wrote: I left a comment on the blog addressing some of the issues he brought up. Alan. On Dec 20, 2008,

Re: Adaptive Query Optimization

2009-01-20 Thread Alan Gates
There is no concept of costing in pig at this point. Currently we let the script writer decide when to choose an FR Join over a symmetric hash join. We certainly welcome any work on an optimizer in pig. Be sure and take a look at https://issues.apache.org/jira/browse/PIG-360 where some

Re: [jira] Commented: (PIG-620) find Max Tuple by 1st field UDF (for piggybank)

2009-01-20 Thread Alan Gates
I can do them. I wanted to check about the println before I removed it. Alan. On Jan 20, 2009, at 5:41 PM, Vadim Zaliva (JIRA) wrote: [ https://issues.apache.org/jira/browse/PIG-620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12665677 #action_12

Re: Pig compile error

2009-02-13 Thread Alan Gates
I don't think you are compiling on 1.6. Those are all errors you see when trying to compile when using 1.5. The IOException(String, Throwable) is in 1.6 and not 1.5, and @Override semantics changed slightly. You may have the right version of java but the wrong version of javac. I see yo

Re: switching to different parser in Pig

2009-02-17 Thread Alan Gates
Ted, If I understand your comments correctly you aren't chiming in on whether we should switch parsers, just that you would like there to be a published interface of what pig latin syntax trees look like so you could generate them in other tools and then feed them into pig. Is that correct

Re: Pig Performance Benchmarks

2009-02-17 Thread Alan Gates
That's correct. The 10m in the names weren't really meant to be hardcoded into the patch, as the idea is that the tables could be created at different sizes depending on your cluster size. Sorry for the incomplete state of things, obviously that patch needs some work before I commit it.

Re: Pig compile error

2009-02-17 Thread Alan Gates
mpiling. I am having this problem in recent releases of Hadoop and Pig. Earlier sort of same problem came while setting Hadoop 0.19. Is there any major change in source config file ? --nitesh On Fri, Feb 13, 2009 at 8:19 PM, Alan Gates wrote: I don't think you are compiling on

Re: switching to different parser in Pig

2009-02-23 Thread Alan Gates
We looked into antlr. It appears to be very similar to javacc, with the added feature that the java code it generates is humanly readable. That isn't why we want to switch off of javacc. Olga listed the 3 things we want out of a parser that javacc isn't giving us (lack of docs, no easy c

Re: switching to different parser in Pig

2009-02-24 Thread Alan Gates
lored ASTs, you need to pay the price of learning JJTree. Assuming that you need to build ASTs, with JavaCC you have the choice between JJTree and JTB. With SableCC, when I last looked at it, you only get the JTB-like option. -- On Mon, Feb 23, 2009 at 10:06 PM, Alan Gates wrote: We

Fwd: Core For Paper ---- Grid and Cloud Middleware Workshop, in conjunction with GCC2009

2009-03-03 Thread Alan Gates
Begin forwarded message: From: Yongqiang He Date: March 1, 2009 10:18:03 PM PST To: "core-u...@hadoop.apache.org", "core-...@hadoop.apache.org", "hbase-u...@hadoop.apache.org", "hive-u...@hadoop.apache.org", "hive-...@hadoop.apache.org" Subject: Core For Paper Grid and Cloud M

Re: Next pig release

2009-03-09 Thread Alan Gates
+1 on doing a release. +0 on calling it 1.0. Are we really that stable? Alan. On Mar 6, 2009, at 5:28 PM, Olga Natkovich wrote: Pig Developers and Committers, Now that types branch is merged into trunk and the dust settled, I propose that it is time for the next release. I propose that w

Re: scope string in OperatorKey

2009-03-11 Thread Alan Gates
The purpose of the scope string is to allow us to have multiple sessions of pig running and distinguish the operators. It's one of those things that was put in before an actual requirement, so whether it will prove useful or not remains to be seen. As for removing it from explain, is it st

Re: [VOTE] Release Pig 1.0.0 (candidate 0)

2009-03-20 Thread Alan Gates
README.txt still has the incubator text in it. This needs to be removed. I'll roll a new package and call a new vote. Alan. On Mar 17, 2009, at 3:21 PM, Olga Natkovich wrote: Pig Committers, I have created a candidate build for Pig 1.0.0. This release represents a major rewrite of Pig f

Re: [VOTE] Release Pig 1.0.0 (candidate 0)

2009-03-23 Thread Alan Gates
To address Santhosh's concerns that 1.0 is not an appropriate release number, I propose that we release the same code under the name 0.2.0. Alan. On Mar 20, 2009, at 11:54 AM, Santhosh Srinivasan wrote: -1 on the 1.0.0 release. IMHO, Pig is relatively stable but not quite there. I would pref

Re: [VOTE] Release Pig 0.2.0 (candidate 2)

2009-04-01 Thread Alan Gates
___ From: Santhosh Srinivasan [...@yahoo-inc.com] Sent: Tuesday, March 31, 2009 10:50 AM To: pig-dev@hadoop.apache.org Subject: RE: [VOTE] Release Pig 0.2.0 (candidate 2) +1 Santhosh -Original Message- From: Alan Gates [mailto:ga...@yahoo-inc.com] Sent: Friday, March 27, 2009 6:08 PM To: pig-dev

Re: Ajax library for Pig

2009-04-08 Thread Alan Gates
Sorry if these are silly questions, but I'm not very familiar with some of these technologies. So what you propose is that Pig would be installed on some dedicated server machine and a web server would be placed in front of it. Then client libraries would be developed that made calls to t

Pig release 0.2.0

2009-04-09 Thread Alan Gates
The Pig team is happy to announce Pig 0.2.0 has been released. This release includes the addition of types, better error detection and handling, and a 5x performance improvement over 0.1.1. The details of the release can be found at http://hadoop.apache.org/pig/releases.html . Pig is a Ha

Re: Ajax library for Pig

2009-04-14 Thread Alan Gates
rse- ajax calls ( i.e async call from server to browser an inbuilt feature in DWR). DWR is under Apache Licence V2. --nitesh On Wed, Apr 8, 2009 at 9:11 PM, Alan Gates wrote: Sorry if these are silly questions, but I'm not very familiar with some of these technologies. So what you

Re: [Pig Wiki] Update of "HowToContribute" by AlanGates

2009-04-16 Thread Alan Gates
At this point these are all proposed, none are yet realized. So there is no code for any of them. The place to track these proposals are in the referenced JIRAs. Alan. On Apr 15, 2009, at 6:44 PM, zhang jianfeng wrote: Hi Alan, Thank you for your guideline. So where's code of these Pr

Re: [Pig Wiki] Update of "ProposedProjects" by AlanGates

2009-04-16 Thread Alan Gates
Your understanding of the proposal is correct. The goal would be to produce Java code rather than a pipeline configuration. But the reasoning is not so that users can then take that and modify themselves. There's nothing preventing them from doing it, but it has a couple of major drawbac

Re: migrating udfs

2009-04-24 Thread Alan Gates
See http://hadoop.apache.org/pig/docs/r0.2.0/udf.html and http://wiki.apache.org/pig/ConvertingUDFs (the latter I just posted, so no you didn't miss it before). Alan. On Apr 23, 2009, at 10:49 PM, Earl Cahill wrote: Looks like no one is going to migrate my udfs for me, but getting some help

A proposal for changing pig's memory management

2009-05-14 Thread Alan Gates
http://wiki.apache.org/pig/PigMemory Alan.

Re: A proposal for changing pig's memory management

2009-05-19 Thread Alan Gates
what the NIO byte buffers are there to provide? Wouldn't a virtual tuple type that was nothing more than a byte buffer, type and an offset do almost all of what is proposed here? On Thu, May 14, 2009 at 5:33 PM, Alan Gates wrote: http://wiki.apache.org/pig/PigMemory Alan.

Re: A proposal for changing pig's memory management

2009-05-19 Thread Alan Gates
something similar (treating chararray the same way as bytearray) until String operations need to be done. Thanks, Thejas On 5/14/09 5:33 PM, "Alan Gates" wrote: http://wiki.apache.org/pig/PigMemory Alan.

Re: A proposal for changing pig's memory management

2009-05-20 Thread Alan Gates
On May 19, 2009, at 10:30 PM, Mridul Muralidharan wrote: I am still not very convinced about the value about this implementation - particularly considering the advances made since 1.3 in memory allocators and garbage collection. My fundamental concern is not with the slowness of garbage

Re: UDF with parameters?

2009-05-22 Thread Alan Gates
Yes, it is possible. The UDF should take the percentage you want as a constructor argument. It will have to be passed as a string and converted. Then in your Pig Latin, you will use the DEFINE statement to pass the argument to the constructor. REGISTER /src/myfunc.jar DEFINE percentile m
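The constructor-argument pattern described in this reply can be modeled as follows. This is a Python sketch for illustration only: an actual Pig UDF of this era would be a Java class, and the class name Percentile, its exec method, and the nearest-rank rule are all assumptions rather than anything stated in the thread. The one point taken from the reply is that the argument arrives as a string and must be converted.

```python
import math
from typing import List


class Percentile:
    """Hypothetical UDF model: configured once via a string constructor argument."""

    def __init__(self, fraction: str) -> None:
        # The argument is passed as a string, so convert and validate it here.
        self.fraction = float(fraction)
        if not 0.0 < self.fraction <= 1.0:
            raise ValueError("fraction must be in (0, 1]")

    def exec(self, values: List[float]) -> float:
        """Return the nearest-rank percentile of the given values."""
        if not values:
            raise ValueError("cannot take a percentile of an empty bag")
        ordered = sorted(values)
        rank = max(1, math.ceil(self.fraction * len(ordered)))
        return ordered[rank - 1]


# Roughly what a DEFINE with a constructor argument would set up:
median = Percentile("0.5")
```

Doing the conversion in the constructor means a malformed argument fails once, at definition time, instead of on every input record.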

Proposed design for new merge join in pig

2009-05-28 Thread Alan Gates
http://wiki.apache.org/pig/PigMergeJoin Alan.

Updated PigMix numbers for latest top of trunk

2009-05-28 Thread Alan Gates
http://wiki.apache.org/pig/PigMix Alan.

Re: PigPen Source

2009-06-15 Thread Alan Gates
It has not yet been integrated into contrib because it requires the eclipse libraries to build, and those weren't integrated. The ivy stuff used by pig's build should be configured to pick up the appropriate eclipse jars so that this can be added to contrib. Alan. On Jun 15, 2009, at 12:0

Re: Rewire and multi-query load/store optimization

2009-06-16 Thread Alan Gates
+1 on option one. The use of store->load was only to overcome a temporary problem in Pig. We've fixed the problem, so let's not propagate it. We will need to document this very clearly (maybe even to the point of issuing warnings in the parser when we see this combo) so users understand

Re: [VOTE] Release Pig 0.3.0 (candidate 0)

2009-06-22 Thread Alan Gates
Downloaded, ran, ran tutorial, built piggybank. All looks good. +1 Alan. On Jun 18, 2009, at 12:30 PM, Olga Natkovich wrote: Hi, I created a candidate build for Pig 0.3.0 release. The main feature of this release is support for multiquery which allows to share computation across multiple

Re: asking for comments on benchmark queries

2009-06-23 Thread Alan Gates
Zheng, I don't think you're subscribed to pig-dev (your emails have been bouncing to the moderator). So I've cc'd you explicitly on this. I don't think we need a Pig JIRA, it's probably easier if we all work on the hive one. I'll post my comments on the various scripts to that bug. I'v

Re: requirements for Pig 1.0?

2009-06-23 Thread Alan Gates
I don't believe there's a solid list of want to haves for 1.0. The big issue I see is that there are too many interfaces that are still shifting, such as: 1) Data input/output formats. The way we do slicing (that is, user provided InputFormats) and the equivalent outputs aren't yet solid.

Re: requirements for Pig 1.0?

2009-06-24 Thread Alan Gates
M, Russell Jurney wrote: For 1.0 - complete Owl? http://wiki.apache.org/pig/Metadata Russell Jurney rjur...@cloudstenography.com On Jun 23, 2009, at 4:40 PM, Alan Gates wrote: I don't believe there's a solid list of want to haves for 1.0. The big issue I see is that there are too

Re: requirements for Pig 1.0?

2009-06-24 Thread Alan Gates
To be clear, going to 1.0 is not about having a certain set of features. It is about stability and usability. When a project declares itself 1.0 it is making some guarantees regarding the stability of its interfaces (in Pig's case this is Pig Latin, UDFs, and command line usage). It is a

Re: Is it a bug ?

2009-07-23 Thread Alan Gates
It looks wrong to me, but I don't have a deep understanding of that code. Alan. On Jul 15, 2009, at 6:03 PM, zhang jianfeng wrote: Hi all, Today, when I read the source code, I find a piece of suspicious code: (PigServer.java Line 1047) graph.ignoreNumStores = processedStore

Food for thought on Pig design

2009-08-14 Thread Alan Gates
http://dreamsongs.com/WIB.html mainly section 2.1 on Worse is Better I stumbled across this article today and found the section on Worse is Better very interesting, especially since he is directly comparing the design philosophies of C vs Lisp. The article is almost 20 years old, so you ma

Re: Pig 0.4.0 release

2009-08-18 Thread Alan Gates
Non-committers certainly get a vote, it just isn't binding. I agree on PIG-925 as a blocker. I don't see PIG-859 as a blocker since there is a simple work around. If we want to release 0.4.0 within a week or so, dynamic shims won't be an option because we won't be able to solve the bundled
