Thanks for doing the work in the first place :-)
Sorry about the lack of attribution. No @author tags...
-D
On Mon, Jun 8, 2009 at 11:13 PM, Earl Cahill cahi...@yahoo.com wrote:
I am planning on coming to the hadoop stuff out near san fran, wednesday and
thursday, thought I would get the
I know there was some discussion of making the types release (0.2) a Pig 1
release, but that got nixed. There wasn't a similar discussion on 0.3.
Has the list of want-to-haves for Pig 1.0 been discussed since?
we're ready to consider 1.0. It would be nice to be 1.0 not
too long after Hadoop is, which still gives us at least 6-9 months.
Alan.
On Jun 22, 2009, at 10:58 AM, Dmitriy Ryaboy wrote:
I know there was some discussion of making the types release (0.2) a
Pig 1
release, but that got
+1 for standard semantics.
We need a COALESCE function to go along with this.
-D
On Mon, Jul 6, 2009 at 10:46 AM, Olga Natkovich ol...@yahoo-inc.com wrote:
Hi,
The current implementation of COUNT and AVG in Pig counts null values.
This is inconsistent with SQL semantics and also with
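For illustration, here is a minimal Python sketch of the SQL-style semantics under discussion: COUNT(column) and AVG skip nulls, and COALESCE returns its first non-null argument. The names sql_count, sql_avg, and coalesce are hypothetical helpers, not Pig built-ins, and None stands in for a Pig null.

```python
from typing import Iterable, Optional


def sql_count(values: Iterable[Optional[float]]) -> int:
    """Count only the non-null values, as SQL's COUNT(column) does."""
    return sum(1 for v in values if v is not None)


def sql_avg(values: Iterable[Optional[float]]) -> Optional[float]:
    """Average the non-null values; return None when there are none."""
    non_null = [v for v in values if v is not None]
    if not non_null:
        return None
    return sum(non_null) / len(non_null)


def coalesce(*args):
    """Return the first argument that is not None, or None if all are."""
    for arg in args:
        if arg is not None:
            return arg
    return None
```

A caller who wants the old behaviour for AVG, with nulls treated as zero, could write sql_avg(coalesce(v, 0) for v in values).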
Jeff,
Chris Olston answered this a while back:
http://markmail.org/thread/xnwutstlftnyycxs
(by the way, MarkMail is awesome for searching mailing list archives. Highly
recommended.)
There are some changes that have to do with sampling and multi-store, but
that email will give you the general
Olga,
Do non-committers get a vote?
Zebra is in trunk, but relies on 0.20, which is somewhat inconsistent
even if it's in contrib/
Would love to see dynamic (or at least static) shims incorporated into
the 0.4 release (see PIG-660, PIG-924)
There are a couple of bugs still outstanding that I
in a version of hadoop20.jar that will work for users who want to
build with 0.20. This way users can still build this if they want and our
release isn't blocked on the patch.
Alan.
On Aug 17, 2009, at 12:03 PM, Dmitriy Ryaboy wrote:
Olga,
Do non-committers get a vote?
Zebra is in trunk
Hi everyone,
Attached is a (very) preliminary document outlining a rough design we
are proposing for a cost-based optimizer for Pig.
This is being done as a capstone project by three CMU Master's
students (myself, Ashutosh Chauhan, and Tejal Desai). As such, it is
not necessarily meant for
and just send the URL ?
Thanks,
Santhosh
-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Tuesday, September 01, 2009 9:48 AM
To: pig-dev@hadoop.apache.org
Subject: Request for feedback: cost-based optimizer
Hi everyone,
Attached is a (very) preliminary
in the
design, to be honest). But we feel that the implementations have to
be execution mode specific.
-Dmitriy
On Tue, Sep 1, 2009 at 6:26 PM, Jianyong Dai jiany...@yahoo-inc.com wrote:
I am still reading, but one interesting question is why you decided to put the CBO
in the physical layer?
Dmitriy Ryaboy wrote
. But now that Pig is a subproject of Hadoop and almost all Pig users are
using Hadoop, I think it is fine to optimize things toward Hadoop.
Dmitriy Ryaboy wrote:
Our initial survey of related literature showed that the usual place
for a CBO tends to be between the physical and logical layer (in fact
so what we implement
works for what you need.
Alan.
On Sep 1, 2009, at 9:54 AM, Dmitriy Ryaboy wrote:
Whoops :-)
Here's the Google doc:
http://docs.google.com/Doc?docid=0Adqb7pZsloe6ZGM4Z3o1OG1fMjFrZjViZ21jdAhl=en
-Dmitriy
On Tue, Sep 1, 2009 at 12:51 PM, Santhosh Srinivasan s...@yahoo
http://iablog.sybase.com/paulley/2009/08/is-sql-a-failed-abstraction/
Gosh that looks familiar.
-D
Olga, which test failed? If it's one of the ones I contributed, I'll fix it.
-D
On Mon, Sep 21, 2009 at 8:54 PM, Olga Natkovich ol...@yahoo-inc.com wrote:
Hi,
The new version is available in
http://people.apache.org/~olga/pig-0.4.0-candidate-2/.
I see one failure in a unit test in
Where can one find the Pig logo in a size/resolution suitable for presentations?
Also, I went on the website and noticed that the Y! reappeared on Pig's chest.
-D
.
Also, we're working on cleaning up the Pig with Y! logo issue.
Alan.
On Sep 27, 2009, at 9:59 AM, Dmitriy Ryaboy wrote:
Where can one find the Pig logo in a size/resolution suitable for
presentations?
Also, I went on the website and noticed that the Y! reappeared on Pig's
chest.
-D
We ran into what looks like some edge case bug in Pig, which causes it
to throw an IndexOutOfBoundsException (stack trace below). The script
just joins two relations; it looks like our data was generated
incorrectly, and the join is empty, which may be what's causing the
failure. It also appears
Jeff,
Slicers don't work in local mode; there is an ancient ticket for that in
Jira.
Richard -- hard to say what's going on without more code. Think you can come up
with a simplified version of your loadfunc that fails in a similar manner, and
share it?
-----Original Message-----
From:
Do you get any of your Log messages to come out, or none at all?
-D
2009/10/26 RichardGUO Fei gladiato...@hotmail.com:
Hi,
This is the rough source code of the slicer/loadfunc:
public class HadoopStoreStorage extends Utf8StorageConverter
implements LoadFunc, Slicer {
private
problem!
Date: Tue, 27 Oct 2009 23:40:43 -0800
I mean hadoop's local mode not pig's own local mode
-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: October 26, 2009 6:33
To: pig-dev@hadoop.apache.org; pig-dev@hadoop.apache.org
Subject: RE: Custom Loadfunc problem
Could someone explain the nature of the two-level access problem
referred to in the Load/Store redesign wiki and in the DataType code?
Thanks,
-D
= false;
-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Monday, November 02, 2009 5:33 PM
To: pig-dev@hadoop.apache.org
Subject: two-level access problem?
Could someone explain the nature of the two-level access problem
referred to in the Load/Store redesign
Hi all,
I am looking at the RequiredFields class and it has this explanation
of what getFields() returns:
/**
* List of fields required from the input. This includes fields that are
* transformed, and thus are no longer the same fields. Using the example 'B
* = foreach A
Richard,
The Load/Store redesign proposal has an interface that defines how
stats get represented; a loader that implements ResourceLoader will
pass statistics up into Pig, which will then take care of doing
whatever it needs to do with them. The specifics of how the stats get
loaded in by the
Congrats Jeff!
On Thu, Nov 19, 2009 at 7:47 PM, Jeff Zhang zjf...@gmail.com wrote:
I am very glad to join the pig family. I have grown and learned a lot with
others' help in the last nine months. I will continue to contribute to Pig and
learn from others.
Jeff Zhang
On Thu, Nov 19, 2009 at
Rash
s...@ning.com
On Nov 19, 2009, at 4:48 PM, Dmitriy Ryaboy wrote:
Zaki,
Glad to hear it wasn't Pig's fault!
Can you post a description of what was going on with S3, or at least
how you fixed it?
-D
On Thu, Nov 19, 2009 at 2:57 PM, zaki rahaman zaki.raha...@gmail.com
wrote:
Okay
That's awesome, I've been itching to do that but never got around to it.
Garrit, do you have any benchmarks on read speeds?
I don't know about putting this in piggybank, as it carries with it pretty
significant dependencies, increasing the size of the jar and making it
difficult for users to
Sorry I misspelled your name, Gerrit.
-D
On Mon, Nov 30, 2009 at 3:18 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
That's awesome, I've been itching to do that but never got around to it.
Garrit, do you have any benchmarks on read speeds?
I don't know about putting this in piggybank
, at 12:18 PM, Dmitriy Ryaboy wrote:
That's awesome, I've been itching to do that but never got around to
it.
Garrit, do you have any benchmarks on read speeds?
I don't know about putting this in piggybank, as it carries with it
pretty
significant dependencies, increasing the size
Olga,
Are there any changes in 0.6 that are not backwards-compatible, or is
all that only in trunk?
-Dmitriy
On Thu, Jan 7, 2010 at 10:33 AM, Olga Natkovich ol...@yahoo-inc.com wrote:
Pig Developers,
Since we have branched for the release, we have fixed a lot of bugs and
stabilized the
in UDFs. The only modification
that changes things a bit is moving local mode from native to Hadoop's.
Olga
-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Thursday, January 07, 2010 10:44 AM
To: pig-dev@hadoop.apache.org
Subject: Re: time to release Pig 0.6.0
.
Olga
-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Thursday, January 07, 2010 2:20 PM
To: pig-dev@hadoop.apache.org
Subject: Re: time to release Pig 0.6.0
Having just been hit by this -- any chance we can put
http://issues.apache.org/jira/browse/PIG
Both are caused by you running in local mode by default.
On Mon, Jan 11, 2010 at 5:36 PM, felix gao gre1...@gmail.com wrote:
Follow up with the previous email. I have noticed the following
I have a Pig script called Overlap that reads in a bunch of *.bz2 files
if I run the following command
Hi Mike,
It would be great to have a StoreFunc for HBase!
There is a rewrite underway for the Load/Store stuff that will make
that a lot easier -- see https://issues.apache.org/jira/browse/PIG-966
. You may want to consider writing it for the load-store redesign
branch. This is what's probably
for
inheritance arises rather than begin as protected?
-----Original Message-----
From: Dmitriy Ryaboy [mailto:dvrya...@gmail.com]
Sent: Tuesday, February 02, 2010 7:35 PM
To: pig-dev@hadoop.apache.org
Subject: Private variables are not eco-friendly
Hi all,
I keep running into problems trying
Jian,
If what you are looking for is something that will let you deal with
skewed data and forget about how the underlying distributed system
works, both Pig and Hive will help you do that to some extent. If you
are looking for something that will let you exercise fine-grained
control over
that will be fed into a total of 200 reducers.
-D
On Mon, Feb 8, 2010 at 7:16 AM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
Jian,
If what you are looking for is something that will let you deal with
skewed data and forget about how the underlying distributed system
works, both Pig and Hive will help
hc,
Good stuff. I was thinking along very similar lines with regard to
mapping a function over a bag. I suspect a MAP can actually be written as a
UDF. We'd just have to pass the name of the function to be mapped and call
InstantiateFuncFromSpec on it.
We may want a different name for
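As a rough analogy for the idea above, the following Python sketch resolves a function from its name and maps it over a bag. It is not Pig code: instantiate_func_from_spec here is a hypothetical stand-in that resolves a dotted Python name, and map_bag and the list-as-bag representation are assumptions made for illustration.

```python
import importlib
from typing import Any, Callable, List


def instantiate_func_from_spec(spec: str) -> Callable[[Any], Any]:
    """Resolve a dotted name such as 'math.sqrt' to a callable.

    Hypothetical stand-in for looking a function up by name; it only
    handles 'module.function' style specs.
    """
    module_name, _, func_name = spec.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, func_name)


def map_bag(func_spec: str, bag: List[Any]) -> List[Any]:
    """Apply the function named by func_spec to every item in the bag."""
    func = instantiate_func_from_spec(func_spec)
    return [func(item) for item in bag]
```

For example, map_bag("math.sqrt", [4.0, 9.0]) applies math.sqrt to each element of the list.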
Hi guys,
Trunk has been broken for a while. A bunch of tests in the test-commit
target fail, mostly due to "The import
org.apache.pig.experimental.logical.optimizer.PlanPrinter cannot be
resolved". Could someone check in the missing file?
-D
Over time, Pig is increasing its coupling to Hadoop (for good reasons),
rather than decreasing it. If and when Pig becomes a viable entity without
hadoop around, it might make sense as a TLP. As is, I think becoming a TLP
will only introduce unnecessary administrative and bureaucratic headaches.
CDH2 or CDH3?
CDH2 is basically 0.{4,5}. CDH3 is in between 0.5 and 0.6.
I expect the first result -- a flattened bag of tuples results in multiple
rows, each containing the (not-flattened) tuple.
Btw, Pig 0.6 is out.
-D
On Fri, Apr 2, 2010 at 11:32 AM, hc busy hc.b...@gmail.com wrote:
doh
-
From: Thejas Nair [mailto:te...@yahoo-inc.com]
Sent: Friday, April 02, 2010 4:08 PM
To: pig-dev@hadoop.apache.org; Dmitriy Ryaboy
Subject: Re: Begin a discussion about Pig as a top level project
I agree with Alan and Dmitriy - Pig is tightly coupled with hadoop, and
heavily
If you define a UDF like this:
DEFINE foo my.Udf('param1', 'param2');
data = foreach other_data generate foo(field);
and my.Udf is an algebraic function, the Initial, Intermediate, and Final
classes do not get initialized with the arguments passed into my.Udf in the
DEFINE.
Am I missing
Apache systems were attacked earlier this month; details here:
https://blogs.apache.org/infra/entry/apache_org_04_09_2010
Particularly important bit:
Password Security
*If you are a user of the Apache hosted JIRA, Bugzilla, or Confluence, a
hashed copy of your password has been compromised.*
Still PIG-200
-D
On Fri, Apr 16, 2010 at 1:37 PM, Radhikadevi Parvathaneni
rparv...@acad.umass.edu wrote:
hi Pig development team,
Can you please provide me some skewed and non-skewed data sets for checking
the
performance of different join types in PIG.
Thank you in advance
Radhika
At some point you need to run ant so that it pulls down various
dependencies and autogenerates some code -- this is probably the step that
was missing when you used the subclipse plugin. I know people have used
subclipse successfully before (me, I'm more of a command-line type).
An ant target that
Is anyone running unpatched 0.6 anywhere? I am in the process of
putting together a jar for us, and getting worried about all the
PruneColumns optimization fixes that came after the 0.6 release.
-D
No, that one is Hudson.
While it was on, uh, medical leave, a bunch of patches got committed, so now
I am guessing it's trying to apply a patch to a tree that already has said
patch in it.
Don't worry about it, this patch is already in trunk.
-D
On Thu, May 27, 2010 at 11:22 AM, Russell Jurney
It looks like right now, the combiner optimization does not kick in for a
script like this:
data = load 'foo' using PigStorage() as (a, b, c);
grouped = group data by a;
filtered = filter grouped by COUNT(data) > 1000;
Looking at the code in CombinerOptimizer, seems like the Filter bit is just
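To show why a filter on a group count is a natural fit for a combiner, here is a small Python sketch, not Pig's CombinerOptimizer: each input split produces partial counts per key, and the reduce side sums them before applying the threshold. The function names are hypothetical, and a greater-than comparison against the threshold is an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def combine_split(rows: Iterable[Tuple]) -> Counter:
    """Map-side combiner: count rows per group key (field 0) in one split."""
    return Counter(row[0] for row in rows)


def reduce_and_filter(partials: List[Counter], threshold: int) -> Dict:
    """Reduce side: sum the partial counts, keep keys whose total exceeds threshold."""
    totals = Counter()
    for partial in partials:
        totals.update(partial)  # Counter.update adds counts rather than replacing them
    return {key: n for key, n in totals.items() if n > threshold}
```

Only the small per-key counts cross the shuffle, rather than every row in each group, which is the saving the combiner is meant to provide.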
It would be cool to just treat relations as bags in the general case. They
kind of are, and kind of are not. Causes lots of user confusion.
There are obvious users-doing-dumb-stuff scenarios that arise though.
I guess the Pig philosophy is that the user is the optimizer, though, so
maybe it's ok.
MR job to
accomplish this. But I'm open to persuasion if everyone else disagrees.
Alan.
On Jun 11, 2010, at 7:27 PM, Russell Jurney wrote:
This would be great. Save us from GROUP ALL/FOREACH, which is awkward.
On Fri, Jun 11, 2010 at 7:14 PM, Dmitriy Ryaboy dvrya...@gmail.com
wrote
On Wed, Jun 16, 2010 at 9:16 AM, Alan Gates ga...@yahoo-inc.com wrote:
4. for non-hot keys, my understanding is that they are shuffled to reducers
based on the default hash partitioner. However, it could happen that the keys
shuffled to a single reducer incur skew even though none of them is individually skewed
It's just whatever the hash function happens to do. By the time the hot
keys are slotted to be spread among multiple reducers, they are no longer
hot, so it doesn't matter if you put a few of the partitions in the same
reducer. Remember, we mostly care about things we have to keep in memory.
Since
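The partitioning described in this reply can be sketched in Python as follows. This is an illustration under stated assumptions, not Pig's skewed-join partitioner: make_partitioner, the hot_keys mapping from key to candidate reducers, and the round-robin choice among them are all hypothetical, and CRC32 stands in for whatever hash function the default partitioner uses.

```python
import zlib
from itertools import count
from typing import Callable, Dict, List


def make_partitioner(num_reducers: int,
                     hot_keys: Dict[str, List[int]]) -> Callable[[str], int]:
    """Build a partition function.

    hot_keys maps each hot key to the reducers it is spread across;
    every other key goes wherever the hash happens to send it.
    """
    counters = {key: count() for key in hot_keys}

    def partition(key: str) -> int:
        if key in hot_keys:
            targets = hot_keys[key]
            # Round-robin a hot key's records across its assigned reducers.
            return targets[next(counters[key]) % len(targets)]
        # Non-hot keys: a plain, deterministic hash partition.
        return zlib.crc32(key.encode("utf-8")) % num_reducers

    return partition
```

With partition = make_partitioner(4, {"hot": [0, 1, 2]}), successive records for "hot" cycle through reducers 0, 1, and 2, while any other key always lands on the same hash-chosen reducer.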
http://www.eclipse.org/Xtext/documentation/
Wow. That would be huge.
For what it's worth, I saw very significant speed improvements (order of
magnitude for wide tables with few projected columns) when I implemented (2)
for our protocol buffer - based loaders.
I have a feeling that propagating schemas when known, and using them for
(de)serialization instead of
Renato,
I just want to make sure folks know -- Pig already has a number of such
optimizations. Daniel's work is aimed at making it (much) easier to write
such rules and to add a couple new ones. But some of the classic
optimizations like projection and filter push-down already exist in the
It does -- lack of existence of a directory during planning does not imply
the directory will be missing when you run.
Sounds like the sort of thing one might want to put into PigUnit
On Wed, Jul 7, 2010 at 2:19 PM, Russell Jurney russell.jur...@gmail.com wrote:
This is my most common error as
Is there a preferred way to handle errors in LoadFunc initialization?
I suspect that if I throw an exception in the constructor, the Pig process
might die, which is not friendly, esp. to people working in the shell; but
just printing out an error can obviously lead to trouble later on, as well.
This sounds reasonable. +1.
-D
On Mon, Aug 16, 2010 at 1:46 PM, Alan Gates ga...@yahoo-inc.com wrote:
Five months ago I started a discussion on whether Pig should become a top
level project (TLP) at Apache instead of remaining a subproject of Hadoop (
+1 for TLP
+1 for Olga as PMC
On Wed, Aug 18, 2010 at 10:34 AM, Alan Gates ga...@yahoo-inc.com wrote:
Earlier this week I began a discussion on Pig becoming a TLP (
http://bit.ly/byD7L8 ). All of the received feedback was positive. So,
let's have a formal vote.
I propose we move Pig to a
Hi folks,
Please do RSVP so that we know how many people are coming.
Thanks,
-Dmitriy
On Tue, Aug 17, 2010 at 4:04 PM, Alan Gates ga...@yahoo-inc.com wrote:
All,
We will be holding the next Pig contributor workshop at Twitter on
Wednesday, August 25 from 4-6. The tentative agenda is to
I just noticed that even though Utf8StorageConverter implements the various
byte[] toBytes(Obj o) methods, they are not part of the LoadCaster interface
-- and therefore can't be relied on when using modular Casters, like I am
trying to do for the HBaseLoader.
Since we don't want to introduce
The current HBase patch on PIG-1205 (patch 7) includes this refactoring.
Please take a look if you have concerns.
Or just if you feel like reviewing the code... :)
-D
On Sat, Aug 21, 2010 at 5:22 PM, Dmitriy Ryaboy dvrya...@gmail.com wrote:
I just noticed that even though Utf8StorageConverter
.
On Aug 18, 2010, at 4:45 PM, Dmitriy Ryaboy wrote:
Hi folks,
Please do RSVP so that we know how many people are coming.
Thanks,
-Dmitriy
On Tue, Aug 17, 2010 at 4:04 PM, Alan Gates ga...@yahoo-inc.com
wrote:
All,
We will be holding the next Pig contributor workshop
Haven't heard anything from Hudson in a while...
-D
Ryaboy wrote:
The current HBase patch on PIG-1205 (patch 7) includes this
refactoring.
Please take a look if you have concerns.
Or just if you feel like reviewing the code... :)
-D
On Sat, Aug 21, 2010 at 5:22 PM, Dmitriy Ryaboy dvrya...@gmail.com
wrote:
I just noticed that even though
functions some time after 0.8 is branched. The initial list of
committers is Kevin Weil (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach
(Cloudera), and Russell Jurney (LinkedIn). Yahoo will also nominate someone.
Please send us any thoughts you might have on this subject. It was suggested
that a lot
Thanks Carl!
On Thu, Aug 26, 2010 at 1:08 AM, Carl Steinbach c...@cloudera.com wrote:
Hi,
I added Pig to the list of projects that can be reviewed on Cloudera's
public
ReviewBoard instance, located at http://review.cloudera.org (AKA
review.hbase.org).
Review requests and comments are