[Pig Wiki] Update of "ProposedProjects" by AlanGates

Apache Wiki Thu, 14 Jan 2010 11:25:56 -0800

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change 
notification.


The "ProposedProjects" page has been changed by AlanGates.
http://wiki.apache.org/pig/ProposedProjects?action=diff&rev1=9&rev2=10

--------------------------------------------------

  = Proposed Pig Projects =
+ The list of proposed Pig projects is now kept on the PigJournal page.
- This page describes projects that we (the committers) would like to see added
- to Pig.  The scale of these projects varies, but they are larger projects,
- usually on the weeks or months scale.  We have not yet filed
- [[https://issues.apache.org/jira/browse/PIG|JIRAs]] for some of these
- because they are still in the vague idea stage.  As they become more concrete,
- [[https://issues.apache.org/jira/browse/PIG|JIRAs]] will be filed for them.
  
- We welcome contributors to take on one of these projects.  If you would like
- to do so, please file a JIRA (if one does not already exist for the project)
- with a proposed solution.  Pig's committers will work with you from there to
- help refine your solution.  Once a solution is agreed upon, you can begin
- implementation.
+ Looking to get involved in Pig?  Excellent.  A great place to start is to find a 
[[http://issues.apache.org/jira/browse/PIG|JIRA]] that interests you and 
provide a
+ patch for that.  If you are looking for a bigger project to take on, take a 
look at PigJournal.  Before starting work on a project, it is best to post on 
the
+ JIRA that you plan to work on it, along with an outline of the approach you intend 
to take.  If it does not have a JIRA yet, send a mail to
+ [[mailto:[email protected]|pig-dev]].  This has a couple of
+ advantages.  One, if others want to collaborate with you, it gives them a 
chance to say so and pitch in.  Two, it lets the committers know what you are 
working
+ on so they can help you through the process.
  
- If you see a project here that you would like to see Pig implement but you are
- not in a position to implement the solution right now, feel free to vote for
- the project.  Add your name to the list of supporters.  This will help
- contributors looking for a project to select one that will benefit many users.
- 
- If you would like to propose a project for Pig, feel free to add to this list.
- If it is a smaller project, or something you plan to begin work on
- immediately, filing a [[https://issues.apache.org/jira/browse/PIG|JIRA]] is a 
better route.
- 
- || Category || Project || JIRA || References || Proposed By || Votes For ||
- || Execution || Pig currently executes scripts by building a pipeline of 
pre-built operators and running data through those operators in map reduce 
jobs.  We need to investigate instead having Pig generate Java code specific to a 
job, and then compiling that code and using it to run the map reduce jobs. || 
|| || Many conference attendees || gates ||
- || Language || Currently only LIMIT, DISTINCT, ORDER BY, and FILTER are 
allowed inside FOREACH.  All operators should be allowed in FOREACH. || || || 
gates || ||
- || Optimization || Speed up comparison of tuples during shuffle for ORDER BY 
|| [[https://issues.apache.org/jira/browse/PIG-659|659]] || || olgan || ||
- || Optimization || Often in a Pig script that produces a chain of MR jobs, 
the map phases of the 2nd and subsequent jobs do very little.  What little they do 
should be pushed into the preceding reduce and the map replaced by the 
identity mapper.  Initial tests showed that the identity mapper was 50% faster 
than using a Pig mapper (because Pig uses the loader to parse out tuples even 
if the map itself is empty). || 
[[https://issues.apache.org/jira/browse/PIG-480|480]] || || olgan || gates ||
- || Optimization || Use hand crafted calls to do string to integer or float 
conversions.  Initial tests showed these could be done about 8x faster than 
String.toIntger() and String.toFloat(). || 
[[https://issues.apache.org/jira/browse/PIG-482|482]] || || olgan || gates ||
- || Optimization || Currently Pig always samples for an ORDER BY to determine 
how to partition, and then runs another job to do the sort.  For small enough 
inputs, it should just sort with a single reducer. || 
[[https://issues.apache.org/jira/browse/PIG-483|483]] || || olgan || ||
- || Optimization || The combiner is not currently used if FILTER is in the 
FOREACH.  In some cases it could still be used.  || 
[[https://issues.apache.org/jira/browse/PIG-479|479]] || || olgan || ||
- || Optimization || The combiner is not currently used if LIMIT is in the 
FOREACH.  ||  || || gates || ||
- || Optimization || Currently when types of data are declared Pig inserts a 
FOREACH immediately after the LOAD that does the conversions.  These 
conversions should be delayed until the field is actually used. || 
[[https://issues.apache.org/jira/browse/PIG-410|410]] || || olgan || gates ||
- || Optimization || The Pig optimizer should be used to determine when fields 
in a record are no longer needed and put in FOREACH statements to project out 
the unnecessary data as early as possible. || 
[[https://issues.apache.org/jira/browse/PIG-466|466]] || || olgan || ||
- || Optimization || Change physical operators to pass list of tuples in 
getNext instead of one tuple at a time. || 
[[https://issues.apache.org/jira/browse/PIG-688|688]] || || Thejas || ||
-
