The "PigJournal" page has been changed by AlanGates.
http://wiki.apache.org/pig/PigJournal?action=diff&rev1=10&rev2=11

--------------------------------------------------

  || Make configuration available to UDFs                 || 0.6                || ||
  || Load Store Redesign                                  || 0.7                || ||
  || Pig Mix 2.0                                          || not yet released   || ||
+ || Rewrite Logical Optimizer                            || not yet released   || ||
+ || Cleanup of javadocs                                  || not yet released   || ||
+ || UDFs in scripting languages                          || not yet released   || ||
+ || Ability to specify a custom partitioner              || not yet released   || ||
+ || Pig usage stats collection                           || not yet released   || ||
+ || Make Pig available via Maven                         || not yet released   || ||
+ || Standard UDFs Pig Should Provide                     || not yet released   || ||
+ || Add Scalars To Pig Latin                             || not yet released   || ||
+ || Run Map Reduce Jobs Directly From Pig                || not yet released   || ||
  
  == Work in Progress ==
  This covers work that is currently being done.  For each entry the main JIRA 
for the work is referenced.
  
  || Feature                                     || JIRA || Comments ||
  || Boolean Type                                || [[https://issues.apache.org/jira/browse/PIG-1429|PIG-1429]] || ||
+ || Make Illustrate Work                        || [[https://issues.apache.org/jira/browse/PIG-502|PIG-502]], [[https://issues.apache.org/jira/browse/PIG-534|PIG-534]], [[https://issues.apache.org/jira/browse/PIG-903|PIG-903]], [[https://issues.apache.org/jira/browse/PIG-1066|PIG-1066]] || ||
+ || Better Parser and Scanner Technology        || many || ||
+ || Clarify Pig Latin Semantics                 || many || ||
+ || Extending Pig to Include Branching, Looping, and Functions || TuringCompletePig || ||
+ 
- || Query Optimizer                             || [[http://issues.apache.org/jira/browse/PIG-1178|PIG-1178]]  || ||
- || Cleanup of javadocs                         || [[https://issues.apache.org/jira/browse/PIG-1311|PIG-1311]] || ||
- || UDFs in scripting languages                 || [[https://issues.apache.org/jira/browse/PIG-928|PIG-928]]   || ||
- || Ability to specify a custom partitioner     || [[https://issues.apache.org/jira/browse/PIG-282|PIG-282]]   || ||
- || Pig usage stats collection                  || [[https://issues.apache.org/jira/browse/PIG-1389|PIG-1389]], [[https://issues.apache.org/jira/browse/PIG-908|PIG-908]], [[https://issues.apache.org/jira/browse/PIG-864|PIG-864]], [[https://issues.apache.org/jira/browse/PIG-809|PIG-809]] || ||
- || Make Pig available via Maven                || [[https://issues.apache.org/jira/browse/PIG-1334|PIG-1334]] || ||
- || Standard UDFs Pig Should Provide            || [[https://issues.apache.org/jira/browse/PIG-1405|PIG-1405]] || ||
- || Add Scalars To Pig Latin                    || [[https://issues.apache.org/jira/browse/PIG-1434|PIG-1434]] || ||
- || Run Map Reduce Jobs Directly From Pig       || [[https://issues.apache.org/jira/browse/PIG-506|PIG-506]]   || ||
  
  == Proposed Future Work ==
  Work that the Pig project proposes to do in the future is further broken into 
three categories:
@@ -74, +79 @@

  Within each subsection order is alphabetical and does not imply priority.
  
  === Agreed Work, Agreed Approach ===
- ==== Make Illustrate Work ====
- Illustrate has become Pig's ignored step-child.  Users find it very useful, 
but developers have not kept it up to date with new features (e.g. it does not 
work with merge join).  Also, the way it is currently
- implemented it has code in many of Pig's physical operators.  This means the 
code is more complex and burdened with branches, making it harder to maintain.  
It also means that when doing new development it is
- easy to forget about illustrate.  Illustrate needs to be redesigned in such a 
way that it does not add complexity to physical operators and that as new 
operators are developed it is necessary and easy to add
- illustrate functionality to them.  Tests for illustrate also need to be added 
to the test suite so that it is not broken unintentionally.
- 
- '''Category:'''  Usability
- 
- '''Dependency:''' 
- 
- '''References:''' 
- 
- '''Estimated Development Effort:'''  medium
- 
  ==== Combiner Not Used with Limit or Filter ====
  Pig Scripts that have a foreach with a nested limit or filter do not use the 
combiner even when they could.  Not all filters can use the combiner, but in 
some cases
  they can.  I think all limits could at least apply the limit in the combiner, 
though the UDF itself may only be executed in the reducer. 
@@ -296, +287 @@

  '''Estimated Development Effort:'''  small
  
  
- ==== Clarify Pig Latin Semantics ====
- There are areas of Pig Latin semantics that are not clear or not consistent.  
Take, for example, a script like:
- 
- {{{
-     A = load 'foo' AS (a: bag, b: int);
-     B = foreach A generate flatten(a);
- }}}
- 
- What is the schema of B? It should be unknown, since the schema of a is 
unknown.  Currently it is instead assigned a schema of (bytearray).
- 
- Solving this involves two steps.  First, a definitive, clear, consistent 
grammar needs to be developed for Pig Latin.  Second, the front end code 
(mostly the
- LogicalPlan and the type checker) needs to be modified to ensure that it 
conforms to this specification. 
- 
- '''Category:'''  Usability
- 
- '''Dependency:'''  Should be done after a parser technology is selected as 
standard (see Standardize on Parser and Scanner Technology) since it will 
require changes
- to the grammar.
- 
- '''References:'''
- 
- '''Estimated Development Effort:'''  medium
- 
- ==== Extending Pig to Include Branching, Looping, and Functions ====
- It would be very convenient for Pig Latin to include branching, looping, and 
function calls.  Consider for example a program where the user wishes to 
iterate over
- data until it begins to converge:
- 
- {{{
-     A = load 'webcrawl' AS (url: chararray, links: bag);
-     while (unresolved_links(links) > 0.9 * COUNT(links)) {
-         -- resolve links
-         ...
-     }
-     store Z into 'webmap';
- }}}
- 
- There are at least two ways this could be accomplished.  One, Pig Latin 
itself could be extended to include these features.  Two, Pig Latin could be 
embedded in an
- existing scripting language (such as Python, Ruby, Perl, maybe others) and 
the branching, looping, and function constructs in that language provide Pig
- control flow.  There are advantages and disadvantages to each.  Hybrid 
approaches (e.g. branching and looping in a script language, functions or 
macros in Pig
- Latin) are also possible.  The Pig team needs to come to a consensus on which 
path to choose. 
- 
- '''Category:'''  New functionality
- 
- '''Dependency:'''
- 
- '''References:''' TuringCompletePig
- 
- '''Estimated Development Effort:'''  large
- 
  ==== IDE for Pig ====
  !PigPen was developed and released for Pig with 0.2.  However, it has not 
been kept up to date.  Users have consistently expressed interest
  in an IDE for Pig.  Ideally this would also include tools for writing UDFs, 
not just Pig Latin scripts.  One option is to bring !PigPen up to date and 
maintain it.
@@ -356, +299 @@

  '''References:'''
  
  '''Estimated Development Effort:'''  large and ongoing
- 
- ==== Better Parser and Scanner Technology ====
- Currently Pig Latin and grunt use Javacc for parsing and scanning.  Javacc 
has proven to be
- difficult to work with, very poorly documented, and gives users horrible, 
barely understandable error messages.  Pig needs to select better parsing and 
scanning
- packages.  Antlr, Sablecc, and perhaps other technologies need to be 
investigated as well.
- 
- '''Category:'''  Developer and Usability (for better error messages)
- 
- '''Dependency:'''
- 
- '''References:'''
- 
- '''Estimated Development Effort:'''  medium
- 
  
  === Experimental ===
  ==== Add List Datatype ====
