Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "PigJournal" page has been changed by AlanGates.
http://wiki.apache.org/pig/PigJournal?action=diff&rev1=10&rev2=11

--------------------------------------------------

  || Make configuration available to UDFs || 0.6 || ||
  || Load Store Redesign || 0.7 || ||
  || Pig Mix 2.0 || not yet released || ||
+ || Rewrite Logical Optimizer || not yet released || ||
+ || Cleanup of javadocs || not yet released || ||
+ || UDFs in scripting languages || not yet released || ||
+ || Ability to specify a custom partitioner || not yet released || ||
+ || Pig usage stats collection || not yet released || ||
+ || Make Pig available via Maven || not yet released || ||
+ || Standard UDFs Pig Should Provide || not yet released || ||
+ || Add Scalars To Pig Latin || not yet released || ||
+ || Run Map Reduce Jobs Directly From Pig || not yet released || ||

  == Work in Progress ==
  This covers work that is currently being done. For each entry the main JIRA for the work is referenced.

  || Feature || JIRA || Comments ||
  || Boolean Type || [[https://issues.apache.org/jira/browse/PIG-1429|PIG-1429]] || ||
+ || Make Illustrate Work || [[https://issues.apache.org/jira/browse/PIG-502|PIG-502]], [[https://issues.apache.org/jira/browse/PIG-534|PIG-534]], [[https://issues.apache.org/jira/browse/PIG-903|PIG-903]], [[https://issues.apache.org/jira/browse/PIG-1066|PIG-1066]] || ||
+ || Better Parser and Scanner Technology || many || ||
+ || Clarify Pig Latin Semantics || many || ||
+ || Extending Pig to Include Branching, Looping, and Functions || TuringCompletePig || ||
+
- || Query Optimizer || [[http://issues.apache.org/jira/browse/PIG-1178|PIG-1178]] || ||
- || Cleanup of javadocs || [[https://issues.apache.org/jira/browse/PIG-1311|PIG-1311]] || ||
- || UDFs in scripting languages || [[https://issues.apache.org/jira/browse/PIG-928|PIG-928]] || ||
- || Ability to specify a custom partitioner || [[https://issues.apache.org/jira/browse/PIG-282|PIG-282]] || ||
- || Pig usage stats collection || [[https://issues.apache.org/jira/browse/PIG-1389|PIG-1389]], [[https://issues.apache.org/jira/browse/PIG-908|PIG-908]], [[https://issues.apache.org/jira/browse/PIG-864|PIG-864]], [[https://issues.apache.org/jira/browse/PIG-809|PIG-809]] || ||
- || Make Pig available via Maven || [[https://issues.apache.org/jira/browse/PIG-1334|PIG-1334]] || ||
- || Standard UDFs Pig Should Provide || [[https://issues.apache.org/jira/browse/PIG-1405|PIG-1405]] || ||
- || Add Scalars To Pig Latin || [[https://issues.apache.org/jira/browse/PIG-1434|PIG-1434]] || ||
- || Run Map Reduce Jobs Directly From Pig || [[https://issues.apache.org/jira/browse/PIG-506|PIG-506]] || ||

  == Proposed Future Work ==
  Work that the Pig project proposes to do in the future is further broken into three categories:
@@ -74, +79 @@

  Within each subsection order is alphabetical and does not imply priority.

  === Agreed Work, Agreed Approach ===
- ==== Make Illustrate Work ====
- Illustrate has become Pig's ignored step-child. Users find it very useful, but developers have not kept it up to date with new features (e.g. it does not work with merge join). Also, the way it is currently
- implemented it has code in many of Pig's physical operators. This means the code is more complex and burdened with branches, making it harder to maintain. It also means that when doing new development it is
- easy to forget about illustrate. Illustrate needs to be redesigned in such a way that it does not add complexity to physical operators and that as new operators are developed it is necessary and easy to add
- illustrate functionality to them. Tests for illustrate also need to be added to the test suite so that it is no broken unintentionally.
-
- '''Category:''' Usability
-
- '''Dependency:'''
-
- '''References:'''
-
- '''Estimated Development Effort:''' medium
-
  ==== Combiner Not Used with Limit or Filter ====
  Pig Scripts that have a foreach with a nested limit or filter do not use the combiner even when they could. Not all filters can use the combiner, but in some cases they can. I think all limits could at least apply the limit in the combiner, though the UDF itself may only be executed in the reducer.
@@ -296, +287 @@

  '''Estimated Development Effort:''' small

- ==== Clarify Pig Latin Semantics ====
- There are areas of Pig Latin semantics that are not clear or not consistent. Take for example, a script like:
-
- {{{
- A = load 'foo' AS (a: bag, b: int);
- B = foreach A generate flatten(a);
- }}}
-
- What is the schema of B? It should be unknown, since the schema of a is unknown. Currently it is instead assigned a schema of (bytearray).
-
- Solving this involves two steps. First, a definitive, clear, consistent grammar needs to be developed for Pig Latin. Second, the front end code (mostly the
- LogicalPlan and the type checker) need to be modified to assure that they conform to this specification.
-
- '''Category:''' Usability
-
- '''Dependency:''' Should be done after a parser technology is selected as standard (see Standardize on Parser and Scanner Technology) since it will require changes
- to the grammar.
-
- '''References:'''
-
- '''Estimated Development Effort:''' medium
-
- ==== Extending Pig to Include Branching, Looping, and Functions ====
- It would be very convenient for Pig Latin to include branching, looping, and function calls. Consider for example a program where the user wishes to iterate over
- data until it begins to converge:
-
- {{{
- A = load 'webcrawl' (url: chararray, links: bag);
- while (unresolved_links(links) > 0.9 * COUNT(links)) {
-     -- resolve links
-     ...
- }
- store Z into 'webmap';
- }}}
-
- There are at least two ways this could be accomplished. One, Pig Latin itself could be extended to include these features. Two, Pig Latin could be embedded in an
- existing scripting language (such as Python, Ruby, Perl, maybe others) and the branching, looping, and function constructs in that language provide Pig
- control flow. There are advantages and disadvantages to each. Hybrid approaches (e.g. branching and looping in a script language, functions or macros in Pig
- Latin) are also possible. The Pig team needs to come to a consensus on which path to choose.
-
- '''Category:''' New functionality
-
- '''Dependency:'''
-
- '''References:''' TuringCompletePig
-
- '''Estimated Development Effort:''' large
-
  ==== IDE for Pig ====
  !PigPen was developed and released for Pig with 0.2. However, it has not been kept up to date. Users have consistently expressed interest in an IDE for Pig. Ideally this would also include tools for writing UDFs, not just Pig Latin scripts. One option is to bring !PigPen up to date and maintain it.
@@ -356, +299 @@

  '''References:'''

  '''Estimated Development Effort:''' large and ongoing
-
- ==== Better Parser and Scanner Technology ====
- Currently Pig Latin and grunt use Javacc for parsing and scanning. Javacc has proven to be
- difficult to work with, very poorly documented, and gives users horrible, barely understandable error messages. Pig needs to select better parsing and scanning
- packages. Antlr, Sablecc, and perhaps other technologies need to be investigated as well.
-
- '''Category:''' Developer and Usability (for better error messages)
-
- '''Dependency:'''
-
- '''References:'''
-
- '''Estimated Development Effort:''' medium
-
  === Experimental ===
  ==== Add List Datatype ====
