Dear Wiki user, You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "HowlJournal" page has been changed by AlanGates. http://wiki.apache.org/pig/HowlJournal
--------------------------------------------------
New page:
= Howl Journal =

This document tracks the development of Howl. It summarizes work that was completed in previous releases, work that is currently in progress, and proposals for future work in Howl.

== Completed Work ==

|| Feature || Available in || Comments ||
|| Read/write of data from MapReduce || Not yet released || ||
|| Read/write of data from Pig || Not yet released || ||
|| Read from Hive || Not yet released || ||
|| Pushdown of projected columns into the storage format || Not yet released || ||
|| Support for RCFile storage || Not yet released || ||

== Work in Progress ==

|| Feature || Description ||
|| Add a CLI || This will allow users to use Howl without installing all of Hive. The syntax will match that of Hive's DDL. ||
|| Partition pruning || Currently, when asked for information about a table, Hive's metastore returns all of the table's partitions. This has two drawbacks. First, for tables with large numbers of partitions, fetching information about the table becomes a very expensive metadata operation. Second, it makes more sense to keep the partition pruning logic in one place (Howl) rather than duplicating it in Hive, Pig, and MapReduce. ||

== Proposed Work ==

'''Authentication'''<<BR>>
Integrate Howl with the security work done on Hadoop so that users can be properly authenticated.

'''Authorization'''<<BR>>
The initial proposal is to use HDFS permissions to determine whether Howl operations may be executed. For example, it would not be possible to drop a table unless the user had write permission on the directory holding that table. We need to determine how to extend this model to data not stored in HDFS (e.g. HBase) and to objects that do not exist in HDFS (e.g. views). See HowlSecurity for more information.
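As a rough illustration of the proposed permission model, the sketch below checks the owner/group/other write bits on a table's directory before allowing a drop. It uses local POSIX permissions as a stand-in for HDFS permissions (which follow the same model), and the function name and signature are hypothetical, not part of Howl:

```python
import os
import stat

def can_drop_table(table_dir, uid, gids):
    """Sketch of the proposed check: dropping a table requires write
    permission on the directory that holds the table's data."""
    st = os.stat(table_dir)
    mode = st.st_mode
    if st.st_uid == uid:              # user owns the directory: owner bits
        return bool(mode & stat.S_IWUSR)
    if st.st_gid in gids:             # user is in the owning group: group bits
        return bool(mode & stat.S_IWGRP)
    return bool(mode & stat.S_IWOTH)  # everyone else: other bits
```

The open question noted above is that this check only works for objects that are backed by an HDFS directory; HBase-backed tables and views would need a different mechanism.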
'''Non-partition Predicate Pushdown'''<<BR>>
Since storage formats such as RCFile should eventually support predicate pushdown, Howl needs to be able to push predicates into the storage layer when appropriate.

'''Notification'''<<BR>>
Add the ability for systems such as workflow managers to be notified when new data arrives in Howl. This will be designed around a few systems receiving notifications, not large numbers of users (i.e. we will not be building a general-purpose publish/subscribe system). One possible solution is an RSS feed or a similarly simple service.

'''Schema Evolution'''<<BR>>
Currently, schema evolution in Hive is limited to adding columns at the end of the non-partition-key columns. It may be desirable to support other forms of schema evolution, such as adding columns in other parts of the record, or allowing new partitions of a table to no longer contain a given column.

'''Support data read across partitions with different storage formats'''<<BR>>
This work is done, except that only one storage format is currently supported.

'''Support for more file formats'''<<BR>>
Additional file formats, such as sequence file and text, need to be added.

'''Utility APIs'''<<BR>>
Grid managers will want to build tools that use Howl to help manage their grids. For example, one might build a tool to replicate data between two grids. Such tools will need access to Howl's metadata, so Howl needs to provide an appropriate API for these kinds of tools.
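To make the '''Schema Evolution''' item above concrete, the minimal sketch below (names hypothetical, not Howl code) shows what reading across evolved schemas requires: a record written under an older schema is projected onto the reader's current schema, and columns the old partition never contained come back as null:

```python
def project_record(record, reader_schema):
    """Project a stored record onto the reader's schema.

    Columns added after the record was written (or dropped from the
    partition that holds it) are filled with None; column order follows
    the reader's schema, not the storage order."""
    return [record.get(col) for col in reader_schema]

# A record written before a "clicks" column was added mid-record:
old_record = {"user": "alice", "ts": 1288000000}
print(project_record(old_record, ["user", "clicks", "ts"]))
# prints ['alice', None, 1288000000]
```

Supporting this for columns added anywhere in the record, rather than only appended at the end, is what goes beyond Hive's current behavior.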