The "HowlJournal" page has been changed by AlanGates.
http://wiki.apache.org/pig/HowlJournal

--------------------------------------------------

New page:
= Howl Journal =

This document tracks the development of Howl.  It summarizes work that has been 
done in previous releases, what is currently being worked on, and proposals for
future work in Howl.

== Completed Work ==

|| Feature                                                        || Available in     || Comments ||
|| Read/write of data from MapReduce                              || Not yet released ||          ||
|| Read/write of data from Pig                                    || Not yet released ||          ||
|| Read from Hive                                                 || Not yet released ||          ||
|| Push down columns to be projected into the storage format      || Not yet released ||          ||
|| Support for RCFile storage                                     || Not yet released ||          ||

== Work in Progress ==

|| Feature           || Description ||
|| Add a CLI         || This will allow users to use Howl without installing all of Hive.  Its syntax will match that of Hive's DDL. ||
|| Partition pruning || Currently, when asked for information about a table, Hive's metastore returns all of the table's partitions.  This causes two problems.  First, for tables with many partitions, fetching information about the table becomes a very expensive metadata operation.  Second, it makes more sense to implement partition pruning once, in Howl, than separately in Hive, Pig, and MR. ||
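A minimal sketch of what pruning in one place means, assuming partitions are represented as dictionaries of partition-key values and the filter is supplied as a callable. `prune_partitions` is a hypothetical name, not part of Howl or the Hive metastore API; in a real service the filter would be evaluated on the metadata server so that only matching partitions are returned to the client.

```python
from typing import Callable, Dict, List

# Hypothetical partition record: partition key -> value.
Partition = Dict[str, str]


def prune_partitions(partitions: List[Partition],
                     predicate: Callable[[Partition], bool]) -> List[Partition]:
    """Return only the partitions whose key values satisfy the predicate."""
    return [p for p in partitions if predicate(p)]


# Example: a table partitioned by date ("ds") and region.
all_parts = [
    {"ds": "2010-06-01", "region": "us"},
    {"ds": "2010-06-01", "region": "eu"},
    {"ds": "2010-06-02", "region": "us"},
]
pruned = prune_partitions(
    all_parts, lambda p: p["ds"] == "2010-06-01" and p["region"] == "us")
print(pruned)  # [{'ds': '2010-06-01', 'region': 'us'}]
```

Because Hive, Pig, and MR would all call the same pruning step, the filtering logic would not have to be reimplemented in each of them.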


== Proposed Work ==
'''Authentication'''<<BR>> Integrate Howl with security work done on Hadoop so 
that users can be properly authenticated.

'''Authorization'''<<BR>> The initial proposal is to use HDFS permissions to 
determine whether Howl operations can be executed.  For example, it would not 
be possible to drop a table unless the user had write permissions on the 
directory holding that table.  We need to determine how to extend this model to 
data not stored in HDFS (e.g. HBase) and objects that do not exist in HDFS 
(e.g. views).  See HowlSecurity for more information.
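A minimal sketch of the proposed check, assuming Howl can obtain the owner, group, and permission bits of a table's directory. `DirStatus` and `can_drop_table` are hypothetical names, not Howl or Hadoop APIs. The sketch follows the HDFS owner/group/other model, where only the most specific matching class is tested, and it ignores the HDFS superuser, who bypasses permission checks.

```python
import stat
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class DirStatus:
    """Hypothetical stand-in for an HDFS directory's owner, group, and mode."""
    owner: str
    group: str
    mode: int  # POSIX-style permission bits, e.g. 0o755


def has_write_permission(user: str, groups: FrozenSet[str], d: DirStatus) -> bool:
    # Owner bits apply if the user owns the directory; otherwise group bits
    # if the user is in the directory's group; otherwise the "other" bits.
    if user == d.owner:
        return bool(d.mode & stat.S_IWUSR)
    if d.group in groups:
        return bool(d.mode & stat.S_IWGRP)
    return bool(d.mode & stat.S_IWOTH)


def can_drop_table(user: str, groups: FrozenSet[str], table_dir: DirStatus) -> bool:
    """Under the proposed model, dropping a table requires write access to its directory."""
    return has_write_permission(user, groups, table_dir)


tbl = DirStatus(owner="alice", group="analysts", mode=0o750)
print(can_drop_table("alice", frozenset(), tbl))             # True
print(can_drop_table("bob", frozenset({"analysts"}), tbl))   # False: group has r-x only
```

This model has nothing to consult for objects without an HDFS directory, such as views, which is exactly the open question raised above.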

'''Non-partition Predicate Pushdown'''<<BR>> Because storage formats such as 
RCFile are expected to support predicate pushdown in the future, Howl needs to 
be able to push predicates on non-partition columns down into the storage layer 
when appropriate.
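A toy sketch of the idea, assuming a storage-layer reader that accepts both a column projection (already pushed down in Howl) and a row predicate. `ToyColumnarReader` and its `read` method are hypothetical and do not reflect the RCFile or Howl interfaces; a real format could also use the predicate to skip whole blocks rather than testing row by row.

```python
from typing import Any, Callable, Dict, Iterator, List, Optional

Row = Dict[str, Any]


class ToyColumnarReader:
    """Hypothetical storage-layer reader over in-memory rows."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def read(self, columns: List[str],
             predicate: Optional[Callable[[Row], bool]] = None) -> Iterator[Row]:
        for row in self._rows:
            # The pushed-down predicate runs inside the storage layer, so rows
            # that fail it are never handed to the caller.
            if predicate is not None and not predicate(row):
                continue
            # Projection is applied after filtering, so the predicate may
            # reference columns the caller did not ask for.
            yield {c: row[c] for c in columns}


rows = [
    {"user": "a", "bytes": 10, "status": 200},
    {"user": "b", "bytes": 99, "status": 404},
    {"user": "c", "bytes": 5, "status": 200},
]
reader = ToyColumnarReader(rows)
print(list(reader.read(["user"], predicate=lambda r: r["status"] == 200)))
# [{'user': 'a'}, {'user': 'c'}]
```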

'''Notification'''<<BR>> Add the ability for systems such as workflow engines to 
be notified when new data arrives in Howl.  This will be designed for a small 
number of systems receiving notifications, not for large numbers of users 
(i.e. we will not be building a general-purpose publish/subscribe system).  One 
possible solution is an RSS feed or a similarly simple service.
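A minimal sketch of the RSS option, assuming Howl knows which partitions were added to a table and when. `build_rss_feed` is a hypothetical helper, and the channel link uses a placeholder `example.com` URL because the page does not say where such a feed would live. It emits an RSS 2.0 document with one `<item>` per new partition, using only the Python standard library.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import List, Tuple


def build_rss_feed(table: str, new_partitions: List[Tuple[str, datetime]]) -> str:
    """Render newly added partitions of one table as an RSS 2.0 feed."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = f"New partitions for {table}"
    # Placeholder URL: the real feed location is not specified anywhere.
    ET.SubElement(channel, "link").text = f"http://example.com/howl/{table}"
    ET.SubElement(channel, "description").text = f"Partitions added to {table}"
    for spec, added_at in new_partitions:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = f"{table} partition {spec}"
        # RSS dates use the RFC 822 format, which format_datetime produces.
        ET.SubElement(item, "pubDate").text = format_datetime(added_at)
    return ET.tostring(rss, encoding="unicode")


feed = build_rss_feed("web_logs", [
    ("ds=2010-06-01", datetime(2010, 6, 1, 12, 0, tzinfo=timezone.utc)),
    ("ds=2010-06-02", datetime(2010, 6, 2, 12, 0, tzinfo=timezone.utc)),
])
print(feed)
```

A workflow system would simply poll this feed, which keeps Howl out of the business of tracking subscribers.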

'''Schema Evolution'''<<BR>> Currently schema evolution in Hive is limited to 
adding columns after the last non-partition-key column.  It may be desirable to 
support other forms of schema evolution, such as adding columns in other parts 
of the record, or allowing new partitions of a table to no longer contain a 
given column.
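One possible approach, offered here only as an assumption and not taken from the page: match a partition's columns to the table schema by name rather than by position. Under that rule, a column added in the middle of the table schema, or missing from a newer partition, simply reads as `None`. `resolve_by_name` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional


def resolve_by_name(partition_row: Dict[str, Any],
                    table_columns: List[str]) -> List[Optional[Any]]:
    """Map a row written under a partition's schema onto the current table schema.

    Columns are matched by name, so any table column the partition lacks
    comes back as None, wherever it sits in the table schema.
    """
    return [partition_row.get(col) for col in table_columns]


table_columns = ["user", "region", "bytes"]   # "region" was added in the middle
old_row = {"user": "a", "bytes": 10}          # written before "region" existed
new_row = {"user": "b", "region": "eu"}       # newer partition without "bytes"
print(resolve_by_name(old_row, table_columns))  # ['a', None, 10]
print(resolve_by_name(new_row, table_columns))  # ['b', 'eu', None]
```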

'''Support data read across partitions with different storage formats'''<<BR>> 
This work is done except that only one storage format is currently supported.
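A minimal sketch of per-partition format dispatch, assuming each partition's metadata records its location and storage format name. `read_table` and the reader registry are hypothetical, and the readers in the example are in-memory stand-ins rather than real RCFile or text readers.

```python
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

# Hypothetical partition descriptor: (location, storage format name).
PartitionInfo = Tuple[str, str]
Reader = Callable[[str], Iterable[tuple]]


def read_table(partitions: List[PartitionInfo],
               readers: Dict[str, Reader]) -> Iterator[tuple]:
    """Read every partition with the reader registered for that partition's format."""
    for location, fmt in partitions:
        reader = readers.get(fmt)
        if reader is None:
            raise ValueError(f"no reader registered for format {fmt!r} ({location})")
        yield from reader(location)


# In-memory stand-ins for real storage-format readers.
fake_data = {"/t/ds=1": [("a", 1)], "/t/ds=2": [("b", 2)]}
readers = {"rcfile": lambda loc: fake_data[loc], "text": lambda loc: fake_data[loc]}
print(list(read_table([("/t/ds=1", "rcfile"), ("/t/ds=2", "text")], readers)))
# [('a', 1), ('b', 2)]
```

Since `read_table` is a generator, an unsupported format is reported when that partition is reached during iteration, not when the call is made.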

'''Support for more file formats'''<<BR>> Support for additional file formats, 
such as SequenceFile and text, needs to be added.

'''Utility APIs'''<<BR>> Grid managers will want to build tools that use Howl 
to help manage their grids.  For example, one might build a tool to do 
replication between two grids.  Such tools will want to use Howl's metadata.  
Howl needs to provide an appropriate API for these types of tools.
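As an illustration of the replication example, here is a minimal sketch assuming a utility API that can list, for each table, the partitions that exist on a grid. `partitions_to_replicate` is a hypothetical helper that computes which partitions the target grid is missing; the actual data copy is out of scope.

```python
from typing import Dict, List, Set


def partitions_to_replicate(source: Dict[str, Set[str]],
                            target: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """For each table on the source grid, list the partitions missing on the target."""
    plan: Dict[str, List[str]] = {}
    for table, parts in source.items():
        missing = sorted(parts - target.get(table, set()))
        if missing:
            plan[table] = missing
    return plan


source = {"web_logs": {"ds=2010-06-01", "ds=2010-06-02"}, "clicks": {"ds=2010-06-01"}}
target = {"web_logs": {"ds=2010-06-01"}, "clicks": {"ds=2010-06-01"}}
print(partitions_to_replicate(source, target))  # {'web_logs': ['ds=2010-06-02']}
```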
