Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The "owl" page has been changed by jaytang.
http://wiki.apache.org/pig/owl?action=diff&rev1=12&rev2=13

--------------------------------------------------

The core M/R programming interface as we know it (the mapper, reducer, output collector, record reader and input format) deals with collections of abstract data objects, not files. However, the current set of !InputFormat implementations provided by the job API is relatively primitive and heavily coupled to file formats and HDFS paths to describe input and output locations. From an application programmer’s perspective, one has to think about both the abstract data and its physical representation and storage location, which is a disconnect from the abstract data API. In the meantime, file formats and (de)serialization libraries have proliferated in the Hadoop community. Some of these require certain metadata to operate or optimize. While they provide optimizations and performance enhancements, these file formats and SerDe libs don’t make it any easier to develop applications on, or to manage, very big data sets.

- == High Level Diagram ==
+ == High Level Diagram ==

As one can see, Owl gives Hadoop users a uniform interface for organizing, discovering and managing data stored in many different formats, and promotes interoperability among different programming frameworks. Owl presents a single logical view of data organization and hides the complexity and evolution of the underlying physical data layout schemes. It gives Hadoop applications a stable foundation to build upon.
@@ -34, +34 @@

|| Owl has support for converting data between write-friendly and read-friendly formats || future ||
|| Owl has support for addressing HDFS NameNode limitations by decreasing the number of files needed to store very large data sets || future ||
|| Owl provides a security model for secure data access || future ||

- == Prerequisite ==

@@ -102, +101 @@

 * deploy the Owl war file to Tomcat
 * set up -Dorg.apache.hadoop.owl.xmlconfig=<full path to owlServerConfig.xml> for the Tomcat deployment

- == Developing on Owl ==
+ == Developing on Owl ==

Owl has two major public APIs. ''Owl Driver'' provides management APIs against three core Owl abstractions: "Owl Table", "Owl Database", and "Partition". This API is backed by an internal Owl metadata store that runs on Tomcat and a relational database. ''!OwlInputFormat'' provides a data access API and is modeled after the traditional Hadoop !InputFormat. In the future, we plan to support ''!OwlOutputFormat'' and thus the notion of an "Owl Managed Table", where Owl controls the data flow into and out of "Owl Tables". Owl also supports Pig integration through the OwlPigLoader/Storer module.
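The deployment step that sets -Dorg.apache.hadoop.owl.xmlconfig relies on the JVM exposing any -D flag as a system property. The sketch below is a small standalone check of that property, not Owl code: the class name OwlConfigCheck and its validate helper are hypothetical, and how the Owl server itself reads the property is not shown on this page. With Tomcat, such a flag is commonly passed through the CATALINA_OPTS environment variable.

```java
import java.io.File;

// Hypothetical helper, not part of Owl: checks the value that the
// -Dorg.apache.hadoop.owl.xmlconfig=<path> JVM flag would make visible.
class OwlConfigCheck {
    // Property name taken from the deployment step on the wiki page.
    static final String CONFIG_PROPERTY = "org.apache.hadoop.owl.xmlconfig";

    // Returns a description of the problem, or null if the value looks usable.
    static String validate(String value) {
        if (value == null || value.trim().isEmpty()) {
            return "system property " + CONFIG_PROPERTY + " is not set";
        }
        File file = new File(value);
        if (!file.isAbsolute()) {
            // The wiki asks for the full path to owlServerConfig.xml.
            return "expected a full path but got: " + value;
        }
        if (!file.isFile()) {
            return "no file found at: " + value;
        }
        return null;
    }

    public static void main(String[] args) {
        // A -D flag on the JVM command line is readable via System.getProperty.
        String problem = validate(System.getProperty(CONFIG_PROPERTY));
        System.out.println(problem == null ? "config path looks usable" : problem);
    }
}
```

Running the class with the same -D flag that is planned for Tomcat gives a quick way to catch a relative or mistyped path before the server is started.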
