Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Pig Wiki" for change
notification.
The "owl" page has been changed by jaytang.
The core M/R programming interfaces as we know them (the mapper, reducer, output
collector, record reader and input format) all deal with collections of
abstract data objects, not files. However, the current set of !InputFormat
implementations provided by the job API is relatively primitive and heavily
coupled to file formats and HDFS paths to describe input and output locations.
From an application programmer’s perspective, one has to think about both the
abstract data and its physical representation and storage location, which is
at odds with the abstract data API. In the meantime, the number of file
formats and (de)serialization libraries in the Hadoop community has grown
considerably. Some of these require certain metadata to operate or optimize. While
providing optimization and performance enhancements, these file formats and
SerDe libs don’t make it any easier to develop applications on and manage very
big data sets.
- == High Level Diagram ==
+ == High Level Diagram ==
As one can see, Owl gives Hadoop users a uniform interface for organizing,
discovering and managing data stored in many different formats, and promotes
interoperability among different programming frameworks. Owl presents a single
logical view of data organization and hides the complexity and evolution of the
underlying physical data layout schemes. It gives Hadoop applications a stable
foundation to build upon.
@@ -34, +34 @@
|| Owl has support for converting data between write-friendly and read-friendly formats || future ||
|| Owl has support for addressing HDFS NameNode limitations by decreasing the number of files needed to store very large data sets || future ||
|| Owl provides a security model for secure data access || future ||
== Prerequisite ==
@@ -102, +101 @@
* deploy the Owl WAR file to Tomcat
* set -Dorg.apache.hadoop.owl.xmlconfig=<full path to owlServerConfig.xml> as a JVM option for the Tomcat deployment
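
A -D flag on the JVM command line becomes a Java system property, so a quick way to confirm that the Tomcat JVM actually received the setting is to read it back. The class below is only a minimal sketch of such a check, not Owl's own startup code: the class name `OwlConfigCheck` and its method are hypothetical, and only the property name comes from the step above. In Tomcat, flags like this are commonly passed through the `CATALINA_OPTS` environment variable.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Owl: verifies that the -D flag from the
// deployment step reached the JVM and points at a readable file.
public class OwlConfigCheck {
    // Property name taken from the deployment step above.
    static final String CONFIG_PROPERTY = "org.apache.hadoop.owl.xmlconfig";

    /** Returns the configured path, or throws if the property is unset or the file is unreadable. */
    static Path resolveConfig() {
        String value = System.getProperty(CONFIG_PROPERTY);
        if (value == null || value.trim().isEmpty()) {
            throw new IllegalStateException("-D" + CONFIG_PROPERTY + " is not set");
        }
        Path path = Paths.get(value).toAbsolutePath();
        if (!Files.isReadable(path)) {
            throw new IllegalStateException("Config file is not readable: " + path);
        }
        return path;
    }

    public static void main(String[] args) {
        System.out.println("Owl server config: " + resolveConfig());
    }
}
```

Running it with the same -D flag that Tomcat is given shows whether the path resolves on that machine; a missing flag or an unreadable file fails immediately with a descriptive message.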
- == Developing on Owl ==
+ == Developing on Owl ==
Owl has two major public APIs. ''Owl Driver'' provides management APIs
against three core Owl abstractions: "Owl Table", "Owl Database", and
"Partition". This API is backed by an internal Owl metadata store that runs
on Tomcat and a relational database. ''!OwlInputFormat'' provides a data
access API and is modeled after the traditional Hadoop !InputFormat. In the
future, we plan to support ''!OwlOutputFormat'' and thus the notion of "Owl
Managed Table" where Owl controls the data flow into and out of "Owl Tables".
Owl also supports Pig integration through the OwlPigLoader/Storer module.
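
To make the relationship between the three abstractions more concrete, here is a small self-contained sketch of the general idea: a catalog that maps a logical request (database, table, partition filter) to physical storage locations, so that the reader never hard-codes paths. This is not the Owl Driver or !OwlInputFormat API. Every class, method, table name and path below is hypothetical, and an in-memory map stands in for the metadata store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names come from Owl.
public class CatalogSketch {
    /** A partition: key/value pairs plus the physical location that holds its data. */
    static final class Partition {
        final Map<String, String> keys;
        final String location;

        Partition(Map<String, String> keys, String location) {
            this.keys = keys;
            this.location = location;
        }
    }

    // In-memory stand-in for a metadata store: database -> table -> partitions.
    private final Map<String, Map<String, List<Partition>>> databases = new LinkedHashMap<>();

    /** Registers a partition of a table together with its physical location. */
    void addPartition(String database, String table, Map<String, String> keys, String location) {
        databases.computeIfAbsent(database, d -> new LinkedHashMap<>())
                 .computeIfAbsent(table, t -> new ArrayList<>())
                 .add(new Partition(keys, location));
    }

    /** Resolves a logical request (table plus partition filter) to physical locations. */
    List<String> resolve(String database, String table, Map<String, String> filter) {
        List<Partition> partitions = databases
                .getOrDefault(database, Collections.emptyMap())
                .getOrDefault(table, Collections.emptyList());
        List<String> locations = new ArrayList<>();
        for (Partition p : partitions) {
            // A partition matches when it carries every key/value pair in the filter.
            if (p.keys.entrySet().containsAll(filter.entrySet())) {
                locations.add(p.location);
            }
        }
        return locations;
    }

    public static void main(String[] args) {
        CatalogSketch catalog = new CatalogSketch();
        catalog.addPartition("webdb", "clicks",
                Collections.singletonMap("date", "2010-01-01"), "/data/clicks/20100101");
        catalog.addPartition("webdb", "clicks",
                Collections.singletonMap("date", "2010-01-02"), "/data/clicks/20100102");
        // The caller names a table and a partition value, never a path.
        System.out.println(catalog.resolve("webdb", "clicks",
                Collections.singletonMap("date", "2010-01-02")));
    }
}
```

The point of the design is the direction of the lookup: applications address data by table and partition, and the catalog alone knows where and in what layout the data lives, which is what lets the physical layout change without touching application code.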