Slides about Azkaban and Pig: http://www.slideshare.net/rjurney/azkaban-pig-5057793
On Thu, Aug 26, 2010 at 12:55 AM, Jeff Zhang <zjf...@gmail.com> wrote:

> Wonderful, Dmitriy. It's a pity I missed the contributor meeting.
> Were any slides shared?
>
> On Wed, Aug 25, 2010 at 8:32 PM, Dmitriy Ryaboy <dvrya...@gmail.com> wrote:
> > Twitter hosted this month's Pig contributor meeting.
> > Developers from Yahoo, Twitter, LinkedIn, RichRelevance, and Cloudera
> > were present.
> >
> > 1. Howl
> > First, Alan Gates demoed Howl, a project whose goal is to provide a
> > table management service for all of Hadoop. The vision is that
> > ultimately you will be able to write data using regular MR, Pig, or
> > Hive, and read it using any of those three, with full support of a
> > partition-aware metadata store that will tell you what data is
> > available, what its schema is, etc., reusing a single table abstraction.
> >
> > Currently, tables are created using (a restricted subset of) Hive DDL
> > statements; a Howl CLI for this will be created, which will enforce the
> > restricted subset.
> > Writing to the table using Pig or MapReduce is supported. Reading can
> > already be done using all three.
> >
> > At the moment, a single Pig store statement can only store into a
> > single partition; adding the ability to "spray" across partitions is on
> > the roadmap. This, and a good API for interacting with the metastore,
> > are the two areas that were identified as good opportunities for the
> > wider developer community to get involved with the project. The source
> > code is on GitHub, and is at the moment synchronized with the
> > development trunk manually; Yahoo folks will look into changing this.
> >
> > Security is a concern, and Yahoo will be working on it. Making it
> > possible for Hive to write to the tables is at the moment not as high a
> > priority as the others listed; it would basically involve just writing
> > a Hive SerDe (an equivalent of Pig's StoreFunc).
> >
> > 2. Azkaban presentation
> > Russel Jurney and Richard Park from LinkedIn presented the workflow
> > management tool open-sourced by LinkedIn, called Azkaban. It allows you
> > to declare job dependencies, has a web interface for launching and
> > monitoring jobs, etc. It has a special exec mode for Pig that lets you
> > set some Pig-specific options on a per-job basis. It does not currently
> > have triggering or job-instance parameter substitution (it does have
> > job-level parameter substitution). When asked what Pig could do to make
> > life easier for Azkaban, the two things Richard identified were
> > registering jars through the grunt command line and a way to monitor
> > the running job -- both of these are already in trunk, so we're in
> > pretty good shape for 0.8.
> >
> > 3. Piggybank discussion
> > Kevin Weil led a discussion of the piggybank. There are a few problems
> > with it -- it's released on the Pig schedule, and has quite a few
> > barriers to submission that are, anecdotally at least, preventing
> > people from contributing. Several options were discussed, with the
> > group finally settling on starting a community-curated GitHub project
> > for piggybank. It will have a number of committers from different
> > companies, and will aim to make it easy for folks to contribute (all
> > contribs will still have to have tests, and be Apache 2.0-licensed).
> > More details will be forthcoming as we figure them out. Initially this
> > project will be seeded with the current Piggybank functions some time
> > after 0.8 is branched. The initial list of committers is Kevin Weil
> > (Twitter), Dmitriy Ryaboy (Twitter), Carl Steinbach (Cloudera), and
> > Russel Jurney (LinkedIn). Yahoo will also nominate someone.
> > Please send us any thoughts you might have on this subject.
> > It was suggested that a lot of common code might be shared with Hive
> > UDFs, which have the same problems as Piggybank does, and that perhaps
> > the project can be another collaboration point between the projects.
> > It is not clear how that would work; Carl will talk to other Hive
> > people.
> >
> > Pig 0.9
> > So far the items on the list for 0.9 are: a better type propagation /
> > resolution story and documentation, perhaps a different parser
> > (ANTLR?), some performance tweaks, and map types with fixed-type
> > values. Much is still to be decided.
> >
> > The next contributor meeting will be hosted by LinkedIn in October.
> >
> > -Dmitriy
>
> --
> Best Regards
>
> Jeff Zhang