+1

2012/6/15 Alan Gates <[email protected]>
> Thanks Russell. I move we make you the official Apache Pig secretary. :)
>
> Alan.
>
> On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:
>
> > Tuesday, Pig Meetup
> >
> > Alan Gates - upcoming improvements in operators/backend physical plan.
> > Despaghettification.
> > Reworking UDF interface, keep backward compatibility.
> > Hadoop 2 coming, will be slow adoption.
> >
> > Bill Graham, Julien & Twitter - Optimization oriented. Cluster is at
> > capacity. Detect skew, cost based optimizers, dynamic tuning. Gathering
> > performance metrics, will be in HCatalog. Look at previous executions of
> > same job to optimize on the fly.
> >
> > Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
> > Zocalo Systems?, Trend Micro
> >
> > Bill presented Ambrose. Motivation: 40 MR job Pig scripts, added DAG view.
> > Shows you progress of your script as percentage and stepwise view. Helps
> > with debug, optimization. Major progress.
> >
> > Pig users talk - using Pig in local mode on sample, then pushing to
> > cluster. Using illustrate to cut developer iterations. No counters in local
> > mode. Embedded Pig in loops for ML. Java embedding.
> > Java API PigServer to run scripts from apps. Macros are helping remove ugly
> > blocks of code, but UDFs are more solved by JRuby. Mortar Data fixed Python
> > UDFs.
> >
> > Reducing friction around using Pig with tools is important. Slowness of
> > batch is hard for new users. It is hard to prepare a sample that will do
> > joins. Illustrate was invented for this purpose.
> >
> > Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
> > Azkaban is inadequate for the enterprise. People hack things together. It
> > sucks.
> >
> > HCatalog is maturing. REST API. Hive and Pig together. REST interface is
> > for metadata so far. People are wanting to extend it to grab UDFs, etc.
> >
> > Russell Jurney http://datasyndrome.com
