+1

2012/6/15 Alan Gates <[email protected]>

> Thanks Russell.  I move we make you the official Apache Pig secretary. :)
>
> Alan.
>
> On Jun 12, 2012, at 9:45 PM, Russell Jurney wrote:
>
> > Tuesday, Pig Meetup
> >
> > Alan Gates - upcoming improvements in operators/backend physical plan.
> > Despaghettification.
> > Reworking UDF interface, keep backward compatibility.
> > Hadoop 2 coming, will be slow adoption.
> >
> > Bill Graham, Julien & Twitter - Optimization oriented. Cluster is at
> > capacity. Detect skew, cost based optimizers, dynamic tuning. Gathering
> > performance metrics, will be in HCatalog. Look at previous executions of
> > same job to optimize on the fly.
> >
> > Companies: Yahoo, consultants, Salesforce, Twitter, Hortonworks, Cloudera,
> > Zocalo Systems?, Trend Micro
> >
> > Bill presented Ambrose. Motivation: Pig scripts with 40 MR jobs. Added DAG view.
> > Shows you progress of your script as percentage and stepwise view. Helps
> > with debug, optimization. Major progress.
> >
> > Pig users talk - using Pig in local mode on sample, then pushing to
> > cluster. Using illustrate to cut developer iterations. No counters in local
> > mode. Embedded Pig in loops for ML. Java embedding.
> > Java API PigServer to run scripts from apps. Macros are helping remove ugly
> > blocks of code, but UDFs are more solved by JRuby. Mortar Data fixed Python
> > UDFs.
> >
> > Reducing friction around using Pig with tools is important. Slowness of
> > batch is hard for new users. It is hard to prepare a sample that will do
> > joins. Illustrate was invented for this purpose.
> >
> > Scheduling Pig jobs is still a problem. Oozie is unpopular and too hard.
> > Azkaban is inadequate for the enterprise. People hack things together. It
> > sucks.
> >
> > HCatalog is maturing. REST API. Hive and Pig together. REST interface is
> > for metadata so far. People want to extend it to grab UDFs, etc.
> >
> > Russell Jurney http://datasyndrome.com
>
>
