Yeah...

But that doesn't help when I want to write a Pig library for you.  It also
doesn't help when I want to write a Pig script that calls your library
in the middle and then passes the result to something that Jake
wrote.  Pig's optimizer can't build a complete data flow across that
composite program.

It does help a bit with the problem of, say, iterating over files in a
directory.

My preference is for languages like FlumeJava, which start with Java and
use a builder-style API to inject the data flow specification.
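
To make that concrete, here is a toy sketch of the builder idea, in
Python rather than Java just to keep it short.  It is not FlumeJava's
API; Flow, your_library and jakes_library are names made up for
illustration.  Each call only records a step, so code from several
authors composes into one plan that an optimizer could inspect before
anything runs.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Flow:
    """A deferred data flow: a source plus the steps recorded so far (hypothetical class)."""
    source: Tuple
    steps: Tuple = ()

    def map(self, fn):
        # Record the step; nothing is executed yet.
        return Flow(self.source, self.steps + (("map", fn),))

    def filter(self, pred):
        # Record the step; nothing is executed yet.
        return Flow(self.source, self.steps + (("filter", pred),))

    def plan(self):
        # The complete plan is visible here, across library boundaries.
        return [kind for kind, _ in self.steps]

    def run(self):
        # Execute every recorded step in a single pass over the source.
        out = []
        for record in self.source:
            keep = True
            for kind, fn in self.steps:
                if kind == "map":
                    record = fn(record)
                elif not fn(record):
                    keep = False
                    break
            if keep:
                out.append(record)
        return out


def your_library(flow):
    # Stand-in for "your library": it extends the flow, it does not run it.
    return flow.map(str.lower).filter(lambda w: len(w) > 2)


def jakes_library(flow):
    # Stand-in for "something that Jake wrote".
    return flow.map(lambda w: (w, len(w)))


# My script: my own step, then your library, then Jake's.
flow = jakes_library(your_library(Flow(("Pig", "is", "Good", "at", "Flows")).map(str.strip)))
print(flow.plan())
print(flow.run())
```

The point is that plan() sees all four steps, no matter who wrote
them.  A real system would rewrite that plan (fusing the maps, say)
before executing it.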

On Mon, Oct 31, 2011 at 12:54 PM, Dan Brickley <[email protected]> wrote:

> On 31 October 2011 20:22, Ted Dunning <[email protected]> wrote:
> > On Mon, Oct 31, 2011 at 12:00 PM, Dan Brickley <[email protected]> wrote:
> >
> >> On 31 October 2011 17:27, Ted Dunning <[email protected]> wrote:
> >> > I think this would be very interesting to see.  Whether it should be
> >> > part of Mahout or a separate project is an open question.
> >> >
> >> > Pig is, unfortunately, not a real language in the sense of Turing
> >> > completeness or extensibility.  It is good at what it does, but not at
> >> > being extended to do more.
> >>
> >> ...although you can call out to functions defined in Java, Python etc.
> >> This doesn't make the top level language into a programming language,
> >> though. Was that your point, Ted?
> >
> > Yes.  That was the point.  Calling out is different from being able to
> > control the process from the outside in.
>
> I've just found http://wiki.apache.org/pig/TuringCompletePig which has
> copious notes on ways to address this. Excerpting a little:
>
> """Pig Latin is a data flow language. As such it does not offer users
> control flow and modularity features that are present in general
> purpose programming languages, including functions, modules, loops,
> and branches. Given that it is a data flow language adding these
> constructs is neither straightforward nor reasonable. However, users
> do want to be able to integrate standard programming techniques of
> separation and code sharing offered by functions and modules as well
> as integration of control flow offered by functions, loops, and
> branches. This document proposes a way to accomplish these goals while
> preserving Pig Latin's data flow orientation."""
>
> Spoiler alert (the wiki page has a lot more detail): the plan seems to be
> a combination of macros (which are now in the language) and embedding.
> The "second part of the proposal is to embed Pig Latin scripts in the
> host scripting language via a JDBC like compile, bind, run model."
>
> I'm not sure how far along that part is...
>
> Dan
>
> ps. the following 3 links have everything I attempted before with
> Pig/Mahout integration; not a lot, but it left me intrigued and
> frustrated in equal measure.
>
> http://www.mail-archive.com/[email protected]/msg02848.html
> https://gist.github.com/1192831
>
> http://search-lucene.com/m/IOfRIc6wGq1&subj=+Unknown+program+chosen+Valid+program+names+are+truncated+list+from+Hadoop+program+driver
>
