Yeah... But that doesn't help when I want to write a Pig library for you. It also doesn't help when I want to write a Pig script that calls your library in the middle and then passes the result to something that Jake wrote. Pig's optimizer can't build a complete data flow across that composite program.
It does help a bit with the problem of, say, iterating over files in a directory.

My preference is languages like FlumeJava, which start with Java and use a builder-style API to inject the data flow specification.

On Mon, Oct 31, 2011 at 12:54 PM, Dan Brickley <[email protected]> wrote:

> On 31 October 2011 20:22, Ted Dunning <[email protected]> wrote:
> > On Mon, Oct 31, 2011 at 12:00 PM, Dan Brickley <[email protected]> wrote:
> >
> >> On 31 October 2011 17:27, Ted Dunning <[email protected]> wrote:
> >> > I think this would be very interesting to see. Whether it should be part
> >> > of Mahout or a separate project is an open question.
> >> >
> >> > PIG, is, unfortunately not a real language in the sense of turing
> >> > completion or extensibility. It is good at what it does, but not at being
> >> > extended to do more.
> >>
> >> ...although you can call out to functions defined in Java, Python etc.
> >> This doesn't make the top level language into a programming language,
> >> though. Was that your point, Ted?
> >>
> > Yes. That was the point. Calling out is different from being able to
> > control the process from the outside in.
>
> I've just found http://wiki.apache.org/pig/TuringCompletePig which has
> copious notes on ways to address this. Excerpting a little:
>
> """Pig Latin is a data flow language. As such it does not offer users
> control flow and modularity features that are present in general
> purpose programming languages, including functions, modules, loops,
> and branches. Given that it is a data flow language adding these
> constructs is neither straightforward nor reasonable. However, users
> do want to be able to integrate standard programming techniques of
> separation and code sharing offered by functions and modules as well
> as integration of control flow offered by functions, loops, and
> branches. This document proposes a way to accomplish these goals while
> preserving Pig Latin's data flow orientation."""
>
> Spoiler alert (wiki page has a lot more detail). Plan seems to be
> combination of macros (which are now in the language) and "second part
> of the proposal is to embed Pig Latin scripts in the host scripting
> language via a JDBC like compile, bind, run model. "
>
> I'm not sure how far along that part is...
>
> Dan
>
> ps. the following 3 links have everything I attempted before with
> Pig/Mahout integration; not a lot, but it left me intrigued and
> frustrated in equal measure.
>
> http://www.mail-archive.com/[email protected]/msg02848.html
> https://gist.github.com/1192831
> http://search-lucene.com/m/IOfRIc6wGq1&subj=+Unknown+program+chosen+Valid+program+names+are+truncated+list+from+Hadoop+program+driver
>
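
A minimal sketch of the builder-style idea mentioned at the top of this message, in plain Java. This is not FlumeJava's actual API: the names Flow, read, map, filter, plan and run are invented for illustration, and the in-memory lists stand in for real distributed inputs. The point it tries to show is that library functions written by different people can each take and return a deferred Flow, so the whole composite data flow is recorded in one place before anything executes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;
import java.util.function.Supplier;

// Hypothetical deferred collection: each call records a stage and returns a
// new Flow. Nothing is computed until run() is called.
final class Flow<T> {
    private final List<String> plan;        // recorded stages, in order
    private final Supplier<List<T>> thunk;  // deferred computation

    private Flow(List<String> plan, Supplier<List<T>> thunk) {
        this.plan = plan;
        this.thunk = thunk;
    }

    // Source stage; the in-memory list is an assumption standing in for real input.
    static <T> Flow<T> read(String name, List<T> rows) {
        List<String> plan = new ArrayList<>();
        plan.add("read(" + name + ")");
        return new Flow<T>(plan, () -> new ArrayList<T>(rows));
    }

    <R> Flow<R> map(String label, Function<T, R> fn) {
        Supplier<List<T>> upstream = thunk;
        return new Flow<R>(extend("map(" + label + ")"), () -> {
            List<R> out = new ArrayList<>();
            for (T row : upstream.get()) {
                out.add(fn.apply(row));
            }
            return out;
        });
    }

    Flow<T> filter(String label, Predicate<T> keep) {
        Supplier<List<T>> upstream = thunk;
        return new Flow<T>(extend("filter(" + label + ")"), () -> {
            List<T> out = new ArrayList<>();
            for (T row : upstream.get()) {
                if (keep.test(row)) {
                    out.add(row);
                }
            }
            return out;
        });
    }

    // The complete recorded data flow, visible to a (hypothetical) optimizer.
    List<String> plan() {
        return Collections.unmodifiableList(plan);
    }

    // Executes the recorded stages.
    List<T> run() {
        return thunk.get();
    }

    private List<String> extend(String step) {
        List<String> next = new ArrayList<>(plan);
        next.add(step);
        return next;
    }
}

final class Demo {
    // Stand-in for "your library": adds a stage without running anything.
    static Flow<String> normalize(Flow<String> in) {
        return in.map("trim+lowercase", s -> s.trim().toLowerCase());
    }

    // Stand-in for "something that Jake wrote".
    static Flow<String> dropEmpty(Flow<String> in) {
        return in.filter("nonEmpty", s -> !s.isEmpty());
    }

    static Flow<String> pipeline(List<String> rows) {
        return dropEmpty(normalize(Flow.read("lines", rows)));
    }

    public static void main(String[] args) {
        Flow<String> flow = pipeline(Arrays.asList(" Pig ", "  ", "Mahout"));
        System.out.println(flow.plan()); // the full plan, before execution
        System.out.println(flow.run());  // only now is any work done
    }
}
```

Because normalize and dropEmpty only append stages, the plan returned by pipeline spans both of them. That is the "control the process from the outside in" property: the host program holds the whole flow, rather than a script calling out to opaque functions.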
