Yeah, I came across openIterator(alias) on PigServer. That's basically what I'd like to get (a dump of the alias and nothing else) when I execute a pig script.
I'm currently writing a ruby wrapper that will STORE the alias into a temporary location in HDFS and then do a Hadoop file fetch. Any better idea?

J

On 7 Dec 2010, at 18:16, Ashutosh Chauhan wrote:

> I am not sure if I understood your requirements clearly, but if you
> are not looking for a pure PigLatin solution and can work through
> Pig's java api, then you may want to look at PigServer.
> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
> Something along the following lines:
>
> PigServer pig = new PigServer(pc, true);
> pig.registerQuery("A = load 'mydata';");
> pig.registerQuery("B = filter A by $0 > 10;");
> Iterator<Tuple> itr = pig.openIterator("B");
> while (itr.hasNext()) {
>     if (itr.next().get(0).equals(25)) {
>         // trigger further processing.
>     }
> }
>
> It's obviously not directly useful, but conveys the general idea. Hope it
> helps.
>
> Ashutosh
>
> On Tue, Dec 7, 2010 at 06:40, Jae Lee <[email protected]> wrote:
>> Hi,
>>
>> In our application Hive is used as a database, i.e. a result set from a
>> select query is consumed outside of the hadoop cluster.
>>
>> The consumption process is not Hadoop friendly, in that it is network
>> bound, not cpu/disk bound.
>>
>> I'm in the process of converting the hive query into a pig query to see
>> if it reads better.
>>
>> What I'm stuck at is picking out the content of a specific alias dump,
>> among all the other stuff being logged, to be able to trigger a further
>> process.
>>
>> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process;
>> it's just that it seems not suitable for the kind of process we are
>> looking at, because the <cmd> gets run in the hadoop cluster.
>>
>> Any thoughts?
>>
>> J
