oh yes it will definitely work... it's just that I don't want to write a java wrapper around PigServer. I would rather have a solution that works with a plain vanilla pig installation...
J

On 8 Dec 2010, at 16:41, Ashutosh Chauhan wrote:

> You didn't mention why PigServer.openIterator() won't work for you.
> One of its use cases is what you are describing. It will avoid the need
> of writing a ruby wrapper.
>
> Ashutosh
>
> On Tue, Dec 7, 2010 at 10:26, Jae Lee <[email protected]> wrote:
>> yeah I came across the openIterator(alias) on PigServer.
>>
>> basically that's what I'd like to get (a dump of the alias and nothing else)
>> when I execute a pig script.
>>
>> I'm currently writing a ruby wrapper that will STORE the alias into a
>> temporary location in hdfs and then do a Hadoop file fetch.
>> any better idea?
>>
>> J
>>
>> On 7 Dec 2010, at 18:16, Ashutosh Chauhan wrote:
>>
>>> I am not sure if I understood your requirements clearly, but if you
>>> are not looking for a pure PigLatin solution and can work through
>>> Pig's java api, then you may want to look at PigServer.
>>> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
>>> Something along the following lines:
>>>
>>> PigServer pig = new PigServer(pc, true);
>>> pig.registerQuery("A = load 'mydata';");
>>> pig.registerQuery("B = filter A by $0 > 10;");
>>> Iterator<Tuple> itr = pig.openIterator("B");
>>> while (itr.hasNext()) {
>>>     if (itr.next().get(0).equals(25)) {
>>>         // trigger further processing.
>>>     }
>>> }
>>>
>>> It's obviously not directly useful, but it conveys the general idea. Hope it
>>> helps.
>>>
>>> Ashutosh
>>>
>>> On Tue, Dec 7, 2010 at 06:40, Jae Lee <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> In our application Hive is used as a database, i.e. a result set from a
>>>> select query is consumed outside of the hadoop cluster.
>>>>
>>>> The consumption process is not Hadoop friendly, in that it is network
>>>> bound, not cpu/disk bound.
>>>>
>>>> I'm in the process of converting the hive query into a pig query to see
>>>> if it reads better.
>>>>
>>>> What I'm stuck at is picking out the content of a specific alias dump from
>>>> all the other stuff being logged, to be able to trigger a further process.
>>>>
>>>> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process;
>>>> it's just that it seems unsuitable for the kind of process we are
>>>> looking at, because the <cmd> gets run in the hadoop cluster.
>>>>
>>>> any thoughts?
>>>>
>>>> J
