Yeah, I came across openIterator(alias) on PigServer.

Basically that's what I'd like to get (a dump of the alias and nothing else) 
when I execute a Pig script.

I'm currently writing a Ruby wrapper that will STORE the alias into a 
temporary location in HDFS and then fetch the file from Hadoop.
Any better ideas?
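
A minimal sketch of that STORE-then-fetch idea, written in Java to match the snippet quoted below rather than in Ruby. The class name, the helper names and the "pig-out-" directory prefix are all hypothetical. It only builds the extra STORE statement and the `hadoop fs -cat` command line; actually running Pig and the fetch is left to the wrapper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.UUID;

// Hypothetical helper: none of these names come from Pig or Hadoop.
class StoreThenFetch {

    // Picks a unique output directory so concurrent runs do not collide.
    // The "pig-out-" prefix is an arbitrary choice for this sketch.
    static String tempDir(String base) {
        return base + "/pig-out-" + UUID.randomUUID();
    }

    // Appends a STORE statement so the alias is written to outputDir
    // (default PigStorage, tab-delimited) instead of being DUMPed into the logs.
    static String withStore(String script, String alias, String outputDir) {
        return script.trim() + "\nSTORE " + alias + " INTO '" + outputDir + "';\n";
    }

    // Command line that streams the stored part files back to the client.
    // The glob is expanded by `hadoop fs` itself, so no shell is needed.
    static List<String> fetchCommand(String outputDir) {
        return Arrays.asList("hadoop", "fs", "-cat", outputDir + "/part-*");
    }

    public static void main(String[] args) {
        String dir = tempDir("/tmp");
        String script = "A = load 'mydata';\nB = filter A by $0 > 10;";
        System.out.print(withStore(script, "B", dir));
        System.out.println(String.join(" ", fetchCommand(dir)));
    }
}
```

The wrapper would write the augmented script to a file, run it with pig, pass the fetch command to something like ProcessBuilder, and remove the temporary directory afterwards.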

J
On 7 Dec 2010, at 18:16, Ashutosh Chauhan wrote:

> I am not sure if I understood your requirements clearly, but if you
> are not looking for a pure PigLatin solution and can work through
> Pig's java api, then you may want to look at PigServer.
> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
> Something along the following lines:
> 
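> // Needs java.util.Iterator, org.apache.pig.PigServer, org.apache.pig.data.Tuple.
> // pc is an existing PigContext; true connects right away.
> PigServer pig = new PigServer(pc, true);
> pig.registerQuery("A = load 'mydata';");
> pig.registerQuery("B = filter A by $0 > 10;");
> Iterator<Tuple> itr = pig.openIterator("B");
> while (itr.hasNext()) {
>   // Compare as text: without a declared schema the field may not be an Integer.
>   if ("25".equals(String.valueOf(itr.next().get(0)))) {
>     // trigger further processing.
>   }
> }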
> 
> It's obviously not directly useful, but it conveys the general idea. Hope it 
> helps.
> 
> Ashutosh
> On Tue, Dec 7, 2010 at 06:40, Jae Lee <[email protected]> wrote:
>> Hi,
>> 
>> In our application Hive is used as a database, i.e. the result set of a 
>> select query is consumed outside of the Hadoop cluster.
>> 
>> The consumption process is not Hadoop-friendly, in that it is network-bound 
>> rather than CPU/disk-bound.
>> 
>> I'm in the process of converting a Hive query into a Pig query to see if it 
>> reads better.
>> 
>> What I'm stuck on is picking out the contents of a specific alias dump from 
>> all the other output being logged, so that I can trigger further processing.
>> 
>> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process, but 
>> it doesn't seem suitable for the kind of process we are looking at, because 
>> the <cmd> runs inside the Hadoop cluster.
>> 
>> Any thoughts?
>> 
>> J
> 
