oh yes it will definitely work... it's just that I don't want to write a java wrapper around PigServer. I would rather have a solution that works with a plain vanilla pig installation...
J

On 8 Dec 2010, at 16:41, Ashutosh Chauhan wrote:

> You didn't mention why PigServer.openIterator() won't work for you.
> One of its use cases is what you are describing. It will avoid the need
> of writing a ruby wrapper.
>
> Ashutosh
>
> On Tue, Dec 7, 2010 at 10:26, Jae Lee <[email protected]> wrote:
>> yeah I came across the openIterator(alias) on PigServer.
>>
>> basically that's what I'd like to get (a dump of the alias and nothing else)
>> when I execute a pig script.
>>
>> I'm currently writing a ruby wrapper that will STORE the alias into a
>> temporary location in hdfs and then do a Hadoop file fetch.
>> any better idea?
>>
>> J
>>
>> On 7 Dec 2010, at 18:16, Ashutosh Chauhan wrote:
>>
>>> I am not sure if I understood your requirements clearly, but if you
>>> are not looking for a pure PigLatin solution and can work through
>>> Pig's java api, then you may want to look at PigServer.
>>> http://pig.apache.org/docs/r0.7.0/api/org/apache/pig/PigServer.html
>>> Something along the following lines:
>>>
>>> PigServer pig = new PigServer(pc, true);
>>> pig.registerQuery("A = load 'mydata';");
>>> pig.registerQuery("B = filter A by $0 > 10;");
>>> Iterator<Tuple> itr = pig.openIterator("B");
>>> while (itr.hasNext()) {
>>>     if (itr.next().get(0).equals(25)) {
>>>         // trigger further processing.
>>>     }
>>> }
>>>
>>> It's obviously not directly useful, but it conveys the general idea. Hope it
>>> helps.
>>>
>>> Ashutosh
>>>
>>> On Tue, Dec 7, 2010 at 06:40, Jae Lee <[email protected]> wrote:
>>>> Hi,
>>>>
>>>> In our application Hive is used as a database, i.e. a result set from a
>>>> select query is consumed outside of the hadoop cluster.
>>>>
>>>> The consumption process is not Hadoop friendly, in that it is network
>>>> bound, not cpu/disk bound.
>>>>
>>>> I'm in the process of converting the hive query into a pig query to see
>>>> if it reads better.
>>>>
>>>> What I'm stuck at is picking out the content of a specific alias dump from
>>>> all the other stuff being logged, to be able to trigger a further process.
>>>>
>>>> STREAM <alias> THROUGH <cmd> seems to be one way to trigger a process;
>>>> it's just that it seems unsuitable for the kind of process we are
>>>> looking at, because the <cmd> gets run in the hadoop cluster.
>>>>
>>>> any thoughts?
>>>>
>>>> J
