Re: On pre/post Application/Superstep contract

2011-10-01 Thread Avery Ching
Can you show me an example of the inner Context class idea?  Sounds 
interesting...


Another question is whether to have the 
(pre|post)(Application|Superstep)() methods executed once as an aggregate 
and passed to the workers, or executed per worker.  I think the former 
might be a little expensive, depending on how big the "Context" is.  
Perhaps executed per worker makes the most sense.  Any other thoughts?


Maybe aggregator methods would be useful as well, say to do things like 
write the aggregators for the entire application every now and then.  
That would probably get executed on the master.  I think the current 
uses of the (pre|post)(Application|Superstep)() methods are fine in the 
per-worker specific way of thinking.


Avery

On 10/1/11 7:06 AM, Jake Mannix wrote:
On Sat, Oct 1, 2011 at 2:29 AM, Hyunsik Choi wrote:

Now, that way looks good. Probably, later we could improve that like
Context of MapReduce.


ooh!  I really like that suggestion, actually.  If every BasicVertex has an
inner Context class, we can allow user applications to define/extend their
Context and we can avoid even doing any of this setClass() and reflection
based stuff, if we do it right.  Typesafe context object FTW!

  -jake


--
Hyunsik Choi
Database Lab, Korea University

On Sat, Oct 1, 2011 at 3:01 AM, Avery Ching <ach...@apache.org> wrote:
> It isn't visible (purposefully) since it is internal state.
>
> That being said, I believe this type of functionality would be useful.
>  Right now there is a lot of ugly static variables stored in Vertex
> implementations because of it.  Perhaps we should add another method in
> GiraphJob
>
> final public void setWorkerObjectClass(Class
> workerObjectClass);
>
> Then in BasicVertex
>
> public void preApplication(Configurable workerObject);
> public void postApplication(Configurable workerObject);
> public void preSuperstep(Configurable workerObject);
> public void postSuperstep(Configurable workerObject);
> public Configurable getWorkerObject();
>
> Anyone else think of a cleaner way to do it?
>
> Avery
>
> On 9/30/11 8:42 AM, Claudio Martella wrote:
>>
>> afaik getGraphState() is not visible to my object. Or?
>>
>> On Fri, Sep 30, 2011 at 5:23 PM, Jake Mannix <jake.man...@gmail.com>
>>  wrote:
>>>
>>> Remember that there's already a "singleton"-like object available to all
>>> vertices: the GraphState object, which has a handle on the GraphMapper.
>>> Maybe this is the right place to get your handle on the
>>> FSDataOutputStream?
>>>   -jake
>>> On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella
>>> <claudio.marte...@gmail.com>  wrote:

 Hello,

 to my understanding pre/post Application/Superstep methods are called
 ONCE on a "fake" vertex on each worker (the so called
 representativeVertex). This means that these methods should not depend
 on any specific-vertex data.

 As I'm trying to sort out my Emitter, I thought I could create one
 FSDataOutputStream per worker which each Vertex belonging to that
 worker could share (which would be even thread-safe as each worker is
 not parallel).

 The questions are:

 1) how to share the FSDataOutputFormat object created at
 preApplication() (and closed at postApplication()) created by this
 representativeVertex?

 2) about the filename, I'd be happy to have access to the Worker Id so
 to create an outputfile filename as with happens with reducers and
 part files by FileOutputFormat (i.e.-workerid).


 The "best" idea i have in my mind right now is to use the calling
 vertex (the representativeVertex) hashCode as the id, and create an
 external Singleton where i can request register and request the
 outputfiles similarly to what happens with Aggregators now, and by
 passing the *this* reference as an index to this map. Any better idea?
 :)


 --
 Claudio Martella
 claudio.marte...@gmail.com 
>>>
>>
>>
>
>






Re: On pre/post Application/Superstep contract

2011-10-01 Thread Jake Mannix
On Sat, Oct 1, 2011 at 2:29 AM, Hyunsik Choi  wrote:

> Now, that way looks good. Probably, later we could improve that like
> Context
> of MapReduce.
>

ooh!  I really like that suggestion, actually.  If every BasicVertex has an
inner Context class, we can allow user applications to define/extend their
Context and we can avoid even doing any of this setClass() and reflection
based stuff, if we do it right.  Typesafe context object FTW!
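
A minimal sketch of what the inner Context idea might look like, assuming
the vertex base class is parameterized by its context type. Every name here
(SketchVertex, EmittingVertex, createContext, runWorker) is a hypothetical
stand-in and not Giraph API; it only illustrates how a typed context could
replace a setClass() call plus reflection.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: none of these types are real Giraph classes.
class TypedContextSketch {

  /** Stand-in for BasicVertex, parameterized by the worker-level context type C. */
  abstract static class SketchVertex<C> {
    /** The vertex type supplies its own context, so no setClass() or reflection. */
    abstract C createContext();

    void preApplication(C context) { }

    void postApplication(C context) { }

    abstract void compute(C context);
  }

  /** A user vertex that declares its own inner Context class. */
  static class EmittingVertex extends SketchVertex<EmittingVertex.Context> {
    /** Per-worker state shared by every EmittingVertex on that worker. */
    static class Context {
      final List<String> emitted = new ArrayList<>();
      boolean closed;
    }

    private final String id;

    EmittingVertex(String id) {
      this.id = id;
    }

    @Override
    Context createContext() {
      return new Context();
    }

    @Override
    void compute(Context context) {
      context.emitted.add(id); // typed access, no cast needed
    }

    @Override
    void postApplication(Context context) {
      context.closed = true;
    }
  }

  /** Stand-in for one worker: builds one context and shares it with all vertices. */
  static <C> C runWorker(SketchVertex<C> representative,
                         List<? extends SketchVertex<C>> vertices) {
    C context = representative.createContext();
    representative.preApplication(context);
    for (SketchVertex<C> vertex : vertices) {
      vertex.compute(context);
    }
    representative.postApplication(context);
    return context;
  }

  public static void main(String[] args) {
    List<EmittingVertex> vertices =
        Arrays.asList(new EmittingVertex("a"), new EmittingVertex("b"));
    EmittingVertex.Context context =
        runWorker(new EmittingVertex("representative"), vertices);
    System.out.println(context.emitted + " closed=" + context.closed);
  }
}
```

The point of the type parameter is that compute() and the hooks see the
user's own Context type directly, so a mismatch would be a compile error
rather than a ClassCastException at runtime.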

  -jake


>
> --
> Hyunsik Choi
> Database Lab, Korea University
>
> On Sat, Oct 1, 2011 at 3:01 AM, Avery Ching  wrote:
> > It isn't visible (purposefully) since it is internal state.
> >
> > That being said, I believe this type of functionality would be useful.
> >  Right now there is a lot of ugly static variables stored in Vertex
> > implementations because of it.  Perhaps we should add another method in
> > GiraphJob
> >
> > final public void setWorkerObjectClass(Class
> > workerObjectClass);
> >
> > Then in BasicVertex
> >
> > public void preApplication(Configurable workerObject);
> > public void postApplication(Configurable workerObject);
> > public void preSuperstep(Configurable workerObject);
> > public void postSuperstep(Configurable workerObject);
> > public Configurable getWorkerObject();
> >
> > Anyone else think of a cleaner way to do it?
> >
> > Avery
> >
> > On 9/30/11 8:42 AM, Claudio Martella wrote:
> >>
> >> afaik getGraphState() is not visible to my object. Or?
> >>
> >> On Fri, Sep 30, 2011 at 5:23 PM, Jake Mannix
> >>  wrote:
> >>>
> >>> Remember that there's already a "singleton"-like object available to
> all
> >>> vertices: the GraphState object, which has a handle on the GraphMapper.
> >>> Maybe this is the right place to get your handle on the
> >>> FSDataOutputStream?
> >>>   -jake
> >>> On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella
> >>>   wrote:
> 
>  Hello,
> 
>  to my understanding pre/post Application/Superstep methods are called
>  ONCE on a "fake" vertex on each worker (the so called
>  representativeVertex). This means that these methods should not depend
>  on any specific-vertex data.
> 
>  As I'm trying to sort out my Emitter, I thought I could create one
>  FSDataOutputStream per worker which each Vertex belonging to that
>  worker could share (which would be even thread-safe as each worker is
>  not parallel).
> 
>  The questions are:
> 
>  1) how to share the FSDataOutputFormat object created at
>  preApplication() (and closed at postApplication()) created by this
>  representativeVertex?
> 
>  2) about the filename, I'd be happy to have access to the Worker Id so
>  to create an outputfile filename as with happens with reducers and
>  part files by FileOutputFormat (i.e.-workerid).
> 
> 
>  The "best" idea i have in my mind right now is to use the calling
>  vertex (the representativeVertex) hashCode as the id, and create an
>  external Singleton where i can request register and request the
>  outputfiles similarly to what happens with Aggregators now, and by
>  passing the *this* reference as an index to this map. Any better idea?
>  :)
> 
> 
>  --
>  Claudio Martella
>  claudio.marte...@gmail.com
> >>>
> >>
> >>
> >
> >
>


Re: On pre/post Application/Superstep contract

2011-10-01 Thread Claudio Martella
yep, it looks good from Italy as well :)

On Sat, Oct 1, 2011 at 11:29 AM, Hyunsik Choi  wrote:
> Now, that way looks good. Probably, later we could improve that like Context
> of MapReduce.
>
> --
> Hyunsik Choi
> Database Lab, Korea University
>
> On Sat, Oct 1, 2011 at 3:01 AM, Avery Ching  wrote:
>> It isn't visible (purposefully) since it is internal state.
>>
>> That being said, I believe this type of functionality would be useful.
>>  Right now there is a lot of ugly static variables stored in Vertex
>> implementations because of it.  Perhaps we should add another method in
>> GiraphJob
>>
>> final public void setWorkerObjectClass(Class
>> workerObjectClass);
>>
>> Then in BasicVertex
>>
>> public void preApplication(Configurable workerObject);
>> public void postApplication(Configurable workerObject);
>> public void preSuperstep(Configurable workerObject);
>> public void postSuperstep(Configurable workerObject);
>> public Configurable getWorkerObject();
>>
>> Anyone else think of a cleaner way to do it?
>>
>> Avery
>>
>> On 9/30/11 8:42 AM, Claudio Martella wrote:
>>>
>>> afaik getGraphState() is not visible to my object. Or?
>>>
>>> On Fri, Sep 30, 2011 at 5:23 PM, Jake Mannix
>>>  wrote:

 Remember that there's already a "singleton"-like object available to all
 vertices: the GraphState object, which has a handle on the GraphMapper.
 Maybe this is the right place to get your handle on the
 FSDataOutputStream?
   -jake
 On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella
   wrote:
>
> Hello,
>
> to my understanding pre/post Application/Superstep methods are called
> ONCE on a "fake" vertex on each worker (the so called
> representativeVertex). This means that these methods should not depend
> on any specific-vertex data.
>
> As I'm trying to sort out my Emitter, I thought I could create one
> FSDataOutputStream per worker which each Vertex belonging to that
> worker could share (which would be even thread-safe as each worker is
> not parallel).
>
> The questions are:
>
> 1) how to share the FSDataOutputFormat object created at
> preApplication() (and closed at postApplication()) created by this
> representativeVertex?
>
> 2) about the filename, I'd be happy to have access to the Worker Id so
> to create an outputfile filename as with happens with reducers and
> part files by FileOutputFormat (i.e.-workerid).
>
>
> The "best" idea i have in my mind right now is to use the calling
> vertex (the representativeVertex) hashCode as the id, and create an
> external Singleton where i can request register and request the
> outputfiles similarly to what happens with Aggregators now, and by
> passing the *this* reference as an index to this map. Any better idea?
> :)
>
>
> --
>     Claudio Martella
>     claudio.marte...@gmail.com

>>>
>>>
>>
>>
>



-- 
    Claudio Martella
    claudio.marte...@gmail.com


Re: On pre/post Application/Superstep contract

2011-10-01 Thread Hyunsik Choi
Now, that way looks good. Probably, later we could improve on it along the
lines of MapReduce's Context.

--
Hyunsik Choi
Database Lab, Korea University

On Sat, Oct 1, 2011 at 3:01 AM, Avery Ching  wrote:
> It isn't visible (purposefully) since it is internal state.
>
> That being said, I believe this type of functionality would be useful.
>  Right now there is a lot of ugly static variables stored in Vertex
> implementations because of it.  Perhaps we should add another method in
> GiraphJob
>
> final public void setWorkerObjectClass(Class
> workerObjectClass);
>
> Then in BasicVertex
>
> public void preApplication(Configurable workerObject);
> public void postApplication(Configurable workerObject);
> public void preSuperstep(Configurable workerObject);
> public void postSuperstep(Configurable workerObject);
> public Configurable getWorkerObject();
>
> Anyone else think of a cleaner way to do it?
>
> Avery
>
> On 9/30/11 8:42 AM, Claudio Martella wrote:
>>
>> afaik getGraphState() is not visible to my object. Or?
>>
>> On Fri, Sep 30, 2011 at 5:23 PM, Jake Mannix
>>  wrote:
>>>
>>> Remember that there's already a "singleton"-like object available to all
>>> vertices: the GraphState object, which has a handle on the GraphMapper.
>>> Maybe this is the right place to get your handle on the
>>> FSDataOutputStream?
>>>   -jake
>>> On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella
>>>   wrote:

 Hello,

 to my understanding pre/post Application/Superstep methods are called
 ONCE on a "fake" vertex on each worker (the so called
 representativeVertex). This means that these methods should not depend
 on any specific-vertex data.

 As I'm trying to sort out my Emitter, I thought I could create one
 FSDataOutputStream per worker which each Vertex belonging to that
 worker could share (which would be even thread-safe as each worker is
 not parallel).

 The questions are:

 1) how to share the FSDataOutputFormat object created at
 preApplication() (and closed at postApplication()) created by this
 representativeVertex?

 2) about the filename, I'd be happy to have access to the Worker Id so
 to create an outputfile filename as with happens with reducers and
 part files by FileOutputFormat (i.e.-workerid).


 The "best" idea i have in my mind right now is to use the calling
 vertex (the representativeVertex) hashCode as the id, and create an
 external Singleton where i can request register and request the
 outputfiles similarly to what happens with Aggregators now, and by
 passing the *this* reference as an index to this map. Any better idea?
 :)


 --
     Claudio Martella
     claudio.marte...@gmail.com
>>>
>>
>>
>
>


Re: On pre/post Application/Superstep contract

2011-09-30 Thread Avery Ching

It isn't visible (purposefully) since it is internal state.

That being said, I believe this type of functionality would be useful.  
Right now there are a lot of ugly static variables stored in Vertex 
implementations because of it.  Perhaps we should add another method in 
GiraphJob


final public void setWorkerObjectClass(Class 
workerObjectClass);


Then in BasicVertex

public void preApplication(Configurable workerObject);
public void postApplication(Configurable workerObject);
public void preSuperstep(Configurable workerObject);
public void postSuperstep(Configurable workerObject);
public Configurable getWorkerObject();
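
A rough sketch of how this proposal could fit together, under stated
assumptions: SketchJob and SketchVertex are hypothetical stand-ins for
GiraphJob and BasicVertex, the Configurable interface is a local stand-in
for Hadoop's so the example runs without Hadoop on the classpath, and the
bound on the Class parameter is a guess. Only two of the four hooks are
shown.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: stand-ins only, not the real GiraphJob or BasicVertex.
class WorkerObjectSketch {

  /** Local stand-in for Hadoop's Configurable so the sketch needs no Hadoop jars. */
  interface Configurable {
    void setConf(Map<String, String> conf);
  }

  /** Stand-in for GiraphJob with the proposed setter. */
  static class SketchJob {
    final Map<String, String> conf = new HashMap<>();
    private Class<? extends Configurable> workerObjectClass;

    // The generic bound on Class is an assumption made for this sketch.
    final void setWorkerObjectClass(Class<? extends Configurable> workerObjectClass) {
      this.workerObjectClass = workerObjectClass;
    }

    /** What each worker might do once at startup: instantiate and configure. */
    Configurable newWorkerObject() {
      try {
        Configurable workerObject =
            workerObjectClass.getDeclaredConstructor().newInstance();
        workerObject.setConf(conf);
        return workerObject;
      } catch (ReflectiveOperationException e) {
        throw new IllegalStateException("cannot create worker object", e);
      }
    }
  }

  /** Stand-in for BasicVertex with two of the proposed hooks. */
  static class SketchVertex {
    void preSuperstep(Configurable workerObject) { }

    void postSuperstep(Configurable workerObject) { }
  }

  /** Example worker object: state that would otherwise live in a static field. */
  static class SuperstepCounter implements Configurable {
    String label;
    int supersteps;

    @Override
    public void setConf(Map<String, String> conf) {
      label = conf.getOrDefault("label", "unnamed");
    }
  }

  static class CountingVertex extends SketchVertex {
    @Override
    void preSuperstep(Configurable workerObject) {
      // The cast is the price of the untyped Configurable parameter.
      ((SuperstepCounter) workerObject).supersteps++;
    }
  }

  public static void main(String[] args) {
    SketchJob job = new SketchJob();
    job.conf.put("label", "demo");
    job.setWorkerObjectClass(SuperstepCounter.class);

    Configurable workerObject = job.newWorkerObject(); // once per worker
    SketchVertex representative = new CountingVertex();
    for (int superstep = 0; superstep < 3; superstep++) {
      representative.preSuperstep(workerObject);
      representative.postSuperstep(workerObject);
    }
    SuperstepCounter counter = (SuperstepCounter) workerObject;
    System.out.println(counter.label + " " + counter.supersteps);
  }
}
```
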

Anyone else think of a cleaner way to do it?

Avery

On 9/30/11 8:42 AM, Claudio Martella wrote:

afaik getGraphState() is not visible to my object. Or?

On Fri, Sep 30, 2011 at 5:23 PM, Jake Mannix  wrote:

Remember that there's already a "singleton"-like object available to all
vertices: the GraphState object, which has a handle on the GraphMapper.
Maybe this is the right place to get your handle on the FSDataOutputStream?
   -jake
On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella
  wrote:

Hello,

to my understanding pre/post Application/Superstep methods are called
ONCE on a "fake" vertex on each worker (the so called
representativeVertex). This means that these methods should not depend
on any specific-vertex data.

As I'm trying to sort out my Emitter, I thought I could create one
FSDataOutputStream per worker which each Vertex belonging to that
worker could share (which would be even thread-safe as each worker is
not parallel).

The questions are:

1) how to share the FSDataOutputFormat object created at
preApplication() (and closed at postApplication()) created by this
representativeVertex?

2) about the filename, I'd be happy to have access to the Worker Id so
to create an outputfile filename as with happens with reducers and
part files by FileOutputFormat (i.e.-workerid).


The "best" idea i have in my mind right now is to use the calling
vertex (the representativeVertex) hashCode as the id, and create an
external Singleton where i can request register and request the
outputfiles similarly to what happens with Aggregators now, and by
passing the *this* reference as an index to this map. Any better idea?
:)


--
 Claudio Martella
 claudio.marte...@gmail.com









Re: On pre/post Application/Superstep contract

2011-09-30 Thread Claudio Martella
afaik getGraphState() is not visible to my object. Or?

On Fri, Sep 30, 2011 at 5:23 PM, Jake Mannix  wrote:
> Remember that there's already a "singleton"-like object available to all
> vertices: the GraphState object, which has a handle on the GraphMapper.
> Maybe this is the right place to get your handle on the FSDataOutputStream?
>   -jake
> On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella
>  wrote:
>>
>> Hello,
>>
>> to my understanding pre/post Application/Superstep methods are called
>> ONCE on a "fake" vertex on each worker (the so called
>> representativeVertex). This means that these methods should not depend
>> on any specific-vertex data.
>>
>> As I'm trying to sort out my Emitter, I thought I could create one
>> FSDataOutputStream per worker which each Vertex belonging to that
>> worker could share (which would be even thread-safe as each worker is
>> not parallel).
>>
>> The questions are:
>>
>> 1) how to share the FSDataOutputFormat object created at
>> preApplication() (and closed at postApplication()) created by this
>> representativeVertex?
>>
>> 2) about the filename, I'd be happy to have access to the Worker Id so
>> to create an outputfile filename as with happens with reducers and
>> part files by FileOutputFormat (i.e. -workerid).
>>
>>
>> The "best" idea i have in my mind right now is to use the calling
>> vertex (the representativeVertex) hashCode as the id, and create an
>> external Singleton where i can request register and request the
>> outputfiles similarly to what happens with Aggregators now, and by
>> passing the *this* reference as an index to this map. Any better idea?
>> :)
>>
>>
>> --
>>     Claudio Martella
>>     claudio.marte...@gmail.com
>
>



-- 
    Claudio Martella
    claudio.marte...@gmail.com


Re: On pre/post Application/Superstep contract

2011-09-30 Thread Jake Mannix
Remember that there's already a "singleton"-like object available to all
vertices: the GraphState object, which has a handle on the GraphMapper.

Maybe this is the right place to get your handle on the FSDataOutputStream?

  -jake

On Fri, Sep 30, 2011 at 7:25 AM, Claudio Martella <
claudio.marte...@gmail.com> wrote:

> Hello,
>
> to my understanding pre/post Application/Superstep methods are called
> ONCE on a "fake" vertex on each worker (the so called
> representativeVertex). This means that these methods should not depend
> on any specific-vertex data.
>
> As I'm trying to sort out my Emitter, I thought I could create one
> FSDataOutputStream per worker which each Vertex belonging to that
> worker could share (which would be even thread-safe as each worker is
> not parallel).
>
> The questions are:
>
> 1) how to share the FSDataOutputFormat object created at
> preApplication() (and closed at postApplication()) created by this
> representativeVertex?
>
> 2) about the filename, I'd be happy to have access to the Worker Id so
> to create an outputfile filename as with happens with reducers and
> part files by FileOutputFormat (i.e. -workerid).
>
>
> The "best" idea i have in my mind right now is to use the calling
> vertex (the representativeVertex) hashCode as the id, and create an
> external Singleton where i can request register and request the
> outputfiles similarly to what happens with Aggregators now, and by
> passing the *this* reference as an index to this map. Any better idea?
> :)
>
>
> --
> Claudio Martella
> claudio.marte...@gmail.com
>


On pre/post Application/Superstep contract

2011-09-30 Thread Claudio Martella
Hello,

to my understanding pre/post Application/Superstep methods are called
ONCE on a "fake" vertex on each worker (the so-called
representativeVertex). This means that these methods should not depend
on any vertex-specific data.
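
To make that understanding concrete, here is a small simulation of the
call order as I read it. It is only a sketch: SketchVertex and runWorker
are hypothetical names, and the exact ordering inside a real worker is an
assumption, not something confirmed by the code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of the hook contract on a single worker.
class HookContractSketch {

  /** Stand-in vertex that records each call instead of doing real work. */
  static class SketchVertex {
    private final String name;
    private final List<String> log;

    SketchVertex(String name, List<String> log) {
      this.name = name;
      this.log = log;
    }

    void preApplication() { log.add(name + ".preApplication"); }

    void postApplication() { log.add(name + ".postApplication"); }

    void preSuperstep() { log.add(name + ".preSuperstep"); }

    void postSuperstep() { log.add(name + ".postSuperstep"); }

    void compute() { log.add(name + ".compute"); }
  }

  /** Runs one simulated worker and returns the ordered list of calls. */
  static List<String> runWorker(int supersteps, int vertexCount) {
    List<String> log = new ArrayList<>();
    // The "fake" vertex: it receives the hooks but never computes.
    SketchVertex representative = new SketchVertex("representative", log);
    List<SketchVertex> vertices = new ArrayList<>();
    for (int i = 0; i < vertexCount; i++) {
      vertices.add(new SketchVertex("v" + i, log));
    }

    representative.preApplication();       // once per worker
    for (int s = 0; s < supersteps; s++) {
      representative.preSuperstep();       // once per worker per superstep
      for (SketchVertex vertex : vertices) {
        vertex.compute();                  // once per real vertex
      }
      representative.postSuperstep();
    }
    representative.postApplication();      // once per worker
    return log;
  }

  public static void main(String[] args) {
    for (String call : runWorker(2, 2)) {
      System.out.println(call);
    }
  }
}
```

Since the hooks only ever run on the representative, anything they set up
has to reach the real vertices through some shared object, which is what
the questions below are about.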

As I'm trying to sort out my Emitter, I thought I could create one
FSDataOutputStream per worker, which each Vertex belonging to that
worker could share (this would even be thread-safe, as each worker is
not parallel).

The questions are:

1) how to share the FSDataOutputStream object created at
preApplication() (and closed at postApplication()) by this
representativeVertex?

2) about the filename, I'd be happy to have access to the Worker Id so
as to create an output filename, as happens with reducers and
part files by FileOutputFormat (i.e. -workerid).
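
For question 2, the naming could be as simple as the sketch below, which
mimics the zero-padded style of reducer part files (part-00000 and so on).
The outputName helper, the "emitter" prefix and the integer worker id are
all hypothetical; how a vertex would actually obtain its worker id is
exactly the open question.

```java
import java.util.Locale;

// Hypothetical helper: builds a part-file style name from a worker id.
class OutputNameSketch {

  /** Returns prefix-NNNNN with the worker id zero-padded to five digits. */
  static String outputName(String prefix, int workerId) {
    if (workerId < 0) {
      throw new IllegalArgumentException("workerId must be non-negative");
    }
    // Locale.ROOT keeps the digits ASCII regardless of the JVM's default locale.
    return String.format(Locale.ROOT, "%s-%05d", prefix, workerId);
  }

  public static void main(String[] args) {
    System.out.println(outputName("emitter", 7)); // prints emitter-00007
  }
}
```
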


The "best" idea I have in mind right now is to use the calling
vertex (the representativeVertex) hashCode as the id, and create an
external Singleton where I can register and request the
output files, similarly to what happens with Aggregators now, by
passing the *this* reference as an index to this map. Any better idea?
:)
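
A minimal sketch of that singleton idea, assuming a plain OutputStream in
place of FSDataOutputStream so it runs without Hadoop. OutputRegistry and
its methods are hypothetical names. It keys on reference identity via
IdentityHashMap rather than on hashCode(), since two distinct objects can
share a hash code.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical sketch of an external singleton keyed by the calling vertex.
class OutputRegistrySketch {

  /** Singleton mapping an owner object (by reference) to its output stream. */
  static final class OutputRegistry {
    private static final OutputRegistry INSTANCE = new OutputRegistry();

    // Identity semantics: the key is the reference itself, not equals()/hashCode().
    private final Map<Object, OutputStream> streams = new IdentityHashMap<>();

    private OutputRegistry() { }

    static OutputRegistry getInstance() {
      return INSTANCE;
    }

    synchronized void register(Object owner, OutputStream stream) {
      if (streams.containsKey(owner)) {
        throw new IllegalStateException("owner already registered");
      }
      streams.put(owner, stream);
    }

    /** Returns the stream registered for owner, or null if there is none. */
    synchronized OutputStream request(Object owner) {
      return streams.get(owner);
    }

    synchronized OutputStream unregister(Object owner) {
      return streams.remove(owner);
    }
  }

  /** Stand-in for the representative vertex that owns the stream's lifecycle. */
  static class SketchVertex {
    void preApplication() {
      // ByteArrayOutputStream stands in for an FSDataOutputStream here.
      OutputRegistry.getInstance().register(this, new ByteArrayOutputStream());
    }

    void postApplication() {
      OutputStream stream = OutputRegistry.getInstance().unregister(this);
      if (stream != null) {
        try {
          stream.close();
        } catch (IOException e) {
          throw new UncheckedIOException(e);
        }
      }
    }
  }

  public static void main(String[] args) throws IOException {
    SketchVertex representative = new SketchVertex();
    representative.preApplication();
    OutputStream stream = OutputRegistry.getInstance().request(representative);
    stream.write("hello".getBytes("UTF-8"));
    representative.postApplication();
    System.out.println(OutputRegistry.getInstance().request(representative) == null);
  }
}
```

This still leaves open how the ordinary vertices on a worker would get
hold of the key, since only the representativeVertex knows its own *this*.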


-- 
    Claudio Martella
    claudio.marte...@gmail.com