Re: writing/emitting to HDFS

2011-09-26 Thread Avery Ching

Hi Claudio,

I think I understand what you are trying to do: a kind of distributed
logging for debugging.  Such a feature could definitely be useful.
Aggregators might be able to do what you want, and with something like
https://issues.apache.org/jira/browse/GIRAPH-10 you might be able to
output their values not just at the end of the application but after
each superstep.


Feel free to take a crack at the issue...let's see what interfaces make 
sense.


Avery

On 9/26/11 7:03 AM, Claudio Martella wrote:

> I'm really just trying to emit "results" into an HDFS file at
> different moments of the computation. I'm thinking of
> functionality like log.debug(), to give an example, where all the
> messages are collected from different workers at different supersteps.
> At the moment I've implemented this:
>
> https://github.com/claudiomartella/graffiti/blob/master/src/main/java/org/acaro/graffiti/processing/GraffitiEmitter.java
>
> which I assign to each vertex at preApplication() and close from each
> vertex at postApplication(). I'm not super happy with this solution.
> Over the weekend, though, I thought I might use an Aggregator to
> send my ResultSet object and use the Aggregator to write to disk. That
> would be a nice design, and I could contribute it to the JIRA issue
> about storing Aggregator results.
>
> What do you think?
>
> On Fri, Sep 23, 2011 at 1:40 AM, Avery Ching wrote:
>> This is more of a limitation that comes from files being immutable in HDFS.
>> Any more insight on what you're trying to do?  Perhaps we can think of a
>> more general way to address the issue.
>>
>> Avery
>>
>> On 9/22/11 10:31 AM, Claudio Martella wrote:
>>> Hi Avery,
>>>
>>> thanks, yes it does. The question, though, is how to share the
>>> file handle between the vertices on the same node. I could open the
>>> file in preApplication() and close it in postApplication(), but
>>> I would potentially end up with as many files as there are vertices
>>> in the graph.
>>>
>>> Do you have any idea on this side? Maybe share somehow the handle and a
>>> lock?
>>>
>>> On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching wrote:
>>>> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
>>>> postApplication(), postSuperstep()) that can be overridden to do anything
>>>> you like, for instance write out some data to an HDFS file.  We have an
>>>> open issue on outputting Aggregator values that is unassigned if you'd
>>>> like to take a look at it as well
>>>> (https://issues.apache.org/jira/browse/GIRAPH-10).
>>>>
>>>> Hope this helps,
>>>>
>>>> Avery
>>>>
>>>> On 9/22/11 7:34 AM, Claudio Martella wrote:
>>>>> Hello list,
>>>>>
>>>>> I need to emit some Text to HDFS once in a while. This
>>>>> doesn't necessarily happen at the end of the computation, and I might
>>>>> need to emit something more complex than just the VertexValue, so I'd
>>>>> like more control than the VertexWriter gives me.
>>>>>
>>>>> What do you suggest I do to obtain a handle to an HDFS file (it
>>>>> can be in parts as well) to write to?
>>>>> Is there any code I can start looking at?
>>>>>
>>>>> Thanks!
>>>>> Claudio


Re: writing/emitting to HDFS

2011-09-26 Thread Claudio Martella
Thanks for the feedback. As a matter of fact, that's exactly the type
of functionality I'm looking for, though with minimal infrastructure
cost. Thanks!

On Fri, Sep 23, 2011 at 7:13 PM, Andy Schlaikjer  wrote:
> How about Scribing messages (and writing to HDFS) during calculation?
> Then you could perform bulk log analysis on the output with a separate
> Hadoop (or Pig) job.
>
> http://en.wikipedia.org/wiki/Scribe_(log_server)
>
> Andy
>
>
> On Thu, Sep 22, 2011 at 7:31 AM, Claudio Martella
>  wrote:
>> Hi Avery,
>>
>> thanks, yes it does. The question, though, is how to share the
>> file handle between the vertices on the same node. I could open the
>> file in preApplication() and close it in postApplication(), but
>> I would potentially end up with as many files as there are vertices
>> in the graph.
>>
>> Do you have any idea on this side? Maybe share somehow the handle and a lock?
>>
>> On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching  wrote:
>>> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
>>> postApplication(), postSuperstep()) that can be overridden to do anything you
>>> like, for instance write out some data to an HDFS file.  We have an open
>>> issue on outputting Aggregator values that is unassigned if you'd like to
>>> take a look at it as well (https://issues.apache.org/jira/browse/GIRAPH-10).
>>>
>>> Hope this helps,
>>>
>>> Avery
>>>
>>> On 9/22/11 7:34 AM, Claudio Martella wrote:

>>>> Hello list,
>>>>
>>>> I need to emit some Text to HDFS once in a while. This
>>>> doesn't necessarily happen at the end of the computation, and I might
>>>> need to emit something more complex than just the VertexValue, so I'd
>>>> like more control than the VertexWriter gives me.
>>>>
>>>> What do you suggest I do to obtain a handle to an HDFS file (it
>>>> can be in parts as well) to write to?
>>>> Is there any code I can start looking at?
>>>>
>>>> Thanks!
>>>> Claudio

>>>
>>>
>>
>>
>>
>> --
>>     Claudio Martella
>>     claudio.marte...@gmail.com
>>
>



-- 
    Claudio Martella
    claudio.marte...@gmail.com


Re: writing/emitting to HDFS

2011-09-26 Thread Claudio Martella
I'm really just trying to emit "results" into an HDFS file at
different moments of the computation. I'm thinking of
functionality like log.debug(), to give an example, where all the
messages are collected from different workers at different supersteps.
At the moment I've implemented this:

https://github.com/claudiomartella/graffiti/blob/master/src/main/java/org/acaro/graffiti/processing/GraffitiEmitter.java

which I assign to each vertex at preApplication() and close from each
vertex at postApplication(). I'm not super happy with this solution.
Over the weekend, though, I thought I might use an Aggregator to
send my ResultSet object and use the Aggregator to write to disk. That
would be a nice design, and I could contribute it to the JIRA issue
about storing Aggregator results.

What do you think?
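
A minimal sketch of the Aggregator idea above, for illustration only. `LogAggregator` and `SuperstepSink` are hypothetical names, not Giraph's Aggregator API, and an in-memory map stands in for HDFS. Since files in HDFS are immutable once written (as noted elsewhere in this thread), the sketch assumes one output name per superstep rather than appending to a single file.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical aggregator that collects lines sent by vertices; not Giraph's API. */
class LogAggregator {
    private final List<String> lines = new ArrayList<>();

    /** May be called by any vertex during a superstep. */
    synchronized void aggregate(String line) {
        lines.add(line);
    }

    /** Returns the lines collected so far and resets for the next superstep. */
    synchronized List<String> drain() {
        List<String> out = new ArrayList<>(lines);
        lines.clear();
        return out;
    }
}

/** Hypothetical sink: one write-once "file" per superstep, modelled as a map entry. */
class SuperstepSink {
    private final Map<String, List<String>> files = new TreeMap<>();

    /** Writes the drained lines under a name unique to the superstep. */
    void flush(long superstep, List<String> lines) {
        String name = String.format("emitted/superstep-%05d", superstep);
        if (files.containsKey(name)) {
            // Mirrors the write-once constraint: never reopen an existing file.
            throw new IllegalStateException(name + " already written");
        }
        files.put(name, Collections.unmodifiableList(new ArrayList<>(lines)));
    }

    /** Read-only view of everything written so far. */
    Map<String, List<String>> files() {
        return Collections.unmodifiableMap(files);
    }
}
```

Vertices would call `aggregate(...)` while computing, and whatever runs between supersteps would call `flush(superstep, drain())`. The output then arrives in parts, one per superstep.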

On Fri, Sep 23, 2011 at 1:40 AM, Avery Ching  wrote:
> This is more of a limitation that comes from files being immutable in HDFS.
> Any more insight on what you're trying to do?  Perhaps we can think of a
> more general way to address the issue.
>
> Avery
>
> On 9/22/11 10:31 AM, Claudio Martella wrote:
>>
>> Hi Avery,
>>
>> thanks, yes it does. The question, though, is how to share the
>> file handle between the vertices on the same node. I could open the
>> file in preApplication() and close it in postApplication(), but
>> I would potentially end up with as many files as there are vertices
>> in the graph.
>>
>> Do you have any idea on this side? Maybe share somehow the handle and a
>> lock?
>>
>> On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching  wrote:
>>>
>>> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
>>> postApplication(), postSuperstep()) that can be overridden to do anything
>>> you like, for instance write out some data to an HDFS file.  We have an
>>> open issue on outputting Aggregator values that is unassigned if you'd
>>> like to take a look at it as well
>>> (https://issues.apache.org/jira/browse/GIRAPH-10).
>>>
>>> Hope this helps,
>>>
>>> Avery
>>>
>>> On 9/22/11 7:34 AM, Claudio Martella wrote:

>>>> Hello list,
>>>>
>>>> I need to emit some Text to HDFS once in a while. This
>>>> doesn't necessarily happen at the end of the computation, and I might
>>>> need to emit something more complex than just the VertexValue, so I'd
>>>> like more control than the VertexWriter gives me.
>>>>
>>>> What do you suggest I do to obtain a handle to an HDFS file (it
>>>> can be in parts as well) to write to?
>>>> Is there any code I can start looking at?
>>>>
>>>> Thanks!
>>>> Claudio

>>>
>>
>>
>
>



-- 
    Claudio Martella
    claudio.marte...@gmail.com


Re: writing/emitting to HDFS

2011-09-23 Thread Andy Schlaikjer
How about Scribing messages (and writing to HDFS) during calculation?
Then you could perform bulk log analysis on the output with a separate
Hadoop (or Pig) job.

http://en.wikipedia.org/wiki/Scribe_(log_server)

Andy


On Thu, Sep 22, 2011 at 7:31 AM, Claudio Martella
 wrote:
> Hi Avery,
>
> thanks, yes it does. The question, though, is how to share the
> file handle between the vertices on the same node. I could open the
> file in preApplication() and close it in postApplication(), but
> I would potentially end up with as many files as there are vertices
> in the graph.
>
> Do you have any idea on this side? Maybe share somehow the handle and a lock?
>
> On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching  wrote:
>> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
>> postApplication(), postSuperstep()) that can be overridden to do anything you
>> like, for instance write out some data to an HDFS file.  We have an open
>> issue on outputting Aggregator values that is unassigned if you'd like to
>> take a look at it as well (https://issues.apache.org/jira/browse/GIRAPH-10).
>>
>> Hope this helps,
>>
>> Avery
>>
>> On 9/22/11 7:34 AM, Claudio Martella wrote:
>>>
>>> Hello list,
>>>
>>> I need to emit some Text to HDFS once in a while. This
>>> doesn't necessarily happen at the end of the computation, and I might
>>> need to emit something more complex than just the VertexValue, so I'd
>>> like more control than the VertexWriter gives me.
>>>
>>> What do you suggest I do to obtain a handle to an HDFS file (it
>>> can be in parts as well) to write to?
>>> Is there any code I can start looking at?
>>>
>>> Thanks!
>>> Claudio
>>>
>>
>>
>
>
>
> --
>     Claudio Martella
>     claudio.marte...@gmail.com
>


Re: writing/emitting to HDFS

2011-09-22 Thread Claudio Martella
Hi Avery,

thanks, yes it does. The question, though, is how to share the
file handle between the vertices on the same node. I could open the
file in preApplication() and close it in postApplication(), but
I would potentially end up with as many files as there are vertices
in the graph.

Do you have any idea on this side? Maybe share somehow the handle and a lock?
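
One way the "shared handle and a lock" idea could look, as a hedged sketch rather than anything Giraph provides. `SharedEmitter` is a hypothetical class. It assumes all vertices on a worker run in the same JVM, counts how many of them hold the handle, and takes a plain `java.io.Writer` so it runs without Hadoop. In a real job the opener would presumably wrap the stream returned by Hadoop's `FileSystem.create(Path)`, with a path unique to each worker.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Supplier;

/**
 * Hypothetical per-JVM emitter: every vertex on a worker shares one writer,
 * guarded by a lock and closed when the last vertex releases it.
 */
final class SharedEmitter {
    private static final Object LOCK = new Object();
    private static Writer writer;   // the single shared handle
    private static int users;       // vertices currently holding the handle

    private SharedEmitter() {}

    /** Call from preApplication(); only the first caller opens the file. */
    static void acquire(Supplier<Writer> opener) {
        synchronized (LOCK) {
            if (users == 0) {
                writer = opener.get();
            }
            users++;
        }
    }

    /** Appends one line; the lock keeps lines from interleaving. */
    static void emit(String line) {
        synchronized (LOCK) {
            if (writer == null) {
                throw new IllegalStateException("emitter not acquired");
            }
            try {
                writer.write(line);
                writer.write('\n');
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Call from postApplication(); only the last caller closes the file. */
    static void release() {
        synchronized (LOCK) {
            if (users == 0) {
                throw new IllegalStateException("emitter not acquired");
            }
            if (--users == 0) {
                try {
                    writer.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } finally {
                    writer = null;
                }
            }
        }
    }
}
```

With this shape the job ends up with one file per worker JVM instead of one per vertex, at the cost of serialising writes through a single lock.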

On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching  wrote:
> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
> postApplication(), postSuperstep()) that can be overridden to do anything you
> like, for instance write out some data to an HDFS file.  We have an open
> issue on outputting Aggregator values that is unassigned if you'd like to
> take a look at it as well (https://issues.apache.org/jira/browse/GIRAPH-10).
>
> Hope this helps,
>
> Avery
>
> On 9/22/11 7:34 AM, Claudio Martella wrote:
>>
>> Hello list,
>>
>> I need to emit some Text to HDFS once in a while. This
>> doesn't necessarily happen at the end of the computation, and I might
>> need to emit something more complex than just the VertexValue, so I'd
>> like more control than the VertexWriter gives me.
>>
>> What do you suggest I do to obtain a handle to an HDFS file (it
>> can be in parts as well) to write to?
>> Is there any code I can start looking at?
>>
>> Thanks!
>> Claudio
>>
>
>



-- 
    Claudio Martella
    claudio.marte...@gmail.com


Re: writing/emitting to HDFS

2011-09-22 Thread Avery Ching
There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
postApplication(), postSuperstep()) that can be overridden to do anything
you like, for instance write out some data to an HDFS file.  We have an
open issue on outputting Aggregator values that is unassigned if you'd
like to take a look at it as well
(https://issues.apache.org/jira/browse/GIRAPH-10).


Hope this helps,

Avery
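
To illustrate the hook methods mentioned above, here is a rough sketch under stated assumptions. `HookedVertex`, `EmittingVertex` and `HookDriver` are hypothetical stand-ins, not Giraph's Vertex class, and the order in which the driver calls the hooks is assumed. A `java.io.Writer` takes the place of a stream onto an HDFS file so the example runs on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;

/** Hypothetical stand-in for the four hooks; not Giraph's Vertex class. */
abstract class HookedVertex {
    void preApplication() {}
    void preSuperstep() {}
    void postSuperstep() {}
    void postApplication() {}
    abstract void compute(long superstep);
}

/** Buffers text during compute() and writes it out from the hooks. */
class EmittingVertex extends HookedVertex {
    private final Writer out;   // e.g. a stream onto an HDFS file
    private final StringBuilder buffer = new StringBuilder();

    EmittingVertex(Writer out) {
        this.out = out;
    }

    @Override
    void compute(long superstep) {
        buffer.append("superstep ").append(superstep).append('\n');
    }

    /** Writes whatever was buffered during the superstep, then clears it. */
    @Override
    void postSuperstep() {
        try {
            out.write(buffer.toString());
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        buffer.setLength(0);
    }

    /** Closes the output once the whole application has finished. */
    @Override
    void postApplication() {
        try {
            out.close();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}

/** Minimal driver showing the assumed order of the hook calls. */
final class HookDriver {
    private HookDriver() {}

    static void run(HookedVertex vertex, int supersteps) {
        vertex.preApplication();
        for (long s = 0; s < supersteps; s++) {
            vertex.preSuperstep();
            vertex.compute(s);
            vertex.postSuperstep();
        }
        vertex.postApplication();
    }
}
```

Calling `HookDriver.run(new EmittingVertex(writer), n)` writes one line per superstep and closes the writer at the end.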

On 9/22/11 7:34 AM, Claudio Martella wrote:

> Hello list,
>
> I need to emit some Text to HDFS once in a while. This
> doesn't necessarily happen at the end of the computation, and I might
> need to emit something more complex than just the VertexValue, so I'd
> like more control than the VertexWriter gives me.
>
> What do you suggest I do to obtain a handle to an HDFS file (it
> can be in parts as well) to write to?
> Is there any code I can start looking at?
>
> Thanks!
> Claudio