Re: writing/emitting to HDFS
Hi Claudio,

I think I understand what you are trying to do: a kind of distributed logging for debugging. Such a feature can definitely be useful. Aggregators might be able to do what you want. With something like https://issues.apache.org/jira/browse/GIRAPH-10, aggregator values could be written out not just at the end of the application but perhaps after each superstep as well, which might accomplish what you want. Feel free to take a crack at the issue... let's see what interfaces make sense.

Avery

On 9/26/11 7:03 AM, Claudio Martella wrote:
> I'm really just trying to emit "results" into an HDFS file at different
> moments of the computation. I'm thinking of functionality like
> log.debug(), to give an example, where all the messages are collected
> from different workers at different supersteps.
>
> At the moment I've implemented this:
> https://github.com/claudiomartella/graffiti/blob/master/src/main/java/org/acaro/graffiti/processing/GraffitiEmitter.java
> which I assign to each vertex in preApplication() and close from each
> vertex in postApplication(). I'm not super happy with this solution.
>
> Over the weekend, though, I thought I might use an Aggregator to send my
> ResultSet object and have the Aggregator write it to disk. That would be
> a nice design, and I could contribute to the JIRA issue about storing
> Aggregator results. What do you think?
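A minimal sketch of the aggregator idea, assuming debug output is reduced to plain strings (standing in for Claudio's ResultSet). `LogAggregatorSketch`, `LogAggregator`, `merge`, `drain` and `formatSuperstep` are hypothetical names, not Giraph's Aggregator API: each worker collects lines locally, the partial results are merged in one place, and that place writes them out after every superstep and resets for the next one.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Giraph's Aggregator API): a "log" aggregator that
 * collects debug lines from vertices so one place can write them out after
 * every superstep, in the spirit of GIRAPH-10.
 */
public class LogAggregatorSketch {

    public static final class LogAggregator {
        private final List<String> lines = new ArrayList<>();

        /** Called from vertex code: record one debug line. */
        public synchronized void aggregate(String line) {
            lines.add(line);
        }

        /** Copy of the current contents. */
        public synchronized List<String> snapshot() {
            return new ArrayList<>(lines);
        }

        /** Combine another worker's partial aggregate into this one. */
        public void merge(LogAggregator other) {
            List<String> partial = other.snapshot(); // locks only `other`
            synchronized (this) {
                lines.addAll(partial);
            }
        }

        /** Return everything collected so far and reset for the next superstep. */
        public synchronized List<String> drain() {
            List<String> out = new ArrayList<>(lines);
            lines.clear();
            return out;
        }
    }

    /** Format one superstep's lines as "superstep<TAB>line" records for output. */
    public static String formatSuperstep(long superstep, List<String> lines) {
        StringBuilder sb = new StringBuilder();
        for (String line : lines) {
            sb.append(superstep).append('\t').append(line).append('\n');
        }
        return sb.toString();
    }
}
```

In a real distributed run the order in which partial aggregates are merged is not something this sketch controls, so consumers of the output should not rely on line order across workers.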
Re: writing/emitting to HDFS
Thanks for the feedback. As a matter of fact, that's exactly the type of functionality I'm looking for, though with minimal infrastructure cost.

Thanks!

On Fri, Sep 23, 2011 at 7:13 PM, Andy Schlaikjer wrote:
> How about Scribing messages (and writing to HDFS) during calculation?
> Then you could perform bulk log analysis on the output with a separate
> Hadoop (or Pig) job.
>
> http://en.wikipedia.org/wiki/Scribe_(log_server)
>
> Andy

--
Claudio Martella
claudio.marte...@gmail.com
Re: writing/emitting to HDFS
I'm really just trying to emit "results" into an HDFS file at different moments of the computation. I'm thinking of functionality like log.debug(), to give an example, where all the messages are collected from different workers at different supersteps.

At the moment I've implemented this:
https://github.com/claudiomartella/graffiti/blob/master/src/main/java/org/acaro/graffiti/processing/GraffitiEmitter.java
which I assign to each vertex in preApplication() and close from each vertex in postApplication(). I'm not super happy with this solution.

Over the weekend, though, I thought I might use an Aggregator to send my ResultSet object and have the Aggregator write it to disk. That would be a nice design, and I could contribute to the JIRA issue about storing Aggregator results. What do you think?

On Fri, Sep 23, 2011 at 1:40 AM, Avery Ching wrote:
> This is more a limitation stemming from the fact that files in HDFS are
> immutable. Any more insight into what you're trying to do? Perhaps we can
> think of a more general way to address the issue.
>
> Avery

--
Claudio Martella
claudio.marte...@gmail.com
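Because an HDFS file is written once by a single writer, a common workaround is for each worker to write its own "part" file into a shared output directory, similar to the part-NNNNN files produced by MapReduce jobs. The sketch below assumes that layout; it uses `java.nio` on a local directory as a stand-in for Hadoop's `FileSystem`, and `PartFileSketch`, `partName` and `writePart` are hypothetical names. How a worker learns its id is left open.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

/**
 * Sketch: one output file per worker instead of one shared file.
 * java.nio stands in for org.apache.hadoop.fs.FileSystem; the naming
 * scheme is a hypothetical choice for illustration.
 */
public class PartFileSketch {

    /** Hypothetical naming scheme: worker 3 writes "part-00003". */
    public static String partName(int workerId) {
        return String.format("part-%05d", workerId);
    }

    /**
     * Write a worker's lines to its own part file. CREATE_NEW fails if the
     * file already exists, mimicking write-once files: a worker cannot
     * reopen its file later to add more lines.
     */
    public static Path writePart(Path outputDir, int workerId, List<String> lines)
            throws IOException {
        Files.createDirectories(outputDir);
        Path part = outputDir.resolve(partName(workerId));
        try (BufferedWriter w = Files.newBufferedWriter(part, StandardCharsets.UTF_8,
                StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE)) {
            for (String line : lines) {
                w.write(line);
                w.newLine();
            }
        }
        return part;
    }
}
```

Output produced at several points of the computation would then need either one file per worker per emission point or a single writer kept open for the worker's whole lifetime.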
Re: writing/emitting to HDFS
How about Scribing messages (and writing to HDFS) during calculation? Then you could perform bulk log analysis on the output with a separate Hadoop (or Pig) job.

http://en.wikipedia.org/wiki/Scribe_(log_server)

Andy

On Thu, Sep 22, 2011 at 7:31 AM, Claudio Martella wrote:
> Hi Avery,
>
> thanks, yes it does. The question, though, is how to share the file
> handle between the vertices on the same node. I could open the file in
> preApplication() and close it in postApplication(), but I would
> potentially end up with as many files as there are vertices in the graph.
>
> Do you have any ideas on this? Maybe somehow share the handle and a lock?
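Whatever transport carries the log lines (Scribe or plain HDFS files), the offline analysis is easier if every line is a self-describing record. This sketch assumes a hypothetical tab-separated layout of superstep, vertex id and message, and shows the kind of grouping a later Hadoop or Pig job might perform; it does not use any Scribe client API, and `DebugRecordSketch` and its methods are illustrative names.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical record format for debug lines emitted during a run:
 *   superstep <TAB> vertexId <TAB> message
 * plus a tiny offline aggregation over such lines.
 */
public class DebugRecordSketch {

    /** Build one record; tabs and line breaks in the message would break the format. */
    public static String format(long superstep, String vertexId, String message) {
        String clean = message.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
        return superstep + "\t" + vertexId + "\t" + clean;
    }

    /** Offline step: count records per superstep (akin to a GROUP BY superstep). */
    public static Map<Long, Integer> countPerSuperstep(Iterable<String> records) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (String record : records) {
            String[] fields = record.split("\t", 3);
            counts.merge(Long.parseLong(fields[0]), 1, Integer::sum);
        }
        return counts;
    }
}
```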
Re: writing/emitting to HDFS
Hi Avery,

thanks, yes it does. The question, though, is how to share the file handle between the vertices on the same node. I could open the file in preApplication() and close it in postApplication(), but I would potentially end up with as many files as there are vertices in the graph.

Do you have any ideas on this? Maybe somehow share the handle and a lock?

On Thu, Sep 22, 2011 at 4:07 PM, Avery Ching wrote:
> There are some methods in Vertex (i.e. preApplication(), preSuperstep(),
> postApplication(), postSuperstep()) that can be overridden to do anything
> you like, for instance writing some data out to an HDFS file. We also have
> an open, unassigned issue on outputting Aggregator values if you'd like to
> take a look at it (https://issues.apache.org/jira/browse/GIRAPH-10).
>
> Hope this helps,
>
> Avery

--
Claudio Martella
claudio.marte...@gmail.com
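One way to realize "share the handle and a lock" is a reference-counted writer held in static state, so every vertex in the same worker JVM uses one writer. This is a minimal sketch, assuming that all vertices of a worker live in one JVM; `SharedEmitter` and its methods are hypothetical. Each vertex would call `acquire` from preApplication() and `release` from postApplication(): the first `acquire` opens the writer, the last `release` closes it, and `emit` is synchronized so lines from different threads do not interleave. In a real job the opener would wrap an output stream from Hadoop's `FileSystem.create(Path)`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.function.Supplier;

/**
 * Hypothetical per-JVM emitter shared by all vertices of a worker.
 * The writer is opened on the first acquire() and closed on the last release().
 */
public final class SharedEmitter {
    private static Writer writer;
    private static int refCount;

    private SharedEmitter() {}

    /** Register one user; opens the writer only for the first one. */
    public static synchronized void acquire(Supplier<Writer> opener) {
        if (refCount == 0) {
            writer = opener.get(); // if this throws, refCount stays at 0
        }
        refCount++;
    }

    /** Append one line; the class-level lock serializes concurrent callers. */
    public static synchronized void emit(String line) {
        if (writer == null) {
            throw new IllegalStateException("emit() before acquire()");
        }
        try {
            writer.write(line);
            writer.write('\n');
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Unregister one user; closes the writer when the last one leaves. */
    public static synchronized void release() {
        if (refCount == 0) {
            throw new IllegalStateException("release() without acquire()");
        }
        if (--refCount == 0) {
            try {
                writer.close();
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } finally {
                writer = null;
            }
        }
    }
}
```

With this pattern the number of files depends on the number of workers rather than the number of vertices, at the cost of a global lock on every emitted line.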
Re: writing/emitting to HDFS
There are some methods in Vertex (i.e. preApplication(), preSuperstep(), postApplication(), postSuperstep()) that can be overridden to do anything you like, for instance writing some data out to an HDFS file. We also have an open, unassigned issue on outputting Aggregator values if you'd like to take a look at it (https://issues.apache.org/jira/browse/GIRAPH-10).

Hope this helps,

Avery

On 9/22/11 7:34 AM, Claudio Martella wrote:
> Hello list,
>
> I need to emit some Text to HDFS once in a while. This doesn't
> necessarily happen at the end of the computation, and I might need to
> emit something more complex than just the VertexValue, so I'd like more
> control than what the VertexWriter gives me.
>
> What do you suggest I do to obtain a handle to an HDFS file (it can be
> in parts as well) to write to? Is there any code I can start looking at?
>
> Thanks!
> Claudio
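A sketch of how the four hooks could divide the work of writing debug output: open in preApplication(), flush buffered text in postSuperstep(), close in postApplication(). `Hooks` is a hypothetical stand-in for the hook methods on Vertex, a `List<String>` stands in for the HDFS output stream, and the call order in `run` is an assumption about how a framework would invoke the hooks; Giraph's actual signatures and call sites are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical illustration of lifecycle hooks used for debug output. */
public class LifecycleSketch {

    /** Stand-in for the overridable hook methods; all default to no-ops. */
    public abstract static class Hooks {
        public void preApplication() {}
        public void preSuperstep() {}
        public void postSuperstep() {}
        public void postApplication() {}
    }

    /** Opens once, flushes buffered lines after each superstep, closes at the end. */
    public static class DebugOutput extends Hooks {
        private final List<String> out;                 // stands in for an HDFS stream
        private final List<String> buffer = new ArrayList<>();
        private boolean open;

        public DebugOutput(List<String> out) {
            this.out = out;
        }

        public void log(String line) {
            if (!open) {
                throw new IllegalStateException("log() outside the application");
            }
            buffer.add(line);
        }

        @Override public void preApplication() { open = true; out.add("# opened"); }
        @Override public void postSuperstep() { out.addAll(buffer); buffer.clear(); }
        @Override public void postApplication() { postSuperstep(); out.add("# closed"); open = false; }
    }

    /** Assumed call order: application hooks once, superstep hooks around each superstep. */
    public static void run(Hooks hooks, int supersteps, Consumer<Long> superstepWork) {
        hooks.preApplication();
        for (long s = 0; s < supersteps; s++) {
            hooks.preSuperstep();
            superstepWork.accept(s);
            hooks.postSuperstep();
        }
        hooks.postApplication();
    }
}
```

For example, running a `DebugOutput` for two supersteps with work that logs `"superstep " + s` produces the lines `# opened`, `superstep 0`, `superstep 1`, `# closed`. If each vertex instance received these calls and opened its own stream, this pattern would produce one file per vertex, which is the concern Claudio raises in his reply.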