[jira] [Commented] (GIRAPH-246) Periodic worker calls to context.progress() will prevent timeout on some Hadoop clusters during barrier waits

2012-08-15 Thread Jaeho Shin (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435694#comment-13435694 ] Jaeho Shin commented on GIRAPH-246: --- I believe the source of the timeout with trunk in that

[jira] [Commented] (GIRAPH-278) Website still tries to load incubator logo

2012-08-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435678#comment-13435678 ] Hudson commented on GIRAPH-278: --- Integrated in Giraph-trunk-Commit #174 (See [https://build

[jira] [Commented] (GIRAPH-275) Restore data locality to workers reading InputSplits where possible without querying NameNode, ZooKeeper

2012-08-15 Thread Eli Reisman (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435677#comment-13435677 ] Eli Reisman commented on GIRAPH-275: In answer to yesterday's question (now that I'm u

[jira] [Commented] (GIRAPH-300) Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Eli Reisman (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435675#comment-13435675 ] Eli Reisman commented on GIRAPH-300: Whew! Turns out this is probably related to our c

[jira] [Updated] (GIRAPH-246) Periodic worker calls to context.progress() will prevent timeout on some Hadoop clusters during barrier waits

2012-08-15 Thread Eli Reisman (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-246: --- Attachment: GIRAPH-246-NEW-FIX.patch This is working for us. My ops fella still says we need to call

Re: [jira] [Commented] (GIRAPH-300) Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Avery Ching
Yes, this will happen, but it should be okay, since the connect retries will take care of it (hopefully). This already happened with the old code (as you mentioned). I'm also working on a more robust implementation that will retry failed requests going forward (and re-establish broken connections).

[jira] [Updated] (GIRAPH-301) InputSplit Reservations are clumping, leaving many workers asleep while other process too many splits and get overloaded.

2012-08-15 Thread Eli Reisman (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Eli Reisman updated GIRAPH-301: --- Attachment: GIRAPH-301-2.patch Much simpler, does the same thing. Still going to be more ZK calls, bu

[jira] [Commented] (GIRAPH-300) Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Eli Reisman (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435564#comment-13435564 ] Eli Reisman commented on GIRAPH-300: Getting errors like this during input superstep o

Re: [jira] [Commented] (GIRAPH-249) Move part of the graph out-of-core when memory is low

2012-08-15 Thread Eli Reisman
Great metrics, this made a very interesting read, and great code too as always. This must have been a lot of work. I like the idea of eliminating the extra temporary storage data structures where possible, even when not going out-of-core. I think that + avoiding extra object creation during the wor

[jira] [Commented] (GIRAPH-300) Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435477#comment-13435477 ] Hudson commented on GIRAPH-300: --- Integrated in Giraph-trunk-Commit #173 (See [https://build

[jira] [Commented] (GIRAPH-249) Move part of the graph out-of-core when memory is low

2012-08-15 Thread Alessandro Presta (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435437#comment-13435437 ] Alessandro Presta commented on GIRAPH-249: -- Thanks Claudio, good observation. You

[jira] [Commented] (GIRAPH-300) Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Alessandro Presta (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435413#comment-13435413 ] Alessandro Presta commented on GIRAPH-300: -- Looks good, +1 > Imp

Re: Review Request: GIRAPH-300 : Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Alessandro Presta
> On Aug. 15, 2012, 2:03 p.m., Alessandro Presta wrote:
> > http://svn.apache.org/repos/asf/giraph/trunk/src/main/java/org/apache/giraph/comm/NettyClient.java, line 389
> >
> > This looks a bit weird (using a T

Re: Review Request: GIRAPH-300 : Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Avery Ching
> On Aug. 15, 2012, 2:03 p.m., Alessandro Presta wrote:
> > This is a big win, thanks!
> > So it was a reliability issue masked as a scalability one: more workers -> increased probability of network failure -> waiting forever on lost requests.
> > Now is there anything that can be done

Re: Review Request: GIRAPH-300 : Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Avery Ching
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6600/ --- (Updated Aug. 15, 2012, 6:52 p.m.) Review request for giraph. Changes ---

[jira] [Updated] (GIRAPH-300) Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Avery Ching (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Avery Ching updated GIRAPH-300: --- Attachment: GIRAPH-300.2.patch Synced with reviewboard diff2 > Improve netty reliabi

[jira] [Commented] (GIRAPH-301) InputSplit Reservations are clumping, leaving many workers asleep while other process too many splits and get overloaded.

2012-08-15 Thread Eli Reisman (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435348#comment-13435348 ] Eli Reisman commented on GIRAPH-301: one thing we can do to reduce ZK reads at the beg

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-15 Thread Alessandro Presta
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5987/ --- (Updated Aug. 15, 2012, 4:51 p.m.) Review request for giraph and Avery Ching.

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-15 Thread Alessandro Presta
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5987/ --- (Updated Aug. 15, 2012, 4:51 p.m.) Review request for giraph and Avery Ching.

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-15 Thread Alessandro Presta
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5987/ --- (Updated Aug. 15, 2012, 4:51 p.m.) Review request for giraph and Avery Ching.

Re: Review Request: Move part of the graph out-of-core when memory is low

2012-08-15 Thread Alessandro Presta
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/5987/ --- (Updated Aug. 15, 2012, 4:49 p.m.) Review request for giraph and Avery Ching.

Re: Review Request: GIRAPH-300 : Improve netty reliability with retrying failed connections, tracking requests, thread-safe hash partitioning

2012-08-15 Thread Alessandro Presta
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/6600/#review10330 --- This is a big win, thanks! So it was a reliability issue masked as a

[jira] [Commented] (GIRAPH-278) Website still tries to load incubator logo

2012-08-15 Thread Jakob Homan (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435298#comment-13435298 ] Jakob Homan commented on GIRAPH-278: +1. > Website still tries to l

[jira] [Commented] (GIRAPH-249) Move part of the graph out-of-core when memory is low

2012-08-15 Thread Claudio Martella (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435293#comment-13435293 ] Claudio Martella commented on GIRAPH-249: - Hi Alessandro, this is very interestin

[jira] [Commented] (GIRAPH-249) Move part of the graph out-of-core when memory is low

2012-08-15 Thread Alessandro Presta (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435290#comment-13435290 ] Alessandro Presta commented on GIRAPH-249: -- https://reviews.apache.org/r/5987/dif

[jira] [Commented] (GIRAPH-301) InputSplit Reservations are clumping, leaving many workers asleep while other process too many splits and get overloaded.

2012-08-15 Thread Eli Reisman (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435284#comment-13435284 ] Eli Reisman commented on GIRAPH-301: Hey Alessandro, Yes I think on 4, that was the i

[jira] [Updated] (GIRAPH-249) Move part of the graph out-of-core when memory is low

2012-08-15 Thread Alessandro Presta (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alessandro Presta updated GIRAPH-249: - Attachment: GIRAPH-249.patch I gave this another shot. This time it plays nicely with inp

[jira] [Commented] (GIRAPH-296) TotalNumVertices and TotalNumEdges are not saved in checkpoint

2012-08-15 Thread Alessandro Presta (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435155#comment-13435155 ] Alessandro Presta commented on GIRAPH-296: -- By the way: +1, committed. Thanks Maj

[jira] [Commented] (GIRAPH-293) Should aggregators be checkpointed?

2012-08-15 Thread Maja Kabiljo (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13435154#comment-13435154 ] Maja Kabiljo commented on GIRAPH-293: - Oops, I meant GIRAPH-296 and GIRAPH-297.

[jira] [Updated] (GIRAPH-293) Should aggregators be checkpointed?

2012-08-15 Thread Maja Kabiljo (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo updated GIRAPH-293: Attachment: GIRAPH-293.patch Making aggregators work correctly with checkpointing - saving the aggr

[jira] [Commented] (GIRAPH-296) TotalNumVertices and TotalNumEdges are not saved in checkpoint

2012-08-15 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434974#comment-13434974 ] Hudson commented on GIRAPH-296: --- Integrated in Giraph-trunk-Commit #172 (See [https://build

[jira] [Commented] (GIRAPH-301) InputSplit Reservations are clumping, leaving many workers asleep while other process too many splits and get overloaded.

2012-08-15 Thread Alessandro Presta (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-301?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434961#comment-13434961 ] Alessandro Presta commented on GIRAPH-301: -- This sounds good. Can you maybe post

[jira] [Updated] (GIRAPH-296) TotalNumVertices and TotalNumEdges are not saved in checkpoint

2012-08-15 Thread Maja Kabiljo (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo updated GIRAPH-296: Attachment: GIRAPH-296.patch Rebasing. > TotalNumVertices and TotalNumEdges are no

[jira] [Commented] (GIRAPH-278) Website still tries to load incubator logo

2012-08-15 Thread Eugene Koontz (JIRA)
[ https://issues.apache.org/jira/browse/GIRAPH-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13434856#comment-13434856 ] Eugene Koontz commented on GIRAPH-278: -- Planning on committing this tomorrow unless s