[jira] [Created] (GIRAPH-1224) Allow job to succeed if input is empty
Maja Kabiljo created GIRAPH-1224: Summary: Allow job to succeed if input is empty Key: GIRAPH-1224 URL: https://issues.apache.org/jira/browse/GIRAPH-1224 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo If the input is empty we always fail, but when the job is part of a bigger workflow we might sometimes want to let it succeed. Add an option for that. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Created] (GIRAPH-1222) Allow output formats to have writing setup and finalization
Maja Kabiljo created GIRAPH-1222: Summary: Allow output formats to have writing setup and finalization Key: GIRAPH-1222 URL: https://issues.apache.org/jira/browse/GIRAPH-1222 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Sometimes output formats need custom logic to be executed once per worker, before and after writers are being used. Add callbacks to allow for that. -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Created] (GIRAPH-1215) Make FixedCapacityHeaps work with 0 capacity
Maja Kabiljo created GIRAPH-1215: Summary: Make FixedCapacityHeaps work with 0 capacity Key: GIRAPH-1215 URL: https://issues.apache.org/jira/browse/GIRAPH-1215 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Currently FixedCapacityHeaps throw an exception when they are used with capacity 0. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
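The requested behaviour can be illustrated with a minimal sketch, assuming a heap that keeps the `capacity` largest values seen so far. The class `FixedCapacityLongHeapSketch` and its methods are hypothetical and are not the actual Giraph FixedCapacity heap classes; the point is only that capacity 0 turns every add into a no-op instead of an error.

```java
import java.util.PriorityQueue;

/** Hypothetical sketch: keeps the `capacity` largest values; capacity 0 is allowed. */
final class FixedCapacityLongHeapSketch {
  private final int capacity;
  private final PriorityQueue<Long> heap;

  FixedCapacityLongHeapSketch(int capacity) {
    if (capacity < 0) {
      throw new IllegalArgumentException("capacity must be >= 0");
    }
    this.capacity = capacity;
    // PriorityQueue rejects an initial capacity below 1, so clamp it.
    this.heap = new PriorityQueue<>(Math.max(1, capacity));
  }

  /** Adds a value, evicting the smallest retained one if the heap is full. */
  void add(long value) {
    if (capacity == 0) {
      return; // nothing can ever be retained
    }
    if (heap.size() < capacity) {
      heap.add(value);
    } else if (heap.peek() < value) {
      heap.poll();
      heap.add(value);
    }
  }

  int size() {
    return heap.size();
  }

  boolean isEmpty() {
    return heap.isEmpty();
  }

  /** Smallest retained value; only valid when the heap is non-empty. */
  long getMin() {
    if (heap.isEmpty()) {
      throw new IllegalStateException("heap is empty");
    }
    return heap.peek();
  }
}
```

Handling the zero case in `add` keeps callers free of special-casing when the capacity comes from configuration.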
[jira] [Created] (GIRAPH-1213) Fix issues with network requests retries and add more logging
Maja Kabiljo created GIRAPH-1213: Summary: Fix issues with network requests retries and add more logging Key: GIRAPH-1213 URL: https://issues.apache.org/jira/browse/GIRAPH-1213 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Fixing two bugs: * When a channel fails, we currently retry all requests towards the destination machine, instead of just the ones that were sent on that specific channel. * In practice, we've noticed that BlockingOperationException can get thrown while we wait to connect on a channel, in which case we silently drop the request we are trying to send; catch this exception and retry instead. Also added logging of channel ids to make it easier to debug issues with network requests not being delivered. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1212) Fix DefaultJobProgressTracker when splitMasterWorker=false
Maja Kabiljo created GIRAPH-1212: Summary: Fix DefaultJobProgressTracker when splitMasterWorker=false Key: GIRAPH-1212 URL: https://issues.apache.org/jira/browse/GIRAPH-1212 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo DefaultJobProgressTracker assumes we are using numWorkers+1 mappers, fix that -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1211) Make retrying to send network requests after timeout optional
Maja Kabiljo created GIRAPH-1211: Summary: Make retrying to send network requests after timeout optional Key: GIRAPH-1211 URL: https://issues.apache.org/jira/browse/GIRAPH-1211 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Using counters added in GIRAPH-1200 we were able to confirm that resending network requests after timeout almost never succeeds, so add an option to fail early instead of trying to resend these network requests indefinitely. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1200) Add counters for network request resends
Maja Kabiljo created GIRAPH-1200: Summary: Add counters for network request resends Key: GIRAPH-1200 URL: https://issues.apache.org/jira/browse/GIRAPH-1200 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Expose statistics around network requests which we had to resend. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1184) Don't access configuration on every message
Maja Kabiljo created GIRAPH-1184: Summary: Don't access configuration on every message Key: GIRAPH-1184 URL: https://issues.apache.org/jira/browse/GIRAPH-1184 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Cache whether we are using message size encoding from configuration to prevent accessing conf on every message. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
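A minimal sketch of the caching idea, assuming the configuration can be modelled as a plain string map. The class `MessageEncoderSketch`, the option name `example.useMessageSizeEncoding`, and the 4-byte length prefix are all hypothetical and stand in for whatever Giraph actually uses; the sketch only shows reading the flag once and reusing the cached value on the per-message path.

```java
import java.util.Map;

/** Hypothetical sketch: read a boolean option once and reuse it per message. */
final class MessageEncoderSketch {
  // Assumed option name, for illustration only.
  static final String USE_SIZE_ENCODING = "example.useMessageSizeEncoding";

  // Cached at construction time so the hot path never touches the map.
  private final boolean useSizeEncoding;

  MessageEncoderSketch(Map<String, String> conf) {
    this.useSizeEncoding =
        Boolean.parseBoolean(conf.getOrDefault(USE_SIZE_ENCODING, "false"));
  }

  /** Hot path: consults the cached field, never the configuration. */
  int encodedLength(byte[] message) {
    // With size encoding, an assumed 4-byte length prefix precedes the payload.
    return useSizeEncoding ? Integer.BYTES + message.length : message.length;
  }
}
```

The trade-off is that later changes to the configuration are not picked up, which is normally acceptable for options fixed at job start.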
[jira] [Resolved] (GIRAPH-1170) Add logging for out-of-core
[ https://issues.apache.org/jira/browse/GIRAPH-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1170. -- Resolution: Fixed Assignee: Dionysios Logothetis > Add logging for out-of-core > --- > > Key: GIRAPH-1170 > URL: https://issues.apache.org/jira/browse/GIRAPH-1170 > Project: Giraph > Issue Type: Improvement >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Minor > > Adding some debugging information for when reading a partition from disk > fails. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GIRAPH-1160) Fix memory estimation in MemoryEstimatorOrcal
[ https://issues.apache.org/jira/browse/GIRAPH-1160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1160. -- Resolution: Fixed Assignee: Dionysios Logothetis > Fix memory estimation in MemoryEstimatorOrcal > - > > Key: GIRAPH-1160 > URL: https://issues.apache.org/jira/browse/GIRAPH-1160 > Project: Giraph > Issue Type: Bug >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Major > > Method MemoryEstimatorOracle.calculateRegression() exits if the number of > valid columns to use for the regression is not the same as the total number > of columns. This is wrong, the regression can run on only the valid columns. > This causes the memory estimation to be very off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GIRAPH-1168) Instantiate OutEdges through Factory class
[ https://issues.apache.org/jira/browse/GIRAPH-1168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1168. -- Resolution: Fixed Assignee: Dionysios Logothetis > Instantiate OutEdges through Factory class > -- > > Key: GIRAPH-1168 > URL: https://issues.apache.org/jira/browse/GIRAPH-1168 > Project: Giraph > Issue Type: New Feature >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Major > > Sometimes the instantiation of an OutEdges implementation might have large > overhead, e.g. if it accesses the configuration. Instead of creating it > directly, introduce a factory class that can be instantiated once. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
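The factory idea can be sketched roughly as follows. The interfaces `EdgeStoreSketch` and `EdgeStoreFactorySketch`, the class `ArrayEdgeStoreFactorySketch`, and the option name `example.edges.initialCapacity` are hypothetical and are not Giraph's OutEdges API; the configuration is modelled as a plain string map. The factory pays the configuration lookup once, and each `newInstance()` call is then cheap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for an edge container. */
interface EdgeStoreSketch {
  void add(long targetId);

  int size();
}

/** Hypothetical factory: created once, then used for every vertex. */
interface EdgeStoreFactorySketch {
  EdgeStoreSketch newInstance();
}

final class ArrayEdgeStoreFactorySketch implements EdgeStoreFactorySketch {
  private final int initialCapacity;

  ArrayEdgeStoreFactorySketch(Map<String, String> conf) {
    // The (potentially expensive) configuration access happens only here.
    this.initialCapacity = Integer.parseInt(
        conf.getOrDefault("example.edges.initialCapacity", "4"));
  }

  @Override
  public EdgeStoreSketch newInstance() {
    final List<Long> targets = new ArrayList<>(initialCapacity);
    return new EdgeStoreSketch() {
      @Override
      public void add(long targetId) {
        targets.add(targetId);
      }

      @Override
      public int size() {
        return targets.size();
      }
    };
  }
}
```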
[jira] [Resolved] (GIRAPH-1153) Update json dependency version
[ https://issues.apache.org/jira/browse/GIRAPH-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1153. -- Resolution: Fixed Assignee: Dionysios Logothetis https://github.com/apache/giraph/pull/43 > Update json dependency version > -- > > Key: GIRAPH-1153 > URL: https://issues.apache.org/jira/browse/GIRAPH-1153 > Project: Giraph > Issue Type: Improvement >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Minor > > This is a pretty old json version that conflicts with newer ones. Updating to a > more recent one. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GIRAPH-1175) Avoid evaluation of Partition.getEdgeCount in log line
[ https://issues.apache.org/jira/browse/GIRAPH-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1175. -- Resolution: Fixed Assignee: Dionysios Logothetis > Avoid evaluation of Partition.getEdgeCount in log line > -- > > Key: GIRAPH-1175 > URL: https://issues.apache.org/jira/browse/GIRAPH-1175 > Project: Giraph > Issue Type: Bug >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Major > > Calling {{partition.getEdgeCount()}} iterates over all the vertices of the > partition, which can be expensive. The expression inside {{checkNotNull}} is > always evaluated, so this cost is paid even when the check passes. The fix constructs the string only if > necessary. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
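The difference between eager and lazy message construction can be shown with a small sketch. It uses the JDK's `Objects.requireNonNull` as a stand-in for `checkNotNull`, and the class `LazyMessageSketch` with its `expensiveEdgeCount()` helper is hypothetical; it only imitates a costly call such as counting edges.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

final class LazyMessageSketch {
  /** Counts how often the expensive computation actually runs. */
  static final AtomicInteger EXPENSIVE_CALLS = new AtomicInteger();

  /** Hypothetical stand-in for an expensive call such as iterating all vertices. */
  static long expensiveEdgeCount() {
    EXPENSIVE_CALLS.incrementAndGet();
    return 42L;
  }

  /** Eager: the message (and the expensive call) is built on every check. */
  static <T> T checkEager(T ref) {
    return Objects.requireNonNull(
        ref, "unexpected null, edgeCount=" + expensiveEdgeCount());
  }

  /** Lazy: the supplier only runs if the reference is actually null. */
  static <T> T checkLazy(T ref) {
    Supplier<String> message =
        () -> "unexpected null, edgeCount=" + expensiveEdgeCount();
    return Objects.requireNonNull(ref, message);
  }
}
```

With a non-null argument, `checkEager` still calls `expensiveEdgeCount()` once per check, while `checkLazy` never does.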
[jira] [Resolved] (GIRAPH-1149) Fix initialization of IdAndValueArrayEdges and IdAndNullArrayEdges
[ https://issues.apache.org/jira/browse/GIRAPH-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1149. -- Resolution: Fixed > Fix initialization of IdAndValueArrayEdges and IdAndNullArrayEdges > -- > > Key: GIRAPH-1149 > URL: https://issues.apache.org/jira/browse/GIRAPH-1149 > Project: Giraph > Issue Type: Bug >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Major > > The initialize() method for these implementations does not reset the > underlying data structure (array) just like in other implementations (e.g. > HashMapEdges). This introduces bugs when the OutEdges implementation is > re-used during input. > https://github.com/apache/giraph/pull/40 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
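The reuse bug described above can be illustrated with a minimal sketch, assuming an array-backed edge container. `ReusableArrayEdgesSketch` is hypothetical and is not the actual IdAndValueArrayEdges or IdAndNullArrayEdges code; the relevant part is that `initialize` discards the previous contents so that a reused instance starts empty.

```java
import java.util.Arrays;

/** Hypothetical array-backed edge container that is safe to reuse. */
final class ReusableArrayEdgesSketch {
  private long[] targets = new long[0];
  private int size;

  /** Resets the backing array and the size, so no stale edges survive reuse. */
  void initialize(int expectedCapacity) {
    targets = new long[Math.max(0, expectedCapacity)];
    size = 0;
  }

  void add(long target) {
    if (size == targets.length) {
      // Grow geometrically, with a small minimum for empty arrays.
      targets = Arrays.copyOf(targets, Math.max(4, size * 2));
    }
    targets[size++] = target;
  }

  int size() {
    return size;
  }
}
```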
[jira] [Resolved] (GIRAPH-1167) Add Long2ByteHashMapEdges implementation
[ https://issues.apache.org/jira/browse/GIRAPH-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1167. -- Resolution: Fixed Assignee: Dionysios Logothetis > Add Long2ByteHashMapEdges implementation > > > Key: GIRAPH-1167 > URL: https://issues.apache.org/jira/browse/GIRAPH-1167 > Project: Giraph > Issue Type: New Feature >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Minor > > Memory efficient OutEdges implementation to hold long IDs and byte values. > This is similar to the existing Long2DoubleHashMapEdges implementation. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GIRAPH-1154) Improve message printed for super-vertices
[ https://issues.apache.org/jira/browse/GIRAPH-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1154. -- Resolution: Fixed Assignee: Dionysios Logothetis > Improve message printed for super-vertices > -- > > Key: GIRAPH-1154 > URL: https://issues.apache.org/jira/browse/GIRAPH-1154 > Project: Giraph > Issue Type: Improvement >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Minor > > When a job fails due to super-vertices the message printed does not explain > to the users how to set the giraph.useBigDataIOForMessages option. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (GIRAPH-1164) Set mapred.job.tracker/mapred.local.dir options in InternalVertexRunner
[ https://issues.apache.org/jira/browse/GIRAPH-1164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1164. -- Resolution: Fixed Assignee: Dionysios Logothetis > Set mapred.job.tracker/mapred.local.dir options in InternalVertexRunner > --- > > Key: GIRAPH-1164 > URL: https://issues.apache.org/jira/browse/GIRAPH-1164 > Project: Giraph > Issue Type: Improvement >Reporter: Dionysios Logothetis >Assignee: Dionysios Logothetis >Priority: Trivial > > The mapred.job.tracker and mapred.local.dir options are always expected when the > InternalVertexRunner is used but they are not set explicitly. Instead, they > are expected to be loaded from an external options file (or passed as > parameters from the tests). Setting them explicitly while still allowing > them to be overridden makes more sense. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1182) Log hostname which we disconnected from
Maja Kabiljo created GIRAPH-1182: Summary: Log hostname which we disconnected from Key: GIRAPH-1182 URL: https://issues.apache.org/jira/browse/GIRAPH-1182 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo When we can't make a connection, log hostname which was causing the problem -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1174) Support having GcObservers
Maja Kabiljo created GIRAPH-1174: Summary: Support having GcObservers Key: GIRAPH-1174 URL: https://issues.apache.org/jira/browse/GIRAPH-1174 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1171) Collect stats about how long it took to process each partition
Maja Kabiljo created GIRAPH-1171: Summary: Collect stats about how long it took to process each partition Key: GIRAPH-1171 URL: https://issues.apache.org/jira/browse/GIRAPH-1171 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo In order to make it easier to analyze whether there are some vertices in the graph which slow down the computation, or whether processing times of partitions are imbalanced, expose the stats about how long it took for each partition to be processed. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1169) Expose jobGotAllMappers callback in DefaultJobProgressTracker
Maja Kabiljo created GIRAPH-1169: Summary: Expose jobGotAllMappers callback in DefaultJobProgressTracker Key: GIRAPH-1169 URL: https://issues.apache.org/jira/browse/GIRAPH-1169 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (GIRAPH-1166) Allow MasterObserver to get superstep aggregated metrics
Maja Kabiljo created GIRAPH-1166: Summary: Allow MasterObserver to get superstep aggregated metrics Key: GIRAPH-1166 URL: https://issues.apache.org/jira/browse/GIRAPH-1166 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor Pass superstep AggregatedMetrics to MasterObserver, to be able to analyze e.g. stragglers in jobs. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GIRAPH-1165) Skip iterating through vertices in supersteps with just global logic
Maja Kabiljo created GIRAPH-1165: Summary: Skip iterating through vertices in supersteps with just global logic Key: GIRAPH-1165 URL: https://issues.apache.org/jira/browse/GIRAPH-1165 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor Some supersteps don't do anything with vertices but just do global worker or master computation or perform aggregation. Not iterating through vertices in these cases can save time (some time is still spent in zookeeper barrier but that can be addressed separately). -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GIRAPH-1159) DefaultJobProgressTrackerService: Only kill the job if it's still running
Maja Kabiljo created GIRAPH-1159: Summary: DefaultJobProgressTrackerService: Only kill the job if it's still running Key: GIRAPH-1159 URL: https://issues.apache.org/jira/browse/GIRAPH-1159 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo In killJobWithMessage, we need to check if the job has completed before killing it. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GIRAPH-1157) Allow implementations of JobProgressTrackerService to extend kill job behavior
Maja Kabiljo created GIRAPH-1157: Summary: Allow implementations of JobProgressTrackerService to extend kill job behavior Key: GIRAPH-1157 URL: https://issues.apache.org/jira/browse/GIRAPH-1157 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Created] (GIRAPH-1148) Connected components - make calculate sizes work with large number of components
Maja Kabiljo created GIRAPH-1148: Summary: Connected components - make calculate sizes work with large number of components Key: GIRAPH-1148 URL: https://issues.apache.org/jira/browse/GIRAPH-1148 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Currently, if we have a graph with a large number of connected components, calculating connected component sizes fails because the reducer becomes too large. Use an array of handles instead. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (GIRAPH-1147) Store timestamps when various fractions of input were done
[ https://issues.apache.org/jira/browse/GIRAPH-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1147. -- Resolution: Fixed > Store timestamps when various fractions of input were done > -- > > Key: GIRAPH-1147 > URL: https://issues.apache.org/jira/browse/GIRAPH-1147 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > In order to evaluate how read stragglers affect job performance, add a way to > expose timestamps when various fractions of input were done reading through > counters. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (GIRAPH-1138) Don't wrap exceptions from executor service
[ https://issues.apache.org/jira/browse/GIRAPH-1138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1138. -- Resolution: Fixed > Don't wrap exceptions from executor service > --- > > Key: GIRAPH-1138 > URL: https://issues.apache.org/jira/browse/GIRAPH-1138 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > In ProgressableUtils.getResultsWithNCallables we wrap exceptions from > underlying threads, making logs hard to read. We should re-throw original > exception when possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
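A minimal sketch of rethrowing the original exception instead of the wrapper, using only `java.util.concurrent`. The class `UnwrapSketch` and the method `callAndUnwrap` are hypothetical and are not the actual ProgressableUtils code; the sketch shows unwrapping the cause of an `ExecutionException` when that is possible.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class UnwrapSketch {
  /** Runs the task on a worker thread and rethrows its original exception. */
  static <T> T callAndUnwrap(Callable<T> task) throws Exception {
    ExecutorService executor = Executors.newSingleThreadExecutor();
    try {
      Future<T> future = executor.submit(task);
      try {
        return future.get();
      } catch (ExecutionException e) {
        Throwable cause = e.getCause();
        if (cause instanceof Exception) {
          throw (Exception) cause; // original exception, not the wrapper
        }
        if (cause instanceof Error) {
          throw (Error) cause;
        }
        throw e; // no usable cause: keep the wrapper
      }
    } finally {
      executor.shutdownNow();
    }
  }
}
```

The stack trace of the rethrown exception still points at the worker thread's code, which is usually what makes the logs easier to read.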
[jira] [Resolved] (GIRAPH-1146) Keep track of number of supersteps when possible
[ https://issues.apache.org/jira/browse/GIRAPH-1146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1146. -- Resolution: Fixed > Keep track of number of supersteps when possible > > > Key: GIRAPH-1146 > URL: https://issues.apache.org/jira/browse/GIRAPH-1146 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > In many cases we know how many supersteps there are going to be. We can keep > track of it and log it with progress. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (GIRAPH-1147) Store timestamps when various fractions of input were done
Maja Kabiljo created GIRAPH-1147: Summary: Store timestamps when various fractions of input were done Key: GIRAPH-1147 URL: https://issues.apache.org/jira/browse/GIRAPH-1147 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor In order to evaluate how read stragglers affect job performance, add a way to expose timestamps when various fractions of input were done reading through counters. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (GIRAPH-1146) Keep track of number of supersteps when possible
Maja Kabiljo created GIRAPH-1146: Summary: Keep track of number of supersteps when possible Key: GIRAPH-1146 URL: https://issues.apache.org/jira/browse/GIRAPH-1146 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor In many cases we know how many supersteps there are going to be. We can keep track of it and log it with progress. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (GIRAPH-1133) Fix JobProgressTracker in OverrideExceptionHandler
[ https://issues.apache.org/jira/browse/GIRAPH-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1133. -- Resolution: Fixed > Fix JobProgressTracker in OverrideExceptionHandler > -- > > Key: GIRAPH-1133 > URL: https://issues.apache.org/jira/browse/GIRAPH-1133 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > We create OverrideExceptionHandler before JobProgressTracker, so it can't > report errors to command line. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (GIRAPH-1140) Cleanup temp files in hdfs after job is done
[ https://issues.apache.org/jira/browse/GIRAPH-1140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1140. -- Resolution: Fixed > Cleanup temp files in hdfs after job is done > > > Key: GIRAPH-1140 > URL: https://issues.apache.org/jira/browse/GIRAPH-1140 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > Currently we are not cleaning up temp files we create in hdfs, fix it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (GIRAPH-1134) Track number of input splits in command line
[ https://issues.apache.org/jira/browse/GIRAPH-1134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1134. -- Resolution: Fixed > Track number of input splits in command line > > > Key: GIRAPH-1134 > URL: https://issues.apache.org/jira/browse/GIRAPH-1134 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > The progress we track during input reports how much data we have read, but > not how much data there is to read. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (GIRAPH-1141) Kill the job if no progress is being made
[ https://issues.apache.org/jira/browse/GIRAPH-1141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1141. -- Resolution: Fixed > Kill the job if no progress is being made > - > > Key: GIRAPH-1141 > URL: https://issues.apache.org/jira/browse/GIRAPH-1141 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > Sometimes jobs can get stuck for various reasons; it's better to have an > option to kill them than to keep them running and holding resources. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (GIRAPH-1141) Kill the job if no progress is being made
Maja Kabiljo created GIRAPH-1141: Summary: Kill the job if no progress is being made Key: GIRAPH-1141 URL: https://issues.apache.org/jira/browse/GIRAPH-1141 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor Sometimes jobs can get stuck for various reasons; it's better to have an option to kill them than to keep them running and holding resources. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (GIRAPH-1140) Cleanup temp files in hdfs after job is done
Maja Kabiljo created GIRAPH-1140: Summary: Cleanup temp files in hdfs after job is done Key: GIRAPH-1140 URL: https://issues.apache.org/jira/browse/GIRAPH-1140 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Currently we are not cleaning up temp files we create in hdfs, fix it. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (GIRAPH-1138) Don't wrap exceptions from executor service
Maja Kabiljo created GIRAPH-1138: Summary: Don't wrap exceptions from executor service Key: GIRAPH-1138 URL: https://issues.apache.org/jira/browse/GIRAPH-1138 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor In ProgressableUtils.getResultsWithNCallables we wrap exceptions from underlying threads, making logs hard to read. We should re-throw original exception when possible. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (GIRAPH-1134) Track number of input splits in command line
Maja Kabiljo created GIRAPH-1134: Summary: Track number of input splits in command line Key: GIRAPH-1134 URL: https://issues.apache.org/jira/browse/GIRAPH-1134 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor The progress we track during input reports how much data we have read, but not how much data there is to read. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (GIRAPH-1133) Fix JobProgressTracker in OverrideExceptionHandler
[ https://issues.apache.org/jira/browse/GIRAPH-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15894936#comment-15894936 ] Maja Kabiljo commented on GIRAPH-1133: -- https://github.com/apache/giraph/pull/22 > Fix JobProgressTracker in OverrideExceptionHandler > -- > > Key: GIRAPH-1133 > URL: https://issues.apache.org/jira/browse/GIRAPH-1133 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > We create OverrideExceptionHandler before JobProgressTracker, so it can't > report errors to command line. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (GIRAPH-1133) Fix JobProgressTracker in OverrideExceptionHandler
Maja Kabiljo created GIRAPH-1133: Summary: Fix JobProgressTracker in OverrideExceptionHandler Key: GIRAPH-1133 URL: https://issues.apache.org/jira/browse/GIRAPH-1133 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor We create OverrideExceptionHandler before JobProgressTracker, so it can't report errors to command line. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (GIRAPH-1115) Move UncaughtExceptionHandler setup to GraphTaskManager
[ https://issues.apache.org/jira/browse/GIRAPH-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1115. -- Resolution: Fixed > Move UncaughtExceptionHandler setup to GraphTaskManager > --- > > Key: GIRAPH-1115 > URL: https://issues.apache.org/jira/browse/GIRAPH-1115 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1114) Expose StatusReporter from workers in blocks framework
[ https://issues.apache.org/jira/browse/GIRAPH-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1114. -- Resolution: Fixed > Expose StatusReporter from workers in blocks framework > -- > > Key: GIRAPH-1114 > URL: https://issues.apache.org/jira/browse/GIRAPH-1114 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > Sometimes we need to call progress or update status from workers, expose this > functionality -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1108) Allow measuring time spent doing GC in some interval
[ https://issues.apache.org/jira/browse/GIRAPH-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1108. -- Resolution: Fixed > Allow measuring time spent doing GC in some interval > > > Key: GIRAPH-1108 > URL: https://issues.apache.org/jira/browse/GIRAPH-1108 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > Sometimes when things are slow, we want to know whether it's because of GC or > not. Keep track of last k GC pauses and a way to check how much time since > some timestamp was spent doing GC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
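One way to picture "keep track of last k GC pauses" is a small ring buffer, sketched below. The class `GcPauseWindowSketch` and its methods are hypothetical and are not Giraph's implementation; how pauses are fed in (for example from JVM GC notifications) is left out and assumed to happen elsewhere. Times are plain millisecond values.

```java
/** Hypothetical sketch: remembers the last k GC pauses in a ring buffer. */
final class GcPauseWindowSketch {
  private final long[] endTimesMs;
  private final long[] durationsMs;
  private int next;  // slot that the next pause overwrites
  private int count; // number of valid slots, at most k

  GcPauseWindowSketch(int k) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    endTimesMs = new long[k];
    durationsMs = new long[k];
  }

  /** Records one pause, overwriting the oldest once k pauses are stored. */
  synchronized void recordPause(long endTimeMs, long durationMs) {
    endTimesMs[next] = endTimeMs;
    durationsMs[next] = durationMs;
    next = (next + 1) % endTimesMs.length;
    count = Math.min(count + 1, endTimesMs.length);
  }

  /** Total remembered pause time that falls after sinceMs. */
  synchronized long gcTimeSince(long sinceMs) {
    long total = 0;
    for (int i = 0; i < count; i++) {
      long end = endTimesMs[i];
      if (end > sinceMs) {
        long start = end - durationsMs[i];
        // Count only the part of the pause after sinceMs.
        total += end - Math.max(start, sinceMs);
      }
    }
    return total;
  }
}
```

Because only k pauses are kept, the answer is a lower bound when more than k pauses occurred since the given timestamp.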
[jira] [Commented] (GIRAPH-1115) Move UncaughtExceptionHandler setup to GraphTaskManager
[ https://issues.apache.org/jira/browse/GIRAPH-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15504408#comment-15504408 ] Maja Kabiljo commented on GIRAPH-1115: -- https://reviews.facebook.net/D64113 > Move UncaughtExceptionHandler setup to GraphTaskManager > --- > > Key: GIRAPH-1115 > URL: https://issues.apache.org/jira/browse/GIRAPH-1115 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1115) Move UncaughtExceptionHandler setup to GraphTaskManager
Maja Kabiljo created GIRAPH-1115: Summary: Move UncaughtExceptionHandler setup to GraphTaskManager Key: GIRAPH-1115 URL: https://issues.apache.org/jira/browse/GIRAPH-1115 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1114) Expose StatusReporter from workers in blocks framework
[ https://issues.apache.org/jira/browse/GIRAPH-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491764#comment-15491764 ] Maja Kabiljo commented on GIRAPH-1114: -- https://reviews.facebook.net/D63999 > Expose StatusReporter from workers in blocks framework > -- > > Key: GIRAPH-1114 > URL: https://issues.apache.org/jira/browse/GIRAPH-1114 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > Sometimes we need to call progress or update status from workers, expose this > functionality -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1114) Expose StatusReporter from workers in blocks framework
Maja Kabiljo created GIRAPH-1114: Summary: Expose StatusReporter from workers in blocks framework Key: GIRAPH-1114 URL: https://issues.apache.org/jira/browse/GIRAPH-1114 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor Sometimes we need to call progress or update status from workers, expose this functionality -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1108) Allow measuring time spent doing GC in some interval
[ https://issues.apache.org/jira/browse/GIRAPH-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15439821#comment-15439821 ] Maja Kabiljo commented on GIRAPH-1108: -- https://reviews.facebook.net/D62727 > Allow measuring time spent doing GC in some interval > > > Key: GIRAPH-1108 > URL: https://issues.apache.org/jira/browse/GIRAPH-1108 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > Sometimes when things are slow, we want to know whether it's because of GC or > not. Keep track of last k GC pauses and a way to check how much time since > some timestamp was spent doing GC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1108) Allow measuring time spent doing GC in some interval
Maja Kabiljo created GIRAPH-1108: Summary: Allow measuring time spent doing GC in some interval Key: GIRAPH-1108 URL: https://issues.apache.org/jira/browse/GIRAPH-1108 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor Sometimes when things are slow, we want to know whether it's because of GC or not. Keep track of last k GC pauses and a way to check how much time since some timestamp was spent doing GC. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1103) Another try to fix jobs getting stuck after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1103. -- Resolution: Fixed > Another try to fix jobs getting stuck after channel failure > --- > > Key: GIRAPH-1103 > URL: https://issues.apache.org/jira/browse/GIRAPH-1103 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > With GIRAPH-1087 we see jobs stuck after channel failure less often, but it > still happens. There are several additional issues I found: requests failing > to send in the first place, so they never get retried, and callbacks for channel > failures not always being triggered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1105) Fix number of open requests in FacebookConfiguration
[ https://issues.apache.org/jira/browse/GIRAPH-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1105. -- Resolution: Fixed > Fix number of open requests in FacebookConfiguration > > > Key: GIRAPH-1105 > URL: https://issues.apache.org/jira/browse/GIRAPH-1105 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1107) Allow observers to access job counters
[ https://issues.apache.org/jira/browse/GIRAPH-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1107. -- Resolution: Fixed > Allow observers to access job counters > -- > > Key: GIRAPH-1107 > URL: https://issues.apache.org/jira/browse/GIRAPH-1107 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > From mapper/master/worker observer we might want to update some job counters > for stats. For that we should allow observers to access job context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1107) Allow observers to access job counters
[ https://issues.apache.org/jira/browse/GIRAPH-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15433349#comment-15433349 ] Maja Kabiljo commented on GIRAPH-1107: -- https://reviews.facebook.net/D62391 > Allow observers to access job counters > -- > > Key: GIRAPH-1107 > URL: https://issues.apache.org/jira/browse/GIRAPH-1107 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > From mapper/master/worker observer we might want to update some job counters > for stats. For that we should allow observers to access job context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1107) Allow observers to access job counters
Maja Kabiljo created GIRAPH-1107: Summary: Allow observers to access job counters Key: GIRAPH-1107 URL: https://issues.apache.org/jira/browse/GIRAPH-1107 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor From mapper/master/worker observer we might want to update some job counters for stats. For that we should allow observers to access job context. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1105) Fix number of open requests in FacebookConfiguration
[ https://issues.apache.org/jira/browse/GIRAPH-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15419569#comment-15419569 ] Maja Kabiljo commented on GIRAPH-1105: -- https://reviews.facebook.net/D62019 > Fix number of open requests in FacebookConfiguration > > > Key: GIRAPH-1105 > URL: https://issues.apache.org/jira/browse/GIRAPH-1105 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1105) Fix number of open requests in FacebookConfiguration
Maja Kabiljo created GIRAPH-1105: Summary: Fix number of open requests in FacebookConfiguration Key: GIRAPH-1105 URL: https://issues.apache.org/jira/browse/GIRAPH-1105 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1104) NegativeArraySize exception in BigDataOutput
[ https://issues.apache.org/jira/browse/GIRAPH-1104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15415912#comment-15415912 ] Maja Kabiljo commented on GIRAPH-1104: -- This didn't seem to solve the problem; https://reviews.facebook.net/D61839 does, while following the max byte[] size semantics from BigDataIO. > NegativeArraySize exception in BigDataOutput > > > Key: GIRAPH-1104 > URL: https://issues.apache.org/jira/browse/GIRAPH-1104 > Project: Giraph > Issue Type: Bug > Reporter: Sergey Edunov > Assignee: Sergey Edunov > > We're seeing this exception in some jobs. Supposedly related to high degree vertices.
> Caused by: java.lang.NegativeArraySizeException
>   at org.apache.giraph.utils.UnsafeByteArrayOutputStream.ensureSize(UnsafeByteArrayOutputStream.java:117)
>   at org.apache.giraph.utils.UnsafeByteArrayOutputStream.write(UnsafeByteArrayOutputStream.java:168)
>   at org.apache.giraph.utils.io.BigDataOutput.write(BigDataOutput.java:183)
>   at org.apache.giraph.edge.ByteArrayEdges.write(ByteArrayEdges.java:204)
>   at org.apache.giraph.ooc.data.DiskBackedPartitionStore.writeOutEdges(DiskBackedPartitionStore.java:353)
>   at org.apache.giraph.ooc.data.DiskBackedPartitionStore.offloadInMemoryPartitionData(DiskBackedPartitionStore.java:389)
>   at org.apache.giraph.ooc.data.DiskBackedDataStore.offloadPartitionDataProxy(DiskBackedDataStore.java:294)
>   at org.apache.giraph.ooc.data.DiskBackedPartitionStore.offloadPartitionData(DiskBackedPartitionStore.java:318)
>   at org.apache.giraph.ooc.command.StorePartitionIOCommand.execute(StorePartitionIOCommand.java:55)
>   at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:99)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
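The thread does not state the root cause. One common way for a growable byte buffer to hit NegativeArraySizeException is int overflow while computing the new capacity, so the sketch below assumes that cause. It shows an overflow-safe growth calculation that caps at a maximum array size. `SafeGrowth`, `newCapacity` and the `MAX_ARRAY_SIZE` value are hypothetical and are not taken from Giraph or from the linked review.

```java
/** Hypothetical helper (not a Giraph class): overflow-safe buffer growth. */
class SafeGrowth {
  /** Conservative upper bound for a byte[] length; an assumption of this sketch. */
  static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  /**
   * Computes a new capacity that holds current + extraBytes, preferring to
   * double the buffer. All arithmetic is done in long so it cannot wrap
   * around to a negative int.
   */
  static int newCapacity(int current, int extraBytes) {
    long required = (long) current + extraBytes;
    if (required > MAX_ARRAY_SIZE) {
      throw new IllegalStateException(
          "Cannot grow buffer to " + required + " bytes");
    }
    long doubled = Math.max((long) current * 2, required);
    return (int) Math.min(doubled, MAX_ARRAY_SIZE);
  }
}
```

When a single array cannot hold the data, a caller would have to start a new array rather than grow the current one.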
[jira] [Commented] (GIRAPH-1103) Another try to fix jobs getting stuck after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15412202#comment-15412202 ] Maja Kabiljo commented on GIRAPH-1103: -- https://reviews.facebook.net/D61719 > Another try to fix jobs getting stuck after channel failure > --- > > Key: GIRAPH-1103 > URL: https://issues.apache.org/jira/browse/GIRAPH-1103 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > With GIRAPH-1087 we see jobs stuck after channel failure less often, but it still happens. I found several additional issues: requests that fail to send in the first place and so never get retried, and callbacks for channel failures that are not always triggered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1103) Another try to fix jobs getting stuck after channel failure
Maja Kabiljo created GIRAPH-1103: Summary: Another try to fix jobs getting stuck after channel failure Key: GIRAPH-1103 URL: https://issues.apache.org/jira/browse/GIRAPH-1103 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo With GIRAPH-1087 we see jobs stuck after channel failure less often, but it still happens. I found several additional issues: requests that fail to send in the first place and so never get retried, and callbacks for channel failures that are not always triggered. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1087) Retry requests after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1087. -- Resolution: Fixed > Retry requests after channel failure > > > Key: GIRAPH-1087 > URL: https://issues.apache.org/jira/browse/GIRAPH-1087 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > We currently don't have a callback to retry requests after channel failure, so we either wait for the request timeout or, in places where we don't wait for open requests, never retry the request at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1086) Use pool of byte arrays with InMemoryDataAccessor
[ https://issues.apache.org/jira/browse/GIRAPH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1086. -- Resolution: Fixed > Use pool of byte arrays with InMemoryDataAccessor > - > > Key: GIRAPH-1086 > URL: https://issues.apache.org/jira/browse/GIRAPH-1086 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > Have a pool of byte arrays with InMemoryDataAccessor, to save on byte array > creation and initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1087) Retry requests after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15384399#comment-15384399 ] Maja Kabiljo commented on GIRAPH-1087: -- https://reviews.facebook.net/D60675 > Retry requests after channel failure > > > Key: GIRAPH-1087 > URL: https://issues.apache.org/jira/browse/GIRAPH-1087 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > We currently don't have a callback to retry requests after channel failure, so we either wait for the request timeout or, in places where we don't wait for open requests, never retry the request at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1091) Fix SimpleRangePartitionFactoryTest
[ https://issues.apache.org/jira/browse/GIRAPH-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1091. -- Resolution: Fixed > Fix SimpleRangePartitionFactoryTest > --- > > Key: GIRAPH-1091 > URL: https://issues.apache.org/jira/browse/GIRAPH-1091 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > Fix For: 1.2.0 > > > SimpleRangePartitionFactoryTest relied on old logic for calculating number of > partitions and got broken with GIRAPH-1082. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1091) Fix SimpleRangePartitionFactoryTest
[ https://issues.apache.org/jira/browse/GIRAPH-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15375470#comment-15375470 ] Maja Kabiljo commented on GIRAPH-1091: -- https://reviews.facebook.net/D60747 > Fix SimpleRangePartitionFactoryTest > --- > > Key: GIRAPH-1091 > URL: https://issues.apache.org/jira/browse/GIRAPH-1091 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > SimpleRangePartitionFactoryTest relied on old logic for calculating number of > partitions and got broken with GIRAPH-1082. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1091) Fix SimpleRangePartitionFactoryTest
Maja Kabiljo created GIRAPH-1091: Summary: Fix SimpleRangePartitionFactoryTest Key: GIRAPH-1091 URL: https://issues.apache.org/jira/browse/GIRAPH-1091 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor SimpleRangePartitionFactoryTest relied on old logic for calculating number of partitions and got broken with GIRAPH-1082. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1090) Allow getting shards of broadcasts in ShardedBroadcastHandle
[ https://issues.apache.org/jira/browse/GIRAPH-1090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15373421#comment-15373421 ] Maja Kabiljo commented on GIRAPH-1090: -- https://reviews.facebook.net/D60681 > Allow getting shards of broadcasts in ShardedBroadcastHandle > > > Key: GIRAPH-1090 > URL: https://issues.apache.org/jira/browse/GIRAPH-1090 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > When the value we are reducing / broadcasting in shards is large, sometimes > it's more efficient to get the shards separately and process them instead of > getting the globally reduced one. Expose that functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1090) Allow getting shards of broadcasts in ShardedBroadcastHandle
Maja Kabiljo created GIRAPH-1090: Summary: Allow getting shards of broadcasts in ShardedBroadcastHandle Key: GIRAPH-1090 URL: https://issues.apache.org/jira/browse/GIRAPH-1090 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor When the value we are reducing / broadcasting in shards is large, sometimes it's more efficient to get the shards separately and process them instead of getting the globally reduced one. Expose that functionality. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1087) Retry requests after channel failure
Maja Kabiljo created GIRAPH-1087: Summary: Retry requests after channel failure Key: GIRAPH-1087 URL: https://issues.apache.org/jira/browse/GIRAPH-1087 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo We currently don't have a callback to retry requests after channel failure, so we either wait for the request timeout or, in places where we don't wait for open requests, never retry the request at all. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
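A minimal sketch of the callback idea, under stated assumptions and not Giraph's actual networking code: remember which channel each open request was sent on, and when a channel fails, return the requests from that channel so the caller can resend them right away instead of waiting for a timeout. `OpenRequestTracker`, its method names, and the use of plain long and int ids are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not a Giraph class): maps open requests to channels. */
class OpenRequestTracker {
  /** Request id -> id of the channel the request was written to. */
  private final Map<Long, Integer> channelByRequest = new ConcurrentHashMap<>();

  /** Called when a request is written to a channel. */
  void requestSent(long requestId, int channelId) {
    channelByRequest.put(requestId, channelId);
  }

  /** Called when the response for a request arrives. */
  void responseReceived(long requestId) {
    channelByRequest.remove(requestId);
  }

  /**
   * Failure callback: returns the ids of all still-open requests that were
   * sent on the failed channel, in ascending order. The requests stay
   * registered as open until a response arrives for them.
   */
  List<Long> onChannelFailure(int failedChannelId) {
    List<Long> toRetry = new ArrayList<>();
    for (Map.Entry<Long, Integer> entry : channelByRequest.entrySet()) {
      if (entry.getValue() == failedChannelId) {
        toRetry.add(entry.getKey());
      }
    }
    Collections.sort(toRetry);
    return toRetry;
  }

  int openRequestCount() {
    return channelByRequest.size();
  }
}
```

A caller would resend each returned request on a healthy channel and call `requestSent` again with the new channel id.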
[jira] [Commented] (GIRAPH-1086) Use pool of byte arrays with InMemoryDataAccessor
[ https://issues.apache.org/jira/browse/GIRAPH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371888#comment-15371888 ] Maja Kabiljo commented on GIRAPH-1086: -- https://reviews.facebook.net/D60621 > Use pool of byte arrays with InMemoryDataAccessor > - > > Key: GIRAPH-1086 > URL: https://issues.apache.org/jira/browse/GIRAPH-1086 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > Have a pool of byte arrays with InMemoryDataAccessor, to save on byte array > creation and initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1082) Remove limit on the number of partitions
[ https://issues.apache.org/jira/browse/GIRAPH-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1082. -- Resolution: Fixed > Remove limit on the number of partitions > > > Key: GIRAPH-1082 > URL: https://issues.apache.org/jira/browse/GIRAPH-1082 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > Currently we have a limit on how many partitions we can have because we write > all partition information to Zookeeper. We can instead send this information > in requests and remove the hard limit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1086) Use pool of byte arrays with InMemoryDataAccessor
Maja Kabiljo created GIRAPH-1086: Summary: Use pool of byte arrays with InMemoryDataAccessor Key: GIRAPH-1086 URL: https://issues.apache.org/jira/browse/GIRAPH-1086 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Have a pool of byte arrays with InMemoryDataAccessor, to save on byte array creation and initialization. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
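A minimal sketch of a byte array pool, under stated assumptions and not the InMemoryDataAccessor code itself: released arrays of the pool's fixed size are kept and handed out again, which avoids allocating and zero-filling a fresh array each time. `ByteArrayPool` and its methods are hypothetical names.

```java
import java.util.concurrent.ConcurrentLinkedQueue;

/** Hypothetical helper (not a Giraph class): reuses fixed-size byte arrays. */
class ByteArrayPool {
  private final int arraySize;
  private final ConcurrentLinkedQueue<byte[]> free = new ConcurrentLinkedQueue<>();

  ByteArrayPool(int arraySize) {
    if (arraySize <= 0) {
      throw new IllegalArgumentException("arraySize must be positive");
    }
    this.arraySize = arraySize;
  }

  /**
   * Returns a pooled array if one is available, otherwise allocates one.
   * A reused array still holds its previous contents; callers must track
   * how many bytes they have written rather than rely on zeroed memory.
   */
  byte[] acquire() {
    byte[] pooled = free.poll();
    return pooled != null ? pooled : new byte[arraySize];
  }

  /** Returns an array to the pool; arrays of any other size are dropped. */
  void release(byte[] array) {
    if (array != null && array.length == arraySize) {
      free.offer(array);
    }
  }
}
```

This sketch leaves the pool unbounded. A production pool would likely cap how many free arrays it retains.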
[jira] [Resolved] (GIRAPH-1085) Add InMemoryDataAccessor
[ https://issues.apache.org/jira/browse/GIRAPH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1085. -- Resolution: Fixed > Add InMemoryDataAccessor > > > Key: GIRAPH-1085 > URL: https://issues.apache.org/jira/browse/GIRAPH-1085 > Project: Giraph > Issue Type: New Feature > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > When we deal with graphs which have a lot of vertices with very little total data associated with them (values + edges), we start experiencing memory problems because too many objects are created, since every vertex has multiple objects associated with it. To solve this problem, we should have a serialized partition representation (the current ByteArrayPartition just keeps a byte[] per vertex, not per partition). We can leverage the out-of-core infrastructure and just add a data accessor which is backed by in-memory buffers instead of disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1083) Make sure we fail after exception in ooc-io thread happens
[ https://issues.apache.org/jira/browse/GIRAPH-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1083. -- Resolution: Fixed > Make sure we fail after exception in ooc-io thread happens > -- > > Key: GIRAPH-1083 > URL: https://issues.apache.org/jira/browse/GIRAPH-1083 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > Currently, if an exception happens in the ooc-io thread, the job is left running for a long time after the exception. We should make sure we fail early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1085) Add InMemoryDataAccessor
[ https://issues.apache.org/jira/browse/GIRAPH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15365188#comment-15365188 ] Maja Kabiljo commented on GIRAPH-1085: -- https://reviews.facebook.net/D60435 > Add InMemoryDataAccessor > > > Key: GIRAPH-1085 > URL: https://issues.apache.org/jira/browse/GIRAPH-1085 > Project: Giraph > Issue Type: New Feature > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > When we deal with graphs which have a lot of vertices with very little total data associated with them (values + edges), we start experiencing memory problems because too many objects are created, since every vertex has multiple objects associated with it. To solve this problem, we should have a serialized partition representation (the current ByteArrayPartition just keeps a byte[] per vertex, not per partition). We can leverage the out-of-core infrastructure and just add a data accessor which is backed by in-memory buffers instead of disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1085) Add InMemoryDataAccessor
Maja Kabiljo created GIRAPH-1085: Summary: Add InMemoryDataAccessor Key: GIRAPH-1085 URL: https://issues.apache.org/jira/browse/GIRAPH-1085 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo When we deal with graphs which have a lot of vertices with very little total data associated with them (values + edges), we start experiencing memory problems because too many objects are created, since every vertex has multiple objects associated with it. To solve this problem, we should have a serialized partition representation (the current ByteArrayPartition just keeps a byte[] per vertex, not per partition). We can leverage the out-of-core infrastructure and just add a data accessor which is backed by in-memory buffers instead of disk. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1083) Make sure we fail after exception in ooc-io thread happens
[ https://issues.apache.org/jira/browse/GIRAPH-1083?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359579#comment-15359579 ] Maja Kabiljo commented on GIRAPH-1083: -- https://reviews.facebook.net/D60291 > Make sure we fail after exception in ooc-io thread happens > -- > > Key: GIRAPH-1083 > URL: https://issues.apache.org/jira/browse/GIRAPH-1083 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > Currently, if an exception happens in the ooc-io thread, the job is left running for a long time after the exception. We should make sure we fail early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1083) Make sure we fail after exception in ooc-io thread happens
Maja Kabiljo created GIRAPH-1083: Summary: Make sure we fail after exception in ooc-io thread happens Key: GIRAPH-1083 URL: https://issues.apache.org/jira/browse/GIRAPH-1083 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Currently, if an exception happens in the ooc-io thread, the job is left running for a long time after the exception. We should make sure we fail early. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1082) Remove limit on the number of partitions
[ https://issues.apache.org/jira/browse/GIRAPH-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359063#comment-15359063 ] Maja Kabiljo commented on GIRAPH-1082: -- https://reviews.facebook.net/D60267 > Remove limit on the number of partitions > > > Key: GIRAPH-1082 > URL: https://issues.apache.org/jira/browse/GIRAPH-1082 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > Currently we have a limit on how many partitions we can have because we write > all partition information to Zookeeper. We can instead send this information > in requests and remove the hard limit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1082) Remove limit on the number of partitions
Maja Kabiljo created GIRAPH-1082: Summary: Remove limit on the number of partitions Key: GIRAPH-1082 URL: https://issues.apache.org/jira/browse/GIRAPH-1082 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Currently we have a limit on how many partitions we can have because we write all partition information to Zookeeper. We can instead send this information in requests and remove the hard limit. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1080) Add FacebookConfiguration
[ https://issues.apache.org/jira/browse/GIRAPH-1080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15353644#comment-15353644 ] Maja Kabiljo commented on GIRAPH-1080: -- https://reviews.facebook.net/D60135 > Add FacebookConfiguration > - > > Key: GIRAPH-1080 > URL: https://issues.apache.org/jira/browse/GIRAPH-1080 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > Fix For: 1.2.0 > > > Internally we use a lot of different configuration defaults, we should make > them available for anyone to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1080) Add FacebookConfiguration
Maja Kabiljo created GIRAPH-1080: Summary: Add FacebookConfiguration Key: GIRAPH-1080 URL: https://issues.apache.org/jira/browse/GIRAPH-1080 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Fix For: 1.2.0 Internally we use a lot of different configuration defaults, we should make them available for anyone to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-1079) Add triangle counting example
[ https://issues.apache.org/jira/browse/GIRAPH-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo updated GIRAPH-1079: - Fix Version/s: 1.2.0 > Add triangle counting example > - > > Key: GIRAPH-1079 > URL: https://issues.apache.org/jira/browse/GIRAPH-1079 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > Fix For: 1.2.0 > > > Add an app for triangle counting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1079) Add triangle counting example
[ https://issues.apache.org/jira/browse/GIRAPH-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15351489#comment-15351489 ] Maja Kabiljo commented on GIRAPH-1079: -- https://reviews.facebook.net/D60057 > Add triangle counting example > - > > Key: GIRAPH-1079 > URL: https://issues.apache.org/jira/browse/GIRAPH-1079 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > Add an app for triangle counting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1079) Add triangle counting example
Maja Kabiljo created GIRAPH-1079: Summary: Add triangle counting example Key: GIRAPH-1079 URL: https://issues.apache.org/jira/browse/GIRAPH-1079 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor Add an app for triangle counting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1061) Add Connected Components block factory
[ https://issues.apache.org/jira/browse/GIRAPH-1061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1061. -- Resolution: Fixed > Add Connected Components block factory > -- > > Key: GIRAPH-1061 > URL: https://issues.apache.org/jira/browse/GIRAPH-1061 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > Add block factory for Connected Components to make it easy to run it. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1077) Jobs getting stuck after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1077. -- Resolution: Fixed > Jobs getting stuck after channel failure > > > Key: GIRAPH-1077 > URL: https://issues.apache.org/jira/browse/GIRAPH-1077 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > Currently, when a channel fails we just log the failure. Since we don't wait on open requests everywhere, the request check doesn't always get called, and we've seen jobs stay stuck, for example during the input stage when a worker's request to the master for a split to read fails. When we know that a channel has failed, we should try to resend the requests from that channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1064) Reconnect JobProgressTracker
[ https://issues.apache.org/jira/browse/GIRAPH-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1064. -- Resolution: Fixed > Reconnect JobProgressTracker > > > Key: GIRAPH-1064 > URL: https://issues.apache.org/jira/browse/GIRAPH-1064 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > When workers/master don't talk to JobProgressTracker it can disconnect and > throw RejectedExecutionException - we should catch and retry on that > exception too. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1065) Allow extending JobProgressTrackerService
[ https://issues.apache.org/jira/browse/GIRAPH-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1065. -- Resolution: Fixed > Allow extending JobProgressTrackerService > - > > Key: GIRAPH-1065 > URL: https://issues.apache.org/jira/browse/GIRAPH-1065 > Project: Giraph > Issue Type: Improvement >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > > We might want to perform additional actions on events from > JobProgressTrackerService. Allow overriding it and specifying another class > to use. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1062) Page rank in Blocks&Pieces
[ https://issues.apache.org/jira/browse/GIRAPH-1062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1062. -- Resolution: Fixed > Page rank in Blocks&Pieces > -- > > Key: GIRAPH-1062 > URL: https://issues.apache.org/jira/browse/GIRAPH-1062 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > We have some examples of pagerank, but they all have some things missing. > Make one which will take sinks into account, have convergence checks, support > both weighted and unweighted graphs. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
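A minimal single-machine sketch of the three requirements named in this issue (sinks, convergence checks, weighted and unweighted graphs), under stated assumptions. It is not the Blocks&Pieces implementation and uses no Giraph APIs. The class `PageRankSketch`, the adjacency-array input format, and the L1 convergence criterion are assumptions chosen for illustration. Edge weights are assumed to be positive.

```java
import java.util.Arrays;

/** Hypothetical illustration (not Giraph code): PageRank with sink handling. */
class PageRankSketch {
  /**
   * @param targets targets[u] lists the out-neighbors of vertex u
   * @param weights weights[u][i] is the weight of edge u -> targets[u][i],
   *                or null for an unweighted graph
   * @param damping damping factor, commonly 0.85
   * @param epsilon stop once the L1 change between iterations drops below this
   */
  static double[] pageRank(int[][] targets, double[][] weights,
      double damping, double epsilon, int maxIterations) {
    int n = targets.length;
    double[] rank = new double[n];
    Arrays.fill(rank, 1.0 / n);
    for (int iter = 0; iter < maxIterations; iter++) {
      double[] next = new double[n];
      double sinkMass = 0;
      for (int u = 0; u < n; u++) {
        if (targets[u].length == 0) {
          // Sink: its rank is spread evenly over all vertices below.
          sinkMass += rank[u];
          continue;
        }
        double totalWeight = 0;
        for (int i = 0; i < targets[u].length; i++) {
          totalWeight += weights == null ? 1.0 : weights[u][i];
        }
        for (int i = 0; i < targets[u].length; i++) {
          double w = weights == null ? 1.0 : weights[u][i];
          next[targets[u][i]] += rank[u] * w / totalWeight;
        }
      }
      double change = 0;
      for (int v = 0; v < n; v++) {
        next[v] = (1 - damping) / n + damping * (next[v] + sinkMass / n);
        change += Math.abs(next[v] - rank[v]);
      }
      rank = next;
      if (change < epsilon) {
        break; // Convergence check.
      }
    }
    return rank;
  }
}
```

Redistributing the sink mass keeps the ranks summing to 1 in every iteration, which is one common way to take sinks into account.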
[jira] [Resolved] (GIRAPH-1075) UnsafeByteArrayOutputStream silently writes long UTFs incorrectly
[ https://issues.apache.org/jira/browse/GIRAPH-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1075. -- Resolution: Fixed > UnsafeByteArrayOutputStream silently writes long UTFs incorrectly > - > > Key: GIRAPH-1075 > URL: https://issues.apache.org/jira/browse/GIRAPH-1075 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > UnsafeByteArrayOutputStream.writeUTF was copied from DataOutputStream, but the part which checks the length was left out. When we write long strings they serialize without an error, but when we deserialize them we get a wrong value back and don't read the same number of bytes. Make it fail like DataOutputStream instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1063) Make primitive type generated fixed capacity min heaps
[ https://issues.apache.org/jira/browse/GIRAPH-1063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maja Kabiljo resolved GIRAPH-1063. -- Resolution: Fixed > Make primitive type generated fixed capacity min heaps > -- > > Key: GIRAPH-1063 > URL: https://issues.apache.org/jira/browse/GIRAPH-1063 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo > > It's often needed to get top k (key, value) pairs, but existing > implementations deal with objects making them inefficient. Make one with > primitive types. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
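A minimal sketch of a fixed-capacity min heap over primitive (long key, double value) pairs, under stated assumptions and not the generated Giraph classes. It keeps the k pairs with the largest values in two parallel primitive arrays, so no per-entry objects are created. `TopKLongDouble` and its methods are hypothetical names, and the long/double type pair is just one possible combination.

```java
/** Hypothetical helper (not a Giraph class): top-k pairs without boxing. */
class TopKLongDouble {
  private final long[] keys;
  private final double[] values;
  private int size;

  TopKLongDouble(int capacity) {
    if (capacity < 0) {
      throw new IllegalArgumentException("capacity must be >= 0");
    }
    keys = new long[capacity];
    values = new double[capacity];
  }

  /** Adds a pair, keeping only the `capacity` pairs with the largest values. */
  void add(long key, double value) {
    if (keys.length == 0) {
      return; // Capacity 0: nothing is ever kept.
    }
    if (size < keys.length) {
      int i = size++;
      keys[i] = key;
      values[i] = value;
      siftUp(i);
    } else if (value > values[0]) {
      // Replace the current minimum, then restore the heap property.
      keys[0] = key;
      values[0] = value;
      siftDown(0);
    }
  }

  int size() {
    return size;
  }

  /** Key of the smallest value currently kept. */
  long minKey() {
    checkNotEmpty();
    return keys[0];
  }

  /** Smallest value currently kept. */
  double minValue() {
    checkNotEmpty();
    return values[0];
  }

  private void checkNotEmpty() {
    if (size == 0) {
      throw new IllegalStateException("heap is empty");
    }
  }

  private void siftUp(int i) {
    while (i > 0) {
      int parent = (i - 1) / 2;
      if (values[parent] <= values[i]) {
        break;
      }
      swap(parent, i);
      i = parent;
    }
  }

  private void siftDown(int i) {
    while (true) {
      int left = 2 * i + 1;
      int right = left + 1;
      int smallest = i;
      if (left < size && values[left] < values[smallest]) {
        smallest = left;
      }
      if (right < size && values[right] < values[smallest]) {
        smallest = right;
      }
      if (smallest == i) {
        return;
      }
      swap(i, smallest);
      i = smallest;
    }
  }

  private void swap(int a, int b) {
    long k = keys[a];
    keys[a] = keys[b];
    keys[b] = k;
    double v = values[a];
    values[a] = values[b];
    values[b] = v;
  }
}
```

A min heap is used for a top-k query because its root is the weakest of the kept entries, so each new pair needs only one comparison to decide whether it belongs.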
[jira] [Commented] (GIRAPH-1077) Jobs getting stuck after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15342452#comment-15342452 ] Maja Kabiljo commented on GIRAPH-1077: -- https://reviews.facebook.net/D59895 > Jobs getting stuck after channel failure > > > Key: GIRAPH-1077 > URL: https://issues.apache.org/jira/browse/GIRAPH-1077 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > Currently, when a channel fails we just log the failure. Since we don't wait on open requests everywhere, the request check doesn't always get called, and we've seen jobs stay stuck, for example during the input stage when a worker's request to the master for a split to read fails. When we know that a channel has failed, we should try to resend the requests from that channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1077) Jobs getting stuck after channel failure
Maja Kabiljo created GIRAPH-1077: Summary: Jobs getting stuck after channel failure Key: GIRAPH-1077 URL: https://issues.apache.org/jira/browse/GIRAPH-1077 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Currently, when a channel fails we just log the failure. Since we don't wait on open requests everywhere, the request check doesn't always get called, and we've seen jobs stay stuck, for example during the input stage when a worker's request to the master for a split to read fails. When we know that a channel has failed, we should try to resend the requests from that channel. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1075) UnsafeByteArrayOutputStream silently writes long UTFs incorrectly
[ https://issues.apache.org/jira/browse/GIRAPH-1075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15336767#comment-15336767 ] Maja Kabiljo commented on GIRAPH-1075: -- https://reviews.facebook.net/D59817 > UnsafeByteArrayOutputStream silently writes long UTFs incorrectly > - > > Key: GIRAPH-1075 > URL: https://issues.apache.org/jira/browse/GIRAPH-1075 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > > UnsafeByteArrayOutputStream.writeUTF was copied from DataOutputStream, but the part which checks the length was left out. When we write long strings they serialize without an error, but when we deserialize them we get a wrong value back and don't read the same number of bytes. Make it fail like DataOutputStream instead. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1075) UnsafeByteArrayOutputStream silently writes long UTFs incorrectly
Maja Kabiljo created GIRAPH-1075: Summary: UnsafeByteArrayOutputStream silently writes long UTFs incorrectly Key: GIRAPH-1075 URL: https://issues.apache.org/jira/browse/GIRAPH-1075 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo UnsafeByteArrayOutputStream.writeUTF was copied from DataOutputStream, but the part that checks the length was left out. Long strings serialize without an error, but deserializing them returns a wrong value and doesn't read the same number of bytes. Make it fail like DataOutputStream instead.
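For context on GIRAPH-1075: java.io.DataOutputStream.writeUTF stores the encoded length in a two-byte prefix, so it throws UTFDataFormatException when the modified UTF-8 encoding exceeds 65535 bytes. The following is a sketch of such a length check, not the actual Giraph patch; the class UtfLengthCheck and its methods are hypothetical names.

```java
import java.io.UTFDataFormatException;

/** Hypothetical helper showing the length check that writeUTF needs. */
final class UtfLengthCheck {
  /** Largest encoded length that fits in the two-byte length prefix. */
  static final int MAX_UTF_BYTES = 65535;

  /** Number of bytes the string occupies in modified UTF-8. */
  static long encodedLength(String s) {
    long bytes = 0;
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= 0x0001 && c <= 0x007F) {
        bytes += 1;          // plain ASCII except NUL
      } else if (c > 0x07FF) {
        bytes += 3;          // three-byte form
      } else {
        bytes += 2;          // NUL and 0x0080..0x07FF
      }
    }
    return bytes;
  }

  /** True if the string can be written with a two-byte length prefix. */
  static boolean fits(String s) {
    return encodedLength(s) <= MAX_UTF_BYTES;
  }

  /** Fails loudly instead of writing a truncated length prefix. */
  static void checkWritable(String s) throws UTFDataFormatException {
    long bytes = encodedLength(s);
    if (bytes > MAX_UTF_BYTES) {
      throw new UTFDataFormatException("encoded string too long: " + bytes + " bytes");
    }
  }
}
```

Without a check like this, the length written to the prefix no longer matches the bytes that follow, which is consistent with the reported symptom of reading back a wrong value and a different number of bytes.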
[jira] [Commented] (GIRAPH-1065) Allow extending JobProgressTrackerService
[ https://issues.apache.org/jira/browse/GIRAPH-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15289252#comment-15289252 ] Maja Kabiljo commented on GIRAPH-1065: https://reviews.facebook.net/D58383 > Allow extending JobProgressTrackerService > Key: GIRAPH-1065 > URL: https://issues.apache.org/jira/browse/GIRAPH-1065 > Project: Giraph > Issue Type: Improvement > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > Priority: Minor > We might want to perform additional actions on events from JobProgressTrackerService. Allow overriding it and specifying another class to use.
[jira] [Created] (GIRAPH-1065) Allow extending JobProgressTrackerService
Maja Kabiljo created GIRAPH-1065: Summary: Allow extending JobProgressTrackerService Key: GIRAPH-1065 URL: https://issues.apache.org/jira/browse/GIRAPH-1065 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor We might want to perform additional actions on events from JobProgressTrackerService. Allow overriding it and specifying another class to use.
[jira] [Commented] (GIRAPH-1064) Reconnect JobProgressTracker
[ https://issues.apache.org/jira/browse/GIRAPH-1064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287348#comment-15287348 ] Maja Kabiljo commented on GIRAPH-1064: https://reviews.facebook.net/D58323 > Reconnect JobProgressTracker > Key: GIRAPH-1064 > URL: https://issues.apache.org/jira/browse/GIRAPH-1064 > Project: Giraph > Issue Type: Bug > Reporter: Maja Kabiljo > Assignee: Maja Kabiljo > Priority: Minor > When workers/master don't talk to JobProgressTracker, it can disconnect and throw RejectedExecutionException; we should catch and retry on that exception too.
[jira] [Created] (GIRAPH-1064) Reconnect JobProgressTracker
Maja Kabiljo created GIRAPH-1064: Summary: Reconnect JobProgressTracker Key: GIRAPH-1064 URL: https://issues.apache.org/jira/browse/GIRAPH-1064 Project: Giraph Issue Type: Bug Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor When workers/master don't talk to JobProgressTracker, it can disconnect and throw RejectedExecutionException; we should catch and retry on that exception too.
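The catch-and-retry behaviour described in GIRAPH-1064 could look roughly like the sketch below. It is an illustration under assumptions, not Giraph's implementation: ReconnectingCaller, callWithRetry, and the reconnect callback are hypothetical names, and the attempt limit is an added safeguard the issue does not mention.

```java
import java.util.concurrent.RejectedExecutionException;

/** Hypothetical wrapper that reconnects and retries when a call is rejected. */
final class ReconnectingCaller {
  /**
   * Runs the call; if it throws RejectedExecutionException, runs the
   * reconnect action and tries again, up to maxAttempts calls in total.
   * The last exception is rethrown if every attempt is rejected.
   */
  static void callWithRetry(Runnable call, Runnable reconnect, int maxAttempts) {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be at least 1");
    }
    RejectedExecutionException last = null;
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        call.run();
        return; // success, nothing more to do
      } catch (RejectedExecutionException e) {
        last = e;
        reconnect.run(); // re-establish the connection before the next attempt
      }
    }
    throw last;
  }
}
```

Here the call would be the progress update sent to the tracker and the reconnect action would recreate the client connection; both are passed in as plain Runnables to keep the sketch independent of any particular RPC library.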