[jira] [Commented] (GIRAPH-1043) Implementation of Darwini graph generator
[ https://issues.apache.org/jira/browse/GIRAPH-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850915#comment-15850915 ]

Sergey Edunov commented on GIRAPH-1043:
---------------------------------------

The phabricator link is not working anymore. Here is the updated link:
https://github.com/apache/giraph/pull/19

> Implementation of Darwini graph generator
> -----------------------------------------
>
> Key: GIRAPH-1043
> URL: https://issues.apache.org/jira/browse/GIRAPH-1043
> Project: Giraph
> Issue Type: Task
> Reporter: Sergey Edunov
> Assignee: Sergey Edunov
>
> Implementation of graph generator that is able to capture many properties of
> social graphs, such as high local clustering coefficient, non-power law
> degree distributions and log normal joint degree distribution.

--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
[jira] [Commented] (GIRAPH-1126) Broken Link on Introduction Page for User Docs
[ https://issues.apache.org/jira/browse/GIRAPH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755930#comment-15755930 ]

Sergey Edunov commented on GIRAPH-1126:
---------------------------------------

I believe the version from release-1.0 does not even compile now, simply because the Vertex class doesn't have a compute() function anymore. So we should either take this link:
https://github.com/apache/giraph/blob/trunk/giraph-examples/src/main/java/org/apache/giraph/examples/SimpleShortestPathsComputation.java
and update the intro accordingly, or, much better, rewrite the whole thing using the new API and update the intro as well.

> Broken Link on Introduction Page for User Docs
> ----------------------------------------------
>
> Key: GIRAPH-1126
> URL: https://issues.apache.org/jira/browse/GIRAPH-1126
> Project: Giraph
> Issue Type: Task
> Components: site
> Environment: Chrome Browser v54.0.2840.98 on macOS Sierra v10.12.1
> Reporter: Michael Aro
> Assignee: Michael Aro
> Priority: Trivial
> Labels: documentation
>
> On the Introduction page of the User Docs is a broken link. The "here" link
> before the code snippet on the page has a broken link to a java file on the
> Github site.
> URL: http://giraph.apache.org/intro.html

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[jira] [Updated] (GIRAPH-1123) Latest trunk does not compile (checkstyle fails)
[ https://issues.apache.org/jira/browse/GIRAPH-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1123:
----------------------------------
Fix Version/s: (was: 1.2.0) 1.3.0

> Latest trunk does not compile (checkstyle fails)
> ------------------------------------------------
>
> Key: GIRAPH-1123
> URL: https://issues.apache.org/jira/browse/GIRAPH-1123
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Alessio Arleo
> Assignee: Alessio Arleo
> Fix For: 1.3.0
> Attachments: GIRAPH-1123.patch
>
> Latest trunk does not compile due to checkstyle errors.
[jira] [Updated] (GIRAPH-1120) Insecure repository configuration
[ https://issues.apache.org/jira/browse/GIRAPH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1120:
----------------------------------
Fix Version/s: (was: 1.2.0) 1.3.0

> Insecure repository configuration
> ---------------------------------
>
> Key: GIRAPH-1120
> URL: https://issues.apache.org/jira/browse/GIRAPH-1120
> Project: Giraph
> Issue Type: Bug
> Components: build
> Affects Versions: 1.3.0
> Reporter: Olaf Flebbe
> Fix For: 1.3.0
> Attachments: 0001-GIRAPH-1120-Insecure-repository-configuration.patch,
> 0001-GIRAPH-1120-Insecure-repository-configuration.patch
>
> Hi, the repository configuration of Giraph is dangerous, since it is
> susceptible to MITM attacks.
> {code}
>
>
> central
> http://repo1.maven.org/maven2
>
> true
>
>
> ...
> {code}
> If one looks closer, no repository needs to be configured, since
> everything from the default profile is in Maven Central.
> If anything from a non-default profile is not found in Maven Central, it
> should be moved to the respective profile. For instance, the CDH artifact
> repository should be moved to the cdh hadoop_cdh4.1.2 profile.
[jira] [Updated] (GIRAPH-1106) Update "Quick Start" section on Giraph website
[ https://issues.apache.org/jira/browse/GIRAPH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1106:
----------------------------------
Fix Version/s: (was: 1.2.0) 1.3.0

> Update "Quick Start" section on Giraph website
> ----------------------------------------------
>
> Key: GIRAPH-1106
> URL: https://issues.apache.org/jira/browse/GIRAPH-1106
> Project: Giraph
> Issue Type: Improvement
> Components: documentation
> Affects Versions: 1.2.0
> Reporter: Jose Luis Larroque
> Fix For: 1.3.0
>
> The quick start guide (http://giraph.apache.org/quick_start.html) must be
> updated. It is very confusing for a new user to see the Hadoop
> "0.20.203.0-RC1" version listed for use with Giraph 1.2.0.
[jira] [Resolved] (GIRAPH-1095) Performance regression after GIRAPH-1068
[ https://issues.apache.org/jira/browse/GIRAPH-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov resolved GIRAPH-1095.
-----------------------------------
Resolution: Fixed

> Performance regression after GIRAPH-1068
> ----------------------------------------
>
> Key: GIRAPH-1095
> URL: https://issues.apache.org/jira/browse/GIRAPH-1095
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Sergey Edunov
> Assignee: Sergey Edunov
> Fix For: 1.2.0
>
> We noticed a significant performance regression caused by GIRAPH-1068 for jobs
> that have a lot of supersteps. This is likely caused by some missing
> ZooKeeper options.
[jira] [Created] (GIRAPH-1124) Create documentation on how to make Giraph release
Sergey Edunov created GIRAPH-1124:
-------------------------------------

Summary: Create documentation on how to make Giraph release
Key: GIRAPH-1124
URL: https://issues.apache.org/jira/browse/GIRAPH-1124
Project: Giraph
Issue Type: Wish
Reporter: Sergey Edunov
Assignee: Sergey Edunov
Fix For: 1.3.0

Write documentation on how to do a Giraph release.
[jira] [Updated] (GIRAPH-1044) Update book info in the User Docs / Related Literature page of the site
[ https://issues.apache.org/jira/browse/GIRAPH-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1044:
----------------------------------
Affects Version/s: (was: 1.3.0) 1.2.0
Fix Version/s: (was: 1.3.0) 1.2.0

> Update book info in the User Docs / Related Literature page of the site
> -----------------------------------------------------------------------
>
> Key: GIRAPH-1044
> URL: https://issues.apache.org/jira/browse/GIRAPH-1044
> Project: Giraph
> Issue Type: Improvement
> Components: site
> Affects Versions: 1.2.0
> Reporter: Claudio Martella
> Assignee: Roman Shaposhnik
> Labels: documentation
> Fix For: 1.2.0
> Attachments: 0001-GIRAPH-1044.-Update-book-info-in-the-User-Docs-Relat.patch
[jira] [Updated] (GIRAPH-1009) Spammy 'lost reservation' messages from ZooKeeper in workers' log at the end of the computation.
[ https://issues.apache.org/jira/browse/GIRAPH-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1009:
----------------------------------
Affects Version/s: (was: 1.3.0) 1.2.0
Fix Version/s: (was: 1.3.0) 1.2.0

> Spammy 'lost reservation' messages from ZooKeeper in workers' log at the end
> of the computation.
> ----------------------------------------------------------------------------
>
> Key: GIRAPH-1009
> URL: https://issues.apache.org/jira/browse/GIRAPH-1009
> Project: Giraph
> Issue Type: Bug
> Components: bsp, zookeeper
> Affects Versions: 1.2.0
> Environment: All environment, while running with more than one worker.
> Reporter: Hassan Eslami
> Assignee: Hassan Eslami
> Priority: Minor
> Labels: log, worker, zookeeper
> Fix For: 1.2.0
> Original Estimate: 72h
> Remaining Estimate: 72h
>
> When running Giraph with more than one worker, ZooKeeper usually throws a
> bunch of 'lost reservation' info log messages for input splits at the end of
> the computation in the workers' log. This clutters the workers' log files,
> especially in cases where the job is running on a large graph with a fairly
> large number of input splits.
> Here are examples of these log messages:
> ...
> {{INFO 2015-05-19 14:44:58,894 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1126 lost reservation}}
> {{INFO 2015-05-19 14:44:58,894 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/2585 lost reservation}}
> {{INFO 2015-05-19 14:44:58,895 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1166 lost reservation}}
> {{INFO 2015-05-19 14:44:58,895 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1212 lost reservation}}
> {{INFO 2015-05-19 14:44:58,895 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1666 lost reservation}}
> {{INFO 2015-05-19 14:44:58,896 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/2282 lost reservation}}
> {{INFO 2015-05-19 14:44:58,896 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1302 lost reservation}}
> {{INFO 2015-05-19 14:44:58,896 \[main-EventThread\] org.apache.giraph.worker.InputSplitsHandler - process: Input split /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/2364 lost reservation}}
> ...
[jira] [Updated] (GIRAPH-990) Current trunk will build for hadoop 1.2.0 not 0.20.203 as stated by documentation
[ https://issues.apache.org/jira/browse/GIRAPH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-990:
---------------------------------
Affects Version/s: (was: 1.3.0) 1.2.0
Fix Version/s: (was: 1.3.0) 1.2.0

> Current trunk will build for hadoop 1.2.0 not 0.20.203 as stated by
> documentation
> -------------------------------------------------------------------
>
> Key: GIRAPH-990
> URL: https://issues.apache.org/jira/browse/GIRAPH-990
> Project: Giraph
> Issue Type: Task
> Components: documentation
> Affects Versions: 1.2.0
> Environment: hadoop 0.20.203
> Reporter: Fredrik Einarsson
> Assignee: Roman Shaposhnik
> Fix For: 1.2.0
> Attachments: 0001-GIRAPH-990.-Current-trunk-will-build-for-hadoop-1.2..patch
> Original Estimate: 1h
> Remaining Estimate: 1h
>
> I am new to Hadoop, Giraph etc. and will use them as part of my master's thesis.
> Therefore I went to your quickstart guide
> (http://giraph.apache.org/quick_start.html) and followed it.
> If one follows the guide point by point, the "mvn package -DskipTests" command
> produces the giraph-examples jar
> giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar,
> not the version for 0.20.203 which is needed. Also, your GitHub page states:
> "- Apache Hadoop 0.20.203.0
> This is the default version used by Giraph: if you do not specify a
> profile with the -P flag, maven will use this version. You may also
> explicitly specify it with "mvn -Phadoop_0.20.203 "."
> Either the documentation or the mvn settings are wrong.
[jira] [Updated] (GIRAPH-1086) Use pool of byte arrays with InMemoryDataAccessor
[ https://issues.apache.org/jira/browse/GIRAPH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1086:
----------------------------------
Fix Version/s: 1.3.0

> Use pool of byte arrays with InMemoryDataAccessor
> -------------------------------------------------
>
> Key: GIRAPH-1086
> URL: https://issues.apache.org/jira/browse/GIRAPH-1086
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Fix For: 1.3.0
>
> Have a pool of byte arrays with InMemoryDataAccessor, to save on byte array
> creation and initialization.
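For illustration, a byte array pool of the kind this issue describes could look roughly like the sketch below. This is not the code from the GIRAPH-1086 patch: the class name ByteArrayPool, the fixed array size, and the cap on pooled arrays are all assumptions made for the example.

```java
import java.util.ArrayDeque;

/** Hypothetical fixed-size byte array pool; not Giraph's actual class. */
final class ByteArrayPool {
  private final int arraySize;
  private final int maxPooled;
  private final ArrayDeque<byte[]> free = new ArrayDeque<>();

  ByteArrayPool(int arraySize, int maxPooled) {
    this.arraySize = arraySize;
    this.maxPooled = maxPooled;
  }

  /** Returns a pooled array if one is available, otherwise allocates a new one. */
  synchronized byte[] acquire() {
    byte[] array = free.pollFirst();
    return array != null ? array : new byte[arraySize];
  }

  /** Hands an array back; arrays of the wrong size or beyond the cap are dropped. */
  synchronized void release(byte[] array) {
    if (array.length == arraySize && free.size() < maxPooled) {
      free.addFirst(array);
    }
  }

  /** Number of arrays currently waiting to be reused. */
  synchronized int pooledCount() {
    return free.size();
  }
}
```

Reused arrays are deliberately not zeroed, which is where the saving on initialization would come from; callers would have to track how many bytes of each array are valid.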
[jira] [Updated] (GIRAPH-1087) Retry requests after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1087:
----------------------------------
Fix Version/s: 1.3.0

> Retry requests after channel failure
> ------------------------------------
>
> Key: GIRAPH-1087
> URL: https://issues.apache.org/jira/browse/GIRAPH-1087
> Project: Giraph
> Issue Type: Bug
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Fix For: 1.3.0
>
> We currently don't have a callback to retry requests after channel failure,
> so we either wait for the request timeout or do not retry the request at all
> in places where we don't wait for open requests.
[jira] [Updated] (GIRAPH-1105) Fix number of open requests in FacebookConfiguration
[ https://issues.apache.org/jira/browse/GIRAPH-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1105:
----------------------------------
Fix Version/s: 1.2.0

> Fix number of open requests in FacebookConfiguration
> ----------------------------------------------------
>
> Key: GIRAPH-1105
> URL: https://issues.apache.org/jira/browse/GIRAPH-1105
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Fix For: 1.2.0
[jira] [Updated] (GIRAPH-1107) Allow observers to access job counters
[ https://issues.apache.org/jira/browse/GIRAPH-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1107:
----------------------------------
Fix Version/s: 1.3.0

> Allow observers to access job counters
> --------------------------------------
>
> Key: GIRAPH-1107
> URL: https://issues.apache.org/jira/browse/GIRAPH-1107
> Project: Giraph
> Issue Type: New Feature
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Priority: Minor
> Fix For: 1.3.0
>
> From mapper/master/worker observer we might want to update some job counters
> for stats. For that we should allow observers to access job context.
[jira] [Updated] (GIRAPH-1114) Expose StatusReporter from workers in blocks framework
[ https://issues.apache.org/jira/browse/GIRAPH-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1114:
----------------------------------
Fix Version/s: 1.3.0

> Expose StatusReporter from workers in blocks framework
> ------------------------------------------------------
>
> Key: GIRAPH-1114
> URL: https://issues.apache.org/jira/browse/GIRAPH-1114
> Project: Giraph
> Issue Type: New Feature
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Priority: Minor
> Fix For: 1.3.0
>
> Sometimes we need to call progress or update status from workers. Expose this
> functionality.
[jira] [Updated] (GIRAPH-1108) Allow measuring time spent doing GC in some interval
[ https://issues.apache.org/jira/browse/GIRAPH-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1108:
----------------------------------
Fix Version/s: 1.3.0

> Allow measuring time spent doing GC in some interval
> ----------------------------------------------------
>
> Key: GIRAPH-1108
> URL: https://issues.apache.org/jira/browse/GIRAPH-1108
> Project: Giraph
> Issue Type: New Feature
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Priority: Minor
> Fix For: 1.3.0
>
> Sometimes when things are slow, we want to know whether it's because of GC or
> not. Keep track of the last k GC pauses and provide a way to check how much
> time since some timestamp was spent doing GC.
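The bookkeeping described in this issue can be sketched as a small ring buffer of pauses. Everything below is an assumption made for illustration (the class name GcPauseTracker, the method names, millisecond timestamps); it is not the implementation that went into Giraph, and it leaves out how the pauses are observed in the first place.

```java
/** Hypothetical tracker for the last k GC pauses; not Giraph's actual class. */
final class GcPauseTracker {
  private final long[] startMillis;
  private final long[] durationMillis;
  private int next;   // slot the next recorded pause will overwrite
  private int count;  // number of valid slots, at most k

  GcPauseTracker(int k) {
    if (k <= 0) {
      throw new IllegalArgumentException("k must be positive");
    }
    startMillis = new long[k];
    durationMillis = new long[k];
  }

  /** Records one pause, evicting the oldest one once k pauses are stored. */
  synchronized void recordPause(long start, long duration) {
    startMillis[next] = start;
    durationMillis[next] = duration;
    next = (next + 1) % startMillis.length;
    count = Math.min(count + 1, startMillis.length);
  }

  /** Sums the part of every remembered pause that falls after the given timestamp. */
  synchronized long gcMillisSince(long since) {
    long total = 0;
    for (int i = 0; i < count; i++) {
      long end = startMillis[i] + durationMillis[i];
      long overlap = end - Math.max(startMillis[i], since);
      if (overlap > 0) {
        total += overlap;
      }
    }
    return total;
  }
}
```

Because only the last k pauses are kept, the answer is a lower bound whenever the queried timestamp is older than the oldest remembered pause.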
[jira] [Updated] (GIRAPH-1115) Move UncaughtExceptionHandler setup to GraphTaskManager
[ https://issues.apache.org/jira/browse/GIRAPH-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1115:
----------------------------------
Fix Version/s: 1.3.0

> Move UncaughtExceptionHandler setup to GraphTaskManager
> -------------------------------------------------------
>
> Key: GIRAPH-1115
> URL: https://issues.apache.org/jira/browse/GIRAPH-1115
> Project: Giraph
> Issue Type: Bug
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Priority: Minor
> Fix For: 1.3.0
[jira] [Updated] (GIRAPH-1077) Jobs getting stuck after channel failure
[ https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1077:
----------------------------------
Fix Version/s: 1.2.0

> Jobs getting stuck after channel failure
> ----------------------------------------
>
> Key: GIRAPH-1077
> URL: https://issues.apache.org/jira/browse/GIRAPH-1077
> Project: Giraph
> Issue Type: Bug
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Fix For: 1.2.0
>
> When a channel fails, we currently just log the failure. Since we don't wait
> on open requests from every place, the request check doesn't always get
> called, and we've seen jobs staying stuck, for example during the input stage
> when a worker's request to the master for a split to read fails. When
> we know that a channel failed, we should try to resend the requests from that
> channel.
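The idea of resending requests from a failed channel can be sketched as follows. This is only an illustration under stated assumptions: the class OpenRequestTracker, integer channel ids, and string payloads are hypothetical stand-ins, and the real fix lives in Giraph's Netty client code, which is not reproduced here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical bookkeeping of open requests per channel; not Giraph's actual class. */
final class OpenRequestTracker {
  // channel id -> (request id -> payload), insertion-ordered so resends keep their order
  private final Map<Integer, Map<Long, String>> openByChannel = new HashMap<>();

  /** Remembers a request until its response arrives. */
  synchronized void requestSent(int channelId, long requestId, String payload) {
    openByChannel.computeIfAbsent(channelId, c -> new LinkedHashMap<>())
        .put(requestId, payload);
  }

  /** Forgets a request once it has been answered. */
  synchronized void responseReceived(int channelId, long requestId) {
    Map<Long, String> open = openByChannel.get(channelId);
    if (open != null) {
      open.remove(requestId);
    }
  }

  /** Meant to be called from a channel-failure callback; returns what must be resent. */
  synchronized List<String> takeRequestsToResend(int failedChannelId) {
    Map<Long, String> open = openByChannel.remove(failedChannelId);
    return open == null ? new ArrayList<>() : new ArrayList<>(open.values());
  }
}
```

The point of the design is that the failure callback itself triggers the resend, so nothing depends on some other code path eventually checking open requests.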
[jira] [Updated] (GIRAPH-1082) Remove limit on the number of partitions
[ https://issues.apache.org/jira/browse/GIRAPH-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1082:
----------------------------------
Fix Version/s: 1.2.0

> Remove limit on the number of partitions
> ----------------------------------------
>
> Key: GIRAPH-1082
> URL: https://issues.apache.org/jira/browse/GIRAPH-1082
> Project: Giraph
> Issue Type: Improvement
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Fix For: 1.2.0
>
> Currently we have a limit on how many partitions we can have because we write
> all partition information to Zookeeper. We can instead send this information
> in requests and remove the hard limit.
[jira] [Updated] (GIRAPH-1081) Fix a bug in internal out-of-core infra: multithreaded accesses to buffers
[ https://issues.apache.org/jira/browse/GIRAPH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1081:
----------------------------------
Fix Version/s: 1.2.0

> Fix a bug in internal out-of-core infra: multithreaded accesses to buffers
> --------------------------------------------------------------------------
>
> Key: GIRAPH-1081
> URL: https://issues.apache.org/jira/browse/GIRAPH-1081
> Project: Giraph
> Issue Type: Bug
> Reporter: Hassan Eslami
> Assignee: Hassan Eslami
> Fix For: 1.2.0
>
> Multi-threaded access to raw data buffers in DiskBackedDataStore was
> overlooked, violating the assumption that data is properly partitioned among
> different IO threads.
[jira] [Updated] (GIRAPH-1085) Add InMemoryDataAccessor
[ https://issues.apache.org/jira/browse/GIRAPH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1085:
----------------------------------
Fix Version/s: 1.2.0

> Add InMemoryDataAccessor
> ------------------------
>
> Key: GIRAPH-1085
> URL: https://issues.apache.org/jira/browse/GIRAPH-1085
> Project: Giraph
> Issue Type: New Feature
> Reporter: Maja Kabiljo
> Assignee: Maja Kabiljo
> Fix For: 1.2.0
>
> When we deal with graphs which have a lot of vertices with very little total
> data associated with them (values + edges), we start experiencing memory
> problems because too many objects are created, since every vertex has multiple
> objects associated with it. To solve this problem, we should have a
> serialized partition representation (the current ByteArrayPartition just keeps
> a byte[] per vertex, not per partition). We can leverage the out-of-core
> infrastructure and just add a data accessor which is backed not by disk but
> by in-memory buffers.
[jira] [Updated] (GIRAPH-1098) Job may get stuck if zookeeper port fixed and is in use
[ https://issues.apache.org/jira/browse/GIRAPH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1098:
----------------------------------
Fix Version/s: 1.2.0

> Job may get stuck if zookeeper port fixed and is in use
> -------------------------------------------------------
>
> Key: GIRAPH-1098
> URL: https://issues.apache.org/jira/browse/GIRAPH-1098
> Project: Giraph
> Issue Type: Bug
> Reporter: Sergey Edunov
> Assignee: Sergey Edunov
> Fix For: 1.2.0
>
> We see jobs getting stuck indefinitely if zookeeper port is in use:
> INFO 2016-07-19 16:08:29,168 [main] org.apache.zookeeper.server.NIOServerCnxnFactory - binding to port ::/0:0:0:0:0:0:0:0:22181
> ERROR 2016-07-19 16:08:29,168 [main] org.apache.giraph.zk.InProcessZooKeeperRunner - Unable to start zookeeper
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>   at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
>   at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.runFromConfig(InProcessZooKeeperRunner.java:196)
>   at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.start(InProcessZooKeeperRunner.java:154)
>   at org.apache.giraph.zk.InProcessZooKeeperRunner$QuorumRunner.start(InProcessZooKeeperRunner.java:97)
>   at org.apache.giraph.zk.InProcessZooKeeperRunner.start(InProcessZooKeeperRunner.java:52)
>   at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServer(ZooKeeperManager.java:476)
>   at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:447)
>   at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:247)
>   at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:56)
>   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:627)
>   at org.apache.hadoop.mapred.MapTask.runImpl(MapTask.java:301)
>   at org.apache.hadoop.mapred.Task.run(Task.java:604)
>   at org.apache.hadoop.mapred.CoronaChild.main(CoronaChild.java:177)
[jira] [Created] (GIRAPH-1122) Javadoc generation fails for Giraph 1.2.0
Sergey Edunov created GIRAPH-1122:
-------------------------------------

Summary: Javadoc generation fails for Giraph 1.2.0
Key: GIRAPH-1122
URL: https://issues.apache.org/jira/browse/GIRAPH-1122
Project: Giraph
Issue Type: Bug
Reporter: Sergey Edunov
Assignee: Sergey Edunov

Javadoc generation currently fails in Giraph 1.2.0.
We need to fix it to be able to update the website after the release.
[jira] [Created] (GIRAPH-1118) Giraph-gora and Giraph-rexster test cases fail in release-1.2
Sergey Edunov created GIRAPH-1118:
-------------------------------------

Summary: Giraph-gora and Giraph-rexster test cases fail in release-1.2
Key: GIRAPH-1118
URL: https://issues.apache.org/jira/browse/GIRAPH-1118
Project: Giraph
Issue Type: Bug
Reporter: Sergey Edunov

When running
mvn clean install -Phadoop_1
mvn clean install -Phadoop_2
the gora and rexster test cases fail and block the release.
[jira] [Created] (GIRAPH-1117) Provide a flexible way to decide whether to create vertex when it is not present in the input
Sergey Edunov created GIRAPH-1117:
-------------------------------------

Summary: Provide a flexible way to decide whether to create vertex when it is not present in the input
Key: GIRAPH-1117
URL: https://issues.apache.org/jira/browse/GIRAPH-1117
Project: Giraph
Issue Type: Improvement
Reporter: Sergey Edunov
Priority: Minor

Currently there is only one way to control whether source vertices that don't explicitly exist in the input should be created:
giraph.createEdgeSourceVertices
This way one can enable or disable vertex creation at the job level.
However, sometimes we need more fine-grained control. E.g. we want to create vertices for some edge inputs and skip creation for others.
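One possible shape for the fine-grained control requested here is a per-vertex callback consulted while edges are loaded. The sketch below is a standalone illustration, not Giraph code: the interface SourceVertexPolicy, the class EdgeLoader, and the use of long vertex ids are all hypothetical.

```java
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical per-vertex decision hook; not part of Giraph's API. */
interface SourceVertexPolicy {
  boolean shouldCreate(long sourceVertexId);
}

/** Hypothetical stand-in for edge input loading. */
final class EdgeLoader {
  /**
   * Returns the vertex set after loading edges, each given as {source, target}.
   * A source vertex missing from the vertex input is created only if the policy allows it.
   */
  static Set<Long> loadVertices(Set<Long> inputVertices, long[][] edges,
      SourceVertexPolicy policy) {
    Set<Long> vertices = new TreeSet<>(inputVertices);
    for (long[] edge : edges) {
      long source = edge[0];
      if (!vertices.contains(source) && policy.shouldCreate(source)) {
        vertices.add(source);
      }
    }
    return vertices;
  }
}
```

A job-level boolean such as giraph.createEdgeSourceVertices then corresponds to a policy that always returns the same answer, while a policy like `id -> id < 10` creates some missing vertices and skips others.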
[jira] [Updated] (GIRAPH-1113) Errors from vertex combiner don't have vertex id, which makes it very hard to debug.
[ https://issues.apache.org/jira/browse/GIRAPH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov updated GIRAPH-1113:
----------------------------------
Description: Vertex combiners only have access to vertex values, so if something goes wrong and an exception is thrown, there is no way to trace it back to a particular vertex. We need to handle these exceptions higher in the code and re-throw them with extra information.

> Errors from vertex combiner don't have vertex id, which makes it very hard to
> debug.
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-1113
> URL: https://issues.apache.org/jira/browse/GIRAPH-1113
> Project: Giraph
> Issue Type: Improvement
> Reporter: Sergey Edunov
> Assignee: Sergey Edunov
> Priority: Minor
>
> Vertex combiners only have access to vertex values, so if something goes wrong
> and an exception is thrown, there is no way to trace it back to a particular
> vertex. We need to handle these exceptions higher in the code and re-throw
> them with extra information.
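The "handle higher in the code and re-throw with extra information" approach can be illustrated with a small wrapper around the combiner call. The names here (SafeCombiner, the BinaryOperator-based combiner) are assumptions for the example and do not correspond to Giraph's combiner interfaces.

```java
import java.util.function.BinaryOperator;

/** Hypothetical wrapper that attaches the vertex id to combiner failures. */
final class SafeCombiner {
  /**
   * Applies the combiner and, on failure, rethrows with the vertex id in the message.
   * The original exception is kept as the cause so its stack trace is not lost.
   */
  static <V> V combine(Object vertexId, V current, V incoming, BinaryOperator<V> combiner) {
    try {
      return combiner.apply(current, incoming);
    } catch (RuntimeException e) {
      throw new IllegalStateException(
          "Vertex value combiner failed for vertex " + vertexId, e);
    }
  }
}
```

The combiner itself still sees only the two values; the caller, which does know the vertex id, is the one that adds it to the error.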
[jira] [Assigned] (GIRAPH-1113) Errors from vertex combiner don't have vertex id, which makes it very hard to debug.
[ https://issues.apache.org/jira/browse/GIRAPH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sergey Edunov reassigned GIRAPH-1113:
-------------------------------------
Assignee: Sergey Edunov

> Errors from vertex combiner don't have vertex id, which makes it very hard to
> debug.
> -----------------------------------------------------------------------------
>
> Key: GIRAPH-1113
> URL: https://issues.apache.org/jira/browse/GIRAPH-1113
> Project: Giraph
> Issue Type: Improvement
> Reporter: Sergey Edunov
> Assignee: Sergey Edunov
> Priority: Minor
[jira] [Created] (GIRAPH-1113) Errors from vertex combiner don't have vertex id, which makes it very hard to debug.
Sergey Edunov created GIRAPH-1113:
-------------------------------------

Summary: Errors from vertex combiner don't have vertex id, which makes it very hard to debug.
Key: GIRAPH-1113
URL: https://issues.apache.org/jira/browse/GIRAPH-1113
Project: Giraph
Issue Type: Improvement
Reporter: Sergey Edunov
Priority: Minor
[jira] [Commented] (GIRAPH-1094) Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
[ https://issues.apache.org/jira/browse/GIRAPH-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490985#comment-15490985 ]

Sergey Edunov commented on GIRAPH-1094:
---------------------------------------

According to the HBase docs, Hadoop 1.x is only compatible with HBase 0.94:
https://hbase.apache.org/book.html#hadoop
And HBase 0.94 is not compatible with our version of Guava.

It seems we need to update HBase, but that also means we need to remove HBase support from the hadoop_1 profile. I'm not sure how many users of hadoop_1 we have and how many of them care about HBase.

This issue has been blocking the release for quite a while now, so I'll make the upgrade and remove HBase support from the hadoop_1 profile if there are no explicit objections by the end of this week.

> Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
> ---------------------------------------------------------
>
> Key: GIRAPH-1094
> URL: https://issues.apache.org/jira/browse/GIRAPH-1094
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Sergey Edunov
> Assignee: Roman Shaposhnik
>
> This test case seems to fail because of a missing dependency.
> 16/07/13 22:31:50 ERROR master.MasterFileSystem: bootstrap
> org.apache.hadoop.hbase.DroppedSnapshotException: region: -ROOT-,,0
>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1684)
>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1552)
>   at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1047)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:995)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:960)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:523)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:463)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133)
>   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:573)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoClassDefFoundError: com/google/common/io/NullOutputStream
>   at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:375)
>   at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1330)
>   at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:913)
>   at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:794)
>   at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2429)
>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1659)
> Full log can be found here:
> https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_1,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.io.hbase/TestHBaseRootMarkerVertextFormat/testHBaseInputOutput/
[jira] [Created] (GIRAPH-1111) FileOutputFormat#setOutputPath is not always available
Sergey Edunov created GIRAPH-1111:
-------------------------------------

Summary: FileOutputFormat#setOutputPath is not always available
Key: GIRAPH-1111
URL: https://issues.apache.org/jira/browse/GIRAPH-1111
Project: Giraph
Issue Type: Improvement
Reporter: Sergey Edunov
Assignee: Sergey Edunov
Priority: Minor

We need to make Giraph work with Hadoop distributions where FileOutputFormat#setOutputPath is not available.
It is very easy to pull this implementation out, as every Hadoop distribution has exactly the same implementation.
[jira] [Commented] (GIRAPH-1094) Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
[ https://issues.apache.org/jira/browse/GIRAPH-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15437665#comment-15437665 ]

Sergey Edunov commented on GIRAPH-1094:
---------------------------------------

This bug is known:
http://stackoverflow.com/questions/17625938/hbase-minidfscluster-java-fails-in-certain-environments
It is caused by an HDFS issue:
https://issues.apache.org/jira/browse/HDFS-2556

It doesn't really affect Giraph at runtime, so I think we shouldn't block the release.

> Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
> ---------------------------------------------------------
>
> Key: GIRAPH-1094
> URL: https://issues.apache.org/jira/browse/GIRAPH-1094
> Project: Giraph
> Issue Type: Bug
> Affects Versions: 1.2.0
> Reporter: Sergey Edunov
> Assignee: Roman Shaposhnik
>
> This test case seems to fail because of a missing dependency.
> 16/07/13 22:31:50 ERROR master.MasterFileSystem: bootstrap
> org.apache.hadoop.hbase.DroppedSnapshotException: region: -ROOT-,,0
>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1684)
>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1552)
>   at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1047)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:995)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:960)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:523)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:463)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148)
>   at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133)
>   at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:573)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoClassDefFoundError: com/google/common/io/NullOutputStream
>   at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:375)
>   at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1330)
>   at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:913)
>   at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:794)
>   at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2429)
>   at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1659)
> Full log can be found here:
> https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_1,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.io.hbase/TestHBaseRootMarkerVertextFormat/testHBaseInputOutput/
[jira] [Created] (GIRAPH-1104) NegativeArraySize exception in BigDataOutput
Sergey Edunov created GIRAPH-1104: - Summary: NegativeArraySize exception in BigDataOutput Key: GIRAPH-1104 URL: https://issues.apache.org/jira/browse/GIRAPH-1104 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Assignee: Sergey Edunov We're seeing this exception in some jobs. Supposedly related to high degree vertices Caused by: java.lang.NegativeArraySizeException at org.apache.giraph.utils.UnsafeByteArrayOutputStream.ensureSize(UnsafeByteArrayOutputStream.java:117) at org.apache.giraph.utils.UnsafeByteArrayOutputStream.write(UnsafeByteArrayOutputStream.java:168) at org.apache.giraph.utils.io.BigDataOutput.write(BigDataOutput.java:183) at org.apache.giraph.edge.ByteArrayEdges.write(ByteArrayEdges.java:204) at org.apache.giraph.ooc.data.DiskBackedPartitionStore.writeOutEdges(DiskBackedPartitionStore.java:353) at org.apache.giraph.ooc.data.DiskBackedPartitionStore.offloadInMemoryPartitionData(DiskBackedPartitionStore.java:389) at org.apache.giraph.ooc.data.DiskBackedDataStore.offloadPartitionDataProxy(DiskBackedDataStore.java:294) at org.apache.giraph.ooc.data.DiskBackedPartitionStore.offloadPartitionData(DiskBackedPartitionStore.java:318) at org.apache.giraph.ooc.command.StorePartitionIOCommand.execute(StorePartitionIOCommand.java:55) at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:99) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
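The report above only says the exception is "supposedly related to high degree vertices". One plausible mechanism, shown here purely as an assumption and not as the actual UnsafeByteArrayOutputStream code, is int overflow when a buffer capacity is doubled past 2^30 bytes. The names `BufferGrowthSketch`, `naiveGrow`, `safeGrow` and the `MAX_ARRAY_SIZE` cap are hypothetical.

```java
final class BufferGrowthSketch {
  // Conservative upper bound for a single array; an assumed cap, not a Giraph constant.
  static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  // Naive doubling in int arithmetic: wraps to a negative value once required > 2^30.
  static int naiveGrow(int required) {
    return required << 1;
  }

  // Doubling in long arithmetic, then clamping to the cap so the result stays valid.
  static int safeGrow(int required) {
    if (required < 0 || required > MAX_ARRAY_SIZE) {
      throw new IllegalArgumentException(
          "Cannot hold " + required + " bytes in one array");
    }
    long doubled = 2L * required;
    return (int) Math.min(doubled, (long) MAX_ARRAY_SIZE);
  }

  public static void main(String[] args) {
    int required = 1_500_000_000; // e.g. the serialized edges of one very large vertex
    System.out.println("naive: " + naiveGrow(required));
    System.out.println("safe:  " + safeGrow(required));
  }
}
```

Passing a negative capacity such as the one `naiveGrow` produces to `new byte[...]` is what raises NegativeArraySizeException in Java; whether that is the path taken in this trace would need to be confirmed against the source.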
[jira] [Created] (GIRAPH-1100) Multiple mutation requests to one vertex result in failure
Sergey Edunov created GIRAPH-1100: - Summary: Multiple mutation requests to one vertex result in failure Key: GIRAPH-1100 URL: https://issues.apache.org/jira/browse/GIRAPH-1100 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Assignee: Sergey Edunov The scenario is simple: You send multiple addEdgeRequest() where the source vertex of new edge does not exist (typical scenario for adding reverse edges). If two of these requests happen to arrive in the single SendPartitionMutationsRequest, giraph is unable to handle it: FATAL 2016-07-26 17:59:09,563 [netty-server-worker-6] org.apache.giraph.graph.GraphTaskManager - uncaughtException: OverrideExceptionHandler on thread netty-server-worker-6, msg = readFields: Already has vertex id 977939745592684, exiting... java.lang.IllegalStateException: readFields: Already has vertex id 977939745592684 at org.apache.giraph.comm.requests.SendPartitionMutationsRequest.readFieldsRequest(SendPartitionMutationsRequest.java:98) at org.apache.giraph.comm.requests.WritableRequest.readFields(WritableRequest.java:118) at org.apache.giraph.utils.RequestUtils.decodeWritableRequest(RequestUtils.java:52) at org.apache.giraph.comm.netty.handler.RequestDecoder.channelRead(RequestDecoder.java:89) at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338) at io.netty.channel.DefaultChannelHandlerContext.access$700(DefaultChannelHandlerContext.java:29) at io.netty.channel.DefaultChannelHandlerContext$8.run(DefaultChannelHandlerContext.java:329) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:354) at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353) at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
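As a rough illustration of the failure described above, the sketch below contrasts a reader that rejects a second entry for the same vertex id with one that merges entries. It is a simplified stand-in under stated assumptions: edges are plain strings, and `MutationMergeSketch`, `readStrict` and `readMerging` are hypothetical names, not Giraph classes or the fix that was eventually applied.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class MutationMergeSketch {
  // Strict reader: a second mutation for the same vertex id is treated as an error.
  static Map<Long, List<String>> readStrict(long[] vertexIds, String[] edges) {
    Map<Long, List<String>> mutations = new HashMap<>();
    for (int i = 0; i < vertexIds.length; i++) {
      List<String> single = new ArrayList<>();
      single.add(edges[i]);
      if (mutations.put(vertexIds[i], single) != null) {
        throw new IllegalStateException("Already has vertex id " + vertexIds[i]);
      }
    }
    return mutations;
  }

  // Merging reader: mutations for the same vertex id are accumulated in one list.
  static Map<Long, List<String>> readMerging(long[] vertexIds, String[] edges) {
    Map<Long, List<String>> mutations = new HashMap<>();
    for (int i = 0; i < vertexIds.length; i++) {
      mutations.computeIfAbsent(vertexIds[i], id -> new ArrayList<>()).add(edges[i]);
    }
    return mutations;
  }

  public static void main(String[] args) {
    long[] ids = {7L, 7L, 9L};
    String[] edges = {"7->1", "7->2", "9->3"};
    System.out.println(readMerging(ids, edges));
  }
}
```

With two reverse-edge requests for the same missing source vertex, the strict reader throws while the merging reader keeps both edges under one id.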
[jira] [Created] (GIRAPH-1099) Bypass all DNC calls in Giraph
Sergey Edunov created GIRAPH-1099: - Summary: Bypass all DNC calls in Giraph Key: GIRAPH-1099 URL: https://issues.apache.org/jira/browse/GIRAPH-1099 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Assignee: Sergey Edunov After GIRAPH-1034 we dramatically reduced the number of DNS lookups and reverse DNS lookups. However, we still see occasional failures, and it would be great to bypass DNS completely if PREFER_IP_ADDRESSES is set. Currently I'm aware of two places that make DNS lookups. One is in BspService: this.hostname = conf.getLocalHostname(); Another one is probably related: java.net.UnknownHostException: ***.com: unknown error at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method) at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928) at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323) at java.net.InetAddress.getAllByName0(InetAddress.java:1276) at java.net.InetAddress.getAllByName(InetAddress.java:1192) at java.net.InetAddress.getAllByName(InetAddress.java:1126) at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380) at org.apache.giraph.zk.ZooKeeperExt.<init>(ZooKeeperExt.java:114) at org.apache.giraph.bsp.BspService.<init>(BspService.java:281) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
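To illustrate what "bypassing DNS" can look like at the JDK level, here is a minimal sketch. It relies on two documented behaviours of java.net.InetAddress: getByAddress(byte[]) only validates the address length and performs no name service lookup, and getHostAddress() returns the textual IP without a reverse lookup. The class and method names are hypothetical, and how this would be wired into BspService or ZooKeeperExt is not covered.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class IpOnlySketch {
  // Builds the textual IP from raw bytes without touching a name service.
  static String ipString(byte[] rawAddress) {
    try {
      InetAddress address = InetAddress.getByAddress(rawAddress);
      return address.getHostAddress();
    } catch (UnknownHostException e) {
      // Thrown only when the array is not 4 (IPv4) or 16 (IPv6) bytes long.
      throw new IllegalArgumentException("Not a valid raw IP address", e);
    }
  }

  public static void main(String[] args) {
    System.out.println(ipString(new byte[] {10, 0, 0, 1}));
  }
}
```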
[jira] [Closed] (GIRAPH-1098) Job may get stuck if zookeeper port fixed and is in use
[ https://issues.apache.org/jira/browse/GIRAPH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov closed GIRAPH-1098. - > Job may get stuck if zookeeper port fixed and is in use > --- > > Key: GIRAPH-1098 > URL: https://issues.apache.org/jira/browse/GIRAPH-1098 > Project: Giraph > Issue Type: Bug >Reporter: Sergey Edunov >Assignee: Sergey Edunov > > We see jobs getting stuck indefinitely if zookeeper port is in use: > INFO2016-07-19 16:08:29,168 [main] > org.apache.zookeeper.server.NIOServerCnxnFactory - binding to port > ::/0:0:0:0:0:0:0:0:22181 > ERROR 2016-07-19 16:08:29,168 [main] > org.apache.giraph.zk.InProcessZooKeeperRunner - Unable to start zookeeper > java.net.BindException: Address already in use > at sun.nio.ch.Net.bind0(Native Method) > at sun.nio.ch.Net.bind(Net.java:433) > at sun.nio.ch.Net.bind(Net.java:425) > at > sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) > at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) > at > org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95) > at > org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.runFromConfig(InProcessZooKeeperRunner.java:196) > at > org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.start(InProcessZooKeeperRunner.java:154) > at > org.apache.giraph.zk.InProcessZooKeeperRunner$QuorumRunner.start(InProcessZooKeeperRunner.java:97) > at > org.apache.giraph.zk.InProcessZooKeeperRunner.start(InProcessZooKeeperRunner.java:52) > at > org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServer(ZooKeeperManager.java:476) > at > org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:447) > at > org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:247) > at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:56) > at 
org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90) > at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:627) > at org.apache.hadoop.mapred.MapTask.runImpl(MapTask.java:301) > at org.apache.hadoop.mapred.Task.run(Task.java:604) > at org.apache.hadoop.mapred.CoronaChild.main(CoronaChild.java:177) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1098) Job may get stuck if zookeeper port fixed and is in use
Sergey Edunov created GIRAPH-1098: - Summary: Job may get stuck if zookeeper port fixed and is in use Key: GIRAPH-1098 URL: https://issues.apache.org/jira/browse/GIRAPH-1098 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Assignee: Sergey Edunov We see jobs getting stuck indefinitely if zookeeper port is in use: INFO2016-07-19 16:08:29,168 [main] org.apache.zookeeper.server.NIOServerCnxnFactory - binding to port ::/0:0:0:0:0:0:0:0:22181 ERROR 2016-07-19 16:08:29,168 [main] org.apache.giraph.zk.InProcessZooKeeperRunner - Unable to start zookeeper java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:433) at sun.nio.ch.Net.bind(Net.java:425) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67) at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95) at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.runFromConfig(InProcessZooKeeperRunner.java:196) at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.start(InProcessZooKeeperRunner.java:154) at org.apache.giraph.zk.InProcessZooKeeperRunner$QuorumRunner.start(InProcessZooKeeperRunner.java:97) at org.apache.giraph.zk.InProcessZooKeeperRunner.start(InProcessZooKeeperRunner.java:52) at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServer(ZooKeeperManager.java:476) at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:447) at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:247) at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:56) at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90) at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:627) at org.apache.hadoop.mapred.MapTask.runImpl(MapTask.java:301) at 
org.apache.hadoop.mapred.Task.run(Task.java:604) at org.apache.hadoop.mapred.CoronaChild.main(CoronaChild.java:177) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1097) Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor
Sergey Edunov created GIRAPH-1097: - Summary: Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor Key: GIRAPH-1097 URL: https://issues.apache.org/jira/browse/GIRAPH-1097 Project: Giraph Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergey Edunov Assignee: Sergey Edunov Fix For: 1.2.0 TestOutOfCore.testOutOfCoreLocalDiskAccessor for hadoop_1 profile fails. java.lang.AssertionError at org.junit.Assert.fail(Assert.java:86) at org.junit.Assert.assertTrue(Assert.java:41) at org.junit.Assert.assertTrue(Assert.java:52) at org.apache.giraph.TestOutOfCore.runTest(TestOutOfCore.java:100) at org.apache.giraph.TestOutOfCore.testOutOfCoreLocalDiskAccessor(TestOutOfCore.java:84) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1095) Performance regression after GIRAPH-1068
Sergey Edunov created GIRAPH-1095: - Summary: Performance regression after GIRAPH-1068 Key: GIRAPH-1095 URL: https://issues.apache.org/jira/browse/GIRAPH-1095 Project: Giraph Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergey Edunov Assignee: Sergey Edunov Fix For: 1.2.0 We noticed significant performance regression caused by GIRAPH-1068 for jobs that have a lot of supersteps. This is likely caused by some missing zookeeper options. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1094) Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
Sergey Edunov created GIRAPH-1094: - Summary: Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput Key: GIRAPH-1094 URL: https://issues.apache.org/jira/browse/GIRAPH-1094 Project: Giraph Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergey Edunov This test case seems to fail because of a missing dependency. 16/07/13 22:31:50 ERROR master.MasterFileSystem: bootstrap org.apache.hadoop.hbase.DroppedSnapshotException: region: -ROOT-,,0 at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1684) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1552) at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1047) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:995) at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:960) at org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:523) at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:463) at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148) at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133) at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:573) at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432) at java.lang.Thread.run(Thread.java:745) Caused by: java.lang.NoClassDefFoundError: com/google/common/io/NullOutputStream at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:375) at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1330) at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:913) at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:794) at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2429) at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1659) Full log can be found here: 
https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_1,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.io.hbase/TestHBaseRootMarkerVertextFormat/testHBaseInputOutput/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1093) Fix TestRexsterLongDoubleFloatIOFormat test case
Sergey Edunov created GIRAPH-1093: - Summary: Fix TestRexsterLongDoubleFloatIOFormat test case Key: GIRAPH-1093 URL: https://issues.apache.org/jira/browse/GIRAPH-1093 Project: Giraph Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergey Edunov This test case appears flaky and fails with an "Address already in use" exception: java.net.BindException: Address already in use at sun.nio.ch.Net.bind0(Native Method) at sun.nio.ch.Net.bind(Net.java:463) at sun.nio.ch.Net.bind(Net.java:455) at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223) at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:395) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:366) at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:357) at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:606) at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:260) at com.tinkerpop.rexster.server.HttpRexsterServer.reconfigure(HttpRexsterServer.java:195) at com.tinkerpop.rexster.server.HttpRexsterServer.start(HttpRexsterServer.java:149) Full log can be found here: https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_2,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.rexster.io.formats/TestRexsterLongDoubleFloatIOFormat/org_apache_giraph_rexster_io_formats_TestRexsterLongDoubleFloatIOFormat/ -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-1091) Fix SimpleRangePartitionFactoryTest
[ https://issues.apache.org/jira/browse/GIRAPH-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-1091: -- Fix Version/s: 1.2.0 > Fix SimpleRangePartitionFactoryTest > --- > > Key: GIRAPH-1091 > URL: https://issues.apache.org/jira/browse/GIRAPH-1091 > Project: Giraph > Issue Type: Bug >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > Fix For: 1.2.0 > > > SimpleRangePartitionFactoryTest relied on old logic for calculating number of > partitions and got broken with GIRAPH-1082. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-1092) TestCollections.testLargeBasicList fails with OOM
[ https://issues.apache.org/jira/browse/GIRAPH-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-1092: -- Fix Version/s: 1.2.0 > TestCollections.testLargeBasicList fails with OOM > - > > Key: GIRAPH-1092 > URL: https://issues.apache.org/jira/browse/GIRAPH-1092 > Project: Giraph > Issue Type: Bug >Affects Versions: 1.2.0 >Reporter: Sergey Edunov >Assignee: Sergey Edunov > Fix For: 1.2.0 > > > TestCollections.testLargeBasicList fails with OOM in Jenkins. This test case > requires more than 1G memory to run. After a small chat with author we > decided to disable this test case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1092) TestCollections.testLargeBasicList fails with OOM
Sergey Edunov created GIRAPH-1092: - Summary: TestCollections.testLargeBasicList fails with OOM Key: GIRAPH-1092 URL: https://issues.apache.org/jira/browse/GIRAPH-1092 Project: Giraph Issue Type: Bug Affects Versions: 1.2.0 Reporter: Sergey Edunov Assignee: Sergey Edunov TestCollections.testLargeBasicList fails with OOM in Jenkins. This test case requires more than 1G memory to run. After a small chat with author we decided to disable this test case. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1079) Add triangle counting example
[ https://issues.apache.org/jira/browse/GIRAPH-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov resolved GIRAPH-1079. --- Resolution: Fixed > Add triangle counting example > - > > Key: GIRAPH-1079 > URL: https://issues.apache.org/jira/browse/GIRAPH-1079 > Project: Giraph > Issue Type: New Feature >Reporter: Maja Kabiljo >Assignee: Maja Kabiljo >Priority: Minor > Fix For: 1.2.0 > > > Add an app for triangle counting -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.
[ https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359227#comment-15359227 ] Sergey Edunov commented on GIRAPH-882: --
I believe that if you set up an external zookeeper quorum, this code doesn't even get called. In that case, all access to zookeeper goes through ZooKeeperExt. See, for example, these lines in GraphTaskManager.java:
    String serverPortList = conf.getZookeeperList();
    if (serverPortList.isEmpty()) {
      if (startZooKeeperManager()) {
        return; // ZK connect/startup failed
      }
    } else {
      createZooKeeperCounter(serverPortList);
    }
The first "if" is the only entry point into the ZooKeeperManager as far as I can see, and we don't go there if an external zookeeper quorum is set.
> List of zookeeper connection strings is trimmed by Hadoop counters. > --- > > Key: GIRAPH-882 > URL: https://issues.apache.org/jira/browse/GIRAPH-882 > Project: Giraph > Issue Type: Bug > Components: zookeeper >Affects Versions: 1.1.0, 1.2.0 >Reporter: Lukas Nalezenec > Attachments: GIRAPH-882-rev2.patch, GIRAPH-882-rev3.patch, > GIRAPH-882.patch, testrun.log > > > We are running job with quorum of 3 zookeepers. Each serves has got long name > (turing452.fi.callan.de:22181). Connection strings are stored to Hadoop > Counters (for example: > turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181) > but since name of counter is limited to ~63 character the connection string > is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin). 
> 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Client > environment:user.name=hadoop > 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Initiating client connection, > connectString=turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin > sessionTimeout=6 > Exception in thread "main" java.net.UnknownHostException: turin > at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) > at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) > at > java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) > at java.net.InetAddress.getAllByName0(InetAddress.java:1246) > at java.net.InetAddress.getAllByName(InetAddress.java:1162) > at java.net.InetAddress.getAllByName(InetAddress.java:1098) > at > org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) > at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) > at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380) > at org.apache.giraph.zk.ZooKeeperExt.<init>(ZooKeeperExt.java:114) > at > org.apache.giraph.job.JobProgressTracker.<init>(JobProgressTracker.java:69) > at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:255) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.
[ https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355531#comment-15355531 ] Sergey Edunov commented on GIRAPH-882: -- Hi Donngjin, Are you available for a chat sometime this week? I'm trying to figure out how this diff relates to GIRAPH-1068. There we removed the option to start multiple zookeepers from Giraph itself. It is still possible to connect to an external zookeeper quorum though. To understand what needs to be done here, can you explain your use case? If I understand correctly, here you create 5 zookeeper instances from the Giraph job. What is the reason, and why don't you use a single zookeeper? > List of zookeeper connection strings is trimmed by Hadoop counters. > --- > > Key: GIRAPH-882 > URL: https://issues.apache.org/jira/browse/GIRAPH-882 > Project: Giraph > Issue Type: Bug > Components: zookeeper >Affects Versions: 1.1.0, 1.2.0 >Reporter: Lukas Nalezenec > Attachments: GIRAPH-882-rev2.patch, GIRAPH-882-rev3.patch, > GIRAPH-882.patch, testrun.log > > > We are running job with quorum of 3 zookeepers. Each serves has got long name > (turing452.fi.callan.de:22181). Connection strings are stored to Hadoop > Counters (for example: > turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181) > but since name of counter is limited to ~63 character the connection string > is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin). 
> 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Client > environment:user.name=hadoop > 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Initiating client connection, > connectString=turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin > sessionTimeout=6 > Exception in thread "main" java.net.UnknownHostException: turin > at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method) > at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901) > at > java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293) > at java.net.InetAddress.getAllByName0(InetAddress.java:1246) > at java.net.InetAddress.getAllByName(InetAddress.java:1162) > at java.net.InetAddress.getAllByName(InetAddress.java:1098) > at > org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60) > at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445) > at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380) > at org.apache.giraph.zk.ZooKeeperExt.<init>(ZooKeeperExt.java:114) > at > org.apache.giraph.job.JobProgressTracker.<init>(JobProgressTracker.java:69) > at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:255) -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (GIRAPH-1068) Make Zookeeper accept 0 as a port number and let it choose any available free port
[ https://issues.apache.org/jira/browse/GIRAPH-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-1068: -- Summary: Make Zookeeper accept 0 as a port number and let it choose any available free port (was: Make Zookeeper accept 0 as a port number and let it choose free available port) > Make Zookeeper accept 0 as a port number and let it choose any available free > port > -- > > Key: GIRAPH-1068 > URL: https://issues.apache.org/jira/browse/GIRAPH-1068 > Project: Giraph > Issue Type: Task >Reporter: Sergey Edunov >Assignee: Sergey Edunov > > We have a few use cases where having zookeeper bound to a specific port is very > inconvenient. > 1) Unit tests that run in parallel. > 2) Shared clusters where multiple giraph instances can run on the same > machines. > In theory we don't need to know what port zookeeper will run on. In most > cases we're fine with any available port. > Picking any available port is currently supported by the server socket, but > is not supported in the code that parses zookeeper configs (this code lives in > zookeeper). > We don't have to parse configs though, as we have a way to run zookeeper in > process, and in that case we have full control over how zookeeper is > initialized. > For this task I want to allow 0 as a port number for zookeeper, which will > allow us to run zookeeper on any available port. I will also remove "out > of process" zookeeper, as it clearly provides no benefit to us. > Note: it will still be possible to use an external zookeeper, if you have it > running somewhere as a service. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
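The remark that the server socket already supports picking any available port refers to standard JDK behaviour: binding a java.net.ServerSocket to port 0 lets the operating system choose a free ephemeral port, which can then be read back with getLocalPort(). A minimal sketch follows; `FreePortSketch` and `bindTwo` are hypothetical names, and nothing here is ZooKeeper or Giraph code.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.ServerSocket;

final class FreePortSketch {
  // Binds two sockets to port 0 at the same time and reports the ports the OS assigned.
  static int[] bindTwo() {
    try (ServerSocket first = new ServerSocket(0);
         ServerSocket second = new ServerSocket(0)) {
      return new int[] {first.getLocalPort(), second.getLocalPort()};
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    int[] ports = bindTwo();
    System.out.println("OS-assigned ports: " + ports[0] + ", " + ports[1]);
  }
}
```

Because both sockets are open simultaneously, they receive different ports, which is the property that parallel unit tests and co-located jobs need. The port is only reserved while the socket stays open, so a server should keep the bound socket rather than close it and rebind later.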
[jira] [Created] (GIRAPH-1043) Implementation of Darwini graph generator
Sergey Edunov created GIRAPH-1043: - Summary: Implementation of Darwini graph generator Key: GIRAPH-1043 URL: https://issues.apache.org/jira/browse/GIRAPH-1043 Project: Giraph Issue Type: Task Reporter: Sergey Edunov Assignee: Sergey Edunov Implementation of graph generator that is able to capture many properties of social graphs, such as high local clustering coefficient, non-power law degree distributions and log normal joint degree distribution. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-1024) mvn release:prepare not committing changes to pom.xml
Sergey Edunov created GIRAPH-1024: - Summary: mvn release:prepare not committing changes to pom.xml Key: GIRAPH-1024 URL: https://issues.apache.org/jira/browse/GIRAPH-1024 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov This issue is pretty much described in stackoverflow post: http://stackoverflow.com/questions/15166781/mvn-releaseprepare-not-committing-changes-to-pom-xml with newer versions of git we are unable to release giraph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (GIRAPH-1024) mvn release:prepare not committing changes to pom.xml
[ https://issues.apache.org/jira/browse/GIRAPH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov resolved GIRAPH-1024. --- Resolution: Fixed https://reviews.facebook.net/D43215 mvn release:prepare not committing changes to pom.xml - Key: GIRAPH-1024 URL: https://issues.apache.org/jira/browse/GIRAPH-1024 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov This issue is pretty much described in stackoverflow post: http://stackoverflow.com/questions/15166781/mvn-releaseprepare-not-committing-changes-to-pom-xml with newer versions of git we are unable to release giraph. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-1000) Multi Output support
[ https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379892#comment-14379892 ] Sergey Edunov commented on GIRAPH-1000: --- That would be a great addition to Giraph! I was thinking about it a while ago. It seems we can implement it the same way the multiple input formats are implemented. See for example: org.apache.giraph.io.formats.multi.MultiVertexInputFormat and other classes in the same package. This is essentially a wrapper around a list of inputs that provides the same API as a single input format does. In the same way we can have wrappers around VertexOutputFormat and EdgeOutputFormat that provide the same APIs, and then just plug them in. We also need this feature, so I'll be happy to help. Multi Output support Key: GIRAPH-1000 URL: https://issues.apache.org/jira/browse/GIRAPH-1000 Project: Giraph Issue Type: Improvement Components: bsp, conf and scripts, graph Affects Versions: 1.0.0, 1.1.0, 1.2.0-SNAPSHOT Reporter: Alessio Arleo Labels: features Hadoop natively supports multiple outputs. The objective is to extend Giraph to support multiple output formats during a single giraph run. According to the official Hadoop apidocs*, to take advantage of multiple outputs the pattern is the following: - Modify the job submission - Modify the reducer class to write on the declared different outputs Since Giraph jobs are executed as mappers, probably this approach (or at least its second part) is not feasible, so further investigation is necessary. *https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html -- This message was sent by Atlassian JIRA (v6.3.4#6332)
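The wrapper idea from the comment above can be sketched in a few lines. This is only an illustration of the pattern: `VertexSink` and `MultiVertexSink` are hypothetical stand-ins, not the real VertexOutputFormat or EdgeOutputFormat APIs, and vertices are represented as plain strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical single-output API: one method that writes one vertex.
interface VertexSink {
  void write(String vertexLine);
}

// Wrapper exposing the same API while forwarding every write to all delegates.
final class MultiVertexSink implements VertexSink {
  private final List<VertexSink> delegates;

  MultiVertexSink(List<VertexSink> delegates) {
    this.delegates = new ArrayList<>(delegates);
  }

  @Override
  public void write(String vertexLine) {
    for (VertexSink delegate : delegates) {
      delegate.write(vertexLine);
    }
  }
}

final class MultiVertexSinkDemo {
  public static void main(String[] args) {
    List<String> first = new ArrayList<>();
    List<String> second = new ArrayList<>();
    List<VertexSink> sinks = new ArrayList<>();
    sinks.add(first::add);
    sinks.add(second::add);
    VertexSink multi = new MultiVertexSink(sinks);
    multi.write("1\t0.5");
    System.out.println(first + " " + second);
  }
}
```

Callers keep using the single-output interface, so plugging in several outputs does not change the code that produces the vertices.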
[jira] [Created] (GIRAPH-992) Zookeeper logs have too many NodeExists exceptions
Sergey Edunov created GIRAPH-992: Summary: Zookeeper logs have too many NodeExists exceptions Key: GIRAPH-992 URL: https://issues.apache.org/jira/browse/GIRAPH-992 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov There are several places in our code where we do not properly check whether a zk node already exists before attempting to create it. As a result, ZK logs are full of these exceptions. The biggest offender is recursive path creation in ZooKeeperExt.createExt(). Obviously, if part of the path already exists we don't need to create it. The second biggest offender is writing input splits from the master. Here we launch multiple threads, each of them attempting to create the same path. INFO 2015-02-04 14:32:39,730 [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x14b56b9176f0001 type:create cxid:0x1 zxid:0x19 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_201411061513.83344_0001/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201411061513.83344_0001/_masterJobState INFO 2015-02-04 14:32:39,740 [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x14b56b9176f0001 type:create cxid:0x3 zxid:0x1a txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir INFO 2015-02-04 14:32:39,742 [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor - Got user-level KeeperException when processing sessionid:0x14b56b9176f0001 type:create cxid:0x5 zxid:0x1b txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir/0/_superstepDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir/0/_superstepDir -- This message was sent by Atlassian JIRA (v6.3.4#6332)
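The point about recursive path creation can be sketched as follows: split the path into its prefixes and issue a create only for those that are not already present. The example uses an in-memory set as a stand-in for ZooKeeper, so it is an assumption-laden illustration rather than the ZooKeeperExt.createExt() code; `PathCreateSketch`, `prefixes` and `createMissing` are hypothetical names. Against a real ensemble the check would be an existence query followed by a create, and a concurrent creator can still win the race in between, so a NodeExists result must still be tolerated.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PathCreateSketch {
  // Splits an absolute path into its prefixes: "/a/b" becomes ["/a", "/a/b"].
  static List<String> prefixes(String path) {
    List<String> result = new ArrayList<>();
    int index = path.indexOf('/', 1);
    while (index != -1) {
      result.add(path.substring(0, index));
      index = path.indexOf('/', index + 1);
    }
    result.add(path);
    return result;
  }

  // Creates only the prefixes missing from the store and returns what was created.
  static List<String> createMissing(Set<String> existingNodes, String path) {
    List<String> created = new ArrayList<>();
    for (String prefix : prefixes(path)) {
      if (existingNodes.add(prefix)) { // add() returns false when already present
        created.add(prefix);
      }
    }
    return created;
  }

  public static void main(String[] args) {
    Set<String> nodes = new HashSet<>();
    nodes.add("/_hadoopBsp");
    System.out.println(createMissing(nodes, "/_hadoopBsp/job_1/_masterJobState"));
  }
}
```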
[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.
[ https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290342#comment-14290342 ] Sergey Edunov commented on GIRAPH-882: -- Thank you, Lee! I've added one comment to your diff; otherwise it looks fine to me. I'll try this patch on our cluster to make sure it works and will post an update here. List of zookeeper connection strings is trimmed by Hadoop counters. --- Key: GIRAPH-882 URL: https://issues.apache.org/jira/browse/GIRAPH-882 Project: Giraph Issue Type: Bug Components: zookeeper Affects Versions: 1.1.0 Reporter: Lukas Nalezenec Attachments: GIRAPH-882-rev2.patch, GIRAPH-882.patch, testrun.log We are running a job with a quorum of 3 zookeepers. Each server has a long name (turing452.fi.callan.de:22181). Connection strings are stored in Hadoop Counters (for example: turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181), but since the name of a counter is limited to ~63 characters, the connection string is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin).
14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Client environment:user.name=hadoop
14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin sessionTimeout=6
Exception in thread "main" java.net.UnknownHostException: turin
    at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
    at java.net.InetAddress.getAllByName(InetAddress.java:1162)
    at java.net.InetAddress.getAllByName(InetAddress.java:1098)
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
    at org.apache.giraph.zk.ZooKeeperExt.<init>(ZooKeeperExt.java:114)
    at org.apache.giraph.job.JobProgressTracker.<init>(JobProgressTracker.java:69)
    at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:255)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
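To make the truncation in GIRAPH-882 concrete, here is a minimal sketch of one conceivable workaround: splitting the connection string into pieces that each fit within the name limit and joining them again on the reading side. This is an assumption for illustration only and is not necessarily what the attached patches do; the class CounterNameChunks is hypothetical, and the limit of 63 is taken from the "~63" in the report.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only: splits a long value into pieces of bounded length. */
final class CounterNameChunks {
  private CounterNameChunks() { }

  /** Splits value into consecutive chunks of at most maxLen characters. */
  static List<String> split(String value, int maxLen) {
    if (maxLen <= 0) {
      throw new IllegalArgumentException("maxLen must be positive");
    }
    List<String> chunks = new ArrayList<>();
    for (int start = 0; start < value.length(); start += maxLen) {
      int end = Math.min(value.length(), start + maxLen);
      chunks.add(value.substring(start, end));
    }
    return chunks;
  }

  /** Reassembles the original value from chunks kept in order. */
  static String join(List<String> chunks) {
    return String.join("", chunks);
  }
}
```

With the three-server string from the report and a limit of 63, the first chunk is exactly the trimmed value that ends in "turin", which is the host name the client then fails to resolve.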
[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.
[ https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278039#comment-14278039 ] Sergey Edunov commented on GIRAPH-882: -- Lee, thank you for working on this! Please make sure you follow the steps described here: http://giraph.apache.org/generating_patches.html In particular, you need to run mvn clean verify. It looks like your change actually breaks unit tests. It is also recommended to submit a Review Board request or a Phabricator code review for big patches (this one seems big enough). I submitted one for you: https://reviews.facebook.net/D31563 and put some comments there. Also, you should be working against the latest version of Giraph from git. I was able to merge your changes, but the more complicated your patch is, the harder it will be to merge. List of zookeeper connection strings is trimmed by Hadoop counters. --- Key: GIRAPH-882 URL: https://issues.apache.org/jira/browse/GIRAPH-882 Project: Giraph Issue Type: Bug Components: zookeeper Affects Versions: 1.1.0 Reporter: Lukas Nalezenec Attachments: GIRAPH-882.patch, testrun.log We are running a job with a quorum of 3 zookeepers. Each server has a long name (turing452.fi.callan.de:22181). Connection strings are stored in Hadoop Counters (for example: turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181), but since the name of a counter is limited to ~63 characters, the connection string is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin).
[jira] [Created] (GIRAPH-983) Remove checkpoint related error messages from console
Sergey Edunov created GIRAPH-983: Summary: Remove checkpoint related error messages from console Key: GIRAPH-983 URL: https://issues.apache.org/jira/browse/GIRAPH-983 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Assignee: Sergey Edunov Priority: Minor If a job fails for any reason, we always see a checkpointing-related error in the console. This should be removed, as it confuses users.
INFO 2014-12-28 18:58:42,913 Can't find any checkpoints for jobID=job_201407091116.163675_0001
java.io.FileNotFoundException: File _bsp/_checkpoints/job_201407091116.163675_0001 does not exist
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1179)
    at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1216)
    at org.apache.giraph.utils.CheckpointingUtils.getLastCheckpointedSuperstep(CheckpointingUtils.java:106)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-978) Giraph-Debugger Test Graphs not working
[ https://issues.apache.org/jira/browse/GIRAPH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270195#comment-14270195 ] Sergey Edunov commented on GIRAPH-978: -- CR: https://reviews.facebook.net/D31137 Giraph-Debugger Test Graphs not working --- Key: GIRAPH-978 URL: https://issues.apache.org/jira/browse/GIRAPH-978 Project: Giraph Issue Type: Bug Components: graph Reporter: Nishant M Gandhi Assignee: Sergey Edunov Priority: Minor Labels: patch Attachments: Giraph-debug-000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-980) Way to disable checkpoints for particular job and on particular supersteps
Sergey Edunov created GIRAPH-980: Summary: Way to disable checkpoints for particular job and on particular supersteps Key: GIRAPH-980 URL: https://issues.apache.org/jira/browse/GIRAPH-980 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov It is currently impossible to disable checkpoints for particular jobs. For example, jobs that write output during the computation do not support checkpointing, and no attempt should be made to checkpoint such a job. Other use cases exist where we need to manually specify which supersteps are checkpointable for a particular job. We need a generic way to configure a job's ability to do checkpoints. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (GIRAPH-980) Way to disable checkpoints for particular job and on particular supersteps
[ https://issues.apache.org/jira/browse/GIRAPH-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov reassigned GIRAPH-980: Assignee: Sergey Edunov Way to disable checkpoints for particular job and on particular supersteps -- Key: GIRAPH-980 URL: https://issues.apache.org/jira/browse/GIRAPH-980 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Assignee: Sergey Edunov It is currently impossible to disable checkpoints for particular jobs. For example, jobs that write output during the computation do not support checkpointing, and no attempt should be made to checkpoint such a job. Other use cases exist where we need to manually specify which supersteps are checkpointable for a particular job. We need a generic way to configure a job's ability to do checkpoints. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-979) Add type of input to 'missing input' error message
[ https://issues.apache.org/jira/browse/GIRAPH-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267973#comment-14267973 ] Sergey Edunov commented on GIRAPH-979: -- +1 Add type of input to 'missing input' error message -- Key: GIRAPH-979 URL: https://issues.apache.org/jira/browse/GIRAPH-979 Project: Giraph Issue Type: Improvement Reporter: Maja Kabiljo Assignee: Maja Kabiljo Attachments: GIRAPH-979.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Reopened] (GIRAPH-978) Giraph-Debugger Test Graphs not working
[ https://issues.apache.org/jira/browse/GIRAPH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov reopened GIRAPH-978: -- Assignee: Sergey Edunov Giraph-Debugger Test Graphs not working --- Key: GIRAPH-978 URL: https://issues.apache.org/jira/browse/GIRAPH-978 Project: Giraph Issue Type: Bug Components: graph Reporter: Nishant M Gandhi Assignee: Sergey Edunov Priority: Minor Labels: patch Attachments: Giraph-debug-000.patch -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-975) In-proc ZooKeeper server with Master process
Sergey Edunov created GIRAPH-975: Summary: In-proc ZooKeeper server with Master process Key: GIRAPH-975 URL: https://issues.apache.org/jira/browse/GIRAPH-975 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Assignee: Sergey Edunov Currently by default zookeeper runs as a separate java process, on the same server where master runs. This prevents us from seeing zookeeper logs and makes it harder to debug memory issues. We should be able to run zookeeper inside Master process and perhaps this should be default. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-972) Race condition in checkpointing
Sergey Edunov created GIRAPH-972: Summary: Race condition in checkpointing Key: GIRAPH-972 URL: https://issues.apache.org/jira/browse/GIRAPH-972 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov A couple of issues noticed with checkpointing of large jobs: 1) The task ID of the master appears to be important. In most cases it is 0, but sometimes it is not, and since we cannot control it, checkpointing should not depend on it. 2) A race condition happens on the master when a worker dies:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /_hadoopBsp/job_201411061513.38895_0001/_applicationAttemptsDir/0/_superstepDir/9/_workerHealthyDir/hadoop4921.prn2.facebook.com_3
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
    at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
    at org.apache.giraph.zk.ZooKeeperExt.getData(ZooKeeperExt.java:470)
    at org.apache.giraph.utils.WritableUtils.readFieldsFromZnode(WritableUtils.java:126)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-973) Edge trimming no longer works before superstep 0
Sergey Edunov created GIRAPH-973: Summary: Edge trimming no longer works before superstep 0 Key: GIRAPH-973 URL: https://issues.apache.org/jira/browse/GIRAPH-973 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Edge trimming, introduced in GIRAPH-895, no longer works before superstep 0. We used to trim edges after input was done, which reduced memory consumption in superstep 0. This no longer works, and we only trim edge arrays at the end of superstep 0. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (GIRAPH-966) Add a way to ignore some thread exceptions
[ https://issues.apache.org/jira/browse/GIRAPH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230249#comment-14230249 ] Sergey Edunov commented on GIRAPH-966: -- looks good to me Add a way to ignore some thread exceptions -- Key: GIRAPH-966 URL: https://issues.apache.org/jira/browse/GIRAPH-966 Project: Giraph Issue Type: New Feature Reporter: Maja Kabiljo Assignee: Maja Kabiljo Priority: Minor Attachments: GIRAPH-966.patch Add a way not to fail a mapper when an exception happens on some non core threads. By default still fail on every exception. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-963) Aggregators may fail with IllegalArgumentException upon deserialization
Sergey Edunov created GIRAPH-963: Summary: Aggregators may fail with IllegalArgumentException upon deserialization Key: GIRAPH-963 URL: https://issues.apache.org/jira/browse/GIRAPH-963 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Priority: Trivial Found this in one of the runs, fix is simple:
java.lang.IllegalArgumentException: Trying to configure configurable object without value, class …
    at org.apache.giraph.utils.ConfigurationUtils.configureIfPossible(ConfigurationUtils.java:153)
    at org.apache.giraph.utils.ReflectionUtils.newInstance(ReflectionUtils.java:111)
    at org.apache.giraph.master.AggregatorReduceOperation.initAggregator(AggregatorReduceOperation.java:65)
    at org.apache.giraph.master.AggregatorReduceOperation.readFields(AggregatorReduceOperation.java:114)
    at org.apache.giraph.master.AggregatorToGlobalCommTranslation$AggregatorWrapper.readFields(AggregatorToGlobalCommTranslation.java:288)
    at org.apache.giraph.master.AggregatorToGlobalCommTranslation.readFields(AggregatorToGlobalCommTranslation.java:184)
    at org.apache.giraph.master.BspServiceMaster.prepareCheckpointRestart(BspServiceMaster.java:823)
    at org.apache.giraph.master.BspServiceMaster.assignPartitionOwners(BspServiceMaster.java:1140)
    at org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1609)
    at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)
-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (GIRAPH-950) Auto-restart from checkpoint doesn't pick up latest checkpoint
Sergey Edunov created GIRAPH-950: Summary: Auto-restart from checkpoint doesn't pick up latest checkpoint Key: GIRAPH-950 URL: https://issues.apache.org/jira/browse/GIRAPH-950 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov While running different jobs with checkpoints enabled I noticed some issues: 1) The way we pick up the latest checkpoint is not correct. The current implementation just picks whatever is returned last from FileSystem.list(), which is not necessarily the last checkpoint. 2) If a job restarts from a checkpoint, it immediately creates another checkpoint. 3) We need more flexibility in GiraphJobRetryChecker to allow restarts after multiple failures. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
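Point 1) of GIRAPH-950 can be shown with a short sketch: instead of trusting the order of a directory listing, parse the superstep number out of each entry and take the maximum. The class LatestCheckpoint is hypothetical, and the assumption that a checkpoint entry is named by its bare superstep number is made only for this illustration; it is not taken from Giraph's actual checkpoint layout.

```java
import java.util.List;
import java.util.OptionalLong;

/** Sketch only: entry names are assumed to be bare superstep numbers. */
final class LatestCheckpoint {
  private LatestCheckpoint() { }

  /**
   * Returns the highest superstep among the listed names, ignoring
   * entries that are not plain numbers. The listing order is irrelevant.
   */
  static OptionalLong latestSuperstep(List<String> names) {
    return names.stream()
        .filter(name -> name.matches("\\d{1,18}")) // fits in a long
        .mapToLong(Long::parseLong)
        .max();
  }
}
```

A listing returned in the order "9", "10", "2" would give 2 under the "take the last entry" rule, while the numeric maximum is 10. An empty result signals that there is no checkpoint to restart from.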
[jira] [Updated] (GIRAPH-933) Checkpointing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-933: - Attachment: GIRAPH-933.3.patch Checkpointing improvements -- Key: GIRAPH-933 URL: https://issues.apache.org/jira/browse/GIRAPH-933 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-933.2.patch, GIRAPH-933.3.patch, GIRAPH-933.patch Original Estimate: 48h Remaining Estimate: 48h We need to address some issues with checkpointing: 1) worker2worker messages are not saved 2) BspServiceWorker does not compile under hadoop_0.23 profile 3) it would be nice to be able to manually checkpoint and stop any job at any point of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-933) Checkpointing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-933: - Attachment: GIRAPH-933.4.patch Checkpointing improvements -- Key: GIRAPH-933 URL: https://issues.apache.org/jira/browse/GIRAPH-933 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-933.2.patch, GIRAPH-933.3.patch, GIRAPH-933.4.patch, GIRAPH-933.patch Original Estimate: 48h Remaining Estimate: 48h We need to address some issues with checkpointing: 1) worker2worker messages are not saved 2) BspServiceWorker does not compile under hadoop_0.23 profile 3) it would be nice to be able to manually checkpoint and stop any job at any point of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (GIRAPH-940) Cleanup the list of supported hadoop versions.
Sergey Edunov created GIRAPH-940: Summary: Cleanup the list of supported hadoop versions. Key: GIRAPH-940 URL: https://issues.apache.org/jira/browse/GIRAPH-940 Project: Giraph Issue Type: Task Reporter: Sergey Edunov We now support 14 Hadoop versions: hadoop_0.20.203, hadoop_0.23, hadoop_1, hadoop_2, hadoop_2.0.0, hadoop_2.0.1, hadoop_2.0.2, hadoop_2.0.3, hadoop_cdh4.1.2, hadoop_facebook, hadoop_non_secure, hadoop_snapshot, hadoop_trunk, hadoop_yarn. Some of them have known issues, like https://issues.apache.org/jira/browse/MAPREDUCE-118 in hadoop_0.20.203, which in particular blocks https://issues.apache.org/jira/browse/GIRAPH-933 Some of them don't even compile (hadoop_2.0.3, hadoop_2.0.2, hadoop_2.0.1, hadoop_2.0.0, hadoop_0.23, hadoop_non_secure). I have no idea how many of our 'supported profiles' actually work. I think we should review the list of supported Hadoop versions and clean up the list of profiles to make our lives easier. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-933) Checkpointing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-933: - Attachment: GIRAPH-933.2.patch Checkpointing improvements -- Key: GIRAPH-933 URL: https://issues.apache.org/jira/browse/GIRAPH-933 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-933.2.patch, GIRAPH-933.patch Original Estimate: 48h Remaining Estimate: 48h We need to address some issues with checkpointing: 1) worker2worker messages are not saved 2) BspServiceWorker does not compile under hadoop_0.23 profile 3) it would be nice to be able to manually checkpoint and stop any job at any point of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (GIRAPH-931) Provide a Strongly Connected Components algorithm
[ https://issues.apache.org/jira/browse/GIRAPH-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088068#comment-14088068 ] Sergey Edunov commented on GIRAPH-931: -- Hi Gianluca, Thank you for working on this! Can you please submit a Review Board request (https://reviews.apache.org) or an Arcanist review request (https://reviews.facebook.net/) next time? Here is a list of things I noticed in the diff:
1) Make sure you're consistent in using primitive types vs. objects. E.g. in SccVertexValue, value is a Long while it should really be a long, as you don't expect it to be null in write().
2) Similarly, you don't have to use List<Long> if you intend to store only primitive longs. You can use primitive collections from fastutil, such as it.unimi.dsi.fastutil.longs.LongArrayList; these have a smaller memory footprint and hence can help your code scale.
3) In SccComputation you can move phase extraction from compute() to preSuperstep(): the phase does not change during the superstep, so there is no need to extract it for each vertex.
4) You can avoid creating a new LongWritable() every time you send a message by keeping a LongWritable field in SccComputation and reusing it.
5) In clearParents(), the call to parents.clear() is not necessary, as you discard parents right after this call anyway.
These are mostly performance improvements; the algorithm implementation is clean and looks good to me. Again, thank you for this great work! Provide a Strongly Connected Components algorithm - Key: GIRAPH-931 URL: https://issues.apache.org/jira/browse/GIRAPH-931 Project: Giraph Issue Type: Improvement Components: examples Reporter: Gianluca Righetto Priority: Minor Attachments: GIRAPH-931.patch Provide an implementation of an algorithm for finding strongly connected components in a graph to augment the giraph-examples library. This has been initially proposed on GSoC'14.
A handful of graph algorithms have been researched in this paper: Optimizing Graph Algorithms on Pregel-like Systems (Salihoglu, S., Widom, J., 2014), and a detailed explanation of SCC can also be found in it. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082630#comment-14082630 ] Sergey Edunov commented on GIRAPH-927: -- Awesome! Thank you, Craig! I'll create a new JIRA and submit this patch for CR. Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch, async.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU nor memory or network bound. Instead they waste a lot of time waiting for lock in MessageStore. That happens in netty threads. We should be able to put messages into queue and then process them in other set of threads. It has to be configurable because adding another thread level will introduce additional overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
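The idea in the GIRAPH-927 description, putting messages into a queue and processing them on a separate set of threads, can be sketched with plain java.util.concurrent classes. This is not the AsyncMessageStoreWrapper implementation; the class QueuedMessageStore, its methods, and the use of a running sum as the "store" are assumptions made to keep the example self-contained.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Sketch only: I/O threads enqueue, worker threads do the processing. */
final class QueuedMessageStore {
  /** Sentinel telling one worker to stop; reserved, not a valid message. */
  private static final long STOP = Long.MIN_VALUE;

  private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
  private final AtomicLong total = new AtomicLong();
  private final Thread[] workers;

  QueuedMessageStore(int threads) {
    workers = new Thread[threads];
    for (int i = 0; i < threads; i++) {
      workers[i] = new Thread(this::drain, "message-worker-" + i);
      workers[i].setDaemon(true); // never keep the JVM alive on their own
      workers[i].start();
    }
  }

  /** Called from the network thread: only enqueues, never blocks on a lock. */
  void offer(long message) {
    queue.add(message);
  }

  /** Worker loop: takes messages and applies them to the shared state. */
  private void drain() {
    try {
      while (true) {
        long message = queue.take();
        if (message == STOP) {
          return;
        }
        total.addAndGet(message);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Stops all workers after the queue is drained and returns the result. */
  long closeAndGetTotal() {
    for (int i = 0; i < workers.length; i++) {
      queue.add(STOP); // one sentinel per worker
    }
    try {
      for (Thread worker : workers) {
        worker.join();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while closing", e);
    }
    return total.get();
  }
}
```

The extra hand-off through the queue is the additional overhead the description mentions, which is why the number of worker threads (here a constructor argument) would need to be configurable.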
[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082642#comment-14082642 ] Sergey Edunov commented on GIRAPH-927: -- https://issues.apache.org/jira/browse/GIRAPH-936 Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch, async.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU nor memory or network bound. Instead they waste a lot of time waiting for lock in MessageStore. That happens in netty threads. We should be able to put messages into queue and then process them in other set of threads. It has to be configurable because adding another thread level will introduce additional overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-936) AsyncMessageStoreWrapper threads are not daemonized
[ https://issues.apache.org/jira/browse/GIRAPH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-936: - Attachment: GIRAPH-936.patch AsyncMessageStoreWrapper threads are not daemonized --- Key: GIRAPH-936 URL: https://issues.apache.org/jira/browse/GIRAPH-936 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Attachments: GIRAPH-936.patch Original Estimate: 2h Remaining Estimate: 2h Issue related to https://issues.apache.org/jira/browse/GIRAPH-927 AsyncMessageStoreWrapper starts a set of threads without making them daemons. Hence mappers are unable to complete computations. -- This message was sent by Atlassian JIRA (v6.2#6252)
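The effect described in GIRAPH-936 comes from a general JVM rule: the process does not exit while any non-daemon thread is still alive, and the threads of a cached pool linger after their last task. A minimal sketch of a daemon thread factory using only the JDK follows; the original code uses Guava's ThreadFactoryBuilder, and the class DaemonThreads here is a hypothetical stand-in for illustration.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch only: a JDK-only factory that marks pool threads as daemons. */
final class DaemonThreads {
  private DaemonThreads() { }

  /** Creates threads named prefix-0, prefix-1, ... with the daemon flag set. */
  static ThreadFactory daemonFactory(String prefix) {
    AtomicInteger counter = new AtomicInteger();
    return runnable -> {
      Thread thread =
          new Thread(runnable, prefix + "-" + counter.getAndIncrement());
      // Daemon threads do not prevent the JVM from exiting.
      thread.setDaemon(true);
      return thread;
    };
  }

  /** Cached pool whose idle workers cannot hold the process open. */
  static ExecutorService newDaemonCachedPool(String prefix) {
    return Executors.newCachedThreadPool(daemonFactory(prefix));
  }
}
```

A pool built this way still runs submitted tasks normally; the only difference is that the JVM may exit while its threads are idle or running, so work that must finish before shutdown needs an explicit wait.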
[jira] [Created] (GIRAPH-933) Checkpointing improvements
Sergey Edunov created GIRAPH-933: Summary: Checkpointing improvements Key: GIRAPH-933 URL: https://issues.apache.org/jira/browse/GIRAPH-933 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov We need to address some issues with checkpointing: 1) worker2worker messages are not saved 2) BspServiceWorker does not compile under hadoop_0.23 profile 3) it would be nice to be able to manually checkpoint and stop any job at any point of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-933) Checkpointing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-933: - Attachment: GIRAPH-933.patch Checkpointing improvements -- Key: GIRAPH-933 URL: https://issues.apache.org/jira/browse/GIRAPH-933 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-933.patch Original Estimate: 48h Remaining Estimate: 48h We need to address some issues with checkpointing: 1) worker2worker messages are not saved 2) BspServiceWorker does not compile under hadoop_0.23 profile 3) it would be nice to be able to manually checkpoint and stop any job at any point of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (GIRAPH-933) Checkpointing improvements
[ https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076430#comment-14076430 ] Sergey Edunov commented on GIRAPH-933: -- Review request: https://reviews.apache.org/r/23989 Checkpointing improvements -- Key: GIRAPH-933 URL: https://issues.apache.org/jira/browse/GIRAPH-933 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-933.patch Original Estimate: 48h Remaining Estimate: 48h We need to address some issues with checkpointing: 1) worker2worker messages are not saved 2) BspServiceWorker does not compile under hadoop_0.23 profile 3) it would be nice to be able to manually checkpoint and stop any job at any point of time. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (GIRAPH-930) Trailing space in ZooKeeper serverList file trips some file systems
[ https://issues.apache.org/jira/browse/GIRAPH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076462#comment-14076462 ] Sergey Edunov commented on GIRAPH-930: -- Hi Mostafa, Thank you for working on this. Your implementation looks good to me, but I was thinking about maybe cleaning up some of the code above instead. What do you think? Attaching another patch (haven't tested it yet, will do in the afternoon). Trailing space in ZooKeeper serverList file trips some file systems --- Key: GIRAPH-930 URL: https://issues.apache.org/jira/browse/GIRAPH-930 Project: Giraph Issue Type: Bug Environment: HDInsight Reporter: Mostafa Elhemali Attachments: GIRAPH-930.diff, GIRAPH-930.patch In Azure HDInsight (Hadoop in Microsoft Azure), the default file system is the WASB file system which places data on Azure blob storage. This file system doesn't handle trailing spaces that well, so when the ZooKeeperManager class tries to create the serverList file with a trailing space in there things fall apart. Ideally WASB would handle trailing spaces, but in this case the trailing space is not really necessary. Can we remove it? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-930) Trailing space in ZooKeeper serverList file trips some file systems
[ https://issues.apache.org/jira/browse/GIRAPH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-930: - Attachment: GIRAPH-930.patch Trailing space in ZooKeeper serverList file trips some file systems --- Key: GIRAPH-930 URL: https://issues.apache.org/jira/browse/GIRAPH-930 Project: Giraph Issue Type: Bug Environment: HDInsight Reporter: Mostafa Elhemali Attachments: GIRAPH-930.diff, GIRAPH-930.patch In Azure HDInsight (Hadoop in Microsoft Azure), the default file system is the WASB file system which places data on Azure blob storage. This file system doesn't handle trailing spaces that well, so when the ZooKeeperManager class tries to create the serverList file with a trailing space in there things fall apart. Ideally WASB would handle trailing spaces, but in this case the trailing space is not really necessary. Can we remove it? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-927: - Attachment: async.patch Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch, async.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU nor memory or network bound. Instead they waste a lot of time waiting for lock in MessageStore. That happens in netty threads. We should be able to put messages into queue and then process them in other set of threads. It has to be configurable because adding another thread level will introduce additional overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076714#comment-14076714 ] Sergey Edunov commented on GIRAPH-927: -- Hi Craig, I can't reproduce your issue, and everything works within the test case. Can you please test this patch? I also attached it to the issue.
diff --git giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java
index a62834f..252ee39 100644
--- giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java
+++ giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java
@@ -60,7 +60,7 @@ public final class AsyncMessageStoreWrapper<I extends WritableComparable,
   /** Executor that processes messages in background */
   private static final ExecutorService EXECUTOR_SERVICE =
       Executors.newCachedThreadPool(
-          new ThreadFactoryBuilder()
+          new ThreadFactoryBuilder().setDaemon(true)
               .setNameFormat("AsyncMessageStoreWrapper-%d").build());

   /** Number of threads that will process messages in background */
Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch, async.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU nor memory or network bound. Instead they waste a lot of time waiting for lock in MessageStore. That happens in netty threads. We should be able to put messages into queue and then process them in other set of threads. It has to be configurable because adding another thread level will introduce additional overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070509#comment-14070509 ] Sergey Edunov commented on GIRAPH-927: -- Yep, indeed, I'll be working on a fix. Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU nor memory or network bound. Instead they waste a lot of time waiting for lock in MessageStore. That happens in netty threads. We should be able to put messages into queue and then process them in other set of threads. It has to be configurable because adding another thread level will introduce additional overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-927: - Attachment: (was: GIRAPH-927.patch) Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU nor memory or network bound. Instead they waste a lot of time waiting for lock in MessageStore. That happens in netty threads. We should be able to put messages into queue and then process them in other set of threads. It has to be configurable because adding another thread level will introduce additional overhead. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-927: - Attachment: GIRAPH-927.patch Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU, memory, nor network bound. Instead they waste a lot of time waiting for a lock in MessageStore. That happens in netty threads. We should be able to put messages into a queue and then process them in another set of threads. It has to be configurable because adding another thread level will introduce additional overhead.
[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-927: - Attachment: (was: GIRAPH-927.patch) Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU, memory, nor network bound. Instead they waste a lot of time waiting for a lock in MessageStore. That happens in netty threads. We should be able to put messages into a queue and then process them in another set of threads. It has to be configurable because adding another thread level will introduce additional overhead.
[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-927: - Attachment: GIRAPH-927.patch Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU, memory, nor network bound. Instead they waste a lot of time waiting for a lock in MessageStore. That happens in netty threads. We should be able to put messages into a queue and then process them in another set of threads. It has to be configurable because adding another thread level will introduce additional overhead.
[jira] [Commented] (GIRAPH-905) Giraph Debugger
[ https://issues.apache.org/jira/browse/GIRAPH-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064041#comment-14064041 ] Sergey Edunov commented on GIRAPH-905: -- Hi Jaeho, thank you for working on this; it is an awesome addition to Giraph! Could you please run checkstyle, fix the errors, and then submit a review request to Review Board (https://reviews.apache.org/r/)? Giraph Debugger --- Key: GIRAPH-905 URL: https://issues.apache.org/jira/browse/GIRAPH-905 Project: Giraph Issue Type: New Feature Reporter: Jaeho Shin Attachments: GIRAPH-905.patch Four of us at Stanford (Vikesh Khanna, Semih Salihoglu, Jaeho Shin, and Brian Ba Quan Truong) developed a debugger for Giraph, named Graft, and we hope to integrate our code into Giraph trunk. It is able to launch Giraph jobs in debugging mode to capture traces of certain vertices and MasterCompute at particular supersteps, requiring almost no code change by the user. From the captured traces, it can generate JUnit tests that replicate the contexts under which the compute() function was running, so the user can reproduce bugs. You can read more about it at our GitHub repository: https://github.com/semihsalihoglu/graft
[jira] [Updated] (GIRAPH-924) Fix checkpointing
[ https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-924: - Attachment: GIRAPH_checkpoint_v3.patch Fix checkpointing - Key: GIRAPH-924 URL: https://issues.apache.org/jira/browse/GIRAPH-924 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-924.patch, GIRAPH_checkpoint_v3.patch Original Estimate: 336h Remaining Estimate: 336h We need to make checkpointing in Giraph functional again - it misses a lot of data because of the many additions we've been making to Giraph (like information from WorkerContext/MasterCompute, proper integration with per-superstep output, etc.).
[jira] [Created] (GIRAPH-927) Decouple netty server threads from message processing
Sergey Edunov created GIRAPH-927: Summary: Decouple netty server threads from message processing Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Our profiling shows that a lot of apps are neither CPU, memory, nor network bound. Instead they waste a lot of time waiting for a lock in MessageStore. That happens in netty threads. We should be able to put messages into a queue and then process them in another set of threads. It has to be configurable because adding another thread level will introduce additional overhead.
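The proposal in this issue can be pictured with a small producer/consumer sketch: the I/O thread only enqueues a message and returns, while a separate, configurable set of threads takes messages off the queue and does the lock-protected work. This is an illustration of the idea only, not the design of the attached patch; QueuedMessageSink, its methods, and the use of String messages are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical sketch, not Giraph code: hands messages from I/O threads to worker threads. */
final class QueuedMessageSink {
  /** Sentinel that tells one worker to exit; compared by identity, never by equals(). */
  private static final String POISON = new String("poison");

  private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
  /** Stands in for the lock-protected MessageStore. */
  private final List<String> store = new ArrayList<>();
  private final List<Thread> workers = new ArrayList<>();

  /** workerThreads == 0 turns queueing off, so the extra thread level stays optional. */
  QueuedMessageSink(int workerThreads) {
    for (int i = 0; i < workerThreads; i++) {
      Thread t = new Thread(this::drain, "message-worker-" + i);
      t.setDaemon(true);
      t.start();
      workers.add(t);
    }
  }

  /** Called by the I/O thread; with workers configured it only enqueues. */
  void offer(String message) {
    if (workers.isEmpty()) {
      apply(message); // queueing disabled: process on the caller's thread
    } else {
      queue.add(message);
    }
  }

  private void apply(String message) {
    synchronized (store) {
      store.add(message);
    }
  }

  /** Worker loop: applies messages until it takes a poison pill. */
  private void drain() {
    try {
      for (String m = queue.take(); m != POISON; m = queue.take()) {
        apply(m);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Stops the workers after the queued messages are applied and returns a snapshot of the store. */
  List<String> close() {
    for (int i = 0; i < workers.size(); i++) {
      queue.add(POISON); // one pill per worker, queued behind all real messages
    }
    for (Thread t : workers) {
      try {
        t.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }
    synchronized (store) {
      return new ArrayList<>(store);
    }
  }
}
```

Passing 0 as the thread count keeps the old behaviour of processing on the caller's thread, which reflects the issue's point that the extra thread level adds overhead and should be optional.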
[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing
[ https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-927: - Attachment: GIRAPH-927.patch Decouple netty server threads from message processing - Key: GIRAPH-927 URL: https://issues.apache.org/jira/browse/GIRAPH-927 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-927.patch Original Estimate: 168h Remaining Estimate: 168h Our profiling shows that a lot of apps are neither CPU, memory, nor network bound. Instead they waste a lot of time waiting for a lock in MessageStore. That happens in netty threads. We should be able to put messages into a queue and then process them in another set of threads. It has to be configurable because adding another thread level will introduce additional overhead.
[jira] [Created] (GIRAPH-925) Unit tests should pass even if zookeeper port not available
Sergey Edunov created GIRAPH-925: Summary: Unit tests should pass even if zookeeper port not available Key: GIRAPH-925 URL: https://issues.apache.org/jira/browse/GIRAPH-925 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Priority: Minor Currently, if something is using port 22182, unit tests will fail. Or even worse, they time out and then fail. Unit tests should not depend on the availability of this port.
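One common way to stop tests from depending on a fixed port is to ask the operating system for any free port and pass that to the server under test. The sketch below shows that technique with the JDK only; whether the attached patch takes this route is not stated in the issue, and the FreePort class is a made-up name for illustration.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical test helper, not Giraph code. */
final class FreePort {
  private FreePort() { }

  /** Binds to port 0 so the OS picks a free ephemeral port, then releases it. */
  static int find() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new IllegalStateException("no free port available", e);
    }
  }

  public static void main(String[] args) {
    System.out.println("free port: " + find());
  }
}
```

There is a small window between releasing the port and the server binding to it in which another process could grab it, so a test that uses this helper may still want to retry on a bind failure.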
[jira] [Updated] (GIRAPH-925) Unit tests should pass even if zookeeper port not available
[ https://issues.apache.org/jira/browse/GIRAPH-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-925: - Attachment: GIRAPH-925.patch Unit tests should pass even if zookeeper port not available --- Key: GIRAPH-925 URL: https://issues.apache.org/jira/browse/GIRAPH-925 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Priority: Minor Attachments: GIRAPH-925.patch Original Estimate: 2h Remaining Estimate: 2h Currently, if something is using port 22182, unit tests will fail. Or even worse, they time out and then fail. Unit tests should not depend on the availability of this port.
[jira] [Commented] (GIRAPH-925) Unit tests should pass even if zookeeper port not available
[ https://issues.apache.org/jira/browse/GIRAPH-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050818#comment-14050818 ] Sergey Edunov commented on GIRAPH-925: -- https://reviews.apache.org/r/23251 Unit tests should pass even if zookeeper port not available --- Key: GIRAPH-925 URL: https://issues.apache.org/jira/browse/GIRAPH-925 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Priority: Minor Attachments: GIRAPH-925.patch Original Estimate: 2h Remaining Estimate: 2h Currently, if something is using port 22182, unit tests will fail. Or even worse, they time out and then fail. Unit tests should not depend on the availability of this port.
[jira] [Updated] (GIRAPH-924) Fix checkpointing
[ https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-924: - Attachment: GIRAPH-924.patch Fix checkpointing - Key: GIRAPH-924 URL: https://issues.apache.org/jira/browse/GIRAPH-924 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-924.patch Original Estimate: 336h Remaining Estimate: 336h We need to make checkpointing in Giraph functional again - it misses a lot of data because of the many additions we've been making to Giraph (like information from WorkerContext/MasterCompute, proper integration with per-superstep output, etc.).
[jira] [Created] (GIRAPH-924) Fix checkpointing
Sergey Edunov created GIRAPH-924: Summary: Fix checkpointing Key: GIRAPH-924 URL: https://issues.apache.org/jira/browse/GIRAPH-924 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-924.patch We need to make checkpointing in Giraph functional again - it misses a lot of data because of the many additions we've been making to Giraph (like information from WorkerContext/MasterCompute, proper integration with per-superstep output, etc.).
[jira] [Updated] (GIRAPH-924) Fix checkpointing
[ https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-924: - Attachment: GIRAPH-924.patch Fix checkpointing - Key: GIRAPH-924 URL: https://issues.apache.org/jira/browse/GIRAPH-924 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-924.patch Original Estimate: 336h Remaining Estimate: 336h We need to make checkpointing in Giraph functional again - it misses a lot of data because of the many additions we've been making to Giraph (like information from WorkerContext/MasterCompute, proper integration with per-superstep output, etc.).
[jira] [Commented] (GIRAPH-924) Fix checkpointing
[ https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046379#comment-14046379 ] Sergey Edunov commented on GIRAPH-924: -- https://reviews.apache.org/r/23140/ Fix checkpointing - Key: GIRAPH-924 URL: https://issues.apache.org/jira/browse/GIRAPH-924 Project: Giraph Issue Type: Improvement Reporter: Sergey Edunov Attachments: GIRAPH-924.patch Original Estimate: 336h Remaining Estimate: 336h We need to make checkpointing in Giraph functional again - it misses a lot of data because of the many additions we've been making to Giraph (like information from WorkerContext/MasterCompute, proper integration with per-superstep output, etc.).
[jira] [Updated] (GIRAPH-903) Detect crashes of Netty threads
[ https://issues.apache.org/jira/browse/GIRAPH-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sergey Edunov updated GIRAPH-903: - Attachment: GIRAPH-903.patch Detect crashes of Netty threads --- Key: GIRAPH-903 URL: https://issues.apache.org/jira/browse/GIRAPH-903 Project: Giraph Issue Type: Bug Reporter: Sergey Edunov Priority: Minor Attachments: GIRAPH-903.patch, GIRAPH-903.patch Original Estimate: 24h Remaining Estimate: 24h When one of the request processing threads fails, the worker gets stuck but the job doesn't fail, and it has to be killed manually. We should detect netty thread crashes and fail the job automatically. You can easily reproduce this by adding a mistake to the deserialization of messages, for example.
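For plain Java threads, a crash like the one described here can be detected with Thread.UncaughtExceptionHandler: the handler records the first failure, and the code that would otherwise wait forever checks it and fails fast. The sketch below shows only that general mechanism. How the attached patch hooks into Netty is not described in the issue, and CrashDetector with its watch() and checkHealthy() methods is a made-up name for the example.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch, not Giraph code: remembers the first uncaught failure of a watched thread. */
final class CrashDetector implements Thread.UncaughtExceptionHandler {
  private final AtomicReference<Throwable> failure = new AtomicReference<>();

  @Override
  public void uncaughtException(Thread t, Throwable e) {
    failure.compareAndSet(null, e); // keep only the first failure
  }

  /** Creates (but does not start) a thread whose crash will be recorded. */
  Thread watch(Runnable task, String name) {
    Thread t = new Thread(task, name);
    t.setUncaughtExceptionHandler(this);
    return t;
  }

  /** Throws if a watched thread has died; a caller would fail the job at this point. */
  void checkHealthy() {
    Throwable e = failure.get();
    if (e != null) {
      throw new IllegalStateException("request processing thread crashed", e);
    }
  }
}
```

A waiting loop would call checkHealthy() between waits instead of blocking indefinitely, so a deserialization bug in one thread surfaces as a job failure rather than a hang.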
[jira] [Commented] (GIRAPH-842) option to dump histogram of memory usage when heap is low on memory
[ https://issues.apache.org/jira/browse/GIRAPH-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026848#comment-14026848 ] Sergey Edunov commented on GIRAPH-842: -- +1 option to dump histogram of memory usage when heap is low on memory --- Key: GIRAPH-842 URL: https://issues.apache.org/jira/browse/GIRAPH-842 Project: Giraph Issue Type: Bug Reporter: Pavan Kumar Assignee: Pavan Kumar Priority: Minor Attachments: GIRAPH-842.patch, GIRAPH-842_1.patch, master-stderr, worker-stderr Currently we are left blind for jobs that OOM. It would be helpful if we could do a jmap -histo dump when the heap has very little free space left.
[jira] [Commented] (GIRAPH-842) option to dump histogram of memory usage when heap is low on memory
[ https://issues.apache.org/jira/browse/GIRAPH-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025989#comment-14025989 ] Sergey Edunov commented on GIRAPH-842: -- Hey Pavan, this is a great feature. Here are some comments on the implementation:
1) Please give your thread a meaningful name; it can be helpful when debugging stack traces.
2) The stop flag has to be volatile, otherwise your thread may never see the change.
3) I would let the thread die if you get an InterruptedException.
4) runtime.freeMemory() might give you false alarms if you run the job with different values of Xmx and Xms. This is because freeMemory only counts free bytes within the heap currently allocated to the JVM. If your Xmx setting is bigger than Xms, you can go low on freeMemory before the JVM allocates another chunk. To get more accurate results you can use (maxMemory - totalMemory + freeMemory).
option to dump histogram of memory usage when heap is low on memory --- Key: GIRAPH-842 URL: https://issues.apache.org/jira/browse/GIRAPH-842 Project: Giraph Issue Type: Bug Reporter: Pavan Kumar Assignee: Pavan Kumar Priority: Minor Attachments: GIRAPH-842.patch, master-stderr, worker-stderr Currently we are left blind for jobs that OOM. It would be helpful if we could do a jmap -histo dump when the heap has very little free space left.
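The four review comments above can be put together in one short sketch: a named thread, a volatile stop flag, exit on InterruptedException, and available memory computed as maxMemory - totalMemory + freeMemory. This is not the code from the attached patch; LowMemoryWatcher, its constructor parameters, and the callback are assumptions made for the example, and the callback stands in for whatever would trigger the jmap -histo dump.

```java
/** Hypothetical sketch, not Giraph code: polls the heap and fires a callback when it runs low. */
final class LowMemoryWatcher implements Runnable {
  /** Volatile so the watcher thread sees stop() called from another thread (comment 2). */
  private volatile boolean stop;
  private final long thresholdBytes;
  private final long intervalMillis;
  private final Runnable onLowMemory;

  LowMemoryWatcher(long thresholdBytes, long intervalMillis, Runnable onLowMemory) {
    this.thresholdBytes = thresholdBytes;
    this.intervalMillis = intervalMillis;
    this.onLowMemory = onLowMemory;
  }

  /** Bytes the heap can still hand out, counting room it has not yet claimed from the OS (comment 4). */
  static long availableBytes() {
    Runtime rt = Runtime.getRuntime();
    return rt.maxMemory() - rt.totalMemory() + rt.freeMemory();
  }

  @Override
  public void run() {
    while (!stop) {
      if (availableBytes() < thresholdBytes) {
        onLowMemory.run();
      }
      try {
        Thread.sleep(intervalMillis);
      } catch (InterruptedException e) {
        return; // let the thread die when interrupted (comment 3)
      }
    }
  }

  /** Starts the watcher on a named daemon thread (comment 1). */
  Thread start() {
    Thread t = new Thread(this, "low-memory-watcher");
    t.setDaemon(true);
    t.start();
    return t;
  }

  void stop() {
    stop = true;
  }
}
```

With -Xms smaller than -Xmx, freeMemory() alone can be close to zero while the JVM is still allowed to grow the heap, which is why availableBytes() adds the not-yet-claimed part (maxMemory - totalMemory).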