[jira] [Commented] (GIRAPH-1043) Implementation of Darwini graph generator

2017-02-02 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15850915#comment-15850915
 ] 

Sergey Edunov commented on GIRAPH-1043:
---

The Phabricator link no longer works. 
Here is the updated link: https://github.com/apache/giraph/pull/19

> Implementation of Darwini graph generator
> -
>
> Key: GIRAPH-1043
> URL: https://issues.apache.org/jira/browse/GIRAPH-1043
> Project: Giraph
>  Issue Type: Task
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
>
> Implementation of a graph generator that captures many properties of 
> social graphs, such as a high local clustering coefficient, non-power-law 
> degree distributions, and a log-normal joint degree distribution. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (GIRAPH-1126) Broken Link on Introduction Page for User Docs

2016-12-16 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15755930#comment-15755930
 ] 

Sergey Edunov commented on GIRAPH-1126:
---

I believe the version from release-1.0 no longer even compiles, simply 
because the Vertex class doesn't have a compute() function anymore. 

So we should either take this link: 
https://github.com/apache/giraph/blob/trunk/giraph-examples/src/main/java/org/apache/giraph/examples/SimpleShortestPathsComputation.java
  
and update the intro accordingly, 

or, much better, rewrite the whole thing using the new API and update the intro as 
well.  

> Broken Link on Introduction Page for User Docs
> --
>
> Key: GIRAPH-1126
> URL: https://issues.apache.org/jira/browse/GIRAPH-1126
> Project: Giraph
>  Issue Type: Task
>  Components: site
> Environment: Chrome Browser v54.0.2840.98 on macOS Sierra v10.12.1
>Reporter: Michael Aro
>Assignee: Michael Aro
>Priority: Trivial
>  Labels: documentation
>
> On the Introduction page of the User Docs there is a broken link. The "here" 
> link before the code snippet, which points to a Java file on the GitHub 
> site, is broken. 
> URL: http://giraph.apache.org/intro.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (GIRAPH-1123) Latest trunk does not compile (checkstyle fails)

2016-10-21 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1123:
--
Fix Version/s: (was: 1.2.0)
   1.3.0

> Latest trunk does not compile (checkstyle fails)
> 
>
> Key: GIRAPH-1123
> URL: https://issues.apache.org/jira/browse/GIRAPH-1123
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Alessio Arleo
>Assignee: Alessio Arleo
> Fix For: 1.3.0
>
> Attachments: GIRAPH-1123.patch
>
>
> Latest trunk does not compile due to checkstyle errors. 





[jira] [Updated] (GIRAPH-1120) Insecure repository configuration

2016-10-21 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1120:
--
Fix Version/s: (was: 1.2.0)
   1.3.0

> Insecure repository configuration 
> --
>
> Key: GIRAPH-1120
> URL: https://issues.apache.org/jira/browse/GIRAPH-1120
> Project: Giraph
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.3.0
>Reporter: Olaf Flebbe
> Fix For: 1.3.0
>
> Attachments: 
> 0001-GIRAPH-1120-Insecure-repository-configuration.patch, 
> 0001-GIRAPH-1120-Insecure-repository-configuration.patch
>
>
> Hi, the repository configuration of Giraph is dangerous, since it is 
> susceptible to MITM attacks.
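> {code}
> <repositories>
>   <repository>
>     <id>central</id>
>     <url>http://repo1.maven.org/maven2</url>
>     <releases>
>       <enabled>true</enabled>
>     </releases>
>   </repository>
> ...
> {code}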
> On closer inspection, no repository needs to be configured, since 
> everything from the default profile is in Maven Central. 
> If anything from a non-default profile is not found in Maven Central, its 
> repository should be moved to the respective profile. For instance, the CDH 
> artifact repository should be moved to the cdh hadoop_cdh4.1.2 profile.





[jira] [Updated] (GIRAPH-1106) Update "Quick Start" section on Giraph website

2016-10-21 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1106:
--
Fix Version/s: (was: 1.2.0)
   1.3.0

> Update "Quick Start" section on Giraph website
> --
>
> Key: GIRAPH-1106
> URL: https://issues.apache.org/jira/browse/GIRAPH-1106
> Project: Giraph
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 1.2.0
>Reporter: Jose Luis Larroque
> Fix For: 1.3.0
>
>
> The quick start guide (http://giraph.apache.org/quick_start.html) must be 
> updated. It is very confusing for a new user to see the Hadoop 
> "0.20.203.0-RC1" version listed for use with Giraph 1.2.0.





[jira] [Resolved] (GIRAPH-1095) Performance regression after GIRAPH-1068

2016-10-21 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov resolved GIRAPH-1095.
---
Resolution: Fixed

> Performance regression after GIRAPH-1068
> 
>
> Key: GIRAPH-1095
> URL: https://issues.apache.org/jira/browse/GIRAPH-1095
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
> Fix For: 1.2.0
>
>
> We noticed a significant performance regression caused by GIRAPH-1068 for jobs 
> that have a lot of supersteps. This is likely caused by some missing 
> zookeeper options. 





[jira] [Created] (GIRAPH-1124) Create documentation on how to make Giraph release

2016-10-19 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1124:
-

 Summary: Create documentation on how to make Giraph release
 Key: GIRAPH-1124
 URL: https://issues.apache.org/jira/browse/GIRAPH-1124
 Project: Giraph
  Issue Type: Wish
Reporter: Sergey Edunov
Assignee: Sergey Edunov
 Fix For: 1.3.0


Write documentation on how to make a Giraph release.





[jira] [Updated] (GIRAPH-1044) Update book info in the User Docs / Related Literature page of the site

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1044:
--
Affects Version/s: (was: 1.3.0)
   1.2.0
Fix Version/s: (was: 1.3.0)
   1.2.0

> Update book info in the User Docs / Related Literature page of the site
> ---
>
> Key: GIRAPH-1044
> URL: https://issues.apache.org/jira/browse/GIRAPH-1044
> Project: Giraph
>  Issue Type: Improvement
>  Components: site
>Affects Versions: 1.2.0
>Reporter: Claudio Martella
>Assignee: Roman Shaposhnik
>  Labels: documentation
> Fix For: 1.2.0
>
> Attachments: 
> 0001-GIRAPH-1044.-Update-book-info-in-the-User-Docs-Relat.patch
>
>






[jira] [Updated] (GIRAPH-1009) Spammy 'lost reservation' messages from ZooKeeper in workers' log at the end of the computation.

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1009:
--
Affects Version/s: (was: 1.3.0)
   1.2.0
Fix Version/s: (was: 1.3.0)
   1.2.0

> Spammy 'lost reservation' messages from ZooKeeper in workers' log at the end 
> of the computation.
> 
>
> Key: GIRAPH-1009
> URL: https://issues.apache.org/jira/browse/GIRAPH-1009
> Project: Giraph
>  Issue Type: Bug
>  Components: bsp, zookeeper
>Affects Versions: 1.2.0
> Environment: All environment, while running with more than one worker.
>Reporter: Hassan Eslami
>Assignee: Hassan Eslami
>Priority: Minor
>  Labels: log, worker, zookeeper
> Fix For: 1.2.0
>
>   Original Estimate: 72h
>  Remaining Estimate: 72h
>
> When running Giraph with more than one worker, ZooKeeper usually produces a 
> bunch of 'lost reservation' info log messages for input splits at the end of 
> the computation in the workers' logs. This clutters the workers' log files, 
> especially when the job is running on a large graph with a fairly large number 
> of input splits.
> Here are examples of these log messages:
> ...
> {{INFO 2015-05-19 14:44:58,894 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1126 lost 
> reservation}}
> {{INFO 2015-05-19 14:44:58,894 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/2585 lost 
> reservation}}
> {{INFO 2015-05-19 14:44:58,895 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1166 lost 
> reservation}}
> {{INFO 2015-05-19 14:44:58,895 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1212 lost 
> reservation}}
> {{INFO 2015-05-19 14:44:58,895 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1666 lost 
> reservation}}
> {{INFO 2015-05-19 14:44:58,896 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/2282 lost 
> reservation}}
> {{INFO 2015-05-19 14:44:58,896 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/1302 lost 
> reservation}}
> {{INFO 2015-05-19 14:44:58,896 \[main-EventThread\] 
> org.apache.giraph.worker.InputSplitsHandler - process: Input split 
> /_hadoopBsp/job_201411061513.184523_0001/_edgeInputSplitDir/2364 lost 
> reservation}}
> ...





[jira] [Updated] (GIRAPH-990) Current trunk will build for hadoop 1.2.0 not 0.20.203 as stated by documentation

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-990:
-
Affects Version/s: (was: 1.3.0)
   1.2.0
Fix Version/s: (was: 1.3.0)
   1.2.0

> Current trunk will build for hadoop 1.2.0 not 0.20.203 as stated by 
> documentation
> -
>
> Key: GIRAPH-990
> URL: https://issues.apache.org/jira/browse/GIRAPH-990
> Project: Giraph
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 1.2.0
> Environment: hadoop 0.20.203
>Reporter: Fredrik Einarsson
>Assignee: Roman Shaposhnik
> Fix For: 1.2.0
>
> Attachments: 
> 0001-GIRAPH-990.-Current-trunk-will-build-for-hadoop-1.2..patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> I am new to Hadoop, Giraph, etc. and will use them as part of my master's thesis. 
> Therefore I went to your quick start guide 
> (http://giraph.apache.org/quick_start.html) and followed it.
> If one follows the guide point by point, the "mvn package -DskipTests" command 
> produces the giraph-examples jar 
> giraph-examples-1.2.0-SNAPSHOT-for-hadoop-1.2.1-jar-with-dependencies.jar, 
> not the version for 0.20.203, which is needed. Also, your GitHub 
> page states: 
> "- Apache Hadoop 0.20.203.0
>   This is the default version used by Giraph: if you do not specify a
> profile with the -P flag, maven will use this version. You may also
> explicitly specify it with "mvn -Phadoop_0.20.203 "."
> Either the documentation or the mvn settings are wrong. 





[jira] [Updated] (GIRAPH-1086) Use pool of byte arrays with InMemoryDataAccessor

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1086:
--
Fix Version/s: 1.3.0

> Use pool of byte arrays with InMemoryDataAccessor
> -
>
> Key: GIRAPH-1086
> URL: https://issues.apache.org/jira/browse/GIRAPH-1086
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Fix For: 1.3.0
>
>
> Have a pool of byte arrays with InMemoryDataAccessor, to save on byte array 
> creation and initialization.
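
A minimal sketch of the pooling idea described above, not the actual patch. 
The class name ByteArrayPool and its methods are hypothetical and do not come 
from Giraph. Reused arrays are deliberately not zeroed, since skipping 
initialization is part of the saving, so callers must track how many bytes 
are valid.

```java
import java.util.ArrayDeque;

/** Hypothetical pool of equally sized byte arrays (illustration only). */
final class ByteArrayPool {
  private final int arraySize;
  private final int maxPooled;
  private final ArrayDeque<byte[]> free = new ArrayDeque<>();

  ByteArrayPool(int arraySize, int maxPooled) {
    this.arraySize = arraySize;
    this.maxPooled = maxPooled;
  }

  /** Returns a pooled array if one is available, otherwise allocates. */
  synchronized byte[] acquire() {
    byte[] array = free.poll();
    return array != null ? array : new byte[arraySize];
  }

  /** Gives an array back; wrong-sized or surplus arrays are dropped. */
  synchronized void release(byte[] array) {
    if (array.length == arraySize && free.size() < maxPooled) {
      free.push(array);
    }
  }

  /** Number of arrays currently waiting for reuse. */
  synchronized int pooledCount() {
    return free.size();
  }
}
```

Capping the pool with maxPooled keeps a burst of releases from pinning memory 
indefinitely; the right cap depends on the workload.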





[jira] [Updated] (GIRAPH-1087) Retry requests after channel failure

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1087:
--
Fix Version/s: 1.3.0

> Retry requests after channel failure
> 
>
> Key: GIRAPH-1087
> URL: https://issues.apache.org/jira/browse/GIRAPH-1087
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Fix For: 1.3.0
>
>
> We currently don't have a callback to retry requests after channel failure, 
> so we either wait for the request timeout or don't retry the request at all in 
> places where we don't wait for open requests.
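
One possible shape for such a callback, as a sketch under stated assumptions: 
OpenRequestTracker, its method names, and the integer channel ids are all 
hypothetical and are not taken from Giraph's networking code. Open requests 
are remembered per channel, and a failure notification hands every request 
still open on that channel to a resend function.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/** Hypothetical tracker that resends open requests when a channel fails. */
final class OpenRequestTracker<R> {
  private final Map<Integer, List<R>> openByChannel = new HashMap<>();
  private final Consumer<R> resender;

  OpenRequestTracker(Consumer<R> resender) {
    this.resender = resender;
  }

  /** Remember a request that was written to the given channel. */
  synchronized void requestSent(int channelId, R request) {
    openByChannel.computeIfAbsent(channelId, k -> new ArrayList<>()).add(request);
  }

  /** Forget a request once the other side has acknowledged it. */
  synchronized void requestAcked(int channelId, R request) {
    List<R> open = openByChannel.get(channelId);
    if (open != null) {
      open.remove(request);
    }
  }

  /** Failure callback: resend everything still open on that channel. */
  void channelFailed(int channelId) {
    List<R> pending;
    synchronized (this) {
      pending = openByChannel.remove(channelId);
    }
    if (pending != null) {
      // Resend outside the lock so the resender may call back into the tracker.
      for (R request : pending) {
        resender.accept(request);
      }
    }
  }
}
```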





[jira] [Updated] (GIRAPH-1105) Fix number of open requests in FacebookConfiguration

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1105:
--
Fix Version/s: 1.2.0

> Fix number of open requests in FacebookConfiguration
> 
>
> Key: GIRAPH-1105
> URL: https://issues.apache.org/jira/browse/GIRAPH-1105
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Fix For: 1.2.0
>
>






[jira] [Updated] (GIRAPH-1107) Allow observers to access job counters

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1107:
--
Fix Version/s: 1.3.0

> Allow observers to access job counters
> --
>
> Key: GIRAPH-1107
> URL: https://issues.apache.org/jira/browse/GIRAPH-1107
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>Priority: Minor
> Fix For: 1.3.0
>
>
> From mapper/master/worker observer we might want to update some job counters 
> for stats. For that we should allow observers to access job context.





[jira] [Updated] (GIRAPH-1114) Expose StatusReporter from workers in blocks framework

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1114:
--
Fix Version/s: 1.3.0

> Expose StatusReporter from workers in blocks framework
> --
>
> Key: GIRAPH-1114
> URL: https://issues.apache.org/jira/browse/GIRAPH-1114
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>Priority: Minor
> Fix For: 1.3.0
>
>
> Sometimes we need to call progress or update status from workers. Expose this 
> functionality.





[jira] [Updated] (GIRAPH-1108) Allow measuring time spent doing GC in some interval

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1108:
--
Fix Version/s: 1.3.0

> Allow measuring time spent doing GC in some interval
> 
>
> Key: GIRAPH-1108
> URL: https://issues.apache.org/jira/browse/GIRAPH-1108
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>Priority: Minor
> Fix For: 1.3.0
>
>
> Sometimes when things are slow, we want to know whether it's because of GC or 
> not. Keep track of the last k GC pauses and provide a way to check how much 
> time since some timestamp was spent doing GC.
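
The bookkeeping described above could look roughly like the following sketch. 
GcPauseHistory and its methods are hypothetical names, and how the pauses are 
observed in the JVM is left out; the caller is assumed to supply a start time 
and a duration in milliseconds for each pause.

```java
/** Hypothetical ring buffer holding the last k GC pauses. */
final class GcPauseHistory {
  private final long[] startMillis;
  private final long[] durationMillis;
  private int next;   // slot that the next pause will overwrite
  private int count;  // number of valid slots, at most k

  GcPauseHistory(int k) {
    startMillis = new long[k];
    durationMillis = new long[k];
  }

  /** Records one pause, evicting the oldest when k are already stored. */
  synchronized void record(long startMs, long durationMs) {
    startMillis[next] = startMs;
    durationMillis[next] = durationMs;
    next = (next + 1) % startMillis.length;
    count = Math.min(count + 1, startMillis.length);
  }

  /** Milliseconds of recorded GC that fall after the given timestamp. */
  synchronized long gcMillisSince(long sinceMs) {
    long total = 0;
    for (int i = 0; i < count; i++) {
      long end = startMillis[i] + durationMillis[i];
      if (end > sinceMs) {
        // Count only the part of the pause that lies after sinceMs.
        total += end - Math.max(startMillis[i], sinceMs);
      }
    }
    return total;
  }
}
```

Because only k pauses are kept, the answer is a lower bound whenever the 
timestamp is older than the oldest retained pause.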





[jira] [Updated] (GIRAPH-1115) Move UncaughtExceptionHandler setup to GraphTaskManager

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1115:
--
Fix Version/s: 1.3.0

> Move UncaughtExceptionHandler setup to GraphTaskManager
> ---
>
> Key: GIRAPH-1115
> URL: https://issues.apache.org/jira/browse/GIRAPH-1115
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>Priority: Minor
> Fix For: 1.3.0
>
>






[jira] [Updated] (GIRAPH-1077) Jobs getting stuck after channel failure

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1077:
--
Fix Version/s: 1.2.0

> Jobs getting stuck after channel failure
> 
>
> Key: GIRAPH-1077
> URL: https://issues.apache.org/jira/browse/GIRAPH-1077
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Fix For: 1.2.0
>
>
> When a channel fails, we currently just log the failure. Since we don't wait 
> on open requests everywhere, the request check doesn't always get called, 
> and we've seen jobs stay stuck, for example during the 
> input stage when a worker's request to the master for a split to read fails. 
> When we know that a channel failed, we should try to resend the requests from 
> that channel.





[jira] [Updated] (GIRAPH-1082) Remove limit on the number of partitions

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1082:
--
Fix Version/s: 1.2.0

> Remove limit on the number of partitions
> 
>
> Key: GIRAPH-1082
> URL: https://issues.apache.org/jira/browse/GIRAPH-1082
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Fix For: 1.2.0
>
>
> Currently we have a limit on how many partitions we can have because we write 
> all partition information to Zookeeper. We can instead send this information 
> in requests and remove the hard limit.





[jira] [Updated] (GIRAPH-1081) Fix a bug in internal out-of-core infra: multithreaded accesses to buffers

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1081:
--
Fix Version/s: 1.2.0

> Fix a bug in internal out-of-core infra: multithreaded accesses to buffers
> --
>
> Key: GIRAPH-1081
> URL: https://issues.apache.org/jira/browse/GIRAPH-1081
> Project: Giraph
>  Issue Type: Bug
>Reporter: Hassan Eslami
>Assignee: Hassan Eslami
> Fix For: 1.2.0
>
>
> Multi-threaded access to raw data buffers in DiskBackedDataStore was 
> overlooked, violating the assumption that data is properly partitioned across 
> IO threads.





[jira] [Updated] (GIRAPH-1085) Add InMemoryDataAccessor

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1085:
--
Fix Version/s: 1.2.0

> Add InMemoryDataAccessor
> 
>
> Key: GIRAPH-1085
> URL: https://issues.apache.org/jira/browse/GIRAPH-1085
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
> Fix For: 1.2.0
>
>
> When we deal with graphs that have a lot of vertices with very little total 
> data associated with them (values + edges), we start experiencing memory 
> problems because too many objects are created, since every vertex has multiple 
> objects associated with it. To solve this problem, we should have a 
> serialized partition representation (the current ByteArrayPartition just keeps 
> a byte[] per vertex, not per partition). We can leverage the out-of-core 
> infrastructure and just add a data accessor that is backed not by disk but 
> by in-memory buffers.
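
To illustrate why one buffer per partition helps, here is a small sketch, 
assuming a toy vertex that is just a long id and a double value. 
SerializedPartitionSketch is a hypothetical name and is unrelated to the real 
InMemoryDataAccessor; the point is only that many vertices share one growing 
byte buffer instead of each owning several objects.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical partition that keeps all vertices in one byte buffer. */
final class SerializedPartitionSketch {
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  private final DataOutputStream out = new DataOutputStream(buffer);
  private int vertexCount;

  /** Appends one vertex as 8 bytes of id followed by 8 bytes of value. */
  void addVertex(long id, double value) {
    try {
      out.writeLong(id);
      out.writeDouble(value);
      vertexCount++;
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  int vertexCount() {
    return vertexCount;
  }

  int sizeInBytes() {
    return buffer.size();
  }

  /** Deserializes on the fly; no per-vertex objects are retained. */
  double sumOfValues() {
    // toByteArray() copies the buffer, which is acceptable for a sketch.
    DataInputStream in =
        new DataInputStream(new ByteArrayInputStream(buffer.toByteArray()));
    double sum = 0;
    try {
      for (int i = 0; i < vertexCount; i++) {
        in.readLong();          // skip the id
        sum += in.readDouble(); // accumulate the value
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return sum;
  }
}
```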





[jira] [Updated] (GIRAPH-1098) Job may get stuck if zookeeper port fixed and is in use

2016-10-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1098:
--
Fix Version/s: 1.2.0

> Job may get stuck if zookeeper port fixed and is in use
> ---
>
> Key: GIRAPH-1098
> URL: https://issues.apache.org/jira/browse/GIRAPH-1098
> Project: Giraph
>  Issue Type: Bug
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
> Fix For: 1.2.0
>
>
> We see jobs getting stuck indefinitely if zookeeper port is in use:
> INFO 2016-07-19 16:08:29,168 [main] 
> org.apache.zookeeper.server.NIOServerCnxnFactory  - binding to port 
> ::/0:0:0:0:0:0:0:0:22181
> ERROR   2016-07-19 16:08:29,168 [main] 
> org.apache.giraph.zk.InProcessZooKeeperRunner  - Unable to start zookeeper
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.runFromConfig(InProcessZooKeeperRunner.java:196)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.start(InProcessZooKeeperRunner.java:154)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner$QuorumRunner.start(InProcessZooKeeperRunner.java:97)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner.start(InProcessZooKeeperRunner.java:52)
>   at 
> org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServer(ZooKeeperManager.java:476)
>   at 
> org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:447)
>   at 
> org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:247)
>   at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:56)
>   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:627)
>   at org.apache.hadoop.mapred.MapTask.runImpl(MapTask.java:301)
>   at org.apache.hadoop.mapred.Task.run(Task.java:604)
>   at org.apache.hadoop.mapred.CoronaChild.main(CoronaChild.java:177)
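
One way to surface this condition early instead of hanging is to probe the 
port and fail fast. This is only a sketch: PortCheck and its methods are 
hypothetical, and it is not necessarily how the issue was fixed. A probe like 
this is inherently racy, because another process can take the port between 
the check and the real bind, so the real bind failure still has to be handled.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

/** Hypothetical helper that fails fast when a fixed port is taken. */
final class PortCheck {
  private PortCheck() { }

  /** Tries to bind the port on the wildcard address and releases it again. */
  static boolean isPortFree(int port) {
    try (ServerSocket socket = new ServerSocket()) {
      socket.setReuseAddress(false);
      socket.bind(new InetSocketAddress(port));
      return true;
    } catch (IOException e) {
      // Covers BindException ("Address already in use") among others.
      return false;
    }
  }

  /** Throws instead of letting the caller wait on a server that never starts. */
  static void requireFreePort(int port) {
    if (!isPortFree(port)) {
      throw new IllegalStateException("Port " + port + " is already in use");
    }
  }
}
```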





[jira] [Created] (GIRAPH-1122) Javadoc generation fails for Giraph 1.2.0

2016-10-12 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1122:
-

 Summary: Javadoc generation fails for Giraph 1.2.0
 Key: GIRAPH-1122
 URL: https://issues.apache.org/jira/browse/GIRAPH-1122
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Assignee: Sergey Edunov


Javadoc generation currently fails in Giraph 1.2.0. 
We need to fix it to be able to update the website after the release.





[jira] [Created] (GIRAPH-1118) Giraph-gora and Giraph-rexster test cases fail in release-1.2

2016-10-05 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1118:
-

 Summary: Giraph-gora and Giraph-rexster test cases fail in 
release-1.2
 Key: GIRAPH-1118
 URL: https://issues.apache.org/jira/browse/GIRAPH-1118
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov


When running
mvn clean install -Phadoop_1
mvn clean install -Phadoop_2

the gora and rexster test cases fail and block the release.





[jira] [Created] (GIRAPH-1117) Provide a flexible way to decide whether to create vertex when it is not present in the input

2016-09-26 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1117:
-

 Summary: Provide a flexible way to decide whether to create vertex 
when it is not present in the input
 Key: GIRAPH-1117
 URL: https://issues.apache.org/jira/browse/GIRAPH-1117
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Priority: Minor


Currently there is only one way to control whether source vertices that don't 
explicitly exist in the input should be created: giraph.createEdgeSourceVertices.
This enables or disables vertex creation at the job level. However, 
sometimes we need more fine-grained control, e.g. to create vertices for 
some edge inputs and skip creation for others. 
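
A sketch of what finer-grained control might look like, assuming long vertex 
ids and edges given as {source, target} pairs. SourceVertexCreation and the 
predicate parameter are hypothetical and are not an existing Giraph API; the 
idea is only that a per-vertex decision replaces the single job-level flag.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.LongPredicate;

/** Hypothetical per-vertex decision on creating missing edge sources. */
final class SourceVertexCreation {
  private SourceVertexCreation() { }

  /**
   * Returns the vertex ids present after loading: every input vertex, plus
   * each missing edge source that the predicate accepts.
   */
  static Set<Long> resolveVertices(Set<Long> inputVertices,
                                   List<long[]> edges,
                                   LongPredicate createMissingSource) {
    Set<Long> result = new TreeSet<>(inputVertices);
    for (long[] edge : edges) {
      long source = edge[0];
      if (!result.contains(source) && createMissingSource.test(source)) {
        result.add(source);
      }
    }
    return result;
  }
}
```

Passing `id -> true` or `id -> false` reproduces the two job-level settings, 
so a predicate of this kind would be a superset of the current behavior.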







[jira] [Updated] (GIRAPH-1113) Errors from vertex combiner don't have vertex id, which makes it very hard to debug.

2016-09-14 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1113:
--
Description: Vertex combiners only have access to vertex values, so if 
an exception is thrown there is no way to trace it back to 
a particular vertex. We need to handle these exceptions higher in the code and 
re-throw them with extra information.

> Errors from vertex combiner don't have vertex id, which makes it very hard to 
> debug.
> 
>
> Key: GIRAPH-1113
> URL: https://issues.apache.org/jira/browse/GIRAPH-1113
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
>Priority: Minor
>
> Vertex combiners only have access to vertex values, so if an exception is 
> thrown there is no way to trace it back to a particular 
> vertex. We need to handle these exceptions higher in the code and re-throw 
> them with extra information.
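
The re-throw described above can be sketched as a wrapper around the combine 
call at the point where the vertex id is still known. CombinerErrors and 
combineWithContext are hypothetical names, and a plain BinaryOperator stands 
in for Giraph's combiner interface.

```java
import java.util.function.BinaryOperator;

/** Hypothetical wrapper that adds the vertex id to combiner failures. */
final class CombinerErrors {
  private CombinerErrors() { }

  /** Runs the combiner and re-throws any failure with the vertex id attached. */
  static <I, V> V combineWithContext(I vertexId, V original, V other,
                                     BinaryOperator<V> combiner) {
    try {
      return combiner.apply(original, other);
    } catch (RuntimeException e) {
      // Keep the original exception as the cause so its stack trace survives.
      throw new IllegalStateException(
          "Vertex combiner failed for vertex id " + vertexId, e);
    }
  }
}
```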



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (GIRAPH-1113) Errors from vertex combiner don't have vertex id, which makes it very hard to debug.

2016-09-14 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov reassigned GIRAPH-1113:
-

Assignee: Sergey Edunov

> Errors from vertex combiner don't have vertex id, which makes it very hard to 
> debug.
> 
>
> Key: GIRAPH-1113
> URL: https://issues.apache.org/jira/browse/GIRAPH-1113
> Project: Giraph
>  Issue Type: Improvement
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
>Priority: Minor
>






[jira] [Created] (GIRAPH-1113) Errors from vertex combiner don't have vertex id, which makes it very hard to debug.

2016-09-14 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1113:
-

 Summary: Errors from vertex combiner don't have vertex id, which 
makes it very hard to debug.
 Key: GIRAPH-1113
 URL: https://issues.apache.org/jira/browse/GIRAPH-1113
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Priority: Minor








[jira] [Commented] (GIRAPH-1094) Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput

2016-09-14 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15490985#comment-15490985
 ] 

Sergey Edunov commented on GIRAPH-1094:
---

According to the HBase docs, Hadoop 1.x is only compatible with HBase 0.94:
https://hbase.apache.org/book.html#hadoop

And HBase 0.94 is not compatible with our version of Guava. 

It seems we need to update HBase, but that also means we need to remove 
HBase support from the hadoop_1 profile. 
I'm not sure how many hadoop_1 users we have and how many of them care about 
HBase. 
This issue has been blocking the release for quite a while now, so I'll do the 
upgrade and remove HBase support from the hadoop_1 profile if there are no 
explicit objections by the end of this week. 


> Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
> -
>
> Key: GIRAPH-1094
> URL: https://issues.apache.org/jira/browse/GIRAPH-1094
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergey Edunov
>Assignee: Roman Shaposhnik
>
> This test case seems to fail because of a missing dependency. 
> 16/07/13 22:31:50 ERROR master.MasterFileSystem: bootstrap
> org.apache.hadoop.hbase.DroppedSnapshotException: region: -ROOT-,,0
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1684)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1552)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1047)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:995)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:960)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:523)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:463)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:573)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoClassDefFoundError: 
> com/google/common/io/NullOutputStream
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:375)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1330)
>   at 
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:913)
>   at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:794)
>   at 
> org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2429)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1659)
> Full log can be found here:
> https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_1,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.io.hbase/TestHBaseRootMarkerVertextFormat/testHBaseInputOutput/





[jira] [Created] (GIRAPH-1111) FileOutputFormat#setOutputPath is not always available

2016-09-09 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1111:
-

 Summary: FileOutputFormat#setOutputPath is not always available
 Key: GIRAPH-1111
 URL: https://issues.apache.org/jira/browse/GIRAPH-1111
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Assignee: Sergey Edunov
Priority: Minor


We need to make Giraph work with hadoop distributions where 
FileOutputFormat#setOutputPath is not available. It is very easy to pull this 
implementation out as every hadoop distribution has exactly the same 
implementation. 





[jira] [Commented] (GIRAPH-1094) Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput

2016-08-25 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1094?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15437665#comment-15437665
 ] 

Sergey Edunov commented on GIRAPH-1094:
---

This is a known bug:

http://stackoverflow.com/questions/17625938/hbase-minidfscluster-java-fails-in-certain-environments
It is caused by an HDFS issue:
https://issues.apache.org/jira/browse/HDFS-2556

It doesn't really affect Giraph at runtime, so I think we shouldn't block the 
release. 

> Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
> -
>
> Key: GIRAPH-1094
> URL: https://issues.apache.org/jira/browse/GIRAPH-1094
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergey Edunov
>Assignee: Roman Shaposhnik
>
> This test case seems to fail because of a missing dependency. 
> 16/07/13 22:31:50 ERROR master.MasterFileSystem: bootstrap
> org.apache.hadoop.hbase.DroppedSnapshotException: region: -ROOT-,,0
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1684)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1552)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1047)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:995)
>   at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:960)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:523)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:463)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148)
>   at 
> org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133)
>   at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:573)
>   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432)
>   at java.lang.Thread.run(Thread.java:745)
> Caused by: java.lang.NoClassDefFoundError: 
> com/google/common/io/NullOutputStream
>   at 
> org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:375)
>   at 
> org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1330)
>   at 
> org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:913)
>   at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:794)
>   at 
> org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2429)
>   at 
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1659)
> Full log can be found here:
> https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_1,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.io.hbase/TestHBaseRootMarkerVertextFormat/testHBaseInputOutput/





[jira] [Created] (GIRAPH-1104) NegativeArraySize exception in BigDataOutput

2016-08-09 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1104:
-

 Summary: NegativeArraySize exception in BigDataOutput
 Key: GIRAPH-1104
 URL: https://issues.apache.org/jira/browse/GIRAPH-1104
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Assignee: Sergey Edunov


We're seeing this exception in some jobs. It is presumably related to 
high-degree vertices.

Caused by: java.lang.NegativeArraySizeException
    at org.apache.giraph.utils.UnsafeByteArrayOutputStream.ensureSize(UnsafeByteArrayOutputStream.java:117)
    at org.apache.giraph.utils.UnsafeByteArrayOutputStream.write(UnsafeByteArrayOutputStream.java:168)
    at org.apache.giraph.utils.io.BigDataOutput.write(BigDataOutput.java:183)
    at org.apache.giraph.edge.ByteArrayEdges.write(ByteArrayEdges.java:204)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.writeOutEdges(DiskBackedPartitionStore.java:353)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.offloadInMemoryPartitionData(DiskBackedPartitionStore.java:389)
    at org.apache.giraph.ooc.data.DiskBackedDataStore.offloadPartitionDataProxy(DiskBackedDataStore.java:294)
    at org.apache.giraph.ooc.data.DiskBackedPartitionStore.offloadPartitionData(DiskBackedPartitionStore.java:318)
    at org.apache.giraph.ooc.command.StorePartitionIOCommand.execute(StorePartitionIOCommand.java:55)
    at org.apache.giraph.ooc.OutOfCoreIOCallable.call(OutOfCoreIOCallable.java:99)
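
The report does not state the root cause. One common way a growing byte buffer ends up with a NegativeArraySizeException is 32-bit overflow when the new capacity is computed, so here is a minimal sketch of an overflow-safe growth policy under that assumption. The names `SafeGrowth`, `naiveCapacity` and `newCapacity` are hypothetical and are not taken from Giraph's UnsafeByteArrayOutputStream.

```java
final class SafeGrowth {
  /** Upper bound used in this sketch; some JVMs reject array lengths closer to Integer.MAX_VALUE. */
  static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

  /** Naive doubling: the result wraps to a negative int once the buffer reaches 1 GiB. */
  static int naiveCapacity(int current) {
    return current << 1;
  }

  /** Overflow-safe growth: computes in long, clamps the result, and fails loudly if it cannot fit. */
  static int newCapacity(int current, int additional) {
    long required = (long) current + additional;
    if (required > MAX_ARRAY_SIZE) {
      throw new IllegalStateException("Buffer would exceed max array size: " + required);
    }
    long doubled = Math.max((long) current * 2, required);
    return (int) Math.min(doubled, MAX_ARRAY_SIZE);
  }
}
```

If the overflow assumption holds, a single very large edge list could push the buffer past 1 GiB, at which point naive doubling yields a negative size. The safe variant either clamps the capacity or reports that the data cannot fit in one array.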





[jira] [Created] (GIRAPH-1100) Multiple mutation requests to one vertex result in failure

2016-07-27 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1100:
-

 Summary: Multiple mutation requests to one vertex result in failure
 Key: GIRAPH-1100
 URL: https://issues.apache.org/jira/browse/GIRAPH-1100
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Assignee: Sergey Edunov


The scenario is simple:
You send multiple addEdgeRequest() calls where the source vertex of the new 
edge does not exist (a typical scenario when adding reverse edges). If two of 
these requests happen to arrive in a single SendPartitionMutationsRequest, 
Giraph is unable to handle it:


FATAL   2016-07-26 17:59:09,563 [netty-server-worker-6] org.apache.giraph.graph.GraphTaskManager  - uncaughtException: OverrideExceptionHandler on thread netty-server-worker-6, msg = readFields: Already has vertex id 977939745592684, exiting...
java.lang.IllegalStateException: readFields: Already has vertex id 977939745592684
    at org.apache.giraph.comm.requests.SendPartitionMutationsRequest.readFieldsRequest(SendPartitionMutationsRequest.java:98)
    at org.apache.giraph.comm.requests.WritableRequest.readFields(WritableRequest.java:118)
    at org.apache.giraph.utils.RequestUtils.decodeWritableRequest(RequestUtils.java:52)
    at org.apache.giraph.comm.netty.handler.RequestDecoder.channelRead(RequestDecoder.java:89)
    at io.netty.channel.DefaultChannelHandlerContext.invokeChannelRead(DefaultChannelHandlerContext.java:338)
    at io.netty.channel.DefaultChannelHandlerContext.access$700(DefaultChannelHandlerContext.java:29)
    at io.netty.channel.DefaultChannelHandlerContext$8.run(DefaultChannelHandlerContext.java:329)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:354)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:353)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101)
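
One possible direction, sketched under assumptions: instead of rejecting a second mutation for a vertex id that is already present in the batch, merge it into the existing entry. `MutationBatch` and its methods are hypothetical, and strings stand in for edges; this is not the code in SendPartitionMutationsRequest.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MutationBatch {
  // Vertex id -> edges to add for that vertex; insertion order is kept for readability.
  private final Map<Long, List<String>> addedEdges = new LinkedHashMap<>();

  /** Records an add-edge mutation; a repeated vertex id appends to the entry instead of throwing. */
  void addEdge(long sourceVertexId, String edge) {
    addedEdges.computeIfAbsent(sourceVertexId, id -> new ArrayList<>()).add(edge);
  }

  /** Returns the edges recorded for the vertex, or an empty list if there are none. */
  List<String> edgesFor(long sourceVertexId) {
    return addedEdges.getOrDefault(sourceVertexId, new ArrayList<>());
  }

  /** Number of distinct vertices that have at least one recorded mutation. */
  int vertexCount() {
    return addedEdges.size();
  }
}
```

With this shape, two reverse-edge requests for the same missing source vertex end up as one entry holding both edges.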





[jira] [Created] (GIRAPH-1099) Bypass all DNS calls in Giraph

2016-07-27 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1099:
-

 Summary: Bypass all DNS calls in Giraph
 Key: GIRAPH-1099
 URL: https://issues.apache.org/jira/browse/GIRAPH-1099
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Assignee: Sergey Edunov


After GIRAPH-1034 we have reduced the number of DNS lookups and reverse DNS 
lookups dramatically. However, we still see failures occasionally and it would 
be great to completely bypass DNS if PREFER_IP_ADDRESSES is set. 

Currently I'm aware of two places that make DNS lookups:

One in BspService:
this.hostname = conf.getLocalHostname(); 

Another one is probably related:
java.net.UnknownHostException: ***.com: unknown error
    at java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.net.InetAddress$2.lookupAllHostAddr(InetAddress.java:928)
    at java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1323)
    at java.net.InetAddress.getAllByName0(InetAddress.java:1276)
    at java.net.InetAddress.getAllByName(InetAddress.java:1192)
    at java.net.InetAddress.getAllByName(InetAddress.java:1126)
    at org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
    at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
    at org.apache.giraph.zk.ZooKeeperExt.<init>(ZooKeeperExt.java:114)
    at org.apache.giraph.bsp.BspService.<init>(BspService.java:281)
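
For illustration, a minimal sketch of how the JDK lets you work with IP addresses without touching the name service: `InetAddress.getByAddress` builds an address from raw bytes, and `getHostAddress` returns the textual IP without a reverse lookup. The helper class `NoDnsAddress` is hypothetical; how Giraph would wire this behind PREFER_IP_ADDRESSES is not shown here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class NoDnsAddress {
  /** Builds an InetAddress from raw IPv4 octets; getByAddress does not consult a name service. */
  static InetAddress fromOctets(int a, int b, int c, int d) {
    byte[] raw = {(byte) a, (byte) b, (byte) c, (byte) d};
    try {
      return InetAddress.getByAddress(raw);
    } catch (UnknownHostException e) {
      // Only thrown for an illegal address length, which cannot happen with four octets.
      throw new IllegalArgumentException(e);
    }
  }

  /** Returns the textual IP; unlike getHostName, this does not trigger a reverse lookup. */
  static String ipString(InetAddress address) {
    return address.getHostAddress();
  }
}
```

The ZooKeeper client trace above resolves its connect string by name, so passing literal IP addresses in that string would presumably be needed as well.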





[jira] [Closed] (GIRAPH-1098) Job may get stuck if zookeeper port fixed and is in use

2016-07-20 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov closed GIRAPH-1098.
-

> Job may get stuck if zookeeper port fixed and is in use
> ---
>
> Key: GIRAPH-1098
> URL: https://issues.apache.org/jira/browse/GIRAPH-1098
> Project: Giraph
>  Issue Type: Bug
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
>
> We see jobs getting stuck indefinitely if zookeeper port is in use:
> INFO    2016-07-19 16:08:29,168 [main] 
> org.apache.zookeeper.server.NIOServerCnxnFactory  - binding to port 
> ::/0:0:0:0:0:0:0:0:22181
> ERROR   2016-07-19 16:08:29,168 [main] 
> org.apache.giraph.zk.InProcessZooKeeperRunner  - Unable to start zookeeper
> java.net.BindException: Address already in use
>   at sun.nio.ch.Net.bind0(Native Method)
>   at sun.nio.ch.Net.bind(Net.java:433)
>   at sun.nio.ch.Net.bind(Net.java:425)
>   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>   at 
> org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.runFromConfig(InProcessZooKeeperRunner.java:196)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.start(InProcessZooKeeperRunner.java:154)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner$QuorumRunner.start(InProcessZooKeeperRunner.java:97)
>   at 
> org.apache.giraph.zk.InProcessZooKeeperRunner.start(InProcessZooKeeperRunner.java:52)
>   at 
> org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServer(ZooKeeperManager.java:476)
>   at 
> org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:447)
>   at 
> org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:247)
>   at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:56)
>   at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
>   at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:627)
>   at org.apache.hadoop.mapred.MapTask.runImpl(MapTask.java:301)
>   at org.apache.hadoop.mapred.Task.run(Task.java:604)
>   at org.apache.hadoop.mapred.CoronaChild.main(CoronaChild.java:177)





[jira] [Created] (GIRAPH-1098) Job may get stuck if zookeeper port fixed and is in use

2016-07-20 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1098:
-

 Summary: Job may get stuck if zookeeper port fixed and is in use
 Key: GIRAPH-1098
 URL: https://issues.apache.org/jira/browse/GIRAPH-1098
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Assignee: Sergey Edunov


We see jobs getting stuck indefinitely if zookeeper port is in use:

INFO    2016-07-19 16:08:29,168 [main] org.apache.zookeeper.server.NIOServerCnxnFactory  - binding to port ::/0:0:0:0:0:0:0:0:22181
ERROR   2016-07-19 16:08:29,168 [main] org.apache.giraph.zk.InProcessZooKeeperRunner  - Unable to start zookeeper
java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
    at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:95)
    at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.runFromConfig(InProcessZooKeeperRunner.java:196)
    at org.apache.giraph.zk.InProcessZooKeeperRunner$ZooKeeperServerRunner.start(InProcessZooKeeperRunner.java:154)
    at org.apache.giraph.zk.InProcessZooKeeperRunner$QuorumRunner.start(InProcessZooKeeperRunner.java:97)
    at org.apache.giraph.zk.InProcessZooKeeperRunner.start(InProcessZooKeeperRunner.java:52)
    at org.apache.giraph.zk.ZooKeeperManager.onlineZooKeeperServer(ZooKeeperManager.java:476)
    at org.apache.giraph.graph.GraphTaskManager.startZooKeeperManager(GraphTaskManager.java:447)
    at org.apache.giraph.graph.GraphTaskManager.setup(GraphTaskManager.java:247)
    at org.apache.giraph.graph.GraphMapper.setup(GraphMapper.java:56)
    at org.apache.giraph.graph.GraphMapper.run(GraphMapper.java:90)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:627)
    at org.apache.hadoop.mapred.MapTask.runImpl(MapTask.java:301)
    at org.apache.hadoop.mapred.Task.run(Task.java:604)
    at org.apache.hadoop.mapred.CoronaChild.main(CoronaChild.java:177)
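
A minimal sketch of failing fast when a fixed port is taken, rather than leaving the job waiting on a server that never came up. `PortCheck` and its methods are hypothetical helpers built on plain `java.net.ServerSocket`; they are not part of Giraph or ZooKeeper.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

final class PortCheck {
  /** Returns true if the port can be bound right now; the probe socket is closed before returning. */
  static boolean isFree(int port) {
    try (ServerSocket probe = new ServerSocket(port)) {
      return true;
    } catch (BindException e) {
      return false; // typically "Address already in use"
    } catch (IOException e) {
      throw new IllegalStateException("Unexpected I/O error probing port " + port, e);
    }
  }

  /** Throws immediately if the port is taken, so the caller can abort instead of hanging. */
  static void requireFree(int port) {
    if (!isFree(port)) {
      throw new IllegalStateException("Port " + port + " is already in use");
    }
  }
}
```

A probe like this is inherently racy, since another process can grab the port between the check and the real bind. Treating the BindException from the real bind as fatal is the more robust variant of the same idea.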





[jira] [Created] (GIRAPH-1097) Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor

2016-07-18 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1097:
-

 Summary: Fix TestOutOfCore.testOutOfCoreLocalDiskAccessor
 Key: GIRAPH-1097
 URL: https://issues.apache.org/jira/browse/GIRAPH-1097
 Project: Giraph
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergey Edunov
Assignee: Sergey Edunov
 Fix For: 1.2.0


TestOutOfCore.testOutOfCoreLocalDiskAccessor for hadoop_1 profile fails. 

java.lang.AssertionError
    at org.junit.Assert.fail(Assert.java:86)
    at org.junit.Assert.assertTrue(Assert.java:41)
    at org.junit.Assert.assertTrue(Assert.java:52)
    at org.apache.giraph.TestOutOfCore.runTest(TestOutOfCore.java:100)
    at org.apache.giraph.TestOutOfCore.testOutOfCoreLocalDiskAccessor(TestOutOfCore.java:84)





[jira] [Created] (GIRAPH-1095) Performance regression after GIRAPH-1068

2016-07-15 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1095:
-

 Summary: Performance regression after GIRAPH-1068
 Key: GIRAPH-1095
 URL: https://issues.apache.org/jira/browse/GIRAPH-1095
 Project: Giraph
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergey Edunov
Assignee: Sergey Edunov
 Fix For: 1.2.0


We noticed a significant performance regression caused by GIRAPH-1068 in jobs 
that have a lot of supersteps. This is likely caused by some missing zookeeper 
options. 





[jira] [Created] (GIRAPH-1094) Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput

2016-07-13 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1094:
-

 Summary: Fix TestHBaseRootMarkerVertextFormat.testHBaseInputOutput
 Key: GIRAPH-1094
 URL: https://issues.apache.org/jira/browse/GIRAPH-1094
 Project: Giraph
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergey Edunov


This test case seems to fail because of a missing dependency. 

16/07/13 22:31:50 ERROR master.MasterFileSystem: bootstrap
org.apache.hadoop.hbase.DroppedSnapshotException: region: -ROOT-,,0
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1684)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1552)
    at org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1047)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:995)
    at org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:960)
    at org.apache.hadoop.hbase.master.MasterFileSystem.bootstrap(MasterFileSystem.java:523)
    at org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:463)
    at org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:148)
    at org.apache.hadoop.hbase.master.MasterFileSystem.<init>(MasterFileSystem.java:133)
    at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:573)
    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:432)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NoClassDefFoundError: com/google/common/io/NullOutputStream
    at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.close(HFileWriterV2.java:375)
    at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.close(StoreFile.java:1330)
    at org.apache.hadoop.hbase.regionserver.Store.internalFlushCache(Store.java:913)
    at org.apache.hadoop.hbase.regionserver.Store.flushCache(Store.java:794)
    at org.apache.hadoop.hbase.regionserver.Store$StoreFlusherImpl.flushCache(Store.java:2429)
    at org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:1659)


Full log can be found here:
https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_1,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.io.hbase/TestHBaseRootMarkerVertextFormat/testHBaseInputOutput/







[jira] [Created] (GIRAPH-1093) Fix TestRexsterLongDoubleFloatIOFormat test case

2016-07-13 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1093:
-

 Summary: Fix TestRexsterLongDoubleFloatIOFormat test case
 Key: GIRAPH-1093
 URL: https://issues.apache.org/jira/browse/GIRAPH-1093
 Project: Giraph
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergey Edunov


This test case appears flaky and fails with an "Address already in use" exception:

java.net.BindException: Address already in use
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:463)
    at sun.nio.ch.Net.bind(Net.java:455)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:395)
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:366)
    at org.glassfish.grizzly.nio.transport.TCPNIOTransport.bind(TCPNIOTransport.java:357)
    at org.glassfish.grizzly.http.server.NetworkListener.start(NetworkListener.java:606)
    at org.glassfish.grizzly.http.server.HttpServer.start(HttpServer.java:260)
    at com.tinkerpop.rexster.server.HttpRexsterServer.reconfigure(HttpRexsterServer.java:195)
    at com.tinkerpop.rexster.server.HttpRexsterServer.start(HttpRexsterServer.java:149)
at 



Full log can be found here:
https://builds.apache.org/job/Giraph-1.2/5/MVN_PROFILE=hadoop_2,jdk=JDK%201.7%20(latest),label=ubuntu/testReport/junit/org.apache.giraph.rexster.io.formats/TestRexsterLongDoubleFloatIOFormat/org_apache_giraph_rexster_io_formats_TestRexsterLongDoubleFloatIOFormat/





[jira] [Updated] (GIRAPH-1091) Fix SimpleRangePartitionFactoryTest

2016-07-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1091:
--
Fix Version/s: 1.2.0

> Fix SimpleRangePartitionFactoryTest
> ---
>
> Key: GIRAPH-1091
> URL: https://issues.apache.org/jira/browse/GIRAPH-1091
> Project: Giraph
>  Issue Type: Bug
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>Priority: Minor
> Fix For: 1.2.0
>
>
> SimpleRangePartitionFactoryTest relied on old logic for calculating number of 
> partitions and got broken with GIRAPH-1082.





[jira] [Updated] (GIRAPH-1092) TestCollections.testLargeBasicList fails with OOM

2016-07-13 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1092:
--
Fix Version/s: 1.2.0

> TestCollections.testLargeBasicList fails with OOM
> -
>
> Key: GIRAPH-1092
> URL: https://issues.apache.org/jira/browse/GIRAPH-1092
> Project: Giraph
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
> Fix For: 1.2.0
>
>
> TestCollections.testLargeBasicList fails with OOM in Jenkins. This test case 
> requires more than 1 GB of memory to run. After a short chat with the author 
> we decided to disable this test case.





[jira] [Created] (GIRAPH-1092) TestCollections.testLargeBasicList fails with OOM

2016-07-13 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1092:
-

 Summary: TestCollections.testLargeBasicList fails with OOM
 Key: GIRAPH-1092
 URL: https://issues.apache.org/jira/browse/GIRAPH-1092
 Project: Giraph
  Issue Type: Bug
Affects Versions: 1.2.0
Reporter: Sergey Edunov
Assignee: Sergey Edunov


TestCollections.testLargeBasicList fails with OOM in Jenkins. This test case 
requires more than 1 GB of memory to run. After a short chat with the author 
we decided to disable this test case.





[jira] [Resolved] (GIRAPH-1079) Add triangle counting example

2016-07-05 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov resolved GIRAPH-1079.
---
Resolution: Fixed

> Add triangle counting example
> -
>
> Key: GIRAPH-1079
> URL: https://issues.apache.org/jira/browse/GIRAPH-1079
> Project: Giraph
>  Issue Type: New Feature
>Reporter: Maja Kabiljo
>Assignee: Maja Kabiljo
>Priority: Minor
> Fix For: 1.2.0
>
>
> Add an app for triangle counting





[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.

2016-07-01 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15359227#comment-15359227
 ] 

Sergey Edunov commented on GIRAPH-882:
--

I believe that if you set up an external zookeeper quorum this code doesn't 
even get called. In this case, all access to zookeeper goes through 
ZooKeeperExt. See for example these lines in GraphTaskManager.java:

String serverPortList = conf.getZookeeperList();
if (serverPortList.isEmpty()) {
  if (startZooKeeperManager()) {
    return; // ZK connect/startup failed
  }
} else {
  createZooKeeperCounter(serverPortList);
}

The first "if" is the only entry point into the ZooKeeperManager as far as I 
can see, and we don't go there if an external zookeeper quorum is set. 

> List of zookeeper connection strings is trimmed by Hadoop counters.
> ---
>
> Key: GIRAPH-882
> URL: https://issues.apache.org/jira/browse/GIRAPH-882
> Project: Giraph
>  Issue Type: Bug
>  Components: zookeeper
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Lukas Nalezenec
> Attachments: GIRAPH-882-rev2.patch, GIRAPH-882-rev3.patch, 
> GIRAPH-882.patch, testrun.log
>
>
> We are running a job with a quorum of 3 zookeepers. Each server has a long name 
> (turing452.fi.callan.de:22181). Connection strings are stored in Hadoop 
> Counters (for example: 
> turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181)
>  but since the name of a counter is limited to ~63 characters the connection string 
> is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin).
> 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Client 
> environment:user.name=hadoop
> 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin 
> sessionTimeout=6 
> Exception in thread "main" java.net.UnknownHostException: turin
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1162)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1098)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>   at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>   at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
>   at org.apache.giraph.zk.ZooKeeperExt.<init>(ZooKeeperExt.java:114)
>   at 
> org.apache.giraph.job.JobProgressTracker.<init>(JobProgressTracker.java:69)
>   at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:255)
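
The truncation described in the report can be reproduced, and detected, with a few lines of plain Java. This is a sketch only: `ConnectStringCheck` is a hypothetical helper, and it assumes every entry is written as host:port, whereas real ZooKeeper connect strings may also omit the port or carry a chroot suffix.

```java
import java.util.ArrayList;
import java.util.List;

final class ConnectStringCheck {
  /** Returns the entries of a comma-separated connect string that lack a numeric port. */
  static List<String> malformedEntries(String connectString) {
    List<String> bad = new ArrayList<>();
    for (String entry : connectString.split(",")) {
      int colon = entry.lastIndexOf(':');
      if (colon <= 0 || !entry.substring(colon + 1).matches("\\d+")) {
        bad.add(entry);
      }
    }
    return bad;
  }

  /** Mimics a store that keeps only the first maxLength characters of a name. */
  static String truncate(String value, int maxLength) {
    return value.length() <= maxLength ? value : value.substring(0, maxLength);
  }
}
```

Cutting the three-host string from the report to 63 characters leaves `turin` as the last entry, which the check flags. A cut that happens to land inside the port digits would still slip through, so a check like this is a diagnostic aid rather than a fix.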





[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.

2016-06-29 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15355531#comment-15355531
 ] 

Sergey Edunov commented on GIRAPH-882:
--

Hi Donngjin, 

Are you available for a chat sometime this week? I'm trying to figure out how 
this diff relates to GIRAPH-1068. There we removed the option to start multiple 
zookeepers from Giraph itself. It is still possible to connect to an external 
zookeeper quorum though. 
To understand what needs to be done here, can you explain your use case? If I 
understand correctly, here you create 5 zookeeper instances from the Giraph 
job. What is the reason, and why don't you use a single zookeeper?  

> List of zookeeper connection strings is trimmed by Hadoop counters.
> ---
>
> Key: GIRAPH-882
> URL: https://issues.apache.org/jira/browse/GIRAPH-882
> Project: Giraph
>  Issue Type: Bug
>  Components: zookeeper
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Lukas Nalezenec
> Attachments: GIRAPH-882-rev2.patch, GIRAPH-882-rev3.patch, 
> GIRAPH-882.patch, testrun.log
>
>
> We are running a job with a quorum of 3 zookeepers. Each server has a long name 
> (turing452.fi.callan.de:22181). Connection strings are stored in Hadoop 
> Counters (for example: 
> turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181)
>  but since the name of a counter is limited to ~63 characters the connection string 
> is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin).
> 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Client 
> environment:user.name=hadoop
> 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Initiating client connection, 
> connectString=turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin 
> sessionTimeout=6 
> Exception in thread "main" java.net.UnknownHostException: turin
>   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
>   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
>   at 
> java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
>   at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1162)
>   at java.net.InetAddress.getAllByName(InetAddress.java:1098)
>   at 
> org.apache.zookeeper.client.StaticHostProvider.<init>(StaticHostProvider.java:60)
>   at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:445)
>   at org.apache.zookeeper.ZooKeeper.<init>(ZooKeeper.java:380)
>   at org.apache.giraph.zk.ZooKeeperExt.<init>(ZooKeeperExt.java:114)
>   at 
> org.apache.giraph.job.JobProgressTracker.<init>(JobProgressTracker.java:69)
>   at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:255)





[jira] [Updated] (GIRAPH-1068) Make Zookeeper accept 0 as a port number and let it choose any available free port

2016-06-02 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-1068:
--
Summary: Make Zookeeper accept 0 as a port number and let it choose any 
available free port  (was: Make Zookeeper accept 0 as a port number and let it 
choose free available port)

> Make Zookeeper accept 0 as a port number and let it choose any available free 
> port
> --
>
> Key: GIRAPH-1068
> URL: https://issues.apache.org/jira/browse/GIRAPH-1068
> Project: Giraph
>  Issue Type: Task
>Reporter: Sergey Edunov
>Assignee: Sergey Edunov
>
> We have a few use cases where having zookeeper bound to specific port is very 
> inconvenient. 
> 1) Unit tests that run in parallel. 
> 2) Shared clusters where multiple giraph instances can run on the same 
> machines. 
> In theory we don't need to know what port zookeeper will run on. In most 
> cases we're fine with any port available. 
> Picking any available port is currently supported by the server socket, but 
> is not supported in the code that parses zookeeper configs (this code lives in 
> zookeeper). 
> We don't have to parse configs though, as we have a way to run zookeeper in 
> process. And in that case we can have full control over how zookeeper is 
> initialized. 
> For this task I want to allow 0 as a port number for zookeeper, which will 
> allow us to run zookeeper on any available port. I will also remove "out 
> of process" zookeeper, as it clearly provides no benefits to us.  
> Note: it will still be possible to run external zookeeper, if you have it 
> running somewhere as a service. 
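
The server-socket behaviour the description relies on can be shown with the JDK alone: binding to port 0 asks the operating system for any free port, and the chosen port is then read back from the socket. `EphemeralPort` is a hypothetical helper for illustration and says nothing about how the in-process zookeeper is actually wired up.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class EphemeralPort {
  /** Binds to port 0 so the OS picks a free port; callers read it back via getLocalPort(). */
  static ServerSocket bindAnyFreePort() {
    try {
      return new ServerSocket(0);
    } catch (IOException e) {
      throw new IllegalStateException("Could not bind to any free port", e);
    }
  }
}
```

The port is only known after the bind, so whatever starts the server has to publish `getLocalPort()` to the clients that need to connect.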





[jira] [Created] (GIRAPH-1043) Implementation of Darwini graph generator

2016-02-08 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1043:
-

 Summary: Implementation of Darwini graph generator
 Key: GIRAPH-1043
 URL: https://issues.apache.org/jira/browse/GIRAPH-1043
 Project: Giraph
  Issue Type: Task
Reporter: Sergey Edunov
Assignee: Sergey Edunov


Implementation of graph generator that is able to capture many properties of 
social graphs, such as high local clustering coefficient, non-power law degree 
distributions and log normal joint degree distribution. 





[jira] [Created] (GIRAPH-1024) mvn release:prepare not committing changes to pom.xml

2015-07-29 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-1024:
-

 Summary: mvn release:prepare not committing changes to pom.xml
 Key: GIRAPH-1024
 URL: https://issues.apache.org/jira/browse/GIRAPH-1024
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov


This issue is pretty much described in this Stack Overflow post: 
http://stackoverflow.com/questions/15166781/mvn-releaseprepare-not-committing-changes-to-pom-xml
With newer versions of git we are unable to release Giraph. 





[jira] [Resolved] (GIRAPH-1024) mvn release:prepare not committing changes to pom.xml

2015-07-29 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-1024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov resolved GIRAPH-1024.
---
Resolution: Fixed

https://reviews.facebook.net/D43215

 mvn release:prepare not committing changes to pom.xml
 -

 Key: GIRAPH-1024
 URL: https://issues.apache.org/jira/browse/GIRAPH-1024
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov

 This issue is pretty much described in stackoverflow post: 
 http://stackoverflow.com/questions/15166781/mvn-releaseprepare-not-committing-changes-to-pom-xml
 with newer versions of git we are unable to release giraph. 





[jira] [Commented] (GIRAPH-1000) Multi Output support

2015-03-25 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-1000?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14379892#comment-14379892
 ] 

Sergey Edunov commented on GIRAPH-1000:
---

That would be a great addition to Giraph! 
I was thinking about it a while ago. It seems we can implement it in a way 
similar to the multiple input format. See for example 
org.apache.giraph.io.formats.multi.MultiVertexInputFormat and other classes in 
the same package. This is essentially a wrapper around a list of inputs 
providing the same API as a single input format does. 
In the same way we can have a wrapper around VertexOutputFormat and 
EdgeOutputFormat providing the same APIs, and then just plug them in.
We also need this feature, so I'll be happy to help.
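
The wrapper idea can be sketched independently of Giraph: one object implements the single-output API and forwards every call to a list of delegates. `RecordOutput` and `MultiRecordOutput` are hypothetical names invented for this sketch; the real VertexOutputFormat and EdgeOutputFormat APIs are richer and are not reproduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical single-output API; a stand-in for an output format, not Giraph's interface. */
interface RecordOutput {
  void write(String record);
}

/** Wraps several outputs behind the same API and forwards each record to all of them. */
final class MultiRecordOutput implements RecordOutput {
  private final List<RecordOutput> delegates;

  MultiRecordOutput(List<RecordOutput> delegates) {
    this.delegates = new ArrayList<>(delegates); // defensive copy
  }

  @Override
  public void write(String record) {
    for (RecordOutput delegate : delegates) {
      delegate.write(record);
    }
  }
}
```

Because the wrapper has the same type as a single output, the code that calls `write` does not need to know how many outputs are plugged in.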

 Multi Output support
 

 Key: GIRAPH-1000
 URL: https://issues.apache.org/jira/browse/GIRAPH-1000
 Project: Giraph
  Issue Type: Improvement
  Components: bsp, conf and scripts, graph
Affects Versions: 1.0.0, 1.1.0, 1.2.0-SNAPSHOT
Reporter: Alessio Arleo
  Labels: features

 Hadoop natively supports multiple outputs. The objective is to extend Giraph 
 to support multiple output formats during a single giraph run.
 According to the official Hadoop apidocs*, to take advantage of multiple 
 outputs the pattern is the following:
 - Modify the job submission
 - Modify the reducer class to write on the declared different outputs
 Since Giraph jobs are executed as mappers, probably this approach (or at 
 least its second part) is not feasible, so further investigation is necessary.
 *https://hadoop.apache.org/docs/r1.2.1/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html





[jira] [Created] (GIRAPH-992) Zookeeper logs have too many NodeExists exceptions

2015-02-04 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-992:


 Summary: Zookeeper logs have too many NodeExists exceptions
 Key: GIRAPH-992
 URL: https://issues.apache.org/jira/browse/GIRAPH-992
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov


There are several places in our code where we do not properly check whether a 
ZK node already exists before attempting to create it. As a result, the ZK logs 
are full of these exceptions. 
The biggest offender is recursive path creation in ZooKeeperExt.createExt(). 
Obviously, if part of the path already exists we don't need to create it. 
The second biggest offender is writing input splits from the master. Here we 
launch multiple threads, each of them attempting to create the same path. 

INFO 2015-02-04 14:32:39,730 [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor  - Got user-level KeeperException when processing sessionid:0x14b56b9176f0001 type:create cxid:0x1 zxid:0x19 txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_201411061513.83344_0001/_masterJobState Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201411061513.83344_0001/_masterJobState
INFO 2015-02-04 14:32:39,740 [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor  - Got user-level KeeperException when processing sessionid:0x14b56b9176f0001 type:create cxid:0x3 zxid:0x1a txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir
INFO 2015-02-04 14:32:39,742 [ProcessThread(sid:0 cport:-1):] org.apache.zookeeper.server.PrepRequestProcessor  - Got user-level KeeperException when processing sessionid:0x14b56b9176f0001 type:create cxid:0x5 zxid:0x1b txntype:-1 reqpath:n/a Error Path:/_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir/0/_superstepDir Error:KeeperErrorCode = NodeExists for /_hadoopBsp/job_201411061513.83344_0001/_applicationAttemptsDir/0/_superstepDir
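
A minimal sketch of the "check before create" idea for recursive path creation. FakeZnodeStore is an in-memory stand-in invented for this example and is not the ZooKeeper client API; PathCreator.createRecursively is likewise a hypothetical helper, not the actual ZooKeeperExt.createExt() code.

```java
import java.util.HashSet;
import java.util.Set;

/** In-memory stand-in for a znode tree; NOT the ZooKeeper client API. */
class FakeZnodeStore {
  private final Set<String> nodes = new HashSet<>();
  private int rejectedCreates = 0;

  boolean exists(String path) {
    return nodes.contains(path);
  }

  /** Mimics a create call that is rejected (and would be logged) if the node exists. */
  void create(String path) {
    if (!nodes.add(path)) {
      rejectedCreates++;
    }
  }

  int rejectedCreates() {
    return rejectedCreates;
  }
}

class PathCreator {
  /** Creates each missing component of an absolute path, skipping existing ones. */
  static void createRecursively(FakeZnodeStore store, String path) {
    int slash = path.indexOf('/', 1);
    while (true) {
      String prefix = slash < 0 ? path : path.substring(0, slash);
      if (!store.exists(prefix)) {
        store.create(prefix);
      }
      if (slash < 0) {
        break;
      }
      slash = path.indexOf('/', slash + 1);
    }
  }

  public static void main(String[] args) {
    FakeZnodeStore store = new FakeZnodeStore();
    createRecursively(store, "/_hadoopBsp/job_1/_masterJobState");
    createRecursively(store, "/_hadoopBsp/job_1/_applicationAttemptsDir");
    System.out.println("rejected creates: " + store.rejectedCreates());
  }
}
```

Note that a check followed by a create is not atomic: when several threads or clients race for the same path, one of them can still get a NodeExists error, so callers have to keep tolerating it. The check only reduces how often it happens.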






[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.

2015-01-23 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290342#comment-14290342
 ] 

Sergey Edunov commented on GIRAPH-882:
--

Thank you, Lee!
I've added one comment to your diff and it looks fine to me otherwise. I'll 
try using this patch on our cluster to make sure it works and will post an 
update here.

 List of zookeeper connection strings is trimmed by Hadoop counters.
 ---

 Key: GIRAPH-882
 URL: https://issues.apache.org/jira/browse/GIRAPH-882
 Project: Giraph
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 1.1.0
Reporter: Lukas Nalezenec
 Attachments: GIRAPH-882-rev2.patch, GIRAPH-882.patch, testrun.log


 We are running a job with a quorum of 3 zookeepers. Each server has a long name 
 (turing452.fi.callan.de:22181). Connection strings are stored in Hadoop 
 Counters (for example: 
 turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181)
  but since the name of a counter is limited to ~63 characters the connection 
 string is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin).
 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Client 
 environment:user.name=hadoop
 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin 
 sessionTimeout=6 
 Exception in thread main java.net.UnknownHostException: turin
   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
   at 
 java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
   at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
   at java.net.InetAddress.getAllByName(InetAddress.java:1162)
   at java.net.InetAddress.getAllByName(InetAddress.java:1098)
   at 
 org.apache.zookeeper.client.StaticHostProvider.init(StaticHostProvider.java:60)
   at org.apache.zookeeper.ZooKeeper.init(ZooKeeper.java:445)
   at org.apache.zookeeper.ZooKeeper.init(ZooKeeper.java:380)
   at org.apache.giraph.zk.ZooKeeperExt.init(ZooKeeperExt.java:114)
   at 
 org.apache.giraph.job.JobProgressTracker.init(JobProgressTracker.java:69)
   at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:255)
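
The failure mode reported above can be reproduced with plain strings. This is only a sketch: the 63-character limit is taken from the "~63" in the report, and truncate() and lastHost() are hypothetical helpers written for this example, not Hadoop or ZooKeeper calls.

```java
class CounterNameTruncationDemo {
  /** Cuts a name to at most maxLength characters, as a length-limited store would. */
  static String truncate(String name, int maxLength) {
    return name.length() <= maxLength ? name : name.substring(0, maxLength);
  }

  /** Returns the host part of the last "host:port" entry in a connection string. */
  static String lastHost(String connectString) {
    String last = connectString.substring(connectString.lastIndexOf(',') + 1);
    int colon = last.indexOf(':');
    return colon < 0 ? last : last.substring(0, colon);
  }

  public static void main(String[] args) {
    String full = "turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,"
        + "turing488.fi.callan.de:22181";
    // Assumed limit of 63 characters, following the report.
    String stored = truncate(full, 63);
    System.out.println(stored);
    System.out.println(lastHost(stored));
  }
}
```

With that limit the third entry is cut down to "turin", which matches the host name in the UnknownHostException above.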





[jira] [Commented] (GIRAPH-882) List of zookeeper connection strings is trimmed by Hadoop counters.

2015-01-14 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14278039#comment-14278039
 ] 

Sergey Edunov commented on GIRAPH-882:
--

Lee, thank you for working on this!

Please make sure you follow the steps at 
http://giraph.apache.org/generating_patches.html

In particular, you need to run mvn clean verify. It looks like your change 
actually breaks unit tests.
It is also recommended to submit a review board request or a phabricator code 
review for big patches (this one seems big enough). I submitted one for you: 
https://reviews.facebook.net/D31563 and put some comments there.

Also, you should be working against the latest version of Giraph from git. I 
was able to merge your changes, but the more complicated your patch is, the 
harder it will be to merge. 



 List of zookeeper connection strings is trimmed by Hadoop counters.
 ---

 Key: GIRAPH-882
 URL: https://issues.apache.org/jira/browse/GIRAPH-882
 Project: Giraph
  Issue Type: Bug
  Components: zookeeper
Affects Versions: 1.1.0
Reporter: Lukas Nalezenec
 Attachments: GIRAPH-882.patch, testrun.log


 We are running a job with a quorum of 3 zookeepers. Each server has a long name 
 (turing452.fi.callan.de:22181). Connection strings are stored in Hadoop 
 Counters (for example: 
 turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turing488.fi.callan.de:22181)
  but since the name of a counter is limited to ~63 characters the connection 
 string is trimmed (turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin).
 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Client 
 environment:user.name=hadoop
 14/03/18 23:44:41 INFO zookeeper.ZooKeeper: Initiating client connection, 
 connectString=turing452.fi.callan.de:22181,turing124.fi.callan.de:22181,turin 
 sessionTimeout=6 
 Exception in thread main java.net.UnknownHostException: turin
   at java.net.Inet4AddressImpl.lookupAllHostAddr(Native Method)
   at java.net.InetAddress$1.lookupAllHostAddr(InetAddress.java:901)
   at 
 java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1293)
   at java.net.InetAddress.getAllByName0(InetAddress.java:1246)
   at java.net.InetAddress.getAllByName(InetAddress.java:1162)
   at java.net.InetAddress.getAllByName(InetAddress.java:1098)
   at 
 org.apache.zookeeper.client.StaticHostProvider.init(StaticHostProvider.java:60)
   at org.apache.zookeeper.ZooKeeper.init(ZooKeeper.java:445)
   at org.apache.zookeeper.ZooKeeper.init(ZooKeeper.java:380)
   at org.apache.giraph.zk.ZooKeeperExt.init(ZooKeeperExt.java:114)
   at 
 org.apache.giraph.job.JobProgressTracker.init(JobProgressTracker.java:69)
   at org.apache.giraph.job.GiraphJob.run(GiraphJob.java:255)





[jira] [Created] (GIRAPH-983) Remove checkpoint related error messages from console

2015-01-13 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-983:


 Summary: Remove checkpoint related error messages from console
 Key: GIRAPH-983
 URL: https://issues.apache.org/jira/browse/GIRAPH-983
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Assignee: Sergey Edunov
Priority: Minor


If a job fails for any reason, we always see a checkpointing-related error in 
the console. This should be removed as it confuses users.

INFO 2014-12-28 18:58:42,913 Can't find any checkpoints for 
jobID=job_201407091116.163675_0001
java.io.FileNotFoundException: File 
_bsp/_checkpoints/job_201407091116.163675_0001 does not exist
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1179)
at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1216)
at 
org.apache.giraph.utils.CheckpointingUtils.getLastCheckpointedSuperstep(CheckpointingUtils.java:106)






[jira] [Commented] (GIRAPH-978) Giraph-Debugger Test Graphs not working

2015-01-08 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14270195#comment-14270195
 ] 

Sergey Edunov commented on GIRAPH-978:
--

CR: https://reviews.facebook.net/D31137

 Giraph-Debugger Test Graphs not working
 ---

 Key: GIRAPH-978
 URL: https://issues.apache.org/jira/browse/GIRAPH-978
 Project: Giraph
  Issue Type: Bug
  Components: graph
Reporter: Nishant M Gandhi
Assignee: Sergey Edunov
Priority: Minor
  Labels: patch
 Attachments: Giraph-debug-000.patch








[jira] [Created] (GIRAPH-980) Way to disable checkpoints for particular job and on particular supersteps

2015-01-07 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-980:


 Summary: Way to disable checkpoints for particular job and on 
particular supersteps
 Key: GIRAPH-980
 URL: https://issues.apache.org/jira/browse/GIRAPH-980
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov


It is currently impossible to disable checkpoints for particular jobs. For 
example, jobs that produce output during the computation do not support 
checkpointing, and no attempt should be made to checkpoint such a job. 
Other use cases exist where we need to manually specify which supersteps are 
checkpointable for a particular job. 
We need a generic way to configure a job's ability to do checkpoints. 
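
One possible shape for such a configuration hook, purely as a sketch: CheckpointPolicy and its methods are hypothetical names invented here and do not describe an existing Giraph interface or the eventual fix.

```java
import java.util.function.LongPredicate;

/** Hypothetical per-job checkpoint policy; not an existing Giraph class. */
class CheckpointPolicy {
  private final int frequency;                 // checkpoint every N supersteps, 0 = never
  private final LongPredicate checkpointable;  // job-specific veto per superstep

  CheckpointPolicy(int frequency, LongPredicate checkpointable) {
    this.frequency = frequency;
    this.checkpointable = checkpointable;
  }

  /** Policy for jobs that must never be checkpointed, e.g. jobs writing output mid-run. */
  static CheckpointPolicy disabled() {
    return new CheckpointPolicy(0, superstep -> false);
  }

  boolean shouldCheckpoint(long superstep) {
    return frequency > 0
        && superstep % frequency == 0
        && checkpointable.test(superstep);
  }

  public static void main(String[] args) {
    // Checkpoint every 2 supersteps, except superstep 4, which the job vetoes.
    CheckpointPolicy policy = new CheckpointPolicy(2, superstep -> superstep != 4);
    for (long superstep = 1; superstep <= 6; superstep++) {
      System.out.println(superstep + " -> " + policy.shouldCheckpoint(superstep));
    }
  }
}
```

Keeping the global frequency and the per-superstep veto separate would cover both cases from the description: a job that opts out entirely and a job that allows checkpoints only on selected supersteps.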





[jira] [Assigned] (GIRAPH-980) Way to disable checkpoints for particular job and on particular supersteps

2015-01-07 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov reassigned GIRAPH-980:


Assignee: Sergey Edunov

 Way to disable checkpoints for particular job and on particular supersteps
 --

 Key: GIRAPH-980
 URL: https://issues.apache.org/jira/browse/GIRAPH-980
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Assignee: Sergey Edunov

 It is currently impossible to disable checkpoints for particular jobs. For 
 example, jobs that produce output during the computation do not support 
 checkpointing, and no attempt should be made to checkpoint such a job. 
 Other use cases exist where we need to manually specify which supersteps are 
 checkpointable for a particular job. 
 We need a generic way to configure a job's ability to do checkpoints. 





[jira] [Commented] (GIRAPH-979) Add type of input to 'missing input' error message

2015-01-07 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14267973#comment-14267973
 ] 

Sergey Edunov commented on GIRAPH-979:
--

+1

 Add type of input to 'missing input' error message
 --

 Key: GIRAPH-979
 URL: https://issues.apache.org/jira/browse/GIRAPH-979
 Project: Giraph
  Issue Type: Improvement
Reporter: Maja Kabiljo
Assignee: Maja Kabiljo
 Attachments: GIRAPH-979.patch








[jira] [Reopened] (GIRAPH-978) Giraph-Debugger Test Graphs not working

2015-01-06 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov reopened GIRAPH-978:
--
  Assignee: Sergey Edunov

 Giraph-Debugger Test Graphs not working
 ---

 Key: GIRAPH-978
 URL: https://issues.apache.org/jira/browse/GIRAPH-978
 Project: Giraph
  Issue Type: Bug
  Components: graph
Reporter: Nishant M Gandhi
Assignee: Sergey Edunov
Priority: Minor
  Labels: patch
 Attachments: Giraph-debug-000.patch








[jira] [Created] (GIRAPH-975) In-proc ZooKeeper server with Master process

2014-12-23 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-975:


 Summary: In-proc ZooKeeper server with Master process
 Key: GIRAPH-975
 URL: https://issues.apache.org/jira/browse/GIRAPH-975
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Assignee: Sergey Edunov


Currently, by default, zookeeper runs as a separate java process on the same 
server where the master runs. This prevents us from seeing zookeeper logs and 
makes it harder to debug memory issues.  We should be able to run zookeeper 
inside the Master process, and perhaps this should be the default. 





[jira] [Created] (GIRAPH-972) Race condition in checkpointing

2014-12-18 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-972:


 Summary: Race condition in checkpointing
 Key: GIRAPH-972
 URL: https://issues.apache.org/jira/browse/GIRAPH-972
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov


A couple of issues noticed with checkpointing of large jobs:
1) The task ID of the master appears to be important. In most cases it is 0; 
however, sometimes it is not, and as we cannot control it, checkpointing should 
not depend on it.

2) A race condition happens on the master when a worker dies:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode 
for 
/_hadoopBsp/job_201411061513.38895_0001/_applicationAttemptsDir/0/_superstepDir/9/_workerHealthyDir/hadoop4921.prn2.facebook.com_3
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1151)
at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1180)
at org.apache.giraph.zk.ZooKeeperExt.getData(ZooKeeperExt.java:470)
at 
org.apache.giraph.utils.WritableUtils.readFieldsFromZnode(WritableUtils.java:126)





[jira] [Created] (GIRAPH-973) Edge trimming no longer works before superstep 0

2014-12-18 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-973:


 Summary: Edge trimming no longer works before superstep 0
 Key: GIRAPH-973
 URL: https://issues.apache.org/jira/browse/GIRAPH-973
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov


Edge trimming, introduced in GIRAPH-895, no longer works before superstep 0.  
We used to trim edges after input was done, which reduced memory consumption 
in superstep 0. This no longer works, and we only trim edge arrays at the end 
of superstep 0. 





[jira] [Commented] (GIRAPH-966) Add a way to ignore some thread exceptions

2014-12-01 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14230249#comment-14230249
 ] 

Sergey Edunov commented on GIRAPH-966:
--

looks good to me

 Add a way to ignore some thread exceptions
 --

 Key: GIRAPH-966
 URL: https://issues.apache.org/jira/browse/GIRAPH-966
 Project: Giraph
  Issue Type: New Feature
Reporter: Maja Kabiljo
Assignee: Maja Kabiljo
Priority: Minor
 Attachments: GIRAPH-966.patch


 Add a way not to fail a mapper when an exception happens on some non-core 
 threads. By default, still fail on every exception.
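
The behaviour described above could look roughly like the following sketch. SelectiveExceptionHandler is a hypothetical class written for illustration and is not the code from GIRAPH-966.patch. The only real API used is java.lang.Thread.UncaughtExceptionHandler.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Predicate;

/** Hypothetical handler that fails the task unless the thread is marked ignorable. */
class SelectiveExceptionHandler implements Thread.UncaughtExceptionHandler {
  private final Predicate<Thread> ignorable;
  private final AtomicBoolean failed = new AtomicBoolean();

  SelectiveExceptionHandler(Predicate<Thread> ignorable) {
    this.ignorable = ignorable;
  }

  /** Default behaviour: no thread is ignorable, so every exception fails the task. */
  static SelectiveExceptionHandler failOnEverything() {
    return new SelectiveExceptionHandler(thread -> false);
  }

  @Override
  public void uncaughtException(Thread thread, Throwable error) {
    if (ignorable.test(thread)) {
      System.err.println("Ignoring exception on " + thread.getName() + ": " + error);
      return;
    }
    failed.set(true);
  }

  boolean hasFailed() {
    return failed.get();
  }

  public static void main(String[] args) {
    // Assumption for the demo: non-core threads are recognisable by a name prefix.
    SelectiveExceptionHandler handler =
        new SelectiveExceptionHandler(thread -> thread.getName().startsWith("metrics-"));
    handler.uncaughtException(new Thread("metrics-1"), new RuntimeException("ignored"));
    System.out.println("failed after metrics error: " + handler.hasFailed());
    handler.uncaughtException(new Thread("compute-1"), new RuntimeException("fatal"));
    System.out.println("failed after compute error: " + handler.hasFailed());
  }
}
```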





[jira] [Created] (GIRAPH-963) Aggregators may fail with IllegalArgumentException upon deserialization

2014-11-03 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-963:


 Summary: Aggregators may fail with IllegalArgumentException upon 
deserialization
 Key: GIRAPH-963
 URL: https://issues.apache.org/jira/browse/GIRAPH-963
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Priority: Trivial


Found this in one of the runs; the fix is simple:

java.lang.IllegalArgumentException: Trying to configure configurable object 
without value, class …
at 
org.apache.giraph.utils.ConfigurationUtils.configureIfPossible(ConfigurationUtils.java:153)
at 
org.apache.giraph.utils.ReflectionUtils.newInstance(ReflectionUtils.java:111)
at 
org.apache.giraph.master.AggregatorReduceOperation.initAggregator(AggregatorReduceOperation.java:65)
at 
org.apache.giraph.master.AggregatorReduceOperation.readFields(AggregatorReduceOperation.java:114)
at 
org.apache.giraph.master.AggregatorToGlobalCommTranslation$AggregatorWrapper.readFields(AggregatorToGlobalCommTranslation.java:288)
at 
org.apache.giraph.master.AggregatorToGlobalCommTranslation.readFields(AggregatorToGlobalCommTranslation.java:184)
at 
org.apache.giraph.master.BspServiceMaster.prepareCheckpointRestart(BspServiceMaster.java:823)
at 
org.apache.giraph.master.BspServiceMaster.assignPartitionOwners(BspServiceMaster.java:1140)
at 
org.apache.giraph.master.BspServiceMaster.coordinateSuperstep(BspServiceMaster.java:1609)
at org.apache.giraph.master.MasterThread.run(MasterThread.java:124)





[jira] [Created] (GIRAPH-950) Auto-restart from checkpoint doesn't pick up latest checkpoint

2014-09-24 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-950:


 Summary: Auto-restart from checkpoint doesn't pick up latest 
checkpoint
 Key: GIRAPH-950
 URL: https://issues.apache.org/jira/browse/GIRAPH-950
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov


While running different jobs with checkpoints enabled I noticed some issues:
1) The way we pick up the latest checkpoint is not correct. The current 
implementation just picks whatever is returned last from FileSystem.list(), 
which is not necessarily the last checkpoint.
2) If a job restarts from a checkpoint, it immediately creates another 
checkpoint. 
3) We need more flexibility in GiraphJobRetryChecker to allow restarts after 
multiple failures. 
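
For point 1, a sketch of picking the checkpoint by superstep number rather than by listing order. The file naming used here (a superstep number followed by ".finalized") is an assumption for the example and may not match the real checkpoint layout. LatestCheckpointFinder is a hypothetical helper that works on plain file names instead of the Hadoop FileSystem API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.OptionalLong;

class LatestCheckpointFinder {
  /** Assumed marker suffix for a completed checkpoint; the real naming may differ. */
  static final String FINALIZED_SUFFIX = ".finalized";

  /** Returns the highest superstep that has a finalized marker, if any. */
  static OptionalLong latestSuperstep(List<String> fileNames) {
    return fileNames.stream()
        .filter(name -> name.endsWith(FINALIZED_SUFFIX))
        .map(name -> name.substring(0, name.length() - FINALIZED_SUFFIX.length()))
        .filter(prefix -> prefix.matches("\\d+"))
        .mapToLong(Long::parseLong)
        .max();
  }

  public static void main(String[] args) {
    // A listing is not guaranteed to be sorted by superstep, so compare the numbers.
    List<String> listing = Arrays.asList("10.finalized", "2.finalized", "9.finalized");
    System.out.println(latestSuperstep(listing));
  }
}
```

Comparing parsed numbers also avoids the trap of lexicographic ordering, where "9" would sort after "10".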





[jira] [Updated] (GIRAPH-933) Checkpointing improvements

2014-08-15 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-933:
-

Attachment: GIRAPH-933.3.patch

 Checkpointing improvements
 --

 Key: GIRAPH-933
 URL: https://issues.apache.org/jira/browse/GIRAPH-933
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-933.2.patch, GIRAPH-933.3.patch, GIRAPH-933.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We need to address some issues with checkpointing:
 1) worker2worker messages are not saved
 2) BspServiceWorker does not compile under hadoop_0.23 profile
 3) it would be nice to be able to manually checkpoint and stop any job at any 
 point of time. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (GIRAPH-933) Checkpointing improvements

2014-08-15 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-933:
-

Attachment: GIRAPH-933.4.patch

 Checkpointing improvements
 --

 Key: GIRAPH-933
 URL: https://issues.apache.org/jira/browse/GIRAPH-933
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-933.2.patch, GIRAPH-933.3.patch, 
 GIRAPH-933.4.patch, GIRAPH-933.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We need to address some issues with checkpointing:
 1) worker2worker messages are not saved
 2) BspServiceWorker does not compile under hadoop_0.23 profile
 3) it would be nice to be able to manually checkpoint and stop any job at any 
 point of time. 





[jira] [Created] (GIRAPH-940) Cleanup the list of supported hadoop versions.

2014-08-11 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-940:


 Summary: Cleanup the list of supported hadoop versions.
 Key: GIRAPH-940
 URL: https://issues.apache.org/jira/browse/GIRAPH-940
 Project: Giraph
  Issue Type: Task
Reporter: Sergey Edunov


We now support 14 hadoop versions:
 
hadoop_0.20.203
hadoop_0.23
hadoop_1
hadoop_2
hadoop_2.0.0
hadoop_2.0.1
hadoop_2.0.2
hadoop_2.0.3
hadoop_cdh4.1.2
hadoop_facebook
hadoop_non_secure
hadoop_snapshot
hadoop_trunk
hadoop_yarn

Some of them have known issues, like this one in hadoop_0.20.203: 
https://issues.apache.org/jira/browse/MAPREDUCE-118  That issue in 
particular blocks https://issues.apache.org/jira/browse/GIRAPH-933  

Some of them don't even compile  (hadoop_2.0.3 , hadoop_2.0.2, 
hadoop_2.0.1,hadoop_2.0.0, hadoop_0.23, hadoop_non_secure). I have no idea how 
many of our 'supported profiles' actually work. 

I think we should review the list of supported hadoop versions and clear the 
list of profiles to make our lives easier. 





[jira] [Updated] (GIRAPH-933) Checkpointing improvements

2014-08-11 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-933:
-

Attachment: GIRAPH-933.2.patch

 Checkpointing improvements
 --

 Key: GIRAPH-933
 URL: https://issues.apache.org/jira/browse/GIRAPH-933
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-933.2.patch, GIRAPH-933.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We need to address some issues with checkpointing:
 1) worker2worker messages are not saved
 2) BspServiceWorker does not compile under hadoop_0.23 profile
 3) it would be nice to be able to manually checkpoint and stop any job at any 
 point of time. 





[jira] [Commented] (GIRAPH-931) Provide a Strongly Connected Components algorithm

2014-08-06 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14088068#comment-14088068
 ] 

Sergey Edunov commented on GIRAPH-931:
--

Hi Gianluca, 
Thank you for working on this! Can you please submit a review board request 
(https://reviews.apache.org) or an arcanist review request 
(https://reviews.facebook.net/) next time?
Here is a list of things I noticed in the diff:
1) Make sure you're consistent in using primitive types vs. objects. E.g., in 
SccVertexValue, value is a Long while it should really be a long, as you don't 
expect it to be null in write(). 
2) Similarly, you don't have to use List<Long>: if you intend to store only 
primitive longs, you can use primitive collections from fastutil, like 
it.unimi.dsi.fastutil.longs.LongArrayList. These have a smaller memory 
footprint and hence can help scale your code.
3) In SccComputation you can move phase extraction from compute() to 
preSuperstep(). The phase does not change during the superstep, so there is no 
need to extract it for each vertex.
4) You can avoid creating a new LongWritable() every time you send a message by 
having a LongWritable field in SccComputation and reusing it. 
5) In clearParents(), the call to parents.clear() is not necessary, as you 
discard it right after this call anyway.

These are mostly performance improvements; the algorithm implementation is 
clean and looks good to me. Again, thank you for this great work!
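
Points 1 and 4 can be illustrated with a small sketch. The classes below are hypothetical stand-ins written for this example: PrimitiveVertexValue is not the actual SccVertexValue, and ReusableLongMessage only imitates a mutable holder such as LongWritable so that the example needs nothing beyond the JDK.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

/** Hypothetical vertex value; mirrors the review advice, not the real SccVertexValue. */
class PrimitiveVertexValue {
  // A primitive long can never be null, so write() cannot fail on unboxing.
  private long value;

  void set(long value) {
    this.value = value;
  }

  void write(DataOutput out) throws IOException {
    out.writeLong(value);
  }
}

/** Mutable holder standing in for a reusable message object such as LongWritable. */
class ReusableLongMessage {
  private long value;

  void set(long value) {
    this.value = value;
  }

  long get() {
    return value;
  }
}

class MessageReuseDemo {
  // Allocated once per computation object instead of once per sent message.
  private final ReusableLongMessage message = new ReusableLongMessage();

  /** Fills the shared message instead of allocating a new one for every send. */
  ReusableLongMessage prepare(long payload) {
    message.set(payload);
    return message;
  }

  public static void main(String[] args) throws IOException {
    PrimitiveVertexValue value = new PrimitiveVertexValue();
    value.set(42L);
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    value.write(new DataOutputStream(bytes));
    System.out.println("serialized bytes: " + bytes.size());

    MessageReuseDemo demo = new MessageReuseDemo();
    System.out.println(demo.prepare(1L) == demo.prepare(2L));
  }
}
```

Reusing one message object is only safe under the assumption that the send path copies or serializes the message before the next call overwrites it.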


 Provide a Strongly Connected Components algorithm
 -

 Key: GIRAPH-931
 URL: https://issues.apache.org/jira/browse/GIRAPH-931
 Project: Giraph
  Issue Type: Improvement
  Components: examples
Reporter: Gianluca Righetto
Priority: Minor
 Attachments: GIRAPH-931.patch


 Provide an implementation of an algorithm for finding strongly connected 
 components in a graph to augment the giraph-examples library. This has been 
 initially proposed on GSoC'14.
 A handful of graph algorithms have been researched in this paper: Optimizing 
 Graph Algorithms on Pregel-like Systems (Salihoglu, S., Widom, J., 2014), 
 and a detailed explanation of SCC can also be found in it.





[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing

2014-08-01 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082630#comment-14082630
 ] 

Sergey Edunov commented on GIRAPH-927:
--

Awesome! Thank you Craig! 
I'll create new JIRA and submit this patch for CR

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch, async.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead, they waste a lot of time waiting for the lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 
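
 The queue-based approach can be sketched as below. AsyncMessageProcessor is a hypothetical class written for this example, with a running sum standing in for the real message store. It is not the AsyncMessageStoreWrapper from the attached patch.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: receivers only enqueue, separate threads do the processing. */
class AsyncMessageProcessor {
  private final BlockingQueue<Long> queue = new LinkedBlockingQueue<>();
  private final AtomicLong sum = new AtomicLong();
  private final AtomicLong processed = new AtomicLong();

  AsyncMessageProcessor(int threads) {
    for (int i = 0; i < threads; i++) {
      Thread worker = new Thread(this::drain, "message-processor-" + i);
      worker.setDaemon(true); // do not keep the JVM alive once the main work is done
      worker.start();
    }
  }

  /** Called from the network thread: cheap, never waits on the store itself. */
  void receive(long message) {
    queue.add(message);
  }

  /** Worker loop: takes messages off the queue and applies them to the "store". */
  private void drain() {
    try {
      while (true) {
        long message = queue.take();
        sum.addAndGet(message);
        processed.incrementAndGet();
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Waits until the expected number of messages has been processed, then returns the sum. */
  long awaitSum(long expectedMessages) {
    while (processed.get() < expectedMessages) {
      Thread.yield();
    }
    return sum.get();
  }

  public static void main(String[] args) {
    AsyncMessageProcessor processor = new AsyncMessageProcessor(2);
    for (long i = 1; i <= 100; i++) {
      processor.receive(i);
    }
    System.out.println(processor.awaitSum(100));
  }
}
```

 The number of worker threads is the configurable part: with zero contention on the store, the extra hand-off through the queue is pure overhead.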





[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing

2014-08-01 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082642#comment-14082642
 ] 

Sergey Edunov commented on GIRAPH-927:
--

https://issues.apache.org/jira/browse/GIRAPH-936

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch, async.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead, they waste a lot of time waiting for the lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Updated] (GIRAPH-936) AsyncMessageStoreWrapper threads are not daemonized

2014-08-01 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-936:
-

Attachment: GIRAPH-936.patch

 AsyncMessageStoreWrapper threads are not daemonized
 ---

 Key: GIRAPH-936
 URL: https://issues.apache.org/jira/browse/GIRAPH-936
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
 Attachments: GIRAPH-936.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 Issue related to https://issues.apache.org/jira/browse/GIRAPH-927
 AsyncMessageStoreWrapper starts a set of threads without making them daemons. 
 Hence mappers are unable to complete computations.
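
 A minimal sketch of the usual remedy, assuming the threads are created through a java.util.concurrent.ThreadFactory. DaemonThreadFactory is a hypothetical helper for illustration and is not taken from GIRAPH-936.patch.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical factory that marks every thread it creates as a daemon. */
class DaemonThreadFactory implements ThreadFactory {
  private final String prefix;
  private final AtomicInteger counter = new AtomicInteger();

  DaemonThreadFactory(String prefix) {
    this.prefix = prefix;
  }

  @Override
  public Thread newThread(Runnable task) {
    Thread thread = new Thread(task, prefix + "-" + counter.getAndIncrement());
    // Daemon threads do not prevent the JVM from exiting when main work finishes.
    thread.setDaemon(true);
    return thread;
  }

  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(2, new DaemonThreadFactory("async-store"));
    pool.submit(() -> System.out.println(Thread.currentThread().getName()
        + " daemon=" + Thread.currentThread().isDaemon()));
    pool.shutdown();
  }
}
```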





[jira] [Created] (GIRAPH-933) Checkpointing improvements

2014-07-28 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-933:


 Summary: Checkpointing improvements
 Key: GIRAPH-933
 URL: https://issues.apache.org/jira/browse/GIRAPH-933
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov


We need to address some issues with checkpointing:
1) worker2worker messages are not saved
2) BspServiceWorker does not compile under hadoop_0.23 profile
3) it would be nice to be able to manually checkpoint and stop any job at any 
point of time. 







[jira] [Updated] (GIRAPH-933) Checkpointing improvements

2014-07-28 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-933:
-

Attachment: GIRAPH-933.patch

 Checkpointing improvements
 --

 Key: GIRAPH-933
 URL: https://issues.apache.org/jira/browse/GIRAPH-933
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-933.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We need to address some issues with checkpointing:
 1) worker2worker messages are not saved
 2) BspServiceWorker does not compile under hadoop_0.23 profile
 3) it would be nice to be able to manually checkpoint and stop any job at any 
 point of time. 





[jira] [Commented] (GIRAPH-933) Checkpointing improvements

2014-07-28 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076430#comment-14076430
 ] 

Sergey Edunov commented on GIRAPH-933:
--

Review request: https://reviews.apache.org/r/23989

 Checkpointing improvements
 --

 Key: GIRAPH-933
 URL: https://issues.apache.org/jira/browse/GIRAPH-933
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-933.patch

   Original Estimate: 48h
  Remaining Estimate: 48h

 We need to address some issues with checkpointing:
 1) worker2worker messages are not saved
 2) BspServiceWorker does not compile under hadoop_0.23 profile
 3) it would be nice to be able to manually checkpoint and stop any job at any 
 point of time. 





[jira] [Commented] (GIRAPH-930) Trailing space in ZooKeeper serverList file trips some file systems

2014-07-28 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076462#comment-14076462
 ] 

Sergey Edunov commented on GIRAPH-930:
--

Hi Mostafa, 
Thank you for working on this. Your implementation looks good to me, but I was 
thinking about maybe cleaning up some of the code above instead. What do you 
think? I'm attaching another patch (haven't tested it yet, will do so in the 
afternoon). 


 Trailing space in ZooKeeper serverList file trips some file systems
 ---

 Key: GIRAPH-930
 URL: https://issues.apache.org/jira/browse/GIRAPH-930
 Project: Giraph
  Issue Type: Bug
 Environment: HDInsight
Reporter: Mostafa Elhemali
 Attachments: GIRAPH-930.diff, GIRAPH-930.patch


 In Azure HDInsight (Hadoop in Microsoft Azure), the default file system is 
 the WASB file system which places data on Azure blob storage. This file 
 system doesn't handle trailing spaces that well, so when the ZooKeeperManager 
 class tries to create the serverList file with a trailing space in there 
 things fall apart.
 Ideally WASB would handle trailing spaces, but in this case the trailing 
 space is not really necessary. Can we remove it?





[jira] [Updated] (GIRAPH-930) Trailing space in ZooKeeper serverList file trips some file systems

2014-07-28 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-930?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-930:
-

Attachment: GIRAPH-930.patch

 Trailing space in ZooKeeper serverList file trips some file systems
 ---

 Key: GIRAPH-930
 URL: https://issues.apache.org/jira/browse/GIRAPH-930
 Project: Giraph
  Issue Type: Bug
 Environment: HDInsight
Reporter: Mostafa Elhemali
 Attachments: GIRAPH-930.diff, GIRAPH-930.patch


 In Azure HDInsight (Hadoop in Microsoft Azure), the default file system is 
 the WASB file system which places data on Azure blob storage. This file 
 system doesn't handle trailing spaces that well, so when the ZooKeeperManager 
 class tries to create the serverList file with a trailing space in there 
 things fall apart.
 Ideally WASB would handle trailing spaces, but in this case the trailing 
 space is not really necessary. Can we remove it?





[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-28 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-927:
-

Attachment: async.patch

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch, async.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-28 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14076714#comment-14076714
 ] 

Sergey Edunov commented on GIRAPH-927:
--

Hi Craig, I can't reproduce your issue; everything works within the test 
case. Can you please test this patch? I also attached it to the issue.

diff --git giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java
index a62834f..252ee39 100644
--- giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java
+++ giraph-core/src/main/java/org/apache/giraph/comm/messages/queue/AsyncMessageStoreWrapper.java
@@ -60,7 +60,7 @@ public final class AsyncMessageStoreWrapper<I extends WritableComparable,
   /** Executor that processes messages in background */
   private static final ExecutorService EXECUTOR_SERVICE =
       Executors.newCachedThreadPool(
-          new ThreadFactoryBuilder()
+          new ThreadFactoryBuilder().setDaemon(true)
               .setNameFormat("AsyncMessageStoreWrapper-%d").build());

   /** Number of threads that will process messages in background */
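
To illustrate what the one-line change does, here is a minimal sketch that uses only the JDK. It is not the Giraph code: the class DaemonThreadFactorySketch and its helper methods are hypothetical names, and a hand-written ThreadFactory stands in for Guava's ThreadFactoryBuilder. The point is that threads marked as daemons do not keep the JVM alive once the main threads finish, which is presumably why the patch adds setDaemon(true).

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical JDK-only stand-in for the ThreadFactoryBuilder call in the patch. */
final class DaemonThreadFactorySketch {
  private DaemonThreadFactorySketch() { }

  /** Creates a factory producing daemon threads named prefix-0, prefix-1, ... */
  static ThreadFactory newFactory(final String prefix) {
    final AtomicInteger counter = new AtomicInteger();
    return new ThreadFactory() {
      @Override
      public Thread newThread(Runnable task) {
        Thread thread =
            new Thread(task, prefix + "-" + counter.getAndIncrement());
        // Daemon threads do not prevent the JVM from exiting.
        thread.setDaemon(true);
        return thread;
      }
    };
  }

  /** Same shape as the executor in the patch: a cached pool of daemon threads. */
  static ExecutorService newDaemonCachedPool(String prefix) {
    return Executors.newCachedThreadPool(newFactory(prefix));
  }
}
```

With a factory like this, an idle pool thread no longer blocks process shutdown, at the cost that in-flight work is abandoned if the JVM exits first.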



 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch, async.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Commented] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-22 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14070509#comment-14070509
 ] 

Sergey Edunov commented on GIRAPH-927:
--

Yep, indeed, I'll be working on a fix. 

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-18 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-927:
-

Attachment: (was: GIRAPH-927.patch)

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-18 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-927:
-

Attachment: GIRAPH-927.patch

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-17 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-927:
-

Attachment: (was: GIRAPH-927.patch)

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-17 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-927:
-

Attachment: GIRAPH-927.patch

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Commented] (GIRAPH-905) Giraph Debugger

2014-07-16 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14064041#comment-14064041
 ] 

Sergey Edunov commented on GIRAPH-905:
--

Hi Jaeho, thank you for working on this; it is an awesome addition to Giraph!
Could you please run checkstyle, fix the errors, and then submit a review request 
to the review board at https://reviews.apache.org/r/ ?


 Giraph Debugger
 ---

 Key: GIRAPH-905
 URL: https://issues.apache.org/jira/browse/GIRAPH-905
 Project: Giraph
  Issue Type: New Feature
Reporter: Jaeho Shin
 Attachments: GIRAPH-905.patch


 Four of us at Stanford (Vikesh Khanna, Semih Salihoglu, Jaeho Shin, and Brian 
 Ba Quan Truong) developed a debugger for Giraph, named Graft, and we hope to 
 integrate our code into Giraph trunk.  It is able to launch Giraph jobs in 
 debugging mode to capture traces of certain vertices and MasterCompute at 
 particular supersteps, requiring almost no code change by the user.  From the 
 captured traces, it can generate JUnit tests to replicate the contexts under 
 which the compute() function was running, so that the user can reproduce bugs.  You can 
 read more about it at our GitHub repository: 
 https://github.com/semihsalihoglu/graft





[jira] [Updated] (GIRAPH-924) Fix checkpointing

2014-07-15 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-924:
-

Attachment: GIRAPH_checkpoint_v3.patch

 Fix checkpointing
 -

 Key: GIRAPH-924
 URL: https://issues.apache.org/jira/browse/GIRAPH-924
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-924.patch, GIRAPH_checkpoint_v3.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 We need to make checkpointing in Giraph functional again - it misses a lot of 
 data because of many additions we've been making to Giraph (like information 
 from WorkerContext/MasterCompute, proper integration with per superstep 
 output etc).





[jira] [Created] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-08 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-927:


 Summary: Decouple netty server threads from message processing
 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov


Our profiling shows that a lot of apps are neither CPU, memory, nor network 
bound. Instead they waste a lot of time waiting for a lock in MessageStore. That 
happens in netty threads. 
We should be able to put messages into a queue and then process them in another 
set of threads. 
It has to be configurable because adding another thread level will introduce 
additional overhead. 
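
The idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch: the class QueuedMessageStoreSketch, its methods, and the use of a per-vertex running sum as the "store" are all hypothetical. The network thread only enqueues, and a configurable number of background threads drain the queue and update the store.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: messages are queued, then applied by worker threads. */
final class QueuedMessageStoreSketch {
  /** Marker object telling one worker to exit (compared by reference). */
  private static final long[] POISON = new long[0];

  private final BlockingQueue<long[]> queue = new LinkedBlockingQueue<>();
  private final ConcurrentMap<Long, AtomicLong> sums = new ConcurrentHashMap<>();
  private final List<Thread> workers = new ArrayList<>();

  /** numThreads is the configurable size of the processing thread set. */
  QueuedMessageStoreSketch(int numThreads) {
    for (int i = 0; i < numThreads; i++) {
      Thread worker = new Thread(this::drain, "message-processor-" + i);
      worker.setDaemon(true);
      worker.start();
      workers.add(worker);
    }
  }

  /** Called from the network thread: enqueue only, never touch the store. */
  void enqueue(long vertexId, long value) {
    queue.add(new long[] {vertexId, value});
  }

  /** Worker loop: take messages and apply them until a poison marker arrives. */
  private void drain() {
    try {
      while (true) {
        long[] message = queue.take();
        if (message == POISON) {
          return;
        }
        sums.computeIfAbsent(message[0], id -> new AtomicLong())
            .addAndGet(message[1]);
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
  }

  /** Stops the workers after every previously queued message is applied. */
  void shutdown() {
    for (int i = 0; i < workers.size(); i++) {
      queue.add(POISON);
    }
    for (Thread worker : workers) {
      try {
        worker.join();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
    }
  }

  /** Returns the accumulated value for a vertex, or 0 if none was received. */
  long sumFor(long vertexId) {
    AtomicLong sum = sums.get(vertexId);
    return sum == null ? 0L : sum.get();
  }
}
```

The extra hand-off through the queue is the overhead mentioned above, which is why the number of processing threads is a constructor parameter here.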





[jira] [Updated] (GIRAPH-927) Decouple netty server threads from message processing

2014-07-08 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-927?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-927:
-

Attachment: GIRAPH-927.patch

 Decouple netty server threads from message processing
 -

 Key: GIRAPH-927
 URL: https://issues.apache.org/jira/browse/GIRAPH-927
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-927.patch

   Original Estimate: 168h
  Remaining Estimate: 168h

 Our profiling shows that a lot of apps are neither CPU, memory, nor network 
 bound. Instead they waste a lot of time waiting for a lock in MessageStore. 
 That happens in netty threads. 
 We should be able to put messages into a queue and then process them in 
 another set of threads. 
 It has to be configurable because adding another thread level will introduce 
 additional overhead. 





[jira] [Created] (GIRAPH-925) Unit tests should pass even if zookeeper port not available

2014-07-02 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-925:


 Summary: Unit tests should pass even if zookeeper port not 
available
 Key: GIRAPH-925
 URL: https://issues.apache.org/jira/browse/GIRAPH-925
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Priority: Minor


Currently, if something is using port 22182, unit tests will fail. Or even worse, 
they time out and then fail. Unit tests should not depend on the availability of 
this port. 
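
One common way to avoid a hard-coded port in tests is to ask the operating system for a free one. The sketch below shows that technique only; it is an assumption, not necessarily what the patch for this issue does, and FreePortSketch and findFreePort are hypothetical names. Binding a ServerSocket to port 0 makes the OS pick an unused port, which the test can then pass to the ZooKeeper server it starts.

```java
import java.io.IOException;
import java.net.ServerSocket;

/** Hypothetical helper for tests that need an unused local port. */
final class FreePortSketch {
  private FreePortSketch() { }

  /** Asks the OS for an unused port by binding to port 0, then releases it. */
  static int findFreePort() {
    try (ServerSocket socket = new ServerSocket(0)) {
      return socket.getLocalPort();
    } catch (IOException e) {
      throw new IllegalStateException("No free port available", e);
    }
  }
}
```

Note that another process could still grab the port between the socket being closed and the test server binding to it, so this narrows the problem rather than eliminating it.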





[jira] [Updated] (GIRAPH-925) Unit tests should pass even if zookeeper port not available

2014-07-02 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-925?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-925:
-

Attachment: GIRAPH-925.patch

 Unit tests should pass even if zookeeper port not available
 ---

 Key: GIRAPH-925
 URL: https://issues.apache.org/jira/browse/GIRAPH-925
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Priority: Minor
 Attachments: GIRAPH-925.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 Currently, if something is using port 22182, unit tests will fail. Or even 
 worse, they time out and then fail. Unit tests should not depend on the 
 availability of this port. 





[jira] [Commented] (GIRAPH-925) Unit tests should pass even if zookeeper port not available

2014-07-02 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14050818#comment-14050818
 ] 

Sergey Edunov commented on GIRAPH-925:
--

https://reviews.apache.org/r/23251

 Unit tests should pass even if zookeeper port not available
 ---

 Key: GIRAPH-925
 URL: https://issues.apache.org/jira/browse/GIRAPH-925
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
Priority: Minor
 Attachments: GIRAPH-925.patch

   Original Estimate: 2h
  Remaining Estimate: 2h

 Currently, if something is using port 22182, unit tests will fail. Or even 
 worse, they time out and then fail. Unit tests should not depend on the 
 availability of this port. 





[jira] [Updated] (GIRAPH-924) Fix checkpointing

2014-07-01 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-924:
-

Attachment: GIRAPH-924.patch

 Fix checkpointing
 -

 Key: GIRAPH-924
 URL: https://issues.apache.org/jira/browse/GIRAPH-924
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-924.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 We need to make checkpointing in Giraph functional again - it misses a lot of 
 data because of many additions we've been making to Giraph (like information 
 from WorkerContext/MasterCompute, proper integration with per superstep 
 output etc).





[jira] [Created] (GIRAPH-924) Fix checkpointing

2014-06-27 Thread Sergey Edunov (JIRA)
Sergey Edunov created GIRAPH-924:


 Summary: Fix checkpointing
 Key: GIRAPH-924
 URL: https://issues.apache.org/jira/browse/GIRAPH-924
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-924.patch

We need to make checkpointing in Giraph functional again - it misses a lot of 
data because of many additions we've been making to Giraph (like information 
from WorkerContext/MasterCompute, proper integration with per superstep output 
etc).






[jira] [Updated] (GIRAPH-924) Fix checkpointing

2014-06-27 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-924:
-

Attachment: GIRAPH-924.patch

 Fix checkpointing
 -

 Key: GIRAPH-924
 URL: https://issues.apache.org/jira/browse/GIRAPH-924
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-924.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 We need to make checkpointing in Giraph functional again - it misses a lot of 
 data because of many additions we've been making to Giraph (like information 
 from WorkerContext/MasterCompute, proper integration with per superstep 
 output etc).





[jira] [Commented] (GIRAPH-924) Fix checkpointing

2014-06-27 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046379#comment-14046379
 ] 

Sergey Edunov commented on GIRAPH-924:
--

https://reviews.apache.org/r/23140/

 Fix checkpointing
 -

 Key: GIRAPH-924
 URL: https://issues.apache.org/jira/browse/GIRAPH-924
 Project: Giraph
  Issue Type: Improvement
Reporter: Sergey Edunov
 Attachments: GIRAPH-924.patch

   Original Estimate: 336h
  Remaining Estimate: 336h

 We need to make checkpointing in Giraph functional again - it misses a lot of 
 data because of many additions we've been making to Giraph (like information 
 from WorkerContext/MasterCompute, proper integration with per superstep 
 output etc).





[jira] [Updated] (GIRAPH-903) Detect crashes of Netty threads

2014-06-25 Thread Sergey Edunov (JIRA)

 [ 
https://issues.apache.org/jira/browse/GIRAPH-903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Edunov updated GIRAPH-903:
-

Attachment: GIRAPH-903.patch

 Detect crashes of Netty threads
 ---

 Key: GIRAPH-903
 URL: https://issues.apache.org/jira/browse/GIRAPH-903
 Project: Giraph
  Issue Type: Bug
Reporter: Sergey Edunov
Priority: Minor
 Attachments: GIRAPH-903.patch, GIRAPH-903.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 When one of the request processing threads fails, the worker gets stuck but 
 the job doesn't fail and it has to be killed manually. We should detect netty 
 thread crashes and fail the job automatically.
 You can easily reproduce this by, for example, introducing a mistake into the 
 deserialization of messages.
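
A rough sketch of the detection idea, using only the JDK: install an uncaught exception handler on each processing thread and record the first failure so that the owner can fail fast. This is an illustration, not the attached patch; CrashDetectingThreadsSketch and its methods are hypothetical names, and how Giraph actually wires this into Netty is not shown here.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch: remember the first crash of any processing thread. */
final class CrashDetectingThreadsSketch {
  /** First exception that escaped from any thread created by this class. */
  private final AtomicReference<Throwable> firstFailure =
      new AtomicReference<>();

  /** Creates a thread whose uncaught exceptions are recorded. */
  Thread newThread(Runnable task, String name) {
    Thread thread = new Thread(task, name);
    thread.setUncaughtExceptionHandler(
        (crashed, error) -> firstFailure.compareAndSet(null, error));
    return thread;
  }

  /** Returns true once any thread has died from an uncaught exception. */
  boolean hasFailed() {
    return firstFailure.get() != null;
  }

  /** Throws if a thread has crashed, so the caller can fail the job. */
  void checkHealthy() {
    Throwable failure = firstFailure.get();
    if (failure != null) {
      throw new IllegalStateException(
          "Request processing thread crashed", failure);
    }
  }
}
```

A worker could call checkHealthy() periodically (for example, wherever it would otherwise wait indefinitely) and abort instead of hanging.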





[jira] [Commented] (GIRAPH-842) option to dump histogram of memory usage when heap is low on memory

2014-06-10 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026848#comment-14026848
 ] 

Sergey Edunov commented on GIRAPH-842:
--

+1

 option to dump histogram of memory usage when heap is low on memory
 ---

 Key: GIRAPH-842
 URL: https://issues.apache.org/jira/browse/GIRAPH-842
 Project: Giraph
  Issue Type: Bug
Reporter: Pavan Kumar
Assignee: Pavan Kumar
Priority: Minor
 Attachments: GIRAPH-842.patch, GIRAPH-842_1.patch, master-stderr, 
 worker-stderr


 Currently we are left blind for jobs that run out of memory (OOM). It would be 
 helpful if we could do a jmap -histo dump when the heap has very little free space left.





[jira] [Commented] (GIRAPH-842) option to dump histogram of memory usage when heap is low on memory

2014-06-09 Thread Sergey Edunov (JIRA)

[ 
https://issues.apache.org/jira/browse/GIRAPH-842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025989#comment-14025989
 ] 

Sergey Edunov commented on GIRAPH-842:
--

Hey Pavan, this is a great feature. Here are some comments on the implementation:

1) Please give your thread a meaningful name; it can be helpful when 
debugging stack traces.
2) The stop flag has to be volatile; otherwise your thread may never see the change.
3) I would let the thread die if you get an InterruptedException.
4) runtime.freeMemory() might give you false alarms if you run the job with 
different values of Xmx and Xms. This is because freeMemory only counts free 
bytes within the memory currently allocated to the JVM. If your Xmx setting is 
bigger than Xms, you can go low on freeMemory before the JVM allocates another 
chunk. To get more accurate results you can compute (maxMemory - totalMemory + freeMemory). 
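
Roughly, the four points combined could look like the sketch below. It is only an illustration: LowMemoryMonitorSketch, its parameters, and the thread name are made-up names, and the actual histogram dump is left out (the sketch just prints a warning).

```java
/** Hypothetical monitor thread illustrating the four review comments. */
final class LowMemoryMonitorSketch implements Runnable {
  /** Volatile so that the monitor thread is guaranteed to see the update. */
  private volatile boolean stop;
  /** Warn when less than this fraction of the max heap is still available. */
  private final double minFreeFraction;
  /** Pause between checks, in milliseconds. */
  private final long checkIntervalMillis;

  LowMemoryMonitorSketch(double minFreeFraction, long checkIntervalMillis) {
    this.minFreeFraction = minFreeFraction;
    this.checkIntervalMillis = checkIntervalMillis;
  }

  /** Free bytes in the current heap plus what the heap can still grow by. */
  static long availableBytes(long maxMemory, long totalMemory,
      long freeMemory) {
    return maxMemory - totalMemory + freeMemory;
  }

  /** Same computation, using the values reported by the running JVM. */
  static long availableBytes() {
    Runtime runtime = Runtime.getRuntime();
    return availableBytes(
        runtime.maxMemory(), runtime.totalMemory(), runtime.freeMemory());
  }

  @Override
  public void run() {
    while (!stop) {
      long available = availableBytes();
      if (available < minFreeFraction * Runtime.getRuntime().maxMemory()) {
        // A real implementation would trigger the histogram dump here.
        System.err.println("Heap is low: " + available + " bytes available");
      }
      try {
        Thread.sleep(checkIntervalMillis);
      } catch (InterruptedException e) {
        return; // Let the thread die when interrupted.
      }
    }
  }

  /** Starts the monitor on a named daemon thread and returns that thread. */
  Thread start() {
    Thread thread = new Thread(this, "low-memory-monitor");
    thread.setDaemon(true);
    thread.start();
    return thread;
  }

  /** Asks the monitor loop to exit after its current iteration. */
  void requestStop() {
    stop = true;
  }
}
```

For example, with maxMemory = 1000, totalMemory = 400 and freeMemory = 100, freeMemory alone suggests only 100 bytes are left, while 700 bytes are actually still available to the job.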

 option to dump histogram of memory usage when heap is low on memory
 ---

 Key: GIRAPH-842
 URL: https://issues.apache.org/jira/browse/GIRAPH-842
 Project: Giraph
  Issue Type: Bug
Reporter: Pavan Kumar
Assignee: Pavan Kumar
Priority: Minor
 Attachments: GIRAPH-842.patch, master-stderr, worker-stderr


 Currently we are left blind for jobs that run out of memory (OOM). It would be 
 helpful if we could do a jmap -histo dump when the heap has very little free space left.




