The data I have as input is not in a Graph-Format so I use an
EdgeInputFormat to create a Graph. Its also deterministic so the same Graph
should be build with the same input.
Each line in the input is a set of connected vertices.
I create edges in a way that they form a star around the vertex with the
lowest ID. Every vertex has an edge to that vertex and it has an edge to
all other vertices. See the code further down.
I will see if I can sort the output and then do a diff to see if they have
the same results. Is there a simple way to make sure that the same graph
was build? (The full dataset has around a billion vertices so I can't
visualize it)
private class ConCompInReader extends TextEdgeReader {
private Text[] nodeArray = null;
private int otherNodeId = -1;
NullWritable edgeValue = NullWritable.get();
// flag used to create edges in both directions
private boolean flipped = true;
private boolean done = true;
@Override
public final Text getCurrentSourceId() throws IOException,
InterruptedException {
if (flipped)
return nodeArray[otherNodeId];
return nodeArray[0];
}
@Override
public final Edge<Text, NullWritable> getCurrentEdge()
throws IOException, InterruptedException {
if (flipped)
return EdgeFactory.create(nodeArray[0], edgeValue);
return EdgeFactory.create(nodeArray[otherNodeId], edgeValue);
}
@Override
public final boolean nextEdge() throws IOException,
InterruptedException {
// flipp direction of the edge
if(!flipped){
flipped = true;
}
// line not yet done and already created the flipped version
else if(!done){
otherNodeId++;
flipped = false;
if (otherNodeId >= nodeArray.length){
done = true;
}
}
// TODO this version ignores singels switch with version below
if needed
// if done with this line read the next one if there is one
// if current line is consumed it checks if there is a next one
if (done) {
boolean hasNext = getRecordReader().nextKeyValue();
while (done && hasNext) {
String[] stringNodeArray =
getRecordReader().getCurrentValue().toString().split(
"\\s+");
// found line with more than one URL
if (stringNodeArray.length >= 2) {
otherNodeId = 1;
done = false;
flipped = false;
Arrays.sort(stringNodeArray);
nodeArray = new Text[stringNodeArray.length];
for (int i = 0; i < stringNodeArray.length; i++) {
nodeArray[i] = new Text(stringNodeArray[i]);
}
}
// if not ignore line and read a new one
else {
hasNext = getRecordReader().nextKeyValue();
}
}
}
// TODO version that does not ignore singels
/*if (done && getRecordReader().nextKeyValue()) {
arrayNode = getRecordReader().getCurrentValue();
String[] stringNodeArray =
arrayNode.toString().split("\\s+");
otherNodeId = 1;
done = false;
flipped = false;
Arrays.sort(stringNodeArray);
nodeArray = new Text[stringNodeArray.length];
for (int i = 0; i < stringNodeArray.length; i++) {
nodeArray[i] = new Text(stringNodeArray[i]);
}
}*/
return !done;
}
}
On Thu, Feb 27, 2014 at 10:57 AM, Sebastian Schelter <[email protected]> wrote:
> Hi Martin,
>
> You are right, this should not happen, your code looks correct. One way to
> check the output would be to simply count the number vertices per component
> and see if that number stays stable.
>
> Do you supply all vertices in your input data or are some vertices created
> during the computation? Maybe there's a bug/race condition with the vertex
> creation (just wildly guessing)
>
> Best,
> Sebastian
>
>
>
> On 02/27/2014 10:48 AM, Martin Neumann wrote:
>
>> Hej,
>>
>> I have modified the connected component example to fit my input data. I
>> expect it to be deterministic.
>>
>> But when I run it multiple times it takes a different number of Super
>> steps. This only happens on the complete dataset and not on my small test
>> dataset.
>> (So I cannot check the output for correctness in a simple way)
>>
>> Has anyone an Idea how this could happen?
>>
>> cheers Martin
>>
>>
>>
>>
>> In case its useful here the computation class:
>>
>> @Override
>> public void compute(Vertex<Text, Text, NullWritable> vertex,
>> Iterable<Text> inmessage) throws IOException {
>> boolean changed = false;
>>
>> // first superstep && setup
>> if (getSuperstep() == 0) {
>> //initialize value
>> vertex.setValue(vertex.getId());
>> //cheating by checking the neighbors ID's (cuts down 1
>> iteration)
>> for (Edge<Text, NullWritable> e : vertex.getEdges()) {
>> Text candidate = e.getTargetVertexId();
>> if (candidate.compareTo(vertex.getValue()) < 0) {
>> changed = true;
>> vertex.setValue(candidate);
>> }
>> }
>> }
>>
>> // other superstep
>> else {
>> // read all messages and compare with own state
>> for (Text message : inmessage) {
>> if (message.compareTo(vertex.getValue()) < 0) {
>> changed = true;
>> vertex.setValue(message);
>> }
>> }
>> }
>>
>> // if state has changed send a message to all neighbors
>> if (changed && getSuperstep() < limiter) {
>> sendMessageToAllEdges(vertex, vertex.getValue());
>> }
>>
>> vertex.voteToHalt();
>> }
>>
>>
>