Hi,
1) What's the best way to store extra data (such as a URL) on a vertex?
I thought this would be done through a member variable, but I could not find
a way to access that variable from a neighbor.
For example, I'd like to remove the duplicate edges that point to
nodes with the same URL (the Duplicate Removal phase of LinkRank). How can
I read my neighbor's url field to fill in targetUrl?
2) Is removing edges like this a valid approach?
public class LinkRankVertex extends Vertex<IntWritable, FloatWritable,
    NullWritable, FloatWritable> {
  public String url;

  public void removeDuplicateLinks() {
    int targetId;
    String targetUrl;
    Set<String> urls = new HashSet<String>();
    ArrayListEdges<IntWritable, NullWritable> edges =
        new ArrayListEdges<IntWritable, NullWritable>();
    for (Edge<IntWritable, NullWritable> edge : getEdges()) {
      targetId = edge.getTargetVertexId().get();
      targetUrl = ...??
      if (!urls.contains(targetUrl)) {
        urls.add(targetUrl);
        edges.add(edge);
      }
    }
    setEdges(edges);
  }
}
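
To make the intent concrete, here is a minimal standalone sketch of the filtering I have in mind, written in plain Java without any Giraph classes. It assumes the neighbors' URLs are somehow already available locally in a map from vertex id to URL. The names DuplicateLinkFilter and urlById are hypothetical, and how to populate that map in Giraph is exactly the part I don't know. Targets with an unknown URL are kept, which is just one possible choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Giraph: illustrates the dedup rule only.
final class DuplicateLinkFilter {

    // Keeps the first edge seen for each distinct target URL, in edge order.
    // urlById is an assumed lookup from target vertex id to its URL.
    static List<Integer> removeDuplicateLinks(List<Integer> targetIds,
                                              Map<Integer, String> urlById) {
        Set<String> seenUrls = new HashSet<>();
        List<Integer> kept = new ArrayList<>();
        for (Integer targetId : targetIds) {
            String targetUrl = urlById.get(targetId);
            // Set.add returns false when the URL was already present.
            // A target whose URL is unknown (null) is kept as-is.
            if (targetUrl == null || seenUrls.add(targetUrl)) {
                kept.add(targetId);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<Integer, String> urlById = new HashMap<>();
        urlById.put(10, "http://a.example/");
        urlById.put(11, "http://b.example/");
        urlById.put(12, "http://a.example/"); // same URL as vertex 10
        urlById.put(13, "http://c.example/");

        List<Integer> kept =
            removeDuplicateLinks(Arrays.asList(10, 11, 12, 13), urlById);
        System.out.println(kept); // prints [10, 11, 13]
    }
}
```

In the vertex above, the edge list would play the role of targetIds, and the open question is what should replace urlById.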
Thanks,
Emre.