Hi,

1) What's the best way to store extra data (such as a URL) on a vertex? I thought this would be done through a field on the vertex class, but I could not find a way to read that field from a neighboring vertex. For example, I'd like to remove duplicate edges that point to nodes with the same URL (the Duplicate Removal phase of LinkRank). How can I get my neighbor's url value, i.e. the targetUrl in the code below?

2) Is removing edges like this a valid approach?


public class LinkRankVertex extends Vertex<IntWritable, FloatWritable,
        NullWritable, FloatWritable> {

    public String url;
    public void removeDuplicateLinks() {
        int targetId;
        String targetUrl;

        Set<String> urls = new HashSet<String>();
        ArrayListEdges<IntWritable, NullWritable> edges =
                new ArrayListEdges<IntWritable, NullWritable>();

        for (Edge<IntWritable, NullWritable> edge : getEdges()) {
            targetId = edge.getTargetVertexId().get();
            targetUrl = ...??
            if (!urls.contains(targetUrl)) {
                urls.add(targetUrl);
                edges.add(edge);
            }
        }
        setEdges(edges);
    }
}
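
To make the intent clearer, here is a standalone sketch in plain Java (no Giraph classes) of the filtering I am after. The class name DuplicateLinkFilter, the method keepFirstPerUrl, and the urlById map are all made up for illustration. In particular, urlById assumes that the neighbor's URL is somehow available by target id, which is exactly the part I don't know how to do inside a vertex.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DuplicateLinkFilter {

    /**
     * Returns the target ids to keep: the first id seen for each distinct URL,
     * in the original order.
     *
     * urlById is a hypothetical lookup from target vertex id to URL; how to
     * obtain it from within a Giraph vertex is the open question.
     */
    public static List<Integer> keepFirstPerUrl(List<Integer> targetIds,
                                                Map<Integer, String> urlById) {
        Set<String> seenUrls = new HashSet<>();
        List<Integer> kept = new ArrayList<>();
        for (Integer targetId : targetIds) {
            String targetUrl = urlById.get(targetId);
            // Set.add returns false if the URL was already in the set,
            // so only the first edge per URL is kept.
            if (seenUrls.add(targetUrl)) {
                kept.add(targetId);
            }
        }
        return kept;
    }
}
```

For example, with targets 1 and 2 sharing one URL and target 3 having a different one, I would expect only the edges to 1 and 3 to survive.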

Thanks,
Emre.
