Re: OPIC scoring differences
On 7/9/07, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Carl Cerecke wrote:
> > Hi,
> >
> > The docs for the OPICScoringFilter mention that the plugin implements
> > a variant of OPIC from Abiteboul et al.'s paper. What exactly is
> > different? How does the difference affect the scores?
>
> As it is now, the implementation doesn't preserve the total cash value
> in the system, and there is almost no smoothing between iterations
> (Abiteboul's "history"). As a consequence, scores may (and do) vary
> dramatically between iterations, and they don't converge to stable
> values, i.e. they always increase. For pages that get a lot of score
> contributions from other pages, this leads to an explosive increase
> into the range of thousands or eventually millions. This means that the
> scores produced by the OPIC plugin exaggerate score differences between
> pages more and more, even if the web graph that you crawl is stable. In
> a sense, to follow the cash analogy, our implementation of OPIC
> illustrates a runaway economy - galloping inflation, the rich get
> richer and the poor get poorer ;)
>
> > Also, there's a comment in the code:
> >
> > // XXX (ab) no adjustment? I think this is contrary to the algorithm descr.
> > // XXX in the paper, where page loses its score if it's distributed to
> > // XXX linked pages...
> >
> > Is this something that will be looked at eventually, or is the
> > scoring good enough at the moment without some adjustment?
>
> Yes, I'll start working on it when I get back from vacation. I did some
> simulations that show how to fix it (see
> http://wiki.apache.org/nutch/FixingOpicScoring - bottom of the page).

Andrzej, nice to see you working on this. There is one thing that I don't
understand about your presentation.

Assume that page A is the only URL in our crawldb and that it contains n
outlinks.

t = 0 - Generate runs, A is generated.
t = 1 - Page A is fetched and its cash is distributed to its outlinks.
t = 2 - Generate runs, pages P0-Pn are generated.
t = 3 - P0-Pn are fetched and their cash is distributed to their outlinks.
      - At this time, it is possible that some page Pk links back to
        page A. So, now page A's cash > 0.
t = 4 - Generate runs, page A is considered but is not generated (since
        its next fetch time is later than the current time).
      - Won't page A become a temporary sink? The time between subsequent
        fetches may be as large as 30 days in the default configuration,
        so page A will accumulate cash for a long time without
        distributing it.
      - I don't see how we can achieve this, but, IMO, if a page is
        considered but not generated, Nutch should distribute its cash to
        the outlinks that are stored in its parse data. (I know that it
        is incredibly hard, if not impossible, to do this.)

Or am I missing something here?

--
Doğacan Güney
Re: OPIC scoring differences
Doğacan Güney wrote:
> Andrzej, nice to see you working on this. There is one thing that I
> don't understand about your presentation.
>
> Assume that page A is the only URL in our crawldb and that it contains
> n outlinks.
>
> t = 0 - Generate runs, A is generated.
> t = 1 - Page A is fetched and its cash is distributed to its outlinks.
> t = 2 - Generate runs, pages P0-Pn are generated.
> t = 3 - P0-Pn are fetched and their cash is distributed to their
>         outlinks.
>       - At this time, it is possible that some page Pk links back to
>         page A. So, now page A's cash > 0.
> t = 4 - Generate runs, page A is considered but is not generated (since
>         its next fetch time is later than the current time).
>       - Won't page A become a temporary sink? The time between
>         subsequent fetches may be as large as 30 days in the default
>         configuration, so page A will accumulate cash for a long time
>         without distributing it.

Yes. That's why Abiteboul used a history (several cycles long) to smooth
out temporary imbalances in cash redistribution. The history component
described in the paper could be either several cycles long or a specific
period of time long. In our case I think the history for rarely updated
pages should span the db.max.interval period plus some, and for
frequently updated pages it should span several cycles.

> - I don't see how we can achieve this, but, IMO, if a page is
>   considered but not generated, Nutch should distribute its cash to the
>   outlinks that are stored in its parse data. (I know that it is
>   incredibly hard, if not impossible, to do this.)

Actually, we store outlinks in two places - one place is obviously the
segments. The other, less obvious place is the linkdb - the data is
there, it just needs to be inverted (again). So, theoretically, we could
modify the updatedb process to consider the complete web graph, i.e. all
link information collected so far - but the main attraction of OPIC is
that it is incremental, so that you don't have to consider the whole web
graph for small incremental updates.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
Contact: info at sigram dot com
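To make the history idea concrete, here is a minimal sketch - not Nutch
code, all names are hypothetical - of per-page state that conserves the
total cash and estimates importance from the history component, roughly
as the paper suggests:

    // Sketch only: each page keeps its current cash C and a history H of
    // the cash it distributed during a trailing window (several cycles,
    // or db.max.interval plus some for rarely fetched pages).
    class OpicState {
      float cash = 1.0f;     // C: cash received since the last distribution
      float history = 0.0f;  // H: cash this page distributed in the window

      /** At fetch time: hand the cash to the outlinks and record it in H. */
      float distribute(int numOutlinks) {
        history += cash;
        float perOutlink = cash / Math.max(1, numOutlinks);
        cash = 0.0f;  // zeroing C here is what conserves total cash
        return perOutlink;
      }

      /** Estimated importance; totalHistory is the sum of H over all pages.
       *  Using H + C instead of C alone means a page that merely hoards
       *  cash between rare fetches doesn't swing the score. */
      float importance(float totalHistory) {
        return (history + cash) / (totalHistory + 1.0f);
      }
    }

A page that becomes a temporary sink, as in the scenario above, then shows
up as a gradually growing C, which the H-based estimate dampens until the
page is fetched again.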
Re: OPIC scoring differences
Hi,

On 7/9/07, Carl Cerecke <[EMAIL PROTECTED]> wrote:
> Hi,
>
> The docs for the OPICScoringFilter mention that the plugin implements a
> variant of OPIC from Abiteboul et al.'s paper. What exactly is
> different? How does the difference affect the scores?
>
> Also, there's a comment in the code:
>
> // XXX (ab) no adjustment? I think this is contrary to the algorithm descr.
> // XXX in the paper, where page loses its score if it's distributed to
> // XXX linked pages...
>
> Is this something that will be looked at eventually, or is the scoring
> good enough at the moment without some adjustment?

I certainly hope that this is something that will be looked at
eventually. IMHO, scoring is not good enough, but it doesn't bother
anyone enough that they decide to fix it. Also, see Andrzej's comments in
NUTCH-267 about why the scoring-opic plugin is not really OPIC. It is
basically a glorified link counter.

> Cheers,
> Carl.

--
Doğacan Güney
Re: OPIC scoring differences
Carl Cerecke wrote:
> Hi,
>
> The docs for the OPICScoringFilter mention that the plugin implements a
> variant of OPIC from Abiteboul et al.'s paper. What exactly is
> different? How does the difference affect the scores?

As it is now, the implementation doesn't preserve the total cash value in
the system, and there is almost no smoothing between iterations
(Abiteboul's "history"). As a consequence, scores may (and do) vary
dramatically between iterations, and they don't converge to stable
values, i.e. they always increase. For pages that get a lot of score
contributions from other pages, this leads to an explosive increase into
the range of thousands or eventually millions. This means that the scores
produced by the OPIC plugin exaggerate score differences between pages
more and more, even if the web graph that you crawl is stable. In a
sense, to follow the cash analogy, our implementation of OPIC illustrates
a runaway economy - galloping inflation, the rich get richer and the poor
get poorer ;)

> Also, there's a comment in the code:
>
> // XXX (ab) no adjustment? I think this is contrary to the algorithm descr.
> // XXX in the paper, where page loses its score if it's distributed to
> // XXX linked pages...
>
> Is this something that will be looked at eventually, or is the scoring
> good enough at the moment without some adjustment?

Yes, I'll start working on it when I get back from vacation. I did some
simulations that show how to fix it (see
http://wiki.apache.org/nutch/FixingOpicScoring - bottom of the page).

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
Contact: info at sigram dot com
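The inflation is easy to reproduce on paper. A toy sketch (ours, not the
plugin code) that mimics the current "score += sum of inlink
contributions" update on a stable two-page graph where A and B link only
to each other:

    // Toy simulation of the current behaviour: each page keeps its old
    // score AND receives the other page's full score as an inlink
    // contribution, because nothing is subtracted when cash is given away.
    public class OpicInflationDemo {
      public static void main(String[] args) {
        float a = 1.0f, b = 1.0f;
        for (int i = 1; i <= 10; i++) {
          float newA = a + b;  // A's only inlink is B (B has one outlink)
          float newB = b + a;  // B's only inlink is A
          a = newA;
          b = newB;
          System.out.printf("iteration %d: A=%.0f B=%.0f%n", i, a, b);
        }
        // Scores double every cycle (2, 4, 8, ... 1024) even though the
        // graph never changes; conserving cash would keep A + B constant.
      }
    }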
Re: OPIC score calculation issues
(Better late than never... I forgot I didn't yet respond to your posting.)

Doug Cutting wrote:
> I think all that you're saying is that we should not run two CrawlDB
> updates at once, right? But there are lots of reasons we cannot do that
> besides the OPIC calculation.

When we used WebDB it was possible to overlap generate/fetch/update
cycles, because we would lock pages selected by FetchListTool for a
period of time. Now we don't do this. The advantage is that we don't have
to rewrite CrawlDB. But operations on CrawlDB are considerably faster
than on WebDB - perhaps we should consider going back to this method?

>> Also, the cash value of those outlinks that point to URLs not in the
>> current fetchlist will be dropped, because they won't be collected
>> anywhere.
>
> No, every cash value is used. The input to a crawl db update includes a
> CrawlDatum for every known url, including those just linked to. If the
> only CrawlDatum for a url is a new outlink from a page crawled, then
> the score for the page is 1.0 + the score of that outlink.

Of course, you are right, I missed this.

>> And a final note: CrawlDB.update() uses the initial score value
>> recorded in the segment, and NOT the value that is actually found in
>> CrawlDB at the time of the update. This means that if there was
>> another update in the meantime, your new score in CrawlDB will be
>> overwritten with the score based on an older initial value. This is
>> counter-intuitive - I think CrawlDB.update() should always use the
>> latest score value found in the current CrawlDB. I.e. in
>> CrawlDBReducer, instead of doing:
>>
>>   result.setScore(result.getScore() + scoreIncrement);
>>
>> we should do:
>>
>>   result.setScore(old.getScore() + scoreIncrement);
>>
>> The change is not quite that simple, since 'old' is sometimes null. So
>> perhaps we need to add a 'score' variable that is set to old.score
>> when old != null and to 1.0 otherwise (for newly linked pages).
>
> The reason I didn't do it that way was to permit the Fetcher to modify
> scores, since I was thinking of the Fetcher as the actor whose actions
> are being processed here, and of the CrawlDb as the passive thing acted
> on. But indeed, if you have another process that's updating a CrawlDb
> while a Fetcher is running, this may not be the case. So if we want to
> switch things so that the Fetcher is not permitted to adjust scores,
> then this seems like a reasonable change.

I would vote for implementing this change. The reason is that the active
actor that computes new scores is CrawlDb.update(). The Fetcher may
provide additional information to affect the score, but IMHO the logic to
calculate new scores should be concentrated in the update() method.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
Contact: info at sigram dot com
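For reference, the proposed change amounts to something like the
following in CrawlDbReducer (a simplified sketch - the real reducer
handles many more statuses and the surrounding plumbing):

    // Start from the score currently in the CrawlDb rather than from the
    // value the segment carried; fall back to 1.0f for newly linked URLs
    // for which no old CrawlDatum exists yet.
    float score = (old != null) ? old.getScore() : 1.0f;
    result.setScore(score + scoreIncrement);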
Re: OPIC score calculation issues
Andrzej Bialecki wrote:
> When we used WebDB it was possible to overlap generate/fetch/update
> cycles, because we would lock pages selected by FetchListTool for a
> period of time. Now we don't do this. The advantage is that we don't
> have to rewrite CrawlDB. But operations on CrawlDB are considerably
> faster than on WebDB - perhaps we should consider going back to this
> method?

Yes, this would be a good addition. Ideally we should change Crawl.java
to overlap these too. When -topN is specified and is substantially
smaller than the total size of the crawldb, we can generate, start a
fetch job, then generate again. As each fetch completes, we can start the
next, then run an update and a generate based on the just-completed
fetch, so that we're constantly fetching. This could be implemented by:
(a) adding a status for generated crawl data; (b) adding an option to
updatedb to include the generated output from some segments. Then, in the
above algorithm, the first time we'd update with only the generator
output, but after that we can combine the updates with fetcher and
generator output. This way, in the course of a crawl, we only re-write
the crawldb one additional time, rather than twice as many times. Does
this make sense? (A sketch of this loop appears after this message.)

>>> And a final note: CrawlDB.update() uses the initial score value
>>> recorded in the segment, and NOT the value that is actually found in
>>> CrawlDB at the time of the update. This means that if there was
>>> another update in the meantime, your new score in CrawlDB will be
>>> overwritten with the score based on an older initial value. This is
>>> counter-intuitive - I think CrawlDB.update() should always use the
>>> latest score value found in the current CrawlDB. I.e. in
>>> CrawlDBReducer, instead of doing:
>>>
>>>   result.setScore(result.getScore() + scoreIncrement);
>>>
>>> we should do:
>>>
>>>   result.setScore(old.getScore() + scoreIncrement);
>>>
>>> The change is not quite that simple, since 'old' is sometimes null.
>>> So perhaps we need to add a 'score' variable that is set to old.score
>>> when old != null and to 1.0 otherwise (for newly linked pages).
>>
>> The reason I didn't do it that way was to permit the Fetcher to modify
>> scores, since I was thinking of the Fetcher as the actor whose actions
>> are being processed here, and of the CrawlDb as the passive thing
>> acted on. But indeed, if you have another process that's updating a
>> CrawlDb while a Fetcher is running, this may not be the case. So if we
>> want to switch things so that the Fetcher is not permitted to adjust
>> scores, then this seems like a reasonable change.
>
> I would vote for implementing this change. The reason is that the
> active actor that computes new scores is CrawlDb.update(). The Fetcher
> may provide additional information to affect the score, but IMHO the
> logic to calculate new scores should be concentrated in the update()
> method.

I agree: +1. I was just trying to explain the existing logic. I think
this would provide a significant improvement, with little lost.

Doug
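A hypothetical driver loop for the overlap Doug describes (sketch only:
the GENERATED status, the updatedb flag, and the generate/startFetch/
update helpers below are assumptions, not existing Nutch APIs):

    // Keep a fetchlist ready at all times: while one segment fetches,
    // generate the next; fold fetcher AND generator output into each
    // crawldb update so in-flight URLs are not selected twice.
    Path segment = generate(crawlDb);             // first fetchlist
    while (keepCrawling()) {
      Future<?> fetching = startFetch(segment);   // fetch in the background
      Path next = generate(crawlDb);              // overlap: next fetchlist
      fetching.get();                             // wait for this segment
      update(crawlDb, segment, true);             // true: also fold in the
                                                  // generator output, marking
                                                  // those URLs as in progress
      segment = next;
    }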
Re: OPIC score calculation issues
Andrzej Bialecki wrote:
> * CrawlDBReducer (used by CrawlDB.update()) collects all CrawlDatum-s
>   from crawl_parse with the same URL, which means that we get:
>   * the original CrawlDatum
>   * (optionally) a CrawlDatum that contains just a Signature
>   * all CrawlDatum.LINKED entries pointing to our URL, generated by
>     outlinks from other pages.
>
> Based on this information, a new score is calculated by adding the
> original score and all scores from incoming links. HOWEVER... and
> here's where I suspect the current code is wrong: since we are
> processing just one segment, the incoming link information is very
> incomplete, because it comes only from the outlinks discovered by
> fetching this segment's fetchlist, and not from the complete LinkDB.

I think the code is correct. OPIC is an incremental algorithm, designed
to be calculated while crawling. As each new link is seen, it increments
the score of the page it links to. OPIC is thus much simpler and faster
to calculate than PageRank. (It also provides a good approximation of
PageRank, but prioritizes better when crawling than PageRank does:
crawling using an incrementally calculated PageRank is not as good as
OPIC at fetching higher-PageRank pages sooner.)

> One mitigating factor could be that we already accounted for incoming
> links from other segments when processing those other segments - so our
> initial score already includes the inlink information from other
> segments. But this assumes that we never generate and process more than
> one segment in parallel, i.e. that we finish updating from all previous
> segments before we update from the current segment (otherwise we
> wouldn't know the updated initial score).

I think all that you're saying is that we should not run two CrawlDB
updates at once, right? But there are lots of reasons we cannot do that
besides the OPIC calculation.

> Also, the cash value of those outlinks that point to URLs not in the
> current fetchlist will be dropped, because they won't be collected
> anywhere.

No, every cash value is used. The input to a crawl db update includes a
CrawlDatum for every known url, including those just linked to. If the
only CrawlDatum for a url is a new outlink from a page crawled, then the
score for the page is 1.0 + the score of that outlink.

> I think a better option would be to add the LinkDB as an input dir to
> CrawlDB.update(), so that we have access to all previously collected
> inlinks.

That would be a lot slower, and it would not compute OPIC.

> And a final note: CrawlDB.update() uses the initial score value
> recorded in the segment, and NOT the value that is actually found in
> CrawlDB at the time of the update. This means that if there was another
> update in the meantime, your new score in CrawlDB will be overwritten
> with the score based on an older initial value. This is
> counter-intuitive - I think CrawlDB.update() should always use the
> latest score value found in the current CrawlDB. I.e. in CrawlDBReducer,
> instead of doing:
>
>   result.setScore(result.getScore() + scoreIncrement);
>
> we should do:
>
>   result.setScore(old.getScore() + scoreIncrement);
>
> The change is not quite that simple, since 'old' is sometimes null. So
> perhaps we need to add a 'score' variable that is set to old.score when
> old != null and to 1.0 otherwise (for newly linked pages).

The reason I didn't do it that way was to permit the Fetcher to modify
scores, since I was thinking of the Fetcher as the actor whose actions
are being processed here, and of the CrawlDb as the passive thing acted
on.
But indeed, if you have another process that's updating a CrawlDb while a
Fetcher is running, this may not be the case. So if we want to switch
things so that the Fetcher is not permitted to adjust scores, then this
seems like a reasonable change.

Doug
Re: OPIC
Massimo Miccoli wrote:
> Sorry Andrzej, I meant in DeleteDuplicates.java, not at runtime. Is
> that the correct place to integrate something like Shingling or
> n-grams?

Yes. But there is a small issue of high dimensionality to solve,
otherwise it will be very inefficient... Both shingling and n-gram based
methods (word n-grams or character n-grams) produce a profile of a
document, which can be compared to other profiles, one by one. So, this
seems appropriate for detecting near-duplicates - you create a profile
for each document (in IndexDoc) and sort them... but here's where the
problems start. Usually such profiles take a lot of space (e.g. a list of
the 100 top n-grams), and comparing them takes a lot of resources - and
several comparison operations are needed per item to sort the signatures.
This is currently done by HashScore. (BTW, HashScore is missing the
fetchTime, which the original dedup algorithm also took into account when
comparing pages...)

So, you need to reduce the number of dimensions in a signature to
decrease the complexity of the compare operations. This can be done using
purely numeric signatures (e.g. Nilsimsa - but this particular approach
brings numerous problems with quantization noise).

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
Contact: info at sigram dot com
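One standard way to get such a reduced numeric signature - offered here
only as an illustration of the dimensionality-reduction idea, not as
anything Nutch implements - is min-wise hashing over the shingle set. Two
documents' signatures agree in roughly as many positions as their shingle
sets overlap (their Jaccard similarity), and comparing k longs is far
cheaper than comparing full n-gram profiles:

    import java.util.Arrays;

    public class MinHashSignature {
      /** Reduce an arbitrary-size shingle set to k numbers. The
       *  per-function "hash" below is deliberately simplistic - a real
       *  implementation would use k independent, well-mixed hashes. */
      public static long[] signature(String[] shingles, int k, long seed) {
        long[] sig = new long[k];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String s : shingles) {
          for (int i = 0; i < k; i++) {
            long h = s.hashCode() * (seed + 2L * i + 1L);  // i-th hash fn
            if (h < sig[i]) sig[i] = h;                    // keep the min
          }
        }
        return sig;
      }
    }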
Re: OPIC
Hi Doug,

Many thanks for your patch. I will try it now. I'm also thinking of
integrating some algorithm for near-duplicate URL detection, something
like Shingling. Is dedup the best place to integrate the algorithm?

Thanks,
Massimo

Doug Cutting wrote:
> Here is a patch that implements this. I'm still testing it. If it
> appears to work well, I will commit it.
>
> Doug Cutting wrote:
>> Massimo Miccoli wrote:
>>> Any news about the integration of OPIC in mapred? I have time to
>>> develop OPIC on Nutch Mapred. Can you help me to start? From the
>>> email by Carlos Alberto-Alejandro CASTILLO-Ocaranza, it seems that
>>> the best way to integrate OPIC is on the old webdb; is this way also
>>> valid for CrawlDb in Mapred?
>>
>> Yes. I think the way to implement this in the mapred branch is:
>>
>> 1. In CrawlDatum.java, replace 'int linkCount' with 'float score'.
>> The default value of this should be 1.0f. This will require changes
>> to accessors, write, readFields, compareTo etc. A constructor which
>> specifies the score should be added. The comparator should sort by
>> decreasing score.
>>
>> 2. In crawl/Fetcher.java, add the score to the Content's metadata:
>>
>>   public static String SCORE_KEY = "org.apache.nutch.crawl.score";
>>   ...
>>   private void output(...) {
>>     ...
>>     content.getMetadata().setProperty(SCORE_KEY, datum.getScore());
>>     ...
>>   }
>>
>> 3. In ParseOutputFormat.java, when writing the CrawlDatum for each
>> outlink (line 77), set the score of the link CrawlDatum to be the
>> score of the page divided by the number of outlinks:
>>
>>   float score = Float.valueOf(parse.getData().get(Fetcher.SCORE_KEY));
>>   score /= links.length;
>>   for (int i = 0; i < links.length; i++) {
>>     ...
>>     new CrawlDatum(CrawlDatum.STATUS_LINKED, interval, score);
>>     ...
>>   }
>>
>> 4. In CrawlDbReducer.java, remove the linkCount calculations. Replace
>> them with something like:
>>
>>   float scoreIncrement = 0.0f;
>>   while (values.next()) {
>>     ...
>>     switch (datum.getStatus()) {
>>       ...
>>       case CrawlDatum.STATUS_LINKED:
>>         scoreIncrement += datum.getScore();
>>         break;
>>       ...
>>     }
>>     ...
>>   }
>>   result.setScore(result.getScore() + scoreIncrement);
>>
>> I think that should do it, no?
>>
>> Doug
>
> Index: conf/crawl-tool.xml
> ===================================================================
> --- conf/crawl-tool.xml (revision 326624)
> +++ conf/crawl-tool.xml (working copy)
> @@ -15,13 +15,6 @@
>  </property>
>  
>  <property>
> -  <name>indexer.boost.by.link.count</name>
> -  <value>true</value>
> -  <description>When true scores for a page are multipled by the log of
> -  the number of incoming links to the page.</description>
> -</property>
> -
> -<property>
>    <name>db.ignore.internal.links</name>
>    <value>false</value>
>    <description>If true, when adding new links to a page, links from
> Index: conf/nutch-default.xml
> ===================================================================
> --- conf/nutch-default.xml (revision 326624)
> +++ conf/nutch-default.xml (working copy)
> @@ -440,24 +440,6 @@
>  <!-- indexer properties -->
>  
>  <property>
> -  <name>indexer.score.power</name>
> -  <value>0.5</value>
> -  <description>Determines the power of link analyis scores. Each
> -  pages's boost is set to <i>score<sup>scorePower</sup></i> where
> -  <i>score</i> is its link analysis score and <i>scorePower</i> is the
> -  value of this parameter. This is compiled into indexes, so, when
> -  this is changed, pages must be re-indexed for it to take
> -  effect.</description>
> -</property>
> -
> -<property>
> -  <name>indexer.boost.by.link.count</name>
> -  <value>true</value>
> -  <description>When true scores for a page are multipled by the log of
> -  the number of incoming links to the page.</description>
> -</property>
> -
> -<property>
>    <name>indexer.max.title.length</name>
>    <value>100</value>
>    <description>The maximum number of characters of a title that are indexed.
> Index: src/java/org/apache/nutch/crawl/CrawlDatum.java
> ===================================================================
> --- src/java/org/apache/nutch/crawl/CrawlDatum.java (revision 326624)
> +++ src/java/org/apache/nutch/crawl/CrawlDatum.java (working copy)
> @@ -31,7 +31,7 @@
>    public static final String FETCH_DIR_NAME = "crawl_fetch";
>    public static final String PARSE_DIR_NAME = "crawl_parse";
>  
> -  private final static byte CUR_VERSION = 1;
> +  private final static byte CUR_VERSION = 2;
>  
>    public static final byte STATUS_DB_UNFETCHED = 1;
>    public static final byte STATUS_DB_FETCHED = 2;
> @@ -47,17 +47,20 @@
>    private long fetchTime = System.currentTimeMillis();
>    private byte retries;
>    private float fetchInterval;
> -  private int linkCount;
> +  private float score = 1.0f;
>  
>    public CrawlDatum() {}
>  
>    public CrawlDatum(int status, float fetchInterval) {
>      this.status = (byte)status;
>      this.fetchInterval = fetchInterval;
> -    if (status == STATUS_LINKED)
> -      linkCount = 1;
>    }
>  
> +  public CrawlDatum(int status, float fetchInterval, float score) {
> +    this(status, fetchInterval);
> +    this.score = score;
> +  }
> +
>    //
>    // accessor methods
>    //
> @@ -80,8 +83,8 @@
>      this.fetchInterval = fetchInterval;
>    }
>  
> -  public
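To see what steps 2-4 of the plan above do to the numbers, here is a tiny
trace (our illustration, not code from the patch). A fetched page with
score 1.0 and four outlinks passes 0.25 to each target; the reducer then
adds every such contribution to the target's current score:

    float pageScore = 1.0f;                       // score carried by the page
    int numOutlinks = 4;
    float perOutlink = pageScore / numOutlinks;   // 0.25 each (step 3)
    float targetOld = 1.0f;                       // default CrawlDatum score
    float scoreIncrement = 2 * perOutlink;        // say two pages link to it
    float targetNew = targetOld + scoreIncrement; // 1.5 (step 4)

Note that the fetched page's own score is never decreased here, which is
what later allows scores to grow without bound.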
Re: OPIC
Massimo Miccoli wrote:
> Hi Doug,
>
> Many thanks for your patch. I will try it now. I'm also thinking of
> integrating some algorithm for near-duplicate URL detection, something
> like Shingling. Is dedup the best place to integrate the algorithm?

That would be lovely. Dedup is the place to start, but certainly not the
place to stop... ;-)

I think we should introduce a separate dedup field for each page in the
DB. The reason is that if we re-use the md5 (or change its semantics to
mean "near duplicates covered by this value"), then we run the risk of
losing a lot of legitimate unique urls from the DB.

Shingling, if you know how to implement it efficiently, would certainly
be nice - but we could start by just passing a normalized text to md5. By
normalized text I mean all lowercase, stopwords removed, punctuation
removed, and any consecutive whitespace replaced with exactly one space
character. We could also use an n-gram profile (either word-level or
character-level) with coarse quantization.

--
Best regards,
Andrzej Bialecki
http://www.sigram.com
Contact: info at sigram dot com
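The "normalized text to md5" idea is a few lines of code. A sketch (ours,
not a Nutch Signature implementation - the stopword list here is a tiny
stand-in):

    import java.security.MessageDigest;
    import java.util.Arrays;
    import java.util.HashSet;
    import java.util.Set;

    public class NormalizedTextSignature {
      private static final Set<String> STOPWORDS = new HashSet<String>(
          Arrays.asList("the", "a", "an", "and", "or", "of", "to", "in"));

      /** Lowercase, strip punctuation, drop stopwords, collapse
       *  whitespace, then hash - so trivially differing near-duplicates
       *  collapse to the same MD5. */
      public static byte[] signature(String text) throws Exception {
        StringBuilder sb = new StringBuilder();
        for (String token : text.toLowerCase()
                                .replaceAll("\\p{Punct}", " ")
                                .split("\\s+")) {
          if (token.length() == 0 || STOPWORDS.contains(token)) continue;
          if (sb.length() > 0) sb.append(' ');
          sb.append(token);
        }
        return MessageDigest.getInstance("MD5")
                            .digest(sb.toString().getBytes("UTF-8"));
      }
    }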
Re: OPIC
Here is a patch that implements this. I'm still testing it. If it appears
to work well, I will commit it.

Doug Cutting wrote:
> Massimo Miccoli wrote:
>> Any news about the integration of OPIC in mapred? I have time to
>> develop OPIC on Nutch Mapred. Can you help me to start? From the email
>> by Carlos Alberto-Alejandro CASTILLO-Ocaranza, it seems that the best
>> way to integrate OPIC is on the old webdb; is this way also valid for
>> CrawlDb in Mapred?
>
> Yes. I think the way to implement this in the mapred branch is:
>
> 1. In CrawlDatum.java, replace 'int linkCount' with 'float score'. The
> default value of this should be 1.0f. This will require changes to
> accessors, write, readFields, compareTo etc. A constructor which
> specifies the score should be added. The comparator should sort by
> decreasing score.
>
> 2. In crawl/Fetcher.java, add the score to the Content's metadata:
>
>   public static String SCORE_KEY = "org.apache.nutch.crawl.score";
>   ...
>   private void output(...) {
>     ...
>     content.getMetadata().setProperty(SCORE_KEY, datum.getScore());
>     ...
>   }
>
> 3. In ParseOutputFormat.java, when writing the CrawlDatum for each
> outlink (line 77), set the score of the link CrawlDatum to be the score
> of the page divided by the number of outlinks:
>
>   float score = Float.valueOf(parse.getData().get(Fetcher.SCORE_KEY));
>   score /= links.length;
>   for (int i = 0; i < links.length; i++) {
>     ...
>     new CrawlDatum(CrawlDatum.STATUS_LINKED, interval, score);
>     ...
>   }
>
> 4. In CrawlDbReducer.java, remove the linkCount calculations. Replace
> them with something like:
>
>   float scoreIncrement = 0.0f;
>   while (values.next()) {
>     ...
>     switch (datum.getStatus()) {
>       ...
>       case CrawlDatum.STATUS_LINKED:
>         scoreIncrement += datum.getScore();
>         break;
>       ...
>     }
>     ...
>   }
>   result.setScore(result.getScore() + scoreIncrement);
>
> I think that should do it, no?
>
> Doug

Index: conf/crawl-tool.xml
===================================================================
--- conf/crawl-tool.xml (revision 326624)
+++ conf/crawl-tool.xml (working copy)
@@ -15,13 +15,6 @@
 </property>
 
 <property>
-  <name>indexer.boost.by.link.count</name>
-  <value>true</value>
-  <description>When true scores for a page are multipled by the log of
-  the number of incoming links to the page.</description>
-</property>
-
-<property>
   <name>db.ignore.internal.links</name>
   <value>false</value>
   <description>If true, when adding new links to a page, links from
Index: conf/nutch-default.xml
===================================================================
--- conf/nutch-default.xml (revision 326624)
+++ conf/nutch-default.xml (working copy)
@@ -440,24 +440,6 @@
 <!-- indexer properties -->
 
 <property>
-  <name>indexer.score.power</name>
-  <value>0.5</value>
-  <description>Determines the power of link analyis scores. Each
-  pages's boost is set to <i>score<sup>scorePower</sup></i> where
-  <i>score</i> is its link analysis score and <i>scorePower</i> is the
-  value of this parameter. This is compiled into indexes, so, when
-  this is changed, pages must be re-indexed for it to take
-  effect.</description>
-</property>
-
-<property>
-  <name>indexer.boost.by.link.count</name>
-  <value>true</value>
-  <description>When true scores for a page are multipled by the log of
-  the number of incoming links to the page.</description>
-</property>
-
-<property>
   <name>indexer.max.title.length</name>
   <value>100</value>
   <description>The maximum number of characters of a title that are indexed.
Index: src/java/org/apache/nutch/crawl/CrawlDatum.java
===================================================================
--- src/java/org/apache/nutch/crawl/CrawlDatum.java (revision 326624)
+++ src/java/org/apache/nutch/crawl/CrawlDatum.java (working copy)
@@ -31,7 +31,7 @@
   public static final String FETCH_DIR_NAME = "crawl_fetch";
   public static final String PARSE_DIR_NAME = "crawl_parse";
 
-  private final static byte CUR_VERSION = 1;
+  private final static byte CUR_VERSION = 2;
 
   public static final byte STATUS_DB_UNFETCHED = 1;
   public static final byte STATUS_DB_FETCHED = 2;
@@ -47,17 +47,20 @@
   private long fetchTime = System.currentTimeMillis();
   private byte retries;
   private float fetchInterval;
-  private int linkCount;
+  private float score = 1.0f;
 
   public CrawlDatum() {}
 
   public CrawlDatum(int status, float fetchInterval) {
     this.status = (byte)status;
     this.fetchInterval = fetchInterval;
-    if (status == STATUS_LINKED)
-      linkCount = 1;
   }
 
+  public CrawlDatum(int status, float fetchInterval, float score) {
+    this(status, fetchInterval);
+    this.score = score;
+  }
+
   //
   // accessor methods
   //
@@ -80,8 +83,8 @@
     this.fetchInterval = fetchInterval;
   }
 
-  public int getLinkCount() { return linkCount; }
-  public void setLinkCount(int linkCount) { this.linkCount = linkCount; }
+  public float getScore() { return score; }
+  public void setScore(float score) { this.score = score; }
 
   //
   // writable methods
@@ -96,18 +99,18 @@
   public void readFields(DataInput