On 4/21/07, Lorenzo <[EMAIL PROTECTED]> wrote:
Doğacan Güney wrote:
> On 4/19/07, Lorenzo <[EMAIL PROTECTED]> wrote:
>>
>> Hi,
>> sorry to re-open this thread, but I am facing the same problem as
>> Nicolás.
>> I like both your (Doğacan's) and Nicolás' ideas, yours more, as I think
>> abstract classes are not good extension points.
>> Anyway, are any of these implemented? I really need it!
>
>
> Well, I have implemented a subset of what we discussed in
> NUTCH-468 <https://issues.apache.org/jira/browse/NUTCH-468>. There is
> a lot more to be done, but IMHO NUTCH-468 may be a good starting point.
>
>> Also, I can't understand from the docs what it means that the
>> adjust datum will update the score of the original datum in updatedb.
>> Updated or adjusted in which way? I obtain strange values...
>
>
> In ScoringFilter.updateDbScore you get a list of inlinked datums that
> you can use to change the score. Now, if in distributeScoreToOutlink(s)
> you return a datum with a status of STATUS_LINKED, you will get this
> datum as one of the inlinked datums in updateDbScore.
>
> I hope this clears it up a bit.
>
Uhmm... so, suppose I decide, from its content, that the current page
http://foo/bar.htm is really desirable.
I have put a flag in ParseData's metadata to mark it.
In distributeScoreToOutlink(s) I read it from the ParseData param and
put it in the adjust CrawlDatum's metadata:

    MapWritable adjustMap = adjust.getMetaData();
    adjustMap.put(key, new FloatWritable(boostValue));
    return adjust;

So in updateDbScore(Text url, CrawlDatum old, CrawlDatum datum, List
inlinked) the adjust CrawlDatum will be among the inlinked List. Is that
right? How do I distinguish it?
I can put the URL in the metadata too, and scroll through the list, but
maybe there is a better method?

Your approach is the best one: put a flag in the adjust datum's metadata
to mark it, then process it in updateDbScore.
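
A rough sketch of that flag-and-check idea, in case it helps. It uses simplified stand-in classes rather than the real Nutch types, so the Datum class, the "example.boost" key, the makeAdjust helper and the additive way the boost is applied are all assumptions for illustration, not Nutch API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FlaggedBoostSketch {
    // Hypothetical metadata key; any key both methods agree on would do.
    static final String BOOST_KEY = "example.boost";

    /** Simplified stand-in for CrawlDatum: a score plus a metadata map. */
    static class Datum {
        float score;
        final Map<String, Float> meta = new HashMap<>();
        Datum(float score) { this.score = score; }
    }

    /** Stand-in for the end of distributeScoreToOutlink(s): flag the adjust datum. */
    static Datum makeAdjust(float boostValue) {
        Datum adjust = new Datum(0.0f);
        adjust.meta.put(BOOST_KEY, boostValue);
        return adjust;
    }

    /** Stand-in for updateDbScore: pick out flagged datums while walking the inlinked list. */
    static void updateDbScore(Datum old, Datum datum, List<Datum> inlinked) {
        float score = (old != null) ? old.score : 0.0f;
        for (Datum in : inlinked) {
            Float boost = in.meta.get(BOOST_KEY);
            if (boost != null) {
                score += boost;     // flagged adjust datum: apply the boost (assumed additive)
            } else {
                score += in.score;  // ordinary inlink contribution
            }
        }
        datum.score = score;
    }

    public static void main(String[] args) {
        List<Datum> inlinked = new ArrayList<>();
        inlinked.add(new Datum(0.5f));    // a normal inlink
        inlinked.add(makeAdjust(2.0f));   // the flagged adjust datum
        Datum datum = new Datum(0.0f);
        updateDbScore(new Datum(1.0f), datum, inlinked);
        System.out.println(datum.score);
    }
}
```

With the flag in place there is no need to store the URL in the metadata: the presence of the key alone tells the adjust datum apart from the ordinary inlinks.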

Also, will this CrawlDatum be the same one that is passed to indexerScore?

You get two CrawlDatums in indexerScore. The first is fetchDatum, the one
in crawl_fetch that contains the fetching status. The second is dbDatum,
which comes from the crawldb. This dbDatum is the one that you set in
updateDbScore (the 'datum' argument of updateDbScore).
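
To make the difference between the two datums concrete, here is a small sketch with stand-in types again. The Datum class, the status strings and the "db score times initScore" formula are assumptions for illustration; the real indexerScore takes more arguments and a filter is free to compute the score differently:

```java
class IndexerScoreSketch {
    /** Simplified stand-in for CrawlDatum: a status label plus a score. */
    static class Datum {
        final String status;
        final float score;
        Datum(String status, float score) {
            this.status = status;
            this.score = score;
        }
    }

    /**
     * Stand-in for indexerScore: the score set earlier in updateDbScore
     * arrives through dbDatum, while fetchDatum only reports how the fetch went.
     */
    static float indexerScore(Datum dbDatum, Datum fetchDatum, float initScore) {
        // fetchDatum is deliberately not consulted for the score in this sketch.
        return dbDatum.score * initScore;
    }

    public static void main(String[] args) {
        Datum dbDatum = new Datum("db_fetched", 3.5f);       // from the crawldb
        Datum fetchDatum = new Datum("fetch_success", 1.0f); // from crawl_fetch
        System.out.println(indexerScore(dbDatum, fetchDatum, 2.0f));
    }
}
```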
Thanks a lot!
Lorenzo
--
Doğacan Güney