On 4/19/07, Lorenzo <[EMAIL PROTECTED]> wrote:

Hi,
sorry to re-open this thread, but I am facing the same problem as Nicolás.
I like both your (Doğacan's) and Nicolás' ideas, yours more, as I think
abstract classes are not good extension points.
Anyway, is either of these implemented? I really need it!


Well, I have implemented a subset of what we discussed in NUTCH-468
<https://issues.apache.org/jira/browse/NUTCH-468>. There is a lot
more to be done but IMHO, NUTCH-468 may be a good starting point.

Also, I can't understand from the docs what it means that the adjust datum
will update the score of the original datum in updatedb.
Updated or adjusted in which way? I obtain strange values.


In ScoringFilter.updateDbScore you get a list of inlinked datums that you
can use to change the score. Now, if in distributeScoreToOutlink(s) you
return a datum with a status of STATUS_LINKED, you will get this datum as
one of the inlinked datums in updateDbScore.
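
To make the flow concrete, here is a minimal, self-contained sketch. The Datum class, the status values and the method bodies are stand-ins invented for illustration, not Nutch's real classes or constants, and summing the inlinked scores is only one possible policy; what actually happens to the score depends on what your own filter does in updateDbScore.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in types for illustration only; these are NOT Nutch's real classes.
class ScoreFlowSketch {
    static final int STATUS_LINKED = 1; // hypothetical value
    static final int STATUS_DB = 2;     // hypothetical "already in the db" status

    static final class Datum {
        final int status;
        final float score;
        Datum(int status, float score) {
            this.status = status;
            this.score = score;
        }
    }

    // Parse side: the filter returns an adjust datum for the page's own URL.
    // Giving it STATUS_LINKED is what makes it show up as "inlinked" later.
    static Datum makeAdjust(float scoreDelta) {
        return new Datum(STATUS_LINKED, scoreDelta);
    }

    // updatedb side: of all datums collected for one URL, the ones with
    // STATUS_LINKED are handed to the filter as the inlinked list.
    static List<Datum> collectInlinked(List<Datum> valuesForUrl) {
        List<Datum> inlinked = new ArrayList<>();
        for (Datum d : valuesForUrl) {
            if (d.status == STATUS_LINKED) {
                inlinked.add(d);
            }
        }
        return inlinked;
    }

    // One possible policy (an assumption): add every inlinked score to the
    // score already stored for the URL.
    static float updateDbScore(Datum old, List<Datum> inlinked) {
        float score = old.score;
        for (Datum d : inlinked) {
            score += d.score;
        }
        return score;
    }
}
```

In this sketch the adjust datum is indistinguishable from a datum produced by a real inlink, so if the filter sums the inlinked scores, the adjust score is simply added on top of them.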

I hope this clears it up a bit.

Thanks!

Lorenzo



> Hi,
> On 2/27/07, Nicolás Lichtmaier <[EMAIL PROTECTED]> wrote:
> [snip]
> >
> > It doesn't seem a good way to do it. What if there are no outlinks? This
> > method won't be called at all. And anyway, it would be called once for
> > each outlink, which would multiply the work.
> Multiplication is easy to solve but you are right that it won't work
> if there are no outlinks.
> Maybe the scoring filter API should change? A distributeScoreToOutlinks
> method (which would be called even if there are no outlinks) may be more
> useful than the current one:
> CrawlDatum distributeScoreToOutlinks(Text fromUrl, List<String>
> toUrlList, List<CrawlDatum> datumList, ParseData parseData,
> CrawlDatum adjust)
> This method gives more control to the plugin since, knowing all the
> outlinks, the plugin can make more informed decisions. Right now, for
> example, there is no way a scoring filter can be sure that it has
> distributed all its cash (e.g. if db.score.internal.link is 0.5 and
> db.score.external.link is 1.0, the filter will almost always distribute
> less than its cash).
> This will also work for your case, since you will just ignore the
> outlinks and return the adjust datum based on information in parse
> metadata.
> What do you (and others) think?
> >
> > Thanks!
> >
> >
> --
> Doğacan Güney
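
The point in the quoted message about a filter distributing less than its cash can be checked with a little arithmetic. The sketch below is self-contained and assumes a simple per-link rule (equal share per outlink, scaled by that link's factor); the class and method names are made up for illustration and are not Nutch code. The second method shows what becomes possible once a filter sees all outlinks at once: it can normalize the shares so they sum to the full cash.

```java
// Illustrative arithmetic only; names and the per-link rule are assumptions.
class CashDistributionSketch {

    // Per-link rule: every outlink gets an equal share of the cash,
    // scaled by its own factor (e.g. 0.5 internal, 1.0 external).
    static double[] perLink(double cash, double[] factors) {
        double[] out = new double[factors.length];
        for (int i = 0; i < factors.length; i++) {
            out[i] = cash / factors.length * factors[i];
        }
        return out;
    }

    // All-outlinks rule: shares are proportional to the factors and are
    // normalized so that together they add up to the whole cash.
    static double[] normalized(double cash, double[] factors) {
        double total = 0.0;
        for (double f : factors) {
            total += f;
        }
        double[] out = new double[factors.length];
        if (total == 0.0) {
            return out; // no weighted outlinks, nothing to distribute
        }
        for (int i = 0; i < factors.length; i++) {
            out[i] = cash * factors[i] / total;
        }
        return out;
    }

    static double sum(double[] xs) {
        double s = 0.0;
        for (double x : xs) {
            s += x;
        }
        return s;
    }
}
```

With a cash of 1.0, three internal links (factor 0.5) and one external link (factor 1.0), the per-link rule hands out 3 * 0.125 + 0.25 = 0.625, so 0.375 of the cash is never distributed, while the normalized rule hands out the full 1.0.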




--
Doğacan Güney
_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers