Hello,
I use the Prune tool to remove documents from segment indexes, but it does not
remove pages and links from WebDB.
To keep the unwanted URLs from reappearing when new segments are created, it
is advised to use our own net.nutch.net.URLFilter, or PruneDBTool (under
construction...
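A rough sketch of the URL-filter idea, in case it helps. This is not the Nutch plugin itself: PrunedUrlFilter and its constructor are made-up names, and the only thing borrowed from Nutch is the filter contract (return the URL to keep it, return null to drop it). A real filter would implement the URLFilter interface and be registered as a plugin.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical stand-in for a custom URL filter; not part of Nutch.
// It only mirrors the contract: accepted URLs are returned as-is,
// rejected URLs yield null.
class PrunedUrlFilter {
    private final List<Pattern> blocked = new ArrayList<>();

    // blockedRegexes: patterns for the URLs that were pruned from the index
    // and should not come back when new segments are generated.
    PrunedUrlFilter(List<String> blockedRegexes) {
        for (String regex : blockedRegexes) {
            blocked.add(Pattern.compile(regex));
        }
    }

    // Returns the URL unchanged if no blocked pattern matches, otherwise null.
    String filter(String url) {
        for (Pattern p : blocked) {
            if (p.matcher(url).find()) {
                return null;
            }
        }
        return url;
    }
}
```

The point of the design is to keep the blocked patterns in one place, so the same list can drive both the index pruning and the filter.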
Hi, I'm new to Nutch. I want to know what the
subcollection plugin is used for.
Where is the introduction?
On 12/19/06, liv <[EMAIL PROTECTED]> wrote:
I may be losing all and every credit ... it's still in the same state -
reindex doesn't change the subcollection field!
I did a REFET
Look here:
http://issues.apache.org/jira/browse/NUTCH-201?page=all
Unfortunately, it doesn't work as expected... yet
kauu wrote:
>
> hi , i'm new to nutch ,i want to know what's the useness of the
> subcollection plugin?
> where is the introduction?
>
I checked the patch for subcollections
(http://issues.apache.org/jira/browse/NUTCH-201) - although I assumed it is
included in the latest public release 0.8.1.
Compared to the current source code, it looks like the patch has an extra file
(which doesn't exist in version 0.8.1)
src/plugin/subcollect
Hi all,
I've been tasked with looking into this and am not a coder - that said,
Nutch is doing great and the bean counters have asked me to look into
adding sponsored link results and I'm wondering how best to add this.
It would be nice to utilize the Nutch engine to come up with the pages
You may want to consider letting a third-party handle your sponsored links,
unless of course you already have an infrastructure for handling everything
you already mentioned as well as the following:
* Advertiser registration
* Advertiser purchase of keywords/page space
* Calculation of impressio
I might be totally off base with what you're asking to do, but take a look at
this open source project: http://phpadsnew.com/two/.
It's basically an advertising engine, built on PHP. Integration within any
application is a breeze, and it supports external advertising such as Google
Ads.
Sean
--
Let me qualify this - ad banner rotation is dealt with - I'm looking for
something that will use our Nutch engine to serve up relevant links from
people who pay for that privilege. We do not want to serve up ads from
someone else's system i.e. the big G or Y, but use our own Nutch search
resu
Are you looking for something like the Google KeyMatch as described in [1],
which was then more or less mimicked in the Nutch web2 module [2],
and has since also been released, at least as a lookalike, on Google Code [3]?
--
Sami Siren
[1] http://www.google.com/enterprise/mini/end_user_features.html
[2]
http://svn.
Thanks Sami,
This is closer from an initial look - does this do anything on the
backend (i.e. defining the data flags so we can get a match) as well, or
do we need to build that?
Sami Siren wrote:
Are you looking for something like the google keymatch as described in
[1]
which was then mo
For anyone searching this thread in the future: one possible cause of
this is when the Hadoop nodes are not time-synchronized with NTP or
something similar.
For example if one or more of the slave nodes is a few minutes ahead of
the others and an inject job is run on one of those nodes (and
Hello,
I am running Nutch 0.8 against Hadoop 0.4, just for reference.
I want to add duplicate deletion based on a similarity algorithm, as opposed
to the hash method that is currently in there.
I would have to say I am pretty lost as to how the delete duplicates class
is working.
I would guess that
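One way to picture similarity-based duplicate detection, as opposed to an exact hash match, is Jaccard similarity over word shingles. The sketch below is only an illustration of that technique, not how the Nutch delete duplicates class works; the class name, the shingle size k, and the threshold are all assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper, not part of Nutch: near-duplicate detection via
// Jaccard similarity of word k-shingles.
class ShingleSimilarity {

    // Splits the text into lower-cased words and collects every run of k
    // consecutive words as one shingle.
    static Set<String> shingles(String text, int k) {
        String[] words = text.toLowerCase().trim().split("\\s+");
        Set<String> out = new HashSet<>();
        for (int i = 0; i + k <= words.length; i++) {
            out.add(String.join(" ", Arrays.copyOfRange(words, i, i + k)));
        }
        return out;
    }

    // |A intersect B| / |A union B|; two empty sets count as identical.
    static double jaccard(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> inter = new HashSet<>(a);
        inter.retainAll(b);
        int union = a.size() + b.size() - inter.size();
        return (double) inter.size() / union;
    }

    // Two documents are treated as duplicates when their similarity reaches
    // the (assumed) threshold.
    static boolean isNearDuplicate(String x, String y, int k, double threshold) {
        return jaccard(shingles(x, k), shingles(y, k)) >= threshold;
    }
}
```

Unlike a content hash, this compares pairs of documents, so on a large crawl it would need some way to limit the number of pairs that get compared.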