Hi Mike,

Yes, it is possible to extend the TLD list. In fact, when the TLD list was
compiled, the author left a note explicitly stating that it may not be
complete.
https://github.com/apache/nutch/blob/master/conf/domain-suffixes.xml.template
Please submit a PR if you wish to make any changes or additions. You can
use the parser checker tool to validate your change before creating the PR.
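
To illustrate why a missing TLD likely produces the "domain" value you are
seeing, here is a minimal sketch of suffix-based domain lookup. It is a
simplified illustration and not Nutch's actual code: the function name
domain_name and the small suffix sets are hypothetical, and the real list
lives in the domain-suffixes file linked above.

```python
from urllib.parse import urlsplit


def domain_name(url, known_suffixes):
    """Return the host tail that sits directly on top of a known suffix.

    If no part of the host matches a known suffix, the last label is
    returned on its own.
    """
    host = (urlsplit(url).hostname or "").rstrip(".")
    candidate = host
    while "." in candidate:
        # Drop the leftmost label and check whether the rest is a known suffix.
        rest = candidate.split(".", 1)[1]
        if rest in known_suffixes:
            return candidate
        candidate = rest
    return candidate


url = "https://about.google/intl/en_FR/how-our-business-works/"

# Hypothetical suffix sets, standing in for the configured TLD list.
without_google = {"com", "org", "net"}
with_google = without_google | {"google"}

print(domain_name(url, without_google))  # google
print(domain_name(url, with_google))     # about.google
```

With "google" absent from the suffix set, the sketch falls through to the
last label, which is consistent with the "domain":"google" value in your
output. Once the suffix is known, the domain resolves to about.google.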
Thanks
lewismc

On Tue, Nov 8, 2022 at 02:16 <user-digest-h...@nutch.apache.org> wrote:

>
> ---------- Forwarded message ----------
> From: Mike <mz579...@gmail.com>
> To: user@nutch.apache.org
> Cc:
> Bcc:
> Date: Tue, 8 Nov 2022 11:15:51 +0100
> Subject: Incomplete TLD List
> Hi!
> Some of the new TLDs are wrongly indexed by Nutch, is it possible to extend
> the TLD list?
>
>         "url":"https://about.google/intl/en_FR/how-our-business-works/",
>         "tstamp":"2022-11-06T17:22:14.808Z",
>         "domain":"google",
>         "digest":"3b9a23d42f200392d12a697bbb8d4d87",
>
>
> Thanks
>
> Mike
>
-- 
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
