Hi Mike,

Yes, it is possible to extend the TLD list. In fact, when the TLD list was compiled, the author left a note explicitly stating that it may not be complete.

https://github.com/apache/nutch/blob/master/conf/domain-suffixes.xml.template

Please submit a PR if you wish to make any changes or additions. You can use the parser checker tool to validate your change before creating the PR.

Thanks
lewismc
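
P.S. For illustration, here is a minimal Python sketch of how a suffix-list lookup can end up with "domain":"google" for about.google when "google" is missing from the list. This is an assumption about the general technique, not Nutch's actual code; the function name domain_name and the small KNOWN_SUFFIXES set are hypothetical and do not reflect the real file contents.

```python
# Hypothetical stand-in for the suffix list; NOT the real contents of
# domain-suffixes.xml.
KNOWN_SUFFIXES = {"com", "org", "uk", "co.uk"}


def domain_name(host, suffixes):
    """Strip leading labels until what follows the first dot is a known suffix.

    If no known suffix is found, the last label alone is returned, which is
    how an unlisted TLD can end up being reported as the domain.
    """
    candidate = host.lower().rstrip(".")
    while "." in candidate:
        rest = candidate.split(".", 1)[1]
        if rest in suffixes:
            return candidate
        candidate = rest
    return candidate


if __name__ == "__main__":
    # "google" is not in the list, so only the last label survives.
    print(domain_name("about.google", KNOWN_SUFFIXES))
    # After extending the list, the registered domain is found.
    print(domain_name("about.google", KNOWN_SUFFIXES | {"google"}))
```

With the unextended set the first call returns "google", matching the indexed record quoted below; once "google" is added as a suffix, the second call returns "about.google".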

On Tue, Nov 8, 2022 at 02:16 <user-digest-h...@nutch.apache.org> wrote:

> ---------- Forwarded message ----------
> From: Mike <mz579...@gmail.com>
> To: user@nutch.apache.org
> Cc:
> Bcc:
> Date: Tue, 8 Nov 2022 11:15:51 +0100
> Subject: Incomplete TLD List
>
> Hi!
>
> Some of the new TLDs are wrongly indexed by Nutch, is it possible to extend
> the TLD list?
>
> "url":"https://about.google/intl/en_FR/how-our-business-works/",
> "tstamp":"2022-11-06T17:22:14.808Z",
> "domain":"google",
> "digest":"3b9a23d42f200392d12a697bbb8d4d87",
>
> Thanks
>
> Mike

--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc