Re: Help posting question

2024-04-19 Thread Lewis John McGibbney
Hi Sheham,

On 2024/04/19 15:18:01 Sheham Izat wrote:
> 
> My questions are:
> 
> 1) What do I need to do to get Nutch to continue working even if there are
> hung threads?

From what I can see in the log you provided, nothing is preventing Nutch from
continuing to work. The Fetcher job finished successfully.

> 2) Is there a way to avoid having these hanging threads in the first place?

Several factors can lead to hung fetcher threads, and many questions about
exactly this issue have been asked on this mailing list. I would encourage
you to study some of the community responses and see whether they give you a
better understanding of the possible causes. You can filter the mailing list
search with the following criteria:
* date range: more than 1 day ago
* body: hung

https://lists.apache.org/list.html?user@nutch.apache.org
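
The same keyword can also be used to check a saved fetch log before digging
through the archive. What follows is only a minimal sketch, assuming the
fetcher output has been written to a file; the file name fetch.log and the
helper name count_hung are hypothetical and not part of Nutch.

```shell
#!/bin/sh
# Count the lines in a saved fetch log that mention "hung" (case-insensitive).
# count_hung and the log path are hypothetical names used for illustration.
count_hung() {
    # grep -c prints the number of matching lines; it exits non-zero when
    # there are none, so "|| true" keeps the function usable under "set -e".
    grep -ci 'hung' "$1" || true
}

# Example: save the fetcher output first, then count the matches.
#   bin/nutch fetch "$s1" 2>&1 | tee fetch.log
#   count_hung fetch.log
```

A count of 0 would suggest that the run did not report any hung threads at
all, which is worth knowing before changing any configuration.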


Re: Help posting question

2024-04-19 Thread Sheham Izat
Hi Shashanka, All,

Thank you for your reply!

I'm using Nutch 1.19. I did the injection and segment generation using the
following commands:

bin/nutch inject crawl/crawldb urls
bin/nutch generate crawl/crawldb crawl/segments

When I run the fetch command, Nutch stops with errors about hung threads.
I've attached the fetch command output and the nutch-site.xml.

s1=`ls -d crawl/segments/2* | tail -1`
bin/nutch fetch $s1

My questions are:

1) What do I need to do to get Nutch to continue working even if there are
hung threads?
2) Is there a way to avoid having these hanging threads in the first place?

Thank you
Sheham


On Fri, Apr 19, 2024 at 1:04 AM Shashanka Balakuntala <
shbalakunt...@gmail.com> wrote:

> Hi Shehamizat,
> Please feel free to drop questions on the email itself. One of us/community
> will be glad to help on the same.
>
> *Regards*
>   Shashanka Balakuntala Srinivasa
>
>
>
> On Fri, 19 Apr 2024 at 7:15 AM, Sheham Izat wrote:
>
> > Hi,
> >
> > I'm trying to get Nutch to work and I have issues, how can I post
> > questions on the group?
> >
> > Thank you,
> > Sheham
> >
>
[root@localhost apache-nutch-1.19]# bin/nutch fetch $s1
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-nutch-1.19/lib/log4j-slf4j-impl-2.18.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/apache-nutch-1.19/lib/slf4j-reload4j-1.7.36.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
2024-04-07 22:46:27,222 INFO o.a.n.p.PluginManifestParser [main] Plugins: looking in: /opt/apache-nutch-1.19/plugins
2024-04-07 22:46:27,353 INFO o.a.n.p.PluginRepository [main] Plugin Auto-activation mode: [true]
2024-04-07 22:46:27,354 INFO o.a.n.p.PluginRepository [main] Registered Plugins:
2024-04-07 22:46:27,354 INFO o.a.n.p.PluginRepository [main] Regex URL Filter (urlfilter-regex)
2024-04-07 22:46:27,354 INFO o.a.n.p.PluginRepository [main] Html Parse Plug-in (parse-html)
2024-04-07 22:46:27,354 INFO o.a.n.p.PluginRepository [main] HTTP Framework (lib-http)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] the nutch core extension points (nutch-extensionpoints)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Basic Indexing Filter (index-basic)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Anchor Indexing Filter (index-anchor)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Tika Parser Plug-in (parse-tika)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Basic URL Normalizer (urlnormalizer-basic)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Regex URL Filter Framework (lib-regex-filter)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Regex URL Normalizer (urlnormalizer-regex)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] URL Validator (urlfilter-validator)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] CyberNeko HTML Parser (lib-nekohtml)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] OPIC Scoring Plug-in (scoring-opic)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Pass-through URL Normalizer (urlnormalizer-pass)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Http Protocol Plug-in (protocol-http)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] SolrIndexWriter (indexer-solr)
2024-04-07 22:46:27,355 INFO o.a.n.p.PluginRepository [main] Registered Extension-Points:
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Content Parser)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch URL Filter)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (HTML Parse Filter)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Scoring)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch URL Normalizer)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Publisher)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Exchange)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Protocol)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch URL Ignore Exemption Filter)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Index Writer)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Segment Merge Filter)
2024-04-07 22:46:27,356 INFO o.a.n.p.PluginRepository [main] (Nutch Indexing Filter)
2024-04-07 22:46:27,367 INFO o.a.n.f.Fetcher [main] Fetcher: starting at 2024-04-07 22:46:27
2024-04-07 22:46:27,367 INFO o.a.n.f.Fetcher [main] Fetcher: segment: crawl/segments/20240407224534
2024-04-07 22:46:28,109 INFO o.a.n.f.FetchItemQueues