[jira] Created: (NUTCH-743) Site search powered by Lucene/Solr
Site search powered by Lucene/Solr -- Key: NUTCH-743 URL: https://issues.apache.org/jira/browse/NUTCH-743 Project: Nutch Issue Type: New Feature Components: documentation Reporter: Sami Siren Assignee: Sami Siren Priority: Minor Replace the current Nutch site search with Lucene/Solr-powered search hosted by Lucid Imagination (http://www.lucidimagination.com/search). It lets one search all Nutch content from a single place, including web, wiki, JIRA and mail archives; content from other parts of the Lucene ecosystem is also available. Lucid has a fault-tolerant setup with replication and failover, as well as monitoring services, in place. A preview of the site with the new search enabled is available at http://people.apache.org/~siren/site/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (NUTCH-743) Site search powered by Lucene/Solr
[ https://issues.apache.org/jira/browse/NUTCH-743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sami Siren updated NUTCH-743: - Attachment: NUTCH-743.patch If there are no objections I will commit this within a week or so.
[jira] Commented: (NUTCH-743) Site search powered by Lucene/Solr
[ https://issues.apache.org/jira/browse/NUTCH-743?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723176#action_12723176 ] Andrzej Bialecki commented on NUTCH-743: - +1, based on the outcome of a thorough discussion of pros/cons of the same subject on the Lucene lists.
Per-host fetch-interval
Hi, I was wondering what would be the best way to configure per-host re-crawl intervals. The default db.fetch.interval applies to all URLs, but I'd like some hosts to be recrawled more frequently. Is there a JIRA ticket open for this? I haven't been able to find one. Sandeep
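One way to picture the behaviour being asked for is a lookup that falls back to a global default when a host has no override. The sketch below is plain Java and not a Nutch API: the class `PerHostFetchInterval`, its methods, and the example hosts and values are all hypothetical, and how such a lookup would be wired into Nutch is left open.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper, not part of Nutch: per-host re-crawl intervals with a global fallback. */
public class PerHostFetchInterval {
    private final int defaultSeconds;
    private final Map<String, Integer> overrides = new HashMap<>();

    public PerHostFetchInterval(int defaultSeconds) {
        this.defaultSeconds = defaultSeconds;
    }

    /** Registers an override for one host; host names are compared case-insensitively. */
    public PerHostFetchInterval setHostInterval(String host, int seconds) {
        overrides.put(host.toLowerCase(Locale.ROOT), seconds);
        return this;
    }

    /** Returns the override for the URL's host, or the default if there is none or the URL cannot be parsed. */
    public int intervalFor(String url) {
        try {
            String host = new URI(url).getHost();
            if (host == null) {
                return defaultSeconds;
            }
            return overrides.getOrDefault(host.toLowerCase(Locale.ROOT), defaultSeconds);
        } catch (URISyntaxException e) {
            return defaultSeconds;
        }
    }

    public static void main(String[] args) {
        // Assumed values: a 30-day default and an hourly override for one host.
        PerHostFetchInterval intervals = new PerHostFetchInterval(30 * 24 * 3600)
                .setHostInterval("news.example.com", 3600);
        System.out.println(intervals.intervalFor("http://news.example.com/today"));   // 3600
        System.out.println(intervals.intervalFor("http://archive.example.org/1999")); // 2592000
    }
}
```

Keying the overrides on the exact host name keeps the lookup simple; matching whole domains or URL patterns would need a different data structure.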
[jira] Commented: (NUTCH-729) NPE in FieldIndexer when BasicFields url doesn't exist
[ https://issues.apache.org/jira/browse/NUTCH-729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12723286#action_12723286 ] Tadesse Sefer commented on NUTCH-729: - Where do you change the logging to use a url key? NPE in FieldIndexer when BasicFields url doesn't exist -- Key: NUTCH-729 URL: https://issues.apache.org/jira/browse/NUTCH-729 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 0.9.0, 1.0.0 Environment: All Reporter: Dennis Kubes Assignee: Dennis Kubes Fix For: 1.1 Attachments: NUTCH-729-1-20090235.patch There is a NullPointerException during a logging call in FieldIndexer when there isn't a url for a document. Documents shouldn't be without urls but since the FieldIndexer doesn't validate fields it is possible for it to occur. Most often this happens when BasicFields is run with the wrong segments directory and doesn't complain. It could also occur if using the FieldIndexer to index things other than basic fields.
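The failure mode described in the issue, a logging call that dereferences a url that was never set, can be illustrated in isolation. The following is only a sketch and is not the FieldIndexer code or the attached patch: the class `UrlLogging`, the method `describeForLog`, and the use of a plain `Map` for a document's fields are hypothetical. It shows one defensive pattern, substituting a placeholder when the url is absent so that building the log message cannot throw.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; not taken from Nutch's FieldIndexer. */
public class UrlLogging {
    static final String NO_URL = "<no url>";

    /** Builds a log message without dereferencing a missing url. */
    public static String describeForLog(Map<String, String> fields) {
        String url = fields.get("url"); // null when the document has no url field
        // Calling a method on url here (url.length(), url.trim(), ...) would throw
        // a NullPointerException for such a document, so check for null first.
        return "Indexing document: " + (url != null ? url : NO_URL);
    }

    public static void main(String[] args) {
        Map<String, String> withUrl = new HashMap<>();
        withUrl.put("url", "http://example.com/");

        Map<String, String> withoutUrl = new HashMap<>();
        withoutUrl.put("title", "orphan");

        System.out.println(describeForLog(withUrl));
        System.out.println(describeForLog(withoutUrl));
    }
}
```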
Build failed in Hudson: Nutch-trunk #854
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/854/
--
[...truncated 4676 lines...]
deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex
copy-generated-lib:
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-regex
init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/test
init-plugin:
deps-jar:
compile:
    [echo] Compiling plugin: urlfilter-suffix
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/classes
    [javac] Note: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/src/plugin/urlfilter-suffix/src/java/org/apache/nutch/urlfilter/suffix/SuffixURLFilter.java uses unchecked or unsafe operations.
    [javac] Note: Recompile with -Xlint:unchecked for details.
jar:
    [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-suffix/urlfilter-suffix.jar
deps-test:
deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
copy-generated-lib:
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-suffix
init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/test
init-plugin:
deps-jar:
compile:
    [echo] Compiling plugin: urlfilter-validator
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/classes
jar:
    [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlfilter-validator/urlfilter-validator.jar
deps-test:
deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
copy-generated-lib:
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlfilter-validator
init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/test
init-plugin:
deps-jar:
compile:
    [echo] Compiling plugin: urlnormalizer-basic
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/classes
jar:
    [jar] Building jar: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-basic/urlnormalizer-basic.jar
deps-test:
deploy:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
copy-generated-lib:
    [copy] Copying 1 file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/plugins/urlnormalizer-basic
init:
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
    [mkdir] Created dir: http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/test
init-plugin:
deps-jar:
compile:
    [echo] Compiling plugin: urlnormalizer-pass
    [javac] Compiling 1 source file to http://hudson.zones.apache.org/hudson/job/Nutch-trunk/ws/trunk/build/urlnormalizer-pass/classes
jar:
    [jar] Building jar: