Re: what is the difference between nutch and some other opensource search engines

2008-04-09 Thread ogjunk-nutch
Broad question, broad answer: free, scalable, extensible, open-source are a few 
characteristics that come to mind.
 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: minskv [EMAIL PROTECTED]
To: nutch-dev nutch-dev@lucene.apache.org
Sent: Wednesday, April 9, 2008 2:44:51 PM
Subject: what is the difference between nutch and some other opensource search 
engines

and what is the main competitive strength of nutch?

2008-04-10
minskv

Hudson build is back to normal: Nutch-trunk #416

2008-04-09 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/416/changes




[jira] Created: (NUTCH-627) Minimize host address lookup

2008-04-09 Thread Otis Gospodnetic (JIRA)
Minimize host address lookup


 Key: NUTCH-627
 URL: https://issues.apache.org/jira/browse/NUTCH-627
 Project: Nutch
  Issue Type: Improvement
  Components: generator
Reporter: Otis Gospodnetic
 Attachments: NUTCH-627.patch

The simple patch that I'm about to attach keeps track of hosts whose max URLs 
per host limit we already reached, as well as hosts whose hostname-IP lookup 
already failed.  For such hosts, further DNS lookups are skipped:
- there is no point in looking up a hostname yet again if we already have the 
max number of URLs for that host
- there is little point in attempting to look up a hostname yet again if the 
previous lookup already failed

In a simple test, this saved a few hundred thousand lookups for the first case 
and a few hundred lookups for the second case.

If nobody complains, I'll commit by the end of the week.
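
For readers who want to see the idea in code, here is a minimal sketch of the two short-circuits described above. It is not the contents of NUTCH-627.patch. The names HostLookupFilter, Resolver, accept, lookupCount and systemDns are hypothetical, and the sketch assumes the per-host limit is counted against the resolved address. The only real library call is java.net.InetAddress.getByName.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration only; not the code from NUTCH-627.patch. */
class HostLookupFilter {

    /** Hypothetical resolver hook so DNS can be stubbed out in tests. */
    interface Resolver {
        String resolve(String host) throws UnknownHostException;
    }

    private final int maxPerHost;
    private final Resolver resolver;
    private final Map<String, Integer> countsByAddress = new HashMap<>();
    private final Set<String> maxedHosts = new HashSet<>();   // limit already reached
    private final Set<String> failedHosts = new HashSet<>();  // lookup already failed
    private int lookups = 0;

    HostLookupFilter(int maxPerHost, Resolver resolver) {
        this.maxPerHost = maxPerHost;
        this.resolver = resolver;
    }

    /** Resolver backed by the JVM's normal name resolution. */
    static Resolver systemDns() {
        return host -> InetAddress.getByName(host).getHostAddress();
    }

    /** Returns true if one more URL on this host should be kept. */
    boolean accept(String host) {
        // Both cases from the issue: no DNS lookup for hosts we already gave up on.
        if (maxedHosts.contains(host) || failedHosts.contains(host)) {
            return false;
        }
        lookups++;
        String address;
        try {
            address = resolver.resolve(host);
        } catch (UnknownHostException e) {
            failedHosts.add(host);  // remember the failure, do not retry
            return false;
        }
        int count = countsByAddress.getOrDefault(address, 0);
        if (count >= maxPerHost) {
            maxedHosts.add(host);   // another hostname already filled this address
            return false;
        }
        countsByAddress.put(address, count + 1);
        if (count + 1 >= maxPerHost) {
            maxedHosts.add(host);   // limit reached, skip later lookups
        }
        return true;
    }

    /** Number of resolver calls actually made. */
    int lookupCount() {
        return lookups;
    }
}
```

With a limit of 2, a third URL on the same host is rejected without touching the resolver, and a host whose lookup failed once is not looked up again. The two sets are kept in memory for the lifetime of the filter, so a real implementation would probably need to bound their size.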


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-500) Add hadoop masters configuration file into conf folder

2008-04-09 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12587480#action_12587480
 ] 

Hudson commented on NUTCH-500:
--

Integrated in Nutch-trunk #416 (See 
[http://hudson.zones.apache.org/hudson/job/Nutch-trunk/416/])

 Add hadoop masters configuration file into conf folder
 --

 Key: NUTCH-500
 URL: https://issues.apache.org/jira/browse/NUTCH-500
 Project: Nutch
  Issue Type: Improvement
  Components: ndfs
Affects Versions: 0.9.0
 Environment: Linux Fedora 7, Java 1.5
Reporter: Emmanuel Joke
Assignee: Dennis Kubes
Priority: Minor
 Fix For: 1.0.0

 Attachments: NUTCH-500-1-20080331.patch


 Hadoop scripts read a configuration file named masters to determine how many 
 namenodes should be started.
 This file is not in the repository at the moment, so starting the cluster 
 generates some error messages (the errors are not really important).
 Either way, it would be a good idea to add a template file to the conf directory.




[jira] Updated: (NUTCH-627) Minimize host address lookup

2008-04-09 Thread Otis Gospodnetic (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Otis Gospodnetic updated NUTCH-627:
---

Attachment: NUTCH-627.patch

 Minimize host address lookup
 

 Key: NUTCH-627
 URL: https://issues.apache.org/jira/browse/NUTCH-627
 Project: Nutch
  Issue Type: Improvement
  Components: generator
Reporter: Otis Gospodnetic
 Attachments: NUTCH-627.patch


 The simple patch that I'm about to attach keeps track of hosts whose max 
 URLs per host limit we already reached, as well as hosts whose hostname-IP 
 lookup already failed.  For such hosts, further DNS lookups are skipped:
 - there is no point in looking up a hostname yet again if we already have the 
 max number of URLs for that host
 - there is little point in attempting to look up a hostname yet again if the 
 previous lookup already failed
 In a simple test, this saved a few hundred thousand lookups for the first 
 case and a few hundred lookups for the second case.
 If nobody complains, I'll commit by the end of the week.
