[jira] Commented: (NUTCH-776) Configurable queue depth

2010-01-07 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12797653#action_12797653 ] Julien Nioche commented on NUTCH-776: Did you notice any improvement in the fetch rate

help for hadoop and hbase

2010-01-07 Thread wnkdu
I need some help here. I would like to have a live chat. I would like to know something about using Hadoop and HBase for building a search engine, and how to go about doing it. I am new to Hadoop.

Re: [jira] Commented: (NUTCH-776) Configurable queue depth

2010-01-07 Thread MilleBii
Actually I created a key to set it adequately... The best results came with a depth of 1 and a big number of threads (I use 1800) ?!? That is because I have numerous sites (like blogs) that have different domain names and a single IP... This is a result of topical focused crawling. Since it was not

[Nutch Wiki] Trivial Update of PublicServers by GeoffreyMcCaleb

2010-01-07 Thread Apache Wiki
Dear Wiki user, You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification. The PublicServers page has been changed by GeoffreyMcCaleb. The comment on this change is: Updated description of nsyght.com.

Potential Bug: Index documents with incorrect segment numbers

2010-01-07 Thread igor.k
Hey Guys, I've been running various crawls with Nutch and noticed some strange behavior. When examining an index with Luke, I noticed that for some documents, the segment number is incorrect. This seems to occur very rarely. Example: A document in the index will have a url : www.sample.com,

Injecting URLs and define Inlink?

2010-01-07 Thread MyD
Dear Nutch developers: Is there any way to inject URLs and define the inlink for those URLs? How and where can I find the inlink from a certain URL? Example: We inject a URL www.example.com/john_doe. We start the crawl and maybe we are crawling the URL www.example.com/john_doe4. *=

Build failed in Hudson: Nutch-trunk #1032

2010-01-07 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Nutch-trunk/1032/ -- [...truncated 1183 lines...] A src/plugin/parse-pdf/src/java/org/apache A src/plugin/parse-pdf/src/java/org/apache/nutch A

Re: Injecting URLs and define Inlink?

2010-01-07 Thread xiao yang
What do you mean? You already know the url. Why do you want to find it? On Thu, Jan 7, 2010 at 7:12 PM, MyD myd.ro...@googlemail.com wrote: Dear Nutch developers: Is there any way to inject URLs and define the inlink for those URLs? How and where can I find the inlink from a certain URL?

Re: help for hadoop and hbase

2010-01-07 Thread xiao yang
You should use Nutch for building a search engine. There is no need to use HBase. On Thu, Jan 7, 2010 at 9:41 AM, wnkdu emjac...@gmail.com wrote: I need some help here. I would like to have a live chat. I would like to know something about using Hadoop and HBase for building a search