Further details:
If I run strace on the process, it looks like this, over and over and over:
gettimeofday({1155249187, 52}, NULL) = 0
gettimeofday({1155249188, 389}, NULL) = 0
gettimeofday({1155249188, 679}, NULL) = 0
gettimeofday({1155249188, 955}, NULL) = 0
clock_gettime(CLOCK_REALTI
I had the same problem before. Just read
http://www.mail-archive.com/nutch-dev%40lucene.apache.org/msg04303.html
Make that tiny change on line 385 of HttpBase.java and it will work fine.
Raphael
Sellek, Greg wrote:
I am experiencing the same issue as a similar post for 8/6. Whenever I
try
Hello,
Nutch is stalling in the fetch process. I've run it twice now, and it is
stopping on the *same* URL both times. I don't get what's going on!
The last status report was:
060810 145315 status: segment 20060810142649, 7900 pages, 14 errors,
98421231 bytes, 1571224 ms
060810 145315 status: 5
Hello,
is it possible to crawl e.g. http://www.domain.com,
but to skip crawling all URLs matching http://www.domain.com/subpage/ ?
I tried to achieve this with crawl-urlfilter.txt/regex-urlfilter.txt,
but it doesn't work:
-ftp.tu-clausthal.de
-^http://([a-z0-9]*\.)asta.tu-clausthal.de/de/m
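For reference, the url filter files are read top to bottom and the first
matching pattern decides whether a URL is kept, so the exclusion has to come
before the broader inclusion. A minimal sketch using the example domain from
the question (www.domain.com is just the poster's placeholder):
# skip the subtree first (first match wins)
-^http://www\.domain\.com/subpage/
# then accept everything else on the site
+^http://www\.domain\.com/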
Hello all - I have been taking a look at Nutch for purposes of indexing
a large pile of internal LAN files at our company, and so far it looks
quite impressive. I believe it could substitute for the Google Mini
appliance. However, the bigger Google boxes add more features that I am
not sure can be
Hi,
Could anyone explain to me what exactly the common-terms.utf8 file does? I
don't understand the real functionality of this file...
Regards,
--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [EMAIL PROTECTED]
I'm interested in crawling multiple shared folders (among other
things) on a corporate LAN.
It is a LAN of MS clients with Active Directory managed accounts.
The users routinely access the files based on NTFS-level (and
sharing?) permissions.
Ideally, I'd like to set up a central server (probably
I am experiencing the same issue as a similar post for 8/6. Whenever I
try to fetch pages, I see a lot of "fetch of xxx failed with:
java.lang.NullPointerException". I have put the appropriate agent info
in both the nutch-default and nutch-site config files. I tried using
DEBUG logging to get m
Take a look at this,
http://wiki.apache.org/lucene-hadoop/HowManyMapsAndReduces
It will answer why you have a few more map tasks than are set in the
configuration.
Dennis
Murat Ali Bayir wrote:
my configs are given below:
in hadoop-site number of mapper = 130
in my code I use job.setNumMapT
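As the wiki page above explains, setNumMapTasks() is only a hint: the actual
map count comes from the input splits computed by the InputFormat, typically
at least one per input "part" file in the input path. A minimal sketch with
the old mapred API, assuming Hadoop 0.x-era classes on the classpath:

import org.apache.hadoop.mapred.JobConf;

public class MapTaskHint {
  public static void main(String[] args) {
    JobConf job = new JobConf();
    // Only a hint: the framework derives the real map count from the
    // input splits, so 130 here can still become 135 at run time.
    job.setNumMapTasks(130);
    // The reduce count, by contrast, is used exactly as configured.
    job.setNumReduceTasks(130);
    System.out.println("requested map tasks: " + job.getNumMapTasks());
  }
}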
my configs are given below:
in hadoop-site the number of mappers = 130
in my code I use job.setNumMapTasks = 130
in hadoop-default the number of mappers = 2
With this configuration I get 135 mappers in my job. However, there
is no problem with the number of reducers.
Andrzej Bialecki wrote:
Murat Ali Bayir
Murat Ali Bayir wrote:
Hi everybody, although I change the number of mappers in
hadoop-site.xml and use the job.setNumMapTasks method, the system gives
a different number of mappers; the problem only occurs for the number of
mappers, the number of reducers works correctly. What do I have to
do for setti
It cannot be the problem; it only restricts the number of tasks running
simultaneously, and there can be pending tasks as well. I checked that this is
not the problem. I am not sure, but I notice that the number of mapper tasks is
equal to k * the number of different parts in the input path. To illustrate, I
have 15 parts in
The name node is running. Run the bin/stop-all.sh script first and then
do a ps -ef | grep NameNode to see if the process is still running. If
it is, it may need to be killed by hand with kill -9 processid.
The second problem is the setup of ssh keys as described in a previous
email. Also I would re
There is also a mapred.tasktracker.tasks.maximum variable which may be
causing the task number to be different.
Dennis
Murat Ali Bayir wrote:
Hi everybody, although I change the number of mappers in
hadoop-site.xml and use the job.setNumMapTasks method, the system gives
another number as a number o
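For completeness, mapred.tasktracker.tasks.maximum only caps how many tasks a
single TaskTracker runs at the same time; it does not change how many map
tasks the job is split into. A hedged example of setting it in hadoop-site.xml
(the value 4 is arbitrary):

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>4</value>
</property>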
Hey list,
I would like to ask you if it is possible to start a search query with a
simple word (e.g. "Home"). Then Nutch will look up the word "Home" in a
list of synonyms. Nutch will then recognize that "House" is a synonym
for "Home". Now, Nutch can start a search query with "House" and "Ho
Hi everybody, although I change the number of mappers in hadoop-site.xml
and use the job.setNumMapTasks method, the system gives a different number
of mappers; the problem only occurs for the number of mappers, the number
of reducers works correctly. What do I have to do to set the number
of mappers
Hi,
I am interested in more comprehensive configuration of the crawl targets. The
current version only supports lists (files) containing URLs. One thing that
could be desirable is the injection of URLs with metadata attached. This
metadata (inserted into the CrawlData object) could be read by pl
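As the message notes, the version under discussion only injects plain URL
lists, so any example is only a sketch. Assuming the class meant is CrawlDatum
and that the Nutch version in use stores per-URL metadata in a Hadoop
MapWritable (true of later releases), a modified injector could attach values
roughly like this; the helper name and the "source" key are purely
illustrative:

import org.apache.hadoop.io.MapWritable;
import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;

public class SeedMetadata {
  // Hypothetical helper: attach one key/value pair from the seed list to a
  // CrawlDatum so that a plugin could read it back during later phases.
  public static void attach(CrawlDatum datum, String key, String value) {
    MapWritable meta = datum.getMetaData();
    if (meta == null) {          // guard for releases that leave it unset
      meta = new MapWritable();
      datum.setMetaData(meta);
    }
    meta.put(new Text(key), new Text(value));
  }
}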
hello,
When I execute the DFS command, I get this:
[EMAIL PROTECTED] search]$ bin/start-all.sh
starting namenode, logging to
/nutch/search/logs/hadoop-nutch-namenode-localhost.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
RSA key fingerprint is 81:0e:49:ce:61:8c:7b:09
I want to include embedded Flash in my crawls.
Despite (apparently successfully) including the parse-swf plugin, embedded
Flash does not seem to be retrieved. I'm assuming that the object tags are
not being parsed to find the .swf files.
Can anyone comment?
Thanks
Iain
hello,
we are trying to install Nutch on a single machine using this guide:
http://wiki.apache.org/nutch/NutchHadoopTutorial?highlight=%28nutch%29
but we are stuck at this step:
*first we execute this command