Google feature in Nutch

2006-08-11 Thread Florian Fricker
Hello, When I start a search query in Google, e.g. “Explore”, Google shows me the first few results. After these results, Google displays more results but for another search term, for example “Search”. The meaning is more or less the same, but with the query “Search” you will get more results

Re: Querying Fields

2006-08-11 Thread Lukas Vlcek
Hi, You need to look into the source to find out exactly what it does. As far as I know, it does not add any new field to the index (that should be done via the index-more plugin), but it allows you to query using type:, date: and site:, I think. Lukas On 8/9/06, Lourival Júnior [EMAIL PROTECTED] wrote: What

Re: common-terms.utf8

2006-08-11 Thread Timo Scheuer
Hi, Could anyone explain to me what exactly the common-terms.utf8 file does? I don't understand the real functionality of this file... During indexing (and also during searching) the common terms are used to form n-grams, which makes searches faster for common words such as articles. It is an

Re: common-terms.utf8

2006-08-11 Thread Lourival Júnior
Hi Timo! Thanks a lot! Now I have a clear understanding of this file. This article helps a lot too: http://searchenginewatch.com/showPage.html?page=2156061 Thanks again! On 8/11/06, Timo Scheuer [EMAIL PROTECTED] wrote: Hi, Could anyone explain to me what exactly the common-terms.utf8

Re: Querying Fields

2006-08-11 Thread Lourival Júnior
Yes yes, I tested the index-more and query-more plugins. They make it easy to search these fields. However, if I could find documentation about them, I would not have to spend time thinking about a solution. Thanks a lot! On 8/11/06, Lukas Vlcek [EMAIL PROTECTED] wrote: Hi, You need to look into the source

Re: common-terms.utf8

2006-08-11 Thread Lourival Júnior
Hi Timo! I analyzed the index before and after correctly using the common-terms.utf8 file. Before adding the common terms for my language, my index was about 3 MB. After adding the common terms it is now 5 MB! Why does this happen? Regards! On 8/11/06, Lourival Júnior [EMAIL PROTECTED] wrote: Hi Timo!

RE: More Fetcher NullPointerException

2006-08-11 Thread Sellek, Greg
Thanks, that did the trick. -Original Message- From: Raphael Hoffmann [mailto:[EMAIL PROTECTED] Sent: Thursday, August 10, 2006 5:13 PM To: nutch-user@lucene.apache.org Subject: Re: More Fetcher NullPointerException I had the same problem before. Just read

Re: Nutch vs. Google Appliance

2006-08-11 Thread Sami Siren
Stevenson, Kerry wrote: Hello all - I have been taking a look at Nutch for purposes of indexing a large pile of internal LAN files at our company, and so far it looks quite impressive. I believe it could substitute for the Google Mini appliance. However, the bigger Google boxes add more features

Re: [Fwd: Re: 0.8 Recrawl script updated]

2006-08-11 Thread Jacob Brunson
Matthew, Looking over your recrawl script, it seems like you are merging *all* segments together, including any old segments. It seems to me that you could just be merging only the new segments together. Maybe you could explain a little of the reasoning behind this. Thanks, Jacob Brunson On

turn on debug log on nutch-0.8.

2006-08-11 Thread Feng Ji
Hi there, I found that nutch-0.8 uses Apache's commons logging system http://jakarta.apache.org/commons/logging/apidocs/index.html While developing, I'd like to turn on debug mode: if (log.isDebugEnabled()) { ... I checked nutch-default.xml, but can't find a place to turn it on. Does

Re: turn on debug log on nutch-0.8.

2006-08-11 Thread Raphael Hoffmann
See /conf/log4j.properties; just set the log level of nutch or hadoop to DEBUG. By default, debugging output is written to /log/hadoop.log. Feng Ji wrote: Hi there, I found that nutch-0.8 uses Apache's commons logging system http://jakarta.apache.org/commons/logging/apidocs/index.html

Re: Nutch vs. Google Appliance

2006-08-11 Thread jian chen
The thing I don't like about commercial products like Google Mini or similar is that they charge you based on the number of documents you are allowed to index, while at its core the software is probably the same, with just some switches turned on and off. I know that you can use httpclient and Java to

Re: [Nutch-general] log4j.properties

2006-08-11 Thread ogjunk-nutch
- Original Message From: Murat Ali Bayir [EMAIL PROTECTED] To: nutch-user@lucene.apache.org Sent: Friday, August 11, 2006 6:56:41 AM Subject: [Nutch-general] log4j.properties Hello everybody. Here are the first few lines of my log4j.properties file: log4j.rootLogger=INFO,DRFA,stdout #

log4j.properties bug(?)

2006-08-11 Thread ogjunk-nutch
Hello, I noticed the following line in conf/log4j.properties: log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file} I noticed that the ${hadoop.log.dir}/${hadoop.log.file} sometimes gets interpreted as /, indicating that the 2 hadoop properties there are undefined. I also noticed
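The "/" result described above is what one would expect if each undefined `${...}` variable is replaced by an empty string, which, as far as I know, is how log4j 1.x treats variables it cannot resolve. The following is a minimal sketch of that substitution, not log4j's own code: the class `LogPathSketch`, the method `substitute`, and the value `logs` are hypothetical names chosen for illustration.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LogPathSketch {
    // Matches ${name} and captures the name.
    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with its value from props. An undefined name
    // becomes the empty string (assumed to mirror log4j 1.x behaviour).
    static String substitute(String value, Properties props) {
        Matcher m = VAR.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = props.getProperty(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "${hadoop.log.dir}/${hadoop.log.file}";

        // Both properties defined ("logs" is a hypothetical directory).
        Properties defined = new Properties();
        defined.setProperty("hadoop.log.dir", "logs");
        defined.setProperty("hadoop.log.file", "hadoop.log");
        System.out.println(substitute(template, defined));          // logs/hadoop.log

        // Neither property defined: only the literal "/" is left.
        System.out.println(substitute(template, new Properties())); // /
    }
}
```

If this reading is right, the appender path is only sensible when both properties reach the JVM as system properties before log4j is configured.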

hadoop.log vs. nutch.log

2006-08-11 Thread ogjunk-nutch
Hi, In bin/nutch I saw this: if [ "$NUTCH_LOGFILE" = "" ]; then NUTCH_LOGFILE='hadoop.log'; fi Wouldn't it make more sense to name the file nutch.log? Everything there is Nutch-specific - Injector, Generator and I see some mapred things. But as a Nutch user I'd expect to see nutch.log,

Re: [Nutch-general] common-terms.utf8

2006-08-11 Thread ogjunk-nutch
This is because Nutch turns those common terms into n-grams (not sure of what size), and that increases the size of the index. For example, if you have a phrase like "vacation time", Nutch will normally index this phrase as 2 terms, a total of 12 characters (probably less, if these words are
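To make the size effect concrete, here is a rough sketch of the idea, not Nutch's actual implementation: every token is indexed as before, and an extra bigram is added for each adjacent pair that contains a common term. The class `CommonGramsSketch`, the `-` separator, and the choice of "the" as the common term are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CommonGramsSketch {
    // Emit every token, plus a bigram "a-b" for each adjacent pair in which
    // at least one of the two tokens is a common term.
    static List<String> indexTerms(List<String> tokens, Set<String> common) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            out.add(tokens.get(i));
            if (i + 1 < tokens.size()) {
                String a = tokens.get(i);
                String b = tokens.get(i + 1);
                if (common.contains(a) || common.contains(b)) {
                    out.add(a + "-" + b);
                }
            }
        }
        return out;
    }

    // Total number of characters across all indexed terms.
    static int totalChars(List<String> terms) {
        int n = 0;
        for (String t : terms) {
            n += t.length();
        }
        return n;
    }

    public static void main(String[] args) {
        Set<String> common = new HashSet<>(Arrays.asList("the"));

        // No common term in the phrase: 2 terms, 12 characters.
        List<String> plain = indexTerms(Arrays.asList("vacation", "time"), common);
        System.out.println(plain + " " + totalChars(plain));

        // One common term adds one bigram on top of the plain terms.
        List<String> grams = indexTerms(Arrays.asList("the", "vacation", "time"), common);
        System.out.println(grams + " " + totalChars(grams));
    }
}
```

In this sketch "the vacation time" yields four terms (27 characters) instead of three (15 characters), which is the kind of growth that would explain a larger index once a common-terms file takes effect.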

build kaput: plugins' jars can't be found

2006-08-11 Thread ogjunk-nutch
Hi, While building Nutch, I noticed several places where various Jars from plugins' lib directories could not be found, for example: $ ant package ... deploy: [copy] Warning: Could not find file /home/otis/dev/repos/lucene/nutch/trunk/build/lib-log4j/lib-log4j.jar to copy. init:

Re: log4j.properties bug(?)

2006-08-11 Thread Sami Siren
[EMAIL PROTECTED] wrote: I assume the idea is that the JVM knows about hadoop.log.dir system property, and then log4j knows about it, too. However, it doesn't _always_ work. That is, when invoking various bin/nutch commands as described in http://lucene.apache.org/nutch/tutorial8.html , this

Re: hadoop.log vs. nutch.log

2006-08-11 Thread Sami Siren
[EMAIL PROTECTED] wrote: Hi, In bin/nutch I saw this: if [ "$NUTCH_LOGFILE" = "" ]; then NUTCH_LOGFILE='hadoop.log'; fi Wouldn't it make more sense to name the file nutch.log? Everything there is Nutch-specific - Injector, Generator and I see some mapred things. But as a Nutch user I'd

On fetcher slowness

2006-08-11 Thread ogjunk-nutch
Hello, Several people reported issues with slow fetcher in 0.8... I run Nutch on a dual CPU (+HT) box, and have noticed that the fetch speed didn't increase when I went from using 100 threads, to 200 threads. Has anyone else observed the same? I was using 2 map tasks (mapred.map.tasks

Re: [Nutch-general] log4j.properties bug(?)

2006-08-11 Thread Sami Siren
[EMAIL PROTECTED] wrote: Hi Sami, - Original Message [EMAIL PROTECTED] wrote: I assume the idea is that the JVM knows about hadoop.log.dir system property, and then log4j knows about it, too. However, it doesn't _always_ work. That is, when invoking various bin/nutch commands as