Re: subcollections IT WORKS

2006-12-21 Thread WebDev Freak
Were you able, by any chance, to figure out how to search on multiple subcollections? For example, let's say you have the following subcollections: books, magazines, cd, dvd, software. I would like to have a search page with checkboxes to pick any of the subcollections. For example search on books, ma

Re: Hi...How to set Nutch-0.8.1 to save logs into log files when running the crawl job?

2006-12-21 Thread Sean Dean
You can play around with these two by setting them to "true" in your nutch-site.xml file. Hadoop logs just about everything to logs/hadoop.log. The file truncates each day automatically, and places .-- onto it. http.verbose (default: false): If true, HTTP will log more verbosely. fetcher.verbo
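For reference, overrides in nutch-site.xml are Hadoop-style `<property>` elements inside a `<configuration>` root. The Java sketch below only prints such a file with both switches turned on. It assumes the second property is named `fetcher.verbose`, because the name is cut off above, and that no other overrides are needed. Merge its output into an existing nutch-site.xml instead of overwriting that file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: print a nutch-site.xml that turns on verbose logging.
// http.verbose is named in the message; fetcher.verbose is an ASSUMED
// completion of the truncated "fetcher.verbo".
class VerboseSiteConfig {

    // Render one Hadoop-style <property> element.
    static String property(String name, String value) {
        return "  <property>\n"
             + "    <name>" + name + "</name>\n"
             + "    <value>" + value + "</value>\n"
             + "  </property>\n";
    }

    // Render a complete <configuration> document, keeping insertion order.
    static String render(Map<String, String> props) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\"?>\n");
        sb.append("<configuration>\n");
        for (Map.Entry<String, String> e : props.entrySet()) {
            sb.append(property(e.getKey(), e.getValue()));
        }
        sb.append("</configuration>\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("http.verbose", "true");
        props.put("fetcher.verbose", "true"); // assumed full property name
        System.out.print(render(props));
    }
}
```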

Hi...How to set Nutch-0.8.1 to save logs into log files when running the crawl job?

2006-12-21 Thread kevin
Hi, how do I set Nutch-0.8.1 to save logs into log files when running the crawl job? Is it set in nutch-site.xml, or in another configuration file? Thanks for your help in advance! -- kevin

convert bin/nutch to use ant?

2006-12-21 Thread Phillip Rhodes
I move between XP/Mac/Sun/Linux based upon client or where I am (work vs. home) and found ant to be a good cross-platform scripting language. I went to run a nutch crawl on my XP box, and the script is not set up to run in an XP environment (yes, I could install cygwin). I have started creating a
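Ant itself is not shown here. As an alternative illustration of the same cross-platform idea, this minimal Java sketch rebuilds a classpath the way bin/nutch does and launches the chosen main class. It uses `File.pathSeparator` so the same code works on XP and Unix. The directory layout (`conf/`, `nutch-*.jar` and `lib/*.jar` under the Nutch home) and the two-entry command table are assumptions, and `NutchLauncher` is a hypothetical name. The sketch also leaves out the other JVM options bin/nutch passes, such as the heap size.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical cross-platform stand-in for the launcher part of bin/nutch.
class NutchLauncher {

    // ASSUMED subset of the command -> main-class table in bin/nutch.
    static final Map<String, String> COMMANDS = Map.of(
        "crawl", "org.apache.nutch.crawl.Crawl",
        "inject", "org.apache.nutch.crawl.Injector");

    // conf dir first, then nutch-*.jar, then lib/*.jar, joined with
    // File.pathSeparator (";" on Windows, ":" elsewhere).
    static String classpath(File nutchHome) {
        List<String> entries = new ArrayList<>();
        entries.add(new File(nutchHome, "conf").getPath());
        addJars(entries, nutchHome, "nutch-");
        addJars(entries, new File(nutchHome, "lib"), "");
        return String.join(File.pathSeparator, entries);
    }

    // Append every *.jar in dir whose name starts with prefix, sorted by path.
    private static void addJars(List<String> entries, File dir, String prefix) {
        File[] jars = dir.listFiles((d, name) -> name.startsWith(prefix) && name.endsWith(".jar"));
        if (jars == null) return; // directory missing or unreadable
        Arrays.sort(jars);
        for (File jar : jars) entries.add(jar.getPath());
    }

    // java -classpath <cp> <main-class> <args...>
    static List<String> commandLine(File nutchHome, String command, String... args) {
        String mainClass = COMMANDS.get(command);
        if (mainClass == null) throw new IllegalArgumentException("unknown command: " + command);
        String java = new File(System.getProperty("java.home"), "bin" + File.separator + "java").getPath();
        List<String> cmd = new ArrayList<>(List.of(java, "-classpath", classpath(nutchHome), mainClass));
        cmd.addAll(Arrays.asList(args));
        return cmd;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: NutchLauncher <nutch-home> <command> [args...]");
            System.exit(1);
        }
        List<String> cmd = commandLine(new File(args[0]), args[1], Arrays.copyOfRange(args, 2, args.length));
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        System.exit(p.waitFor());
    }
}
```

For example, `java NutchLauncher C:\nutch-0.8.1 crawl urls -dir crawl -depth 3` would run the crawl without cygwin, provided the assumed layout matches the install.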

Re: Fun question for index merge

2006-12-21 Thread sdeck
I tested this last night, so in case anyone wants to know the answer, yes, this can be done. If all you need are the Lucene indexes for your website, you can do the crawl, do another crawl, and then do an IndexMerger (from the nutch.crawl api dir) Then do a DeleteDuplicates on that new index wham
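A toy in-memory model helps picture the merge-then-dedup step. The sketch assumes two passes. The first pass keeps the most recently fetched copy of each URL. The second keeps the highest-scoring copy of each content digest, and prefers the shorter URL on a tie. It works on a plain list of hypothetical `Doc` records rather than on Lucene indexes, and it is not Nutch's DeleteDuplicates code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Simplified model of duplicate removal after merging two crawls' indexes.
class DedupSketch {

    // One indexed page (hypothetical fields): URL, content digest, score, fetch time in ms.
    record Doc(String url, String digest, float score, long fetchTime) {}

    static List<Doc> dedup(List<Doc> merged) {
        // Pass 1: for each URL keep the most recently fetched copy.
        Map<String, Doc> byUrl = new HashMap<>();
        for (Doc d : merged) {
            byUrl.merge(d.url(), d, (a, b) -> b.fetchTime() > a.fetchTime() ? b : a);
        }
        // Pass 2: for each content digest keep the best survivor of pass 1.
        Map<String, Doc> byDigest = new HashMap<>();
        for (Doc d : byUrl.values()) {
            byDigest.merge(d.digest(), d, DedupSketch::better);
        }
        return new ArrayList<>(byDigest.values());
    }

    // Higher score wins; on equal scores the shorter URL wins.
    private static Doc better(Doc a, Doc b) {
        if (a.score() != b.score()) return a.score() > b.score() ? a : b;
        return b.url().length() < a.url().length() ? b : a;
    }

    public static void main(String[] args) {
        List<Doc> merged = List.of(
            new Doc("http://example.com/", "d1", 1.0f, 100),            // first crawl
            new Doc("http://example.com/", "d1", 1.0f, 200),            // second crawl
            new Doc("http://example.com/index.html", "d1", 1.0f, 150),  // same content
            new Doc("http://example.org/", "d2", 0.5f, 100));
        dedup(merged).forEach(System.out::println);
    }
}
```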

Re: Nutch 0.9 logging to catalina.out fails

2006-12-21 Thread RP
That was it - the log4j.properties in the original nutch.war under 0.8 is NOT the same as the log4j.properties in the 0.8 conf directory (which is the same as the 0.9 one) Thx for pointing me in the right direction rp Sean Dean wrote: I think it might be getting logged into a file,

Re: Nutch 0.9 logging to catalina.out fails

2006-12-21 Thread Sean Dean
I think it might be getting logged into a file that you specified with that command-line option, OR Tomcat or Nutch isn't reading the "log4j.properties" file in your servlet container under ROOT/WEB-INF/classes/. Do you have that file there? If so, does it have something like "log4j.ro
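Sean's two questions (is the file there, and does it define a root logger?) can be checked mechanically. This small Java sketch loads the file with `java.util.Properties` and reports whether `log4j.rootLogger` is set. The default path `webapps/ROOT/WEB-INF/classes/log4j.properties`, taken relative to the Tomcat home, is an assumption about the install layout.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Diagnostic sketch: does the webapp's log4j.properties define a root logger?
class Log4jCheck {

    // Parse properties from a Reader; return log4j.rootLogger or null if absent.
    private static String rootLogger(Reader reader) {
        Properties props = new Properties();
        try {
            props.load(reader);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty("log4j.rootLogger");
    }

    // Same check on in-memory text (handy for testing).
    static String rootLoggerFrom(String text) {
        return rootLogger(new StringReader(text));
    }

    // Returns null when the file is missing or the key is not set.
    static String rootLoggerInFile(Path file) {
        if (!Files.isRegularFile(file)) return null;
        try (Reader r = Files.newBufferedReader(file)) {
            return rootLogger(r);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASSUMED default location, relative to the Tomcat home directory.
        Path file = Paths.get(args.length > 0 ? args[0]
                : "webapps/ROOT/WEB-INF/classes/log4j.properties");
        if (!Files.isRegularFile(file)) {
            System.out.println("missing: " + file);
            return;
        }
        String root = rootLoggerInFile(file);
        System.out.println(root == null ? "log4j.rootLogger not set in " + file
                                        : "log4j.rootLogger = " + root);
    }
}
```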

Re: dump page content to Windows file system?

2006-12-21 Thread Dennis Kubes
If you mean from the DFS to the local filesystem, you can do a copyToLocal. If you mean from a binary to a readable format, you would need to write a MapReduce job and specify a TextOutputFormat. If you are trying to read the crawl database, you can use the nutch readdb command. Dennis David Barg

Re: Cannot generate all injected URLS

2006-12-21 Thread Dennis Kubes
What was the problem? Dennis Frank Kempf wrote: solved THX

Re: Which Operating-System do you use for Nutch

2006-12-21 Thread Dennis Kubes
Fedora Core 5 minimal install with Java 1.5.10 Tomi NA wrote: On 9/26/06, Jim Wilson <[EMAIL PROTECTED]> wrote: I'd do it, but I'm too busy being consumed with worries about the lack of support for HTTP/NTLM credentials and SMB fileshare indexing. Arrrgg - tis another sad day in the life of t

Re: Nutch 0.9 logging to catalina.out fails

2006-12-21 Thread RP
That took out the error (thanks!) but my log file does not show what it did in the past (below) and all I did was switch from 0.8 to 0.9..?? Current log just has the server startup and nothing from Nutch..?? 0.8 logging INFO: Server startup in 36336 ms 2006-11-28 16:14:16,685 INFO Configurat

Re: unavailable robots.txt kills fetch (not NUTCH-344)

2006-12-21 Thread Andrzej Bialecki
Carsten Lehmann wrote: Dear List, I think there is another robots.txt-related problem which is not addressed by NUTCH-344, but also results in an aborted fetch. I am sure that in my last fetch all fetcher threads died while they were waiting for a robots.txt-file to be delivered by a not properl

Re: Nutch 0.9 logging to catalina.out fails

2006-12-21 Thread Andrzej Bialecki
RP wrote: No changes to logging configuration that worked fine at 0.8 but at 0.9 I get this once I do a query (query returns just fine): INFO: Server startup in 1947 ms log4j:ERROR setFile(null,true) call failed. java.io.FileNotFoundException: / (Is a directory) Most likely your "hadoop.l
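Here is one reading of the `/ (Is a directory)` error, built on two assumptions. The first is that the file appender in Nutch's log4j.properties uses `${hadoop.log.dir}/${hadoop.log.file}`. The second is that log4j 1.x substitutes an empty string for variables that are not defined. Under both, the path collapses to a bare `/` when Tomcat starts the JVM without those system properties. The Java sketch below reimplements that substitution for illustration only; it is not log4j's code.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustration of ${var} expansion in which undefined variables become "".
class LogPathSubst {

    private static final Pattern VAR = Pattern.compile("\\$\\{([^}]+)\\}");

    // Replace each ${name} with vars.get(name), or "" when it is undefined.
    static String expand(String template, Map<String, String> vars) {
        Matcher m = VAR.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            String value = vars.getOrDefault(m.group(1), "");
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "${hadoop.log.dir}/${hadoop.log.file}"; // assumed appender File value
        // Inside Tomcat, neither system property is set: the result is "/".
        System.out.println(expand(template, Map.of()));
        // With both properties defined for the JVM, a real file path results.
        System.out.println(expand(template,
                Map.of("hadoop.log.dir", "/var/log/nutch", "hadoop.log.file", "hadoop.log")));
    }
}
```

If this diagnosis holds, a likely fix is to define both properties for Tomcat's JVM (for example with `-D` options in `CATALINA_OPTS`) or to give the appender an absolute path in the webapp's own log4j.properties.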

unavailable robots.txt kills fetch (not NUTCH-344)

2006-12-21 Thread Carsten Lehmann
Dear List, I think there is another robots.txt-related problem which is not addressed by NUTCH-344, but also results in an aborted fetch. I am sure that in my last fetch all fetcher threads died while they were waiting for a robots.txt-file to be delivered by a not properly responding web server.
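One general way to stop a stalled server from pinning a fetcher thread is to put a time limit on the wait for robots.txt. The sketch below uses plain `java.util.concurrent` and is not Nutch's fetcher or its robots.txt handling. The fetch is an arbitrary `Callable`. The caller decides what to do after a timeout (skip the host, assume no rules, or retry later); that is a policy choice the sketch does not make.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch: bound how long a caller waits for a robots.txt fetch.
class RobotsTimeout {

    enum Outcome { FETCHED, TIMED_OUT, FAILED }

    // Run fetchTask on the executor and wait at most timeoutMs for it.
    static Outcome fetchWithTimeout(ExecutorService executor, Callable<String> fetchTask, long timeoutMs) {
        Future<String> f = executor.submit(fetchTask);
        try {
            f.get(timeoutMs, TimeUnit.MILLISECONDS);
            return Outcome.FETCHED;
        } catch (TimeoutException e) {
            f.cancel(true); // stop waiting and request interruption of the worker
            return Outcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return Outcome.FAILED; // the fetch itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Outcome.FAILED;
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newCachedThreadPool();
        try {
            Callable<String> stalled = () -> { Thread.sleep(10_000); return ""; };
            Callable<String> ok = () -> "User-agent: *\nDisallow:\n";
            System.out.println(fetchWithTimeout(pool, stalled, 200));
            System.out.println(fetchWithTimeout(pool, ok, 200));
        } finally {
            pool.shutdownNow();
        }
    }
}
```

`cancel(true)` only interrupts the worker thread. A blocking `java.net` socket read ignores interrupts, so a real robots.txt fetch would also need a socket read timeout.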