Were you, by any chance, able to figure out how to search on multiple
subcollections? For example, let's say you have the following
subcollections: books, magazines, cd, dvd, software. I would like to have a
search page with checkboxes to pick any of the subcollections. For example,
search on books, ma
You can play around with these two by setting them to "true" in your
nutch-site.xml file. Hadoop logs just about everything to logs/hadoop.log. The
file rolls over each day automatically, with the date appended onto the old file.
  <property>
    <name>http.verbose</name>
    <value>false</value>
    <description>If true, HTTP will log more verbosely.</description>
  </property>
fetcher.verbo
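As a sketch, the overrides in nutch-site.xml would look something like this (the second property name is cut off above; fetcher.verbose is my assumption for what it refers to):

```xml
<!-- nutch-site.xml: values here override nutch-default.xml.
     Sketch only; the second property name is assumed to be fetcher.verbose. -->
<configuration>
  <property>
    <name>http.verbose</name>
    <value>true</value>
  </property>
  <property>
    <name>fetcher.verbose</name>
    <value>true</value>
  </property>
</configuration>
```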
Hi,
How to set Nutch-0.8.1 to save logs into log files when running the crawl
job?
Is it setting in the nutch-site.xml, or other configuration file?
Thanks for your help in advance!
--
kevin
I move between XP/Mac/Sun/Linux based upon client or where I am (work
vs. home) and found ant to be a good cross-platform scripting language. I
went to run a nutch crawl on my XP box, and the script is not set up to
run in an XP environment (yes, I could install cygwin).
I have started creating a
I tested this last night, so in case anyone wants to know the answer, yes,
this can be done.
If all you need are the Lucene indexes for your website, you can do the
crawl, do another crawl, and then run an
IndexMerger (from the nutch.crawl API) on the two.
Then run a DeleteDuplicates on that new index.
wham
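In command form, the merge-then-dedup sequence above might look like this — a sketch assuming a stock Nutch 0.8/0.9 layout, with illustrative crawl directory names:

```shell
# Merge the indexes from two separate crawls into one
# (runs IndexMerger under the hood; paths are examples only)
bin/nutch merge crawl-merged/index crawl1/indexes crawl2/indexes

# Then remove duplicate documents from the merged index
# (runs DeleteDuplicates)
bin/nutch dedup crawl-merged/index
```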
That was it - the log4j.properties in the original nutch.war under 0.8
is NOT the same as the log4j.properties in the 0.8 conf directory (which
is the same as the 0.9 one) Thx for pointing me in the right
direction
rp
Sean Dean wrote:
I think it might be getting logged into a file, that you specified with that
command line option OR either Tomcat or Nutch isn't reading the
"log4j.properties" file in your servlet container under ROOT/WEB-INF/classes/.
Do you have that file there?
If so, does it have something like "log4j.ro
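For reference, the start of a minimal log4j.properties for the webapp might look something like the sketch below; the appender name and patterns are illustrative, and the stock Nutch file defines more loggers than this:

```properties
# Minimal sketch of a daily-rolling file appender setup
log4j.rootLogger=INFO,DRFA
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %-5p %c{2} - %m%n
```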
If you mean from the DFS to the local filesystem, you can do a copyToLocal.
If you mean from a binary to a readable format, you would need to write
a MapReduce job and specify a TextOutputFormat. If you are trying to
read the crawl database, you can use the nutch readdb command.
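Assuming Nutch 0.8's command-line tools, the options above might look like this (all paths are illustrative):

```shell
# Copy a directory out of the DFS to the local filesystem
bin/hadoop dfs -copyToLocal crawl/crawldb /tmp/crawldb

# Print summary statistics for the crawl database
bin/nutch readdb crawl/crawldb -stats

# Dump the crawl database to plain text for inspection
bin/nutch readdb crawl/crawldb -dump /tmp/crawldb-dump
```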
Dennis
David Barg
What was the problem?
Dennis
Frank Kempf wrote:
solved
THX
Fedora Core 5 minimal install with Java 1.5.10
Tomi NA wrote:
On 9/26/06, Jim Wilson <[EMAIL PROTECTED]> wrote:
I'd do it, but I'm too busy being consumed with worries about the
lack of
support for HTTP/NTLM credentials and SMB fileshare indexing.
Arrrgg - tis another sad day in the life of t
That took out the error (thanks!) but my log file does not show what it
did in the past (below) and all I did was switch from 0.8 to 0.9..??
Current log just has the server startup and nothing from Nutch..??
0.8 logging
INFO: Server startup in 36336 ms
2006-11-28 16:14:16,685 INFO Configurat
Carsten Lehmann wrote:
Dear List,
I think there is another robots.txt-related problem which is not
addressed by NUTCH-344,
but also results in an aborted fetch.
I am sure that in my last fetch all fetcher threads died
while they were waiting for a robots.txt-file to be delivered by a not
properl
RP wrote:
No changes to logging configuration that worked fine at 0.8 but at 0.9
I get this once I do a query (query returns just fine):
INFO: Server startup in 1947 ms
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a directory)
Most likely your "hadoop.l
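One common cause, assuming the stock log4j.properties builds the log path from ${hadoop.log.dir}/${hadoop.log.file}: if those system properties are not set inside the servlet container, setFile() is handed an incomplete path (hence the "/ (Is a directory)" error). A sketch of a fix for Tomcat, with illustrative paths:

```shell
# e.g. in catalina.sh or a Tomcat environment file; directory is an example
export CATALINA_OPTS="-Dhadoop.log.dir=/var/log/nutch -Dhadoop.log.file=hadoop.log"
```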
Dear List,
I think there is another robots.txt-related problem which is not
addressed by NUTCH-344,
but also results in an aborted fetch.
I am sure that in my last fetch all fetcher threads died
while they were waiting for a robots.txt-file to be delivered by a not
properly responding web server.