I haven't looked, but do they have a robots.txt file or a META tag set
that may be stopping things?
rp
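
A quick way to test the robots.txt theory is to run the site's rules through a parser and ask whether a given URL may be fetched. The following is only a minimal sketch using Python's standard-library `urllib.robotparser`; the sample rules, the `MyCrawler` agent name, and the example.com URLs are made-up placeholders, and it does not cover the META tag case.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules for illustration only; substitute the real file's content.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    for url in ("http://example.com/private/page.html",
                "http://example.com/index.html"):
        print(url, is_allowed(SAMPLE_ROBOTS, "MyCrawler", url))
```

If the URLs in the crawl list come back as disallowed for the crawler's agent name, that would explain pages being skipped.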
Meryl Silverburgh wrote:
All,
Can you please help me with my problem? I have posted my question a
few times, but I still can't solve it. I would appreciate it if anyone
can help me with that.
tch rate is way below 1
page/sec when I do this. The same crawl list with the old fetch flies by
in comparison (~10 pages/sec). The box is only doing Nutch. Has anyone
else seen this or have a workaround?
--
rp
- at
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1487)
2007-02-15 20:05:15,307 INFO NutchController - at
java.lang.Thread.run(Thread.java:595)
RP wrote:
Hi all,
Pulled down the WEB2 stuff via SVN to finally look at the keymatch and
spellchecker stuff. Did the ANT thing pe
at
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1487)
at java.lang.Thread.run(Thread.java:595)
Thanks,
rp
M
down to 128MB and see what happens
rp
Sean Dean wrote:
It looks like you don't have enough RAM to maintain the quick speeds you were
seeing when the index was only around 3000 pages.
Nutch scales very well, but the hardware behind it must also. Using quick calculations and common
from commercial engines?
Tweakability is good, but real world performance is better and we just
want to be sure our results are based on something more than my playing
around with the values
--
rp
That was it - the log4j.properties in the original nutch.war under 0.8
is NOT the same as the log4j.properties in the 0.8 conf directory (which
is the same as the 0.9 one). Thanks for pointing me in the right
direction.
rp
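
For anyone chasing the same mismatch, the two log4j.properties files can be compared directly. This is a rough sketch with Python's `difflib`, not anything from Nutch itself; the file contents in the sample strings are invented for illustration, and in practice you would read the copy inside the deployed war and the one in the conf directory.

```python
import difflib


def diff_properties(text_a, text_b, name_a="a", name_b="b"):
    """Return unified-diff lines between two properties files.

    Blank lines and '#' comments are dropped and the remaining lines are
    sorted, so only real differences in settings show up.
    """
    def normalise(text):
        lines = (line.strip() for line in text.splitlines())
        return sorted(l for l in lines if l and not l.startswith("#"))

    return list(difflib.unified_diff(normalise(text_a), normalise(text_b),
                                     fromfile=name_a, tofile=name_b,
                                     lineterm=""))


# Invented sample contents; real files will differ.
WAR_COPY = "log4j.rootLogger=INFO,DRFA\n# no File set here\n"
CONF_COPY = "log4j.rootLogger=INFO,DRFA\nlog4j.appender.DRFA.File=/tmp/example.log\n"

if __name__ == "__main__":
    for line in diff_properties(WAR_COPY, CONF_COPY, "war", "conf"):
        print(line)
```

A line that appears only on one side (prefixed with `+` or `-`) points at the setting that differs between the two copies.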
Sean Dean wrote:
I think it might be getting logged into a file
startup in 5967 ms
Andrzej Bialecki wrote:
RP wrote:
No changes to logging configuration that worked fine at 0.8 but at
0.9 I get this once I do a query (query returns just fine):
INFO: Server startup in 1947 ms
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: / (Is a
ter the fetch is done, and
my combined times seem shorter than doing it in one step as I'm also CPU
AND bandwidth throttled. As always, your mileage may vary so give some
things a try and you might get a nice surprise in improved speed
--
rp
(Http11AprProtocol.java:706)
at
org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1487)
at java.lang.Thread.run(Thread.java:595)
log4j:ERROR Either File or DatePattern options are not set for appender
[DRFA].
--
rp
-keymatch-onebox.googlecode.com/svn/trunk/Keymatch.java
2006/12/19, RP <[EMAIL PROTECTED]>:
Let me qualify this - ad banner rotation is dealt with - I'm looking for
something that will use our Nutch engine to serve up relevant links from
people who pay for that privilege. We do not wa
rve those links up
to the searchers to help offset the cost of the service and serve up or
flag links that rank first because of payment followed by normal search
link results
rp
Sean Dean wrote:
I might be totally off base with what you're asking to do, but take a look at
this open sourc
to make this happen so if there
are any easy or "best practices" ways of doing this, any help/pointers
would be appreciated.
--
rp
at org.apache.nutch.crawl.Crawl.main(Crawl.java:130)
Is this a bug or my instance's misconfiguration?
Running on single box, java-1.5.0.09
Thanks.
--
rp
Fixed - re-ran merge again and all is well - a trip through the indexes
with Luke showed the original segments still there and the re-run took
care of it
rp
RP wrote:
Upgrade proceeding on 0.9x - was able to parse and index just fine
after first halt, but now I get an error in the logs
ange from default 8 to 9
procedure was the addition of the mapred.speculation=false which cured
the parse error and allowed me to continue and index. Only other thing
may be the linkdb not being put where it was told, but I moved it over
(it built fine) and the index operation ran without issue.
--
rp
Andrzej - Thanks, that conf switch seemed to take care of it! Any
idea if the Hadoop native stuff will give any performance boost?
rp
Andrzej Bialecki wrote:
RP wrote:
Pardon my ignorance, but where and how do I do this..?? Nothing in
conf files that I can see as a switch
Ah-ha
Pardon my ignorance, but where and how do I do this..?? Nothing in conf
files that I can see as a switch
rp
Andrzej Bialecki wrote:
RP wrote:
2006-12-15 00:52:43,895 WARN mapred.LocalJobRunner - job_dokmpz
java.lang.NullPointerException
at
d.PhasedFileSystem.commit(PhasedFileSystem.java:211)
at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:315)
at
org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:137)
--
rp