I need a recursive file delete for cleaning up after a JUnit test.
There is one in Commons IO (org.apache.commons.io):
FileUtils.deleteDirectory(File directory)
I wonder whether I should add org.apache.commons.io as a new
jar in lib, or arrange a separate libtest directory for jars used
only by JUnit tests.
Paul Baclace wrote:
I need a recursive file delete for cleaning up after a JUnit test.
I just now spotted:
org.apache.nutch.fs.LocalFileSystem.delete(File f)
which does what I want (recursive, local delete).
So no need for commons-io.
Paul
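For reference, the recursive local delete that both FileUtils.deleteDirectory and LocalFileSystem.delete perform boils down to something like the following plain-JDK sketch (a hypothetical helper for illustration, not the actual Commons IO or Nutch source):

```java
import java.io.File;
import java.io.IOException;

// Hypothetical sketch of a recursive local delete, the operation that
// both FileUtils.deleteDirectory and LocalFileSystem.delete provide.
public class RecursiveDelete {

    // Deletes f and, if it is a directory, everything beneath it.
    // Returns true if the final delete of f itself succeeded.
    public static boolean deleteRecursively(File f) {
        if (f.isDirectory()) {
            File[] children = f.listFiles();
            if (children != null) {
                for (File child : children) {
                    deleteRecursively(child);
                }
            }
        }
        // At this point f is either a plain file or an empty directory.
        return f.delete();
    }

    public static void main(String[] args) throws IOException {
        // Build a small scratch tree, then clean it up recursively.
        File dir = new File(System.getProperty("java.io.tmpdir"), "junit-scratch");
        File nested = new File(dir, "nested");
        nested.mkdirs();
        new File(dir, "child").createNewFile();
        new File(nested, "leaf").createNewFile();
        deleteRecursively(dir);
        System.out.println(dir.exists());
    }
}
```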
Joshua,
We have received your message. I'm only remotely involved with
Nutch, so I'm prodding other committers to Nutch to please update the
links to take advantage of the mirroring system in place.
Please - someone reply back volunteering to correct this ASAP.
Erik
On Oct 11,
[
http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12331877 ]
Fuad Efendi commented on NUTCH-109:
---
Ok, I'll do it tonight;
I believe fetcher.server.delay means Wait for a Response from Server, then
throw a Timeout Exception
I can also
Gal Nitzan wrote:
Hi Andrzej,
Yes, it seems like a good option. However, it is GPL, and I noticed in
one of the posts that this license is no good for apache.org :).
If you refer to the bricks automata library, it's BSD-licensed. I
mentioned in one of the posts that the Innovation
[
http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12331897 ]
Fuad Efendi commented on NUTCH-109:
---
Oops... need to learn more!
[protocol-httpclient] Http.java is Singleton, it uses
MultiThreadedHttpConnectionManager
It uses single
Andrzej Bialecki wrote:
100k regexps is still a lot, so I'm not totally sure it would be much
faster, but perhaps worth checking.
I have worked with this type of technology before (minimized,
determinized FSAs, constructed from large sets of string expressions)
and it should be very fast to
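The reason a determinized automaton is fast is that match time depends only on the input length, not on the number of patterns. A toy illustration of that property, using a trie over literal URL prefixes (a hypothetical sketch; a minimized DFA generalizes this to full regular expressions):

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of automaton-style matching: checking a URL against
// a set of literal prefixes in time proportional to the URL length,
// independent of how many prefixes were added. Hypothetical sketch,
// not Nutch code; a minimized DFA extends the idea to full regexps.
public class PrefixTrie {
    private final Map<Character, PrefixTrie> next = new HashMap<Character, PrefixTrie>();
    private boolean terminal;

    // Adds one literal prefix to the trie.
    public void add(String prefix) {
        PrefixTrie node = this;
        for (char c : prefix.toCharArray()) {
            PrefixTrie child = node.next.get(c);
            if (child == null) {
                child = new PrefixTrie();
                node.next.put(c, child);
            }
            node = child;
        }
        node.terminal = true;
    }

    // True if some added prefix is a prefix of url.
    public boolean matchesPrefix(String url) {
        PrefixTrie node = this;
        for (int i = 0; i < url.length(); i++) {
            if (node.terminal) return true;   // a prefix ended here
            node = node.next.get(url.charAt(i));
            if (node == null) return false;   // no transition: reject
        }
        return node.terminal;
    }
}
```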
Can anyone answer this question? I see in the Hits class that there's a
boolean totalIsExact attribute, but this becomes false only when
deduplication (per site) occurs during the search. And I see that
underneath Nutch, Lucene will obtain the documents for only the top
hits.
But does Nutch/Lucene
Erik Hatcher wrote:
Please - someone reply back volunteering to correct this ASAP.
My bad. I'm fixing this right now. In 24 hours all Nutch downloads
should be through the mirrors.
Sorry!
Doug
Doug Cutting wrote:
Andrzej Bialecki wrote:
100k regexps is still a lot, so I'm not totally sure it would be much
faster, but perhaps worth checking.
I have worked with this type of technology before (minimized,
determinized FSAs, constructed from large sets of string expressions)
and it
[
http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12331913 ]
Fuad Efendi commented on NUTCH-109:
---
I was totally wrong and unfair:
Have you seen Kelvin Tan's patch?
You should take a look, it's in JIRA, and addresses some of the
202443 Pages consumed: 13 (at index 13). Links fetched: 233386.
202443 Suspicious outlink count = 30442 for [http://www.dmoz.org/].
202444 Pages consumed: 135000 (at index 135000). Links fetched: 272315.
If there is maxoutlinks already specified in the xml config, why does
nutch bother
Hi all, I was interested in keeping count of the number of times every URL
is selected by a user.
The problem is not how I know which page is clicked, but how can
I store every page/number-of-clicks tuple?
What is the best way to store this information, and let Nutch use it?
Can you
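One minimal way to keep such a tally is an in-memory map persisted as a tab-separated text file, sketched below. This is entirely hypothetical (not a Nutch API), and feeding the counts back into Nutch ranking would be a separate step:

```java
import java.io.File;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a URL -> click-count tally, persisted as a
// tab-separated text file. Not a Nutch API; how the counts would
// influence Nutch scoring is not addressed here.
public class ClickCounter {
    private final Map<String, Integer> counts = new HashMap<String, Integer>();

    public void recordClick(String url) {
        Integer n = counts.get(url);
        counts.put(url, n == null ? 1 : n + 1);
    }

    public int getCount(String url) {
        Integer n = counts.get(url);
        return n == null ? 0 : n;
    }

    // Writes one "url<TAB>count" line per entry.
    public void save(File f) throws IOException {
        PrintWriter out = new PrintWriter(new FileWriter(f));
        try {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                out.println(e.getKey() + "\t" + e.getValue());
            }
        } finally {
            out.close();
        }
    }
}
```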
I think it would be nice to have a few cluster
strategies on the wiki.
It seems there are at least three separate needs: CPU,
storage and bandwidth, and I think the more those
could be cleanly spread to different boxes, the
better.
Guess I am imagining a breakdown that lists, by
priority, how
OpenSearchServlet outputs illegal xml characters
Key: NUTCH-110
URL: http://issues.apache.org/jira/browse/NUTCH-110
Project: Nutch
Type: Bug
Components: searcher
Versions: 0.7
Environment: linux, jdk 1.5
[ http://issues.apache.org/jira/browse/NUTCH-110?page=all ]
[EMAIL PROTECTED] updated NUTCH-110:
Attachment: fixIllegalXmlChars.patch
Attached patch runs all XML text through a check for bad XML characters. This
patch is brutal, dropping silently
Hi,
I'm not an XML expert by any means, but wouldn't it be simpler to just wrap
any text where illegal chars are possible in a <![CDATA[ ]]> section? That
way, the offending characters won't be dropped and the process won't be
lossy, no?
If the CDATA method won't work, and there's no other way
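Worth noting: a CDATA section would not actually help here, since XML 1.0 forbids most control characters (e.g. U+0000 through U+0008) anywhere in a document, CDATA included. Below is a hypothetical sketch of the kind of filter such a patch would apply (not the attached fixIllegalXmlChars.patch itself), using the legal character ranges from the XML 1.0 spec:

```java
// Hypothetical sketch of stripping characters that are illegal in
// XML 1.0 (not the attached fixIllegalXmlChars.patch). Note that
// <![CDATA[ ]]> does not avoid the problem: control characters like
// U+0000-U+0008 are forbidden even inside CDATA sections.
public class XmlCharFilter {

    // Legal XML 1.0 chars: #x9 | #xA | #xD | [#x20-#xD7FF]
    //                    | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns s with all illegal XML characters removed.
    public static String stripIllegal(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        int i = 0;
        while (i < s.length()) {
            int cp = s.codePointAt(i); // handle supplementary chars correctly
            if (isLegalXmlChar(cp)) {
                sb.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return sb.toString();
    }
}
```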
[
http://issues.apache.org/jira/browse/NUTCH-109?page=comments#action_12331950 ]
Fuad Efendi commented on NUTCH-109:
---
Please see attachment for more details.
In order to be fair (protocol-http uses a single shared Socket per Host) I tried
to modify this
[ http://issues.apache.org/jira/browse/NUTCH-109?page=all ]
Fuad Efendi updated NUTCH-109:
--
Attachment: test_results.txt
Nutch - Fetcher - Performance Test - new Protocol-HTTPClient-Innovation