That sounds creepy indeed. It would still need a similar amount of RAM plus
network overhead. Would a bloom filter be useful at all? It takes a lot less
space and I can live with a non-deterministic approach.
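For illustration, a Bloom filter over URL strings could look roughly like the sketch below. Everything in it is an assumption made for the example: the class name UrlBloomFilter, the FNV-1a hashing and the sizing are hypothetical and not part of Nutch. The trade-off is the one mentioned above: it may report a URL as seen when it was not (false positive), but it never misses a URL that was added.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical helper, not part of Nutch: a small Bloom filter for URL strings. */
final class UrlBloomFilter {
    private final BitSet bits;
    private final int numBits;
    private final int numHashes;

    UrlBloomFilter(int numBits, int numHashes) {
        this.bits = new BitSet(numBits);
        this.numBits = numBits;
        this.numHashes = numHashes;
    }

    // FNV-1a style 32-bit hash over the bytes, perturbed by a seed.
    private static int fnv1a(byte[] data, int seed) {
        int h = 0x811C9DC5 ^ seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 16777619;
        }
        return h;
    }

    // Double hashing: the i-th bit position is derived from two base hashes.
    private int index(int h1, int h2, int i) {
        return Math.floorMod(h1 + i * h2, numBits);
    }

    /** Records the URL by setting numHashes bits. */
    void add(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9) | 1; // force odd so the step is never zero
        for (int i = 0; i < numHashes; i++) {
            bits.set(index(h1, h2, i));
        }
    }

    /** False means definitely never added; true means probably added. */
    boolean mightContain(String url) {
        byte[] data = url.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0);
        int h2 = fnv1a(data, 0x9E3779B9) | 1;
        for (int i = 0; i < numHashes; i++) {
            if (!bits.get(index(h1, h2, i))) {
                return false;
            }
        }
        return true;
    }
}
```

Memory use is fixed at numBits bits regardless of how many URLs are added; the false positive rate rises as the filter fills up.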
On Tuesday 18 October 2011 01:45:20 Sergey A Volkov wrote:
Hi
I think some
With ParserChecker it's similarly truncated. Could it be the fact that it's a
.asp page? The output is as follows:
# bin/nutch org.apache.nutch.parse.ParserChecker -dumpText http://www.canisius.edu/archives/ruddick.asp
-
Url
---
Strange! I parsed it yesterday as well with parse-tika and the Boilerpipe
patch enabled and got a lot of output. Can you try a different parser? Your
settings look fine, but are there any other exotic settings or custom code
you use?
On Tuesday 18 October 2011 15:53:26 Chip Calhoun wrote:
This is strange. What plugin actually fails? What OS are you using? How do you
compile? A normal 1.3 check out on Linux will compile fine with ant.
On Tuesday 18 October 2011 14:58:13 Ashish Mehrotra wrote:
All,
I have tried to compile the nutch src and plugins but have faced this issue
--
Aha! It turns out that removing protocol-httpclient from my nutch-site.xml's
plugin.includes value fixes this. If I'm remembering correctly, I only added
this in the hope that it would fix something else that it didn't actually fix,
so hopefully removing it won't break anything.
-Original
I am using Mac OS X Snow Leopard. Java 1.6 for OS X
On Oct 18, 2011, at 7:00 AM, Markus Jelsma markus.jel...@openindex.io wrote:
This is strange. What plugin actually fails? What OS are you using? How do you
compile? A normal 1.3 check out on Linux will compile fine with ant.
On Tuesday
Good to hear. Keep in mind, that plugin is broken and should not be used at
all.
Aha! It turns out that removing protocol-httpclient from my
nutch-site.xml's plugin.includes value fixes this. If I'm remembering
correctly, I only added this in the hope that it would fix something else
that it
Actually, some key-value stores use Bloom filters for a similar purpose.
What is your queue size? And what is the redirect rate?
If most redirects are not cross-domain and the average number of URLs per
domain is not very big, a fixed-size cache in FetchItemQueue may
help. But this leads to lots of changes
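As a rough illustration of the fixed-size cache idea, the sketch below keeps only the most recently seen URLs and evicts the oldest once a capacity is reached. The class name RecentUrlCache and its methods are hypothetical; this is not existing FetchItemQueue code, and wiring something like it into the fetcher is exactly the part that would need the larger changes.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not Nutch code: a bounded LRU set of recently seen URLs. */
final class RecentUrlCache {
    private final Map<String, Boolean> seen;

    RecentUrlCache(final int capacity) {
        // accessOrder=true keeps least recently used entries at the front.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                // Evict the least recently used URL once capacity is exceeded.
                return size() > capacity;
            }
        };
    }

    /** Records the URL and returns true only if it was not already cached. */
    boolean markIfNew(String url) {
        return seen.put(url, Boolean.TRUE) == null;
    }

    /** Number of URLs currently held. */
    int count() {
        return seen.size();
    }
}
```

Unlike a Bloom filter, this gives exact answers for the URLs it still holds, but it forgets anything that has been evicted, so it only helps when repeated redirects arrive close together.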
How/where are you trying to compile this from? Command line or from within
eclipse?
On Tue, Oct 18, 2011 at 4:32 PM, Ashish M ashme...@yahoo.com wrote:
I am using Mac OS X Snow Leopard. Java 1.6 for OS X
On Oct 18, 2011, at 7:00 AM, Markus Jelsma markus.jel...@openindex.io
wrote:
This is
Have tried from command line as well as from IDE (Idea) ... Fails in the
resolve-default target of build.xml.
I tried it and it ran fine on a Windows machine. I downloaded the Nutch 1.3 src
and the ant binary. On running ant from the command line, the build completed
Sent from my iPhone. Please ignore the
I am talking about this patch:
https://issues.apache.org/jira/browse/NUTCH-1098
It is not committed yet.
I've never seen the unresolved dependencies that you mention. I can only
assume that this must be something environment specific...
[ivy:resolve] ::
[ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
[ivy:resolve]
That happened when I removed the log attribute from the ivy:resolve tag in
build.xml. Removing the log attribute should not have caused that.
Is there any particular version of ivy you are expecting?
Sent from my iPhone. Please ignore the typos.
On Oct 18, 2011, at 9:35 AM, lewis john
Details of the ivy shipped with Nutch 1.3 can be seen here [1]. I'm just curious
why compiling your plugins is not working. If you're using ant then
ant compile-plugins
will obviously do the business, however
ant compile
should also result in a successful compile job. If the software releases you
That's right, Lewis.
I compared the file sizes and they seem to be the same as present in svn.
As I mentioned previously, the problem is in the resolve-default target, which
uses ivy:resolve, and that fails for me on the log attribute.
PS: I have been trying this on Mac OS 10.6.8. The build worked fine