Re: http.redirect.max and duplicate fetch/parse

2011-10-18 Thread Markus Jelsma
That sounds creepy indeed. It would still need a similar amount of RAM plus network overhead. Would a bloom filter be useful at all? It takes a lot less space and I can live with a non-deterministic approach. On Tuesday 18 October 2011 01:45:20 Sergey A Volkov wrote: Hi I think some
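A minimal sketch of the bloom filter idea raised above, assuming the goal is to avoid re-fetching a URL that was already reached through a redirect. Everything here is hypothetical and not part of Nutch: the `BloomFilter` class, the `should_fetch` helper, and the default sizes are illustrative only. The filter never forgets a URL it has seen, but it may occasionally report an unseen URL as seen, which is the non-deterministic trade-off mentioned in the message.

```python
import hashlib


class BloomFilter:
    """Fixed-size probabilistic set: no false negatives, occasional false positives."""

    def __init__(self, num_bits: int = 1 << 20, num_hashes: int = 5) -> None:
        # Sizes are illustrative assumptions, not tuned values.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, url: str):
        # Derive several bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(url.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, url: str) -> None:
        for pos in self._positions(url):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, url: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(url)
        )


def should_fetch(seen: BloomFilter, url: str) -> bool:
    """Return True the first time a URL is offered, False on later offers."""
    if url in seen:
        return False
    seen.add(url)
    return True
```

With the default of 2^20 bits the filter occupies 128 KiB regardless of how many URLs pass through it; the false-positive rate rises as more URLs are added, so the bit count would need to be sized against the expected queue volume.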

RE: Truncated content despite my content.limit settings.

2011-10-18 Thread Chip Calhoun
With ParserChecker it's similarly truncated. Could it be the fact that it's a .asp page? The output is as follows: # bin/nutch org.apache.nutch.parse.ParserChecker -dumpText http://www.canisius.edu/archives/ruddick.asp - Url ---

Re: Truncated content despite my content.limit settings.

2011-10-18 Thread Markus Jelsma
Strange! I parsed it yesterday as well with parse-tika and the Boilerpipe patch enabled and got a lot of output. Can you try a different parser? Your settings look fine but are there any other exotic settings you use or custom code? On Tuesday 18 October 2011 15:53:26 Chip Calhoun wrote:

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread Markus Jelsma
This is strange. What plugin actually fails? What OS are you using? How do you compile? A normal 1.3 check out on Linux will compile fine with ant. On Tuesday 18 October 2011 14:58:13 Ashish Mehrotra wrote: All, I have tried to compile the nutch src and plugins but have faced this issue --

RE: Truncated content despite my content.limit settings.

2011-10-18 Thread Chip Calhoun
Aha! It turns out that removing protocol-httpclient from my nutch-site.xml's plugin.includes value fixes this. If I'm remembering correctly, I only added this in the hope that it would fix something else that it didn't actually fix, so hopefully removing it won't break anything. -Original

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread Ashish M
I am using mac os x snow leopard. Java 1.6 for OS X On Oct 18, 2011, at 7:00 AM, Markus Jelsma markus.jel...@openindex.io wrote: This is strange. What plugin actually fails? What OS are you using? How do you compile? A normal 1.3 check out on Linux will compile fine with ant. On Tuesday

Re: Truncated content despite my content.limit settings.

2011-10-18 Thread Markus Jelsma
Good to hear. Keep in mind, that plugin is broken and should not be used at all. Aha! It turns out that removing protocol-httpclient from my nutch-site.xml's plugin.includes value fixes this. If I'm remembering correctly, I only added this in the hope that it would fix something else that it

Re: http.redirect.max and duplicate fetch/parse

2011-10-18 Thread Sergey A Volkov
Actually some kv storages use a bloom filter for a similar purpose. What is your queue size? And what is the redirect rate? If most redirects are not cross-domain and the average number of urls per domain is not very big, some fixed-size cache in FetchItemQueue may help. But this leads to lots of changes
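The fixed-size cache suggested above could look roughly like the sketch below, assuming one small least-recently-used cache per host queue. `RecentUrlCache` and its `capacity` default are hypothetical names and values; this is not how FetchItemQueue is implemented, only an illustration of the bounded-memory idea.

```python
from collections import OrderedDict


class RecentUrlCache:
    """Bounded cache of recently seen URLs; the oldest entry is evicted when full."""

    def __init__(self, capacity: int = 1000) -> None:
        # Capacity is an illustrative assumption.
        self.capacity = capacity
        self._entries = OrderedDict()

    def seen_before(self, url: str) -> bool:
        """Record the URL and report whether it was already in the cache."""
        if url in self._entries:
            # Refresh the entry so frequently repeated URLs stay cached.
            self._entries.move_to_end(url)
            return True
        self._entries[url] = None
        if len(self._entries) > self.capacity:
            # Drop the least recently used URL to keep memory fixed.
            self._entries.popitem(last=False)
        return False
```

Unlike a bloom filter, this gives exact answers for the URLs it still holds, but a URL evicted from the cache can be fetched again, so it only helps when duplicates tend to arrive close together within one domain.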

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread lewis john mcgibbney
How/where are you trying to compile this from? Command line or from within eclipse? On Tue, Oct 18, 2011 at 4:32 PM, Ashish M ashme...@yahoo.com wrote: I am using mac os x snow leopard. Java 1.6 for OS X On Oct 18, 2011, at 7:00 AM, Markus Jelsma markus.jel...@openindex.io wrote: This is

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread Ashish M
Have tried from command line as well as from IDE (Idea) ... Fails in the resolve-default target of build.xml. I tried it and it ran fine on a windows machine. I downloaded nutch 1.3 src and ant binary. On running ant from commandline, the build completed Sent from my iPhone. Please ignore the

Re: Nutch not crawling URLs with spanish accented characters ( ñ)

2011-10-18 Thread Radim Kolar
i am talking about this patch https://issues.apache.org/jira/browse/NUTCH-1098 not committed yet

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread lewis john mcgibbney
I've never seen the unresolved dependencies that you mention. I can only assume that this must be something environment specific... [ivy:resolve]:: [ivy:resolve] :: UNRESOLVED DEPENDENCIES ::[ivy:resolve]

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread Ashish M
That happened when I removed the log attribute from ivy:resolve tag in build.xml. That should not have happened- the removal of log attribute. Is there any particular version of ivy you are expecting? Sent from my iPhone. Please ignore the typos. On Oct 18, 2011, at 9:35 AM, lewis john

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread lewis john mcgibbney
details of ivy shipped with Nutch 1.3 can be seen here [1], I'm just curious why compiling your plugins is not working. If you're using ant then ant compile-plugins will obviously do the business, however ant compile should also result in a successful compile job. If the software releases you

Re: compilation of nutch1.3 plugins fails

2011-10-18 Thread Ashish M
That's right, Lewis. I compared the file sizes and they seem to be the same as present in svn. As I mentioned previously, the problem is in task resolve-default which uses ivy:resolve and that fails for me for attribute log. PS: I have been trying this on Mac OS 10.6.8. The build worked fine