Nutch in WebSphere

2009-10-27 Thread Joshua J Pavel
I'm very new at this, so forgive my novice questions. I'm trying to install Nutch in WebSphere 6.1. While I can see that others have done this before, I've been unsuccessful. I keep getting this error: Error 500: java.lang.Error: java.lang.NoClassDefFoundError: org.apache.jsp._search (wrong

Asking again - WebSphere question

2009-11-02 Thread Joshua J Pavel
I'm very new at this, so forgive my novice questions. I'm trying to install Nutch in WebSphere 6.1. While I can see that others have done this before, I've been unsuccessful. I keep getting this error: Error 500: java.lang.Error: java.lang.NoClassDefFoundError: org.apache.jsp._search (wrong

Update live search index

2010-01-05 Thread Joshua J Pavel
Hello all - I need to update the live search index, preferably without restarting the application. I'm using Nutch 0.9 in WebSphere. From a few searches, it seems that this is a large issue with a lot of history. Where does it stand today? Is there a .jsp I can create to

How do I crawl relative URLs not in href tags?

2010-01-17 Thread Joshua J Pavel
So, with HTML like this (from a dropdown box):
<option value="/en_AU/news/articles/20100117.html" selected="selected">Sunday 17 January 2010</option>
<option value="/en_AU/news/articles/20100116.html">Saturday 16 January 2010</option>
<option value="/en_AU/news/articles/20100115.html">Friday 15 January
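A minimal sketch of one way to harvest links like these outside Nutch's own parser, assuming the paths appear only as `value` attributes of `<option>` tags and should be resolved against the URL of the page they came from. The class name `OptionLinkExtractor` and the base URL `http://www.example.com/en_AU/news/` are hypothetical, and a regular expression is only a rough stand-in for a real HTML parser.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Nutch): pulls link targets out of
// <option value="..."> tags and resolves them against the page URL.
class OptionLinkExtractor {
    // value="...", value='...' or an unquoted value inside an <option> tag.
    private static final Pattern OPTION_VALUE = Pattern.compile(
        "<option\\b[^>]*?\\bvalue\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)'|([^\\s\"'>]+))",
        Pattern.CASE_INSENSITIVE);

    /** Returns absolute URLs for every non-empty option value on the page. */
    static List<String> extract(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> urls = new ArrayList<>();
        Matcher m = OPTION_VALUE.matcher(html);
        while (m.find()) {
            // Exactly one of the three alternatives captured the value.
            String value = m.group(1) != null ? m.group(1)
                         : m.group(2) != null ? m.group(2)
                         : m.group(3);
            value = value.trim();
            if (value.isEmpty()) continue;
            // Resolve relative paths such as /en_AU/news/... against the page URL.
            urls.add(base.resolve(value).toString());
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<select>"
            + "<option value=\"/en_AU/news/articles/20100117.html\" selected=\"selected\">Sunday 17 January 2010</option>"
            + "<option value=\"/en_AU/news/articles/20100116.html\">Saturday 16 January 2010</option>"
            + "</select>";
        for (String url : extract(html, "http://www.example.com/en_AU/news/")) {
            System.out.println(url);
        }
    }
}
```

Run as-is, `main` prints the two article URLs in absolute form. Resolving against the page URL (rather than just the host) also keeps values without a leading slash correct.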

Recrawl and crawl-urlfilter.txt

2010-03-12 Thread Joshua J Pavel
I'm having multiple problems recrawling with Nutch 0.9. Here are 2 questions. :-) Right now, using the script I found here ( http://today.java.net/pub/a/today/2006/02/16/introduction-to-nutch-2.html ), I think I'm close to a workable solution, but the recrawl doesn't respect the
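The rule format of crawl-urlfilter.txt is a `+` or `-` followed by a regular expression, where the first matching rule decides and a URL that matches no rule is ignored. A small stand-alone class can illustrate those semantics. This is a simplified sketch, not Nutch's filter implementation: the class name `FirstMatchUrlFilter` and the `example.com` rules are hypothetical, and testing each pattern with `Matcher.find()` (a substring search) is an assumption. The file's header comment describes it as the filter used by the one-step `crawl` command, so a step-by-step recrawl script may be reading a different filter file; that is worth checking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Simplified, hypothetical illustration of "+regex / -regex, first match wins"
// filter rules in the style of crawl-urlfilter.txt; not Nutch's own filter class.
class FirstMatchUrlFilter {
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    FirstMatchUrlFilter(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            // Blank lines and # comments carry no rule.
            if (line.isEmpty() || line.startsWith("#")) continue;
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    /** The first rule whose pattern occurs in the URL decides; no match means reject. */
    boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) return rule.accept;
        }
        return false;
    }

    public static void main(String[] args) {
        FirstMatchUrlFilter filter = new FirstMatchUrlFilter(Arrays.asList(
            "# skip URLs containing characters typical of query strings",
            "-[?*!@=]",
            "+^http://([a-z0-9]*\\.)*example.com/",
            "-."));
        System.out.println(filter.accepts("http://www.example.com/en_AU/news/"));    // true
        System.out.println(filter.accepts("http://www.example.com/search?q=nutch")); // false
        System.out.println(filter.accepts("http://other.org/"));                     // false
    }
}
```

Rule order matters under first-match semantics: moving the catch-all `-.` to the top would reject every URL.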

Hadoop Disk Error

2010-04-16 Thread Joshua J Pavel
We're just now moving from a Nutch 0.9 installation to 1.0, so I'm not entirely new to this. However, I can't even get past the first fetch now, due to a Hadoop error. Looking in the mailing list archives, this error is normally caused by either permissions or a full disk. I overrode the use

Re: Hadoop Disk Error

2010-04-16 Thread Joshua J Pavel

Re: Hadoop Disk Error

2010-04-19 Thread Joshua J Pavel

RE: Hadoop Disk Error

2010-04-20 Thread Joshua J Pavel
Are you sure that you have enough space in the temporary directory used by Hadoop? From: Joshua J Pavel [mailto:jpa...@us.ibm.com] Sent: Tuesday, 20 April 2010 6:42 AM To: nutch-user@lucene.apache.org Subject: Re: Hadoop
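Since the usual suspects for this error are permissions or a full disk, a quick stand-alone check of the directory Hadoop writes temporary data to can rule both out. A minimal sketch: the class name `TempSpaceCheck` is hypothetical, and which directory to pass in (for example whatever `hadoop.tmp.dir` is set to, which by default sits under `/tmp`) is an assumption to adapt to the installation.

```java
import java.io.File;

// Hypothetical helper: reports whether a directory (e.g. the one configured as
// hadoop.tmp.dir) is writable and how much space its filesystem has free.
class TempSpaceCheck {
    /** Usable bytes on the filesystem holding dir; 0 if the path does not exist. */
    static long usableBytes(File dir) {
        return dir.getUsableSpace();
    }

    /** True if dir is an existing, writable directory with at least minBytes usable. */
    static boolean looksUsable(File dir, long minBytes) {
        return dir.isDirectory() && dir.canWrite() && usableBytes(dir) >= minBytes;
    }

    public static void main(String[] args) {
        // Directory to check: first argument, else the JVM's own temp directory.
        File dir = new File(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        long mb = usableBytes(dir) / (1024 * 1024);
        System.out.println(dir + ": " + mb + " MB usable, writable=" + dir.canWrite());
    }
}
```

`getUsableSpace()` reports space for the whole filesystem as seen by this JVM, so it will not reveal per-user quotas; running the check as the same user that runs the crawl keeps the permission result meaningful.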

RE: Hadoop Disk Error

2010-04-20 Thread Joshua J Pavel

Re: Hadoop Disk Error

2010-04-20 Thread Joshua J Pavel
in the config file? J. -- DigitalPebble Ltd http://www.digitalpebble.com On 20 April 2010 14:00, Joshua J Pavel jpa...@us.ibm.com wrote: I am - I changed the location to a filesystem with lots of free space and watched disk utilization during a crawl. It'll be a relatively small crawl, and I

Re: Hadoop Disk Error

2010-04-20 Thread Joshua J Pavel
correctly. Thanks for taking a look at this.

RE: Hadoop Disk Error

2010-04-21 Thread Joshua J Pavel
the temp space limit is different there. From: Joshua J Pavel [mailto:jpa...@us.ibm.com] Sent: Wednesday, 21 April 2010 3:40 AM To: nutch-user@lucene.apache.org Subject: Re: Hadoop Disk Error Yes - how much free space does it need? We ran 0.9 using /tmp, and that has ~ 1 GB. After I first saw

Re: Hadoop Disk Error

2010-04-21 Thread Joshua J Pavel
(instead of parsing while refetching) Your crawl is fairly small so it should not require much space at all. Thanks Julien On 21 April 2010 15:28, Joshua J Pavel jpa...@us.ibm.com wrote: I get the same error on a filesystem with 10 GB (disk space is a commodity here). The final crawl when

Language specifications

2010-04-22 Thread Joshua J Pavel
Alternate question... thanks to everyone who has tried to help me through the Hadoop/AIX issues with 1.0, but I'm going to need to shelve that for just a second while I work on some stuff with 0.9 again. I need to support one site that has 3 translations: English, French, and Spanish. The

Re: Hadoop Disk Error

2010-04-26 Thread Joshua J Pavel