Re: Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-25 Thread Bradford Stephens
Thanks for coming, everyone! We had around 25 people. A *huge* success, for Seattle. And a big thanks to 10gen for sending Richard. Can't wait to see you all next month. On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: The Seattle Hadoop/Scalability/NoSQL

Re: regex-urlfilter.txt and paging variables

2010-02-25 Thread MilleBii
You can add a specific rule before that exclusion rule. Something like: +.*/?page=.* 2010/2/25, Ian M. Evans ianev...@digitalhit.com: I suck at regex and in keeping with the Olympic spirit, I probably suck at giant slalom too. In the regex-urlfilter.txt there's the suggested probable queries
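
The suggestion above depends on rule order: as far as I know, the first rule in regex-urlfilter.txt that matches a URL decides its fate, so an include rule only helps if it sits before the exclusion. Here is a minimal Python sketch of that behaviour, not Nutch's actual code. The exclusion pattern `[?*!@=]`, the catch-all `.`, the example URLs, and the helper `accepts` are all assumptions made for illustration, as is the use of unanchored matching (`re.search`).

```python
import re

# Hypothetical rule list laid out like regex-urlfilter.txt:
# a sign ("+" include, "-" exclude) followed by a regular expression.
RULES_WITH_PAGING = [
    ("+", r".*/?page=.*"),  # the rule suggested in this thread
    ("-", r"[?*!@=]"),      # assumed exclusion rule for probable queries
    ("+", r"."),            # assumed catch-all: accept everything else
]

# The same list without the paging rule, for comparison.
RULES_DEFAULT = RULES_WITH_PAGING[1:]


def accepts(url, rules):
    """Return True if the first matching rule is an include rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # no rule matched: treat the URL as rejected


paged = "http://example.com/list?page=2"
print(accepts(paged, RULES_WITH_PAGING))  # include rule is hit first
print(accepts(paged, RULES_DEFAULT))      # exclusion rule is hit first
```

One thing to watch: in `+.*/?page=.*` the `?` is a regex quantifier that makes the `/` optional, so the rule matches any URL containing `page=`, for example `http://example.com/a?mypage=3`. If only a literal `?page=` parameter should pass, escaping it as `\?page=` would be stricter.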

Re: regex-urlfilter.txt and paging variables

2010-02-25 Thread Andreas P. Koenzen
Replace it with this: -...@!*] That's it... Best regards, --- Andreas P. Koenzen On 25/02/2010, at 03:06 a.m., Ian M. Evans wrote: I suck at regex and in keeping with the Olympic spirit, I probably suck at giant slalom too. In the regex-urlfilter.txt there's the suggested probable

Re: Nutch v0.4

2010-02-25 Thread Andrzej Bialecki
On 2010-02-24 17:34, Pedro Bezunartea López wrote: Hi Ashley, Hi, I'm looking to reproduce program analysis results based on Nutch v0.4. I realize this is a very old release, but is it possible to obtain the source from somewhere? I see some of the classes I'm looking for in v0.7, but I need

Re: Nutch v0.4

2010-02-25 Thread Pedro Bezunartea López
I was curious about this, and after a little browsing through sourceforge, I found the CVS link: http://nutch.cvs.sourceforge.net/viewvc/nutch/nutch/?pathrev=nutch_0_4 HTH, Pedro. 2010/2/25 Andrzej Bialecki a...@getopt.org On 2010-02-24 17:34, Pedro Bezunartea López wrote: Hi Ashley,

Re: Nutch v0.4

2010-02-25 Thread Ashley Sterritt
Great, thanks! 2010/2/25 Pedro Bezunartea López pe...@bezunartea.net: I was curious about this, and after a little browsing through sourceforge, I found the CVS link: http://nutch.cvs.sourceforge.net/viewvc/nutch/nutch/?pathrev=nutch_0_4 HTH, Pedro. 2010/2/25 Andrzej Bialecki

Text.encode failing during de-duplication

2010-02-25 Thread Eddie Drapkin
Hello, I'm trying to upgrade from Nutch 0.9 to Nutch 1.0 and I've solved all of the issues that I seem to be having, except for one. When I run a web crawl, everything fetches fine until it gets to dedup, at which point I get this stack trace: 2010-02-25 14:31:46,592 WARN mapred.LocalJobRunner -