Re: Nutch Parser annoyingly faulty

2011-03-04 Thread Julien Nioche
Hi Jurgen, > Since I wrote this email - which I thought got ignored by the Nutch developers - Thanks for reporting the problem, Jurgen, and sorry that you felt you were being ignored. The few active developers Nutch has contribute in their spare time; the reason why you did not get any

Re: Nutch Parser annoyingly faulty

2011-03-04 Thread Juergen Specht
Hi Julien, On 3/4/11 7:09 PM, Julien Nioche wrote: > Thanks for reporting the problem, Jurgen, and sorry that you felt you were being ignored. The few active developers Nutch has contribute in their spare time; the reason why you did not get any comments on this is that no one had an instant

Re: Nutch Parser annoyingly faulty

2011-03-03 Thread Scott Gonyea
Has anyone looked into this? This is especially a problem when folks like Juergen are customers and, quite rightfully, raise hell. I wasn't aware of this, since Nutch is a software metaphor for a firehose. But what I have noticed is that the URL parser is really, really terrible.

Nutch Parser annoyingly faulty

2011-02-25 Thread Juergen Specht
Hi Nutch Team, before I permanently reject Nutch from all my sites, I had better tell you why: your URL parser is extremely faulty and creates a lot of trouble. Here is an example: if you have a link on a page, say http://www.somesite/somepage/, and the link in the HTML looks like <a href=".">This
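
For readers following the thread, here is a minimal sketch of how a relative link such as the one in the example is normally resolved against the page URL under RFC 3986. It assumes the truncated link above is the relative reference "." and uses Python's standard urllib.parse.urljoin for illustration only; it is not Nutch's code, and the helper name resolve_link is hypothetical.

```python
from urllib.parse import urljoin


def resolve_link(page_url: str, href: str) -> str:
    """Resolve an href found on page_url into an absolute URL.

    Hypothetical helper for illustration; it delegates to the standard
    library's RFC 3986 style reference resolution.
    """
    return urljoin(page_url, href)


if __name__ == "__main__":
    base = "http://www.somesite/somepage/"
    # "." refers to the current directory, so the link points back
    # at the page itself rather than at a new URL.
    print(resolve_link(base, "."))  # http://www.somesite/somepage/
```

A crawler that instead appends the reference to the base as plain text would produce URLs that do not exist on the site, which matches the kind of trouble described in the original report.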