[jira] Commented: (NUTCH-497) Extreme Nested Tags causes StackOverflowException in DomContentUtils...Spider Trap

2007-06-21 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506775 ]
Andrzej Bialecki commented on NUTCH-497:
The patch looks good to me as it is now - however, I've seen

Re: Found the bug in Generator when number of URLs is small

2007-06-21 Thread Doğacan Güney
On 6/21/07, Vishal Shah [EMAIL PROTECTED] wrote: Hi, I think I found the reason why the generator returns with an empty fetchlist for small fetchsizes. After the first job finishes running, the generator checks the following condition to see if it got an empty list: if (readers ==

Hudson build is back to normal: Nutch-Nightly #124

2007-06-21 Thread hudson
See http://lucene.zones.apache.org:8080/hudson/job/Nutch-Nightly/124/changes

[jira] Created: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-06-21 Thread Vishal Shah (JIRA)
Generator exits incorrectly for small fetchlists
Key: NUTCH-503
URL: https://issues.apache.org/jira/browse/NUTCH-503
Project: Nutch
Issue Type: Bug
Components: generator

[jira] Updated: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-06-21 Thread Vishal Shah (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vishal Shah updated NUTCH-503:
Attachment: emptyfetchlist.patch
Hi, The previous patch is missing a header line. I've reattached

http.content.limit not respected when the Content-Type header has charset attributes

2007-06-21 Thread Vishal Shah
Hi, Many of the URLs we crawl have headers that look like this:
Connection: close
Date: Thu, 21 Jun 2007 09:28:42 GMT
Accept-Ranges: bytes
ETag: 2c0c3-650-cc1eb800
Server: Apache/2.0.40 (Red Hat Linux)
Content-Length: 1616
Content-Type: text/html; charset=ISO-8859-1
Last-Modified: Mon, 09

RE: Found the bug in Generator when number of URLs is small

2007-06-21 Thread Vishal Shah
Hi Dogacan, I've uploaded the patch to Nutch-503. http://issues.apache.org/jira/browse/NUTCH-503 Regards, -vishal.
-----Original Message-----
From: Dogacan Güney [mailto:[EMAIL PROTECTED]
Sent: Thursday, June 21, 2007 12:33 PM
To: nutch-dev@lucene.apache.org; [EMAIL PROTECTED]
Subject: Re:

[jira] Commented: (NUTCH-471) Fix synchronization in NutchBean creation

2007-06-21 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506883 ]
Doğacan Güney commented on NUTCH-471:
We have been using this on our machines for some time, so if there are no

[jira] Commented: (NUTCH-503) Generator exits incorrectly for small fetchlists

2007-06-21 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506922 ]
Emmanuel Joke commented on NUTCH-503:
I just tried your patch and I'm afraid I still have the same issue.

[jira] Resolved: (NUTCH-471) Fix synchronization in NutchBean creation

2007-06-21 Thread Doğacan Güney (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Doğacan Güney resolved NUTCH-471.
Resolution: Fixed
Assignee: Doğacan Güney
Committed in rev. 549507 with minor style