[jira] Commented: (NUTCH-567) Proper (?) handling of URIs in TagSoup.

2008-02-08 Thread Emmanuel Joke (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12566950#action_12566950 ] Emmanuel Joke commented on NUTCH-567: - Hi Dogacan, do you think you will commit this new

[jira] Created: (NUTCH-607) Update build.xml to include tika jar

2008-02-08 Thread Dennis Kubes (JIRA)
Update build.xml to include tika jar Key: NUTCH-607 URL: https://issues.apache.org/jira/browse/NUTCH-607 Project: Nutch Issue Type: Bug Environment: All Reporter: Dennis Kubes

[jira] Updated: (NUTCH-607) Update build.xml to include tika jar in war file

2008-02-08 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-607: --- Attachment: NUTCH-607-1-20080208.patch Updates build to include the tika.jar for the war. Correct

[jira] Updated: (NUTCH-606) Refactoring of Generator, run all urls through checks

2008-02-08 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-606: --- Attachment: NUTCH-606-1-20080208.patch Refactors the generator and ensures the checks are run on all

[jira] Assigned: (NUTCH-606) Refactoring of Generator, run all urls through checks

2008-02-08 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-606: -- Assignee: Dennis Kubes Refactoring of Generator, run all urls through checks

[jira] Updated: (NUTCH-606) Refactoring of Generator, run all urls through checks

2008-02-08 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-606: --- Attachment: NUTCH-606-2-20080208.patch Adds some refactoring to close file readers before exiting

[jira] Created: (NUTCH-606) Refactoring of Generator, run all urls through checks

2008-02-08 Thread Dennis Kubes (JIRA)
Refactoring of Generator, run all urls through checks - Key: NUTCH-606 URL: https://issues.apache.org/jira/browse/NUTCH-606 Project: Nutch Issue Type: Bug Components: generator

[jira] Commented: (NUTCH-606) Refactoring of Generator, run all urls through checks

2008-02-08 Thread Andrzej Bialecki (JIRA)
-20080208.patch, NUTCH-606-2-20080208.patch Refactor the generator to make sure all hosts run through checks such as host and protocol checks, and IP checks if necessary. Currently the generator only does this for urls if generate.max.per.host > 0, which by default is -1. So by default all urls

[jira] Commented: (NUTCH-607) Update build.xml to include tika jar in war file

2008-02-08 Thread Chris A. Mattmann (JIRA)
Assignee: Dennis Kubes Fix For: 1.0.0 Attachments: NUTCH-607-1-20080208.patch Update the build.xml to include the tika jar in the war file. Currently the jar is not included and the cached.jsp page errors out.

[jira] Created: (NUTCH-608) Upgrade nutch to use released apache-tika-0.1-incubating

2008-02-08 Thread Chris A. Mattmann (JIRA)
Upgrade nutch to use released apache-tika-0.1-incubating Key: NUTCH-608 URL: https://issues.apache.org/jira/browse/NUTCH-608 Project: Nutch Issue Type: Improvement

[jira] Updated: (NUTCH-606) Refactoring of Generator, run all urls through checks

2008-02-08 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-606: --- Attachment: NUTCH-606-3-20080208.patch Added an empty check for hostnames Refactoring of Generator