[Nutch-dev] [jira] Commented: (NUTCH-505) Outlink urls should be validated

2007-07-12 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512201 ] Doğacan Güney commented on NUTCH-505: Andrzej, on my tests, java.util.regex is faster on both Java 1.5 and Java 1

[Nutch-dev] [jira] Created: (NUTCH-513) suffix-urlfilter.txt does not have a template

2007-07-12 Thread JIRA
suffix-urlfilter.txt does not have a template - Key: NUTCH-513 URL: https://issues.apache.org/jira/browse/NUTCH-513 Project: Nutch Issue Type: Improvement Affects Versions: 1.0.0 Re

[Nutch-dev] [jira] Closed: (NUTCH-512) Search on date range

2007-07-12 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-512. Resolution: Invalid. Please use mailing lists for such questions. > Search on date range

[Nutch-dev] [jira] Closed: (NUTCH-511) Recrawling

2007-07-12 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrzej Bialecki closed NUTCH-511. Resolution: Invalid. Assignee: Andrzej Bialecki. Please use mailing lists for such questi

[Nutch-dev] [jira] Commented: (NUTCH-505) Outlink urls should be validated

2007-07-12 Thread Andrzej Bialecki (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512139 ] Andrzej Bialecki commented on NUTCH-505: Please test Java 1.5 and Java 1.6 - IIRC there are some differences

[Nutch-dev] [jira] Updated: (NUTCH-505) Outlink urls should be validated

2007-07-12 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-505: Attachment: NUTCH-505-v3.patch filtered.txt New and final version. I shuffled some c

[Nutch-dev] [jira] Commented: (NUTCH-505) Outlink urls should be validated

2007-07-12 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512074 ] Doğacan Güney commented on NUTCH-505: Thanks for the suggestion. Automaton really looks good, but using automaton

[Nutch-dev] [jira] Commented: (NUTCH-505) Outlink urls should be validated

2007-07-12 Thread Espen Amble Kolstad (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512071 ] Espen Amble Kolstad commented on NUTCH-505: Automaton (http://www.brics.dk/automaton/), used in AutomatonUR

[Nutch-dev] [jira] Updated: (NUTCH-505) Outlink urls should be validated

2007-07-12 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Doğacan Güney updated NUTCH-505: Attachment: NUTCH-505-v2.patch After my last commit, I read that Sun's java.util.regex implementatio

[Nutch-dev] [jira] Created: (NUTCH-512) Search on date range

2007-07-12 Thread anuradha (JIRA)
Search on date range Key: NUTCH-512 URL: https://issues.apache.org/jira/browse/NUTCH-512 Project: Nutch Issue Type: Wish Affects Versions: 0.9.0 Reporter: anuradha Hi, I need to search on date range. I

[Nutch-dev] [jira] Created: (NUTCH-511) Recrawling

2007-07-12 Thread anuradha (JIRA)
Recrawling --- Key: NUTCH-511 URL: https://issues.apache.org/jira/browse/NUTCH-511 Project: Nutch Issue Type: Wish Affects Versions: 0.9.0 Reporter: anuradha Hi, First I have crawled one website. I added one page

[Nutch-dev] [jira] Commented: (NUTCH-506) Nutch should delegate compression to Hadoop

2007-07-12 Thread JIRA
[ https://issues.apache.org/jira/browse/NUTCH-506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512011 ] Doğacan Güney commented on NUTCH-506: For some reason, crawl_generate is not compressed, even though crawldb, cr