[jira] [Updated] (NUTCH-897) Subcollection requires blacklist element

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-897: Attachment: NUTCH-897.patch Attached tested fix and if confirmed to work and not break existing

Re: Clean up open legacy issues in Jira

2011-04-01 Thread Mattmann, Chris A (388J)
Super +1 Markus -- I've tried over the past 9 months to do this periodically when I've rolled releases, but if everyone could take a look and close out really old or non-applicable bugs, that would be great! BTW, time is freeing up for me lately, so it might be time finally for the 1.3

[jira] [Closed] (NUTCH-973) Remove Segment Merger in 1.3

2011-04-01 Thread Julien Nioche (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Julien Nioche closed NUTCH-973. --- Resolution: Not A Problem You are right, let's leave it for now. It won't be a problem once we're on

[jira] [Closed] (NUTCH-39) pagination in search result

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-39. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-36) Chinese in Nutch

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-36. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-13) If dns points to 127.0.0.1, the url is also crawled

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-13. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-79) Fault tolerant searching.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-79. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-103) Vivisimo like treeview and url redirect

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-103. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Commented] (NUTCH-18) Windows servers include illegal characters in URLs

2011-04-01 Thread David Escuer (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014581#comment-13014581 ] David Escuer commented on NUTCH-18: --- The person you are trying to contact will be out of

[jira] [Closed] (NUTCH-104) Nutch query parser does not support CJK bi-gram segmentation.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-104. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-180) Performance problem with widely used keywords

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-180. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-581. --- DistributedSearch does not update search servers added to search-servers.txt on the fly

[jira] [Closed] (NUTCH-877) Allow setting of slop values for non-quote phrase queries on query-basic plugin

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-877. --- Allow setting of slop values for non-quote phrase queries on query-basic plugin

[jira] [Updated] (NUTCH-265) Getting Clustered results in better form.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-265: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-674) NutchBean doesn't check for searcher.dir existance.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-674: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-423) Add other index-basic fields as query plugins

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-423: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-47) Configure host filter to do wildcard prefixes - *.redhat.com

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-47: --- Bulk close of legacy issues:

[jira] [Updated] (NUTCH-943) Search Results default dedup field site should be stored in index.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-943: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-469: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-377) Add possibility to search for multiple values

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-377: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-453) Move stop words to a config file

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-453: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-542) Null Pointer Exception on getSummary when segment no longer exists

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-542: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-466) Flexible segment format

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-466: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-480) Searching multiple indexes with a single nutch instance

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-480: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-470) Adding optional terms to a query

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-470: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-541) Index url field untokenized

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-541: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-72) Query basic filter with correction feature

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-72: --- Bulk close of legacy issues:

[jira] [Updated] (NUTCH-260) Three new plugins that parse, index and query meta tags defined in the configuration

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-260: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-445) Domain İndexing / Query Filter

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-445: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-820) Infinite loop when hitspersite is set

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-820: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-92) DistributedSearch incorrectly scores results

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-92: --- Bulk close of legacy issues:

[jira] [Updated] (NUTCH-573) Multiple Domains - Query Search

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-573: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-764) Add support for vfsfile:// loading of plugins for JBoss

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-764: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-455) dedup on tokenized fields is faulty

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-455: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-708) NutchBean: OOM due to searcher.max.hits and dedup.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-708: Bulk close of legacy issues:

[jira] [Closed] (NUTCH-72) Query basic filter with correction feature

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-72. -- Resolution: Won't Fix Query basic filter with correction feature

[jira] [Closed] (NUTCH-294) Topic-maps of related searchwords

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-294. --- Resolution: Won't Fix Topic-maps of related searchwords -

[jira] [Closed] (NUTCH-943) Search Results default dedup field site should be stored in index.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-943. --- Resolution: Won't Fix Search Results default dedup field site should be stored in index.

[jira] [Closed] (NUTCH-540) some problem about the Nutch cache

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-540. --- Resolution: Won't Fix some problem about the Nutch cache --

[jira] [Closed] (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-469. --- Resolution: Won't Fix changes to geoPosition plugin to make it work on nutch 0.9

[jira] [Closed] (NUTCH-92) DistributedSearch incorrectly scores results

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-92. -- Resolution: Won't Fix DistributedSearch incorrectly scores results

[jira] [Closed] (NUTCH-674) NutchBean doesn't check for searcher.dir existance.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-674. --- Resolution: Won't Fix NutchBean doesn't check for searcher.dir existance.

[jira] [Closed] (NUTCH-820) Infinite loop when hitspersite is set

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-820. --- Resolution: Won't Fix Infinite loop when hitspersite is set -

[jira] [Closed] (NUTCH-708) NutchBean: OOM due to searcher.max.hits and dedup.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-708. --- Resolution: Won't Fix NutchBean: OOM due to searcher.max.hits and dedup.

[jira] [Closed] (NUTCH-445) Domain İndexing / Query Filter

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-445. --- Resolution: Won't Fix Domain İndexing / Query Filter --

[jira] [Closed] (NUTCH-541) Index url field untokenized

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-541. --- Resolution: Won't Fix Index url field untokenized ---

[jira] [Closed] (NUTCH-455) dedup on tokenized fields is faulty

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-455. --- Resolution: Won't Fix dedup on tokenized fields is faulty ---

[jira] [Closed] (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-638. --- Resolution: Won't Fix Launching Distributed Searchers with URI indicating filesystem to use rather

[jira] [Closed] (NUTCH-479) Support for OR queries

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-479. --- Resolution: Won't Fix Support for OR queries -- Key:

[jira] [Closed] (NUTCH-466) Flexible segment format

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-466. --- Resolution: Won't Fix Flexible segment format --- Key:

[jira] [Closed] (NUTCH-377) Add possibility to search for multiple values

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-377. --- Resolution: Won't Fix Add possibility to search for multiple values

[jira] [Closed] (NUTCH-386) Plugin to index categories by url rules

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-386. --- Resolution: Won't Fix Plugin to index categories by url rules

[jira] [Closed] (NUTCH-453) Move stop words to a config file

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-453. --- Resolution: Won't Fix Move stop words to a config file

[jira] [Closed] (NUTCH-260) Three new plugins that parse, index and query meta tags defined in the configuration

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-260. --- Resolution: Won't Fix Three new plugins that parse, index and query meta tags defined in the

[jira] [Closed] (NUTCH-542) Null Pointer Exception on getSummary when segment no longer exists

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-542. --- Resolution: Won't Fix Null Pointer Exception on getSummary when segment no longer exists

[jira] [Closed] (NUTCH-355) The title of query result could like the summary have the highlight??

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-355. --- Resolution: Won't Fix The title of query result could like the summary have the highlight??

[jira] [Closed] (NUTCH-470) Adding optional terms to a query

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-470. --- Resolution: Won't Fix Adding optional terms to a query

[jira] [Closed] (NUTCH-764) Add support for vfsfile:// loading of plugins for JBoss

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-764. --- Resolution: Won't Fix Add support for vfsfile:// loading of plugins for JBoss

[jira] [Closed] (NUTCH-290) parse-pdf: Garbage indexed when text-extraction not allowed

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-290. --- Resolution: Won't Fix parse-pdf: Garbage indexed when text-extraction not allowed

[jira] [Closed] (NUTCH-358) Language Switching PROBLEM FIXED

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-358. --- Resolution: Won't Fix Language Switching PROBLEM FIXED

[jira] [Closed] (NUTCH-389) a url tokenizer implementation for tokenizing index fields : url and host

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-389. --- Resolution: Won't Fix a url tokenizer implementation for tokenizing index fields : url and host

[jira] [Closed] (NUTCH-396) mergesegs sorts URLs, making segments useless for subsequent fetch

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-396. --- Resolution: Won't Fix mergesegs sorts URLs, making segments useless for subsequent fetch

[jira] [Closed] (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-326. --- Resolution: Won't Fix WordExtractor throws java.util.NoSuchElementException on some documents

[jira] [Closed] (NUTCH-352) Add jar command to bin/nutch to allow launching hadoop job jars

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-352. --- Resolution: Won't Fix Add jar command to bin/nutch to allow launching hadoop job jars

[jira] [Closed] (NUTCH-343) Index MP3 SHA1 hashes

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-343. --- Resolution: Won't Fix Index MP3 SHA1 hashes - Key: NUTCH-343

[jira] [Closed] (NUTCH-26) New Http Authentication mechanism

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-26. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-259) Problem in IndexSorter after dedup

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-259. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-283) If the Fetcher times out and abandons Fetcher Threads, severe errors will occur on those Threads

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-283. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-158) Process Sitemap data in text, rss or xml format as well as OAI-PMH

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-158. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-251) Administration GUI

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-251. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-164) Locale (language) choice by first session has global effect to all sessions

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-164. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-162) country code jp is used instead of language code ja for Japanese

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-162. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-441) Thai Analyzer Plugin

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-441. --- Resolution: Won't Fix Thai Analyzer Plugin Key: NUTCH-441

[jira] [Closed] (NUTCH-224) Nutch doesn't handle Korean text at all

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-224. --- Resolution: Won't Fix Nutch doesn't handle Korean text at all

[jira] [Closed] (NUTCH-568) Indexer does not update the Lucene TITLE field

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-568. --- Resolution: Won't Fix Indexer does not update the Lucene TITLE field

[jira] [Closed] (NUTCH-249) black- white list url filtering

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-249. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-709) JSParseFilter gets into an infinate loop and ets all the stack

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-709. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-289) CrawlDatum should store IP address

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-289. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-496) ConcurrentModificationException can be thrown when getSorted() is called.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-496. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4))

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-424. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-119) Regexp to extract outlinks incorrect

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-119. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-414) parse-mp3 plugin concatenating previous tags for text field

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-414. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-113. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-87) Efficient site-specific crawling for a large number of sites

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-87. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-460) RDF parser plugin

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-460. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-182) Log when db.max configuration limits reached

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-182. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-826) Mailing list is broken.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-826. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-570) Improvement of URL Ordering in Generator.java

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-570. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-742) Checksum Error

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-742. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-44) too many search results

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-44. -- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-854) Define standard attributes with values and explaination to configuration files in conf directory

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-854. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-958) Httpclient scheme priority order fix

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-958. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-866) STOP Nutch without breaking the crawled data

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-866. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-86) LanguageIdentifier API enhancements

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-86. -- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-591) StringIndexOutOfBoundsException when extracting text from a Word document.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-591. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-363) Fetcher normalizes everything at least twice

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-363. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-185) XMLParser is configurable xml parser plugin.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-185. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-310) Review Log Levels

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-310. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-659) Help! No urls fetched for internal repository website

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-659. --- Bulk close of resolved issues:

[jira] [Closed] (NUTCH-774) Retry interval in crawl date is set to 0

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-774. --- Bulk close of resolved issues:
