Build failed in Jenkins: Nutch-trunk #1444

2011-04-01 Thread Apache Hudson Server
See -- [...truncated 1009 lines...] A src/plugin/subcollection/src/java/org/apache/nutch/collection A src/plugin/subcollection/src/java/org/apache/nutch/collection/Subcollection.java A

[jira] [Closed] (NUTCH-519)   parsed incorrectly

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-519. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-458) Proxy forwarding to nutch.war does not work. Need to add some code...

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-458. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-523) web2 searchform problems with patch

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-523. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-689) Swf parser doesn't seem to handle relative links

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-689. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-521) Modified injector to allow newly injected CrawlDatum to overwrite original

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-521. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-461) microformats-reltag plugin and relative links

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-461. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-595) "Target file:/.... already exists"

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-595. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-364) Javascript parser creates some fairly bogus URLs

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-364. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-751) Upgrade version of HttpClient

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-751. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-152?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-152. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-759) Removal of deprecated APIs

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-759. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-644) RTF parser doesn't compile anymore

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-644. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-186. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-677) Segment merge filtering based on segment content

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-677. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-564) External parser supports encoding attribute

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-564. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-714) Need a SFTP and SCP Protocol Handler

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-714. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-101) RobotRulesParser

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-101. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-309) Uses commons logging Code Guards

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-309. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-73) A page for CSV results

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-73?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-73. -- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open

[jira] [Closed] (NUTCH-716) Make subcollection index field multivalued

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-716. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-774) Retry interval in crawl date is set to 0

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-774?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-774. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-659) Help! No urls fetched for internal repository website

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-659. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-310) Review Log Levels

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-310. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-185) XMLParser is configurable xml parser plugin.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-185. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-363) Fetcher normalizes everything at least twice

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-363. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-473) ExcelExtractor performance bad due to String concatenation

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-473. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-591) StringIndexOutOfBoundsException when extracting text from a Word document.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-591. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-86) LanguageIdentifier API enhancements

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-86?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-86. -- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open

[jira] [Closed] (NUTCH-958) Httpclient scheme priority order fix

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-958. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-854) Define standard attributes with values and explanation to configuration files in conf directory

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-854. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-860) package task fails

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-860?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-860. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-866) STOP Nutch without breaking the crawled data

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-866?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-866. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-44) too many search results

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-44?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-44. -- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open

[jira] [Closed] (NUTCH-742) Checksum Error

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-742. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-650) Hbase Integration

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-650. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-570) Improvement of URL Ordering in Generator.java

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-570. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-826) Mailing list is broken.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-826?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-826. --- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Closed] (NUTCH-50) Benchmarks & Performance goals

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-50?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-50. -- Bulk close of resolved issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open

[jira] [Closed] (NUTCH-182) Log when db.max configuration limits reached

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-182. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-87) Efficient site-specific crawling for a large number of sites

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-87?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-87. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-460) RDF parser plugin

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-460. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-414) parse-mp3 plugin concatenating previous tags for text field

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-414. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-113) Disable permanent DNS-to-IP caching for JVM 1.4

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-113. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-119) Regexp to extract outlinks incorrect

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-119. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-100) New plugin urlfilter-db

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-100. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-496) ConcurrentModificationException can be thrown when getSorted() is called.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-496. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-289) CrawlDatum should store IP address

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-289. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-709) JSParseFilter gets into an infinite loop and eats all the stack

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-709. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-424) NekoHTML's DOMFragmentParser hangs on certain URLs (CLONE: Problem persists with Nutch 0.9 and 0.8.1 (Nekohtml 0.9.4))

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-424. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-249) black- white list url filtering

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-249. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-224) Nutch doesn't handle Korean text at all

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-224. --- Resolution: Won't Fix > Nutch doesn't handle Korean text at all >

[jira] [Closed] (NUTCH-568) Indexer does not update the Lucene "TITLE" field

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-568. --- Resolution: Won't Fix > Indexer does not update the Lucene "TITLE" field > ---

[jira] [Closed] (NUTCH-441) Thai Analyzer Plugin

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-441. --- Resolution: Won't Fix > Thai Analyzer Plugin > > > Key: NUTCH-441

[jira] [Closed] (NUTCH-472) NullPointerException in ZipTextExtractor if no MIME type for zipped file

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-472. --- Resolution: Won't Fix > NullPointerException in ZipTextExtractor if no MIME type for zipped file > ---

[jira] [Closed] (NUTCH-272) Max. pages to crawl/fetch per site (emergency limit)

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-272. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-162) country code "jp" is used instead of language code "ja" for Japanese

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-162. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-98) RobotRulesParser interprets robots.txt incorrectly

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-98?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-98. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-251) Administration GUI

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-251. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-281) cached.jsp: base-href needs to be outside comments

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-281. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-158) Process Sitemap data in text, rss or xml format as well as OAI-PMH

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-158. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-267) Indexer doesn't consider linkdb when calculating boost value

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-267. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-164) Locale (language) choice by first session has global effect to all sessions

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-164?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-164. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-283) If the Fetcher times out and abandons Fetcher Threads, severe errors will occur on those Threads

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-283. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-48) "Did you mean" query enhancement/refinement feature request

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-48?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-48. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-129) rtf-parser does not work when opened with wordpad files and saved

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-129. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-26) New Http Authentication mechanism

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-26?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-26. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-259) Problem in IndexSorter after dedup

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-259. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

Re: Clean up open legacy issues in Jira

2011-04-01 Thread Markus Jelsma
On Friday 01 April 2011 16:20:24 Mattmann, Chris A (388J) wrote:
> Super +1 Markus -- I've tried over the past 9 months to do this
> periodically when I've rolled releases, but if everyone could take a look
> and close out really old or non-applicable bugs, that would be great!

Ahum, it seems i

[jira] [Closed] (NUTCH-300) Clustering API improvements

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-300. --- Resolution: Won't Fix > Clustering API improvements > --- > >

[jira] [Closed] (NUTCH-299) Bittorrent Parser

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-299. --- Resolution: Won't Fix > Bittorrent Parser > - > > Key: NUTCH-299 >

[jira] [Closed] (NUTCH-316) Confusion about query languages

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-316. --- Resolution: Won't Fix > Confusion about query languages > --- > >

[jira] [Closed] (NUTCH-352) Add jar command to bin/nutch to allow launching hadoop job jars

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-352. --- Resolution: Won't Fix > Add jar command to bin/nutch to allow launching hadoop job jars >

[jira] [Closed] (NUTCH-343) Index MP3 SHA1 hashes

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-343. --- Resolution: Won't Fix > Index MP3 SHA1 hashes > - > > Key: NUTCH-3

[jira] [Closed] (NUTCH-396) mergesegs sorts URLs, making segments useless for subsequent fetch

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-396. --- Resolution: Won't Fix > mergesegs sorts URLs, making segments useless for subsequent fetch > -

[jira] [Closed] (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-326. --- Resolution: Won't Fix > WordExtractor throws java.util.NoSuchElementException on some documents >

[jira] [Closed] (NUTCH-389) a url tokenizer implementation for tokenizing index fields : url and host

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-389. --- Resolution: Won't Fix > a url tokenizer implementation for tokenizing index fields : url and host > --

[jira] [Closed] (NUTCH-358) Language Switching PROBLEM FIXED

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-358?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-358. --- Resolution: Won't Fix > Language Switching PROBLEM FIXED > > >

[jira] [Closed] (NUTCH-290) parse-pdf: Garbage indexed when text-extraction not allowed

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-290. --- Resolution: Won't Fix > parse-pdf: Garbage indexed when text-extraction not allowed >

[jira] [Closed] (NUTCH-764) Add support for vfsfile:// loading of plugins for JBoss

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-764. --- Resolution: Won't Fix > Add support for vfsfile:// loading of plugins for JBoss >

[jira] [Closed] (NUTCH-470) Adding optional terms to a query

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-470. --- Resolution: Won't Fix > Adding optional terms to a query > > >

[jira] [Closed] (NUTCH-941) Search returns blank page, when there is more than one SOLR server configured

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-941?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-941. --- Resolution: Won't Fix > Search returns blank page, when there is more than one SOLR server configured

[jira] [Closed] (NUTCH-542) Null Pointer Exception on getSummary when segment no longer exists

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-542. --- Resolution: Won't Fix > Null Pointer Exception on getSummary when segment no longer exists > -

[jira] [Closed] (NUTCH-355) The title of query result could like the summary have the highlight??

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-355. --- Resolution: Won't Fix > The title of query result could like the summary have the highlight?? > -

[jira] [Closed] (NUTCH-260) Three new plugins that parse, index and query meta tags defined in the configuration

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-260. --- Resolution: Won't Fix > Three new plugins that parse, index and query meta tags defined in the > conf

[jira] [Closed] (NUTCH-453) Move stop words to a config file

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-453. --- Resolution: Won't Fix > Move stop words to a config file > > >

[jira] [Closed] (NUTCH-386) Plugin to index categories by url rules

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-386. --- Resolution: Won't Fix > Plugin to index categories by url rules >

[jira] [Closed] (NUTCH-265) Getting Clustered results in better form.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-265. --- Resolution: Won't Fix > Getting Clustered results in better form. > --

[jira] [Closed] (NUTCH-423) Add other index-basic fields as query plugins

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-423. --- Resolution: Won't Fix > Add other index-basic fields as query plugins > --

[jira] [Closed] (NUTCH-377) Add possibility to search for multiple values

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-377. --- Resolution: Won't Fix > Add possibility to search for multiple values > --

[jira] [Closed] (NUTCH-479) Support for OR queries

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-479. --- Resolution: Won't Fix > Support for OR queries > -- > > Key: NUTCH

[jira] [Closed] (NUTCH-466) Flexible segment format

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-466. --- Resolution: Won't Fix > Flexible segment format > --- > > Key: NUT

[jira] [Closed] (NUTCH-638) Launching Distributed Searchers with URI indicating filesystem to use rather than relying on hadoop config files.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-638. --- Resolution: Won't Fix > Launching Distributed Searchers with URI indicating filesystem to use rather

[jira] [Closed] (NUTCH-455) dedup on tokenized fields is faulty

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-455. --- Resolution: Won't Fix > dedup on tokenized fields is faulty > --- > >

[jira] [Closed] (NUTCH-573) Multiple Domains - Query Search

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-573. --- Resolution: Won't Fix > Multiple Domains - Query Search > --- > >

[jira] [Closed] (NUTCH-541) Index url field untokenized

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-541. --- Resolution: Won't Fix > Index url field untokenized > --- > >

[jira] [Closed] (NUTCH-708) NutchBean: OOM due to searcher.max.hits and dedup.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-708. --- Resolution: Won't Fix > NutchBean: OOM due to searcher.max.hits and dedup. > -

[jira] [Closed] (NUTCH-445) Domain İndexing / Query Filter

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-445. --- Resolution: Won't Fix > Domain İndexing / Query Filter > -- > >

[jira] [Closed] (NUTCH-820) Infinite loop when hitspersite is set

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-820. --- Resolution: Won't Fix > Infinite loop when hitspersite is set > -

[jira] [Closed] (NUTCH-674) NutchBean doesn't check for searcher.dir existance.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-674. --- Resolution: Won't Fix > NutchBean doesn't check for searcher.dir existance. >
