[jira] Created: (NUTCH-897) Subcollection requires blacklist element

2010-09-06 Thread Markus Jelsma (JIRA)
Subcollection requires blacklist element Key: NUTCH-897 URL: https://issues.apache.org/jira/browse/NUTCH-897 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.2

[jira] Commented: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906488#action_12906488 ] Markus Jelsma commented on NUTCH-716: - This patch concatenates multiple values in a

[jira] Issue Comment Edited: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906488#action_12906488 ] Markus Jelsma edited comment on NUTCH-716 at 9/6/10 9:32 AM: -

[jira] Issue Comment Edited: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906488#action_12906488 ] Markus Jelsma edited comment on NUTCH-716 at 9/6/10 9:51 AM: -

[jira] Created: (NUTCH-898) Multi valued subcollection is not multi valued

2010-09-06 Thread Markus Jelsma (JIRA)
Multi valued subcollection is not multi valued -- Key: NUTCH-898 URL: https://issues.apache.org/jira/browse/NUTCH-898 Project: Nutch Issue Type: Bug Components: indexer

[jira] Issue Comment Edited: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12906488#action_12906488 ] Markus Jelsma edited comment on NUTCH-716 at 9/6/10 12:45 PM: --

[jira] Closed: (NUTCH-898) Multi valued subcollection is not multi valued

2010-09-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-898. --- Resolution: Won't Fix The old (only) nightly build i was using did allow multiple values but

[jira] Created: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Markus Jelsma (JIRA)
Confusion in nutch-default between http.content.limit and file.content.limit Key: NUTCH-900 URL: https://issues.apache.org/jira/browse/NUTCH-900 Project: Nutch

[jira] Updated: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-900: Attachment: NUTCH-900.MarkusJelsma.100908.patch.txt Confusion in nutch-default between

[jira] Updated: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-900: Patch Info: [Patch Available] Confusion in nutch-default between http.content.limit and

[jira] Created: (NUTCH-901) Make index-more plug-in configurable

2010-09-08 Thread Markus Jelsma (JIRA)
Make index-more plug-in configurable -- Key: NUTCH-901 URL: https://issues.apache.org/jira/browse/NUTCH-901 Project: Nutch Issue Type: Improvement Components: indexer Reporter:

[jira] Updated: (NUTCH-901) Make index-more plug-in configurable

2010-09-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-901: Attachment: NUTCH-901-MarkusJelsma.998958.patch Here's a patch for version 1.2. It includes a

[jira] Updated: (NUTCH-901) Make index-more plug-in configurable

2010-09-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-901: Attachment: NUTCH-901-trunk.998961.patch Here's also a patch for 2.0 trunk. I could not test the

[jira] Updated: (NUTCH-922) SolrWriter should log source fields that are not mapped

2010-10-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-922: Component/s: indexer SolrWriter should log source fields that are not mapped

[jira] Created: (NUTCH-922) SolrWriter should log source fields that are not mapped

2010-10-20 Thread Markus Jelsma (JIRA)
SolrWriter should log source fields that are not mapped --- Key: NUTCH-922 URL: https://issues.apache.org/jira/browse/NUTCH-922 Project: Nutch Issue Type: Improvement Reporter:

[jira] Assigned: (NUTCH-924) Static field in solr mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-924: --- Assignee: Markus Jelsma Static field in solr mapping

[jira] Assigned: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-923: --- Assignee: Markus Jelsma Multilingual support for Solr-index-mapping

[jira] Commented: (NUTCH-924) Static field in solr mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923851#action_12923851 ] Markus Jelsma commented on NUTCH-924: - Yes, i'll look into it next week orso. The pro

[jira] Commented: (NUTCH-924) Static field in solr mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923861#action_12923861 ] Markus Jelsma commented on NUTCH-924: - Great! The patch almost works as i expected. It:

[jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923879#action_12923879 ] Markus Jelsma commented on NUTCH-923: - This is a very useful feature. +1 Multilingual

[jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12923919#action_12923919 ] Markus Jelsma commented on NUTCH-923: - Andrzej is right. The LanguageIndexingFilter can

[jira] Assigned: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-824: --- Assignee: Markus Jelsma Crawling - File Error 404 when fetching file with an hexadecimal

[jira] Commented: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925308#action_12925308 ] Markus Jelsma commented on NUTCH-824: - You're correct, no patch has been submitted and

[jira] Updated: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-824: Affects Version/s: 2.0 1.3 1.2 Fix Version/s:

[jira] Reopened: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reopened NUTCH-824: - Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

[jira] Commented: (NUTCH-901) Make index-more plug-in configurable

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12925318#action_12925318 ] Markus Jelsma commented on NUTCH-901: - Applied patch and added Mattmann's test to

[jira] Updated: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-900: Attachment: NUTCH-900-1.3.patch This patch is for branch-1.3 and fixes a typo in http.content.limit

[jira] Commented: (NUTCH-924) Static field in solr mapping

2010-11-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12932415#action_12932415 ] Markus Jelsma commented on NUTCH-924: - Yes, it needs to be added to trunk too. Please

[jira] Updated: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-936: Patch Info: [Patch Available] LanguageIdentifier should not set empty lang field on NutchDocument

[jira] Updated: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-936: Attachment: NUTCH-936-v13-1.patch NUTCH-936-v13-1.patch

[jira] Issue Comment Edited: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934453#action_12934453 ] Markus Jelsma edited comment on NUTCH-936 at 11/22/10 8:10 AM: ---

[jira] Updated: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-912: Patch Info: [Patch Available] Affects Version/s: 2.0 1.3

[jira] Updated: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-912: Attachment: NUTCH-912-v13-1.patch NUTCH-912-v12-1.patch

[jira] Issue Comment Edited: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934473#action_12934473 ] Markus Jelsma edited comment on NUTCH-912 at 11/22/10 9:24 AM: ---

[jira] Commented: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934474#action_12934474 ] Markus Jelsma commented on NUTCH-936: - Committed for 1.3 in 1037732 Can't commit right

[jira] Commented: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12934475#action_12934475 ] Markus Jelsma commented on NUTCH-912: - Committed for 1.3 in 1037733 Can't commit right

[jira] Updated: (NUTCH-935) remove unnecessary /./ in basic urlnormalizer

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-935: Affects Version/s: 2.0 1.3 Fix Version/s: 2.0

[jira] Commented: (NUTCH-939) Added -dir command line option to Indexer and SolrIndexer, allowing to specify directory containing segments

2010-11-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12936003#action_12936003 ] Markus Jelsma commented on NUTCH-939: - This is a useful patch! Could you also submit a

[jira] Created: (NUTCH-961) Expose Tika's boilerpipe support

2011-01-23 Thread Markus Jelsma (JIRA)
Expose Tika's boilerpipe support Key: NUTCH-961 URL: https://issues.apache.org/jira/browse/NUTCH-961 Project: Nutch Issue Type: New Feature Components: parser Reporter: Markus Jelsma

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987132#action_12987132 ] Markus Jelsma commented on NUTCH-963: - Thanks Claudio. I'll fix the formatting and add a

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987559#action_12987559 ] Markus Jelsma commented on NUTCH-963: - The class works fine although i did add a commit

[jira] Commented: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987561#action_12987561 ] Markus Jelsma commented on NUTCH-964: - I remembered ;). I also updated the CHANGES and

[jira] Commented: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987566#action_12987566 ] Markus Jelsma commented on NUTCH-964: - I followed Chris' instruction in some issue on

[jira] Commented: (NUTCH-961) Expose Tika's boilerpipe support

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987575#action_12987575 ] Markus Jelsma commented on NUTCH-961: - Boilerpipe comes with several algorithms for

[jira] Commented: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12987613#action_12987613 ] Markus Jelsma commented on NUTCH-964: - Well, just building the most recent Gora did the

[jira] Resolved: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-964. - Resolution: Fixed Committed for trunk in rev 1064169. ERROR conf.Configuration - Failed to set

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: NUTCH-963-command-and-log4j.patch SolrClean.java Add support for

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: (was: NUTCH-963-command-and-log4j.patch) Add support for deleting Solr documents

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: (was: SolrClean.java) Add support for deleting Solr documents with STATUS_DB_GONE

[jira] Created: (NUTCH-967) Upgrade to Tika 0.9

2011-02-17 Thread Markus Jelsma (JIRA)
Upgrade to Tika 0.9 --- Key: NUTCH-967 URL: https://issues.apache.org/jira/browse/NUTCH-967 Project: Nutch Issue Type: Task Components: parser Affects Versions: 1.3, 2.0 Reporter: Markus Jelsma

[jira] Resolved: (NUTCH-934) Upgrade to Tika 0.8

2011-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-934. - Resolution: Won't Fix This issue is superceded by NUTCH-967 Upgrade to Tika 0.8

[jira] Commented: (NUTCH-872) Change the default fetcher.parse to FALSE

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008397#comment-13008397 ] Markus Jelsma commented on NUTCH-872: - To all: Andrzej has committed this to 1.3 as

[jira] Commented: (NUTCH-958) Httpclient scheme priority order fix

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008401#comment-13008401 ] Markus Jelsma commented on NUTCH-958: - Hi Claudio. Is this desired behaviour? Shouldn't

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008402#comment-13008402 ] Markus Jelsma commented on NUTCH-963: - Julien, shouldn't the deduplicate mechanism kept

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008421#comment-13008421 ] Markus Jelsma commented on NUTCH-963: - Solr deduplication makes its own (fuzzy) hashes

[jira] Commented: (NUTCH-958) Httpclient scheme priority order fix

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008422#comment-13008422 ] Markus Jelsma commented on NUTCH-958: - Claudio, i am not sure if this workaround should

[jira] Commented: (NUTCH-967) Upgrade to Tika 0.9

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008453#comment-13008453 ] Markus Jelsma commented on NUTCH-967: - That didn't show up in test nor in a crawl, but

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13008469#comment-13008469 ] Markus Jelsma commented on NUTCH-963: - Committed for branch-1.3 in rev 1082944. - new

[jira] [Created] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2011-03-22 Thread Markus Jelsma (JIRA)
Injector job crashes with MySQL with table collation set to utf8_general_ci --- Key: NUTCH-970 URL: https://issues.apache.org/jira/browse/NUTCH-970 Project: Nutch Issue

[jira] [Commented] (NUTCH-967) Upgrade to Tika 0.9

2011-03-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13012993#comment-13012993 ] Markus Jelsma commented on NUTCH-967: - I applied your patch (seems i didn't properly

[jira] [Commented] (NUTCH-967) Upgrade to Tika 0.9

2011-03-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013006#comment-13013006 ] Markus Jelsma commented on NUTCH-967: - ant test-plugins BUILD SUCCESSFUL Total time:

[jira] [Updated] (NUTCH-897) Subcollection requires blacklist element

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-897: Attachment: NUTCH-897.patch Attached tested fix and if confirmed to work and not break existing

[jira] [Closed] (NUTCH-39) pagination in search result

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-39. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-36) Chinese in Nutch

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-36. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-13) If dns points to 127.0.0.1, the url is also crawled

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-13. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-79) Fault tolerant searching.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-79. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-83) Release deliverable as zip

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-83. -- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-103) Vivisimo like treeview and url redirect

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-103. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-104) Nutch query parser does not support CJK bi-gram segmentation.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-104. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-144) corrupt language identifier tri files and bad language recognition for german

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-144. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-180) Performance problem with widely used keywords

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-180. --- Resolution: Won't Fix Bulk close of legacy issues:

[jira] [Closed] (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-581. --- DistributedSearch does not update search servers added to search-servers.txt on the fly

[jira] [Closed] (NUTCH-877) Allow setting of slop values for non-quote phrase queries on query-basic plugin

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-877. --- Allow setting of slop values for non-quote phrase queries on query-basic plugin

[jira] [Updated] (NUTCH-265) Getting Clustered results in better form.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-265: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-674) NutchBean doesn't check for searcher.dir existance.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-674: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-423) Add other index-basic fields as query plugins

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-423: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-47) Configure host filter to do wildcard prefixes - *.redhat.com

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-47: --- Bulk close of legacy issues:

[jira] [Updated] (NUTCH-943) Search Results default dedup field site should be stored in index.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-943: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-469) changes to geoPosition plugin to make it work on nutch 0.9

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-469: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-377) Add possibility to search for multiple values

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-377: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-540) some problem about the Nutch cache

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-540: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-453) Move stop words to a config file

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-453: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-542) Null Pointer Exception on getSummary when segment no longer exists

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-542: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-466) Flexible segment format

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-466: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-480) Searching multiple indexes with a single nutch instance

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-480: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-470) Adding optional terms to a query

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-470: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-541) Index url field untokenized

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-541: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-294) Topic-maps of related searchwords

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-294: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-72) Query basic filter with correction feature

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-72: --- Bulk close of legacy issues:

[jira] [Updated] (NUTCH-260) Three new plugins that parse, index and query meta tags defined in the configuration

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-260: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-445) Domain İndexing / Query Filter

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-445: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-820) Infinite loop when hitspersite is set

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-820?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-820: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-92) DistributedSearch incorrectly scores results

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-92?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-92: --- Bulk close of legacy issues:

[jira] [Updated] (NUTCH-573) Multiple Domains - Query Search

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-573: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-764) Add support for vfsfile:// loading of plugins for JBoss

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-764: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-479) Support for OR queries

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-479: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-455) dedup on tokenized fields is faulty

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-455: Bulk close of legacy issues:

[jira] [Updated] (NUTCH-708) NutchBean: OOM due to searcher.max.hits and dedup.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-708: Bulk close of legacy issues:

[jira] [Closed] (NUTCH-72) Query basic filter with correction feature

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-72?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-72. -- Resolution: Won't Fix Query basic filter with correction feature

[jira] [Closed] (NUTCH-294) Topic-maps of related searchwords

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-294. --- Resolution: Won't Fix Topic-maps of related searchwords -
