[jira] Created: (NUTCH-897) Subcollection requires blacklist element

2010-09-06 Thread Markus Jelsma (JIRA)
Subcollection requires blacklist element Key: NUTCH-897 URL: https://issues.apache.org/jira/browse/NUTCH-897 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.2

[jira] Commented: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906488#action_12906488 ] Markus Jelsma commented on NUTCH-716: - This patch concatenates multiple values in a sing

[jira] Issue Comment Edited: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906488#action_12906488 ] Markus Jelsma edited comment on NUTCH-716 at 9/6/10 9:32 AM: - Th

[jira] Issue Comment Edited: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906488#action_12906488 ] Markus Jelsma edited comment on NUTCH-716 at 9/6/10 9:51 AM: - Th

[jira] Created: (NUTCH-898) Multi valued subcollection is not multi valued

2010-09-06 Thread Markus Jelsma (JIRA)
Multi valued subcollection is not multi valued -- Key: NUTCH-898 URL: https://issues.apache.org/jira/browse/NUTCH-898 Project: Nutch Issue Type: Bug Components: indexer Environme

[jira] Issue Comment Edited: (NUTCH-716) Make subcollection index filed multivalued

2010-09-06 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-716?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12906488#action_12906488 ] Markus Jelsma edited comment on NUTCH-716 at 9/6/10 12:45 PM: --

[jira] Closed: (NUTCH-898) Multi valued subcollection is not multi valued

2010-09-07 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-898. --- Resolution: Won't Fix The old (only) nightly build I was using did allow multiple values but concaten

[jira] Created: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Markus Jelsma (JIRA)
Confusion in nutch-default between http.content.limit and file.content.limit Key: NUTCH-900 URL: https://issues.apache.org/jira/browse/NUTCH-900 Project: Nutch Issu

[jira] Updated: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-900: Attachment: NUTCH-900.MarkusJelsma.100908.patch.txt > Confusion in nutch-default between http.conten

[jira] Updated: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-09-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-900: Patch Info: [Patch Available] > Confusion in nutch-default between http.content.limit and file.conte

[jira] Created: (NUTCH-901) Make index-more plug-in configurable

2010-09-08 Thread Markus Jelsma (JIRA)
Make index-more plug-in configurable -- Key: NUTCH-901 URL: https://issues.apache.org/jira/browse/NUTCH-901 Project: Nutch Issue Type: Improvement Components: indexer Reporter: Marku

[jira] Updated: (NUTCH-901) Make index-more plug-in configurable

2010-09-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-901: Attachment: NUTCH-901-MarkusJelsma.998958.patch Here's a patch for version 1.2. It includes a backwa

[jira] Issue Comment Edited: (NUTCH-901) Make index-more plug-in configurable

2010-09-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12912547#action_12912547 ] Markus Jelsma edited comment on NUTCH-901 at 9/20/10 11:53 AM: ---

[jira] Updated: (NUTCH-901) Make index-more plug-in configurable

2010-09-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-901: Attachment: NUTCH-901-trunk.998961.patch Here's also a patch for 2.0 trunk. I could not test the cod

[jira] Updated: (NUTCH-922) SolrWriter should log source fields that are not mapped

2010-10-20 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-922: Component/s: indexer > SolrWriter should log source fields that are not mapped > ---

[jira] Created: (NUTCH-922) SolrWriter should log source fields that are not mapped

2010-10-20 Thread Markus Jelsma (JIRA)
SolrWriter should log source fields that are not mapped --- Key: NUTCH-922 URL: https://issues.apache.org/jira/browse/NUTCH-922 Project: Nutch Issue Type: Improvement Reporter:

[jira] Assigned: (NUTCH-924) Static field in solr mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-924: --- Assignee: Markus Jelsma > Static field in solr mapping > > >

[jira] Assigned: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-923: --- Assignee: Markus Jelsma > Multilingual support for Solr-index-mapping > --

[jira] Commented: (NUTCH-924) Static field in solr mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923851#action_12923851 ] Markus Jelsma commented on NUTCH-924: - Yes, I'll look into it next week or so. The pro fo

[jira] Commented: (NUTCH-924) Static field in solr mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923861#action_12923861 ] Markus Jelsma commented on NUTCH-924: - Great! The patch almost works as I expected. It:

[jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923879#action_12923879 ] Markus Jelsma commented on NUTCH-923: - This is a very useful feature. +1 > Multilingual

[jira] Commented: (NUTCH-923) Multilingual support for Solr-index-mapping

2010-10-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923919#action_12923919 ] Markus Jelsma commented on NUTCH-923: - Andrzej is right. The LanguageIndexingFilter can

[jira] Commented: (NUTCH-714) Need a SFTP and SCP Protocol Handler

2010-10-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925153#action_12925153 ] Markus Jelsma commented on NUTCH-714: - I believe a 1.3 patch would be very welcome. Nutc

[jira] Assigned: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-824: --- Assignee: Markus Jelsma > Crawling - File Error 404 when fetching file with an hexadecimal cha

[jira] Commented: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925308#action_12925308 ] Markus Jelsma commented on NUTCH-824: - You're correct, no patch has been submitted and i

[jira] Updated: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-824: Affects Version/s: 2.0 1.3 1.2 Fix Version/s: 2

[jira] Reopened: (NUTCH-824) Crawling - File Error 404 when fetching file with an hexadecimal character in the file name.

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-824?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reopened NUTCH-824: - > Crawling - File Error 404 when fetching file with an hexadecimal character in > the file name. > --

[jira] Commented: (NUTCH-901) Make index-more plug-in configurable

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925318#action_12925318 ] Markus Jelsma commented on NUTCH-901: - Applied patch and added Mattmann's test to branch

[jira] Updated: (NUTCH-900) Confusion in nutch-default between http.content.limit and file.content.limit

2010-10-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-900: Attachment: NUTCH-900-1.3.patch This patch is for branch-1.3 and fixes a typo in http.content.limit

[jira] Commented: (NUTCH-924) Static field in solr mapping

2010-11-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-924?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12932415#action_12932415 ] Markus Jelsma commented on NUTCH-924: - Yes, it needs to be added to trunk too. Please su

[jira] Created: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-19 Thread Markus Jelsma (JIRA)
LanguageIdentifier should not set empty lang field on NutchDocument --- Key: NUTCH-936 URL: https://issues.apache.org/jira/browse/NUTCH-936 Project: Nutch Issue Type: Bug

[jira] Updated: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-936: Description: For some reason the language identifier plugin sometimes sets an empty value for the l

[jira] Updated: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-936: Patch Info: [Patch Available] > LanguageIdentifier should not set empty lang field on NutchDocument

[jira] Updated: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-936: Attachment: NUTCH-936-v13-1.patch NUTCH-936-v13-1.patch NUTCH-936-v12

[jira] Issue Comment Edited: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934453#action_12934453 ] Markus Jelsma edited comment on NUTCH-936 at 11/22/10 8:10 AM: ---

[jira] Updated: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-912: Patch Info: [Patch Available] Affects Version/s: 2.0 1.3 Fi

[jira] Commented: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934473#action_12934473 ] Markus Jelsma commented on NUTCH-912: - I added the new date format according to http://

[jira] Updated: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-912: Attachment: NUTCH-912-v13-1.patch NUTCH-912-v12-1.patch NUTCH-912-v12

[jira] Issue Comment Edited: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934473#action_12934473 ] Markus Jelsma edited comment on NUTCH-912 at 11/22/10 9:24 AM: ---

[jira] Commented: (NUTCH-936) LanguageIdentifier should not set empty lang field on NutchDocument

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934474#action_12934474 ] Markus Jelsma commented on NUTCH-936: - Committed for 1.3 in 1037732 Can't commit right n

[jira] Commented: (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12934475#action_12934475 ] Markus Jelsma commented on NUTCH-912: - Committed for 1.3 in 1037733 Can't commit right n

[jira] Updated: (NUTCH-935) remove unnecessary /./ in basic urlnormalizer

2010-11-22 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-935: Affects Version/s: 2.0 1.3 Fix Version/s: 2.0 1

[jira] Commented: (NUTCH-939) Added -dir command line option to Indexer and SolrIndexer, allowing to specify directory containing segments

2010-11-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12936003#action_12936003 ] Markus Jelsma commented on NUTCH-939: - This is a useful patch! Could you also submit a p

[jira] Commented: (NUTCH-901) Make index-more plug-in configurable

2011-01-04 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12977451#action_12977451 ] Markus Jelsma commented on NUTCH-901: - Thanks. Will remember next time. > Make index-

[jira] Created: (NUTCH-961) Expose Tika's boilerpipe support

2011-01-23 Thread Markus Jelsma (JIRA)
Expose Tika's boilerpipe support Key: NUTCH-961 URL: https://issues.apache.org/jira/browse/NUTCH-961 Project: Nutch Issue Type: New Feature Components: parser Reporter: Markus Jelsma

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987132#action_12987132 ] Markus Jelsma commented on NUTCH-963: - Thanks Claudio. I'll fix the formatting and add a

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Affects Version/s: (was: 1.3) 2.0 Fix Version/s: 2.0

[jira] Created: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
ERROR conf.Configuration - Failed to set setXIncludeAware(true) --- Key: NUTCH-964 URL: https://issues.apache.org/jira/browse/NUTCH-964 Project: Nutch Issue Type: Bug Affects Ve

[jira] Updated: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-964: Attachment: NUTCH-964.patch Upgrades xercesImpl from 2.6.2 to 2.9.1 > ERROR conf.Configuration - Fa

[jira] Updated: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-964: Patch Info: [Patch Available] > ERROR conf.Configuration - Failed to set setXIncludeAware(true) > --

[jira] Updated: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-964: Affects Version/s: 2.0 Fix Version/s: 2.0 > ERROR conf.Configuration - Failed to set setXInc

[jira] Assigned: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-964: --- Assignee: Markus Jelsma > ERROR conf.Configuration - Failed to set setXIncludeAware(true) > --

[jira] Updated: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-964: Attachment: NUTCH-964-trunk.patch Patch for Nutch 2.0 > ERROR conf.Configuration - Failed to set se

[jira] Commented: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987555#action_12987555 ] Markus Jelsma commented on NUTCH-964: - All tests pass. Committed for branch-1.3 in rev 1

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987559#action_12987559 ] Markus Jelsma commented on NUTCH-963: - The class works fine, although I did add a commit

[jira] Commented: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987561#action_12987561 ] Markus Jelsma commented on NUTCH-964: - I remembered ;). I also updated the CHANGES and a

[jira] Commented: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987566#action_12987566 ] Markus Jelsma commented on NUTCH-964: - I followed Chris' instruction in some issue on Go

[jira] Commented: (NUTCH-961) Expose Tika's boilerpipe support

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987575#action_12987575 ] Markus Jelsma commented on NUTCH-961: - Boilerpipe comes with several algorithms for stri

[jira] Commented: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12987613#action_12987613 ] Markus Jelsma commented on NUTCH-964: - Well, just building the most recent Gora did the

[jira] Resolved: (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-964. - Resolution: Fixed Committed for trunk in rev 1064169. > ERROR conf.Configuration - Failed to set

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: NUTCH-963-command-and-log4j.patch SolrClean.java > Add support for delet

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: (was: SolrClean.java) > Add support for deleting Solr documents with STATUS_DB_GONE

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: SolrClean.java NUTCH-963-command-and-log4j.patch Here's a patch for the

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: (was: NUTCH-963-command-and-log4j.patch) > Add support for deleting Solr documents w

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: SolrClean.java Of course! You reset numDeletes for each batch. Thanks! > Add support fo

[jira] Updated: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-01-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-963: Attachment: (was: SolrClean.java) > Add support for deleting Solr documents with STATUS_DB_GONE

[jira] Created: (NUTCH-967) Upgrade to Tika 0.9

2011-02-17 Thread Markus Jelsma (JIRA)
Upgrade to Tika 0.9 --- Key: NUTCH-967 URL: https://issues.apache.org/jira/browse/NUTCH-967 Project: Nutch Issue Type: Task Components: parser Affects Versions: 1.3, 2.0 Reporter: Markus Jelsma

[jira] Resolved: (NUTCH-934) Upgrade to Tika 0.8

2011-02-17 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-934. - Resolution: Won't Fix This issue is superseded by NUTCH-967 > Upgrade to Tika 0.8 > -

[jira] Commented: (NUTCH-872) Change the default fetcher.parse to FALSE

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-872?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008397#comment-13008397 ] Markus Jelsma commented on NUTCH-872: - To all: Andrzej has committed this to 1.3 as wel

[jira] Commented: (NUTCH-958) Httpclient scheme priority order fix

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008401#comment-13008401 ] Markus Jelsma commented on NUTCH-958: - Hi Claudio. Is this desired behaviour? Shouldn't

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008402#comment-13008402 ] Markus Jelsma commented on NUTCH-963: - Julien, shouldn't the deduplicate mechanism kept

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008421#comment-13008421 ] Markus Jelsma commented on NUTCH-963: - Solr deduplication makes its own (fuzzy) hashes

[jira] Commented: (NUTCH-958) Httpclient scheme priority order fix

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008422#comment-13008422 ] Markus Jelsma commented on NUTCH-958: - Claudio, I am not sure if this workaround should

[jira] Commented: (NUTCH-967) Upgrade to Tika 0.9

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008453#comment-13008453 ] Markus Jelsma commented on NUTCH-967: - That didn't show up in test nor in a crawl, but

[jira] Commented: (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-03-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008469#comment-13008469 ] Markus Jelsma commented on NUTCH-963: - Committed for branch-1.3 in rev 1082944. - new c

[jira] [Created] (NUTCH-970) Injector job crashes with MySQL with table collation set to utf8_general_ci

2011-03-22 Thread Markus Jelsma (JIRA)
Injector job crashes with MySQL with table collation set to utf8_general_ci --- Key: NUTCH-970 URL: https://issues.apache.org/jira/browse/NUTCH-970 Project: Nutch Issue

[jira] [Commented] (NUTCH-967) Upgrade to Tika 0.9

2011-03-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13012993#comment-13012993 ] Markus Jelsma commented on NUTCH-967: - I applied your patch (it seems I didn't properly re

[jira] [Commented] (NUTCH-967) Upgrade to Tika 0.9

2011-03-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013006#comment-13013006 ] Markus Jelsma commented on NUTCH-967: - ant test-plugins BUILD SUCCESSFUL Total time: 2

[jira] [Updated] (NUTCH-897) Subcollection requires blacklist element

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-897: Affects Version/s: 2.0 1.3 Fix Version/s: 2.0 1

[jira] [Updated] (NUTCH-897) Subcollection requires blacklist element

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-897: Patch Info: [Patch Available] > Subcollection requires blacklist element > -

[jira] [Updated] (NUTCH-897) Subcollection requires blacklist element

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-897: Attachment: NUTCH-897.patch Attached tested fix and if confirmed to work and not break existing con

[jira] [Commented] (NUTCH-973) Remove Segment Merger in 1.3

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13014566#comment-13014566 ] Markus Jelsma commented on NUTCH-973: - I'm not sure we should. In 1.x fetches still gen

[jira] [Closed] (NUTCH-18) Windows servers include illegal characters in URLs

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-18?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-18. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-39) pagination in search result

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-39?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-39. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-36) Chinese in Nutch

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-36?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-36. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-13) If dns points to 127.0.0.1, the url is also crawled

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-13. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-79) Fault tolerant searching.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-79?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-79. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-83) Release deliverable as zip

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-83?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-83. -- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738ee

[jira] [Closed] (NUTCH-103) Vivisimo like treeview and url redirect

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-103. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-104) Nutch query parser does not support CJK bi-gram segmentation.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-104. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-144) corrupt language identifier tri files and bad language recognition for german

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-144. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-180) Performance problem with widely used keywords

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-180. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-132) Add ability to sort on more than one column

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-132. --- Resolution: Won't Fix Bulk close of legacy issues: http://www.lucidimagination.com/search/document/273

[jira] [Closed] (NUTCH-581) DistributedSearch does not update search servers added to search-servers.txt on the fly

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-581?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-581. --- > DistributedSearch does not update search servers added to search-servers.txt > on the fly > ---

[jira] [Closed] (NUTCH-877) Allow setting of slop values for non-quote phrase queries on query-basic plugin

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-877. --- > Allow setting of slop values for non-quote phrase queries on query-basic > plugin > ---

[jira] [Closed] (NUTCH-775) Enhance Searcher interface

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-775. --- > Enhance Searcher interface > -- > > Key: NUTCH-775 >

[jira] [Updated] (NUTCH-265) Getting Clustered results in better form.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-265: Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Updated] (NUTCH-674) NutchBean doesn't check for searcher.dir existance.

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-674?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-674: Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Updated] (NUTCH-423) Add other index-basic fields as query plugins

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-423: Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_o

[jira] [Updated] (NUTCH-47) Configure host filter to do wildcard prefixes - *.redhat.com

2011-04-01 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-47?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-47: --- Bulk close of legacy issues: http://www.lucidimagination.com/search/document/2738eeb014805854/clean_up_open
