[jira] [Closed] (NUTCH-454) Review Debug Level Log Guards

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-454. --- Closing all resolved issues with a non-fixed status. > Review Debug Level Log Guards > --

[jira] [Closed] (NUTCH-934) Upgrade to Tika 0.8

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-934. --- Closing all resolved issues with a non-fixed status. > Upgrade to Tika 0.8 > --- > >

[jira] [Closed] (NUTCH-692) AlreadyBeingCreatedException with Hadoop 0.19

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-692. --- Closing all resolved issues with a non-fixed status. > AlreadyBeingCreatedException with Hadoop 0.19 > --

[jira] [Closed] (NUTCH-733) plain text view of cached files ignores HTML encoding

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-733. --- Closing all resolved issues with a non-fixed status. > plain text view of cached files ignores HTML encod

[jira] [Closed] (NUTCH-778) Running Nutch On linux having whoami exception?

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-778. --- Closing all resolved issues with a non-fixed status. > Running Nutch On linux having whoami exception? >

[jira] [Closed] (NUTCH-736) how long it takes nutch 1.0 to fetch

2011-04-13 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-736. --- Closing all resolved issues with a non-fixed status. > how long it takes nutch 1.0 to fetch > ---

[jira] [Resolved] (NUTCH-980) Fix IllegalAccessError with slf4j used in Solrj.

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-980. - Resolution: Fixed Committed for trunk in rev. 1092062. > Fix IllegalAccessError with slf4j used i

[jira] [Updated] (NUTCH-976) SolrIndex constants in wrong namespace (or prefix)

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-976: Attachment: NUTCH-976-1.3-2.patch NUTCH-976-trunk-2.patch Patches for 1.3 and trunk.

[jira] [Commented] (NUTCH-976) SolrIndex constants in wrong namespace (or prefix)

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019749#comment-13019749 ] Markus Jelsma commented on NUTCH-976: - All seems to be alright now for trunk and 1.3, a

[jira] [Commented] (NUTCH-975) Fix missing/wrong headers in source files

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13019759#comment-13019759 ] Markus Jelsma commented on NUTCH-975: - Great stuff Julien! I'll also add the header for

[jira] [Updated] (NUTCH-975) Fix missing/wrong headers in source files

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-975: Attachment: NUTCH-975-trunk-bin.patch Here's the patch for bin/nutch in trunk. > Fix missing/wrong

[jira] [Resolved] (NUTCH-975) Fix missing/wrong headers in source files

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-975. - Resolution: Fixed Assignee: Markus Jelsma Everything builds and runs fine with these patches

[jira] [Updated] (NUTCH-976) Rename properties solrindex.* to solr.*

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-976: Description: All Solr properties are now consistently using solr.* instead of solrindex.*. This has

[jira] [Resolved] (NUTCH-976) Rename properties solrindex.* to solr.*

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-976. - Resolution: Fixed Fixed in 1.3 in rev 1092084 and for trunk in rev 1092085. Thanks Julien for com

[jira] [Resolved] (NUTCH-977) SolrMappingReader uses hardcoded configuration parameter name for mapping file

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-977. - Resolution: Fixed Committed for trunk in rev. 1092090 for 1.3 in rev. 1092091. > SolrMappingReade

[jira] [Closed] (NUTCH-922) SolrWriter should log source fields that are not mapped

2011-04-14 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-922?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-922. --- Resolution: Not A Problem No problem, unmapped fields are written anyway. > SolrWriter should log sou

[jira] [Reopened] (NUTCH-386) Plugin to index categories by url rules

2011-04-16 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reopened NUTCH-386: - This is one of the closed legacy issues. I reopened it so Richard can actually attach the patch. > P

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-04-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: NUTCH-961-1.3-tikaparser.patch BoilerpipeExtractorRepository.java Here's

[jira] [Created] (NUTCH-984) Parse-tika throws some URL's away

2011-04-18 Thread Markus Jelsma (JIRA)
Parse-tika throws some URL's away - Key: NUTCH-984 URL: https://issues.apache.org/jira/browse/NUTCH-984 Project: Nutch Issue Type: Bug Components: parser Affects Versions: 1.3, 2.0 Re

[jira] [Updated] (NUTCH-984) Parse-tika throws some URL's away

2011-04-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-984: Description: For some reason using parse-tika a crawl just wouldn't dive into some website news arc

[jira] [Commented] (NUTCH-984) Parse-tika throws some URL's away

2011-04-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021113#comment-13021113 ] Markus Jelsma commented on NUTCH-984: - Yes i can test these URL's with tika-parsers 0.9

[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021712#comment-13021712 ] Markus Jelsma commented on NUTCH-985: - This is similar to another issue described today

[jira] [Commented] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-19 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021739#comment-13021739 ] Markus Jelsma commented on NUTCH-985: - Yes, something has to be done. What did you atta

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: (was: BoilerpipeExtractorRepository.java) > Expose Tika's boilerpipe support > -

[jira] [Updated] (NUTCH-961) Expose Tika's boilerpipe support

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-961: Attachment: BoilerpipeExtractorRepository.java Here's the correct file. > Expose Tika's boilerpipe

[jira] [Created] (NUTCH-986) Dedup fails due to date format (long)

2011-04-26 Thread Markus Jelsma (JIRA)
Dedup fails due to date format (long) - Key: NUTCH-986 URL: https://issues.apache.org/jira/browse/NUTCH-986 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.3

[jira] [Updated] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-985: Affects Version/s: 1.3 > Problems indexing lastModifiedDate in Solr > --

[jira] [Created] (NUTCH-987) Support HTTP auth for Solr communication

2011-04-26 Thread Markus Jelsma (JIRA)
Support HTTP auth for Solr communication Key: NUTCH-987 URL: https://issues.apache.org/jira/browse/NUTCH-987 Project: Nutch Issue Type: Improvement Components: indexer Reporter:

[jira] [Commented] (NUTCH-961) Expose Tika's boilerpipe support

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025295#comment-13025295 ] Markus Jelsma commented on NUTCH-961: - Not safely, there are still issues regarding HTM

[jira] [Issue Comment Edited] (NUTCH-984) Parse-tika throws some URL's away

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13021113#comment-13021113 ] Markus Jelsma edited comment on NUTCH-984 at 4/26/11 4:02 PM: --

[jira] [Updated] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-985: Affects Version/s: 2.0 Fix Version/s: 2.0 1.3 Assignee: M

[jira] [Updated] (NUTCH-983) Upgrade SolrJ

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-983: Affects Version/s: 1.3 Fix Version/s: 1.3 > Upgrade SolrJ > - > >

[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)

2011-04-26 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-986: Affects Version/s: 2.0 Fix Version/s: 2.0 Assignee: Markus Jelsma > Dedup fails

[jira] [Created] (NUTCH-988) index-feed plugin also doesn't use proper date fields

2011-04-27 Thread Markus Jelsma (JIRA)
index-feed plugin also doesn't use proper date fields - Key: NUTCH-988 URL: https://issues.apache.org/jira/browse/NUTCH-988 Project: Nutch Issue Type: Improvement Affects Versions: 1.3,

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-987: Attachment: NUTCH-987-1.3-hack.patch Attached nasty hack for the sake of not losing it. > Support H

[jira] [Created] (NUTCH-989) index-basic plugin also uses invalid date format for Solr

2011-04-27 Thread Markus Jelsma (JIRA)
index-basic plugin also uses invalid date format for Solr - Key: NUTCH-989 URL: https://issues.apache.org/jira/browse/NUTCH-989 Project: Nutch Issue Type: Improvement Affects Versio

[jira] [Updated] (NUTCH-985) Problems indexing lastModifiedDate in Solr

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-985: Attachment: NUTCH-985.1.3-1.patch Here's a working patch. It adds a date fieldtype to the schema and

[jira] [Updated] (NUTCH-985) MoreIndexingFilter doesn't use properly formatted date fields for Solr

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-985: Summary: MoreIndexingFilter doesn't use properly formatted date fields for Solr (was: Problems inde

[jira] [Updated] (NUTCH-989) index-basic plugin doesn

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-989: Summary: index-basic plugin doesn (was: index-basic plugin also uses invalid date format for Solr)

[jira] [Updated] (NUTCH-989) index-basic plugin doesn't use Solr date fieldType

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-989: Description: The index-basic plugin actually sends over a properly formatted date with millis but th

[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-986: Patch Info: [Patch Available] > Dedup fails due to date format (long) >

[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-986: Attachment: NUTCH-986-1.3-1.patch Here's a patch. It leaves all code intact but only converts the in

[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-986: Attachment: NUTCH-986-trunk-1.patch Patch for trunk! > Dedup fails due to date format (long) >

[jira] [Updated] (NUTCH-985) MoreIndexingFilter doesn't use properly formatted date fields for Solr

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-985: Attachment: NUTCH-985-trunk-1.patch Patch for trunk! > MoreIndexingFilter doesn't use properly form

[jira] [Commented] (NUTCH-983) Upgrade SolrJ

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025809#comment-13025809 ] Markus Jelsma commented on NUTCH-983: - Can someone take a look at this? Updating ivy fr

[jira] [Commented] (NUTCH-983) Upgrade SolrJ

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025818#comment-13025818 ] Markus Jelsma commented on NUTCH-983: - SolrJ itself comes in nicely but it seems it com

[jira] [Commented] (NUTCH-990) protocol-httpclient fails with actually plain/text pages

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025820#comment-13025820 ] Markus Jelsma commented on NUTCH-990: - What does the log say? I tried my home_dir with

[jira] [Created] (NUTCH-991) SolrDedup must issue a commit

2011-04-27 Thread Markus Jelsma (JIRA)
SolrDedup must issue a commit - Key: NUTCH-991 URL: https://issues.apache.org/jira/browse/NUTCH-991 Project: Nutch Issue Type: Improvement Components: indexer Affects Versions: 1.3, 2.0 R

[jira] [Assigned] (NUTCH-991) SolrDedup must issue a commit

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-991: --- Assignee: Markus Jelsma > SolrDedup must issue a commit > - > >

[jira] [Commented] (NUTCH-985) MoreIndexingFilter doesn't use properly formatted date fields for Solr

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025835#comment-13025835 ] Markus Jelsma commented on NUTCH-985: - You are right. But various index-* plugins write

[jira] [Commented] (NUTCH-983) Upgrade SolrJ

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025837#comment-13025837 ] Markus Jelsma commented on NUTCH-983: - I can give it a try. What does the exclude exact

[jira] [Issue Comment Edited] (NUTCH-983) Upgrade SolrJ

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025837#comment-13025837 ] Markus Jelsma edited comment on NUTCH-983 at 4/27/11 3:28 PM: --

[jira] [Updated] (NUTCH-979) Add support for deleting Solr documents with ProtocolStatusCodes.NOTFOUND

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-979: Attachment: SolrClean.java Here's a WIP in case i'll accidentally send it all to the litter bin. Thi

[jira] [Commented] (NUTCH-986) Dedup fails due to date format (long)

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025869#comment-13025869 ] Markus Jelsma commented on NUTCH-986: - If there are no objections i'll commit this one

[jira] [Commented] (NUTCH-990) protocol-httpclient fails with short pages

2011-04-27 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13025956#comment-13025956 ] Markus Jelsma commented on NUTCH-990: - Could you post only the relevant parts of the lo

[jira] [Resolved] (NUTCH-986) Dedup fails due to date format (long)

2011-04-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-986. - Resolution: Fixed Committed 1.3 in rev. 1097390 and for trunk in rev. 1097391. > Dedup fails due

[jira] [Updated] (NUTCH-986) Dedup fails due to date format (long)

2011-04-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-986: Attachment: NUTCH-986-1.3-2.patch NUTCH-986-trunk-2.patch Previous patch was incorre

[jira] [Commented] (NUTCH-986) Dedup fails due to date format (long)

2011-04-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13026255#comment-13026255 ] Markus Jelsma commented on NUTCH-986: - Recommitted 1.3 in rev 1097410 and for trunk in

[jira] [Updated] (NUTCH-991) SolrDedup must issue a commit

2011-04-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-991: Attachment: NUTCH-991-trunk-1.patch NUTCH-991-1.3-1.patch Added the commit operation

[jira] [Resolved] (NUTCH-991) SolrDedup must issue a commit

2011-04-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-991. - Resolution: Fixed Committed in 1.3 rev 1097415 and trunk 1097416. > SolrDedup must issue a commit

[jira] [Created] (NUTCH-992) SolrDedup is broken in trunk

2011-04-28 Thread Markus Jelsma (JIRA)
SolrDedup is broken in trunk Key: NUTCH-992 URL: https://issues.apache.org/jira/browse/NUTCH-992 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 2.0 Reporter: Markus

[jira] [Updated] (NUTCH-991) SolrDedup must issue a commit

2011-04-28 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-991: Patch Info: [Patch Available] > SolrDedup must issue a commit > - > >

[jira] [Commented] (NUTCH-990) protocol-httpclient fails with short pages

2011-04-29 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027033#comment-13027033 ] Markus Jelsma commented on NUTCH-990: - guess we can mark it as won't fix then and close

[jira] [Resolved] (NUTCH-990) protocol-httpclient fails with short pages

2011-04-30 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-990. - Resolution: Won't Fix > protocol-httpclient fails with short pages > -

[jira] [Assigned] (NUTCH-989) index-basic plugin doesn't use Solr date fieldType

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-989: --- Assignee: Markus Jelsma > index-basic plugin doesn't use Solr date fieldType > ---

[jira] [Updated] (NUTCH-989) index-basic plugin doesn't use Solr date fieldType

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-989: Fix Version/s: 1.3 The supplied Solr schema must use a date fieldType instead of long. If not, dedu

[jira] [Commented] (NUTCH-983) Upgrade SolrJ

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027597#comment-13027597 ] Markus Jelsma commented on NUTCH-983: - It works as expected in trunk but i can't seem t

[jira] [Updated] (NUTCH-710) Support for rel="canonical" attribute

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-710: Fix Version/s: 2.0 Putting a useful issue back on the radar. Fix for 2.0? > Support for rel="canoni

[jira] [Updated] (NUTCH-717) Make Nutch Solr integration easier

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-717: Fix Version/s: 2.0 Back on the radar for 2.0? > Make Nutch Solr integration easier > -

[jira] [Commented] (NUTCH-783) IndexerChecker Utilty

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027604#comment-13027604 ] Markus Jelsma commented on NUTCH-783: - What's this? Shouldn't it be closed? > IndexerC

[jira] [Commented] (NUTCH-783) IndexerChecker Utilty

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027613#comment-13027613 ] Markus Jelsma commented on NUTCH-783: - You're right. Shouldn't it be marked for a versi

[jira] [Issue Comment Edited] (NUTCH-983) Upgrade SolrJ

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13027597#comment-13027597 ] Markus Jelsma edited comment on NUTCH-983 at 5/2/11 12:54 PM: --

[jira] [Updated] (NUTCH-987) Support HTTP auth for Solr communication

2011-05-02 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-987: Fix Version/s: 2.0 > Support HTTP auth for Solr communication >

[jira] [Resolved] (NUTCH-989) index-basic plugin doesn't use Solr date fieldType

2011-05-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-989. - Resolution: Fixed Date fieldType added and updated tstamp field to use the new fieldType. Committ

[jira] [Closed] (NUTCH-989) index-basic plugin doesn't use Solr date fieldType

2011-05-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-989. --- > index-basic plugin doesn't use Solr date fieldType > --

[jira] [Created] (NUTCH-994) Fine tune Solr schema

2011-05-05 Thread Markus Jelsma (JIRA)
Fine tune Solr schema - Key: NUTCH-994 URL: https://issues.apache.org/jira/browse/NUTCH-994 Project: Nutch Issue Type: Improvement Components: indexer Affects Versions: 1.3, 2.0 Reporter: Markus

[jira] [Commented] (NUTCH-983) Upgrade SolrJ

2011-05-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029365#comment-13029365 ] Markus Jelsma commented on NUTCH-983: - That works indeed (seems the exclusions must not

[jira] [Commented] (NUTCH-983) Upgrade SolrJ

2011-05-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029399#comment-13029399 ] Markus Jelsma commented on NUTCH-983: - I'll take a look at it for trunk, hopefully tomo

[jira] [Commented] (NUTCH-983) Upgrade SolrJ

2011-05-05 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13029470#comment-13029470 ] Markus Jelsma commented on NUTCH-983: - Great stuff! I just got my first svn conflict in

[jira] [Created] (NUTCH-996) Indexer adds solr.commit.size+1 docs

2011-05-07 Thread Markus Jelsma (JIRA)
Indexer adds solr.commit.size+1 docs Key: NUTCH-996 URL: https://issues.apache.org/jira/browse/NUTCH-996 Project: Nutch Issue Type: Bug Components: indexer Affects Versions: 1.3, 2.0

[jira] [Commented] (NUTCH-887) Delegate parsing of feeds to Tika

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-887?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030549#comment-13030549 ] Markus Jelsma commented on NUTCH-887: - Julien committed NUTCH-888 for 1.3 and trunk. I

[jira] [Closed] (NUTCH-977) SolrMappingReader uses hardcoded configuration parameter name for mapping file

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-977. --- > SolrMappingReader uses hardcoded configuration parameter name for mapping file > ---

[jira] [Closed] (NUTCH-991) SolrDedup must issue a commit

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-991. --- > SolrDedup must issue a commit > - > > Key: NUTCH-991 >

[jira] [Closed] (NUTCH-980) Fix IllegalAccessError with slf4j used in Solrj.

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-980. --- > Fix IllegalAccessError with slf4j used in Solrj. > > >

[jira] [Closed] (NUTCH-912) MoreIndexingFilter does not parse docx and xlsx date formats

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-912. --- > MoreIndexingFilter does not parse docx and xlsx date formats > -

[jira] [Closed] (NUTCH-963) Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 urls)

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-963. --- > Add support for deleting Solr documents with STATUS_DB_GONE in CrawlDB (404 > urls) > -

[jira] [Closed] (NUTCH-935) remove unnecessary /./ in basic urlnormalizer

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-935. --- > remove unnecessary /./ in basic urlnormalizer > - > >

[jira] [Closed] (NUTCH-976) Rename properties solrindex.* to solr.*

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-976. --- > Rename properties solrindex.* to solr.* > > >

[jira] [Closed] (NUTCH-986) Dedup fails due to date format (long)

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-986?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-986. --- > Dedup fails due to date format (long) > - > > Key: N

[jira] [Closed] (NUTCH-897) Subcollection requires blacklist element

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-897. --- > Subcollection requires blacklist element > > >

[jira] [Closed] (NUTCH-964) ERROR conf.Configuration - Failed to set setXIncludeAware(true)

2011-05-08 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma closed NUTCH-964. --- > ERROR conf.Configuration - Failed to set setXIncludeAware(true) > --

[jira] [Commented] (NUTCH-585) [PARSE-HTML plugin] Block certain parts of HTML code from being indexed

2011-05-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-585?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030660#comment-13030660 ] Markus Jelsma commented on NUTCH-585: - Thanks for mentioning Wim. This patch can be use

[jira] [Assigned] (NUTCH-994) Fine tune Solr schema

2011-05-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma reassigned NUTCH-994: --- Assignee: Markus Jelsma > Fine tune Solr schema > - > > Ke

[jira] [Updated] (NUTCH-994) Fine tune Solr schema

2011-05-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-994: Attachment: NUTCH-994-all.patch This patch changes: * non-analyzed field types to their Trie-based

[jira] [Updated] (NUTCH-994) Fine tune Solr schema

2011-05-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma updated NUTCH-994: Patch Info: [Patch Available] > Fine tune Solr schema > - > > Ke

[jira] [Issue Comment Edited] (NUTCH-994) Fine tune Solr schema

2011-05-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030969#comment-13030969 ] Markus Jelsma edited comment on NUTCH-994 at 5/10/11 12:34 AM: --

[jira] [Issue Comment Edited] (NUTCH-994) Fine tune Solr schema

2011-05-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13030969#comment-13030969 ] Markus Jelsma edited comment on NUTCH-994 at 5/10/11 12:35 AM: --

[jira] [Resolved] (NUTCH-996) Indexer adds solr.commit.size+1 docs

2011-05-09 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Markus Jelsma resolved NUTCH-996. - Resolution: Fixed Committed for trunk in rev. 1101279 and for 1.3 in 1101280. Commit.size might be

[jira] [Commented] (NUTCH-985) MoreIndexingFilter doesn't use properly formatted date fields for Solr

2011-05-11 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031682#comment-13031682 ] Markus Jelsma commented on NUTCH-985: - From dev@nutch > For now a quick fix for the mo

[jira] [Commented] (NUTCH-997) IndexingFitlers to store Date objects instead of Strings

2011-05-18 Thread Markus Jelsma (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13035297#comment-13035297 ] Markus Jelsma commented on NUTCH-997: - Good work, especially that it supersedes the oth
