[jira] [Commented] (SOLR-4197) EDismax allows end users to use local params in q= to override global params
[ https://issues.apache.org/jira/browse/SOLR-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533611#comment-13533611 ]

Peter Wolanin commented on SOLR-4197:
-------------------------------------

OK, but there is no way to enforce that in the configuration, right? At the very least it's a documentation problem, but I would still consider it a problem that I can't lock this down via solrconfig.xml.

EDismax allows end users to use local params in q= to override global params

Key: SOLR-4197
URL: https://issues.apache.org/jira/browse/SOLR-4197
Project: Solr
Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0
Reporter: Peter Wolanin

Edismax is advertised as suitable for processing advanced user input directly. Thus, it would seem reasonable to have an application directly pass user input in the q= parameter to a back-end Solr server. However, it seems that users can enter local params at the start of q= which override the global params that the application (e.g. a website) may have set on the query string.

Confirmed with Erik Hatcher that this is somewhat unexpected behavior (though one could argue it's an expected feature of any query parser).

Proposed fix: add a parameter (e.g. one that can be used as an invariant) that can be passed to inhibit Solr from using local params from the q= parameter.

This is somewhat related to SOLR-1687.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
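Until a parameter like the one proposed exists, an application can defend itself by stripping a leading local-params block from user input before forwarding q= to Solr. A minimal sketch (the class name and regex are illustrative, not part of Solr; the regex does not handle a "}" inside quoted local-param values):

```java
import java.util.regex.Pattern;

public class LocalParamsGuard {
    // Matches a local-params block such as {!dismax qf=title} at the start
    // of the query string, optionally preceded by whitespace. Caveat: a "}"
    // inside a quoted parameter value would end the match early.
    private static final Pattern LEADING_LOCAL_PARAMS =
        Pattern.compile("^\\s*\\{!.*?\\}");

    /** Strip a leading {!...} local-params block from user input. */
    public static String stripLocalParams(String q) {
        return LEADING_LOCAL_PARAMS.matcher(q).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripLocalParams("{!lucene df=text}hello")); // hello
        System.out.println(stripLocalParams("plain query"));            // plain query
    }
}
```

This is an application-side work-around only; it does not address the request that Solr itself be lockable via solrconfig.xml.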
[jira] [Commented] (SOLR-4197) EDismax allows end users to use local params in q= to override global params
[ https://issues.apache.org/jira/browse/SOLR-4197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13533643#comment-13533643 ]

Peter Wolanin commented on SOLR-4197:
-------------------------------------

Apparently adding a space at the beginning is not a complete solution - I then get an exception when it's the standard lucene parser:

{code}
Problem accessing /solr/select. Reason:
org.apache.lucene.queryParser.ParseException: Cannot parse ' {!lucene}hello':
Encountered "}" at line 1, column 9.
Was expecting one of:
    "TO" ...
    <RANGEEX_QUOTED> ...
    <RANGEEX_GOOP> ...
{code}

EDismax allows end users to use local params in q= to override global params

Key: SOLR-4197
URL: https://issues.apache.org/jira/browse/SOLR-4197
Project: Solr
Issue Type: Bug
Affects Versions: 3.5, 3.6, 4.0
Reporter: Peter Wolanin

Edismax is advertised as suitable for processing advanced user input directly. Thus, it would seem reasonable to have an application directly pass user input in the q= parameter to a back-end Solr server. However, it seems that users can enter local params at the start of q= which override the global params that the application (e.g. a website) may have set on the query string.

Confirmed with Erik Hatcher that this is somewhat unexpected behavior (though one could argue it's an expected feature of any query parser).

Proposed fix: add a parameter (e.g. one that can be used as an invariant) that can be passed to inhibit Solr from using local params from the q= parameter.

This is somewhat related to SOLR-1687.
[jira] [Created] (SOLR-4197) EDismax allows end users to use local params in q= to override global params
Peter Wolanin created SOLR-4197:
-----------------------------------

Summary: EDismax allows end users to use local params in q= to override global params
Key: SOLR-4197
URL: https://issues.apache.org/jira/browse/SOLR-4197
Project: Solr
Issue Type: Bug
Affects Versions: 4.0, 3.6, 3.5
Reporter: Peter Wolanin

Edismax is advertised as suitable for processing advanced user input directly. Thus, it would seem reasonable to have an application directly pass user input in the q= parameter to a back-end Solr server. However, it seems that users can enter local params at the start of q= which override the global params that the application (e.g. a website) may have set on the query string.

Confirmed with Erik Hatcher that this is somewhat unexpected behavior (though one could argue it's an expected feature of any query parser).

Proposed fix: add a parameter (e.g. one that can be used as an invariant) that can be passed to inhibit Solr from using local params from the q= parameter.

This is somewhat related to SOLR-1687.
[jira] [Created] (SOLR-4077) Solr params like zkHost are not consistently settable via JNDI - many only via system properties
Peter Wolanin created SOLR-4077:
-----------------------------------

Summary: Solr params like zkHost are not consistently settable via JNDI - many only via system properties
Key: SOLR-4077
URL: https://issues.apache.org/jira/browse/SOLR-4077
Project: Solr
Issue Type: Bug
Components: SolrCloud
Reporter: Peter Wolanin
Fix For: 4.0.1, 4.1, 5.0

The Solr home can be set via the JNDI environment, and in general system properties should be used for configuring the container, not the application, since the container may run several web apps. Let's add a helper method to something like SolrResourceLoader.java to look up values like zkHost (to find the ZooKeepers) or hostPort that can currently be set in solr.xml OR in a system property, but not in e.g. a Tomcat context file. The helper would avoid the need to write code to try both options, as currently exists in locateSolrHome().
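The proposed helper could look something like the sketch below: try the JNDI environment first, then fall back to a system property. This is only a proof of concept under assumptions - the JNDI path java:comp/env/solr/<name> mirrors how Solr looks up solr/home, but only solr/home is actually defined by Solr today, and the class and method names are invented for illustration:

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

public class ConfigLookup {
    /**
     * Look up a configuration value such as "zkHost" or "hostPort",
     * preferring the JNDI environment and falling back to a system
     * property, so containers like Tomcat can set it per-webapp in a
     * context file instead of a JVM-wide -D flag.
     */
    public static String lookup(String name, String defaultValue) {
        try {
            InitialContext ctx = new InitialContext();
            Object value = ctx.lookup("java:comp/env/solr/" + name);
            if (value != null) {
                return value.toString();
            }
        } catch (NamingException e) {
            // No JNDI context bound (e.g. running outside a servlet
            // container) - fall through to the system property.
        }
        return System.getProperty(name, defaultValue);
    }
}
```

Outside a container the JNDI lookup throws NamingException, so the helper degrades to plain System.getProperty, matching the fallback logic in locateSolrHome().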
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: SOLR-2166.diff

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff

The termvector response format could really be improved.
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: SOLR-2166.diff

Here's a patch rolled against Solr 3.5 which I think makes the format into something more compact that doesn't fail on JSON parsing.

BEFORE:
{code}
"termVectors": {
  "doc-49": {
    "uniqueKey": "evfbih/node/89",
    "content": {
      "abba": {
        "positions": {
          "position": 49}},
      "abigo": {
        "positions": {
          "position": 5,
          "position": 72}},
{code}

AFTER:
{code}
"termVectors": {
  "doc-49": {
    "uniqueKey": "evfbih/node/89",
    "content": {
      "abba": {
        "positions": [49]},
      "abigo": {
        "positions": [5, 72]},
{code}

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff

The termvector response format could really be improved.
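The "fails on JSON parsing" problem above comes from the duplicate "position" keys in the BEFORE format: a typical JSON consumer stores object members in a map, so repeated keys silently collapse to one. A minimal demonstration of that collapse (plain Java map standing in for a JSON object):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DuplicateKeyDemo {
    public static void main(String[] args) {
        // The BEFORE format emits {"position": 5, "position": 72}.
        // A map-backed JSON consumer keeps only the last value.
        Map<String, Integer> positions = new LinkedHashMap<>();
        positions.put("position", 5);
        positions.put("position", 72); // silently overwrites the first entry
        System.out.println(positions); // {position=72} - position 5 is lost
    }
}
```

The AFTER format's "positions": [5, 72] array sidesteps this entirely, since arrays have no key-uniqueness constraint.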
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: (was: SOLR-2166.diff)

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff

The termvector response format could really be improved.
[jira] [Updated] (SOLR-2166) termvector component has strange syntax
[ https://issues.apache.org/jira/browse/SOLR-2166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2166:
--------------------------------

Attachment: workaround-managled-SOLR-2166.diff

As a work-around one could enable access to the unused(?) writeNamedListAsMapMangled() function, which prevents writing duplicate keys. For this work-around patch, use json.nl=mapm instead of json.nl=map to see the behavior.

AFTER:
{code}
"termVectors": {
  "doc-49": {
    "uniqueKey": "evfbih/node/89",
    "content": {
      "abba": {
        "positions": {
          "position": 49}},
      "abigo": {
        "positions": {
          "position": 5,
          "position_1": 72}},
{code}

termvector component has strange syntax
---------------------------------------

Key: SOLR-2166
URL: https://issues.apache.org/jira/browse/SOLR-2166
Project: Solr
Issue Type: Improvement
Reporter: Yonik Seeley
Attachments: SOLR-2166.diff, workaround-managled-SOLR-2166.diff

The termvector response format could really be improved.
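The mangling shown above (position, position_1, ...) makes repeated keys unique by appending a counter suffix. A sketch of that deduplication logic, as a standalone helper - the method name and details here are illustrative, not the actual writeNamedListAsMapMangled() implementation:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KeyMangler {
    /**
     * Make repeated keys unique by appending _1, _2, ... so the output
     * can be written as a JSON map without duplicate keys.
     */
    public static List<String> mangle(List<String> keys) {
        Map<String, Integer> seen = new HashMap<>();
        List<String> out = new ArrayList<>();
        for (String key : keys) {
            Integer count = seen.get(key);
            if (count == null) {
                seen.put(key, 0);
                out.add(key); // first occurrence keeps its original name
            } else {
                seen.put(key, count + 1);
                out.add(key + "_" + (count + 1));
            }
        }
        return out;
    }
}
```

Applied to the duplicate keys in the example, mangle(["position", "position"]) yields ["position", "position_1"], matching the AFTER output.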
[jira] [Updated] (SOLR-2535) REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Summary: REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings (was: In Solr 3.2 and trunk the admin/file handler fails to show directory listings)

REGRESSION: in Solr 3.x and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.4, 4.0
Attachments: SOLR-2535.patch, SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
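The expected behavior described in the report boils down to a simple decision in the handler: emit a directory listing when no file name is given or when the target is a directory, and stream the file content otherwise. A sketch of that decision logic (class and method names are illustrative, not the actual ShowFileRequestHandler API):

```java
import java.io.File;

public class DirectoryListingDecision {
    /**
     * Decide whether a request to admin/file should produce a directory
     * listing (true) or stream a file's content (false), per the behavior
     * Solr 1.4.1 had and 3.1.0 lost.
     */
    public static boolean shouldList(String fileParam, boolean targetIsDirectory) {
        // No file param at all -> list the conf directory itself.
        if (fileParam == null || fileParam.isEmpty()) {
            return true;
        }
        // file=/xslt points at a sub-directory -> list it too.
        return targetIsDirectory;
    }

    public static void main(String[] args) {
        File conf = new File("conf"); // hypothetical conf directory
        System.out.println(shouldList(null, conf.isDirectory()));
    }
}
```

The 500 error ("did not find a CONTENT object") suggests 3.1.0 falls through to the content-streaming branch even when this decision should have selected the listing branch.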
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13052601#comment-13052601 ]

Peter Wolanin commented on SOLR-2462:
-------------------------------------

I generated a patch for 3.2 looking at the commit on branch_3x. It looks somewhat different from the last patch by James. I also just compared the trunk commit to the last patch, and it doesn't match https://issues.apache.org/jira/secure/attachment/12481574/SOLR-2462.patch

Did the wrong patch get committed, or was the final patch just never posted to this issue before commit?

Using spellcheck.collate can result in extremely high memory usage
------------------------------------------------------------------

Key: SOLR-2462
URL: https://issues.apache.org/jira/browse/SOLR-2462
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 3.1
Reporter: James Dyer
Assignee: Robert Muir
Priority: Critical
Fix For: 3.3, 4.0
Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch

When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory.

This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010.

We were in production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen, as occasionally a user will accidentally paste the URL into the search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15.
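The numbers in the report above make the blow-up concrete: with ~12 misspelled terms and spellcheck.count=15 suggestions per term, ranking every correction combination means enumerating 15^12 candidates. A quick check of that count:

```java
import java.math.BigInteger;

public class CollationExplosion {
    public static void main(String[] args) {
        // 12 misspelled terms, 15 suggestions each: the number of
        // correction combinations the pre-fix iterator would rank.
        BigInteger combinations = BigInteger.valueOf(15).pow(12);
        System.out.println(combinations); // 129746337890625
    }
}
```

Roughly 1.3 x 10^14 combinations - more than enough to explain servers dying in infinite GC loops when the full list is materialized and ranked in memory.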
[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Attachment: SOLR-2535.patch

Here's the patch I used. As before, it's just David's with the extra changes omitted.

In Solr 3.2 and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3
Attachments: SOLR-2535.patch, SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Commented] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13047999#comment-13047999 ]

Peter Wolanin commented on SOLR-2535:
-------------------------------------

Quick test works - I patched the 3.2 source and rebuilt, and the directory and subdirectory listings work as expected. The patch I used is the same as David's but just re-rolled without the changes to SolrDispatchFilter.java. I'm trying to attach it, but jira is throwing a stack trace.

In Solr 3.2 and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3
Attachments: SOLR-2535_fix_admin_file_handler_for_directory_listings.patch

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Commented] (SOLR-2535) In Solr 3.1.0 the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13046122#comment-13046122 ]

Peter Wolanin commented on SOLR-2535:
-------------------------------------

This ought to be a trivial fix, so I hope we can get it in 3.1.1 - or is 3.3 going to be the next minor version?

In Solr 3.1.0 the admin/file handler fails to show directory listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Updated] (SOLR-2535) In Solr 3.2 and trunk the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Affects Version/s: 3.2
Summary: In Solr 3.2 and trunk the admin/file handler fails to show directory listings (was: In Solr 3.1.0 the admin/file handler fails to show directory listings)

In Solr 3.2 and trunk the admin/file handler fails to show directory listings
-----------------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 3.2, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.3

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Updated] (SOLR-2535) In Solr 3.1.0 the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Fix Version/s: 4.0

Description:
In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

was:
In Solr 4.1.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

Affects Version/s: 4.0

In Solr 3.1.0 the admin/file handler fails to show directory listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1, 4.0
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.1.1, 3.2, 4.0

In Solr 1.4.1, going to the path solr/admin/file I see an XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Created] (SOLR-2535) In Solr 3.1.0 the admin/file handler failes to show director listings
In Solr 3.1.0 the admin/file handler failes to show director listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.1.1, 3.2

In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Updated] (SOLR-2535) In Solr 3.1.0 the admin/file handler fails to show directory listings
[ https://issues.apache.org/jira/browse/SOLR-2535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Peter Wolanin updated SOLR-2535:
--------------------------------

Description:
In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

was:
In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.

Summary: In Solr 3.1.0 the admin/file handler fails to show directory listings (was: In Solr 3.1.0 the admin/file handler failes to show director listings)

In Solr 3.1.0 the admin/file handler fails to show directory listings
---------------------------------------------------------------------

Key: SOLR-2535
URL: https://issues.apache.org/jira/browse/SOLR-2535
Project: Solr
Issue Type: Bug
Components: SearchComponents - other
Affects Versions: 3.1
Environment: java 1.6, jetty
Reporter: Peter Wolanin
Fix For: 3.1.1, 3.2

In Solr 4.1.1, going to the path solr/admin/file I see and XML-formatted listing of the conf directory, like:

{noformat}
<response>
<lst name="responseHeader"><int name="status">0</int><int name="QTime">1</int></lst>
<lst name="files">
<lst name="elevate.xml"><long name="size">1274</long><date name="modified">2011-03-06T20:42:54Z</date></lst>
...
</lst>
</response>
{noformat}

I can list the xslt sub-dir using solr/admin/files?file=/xslt

In Solr 3.1.0, both of these fail with a 500 error:

{noformat}
HTTP ERROR 500
Problem accessing /solr/admin/file/. Reason:
did not find a CONTENT object
java.io.IOException: did not find a CONTENT object
{noformat}

Looking at the code in class ShowFileRequestHandler, it seems like 3.1.0 should still handle directory listings if no file name is given, or if the file is a directory, so I am filing this as a bug.
[jira] [Commented] (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13034994#comment-13034994 ]

Peter Wolanin commented on SOLR-2168:
-------------------------------------

Did this change to the templates get committed to the actual Solr repo?

Velocity facet output for facet missing
---------------------------------------

Key: SOLR-2168
URL: https://issues.apache.org/jira/browse/SOLR-2168
Project: Solr
Issue Type: Bug
Components: Response Writers
Affects Versions: 3.1
Reporter: Peter Wolanin
Priority: Minor
Attachments: SOLR-2168.patch

If I add facet.missing to the facet params for a field, the Velocity output shows in the facet list: $facet.name (9220)
[jira] [Commented] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024649#comment-13024649 ]

Peter Wolanin commented on SOLR-232:
------------------------------------

Looks like the title needs to change? From looking at the Solr 1.4 code, it seems this issue is now about setting RESPONSE headers? That's certainly the use case I have in mind, and what seems to be commented out in the Solr 1.4 code: https://svn.apache.org/repos/asf/lucene/solr/branches/branch-1.4/src/webapp/src/org/apache/solr/servlet/SolrDispatchFilter.java

{code}
// add info to http headers
//TODO: See SOLR-232 and SOLR-267.
/*try {
  NamedList solrRspHeader = solrRsp.getResponseHeader();
  for (int i=0; i<solrRspHeader.size(); i++) {
    ((javax.servlet.http.HttpServletResponse) response).addHeader(("Solr-" + solrRspHeader.getName(i)), String.valueOf(solrRspHeader.getVal(i)));
  }
} catch (ClassCastException cce) {
  log.log(Level.WARNING, "exception adding response header log information", cce);
}*/
{code}

However, the things currently sent in the response header seem to be missing the # of matches (logged as hits), and I'm not sure I'd want all the params sent back as headers by default. So, maybe we need a method like solrRsp.getHttpResponseHeader() instead of using solrRsp.getResponseHeader(), and corresponding setters?

let Solr set request headers (for logging)
------------------------------------------

Key: SOLR-232
URL: https://issues.apache.org/jira/browse/SOLR-232
Project: Solr
Issue Type: New Feature
Environment: tomcat?
Reporter: Ian Holsman
Priority: Minor
Attachments: meta.patch

I need the ability to log certain information about a request so that I can feed it into performance and capacity monitoring systems. I would like to know things like:
- how long the request took
- how many rows were fetched and returned
- what handler was called
per request. The following patch is one way to implement this; I'm sure there are better ways.
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
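The commented-out snippet in SolrDispatchFilter above exports every responseHeader entry wholesale; the comment proposes a dedicated accessor instead. A self-contained sketch of that idea (plain java.util maps standing in for Solr's NamedList; the selection rule and the "X-Solr-" prefix follow the patch discussed later in this thread, but the helper itself is hypothetical, not existing Solr API):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class HeaderSketch {
    // Hypothetical stand-in for a solrRsp.getHttpResponseHeader() accessor:
    // only entries the handler explicitly marked for HTTP export arrive here.
    static Map<String, String> httpHeaders(Map<String, Object> marked) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : marked.entrySet()) {
            if (e.getValue() != null) {  // skip nulls: avoids "X-Solr-Hits: null"
                out.put("X-Solr-" + e.getKey(), String.valueOf(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> marked = new LinkedHashMap<>();
        marked.put("Hits", 42L);
        marked.put("QTime", null);                // e.g. ping handler: no hits
        System.out.println(httpHeaders(marked));  // {X-Solr-Hits=42}
    }
}
```

Keeping the export list separate from the full responseHeader also addresses the concern above about leaking all request params back as headers.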
[jira] [Commented] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13024651#comment-13024651 ] Peter Wolanin commented on SOLR-232: In addition, or instead, we could make which elements from the responseHeader are set as HTTP response headers configurable in solrconfig.xml for each request handler?
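To illustrate the per-handler idea, here is a purely hypothetical solrconfig.xml fragment - the httpResponseHeaders element and its shape are invented for this sketch and do not exist in Solr:

{code}
<requestHandler name="/select" class="solr.SearchHandler">
  <!-- hypothetical: map selected responseHeader entries to HTTP headers -->
  <lst name="httpResponseHeaders">
    <str name="hits">X-Solr-Hits</str>
    <str name="QTime">X-Solr-QTime</str>
  </lst>
</requestHandler>
{code}

An explicit allow-list like this would keep arbitrary params from leaking into headers by default.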
[jira] [Updated] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-232: --- Attachment: SOLR-232.patch Here's a patch against Solr 3.1, just as a proof of concept, that adds the hits as a response header X-Solr-Hits. Apparently this code has been commented out so long that the log call and other things changed.
[jira] [Updated] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-232: --- Attachment: (was: SOLR-232.patch)
[jira] [Updated] (SOLR-232) let Solr set request headers (for logging)
[ https://issues.apache.org/jira/browse/SOLR-232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-232: --- Attachment: SOLR-232.patch Hmm, that check isn't quite right - the ping handler ends up getting: X-Solr-Hits: null since String.valueOf(Object) returns the string "null" when its argument is null. This better POC patch checks for null in the right place. Deleted the old one.
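The String.valueOf behavior that produced the bogus header is easy to reproduce in isolation - this is standard Java library behavior, not anything Solr-specific:

```java
public class ValueOfNullDemo {
    public static void main(String[] args) {
        Object hits = null;  // e.g. the ping handler reports no hit count
        // String.valueOf(Object) maps a null argument to the literal
        // four-character string "null" rather than a null reference,
        // which is why the header read "X-Solr-Hits: null".
        String headerValue = String.valueOf(hits);
        System.out.println(headerValue);          // null  (the string)
        System.out.println(headerValue == null);  // false
    }
}
```

Hence the fix in the patch: test the value for null before building the header, not after stringifying it.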
[jira] Created: (SOLR-2266) java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord()
java.lang.ArrayIndexOutOfBoundsException in field cache when using a tdate field in a boost function with rord() Key: SOLR-2266 URL: https://issues.apache.org/jira/browse/SOLR-2266 Project: Solr Issue Type: Bug Affects Versions: 1.4.1 Environment: Mac OS 10.6 java version 1.6.0_22 Java(TM) SE Runtime Environment (build 1.6.0_22-b04-307-10M3261) Java HotSpot(TM) 64-Bit Server VM (build 17.1-b03-307, mixed mode) Reporter: Peter Wolanin I have been testing a switch to long and tdate instead of int and date fields in the schema.xml for our Drupal integration. This indexes fine, but search fails with a 500 error.
{code}
INFO: [d7] webapp=/solr path=/select params={spellcheck=true&facet=true&facet.mincount=1&indent=1&spellcheck.q=term&json.nl=map&wt=json&rows=10&version=1.2&fl=id,entity_id,entity,bundle,bundle_name,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=term&bf=recip(rord(created),4,19,19)^200.0} status=500 QTime=4
Dec 5, 2010 11:52:28 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: 39
	at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:721)
	at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:224)
	at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:692)
	at org.apache.solr.search.function.ReverseOrdFieldSource.getValues(ReverseOrdFieldSource.java:61)
	at org.apache.solr.search.function.TopValueSource.getValues(TopValueSource.java:57)
	at org.apache.solr.search.function.ReciprocalFloatFunction.getValues(ReciprocalFloatFunction.java:61)
	at org.apache.solr.search.function.FunctionQuery$AllScorer.<init>(FunctionQuery.java:123)
	at org.apache.solr.search.function.FunctionQuery$FunctionWeight.scorer(FunctionQuery.java:93)
	at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:297)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:250)
	at org.apache.lucene.search.Searcher.search(Searcher.java:171)
	at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:182)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at com.acquia.search.HmacFilter.doFilter(HmacFilter.java:62)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
{code}
The exception goes away if I remove the boost function param bf=recip(rord(created),4,19,19)^200.0 Omitting the recip() doesn't help, so just bf=rord(created)^200.0 still causes the exception. In
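A workaround not mentioned in this report, but consistent with the Solr function-query documentation of that era: ord()/rord() are unreliable on trie fields, because a trie field with precisionStep > 0 indexes several terms per value, so the ordinal arrays in the field cache no longer line up with documents (matching the ArrayIndexOutOfBoundsException above). A millisecond-based reciprocal boost avoids the ordinal arrays entirely:

{code}
bf=recip(ms(NOW,created),3.16e-11,1,1)^200.0
{code}

The constant 3.16e-11 is the documented "1 / one year in ms" scaling; tune the other recip() constants to taste.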
[jira] Issue Comment Edited: (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921852#action_12921852 ] Peter Wolanin edited comment on SOLR-2168 at 10/17/10 10:25 AM: Those all sound like good changes. In terms of templating, I'd find something like erb, or PHP, or JSP much easier, and I imagine many more people are familiar with those. So far, I feel like it's hard to understand in Velocity how variables and control structures are distinguished from the output, and it's not clear that it's a real template in terms of the way e.g. white space is handled or not. This is especially true in the case of macro output, where it seems like e.g. the carriage returns and spaces I'd naturally include in control structures to make them readable become part of the output. The variable handling is also weird - I need to use #set() for actual assignment? In terms of readability, look, for example, at this bit: {code} <li><a href="#url_for_home#lens&fq=$esc.url( {code} The fq= is output in the middle of a series of macro and function calls, but nothing visually distinguishes them. Can I define new functions instead of macros? If a macro call could be written as #{url_for_home} it would provide more visual separation. I notice in the patch you have: {code} -${field.name}:[* TO *] {code} Looks like the function call can be written like this? {code} ${esc.url(-${field.name}:[* TO *])} {code} Velocity facet output for facet missing --- Key: SOLR-2168 URL: https://issues.apache.org/jira/browse/SOLR-2168 Project: Solr Issue Type: Bug Components: Response Writers Affects Versions: 3.1 Reporter: Peter Wolanin Priority: Minor Attachments: SOLR-2168.patch If I add facet.missing to the facet params for a field, the Velocity output has in the facet list: $facet.name (9220)
[jira] Commented: (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12921855#action_12921855 ] Peter Wolanin commented on SOLR-2168: - If you want to start using git more widely for development (assuming people still post the final patches as attachments here), you might want to set up a canonical mirror some place on github so that everyone uses the same initial tree. We have this for Drupal: http://github.com/drupal/drupal and mirroring out of svn is probably even easier if someone has a server and can just run a script on cron every ~15 min.
[jira] Updated: (SOLR-2168) Velocity facet output for facet missing
[ https://issues.apache.org/jira/browse/SOLR-2168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-2168: Attachment: SOLR-2168.patch Attaching a functional (if not elegant) fix - I find the Velocity template syntax a little ... annoying.
[jira] Created: (SOLR-2149) Allow copyField directives to be controlled by another (boolean) field
Allow copyField directives to be controlled by another (boolean) field - Key: SOLR-2149 URL: https://issues.apache.org/jira/browse/SOLR-2149 Project: Solr Issue Type: New Feature Reporter: Peter Wolanin Thinking about alternative approaches to the problem outlined in SOLR-2010, it occurs to me that there are many cases where it would be useful to be able to control copyField behavior rather than having to fully populate or omit document fields. In regards to spellcheck, I could then have a few different spellcheck indexes, each built from a different field, and indicate for each document whether its text should be added to each of the different spellcheck fields. I'm imagining a general syntax like this: {code} <copyField source="body" dest="teaser" maxChars="300" controlField="populate_teaser"/> {code} I'm not sure if Solr would/could use the value of a control field unless it matches the ignored field type, but that's what I'm thinking about as one possibility. In other words, I can pass index-time flags into the document that are reflected in the terms of what's indexed but not explicitly stored in the document.
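A self-contained sketch of the proposed semantics (plain java.util maps standing in for Solr documents; the controlField behavior is the proposal above, not anything Solr implements):

```java
import java.util.HashMap;
import java.util.Map;

public class CopyFieldSketch {
    /**
     * Copy src into dest (truncated to maxChars) only when the boolean
     * control field is present and true -- the behavior proposed above.
     */
    static void copyField(Map<String, Object> doc, String src, String dest,
                          int maxChars, String controlField) {
        if (!Boolean.TRUE.equals(doc.get(controlField))) {
            return;  // control field absent or false: skip the copy
        }
        Object value = doc.get(src);
        if (value instanceof String) {
            String s = (String) value;
            doc.put(dest, s.length() > maxChars ? s.substring(0, maxChars) : s);
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new HashMap<>();
        doc.put("body", "some long body text");
        doc.put("populate_teaser", true);
        copyField(doc, "body", "teaser", 9, "populate_teaser");
        System.out.println(doc.get("teaser")); // some long
    }
}
```

In the real feature the decision would happen at index time inside Solr's document-processing path, and the control field itself could use an ignored field type so the flag never reaches the index.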
[jira] Updated: (SOLR-2149) Allow copyField directives to be controlled by another (boolean) field
[ https://issues.apache.org/jira/browse/SOLR-2149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-2149: Attachment: SOLR-2149.patch The attached patch against 1.4 is not at all functional - just taking a rough look at where the code would need to be modified.
[jira] Commented: (SOLR-1967) New Native PHP Response Writer Class
[ https://issues.apache.org/jira/browse/SOLR-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12905186#action_12905186 ] Peter Wolanin commented on SOLR-1967: - To my mind, the PHP response writer should just be removed. PHP has had a number of security issues around unserializing data, and in most languages unserializing potentially untrusted data may be a security vulnerability. New Native PHP Response Writer Class Key: SOLR-1967 URL: https://issues.apache.org/jira/browse/SOLR-1967 Project: Solr Issue Type: New Feature Components: clients - php, Response Writers Affects Versions: 1.4 Reporter: Israel Ekpo Fix For: 1.5, 3.1, 4.0, Next Attachments: phpnative.tar.gz, phpnativeresponsewriter.jar Original Estimate: 0h Remaining Estimate: 0h Hi Solr users, If you are using Apache Solr via PHP, I have some good news for you. There is a new response writer for the PHP native extension, currently available as a plugin. This new feature adds a new response writer class to the org.apache.solr.request package. This class is used by the PHP Native Solr Client driver to prepare the query response from Solr. This response writer allows you to configure the way the data is serialized for the PHP client. You can use your own class name and you can also control how the properties are serialized as well. The formatting of the response data is very similar to the way it is currently done by the PECL extension on the client side. The only difference now is that this serialization is happening on the server side instead. You will find this new response writer particularly useful when dealing with responses for - highlighting - admin threads responses - more like this responses to mention just a few. You can pass the objectClassName request parameter to specify the class name to be used for serializing objects.
Please note that the class must be available on the client side to avoid a PHP_Incomplete_Object error during the unserialization process. You can also pass in the objectPropertiesStorageMode request parameter with either a 0 (independent properties) or a 1 (combined properties). These parameters can also be passed as a named list when loading the response writer in the solrconfig.xml file. Having this control allows you to create custom objects, which gives the flexibility of implementing custom __get methods, ArrayAccess, Traversable and Iterator interfaces on the PHP client side. Until this class is incorporated into Solr, you simply have to copy the jar file containing this plugin into your lib directory under $SOLR_HOME. The jar file is available here and so is the source code. Then set up the configuration as shown below and then restart your servlet container. Below is an example configuration in solrconfig.xml:
{code}
<queryResponseWriter name="phpnative" class="org.apache.solr.request.PHPNativeResponseWriter">
  <!-- You can choose a different class for your objects.
       Just make sure the class is available in the client -->
  <str name="objectClassName">SolrObject</str>
  <!-- 0 means OBJECT_PROPERTIES_STORAGE_MODE_INDEPENDENT
       1 means OBJECT_PROPERTIES_STORAGE_MODE_COMBINED
       In independent mode, each property is a separate property.
       In combined mode, all the properties are merged into a _properties array.
       The combined mode allows you to create custom __getters and you could also
       implement ArrayAccess, Iterator and Traversable -->
  <int name="objectPropertiesStorageMode">0</int>
</queryResponseWriter>
{code}
Below is an example implementation on the PHP client side. Support for specifying custom response writers will be available starting from the 0.9.11 version of the PECL extension for Solr, currently available here http://pecl.php.net/package/solr Here is an example of how to use the new response writer with the PHP client.
{code}
<?php
class SolrClass {
  public $_properties = array();

  public function __get($property_name) {
    if (property_exists($this, $property_name)) {
      return $this->$property_name;
    }
    else if (isset($this->_properties[$property_name])) {
      return $this->_properties[$property_name];
    }
    return null;
  }
}

$options = array(
  'hostname' => 'localhost',
  'port' => 8983,
  'path' => '/solr/'
);

$client = new SolrClient($options);
$client->setResponseWriter("phpnative");
$response = $client->ping();

$query = new SolrQuery();
$query->setQuery("*:*");
$query->set("objectClassName", "SolrClass");
$query->set("objectPropertiesStorageMode", 1);

$response = $client->query($query);
$resp = $response->getResponse();
?>
{code}
Documentation of the changes to the PECL extension are available here http://docs.php.net/manual/en/solrclient.construct.php http://docs.php.net/manual/en/solrclient.setresponsewriter.php Please contact me at
[jira] Commented: (SOLR-1819) Upgrade to Tika 0.7
[ https://issues.apache.org/jira/browse/SOLR-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880685#action_12880685 ] Peter Wolanin commented on SOLR-1819: - As a side note, looks like Solr trunk is using a 0.8 snapshot of Tika. Upgrade to Tika 0.7 --- Key: SOLR-1819 URL: https://issues.apache.org/jira/browse/SOLR-1819 Project: Solr Issue Type: Improvement Reporter: Tricia Williams Assignee: Grant Ingersoll Priority: Minor Fix For: Next See title.
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12873461#action_12873461 ] Peter Wolanin commented on SOLR-1852: - Yes, I'd propose to have this in 1.4.1, since it's a pretty serious bug in the places where it manifests. enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Robert Muir Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence - "Support Identi.ca" - the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory
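The workaround described above, expressed as a schema.xml fragment (the stopwords file name is the stock Solr example, not taken from this report):

{code}
<filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt" enablePositionIncrements="false"/>
{code}

Note the trade-off: with position increments disabled, phrase queries can no longer tell that a stop word once sat between two terms, and a full reindex is required for the change to take effect.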
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12871123#action_12871123 ] Peter Wolanin commented on SOLR-1852: - I'm thinking about 1.4 backporting - not sure what's happening with 1.5. Yes, you'd have to re-index if we have to backport to 1.4, but I assume that's only going to affect documents that would currently have broken searches?
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12870624#action_12870624 ] Peter Wolanin commented on SOLR-1852: - Now this has been in trunk longer, do you feel any more confident about a back port?
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12852233#action_12852233 ] Peter Wolanin commented on SOLR-1852: - I'm confused by that comment - I thought this code is already in 1.5/trunk and the issue is backporting to the 1.4 branch? enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Assignee: Robert Muir Attachments: SOLR-1852.patch, SOLR-1852_testcase.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34content-type=text%2Fplainview=copathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: would be great to have support for Identi.ca on the follow block fails to match Identi.ca, but putting the content on its own or in another sentence: Support Identi.ca the search matches. Testing suggests the word for is the problem, and it looks like the bug occurs when a stop word preceeds a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. 
According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
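The mechanism described above can be illustrated with a minimal, self-contained sketch (this models the concept only; it is not Solr or Lucene code, and `analyze`/`phraseMatch` are hypothetical helpers): with enablePositionIncrements=true, a removed stop word still advances the token position, leaving a gap in the index, and a phrase query's terms only match when their relative offsets line up with the positions the index analyzer actually assigned.

```java
import java.util.*;

// Illustrative model (NOT Lucene code) of stop-word position gaps and
// exact-position phrase matching.
public class PositionGapSketch {

    // Lower-case, split on non-word chars, drop stop words; if keepGaps is
    // true (enablePositionIncrements=true) removed words still advance pos.
    static Map<String, List<Integer>> analyze(String text, Set<String> stops, boolean keepGaps) {
        Map<String, List<Integer>> index = new HashMap<>();
        int pos = 0;
        for (String w : text.toLowerCase().split("\\W+")) {
            if (stops.contains(w)) { if (keepGaps) pos++; continue; }
            index.computeIfAbsent(w, k -> new ArrayList<>()).add(pos++);
        }
        return index;
    }

    // Phrase match: every term must occur at start + its recorded offset.
    static boolean phraseMatch(Map<String, List<Integer>> index, String[] terms, int[] offsets) {
        List<Integer> first = index.get(terms[0]);
        if (first == null) return false;
        outer:
        for (int start : first) {
            for (int i = 1; i < terms.length; i++) {
                List<Integer> p = index.get(terms[i]);
                if (p == null || !p.contains(start + offsets[i])) continue outer;
            }
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> stops = Set.of("for");
        // Indexed with gaps: support@0, (gap for "for"), identi@2, ca@3.
        Map<String, List<Integer>> withGaps = analyze("support for identi ca", stops, true);
        // A phrase whose own analysis produced a gap (offsets 0 and 2)
        // matches the gap-indexed doc but not one indexed without gaps.
        String[] phrase = {"support", "identi"};
        System.out.println(phraseMatch(withGaps, phrase, new int[]{0, 2}));   // true
        Map<String, List<Integer>> noGaps = analyze("support for identi ca", stops, false);
        System.out.println(phraseMatch(noGaps, phrase, new int[]{0, 2}));     // false
    }
}
```

The point of the sketch: once the query side and the index side disagree about whether stop words leave positional gaps, exact-position phrase matching silently fails, which is consistent with the symptom that flipping enablePositionIncrements=false and reindexing makes the searches match.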
[jira] Created: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries
enablePositionIncrements=true causes searches to fail when they are parse as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1852: Attachment: SOLR-1852.patch This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. enablePositionIncrements=true causes searches to fail when they are parse as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. 
- You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements=true causes searches to fail when they are parse as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608 ] Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:41 PM: --- This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is treated as phrase query as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as "Identi ca" (i.e. 2 separate tokens, not a phrase). was (Author: pwolanin): This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. enablePositionIncrements=true causes searches to fail when they are parse as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1852: Description: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the whitespace delimiter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory was: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. Our default handler is dismax, but this also fails with the standard handler. 
So I'm wondering if this is a known issue, or am I missing something subtle in the analysis chain? Solr is 1.4.0 that I built. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory Summary: enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries (was: enablePositionIncrements=true causes searches to fail when they are parse as phrase queries) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the whitespace delimiter. Setting enablePositionIncrements=false in the stop
[jira] Updated: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1852: Description: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory was: Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the whitespace delimiter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false
[jira] Issue Comment Edited: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850608#action_12850608 ] Peter Wolanin edited comment on SOLR-1852 at 3/27/10 11:52 PM: --- This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated WordDelimiterFilter if I reindex the bug seems to be fixed. In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is treated as phrase query as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as "Identi ca" (i.e. 2 separate tokens, not a phrase). was (Author: pwolanin): This patch was created by Mark Miller - it's a back port of Solr trunk code plus a tweak to let 1.4 compile With this updated Whitespace Delimiter if I reindex the bug seems to be fixed. In terms of the bug's symptoms to reproduce it, it looks as though Identi.ca is treated as phrase query as if I had quoted it like "Identi ca". That phrase search also fails. I had expected that Identi.ca would be the same as "Identi ca" (i.e. 2 separate tokens, not a phrase). enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. 
test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1852) enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries
[ https://issues.apache.org/jira/browse/SOLR-1852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12850610#action_12850610 ] Peter Wolanin commented on SOLR-1852: - The changes in the patch originate at SOLR-1706 and SOLR-1657, however I don't think it's actually the same bug as SOLR-1706 intended to fix, since in the admin analyzer interface the generated tokens look correct. enablePositionIncrements=true can cause searches to fail when they are parsed as phrase queries - Key: SOLR-1852 URL: https://issues.apache.org/jira/browse/SOLR-1852 Project: Solr Issue Type: Bug Affects Versions: 1.4 Reporter: Peter Wolanin Attachments: SOLR-1852.patch Symptom: searching for a string like a domain name containing a '.', the Solr 1.4 analyzer tells me that I will get a match, but when I enter the search either in the client or directly in Solr, the search fails. test string: Identi.ca queries that fail: IdentiCa, Identi.ca, Identi-ca query that matches: Identi ca schema in use is: http://drupalcode.org/viewvc/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.34&content-type=text%2Fplain&view=co&pathrev=DRUPAL-6--1 Screen shots: analysis: http://img.skitch.com/20100327-nt1uc1ctykgny28n8bgu99h923.png dismax search: http://img.skitch.com/20100327-byiduuiry78caka7q5smsw7fp.png dismax search: http://img.skitch.com/20100327-gckm8uhjx3t7px31ygfqc2ugdq.png standard search: http://img.skitch.com/20100327-usqyqju1d12ymcpb2cfbtdwyh.png Whether or not the bug appears is determined by the surrounding text: "would be great to have support for Identi.ca on the follow block" fails to match Identi.ca, but putting the content on its own or in another sentence: "Support Identi.ca" the search matches. Testing suggests the word "for" is the problem, and it looks like the bug occurs when a stop word precedes a word that is split up using the word delimiter filter. 
Setting enablePositionIncrements=false in the stop filter and reindexing causes the searches to match. According to Mark Miller in #solr, this bug appears to be fixed already in Solr trunk, either due to the upgraded lucene or changes to the WordDelimiterFactory -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1553) extended dismax query parser
[ https://issues.apache.org/jira/browse/SOLR-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12805303#action_12805303 ] Peter Wolanin commented on SOLR-1553: - some commented out debug code left in the committed parser?
{code}
protected void addClause(List clauses, int conj, int mods, Query q) {
  //System.out.println("addClause:clauses="+clauses+" conj="+conj+" mods="+mods+" q="+q);
  super.addClause(clauses, conj, mods, q);
}
{code}
extended dismax query parser Key: SOLR-1553 URL: https://issues.apache.org/jira/browse/SOLR-1553 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 1.5 Attachments: SOLR-1553.patch, SOLR-1553.pf-refactor.patch An improved user-facing query parser based on dismax -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (TIKA-338) Trying to use -encoding parameter always results in an exception
[ https://issues.apache.org/jira/browse/TIKA-338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin closed TIKA-338. -- Resolution: Invalid Trying to use -encoding parameter always results in an exception Key: TIKA-338 URL: https://issues.apache.org/jira/browse/TIKA-338 Project: Tika Issue Type: Bug Components: cli Reporter: Peter Wolanin Fix For: 0.6 Original Estimate: 1h Remaining Estimate: 1h There is a logical error in the CLI code - -encoding can never work and always results in an exception $ java -jar tika-app/target/tika-app-0.6-SNAPSHOT.jar -encoding=UTF-8 -t test.txt Exception in thread "main" java.io.UnsupportedEncodingException: ncoding=UTF-8 at sun.nio.cs.StreamEncoder.forOutputStreamWriter(StreamEncoder.java:42) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
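The bogus charset name in the stack trace ("ncoding=UTF-8", i.e. the argument minus its first two characters) hints at the kind of prefix-stripping slip involved. The sketch below is a hypothetical reconstruction, not Tika's actual code; `buggyParse` and `fixedParse` are made-up helpers that only illustrate the off-by-prefix behavior:

```java
// Hypothetical reconstruction (NOT Tika's code) of an argument-parsing
// slip that turns "-encoding=UTF-8" into the charset name "ncoding=UTF-8":
// the code strips a 2-character "-e" prefix where it should strip the
// whole "-encoding=" prefix before handing the rest to a writer.
public class EncodingArgBug {
    static String buggyParse(String arg) {
        return arg.substring(2);                          // drops only "-e"
    }

    static String fixedParse(String arg) {
        return arg.substring("-encoding=".length());      // drops "-encoding="
    }

    public static void main(String[] args) {
        System.out.println(buggyParse("-encoding=UTF-8")); // ncoding=UTF-8
        System.out.println(fixedParse("-encoding=UTF-8")); // UTF-8
    }
}
```

Passing the buggy result to an `OutputStreamWriter` would throw exactly the `UnsupportedEncodingException: ncoding=UTF-8` shown in the report.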
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324: --- Attachment: TIKA-324-README.patch Here's a little follow-up patch for the README file to document this further. Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Assignee: Jukka Zitting Priority: Critical Fix For: 0.6 Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324-README.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778926#action_12778926 ] Peter Wolanin commented on TIKA-324: In fact for tika 0.4 it looks like it works already to pass this option to java: -Dfile.encoding=UTF8 $ java -Dfile.encoding=UTF8 -jar orig-tika-app-0.4.jar -t ./test.txt Iñtërnâtiônàlizætiøn Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
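The -Dfile.encoding workaround works because `System.out` encodes characters with the JVM's default charset. A program can sidestep the platform default entirely by giving its output stream an explicit charset. A minimal sketch of that idea (generic Java, not Tika's code; the `encodeUtf8` helper is mine, for demonstration):

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Sketch: make console output independent of the platform default charset
// (the thing -Dfile.encoding and $LANG change) by constructing the
// PrintStream with an explicit encoding instead of using System.out as-is.
public class Utf8Out {
    // Encode a string through a PrintStream pinned to UTF-8, capturing bytes.
    static byte[] encodeUtf8(String s) throws UnsupportedEncodingException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintStream p = new PrintStream(buf, true, "UTF-8"); // explicit charset
        p.print(s);
        return buf.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // "ñ" is two bytes in UTF-8; with an ASCII-ish default charset it
        // would degrade to '?' - the mangling seen in the bug report.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n");
    }
}
```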
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778928#action_12778928 ] Peter Wolanin commented on TIKA-324: Also, this is not a Mac-only problem - I have the same issue, for example, on CentOS using java version 1.6.0_04 [r...@i:~] java -jar tika-app-0.4.jar -t test.txt I?t?rn?ti?n?liz?ti?n Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X)
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778951#action_12778951 ] Peter Wolanin commented on TIKA-324: on Mac OS 10.5 it looks correct: $ echo $LANG en_US.UTF-8 on CentOS 5, no value is set: echo $LANG If I set that value on CentOS (to the same as my Mac) then output is correct: [r...@i:~] export LANG=en_US.UTF-8 [r...@i:~] java -jar tika-app-0.4.jar -t test.txt Iñtërnâtiônàlizætiøn Tika CLI mangles utf-8 content in text (-t) mode (on Mac OS X) -- Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4, 0.5 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Attachments: test.txt, TIKA-324-0.5.patch, TIKA-324-macosx.patch, TIKA-324.patch, TIKA-324.patch Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example: $ java -jar tika-app-0.4.jar -t ./test.txt I?t?rn?ti?n?liz?ti?n $ java -jar tika-app-0.4.jar -x ./test.txt <?xml version="1.0" encoding="UTF-8"?> <html xmlns="http://www.w3.org/1999/xhtml"> <head> <title/> </head> <body> <p>Iñtërnâtiônàlizætiøn</p> </body> </html> see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
Tika CLI mangles utf-8 content in text (-t) mode Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.4, 0.3 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Fix For: 0.5 Attachments: test.txt When using the -t flag to tika, multi-byte content is destroyed in the output. Example:
{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title/> </head>
<body> <p>Iñtërnâtiônàlizætiøn</p> </body>
</html>
{code}
see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324: --- Attachment: test.txt Attaching a little test text file. Tika CLI mangles utf-8 content in text (-t) mode Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Fix For: 0.5 Attachments: test.txt Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example:
{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title/> </head>
<body> <p>Iñtërnâtiônàlizætiøn</p> </body>
</html>
{code}
see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12778133#action_12778133 ] Peter Wolanin commented on TIKA-324: Examining the TikaCLI.java code, the xhtml versus text output is handled very differently. I'm not sure why the text one fails, but it seems to be easily rectified by applying the transformer using "text" as the method. Tika CLI mangles utf-8 content in text (-t) mode Key: TIKA-324 URL: https://issues.apache.org/jira/browse/TIKA-324 Project: Tika Issue Type: Bug Components: cli Affects Versions: 0.3, 0.4 Environment: Mac OS 10.5, java version 1.6.0_15 Reporter: Peter Wolanin Priority: Critical Fix For: 0.5 Attachments: test.txt Original Estimate: 2h Remaining Estimate: 2h When using the -t flag to tika, multi-byte content is destroyed in the output. Example:
{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head> <title/> </head>
<body> <p>Iñtërnâtiônàlizætiøn</p> </body>
</html>
{code}
see also: http://drupal.org/node/622508#comment-2267918 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
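The approach the comment describes (serializing the text output through the same transformer machinery that the -x path uses, with "text" as the output method) can be sketched with the standard JAXP identity transformer. This is illustrative of the idea only, not the literal TikaCLI patch, and `toText` is a hypothetical helper:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch: route plain-text extraction through a JAXP identity Transformer
// with method="text" and an explicit UTF-8 encoding, mirroring how the
// XHTML (-x) path serializes. (Illustrative; not the actual TikaCLI fix.)
public class TextSerializerSketch {
    static String toText(String xml) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer(); // identity
        t.setOutputProperty(OutputKeys.METHOD, "text");     // emit character data only
        t.setOutputProperty(OutputKeys.ENCODING, "UTF-8");  // fixed, not platform default
        StringWriter out = new StringWriter();
        t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Strips markup, keeps the multi-byte text intact.
        System.out.println(toText("<p>I\u00f1t\u00ebrn\u00e2ti\u00f4n\u00e0liz\u00e6ti\u00f8n</p>"));
    }
}
```

The design point: a serializer with a pinned output encoding cannot be broken by whatever default charset the JVM happens to start with, which is exactly the failure mode the -t path exhibited.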
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324:
---

Attachment: TIKA-324.patch

Attached is a patch against Tika 0.4. It resolves the bug for me, at least for the simple test case.

{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
Iñtërnâtiônàlizætiøn

$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title/></head>
<body><p>Iñtërnâtiônàlizætiøn</p></body>
</html>
{code}
[jira] Issue Comment Edited: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778134#action_12778134 ] Peter Wolanin edited comment on TIKA-324 at 11/15/09 6:01 PM:
--

Attached is a patch against Tika 0.4. It resolves the bug for me, at least for the simple test case.

{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
Iñtërnâtiônàlizætiøn

$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title/></head>
<body><p>Iñtërnâtiônàlizætiøn</p></body>
</html>
{code}
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778135#action_12778135 ] Peter Wolanin commented on TIKA-324:

Note: the test string's origin is http://intertwingly.net/stories/2004/04/14/i18n.html
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324:
---

Description:

When using the -t flag to tika, multi-byte content is destroyed in the output. Example:

{code}
$ java -jar tika-app-0.4.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n

$ java -jar tika-app-0.4.jar -x ./test.txt
<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head><title/></head>
<body><p>Iñtërnâtiônàlizætiøn</p></body>
</html>
{code}

see also: http://drupal.org/node/622508#comment-2267918

The bug is confirmed as present in 0.3 also.
[jira] Issue Comment Edited: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778140#action_12778140 ] Peter Wolanin edited comment on TIKA-324 at 11/15/09 6:20 PM:
--

The bug is confirmed as present in 0.3 also.

{code}
$ java -jar tika-0.3.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
{code}
[jira] Commented: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12778148#action_12778148 ] Peter Wolanin commented on TIKA-324:

The bug is still present in trunk (and in the code tagged for 0.5):

{code}
$ java -jar tika-app/target/tika-app-0.6-SNAPSHOT.jar -t ./test.txt
I?t?rn?ti?n?liz?ti?n
{code}
[jira] Updated: (TIKA-324) Tika CLI mangles utf-8 content in text (-t) mode
[ https://issues.apache.org/jira/browse/TIKA-324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated TIKA-324:
---

Attachment: TIKA-324.patch
            TIKA-324-0.5.patch

Here is a patch for tika 0.5/trunk that resolves the bug (a 1-line change) and a revised patch for 0.4 that sets indent to true for consistency. For a quick test PDF, look at http://nlp.stanford.edu/IR-book/pdf/00front.pdf. Without the patch, math symbols like ω and ωk are obliterated.
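The "?" output in the examples above is the classic symptom of text being pushed through a writer that cannot represent the characters, e.g. one built with the platform default encoding. A small sketch (not Tika code; the class and method names are my own) showing how the charset chosen for the writer decides whether multi-byte characters survive:

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class Utf8WriterSketch {
    // Encodes a string through a Writer using the named charset and returns
    // the raw bytes, mimicking a CLI tool writing extracted text to stdout.
    public static byte[] encode(String s, String charset) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Writer out = new OutputStreamWriter(buf, charset);
        out.write(s);
        out.flush();
        return buf.toByteArray();
    }
}
```

Encoding "Iñtërnâtiônàlizætiøn" with "UTF-8" round-trips cleanly, while an encoding such as "US-ASCII" replaces every unmappable character with "?", which is exactly what the unpatched -t output shows.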
[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12771932#action_12771932 ] Peter Wolanin commented on SOLR-874:

Does anyone have an approach for this bug, so we can get it fixed before 1.4 is done?

Dismax parser exceptions on trailing OPERATOR

Key: SOLR-874
URL: https://issues.apache.org/jira/browse/SOLR-874
Project: Solr
Issue Type: Bug
Components: search
Affects Versions: 1.3
Reporter: Erik Hatcher
Attachments: SOLR-874.patch

Dismax is supposed to be immune to parse exceptions, but alas it's not:

http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND

kaboom!

{code}
Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod AND': Encountered EOF at line 1, column 8.
Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ...
	at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
	at org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
	at org.apache.solr.search.QParser.getQuery(QParser.java:88)
{code}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752468#action_12752468 ] Peter Wolanin commented on SOLR-1400:

These lines seem to vary as to whether there is whitespace between char and the []:

{code}
@@ -29,29 +30,48 @@ public class TestTrimFilter extends BaseTokenTestCase {
 public void testTrim() throws Exception {
+    char[] a = " a ".toCharArray();
+    char [] b = "b   ".toCharArray();
+    char [] ccc = "cCc".toCharArray();
+    char[] whitespace = "   ".toCharArray();
+    char[] empty = "".toCharArray();
{code}

Document with empty or white-space only string causes exception with TrimFilter

Key: SOLR-1400
URL: https://issues.apache.org/jira/browse/SOLR-1400
Project: Solr
Issue Type: Bug
Components: update
Affects Versions: 1.4
Reporter: Peter Wolanin
Assignee: Grant Ingersoll
Fix For: 1.4
Attachments: SOLR-1400.patch, trim-example.xml

Observed with Solr trunk. Posting any empty or whitespace-only string to a field using the
{code}
<filter class="solr.TrimFilterFactory"/>
{code}
causes a java exception:

{code}
Sep 1, 2009 4:58:09 PM org.apache.solr.common.SolrException log
SEVERE: java.lang.ArrayIndexOutOfBoundsException: -1
	at org.apache.solr.analysis.TrimFilter.incrementToken(TrimFilter.java:63)
	at org.apache.solr.analysis.PatternReplaceFilter.incrementToken(PatternReplaceFilter.java:74)
	at org.apache.lucene.index.DocInverterPerField.processFields(DocInverterPerField.java:138)
	at org.apache.lucene.index.DocFieldProcessorPerThread.processDocument(DocFieldProcessorPerThread.java:244)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:772)
	at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:755)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2611)
	at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:2583)
	at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:241)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
{code}

Trim of an empty or WS-only string should not fail.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
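The ArrayIndexOutOfBoundsException above comes from trimming a token that is empty or whitespace-only. A rough sketch (not Solr's TrimFilter, just the underlying guard) of a trim over a char buffer that handles that case instead of indexing past the ends:

```java
public class TrimSketch {
    // Trims leading/trailing whitespace from buf[start..start+len), guarding
    // the empty / whitespace-only case that triggered the exception: the two
    // scans simply meet and yield an empty result instead of a bad index.
    public static String trim(char[] buf, int start, int len) {
        int s = start, e = start + len;
        while (s < e && Character.isWhitespace(buf[s])) s++;
        while (e > s && Character.isWhitespace(buf[e - 1])) e--;
        return new String(buf, s, e - s);  // "" for empty or all-whitespace input
    }
}
```

With this guard, a whitespace-only field value trims to the empty string rather than failing the whole document add.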
[jira] Commented: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12752245#action_12752245 ] Peter Wolanin commented on SOLR-1400:

The patch seems to fix the bug for me, but there seems to be some code style inconsistency in the test code.
[jira] Commented: (SOLR-756) Make DisjunctionMaxQueryParser generally useful by supporting all query types.
[ https://issues.apache.org/jira/browse/SOLR-756?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12751038#action_12751038 ] Peter Wolanin commented on SOLR-756:

We are regularly hitting this wall, and users are very frustrated by not being able to use wildcards, because we wanted the other advantages of the dismax parser. Any chance to get some of these changes into 1.4?

Make DisjunctionMaxQueryParser generally useful by supporting all query types.

Key: SOLR-756
URL: https://issues.apache.org/jira/browse/SOLR-756
Project: Solr
Issue Type: Improvement
Affects Versions: 1.3
Reporter: David Smiley
Fix For: 1.5
Attachments: SolrPluginUtilsDisMax.patch

This is an enhancement to the DisjunctionMaxQueryParser to work on all the query variants, such as wildcard, prefix, and fuzzy queries, and to support working in AND scenarios that are not processed by the min-should-match DisMax QParser. This was not in Solr already because DisMax was only used for a very limited syntax that didn't use those features. In my opinion, this makes a more suitable base parser for general use because, unlike the Lucene/Solr parser, this one supports multiple default fields, whereas other ones (say Yonik's {!prefix} one, for example) can't do dismax. The notion of a single default field is antiquated and a technical under-the-hood detail of Lucene that I think Solr should shield the user from by on-the-fly using a DisMax when multiple fields are used. (patch to be attached soon)

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1400) Document with empty or white-space only string causes exception with TrimFilter
[ https://issues.apache.org/jira/browse/SOLR-1400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1400:
---

Attachment: trim-example.xml

Post the attached document using the trunk sample schema.xml to reproduce.
[jira] Updated: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1274:
---

Attachment: SOLR-1274.patch

Here's a patch that's nearly there, but somehow I'm missing something in how java behaves. The param is getting picked up, but this line never evals as true, even when the param is parsed right:

{code}
if (extractFormat == "text") {
{code}

If I set it to

{code}
if (true) {
{code}

I get the desired text-only output.

Provide multiple output formats in extract-only mode for tika handler

Key: SOLR-1274
URL: https://issues.apache.org/jira/browse/SOLR-1274
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
Fix For: 1.4
Attachments: SOLR-1274.patch

The proposed feature is to accept a URL parameter when using extract-only mode to specify an output format. This parameter might just overload the existing ext.extract.only so that one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the same response (i.e. xml remains the default). I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode, as if from the CLI with the tika jar:

{code}
-x or --xml       Output XHTML content (default)
-h or --html      Output HTML content
-t or --text      Output plain text content
-m or --metadata  Output only metadata
{code}

However, looking at the docs and source, it seems that only the xml option is available (hard-coded) in ExtractingDocumentLoader.java:

{code}
serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
{code}

Providing at least a plain-text response seems to work if you change the serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
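The reason the comparison above never evaluates as true is most likely that == on Strings compares object references, not contents. A request-parameter value is a String constructed at runtime, so it is a different object from the literal "text" even when the characters match; .equals() is what compares contents. A minimal demonstration (illustrative names only):

```java
public class StringCompareSketch {
    public static void main(String[] args) {
        // A parameter value parsed from a request is built at runtime, so it is
        // a different object from the compile-time literal "text" even when the
        // contents match. new String(...) forces a distinct object here.
        String extractFormat = new String("text");
        System.out.println(extractFormat == "text");       // reference comparison: false
        System.out.println("text".equals(extractFormat));  // content comparison: true
    }
}
```

Writing the literal first ("text".equals(extractFormat)) also avoids a NullPointerException when the parameter is absent.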
[jira] Updated: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1274:
---

Attachment: SOLR-1274.patch

Well, indeed - something like that works better.
[jira] Commented: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
[ https://issues.apache.org/jira/browse/SOLR-1274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12731437#action_12731437 ] Peter Wolanin commented on SOLR-1274:

A minimal version of this would be pretty trivial as far as features go, and I'd thought Yonik was indicating on the e-mail list that it would be a reasonable follow-on to his last patch in the linked issue.
[jira] Updated: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-874:
---

Attachment: SOLR-874.patch

Here's a simple patch that escapes with a \. It prevents the exception; however, this fails to match and/or/not (after removing those from the stopwords file), so it's clearly not quite right.
[jira] Created: (SOLR-1274) Provide multiple output formats in extract-only mode for tika handler
Provide multiple output formats in extract-only mode for tika handler

Key: SOLR-1274
URL: https://issues.apache.org/jira/browse/SOLR-1274
Project: Solr
Issue Type: New Feature
Affects Versions: 1.4
Reporter: Peter Wolanin
Priority: Minor
Fix For: 1.4

The proposed feature is to accept a URL parameter when using extract-only mode to specify an output format. This parameter might just overload the existing ext.extract.only so that one can optionally specify a format, e.g. false|true|xml|text, where true and xml give the same response (i.e. xml remains the default). I had been assuming that I could choose among possible tika output formats when using the extracting request handler in extract-only mode, as if from the CLI with the tika jar:

{code}
-x or --xml       Output XHTML content (default)
-h or --html      Output HTML content
-t or --text      Output plain text content
-m or --metadata  Output only metadata
{code}

However, looking at the docs and source, it seems that only the xml option is available (hard-coded) in ExtractingDocumentLoader.java:

{code}
serializer = new XMLSerializer(writer, new OutputFormat("XML", "UTF-8", true));
{code}

Providing at least a plain-text response seems to work if you change the serializer to a TextSerializer (org.apache.xml.serialize.TextSerializer).
[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12730492#action_12730492 ] Peter Wolanin commented on SOLR-874:

I get the same sort of exception with a *leading* operator and the dismax handler.

{code}
Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR vti OR aut OR author OR dll': Encountered OR OR at line 1, column 0.
Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ... TERM ... * ...
	at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
{code}
[jira] Commented: (SOLR-874) Dismax parser exceptions on trailing OPERATOR
[ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730513#action_12730513 ] Peter Wolanin commented on SOLR-874: possibly a fix could be rolled into this existing method in SolrPluginUtils.java ? {code}
/**
 * Strips operators that are used illegally, otherwise returns its
 * input. Some examples of illegal user queries are: "chocolate +-
 * chip", "chocolate - - chip", and "chocolate chip -".
 */
public static CharSequence stripIllegalOperators(CharSequence s) {
  String temp = CONSECUTIVE_OP_PATTERN.matcher( s ).replaceAll(" ");
  return DANGLING_OP_PATTERN.matcher( temp ).replaceAll(" ");
}
{code} This seems only to be called from: org/apache/solr/search/DisMaxQParser.java:156: userQuery = SolrPluginUtils.stripIllegalOperators(userQuery).toString(); Dismax parser exceptions on trailing OPERATOR - Key: SOLR-874 URL: https://issues.apache.org/jira/browse/SOLR-874 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Erik Hatcher Dismax is supposed to be immune to parse exceptions, but alas it's not: http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND kaboom! Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod AND': Encountered "<EOF>" at line 1, column 8. Was expecting one of: <NOT> ... "+" ... "-" ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... "[" ... "{" ... <NUMBER> ... <TERM> ... "*" ... at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175) at org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138) at org.apache.solr.search.QParser.getQuery(QParser.java:88) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
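A self-contained sketch of the stripping the comment above points at. The regexes here are my own guesses covering only AND/OR/NOT (the real CONSECUTIVE_OP_PATTERN and DANGLING_OP_PATTERN in SolrPluginUtils are not shown in the comment and also handle +/-), so treat this as an illustration of the approach, not the actual patch.

```java
import java.util.regex.Pattern;

public class QueryCleaner {
    // Hypothetical patterns; the real ones in SolrPluginUtils may differ.
    // Matches runs of two or more AND/OR/NOT operators in a row.
    private static final Pattern CONSECUTIVE_OP_PATTERN =
        Pattern.compile("\\b(?:AND|OR|NOT)(?:\\s+(?:AND|OR|NOT))+\\b");
    // Matches an operator dangling at the start or end of the query.
    private static final Pattern DANGLING_OP_PATTERN =
        Pattern.compile("^\\s*(?:AND|OR|NOT)\\b|\\b(?:AND|OR|NOT)\\s*$");

    public static CharSequence stripIllegalOperators(CharSequence s) {
        String temp = CONSECUTIVE_OP_PATTERN.matcher(s).replaceAll(" ");
        return DANGLING_OP_PATTERN.matcher(temp).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(stripIllegalOperators("ipod AND"));      // trailing operator
        System.out.println(stripIllegalOperators("OR vti OR bin")); // leading operator
    }
}
```

With something like this applied before parsing, both the trailing-operator query from the issue description and the leading-operator query from the comment would reach the Lucene parser as plain terms.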
[jira] Commented: (SOLR-1200) NullPointerException when unloading an absent core
[ https://issues.apache.org/jira/browse/SOLR-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12716252#action_12716252 ] Peter Wolanin commented on SOLR-1200: - Do we need to open another issue (maybe for 1.5) - I'd think the expected behavior would be to throw a specific exception anywhere in core admin that a core is not found, and then catch it and return a 404? At the moment, however, you can request status for a non-existent core, etc, and get a 200 with some data, so this patch makes the behavior consistent, at least. NullPointerException when unloading an absent core -- Key: SOLR-1200 URL: https://issues.apache.org/jira/browse/SOLR-1200 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: java version 1.6.0_07 Reporter: Peter Wolanin Assignee: Noble Paul Priority: Minor Fix For: 1.4 Attachments: SOLR-1200.patch, SOLR-1200.patch Original Estimate: 1h Remaining Estimate: 1h When I try to unload a core that does not exist (e.g. it has already been unloaded), Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-1200) NullPointerException when unloading an absent core
NullPointerException when unloading an absent core -- Key: SOLR-1200 URL: https://issues.apache.org/jira/browse/SOLR-1200 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: java version 1.6.0_07 Reporter: Peter Wolanin Priority: Minor When I try to unload a core that does not exist (e.g. it has already been unloaded), Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1200) NullPointerException when unloading an absent core
[ https://issues.apache.org/jira/browse/SOLR-1200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1200: Attachment: SOLR-1200.patch Here's a simple patch that follows the pattern in the other core admin methods. NullPointerException when unloading an absent core -- Key: SOLR-1200 URL: https://issues.apache.org/jira/browse/SOLR-1200 Project: Solr Issue Type: Bug Affects Versions: 1.4 Environment: java version 1.6.0_07 Reporter: Peter Wolanin Priority: Minor Attachments: SOLR-1200.patch Original Estimate: 1h Remaining Estimate: 1h When I try to unload a core that does not exist (e.g. it has already been unloaded), Solr throws a NullPointerException java.lang.NullPointerException at org.apache.solr.handler.admin.CoreAdminHandler.handleUnloadAction(CoreAdminHandler.java:319) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:125) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:301) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:174) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) ... -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
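The "pattern in the other core admin methods" that the patch follows can be illustrated with a toy registry: look the core up first and fail with an explicit error instead of dereferencing null. `CoreRegistry` is a hypothetical stand-in for Solr's CoreContainer, and the error message is invented.

```java
import java.util.HashMap;
import java.util.Map;

public class CoreRegistry {
    private final Map<String, Object> cores = new HashMap<>();

    public void register(String name, Object core) {
        cores.put(name, core);
    }

    public Object unload(String name) {
        Object core = cores.remove(name);
        if (core == null) {
            // Without this guard, a later core.close() call would throw
            // the NullPointerException described in the issue.
            throw new IllegalArgumentException("No such core exists '" + name + "'");
        }
        return core;
    }
}
```

The point of the guard is that the caller gets a meaningful error (which the handler can turn into a proper HTTP error response) rather than a bare NullPointerException and a 500.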
[jira] Updated: (SOLR-1183) Example script not update for new analysis path from SOLR-1099
[ https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1183: Attachment: SOLR-1183.patch Example script not update for new analysis path from SOLR-1099 -- Key: SOLR-1183 URL: https://issues.apache.org/jira/browse/SOLR-1183 Project: Solr Issue Type: Bug Components: Analysis Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1183.patch The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1183) Example script not updated for new analysis path from SOLR-1099
[ https://issues.apache.org/jira/browse/SOLR-1183?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1183: Description: The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document was: The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document Summary: Example script not updated for new analysis path from SOLR-1099 (was: Example script not update for new analysis path from SOLR-1099) Example script not updated for new analysis path from SOLR-1099 --- Key: SOLR-1183 URL: https://issues.apache.org/jira/browse/SOLR-1183 Project: Solr Issue Type: Bug Components: Analysis Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1183.patch The example script example/exampleAnalysis/post.sh attempts to post to the path http://localhost:8983/solr/analysis however, SOLR-1099 changed the solrconfig.xml, so that path is disabled by default as of r767412 A simple fix is to change to http://localhost:8983/solr/analysis/document -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1167) Support module xml config files using XInclude
[ https://issues.apache.org/jira/browse/SOLR-1167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12710200#action_12710200 ] Peter Wolanin commented on SOLR-1167: - I think you posted a sample snippet for solrconfig to the list - can you repost here and possibly include in the patch a change to the sample schema or solrconfig that would demonstrate this feature? Support module xml config files using XInclude -- Key: SOLR-1167 URL: https://issues.apache.org/jira/browse/SOLR-1167 Project: Solr Issue Type: New Feature Reporter: Bryan Talbot Priority: Minor Attachments: SOLR-1167.patch Current configuration files (schema and solrconfig) are monolithic, which can make maintenance and reuse more difficult than it needs to be. The XML standards include a feature to include content from external files. This is described at http://www.w3.org/TR/xinclude/ This feature is to add support for XInclude features for XML configuration files. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
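For reference, minimal XInclude usage in a solrconfig.xml might look like the following; this is a generic illustration of the W3C XInclude mechanism, the included file name handlers.xml is made up, and the exact form supported by the patch may differ.

{code:xml}
<?xml version="1.0"?>
<config xmlns:xi="http://www.w3.org/2001/XInclude">
  <!-- Pull a shared request-handler definition in from a separate file. -->
  <xi:include href="handlers.xml">
    <!-- Optional: what to use if handlers.xml cannot be loaded. -->
    <xi:fallback/>
  </xi:include>
</config>
{code}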
[jira] Updated: (SOLR-1151) Document the new CopyField maxChars property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1151: Description: In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. (was: In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users.) Summary: Document the new CopyField maxChars property in the example schema.xml (was: Document the new CopyField maxLength property in the example schema.xml) Document the new CopyField maxChars property in the example schema.xml -- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1151) Document the new CopyField maxChars property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1151: Attachment: SOLR-1151.patch revised patch to use maxChars - still not sure if this is a useful example, but at least adds some documentation of this property. Document the new CopyField maxChars property in the example schema.xml -- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch, SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
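The property being documented can be shown with a one-line schema.xml snippet; the field names body and teaser here are illustrative, not from the example schema.

{code:xml}
<!-- Copy at most the first 300 characters of body into teaser;
     without maxChars the whole field value is copied. -->
<copyField source="body" dest="teaser" maxChars="300"/>
{code}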
[jira] Created: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml
Document the new CopyField maxLength property in the example schema.xml --- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated SOLR-1151: Attachment: SOLR-1151.patch 1st pass Document the new CopyField maxLength property in the example schema.xml --- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-1151) Document the new CopyField maxLength property in the example schema.xml
[ https://issues.apache.org/jira/browse/SOLR-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12707211#action_12707211 ] Peter Wolanin commented on SOLR-1151: - needs work - the final format is maxChars NOT maxLength Document the new CopyField maxLength property in the example schema.xml --- Key: SOLR-1151 URL: https://issues.apache.org/jira/browse/SOLR-1151 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 1.4 Reporter: Peter Wolanin Priority: Minor Fix For: 1.4 Attachments: SOLR-1151.patch Original Estimate: 1h Remaining Estimate: 1h In this issue: http://issues.apache.org/jira/browse/SOLR-538 a maxLength property was added to the copyField directive. However, this is not documented in the example schema to make the feature known to users. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-341) PHP Solr Client
[ https://issues.apache.org/jira/browse/SOLR-341?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12681893#action_12681893 ] Peter Wolanin commented on SOLR-341: r6 has been bundled into a release: http://code.google.com/p/solr-php-client/downloads/list We'll test this with the Drupal module soon, but it is likely to work fine. PHP Solr Client --- Key: SOLR-341 URL: https://issues.apache.org/jira/browse/SOLR-341 Project: Solr Issue Type: New Feature Components: clients - php Affects Versions: 1.2 Environment: PHP >= 5.2.0 (or older with JSON PECL extension or other json_decode function implementation). Solr >= 1.2 Reporter: Donovan Jimenez Priority: Trivial Fix For: 1.5 Attachments: SolrPhpClient.2008-09-02.zip, SolrPhpClient.2008-11-14.zip, SolrPhpClient.2008-11-25.zip, SolrPhpClient.zip Developed this client when the example PHP source didn't meet our needs. The company I work for agreed to release it under the terms of the Apache License. This version is slightly different from what I originally linked to on the dev mailing list. I've incorporated feedback from Yonik and hossman to simplify the client and only accept one response format (JSON currently). When Solr 1.3 is released the client can be updated to use the PHP or Serialized PHP response writer. 
example usage from my original mailing list post: {code}
<?php
require_once('Solr/Service.php');

$start = microtime(true);

$solr = new Solr_Service(); // Or explicitly new Solr_Service('localhost', 8180, '/solr');

try {
  $response = $solr->search('solr', 0, 10, array(/* you can include other parameters here */));

  echo 'search returned with status = ', $response->responseHeader->status,
    ' and took ', microtime(true) - $start, ' seconds', "\n";

  // here's how you would access results
  // Notice that I've mapped the values by name into a tree of stdClass objects
  // and arrays (actually, most of this is done by json_decode)
  if ($response->response->numFound > 0) {
    $doc_number = $response->response->start;

    foreach ($response->response->docs as $doc) {
      $doc_number++;
      echo $doc_number, ': ', $doc->text, "\n";
    }
  }

  // for the purposes of seeing the available structure of the response
  // NOTE: Solr_Response::_parsedData is lazy loaded, so a print_r on the response before
  // any values are accessed may result in different behavior (in case
  // anyone has some troubles debugging)
  //print_r($response);
} catch (Exception $e) {
  echo $e->getMessage(), "\n";
}
?>
{code} -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12680288#action_12680288 ] Peter Wolanin commented on SOLR-196: This serialized writer produces output that is inconsistent with the other PHP writer adn inconsistent with the JSON A PHP response writer for Solr -- Key: SOLR-196 URL: https://issues.apache.org/jira/browse/SOLR-196 Project: Solr Issue Type: New Feature Components: clients - php, search Reporter: Paul Borgermans Fix For: 1.3 Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP4.x installs, where there is no built in support for JSON. This issue attempts to address this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12680288#action_12680288 ] Peter Wolanin edited comment on SOLR-196 at 3/9/09 2:33 PM: This serialized writer produces output that is inconsistent with the other PHP writer and inconsistent with the JSON. was (Author: pwolanin): This serialized writer produces output that is inconsistent with the other PHP writer adn inconsistent with the JSON A PHP response writer for Solr -- Key: SOLR-196 URL: https://issues.apache.org/jira/browse/SOLR-196 Project: Solr Issue Type: New Feature Components: clients - php, search Reporter: Paul Borgermans Fix For: 1.3 Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP4.x installs, where there is no built in support for JSON. This issue attempts to address this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Issue Comment Edited: (SOLR-196) A PHP response writer for Solr
[ https://issues.apache.org/jira/browse/SOLR-196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680288#action_12680288 ] Peter Wolanin edited comment on SOLR-196 at 3/9/09 4:39 PM: This PHP writer is inconsistent with the JSON: if you use PHP 5's json_decode, maps come back as objects. was (Author: pwolanin): This serialized writer produces output that is inconsistent with the other PHP writer and inconsistent with the JSON. A PHP response writer for Solr -- Key: SOLR-196 URL: https://issues.apache.org/jira/browse/SOLR-196 Project: Solr Issue Type: New Feature Components: clients - php, search Reporter: Paul Borgermans Fix For: 1.3 Attachments: SOLR-192-php-responsewriter.patch, SOLR-196-PHPResponseWriter.patch It would be useful to have a PHP response writer that returns an array to be eval-ed directly. This is especially true for PHP4.x installs, where there is no built in support for JSON. This issue attempts to address this. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677517#action_12677517 ] Peter Wolanin commented on LUCENE-1500: --- Well, this patch does not (obviously) solve the real bug. Is it possible to combine #1 and #3, but possibly revert #3 later when we solve the real bug in the highlighter code? Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500 Project: Lucene - Java Issue Type: Bug Components: contrib/highlighter Affects Versions: 2.4 Environment: Found this running the example code in Solr (latest version). Reporter: David Bowen Assignee: Michael McCandless Fix For: 2.4.1, 2.9 Attachments: LUCENE-1500.patch, patch.txt Using the canonical Solr example (ant run-example) I added this document (using exampledocs/post.sh): <add><doc> <field name="id">Test for Highlighting StringIndexOutOfBoundsExcdption</field> <field name="name">Some Name</field> <field name="manu">Acme, Inc.</field> <field name="features">Description of the features, mentioning various things</field> <field name="features">Features also is multivalued</field> <field name="popularity">6</field> <field name="inStock">true</field> </doc></add> and then the URL http://localhost:8983/solr/select/?q=features&hl=true&hl.fl=features caused the exception. I have a patch. I don't know if it is completely correct, but it avoids this exception. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677531#action_12677531 ] Peter Wolanin commented on LUCENE-1500: --- The bug we are seeing now happens on pretty much every document that contains multi-byte characters, but only sometimes was it going past the end of the full string and hitting the exception. With the patch, the bug is still very evident; it just prevents the exception. It's a serious flaw in the highlighter - maybe it is using some non-UTF-8-aware method to calculate string lengths? Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677540#action_12677540 ] Peter Wolanin commented on LUCENE-1500: --- I am using Solr, but with a single value field. I'm using the current Solr build (includes the fix), so the bug I'm describing, which triggers the same exception as the prior Solr bug did, is still present and unrelated to SOLR-925. The extent of my tracing suggests it's coming when the token stream is generated, which looks to be part of the lucene highlighter: org.apache.lucene.search.highlight.TokenSources Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677561#action_12677561 ] Peter Wolanin commented on LUCENE-1500: --- I'm still trying to get a handle on how these pieces fit together, so sorry if I've jumped to the wrong conclusion. If the analyzer is where the offsets are calculated, then that sounds like the place to look. The field does use term vectors. The field uses this type from the Solr schema: {code} <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> {code} The full schema is http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/schema.xml?revision=1.1.2.1.2.30&pathrev=DRUPAL-6--1 the field is {code} <field name="body" type="text" indexed="true" stored="true" termVectors="true"/> {code} in case it's relevant, the solrconfig is: http://cvs.drupal.org/viewvc.py/drupal/contributions/modules/apachesolr/solrconfig.xml?revision=1.1.2.15&pathrev=DRUPAL-6--1 Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677620#action_12677620 ] Peter Wolanin commented on LUCENE-1500: --- Ah, it occurs to me that we first saw this bug recently - and it seems likely it was only after starting to use: {code} <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/> {code} for that field type. I will investigate more and post a SOLR issue. Highlighter throws StringIndexOutOfBoundsException -- Key: LUCENE-1500 URL: https://issues.apache.org/jira/browse/LUCENE-1500
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677629#action_12677629 ] Peter Wolanin commented on LUCENE-1500: ---

Koji - thanks - I was aware that not all tokenizers worked with the mapping filter, but I was apparently misinformed, since I was told that solr.HTMLStripWhitespaceTokenizerFactory was also suitable for use with a CharFilter. Indeed, your e-mail thread linked from SOLR-822 describes exactly the problem I have:

bq. As you can see, if you use CharFilter, Token offsets could be incorrect because CharFilters may convert 1 char to 2 chars or the other way around.

In the thread you suggest that this API could be added to Lucene Java?
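The offset drift Koji describes can be sketched in plain Java (no Lucene dependency; this is an illustrative toy, not Solr's actual code path): a char-filter-style mapping that turns one character into two shifts every subsequent offset, so token offsets computed against the filtered text no longer index the original.

```java
public class OffsetDriftDemo {
    public static void main(String[] args) {
        String original = "Straße ist";
        // A CharFilter-style mapping that converts 1 char to 2 ("ß" -> "ss")
        String filtered = original.replace("ß", "ss");

        // A tokenizer running on the filtered text reports "ist" at offset 8 ...
        int filteredStart = filtered.indexOf("ist");
        // ... but in the original text "ist" starts at offset 7.
        int originalStart = original.indexOf("ist");

        System.out.println("filtered offset: " + filteredStart);
        System.out.println("original offset: " + originalStart);
        // Highlighting the original at the uncorrected offset tags the wrong span:
        System.out.println("uncorrected span: " + original.substring(filteredStart));
    }
}
```

This is the mismatch that CharFilter's offset-correction API is meant to repair: without it, every mapped character before a token skews that token's reported span.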
[jira] Commented: (SOLR-822) CharFilter - normalize characters before tokenizer
[ https://issues.apache.org/jira/browse/SOLR-822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677627#action_12677627 ] Peter Wolanin commented on SOLR-822:

Is there an issue for the CharStream API in Lucene? The e-mail thread looks like people were generally in support.

CharFilter - normalize characters before tokenizer
--
Key: SOLR-822
URL: https://issues.apache.org/jira/browse/SOLR-822
Project: Solr
Issue Type: New Feature
Components: Analysis
Affects Versions: 1.3
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
Fix For: 1.4
Attachments: character-normalization.JPG, sample_mapping_ja.txt, sample_mapping_ja.txt, SOLR-822-for-1.3.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch, SOLR-822.patch

A new plugin which can be placed in front of <tokenizer/>.

{code:xml}
<fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping_ja.txt"/>
    <tokenizer class="solr.MappingCJKTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
{code}

<charFilter/> can be multiple (chained). I'll post a JPEG file to show a character normalization sample soon.

MOTIVATION: In Japan, there are two types of tokenizers -- N-gram (CJKTokenizer) and Morphological Analyzer. When we use a morphological analyzer, because the analyzer uses a Japanese dictionary to detect terms, we need to normalize characters before tokenization.

I'll post a patch soon, too.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676648#action_12676648 ] Peter Wolanin commented on LUCENE-1500: ---

Yes - this patch is not a fix, but a work-around. The root cause is clearly somewhere in the code generating the token stream - tokens seem to be getting positions in bytes rather than characters. DefaultSolrHighlighter.java has this code:

{code}
import org.apache.lucene.search.highlight.TokenSources;
...
// create TokenStream
try {
  // attempt term vectors
  if( tots == null )
    tots = new TermOffsetsTokenStream( TokenSources.getTokenStream(searcher.getReader(), docId, fieldName) );
  tstream = tots.getMultiValuedTokenStream( docTexts[j].length() );
} catch (IllegalArgumentException e) {
  // fall back to analyzer
  tstream = new TokenOrderingFilter(schema.getAnalyzer().tokenStream(fieldName, new StringReader(docTexts[j])), 10);
}
{code}
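The bytes-vs-characters hypothesis is easy to reproduce in isolation (plain Java with UTF-8 assumed; this is a standalone illustration, not Solr's actual code): every multi-byte character before a match pushes a byte-counted offset past the char-counted one.

```java
import java.nio.charset.StandardCharsets;

public class ByteOffsetDemo {
    public static void main(String[] args) {
        String text = "Gästezulauf und Drupaltalk";

        int charOffset = text.indexOf("Drupaltalk");                   // counted in chars
        int byteOffset = text.substring(0, charOffset)
                             .getBytes(StandardCharsets.UTF_8).length; // counted in bytes

        System.out.println("char offset: " + charOffset);
        System.out.println("byte offset: " + byteOffset);
        // Substring-ing the Java String with the byte offset drops the leading 'D':
        // the same forward drift per preceding multi-byte character.
        System.out.println("drifted: " + text.substring(byteOffset));
    }
}
```

Here the single "ä" makes the byte offset one larger than the char offset; German text with several umlauts before a token would drift the highlight several positions forward.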
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] Peter Wolanin commented on LUCENE-1500: ---

I have run into this issue over the last couple of days. Also using Solr, but the error is triggered by content that has multi-byte characters (such as German). It seems that somewhere Lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched. Here's an example trying to highlight the string 'Drupaltalk' with strong tags:

{code}
<p class="search-snippet"> Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... nter Dru<strong>paltalk001</strong> könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert. Dru<strong>paltalk002</strong> ist dann der Talk vom Dienstag zum Thema Drupal Al</p>
{code}
[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] pwolanin edited comment on LUCENE-1500 at 2/24/09 1:38 PM:

I have run into this issue over the last couple of days. Also using Solr, but the error is triggered by content that has multi-byte characters (such as German). It seems that somewhere Lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched. Here's an example trying to highlight the string 'Drupaltalk' with strong tags:

{code}
<p class="search-snippet"> Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... nter Dru<strong>paltalk001</strong> könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert. Dru<strong>paltalk002</strong> ist dann der Talk vom Dienstag zum Thema Drupal Al</p>
{code}

So the attached patch would probably avoid the exception (and is a good idea) but would not fix the bug I'm seeing.
[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676421#action_12676421 ] pwolanin edited comment on LUCENE-1500 at 2/24/09 1:37 PM:

I have run into this issue over the last couple of days. Also using Solr, but the error is triggered by content that has multi-byte characters (such as German). It seems that somewhere Lucene is counting bytes instead of characters, so each substring the highlighter tries to select is offset further forward in the string being matched. Here's an example trying to highlight the string 'Drupaltalk' with strong tags:

{code}
<p class="search-snippet"> Community ist - und dieses Portal Dr<strong>upaltalk.d</strong>e samt seinem schon eifrigen Benutzer- und Gästezulauf ( ... nter Dru<strong>paltalk001</strong> könnt Ihr die erste Konferenz noch mal nachhören und erfahren, wie Selbstorganisation in der Drupal Szene funktioniert. Dru<strong>paltalk002</strong> ist dann der Talk vom Dienstag zum Thema Drupal Al</p>
{code}

So the attached patch would probably avoid the exception (and is a good idea) but would not fix the bug I'm seeing.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676426#action_12676426 ] Peter Wolanin commented on LUCENE-1500: ---

Actually, looking at the Lucene source and the trace:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
	at java.lang.String.substring(String.java:1765)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	...
{code}

I see now that getBestTextFragments() takes in a token stream - and each token in this stream already has start/end positions set. So, this patch would mitigate the exception, but it looks like the real bug is in Solr.
[jira] Issue Comment Edited: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676426#action_12676426 ] pwolanin edited comment on LUCENE-1500 at 2/24/09 2:15 PM:

Actually, looking at the Lucene source and the trace:

{code}
java.lang.StringIndexOutOfBoundsException: String index out of range: 2822
	at java.lang.String.substring(String.java:1765)
	at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:274)
	at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:313)
	at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:84)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	...
{code}

I see now that getBestTextFragments() takes in a token stream - and each token in this stream already has start/end positions set. So, this patch would mitigate the exception, but it looks like the real bug is in Solr, or perhaps elsewhere in Lucene where the token stream is constructed.
[jira] Commented: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12676449#action_12676449 ] Peter Wolanin commented on LUCENE-1500: ---

Actually - the initial patch does not avoid the exception I'm seeing, since the start of the token is OK but the end is beyond the string's end. Here is a slightly enhanced version that checks both the start and the end of the token.
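The attached patch itself isn't reproduced in this digest, but the guard described above amounts to clamping both ends of each token's reported span to the text length before Highlighter calls substring. A standalone sketch (the helper name is hypothetical, not from the actual patch):

```java
public class TokenSpanGuard {

    // Clamp a token's reported [start, end) offsets so substring cannot throw,
    // even when the end offset (e.g. 2822) points past the end of the text.
    static String safeSpan(String text, int start, int end) {
        int s = Math.min(Math.max(start, 0), text.length());
        int e = Math.min(Math.max(end, s), text.length());
        return text.substring(s, e);
    }

    public static void main(String[] args) {
        String text = "Features also is multivalued";
        System.out.println(safeSpan(text, 17, 2822));                // end clamped to text length
        System.out.println(safeSpan(text, 3000, 3010).isEmpty());    // fully out-of-range span
    }
}
```

As noted in the thread, this only suppresses the exception; the drifted offsets still come from wherever the token stream is built.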
[jira] Updated: (LUCENE-1500) Highlighter throws StringIndexOutOfBoundsException
[ https://issues.apache.org/jira/browse/LUCENE-1500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Wolanin updated LUCENE-1500: --

Attachment: LUCENE-1500.patch