Re: [VOTE] Release PyLucene 3.6.0
I was able to build JCC and PyLucene on Win7-32 with Python 2.7 and Java 1.6, after fixing the Ivy issue described below:

Installed c:\python27\lib\site-packages\jcc-2.13-py2.7-win32.egg

The initial build of PyLucene failed with:

pylucene-3.6.0-1\lucene-java-3.6.0\lucene\common-build.xml:526: The following error occurred while executing this line:
pylucene-3.6.0-1\lucene-java-3.6.0\lucene\common-build.xml:298: Ivy is not available

After running ivy-bootstrap, the build went fine:

cd pylucene-3.6.0-1\lucene-java-3.6.0\lucene
ant ivy-bootstrap
Buildfile: pylucene-3.6.0-1\lucene-java-3.6.0\lucene\build.xml
ivy-bootstrap:
[mkdir] Created dir: C:\Users\Thomas Koch\.ant\lib
[echo] installing ivy 2.2.0 to C:\Users\Thomas Koch\.ant\lib
[get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
[get] To: C:\Users\Thomas Koch\.ant\lib\ivy-2.2.0.jar
BUILD SUCCESSFUL
Total time: 1 second

Note: running ant ivy-bootstrap a second time seems to detect the already installed version and does not download it again:

ivy-bootstrap:
[echo] installing ivy 2.2.0 to C:\Users\Thomas Koch\.ant\lib
[get] Getting: http://repo1.maven.org/maven2/org/apache/ivy/ivy/2.2.0/ivy-2.2.0.jar
[get] To: C:\Users\Thomas Koch\.ant\lib\ivy-2.2.0.jar
[get] Not modified - so not downloaded
BUILD SUCCESSFUL
Total time: 1 second

So I'd say that either adding ant ivy-bootstrap to the make all target (of PyLucene), or simply adding a make ivy target (with a hint in the docs), could help here. Alternatively, Ivy should be clearly marked as a required component for building PyLucene, as the BUILD readme of Java Lucene does:

...Set up your development environment (JDK 1.5 or greater, Ant 1.7.1+, Ivy 2.2.0)

However, it seems that the ivy.jar in the user's local ant dir may not be sufficient, depending on the global Ant config. As I understand it, the Java Lucene developers decided to fall back to a wiki page and some error details in the build process (and to declare Ivy as required, see above). See the Lucene JIRA issue where the problem is discussed: https://issues.apache.org/jira/browse/LUCENE-3946

Finally: my ant-1.8 does not recognize the mentioned ant --noconfig option, but only supports:

-nouserlib Run ant without using the jar files from ${user.home}/.ant/lib
-noclasspath Run ant without using CLASSPATH

It's somewhat ironic that bringing in a very powerful dependency manager (Apache Ivy website) results in another dependency issue...

Regards,
Thomas

-Original Message-
From: Andi Vajda [mailto:va...@apache.org]
Sent: Saturday, May 5, 2012 23:53
To: pylucene-dev@lucene.apache.org
Subject: Re: [VOTE] Release PyLucene 3.6.0

Please vote to release these artifacts as PyLucene 3.6.0-1. Lucene fails to compile because I don't have ivy installed and the Makefile doesn't call ivy-bootstrap automatically. Right. The Lucene error should be clear enough but adding a make target could help. How to communicate that target, though? Andi..
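The "bootstrap only if Ivy is missing" idea from this message can be sketched as a small pre-build check. This is a hypothetical helper, not part of the PyLucene Makefile: the function names, the ivy-*.jar glob, and the default ~/.ant/lib location are assumptions taken from the console output above. As the message notes, a jar in the user's Ant lib directory may still not be enough, depending on the global Ant configuration.

```python
import subprocess
from pathlib import Path


def find_ivy_jars(ant_lib_dir):
    """Return the sorted ivy-*.jar files in ant_lib_dir (empty list if none)."""
    lib = Path(ant_lib_dir)
    if not lib.is_dir():
        return []
    return sorted(p for p in lib.glob("ivy-*.jar") if p.is_file())


def ensure_ivy(lucene_dir, ant_lib_dir=None, run=subprocess.check_call):
    """Run `ant ivy-bootstrap` in lucene_dir unless an Ivy jar is already present.

    Returns True if the bootstrap target was invoked, False otherwise.
    The `run` parameter exists only so the command can be stubbed in tests.
    """
    if ant_lib_dir is None:
        # Assumed default, matching the path shown in the build output.
        ant_lib_dir = Path.home() / ".ant" / "lib"
    if find_ivy_jars(ant_lib_dir):
        return False
    # The launcher name may need adjusting on some platforms.
    run(["ant", "ivy-bootstrap"], cwd=str(lucene_dir))
    return True
```

A make ivy target could call something like ensure_ivy("lucene-java-3.6.0/lucene") before the main build; whether that belongs in make all is the open question raised in the thread.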
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13815 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13815/ All tests passed Build Log (for compile errors): [...truncated 24077 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13816 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13816/ All tests passed Build Log (for compile errors): [...truncated 24057 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13817 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13817/ All tests passed Build Log (for compile errors): [...truncated 24079 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13818 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13818/ All tests passed Build Log (for compile errors): [...truncated 24054 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13819 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13819/ All tests passed Build Log (for compile errors): [...truncated 24057 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13820 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13820/ All tests passed Build Log (for compile errors): [...truncated 24095 lines...]
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 13820 - Still Failing
I committed a fix... Mike McCandless http://blog.mikemccandless.com On Sun, May 6, 2012 at 5:37 AM, Apache Jenkins Server jenk...@builds.apache.org wrote: Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13820/ All tests passed Build Log (for compile errors): [...truncated 24095 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 13821 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/13821/ All tests passed Build Log (for compile errors): [...truncated 24140 lines...]
[jira] [Commented] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269191#comment-13269191 ]

Andrzej Bialecki commented on SOLR-139:

David, please see LUCENE-3837 for a low-level partial update of inverted fields without re-indexing other fields. That is very much a work in progress, and it's more complex. This issue provides a shortcut for the "retrieve stored fields, modify, delete original doc, add modified doc" sequence that users would otherwise have to execute manually.

Support updateable/modifiable documents
Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley
Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-139.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch

It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way Lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers.
for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
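The manual sequence mentioned in the comment (retrieve stored fields, modify, delete original doc, add modified doc) can be sketched with a plain dict standing in for the index. This is only an illustration of the order of steps, not Solr's API or any of the attached patches; update_document and its parameters are hypothetical names. Because only stored fields can be read back, anything indexed but not stored would be lost by such a round trip.

```python
def update_document(index, doc_id, changes, increments=None):
    """Emulate a partial update on a dict-based stand-in for an index.

    index      -- dict mapping doc id -> dict of stored fields
    changes    -- fields to overwrite
    increments -- numeric fields to increase by the given delta
    """
    stored = index.get(doc_id)
    if stored is None:
        raise KeyError(doc_id)
    modified = dict(stored)            # 1. retrieve stored fields
    modified.update(changes)           # 2. modify
    for field, delta in (increments or {}).items():
        modified[field] = modified.get(field, 0) + delta
    del index[doc_id]                  # 3. delete original doc
    index[doc_id] = modified           # 4. add modified doc
    return modified
```

For example, update_document(index, "1", {"title": "b"}, {"views": 2}) overwrites title and adds 2 to views, which corresponds to the "incrementing an existing value" case from the issue description.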
[jira] [Commented] (LUCENE-4032) don't write offsetlength every skip
[ https://issues.apache.org/jira/browse/LUCENE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269196#comment-13269196 ]

Michael McCandless commented on LUCENE-4032:

+1

don't write offsetlength every skip
Key: LUCENE-4032 URL: https://issues.apache.org/jira/browse/LUCENE-4032 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-4032.patch

We currently write this every skip, but we should try to avoid this (like payloads). This reduces skip data on my test corpus: .frq goes from 52354303 to 50896066.
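To put the two .frq figures from the issue description in proportion, the saving can be worked out directly. The numbers are the ones quoted in the issue; the unit is not stated there, so none is assumed here.

```python
# .frq sizes quoted in LUCENE-4032 (unit not stated in the issue).
before = 52354303
after = 50896066

saved = before - after              # absolute reduction
percent = 100.0 * saved / before    # reduction relative to the original size

print(saved, round(percent, 2))
```

That is a reduction of 1458237, or a little under 3 percent of the original .frq size on that test corpus.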
[jira] [Commented] (LUCENE-3830) MappingCharFilter could be improved by switching to an FST.
[ https://issues.apache.org/jira/browse/LUCENE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269210#comment-13269210 ]

Robert Muir commented on LUCENE-3830:

Patch looks good: I guess the bulk read in RollingCharBuffer should help other things like Kuromoji that use it too?!

MappingCharFilter could be improved by switching to an FST.
Key: LUCENE-3830 URL: https://issues.apache.org/jira/browse/LUCENE-3830 Project: Lucene - Java Issue Type: Improvement Reporter: Dawid Weiss Assignee: Michael McCandless Priority: Minor Labels: gsoc2012, lucene-gsoc-12 Fix For: 4.0 Attachments: LUCENE-3830.patch, LUCENE-3830.patch, LUCENE-3830.patch, PerfTestMappingCharFilter.java

MappingCharFilter stores an overly complex tree-like structure for matching input patterns. The input is a union of fixed strings mapped to a set of fixed strings; an FST matcher would be ideal here and provide both memory and speed improvement I bet.
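The matching problem described in the issue (a union of fixed input strings, each mapped to a fixed output string) can be sketched with a plain character trie. This is a stand-in for illustration only: it is not an FST and not the code from the attached patch, the function names are hypothetical, and greedy longest-match semantics are an assumption.

```python
def build_trie(mappings):
    """Build a nested-dict trie; the key None marks a complete pattern."""
    root = {}
    for src, dst in mappings.items():
        node = root
        for ch in src:
            node = node.setdefault(ch, {})
        node[None] = dst  # replacement for the pattern ending at this node
    return root


def apply_mappings(text, trie):
    """Replace the longest matching pattern at each position (greedy)."""
    out = []
    i = 0
    while i < len(text):
        node = trie
        best_len, best_dst = 0, None
        j = i
        # Walk the trie as far as the input allows, remembering the
        # longest complete pattern seen so far.
        while j < len(text) and text[j] in node:
            node = node[text[j]]
            j += 1
            if None in node:
                best_len, best_dst = j - i, node[None]
        if best_dst is None:
            out.append(text[i])   # no pattern starts here: copy one char
            i += 1
        else:
            out.append(best_dst)  # emit replacement, skip matched input
            i += best_len
    return "".join(out)
```

An FST would encode the same input-to-output mapping while sharing prefixes and suffixes of the patterns, which is where the memory saving suggested in the issue would be expected to come from.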
[jira] [Commented] (LUCENE-3830) MappingCharFilter could be improved by switching to an FST.
[ https://issues.apache.org/jira/browse/LUCENE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269211#comment-13269211 ]

Michael McCandless commented on LUCENE-3830:

> I guess the bulk read in RollingCharBuffer should help other things like Kuromoji that use it too?!

I haven't tested but I think it should help!
[jira] [Resolved] (LUCENE-3830) MappingCharFilter could be improved by switching to an FST.
[ https://issues.apache.org/jira/browse/LUCENE-3830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-3830. Resolution: Fixed
[jira] [Updated] (LUCENE-3948) Experiment with placing poms outside of src
[ https://issues.apache.org/jira/browse/LUCENE-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Rowe updated LUCENE-3948: Attachment: LUCENE-3948.patch

Patch, brought up to date with the modules/ to lucene/ move. Also: added info to dev-tools/maven/README.maven, and modified svn:ignore properties to ignore the top-level maven-build/ and to stop ignoring pom.xml files. I'll commit this tomorrow if there are no objections.

Experiment with placing poms outside of src
Key: LUCENE-3948 URL: https://issues.apache.org/jira/browse/LUCENE-3948 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Chris Male Priority: Minor Attachments: LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch

Recent work in LUCENE-3944 has changed how our generated pom.xml files are handled during release preparation, placing them in build/ instead. However, get-maven-poms still places the poms inside src/ so you can use them to drive a build. What I think would be ideal is if we could unify the release handling of the poms and the normal build handling, so that the poms can sit outside of src and serve both purposes. Some time ago I investigated how the ANT project handles its own Maven integration, and it has its poms sitting in their own directory. They then reference the actual src locations inside the poms. This works for ANT but with a warning, since some of their tests don't work due to how the Maven surefire plugin works, so they skip their tests. I have done some quick testing of my own and this process does seem to work for our poms and tests. I now want to take this to a full-scale POC and see if it works fully.
[jira] [Assigned] (LUCENE-3948) Experiment with placing poms outside of src
[ https://issues.apache.org/jira/browse/LUCENE-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steven Rowe reassigned LUCENE-3948: Assignee: Steven Rowe
[jira] [Resolved] (LUCENE-4032) don't write offsetlength every skip
[ https://issues.apache.org/jira/browse/LUCENE-4032?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4032. Resolution: Fixed Fix Version/s: 4.0
[jira] [Updated] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jack Krupansky updated SOLR-3439: Attachment: Lincoln-Gettysburg-Address.pdf Lincoln-Gettysburg-Address.docx

Test documents for SolrCell. Both have a bunch of metadata fields defined. The PDF was generated from the Word doc. We can consider them for inclusion in exampledocs, but for now they are posted here for reference and anybody wanting to test this issue.

Add content field to example schema to make SolrCell easier to use out of the box
Key: SOLR-3439 URL: https://issues.apache.org/jira/browse/SOLR-3439 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction), Schema and Analysis Reporter: Jack Krupansky Priority: Minor Fix For: 4.0 Attachments: Lincoln-Gettysburg-Address.docx, Lincoln-Gettysburg-Address.pdf

Currently, SolrCell is configured to map Tika content (the main body of a document) to the text field which is the indexed-only (not stored) catch-all for default queries. That searches fine, but doesn't show the document content in the results, sometimes leading users to think that something is wrong. Sure, the user can easily add the field (and this is documented), but it would be a better user experience to have such a basic feature work right out of the box without any config editing and without the need for the user to read the fine print in the documentation. I propose that we add the content field to the example schema in the section of fields already defined to support SolrCell metadata. It would be stored and indexed. I further propose that a copyField be added for the title, description, (and maybe a couple of others) and content fields to add them to the text field for searching. Again, trying to improve the out of the box user experience. It also simplifies testing - less setup.
[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269238#comment-13269238 ]

Yonik Seeley commented on SOLR-3439:

I agree with adding a stored content field, but I don't think we should add any more copyFields. One of the biggest out-of-the-box experience items that people base their decision on is performance, so we shouldn't make the example schema/config slower.
[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269244#comment-13269244 ]

Jack Krupansky commented on SOLR-3439:

We could have the copyFields default to being commented out, but then the content would not be searched by default. Or we could not index the content field, but then it can't be searched by itself. For non-SolrCell applications, will copyField of the empty content field be a significant performance drag? Or is it only the apps that use SolrCell where there are concerns about the copyField impact? I agree that performance should be a consideration, but I suspect that these couple of copyFields (I'll post the preliminary patch as soon as the tests finish running) are small potatoes in the overall performance picture.
[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269245#comment-13269245 ]

Yonik Seeley commented on SOLR-3439:

> For non-SolrCell applications, will copyField of the empty content field be a significant performance drag?

No, but if it's used, it can be a big performance drag (indexing content twice). I'm not sure how important it is to be searched by default... i.e. with edismax, someone would just need to add content to the qf parameter.
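The edismax alternative mentioned in the comment amounts to listing content in qf next to the default field. A minimal sketch of building such a query string follows; the helper name is hypothetical, the field names text and content come from the issue, and the request handler path is left out because it depends on the deployment.

```python
from urllib.parse import urlencode


def edismax_query_string(user_query, fields):
    """Build URL parameters for an edismax query over the given fields."""
    params = {
        "q": user_query,
        "defType": "edismax",      # select the extended dismax parser
        "qf": " ".join(fields),    # query fields, space separated
    }
    return urlencode(params)
```

With fields ["text", "content"] this searches both fields without a copyField, but it only works if content is itself indexed, which is the point raised in the next comment.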
[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269247#comment-13269247 ]

Jack Krupansky commented on SOLR-3439:

Right, so if it is the double indexing that is a serious concern, maybe having content stored but not indexed is a reasonable compromise. It would be searchable due to the copyField but not double-indexed. This would still give a reasonably friendly out of the box experience (default search works and content is returned), and obviously they can hand-tune for more specific control. But if content is stored but not indexed, the user can't simply add content to qf - they need to make it indexed, which is what my preliminary patch does.
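The compromise discussed in this comment (content stored but not indexed, with a copyField into text) can be illustrated with a toy model of stored and indexed flags. This is not how Solr is implemented; index_document and the dict-based schema are hypothetical, and the model only tracks which field ends up where.

```python
def index_document(doc, schema, copy_fields):
    """Return (stored, indexed) views of a document under a toy schema.

    schema      -- field name -> {"stored": bool, "indexed": bool}
    copy_fields -- source field name -> list of destination field names
    """
    stored, indexed = {}, {}

    def add(field, value):
        # Each field keeps or drops the value according to its own flags.
        if schema[field]["stored"]:
            stored.setdefault(field, []).append(value)
        if schema[field]["indexed"]:
            indexed.setdefault(field, []).append(value)

    for name, value in doc.items():
        add(name, value)
        for dest in copy_fields.get(name, []):
            add(dest, value)  # a copy is handled by the destination's flags
    return stored, indexed
```

With content as stored-only and text as indexed-only, a document {"content": "Four score"} yields content in the stored view and text in the indexed view: the body is returned in results and found by default searches, it is indexed once, and content cannot be named in qf because it never reaches the indexed view.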
[jira] [Updated] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated SOLR-3439: Attachment: SOLR-3439.patch Preliminary patch. content is both stored and indexed, with multiple copy fields.
[jira] [Updated] (LUCENE-4034) improve functionquery tests, fix some minor bugs
[ https://issues.apache.org/jira/browse/LUCENE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4034: Attachment: LUCENE-4034.patch improve functionquery tests, fix some minor bugs Key: LUCENE-4034 URL: https://issues.apache.org/jira/browse/LUCENE-4034 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-4034.patch Currently functionqueries have basically no simple low-level tests. Found a few minor problems: * fix -1 summation (in case some, but not all segments are preflex): TotalTermFreq/SumTotalTermFreq * fix omitTF case (due to LUCENE-2929, docsenum will return null if you ask for freqs but the field is omitTF). * fix some indexedField vs field mixups * fix QueryUtils searcher-wrapping to also set the similarity the same as it was on the original searcher.
[jira] [Created] (LUCENE-4034) improve functionquery tests, fix some minor bugs
Robert Muir created LUCENE-4034: --- Summary: improve functionquery tests, fix some minor bugs Key: LUCENE-4034 URL: https://issues.apache.org/jira/browse/LUCENE-4034 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-4034.patch Currently functionqueries have basically no simple low-level tests. Found a few minor problems: * fix -1 summation (in case some, but not all segments are preflex): TotalTermFreq/SumTotalTermFreq * fix omitTF case (due to LUCENE-2929, docsenum will return null if you ask for freqs but the field is omitTF). * fix some indexedField vs field mixups * fix QueryUtils searcher-wrapping to also set the similarity the same as it was on the original searcher.
[jira] [Updated] (LUCENE-3505) BooleanScorer2.freq() doesnt work unless you call score() first.
[ https://issues.apache.org/jira/browse/LUCENE-3505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3505: Attachment: LUCENE-3505.patch Patch brought up to trunk. Still doesn't have any tests. BooleanScorer2.freq() doesnt work unless you call score() first. Key: LUCENE-3505 URL: https://issues.apache.org/jira/browse/LUCENE-3505 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Attachments: LUCENE-3505.patch, LUCENE-3505.patch It's 0; the freq() is then calculated as a side effect of score()... we should at least document this or throw UOE for freq() instead.
[jira] [Created] (LUCENE-4035) Collation via docvalues
Robert Muir created LUCENE-4035: --- Summary: Collation via docvalues Key: LUCENE-4035 URL: https://issues.apache.org/jira/browse/LUCENE-4035 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Robert Muir Currently collated sort is via an Analyzer into an indexedfield, which is uninverted in the fieldcache. Instead we could support this with docvalues, and take advantage of future improvements like LUCENE-3729.
[jira] [Updated] (LUCENE-4035) Collation via docvalues
[ https://issues.apache.org/jira/browse/LUCENE-4035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4035: Attachment: LUCENE-4035.patch just a quick prototype patch... not happy about how the sort apis work with this (see LUCENE-4033) Collation via docvalues --- Key: LUCENE-4035 URL: https://issues.apache.org/jira/browse/LUCENE-4035 Project: Lucene - Java Issue Type: Improvement Affects Versions: 4.0 Reporter: Robert Muir Attachments: LUCENE-4035.patch Currently collated sort is via an Analyzer into an indexedfield, which is uninverted in the fieldcache. Instead we could support this with docvalues, and take advantage of future improvements like LUCENE-3729.
Re: Annotation for run this test, but don't fail build if it fails ?
So, I started thinking about it -- I can implement something that will report failures (much like we do right now), but it's quite tricky to fit it into the reporting system and continuous integration system. Here's why -- if a test doesn't fail then its output (sysout/syserrs) is not currently printed (to provide a cleaner view of what's been executed). Verbose log is on disk but it'd have to be scanned by hand (and copied as a build artifact). Yet another problem is that jenkins wouldn't _fail_ on such pseudo-failures because the set of JUnit statuses is not extensible (it'd be something like FAILED+IGNORE) so we'd need to either go with IGNORED, ASSUMPTION_IGNORED or SUCCESS, none of which are a good match, really. ASSUMPTION_IGNORED status is probably most convenient here because of how it can be technically propagated back to JUnit. Any ideas? Hoss -- how do you envision monitoring of these tests? Manually? Dawid If we could leave these tests running on every build, then we could at least monitor the relative frequency of the failures -- ie: "last week testFoo failed in 10% of the builds, this week it fails in every build, so somebody definitely broke something" or "last week testFoo failed in 10% of the builds, and after my attempted hardening it only fails in 5% of the builds so i may be on to something". what do folks think? -Hoss
[jira] [Resolved] (LUCENE-4034) improve functionquery tests, fix some minor bugs
[ https://issues.apache.org/jira/browse/LUCENE-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4034. - Resolution: Fixed Fix Version/s: 4.0 I committed this... if anyone has any comments (especially regarding omitTF, it seemed to me e.g. tf() should be consistent with what termquery does), let me know. improve functionquery tests, fix some minor bugs Key: LUCENE-4034 URL: https://issues.apache.org/jira/browse/LUCENE-4034 Project: Lucene - Java Issue Type: Bug Affects Versions: 4.0 Reporter: Robert Muir Fix For: 4.0 Attachments: LUCENE-4034.patch Currently functionqueries have basically no simple low-level tests. Found a few minor problems: * fix -1 summation (in case some, but not all segments are preflex): TotalTermFreq/SumTotalTermFreq * fix omitTF case (due to LUCENE-2929, docsenum will return null if you ask for freqs but the field is omitTF). * fix some indexedField vs field mixups * fix QueryUtils searcher-wrapping to also set the similarity the same as it was on the original searcher.
Re: Annotation for run this test, but don't fail build if it fails ?
On Sun, May 6, 2012 at 2:39 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: Any ideas? Hoss -- how do you envision monitoring of these tests? Manually? If the tests are run many times a day, it would be great to get a daily report of the percent of time the tests pass. Then if it goes from 5% to 50%, we can go uh-oh... The crux of the problem remains that (for solr devs) it's still much more useful to have a test fail intermittently than to disable and not run the test at all. -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Re: Annotation for run this test, but don't fail build if it fails ?
If the tests are run many times a day, it would be great to get a daily report of the percent of time the tests pass. Then if it goes from 5% to 50%, we can go uh-oh... Yeah, well... but this is beyond the runner as it aggregates over time -- it looks like a jenkins plugin that would analyze test run logs and provide such statistics. I also admit I've never seen anything like this -- a suite of tests with an allowed failure ratio over time and a threshold that would trigger a warning... The crux of the problem remains that (for solr devs) it's still much more useful to have a test fail intermittently than to disable and not run the test at all. These are weird tests if they allow for a (predictable?) failure from time to time. I don't say it's a bad concept, but I think unit tests may not be a good framework for handling this. Dawid
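The aggregation discussed in this thread (a per-test failure ratio over time, with a threshold that triggers a warning) could be sketched roughly as below. This is only an illustration under stated assumptions: the (period, test, passed) record shape, the function names, and the 0.25 jump threshold are hypothetical, and nothing here reads Jenkins or JUnit output.

```python
from collections import defaultdict


def failure_rates(runs):
    """Return {(period, test): fraction of runs that failed}.

    `runs` is an iterable of (period, test_name, passed) tuples. The record
    shape is hypothetical; it is not a Jenkins or JUnit format.
    """
    totals = defaultdict(int)
    failures = defaultdict(int)
    for period, test, passed in runs:
        totals[(period, test)] += 1
        if not passed:
            failures[(period, test)] += 1
    return {key: failures[key] / totals[key] for key in totals}


def regressions(rates, previous, current, jump=0.25):
    """List tests whose failure rate rose by more than `jump` between two periods."""
    flagged = []
    for (period, test), rate in rates.items():
        if period != current:
            continue
        before = rates.get((previous, test))
        if before is not None and rate - before > jump:
            flagged.append(test)
    return sorted(flagged)


# Made-up sample data: testFoo goes from 1 failure in 10 runs to 5 in 10,
# testBar goes from 0 failures in 10 runs to 1 in 10.
runs = (
    [("week1", "testFoo", i != 0) for i in range(10)]
    + [("week2", "testFoo", i >= 5) for i in range(10)]
    + [("week1", "testBar", True) for _ in range(10)]
    + [("week2", "testBar", i != 0) for i in range(10)]
)
rates = failure_rates(runs)
print(regressions(rates, "week1", "week2"))
```

With this sample data only testFoo crosses the threshold, which is the "uh-oh" case; as noted in the thread, such a report would live outside the test runner because it aggregates over many builds.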
Re: Annotation for run this test, but don't fail build if it fails ?
On Sun, May 6, 2012 at 3:38 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote: I also admit I've never seen anything like this -- a suite of tests with an allowed failure ratio over time and a threshold that would trigger a warning... Not so much an allowed failure rate... more of "it fails sometimes and no one has had the time to try to get it to pass a greater percentage of the time". And even when people put effort into getting it to pass more often, it's still not 100%. As those tests exist now, there are a few choices: a) turn them off (this is bad because it seriously decreases coverage) b) somehow deal with the intermittent failures. Given that we're not running on a realtime system, the fact that many higher level tests have timing and scheduling dependencies means that we will never achieve a 100% pass rate on such tests. These are weird tests if they allow for a (predictable?) failure from time to time. I don't say it's a bad concept, but I think unit tests may not be a good framework for handling this. Yeah, these aren't really unit tests. Should we try to move them somewhere else? Or run them separately and email the results to a different list? -Yonik lucenerevolution.com - Lucene/Solr Open Source Search Conference. Boston May 7-10
Jira doesn't know that 3.6 is released
I was going to file a Jira on an issue with Solr 3.6, but I noticed that Jira still thinks that 3.6 is “Unreleased”. -- Jack Krupansky
[jira] [Created] (SOLR-3441) Make ElisionFilterFactory MultiTermAware
Jack Krupansky created SOLR-3441: Summary: Make ElisionFilterFactory MultiTermAware Key: SOLR-3441 URL: https://issues.apache.org/jira/browse/SOLR-3441 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Jack Krupansky Priority: Minor The ElisionFilterFactory (which removes l' from l'avion) is not MultiTermAware - which includes release 3.6. I wanted to use a wildcard such as: (l'aub*). Seems simple enough to address. I'll attach a patch.
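To illustrate why the wildcard case matters, here is a minimal sketch, not Solr code. It assumes the index-time chain strips the elided article while the wildcard pattern is left unanalyzed unless the filter takes part in multi-term analysis. The article list, function names, and sample terms are hypothetical.

```python
import fnmatch

# Hypothetical article list for illustration; the real filter's default set
# is not reproduced here.
ARTICLES = ("l", "d", "qu")


def strip_elision(token):
    """Drop a leading elided article such as l' from a token."""
    head, sep, tail = token.partition("'")
    if sep and head.lower() in ARTICLES and tail:
        return tail
    return token


# Terms as they would end up in the index after elision removal.
indexed_terms = [strip_elision(t) for t in ["l'avion", "l'aube", "l'auberge"]]


def wildcard_matches(pattern, analyze_pattern):
    """Match a wildcard pattern against the indexed terms.

    When analyze_pattern is False the pattern is used as typed, which models
    a filter that is not applied to multi-term queries.
    """
    if analyze_pattern:
        pattern = strip_elision(pattern)
    return [t for t in indexed_terms if fnmatch.fnmatchcase(t, pattern)]


print(wildcard_matches("l'aub*", analyze_pattern=False))
print(wildcard_matches("l'aub*", analyze_pattern=True))
```

In this sketch the unanalyzed pattern still carries the l' prefix and so cannot match terms that were indexed without it; normalizing the pattern the same way as the index makes the prefix match succeed.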
[jira] [Updated] (SOLR-3441) Make ElisionFilterFactory MultiTermAware
[ https://issues.apache.org/jira/browse/SOLR-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jack Krupansky updated SOLR-3441: - Attachment: SOLR-3441.patch Preliminary patch. Make ElisionFilterFactory MultiTermAware Key: SOLR-3441 URL: https://issues.apache.org/jira/browse/SOLR-3441 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 3.6 Reporter: Jack Krupansky Priority: Minor Attachments: SOLR-3441.patch The ElisionFilterFactory (which removes l' from l'avion) is not MultiTermAware - which includes release 3.6. I wanted to use a wildcard such as: (l'aub*). Seems simple enough to address. I'll attach a patch.
[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269305#comment-13269305 ] Jan Høydahl commented on SOLR-3439: --- Really, the copyField thing in today's example schema is an *anti pattern* since we teach people to duplicate all their content while most people would be better off using DisMax. I have had several customers who build their whole search on the model from example schema and then get into performance problems due to the 2x index increase. How would you feel if we instead get rid of *all* the copyFields and configure the default handler with defType=edismax&qf=name,features,manu,content Then we can leave a copyField section commented out in the schema with an explanation of what use cases it is good for. Add content field to example schema to make SolrCell easier to use out of the box --- Key: SOLR-3439 URL: https://issues.apache.org/jira/browse/SOLR-3439 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction), Schema and Analysis Reporter: Jack Krupansky Priority: Minor Fix For: 4.0 Attachments: Lincoln-Gettysburg-Address.docx, Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch Currently, SolrCell is configured to map Tika content (the main body of a document) to the text field which is the indexed-only (not stored) catch-all for default queries. That searches fine, but doesn't show the document content in the results, sometimes leading users to think that something is wrong. Sure, the user can easily add the field (and this is documented), but it would be a better user experience to have such a basic feature work right out of the box without any config editing and without the need for the user to read the fine print in the documentation. I propose that we add the content field to the example schema in the section of fields already defined to support SolrCell metadata. It would be stored and indexed. 
I further propose that a copyField be added for the title, description, (and maybe a couple of others) and content fields to add them to the text field for searching. Again, trying to improve the out of the box user experience. It also simplifies testing - less setup.
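For reference, the edismax settings suggested in the comment on this issue can also be written as ordinary request parameters. The sketch below only builds the query string; the q value is a made-up example, and the comment itself proposes setting these as handler defaults rather than passing them per request.

```python
from urllib.parse import urlencode

# defType and qf are taken from the comment; the q value is hypothetical.
params = {
    "q": "gettysburg",
    "defType": "edismax",
    "qf": "name,features,manu,content",
}

# urlencode percent-encodes the commas in qf.
query_string = urlencode(params)
print(query_string)
```

Searching several fields through qf in this way avoids copying every field into the catch-all text field, which is the index-size concern raised in the comment.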
[jira] [Resolved] (LUCENE-4024) FuzzyQuery should never do edit distance > 2
[ https://issues.apache.org/jira/browse/LUCENE-4024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4024. - Resolution: Fixed FuzzyQuery should never do edit distance > 2 Key: LUCENE-4024 URL: https://issues.apache.org/jira/browse/LUCENE-4024 Project: Lucene - Java Issue Type: Improvement Reporter: Michael McCandless Fix For: 4.0 Attachments: LUCENE-4024.patch Edit distance 1 and 2 are now very very fast compared to 3.x (100X-200X faster) ... but edit distance 3 will fall back to the super-slow scan of all terms from 3.x, which is not graceful degradation. Not sure how to fix it ... maybe we have a SlowFuzzyQuery? And FuzzyQuery throws exc if you try to ask it to be slow? Or, we add a boolean (off by default) that you must turn on to allow the slow one..?
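A rough sketch of the "refuse the slow path unless explicitly allowed" option floated in this issue. It is not Lucene's implementation: the fast path for distances 1 and 2 mentioned above is not shown, the brute-force scan here stands in for the slow all-terms scan, and the function names and the allow_slow flag are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


MAX_FAST_EDITS = 2  # the issue describes distances 1 and 2 as the fast cases


def fuzzy_scan(term, terms, max_edits, allow_slow=False):
    """Return the terms within max_edits of term, scanning every term.

    Requests beyond MAX_FAST_EDITS are rejected unless the caller opts in,
    mirroring the off-by-default boolean suggested in the issue.
    """
    if max_edits > MAX_FAST_EDITS and not allow_slow:
        raise ValueError("max_edits > 2 requires allow_slow=True")
    return [t for t in terms if edit_distance(term, t) <= max_edits]


print(fuzzy_scan("lucene", ["lucine", "lucent", "solr"], max_edits=1))
```

Rejecting the request by default makes the cost visible to the caller instead of silently degrading to a scan over all terms.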
Re: Why does ValueSource still implement 'Serializable' if Java serialization is out in Lucene 4.0?
sounds like a relic. care to toss up a patch? On Sun, May 6, 2012 at 7:11 PM, Benson Margulies bimargul...@gmail.com wrote: ? -- lucidimagination.com
Anyone else interested in json serialization of queries
I'm pecking away at my idea of providing query serialization via Jackson without adding annotations or setters to the Query classes. If anyone else likes this idea well enough to pitch in, please let me know and I'll unleash you on the github repo.
[jira] [Created] (LUCENE-4036) HaversineConstFunction ignores one of its two values, is this on purpose?
Benson Margulies created LUCENE-4036: Summary: HaversineConstFunction ignores one of its two values, is this on purpose? Key: LUCENE-4036 URL: https://issues.apache.org/jira/browse/LUCENE-4036 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 4.0 Reporter: Benson Margulies org.apache.solr.search.function.distance.HaversineConstFunction.parser.new ValueSourceParser() {...}.parse(FunctionQParser) has an unused variable warning for 'vs2', and uses vs1 to initialize mv2. Maybe vs2 should just be deleted?
[jira] [Created] (LUCENE-4037) ValueSource still implements Serializable
Benson Margulies created LUCENE-4037: Summary: ValueSource still implements Serializable Key: LUCENE-4037 URL: https://issues.apache.org/jira/browse/LUCENE-4037 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 4.0 Reporter: Benson Margulies Priority: Minor Attachments: LUCENE-4037.patch 4.0 eliminates the use of Serializable. Here's a leftover.
[jira] [Updated] (LUCENE-4037) ValueSource still implements Serializable
[ https://issues.apache.org/jira/browse/LUCENE-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Benson Margulies updated LUCENE-4037: - Attachment: LUCENE-4037.patch ValueSource still implements Serializable - Key: LUCENE-4037 URL: https://issues.apache.org/jira/browse/LUCENE-4037 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 4.0 Reporter: Benson Margulies Priority: Minor Attachments: LUCENE-4037.patch 4.0 eliminates the use of Serializable. Here's a leftover.
Re: Why does ValueSource still implement 'Serializable' if Java serialization is out in Lucene 4.0?
Done. On Sun, May 6, 2012 at 7:13 PM, Robert Muir rcm...@gmail.com wrote: sounds like a relic. care to toss up a patch? On Sun, May 6, 2012 at 7:11 PM, Benson Margulies bimargul...@gmail.com wrote: ? -- lucidimagination.com
[jira] [Updated] (LUCENE-4037) ValueSource still implements Serializable
[ https://issues.apache.org/jira/browse/LUCENE-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-4037: Attachment: LUCENE-4037.patch Thanks Benson! I searched around and found some other cruft... some old readResolve()'s etc... here's the patch. I'll commit soon. ValueSource still implements Serializable - Key: LUCENE-4037 URL: https://issues.apache.org/jira/browse/LUCENE-4037 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 4.0 Reporter: Benson Margulies Priority: Minor Attachments: LUCENE-4037.patch, LUCENE-4037.patch 4.0 eliminates the use of Serializable. Here's a leftover.
[jira] [Resolved] (LUCENE-4037) ValueSource still implements Serializable
[ https://issues.apache.org/jira/browse/LUCENE-4037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir resolved LUCENE-4037. - Resolution: Fixed Fix Version/s: 4.0 ValueSource still implements Serializable - Key: LUCENE-4037 URL: https://issues.apache.org/jira/browse/LUCENE-4037 Project: Lucene - Java Issue Type: Bug Components: core/other Affects Versions: 4.0 Reporter: Benson Margulies Priority: Minor Fix For: 4.0 Attachments: LUCENE-4037.patch, LUCENE-4037.patch 4.0 eliminates the use of Serializable. Here's a leftover.
[jira] [Commented] (SOLR-3296) Explore alternatives to Commons CSV
[ https://issues.apache.org/jira/browse/SOLR-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269324#comment-13269324 ] Chris Male commented on SOLR-3296: -- After some research (thanks Steven), it seems the likely cause of the failure is that their repositories timeout after some period if they aren't synced to the central repository. Because I submitted the bundle on a Friday, it perhaps didn't get looked into until too late. So I've resubmitted the bundle (on a Monday now), fingers crossed. Explore alternatives to Commons CSV --- Key: SOLR-3296 URL: https://issues.apache.org/jira/browse/SOLR-3296 Project: Solr Issue Type: Improvement Components: Build Reporter: Chris Male Attachments: SOLR-3295-CSV-tests.patch, SOLR-3296_noggit.patch, pom.xml, pom.xml In LUCENE-3930 we're implementing some less than ideal solutions to make available the unreleased version of commons-csv. We could remove these solutions if we didn't rely on this lib. So I think we should explore alternatives. I think [opencsv|http://opencsv.sourceforge.net/] is an alternative to consider, I've used it in many commercial projects. Bizarrely Commons-CSV's website says that Opencsv uses a BSD license, but this isn't the case, OpenCSV uses ASL2.
[jira] [Commented] (SOLR-3439) Add content field to example schema to make SolrCell easier to use out of the box
[ https://issues.apache.org/jira/browse/SOLR-3439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269326#comment-13269326 ] Jack Krupansky commented on SOLR-3439: -- The concept of copyField is implicitly a judgment that a query of the merged fields is significantly better than the dismax query of the separate fields. But, is that really the case? And it is common to boost various document components differently, such as the title. That said, I am a little reluctant to change the overall pattern/approach simply to add one field. Maybe the pattern change should be a separate issue. Add content field to example schema to make SolrCell easier to use out of the box --- Key: SOLR-3439 URL: https://issues.apache.org/jira/browse/SOLR-3439 Project: Solr Issue Type: Improvement Components: contrib - Solr Cell (Tika extraction), Schema and Analysis Reporter: Jack Krupansky Priority: Minor Fix For: 4.0 Attachments: Lincoln-Gettysburg-Address.docx, Lincoln-Gettysburg-Address.pdf, SOLR-3439.patch Currently, SolrCell is configured to map Tika content (the main body of a document) to the text field which is the indexed-only (not stored) catch-all for default queries. That searches fine, but doesn't show the document content in the results, sometimes leading users to think that something is wrong. Sure, the user can easily add the field (and this is documented), but it would be a better user experience to have such a basic feature work right out of the box without any config editing and without the need for the user to read the fine print in the documentation. I propose that we add the content field to the example schema in the section of fields already defined to support SolrCell metadata. It would be stored and indexed. I further propose that a copyField be added for the title, description, (and maybe a couple of others) and content fields to add them to the text field for searching. 
Again, trying to improve the out of the box user experience. It also simplifies testing - less setup.
[jira] [Commented] (LUCENE-3948) Experiment with placing poms outside of src
[ https://issues.apache.org/jira/browse/LUCENE-3948?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13269330#comment-13269330 ] Chris Male commented on LUCENE-3948: +1 to committing Experiment with placing poms outside of src --- Key: LUCENE-3948 URL: https://issues.apache.org/jira/browse/LUCENE-3948 Project: Lucene - Java Issue Type: Improvement Components: general/build Reporter: Chris Male Assignee: Steven Rowe Priority: Minor Attachments: LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch, LUCENE-3948.patch Recent work in LUCENE-3944 has changed how our generated pom.xml files are handled during release preparation, placing them in build/ instead. However get-maven-poms still places the poms inside src/ so you can use them to drive a build. What I think would be ideal is if we could unify the release handling of the poms and the normal building handling, so that the poms can sit outside of src and serve both purposes. Some time ago I investigated how the ANT project handles its own Maven integration and it has its poms sitting in their own directory. They then reference the actual src locations inside the poms. This works for ANT but with a warning since some of their tests don't work due to how the Maven surefire plugin works, so they skip their tests. I have done some quick testing of my own and this process does seem to work for our poms and tests. I now want to take this to a full scale POC and see if it works fully.