Re: Pylucene release
On Nov 29, 2012, at 5:37, Shawn Grant shawn.gr...@orcatec.com wrote:

> Hi Andi, thanks for the explanation. The main problem I've come across so far is that it looks like the main branch of Lucene has a lucene41 codec in it that does not appear to be part of the 4.0 release and (I think) is causing problems creating and/or retrieving term vectors. I'm not a Lucene expert and it's been hard to diagnose. I also can't use Luke due to the codec.

PyLucene trunk is currently tracking Lucene's branch 4.x. I'd expect the lucene41 codec to be available there.

> I tried to set the default codec to lucene40 but then my index writer complained that lucene40 was only for reading.

You should ask on the lucene-user@ list. There are more people listening there who would know the details.

> I'll try to contribute to porting the unit tests to help move the release along.

Cool !

Andi..

> On 11/13/2012 02:18 PM, Andi Vajda wrote:
>> Hi Shawn,
>>
>> On Tue, 13 Nov 2012, Shawn Grant wrote:
>>> Hi Andi, I was just wondering if PyLucene is on its usual schedule to release 4-6 weeks after Lucene. I didn't see any discussion of it on the mailing list or elsewhere. I'm looking forward to 4.0!
>>
>> Normally, PyLucene is released a few days after a Lucene release, but 4.0 has seen so many API changes and removals that all tests and samples need to be ported to the new API. Last weekend, I ported a few but lots remain to be ported. If no one helps, it either means that no one cares enough or that everyone is willing to be patient :-)
>>
>> The PyLucene trunk svn repository is currently tracking the Lucene Core 4.x branch and you're welcome to use it out of svn. In the ten or so unit tests I ported so far, I didn't find any issues with PyLucene proper (or JCC). All changes were due to the tests being out of date or using deprecated APIs now removed. You might find that PyLucene out of trunk is quite usable.
If people want to help with porting PyLucene unit tests (the ones under its 'test' directory not yet ported), feel free to ask questions here. The gist of it is:
 - fix the imports (look at the first few tests, alphabetically, for examples)
 - fix the tests to pass by looking at the original Java tests for changes, as most of these tests were originally ported from Java Lucene.

Once you're familiar with the new APIs, porting the sample code in samples and in LuceneInAction should be fairly straightforward. It's just that there is a lot to port.

Andi..
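As a sketch of the mechanical part of the first step (fixing the imports), a small stdlib-only script like the following could rewrite old flat `from lucene import ...` lines into the Java-package-style imports PyLucene 4.x expects. The package mapping below is illustrative only, not a complete or verified table; each test still needs a human pass against the Java original.

```python
import re

# Hypothetical mapping from class name to its Java package in Lucene 4.x.
# This is an illustration; the real table must be built from the 4.x javadocs.
IMPORT_MAP = {
    "StandardAnalyzer": "org.apache.lucene.analysis.standard",
    "IndexSearcher": "org.apache.lucene.search",
    "Term": "org.apache.lucene.index",
}

def rewrite_imports(source):
    """Replace old 'from lucene import X, Y' lines with per-package imports."""
    out = []
    for line in source.splitlines():
        m = re.match(r"\s*from lucene import (.+)", line)
        if m:
            for name in (n.strip() for n in m.group(1).split(",")):
                pkg = IMPORT_MAP.get(name)
                if pkg:
                    out.append("from %s import %s" % (pkg, name))
                else:
                    # Leave a marker for classes not in the mapping yet.
                    out.append("# TODO: locate package for %s" % name)
        else:
            out.append(line)
    return "\n".join(out)
```

Running it over a test file leaves non-import lines untouched and flags anything it cannot map, so nothing is silently dropped.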
[jira] [Commented] (LUCENE-4345) Create a Classification module
[ https://issues.apache.org/jira/browse/LUCENE-4345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506317#comment-13506317 ]

Commit Tag Bot commented on LUCENE-4345:

[trunk commit] Uwe Schindler
http://svn.apache.org/viewvc?view=revision&revision=1415074
LUCENE-4345: Fix forbidden APIs and make the test more predictable

Create a Classification module
--
Key: LUCENE-4345
URL: https://issues.apache.org/jira/browse/LUCENE-4345
Project: Lucene - Core
Issue Type: New Feature
Reporter: Tommaso Teofili
Assignee: Tommaso Teofili
Priority: Minor
Attachments: LUCENE-4345_2.patch, LUCENE-4345.patch, SOLR-3700_2.patch, SOLR-3700.patch

Lucene/Solr can host huge sets of documents containing lots of information in fields, so that these can be used as training examples (with features) in order to very quickly create classification algorithms to use on new documents and/or to provide an additional service. So the idea is to create a contrib module (called 'classification') to host a ClassificationComponent that will use already-seen data (the indexed documents / fields) to classify new documents / text fragments. The first version will contain a (simplistic) Lucene-based Naive Bayes classifier, but more implementations should be added in the future.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
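As background on the technique (and emphatically not the module's actual code), a "simplistic" multinomial Naive Bayes classifier with add-one smoothing over indexed term counts can be sketched in plain Python:

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Toy multinomial Naive Bayes over whitespace tokens -- a sketch of the
    idea behind the proposed module, not the Lucene implementation."""

    def __init__(self):
        self.class_docs = Counter()               # training docs per class
        self.word_counts = defaultdict(Counter)   # per-class token counts
        self.vocab = set()

    def train(self, label, text):
        self.class_docs[label] += 1
        for tok in text.lower().split():
            self.word_counts[label][tok] += 1
            self.vocab.add(tok)

    def classify(self, text):
        total_docs = sum(self.class_docs.values())
        best, best_score = None, float("-inf")
        for label, ndocs in self.class_docs.items():
            score = math.log(ndocs / total_docs)          # class prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in text.lower().split():
                # Add-one (Laplace) smoothing avoids log(0) for unseen tokens.
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best, best_score = label, score
        return best
```

In the Lucene setting, the token counts would come from the index's term statistics for already-seen documents rather than from an in-memory Counter.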
[JENKINS] Lucene-Solr-trunk-Linux (32bit/jdk1.6.0_37) - Build # 2938 - Still Failing!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/2938/
Java: 32bit/jdk1.6.0_37 -server -XX:+UseSerialGC

All tests passed

Build Log:
[...truncated 13789 lines...]
-check-forbidden-test-apis:
[forbidden-apis] Reading API signatures: /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/tools/forbiddenApis/tests.txt
[forbidden-apis] Loading classes to check...
[forbidden-apis] Scanning for API signatures and dependencies...
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:65)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:70)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:71)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:71)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:71)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:73)
[forbidden-apis] Forbidden method invocation: java.util.Random#<init>()
[forbidden-apis]   in org.apache.lucene.classification.utils.DataSplitterTest (DataSplitterTest.java:111)
[forbidden-apis] Scanned 2157 (and 215 related) class file(s) for forbidden API invocations (in 2.10s), 7 error(s).

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:69: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:174: Check for forbidden API calls failed, see log.

Total time: 20 minutes 57 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 32bit/jdk1.6.0_37 -server -XX:+UseSerialGC
Email was triggered for: Failure
Sending email for trigger: Failure
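The failures above are the test constructing `new Random()` directly; Lucene's forbidden-apis check rejects that in tests because unseeded randomness makes failures unreproducible — the usual fix is to take randomness from the test framework's logged seed (e.g. LuceneTestCase's random()). The principle, as a Python sketch:

```python
import random

def make_rng(seed):
    """Always derive test randomness from a known, logged seed so a failing
    run can be replayed exactly. Using random.Random() with no seed is the
    Python equivalent of the forbidden `new Random()`."""
    return random.Random(seed)

# Same seed -> same sequence -> reproducible test data.
assert make_rng(0xDEADBEEF).random() == make_rng(0xDEADBEEF).random()
```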
[jira] [Updated] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shai Erera updated LUCENE-4575:
---
Attachment: LUCENE-4575.patch

Patch adds setCommitData to IndexWriter, increases the changeCount, and sets that commitData on segmentInfos. It also adds a test to verify the behavior.

Regarding back-compat: I prefer to nuke commit(data) and prepareCommit(data) in exchange for this API, for both trunk and 4.x. This patch does support the old commit/prepareCommit(data) API, but I think it will be simpler if we just nuke those APIs. Migrating to the new API is a no-brainer: just call setCommitData() before your commit().

I don't intend to commit it yet, depending on how we decide to handle back-compat. If we decide to keep the back-compat support, I want to move the commit(data) and prepareCommit(data) impls to their respective no-data versions, and then have those APIs deprecated and call setCommitData() followed by the respective no-data version.

Allow IndexWriter to commit, even just commitData
-
Key: LUCENE-4575
URL: https://issues.apache.org/jira/browse/LUCENE-4575
Project: Lucene - Core
Issue Type: Improvement
Components: core/index
Reporter: Shai Erera
Priority: Minor
Attachments: LUCENE-4575.patch

Spinoff from http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. In some cases, it is valuable to be able to commit changes to the index even if the changes are just commitData. Such data is sometimes used by applications to register in the index some global application information/state. The proposal is:
* Add a setCommitData() API and separate it from commit() and prepareCommit() (simplify their API)
* When that API is called, flip on the dirty/changes bit, so that this gets committed even if no other changes were made to the index.

I will work on a patch and post it.

--
This message is automatically generated by JIRA.
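The proposed semantics — setCommitData() flips the writer's dirty bit so that commit() writes a new commit point even when no documents changed — can be modeled in a few lines. This is a toy Python illustration of the intended behavior only, not Lucene's IndexWriter:

```python
class ToyIndexWriter:
    """Toy model of the LUCENE-4575 proposal: a data-only change still
    produces a commit. Names loosely mirror the patch; this is a sketch."""

    def __init__(self):
        self.commit_data = {}
        self.changed = False   # the "dirty/changes bit"
        self.commits = []      # stand-in for on-disk commit points

    def set_commit_data(self, data):
        self.commit_data = dict(data)  # defensive copy, as the patch does
        self.changed = True            # force the next commit to happen

    def commit(self):
        if not self.changed:
            return False               # no changes at all: nothing written
        self.commits.append(dict(self.commit_data))
        self.changed = False
        return True
```

The defensive copy in set_commit_data mirrors the question raised later in the thread about copying the commitData map: the writer must not be affected if the caller mutates the map after the call.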
[jira] [Commented] (LUCENE-3668) offsets issues with multiword synonyms
[ https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506378#comment-13506378 ]

Okke Klein commented on LUCENE-3668:

Doesn't work for me either in Solr 4. Can we revisit this issue?

offsets issues with multiword synonyms
--
Key: LUCENE-3668
URL: https://issues.apache.org/jira/browse/LUCENE-3668
Project: Lucene - Core
Issue Type: Bug
Components: modules/analysis
Reporter: Robert Muir
Assignee: Michael McCandless
Fix For: 3.6, 4.0-ALPHA
Attachments: LUCENE-3668.patch, LUCENE-3668_test.patch

As reported on the list, there are some strange offsets with FSTSynonyms in the case of multiword synonyms. As a workaround it was suggested to use the older synonym impl, but it has bugs too (just in a different way).
[jira] [Commented] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506392#comment-13506392 ]

Lukas Graf commented on SOLR-139:

Ok, I finally figured it out by diffing every single difference from my test case to the stock Solr 4.0 example using _git bisect_. The culprit was a missing *<updateLog/>* directive in _solrconfig.xml_. As soon as I configured a transaction log, atomic updates worked as expected. I added a note about this at http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22field.22 .

Support updateable/modifiable documents
--
Key: SOLR-139
URL: https://issues.apache.org/jira/browse/SOLR-139
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Ryan McKinley
Fix For: 4.0
Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch

It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way Lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. For background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
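For reference, a Solr 4 atomic update is expressed with modifier maps on each field ("set" to replace, "add" to append, "inc" to increment a numeric field), and requires the <updateLog/> transaction log mentioned in the comment above. This sketch only builds the JSON payload; the field names and document id are illustrative:

```python
import json

# One document in a Solr 4 atomic-update request body: only the listed
# fields change, the rest of the stored document is preserved.
doc = {
    "id": "book-1",                                  # which doc to update
    "title": {"set": "Lucene in Action, 2nd ed."},   # replace field value
    "views": {"inc": 1},                             # increment a number
}
payload = json.dumps([doc])  # /update takes a JSON list of documents
```

The payload would be POSTed to the collection's /update handler with Content-Type application/json; without the transaction log configured, the modifiers are treated as literal field values instead.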
[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3484 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3484/

All tests passed

Build Log:
[...truncated 20084 lines...]
-documentation-lint:
[echo] checking for broken html...
[jtidy] Checking for broken html (such as invalid tags)...
[delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build/jtidy_tmp
[echo] Checking for broken links...
[exec] Crawl/parse...
[exec] Verify...
[echo] Checking for missing docs...
[exec] build/docs/classification/overview-summary.html
[exec]   missing: org.apache.lucene.classification.utils
[exec] build/docs/classification/org/apache/lucene/classification/utils/package-summary.html
[exec]   no package description (missing package.html in src?)
[exec] Missing javadocs were found!

BUILD FAILED
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:62: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:245: The following error occurred while executing this line:
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1944: exec returned: 1

Total time: 24 minutes 0 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Email was triggered for: Failure
Sending email for trigger: Failure
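The lint failure above is a missing package description: javadoc takes it from a package.html next to the package's sources (or from a package-info.java). A minimal sketch of such a file — the path and wording here are assumptions for illustration, not the actual fix that was committed:

```html
<!-- hypothetical: lucene/classification/src/java/org/apache/lucene/classification/utils/package.html -->
<html>
<body>
Utilities for the classification module, e.g. splitting an index into
training, test and cross-validation sets.
</body>
</html>
```

The first sentence of the body becomes the package summary shown in the javadoc overview, which is exactly what the documentation-lint check found missing.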
[JENKINS] Lucene-Solr-trunk-Linux (64bit/jdk1.7.0_09) - Build # 2942 - Failure!
Build: http://jenkins.sd-datasolutions.de/job/Lucene-Solr-trunk-Linux/2942/
Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC

All tests passed

Build Log:
[...truncated 20067 lines...]
-documentation-lint:
[echo] checking for broken html...
[jtidy] Checking for broken html (such as invalid tags)...
[delete] Deleting directory /mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build/jtidy_tmp
[echo] Checking for broken links...
[exec] Crawl/parse...
[exec] Verify...
[echo] Checking for missing docs...
[exec] build/docs/classification/overview-summary.html
[exec]   missing: org.apache.lucene.classification.utils
[exec] build/docs/classification/org/apache/lucene/classification/utils/package-summary.html
[exec]   no package description (missing package.html in src?)
[exec] Missing javadocs were found!

BUILD FAILED
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/build.xml:62: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/build.xml:245: The following error occurred while executing this line:
/mnt/ssd/jenkins/workspace/Lucene-Solr-trunk-Linux/lucene/common-build.xml:1944: exec returned: 1

Total time: 20 minutes 14 seconds
Build step 'Invoke Ant' marked build as failure
Archiving artifacts
Recording test results
Description set: Java: 64bit/jdk1.7.0_09 -XX:+UseG1GC
Email was triggered for: Failure
Sending email for trigger: Failure
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506432#comment-13506432 ]

Michael McCandless commented on LUCENE-4575:

+1 to do a hard break; this is expert.
[jira] [Comment Edited] (LUCENE-3668) offsets issues with multiword synonyms
[ https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506378#comment-13506378 ]

Okke Klein edited comment on LUCENE-3668 at 11/29/12 12:36 PM:

Doesn't work for me either in Solr 4. Can we revisit this issue? Perhaps this http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ can give some insight/help?

was (Author: okkeklein):
Doesn't work for me either in Solr 4. Can we revisit this issue?
[jira] [Updated] (SOLR-4120) Collection API: Support for specifying a list of solrs to spread a new collection across
[ https://issues.apache.org/jira/browse/SOLR-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Per Steffensen updated SOLR-4120:
-
Attachment: SOLR-4120.patch

h4. SOLR-4120.patch

h5. Where does it fit
* It fits on top of revision 1412602 of branch lucene_solr_4_0, where the patch for SOLR-4114 has already been applied. The following should work if you have a checkout of revision 1412602 of branch lucene_solr_4_0:
** cd checkout-folder
** patch -s -p0 < SOLR-4114.patch
** patch --ignore-whitespace -p0 < SOLR-4120.patch
You need the --ignore-whitespace - at least with my version of patch on Snow Leopard. Probably because I do not have the correct Solr code-style installed in my Eclipse. Hmmm, I probably should do that.

h5. Content of the patch
The patch modifies the create operation of the Solr Collection API so that it allows providing a list of Solr nodes that the shards for the new collection should be spread across:
* Param key: createNodeSet
* Param value: comma-separated list of node names (equal to the node names received from ClusterState.getLiveNodes())
* Param is not mandatory. If not provided, the created collection will still have its shards spread across all live nodes.

h5. Testing
BasicDistributedZkTest.testCollectionAPI has been modified to also test this feature.

Collection API: Support for specifying a list of solrs to spread a new collection across
-
Key: SOLR-4120
URL: https://issues.apache.org/jira/browse/SOLR-4120
Project: Solr
Issue Type: New Feature
Components: multicore, SolrCloud
Affects Versions: 4.0
Reporter: Per Steffensen
Assignee: Per Steffensen
Priority: Minor
Labels: collection-api, multicore, shard, shard-allocation
Attachments: SOLR-4120.patch

When creating a new collection through the Collection API, the Overseer (handling the creation) will spread shards for this new collection across all live nodes. Sometimes you don't want a collection spread across all available nodes. Allow the create operation of the Collection API to take a createNodeSet parameter containing a list of Solr nodes to spread the new shards across. If not provided, it will just spread across all available nodes (default). For an example of a concrete use case see: https://issues.apache.org/jira/browse/SOLR-4114?focusedCommentId=13505506&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13505506
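As a sketch, the new parameter would be passed like any other Collection API parameter on the CREATE action. The host, collection name and node names below are made up for illustration; this only constructs the request URL:

```python
from urllib.parse import urlencode

# Hypothetical CREATE call restricting the new collection to two nodes.
# Node names follow the live-nodes naming (host:port_context) convention.
params = {
    "action": "CREATE",
    "name": "mycollection",
    "numShards": 2,
    "createNodeSet": "host1:8983_solr,host2:8983_solr",
}
url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
```

Omitting createNodeSet from the params dict would give the default behavior: shards spread across all live nodes.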
[jira] [Comment Edited] (SOLR-4120) Collection API: Support for specifying a list of solrs to spread a new collection across
[ https://issues.apache.org/jira/browse/SOLR-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506444#comment-13506444 ]

Per Steffensen edited comment on SOLR-4120 at 11/29/12 12:56 PM:
-

h4. SOLR-4120.patch

h5. Where does it fit
* It fits on top of revision 1412602 of branch lucene_solr_4_0, where the patch for SOLR-4114 has already been applied. The following should work if you have a checkout of revision 1412602 of branch lucene_solr_4_0:
** cd checkout-folder
** patch -s -p0 < SOLR-4114.patch
** patch --ignore-whitespace -p0 < SOLR-4120.patch
You need the --ignore-whitespace - at least with my version of patch on Snow Leopard. Probably because I do not have the correct Solr code-style installed in my Eclipse. Hmmm, I probably should do that.

h5. Content of the patch
The patch modifies the create operation of the Solr Collection API so that it allows providing a list of Solr nodes that the shards for the new collection should be spread across:
* Param key: createNodeSet (OverseerCollectionProcessor.CREATE_NODE_SET)
* Param value: comma-separated list of node names (equal to the node names received from ClusterState.getLiveNodes())
* Param is not mandatory. If not provided, the created collection will still have its shards spread across all live nodes.

h5. Testing
BasicDistributedZkTest.testCollectionAPI has been modified to also test this feature.
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506468#comment-13506468 ]

Shai Erera commented on LUCENE-4575:

Thanks. I forgot to mention two things about the changes in the patch which I wasn't sure about:
# I currently copy the commitData map on setCommitData. It seems safe to do, and I don't think commitData maps are huge. Any objections?
# I pass the copied map directly to segmentInfos, rather than saving it in a member in IW. Do you see any issues with that? (I'm thinking about rollback, even though we have another copy of the segmentInfos for rollback purposes ...)
[jira] [Updated] (SOLR-4124) You should be able to set the update log directory with the CoreAdmin API the same way as the data directory.
[ https://issues.apache.org/jira/browse/SOLR-4124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-4124:
--
Attachment: SOLR-4124.patch

First cut at a patch.

You should be able to set the update log directory with the CoreAdmin API the same way as the data directory.
-
Key: SOLR-4124
URL: https://issues.apache.org/jira/browse/SOLR-4124
Project: Solr
Issue Type: Improvement
Affects Versions: 4.0
Reporter: Mark Miller
Assignee: Mark Miller
Priority: Minor
Fix For: 4.1, 5.0
Attachments: SOLR-4124.patch
[jira] [Commented] (SOLR-1604) Wildcards, ORs etc inside Phrase Queries
[ https://issues.apache.org/jira/browse/SOLR-1604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506475#comment-13506475 ]

Roman Slavik commented on SOLR-1604:

Hi, I downloaded the latest version of ComplexPhrase (24/Oct/12 02:30) but have a problem with the junit test. Here is the error log:

{noformat}
test(org.apache.solr.search.ComplexPhraseQParserPluginTest) Time elapsed: 0.191 sec ERROR!
java.lang.RuntimeException: Exception during query
	at __randomizedtesting.SeedInfo.seed([4BF35CC9C13F3B15:C3A763136FC356ED]:0)
	at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:283)
	at org.apache.solr.search.ComplexPhraseQParserPluginTest.test(ComplexPhraseQParserPluginTest.java:158)
	// nothing interesting here
Caused by: java.lang.IllegalArgumentException: Unknown query type org.apache.lucene.search.ConstantScoreQuery found in phrase query string jo* [sma TO smz]
	at org.apache.lucene.queryparser.classic.ComplexPhraseQueryParser$ComplexPhraseQuery.rewrite(ComplexPhraseQueryParser.java:297)
	at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:599)
	at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:646)
	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280)
	at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1385)
	at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1260)
	at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:390)
	at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:411)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:206)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1699)
	at org.apache.solr.util.TestHarness.query(TestHarness.java:364)
	at org.apache.solr.util.TestHarness.query(TestHarness.java:346)
	at org.apache.solr.util.AbstractSolrTestCase.assertQ(AbstractSolrTestCase.java:273)
	... 41 more
{noformat}

Is it an error on my side (I didn't change anything)? Or some crucial error?

Wildcards, ORs etc inside Phrase Queries
-
Key: SOLR-1604
URL: https://issues.apache.org/jira/browse/SOLR-1604
Project: Solr
Issue Type: Improvement
Components: query parsers, search
Affects Versions: 1.4
Reporter: Ahmet Arslan
Priority: Minor
Attachments: ASF.LICENSE.NOT.GRANTED--ComplexPhrase.zip, ComplexPhraseQueryParser.java, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, ComplexPhrase.zip, SOLR-1604-alternative.patch, SOLR-1604.patch, SOLR-1604.patch

Solr Plugin for ComplexPhraseQueryParser (LUCENE-1486) which supports wildcards, ORs, ranges, fuzzies inside phrase queries.
[JENKINS] Lucene-Solr-Tests-trunk-java7 - Build # 3485 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-Tests-trunk-java7/3485/ All tests passed Build Log: [...truncated 20087 lines...] -documentation-lint: [echo] checking for broken html... [jtidy] Checking for broken html (such as invalid tags)... [delete] Deleting directory /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build/jtidy_tmp [echo] Checking for broken links... [exec] [exec] Crawl/parse... [exec] [exec] Verify... [echo] Checking for missing docs... [exec] [exec] build/docs/classification/org/apache/lucene/classification/utils/DatasetSplitter.html [exec] missing Constructors: DatasetSplitter(double, double) [exec] [exec] Missing javadocs were found! BUILD FAILED /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/build.xml:62: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/build.xml:259: The following error occurred while executing this line: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-Tests-trunk-java7/lucene/common-build.xml:1944: exec returned: 1 Total time: 24 minutes 47 seconds Build step 'Invoke Ant' marked build as failure Archiving artifacts Recording test results Email was triggered for: Failure Sending email for trigger: Failure - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Active 4.x branches?
On Thu, Nov 29, 2012 at 1:24 AM, David Smiley (@MITRE.org) dsmi...@mitre.org wrote: Maybe we should have a roster somewhere of parts of the codebase that have an owner. Taking ownership is a mindset, and is very different from any kind of recognized ownership. We shouldn't tag areas as owned by someone, as that could discourage others from getting involved in that area. It might also encourage deference to the owner, which would also be a bad thing. We sometimes naturally defer to someone with more experience in an area than we have, but it should continue to be on an informal, case-by-case basis. It could be useful to people not in the know on who to contact The right contact point is this mailing list. There's already way too much off-list (and off-IRC-channel) collaboration that goes on, IMO. -Yonik http://lucidworks.com
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506492#comment-13506492 ] Yonik Seeley commented on LUCENE-4575: -- bq. I currently copy the commitData map on setCommitData. It seems safe to do it, and I don't think commitData are huge. Any objections? Do any users care about order (i.e. they pass in a LinkedHashMap)? It would be trivial to preserve *if* it added value for some. Allow IndexWriter to commit, even just commitData - Key: LUCENE-4575 URL: https://issues.apache.org/jira/browse/LUCENE-4575 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Priority: Minor Attachments: LUCENE-4575.patch Spinoff from http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. In some cases, it is valuable to be able to commit changes to the index, even if the changes are just commitData. Such data is sometimes used by applications to register in the index some global application information/state. The proposal is: * Add a setCommitData() API and separate it from commit() and prepareCommit() (simplifying their API) * When that API is called, flip on the dirty/changes bit, so that the data gets committed even if no other changes were made to the index. I will work on a patch and post it.
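The proposal's mechanics - setCommitData() making a defensive copy and flipping a dirty bit so that commit() writes a new commit point even with no document changes - can be sketched with a toy model. This is illustrative Java only, not Lucene code or the actual patch; ToyWriter and its fields are invented for the sketch:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the proposed IndexWriter behavior (hypothetical, not Lucene code).
class ToyWriter {
    private Map<String, String> commitData = new HashMap<>();
    private boolean changed = false; // the "dirty/changes bit" from the proposal
    private int commitGen = 0;       // stand-in for the commit point generation

    void setCommitData(Map<String, String> data) {
        this.commitData = new HashMap<>(data); // defensive copy, as discussed
        this.changed = true;                   // force the next commit to go through
    }

    int commit() {
        if (changed) {         // a commit happens only when something changed...
            commitGen++;       // ...even if that something is just commitData
            changed = false;
        }
        return commitGen;
    }
}

public class Demo {
    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        System.out.println(w.commit()); // no changes yet: no new commit point
        w.setCommitData(Map.of("app.state", "ready"));
        System.out.println(w.commit()); // commitData alone triggered a commit
    }
}
```

The point of the model is the flag, not the map: without the dirty bit, a commit with only commitData changes would be silently skipped as a no-op.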
[jira] [Commented] (SOLR-4050) Solr example fails to start in nightly-smoke
[ https://issues.apache.org/jira/browse/SOLR-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506495#comment-13506495 ] Tomás Fernández Löbbe commented on SOLR-4050: - I'm having this exact issue after upgrading (trunk). Is there something I should clean/rebuild/delete in order to get this to work? Solr example fails to start in nightly-smoke Key: SOLR-4050 URL: https://issues.apache.org/jira/browse/SOLR-4050 Project: Solr Issue Type: Bug Reporter: Michael McCandless Priority: Blocker The nightly smoke job is stalled (I'll go kill it shortly): https://builds.apache.org/job/Lucene-Solr-SmokeRelease-4.x/22/console It's stalled when trying to run the Solr example ... the server produced this output: {noformat} java.lang.ClassNotFoundException: org.eclipse.jetty.xml.XmlConfiguration at java.net.URLClassLoader$1.run(URLClassLoader.java:217) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:205) at java.lang.ClassLoader.loadClass(ClassLoader.java:321) at java.lang.ClassLoader.loadClass(ClassLoader.java:266) at org.eclipse.jetty.start.Main.invokeMain(Main.java:424) at org.eclipse.jetty.start.Main.start(Main.java:602) at org.eclipse.jetty.start.Main.main(Main.java:82) ClassNotFound: org.eclipse.jetty.xml.XmlConfiguration Usage: java -jar start.jar [options] [properties] [configs] java -jar start.jar --help # for more information {noformat} Seems likely the Jetty upgrade somehow caused this... Separately, I committed a fix to the smoke tester so that it quickly fails if the Solr example fails to start ...
[jira] [Commented] (SOLR-4050) Solr example fails to start in nightly-smoke
[ https://issues.apache.org/jira/browse/SOLR-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506504#comment-13506504 ] Yonik Seeley commented on SOLR-4050: Tomas: try removing start.jar and let ivy re-get it.
Re: Active 4.x branches?
Those are good points Yonik. I guess I don't know what to think anymore.
[jira] [Created] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.
Mark Miller created SOLR-4125: - Summary: There are a few small changes in 5x that should be in 4x but are not. Key: SOLR-4125 URL: https://issues.apache.org/jira/browse/SOLR-4125 Project: Solr Issue Type: Bug Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.1, 5.0 Someone pinged me today about a very small part of a fix that is in 5x but not in 4x. I've done a bit of comparing and found a couple of such things. I'll merge them back.
[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.
[ https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506509#comment-13506509 ] Uwe Schindler commented on SOLR-4125: - What is this issue about...?
[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.
[ https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506510#comment-13506510 ] Commit Tag Bot commented on SOLR-4125: -- [branch_4x commit] Mark Robert Miller http://svn.apache.org/viewvc?view=revision&revision=1415191 SOLR-4125: There are a few small changes in 5x that should be in 4x but are not.
Re: Active 4.x branches?
Hey, this is open source, which means that everything is fair game for everybody. Anybody (even a non-committer) can be an owner simply by being active in the conversations in any area. So who the owner of any area is will vary over the course of a year. Sometimes people take breaks (or have real work assignments), so their ownership may fade down and later fade up again. Although the email list is one of the primary mediums for conversations, Jiras and comments in the Jiras, as well as svn commit history, will make it clear to any newcomer who the owners (and that should/must be plural!) or most-interested parties in a particular area are. If at any point it looks like there is a single owner in an area, that is a sign of potential trouble. Keep the bus factor in mind. -- Jack Krupansky -Original Message- From: David Smiley (@MITRE.org) Sent: Thursday, November 29, 2012 9:48 AM To: dev@lucene.apache.org Subject: Re: Active 4.x branches? Those are good points Yonik. I guess I don't know what to think anymore.
[jira] [Commented] (SOLR-4050) Solr example fails to start in nightly-smoke
[ https://issues.apache.org/jira/browse/SOLR-4050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506511#comment-13506511 ] Tomás Fernández Löbbe commented on SOLR-4050: - That did the trick, thanks.
[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.
[ https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506512#comment-13506512 ] Uwe Schindler commented on SOLR-4125: - The commit explained - sorry for the noise :-)
[jira] [Commented] (SOLR-4125) There are a few small changes in 5x that should be in 4x but are not.
[ https://issues.apache.org/jira/browse/SOLR-4125?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506514#comment-13506514 ] Mark Miller commented on SOLR-4125: --- Yeah, just a sync-up issue - some small part of a fix missed being merged back in some of my work - now I'm on the hunt for anything else I may have missed!
[jira] [Commented] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506520#comment-13506520 ] Jack Krupansky commented on SOLR-139: - Oh, yeah, that. I actually was going to mention it, but I wanted to focus on running with the stock Solr example first. Actually, we need to look a little closer as to why/whether the updateLog directive is really always needed for partial document update. That should probably be a separate Jira issue. Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Fix For: 4.0 Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. 
While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. For background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
RE: Active 4.x branches?
Whenever I want to know who owns a piece of code, I just look at the svn history to see who has been modifying it. James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506526#comment-13506526 ] Shai Erera commented on LUCENE-4575: We use commitData extensively but we don't care about the order. We store key/value pairs. I don't think though that it's trivial to support. Currently the user can pass any Map, but IndexReader returns in practice a HashMap (DataInput.readStringStringMap initializes a HashMap). Therefore, if we want to preserve the type of the Map, we'd need to change the DataInput/Output code. I'm not sure it's worth the hassle, but let's discuss that anyway on a separate issue? It's not really related to how the map is set.
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506528#comment-13506528 ] Uwe Schindler commented on LUCENE-4575: --- The API returns Map<String,String>, so we make no guarantees about order.
AW: AW: Pylucene release
Hi Andi, thanks for your instructions - I have meanwhile managed to install pylucene (4.0) from trunk and started working on test_FuzzyQuery. I will send you a patch once I've managed to update a few tests. Just wanted to let you know about the (slow) progress - sorry for the late reply! regards, Thomas -Original Message- From: Andi Vajda [mailto:va...@apache.org] Sent: Wednesday, November 14, 2012 18:36 To: pylucene-...@lucene.apache.org Subject: Re: AW: Pylucene release Hi Thomas, On Wed, 14 Nov 2012, Thomas Koch wrote: I still wanted to check the API changes related to 4.0 and could then help with porting the example code (and/or unit tests). I hope there are more people interested in helping to port PyLucene (or at least the 'related' Python code) to the Lucene 4.0 level... How can we best proceed? 1. Pick a test that fails (for example: python test/test_FuzzyQuery.py) 2. Announce on the list that you're working on it (so that only you work on it) 3. Fix it 4. Send in a patch I assume you checked in the code that's adapted already to SVN. Yes, all current code is checked in, including fixed or broken tests. Is there a list of code that needs to be ported (and can be used to distribute tasks)? Currently, all tests in test up to test_FilteredQuery.py (alphabetically) pass. The test_ICU* ones also pass. You should use these as examples on how to fix failing ones. As said, I don't have an idea of the API changes yet, so it's hard to estimate the time needed to get used to 4.0 No time estimate is expected from you. It's best to proceed by example. Look at the tests that pass already (and thus that have been fixed) as examples.
The steps to fix a failing test are as follows: - fix import statements first (they're all changed since PyLucene 4.0 no longer uses a flat namespace but strictly follows the original Java package structure now), for example: from lucene import Document becomes from org.apache.lucene.document import Document If you don't know where a class is (and the Lucene tree is deeply nested), find lucene src -name ClassName.java will usually give you an idea of the package structure to import - when it makes sense (most of the time), use PyLuceneTestCase as the parent test class. This will help with the complexities/boilerplate in creating a test IndexWriter/Reader/Searcher using a RAMDirectory - if the test still fails, look at the original Java test code for possible changes in the API or in the expected behaviour that occurred since the first port. The original Java test file is usually named TestName.java when the Python test is named test_Name.py Andi.. (and fix the code), but as you did that already maybe you can share your experience with us. As with any new major release (e.g. Python 3.x) I guess many of us are afraid to move forward to the new release and change our code base, but certainly that's just a matter of time ... Cheers, Thomas -Original Message- From: Andi Vajda [mailto:va...@apache.org] Sent: Tuesday, November 13, 2012 23:18 To: Shawn Grant Cc: pylucene-...@lucene.apache.org Subject: Re: Pylucene release Hi Shawn, On Tue, 13 Nov 2012, Shawn Grant wrote: Hi Andi, I was just wondering if Pylucene is on its usual schedule to release 4-6 weeks after Lucene. I didn't see any discussion of it on the mailing list or elsewhere. I'm looking forward to 4.0! Normally, PyLucene is released a few days after a Lucene release, but 4.0 has seen so many API changes and removals that all tests and samples need to be ported to the new API. Last week-end, I ported a few but lots remain to be.
If no one helps, it either means that no one cares enough or that everyone is willing to be patient :-) The PyLucene trunk svn repository is currently tracking the Lucene Core 4.x branch and you're welcome to use it out of svn. In the ten or so unit tests I ported so far, I didn't find any issues with PyLucene proper (or JCC). All changes were due to the tests being out of date or using deprecated APIs now removed. You might find that PyLucene out-of-trunk is quite usable. If people want to help with porting PyLucene unit tests, the ones under its 'test' directory not yet ported, feel free to ask questions here. The gist of it is: - fix the imports (look at the first few tests for example, alphabetically) - fix the tests to pass by looking at the original Java tests for changes, as most of these tests were originally ported from Java Lucene. Once you're familiar with the new APIs, porting the sample code in samples and in LuceneInAction should be fairly straightforward. It's just that there is a lot to port. Andi..
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506530#comment-13506530 ] Yonik Seeley commented on LUCENE-4575: -- bq. I don't think though that it's trivial to support. Currently the user can pass any Map, but IndexReader returns in practice a HashMap (DataInput.readStringStringMap initializes a HashMap). If a user cared about order, then they would pass a LinkedHashMap. Then the only thing that would need to change is DataInput.readStringStringMap: s/HashMap/LinkedHashMap. bq. it's not really related to how the map is set. It is... if you make a copy of the map and we want to preserve order, it's new LinkedHashMap instead of HashMap. It's a minor enough point that I don't think it deserves its own issue. I don't personally care about preserving order - but I did think it was worth at least bringing up. Allow IndexWriter to commit, even just commitData - Key: LUCENE-4575 URL: https://issues.apache.org/jira/browse/LUCENE-4575 Project: Lucene - Core Issue Type: Improvement Components: core/index Reporter: Shai Erera Priority: Minor Attachments: LUCENE-4575.patch Spinoff from here http://lucene.472066.n3.nabble.com/commit-with-only-commitData-td4022155.html. In some cases, it is valuable to be able to commit changes to the index, even if the changes are just commitData. Such data is sometimes used by applications to register in the index some global application information/state. The proposal is: * Add a setCommitData() API and separate it from commit() and prepareCommit() (simplify their API) * When that API is called, flip on the dirty/changes bit, so that this gets committed even if no other changes were made to the index. I will work on a patch and post it. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506533#comment-13506533 ] Mark Miller commented on SOLR-139: -- bq. we need to look a little closer as to why/whether the updateLog directive is really always needed for partial document update. I believe yonik chose to implement it by using updateLog features. Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Fix For: 4.0 Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. 
for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
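The per-field update style discussed in this issue is what eventually shipped as atomic updates in Solr 4.0: the client sends a modifier object per field instead of a whole document. A minimal sketch of that payload shape (the document id and field names here are made up; "set" and "inc" are the documented Solr 4.0 modifiers):

```python
import json

# Sketch of a Solr 4.0 atomic-update payload. "set" replaces a field's
# value, "inc" adds to a numeric field. Only stored fields survive such
# an update, and the updateLog must be enabled in solrconfig.xml.
doc = {
    "id": "doc-1",                       # hypothetical unique key
    "title": {"set": "Updated title"},   # replace this field only
    "popularity": {"inc": 1},            # increment an existing number
}
payload = json.dumps([doc])              # the JSON update handler takes a list of docs
print(payload)
```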
[jira] [Commented] (LUCENE-4575) Allow IndexWriter to commit, even just commitData
[ https://issues.apache.org/jira/browse/LUCENE-4575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506540#comment-13506540 ] Shai Erera commented on LUCENE-4575: bq. Then the only thing that would need to change is DataInput.readStringStringMap: s/HashMap/LinkedHashMap. So you propose that the code always initialize a LinkedHashMap in DataInput, preserving order whether required or not? Yes, I guess that we can do that. But I wonder if we should? We didn't so far, and nobody complained. And since it's an internal change, we can always make that change in the future if somebody asks? Allow IndexWriter to commit, even just commitData - Key: LUCENE-4575 URL: https://issues.apache.org/jira/browse/LUCENE-4575
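The HashMap-vs-LinkedHashMap question above is purely about iteration order; the keys and values are identical either way. The same distinction can be sketched in Python with an explicitly insertion-ordered map (this is an illustration of the concept, not Lucene code):

```python
from collections import OrderedDict

# Commit user-data written by an application in a meaningful order.
pairs = [("seq", "42"), ("ts", "2012-11-29"), ("src", "app")]

# An insertion-ordered map (the LinkedHashMap analogue) hands the
# entries back in exactly the order they were put in.
user_data = OrderedDict(pairs)
assert list(user_data.items()) == pairs

# A hash-ordered map guarantees the same contents, just not the order --
# which is why the change is invisible to anyone who never iterates.
assert dict(pairs) == dict(user_data)
```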
[jira] [Commented] (LUCENE-3668) offsets issues with multiword synonyms
[ https://issues.apache.org/jira/browse/LUCENE-3668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506554#comment-13506554 ] Robert Muir commented on LUCENE-3668: - That writeup is a little off. {quote} Finally, and most seriously, the SynonymFilterFactory will simply not match multi-word synonyms in user queries if you do any kind of tokenization. This is because the tokenizer breaks up the input before the SynonymFilterFactory can transform it. {quote} That's not correct. The bug is in QueryParser: LUCENE-2605. offsets issues with multiword synonyms -- Key: LUCENE-3668 URL: https://issues.apache.org/jira/browse/LUCENE-3668 Project: Lucene - Core Issue Type: Bug Components: modules/analysis Reporter: Robert Muir Assignee: Michael McCandless Fix For: 3.6, 4.0-ALPHA Attachments: LUCENE-3668.patch, LUCENE-3668_test.patch As reported on the list, there are some strange offsets with FST synonyms in the case of multiword synonyms. As a workaround it was suggested to use the older synonym impl, but it has bugs too (just in a different way).
[jira] [Comment Edited] (SOLR-4120) Collection API: Support for specifying a list of solrs to spread a new collection across
[ https://issues.apache.org/jira/browse/SOLR-4120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506444#comment-13506444 ] Per Steffensen edited comment on SOLR-4120 at 11/29/12 4:16 PM: h4. SOLR-4120.patch h5. Where does it fit * It fits on top of revision 1412602 of branch lucene_solr_4_0, where the patch for SOLR-4114 has already been applied. The following should work if you have a checkout of revision 1412602 of branch lucene_solr_4_0: ** cd checkout-folder ** patch -s -p0 < SOLR-4114.patch ** patch --ignore-whitespace -p0 < SOLR-4120.patch You need the --ignore-whitespace, at least with my version of patch on Snow Leopard, probably because I do not have the correct Solr code-style installed in my Eclipse. Hmmm, probably should do that. h5. Content of the patch The patch modifies the create operation of the Solr Collection API, so that it allows providing a list of Solrs that the shards for the new collection should be spread across: * Param key: createNodeSet (OverseerCollectionProcessor.CREATE_NODE_SET) * Param value: comma-separated list of node-names (equal to the node-names received from ClusterState.getLiveNodes()) * The param is not mandatory. If not provided, the created collection will still have its shards spread across all live nodes. h5. Testing BasicDistributedZkTest.testCollectionAPI has been modified to also test this feature. Collection API: Support for specifying a list of solrs to spread a new collection across Key: SOLR-4120 URL: https://issues.apache.org/jira/browse/SOLR-4120 Project: Solr Issue Type: New Feature Components: multicore, SolrCloud Affects Versions: 4.0 Reporter: Per Steffensen Assignee: Per Steffensen Priority: Minor Labels: collection-api, multicore, shard, shard-allocation Attachments: SOLR-4120.patch When creating a new collection through the Collection API, the Overseer (handling the creation) will spread shards for this new collection across all live nodes. Sometimes you don't want a collection spread across all available nodes. Allow the create operation of the Collection API to take a createNodeSet parameter containing a list of Solrs to spread the new shards across. If not provided it will just spread across all available nodes (default). For an example of a concrete case of usage see: https://issues.apache.org/jira/browse/SOLR-4114?focusedCommentId=13505506&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13505506
[jira] [Created] (SOLR-4126) Partial Update retrieve int/float value error
nihed mbarek created SOLR-4126: -- Summary: Partial Update retrieve int/float value error Key: SOLR-4126 URL: https://issues.apache.org/jira/browse/SOLR-4126 Project: Solr Issue Type: Bug Affects Versions: 4.0 Environment: Solr 4.0 Reporter: nihed mbarek Dear, I have a document that I update using the recommendation of this link http://solr.pl/en/2012/07/09/solr-4-0-partial-documents-update/ as XML/JSON. The result is OK: <int name="a">109</int> <float name="b">4.368</float> <int name="c">5318311</int> but in my request handler: final Document doc = req.getSearcher().doc(x); final List<IndexableField> fields = doc.getFields(); for (IndexableField indexableField : fields) { System.out.println(indexableField.name() + " " + indexableField.stringValue()); } the result is totally out of range: a m b Àࢼڨ c Ԓڧ This kind of result is only visible for fields with a type different than string.
[jira] [Assigned] (LUCENE-4566) SearcherManager.afterRefresh() issues
[ https://issues.apache.org/jira/browse/LUCENE-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-4566: -- Assignee: Michael McCandless SearcherManager.afterRefresh() issues - Key: LUCENE-4566 URL: https://issues.apache.org/jira/browse/LUCENE-4566 Project: Lucene - Core Issue Type: Bug Reporter: selckin Assignee: Michael McCandless Priority: Minor Attachments: LUCENE-4566-double-listeners.patch, LUCENE-4566.patch, LUCENE-4566.patch 1) ReferenceManager.doMaybeRefresh seems to call afterRefresh even if it didn't refresh/swap (when newReference == null). 2) It would be nice if users were allowed to override SearcherManager.afterRefresh() to get notified when a new searcher is in action. But SearcherManager and ReaderManager are final, while NRTManager is not. The only way to currently hook into when a new searcher is created is using the factory, but if you wish to do some async task then, there are no guarantees that acquire() will return the new searcher, so you have to pass it around and incRef manually. Whereas if allowed to hook into afterRefresh, you can just rely on acquire() and the existing infra you have around it to give you the latest one.
[jira] [Updated] (LUCENE-4566) SearcherManager.afterRefresh() issues
[ https://issues.apache.org/jira/browse/LUCENE-4566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4566: --- Attachment: LUCENE-4566.patch Patch, removing the closed listener (I think we don't need it?) ... I think it's ready. SearcherManager.afterRefresh() issues - Key: LUCENE-4566 URL: https://issues.apache.org/jira/browse/LUCENE-4566
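Issue 1 of this report (afterRefresh firing even when nothing was swapped) comes down to guarding the callback behind the swap. A toy sketch of that control flow (this is not the Lucene ReferenceManager API, just an illustration of the fix; all names are made up):

```python
class ToyReferenceManager:
    """Toy analogue of ReferenceManager.doMaybeRefresh: the afterRefresh
    hook should fire only when a new reference was actually swapped in."""

    def __init__(self, current):
        self.current = current
        self.refresh_count = 0

    def refresh_if_needed(self, new_reference):
        # Returning None means "nothing changed since last refresh".
        return new_reference

    def maybe_refresh(self, new_reference=None):
        swapped = self.refresh_if_needed(new_reference)
        if swapped is not None:      # the fix: no swap, no callback
            self.current = swapped
            self.after_refresh()

    def after_refresh(self):
        self.refresh_count += 1


mgr = ToyReferenceManager("searcher-1")
mgr.maybe_refresh("searcher-2")   # real swap -> hook fires once
mgr.maybe_refresh(None)           # nothing new -> hook must not fire
print(mgr.refresh_count)          # -> 1
```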
[jira] [Updated] (LUCENE-4286) Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams
[ https://issues.apache.org/jira/browse/LUCENE-4286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tom Burton-West updated LUCENE-4286: Attachment: LUCENE-4286.patch_3.x We are still using Solr 3.6 in production, so I backported the patch to Lucene/Solr 3.6. Attached as LUCENE-4286.patch_3.x. Add flag to CJKBigramFilter to allow indexing unigrams as well as bigrams - Key: LUCENE-4286 URL: https://issues.apache.org/jira/browse/LUCENE-4286 Project: Lucene - Core Issue Type: Improvement Affects Versions: 4.0-ALPHA, 3.6.1 Reporter: Tom Burton-West Priority: Minor Fix For: 4.0-BETA, 5.0 Attachments: LUCENE-4286.patch, LUCENE-4286.patch, LUCENE-4286.patch_3.x Add an optional flag to the CJKBigramFilter to tell it to also output unigrams. This would allow indexing of both bigrams and unigrams, and at query time the analyzer could analyze queries as bigrams unless the query contained a single Han unigram. As an example, here is a Solr fieldType configuration with the analyzer for indexing with the indexUnigrams flag set and the analyzer for querying without the flag:

<fieldType name="CJK" autoGeneratePhraseQueries="false">
  <analyzer type="index">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" indexUnigrams="true" han="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ICUTokenizerFactory"/>
    <filter class="solr.CJKBigramFilterFactory" han="true"/>
  </analyzer>
</fieldType>

Use case: About 10% of our queries that contain Han characters are single-character queries. The CJKBigramFilter only outputs single characters when there are no adjacent bigrammable characters in the input. This means we have to create a separate field to index Han unigrams in order to address single-character queries, and then write application code to search that separate field if we detect a single-character Han query. This is rather kludgey. 
With the optional flag, we could configure Solr as above. This is somewhat analogous to the flags in LUCENE-1370 for the ShingleFilter used to allow single word queries (although that uses word n-grams rather than character n-grams).
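The indexUnigrams flag boils down to emitting each single character alongside the character bigrams. A rough sketch of the resulting token stream (pure illustration of the idea, not the CJKBigramFilter implementation; Latin letters stand in for Han characters):

```python
def cjk_tokens(text, index_unigrams=False):
    """Emit character bigrams from a run of bigrammable text; with
    index_unigrams=True, also emit every single character, which is
    what makes single-character queries hit the same field."""
    if len(text) == 1:
        return [text]  # lone character: only a unigram is possible
    tokens = []
    for i in range(len(text) - 1):
        if index_unigrams:
            tokens.append(text[i])
        tokens.append(text[i:i + 2])
    if index_unigrams:
        tokens.append(text[-1])
    return tokens

print(cjk_tokens("ABC"))                       # -> ['AB', 'BC']
print(cjk_tokens("ABC", index_unigrams=True))  # -> ['A', 'AB', 'B', 'BC', 'C']
```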
[jira] [Resolved] (SOLR-4126) Partial Update retrieve int/float value error
[ https://issues.apache.org/jira/browse/SOLR-4126?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-4126. Resolution: Not A Problem Nihed, Solr plugins need to use the IndexSchema to access Documents in order to convert the encoded values in those documents into the appropriate Java types. See for example SolrPluginUtils.docListToSolrDocumentList or TextResponseWriter.toSolrDocument. If you have more questions about writing custom plugins, please ask about them on the solr-user list. Partial Update retrieve int/float value error - Key: SOLR-4126 URL: https://issues.apache.org/jira/browse/SOLR-4126
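The garbage characters in the report are what you get when an internally encoded numeric stored field is read back as if it were text. A rough illustration of that failure mode (using a plain big-endian int encoding for the demonstration, not Lucene's actual field encoding):

```python
import struct

value = 5318311  # the "c" field from the report

# Store the number the way a codec might: as raw bytes, not digits.
encoded = struct.pack(">i", value)

# Reading the raw bytes as text yields gibberish, not "5318311"...
as_text = encoded.decode("latin-1")
print(repr(as_text))

# ...while decoding through the right schema-aware path recovers it.
decoded = struct.unpack(">i", encoded)[0]
assert decoded == value
```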
pro coding style
if you talk about my yesterday work then no reformats were done because code was already properly formatted. Also all code was hand written, no generated code was used. Generated code is not committed to git anyway. my hard limits for code quality (checked at commit): * no findbugs warnings with level 14+ * code coverage > 80% * code coverage in critical parts > 95% * list of PMD warnings to stop commit * generation of call tree graph - check it for cycles, checking for calling same procedure from different levels (indicates bad code flow) * all eclipse warnings turned into errors * patched eclipse compiler to do better flow analysis * code reformatted at commit * javadoc everything, no warnings what you should do: * stuff i do + * ant -> maven * svn -> git (way better tools) * split code into small manageable maven modules * get more people * put trust into your testing, not into perfect people * work faster * use github to track patches * use springs for integration testing * use jenkins to do tests on incoming patches * do library checks for number of functions really used * contributor patches should be high priority or you will lose contributors i am giving sometimes lessons: about 1-2 sessions per year for 14 people, if i have spare time. But its waste of time, most ppl will not follow. learn this: SLOW CODING != BUG FREE CODE. GOOD TESTS + GOOD STATIC TESTING = GOOD BUG FREE CODE CODE STYLE != GAME WITH SPACES AND { } GOOD TESTS = 2x TIME NEEDED TO CODE STUFF UNDER TEST GOOD TESTS ARE MORE VALUABLE THAN GOOD CODE
[jira] [Commented] (SOLR-2701) Expose IndexWriter.commit(Map<String,String> commitUserData) to solr
[ https://issues.apache.org/jira/browse/SOLR-2701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506688#comment-13506688 ] Greg Bowyer commented on SOLR-2701: --- bq. I haven't had a chance to check out the rest of the patch/issue, but for this specifically, what about a convention? Anything under the persistent key in the commit data is carried over indefinitely. Or if persistent is the norm, then we could reverse it and have a transient map that is not carried over. The persistent/transient map sounds like a good idea; I will take a look at how that can be implemented. Expose IndexWriter.commit(Map<String,String> commitUserData) to solr - Key: SOLR-2701 URL: https://issues.apache.org/jira/browse/SOLR-2701 Project: Solr Issue Type: New Feature Components: update Affects Versions: 4.0-ALPHA Reporter: Eks Dev Priority: Minor Labels: commit, update Attachments: SOLR-2701-Expose-userCommitData-throughout-solr.patch, SOLR-2701.patch Original Estimate: 8h Remaining Estimate: 8h At the moment, there is no feature that enables associating user information with the commit point. Lucene supports this possibility and it should be exposed to solr as well, probably via a beforeCommit listener (analogous to prepareCommit in Lucene). The most likely home for this Map to live is UpdateHandler. An example use case would be atomic tracking of sequence numbers or timestamps for incremental updates.
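The convention discussed above is easy to sketch: when building the next commit's user data, carry forward only the entries an application filed under a reserved key. This is an illustration of the proposed behaviour only, not committed Solr code; the "persistent." prefix is a hypothetical spelling of the convention:

```python
def carry_over(commit_data):
    """Keep only the entries filed under the reserved 'persistent.'
    prefix; everything else is treated as transient and dropped."""
    prefix = "persistent."
    return {k: v for k, v in commit_data.items() if k.startswith(prefix)}

previous = {
    "persistent.schema-version": "7",         # survives every commit
    "last-import": "2012-11-29T10:00:00Z",    # transient: dropped next commit
}
print(carry_over(previous))  # -> {'persistent.schema-version': '7'}
```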
[jira] [Commented] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13506735#comment-13506735 ] Hoss Man commented on SOLR-139: --- bq. I believe yonik chose to implement it by using updateLog features. I think it has to be - the real time get support provided by the updateLog is the only way to guarantee that the document will be available to atomically update it. Lukas: if the atomic update code path isn't throwing a big fat error when you try to use it without updateLog configured, then that sounds to me like a bug -- can you please file a Jira for that? Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Fix For: 4.0 Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. 
Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293
IndexWriter.ensureOpen and ensureOpen(boolean)
Hi, While working on LUCENE-4575 I noticed what I thought was an inconsistency between prepareCommit() and prepareCommit(commitData). The former called ensureOpen(true) and the latter ensureOpen(false). At first I thought that this was a bug, so I fixed both to call ensureOpen(true), especially now that I consolidate the two prepareCommit() versions into one, but then all tests failed with AlreadyClosedException. How wonderful :). Digging deeper into the meaning of the two ensureOpen versions, I realized that the boolean means something like "fail if IW has been closed, or is in the process of closing". Some methods choose not to fail if IW is in the process of closing, while others do (mostly internal methods). My question is - why make the distinction? If IW is in the process of closing, why not always fail? Shai
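The distinction Shai describes can be mocked up outside Lucene. The boolean below mirrors ensureOpen's "fail if closing" flag; the class and method names are hypothetical, chosen only to illustrate the two behaviours:

```python
class AlreadyClosedException(Exception):
    pass

class ToyWriter:
    """Toy model of IndexWriter's ensureOpen(boolean): 'closing' is the
    window where close() has started but not yet finished."""

    def __init__(self):
        self.closed = False
        self.closing = False

    def ensure_open(self, fail_if_closing=True):
        # ensure_open(True):  reject when closed OR mid-close.
        # ensure_open(False): reject only when fully closed.
        if self.closed or (self.closing and fail_if_closing):
            raise AlreadyClosedException("this writer is closed")

w = ToyWriter()
w.closing = True
w.ensure_open(fail_if_closing=False)    # tolerated while close() is running
try:
    w.ensure_open(fail_if_closing=True)
except AlreadyClosedException:
    print("rejected while closing")      # -> rejected while closing
```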
Re: pro coding style
hey, some comments inline... On Thu, Nov 29, 2012 at 7:48 PM, Radim Kolar h...@filez.com wrote: if you talk about my yesterday work then no reformats were done because code was already properly formatted. Also all code was hand written, no generated code was used. Generated code is not committed to git anyway. my hard limits for code quality (checked at commit): * no findbugs warnings with level 14+ * code coverage > 80% * code coverage in critical parts > 95% * list of PMD warnings to stop commit * generation of call tree graph - check it for cycles, checking for calling same procedure from different levels (indicates bad code flow) * all eclipse warnings turned into errors * patched eclipse compiler to do better flow analysis * code reformatted at commit * javadoc everything, no warnings what you should do: * stuff i do + * ant -> maven I suggest you start with this; make sure you have enough time and energy for the discussion. * svn -> git (way better tools) I think we had this discussion already and it seems that lots of folks are positive, yet there is still some barrier infrastructure-wise along the lines. * split code into small manageable maven modules see above - we have a fully functional maven build but ant is our primary build. My honest opinion, forget what I said above - don't try. * get more people good point - can you refer us some? In my experience they are pretty hard to find. * put trust into your testing, not into perfect people ahh yeah, testing, we should do that at some point * work faster wow - I never thought about that though! * use github to track patches wait, why is github good for patches? * use springs for integration testing sorry, we are a no-dependency library. * use jenkins to do tests on incoming patches patches welcome * do library checks for number of functions really used hmm - we are a library? * contributor patches should be high priority or you will lose contributors that is good advice for such a young project. 
i am giving sometimes lessons: about 1-2 sessions per year for 14 people, if i have spare time. But its waste of time, most ppl will not follow. learn this: SLOW CODING != BUG FREE CODE. GOOD TESTS + GOOD STATIC TESTING = GOOD BUG FREE CODE CODE STYLE != GAME WITH SPACES AND { } GOOD TESTS = 2x TIME NEEDED TO CODE STUFF UNDER TEST GOOD TESTS ARE MORE VALUABLE THAN GOOD CODE lets drop the code, it's a hassle to maintain anyway! thanks man, this mail made my day! simon
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506860#comment-13506860 ] Commit Tag Bot commented on SOLR-3849: -- [trunk commit] Steven Rowe http://svn.apache.org/viewvc?view=revisionrevision=1415402 SOLR-3849: Maven configuration += -Djava.awt.headless=true; also, upgrade maven-surefire-plugin to 2.12.4 ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). 
[junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at
[jira] [Commented] (SOLR-3602) Look into updating to ZooKeeper 3.4.5
[ https://issues.apache.org/jira/browse/SOLR-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506868#comment-13506868 ] Commit Tag Bot commented on SOLR-3602: -- [trunk commit] Steven Rowe http://svn.apache.org/viewvc?view=revisionrevision=1415408 SOLR-3602: Maven configuration: Exclude new zookeeper 3.4.5 transitive dependency org.slf4j:slf4j-log4j12 Look into updating to ZooKeeper 3.4.5 - Key: SOLR-3602 URL: https://issues.apache.org/jira/browse/SOLR-3602 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.1, 5.0 Looks like 3.4.4 may be considered stable - if that happens, we should look into updating. Otherwise, we should keep an eye out for 3.3.6 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
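For readers following along, the kind of Maven exclusion the commit message describes looks roughly like this; treat it as a sketch against a hypothetical pom.xml (Lucene/Solr's POMs are generated, so the exact placement and coordinates should be verified against the generated files):

```xml
<!-- Sketch: exclude the slf4j-log4j12 binding that zookeeper 3.4.5
     pulls in transitively, so the project's own SLF4J binding wins. -->
<dependency>
  <groupId>org.apache.zookeeper</groupId>
  <artifactId>zookeeper</artifactId>
  <version>3.4.5</version>
  <exclusions>
    <exclusion>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-log4j12</artifactId>
    </exclusion>
  </exclusions>
</dependency>
```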
[jira] [Commented] (SOLR-3849) ScriptEngineTest failure RE system properties and ThreadLeakError
[ https://issues.apache.org/jira/browse/SOLR-3849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506878#comment-13506878 ] Commit Tag Bot commented on SOLR-3849: -- [branch_4x commit] Steven Rowe http://svn.apache.org/viewvc?view=revisionrevision=1415410 SOLR-3849: Maven configuration += -Djava.awt.headless=true; also, upgrade maven-surefire-plugin to 2.12.4 (merge trunk r1415402) ScriptEngineTest failure RE system properties and ThreadLeakError - Key: SOLR-3849 URL: https://issues.apache.org/jira/browse/SOLR-3849 Project: Solr Issue Type: Bug Components: update Affects Versions: 5.0 Environment: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 Reporter: David Smiley Assignee: Uwe Schindler Fix For: 4.0, 5.0 Attachments: SOLR-3849.patch 100% reproducible for me: solr$ ant test -Dtestcase=ScriptEngineTest {noformat} [junit4:junit4] JUnit4 says hi! Master seed: E62CC5FBAC2CEFA4 [junit4:junit4] Executing 1 suite with 1 JVM. [junit4:junit4] [junit4:junit4] Suite: org.apache.solr.update.processor.ScriptEngineTest [junit4:junit4] OK 0.17s | ScriptEngineTest.testPut [junit4:junit4] OK 0.02s | ScriptEngineTest.testEvalReader [junit4:junit4] IGNOR/A 0.10s | ScriptEngineTest.testJRuby [junit4:junit4] Assumption #1: got: [null], expected: each not null [junit4:junit4] OK 0.01s | ScriptEngineTest.testEvalText [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByExtension [junit4:junit4] OK 0.01s | ScriptEngineTest.testGetEngineByName [junit4:junit4] 2 -9 T9 ccr.ThreadLeakControl.checkThreadLeaks WARNING Will linger awaiting termination of 2 leaked thread(s). 
[junit4:junit4] 2 20163 T9 ccr.ThreadLeakControl.checkThreadLeaks SEVERE 1 thread leaked from SUITE scope at org.apache.solr.update.processor.ScriptEngineTest: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 20164 T9 ccr.ThreadLeakControl.tryToInterruptAll Starting to interrupt leaked threads: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 23172 T9 ccr.ThreadLeakControl.tryToInterruptAll SEVERE There are still zombie threads that couldn't be terminated: [junit4:junit4] 2 1) Thread[id=11, name=AppKit Thread, state=RUNNABLE, group=main] [junit4:junit4] 2 at (empty stack) [junit4:junit4] 2 NOTE: test params are: codec=SimpleText, sim=RandomSimilarityProvider(queryNorm=true,coord=yes): {}, locale=es_PR, timezone=America/Edmonton [junit4:junit4] 2 NOTE: Mac OS X 10.8.1 x86_64/Oracle Corporation 1.7.0_07 (64-bit)/cpus=4,threads=1,free=65764312,total=85065728 [junit4:junit4] 2 NOTE: All tests run in this JVM: [ScriptEngineTest] [junit4:junit4] 2 NOTE: reproduce with: ant test -Dtestcase=ScriptEngineTest -Dtests.seed=E62CC5FBAC2CEFA4 -Dtests.slow=true -Dtests.locale=es_PR -Dtests.timezone=America/Edmonton -Dtests.file.encoding=UTF-8 [junit4:junit4] ERROR 0.00s | ScriptEngineTest (suite) [junit4:junit4] Throwable #1: java.lang.AssertionError: System properties invariant violated. 
[junit4:junit4] New keys: [junit4:junit4] sun.awt.enableExtraMouseButtons=true [junit4:junit4] sun.font.fontmanager=sun.font.CFontManager [junit4:junit4] [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.SystemPropertiesInvariantRule$1.evaluate(SystemPropertiesInvariantRule.java:66) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.NoShadowingOrOverridesOnMethodsRule$1.evaluate(NoShadowingOrOverridesOnMethodsRule.java:39) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at org.apache.lucene.util.TestRuleAssertionsRequired$1.evaluate(TestRuleAssertionsRequired.java:43) [junit4:junit4] at org.apache.lucene.util.TestRuleMarkFailure$1.evaluate(TestRuleMarkFailure.java:48) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreAfterMaxFailures$1.evaluate(TestRuleIgnoreAfterMaxFailures.java:70) [junit4:junit4] at org.apache.lucene.util.TestRuleIgnoreTestSuites$1.evaluate(TestRuleIgnoreTestSuites.java:55) [junit4:junit4] at com.carrotsearch.randomizedtesting.rules.StatementAdapter.evaluate(StatementAdapter.java:36) [junit4:junit4] at
[jira] [Updated] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley updated LUCENE-4574: - Attachment: LUCENE-4574.patch I've thought about this some more and chatted with Yonik and Adrien on IRC. Attached is a new patch. In a nutshell, the caching is done via ScoreCachingWrappingScorer and is applied by TopFieldCollector, but only when one of the comparators is a RelevanceComparator. I believe this is the only case in which the score could be retrieved more than once per document. To implement this patch, I did a little refactoring. I pulled a Scorer field that was common to all subclasses of TopFieldCollector into TFC, and I added a getFieldComparators() abstract method that is implemented trivially by all its subclasses. setScorer() is now implemented only at TFC and none of its subclasses. If this seems reasonable, perhaps it would be good to make a further refactoring such that FieldComparator.setScorer() doesn't exist; leave it specific to RelevanceComparator or introduce an abstract class FieldComparatorNeedsScorer. After all, in Lucene only RelevanceComparator needs it. FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate, so this is a big problem. I was able to work around this problem trivially on my end by caching the last value with the corresponding docid in my FunctionValues implementation. 
Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
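The reporter's workaround (remembering the last computed value together with its docID inside the FunctionValues implementation) can be sketched in isolation like this. DocFloatSource and LastDocCachingSource are hypothetical stand-ins, not Lucene classes; the point is only the last-doc caching pattern:

```java
// Hypothetical sketch (not Lucene's actual API): wrap a per-document
// value source so the value for the current docID is computed only once,
// even if callers ask for it twice in a row for the same document.
@FunctionalInterface
interface DocFloatSource {
    float floatVal(int doc);
}

class LastDocCachingSource implements DocFloatSource {
    private final DocFloatSource delegate;
    private int lastDoc = -1;   // no document scored yet
    private float lastValue;
    int computations = 0;       // exposed only to demonstrate the caching

    LastDocCachingSource(DocFloatSource delegate) {
        this.delegate = delegate;
    }

    @Override
    public float floatVal(int doc) {
        if (doc != lastDoc) {              // new document: compute and remember
            lastValue = delegate.floatVal(doc);
            lastDoc = doc;
            computations++;
        }
        return lastValue;                  // repeat call for same doc is free
    }
}
```

This mirrors what ScoreCachingWrappingScorer does for scores; collectors are expected to visit docIDs in order, so caching only the most recent document is enough.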
[jira] [Commented] (SOLR-3602) Look into updating to ZooKeeper 3.4.5
[ https://issues.apache.org/jira/browse/SOLR-3602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506918#comment-13506918 ] Commit Tag Bot commented on SOLR-3602: -- [branch_4x commit] Steven Rowe http://svn.apache.org/viewvc?view=revisionrevision=1415411 SOLR-3602: Maven configuration: Exclude new zookeeper 3.4.5 transitive dependency org.slf4j:slf4j-log4j12 (merge trunk r1415408) Look into updating to ZooKeeper 3.4.5 - Key: SOLR-3602 URL: https://issues.apache.org/jira/browse/SOLR-3602 Project: Solr Issue Type: Improvement Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Priority: Minor Fix For: 4.1, 5.0 Looks like 3.4.4 may be considered stable - if that happens, we should look into updating. Otherwise, we should keep an eye out for 3.3.6 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4127) Atomic updates used w/o updateLog should throw an error
Lukas Graf created SOLR-4127: Summary: Atomic updates used w/o updateLog should throw an error Key: SOLR-4127 URL: https://issues.apache.org/jira/browse/SOLR-4127 Project: Solr Issue Type: Bug Components: update Affects Versions: 4.0 Reporter: Lukas Graf The atomic update feature described in [SOLR-139|https://issues.apache.org/jira/browse/SOLR-139] seems to depend on having an {{<updateLog/>}} configured in {{solrconfig.xml}}. When used without an update log, update commands like {{set}} or {{add}} don't result in an error and the transaction being aborted, but produce garbled documents instead. This is the case for both the XML and JSON formats of the update message. Example: I initially created some content like this: {code} $ curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"7cb8a43c","Title":"My original Title","Creator":"John Doe"}]' {code} Which resulted in this document: {code:xml} <doc> <str name="id">7cb8a43c</str> <str name="Title">My original Title</str> <str name="Creator">John Doe</str> </doc> {code} Then I attempted to update that document with this statement: {code} $ curl 'localhost:8983/solr/update?commit=true' -H 'Content-type:application/json' -d '[{"id":"7cb8a43c","Title":{"set":"My new title"}}]' {code} Which resulted in this garbled document, with the fields that weren't updated missing: {code:xml} <doc> <str name="id">7cb8a43c</str> <str name="Title">{set=My new title}</str> </doc> {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
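For reference, a minimal sketch of the update-log configuration the reporter says is required. The placement (inside the solrconfig.xml update handler) and the dir property follow the Solr 4.0 example config; verify against your own version before relying on it:

```xml
<!-- Sketch: enable the transaction log so atomic updates can resolve
     the stored document before applying set/add/inc operations. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
</updateHandler>
```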
[jira] [Commented] (LUCENE-4542) Make RECURSION_CAP in HunspellStemmer configurable
[ https://issues.apache.org/jira/browse/LUCENE-4542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506938#comment-13506938 ] Rafał Kuć commented on LUCENE-4542: --- Chris, should anything else be done here in your opinion, or is it ready to be committed? Make RECURSION_CAP in HunspellStemmer configurable -- Key: LUCENE-4542 URL: https://issues.apache.org/jira/browse/LUCENE-4542 Project: Lucene - Core Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Reporter: Piotr Assignee: Chris Male Attachments: Lucene-4542-javadoc.patch, LUCENE-4542.patch, LUCENE-4542-with-solr.patch Currently there is private static final int RECURSION_CAP = 2; in the code of the HunspellStemmer class. It makes using hunspell with several dictionaries almost unusable, due to bad performance (e.g. it costs 36 ms to stem a long sentence in Latvian for recursion_cap=2 and 5 ms for recursion_cap=1). It would be nice to be able to tune this number as needed. AFAIK this number (2) was chosen arbitrarily. (It's the first issue I've filed, so please forgive any mistakes.) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13506942#comment-13506942 ] Lukas Graf commented on SOLR-139: - Filed [SOLR-4127|https://issues.apache.org/jira/browse/SOLR-4127]: Atomic updates used w/o updateLog should throw an error Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Fix For: 4.0 Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139_createIfNotExist.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139.patch, SOLR-139.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: pro coding style
what you should do: * stuff i do + * ant -> maven I suggest you start with this; make sure you have enough time and energy for the discussion. I don't have either. If I decide to go with SOLR instead of EC, I will fork it. It will save me a lot of time. * svn -> git (way better tools) I think we had this discussion already and it seems that lots of folks are positive, yet there is still some barrier infrastructure-wise along the lines. don't blame infrastructure, other apache projects are using it. * split code into small manageable maven modules see above - we have a fully functional maven build but ant is our primary build. i don't see pom.xml in your source tree. good point - can you refer us to some? in my experience they are pretty hard to find. i do not know people who believe that a process designed to be slow is a good process. We here believe that fast process = high salary. * use github to track patches wait, why is github good for patches? you can track patch revisions and apply/browse/comment on them easily. Also it's way easier to upload a patch and do a pull request than to attach it to a ticket in jira. * use springs for integration testing sorry, we are a no-dependency library. <scope>test</scope> - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Active 4.x branches?
How can you expect stability out of that? unit + integration testing. If it passes tests, it's no different from old code. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13507045#comment-13507045 ] Robert Muir commented on LUCENE-4574: - Just to bold what I said before, as I feel it's important here: {quote} Finally, we could also consider something like your patch, except more honed in on these particular silly situations. so that's something like, up-front setting a boolean in these collectors' ctors if one of the comparators is relevance *and also* it's asked to track scores/max scores. {quote} Seems like we are doing it always if there is a relevance comparator? I feel like the caching (which I hate) should be contained to exactly what's minimal and necessary to prevent score from being called twice. FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Attachments: LUCENE-4574.patch, LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate, so this is a big problem. I was able to work around this problem trivially on my end by caching the last value with the corresponding docid in my FunctionValues implementation. 
Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4123) ICUTokenizerFactory - per-script RBBI customization
[ https://issues.apache.org/jira/browse/SOLR-4123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-4123: -- Attachment: SOLR-4123.patch patch with that above syntax (which i'm not sure I even like). may not work: haven't tested at all. ICUTokenizerFactory - per-script RBBI customization --- Key: SOLR-4123 URL: https://issues.apache.org/jira/browse/SOLR-4123 Project: Solr Issue Type: Improvement Components: Schema and Analysis Affects Versions: 4.0 Reporter: Shawn Heisey Fix For: 4.1, 5.0 Attachments: SOLR-4123.patch Initially this started out as an idea for a configuration knob on ICUTokenizer that would allow me to tell it not to tokenize on punctuation. Through IRC discussion on #lucene, it sorta ballooned. The committers had a long discussion about it that I don't really understand, so I'll be including it in the comments. I am a Solr user, so I would also need the ability to access the configuration from there, likely either in schema.xml or solrconfig.xml. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4128) multivalued dynamicField matching 'score' causes text response writers to output score as an array
Aaron Daubman created SOLR-4128: --- Summary: multivalued dynamicField matching 'score' causes text response writers to output score as an array Key: SOLR-4128 URL: https://issues.apache.org/jira/browse/SOLR-4128 Project: Solr Issue Type: Bug Components: Response Writers Affects Versions: 4.0 Environment: all Reporter: Aaron Daubman Priority: Minor With a schema that includes a dynamic field that matches 'score' (e.g. s* or even just *), text response writers (json, python, etc...) will return score as an array, e.g.: "score": [ 17.522964 ] For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, non-multivalued 'score' field to schema.xml, e.g.: <field name="score" type="string" indexed="false" stored="false" multiValued="false"/> Note that this will happen for anybody following the older default schema.xml where * was used to ignore undesired fields (e.g. as mentioned in https://issues.apache.org/jira/browse/SOLR-217?focusedCommentId=12492357page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12492357 ) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4128) multivalued dynamicField matching 'score' causes text response writers to output score as an array
[ https://issues.apache.org/jira/browse/SOLR-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Aaron Daubman updated SOLR-4128: Description: With a schema that includes a dynamic field that matches 'score' (e.g. s* or even just *), text response writers (json, python, etc...) will return score as an array, e.g.: "score": [ 17.522964 ] For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, non-multivalued 'score' field to schema.xml, e.g.: <field name="score" type="string" indexed="false" stored="false" multiValued="false"/> Note that this will happen for anybody following the current (or older) example schema.xml where * was used to ignore undesired fields (from: SOLR-217): https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/schema.xml#L214 was: With a schema that includes a dynamic field that matches 'score' (e.g. s* or even just *), text response writers (json, python, etc...) will return score as an array, e.g.: "score": [ 17.522964 ] For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, non-multivalued 'score' field to schema.xml, e.g.: <field name="score" type="string" indexed="false" stored="false" multiValued="false"/> Note that this will happen for anybody following the older default schema.xml where * was used to ignore undesired fields (e.g. as mentioned in https://issues.apache.org/jira/browse/SOLR-217?focusedCommentId=12492357page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-12492357 ) multivalued dynamicField matching 'score' causes text response writers to output score as an array -- Key: SOLR-4128 URL: https://issues.apache.org/jira/browse/SOLR-4128 Project: Solr Issue Type: Bug Components: Response Writers Affects Versions: 4.0 Environment: all Reporter: Aaron Daubman Priority: Minor Labels: array, ignore, schema, score With a schema that includes a dynamic field that matches 'score' (e.g. s* or even just *), text response writers (json, python, etc...) will return score as an array, e.g.: "score": [ 17.522964 ] For now, a workaround (courtesy of hoss) is adding a non-indexed, non-stored, non-multivalued 'score' field to schema.xml, e.g.: <field name="score" type="string" indexed="false" stored="false" multiValued="false"/> Note that this will happen for anybody following the current (or older) example schema.xml where * was used to ignore undesired fields (from: SOLR-217): https://github.com/apache/lucene-solr/blob/trunk/solr/example/solr/collection1/conf/schema.xml#L214 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: composition of different queries based scores
For boosting the term in the same example, is the above valid? (hello^0.5* OR hello^0.5~) On Tue, Nov 27, 2012 at 11:22 PM, Jack Krupansky j...@basetechnology.com wrote: The fuzzy option will be ignored here – you cannot combine fuzzy and wild on the same term, although you could do an OR of the two: (hello* OR hello~) -- Jack Krupansky *From:* sri krishna krishnai...@gmail.com *Sent:* Tuesday, November 27, 2012 11:08 AM *To:* dev@lucene.apache.org *Subject:* composition of different queries based scores For a search string hello*~, how is the scoring calculated? The formula given in the url http://lucene.apache.org/core/old_versioned_docs/versions/3_0_1/api/core/org/apache/lucene/search/Similarity.html does not take edit distance or prefix-term factors into account. Does lucene add up the scores obtained from each type of query included, i.e. for the above query, actual score = default scoring + 1/(edit distance) + prefix match score? If so, there is no normalization between scores; if not, what approach does lucene follow, starting from separating each query-based identifier like ~ (edit distance), * (prefix query) etc., to actual scoring?
[jira] [Assigned] (LUCENE-4574) FunctionQuery ValueSource value computed twice per document
[ https://issues.apache.org/jira/browse/LUCENE-4574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Smiley reassigned LUCENE-4574: Assignee: David Smiley FunctionQuery ValueSource value computed twice per document --- Key: LUCENE-4574 URL: https://issues.apache.org/jira/browse/LUCENE-4574 Project: Lucene - Core Issue Type: Bug Components: core/search Affects Versions: 4.0, 4.1 Reporter: David Smiley Assignee: David Smiley Attachments: LUCENE-4574.patch, LUCENE-4574.patch, Test_for_LUCENE-4574.patch I was working on a custom ValueSource and did some basic profiling and debugging to see if it was being used optimally. To my surprise, the value was being fetched twice per document in a row. This computation isn't exactly cheap to calculate so this is a big problem. I was able to work-around this problem trivially on my end by caching the last value with corresponding docid in my FunctionValues implementation. Here is an excerpt of the code path to the first execution: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:291) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} And here is the 2nd call: {noformat} at org.apache.lucene.queries.function.docvalues.DoubleDocValues.floatVal(DoubleDocValues.java:48) at org.apache.lucene.queries.function.FunctionQuery$AllScorer.score(FunctionQuery.java:153) at org.apache.lucene.search.ScoreCachingWrappingScorer.score(ScoreCachingWrappingScorer.java:56) at org.apache.lucene.search.FieldComparator$RelevanceComparator.copy(FieldComparator.java:951) at 
org.apache.lucene.search.TopFieldCollector$OneComparatorScoringMaxScoreCollector.collect(TopFieldCollector.java:312) at org.apache.lucene.search.Scorer.score(Scorer.java:62) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:588) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:280) {noformat} The 2nd call appears to use some score caching mechanism, which is all well and good, but that same mechanism wasn't used in the first call so there's no cached value to retrieve. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: pro coding style
i dont see pom.xml in your source tree. Instead of educating others about what's good and bad, how about taking some more time to study the sources of Lucene/Solr and its build system? Your observations are superficial, to say the least: POM files are generated dynamically, and the test infrastructure is among the more sophisticated things to be found; with multiple CI systems running the code all the time, the coverage is great across JVMs, and the randomization really brings up bugs nobody thought to cover manually. Dawid - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org