[ANNOUNCE] Apache Solr 3.3
July 2011, Apache Solr™ 3.3 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.3. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below). See the CHANGES.txt file included with the release for a full list of details as well as instructions on upgrading.

Solr 3.3 Release Highlights

* Grouping / Field Collapsing
* A new, automaton-based suggest/autocomplete implementation offering an order of magnitude smaller RAM consumption.
* KStemFilterFactory, an optimized implementation of a less aggressive stemmer for English.
* Solr defaults to a new, more efficient merge policy (TieredMergePolicy). See http://s.apache.org/merging for more information.
* Important bugfixes, including extremely high RAM usage in spellchecking.
* Bugfixes and improvements from Apache Lucene 3.3

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.

Thanks,
Apache Solr Developers

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 9216 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9216/

1 tests failed.

REGRESSION: org.apache.lucene.facet.util.TestScoredDocIDsUtils.testWithDeletions
Error Message: Deleted docs must not appear in the allDocsScoredDocIds set
Stack Trace:
junit.framework.AssertionFailedError: Deleted docs must not appear in the allDocsScoredDocIds set
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
  at org.apache.lucene.facet.util.TestScoredDocIDsUtils.testWithDeletions(TestScoredDocIDsUtils.java:137)

Build Log (for compile errors):
[...truncated 4725 lines...]
RE: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
Hi,

I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me if you don't want to see src.jar code from OpenJDK Java 6 - just delete this mail if you don't want to (the code here is licensed under GPL):

Arrays.java:

/**
 * Copies the specified array, truncating or padding with zeros (if necessary)
 * so the copy has the specified length. For all indices that are
 * valid in both the original array and the copy, the two arrays will
 * contain identical values. For any indices that are valid in the
 * copy but not the original, the copy will contain <tt>(byte)0</tt>.
 * Such indices will exist if and only if the specified length
 * is greater than that of the original array.
 *
 * @param original the array to be copied
 * @param newLength the length of the copy to be returned
 * @return a copy of the original array, truncated or padded with zeros
 * to obtain the specified length
 * @throws NegativeArraySizeException if <tt>newLength</tt> is negative
 * @throws NullPointerException if <tt>original</tt> is null
 * @since 1.6
 */
public static byte[] copyOf(byte[] original, int newLength) {
    byte[] copy = new byte[newLength];
    System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
    return copy;
}

This is our implementation, which Simon replaced and Robert reverted (UnsafeByteArrayOutputStream):

private void grow(int newLength) {
    // It actually should be: (Java 1.7, when it's intrinsic on all machines)
    // buffer = Arrays.copyOf(buffer, newLength);
    byte[] newBuffer = new byte[newLength];
    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    buffer = newBuffer;
}

So please look at the code: where is a difference that could slow things down, except the Math.min() call, which is an intrinsic in almost every JDK on earth? The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g.
*all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, which does no array copy.

Simon and I were not willing to replace the reallocations in FST code (Mike, you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only a few of them). The idea was only to replace primitive-type code to make it more easily readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that's Dawid's and my reason to change to copyOf for primitive types). In general, if you use Java SDK methods that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic, has too many safety checks, or uses reflection.

To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don't like the allocation strategy in it (it's exponential; on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss this problem again when Simon and I want to change the ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward].

The other thing I don't like in the new faceting module is the duplication of vint code. Why don't we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere? This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with an OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class.
Uwe (getting crazy)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Friday, July 01, 2011 7:47 AM
To: Michael McCandless
Cc: dev@lucene.apache.org; Dawid Weiss
Subject: Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

On Fri, Jul 1, 2011 at 12:19 AM, Michael McCandless luc...@mikemccandless.com wrote:
On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer simon.willna...@googlemail.com wrote:
On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:
I don't see any evidence that this is any slower though. You need to run with -client (if the machine is a beast this is tricky because x64
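For what it's worth, the doubling-vs-oversize point from the message above can be sketched in isolation. Everything below is a hypothetical illustration, not the actual UnsafeByteArrayOutputStream or Lucene ArrayUtil code; in particular, the ~1/8 headroom in growWithOversize is only an assumption meant to mimic the *idea* of an oversize-style strategy:

```java
import java.util.Arrays;

// Sketch contrasting the two growth strategies discussed above.
// Both use Arrays.copyOf for the copy itself; the difference is only
// how much extra room each strategy allocates past the requested minimum.
public class GrowthSketch {
    // Doubling: on every grow, the buffer at least doubles.
    static byte[] growByDoubling(byte[] buffer, int minLength) {
        int newLength = buffer.length == 0 ? 1 : buffer.length;
        while (newLength < minLength) newLength *= 2;
        return Arrays.copyOf(buffer, newLength);
    }

    // Oversize-style: allocate minLength plus ~12.5% headroom (assumption).
    static byte[] growWithOversize(byte[] buffer, int minLength) {
        if (buffer.length >= minLength) return buffer; // already big enough
        int newLength = minLength + (minLength >>> 3);
        return Arrays.copyOf(buffer, newLength);
    }

    public static void main(String[] args) {
        byte[] a = growByDoubling(new byte[1000], 1001);
        byte[] b = growWithOversize(new byte[1000], 1001);
        // doubling jumps to 2000 bytes; oversize-style only to 1126
        System.out.println(a.length + " vs " + b.length);
    }
}
```

The contrast is the one Uwe raises: asking for one byte past a 1000-byte buffer doubles it to 2000 under the exponential strategy, while an oversize-style strategy allocates only modest headroom past the minimum.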
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9221/

11 tests failed.

FAILED: org.apache.lucene.util.TestFieldCacheSanityChecker.testIndexAndMerge
Error Message: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
  at java.lang.Thread.run(Thread.java:636)

REGRESSION: org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers
Error Message: Only one compound segment should exist expected:<3> but was:<4>
Stack Trace:
junit.framework.AssertionFailedError: Only one compound segment should exist expected:<3> but was:<4>
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
  at org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers(TestAddIndexes.java:952)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testSingleFile
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.testSingleFile(TestCompoundFile.java:203)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.index.CompoundFileWriter
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testTwoFiles
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.testTwoFiles(TestCompoundFile.java:226)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testRandomFiles
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.testRandomFiles(TestCompoundFile.java:276)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
  at org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing(TestCompoundFile.java:371)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testRandomAccess
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
  at org.apache.lucene.index.TestCompoundFile.testRandomAccess(TestCompoundFile.java:428)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testRandomAccessClones
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
  at org.apache.lucene.index.TestCompoundFile.testRandomAccessClones(TestCompoundFile.java:507)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at
RE: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure
Is fixed now, was a problem during cutover to 3.3 backwards tests.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
Sent: Friday, July 01, 2011 9:17 AM
To: dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

[...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9222 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9222/

11 tests failed (the same failures as in build #9221 above).
[jira] [Created] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
use of FST for SynonymsFilterFactory and synonyms.txt
-----
Key: SOLR-2628
URL: https://issues.apache.org/jira/browse/SOLR-2628
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Affects Versions: 3.4, 4.0
Environment: Linux
Reporter: Bernd Fehling
Priority: Minor

Currently the SynonymsFilterFactory builds up a memory-based SynonymsMap. This can generate huge maps because of the permutations for synonyms. Now that the FST (finite state transducer) has been introduced to Lucene, it could also be used for synonyms. A tool could compile the synonyms.txt file to a binary automaton file which can then be used with the SynonymsFilterFactory.

Advantages:
- faster start of Solr, no need to generate the SynonymsMap
- faster lookup
- memory saving

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
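The "huge maps because of the permutations" point in the issue can be illustrated with a minimal, hypothetical sketch (this is not the actual Solr synonym code): fully expanding a single equivalence group of n terms already produces n*n term-to-synonym mappings, and real synonym files contain many such groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of why a fully expanded in-memory synonym map blows up:
// every term in an equivalence group maps to the whole group.
public class SynonymBlowupSketch {
    public static Map<String, List<String>> expand(List<String> group) {
        Map<String, List<String>> map = new HashMap<>();
        for (String term : group) {
            map.put(term, new ArrayList<>(group)); // n keys, each with n values
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("tv", "television", "telly", "goggle box");
        Map<String, List<String>> map = expand(group);
        int mappings = 0;
        for (List<String> values : map.values()) mappings += values.size();
        // 4 terms expand to 4 keys and 16 (term -> synonym) mappings
        System.out.println(map.size() + " keys, " + mappings + " mappings");
    }
}
```

A prefix-sharing automaton such as an FST stores the overlapping keys and group structure far more compactly, which is the memory saving the issue is after.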
[jira] [Assigned] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned SOLR-2628:
-
Assignee: Dawid Weiss
Re: [jira] [Created] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
I don't need information about Solr projects.

-- yazhini
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058241#comment-13058241 ]

Dawid Weiss commented on SOLR-2628:
-
I've talked about it a little bit with Bernd and indeed, it seems possible to reduce the size of the in-memory data structures by an order of magnitude (or even two orders of magnitude, we shall see). I'm on vacation for the next week and on a business trip for another one after that, but I'll be on it once I come back home.
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9222 - Still Failing
I don't need messages from your mail, sir.
[jira] [Created] (SOLR-2629) warning about org.apache.solr.request.SolrQueryResponse is deprecated
warning about org.apache.solr.request.SolrQueryResponse is deprecated
-----
Key: SOLR-2629
URL: https://issues.apache.org/jira/browse/SOLR-2629
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 3.1, 3.2, 3.3
Environment: Linux
Reporter: Bernd Fehling
Priority: Trivial

The web admin interface uses the deprecated class org.apache.solr.request.SolrQueryResponse from within the files:
- solr/src/webapp/web/admin/replication/header.jsp
- solr/src/webapp/web/admin/ping.jsp
That should be changed to use org.apache.solr.response.SolrQueryResponse.
[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads
[ https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058461#comment-13058461 ]

Shalin Shekhar Mangar commented on SOLR-2623:
-
Hoss, I wish there was a way to do just that. I looked and looked but couldn't find it. The JMX API is really screwed up. Once you send in an MBean, apparently you can't get it out again. I'd be interested if anyone knew of a way to do that.

Solr JMX MBeans do not survive core reloads
-----
Key: SOLR-2623
URL: https://issues.apache.org/jira/browse/SOLR-2623
Project: Solr
Issue Type: Bug
Components: multicore
Affects Versions: 1.4, 1.4.1, 3.1, 3.2
Reporter: Alexey Serba
Assignee: Shalin Shekhar Mangar
Priority: Minor
Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch

Solr JMX MBeans do not survive core reloads

{noformat:title=Steps to reproduce}
sh cd example
sh vi multicore/core0/conf/solrconfig.xml  # enable jmx
sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar
sh echo 'open 8842  # 8842 is java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
solr/core0:id=core0,type=core
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
...
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
sh curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
sh echo 'open 8842  # 8842 is java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
# there's only one bean left after Solr core reload
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main
{noformat}

The root cause of this is Solr core reload behavior:
# create new core (which overwrites existing registered MBeans)
# register new core and close old one (we remove/un-register MBeans on oldCore.close)

The correct sequence is:
# unregister MBeans from old core
# create and register new core
# close old core without touching MBeans

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
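The corrected ordering described in the issue can be mimicked against the platform MBean server alone. This is a hypothetical sketch (the CoreInfo/demo names are invented and the real Solr beans differ); it only demonstrates the sequence: unregister the old core's MBeans first, then register the new core's, so the new registrations are not clobbered when the old core is closed.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal standard-MBean sketch of the reload ordering.
public class ReloadOrderSketch {
    public interface CoreInfoMBean { String getName(); }

    public static class CoreInfo implements CoreInfoMBean {
        private final String name;
        public CoreInfo(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static String demo() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("solr/core0:id=core0,type=core");

        server.registerMBean(new CoreInfo("oldCore"), name); // old core's bean

        // Correct reload sequence per the issue:
        server.unregisterMBean(name);                        // 1. unregister old core's MBeans
        server.registerMBean(new CoreInfo("newCore"), name); // 2. create and register new core
        // 3. close the old core without touching MBeans

        CoreInfoMBean proxy = JMX.newMBeanProxy(server, name, CoreInfoMBean.class);
        String survivor = proxy.getName();
        server.unregisterMBean(name); // clean up so the demo is repeatable
        return survivor;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // the new core's bean is the one that survives
    }
}
```

Registering in the wrong order (new core first, then oldCore.close() unregistering by name) removes the freshly registered beans, which is exactly the "only one bean left after reload" symptom above.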
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058468#comment-13058468 ]

Michael McCandless commented on SOLR-2628:
-
Dawid, have a look at LUCENE-3233 -- we have a [very very rough] start at this.
[jira] [Resolved] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-2628.
-
Resolution: Duplicate

Duplicate of LUCENE-3233
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058470#comment-13058470 ] Dawid Weiss commented on SOLR-2628: --- Yep, this is a duplicate. Thanks Mike. Like I said -- I won't be able to work on this for the next two weeks (I also have that FST refactoring open in the background... it's progressing slowly), but it's definitely low-hanging fruit to pick because it shouldn't be very difficult and the gains would be huge.
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058475#comment-13058475 ] Michael McCandless commented on SOLR-2628: -- I think the reduction of RAM should be huge, but lookup speed might be slower (ie the usual tradeoff of FST), since we are going char by char in the FST. If we go word-by-word (ie the FST's labels are word ords and we separately resolve word → ord via a normal hash lookup) then that might be a good middle ground... but this is all speculation for now!
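The hash side of Mike's word-by-word idea is just interning words to ordinals, so a multi-word synonym entry becomes a short int sequence for the automaton to consume. A hypothetical illustration (names invented, not Lucene's API):

```java
import java.util.*;

// Sketch of resolving word -> ord through a hash map, so that an FST over
// these ords needs one arc per word instead of one arc per character.
// This trades one hash lookup per word for far fewer FST transitions.
public class WordOrds {
    private final Map<String, Integer> ords = new HashMap<>();
    private final List<String> words = new ArrayList<>(); // ord -> word

    public int ord(String word) { // assigns the next free ord on first sight
        return ords.computeIfAbsent(word, w -> { words.add(w); return words.size() - 1; });
    }
    public int[] encode(String phrase) {
        String[] parts = phrase.split("\\s+");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) out[i] = ord(parts[i]);
        return out;
    }
    public static void main(String[] args) {
        WordOrds w = new WordOrds();
        System.out.println(Arrays.toString(w.encode("big apple")));     // [0, 1]
        System.out.println(Arrays.toString(w.encode("new york city"))); // [2, 3, 4]
        System.out.println(Arrays.toString(w.encode("apple city")));    // [1, 4]
    }
}
```

As Dawid notes in the follow-up, the open question is whether the word table itself costs back the memory the coarser FST saves.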
[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058476#comment-13058476 ] Michael McCandless commented on LUCENE-3272: Big +1! We've needed query parsing factored out for a long time. And cutting tests over to a new MockQP, and then simply moving (but not merging) all QPs together to a module, sounds like great first steps. Note that the FieldType work (at least as currently planned/targeted) isn't a schema -- it's really just a nicer API for working with documents. Ie, nothing is persisted, nothing checks that 2 docs have the same fields/types, etc. Still, it would be great to pull Solr's QP in and somehow abstract the parts that require access to Solr's schema. Consolidate Lucene's QueryParsers into a module --- Key: LUCENE-3272 URL: https://issues.apache.org/jira/browse/LUCENE-3272 Project: Lucene - Java Issue Type: Improvement Components: modules/queryparser Reporter: Chris Male Lucene has a lot of QueryParsers and we should have them all in a single consistent place. The following are QueryParsers I can find that warrant moving to the new module: - Lucene Core's QueryParser - AnalyzingQueryParser - ComplexPhraseQueryParser - ExtendableQueryParser - Surround's QueryParser - PrecedenceQueryParser - StandardQueryParser - XML-Query-Parser's CoreParser All seem to do a good job at their kind of parsing, with extensive tests. One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work).
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058489#comment-13058489 ] Dawid Weiss commented on SOLR-2628: --- Yes, this may be the case. It'd need to be investigated because storing words in a hashtable will also bump memory requirements, whereas an FST can at least reuse some prefixes and suffixes.
Re: revisit naming for grouping/join?
I think joining and grouping are two different functions, and we should keep different modules for them... On Thu, Jun 30, 2011 at 10:30 PM, Robert Muir rcm...@gmail.com wrote: Hi, when looking at just a very quick glance at some of the newer grouping/join features, I found myself a little confused about what is exactly what, and I think users might too. They are confusing! I discussed some of this with hossman, and it only seemed to make me even more totally confused about: * difference between field collapsing and grouping I like the name grouping better here: I think field collapsing undersells (it's only one specific way to use grouping). EG, grouping w/o collapsing is useful (eg, Best Buy grouping hits by product category and showing the top 5 in each). * difference between nested documents and the index-time join Similarly I think nested docs undersells index-time join: you can join (either during indexing or during searching) in many different ways, and nested docs is just one use case. EG, maybe your docs are doctors but during indexing you join to a city table with facts about that city (each doctor's office is in a specific city) and then you want to run queries like city's avg annual temp > 60 and doctor has good bedside manner or something. * difference between index-time-join/nested documents and single-pass index-time grouping. Is the former only a more general case of the latter? Grouping is purely a presentation concern -- you are not altering which docs hit; you are simply changing how you pick which hits to display (top N by group). So we only have collectors here. The generic (requires 2 passes) collectors can group on anything at search time; the doc block collector requires that you indexed all docs in each group as a block. Join is both about restricting matches and also presentation of hits, because your query needs to match fields from different [logical] tables (so, the module has a Query and a Collector).
When you get the results back, you may or may not be interested in retaining the table structure in your result set (ie, you may not have selected fields from the child table). Similarly, generic joining (in Solr/ElasticSearch today but I'd like to factor into the join module) can do any join at search time, while the doc block collector requires that you did the necessary join(s) during indexing. * difference between the above joinish capabilities and solr's join impl... other than the single-pass/index-time limitation (which is really an implementation detail), I'm talking about use cases. Solr's/ElasticSearch's join is more general because you can join anything at search time (even, across 2 different indexes), vs doc block join where you must pick which joins you will ever want to use and then build the index accordingly. You can also mix the two. Maybe you do certain joins while indexing, but then at search time you do other joins generically. That's fine. (Same is true for grouping). I think it's especially interesting since the join module depends on the grouping module. The join module does currently depend on the grouping module, but for a silly reason: just for the TopGroups, to represent the returned hits. We could move TopGroups/GroupDocs into common (thus justifying its generic name!)? Then both join and grouping modules depend on common. Really TopGroups is just a TopDocs that allows some recursion (ie, each hit may in turn be another TopDocs). But TopGroups is limited now to only depth 2 recursion... we need to fix this for nested grouping. Really we just need a recursive TopDocs here So I am curious if we should: * add docs (maybe with simple examples) in the package.html or otherwise that differentiate what these guys are, or at least agree on some consistent terminology and define it somewhere? I feel like people have explained to me the differences in all these things before, but then it's easy to forget.
Well, each module's package.html has a start here, but I agree we should do more. I think what would be best is a smallish but feature-complete demo, ie pull together some easy-to-understand sample content and then build a small demo app around it. We could then show how to use grouping for field collapsing (and for other use cases), joining for nested docs (and for other use cases), etc. Mike McCandless http://blog.mikemccandless.com
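Mike's "grouping is purely a presentation concern" point is easy to show in isolation: the hit set is fixed, and we only re-select which hits to display, top N per group key. A toy sketch (field names invented, no Lucene types involved):

```java
import java.util.*;

// Toy version of top-N-per-group presentation: nothing about which docs
// matched changes; we only pick the best n hits within each group key
// (e.g. product category) for display.
public class TopNPerGroup {
    record Hit(String category, float score) {}

    static Map<String, List<Hit>> topN(List<Hit> hits, int n) {
        Map<String, List<Hit>> groups = new LinkedHashMap<>();
        for (Hit h : hits) {
            groups.computeIfAbsent(h.category(), k -> new ArrayList<>()).add(h);
        }
        for (List<Hit> g : groups.values()) {
            g.sort((a, b) -> Float.compare(b.score(), a.score())); // best first
            if (g.size() > n) g.subList(n, g.size()).clear();      // keep top n
        }
        return groups;
    }
    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("tv", 2.0f), new Hit("audio", 1.5f),
                                 new Hit("tv", 3.0f), new Hit("tv", 1.0f));
        System.out.println(topN(hits, 2));
    }
}
```

The real grouping module does this in collectors (generic two-pass, or single-pass over doc blocks), but the shape of the result -- groups of top hits -- is the same.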
[jira] [Created] (SOLR-2630) XsltUpdateRequestHandler
XsltUpdateRequestHandler Key: SOLR-2630 URL: https://issues.apache.org/jira/browse/SOLR-2630 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.0 Reporter: Upayavira Priority: Minor Fix For: 4.0 Attachments: xslt-update-handler.patch An update request handler that can accept a tr param, allowing the indexing of any XML content that is passed to Solr, so long as there is an XSLT stylesheet in solr/conf/xslt that can transform it to the <add><doc/></add> format. Could be used, for example, to allow Solr to ingest docbook directly, without any preprocessing.
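To make the tr idea concrete: the handler runs incoming XML through a stylesheet from solr/conf/xslt before indexing. A hedged illustration using plain JAXP -- the input element names and the stylesheet here are invented, and this is not the patch's code, just the kind of transform such a stylesheet would perform:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Transforms an arbitrary (made-up) XML document into Solr's
// <add><doc><field name="..."> update format via an XSLT stylesheet.
public class XsltToSolrDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/book'>" +
        "<add><doc>" +
        "<field name='title'><xsl:value-of select='title'/></field>" +
        "</doc></add>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static String transform(String inputXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty("omit-xml-declaration", "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    public static void main(String[] args) {
        System.out.println(transform("<book><title>Solr 3.3</title></book>"));
    }
}
```

The handler itself would additionally resolve the stylesheet by the tr parameter and feed the transform result to the normal XML update path.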
[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Upayavira updated SOLR-2630: Attachment: xslt-update-handler.patch Patch for XsltUpdateRequestHandler, along with a test case for it
Re: Issues with Grouping
On Thu, Jun 30, 2011 at 11:58 PM, Bill Bell billnb...@gmail.com wrote: I meant FC insanity. It does not appear to be an NPE. That's natural, and not a bug. Grouping always uses per-segment field cache entries, where faceting sometimes uses top level field caches. -Yonik http://www.lucidimagination.com
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058540#comment-13058540 ] Uwe Schindler commented on SOLR-2630: - XML is binary data, so you should not convert it to Strings. Ideally the already-transformed DOM tree or SAX stream would be passed directly to the importer. I know this is not easily possible, so the most correct way would be to pass the binary byte[] directly and reparse. I will try to investigate directly passing the SAX events / XSL DOM tree around, which is possible, as the transformer API can also directly pipe to StAX, used by the underlying XMLImporter.
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058541#comment-13058541 ] Uwe Schindler commented on SOLR-2630: - Also, you missed passing the content-type charset to the StreamSource. I will post an improved patch fixing both problems soon. Thanks for the patch!
[jira] [Assigned] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-2630: --- Assignee: Uwe Schindler
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don't want to (the code here is licensed under GPL): This is the source code for a specific version of one specific Java impl. If we knew all Java impls simply implemented the primitive case using System.arraycopy (admittedly it's hard to imagine that they wouldn't!) then we are fine. This is our implementation, which Simon replaced and Robert reverted (UnsafeByteArrayOutputStream): private void grow(int newLength) { // It actually should be: (Java 1.7, when its intrinsic on all machines) // buffer = Arrays.copyOf(buffer, newLength); byte[] newBuffer = new byte[newLength]; System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); buffer = newBuffer; } So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth? Right, in this case (if you used OpenJDK 6) we are obviously OK. Not sure about other cases... The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, which does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike, you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only a few of them). The idea was only to replace primitive-type code to make it more readable.
And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that's Dawid's and my reason to change to copyOf for primitive types). In general, if you use Java SDK methods that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic, has too many safety checks, or uses reflection. OK I'm convinced (I think!) that for primitive types only, let's use Arrays.copyOf! To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don't like the allocation strategy in it (it's exponential; on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy as in trunk. Then we can discuss this problem again when Simon & me want to change the ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward]. Well, it sounds like for primitive types, we can cut over the ArrayUtils.grow methods. Then we can look @ the nightly bench the next day ;) But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it (almost) a dup of ByteArrayDataOutput? The other thing I don't like in the new faceting module is the duplication of vint code. Why don't we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere? This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with an OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class. That sounds good! Uwe can you commit TODOs to the code w/ these ideas?
Mike
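For reference, the two grow variants being compared in this thread, plus the shape of an oversize-style growth policy (only the idea -- Lucene's actual ArrayUtil.oversize arithmetic differs, e.g. it also accounts for object alignment and pointer size):

```java
import java.util.Arrays;

// The two equivalent grow implementations discussed above, and a
// grow-by-an-eighth policy in the spirit of ArrayUtil.oversize.
public class Grow {
    // variant 1: explicit allocate + System.arraycopy (the reverted code)
    static byte[] growManual(byte[] buffer, int newLength) {
        byte[] newBuffer = new byte[newLength];
        System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
        return newBuffer;
    }
    // variant 2: Arrays.copyOf, which for primitive arrays a JIT can
    // compile down to the same intrinsic arraycopy
    static byte[] growCopyOf(byte[] buffer, int newLength) {
        return Arrays.copyOf(buffer, newLength);
    }
    // grow by ~1/8 over the requested minimum instead of doubling,
    // to avoid the exponential strategy Uwe objects to
    static int oversize(int minSize) {
        return minSize + (minSize >>> 3);
    }
    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        System.out.println(growManual(a, 5).length);  // 5
        System.out.println(growCopyOf(a, 5).length);  // 5
        System.out.println(oversize(64));             // 72
    }
}
```

For primitive arrays the two variants are behaviorally identical; the thread's concern was only the generic Object[] path inside Arrays.copyOf.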
[jira] [Commented] (LUCENE-2309) Fully decouple IndexWriter from analyzers
[ https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058548#comment-13058548 ] Michael McCandless commented on LUCENE-2309: Great! This will overlap w/ the field type work (we have a branch for this now), where we have already decoupled the indexer from concrete Field/Document impls, by adding a minimal IndexableField. I think this issue should further that, ie pare back IndexableField so that there's only a getTokenStream for indexing (ie the indexer will no longer try for String, then Reader, then TokenStream), and Analyzer must move to the FieldType and not be passed to IndexWriterConfig. Multi-valued fields will be tricky, since IW now asks the analyzer for the gaps... Fully decouple IndexWriter from analyzers - Key: LUCENE-2309 URL: https://issues.apache.org/jira/browse/LUCENE-2309 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 IndexWriter only needs an AttributeSource to do indexing. Yet, today, it interacts with Field instances, holds a private analyzer, invokes analyzer.reusableTokenStream, and has to deal with a wide variety of cases (it's not analyzed; it is analyzed but it's a Reader or String; it's pre-analyzed). I'd like to have IW only interact with attr sources that already arrived with the fields. This would be a powerful decoupling -- it means others are free to make their own attr sources. They need not even use any of Lucene's analysis impls; eg they can integrate to other things like [OpenPipeline|http://www.openpipeline.org]. Or make something completely custom. LUCENE-2302 is already a big step towards this: it makes IW agnostic about which attr is the term, and only requires that it provide a BytesRef (for flex).
Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the FieldType knows the analyzer to use, then we could simply create a getAttrSource() method (say) on it and move all the logic IW has today onto there. (We'd still need existing IW code for back-compat).
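The "only a getTokenStream" contract Mike describes can be sketched abstractly. The interface and names below are illustrative only, not the actual IndexableField API: the point is that the indexer consumes tokens without probing the field for String, then Reader, then TokenStream.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative pared-back field contract: the indexer can only ask for a
// token stream; how the tokens were produced (analyzer, pre-analysis,
// custom pipeline) is entirely the field's business.
interface MinimalIndexableField {
    String name();
    Iterator<String> tokenStream(); // stand-in for Lucene's TokenStream
}

public class FieldSketch implements MinimalIndexableField {
    private final String name;
    private final List<String> tokens;

    FieldSketch(String name, List<String> tokens) {
        this.name = name;
        this.tokens = tokens;
    }
    public String name() { return name; }
    public Iterator<String> tokenStream() { return tokens.iterator(); }

    public static void main(String[] args) {
        MinimalIndexableField f = new FieldSketch("body", List.of("hello", "world"));
        // the "indexer" consumes tokens with no knowledge of their origin
        f.tokenStream().forEachRemaining(System.out::println);
    }
}
```

This is the decoupling the issue asks for: nothing in the consumer depends on Field, Analyzer, or IndexWriterConfig.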
[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules
[ https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058549#comment-13058549 ] Michael McCandless commented on LUCENE-2883: +1, this is great Chris! Consolidate Solr & Lucene FunctionQuery into modules - Key: LUCENE-2883 URL: https://issues.apache.org/jira/browse/LUCENE-2883 Project: Lucene - Java Issue Type: Task Components: core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Chris Male Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2883.patch Spin-off from the [dev list | http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]
Re: revisit naming for grouping/join?
I think what would be best is a smallish but feature complete demo, For the nested stuff I had a reasonable demo on LUCENE-2454 that was based around resumes - that use case has the one-to-many characteristics that lend themselves to nesting, e.g. a person has many different qualifications and records of employment. This scenario was illustrated here: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene I also had the book search type scenario where a book has many sections and, for the purposes of efficient highlighting/summarisation, these sections were treated as child docs which could be read quickly (rather than highlighting a whole book). I'm not sure what the parent was in your doctor and cities example, Mike. If a doctor is in only one city then there is no point making city a child doc, as the one city's info can happily be combined with the doctor info into a single document with no conflict (doctors have different properties to cities). If the city is the parent with many child doctor docs that makes more sense, but feels like a less likely use case, e.g. find me a city with doctor x and a different doctor y. Searching for a person with excellent java and preferably good lucene skills feels like a more real-world example. It feels like documenting some of the trade-offs behind index design choices is useful too, e.g. nesting is not too great for very volatile content with constantly changing children, while search-time join is more costly in RAM and 2-pass processing Cheers Mark - Original Message From: Michael McCandless luc...@mikemccandless.com To: dev@lucene.apache.org Sent: Fri, 1 July, 2011 13:51:04 Subject: Re: revisit naming for grouping/join? I think joining and grouping are two different functions, and we should keep different modules for them...
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058552#comment-13058552 ] Upayavira commented on SOLR-2630: - Great! I was sure I'd missed stuff. Happy to improve stuff here too (e.g. port to 3.x).
Re: revisit naming for grouping/join?
On Fri, Jul 1, 2011 at 8:51 AM, Michael McCandless luc...@mikemccandless.com wrote: The join module does currently depend on the grouping module, but for a silly reason: just for the TopGroups, to represent the returned hits. We could move TopGroups/GroupDocs into common (thus justifying its generic name!)? Then both join and grouping modules depend on common. Just a suggestion: maybe they belong in the lucene core? And maybe the stuff in the common module belongs in lucene core's util package? I guess I'm suggesting we try to keep our modules as flat as possible, with as few dependencies as possible. I think we really already have a 'common' module, that's the lucene core. If multiple modules end up relying upon the same functionality, especially if it's something simple like an abstract class (Analyzer) or a utility thing (these mutable integers, etc), then that's a good sign it belongs in core APIs. I think we really need to try to nuke all these dependencies between modules: it's great to add them as a way to get refactoring started, but ultimately we should try to clean up: because we don't want a complex 'graph' of dependencies but instead something dead-simple. I made a total mess with the analyzers module at first, I think everything depended on it! But now we have nuked almost all dependencies on this thing, except for where it makes sense to have that concrete dependency (benchmark, demo, solr). I think what would be best is a smallish but feature complete demo, ie pull together some easy-to-understand sample content and the build a small demo app around it. We could then show how to use grouping for field collapsing (and for other use cases), joining for nested docs (and for other use cases), etc. For the same reason listed above, I think we should take our contrib/demo and consolidate 'examples' across various places into this demo module.
The reason is:
* Examples typically depend upon 'concrete' stuff, but in general core stuff should work around interfaces/abstract classes: e.g. the faceting module has an analyzers dependency only because of its examples.
* Examples might want to integrate modules, e.g. an example of how to integrate faceting and grouping or something like that.
* Examples are important: I think if the same question comes up on the user list often, we should consider adding an example.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #168: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/168/ No tests ran. Build Log (for compile errors): [...truncated 7442 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2630: Attachment: xslt-update-handler.patch Here is an improved patch. This impl does not internally serialize the XML again to a stream and read it back using StAX; instead it uses the XSL result tree fragment (RTF), which XSL transformers always build as a DOM tree, and feeds that to StAX, so we don't need any redundant serialize/deserialize step in between. This patch also respects the content-type parameter of the input, like XMLLoader does. The intermediate buffering is needed because we have to switch from a push API to a pull API. The patch also fixes a small issue in XSLTResponseWriter, which failed to correctly log transformation warn/error events to slf4j.
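The core tr-param idea discussed in this issue — run an XSLT stylesheet from solr/conf/xslt over arbitrary incoming XML so it comes out in Solr's `<add><doc/></add>` update format — can be sketched with the JDK's built-in TrAX API. Everything below (the class name, the stylesheet, the field names) is illustrative, not Solr's actual code:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltToSolr {
    // Hypothetical stylesheet mapping a simple <book> record to Solr's update format.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/book'>"
      + "<add><doc>"
      + "<field name='id'><xsl:value-of select='@id'/></field>"
      + "<field name='title'><xsl:value-of select='title'/></field>"
      + "</doc></add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to the incoming XML, producing Solr update XML.
    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<book id='1'><title>DocBook Basics</title></book>"));
    }
}
```

Uwe's patch goes further by feeding the transformer's DOM result tree straight to StAX instead of re-serializing to a string as this sketch does.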
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
@Rory, @All, The only tickets I currently have for those are LUCENE-419 and LUCENE-418. 418 I should be able to push into the 2.9.4g branch tonight. 419 is a long-term goal and not as important as getting the tests fixed, or having the tests broken down into what is actually a unit test, functional test, perf test or long-running test. I can get into more of why that needs to be done. I'll also need to make document the what build script currently does on the wiki and make a few notes about testing, like using the RAMDirectory, etc. Things that need to get done or even be discussed:
* There needs to be a running list of things to do/not to do with testing. I don't know if this goes in a jira or if we keep a running list on the wiki or site for people to pick up and help with.
* Tests need to run on mono and not fail (there is a good deal of failing tests on mono, mostly due to the temp directory having C:\ in the path).
* Assert.Throws<ExceptionType>() needs to be used instead of try/catch + Assert.Fail.
* File path combines to the temp directory need helper methods; e.g., having this in a hundred places is bad: new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get("tempDir", ""), "testIndex"));
* We should still be testing deprecated methods, but we need to use #pragma warning disable/restore 0618 for testing those; otherwise compiler warnings are too numerous to be anywhere near helpful.
* We should only be using deprecated methods in places where they are being explicitly tested; other tests that need that functionality in order to validate should be refactored to use methods that are not deprecated.
* Identify code that could be abstracted into test utility classes.
* Infrastructure validation tests need to be made for anything that seems like infrastructure, e.g. does the temp directory exist, do the folders that the tests use inside the temp directory exist, can we read/write to those folders (if a ton of tests fail due to the file system, we should be able to point out that it was due to permissions or missing folders, files, etc.).
* Identify what classes need an interface, abstract class, or to be inherited in order to create testing mocks (once those classes are created, they should be documented in the wiki).
* Assert.Throws needs to replace stuff like the following. We should also be checking the messages of exceptions, to make sure they make sense and can help users fix issues if the exceptions are aimed at library users: try { d = DateTools.StringToDate("97"); // no date Assert.Fail(); } catch (System.FormatException e) { /* expected exception */ }
On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire codekai...@gmail.com wrote: So, veering towards action - are there concrete tasks written up anywhere for the unit tests? If a poor schlep like me wanted to dig in and start to improve them, where would I get the understanding of what is good and what needs help? -r On Thu, Jun 30, 2011 at 3:29 PM, Digy digyd...@gmail.com wrote: I can't say I like this approach, but till we find an automated way (with good results), it seems to be the only way we can use. DIGY -Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Friday, July 01, 2011 12:43 AM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed? Scott - The idea of the automated port is still worth doing. Perhaps it makes sense for someone more passionate about the line-by-line idea to do that work? I would say, focus on what makes sense to you. Being productive, regardless of the specific direction, is what will be most valuable. Once you start, others will join and momentum will build. That is how these things work. I like DIGY's approach too, but the problem with it is that it is a never-ending manual task. The theory behind the automated port is that it may reduce the manual work.
It is complicated, but once it's built and works, it will save a lot of future development hours. If it's built in a sufficiently general manner, it could be useful for other projects like Lucene.Net that want to automate a port from Java to C#. It might make sense for that to be a separate project from Lucene.Net though. -T On Thu, Jun 30, 2011 at 2:13 PM, Scott Lombard lombardena...@gmail.com wrote: Ok, I think I asked the wrong question. I am trying to figure out where to put my time. I was thinking about working on the automated porting system, but when I saw the response to the .NET 4.0 discussions I started to question if that is the right direction. The community seemed to be more interested in the .NET features. The complexity of the automated tool is going to become very high and will probably end up with a line-for-line style port. So I keep asking myself: is the automated tool worth it? I don't think it is. I like the
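The Assert.Throws point in the checklist above can be illustrated with a tiny hand-rolled equivalent. The thread's code is C#/NUnit; this plain-Java sketch only shows the pattern (the helper below is hypothetical, not the NUnit API), replacing the try { ...; Assert.Fail(); } catch { } idiom with a single call that also hands back the exception so its message can be checked:

```java
public class ThrowsDemo {
    public interface Thunk { void run() throws Exception; }

    // Run body and assert it throws the expected exception type;
    // return the exception so the caller can inspect its message.
    public static <T extends Exception> T assertThrows(Class<T> expected, Thunk body) {
        try {
            body.run();
        } catch (Exception e) {
            if (expected.isInstance(e)) return expected.cast(e);
            throw new AssertionError("expected " + expected.getName() + " but got " + e);
        }
        throw new AssertionError("expected " + expected.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        // One line instead of try/catch + Assert.Fail, and the message is available too.
        NumberFormatException e =
            assertThrows(NumberFormatException.class, () -> Integer.parseInt("97x"));
        System.out.println("caught: " + e.getMessage());
    }
}
```

NUnit's real Assert.Throws works the same way: it fails the test when nothing (or the wrong type) is thrown, and returns the caught exception for further assertions.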
[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2630: Affects Version/s: 3.3 Fix Version/s: 3.4 Merging to 3.x should be simple, too!
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
* need to document what the build script does. whut grammerz? On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon mhern...@wickedsoftware.net wrote: [...]
RE: revisit naming for grouping/join?
On 7/1/2011 at 10:02 AM, Robert Muir wrote: [...] I think we should take our contrib/demo and consolidate 'examples' across various places into this demo module. [...] +1
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
About the encoders package - there are several encoders there besides VInt, so I wouldn't dispose of it so quickly. That said, I think we should definitely explore consolidating VInt with the core classes, and maybe write an encoder which delegates to them. Or come up w/ a different approach for allowing different Encoders to be plugged in. I don't rule out anything, as long as we preserve functionality and capabilities. Shai On Friday, July 1, 2011, Michael McCandless luc...@mikemccandless.com wrote: On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me if you don't want to see src.jar code from OpenJDK Java 6 - just delete this mail if you don't want to (the code here is licensed under GPL): This is the source code for a specific version of one specific Java impl. If we knew all Java impls simply implemented the primitive case using System.arraycopy (admittedly it's hard to imagine that they wouldn't!) then we are fine. This is our implementation, which Simon replaced and Robert reverted (UnsafeByteArrayOutputStream): private void grow(int newLength) { // It actually should be: (Java 1.7, when it's intrinsic on all machines) // buffer = Arrays.copyOf(buffer, newLength); byte[] newBuffer = new byte[newLength]; System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); buffer = newBuffer; } So please look at the code: where is a difference that could slow it down, except the Math.min() call, which is an intrinsic in almost every JDK on earth? Right, in this case (if you used OpenJDK 6) we are obviously OK. Not sure about other cases... The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed.
This is why we replaced Collections.sort() by CollectionUtil.sort, which does no array copy. Simon & I were not willing to replace the reallocations in FST code (Mike, you remember, we reverted that on your GIT repo when we did perf tests) and other parts of Lucene (there are only a few of them). The idea was only to replace primitive-type code to make it more easily readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that's Dawid's and my reason to change to copyOf for primitive types). In general, if you use Java SDK methods that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic, has too many safety checks, or uses reflection. OK I'm convinced (I think!) that for primitive types only, let's use Arrays.copyOf! To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don't like the allocation strategy in it (it's exponential; on every grow it doubles its size). We should change that to use ArrayUtil.grow() and ArrayUtil.oversize(), to have a similar allocation strategy as in trunk. Then we can discuss this problem again when Simon & I want to change the ArrayUtil.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward]. Well, it sounds like for primitive types, we can cut over the ArrayUtil.grow methods. Then we can look @ the nightly bench the next day ;) But I agree we should fix UnsafeByteArrayOutputStream... or isn't it (almost) a dup of ByteArrayDataOutput? The other thing I don't like in the new faceting module is the duplication of vint code. Why don't we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere? This would be much more streamlined with all the code we currently have.
Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with an OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class. That sounds good! Uwe, can you commit TODOs to the code w/ these ideas? Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
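The consensus above — use Arrays.copyOf for primitive arrays, and an ArrayUtil-style oversize growth policy instead of doubling on every grow — can be sketched as follows. oversize() here is a simplified stand-in for Lucene's ArrayUtil.oversize (it grows by roughly 1/8 past the minimum), not its actual alignment-aware logic:

```java
import java.util.Arrays;

public class GrowDemo {
    // Simplified oversize: the requested minimum plus ~12.5% headroom,
    // so repeated grow() calls amortize without doubling memory each time.
    public static int oversize(int minSize) {
        return minSize + (minSize >> 3) + 3;
    }

    // Grow a primitive array to at least minSize elements.
    // For primitive arrays, Arrays.copyOf is equivalent to
    // new byte[n] + System.arraycopy, and JDKs may intrinsify it.
    public static byte[] grow(byte[] buffer, int minSize) {
        if (buffer.length >= minSize) return buffer;
        return Arrays.copyOf(buffer, oversize(minSize));
    }

    public static void main(String[] args) {
        byte[] b = new byte[4];
        b = grow(b, 100);
        System.out.println(b.length); // oversize(100) = 115
    }
}
```

This matches the thread's conclusion: Java SDK methods are preferable for primitives since later JDKs can only make them faster, while the exponential doubling in UnsafeByteArrayOutputStream is the part worth replacing.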
[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads
[ https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058626#comment-13058626 ] Hoss Man commented on SOLR-2623: Grr... right, right. ObjectInstance != MBean. Solr JMX MBeans do not survive core reloads --- Key: SOLR-2623 URL: https://issues.apache.org/jira/browse/SOLR-2623 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 1.4, 1.4.1, 3.1, 3.2 Reporter: Alexey Serba Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch Solr JMX MBeans do not survive core reloads {noformat:title=Steps to reproduce} sh cd example sh vi multicore/core0/conf/solrconfig.xml # enable jmx sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar sh echo 'open 8842 # 8842 is java pid domain solr/core0 beans ' | java -jar jmxterm-1.0-alpha-4-uber.jar solr/core0:id=core0,type=core solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler ... 
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler sh curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0' sh echo 'open 8842 # 8842 is java pid domain solr/core0 beans ' | java -jar jmxterm-1.0-alpha-4-uber.jar # there's only one bean left after Solr core reload solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main {noformat} The root cause of this is Solr core reload behavior: # create new core (which overwrites existing registered MBeans) # register new core and close old one (we remove/un-register MBeans on oldCore.close) The correct sequence is: # unregister MBeans from old core # create and register new core # close old core without touching MBeans -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
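The corrected reload sequence can be sketched against the JDK's standard javax.management API. The Demo bean and ObjectName below are illustrative stand-ins for Solr's MBeans, not its actual code; the point is the ordering: unregister the old core's beans, register the new core's, then close the old core without touching MBeans:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ReloadDemo {
    // Standard MBean pattern: class Demo exposes interface DemoMBean.
    public interface DemoMBean { String getName(); }
    public static class Demo implements DemoMBean {
        private final String name;
        public Demo(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static String reload() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName id = new ObjectName("solr/core0:id=demo,type=core");
            if (server.isRegistered(id)) server.unregisterMBean(id);

            // The "old core" has its bean registered.
            server.registerMBean(new Demo("old"), id);

            // Correct reload order: 1) unregister the old core's beans...
            server.unregisterMBean(id);
            // ...2) create and register the new core's beans...
            server.registerMBean(new Demo("new"), id);
            // ...3) close the old core without touching MBeans.

            return (String) server.getAttribute(id, "Name");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reload()); // prints "new"
    }
}
```

The buggy sequence registers the new bean first, so the old core's close() then unregisters it — which is why only one bean survives the reload in the reproduction above.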
[jira] [Resolved] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved SOLR-2630. - Resolution: Fixed Committed trunk revision: 1141999 Committed 3.x revision: 1142003 Thanks Upayavira, the idea is great and also of use for myself (if PANGAEA/panFMP moves to Solr, but since we now have faceting in Lucene I don't think we will take that step)!
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058679#comment-13058679 ] Hoss Man commented on SOLR-2630: Hmmm... from a user perspective does it really make sense for this to be an entirely new RequestHandler? wouldn't it make more sense if users could just continue to use XmlUpdateRequestHandler along with a tr param indicating the transform to apply first?
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058681#comment-13058681 ] Uwe Schindler commented on SOLR-2630: - I was thinking about that; it would be easy to implement, as the current code would simply be moved to XMLLoader. Should I add a patch relative to what's currently committed?
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058690#comment-13058690 ] Uwe Schindler commented on SOLR-2630: - On the other hand, this one is similar to XSLTResponseWriter, which is also separate from XMLResponseWriter. XMLResponseWriter could also take an optional tr param and then transform? So the current solution is more consistent.
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058705#comment-13058705 ] Upayavira commented on SOLR-2630: - I considered the same thing, making XmlUpdateRequestHandler accept tr, but opted not to for the same reason as Uwe. Whichever way, consistency is a good thing!
[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058730#comment-13058730 ] Hoss Man commented on LUCENE-3272: -- single module != single jar ... correct? someone writing a small form factor app that wants to use the basic Lucene QueryParser shouldn't have to load a jar containing every query parser provided by solr (and all of the dependencies they have) Consolidate Lucene's QueryParsers into a module --- Key: LUCENE-3272 URL: https://issues.apache.org/jira/browse/LUCENE-3272 Project: Lucene - Java Issue Type: Improvement Components: modules/queryparser Reporter: Chris Male Lucene has a lot of QueryParsers and we should have them all in a single consistent place. The following are QueryParsers I can find that warrant moving to the new module: - Lucene Core's QueryParser - AnalyzingQueryParser - ComplexPhraseQueryParser - ExtendableQueryParser - Surround's QueryParser - PrecedenceQueryParser - StandardQueryParser - XML-Query-Parser's CoreParser All seem to do a good job at their kind of parsing, with extensive tests. One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
@Michael - Should that list be in JIRA? It would be easier to manage, I think... If yes, I'll happily do it. -r On Fri, Jul 1, 2011 at 8:04 AM, Michael Herndon mhern...@wickedsoftware.net wrote: * need to document what the build script does. whut grammerz? On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon mhern...@wickedsoftware.net wrote: @Rory, @All, The only tickets I currently have for those is LUCENE-419, LUCENE-418 418, I should be able to push into the 2.9.4g branch tonight.419 is a long term goal and not as important as getting the tests fixed, of have the tests broken down into what is actually a unit test, functional test, perf or long running test. I can get into more why it needs to be done. I'll also need to make document the what build script currently does on the wiki and make a few notes about testing, like using the RAMDirectory, etc. Things that need to get done or even be discussed. * There needs to be a running list of things to do/not to do with testing. I don't know if this goes in a jira or do we keep a running list on the wiki or site for people to pick up and help with. * Tests need to run on mono and not Fail (there is a good deal of failing tests on mono, mostly due to the temp directory have the C:\ in the path). * Assert.ThrowExceptionType() needs to be used instead of Try/Catch Assert.Fail. ** * File Path combines to the temp directory need helper methods, * e,g, having this in a hundred places is bad new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get(tempDir, ), testIndex)); * We should still be testing deprecated methods, but we need to use #pragma warning disable/enable 0618 for testing those. otherwise compiler warnings are too numerous to be anywhere near helpful. * We should only be using deprecated methods in places where they are being explicitly tested, other tests that need that functionality in order to validate those tests should be re factored to use methods that are not deprecated. 
* Identify code that could be abstracted into test utility classes. * Infrastructure Validation tests need to be made, anything that seems like infrastructure. e.g. does the temp directory exist, does the folders that the tests use inside the temp directory exist, can we read/write to those folders. (if a ton of tests fail due to the file system, we should be able to point out that it was due to permissions or missing folders, files, etc). * Identify what classes need an interface, abstract class or inherited in order to create testing mocks. (once those classes are created, they should be documented in the wiki). ** Asset.Throws needs to replace stuff like the following. We should also be checking the messages for exceptions and make sure they make sense and can help users fix isses if the exceptions are aimed at the library users. try { d = DateTools.StringToDate(97); // no date Assert.Fail(); } catch (System.FormatException e) { /* expected exception */ } On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire codekai...@gmail.com wrote: So, veering towards action - are there concrete tasks written up anywhere for the unit tests? If a poor schlep like me wanted to dig in and start to improve them, where would I get the understanding of what is good and what needs help? -r On Thu, Jun 30, 2011 at 3:29 PM, Digy digyd...@gmail.com wrote: I can not say I like this approach, but till we find an automated way(with good results), it seems to be the only way we can use. DIGY -Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Friday, July 01, 2011 12:43 AM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed? Scott - The idea of the automated port is still worth doing. Perhaps it makes sense for someone more passionate about the line-by-line idea to do that work? I would say, focus on what makes sense to you. Being productive, regardless of the specific direction, is what will be most valuable. 
Once you start, others will join and momentum will build. That is how these things work. I like DIGY's approach too, but the problem with it is that it is a never-ending manual task. The theory behind the automated port is that it may reduce the manual work. It is complicated, but once it's built and works, it will save a lot of future development hours. If it's built in a sufficiently general manner, it could be useful for other projects like Lucene.Net that want to automate a port from Java to C#. It might make sense for that to be a separate project from Lucene.Net, though. -T On Thu, Jun 30, 2011 at 2:13 PM, Scott Lombard lombardena...@gmail.com wrote: Ok I think I asked the wrong
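The Assert.Throws suggestion in the thread above can be illustrated framework-free. A minimal sketch in Java terms (the thread's actual target is NUnit's Assert.Throws<T> in Lucene.Net's C# tests; the helper name `throwsExpected` and the NumberFormatException stand-in here are illustrative, not part of either project's API):

```java
// Framework-free sketch of an Assert.Throws-style helper: it replaces the
// try { ...; Assert.Fail(); } catch (ExpectedException) {} idiom with a
// single call that also fails when the *wrong* exception type is thrown.
public class ThrowsHelper {
    /** Returns true only if running `action` throws an instance of `expected`. */
    public static boolean throwsExpected(Class<? extends Throwable> expected, Runnable action) {
        try {
            action.run();
            return false; // no exception was thrown at all
        } catch (Throwable t) {
            return expected.isInstance(t); // wrong exception type also fails
        }
    }

    public static void main(String[] args) {
        boolean ok = throwsExpected(NumberFormatException.class,
                () -> Integer.parseInt("not a number")); // invalid input, like StringToDate("97")
        System.out.println(ok ? "PASS" : "FAIL"); // prints PASS
    }
}
```

The same shape also centralizes the exception-message checks the thread asks for: the helper is one place to add an assertion on the thrown message.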
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
I think whatever makes sense to do. Possibly create one JIRA for now with a running list that can be modified and, as people pull from that list, cross things off or create a separate ticket that links back to the main one.
[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058769#comment-13058769 ] Robert Muir commented on LUCENE-3272: - single jar, but you can customize: it's open source. Hoss, I think you are looking at this the wrong way: this actually makes it way easier for someone writing a small form factor app that uses no query parser at all, or their own queryparser, or whatever. We should do this to make the Lucene core smaller, and then you plug in the modules you need (and maybe only selected parts from them, but that's your call). I don't think we need to provide X * Y * Z possibilities, nor do we need to provide 87 jar files. But this is just rehashing LUCENE-2323, where we already had this conversation. I think at the least we should put all these QPs into one place to make refactoring between them easier. Then we make a smaller amount of code for these small form factor apps you are so concerned about; with the messy duplication this is not possible now. I still stand by my comments in LUCENE-2323, and guess what, turns out I think I was right. LUCENE-1938 then refactored one of these queryparsers, removing 4000 lines of code but keeping the same functionality. Consolidate Lucene's QueryParsers into a module --- Key: LUCENE-3272 URL: https://issues.apache.org/jira/browse/LUCENE-3272 Project: Lucene - Java Issue Type: Improvement Components: modules/queryparser Reporter: Chris Male Lucene has a lot of QueryParsers and we should have them all in a single consistent place. The following are QueryParsers I can find that warrant moving to the new module: - Lucene Core's QueryParser - AnalyzingQueryParser - ComplexPhraseQueryParser - ExtendableQueryParser - Surround's QueryParser - PrecedenceQueryParser - StandardQueryParser - XML-Query-Parser's CoreParser All seem to do a good job at their kind of parsing with extensive tests. 
One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Updated] (LUCENENET-400) Evaluate tooling for continuous integration server
[ https://issues.apache.org/jira/browse/LUCENENET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael herndon updated LUCENENET-400: -- Due Date: 30/Sep/11 (was: 28/Feb/11) Evaluate tooling for continuous integration server -- Key: LUCENENET-400 URL: https://issues.apache.org/jira/browse/LUCENENET-400 Project: Lucene.Net Issue Type: Task Components: Build Automation, Project Infrastructure Reporter: Troy Howard Assignee: michael herndon We would like to have a CI server setup for Lucene.Net. It has been suggested to do this outside of the ASF infrastructure, but this would not work for ASF. Please review the available options at http://ci.apache.org/ and evaluate which CI server system would be preferred for our setup. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-418) LuceneTestCase should not have a static method that could throw exceptions.
[ https://issues.apache.org/jira/browse/LUCENENET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058781#comment-13058781 ] michael herndon commented on LUCENENET-418: --- r1132085 under the Lucene.Net_2_9_4g branch. The exception was removed. The static constructor still exists, but will be re-factored out at a later date. The paths for the TestBackwardsCompatability tests were also fixed. LuceneTestCase should not have a static method that could throw exceptions. Key: LUCENENET-418 URL: https://issues.apache.org/jira/browse/LUCENENET-418 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Test Affects Versions: Lucene.Net 3.x Environment: Linux, OSX, etc Reporter: michael herndon Assignee: michael herndon Labels: test Original Estimate: 2m Remaining Estimate: 2m Throwing an exception from a static method in a base class for 90% of the tests makes it hard to debug the issue in nunit. The test results came back saying that TestFixtureSetup was causing an issue even though it was the static constructor causing problems, and this then propagates to all the tests that stem from LuceneTestCase. The TEMP_DIR needs to be moved to a static util class as a property or even a mixin method. This cost me hours to debug and figure out the real issue, as the underlying exception message never bubbled up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
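A small sketch of why exceptions during static initialization are so painful to debug, the trap LUCENENET-418 describes. This is the Java analogue (ExceptionInInitializerError) of .NET's TypeInitializationException wrapping; all class and member names below are invented for illustration:

```java
// An exception thrown while a class is being statically initialized reaches
// the caller as a wrapper Error, so a test runner reports the wrapper rather
// than the root cause -- exactly the "never bubbled up" complaint above.
public class StaticInitTrap {
    static class FragileBase {
        static final String TEMP_DIR;
        static {
            TEMP_DIR = findTempDir(); // blows up during class initialization
        }
        static String findTempDir() {
            throw new IllegalStateException("tempDir setting missing");
        }
    }

    /** Returns the name of what the caller actually sees, not the root cause. */
    public static String classifyFailure() {
        try {
            return FragileBase.TEMP_DIR;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Prints an *Error wrapper name, not "IllegalStateException".
        System.out.println(classifyFailure());
    }
}
```

Moving TEMP_DIR resolution into an ordinary method or property, as the ticket suggests, lets the real exception surface at the call site instead.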
[jira] [Created] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itself
PingRequestHandler can infinite loop if called with a qt that points to itself --- Key: SOLR-2631 URL: https://issues.apache.org/jira/browse/SOLR-2631 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 3.2, 3.1, 1.4, 3.3 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.4, 4.0 We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
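The guard the issue calls for amounts to rejecting a qt value that resolves back to the handler currently executing, so /admin/ping can never be asked to ping itself. A hypothetical sketch, not Solr's actual fix or API (the names `handlerPath` and `isSelfReference` are invented for illustration):

```java
// Minimal self-reference check: before delegating to the handler named by
// the qt parameter, refuse the request if qt points back at this handler.
public class PingGuard {
    public static boolean isSelfReference(String handlerPath, String qt) {
        return qt != null && qt.equals(handlerPath);
    }

    public static void main(String[] args) {
        System.out.println(isSelfReference("/admin/ping", "/admin/ping")); // true  -> reject request
        System.out.println(isSelfReference("/admin/ping", "/select"));     // false -> safe to delegate
    }
}
```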
[jira] [Commented] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itself
[ https://issues.apache.org/jira/browse/SOLR-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058785#comment-13058785 ] Uwe Schindler commented on SOLR-2631: - Edoardo Tosca, who reported the issue, gave the following workaround for solrconfig.xml to fix this by configuration:

{quote}
Ok, to solve the Ping problem you can add an invariant:

  <lst name="defaults">
    <str name="q">solrpingquery</str>
    <str name="echoParams">all</str>
  </lst>
  <lst name="invariants">
    <str name="qt">search</str>
  </lst>

in this case you avoid generating recursive calls to the /admin/ping handler Edo
{quote}

PingRequestHandler can infinite loop if called with a qt that points to itself --- Key: SOLR-2631 URL: https://issues.apache.org/jira/browse/SOLR-2631 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 1.4, 3.1, 3.2, 3.3 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.4, 4.0 We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itself
[ https://issues.apache.org/jira/browse/SOLR-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2631: Description: We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/solr/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. was: We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. 
PingRequestHandler can infinite loop if called with a qt that points to itself --- Key: SOLR-2631 URL: https://issues.apache.org/jira/browse/SOLR-2631 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 1.4, 3.1, 3.2, 3.3 Reporter: Uwe Schindler Assignee: Uwe Schindler Labels: security Fix For: 3.4, 4.0 Attachments: SOLR-2631.patch We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/solr/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2429) ability to not cache a filter
[ https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2429. Resolution: Fixed Fix Version/s: 3.4 ability to not cache a filter - Key: SOLR-2429 URL: https://issues.apache.org/jira/browse/SOLR-2429 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 3.4 Attachments: SOLR-2429.patch A user should be able to add {!cache=false} to a query or filter query. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058836#comment-13058836 ] Mitsu Hadeishi commented on SOLR-2462: -- Oh now you tell us. :) Well, we already built the patched 3.2 so we're going with that for now :) Using spellcheck.collate can result in extremely high memory usage -- Key: SOLR-2462 URL: https://issues.apache.org/jira/browse/SOLR-2462 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.1 Reporter: James Dyer Assignee: Robert Muir Priority: Critical Fix For: 3.3, 4.0 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010. We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen as occasionally a user will accidentally paste the URL into the Search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
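Back-of-the-envelope arithmetic for the failure mode described above: with spellcheck.count=15 and ~12 misspelled terms, enumerating every correction combination yields 15^12 candidates, far too many to rank in memory. A small sketch (the independent-combination assumption is mine, matching the "every possible correction combination" description):

```java
// Models the collation blowup: if the spellchecker returns `perTerm`
// corrections for each of `terms` misspelled words, the number of candidate
// collations multiplies out, one factor per misspelled term.
public class CollationBlowup {
    public static long combinations(long perTerm, int terms) {
        long total = 1;
        for (int i = 0; i < terms; i++) {
            total *= perTerm;
        }
        return total;
    }

    public static void main(String[] args) {
        // spellcheck.count=15 with ~12 misspelled terms, as in the report:
        System.out.println(combinations(15, 12)); // 129746337890625 (~1.3e14)
    }
}
```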
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058844#comment-13058844 ] Stefan Matheis (steffkes) commented on SOLR-2399: - So, after a few hours hacking .. it's hopefully a step in the right direction for the Analysis-Page! Please have a look and let me know what you're thinking. I've changed various things:

* Vertical Separation should be clearer now (Index- vs. Query-Time)
* Filter- and Tokenizer-Names are placed on the left Side (so it should be easier to follow each token through all the steps, Full Name on MouseOver)
* Property-Names are no longer abbreviated
* All Properties (except {{match}} and {{positionHistory}}) are displayed
* If the Property-Name contains a #-Sign, only the latter part is displayed (Full Name on MouseOver)

Uwe, maybe you could give it a try w/ lucene-gosen? These are the [required changes|https://github.com/steffkes/solr-admin/commit/ddb1e0098efc2ef48082e43ed57e4b62b23ba6d7] (since the last svn-commit). 
|| ||Version 1980||First Try||Current Page||
||Normal|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_cur.png]|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_01.png]|*[Screenshot|http://files.mathe.is/solr-admin/04_analysis.png]*|
||Verbose|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose_cur.png]|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose_01.png]|*[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose.png]*|

Stefan

Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: 
http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058848#comment-13058848 ] Lance Norskog commented on SOLR-1499: - Ahmet - are you still using this? SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ - Key: SOLR-1499 URL: https://issues.apache.org/jira/browse/SOLR-1499 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Lance Norskog Fix For: 3.3 Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch The SolrEntityProcessor queries an external Solr instance. The Solr documents returned are unpacked and emitted as DIH fields. The SolrEntityProcessor uses the following attributes: * solr='http://localhost:8983/solr/sms' ** This gives the URL of the target Solr instance. *** Note: the connection to the target Solr uses the binary SolrJ format. * query='Jefferson&sort=id+asc' ** This gives the base query string used with Solr. It can include any standard Solr request parameter. This attribute is processed under the variable resolution rules and can be driven in an inner stage of the indexing pipeline. * rows='10' ** This gives the number of rows to fetch per request. ** The SolrEntityProcessor always fetches every document that matches the request. * fields='id,tag' ** This selects the fields to be returned from the Solr request. ** These must also be declared as field elements. ** As with all fields, template processors can be used to alter the contents to be passed downwards. * timeout='30' ** This limits the query to 30 seconds. This can be used as a fail-safe to prevent the indexing session from freezing up. By default the timeout is 5 minutes. Limitations: * Solr errors are not handled correctly. * Loop control constructs have not been tested. * Multi-valued returned fields have not been tested. 
The unit tests give examples of how to use it as the root entity and an inner entity. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058865#comment-13058865 ] Stefan Matheis (steffkes) commented on SOLR-2399: - First Feedback from #solr: {quote}hossman_ in the new ones, it's easy to overlook that sonic and viewsonic are at the same position hossman_ steffkes: i think what i would suggest is to keep your new layout, treat position as special and put some sort of visual indicator on terms that are at the same position{quote} {quote}hossman_ oh yeah ... i ment to ask about that ... i'm assuming you look at the capital letters in the class name? hossman_ i think it's an assume way to save space, definitely a good idea for verbose==false (as long as you can mouse over it or something to see the full name) hossman_ for verbose==true ... not sure{quote} {quote}_regarding the two-column-layout / Index- vs. Query-Time_: elyograg I see now. I think they might need headers to indicate which is which. 
Not strictly required if your screen is wide enough, but if it wraps below, it may not be immediately apparent.{quote} Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058879#comment-13058879 ] Robert Muir commented on SOLR-2399: --- {quote} Uwe, maybe you could give it a try w/ lucene-gosen? These are the required changes (since the last svn-commit). {quote} If Uwe doesn't have the time, I'll try to investigate this in the next few days, once I stop laughing about Version 1980. we have a version that works with trunk here, https://lucene-gosen.googlecode.com/svn/branches/4x Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] 
(SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk - Build # 1612 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1612/ No tests ran. Build Log (for compile errors): [...truncated 9445 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
My thinking is just a separate ticket for each one. This makes the work easier to manage, gives a better sense of how much work is left, and makes it easier to prioritize independent issues. We could link all the sub-issues to a single task / feature / whatever (that is, if JIRA has that capability). -r

On Fri, Jul 1, 2011 at 12:48 PM, Michael Herndon mhern...@wickedsoftware.net wrote: I think whatever makes sense to do. Possibly create one JIRA issue for now with a running list that can be modified, and as people pull from that list, cross things off or create a separate ticket that links back to the main one.

On Fri, Jul 1, 2011 at 3:35 PM, Rory Plaire codekai...@gmail.com wrote: @Michael - Should that list be in JIRA? It would be easier to manage, I think... If yes, I'll happily do it. -r

On Fri, Jul 1, 2011 at 8:04 AM, Michael Herndon mhern...@wickedsoftware.net wrote: * need to document what the build script does. whut grammerz?

On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon mhern...@wickedsoftware.net wrote: @Rory, @All, The only tickets I currently have for those are LUCENE-419 and LUCENE-418. 418 I should be able to push into the 2.9.4g branch tonight. 419 is a long-term goal and not as important as getting the tests fixed, or having the tests broken down into what is actually a unit test, functional test, perf test, or long-running test. I can get into more of why it needs to be done. I'll also need to document what the build script currently does on the wiki and make a few notes about testing, like using the RAMDirectory, etc.

Things that need to get done or even be discussed:

* There needs to be a running list of things to do/not to do with testing. I don't know if this goes in a JIRA issue, or if we keep a running list on the wiki or site for people to pick up and help with.
* Tests need to run on mono and not fail (there is a good deal of failing tests on mono, mostly due to the temp directory having C:\ in the path).
* Assert.Throws<ExceptionType>() needs to be used instead of try/catch + Assert.Fail().
* File path combines to the temp directory need helper methods; e.g., having this in a hundred places is bad: new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get("tempDir", ""), "testIndex"));
* We should still be testing deprecated methods, but we need to use #pragma warning disable/restore 0618 around those tests; otherwise compiler warnings are too numerous to be anywhere near helpful.
* We should only be using deprecated methods in places where they are being explicitly tested; other tests that need that functionality in order to validate something else should be refactored to use methods that are not deprecated.
* Identify code that could be abstracted into test utility classes.
* Infrastructure validation tests need to be made for anything that seems like infrastructure, e.g. does the temp directory exist, do the folders that the tests use inside the temp directory exist, can we read/write to those folders. (If a ton of tests fail due to the file system, we should be able to point out that it was due to permissions or missing folders, files, etc.)
* Identify which classes need an interface, abstract class, or to be inherited in order to create testing mocks. (Once those classes are created, they should be documented in the wiki.)
* Assert.Throws needs to replace stuff like the following. We should also be checking the messages for exceptions to make sure they make sense and can help users fix issues, if the exceptions are aimed at the library users.

try
{
    d = DateTools.StringToDate("97"); // no date
    Assert.Fail();
}
catch (System.FormatException e)
{
    /* expected exception */
}

On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire codekai...@gmail.com wrote: So, veering towards action - are there concrete tasks written up anywhere for the unit tests? If a poor schlep like me wanted to dig in and start to improve them, where would I get the understanding of what is good and what needs help?
-r

On Thu, Jun 30, 2011 at 3:29 PM, Digy digyd...@gmail.com wrote: I cannot say I like this approach, but till we find an automated way (with good results), it seems to be the only one we can use. DIGY

-----Original Message-----
From: Troy Howard [mailto:thowar...@gmail.com]
Sent: Friday, July 01, 2011 12:43 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

Scott - The idea of the automated port is still worth doing. Perhaps it makes
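The try/catch + Assert.Fail pattern criticized in the thread above collapses into a single exception-asserting call (NUnit's Assert.Throws on the Lucene.Net side). As a language-neutral sketch of what such a helper does under the hood, here is a minimal Java version; the class name Assert2 is illustrative and is not part of either codebase:

```java
// Minimal sketch of an Assert.Throws-style helper: runs the body and
// fails unless it throws exactly the expected exception type.
public class Assert2 {
    public static <T extends Throwable> T assertThrows(Class<T> expected, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected.isInstance(t))
                return expected.cast(t); // expected exception: test passes
            throw new AssertionError("expected " + expected.getName()
                    + " but got " + t.getClass().getName());
        }
        throw new AssertionError("expected " + expected.getName()
                + " but nothing was thrown");
    }

    public static void main(String[] args) {
        // Instead of: try { parse; Assert.Fail(); } catch (FormatException) { }
        NumberFormatException e =
                assertThrows(NumberFormatException.class, () -> Integer.parseInt("no date"));
        System.out.println("caught: " + e.getClass().getSimpleName());
    }
}
```

Returning the caught exception also makes it easy to assert on the exception message, which the list above calls out as something the tests should start checking.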
[Lucene.Net] [jira] [Commented] (LUCENENET-404) Improve brand logo design
[ https://issues.apache.org/jira/browse/LUCENENET-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058761#comment-13058761 ]

Troy Howard commented on LUCENENET-404:
---

Just a quick update. The artist is making some final edits before we commit. Will post them soon. I'll attach examples.

Improve brand logo design
-------------------------
Key: LUCENENET-404
URL: https://issues.apache.org/jira/browse/LUCENENET-404
Project: Lucene.Net
Issue Type: Sub-task
Components: Project Infrastructure
Reporter: Troy Howard
Assignee: Troy Howard
Priority: Minor
Labels: branding, logo

The existing Lucene.Net logo leaves a lot to be desired. We'd like a new logo that is modern and well designed. To implement this, Troy is coordinating with StackOverflow/StackExchange to manage a logo design contest, the results of which will be our new logo design.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
@Rory, JIRA in the past had the ability to create sub-tasks or convert a current task to a sub-task, so I'm guessing that Apache's JIRA should be able to do that.

@All, I've added a Paths class under the Lucene.Net.Tests Util folder (feel free to rename or refactor it as long as the tests still work) to help with working with paths in general. It should help with forming paths relative to the temp directory or the project root. NUnit's shadow copy tends to throw people off when trying to get the path of the current assembly being tested in order to build up a relative path; the Paths class should help with that.

- Michael

On Fri, Jul 1, 2011 at 4:09 PM, Rory Plaire codekai...@gmail.com wrote: My thinking is just a separate ticket for each one. This makes the work easier to manage, gives a better sense of how much work is left, and makes it easier to prioritize independent issues. We could link all the sub-issues to a single task / feature / whatever (that is, if JIRA has that capability). -r
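The path-helper idea discussed in this thread (the Paths class under Lucene.Net.Tests) is essentially about having one place that knows where the temp directory lives, so individual tests do not repeat the Path.Combine boilerplate. The actual class is C# and its API is not shown in the thread; this is a rough Java sketch of the concept only, with all names illustrative:

```java
import java.io.File;

// Rough sketch of a test-path helper: one class that knows where the
// temp directory is, so tests just ask for a file name under it.
public class TestPaths {
    // Falls back to the JVM's temp dir when no override is configured
    // (the real Lucene.Net helper would read Support.AppSettings instead).
    private static final String TEMP_ROOT =
            System.getProperty("lucene.test.tempdir",
                               System.getProperty("java.io.tmpdir"));

    public static File tempFile(String name) {
        return new File(TEMP_ROOT, name);
    }

    public static void main(String[] args) {
        // Instead of repeating the combine logic in a hundred places:
        File testIndex = tempFile("testIndex");
        System.out.println(testIndex.getPath());
    }
}
```

Centralizing this also gives one natural home for the infrastructure checks mentioned above (does the directory exist, is it writable), so file-system failures can be reported once instead of surfacing as a ton of unrelated test failures.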
[Lucene.Net] [jira] [Work started] (LUCENENET-404) Improve brand logo design
[ https://issues.apache.org/jira/browse/LUCENENET-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on LUCENENET-404 started by Troy Howard.

Improve brand logo design
-------------------------
Key: LUCENENET-404
URL: https://issues.apache.org/jira/browse/LUCENENET-404
Project: Lucene.Net
Issue Type: Sub-task
Components: Project Infrastructure
Reporter: Troy Howard
Assignee: Troy Howard
Priority: Minor
Labels: branding, logo
Attachments: lucene-alternates.jpg, lucene-medium.png, lucene-net-logo-display.jpg

The existing Lucene.Net logo leaves a lot to be desired. We'd like a new logo that is modern and well designed. To implement this, Troy is coordinating with StackOverflow/StackExchange to manage a logo design contest, the results of which will be our new logo design.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[VOTE] Release PyLucene 3.3.0
The PyLucene 3.3.0-1 release, closely tracking the recent release of Lucene Java 3.3, is ready.

A release candidate is available from:
http://people.apache.org/~vajda/staging_area/

A list of changes in this release can be seen at:
http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_3/CHANGES

PyLucene 3.3.0 is built with JCC 2.9, included in these release artifacts.

A list of Lucene Java changes can be seen at:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/lucene/CHANGES.txt

Please vote to release these artifacts as PyLucene 3.3.0-1.

Thanks!

Andi..

ps: the KEYS file for PyLucene release signing is at:
http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
http://people.apache.org/~vajda/staging_area/KEYS

pps: here is my +1
Re: calling Python from Java fails...
Andi Vajda va...@apache.org wrote: By the way, you might want to add a paragraph in that section about adding the [-framework, Python] flags for building JCC on OS X. I tripped over that again. If you send a paragraph to this effect, I'll integrate it into the docs.

Here's a patch:

/local/pylucene 31 % svn diff
Index: site/src/documentation/content/xdocs/jcc/documentation/install.xml
===================================================================
--- site/src/documentation/content/xdocs/jcc/documentation/install.xml (revision 1141969)
+++ site/src/documentation/content/xdocs/jcc/documentation/install.xml (working copy)
@@ -138,7 +138,7 @@
       Is JCC built with shared mode support or not ?
       <ul>
         <li>
-          By default, on Mac OS X and Windows this is the case
+          By default, on Mac OS X and Windows this is the case.
         </li>
         <li>
           By default, on Linux, this is the case.
@@ -167,6 +167,12 @@
         and <code>LFLAGS</code> for <code>darwin</code> should be correct
         and ready to use.
       </p>
+      <p>
+        However, if you intend to use the 'system' Python from a Java VM
+        on Mac OS X -- Python embedded in Java --
+        you will need to add the flags <code>-framework Python</code>
+        to the <code>LFLAGS</code> value.
+      </p>
     </section>
     <section id="linux">
       <title>Notes for Linux</title>
Index: site/src/documentation/content/xdocs/jcc/documentation/readme.xml
===================================================================
--- site/src/documentation/content/xdocs/jcc/documentation/readme.xml (revision 1141969)
+++ site/src/documentation/content/xdocs/jcc/documentation/readme.xml (working copy)
@@ -763,9 +763,12 @@
     </p>
     <ul>
       <li>
-        JCC must be built in shared mode.
-        See <a href="site:jcc/documentation/install">installation
-        instructions</a> for more information about shared mode.
+        JCC must be built in shared mode. See
+        <a href="site:jcc/documentation/install">installation
+        instructions</a> for more information about shared mode.
+        Note that for this use on Mac OS X, JCC must also be built
+        with the link flags <code>-framework Python</code> in
+        the <code>LFLAGS</code> value.
       </li>
       <li>
         As described in the previous section, define one or more Java
@@ -790,14 +793,19 @@
         via the <code>-Djava.library.path</code> command line parameter.
       </li>
       <li>
-        In the Java VM's main thread, initialize the Python VM by calling
-        its static <code>start()</code> method passing it a Python program
-        name string and optional start-up arguments in a string array that
-        will be made accessible in Python via <code>sys.argv</code>.
-        This method returns the singleton PythonVM instance to be used in
-        this Java VM. This instance may be retrieved at any later time via
-        the static <code>get()</code> method defined on the
-        <code>org.apache.jcc.PythonVM</code> class.
+        In the Java VM's main thread, initialize the Python VM by
+        calling its static <code>start()</code> method passing it a
+        Python program name string and optional start-up arguments
+        in a string array that will be made accessible in Python via
+        <code>sys.argv</code>. Note that the program name string is
+        purely informational, and is not used by the
+        <code>start()</code> code other than to initialize that
+        Python variable. This method returns the singleton PythonVM
+        instance to be used in this Java VM. <code>start()</code>
+        may be called multiple times; it will always return the same
+        singleton instance. This instance may also be retrieved at any
+        later time via the static <code>get()</code> method defined
+        on the <code>org.apache.jcc.PythonVM</code> class.
       </li>
       <li>
         Any Java VM thread that is going to be calling into the Python VM
Re: calling Python from Java fails...
Andi Vajda va...@apache.org wrote: That being said, if you send in javadoc patches, I agree, the results should be published like they are on the lucene/java site (under resources) and I can take care of that.

Here's a patch (against the JCC branch_3x):

--- java/org/apache/jcc/PythonVM.java (revision 1141989)
+++ java/org/apache/jcc/PythonVM.java (working copy)
@@ -23,6 +23,18 @@
         System.loadLibrary("jcc");
     }

+    /**
+     * Start the embedded Python interpreter. The specified
+     * programName and args are set into the Python variable sys.argv.
+     * This returns an instance of the Python VM; it may be called
+     * multiple times, and will return the same VM instance each time.
+     *
+     * @param programName the name of the Python program, typically
+     * /usr/bin/python. This is informational; the program is not
+     * actually executed.
+     * @param args additional arguments to be put into sys.argv.
+     * @return a singleton instance of PythonVM
+     */
     static public PythonVM start(String programName, String[] args)
     {
         if (vm == null)
@@ -34,11 +46,28 @@
         return vm;
     }

+    /**
+     * Start the embedded Python interpreter. The specified
+     * programName is set into the Python variable sys.argv[0].
+     * This returns an instance of the Python VM; it may be called
+     * multiple times, and will return the same VM instance each time.
+     *
+     * @param programName the name of the Python program, typically
+     * /usr/bin/python. This is informational; the program is not
+     * actually executed.
+     * @return a singleton instance of PythonVM
+     */
     static public PythonVM start(String programName)
     {
         return start(programName, null);
     }

+    /**
+     * Obtain the PythonVM instance, or null if the Python VM
+     * has not yet been started.
+     *
+     * @return a singleton instance of PythonVM, or null
+     */
     static public PythonVM get()
     {
         return vm;
@@ -50,9 +79,33 @@

     protected native void init(String programName, String[] args);

+    /**
+     * Instantiate the specified Python class, and return the instance.
+     *
+     * @param moduleName the Python module the class is defined in
+     * @param className the Python class to instantiate.
+     * @return a handle on the Python instance.
+     */
     public native Object instantiate(String moduleName, String className)
         throws PythonException;

+    /**
+     * Bump the Python thread state counter. Every thread should
+     * do this before calling into Python, to prevent the Python
+     * thread state from being inadvertently collected.
+     *
+     * @return the Python thread state counter. A return value less
+     * than zero signals an error.
+     */
     public native int acquireThreadState();
+
+    /**
+     * Release the Python thread state counter. Every thread that has
+     * called acquireThreadState() should call this before
+     * terminating.
+     *
+     * @return the Python thread state counter. A return value less
+     * than zero signals an error.
+     */
     public native int releaseThreadState();
 }
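The start()/get() contract documented in this patch (idempotent start, same singleton returned on every call, get() returning null before start) is a plain lazy-singleton lifecycle. Since PythonVM itself cannot run without the jcc native library, here is a stand-in sketch that mimics only that lifecycle; the class name is illustrative and is not part of JCC:

```java
// Stand-in mimicking org.apache.jcc.PythonVM's lifecycle: start() is
// idempotent, and get() returns null until start() has been called.
public class FakePythonVM {
    private static FakePythonVM vm;
    private final String programName; // informational only, never executed

    private FakePythonVM(String programName) {
        this.programName = programName;
    }

    public static synchronized FakePythonVM start(String programName) {
        if (vm == null)
            vm = new FakePythonVM(programName);
        return vm; // later calls return the same singleton
    }

    public static FakePythonVM get() {
        return vm; // null if start() was never called
    }

    public String getProgramName() {
        return programName;
    }
}
```

Note that, as with the real PythonVM, arguments passed to a second start() call are silently ignored because the singleton already exists; callers who need different sys.argv values must arrange them on the first call.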