[ANNOUNCE] Apache Solr 3.3
July 2011, Apache Solr™ 3.3 available

The Lucene PMC is pleased to announce the release of Apache Solr 3.3. Solr is the popular, blazing fast open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites.

This release contains numerous bug fixes, optimizations, and improvements, some of which are highlighted below. The release is available for immediate download at: http://www.apache.org/dyn/closer.cgi/lucene/solr (see note below). See the CHANGES.txt file included with the release for a full list of details as well as instructions on upgrading.

Solr 3.3 Release Highlights

* Grouping / Field Collapsing
* A new, automaton-based suggest/autocomplete implementation offering an order of magnitude smaller RAM consumption.
* KStemFilterFactory, an optimized implementation of a less aggressive stemmer for English.
* Solr defaults to a new, more efficient merge policy (TieredMergePolicy). See http://s.apache.org/merging for more information.
* Important bugfixes, including extremely high RAM usage in spellchecking.
* Bugfixes and improvements from Apache Lucene 3.3

Note: The Apache Software Foundation uses an extensive mirroring network for distributing releases. It is possible that the mirror you are using may not have replicated the release yet. If that is the case, please try another mirror. This also goes for Maven access.

Thanks,
Apache Solr Developers

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 9216 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/9216/

1 tests failed.

REGRESSION: org.apache.lucene.facet.util.TestScoredDocIDsUtils.testWithDeletions
Error Message: Deleted docs must not appear in the allDocsScoredDocIds set
Stack Trace:
junit.framework.AssertionFailedError: Deleted docs must not appear in the allDocsScoredDocIds set
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1430)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1348)
  at org.apache.lucene.facet.util.TestScoredDocIDsUtils.testWithDeletions(TestScoredDocIDsUtils.java:137)

Build Log (for compile errors):
[...truncated 4725 lines...]
RE: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
Hi,

I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me if you don't want to see src.jar code from OpenJDK Java 6 - just delete this mail if you don't want to (the code here is licensed under GPL):

Arrays.java:

/**
 * Copies the specified array, truncating or padding with zeros (if necessary)
 * so the copy has the specified length. For all indices that are
 * valid in both the original array and the copy, the two arrays will
 * contain identical values. For any indices that are valid in the
 * copy but not the original, the copy will contain <tt>(byte)0</tt>.
 * Such indices will exist if and only if the specified length
 * is greater than that of the original array.
 *
 * @param original the array to be copied
 * @param newLength the length of the copy to be returned
 * @return a copy of the original array, truncated or padded with zeros
 * to obtain the specified length
 * @throws NegativeArraySizeException if <tt>newLength</tt> is negative
 * @throws NullPointerException if <tt>original</tt> is null
 * @since 1.6
 */
public static byte[] copyOf(byte[] original, int newLength) {
    byte[] copy = new byte[newLength];
    System.arraycopy(original, 0, copy, 0, Math.min(original.length, newLength));
    return copy;
}

This is our implementation, which Simon replaced and Robert reverted (UnsafeByteArrayOutputStream):

private void grow(int newLength) {
    // It actually should be: (Java 1.7, when it's intrinsic on all machines)
    // buffer = Arrays.copyOf(buffer, newLength);
    byte[] newBuffer = new byte[newLength];
    System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
    buffer = newBuffer;
}

So please look at the code: where is a difference that could slow things down, except the Math.min() call, which is an intrinsic in almost every JDK on earth? The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g.
*all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, which does no array copy.

Simon and I were not willing to replace the reallocations in FST code (Mike, you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only a few of them). The idea was only to replace primitive-type code to make it more easily readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that's Dawid's and my reason to change to copyOf for primitive types). In general, if you use Java SDK methods that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic, has too many safety checks, or uses reflection.

To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don't like the allocation strategy in it (it's exponential; on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy like in trunk. Then we can discuss this problem again when Simon and I want to change the ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward].

The other thing I don't like in the new faceting module is the duplication of vint code. Why don't we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere? This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with an OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class.
Uwe (getting crazy)

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Friday, July 01, 2011 7:47 AM
To: Michael McCandless
Cc: dev@lucene.apache.org; Dawid Weiss
Subject: Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java

On Fri, Jul 1, 2011 at 12:19 AM, Michael McCandless luc...@mikemccandless.com wrote:
On Thu, Jun 30, 2011 at 4:45 PM, Simon Willnauer simon.willna...@googlemail.com wrote:
On Thu, Jun 30, 2011 at 8:50 PM, Dawid Weiss dawid.we...@cs.put.poznan.pl wrote:
I don't see any evidence that this is any slower though. You need to run with -client (if the machine is a beast this is tricky because x64
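For what it's worth, the doubling-vs-oversize point from the message above can be sketched in isolation. Everything below is a hypothetical illustration, not the actual UnsafeByteArrayOutputStream or Lucene ArrayUtil code; in particular, the ~1/8 headroom in growWithOversize is only an assumption meant to mimic the *idea* of an oversize-style strategy:

```java
import java.util.Arrays;

// Sketch contrasting the two growth strategies discussed above.
// Both use Arrays.copyOf for the copy itself; the difference is only
// how much extra room each strategy allocates past the requested minimum.
public class GrowthSketch {
    // Doubling: on every grow, the buffer at least doubles.
    static byte[] growByDoubling(byte[] buffer, int minLength) {
        int newLength = buffer.length == 0 ? 1 : buffer.length;
        while (newLength < minLength) newLength *= 2;
        return Arrays.copyOf(buffer, newLength);
    }

    // Oversize-style: allocate minLength plus ~12.5% headroom (assumption).
    static byte[] growWithOversize(byte[] buffer, int minLength) {
        if (buffer.length >= minLength) return buffer; // already big enough
        int newLength = minLength + (minLength >>> 3);
        return Arrays.copyOf(buffer, newLength);
    }

    public static void main(String[] args) {
        byte[] a = growByDoubling(new byte[1000], 1001);
        byte[] b = growWithOversize(new byte[1000], 1001);
        // doubling jumps to 2000 bytes; oversize-style only to 1126
        System.out.println(a.length + " vs " + b.length);
    }
}
```

The contrast is the one Uwe raises: asking for one byte past a 1000-byte buffer doubles it to 2000 under the exponential strategy, while an oversize-style strategy allocates only modest headroom past the minimum.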
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9221/

11 tests failed.

FAILED: org.apache.lucene.util.TestFieldCacheSanityChecker.testIndexAndMerge
Error Message: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
Stack Trace:
junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please note the time in the report does not reflect the time until the VM exit.
  at java.lang.Thread.run(Thread.java:636)

REGRESSION: org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers
Error Message: Only one compound segment should exist expected:<3> but was:<4>
Stack Trace:
junit.framework.AssertionFailedError: Only one compound segment should exist expected:<3> but was:<4>
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
  at org.apache.lucene.index.TestAddIndexes.testNonCFSLeftovers(TestAddIndexes.java:952)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testSingleFile
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.testSingleFile(TestCompoundFile.java:203)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)
Caused by: java.lang.ClassNotFoundException: org.apache.lucene.index.CompoundFileWriter
  at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
  at java.security.AccessController.doPrivileged(Native Method)
  at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
  at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:294)
  at java.lang.ClassLoader.loadClass(ClassLoader.java:266)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testTwoFiles
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.testTwoFiles(TestCompoundFile.java:226)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testRandomFiles
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.testRandomFiles(TestCompoundFile.java:276)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
  at org.apache.lucene.index.TestCompoundFile.testClonedStreamsClosing(TestCompoundFile.java:371)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testRandomAccess
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
  at org.apache.lucene.index.TestCompoundFile.testRandomAccess(TestCompoundFile.java:428)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1195)

REGRESSION: org.apache.lucene.index.TestCompoundFile.testRandomAccessClones
Error Message: org/apache/lucene/index/CompoundFileWriter
Stack Trace:
java.lang.NoClassDefFoundError: org/apache/lucene/index/CompoundFileWriter
  at org.apache.lucene.index.TestCompoundFile.setUp_2(TestCompoundFile.java:305)
  at org.apache.lucene.index.TestCompoundFile.testRandomAccessClones(TestCompoundFile.java:507)
  at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:1277)
  at
RE: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure
Is fixed now, was a problem during cutover to 3.3 backwards tests.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
Sent: Friday, July 01, 2011 9:17 AM
To: dev@lucene.apache.org
Subject: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9221 - Failure

[...]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 9222 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/9222/

11 tests failed (the same failures as in build #9221 above).
[jira] [Created] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
use of FST for SynonymsFilterFactory and synonyms.txt
-----
Key: SOLR-2628
URL: https://issues.apache.org/jira/browse/SOLR-2628
Project: Solr
Issue Type: New Feature
Components: Schema and Analysis
Affects Versions: 3.4, 4.0
Environment: Linux
Reporter: Bernd Fehling
Priority: Minor

Currently the SynonymsFilterFactory builds up a memory-based SynonymsMap. This can generate huge maps because of the permutations for synonyms. Now that the FST (finite state transducer) has been introduced to Lucene, it could also be used for synonyms. A tool could compile the synonyms.txt file to a binary automaton file which can then be used with the SynonymsFilterFactory.

Advantages:
- faster start of Solr, no need to generate the SynonymsMap
- faster lookup
- memory saving

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
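The "huge maps because of the permutations" point in the issue can be illustrated with a minimal, hypothetical sketch (this is not the actual Solr synonym code): fully expanding a single equivalence group of n terms already produces n*n term-to-synonym mappings, and real synonym files contain many such groups.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustration of why a fully expanded in-memory synonym map blows up:
// every term in an equivalence group maps to the whole group.
public class SynonymBlowupSketch {
    public static Map<String, List<String>> expand(List<String> group) {
        Map<String, List<String>> map = new HashMap<>();
        for (String term : group) {
            map.put(term, new ArrayList<>(group)); // n keys, each with n values
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> group = Arrays.asList("tv", "television", "telly", "goggle box");
        Map<String, List<String>> map = expand(group);
        int mappings = 0;
        for (List<String> values : map.values()) mappings += values.size();
        // 4 terms expand to 4 keys and 16 (term -> synonym) mappings
        System.out.println(map.size() + " keys, " + mappings + " mappings");
    }
}
```

A prefix-sharing automaton such as an FST stores the overlapping keys and group structure far more compactly, which is the memory saving the issue is after.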
[jira] [Assigned] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss reassigned SOLR-2628:
-
Assignee: Dawid Weiss
Re: [jira] [Created] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
I don't need information about Solr projects.

-- yazhini
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058241#comment-13058241 ]

Dawid Weiss commented on SOLR-2628:
-
I've talked about it a little bit with Bernd and indeed, it seems possible to reduce the size of the in-memory data structures by an order of magnitude (or even two orders of magnitude, we shall see). I'm on vacation for the next week and on a business trip for another one after that, but I'll be on it once I come back home.
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 9222 - Still Failing
I don't need messages from your mail, sir.
[jira] [Created] (SOLR-2629) warning about org.apache.solr.request.SolrQueryResponse is deprecated
warning about org.apache.solr.request.SolrQueryResponse is deprecated
-----
Key: SOLR-2629
URL: https://issues.apache.org/jira/browse/SOLR-2629
Project: Solr
Issue Type: Bug
Components: web gui
Affects Versions: 3.1, 3.2, 3.3
Environment: Linux
Reporter: Bernd Fehling
Priority: Trivial

The web admin interface uses the deprecated class org.apache.solr.request.SolrQueryResponse from within the files:
- solr/src/webapp/web/admin/replication/header.jsp
- solr/src/webapp/web/admin/ping.jsp
That should be changed to use org.apache.solr.response.SolrQueryResponse.
[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads
[ https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058461#comment-13058461 ]

Shalin Shekhar Mangar commented on SOLR-2623:
-
Hoss, I wish there was a way to do just that. I looked and looked but couldn't find it. The JMX API is really screwed up. Once you send in an MBean, apparently you can't get it out again. I'd be interested if anyone knew of a way to do that.

Solr JMX MBeans do not survive core reloads
-----
Key: SOLR-2623
URL: https://issues.apache.org/jira/browse/SOLR-2623
Project: Solr
Issue Type: Bug
Components: multicore
Affects Versions: 1.4, 1.4.1, 3.1, 3.2
Reporter: Alexey Serba
Assignee: Shalin Shekhar Mangar
Priority: Minor
Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch

Solr JMX MBeans do not survive core reloads

{noformat:title=Steps to reproduce}
sh cd example
sh vi multicore/core0/conf/solrconfig.xml  # enable jmx
sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar
sh echo 'open 8842  # 8842 is java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
solr/core0:id=core0,type=core
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler
solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update
solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler
...
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher
solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler
sh curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0'
sh echo 'open 8842  # 8842 is java pid
domain solr/core0
beans
' | java -jar jmxterm-1.0-alpha-4-uber.jar
# there's only one bean left after Solr core reload
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main
{noformat}

The root cause of this is Solr core reload behavior:
# create new core (which overwrites existing registered MBeans)
# register new core and close old one (we remove/un-register MBeans on oldCore.close)

The correct sequence is:
# unregister MBeans from old core
# create and register new core
# close old core without touching MBeans

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
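The corrected ordering described in the issue can be mimicked against the platform MBean server alone. This is a hypothetical sketch (the CoreInfo/demo names are invented and the real Solr beans differ); it only demonstrates the sequence: unregister the old core's MBeans first, then register the new core's, so the new registrations are not clobbered when the old core is closed.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMX;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Minimal standard-MBean sketch of the reload ordering.
public class ReloadOrderSketch {
    public interface CoreInfoMBean { String getName(); }

    public static class CoreInfo implements CoreInfoMBean {
        private final String name;
        public CoreInfo(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static String demo() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        ObjectName name = new ObjectName("solr/core0:id=core0,type=core");

        server.registerMBean(new CoreInfo("oldCore"), name); // old core's bean

        // Correct reload sequence per the issue:
        server.unregisterMBean(name);                        // 1. unregister old core's MBeans
        server.registerMBean(new CoreInfo("newCore"), name); // 2. create and register new core
        // 3. close the old core without touching MBeans

        CoreInfoMBean proxy = JMX.newMBeanProxy(server, name, CoreInfoMBean.class);
        String survivor = proxy.getName();
        server.unregisterMBean(name); // clean up so the demo is repeatable
        return survivor;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(demo()); // the new core's bean is the one that survives
    }
}
```

Registering in the wrong order (new core first, then oldCore.close() unregistering by name) removes the freshly registered beans, which is exactly the "only one bean left after reload" symptom above.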
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058468#comment-13058468 ]

Michael McCandless commented on SOLR-2628:
-
Dawid, have a look at LUCENE-3233 -- we have a [very very rough] start at this.
[jira] [Resolved] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dawid Weiss resolved SOLR-2628.
-
Resolution: Duplicate

Duplicate of LUCENE-3233
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058470#comment-13058470 ] Dawid Weiss commented on SOLR-2628: --- Yep, this is a duplicate. Thanks Mike. Like I said -- I won't be able to work on this for the next two weeks (I also have that FST refactoring open in the background... it's progressing slowly), but it's definitely low-hanging fruit to pick because it shouldn't be very difficult and the gains would be huge.
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058475#comment-13058475 ] Michael McCandless commented on SOLR-2628: -- I think the reduction of RAM should be huge, but lookup speed might be slower (ie the usual tradeoff of FST), since we are going char by char in the FST. If we go word-by-word (ie the FST's labels are word ords and we separately resolve word → ord via a normal hash lookup) then that might be a good middle ground... but this is all speculation for now!
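The hash side of Mike's word-by-word idea is just interning words to ordinals, so a multi-word synonym entry becomes a short int sequence for the automaton to consume. A hypothetical illustration (names invented, not Lucene's API):

```java
import java.util.*;

// Sketch of resolving word -> ord through a hash map, so that an FST over
// these ords needs one arc per word instead of one arc per character.
// This trades one hash lookup per word for far fewer FST transitions.
public class WordOrds {
    private final Map<String, Integer> ords = new HashMap<>();
    private final List<String> words = new ArrayList<>(); // ord -> word

    public int ord(String word) { // assigns the next free ord on first sight
        return ords.computeIfAbsent(word, w -> { words.add(w); return words.size() - 1; });
    }
    public int[] encode(String phrase) {
        String[] parts = phrase.split("\\s+");
        int[] out = new int[parts.length];
        for (int i = 0; i < parts.length; i++) out[i] = ord(parts[i]);
        return out;
    }
    public static void main(String[] args) {
        WordOrds w = new WordOrds();
        System.out.println(Arrays.toString(w.encode("big apple")));     // [0, 1]
        System.out.println(Arrays.toString(w.encode("new york city"))); // [2, 3, 4]
        System.out.println(Arrays.toString(w.encode("apple city")));    // [1, 4]
    }
}
```

As Dawid notes in the follow-up, the open question is whether the word table itself costs back the memory the coarser FST saves.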
[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058476#comment-13058476 ] Michael McCandless commented on LUCENE-3272: Big +1! We've needed query parsing factored out for a long time. And cutting tests over to a new MockQP, and then simply moving (but not merging) all QPs together to a module, sounds like great first steps. Note that the FieldType work (at least as currently planned/targeted) isn't a schema -- it's really just a nicer API for working with documents. Ie, nothing is persisted, nothing checks that 2 docs have the same fields/types, etc. Still, it would be great to pull Solr's QP in and somehow abstract the parts that require access to Solr's schema. Consolidate Lucene's QueryParsers into a module --- Key: LUCENE-3272 URL: https://issues.apache.org/jira/browse/LUCENE-3272 Project: Lucene - Java Issue Type: Improvement Components: modules/queryparser Reporter: Chris Male Lucene has a lot of QueryParsers and we should have them all in a single consistent place. The following are QueryParsers I can find that warrant moving to the new module: - Lucene Core's QueryParser - AnalyzingQueryParser - ComplexPhraseQueryParser - ExtendableQueryParser - Surround's QueryParser - PrecedenceQueryParser - StandardQueryParser - XML-Query-Parser's CoreParser All seem to do a good job at their kind of parsing, with extensive tests. One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work).
[jira] [Commented] (SOLR-2628) use of FST for SynonymsFilterFactory and synonyms.txt
[ https://issues.apache.org/jira/browse/SOLR-2628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058489#comment-13058489 ] Dawid Weiss commented on SOLR-2628: --- Yes, this may be the case. It'd need to be investigated because storing words in a hashtable will also bump memory requirements, whereas an FST can at least reuse some prefixes and suffixes.
Re: revisit naming for grouping/join?
I think joining and grouping are two different functions, and we should keep different modules for them... On Thu, Jun 30, 2011 at 10:30 PM, Robert Muir rcm...@gmail.com wrote: Hi, when looking at just a very quick glance at some of the newer grouping/join features, I found myself a little confused about what is exactly what, and I think users might too. They are confusing! I discussed some of this with hossman, and it only seemed to make me even more totally confused about: * difference between field collapsing and grouping I like the name grouping better here: I think field collapsing undersells (it's only one specific way to use grouping). EG, grouping w/o collapsing is useful (eg, Best Buy grouping hits by product category and showing the top 5 in each). * difference between nested documents and the index-time join Similarly I think nested docs undersells index-time join: you can join (either during indexing or during searching) in many different ways, and nested docs is just one use case. EG, maybe your docs are doctors but during indexing you join to a city table with facts about that city (each doctor's office is in a specific city) and then you want to run queries like city's avg annual temp > 60 and doctor has good bedside manner or something. * difference between index-time-join/nested documents and single-pass index-time grouping. Is the former only a more general case of the latter? Grouping is purely a presentation concern -- you are not altering which docs hit; you are simply changing how you pick which hits to display (top N by group). So we only have collectors here. The generic (requires 2 passes) collectors can group on anything at search time; the doc block collector requires that you indexed all docs in each group as a block. Join is both about restricting matches and also presentation of hits, because your query needs to match fields from different [logical] tables (so, the module has a Query and a Collector).
When you get the results back, you may or may not be interested in retaining the table structure in your result set (ie, you may not have selected fields from the child table). Similarly, generic joining (in Solr/ElasticSearch today but I'd like to factor into the join module) can do any join at search time, while the doc block collector requires that you did the necessary join(s) during indexing. * difference between the above joinish capabilities and solr's join impl... other than the single-pass/index-time limitation (which is really an implementation detail), I'm talking about use cases. Solr's/ElasticSearch's join is more general because you can join anything at search time (even, across 2 different indexes), vs doc block join where you must pick which joins you will ever want to use and then build the index accordingly. You can also mix the two. Maybe you do certain joins while indexing, but then at search time you do other joins generically. That's fine. (Same is true for grouping). I think it's especially interesting since the join module depends on the grouping module. The join module does currently depend on the grouping module, but for a silly reason: just for the TopGroups, to represent the returned hits. We could move TopGroups/GroupDocs into common (thus justifying its generic name!)? Then both join and grouping modules depend on common. Really TopGroups is just a TopDocs that allows some recursion (ie, each hit may in turn be another TopDocs). But TopGroups is limited now to only depth 2 recursion... we need to fix this for nested grouping. Really we just need a recursive TopDocs here So I am curious if we should: * add docs (maybe with simple examples) in the package.html or otherwise that differentiate what these guys are, or at least agree on some consistent terminology and define it somewhere? I feel like people have explained to me the differences in all these things before, but then it's easy to forget.
Well, each module's package.html has a start here, but I agree we should do more. I think what would be best is a smallish but feature-complete demo, ie pull together some easy-to-understand sample content and then build a small demo app around it. We could then show how to use grouping for field collapsing (and for other use cases), joining for nested docs (and for other use cases), etc. Mike McCandless http://blog.mikemccandless.com
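Mike's "grouping is purely a presentation concern" point is easy to show in isolation: the hit set is fixed, and we only re-select which hits to display, top N per group key. A toy sketch (field names invented, no Lucene types involved):

```java
import java.util.*;

// Toy version of top-N-per-group presentation: nothing about which docs
// matched changes; we only pick the best n hits within each group key
// (e.g. product category) for display.
public class TopNPerGroup {
    record Hit(String category, float score) {}

    static Map<String, List<Hit>> topN(List<Hit> hits, int n) {
        Map<String, List<Hit>> groups = new LinkedHashMap<>();
        for (Hit h : hits) {
            groups.computeIfAbsent(h.category(), k -> new ArrayList<>()).add(h);
        }
        for (List<Hit> g : groups.values()) {
            g.sort((a, b) -> Float.compare(b.score(), a.score())); // best first
            if (g.size() > n) g.subList(n, g.size()).clear();      // keep top n
        }
        return groups;
    }
    public static void main(String[] args) {
        List<Hit> hits = List.of(new Hit("tv", 2.0f), new Hit("audio", 1.5f),
                                 new Hit("tv", 3.0f), new Hit("tv", 1.0f));
        System.out.println(topN(hits, 2));
    }
}
```

The real grouping module does this in collectors (generic two-pass, or single-pass over doc blocks), but the shape of the result -- groups of top hits -- is the same.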
[jira] [Created] (SOLR-2630) XsltUpdateRequestHandler
XsltUpdateRequestHandler Key: SOLR-2630 URL: https://issues.apache.org/jira/browse/SOLR-2630 Project: Solr Issue Type: Improvement Components: update Affects Versions: 4.0 Reporter: Upayavira Priority: Minor Fix For: 4.0 Attachments: xslt-update-handler.patch An update request handler that can accept a tr param, allowing the indexing of any XML content that is passed to Solr, so long as there is an XSLT stylesheet in solr/conf/xslt that can transform it to the <add><doc/></add> format. Could be used, for example, to allow Solr to ingest docbook directly, without any preprocessing.
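To make the tr idea concrete: the handler runs incoming XML through a stylesheet from solr/conf/xslt before indexing. A hedged illustration using plain JAXP -- the input element names and the stylesheet here are invented, and this is not the patch's code, just the kind of transform such a stylesheet would perform:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Transforms an arbitrary (made-up) XML document into Solr's
// <add><doc><field name="..."> update format via an XSLT stylesheet.
public class XsltToSolrDemo {
    static final String XSL =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>" +
        "<xsl:template match='/book'>" +
        "<add><doc>" +
        "<field name='title'><xsl:value-of select='title'/></field>" +
        "</doc></add>" +
        "</xsl:template>" +
        "</xsl:stylesheet>";

    public static String transform(String inputXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            t.setOutputProperty("omit-xml-declaration", "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(inputXml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
    public static void main(String[] args) {
        System.out.println(transform("<book><title>Solr 3.3</title></book>"));
    }
}
```

The handler itself would additionally resolve the stylesheet by the tr parameter and feed the transform result to the normal XML update path.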
[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Upayavira updated SOLR-2630: Attachment: xslt-update-handler.patch Patch for XsltUpdateRequestHandler, along with a test case for it
Re: Issues with Grouping
On Thu, Jun 30, 2011 at 11:58 PM, Bill Bell billnb...@gmail.com wrote: I meant FC insanity. It does not appear to be an NPE. That's natural, and not a bug. Grouping always uses per-segment field cache entries, where faceting sometimes uses top level field caches. -Yonik http://www.lucidimagination.com
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058540#comment-13058540 ] Uwe Schindler commented on SOLR-2630: - XML is binary data, so you should not convert it to Strings. Ideally the already-transformed DOM tree or SAX stream would be passed directly to the importer. I know this is not easily possible, so the most correct way would be to pass the binary byte[] directly and reparse. I will try to investigate directly passing the SAX events / XSL DOM tree around, which is possible, as the transformer API can also directly pipe to StAX, used by the underlying XMLImporter.
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058541#comment-13058541 ] Uwe Schindler commented on SOLR-2630: - Also, you missed passing the content-type charset to the StreamSource. I will post an improved patch fixing both problems soon. Thanks for the patch!
[jira] [Assigned] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned SOLR-2630: --- Assignee: Uwe Schindler
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me if you don't want to see src.jar code from OpenJDK Java6 - just delete this mail if you don't want to (the code here is licensed under GPL): This is the source code for a specific version of one specific Java impl. If we knew all Java impls simply implemented the primitive case using System.arraycopy (admittedly it's hard to imagine that they wouldn't!) then we are fine. This is our implementation, which Simon replaced and Robert reverted (UnsafeByteArrayOutputStream): private void grow(int newLength) { // It actually should be: (Java 1.7, when its intrinsic on all machines) // buffer = Arrays.copyOf(buffer, newLength); byte[] newBuffer = new byte[newLength]; System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); buffer = newBuffer; } So please look at the code, where is a difference that could slow down, except the Math.min() call which is an intrinsic in almost every JDK on earth? Right, in this case (if you used OpenJDK 6) we are obviously OK. Not sure about other cases... The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed. This is why we replaced Collections.sort() by CollectionUtil.sort, which does no array copy. Simon & me were not willing to replace the reallocations in FST code (Mike, you remember, we reverted that on your GIT repo when we did perf tests) and other parts in Lucene (there are only a few of them). The idea was only to replace primitive-type code to make it more readable.
And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that's Dawid's and my reason to change to copyOf for primitive types). In general, if you use Java SDK methods that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic, has too many safety checks, or uses reflection. OK I'm convinced (I think!) that for primitive types only, let's use Arrays.copyOf! To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don't like the allocation strategy in it (it's exponential; on every grow it doubles its size). We should change that to use ArrayUtils.grow() and ArrayUtils.oversize(), to have a similar allocation strategy as in trunk. Then we can discuss this problem again when Simon & me want to change the ArrayUtils.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward]. Well, it sounds like for primitive types, we can cut over the ArrayUtils.grow methods. Then we can look @ the nightly bench the next day ;) But I agree we should fix UnsafeByteArrayOutputStream... or, isn't it (almost) a dup of ByteArrayDataOutput? The other thing I don't like in the new faceting module is the duplication of vint code. Why don't we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere? This would be much more streamlined with all the code we currently have. Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with an OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class. That sounds good! Uwe can you commit TODOs to the code w/ these ideas?
Mike
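For reference, the two grow variants being compared in this thread, plus the shape of an oversize-style growth policy (only the idea -- Lucene's actual ArrayUtil.oversize arithmetic differs, e.g. it also accounts for object alignment and pointer size):

```java
import java.util.Arrays;

// The two equivalent grow implementations discussed above, and a
// grow-by-an-eighth policy in the spirit of ArrayUtil.oversize.
public class Grow {
    // variant 1: explicit allocate + System.arraycopy (the reverted code)
    static byte[] growManual(byte[] buffer, int newLength) {
        byte[] newBuffer = new byte[newLength];
        System.arraycopy(buffer, 0, newBuffer, 0, buffer.length);
        return newBuffer;
    }
    // variant 2: Arrays.copyOf, which for primitive arrays a JIT can
    // compile down to the same intrinsic arraycopy
    static byte[] growCopyOf(byte[] buffer, int newLength) {
        return Arrays.copyOf(buffer, newLength);
    }
    // grow by ~1/8 over the requested minimum instead of doubling,
    // to avoid the exponential strategy Uwe objects to
    static int oversize(int minSize) {
        return minSize + (minSize >>> 3);
    }
    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        System.out.println(growManual(a, 5).length);  // 5
        System.out.println(growCopyOf(a, 5).length);  // 5
        System.out.println(oversize(64));             // 72
    }
}
```

For primitive arrays the two variants are behaviorally identical; the thread's concern was only the generic Object[] path inside Arrays.copyOf.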
[jira] [Commented] (LUCENE-2309) Fully decouple IndexWriter from analyzers
[ https://issues.apache.org/jira/browse/LUCENE-2309?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058548#comment-13058548 ] Michael McCandless commented on LUCENE-2309: Great! This will overlap w/ the field type work (we have a branch for this now), where we have already decoupled the indexer from concrete Field/Document impls, by adding a minimal IndexableField. I think this issue should further that, ie pare back IndexableField so that there's only a getTokenStream for indexing (ie the indexer will no longer try for String, then Reader, then TokenStream), and Analyzer must move to the FieldType and not be passed to IndexWriterConfig. Multi-valued fields will be tricky, since IW now asks the analyzer for the gaps... Fully decouple IndexWriter from analyzers - Key: LUCENE-2309 URL: https://issues.apache.org/jira/browse/LUCENE-2309 Project: Lucene - Java Issue Type: Improvement Components: core/index Reporter: Michael McCandless Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 IndexWriter only needs an AttributeSource to do indexing. Yet, today, it interacts with Field instances, holds a private analyzer, invokes analyzer.reusableTokenStream, and has to deal with a wide variety of cases (it's not analyzed; it is analyzed but it's a Reader or String; it's pre-analyzed). I'd like to have IW only interact with attr sources that already arrived with the fields. This would be a powerful decoupling -- it means others are free to make their own attr sources. They need not even use any of Lucene's analysis impls; eg they can integrate to other things like [OpenPipeline|http://www.openpipeline.org]. Or make something completely custom. LUCENE-2302 is already a big step towards this: it makes IW agnostic about which attr is the term, and only requires that it provide a BytesRef (for flex).
Then I think LUCENE-2308 would get us most of the remaining way -- ie, if the FieldType knows the analyzer to use, then we could simply create a getAttrSource() method (say) on it and move all the logic IW has today onto there. (We'd still need existing IW code for back-compat).
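The "only a getTokenStream" contract Mike describes can be sketched abstractly. The interface and names below are illustrative only, not the actual IndexableField API: the point is that the indexer consumes tokens without probing the field for String, then Reader, then TokenStream.

```java
import java.util.Iterator;
import java.util.List;

// Illustrative pared-back field contract: the indexer can only ask for a
// token stream; how the tokens were produced (analyzer, pre-analysis,
// custom pipeline) is entirely the field's business.
interface MinimalIndexableField {
    String name();
    Iterator<String> tokenStream(); // stand-in for Lucene's TokenStream
}

public class FieldSketch implements MinimalIndexableField {
    private final String name;
    private final List<String> tokens;

    FieldSketch(String name, List<String> tokens) {
        this.name = name;
        this.tokens = tokens;
    }
    public String name() { return name; }
    public Iterator<String> tokenStream() { return tokens.iterator(); }

    public static void main(String[] args) {
        MinimalIndexableField f = new FieldSketch("body", List.of("hello", "world"));
        // the "indexer" consumes tokens with no knowledge of their origin
        f.tokenStream().forEachRemaining(System.out::println);
    }
}
```

This is the decoupling the issue asks for: nothing in the consumer depends on Field, Analyzer, or IndexWriterConfig.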
[jira] [Commented] (LUCENE-2883) Consolidate Solr & Lucene FunctionQuery into modules
[ https://issues.apache.org/jira/browse/LUCENE-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058549#comment-13058549 ] Michael McCandless commented on LUCENE-2883: +1, this is great Chris! Consolidate Solr & Lucene FunctionQuery into modules - Key: LUCENE-2883 URL: https://issues.apache.org/jira/browse/LUCENE-2883 Project: Lucene - Java Issue Type: Task Components: core/search Affects Versions: 4.0 Reporter: Simon Willnauer Assignee: Chris Male Labels: gsoc2011, lucene-gsoc-11, mentor Fix For: 4.0 Attachments: LUCENE-2883.patch Spin-off from the [dev list | http://www.mail-archive.com/dev@lucene.apache.org/msg13261.html]
Re: revisit naming for grouping/join?
I think what would be best is a smallish but feature complete demo, For the nested stuff I had a reasonable demo on LUCENE-2454 that was based around resumes - that use case has the one-to-many characteristics that lend themselves to nesting, e.g. a person has many different qualifications and records of employment. This scenario was illustrated here: http://www.slideshare.net/MarkHarwood/proposal-for-nested-document-support-in-lucene I also had the book search type scenario where a book has many sections and, for the purposes of efficient highlighting/summarisation, these sections were treated as child docs which could be read quickly (rather than highlighting a whole book). I'm not sure what the parent was in your doctor and cities example, Mike. If a doctor is in only one city then there is no point making city a child doc, as the one city's info can happily be combined with the doctor info into a single document with no conflict (doctors have different properties to cities). If the city is the parent with many child doctor docs that makes more sense, but feels like a less likely use case, e.g. find me a city with doctor x and a different doctor y. Searching for a person with excellent java and preferably good lucene skills feels like a more real-world example. It feels like documenting some of the trade-offs behind index design choices is useful too, e.g. nesting is not too great for very volatile content with constantly changing children, while search-time join is more costly in RAM and 2-pass processing Cheers Mark - Original Message From: Michael McCandless luc...@mikemccandless.com To: dev@lucene.apache.org Sent: Fri, 1 July, 2011 13:51:04 Subject: Re: revisit naming for grouping/join? I think joining and grouping are two different functions, and we should keep different modules for them...
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058552#comment-13058552 ] Upayavira commented on SOLR-2630: - Great! I was sure I'd missed stuff. Happy to improve stuff here too (e.g. port to 3.x).
Re: revisit naming for grouping/join?
On Fri, Jul 1, 2011 at 8:51 AM, Michael McCandless luc...@mikemccandless.com wrote: The join module does currently depend on the grouping module, but for a silly reason: just for the TopGroups, to represent the returned hits. We could move TopGroups/GroupDocs into common (thus justifying its generic name!)? Then both join and grouping modules depend on common. Just a suggestion: maybe they belong in the lucene core? And maybe the stuff in the common module belongs in lucene core's util package? I guess I'm suggesting we try to keep our modules as flat as possible, with as few dependencies as possible. I think we really already have a 'common' module, that's the lucene core. If multiple modules end up relying upon the same functionality, especially if it's something simple like an abstract class (Analyzer) or a utility thing (these mutable integers, etc), then that's a good sign it belongs in core APIs. I think we really need to try to nuke all these dependencies between modules: it's great to add them as a way to get refactoring started, but ultimately we should try to clean up: because we don't want a complex 'graph' of dependencies but instead something dead-simple. I made a total mess with the analyzers module at first, I think everything depended on it! But now we have nuked almost all dependencies on this thing, except for where it makes sense to have that concrete dependency (benchmark, demo, solr). I think what would be best is a smallish but feature complete demo, ie pull together some easy-to-understand sample content and the build a small demo app around it. We could then show how to use grouping for field collapsing (and for other use cases), joining for nested docs (and for other use cases), etc. For the same reason listed above, I think we should take our contrib/demo and consolidate 'examples' across various places into this demo module.
The reason is:
* Examples typically depend upon 'concrete' stuff, but in general core stuff should work around interfaces/abstract classes: e.g. the faceting module has an analyzers dependency only because of its examples.
* Examples might want to integrate modules, e.g. an example of how to integrate faceting and grouping or something like that.
* Examples are important: I think if the same question comes up on the user list often, we should consider adding an example.
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-trunk #168: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/168/ No tests ran. Build Log (for compile errors): [...truncated 7442 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2630: Attachment: xslt-update-handler.patch Here is an improved patch. This impl does not internally serialize the XML again to a stream and read it back using StAX; instead it uses the XSL result tree fragment (RTF), which XSL transformers always build as a DOM tree, and feeds that to StAX, so we don't need any redundant serialize/deserialize step in between. This patch also respects the content-type parameter of the input, like XMLLoader does. The intermediate buffering is needed because we have to switch from a push API to a pull API. The patch also fixes a small issue in XSLTResponseWriter, which failed to correctly log transformation warn/error events to slf4j.
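The core tr-param idea discussed in this issue — run an XSLT stylesheet from solr/conf/xslt over arbitrary incoming XML so it comes out in Solr's `<add><doc/></add>` update format — can be sketched with the JDK's built-in TrAX API. Everything below (the class name, the stylesheet, the field names) is illustrative, not Solr's actual code:

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class XsltToSolr {
    // Hypothetical stylesheet mapping a simple <book> record to Solr's update format.
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output omit-xml-declaration='yes'/>"
      + "<xsl:template match='/book'>"
      + "<add><doc>"
      + "<field name='id'><xsl:value-of select='@id'/></field>"
      + "<field name='title'><xsl:value-of select='title'/></field>"
      + "</doc></add>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Apply the stylesheet to the incoming XML, producing Solr update XML.
    public static String transform(String xml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transform("<book id='1'><title>DocBook Basics</title></book>"));
    }
}
```

Uwe's patch goes further by feeding the transformer's DOM result tree straight to StAX instead of re-serializing to a string as this sketch does.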
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
@Rory, @All, The only tickets I currently have for those are LUCENE-419 and LUCENE-418. 418 I should be able to push into the 2.9.4g branch tonight. 419 is a long-term goal and not as important as getting the tests fixed, or having the tests broken down into what is actually a unit test, functional test, perf test or long-running test. I can get into more of why that needs to be done. I'll also need to make document the what build script currently does on the wiki and make a few notes about testing, like using the RAMDirectory, etc. Things that need to get done or even be discussed:
* There needs to be a running list of things to do/not to do with testing. I don't know if this goes in a jira or if we keep a running list on the wiki or site for people to pick up and help with.
* Tests need to run on mono and not fail (there is a good deal of failing tests on mono, mostly due to the temp directory having C:\ in the path).
* Assert.Throws<ExceptionType>() needs to be used instead of try/catch + Assert.Fail.
* File path combines to the temp directory need helper methods; e.g., having this in a hundred places is bad: new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get("tempDir", ""), "testIndex"));
* We should still be testing deprecated methods, but we need to use #pragma warning disable/restore 0618 for testing those; otherwise compiler warnings are too numerous to be anywhere near helpful.
* We should only be using deprecated methods in places where they are being explicitly tested; other tests that need that functionality in order to validate should be refactored to use methods that are not deprecated.
* Identify code that could be abstracted into test utility classes.
* Infrastructure validation tests need to be made for anything that seems like infrastructure, e.g. does the temp directory exist, do the folders that the tests use inside the temp directory exist, can we read/write to those folders (if a ton of tests fail due to the file system, we should be able to point out that it was due to permissions or missing folders, files, etc.).
* Identify what classes need an interface, abstract class, or to be inherited in order to create testing mocks (once those classes are created, they should be documented in the wiki).
* Assert.Throws needs to replace stuff like the following. We should also be checking the messages of exceptions, to make sure they make sense and can help users fix issues if the exceptions are aimed at library users: try { d = DateTools.StringToDate("97"); // no date Assert.Fail(); } catch (System.FormatException e) { /* expected exception */ }
On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire codekai...@gmail.com wrote: So, veering towards action - are there concrete tasks written up anywhere for the unit tests? If a poor schlep like me wanted to dig in and start to improve them, where would I get the understanding of what is good and what needs help? -r On Thu, Jun 30, 2011 at 3:29 PM, Digy digyd...@gmail.com wrote: I can't say I like this approach, but till we find an automated way (with good results), it seems to be the only way we can use. DIGY -Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Friday, July 01, 2011 12:43 AM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed? Scott - The idea of the automated port is still worth doing. Perhaps it makes sense for someone more passionate about the line-by-line idea to do that work? I would say, focus on what makes sense to you. Being productive, regardless of the specific direction, is what will be most valuable. Once you start, others will join and momentum will build. That is how these things work. I like DIGY's approach too, but the problem with it is that it is a never-ending manual task. The theory behind the automated port is that it may reduce the manual work.
It is complicated, but once it's built and works, it will save a lot of future development hours. If it's built in a sufficiently general manner, it could be useful for other projects like Lucene.Net that want to automate a port from Java to C#. It might make sense for that to be a separate project from Lucene.Net though. -T On Thu, Jun 30, 2011 at 2:13 PM, Scott Lombard lombardena...@gmail.com wrote: Ok, I think I asked the wrong question. I am trying to figure out where to put my time. I was thinking about working on the automated porting system, but when I saw the response to the .NET 4.0 discussions I started to question if that is the right direction. The community seemed to be more interested in the .NET features. The complexity of the automated tool is going to become very high and will probably end up with a line-for-line style port. So I keep asking myself: is the automated tool worth it? I don't think it is. I like the
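The Assert.Throws point in the checklist above can be illustrated with a tiny hand-rolled equivalent. The thread's code is C#/NUnit; this plain-Java sketch only shows the pattern (the helper below is hypothetical, not the NUnit API), replacing the try { ...; Assert.Fail(); } catch { } idiom with a single call that also hands back the exception so its message can be checked:

```java
public class ThrowsDemo {
    public interface Thunk { void run() throws Exception; }

    // Run body and assert it throws the expected exception type;
    // return the exception so the caller can inspect its message.
    public static <T extends Exception> T assertThrows(Class<T> expected, Thunk body) {
        try {
            body.run();
        } catch (Exception e) {
            if (expected.isInstance(e)) return expected.cast(e);
            throw new AssertionError("expected " + expected.getName() + " but got " + e);
        }
        throw new AssertionError("expected " + expected.getName() + " but nothing was thrown");
    }

    public static void main(String[] args) {
        // One line instead of try/catch + Assert.Fail, and the message is available too.
        NumberFormatException e =
            assertThrows(NumberFormatException.class, () -> Integer.parseInt("97x"));
        System.out.println("caught: " + e.getMessage());
    }
}
```

NUnit's real Assert.Throws works the same way: it fails the test when nothing (or the wrong type) is thrown, and returns the caught exception for further assertions.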
[jira] [Updated] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2630: Affects Version/s: 3.3 Fix Version/s: 3.4 Merging to 3.x should be simple, too!
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
* need to document what the build script does. whut grammerz? On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon mhern...@wickedsoftware.net wrote: [...]
RE: revisit naming for grouping/join?
On 7/1/2011 at 10:02 AM, Robert Muir wrote: [...] I think we should take our contrib/demo and consolidate 'examples' across various places into this demo module. [...] +1
Re: svn commit: r1141510 - /lucene/dev/trunk/modules/facet/src/java/org/apache/lucene/util/UnsafeByteArrayOutputStream.java
About the encoders package - there are several encoders there besides VInt, so I wouldn't dispose of it so quickly. That said, I think we should definitely explore consolidating VInt with the core classes, and maybe write an encoder which delegates to them. Or come up w/ a different approach for allowing different Encoders to be plugged in. I don't rule out anything, as long as we preserve functionality and capabilities. Shai On Friday, July 1, 2011, Michael McCandless luc...@mikemccandless.com wrote: On Fri, Jul 1, 2011 at 2:33 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, I don't understand the whole discussion here, so please compare these two implementations and tell me which one is faster. Please don't hurt me if you don't want to see src.jar code from OpenJDK Java 6 - just delete this mail if you don't want to (the code here is licensed under GPL): This is the source code for a specific version of one specific Java impl. If we knew all Java impls simply implemented the primitive case using System.arraycopy (admittedly it's hard to imagine that they wouldn't!) then we are fine. This is our implementation, which Simon replaced and Robert reverted (UnsafeByteArrayOutputStream): private void grow(int newLength) { // It actually should be: (Java 1.7, when it's intrinsic on all machines) // buffer = Arrays.copyOf(buffer, newLength); byte[] newBuffer = new byte[newLength]; System.arraycopy(buffer, 0, newBuffer, 0, buffer.length); buffer = newBuffer; } So please look at the code: where is a difference that could slow it down, except the Math.min() call, which is an intrinsic in almost every JDK on earth? Right, in this case (if you used OpenJDK 6) we are obviously OK. Not sure about other cases... The problem we are talking about here is only about the generic Object[] copyOf method and also affects e.g. *all* Collection.toArray() methods - they all use this code, so whenever you use ArrayList.toArray() or similar, the slow code is executed.
This is why we replaced Collections.sort() by CollectionUtil.sort, which does no array copy. Simon & I were not willing to replace the reallocations in FST code (Mike, you remember, we reverted that on your GIT repo when we did perf tests) and other parts of Lucene (there are only a few of them). The idea was only to replace primitive-type code to make it more easily readable. And with later JDK code it could even get faster (not slower), if Oracle starts to add intrinsics for those new methods (and that's Dawid's and my reason to change to copyOf for primitive types). In general, if you use Java SDK methods that are as fast as ours, they always have a chance to get faster in later JDKs. So we should always prefer Java SDK methods, unless they are slower because their default impl is too generic, has too many safety checks, or uses reflection. OK I'm convinced (I think!) that for primitive types only, let's use Arrays.copyOf! To come back to UnsafeByteArrayOutputStream: I would change the whole code, as I don't like the allocation strategy in it (it's exponential; on every grow it doubles its size). We should change that to use ArrayUtil.grow() and ArrayUtil.oversize(), to have a similar allocation strategy as in trunk. Then we can discuss this problem again when Simon & I want to change the ArrayUtil.grow methods to use Arrays.copyOf... *g* [just joking, I will never ask again, because this discussion here is endless and does not bring us forward]. Well, it sounds like for primitive types, we can cut over the ArrayUtil.grow methods. Then we can look @ the nightly bench the next day ;) But I agree we should fix UnsafeByteArrayOutputStream... or isn't it (almost) a dup of ByteArrayDataOutput? The other thing I don't like in the new faceting module is the duplication of vint code. Why don't we change it to use DataInput/DataOutput and use Dawid's new In/OutStream wrapper for DataOutput everywhere? This would be much more streamlined with all the code we currently have.
Then we can encode the payloads (or later docvalues) using the new UnsafeByteArrayOutputStream, wrapped with an OutputStreamDataOutput wrapper? Or maybe add a ByteArrayDataOutput class. That sounds good! Uwe, can you commit TODOs to the code w/ these ideas? Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
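The consensus above — use Arrays.copyOf for primitive arrays, and an ArrayUtil-style oversize growth policy instead of doubling on every grow — can be sketched as follows. oversize() here is a simplified stand-in for Lucene's ArrayUtil.oversize (it grows by roughly 1/8 past the minimum), not its actual alignment-aware logic:

```java
import java.util.Arrays;

public class GrowDemo {
    // Simplified oversize: the requested minimum plus ~12.5% headroom,
    // so repeated grow() calls amortize without doubling memory each time.
    public static int oversize(int minSize) {
        return minSize + (minSize >> 3) + 3;
    }

    // Grow a primitive array to at least minSize elements.
    // For primitive arrays, Arrays.copyOf is equivalent to
    // new byte[n] + System.arraycopy, and JDKs may intrinsify it.
    public static byte[] grow(byte[] buffer, int minSize) {
        if (buffer.length >= minSize) return buffer;
        return Arrays.copyOf(buffer, oversize(minSize));
    }

    public static void main(String[] args) {
        byte[] b = new byte[4];
        b = grow(b, 100);
        System.out.println(b.length); // oversize(100) = 115
    }
}
```

This matches the thread's conclusion: Java SDK methods are preferable for primitives since later JDKs can only make them faster, while the exponential doubling in UnsafeByteArrayOutputStream is the part worth replacing.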
[jira] [Commented] (SOLR-2623) Solr JMX MBeans do not survive core reloads
[ https://issues.apache.org/jira/browse/SOLR-2623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058626#comment-13058626 ] Hoss Man commented on SOLR-2623: Grr... right, right. ObjectInstance != MBean. Solr JMX MBeans do not survive core reloads --- Key: SOLR-2623 URL: https://issues.apache.org/jira/browse/SOLR-2623 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 1.4, 1.4.1, 3.1, 3.2 Reporter: Alexey Serba Assignee: Shalin Shekhar Mangar Priority: Minor Attachments: SOLR-2623.patch, SOLR-2623.patch, SOLR-2623.patch Solr JMX MBeans do not survive core reloads {noformat:title=Steps to reproduce} sh cd example sh vi multicore/core0/conf/solrconfig.xml # enable jmx sh java -Dcom.sun.management.jmxremote -Dsolr.solr.home=multicore -jar start.jar sh echo 'open 8842 # 8842 is java pid domain solr/core0 beans ' | java -jar jmxterm-1.0-alpha-4-uber.jar solr/core0:id=core0,type=core solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=org.apache.solr.handler.StandardRequestHandler solr/core0:id=org.apache.solr.handler.StandardRequestHandler,type=standard solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=/update solr/core0:id=org.apache.solr.handler.XmlUpdateRequestHandler,type=org.apache.solr.handler.XmlUpdateRequestHandler ... 
solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=searcher solr/core0:id=org.apache.solr.update.DirectUpdateHandler2,type=updateHandler sh curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0' sh echo 'open 8842 # 8842 is java pid domain solr/core0 beans ' | java -jar jmxterm-1.0-alpha-4-uber.jar # there's only one bean left after Solr core reload solr/core0:id=org.apache.solr.search.SolrIndexSearcher,type=Searcher@2e831a91 main {noformat} The root cause of this is Solr core reload behavior: # create new core (which overwrites existing registered MBeans) # register new core and close old one (we remove/un-register MBeans on oldCore.close) The correct sequence is: # unregister MBeans from old core # create and register new core # close old core without touching MBeans -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
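The corrected reload sequence can be sketched against the JDK's standard javax.management API. The Demo bean and ObjectName below are illustrative stand-ins for Solr's MBeans, not its actual code; the point is the ordering: unregister the old core's beans, register the new core's, then close the old core without touching MBeans:

```java
import java.lang.management.ManagementFactory;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class ReloadDemo {
    // Standard MBean pattern: class Demo exposes interface DemoMBean.
    public interface DemoMBean { String getName(); }
    public static class Demo implements DemoMBean {
        private final String name;
        public Demo(String name) { this.name = name; }
        public String getName() { return name; }
    }

    public static String reload() {
        try {
            MBeanServer server = ManagementFactory.getPlatformMBeanServer();
            ObjectName id = new ObjectName("solr/core0:id=demo,type=core");
            if (server.isRegistered(id)) server.unregisterMBean(id);

            // The "old core" has its bean registered.
            server.registerMBean(new Demo("old"), id);

            // Correct reload order: 1) unregister the old core's beans...
            server.unregisterMBean(id);
            // ...2) create and register the new core's beans...
            server.registerMBean(new Demo("new"), id);
            // ...3) close the old core without touching MBeans.

            return (String) server.getAttribute(id, "Name");
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(reload()); // prints "new"
    }
}
```

The buggy sequence registers the new bean first, so the old core's close() then unregisters it — which is why only one bean survives the reload in the reproduction above.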
[jira] [Resolved] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler resolved SOLR-2630. - Resolution: Fixed Committed trunk revision: 1141999 Committed 3.x revision: 1142003 Thanks Upayavira, the idea is great and also of use for myself (if PANGAEA/panFMP moves to Solr, but since we now have faceting in Lucene I don't think we will take that step)!
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058679#comment-13058679 ] Hoss Man commented on SOLR-2630: Hmmm... from a user perspective does it really make sense for this to be an entirely new RequestHandler? wouldn't it make more sense if users could just continue to use XmlUpdateRequestHandler along with a tr param indicating the transform to apply first?
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058681#comment-13058681 ] Uwe Schindler commented on SOLR-2630: - I was thinking about that; it would be easy to implement, as the current code would simply be moved to XMLLoader. Should I add a patch relative to what's currently committed?
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058690#comment-13058690 ] Uwe Schindler commented on SOLR-2630: - On the other hand, this one is similar to XSLTResponseWriter, which is also separate from XMLResponseWriter. XMLResponseWriter could also take an optional tr param and then transform? So the current solution is more consistent.
[jira] [Commented] (SOLR-2630) XsltUpdateRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058705#comment-13058705 ] Upayavira commented on SOLR-2630: - I considered the same thing, making XmlUpdateRequestHandler accept tr, but opted not to for the same reason as Uwe. Whichever way, consistency is a good thing!
[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058730#comment-13058730 ] Hoss Man commented on LUCENE-3272: -- single module != single jar ... correct? someone writing a small form factor app that wants to use the basic Lucene QueryParser shouldn't have to load a jar containing every query parser provided by solr (and all of the dependencies they have) Consolidate Lucene's QueryParsers into a module --- Key: LUCENE-3272 URL: https://issues.apache.org/jira/browse/LUCENE-3272 Project: Lucene - Java Issue Type: Improvement Components: modules/queryparser Reporter: Chris Male Lucene has a lot of QueryParsers and we should have them all in a single consistent place. The following are QueryParsers I can find that warrant moving to the new module: - Lucene Core's QueryParser - AnalyzingQueryParser - ComplexPhraseQueryParser - ExtendableQueryParser - Surround's QueryParser - PrecedenceQueryParser - StandardQueryParser - XML-Query-Parser's CoreParser All seem to do a good job at their kind of parsing, with extensive tests. One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
@Michael - Should that list be in JIRA? It would be easier to manage, I think... If yes, I'll happily do it. -r On Fri, Jul 1, 2011 at 8:04 AM, Michael Herndon mhern...@wickedsoftware.net wrote: * need to document what the build script does. whut grammerz? On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon mhern...@wickedsoftware.net wrote: @Rory, @All, The only tickets I currently have for those is LUCENE-419, LUCENE-418 418, I should be able to push into the 2.9.4g branch tonight.419 is a long term goal and not as important as getting the tests fixed, of have the tests broken down into what is actually a unit test, functional test, perf or long running test. I can get into more why it needs to be done. I'll also need to make document the what build script currently does on the wiki and make a few notes about testing, like using the RAMDirectory, etc. Things that need to get done or even be discussed. * There needs to be a running list of things to do/not to do with testing. I don't know if this goes in a jira or do we keep a running list on the wiki or site for people to pick up and help with. * Tests need to run on mono and not Fail (there is a good deal of failing tests on mono, mostly due to the temp directory have the C:\ in the path). * Assert.ThrowExceptionType() needs to be used instead of Try/Catch Assert.Fail. ** * File Path combines to the temp directory need helper methods, * e,g, having this in a hundred places is bad new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get(tempDir, ), testIndex)); * We should still be testing deprecated methods, but we need to use #pragma warning disable/enable 0618 for testing those. otherwise compiler warnings are too numerous to be anywhere near helpful. * We should only be using deprecated methods in places where they are being explicitly tested, other tests that need that functionality in order to validate those tests should be re factored to use methods that are not deprecated. 
* Identify code that could be abstracted into test utility classes. * Infrastructure Validation tests need to be made, anything that seems like infrastructure. e.g. does the temp directory exist, does the folders that the tests use inside the temp directory exist, can we read/write to those folders. (if a ton of tests fail due to the file system, we should be able to point out that it was due to permissions or missing folders, files, etc). * Identify what classes need an interface, abstract class or inherited in order to create testing mocks. (once those classes are created, they should be documented in the wiki). ** Asset.Throws needs to replace stuff like the following. We should also be checking the messages for exceptions and make sure they make sense and can help users fix isses if the exceptions are aimed at the library users. try { d = DateTools.StringToDate(97); // no date Assert.Fail(); } catch (System.FormatException e) { /* expected exception */ } On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire codekai...@gmail.com wrote: So, veering towards action - are there concrete tasks written up anywhere for the unit tests? If a poor schlep like me wanted to dig in and start to improve them, where would I get the understanding of what is good and what needs help? -r On Thu, Jun 30, 2011 at 3:29 PM, Digy digyd...@gmail.com wrote: I can not say I like this approach, but till we find an automated way(with good results), it seems to be the only way we can use. DIGY -Original Message- From: Troy Howard [mailto:thowar...@gmail.com] Sent: Friday, July 01, 2011 12:43 AM To: lucene-net-...@lucene.apache.org Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Jave port needed? Scott - The idea of the automated port is still worth doing. Perhaps it makes sense for someone more passionate about the line-by-line idea to do that work? I would say, focus on what makes sense to you. Being productive, regardless of the specific direction, is what will be most valuable. 
Once you start, others will join and momentum will build. That is how these things work. I like DIGY's approach too, but the problem with it is that it is a never-ending manual task. The theory behind the automated port is that it may reduce the manual work. It is complicated, but once it's built and works, it will save a lot of future development hours. If it's built in a sufficiently general manner, it could be useful for other projects like Lucene.Net that want to automate a port from Java to C#. It might make sense for that to be a separate project from Lucene.Net, though. -T On Thu, Jun 30, 2011 at 2:13 PM, Scott Lombard lombardena...@gmail.com wrote: Ok I think I asked the wrong
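The Assert.Throws suggestion in the thread above can be illustrated framework-free. A minimal sketch in Java terms (the thread's actual target is NUnit's Assert.Throws<T> in Lucene.Net's C# tests; the helper name `throwsExpected` and the NumberFormatException stand-in here are illustrative, not part of either project's API):

```java
// Framework-free sketch of an Assert.Throws-style helper: it replaces the
// try { ...; Assert.Fail(); } catch (ExpectedException) {} idiom with a
// single call that also fails when the *wrong* exception type is thrown.
public class ThrowsHelper {
    /** Returns true only if running `action` throws an instance of `expected`. */
    public static boolean throwsExpected(Class<? extends Throwable> expected, Runnable action) {
        try {
            action.run();
            return false; // no exception was thrown at all
        } catch (Throwable t) {
            return expected.isInstance(t); // wrong exception type also fails
        }
    }

    public static void main(String[] args) {
        boolean ok = throwsExpected(NumberFormatException.class,
                () -> Integer.parseInt("not a number")); // invalid input, like StringToDate("97")
        System.out.println(ok ? "PASS" : "FAIL"); // prints PASS
    }
}
```

The same shape also centralizes the exception-message checks the thread asks for: the helper is one place to add an assertion on the thrown message.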
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
I think whatever makes sense to do. Possibly create one JIRA for now with a running list that can be modified and, as people pull from that list, cross things off or create a separate ticket that links back to the main one.
[jira] [Commented] (LUCENE-3272) Consolidate Lucene's QueryParsers into a module
[ https://issues.apache.org/jira/browse/LUCENE-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058769#comment-13058769 ] Robert Muir commented on LUCENE-3272: - single jar, but you can customize: it's open source. Hoss, I think you are looking at this the wrong way: this actually makes it way easier for someone writing a small form factor app that uses no query parser at all, or their own queryparser, or whatever. We should do this to make the Lucene core smaller, and then you plug in the modules you need (and maybe only selected parts from them, but that's your call). I don't think we need to provide X * Y * Z possibilities, nor do we need to provide 87 jar files. But this is just rehashing LUCENE-2323, where we already had this conversation. I think at the least we should put all these QPs into one place to make refactoring between them easier. Then we make a smaller amount of code for these small form factor apps you are so concerned about; with the messy duplication this is not possible now. I still stand by my comments in LUCENE-2323, and guess what, turns out I think I was right. LUCENE-1938 then refactored one of these queryparsers, removing 4000 lines of code but keeping the same functionality. Consolidate Lucene's QueryParsers into a module --- Key: LUCENE-3272 URL: https://issues.apache.org/jira/browse/LUCENE-3272 Project: Lucene - Java Issue Type: Improvement Components: modules/queryparser Reporter: Chris Male Lucene has a lot of QueryParsers and we should have them all in a single consistent place. The following are QueryParsers I can find that warrant moving to the new module: - Lucene Core's QueryParser - AnalyzingQueryParser - ComplexPhraseQueryParser - ExtendableQueryParser - Surround's QueryParser - PrecedenceQueryParser - StandardQueryParser - XML-Query-Parser's CoreParser All seem to do a good job at their kind of parsing with extensive tests. 
One challenge of consolidating these is that many tests use Lucene Core's QueryParser. One option is to just replicate this class in src/test and call it TestingQueryParser. Another option is to convert all tests over to programmatically building their queries (seems like a lot of work). -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[Lucene.Net] [jira] [Updated] (LUCENENET-400) Evaluate tooling for continuous integration server
[ https://issues.apache.org/jira/browse/LUCENENET-400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael herndon updated LUCENENET-400: -- Due Date: 30/Sep/11 (was: 28/Feb/11) Evaluate tooling for continuous integration server -- Key: LUCENENET-400 URL: https://issues.apache.org/jira/browse/LUCENENET-400 Project: Lucene.Net Issue Type: Task Components: Build Automation, Project Infrastructure Reporter: Troy Howard Assignee: michael herndon We would like to have a CI server setup for Lucene.Net. It has been suggested to do this outside of the ASF infrastructure, but this would not work for ASF. Please review the available options at http://ci.apache.org/ and evaluate which CI server system would be preferred for our setup. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[Lucene.Net] [jira] [Commented] (LUCENENET-418) LuceneTestCase should not have a static method that could throw exceptions.
[ https://issues.apache.org/jira/browse/LUCENENET-418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058781#comment-13058781 ] michael herndon commented on LUCENENET-418: --- r1132085 under the Lucene.Net_2_9_4g branch. The exception was removed. The static constructor still exists, but will be re-factored out at a later date. The paths for the TestBackwardsCompatability tests were also fixed. LuceneTestCase should not have a static method that could throw exceptions. Key: LUCENENET-418 URL: https://issues.apache.org/jira/browse/LUCENENET-418 Project: Lucene.Net Issue Type: Bug Components: Lucene.Net Test Affects Versions: Lucene.Net 3.x Environment: Linux, OSX, etc Reporter: michael herndon Assignee: michael herndon Labels: test Original Estimate: 2m Remaining Estimate: 2m Throwing an exception from a static method in a base class for 90% of the tests makes it hard to debug the issue in nunit. The test results came back saying that TestFixtureSetup was causing an issue even though it was the static constructor causing problems, and this then propagates to all the tests that stem from LuceneTestCase. The TEMP_DIR needs to be moved to a static util class as a property or even a mixin method. This cost me hours to debug and figure out the real issue, as the underlying exception message never bubbled up. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
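A small sketch of why exceptions during static initialization are so painful to debug, the trap LUCENENET-418 describes. This is the Java analogue (ExceptionInInitializerError) of .NET's TypeInitializationException wrapping; all class and member names below are invented for illustration:

```java
// An exception thrown while a class is being statically initialized reaches
// the caller as a wrapper Error, so a test runner reports the wrapper rather
// than the root cause -- exactly the "never bubbled up" complaint above.
public class StaticInitTrap {
    static class FragileBase {
        static final String TEMP_DIR;
        static {
            TEMP_DIR = findTempDir(); // blows up during class initialization
        }
        static String findTempDir() {
            throw new IllegalStateException("tempDir setting missing");
        }
    }

    /** Returns the name of what the caller actually sees, not the root cause. */
    public static String classifyFailure() {
        try {
            return FragileBase.TEMP_DIR;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        // Prints an *Error wrapper name, not "IllegalStateException".
        System.out.println(classifyFailure());
    }
}
```

Moving TEMP_DIR resolution into an ordinary method or property, as the ticket suggests, lets the real exception surface at the call site instead.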
[jira] [Created] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itself
PingRequestHandler can infinite loop if called with a qt that points to itself --- Key: SOLR-2631 URL: https://issues.apache.org/jira/browse/SOLR-2631 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 3.2, 3.1, 1.4, 3.3 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.4, 4.0 We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
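The guard the issue calls for amounts to rejecting a qt value that resolves back to the handler currently executing, so /admin/ping can never be asked to ping itself. A hypothetical sketch, not Solr's actual fix or API (the names `handlerPath` and `isSelfReference` are invented for illustration):

```java
// Minimal self-reference check: before delegating to the handler named by
// the qt parameter, refuse the request if qt points back at this handler.
public class PingGuard {
    public static boolean isSelfReference(String handlerPath, String qt) {
        return qt != null && qt.equals(handlerPath);
    }

    public static void main(String[] args) {
        System.out.println(isSelfReference("/admin/ping", "/admin/ping")); // true  -> reject request
        System.out.println(isSelfReference("/admin/ping", "/select"));     // false -> safe to delegate
    }
}
```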
[jira] [Commented] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itself
[ https://issues.apache.org/jira/browse/SOLR-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058785#comment-13058785 ] Uwe Schindler commented on SOLR-2631: - Edoardo Tosca, who reported the issue, gave the following workaround for solrconfig.xml to fix this by configuration:

{quote}
Ok, to solve the Ping problem you can add an invariant:

  <lst name="defaults">
    <str name="q">solrpingquery</str>
    <str name="echoParams">all</str>
  </lst>
  <lst name="invariants">
    <str name="qt">search</str>
  </lst>

in this case you avoid generating recursive calls to the /admin/ping handler Edo
{quote}

PingRequestHandler can infinite loop if called with a qt that points to itself --- Key: SOLR-2631 URL: https://issues.apache.org/jira/browse/SOLR-2631 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 1.4, 3.1, 3.2, 3.3 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.4, 4.0 We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2631) PingRequestHandler can infinite loop if called with a qt that points to itself
[ https://issues.apache.org/jira/browse/SOLR-2631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated SOLR-2631: Description: We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/solr/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. was: We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. 
PingRequestHandler can infinite loop if called with a qt that points to itself --- Key: SOLR-2631 URL: https://issues.apache.org/jira/browse/SOLR-2631 Project: Solr Issue Type: Bug Components: search, web gui Affects Versions: 1.4, 3.1, 3.2, 3.3 Reporter: Uwe Schindler Assignee: Uwe Schindler Labels: security Fix For: 3.4, 4.0 Attachments: SOLR-2631.patch We got a security report to priv...@lucene.apache.org, that Solr can loop infinitely, use 100% CPU and stack overflow, if you execute the following HTTP request: - http://localhost:8983/solr/select?qt=/admin/ping - http://localhost:8983/solr/admin/ping?qt=/admin/ping The qt parameter instructs PingRequestHandler to call the given request handler. This leads to an infinite loop. This is not a security issue, but for an unprotected Solr server with an unprotected /solr/select path this makes it stop working. The fix is to prevent the infinite loop by disallowing the handler from calling itself. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-2429) ability to not cache a filter
[ https://issues.apache.org/jira/browse/SOLR-2429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yonik Seeley resolved SOLR-2429. Resolution: Fixed Fix Version/s: 3.4 ability to not cache a filter - Key: SOLR-2429 URL: https://issues.apache.org/jira/browse/SOLR-2429 Project: Solr Issue Type: New Feature Reporter: Yonik Seeley Fix For: 3.4 Attachments: SOLR-2429.patch A user should be able to add {!cache=false} to a query or filter query. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2462) Using spellcheck.collate can result in extremely high memory usage
[ https://issues.apache.org/jira/browse/SOLR-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058836#comment-13058836 ] Mitsu Hadeishi commented on SOLR-2462: -- Oh now you tell us. :) Well, we already built the patched 3.2 so we're going with that for now :) Using spellcheck.collate can result in extremely high memory usage -- Key: SOLR-2462 URL: https://issues.apache.org/jira/browse/SOLR-2462 Project: Solr Issue Type: Bug Components: spellchecker Affects Versions: 3.1 Reporter: James Dyer Assignee: Robert Muir Priority: Critical Fix For: 3.3, 4.0 Attachments: SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462.patch, SOLR-2462_3_1.patch When using spellcheck.collate, class SpellPossibilityIterator creates a ranked list of *every* possible correction combination. But if returning several corrections per term, and if several words are misspelled, the existing algorithm uses a huge amount of memory. This bug was introduced with SOLR-2010. However, it is triggered anytime spellcheck.collate is used. It is not necessary to use any features that were added with SOLR-2010. We were in Production with Solr for 1 1/2 days and this bug started taking our Solr servers down with infinite GC loops. It was pretty easy for this to happen as occasionally a user will accidentally paste the URL into the Search box on our app. This URL results in a search with ~12 misspelled words. We have spellcheck.count set to 15. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
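Back-of-the-envelope arithmetic for the failure mode described above: with spellcheck.count=15 and ~12 misspelled terms, enumerating every correction combination yields 15^12 candidates, far too many to rank in memory. A small sketch (the independent-combination assumption is mine, matching the "every possible correction combination" description):

```java
// Models the collation blowup: if the spellchecker returns `perTerm`
// corrections for each of `terms` misspelled words, the number of candidate
// collations multiplies out, one factor per misspelled term.
public class CollationBlowup {
    public static long combinations(long perTerm, int terms) {
        long total = 1;
        for (int i = 0; i < terms; i++) {
            total *= perTerm;
        }
        return total;
    }

    public static void main(String[] args) {
        // spellcheck.count=15 with ~12 misspelled terms, as in the report:
        System.out.println(combinations(15, 12)); // 129746337890625 (~1.3e14)
    }
}
```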
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058844#comment-13058844 ] Stefan Matheis (steffkes) commented on SOLR-2399: - So, after a few hours hacking .. it's hopefully a step in the right direction for the Analysis-Page! Please have a look and let me know what you're thinking. I've changed various things:

* Vertical Separation should be clearer now (Index- vs. Query-Time)
* Filter- and Tokenizer-Names are placed on the left Side (so it should be easier to follow each token through all the steps, Full Name on MouseOver)
* Property-Names are no longer abbreviated
* All Properties (except {{match}} and {{positionHistory}}) are displayed
* If the Property-Name contains a #-Sign, only the latter part is displayed (Full Name on MouseOver)

Uwe, maybe you could give it a try w/ lucene-gosen? These are the [required changes|https://github.com/steffkes/solr-admin/commit/ddb1e0098efc2ef48082e43ed57e4b62b23ba6d7] (since the last svn-commit). 
|| ||Version 1980||First Try||Current Page||
||Normal|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_cur.png]|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_01.png]|*[Screenshot|http://files.mathe.is/solr-admin/04_analysis.png]*|
||Verbose|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose_cur.png]|[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose_01.png]|*[Screenshot|http://files.mathe.is/solr-admin/04_analysis_verbose.png]*|

Stefan

Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: 
http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-1499) SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ
[ https://issues.apache.org/jira/browse/SOLR-1499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058848#comment-13058848 ] Lance Norskog commented on SOLR-1499: - Ahmet - are you still using this? SolrEntityProcessor - DIH EntityProcessor that queries an external Solr via SolrJ - Key: SOLR-1499 URL: https://issues.apache.org/jira/browse/SOLR-1499 Project: Solr Issue Type: New Feature Components: contrib - DataImportHandler Reporter: Lance Norskog Fix For: 3.3 Attachments: SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch, SOLR-1499.patch The SolrEntityProcessor queries an external Solr instance. The Solr documents returned are unpacked and emitted as DIH fields. The SolrEntityProcessor uses the following attributes: * solr='http://localhost:8983/solr/sms' ** This gives the URL of the target Solr instance. *** Note: the connection to the target Solr uses the binary SolrJ format. * query='Jefferson&sort=id+asc' ** This gives the base query string used with Solr. It can include any standard Solr request parameter. This attribute is processed under the variable resolution rules and can be driven in an inner stage of the indexing pipeline. * rows='10' ** This gives the number of rows to fetch per request. ** The SolrEntityProcessor always fetches every document that matches the request. * fields='id,tag' ** This selects the fields to be returned from the Solr request. ** These must also be declared as field elements. ** As with all fields, template processors can be used to alter the contents to be passed downwards. * timeout='30' ** This limits the query to 30 seconds. This can be used as a fail-safe to prevent the indexing session from freezing up. By default the timeout is 5 minutes. Limitations: * Solr errors are not handled correctly. * Loop control constructs have not been tested. * Multi-valued returned fields have not been tested. 
The unit tests give examples of how to use it as the root entity and an inner entity. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058865#comment-13058865 ] Stefan Matheis (steffkes) commented on SOLR-2399: - First Feedback from #solr: {quote}hossman_ in the new ones, it's easy to overlook that sonic and viewsonic are at the same position hossman_ steffkes: i think what i would suggest is to keep your new layout, treat position as special and put some sort of visual indicator on terms that are at the same position{quote} {quote}hossman_ oh yeah ... i ment to ask about that ... i'm assuming you look at the capital letters in the class name? hossman_ i think it's an assume way to save space, definitely a good idea for verbose==false (as long as you can mouse over it or something to see the full name) hossman_ for verbose==true ... not sure{quote} {quote}_regarding the two-column-layout / Index- vs. Query-Time_: elyograg I see now. I think they might need headers to indicate which is which. 
Not strictly required if your screen is wide enough, but if it wraps below, it may not be immediately apparent.{quote} Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] (SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. 
For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2399) Solr Admin Interface, reworked
[ https://issues.apache.org/jira/browse/SOLR-2399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13058879#comment-13058879 ] Robert Muir commented on SOLR-2399: --- {quote} Uwe, maybe you could give it a try w/ lucene-gosen? These are the required changes (since the last svn-commit). {quote} If Uwe doesn't have the time, I'll try to investigate this in the next few days, once I stop laughing about Version 1980. we have a version that works with trunk here, https://lucene-gosen.googlecode.com/svn/branches/4x Solr Admin Interface, reworked -- Key: SOLR-2399 URL: https://issues.apache.org/jira/browse/SOLR-2399 Project: Solr Issue Type: Improvement Components: web gui Reporter: Stefan Matheis (steffkes) Assignee: Ryan McKinley Priority: Minor Fix For: 4.0 Attachments: SOLR-2399-110603-2.patch, SOLR-2399-110603.patch, SOLR-2399-110606.patch, SOLR-2399-110622.patch, SOLR-2399-admin-interface.patch, SOLR-2399-analysis-stopwords.patch, SOLR-2399-fluid-width.patch, SOLR-2399-sorting-fields.patch, SOLR-2399-wip-notice.patch, SOLR-2399.patch *The idea was to create a new, fresh (and hopefully clean) Solr Admin Interface.* [Based on this [ML-Thread|http://www.lucidimagination.com/search/document/ae35e236d29d225e/solr_admin_interface_reworked_go_on_go_away]] *Features:* * [Dashboard|http://files.mathe.is/solr-admin/01_dashboard.png] * [Query-Form|http://files.mathe.is/solr-admin/02_query.png] * [Plugins|http://files.mathe.is/solr-admin/05_plugins.png] * [Analysis|http://files.mathe.is/solr-admin/04_analysis.png] (SOLR-2476, SOLR-2400) * [Schema-Browser|http://files.mathe.is/solr-admin/06_schema-browser.png] * [Dataimport|http://files.mathe.is/solr-admin/08_dataimport.png] (SOLR-2482) * [Core-Admin|http://files.mathe.is/solr-admin/09_coreadmin.png] * [Replication|http://files.mathe.is/solr-admin/10_replication.png] * [Zookeeper|http://files.mathe.is/solr-admin/11_cloud.png] * [Logging|http://files.mathe.is/solr-admin/07_logging.png] 
(SOLR-2459) ** Stub (using static data) Newly created Wiki-Page: http://wiki.apache.org/solr/ReworkedSolrAdminGUI I've quickly created a Github-Repository (Just for me, to keep track of the changes) » https://github.com/steffkes/solr-admin -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-trunk - Build # 1612 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1612/ No tests ran. Build Log (for compile errors): [...truncated 9445 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
My thinking is just a separate ticket for each one. This makes the work easier to manage, gives a better sense of how much work is left, and makes it easier to prioritize independent issues. We could link all the sub-issues to a single task / feature / whatever (that is, if JIRA has that capability). -r

On Fri, Jul 1, 2011 at 12:48 PM, Michael Herndon mhern...@wickedsoftware.net wrote: I think whatever makes sense to do. Possibly create one JIRA issue for now with a running list that can be modified, and as people pull from that list, cross things off or create a separate ticket that links back to the main one.

On Fri, Jul 1, 2011 at 3:35 PM, Rory Plaire codekai...@gmail.com wrote: @Michael - Should that list be in JIRA? It would be easier to manage, I think... If yes, I'll happily do it. -r

On Fri, Jul 1, 2011 at 8:04 AM, Michael Herndon mhern...@wickedsoftware.net wrote: * need to document what the build script does. whut grammerz?

On Fri, Jul 1, 2011 at 10:52 AM, Michael Herndon mhern...@wickedsoftware.net wrote: @Rory, @All, The only tickets I currently have for those are LUCENE-419 and LUCENE-418. 418 I should be able to push into the 2.9.4g branch tonight. 419 is a long-term goal and not as important as getting the tests fixed, or having the tests broken down into what is actually a unit test, functional test, perf test, or long-running test. I can get into more of why it needs to be done. I'll also need to document what the build script currently does on the wiki and make a few notes about testing, like using the RAMDirectory, etc.

Things that need to get done or even be discussed:

* There needs to be a running list of things to do/not to do with testing. I don't know if this goes in a JIRA issue, or if we keep a running list on the wiki or site for people to pick up and help with.
* Tests need to run on mono and not fail (there is a good deal of failing tests on mono, mostly due to the temp directory having C:\ in the path).
* Assert.Throws<ExceptionType>() needs to be used instead of try/catch + Assert.Fail().
* File path combines to the temp directory need helper methods; e.g., having this in a hundred places is bad: new System.IO.FileInfo(System.IO.Path.Combine(Support.AppSettings.Get("tempDir", ""), "testIndex"));
* We should still be testing deprecated methods, but we need to use #pragma warning disable/restore 0618 around those tests; otherwise compiler warnings are too numerous to be anywhere near helpful.
* We should only be using deprecated methods in places where they are being explicitly tested; other tests that need that functionality in order to validate something else should be refactored to use methods that are not deprecated.
* Identify code that could be abstracted into test utility classes.
* Infrastructure validation tests need to be made for anything that seems like infrastructure, e.g. does the temp directory exist, do the folders that the tests use inside the temp directory exist, can we read/write to those folders. (If a ton of tests fail due to the file system, we should be able to point out that it was due to permissions or missing folders, files, etc.)
* Identify which classes need an interface, abstract class, or to be inherited in order to create testing mocks. (Once those classes are created, they should be documented in the wiki.)
* Assert.Throws needs to replace stuff like the following. We should also be checking the messages for exceptions to make sure they make sense and can help users fix issues, if the exceptions are aimed at the library users.

try
{
    d = DateTools.StringToDate("97"); // no date
    Assert.Fail();
}
catch (System.FormatException e)
{
    /* expected exception */
}

On Thu, Jun 30, 2011 at 11:48 PM, Rory Plaire codekai...@gmail.com wrote: So, veering towards action - are there concrete tasks written up anywhere for the unit tests? If a poor schlep like me wanted to dig in and start to improve them, where would I get the understanding of what is good and what needs help?
-r

On Thu, Jun 30, 2011 at 3:29 PM, Digy digyd...@gmail.com wrote: I cannot say I like this approach, but till we find an automated way (with good results), it seems to be the only one we can use. DIGY

-----Original Message-----
From: Troy Howard [mailto:thowar...@gmail.com]
Sent: Friday, July 01, 2011 12:43 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?

Scott - The idea of the automated port is still worth doing. Perhaps it makes
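The try/catch + Assert.Fail pattern criticized in the thread above collapses into a single exception-asserting call (NUnit's Assert.Throws on the Lucene.Net side). As a language-neutral sketch of what such a helper does under the hood, here is a minimal Java version; the class name Assert2 is illustrative and is not part of either codebase:

```java
// Minimal sketch of an Assert.Throws-style helper: runs the body and
// fails unless it throws exactly the expected exception type.
public class Assert2 {
    public static <T extends Throwable> T assertThrows(Class<T> expected, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (expected.isInstance(t))
                return expected.cast(t); // expected exception: test passes
            throw new AssertionError("expected " + expected.getName()
                    + " but got " + t.getClass().getName());
        }
        throw new AssertionError("expected " + expected.getName()
                + " but nothing was thrown");
    }

    public static void main(String[] args) {
        // Instead of: try { parse; Assert.Fail(); } catch (FormatException) { }
        NumberFormatException e =
                assertThrows(NumberFormatException.class, () -> Integer.parseInt("no date"));
        System.out.println("caught: " + e.getClass().getSimpleName());
    }
}
```

Returning the caught exception also makes it easy to assert on the exception message, which the list above calls out as something the tests should start checking.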
[Lucene.Net] [jira] [Commented] (LUCENENET-404) Improve brand logo design
[ https://issues.apache.org/jira/browse/LUCENENET-404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13058761#comment-13058761 ]

Troy Howard commented on LUCENENET-404:
---

Just a quick update. The artist is making some final edits before we commit. Will post them soon. I'll attach examples.

Improve brand logo design
-------------------------
Key: LUCENENET-404
URL: https://issues.apache.org/jira/browse/LUCENENET-404
Project: Lucene.Net
Issue Type: Sub-task
Components: Project Infrastructure
Reporter: Troy Howard
Assignee: Troy Howard
Priority: Minor
Labels: branding, logo

The existing Lucene.Net logo leaves a lot to be desired. We'd like a new logo that is modern and well designed. To implement this, Troy is coordinating with StackOverflow/StackExchange to manage a logo design contest, the results of which will be our new logo design.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [Lucene.Net] Is a Lucene.Net Line-by-Line Java port needed?
@Rory, JIRA in the past had the ability to create sub-tasks or convert a current task to a sub-task, so I'm guessing that Apache's JIRA should be able to do that.

@All, I've added a Paths class under the Lucene.Net.Tests Util folder (feel free to rename or refactor it as long as the tests still work) to help with working with paths in general. It should help with forming paths relative to the temp directory or the project root. NUnit's shadow copy tends to throw people off when trying to get the path of the current assembly being tested in order to build up a relative path; the Paths class should help with that.

- Michael

On Fri, Jul 1, 2011 at 4:09 PM, Rory Plaire codekai...@gmail.com wrote: My thinking is just a separate ticket for each one. This makes the work easier to manage, gives a better sense of how much work is left, and makes it easier to prioritize independent issues. We could link all the sub-issues to a single task / feature / whatever (that is, if JIRA has that capability). -r
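The path-helper idea discussed in this thread (the Paths class under Lucene.Net.Tests) is essentially about having one place that knows where the temp directory lives, so individual tests do not repeat the Path.Combine boilerplate. The actual class is C# and its API is not shown in the thread; this is a rough Java sketch of the concept only, with all names illustrative:

```java
import java.io.File;

// Rough sketch of a test-path helper: one class that knows where the
// temp directory is, so tests just ask for a file name under it.
public class TestPaths {
    // Falls back to the JVM's temp dir when no override is configured
    // (the real Lucene.Net helper would read Support.AppSettings instead).
    private static final String TEMP_ROOT =
            System.getProperty("lucene.test.tempdir",
                               System.getProperty("java.io.tmpdir"));

    public static File tempFile(String name) {
        return new File(TEMP_ROOT, name);
    }

    public static void main(String[] args) {
        // Instead of repeating the combine logic in a hundred places:
        File testIndex = tempFile("testIndex");
        System.out.println(testIndex.getPath());
    }
}
```

Centralizing this also gives one natural home for the infrastructure checks mentioned above (does the directory exist, is it writable), so file-system failures can be reported once instead of surfacing as a ton of unrelated test failures.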
[Lucene.Net] [jira] [Work started] (LUCENENET-404) Improve brand logo design
[ https://issues.apache.org/jira/browse/LUCENENET-404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on LUCENENET-404 started by Troy Howard.

Improve brand logo design
-------------------------
Key: LUCENENET-404
URL: https://issues.apache.org/jira/browse/LUCENENET-404
Project: Lucene.Net
Issue Type: Sub-task
Components: Project Infrastructure
Reporter: Troy Howard
Assignee: Troy Howard
Priority: Minor
Labels: branding, logo
Attachments: lucene-alternates.jpg, lucene-medium.png, lucene-net-logo-display.jpg

The existing Lucene.Net logo leaves a lot to be desired. We'd like a new logo that is modern and well designed. To implement this, Troy is coordinating with StackOverflow/StackExchange to manage a logo design contest, the results of which will be our new logo design.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[VOTE] Release PyLucene 3.3.0
The PyLucene 3.3.0-1 release, closely tracking the recent release of Lucene Java 3.3, is ready.

A release candidate is available from:
http://people.apache.org/~vajda/staging_area/

A list of changes in this release can be seen at:
http://svn.apache.org/repos/asf/lucene/pylucene/branches/pylucene_3_3/CHANGES

PyLucene 3.3.0 is built with JCC 2.9, included in these release artifacts.

A list of Lucene Java changes can be seen at:
http://svn.apache.org/repos/asf/lucene/dev/tags/lucene_solr_3_3/lucene/CHANGES.txt

Please vote to release these artifacts as PyLucene 3.3.0-1.

Thanks!

Andi..

ps: the KEYS file for PyLucene release signing is at:
http://svn.apache.org/repos/asf/lucene/pylucene/dist/KEYS
http://people.apache.org/~vajda/staging_area/KEYS

pps: here is my +1
Re: calling Python from Java fails...
Andi Vajda va...@apache.org wrote: By the way, you might want to add a paragraph in that section about adding the [-framework, Python] flags for building JCC on OS X. I tripped over that again. If you send a paragraph to this effect, I'll integrate it into the docs.

Here's a patch:

/local/pylucene 31 % svn diff
Index: site/src/documentation/content/xdocs/jcc/documentation/install.xml
===================================================================
--- site/src/documentation/content/xdocs/jcc/documentation/install.xml (revision 1141969)
+++ site/src/documentation/content/xdocs/jcc/documentation/install.xml (working copy)
@@ -138,7 +138,7 @@
       Is JCC built with shared mode support or not ?
       <ul>
         <li>
-          By default, on Mac OS X and Windows this is the case
+          By default, on Mac OS X and Windows this is the case.
         </li>
         <li>
           By default, on Linux, this is the case.
@@ -167,6 +167,12 @@
         and <code>LFLAGS</code> for <code>darwin</code> should be correct
         and ready to use.
       </p>
+      <p>
+        However, if you intend to use the 'system' Python from a Java VM
+        on Mac OS X -- Python embedded in Java --
+        you will need to add the flags <code>-framework Python</code>
+        to the <code>LFLAGS</code> value.
+      </p>
     </section>
     <section id="linux">
       <title>Notes for Linux</title>
Index: site/src/documentation/content/xdocs/jcc/documentation/readme.xml
===================================================================
--- site/src/documentation/content/xdocs/jcc/documentation/readme.xml (revision 1141969)
+++ site/src/documentation/content/xdocs/jcc/documentation/readme.xml (working copy)
@@ -763,9 +763,12 @@
     </p>
     <ul>
       <li>
-        JCC must be built in shared mode.
-        See <a href="site:jcc/documentation/install">installation
-        instructions</a> for more information about shared mode.
+        JCC must be built in shared mode. See
+        <a href="site:jcc/documentation/install">installation
+        instructions</a> for more information about shared mode.
+        Note that for this use on Mac OS X, JCC must also be built
+        with the link flags <code>-framework Python</code> in
+        the <code>LFLAGS</code> value.
       </li>
       <li>
         As described in the previous section, define one or more Java
@@ -790,14 +793,19 @@
         via the <code>-Djava.library.path</code> command line parameter.
       </li>
       <li>
-        In the Java VM's main thread, initialize the Python VM by calling
-        its static <code>start()</code> method passing it a Python program
-        name string and optional start-up arguments in a string array that
-        will be made accessible in Python via <code>sys.argv</code>.
-        This method returns the singleton PythonVM instance to be used in
-        this Java VM. This instance may be retrieved at any later time via
-        the static <code>get()</code> method defined on the
-        <code>org.apache.jcc.PythonVM</code> class.
+        In the Java VM's main thread, initialize the Python VM by
+        calling its static <code>start()</code> method passing it a
+        Python program name string and optional start-up arguments
+        in a string array that will be made accessible in Python via
+        <code>sys.argv</code>. Note that the program name string is
+        purely informational, and is not used by the
+        <code>start()</code> code other than to initialize that
+        Python variable. This method returns the singleton PythonVM
+        instance to be used in this Java VM. <code>start()</code>
+        may be called multiple times; it will always return the same
+        singleton instance. This instance may also be retrieved at any
+        later time via the static <code>get()</code> method defined
+        on the <code>org.apache.jcc.PythonVM</code> class.
       </li>
       <li>
         Any Java VM thread that is going to be calling into the Python VM
Re: calling Python from Java fails...
Andi Vajda va...@apache.org wrote: That being said, if you send in javadoc patches, I agree, the results should be published like they are on the lucene/java site (under resources) and I can take care of that.

Here's a patch (against the JCC branch_3x):

--- java/org/apache/jcc/PythonVM.java (revision 1141989)
+++ java/org/apache/jcc/PythonVM.java (working copy)
@@ -23,6 +23,18 @@
         System.loadLibrary("jcc");
     }

+    /**
+     * Start the embedded Python interpreter. The specified
+     * programName and args are set into the Python variable sys.argv.
+     * This returns an instance of the Python VM; it may be called
+     * multiple times, and will return the same VM instance each time.
+     *
+     * @param programName the name of the Python program, typically
+     * /usr/bin/python. This is informational; the program is not
+     * actually executed.
+     * @param args additional arguments to be put into sys.argv.
+     * @return a singleton instance of PythonVM
+     */
     static public PythonVM start(String programName, String[] args)
     {
         if (vm == null)
@@ -34,11 +46,28 @@
         return vm;
     }

+    /**
+     * Start the embedded Python interpreter. The specified
+     * programName is set into the Python variable sys.argv[0].
+     * This returns an instance of the Python VM; it may be called
+     * multiple times, and will return the same VM instance each time.
+     *
+     * @param programName the name of the Python program, typically
+     * /usr/bin/python. This is informational; the program is not
+     * actually executed.
+     * @return a singleton instance of PythonVM
+     */
     static public PythonVM start(String programName)
     {
         return start(programName, null);
     }

+    /**
+     * Obtain the PythonVM instance, or null if the Python VM
+     * has not yet been started.
+     *
+     * @return a singleton instance of PythonVM, or null
+     */
     static public PythonVM get()
     {
         return vm;
@@ -50,9 +79,33 @@

     protected native void init(String programName, String[] args);

+    /**
+     * Instantiate the specified Python class, and return the instance.
+     *
+     * @param moduleName the Python module the class is defined in
+     * @param className the Python class to instantiate.
+     * @return a handle on the Python instance.
+     */
     public native Object instantiate(String moduleName, String className)
         throws PythonException;

+    /**
+     * Bump the Python thread state counter. Every thread should
+     * do this before calling into Python, to prevent the Python
+     * thread state from being inadvertently collected.
+     *
+     * @return the Python thread state counter. A return value less
+     * than zero signals an error.
+     */
     public native int acquireThreadState();
+
+    /**
+     * Release the Python thread state counter. Every thread that has
+     * called acquireThreadState() should call this before
+     * terminating.
+     *
+     * @return the Python thread state counter. A return value less
+     * than zero signals an error.
+     */
     public native int releaseThreadState();
 }
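The start()/get() contract documented in this patch (idempotent start, same singleton returned on every call, get() returning null before start) is a plain lazy-singleton lifecycle. Since PythonVM itself cannot run without the jcc native library, here is a stand-in sketch that mimics only that lifecycle; the class name is illustrative and is not part of JCC:

```java
// Stand-in mimicking org.apache.jcc.PythonVM's lifecycle: start() is
// idempotent, and get() returns null until start() has been called.
public class FakePythonVM {
    private static FakePythonVM vm;
    private final String programName; // informational only, never executed

    private FakePythonVM(String programName) {
        this.programName = programName;
    }

    public static synchronized FakePythonVM start(String programName) {
        if (vm == null)
            vm = new FakePythonVM(programName);
        return vm; // later calls return the same singleton
    }

    public static FakePythonVM get() {
        return vm; // null if start() was never called
    }

    public String getProgramName() {
        return programName;
    }
}
```

Note that, as with the real PythonVM, arguments passed to a second start() call are silently ignored because the singleton already exists; callers who need different sys.argv values must arrange them on the first call.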