date:20110911


[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102271#comment-13102271
 ] 

Robert Muir commented on LUCENE-3426:
-

Hi Koji, I wonder if instead it would be cleaner as a subclass of PhraseQuery 
(NGramPhraseQuery or similar),
that rewrites to the (possibly optimized) PhraseQuery in rewrite(). For 
example, it would build an optimized 
PhraseQuery when slop = 0 and there are enough terms to optimize; otherwise it 
would build a normal PhraseQuery.

Then the optimization would be easy to apply: the user just uses 
NGramPhraseQuery instead of PhraseQuery.
For example, from QueryParser:
{noformat}
  @Override
  protected PhraseQuery newPhraseQuery() {
    return new NGramPhraseQuery();
  }
{noformat}


 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized to PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)
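
A minimal Java sketch of the position pruning described above, assuming the n-gram terms sit at consecutive positions 0..k-1. It keeps every n-th position plus the last one, and leaves the phrase untouched when slop is non-zero or there are too few terms. The class and method names are hypothetical and are not taken from the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only; not the code from LUCENE-3426.patch. */
final class NGramPositionSketch {

    /**
     * Returns the term positions a slop-0 n-gram phrase still needs.
     * termCount is the number of consecutive n-gram terms (positions 0..termCount-1),
     * n is the gram size. When pruning does not apply, every position is returned.
     */
    static List<Integer> positionsToKeep(int termCount, int n, int slop) {
        boolean canPrune = slop == 0 && n >= 2 && termCount >= 3;
        List<Integer> keep = new ArrayList<>();
        int last = termCount - 1;
        for (int pos = 0; pos < termCount; pos++) {
            // Grams at multiples of n tile the text without overlap; the last
            // gram covers whatever tail the tiling leaves out.
            if (!canPrune || pos % n == 0 || pos == last) {
                keep.add(pos);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        // q=ABCD with 2-grams gives AB(0) BC(1) CD(2); BC is redundant.
        System.out.println(positionsToKeep(3, 2, 0)); // prints [0, 2]
    }
}
```

For q=ABCDE with 2-grams (AB BC CD DE) the same rule keeps positions 0, 2 and 3, so the final gram still pins down the end of the string.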

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Resolved] (LUCENE-3423) add Terms.docCount


 [ 
https://issues.apache.org/jira/browse/LUCENE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3423.
-

Resolution: Fixed
  Assignee: Robert Muir

 add Terms.docCount
 --

 Key: LUCENE-3423
 URL: https://issues.apache.org/jira/browse/LUCENE-3423
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3423.patch


 spinoff from LUCENE-3290, where yonik mentioned:
 {noformat}
 Is there currently a way to get the number of documents that have a value in 
 the field?
 Then one could compute the average length of a (sparse) field via 
 sumTotalTermFreq(field)/docsWithField(field)
 docsWithField(field) would be useful in other contexts that want to know how 
 sparse a field is (automatically selecting faceting algorithms, etc).
 {noformat}
 I think this is a useful stat to add, in case you have sparse fields for 
 heuristics or scoring.
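
As a rough illustration of the statistic discussed above, the following Java sketch computes the average length of a sparse field as sumTotalTermFreq divided by the number of documents that have the field. The class and method names are hypothetical; the two inputs are assumed to come from whatever the index exposes for the field.

```java
/** Illustrative only: derives sparse-field statistics from raw index counts. */
final class SparseFieldStatsSketch {

    /**
     * sumTotalTermFreq: total number of term occurrences in the field.
     * docCount: number of documents with at least one term in the field.
     * Returns 0 when no document has the field.
     */
    static double averageFieldLength(long sumTotalTermFreq, long docCount) {
        if (docCount <= 0) {
            return 0.0;
        }
        return (double) sumTotalTermFreq / docCount;
    }

    /** Fraction of all documents (maxDoc) that have a value in the field. */
    static double fieldDensity(long docCount, long maxDoc) {
        if (maxDoc <= 0) {
            return 0.0;
        }
        return (double) docCount / maxDoc;
    }
}
```

Dividing by maxDoc instead of docCount would underestimate the average length whenever most documents lack the field, which is the case this statistic is meant to handle.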


heads up: reindex trunk indexes

I just committed https://issues.apache.org/jira/browse/LUCENE-3423

If you are using trunk, you should reindex.

-- 
lucidimagination.com


[jira] [Created] (LUCENE-3427) add function queries for all index statistics

add function queries for all index statistics
-

 Key: LUCENE-3427
 URL: https://issues.apache.org/jira/browse/LUCENE-3427
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir


I think we have most of them, but at least the following are missing:
* getDocCount (# of documents that contain a value for a field)
* sumDocFreq (# of postings for a field)

not sure if there are others that don't have function queries.
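
To make the two missing statistics concrete, here is a small Java sketch that computes them over a toy in-memory field. It assumes each document is represented as a list of terms for that field, with an empty list meaning the document has no value. The class is hypothetical and does not use the Lucene API.

```java
import java.util.HashSet;
import java.util.List;

/** Illustrative only: per-field statistics over a toy collection. */
final class FieldStatsSketch {
    final int docCount;    // documents that contain a value for the field
    final long sumDocFreq; // number of postings, i.e. (term, document) pairs

    private FieldStatsSketch(int docCount, long sumDocFreq) {
        this.docCount = docCount;
        this.sumDocFreq = sumDocFreq;
    }

    /** docs.get(i) holds the field's terms for document i. */
    static FieldStatsSketch compute(List<List<String>> docs) {
        int docCount = 0;
        long sumDocFreq = 0;
        for (List<String> terms : docs) {
            if (terms.isEmpty()) {
                continue; // document has no value in this field
            }
            docCount++;
            // Each distinct term in a document contributes exactly one posting.
            sumDocFreq += new HashSet<>(terms).size();
        }
        return new FieldStatsSketch(docCount, sumDocFreq);
    }
}
```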




Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

+1, thanks for creating this release candidate.

On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Please vote to release the RC1 artifacts at:

  https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

 as Lucene 3.4.0 and Solr 3.4.0.

 Mike McCandless

 http://blog.mikemccandless.com






-- 
lucidimagination.com


[jira] [Updated] (SOLR-2752) leader-per-shard


 [ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2752:
--

Attachment: SOLR-2752.patch

new patch - much stronger test, a couple fixes, refactor most of the leader 
election code into its own class.

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.


[jira] [Commented] (SOLR-2752) leader-per-shard


[ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102283#comment-13102283
 ] 

Mark Miller commented on SOLR-2752:
---

Just a quick correction to first comment - cores create an ephemeral|sequential 
node - not just ephemeral.
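
The usual election scheme built on ephemeral sequential nodes can be sketched in Java as follows: every candidate creates a node, ZooKeeper appends a 10-digit sequence number to its name, and the candidate holding the lowest number is the leader. The node names below are made up for illustration, and the sketch works on a plain list of child names rather than a live ZooKeeper connection; it is not the code in the attached patch.

```java
import java.util.List;

/** Illustrative only: picks a leader from ephemeral sequential node names. */
final class LeaderElectionSketch {

    /** Parses the zero-padded 10-digit suffix that sequential nodes carry. */
    static int sequenceOf(String nodeName) {
        return Integer.parseInt(nodeName.substring(nodeName.length() - 10));
    }

    /** Returns the child with the lowest sequence number, or null if there are none. */
    static String electLeader(List<String> children) {
        String leader = null;
        for (String child : children) {
            if (leader == null || sequenceOf(child) < sequenceOf(leader)) {
                leader = child;
            }
        }
        return leader;
    }
}
```

Because the nodes are ephemeral, a leader whose session ends drops out of the child list, and running the same selection again yields the next candidate in sequence order.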

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.


[jira] [Created] (SOLR-2754) create Solr similarity factories for new ranking algorithms

create Solr similarity factories for new ranking algorithms
---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir


To make it easy to use some of the new ranking algorithms, we should add 
factories to solr:
* for parametric models like LM and BM25 so that parameters can be set from 
schema.xml
* for framework models like IFR and IB, so that different basic 
models/normalizations/lambdas can be chosen
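
As a sketch of what a factory for a parametric model might do, the Java below reads BM25's k1 and b from string key/value pairs (standing in for attributes configured in schema.xml) and falls back to the commonly used defaults of 1.2 and 0.75. The class name, the parameter keys and the validation rules are assumptions made for illustration; this is not an existing Solr or Lucene class.

```java
import java.util.Map;

/** Hypothetical holder for BM25 parameters read from configuration. */
final class Bm25ParamsSketch {
    static final float DEFAULT_K1 = 1.2f;
    static final float DEFAULT_B = 0.75f;

    final float k1; // term frequency saturation
    final float b;  // strength of length normalization, between 0 and 1

    private Bm25ParamsSketch(float k1, float b) {
        this.k1 = k1;
        this.b = b;
    }

    /** Builds parameters from string pairs, applying defaults for missing keys. */
    static Bm25ParamsSketch fromParams(Map<String, String> params) {
        float k1 = readFloat(params, "k1", DEFAULT_K1);
        float b = readFloat(params, "b", DEFAULT_B);
        if (k1 < 0f) {
            throw new IllegalArgumentException("k1 must be >= 0, got " + k1);
        }
        if (b < 0f || b > 1f) {
            throw new IllegalArgumentException("b must be in [0, 1], got " + b);
        }
        return new Bm25ParamsSketch(k1, b);
    }

    private static float readFloat(Map<String, String> params, String key, float fallback) {
        String raw = params.get(key);
        return raw == null ? fallback : Float.parseFloat(raw.trim());
    }
}
```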


[jira] [Updated] (SOLR-2754) create Solr similarity factories for new ranking algorithms


 [ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2754:
--

Description: 
To make it easy to use some of the new ranking algorithms, we should add 
factories to solr:
* for parametric models like LM and BM25 so that parameters can be set from 
schema.xml
* for framework models like DFR and IB, so that different basic 
models/normalizations/lambdas can be chosen

  was:
To make it easy to use some of the new ranking algorithms, we should add 
factories to solr:
* for parametric models like LM and BM25 so that parameters can be set from 
schema.xml
* for framework models like IFR and IB, so that different basic 
models/normalizations/lambdas can be chosen


 create Solr similarity factories for new ranking algorithms
 ---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir

 To make it easy to use some of the new ranking algorithms, we should add 
 factories to solr:
 * for parametric models like LM and BM25 so that parameters can be set from 
 schema.xml
 * for framework models like DFR and IB, so that different basic 
 models/normalizations/lambdas can be chosen


[jira] [Assigned] (SOLR-2754) create Solr similarity factories for new ranking algorithms


 [ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned SOLR-2754:
-

Assignee: Robert Muir

 create Solr similarity factories for new ranking algorithms
 ---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir

 To make it easy to use some of the new ranking algorithms, we should add 
 factories to solr:
 * for parametric models like LM and BM25 so that parameters can be set from 
 schema.xml
 * for framework models like DFR and IB, so that different basic 
 models/normalizations/lambdas can be chosen


[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10500 - Failure

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10500/

1 tests failed.
FAILED:  TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml.init

Error Message:


Stack Trace:
Test report file 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml
 was length 0



Build Log (for compile errors):
[...truncated 1243 lines...]




[jira] [Created] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads

trunk tests hang/deadlock TestIndexWriterWithThreads


 Key: LUCENE-3428
 URL: https://issues.apache.org/jira/browse/LUCENE-3428
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


trunk tests have been hanging often lately in hudson, this time i was careful 
to kill and get a good stacktrace:


[jira] [Commented] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads


[ 
https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102296#comment-13102296
 ] 

Robert Muir commented on LUCENE-3428:
-

https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-Solr-tests-only-trunk/10500

{noformat}
[junit] 2011-09-11 16:32:39
[junit] Full thread dump OpenJDK 64-Bit Server VM (20.0-b11 mixed mode):
[junit] 
[junit] Low Memory Detector daemon prio=5 tid=0x000801eee800 
nid=0x19642 runnable [0x]
[junit]java.lang.Thread.State: RUNNABLE
[junit] 
[junit] C2 CompilerThread1 daemon prio=5 tid=0x000801eef000 
nid=0x19640 waiting on condition [0x]
[junit]java.lang.Thread.State: RUNNABLE
[junit] 
[junit] C2 CompilerThread0 daemon prio=5 tid=0x000801ef 
nid=0x1963d waiting on condition [0x]
[junit]java.lang.Thread.State: RUNNABLE
[junit] 
[junit] Signal Dispatcher daemon prio=5 tid=0x000801ef0800 
nid=0x19630 waiting on condition [0x]
[junit]java.lang.Thread.State: RUNNABLE
[junit] 
[junit] Finalizer daemon prio=5 tid=0x000801ef1800 nid=0x19581 in 
Object.wait() [0x7ebee000]
[junit]java.lang.Thread.State: WAITING (on object monitor)
[junit] at java.lang.Object.wait(Native Method)
[junit] - waiting on 0x000828cb0370 (a 
java.lang.ref.ReferenceQueue$Lock)
[junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
[junit] - locked 0x000828cb0370 (a 
java.lang.ref.ReferenceQueue$Lock)
[junit] at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
[junit] at 
java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
[junit] 
[junit] Reference Handler daemon prio=5 tid=0x000801ef3000 
nid=0x1957f in Object.wait() [0x7ecef000]
[junit]java.lang.Thread.State: WAITING (on object monitor)
[junit] at java.lang.Object.wait(Native Method)
[junit] - waiting on 0x000828cb0410 (a 
java.lang.ref.Reference$Lock)
[junit] at java.lang.Object.wait(Object.java:502)
[junit] at 
java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
[junit] - locked 0x000828cb0410 (a java.lang.ref.Reference$Lock)
[junit] 
[junit] main prio=5 tid=0x000801ef3800 nid=0x19432 waiting on 
condition [0x7fbfd000]
[junit]java.lang.Thread.State: WAITING (parking)
[junit] at sun.misc.Unsafe.park(Native Method)
[junit] - parking to wait for  0x000827a440c0 (a 
java.util.concurrent.locks.ReentrantLock$NonfairSync)
[junit] at 
java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
[junit] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
[junit] at 
java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
[junit] at 
java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
[junit] at 
java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
[junit] at 
org.apache.lucene.index.DocumentsWriterFlushControl.assertActiveDeleteQueue(DocumentsWriterFlushControl.java:435)
[junit] at 
org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:428)
[junit] at 
org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:557)
[junit] - locked 0x000827a417c0 (a 
org.apache.lucene.index.DocumentsWriter)
[junit] at 
org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2973)
[junit] - locked 0x000827a3d738 (a java.lang.Object)
[junit] at 
org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2950)
[junit] at 
org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1133)
[junit] at 
org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1097)
[junit] at 
org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads(TestIndexWriterWithThreads.java:200)
[junit] at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit] at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[junit] at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit] at java.lang.reflect.Method.invoke(Method.java:616)
[junit] at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit] at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit] at

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 10522 - Failure

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/10522/

No tests ran.

Build Log (for compile errors):
[...truncated 142 lines...]




[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 424 - Failure

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/424/

No tests ran.

Build Log (for compile errors):
[...truncated 100 lines...]




Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 10500 - Failure

I killed this due to a hang/deadlock issue:
https://issues.apache.org/jira/browse/LUCENE-3428

On Sun, Sep 11, 2011 at 12:35 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10500/

 1 tests failed.
 FAILED:  TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml.init

 Error Message:


 Stack Trace:
 Test report file 
 /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml
  was length 0



 Build Log (for compile errors):
 [...truncated 1243 lines...]








-- 
lucidimagination.com


Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 10522 - Failure

collateral damage from
https://issues.apache.org/jira/browse/LUCENE-3428, i was just killing
java processes.

On Sun, Sep 11, 2011 at 12:36 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/10522/

 No tests ran.

 Build Log (for compile errors):
 [...truncated 142 lines...]








-- 
lucidimagination.com


[jira] [Created] (LUCENE-3429) improve build system when tests hang

improve build system when tests hang


 Key: LUCENE-3429
 URL: https://issues.apache.org/jira/browse/LUCENE-3429
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.5, 4.0


Currently, if tests hang in hudson, the build can stay hung for days until we manually 
kill it.

The problem is that when a hang happens it's probably serious. What we want to 
do (I think) is:
# time out the build.
# ensure we have enough debugging information to hopefully fix any hang.

So I think the ideal solution would be:
# add a sysprop -D that LuceneTestCase respects, it could default to no 
timeout at all (some value like zero).
# when a timeout is set, LuceneTestCase spawns an additional timer thread for 
the test class? method?
# if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
information, random seed information to hopefully reproduce the hang, and fails 
the test.
# nightly builds would pass some reasonable -D for each test.

separately, I think we should have an ant-level timeout for the whole build, 
in case it goes completely crazy (e.g. jvm completely hangs or something else), 
just as an additional safety.
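
The per-test timeout idea above could look roughly like this in Java. The sketch assumes a hypothetical system property name (tests.timeoutMillis) and a hypothetical helper class; it is not LuceneTestCase code. A value of zero means no timeout, and on timeout the failure message carries the random seed and a dump of all thread stacks.

```java
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Illustrative only: fails a hung test with enough information to debug it. */
final class TestTimeoutSketch {
    /** Hypothetical property name; unset or zero disables the timeout. */
    static final String TIMEOUT_PROP = "tests.timeoutMillis";

    static long configuredTimeoutMillis() {
        return Long.getLong(TIMEOUT_PROP, 0L);
    }

    /** Renders name, state and stack of every live thread. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Runs the test body; throws AssertionError with seed and thread dump on timeout. */
    static void runWithTimeout(Runnable test, long timeoutMillis, long seed) {
        if (timeoutMillis <= 0) {
            test.run(); // no timeout configured
            return;
        }
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-under-timeout");
            t.setDaemon(true); // a hung test must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(test);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            String dump = dumpAllThreads(); // capture stacks before interrupting anything
            future.cancel(true);
            throw new AssertionError("test timed out after " + timeoutMillis
                    + " ms, seed=" + seed + "\n" + dump);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new AssertionError("interrupted while waiting for test", e);
        } catch (ExecutionException e) {
            throw new AssertionError("test failed", e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```

A nightly build would then pass something like -Dtests.timeoutMillis=3600000, while local runs keep the default of no timeout.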


[jira] [Commented] (LUCENE-3429) improve build system when tests hang

[
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102301#comment-13102301
]

Robert Muir commented on LUCENE-3429:
-

I'm gonna play with the ant junit task timeout first, just to see if we can do
anything with it as a quick hack.

I suspect the problem will be that we won't get enough debugging information
via this mechanism (random seed, stacktraces).

improve build system when tests hang

Key: LUCENE-3429
URL: https://issues.apache.org/jira/browse/LUCENE-3429
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir
Fix For: 3.5, 4.0

Currently, if tests hang in hudson, the build can stay hung for days until we manually
kill it.
The problem is that when a hang happens it's probably serious. What we want to
do (I think) is:
# time out the build.
# ensure we have enough debugging information to hopefully fix any hang.
So I think the ideal solution would be:
# add a sysprop -D that LuceneTestCase respects, it could default to no
timeout at all (some value like zero).
# when a timeout is set, LuceneTestCase spawns an additional timer thread for
the test class? method?
# if the timeout is exceeded, LuceneTestCase dumps all thread/stack
information, random seed information to hopefully reproduce the hang, and
fails the test.
# nightly builds would pass some reasonable -D for each test.
separately, I think we should have an ant-level timeout for the whole
build, in case it goes completely crazy (e.g. jvm completely hangs or
something else), just as an additional safety.


[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search


[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102307#comment-13102307
 ] 

Martijn van Groningen edited comment on SOLR-2066 at 9/11/11 5:15 PM:
--

Jasper, does the exception occur for the same queries? I did add a test for 
this. Can you run the TestDistributedSearch test?

  was (Author: martijn.v.groningen):
Jasper, does the exception occur occur for the same queries? I did add a 
test for this. Can you run the TestDistributedSearch test?
  
 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.


[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search


[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102307#comment-13102307
 ] 

Martijn van Groningen commented on SOLR-2066:
-

Jasper, does the exception occur for the same queries? I did add a test 
for this. Can you run the TestDistributedSearch test?

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.


[jira] [Updated] (LUCENE-3429) improve build system when tests hang

[
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Robert Muir updated LUCENE-3429:

Attachment: LUCENE-3429.patch

here is a hack patch that sets a timeout of 1 hour to any test batch (e.g.
test-core) by default, unless you are running Test2BTerms (10 hours).

I tested this; the issue is you get no debugging information at all... but it's
at least a small start.

improve build system when tests hang

Key: LUCENE-3429
URL: https://issues.apache.org/jira/browse/LUCENE-3429
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir
Fix For: 3.5, 4.0

Attachments: LUCENE-3429.patch


[jira] [Updated] (SOLR-2752) leader-per-shard


 [ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2752:
--

Attachment: SOLR-2752.patch

Another new patch:

I moved SolrZooKeeper to the org.apache.zookeeper package so that I could add a 
simulated timeout method for tests.

I also wrote a new test that starts up a bunch of replicas and then times out 
the leader. After waiting for the leader to reconnect, all of the other 
replicas are killed and I check that the first leader is again the leader. I 
wrote this test because I knew it would fail and that on reconnecting, clients 
don't jump back into the leader election process.

So I also added to the client reconnection impl - on reconnect, all SolrCores 
are re-registered. This also has the advantage that any SolrCores that were 
created while the connection was down are put into play. That allows the new 
test to pass.
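
The reconnect behaviour described above can be illustrated with a small in-memory Java model. It assumes election entries disappear when the session times out (as ephemeral nodes do) and that reconnecting re-registers every core known locally, including cores created while disconnected. All names are hypothetical and nothing here talks to ZooKeeper.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Illustrative only: re-registering cores after a lost session. */
final class ReconnectSketch {
    private final Set<String> localCores = new LinkedHashSet<>();  // cores living in this process
    private final List<String> electionEntries = new ArrayList<>(); // stand-in for election nodes
    private boolean connected = true;

    void createCore(String name) {
        localCores.add(name);
        if (connected) {
            electionEntries.add(name); // joins the election right away
        }
    }

    void onSessionTimeout() {
        connected = false;
        electionEntries.clear(); // ephemeral entries vanish with the session
    }

    void onReconnect() {
        connected = true;
        // Re-register everything, so cores created while offline also take part.
        electionEntries.addAll(localCores);
    }

    List<String> candidates() {
        return new ArrayList<>(electionEntries);
    }
}
```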

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch, SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.


Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Sanne Grinovero

+1
all tests on all Lucene-using projects I contribute to pass without
any change needed (a sure sign I should add more...).

Once more, great work and thanks so much to everyone involved.

Sanne

On 11 September 2011 16:11, Robert Muir rcm...@gmail.com wrote:
 +1, thanks for creating this release candidate.

 On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Please vote to release the RC1 artifacts at:

  https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

 as Lucene 3.4.0 and Solr 3.4.0.

 Mike McCandless

 http://blog.mikemccandless.com






 --
 lucidimagination.com





[jira] [Assigned] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads

2011-09-11 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3428:
---

Assignee: Simon Willnauer

 trunk tests hang/deadlock TestIndexWriterWithThreads
 

 Key: LUCENE-3428
 URL: https://issues.apache.org/jira/browse/LUCENE-3428
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Simon Willnauer
 Attachments: LUCENE-3428.patch


 trunk tests have been hanging often lately in hudson, this time i was careful 
 to kill and get a good stacktrace:


[jira] [Updated] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads

2011-09-11 Thread Simon Willnauer (JIRA)


 [ 
https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3428:


Attachment: LUCENE-3428.patch

I think I found the reason, or at least one possible reason, for this. There is one place 
where we don't release a DWPT lock in the case of a failure. Here is a patch.
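
The general shape of this kind of fix is to release the lock in a finally block so that a failure cannot leave it held. The Java sketch below shows that pattern with a plain ReentrantLock; it is a generic illustration with hypothetical names, not the change in LUCENE-3428.patch.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Illustrative only: a lock that is released even when the guarded work fails. */
final class LockReleaseSketch {
    private final ReentrantLock lock = new ReentrantLock();

    /** Runs the action while holding the lock. */
    void updateUnderLock(Runnable action) {
        lock.lock();
        try {
            action.run();
        } finally {
            // Without this, an exception thrown by the action would leave the
            // lock held forever and block every later caller.
            lock.unlock();
        }
    }

    boolean isLocked() {
        return lock.isLocked();
    }
}
```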

 trunk tests hang/deadlock TestIndexWriterWithThreads
 

 Key: LUCENE-3428
 URL: https://issues.apache.org/jira/browse/LUCENE-3428
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Simon Willnauer
 Attachments: LUCENE-3428.patch


 trunk tests have been hanging often lately in hudson, this time i was careful 
 to kill and get a good stacktrace:


[JENKINS] Lucene-trunk - Build # 1673 - Still Failing

Build: https://builds.apache.org/job/Lucene-trunk/1673/

1 tests failed.
FAILED:  org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:50)
at org.apache.lucene.search.Scorer.score(Scorer.java:60)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:419)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:376)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:296)
at 
org.apache.lucene.queryparser.xml.TestParser.dumpResults(TestParser.java:216)
at 
org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML(TestParser.java:157)




Build Log (for compile errors):
[...truncated 16136 lines...]




Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Andi Vajda

I prepared a PyLucene 3.4 release candidate from the Lucene 3.4 branch.
All tests pass.

+1 to release Lucene Solr 3.4.

Andi..

On Sep 9, 2011, at 9:06, Michael McCandless luc...@mikemccandless.com wrote:

 Please vote to release the RC1 artifacts at:
 
  
 https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
 
 as Lucene 3.4.0 and Solr 3.4.0.
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 
 


[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #240: POMs out of sync

Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/240/

No tests ran.

Build Log (for compile errors):
[...truncated 13149 lines...]




[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search


 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: LUCENE-3360.patch

Updated patch
* group.query works in distributed search
* group.main works in distributed search
* Many refactorings

I think the feature needs to be committed. Maybe besides some jdocs the patch 
is ready. I'll commit this feature in the coming days. In the mean time I will 
start working on making the patch work for the 3x branch.

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3360.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.


[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search


 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: (was: LUCENE-3360.patch)

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.


[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search

[
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102354#comment-13102354
]

Martijn van Groningen edited comment on SOLR-2066 at 9/11/11 9:33 PM:
--

Updated patch
* group.query works in distributed search
* group.main works in distributed search
* Many refactorings

I think the feature needs to be committed. Maybe besides some jdocs the patch
is ready. I'll commit this feature in the coming days. In the mean time I will
start working on the patch for the 3x branch.

was (Author: martijn.v.groningen):
Updated patch
* group.query works in distributed search
* group.main works in distributed search
* Many refactorings

I think the feature needs to be committed. Maybe besides some jdocs the patch
is ready. I'll commit this feature in the coming days. In the mean time I will
start working on making the patch work for the 3x branch.

Search Grouping: support distributed search
---

Key: SOLR-2066
URL: https://issues.apache.org/jira/browse/SOLR-2066
Project: Solr
Issue Type: Sub-task
Reporter: Yonik Seeley
Fix For: 3.5, 4.0

Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch,
SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch,
SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch

Support distributed field collapsing / search grouping.


[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search


 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: SOLR-2066.patch

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.


[jira] [Updated] (SOLR-2752) leader-per-shard


 [ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2752:
--

Attachment: SOLR-2752.patch

feeling motivated I guess - another patch with a bunch of polish

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch, SOLR-2752.patch, 
 SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.


RE: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Uwe Schindler

Hi,

+1

I checked the Lucene Core JAR file as a drop-in replacement for PANGAEA; it works
without any problem. I reindexed some documents, ran CheckIndex, optimized, and
ran CheckIndex again. All fine, no 1.6.0_24 crashes, everything is working as it
should. Code compiles fine, too. We are now running on this version with
Solaris and MMAP (as usual).

I had no time to verify the package contents and md5/sha1 hashes or try
Solr, but I think somebody might already have done this. I can verify that
the javadoc links to Oracle work again.

Changes look fine, one small thing: We have Java 7 try-with-resources
support now (our first Java 7 feature!!!), but the note is at wrong position
(under BUG FIXES):
LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
suppressed exceptions in the original exception, so stack trace will contain
them. (Uwe Schindler)
[should be NEW FEATURES] - But that's minor, just if we respin again, but I
don't expect this.
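
For readers unfamiliar with Java 7 suppressed exceptions, here is a minimal sketch of the idea behind the note above. It is not the IOUtils.closeSafely() code; the class and method names are hypothetical, and it only assumes the standard Throwable.addSuppressed() call: failures raised while closing are attached to the original exception instead of being lost.

```java
/** Hypothetical illustration; not Lucene's IOUtils. */
final class SuppressedCloseSketch {

  /**
   * Closes every resource. A failure during close() is attached to
   * {@code primary} as a suppressed exception; if there is no primary
   * exception, the first close() failure becomes the result.
   * Returns the exception the caller should rethrow, or null.
   */
  static Throwable closeAll(Throwable primary, AutoCloseable... resources) {
    Throwable result = primary;
    for (AutoCloseable resource : resources) {
      try {
        if (resource != null) {
          resource.close();
        }
      } catch (Exception e) {
        if (result == null) {
          result = e;               // nothing to attach to yet
        } else {
          result.addSuppressed(e);  // shows up in the stack trace of result
        }
      }
    }
    return result;
  }
}
```

Printing the returned exception with printStackTrace() then lists each close failure under a "Suppressed:" entry.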

Mike: Thanks for the great new release and sorry for the respin.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Friday, September 09, 2011 6:07 PM
 To: dev@lucene.apache.org Dev
 Subject: [VOTE] Release Lucene/Solr 3.4.0, RC1
 
 Please vote to release the RC1 artifacts at:
 

 https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
 
 as Lucene 3.4.0 and Solr 3.4.0.
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 

Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Erik Hatcher

+1 

Used this build in my classes today at NFJS Boston (sorry Mike - no time to say 
hi).  Solr worked just fine. 

   Erik

On Sep 9, 2011, at 12:06, Michael McCandless luc...@mikemccandless.com wrote:

 Please vote to release the RC1 artifacts at:
 
  
 https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
 
 as Lucene 3.4.0 and Solr 3.4.0.
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 

Re: svn commit: r1169564 - in /lucene/dev/branches/branch_3x: build.xml solr/common-build.xml

Thanks Steve!

Mike McCandless

http://blog.mikemccandless.com

On Sun, Sep 11, 2011 at 6:47 PM,  sar...@apache.org wrote:
 Author: sarowe
 Date: Sun Sep 11 22:47:33 2011
 New Revision: 1169564

 URL: http://svn.apache.org/viewvc?rev=1169564&view=rev
 Log:
 3.4 -> 3.5

 Modified:
    lucene/dev/branches/branch_3x/build.xml
    lucene/dev/branches/branch_3x/solr/common-build.xml

 Modified: lucene/dev/branches/branch_3x/build.xml
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/build.xml?rev=1169564&r1=1169563&r2=1169564&view=diff
 ==
 --- lucene/dev/branches/branch_3x/build.xml (original)
 +++ lucene/dev/branches/branch_3x/build.xml Sun Sep 11 22:47:33 2011
 @@ -45,7 +45,7 @@
      </sequential>
    </target>

 -  <property name="version" value="3.4-SNAPSHOT"/>
 +  <property name="version" value="3.5-SNAPSHOT"/>
    <target name="get-maven-poms"
            description="Copy Maven POMs from dev-tools/maven/ to their target locations">
      <copy todir="." overwrite="true">

 Modified: lucene/dev/branches/branch_3x/solr/common-build.xml
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/common-build.xml?rev=1169564&r1=1169563&r2=1169564&view=diff
 ==
 --- lucene/dev/branches/branch_3x/solr/common-build.xml (original)
 +++ lucene/dev/branches/branch_3x/solr/common-build.xml Sun Sep 11 22:47:33 
 2011
 @@ -72,7 +72,7 @@
         By default, this should be set to X.Y.M.${dateversion}
         where X.Y.M is the last version released (on this branch).
      -->
 -  <property name="solr.spec.version" value="3.4.0.${dateversion}" />
 +  <property name="solr.spec.version" value="3.5.0.${dateversion}" />

    <path id="solr.base.classpath">
         <pathelement path="${analyzers-common.jar}"/>





Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

On Sun, Sep 11, 2011 at 5:46 PM, Uwe Schindler u...@thetaphi.de wrote:

 Changes look fine, one small thing: We have Java 7 try-with-resources
 support now (our first Java 7 feature!!!), but the note is at wrong position
 (under BUG FIXES):
 LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
 suppressed exceptions in the original exception, so stack trace will contain
 them. (Uwe Schindler)
 [should be NEW FEATURES] - But that's minor, just if we respin again, but I
 don't expect this.

Woops, OK, if we respin (looks unlikely so far).  Can you fix on 3.x for 3.5?

 Mike: Thanks for the great new release and sorry for the respin.

No problem, it's really easy now: I have it down to a single Python
script!  I'll commit it to dev-tools...

Mike


Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

+1 to release.

I ran the release smoke tester, it was happy!

Mike McCandless

http://blog.mikemccandless.com

On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Please vote to release the RC1 artifacts at:

  https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

 as Lucene 3.4.0 and Solr 3.4.0.

 Mike McCandless

 http://blog.mikemccandless.com



[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-11 Thread Michael McCandless (JIRA)

[
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102370#comment-13102370
]

Michael McCandless commented on LUCENE-3429:

We could run a standalone tool that does a kill -QUIT if any java process is
taking more than X minutes?
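
To make the idea concrete, here is a rough in-process sketch (not the standalone tool suggested above, and not the attached patch). It assumes the hung code runs in the same JVM as the watchdog, and uses Thread.getAllStackTraces() to print roughly what kill -QUIT would show. All class and method names are hypothetical.

```java
import java.util.Map;

/** Hypothetical in-process watchdog; names are illustrative only. */
final class HangWatchdogSketch {

  /** Formats the stack traces of all live threads, similar in spirit to a kill -QUIT dump. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /**
   * Runs the task on a daemon thread. If it is still running after
   * timeoutMillis (which must be greater than zero), prints a thread dump
   * to stderr and returns false; otherwise returns true.
   */
  static boolean runWithWatchdog(Runnable task, long timeoutMillis) {
    Thread worker = new Thread(task, "watched-task");
    worker.setDaemon(true); // a hung task must not keep the JVM alive
    worker.start();
    try {
      worker.join(timeoutMillis);
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
    }
    if (worker.isAlive()) {
      System.err.println(dumpAllThreads());
      return false;
    }
    return true;
  }
}
```

Like kill -QUIT, this only shows where the threads are stuck; it does not report anything specific to the test run itself.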

improve build system when tests hang

Key: LUCENE-3429
URL: https://issues.apache.org/jira/browse/LUCENE-3429
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir
Fix For: 3.5, 4.0

Attachments: LUCENE-3429.patch


[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-09-11 Thread Michael McCandless (JIRA)


[ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102371#comment-13102371
 ] 

Michael McCandless commented on LUCENE-2959:


Thanks David and Robert!

What an incredible step forward: now you can easily try out all sorts of 
pre-existing scoring models, or make your own.  Yay :)

 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/query/scoring, general/javadocs, modules/examples
Reporter: David Mark Nemeskey
Assignee: Robert Muir
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: flexscoring branch, 4.0

 Attachments: LUCENE-2959.patch, LUCENE-2959.patch, 
 LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, 
 implementation_plan.pdf, proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state of the art algorithms, such as BM25. Moreover, the
 architecture is tailored specifically to VSM, which makes the addition of new
 ranking functions a non-trivial task.
 This project aims to bring state of the art ranking methods to Lucene and to
 implement a query architecture with pluggable ranking functions.
 The wiki page for the project can be found at 
 http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.


Re: Regarding Transaction logging

I agree: we should figure out just how an app would effectively make
use of this seq ID, in order to understand if this really is gonna
work end to end.  Else we shouldn't change Lucene's core APIs.

EG: could ES remove its lock array if Lucene returned a seq ID?  How
bad is it that ES/Solr/this-new-module would have to order their
transaction log according to Lucene's seq ID?  Or maybe it would not
re-order, but rather write the seqID+document in each entry; then on
playback (but also on RT get) it'd have to re-order?
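
A small sketch of the second option (write seqID+document in each entry, re-order on playback), just to have something concrete to poke at. It assumes Lucene would hand back a long seq ID per update, which it does not do today; the class, its fields, and the in-memory list are all hypothetical stand-ins for a real log.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical transaction log that records entries in arrival order. */
final class SeqIdLogSketch {

  static final class Entry {
    final long seqId;     // assumed to be returned by the index writer
    final String docId;
    final String payload;

    Entry(long seqId, String docId, String payload) {
      this.seqId = seqId;
      this.docId = docId;
      this.payload = payload;
    }
  }

  private final List<Entry> entries = new ArrayList<>();

  /** Appends without waiting for earlier seq IDs, so the log may be out of order. */
  synchronized void append(long seqId, String docId, String payload) {
    entries.add(new Entry(seqId, docId, payload));
  }

  /** Playback: sort a copy of the log by seq ID to recover the index's total order. */
  synchronized List<Entry> replayOrder() {
    List<Entry> sorted = new ArrayList<>(entries);
    sorted.sort(Comparator.comparingLong((Entry e) -> e.seqId));
    return sorted;
  }

  /** RT get: the payload with the highest seq ID for this doc, or null if absent. */
  synchronized String latest(String docId) {
    Entry best = null;
    for (Entry e : entries) {
      if (e.docId.equals(docId) && (best == null || e.seqId > best.seqId)) {
        best = e;
      }
    }
    return best == null ? null : best.payload;
  }
}
```

The cost shows up in latest(): every RT get has to compare seq IDs instead of trusting log order, which is exactly the trade-off in question.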

Mike McCandless

http://blog.mikemccandless.com

On Sat, Sep 10, 2011 at 1:45 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 On Thu, Sep 8, 2011 at 5:35 PM, Yonik Seeley yo...@lucidimagination.com 
 wrote:
 On Thu, Sep 8, 2011 at 11:26 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Returning a long seqID seems the least invasive change to make this
 total ordering possible?  Especially since the DWDQ already computes
 this order...

 +1
 This seems like the most powerful option.

 I still wonder how we make efficient use of this. If you are ordering
 the logs based on the returned sequence IDs you have to effectively
 delay writing to the log, since documents (i.e. their threads) come back
 async and out of order. Even worse, if some thread picks up a flush it
 might block for a reasonable amount of time. I am not saying it's
 impossible, but before we jump on it and get into the DWPT hassle we
 should at least sketch out how to make use of this feature (lemme tell
 you this is not trivial to implement and requires a fair bit of
 refactoring). If somebody has thought about this I'd be happy if you
 could share your ideas here!

 simon

 -Yonik
 http://www.lucene-eurocon.com - The Lucene/Solr User Conference


Re: 3.4.0 draft release notes

On Sat, Sep 10, 2011 at 10:21 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

 Will the fix/patch for issue SOLR-2726 be included in SOLR 3.4.0?

Sorry, no.

This isn't a release blocker issue.

But, separately, I think we should fix it, but on quick glance it
doesn't look like there's consensus on how to fix it?

Mike McCandless

http://blog.mikemccandless.com


[jira] [Created] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims

TestParser.testSpanTermXML fails with some sims
---

 Key: LUCENE-3430
 URL: https://issues.apache.org/jira/browse/LUCENE-3430
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0


here is why this test sometimes fails (my explanation in the test i wrote):

{noformat}
  /** make sure all sims work with spanOR(termX, termY) where termY does not exist */
  public void testCrazySpans() throws Exception {
    // The problem: normal lucene queries create scorers, returning null if terms dont exist
    // This means they never score a term that does not exist.
    // however with spans, there is only one scorer for the whole hierarchy:
    // inner queries are not real queries, their boosts are ignored, etc.
{noformat}

Basically, SpanQueries aren't really queries: you just get one scorer. It calls 
extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
the whole bag of terms, even if they don't exist.

This is fine, we already have tests that sims won't bug out in computeStats() 
here. However, they don't expect to actually score documents based on
these terms that don't exist, and that is exactly what happens in Spans 
because it doesn't use sub-scorers.

Lucene's sim avoids this with the (docFreq + 1) in its IDF computation.


[jira] [Updated] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims


 [ 
https://issues.apache.org/jira/browse/LUCENE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3430:


Attachment: LUCENE-3430.patch

patch, my modifications to the others take the same approach as lucene's sim

I did the relevance testing (across all 129 possibilities) with short queries, 
no problems, still waiting on my computer for long queries... if that comes 
back ok I'd like to commit.


 TestParser.testSpanTermXML fails with some sims
 ---

 Key: LUCENE-3430
 URL: https://issues.apache.org/jira/browse/LUCENE-3430
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3430.patch


 here is why this test sometimes fails (my explanation in the test i wrote):
 {noformat}
   /** make sure all sims work with spanOR(termX, termY) where termY does not exist */
   public void testCrazySpans() throws Exception {
     // The problem: normal lucene queries create scorers, returning null if terms dont exist
     // This means they never score a term that does not exist.
     // however with spans, there is only one scorer for the whole hierarchy:
     // inner queries are not real queries, their boosts are ignored, etc.
 {noformat}
 Basically, SpanQueries aren't really queries, you just get one scorer. it 
 calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
 the whole bag of terms, even if they don't exist.
 This is fine, we already have tests that sim's won't bug-out in 
 computeStats() here: however they don't expect to actually score documents 
 based on
 these terms that don't exist... however this is exactly what happens in Spans 
 because it doesn't use sub-scorers.
 Lucene's sim avoids this with the (docFreq + 1)


Re: 3.4.0 draft release notes

On Sun, Sep 11, 2011 at 7:04 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 On Sat, Sep 10, 2011 at 10:21 AM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de wrote:

  Will the fix/patch for issue SOLR-2726 be included in SOLR 3.4.0?

 Sorry, no.

 This isn't a release blocker issue.

 But, separately, I think we should fix it, but on quick glance it
 doesn't look like there's consensus on how to fix it?


I had this same bug when implementing a spellchecker too.
It's something the spellcheck framework expects, but doesn't provide.

I think it's broken that SolrSpellChecker has both field name and analyzer,
but only sets up field name in its init()... if SolrSpellChecker is
going to own the 'analyzer' variable then
I think its init() should take care of the logic; currently it's either
duplicated across spellchecker implementations,
or it's missing entirely, causing bugs like SOLR-2726.


-- 
lucidimagination.com


[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10504 - Failure

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10504/

1 tests failed.
REGRESSION:  org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:50)
at org.apache.lucene.search.Scorer.score(Scorer.java:60)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:419)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:376)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:296)
at 
org.apache.lucene.queryparser.xml.TestParser.dumpResults(TestParser.java:216)
at 
org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML(TestParser.java:157)




Build Log (for compile errors):
[...truncated 5267 lines...]




Full search possibility in Solr

2011-09-11 Thread Eugeny Balakhonov

Hello,

 

My task is very simple:

I have a big database with a lot of tables and fields. This database has
a dynamic structure and can be extended or changed at any time.

I need a tool that can run a full-text search across all fields in all tables
of my database. The input to this tool is some text to search for. The output
is a unique key and the name of the field that contains this text.

Solr is a very good choice, but I have a serious problem with it: all Solr
query parsers (standard, dismax, edismax) require an explicit declaration of
the fields to search. But the list of these fields in my case is very, very
big! And at search time I don't know all the field names in the database.

I think that my task is not unique. According to Google, a lot of people are
trying to solve the same problem with Solr.

Maybe it would be a good idea to add more flexible ways to search in all
indexed fields?

I see the following options:

1. Allow wildcards in the qf parameter for the dismax/edismax query parsers.

2. Add the ability to store the source field name in the copyField directive
in schema.xml. In this case the user can do the following:

a) Create a field for default search:

<field name="TEXT" type="text_ALL" indexed="true" stored="true"
multiValued="true"/>

...

<defaultSearchField>TEXT</defaultSearchField>

b) Copy all fields to the default search field:

<copyField source="*" dest="TEXT" storeSource="true" />

c) In the query response the user receives the source field name:

<lst name="highlighting">
<lst name="..">
<arr name="TEXT">
  <str source="SOURCE_FIELD_NAME">foo foo foo <em>test</em> foo foo</str>
  </arr>
  </lst>
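
As a rough illustration of option 1, the sketch below expands a wildcard pattern against a list of known field names. It is not existing Solr behaviour; the class and method names are hypothetical, and it assumes the set of indexed field names is available at query time.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for expanding a wildcard qf entry; not part of Solr. */
final class QfWildcardSketch {

  /** Turns a glob such as "title_*" into a regex, treating '*' as "any characters". */
  static Pattern globToRegex(String glob) {
    String[] parts = glob.split("\\*", -1); // -1 keeps trailing empty parts
    StringBuilder regex = new StringBuilder();
    for (int i = 0; i < parts.length; i++) {
      if (i > 0) {
        regex.append(".*");
      }
      regex.append(Pattern.quote(parts[i])); // everything except '*' is literal
    }
    return Pattern.compile(regex.toString());
  }

  /** Returns the field names that match the glob, in their original order. */
  static List<String> expand(String glob, Collection<String> fieldNames) {
    Pattern pattern = globToRegex(glob);
    List<String> matches = new ArrayList<>();
    for (String name : fieldNames) {
      if (pattern.matcher(name).matches()) {
        matches.add(name);
      }
    }
    return matches;
  }
}
```

A query parser could then replace a qf entry such as "title_*" with the expanded list before building the query.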

 

I'm sorry if this has distracted you from your work.

 

Eugeny

[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-09-11 Thread JIRA


[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102374#comment-13102374
 ] 

Jan Høydahl commented on SOLR-1979:
---

An updated documentation of the Processor is now at 
http://wiki.apache.org/solr/LanguageDetection

@Lance: What params were on your mind as candidates for keyword instead of 
true/false, and for what potential future reasons?

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
  Labels: UpdateProcessor
 Fix For: 3.5

 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.


[jira] [Commented] (SOLR-2726) NullPointerException when using spellcheck.q


[ 
https://issues.apache.org/jira/browse/SOLR-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102375#comment-13102375
 ] 

Robert Muir commented on SOLR-2726:
---

In my opinion, since the base class SolrSpellChecker has this 'analyzer' field 
(that it wants to be non-null),
it should at least take care of it in its init() method, and we should make 
sure subclasses call super.init(args) in their init() methods.

When I had this bug in directspellchecker I copy-pasted the below code from 
AbstractLuceneSpellChecker to fix it, but I think it's dumb 
to put this in every spellchecker subclass, and it's trappy for someone trying 
to implement their own spellchecker:
{noformat}
if (field != null && core.getSchema().getFieldTypeNoEx(field) != null)  {
  analyzer = core.getSchema().getFieldType(field).getQueryAnalyzer();
}
fieldTypeName = (String) config.get(FIELD_TYPE);
if (core.getSchema().getFieldTypes().containsKey(fieldTypeName))  {
  FieldType fieldType = core.getSchema().getFieldTypes().get(fieldTypeName);
  analyzer = fieldType.getQueryAnalyzer();
}
if (analyzer == null)   {
  LOG.info("Using WhitespaceAnalyzer for dictionary: " + name);
  analyzer = new WhitespaceAnalyzer(core.getSolrConfig().luceneMatchVersion);
}
{noformat}
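
A stripped-down sketch of the shape I mean, with no Solr types at all. Everything here is hypothetical (the SketchAnalyzer interface, the class names, the config keys); the only point is that the base class resolves the analyzer once, with a fallback, and subclasses start their init() with super.init().

```java
import java.util.Map;

/** Hypothetical stand-in for an analyzer; not the Lucene Analyzer class. */
interface SketchAnalyzer {
  String name();
}

/** Hypothetical base class that owns both 'field' and 'analyzer'. */
abstract class BaseSpellCheckerSketch {
  protected String field;
  protected SketchAnalyzer analyzer;

  /** Resolves field and analyzer in one place so 'analyzer' is never null afterwards. */
  public String init(Map<String, String> config, Map<String, SketchAnalyzer> analyzersByField) {
    field = config.get("field");
    analyzer = (field == null) ? null : analyzersByField.get(field);
    if (analyzer == null) {
      analyzer = () -> "whitespace"; // fallback, like the WhitespaceAnalyzer default above
    }
    return config.get("name");
  }
}

/** Hypothetical subclass: only adds its own settings on top of the base setup. */
final class DirectSpellCheckerSketch extends BaseSpellCheckerSketch {
  int maxEdits;

  @Override
  public String init(Map<String, String> config, Map<String, SketchAnalyzer> analyzersByField) {
    String name = super.init(config, analyzersByField); // analyzer is set up here
    maxEdits = Integer.parseInt(config.getOrDefault("maxEdits", "2"));
    return name;
  }
}
```

With this layout a new spellchecker cannot end up with a null analyzer unless it skips super.init(), which is much easier to spot in review than a missing copy-pasted block.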

 NullPointerException when using spellcheck.q
 

 Key: SOLR-2726
 URL: https://issues.apache.org/jira/browse/SOLR-2726
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.3, 4.0
 Environment: ubuntu
Reporter: valentin
  Labels: nullpointerexception, spellcheck
 Attachments: SOLR-2726.patch


 When I use spellcheck.q in my query to define what will be spellchecked, I 
 always have this error, for every configuration I try :
 java.lang.NullPointerException
 at 
 org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
 at 
 org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
 at 
 org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at 
 org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
 at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 All my other functions works great, this is the only thing which doesn't work 
 at all, just when i add spellcheck.q=my%20sentence in the query...
 Example of a query : 
 http://localhost:8983/solr/db/suggest_full?q=american%20israel&spellcheck.q=american%20israel
 In solrconfig.xml :
 <searchComponent name="suggest_full" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">suggestTextFull</str>
   <lst name="spellchecker">
     <str name="name">suggest_full</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">text_suggest_full</str>
     <str name="fieldType">suggestTextFull</str>
   </lst>
 </searchComponent>
 <requestHandler name="/suggest_full"
 class="org.apache.solr.handler.component.SearchHandler">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest_full</str>
     <str name="spellcheck.count">10</str>
str

[jira] [Commented] (LUCENE-3429) improve build system when tests hang

[
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102376#comment-13102376
]

Robert Muir commented on LUCENE-3429:
-

Mike, right but even that solution wouldn't be that great: it wouldn't give us
random seed :)

Dawid pointed me to some code of his, I think he is working on a prototype for
us to try to integrate:

https://github.com/dweiss/timeoutrule/tree/master/src/test/java/com/carrotsearch

improve build system when tests hang

Key: LUCENE-3429
URL: https://issues.apache.org/jira/browse/LUCENE-3429
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir
Fix For: 3.5, 4.0

Attachments: LUCENE-3429.patch

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery


 [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3426:
---

Attachment: LUCENE-3426.patch

I like the idea of introducing the newly created class! Here is the new patch.

 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)
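
 The description above can be illustrated with a small, self-contained sketch. 
 It does not use Lucene and is not the attached patch: the class and method 
 names (NGramPhraseSketch, ngrams, positionsToKeep) are made up for 
 illustration. It assumes slop 0 and that the gram at index i sits at 
 position i, and it keeps every n-th gram plus the last one, which still 
 covers every character of the query string.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustration only: hypothetical helper, not part of Lucene or the patch. */
final class NGramPhraseSketch {

    /** Splits text into overlapping n-grams; the gram at index i starts at character i. */
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    /** Returns the positions of the grams to keep: every n-th gram plus the last one. */
    static List<Integer> positionsToKeep(int gramCount, int n) {
        List<Integer> keep = new ArrayList<>();
        for (int pos = 0; pos < gramCount; pos++) {
            if (pos % n == 0 || pos == gramCount - 1) {
                keep.add(pos);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<String> grams = ngrams("ABCD", 2); // AB, BC, CD
        for (int pos : positionsToKeep(grams.size(), 2)) {
            // The kept grams retain their original positions.
            System.out.println(grams.get(pos) + " @ position " + pos);
        }
    }
}
```

 For q=ABCD with 2-grams this keeps AB at position 0 and CD at position 2, 
 matching the PhraseQuery(AB CD) example in the description.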


[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery


[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102393#comment-13102393
 ] 

Robert Muir commented on LUCENE-3426:
-

I think I like it better too... though I wonder if it's possible to keep the 
original NGramPhraseQuery unmodified?
This way it's not changed by Query.rewrite(), and if a user reuses the query 
(which we document they can do), they could then call add() again and 
everything works.
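
A minimal sketch of that idea, using hypothetical stand-in classes 
(SketchPhraseQuery and SketchNGramPhraseQuery are not Lucene classes, and 
slop is assumed to be 0): rewrite() builds a new query object when the 
optimization applies and leaves the original untouched, so add() can be 
called again afterwards.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a phrase query; not Lucene's PhraseQuery. */
class SketchPhraseQuery {
    protected final List<String> terms = new ArrayList<>();
    protected final List<Integer> positions = new ArrayList<>();

    /** Adds a term at the next consecutive position. */
    void add(String term) {
        add(term, terms.size());
    }

    void add(String term, int position) {
        terms.add(term);
        positions.add(position);
    }

    /** A plain phrase query has nothing to optimize and returns itself. */
    SketchPhraseQuery rewrite() {
        return this;
    }
}

/** Hypothetical n-gram variant whose rewrite() never mutates the receiver. */
class SketchNGramPhraseQuery extends SketchPhraseQuery {
    private final int n;

    SketchNGramPhraseQuery(int n) {
        this.n = n;
    }

    @Override
    SketchPhraseQuery rewrite() {
        if (n < 2 || terms.size() < 3) {
            return this; // not enough terms to drop anything
        }
        SketchPhraseQuery optimized = new SketchPhraseQuery();
        for (int i = 0; i < terms.size(); i++) {
            // Keep every n-th gram plus the last one, at its original position.
            if (i % n == 0 || i == terms.size() - 1) {
                optimized.add(terms.get(i), positions.get(i));
            }
        }
        return optimized;
    }

    public static void main(String[] args) {
        SketchNGramPhraseQuery q = new SketchNGramPhraseQuery(2);
        q.add("AB");
        q.add("BC");
        q.add("CD");
        SketchPhraseQuery rewritten = q.rewrite();
        System.out.println("rewritten terms: " + rewritten.terms);
        System.out.println("original terms:  " + q.terms);
        q.add("DE"); // the original is still usable after rewrite()
        System.out.println("rewritten again: " + q.rewrite().terms);
    }
}
```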

Also, somewhat related to the issue might be SOLR-2660. We don't have to commit 
that patch, but we could separate
out the queryparser refactoring to make it easier for such an optimization to 
be automatic in solr, because it allows
SolrQueryParser to delegate creation of Phrase/MultiPhraseQuery to the 
FieldType.




[jira] [Resolved] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims


 [ 
https://issues.apache.org/jira/browse/LUCENE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3430.
-

Resolution: Fixed
  Assignee: Robert Muir

 TestParser.testSpanTermXML fails with some sims
 ---

 Key: LUCENE-3430
 URL: https://issues.apache.org/jira/browse/LUCENE-3430
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3430.patch


 Here is why this test sometimes fails (my explanation in the test I wrote):
 {noformat}
   /** make sure all sims work with spanOR(termX, termY) where termY does not exist */
   public void testCrazySpans() throws Exception {
     // The problem: normal lucene queries create scorers, returning null if terms dont exist
     // This means they never score a term that does not exist.
     // however with spans, there is only one scorer for the whole hierarchy:
     // inner queries are not real queries, their boosts are ignored, etc.
 {noformat}
 Basically, SpanQueries aren't really queries: you just get one scorer. It 
 calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
 the whole bag of terms, even if they don't exist.
 This is fine: we already have tests that sims won't bug out in 
 computeStats() here. However, they don't expect to actually score documents 
 based on these terms that don't exist, and this is exactly what happens in 
 Spans because it doesn't use sub-scorers.
 Lucene's sim avoids this with the (docFreq + 1)
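
 To see why the (docFreq + 1) mentioned above matters for a term that does 
 not exist, here is a small sketch. The smoothed formula is the classic 
 TF-IDF one, log(numDocs / (docFreq + 1)) + 1, which is my understanding of 
 what Lucene's default similarity computes; the naive variant and the class 
 name IdfSketch are assumptions made only for contrast.

```java
/** Illustration only: compares a smoothed IDF with a naive one for docFreq = 0. */
final class IdfSketch {

    /** Classic smoothed IDF: the +1 in the denominator avoids dividing by zero. */
    static double idfSmoothed(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Naive IDF without smoothing (hypothetical, for contrast only). */
    static double idfNaive(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) docFreq);
    }

    public static void main(String[] args) {
        long numDocs = 100;
        long docFreq = 0; // the term does not exist in the index
        System.out.println("smoothed: " + idfSmoothed(docFreq, numDocs));
        System.out.println("naive:    " + idfNaive(docFreq, numDocs));
    }
}
```

 With docFreq = 0 the naive form divides by zero and yields an infinite 
 weight, while the smoothed form stays finite. A similarity without such 
 smoothing can therefore produce unusable scores when Spans score a term 
 that is missing from the index.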


[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10505 - Still Failing

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10505/

1 tests failed.
FAILED:  org.apache.solr.search.TestRealTimeGet.testStressGetRealtime

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions!
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:695)
    at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:89)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
    at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:723)
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:667)




Build Log (for compile errors):
[...truncated 8579 lines...]




[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery


 [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3426:
---

Attachment: LUCENE-3426.patch

{quote}
I think I like it better too... though I wonder if its possible to keep the 
original NGramPhraseQuery unmodified?
this way its not changed by Query.rewrite(), and if a user reuses the query 
(which we document they can do), they could then call add() again and 
everything works.
{quote}

I wondered about that too. Here is the new patch. This time I added 
assertSame()/assertNotSame() checks on the rewritten Query to the test code.


[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery


 [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3426:
---

Attachment: PerfTest.java


[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery


[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102405#comment-13102405
 ] 

Koji Sekiguchi commented on LUCENE-3426:


To make this automatic in Solr, I wonder if we could move the feature to the 
n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}



[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery


[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102406#comment-13102406
 ] 

Robert Muir commented on LUCENE-3426:
-

Well if we apply the refactoring part of SOLR-2660 (we can split out into a 
separate issue), we could add such a thing as an attribute to the fieldType?

I like the way your patch looks now! A couple more questions:
* doesn't the optimization also apply to MultiPhraseQuery? If so, 
NGramPhraseQuery could extend MultiPhraseQuery and just rewrite to the correct 
one (MultiPhrase or Phrase depending upon the situation after optimization)
* what about hashCode/equals? Although the same results will be returned, 
scoring will differ, so maybe NGramPhraseQuery should implement these?
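
A minimal sketch of what implementing hashCode/equals could look like, 
assuming the subclass carries an extra gram size n that must take part in 
the comparison. EqBasePhrase and EqNGramPhrase are hypothetical stand-ins, 
not Lucene classes, and this is not the code in the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base query: equality is defined by the list of terms. */
class EqBasePhrase {
    final List<String> terms;

    EqBasePhrase(List<String> terms) {
        this.terms = new ArrayList<>(terms); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // The getClass() check keeps a base query and an n-gram query distinct.
        if (o == null || getClass() != o.getClass()) return false;
        return terms.equals(((EqBasePhrase) o).terms);
    }

    @Override
    public int hashCode() {
        return terms.hashCode();
    }
}

/** Hypothetical n-gram query: the gram size n is part of its identity. */
class EqNGramPhrase extends EqBasePhrase {
    final int n;

    EqNGramPhrase(List<String> terms, int n) {
        super(terms);
        this.n = n;
    }

    @Override
    public boolean equals(Object o) {
        // super.equals() has already verified that o is an EqNGramPhrase.
        return super.equals(o) && n == ((EqNGramPhrase) o).n;
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + n;
    }
}
```

With this, two queries over the same terms compare equal only when they are 
of the same class and use the same n, so a cache keyed on the query would 
not mix up results whose scores differ.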



[jira] [Issue Comment Edited] (LUCENE-3426) optimizer for n-gram PhraseQuery


[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102405#comment-13102405
 ] 

Koji Sekiguchi edited comment on LUCENE-3426 at 9/12/11 2:02 AM:
-

To make this automatic in Solr, I wonder if we could move the feature to the 
n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100"
           autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}



[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery


[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102408#comment-13102408
 ] 

Koji Sekiguchi commented on LUCENE-3426:


I'm not sure it can be applied to MultiPhraseQuery. Let me take more time to 
look into it.

Considering hashCode/equals is a good point. I'll look into it.



Re: issue SOLR-1565

2011-09-11 Thread William Bell

Code is not supposed to fly around in email. Use JIRA. Just create a
new issue and attach it to the bug using SVN diff.

See http://wiki.apache.org/solr/HowToContribute


On Fri, Sep 9, 2011 at 1:03 PM, Patrick Sauts psa...@viadeoteam.com wrote:
 Hi,



 I’ve made an alpha version of StreamingUpdateSolrServer dedicated to binary
 updates (javabin). It works fine for me.

 It is not a fix for the issue SOLR-1565; it is a new class.

 But I think it may be useful for fixing the bug.

 If somebody tests it, please send feedback.



 Patrick Sauts.





-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


[JENKINS] Lucene-trunk - Build # 1674 - Still Failing

Build: https://builds.apache.org/job/Lucene-trunk/1674/

2 tests failed.
REGRESSION:  org.apache.lucene.index.TestTermsEnum.testIntersectRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
    at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
    at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
    at org.apache.lucene.index.TestTermsEnum.testIntersectRandom(TestTermsEnum.java:266)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)


REGRESSION:  org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
    at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
    at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
    at org.apache.lucene.util.automaton.TestCompiledAutomaton.build(TestCompiledAutomaton.java:39)
    at org.apache.lucene.util.automaton.TestCompiledAutomaton.testTerms(TestCompiledAutomaton.java:55)
    at org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom(TestCompiledAutomaton.java:101)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




Build Log (for compile errors):
[...truncated 12798 lines...]




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10507 - Failure