RE: [Lucene.Net] 2.9.4

2011-09-11 Thread Prescott Nasser

Thanks Itamar!


 Date: Sat, 10 Sep 2011 20:22:59 +0300
 From: ita...@code972.com
 To: lucene-net-dev@lucene.apache.org
 Subject: Re: [Lucene.Net] 2.9.4

 We have been running extensive tests against the 2.9.4 branch for 30 hours
 now and have not detected any leaks. We will keep them running for a few more
 days, if you wish to wait for more conclusive findings.

 On Wed, Sep 7, 2011 at 5:07 PM, Prescott Nasser geobmx...@hotmail.com wrote:

  2.9.4 would make it in I assume because that will be our next official
  release.
 
 
  Sent from my Windows Phone
 
  -Original Message-
  From: Michael Herndon
  Sent: Wednesday, September 07, 2011 5:12 AM
  To: lucene-net-dev@lucene.apache.org
  Subject: Re: [Lucene.Net] 2.9.4
 
   What version is going to make it to nuget? 2.9.4 or 2.9.4g?
  ooo totally forgot about nuget. we definitely need to get that setup.
 
 
  On Wed, Sep 7, 2011 at 6:46 AM, digy digy digyd...@gmail.com wrote:
 
   Since it includes some level of divergence from Java, I committed it only
   to the 2.9.4g branch.
  
   https://issues.apache.org/jira/browse/LUCENE-1930
   https://issues.apache.org/jira/browse/LUCENENET-431
  
   DIGY
  
   On Wed, Sep 7, 2011 at 1:03 PM, Itamar Syn-Hershko ita...@code972.com
   wrote:
  
Ok, core compiles, and all tests pass. We are now running long tests to
measure memory usage among other things.
   
There is one show stopper, though: a patch sent by Matt Warren for
Spatial.Net doesn't seem to be in. See
http://groups.google.com/group/ravendb/msg/7517f095810c48f3
   
Any chance you can get it in to 2.9.4?
   
On Wed, Sep 7, 2011 at 1:01 AM, Itamar Syn-Hershko ita...@code972.com
wrote:
   
 Ok, great, we will run RavenDB on top of 2.9.4 in the next few days and
 will let you know how it went.


 On Tue, Sep 6, 2011 at 8:59 PM, Michael Herndon 
 mhern...@wickedsoftware.net wrote:

 I can't tell if the apache git mirror is updated via scheduler or from
 commit hooks, but it generally stays close to being on par with svn.
 I'll check next time I push something to svn.

 But both of those items have made it to the mirror.

 - michael


 On Tue, Sep 6, 2011 at 1:44 PM, Digy digyd...@gmail.com wrote:

  I don't know how often the github mirror is updated.
  These are the original locations:
  2.9.4
  https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/
  2.9.4g
  https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/
 
  Both versions include ThreadLocal fix + Signing.
 
  Thanks,
  DIGY
 
 
 
  -Original Message-
  From: itamar.synhers...@gmail.com [mailto:itamar.synhers...@gmail.com] On
  Behalf Of Itamar Syn-Hershko
  Sent: Tuesday, September 06, 2011 2:34 AM
  To: lucene-net-dev@lucene.apache.org
  Subject: Re: [Lucene.Net] 2.9.4
 
  Not a problem, we will test RavenDB on a separate branch, also for
  potential memory leaks.
 
  Digy, can you make sure the github mirror contains an updated 2.9.4 tag I
  can pull from, which includes the latest ThreadLocal fix + the strongly
  signed patch applied to it?
 
  2011/9/6 Digy digyd...@gmail.com
 
   To avoid misunderstanding...
  
   Community==all Lucene.Net users
  
   DIGY
  
   -Original Message-
   From: Digy [mailto:digyd...@gmail.com]
   Sent: Monday, September 05, 2011 11:46 PM
   To: 'lucene-net-dev@lucene.apache.org'
   Subject: RE: [Lucene.Net] 2.9.4
  
   Not a bad idea, but I would prefer the community's feedback instead of
   testing against all projects using Lucene.Net
   DIGY
  
   -Original Message-
   From: Matt Warren [mailto:mattd...@gmail.com]
   Sent: Monday, September 05, 2011 11:09 PM
   To: lucene-net-dev@lucene.apache.org
   Subject: Re: [Lucene.Net] 2.9.4
  
   If you want to test it against a large project you could take a look at
   how RavenDB uses it?
  
   At the moment it's using 2.9.2 (
   https://github.com/ayende/ravendb/tree/master/SharedLibs/Sources/Lucene2.9.2
   )
   but if you were to recompile it against 2.9.4 and check that all its
   unit tests still run, that would give you quite a large test case.
  
   On 5 September 2011 19:22, Prescott Nasser geobmx...@hotmail.com wrote:
  
Hey All,
   
How do people feel about the 2.9.4 code base? I've been using it for
some time, and for my use cases it's been excellent. Do we feel we are
ready to package this up and make it an official release? Or do we have
some tasks left 

[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-11 Thread Jasper van Veghel (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102261#comment-13102261
 ] 

Jasper van Veghel commented on SOLR-2066:
-

You're more than welcome! Having distributed grouping will be a great addition 
to have. As for the patch, could it be that you've modified a previous version 
rather than the latest one that includes the highlighting fixes? I'm getting 
the same highlighting-related stacktrace as before. ;-)

{code}SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:156)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1407)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:680){code}

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102271#comment-13102271
 ] 

Robert Muir commented on LUCENE-3426:
-

Hi Koji, I wonder if instead it would be cleaner as a subclass of PhraseQuery 
(NGramPhraseQuery or similar) that rewrites to the (possibly optimized) 
PhraseQuery in rewrite(). For example, it would build an optimized PhraseQuery 
when slop = 0 and there are enough terms to optimize; otherwise it would build 
a normal PhraseQuery.

Then the optimization would be easy to apply: the user just uses 
NGramPhraseQuery instead of PhraseQuery. For example, from QueryParser:
{noformat}
  @Override
  protected PhraseQuery newPhraseQuery() {
    return new NGramPhraseQuery();
  }
{noformat}


 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java


 If 2-gram is used and the length of the query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized to PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper "N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values" by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though.)
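
To make the position trick in the description concrete, here is a minimal sketch in plain Java. It is not taken from the attached patch: the class and method names are hypothetical, and the selection rule (keep every n-th gram plus the last one, each at its original position) is an assumption about how "appropriate positions" could be chosen so that every character of the query is still covered.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch only (hypothetical names): picks which n-gram positions a slop-0 phrase needs. */
final class NGramPositionSketch {

  /** Splits text into overlapping n-grams, e.g. ("ABCD", 2) gives [AB, BC, CD]. */
  static List<String> ngrams(String text, int n) {
    List<String> grams = new ArrayList<>();
    for (int i = 0; i + n <= text.length(); i++) {
      grams.add(text.substring(i, i + n));
    }
    return grams;
  }

  /**
   * Assumed rule: keep every n-th position plus the last one, so each
   * character of the query is covered by at least one retained gram.
   */
  static List<Integer> keptPositions(int gramCount, int n) {
    List<Integer> kept = new ArrayList<>();
    for (int pos = 0; pos < gramCount; pos++) {
      if (pos % n == 0 || pos == gramCount - 1) {
        kept.add(pos);
      }
    }
    return kept;
  }

  public static void main(String[] args) {
    List<String> grams = ngrams("ABCD", 2);
    // Each retained gram keeps its original position in the phrase.
    for (int pos : keptPositions(grams.size(), 2)) {
      System.out.println(grams.get(pos) + " @ position " + pos);
    }
  }
}
```

For q=ABCD this keeps AB at position 0 and CD at position 2, matching the PhraseQuery(AB CD) example above. A five-character query such as ABCDE would keep AB, CD and DE.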




[jira] [Resolved] (LUCENE-3423) add Terms.docCount

2011-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3423.
-

Resolution: Fixed
  Assignee: Robert Muir

 add Terms.docCount
 --

 Key: LUCENE-3423
 URL: https://issues.apache.org/jira/browse/LUCENE-3423
 Project: Lucene - Java
  Issue Type: New Feature
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3423.patch


 spinoff from LUCENE-3290, where yonik mentioned:
 {noformat}
 Is there currently a way to get the number of documents that have a value in 
 the field?
 Then one could compute the average length of a (sparse) field via 
 sumTotalTermFreq(field)/docsWithField(field)
 docsWithField(field) would be useful in other contexts that want to know how 
 sparse a field is (automatically selecting faceting algorithms, etc).
 {noformat}
 I think this is a useful stat to add, in case you have sparse fields for 
 heuristics or scoring.
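
As a rough illustration of why this stat helps with sparse fields, here is a small self-contained Java sketch. The names sumTotalTermFreq and docCount come from the discussion above; the class itself, the maxDoc parameter, and the sample numbers are hypothetical and are not Lucene API calls.

```java
/** Sketch only (hypothetical class): average length of a sparse field. */
final class SparseFieldStatsSketch {

  /** Average tokens per document that actually has the field; 0 if none do. */
  static double averageFieldLength(long sumTotalTermFreq, int docCount) {
    if (docCount <= 0) {
      return 0.0;
    }
    return (double) sumTotalTermFreq / docCount;
  }

  /** Fraction of all documents that have a value in the field. */
  static double density(int docCount, int maxDoc) {
    if (maxDoc <= 0) {
      return 0.0;
    }
    return (double) docCount / maxDoc;
  }

  public static void main(String[] args) {
    long sumTotalTermFreq = 50_000L; // made-up: total tokens indexed in the field
    int docCount = 10_000;           // made-up: documents that have the field
    int maxDoc = 1_000_000;          // made-up: all documents in the index

    // Dividing by maxDoc instead of docCount would understate the length 100x here.
    System.out.println("avg length = " + averageFieldLength(sumTotalTermFreq, docCount));
    System.out.println("density    = " + density(docCount, maxDoc));
  }
}
```

With these made-up numbers the per-document average is 5 tokens, whereas dividing by all documents in the index would give 0.05.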




heads up: reindex trunk indexes

2011-09-11 Thread Robert Muir
I just committed https://issues.apache.org/jira/browse/LUCENE-3423

If you are using trunk, you should reindex.

-- 
lucidimagination.com




[jira] [Created] (LUCENE-3427) add function queries for all index statistics

2011-09-11 Thread Robert Muir (JIRA)
add function queries for all index statistics
-

 Key: LUCENE-3427
 URL: https://issues.apache.org/jira/browse/LUCENE-3427
 Project: Lucene - Java
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir


I think we have most of them, but at least the following are missing:
* getDocCount (# of documents that contain a value for a field)
* sumDocFreq (# of postings for a field)

not sure if there are others that don't have function queries.






Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Robert Muir
+1, thanks for creating this release candidate.

On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Please vote to release the RC1 artifacts at:

  https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

 as Lucene 3.4.0 and Solr 3.4.0.

 Mike McCandless

 http://blog.mikemccandless.com






-- 
lucidimagination.com




[jira] [Updated] (SOLR-2752) leader-per-shard

2011-09-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2752:
--

Attachment: SOLR-2752.patch

new patch - much stronger test, a couple fixes, refactor most of the leader 
election code into its own class.

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.




[jira] [Commented] (SOLR-2752) leader-per-shard

2011-09-11 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102283#comment-13102283
 ] 

Mark Miller commented on SOLR-2752:
---

Just a quick correction to first comment - cores create an ephemeral|sequential 
node - not just ephemeral.
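
For readers unfamiliar with why the sequential flag matters, here is a minimal sketch of the usual ZooKeeper election convention, written in plain Java without a ZooKeeper client. It assumes the common recipe in which the live node with the lowest sequence suffix is the leader; whether the attached patch elects leaders exactly this way is not stated in this thread, and the node names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

/** Sketch only (hypothetical names): lowest sequence number wins. */
final class LeaderElectionSketch {

  /** Parses the zero-padded 10-digit counter ZooKeeper appends to sequential node names. */
  static int sequenceOf(String nodeName) {
    return Integer.parseInt(nodeName.substring(nodeName.length() - 10));
  }

  /** Returns the live node with the smallest sequence number, or null if there are none. */
  static String electLeader(List<String> liveNodes) {
    String leader = null;
    for (String node : liveNodes) {
      if (leader == null || sequenceOf(node) < sequenceOf(leader)) {
        leader = node;
      }
    }
    return leader;
  }

  public static void main(String[] args) {
    List<String> live = Arrays.asList(
        "coreB-n_0000000007", "coreA-n_0000000003", "coreC-n_0000000012");
    System.out.println("leader: " + electLeader(live));

    // Because the nodes are ephemeral, a crashed core's node disappears
    // and the next-lowest sequence number takes over.
    List<String> afterCrash = Arrays.asList("coreB-n_0000000007", "coreC-n_0000000012");
    System.out.println("leader after crash: " + electLeader(afterCrash));
  }
}
```

The ephemeral part removes a core's node when its session ends, and the sequential part gives the remaining nodes a stable order to pick the next leader from.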

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.




[jira] [Created] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-11 Thread Robert Muir (JIRA)
create Solr similarity factories for new ranking algorithms
---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir


To make it easy to use some of the new ranking algorithms, we should add 
factories to solr:
* for parametric models like LM and BM25 so that parameters can be set from 
schema.xml
* for framework models like IFR and IB, so that different basic 
models/normalizations/lambdas can be chosen




[jira] [Updated] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2754:
--

Description: 
To make it easy to use some of the new ranking algorithms, we should add 
factories to solr:
* for parametric models like LM and BM25 so that parameters can be set from 
schema.xml
* for framework models like DFR and IB, so that different basic 
models/normalizations/lambdas can be chosen

  was:
To make it easy to use some of the new ranking algorithms, we should add 
factories to solr:
* for parametric models like LM and BM25 so that parameters can be set from 
schema.xml
* for framework models like IFR and IB, so that different basic 
models/normalizations/lambdas can be chosen


 create Solr similarity factories for new ranking algorithms
 ---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir

 To make it easy to use some of the new ranking algorithms, we should add 
 factories to solr:
 * for parametric models like LM and BM25 so that parameters can be set from 
 schema.xml
 * for framework models like DFR and IB, so that different basic 
 models/normalizations/lambdas can be chosen




[jira] [Assigned] (SOLR-2754) create Solr similarity factories for new ranking algorithms

2011-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir reassigned SOLR-2754:
-

Assignee: Robert Muir

 create Solr similarity factories for new ranking algorithms
 ---

 Key: SOLR-2754
 URL: https://issues.apache.org/jira/browse/SOLR-2754
 Project: Solr
  Issue Type: New Feature
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir

 To make it easy to use some of the new ranking algorithms, we should add 
 factories to solr:
 * for parametric models like LM and BM25 so that parameters can be set from 
 schema.xml
 * for framework models like DFR and IB, so that different basic 
 models/normalizations/lambdas can be chosen




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10500 - Failure

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10500/

1 tests failed.
FAILED:  TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml.init

Error Message:


Stack Trace:
Test report file 
/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml
 was length 0



Build Log (for compile errors):
[...truncated 1243 lines...]






[jira] [Created] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads

2011-09-11 Thread Robert Muir (JIRA)
trunk tests hang/deadlock TestIndexWriterWithThreads


 Key: LUCENE-3428
 URL: https://issues.apache.org/jira/browse/LUCENE-3428
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir


trunk tests have been hanging often lately in hudson, this time i was careful 
to kill and get a good stacktrace:




[jira] [Commented] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads

2011-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102296#comment-13102296
 ] 

Robert Muir commented on LUCENE-3428:
-

https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-Solr-tests-only-trunk/10500

{noformat}
[junit] 2011-09-11 16:32:39
[junit] Full thread dump OpenJDK 64-Bit Server VM (20.0-b11 mixed mode):
[junit] 
[junit] Low Memory Detector daemon prio=5 tid=0x000801eee800 nid=0x19642 runnable [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit] 
[junit] C2 CompilerThread1 daemon prio=5 tid=0x000801eef000 nid=0x19640 waiting on condition [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit] 
[junit] C2 CompilerThread0 daemon prio=5 tid=0x000801ef nid=0x1963d waiting on condition [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit] 
[junit] Signal Dispatcher daemon prio=5 tid=0x000801ef0800 nid=0x19630 waiting on condition [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit] 
[junit] Finalizer daemon prio=5 tid=0x000801ef1800 nid=0x19581 in Object.wait() [0x7ebee000]
[junit]    java.lang.Thread.State: WAITING (on object monitor)
[junit]     at java.lang.Object.wait(Native Method)
[junit]     - waiting on 0x000828cb0370 (a java.lang.ref.ReferenceQueue$Lock)
[junit]     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
[junit]     - locked 0x000828cb0370 (a java.lang.ref.ReferenceQueue$Lock)
[junit]     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
[junit]     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
[junit] 
[junit] Reference Handler daemon prio=5 tid=0x000801ef3000 nid=0x1957f in Object.wait() [0x7ecef000]
[junit]    java.lang.Thread.State: WAITING (on object monitor)
[junit]     at java.lang.Object.wait(Native Method)
[junit]     - waiting on 0x000828cb0410 (a java.lang.ref.Reference$Lock)
[junit]     at java.lang.Object.wait(Object.java:502)
[junit]     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
[junit]     - locked 0x000828cb0410 (a java.lang.ref.Reference$Lock)
[junit] 
[junit] main prio=5 tid=0x000801ef3800 nid=0x19432 waiting on condition [0x7fbfd000]
[junit]    java.lang.Thread.State: WAITING (parking)
[junit]     at sun.misc.Unsafe.park(Native Method)
[junit]     - parking to wait for  0x000827a440c0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
[junit]     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
[junit]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
[junit]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
[junit]     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
[junit]     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
[junit]     at org.apache.lucene.index.DocumentsWriterFlushControl.assertActiveDeleteQueue(DocumentsWriterFlushControl.java:435)
[junit]     at org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:428)
[junit]     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:557)
[junit]     - locked 0x000827a417c0 (a org.apache.lucene.index.DocumentsWriter)
[junit]     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2973)
[junit]     - locked 0x000827a3d738 (a java.lang.Object)
[junit]     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2950)
[junit]     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1133)
[junit]     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1097)
[junit]     at org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads(TestIndexWriterWithThreads.java:200)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit]     at java.lang.reflect.Method.invoke(Method.java:616)
[junit]     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit]     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit]     at 

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 10522 - Failure

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/10522/

No tests ran.

Build Log (for compile errors):
[...truncated 142 lines...]






[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 424 - Failure

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/424/

No tests ran.

Build Log (for compile errors):
[...truncated 100 lines...]






Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 10500 - Failure

2011-09-11 Thread Robert Muir
I killed this due to a hang/deadlock issue:
https://issues.apache.org/jira/browse/LUCENE-3428

On Sun, Sep 11, 2011 at 12:35 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10500/

 1 tests failed.
 FAILED:  TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml.init

 Error Message:


 Stack Trace:
 Test report file 
 /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml
  was length 0



 Build Log (for compile errors):
 [...truncated 1243 lines...]








-- 
lucidimagination.com




Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 10522 - Failure

2011-09-11 Thread Robert Muir
collateral damage from
https://issues.apache.org/jira/browse/LUCENE-3428, i was just killing
java processes.

On Sun, Sep 11, 2011 at 12:36 PM, Apache Jenkins Server
jenk...@builds.apache.org wrote:
 Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/10522/

 No tests ran.

 Build Log (for compile errors):
 [...truncated 142 lines...]








-- 
lucidimagination.com




[jira] [Created] (LUCENE-3429) improve build system when tests hang

2011-09-11 Thread Robert Muir (JIRA)
improve build system when tests hang


 Key: LUCENE-3429
 URL: https://issues.apache.org/jira/browse/LUCENE-3429
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.5, 4.0


Currently, if tests hang in hudson, the build can stay hung for days until we 
manually kill it.

The problem is that when a hang happens it's probably serious. What we want to 
do (I think) is:
# time out the build.
# ensure we have enough debugging information to hopefully fix any hang.

So I think the ideal solution would be:
# add a sysprop -D that LuceneTestCase respects, it could default to no 
timeout at all (some value like zero).
# when a timeout is set, LuceneTestCase spawns an additional timer thread for 
the test class? method?
# if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
information, random seed information to hopefully reproduce the hang, and fails 
the test.
# nightly builds would pass some reasonable -D for each test.

separately, I think we should have an ant-level timeout for the whole build, 
in case it goes completely crazy (e.g. jvm completely hangs or something else), 
just as an additional safety.
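
The steps above could be sketched roughly as follows. This is a minimal stand-alone illustration, not LuceneTestCase code: the class name, the tests.timeout.ms property, and the seed parameter are hypothetical. It also waits on the test thread with Thread.join rather than spawning a separate timer thread, which is a simplification of the idea in the description.

```java
import java.util.Map;

/** Sketch only (hypothetical names): fail a hung test and dump diagnostics. */
final class TestTimeoutSketch {

  /** Timeout in ms from a hypothetical -Dtests.timeout.ms; 0 means no timeout. */
  static long timeoutMillis() {
    return Long.getLong("tests.timeout.ms", 0L);
  }

  /** Formats the name, state, and stack of every live thread. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      Thread t = e.getKey();
      sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /** Returns true if body finished in time; otherwise prints seed and stacks, returns false. */
  static boolean runWithTimeout(Runnable body, long timeoutMs, long seed) {
    Thread worker = new Thread(body, "test-body");
    worker.setDaemon(true); // a hung test must not keep the JVM alive
    worker.start();
    try {
      worker.join(timeoutMs); // join(0) waits forever, matching "0 = no timeout"
    } catch (InterruptedException ie) {
      Thread.currentThread().interrupt();
      return false;
    }
    if (!worker.isAlive()) {
      return true;
    }
    System.err.println("Test timed out after " + timeoutMs + " ms, seed=" + seed);
    System.err.println(dumpAllThreads());
    return false;
  }

  public static void main(String[] args) {
    boolean ok = runWithTimeout(() -> { /* test body goes here */ }, timeoutMillis(), 42L);
    System.out.println(ok ? "passed" : "timed out");
  }
}
```

A nightly build would then pass something like -Dtests.timeout.ms=3600000, while local runs keep the default of no timeout.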




[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102301#comment-13102301
 ] 

Robert Muir commented on LUCENE-3429:
-

I'm gonna play with the ant junit task timeout first, just to see if we can do 
anything with it as a quick hack.

I suspect the problem will be that we won't get enough debugging information 
via this mechanism (random seed, stacktraces).

 improve build system when tests hang
 

 Key: LUCENE-3429
 URL: https://issues.apache.org/jira/browse/LUCENE-3429
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.5, 4.0


 Currently, if tests hang in hudson, the build can stay hung for days until we 
 manually kill it.
 The problem is that when a hang happens it's probably serious. What we want to 
 do (I think) is:
 # time out the build.
 # ensure we have enough debugging information to hopefully fix any hang.
 So I think the ideal solution would be:
 # add a sysprop -D that LuceneTestCase respects, it could default to no 
 timeout at all (some value like zero).
 # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
 the test class? method?
 # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
 information, random seed information to hopefully reproduce the hang, and 
 fails the test.
 # nightly builds would pass some reasonable -D for each test.
 separately, I think we should have an ant-level timeout for the whole 
 build, in case it goes completely crazy (e.g. jvm completely hangs or 
 something else), just as an additional safety.




[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search

2011-09-11 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102307#comment-13102307
 ] 

Martijn van Groningen edited comment on SOLR-2066 at 9/11/11 5:15 PM:
--

Jasper, does the exception occur for the same queries? I did add a test for 
this. Can you run the TestDistributedSearch test?

  was (Author: martijn.v.groningen):
Jasper, does the exception occur occur for the same queries? I did add a 
test for this. Can you run the TestDistributedSearch test?
  
 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.




[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search

2011-09-11 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102307#comment-13102307
 ] 

Martijn van Groningen commented on SOLR-2066:
-

Jasper, does the exception occur occur for the same queries? I did add a test 
for this. Can you run the TestDistributedSearch test?

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.




[jira] [Updated] (LUCENE-3429) improve build system when tests hang

2011-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3429:


Attachment: LUCENE-3429.patch

here is a hack patch that sets a timeout of 1 hour to any test batch (e.g. 
test-core) by default, unless you are running Test2BTerms (10 hours).

i tested this, the issue is you get no debugging information at all... but it's 
at least a small start.

 improve build system when tests hang
 

 Key: LUCENE-3429
 URL: https://issues.apache.org/jira/browse/LUCENE-3429
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3429.patch


 Currently, if tests hang in hudson the build can stay hung for days until we manually 
 kill it.
 The problem is that when a hang happens it's probably serious, so what we want to 
 do (I think) is:
 # time out the build.
 # ensure we have enough debugging information to hopefully fix any hang.
 So I think the ideal solution would be:
 # add a sysprop -D that LuceneTestCase respects, it could default to no 
 timeout at all (some value like zero).
 # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
 the test class? method?
 # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
 information, random seed information to hopefully reproduce the hang, and 
 fails the test.
 # nightly builds would pass some reasonable -D for each test.
 separately, I think we should have an ant-level timeout for the whole 
 build, in case it goes completely crazy (e.g. jvm completely hangs or 
 something else), just as an additional safety.
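
 The per-test timeout described above can be sketched roughly as follows. This is only an illustration, not the attached patch and not LuceneTestCase code: the class TestTimeoutSketch, the method names and the property name sketch.test.timeout.ms are all made up for the example. It runs the test body on a worker thread, waits up to the timeout, and on expiry fails with the seed and a dump of every thread's stack.

```java
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of LuceneTestCase. */
final class TestTimeoutSketch {

  /** Formats the stack of every live thread, roughly what kill -QUIT would print. */
  static String dumpAllThreads() {
    StringBuilder sb = new StringBuilder();
    for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
      sb.append('"').append(e.getKey().getName()).append("\" state=")
        .append(e.getKey().getState()).append('\n');
      for (StackTraceElement frame : e.getValue()) {
        sb.append("    at ").append(frame).append('\n');
      }
    }
    return sb.toString();
  }

  /** Runs the test body on a worker thread; timeoutMillis <= 0 means no timeout. */
  static void runWithTimeout(Runnable test, long timeoutMillis, long seed) {
    ExecutorService worker = Executors.newSingleThreadExecutor();
    try {
      Future<?> result = worker.submit(test);
      if (timeoutMillis <= 0) {
        result.get();
      } else {
        result.get(timeoutMillis, TimeUnit.MILLISECONDS);
      }
    } catch (TimeoutException e) {
      // The dump is taken here, before the finally block interrupts the hung worker.
      throw new AssertionError("test timed out after " + timeoutMillis
          + " ms, seed=" + seed + "\n" + dumpAllThreads());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new AssertionError("interrupted while waiting for the test", e);
    } catch (ExecutionException e) {
      throw new AssertionError("test body failed", e.getCause());
    } finally {
      worker.shutdownNow();
    }
  }

  public static void main(String[] args) {
    // Hypothetical property name; zero (the default) disables the timeout.
    long timeout = Long.getLong("sketch.test.timeout.ms", 0L);
    runWithTimeout(() -> System.out.println("fast test ran"), timeout, 42L);
  }
}
```

 A nightly build would then pass something like -Dsketch.test.timeout.ms=3600000, while local runs keep the default of no timeout.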




[jira] [Updated] (SOLR-2752) leader-per-shard

2011-09-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2752:
--

Attachment: SOLR-2752.patch

Another new patch:

I moved SolrZooKeeper to the org.apache.zookeeper package so that I could add a 
simulated timeout method for tests.

I also wrote a new test that starts up a bunch of replicas and then times out 
the leader. After waiting for the leader to reconnect, all of the other 
replicas are killed and I check that the first leader is again the leader. I 
wrote this test because I knew it would fail: on reconnecting, clients 
don't jump back into the leader election process.

So I also added to the client reconnection impl - on reconnect, all SolrCores 
are re-registered. This also has the advantage that any SolrCores that were 
created while the connection was down are put into play. That allows the new 
test to pass.

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch, SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.




Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Sanne Grinovero
+1
all tests on all Lucene-using projects I contribute to pass without
any change needed (a sure sign I should add more...).

Once more, great work and thanks so much to everyone involved.

Sanne

On 11 September 2011 16:11, Robert Muir rcm...@gmail.com wrote:
 +1, thanks for creating this release candidate.

 On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Please vote to release the RC1 artifacts at:

  https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

 as Lucene 3.4.0 and Solr 3.4.0.

 Mike McCandless

 http://blog.mikemccandless.com






 --
 lucidimagination.com







[jira] [Assigned] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads

2011-09-11 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reassigned LUCENE-3428:
---

Assignee: Simon Willnauer

 trunk tests hang/deadlock TestIndexWriterWithThreads
 

 Key: LUCENE-3428
 URL: https://issues.apache.org/jira/browse/LUCENE-3428
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Simon Willnauer
 Attachments: LUCENE-3428.patch


 trunk tests have been hanging often lately in hudson, this time i was careful 
 to kill and get a good stacktrace:




[jira] [Updated] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads

2011-09-11 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer updated LUCENE-3428:


Attachment: LUCENE-3428.patch

I think I found the reason or one possible reason for this. there is one place 
where we don't release a DWPT lock in the case of a failure. Here is a patch.
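
The kind of bug described here, a lock that is not released on the failure path, can be illustrated with a small sketch. This is not the attached patch and not Lucene's DWPT code: LockReleaseSketch and its methods are made-up names, and a plain ReentrantLock stands in for the real lock.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for per-thread writer state; not Lucene's DWPT code. */
final class LockReleaseSketch {
  private final ReentrantLock lock = new ReentrantLock();

  /** Leaky variant: an exception thrown by work skips unlock(), so the lock stays held. */
  void updateLeaky(Runnable work) {
    lock.lock();
    work.run();
    lock.unlock();
  }

  /** Safe variant: finally releases the lock on both the success and the failure path. */
  void updateSafe(Runnable work) {
    lock.lock();
    try {
      work.run();
    } finally {
      lock.unlock();
    }
  }

  /** True if any thread currently holds the lock. */
  boolean isLocked() {
    return lock.isLocked();
  }
}
```

With the leaky variant, one failing update leaves the lock held, and any other thread that later needs it blocks forever, which would look like a hung test.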

 trunk tests hang/deadlock TestIndexWriterWithThreads
 

 Key: LUCENE-3428
 URL: https://issues.apache.org/jira/browse/LUCENE-3428
 Project: Lucene - Java
  Issue Type: Bug
Reporter: Robert Muir
Assignee: Simon Willnauer
 Attachments: LUCENE-3428.patch


 trunk tests have been hanging often lately in hudson, this time i was careful 
 to kill and get a good stacktrace:




[JENKINS] Lucene-trunk - Build # 1673 - Still Failing

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1673/

1 tests failed.
FAILED:  org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
at 
org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
at 
org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:50)
at org.apache.lucene.search.Scorer.score(Scorer.java:60)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:419)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:376)
at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:296)
at 
org.apache.lucene.queryparser.xml.TestParser.dumpResults(TestParser.java:216)
at 
org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML(TestParser.java:157)




Build Log (for compile errors):
[...truncated 16136 lines...]






Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Andi Vajda
I prepared a PyLucene 3.4 release candidate from the Lucene 3.4 branch.
All tests pass.

+1 to release Lucene Solr 3.4.

Andi..

On Sep 9, 2011, at 9:06, Michael McCandless luc...@mikemccandless.com wrote:

 Please vote to release the RC1 artifacts at:
 
  
 https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
 
 as Lucene 3.4.0 and Solr 3.4.0.
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 



[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #240: POMs out of sync

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/240/

No tests ran.

Build Log (for compile errors):
[...truncated 13149 lines...]






[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search

2011-09-11 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: LUCENE-3360.patch

Updated patch
* group.query works in distributed search
* group.main works in distributed search
* Many refactorings

I think the feature needs to be committed. Maybe besides some jdocs the patch 
is ready. I'll commit this feature in the coming days. In the mean time I will 
start working on making the patch work for the 3x branch.

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3360.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.




[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search

2011-09-11 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: (was: LUCENE-3360.patch)

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.




[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search

2011-09-11 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102354#comment-13102354
 ] 

Martijn van Groningen edited comment on SOLR-2066 at 9/11/11 9:33 PM:
--

Updated patch
* group.query works in distributed search
* group.main works in distributed search
* Many refactorings

I think the feature needs to be committed. Maybe besides some jdocs the patch 
is ready. I'll commit this feature in the coming days. In the mean time I will 
start working on the patch for the 3x branch.

  was (Author: martijn.v.groningen):
Updated patch
* group.query works in distributed search
* group.main works in distributed search
* Many refactorings

I think the feature needs to be committed. Maybe besides some jdocs the patch 
is ready. I'll commit this feature in the coming days. In the mean time I will 
start working on making the patch work for the 3x branch.
  
 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.




[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search

2011-09-11 Thread Martijn van Groningen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Martijn van Groningen updated SOLR-2066:


Attachment: SOLR-2066.patch

 Search Grouping: support distributed search
 ---

 Key: SOLR-2066
 URL: https://issues.apache.org/jira/browse/SOLR-2066
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 3.5, 4.0

 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, 
 SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch


 Support distributed field collapsing / search grouping.




[jira] [Updated] (SOLR-2752) leader-per-shard

2011-09-11 Thread Mark Miller (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-2752:
--

Attachment: SOLR-2752.patch

feeling motivated I guess - another patch with a bunch of polish

 leader-per-shard
 

 Key: SOLR-2752
 URL: https://issues.apache.org/jira/browse/SOLR-2752
 Project: Solr
  Issue Type: Sub-task
  Components: SolrCloud
Reporter: Yonik Seeley
Assignee: Mark Miller
 Fix For: 4.0

 Attachments: SOLR-2752.patch, SOLR-2752.patch, SOLR-2752.patch, 
 SOLR-2752.patch


 We need to add metadata into zookeeper about who is the leader for each 
 shard, and have some kind of leader election.




RE: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Uwe Schindler
Hi,

+1

I checked the Lucene Core JAR file as a drop-in replacement for PANGAEA; it works
without any problem. I reindexed some documents, ran CheckIndex, optimized, and
ran CheckIndex again. All fine, no 1.6.0_24 crashes; all is working as it
should. Code compiles fine, too. We are now running on this version with
Solaris and MMAP (as usual).

I had no time to verify the package contents and md5/sha1 hashes or try
Solr, but I think somebody might already have done this. I can verify that
the javadoc links to Oracle work again.

Changes look fine, one small thing: We have Java 7 try-with-resources
support now (our first Java 7 feature!!!), but the note is in the wrong position
(under BUG FIXES):
LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
suppressed exceptions in the original exception, so stack trace will contain
them. (Uwe Schindler)
[should be NEW FEATURES] - But that's minor, just if we respin again, but I
don't expect this.
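
For anyone who has not used the Java 7 feature behind that note: when a try-with-resources body throws and close() then throws as well, the second exception is attached to the first as a suppressed exception instead of replacing it. The sketch below only demonstrates that language behaviour; SuppressedSketch and FailingResource are made-up names and this is not the IOUtils.closeSafely() implementation.

```java
import java.io.Closeable;
import java.io.IOException;

/** Illustration of Java 7 suppressed exceptions; not Lucene's IOUtils code. */
final class SuppressedSketch {

  /** Resource whose close() always fails, to produce a second exception. */
  static final class FailingResource implements Closeable {
    @Override
    public void close() throws IOException {
      throw new IOException("close failed");
    }
  }

  /** Returns the primary exception; the close() failure rides along as suppressed. */
  static IOException primaryWithSuppressed() {
    IOException caught = null;
    try (FailingResource r = new FailingResource()) {
      throw new IOException("write failed");
    } catch (IOException e) {
      caught = e;
    }
    return caught;
  }

  public static void main(String[] args) {
    IOException e = primaryWithSuppressed();
    System.out.println("primary: " + e.getMessage());
    for (Throwable s : e.getSuppressed()) {
      System.out.println("  suppressed: " + s.getMessage());
    }
  }
}
```

Printing the stack trace of the primary exception also lists the suppressed ones, which is the effect the CHANGES entry describes.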

Mike: Thanks for the great new release and sorry for the respin.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Friday, September 09, 2011 6:07 PM
 To: dev@lucene.apache.org Dev
 Subject: [VOTE] Release Lucene/Solr 3.4.0, RC1
 
 Please vote to release the RC1 artifacts at:
 

 https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
 
 as Lucene 3.4.0 and Solr 3.4.0.
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 



Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Erik Hatcher
+1 

Used this build in my classes today at NFJS Boston (sorry Mike - no time to say 
hi).  Solr worked just fine. 

   Erik

On Sep 9, 2011, at 12:06, Michael McCandless luc...@mikemccandless.com wrote:

 Please vote to release the RC1 artifacts at:
 
  
 https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
 
 as Lucene 3.4.0 and Solr 3.4.0.
 
 Mike McCandless
 
 http://blog.mikemccandless.com
 



Re: svn commit: r1169564 - in /lucene/dev/branches/branch_3x: build.xml solr/common-build.xml

2011-09-11 Thread Michael McCandless
Thanks Steve!

Mike McCandless

http://blog.mikemccandless.com

On Sun, Sep 11, 2011 at 6:47 PM,  sar...@apache.org wrote:
 Author: sarowe
 Date: Sun Sep 11 22:47:33 2011
 New Revision: 1169564

 URL: http://svn.apache.org/viewvc?rev=1169564&view=rev
 Log:
 3.4 -> 3.5

 Modified:
    lucene/dev/branches/branch_3x/build.xml
    lucene/dev/branches/branch_3x/solr/common-build.xml

 Modified: lucene/dev/branches/branch_3x/build.xml
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/build.xml?rev=1169564&r1=1169563&r2=1169564&view=diff
 ==
 --- lucene/dev/branches/branch_3x/build.xml (original)
 +++ lucene/dev/branches/branch_3x/build.xml Sun Sep 11 22:47:33 2011
 @@ -45,7 +45,7 @@
      </sequential>
    </target>

 -  <property name="version" value="3.4-SNAPSHOT"/>
 +  <property name="version" value="3.5-SNAPSHOT"/>
    <target name="get-maven-poms"
            description="Copy Maven POMs from dev-tools/maven/ to their target locations">
      <copy todir="." overwrite="true">

 Modified: lucene/dev/branches/branch_3x/solr/common-build.xml
 URL: 
 http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/common-build.xml?rev=1169564&r1=1169563&r2=1169564&view=diff
 ==
 --- lucene/dev/branches/branch_3x/solr/common-build.xml (original)
 +++ lucene/dev/branches/branch_3x/solr/common-build.xml Sun Sep 11 22:47:33 2011
 @@ -72,7 +72,7 @@
         By default, this should be set to X.Y.M.${dateversion}
         where X.Y.M is the last version released (on this branch).
      -->
 -  <property name="solr.spec.version" value="3.4.0.${dateversion}" />
 +  <property name="solr.spec.version" value="3.5.0.${dateversion}" />

    <path id="solr.base.classpath">
         <pathelement path="${analyzers-common.jar}"/>







Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Michael McCandless
On Sun, Sep 11, 2011 at 5:46 PM, Uwe Schindler u...@thetaphi.de wrote:

 Changes look fine, one small thing: We have Java 7 try-with-resources
 support now (our first Java 7 feature!!!), but the note is in the wrong position
 (under BUG FIXES):
 LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log
 suppressed exceptions in the original exception, so stack trace will contain
 them. (Uwe Schindler)
 [should be NEW FEATURES] - But that's minor, just if we respin again, but I
 don't expect this.

Woops, OK, if we respin (looks unlikely so far).  Can you fix on 3.x for 3.5?

 Mike: Thanks for the great new release and sorry for the respin.

No problem, it's really easy now: I have it down to a single Python
script!  I'll commit it to dev-tools...

Mike




Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Michael McCandless
+1 to release.

I ran the release smoke tester, it was happy!

Mike McCandless

http://blog.mikemccandless.com

On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 Please vote to release the RC1 artifacts at:

  https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

 as Lucene 3.4.0 and Solr 3.4.0.

 Mike McCandless

 http://blog.mikemccandless.com





[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102370#comment-13102370
 ] 

Michael McCandless commented on LUCENE-3429:


We could run a standalone tool that does a kill -QUIT if any java process is 
taking > X minutes?

 improve build system when tests hang
 

 Key: LUCENE-3429
 URL: https://issues.apache.org/jira/browse/LUCENE-3429
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3429.patch


 Currently, if tests hang in hudson the build can stay hung for days until we manually 
 kill it.
 The problem is that when a hang happens it's probably serious, so what we want to 
 do (I think) is:
 # time out the build.
 # ensure we have enough debugging information to hopefully fix any hang.
 So I think the ideal solution would be:
 # add a sysprop -D that LuceneTestCase respects, it could default to no 
 timeout at all (some value like zero).
 # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
 the test class? method?
 # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
 information, random seed information to hopefully reproduce the hang, and 
 fails the test.
 # nightly builds would pass some reasonable -D for each test.
 separately, I think we should have an ant-level timeout for the whole 
 build, in case it goes completely crazy (e.g. jvm completely hangs or 
 something else), just as an additional safety.




[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene

2011-09-11 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102371#comment-13102371
 ] 

Michael McCandless commented on LUCENE-2959:


Thanks David and Robert!

What an incredible step forward: now you can easily try out all sorts of 
pre-existing scoring models, or make your own.  Yay :)

 [GSoC] Implementing State of the Art Ranking for Lucene
 ---

 Key: LUCENE-2959
 URL: https://issues.apache.org/jira/browse/LUCENE-2959
 Project: Lucene - Java
  Issue Type: New Feature
  Components: core/query/scoring, general/javadocs, modules/examples
Reporter: David Mark Nemeskey
Assignee: Robert Muir
  Labels: gsoc2011, lucene-gsoc-11, mentor
 Fix For: flexscoring branch, 4.0

 Attachments: LUCENE-2959.patch, LUCENE-2959.patch, 
 LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, 
 implementation_plan.pdf, proposal.pdf


 Lucene employs the Vector Space Model (VSM) to rank documents, which compares
 unfavorably to state of the art algorithms, such as BM25. Moreover, the
 architecture is tailored specifically to VSM, which makes the addition of new
 ranking functions a non-trivial task.
 This project aims to bring state of the art ranking methods to Lucene and to
 implement a query architecture with pluggable ranking functions.
 The wiki page for the project can be found at
 http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.




Re: Regarding Transaction logging

2011-09-11 Thread Michael McCandless
I agree: we should figure out just how an app would effectively make
use of this seq ID, in order to understand if this really is gonna
work end to end.  Else we shouldn't change Lucene's core APIs.

EG: could ES remove its lock array if Lucene returned a seq ID?  How
bad is it that ES/Solr/this-new-module would have to order their
transaction log according to Lucene's seq ID?  Or maybe it would not
re-order, but rather write the seqID+document in each entry; then on
playback (but also on RT get) it'd have to re-order?
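
To make the second option concrete (write seqID+document in whatever order threads return, re-order on playback), here is a rough sketch. Everything in it is hypothetical: SeqIdReplaySketch and Entry are made-up names, the seq ID is just a long, and nothing here is an existing Lucene, Solr or ES API.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical transaction-log replay; not an existing Lucene/Solr/ES API. */
final class SeqIdReplaySketch {

  /** One log record: the sequence ID the index returned, plus the document payload. */
  static final class Entry {
    final long seqId;
    final String doc;

    Entry(long seqId, String doc) {
      this.seqId = seqId;
      this.doc = doc;
    }
  }

  /** Returns a copy of the log sorted by sequence ID; the on-disk write order is left alone. */
  static List<Entry> replayOrder(List<Entry> logAsWritten) {
    List<Entry> sorted = new ArrayList<>(logAsWritten);
    sorted.sort(Comparator.comparingLong((Entry e) -> e.seqId));
    return sorted;
  }
}
```

The append path stays cheap because nothing waits for earlier seq IDs, but the sort cost moves to playback, and an RT get would additionally need the highest seq ID seen per document, which this sketch does not cover.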

Mike McCandless

http://blog.mikemccandless.com

On Sat, Sep 10, 2011 at 1:45 PM, Simon Willnauer
simon.willna...@googlemail.com wrote:
 On Thu, Sep 8, 2011 at 5:35 PM, Yonik Seeley yo...@lucidimagination.com 
 wrote:
 On Thu, Sep 8, 2011 at 11:26 AM, Michael McCandless
 luc...@mikemccandless.com wrote:
 Returning a long seqID seems the least invasive change to make this
 total ordering possible?  Especially since the DWDQ already computes
 this order...

 +1
 This seems like the most powerful option.

 I still wonder how we make efficient use of this. If you are ordering
 the logs based on the returned sequence Ids you have to effectively
 delay writing to the log since documents ie. their threads come back
 async and out of order. Even worse if some thread picks up a flush it
 might block for a reasonable amount of time. I am not saying it's
 impossible but before we jump on it and get into the DWPT hassle we
 should at least sketch out how to make use of this feature (lemme tell
 you this is not trivial to implement and requires a fair bit of
 refactoring). If somebody has thought about this I'd be happy if you
 could share your ideas here!

 simon

 -Yonik
 http://www.lucene-eurocon.com - The Lucene/Solr User Conference




Re: 3.4.0 draft release notes

2011-09-11 Thread Michael McCandless
On Sat, Sep 10, 2011 at 10:21 AM, Bernd Fehling
bernd.fehl...@uni-bielefeld.de wrote:

 Will the fix/patch for issue SOLR-2726 be included in SOLR 3.4.0?

Sorry, no.

This isn't a release blocker issue.

But, separately, I think we should fix it, but on quick glance it
doesn't look like there's consensus on how to fix it?

Mike McCandless

http://blog.mikemccandless.com




[jira] [Created] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims

2011-09-11 Thread Robert Muir (JIRA)
TestParser.testSpanTermXML fails with some sims
---

 Key: LUCENE-3430
 URL: https://issues.apache.org/jira/browse/LUCENE-3430
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0


here is why this test sometimes fails (my explanation in the test i wrote):

{noformat}
  /** make sure all sims work with spanOR(termX, termY) where termY does not 
exist */
  public void testCrazySpans() throws Exception {
// The problem: normal lucene queries create scorers, returning null if 
terms dont exist
// This means they never score a term that does not exist.
// however with spans, there is only one scorer for the whole hierarchy:
// inner queries are not real queries, their boosts are ignored, etc.
{noformat}

Basically, SpanQueries aren't really queries: you just get one scorer. It calls 
extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
the whole bag of terms, even if they don't exist.

This is fine: we already have tests that sims won't bug out in computeStats() 
here. However, they don't expect to actually score documents based on
these terms that don't exist, and that is exactly what happens in Spans 
because it doesn't use sub-scorers.

Lucene's sim avoids this with the (docFreq + 1)
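
A small numeric sketch of why the (docFreq + 1) matters. The smoothed form below, 1 + ln(numDocs / (docFreq + 1)), is my understanding of the classic TF-IDF shape and should be read as an assumption, not a quote of the similarity's source; IdfSketch and its methods are made-up names.

```java
/** Illustration only; the formulas are assumed shapes, not copied from Lucene. */
final class IdfSketch {

  /** Naive IDF: divides by docFreq, so a term with docFreq == 0 yields infinity. */
  static double naiveIdf(long docFreq, long numDocs) {
    return Math.log((double) numDocs / (double) docFreq);
  }

  /** Smoothed IDF: 1 + ln(numDocs / (docFreq + 1)), finite even when docFreq == 0. */
  static double smoothedIdf(long docFreq, long numDocs) {
    return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
  }

  public static void main(String[] args) {
    System.out.println("naive:    " + naiveIdf(0, 100));
    System.out.println("smoothed: " + smoothedIdf(0, 100));
  }
}
```

A sim that uses the naive form never sees docFreq == 0 with ordinary queries, because no scorer is created for a missing term, but with spans the infinite weight ends up in the score.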





[jira] [Updated] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims

2011-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-3430:


Attachment: LUCENE-3430.patch

patch, my modifications to the others take the same approach as lucene's sim

I did the relevance testing (across all 129 possibilities) with short queries, 
no problems, still waiting on my computer for long queries... if that comes 
back ok I'd like to commit.


 TestParser.testSpanTermXML fails with some sims
 ---

 Key: LUCENE-3430
 URL: https://issues.apache.org/jira/browse/LUCENE-3430
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3430.patch


 here is why this test sometimes fails (my explanation in the test i wrote):
 {noformat}
   /** make sure all sims work with spanOR(termX, termY) where termY does not 
 exist */
   public void testCrazySpans() throws Exception {
 // The problem: normal lucene queries create scorers, returning null if 
 terms dont exist
 // This means they never score a term that does not exist.
 // however with spans, there is only one scorer for the whole hierarchy:
 // inner queries are not real queries, their boosts are ignored, etc.
 {noformat}
 Basically, SpanQueries aren't really queries, you just get one scorer. it 
 calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
 the whole bag of terms, even if they don't exist.
 This is fine, we already have tests that sim's won't bug-out in 
 computeStats() here: however they don't expect to actually score documents 
 based on
 these terms that don't exist... however this is exactly what happens in Spans 
 because it doesn't use sub-scorers.
 Lucene's sim avoids this with the (docFreq + 1)
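
 A minimal sketch of why the (docFreq + 1) term matters, assuming the classic Lucene idf formula 1 + ln(numDocs / (docFreq + 1)). The class and method names are made up for illustration, and the "naive" variant is a hypothetical similarity, not one of the sims touched by the patch:

```java
final class IdfSketch {
    /** Classic Lucene-style idf: the +1 in the denominator keeps docFreq == 0 finite. */
    static double smoothedIdf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    /** Hypothetical unsmoothed idf: divides by docFreq directly, so docFreq == 0 yields Infinity. */
    static double naiveIdf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) docFreq);
    }

    public static void main(String[] args) {
        long numDocs = 1000;
        // A span query can ask for the weight of a term that is not in the index.
        System.out.println("smoothed: " + smoothedIdf(0, numDocs));
        System.out.println("naive:    " + naiveIdf(0, numDocs));
    }
}
```

 With sub-scorers the non-existent term would never be scored at all; with spans its weight takes part in scoring, so a sim without this kind of smoothing can produce non-finite scores.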




Re: 3.4.0 draft release notes

2011-09-11 Thread Robert Muir
On Sun, Sep 11, 2011 at 7:04 PM, Michael McCandless
luc...@mikemccandless.com wrote:
 On Sat, Sep 10, 2011 at 10:21 AM, Bernd Fehling
 bernd.fehl...@uni-bielefeld.de wrote:

 Will the fix/patch for issue SOLR-2726 be included in SOLR 3.4.0?

 Sorry, no.

 This isn't a release blocker issue.

 But, separately, I think we should fix it, but on quick glance it
 doesn't look like there's consensus on how to fix it?


I had this same bug when implementing a spellchecker too.
It's something the spellcheck framework expects, but doesn't provide.

I think it's broken that SolrSpellChecker has both field name and analyzer,
but only sets up field name in its init()... if SolrSpellChecker is
going to own the 'analyzer' variable then
I think its init() should take care of the logic; currently it's either
duplicated across spellchecker implementations,
or it's missing entirely, causing bugs like SOLR-2726.


-- 
lucidimagination.com




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10504 - Failure

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10504/

1 tests failed.
REGRESSION:  org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML

Error Message:
null

Stack Trace:
junit.framework.AssertionFailedError
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
    at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:50)
    at org.apache.lucene.search.Scorer.score(Scorer.java:60)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:419)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:376)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:296)
    at org.apache.lucene.queryparser.xml.TestParser.dumpResults(TestParser.java:216)
    at org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML(TestParser.java:157)




Build Log (for compile errors):
[...truncated 5267 lines...]






Full search posibility in Solr

2011-09-11 Thread Eugeny Balakhonov
Hello,

My task is very simple:

I have a big database with a lot of tables and fields. This database has a
dynamic structure and can be extended or changed at any time.

I need a tool that provides full-text search across all fields in all tables of
my database. The input of this tool is some text to search for. The output is
a unique key and the name of the field which contains this text.

Solr is a very good choice, but I have a serious problem with it: all Solr
query parsers (standard, dismax, edismax) require explicit declaration of
the fields to search. But the list of these fields in my case is very big,
and at search time I don't know all the field names in the database.

I think that my task is not unique. According to Google, a lot of people are
trying to solve the same problem with Solr.

Maybe it would be a good idea to add more flexible possibilities for searching
in all indexed fields?

I see the following variants:

1. Add wildcards in the qf parameter for the dismax/edismax query parsers.

2. Add the possibility to store the source field name in the copyField
directive in schema.xml. In this case the user can do the following:

a) create a field for default search:

<field name="TEXT" type="text_ALL" indexed="true" stored="true" multiValued="true"/>
...
<defaultSearchField>TEXT</defaultSearchField>

b) copy all fields to the default search field:

<copyField source="*" dest="TEXT" storeSource="true"/>

c) in the query response the user can receive the needed source field name:

<lst name="highlighting">
  <lst name="..">
    <arr name="TEXT">
      <str source="SOURCE_FIELD_NAME">foo foo foo <em>test</em> foo foo</str>
    </arr>
  </lst>
</lst>

I'm sorry if I have distracted you from your work.

Eugeny



[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor

2011-09-11 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102374#comment-13102374
 ] 

Jan Høydahl commented on SOLR-1979:
---

An updated documentation of the Processor is now at 
http://wiki.apache.org/solr/LanguageDetection

@Lance: What params were on your mind as candidates for keyword instead of 
true/false, and for what potential future reasons?

 Create LanguageIdentifierUpdateProcessor
 

 Key: SOLR-1979
 URL: https://issues.apache.org/jira/browse/SOLR-1979
 Project: Solr
  Issue Type: New Feature
  Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
  Labels: UpdateProcessor
 Fix For: 3.5

 Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, 
 SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch


 Language identification from document fields, and mapping of field names to 
 language-specific fields based on detected language.
 Wrap the Tika LanguageIdentifier in an UpdateProcessor.




[jira] [Commented] (SOLR-2726) NullPointerException when using spellcheck.q

2011-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102375#comment-13102375
 ] 

Robert Muir commented on SOLR-2726:
---

In my opinion, since the base class SolrSpellChecker has this 'analyzer' field 
(that it wants to be non-null),
it should at least take care of it in its init() method, and we should make 
sure subclasses call super.init(args) in their init() methods.

When I had this bug in directspellchecker I copy-pasted the code below from 
AbstractLuceneSpellChecker to fix it, but I think it's dumb 
to put this in every spellchecker subclass, and it's trappy for someone trying 
to implement their own spellchecker:
{noformat}
if (field != null && core.getSchema().getFieldTypeNoEx(field) != null)  {
  analyzer = core.getSchema().getFieldType(field).getQueryAnalyzer();
}
fieldTypeName = (String) config.get(FIELD_TYPE);
if (core.getSchema().getFieldTypes().containsKey(fieldTypeName))  {
  FieldType fieldType = core.getSchema().getFieldTypes().get(fieldTypeName);
  analyzer = fieldType.getQueryAnalyzer();
}
if (analyzer == null)   {
  LOG.info("Using WhitespaceAnalyzer for dictionary: " + name);
  analyzer = new WhitespaceAnalyzer(core.getSolrConfig().luceneMatchVersion);
}
{noformat}
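
The suggestion above (resolve the analyzer once in the base class and have subclasses call super.init()) could look roughly like the sketch below. It is only an illustration: BaseSpellChecker and MySpellChecker are hypothetical stand-ins rather than the real Solr classes, a plain String stands in for a Lucene Analyzer, and the map arguments are assumptions replacing the real config and schema objects.

```java
import java.util.Map;

/** Hypothetical stand-in for SolrSpellChecker; not the real Solr class. */
abstract class BaseSpellChecker {
    protected String field;
    protected String analyzer; // a String stands in for a Lucene Analyzer

    /** Resolves the analyzer once, so subclasses do not have to copy the logic. */
    public void init(Map<String, String> config, Map<String, String> analyzersByField) {
        field = config.get("field");
        if (field != null && analyzersByField.containsKey(field)) {
            analyzer = analyzersByField.get(field);
        }
        if (analyzer == null) {
            // Fallback, mirroring the WhitespaceAnalyzer default in the snippet above.
            analyzer = "whitespace";
        }
    }
}

/** Hypothetical subclass: it only has to remember to call super.init(). */
final class MySpellChecker extends BaseSpellChecker {
    @Override
    public void init(Map<String, String> config, Map<String, String> analyzersByField) {
        super.init(config, analyzersByField); // analyzer is non-null after this call
        // subclass-specific setup would go here
    }

    String analyzerName() {
        return analyzer;
    }
}
```

With this shape a new spellchecker cannot end up with a null analyzer unless it skips super.init(), which is the trap described above.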

 NullPointerException when using spellcheck.q
 

 Key: SOLR-2726
 URL: https://issues.apache.org/jira/browse/SOLR-2726
 Project: Solr
  Issue Type: Bug
  Components: spellchecker
Affects Versions: 3.3, 4.0
 Environment: ubuntu
Reporter: valentin
  Labels: nullpointerexception, spellcheck
 Attachments: SOLR-2726.patch


 When I use spellcheck.q in my query to define what will be spellchecked, I 
 always have this error, for every configuration I try :
 java.lang.NullPointerException
     at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476)
     at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131)
     at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202)
     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
     at org.mortbay.jetty.Server.handle(Server.java:326)
     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
     at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 All my other functions works great, this is the only thing which doesn't work 
 at all, just when i add spellcheck.q=my%20sentence in the query...
 Example of a query : 
 http://localhost:8983/solr/db/suggest_full?q=american%20israel&spellcheck.q=american%20israel
 In solrconfig.xml :
 <searchComponent name="suggest_full" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">suggestTextFull</str>
   <lst name="spellchecker">
     <str name="name">suggest_full</str>
     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
     <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
     <str name="field">text_suggest_full</str>
     <str name="fieldType">suggestTextFull</str>
   </lst>
 </searchComponent>
 <requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
   <lst name="defaults">
     <str name="spellcheck">true</str>
     <str name="spellcheck.dictionary">suggest_full</str>
     <str name="spellcheck.count">10</str>
str 

[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102376#comment-13102376
 ] 

Robert Muir commented on LUCENE-3429:
-

Mike, right but even that solution wouldn't be that great: it wouldn't give us 
random seed :)

Dawid pointed me to some code of his, I think he is working on a prototype for 
us to try to integrate:

https://github.com/dweiss/timeoutrule/tree/master/src/test/java/com/carrotsearch


 improve build system when tests hang
 

 Key: LUCENE-3429
 URL: https://issues.apache.org/jira/browse/LUCENE-3429
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3429.patch


 Currently, if tests hang in hudson the build can stay hung for days until we 
 manually kill it.
 The problem is that when a hang happens it's probably serious. What we want to 
 do (I think) is:
 # time out the build.
 # ensure we have enough debugging information to hopefully fix any hang.
 So I think the ideal solution would be:
 # add a sysprop -D that LuceneTestCase respects, it could default to no 
 timeout at all (some value like zero).
 # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
 the test class? method?
 # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
 information, random seed information to hopefully reproduce the hang, and 
 fails the test.
 # nightly builds would pass some reasonable -D for each test.
 separately, I think we should have an ant-level timeout for the whole 
 build, in case it goes completely crazy (e.g. jvm completely hangs or 
 something else), just as an additional safety.
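
 The steps listed above can be sketched in plain Java as follows. This is an assumption-laden illustration, not the LuceneTestCase change itself: the property name tests.timeout.millis, the class name, and the use of an executor instead of a dedicated timer thread are all hypothetical choices.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TestTimeoutSketch {
    /** Hypothetical property name; 0 (the default) means no timeout at all. */
    static long timeoutMillis() {
        return Long.getLong("tests.timeout.millis", 0L);
    }

    /** Formats the stack of every live thread, for diagnosing a hang. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append(e.getKey().getName()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /** Runs the test body; on timeout, fails with the seed and a dump of all thread stacks. */
    static void runWithTimeout(Callable<Void> test, long timeoutMillis, long seed) throws Exception {
        if (timeoutMillis <= 0) {
            test.call(); // no timeout configured: run directly
            return;
        }
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Future<Void> result = executor.submit(test);
            try {
                result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                // Dump before the hung thread is interrupted, so its stack is still visible.
                throw new AssertionError("test timed out, seed=" + seed + "\n" + dumpAllThreads());
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

 A nightly build would then pass something like -Dtests.timeout.millis=... (again, a made-up name) and call runWithTimeout(body, timeoutMillis(), seed) around each test.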




[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3426:
---

Attachment: LUCENE-3426.patch

I like the idea of introducing the newly created class! Here is the new patch.

 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)
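
 The reduction described above can be sketched without any Lucene classes. The sketch assumes consecutive n-gram positions (gram i starts at character i) and keeps every n-th gram plus the last one; the class and method names are made up for illustration and this is not the code in the attached patch.

```java
import java.util.ArrayList;
import java.util.List;

final class NGramPhraseSketch {
    /** Splits text into overlapping n-grams; gram i starts at character i. */
    static List<String> grams(String text, int n) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            out.add(text.substring(i, i + n));
        }
        return out;
    }

    /**
     * Returns the gram positions that must stay in the phrase: every n-th gram,
     * plus the last gram so the tail of the text is still covered.
     */
    static List<Integer> positionsToKeep(int gramCount, int n) {
        List<Integer> keep = new ArrayList<>();
        for (int pos = 0; pos < gramCount; pos++) {
            if (pos % n == 0 || pos == gramCount - 1) {
                keep.add(pos);
            }
        }
        return keep;
    }

    public static void main(String[] args) {
        List<String> g = grams("ABCD", 2);
        // Each kept gram keeps its original position, so the phrase still matches exactly.
        for (int pos : positionsToKeep(g.size(), 2)) {
            System.out.println(g.get(pos) + " @ " + pos);
        }
    }
}
```

 For q=ABCD with 2-grams this keeps AB at position 0 and CD at position 2, which is the PhraseQuery(AB CD) "with appropriate positions" from the description.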




[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102393#comment-13102393
 ] 

Robert Muir commented on LUCENE-3426:
-

I think I like it better too... though I wonder if its possible to keep the 
original NGramPhraseQuery unmodified?
this way its not changed by Query.rewrite(), and if a user reuses the query 
(which we document they can do), they could then call add() again and 
everything works.

Also, somewhat related to the issue might be SOLR-2660. We don't have to commit 
that patch, but we could separate
out the queryparser refactoring to make it easier for such an optimization to 
be automatic in solr, because it allows
SolrQueryParser to delegate creation of Phrase/MultiPhraseQuery to the 
FieldType.



 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)




[jira] [Resolved] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims

2011-09-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-3430.
-

Resolution: Fixed
  Assignee: Robert Muir

 TestParser.testSpanTermXML fails with some sims
 ---

 Key: LUCENE-3430
 URL: https://issues.apache.org/jira/browse/LUCENE-3430
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Fix For: 4.0

 Attachments: LUCENE-3430.patch


 here is why this test sometimes fails (my explanation in the test i wrote):
 {noformat}
   /** make sure all sims work with spanOR(termX, termY) where termY does not 
 exist */
   public void testCrazySpans() throws Exception {
 // The problem: normal lucene queries create scorers, returning null if 
 terms dont exist
 // This means they never score a term that does not exist.
 // however with spans, there is only one scorer for the whole hierarchy:
 // inner queries are not real queries, their boosts are ignored, etc.
 {noformat}
 Basically, SpanQueries aren't really queries, you just get one scorer. it 
 calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on
 the whole bag of terms, even if they don't exist.
 This is fine, we already have tests that sim's won't bug-out in 
 computeStats() here: however they don't expect to actually score documents 
 based on
 these terms that don't exist... however this is exactly what happens in Spans 
 because it doesn't use sub-scorers.
 Lucene's sim avoids this with the (docFreq + 1)




[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10505 - Still Failing

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10505/

1 tests failed.
FAILED:  org.apache.solr.search.TestRealTimeGet.testStressGetRealtime

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions!
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:695)
    at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:89)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
    at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:723)
    at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:667)




Build Log (for compile errors):
[...truncated 8579 lines...]






[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3426:
---

Attachment: LUCENE-3426.patch

{quote}
I think I like it better too... though I wonder if its possible to keep the 
original NGramPhraseQuery unmodified?
this way its not changed by Query.rewrite(), and if a user reuses the query 
(which we document they can do), they could then call add() again and 
everything works.
{quote}

I wondered about that too. Here is the new patch. This time I added 
assertSame()/assertNotSame() checks on the rewritten Query to the test code.

 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 LUCENE-3426.patch, PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)




[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3426:
---

Attachment: PerfTest.java

 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 LUCENE-3426.patch, PerfTest.java, PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)




[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102405#comment-13102405
 ] 

Koji Sekiguchi commented on LUCENE-3426:


To make this automatic in Solr, I wonder if we could move the feature into the 
n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}


 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 LUCENE-3426.patch, PerfTest.java, PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)




[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102406#comment-13102406
 ] 

Robert Muir commented on LUCENE-3426:
-

Well if we apply the refactoring part of SOLR-2660 (we can split out into a 
separate issue), we could add such a thing as an attribute to the fieldType?

I like the way your patch looks now! A couple more questions:
* doesn't the optimization also apply to MultiPhraseQuery? If so, 
NGramPhraseQuery could extend MultiPhraseQuery and just rewrite to the correct 
one (MultiPhrase or Phrase depending upon the situation after optimization)
* what about hashCode/equals? Although the same results will be returned, 
scoring will differ, so maybe NGramPhraseQuery should implement these?
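
On the hashCode/equals question, a minimal sketch of what "implement these" might mean: two queries compare equal only when both the gram size n and the terms match, so an optimized query is never confused with one built for a different n. NGramPhraseKey is a hypothetical stand-in, not the NGramPhraseQuery from the patch, and it leaves out positions, slop and boost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

/** Hypothetical stand-in for NGramPhraseQuery; models only the fields relevant to equality. */
final class NGramPhraseKey {
    private final int n;
    private final List<String> terms;

    NGramPhraseKey(int n, List<String> terms) {
        this.n = n;
        this.terms = new ArrayList<>(terms); // defensive copy
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof NGramPhraseKey)) {
            return false;
        }
        NGramPhraseKey other = (NGramPhraseKey) o;
        // n takes part in equality because it changes how the query is rewritten and scored.
        return n == other.n && terms.equals(other.terms);
    }

    @Override
    public int hashCode() {
        return Objects.hash(n, terms);
    }
}
```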


 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 LUCENE-3426.patch, PerfTest.java, PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)




[jira] [Issue Comment Edited] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102405#comment-13102405
 ] 

Koji Sekiguchi edited comment on LUCENE-3426 at 9/12/11 2:02 AM:
-

To make this automatic in Solr, I wonder if we could move the feature into the 
n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100"
   autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}


  was (Author: koji):
To make this automatic in Solr, I wonder if we could move the feature into the 
n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}

  
 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 LUCENE-3426.patch, PerfTest.java, PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)




[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery

2011-09-11 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102408#comment-13102408
 ] 

Koji Sekiguchi commented on LUCENE-3426:


I'm not sure whether it can apply to MultiPhraseQuery. Let me take more time.

Considering hashCode/equals is a good point. I'll look into it.


 optimizer for n-gram PhraseQuery
 

 Key: LUCENE-3426
 URL: https://issues.apache.org/jira/browse/LUCENE-3426
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
 Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, 
 LUCENE-3426.patch, PerfTest.java, PerfTest.java


 If 2-gram is used and the length of query string is 4, for example q=ABCD, 
 QueryParser generates (when autoGeneratePhraseQueries is true) 
 PhraseQuery(AB BC CD) with slop 0. But it can be optimized PhraseQuery(AB 
 CD) with appropriate positions.
 The idea came from the Japanese paper N.M-gram: Implementation of Inverted 
 Index Using N-gram with Hash Values by Mikio Hirabayashi, et al. (The main 
 theme of the paper is different from the idea that I'm using here, though)




Re: issue SOLR-1565

2011-09-11 Thread William Bell
Code is not supposed to fly around in email. Use JIRA. Just create a
new issue and attach it to the bug using SVN diff.

See http://wiki.apache.org/solr/HowToContribute


On Fri, Sep 9, 2011 at 1:03 PM, Patrick Sauts psa...@viadeoteam.com wrote:
 Hi,



 I've made an alpha version of StreamingUpdateSolrServer dedicated to binary
 updates (javabin). It works fine for me.

 It is not a fix for the issue SOLR-1565; it is a new class.

 But I think it may be useful for fixing the bug.

 If somebody tests it, please send feedback.



 Patrick Sauts.





-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076




[JENKINS] Lucene-trunk - Build # 1674 - Still Failing

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-trunk/1674/

2 tests failed.
REGRESSION:  org.apache.lucene.index.TestTermsEnum.testIntersectRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
    at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
    at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
    at org.apache.lucene.index.TestTermsEnum.testIntersectRandom(TestTermsEnum.java:266)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)


REGRESSION:  org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
    at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
    at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
    at org.apache.lucene.util.automaton.TestCompiledAutomaton.build(TestCompiledAutomaton.java:39)
    at org.apache.lucene.util.automaton.TestCompiledAutomaton.testTerms(TestCompiledAutomaton.java:55)
    at org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom(TestCompiledAutomaton.java:101)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)




Build Log (for compile errors):
[...truncated 12798 lines...]






[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10507 - Failure

2011-09-11 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10507/

1 tests failed.
REGRESSION:  org.apache.lucene.search.TestFilteredQuery.testFilteredQuery

Error Message:
expected:<2.778353214263916> but was:<2.778353452682495>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2.778353214263916> but was:<2.778353452682495>
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
    at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
    at org.apache.lucene.search.TestFilteredQuery.assertScoreEquals(TestFilteredQuery.java:182)
    at org.apache.lucene.search.TestFilteredQuery.testFilteredQuery(TestFilteredQuery.java:154)




Build Log (for compile errors):
[...truncated 1261 lines...]






Re: [VOTE] Release Lucene/Solr 3.4.0, RC1

2011-09-11 Thread Shai Erera
+1 to release. Checked docs and changes and they look ok.

Shai

On Mon, Sep 12, 2011 at 1:57 AM, Michael McCandless 
luc...@mikemccandless.com wrote:

 +1 to release.

 I ran the release smoke tester, it was happy!

 Mike McCandless

 http://blog.mikemccandless.com

 On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless
 luc...@mikemccandless.com wrote:
  Please vote to release the RC1 artifacts at:
 
 
 https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142
 
  as Lucene 3.4.0 and Solr 3.4.0.
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 





How to search on specific file types?

2011-09-11 Thread ahmad ajiloo
Hello
I want to search articles, so I need to find only specific file types such
as doc, docx, and pdf. Can you help me?


How to see links in offline mode?

2011-09-11 Thread ahmad ajiloo
Hello
I'm using Nutch to crawl the web and then send the data to Solr for indexing
and search. When Solr searches the index, the URLs of the hits it finds link
to the original web pages. But I want the result titles to link to my own
site, so I want to use the crawled data in the Nutch database to show any web
pages or files that users find through my search engine. This is an offline
search, and our users wouldn't need to visit other web pages.

Can you help me?


[jira] [Commented] (LUCENE-3429) improve build system when tests hang

2011-09-11 Thread Hoss Man (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102429#comment-13102429 ]

Hoss Man commented on LUCENE-3429:
--

bq. separately, I think we should have an ant-level timeout for the whole 
build, in case it goes completely crazy (e.g. jvm completely hangs or something 
else), just as an additional safety.

Jenkins has a build option to handle this part (no personal experience with
it, though).

bq. Dawid pointed me to some code of his, ...

A per-test annotation definitely seems like the killer solution.

 improve build system when tests hang
 

 Key: LUCENE-3429
 URL: https://issues.apache.org/jira/browse/LUCENE-3429
 Project: Lucene - Java
  Issue Type: Test
Reporter: Robert Muir
 Fix For: 3.5, 4.0

 Attachments: LUCENE-3429.patch


 Currently, if tests hang in hudson, the build can stay hung for days until we
 manually kill it.
 The problem is that when a hang happens it's probably serious. What we want to
 do (I think) is:
 # time out the build.
 # ensure we have enough debugging information to hopefully fix any hang.
 So I think the ideal solution would be:
 # add a sysprop -D that LuceneTestCase respects, it could default to no 
 timeout at all (some value like zero).
 # when a timeout is set, LuceneTestCase spawns an additional timer thread for 
 the test class? method?
 # if the timeout is exceeded, LuceneTestCase dumps all thread/stack 
 information, random seed information to hopefully reproduce the hang, and 
 fails the test.
 # nightly builds would pass some reasonable -D for each test.
 Separately, I think we should have an ant-level timeout for the whole 
 build, in case it goes completely crazy (e.g. jvm completely hangs or 
 something else), just as an additional safety.
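
The per-test part of the proposal above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the attached patch or LuceneTestCase code: the class name TestTimeoutSketch, the property name tests.timeout.ms, and the seed parameter are all hypothetical. Instead of a separate timer thread, the sketch runs the test on a worker thread and lets the calling thread wait with a deadline, which achieves the same effect with less bookkeeping.

```java
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper for illustration; not part of LuceneTestCase. */
final class TestTimeoutSketch {

    /** Reads the timeout from an assumed system property; 0 means "no timeout". */
    static long configuredTimeoutMillis() {
        return Long.getLong("tests.timeout.ms", 0L);
    }

    /** Formats the name, state, and stack of every live thread in the JVM. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append('"').append(t.getName()).append("\" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Runs the test body; if it exceeds the timeout, fails with a thread dump
     * and the seed (assumed to be supplied by the test framework).
     */
    static void runWithTimeout(Runnable test, long timeoutMillis, long seed) {
        if (timeoutMillis <= 0) {
            test.run(); // default: no timeout at all
            return;
        }
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "test-under-timeout");
            t.setDaemon(true); // a hung test must not keep the JVM alive
            return t;
        });
        try {
            Future<?> result = worker.submit(test);
            result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Capture the dump while the hung thread is still alive.
            String dump = dumpAllThreads();
            throw new AssertionError(
                "test exceeded " + timeoutMillis + " ms, seed=" + seed + "\n" + dump);
        } catch (ExecutionException e) {
            // Propagate the test's own failure unchanged where possible.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) throw (RuntimeException) cause;
            if (cause instanceof Error) throw (Error) cause;
            throw new RuntimeException(cause);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException("interrupted while waiting for the test", e);
        } finally {
            worker.shutdownNow(); // interrupts the worker if it is still running
        }
    }
}
```

With something like this in place, a nightly build could pass the (hypothetical) property on the command line, for example -Dtests.timeout.ms=600000, while local runs keep the default of no timeout.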







[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers

2011-09-11 Thread Chris Male (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102443#comment-13102443 ]

Chris Male commented on LUCENE-3396:


Committed revision 1169607.

Now attacking the remaining Analyzers.

 Make TokenStream Reuse Mandatory for Analyzers
 --

 Key: LUCENE-3396
 URL: https://issues.apache.org/jira/browse/LUCENE-3396
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Reporter: Chris Male
 Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
 LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, 
 LUCENE-3396-rab.patch, LUCENE-3396-rab.patch


 In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having 
 to return reusable TokenStreams.  This is a big chunk of work, but it's time 
 to bite the bullet.
 I plan to attack this in the following way:
 - Collapse the logic of ReusableAnalyzerBase into Analyzer
 - Add a ReuseStrategy abstraction to Analyzer which controls whether the 
 TokenStreamComponents are reused globally (as they are today) or per-field.
 - Convert all Analyzers over to using TokenStreamComponents.  I've already 
 seen that some of the TokenStreams created in tests need some work to be 
 reusable (even if they aren't reused).
 - Remove Analyzer.reusableTokenStream and convert everything over to using 
 .tokenStream (which will now be returning reusable TokenStreams).
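
The difference between reusing components globally and per-field, as described in the plan above, can be pictured with a small self-contained sketch. Every name here (Components, ReuseStrategySketch, GlobalReuse, PerFieldReuse) is a hypothetical stand-in rather than the Lucene API or the contents of the attached patches, and keeping the cache per thread is an assumption of the sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the tokenizer/filter chain an analyzer builds. */
final class Components {
    final String builtFor; // field the chain was first created for

    Components(String builtFor) {
        this.builtFor = builtFor;
    }
}

/** Decides which cached Components instance serves a given field. */
interface ReuseStrategySketch {
    Components get(String field, Function<String, Components> factory);
}

/** One cached instance per thread, shared by every field. */
final class GlobalReuse implements ReuseStrategySketch {
    private final ThreadLocal<Components> cached = new ThreadLocal<>();

    @Override
    public Components get(String field, Function<String, Components> factory) {
        Components c = cached.get();
        if (c == null) {
            c = factory.apply(field); // built once, then reused for all fields
            cached.set(c);
        }
        return c;
    }
}

/** One cached instance per thread and per field name. */
final class PerFieldReuse implements ReuseStrategySketch {
    private final ThreadLocal<Map<String, Components>> cached =
        ThreadLocal.withInitial(HashMap::new);

    @Override
    public Components get(String field, Function<String, Components> factory) {
        // Build on first use of this field, reuse on every later call.
        return cached.get().computeIfAbsent(field, factory);
    }
}
```

In this sketch an analyzer would hold one strategy and call get(fieldName, this::createComponents) (a hypothetical factory method) each time a token stream is requested. The global strategy suits analyzers that build the same chain for every field, while the per-field strategy suits analyzers whose chain depends on the field.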







Tika cannot parse all Persian PDF files

2011-09-11 Thread ahmad ajiloo
Hello
I used Tika (through Nutch, of course) to parse some Persian PDF files. Some
of the files were converted cleanly to plain text, but for others the output
was corrupted. I used the ICU4J v4 library, and the text changed to
right-to-left mode, but the problem was not resolved: Tika cannot understand
any character of the input Persian PDF file!

I copied this text via Document Viewer in Linux; it is clearly Persian text:
--
‫هر روز پس از نماز صبح، سوره مباركه الرحمن را تا فباي آلاء ربكما تكذبان
بخواند.‬
‫) اين يعني 21 آيه اول سوره ، كه در قرآن رسم الخط عثمانطه تقريبا يك نصف
صفحه است. (‬
‫همچنين در روايات از حضرت رسول )ص( و ائمه اطهار )ع( آمده كه چند چيز براي قوت
حافظه مفيد است:‬
‫1- مسواك كردن 2- روزه گرفتن 3- قرائت قرآن؛ مخصوصا آيه الكرسي‬
‫4- خوردن عسل‬ ‫5- خوردن عدس 6- خوردن گوشت نزديک گردن
--
Tika returns this output:
--
 92   @A   8 * B
   C9D  !D   ) (?)   =/
   

 () ,8 ;
 8 #

   +  9!:
 L
  #)4   M() * 0
 * -3IA J
  - 2   (+   G
 H  -1
 (+ J 5#+C 0T J (+  O - 6R . (+  O - 5 PH. (+  O -4
--
Can anyone help me?
Thanks a lot.