RE: [Lucene.Net] 2.9.4
Thanks Itamar!

Date: Sat, 10 Sep 2011 20:22:59 +0300
From: ita...@code972.com
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] 2.9.4

We have been running some extensive tests for 30 hours now against the 2.9.4 branch, and did not detect any leaks. We will keep them running for a few more days, if you wish to wait for more conclusive findings.

On Wed, Sep 7, 2011 at 5:07 PM, Prescott Nasser geobmx...@hotmail.com wrote:

2.9.4 would make it in, I assume, because that will be our next official release.

Sent from my Windows Phone

-Original Message-
From: Michael Herndon
Sent: Wednesday, September 07, 2011 5:12 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] 2.9.4

What version is going to make it to nuget? 2.9.4 or 2.9.4g?

ooo totally forgot about nuget. we definitely need to get that setup.

On Wed, Sep 7, 2011 at 6:46 AM, digy digy digyd...@gmail.com wrote:

Since it includes some level of divergence from java I committed it to only 2.9.4g branch.

https://issues.apache.org/jira/browse/LUCENE-1930
https://issues.apache.org/jira/browse/LUCENENET-431

DIGY

On Wed, Sep 7, 2011 at 1:03 PM, Itamar Syn-Hershko ita...@code972.com wrote:

Ok, core compiles, and all tests pass. We are now running long tests to measure memory usage among other things.

There is one show stopper tho. There was a patch sent by Matt Warren for Spatial.Net, that doesn't seem to be in. See http://groups.google.com/group/ravendb/msg/7517f095810c48f3

Any chance you can get it in to 2.9.4?

On Wed, Sep 7, 2011 at 1:01 AM, Itamar Syn-Hershko ita...@code972.com wrote:

Ok, great, we will run RavenDB on top of 2.9.4 in the next few days and will let you know how it went.

On Tue, Sep 6, 2011 at 8:59 PM, Michael Herndon mhern...@wickedsoftware.net wrote:

I can't tell if the apache git mirror is updated via scheduler or from commit hooks, but it generally stays close to being on par with svn. I'll check next time I push something to svn. But both of those items have made it to the mirror.

- michael

On Tue, Sep 6, 2011 at 1:44 PM, Digy digyd...@gmail.com wrote:

I don't know how often github mirror is updated. These are the original locations:

2.9.4 https://svn.apache.org/repos/asf/incubator/lucene.net/trunk/
2.9.4g https://svn.apache.org/repos/asf/incubator/lucene.net/branches/Lucene.Net_2_9_4g/

Both versions include ThreadLocal fix + Signing.

Thanks,
DIGY

-Original Message-
From: itamar.synhers...@gmail.com [mailto: itamar.synhers...@gmail.com ] On Behalf Of Itamar Syn-Hershko
Sent: Tuesday, September 06, 2011 2:34 AM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] 2.9.4

Not a problem, we will test RavenDB on a separate branch, also for potential memory leaks.

Digy, can you make sure the github mirror contains an updated 2.9.4 tag I can pull from, which includes the latest ThreadLocal fix + the strongly signed patch applied to it?

2011/9/6 Digy digyd...@gmail.com

To avoid misunderstanding... Community==all Lucene.Net users

DIGY

-Original Message-
From: Digy [mailto:digyd...@gmail.com]
Sent: Monday, September 05, 2011 11:46 PM
To: 'lucene-net-dev@lucene.apache.org'
Subject: RE: [Lucene.Net] 2.9.4

Not bad idea, but I would prefer community's feedback instead of testing against all projects using Lucene.Net

DIGY

-Original Message-
From: Matt Warren [mailto:mattd...@gmail.com]
Sent: Monday, September 05, 2011 11:09 PM
To: lucene-net-dev@lucene.apache.org
Subject: Re: [Lucene.Net] 2.9.4

If you want to test it against a large project you could take a look at how RavenDB uses it? At the moment it's using 2.9.2 ( https://github.com/ayende/ravendb/tree/master/SharedLibs/Sources/Lucene2.9.2 ) but if you were to recompile it against 2.9.4 and check that all its unit-tests still run that would give you quite a large test case.

On 5 September 2011 19:22, Prescott Nasser geobmx...@hotmail.com wrote:

Hey All,

How do people feel about the 2.9.4 code base? I've been using it for some time, for my use cases it's been excellent.

Do we feel we are ready to package this up and make it an official release? Or do we have some tasks left
[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102261#comment-13102261 ]

Jasper van Veghel commented on SOLR-2066:

You're more than welcome! Having distributed grouping will be a great addition to have. As for the patch, could it be that you've modified a previous version rather than the latest one that includes the highlighting fixes? I'm getting the same highlighting-related stacktrace as before. ;-)

{code}
SEVERE: java.lang.NullPointerException
    at org.apache.solr.handler.component.HighlightComponent.finishStage(HighlightComponent.java:156)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1407)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:298)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:588)
    at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
    at java.lang.Thread.run(Thread.java:680)
{code}

Search Grouping: support distributed search

Key: SOLR-2066
URL: https://issues.apache.org/jira/browse/SOLR-2066
Project: Solr
Issue Type: Sub-task
Reporter: Yonik Seeley
Fix For: 3.5, 4.0
Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch

Support distributed field collapsing / search grouping.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102271#comment-13102271 ]

Robert Muir commented on LUCENE-3426:

Hi Koji, I wonder if instead it would be cleaner as a subclass of PhraseQuery (NGramPhraseQuery or similar), that rewrites to the (possibly optimized) PhraseQuery in rewrite().

For example, it would build an optimized PhraseQuery when slop = 0 and there are enough terms to optimize, otherwise it would build a normal PhraseQuery.

Then the optimization would be easy to apply: the user just uses NGramPhraseQuery instead of PhraseQuery. For example, from QueryParser:

{noformat}
@Override
protected PhraseQuery newPhraseQuery() {
  return new NGramPhraseQuery();
}
{noformat}

optimizer for n-gram PhraseQuery

Key: LUCENE-3426
URL: https://issues.apache.org/jira/browse/LUCENE-3426
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
Attachments: LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java

If 2-gram is used and the length of query string is 4, for example q=ABCD, QueryParser generates (when autoGeneratePhraseQueries is true) PhraseQuery(AB BC CD) with slop 0. But it can be optimized to PhraseQuery(AB CD) with appropriate positions.

The idea came from the Japanese paper "N.M-gram: Implementation of Inverted Index Using N-gram with Hash Values" by Mikio Hirabayashi, et al. (The main theme of the paper is different from the idea that I'm using here, though)
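The term selection behind PhraseQuery(AB BC CD) becoming PhraseQuery(AB CD) can be sketched in plain Java. This is only an illustration under stated assumptions, not the code in the attached patch: `NGramPhraseSketch`, `PositionedTerm`, `allGrams` and `optimize` are hypothetical names, no Lucene classes are used, and the rule "keep every n-th gram plus the last one, at their original positions" is one plausible way to pick terms so that the kept grams still cover every character of the query. It assumes n >= 1 and that the index holds every n-gram at consecutive positions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: plain strings and ints, no Lucene classes. */
final class NGramPhraseSketch {

    /** An n-gram term plus its position within the phrase. */
    static final class PositionedTerm {
        final String term;
        final int position;

        PositionedTerm(String term, int position) {
            this.term = term;
            this.position = position;
        }

        @Override
        public String toString() {
            return term + "@" + position;
        }
    }

    /** Splits the query into overlapping n-grams at consecutive positions (n >= 1 assumed). */
    static List<PositionedTerm> allGrams(String query, int n) {
        List<PositionedTerm> grams = new ArrayList<>();
        for (int i = 0; i + n <= query.length(); i++) {
            grams.add(new PositionedTerm(query.substring(i, i + n), i));
        }
        return grams;
    }

    /**
     * Keeps every n-th gram plus the last one, preserving original positions.
     * With a non-zero slop the grams are returned unchanged.
     */
    static List<PositionedTerm> optimize(List<PositionedTerm> grams, int n, int slop) {
        if (slop != 0) {
            return grams;
        }
        List<PositionedTerm> kept = new ArrayList<>();
        int last = grams.size() - 1;
        for (int i = 0; i <= last; i++) {
            if (i % n == 0 || i == last) {
                kept.add(grams.get(i));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(optimize(allGrams("ABCD", 2), 2, 0));  // prints [AB@0, CD@2]
        System.out.println(optimize(allGrams("ABCDE", 2), 2, 0)); // prints [AB@0, CD@2, DE@3]
    }
}
```

For q=ABCD the middle gram BC is dropped because AB at position p and CD at position p+2 already imply BC at p+1. For an odd-length query such as ABCDE the last gram is kept so the final character is still constrained.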
[jira] [Resolved] (LUCENE-3423) add Terms.docCount
[ https://issues.apache.org/jira/browse/LUCENE-3423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3423.

Resolution: Fixed
Assignee: Robert Muir

add Terms.docCount

Key: LUCENE-3423
URL: https://issues.apache.org/jira/browse/LUCENE-3423
Project: Lucene - Java
Issue Type: New Feature
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 4.0
Attachments: LUCENE-3423.patch

spinoff from LUCENE-3290, where yonik mentioned:

{noformat}
Is there currently a way to get the number of documents that have a value in the field?
Then one could compute the average length of a (sparse) field via sumTotalTermFreq(field)/docsWithField(field)
docsWithField(field) would be useful in other contexts that want to know how sparse a field is (automatically selecting faceting algorithms, etc).
{noformat}

I think this is a useful stat to add, in case you have sparse fields for heuristics or scoring.
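The arithmetic yonik describes is small enough to write out. The sketch below takes the two statistics as plain numbers rather than reading them from an index, so it makes no claim about the exact Lucene API; `FieldStatsSketch`, `averageFieldLength` and `density` are hypothetical names, and the figures in `main` are made up for illustration.

```java
/** Illustrative only: statistics are passed in as numbers, not read from an index. */
final class FieldStatsSketch {

    /**
     * Average number of tokens per document, counting only documents that
     * actually have a value in the field. Returns 0.0 when no document does.
     */
    static double averageFieldLength(long sumTotalTermFreq, int docCount) {
        if (docCount <= 0) {
            return 0.0;
        }
        return (double) sumTotalTermFreq / docCount;
    }

    /** Fraction of all documents that have a value in the field (how sparse it is). */
    static double density(int docCount, int maxDoc) {
        if (maxDoc <= 0) {
            return 0.0;
        }
        return (double) docCount / maxDoc;
    }

    public static void main(String[] args) {
        long sumTotalTermFreq = 1200; // total tokens indexed for the field (made-up figure)
        int docCount = 300;           // documents with a value in the field (made-up figure)
        int maxDoc = 10000;           // all documents in the index (made-up figure)

        System.out.println(averageFieldLength(sumTotalTermFreq, docCount)); // prints 4.0
        System.out.println(density(docCount, maxDoc));
    }
}
```

Dividing by the total document count instead would give 0.12 tokens per document here, which badly understates the length of the field in the documents that have it. That gap is why a per-field document count matters for sparse fields.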
heads up: reindex trunk indexes
I just committed https://issues.apache.org/jira/browse/LUCENE-3423

If you are using trunk, you should reindex.

-- lucidimagination.com
[jira] [Created] (LUCENE-3427) add function queries for all index statistics
add function queries for all index statistics - Key: LUCENE-3427 URL: https://issues.apache.org/jira/browse/LUCENE-3427 Project: Lucene - Java Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir I think we have most of them, but at least the following are missing: * getDocCount (# of documents that contain a value for a field) * sumDocFreq (# of postings for a field) not sure if there are others that don't have function queries. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
+1, thanks for creating this release candidate.

On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless luc...@mikemccandless.com wrote:

Please vote to release the RC1 artifacts at:

https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142

as Lucene 3.4.0 and Solr 3.4.0.

Mike McCandless

http://blog.mikemccandless.com

-- lucidimagination.com
[jira] [Updated] (SOLR-2752) leader-per-shard
[ https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-2752: -- Attachment: SOLR-2752.patch new patch - much stronger test, a couple fixes, refactor most of the leader election code into its own class. leader-per-shard Key: SOLR-2752 URL: https://issues.apache.org/jira/browse/SOLR-2752 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Yonik Seeley Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2752.patch, SOLR-2752.patch We need to add metadata into zookeeper about who is the leader for each shard, and have some kind of leader election. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2752) leader-per-shard
[ https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102283#comment-13102283 ] Mark Miller commented on SOLR-2752: --- Just a quick correction to first comment - cores create an ephemeral|sequential node - not just ephemeral. leader-per-shard Key: SOLR-2752 URL: https://issues.apache.org/jira/browse/SOLR-2752 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Yonik Seeley Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2752.patch, SOLR-2752.patch We need to add metadata into zookeeper about who is the leader for each shard, and have some kind of leader election. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-2754) create Solr similarity factories for new ranking algorithms
create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like IFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-2754: -- Description: To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen was: To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like IFR and IB, so that different basic models/normalizations/lambdas can be chosen create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-2754) create Solr similarity factories for new ranking algorithms
[ https://issues.apache.org/jira/browse/SOLR-2754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir reassigned SOLR-2754: - Assignee: Robert Muir create Solr similarity factories for new ranking algorithms --- Key: SOLR-2754 URL: https://issues.apache.org/jira/browse/SOLR-2754 Project: Solr Issue Type: New Feature Affects Versions: 4.0 Reporter: Robert Muir Assignee: Robert Muir To make it easy to use some of the new ranking algorithms, we should add factories to solr: * for parametric models like LM and BM25 so that parameters can be set from schema.xml * for framework models like DFR and IB, so that different basic models/normalizations/lambdas can be chosen -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10500 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10500/ 1 tests failed. FAILED: TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml.init Error Message: Stack Trace: Test report file /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml was length 0 Build Log (for compile errors): [...truncated 1243 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads
trunk tests hang/deadlock TestIndexWriterWithThreads Key: LUCENE-3428 URL: https://issues.apache.org/jira/browse/LUCENE-3428 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir trunk tests have been hanging often lately in hudson, this time i was careful to kill and get a good stacktrace: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads
[ https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102296#comment-13102296 ]

Robert Muir commented on LUCENE-3428:

https://builds.apache.org/view/G-L/view/Lucene/job/Lucene-Solr-tests-only-trunk/10500

{noformat}
[junit] 2011-09-11 16:32:39
[junit] Full thread dump OpenJDK 64-Bit Server VM (20.0-b11 mixed mode):
[junit]
[junit] Low Memory Detector daemon prio=5 tid=0x000801eee800 nid=0x19642 runnable [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit]
[junit] C2 CompilerThread1 daemon prio=5 tid=0x000801eef000 nid=0x19640 waiting on condition [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit]
[junit] C2 CompilerThread0 daemon prio=5 tid=0x000801ef nid=0x1963d waiting on condition [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit]
[junit] Signal Dispatcher daemon prio=5 tid=0x000801ef0800 nid=0x19630 waiting on condition [0x]
[junit]    java.lang.Thread.State: RUNNABLE
[junit]
[junit] Finalizer daemon prio=5 tid=0x000801ef1800 nid=0x19581 in Object.wait() [0x7ebee000]
[junit]    java.lang.Thread.State: WAITING (on object monitor)
[junit]     at java.lang.Object.wait(Native Method)
[junit]     - waiting on 0x000828cb0370 (a java.lang.ref.ReferenceQueue$Lock)
[junit]     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:133)
[junit]     - locked 0x000828cb0370 (a java.lang.ref.ReferenceQueue$Lock)
[junit]     at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:149)
[junit]     at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:177)
[junit]
[junit] Reference Handler daemon prio=5 tid=0x000801ef3000 nid=0x1957f in Object.wait() [0x7ecef000]
[junit]    java.lang.Thread.State: WAITING (on object monitor)
[junit]     at java.lang.Object.wait(Native Method)
[junit]     - waiting on 0x000828cb0410 (a java.lang.ref.Reference$Lock)
[junit]     at java.lang.Object.wait(Object.java:502)
[junit]     at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:133)
[junit]     - locked 0x000828cb0410 (a java.lang.ref.Reference$Lock)
[junit]
[junit] main prio=5 tid=0x000801ef3800 nid=0x19432 waiting on condition [0x7fbfd000]
[junit]    java.lang.Thread.State: WAITING (parking)
[junit]     at sun.misc.Unsafe.park(Native Method)
[junit]     - parking to wait for 0x000827a440c0 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
[junit]     at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
[junit]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:838)
[junit]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:871)
[junit]     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1201)
[junit]     at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:214)
[junit]     at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:290)
[junit]     at org.apache.lucene.index.DocumentsWriterFlushControl.assertActiveDeleteQueue(DocumentsWriterFlushControl.java:435)
[junit]     at org.apache.lucene.index.DocumentsWriterFlushControl.markForFullFlush(DocumentsWriterFlushControl.java:428)
[junit]     at org.apache.lucene.index.DocumentsWriter.flushAllThreads(DocumentsWriter.java:557)
[junit]     - locked 0x000827a417c0 (a org.apache.lucene.index.DocumentsWriter)
[junit]     at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:2973)
[junit]     - locked 0x000827a3d738 (a java.lang.Object)
[junit]     at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:2950)
[junit]     at org.apache.lucene.index.IndexWriter.closeInternal(IndexWriter.java:1133)
[junit]     at org.apache.lucene.index.IndexWriter.close(IndexWriter.java:1097)
[junit]     at org.apache.lucene.index.TestIndexWriterWithThreads.testCloseWithThreads(TestIndexWriterWithThreads.java:200)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
[junit]     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
[junit]     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
[junit]     at java.lang.reflect.Method.invoke(Method.java:616)
[junit]     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:44)
[junit]     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:15)
[junit]     at
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 10522 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/10522/

No tests ran.

Build Log (for compile errors):
[...truncated 142 lines...]
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 424 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/424/

No tests ran.

Build Log (for compile errors):
[...truncated 100 lines...]
Re: [JENKINS] Lucene-Solr-tests-only-trunk - Build # 10500 - Failure
I killed this due to a hang/deadlock issue: https://issues.apache.org/jira/browse/LUCENE-3428

On Sun, Sep 11, 2011 at 12:35 PM, Apache Jenkins Server jenk...@builds.apache.org wrote:

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10500/

1 tests failed.
FAILED: TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml.init

Error Message:

Stack Trace:
Test report file /home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-trunk/checkout/lucene/build/test/TEST-org.apache.lucene.index.TestIndexWriterWithThreads.xml was length 0

Build Log (for compile errors):
[...truncated 1243 lines...]

-- lucidimagination.com
Re: [JENKINS] Lucene-Solr-tests-only-3.x - Build # 10522 - Failure
Collateral damage from https://issues.apache.org/jira/browse/LUCENE-3428; I was just killing java processes.

On Sun, Sep 11, 2011 at 12:36 PM, Apache Jenkins Server jenk...@builds.apache.org wrote:

Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/10522/

No tests ran.

Build Log (for compile errors):
[...truncated 142 lines...]

-- lucidimagination.com
[jira] [Created] (LUCENE-3429) improve build system when tests hang
improve build system when tests hang

Key: LUCENE-3429
URL: https://issues.apache.org/jira/browse/LUCENE-3429
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir
Fix For: 3.5, 4.0

Currently, if tests hang in hudson, the build can stay hung for days until we manually kill it.

The problem is that when a hang happens it's probably serious. What we want to do (I think) is:
# time out the build.
# ensure we have enough debugging information to hopefully fix any hang.

So I think the ideal solution would be:
# add a sysprop -D that LuceneTestCase respects, it could default to no timeout at all (some value like zero).
# when a timeout is set, LuceneTestCase spawns an additional timer thread for the test class? method?
# if the timeout is exceeded, LuceneTestCase dumps all thread/stack information, random seed information to hopefully reproduce the hang, and fails the test.
# nightly builds would pass some reasonable -D for each test.

Separately, I think we should have an ant-level timeout for the whole build, in case it goes completely crazy (e.g. jvm completely hangs or something else), just as an additional safety.
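The timeout steps listed in the issue could look roughly like the sketch below. It is not LuceneTestCase code: the class name `TestTimeoutSketch` and the property name `tests.timeout.millis` are hypothetical (the issue only says "a sysprop -D"), and instead of a separate timer thread it runs the test body on a worker thread and has the caller wait with a bounded `join`, which is a simpler stand-in for the same idea. On timeout it prints the seed and every thread's stack via `Thread.getAllStackTraces()` and reports failure.

```java
import java.util.Map;

/** Illustrative only: a bounded wait plus a full thread dump on timeout. */
final class TestTimeoutSketch {

    /** Hypothetical property name; zero (the default) means no timeout. */
    static final String TIMEOUT_PROP = "tests.timeout.millis";

    static long configuredTimeoutMillis() {
        return Long.getLong(TIMEOUT_PROP, 0L);
    }

    /** Formats the name, state and stack of every live thread. */
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            sb.append(t.getName()).append(" state=").append(t.getState()).append('\n');
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    /**
     * Runs the test body on a daemon worker thread and waits for it.
     * Returns true if it finished in time; on timeout prints the seed and a
     * thread dump and returns false. A timeout of 0 waits indefinitely.
     */
    static boolean runWithTimeout(Runnable testBody, long timeoutMillis, long seed) {
        Thread worker = new Thread(testBody, "test-worker");
        worker.setDaemon(true); // a hung test must not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis); // join(0) means wait forever
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        if (!worker.isAlive()) {
            return true;
        }
        System.err.println("Test timed out after " + timeoutMillis + " ms, seed=" + seed);
        System.err.println(dumpAllThreads());
        return false;
    }

    public static void main(String[] args) {
        boolean ok = runWithTimeout(() -> { }, configuredTimeoutMillis(), 42L);
        System.out.println(ok ? "finished" : "timed out");
    }
}
```

A nightly build would then pass something like `-Dtests.timeout.millis=3600000`, while a local run with the property unset keeps the current behaviour of no timeout. A real implementation would also need to propagate exceptions thrown by the test body, which this sketch ignores.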
[jira] [Commented] (LUCENE-3429) improve build system when tests hang
[ https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102301#comment-13102301 ] Robert Muir commented on LUCENE-3429: - I'm gonna play with the ant junit task timeout first, just to see if we can do anything with it as a quick hack. I suspect the problem will be that we won't get enough debugging information via this mechanism (random seed, stacktraces). improve build system when tests hang Key: LUCENE-3429 URL: https://issues.apache.org/jira/browse/LUCENE-3429 Project: Lucene - Java Issue Type: Test Reporter: Robert Muir Fix For: 3.5, 4.0 Currently, if tests hang in hudson it can go hung for days until we manually kill it. The problem is that when a hang happens its probably serious, what we want to do (I think), is: # time out the build. # ensure we have enough debugging information to hopefully fix any hang. So I think the ideal solution would be: # add a sysprop -D that LuceneTestCase respects, it could default to no timeout at all (some value like zero). # when a timeout is set, LuceneTestCase spawns an additional timer thread for the test class? method? # if the timeout is exceeded, LuceneTestCase dumps all thread/stack information, random seed information to hopefully reproduce the hang, and fails the test. # nightly builds would pass some reasonable -D for each test. separately, I think we should have an ant-level timeout for the whole build, in case it goes completely crazy (e.g. jvm completely hangs or something else), just as an additional safety. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102307#comment-13102307 ] Martijn van Groningen edited comment on SOLR-2066 at 9/11/11 5:15 PM: -- Jasper, does the exception occur for the same queries? I did add a test for this. Can you run the TestDistributedSearch test? was (Author: martijn.v.groningen): Jasper, does the exception occur occur for the same queries? I did add a test for this. Can you run the TestDistributedSearch test? Search Grouping: support distributed search --- Key: SOLR-2066 URL: https://issues.apache.org/jira/browse/SOLR-2066 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Fix For: 3.5, 4.0 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102307#comment-13102307 ]

Martijn van Groningen commented on SOLR-2066:

Jasper, does the exception occur for the same queries? I did add a test for this. Can you run the TestDistributedSearch test?

Search Grouping: support distributed search

Key: SOLR-2066
URL: https://issues.apache.org/jira/browse/SOLR-2066
Project: Solr
Issue Type: Sub-task
Reporter: Yonik Seeley
Fix For: 3.5, 4.0
Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch

Support distributed field collapsing / search grouping.
[jira] [Updated] (LUCENE-3429) improve build system when tests hang
[ https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-3429: Attachment: LUCENE-3429.patch here is a hack patch that sets a timeout of 1 hour to any test batch (e.g. test-core) by default, unless you are running Test2BTerms (10 hours). i tested this, the issue is you get no debugging information at all... but its at least a small start. improve build system when tests hang Key: LUCENE-3429 URL: https://issues.apache.org/jira/browse/LUCENE-3429 Project: Lucene - Java Issue Type: Test Reporter: Robert Muir Fix For: 3.5, 4.0 Attachments: LUCENE-3429.patch Currently, if tests hang in hudson it can go hung for days until we manually kill it. The problem is that when a hang happens its probably serious, what we want to do (I think), is: # time out the build. # ensure we have enough debugging information to hopefully fix any hang. So I think the ideal solution would be: # add a sysprop -D that LuceneTestCase respects, it could default to no timeout at all (some value like zero). # when a timeout is set, LuceneTestCase spawns an additional timer thread for the test class? method? # if the timeout is exceeded, LuceneTestCase dumps all thread/stack information, random seed information to hopefully reproduce the hang, and fails the test. # nightly builds would pass some reasonable -D for each test. separately, I think we should have an ant-level timeout for the whole build, in case it goes completely crazy (e.g. jvm completely hangs or something else), just as an additional safety. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2752) leader-per-shard
[ https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mark Miller updated SOLR-2752:
    Attachment: SOLR-2752.patch

Another new patch: I moved SolrZooKeeper to the org.apache.zookeeper package so that I could add a simulated timeout method for tests.

I also wrote a new test that starts up a bunch of replicas and then times out the leader. After waiting for the leader to reconnect, all of the other replicas are killed and I check that the first leader is again the leader. I wrote this test because I knew it would fail: on reconnecting, clients don't jump back into the leader election process.

So I also added to the client reconnection impl: on reconnect, all SolrCores are re-registered. This also has the advantage that any SolrCores that were created while the connection was down are put into play. That allows the new test to pass.

leader-per-shard
    Key: SOLR-2752
    URL: https://issues.apache.org/jira/browse/SOLR-2752
    Project: Solr
    Issue Type: Sub-task
    Components: SolrCloud
    Reporter: Yonik Seeley
    Assignee: Mark Miller
    Fix For: 4.0
    Attachments: SOLR-2752.patch, SOLR-2752.patch, SOLR-2752.patch

We need to add metadata into zookeeper about who is the leader for each shard, and have some kind of leader election.
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
+1 all tests on all Lucene-using projects I contribute to pass without any change needed (a sure sign I should add more...). Once more, great work and thank so much to everyone involved. Sanne On 11 September 2011 16:11, Robert Muir rcm...@gmail.com wrote: +1, thanks for creating this release candidate. On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless luc...@mikemccandless.com wrote: Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads
[ https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer reassigned LUCENE-3428: --- Assignee: Simon Willnauer trunk tests hang/deadlock TestIndexWriterWithThreads Key: LUCENE-3428 URL: https://issues.apache.org/jira/browse/LUCENE-3428 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Assignee: Simon Willnauer Attachments: LUCENE-3428.patch trunk tests have been hanging often lately in hudson, this time i was careful to kill and get a good stacktrace: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3428) trunk tests hang/deadlock TestIndexWriterWithThreads
[ https://issues.apache.org/jira/browse/LUCENE-3428?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Simon Willnauer updated LUCENE-3428: Attachment: LUCENE-3428.patch I think I found the reason or one possible reason for this. there is one place where we don't release a DWPT lock in the case of a failure. Here is a patch. trunk tests hang/deadlock TestIndexWriterWithThreads Key: LUCENE-3428 URL: https://issues.apache.org/jira/browse/LUCENE-3428 Project: Lucene - Java Issue Type: Bug Reporter: Robert Muir Assignee: Simon Willnauer Attachments: LUCENE-3428.patch trunk tests have been hanging often lately in hudson, this time i was careful to kill and get a good stacktrace: -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
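The class of bug described in the comment above (a lock that is not released on a failure path, so other threads later hang) is usually avoided with a try/finally around the guarded work. A minimal sketch follows; GuardedWriterState is a hypothetical stand-in and is not the DWPT code or the attached patch.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for per-thread writer state guarded by a lock.
class GuardedWriterState {
  private final ReentrantLock lock = new ReentrantLock();
  private int docsWritten;

  /** Runs the indexing work under the lock and releases the lock on every exit path. */
  void update(Runnable indexingWork) {
    lock.lock();
    try {
      indexingWork.run(); // may throw
      docsWritten++;      // only reached when the work succeeded
    } finally {
      lock.unlock();      // also runs on failure, so no other thread is left waiting forever
    }
  }

  boolean isLocked() {
    return lock.isLocked();
  }

  int docsWritten() {
    return docsWritten;
  }
}
```

Without the finally block, an exception thrown by the work would leave the lock held, and the next thread calling update would block indefinitely, which is how such a bug shows up as a hanging test.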
[JENKINS] Lucene-trunk - Build # 1673 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1673/ 1 tests failed. FAILED: org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML Error Message: null Stack Trace: junit.framework.AssertionFailedError at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:50) at org.apache.lucene.search.Scorer.score(Scorer.java:60) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:419) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:376) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:296) at org.apache.lucene.queryparser.xml.TestParser.dumpResults(TestParser.java:216) at org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML(TestParser.java:157) Build Log (for compile errors): [...truncated 16136 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
I prepared a PyLucene 3.4 release candidate from the Lucene 3.4 branch. All tests pass. +1 to release Lucene Solr 3.4. Andi.. On Sep 9, 2011, at 9:06, Michael McCandless luc...@mikemccandless.com wrote: Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[JENKINS-MAVEN] Lucene-Solr-Maven-3.x #240: POMs out of sync
Build: https://builds.apache.org/job/Lucene-Solr-Maven-3.x/240/ No tests ran. Build Log (for compile errors): [...truncated 13149 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2066: Attachment: LUCENE-3360.patch Updated patch * group.query works in distributed search * group.main works in distributed search * Many refactorings I think the feature needs to be committed. Maybe besides some jdocs the patch is ready. I'll commit this feature in the coming days. In the mean time I will start working on making the patch work for the 3x branch. Search Grouping: support distributed search --- Key: SOLR-2066 URL: https://issues.apache.org/jira/browse/SOLR-2066 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Fix For: 3.5, 4.0 Attachments: LUCENE-3360.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2066: Attachment: (was: LUCENE-3360.patch) Search Grouping: support distributed search --- Key: SOLR-2066 URL: https://issues.apache.org/jira/browse/SOLR-2066 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Fix For: 3.5, 4.0 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102354#comment-13102354 ] Martijn van Groningen edited comment on SOLR-2066 at 9/11/11 9:33 PM: -- Updated patch * group.query works in distributed search * group.main works in distributed search * Many refactorings I think the feature needs to be committed. Maybe besides some jdocs the patch is ready. I'll commit this feature in the coming days. In the mean time I will start working on the patch for the 3x branch. was (Author: martijn.v.groningen): Updated patch * group.query works in distributed search * group.main works in distributed search * Many refactorings I think the feature needs to be committed. Maybe besides some jdocs the patch is ready. I'll commit this feature in the coming days. In the mean time I will start working on making the patch work for the 3x branch. Search Grouping: support distributed search --- Key: SOLR-2066 URL: https://issues.apache.org/jira/browse/SOLR-2066 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Fix For: 3.5, 4.0 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2066) Search Grouping: support distributed search
[ https://issues.apache.org/jira/browse/SOLR-2066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Martijn van Groningen updated SOLR-2066: Attachment: SOLR-2066.patch Search Grouping: support distributed search --- Key: SOLR-2066 URL: https://issues.apache.org/jira/browse/SOLR-2066 Project: Solr Issue Type: Sub-task Reporter: Yonik Seeley Fix For: 3.5, 4.0 Attachments: SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch, SOLR-2066.patch Support distributed field collapsing / search grouping. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-2752) leader-per-shard
[ https://issues.apache.org/jira/browse/SOLR-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-2752: -- Attachment: SOLR-2752.patch feeling motivated I guess - another patch with a bunch of polish leader-per-shard Key: SOLR-2752 URL: https://issues.apache.org/jira/browse/SOLR-2752 Project: Solr Issue Type: Sub-task Components: SolrCloud Reporter: Yonik Seeley Assignee: Mark Miller Fix For: 4.0 Attachments: SOLR-2752.patch, SOLR-2752.patch, SOLR-2752.patch, SOLR-2752.patch We need to add metadata into zookeeper about who is the leader for each shard, and have some kind of leader election. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: [VOTE] Release Lucene/Solr 3.4.0, RC1
Hi,

+1

I checked the Lucene Core JAR file as drop-in replacement for PANGAEA, works without any problem. Did reindex some documents, checkindexed, optimized, checkindexed again. All fine, no 1.6.0_24 crashes all is working as it should. Code compiles fine, too. We are running now on this version with Solaris and MMAP (as usual).

I had no time to verify the package contents and md5/sha1 hashes or try Solr, but I think somebody might already have done this. I can verify that the javadoc links to 0racle work again.

Changes look fine, one small thing: We have Java 7 try-with-resources support now (our first Java 7 feature!!!), but the note is at wrong position (under BUG FIXES):

LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log suppressed exceptions in the original exception, so stack trace will contain them. (Uwe Schindler) [should be NEW FEATURES]

But that's minor, just if we respin again, but I don't expect this.

Mike: Thanks for the great new release and sorry for the respin.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Friday, September 09, 2011 6:07 PM
To: dev@lucene.apache.org Dev
Subject: [VOTE] Release Lucene/Solr 3.4.0, RC1

Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0.

Mike McCandless
http://blog.mikemccandless.com
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
+1 Used this build in my classes today at NFJS Boston (sorry Mike - no time to say hi). Solr worked just fine. Erik On Sep 9, 2011, at 12:06, Michael McCandless luc...@mikemccandless.com wrote: Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: svn commit: r1169564 - in /lucene/dev/branches/branch_3x: build.xml solr/common-build.xml
Thanks Steve!

Mike McCandless
http://blog.mikemccandless.com

On Sun, Sep 11, 2011 at 6:47 PM, sar...@apache.org wrote:

Author: sarowe
Date: Sun Sep 11 22:47:33 2011
New Revision: 1169564

URL: http://svn.apache.org/viewvc?rev=1169564&view=rev
Log: 3.4 -> 3.5

Modified:
    lucene/dev/branches/branch_3x/build.xml
    lucene/dev/branches/branch_3x/solr/common-build.xml

Modified: lucene/dev/branches/branch_3x/build.xml
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/build.xml?rev=1169564&r1=1169563&r2=1169564&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/build.xml (original)
+++ lucene/dev/branches/branch_3x/build.xml Sun Sep 11 22:47:33 2011
@@ -45,7 +45,7 @@
     </sequential>
   </target>
-  <property name="version" value="3.4-SNAPSHOT"/>
+  <property name="version" value="3.5-SNAPSHOT"/>
   <target name="get-maven-poms" description="Copy Maven POMs from dev-tools/maven/ to their target locations">
     <copy todir="." overwrite="true">

Modified: lucene/dev/branches/branch_3x/solr/common-build.xml
URL: http://svn.apache.org/viewvc/lucene/dev/branches/branch_3x/solr/common-build.xml?rev=1169564&r1=1169563&r2=1169564&view=diff
==============================================================================
--- lucene/dev/branches/branch_3x/solr/common-build.xml (original)
+++ lucene/dev/branches/branch_3x/solr/common-build.xml Sun Sep 11 22:47:33 2011
@@ -72,7 +72,7 @@
        By default, this should be set to "X.Y.M.${dateversion}"
        where X.Y.M is the last version released (on this branch).
   -->
-  <property name="solr.spec.version" value="3.4.0.${dateversion}" />
+  <property name="solr.spec.version" value="3.5.0.${dateversion}" />
   <path id="solr.base.classpath">
     <pathelement path="${analyzers-common.jar}"/>
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
On Sun, Sep 11, 2011 at 5:46 PM, Uwe Schindler u...@thetaphi.de wrote: Changes look fine, one small thing: We have Java 7 try-with-resources support now (our first Java 7 feature!!!), but the note is at wrong position (under BUG FIXES): LUCENE-3334: If Java7 is detected, IOUtils.closeSafely() will log suppressed exceptions in the original exception, so stack trace will contain them. (Uwe Schindler) [should be NEW FEATURES] - But that's minor, just if we respin again, but I don't expect this. Woops, OK, if we respin (looks unlikely so far). Can you fix on 3.x for 3.5? Mike: Thanks for the great new release and sorry for the respin. No problem, it's really easy now: I have it down to a single Python script! I'll commit it to dev-tools... Mike - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
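For readers unfamiliar with the Java 7 behaviour the LUCENE-3334 note refers to, here is a small standalone sketch of suppressed exceptions using the language-level try-with-resources statement. It does not show how IOUtils.closeSafely() is implemented; SuppressedDemo and FailingResource are hypothetical names used only for illustration.

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical demo class; it only illustrates the JDK mechanism, not Lucene's IOUtils.
class SuppressedDemo {

  /** A resource whose close() always fails, to force the interesting case. */
  static class FailingResource implements Closeable {
    @Override
    public void close() throws IOException {
      throw new IOException("close failed");
    }
  }

  /**
   * Throws from the body and then again from close(). The body's exception stays
   * the primary one; the close() failure is attached to it as a suppressed exception.
   */
  static IOException run() {
    try (FailingResource resource = new FailingResource()) {
      throw new IOException("write failed");
    } catch (IOException e) {
      return e;
    }
  }

  public static void main(String[] args) {
    IOException primary = run();
    System.out.println("primary: " + primary.getMessage());
    for (Throwable s : primary.getSuppressed()) {
      System.out.println("suppressed: " + s.getMessage());
    }
  }
}
```

Because the close() failure travels with the original exception, a printed stack trace of the primary exception also lists the suppressed one, which matches the "so stack trace will contain them" wording in the changes entry.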
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
+1 to release. I ran the release smoke tester, it was happy! Mike McCandless http://blog.mikemccandless.com On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless luc...@mikemccandless.com wrote: Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0. Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3429) improve build system when tests hang
[ https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13102370#comment-13102370 ] Michael McCandless commented on LUCENE-3429: We could run a standalone tool that does a kill -QUIT if any java process is taking X minutes? improve build system when tests hang Key: LUCENE-3429 URL: https://issues.apache.org/jira/browse/LUCENE-3429 Project: Lucene - Java Issue Type: Test Reporter: Robert Muir Fix For: 3.5, 4.0 Attachments: LUCENE-3429.patch Currently, if tests hang in hudson it can go hung for days until we manually kill it. The problem is that when a hang happens its probably serious, what we want to do (I think), is: # time out the build. # ensure we have enough debugging information to hopefully fix any hang. So I think the ideal solution would be: # add a sysprop -D that LuceneTestCase respects, it could default to no timeout at all (some value like zero). # when a timeout is set, LuceneTestCase spawns an additional timer thread for the test class? method? # if the timeout is exceeded, LuceneTestCase dumps all thread/stack information, random seed information to hopefully reproduce the hang, and fails the test. # nightly builds would pass some reasonable -D for each test. separately, I think we should have an ant-level timeout for the whole build, in case it goes completely crazy (e.g. jvm completely hangs or something else), just as an additional safety. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-2959) [GSoC] Implementing State of the Art Ranking for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102371#comment-13102371 ]

Michael McCandless commented on LUCENE-2959:

Thanks David and Robert! What an incredible step forward: now you can easily try out all sorts of pre-existing scoring models, or make your own. Yay :)

[GSoC] Implementing State of the Art Ranking for Lucene
    Key: LUCENE-2959
    URL: https://issues.apache.org/jira/browse/LUCENE-2959
    Project: Lucene - Java
    Issue Type: New Feature
    Components: core/query/scoring, general/javadocs, modules/examples
    Reporter: David Mark Nemeskey
    Assignee: Robert Muir
    Labels: gsoc2011, lucene-gsoc-11, mentor
    Fix For: flexscoring branch, 4.0
    Attachments: LUCENE-2959.patch, LUCENE-2959.patch, LUCENE-2959_mockdfr.patch, LUCENE-2959_nocommits.patch, implementation_plan.pdf, proposal.pdf

Lucene employs the Vector Space Model (VSM) to rank documents, which compares unfavorably to state of the art algorithms, such as BM25. Moreover, the architecture is tailored specifically to VSM, which makes the addition of new ranking functions a non-trivial task. This project aims to bring state of the art ranking methods to Lucene and to implement a query architecture with pluggable ranking functions. The wiki page for the project can be found at http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRanking.
Re: Regarding Transaction logging
I agree: we should figure out just how an app would effectively make use of this seq ID, in order to understand if this really is gonna work end to end. Else we shouldn't change Lucene's core APIs.

EG: could ES remove its lock array if Lucene returned a seq ID? How bad is it that ES/Solr/this-new-module would have to order their transaction log according to Lucene's seq ID? Or maybe it would not re-order, but rather write the seqID+document in each entry; then on playback (but also on RT get) it'd have to re-order?

Mike McCandless
http://blog.mikemccandless.com

On Sat, Sep 10, 2011 at 1:45 PM, Simon Willnauer simon.willna...@googlemail.com wrote:

On Thu, Sep 8, 2011 at 5:35 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Thu, Sep 8, 2011 at 11:26 AM, Michael McCandless luc...@mikemccandless.com wrote:

Returning a long seqID seems the least invasive change to make this total ordering possible? Especially since the DWDQ already computes this order...

+1 This seems like the most powerful option.

I still wonder how we make efficient use of this. If you are ordering the logs based on the returned sequence IDs you have to effectively delay writing to the log, since documents (i.e. their threads) come back async and out of order. Even worse, if some thread picks up a flush it might block for a reasonable amount of time. I am not saying it's impossible, but before we jump on it and get into the DWPT hassle we should at least sketch out how to make use of this feature (lemme tell you this is not trivial to implement and requires a fair bit of refactoring). If somebody has thought about this I'd be happy if you could share your ideas here!

simon

-Yonik
http://www.lucene-eurocon.com - The Lucene/Solr User Conference
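To make the ordering concern in this thread concrete, here is a toy sketch of a log that only writes entries in sequence-ID order and buffers anything that arrives early. It is purely illustrative: SeqOrderedLog is a hypothetical class, and it assumes sequence IDs are dense (no gaps) and start at a known value, which the thread does not establish.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical toy; not an ES, Solr, or Lucene API.
class SeqOrderedLog {
  private final TreeMap<Long, String> pending = new TreeMap<>(); // early arrivals, keyed by seq ID
  private final List<String> written = new ArrayList<>();        // stands in for the on-disk log
  private long nextSeq;                                          // next seq ID allowed to be written

  SeqOrderedLog(long firstSeq) {
    this.nextSeq = firstSeq;
  }

  /** Accepts an entry in any order; writes out the longest contiguous prefix available. */
  synchronized void add(long seqId, String doc) {
    pending.put(seqId, doc);
    while (!pending.isEmpty() && pending.firstKey() == nextSeq) {
      written.add(pending.pollFirstEntry().getValue());
      nextSeq++;
    }
  }

  /** Snapshot of what has been written so far, in seq-ID order. */
  synchronized List<String> written() {
    return new ArrayList<>(written);
  }

  /** Number of entries still waiting for an earlier seq ID to arrive. */
  synchronized int pendingCount() {
    return pending.size();
  }
}
```

The sketch shows the cost Simon describes: entries with later IDs sit in the buffer until every earlier ID has arrived, so one slow indexing thread delays the log for all the others. The alternative Mike mentions, writing seqID+document immediately and re-ordering on playback, avoids the delay at write time but moves the sorting work to recovery.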
Re: 3.4.0 draft release notes
On Sat, Sep 10, 2011 at 10:21 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: Will the fix/patch for issue SOLR-2726 included in SOLR 3.4.0? Sorry, no. This isn't a release blocker issue. But, separately, I think we should fix it, but on quick glance it doesn't look like there's consensus on how to fix it? Mike McCandless http://blog.mikemccandless.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims
TestParser.testSpanTermXML fails with some sims
    Key: LUCENE-3430
    URL: https://issues.apache.org/jira/browse/LUCENE-3430
    Project: Lucene - Java
    Issue Type: Bug
    Affects Versions: 4.0
    Reporter: Robert Muir
    Fix For: 4.0

here is why this test sometimes fails (my explanation in the test i wrote):

{noformat}
/** make sure all sims work with spanOR(termX, termY) where termY does not exist */
public void testCrazySpans() throws Exception {
  // The problem: normal lucene queries create scorers, returning null if terms dont exist
  // This means they never score a term that does not exist.
  // however with spans, there is only one scorer for the whole hierarchy:
  // inner queries are not real queries, their boosts are ignored, etc.
{noformat}

Basically, SpanQueries aren't really queries, you just get one scorer. it calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on the whole bag of terms, even if they don't exist. This is fine, we already have tests that sim's won't bug-out in computeStats() here: however they don't expect to actually score documents based on these terms that don't exist... however this is exactly what happens in Spans because it doesn't use sub-scorers. Lucene's sim avoids this with the (docFreq + 1)
[jira] [Updated] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims
[ https://issues.apache.org/jira/browse/LUCENE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-3430:
    Attachment: LUCENE-3430.patch

Patch: my modifications to the other sims take the same approach as Lucene's sim. I did the relevance testing (across all 129 possibilities) with short queries, no problems; still waiting on my computer for long queries. If that comes back OK I'd like to commit.

TestParser.testSpanTermXML fails with some sims
    Key: LUCENE-3430
    URL: https://issues.apache.org/jira/browse/LUCENE-3430
    Project: Lucene - Java
    Issue Type: Bug
    Affects Versions: 4.0
    Reporter: Robert Muir
    Fix For: 4.0
    Attachments: LUCENE-3430.patch

here is why this test sometimes fails (my explanation in the test i wrote):

{noformat}
/** make sure all sims work with spanOR(termX, termY) where termY does not exist */
public void testCrazySpans() throws Exception {
  // The problem: normal lucene queries create scorers, returning null if terms dont exist
  // This means they never score a term that does not exist.
  // however with spans, there is only one scorer for the whole hierarchy:
  // inner queries are not real queries, their boosts are ignored, etc.
{noformat}

Basically, SpanQueries aren't really queries, you just get one scorer. it calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on the whole bag of terms, even if they don't exist. This is fine, we already have tests that sim's won't bug-out in computeStats() here: however they don't expect to actually score documents based on these terms that don't exist... however this is exactly what happens in Spans because it doesn't use sub-scorers. Lucene's sim avoids this with the (docFreq + 1)
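The "(docFreq + 1)" remark can be illustrated with a small numeric sketch. The smoothed form below follows the classic idf shape used by Lucene's default similarity, 1 + ln(numDocs / (docFreq + 1)); the unsmoothed variant is only a contrast case and is not taken from any particular similarity in the patch. IdfSketch and its method names are hypothetical.

```java
// Hypothetical illustration; not code from LUCENE-3430.patch.
class IdfSketch {

  /** Smoothed idf: the +1 in the denominator keeps the value finite when docFreq == 0. */
  static double smoothedIdf(long docFreq, long numDocs) {
    return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
  }

  /** Unsmoothed idf: a term that does not exist (docFreq == 0) divides by zero. */
  static double naiveIdf(long docFreq, long numDocs) {
    return Math.log(numDocs / (double) docFreq);
  }

  public static void main(String[] args) {
    System.out.println("smoothed, missing term: " + smoothedIdf(0, 100));
    System.out.println("naive, missing term:    " + naiveIdf(0, 100));
  }
}
```

With docFreq == 0 the unsmoothed version evaluates to positive infinity, and a score built from it is meaningless. That is the situation a span query can create when it gathers statistics for a term that is absent from the index and then scores with them.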
Re: 3.4.0 draft release notes
On Sun, Sep 11, 2011 at 7:04 PM, Michael McCandless luc...@mikemccandless.com wrote:

On Sat, Sep 10, 2011 at 10:21 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:

Will the fix/patch for issue SOLR-2726 included in SOLR 3.4.0?

Sorry, no. This isn't a release blocker issue. But, separately, I think we should fix it, but on quick glance it doesn't look like there's consensus on how to fix it?

I had this same bug when implementing a spellchecker too. It's something the spellcheck framework expects, but doesn't provide.

I think it's broken that SolrSpellChecker has both field name and analyzer, but only sets up field name in its init()... if SolrSpellChecker is going to own the 'analyzer' variable then I think its init() should take care of the logic. Currently it's either duplicated across spellchecker implementations, or it's missing entirely, causing bugs like SOLR-2726.

--
lucidimagination.com
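The design point made above (the base class that owns a field should also initialise it, and subclasses should call super.init) can be sketched in plain Java. Everything here is hypothetical: BaseSpellChecker and CustomSpellChecker only mimic the shape of the discussion, the analyzer is represented by a plain string, and none of this is Solr's actual SolrSpellChecker API.

```java
import java.util.Map;

// Hypothetical types that mirror the discussion; not Solr classes.
abstract class BaseSpellChecker {
  protected String field;
  protected String analyzer; // stands in for an Analyzer instance

  /** The base init resolves everything the base class owns, including a fallback analyzer. */
  void init(Map<String, String> config) {
    field = config.get("field");
    analyzer = config.get("analyzer");
    if (analyzer == null) {
      analyzer = "whitespace"; // fallback, so the field is never left null
    }
  }
}

class CustomSpellChecker extends BaseSpellChecker {
  int maxEdits;

  @Override
  void init(Map<String, String> config) {
    super.init(config); // without this call, analyzer would stay null
    maxEdits = Integer.parseInt(config.getOrDefault("maxEdits", "2"));
  }
}
```

With the analyzer logic in one place, a new implementation cannot forget the fallback, and the copy-and-paste between subclasses that the message complains about is no longer needed.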
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10504 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10504/ 1 tests failed. REGRESSION: org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML Error Message: null Stack Trace: junit.framework.AssertionFailedError at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148) at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50) at org.apache.lucene.search.TopScoreDocCollector$InOrderTopScoreDocCollector.collect(TopScoreDocCollector.java:50) at org.apache.lucene.search.Scorer.score(Scorer.java:60) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:552) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:419) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:376) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:296) at org.apache.lucene.queryparser.xml.TestParser.dumpResults(TestParser.java:216) at org.apache.lucene.queryparser.xml.TestParser.testSpanTermXML(TestParser.java:157) Build Log (for compile errors): [...truncated 5267 lines...] - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Full search possibility in Solr
Hello,

My task is very simple: I have a big database with a lot of tables and fields. This database has a dynamic structure and can be extended or changed at any time. I need a tool that can search across all fields in all tables of my database. The input of this tool is some text to search for. The output is some unique key and the name of the field which contains this text.

Solr is a very good choice, but I have a serious problem with it: all Solr query parsers (standard, dismax, edismax) require explicit declaration of the fields to search. But the list of these fields in my case is very, very big! And at search time I don't know all the field names in the database.

I think that my task is not unique. According to Google, a lot of people are trying to solve the same problem with Solr. Maybe it would be a good idea to add more flexible possibilities for searching in all indexed fields? I see the following variants:

1. Add wildcards in the qf parameter for the dismax/edismax query parsers.

2. Add the possibility to store the source field name in the copyField operator in schema.xml. In this case the user can do the following:

a) create a field for default search:

    <field name="TEXT" type="text_ALL" indexed="true" stored="true" multiValued="true"/>
    ...
    <defaultSearchField>TEXT</defaultSearchField>

b) copy all fields to the default search field:

    <copyField source="*" dest="TEXT" storeSource="true" />

c) in the query response the user can receive the needed source field name:

    <lst name="highlighting">
      <lst name="..">
        <arr name="TEXT">
          <str source="SOURCE_FIELD_NAME">foo foo foo <em>test</em> foo foo</str>
        </arr>
      </lst>
    </lst>

I'm sorry if I have distracted you from your work.

Eugeny
[jira] [Commented] (SOLR-1979) Create LanguageIdentifierUpdateProcessor
[ https://issues.apache.org/jira/browse/SOLR-1979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102374#comment-13102374 ]

Jan Høydahl commented on SOLR-1979:
---

Updated documentation for the processor is now at http://wiki.apache.org/solr/LanguageDetection

@Lance: Which params did you have in mind as candidates for a keyword instead of true/false, and for what potential future reasons?

Create LanguageIdentifierUpdateProcessor
Key: SOLR-1979
URL: https://issues.apache.org/jira/browse/SOLR-1979
Project: Solr
Issue Type: New Feature
Components: update
Reporter: Jan Høydahl
Assignee: Jan Høydahl
Priority: Minor
Labels: UpdateProcessor
Fix For: 3.5
Attachments: SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch, SOLR-1979.patch

Language identification from document fields, and mapping of field names to language-specific fields based on detected language. Wrap the Tika LanguageIdentifier in an UpdateProcessor.

--
This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (SOLR-2726) NullPointerException when using spellcheck.q
[ https://issues.apache.org/jira/browse/SOLR-2726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102375#comment-13102375 ]

Robert Muir commented on SOLR-2726:
---

In my opinion, since the base class SolrSpellChecker has this 'analyzer' field (that it wants to be non-null), it should at least take care of it in its init() method, and we should make sure subclasses call super.init(args) in their init() methods.

When I had this bug in directspellchecker I copy-pasted the code below from AbstractLuceneSpellChecker to fix it, but I think it's dumb to put this in every spellchecker subclass, and it's trappy for someone trying to implement their own spellchecker:

{noformat}
if (field != null && core.getSchema().getFieldTypeNoEx(field) != null) {
  analyzer = core.getSchema().getFieldType(field).getQueryAnalyzer();
}
fieldTypeName = (String) config.get(FIELD_TYPE);
if (core.getSchema().getFieldTypes().containsKey(fieldTypeName)) {
  FieldType fieldType = core.getSchema().getFieldTypes().get(fieldTypeName);
  analyzer = fieldType.getQueryAnalyzer();
}
if (analyzer == null) {
  LOG.info("Using WhitespaceAnalyzer for dictionary: " + name);
  analyzer = new WhitespaceAnalyzer(core.getSolrConfig().luceneMatchVersion);
}
{noformat}

NullPointerException when using spellcheck.q
Key: SOLR-2726
URL: https://issues.apache.org/jira/browse/SOLR-2726
Project: Solr
Issue Type: Bug
Components: spellchecker
Affects Versions: 3.3, 4.0
Environment: ubuntu
Reporter: valentin
Labels: nullpointerexception, spellcheck
Attachments: SOLR-2726.patch

When I use spellcheck.q in my query to define what will be spellchecked, I always get this error, for every configuration I try:

java.lang.NullPointerException at org.apache.solr.handler.component.SpellCheckComponent.getTokens(SpellCheckComponent.java:476) at org.apache.solr.handler.component.SpellCheckComponent.process(SpellCheckComponent.java:131) at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:202) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) All my other functions works great, this is the only thing which doesn't work at all, just when i add spellcheck.q=my%20sentence in the query... 
Example of a query: http://localhost:8983/solr/db/suggest_full?q=american%20israel&spellcheck.q=american%20israel

In solrconfig.xml:

<searchComponent name="suggest_full" class="solr.SpellCheckComponent">
  <str name="queryAnalyzerFieldType">suggestTextFull</str>
  <lst name="spellchecker">
    <str name="name">suggest_full</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">text_suggest_full</str>
    <str name="fieldType">suggestTextFull</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest_full" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest_full</str>
    <str name="spellcheck.count">10</str>
    <str
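The suggestion in the SOLR-2726 comment above (let the base class's init() guarantee a non-null analyzer, and have every subclass call super.init()) could look roughly like the sketch below. All class and method names here are hypothetical stand-ins written for illustration; they are not the real SolrSpellChecker API, and the fallback analyzer is just a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for an analyzer; not the Lucene Analyzer class.
interface AnalyzerSketch {
    String name();
}

// Hypothetical base class: it owns the 'analyzer' field, so it also
// takes responsibility for never leaving it null after init().
abstract class BaseSpellCheckerSketch {
    protected AnalyzerSketch analyzer;

    public void init(Map<String, AnalyzerSketch> fieldAnalyzers, String field) {
        if (field != null && fieldAnalyzers.containsKey(field)) {
            analyzer = fieldAnalyzers.get(field);
        }
        if (analyzer == null) {
            // Fallback, playing the role of the WhitespaceAnalyzer default.
            analyzer = () -> "whitespace";
        }
    }
}

// Hypothetical subclass: it only adds its own setup and delegates the
// analyzer handling to the base class instead of copy-pasting it.
class DirectCheckerSketch extends BaseSpellCheckerSketch {
    boolean initialized;

    @Override
    public void init(Map<String, AnalyzerSketch> fieldAnalyzers, String field) {
        super.init(fieldAnalyzers, field); // skipping this reintroduces the null analyzer
        initialized = true;
    }
}

class SpellCheckerInitDemo {
    // Initializes a checker for the given field and reports which analyzer it ended up with.
    static String analyzerNameFor(String field) {
        Map<String, AnalyzerSketch> analyzers = new HashMap<>();
        analyzers.put("title", () -> "title-query-analyzer");
        DirectCheckerSketch checker = new DirectCheckerSketch();
        checker.init(analyzers, field);
        return checker.analyzer.name();
    }

    public static void main(String[] args) {
        System.out.println(analyzerNameFor("title"));
        System.out.println(analyzerNameFor(null));
    }
}
```

With this shape, a new spellchecker implementation cannot end up with a null analyzer as long as it calls super.init(), which is the trap the comment describes.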
[jira] [Commented] (LUCENE-3429) improve build system when tests hang
[ https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102376#comment-13102376 ]

Robert Muir commented on LUCENE-3429:
-

Mike, right, but even that solution wouldn't be that great: it wouldn't give us the random seed :)

Dawid pointed me to some code of his; I think he is working on a prototype for us to try to integrate: https://github.com/dweiss/timeoutrule/tree/master/src/test/java/com/carrotsearch

improve build system when tests hang
Key: LUCENE-3429
URL: https://issues.apache.org/jira/browse/LUCENE-3429
Project: Lucene - Java
Issue Type: Test
Reporter: Robert Muir
Fix For: 3.5, 4.0
Attachments: LUCENE-3429.patch

Currently, if tests hang in hudson it can stay hung for days until we manually kill it. The problem is that when a hang happens it's probably serious. What we want to do (I think) is:
# time out the build.
# ensure we have enough debugging information to hopefully fix any hang.

So I think the ideal solution would be:
# add a sysprop -D that LuceneTestCase respects; it could default to no timeout at all (some value like zero).
# when a timeout is set, LuceneTestCase spawns an additional timer thread for the test class? method?
# if the timeout is exceeded, LuceneTestCase dumps all thread/stack information and random seed information to hopefully reproduce the hang, and fails the test.
# nightly builds would pass some reasonable -D for each test.

Separately, I think we should have an ant-level timeout for the whole build, in case it goes completely crazy (e.g. the JVM completely hangs or something else), just as an additional safety.
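The steps proposed in the issue description above (a -D system property, a separate thread, a full thread dump plus the seed on timeout) can be sketched with plain JDK calls. This is only an illustration under assumptions: the property name `sketch.test.timeout` and the class `TimeoutSketch` are made up here, and this is neither the attached patch nor Dawid's prototype.

```java
import java.util.Map;

class TimeoutSketch {
    // Hypothetical property name; 0 means "no timeout", as the issue suggests for the default.
    static long configuredTimeoutMillis() {
        return Long.getLong("sketch.test.timeout", 0L);
    }

    // Collects the stack of every live thread, to help diagnose where a test is stuck.
    static String dumpAllThreads() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e : Thread.getAllStackTraces().entrySet()) {
            sb.append("Thread \"").append(e.getKey().getName()).append("\"\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    // Runs the body on a worker thread. Returns true if it finished in time,
    // false if it was still running when the wait ended.
    static boolean runWithTimeout(Runnable body, long timeoutMillis, long seed) {
        Thread worker = new Thread(body, "test-worker");
        worker.setDaemon(true); // a hung test must not keep the JVM alive
        worker.start();
        try {
            worker.join(timeoutMillis); // join(0) waits forever, i.e. no timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treated like a timeout below if the worker is still alive
        }
        if (!worker.isAlive()) {
            return true;
        }
        System.err.println("Test timed out after " + timeoutMillis + " ms, random seed=" + seed);
        System.err.println(dumpAllThreads());
        worker.interrupt(); // best effort; a truly hung thread may ignore this
        return false;
    }
}
```

A test runner would call `runWithTimeout(testBody, TimeoutSketch.configuredTimeoutMillis(), seed)` and fail the test when it returns false.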
[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-3426:
---
Attachment: LUCENE-3426.patch

I like the idea of introducing the newly created class! Here is the new patch.

optimizer for n-gram PhraseQuery
Key: LUCENE-3426
URL: https://issues.apache.org/jira/browse/LUCENE-3426
Project: Lucene - Java
Issue Type: Improvement
Components: core/search
Reporter: Koji Sekiguchi
Priority: Trivial
Attachments: LUCENE-3426.patch, LUCENE-3426.patch, LUCENE-3426.patch, PerfTest.java

If 2-gram is used and the length of the query string is 4, for example q=ABCD, QueryParser generates (when autoGeneratePhraseQueries is true) PhraseQuery("AB BC CD") with slop 0. But it can be optimized to PhraseQuery("AB CD") with appropriate positions. The idea came from the Japanese paper "N.M-gram: Implementation of Inverted Index Using N-gram with Hash Values" by Mikio Hirabayashi, et al. (The main theme of the paper is different from the idea that I'm using here, though.)
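To make the optimization in the description above concrete, here is a small sketch that works on plain strings rather than on Lucene's PhraseQuery. It assumes one plausible selection rule: keep the grams whose position is a multiple of n, plus the last gram so the end of the string stays covered. The class and method names are invented for illustration, and the attached patch may do this differently.

```java
import java.util.ArrayList;
import java.util.List;

class NGramPhraseSketch {
    // Splits text into overlapping n-grams; gram i starts at character position i.
    static List<String> ngrams(String text, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= text.length(); i++) {
            grams.add(text.substring(i, i + n));
        }
        return grams;
    }

    // Returns the grams that still need to be matched, each tagged with its
    // original position ("gram@position"), so the phrase stays exact.
    static List<String> optimize(List<String> grams, int n) {
        List<String> kept = new ArrayList<>();
        int last = grams.size() - 1;
        for (int pos = 0; pos <= last; pos++) {
            // Grams at multiples of n tile the string without overlap;
            // the final gram covers whatever tail is left over.
            if (pos % n == 0 || pos == last) {
                kept.add(grams.get(pos) + "@" + pos);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(optimize(ngrams("ABCD", 2), 2));
        System.out.println(optimize(ngrams("ABCDE", 2), 2));
    }
}
```

For q=ABCD with 2-grams this keeps AB at position 0 and CD at position 2, which matches the PhraseQuery("AB CD") "with appropriate positions" from the description; the skipped BC term is implied by the other two.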
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102393#comment-13102393 ]

Robert Muir commented on LUCENE-3426:
-

I think I like it better too... though I wonder if it's possible to keep the original NGramPhraseQuery unmodified? This way it's not changed by Query.rewrite(), and if a user reuses the query (which we document they can do), they could then call add() again and everything works.

Also, somewhat related to the issue might be SOLR-2660. We don't have to commit that patch, but we could separate out the queryparser refactoring to make it easier for such an optimization to be automatic in Solr, because it allows SolrQueryParser to delegate creation of Phrase/MultiPhraseQuery to the FieldType.
[jira] [Resolved] (LUCENE-3430) TestParser.testSpanTermXML fails with some sims
[ https://issues.apache.org/jira/browse/LUCENE-3430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved LUCENE-3430.
-
Resolution: Fixed
Assignee: Robert Muir

TestParser.testSpanTermXML fails with some sims
---
Key: LUCENE-3430
URL: https://issues.apache.org/jira/browse/LUCENE-3430
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
Fix For: 4.0
Attachments: LUCENE-3430.patch

Here is why this test sometimes fails (my explanation in the test I wrote):

{noformat}
/** make sure all sims work with spanOR(termX, termY) where termY does not exist */
public void testCrazySpans() throws Exception {
  // The problem: normal lucene queries create scorers, returning null if terms dont exist
  // This means they never score a term that does not exist.
  // however with spans, there is only one scorer for the whole hierarchy:
  // inner queries are not real queries, their boosts are ignored, etc.
{noformat}

Basically, SpanQueries aren't really queries; you just get one scorer. It calls extractTerms on the whole hierarchy and computes weights (e.g. IDF) on the whole bag of terms, even if they don't exist. This is fine: we already have tests that sims won't bug out in computeStats() here. However, they don't expect to actually score documents based on these terms that don't exist... and this is exactly what happens in Spans because it doesn't use sub-scorers. Lucene's sim avoids this with the (docFreq + 1)
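The message above is cut off at "(docFreq + 1)", so the following is only an illustration of why that smoothing term matters, not a reconstruction of the missing text. It assumes the classic IDF form log(numDocs / (docFreq + 1)) + 1 and compares it with a naive, unsmoothed variant for a term that appears in no document; the class and method names are invented here.

```java
class IdfSketch {
    // Smoothed IDF: the +1 in the denominator keeps the value finite even when docFreq is 0.
    static double smoothedIdf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) (docFreq + 1)) + 1.0;
    }

    // Naive IDF without smoothing: a term with docFreq 0 divides by zero.
    static double naiveIdf(long docFreq, long numDocs) {
        return Math.log(numDocs / (double) docFreq);
    }

    public static void main(String[] args) {
        System.out.println("smoothed, docFreq=0: " + smoothedIdf(0, 100));
        System.out.println("naive,    docFreq=0: " + naiveIdf(0, 100));
    }
}
```

A similarity that behaves like the naive variant produces an infinite weight for the nonexistent termY, and because the span scorer uses the weights of the whole bag of terms, that value leaks into the scores of real documents.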
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10505 - Still Failing
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10505/

1 tests failed.

FAILED: org.apache.solr.search.TestRealTimeGet.testStressGetRealtime

Error Message:
java.lang.AssertionError: Some threads threw uncaught exceptions!

Stack Trace:
java.lang.RuntimeException: java.lang.AssertionError: Some threads threw uncaught exceptions!
	at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:695)
	at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:89)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
	at org.apache.lucene.util.LuceneTestCase.checkUncaughtExceptionsAfter(LuceneTestCase.java:723)
	at org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:667)

Build Log (for compile errors):
[...truncated 8579 lines...]
[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-3426:
---
Attachment: LUCENE-3426.patch

{quote}
I think I like it better too... though I wonder if its possible to keep the original NGramPhraseQuery unmodified? this way its not changed by Query.rewrite(), and if a user reuses the query (which we document they can do), they could then call add() again and everything works.
{quote}

I wondered about that too. Here is the new patch. This time I added assertSame()/assertNotSame() checks on the rewritten Query to the test code.
[jira] [Updated] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Koji Sekiguchi updated LUCENE-3426:
---
Attachment: PerfTest.java
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102405#comment-13102405 ]

Koji Sekiguchi commented on LUCENE-3426:

To make this automatic in Solr, I wonder if we could move the feature to the n-gram tokenizers, so that we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102406#comment-13102406 ]

Robert Muir commented on LUCENE-3426:
-

Well, if we apply the refactoring part of SOLR-2660 (we can split it out into a separate issue), we could add such a thing as an attribute to the fieldType?

I like the way your patch looks now! A couple more questions:
* Doesn't the optimization also apply to MultiPhraseQuery? If so, NGramPhraseQuery could extend MultiPhraseQuery and just rewrite to the correct one (MultiPhrase or Phrase, depending upon the situation after optimization).
* What about hashCode/equals? Although the same results will be returned, scoring will differ, so maybe NGramPhraseQuery should implement these?
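The hashCode/equals question raised above can be illustrated with a small sketch. The two classes below are hypothetical stand-ins, not Lucene's PhraseQuery or the patch's NGramPhraseQuery; the point is only that a subclass carrying an extra parameter n should fold it into both methods, so that an optimizing query and a plain phrase query over the same terms are never treated as interchangeable (for example as cache keys).

```java
import java.util.Arrays;

// Hypothetical stand-in for a phrase query: equality is defined by its terms.
class PhraseSketch {
    final String[] terms;

    PhraseSketch(String... terms) {
        this.terms = terms.clone();
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Comparing exact classes keeps a subclass from equalling its parent.
        if (o == null || getClass() != o.getClass()) return false;
        return Arrays.equals(terms, ((PhraseSketch) o).terms);
    }

    @Override
    public int hashCode() {
        return Arrays.hashCode(terms);
    }
}

// Hypothetical n-gram variant: n changes how the query is rewritten and scored,
// so it takes part in equals() and hashCode() as well.
class NGramPhraseQuerySketch extends PhraseSketch {
    final int n;

    NGramPhraseQuerySketch(int n, String... terms) {
        super(terms);
        this.n = n;
    }

    @Override
    public boolean equals(Object o) {
        // super.equals() already verified that o has exactly this class.
        return super.equals(o) && n == ((NGramPhraseQuerySketch) o).n;
    }

    @Override
    public int hashCode() {
        return 31 * super.hashCode() + n;
    }
}
```

Two sketches with the same terms and the same n compare equal and hash alike; changing n, or comparing against a plain PhraseSketch, yields inequality in both directions.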
[jira] [Issue Comment Edited] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102405#comment-13102405 ]

Koji Sekiguchi edited comment on LUCENE-3426 at 9/12/11 2:02 AM:
-

For automatic in Solr, I wonder if we could move the feature to n-gram tokenizers, and we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}

was (Author: koji):
For automatic in Solr, I wonder if we could move the feature to n-gram tokenizers, and we could have something like:

{code}
<fieldType name="text_cjk" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.CJKTokenizerFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.CJKTokenizerFactory" optimizePhraseQuery="true"/>
  </analyzer>
</fieldType>
{code}
[jira] [Commented] (LUCENE-3426) optimizer for n-gram PhraseQuery
[ https://issues.apache.org/jira/browse/LUCENE-3426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102408#comment-13102408 ]

Koji Sekiguchi commented on LUCENE-3426:

I'm not sure it could apply to MultiPhraseQuery. Let me take more time. Considering hashCode/equals is a good point. I'll see.
Re: issue SOLR-1565
Code is not supposed to fly around in email. Use JIRA. Just create a new issue and attach it to the bug using SVN diff. See http://wiki.apache.org/solr/HowToContribute

On Fri, Sep 9, 2011 at 1:03 PM, Patrick Sauts psa...@viadeoteam.com wrote:

Hi,

I've made an alpha version of StreamingUpdateSolrServer dedicated to binary update (javabin). It works fine for me. It is not a fix for issue SOLR-1565; it is a new class, but I think it may be useful for fixing the bug. If somebody tests it, please send feedback.

Patrick Sauts.

--
Bill Bell
billnb...@gmail.com
cell 720-256-8076
[JENKINS] Lucene-trunk - Build # 1674 - Still Failing
Build: https://builds.apache.org/job/Lucene-trunk/1674/

2 tests failed.

REGRESSION: org.apache.lucene.index.TestTermsEnum.testIntersectRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
	at org.apache.lucene.index.TestTermsEnum.testIntersectRandom(TestTermsEnum.java:266)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)

REGRESSION: org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom

Error Message:
Java heap space

Stack Trace:
java.lang.OutOfMemoryError: Java heap space
	at org.apache.lucene.util.automaton.RunAutomaton.<init>(RunAutomaton.java:128)
	at org.apache.lucene.util.automaton.ByteRunAutomaton.<init>(ByteRunAutomaton.java:28)
	at org.apache.lucene.util.automaton.CompiledAutomaton.<init>(CompiledAutomaton.java:134)
	at org.apache.lucene.util.automaton.TestCompiledAutomaton.build(TestCompiledAutomaton.java:39)
	at org.apache.lucene.util.automaton.TestCompiledAutomaton.testTerms(TestCompiledAutomaton.java:55)
	at org.apache.lucene.util.automaton.TestCompiledAutomaton.testRandom(TestCompiledAutomaton.java:101)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)

Build Log (for compile errors):
[...truncated 12798 lines...]
[JENKINS] Lucene-Solr-tests-only-trunk - Build # 10507 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-trunk/10507/

1 tests failed.

REGRESSION: org.apache.lucene.search.TestFilteredQuery.testFilteredQuery

Error Message:
expected:<2.778353214263916> but was:<2.778353452682495>

Stack Trace:
junit.framework.AssertionFailedError: expected:<2.778353214263916> but was:<2.778353452682495>
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:148)
	at org.apache.lucene.util.LuceneTestCaseRunner.runChild(LuceneTestCaseRunner.java:50)
	at org.apache.lucene.search.TestFilteredQuery.assertScoreEquals(TestFilteredQuery.java:182)
	at org.apache.lucene.search.TestFilteredQuery.testFilteredQuery(TestFilteredQuery.java:154)

Build Log (for compile errors):
[...truncated 1261 lines...]
Re: [VOTE] Release Lucene/Solr 3.4.0, RC1
+1 to release. Checked docs and changes and they look ok.

Shai

On Mon, Sep 12, 2011 at 1:57 AM, Michael McCandless luc...@mikemccandless.com wrote:

+1 to release. I ran the release smoke tester, it was happy!

Mike McCandless
http://blog.mikemccandless.com

On Fri, Sep 9, 2011 at 12:06 PM, Michael McCandless luc...@mikemccandless.com wrote:

Please vote to release the RC1 artifacts at: https://people.apache.org/~mikemccand/staging_area/lucene-solr-3.4.0-RC1-rev1167142 as Lucene 3.4.0 and Solr 3.4.0.

Mike McCandless
http://blog.mikemccandless.com
How to search on specific file types?
Hello, I want to search articles, so I need to find only specific file types such as doc, docx, and pdf. Can you help me?
How to see links in offline mode?
Hello, I'm using Nutch to crawl the web and then send the data to Solr for indexing and search. When Solr searches the index, the URLs of the hits it finds link out to the web. But I want the result titles to link to my own site, so I want to use the crawled data in the Nutch database to show the web pages or files that users search for in my search engine. This would be an offline search, and our users wouldn't need to go to other web pages. Can you help me?
[jira] [Commented] (LUCENE-3429) improve build system when tests hang
[ https://issues.apache.org/jira/browse/LUCENE-3429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102429#comment-13102429 ]

Hoss Man commented on LUCENE-3429:
--

bq. separately, I think we should have an ant-level timeout for the whole build, in case it goes completely crazy (e.g. jvm completely hangs or something else), just as an additional safety.

Jenkins has a build option to handle this part (no personal experience with it though).

bq. Dawid pointed me to some code of his, ...

A per-test annotation definitely seems like the killer solution.
[jira] [Commented] (LUCENE-3396) Make TokenStream Reuse Mandatory for Analyzers
[ https://issues.apache.org/jira/browse/LUCENE-3396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13102443#comment-13102443 ]

Chris Male commented on LUCENE-3396:

Committed revision 1169607. Now attacking the remaining Analyzers.

Make TokenStream Reuse Mandatory for Analyzers
--
Key: LUCENE-3396
URL: https://issues.apache.org/jira/browse/LUCENE-3396
Project: Lucene - Java
Issue Type: Improvement
Components: modules/analysis
Reporter: Chris Male
Attachments: LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch, LUCENE-3396-rab.patch

In LUCENE-2309 it became clear that we'd benefit a lot from Analyzer having to return reusable TokenStreams. This is a big chunk of work, but it's time to bite the bullet. I plan to attack this in the following way:
- Collapse the logic of ReusableAnalyzerBase into Analyzer.
- Add a ReuseStrategy abstraction to Analyzer which controls whether the TokenStreamComponents are reused globally (as they are today) or per-field.
- Convert all Analyzers over to using TokenStreamComponents. I've already seen that some of the TokenStreams created in tests need some work to be reusable (even if they aren't reused).
- Remove Analyzer.reusableTokenStream and convert everything over to using .tokenStream (which will now be returning reusable TokenStreams).
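The ReuseStrategy idea in the plan above (reuse the components globally, or keep one set per field) can be sketched as follows. Every type here is a hypothetical stand-in named for illustration, not the API that was committed, and the sketch deliberately leaves out any per-thread storage a real analyzer would need.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for TokenStreamComponents: remembers which field it was built for.
class ComponentsSketch {
    final String builtForField;

    ComponentsSketch(String builtForField) {
        this.builtForField = builtForField;
    }
}

// Hypothetical strategy interface: decides where previously built components are kept.
interface ReuseStrategySketch {
    ComponentsSketch get(String field);

    void put(String field, ComponentsSketch components);
}

// One shared set of components, whatever the field.
class GlobalReuseSketch implements ReuseStrategySketch {
    private ComponentsSketch shared;

    public ComponentsSketch get(String field) {
        return shared;
    }

    public void put(String field, ComponentsSketch components) {
        shared = components;
    }
}

// A separate set of components for each field name.
class PerFieldReuseSketch implements ReuseStrategySketch {
    private final Map<String, ComponentsSketch> byField = new HashMap<>();

    public ComponentsSketch get(String field) {
        return byField.get(field);
    }

    public void put(String field, ComponentsSketch components) {
        byField.put(field, components);
    }
}

// Hypothetical analyzer: tokenStream() always goes through the strategy,
// so it only builds new components when nothing reusable is stored.
class ReusingAnalyzerSketch {
    private final ReuseStrategySketch strategy;
    int created; // counts how many times components had to be built

    ReusingAnalyzerSketch(ReuseStrategySketch strategy) {
        this.strategy = strategy;
    }

    ComponentsSketch tokenStream(String field) {
        ComponentsSketch components = strategy.get(field);
        if (components == null) {
            components = new ComponentsSketch(field);
            created++;
            strategy.put(field, components);
        }
        return components;
    }
}
```

With the global strategy, asking for "title" and then "body" builds components once; with the per-field strategy it builds them twice, and a second request for "title" returns the same object again.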
Tika cannot parse all of the Persian PDF files
Hello,

I used Tika (within Nutch) to parse some Persian PDF files. Some of the files were converted cleanly to plain text, but for others the output was corrupted. I used the ICU4J v4 library, and the text changed to right-to-left mode, but the problem was not resolved: for those files Tika cannot recognize a single character of the input Persian PDF!

I copied this text via Document Viewer in Linux; it is clearly Persian text:

--
هر روز پس از نماز صبح، سوره مباركه الرحمن را تا فباي آلاء ربكما تكذبان بخواند. ) اين يعني 21 آيه اول سوره ، كه در قرآن رسم الخط عثمانطه تقريبا يك نصف صفحه است. ( همچنين در روايات از حضرت رسول )ص( و ائمه اطهار )ع( آمده كه چند چيز براي قوت حافظه مفيد است: 1- مسواك كردن 2- روزه گرفتن 3- قرائت قرآن؛ مخصوصا آيه الكرسي 4- خوردن عسل 5- خوردن عدس 6- خوردن گوشت نزديک گردن
--

Tika returns this output:

--
92 @A 8 * B C9D !D ) (?) =/ () ,8 ; 8 # + 9!: L #)4 M() * 0 * -3IA J - 2 (+ G H -1 (+ J 5#+C 0T J (+ O - 6R . (+ O - 5 PH. (+ O -4
--

Can anyone help me? Thanks a lot.