[jira] [Commented] (SOLR-4449) Enable backup requests for the internal solr load balancer
[ https://issues.apache.org/jira/browse/SOLR-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13585772#comment-13585772 ]

philip hoy commented on SOLR-4449:
----------------------------------

Hi Raintang, my apologies: I think I have been sloppy in my language and that may have misled you. When I referred to shards in my earlier comments, I really meant replicas. This JIRA covers how the replicas are load balanced, not how the shard/slice requests are managed. I would think that should be dealt with in a separate issue.

> Enable backup requests for the internal solr load balancer
> ----------------------------------------------------------
>
>          Key: SOLR-4449
>          URL: https://issues.apache.org/jira/browse/SOLR-4449
>      Project: Solr
>   Issue Type: New Feature
>   Components: SolrCloud
>     Reporter: philip hoy
>     Priority: Minor
>  Attachments: patch-4449.txt, SOLR-4449.patch
>
> Add the ability to configure the built-in Solr load balancer so that it submits a backup request to the next server in the list if the initial request takes too long. Employing such an algorithm could improve latency at the 9xth percentile, albeit at the expense of increased overall load due to the additional requests.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org
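The algorithm described in the issue can be sketched with plain java.util.concurrent primitives. This is a hypothetical illustration, not the attached patch: submit the request to the first server, wait up to a threshold for a response, and only then fire a backup request to the next server, returning whichever response arrives first.

```java
import java.util.concurrent.*;

// Hedged sketch of the backup-request idea (names and structure assumed):
// the first response wins; the backup is only submitted if the primary is slow.
public class BackupRequestSketch {
  static String queryWithBackup(Callable<String> primary, Callable<String> backup,
                                long backupDelayMs) {
    ExecutorService pool = Executors.newFixedThreadPool(2);
    try {
      CompletionService<String> cs = new ExecutorCompletionService<>(pool);
      cs.submit(primary);
      // Wait up to backupDelayMs for the primary server to answer.
      Future<String> done = cs.poll(backupDelayMs, TimeUnit.MILLISECONDS);
      if (done == null) {
        cs.submit(backup);   // primary is slow: fire the backup request
        done = cs.take();    // first of the two responses wins
      }
      return done.get();
    } catch (Exception e) {
      throw new RuntimeException(e);
    } finally {
      pool.shutdownNow();    // cancel whichever request lost
    }
  }

  public static void main(String[] args) {
    Callable<String> slow = () -> { Thread.sleep(500); return "slow"; };
    Callable<String> fast = () -> "fast";
    System.out.println(queryWithBackup(slow, fast, 50)); // backup wins here
  }
}
```

This is also why the issue notes the latency/load trade-off: every slow primary costs one extra in-flight request.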
Sorting issue
Hi, I have an issue with sorting in Solr. I want to sort the Solr results on the basis of a field which has the indexed property set to true and is not multivalued. I am setting the sorting parameters using the addSortField method, yet the results come back unsorted. Can you please guide me as to what needs to be done to solve this issue? Thanks. Regards, Puneet Chaturvedi
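For reference, a sort field in Solr must be indexed, single-valued, and (for text) produce a single term per document — a tokenized text type will not sort predictably. A minimal schema.xml sketch, using a hypothetical field name:

```xml
<!-- schema.xml: a sortable field must be indexed="true" and multiValued="false";
     untokenized types (string, numeric trie types) sort predictably,
     tokenized text types generally do not -->
<field name="price" type="float" indexed="true" stored="true" multiValued="false"/>
```

The matching request parameter is `sort=price asc`; with SolrJ of this era, `solrQuery.addSortField("price", SolrQuery.ORDER.asc)` sets the same parameter. Also note that if the field definition was changed after documents were indexed, they must be reindexed before sorting will work.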
Line length in Lucene/Solr code
According to https://wiki.apache.org/solr/HowToContribute, Sun's code style conventions should be used when writing contributions for Lucene and Solr. Said conventions state that lines in code should be 80 characters or less, since they're not handled well by many terminals and tools: http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-136091.html#313

A quick random inspection of the Lucene/Solr code base tells me that this recommendation is not followed: out of 20 source files, only a single one adhered to the 80 characters/line limit, and that was StorageField, which is an interface.

I am all for a larger limit as I find that it makes Java code a lot more readable. With current tools, Java code needs to be formatted using line breaks and indents (as opposed to fully dynamic tool-specific re-flow of the code). That formatting is dependent on a specific maximum line width to be consistent.

With that in mind, I suggest that the code style recommendation is expanded with the notion that a maximum of x characters/line should be used, where x is something more than 80. Judging by a quick search, 120 chars seems to be a common choice.

Regards,
Toke Eskildsen
RE: Line length in Lucene/Solr code
+1 to raise the default of 80 to a minimum of 120. I really hate short lines (and I find that the longer lines are much more readable) :-)

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
Re: Line length in Lucene/Solr code
+1

Mike McCandless
http://blog.mikemccandless.com
Re: Line length in Lucene/Solr code
+1

Shai
Re: Line length in Lucene/Solr code
+1

Christian Moen
http://www.atilika.com
Re: Line length in Lucene/Solr code
+1
[jira] [Updated] (LUCENE-4771) Query-time join collectors could maybe be more efficient
[ https://issues.apache.org/jira/browse/LUCENE-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Martijn van Groningen updated LUCENE-4771:
------------------------------------------
Attachment: LUCENE-4771-prototype.patch

Updated patch for the current trunk codebase.

> Query-time join collectors could maybe be more efficient
> --------------------------------------------------------
>
>          Key: LUCENE-4771
>          URL: https://issues.apache.org/jira/browse/LUCENE-4771
>      Project: Lucene - Core
>   Issue Type: Improvement
>   Components: modules/join
>     Reporter: Robert Muir
>  Attachments: LUCENE-4771_prototype.patch, LUCENE-4771-prototype.patch, LUCENE-4771_prototype_without_bug.patch
>
> I was looking at these collectors on LUCENE-4765 and I noticed:
> * The single-valued (SV) collector pulls FieldCache.getTerms and adds the bytes to a BytesRefHash per collect.
> * The multi-valued (MV) collector pulls FieldCache.getDocTermOrds, but doesn't use the ords; it just looks up each value and adds the bytes per collect.
> I think instead it's worth investigating whether SV should use getTermsIndex, and both collectors just collect up their per-segment ords in something like a BitSet[maxOrd]. When asked for the terms at the end in getCollectorTerms(), they could merge these into one BytesRefHash.
> Of course, if you are going to turn around and execute the query against the same searcher anyway (is this the typical case?), this could be even more efficient: no need to hash or instantiate all the terms in memory; we could postpone the lookups to SeekingTermSetTermsEnum.accept()/nextSeekTerm(), I think... somehow :)
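The change suggested in the issue can be modeled in a few lines. This is a toy sketch with assumed names, not Lucene's actual API: during collect, only set a bit for the per-segment ord (no bytes copied), and materialize terms into a single hash only once at the end.

```java
import java.util.*;

// Hedged stand-in for the proposed collector: BitSet plays the role of
// BitSet[maxOrd], HashSet<String> stands in for a BytesRefHash.
public class OrdCollectSketch {
  // Per-segment collect phase: one O(1) bit set per hit, no term lookup.
  static BitSet collect(String[] ordToTerm, int[] collectedOrds) {
    BitSet seen = new BitSet(ordToTerm.length);
    for (int ord : collectedOrds) {
      seen.set(ord);              // duplicate hits cost nothing extra
    }
    return seen;
  }

  // End phase: merge each segment's seen ords into one global term set,
  // resolving each unique ord to its term exactly once.
  static Set<String> mergeTerms(List<String[]> segDicts, List<BitSet> segSeen) {
    Set<String> terms = new HashSet<>();
    for (int seg = 0; seg < segDicts.size(); seg++) {
      String[] dict = segDicts.get(seg);
      BitSet seen = segSeen.get(seg);
      for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
        terms.add(dict[ord]);     // one lookup per unique ord, not per doc
      }
    }
    return terms;
  }

  public static void main(String[] args) {
    String[] seg1 = {"apple", "banana", "cherry"};
    String[] seg2 = {"banana", "durian"};
    List<BitSet> seen = Arrays.asList(
        collect(seg1, new int[]{0, 2, 0}),   // "apple" hit twice, resolved once
        collect(seg2, new int[]{0}));
    System.out.println(mergeTerms(Arrays.asList(seg1, seg2), seen));
  }
}
```

The win over the current per-collect hashing is that the hot loop touches only a bitset; term bytes are materialized once per unique ord.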
Race condition in Solr, plea for help and/or advice
OK, in working on SOLR-4196 I'm exercising opening/closing cores as never before. I have a little stress program that does about the worst thing possible, essentially opens and closes a core for every request. It has a bunch of query and update threads running simultaneously that pick a random core and do a query or update. I've got a bunch of code that, I think, prevents any attempt to open _or_ close a core while it is being either opened or closed by another thread (but I'm verifying). It runs fine for a couple of hours, then hits a race condition. I was able to get a stack trace (see below). CloserThread.run(CoreContainer.java:1920) (second thread below) is, indeed, new code.

The stress-test program is updating cores (which may not be loaded) like crazy and doing queries on other random cores. It's perfectly possible to be updating a core that's in the process of being closed; I was counting on the ref counting to make this OK... The cores are transient in a limited cache, so they come and go. It looks like I'm trying to close a core at the same time an update has come in, but I'm not sure whether this is something that should be prevented by the new code or is an underlying problem. So a couple of questions:

1. SOLR-4196 has a whole series of improvements that even let us get here. Running the stress test program against current trunk barfs before having time to hit this condition, so the current state is an improvement. What do you think about me checking 4196 in and opening a separate JIRA for this issue?
2. Any suggestions on what direction to go next? If it's something easy, I can just fold it into this patch.
3. Am I just going about things bass-ackwards? Not an unusual state of affairs unfortunately.

NOTE: The current patch for SOLR-4196 isn't the one running with this code, there are a couple more things I want to change.
Mostly I'm asking if someone familiar with the code where the race is encountered has a quick fix.

Thanks,
Erick

Found one Java-level deadlock:
=============================
"commitScheduler-122579-thread-1":
  waiting to lock monitor 7f87c3076d38 (object 78b379a28, a org.apache.solr.update.DefaultSolrCoreState),
  which is held by "Thread-15"
"Thread-15":
  waiting for ownable synchronizer 765e84638, (a java.util.concurrent.locks.ReentrantLock$NonfairSync),
  which is held by "commitScheduler-122579-thread-1"

Java stack information for the threads listed above:
===================================================
"commitScheduler-122579-thread-1":
  at org.apache.solr.update.DefaultSolrCoreState.getIndexWriter(DefaultSolrCoreState.java:82)
  - waiting to lock 78b379a28 (a org.apache.solr.update.DefaultSolrCoreState)
  at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1354)
  at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:573)
  - locked 76aa46f58 (a java.lang.Object)
  at org.apache.solr.update.CommitTracker.run(CommitTracker.java:216)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
  at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
  at java.util.concurrent.FutureTask.run(FutureTask.java:138)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98)
  at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206)
  at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
  at java.lang.Thread.run(Thread.java:680)
"Thread-15":
  at sun.misc.Unsafe.park(Native Method)
  - parking to wait for 765e84638 (a java.util.concurrent.locks.ReentrantLock$NonfairSync)
  at java.util.concurrent.locks.LockSupport.park(LockSupport.java:156)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:811)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:842)
  at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1178)
  at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:186)
  at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:262)
  at org.apache.solr.update.DirectUpdateHandler2.closeWriter(DirectUpdateHandler2.java:680)
  at org.apache.solr.update.DefaultSolrCoreState.closeIndexWriter(DefaultSolrCoreState.java:68)
  - locked 78b379a28 (a org.apache.solr.update.DefaultSolrCoreState)
  at org.apache.solr.update.DefaultSolrCoreState.close(DefaultSolrCoreState.java:289)
  - locked 78b379a28 (a org.apache.solr.update.DefaultSolrCoreState)
  at org.apache.solr.update.SolrCoreState.decrefSolrCoreState(SolrCoreState.java:68)
  - locked 78b379a28 (a org.apache.solr.update.DefaultSolrCoreState)
  at org.apache.solr.core.SolrCore.close(SolrCore.java:975)
  at org.apache.solr.core.CloserThread.run(CoreContainer.java:1920)

Found 1 deadlock.
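The trace shows a classic lock-ordering cycle: the commit thread holds the DefaultSolrCoreState monitor and wants the writer's ReentrantLock, while the closing thread holds the ReentrantLock and wants the monitor. A minimal illustration of the usual fix, with hypothetical names standing in for the Solr classes (not the actual Solr code): make every path acquire both locks in the same global order, so no cycle can form.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hedged sketch of lock-ordering discipline: both paths take
// coreStateMonitor first, then writerLock, so they can never deadlock.
public class LockOrderSketch {
  final Object coreStateMonitor = new Object();     // stands in for DefaultSolrCoreState's monitor
  final ReentrantLock writerLock = new ReentrantLock(); // stands in for the commit/writer lock
  int writerOps = 0;

  void commitPath() {                               // e.g. the autocommit thread
    synchronized (coreStateMonitor) {
      writerLock.lock();
      try { writerOps++; } finally { writerLock.unlock(); }
    }
  }

  void closePath() {                                // e.g. the core-closing thread
    synchronized (coreStateMonitor) {               // same order as commitPath: no cycle
      writerLock.lock();
      try { writerOps++; } finally { writerLock.unlock(); }
    }
  }

  public static void main(String[] args) throws InterruptedException {
    LockOrderSketch s = new LockOrderSketch();
    Thread t1 = new Thread(() -> { for (int i = 0; i < 1000; i++) s.commitPath(); });
    Thread t2 = new Thread(() -> { for (int i = 0; i < 1000; i++) s.closePath(); });
    t1.start(); t2.start();
    t1.join(); t2.join();
    System.out.println(s.writerOps);                // completes without deadlock
  }
}
```

In the real code the fix is less mechanical (one of the locks is taken deep inside DirectUpdateHandler2), but the principle is the same: pick one order and enforce it on every path that needs both locks.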
[jira] [Created] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
Michael McCandless created LUCENE-4795:
---------------------------------------

Summary: Add FacetsCollector based on SortedSetDocValues
Key: LUCENE-4795
URL: https://issues.apache.org/jira/browse/LUCENE-4795
Project: Lucene - Core
Issue Type: Improvement
Components: modules/facet
Reporter: Michael McCandless

Recently (LUCENE-4765) we added a multi-valued DocValues field (SortedSetDocValuesField), and this can be used for faceting in Solr (SOLR-4490). I think we should also add support in the facet module? It'd be an option with different tradeoffs. Eg, it wouldn't require the taxonomy index, since the main index handles label/ord resolving.

There are at least two possible approaches:
* On every reopen, build the seg -> global ord map, and then on every collect, get the seg ord, map it to the global ord space, and increment counts. This adds cost during reopen in proportion to number of unique terms ...
* On every collect, increment counts based on the seg ords, and then do a merge in the end just like distributed faceting does.

The first approach is much easier so I built a quick prototype using that. The prototype does the counting, but it does NOT do the top K facets gathering in the end, and it doesn't know parent/child ord relationships, so there's tons more to do before this is real. I also was unsure how to properly integrate it since the existing classes seem to expect that you use a taxonomy index to resolve ords.

I ran a quick performance test. base = trunk except I disabled the compute top-K in FacetsAccumulator to make the comparison fair; comp = using the prototype collector in the patch:

{noformat}
            Task    QPS base  StdDev    QPS comp  StdDev   Pct diff
       OrHighLow       18.79  (2.5%)      14.36  (3.3%)     -23.6% ( -28% - -18%)
        HighTerm       21.58  (2.4%)      16.53  (3.7%)     -23.4% ( -28% - -17%)
       OrHighMed       18.20  (2.5%)      13.99  (3.3%)     -23.2% ( -28% - -17%)
         Prefix3       14.37  (1.5%)      11.62  (3.5%)     -19.1% ( -23% - -14%)
         LowTerm      130.80  (1.6%)     106.95  (2.4%)     -18.2% ( -21% - -14%)
      OrHighHigh        9.60  (2.6%)       7.88  (3.5%)     -17.9% ( -23% - -12%)
     AndHighHigh       24.61  (0.7%)      20.74  (1.9%)     -15.7% ( -18% - -13%)
          Fuzzy1       49.40  (2.5%)      43.48  (1.9%)     -12.0% ( -15% -  -7%)
 MedSloppyPhrase       27.06  (1.6%)      23.95  (2.3%)     -11.5% ( -15% -  -7%)
         MedTerm       51.43  (2.0%)      46.21  (2.7%)     -10.2% ( -14% -  -5%)
          IntNRQ        4.02  (1.6%)       3.63  (4.0%)      -9.7% ( -15% -  -4%)
        Wildcard       29.14  (1.5%)      26.46  (2.5%)      -9.2% ( -13% -  -5%)
HighSloppyPhrase        0.92  (4.5%)       0.87  (5.8%)      -5.4% ( -15% -   5%)
     MedSpanNear       29.51  (2.5%)      27.94  (2.2%)      -5.3% (  -9% -   0%)
    HighSpanNear        3.55  (2.4%)       3.38  (2.0%)      -4.9% (  -9% -   0%)
      AndHighMed      108.34  (0.9%)     104.55  (1.1%)      -3.5% (  -5% -  -1%)
 LowSloppyPhrase       20.50  (2.0%)      20.09  (4.2%)      -2.0% (  -8% -   4%)
       LowPhrase       21.60  (6.0%)      21.26  (5.1%)      -1.6% ( -11% -  10%)
          Fuzzy2       53.16  (3.9%)      52.40  (2.7%)      -1.4% (  -7% -   5%)
     LowSpanNear        8.42  (3.2%)       8.45  (3.0%)       0.3% (  -5% -   6%)
         Respell       45.17  (4.3%)      45.38  (4.4%)       0.5% (  -7% -   9%)
       MedPhrase      113.93  (5.8%)     115.02  (4.9%)       1.0% (  -9% -  12%)
      AndHighLow      596.42  (2.5%)     617.12  (2.8%)       3.5% (  -1% -   8%)
      HighPhrase       17.30 (10.5%)      18.36  (9.1%)       6.2% ( -12% -  28%)
{noformat}

I'm impressed that this approach is only ~24% slower in the worst case! I think this means it's a good option to make available? Yes it has downsides (NRT reopen more costly, small added RAM usage, slightly slower faceting), but it's also simpler (no taxo index to manage).

--
This message is automatically generated by JIRA.
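The first approach above can be modeled in a few lines. This is a simplified stand-in, not the prototype patch; Arrays.binarySearch over a sorted global dictionary plays the role of the seg-ord to global-ord map built on reopen.

```java
import java.util.*;

// Hedged model of approach 1: pay a per-unique-term cost once per reopen to
// build segToGlobal, then counting during collect is a plain array increment.
public class GlobalOrdCountSketch {
  final String[] globalTerms;   // sorted global term dictionary
  final int[][] segToGlobal;    // per segment: seg ord -> global ord
  final int[] counts;

  GlobalOrdCountSketch(String[] globalTerms, String[][] segDicts) {
    this.globalTerms = globalTerms;
    this.counts = new int[globalTerms.length];
    this.segToGlobal = new int[segDicts.length][];
    // "Reopen" phase: one lookup per unique term per segment.
    for (int seg = 0; seg < segDicts.length; seg++) {
      segToGlobal[seg] = new int[segDicts[seg].length];
      for (int ord = 0; ord < segDicts[seg].length; ord++) {
        segToGlobal[seg][ord] = Arrays.binarySearch(globalTerms, segDicts[seg][ord]);
      }
    }
  }

  // "Collect" phase: a doc's facet values arrive as per-segment ords.
  void collect(int seg, int[] segOrds) {
    for (int ord : segOrds) counts[segToGlobal[seg][ord]]++;
  }

  int count(String term) { return counts[Arrays.binarySearch(globalTerms, term)]; }

  public static void main(String[] args) {
    GlobalOrdCountSketch g = new GlobalOrdCountSketch(
        new String[]{"a", "b"}, new String[][]{{"a"}, {"a", "b"}});
    g.collect(0, new int[]{0});      // doc in segment 0 with value "a"
    g.collect(1, new int[]{0, 1});   // doc in segment 1 with values "a", "b"
    System.out.println(g.count("a") + " " + g.count("b"));
  }
}
```

This makes the trade-off in the issue concrete: reopen cost grows with the number of unique terms, while per-doc collect cost stays a single map lookup plus an increment.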
[jira] [Updated] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated LUCENE-4795:
---------------------------------------
Attachment: LUCENE-4795.patch
[jira] [Updated] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated LUCENE-4795:
--------------------------------
Attachment: pleaseBenchmarkMe.patch

Thanks for benchmarking this approach, Mike! I'm happy with the results, though I still added a TODO that we should investigate the cost of the special packed-ints compression we do. Can you benchmark the attached change, just out of curiosity?
[jira] [Commented] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585878#comment-13585878 ] Adrien Grand commented on LUCENE-4795: -- Not having to manage a taxonomy index is very appealing to me! What about collecting based on segment ords and bulk translating these ords to the global ords in setNextReader and when the collection ends? This way ordinalMap.get would be called less often (once per value per segment instead of once per value per doc per segment) and in a sequential way so I assume it would be faster while remaining easy to implement?
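Adrien's suggestion, counting against segment ords during collect and folding the per-segment counts into the global array once per segment, can be sketched the same way. Again this is a hypothetical plain-array model, not the patch's collector; the data is invented for illustration:

```java
import java.util.Arrays;

// Toy model of the bulk-translation idea: accumulate against segment ords
// while collecting, then translate once per segment (e.g. at setNextReader),
// so the ord map is read sequentially, once per unique seg ord.
public class MergeBySegCounts {

    static int[] count(int[][] segToGlobal, int[][][] docsPerSeg, int numGlobalOrds) {
        int[] global = new int[numGlobalOrds];
        for (int seg = 0; seg < docsPerSeg.length; seg++) {
            int[] segCounts = new int[segToGlobal[seg].length];
            for (int[] doc : docsPerSeg[seg]) {
                for (int segOrd : doc) {
                    segCounts[segOrd]++;   // no global map lookup in the hot loop
                }
            }
            // Bulk translate at the segment boundary.
            for (int segOrd = 0; segOrd < segCounts.length; segOrd++) {
                global[segToGlobal[seg][segOrd]] += segCounts[segOrd];
            }
        }
        return global;
    }

    public static void main(String[] args) {
        int[][] segToGlobal = {{0, 2}, {1, 2}};          // same toy data as before
        int[][][] docs = {{{0}, {0, 1}}, {{1}, {0}}};
        System.out.println(Arrays.toString(count(segToGlobal, docs, 3)));
    }
}
```

The total counts come out the same either way; the difference is where the translation cost is paid.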
Re: Line length in Lucene/Solr code
+1 -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com
25. feb. 2013 kl. 13:37 skrev Simon Willnauer simon.willna...@gmail.com: +1
On Mon, Feb 25, 2013 at 12:05 PM, Christian Moen c...@atilika.com wrote: +1 Christian Moen http://www.atilika.com
On Feb 25, 2013, at 8:01 PM, Michael McCandless luc...@mikemccandless.com wrote: +1 Mike McCandless http://blog.mikemccandless.com
On Mon, Feb 25, 2013 at 5:59 AM, Uwe Schindler u...@thetaphi.de wrote: +1 to raise the default of 80 to a minimum of 120. I really hate short lines (and I find that the longer lines are much more readable) :-) - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
-Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, February 25, 2013 11:39 AM To: dev@lucene.apache.org Subject: Line length in Lucene/Solr code
According to https://wiki.apache.org/solr/HowToContribute, Sun's code style conventions should be used when writing contributions for Lucene and Solr. Said conventions state that lines in code should be 80 characters or less, since they're not handled well by many terminals and tools: http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-136091.html#313
A quick random inspection of the Lucene/Solr code base tells me that this recommendation is not followed: Out of 20 source files, only a single one adhered to the 80 characters/line limit and that was StorageField, which is an interface. I am all for a larger limit as I find that it makes Java code a lot more readable. With current tools, Java code needs to be formatted using line breaks and indents (as opposed to fully dynamic tool-specific re-flow of the code). That formatting is dependent on a specific maximum line width to be consistent.
With that in mind, I suggest that the code style recommendation is expanded with the notion that a maximum of x characters/line should be used, where x is something more than 80. Judging by a quick search, 120 chars seems to be a common choice. Regards, Toke Eskildsen
- To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585882#comment-13585882 ] Michael McCandless commented on LUCENE-4795: bq. What about collecting based on segment ords and bulk translating these ords to the global ords in setNextReader and when the collection ends? That sounds great! I'll try that.
[jira] [Updated] (SOLR-839) XML Query Parser support
[ https://issues.apache.org/jira/browse/SOLR-839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Erik Hatcher updated SOLR-839: -- Attachment: SOLR-839-object-parser.patch

This patch depends on the object parsing approach in SOLR-4351. This is a different approach from using Lucene's XML Query Parser. The XMLQueryParser is neat and all, but the builders aren't going to work well with Solr's schema. I tinkered with a SolrQueryBuilder, and that mostly works, but nested XML queries weren't working, so I revamped using the object parser.

XML Query Parser support
Key: SOLR-839
URL: https://issues.apache.org/jira/browse/SOLR-839
Project: Solr
Issue Type: New Feature
Components: query parsers
Affects Versions: 1.3
Reporter: Erik Hatcher
Assignee: Erik Hatcher
Fix For: 5.0
Attachments: lucene-xml-query-parser-2.4-dev.jar, SOLR-839-object-parser.patch, SOLR-839.patch

Lucene contrib includes a query parser that is able to create the full spectrum of Lucene queries, using an XML data structure. This patch adds xml query parser support to Solr.
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators. For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585892#comment-13585892 ] Shai Erera commented on LUCENE-4795: Nice Mike. If you want to integrate that with the current classes, all you need to do is implement a partial TaxonomyReader which resolves ordinals to CPs using the global ord map? Or actually make that TR the entity that's responsible for managing the global ordinal map, so that TR.doOpenIfChanged opens the new segments and updates the global map? Since this taxonomy, at least currently, doesn't support hierarchical facets, you'll need to hack something as a ParallelTaxoArray, but that should be easy .. I think. Is the only benefit in this approach that you don't need to manage a sidecar taxonomy index?
Re: Sorting issue
You have to give us some more idea what you _are_ seeing. Just because a field isn't multiValued doesn't mean you can sort on it; it must also not be tokenized, so text fields that have an analysis chain are generally not good candidates for sorting. What do your solr logs show? What response do you get? Can you sort just by using the browser URL? You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick
On Mon, Feb 25, 2013 at 5:28 AM, Chaturvedi, Puneet pchaturv...@columnit.com wrote: Hi, I have an issue with sorting in solr. I want to sort the solr results on the basis of a field which has "indexed" property set to "true" and is not multivalued. I am setting the sorting parameters using the "addSortField" method. Still the solr results are not sorted. Can you please guide me as to what needs to be done to solve this issue? Thanks Regards, Puneet Chaturvedi
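A common way to make a tokenized text field sortable, sketched here with hypothetical field and type names (adapt to your own schema.xml), is to copy it into an untokenized string field used only for sorting:

```xml
<!-- hypothetical example: search the tokenized field, sort on the string copy -->
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="title_sort" type="string" indexed="true" stored="false"/>
<copyField source="title" dest="title_sort"/>
```

The query would then sort on title_sort (e.g. sort=title_sort asc) while still searching against title.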
RE: Line length in Lucene/Solr code
Hi, One interesting detail: the old-style terminal width of IBM PCs, 80 columns, is where the Java line-length convention came from, but the common terminal width has since moved on: most terminal applications already default to e.g. 132 columns, so I would make that number (around 130) the standard! Interestingly, the average line length of Lucene code is already smaller. Uwe
- Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de
-Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, February 25, 2013 11:39 AM To: dev@lucene.apache.org Subject: Line length in Lucene/Solr code
[jira] [Commented] (SOLR-839) XML Query Parser support
[ https://issues.apache.org/jira/browse/SOLR-839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585897#comment-13585897 ] Erik Hatcher commented on SOLR-839: --- With the latest patch, these queries work (borrowed from SOLR-4351's tests):
{code}
<term f="id">11</term>
<field f="text">Now Cow</field>
<prefix f="text">brow</prefix>
<frange l="20" u="24">mul(foo_i,2)</frange>
<join from="qqq_s" to="www_s">id:10</join>
<join from="qqq_s" to="www_s"><term f="id">10</term></join>
<lucene>text:Cow -id:1</lucene>
{code}
The object parser path worked easily, but it's not as powerful as it needs to be. There needs to be a way to make BooleanQuery's (without having to use the lucene query parser) and then, like the XMLQueryParser, do stuff with span queries and so on. Maybe it's not worthwhile to have both JSON and XML query parsing, as they both should probably use the same infrastructure. But I would hate to see a JSON form of XSLT used here. Ideally, the query tree would be defined server-side and lean/clean parameters would be passed in to fill in the blanks, but also possibly having some logic based on the values of the parameters (in_stock=true would, if specified, add a filter for inStock:true, for example). The XMLQParser in the last patch has xsl capability as well, so that you could define id.xsl to be:
{code}
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/Document">
    <term f="id"><xsl:value-of select="id"/></term>
  </xsl:template>
</xsl:stylesheet>
{code}
Then using defType=xml&xsl=id&id=SOLR1000 a term query will be generated.
(this is too simple of an example, as there would be other leaner/cleaner ways to do this exact one)
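For comparison, boolean nesting in Lucene's contrib XML query parser (the CoreParser syntax) looks like the following. This is shown only to illustrate the kind of BooleanQuery structure the comment above says is still missing from the object-parser path, not something the patch currently supports:

```xml
<!-- standard lucene xml-query-parser syntax, not part of the SOLR-839 patch -->
<BooleanQuery fieldName="text">
  <Clause occurs="must">
    <TermQuery>cow</TermQuery>
  </Clause>
  <Clause occurs="mustnot">
    <TermQuery fieldName="id">1</TermQuery>
  </Clause>
</BooleanQuery>
```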
[jira] [Commented] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585899#comment-13585899 ] Michael McCandless commented on LUCENE-4795: Fixed small bug (wasn't counting ord 0); here's the same test as before, just running on Term/Or queries:
{noformat}
Task                 QPS base StdDev   QPS comp StdDev   Pct diff
MedTerm                 52.70 (3.1%)      40.87 (2.0%)   -22.5% ( -26% - -17%)
OrHighMed               25.54 (3.6%)      20.18 (2.4%)   -21.0% ( -26% - -15%)
HighTerm                 9.22 (4.1%)       7.33 (2.4%)   -20.4% ( -25% - -14%)
OrHighHigh              12.92 (3.6%)      10.41 (2.8%)   -19.4% ( -24% - -13%)
OrHighLow               13.12 (3.8%)      10.61 (2.8%)   -19.2% ( -24% - -13%)
LowTerm                145.94 (1.9%)     125.51 (1.6%)   -14.0% ( -17% - -10%)
{noformat}
Then I applied Rob's patch (base = trunk, comp = Rob's + my patch):
{noformat}
Task                 QPS base StdDev   QPS comp StdDev   Pct diff
MedTerm                 52.97 (2.2%)      42.34 (1.6%)   -20.1% ( -23% - -16%)
OrHighMed               25.66 (2.2%)      20.73 (1.7%)   -19.2% ( -22% - -15%)
OrHighHigh              12.99 (2.4%)      10.69 (1.8%)   -17.7% ( -21% - -13%)
OrHighLow               13.19 (2.3%)      10.94 (2.0%)   -17.0% ( -20% - -12%)
HighTerm                 9.30 (2.6%)       7.76 (1.8%)   -16.6% ( -20% - -12%)
LowTerm                146.48 (1.3%)     129.04 (0.9%)   -11.9% ( -13% -  -9%)
{noformat}
So a wee bit faster but not much ... (good! The awesome predictive compression from MonotonicALB doesn't hurt much). Then I made a new collector that resolves ords after each segment from Adrien's idea (SortedSetDocValuesCollectorMergeBySeg) -- base = same as above, comp = new collector w/o Rob's patch:
{noformat}
Task                 QPS base StdDev   QPS comp StdDev   Pct diff
HighTerm                 9.29 (3.1%)       7.14 (1.9%)   -23.2% ( -27% - -18%)
OrHighMed               25.51 (2.7%)      19.60 (2.2%)   -23.2% ( -27% - -18%)
OrHighLow               13.08 (2.8%)      10.20 (2.3%)   -22.0% ( -26% - -17%)
OrHighHigh              12.89 (2.9%)      10.21 (2.6%)   -20.8% ( -25% - -15%)
MedTerm                 53.00 (2.7%)      43.34 (1.5%)   -18.2% ( -21% - -14%)
LowTerm                145.97 (1.6%)     133.05 (0.9%)    -8.9% ( -11% -  -6%)
{noformat}
Strangely it's not really faster ... maybe I have a bug.
Unfortunately, until we get the top K working, we can't do the end-to-end comparison to make sure we're getting the right facet values ...
Re: Line length in Lucene/Solr code
Changed to: lines can be greater than 80 chars long, 132 is a common limit. Try to be reasonable for ''very'' long lines.
On Mon, Feb 25, 2013 at 9:51 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, One interesting detail: The old style terminal width of IBM PC's with 80 columns used in the Java line length migrated in the meantime to another common line length: Most terminal applications have a default length of e.g. 132 already - so I would make this number (around 130) the most common standard! Interestingly, the avg line length of Lucene code is already smaller! Uwe
[jira] [Updated] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless updated LUCENE-4795: --- Attachment: LUCENE-4795.patch
[jira] [Commented] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585901#comment-13585901 ] Robert Muir commented on LUCENE-4795: - Thanks for benchmarking: I think we should keep the monotonic compression! It will use significantly less RAM for this thing.
base = trunk except I disabled the compute top-K in FacetsAccumulator to make the comparison fair; comp = using the prototype collector in the patch:
{noformat}
                Task    QPS base   StdDev    QPS comp   StdDev        Pct diff
           OrHighLow       18.79   (2.5%)       14.36   (3.3%)  -23.6% ( -28% - -18%)
            HighTerm       21.58   (2.4%)       16.53   (3.7%)  -23.4% ( -28% - -17%)
           OrHighMed       18.20   (2.5%)       13.99   (3.3%)  -23.2% ( -28% - -17%)
             Prefix3       14.37   (1.5%)       11.62   (3.5%)  -19.1% ( -23% - -14%)
             LowTerm      130.80   (1.6%)      106.95   (2.4%)  -18.2% ( -21% - -14%)
          OrHighHigh        9.60   (2.6%)        7.88   (3.5%)  -17.9% ( -23% - -12%)
         AndHighHigh       24.61   (0.7%)       20.74   (1.9%)  -15.7% ( -18% - -13%)
              Fuzzy1       49.40   (2.5%)       43.48   (1.9%)  -12.0% ( -15% -  -7%)
     MedSloppyPhrase       27.06   (1.6%)       23.95   (2.3%)  -11.5% ( -15% -  -7%)
             MedTerm       51.43   (2.0%)       46.21   (2.7%)  -10.2% ( -14% -  -5%)
              IntNRQ        4.02   (1.6%)        3.63   (4.0%)   -9.7% ( -15% -  -4%)
            Wildcard       29.14   (1.5%)       26.46   (2.5%)   -9.2% ( -13% -  -5%)
    HighSloppyPhrase        0.92   (4.5%)        0.87   (5.8%)   -5.4% ( -15% -   5%)
         MedSpanNear       29.51   (2.5%)       27.94   (2.2%)   -5.3% (  -9% -   0%)
        HighSpanNear        3.55   (2.4%)        3.38   (2.0%)   -4.9% (  -9% -   0%)
          AndHighMed      108.34   (0.9%)      104.55   (1.1%)   -3.5% (  -5% -  -1%)
     LowSloppyPhrase       20.50   (2.0%)       20.09   (4.2%)   -2.0% (  -8% -   4%)
           LowPhrase       21.60   (6.0%)       21.26   (5.1%)   -1.6% ( -11% -  10%)
              Fuzzy2       53.16   (3.9%)       52.40   (2.7%)   -1.4% (  -7% -   5%)
         LowSpanNear        8.42   (3.2%)        8.45   (3.0%)    0.3% (  -5% -   6%)
             Respell       45.17   (4.3%)       45.38   (4.4%)    0.5% (  -7% -   9%)
           MedPhrase      113.93   (5.8%)      115.02   (4.9%)    1.0% (  -9% -  12%)
          AndHighLow      596.42   (2.5%)      617.12   (2.8%)    3.5% (  -1% -   8%)
          HighPhrase       17.30  (10.5%)       18.36   (9.1%)    6.2% ( -12% -  28%)
{noformat}
I'm impressed that this approach is only ~24% slower in the worst case! I think this means it's a good option to make available? Yes it has downsides (NRT reopen more costly, small added RAM usage, slightly slower faceting), but it's also simpler (no taxo index to manage).
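[Editor's note] The first approach above (build a seg -> global ord map on reopen, then remap each collected segment ordinal) can be sketched in a few lines of plain Java. The names below are hypothetical, not Lucene APIs; segment term dictionaries are modeled as sorted string lists.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Sketch of the "seg ord -> global ord" approach described above.
// Hypothetical names; real Lucene code would walk TermsEnums instead of lists.
public class GlobalOrdMap {

    // Merge all per-segment dictionaries into one sorted, deduplicated
    // global dictionary; a term's index in this list is its global ord.
    static List<String> globalTerms(List<List<String>> segTerms) {
        TreeSet<String> merged = new TreeSet<>();
        for (List<String> seg : segTerms) merged.addAll(seg);
        return new ArrayList<>(merged);
    }

    // For one segment, map each segment ordinal to its global ordinal.
    // Both lists are sorted, so a single merge-style walk suffices; this
    // is the per-reopen cost proportional to the number of unique terms.
    static int[] segToGlobal(List<String> seg, List<String> global) {
        int[] map = new int[seg.size()];
        int g = 0;
        for (int s = 0; s < seg.size(); s++) {
            while (!global.get(g).equals(seg.get(s))) g++; // seg terms are a subset of global
            map[s] = g;
        }
        return map;
    }
}
```

At collect time the hot loop is then just `counts[map[segOrd]]++`, which is why the per-hit overhead stays modest.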
[jira] [Created] (SOLR-4500) How can we integrate LDAP authentication with the Solr instance
Srividhya created SOLR-4500: --- Summary: How can we integrate LDAP authentication with the Solr instance Key: SOLR-4500 URL: https://issues.apache.org/jira/browse/SOLR-4500 Project: Solr Issue Type: Task Affects Versions: 4.1 Reporter: Srividhya -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Re: Line length in Lucene/Solr code
I'd actually never bothered to look at the line limitation, that's from back when I started programming. Mostly I was just so happy that someone had short-circuited the endless 'whether braces should be on the same line or not' discussion. G P.S. the ''very'' is really an italic. P.P.S. Why programmers are different from the rest. Not _only_ have I been in the very 'where should the braces go' discussions at various points in my life, but there's a Wiki article that's far too long: http://en.wikipedia.org/wiki/One_True_Brace_Style#K.26R_style On Mon, Feb 25, 2013 at 9:55 AM, Erick Erickson erickerick...@gmail.com wrote: Changed to: lines can be greater than 80 chars long, 132 is a common limit. Try to be reasonable for ''very'' long lines. On Mon, Feb 25, 2013 at 9:51 AM, Uwe Schindler u...@thetaphi.de wrote: Hi, One interesting detail: the old-style terminal width of IBM PCs with 80 columns, used for the Java line length, has in the meantime migrated to another common line length: most terminal applications already have a default width of e.g. 132 - so I would make this number (around 130) the most common standard! Interestingly, the avg line length of Lucene code is already smaller! Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent: Monday, February 25, 2013 11:39 AM To: dev@lucene.apache.org Subject: Line length in Lucene/Solr code According to https://wiki.apache.org/solr/HowToContribute, Sun's code style conventions should be used when writing contributions for Lucene and Solr.
Said conventions state that lines in code should be 80 characters or less, since they're not handled well by many terminals and tools: http://www.oracle.com/technetwork/java/javase/documentation/codeconventions-136091.html#313 A quick random inspection of the Lucene/Solr code base tells me that this recommendation is not followed: Out of 20 source files, only a single one adhered to the 80 characters/line limit and that was StorageField, which is an interface. I am all for a larger limit as I find that it makes Java code a lot more readable. With current tools, Java code needs to be formatted using line breaks and indents (as opposed to fully dynamic tool-specific re-flow of the code). That formatting is dependent on a specific maximum line width to be consistent. With that in mind, I suggest that the code style recommendation is expanded with the notion that a maximum of x characters/line should be used, where x is something more than 80. Judging by a quick search, 120 chars seems to be a common choice. Regards, Toke Eskildsen - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
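[Editor's note] The quick inspection Toke describes is easy to reproduce with a few lines of Java. A minimal sketch (a hypothetical helper, not part of the Lucene/Solr build) that counts over-limit lines in a file's contents:

```java
import java.util.List;

// Count how many lines exceed a given width, as in the informal
// 80-chars/line inspection described in the email above.
public class LineWidthCheck {
    static long linesOver(List<String> lines, int limit) {
        return lines.stream().filter(l -> l.length() > limit).count();
    }
}
```

Running this over a source tree (e.g. via `Files.readAllLines`) gives the per-file counts behind statements like "only a single one adhered to the 80 characters/line limit".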
[jira] [Commented] (LUCENE-4795) Add FacetsCollector based on SortedSetDocValues
[ https://issues.apache.org/jira/browse/LUCENE-4795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585903#comment-13585903 ] Michael McCandless commented on LUCENE-4795: bq. If you want to integrate that with the current classes, all you need to do is to implement a partial TaxonomyReader, which resolves ordinals to CPs using the global ord map? Or actually make that TR the entity that's responsible to manage to global ordinal map, so that TR.doOpenIfChanged opens the new segments and updates the global map? That sounds great! bq. Since this taxonomy, at least currently, doesn't support hierarchical facets, you'll need to hack something as a ParallelTaxoArray, but that should be easy .. I think. OK. I think it could be hierarchical w/o so much work, ie on reopen as it walks the terms it should be able to easily build up the parent/child arrays since the terms are in sorted order. Hmm, except, with SSDV you cannot have a term/ord that had no docs indexed. So the ancestor ords would not exist... hmm. Better start non-hierarchical. I guess if we are non-hierarchical then we don't really need to integrate at indexing time? Ie, app can just add the facet values using SortedSetDVF. bq. Is the only benefit in this approach that you don't need to manage a sidecar taxonomy index? I think so? Add FacetsCollector based on SortedSetDocValues --- Key: LUCENE-4795 URL: https://issues.apache.org/jira/browse/LUCENE-4795 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Attachments: LUCENE-4795.patch, LUCENE-4795.patch, pleaseBenchmarkMe.patch Recently (LUCENE-4765) we added multi-valued DocValues field (SortedSetDocValuesField), and this can be used for faceting in Solr (SOLR-4490). I think we should also add support in the facet module? It'd be an option with different tradeoffs. Eg, it wouldn't require the taxonomy index, since the main index handles label/ord resolving. 
There are at least two possible approaches: * On every reopen, build the seg -> global ord map, and then on every collect, get the seg ord, map it to the global ord space, and increment counts. This adds cost during reopen in proportion to number of unique terms ... * On every collect, increment counts based on the seg ords, and then do a merge in the end just like distributed faceting does. The first approach is much easier so I built a quick prototype using that. The prototype does the counting, but it does NOT do the top K facets gathering in the end, and it doesn't know parent/child ord relationships, so there's tons more to do before this is real. I also was unsure how to properly integrate it since the existing classes seem to expect that you use a taxonomy index to resolve ords. I ran a quick performance test. base = trunk except I disabled the compute top-K in FacetsAccumulator to make the comparison fair; comp = using the prototype collector in the patch:
{noformat}
                Task    QPS base   StdDev    QPS comp   StdDev        Pct diff
           OrHighLow       18.79   (2.5%)       14.36   (3.3%)  -23.6% ( -28% - -18%)
            HighTerm       21.58   (2.4%)       16.53   (3.7%)  -23.4% ( -28% - -17%)
           OrHighMed       18.20   (2.5%)       13.99   (3.3%)  -23.2% ( -28% - -17%)
             Prefix3       14.37   (1.5%)       11.62   (3.5%)  -19.1% ( -23% - -14%)
             LowTerm      130.80   (1.6%)      106.95   (2.4%)  -18.2% ( -21% - -14%)
          OrHighHigh        9.60   (2.6%)        7.88   (3.5%)  -17.9% ( -23% - -12%)
         AndHighHigh       24.61   (0.7%)       20.74   (1.9%)  -15.7% ( -18% - -13%)
              Fuzzy1       49.40   (2.5%)       43.48   (1.9%)  -12.0% ( -15% -  -7%)
     MedSloppyPhrase       27.06   (1.6%)       23.95   (2.3%)  -11.5% ( -15% -  -7%)
             MedTerm       51.43   (2.0%)       46.21   (2.7%)  -10.2% ( -14% -  -5%)
              IntNRQ        4.02   (1.6%)        3.63   (4.0%)   -9.7% ( -15% -  -4%)
            Wildcard       29.14   (1.5%)       26.46   (2.5%)   -9.2% ( -13% -  -5%)
    HighSloppyPhrase        0.92   (4.5%)        0.87   (5.8%)   -5.4% ( -15% -   5%)
         MedSpanNear       29.51   (2.5%)       27.94   (2.2%)   -5.3% (  -9% -   0%)
        HighSpanNear        3.55   (2.4%)        3.38   (2.0%)   -4.9% (  -9% -   0%)
          AndHighMed      108.34   (0.9%)      104.55   (1.1%)   -3.5% (  -5% -  -1%)
     LowSloppyPhrase       20.50   (2.0%)       20.09   (4.2%)   -2.0% (  -8% -   4%)
           LowPhrase       21.60   (6.0%)       21.26   (5.1%)   -1.6% ( -11% -  10%)
              Fuzzy2       53.16   (3.9%)       52.40   (2.7%)   -1.4% (  -7% -   5%)
         LowSpanNear        8.42   (3.2%)        8.45   (3.0%)    0.3% (  -5% -   6%)
             Respell       45.17   (4.3%)       45.38   (4.4%)    0.5% (  -7% -   9%)
           MedPhrase      113.93   (5.8%)      115.02   (4.9%)    1.0% (  -9% -  12%)
          AndHighLow      596.42   (2.5%)      617.12   (2.8%)    3.5% (  -1% -   8%)
          HighPhrase       17.30  (10.5%)       18.36   (9.1%)    6.2% ( -12% -  28%)
{noformat}
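[Editor's note] The "walk the terms in sorted order and build up the parent/child arrays" idea from the comment above can be sketched as follows. Names are hypothetical, and, as Mike notes, the scheme breaks down when an ancestor path was never indexed - here such parents simply map to -1.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch: derive a parent-ordinal array from a sorted list of hierarchical
// facet paths like "a", "a/b", "a/b/c". Hypothetical code, not Lucene's.
public class ParentOrds {
    static int[] parents(List<String> sortedPaths) {
        Map<String, Integer> ordOf = new HashMap<>();
        for (int i = 0; i < sortedPaths.size(); i++) ordOf.put(sortedPaths.get(i), i);
        int[] parent = new int[sortedPaths.size()];
        for (int i = 0; i < sortedPaths.size(); i++) {
            String p = sortedPaths.get(i);
            int slash = p.lastIndexOf('/');
            // -1 means "root or missing ancestor" - the SSDV caveat above.
            parent[i] = slash < 0 ? -1 : ordOf.getOrDefault(p.substring(0, slash), -1);
        }
        return parent;
    }
}
```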
[jira] [Updated] (SOLR-4078) Allow custom naming of nodes so that a new host:port combination can take over for a previous shard.
[ https://issues.apache.org/jira/browse/SOLR-4078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4078: -- Attachment: SOLR-4078.patch New patch - just about ready. Allow custom naming of nodes so that a new host:port combination can take over for a previous shard. Key: SOLR-4078 URL: https://issues.apache.org/jira/browse/SOLR-4078 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Mark Miller Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: SOLR-4078.patch, SOLR-4078.patch Currently we auto assign a unique node name based on the host address and core name - we should let the user optionally override this so that a new host address + core name combo can take over the duties of a previous registered node. Especially useful for ec2 if you are not using elastic ips. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4332) Adding documents to SolrCloud collection broken when a node doesn't have a core for the collection
[ https://issues.apache.org/jira/browse/SOLR-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4332: -- Assignee: Mark Miller Priority: Major (was: Critical) Issue Type: New Feature (was: Bug) Adding documents to SolrCloud collection broken when a node doesn't have a core for the collection -- Key: SOLR-4332 URL: https://issues.apache.org/jira/browse/SOLR-4332 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.1 Reporter: Eric Falcao Assignee: Mark Miller In SOLR-4321, it's documented that creating a collection via API results in some nodes having more than one core, while other nodes have zero cores. Not sure if this is desired behavior, but when a node doesn't know about a core, it throws a 404 on select/update. Reproduction: -Create a 2 node SolrCloud cluster -Create a new collection with numShards=1. 50% of your cluster will have a core for that collection. -Do an update or select against the node that doesn't have the core. 404 Like I said, not sure if this is desired behavior, but I would expect a cluster of nodes to be able to forward requests appropriately to nodes that have a core for the collection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4332) Adding documents to SolrCloud collection broken when a node doesn't have a core for the collection
[ https://issues.apache.org/jira/browse/SOLR-4332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller resolved SOLR-4332. --- Resolution: Duplicate Adding documents to SolrCloud collection broken when a node doesn't have a core for the collection -- Key: SOLR-4332 URL: https://issues.apache.org/jira/browse/SOLR-4332 Project: Solr Issue Type: New Feature Components: SolrCloud Affects Versions: 4.1 Reporter: Eric Falcao Assignee: Mark Miller In SOLR-4321, it's documented that creating a collection via API results in some nodes having more than one core, while other nodes have zero cores. Not sure if this is desired behavior, but when a node doesn't know about a core, it throws a 404 on select/update. Reproduction: -Create a 2 node SolrCloud cluster -Create a new collection with numShards=1. 50% of your cluster will have a core for that collection. -Do an update or select against the node that doesn't have the core. 404 Like I said, not sure if this is desired behavior, but I would expect a cluster of nodes to be able to forward requests appropriately to nodes that have a core for the collection. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4210) if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter
[ https://issues.apache.org/jira/browse/SOLR-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4210: -- Fix Version/s: (was: 4.0) (was: 4.0-BETA) 5.0 4.2 Assignee: Mark Miller Priority: Major (was: Critical) if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter --- Key: SOLR-4210 URL: https://issues.apache.org/jira/browse/SOLR-4210 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-BETA, 4.0 Reporter: Po Rui Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: SOLR-4210.patch Currently we only check the local collection or core when searching and don't look on other nodes. E.g., a cluster has 4 nodes: nodes 1, 2 and 3 host collection1, and nodes 2, 3 and 4 host collection2; a query for collection1 sent to node 4 fails. This leaves the search feature incomplete; it is a TODO in SolrDispatchFilter, line 220. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4210) if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter
[ https://issues.apache.org/jira/browse/SOLR-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585913#comment-13585913 ] Mark Miller commented on SOLR-4210: --- Thanks Po! Never saw this issue - I was just about to tackle this myself. I'll add some testing to your patch. if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter --- Key: SOLR-4210 URL: https://issues.apache.org/jira/browse/SOLR-4210 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-BETA, 4.0 Reporter: Po Rui Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: SOLR-4210.patch Currently we only check the local collection or core when searching and don't look on other nodes. E.g., a cluster has 4 nodes: nodes 1, 2 and 3 host collection1, and nodes 2, 3 and 4 host collection2; a query for collection1 sent to node 4 fails. This leaves the search feature incomplete; it is a TODO in SolrDispatchFilter, line 220. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4501) MoreLikeThisComponent is misusing the mlt.count parameter
Kiril A. created SOLR-4501: -- Summary: MoreLikeThisComponent is misusing the mlt.count parameter Key: SOLR-4501 URL: https://issues.apache.org/jira/browse/SOLR-4501 Project: Solr Issue Type: Bug Components: MoreLikeThis Affects Versions: 4.1 Reporter: Kiril A. There is probably a bug on line 144 of MoreLikeThisComponent.java, method process(). There is a call: {code} NamedList<DocList> sim = getMoreLikeThese(rb, rb.req.getSearcher(), rb.getResults().docList, mltcount); {code} The last argument (mltcount) is the number of similar documents to return for each result. However the signature of the called method getMoreLikeThese is: {code} NamedList<DocList> getMoreLikeThese(ResponseBuilder rb, SolrIndexSearcher searcher, DocList docs, int flags) {code} The last argument is flags, which should contain values like SolrIndexSearcher.GET_SCORES, etc. Please, could some developers confirm whether this is a bug? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
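[Editor's note] The suspected mix-up is easy to illustrate: an int intended as a result count is silently reinterpreted bit-wise when passed into a flags parameter. A self-contained sketch with made-up flag constants (not SolrIndexSearcher's actual values):

```java
// Illustrates the argument mix-up described above: a "count" int passed
// into a bit-flag parameter is silently reinterpreted as flags. The
// constants here are hypothetical, not SolrIndexSearcher's real ones.
public class FlagMixup {
    static final int GET_SCORES = 0x01;
    static final int GET_DOCSET = 0x40;

    static boolean wantsScores(int flags) {
        return (flags & GET_SCORES) != 0;
    }
}
```

So a caller passing `mltcount = 5` would accidentally turn on whichever flags happen to share bits with 5 - exactly the kind of silent misbehavior the reporter suspects.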
Re: Line length in Lucene/Solr code
I'd actually never bothered to look at the line limitation, that's from back when I started programming. Mostly I was just so happy that someone had short-circuited the endless 'whether braces should be on the same line or not' discussion. G P.S. the ''very'' is really an italic. P.P.S. Why programmers are different from the rest. Not _only_ have I been in the very 'where should the braces go' discussions at various points in my life, but there's a Wiki article that's far too long For brace style, I believe that lucene currently uses 1TBS. Where I work, we are expected to use Allman. Before starting here, I used 1TBS in my own code. Allman is easiest to follow, but uses up a lot of vertical real estate. I have no real opinion on whether brace style should change. A slightly different topic is whitespace on otherwise blank lines. There is no consistency in Lucene here. I have no strong opinion one way or the other, but I will note that the Eclipse format settings created by 'ant eclipse' add the whitespace. Getting back to the subject of this thread, I am torn. I use two programs to edit Solr code -- vi and eclipse. For vi (in PuTTY windows) 80 would be best. For eclipse (in windows 7), something like 100 would be better. I do not maximize program windows, because I like to see what's going on in background windows. My eclipse window is large, but does not use the whole 1600x1050 area. 120 seems large, but it would work. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4783) Inconsistent results, changing based on recent previous searches (caching?)
[ https://issues.apache.org/jira/browse/LUCENE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585935#comment-13585935 ] William Johnson commented on LUCENE-4783: - This is odd; as of this morning, I can't make it happen any more. This problem has been on-again off-again, and currently it's off. If it starts happening again, I'll see if I can find more specifics as to what in particular is happening at the time it happens, but for now I suppose I should close the issue as something I can't repeat. Inconsistent results, changing based on recent previous searches (caching?) --- Key: LUCENE-4783 URL: https://issues.apache.org/jira/browse/LUCENE-4783 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.1 Environment: Ubuntu Linux Java application running under Tomcat Reporter: William Johnson We have several repeatable cases where Lucene is returning different candidates for the same search, on the same (static) index, depending on what other searches have been run beforehand. It appears as though Lucene is failing to find matches in some cases if they have not been cached by a previous search. Specifically (although it is happening with more than just fuzzy searches), a fuzzy search on a misspelled street name returns no result. If you then search on the correctly spelled street name, and THEN return to the original fuzzy query on the original incorrect spelling, you now receive the result. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (LUCENE-4783) Inconsistent results, changing based on recent previous searches (caching?)
[ https://issues.apache.org/jira/browse/LUCENE-4783?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] William Johnson resolved LUCENE-4783. - Resolution: Cannot Reproduce Inconsistent results, changing based on recent previous searches (caching?) --- Key: LUCENE-4783 URL: https://issues.apache.org/jira/browse/LUCENE-4783 Project: Lucene - Core Issue Type: Bug Affects Versions: 4.1 Environment: Ubuntu Linux Java application running under Tomcat Reporter: William Johnson We have several repeatable cases where Lucene is returning different candidates for the same search, on the same (static) index, depending on what other searches have been run beforehand. It appears as though Lucene is failing to find matches in some cases if they have not been cached by a previous search. Specifically (although it is happening with more than just fuzzy searches), a fuzzy search on a misspelled street name returns no result. If you then search on the correctly spelled street name, and THEN return to the original fuzzy query on the original incorrect spelling, you now receive the result. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4457) Queries ending in question mark interpreted as wildcard
[ https://issues.apache.org/jira/browse/SOLR-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-4457: -- Attachment: SOLR-4457.patch First quick patch subclassing edismax, which can be used as a plugin for existing versions. Will proceed with proper code as an option inside edismax itself. Queries ending in question mark interpreted as wildcard --- Key: SOLR-4457 URL: https://issues.apache.org/jira/browse/SOLR-4457 Project: Solr Issue Type: Improvement Components: query parsers Reporter: Jan Høydahl Attachments: SOLR-4457.patch For many search applications, queries ending in a question mark such as {{foo bar?}} would *not* mean a search for a four-letter word starting with {{bar}}. Neither will it mean a literal search for a question mark. The query parsers should have an option to discard trailing question mark before passing to analysis. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
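[Editor's note] The proposed option could amount to a small pre-parse rewrite along these lines (a minimal sketch under assumed semantics, not the code in the attached patch):

```java
// Drop a trailing '?' so it is not parsed as a single-character wildcard,
// as proposed in SOLR-4457. A minimal sketch, not the actual patch code.
public class TrailingQuestionMark {
    static String strip(String q) {
        String t = q.trim();
        // Only a trailing '?' is discarded; an embedded '?' stays a wildcard.
        return t.endsWith("?") ? t.substring(0, t.length() - 1) : t;
    }
}
```

Applied before analysis, {{foo bar?}} would become {{foo bar}}, while {{te?t}} would still behave as a wildcard query.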
[jira] [Commented] (SOLR-4210) if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter
[ https://issues.apache.org/jira/browse/SOLR-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585960#comment-13585960 ] Mark Miller commented on SOLR-4210: --- Along with tests, I think we want to update this to work with binary (javabin, etc) as well as other urls beyond the 'select' handler. if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter --- Key: SOLR-4210 URL: https://issues.apache.org/jira/browse/SOLR-4210 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-BETA, 4.0 Reporter: Po Rui Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: SOLR-4210.patch Currently we only check the local collection or core when searching and don't look on other nodes. E.g., a cluster has 4 nodes: nodes 1, 2 and 3 host collection1, and nodes 2, 3 and 4 host collection2; a query for collection1 sent to node 4 fails. This leaves the search feature incomplete; it is a TODO in SolrDispatchFilter, line 220. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4502) ShardHandlerFactory not initialized in CoreContainer when creating a Core manually.
Michael Aspetsberger created SOLR-4502: -- Summary: ShardHandlerFactory not initialized in CoreContainer when creating a Core manually. Key: SOLR-4502 URL: https://issues.apache.org/jira/browse/SOLR-4502 Project: Solr Issue Type: Bug Affects Versions: 4.1 Reporter: Michael Aspetsberger We are using an embedded solr server for our unit testing purposes. In our scenario, we create a {{CoreContainer}} using only the solr-home path, and then create the cores manually using a {{CoreDescriptor}}. While the creation appears to work fine, it hits an NPE when it handles the search: {quote} Caused by: java.lang.NullPointerException at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) {quote} According to http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CE8A9BF60-5577-45F9-8BEA-B85616C6539D%40gmail.com%3E , this is due to a missing {{CoreContainer#load}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4471) Replication occurs even when a slave is already up to date.
[ https://issues.apache.org/jira/browse/SOLR-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13585978#comment-13585978 ] Raúl Grande commented on SOLR-4471: --- I have problems with the 4.2-SNAPSHOT version. My slaves don't replicate even when the master's version is higher than theirs. See image here: http://oi50.tinypic.com/o8uzad.jpg Why do the logs say Slave in sync with master when it clearly isn't? Replication occurs even when a slave is already up to date. --- Key: SOLR-4471 URL: https://issues.apache.org/jira/browse/SOLR-4471 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.1 Reporter: Andre Charton Assignee: Mark Miller Labels: master, replication, slave, version Fix For: 4.2, 5.0 Attachments: SOLR-4471.patch, SOLR-4471.patch, SOLR-4471.patch, SOLR-4471_TestRefactor.diff, SOLR-4471_Tests.patch Scenario: master/slave replication; the master's delta index runs every 10 minutes, the slave poll interval is 10 sec. There was an issue, SOLR-4413 - slave reads index from wrong directory - which made the slave do a full copy of the index from the master every time; that is fixed after applying the patch from SOLR-4413 (see script below). Now on replication the slave downloads only updated files, but the slave creates a new segment file and also a new index version (the generation is identical to the master's). On the next poll the slave downloads the full index again, because the new version on the slave forces a full copy. The problem is the new index version on the slave after the first replication. {noformat:apply patch SOLR-4413 script, please copy patch into patches directory before usage.} mkdir work cd work svn co http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_1/ cd lucene_solr_4_1 patch -p0 < ../../patches/SOLR-4413.patch cd solr ant dist {noformat} -- This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4503) Add REST API methods to get schema information: fields, dynamic fields, and field types
Steve Rowe created SOLR-4503: Summary: Add REST API methods to get schema information: fields, dynamic fields, and field types Key: SOLR-4503 URL: https://issues.apache.org/jira/browse/SOLR-4503 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Affects Versions: 4.1 Reporter: Steve Rowe Assignee: Steve Rowe Add REST methods that provide properties for fields, dynamic fields, and field types, using paths: /solr/(corename)/schema/fields /solr/(corename)/schema/fields/fieldname /solr/(corename)/schema/dynamicfields /solr/(corename)/schema/dynamicfields/pattern /solr/(corename)/schema/fieldtypes /solr/(corename)/schema/fieldtypes/typename -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4503) Add REST API methods to get schema information: fields, dynamic fields, and field types
[ https://issues.apache.org/jira/browse/SOLR-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-4503: - Attachment: SOLR-4503.patch Patch implementing the idea. No (functioning) tests yet. I've started the example server using the default solr home and the multicore solr home, and requests to all methods are functional from curl. Add REST API methods to get schema information: fields, dynamic fields, and field types --- Key: SOLR-4503 URL: https://issues.apache.org/jira/browse/SOLR-4503 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Affects Versions: 4.1 Reporter: Steve Rowe Assignee: Steve Rowe Attachments: SOLR-4503.patch Add REST methods that provide properties for fields, dynamic fields, and field types, using paths: /solr/(corename)/schema/fields /solr/(corename)/schema/fields/fieldname /solr/(corename)/schema/dynamicfields /solr/(corename)/schema/dynamicfields/pattern /solr/(corename)/schema/fieldtypes /solr/(corename)/schema/fieldtypes/typename -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4210) if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter
[ https://issues.apache.org/jira/browse/SOLR-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4210: -- Attachment: SOLR-4210.patch Here is a first pass at something more generic - it attempts to forward for all appropriate requests. Also has a simple test. if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter --- Key: SOLR-4210 URL: https://issues.apache.org/jira/browse/SOLR-4210 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-BETA, 4.0 Reporter: Po Rui Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: SOLR-4210.patch, SOLR-4210.patch It only check the local collection or core when searching, doesn't look on other nodes. e.g. a cluster have 4 nodes. 1th 2th 3th contribute to collection1. 2th 3th 4th contribute to collection2. now send query to 4th to searching collection1 will failed. It is an imperfect feature for searching. it is a TODO part in SolrDispatchFilter-line 220. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-4502) ShardHandlerFactory not initialized in CoreContainer when creating a Core manually.
[ https://issues.apache.org/jira/browse/SOLR-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-4502: - Assignee: Mark Miller ShardHandlerFactory not initialized in CoreContainer when creating a Core manually. --- Key: SOLR-4502 URL: https://issues.apache.org/jira/browse/SOLR-4502 Project: Solr Issue Type: Bug Affects Versions: 4.1 Reporter: Michael Aspetsberger Assignee: Mark Miller Labels: NPE We are using an embedded solr server for our unit testing purposes. In our scenario, we create a {{CoreContainer}} using only the solr-home path, and then create the cores manually using a {{CoreDescriptor}}. While the creation appears to work fine, it hits an NPE when it handles the search: {quote} Caused by: java.lang.NullPointerException at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) {quote} According to http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CE8A9BF60-5577-45F9-8BEA-B85616C6539D%40gmail.com%3E , this is due to a missing {{CoreContainer#load}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4502) ShardHandlerFactory not initialized in CoreContainer when creating a Core manually.
[ https://issues.apache.org/jira/browse/SOLR-4502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4502: -- Fix Version/s: 5.0 4.2 ShardHandlerFactory not initialized in CoreContainer when creating a Core manually. --- Key: SOLR-4502 URL: https://issues.apache.org/jira/browse/SOLR-4502 Project: Solr Issue Type: Bug Affects Versions: 4.1 Reporter: Michael Aspetsberger Assignee: Mark Miller Labels: NPE Fix For: 4.2, 5.0 We are using an embedded solr server for our unit testing purposes. In our scenario, we create a {{CoreContainer}} using only the solr-home path, and then create the cores manually using a {{CoreDescriptor}}. While the creation appears to work fine, it hits an NPE when it handles the search: {quote} Caused by: java.lang.NullPointerException at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:181) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1816) at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:150) {quote} According to http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201301.mbox/%3CE8A9BF60-5577-45F9-8BEA-B85616C6539D%40gmail.com%3E , this is due to a missing {{CoreContainer#load}}. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4471) Replication occurs even when a slave is already up to date.
[ https://issues.apache.org/jira/browse/SOLR-4471?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586019#comment-13586019 ] Amit Nithian commented on SOLR-4471: I'm curious about something. What does: 1) http://localhost:17045/solr/replication?command=indexversion yield 2) http://localhost:your slave port/solr/replication?command=details yield I have noticed that what you see on the UI vs what you see in the indexversion and details sometimes differ and I wonder if that is a culprit here? Does #1,#2 jive with the versions that you see in the UI? Replication occurs even when a slave is already up to date. --- Key: SOLR-4471 URL: https://issues.apache.org/jira/browse/SOLR-4471 Project: Solr Issue Type: Bug Components: replication (java) Affects Versions: 4.1 Reporter: Andre Charton Assignee: Mark Miller Labels: master, replication, slave, version Fix For: 4.2, 5.0 Attachments: SOLR-4471.patch, SOLR-4471.patch, SOLR-4471.patch, SOLR-4471_TestRefactor.diff, SOLR-4471_Tests.patch Scenario: master/slave replication, master delta index runs every 10 minutes, slave poll interval is 10 sec. There was an issue SOLR-4413 - slave reads index from wrong directory - so the slave did a full copy of the index from the master every time; this is fixed after applying the patch from 4413 (see script below). Now on replication the slave downloads only the updated files, but the slave creates a new segment file and also a new version of the index (the generation is identical with the master). On the next poll the slave downloads the full index again, because the new version on the slave forces a full copy. The problem is the new version of the index on the slave after the first replication. 
{noformat:title=apply patch SOLR-4413 script, please copy the patch into a patches directory before usage}
mkdir work
cd work
svn co http://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_4_1/
cd lucene_solr_4_1
patch -p0 -i ../../patches/SOLR-4413.patch
cd solr
ant dist
{noformat}
Re: Line length in Lucene/Solr code
: I am all for a larger limit as I find that it makes Java code a lot more
: readable. With current tools, Java code needs to be formatted using line

Aim for 80 chars, but don't shoehorn things if they are more readable on a single long line
-Hoss's Law of Code Line Lengths

-Hoss
[jira] [Assigned] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man reassigned SOLR-4373: -- Assignee: Mark Miller (was: Hoss Man) Mark: i really don't have anything to offer on this issue beyond the comments i already posted ... based on my testing it really _seems_ like this is caused by SOLR-4063, but that may be totally off base since Alexandre couldn't make the problem go away using the workaround i thought i found (ie: forcing single threaded core init) since i couldn't reproduce a similar collision between multicores with a simple stopwords file loaded from the classpath, it also seems likely that this relates to SPI loading: but since i don't really understand at all how NamedSPILoader works in a multi-classloader application (and since my experiments in forcing synchronization on it didn't solve the problem for me) i'm at a dead end there as well In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Assignee: Mark Miller Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. 
The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. 
[jira] [Assigned] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-4373: - Assignee: (was: Mark Miller) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) 
at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586097#comment-13586097 ] Mark Miller commented on SOLR-4373: --- Oh, I don't plan on looking into it. Just assigned it to you so that it wasn't forgotten, and you seemed to have some knowledge in this area. I don't really have a need for more than a single lib dir ever pretty much. In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused 
by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4503) Add REST API methods to get schema information: fields, dynamic fields, and field types
[ https://issues.apache.org/jira/browse/SOLR-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586112#comment-13586112 ] Steve Rowe commented on SOLR-4503: -- The patch adds two dependencies: Restlet and the Restlet servlet extension. All REST methods are implemented as Restlet ServerResource subclasses, which delegate to new self-reporting methods on IndexField and FieldType, the implementation of which was inspired by/stolen from LukeRequestHandler. SolrDispatchFilter figures out the core, creates a SolrRequest and a SolrResponse, sets them on SolrRequestInfo's thread local, then passes the request (via filter chaining or request forwarding) to the Restlet servlet defined to handle schema requests. Based on the URL path, the Restlet servlet's router then sends the request to the appropriate ServerResource subclass, where the response is filled in. There is no RequestHandler involved in servicing these requests. I've turned off Restlet's content negotiation facilities in favor of using Solr's wt parameter to specify the ResponseWriter. At present, both GET and HEAD requests work for all six requests. (Restlet uses GET methods to service HEAD requests, so there was very little coding required to do this.) Add REST API methods to get schema information: fields, dynamic fields, and field types --- Key: SOLR-4503 URL: https://issues.apache.org/jira/browse/SOLR-4503 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Affects Versions: 4.1 Reporter: Steve Rowe Assignee: Steve Rowe Attachments: SOLR-4503.patch Add REST methods that provide properties for fields, dynamic fields, and field types, using paths: /solr/(corename)/schema/fields /solr/(corename)/schema/fields/fieldname /solr/(corename)/schema/dynamicfields /solr/(corename)/schema/dynamicfields/pattern /solr/(corename)/schema/fieldtypes /solr/(corename)/schema/fieldtypes/typename -- This message is automatically generated by JIRA. 
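The six paths listed in this issue share one URL layout. As a small illustrative sketch (plain Java; the class name, host, port, and core name here are hypothetical, while the /solr/(corename)/schema/... layout and the wt response-writer parameter come from the issue itself):

```java
// Hedged sketch: builds request URLs for the schema REST methods listed
// in the issue. SchemaUrls and its helper are hypothetical names; the
// host/port and core name are illustrative assumptions. Only the path
// layout and the wt parameter are taken from the issue text.
public class SchemaUrls {
  public static String schemaUrl(String base, String core, String resource) {
    // resource is e.g. "fields", "fields/title", "dynamicfields",
    // "fieldtypes/text_general" -- the six shapes from the issue.
    return base + "/solr/" + core + "/schema/" + resource + "?wt=json";
  }
}
```

For example, schemaUrl("http://localhost:8983", "collection1", "fields") would yield http://localhost:8983/solr/collection1/schema/fields?wt=json, which a client could then fetch with a plain GET (or HEAD, per the comment above).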
[jira] [Commented] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586122#comment-13586122 ] Alexandre Rafalovitch commented on SOLR-4373: - I have just rechecked and I still have the problem with the coreLoadThreads=1 setting under Solr 4.1. Every time I start Solr a different subset of cores fails. I think reloading a core triggers this as well, though it is harder to check. I have the whole set of examples structured around getting this to work (each example is a separate core). Is there something I can do to help troubleshoot this? I haven't tried working with the Solr source yet, but I am a Java developer and can dig around if there is some sort of information about where library references are stored. In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. 
The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. 
[jira] [Commented] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586128#comment-13586128 ] Uwe Schindler commented on SOLR-4373: - Hoss: The concurrency issue in NamedSPILoader can indeed be solved by making the reload method synchonized. The services field is volatile, so readers will in any case see the correct value. The thread safety problem is *inside* this method. In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at 
java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4481) SwitchQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586135#comment-13586135 ] Commit Tag Bot commented on SOLR-4481: -- [trunk commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revision&revision=1449809 SOLR-4481: SwitchQParserPlugin registered by default as 'switch' SwitchQParserPlugin --- Key: SOLR-4481 URL: https://issues.apache.org/jira/browse/SOLR-4481 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Attachments: SOLR-4481.patch Inspired by a conversation i had with someone on IRC a while back about using append fq params + local params to create custom request params, it occurred to me that it would be handy to have a switch qparser that could be configured with some set of fixed switch case localparams that it would delegate to based on its input string.
[jira] [Commented] (LUCENE-4771) Query-time join collectors could maybe be more efficient
[ https://issues.apache.org/jira/browse/LUCENE-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586137#comment-13586137 ] Robert Muir commented on LUCENE-4771: - Thanks for updating, Martijn! I plan to look at this later tonight and work on pulling out the BitsFilteredTermsEnum and making it more efficient. After that, I think we should revisit the intersection (I started with something ultra-simple here) to make sure it's optimal too. Somehow we should also try to come up with a standard benchmark (luceneutil or similar) so that we can test the approach for the single-valued case there too. My intuition says it can be a win in both cases. Query-time join collectors could maybe be more efficient Key: LUCENE-4771 URL: https://issues.apache.org/jira/browse/LUCENE-4771 Project: Lucene - Core Issue Type: Improvement Components: modules/join Reporter: Robert Muir Attachments: LUCENE-4771_prototype.patch, LUCENE-4771-prototype.patch, LUCENE-4771_prototype_without_bug.patch I was looking @ these collectors on LUCENE-4765 and I noticed: * SingleValued collector (SV) pulls FieldCache.getTerms and adds the bytes to a bytesrefhash per-collect. * MultiValued collector (MV) pulls FieldCache.getDocTermsOrds, but doesn't use the ords, just looks up each value and adds the bytes per-collect. I think instead it's worth investigating if SV should use getTermsIndex, and both collectors just collect-up their per-segment ords in something like a BitSet[maxOrd]. When asked for the terms at the end in getCollectorTerms(), they could merge these into one BytesRefHash. Of course, if you are going to turn around and execute the query against the same searcher anyway (is this the typical case?), this could even be more efficient: No need to hash or instantiate all the terms in memory, we could postpone the lookups to SeekingTermSetTermsEnum.accept()/nextSeekTerm() i think... somehow :) -- This message is automatically generated by JIRA. 
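The ord-based collection idea from the comment above can be sketched in a few lines of plain Java. Here java.util.BitSet and a String[] stand in for Lucene's per-segment ordinals and terms dictionary; OrdCollector and its members are hypothetical names for illustration, not classes from the patch:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Hedged sketch: instead of copying term bytes into a hash on every
// collect() call, record only the ordinal of each collected value in a
// BitSet sized to the ord count, and resolve ords to terms once at the end.
class OrdCollector {
  private final BitSet seenOrds;
  private final String[] ordToTerm; // stand-in for a segment's terms dictionary

  OrdCollector(String[] ordToTerm) {
    this.ordToTerm = ordToTerm;
    this.seenOrds = new BitSet(ordToTerm.length); // "BitSet[maxOrd]" idea
  }

  void collect(int ord) {
    seenOrds.set(ord); // O(1) per collect, no byte copying or hashing
  }

  List<String> terms() {
    // One lookup pass at the end, in ord (i.e. sorted term) order.
    List<String> out = new ArrayList<>();
    for (int ord = seenOrds.nextSetBit(0); ord >= 0; ord = seenOrds.nextSetBit(ord + 1)) {
      out.add(ordToTerm[ord]);
    }
    return out;
  }
}
```

In the real multi-segment case each segment would get its own BitSet keyed by its own ord space, and the final merge into one BytesRefHash would happen in getCollectorTerms(), as the comment suggests.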
[jira] [Created] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
Hoss Man created LUCENE-4796: Summary: NamedSPILoader.reload needs to be synchronized Key: LUCENE-4796 URL: https://issues.apache.org/jira/browse/LUCENE-4796 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.2, 5.0 Spun off of SOLR-4373: as discussed with uwe on IRC, NamedSPILoader.reload is not thread safe: it reads from this.services at the beginning of the method, makes additions based on the method input, and then overwrites this.services at the end of the method. if the method is called by two threads concurrently, the entries added by threadB could be lost if threadA enters the method before threadB and exits the method after threadB
[jira] [Commented] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
[ https://issues.apache.org/jira/browse/LUCENE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586144#comment-13586144 ] Uwe Schindler commented on LUCENE-4796: --- Hoss: I agree! The concurrency issue in NamedSPILoader can indeed be solved by making the reload method synchronized. The services field is volatile, so readers will in any case see the correct value, otherwise all methods would need to be synchronized. By using a volatile, we only need to synchronize this single method. NamedSPILoader.reload needs to be synchronized -- Key: LUCENE-4796 URL: https://issues.apache.org/jira/browse/LUCENE-4796 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.2, 5.0 Spun off of SOLR-4373: as discussed with uwe on IRC, NamedSPILoader.reload is not thread safe: it reads from this.services at the beginning of the method, makes additions based on the method input, and then overwrites this.services at the end of the method. if the method is called by two threads concurrently, the entries added by threadB could be lost if threadA enters the method before threadB and exits the method after threadB
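A minimal sketch of the pattern Uwe describes, with hypothetical names rather than the real NamedSPILoader source: the volatile field gives readers lock-free visibility of the latest snapshot, while synchronized serializes the read-modify-publish sequence in reload so concurrent callers cannot lose each other's additions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hedged sketch (SpiRegistry is a hypothetical stand-in, not the Lucene
// class): a volatile map that reload() replaces wholesale. Without the
// synchronized keyword, two concurrent reload() calls could each read the
// old map, add their own entries, and then overwrite each other's result,
// losing additions -- exactly the race described in the issue.
class SpiRegistry {
  private volatile Map<String, String> services = new LinkedHashMap<>();

  // synchronized serializes read -> copy -> modify -> publish;
  // the volatile write makes the new snapshot visible to lock-free readers.
  public synchronized void reload(String name, String impl) {
    Map<String, String> copy = new LinkedHashMap<>(services);
    copy.put(name, impl);
    services = copy; // publish the new snapshot
  }

  public String lookup(String name) {
    return services.get(name); // no lock needed: reads one volatile field
  }
}
```

Because readers only ever dereference the single volatile field, only the writer path needs the lock, which matches the comment's point that making reload synchronized is sufficient.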
[jira] [Created] (LUCENE-4797) Fix remaining Lucene/Solr Javadocs issue
Uwe Schindler created LUCENE-4797: - Summary: Fix remaining Lucene/Solr Javadocs issue Key: LUCENE-4797 URL: https://issues.apache.org/jira/browse/LUCENE-4797 Project: Lucene - Core Issue Type: Bug Components: general/javadocs Affects Versions: 4.1 Reporter: Uwe Schindler Java 8 has a new feature (enabled by default): http://openjdk.java.net/jeps/172 It fails the build on: - incorrect links (@see, @link,...) - incorrect HTML entities - invalid HTML in general Thanks to our linter written in HTMLTidy and Python, most of the bugs are already solved in our source code, but the Oracle linter finds some more problems that our linter does not: - missing escapes - invalid entities Unfortunately the versions of JDK8 released up to today have a bug, making optional closing tags (which are valid HTML4), like </p>, mandatory. This will be fixed in b78. Currently there is another bug in the Oracle javadocs tool (it fails to copy doc-files folders), but this is under investigation at the moment. We should clean up our javadocs, so they pass the new JDK8 javadocs tool with build 78+. Maybe we can put our own linter out of service, once we rely on Java 8 :-) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586175#comment-13586175 ] Alexandre Rafalovitch commented on SOLR-4373: - Isn't that for SPIs only? Does that cover TokenFactories, etc? I think the libraries actually use SolrResourceLoader#reloadLuceneSPI instead. I wonder if the problem is related to SolrResourceLoader#createClassLoader: {code:java} if ( null == parent ) { parent = Thread.currentThread().getContextClassLoader(); } {code} How does this work with multiple threads? In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. 
The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
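Alexandre's question about Thread.currentThread().getContextClassLoader() in the snippet above is worth unpacking: the context class loader is a per-thread property, so when cores are created on different pool threads the fallback can resolve to different parents. A minimal sketch demonstrating this (hypothetical demo class, not Solr code):

```java
// Illustrative sketch (not Solr source): resolveParent mirrors the
// fallback in the quoted createClassLoader snippet. Because the context
// class loader is per-thread state, the result depends on which thread
// performs the call.
public class ContextLoaderDemo {
  static ClassLoader resolveParent(ClassLoader parent) {
    if (null == parent) {
      parent = Thread.currentThread().getContextClassLoader();
    }
    return parent;
  }

  public static void main(String[] args) throws Exception {
    ClassLoader custom = new ClassLoader() {}; // a distinct loader for the demo
    Thread t = new Thread(() ->
        // this thread carries a different context loader than main does
        System.out.println(resolveParent(null) == custom));
    t.setContextClassLoader(custom);
    t.start();
    t.join();
  }
}
```

If core-loading threads in a pool inherit or set different context loaders, two cores can thus end up with different classloader parents — one plausible way lib directives could interfere across cores.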
[jira] [Commented] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586182#comment-13586182 ] Hoss Man commented on SOLR-4373: Alex: I just noticed something in one of your earlier comments... bq. I am unable to get the problem go away by using coreLoadThreads=1: cores adminPath=/admin/cores coreLoadThreads=1 ...the coreLoadThreads option should be on the {{solr/}} element, not the {{cores/}} element, can you please test that again? Other comments.. bq. Is there something I can do to help troubleshooting this? I haven't tried working with Solr source yet, but I am a Java developer and can dig around if there is some sort of information at where library references are stored. primarily we build up a ClassLoader per SolrCore in SolrResourceLoader, each of which hangs off of the parent classloader for the webapp -- but the use of SPI in lucene complicates things in ways i still don't fully understand. bq. Isn't that for SPIs only? Does that cover TokenFactories Yes, many of the various factories in Solr are handled using SPI now (take a look at SolrResourceLoader.reloadLuceneSPI()) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. 
The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Resolved] (SOLR-4481) SwitchQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man resolved SOLR-4481. Resolution: Fixed Fix Version/s: 5.0 4.2 I went ahead and committed a version using the following syntax... {noformat} {!switch case=XXX case.foo=YYY case.bar=ZZZ default=QQQ}foo {noformat} Committed revision 1449809. Committed revision 1449823. SwitchQParserPlugin --- Key: SOLR-4481 URL: https://issues.apache.org/jira/browse/SOLR-4481 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.2, 5.0 Attachments: SOLR-4481.patch Inspired by a conversation i had with someone on IRC a while back about using append fq params + local params to create custom request params, it occurred to me that it would be handy to have a switch qparser that could be configured with some set of fixed switch case localparams that it would delegate to based on its input string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
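To illustrate the "append fq params + local params" motivation behind the committed syntax, here is a hypothetical solrconfig.xml fragment. The handler name, field names, and the usertype parameter are invented for this sketch and are not taken from the committed code:

```xml
<!-- Hedged sketch: an appended fq whose effect is switched by a custom
     request param. The switch parser's input comes from v=$usertype; each
     case.* localparam names the query to delegate to for that input. -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="appends">
    <str name="fq">{!switch case='*:*'
                            case.premium='tier:premium'
                            case.trial='tier:trial'
                            default='tier:public'
                            v=$usertype}</str>
  </lst>
</requestHandler>
```

With this in place, a request carrying usertype=premium would have tier:premium appended as a filter, an empty usertype would match the bare case, and any unrecognized value would fall through to default.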
[jira] [Commented] (SOLR-4481) SwitchQParserPlugin
[ https://issues.apache.org/jira/browse/SOLR-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586198#comment-13586198 ] Commit Tag Bot commented on SOLR-4481: -- [branch_4x commit] Chris M. Hostetter http://svn.apache.org/viewvc?view=revisionrevision=1449823 SOLR-4481: SwitchQParserPlugin registered by default as 'switch' (merge r1449809) SwitchQParserPlugin --- Key: SOLR-4481 URL: https://issues.apache.org/jira/browse/SOLR-4481 Project: Solr Issue Type: New Feature Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.2, 5.0 Attachments: SOLR-4481.patch Inspired by a conversation i had with someone on IRC a while back about using append fq params + local params to create custom request params, it occurred to me that it would be handy to have a switch qparser that could be configured with some set of fixed switch case localparams that it would delegate to based on its input string. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586217#comment-13586217 ] Alexandre Rafalovitch commented on SOLR-4373: - Ok, adding the flag on solr element seems to have fixed the problem. My bad. Does it mean we know where the problem is? However, on the other point, I don't see NamedSPILoader being called from SolrResourceLoader.reloadLuceneSPI(). Rather, I see AnalysisSPILoader. Not sure if the difference is significant. In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4210) if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter
[ https://issues.apache.org/jira/browse/SOLR-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4210: -- Attachment: SOLR-4210.patch New patch: tightened up, more testing. if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter --- Key: SOLR-4210 URL: https://issues.apache.org/jira/browse/SOLR-4210 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-BETA, 4.0 Reporter: Po Rui Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: SOLR-4210.patch, SOLR-4210.patch, SOLR-4210.patch It only check the local collection or core when searching, doesn't look on other nodes. e.g. a cluster have 4 nodes. 1th 2th 3th contribute to collection1. 2th 3th 4th contribute to collection2. now send query to 4th to searching collection1 will failed. It is an imperfect feature for searching. it is a TODO part in SolrDispatchFilter-line 220. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-4480) EDisMax parser blows up with query containing single plus or minus
[ https://issues.apache.org/jira/browse/SOLR-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jan Høydahl updated SOLR-4480: -- Attachment: SOLR-4480.patch Updated patch, also catching exceptions in the case where the +/- comes after a leading whitespace. What do people think about the solution? Plan to commit soon. EDisMax parser blows up with query containing single plus or minus -- Key: SOLR-4480 URL: https://issues.apache.org/jira/browse/SOLR-4480 Project: Solr Issue Type: Bug Components: query parsers Reporter: Fiona Tay Priority: Critical Fix For: 4.2, 5.0 Attachments: SOLR-4480.patch, SOLR-4480.patch We are running solr with sunspot and when we set up a query containing a single plus, Solr blows up with the following error: SOLR Request (5.0ms) [ path=#RSolr::Client:0x4c7464ac parameters={data: fq=type%3A%28Attachment+OR+User+OR+GpdbDataSource+OR+HadoopInstance+OR+GnipInstance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29fq=type_name_s%3A%28Attachment+OR+User+OR+Instance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29fq=-%28security_type_name_sm%3A%28Dataset%29+AND+-instance_account_ids_im%3A%282+OR+1%29%29fq=-%28security_type_name_sm%3AChorusView+AND+-member_ids_im%3A1+AND+-public_b%3Atrue%29fq=-%28security_type_name_sm%3A%28Dataset%29+AND+-instance_account_ids_im%3A%282+OR+1%29%29fq=-%28security_type_name_sm%3AChorusView+AND+-member_ids_im%3A1+AND+-public_b%3Atrue%29q=%2Bfl=%2A+scoreqf=name_texts+first_name_texts+last_name_texts+file_name_textsdefType=edismaxhl=onhl.simple.pre=%40%40%40hl%40%40%40hl.simple.post=%40%40%40endhl%40%40%40start=0rows=3, method: post, params: {:wt=:ruby}, query: wt=ruby, headers: {Content-Type=application/x-www-form-urlencoded; charset=UTF-8}, path: select, uri: http://localhost:8982/solr/select?wt=ruby, open_timeout: , read_timeout: } ] RSolr::Error::Http (RSolr::Error::Http - 400 Bad Request Error: org.apache.lucene.queryParser.ParseException: Cannot parse '': Encountered EOF at 
line 1, column 0. Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
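The patch discussion above concerns a query consisting of a lone "+" or "-", which reaches the parser with nothing for the operator to apply to. A hypothetical sanitizer illustrating the general kind of guard involved (this is not the actual SOLR-4480 patch, just a sketch of the idea):

```java
// Hedged sketch (not the real edismax fix): treat a query that is only a
// dangling prefix operator -- including one left after leading whitespace,
// the case the updated patch covers -- as an empty query rather than
// letting the parser hit EOF.
class QuerySanitizer {
  static String sanitize(String q) {
    if (q == null) return "";
    String t = q.trim(); // handles the leading-whitespace case
    // a bare "+" or "-" has no term to modify; drop it
    if (t.equals("+") || t.equals("-")) return "";
    return t;
  }
}
```

The real fix additionally catches the underlying ParseException inside the parser; this sketch only shows the input-side normalization.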
[jira] [Updated] (SOLR-4210) Requests to a Collection that does not exist on the receiving node should be proxied to a suitable node.
[ https://issues.apache.org/jira/browse/SOLR-4210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-4210: -- Summary: Requests to a Collection that does not exist on the receiving node should be proxied to a suitable node. (was: if couldn't find the collection locally when searching, we should look on other nodes. one of TODOs part in SolrDispatchFilter) Requests to a Collection that does not exist on the receiving node should be proxied to a suitable node. Key: SOLR-4210 URL: https://issues.apache.org/jira/browse/SOLR-4210 Project: Solr Issue Type: Improvement Components: SolrCloud Affects Versions: 4.0-BETA, 4.0 Reporter: Po Rui Assignee: Mark Miller Fix For: 4.2, 5.0 Attachments: SOLR-4210.patch, SOLR-4210.patch, SOLR-4210.patch It only check the local collection or core when searching, doesn't look on other nodes. e.g. a cluster have 4 nodes. 1th 2th 3th contribute to collection1. 2th 3th 4th contribute to collection2. now send query to 4th to searching collection1 will failed. It is an imperfect feature for searching. it is a TODO part in SolrDispatchFilter-line 220. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586239#comment-13586239 ] Uwe Schindler commented on SOLR-4373: - This is how it is called, just indirect. In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores. The exception (for the earlier core) is: at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369) at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) at java.lang.Thread.run(Thread.java:722) Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory' at 
org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177) at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95) at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43) at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151) The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated SOLR-3843: -- Attachment: SOLR-3843.patch Here's the start to a patch (I haven't tested the build with it or looked at maven and so on). This adds the codecs jar and enables SchemaCodecFactory by default: so the format for postings lists and docvalues can be customized easily in the fieldtype. I didn't want to turn this factory on by default because of SOLR-4417, but Mark fixed that. Add lucene-codecs to Solr libs? --- Key: SOLR-3843 URL: https://issues.apache.org/jira/browse/SOLR-3843 Project: Solr Issue Type: Wish Affects Versions: 4.0 Reporter: Adrien Grand Priority: Critical Fix For: 4.2, 5.0 Attachments: SOLR-3843.patch Solr gives the ability to its users to select the postings format to use on a per-field basis but only Lucene40PostingsFormat is available by default (unless users add lucene-codecs to the Solr lib directory). Maybe we should add lucene-codecs to Solr libs (I mean in the WAR file) so that people can try our non-default postings formats with minimum effort? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4492) Please add support for Collection API CREATE method to evenly distribute leader roles among instances
[ https://issues.apache.org/jira/browse/SOLR-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586253#comment-13586253 ] Tim Vaillancourt commented on SOLR-4492: I like the logic mentioned. As an Ops guy I'm basically looking for a least leader algo, ie: make the winner the node with the least leader roles, otherwise random. Please add support for Collection API CREATE method to evenly distribute leader roles among instances - Key: SOLR-4492 URL: https://issues.apache.org/jira/browse/SOLR-4492 Project: Solr Issue Type: New Feature Components: SolrCloud Reporter: Tim Vaillancourt Priority: Minor Currently in SolrCloud 4.1, a CREATE call to the Collection API will cause the server receiving the CREATE call to become the leader of all shards. I would like to ask for the ability for the CREATE call to evenly distribute the leader role across all instances, ie: if I create 3 shards over 3 SOLR 4.1 instances, each instance/node would only be the leader of 1 shard. This would be logically consistent with the way replicas are randomly distributed by this same call across instances/nodes. Currently, this CREATE call will cause the server receiving the call to become the leader of 3 shards. curl -v 'http://HOST:8983/solr/admin/collections?action=CREATEname=testnumShards=3replicationFactor=2maxShardsPerNode=2' PS: Thank you SOLR developers for your contributions! Tim Vaillancourt -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
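The "least leader" policy Tim describes can be sketched in a few lines. This is a hypothetical illustration of the selection rule only (node names and the surrounding class are invented; it is not Solr's overseer code):

```java
import java.util.*;

// Sketch of the policy requested above: make the winner the node with the
// fewest current leader roles, breaking ties randomly so placement stays
// evenly spread across the cluster.
class LeaderPlacer {
  static String pickLeader(Map<String, Integer> leaderCounts, Random rnd) {
    int min = Collections.min(leaderCounts.values());
    List<String> candidates = new ArrayList<>();
    for (Map.Entry<String, Integer> e : leaderCounts.entrySet()) {
      if (e.getValue() == min) candidates.add(e.getKey()); // tied for least
    }
    return candidates.get(rnd.nextInt(candidates.size())); // random tiebreak
  }
}
```

Applied shard by shard during CREATE (incrementing the winner's count after each pick), this would spread leader roles evenly instead of concentrating them on the node that received the request.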
[jira] [Commented] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
[ https://issues.apache.org/jira/browse/LUCENE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586257#comment-13586257 ] Uwe Schindler commented on LUCENE-4796: --- We have to fix the same issue in AnalysisSPILoader which is (unfortunately) a different class with some code duplication (in analysis/common module). NamedSPILoader.reload needs to be synchronized -- Key: LUCENE-4796 URL: https://issues.apache.org/jira/browse/LUCENE-4796 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.2, 5.0 Spun off of SOLR-4373: as discussed with uwe on IRC, NamedSPILoader.reload is not thread safe: it reads from this.services at the beginning of the method, makes additions based on the method input, and then overwrites this.services at the end of the method. if the method is called by two threads concurrently, the entries added by threadB could be lost if threadA enters the method before threadB and exits the method after threadB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4771) Query-time join collectors could maybe be more efficient
[ https://issues.apache.org/jira/browse/LUCENE-4771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586258#comment-13586258 ] Martijn van Groningen commented on LUCENE-4771: --- I also think that this approach will improve single-valued joining too. Just collecting the ordinals and fetching the actual terms on the fly without hashing should be much faster. Just wondering how to make a standard benchmark. Usually when I test joining I generate random docs. Simple docs with random `from` values and docs with matching `to` values and also have different `from` to `to` docs ratios. Maybe we can use the stackoverflow dataset (join questions and answers) as test dataset with relational like data. Not sure if this is possible licence wise. Query-time join collectors could maybe be more efficient Key: LUCENE-4771 URL: https://issues.apache.org/jira/browse/LUCENE-4771 Project: Lucene - Core Issue Type: Improvement Components: modules/join Reporter: Robert Muir Attachments: LUCENE-4771_prototype.patch, LUCENE-4771-prototype.patch, LUCENE-4771_prototype_without_bug.patch I was looking @ these collectors on LUCENE-4765 and I noticed: * SingleValued collector (SV) pulls FieldCache.getTerms and adds the bytes to a bytesrefhash per-collect. * MultiValued collector (MV) pulls FieldCache.getDocTermsOrds, but doesnt use the ords, just looks up each value and adds the bytes per-collect. I think instead its worth investigating if SV should use getTermsIndex, and both collectors just collect-up their per-segment ords in something like a BitSet[maxOrd]. When asked for the terms at the end in getCollectorTerms(), they could merge these into one BytesRefHash. 
Of course, if you are going to turn around and execute the query against the same searcher anyway (is this the typical case?), this could even be more efficient: No need to hash or instantiate all the terms in memory, we could do postpone the lookups to SeekingTermSetTermsEnum.accept()/nextSeekTerm() i think... somehow :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
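The collect-ords-then-resolve idea in this thread can be shown with a toy model. This sketch uses plain arrays in place of Lucene's term dictionaries and FixedBitSet (all names are invented for the illustration):

```java
import java.util.*;

// Toy model of the proposal above, without Lucene types: per segment,
// collect matched ordinals into a bit set (cheap per-hit work), then
// resolve ords to terms once at the end and merge across segments,
// instead of hashing term bytes on every collect() call.
class OrdJoinSketch {
  // segmentTerms[seg][ord] is the term for that ordinal within a segment
  static Set<String> collectTerms(String[][] segmentTerms, int[][] hitOrds) {
    Set<String> joined = new LinkedHashSet<>();
    for (int seg = 0; seg < segmentTerms.length; seg++) {
      BitSet seen = new BitSet(segmentTerms[seg].length);
      for (int ord : hitOrds[seg]) {
        seen.set(ord); // per-hit cost: a single bit set, no term lookup
      }
      // one pass per segment to materialize terms and merge the results
      for (int ord = seen.nextSetBit(0); ord >= 0; ord = seen.nextSetBit(ord + 1)) {
        joined.add(segmentTerms[seg][ord]);
      }
    }
    return joined;
  }
}
```

The per-hit work shrinks to a bit set, and term materialization happens at most once per distinct ordinal per segment — which is the efficiency win being discussed.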
[jira] [Updated] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
[ https://issues.apache.org/jira/browse/LUCENE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4796: -- Attachment: LUCENE-4796.patch NamedSPILoader.reload needs to be synchronized -- Key: LUCENE-4796 URL: https://issues.apache.org/jira/browse/LUCENE-4796 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Assignee: Hoss Man Fix For: 4.2, 5.0 Attachments: LUCENE-4796.patch Spun off of SOLR-4373: as discussed with uwe on IRC, NamedSPILoader.reload is not thread safe: it reads from this.services at the beginning of the method, makes additions based on the method input, and then overwrites this.services at the end of the method. if the method is called by two threads concurrently, the entries added by threadB could be lost if threadA enters the method before threadB and exits the method after threadB -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4503) Add REST API methods to get schema information: fields, dynamic fields, and field types
[ https://issues.apache.org/jira/browse/SOLR-4503?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586264#comment-13586264 ] Steve Rowe commented on SOLR-4503: -- I'm interested in what other people think of using Restlet in Solr - this issue, in part, is about exploring *how* to do that. Restlet brings some baggage: * Non-RequestHandler-based Restlet actions aren't available (as currently written anyway) via EmbeddedSolrServer, which only knows how to deal with requests that have RequestHandlers. * Restlet's artifacts aren't deployed to Maven Central - instead, they host their own Maven repository. I was worried that having dependencies drawn from 3rd party Maven repositories would cause trouble, so I deployed to the ASF staging repository a fake Solr release including the two Restlet dependencies in the Solr core POM, and the quality checks performed as part of closing didn't flag this as a problem, so I think using Restlet will not block Lucene or Solr from deploying to Maven Central. Restlet should make some things easier, though, e.g. the PUT and DELETE methods are usable. Add REST API methods to get schema information: fields, dynamic fields, and field types --- Key: SOLR-4503 URL: https://issues.apache.org/jira/browse/SOLR-4503 Project: Solr Issue Type: Sub-task Components: Schema and Analysis Affects Versions: 4.1 Reporter: Steve Rowe Assignee: Steve Rowe Attachments: SOLR-4503.patch Add REST methods that provide properties for fields, dynamic fields, and field types, using paths: /solr/(corename)/schema/fields /solr/(corename)/schema/fields/(fieldname) /solr/(corename)/schema/dynamicfields /solr/(corename)/schema/dynamicfields/(pattern) /solr/(corename)/schema/fieldtypes /solr/(corename)/schema/fieldtypes/(typename)
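The path layout listed in the issue description can be captured by a small helper. This is purely a hypothetical illustration of the documented URL scheme (the name SchemaPaths and its signature are invented here; the real implementation routes these paths through Restlet resources):

```java
// Hypothetical helper that builds the schema REST paths listed in the issue
// description. "core", "resource", and "name" are caller-supplied; a null
// name yields the collection-level path (e.g. /solr/collection1/schema/fields).
final class SchemaPaths {
    static String path(String core, String resource, String name) {
        String base = "/solr/" + core + "/schema/" + resource;
        return (name == null) ? base : base + "/" + name;
    }
}
```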
[jira] [Commented] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586265#comment-13586265 ] Robert Muir commented on SOLR-3843: --- Smoketesting passes with this patch. But I am not sure if anything should/needs to be changed in Maven. Add lucene-codecs to Solr libs? --- Key: SOLR-3843 URL: https://issues.apache.org/jira/browse/SOLR-3843 Project: Solr Issue Type: Wish Affects Versions: 4.0 Reporter: Adrien Grand Priority: Critical Fix For: 4.2, 5.0 Attachments: SOLR-3843.patch Solr gives the ability to its users to select the postings format to use on a per-field basis, but only Lucene40PostingsFormat is available by default (unless users add lucene-codecs to the Solr lib directory). Maybe we should add lucene-codecs to Solr libs (I mean in the WAR file) so that people can try our non-default postings formats with minimum effort?
[jira] [Commented] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
[ https://issues.apache.org/jira/browse/LUCENE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586301#comment-13586301 ] Hoss Man commented on LUCENE-4796: -- bq. We have to fix the same issue in AnalysisSPILoader which is (unfortunately) a different class with some code duplication Ah... that could totally explain why my naive attempt at fixing SOLR-4373 a while back didn't seem to work -- I was only aware of NamedSPILoader, but did ad hoc testing using analyzer factories. Uwe: looking at your patch, one thing that jumps out at me is that AnalysisSPILoader seems to have another existing bug that may also cause some similar problems, regardless of thread safety...
{code}
public synchronized void reload(ClassLoader classloader) {
  final SPIClassIterator<S> loader = SPIClassIterator.get(clazz, classloader);
  final LinkedHashMap<String,Class<? extends S>> services =
      new LinkedHashMap<String,Class<? extends S>>();
{code}
...shouldn't that LinkedHashMap be initialized with a copy of this.services (just like in NamedSPILoader.reload) so successive calls to reload(...) don't forget services that have already been added? (If you only call reload on child classloaders, then I imagine this wouldn't cause any problems, but with independent sibling classloaders it seems like call stacks along the lines of...
{noformat}
analysisloader = new AnalysisSPILoader(Foo.class, parentClassLoader);
analysisloader.reload(childAClassLoader);
analysisloader.reload(childBClassLoader);
{noformat}
...would cause the loader to forget about any services it found in childAClassloader.)
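The copy-on-reload point Hoss makes can be demonstrated with two minimal stand-in classes (hypothetical names, not the actual Lucene code): one that builds the new map from scratch, as the quoted AnalysisSPILoader snippet does, and one that seeds it from the previously published map, as NamedSPILoader.reload does. Only the second retains services found by earlier reload calls against sibling classloaders.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration of the bug described above: if reload() builds
// its map from scratch instead of copying the previously published one,
// every call forgets what earlier calls discovered.
class ForgetfulLoader {
    protected volatile Map<String, String> services = new LinkedHashMap<>();

    public synchronized void reload(Map<String, String> discovered) {
        // BUG: fresh map -- entries from earlier reload() calls are dropped.
        Map<String, String> fresh = new LinkedHashMap<>();
        fresh.putAll(discovered);
        services = fresh;
    }

    public Map<String, String> view() {
        return services;
    }
}

class RetainingLoader extends ForgetfulLoader {
    @Override
    public synchronized void reload(Map<String, String> discovered) {
        // FIX (as in NamedSPILoader.reload): seed the new map with this.services
        // so successive reload calls accumulate rather than replace.
        Map<String, String> merged = new LinkedHashMap<>(services);
        merged.putAll(discovered);
        services = merged;
    }
}
```

Here each Map argument stands in for the services discovered in one classloader (childA, then childB, in the call stack Hoss sketches).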
RE: Line length in Lucene/Solr code
If 120 is the new maximum, is it also the generally recommended reflow/line-break for javadocs? Or should that be 100, or stay at 80? I suggest 100. ~ David - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book
[jira] [Updated] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
[ https://issues.apache.org/jira/browse/LUCENE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-4796: -- Attachment: LUCENE-4796.patch Thanks Hoss, this is indeed another bug. Too stupid! - a copy-paste error from the earlier days. In my opinion, the code duplication is horrible, but the analysis factories unfortunately don't implement NamedSPI, so they have no name.
[jira] [Assigned] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
[ https://issues.apache.org/jira/browse/LUCENE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-4796: - Assignee: Uwe Schindler (was: Hoss Man)
[jira] [Commented] (SOLR-4480) EDisMax parser blows up with query containing single plus or minus
[ https://issues.apache.org/jira/browse/SOLR-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586345#comment-13586345 ] Yonik Seeley commented on SOLR-4480: bq. What should be the logical behavior of a single + or -? If eDisMax discovers it as one of many words, it is treated as whitespace It shouldn't be, and a quick test shows that it is treated as a literal. Are you forgetting to URL-encode the + when trying it from a browser, or perhaps the analysis of the default field is removing the character because it's not alphanumeric? Try this: http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=hello+%2b+there EDisMax parser blows up with query containing single plus or minus -- Key: SOLR-4480 URL: https://issues.apache.org/jira/browse/SOLR-4480 Project: Solr Issue Type: Bug Components: query parsers Reporter: Fiona Tay Priority: Critical Fix For: 4.2, 5.0 Attachments: SOLR-4480.patch, SOLR-4480.patch We are running solr with sunspot and when we set up a query containing a single plus, Solr blows up with the following error: SOLR Request (5.0ms) [ path=#RSolr::Client:0x4c7464ac parameters={data: fq=type%3A%28Attachment+OR+User+OR+GpdbDataSource+OR+HadoopInstance+OR+GnipInstance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29fq=type_name_s%3A%28Attachment+OR+User+OR+Instance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29fq=-%28security_type_name_sm%3A%28Dataset%29+AND+-instance_account_ids_im%3A%282+OR+1%29%29fq=-%28security_type_name_sm%3AChorusView+AND+-member_ids_im%3A1+AND+-public_b%3Atrue%29fq=-%28security_type_name_sm%3A%28Dataset%29+AND+-instance_account_ids_im%3A%282+OR+1%29%29fq=-%28security_type_name_sm%3AChorusView+AND+-member_ids_im%3A1+AND+-public_b%3Atrue%29q=%2Bfl=%2A+scoreqf=name_texts+first_name_texts+last_name_texts+file_name_textsdefType=edismaxhl=onhl.simple.pre=%40%40%40hl%40%40%40hl.simple.post=%40%40%40endhl%40%40%40start=0rows=3, method: post, params: {:wt=:ruby}, query: wt=ruby, headers: {Content-Type=application/x-www-form-urlencoded; charset=UTF-8}, path: select, uri: http://localhost:8982/solr/select?wt=ruby, open_timeout: , read_timeout: } ] RSolr::Error::Http (RSolr::Error::Http - 400 Bad Request Error: org.apache.lucene.queryParser.ParseException: Cannot parse '': Encountered EOF at line 1, column 0. Was expecting one of: NOT ... + ... - ... ( ... * ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ...
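Yonik's point about URL-encoding can be checked directly with the standard library: in the application/x-www-form-urlencoded scheme a raw '+' means "space", so a literal plus has to be sent as %2B or it is lost before the query parser ever sees it. A minimal sketch (the class name PlusEncoding is invented here):

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Demonstrates why a literal '+' must reach Solr as %2B: in
// application/x-www-form-urlencoded data, '+' itself encodes a space.
final class PlusEncoding {
    static String encode(String q) {
        return URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    static String decode(String q) {
        return URLDecoder.decode(q, StandardCharsets.UTF_8);
    }
}
```

Encoding "hello + there" yields exactly the q=hello+%2B+there form Yonik's test URL uses; decoding an unescaped trailing '+' turns it into a space, which is why the character seems to vanish.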
[jira] [Commented] (SOLR-4490) add support for multivalued docvalues
[ https://issues.apache.org/jira/browse/SOLR-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586349#comment-13586349 ] Adrien Grand commented on SOLR-4490: +1 add support for multivalued docvalues -- Key: SOLR-4490 URL: https://issues.apache.org/jira/browse/SOLR-4490 Project: Solr Issue Type: New Feature Reporter: Robert Muir Attachments: SOLR-4490.patch, SOLR-4490.patch exposing LUCENE-4765 essentially. I think we don't need any new options, it just means doing the right thing when someone has docValues=true and multivalued=true.
[jira] [Commented] (SOLR-4480) EDisMax parser blows up with query containing single plus or minus
[ https://issues.apache.org/jira/browse/SOLR-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586368#comment-13586368 ] Jan Høydahl commented on SOLR-4480: --- So let's take the String field example. A single %2B crashes the Lucene query parser, and since we just pass it straight through, it crashes eDisMax too. For the Lucene parser, it crashes for all query strings *ending* in a single + http://localhost:8983/solr/select?debug=query&q=foo%20%2B but not for queries where there is a whitespace after the + http://localhost:8983/solr/select?debug=query&q=%2B%20foo eDisMax is a bit different. It does not crash on an ending + but it swallows it: http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=%2B%20hello%20%2B This is due to lines 700-703 being too quick to guess that the + or - means MUST or NOT:
{code}
if (ch=='+' || ch=='-') {
  clause.must = ch;
  pos++;
}
{code}
I'm ok with saying that a single + or - should mean literal matching (given that the field type supports it), and thus we translate '+' -> '\+'. But then we should do the same for a + or - at the end of a query string.
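The escaping rule Jan proposes can be sketched as a small standalone function. This is a hypothetical illustration (the name LoneOperatorEscaper and the whitespace-token approach are invented here, not the actual ExtendedDismaxQParser code): a '+' or '-' that stands alone as a token, including at the very end of the query, is escaped so the parser treats it as a literal character instead of a MUST/MUST_NOT operator.

```java
// Hypothetical sketch of the discussed escaping rule: a '+' or '-' that is a
// whole whitespace-separated token -- standalone or trailing -- gets a
// backslash so it matches literally; an operator attached to a term
// (e.g. "+foo") keeps its MUST/MUST_NOT meaning.
final class LoneOperatorEscaper {
    static String escape(String q) {
        StringBuilder out = new StringBuilder();
        String[] tokens = q.split(" ", -1); // -1 keeps trailing empty tokens
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("+") || t.equals("-")) {
                out.append('\\').append(t); // lone operator -> literal
            } else {
                out.append(t);
            }
            if (i < tokens.length - 1) out.append(' ');
        }
        return out.toString();
    }
}
```

Under this rule both q=%2B%20hello%20%2B (standalone plus) and q=foo%20%2B (trailing plus) would be rewritten to literal matches, addressing the asymmetry Jan points out.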
[jira] [Comment Edited] (SOLR-4480) EDisMax parser blows up with query containing single plus or minus
[ https://issues.apache.org/jira/browse/SOLR-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586368#comment-13586368 ] Jan Høydahl edited comment on SOLR-4480 at 2/25/13 10:22 PM: - So let's take the String field example. A single %2B crashes the Lucene query parser, and since we just pass it straight through, it crashes eDisMax too. For the Lucene parser, it crashes for all query strings *ending* in a single + http://localhost:8983/solr/select?debug=query&q=foo%20%2B but not for queries where there is a whitespace after the + http://localhost:8983/solr/select?debug=query&q=%2B%20foo eDisMax is a bit different. It does not crash on an ending + but it swallows it: http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=%2B%20hello%20%2B This is probably due to lines 700-703 being too quick to guess that the + or - means MUST or NOT:
{code}
if (ch=='+' || ch=='-') {
  clause.must = ch;
  pos++;
}
{code}
I'm ok with saying that a single plus or minus should mean literal matching (given that the field type supports it), and thus we add escaping. But then we should do the same at the end of a query string.
[jira] [Updated] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-3843: - Attachment: SOLR-3843.patch bq. Smoketesting passes with this patch. But I am not sure if anything should/needs to be changed in Maven. The attached patch is Robert's with the addition of a dependency from the Solr webapp module on the lucene-codecs jar. With this change, when the war is built by Maven, the lucene-codecs jar is put in the same place as when the war is built by the Ant build: under WEB-INF/lib/.
[jira] [Commented] (SOLR-4480) EDisMax parser blows up with query containing single plus or minus
[ https://issues.apache.org/jira/browse/SOLR-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586380#comment-13586380 ] Yonik Seeley commented on SOLR-4480: bq. I'm ok with saying that a single plus or minus should mean literal matching (given that the field type supports it), and thus we add escaping. But then we should do the same at the end of a query string. Correct.
[jira] [Commented] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586388#comment-13586388 ] Robert Muir commented on SOLR-3843: --- Thanks Steve: I was actually (and still am, I think) uncertain who should have the dependency. If you think about it, it's no different than the analysis module cases: but I don't see the webapp depending on them here. At the moment, I understand the reasoning behind the hard dependency on analysis-common.jar (because bogusly the factory stuff is there; imo it should not be). But somewhere in Maven, something in Solr depends on the other analysis modules it bundles (e.g. analyzers-phonetic), yet you could remove this jar and Solr would work fine (as long as you didn't use these particular phonetic analyzers). So I feel like these analysis components (except common, see above), along with codecs.jar, should be depended on in the same place. I guess theoretically they are optional dependencies, but I don't think we should do that (unless we test every possibility with/without optional X,Y,Z, so I think it's a bad idea). But they are the same in this sense.
[jira] [Commented] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586412#comment-13586412 ] Steve Rowe commented on SOLR-3843: -- In the Maven build, it's the solr core module that depends on these analysis modules. Here's the output from {{mvn dependency:tree}} in {{maven-build/solr/webapp/}}:
{noformat}
[INFO] --- maven-dependency-plugin:2.4:tree (default-cli) @ solr ---
[INFO] org.apache.solr:solr:war:5.0-SNAPSHOT
[INFO] +- org.apache.solr:solr-core:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-core:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-analyzers-common:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-analyzers-kuromoji:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-analyzers-morfologik:jar:5.0-SNAPSHOT:compile
[INFO] |  |  \- org.carrot2:morfologik-polish:jar:1.5.5:compile
[INFO] |  |     \- org.carrot2:morfologik-stemming:jar:1.5.5:compile
[INFO] |  |        \- org.carrot2:morfologik-fsa:jar:1.5.5:compile
[INFO] |  +- org.apache.lucene:lucene-analyzers-phonetic:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-highlighter:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-memory:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-misc:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-queryparser:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-spatial:jar:5.0-SNAPSHOT:compile
[INFO] |  |  \- com.spatial4j:spatial4j:jar:0.3:compile
[INFO] |  +- org.apache.lucene:lucene-suggest:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-grouping:jar:5.0-SNAPSHOT:compile
[INFO] |  +- org.apache.lucene:lucene-queries:jar:5.0-SNAPSHOT:compile
[INFO] |  +- commons-codec:commons-codec:jar:1.7:compile
[INFO] |  +- commons-cli:commons-cli:jar:1.2:compile
[INFO] |  +- commons-fileupload:commons-fileupload:jar:1.2.1:compile
[INFO] |  +- commons-io:commons-io:jar:2.1:compile
[INFO] |  +- commons-lang:commons-lang:jar:2.6:compile
[INFO] |  +- com.google.guava:guava:jar:13.0.1:compile
[INFO] |  +- org.codehaus.woodstox:wstx-asl:jar:3.2.7:runtime
[INFO] |  +- org.apache.httpcomponents:httpclient:jar:4.2.3:compile
[INFO] |  |  \- org.apache.httpcomponents:httpcore:jar:4.2.2:compile
[INFO] |  \- org.apache.httpcomponents:httpmime:jar:4.2.3:compile
[INFO] +- org.apache.solr:solr-solrj:jar:5.0-SNAPSHOT:compile
[INFO] |  \- org.apache.zookeeper:zookeeper:jar:3.4.5:compile
[INFO] +- org.apache.lucene:lucene-codecs:jar:5.0-SNAPSHOT:compile
[INFO] +- org.eclipse.jetty.orbit:javax.servlet:jar:3.0.0.v201112011016:provided
[INFO] +- org.slf4j:slf4j-jdk14:jar:1.6.4:runtime (scope not updated to compile)
[INFO] +- org.slf4j:jcl-over-slf4j:jar:1.6.4:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.6.4:compile
[INFO] \- junit:junit:jar:4.10:test
{noformat}
This parallels the Ant build: these analyzer jars are included in the solr.lucene.libs path, which is included in solr.base.classpath. I put the lucene-codecs dependency on the solr webapp module rather than the solr core module because *all non-test compilation succeeds without lucene-codecs*. (The lucene-test-framework pulls lucene-codecs into all Solr test classpaths.) And this issue is about packaging of the war: adding the dependency to the webapp module fixes exactly that problem.
[jira] [Commented] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586415#comment-13586415 ] Robert Muir commented on SOLR-3843: --- {quote} I put the lucene-codecs dependency on the solr webapp module rather than the solr core module because all non-test compilation succeeds without lucene-codecs. (The lucene-test-framework pulls lucene-codecs into all Solr test classpaths.) And this issue is about packaging of the war: adding the dependency to the webapp module fixes exactly the problem. {quote} But it would also succeed without analyzers-phonetic. How are they any different?
[jira] [Commented] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13586424#comment-13586424 ] Steve Rowe commented on SOLR-3843: -- bq. But it would also succeed without analyzers-phonetic. How are they any different? They're not. :) I think the Ant build should change here: the solr compilation classpath shouldn't have things on it that aren't required for compilation. (This goes for the analysis module dependencies in the Maven build too, of course.) Is there a place where (optional) runtime dependencies are added to the stuff that goes into the war? I haven't looked at this in a while.
[jira] [Commented] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586439#comment-13586439 ] Robert Muir commented on SOLR-3843: --- I don't think the ant build makes any distinction here. But yeah, there is probably a bigger issue / better way to go about it, something like:
* solr core etc. should only have the minimal dependencies
* tests using the solr example should somehow be in webapp/test or something.
* webapp depends on these modules like phonetic and codecs.
* the fact that lucene-test-framework brings in codecs anyway is an impl detail
I guess for now I was just looking at us doing things consistently. Even if we are consistently wrong :) Add lucene-codecs to Solr libs? --- Key: SOLR-3843 URL: https://issues.apache.org/jira/browse/SOLR-3843 Project: Solr Issue Type: Wish Affects Versions: 4.0 Reporter: Adrien Grand Priority: Critical Fix For: 4.2, 5.0 Attachments: SOLR-3843.patch, SOLR-3843.patch Solr gives the ability to its users to select the postings format to use on a per-field basis but only Lucene40PostingsFormat is available by default (unless users add lucene-codecs to the Solr lib directory). Maybe we should add lucene-codecs to Solr libs (I mean in the WAR file) so that people can try our non-default postings formats with minimum effort? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Rowe updated SOLR-3843: - Attachment: SOLR-3843.patch bq. I guess for now I was just looking at us doing things consistently. Even if we are consistently wrong :) Right, makes sense - in this case the consistent thing to do is to make the solr-core module, rather than the webapp module, depend on the lucene-codecs jar in the Maven build. The attached patch does this. Add lucene-codecs to Solr libs? --- Key: SOLR-3843 URL: https://issues.apache.org/jira/browse/SOLR-3843 Project: Solr Issue Type: Wish Affects Versions: 4.0 Reporter: Adrien Grand Priority: Critical Fix For: 4.2, 5.0 Attachments: SOLR-3843.patch, SOLR-3843.patch, SOLR-3843.patch Solr gives the ability to its users to select the postings format to use on a per-field basis but only Lucene40PostingsFormat is available by default (unless users add lucene-codecs to the Solr lib directory). Maybe we should add lucene-codecs to Solr libs (I mean in the WAR file) so that people can try our non-default postings formats with minimum effort? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Comment Edited] (SOLR-3843) Add lucene-codecs to Solr libs?
[ https://issues.apache.org/jira/browse/SOLR-3843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586458#comment-13586458 ] Steve Rowe edited comment on SOLR-3843 at 2/25/13 11:26 PM: bq. I guess for now I was just looking at us doing things consistently. Even if we are consistently wrong :) Right, makes sense - in this case the consistent thing to do is to make the solr-core module, rather than the webapp module, depend on the lucene-codecs jar in the Maven build. The attached patch does this. was (Author: steve_rowe): bq. I guess for now I was just looking at us doing things consistently. Even if we are consistently wrong :) Right, makes sense - in this case the consistent thing to do is to make the solr-core module, rather than the webapp module, depend in lucene-codecs jar in the Maven build. The attached patch does this. Add lucene-codecs to Solr libs? --- Key: SOLR-3843 URL: https://issues.apache.org/jira/browse/SOLR-3843 Project: Solr Issue Type: Wish Affects Versions: 4.0 Reporter: Adrien Grand Priority: Critical Fix For: 4.2, 5.0 Attachments: SOLR-3843.patch, SOLR-3843.patch, SOLR-3843.patch Solr gives the ability to its users to select the postings format to use on a per-field basis but only Lucene40PostingsFormat is available by default (unless users add lucene-codecs to the Solr lib directory). Maybe we should add lucene-codecs to Solr libs (I mean in the WAR file) so that people can try our non-default postings formats with minimum effort? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4796) NamedSPILoader.reload needs to be synchronized
[ https://issues.apache.org/jira/browse/LUCENE-4796?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586481#comment-13586481 ] Hoss Man commented on LUCENE-4796: -- +1 ... looks good to me. NamedSPILoader.reload needs to be synchronized -- Key: LUCENE-4796 URL: https://issues.apache.org/jira/browse/LUCENE-4796 Project: Lucene - Core Issue Type: Bug Reporter: Hoss Man Assignee: Uwe Schindler Fix For: 4.2, 5.0 Attachments: LUCENE-4796.patch, LUCENE-4796.patch Spun off of SOLR-4373: as discussed with uwe on IRC, NamedSPILoader.reload is not thread safe: it reads from this.services at the beginning of the method, makes additions based on the method input, and then overwrites this.services at the end of the method. If the method is called by two threads concurrently, the entries added by threadB could be lost if threadA enters the method before threadB and exits the method after threadB. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
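[Editorial note] The race described above is the classic lost update in a copy-on-write reload. A minimal stand-alone sketch of the broken pattern and the synchronized fix; class and method names are illustrative, not the real NamedSPILoader API:

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the copy-on-write reload pattern discussed in LUCENE-4796.
class SpiLoaderSketch {
    private volatile Map<String, Object> services = new HashMap<>();

    // Broken version: two concurrent callers each copy the same snapshot of
    // 'services', so whichever thread publishes last silently drops the
    // other thread's additions.
    void reloadUnsafe(Map<String, Object> discovered) {
        Map<String, Object> copy = new HashMap<>(services); // read snapshot
        copy.putAll(discovered);                            // add new entries
        services = copy;                                    // publish (may clobber)
    }

    // Fix in the spirit of the attached patch: serialize the whole
    // read-modify-publish cycle so no concurrent reload can interleave.
    synchronized void reloadSafe(Map<String, Object> discovered) {
        Map<String, Object> copy = new HashMap<>(services);
        copy.putAll(discovered);
        services = copy; // volatile write publishes the new map to readers
    }

    Object lookup(String name) {
        return services.get(name); // lock-free read of the current snapshot
    }
}
```

Lookups stay lock-free either way; only the rarely-called reload path takes the monitor, which matches the "cheap fix for a rare operation" shape of the issue.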
[jira] [Updated] (SOLR-4373) In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores
[ https://issues.apache.org/jira/browse/SOLR-4373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hoss Man updated SOLR-4373: --- Description: Having lib directives in the solrconfig.xml files of multiple cores can cause problems when using multi-threaded core initialization -- which is the default starting with Solr 4.1. The problem manifests itself as init errors in the logs related to not being able to find classes located in plugin jars, even though earlier log messages indicated that those jars had been added to the classpath. One workaround is to set {{coreLoadThreads=1}} in your solr.xml file -- forcing single threaded core initialization. For example...
{code}
<?xml version="1.0" encoding="utf-8" ?>
<solr coreLoadThreads="1">
  <cores adminPath="/admin/cores">
    <core name="core1" instanceDir="core1" />
    <core name="core2" instanceDir="core2" />
  </cores>
</solr>
{code}
(Similar problems may occur if multiple cores are initialized concurrently using the /admin/cores handler) was: Having lib directives in solrconfig.xml seem to wipe out/override the definitions in previous cores.
The exception (for the earlier core) is:
{code}
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:369)
at org.apache.solr.schema.IndexSchema.init(IndexSchema.java:113)
at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1000)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1033)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:624)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/filter: Error loading class 'solr.ICUFoldingFilterFactory'
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:177)
at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:377)
at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:151)
{code}
The full replication case is attached. If the SECOND core is turned off in solr.xml, the FIRST core loads just fine. bq. My bad. Does it mean we know where the problem is? it means a few things...
1) it confirms the problem you were seeing related to multi-threaded core reloading
2) it means we're seeing consistent results, which narrows down the possible causes (before it looked like there may be two causes of a single symptom: one related to multi-threaded loading that I could reproduce, and one unknown that you could reproduce; now it seems more likely that they are the same)
3) it means we have a config-based workaround for users who encounter this problem w/o requiring a code patch.
Can you try out the patch Uwe posted to LUCENE-4796? His comments there helped me realize why my earlier attempts at fixing this bug didn't work for me, and with his most recent patch I can't reproduce this problem. It would be good to have your feedback. In multicore, lib directives in solrconfig.xml cause conflict and clobber directives from earlier cores --- Key: SOLR-4373 URL: https://issues.apache.org/jira/browse/SOLR-4373 Project: Solr Issue Type: Bug Components: multicore Affects Versions: 4.1 Reporter: Alexandre Rafalovitch Priority: Blocker Labels: lib, multicore Fix For: 4.2, 5.0, 4.1.1 Attachments: multicore-bug.zip Having lib directives in the solrconfig.xml files of multiple cores can cause problems when using multi-threaded core initialization -- which is the default starting with Solr 4.1. The problem manifests itself as init errors in the logs related to not being able to find classes located in plugin jars, even though earlier log messages indicated that those jars had
[jira] [Resolved] (LUCENE-4748) Add DrillSideways helper class to Lucene facets module
[ https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless resolved LUCENE-4748. Resolution: Fixed Add DrillSideways helper class to Lucene facets module -- Key: LUCENE-4748 URL: https://issues.apache.org/jira/browse/LUCENE-4748 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.2, 5.0 Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch This came out of a discussion on the java-user list with subject Faceted search in OR: http://markmail.org/thread/jmnq6z2x7ayzci5k The basic idea is to count near misses during collection, ie documents that matched the main query and also all except one of the drill down filters. Drill sideways makes for a very nice faceted search UI because you don't lose the facet counts after drilling in. Eg maybe you do a search for cameras, and you see facets for the manufacturer, so you drill into Nikon. With drill sideways, even after drilling down, you'll still get the counts for all the other brands, where each count tells you how many hits you'd get if you changed to a different manufacturer. This becomes more fun if you add further drill-downs, eg maybe I next drill down into Resolution=10 megapixels, and then I can see how many 10 megapixel cameras all other manufacturers, and what other resolutions Nikon cameras offer. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-4748) Add DrillSideways helper class to Lucene facets module
[ https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586496#comment-13586496 ] Commit Tag Bot commented on LUCENE-4748: [trunk commit] Michael McCandless http://svn.apache.org/viewvc?view=revisionrevision=1449972 LUCENE-4748: add DrillSideways utility class to facets module Add DrillSideways helper class to Lucene facets module -- Key: LUCENE-4748 URL: https://issues.apache.org/jira/browse/LUCENE-4748 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.2, 5.0 Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch This came out of a discussion on the java-user list with subject Faceted search in OR: http://markmail.org/thread/jmnq6z2x7ayzci5k The basic idea is to count near misses during collection, ie documents that matched the main query and also all except one of the drill down filters. Drill sideways makes for a very nice faceted search UI because you don't lose the facet counts after drilling in. Eg maybe you do a search for cameras, and you see facets for the manufacturer, so you drill into Nikon. With drill sideways, even after drilling down, you'll still get the counts for all the other brands, where each count tells you how many hits you'd get if you changed to a different manufacturer. This becomes more fun if you add further drill-downs, eg maybe I next drill down into Resolution=10 megapixels, and then I can see how many 10 megapixel cameras all other manufacturers, and what other resolutions Nikon cameras offer. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-4504) CurrencyField treats docs w/o value the same as having a value of 0.0
Hoss Man created SOLR-4504: -- Summary: CurrencyField treats docs w/o value the same as having a value of 0.0 Key: SOLR-4504 URL: https://issues.apache.org/jira/browse/SOLR-4504 Project: Solr Issue Type: Bug Reporter: Hoss Man As noted by Gerald Blank on the mailing list, CurrencyField queries treat documents w/o any value the same as documents with a value of 0.0f. Observe that, using the example solr schema, with any number of docs indexed, this query matches all docs even though no docs have any values at all for the specified field... {noformat} http://localhost:8983/solr/select?q=hoss_c:[*%20TO%20*] {noformat} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
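[Editorial note] The reported behavior is consistent with the field indexing a default amount for documents that carry no value, so an open-ended range like {{[* TO *]}} matches everything. A stand-alone toy model of that failure mode (plain collections, not the actual Solr/Lucene CurrencyField implementation; all names here are illustrative):

```java
import java.util.List;
import java.util.Map;

// Toy model of SOLR-4504: if a missing field is indexed as a default 0.0,
// an existence-style range query [* TO *] can no longer distinguish
// "no value" from "value is zero".
class CurrencyRangeSketch {
    // Simulate how a doc without the field ends up indexed with 0.0.
    static double indexedValue(Map<String, Double> doc, String field) {
        return doc.getOrDefault(field, 0.0); // the default clobbers "missing"
    }

    // [* TO *] over the indexed values: every doc now has *some* double,
    // so every doc matches -- the behavior reported in the issue.
    static long matchOpenRange(List<Map<String, Double>> docs, String field) {
        return docs.stream()
                   .filter(d -> indexedValue(d, field) >= Double.NEGATIVE_INFINITY)
                   .count();
    }

    // What users expect from [* TO *]: only docs that actually carry the field.
    static long matchExisting(List<Map<String, Double>> docs, String field) {
        return docs.stream().filter(d -> d.containsKey(field)).count();
    }
}
```

With one doc that has a price and one that does not, the open range counts both while the existence check counts only one, which is the gap the issue describes.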
[jira] [Commented] (LUCENE-4748) Add DrillSideways helper class to Lucene facets module
[ https://issues.apache.org/jira/browse/LUCENE-4748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586508#comment-13586508 ] Commit Tag Bot commented on LUCENE-4748: [branch_4x commit] Michael McCandless http://svn.apache.org/viewvc?view=revisionrevision=1449973 LUCENE-4748: add DrillSideways utility class to facets module Add DrillSideways helper class to Lucene facets module -- Key: LUCENE-4748 URL: https://issues.apache.org/jira/browse/LUCENE-4748 Project: Lucene - Core Issue Type: Improvement Components: modules/facet Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 4.2, 5.0 Attachments: DrillSideways-alternative.tar.gz, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch, LUCENE-4748.patch This came out of a discussion on the java-user list with subject Faceted search in OR: http://markmail.org/thread/jmnq6z2x7ayzci5k The basic idea is to count near misses during collection, ie documents that matched the main query and also all except one of the drill down filters. Drill sideways makes for a very nice faceted search UI because you don't lose the facet counts after drilling in. Eg maybe you do a search for cameras, and you see facets for the manufacturer, so you drill into Nikon. With drill sideways, even after drilling down, you'll still get the counts for all the other brands, where each count tells you how many hits you'd get if you changed to a different manufacturer. This becomes more fun if you add further drill-downs, eg maybe I next drill down into Resolution=10 megapixels, and then I can see how many 10 megapixel cameras all other manufacturers, and what other resolutions Nikon cameras offer. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-4414) MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard
[ https://issues.apache.org/jira/browse/SOLR-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13586558#comment-13586558 ] Colin Bartolome commented on SOLR-4414: --- By the way, I'm guessing the interesting terms that the query does return, when it returns any, are based on the documents contained in that shard only, instead of the documents contained in the whole collection. I suppose I can live with that, for the time being, but the trick is to query the right shard to begin with! MoreLikeThis on a shard finds no interesting terms if the document queried is not in that shard --- Key: SOLR-4414 URL: https://issues.apache.org/jira/browse/SOLR-4414 Project: Solr Issue Type: Bug Components: MoreLikeThis, SolrCloud Affects Versions: 4.1 Reporter: Colin Bartolome Running a MoreLikeThis query in a cloud works only when the document being queried exists in whatever shard serves the request. If the document is not present in the shard, no interesting terms are found and, consequently, no matches are found. h5. Steps to reproduce
* Edit example/solr/collection1/conf/solrconfig.xml and add this line, with the rest of the request handlers: {code:xml} <requestHandler name="/mlt" class="solr.MoreLikeThisHandler" /> {code}
* Follow the [simplest SolrCloud example|http://wiki.apache.org/solr/SolrCloud#Example_A:_Simple_two_shard_cluster] to get two shards running.
* Hit this URL: [http://localhost:8983/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1]
* Compare that output to that of this URL: [http://localhost:7574/solr/collection1/mlt?mlt.fl=includes&q=id:3007WFP&mlt.match.include=false&mlt.interestingTerms=list&mlt.mindf=1&mlt.mintf=1]
The former URL will return a result and list some interesting terms. The latter URL will return no results and list no interesting terms.
It will also show this odd XML element: {code:xml} <null name="response"/> {code} -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
Distributed MLT doesn't seem to work (SOLR-4414?)
I can't get distributed MLT (committed in Solr 4.1, using 4.2-SNAPSHOT) to work at all. I think whatever is causing SOLR-4414 is probably causing my issue as well. If there's anything specific that's required to troubleshoot this issue, let me know how to get it and I'll provide it. Thanks, Shawn - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org