Google-developed posting list encoding
Can be quite a bit faster than vInt in some cases: http://www.ir.uwaterloo.ca/book/addenda-06-index-compression.html -Mike - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
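For context, Lucene's vInt is a simple variable-byte code: 7 payload bits per byte, low-order groups first, with the high bit flagging that more bytes follow. A minimal sketch of the idea (a hypothetical helper class, not Lucene's actual IndexInput/IndexOutput API):

```java
import java.io.ByteArrayOutputStream;

public class VInt {
    // Encode a non-negative int as 1-5 bytes, 7 bits per byte,
    // continuation bit set on every byte except the last.
    public static byte[] encode(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80); // more bytes follow
            value >>>= 7;
        }
        out.write(value); // final byte, high bit clear
        return out.toByteArray();
    }

    // Decode a vInt starting at offset 0 of the array.
    public static int decode(byte[] bytes) {
        int value = 0, shift = 0;
        for (byte b : bytes) {
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) break; // last byte reached
            shift += 7;
        }
        return value;
    }
}
```

Group-varint-style schemes gain speed over this byte-at-a-time loop mainly by removing the per-byte branch, which is what the linked addendum measures.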
Re: [jira] Created: (SOLR-1363) Search without using caches
Keep in mind that there is no way to bypass the most important cache of all (OS disk cache). -Mike On Thu, Aug 13, 2009 at 12:01 PM, Jason Rutherglen (JIRA) j...@apache.org wrote: For testing, I often need to perform a query and see the actual time it takes (rather than the time it takes to look it up from the cache). We'll need various options such as bypass the docsets, docs, or results. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: [jira] Updated: (SOLR-1155) Change DirectUpdateHandler2 to allow concurrent adds during an autocommit
I'd like to take a look at this but JIRA seems to be down. Is anyone else experiencing this? -Mike On Wed, May 13, 2009 at 7:41 AM, Jayson Minard (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/SOLR-1155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel] Jayson Minard updated SOLR-1155: Attachment: Solr-1155.patch Resolve TODO for commitWithin, and updated AutoCommitTrackerTest to validate the fix. Change DirectUpdateHandler2 to allow concurrent adds during an autocommit - Key: SOLR-1155 URL: https://issues.apache.org/jira/browse/SOLR-1155 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Reporter: Jayson Minard Attachments: Solr-1155.patch, Solr-1155.patch Currently DirectUpdateHandler2 will block adds during a commit, and it seems to be possible with recent changes to Lucene to allow them to run concurrently. See: http://www.nabble.com/Autocommit-blocking-adds---AutoCommit-Speedup--td23435224.html
[jira] Commented: (SOLR-1169) SortedIntDocSet
[ https://issues.apache.org/jira/browse/SOLR-1169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12709645#action_12709645 ] Mike Klaas commented on SOLR-1169: -- sweet. intersecting sorted int sets should be faster in the general case. HashSet will of course win when one set is very small, but I expect this to still be pretty fast anyway. SortedIntDocSet --- Key: SOLR-1169 URL: https://issues.apache.org/jira/browse/SOLR-1169 Project: Solr Issue Type: Improvement Reporter: Yonik Seeley Assignee: Yonik Seeley Fix For: 1.4 A DocSet type that can skip to support SOLR-1165
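The reason sorted int docsets intersect quickly in the general case is that intersection reduces to a linear merge with good cache locality (and can skip ahead when one side is much denser). A minimal sketch of the merge step — illustrative only, not Solr's actual SortedIntDocSet code:

```java
public class SortedIntIntersect {
    // Intersect two ascending, duplicate-free int arrays by linear merge.
    // Advance whichever pointer holds the smaller value; a match advances both.
    public static int intersectionSize(int[] a, int[] b) {
        int i = 0, j = 0, count = 0;
        while (i < a.length && j < b.length) {
            if (a[i] < b[j]) {
                i++;
            } else if (a[i] > b[j]) {
                j++;
            } else {
                count++;
                i++;
                j++;
            }
        }
        return count;
    }
}
```

When one set is tiny, a HashSet (or binary/galloping search into the larger array) wins because it skips most of the larger side entirely, which matches the comment above.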
Re: DirectUpdateHandler2 threads pile up behind scheduleCommitWithin
Hi Jayson, Thanks, I'll take a look in the next few days. The current patch doesn't guarantee index consistency during post-commit callback hooks, right? This could be a problem for index replication. (Incidentally, I'm rather unfamiliar with the new java-based replication design. Anyone care to comment on the implications?) cheers, -Mike On 10-May-09, at 10:54 AM, jayson.minard wrote: Mike, I revamped the DirectUpdateHandler2 into DirectUpdateHandler3 in SOLR-1155, probably ready enough for your review to see if locking makes sense for current Lucene behavior. https://issues.apache.org/jira/browse/SOLR-1155 --j Mike Klaas wrote: On 7-May-09, at 10:36 AM, jayson.minard wrote: Does every thread really need to notify the update handler of the commit interval/threshold being reached, or really just the first thread that notices should send the signal, or better yet a background commit watching thread so that no foreground thread has to pay attention at all. That is assuming they wouldn't need to block like they are now for a reason I'm likely unaware of... This is due to the way Lucene was designed (although recent improvements in Lucene mean we can do better here). See the recent thread Autocommit blocking adds? on solr-user for a related discussion. As the person who first wrote the multi-threaded-ness of DUH2, I'd be very happy to promptly review any improvements made to it. -Mike -- View this message in context: http://www.nabble.com/DirectUpdateHandler2-threads-pile-up-behind-scheduleCommitWithin-tp23431691p23472391.html Sent from the Solr - Dev mailing list archive at Nabble.com.
Re: DirectUpdateHandler2 threads pile up behind scheduleCommitWithin
On 7-May-09, at 10:36 AM, jayson.minard wrote: Does every thread really need to notify the update handler of the commit interval/threshold being reached, or really just the first thread that notices should send the signal, or better yet a background commit watching thread so that no foreground thread has to pay attention at all. That is assuming they wouldn't need to block like they are now for a reason I'm likely unaware of... This is due to the way Lucene was designed (although recent improvements in Lucene mean we can do better here). See the recent thread Autocommit blocking adds? on solr-user for a related discussion. As the person who first wrote the multi-threaded-ness of DUH2, I'd be very happy to promptly review any improvements made to it. -Mike
Re: Welcome new Solr committers Mark Miller and Noble Paul
On 30-Apr-09, at 10:41 AM, Yonik Seeley wrote: I'm pleased to announce that Mark Miller and Noble Paul have accepted invitations to become Solr committers! Welcome Mark and Noble, and thanks for all your great work on Solr! Congratulations Mark and Noble! Good to have you on board. -Mike
[jira] Commented: (SOLR-1116) Add a Binary FieldType
[ https://issues.apache.org/jira/browse/SOLR-1116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12704284#action_12704284 ] Mike Klaas commented on SOLR-1116: -- +1 for url-safe base64 (-_ being the extra chars) Add a Binary FieldType -- Key: SOLR-1116 URL: https://issues.apache.org/jira/browse/SOLR-1116 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: Noble Paul Fix For: 1.4 Attachments: SOLR-1116.patch, SOLR-1116.patch Lucene supports binary data for fields but Solr has no corresponding field type.
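For illustration, URL-safe base64 is the standard alphabet with '+' and '/' swapped for '-' and '_', so encoded values survive URLs and query strings without escaping. A small sketch using java.util.Base64 (a later-JDK utility than Solr targeted at the time, shown here only to demonstrate the alphabet):

```java
import java.util.Base64;

public class UrlSafeB64 {
    // Standard base64 alphabet with '+' -> '-' and '/' -> '_';
    // padding is dropped since '=' also needs escaping in URLs.
    public static String encode(byte[] raw) {
        return Base64.getUrlEncoder().withoutPadding().encodeToString(raw);
    }

    public static byte[] decode(String s) {
        return Base64.getUrlDecoder().decode(s);
    }
}
```

For example, the bytes 0xFB 0xFF encode as "+/8=" in standard base64 but "-_8" in the URL-safe, unpadded form.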
Re: Modularization
On 23-Mar-09, at 2:41 PM, Michael McCandless wrote: I agree, but at least we need some clear criteria so the future decision process is more straightforward. Towards that... it seems like there are good reasons why something should be put into contrib: * It uses a version of JDK higher than what core can allow * It has external dependencies * Its quality is debatable (or at least not proven) * It's of somewhat narrow usage/interest (eg: contrib/bdb) But I don't think "it doesn't have to be in core" (the software modularity goal) is the right reason to put something in contrib. Agreed. I don't think that building on the existing 'contrib' is the way to go. Frequently-used, high-quality components should be more properly part of Lucene, whether that means that they move to core, or in a new blessed modules section. Getting back to the original topic: Trie(Numeric)RangeFilter runs on JDK 1.4, has no external dependencies, looks to be high quality, and likely will have wide appeal. Doesn't it belong in core? +1. It is important that Lucene come blessed with very good quality defaults. Fast range queries are a common requirement. Similarly, I wouldn't be happy to have a new, wicked QueryParser be relegated to contrib where it is unlikely to be found by non-savvy users. At the very least, I agree with Michael that it should be findable in the same place. It does make sense to separate the machinery/building blocks (base Query, Weight, Scorer, Filter classes, Similarity interface, etc.) from the Query/Filter implementations that use them. But whether this is done by putting them in separate directories or via a global core/modules distinction seems unimportant. -Mike
[jira] Commented: (LUCENE-1561) Maybe rename Field.omitTf, and strengthen the javadocs
[ https://issues.apache.org/jira/browse/LUCENE-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12688449#action_12688449 ] Mike Klaas commented on LUCENE-1561: I agree that it is going to be almost impossible to convey that phrase queries don't work by renaming the flag. I agree with Eks Dev that a positive formulation is the only chance, although this deviates from the current omit* flags. termPresenceOnly() trackTermPresenceOnly() onlyTermPresence() omitEverythingButTermPresence() // just kidding Maybe rename Field.omitTf, and strengthen the javadocs -- Key: LUCENE-1561 URL: https://issues.apache.org/jira/browse/LUCENE-1561 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.4.1 Reporter: Michael McCandless Assignee: Michael McCandless Fix For: 2.9 Attachments: LUCENE-1561.patch Spinoff from here: http://www.nabble.com/search-problem-when-indexed-using-Field.setOmitTf()-td22456141.html Maybe rename omitTf to something like omitTermPositions, and make it clear what queries will silently fail to work as a result.
Re: Getting tokens from search results. Simple concept
On 5-Mar-09, at 2:42 PM, Chris Hostetter wrote: : What I would LOVE is if I could do it in a standard Lucene search like I : mentioned earlier. : Hit.doc[0].getHitTokenList() :confused: : Something like this... The Query/Scorer APIs don't provide any mechanism for information like that to be conveyed back up the call chain -- mainly because it's more heavy weight than most people need. If you have custom Query/Scorer implementations, you can keep track of whatever state you want when executing a Query -- in fact the SpanQuery family of queries do keep track of exactly the type of info you seem to want, and after executing a query, you can ask it for the Spans of any matching document -- the down side is a loss in performance of query execution (because it takes time/memory to keep track of all the matches) Even then, if I'm not mistaken, spans track token _positions_, not _offsets_ in the original string. A reverse text index like Lucene is fast precisely because it doesn't have to keep track of this information. I think the best alternative might be to use termvectors, which are essentially a cache of the analyzed tokens for a document. -Mike
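To make the positions-vs-offsets distinction concrete: a position is the token's ordinal in the token stream (what spans track), while offsets are character indices into the original text (what you need to highlight the source string). A toy whitespace tokenizer recording both — illustrative only, not Lucene's analysis API:

```java
import java.util.ArrayList;
import java.util.List;

public class PosVsOffset {
    // Returns "term:position:startOffset:endOffset" for each
    // space-separated token. Position counts tokens; offsets count chars.
    public static List<String> tokenize(String text) {
        List<String> out = new ArrayList<>();
        int pos = 0, i = 0;
        while (i < text.length()) {
            while (i < text.length() && text.charAt(i) == ' ') i++; // skip spaces
            int start = i;
            while (i < text.length() && text.charAt(i) != ' ') i++; // consume token
            if (i > start) {
                out.add(text.substring(start, i) + ":" + (pos++) + ":" + start + ":" + i);
            }
        }
        return out;
    }
}
```

For "the quick fox", the token "fox" has position 2 but character offsets 10-13; a positions-only index can answer phrase queries yet cannot point back into the original string, which is why term vectors (which can store offsets) are suggested above.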
[jira] Commented: (SOLR-1044) Use Hadoop RPC for inter Solr communication
[ https://issues.apache.org/jira/browse/SOLR-1044?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12679466#action_12679466 ] Mike Klaas commented on SOLR-1044: -- {quote} I haven't yet seen a HTTP server serving more than around 1200 req/sec (apache HTTPD). A call based server can serve 4k-5k messages easily. (I am yet to test hadoop RPC). The proliferation of a large no. of frameworks around that is a testimony to the superiority of that approach. {quote} up to 50,000 req/sec, with keepalive: http://www.litespeedtech.com/web-server-performance-comparison-litespeed-2.0-vs.html Use Hadoop RPC for inter Solr communication --- Key: SOLR-1044 URL: https://issues.apache.org/jira/browse/SOLR-1044 Project: Solr Issue Type: New Feature Components: search Reporter: Noble Paul Solr uses http for distributed search. We can make it a whole lot faster if we use an RPC mechanism which is more lightweight/efficient. Hadoop RPC looks like a good candidate for this. The implementation should just have one protocol. It should follow the Solr's idiom of making remote calls. A uri + params + [optional stream(s)]. The response can be a stream of bytes. To make this work we must make the SolrServer implementation pluggable in distributed search. Users should be able to choose between the current CommonsHttpSolrServer, or a HadoopRpcSolrServer.
[jira] Commented: (SOLR-952) duplicated code in (Default)SolrHighlighter and HighlightingUtils
[ https://issues.apache.org/jira/browse/SOLR-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12674830#action_12674830 ] Mike Klaas commented on SOLR-952: - HighlightingUtils has been deprecated for at least one release; can't we just rip it out? duplicated code in (Default)SolrHighlighter and HighlightingUtils - Key: SOLR-952 URL: https://issues.apache.org/jira/browse/SOLR-952 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.4 Reporter: Chris Harris Priority: Minor Attachments: SOLR-952.patch A large quantity of code is duplicated between the deprecated HighlightingUtils class and the newer SolrHighlighter and DefaultSolrHighlighter (which have been getting bug fixes and enhancements). The Utils class is no longer used anywhere in Solr, but people writing plugins may be taking advantage of it, so it should be cleaned up.
Re: [VOTE] LOGO
On 17-Dec-08, at 5:11 PM, Ryan McKinley wrote: Hoss - can you go ahead and post something? I'm heading out... but could post tomorrow. Since the community has been notified, any objections to me updating the site with the new logo/favicon? -Mike
Re: [jira] Commented: (SOLR-912) org.apache.solr.common.util.NamedList - Typesafe efficient variant - ModernNamedList introduced - implementing the same API as NamedList
On 19-Dec-08, at 8:27 AM, Kay Kay (JIRA) wrote: Meanwhile - w.r.t resize() - ( trade-off because increasing size a lot would increase memory usage. increase a size by a smaller factor would be resulting in a more frequent increases in size). I believe reading some theory that the ideal increase factor is somewhere close to ( 1 + 2^0.5) / 2 or something similar to that. It should be benchmarked, but yes, a factor of two is typically more memory wasteful than the performance it gains (you have a 50% chance of wasting at least 1/4 of your memory, a 25% chance of wasting at least 3/8th, etc.) The method - ensureCapacity(capacity) in ArrayList (Java 6) also seems to be a number along the lines ~ (1.5) int newCapacity = (oldCapacity * 3)/2 + 1; +1 seems to be move away from 0, and keep incrementing the count. ( Hmm .. That piece of code - in Java 6 ArrayList can definitely make use of bitwise operators for the div-by-2 operation !!). Let's not go crazy here guys. This relatively trivial calculation is only called log(n) times, and certainly uses bit ops after the jit gets its hands on it. -Mike
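For reference, the ArrayList line quoted above gives a growth factor of roughly 1.5; the theoretical sweet spot often cited in this discussion is related to the golden ratio (1+√5)/2 ≈ 1.618, below which freed blocks can eventually be reused by later allocations. A small sketch comparing the two policies (illustrative, not the NamedList patch itself):

```java
public class Growth {
    // ArrayList-style growth (~1.5x), as quoted from the JDK 6 source.
    public static int grow15(int oldCapacity) {
        return (oldCapacity * 3) / 2 + 1;
    }

    // Naive doubling, which wastes more memory on average:
    // right after a resize, just over half the array is unused.
    public static int grow2(int oldCapacity) {
        return oldCapacity * 2;
    }
}
```

As the thread notes, this calculation runs only O(log n) times for n appends, so micro-optimizing it (e.g. shifting instead of dividing) buys essentially nothing.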
Re: [VOTE] LOGO
On 13-Dec-08, at 2:52 PM, Ryan McKinley wrote: Ok, all votes are cast (except Grant who is abstaining) Thanks for tallying the votes, Ryan. You're too damn quick for me! -Mike
Re: [VOTE] LOGO
https://issues.apache.org/jira/secure/attachment/12394264/apache_solr_a_red.jpg https://issues.apache.org/jira/secure/attachment/12394268/apache_solr_c_red.jpg https://issues.apache.org/jira/secure/attachment/12394070/sslogo-solr-finder2.0.png https://issues.apache.org/jira/secure/attachment/12394350/solr.s4.jpg https://issues.apache.org/jira/secure/attachment/12394267/apache_solr_c_blue.jpg https://issues.apache.org/jira/secure/attachment/12394266/apache_solr_b_red.jpg https://issues.apache.org/jira/secure/attachment/12394376/solr_sp.png https://issues.apache.org/jira/secure/attachment/12394218/solr-solid.png
Re: [VOTE] LOGO
I agree. I don't see why there needs to be a minimum or maximum number of logos to rank per vote. -Mike On 10-Dec-08, at 7:52 PM, Yonik Seeley wrote: Doesn't limiting to top 4 defeat the purpose of using STV to overcome splitting-the-vote? Seems like we should rank the whole list (or all that an individual finds acceptable) -Yonik On Wed, Dec 10, 2008 at 8:51 PM, Ryan McKinley ryan...@gmail.com wrote: This thread is for solr committers to list the top 4 logos preferences from the community logo contest. As a guide, we should look at: http://people.apache.org/~ryan/solr-logo-results.html The winner will be tabulated using instant runoff voting -- if this happens to result in a tie, the winner will be picked by the 'Single transferable vote' http://en.wikipedia.org/wiki/Instant-runoff_voting http://en.wikipedia.org/wiki/Single_transferable_vote To cast a valid vote, you *must* include 4 options. ryan
Re: logo contest
On 8-Dec-08, at 10:47 AM, Mike Klaas wrote: On 7-Dec-08, at 7:40 PM, Chris Hostetter wrote: : I would personally prefer more of an elimination-style vote (i.e., STV). Ah... yeah, that seems like it would be a more fair way to deal with things than my suggestion, and it doesn't violate the spirit of the rules as originally outlined (it's still a vote of ranked preferences). Are you volunteering to do the vote counting Mike? Sure thing. I take it that there are no objections? If so, I'll call a vote by the end of the week. cheers, -Mike
Re: logo contest
On 10-Dec-08, at 12:41 PM, Yonik Seeley wrote: Sure thing. I take it that there are no objections? If so, I'll call a vote by the end of the week. +1 I just wish we had used this method with the community vote. I guess as a committer I should try and figure out what order the community would have voted and do that. I could run the results of the community vote interpreted as STV, if that would help (it'll be a few days, though). -Mike
Re: logo contest
On 7-Dec-08, at 7:40 PM, Chris Hostetter wrote: : I would personally prefer more of an elimination-style vote (i.e., STV). Ah... yeah, that seems like it would be a more fair way to deal with things than my suggestion, and it doesn't violate the spirit of the rules as originally outlined (it's still a vote of ranked preferences). Are you volunteering to do the vote counting Mike? Sure thing. -Mike
Re: logo contest
On 4-Dec-08, at 2:33 PM, Chris Hostetter wrote: : Being the likely two candidates for winning. My guess is that : narrowing to the two most popular options first would make #2 the : winner, while voting on the top 10 (w/o any strategy for winning) : would make #1 the winner. limiting to only voting for the top 2 seems unrepresentative since more than one apache_solr_c_red.jpg variant tied for 2nd. : fun, fun. So people who want one of these options to win should vote : only for that option, really. Perhaps instead of just ranking top 5, we should ask committers to rank all of the choices on the final ballot to eliminate the strategy factor you are referring to ... i think we can trust all committers to understand this, but if someone botches it (or refuses?) we'll just shift the number of points each item earns down by the appropriate number (so if you want your 1st rank to earn 10 points, you must list all 10, if you only list 4 then your top ranked item only earns 4 points) Eliminating strategic voting merely biases the outcome toward the logo without the vote splitting problem. That is no solution. It is better to allow strategic voting, as that is the only way for voters to express certain preferences in this system. I would personally prefer more of an elimination-style vote (i.e., STV). Each voter lists the logos they prefer, in order. The logos are ranked by first place votes. The last in the rank is eliminated from the contest, and anyone who had that logo as their first-place vote has their vote transferred to the next logo on the list, if any. Iterate until two logos remain. There is no danger of vote-splitting and the outcome maximizes global welfare in terms of binary preferences (well, probably not, due to Arrow's theorem, but it does a good job regardless). -Mike
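The elimination procedure described above (rank by first-place votes, drop the last-place candidate, transfer its ballots to each voter's next surviving choice, repeat) can be sketched directly. This is an illustrative instant-runoff count, not the script actually used for the tally; ties on elimination are broken arbitrarily here, whereas a real count would need a defined tie-break rule:

```java
import java.util.*;

public class Irv {
    // Each ballot is an ordered list of candidate names, most preferred first.
    public static String winner(List<List<String>> ballots) {
        Set<String> alive = new HashSet<>();
        for (List<String> b : ballots) alive.addAll(b);
        while (true) {
            // Count first-place votes among surviving candidates.
            Map<String, Integer> firsts = new HashMap<>();
            for (String c : alive) firsts.put(c, 0);
            for (List<String> b : ballots) {
                for (String c : b) {
                    if (alive.contains(c)) { firsts.merge(c, 1, Integer::sum); break; }
                }
            }
            String best = null, worst = null;
            for (Map.Entry<String, Integer> e : firsts.entrySet()) {
                if (best == null || e.getValue() > firsts.get(best)) best = e.getKey();
                if (worst == null || e.getValue() < firsts.get(worst)) worst = e.getKey();
            }
            // Stop on a majority, or when only one candidate survives.
            if (firsts.get(best) * 2 > ballots.size() || alive.size() == 1) return best;
            alive.remove(worst); // eliminate; ballots transfer next round
        }
    }
}
```

With ballots 4x[A,C], 3x[B,C], 2x[C,B], plurality would pick A, but eliminating C transfers two ballots to B, who then holds a 5-of-9 majority — exactly the vote-splitting scenario the thread is worried about.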
Re: [jira] Commented: (LUCENE-1458) Further steps towards flexible indexing
On 19-Nov-08, at 5:12 AM, Michael McCandless (JIRA) wrote: How can the VM system possibly make good decisions about what to swap out? It can't know if a page is being used for terms dict index, terms dict, norms, stored fields, postings. LRU is not a good policy, because some pages (terms index) are far far more costly to miss than others. A note on this discussion: we recently re-architected a large database-y, lucene-y system to use mmap-based storage and are extremely pleased with the performance. Sharing the buffers among processes is rather cool, as Marvin mentions, as is the near-instantaneous startup. -Mike
Optimizing range constraints
Tim Sturge posted a nice optimization for range constraints/filters (e.g. age:[10 TO 35]) here: https://issues.apache.org/jira/browse/LUCENE-1461 It has a natural applicability to Solr's fq range filters, which can be abysmally slow for large ranges. Could be an interesting project for contributors who love optimizing speed (100-fold, in this case) <g>. I'd definitely do it had I the time. -Mike
Re: Deadlock with DirectUpdateHandler2
On 18-Nov-08, at 8:54 AM, Mark Miller wrote: Mark Miller wrote: Toby Cole wrote: Has anyone else experienced a deadlock when the DirectUpdateHandler2 does an autocommit? I'm using a recent snapshot from hudson (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading data the server (tomcat 6) gets stuck at line 469 of DirectUpdateHandler2: // Check if there is a commit already scheduled for longer then this time if( pending != null && pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime ) Anyone got any enlightening tips? There is some inconsistent synchronization I think. Especially involving pending. Yuck <g> I would say there are problems with pending, autoCommitCount, and lastAddedTime. That alone could probably cause a deadlock (who knows), but it also seems somewhat possible that there is an issue with the heavy intermingling of locks (there are a bunch of locks to be had in that class). I haven't looked for evidence of that though - prob makes sense to fix those 3 guys and see if you get reports from there. autoCommitCount is written in a CommitTracker.synchronized block only. It is read to print stats in an unsynchronized fashion, which perhaps could be fixed, though I can't see how it could cause a problem. lastAddedTime is only written in a call path within a DirectUpdateHandler2.synchronized block. It is only read in a CommitTracker.synchronized block. It could read the wrong value, but I also don't see this causing a problem (a commit might fail to be scheduled). This could probably also be improved, but doesn't seem important. pending seems to be the issue. As long as commits are only triggered by autocommit, there is no issue as manipulation of pending is always performed inside CommitTracker.synchronized. But didCommit()/didRollback() could be called via manual commit, and pending is directly manipulated during DUH2.close(). I'm having trouble coming up with a plausible deadlock scenario, but this needs to be fixed.
It isn't as easy as synchronizing didCommit/didRollback, though--this would introduce definite deadlock scenarios. Mark, is there any chance you could post the thread dump for the deadlocked process? Do you issue manual commits during insertion? -Mike
Re: [jira] Commented: (SOLR-84) Logo Contests
On 14-Nov-08, at 8:54 AM, Doug Cutting (JIRA) wrote: [ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12647660#action_12647660 ] Doug Cutting commented on SOLR-84: -- I like https://issues.apache.org/jira/secure/attachment/12349896/logo-solr-e.jpg and https://issues.apache.org/jira/secure/attachment/12358494/sslogo-solr.jpg , because they're simple and scale down well. It should be possible to scale the logo, or a salient part of it, as small as a favicon (16x16) and still have it easily recognized. Most of the designs above require a lot of pixels to be recognizable. A good logo should be iconic more than textual--an abstract symbol. Often you can sample an element of a logo to form a favicon (like we do with Lucene's 'L'). So, when voting, think about whether there's an easily identifiable sample (e.g., is the typeface of the 'S' distinctive?). Lots of the designs do have distinctive suns that would make good favicons (after re-vectorizing; those gradients would not rescale nicely). -Mike
Re: ReentrantReadWriteLock in DUH2
On 6-Nov-08, at 7:48 AM, Koji Sekiguchi wrote: So that multiple threads can efficiently access the writer, but only one thread at a time does a commit. Adding docs with the writer is the 'read' and committing is the write. If I remember correctly. You remember correctly, Mark. Because of the lock, <add/> is blocked during <optimize/>, even if ConcurrentMergeScheduler is used, right? I'd like to know why <add/> should be blocked during <optimize/>. The core reason is laid out in the comment: // open a new searcher in the sync block to avoid opening it // after a deleteByQuery changed the index, or in between deletes // and adds of another commit being done. We want to open a searcher that corresponds exactly to the commit point (remember, an optimize is first and foremost a commit). I don't see why there couldn't be an optimize command that doesn't commit, if that is desired. -Mike
[jira] Commented: (SOLR-793) set a commit time bounds in the add command
[ https://issues.apache.org/jira/browse/SOLR-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12638825#action_12638825 ] Mike Klaas commented on SOLR-793: - I don't see any issue with the code: addedDocument is always called within a synchronized context anyway, after all. One question: right now you have it set to use the minimum of autocommit/maxTime and commitWithin on the update command. Might it be better to always use commitWithin, even if it is greater than a specified maxTime? This would allow the insertion of less important than normal docs (right now, it seems only useful for the more important case) set a commit time bounds in the add command - Key: SOLR-793 URL: https://issues.apache.org/jira/browse/SOLR-793 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: SOLR-793-commitWithin.patch, SOLR-793-commitWithin.patch Currently there are two options for how to handle committing documents: 1. the client explicitly starts the commit via <commit/> 2. set an auto commit value on the server -- clients can assume all documents will be committed within that time. However, this does not help in the case where the clients know what documents need updating quickly and others that could wait. I suggest adding: {code:xml} <add commitWithin="100" ...> {code} to the update syntax so the client can schedule commits explicitly.
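A sketch of the scheduling policy under discussion (a hypothetical CommitTracker shape, not the actual DUH2 patch): each add computes a commit deadline from the autocommit maxTime and any per-add commitWithin bound, and the pending commit is rescheduled only when the new deadline is tighter:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

public class CommitTracker {
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final long autoCommitMaxTimeMs;
    private ScheduledFuture<?> pending;

    public CommitTracker(long autoCommitMaxTimeMs) {
        this.autoCommitMaxTimeMs = autoCommitMaxTimeMs;
    }

    // Delay for a doc added with the given commitWithin bound;
    // <= 0 means the add carried no bound of its own. This is the
    // min(autocommit, commitWithin) behavior the comment questions.
    public static long effectiveDelayMs(long autoCommitMaxTimeMs, long commitWithinMs) {
        return commitWithinMs > 0
                ? Math.min(autoCommitMaxTimeMs, commitWithinMs)
                : autoCommitMaxTimeMs;
    }

    // Called for each added document.
    public synchronized void addedDocument(long commitWithinMs, Runnable doCommit) {
        long delay = effectiveDelayMs(autoCommitMaxTimeMs, commitWithinMs);
        // Reschedule only if no commit is pending, or the pending one is later.
        if (pending == null || pending.getDelay(TimeUnit.MILLISECONDS) > delay) {
            if (pending != null) pending.cancel(false);
            pending = scheduler.schedule(doCommit, delay, TimeUnit.MILLISECONDS);
        }
    }
}
```

Under Mike's alternative ("always use commitWithin"), effectiveDelayMs would return commitWithinMs whenever it is set, letting clients also flag docs as less urgent than the autocommit default.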
[jira] Commented: (SOLR-793) set a commit time bounds in the add command
[ https://issues.apache.org/jira/browse/SOLR-793?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12637173#action_12637173 ] Mike Klaas commented on SOLR-793: - Hey Ryan, I think this is good functionality and will take a look at the synchro stuff in the next day or so. I feel somewhat responsible, being the one who inflicted it on everyone :) set a commit time bounds in the add command - Key: SOLR-793 URL: https://issues.apache.org/jira/browse/SOLR-793 Project: Solr Issue Type: Improvement Components: update Reporter: Ryan McKinley Priority: Minor Attachments: SOLR-793-commitWithin.patch, SOLR-793-commitWithin.patch Currently there are two options for how to handle committing documents: 1. the client explicitly starts the commit via <commit/> 2. set an auto commit value on the server -- clients can assume all documents will be committed within that time. However, this does not help in the case where the clients know what documents need updating quickly and others that could wait. I suggest adding: {code:xml} <add commitWithin="100" ...> {code} to the update syntax so the client can schedule commits explicitly.
Re: Setting Fix Version in JIRA
On 23-Sep-08, at 12:33 PM, Otis Gospodnetic wrote: Hi, When people add new issues to JIRA they most often don't set the Fix Version field. Would it not be better to have a default value for that field, so that new entries don't get forgotten when we filter by Fix Version looking for issues to fix for the next release? If every issue had Fix Version set we'd be able to schedule things better, give reporters and others more insight into when a particular item will be taken care of, etc. When we are ready for the release we'd just bump all unresolved issues to the next planned version (e.g. Solr 1.3.1 or 1.4 or Lucene 2.4 or 2.9) -1 It doesn't make sense to automatically schedule something to be fixed in the next version of the product. I would be +1 on automatically setting the fix version for the current unreleased version when an issue is resolved as fixed, though. -Mike
Re: Solr 1.3.0 Release Lessons Learned
On 22-Sep-08, at 10:34 AM, Shalin Shekhar Mangar wrote: I'd like to propose a more pro-active approach to release planning by the community. At any given time, let's have two versions in JIRA. Only those issues which a committer has assigned to himself should be in the first un-released version. All unassigned issues must be kept in the second un-released version. If a committer assigns and promotes an issue to the first un-released version, he should feel confident enough to resolve the issue one way or another within 3 months of the last release else he should mark it for the second version. At any given time, anybody can call a vote on releasing with the trunk features. If we feel confident enough and the list of resolved issues substantial enough, we can work according to our current way of release planning (deferring open issues, creating a branch, prioritizing bugs, putting up an RC and then release). I think that this is the right approach, but I don't think that it needs to be that complicated. For issues without the expectation of completion that you mention, it is fine to just not assign a version to the issue. It _would_ be useful, OTOH, to have a 2.0 version in JIRA for issues we know won't be resolved back-compatibly. -Mike
[jira] Commented: (SOLR-216) Improvements to solr.py
[ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12629981#action_12629981 ] Mike Klaas commented on SOLR-216: - That's great! Be sure to update http://wiki.apache.org/solr/SolPython as the project progresses. Improvements to solr.py --- Key: SOLR-216 URL: https://issues.apache.org/jira/browse/SOLR-216 Project: Solr Issue Type: Improvement Components: clients - python Affects Versions: 1.2 Reporter: Jason Cater Assignee: Mike Klaas Priority: Trivial Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py I've taken the original solr.py code and extended it to include higher-level functions. * Requires python 2.3+ * Supports SSL (https://) scheme * Conforms (mostly) to PEP 8 -- the Python Style Guide * Provides a high-level results object with implicit data type conversion * Supports batching of update commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Created: (SOLR-766) Remove python client from 1.3 distribution
Remove python client from 1.3 distribution -- Key: SOLR-766 URL: https://issues.apache.org/jira/browse/SOLR-766 Project: Solr Issue Type: Task Components: clients - python Affects Versions: 1.3 Reporter: Mike Klaas Assignee: Mike Klaas Priority: Blocker Fix For: 1.3 see solr-dev thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL PROTECTED] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-766) Remove python client from 1.3 distribution
[ https://issues.apache.org/jira/browse/SOLR-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12630004#action_12630004 ] Mike Klaas commented on SOLR-766: - JIRA seems to be not allowing me to upload a patch. Here is the text of the proposed README: Note: As of version 1.3, Solr no longer comes bundled with a Python client. The existing client was not sufficiently maintained or tested as development of Solr progressed, and committers felt that the code was not up to our usual high standards of release. The client bundled with previous versions of Solr will continue to be available indefinitely at: http://svn.apache.org/viewvc/lucene/solr/tags/release-1.2.0/client/python/ Please see http://wiki.apache.org/solr/SolPython for information on third-party Solr python clients. Remove python client from 1.3 distribution -- Key: SOLR-766 URL: https://issues.apache.org/jira/browse/SOLR-766 Project: Solr Issue Type: Task Components: clients - python Affects Versions: 1.3 Reporter: Mike Klaas Assignee: Mike Klaas Priority: Blocker Fix For: 1.3 see solr-dev thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL PROTECTED] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-766) Remove python client from 1.3 distribution
[ https://issues.apache.org/jira/browse/SOLR-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-766: Attachment: SOLR-766.patch Remove python client from 1.3 distribution -- Key: SOLR-766 URL: https://issues.apache.org/jira/browse/SOLR-766 Project: Solr Issue Type: Task Components: clients - python Affects Versions: 1.3 Reporter: Mike Klaas Assignee: Mike Klaas Priority: Blocker Fix For: 1.3 Attachments: SOLR-766.patch see solr-dev thread: http://mail-archives.apache.org/mod_mbox/lucene-solr-dev/200809.mbox/[EMAIL PROTECTED] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Solr's use of Lucene's Compression field
Agreed. It was the simplest thing to do at the time, but it would definitely be preferable to offer the much faster lesser levels of compression. -Mike On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote: Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception , it occurred to me that we probably should refactor Solr's offering of compression. Currently, we rely on Field.COMPRESS from Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 , because it only offers the highest level of compression, which is also the slowest. Obviously, Solr needs to handle the compression on the server side. I think we should have Solr do the compression, allowing users to set the level of compression (maybe even make it pluggable to put in your own compression techniques) and then just use Lucene's binary field capability. Granted, this is lower priority since I doubt many people use compression to begin with, but, still it would be useful. -Grant
Re: Solr's use of Lucene's Compression field
Also I see that another Lucene bug (LUCENE-1374) was found relating to compressed fields in lucene (when we first added compressed field support to solr a lucene bug involving lazy-loaded fields and compression was uncovered, too). It would be good to change the implementation simply to avoid relying on a deprecated lucene feature that isn't well exercised in development. -Mike On 3-Sep-08, at 11:36 AM, Mike Klaas wrote: Agreed. It was the simplest thing to do at the time, but it would definitely be preferable to offer the much faster lesser levels of compression. -Mike On 3-Sep-08, at 8:57 AM, Grant Ingersoll wrote: Thinking about http://lucene.markmail.org/message/mef4cdo7m3s6i3fc?q=background+merge+exception , it occurred to me that we probably should refactor Solr's offering of compression. Currently, we rely on Field.COMPRESS from Lucene, but this really isn't considered best practice, see http://www.nabble.com/Need-Lucene-Compression-helpcan-pay-nominal-fee-to11001907.html#a11013878 , because it only offers the highest level of compression, which is also the slowest. Obviously, Solr needs to handle the compression on the server side. I think we should have Solr do the compression, allowing users to set the level of compression (maybe even make it pluggable to put in your own compression techniques) and then just use Lucene's binary field capability. Granted, this is lower priority since I doubt many people use compression to begin with, but, still it would be useful. -Grant
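Grant's proposal above — have Solr compress stored fields itself, with a user-selectable level, and store the bytes via Lucene's binary field capability — comes down to choosing a `java.util.zip.Deflater` level. A minimal JDK-only sketch of the speed/size trade-off (class and method names are hypothetical, not Solr code):

```java
import java.util.zip.Deflater;

public class CompressDemo {
    // Hypothetical helper, not Solr code: compress one field value at a
    // caller-chosen level (Deflater.BEST_SPEED=1 .. Deflater.BEST_COMPRESSION=9).
    static byte[] compress(byte[] input, int level) {
        Deflater deflater = new Deflater(level);
        deflater.setInput(input);
        deflater.finish();
        // One deflate() call suffices here: the buffer exceeds deflate's
        // worst-case output for inputs of this size.
        byte[] buf = new byte[input.length + 64];
        int n = deflater.deflate(buf);
        deflater.end();
        byte[] out = new byte[n];
        System.arraycopy(buf, 0, out, 0, n);
        return out;
    }

    public static void main(String[] args) {
        byte[] doc = "some fairly repetitive stored field value ".repeat(200).getBytes();
        int fast = compress(doc, Deflater.BEST_SPEED).length;
        int best = compress(doc, Deflater.BEST_COMPRESSION).length;
        // Higher levels trade CPU time for (usually) smaller output.
        System.out.println("BEST_SPEED: " + fast + " bytes, BEST_COMPRESSION: " + best + " bytes");
    }
}
```

Field.COMPRESS hard-wires the equivalent of BEST_COMPRESSION; exposing the level (or a pluggable codec) is the flexibility being asked for here.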
[jira] Commented: (SOLR-739) Add support for OmitTf
[ https://issues.apache.org/jira/browse/SOLR-739?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12627049#action_12627049 ] Mike Klaas commented on SOLR-739: - Haven't looked at the patch, but defaulting to omitTf=true is backwards-incompatible (think multi-valued string fields) Add support for OmitTf -- Key: SOLR-739 URL: https://issues.apache.org/jira/browse/SOLR-739 Project: Solr Issue Type: New Feature Reporter: Mark Miller Priority: Minor Fix For: 1.4 Attachments: SOLR-739.patch Allow setting omitTf in the field schema. Default to true for all but text fields. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: 1.3 status
+1 for 1.3 RC. The idea of putting new issues in 1.3.1 has been tossed around a few times on this list in the last few weeks. I'm not sure how other people feel about this, but in my mind, 1.X.Y and 1.X.Z releases should be feature-identical, with later releases only containing bugfixes. If we have a bunch of cool features we want to release shortly, I'd be happy with releasing 1.4 quickly :) -Mike On 25-Aug-08, at 7:30 AM, Shalin Shekhar Mangar wrote: +1 for Lucene upgrade +1 for a release candidate. I think the newer issues can make it to 1.3.1 easily. We don't need to halt 1.3 for them. A general question -- how long does a Release Candidate phase last? On Mon, Aug 25, 2008 at 7:51 PM, Otis Gospodnetic [EMAIL PROTECTED] wrote: +1 for Lucene upgrade +1 for a release (I *think* none of the recent SOLR-7** issues have to go in 1.3) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Erik Hatcher [EMAIL PROTECTED] To: solr-dev@lucene.apache.org Sent: Monday, August 25, 2008 10:06:46 AM Subject: Re: 1.3 status On Aug 25, 2008, at 9:48 AM, Yonik Seeley wrote: Given that there are backward compat concerns with https://issues.apache.org/jira/browse/LUCENE-1142 perhaps we should update Lucene again before a release? +1 Erik -- Regards, Shalin Shekhar Mangar.
Re: [jira] Closed: (LUCENE-1363) sub task of reopen performance
Wow, that was a fast resolution to this issue :) -Mike On 22-Aug-08, at 12:46 AM, F.Y. (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] F.Y. closed LUCENE-1363. Resolution: Fixed sub task of reopen performance -- Key: LUCENE-1363 URL: https://issues.apache.org/jira/browse/LUCENE-1363 Project: Lucene - Java Issue Type: Sub-task Environment: win Reporter: F.Y. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (SOLR-474) audit docs for Spellchecker
[ https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622677#action_12622677 ] Mike Klaas commented on SOLR-474: - The issue is more wikidocs vs. behaviour. I apologize I haven't gotten to this yet--I've been suffering from RSI the last month or so and it has been difficult to get non-work computer time. I'll take a look today. audit docs for Spellchecker --- Key: SOLR-474 URL: https://issues.apache.org/jira/browse/SOLR-474 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Hoss Man Assignee: Mike Klaas Fix For: 1.3 according to this troubling comment from Mike, the spellchecker handler javadocs (and wiki) may not reflect reality... http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712 {quote} Multi-word spell checking is available only with extendedResults=true, and only in trunk. I believe that the current javadocs are incorrect on this point. {quote} we should audit/fix this before 1.3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-474) audit docs for Spellchecker
[ https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-474. - Resolution: Fixed I've verified the behaviour and updated the wiki page accordingly. audit docs for Spellchecker --- Key: SOLR-474 URL: https://issues.apache.org/jira/browse/SOLR-474 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Hoss Man Assignee: Mike Klaas Fix For: 1.3 according to this troubling comment from Mike, the spellchecker handler javadocs (and wiki) may not reflect reality... http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712 {quote} Multi-word spell checking is available only with extendedResults=true, and only in trunk. I believe that the current javadocs are incorrect on this point. {quote} we should audit/fix this before 1.3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-216) Improvements to solr.py
[ https://issues.apache.org/jira/browse/SOLR-216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12622391#action_12622391 ] Mike Klaas commented on SOLR-216: - Hi Dariusz, There will almost certainly be no more releases of Solr 1.2. 1.3 will likely be released in less than a month. However, it is good that you published this code so that it can be found by other parties. I'd be much more interested in working toward a client that is compatible with the upcoming 1.3 release (it is unlikely that it can be included, but it can be distributed separately). cheers, -Mike Improvements to solr.py --- Key: SOLR-216 URL: https://issues.apache.org/jira/browse/SOLR-216 Project: Solr Issue Type: Improvement Components: clients - python Affects Versions: 1.2 Reporter: Jason Cater Assignee: Mike Klaas Priority: Trivial Attachments: solr-solrpy-r5.patch, solr.py, solr.py, solr.py, solr.py, test_all.py I've taken the original solr.py code and extended it to include higher-level functions. * Requires python 2.3+ * Supports SSL (https://) scheme * Conforms (mostly) to PEP 8 -- the Python Style Guide * Provides a high-level results object with implicit data type conversion * Supports batching of update commands -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: ClientUtils escape query
Wouldn't you want to reverse all escaping in that case anyway? -Mike On 5-Aug-08, at 1:45 PM, Grant Ingersoll wrote: It's mainly a problem when one wants to display the thing later, I guess. -Grant On Aug 5, 2008, at 4:16 PM, Ryan McKinley wrote: That came after I spent a week increasing the list of things that need to be escaped one at a time (waiting for errors along the way...) Erik suggested I look at how the ruby client handles it... and I haven't seen any problem since then. Is there any problem with over escaping? I know it makes some things look funny. Perhaps there is a regex that will do any non-letter except ryan On Aug 5, 2008, at 8:28 AM, Grant Ingersoll wrote: ClientUtils.escapeQueryChars seems a bit aggressive to me in terms of what it escapes. It references http://lucene.apache.org/java/docs/queryparsersyntax.html#Escaping Special Characters, but doesn't explicitly escape them, instead opting for the more general \W regex. Thus, I'm noticing that chars that don't need to be escaped ( like / ) are being escaped. Anyone recall why this is? I suppose the problem comes in when one considers other query parsers, but maybe we should just mark this one as explicitly for use w/ the Lucene QP? -Grant
Re: AutoCommitTest
On 5-Aug-08, at 3:32 PM, Yonik Seeley wrote: AutoCommitTest was failing for me a good percentage of the time... the comment suggested that adding another doc after the commit callback would block until the new searcher was registered. But that's not the case. I've hacked the test for now to just sleep(500) after the commit callback. Fair enough. It is difficult for me to fix this more permanently, since I can't get it to fail on local machines. I deleted a bunch of email recently so I checked nabble--it seems that in the last month AutoCommitTest has failed once in Hudson (July 21) and once in the apache build (August 2). That isn't too bad, but I hope that your change eliminates those entirely. -Mike
Re: [jira] Issue Comment Edited: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost
On 29-Jul-08, at 3:20 AM, Andrew Savory wrote: Actually I'd argue that all such technical discussion would be better done on the mailing list rather than through JIRA. Mail clients are designed for threaded discussions far better than JIRA's web GUI. And JIRA's posting back to the list with bq. makes most responses impossible to follow. Excessive use of JIRA feels like a community antipattern to me. +1 -Mike
[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost
[ https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617512#action_12617512 ] Mike Klaas commented on SOLR-665: - I haven't looked at the proposed code at all, but it _is_ possible to design this kind of datastructure, with much care: http://www.ddj.com/hpc-high-performance-computing/208801974 FIFO Cache (Unsynchronized): 9x times performance boost --- Key: SOLR-665 URL: https://issues.apache.org/jira/browse/SOLR-665 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Environment: JRockit R27 (Java 6) Reporter: Fuad Efendi Attachments: FIFOCache.java Original Estimate: 672h Remaining Estimate: 672h Attached is modified version of LRUCache where 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that reordering/true (performance bottleneck of LRU) is replaced to insertion-order/false (so that it became FIFO) 2. Almost all (absolutely unnecessary) synchronized statements commented out See discussion at http://www.nabble.com/LRUCache---synchronized%21--td16439831.html Performance metrics (taken from SOLR Admin): LRU Requests: 7638 Average Time-Per-Request: 15300 Average Request-per-Second: 0.06 FIFO: Requests: 3355 Average Time-Per-Request: 1610 Average Request-per-Second: 0.11 Performance increased 9 times which roughly corresponds to a number of CPU in a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org) Current number of documents: 7494689 name: filterCache class:org.apache.solr.search.LRUCache version: 1.0 description: LRU Cache(maxSize=1000, initialSize=1000) stats:lookups : 15966954582 hits : 16391851546 hitratio : 0.102 inserts : 4246120 evictions : 0 size : 2668705 cumulative_lookups : 16415839763 cumulative_hits : 16411608101 cumulative_hitratio : 0.99 cumulative_inserts : 4246246 cumulative_evictions : 0 Thanks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
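The patch's central change — `new LinkedHashMap(initialSize, 0.75f, false)` — hinges on the third constructor argument, `accessOrder`. A small JDK-only sketch (class and method names hypothetical) showing how that one flag turns the same bounded map from LRU eviction into FIFO eviction:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class CacheOrderDemo {
    // A bounded map: accessOrder=true evicts the least-recently-used entry,
    // accessOrder=false (the patch's choice) evicts the first-inserted entry.
    static <K, V> Map<K, V> bounded(final int maxSize, boolean accessOrder) {
        return new LinkedHashMap<K, V>(maxSize, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> lru = bounded(2, true);
        Map<String, Integer> fifo = bounded(2, false);
        for (Map<String, Integer> m : java.util.List.of(lru, fifo)) {
            m.put("a", 1);
            m.put("b", 2);
            m.get("a");    // touching "a" only matters under access ordering
            m.put("c", 3); // forces one eviction
        }
        System.out.println("LRU keeps:  " + lru.keySet());   // [a, c] -- "b" was least recently used
        System.out.println("FIFO keeps: " + fifo.keySet());  // [b, c] -- "a" was inserted first
    }
}
```

The performance claim in the issue rests on the second point: with access ordering, every `get()` mutates the linked list, so reads cannot safely run unsynchronized; with insertion ordering, `get()` is a pure read.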
[jira] Commented: (SOLR-665) FIFO Cache (Unsynchronized): 9x times performance boost
[ https://issues.apache.org/jira/browse/SOLR-665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617549#action_12617549 ] Mike Klaas commented on SOLR-665: - {quote}We may simply use java.util.concurrent.locks instead of heavy synchronized... we may also use Executor framework instead of single-thread faceting... We may even base SOLR on Apache MINA project.{quote} Simply replacing synchronized with java.util.concurrent.locks doesn't increase performance. There needs to be a specific strategy for employing these locks in a way that makes sense. For instance, one idea would be to create a read/write lock with the put()'s covered by write and get()'s covered by read. This would allow multiple parallel reads and will be thread-safe. Another is to create something like ConcurrentLinkedHashMap. These strategies should be tested before trying to create a lock-free get() version, which, if even possible, would rely deeply on the implementation (such a structure would have to be created from scratch, I believe). I'd expect anyone that is able to create such a thing to be familiar enough with memory barriers and such issues to be able to deeply explain the problems with double-checked locking off the top of their head (and immediately see such problems in other code). FIFO Cache (Unsynchronized): 9x times performance boost --- Key: SOLR-665 URL: https://issues.apache.org/jira/browse/SOLR-665 Project: Solr Issue Type: Improvement Affects Versions: 1.3 Environment: JRockit R27 (Java 6) Reporter: Fuad Efendi Attachments: FIFOCache.java Original Estimate: 672h Remaining Estimate: 672h Attached is modified version of LRUCache where 1. map = new LinkedHashMap(initialSize, 0.75f, false) - so that reordering/true (performance bottleneck of LRU) is replaced to insertion-order/false (so that it became FIFO) 2.
Almost all (absolutely unnecessary) synchronized statements commented out See discussion at http://www.nabble.com/LRUCache---synchronized%21--td16439831.html Performance metrics (taken from SOLR Admin): LRU Requests: 7638 Average Time-Per-Request: 15300 Average Request-per-Second: 0.06 FIFO: Requests: 3355 Average Time-Per-Request: 1610 Average Request-per-Second: 0.11 Performance increased 9 times which roughly corresponds to a number of CPU in a system, http://www.tokenizer.org/ (Shopping Search Engine at Tokenizer.org) Current number of documents: 7494689 name: filterCache class:org.apache.solr.search.LRUCache version: 1.0 description: LRU Cache(maxSize=1000, initialSize=1000) stats:lookups : 15966954582 hits : 16391851546 hitratio : 0.102 inserts : 4246120 evictions : 0 size : 2668705 cumulative_lookups : 16415839763 cumulative_hits : 16411608101 cumulative_hitratio : 0.99 cumulative_inserts : 4246246 cumulative_evictions : 0 Thanks -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
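The read/write-lock strategy Mike suggests — put() under the write lock, get() under the read lock — can be sketched with the JDK alone. Note this is safe only for the FIFO (insertion-ordered) variant: with accessOrder=true, get() mutates the map, so a read lock would not suffice. Class name hypothetical; this is a sketch of the suggested strategy, not the SOLR-665 patch:

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class RWLockFifoCache<K, V> {
    private final Map<K, V> map;
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();

    public RWLockFifoCache(final int maxSize) {
        // Insertion order (accessOrder=false): get() never mutates the map,
        // which is exactly what makes guarding it with only a *read* lock safe.
        this.map = new LinkedHashMap<K, V>(maxSize, 0.75f, false) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    public V get(K key) {
        lock.readLock().lock();   // many readers may proceed in parallel
        try {
            return map.get(key);
        } finally {
            lock.readLock().unlock();
        }
    }

    public void put(K key, V value) {
        lock.writeLock().lock();  // writers are exclusive
        try {
            map.put(key, value);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        RWLockFifoCache<String, Integer> cache = new RWLockFifoCache<>(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.put("c", 3);        // evicts "a" (first inserted)
        System.out.println(cache.get("a") + " " + cache.get("c"));
    }
}
```

Unlike the patch's "comment out the synchronized blocks" approach, this keeps reads parallel while remaining thread-safe under concurrent puts.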
[jira] Commented: (SOLR-474) audit docs for Spellchecker
[ https://issues.apache.org/jira/browse/SOLR-474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12617580#action_12617580 ] Mike Klaas commented on SOLR-474: - I will look at this before release. audit docs for Spellchecker --- Key: SOLR-474 URL: https://issues.apache.org/jira/browse/SOLR-474 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Hoss Man Assignee: Mike Klaas Fix For: 1.3 according to this troubling comment from Mike, the spellchecker handler javadocs (and wiki) may not reflect reality... http://www.nabble.com/spellcheckhandler-to14627712.html#a14627712 {quote} Multi-word spell checking is available only with extendedResults=true, and only in trunk. I believe that the current javadocs are incorrect on this point. {quote} we should audit/fix this before 1.3 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-139) Support updateable/modifiable documents
[ https://issues.apache.org/jira/browse/SOLR-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12616729#action_12616729 ] Mike Klaas commented on SOLR-139: - [quote]David - storing all data in the search index can be a problem because it can get BIG. Imagine if nutch stored the raw content in the lucene index? (I may be wrong on this) even with Lazy loading, there is a query time cost to having stored fields.[/quote] Splitting it out into another store is much better at scale. A distinct lucene index works relatively well. Support updateable/modifiable documents --- Key: SOLR-139 URL: https://issues.apache.org/jira/browse/SOLR-139 Project: Solr Issue Type: New Feature Components: update Reporter: Ryan McKinley Assignee: Ryan McKinley Attachments: Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, Eriks-ModifiableDocument.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, getStoredFields.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-IndexDocumentCommand.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-ModifyInputDocuments.patch, SOLR-139-XmlUpdater.patch, SOLR-269+139-ModifiableDocumentUpdateProcessor.patch It would be nice to be able to update some fields on a document without having to insert the entire document. Given the way lucene is structured, (for now) one can only modify stored fields. 
While we are at it, we can support incrementing an existing value - I think this only makes sense for numbers. for background, see: http://www.nabble.com/loading-many-documents-by-ID-tf3145666.html#a8722293 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Defining properties/using expressions in {multicore, config, schema} files
On 21-Jul-08, at 10:48 AM, Henrib wrote: I posted a new patch in solr-350 (solr-350-properties.patch) that allows defining properties in multicore.xml and using them in expressions in config/schema files. This brings a lot of flexibility to configuration. I apologize for doubling the JIRA post; Solr-350 being closed, I just wanted to ensure anyone interested in the feature could try/comment/review etc. Perhaps opening a new issue would be best? cheers, -Mike
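The feature Henrib describes — ${name} expressions in multicore/config/schema files — is at its core property substitution over the file text. A toy JDK-only sketch (class and method names hypothetical, not the SOLR-350 patch):

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PropertyExpansionDemo {
    private static final Pattern PROP = Pattern.compile("\\$\\{([^}]+)\\}");

    // Toy version of ${name} expansion in a config file: known properties are
    // substituted, unknown ones are left intact for a later resolution pass.
    static String expand(String text, Map<String, String> props) {
        Matcher m = PROP.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            String value = props.getOrDefault(m.group(1), m.group(0));
            m.appendReplacement(sb, Matcher.quoteReplacement(value));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> props = Map.of("solr.data.dir", "/var/solr/data");
        System.out.println(expand("<dataDir>${solr.data.dir}</dataDir>", props));
        // -> <dataDir>/var/solr/data</dataDir>
    }
}
```

The real patch additionally layers property definitions per core; this sketch only shows the substitution step.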
Re: Welcome Shalin Shekhar Mangar
Welcome aboard, Shalin! -Mike On 19-Jul-08, at 12:01 PM, Shalin Shekhar Mangar wrote: Thanks! I work at AOL in Bangalore as part of a small team which gets to work on a variety of (very cool!) stuff. Though my involvement started when we decided to contribute part of our work to Solr (DataImportHandler), it soon became a personal passion and has remained so since. AOL continues to encourage and support me for which I'm thankful. I'm very happy to be a part of this community and I'm looking forward to working more closely with you all. On Sat, Jul 19, 2008 at 1:12 AM, Grant Ingersoll [EMAIL PROTECTED] wrote: I am pleased to announce that the Lucene PMC has named Shalin Shekhar Mangar as a Solr committer. Shalin has already contributed numerous patches to the community as well as answers and help on the user list. Shalin, tradition has it that new committers introduce themselves a little bit, so feel free to drop a note about where you work, etc. if you are so inclined. Thanks, Grant -- Regards, Shalin Shekhar Mangar.
[jira] Updated: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-610: Fix Version/s: 1.3 Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-610. - Resolution: Fixed committed. Thanks Lars! Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-556. - Resolution: Fixed committed as part of SOLR-610. thanks Lars! Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-highlight-multivalued.patch, solr-highlight-multivalued-example.xml When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet fooemba/emr. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet emoo/em regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all. To reproduce the problem, I've used the following steps: * create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though) * search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though) * highlighted snippets should show effects described above -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas reassigned SOLR-610: --- Assignee: Mike Klaas Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-610.patch, SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-610) Add support for hl.maxAnalyzedChars=-1 to highlight the whole field
[ https://issues.apache.org/jira/browse/SOLR-610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12608878#action_12608878 ] Mike Klaas commented on SOLR-610: - Hi Lars, I was planning on committing SOLR-556. Would you rather I commit that first, or produce a unified patch instead? -Mike Add support for hl.maxAnalyzedChars=-1 to highlight the whole field --- Key: SOLR-610 URL: https://issues.apache.org/jira/browse/SOLR-610 Project: Solr Issue Type: New Feature Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Priority: Minor Attachments: SOLR-610-maxanalyzed.patch Add support for specifying negative values for the hl.maxAnalyzedChars parameter to be able to highlight the whole field without having to know its size. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
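The semantics SOLR-610 asks for are simple: a negative hl.maxAnalyzedChars means "analyze the whole field". A hedged sketch of that interpretation (class and method names hypothetical, not the actual Solr highlighter code):

```java
public class MaxAnalyzedDemo {
    // Interpret hl.maxAnalyzedChars: a negative value is a sentinel meaning
    // "analyze the whole field", regardless of its length.
    static int effectiveLimit(int maxAnalyzedChars, String fieldValue) {
        return maxAnalyzedChars < 0
                ? fieldValue.length()
                : Math.min(maxAnalyzedChars, fieldValue.length());
    }

    public static void main(String[] args) {
        String field = "a long stored field value whose size the client does not know";
        System.out.println(effectiveLimit(10, field));  // capped at 10 chars
        System.out.println(effectiveLimit(-1, field));  // whole field
    }
}
```

The point of the sentinel is that clients need not know the stored field's size up front to request full-field highlighting.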
Re: per-field similarity
On 24-Jun-08, at 1:28 PM, Yonik Seeley wrote: Something to consider for Lucene 3 is to have something to retrieve Similarity per-field rather than passing the field name into some functions... +1 I've felt that this was the proper (and more useful) way to do things for a long time (http://markmail.org/message/56bk6wrbwallyjvr) -Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
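The idea in this thread — resolve the Similarity once per field rather than passing the field name into individual scoring methods — might look roughly like the following (all names hypothetical; Lucene's real Similarity API is much richer than this one-method stand-in):

```java
import java.util.Map;

public class PerFieldSimilarityDemo {
    // Hypothetical one-method stand-in for Lucene's Similarity.
    interface Similarity { float lengthNorm(int numTokens); }

    static final Similarity NO_NORMS = n -> 1.0f;
    static final Similarity SQRT_NORMS = n -> (float) (1.0 / Math.sqrt(n));
    static final Map<String, Similarity> BY_FIELD = Map.of("id", NO_NORMS);

    // The proposal's shape: callers fetch the per-field Similarity up front,
    // so scoring methods no longer need a field-name parameter.
    static Similarity forField(String field) {
        return BY_FIELD.getOrDefault(field, SQRT_NORMS);
    }

    public static void main(String[] args) {
        System.out.println(forField("id").lengthNorm(100));    // identifier field: no length norm
        System.out.println(forField("body").lengthNorm(100));  // text field: 1/sqrt(length)
    }
}
```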
Re: [jira] Updated: (LUCENE-1314) IndexReader.reopen(boolean force)
On 23-Jun-08, at 10:14 AM, Jason Rutherglen (JIRA) wrote: Does anyone know how to turn off Eclipse automatically changing the import statements? I am not making it reformat but if I edit some code in a file it sees fit to reformat the imports. http://www.google.com/search?q=turn%20off%20eclipse%20changing%20import%20statements I'm running into a problem where Organize Imports is removing all of my import statements. I had to turn off Keep Imports Organized because I noticed that ... -Mike - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: XSS in Solr admin interface
On 19-Jun-08, at 11:17 PM, Nicob wrote: Le jeudi 19 juin 2008 à 19:21 -0700, Mike Klaas a écrit : Fixed in r669766. I checked the patch and it's correctly patching this XSS. Thanks to the dev team ! Thanks for the report! -Mike
Re: XSS in Solr admin interface
On 19-Jun-08, at 5:47 PM, Yonik Seeley wrote: On Thu, Jun 19, 2008 at 7:42 PM, Nicob [EMAIL PROTECTED] wrote: while testing the Solr search engine, I found a XSS vulnerability in its administration interface. I wrote to [EMAIL PROTECTED], but I wonder if this list could be a better place to find a security contact of the Solr project. This is definitely the right list. Is this vulnerability in the current dev version of solr? Fixed in r669766. -Mike
[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605403#action_12605403 ] Mike Klaas commented on SOLR-14: Note that it is very easy to use an external TokenFilter, so you could just cp WDF into your own class and make the changes. (Though I'm not saying that this _shouldn't_ make it in for 1.3) Add the ability to preserve the original term when using WordDelimiterFilter Key: SOLR-14 URL: https://issues.apache.org/jira/browse/SOLR-14 Project: Solr Issue Type: Improvement Components: search Reporter: Richard Trey Hyde Attachments: TokenizerFactory.java, WordDelimiterFilter.patch, WordDelimiterFilter.patch When doing prefix searching, you need to hang on to the original term, otherwise you'll miss many matches you should be making. Data: ABC-12345 WordDelimiterFilter may change this into ABC 12345 ABC12345 A user may enter a search such as ABC\-123* Which will fail to find a match given the above scenario. The attached patch will allow the use of the preserveOriginal option to WordDelimiterFilter and will analyse as ABC 12345 ABC12345 ABC-12345 in which case we will get a positive match. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
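The token algebra the issue describes (split parts, the catenated form, and the preserved original) can be illustrated with a self-contained toy. This simplification splits only on non-alphanumeric characters; the real WordDelimiterFilter also splits on case changes and letter/digit transitions:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Toy illustration of WordDelimiterFilter's preserveOriginal option, not the
// Solr implementation: "ABC-12345" yields the parts ("ABC", "12345"), the
// concatenation ("ABC12345"), and, with preserveOriginal, the untouched input.
public class WordDelimiterDemo {

    static Set<String> tokens(String input, boolean preserveOriginal) {
        Set<String> out = new LinkedHashSet<>();
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : input.toCharArray()) {
            if (Character.isLetterOrDigit(c)) {
                current.append(c);
            } else if (current.length() > 0) {   // delimiter ends the current part
                parts.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) parts.add(current.toString());
        out.addAll(parts);
        if (parts.size() > 1) out.add(String.join("", parts)); // catenated form
        if (preserveOriginal) out.add(input);                  // keep original term
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("ABC-12345", false)); // [ABC, 12345, ABC12345]
        System.out.println(tokens("ABC-12345", true));  // [ABC, 12345, ABC12345, ABC-12345]
    }
}
```

With preserveOriginal, a prefix query like ABC\-123* can match the indexed term ABC-12345, which is exactly the failure mode the patch addresses.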
[jira] Commented: (SOLR-14) Add the ability to preserve the original term when using WordDelimiterFilter
[ https://issues.apache.org/jira/browse/SOLR-14?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12605410#action_12605410 ] Mike Klaas commented on SOLR-14: Also, voting for an issue is a good way to increase its visibility. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603780#action_12603780 ] Mike Klaas commented on SOLR-556: - Thanks for the patch, Lars. I think that the basic approach is sound, though I am a little nervous about the performance implications (especially in the case of phrase highlighting, where we spin up an entirely new spanhighlighter for each value in a multi-valued field). I wonder if I am the only one who highlights large text fields composed of dozens of individual values? Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: SOLR-556-highlight-multivalued.patch, solr-highlight-multivalued-example.xml When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet foo<em>ba</em>r. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet <em>oo</em> regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all. 
To reproduce the problem, I've used the following steps: * create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though) * search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though) * highlighted snippets should show effects described above -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12603785#action_12603785 ] Mike Klaas commented on SOLR-556: - Hey Lars, Yeah, I'm talking about highlighting 15kB of text in 100-200 character chunks. Maybe I can whip up a perf test for this soon. The reason we probably see this issue differently is that the incorrect behaviour is quite minor for most users (perhaps a bit of punctuation leaking from value to value at most). One way to correct what you are seeing is to use a tokenizer that creates tokens out of the CJK characters, or things on boundaries. In your case, inserting a fake token when encountering a right bracket [)] would fix the problem, I think. Nevertheless, I think I will probably end up committing your patch after pondering it some more. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
Re: Solr Maven Artifacts
As someone who is completely ignorant (and admittedly, somewhat willfully so) of the java enterprise world, I was hoping that someone more savvy in the ways of maven would step in here. It is even unclear to me what having the project in a Maven repository means for people, or why it would be convenient. Based on the link you sent, it seems that a few things are necessary for this to proceed, like a maven project descriptor for Solr (or is that already done?). That said, I'm +1 on steps to better propagate Solr, even if I don't think that I am the best person to effectuate those steps. -Mike On 9-Jun-08, at 12:58 AM, Andrew Savory wrote: Hi, Would any of the solr devs care to comment? It would be extremely useful to have maven artifacts published for those building apps based on Solr 1.2, and it would help prepare the way for releasing Solr 1.3 maven artifacts. 2008/6/5 Andrew Savory [EMAIL PROTECTED]: Hi, 2008/6/4 Andrew Savory [EMAIL PROTECTED]: I see from http://issues.apache.org/jira/browse/SOLR-19 that some tentative work has been done on mavenisation of solr, and from https://issues.apache.org/jira/browse/SOLR-586 that discussion of publishing maven artifacts ... is it possible to push solr 1.2 maven artifacts out to the repo? More specifically, would someone with sufficient privileges (Yonik?) 
be willing to do the following (from [1]):
mkdir -p org.apache.solr/jars
grab the solr-1.2 release (or svn co tags/release-1.2.0, but then you need to edit build.xml to fix the version string that seems to have accidentally been updated before the release tag was made, changing it to <property name="version" value="1.2.1-dev" />)
tar xzvf apache-solr-1.2.0.tar.gz
cp apache-solr-1.2.0/dist/apache-solr-1.2.0.jar org.apache.solr/jars/
cd into org.apache.solr/jars and create md5 and sha1 checksums of apache-solr-1.2.0.jar:
openssl md5 apache-solr-1.2.0.jar > apache-solr-1.2.0.jar.md5
openssl sha apache-solr-1.2.0.jar > apache-solr-1.2.0.jar.sha1
sign the release:
gpg --armor --output apache-solr-1.2.0.jar.asc --detach-sig apache-solr-1.2.0.jar
cd ../ and scp it onto people.apache.org:
scp -r org.apache.solr [EMAIL PROTECTED]:/www/people.apache.org/repo/m1-ibiblio-rsync-repository/
check permissions:
cd /www/people.apache.org/repo/m1-ibiblio-rsync-repository/org.apache.solr
chgrp -R apcvs *
chmod -R g+w *
I could do it but I suspect that would be overstepping the bounds of a non-committer :-) This will make it easier for anyone to use solr from within maven. I'll file a patch to automate whatever can be automated from our ant build so this is easier for the 1.3 release. If people agree that publishing maven artifacts is a good idea, I'll happily update http://wiki.apache.org/solr/HowToRelease to point to the relevant information too. [1] http://www.apache.org/dev/release-publishing.html#maven-repo Andrew. -- [EMAIL PROTECTED] / [EMAIL PROTECTED] http://www.andrewsavory.com/
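The checksum step in the recipe above could also be done with the JDK's MessageDigest instead of shelling out to openssl; a minimal sketch (file names from the recipe are illustrative, and this hashes an in-memory stand-in rather than the actual release jar):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Sketch of the md5/sha1 checksum step using the JDK instead of the openssl CLI.
public class ChecksumDemo {

    // Hex-encoded digest of a byte array, e.g. "MD5" or "SHA-1".
    static String hexDigest(String algorithm, byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance(algorithm);
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for Files.readAllBytes(Path.of("apache-solr-1.2.0.jar")).
        byte[] jar = "example jar bytes".getBytes(StandardCharsets.UTF_8);
        System.out.println(hexDigest("MD5", jar));
        System.out.println(hexDigest("SHA-1", jar));
    }
}
```

The resulting hex strings are what would be written to the .md5 and .sha1 files next to the jar.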
[jira] Commented: (SOLR-536) Automatic binding of results to Beans (for solrj)
[ https://issues.apache.org/jira/browse/SOLR-536?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602744#action_12602744 ] Mike Klaas commented on SOLR-536: - This is expensive: private final Map<Class, List<DocField>> infocache = Collections.synchronizedMap( new HashMap<Class, List<DocField>>() ); Let us make it: private final Map<Class, List<DocField>> infocache = new ConcurrentHashMap<Class, List<DocField>>(); Expensive? I'd expect the synchronizedMap to be faster and more memory compact. The ConcurrentHashMap is definitely more concurrent, though. Automatic binding of results to Beans (for solrj) - Key: SOLR-536 URL: https://issues.apache.org/jira/browse/SOLR-536 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Reporter: Noble Paul Assignee: Ryan McKinley Priority: Minor Fix For: 1.3 Attachments: SOLR-536.patch, SOLR-536.patch, SOLR-536.patch As we are using java5, we can use annotations to bind SolrDocument to java beans directly. This can make the usage of solrj a bit simpler. The QueryResponse class in solrj can have an extra method as follows: public <T> List<T> getResultBeans(Class<T> klass) and the bean can have annotations as: class MyBean { @Field("id") // name is optional String id; @Field("category") List<String> categories } -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
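The two caching strategies debated in the comment, sketched side by side; DocField here is a stand-in for the patch's field-binding metadata (the name comes from the patch, the body is a placeholder):

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// The trade-off under discussion: synchronizedMap serializes every read and
// write on one lock but is compact; ConcurrentHashMap lets readers proceed
// concurrently at the cost of a heavier structure.
public class BindingCacheDemo {

    static class DocField { String name; }

    // Every get() and put() contends on the same monitor.
    static final Map<Class<?>, List<DocField>> syncCache =
            Collections.synchronizedMap(new HashMap<>());

    // Concurrent reads never block each other.
    static final Map<Class<?>, List<DocField>> concurrentCache =
            new ConcurrentHashMap<>();

    public static void main(String[] args) {
        concurrentCache.computeIfAbsent(String.class, k -> List.of(new DocField()));
        System.out.println(concurrentCache.get(String.class).size()); // 1
    }
}
```

For a cache that is written once per bean class and then read on every query, the concurrent variant is the safer default, which matches the suggestion in the comment.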
[jira] Commented: (SOLR-572) Spell Checker as a Search Component
[ https://issues.apache.org/jira/browse/SOLR-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602828#action_12602828 ] Mike Klaas commented on SOLR-572: - [quote]Another use case is where Solr is used with indices that are not indices for a narrow domain or that don't have nice, clean, short fields that can be used for populating the SC index. For example, if the index consists of a pile of web pages, I don't think I'd want to use their data (not even their titles) to populate the SC index. I'd really want just a plain dictionary-powered SCRH.[/quote] It works great, actually. That way you get all the abbreviations, jargon, proper names, etc. Thresholding helps prevent most of the cruft from appearing in the index. Spell Checker as a Search Component --- Key: SOLR-572 URL: https://issues.apache.org/jira/browse/SOLR-572 Project: Solr Issue Type: New Feature Components: spellchecker Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Assignee: Grant Ingersoll Priority: Minor Fix For: 1.3 Attachments: SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch, SOLR-572.patch Expose the Lucene contrib SpellChecker as a Search Component. 
Provide the following features: * Allow creating a spell index on a given field and make it possible to have multiple spell indices -- one for each field * Give suggestions on a per-field basis * Given a multi-word query, give only one consistent suggestion * Process the query with the same analyzer specified for the source field and process each token separately * Allow the user to specify minimum length for a token (optional) Consistency criteria for a multi-word query can consist of the following: * Preserve the correct words in the original query as it is * Never give duplicate words in a suggestion -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-284: Affects Version/s: (was: 1.3) Removing from 1.3. No committer has taken ownership. (It might make sense as a contrib, but I can see the argument for not duplicating tika) Parsing Rich Document Types --- Key: SOLR-284 URL: https://issues.apache.org/jira/browse/SOLR-284 Project: Solr Issue Type: New Feature Components: update Reporter: Eric Pugh Fix For: 1.3 Attachments: libs.zip, rich.patch, rich.patch, rich.patch, rich.patch, source.zip, test-files.zip, test-files.zip, test.zip I have developed a RichDocumentRequestHandler based on the CSVRequestHandler that supports streaming a PDF, Word, Powerpoint, or Excel document into Solr. There is a wiki page with information here: http://wiki.apache.org/solr/UpdateRichDocuments -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-435) QParser must validate existance/absense of q parameter
[ https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-435: Fix Version/s: (was: 1.3) QParser must validate existance/absense of q parameter Key: SOLR-435 URL: https://issues.apache.org/jira/browse/SOLR-435 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Each QParser should check if q exists or not. For some it will be required others not. currently it throws a null pointer: {code} java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36) at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104) at org.apache.solr.search.QParser.getQuery(QParser.java:80) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67) at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150) ... {code} see: http://www.nabble.com/query-parsing-error-to14124285.html#a14140108 -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
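The fix this issue asks for amounts to a fail-fast guard before parsing, so a missing q produces a clear error instead of the NullPointerException in splitSmart. A minimal sketch with illustrative names (not Solr's actual exception or parser types):

```java
// Sketch of validating the q parameter before parsing. BadRequestException
// stands in for whatever 400-style error Solr would surface to the client.
public class QParserGuardDemo {

    static class BadRequestException extends RuntimeException {
        BadRequestException(String msg) { super(msg); }
    }

    static String[] parse(String q) {
        if (q == null || q.trim().isEmpty()) {
            // Fail fast with a descriptive message instead of an NPE deep
            // inside the query parser.
            throw new BadRequestException("Missing required parameter: q");
        }
        return q.trim().split("\\s+"); // stand-in for the real query parsing
    }

    public static void main(String[] args) {
        System.out.println(parse("solr lucene").length); // 2
        try {
            parse(null);
        } catch (BadRequestException e) {
            System.out.println(e.getMessage()); // Missing required parameter: q
        }
    }
}
```

As the issue notes, some parsers require q and others do not, so the guard belongs in each QParser rather than in shared code.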
[jira] Updated: (SOLR-433) MultiCore and SpellChecker replication
[ https://issues.apache.org/jira/browse/SOLR-433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-433: Fix Version/s: (was: 1.3) MultiCore and SpellChecker replication -- Key: SOLR-433 URL: https://issues.apache.org/jira/browse/SOLR-433 Project: Solr Issue Type: Improvement Components: replication, spellchecker Affects Versions: 1.3 Reporter: Otis Gospodnetic Attachments: RunExecutableListener.patch, solr-433.patch, spellindexfix.patch With MultiCore functionality coming along, it looks like we'll need to be able to: A) snapshot each core's index directory, and B) replicate any and all cores' complete data directories, not just their index directories. Pulled from the spellchecker and multi-core index replication thread - http://markmail.org/message/pj2rjzegifd6zm7m Otis: I think that makes sense - distribute everything for a given core, not just its index. And the spellchecker could then also have its data dir (and only index/ underneath really) and be replicated in the same fashion. Right? Ryan: Yes, that was my thought. If an arbitrary directory could be distributed, then you could have /path/to/dist/index/... /path/to/dist/spelling-index/... /path/to/dist/foo and that would all get put into a snapshot. This would also let you put multiple cores within a single distribution: /path/to/dist/core0/index/... /path/to/dist/core0/spelling-index/... /path/to/dist/core0/foo /path/to/dist/core1/index/... /path/to/dist/core1/spelling-index/... /path/to/dist/core1/foo -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-351) external value source
[ https://issues.apache.org/jira/browse/SOLR-351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-351: Fix Version/s: (was: 1.3) external value source - Key: SOLR-351 URL: https://issues.apache.org/jira/browse/SOLR-351 Project: Solr Issue Type: New Feature Components: search Reporter: Yonik Seeley Attachments: ExternalFileField.patch Need a way to rapidly do a bulk update of a single field for use as a component in a function query (no need to be able to search on it). Idea: create an ExternalValueSource fieldType that reads its values from a file. The file could be simple id,val records, and stored in the index directory so it would get replicated. Values could optionally be updated more often than the searcher (hashCode/equals should take this into account to prevent caching issues). -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
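The "simple id,val records" format proposed above could be parsed into an in-memory map along these lines; this is a guess at the described format, not the committed ExternalFileField code:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of parsing the proposed external value file: one "id,val" record per
// line, mapping a document id to a float usable in a function query.
public class ExternalValuesDemo {

    static Map<String, Float> parse(String fileContents) {
        Map<String, Float> values = new HashMap<>();
        for (String line : fileContents.split("\n")) {
            line = line.trim();
            if (line.isEmpty()) continue;
            int comma = line.indexOf(',');
            if (comma < 0) continue; // skip malformed records
            values.put(line.substring(0, comma),
                       Float.parseFloat(line.substring(comma + 1)));
        }
        return values;
    }

    public static void main(String[] args) {
        Map<String, Float> v = parse("doc1,0.5\ndoc2,2.75\n");
        System.out.println(v.get("doc2")); // 2.75
    }
}
```

Keeping the file in the index directory, as the issue suggests, means it rides along with normal index replication for free.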
[jira] Updated: (SOLR-284) Parsing Rich Document Types
[ https://issues.apache.org/jira/browse/SOLR-284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-284: Fix Version/s: (was: 1.3) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-484) Solr Website changes
[ https://issues.apache.org/jira/browse/SOLR-484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-484: Fix Version/s: (was: 1.3) Solr Website changes Key: SOLR-484 URL: https://issues.apache.org/jira/browse/SOLR-484 Project: Solr Issue Type: Bug Components: documentation Reporter: Grant Ingersoll Priority: Minor In looking at the Solr website it has many of the same issues that Lucene Java did when it comes to ASF policies about nightly builds, etc. concerning the Javadocs See http://lucene.markmail.org/message/a7k7kujxkhwjwfy6?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22page=1 and http://lucene.markmail.org/message/vaks6omed4l6buth?q=nightly+developer+releases+list:org%2Eapache%2Elucene%2Ejava-dev+from:%22Doug+Cutting+(JIRA)%22page=1 This would suggest a change like Hadoop and Lucene Java did to separate out the main site, release docs (javadocs, any other?) and developer resources. Currently the javadocs on the main page are the nightly and should be made less prominent. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-84) New Solr logo?
[ https://issues.apache.org/jira/browse/SOLR-84?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-84: --- Fix Version/s: (was: 1.3) New Solr logo? -- Key: SOLR-84 URL: https://issues.apache.org/jira/browse/SOLR-84 Project: Solr Issue Type: Improvement Reporter: Bertrand Delacretaz Priority: Minor Attachments: logo-grid.jpg, logo-solr-d.jpg, logo-solr-e.jpg, logo-solr-source-files-take2.zip, solr-84-source-files.zip, solr-f.jpg, solr-logo-20061214.jpg, solr-logo-20061218.JPG, solr-logo-20070124.JPG, solr-nick.gif, solr.jpg, sslogo-solr-flare.jpg, sslogo-solr.jpg, sslogo-solr2-flare.jpg, sslogo-solr2.jpg, sslogo-solr3.jpg Following up on SOLR-76, our trainee Nicolas Barbay (nicolas (put at here) sarraux-dessous.ch) has reworked his logo proposal to be more solar. This can either be the start of a logo contest, or if people like it we could adopt it. The gradients can make it a bit hard to integrate, not sure if this is really a problem. WDYT? -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-410) Audit the new ResponseBuilder class
[ https://issues.apache.org/jira/browse/SOLR-410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12602834#action_12602834 ] Mike Klaas commented on SOLR-410: - Ryan, can this be closed? Audit the new ResponseBuilder class --- Key: SOLR-410 URL: https://issues.apache.org/jira/browse/SOLR-410 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Fix For: 1.3 In SOLR-281, we added a ResponseBuilder class to help search components communicate with one another. Before releasing 1.3, we need to make sure this is the best design and that it is an interface we can support in the future. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-243: Do we still want to target 1.3 here? (Seems like there is a lot to do before it is commit-worthy, based on the comments) Create a hook to allow custom code to create custom IndexReaders Key: SOLR-243 URL: https://issues.apache.org/jira/browse/SOLR-243 Project: Solr Issue Type: Improvement Components: search Environment: Solr core Reporter: John Wang Assignee: Hoss Man Fix For: 1.3 Attachments: indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch I have a customized IndexReader and I want to write a Solr plugin to use my derived IndexReader implementation. Currently IndexReader instantiation is hard coded to be: IndexReader.open(path) It would be really useful if this is done through a pluggable factory that can be configured, e.g.: interface IndexReaderFactory { IndexReader newReader(String name, String path); } the default implementation would just return: IndexReader.open(path) And the newSearcher and getSearcher methods in the SolrCore class can call the current factory implementation to get the IndexReader instance and then build the SolrIndexSearcher by passing in the reader. It would be really nice to add this improvement soon (this seems to be a trivial addition) as our project really depends on this. Thanks -John -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
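The factory proposed in the issue, made compilable as a self-contained sketch; the IndexReader here is a stand-in type so the example runs on its own, whereas in Solr the default factory would simply delegate to IndexReader.open(path):

```java
// Compilable sketch of the pluggable IndexReaderFactory idea from SOLR-243.
public class ReaderFactoryDemo {

    // Stand-in for Lucene's IndexReader, just enough to demonstrate the hook.
    static class IndexReader {
        final String path;
        IndexReader(String path) { this.path = path; }
    }

    // The interface from the issue: name identifies the core/reader,
    // path is the index directory.
    interface IndexReaderFactory {
        IndexReader newReader(String name, String path);
    }

    // Default behavior: the equivalent of IndexReader.open(path).
    static final IndexReaderFactory DEFAULT = (name, path) -> new IndexReader(path);

    public static void main(String[] args) {
        IndexReader r = DEFAULT.newReader("main", "/var/solr/index");
        System.out.println(r.path); // /var/solr/index
    }
}
```

A custom plugin would supply its own IndexReaderFactory, and SolrCore's newSearcher/getSearcher would build the SolrIndexSearcher from whatever reader the factory returns.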
[jira] Assigned: (SOLR-545) remove MultiCore default core / cleanup DispatchHandler
[ https://issues.apache.org/jira/browse/SOLR-545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas reassigned SOLR-545: --- Assignee: Ryan McKinley assigning 1.3 multicore stuff to Ryan remove MultiCore default core / cleanup DispatchHandler --- Key: SOLR-545 URL: https://issues.apache.org/jira/browse/SOLR-545 Project: Solr Issue Type: Bug Affects Versions: 1.3 Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 1.3 MultiCore should require a core name in the URL. If the core name is missing, there should be a 404, not a valid core. That is: http://localhost:8983/solr/select?q=*:* should return 404. While we are at it, we should cleanup the DispatchHandler. Perhaps the best approach is to treat single core as multicore with only one core? As is, the tangle of potential paths is ugly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Assigned: (SOLR-489) Added @deprecation Javadoc comments
[ https://issues.apache.org/jira/browse/SOLR-489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas reassigned SOLR-489: --- Assignee: Mike Klaas Added @deprecation Javadoc comments --- Key: SOLR-489 URL: https://issues.apache.org/jira/browse/SOLR-489 Project: Solr Issue Type: Bug Components: documentation Reporter: Sean Timm Assignee: Mike Klaas Priority: Trivial Fix For: 1.3 Attachments: deprecationDocumentation.patch In a number of files, @Deprecation annotations were added without accompanying @deprecation Javadoc comments to explain what to use now. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Closed: (SOLR-344) New Java API
[ https://issues.apache.org/jira/browse/SOLR-344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas closed SOLR-344. --- Resolution: Invalid Let's move this discussion to the wiki and mailing list. It isn't really an open issue for Solr. New Java API Key: SOLR-344 URL: https://issues.apache.org/jira/browse/SOLR-344 Project: Solr Issue Type: Improvement Components: clients - java, search, update Affects Versions: 1.3 Reporter: Jonathan Woods Attachments: New Java API for Solr.pdf The core Solr codebase urgently needs to expose a new Java API designed for use by Java running in Solr's JVM and ultimately by core Solr code itself. This API must be (i) object-oriented ('typesafe'), (ii) self-documenting, (iii) at the right level of granularity, (iv) designed specifically to expose the value which Solr adds over and above Lucene. This is an urgent issue for two reasons: - Java-Solr integrations represent a use-case which is nearly as important as the core Solr use-case in which non-Java clients interact with Solr over HTTP - a significant proportion of questions on the mailing lists are clearly from people who are attempting such integrations right now. This point in Solr development - some way out from the 1.3 release - might be the right time to do the development and refactoring necessary to produce this API. We can do this without breaking any backward compatibility from the point of view of XML/HTTP and JSON-like clients, and without altering the core Solr algorithms which make it so efficient. If we do this work now, we can significantly speed up the spread of Solr. Eventually, this API should be part of core Solr code, not hived off into some separate project nor in a non-first-class package space. 
It should be capable of forming the foundation of any new Solr development which doesn't need to delve into low level constructs like DocSet and so on - and any new development which does need to do just that should be a candidate for incorporation into the API at some level. Whether or not it will ever be worth re-writing existing code is a matter of opinion; but the Java API should be such that if it had existed before core plug-ins were written, it would have been natural to use it when writing them. I've attached a PDF which makes the case for this API. Apologies for delivering it as an attachment, but I wanted to embed pics and a bit of formatting. I'll update this issue in the next few days to give a prototype of this API to suggest what it might look like at present. This will build on the work already done in Solrj and SearchComponents (https://issues.apache.org/jira/browse/SOLR-281), and will be a patch on an up-to-date revision of Solr trunk. [PS: 1. Having written most of this, I then properly looked at SearchComponents/SOLR-281 and read http://www.nabble.com/forum/ViewPost.jtp?post=11050274framed=y, which says much the same thing albeit more quickly! And weeks ago, too. But this proposal is angled slightly differently: - it focusses on the value of creating an API not only for internal Solr consumption, but for local Java clients - it focusses on designing a Java API without constantly being hobbled by HTTP-Java - it's suggesting that the SearchComponents work should result in a Java API which can be used as much by third party Java as by ResponseBuilder. 2. I've made some attempt to address Hoss's point (http://www.nabble.com/search-components-%28plugins%29-tf3898040.html#6551097579454875774) - that an API like this would need to maintain enough state e.g. to allow an initial search to later be faceted, highlighted etc without going back to the start each time - but clearly the proof of the pudding will be in the prototype. 3. 
Again, I've just discovered SOLR-212 (DirectSolrConnection). I think all my comments about Solrj apply to this, useful though it clearly is.] -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-200) Scripts don't work when run as root in ~root and su'ing to a user
[ https://issues.apache.org/jira/browse/SOLR-200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-200. - Resolution: Won't Fix It doesn't surprise me that /root as the indexdir and / as solr_home doesn't work, being root or not. I don't think that this is an important case. Scripts don't work when run as root in ~root and su'ing to a user - Key: SOLR-200 URL: https://issues.apache.org/jira/browse/SOLR-200 Project: Solr Issue Type: Bug Affects Versions: 1.1.0 Reporter: Jürgen Hermann Priority: Minor This patch avoids an error due to permission problems when orig_dir is /root:
-orig_dir=$(pwd)
-cd ${0%/*}/..
-solr_root=$(pwd)
-cd ${orig_dir}
+solr_root=$(cd ${0%/*}/..; pwd)
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-517) highlighter doesn't work with hl.requireFieldMatch=true on un-optimized index
[ https://issues.apache.org/jira/browse/SOLR-517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602857#action_12602857 ] Mike Klaas commented on SOLR-517: - Koji: Is this resolved? I seem to recall that we brought this up on java-dev, but I can't find the thread at the moment. (I don't think that the right thing to do is remove idf fetching of the terms as your patch proposes) highlighter doesn't work with hl.requireFieldMatch=true on un-optimized index - Key: SOLR-517 URL: https://issues.apache.org/jira/browse/SOLR-517 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.2, 1.3 Reporter: Koji Sekiguchi Priority: Minor Attachments: SOLR-517.patch, SOLR-517.patch On un-optimized index, highlighter doesn't work with hl.requireFieldMatch=true. see: http://www.nabble.com/hl.requireFieldMatch-and-idf-td16324482.html -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-522) analysis.jsp doesn't show payloads created/modified by tokenizers and tokenfilters
[ https://issues.apache.org/jira/browse/SOLR-522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-522: Fix Version/s: 1.3 analysis.jsp doesn't show payloads created/modified by tokenizers and tokenfilters -- Key: SOLR-522 URL: https://issues.apache.org/jira/browse/SOLR-522 Project: Solr Issue Type: Improvement Components: web gui Reporter: Tricia Williams Assignee: Mike Klaas Priority: Trivial Fix For: 1.3 Attachments: SOLR-522-analysis.jsp.patch, SOLR-522-analysis.jsp.patch Original Estimate: 0.17h Remaining Estimate: 0.17h Add payload content to the verbose output of the analysis.jsp page for debugging purposes. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-243) Create a hook to allow custom code to create custom IndexReaders
[ https://issues.apache.org/jira/browse/SOLR-243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602860#action_12602860 ] Mike Klaas commented on SOLR-243: - Hi John, Hoss has marked the issue for 1.3, so it will be in the release. -Mike Create a hook to allow custom code to create custom IndexReaders Key: SOLR-243 URL: https://issues.apache.org/jira/browse/SOLR-243 Project: Solr Issue Type: Improvement Components: search Environment: Solr core Reporter: John Wang Assignee: Hoss Man Fix For: 1.3 Attachments: indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch, indexReaderFactory.patch I have a customized IndexReader and I want to write a Solr plugin to use my derived IndexReader implementation. Currently IndexReader instantiation is hard coded to be: IndexReader.open(path) It would be really useful if this is done thru a pluggable factory that can be configured, e.g. IndexReaderFactory: interface IndexReaderFactory { IndexReader newReader(String name, String path); } the default implementation would just return: IndexReader.open(path) And the newSearcher and getSearcher methods in the SolrCore class can call the current factory implementation to get the IndexReader instance and then build the SolrIndexSearcher by passing in the reader. It would be really nice to add this improvement soon (This seems to be a trivial addition) as our project really depends on this. Thanks -John -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
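The factory John sketches above can be illustrated with a small self-contained mock-up. This is only a sketch: the `IndexReader` stand-in below is a placeholder added so the snippet compiles on its own (in Solr it would be `org.apache.lucene.index.IndexReader`), and only the interface name, method signature, and default-delegation behavior come from the issue text.

```java
// Stand-in for org.apache.lucene.index.IndexReader, just so this sketch
// is self-contained; it records the path it was opened with.
class IndexReader {
    final String path;
    private IndexReader(String path) { this.path = path; }
    static IndexReader open(String path) { return new IndexReader(path); }
}

// The pluggable factory proposed in SOLR-243 (names follow the issue text).
interface IndexReaderFactory {
    IndexReader newReader(String name, String path);
}

// Default implementation: preserves the current hard-coded behavior by
// simply delegating to IndexReader.open(path).
class StandardIndexReaderFactory implements IndexReaderFactory {
    public IndexReader newReader(String name, String path) {
        return IndexReader.open(path);
    }
}

public class Demo {
    public static void main(String[] args) {
        IndexReaderFactory factory = new StandardIndexReaderFactory();
        IndexReader r = factory.newReader("main", "/var/solr/data/index");
        System.out.println(r.path);
    }
}
```

SolrCore's newSearcher/getSearcher would then ask the configured factory for a reader instead of calling IndexReader.open directly.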
Re: [important] call for 1.3 planning
On 21-May-08, at 4:45 PM, Mike Klaas wrote: There seems to be some sort of consensus building that there should be a 1.3 release in the near future. The first step is to figure out what we want to finish before it gets released. The list of JIRA issues currently labeled 1.3 can be found here: http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12312486 Let's try to get an assignee for every issue in that list by a week from now. If nobody steps up for an issue in that time, I'll assume it is low enough priority to move post-1.3. This would also be a good time to add any issues that you want to champion for 1.3. That brings us down to 20 issues, with only 2 unassigned: SOLR-424 and SOLR-410. I removed a few of the feature issues with no assignee. Seems like the big things that need to get done are: - componentized spellchecking - contrib area + data import handler - distributed search -Mike
Re: 3 TokenFilter factories not compatible with 1.2
On 4-Jun-08, at 5:24 PM, Yonik Seeley wrote: On Wed, Jun 4, 2008 at 7:03 PM, Chris Hostetter [EMAIL PROTECTED] wrote: 3) Documentation and Education Since this wasn't exactly a use case we ever advertised, we could punt on the problem by putting a disclaimer in the CHANGES.txt that anyone directly constructing those 3 classes should explicitly call inform() on the instances after calling init. #3 is obviously the simplest approach as developers, and to be quite honest: probably impacts the fewest total number of people (since there are probably very few people constructing Factory instances themselves) +1 +1, perhaps also pinging -user to see if there is a sizable group of people doing this. -Mike
[jira] Commented: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602541#action_12602541 ] Mike Klaas commented on SOLR-556: - Ah, I see what the problem is: Although it is impossible for tokens from different values to appear in the same fragment (due to the semantics of MultiValuedTokenFilter), the non-token text (typically, punctuation) from different values can bleed into the same fragment, since lucene's highlighter can only create a new fragment on token boundaries. Unfortunately SOLR-553 was committed a day after you submitted your patch, and rearranges the code slightly so that it no longer applies. Could you sync the patch with trunk? I think the basic approach is sound. Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: solr-highlight-multivalued-example.xml, solr-highlight-multivalued.patch When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet foo<em>ba</em>r. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet <em>oo</em> regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all.
To reproduce the problem, I've used the following steps:
* create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though)
* search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though)
* highlighted snippets should show effects described above
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Commented: (SOLR-161) Dangling dash causes stack trace
[ https://issues.apache.org/jira/browse/SOLR-161?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602038#action_12602038 ] Mike Klaas commented on SOLR-161: - It is really a Lucene query parser bug, but it wouldn't hurt to do s/(.*)-// as a workaround. Assuming my ed(1) syntax is still fresh. "Regardless, no query string should ever give a stack trace" This might be hard to guarantee. Already there are four issues detailing specific ways that dismax barfs on input. A lot of the suggestions above are of the form of detecting a specific failure mode and correcting it, which does not guarantee that you will catch them all. A robust way to do it is to parse the query into an AST using a grammar in a way that matches the query as well as possible (dropping the stuff that doesn't fit). Unfortunately, this is duplicative of the lucene parsing logic, and it would be nicer to add a relaxed mode to lucene rather than pre-parsing the query. (The reparse+reassemble method is what we use, btw. It is written in python but it might be possible to translate to java.) Dangling dash causes stack trace Key: SOLR-161 URL: https://issues.apache.org/jira/browse/SOLR-161 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.1.0 Environment: Java 1.5, Tomcat 5.5.17, Fedora Core 4, Intel Reporter: Walter Underwood I'm running tests from our search logs, and we have a query that ends in a dash. That caused a stack trace. org.apache.lucene.queryParser.ParseException: Cannot parse 'digging for the truth -': Encountered EOF at line 1, column 23. Was expecting one of: ( ... QUOTED ... TERM ... PREFIXTERM ... WILDTERM ... [ ... { ... NUMBER ...
at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:127) at org.apache.solr.request.DisMaxRequestHandler.handleRequest(DisMaxRequestHandler.java:272) at org.apache.solr.core.SolrCore.execute(SolrCore.java:595) at org.apache.solr.servlet.SolrServlet.doGet(SolrServlet.java:92) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
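The workaround Mike suggests (stripping the dangling dash before handing the string to the parser) could be expressed in Java along these lines. This is an illustration of the idea only, not code from Solr: the class and method names are made up, and it handles just this one failure mode, not malformed queries in general.

```java
import java.util.regex.Pattern;

public class QueryClean {
    // Matches one or more "-" characters (and surrounding whitespace)
    // dangling at the very end of the query string.
    private static final Pattern DANGLING_DASH = Pattern.compile("\\s*-+\\s*$");

    // Strip a trailing dash so the query parser doesn't hit EOF where it
    // expects a term, as in the stack trace above.
    static String stripDanglingDash(String q) {
        return DANGLING_DASH.matcher(q).replaceAll("");
    }

    public static void main(String[] args) {
        // The query from Walter's log that triggered the ParseException.
        System.out.println(stripDanglingDash("digging for the truth -"));
    }
}
```

Queries without a trailing dash pass through unchanged, so this can sit safely in front of the parser.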
[jira] Commented: (LUCENE-1293) Tweaks to PhraseQuery.explain()
[ https://issues.apache.org/jira/browse/LUCENE-1293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12600973#action_12600973 ] Mike Klaas commented on LUCENE-1293: It is meant for debugging, though I have found it so painfully slow in the past that I have avoided it on occasion. The main culprit is the looped next() call in PhraseScorer.explain(). Using skipTo() would be faster. Tweaks to PhraseQuery.explain() --- Key: LUCENE-1293 URL: https://issues.apache.org/jira/browse/LUCENE-1293 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 1.9, 2.0.0, 2.1, 2.2, 2.3, 2.3.1, 2.3.2, 2.4 Reporter: Itamar Syn-Hershko Priority: Minor Fix For: 2.4 The explain() function in PhraseQuery.java is very clumsy and could use many optimizations. Perhaps it is only because it is intended for use while debugging? Here's an example:
{noformat}
result.addDetail(fieldExpl);
// combine them
result.setValue(queryExpl.getValue() * fieldExpl.getValue());
if (queryExpl.getValue() == 1.0f)
  return fieldExpl;
return result;
}
{noformat}
Can easily be tweaked and become:
{noformat}
if (queryExpl.getValue() == 1.0f) {
  return fieldExpl;
}
result.addDetail(fieldExpl);
// combine them
result.setValue(queryExpl.getValue() * fieldExpl.getValue());
return result;
}
{noformat}
And that's really just for a start... Itamar. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
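Mike's point about looped next() versus skipTo() can be illustrated with a toy model. None of this is Lucene code: the array-backed "posting list" below is purely hypothetical and just counts advance steps, and skipTo() is modeled as a binary search (real Lucene skip lists differ in detail, but the asymptotics are similar).

```java
public class SkipDemo {
    // Looped next(): advance one posting at a time until we reach target.
    static int linearSteps(int[] docs, int target) {
        int steps = 0;
        for (int d : docs) {
            steps++;
            if (d >= target) break;
        }
        return steps;
    }

    // skipTo(): modeled as binary search over the sorted doc ids,
    // so the step count is logarithmic rather than linear.
    static int skipSteps(int[] docs, int target) {
        int lo = 0, hi = docs.length - 1, steps = 0;
        while (lo < hi) {
            steps++;
            int mid = (lo + hi) / 2;
            if (docs[mid] < target) lo = mid + 1; else hi = mid;
        }
        return Math.max(steps, 1);
    }

    public static void main(String[] args) {
        // 1000 postings with even doc ids; advance to doc 1500.
        int[] docs = new int[1000];
        for (int i = 0; i < docs.length; i++) docs[i] = i * 2;
        System.out.println(linearSteps(docs, 1500) + " vs " + skipSteps(docs, 1500));
    }
}
```

For a single explain() call on a large index, the looped-next() cost is paid once per query term, which is why it shows up as painfully slow.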
Re: Release of SOLR 1.3
On 20-May-08, at 12:32 PM, Shalin Shekhar Mangar wrote: +1 for your suggestions Mike. I'd like to see a few of the smaller issues get committed in 1.3 such as SOLR-256 (JMX), SOLR-536 (binding for SolrJ), SOLR-430 (SpellChecker support in SolrJ) etc. Also, SOLR-561 (replication by Solr) would be really cool to have in the next release. Noble and I are working on it and plan to give a patch soon. Whether something makes it in to this release will depend mostly on getting the buy-in and time commitment from one of the committers familiar with that aspect of the project. There is so much in 1.3 as it is that I think our focus should be on getting it out sooner rather than adding things. But small things that significantly improve the release are good too. SOLR-561 seems like a rather large project to me (although I have never even used the existing collection distribution method). Mike -- you removed SOLR-563 (Contrib area for Solr) from 1.3 but it is a dependency for SOLR-469 (DataImportHandler) as it was decided to have DataImportHandler as a contrib project. It would also be good to have a rough release roadmap to work against. Can a fixed release cycle (say every 6 months) work for Solr? Twice-yearly releases would be nice to aim for, but I think we're too small a project to fix release dates in advance. -Mike
Re: Release of SOLR 1.3
On 22-May-08, at 12:13 AM, Andrew Savory wrote: Sure, Commit-Then-Review vs. Review-Then-Commit ... but I don't actually think RTC is going to ensure significantly more widespread review, given the time burden on other developers to find the issue in JIRA, download the patch, apply the patch, test, respond, then revert the change. Do people really have the time to do that? It's significantly more effort than that to svn update, look at code, and feed back. I prefer detailed discussion on the mailing list (which supports decent threading, quoting etc, unlike JIRA) followed by commit of a trial implementation which can then be refactored. Otherwise there might be a tendency to analysis paralysis. But I'm the new boy here, so I'll STFU and try to help out on the release instead of forcing y'all to rehash old discussions on how to run an open source project ;-) Maybe by the time 1.3 is out the door we'll all be using distributed SCM systems and the discussion will be moot anyway! I think we agree in principle--a patch does not have to be spotless to be committed. I also agree that the mailing list is a preferable place to hash out design details. But it is necessary that the basic approach is one we feel we will stick with before it gets committed. I don't think this imposes much of a burden on people aiming to review a patch. It is true that using patches takes an extra minute or two to set up, but the time to evaluate a contribution is _by far_ mostly contained in understanding the contribution, its implications, and examining the code. Plus, the patch is much easier to back out of a given repository and makes it easier to see exactly what changes were made. Since contributors can't commit to the repository anyway, I don't see much disadvantage in working with patches.
(btw, if you want a one-line equivalent to svn up, try something like:
$ wget http://issues.apache.org/jira/secure/attachment/12381498/SOLR-563.patch -O - | patch -p0
Reverting is also one line:
$ svn revert -R .
Although this leaves added files, which can be removed with
$ svn st | grep '?' | awk '{print $2}' | xargs rm
Another useful trick is to have multiple checkouts of trunk and bounce an active changeset from one to another with
$ svn diff | (cd ../otherbranch; patch -p0)
)
-Mike
[important] call for 1.3 planning
There seems to be some sort of consensus building that there should be a 1.3 release in the near future. The first step is to figure out what we want to finish before it gets released. The list of JIRA issues currently labeled 1.3 can be found here: http://issues.apache.org/jira/secure/IssueNavigator.jspa?reset=true&mode=hide&sorter/order=DESC&sorter/field=priority&resolution=-1&pid=12310230&fixfor=12312486 Let's try to get an assignee for every issue in that list by a week from now. If nobody steps up for an issue in that time, I'll assume it is low enough priority to move post-1.3. This would also be a good time to add any issues that you want to champion for 1.3. (This isn't meant to be a final list, just something to help get us started. Most of the unassigned issues were reported by committers, so that should hopefully make it easy to figure out the assignee.) -Mike
[jira] Updated: (SOLR-556) Highlighting of multi-valued fields returns snippets which span multiple different values
[ https://issues.apache.org/jira/browse/SOLR-556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-556: Fix Version/s: 1.3 Highlighting of multi-valued fields returns snippets which span multiple different values - Key: SOLR-556 URL: https://issues.apache.org/jira/browse/SOLR-556 Project: Solr Issue Type: Bug Components: highlighter Affects Versions: 1.3 Environment: Tomcat 5.5 Reporter: Lars Kotthoff Assignee: Mike Klaas Priority: Minor Fix For: 1.3 Attachments: solr-highlight-multivalued-example.xml, solr-highlight-multivalued.patch When highlighting multi-valued fields, the highlighter sometimes returns snippets which span multiple values, e.g. with values foo and bar and search term ba the highlighter will create the snippet foo<em>ba</em>r. Furthermore it sometimes returns smaller snippets than it should, e.g. with value foobar and search term oo it will create the snippet <em>oo</em> regardless of hl.fragsize. I have been unable to determine the real cause for this, or indeed what actually goes on at all. To reproduce the problem, I've used the following steps:
* create an index with multi-valued fields, one document should have at least 3 values for these fields (in my case strings of length between 5 and 15 Japanese characters -- as far as I can tell plain old ASCII should produce the same effect though)
* search for part of a value in such a field with highlighting enabled, the additional parameters I use are hl.fragsize=70, hl.requireFieldMatch=true, hl.mergeContiguous=true (changing the parameters does not seem to have any effect on the result though)
* highlighted snippets should show effects described above
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-536) Automatic binding of results to Beans (for solrj)
[ https://issues.apache.org/jira/browse/SOLR-536?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-536: Fix Version/s: (was: 1.3) Automatic binding of results to Beans (for solrj) - Key: SOLR-536 URL: https://issues.apache.org/jira/browse/SOLR-536 Project: Solr Issue Type: New Feature Components: clients - java Affects Versions: 1.3 Reporter: Noble Paul Priority: Minor Attachments: SOLR-536.patch As we are using Java 5, we can use annotations to bind SolrDocument to Java beans directly. This can make the usage of solrj a bit simpler. The QueryResponse class in solrj can have an extra method as follows: public <T> List<T> getResultBeans(Class<T> klass) and the bean can have annotations as:
class MyBean {
  @Field("id") // name is optional
  String id;
  @Field("category")
  List<String> categories;
}
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
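One way the proposed annotation binding might work under the hood is plain reflection. The following is a hypothetical sketch based only on the issue text, not the attached patch: the annotation is named @SolrField here (the issue proposes @Field, which would clash with java.lang.reflect.Field in a self-contained snippet), and the document is modeled as a Map rather than a SolrDocument.

```java
import java.lang.annotation.*;
import java.lang.reflect.Field;
import java.util.*;

// Hypothetical stand-in for the proposed @Field annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface SolrField {
    String value() default "";  // Solr field name; defaults to the Java field name
}

class MyBean {
    @SolrField("id") String id;
    @SolrField("category") List<String> categories;
}

public class BeanBinder {
    // Copy values from a document (modeled as a Map) onto an annotated bean.
    static <T> T bind(Map<String, Object> doc, Class<T> klass) throws Exception {
        T bean = klass.getDeclaredConstructor().newInstance();
        for (Field f : klass.getDeclaredFields()) {
            SolrField ann = f.getAnnotation(SolrField.class);
            if (ann == null) continue;  // unannotated fields are skipped
            String name = ann.value().isEmpty() ? f.getName() : ann.value();
            Object value = doc.get(name);
            if (value != null) {
                f.setAccessible(true);
                f.set(bean, value);
            }
        }
        return bean;
    }

    public static void main(String[] args) throws Exception {
        Map<String, Object> doc = new HashMap<>();
        doc.put("id", "doc-1");
        doc.put("category", Arrays.asList("books", "tech"));
        MyBean b = bind(doc, MyBean.class);
        System.out.println(b.id + " " + b.categories);
    }
}
```

A real implementation would also need type conversion (e.g. single value vs. multi-valued fields), but the reflection walk above is the core of the idea.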
[jira] Updated: (SOLR-579) Extend SimplePost with RecurseDirectories, threads, document encoding , number of docs per commit
[ https://issues.apache.org/jira/browse/SOLR-579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-579: Fix Version/s: (was: 1.3) Extend SimplePost with RecurseDirectories, threads, document encoding, number of docs per commit - Key: SOLR-579 URL: https://issues.apache.org/jira/browse/SOLR-579 Project: Solr Issue Type: New Feature Affects Versions: 1.3 Environment: Applies to all platforms Reporter: Patrick Debois Priority: Minor Original Estimate: 72h Remaining Estimate: 72h When specifying a directory, simplepost should also read the contents of the directory. New options for the commandline (some only useful in DATAMODE=files):
-RECURSEDIRS Recursive read of directories as an option; this is useful for directories with a lot of files where the commandline expansion fails and xargs is too slow
-DOCENCODING (default = system encoding or UTF-8) For non-UTF-8 clients, simplepost should include a way to set the encoding of the documents posted
-THREADSIZE (default = 1) For large volume posts, a threading pool makes sense, using the JDK 1.5 Threadpool model
-DOCSPERCOMMIT (default = 1) Number of documents after which a commit is done, instead of only at the end
Note: this should not break the existing behaviour of the existing SimplePost tool (post.sh), which might be used in scripts. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Resolved: (SOLR-383) Add support for globalization/culture management
[ https://issues.apache.org/jira/browse/SOLR-383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas resolved SOLR-383. - Resolution: Fixed Fix Version/s: (was: 1.3) Add support for globalization/culture management Key: SOLR-383 URL: https://issues.apache.org/jira/browse/SOLR-383 Project: Solr Issue Type: Improvement Components: clients - C# Affects Versions: 1.3 Reporter: Jeff Rodenburg Assignee: Jeff Rodenburg Priority: Minor SolrSharp should supply configuration and/or programmatic control over windows culture settings. This is important for working with data being saved to indexes that carry certain formatting expectations for various types of fields, both in SolrSharp as well as the solr field counterparts on the server side. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-563) Contrib area for Solr
[ https://issues.apache.org/jira/browse/SOLR-563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-563: Fix Version/s: (was: 1.3) Contrib area for Solr - Key: SOLR-563 URL: https://issues.apache.org/jira/browse/SOLR-563 Project: Solr Issue Type: Task Affects Versions: 1.3 Reporter: Shalin Shekhar Mangar Attachments: SOLR-563.patch Add a contrib area for Solr and modify existing build.xml to build, package and distribute contrib projects also. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-565) Component to abstract shards from clients
[ https://issues.apache.org/jira/browse/SOLR-565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-565: Fix Version/s: (was: 1.3) Component to abstract shards from clients - Key: SOLR-565 URL: https://issues.apache.org/jira/browse/SOLR-565 Project: Solr Issue Type: New Feature Components: search Affects Versions: 1.3 Reporter: patrick o'leary Priority: Minor Attachments: distributor_component.patch A component that will remove the need for calling clients to provide the shards parameter for a distributed search. As systems grow, it's better to manage shards with in solr, rather than managing each client. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
[jira] Updated: (SOLR-551) Solr replication should include the schema also
[ https://issues.apache.org/jira/browse/SOLR-551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mike Klaas updated SOLR-551: Fix Version/s: (was: 1.3) Solr replication should include the schema also --- Key: SOLR-551 URL: https://issues.apache.org/jira/browse/SOLR-551 Project: Solr Issue Type: Improvement Components: replication Affects Versions: 1.3 Reporter: Noble Paul The current Solr replication just copies the data directory. So if the schema changes and I do a re-index, it will blissfully copy the index and the slaves will fail because of an incompatible schema. So the steps we follow are:
* Stop rsync on slaves
* Update the master with the new schema
* Re-index data
* For each slave:
** Kill the slave
** Clean the data directory
** Install the new schema
** Restart
** Do a manual snappull
The amount of work the admin needs to do is quite significant (depending on the number of slaves). These are manual steps and very error prone. The solution: make the replication mechanism handle the schema replication also, so all I need to do is change the master and the slaves sync automatically. What is a good way to implement this? We have an idea along the following lines. This should involve changes to the snapshooter and snappuller scripts and the snapinstaller components. Every time the snapshooter takes a snapshot it must keep the timestamps of schema.xml and elevate.xml (all the files which might affect the runtime behavior in slaves). For subsequent snapshots, if the timestamp of any of them has changed it must copy all of them for replication as well. The snappuller copies the new directory as usual. The snapinstaller checks if these config files are present; if yes:
* It can create a temporary core
* Install the changed index and configuration
* Load it completely and swap it with the original core
-- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.