[jira] [Commented] (LUCENENET-481) Port Contrib.MemoryIndex
[ https://issues.apache.org/jira/browse/LUCENENET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238661#comment-13238661 ] Christopher Currens commented on LUCENENET-481: --- If you're talking about the termComparator, that wasn't made generic until 3.1. The comparator in 3.0.3 can't be ported the way it is anyway because of Java's type system, but I just want to make sure you're porting 3.0.3 to keep everything in line with the rest of the .NET versions. You'll find that the 3.x version in Java uses a few other additions to the main Lucene library that aren't yet available in 3.0.3. This problem should be easily solved without reflection. The comparator used basically requires that it be a {{KeyValuePair<TKey, TValue>}}, or more specifically, a {{KeyValuePair<string, TValue>}}. There are actually only 2 different types that use that termComparator: {{KeyValuePair<string, ArrayIntList>[]}} and {{KeyValuePair<string, Info>[]}}. An exception to that is the {{private static sort(Dictionary<K,V>)}} method, but that can be solved with a static method, a type constraint (which is already implied in the Java version) and some type inference (as a nicety). I had ported most of this at one point (somewhere on my home computer), and if memory serves me correctly, I think this is how I solved this problem. You can use this if you want:

{code}
using System;
using System.Collections.Generic;

class KvpComparer
{
    // Static helper, usable as a Comparison<KeyValuePair<TKey, TValue>> delegate.
    public static int Compare<TKey, TValue>(KeyValuePair<TKey, TValue> x, KeyValuePair<TKey, TValue> y)
        where TKey : IComparable<TKey>
    {
        if (x.Equals(y)) return 0;
        return x.Key.CompareTo(y.Key);
    }
}

sealed class KvpComparer<T> : KvpComparer, IComparer<KeyValuePair<string, T>>
{
    public int Compare(KeyValuePair<string, T> x, KeyValuePair<string, T> y)
    {
        return KvpComparer.Compare(x, y); // delegate to the static generic overload
    }
}
{code}

You can create the two instances you need for the {{<string, Info>}} and {{<string, ArrayIntList>}} types. For the {{Map.Entry<K,V>[] sort(HashMap<K,V> map)}} method, constrain {{K}} to {{IComparable<K>}}, and then you can use it like {{Array.Sort(entries, KvpComparer.Compare)}}, which is nice because it's one less object (or more) you need to create for each type passed into sort. Alternatively, since the {{sort}} method is private and only uses those two types, you can just change the signature and pass in one of the comparers instead, removing the base class from the equation. Port Contrib.MemoryIndex Key: LUCENENET-481 URL: https://issues.apache.org/jira/browse/LUCENENET-481 Project: Lucene.Net Issue Type: New Feature Affects Versions: Lucene.Net 3.0.3 Reporter: Christopher Currens We need to port MemoryIndex from contrib, if we want to be able to port a few other contrib libraries. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: Indexing Boolean Expressions
Hello Joaquin, I looked through the paper several times, and see no problem implementing it in Lucene (the trivial case at least): let's index a conjunctive condition as {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}, then form a query from the incoming fact (event): fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD. To enforce overlap between condition and event, wrap the query above into our own query whose scorer will check that numClauses for the matched doc is equal to the number of matched clauses. To get numClauses for the matched doc you can use FieldCache, which is damn fast; and the number of matched clauses can be obtained from DisjunctionSumScorer.nrMatchers() (a sketch follows this thread). Negative clauses and multivalue can be covered also, I believe. WDYT? On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado joaquin.delg...@gmail.com wrote: I looked at LUCENE-2987 and its work on the query side (changes to the accepted syntax to accept lower case 'or' and 'and'), which isn't really related to my proposal. What I'm proposing is to be able to index complex boolean expressions using Lucene. This can be viewed as the opposite of the regular search task. The objective here is to find the set of relevant queries given a document (an assignment of values to fields). This by itself may not sound that interesting, but it's a key piece to efficiently implementing any MATCHING system, which is effectively a two-way search where constraints are defined both ways. An example of this would be: 1) Job matching: Potential employers define their job postings as documents along with complex boolean expressions used to narrow potential candidates. Job searchers upload their profiles and may formulate complex queries when executing a search. Once a search is initiated from either side, constraints need to be satisfied both ways. 2) Advertising: Publishers define constraints on the type of advertisers/ads they are willing to show on their sites. On the other hand, advertisers define constraints (typically at the campaign level) on the publisher sites they want their ads to show on, as well as on the user audiences they are targeting. While some attribute values are known at definition time, others are only instantiated once the user visits a given page, which triggers a matching request that must be satisfied in a few milliseconds to select valid ads, which are then scored based on relevance. So in a matching system a MATCH QUERY is considered to be a tuple that consists of a value assignment to attributes/fields (doc) + a boolean expression (query), and it goes against a double index built on tuples that simultaneously hold boolean expressions and their associated documents. To do this efficiently we need to be able to build indexes on Boolean expressions (Lucene queries) and retrieve the set of matching expressions given a doc (typically a few attributes with values assigned), which is the core of what is described in the paper Indexing Boolean Expressions (see http://www.vldb.org/pvldb/2/vldb09-83.pdf). -- J On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera calcmaste...@gmail.com wrote: On 02/21/2012 12:15 PM, Aayush Kothari wrote: So if Aayush Kothari is interested in working on this as a Student, all we need is a formal mentor (I can be the informal one). Anyone up for the task? Completely interested in working for and learning about the aforementioned subject/project. +1. 
This may be related to the work I'm doing with LUCENE-2987, basically changing the grammar to accept conjunctions AND and OR in the query text. I would be interested in working with you on some of the details. However, I too am not a formal committer. -- Joe Cabrera eminorlabs.com -- Sincerely yours Mikhail Khludnev Lucid Certified Apache Lucene/Solr Developer Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
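A minimal Lucene sketch of the conjunction-counting idea Mikhail describes above, written against the 3.x API. The field names and values are illustrative assumptions, and the overlap-enforcing wrapper query is only indicated in comments, since its scorer (comparing DisjunctionSumScorer.nrMatchers() against the FieldCache'd numClauses) is the part that would need a real implementation:

{code}
import java.io.IOException;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.NumericField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause.Occur;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

class ConditionMatchingSketch {
  /** Index the conjunctive condition {fieldA:valA, fieldB:valB, fieldC:valC},
      recording how many clauses the conjunction has. */
  static void indexCondition(IndexWriter writer) throws IOException {
    Document condition = new Document();
    condition.add(new Field("fieldA", "valA", Field.Store.NO, Field.Index.NOT_ANALYZED));
    condition.add(new Field("fieldB", "valB", Field.Store.NO, Field.Index.NOT_ANALYZED));
    condition.add(new Field("fieldC", "valC", Field.Store.NO, Field.Index.NOT_ANALYZED));
    condition.add(new NumericField("numClauses").setIntValue(3));
    writer.addDocument(condition);
  }

  /** Turn an incoming event into a disjunction over its attribute/value pairs. */
  static Query eventQuery() {
    BooleanQuery q = new BooleanQuery();
    q.add(new TermQuery(new Term("fieldA", "valA")), Occur.SHOULD);
    q.add(new TermQuery(new Term("fieldB", "valB")), Occur.SHOULD);
    q.add(new TermQuery(new Term("fieldD", "valD")), Occur.SHOULD);
    // To enforce full overlap, wrap q in a custom query whose scorer accepts a
    // doc only when the number of matched clauses (DisjunctionSumScorer.nrMatchers())
    // equals FieldCache.DEFAULT.getInts(reader, "numClauses")[doc].
    return q;
  }
}
{code}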
[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi updated LUCENE-3888: --- Fix Version/s: (was: 3.6) Thanks Robert for the patches and comments. {quote} The only option for 3.6 would be something like my previous patch (https://issues.apache.org/jira/secure/attachment/12519860/LUCENE-3888.patch) which has the disadvantages of doing the second-phase re-ranking on surface forms. {quote} Given those disadvantages, the spell checker won't work well for Japanese anyway, so I'm giving this up for 3.6. split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Java Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 4.0 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch The "did you mean?" feature using Lucene's spell checker unfortunately doesn't work well in a Japanese environment, and this is a longstanding problem: the logic needs comparatively long text to check spelling, but in some languages (e.g. Japanese) most words are too short for the spell checker. I think, for at least Japanese, things can be improved if we split off the spell check word and the surface form in the spell check dictionary. Then we could use ReadingAttribute for spell checking but CharTermAttribute for suggesting, for example. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3174) Visualize Cluster State
[ https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3174: Attachment: SOLR-3174.patch Visualize Cluster State --- Key: SOLR-3174 URL: https://issues.apache.org/jira/browse/SOLR-3174 Project: Solr Issue Type: New Feature Components: web gui Reporter: Ryan McKinley Attachments: SOLR-3174-graph.png, SOLR-3174-rgraph.png, SOLR-3174.patch, SOLR-3174.patch It would be great to visualize the cluster state in the new UI. See Mark's wish: https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3174) Visualize Cluster State
[ https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Matheis (steffkes) updated SOLR-3174: Attachment: SOLR-3174-rgraph.png SOLR-3174-graph.png Visualize Cluster State --- Key: SOLR-3174 URL: https://issues.apache.org/jira/browse/SOLR-3174 Project: Solr Issue Type: New Feature Components: web gui Reporter: Ryan McKinley Attachments: SOLR-3174-graph.png, SOLR-3174-graph.png, SOLR-3174-rgraph.png, SOLR-3174-rgraph.png, SOLR-3174.patch, SOLR-3174.patch It would be great to visualize the cluster state in the new UI. See Mark's wish: https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
Add decompose compound Japanese Katakana token capability to Kuromoji - Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Environment: CentOS 5, IPA Dictionary Reporter: Kazuaki Hiraga The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ don't decompose into トート バッグ and ショルダー バッグ although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature on every Katakana token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kazuaki Hiraga updated LUCENE-3921: --- Environment: CentOS 5, IPA Dictionary, Run with Search mode (was: CentOS 5, IPA Dictionary) Add decompose compound Japanese Katakana token capability to Kuromoji - Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ don't decompose into トート バッグ and ショルダー バッグ although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature on every Katakana token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3174) Visualize Cluster State
[ https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238199#comment-13238199 ] Stefan Matheis (steffkes) commented on SOLR-3174: - Updated the patch and the screenshots. The radial view is now working as expected. Also improved the displayed hostname: if all hosts have the same protocol, it's skipped - same for ports and directories. Visualize Cluster State --- Key: SOLR-3174 URL: https://issues.apache.org/jira/browse/SOLR-3174 Project: Solr Issue Type: New Feature Components: web gui Reporter: Ryan McKinley Attachments: SOLR-3174-graph.png, SOLR-3174-graph.png, SOLR-3174-rgraph.png, SOLR-3174-rgraph.png, SOLR-3174.patch, SOLR-3174.patch It would be great to visualize the cluster state in the new UI. See Mark's wish: https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238203#comment-13238203 ] Dawid Weiss commented on LUCENE-3867: - For the historical record: the previous implementation of RamUsageEstimator was off by anything between 3% (random-size objects, including arrays) and 20% (objects smaller than 80 bytes). Again -- these are perfect-scenario measurements with an empty heap and max allocation until OOM, with a serial GC. With concurrent and parallel GCs the memory consumption estimation is still accurate, but it's nearly impossible to tell when an OOM will occur or how the GC will manage the heap space. RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods? It's not perfect, there's some room for improvement I'm sure, here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for array alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238206#comment-13238206 ] Uwe Schindler commented on LUCENE-3867: --- That's true. But you can still get the unreleasable allocation, i.e. the size of the non-gc-able object graph. If the GC does not free the objects fast enough after release, it will still do it once memory gets low. But the allocated objects with hard refs are not releasable. So I think it's fine for memory-requirement purposes. If you want real heap allocation, you must use instrumentation. RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods? It's not perfect, there's some room for improvement I'm sure, here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for array alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
[ https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238208#comment-13238208 ] Dawid Weiss commented on LUCENE-3867: - I didn't say it's wrong -- it is fine and accurate. What I'm saying is that it's not really suitable for predictions, for answering questions like: how many objects of a given type/types can I allocate before an OOM hits me? It doesn't really surprise me that much, but it would be nice. For measuring already allocated stuff it's more than fine, of course. RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect -- Key: LUCENE-3867 URL: https://issues.apache.org/jira/browse/LUCENE-3867 Project: Lucene - Java Issue Type: Bug Components: core/index Reporter: Shai Erera Assignee: Uwe Schindler Priority: Trivial Fix For: 3.6, 4.0 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The NUM_BYTES_OBJECT_REF part should not be included, at least not according to this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml {quote} A single-dimension array is a single object. As expected, the array has the usual object header. However, this object header is 12 bytes to accommodate a four-byte array length. Then comes the actual array data which, as you might expect, consists of the number of elements multiplied by the number of bytes required for one element, depending on its type. The memory usage for one element is 4 bytes for an object reference ... {quote} While at it, I wrote a sizeOf(String) impl, and I wonder how people feel about including such helper methods in RUE, as static, stateless methods? It's not perfect, there's some room for improvement I'm sure, here it is:

{code}
/**
 * Computes the approximate size of a String object. Note that if this object
 * is also referenced by another object, you should add
 * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this
 * method.
 */
public static int sizeOf(String str) {
  return 2 * str.length() + 6 // chars + additional safeness for array alignment
      + 3 * RamUsageEstimator.NUM_BYTES_INT // String maintains 3 integers
      + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER // char[] array
      + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object
}
{code}

If people are not against it, I'd like to also add sizeOf(int[] / byte[] / long[] / double[] ... and String[]). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
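Regarding the proposed sizeOf(int[] / byte[] / ...) helpers in the description above, a minimal sketch of what one such overload might look like; the 8-byte object alignment is an assumption (typical of HotSpot), not something the issue specifies:

{code}
/** Approximate heap size of an int[], assuming 8-byte object alignment. */
public static long sizeOf(int[] arr) {
  long size = RamUsageEstimator.NUM_BYTES_ARRAY_HEADER
      + (long) RamUsageEstimator.NUM_BYTES_INT * arr.length;
  return (size + 7) & ~7L; // round up to the assumed 8-byte boundary
}
{code}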
[jira] [Updated] (SOLR-3272) Solr filter factory for MorfologikFilter
[ https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafał Kuć updated SOLR-3272: Attachment: SOLR-3272.patch Patch with MorfologikFilterFactory and a test added. Solr filter factory for MorfologikFilter Key: SOLR-3272 URL: https://issues.apache.org/jira/browse/SOLR-3272 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 4.0 Reporter: Rafał Kuć Fix For: 4.0 Attachments: SOLR-3272.patch I didn't find a MorfologikFilter factory in Solr, so here is a simple one. Maybe someone will make use of it :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3272) Solr filter factory for MorfologikFilter
Solr filter factory for MorfologikFilter Key: SOLR-3272 URL: https://issues.apache.org/jira/browse/SOLR-3272 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 4.0 Reporter: Rafał Kuć Fix For: 4.0 Attachments: SOLR-3272.patch I didn't find a MorfologikFilter factory in Solr, so here is a simple one. Maybe someone will make use of it :) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-3076) Solr should support block joins
[ https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238239#comment-13238239 ] Michael McCandless commented on SOLR-3076: -- {quote} 2. Do you agree with the overall approach to deliver a straightforward QP with explicit joining syntax? Or do you object and insist on the entity-relationship-schema approach? 3. What is the level of uncertainty you have about the current QP syntax? What's your main concern and what's the way to improve it? {quote} Well, stepping back, my concern is still that I don't think there should be any QP syntax to express block joins. These are joins determined at indexing time, and compiled into the index, and so the only remaining query-time freedom is which fields you want to search against (something QP can already understand, ie field:text syntax). From that field list the required joins are implied. I can't imagine users learning/typing the sort of syntax we are discussing here. It's true there are exceptional cases (Hoss's size field that's on both parent and child docs), but that's the exception, not the rule; I don't think we should design things (APIs, QP syntax) around exceptional cases. And I think such an exception should be handled by some sort of field aliasing (book_page_count vs chapter_page_count). For query-time join, which is fully flexible, I agree the QP must (and already does) include join syntax, ie be more like SQL, where you can express arbitrary on-the-fly joins. But, at the same time, the 'users' of Solr's QP syntax may not be the end user, ie, the app's front end may very well construct these complex join expressions, and so it's really the developers of that search app writing these join queries. So perhaps it's fine to add crazy-expert syntax that end users would rarely use but search app developers might...? All this being said, I defer to Hoss (and other committers more experienced w/ Solr QP issues) here... if they all feel this added QP syntax makes sense then let's do it! Solr should support block joins --- Key: SOLR-3076 URL: https://issues.apache.org/jira/browse/SOLR-3076 Project: Solr Issue Type: New Feature Reporter: Grant Ingersoll Attachments: SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, bjq-vs-filters-backward-disi.patch, bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, parent-bjq-qparser.patch, parent-bjq-qparser.patch, solrconf-bjq-erschema-snippet.xml, tochild-bjq-filtered-search-fix.patch Lucene has the ability to do block joins; we should add it to Solr. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
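For reference, a sketch of the Lucene-level block join Solr would be wrapping here, against the 3.6-era API of the org.apache.lucene.search.join module; the docType field/values are illustrative, and the exact location of ScoreMode differs slightly between 3.x and trunk:

{code}
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.CachingWrapperFilter;
import org.apache.lucene.search.Filter;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.QueryWrapperFilter;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.join.ToParentBlockJoinQuery;

class BlockJoinSketch {
  /** Parent and children are compiled into the index as one block; the parent goes last. */
  static void indexBook(IndexWriter writer) throws Exception {
    List<Document> block = new ArrayList<Document>();
    Document chapter = new Document();
    chapter.add(new Field("chapter_title", "Intro", Field.Store.YES, Field.Index.ANALYZED));
    block.add(chapter);
    Document book = new Document();
    book.add(new Field("docType", "book", Field.Store.NO, Field.Index.NOT_ANALYZED));
    book.add(new Field("book_title", "Lucene", Field.Store.YES, Field.Index.ANALYZED));
    block.add(book); // parent document last
    writer.addDocuments(block);
  }

  /** Match chapters, then join each hit up to its parent book. */
  static Query chaptersToBooks(Query childQuery) {
    Filter parents = new CachingWrapperFilter(
        new QueryWrapperFilter(new TermQuery(new Term("docType", "book"))));
    return new ToParentBlockJoinQuery(childQuery, parents,
        ToParentBlockJoinQuery.ScoreMode.Avg);
  }
}
{code}

Because the join is fixed at indexing time, the only real query-time choice is which child/parent fields to search, which is Mike's point about the field list implying the joins.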
[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238253#comment-13238253 ] Christian Moen commented on LUCENE-3921: Hello, Kazu. Long time no see -- I hope things are well! This is a very good feature request. I think this is possible by changing how we emit unknown words, i.e. by not emitting them as greedily and giving the lattice more segmentation options. For example, if we find an unknown word トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position. When we reach the position that starts with バッグ, we'll find a known word, and when the Viterbi runs, it's likely to choose トート and バッグ as the best path. Let me have a look at this by looking into the lattice details. Add decompose compound Japanese Katakana token capability to Kuromoji - Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ don't decompose into トート バッグ and ショルダー バッグ although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature on every Katakana token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238253#comment-13238253 ] Christian Moen edited comment on LUCENE-3921 at 3/26/12 10:44 AM: -- Hello, Kazu. Long time no see -- I hope things are well! This is a very good feature request. I think this is possible by changing how we emit unknown words, i.e. by not emitting them as greedily and giving the lattice more segmentation options. For example, if we find an unknown word トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position. When we reach the position that starts with バッグ, we'll find a known word, and when the Viterbi runs, it's likely to choose トート and バッグ as the best path. Let me have a play by looking into the lattice details and see if something like this is feasible. was (Author: cm): Hello, Kazu. Long time no see -- I hope things are well! This is a very good feature request. I think this is possible by changing how we emit unknown words, i.e. by not emitting them as greedily and giving the lattice more segmentation options. For example, if we find an unknown word トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position. When we reach the position that starts with バッグ, we'll find a known word, and when the Viterbi runs, it's likely to choose トート and バッグ as the best path. Let me have a look at this by looking into the lattice details. Add decompose compound Japanese Katakana token capability to Kuromoji - Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ don't decompose into トート バッグ and ショルダー バッグ although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature on every Katakana token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Issue Comment Edited] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238253#comment-13238253 ] Christian Moen edited comment on LUCENE-3921 at 3/26/12 10:57 AM: -- Hello, Kazu. Long time no see -- I hope things are well! This is a very good feature request. I think this might be possible by changing how we emit unknown words, i.e. by not emitting them as greedily and giving the lattice more segmentation options. For example, if we find an unknown word トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position. When we reach the position that starts with バッグ we'll find a known word. When the Viterbi runs, it's likely to choose トート and バッグ as its best path. Let me have a play by looking into the lattice details and see if something like this is feasible. We are sort of hacking the model here so we also need to consider side-effects. was (Author: cm): Hello, Kazu. Long time no see -- I hope things are well! This is a very good feature request. I think this is possible by changing how we emit unknown words, i.e. by not emitting them as greedily and giving the lattice more segmentation options. For example, if we find an unknown word トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position. When we reach the position that starts with バッグ, we'll find a known word, and when the Viterbi runs, it's likely to choose トート and バッグ as the best path. Let me have a play by looking into the lattice details and see if something like this is feasible. Add decompose compound Japanese Katakana token capability to Kuromoji - Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ don't decompose into トート バッグ and ショルダー バッグ although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature on every Katakana token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
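To make the proposed change concrete, a rough sketch of the prefix-emission idea from the comments above; Lattice, Node and unknownWordCost are hypothetical stand-ins here, not Kuromoji's actual internals:

{code}
/** Instead of emitting only the greedy full unknown word, emit every prefix as a
    lattice candidate so the Viterbi search can pick a path that reuses known
    dictionary entries (e.g. choose トート + バッグ over トートバッグ). */
static void emitUnknownCandidates(Lattice lattice, String text, int pos, int unknownLen) {
  for (int end = pos + 1; end <= pos + unknownLen; end++) {
    String candidate = text.substring(pos, end); // ト, トー, トート, ..., トートバッグ
    lattice.addNode(new Node(candidate, pos, end, unknownWordCost(candidate)));
  }
}
{code}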
[jira] [Created] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
404 Not Found on action=PREPRECOVERY Key: SOLR-3273 URL: https://issues.apache.org/jira/browse/SOLR-3273 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: Any Reporter: Per Steffensen We have an application based on a recent copy of 4.0-SNAPSHOT. We have a performance test setup where we performance test our application (and therefore, indirectly, Solr(Cloud)). When we run the performance test against a setup using SolrCloud without replication, everything seems to run very nicely for days. When we add replication to the setup, the same performance test shows some problems, which we will report (and maybe help fix) in distinct issues here in jira. About the setup - the setup is a little more complex than described below, but I believe the description will tell enough: We have two solr servers which we start from solr-install/example using this command (ZooKeepers have been started before) - we first start solr on server1, and then start solr on server2 after solr on server1 has finished starting up:

<pre>
nohup java -Xmx4096m -Dcom.sun.management.jmxremote -DzkHost=server1:2181,server2:2181,server3:2181 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
</pre>

The ./myapp/solr.xml looks like this on server1:

<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
</pre>

The ./myapp/solr.xml looks like this on server2:

<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
</pre>

The first thing we observe is that Solr server1 (running collA_slice1_shard1) seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) is started up later, it quickly reports the following in its solr.log and keeps doing that for a long time:

<pre>
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Not Found request: http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
  at org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
  at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
</pre>

Please note that we have changed a little bit the way errors are logged, but basically this means that Solr server2 gets a 404 Not Found on its request http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2 to Solr server1. Seems like there is no common agreement among the Solr servers on how/where to send those requests and how/where to listen for them. Regards, Per Steffensen -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3909) Move Kuromoji to analysis.ja and introduce Japanese* naming
[ https://issues.apache.org/jira/browse/LUCENE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238267#comment-13238267 ] Christian Moen commented on LUCENE-3909: Committed revision 1305297 to {{trunk}}. Backporting to {{branch_3x}}. Move Kuromoji to analysis.ja and introduce Japanese* naming --- Key: LUCENE-3909 URL: https://issues.apache.org/jira/browse/LUCENE-3909 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Assignee: Christian Moen Lucene/Solr 3.6 and 4.0 will get out-of-the-box Japanese language support through {{KuromojiAnalyzer}}, {{KuromojiTokenizer}} and various other filters. These filters currently live in {{org.apache.lucene.analysis.kuromoji}}. I'm proposing that we move Kuromoji to a new Japanese package {{org.apache.lucene.analysis.ja}} in line with how other languages are organized. As part of this, I also think we should rename {{KuromojiAnalyzer}} to {{JapaneseAnalyzer}}, etc. to further align naming to our conventions by making it very clear that these analyzers are for Japanese. (As much as I like the name Kuromoji, I think Japanese is more fitting.) A potential issue I see with this that I'd like to raise and get feedback on, is that end-users in Japan and elsewhere who use lucene-gosen could have issues after an upgrade since lucene-gosen is in fact releasing its analyzers under the {{org.apache.lucene.analysis.ja}} namespace (and we'd have a name clash). I believe users should have the freedom to choose whichever Japanese analyzer, filter, etc. they'd like to use, and I don't want to propose a name change that just creates unnecessary problems for users, but I think the naming proposed above is most fitting for a Lucene/Solr release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238272#comment-13238272 ] Kazuaki Hiraga commented on LUCENE-3921: Hello, Christian. It's been a long time! We really want to have that capability. As you may know, it's hard to deal with tokens that consist of two or three Katakana tokens. We want a good way to solve this issue more systematically, rather than making a hand-made dictionary. Looking forward to hearing from you. Add decompose compound Japanese Katakana token capability to Kuromoji - Key: LUCENE-3921 URL: https://issues.apache.org/jira/browse/LUCENE-3921 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 4.0 Environment: CentOS 5, IPA Dictionary, Run with Search mode Reporter: Kazuaki Hiraga Labels: features The Japanese morphological analyzer Kuromoji doesn't have the capability to decompose every Japanese Katakana compound token into sub-tokens. It seems that some Katakana tokens can be decomposed, but this cannot be applied to every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ don't decompose into トート バッグ and ショルダー バッグ although the IPA dictionary has バッグ in its entries. I would like to apply the decompose feature to every Katakana token whose sub-tokens are in the dictionary, or add the capability to force the decompose feature on every Katakana token. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Per Steffensen updated SOLR-3273: - Description: We have an application based on a recent copy of 4.0-SNAPSHOT. We have a performance test setup where we performance test our application (and therefore, indirectly, Solr(Cloud)). When we run the performance test against a setup using SolrCloud without replication, everything seems to run very nicely for days. When we add replication to the setup, the same performance test shows some problems, which we will report (and maybe help fix) in distinct issues here in jira. About the setup - the setup is a little more complex than described below, but I believe the description will tell enough: We have two solr servers which we start from solr-install/example using this command (ZooKeepers have been started before) - we first start solr on server1, and then start solr on server2 after solr on server1 has finished starting up:

{code}
nohup java -Xmx4096m -Dcom.sun.management.jmxremote -DzkHost=server1:2181,server2:2181,server3:2181 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
{code}

The ./myapp/solr.xml looks like this on server1:

{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}

The ./myapp/solr.xml looks like this on server2:

{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}

The first thing we observe is that Solr server1 (running collA_slice1_shard1) seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) is started up later, it quickly reports the following in its solr.log and keeps doing that for a long time:

{code}
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Not Found request: http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
  at org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
  at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
  at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
  at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
  at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
{code}

Please note that we have changed a little bit the way errors are logged, but basically this means that Solr server2 gets a 404 Not Found on its request http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2 to Solr server1. Seems like there is no common agreement among the Solr servers on how/where to send those requests and how/where to listen for them. Regards, Per Steffensen

was: We have an application based on a recent copy of 4.0-SNAPSHOT. We have a performance test setup where we performance test our application (and therefore, indirectly, Solr(Cloud)). When we run the performance test against a setup using SolrCloud without replication, everything seems to run very nicely for days. When we add replication to the setup, the same performance test shows some problems, which we will report (and maybe help fix) in distinct issues here in jira. About the setup - the setup is a little more complex than described below, but I believe the description will tell enough: We have two solr servers which we start from solr-install/example using this command (ZooKeepers have been started before) - we first start solr on server1, and then start solr on server2 after solr on server1 has finished starting up:

<pre>
nohup java -Xmx4096m -Dcom.sun.management.jmxremote -DzkHost=server1:2181,server2:2181,server3:2181 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
</pre>

The ./myapp/solr.xml looks like this on
[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Attachment: LUCENE-3659.patch I played around a little bit and implemented the IOContext / filename dependent buffer sizes for RAMFiles. The code currently prints out lots of size information (like buffer sizes) on RAMDirectory.close(). This is just for debugging and to show what happens. To actually see real-world use cases, execute tests with ant test -Dtests.directory=RAMDirectory -Dtests.nightly=true Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes -- Key: LUCENE-3659 URL: https://issues.apache.org/jira/browse/LUCENE-3659 Project: Lucene - Java Issue Type: Task Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch, LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Attachment: LUCENE-3659.patch More improvements: - If you use new RAMDirectory(existingDir), the RAMFiles in the created RAMDirectory will have the original fileSize (if less than 1L << 30 bytes) as bufferSize, as we know the file size upfront. Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes -- Key: LUCENE-3659 URL: https://issues.apache.org/jira/browse/LUCENE-3659 Project: Lucene - Java Issue Type: Task Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
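For context, the constructor being tuned here is RAMDirectory's copy constructor, which loads an existing index into RAMFiles; a minimal usage sketch (the index path is illustrative, and per the javadoc warnings this issue adds, this is only sensible for small indexes):

{code}
import java.io.File;
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

class RamLoadSketch {
  static Directory loadIntoRam(File indexPath) throws IOException {
    Directory disk = FSDirectory.open(indexPath);
    try {
      // Copies each index file into a RAMFile; with the patch, the buffer size
      // of each RAMFile can be derived from the known file size up front.
      return new RAMDirectory(disk);
    } finally {
      disk.close();
    }
  }
}
{code}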
[jira] [Created] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
Add Japanese Kanji number normalization to Kuromoji --- Key: LUCENE-3922 URL: https://issues.apache.org/jira/browse/LUCENE-3922 Project: Lucene - Java Issue Type: New Feature Components: modules/analysis Affects Versions: 4.0 Reporter: Kazuaki Hiraga Japanese people use Kanji numerals instead of Arabic numerals for writing prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those Kanji numerals to Arabic numerals (I don't think we need the capability to normalize to Kanji numerals). -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
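A minimal sketch of the numeral conversion itself, independent of how it would hook into Kuromoji; it handles mixed Arabic/Kanji forms like 12万4800 and positional forms like 十二, while input validation and analysis-chain integration are left out:

{code}
import java.util.HashMap;
import java.util.Map;

class KanjiNumberNormalizer {
  private static final Map<Character, Integer> DIGITS = new HashMap<Character, Integer>();
  private static final Map<Character, Long> UNITS = new HashMap<Character, Long>();
  static {
    String d = "一二三四五六七八九";
    for (int i = 0; i < d.length(); i++) DIGITS.put(d.charAt(i), i + 1);
    UNITS.put('十', 10L); UNITS.put('百', 100L); UNITS.put('千', 1000L);
    UNITS.put('万', 10000L); UNITS.put('億', 100000000L);
  }

  /** Parses a well-formed numeral, e.g. "十二" -> 12, "12万4800" -> 124800. */
  static long parse(String s) {
    long total = 0;  // value of groups already closed by 万/億
    long group = 0;  // value accumulated from small units (十/百/千)
    long digits = 0; // plain digits read since the last unit character
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c >= '0' && c <= '9') {
        digits = digits * 10 + (c - '0');
      } else if (DIGITS.containsKey(c)) {
        digits = digits * 10 + DIGITS.get(c);
      } else {
        long unit = UNITS.get(c);   // assumes c is one of 十百千万億
        if (unit >= 10000L) {       // 万/億 close the whole current group
          total += (group + digits) * unit;
          group = 0;
        } else {                    // a bare 十/百/千 means 1x, e.g. 十 = 10
          group += (digits == 0 ? 1 : digits) * unit;
        }
        digits = 0;
      }
    }
    return total + group + digits;
  }
}
{code}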
[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238292#comment-13238292 ] Erick Erickson commented on SOLR-3273: -- Of course the people who actually know the code may make me look foolish, but why are you even turning on replication in a SolrCloud environment? As I understand it, all the replication etc. is done for you by virtue of the leaders automatically distributing the incoming updates to all replicas, so nothing useful is accomplished by turning on replication. If I'm on track, maybe the right solution is for the replication code to do the right thing when running in a SolrCloud configuration, which is to do nothing.

404 Not Found on action=PREPRECOVERY
Key: SOLR-3273 URL: https://issues.apache.org/jira/browse/SOLR-3273 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: Any Reporter: Per Steffensen

We have an application based on a recent copy of 4.0-SNAPSHOT. We have a performance test setup where we performance-test our application (and therefore indirectly Solr(Cloud)). When we run the performance test against a setup using SolrCloud without replication, everything seems to run very nicely for days. When we add replication to the setup, the same performance test shows some problems - which we will report (and maybe help fix) in distinct issues here in JIRA. About the setup - the setup is a little more complex than described below, but I believe the description will tell enough: We have two solr servers which we start from solr-install/example using this command (ZooKeepers have been started before) - we first start solr on server1, and then start solr on server2 after solr on server1 has finished starting up:

{code}
nohup java -Xmx4096m -Dcom.sun.management.jmxremote -DzkHost=server1:2181,server2:2181,server3:2181 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
{code}

The ./myapp/solr.xml looks like this on server1:

{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}

The ./myapp/solr.xml looks like this on server2:

{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}

The first thing we observe is that Solr server1 (running collA_slice1_shard1) seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) is started up later, it quickly reports the following in its solr.log and keeps doing that for a long time:

{code}
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Not Found
request: http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
	at org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
	at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
{code}

Please note that we have changed the way errors are logged a little bit, but basically this means that Solr server2 gets a 404 Not Found on its request http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2 to Solr server1. Seems like there is not a common agreement among the Solr servers on how/where to send those requests and how/where to listen for them. Regards, Per Steffensen
[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Attachment: LUCENE-3659.patch
[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Attachment: LUCENE-3659.patch
[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238303#comment-13238303 ] Christian Moen commented on LUCENE-3922: Thanks a lot, Kazu. This is a good idea to add. Patches are of course also very welcome! :)
[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Attachment: (was: LUCENE-3659.patch)
[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Attachment: (was: LUCENE-3659.patch)
[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238311#comment-13238311 ] Per Steffensen commented on SOLR-3273: -- Hi. Thanks for your reply. Correct me (too) if I'm wrong, but I believe SolrCloud does not do replication unless it is asked to. I believe you can turn replication on by setting numShards > 1 somewhere, or you can set it up more manually by making sure you have more cores defined with the same shard value (slice1 in my case) in the solr.xml files distributed on different solr instances - like we try to do. But I would really like to be corrected if anyone knows that I am doing something wrong. Regards, Per Steffensen
[jira] [Assigned] (SOLR-3272) Solr filter factory for MorfologikFilter
[ https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dawid Weiss reassigned SOLR-3272: - Assignee: Dawid Weiss

Solr filter factory for MorfologikFilter
Key: SOLR-3272 URL: https://issues.apache.org/jira/browse/SOLR-3272 Project: Solr Issue Type: New Feature Components: Schema and Analysis Affects Versions: 4.0 Reporter: Rafał Kuć Assignee: Dawid Weiss Fix For: 4.0 Attachments: SOLR-3272.patch

I didn't find a MorfologikFilter factory in Solr, so here is a simple one. Maybe someone will make use of it :)
[jira] [Commented] (SOLR-3272) Solr filter factory for MorfologikFilter
[ https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238327#comment-13238327 ] Dawid Weiss commented on SOLR-3272: --- Hi Michał. Could you modify this patch to include support for the three dictionaries (combined, morfeusz and morfologik)? This would be more flexible (and the combined dictionary is nearly twice as large as morfologik itself, so it's worth it).

{code} return new MorfologikFilter(ts, DICTIONARY.MORFOLOGIK, luceneMatchVersion); {code}

Also, an example of use in the JavaDoc would be nice (see BeiderMorseFilterFactory for an example). The test should use DEFAULT_VERSION, not the fixed LUCENE_40. Thanks!
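For reference, a rough sketch of what such a dictionary-selectable factory might look like. The init-arg name {{dictionary}}, the base class and the package layout are assumptions based on the Solr 4.0-era factory pattern, not the final patch:

{code}
import java.util.Locale;
import java.util.Map;
import morfologik.stemming.PolishStemmer.DICTIONARY;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.morfologik.MorfologikFilter;
import org.apache.solr.analysis.BaseTokenFilterFactory;

public class MorfologikFilterFactory extends BaseTokenFilterFactory {
  private DICTIONARY dictionary = DICTIONARY.MORFOLOGIK; // assumed default

  @Override
  public void init(Map<String, String> args) {
    super.init(args);
    // Hypothetical init arg: dictionary="combined" | "morfeusz" | "morfologik"
    String name = args.get("dictionary");
    if (name != null) {
      dictionary = DICTIONARY.valueOf(name.toUpperCase(Locale.ROOT));
    }
  }

  public TokenStream create(TokenStream ts) {
    return new MorfologikFilter(ts, dictionary, luceneMatchVersion);
  }
}
{code}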
[jira] [Issue Comment Edited] (SOLR-3272) Solr filter factory for MorfologikFilter
[ https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238327#comment-13238327 ] Dawid Weiss edited comment on SOLR-3272 at 3/26/12 12:17 PM: - Hi Rafał. Could you modify this patch to include support for the three dictionaries (combined, morfeusz and morfologik)? This would be more flexible (and the combined dictionary is nearly twice as large as morfologik itself, so it's worth it).

{code} return new MorfologikFilter(ts, DICTIONARY.MORFOLOGIK, luceneMatchVersion); {code}

Also, an example of use in the JavaDoc would be nice (see BeiderMorseFilterFactory for an example). The test should use DEFAULT_VERSION, not the fixed LUCENE_40. Thanks!

was (Author: dweiss): Hi Michał. (rest of the comment unchanged)
[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238329#comment-13238329 ] Koji Sekiguchi commented on LUCENE-3922: We, RONDHUIT, have done this kind of normalization (and more!). You may be interested in: http://www.rondhuit-demo.com/RCSS/api/overview-summary.html#featured-japanese

||Summary||Normalization sample||
|漢数字=算用数字正規化 (kanji numerals to Arabic numerals)|四七=47, 四十七=47, 四拾七=47, 四〇七=407|
|和暦=西暦正規化 (Japanese era years to Western years)|昭和四七年、昭和四十七年、昭和四拾七年=1972年, 昭和六十四年、平成元年=1989年|
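The 和暦=西暦 (era year to Gregorian year) row in the table above reduces to a fixed offset per era. A minimal from-scratch sketch, using the standard calendar offsets:

{code}
import java.util.Map;

public final class JapaneseEras {
  // Gregorian year = offset + era year (e.g. Showa 47 -> 1925 + 47 = 1972).
  private static final Map<String, Integer> OFFSETS = Map.of(
      "明治", 1867, "大正", 1911, "昭和", 1925, "平成", 1988);

  /** e.g. toGregorian("昭和", 47) == 1972, toGregorian("平成", 1) == 1989 */
  public static int toGregorian(String era, int eraYear) {
    Integer offset = OFFSETS.get(era);
    if (offset == null) throw new IllegalArgumentException("unknown era: " + era);
    return offset + eraYear;
  }
}
{code}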
[jira] [Commented] (SOLR-3272) Solr filter factory for MorfologikFilter
[ https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238330#comment-13238330 ] Rafał Kuć commented on SOLR-3272: - Sure Dawid, no problem. I'll provide a patch later today.
[jira] [Commented] (SOLR-3272) Solr filter factory for MorfologikFilter
[ https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238332#comment-13238332 ] Dawid Weiss commented on SOLR-3272: --- Thanks. Sorry about the name confusion, btw. Don't know where I got Michał from :)
[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji
[ https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238334#comment-13238334 ] Christian Moen commented on LUCENE-3922: Koji, this is very nice. Does the kanji number normalizer ({{KanjiNumberCharFilter}}) also deal with combinations of kanji and Arabic numbers, like Kazu's price example? Is the code you refer to something that can go into Lucene, or is it non-free software?
[jira] [Created] (SOLR-3274) ZooKeeper related SolrCloud problems
ZooKeeper related SolrCloud problems
Key: SOLR-3274 URL: https://issues.apache.org/jira/browse/SOLR-3274 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: Any Reporter: Per Steffensen

Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 Solr servers, running 28 slices of the same collection (collA) - all slices have one replica (two shards all in all - leader + replica) - 56 cores all in all (8 shards on each solr instance). But anyways... Besides the problem reported in SOLR-3273, the system seems to run fine under high load for several hours, but eventually errors like the ones shown below start to occur. I might be wrong, but they all seem to indicate some kind of instability in the collaboration between Solr and ZooKeeper. I have to say that I haven't been there to check ZooKeeper at the moment where those exceptions occur, but basically I don't believe the exceptions occur because ZooKeeper is not running stable - at least when I go and check ZooKeeper through other channels (e.g. my eclipse ZK plugin), it is always accepting my connection and generally seems to be doing fine.

Exception 1) Often the first error we see in solr.log is something like this:

{code}
Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
	at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
	at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
	at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
	at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
	at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
	at org.mortbay.jetty.Server.handle(Server.java:326)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
	at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}

I believe this error basically occurs because SolrZkClient.isConnected reports false, which means that its internal keeper.getState does not return ZooKeeper.States.CONNECTED. I'm pretty sure that it has been CONNECTED for a long time, since this error starts occurring after several hours of processing without this problem showing. But why is it suddenly not connected anymore?!

Exception 2) We also see errors like the following, and if I'm not mistaken, they start occurring shortly after Exception 1) (above) shows for the first time:

{code}
Mar 22, 2012 5:07:26 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard:
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
	at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
	at java.util.concurrent.FutureTask.run(FutureTask.java:138)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
	at
[jira] [Created] (LUCENE-3923) fail the build on wrong svn:eol-style
fail the build on wrong svn:eol-style - Key: LUCENE-3923 URL: https://issues.apache.org/jira/browse/LUCENE-3923 Project: Lucene - Java Issue Type: Task Components: general/build Reporter: Robert Muir

I'm tired of fixing this before releases. Jenkins should detect and fail on this.
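One possible shape for such a check, as a hedged sketch that shells out to the svn command-line client (assumed to be on the PATH) and fails the build when a file's svn:eol-style property has a value other than native:

{code}
import java.io.BufferedReader;
import java.io.InputStreamReader;

public class EolStyleCheck {
  public static void main(String[] args) throws Exception {
    // "svn propget -R" prints one "path - value" line per file with the property set.
    Process p = new ProcessBuilder("svn", "propget", "-R", "svn:eol-style", ".")
        .redirectErrorStream(true).start();
    boolean bad = false;
    try (BufferedReader r =
        new BufferedReader(new InputStreamReader(p.getInputStream()))) {
      String line;
      while ((line = r.readLine()) != null) {
        if (!line.endsWith(" - native")) {
          System.err.println("wrong svn:eol-style: " + line);
          bad = true;
        }
      }
    }
    p.waitFor();
    if (bad) System.exit(1); // non-zero exit makes ant/Jenkins fail the build
  }
}
{code}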
[jira] [Commented] (LUCENE-3923) fail the build on wrong svn:eol-style
[ https://issues.apache.org/jira/browse/LUCENE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238355#comment-13238355 ] Michael McCandless commented on LUCENE-3923: +1 And, ideally, ant test as well...
[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238377#comment-13238377 ] Mark Miller commented on SOLR-3273: --- bq. adminPath=/admin/myapp

That's probably the issue - I think we assume /admin/cores or whatever the default is.
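If Mark's guess is right, the user-side workaround would be to keep the default admin path in solr.xml, roughly like this (adapted from the description above; an untested assumption until the recovery code handles custom paths):

{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <!-- Use the default /admin/cores so inter-node PREPRECOVERY requests resolve -->
  <cores adminPath="/admin/cores" host="server1" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data"
          collection="collA" shard="slice1" />
  </cores>
</solr>
{code}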
[jira] [Updated] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3273: -- Priority: Minor (was: Major)
[jira] [Assigned] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-3273: - Assignee: Mark Miller
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238381#comment-13238381 ] Mark Miller commented on SOLR-3274: --- This happens because the connection between solr and zookeeper is lost - perhaps because the load on the box is too high. I think we may default to a fairly low timeout that could be raised (by default and manually).
[jira] [Assigned] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller reassigned SOLR-3274: - Assignee: Mark Miller
[jira] [Created] (SOLR-3275) Add the ability to set shard and collection in web gui when adding a shard
Add the ability to set shard and collection in web gui when adding a shard -- Key: SOLR-3275 URL: https://issues.apache.org/jira/browse/SOLR-3275 Project: Solr Issue Type: New Feature Components: web gui Affects Versions: 4.0 Reporter: Jamie Johnson

Currently the latest web gui allows you to add an additional core but does not allow you to specify the shard or collection that the core should be part of. In the core admin view, when adding a core, we should expose options to set these values.
[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238384#comment-13238384 ] Per Steffensen commented on SOLR-3273: -- @Mark Miller: Thanks. We will try that. It would be very helpful if you could state exactly what you expect in adminPath. Does it have to be exactly /admin/cores, or is /admin/cores/myapp allowed, or does it have to be something else? Thanks! @Erick Erickson: Please note that I am talking about the built-in replication of SolrCloud and not the old replication described at http://wiki.apache.org/solr/SolrReplication

404 Not Found on action=PREPRECOVERY Key: SOLR-3273 URL: https://issues.apache.org/jira/browse/SOLR-3273 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: Any Reporter: Per Steffensen Assignee: Mark Miller Priority: Minor

We have an application based on a recent copy of 4.0-SNAPSHOT. We have a performance test setup where we performance-test our application (and therefore indirectly Solr(Cloud)). When we run the performance test against a setup using SolrCloud without replication, everything seems to run very nicely for days. When we add replication to the setup, the same performance test shows some problems, which we will report (and maybe help fix) in distinct issues here in JIRA. About the setup - the setup is a little more complex than described below, but I believe the description will tell enough: We have two Solr servers which we start from solr-install/example using this command (ZooKeepers have been started before) - we first start Solr on server1, and then start Solr on server2 after Solr on server1 has finished starting up:
{code}
nohup java -Xmx4096m -Dcom.sun.management.jmxremote -DzkHost=server1:2181,server2:2181,server3:2181 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
{code}
The ./myapp/solr.xml looks like this on server1:
{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}
The ./myapp/solr.xml looks like this on server2:
{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}
The first thing we observe is that Solr server1 (running collA_slice1_shard1) seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) is started up later, it quickly reports the following in its solr.log and keeps doing that for a long time:
{code}
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Not Found
request: http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
    at org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
    at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
    at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
    at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
{code}
Please note that we have changed a little bit the way errors are logged, but basically this means that Solr server2 gets a 404 Not Found on its request http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2 to Solr server1. It seems like there is no common agreement among the Solr servers on how/where to send those requests and how/where to listen for them. Regards, Per Steffensen
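To make the mismatch concrete: the recovering node is simply issuing an HTTP GET against the other node's core admin handler, so the 404 can be reproduced by hand. A minimal sketch (not Solr source; the host, port and query string are taken from the log above):
{code}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;

public class PrepRecoveryProbe {
  public static void main(String[] args) throws IOException {
    // The exact request RecoveryStrategy sends, rebuilt by hand.
    URL url = new URL("http://server1:8983/solr/admin/cores?action=PREPRECOVERY"
        + "&core=collA_slice1_shard1&nodeName=server2%3A8983_solr"
        + "&coreNodeName=server2%3A8983_solr_collA_slice1_shard2"
        + "&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2");
    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
    // A 404 here means the remote node does not expose its core admin handler
    // at /admin/cores - e.g. because adminPath was changed in solr.xml.
    System.out.println(conn.getResponseCode() + " " + conn.getResponseMessage());
    conn.disconnect();
  }
}
{code}
With adminPath set to /admin/myapp as in the configs above, the handler is registered somewhere other than /admin/cores, which matches the 404 seen here.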
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238391#comment-13238391 ] Per Steffensen commented on SOLR-3274: -- Thanks a lot, Mark! Can all the exceptions be explained by connection loss between Solr and ZooKeeper? I'm not sure I totally buy the explanation, because I believe that, even though there is a fairly high update/search load on the machines in the cluster, the machines do not actually seem to be exhausted (CPU idle way above 0% - more like 50% on average - not very high IO-wait, etc.). So I would expect plenty of resources to be available for ZK to respond fast. But let's see what happens if we set the timeout higher. Can you point me in the direction of how to set it manually? Regards, Per Steffensen

ZooKeeper related SolrCloud problems Key: SOLR-3274 URL: https://issues.apache.org/jira/browse/SOLR-3274 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: Any Reporter: Per Steffensen Assignee: Mark Miller

Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 Solr servers, running 28 slices of the same collection (collA) - all slices have one replica (two shards all in all - leader + replica) - 56 cores all in all (8 shards on each Solr instance). But anyways... Besides the problem reported in SOLR-3273, the system seems to run fine under high load for several hours, but eventually errors like the ones shown below start to occur. I might be wrong, but they all seem to indicate some kind of instability in the collaboration between Solr and ZooKeeper. I have to say that I haven't been there to check ZooKeeper at the moment where those exceptions occur, but basically I don't believe the exceptions occur because ZooKeeper is not running stable - at least when I go and check ZooKeeper through other channels (e.g. my Eclipse ZK plugin) it is always accepting my connection and generally seems to be doing fine.

Exception 1) Often the first error we see in solr.log is something like this
{code}
Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}
I believe this error basically occurs because SolrZkClient.isConnected reports false, which means that its internal keeper.getState does not return ZooKeeper.States.CONNECTED. I'm pretty sure that it has been CONNECTED for a long time, since this error starts occurring after several hours of processing without this problem showing. But why is it suddenly not connected anymore?! Exception 2) We also see errors like the following, and if I'm not mistaken, they start
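For context on the "Cannot talk to ZooKeeper" error: the check described above presumably boils down to inspecting the session state on the plain ZooKeeper client handle. A hedged sketch (a simplification, not the actual SolrZkClient source):
{code}
import org.apache.zookeeper.ZooKeeper;

public class ZkStateCheck {
  // Updates are disabled whenever the session is not in CONNECTED state,
  // e.g. while it is CONNECTING again after a connection loss, or after
  // the session has expired.
  static boolean isConnected(ZooKeeper keeper) {
    return keeper.getState() == ZooKeeper.States.CONNECTED;
  }
}
{code}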
[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238394#comment-13238394 ] Mark Miller commented on SOLR-3273: --- Just adminPath=/admin/cores - same as you see in the default solr.xml. Now, I could make it so that we look up what the admin path is locally - but I don't know that we should - just because someone has changed the adminPath locally doesn't mean they changed it on the 'remote' node. We don't really have a way of knowing what it is on the remote node. So it may be the right choice to just require that people leave it as-is for SolrCloud (though of course we should document this).
[jira] [Commented] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary
[ https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238397#comment-13238397 ] Robert Muir commented on LUCENE-3888: - Thanks for the feedback Koji. I'm not happy with the situation: I thought it would be easy to support some rough Japanese spellcheck in 3.6, but it just seems like we need to do a lot of cleanup to make it work. I would rather fix all of these APIs and do it right the first time, so that things like distributed support work too.

split off the spell check word and surface form in spell check dictionary - Key: LUCENE-3888 URL: https://issues.apache.org/jira/browse/LUCENE-3888 Project: Lucene - Java Issue Type: Improvement Components: modules/spellchecker Reporter: Koji Sekiguchi Assignee: Koji Sekiguchi Priority: Minor Fix For: 4.0 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch

The "did you mean?" feature using Lucene's spell checker unfortunately cannot work well in a Japanese environment, and this is a longstanding problem: the logic needs comparatively long text to check spelling, but in some languages (e.g. Japanese) most words are too short for the spell checker. I think, at least for Japanese, things can be improved if we split off the spell check word and the surface form in the spell check dictionary. Then we can use ReadingAttribute for spell checking but CharTermAttribute for suggesting, for example.
[jira] [Assigned] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests
[ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael McCandless reassigned LUCENE-3873: -- Assignee: Michael McCandless

tie MockGraphTokenFilter into all analyzers tests - Key: LUCENE-3873 URL: https://issues.apache.org/jira/browse/LUCENE-3873 Project: Lucene - Java Issue Type: Task Components: modules/analysis Reporter: Robert Muir Assignee: Michael McCandless

Mike made a MockGraphTokenFilter on LUCENE-3848. Many filters currently aren't tested with anything but a simple tokenstream. We should test them with this too; it might find bugs (zero-length terms, stacked terms/synonyms, etc.).
[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests
[ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238398#comment-13238398 ] Michael McCandless commented on LUCENE-3873: LUCENE-3848 has the MockGraphTokenFilter patch...
[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238402#comment-13238402 ] Michael McCandless commented on LUCENE-3659: This looks great Uwe!

I'm a little worried about the tiny file case; you're checking for SEGMENTS_* now, but many other files can be much smaller than 1/64th of the estimated segment size. I wonder if we should improve IOContext to hold the [rough] estimated file size (not just overall segment size)... the thing is, that's sort of a hassle on codec impls. Or: maybe, on closing the ROS/RAMFile, we can downsize the final buffer (yes, this means copying the bytes, but that cost is vanishingly small as the RAMDir grows). Then tiny files stay tiny, though they are still [relatively] costly to create...

I don't think RAMDir.createOutput should publish the RAMFile until the ROS is closed? Ie, you are not allowed to openInput on something still opened with createOutput in any Lucene Dir impl? This would allow us to make RAMFile frozen (eg if ROS holds its own buffers and then creates the RAMFile on close), which requires no sync when reading. I also don't think RAMFile should be public, ie, the only way to make changes to a file stored in a RAMDir is via RAMOutputStream. We can do this separately...

Maybe we should pursue a growing buffer size...? Ie, where each newly added buffer is bigger than the one before (like ArrayUtil.oversize's growth function)... I realize that adds complexity (RAMInputStream.seek is more fun), but this would let tiny files use tiny RAM and huge files use few buffers. Ie, RAMDir would scale up and scale down well.

Separately: I noticed we still have IndexOutput.setLength, but nobody calls it anymore, I think? (In 3.x we call this when creating a CFS.) Maybe we should remove it...

Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes -- Key: LUCENE-3659 URL: https://issues.apache.org/jira/browse/LUCENE-3659 Project: Lucene - Java Issue Type: Task Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch

Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited, and to prevent users from using it for e.g. loading a 50-gigabyte index from a file on disk, we should improve the javadocs.
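The growing-buffer idea can be illustrated without any Lucene code. A minimal sketch, assuming a geometric growth rule and a small first buffer (both constants are illustrative, not what RAMDirectory does today):
{code}
import java.util.ArrayList;
import java.util.List;

public class GrowingBuffers {
  private static final int FIRST_BUFFER_SIZE = 128;
  private final List<byte[]> buffers = new ArrayList<byte[]>();
  private long capacity = 0;

  // Each new buffer is ~25% of the total allocated so far (similar in spirit
  // to ArrayUtil.oversize's growth): tiny files fit in one small buffer,
  // while a file of size n needs only O(log n) buffers.
  byte[] addBuffer() {
    int size = (int) Math.max(FIRST_BUFFER_SIZE, capacity / 4);
    byte[] buffer = new byte[size];
    buffers.add(buffer);
    capacity += size;
    return buffer;
  }

  public static void main(String[] args) {
    GrowingBuffers b = new GrowingBuffers();
    for (int i = 0; i < 12; i++) {
      System.out.println("buffer " + i + ": " + b.addBuffer().length + " bytes");
    }
  }
}
{code}
The trade-off mentioned above is real: with unequal buffer sizes, a seek can no longer locate its buffer by simple division and needs a search or a precomputed offset table.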
[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests
[ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238405#comment-13238405 ] Robert Muir commented on LUCENE-3873: - One way we can tie this in is via LUCENE-3919. But I think we can use this filter in some individual tests immediately. E.g. we can just add a testRandomGraphs method to the filters that do lots of crazy state-capturing, putting this thing in front of/behind them in the analyzer and calling checkRandomData.
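A hedged sketch of what such a testRandomGraphs method might look like. The signatures (MockTokenizer, MockGraphTokenFilter, checkRandomData, RANDOM_MULTIPLIER, the random field) are assumed from the test framework of this era, and SomeStatefulFilter is a hypothetical stand-in for whichever state-capturing filter is under test:
{code}
import java.io.Reader;
import org.apache.lucene.analysis.*;

public class TestSomeStatefulFilter extends BaseTokenStreamTestCase {
  public void testRandomGraphs() throws Exception {
    Analyzer analyzer = new Analyzer() {
      @Override
      protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
        Tokenizer tokenizer = new MockTokenizer(reader, MockTokenizer.WHITESPACE, false);
        // the graph-producing filter runs in front of the filter under test
        TokenStream graphs = new MockGraphTokenFilter(random, tokenizer);
        return new TokenStreamComponents(tokenizer, new SomeStatefulFilter(graphs));
      }
    };
    checkRandomData(random, analyzer, 1000 * RANDOM_MULTIPLIER);
  }
}
{code}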
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238404#comment-13238404 ] Mark Miller commented on SOLR-3274: --- bq. Can all the exceptions be explained by connection loss between solr and zookeeper? bq. SessionExpiredException This indicates the connection with ZooKeeper was lost. bq. org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled. If there is no connection to ZooKeeper, you will see this if you send an update. bq. org.apache.solr.common.SolrException: no servers hosting shard: Sami Siren has a JIRA issue about improving this message, I believe - but normally it means that the cluster does not see a single node hosting a given shard. Not sure if this is related to the above - not the same smoking gun. bq. Can you point me in the direction of how to set it manually? The default is only 10 seconds. I'd try 30 seconds perhaps? You don't want it too low, but you also don't want it too high if you can help it. I can't remember what the ZooKeeper default is, but I've seen it set as high as 60 seconds looking around some HBase usage... You should be able to set it in solr.xml as a cores attribute: zkClientTimeout="30000" or whatever. That is: {{<cores adminPath="/admin/cores" zkClientTimeout="30000">}} You'd want to do it for each node.
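For what it's worth, the zkClientTimeout value is (as far as I understand it) ultimately the session timeout, in milliseconds, passed when the ZooKeeper client handle is created. A hedged illustration using the plain ZooKeeper API rather than the Solr code path; the connect string and the 30000 ms value mirror the suggestion above:
{code}
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooKeeper;

public class ZkSessionExample {
  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ZooKeeper("server1:2181,server2:2181,server3:2181",
        30000, // session timeout in ms - what zkClientTimeout configures
        new Watcher() {
          public void process(WatchedEvent event) {
            System.out.println("ZK event: " + event.getState());
          }
        });
    System.out.println("state: " + zk.getState());
    zk.close();
  }
}
{code}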
[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238407#comment-13238407 ] Robert Muir commented on LUCENE-3659: - {quote} I'm a little worried about the tiny file case; you're checking for SEGMENTS_* now, but many other files can be much smaller than 1/64th of the estimated segment size. I wonder if we should improve IOContext to hold the [rough] estimated file size (not just overall segment size)... the thing is that's sort of a hassle on codec impls. {quote} Maybe it's enough for IOContext to specify that it's writing a 'metadata' file? These are all the tiny ones (fieldinfos, segmentinfos, .cfe, etc), as opposed to 'real' files like frq or prx that are expected to be possibly huge.
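To illustrate the proposal: a hypothetical boolean on IOContext marking metadata files, which a RAMDirectory-like implementation could use to pick a small initial buffer. Neither the IOContextLike class nor the isMetadata flag exists in Lucene; this is only a sketch of the idea under discussion, with made-up constants:
{code}
public class MetadataAwareAllocation {
  static class IOContextLike {
    final boolean isMetadata; // hypothetical flag proposed above
    IOContextLike(boolean isMetadata) { this.isMetadata = isMetadata; }
  }

  static int initialBufferSize(IOContextLike context, long estimatedSegmentSize) {
    if (context.isMetadata) {
      // tiny files (fieldinfos, segmentinfos, .cfe, ...) stay tiny
      return 128;
    }
    // "real" files like frq/prx size their buffers from the segment estimate,
    // capped at 1 MB and floored at 1 KB
    return (int) Math.min(1 << 20, Math.max(1024, estimatedSegmentSize / 64));
  }

  public static void main(String[] args) {
    System.out.println(initialBufferSize(new IOContextLike(true), 100000000L));  // 128
    System.out.println(initialBufferSize(new IOContextLike(false), 100000000L)); // 1048576
  }
}
{code}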
[jira] [Commented] (SOLR-3275) Add the ability to set shard and collection in web gui when adding a shard
[ https://issues.apache.org/jira/browse/SOLR-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238411#comment-13238411 ] Mark Miller commented on SOLR-3275: --- bq. we should expose options to set these values when creating a core. But they should probably only be visible if in cloud mode.
[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY
[ https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238410#comment-13238410 ] Per Steffensen commented on SOLR-3273: -- Thanks a lot. It is OK for us just to use /admin/cores - we really don't mind. But at least it needs some documentation, or maybe share the admin path in ZK so that a remote Solr can actually look it up. Well, you decide that. Regards, Per Steffensen
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238409#comment-13238409 ] Mark Miller commented on SOLR-3274: --- bq. not the same smoking gun. Sorry - actually this does make sense with the other errors - if the ZK connection is lost, that node is no longer considered live - if that happens to each node hosting a shard (say you have 1 replica and this happened to both nodes) then searches would fail with this.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238419#comment-13238419 ] Per Steffensen commented on SOLR-3274: -- Uh, 10 secs is A LOT OF TIME. I really wouldn't want to set it higher than that. If ZK is not able to answer within 10 secs, I need to correct something else in my setup. I still believe that Solr might end up in this state (where it believes that the connection to ZK is lost) some other way than actually experiencing a 10+ sec response time from ZK, but I can't prove it (yet). So for now I will just thank you for your kind help and assume that it is correct. Then basically my options are to set up a more responsive ZK cluster or maybe raise the ZK timeout on the Solr side. Thanks again. Regards, Per Steffensen
[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests
[ https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238422#comment-13238422 ] Michael McCandless commented on LUCENE-3873: I agree we can use it in specific places for starters... The patch on LUCENE-3848 mixes in TokenStream-to-Automaton and MockGraphTokenFilter; I'll split that apart and only commit MockGraphTokenFilter here. One problem is... MockGraphTokenFilter isn't setting offsets currently. I think to do this correctly it needs to buffer up pending input tokens until it's reached the posLength it wants to output for a random token, and then set the offset accordingly.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238428#comment-13238428 ] Per Steffensen commented on SOLR-3274: -- But why not just try to reconnect if/when this situation has occurred, so that Solr can continue doing its work? I guess Solr does not do that, because it seems like once this error has first established itself, there is no recovering, and certainly (I'm close to 100% positive) ZK will not continue giving 10+ sec response times to all requests, even though it might give a 10+ sec response once in a while. Regards, Per Steffensen
[jira] [Commented] (LUCENE-3909) Move Kuromoji to analysis.ja and introduce Japanese* naming
[ https://issues.apache.org/jira/browse/LUCENE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238435#comment-13238435 ] Christian Moen commented on LUCENE-3909: Committed revisions 1305367 and 1305372 on {{branch_3x}}. I forgot to rename a few Solr test classes. Will follow up now in this JIRA.

Move Kuromoji to analysis.ja and introduce Japanese* naming --- Key: LUCENE-3909 URL: https://issues.apache.org/jira/browse/LUCENE-3909 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Assignee: Christian Moen

Lucene/Solr 3.6 and 4.0 will get out-of-the-box Japanese language support through {{KuromojiAnalyzer}}, {{KuromojiTokenizer}} and various other filters. These filters currently live in {{org.apache.lucene.analysis.kuromoji}}. I'm proposing that we move Kuromoji to a new Japanese package {{org.apache.lucene.analysis.ja}}, in line with how other languages are organized. As part of this, I also think we should rename {{KuromojiAnalyzer}} to {{JapaneseAnalyzer}}, etc., to further align naming with our conventions by making it very clear that these analyzers are for Japanese. (As much as I like the name Kuromoji, I think Japanese is more fitting.) A potential issue I see with this that I'd like to raise and get feedback on is that end-users in Japan and elsewhere who use lucene-gosen could have issues after an upgrade, since lucene-gosen in fact releases its analyzers under the {{org.apache.lucene.analysis.ja}} namespace (and we'd have a name clash). I believe users should have the freedom to choose whichever Japanese analyzer, filter, etc. they'd like to use, and I don't want to propose a name change that just creates unnecessary problems for users, but I think the naming proposed above is most fitting for a Lucene/Solr release.
[jira] [Issue Comment Edited] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238441#comment-13238441 ] Uwe Schindler edited comment on LUCENE-3659 at 3/26/12 2:46 PM: Robert: That was the first idea that came to my mind, too. I think that's a good idea. It is especially strange that the segments_xx/segments.gen file (which is not part of the current segment) is written with a MERGE/FLUSH context. Shouldn't it be written with a standard context? Or am I missing something? (This was the reason why I added the file name check.) Initially I was expecting that writing the commit is done with a separate IOContext, but it isn't - the noisy debugging helps.
[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238448#comment-13238448 ] Robert Muir commented on LUCENE-3659: - I think if we were to implement it this way, it's not a burden on codecs. By default, somewhere in Lucene core always inits the codec APIs with a context. For example SegmentInfos.write():
{code}
infosWriter.writeInfos(directory, segmentFileName, codec.getName(), this, IOContext.DEFAULT);
{code}
and DocFieldProcessor/SegmentMerger for fieldinfos:
{code}
infosWriter.write(state.directory, state.segmentName, state.fieldInfos, IOContext.DEFAULT);
{code}
These guys would just set this in the IOContext. Most/all codecs just pass this along. If a codec wants to ignore the IOContext and lie about it, that's its own choice. So I think it's an easy change.
[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238453#comment-13238453 ] Robert Muir commented on LUCENE-3659: - But also codecs that write their own private tiny metadata files (like .per from PerFieldPostingsFormat) should set this in the context.
[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238454#comment-13238454 ] Robert Muir commented on LUCENE-3659: - Live docs aren't metadata. I think you are conflating 'tiny' with 'metadata'. I'm saying we should declare it's metadata, that's all. This is pretty black and white! If a directory wants to, as a heuristic, interpret metadata == tiny, then that's fine, but that's separate.
[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems
[ https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238455#comment-13238455 ] Mark Miller commented on SOLR-3274: --- {quote} But why not just try to reconnect if/when this situation has occurred, so that Solr can continue doing its work? I guess Solr does not do that, because it seems like once this error has first occurred, there is no recovering, and certainly (I'm close to 100% positive) ZK will not continue doing 10+ sec response times to all requests, even though it might do a 10+ sec response once in a while. {quote} Solr does try to reconnect - but there can be no recovering due to the other issue you posted - because you have changed the core admin url. ZooKeeper related SolrCloud problems Key: SOLR-3274 URL: https://issues.apache.org/jira/browse/SOLR-3274 Project: Solr Issue Type: Bug Components: SolrCloud Affects Versions: 4.0 Environment: Any Reporter: Per Steffensen Assignee: Mark Miller Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 Solr servers, running 28 slices of the same collection (collA) - all slices have one replica (two shards all in all - leader + replica) - 56 cores all in all (8 shards on each solr instance). But anyways... Besides the problem reported in SOLR-3273, the system seems to run fine under high load for several hours, but eventually errors like the ones shown below start to occur. I might be wrong, but they all seem to indicate some kind of instability in the collaboration between Solr and ZooKeeper. I have to say that I haven't been there to check ZooKeeper at the moment when those exceptions occur, but basically I don't believe the exceptions occur because ZooKeeper is not running stably - at least when I go and check ZooKeeper through other channels (e.g. my eclipse ZK plugin) it is always accepting my connection and generally seems to be doing fine. Exception 1) Often the first error we see in solr.log is something like this {code} Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678) at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152) at org.mortbay.jetty.Server.handle(Server.java:326) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228) at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582) {code} I believe this error basically occurs because SolrZkClient.isConnected reports false, which means that its internal keeper.getState does not return ZooKeeper.States.CONNECTED. Im pretty sure that it has been CONNECTED
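For reference, the connectivity test the reporter describes boils down to a state check on the underlying ZooKeeper handle; the following is a paraphrase of that logic as described above, not a copy of the Solr source:
{code}
import org.apache.zookeeper.ZooKeeper;

// Updates are rejected whenever the ZooKeeper handle is not in CONNECTED
// state, which is what produces the "Cannot talk to ZooKeeper" exception.
class ConnectionCheck {
  private final ZooKeeper keeper;

  ConnectionCheck(ZooKeeper keeper) {
    this.keeper = keeper;
  }

  boolean isConnected() {
    return keeper != null && keeper.getState() == ZooKeeper.States.CONNECTED;
  }
}
{code}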
[jira] [Resolved] (SOLR-3262) Remove threads from DIH (Trunk only)
[ https://issues.apache.org/jira/browse/SOLR-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] James Dyer resolved SOLR-3262. -- Resolution: Fixed committed. Trunk: r1305384 Remove threads from DIH (Trunk only) -- Key: SOLR-3262 URL: https://issues.apache.org/jira/browse/SOLR-3262 Project: Solr Issue Type: Bug Components: contrib - DataImportHandler Affects Versions: 4.0 Reporter: James Dyer Assignee: James Dyer Priority: Minor Fix For: 4.0 Attachments: SOLR-3262.patch SOLR-1352 introduced a multi-threading feature for DataImportHandler. Historically, this feature only seemed to work in a limited set of cases and I don't think we can guarantee users that using threads will behave consistently. Also, the multi-threaded option adds considerable complexity, making code refactoring difficult. I propose removing threads from Trunk. (But keep it in 3.x, applying any bug fixes for it there.) This can be a first step in improving the DIH code base. Eventually we can possibly add a carefully thought-out threads implementation back in. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
3.6 status
please wrap up your changes to 3.6 by wednesday. As described earlier: on wednesday branch_3x becomes our release branch. I will move all jira issues out of 3.6 unless they are marked blocker bugs. I will then send an email that the branch is frozen and any changes should have an associated jira. Wednesday is ~3.5 weeks since the initial "please wrap up in 2 weeks" email, so I think it's fair. -- lucidimagination.com - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (SOLR-3276) Update ja_text entry in schema.xml with useful info
Update ja_text entry in schema.xml with useful info --- Key: SOLR-3276 URL: https://issues.apache.org/jira/browse/SOLR-3276 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 3.6, 4.0 Reporter: Christian Moen Searching Japanese text is a big topic with many considerations that need to be made. I think it's helpful to add a link to the wiki in a comment near {{text_ja}} in {{schema.xml}} to guide users to detailed information on features available, how to use them, etc. I've made a placeholder page on [http://wiki.apache.org/solr/JapaneseLanguageSupport] and I'll add details post-release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Assigned] (SOLR-3276) Update ja_text entry in schema.xml with useful info
[ https://issues.apache.org/jira/browse/SOLR-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christian Moen reassigned SOLR-3276: Assignee: Christian Moen Update ja_text entry in schema.xml with useful info --- Key: SOLR-3276 URL: https://issues.apache.org/jira/browse/SOLR-3276 Project: Solr Issue Type: Improvement Components: documentation Affects Versions: 3.6, 4.0 Reporter: Christian Moen Assignee: Christian Moen Searching Japanese text is a big topic with many considerations that need to be made. I think it's helpful to add a link to the wiki in a comment near {{text_ja}} in {{schema.xml}} to guide users to detailed information on features available, how to use them, etc. I've made a placeholder page on [http://wiki.apache.org/solr/JapaneseLanguageSupport] and I'll add details post-release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Created] (LUCENE-3924) Optimize buffer size handling in RAMDirectory to make it more GC friendly
Optimize buffer size handling in RAMDirectory to make it more GC friendly - Key: LUCENE-3924 URL: https://issues.apache.org/jira/browse/LUCENE-3924 Project: Lucene - Java Issue Type: Improvement Components: core/store Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 RAMDirectory currently uses a fixed buffer size of 1024 bytes to allocate memory. This is very wasteful for large indexes. Possible improvements: - per-file buffer sizes based on IOContext and maximum segment size - allocate only one buffer for files that are copied from another directory - dynamically increase buffer size when files grow (makes seek() complicated) -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
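As an illustration of the first bullet, a policy like the following could pick a per-file buffer size from the IOContext; the thresholds and class name are invented for this sketch and are not from the issue's eventual patch:
{code}
import org.apache.lucene.store.IOContext;

// Sketch: choose a RAMFile buffer size from the IOContext instead of the
// fixed 1024 bytes. Merged/copied files tend to be large, so fewer, bigger
// buffers reduce GC pressure; the 64 KB figure here is arbitrary.
class BufferSizePolicy {
  static int bufferSize(IOContext context) {
    if (context.context == IOContext.Context.MERGE && context.mergeInfo != null) {
      return 64 * 1024;
    }
    return 1024; // the current default for everything else
  }
}
{code}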
[jira] [Created] (LUCENE-3925) Spatial field types should not store doc frequencies or positions
Spatial field types should not store doc frequencies or positions - Key: LUCENE-3925 URL: https://issues.apache.org/jira/browse/LUCENE-3925 Project: Lucene - Java Issue Type: Improvement Components: modules/spatial Reporter: Simon Willnauer Assignee: David Smiley Priority: Minor Fix For: 4.0 It appears the correction is simply to supply IndexOptions.DOCS_ONLY -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
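For context, supplying it through the trunk-era Field/FieldType API would look roughly like this (the field setup is illustrative, not the actual spatial module code):
{code}
import org.apache.lucene.document.FieldType;
import org.apache.lucene.index.FieldInfo.IndexOptions;

// Index only document IDs for spatial terms: spatial search never consults
// term frequencies or positions, so omitting them shrinks the index.
class SpatialFieldTypeSketch {
  static FieldType makeGeoFieldType() {
    FieldType geoType = new FieldType();
    geoType.setIndexed(true);
    geoType.setTokenized(true);
    geoType.setOmitNorms(true);
    geoType.setIndexOptions(IndexOptions.DOCS_ONLY); // no freqs, no positions
    geoType.freeze();
    return geoType;
  }
}
{code}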
[jira] [Created] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations
Improve Javadocs of RAMDirectory to document its limitations Key: LUCENE-3926 URL: https://issues.apache.org/jira/browse/LUCENE-3926 Project: Lucene - Java Issue Type: Sub-task Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations
[ https://issues.apache.org/jira/browse/LUCENE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3926: -- Attachment: (was: LUCENE-3659.patch) Improve Javadocs of RAMDirectory to document its limitations Key: LUCENE-3926 URL: https://issues.apache.org/jira/browse/LUCENE-3926 Project: Lucene - Java Issue Type: Sub-task Components: core/store Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations
[ https://issues.apache.org/jira/browse/LUCENE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3926: -- Attachment: (was: LUCENE-3659.patch) Improve Javadocs of RAMDirectory to document its limitations Key: LUCENE-3926 URL: https://issues.apache.org/jira/browse/LUCENE-3926 Project: Lucene - Java Issue Type: Sub-task Components: core/store Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Issue Type: Sub-task (was: Task) Parent: LUCENE-3924 Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes -- Key: LUCENE-3659 URL: https://issues.apache.org/jira/browse/LUCENE-3659 Project: Lucene - Java Issue Type: Sub-task Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENE-3659) Allow per-RAMFile buffer sizes based on IOContext and source of data (e.g. copy from another directory)
[ https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-3659: -- Affects Version/s: (was: 3.5) Fix Version/s: (was: 3.6) Summary: Allow per-RAMFile buffer sizes based on IOContext and source of data (e.g. copy from another directory) (was: Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes) Allow per-RAMFile buffer sizes based on IOContext and source of data (e.g. copy from another directory) --- Key: LUCENE-3659 URL: https://issues.apache.org/jira/browse/LUCENE-3659 Project: Lucene - Java Issue Type: Sub-task Affects Versions: 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 4.0 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3925) Spatial field types should not store doc frequencies or positions
[ https://issues.apache.org/jira/browse/LUCENE-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238495#comment-13238495 ] Robert Muir commented on LUCENE-3925: - +1 Spatial field types should not store doc frequencies or positions - Key: LUCENE-3925 URL: https://issues.apache.org/jira/browse/LUCENE-3925 Project: Lucene - Java Issue Type: Improvement Components: modules/spatial Reporter: Simon Willnauer Assignee: David Smiley Priority: Minor Fix For: 4.0 Original Estimate: 0.5h Remaining Estimate: 0.5h It appears the corrections is simply to supply IndexOptions.DOCS_ONLY -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations
[ https://issues.apache.org/jira/browse/LUCENE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238496#comment-13238496 ] Uwe Schindler commented on LUCENE-3926: --- This issue should only do javadoc improvements! Improve Javadocs of RAMDirectory to document its limitations Key: LUCENE-3926 URL: https://issues.apache.org/jira/browse/LUCENE-3926 Project: Lucene - Java Issue Type: Sub-task Components: core/store Affects Versions: 3.5, 4.0 Reporter: Uwe Schindler Assignee: Uwe Schindler Fix For: 3.6, 4.0 Attachments: LUCENE-3659.patch Spinoff from several dev@lao issues: - [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E] - issue LUCENE-3653 The use cases for RAMDirectory are very limited and to prevent users from using it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve the javadocs. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
RE: Case where StandardAnalyzer doesn't remove punctuation
Hi Steve, thanks for your response. Totally makes sense, given that the comma character is widely used in written number syntax (e.g. 1000 is the same as 1,000). Thanks also for the notes re the mailing list and nabble. Colm. -- View this message in context: http://lucene.472066.n3.nabble.com/Case-where-StandardAnalyzer-doesn-t-remove-punctuation-tp3848460p3858661.html Sent from the Lucene - Java Developer mailing list archive at Nabble.com. - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
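The behavior is easy to observe directly; a small sketch against the 3.x analyzer API (the sample text is made up), where digits joined by a comma come through as a single token:
{code}
import java.io.StringReader;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.util.Version;

// Expect "1,000" to survive as one token while the trailing period is dropped.
public class CommaTokenDemo {
  public static void main(String[] args) throws Exception {
    StandardAnalyzer analyzer = new StandardAnalyzer(Version.LUCENE_35);
    TokenStream ts = analyzer.tokenStream("f", new StringReader("price 1,000 dollars."));
    CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
      System.out.println(term); // price / 1,000 / dollars
    }
    ts.end();
    ts.close();
  }
}
{code}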
[jira] [Commented] (SOLR-435) QParser must validate existance/absense of q parameter
[ https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238538#comment-13238538 ] Hoss Man commented on SOLR-435: --- bq. If the purpose of the QueryComponent is to be QParser agnostic and consequently unable to know if the 'q' parameter is even relevant, shouldn't it be up to the QParser to retrieve what it believes the query string to be from the request parameters? Sorry ... i chose my words carelessly and wound up saying almost the exact opposite of what i meant. What i should have said... * QueryComponent is responsible for determining the QParser to use for the main query and passing the value of the q query-string param to the QParser.getParser(...) method * QParser.getParser passes that query-string on to whatever QParserPlugin was selected, as the qstr param to createParser * The QParser that gets created by the createParser call should do whatever validation it needs to do (including a null check) in its parse() method In answer to your questions... * QueryComponent can not do any validation of the q param, because it can't make any assumptions about what values are legal for the defType QParser -- not even a null check, because in the case of things like dismax null is perfectly fine * QParsers (and QParserPlugins) can't be made responsible for fetching the q param because they don't know if/when they are being used to parse the main query param, vs fq params, vs some other nested subquery * by putting this kind of validation/error checking in the QParser.parse method, we ensure that it is used properly even when the QParser(s) are used for things like 'fq' params or in nested subqueries bq. Hoss: I don't agree with your reasoning on the developer-user typo-ing the 'q' parameter. If you mistype basically any parameter then clearly it is as if you didn't even specify that parameter and you get the default behavior of the parameter you were trying to type correctly but didn't. understood ... but in most other situations the default behavior is either do nothing or error ... we don't have a lot of default behaviors which are give me tons of stuff ... if you use {{facet=true&faceet.field=foo}} (note the extra character) you don't silently get faceting on every field as a default -- you get no field faceting at all. if you mistype the q param name and get an error on your first attempt you immediately understand you did something wrong. likewise if we made the default a matches nothing query, then you'd get no results and (hopefully) be suspicious enough to realize you made a mistake -- but if we give you a bunch of results by default you may not realize at all that you're looking at all results, not just the results of what you thought the query was. the only situations i can think of where forgetting or mistyping a param name doesn't default to error or nothing are things with fixed expectations: start, rows, fl, etc... Those have defaults that (if they don't match what you tried to specify) are immediately obvious ... the 'start' attribute on the docList returned is wrong, you get more results than you expected, you get field names you know you didn't specify, etc... it's less obvious when you are looking at the results of a query that it's a match-all query instead of the query you thought you were specifying. like i said ... i'm -0 to having a hardcoded default query for lucene/dismax/edismax ...
if you feel strongly about it that's fine, although i would try to convince you match none is a better hardcoded default than 'match all' (so that it's easier to recognize mistakes quickly) and i really don't think we should do it w/o also adding q.alt support to the LuceneQParser so people can override it. QParser must validate existance/absense of q parameter Key: SOLR-435 URL: https://issues.apache.org/jira/browse/SOLR-435 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 3.6, 4.0 Attachments: SOLR-435_q_defaults_to_all-docs.patch Each QParser should check if q exists or not. For some it will be required, for others not. Currently it throws a null pointer: {code} java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36) at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104) at org.apache.solr.search.QParser.getQuery(QParser.java:80) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67) at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150) ... {code} see:
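To make the convention concrete, a hypothetical QParser that does its own null check in parse() might look like this (the class name and the TermQuery stand-in are invented; only the QParser plumbing mirrors Solr's API):
{code}
import org.apache.lucene.index.Term;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;

// This parser requires a non-blank query string, so it validates in parse();
// QueryComponent can't do this check, because e.g. dismax accepts a null q.
public class RequiredQQParser extends QParser {
  public RequiredQQParser(String qstr, SolrParams localParams, SolrParams params, SolrQueryRequest req) {
    super(qstr, localParams, params, req);
  }

  @Override
  public Query parse() throws ParseException {
    if (qstr == null || qstr.trim().length() == 0) {
      throw new ParseException("this parser requires a 'q' parameter");
    }
    return new TermQuery(new Term("text", qstr.trim())); // stand-in for real parsing
  }
}
{code}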
[jira] [Commented] (LUCENE-3909) Move Kuromoji to analysis.ja and introduce Japanese* naming
[ https://issues.apache.org/jira/browse/LUCENE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238536#comment-13238536 ] Christian Moen commented on LUCENE-3909: Committed revision 1305421 on {{trunk}} and 1305437 to {{branch_3x}}. Move Kuromoji to analysis.ja and introduce Japanese* naming --- Key: LUCENE-3909 URL: https://issues.apache.org/jira/browse/LUCENE-3909 Project: Lucene - Java Issue Type: Improvement Components: modules/analysis Affects Versions: 3.6, 4.0 Reporter: Christian Moen Assignee: Christian Moen Lucene/Solr 3.6 and 4.0 will get out-of-the-box Japanese language support through {{KuromojiAnalyzer}}, {{KuromojiTokenizer}} and various other filters. These filters currently live in {{org.apache.lucene.analysis.kuromoji}}. I'm proposing that we move Kuromoji to a new Japanese package {{org.apache.lucene.analysis.ja}} in line with how other languages are organized. As part of this, I also think we should rename {{KuromojiAnalyzer}} to {{JapaneseAnalyzer}}, etc. to further align naming to our conventions by making it very clear that these analyzers are for Japanese. (As much as I like the name Kuromoji, I think Japanese is more fitting.) A potential issue I see with this that I'd like to raise and get feedback on, is that end-users in Japan and elsewhere who use lucene-gosen could have issues after an upgrade since lucene-gosen is in fact releasing its analyzers under the {{org.apache.lucene.analysis.ja}} namespace (and we'd have a name clash). I believe users should have the freedom to choose whichever Japanese analyzer, filter, etc. they'd like to use, and I don't want to propose a name change that just creates unnecessary problems for users, but I think the naming proposed above is most fitting for a Lucene/Solr release. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Reopened] (LUCENENET-466) optimisation for the GermanStemmer.vb
[ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens reopened LUCENENET-466: --- I see what you're saying. I missed that in the original conversation that was linked to in an earlier comment. {quote} ue occurs pretty often as an infix (think of *steuer*): about 1.5% of the words of the German aspell dictionary are affected. ae and oe are rather seldom. Still, it may be worth a try, because the stemmer doesn't work morphologically anyway. It doesn't really matter if steuer is stemmed as steur or steu as long as it's consistent. {quote} I'm thinking that as long as it is made clear that this behavior is in the second stemmer, this would probably be an okay change to make as the second option, in a way that doesn't break the root of the word. optimisation for the GermanStemmer.vb -- Key: LUCENENET-466 URL: https://issues.apache.org/jira/browse/LUCENENET-466 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3 Reporter: Prescott Nasser Priority: Minor Fix For: Lucene.Net 3.0.3 I have a little optimisation for the GermanStemmer.vb (in Contrib.Analyzers) class. At the moment the function Substitute converts the German umlauts ä in a, ö in o and ü in u. This is not the correct German transliteration. They must be converted to ae, oe and ue. So I can write the name Björn or Bjoern but not Bjorn. With this optimization a user can search for Björn and also find Bjoern. Here is the optimized code snippet: else if ( buffer[c] == 'ä' ) { buffer[c] = 'a'; buffer.Insert(c + 1, 'e'); } else if ( buffer[c] == 'ö' ) { buffer[c] = 'o'; buffer.Insert(c + 1,'e'); } else if ( buffer[c] == 'ü' ) { buffer[c] = 'u'; buffer.Insert(c + 1,'e'); } Thank You Björn -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
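For readers following along in Java, the proposed substitution amounts to the following (the original snippet is C#; this rendering is illustrative only):
{code}
// Expand umlauts to their two-character forms while scanning the buffer, so
// that "Björn" and "Bjoern" reduce to the same stem. The insert shifts the
// rest of the buffer right; the loop index then lands on the inserted 'e'.
private void substitute(StringBuilder buffer) {
  for (int c = 0; c < buffer.length(); c++) {
    if (buffer.charAt(c) == 'ä') {
      buffer.setCharAt(c, 'a');
      buffer.insert(c + 1, 'e');
    } else if (buffer.charAt(c) == 'ö') {
      buffer.setCharAt(c, 'o');
      buffer.insert(c + 1, 'e');
    } else if (buffer.charAt(c) == 'ü') {
      buffer.setCharAt(c, 'u');
      buffer.insert(c + 1, 'e');
    }
  }
}
{code}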
[jira] [Updated] (SOLR-3231) Add the ability to KStemmer to preserve the original token when stemming
[ https://issues.apache.org/jira/browse/SOLR-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated SOLR-3231: -- Affects Version/s: (was: 4.0) Fix Version/s: 4.0 Add the ability to KStemmer to preserve the original token when stemming Key: SOLR-3231 URL: https://issues.apache.org/jira/browse/SOLR-3231 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Jamie Johnson Fix For: 4.0 Attachments: KStemFilter.patch While using the PorterStemmer, I found that there were often times when it was far too aggressive in its stemming. In my particular case it is unrealistic to provide a protected word list which captures all possible words which should not be stemmed. To avoid this I proposed a solution whereby we store the original token as well as the stemmed token so exact searches would always work. Based on discussions on the mailing list with Ahmet Arslan, I believe the attached patch to KStemmer provides the desired capabilities through a configuration parameter. This is largely a copy of the org.apache.lucene.wordnet.SynonymTokenFilter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
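The general pattern the description refers to, emitting the stemmed form and injecting the original at the same position when they differ, looks roughly like this (a sketch, not the attached patch; stem() is a placeholder for the KStem call):
{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

// When a token is changed by stemming, remember the original and emit it on
// the next call with positionIncrement=0, so exact-match queries still work.
public final class PreserveOriginalStemFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
  private State savedOriginal;

  protected PreserveOriginalStemFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (savedOriginal != null) {
      restoreState(savedOriginal);        // emit the unstemmed form
      posIncAtt.setPositionIncrement(0);  // at the same position as the stem
      savedOriginal = null;
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String original = termAtt.toString();
    String stemmed = stem(original);      // placeholder for KStem
    if (!stemmed.equals(original)) {
      savedOriginal = captureState();     // capture before overwriting the term
      termAtt.setEmpty().append(stemmed);
    }
    return true;
  }

  private String stem(String s) {
    return s; // placeholder: the real filter would invoke the KStemmer here
  }
}
{code}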
[JENKINS] Solr-3.x - Build # 642 - Failure
Build: https://builds.apache.org/job/Solr-3.x/642/ No tests ran. Build Log (for compile errors): [...truncated 7071 lines...] jar-analyzers-common: common.init: compile-lucene-core: jflex-uptodate-check: jflex-notice: javacc-uptodate-check: javacc-notice: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: compile-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java [javac] Compiling 26 source files to /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:153: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:158: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:163: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:168: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:186: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:207: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:212: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:223: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:228: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:82: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:197: method does not override a method from its 
superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:202: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:207: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:212: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:217: method does not override a method from its superclass [javac] @Override [javac]^ [javac]
[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12878 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12878/ No tests ran. Build Log (for compile errors): [...truncated 550 lines...] [echo] common.init: compile-lucene-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: compile-core: jar-core: [exec] Result: 1 [jar] Building jar: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/contrib/analyzers/common/lucene-analyzers-3.6-SNAPSHOT.jar common.init: compile-lucene-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java [javac] Compiling 26 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:153: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:158: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:163: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:168: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:186: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:207: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:212: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:223: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:228: method does not override a method from its superclass [javac] @Override [javac]^ [javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:82: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:197: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:202: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:207: method does not override a method from its superclass [javac] @Override [javac]^ [javac]
[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 2115 - Failure
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/2115/ No tests ran. Build Log (for compile errors): [...truncated 532 lines...] [echo] common.init: compile-lucene-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: compile-core: jar-core: [exec] Result: 1 [jar] Building jar: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/contrib/analyzers/common/lucene-analyzers-3.6-SNAPSHOT.jar common.init: compile-lucene-core: init: clover.setup: clover.info: [echo] [echo] Clover not found. Code coverage reports disabled. [echo] clover: common.compile-core: [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java [javac] Compiling 26 source files to /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:153: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:158: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:163: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:168: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:186: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:207: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:212: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:223: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:228: method does not override a method from its superclass [javac] @Override [javac]^ [javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:82: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:197: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:202: method does not override a method from its superclass [javac] @Override [javac]^ [javac] /usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:207: method does not override a method from its superclass [javac] @Override [javac]^ [javac]
RE: [JENKINS] Solr-3.x - Build # 642 - Failure
Christian: Could it be that you simply merged from trunk, where the change was done with an svn copy? In that case you did not really merge the changes onto the files in 3.x; your commit removed the old files and replaced them with the trunk ones. @Override on interfaces is not compatible with Java 5. You should maybe test-build with Java 5. Yesterday I did something similar (I moved a file around in trunk) and wanted to backport that change. This is very risky when done by merge, as this removes the old file and adds the new one instead of renaming. What I did at the end: I renamed the files in 3.x by hand and then did a no-op merge to record the merge properties. Uwe - Uwe Schindler H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Apache Jenkins Server [mailto:jenk...@builds.apache.org] Sent: Monday, March 26, 2012 6:52 PM To: dev@lucene.apache.org Subject: [JENKINS] Solr-3.x - Build # 642 - Failure Build: https://builds.apache.org/job/Solr-3.x/642/ No tests ran. Build Log (for compile errors): [...truncated 7071 lines...]
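The failure mode Uwe describes is easy to reproduce in isolation (the interface and class names below are made up): under javac -source 1.5, @Override on a method that implements an interface method is rejected with exactly the "method does not override a method from its superclass" error seen in the log, while Java 6 and later accept it.
{code}
interface Dictionary {
  int lookup(String key);
}

class BinaryDict implements Dictionary {
  @Override // fine under Java 6+; a compile error under Java 5
  public int lookup(String key) {
    return key.length();
  }
}
{code}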
[jira] [Commented] (SOLR-3231) Add the ability to KStemmer to preserve the original token when stemming
[ https://issues.apache.org/jira/browse/SOLR-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238557#comment-13238557 ] Robert Muir commented on SOLR-3231: --- I don't think we should approach the problem this way: this is the same discussion as LUCENE-3415 Add the ability to KStemmer to preserve the original token when stemming Key: SOLR-3231 URL: https://issues.apache.org/jira/browse/SOLR-3231 Project: Solr Issue Type: Improvement Components: Schema and Analysis Reporter: Jamie Johnson Fix For: 4.0 Attachments: KStemFilter.patch While using the PorterStemmer, I found that there were often times that it was far to aggressive in it's stemming. In my particular case it is unrealistic to provide a protected word list which captures all possible words which should not be stemmed. To avoid this I proposed a solution whereby we store the original token as well as the stemmed token so exact searches would always work. Based on discussions on the mailing list Ahmet Arslan, I believe the attached patch to KStemmer provides the desired capabilities through a configuration parameter. This largely is a copy of the org.apache.lucene.wordnet.SynonymTokenFilter. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Commented] (SOLR-435) QParser must validate existance/absense of q parameter
[ https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238560#comment-13238560 ] Ryan McKinley commented on SOLR-435: bq. if no query string is supplied, or if its blank or just whitespace, then the default is to match all documents. -0 When I opened this issue (4 years ago!) I was only worried that you get a NPE from a missing 'q' bq. don't think we should do it w/o also add q.alt support to the LuceneQParser so people can override it. +1 Match none seems like the most appropriate behavior unless you explicitly say something else QParser must validate existance/absense of q parameter Key: SOLR-435 URL: https://issues.apache.org/jira/browse/SOLR-435 Project: Solr Issue Type: Bug Components: search Affects Versions: 1.3 Reporter: Ryan McKinley Assignee: Ryan McKinley Fix For: 3.6, 4.0 Attachments: SOLR-435_q_defaults_to_all-docs.patch Each QParser should check if q exists or not. For some it will be required others not. currently it throws a null pointer: {code} java.lang.NullPointerException at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36) at org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104) at org.apache.solr.search.QParser.getQuery(QParser.java:80) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67) at org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150) ... {code} see: http://www.nabble.com/query-parsing-error-to14124285.html#a14140108 -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail: dev-h...@lucene.apache.org
[jira] [Updated] (LUCENENET-466) optimisation for the GermanStemmer.vb
[ https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christopher Currens updated LUCENENET-466: -- Attachment: DIN2Stemmer.patch Bjorn, I've made this patch from the src/contrib/Analyzers folder, on top of the DIN2 changes already committed to trunk. Since the extent of my German is danke!, I was hoping you could see if this stemmer is working properly before I commit it to trunk. These were the test cases I made that should hopefully emulate the results of the normal DIN1 stemmer, where the word left of the semicolon is the word, and to the right, the result. {noformat} # Test cases for words with ae, ue, or oe in them Haus;hau Hauses;hau Haeuser;hau Haeusern;hau steuer;steur rueckwaerts;ruckwar geheimtuer;geheimtur {noformat} With the last word in particular, it produces fairly different results in each stemmer, though I think they are expected, due to the different DIN. Also, the DIN2 stemmer will also translate 'Häuser' and 'Häusern' properly (to hau), so there is support for both umlauts and the expanded 'ae', 'oe' and 'ue' forms. optimisation for the GermanStemmer.vb -- Key: LUCENENET-466 URL: https://issues.apache.org/jira/browse/LUCENENET-466 Project: Lucene.Net Issue Type: Improvement Components: Lucene.Net Contrib Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3 Reporter: Prescott Nasser Priority: Minor Fix For: Lucene.Net 3.0.3 Attachments: DIN2Stemmer.patch I have a little optimisation for the GermanStemmer.vb (in Contrib.Analyzers) class. At the moment the function Substitute converts the german Umlaute ä in a, ö ino and ü in u. This is not the correct german translation. They must be converted to ae, oe and ue. So I can write the name Björn or Bjoern but not Bjorn. With this optimization a user can search for Björn and also find Bjoern. Here is the optimized code snippet: else if ( buffer[c] == 'ä' ) { buffer[c] = 'a'; buffer.Insert(c + 1, 'e'); } else if ( buffer[c] == 'ö' ) { buffer[c] = 'o'; buffer.Insert(c + 1,'e'); } else if ( buffer[c] == 'ü' ) { buffer[c] = 'u'; buffer.Insert(c + 1,'e'); } Thank You Björn -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa For more information on JIRA, see: http://www.atlassian.com/software/jira
Re: [JENKINS] Solr-3.x - Build # 642 - Failure
I took care of the @Overrides.

On Mon, Mar 26, 2012 at 12:55 PM, Uwe Schindler u...@thetaphi.de wrote:

Christian: Could it be that you simply merged from trunk, which did an svn copy? In that case you did not really merge the changes into the new files in 3.x; your commit removed the old files and replaced them with copies of the trunk ones. @Override on interface methods is not compatible with Java 5, so you should maybe test-build with Java 5.

Yesterday I did something similar (I moved a file around in trunk) and wanted to backport that change. This is very risky when done by merge, as the merge removes the old file and adds the new one instead of renaming. What I did in the end: I renamed the files in 3.x by hand and then did a no-op merge to record the merge properties.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-Original Message-
From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
Sent: Monday, March 26, 2012 6:52 PM
To: dev@lucene.apache.org
Subject: [JENKINS] Solr-3.x - Build # 642 - Failure

Build: https://builds.apache.org/job/Solr-3.x/642/

No tests ran.

Build Log (for compile errors):
[...truncated 7071 lines...]

jar-analyzers-common:
common.init:
compile-lucene-core:
jflex-uptodate-check:
jflex-notice:
javacc-uptodate-check:
javacc-notice:
init:
clover.setup:
clover.info:
[echo] Clover not found. Code coverage reports disabled.
clover:
common.compile-core:
compile-core:
    [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
    [javac] Compiling 26 source files to /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
    [javac] /usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:153: method does not override a method from its superclass
    [javac]   @Override
    [javac]   ^
    [javac] (the same "method does not override a method from its superclass" error repeats for the remaining @Override annotations in BinaryDictionary.java and in UserDictionary.java)
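For readers hitting this for the first time: under javac 1.5, @Override is only legal on methods that override a superclass method, not on implementations of interface methods; javac 1.6 relaxed this. A minimal illustration, with made-up method names that echo the failing files:

{code}
interface Dictionary {
  int lookup(String surfaceForm);
}

class BinaryDictionary implements Dictionary {
  @Override // accepted by javac 1.6+, but rejected by javac 1.5 with
            // "method does not override a method from its superclass"
  public int lookup(String surfaceForm) {
    return 0;
  }
}
{code}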
Re: Indexing Boolean Expressions
In full disclosure, there is a patent application that Yahoo! has filed covering the use of inverted indexes with complex predicates for matching contracts and opportunities in advertising: http://www.google.com/patents/US20110016109?printsec=abstract#v=onepageqf=false

However, I believe there are many more applications that can benefit from similar matching techniques (e.g. recommender systems, e-commerce, recruiting, etc.), making it worthwhile to implement the ideas exposed in the original VLDB'09 paper (which is public) in Lucene. As a Yahoo! employee, I might not be able to directly contribute to this project, but I will be happy to point to any publicly available material that can help.

Cheers,

-- Joaquin

On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:

Hello Joaquin, I looked through the paper several times, and see no problem to implement it in Lucene (the trivial case at least): Let's index conjunctive condition as {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3} then, form query from the incoming fact (event): fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD to enforce overlap between condition and event, wrap the query above into own query whose scorer will check that numClauses for the matched doc is equal to number of matched clauses. To get numClauses for the matched doc you can use FieldCache that's damn fast; and number of matched clauses can be obtained from DisjunctionSumScorer.nrMatchers() Negative clauses, and multivalue can be covered also, I believe. WDYT?
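Since this scheme comes up repeatedly in the thread, here is a minimal, self-contained sketch of it, assuming Lucene 3.x-era APIs. The field names, the 'id' field, and the verification step are my illustrative stand-ins: the quoted proposal does the clause-count check inside a custom scorer (FieldCache plus DisjunctionSumScorer.nrMatchers()), whereas this sketch re-verifies each candidate with a second query via BooleanQuery.setMinimumNumberShouldMatch, which is simpler but slower.

{code}
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.RAMDirectory;
import org.apache.lucene.util.Version;

public class BooleanExpressionMatchSketch {
  public static void main(String[] args) throws Exception {
    RAMDirectory dir = new RAMDirectory();
    IndexWriter w = new IndexWriter(dir,
        new IndexWriterConfig(Version.LUCENE_35, new KeywordAnalyzer()));

    // Index one conjunctive condition: state:CA AND gender:F, numClauses=2.
    Document cond = new Document();
    cond.add(new Field("id", "cond-1", Field.Store.YES, Field.Index.NOT_ANALYZED));
    cond.add(new Field("state", "CA", Field.Store.NO, Field.Index.NOT_ANALYZED));
    cond.add(new Field("gender", "F", Field.Store.NO, Field.Index.NOT_ANALYZED));
    cond.add(new Field("numClauses", "2", Field.Store.YES, Field.Index.NOT_ANALYZED));
    w.addDocument(cond);
    w.close();

    // The incoming event: an assignment of values to fields.
    Map<String, String> event = new LinkedHashMap<String, String>();
    event.put("state", "CA");
    event.put("gender", "F");
    event.put("age", "30");

    // Step 1: the disjunction of the event's attribute/value pairs finds
    // every condition that shares at least one clause with the event.
    BooleanQuery disjunction = new BooleanQuery();
    for (Map.Entry<String, String> e : event.entrySet()) {
      disjunction.add(new TermQuery(new Term(e.getKey(), e.getValue())),
                      BooleanClause.Occur.SHOULD);
    }

    IndexSearcher searcher = new IndexSearcher(IndexReader.open(dir));

    // Step 2: a candidate truly matches only if ALL of its clauses are
    // satisfied, i.e. the number of matched clauses equals numClauses.
    for (ScoreDoc sd : searcher.search(disjunction, 10).scoreDocs) {
      Document d = searcher.doc(sd.doc);
      int numClauses = Integer.parseInt(d.get("numClauses"));
      BooleanQuery verify = new BooleanQuery();
      verify.add(new TermQuery(new Term("id", d.get("id"))), BooleanClause.Occur.MUST);
      for (Map.Entry<String, String> e : event.entrySet()) {
        verify.add(new TermQuery(new Term(e.getKey(), e.getValue())),
                   BooleanClause.Occur.SHOULD);
      }
      verify.setMinimumNumberShouldMatch(numClauses); // all clauses must hit
      if (searcher.search(verify, 1).totalHits > 0) {
        System.out.println("condition " + d.get("id") + " matches the event");
      }
    }
    searcher.close();
  }
}
{code}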
[jira] [Commented] (SOLR-3231) Add the ability to KStemmer to preserve the original token when stemming
[ https://issues.apache.org/jira/browse/SOLR-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238564#comment-13238564 ] Jamie Johnson commented on SOLR-3231:

Thanks Robert. I just read LUCENE-3415 and understand the approach. My biggest issue is that I don't like having to create a separate field to do an exact search; this is of course based on the fact that I was burned by it, so perhaps I am biased. From an API user's perspective, the least destructive behavior feels like the right default, but again I have a specific use case in mind and am not considering all other implications.

Add the ability to KStemmer to preserve the original token when stemming
-------------------------------------------------------------------------

Key: SOLR-3231
URL: https://issues.apache.org/jira/browse/SOLR-3231
Project: Solr
Issue Type: Improvement
Components: Schema and Analysis
Reporter: Jamie Johnson
Fix For: 4.0
Attachments: KStemFilter.patch

While using the PorterStemmer, I found that it was often far too aggressive in its stemming. In my particular case it is unrealistic to provide a protected word list which captures all possible words that should not be stemmed. To avoid this I proposed a solution whereby we store the original token as well as the stemmed token, so exact searches would always work. Based on discussions on the mailing list with Ahmet Arslan, I believe the attached patch to KStemmer provides the desired capability through a configuration parameter. This is largely a copy of org.apache.lucene.wordnet.SynonymTokenFilter.
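For anyone skimming, here is a minimal sketch of the technique the patch borrows from SynonymTokenFilter: keep the original token and stack the stemmed form at the same position (positionIncrement = 0). This is not the attached patch; the class name is made up and stem() is a toy stand-in for the actual KStem call.

{code}
import java.io.IOException;
import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class PreserveOriginalStemFilter extends TokenFilter {
  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);
  private State savedState;     // attributes of the original token
  private String pendingStem;   // stem waiting to be emitted

  public PreserveOriginalStemFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (pendingStem != null) {
      restoreState(savedState);               // same offsets as the original
      termAtt.setEmpty().append(pendingStem);
      posIncAtt.setPositionIncrement(0);      // stacked on the original token
      pendingStem = null;
      return true;
    }
    if (!input.incrementToken()) {
      return false;
    }
    String original = termAtt.toString();
    String stemmed = stem(original);
    if (!stemmed.equals(original)) {
      savedState = captureState();            // emit the stem on the next call
      pendingStem = stemmed;
    }
    return true;                              // original passes through unchanged
  }

  private String stem(String s) {
    // Toy stand-in for KStem, just so the class compiles and runs.
    return s.endsWith("s") ? s.substring(0, s.length() - 1) : s;
  }
}
{code}

With this arrangement an exact (unstemmed) phrase or term query still hits the original token, while stemmed queries hit the stacked form, which is the behavior the comment above is asking for.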
Re: Indexing Boolean Expressions
BTW, the idea of indexing Boolean expressions inside a text indexing engine is not new. For example, Oracle Text provides the CTXRULE index and the MATCHES operator within their indexing stack, primarily used for rule-based text classification. See:

http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8
http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013

-- J
Re: Indexing Boolean Expressions
Efficient rule matching goes further back, at least to alerting in Verity K2.

wunder
Search Guy, Chegg
copyField and precedence with dynamic fields
This seems like it should be a JIRA; on a quick look I couldn't find an existing issue for it. From a client. Here's a fragment of a schema file:

<fields>
  <field name="id" type="string" indexed="true" stored="true" required="true" />
  <field name="title_text" type="text_general" indexed="true" stored="true" multiValued="false" />
  <field name="title_phonetic" type="phonetic" indexed="true" stored="true" multiValued="false" />
  <dynamicField name="*_text" type="text_general" indexed="true" stored="false" />
  <dynamicField name="*_phonetic" type="phonetic" indexed="true" stored="false" />
</fields>

<copyField source="*_text" dest="*_phonetic" />

Here's an input doc:

<add>
  <doc>
    <field name="id">ID1</field>
    <field name="title_text">1st Document</field>
    <field name="description_text">Another field</field>
  </doc>
</add>

OK, add the doc with the above schema, and do a q=*:*&fl=* query. The response does NOT contain title_phonetic. It looks like IndexSchema.registerCopyField won't notice that title_phonetic is a non-dynamic field, and so never creates a title_text -> title_phonetic mapping.

Is this a JIRA, or intended, or just not worth fixing?
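A workaround sketch while this is undecided (my suggestion, not from the thread): declare the copy explicitly for the concrete field, which should sidestep the dynamic-destination resolution entirely, since explicit copyField targets are registered directly:

{code}
<copyField source="title_text" dest="title_phonetic" />
{code}

With that line present, title_phonetic should be populated for this document regardless of how the wildcard rule resolves.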