[jira] [Commented] (LUCENENET-481) Port Contrib.MemoryIndex

2012-03-26 Thread Christopher Currens (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENENET-481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238661#comment-13238661
 ] 

Christopher Currens commented on LUCENENET-481:
---

If you're talking about the termComparator, that wasn't made generic until 3.1. 
The comparator in 3.0.3 can't be ported as-is anyway because of Java's 
type system, but I just want to make sure you're porting 3.0.3, to keep 
everything in line with the rest of the .NET versions.  You'll find that the 
3.x version in Java uses a few other additions to the main Lucene library that 
aren't yet available in 3.0.3.

This problem should be easily solved without reflection.  The comparator used 
basically requires that it be a {{KeyValuePair<TKey, TValue>}}, or more 
specifically, a {{KeyValuePair<string, TValue>}}.  There are actually only 2 
different types that use that termComparator: {{KeyValuePair<string, 
ArrayIntList>[]}} and {{KeyValuePair<string, Info>[]}}.  An exception to that is 
the {{private static sort(Dictionary<K, V>)}} method, but that can be solved 
with a static method, a type constraint (which is already implied in the Java 
version) and some type inference (as a nicety).  I had ported most of this at 
one point (somewhere on my home computer), and if memory serves me correctly, 
this is how I solved the problem.

You can use this if you want:

{code}
class KvpComparer
{
    public static int Compare<TKey, TValue>(KeyValuePair<TKey, TValue> x,
                                            KeyValuePair<TKey, TValue> y)
        where TKey : IComparable<TKey>
    {
        if (x.Equals(y)) return 0;
        return x.Key.CompareTo(y.Key);
    }
}

sealed class KvpComparer<T> : KvpComparer, IComparer<KeyValuePair<string, T>>
{
    public int Compare(KeyValuePair<string, T> x, KeyValuePair<string, T> y)
    {
        // Delegate to the static generic comparison on the base class.
        return KvpComparer.Compare(x, y);
    }
}
{code}

You can create the two instances you need for the {{<string, Info>}} and 
{{<string, ArrayIntList>}} types.  For the {{Map.Entry<K, V>[] sort(HashMap<K, V> 
map)}} method, constrain {{K}} to {{IComparable<K>}}, and then you can use it 
like {{Array.Sort(entries, KvpComparer.Compare)}}, which is nice because it's 
one less object (or more) you need to create for each type passed into sort.  
Alternatively, since the {{sort}} method is private and only uses those two 
types, you can just change the signature and pass in one of the comparers 
instead, removing the base class from the equation.

 Port Contrib.MemoryIndex
 

 Key: LUCENENET-481
 URL: https://issues.apache.org/jira/browse/LUCENENET-481
 Project: Lucene.Net
  Issue Type: New Feature
Affects Versions: Lucene.Net 3.0.3
Reporter: Christopher Currens

 We need to port MemoryIndex from contrib, if we want to be able to port a few 
 other contrib libraries.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




Re: Indexing Boolean Expressions

2012-03-26 Thread Mikhail Khludnev
Hello Joaquin,

I looked through the paper several times, and see no problem to implement
it in Lucene (the trivial case at least):

Let's index conjunctive condition as
 {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}

then, form query from the incoming fact (event):
fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD

to enforce overlap between condition and event, wrap the query above into
own query whose scorer will check that numClauses for the matched doc is
equal to number of matched clauses.
To get numClauses for the matched doc you can use FieldCache that's damn
fast; and number of matched clauses can be obtained from
DisjunctionSumScorer.nrMatchers()

Negative clauses, and multivalue can be covered also, I believe.

WDYT?
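The counting scheme above can be sketched without Lucene at all. The following is an illustrative Java sketch (class and method names are my own, not Lucene APIs): a stored conjunctive condition matches an incoming event if and only if the number of overlapping clauses equals its numClauses, which here is simply the condition's size.

```java
import java.util.HashMap;
import java.util.Map;

public class ConjunctionMatch {
    /**
     * A stored condition is a set of field->value clauses; numClauses is
     * implicitly condition.size(). An event is an assignment of values to
     * fields. This mirrors the disjunctive query plus the wrapping scorer's
     * matched-clause-count check.
     */
    static boolean matches(Map<String, String> condition, Map<String, String> event) {
        int matched = 0;
        // Count how many of the condition's clauses the event satisfies
        // (the analogue of DisjunctionSumScorer.nrMatchers()).
        for (Map.Entry<String, String> clause : condition.entrySet()) {
            if (clause.getValue().equals(event.get(clause.getKey()))) {
                matched++;
            }
        }
        // The wrapping scorer's check: matched clauses == numClauses.
        return matched == condition.size();
    }

    public static void main(String[] args) {
        Map<String, String> condition = new HashMap<>();
        condition.put("fieldA", "valA");
        condition.put("fieldB", "valB");
        condition.put("fieldC", "valC");

        Map<String, String> event = new HashMap<>(condition);
        event.put("fieldD", "valD"); // extra event fields don't hurt
        System.out.println(matches(condition, event)); // true

        event.remove("fieldB"); // one clause unmatched -> no match
        System.out.println(matches(condition, event)); // false
    }
}
```

In Lucene itself the count would come from the scorer and numClauses from the FieldCache, but the invariant being checked is exactly this one.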

On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado joaquin.delg...@gmail.com wrote:

 I looked at LUCENE-2987 and its work on the query side (changes to the
 accepted syntax to accept lower case 'or' and 'and'), which isn't really
 related to my proposal.

 What I'm proposing is to be able to index complex boolean expressions
 using Lucene. This can be viewed as the opposite of the regular search
 task. The objective here is find a set of relevant queries given a document
 (assignment of values to fields).

 This by itself may not sound that interesting, but it's a key piece
 to efficiently implementing any MATCHING system, which is effectively a
 two-way search where constraints are defined both ways. An example of this
 would be:

 1) Job matching: Potential employers define their job posting as a
 document along with complex boolean expressions used to narrow potential
 candidates. Job searchers upload their profiles and may formulate complex
 queries when executing a search. Once a search is initiated from either
 side, constraints need to be satisfied both ways.
 2) Advertising: Publishers define constraints on the type of
 advertisers/ads they are willing to show on their sites. On the other hand,
 advertisers define constraints (typically at the campaign level) on the
 publisher sites they want their ads to show on, as well as on the user
 audiences they are targeting. While some attribute values are known at
 definition time, others are only instantiated once the user visits a given
 page, which triggers a matching request that must be satisfied in a
 few milliseconds to select valid ads, which are then scored based on relevance.

 So in a matching system, a MATCH QUERY is considered to be a tuple that
 consists of a value assignment to attributes/fields (doc) + a boolean
 expression (query) that goes against a double index, also built on tuples
 that simultaneously contain boolean expressions and associated documents.

 To do this efficiently we need to be able to build indexes on Boolean
 expressions (Lucene queries) and retrieve the set of matching expressions
 given a doc (typically a few attributes with values assigned), which is the
 core of what is described in this paper: Indexing Boolean Expressions
 (see http://www.vldb.org/pvldb/2/vldb09-83.pdf)

 -- J



 On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera calcmaste...@gmail.com wrote:

  On 02/21/2012 12:15 PM, Aayush Kothari wrote:




  So if Aayush Kothari is interested in working on this as a Student,
 all we need is a formal mentor (I can be the informal one).

  Anyone up for the task?


   Completely interested in working for and learning about the
 aforementioned subject/project. +1.

 This may be related to the work I'm doing with LUCENE-2987
 Basically changing the grammar to accept the conjunctions AND and OR in the
 query text.
 I would be interested in working with you on some of the details.

 However, I too am not a formal committer.

 --
 Joe Cabrera, eminorlabs.com





-- 
Sincerely yours
Mikhail Khludnev
Lucid Certified
Apache Lucene/Solr Developer
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com


[jira] [Updated] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary

2012-03-26 Thread Koji Sekiguchi (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-3888:
---

Fix Version/s: (was: 3.6)

Thanks, Robert, for the patches and comments.

{quote}
The only option for 3.6 would be something like my previous patch
(https://issues.apache.org/jira/secure/attachment/12519860/LUCENE-3888.patch) 
which
has the disadvantages of doing the second-phase re-ranking on surface forms.
{quote}

Given those disadvantages, the spell checker won't work well for Japanese 
anyway, so I'm giving up on this for 3.6.

 split off the spell check word and surface form in spell check dictionary
 -

 Key: LUCENE-3888
 URL: https://issues.apache.org/jira/browse/LUCENE-3888
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, 
 LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch


 The "did you mean?" feature using Lucene's spell checker unfortunately cannot 
 work well for Japanese, and this is a longstanding problem: the logic needs 
 comparatively long text to check spelling, but in some languages (e.g. 
 Japanese) most words are too short for the spell checker.
 I think that, at least for Japanese, things can be improved if we split off 
 the spell check word and the surface form in the spell check dictionary. Then 
 we can use ReadingAttribute for spell checking but CharTermAttribute for 
 suggesting, for example.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3174) Visualize Cluster State

2012-03-26 Thread Stefan Matheis (steffkes) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3174:


Attachment: SOLR-3174.patch

 Visualize Cluster State
 ---

 Key: SOLR-3174
 URL: https://issues.apache.org/jira/browse/SOLR-3174
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Ryan McKinley
 Attachments: SOLR-3174-graph.png, SOLR-3174-rgraph.png, 
 SOLR-3174.patch, SOLR-3174.patch


 It would be great to visualize the cluster state in the new UI. 
 See Mark's wish:
 https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3174) Visualize Cluster State

2012-03-26 Thread Stefan Matheis (steffkes) (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stefan Matheis (steffkes) updated SOLR-3174:


Attachment: SOLR-3174-rgraph.png
SOLR-3174-graph.png

 Visualize Cluster State
 ---

 Key: SOLR-3174
 URL: https://issues.apache.org/jira/browse/SOLR-3174
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Ryan McKinley
 Attachments: SOLR-3174-graph.png, SOLR-3174-graph.png, 
 SOLR-3174-rgraph.png, SOLR-3174-rgraph.png, SOLR-3174.patch, SOLR-3174.patch


 It would be great to visualize the cluster state in the new UI. 
 See Mark's wish:
 https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-03-26 Thread Kazuaki Hiraga (Created) (JIRA)
Add decompose compound Japanese Katakana token capability to Kuromoji
-

 Key: LUCENE-3921
 URL: https://issues.apache.org/jira/browse/LUCENE-3921
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
 Environment: CentOS 5, IPA Dictionary
Reporter: Kazuaki Hiraga


Kuromoji, the Japanese morphological analyzer, doesn't have the capability to 
decompose every Japanese Katakana compound token into sub-tokens. It seems that 
some Katakana tokens can be decomposed, but this cannot be applied to every 
Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ 
(shoulder bag) don't decompose into トート バッグ and ショルダー バッグ, although the 
IPA dictionary has バッグ in its entries.  I would like to apply the decompose 
feature to every Katakana token whose sub-tokens are in the dictionary, or add 
the capability to force the decompose feature on every Katakana token.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-03-26 Thread Kazuaki Hiraga (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kazuaki Hiraga updated LUCENE-3921:
---

Environment: CentOS 5, IPA Dictionary, Run with Search mode  (was: CentOS 
5, IPA Dictionary)

 Add decompose compound Japanese Katakana token capability to Kuromoji
 -

 Key: LUCENE-3921
 URL: https://issues.apache.org/jira/browse/LUCENE-3921
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
 Environment: CentOS 5, IPA Dictionary, Run with Search mode
Reporter: Kazuaki Hiraga
  Labels: features

 Kuromoji, the Japanese morphological analyzer, doesn't have the capability to 
 decompose every Japanese Katakana compound token into sub-tokens. It seems 
 that some Katakana tokens can be decomposed, but this cannot be applied to 
 every Katakana compound token. For instance, トートバッグ (tote bag) and ショルダーバッグ 
 (shoulder bag) don't decompose into トート バッグ and ショルダー バッグ, although the 
 IPA dictionary has バッグ in its entries.  I would like to apply the decompose 
 feature to every Katakana token whose sub-tokens are in the dictionary, or add 
 the capability to force the decompose feature on every Katakana token.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3174) Visualize Cluster State

2012-03-26 Thread Stefan Matheis (steffkes) (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238199#comment-13238199
 ] 

Stefan Matheis (steffkes) commented on SOLR-3174:
-

Updated the patch and the screenshots. The radial view is now working as 
expected. Also improved the displayed hostname: if all hosts share the same 
protocol, it is skipped; the same goes for ports and directories.

 Visualize Cluster State
 ---

 Key: SOLR-3174
 URL: https://issues.apache.org/jira/browse/SOLR-3174
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Reporter: Ryan McKinley
 Attachments: SOLR-3174-graph.png, SOLR-3174-graph.png, 
 SOLR-3174-rgraph.png, SOLR-3174-rgraph.png, SOLR-3174.patch, SOLR-3174.patch


 It would be great to visualize the cluster state in the new UI. 
 See Mark's wish:
 https://issues.apache.org/jira/browse/SOLR-3162?focusedCommentId=13218272&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13218272

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect

2012-03-26 Thread Dawid Weiss (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238203#comment-13238203
 ] 

Dawid Weiss commented on LUCENE-3867:
-

For the historical record: the previous implementation of RamUsageEstimator was 
off by anything between 3% (random-size objects, including arrays) and 20% 
(objects smaller than 80 bytes). Again -- these are perfect-scenario 
measurements, with an empty heap and maximum allocation until OOM, with a 
serial GC. With concurrent and parallel GCs the memory consumption estimate is 
still accurate, but it's nearly impossible to tell when an OOM will occur or 
how the GC will manage the heap space.

 RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
 --

 Key: LUCENE-3867
 URL: https://issues.apache.org/jira/browse/LUCENE-3867
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch


 RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: 
 NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The 
 NUM_BYTES_OBJECT_REF part should not be included, at least not according to 
 this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
 {quote}
 A single-dimension array is a single object. As expected, the array has the 
 usual object header. However, this object header is 12 bytes to accommodate a 
 four-byte array length. Then comes the actual array data which, as you might 
 expect, consists of the number of elements multiplied by the number of bytes 
 required for one element, depending on its type. The memory usage for one 
 element is 4 bytes for an object reference ...
 {quote}
 While I was at it, I wrote a sizeOf(String) impl, and I wonder how people feel 
 about including such helper methods in RUE as static, stateless methods. 
 It's not perfect; there's some room for improvement, I'm sure. Here it is:
 {code}
 /**
  * Computes the approximate size of a String object. Note that if this object
  * is also referenced by another object, you should add
  * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this method.
  */
 public static int sizeOf(String str) {
   return 2 * str.length() + 6                     // chars + safety for array alignment
       + 3 * RamUsageEstimator.NUM_BYTES_INT       // String maintains 3 integers
       + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER  // char[] array header
       + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object header
 }
 {code}
 If people are not against it, I'd like to also add sizeOf(int[] / byte[] / 
 long[] / double[] ... and String[]).
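A hedged sketch of what those primitive-array helpers might look like, following the constants used in the snippet above. The constant values below are placeholders of my own choosing (real header and reference sizes vary by JVM and compressed-oops settings), so only the shape of the arithmetic is the point:

```java
/**
 * Illustrative sketch of the proposed sizeOf(primitive[]) helpers.
 * Constant values are assumed for the example; the real ones would
 * come from RamUsageEstimator.
 */
public class ArraySize {
    static final int NUM_BYTES_ARRAY_HEADER = 16; // assumed value
    static final int NUM_BYTES_OBJECT_REF = 4;    // assumed value

    /** Header plus 4 bytes per int element. */
    public static long sizeOf(int[] arr) {
        return NUM_BYTES_ARRAY_HEADER + 4L * arr.length;
    }

    /** Header plus 8 bytes per long element. */
    public static long sizeOf(long[] arr) {
        return NUM_BYTES_ARRAY_HEADER + 8L * arr.length;
    }

    /**
     * String[] is just references at the array level; each String's own
     * size (per sizeOf(String)) would be added on top by the caller.
     */
    public static long shallowSizeOf(String[] arr) {
        return NUM_BYTES_ARRAY_HEADER + (long) NUM_BYTES_OBJECT_REF * arr.length;
    }

    public static void main(String[] args) {
        System.out.println(sizeOf(new int[10])); // 16 + 4*10 = 56
    }
}
```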

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect

2012-03-26 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238206#comment-13238206
 ] 

Uwe Schindler commented on LUCENE-3867:
---

That's true. But you can still get the unreleasable allocation, i.e. the size 
of the non-GC-able object graph. If the GC does not free the objects fast 
enough after release, it will still do so once memory gets low. But the 
allocated objects with hard references are not releasable.

So I think it's fine for memory-requirement purposes. If you want real heap 
allocation, you must use instrumentation.

 RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
 --

 Key: LUCENE-3867
 URL: https://issues.apache.org/jira/browse/LUCENE-3867
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch


 RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: 
 NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The 
 NUM_BYTES_OBJECT_REF part should not be included, at least not according to 
 this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
 {quote}
 A single-dimension array is a single object. As expected, the array has the 
 usual object header. However, this object header is 12 bytes to accommodate a 
 four-byte array length. Then comes the actual array data which, as you might 
 expect, consists of the number of elements multiplied by the number of bytes 
 required for one element, depending on its type. The memory usage for one 
 element is 4 bytes for an object reference ...
 {quote}
 While I was at it, I wrote a sizeOf(String) impl, and I wonder how people feel 
 about including such helper methods in RUE as static, stateless methods. 
 It's not perfect; there's some room for improvement, I'm sure. Here it is:
 {code}
 /**
  * Computes the approximate size of a String object. Note that if this object
  * is also referenced by another object, you should add
  * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this method.
  */
 public static int sizeOf(String str) {
   return 2 * str.length() + 6                     // chars + safety for array alignment
       + 3 * RamUsageEstimator.NUM_BYTES_INT       // String maintains 3 integers
       + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER  // char[] array header
       + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object header
 }
 {code}
 If people are not against it, I'd like to also add sizeOf(int[] / byte[] / 
 long[] / double[] ... and String[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3867) RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect

2012-03-26 Thread Dawid Weiss (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238208#comment-13238208
 ] 

Dawid Weiss commented on LUCENE-3867:
-

I didn't say it's wrong -- it is fine and accurate. What I'm saying is that 
it's not really suitable for predictions, i.e. for answering questions like: 
how many objects of a given type (or types) can I allocate before an OOM hits 
me? That this isn't possible doesn't really surprise me, but it would be nice. 
For measuring already-allocated stuff it's more than fine, of course.

 RamUsageEstimator.NUM_BYTES_ARRAY_HEADER and other constants are incorrect
 --

 Key: LUCENE-3867
 URL: https://issues.apache.org/jira/browse/LUCENE-3867
 Project: Lucene - Java
  Issue Type: Bug
  Components: core/index
Reporter: Shai Erera
Assignee: Uwe Schindler
Priority: Trivial
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3867-3.x.patch, LUCENE-3867-compressedOops.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch, 
 LUCENE-3867.patch, LUCENE-3867.patch, LUCENE-3867.patch


 RamUsageEstimator.NUM_BYTES_ARRAY_HEADER is computed like this: 
 NUM_BYTES_OBJECT_HEADER + NUM_BYTES_INT + NUM_BYTES_OBJECT_REF. The 
 NUM_BYTES_OBJECT_REF part should not be included, at least not according to 
 this page: http://www.javamex.com/tutorials/memory/array_memory_usage.shtml
 {quote}
 A single-dimension array is a single object. As expected, the array has the 
 usual object header. However, this object header is 12 bytes to accommodate a 
 four-byte array length. Then comes the actual array data which, as you might 
 expect, consists of the number of elements multiplied by the number of bytes 
 required for one element, depending on its type. The memory usage for one 
 element is 4 bytes for an object reference ...
 {quote}
 While I was at it, I wrote a sizeOf(String) impl, and I wonder how people feel 
 about including such helper methods in RUE as static, stateless methods. 
 It's not perfect; there's some room for improvement, I'm sure. Here it is:
 {code}
 /**
  * Computes the approximate size of a String object. Note that if this object
  * is also referenced by another object, you should add
  * {@link RamUsageEstimator#NUM_BYTES_OBJECT_REF} to the result of this method.
  */
 public static int sizeOf(String str) {
   return 2 * str.length() + 6                     // chars + safety for array alignment
       + 3 * RamUsageEstimator.NUM_BYTES_INT       // String maintains 3 integers
       + RamUsageEstimator.NUM_BYTES_ARRAY_HEADER  // char[] array header
       + RamUsageEstimator.NUM_BYTES_OBJECT_HEADER; // String object header
 }
 {code}
 If people are not against it, I'd like to also add sizeOf(int[] / byte[] / 
 long[] / double[] ... and String[]).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (SOLR-3272) Solr filter factory for MorfologikFilter

2012-03-26 Thread Rafał Kuć (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rafał Kuć updated SOLR-3272:


Attachment: SOLR-3272.patch

Patch with MorfologikFilterFactory and test added.

 Solr filter factory for MorfologikFilter
 

 Key: SOLR-3272
 URL: https://issues.apache.org/jira/browse/SOLR-3272
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Rafał Kuć
 Fix For: 4.0

 Attachments: SOLR-3272.patch


 I didn't find a MorfologikFilter factory in Solr, so here is a simple one. 
 Maybe someone will make use of it :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3272) Solr filter factory for MorfologikFilter

2012-03-26 Thread Rafał Kuć (Created) (JIRA)
Solr filter factory for MorfologikFilter


 Key: SOLR-3272
 URL: https://issues.apache.org/jira/browse/SOLR-3272
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Rafał Kuć
 Fix For: 4.0
 Attachments: SOLR-3272.patch

I didn't find a MorfologikFilter factory in Solr, so here is a simple one. 
Maybe someone will make use of it :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3076) Solr should support block joins

2012-03-26 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3076?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238239#comment-13238239
 ] 

Michael McCandless commented on SOLR-3076:
--

{quote}
2. Do you agree with the overall approach of delivering a straightforward QP 
with explicit joining syntax? Or do you object and insist on an 
entity-relationship-schema approach?

3. What is the level of uncertainty you have about the current QP syntax? 
What is your main concern, and what is the way to improve it?
{quote}

Well, stepping back, my concern is still that I don't think there
should be any QP syntax to express block joins.  These are joins
determined at indexing time and compiled into the index, so the
only remaining query-time freedom is which fields you want to search
against (something the QP can already understand, ie field:text syntax).
From that field list the required joins are implied.

I can't imagine users learning/typing the sort of syntax we are
discussing here.

It's true there are exceptional cases (Hoss's size field that's on
both parent and child docs), but, that's the exception not the rule; I
don't think we should design things (APIs, QP syntax) around exceptional
cases.  And, I think such an exception should be
handled by some sort of field aliasing (book_page_count vs
chapter_page_count).

For query-time join, which is fully flexible, I agree the QP must (and
already does) include join syntax, ie be more like SQL, where you can
express arbitrary on-the-fly joins.

But, at the same time, the 'users' of Solr's QP syntax may not be the
end user; ie, the app's front end may very well construct these
complex join expressions, so it's really the developers of that
search app writing these join queries.  So perhaps it's fine to add
crazy-expert syntax that end users would rarely use but search app
developers might...?

All this being said, I defer to Hoss (and other committers more
experienced w/ Solr QP issues) here... if they all feel this added QP
syntax makes sense then let's do it!


 Solr should support block joins
 ---

 Key: SOLR-3076
 URL: https://issues.apache.org/jira/browse/SOLR-3076
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
 Attachments: SOLR-3076.patch, SOLR-3076.patch, SOLR-3076.patch, 
 SOLR-3076.patch, SOLR-3076.patch, bjq-vs-filters-backward-disi.patch, 
 bjq-vs-filters-illegal-state.patch, child-bjqparser.patch, 
 parent-bjq-qparser.patch, parent-bjq-qparser.patch, 
 solrconf-bjq-erschema-snippet.xml, tochild-bjq-filtered-search-fix.patch


 Lucene has the ability to do block joins; we should add it to Solr.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238253#comment-13238253
 ] 

Christian Moen commented on LUCENE-3921:


Hello, Kazu.  Long time no see -- I hope things are well!

This is a very good feature request.  I think this is possible by changing how we 
emit unknown words, i.e. by not emitting them as greedily and giving the 
lattice more segmentation options.  For example, if we find an unknown word 
トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position.  When we reach the position that starts with バッグ, 
we'll find a known word, and when the Viterbi runs, it's likely to choose トート 
and バッグ as the best path.

Let me have a look at this by looking into the lattice details.
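A rough sketch of the idea (hypothetical helper, not Kuromoji's actual API): instead of emitting only the greedy full match for an unknown run, emit every prefix as a lattice candidate so the Viterbi search can prefer known-word splits such as トート + バッグ:

```python
def unknown_word_candidates(text, pos, max_len=7):
    """Emit every prefix starting at `pos` (up to max_len chars) as a
    lattice candidate, rather than only the greedy longest match."""
    end = min(len(text), pos + max_len)
    return [text[pos:i] for i in range(pos + 1, end + 1)]

# For トートバッグ at position 0 this yields ト, トー, トート, トートバ,
# トートバッ, トートバッグ; the candidate トート can then connect to the
# known word バッグ starting at position 3.
```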

 Add decompose compound Japanese Katakana token capability to Kuromoji
 -

 Key: LUCENE-3921
 URL: https://issues.apache.org/jira/browse/LUCENE-3921
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
 Environment: CentOS 5, IPA Dictionary, Run with Search mode
Reporter: Kazuaki Hiraga
  Labels: features

 The Japanese morphological analyzer Kuromoji doesn't have the capability to 
 decompose every Japanese Katakana compound token into sub-tokens. It seems 
 that some Katakana tokens can be decomposed, but this cannot be applied to 
 every Katakana compound token. For instance, トートバッグ (tote bag) and 
 ショルダーバッグ (shoulder bag) don't decompose into トート バッグ and 
 ショルダー バッグ, although the IPA dictionary has バッグ in its entry.  I would 
 like to apply the decompose feature to every Katakana token whose sub-tokens 
 are in the dictionary, or add the capability to force-apply the decompose 
 feature to every Katakana token.




[jira] [Issue Comment Edited] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-03-26 Thread Christian Moen (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238253#comment-13238253
 ] 

Christian Moen edited comment on LUCENE-3921 at 3/26/12 10:44 AM:
--

Hello, Kazu.  Long time no see -- I hope things are well!

This is a very good feature request.  I think this is possible by changing how we 
emit unknown words, i.e. by not emitting them as greedily and giving the 
lattice more segmentation options.  For example, if we find an unknown word 
トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position.  When we reach the position that starts with バッグ, 
we'll find a known word, and when the Viterbi runs, it's likely to choose トート 
and バッグ as the best path.

Let me have a play by looking into the lattice details and see if something 
like this is feasible.

  was (Author: cm):
Hello, Kazu.  Long time no see -- I hope things are well!

This is a very good feature request.  I think this is possible by changing how we 
emit unknown words, i.e. by not emitting them as greedily and giving the 
lattice more segmentation options.  For example, if we find an unknown word 
トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position.  When we reach the position that starts with バッグ, 
we'll find a known word, and when the Viterbi runs, it's likely to choose トート 
and バッグ as the best path.

Let me have a look at this by looking into the lattice details.
  
 Add decompose compound Japanese Katakana token capability to Kuromoji
 -

 Key: LUCENE-3921
 URL: https://issues.apache.org/jira/browse/LUCENE-3921
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
 Environment: CentOS 5, IPA Dictionary, Run with Search mode
Reporter: Kazuaki Hiraga
  Labels: features

 The Japanese morphological analyzer Kuromoji doesn't have the capability to 
 decompose every Japanese Katakana compound token into sub-tokens. It seems 
 that some Katakana tokens can be decomposed, but this cannot be applied to 
 every Katakana compound token. For instance, トートバッグ (tote bag) and 
 ショルダーバッグ (shoulder bag) don't decompose into トート バッグ and 
 ショルダー バッグ, although the IPA dictionary has バッグ in its entry.  I would 
 like to apply the decompose feature to every Katakana token whose sub-tokens 
 are in the dictionary, or add the capability to force-apply the decompose 
 feature to every Katakana token.




[jira] [Issue Comment Edited] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-03-26 Thread Christian Moen (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238253#comment-13238253
 ] 

Christian Moen edited comment on LUCENE-3921 at 3/26/12 10:57 AM:
--

Hello, Kazu.  Long time no see -- I hope things are well!

This is a very good feature request.  I think this might be possible by changing 
how we emit unknown words, i.e. by not emitting them as greedily and giving the 
lattice more segmentation options.  For example, if we find an unknown word 
トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position.  When we reach the position that starts with バッグ we'll 
find a known word.  When the Viterbi runs, it's likely to choose トート and バッグ as 
its best path.

Let me have a play by looking into the lattice details and see if something 
like this is feasible.  We are sort of hacking the model here so we also need 
to consider side-effects.

  was (Author: cm):
Hello, Kazu.  Long time no see -- I hope things are well!

This is a very good feature request.  I think this is possible by changing how we 
emit unknown words, i.e. by not emitting them as greedily and giving the 
lattice more segmentation options.  For example, if we find an unknown word 
トートバッグ (by regular greedy matching), we can emit

{noformat}
ト
トー
トート
トートバ
トートバッ
トートバッグ
{noformat}

in the current position.  When we reach the position that starts with バッグ, 
we'll find a known word, and when the Viterbi runs, it's likely to choose トート 
and バッグ as the best path.

Let me have a play by looking into the lattice details and see if something 
like this is feasible.
  
 Add decompose compound Japanese Katakana token capability to Kuromoji
 -

 Key: LUCENE-3921
 URL: https://issues.apache.org/jira/browse/LUCENE-3921
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
 Environment: CentOS 5, IPA Dictionary, Run with Search mode
Reporter: Kazuaki Hiraga
  Labels: features

 The Japanese morphological analyzer Kuromoji doesn't have the capability to 
 decompose every Japanese Katakana compound token into sub-tokens. It seems 
 that some Katakana tokens can be decomposed, but this cannot be applied to 
 every Katakana compound token. For instance, トートバッグ (tote bag) and 
 ショルダーバッグ (shoulder bag) don't decompose into トート バッグ and 
 ショルダー バッグ, although the IPA dictionary has バッグ in its entry.  I would 
 like to apply the decompose feature to every Katakana token whose sub-tokens 
 are in the dictionary, or add the capability to force-apply the decompose 
 feature to every Katakana token.




[jira] [Created] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Per Steffensen (Created) (JIRA)
404 Not Found on action=PREPRECOVERY


 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen


We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
performance test setup where we performance-test our application (and therefore 
indirectly Solr(Cloud)). When we run the performance test against a setup using 
SolrCloud without replication, everything seems to run very nicely for days. 
When we add replication to the setup the same performance test shows some 
problems - which we will report (and maybe help fix) in distinct issues here in 
jira.

About the setup - the setup is a little more complex than described below, but 
I believe the description will tell enough:
We have two solr servers which we start from solr-install/example using this 
command (ZooKeepers have been started before) - we first start solr on server1, 
and then start solr on server2 after solr on server1 has finished starting up: 
<pre>
nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
-DzkHost=server1:2181,server2:2181,server3:2181 
-Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
-Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
-jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
</pre>
The ./myapp/solr.xml looks like this on server1:
<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
</pre>
The ./myapp/solr.xml looks like this on server2:
<pre>
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
</pre>

The first thing we observe is that Solr server1 (running collA_slice1_shard1) 
seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) 
is started up later it quickly reports the following in its solr.log and keeps 
doing that for a long time:
<pre>
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Not 
Found

request: 
http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
at org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
</pre>

Please note that we have changed the way errors are logged a little, but 
basically this means that Solr server2 gets a 404 Not Found from Solr server1 
on its request 
http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2

It seems there is no common agreement among the Solr servers on how/where 
to send those requests and how/where to listen for them.
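For what it's worth, the failing URL is presumably derived from the registered node name; a rough illustrative sketch (not Solr's actual code) of that mapping:

```python
from urllib.parse import urlencode

def prep_recovery_url(node_name, core):
    """Map a registered node name like "server1:8983_solr" to the
    core-admin URL another node would call for PREPRECOVERY."""
    host_port, _, context = node_name.partition("_")
    return (f"http://{host_port}/{context}/admin/cores?"
            + urlencode({"action": "PREPRECOVERY", "core": core}))

# If the two servers disagree on the webapp context or admin path,
# requests built this way will 404, as observed above.
```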

Regards, Per Steffensen




[jira] [Commented] (LUCENE-3909) Move Kuromoji to analysis.ja and introduce Japanese* naming

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238267#comment-13238267
 ] 

Christian Moen commented on LUCENE-3909:


Committed revision 1305297 to {{trunk}}.  Backporting to {{branch_3x}}.

 Move Kuromoji to analysis.ja and introduce Japanese* naming
 ---

 Key: LUCENE-3909
 URL: https://issues.apache.org/jira/browse/LUCENE-3909
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
Assignee: Christian Moen

 Lucene/Solr 3.6 and 4.0 will get out-of-the-box Japanese language support 
 through {{KuromojiAnalyzer}}, {{KuromojiTokenizer}} and various other 
 filters.  These filters currently live in 
 {{org.apache.lucene.analysis.kuromoji}}.
 I'm proposing that we move Kuromoji to a new Japanese package 
 {{org.apache.lucene.analysis.ja}} in line with how other languages are 
 organized.  As part of this, I also think we should rename 
 {{KuromojiAnalyzer}} to {{JapaneseAnalyzer}}, etc. to further align naming to 
 our conventions by making it very clear that these analyzers are for 
 Japanese.  (As much as I like the name Kuromoji, I think Japanese is more 
 fitting.)
 A potential issue I see with this that I'd like to raise and get feedback on, 
 is that end-users in Japan and elsewhere who use lucene-gosen could have 
 issues after an upgrade since lucene-gosen is in fact releasing its analyzers 
 under the {{org.apache.lucene.analysis.ja}} namespace (and we'd have a name 
 clash).
 I believe users should have the freedom to choose whichever Japanese 
 analyzer, filter, etc. they'd like to use, and I don't want to propose a name 
 change that just creates unnecessary problems for users, but I think the 
 naming proposed above is most fitting for a Lucene/Solr release.




[jira] [Commented] (LUCENE-3921) Add decompose compound Japanese Katakana token capability to Kuromoji

2012-03-26 Thread Kazuaki Hiraga (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238272#comment-13238272
 ] 

Kazuaki Hiraga commented on LUCENE-3921:


Hello, Christian. It's been a long time!

We really want to have that capability. As you may know, it's hard to deal with 
tokens that consist of two or three Katakana words. We want a good way to 
solve the issue more systematically, rather than making a hand-made 
dictionary.

Looking forward to hearing from you.


 Add decompose compound Japanese Katakana token capability to Kuromoji
 -

 Key: LUCENE-3921
 URL: https://issues.apache.org/jira/browse/LUCENE-3921
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 4.0
 Environment: CentOS 5, IPA Dictionary, Run with Search mode
Reporter: Kazuaki Hiraga
  Labels: features

 The Japanese morphological analyzer Kuromoji doesn't have the capability to 
 decompose every Japanese Katakana compound token into sub-tokens. It seems 
 that some Katakana tokens can be decomposed, but this cannot be applied to 
 every Katakana compound token. For instance, トートバッグ (tote bag) and 
 ショルダーバッグ (shoulder bag) don't decompose into トート バッグ and 
 ショルダー バッグ, although the IPA dictionary has バッグ in its entry.  I would 
 like to apply the decompose feature to every Katakana token whose sub-tokens 
 are in the dictionary, or add the capability to force-apply the decompose 
 feature to every Katakana token.




[jira] [Updated] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Per Steffensen (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Per Steffensen updated SOLR-3273:
-

Description: 
We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
performance test setup where we performance-test our application (and therefore 
indirectly Solr(Cloud)). When we run the performance test against a setup using 
SolrCloud without replication, everything seems to run very nicely for days. 
When we add replication to the setup the same performance test shows some 
problems - which we will report (and maybe help fix) in distinct issues here in 
jira.

About the setup - the setup is a little more complex than described below, but 
I believe the description will tell enough:
We have two solr servers which we start from solr-install/example using this 
command (ZooKeepers have been started before) - we first start solr on server1, 
and then start solr on server2 after solr on server1 has finished starting up: 
{code}
nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
-DzkHost=server1:2181,server2:2181,server3:2181 
-Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
-Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
-jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
{code}
The ./myapp/solr.xml looks like this on server1:
{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}
The ./myapp/solr.xml looks like this on server2:
{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
  </cores>
</solr>
{code}

The first thing we observe is that Solr server1 (running collA_slice1_shard1) 
seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) 
is started up later it quickly reports the following in its solr.log and keeps 
doing that for a long time:
{code}
SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: Not 
Found

request: 
http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
at 
org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
at 
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
{code}

Please note that we have changed the way errors are logged a little, but 
basically this means that Solr server2 gets a 404 Not Found from Solr server1 
on its request 
http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2

It seems there is no common agreement among the Solr servers on how/where 
to send those requests and how/where to listen for them.

Regards, Per Steffensen

  was:
We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
performance test setup where we performance-test our application (and therefore 
indirectly Solr(Cloud)). When we run the performance test against a setup using 
SolrCloud without replication, everything seems to run very nicely for days. 
When we add replication to the setup the same performance test shows some 
problems - which we will report (and maybe help fix) in distinct issues here in 
jira.

About the setup - the setup is a little more complex than described below, but 
I believe the description will tell enough:
We have two solr servers which we start from solr-install/example using this 
command (ZooKeepers have been started before) - we first start solr on server1, 
and then start solr on server2 after solr on server1 has finished starting up: 
<pre>
nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
-DzkHost=server1:2181,server2:2181,server3:2181 
-Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
-Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
-jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log
</pre>
The ./myapp/solr.xml looks like this on 

[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Attachment: LUCENE-3659.patch

I played around a little and implemented the IOContext / filename-dependent 
buffer sizes for RAMFiles.

The code currently prints out lots of size information (like buffer sizes) on 
RAMDirectory.close(). This is just for debugging and to show what happens.

To actually see real-world use cases, execute tests with {{ant test 
-Dtests.directory=RAMDirectory -Dtests.nightly=true}}

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Attachment: LUCENE-3659.patch

More improvements:
- If you use {{new RAMDirectory(existingDir)}}, the RAMFiles in the created 
RAMDirectory will have the original fileSize (if less than {{1L << 30}} bytes) as 
bufferSize, as we know the file size upfront.
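A simplified sketch of the size-dependent buffer choice described above (illustrative pseudologic, not the actual patch):

```python
def pick_buffer_size(file_size, default_size=1024, cap=1 << 30):
    """Use the known file size as the RAMFile buffer size when it is
    positive and below the cap; otherwise fall back to the default."""
    if 0 < file_size < cap:
        return file_size  # a single buffer holds the whole copied file
    return default_size   # size unknown or too large: use default buffers
```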

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Created] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Kazuaki Hiraga (Created) (JIRA)
Add Japanese Kanji number normalization to Kuromoji
---

 Key: LUCENE-3922
 URL: https://issues.apache.org/jira/browse/LUCENE-3922
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Kazuaki Hiraga


Japanese people use Kanji numerals instead of Arabic numerals for writing 
prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 (3-2 
Nibancho) and 十二月 (December).  So we would like to normalize those Kanji 
numerals to Arabic numerals (I don't think we need the capability to normalize 
to Kanji numerals).
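A minimal sketch of the requested normalization (illustrative only; a real implementation would also handle mixed Arabic/Kanji forms like 12万4800 and larger units):

```python
DIGITS = {"〇": 0, "一": 1, "二": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
SMALL = {"十": 10, "百": 100, "千": 1000}   # multipliers inside a 万-group
BIG = {"万": 10**4, "億": 10**8}            # group multipliers

def kanji_to_int(s):
    """Convert a pure Kanji numeral such as 十二万四千八百 to an int."""
    total = group = digit = 0
    for ch in s:
        if ch in DIGITS:
            digit = digit * 10 + DIGITS[ch]
        elif ch in SMALL:
            group += (digit or 1) * SMALL[ch]
            digit = 0
        elif ch in BIG:
            total += (group + digit) * BIG[ch]
            group = digit = 0
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return total + group + digit

# 十二万四千八百 → 124800, matching the 124,800 JPY example above.
```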

 




[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Erick Erickson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238292#comment-13238292
 ] 

Erick Erickson commented on SOLR-3273:
--

Of course the people who actually know the code may make me look foolish, but 
why are you even turning on replication in a SolrCloud environment? As I 
understand it, all the replication etc. is done for you by virtue of the 
leaders automatically distributing the incoming updates to all replicas, so 
nothing useful is accomplished by turning on replication.

If I'm on track, maybe the right solution is for the replication code to do 
the right thing when running in a SolrCloud configuration, which is to do 
nothing.



 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen

 We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
 performance test setup where we performance-test our application (and 
 therefore indirectly Solr(Cloud)). When we run the performance test against a 
 setup using SolrCloud without replication, everything seems to run very 
 nicely for days. When we add replication to the setup the same performance 
 test shows some problems - which we will report (and maybe help fix) in 
 distinct issues here in jira.
 About the setup - the setup is a little more complex than described below, 
 but I believe the description will tell enough:
 We have two solr servers which we start from solr-install/example using 
 this command (ZooKeepers have been started before) - we first start solr on 
 server1, and then start solr on server2 after solr on server1 has finished 
 starting up: 
 {code}
 nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
 -DzkHost=server1:2181,server2:2181,server3:2181 
 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
 -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
 -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log 
 {code}
 The ./myapp/solr.xml looks like this on server1:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The ./myapp/solr.xml looks like this on server2:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The first thing we observe is that Solr server1 (running collA_slice1_shard1) 
 seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) 
 is started up later it quickly reports the following in its solr.log and keeps 
 doing that for a long time:
 {code}
 SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: 
 Not Found
 request: 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
 at 
 org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
 at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
 {code}
 Please note that we have changed the way errors are logged a little, but 
 basically this means that Solr server2 gets a 404 Not Found on its request 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2 
 to Solr server1.
 It seems there is no common agreement among the Solr servers on how/where to 
 send those requests and how/where to listen for them.
 Regards, Per Steffensen

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Attachment: LUCENE-3659.patch

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch, 
 LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Attachment: LUCENE-3659.patch

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238303#comment-13238303
 ] 

Christian Moen commented on LUCENE-3922:


Thanks a lot, Kazu.

This is a good idea to add.  Patches are of course also very welcome! :)

 Add Japanese Kanji number normalization to Kuromoji
 ---

 Key: LUCENE-3922
 URL: https://issues.apache.org/jira/browse/LUCENE-3922
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Kazuaki Hiraga
  Labels: features

 Japanese people use Kanji numerals instead of Arabic numerals for writing 
 prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 
 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those 
 Kanji numerals to Arabic numerals (I don't think we need the capability to 
 normalize to Kanji numerals).
  




[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Attachment: (was: LUCENE-3659.patch)

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Attachment: (was: LUCENE-3659.patch)

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Per Steffensen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238311#comment-13238311
 ] 

Per Steffensen commented on SOLR-3273:
--

Hi

Thanks for your reply. Correct me (too) if I'm wrong, but I believe SolrCloud 
does not do replication unless it is asked to. I believe you can turn 
replication on by setting numShards > 1 somewhere, or you can set it up more 
manually by making sure you have several cores defined with the same shard 
value (slice1 in my case) in the solr.xml files distributed across different 
Solr instances - like we try to do.

But I would really like to be corrected if anyone knows that I am doing 
something wrong.

Regards, Per Steffensen

 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen

 We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
 performance test setup where we performance test our application (and 
 therefore indirectly Solr(Cloud)). When we run the performance test against a 
 setup using SolrCloud without replication, everything seems to run very 
 nicely for days. When we add replication to the setup the same performance 
 test shows some problems - which we will report (and maybe help fix) in 
 distinct issues here in jira.
 About the setup - the setup is a little more complex than described below, 
 but I believe the description will tell enough:
 We have two solr servers which we start from solr-install/example using 
 this command (ZooKeepers have been started before) - we first start solr on 
 server1, and then start solr on server2 after solr on server1 has finished 
 starting up: 
 {code}
 nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
 -DzkHost=server1:2181,server2:2181,server3:2181 
 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
 -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
 -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log &
 {code}
 The ./myapp/solr.xml looks like this on server1:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server1" hostPort="8983"
          hostContext="solr">
     <core name="collA_slice1_shard1" instanceDir="."
           dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The ./myapp/solr.xml looks like this on server2:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server2" hostPort="8983"
          hostContext="solr">
     <core name="collA_slice1_shard2" instanceDir="."
           dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The first thing we observe is that Solr server1 (running collA_slice1_shard1) 
 seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) 
 is started up later it quickly reports the following in its solr.log and keeps 
 doing so for a long time:
 {code}
 SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: 
 Not Found
 request: 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
 at 
 org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
 at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
 {code}
 Please note that we have changed the way errors are logged a little, but 
 basically this means that Solr server2 gets a 404 Not Found on its request 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2 
 to Solr server1.
 It seems there is no common agreement among the Solr servers on how/where to 
 send those requests and how/where to listen for them.
 Regards, Per Steffensen


[jira] [Assigned] (SOLR-3272) Solr filter factory for MorfologikFilter

2012-03-26 Thread Dawid Weiss (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dawid Weiss reassigned SOLR-3272:
-

Assignee: Dawid Weiss

 Solr filter factory for MorfologikFilter
 

 Key: SOLR-3272
 URL: https://issues.apache.org/jira/browse/SOLR-3272
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Rafał Kuć
Assignee: Dawid Weiss
 Fix For: 4.0

 Attachments: SOLR-3272.patch


 I didn't find a MorfologikFilter factory in Solr, so here is a simple one. 
 Maybe someone will make use of it :)




[jira] [Commented] (SOLR-3272) Solr filter factory for MorfologikFilter

2012-03-26 Thread Dawid Weiss (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238327#comment-13238327
 ] 

Dawid Weiss commented on SOLR-3272:
---

Hi Michał. Could you modify this patch to include support for the three 
dictionaries (combined, morfeusz and morfologik)? This would be more flexible 
(and the combined dictionary is nearly twice as large as morfologik itself, so 
it's worth it).
{code}
return new MorfologikFilter(ts, DICTIONARY.MORFOLOGIK, luceneMatchVersion);
{code}

Also, an example of use in the JavaDoc would be nice (see 
BeiderMorseFilterFactory for example). The test should be using DEFAULT_VERSION 
not the fixed LUCENE_40. Thanks!
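For what it's worth, the parameterization being asked for can be sketched roughly as follows. This is a standalone illustration with assumed names (the factory class and its {{init}} method are placeholders, not the actual Solr filter-factory API); only the DICTIONARY constant names come from the snippet above:

```java
import java.util.Locale;
import java.util.Map;

// Illustrative stand-in for morfologik's dictionary choices named above.
enum DICTIONARY { MORFOLOGIK, MORFEUSZ, COMBINED }

// Sketch of a factory that reads a "dictionary" init argument instead of
// hard-coding DICTIONARY.MORFOLOGIK; a real Solr factory would extend the
// token-filter factory base class and build the MorfologikFilter in create().
class MorfologikFilterFactorySketch {
    private DICTIONARY dictionary;

    void init(Map<String, String> args) {
        // Default to MORFOLOGIK when the schema omits the attribute.
        String name = args.getOrDefault("dictionary", "MORFOLOGIK");
        dictionary = DICTIONARY.valueOf(name.toUpperCase(Locale.ROOT));
    }

    DICTIONARY chosenDictionary() { return dictionary; }
}
```

A schema author could then pick the dictionary with an attribute such as dictionary="combined" (attribute name assumed).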

 Solr filter factory for MorfologikFilter
 

 Key: SOLR-3272
 URL: https://issues.apache.org/jira/browse/SOLR-3272
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Rafał Kuć
Assignee: Dawid Weiss
 Fix For: 4.0

 Attachments: SOLR-3272.patch


 I didn't find a MorfologikFilter factory in Solr, so here is a simple one. 
 Maybe someone will make use of it :)




[jira] [Issue Comment Edited] (SOLR-3272) Solr filter factory for MorfologikFilter

2012-03-26 Thread Dawid Weiss (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238327#comment-13238327
 ] 

Dawid Weiss edited comment on SOLR-3272 at 3/26/12 12:17 PM:
-

Hi Rafał. Could you modify this patch to include support for the three 
dictionaries (combined, morfeusz and morfologik)? This would be more flexible 
(and the combined dictionary is nearly twice as large as morfologik itself, so 
it's worth it).
{code}
return new MorfologikFilter(ts, DICTIONARY.MORFOLOGIK, luceneMatchVersion);
{code}

Also, an example of use in the JavaDoc would be nice (see 
BeiderMorseFilterFactory for example). The test should be using DEFAULT_VERSION 
not the fixed LUCENE_40. Thanks!

  was (Author: dweiss):
Hi Michał. Could you modify this patch to include support for the three 
dictionaries (combined, morfeusz and morfologik)? This would be more flexible 
(and the combined dictionary is nearly twice as large as morfologik itself, so 
it's worth it).
{code}
return new MorfologikFilter(ts, DICTIONARY.MORFOLOGIK, luceneMatchVersion);
{code}

Also, an example of use in the JavaDoc would be nice (see 
BeiderMorseFilterFactory for example). The test should be using DEFAULT_VERSION 
not the fixed LUCENE_40. Thanks!
  
 Solr filter factory for MorfologikFilter
 

 Key: SOLR-3272
 URL: https://issues.apache.org/jira/browse/SOLR-3272
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Rafał Kuć
Assignee: Dawid Weiss
 Fix For: 4.0

 Attachments: SOLR-3272.patch


 I didn't find a MorfologikFilter factory in Solr, so here is a simple one. 
 Maybe someone will make use of it :)




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Koji Sekiguchi (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238329#comment-13238329
 ] 

Koji Sekiguchi commented on LUCENE-3922:


We, RONDHUIT, have done this kind of normalization (and more!). You may be 
interested in:

http://www.rondhuit-demo.com/RCSS/api/overview-summary.html#featured-japanese

||Summary||normalization sample||
|漢数字=算用数字正規化|四七=47, 四十七=47, 四拾七=47, 四〇七=407|
|和暦=西暦正規化|昭和四七年、昭和四十七年、昭和四拾七年=1972年, 昭和六十四年、平成元年=1989年|
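The kanji-to-Arabic half of the table above can be sketched in a few lines. This is an illustrative standalone parser, not the RONDHUIT or Kuromoji code, and it only handles the digit and unit characters shown (〇一二三四五六七八九, 十/拾, 百, 千, 万, 億); mixed kanji/Arabic forms like 12万4800 are out of scope here:

```java
public class KanjiNumerals {
    private static final String DIGITS = "〇一二三四五六七八九";

    // Converts kanji numerals such as 四十七, 四拾七 or 四〇七 to a long.
    public static long parse(String s) {
        long total = 0;   // completed 万/億 groups
        long section = 0; // current group below 万
        long digit = 0;   // pending bare digits (positional form, e.g. 四〇七)
        for (char c : s.toCharArray()) {
            int d = DIGITS.indexOf(c);
            if (d >= 0) {
                digit = digit * 10 + d;            // build positionally
            } else if (c == '十' || c == '拾') {
                section += (digit == 0 ? 1 : digit) * 10; digit = 0;
            } else if (c == '百') {
                section += (digit == 0 ? 1 : digit) * 100; digit = 0;
            } else if (c == '千') {
                section += (digit == 0 ? 1 : digit) * 1000; digit = 0;
            } else if (c == '万') {
                total += (section + digit) * 10_000; section = 0; digit = 0;
            } else if (c == '億') {
                total += (section + digit) * 100_000_000L; section = 0; digit = 0;
            } else {
                throw new IllegalArgumentException("unsupported character: " + c);
            }
        }
        return total + section + digit;
    }
}
```

For example, 四十七, 四拾七 and 四〇七 parse to 47, 47 and 407 respectively, matching the first row of the table.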


 Add Japanese Kanji number normalization to Kuromoji
 ---

 Key: LUCENE-3922
 URL: https://issues.apache.org/jira/browse/LUCENE-3922
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Kazuaki Hiraga
  Labels: features

 Japanese people use Kanji numerals instead of Arabic numerals for writing 
 prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 
 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those 
 Kanji numerals to Arabic numerals (I don't think we need the capability to 
 normalize to Kanji numerals).
  




[jira] [Commented] (SOLR-3272) Solr filter factory for MorfologikFilter

2012-03-26 Thread Rafał Kuć (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238330#comment-13238330
 ] 

Rafał Kuć commented on SOLR-3272:
-

Sure Dawid, no problem. I'll provide a patch later today.

 Solr filter factory for MorfologikFilter
 

 Key: SOLR-3272
 URL: https://issues.apache.org/jira/browse/SOLR-3272
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Rafał Kuć
Assignee: Dawid Weiss
 Fix For: 4.0

 Attachments: SOLR-3272.patch


 I didn't find a MorfologikFilter factory in Solr, so here is a simple one. 
 Maybe someone will make use of it :)




[jira] [Commented] (SOLR-3272) Solr filter factory for MorfologikFilter

2012-03-26 Thread Dawid Weiss (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238332#comment-13238332
 ] 

Dawid Weiss commented on SOLR-3272:
---

Thanks. Sorry about the name confusion btw. Don't know where I took Michał from 
:)

 Solr filter factory for MorfologikFilter
 

 Key: SOLR-3272
 URL: https://issues.apache.org/jira/browse/SOLR-3272
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 4.0
Reporter: Rafał Kuć
Assignee: Dawid Weiss
 Fix For: 4.0

 Attachments: SOLR-3272.patch


 I didn't find a MorfologikFilter factory in Solr, so here is a simple one. 
 Maybe someone will make use of it :)




[jira] [Commented] (LUCENE-3922) Add Japanese Kanji number normalization to Kuromoji

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3922?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238334#comment-13238334
 ] 

Christian Moen commented on LUCENE-3922:


Koji, this is very nice.

Does the kanji number normalizer ({{KanjiNumberCharFilter}}) also deal with 
combinations of kanji and arabic numbers like Kazu's price example?

Is the above code you refer to something that can go into Lucene or is it 
non-free software?

 Add Japanese Kanji number normalization to Kuromoji
 ---

 Key: LUCENE-3922
 URL: https://issues.apache.org/jira/browse/LUCENE-3922
 Project: Lucene - Java
  Issue Type: New Feature
  Components: modules/analysis
Affects Versions: 4.0
Reporter: Kazuaki Hiraga
  Labels: features

 Japanese people use Kanji numerals instead of Arabic numerals for writing 
 prices, addresses and so on, e.g. 12万4800円 (124,800 JPY), 二番町三ノ二 
 (3-2 Nibancho) and 十二月 (December). So, we would like to normalize those 
 Kanji numerals to Arabic numerals (I don't think we need the capability to 
 normalize to Kanji numerals).
  




[jira] [Created] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Per Steffensen (Created) (JIRA)
ZooKeeper related SolrCloud problems


 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen


Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 
Solr servers running 28 slices of the same collection (collA) - all slices 
have one replica (two shards all in all - leader + replica) - 56 cores all in 
all (8 shards on each Solr instance). But anyway...

Besides the problem reported in SOLR-3273, the system seems to run fine under 
high load for several hours, but eventually errors like the ones shown below 
start to occur. I might be wrong, but they all seem to indicate some kind of 
instability in the collaboration between Solr and ZooKeeper. I have to say that 
I haven't been there to check ZooKeeper at the moment those exceptions occur, 
but basically I don't believe the exceptions occur because ZooKeeper is not 
running stably - at least when I go and check ZooKeeper through other channels 
(e.g. my Eclipse ZK plugin) it is always accepting my connection and generally 
seems to be doing fine.

Exception 1) Often the first error we see in solr.log is something like this
{code}
Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
Updates are disabled.
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
{code}
I believe this error basically occurs because SolrZkClient.isConnected reports 
false, which means that its internal keeper.getState() does not return 
ZooKeeper.States.CONNECTED. I'm pretty sure that it had been CONNECTED for a 
long time, since this error only starts occurring after several hours of 
processing without this problem showing. But why is it suddenly not connected 
anymore?!
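The check described reduces to something like the following. This is a standalone paraphrase for illustration, not the actual Solr source; the enum here stands in for org.apache.zookeeper.ZooKeeper.States, and only the method names and the exception message come from the stack trace and log above:

```java
// Stand-in for org.apache.zookeeper.ZooKeeper.States, which
// SolrZkClient.isConnected consults via keeper.getState().
enum ZkState { CONNECTING, CONNECTED, CLOSED }

class ZkConnectionCheck {
    private final ZkState state;

    ZkConnectionCheck(ZkState state) { this.state = state; }

    // isConnected() is true only while the handle reports CONNECTED.
    boolean isConnected() { return state == ZkState.CONNECTED; }

    // DistributedUpdateProcessor.zkCheck refuses updates otherwise,
    // producing the "Cannot talk to ZooKeeper" SolrException seen above.
    void zkCheck() {
        if (!isConnected()) {
            throw new IllegalStateException(
                "Cannot talk to ZooKeeper - Updates are disabled.");
        }
    }
}
```

So any transient session drop that leaves the handle outside CONNECTED disables updates until the state recovers.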

Exception 2) We also see errors like the following, and if I'm not mistaken, 
they start occurring shortly after Exception 1) (above) shows for the first time
{code}
Mar 22, 2012 5:07:26 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: 
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
at 
org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:123)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at 

[jira] [Created] (LUCENE-3923) fail the build on wrong svn:eol-style

2012-03-26 Thread Robert Muir (Created) (JIRA)
fail the build on wrong svn:eol-style
-

 Key: LUCENE-3923
 URL: https://issues.apache.org/jira/browse/LUCENE-3923
 Project: Lucene - Java
  Issue Type: Task
  Components: general/build
Reporter: Robert Muir


I'm tired of fixing this before releases. Jenkins should detect and fail on 
this.




[jira] [Commented] (LUCENE-3923) fail the build on wrong svn:eol-style

2012-03-26 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238355#comment-13238355
 ] 

Michael McCandless commented on LUCENE-3923:


+1

And, ideally, ant test as well...

 fail the build on wrong svn:eol-style
 -

 Key: LUCENE-3923
 URL: https://issues.apache.org/jira/browse/LUCENE-3923
 Project: Lucene - Java
  Issue Type: Task
  Components: general/build
Reporter: Robert Muir

 I'm tired of fixing this before releases. Jenkins should detect and fail on 
 this.




[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238377#comment-13238377
 ] 

Mark Miller commented on SOLR-3273:
---

bq. adminPath=/admin/myapp

That's probably the issue - I think we assume /admin/cores or whatever the 
default is.
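In concrete terms - assuming the default admin path is indeed what the recovery code expects (Mark's guess above, not verified here) - the cores element in server1's solr.xml would become:

```xml
<cores adminPath="/admin/cores" host="server1" hostPort="8983" hostContext="solr">
```

with the corresponding change on server2.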

 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen

 We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
 performance test setup where we performance test our application (and 
 therefore indirectly Solr(Cloud)). When we run the performance test against a 
 setup using SolrCloud without replication, everything seems to run very 
 nicely for days. When we add replication to the setup the same performance 
 test shows some problems - which we will report (and maybe help fix) in 
 distinct issues here in jira.
 About the setup - the setup is a little more complex than described below, 
 but I believe the description will tell enough:
 We have two solr servers which we start from solr-install/example using 
 this command (ZooKeepers have been started before) - we first start solr on 
 server1, and then start solr on server2 after solr on server1 has finished 
 starting up: 
 {code}
 nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
 -DzkHost=server1:2181,server2:2181,server3:2181 
 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
 -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
 -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log 
 {code}
 The ./myapp/solr.xml looks like this on server1:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard1" instanceDir="."
           dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The ./myapp/solr.xml looks like this on server2:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard2" instanceDir="."
           dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The first thing we observe is that Solr server1 (running collA_slice1_shard1) 
 seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) 
 is started up later it quickly reports the following in its solr.log and keeps 
 doing that for a long time:
 {code}
 SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: 
 Not Found
 request: 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
 at 
 org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
 at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
 {code}
 Please note that we have changed a little bit in the way errors are logged, 
 but basically this means that Solr server2 gets a 404 Not Found on its 
 request 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
  to Solr server1.
 Seems like there is not a common agreement among the Solr servers on 
 how/where to send those requests and how/where to listen for them.
 Regards, Per Steffensen




[jira] [Updated] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Mark Miller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3273:
--

Priority: Minor  (was: Major)

 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller
Priority: Minor





[jira] [Assigned] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Mark Miller (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-3273:
-

Assignee: Mark Miller

 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller





[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238381#comment-13238381
 ] 

Mark Miller commented on SOLR-3274:
---

This happens because the connection between solr and zookeeper is lost - 
perhaps because the load on the box is too high. I think we may default to a 
fairly low timeout that could be raised (by default and manually).
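
A hedged sketch of where that timeout can be raised, assuming the stock Solr 4.x solr.xml layout - the {{zkClientTimeout}} attribute on the {{cores}} element (and the matching {{zkClientTimeout}} system property) is taken from the default Solr 4 configuration, not from this thread:

{code:xml}
<!-- Sketch, assuming a stock Solr 4.x solr.xml: raise the ZooKeeper
     session timeout (in milliseconds) above the low default. -->
<cores adminPath="/admin/cores" host="server1" hostPort="8983"
       hostContext="solr" zkClientTimeout="${zkClientTimeout:30000}">
  <core name="collA_slice1_shard1" instanceDir="."
        dataDir="collA_slice1_data" collection="collA" shard="slice1" />
</cores>
{code}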

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen

 Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 
 Solr servers, running 28 slices of the same collection (collA) - all slices 
 have one replica (two shards all in all - leader + replica) - 56 cores all in 
 all (8 shards on each solr instance). But anyways...
 Besides the problem reported in SOLR-3273, the system seems to run fine under 
 high load for several hours, but eventually errors like the ones shown below 
 start to occur. I might be wrong, but they all seem to indicate some kind of 
 instability in the collaboration between Solr and ZooKeeper. I have to say 
 that I haven't been there to check ZooKeeper at the moment where those 
 exceptions occur, but basically I don't believe the exceptions occur because 
 ZooKeeper is not running stably - at least when I go and check ZooKeeper 
 through other channels (e.g. my eclipse ZK plugin) it is always accepting 
 my connection and generally seems to be doing fine.
 Exception 1) Often the first error we see in solr.log is something like this
 {code}
 Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
 Updates are disabled.
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at 
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
 at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 {code}
 I believe this error basically occurs because SolrZkClient.isConnected 
 reports false, which means that its internal keeper.getState does not 
 return ZooKeeper.States.CONNECTED. I'm pretty sure that it has been CONNECTED 
 for a long time, since this error starts occurring after several hours of 
 processing without this problem showing. But why is it suddenly not connected 
 anymore?!
 Exception 2) We also see errors like the following, and if I'm not mistaken, 
 they start occurring shortly after Exception 1) (above) shows for the first 
 time
 {code}
 Mar 22, 2012 5:07:26 AM 

[jira] [Assigned] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Mark Miller (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller reassigned SOLR-3274:
-

Assignee: Mark Miller

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller

 Exception 2) We also see errors like the following, and if I'm not mistaken, 
 they start occurring shortly after Exception 1) (above) shows for the first 
 time
 {code}
 Mar 22, 2012 5:07:26 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: no servers hosting shard: 
 at 
 org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:149)
 at 
 

[jira] [Created] (SOLR-3275) Add the ability to set shard and collection in web gui when adding a shard

2012-03-26 Thread Jamie Johnson (Created) (JIRA)
Add the ability to set shard and collection in web gui when adding a shard
--

 Key: SOLR-3275
 URL: https://issues.apache.org/jira/browse/SOLR-3275
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Affects Versions: 4.0
Reporter: Jamie Johnson


Currently the latest web gui allows you to add an additional core but does not 
allow you to specify the shard or collection that core should be part of.  In 
the core admin view when adding a core we should expose options to set these 
values when creating a core.




[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Per Steffensen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238384#comment-13238384
 ] 

Per Steffensen commented on SOLR-3273:
--

@Mark Miller: Thanks. We will try that. It would be very helpful if you could 
state exactly what you expect in adminPath. Does it have to be exactly 
/admin/cores or is /admin/cores/myapp allowed or does it have to be 
something else. Thanks!

@Erick Erickson: Please note that I am talking about the built-in replication 
of SolrCloud and not the old replication described at 
http://wiki.apache.org/solr/SolrReplication

 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller
Priority: Minor



[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Per Steffensen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238391#comment-13238391
 ] 

Per Steffensen commented on SOLR-3274:
--

Thanks a lot, Mark! 

Can all the exception be explained by connection loss between solr and 
zookeeper? 

I'm not sure I totally buy the explanation because I believe that, even though 
there is a fairly high update/search-load on the machines in the cluster, the 
machines actually do not seem to be exhausted (CPU idle way above 0% (more like 
50% in average), not very high IO-wait etc.). So I would expect plenty of 
resources to be available for ZK to respond fast. But let's see what happens if 
we set the timeout higher. Can you point me in the direction of how to set it 
manually?

Regards, Per Steffensen

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller


[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238394#comment-13238394
 ] 

Mark Miller commented on SOLR-3273:
---

just adminPath="/admin/cores" - same as you see in the default solr.xml.

Now I could make it so that we look up what the admin path is locally - but I 
don't know that we should - just because someone has changed the adminPath 
locally doesn't mean they changed it on the 'remote' node. We don't really 
have a way of knowing what it is on the remote node. So it may be the right 
choice to just require that people leave it as is for SolrCloud (though of 
course we should doc this).
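
To make the failure mode concrete, here is a minimal sketch (illustrative names only, not Solr's actual implementation) of how a prep-recovery URL gets assembled from the remote node's address plus the admin path the caller assumes:

```java
// Sketch (not Solr's actual code): the prep-recovery request URL is built
// from the remote node's registered address plus an adminPath the *caller*
// assumes - so a non-default adminPath on the remote node means the request
// hits a path that is not registered there, hence 404 Not Found.
public class PrepRecoveryUrl {
    static String build(String host, int port, String hostContext, String adminPath) {
        return "http://" + host + ":" + port + "/" + hostContext + adminPath
                + "?action=PREPRECOVERY";
    }
}
```

With the default adminPath this yields http://server1:8983/solr/admin/cores?action=PREPRECOVERY; with the reporter's adminPath of /admin/myapp the remote node would need to be asked at a path the requester has no way of knowing.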

 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller
Priority: Minor

 We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
 performance test setup where we performance test our application (and 
 therefore indirectly Solr(Cloud)). When we run the performance test against a 
 setup using SolrCloud without replication, everything seems to run very 
 nicely for days. When we add replication to the setup the same performance 
 test shows some problems - which we will report (and maybe help fix) in 
 distinct issues here in jira.
 About the setup - the setup is a little more complex than described below, 
 but I believe the description will tell enough:
 We have two solr servers which we start from solr-install/example using 
 this command (ZooKeepers have been started before) - we first start solr on 
 server1, and then start solr on server2 after solr on server1 has finished 
 starting up: 
 {code}
 nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
 -DzkHost=server1:2181,server2:2181,server3:2181 
 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
 -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
 -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log &
 {code}
 The ./myapp/solr.xml looks like this on server1:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The ./myapp/solr.xml looks like this on server2:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The first thing we observe is that Solr server1 (running collA_slice1_shard1) 
 seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) 
 is started up later it quickly reports the following in its solr.log and keeps 
 doing that for a long time:
 {code}
 SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: 
 Not Found
 request: 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
 at 
 org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
 at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
 {code}
 Please note that we have changed a little bit in the way errors are logged, 
 but basically this means that Solr server2 gets a 404 Not Found on its 
 request 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
  to Solr server1.
 Seems like there is no common agreement among the Solr servers on 
 how/where to send those requests and how/where to listen for them.
 Regards, Per Steffensen

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 

[jira] [Commented] (LUCENE-3888) split off the spell check word and surface form in spell check dictionary

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3888?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238397#comment-13238397
 ] 

Robert Muir commented on LUCENE-3888:
-

Thanks for the feedback Koji.

I'm not happy with the situation: I thought it would be easy to support
some rough Japanese spellcheck in 3.6 

But it just seems like we need to do a lot of cleanup to make it work.
I would rather fix all of these APIs and do it right the first time so
that things like distributed support work too.


 split off the spell check word and surface form in spell check dictionary
 -

 Key: LUCENE-3888
 URL: https://issues.apache.org/jira/browse/LUCENE-3888
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spellchecker
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 4.0

 Attachments: LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch, 
 LUCENE-3888.patch, LUCENE-3888.patch, LUCENE-3888.patch


 The "did you mean?" feature using Lucene's spell checker cannot work well 
 in a Japanese environment, unfortunately, and this is a longstanding problem, 
 because the logic needs comparatively long text to check spelling, but in some 
 languages (e.g. Japanese) most words are too short to use the spell checker.
 I think, for at least Japanese, things can be improved if we split off 
 the spell check word and surface form in the spell check dictionary. Then we 
 can use ReadingAttribute for spell checking but CharTermAttribute for 
 suggesting, for example.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

2012-03-26 Thread Michael McCandless (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless reassigned LUCENE-3873:
--

Assignee: Michael McCandless

 tie MockGraphTokenFilter into all analyzers tests
 -

 Key: LUCENE-3873
 URL: https://issues.apache.org/jira/browse/LUCENE-3873
 Project: Lucene - Java
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Michael McCandless

 Mike made a MockGraphTokenFilter on LUCENE-3848.
 Many filters currently aren't tested with anything but a simple tokenstream.
 We should test them with this, too; it might find bugs (zero-length terms,
 stacked terms/synonyms, etc.)




[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

2012-03-26 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238398#comment-13238398
 ] 

Michael McCandless commented on LUCENE-3873:


LUCENE-3848 has the MockGraphTokenFilter patch...

 tie MockGraphTokenFilter into all analyzers tests
 -

 Key: LUCENE-3873
 URL: https://issues.apache.org/jira/browse/LUCENE-3873
 Project: Lucene - Java
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir

 Mike made a MockGraphTokenFilter on LUCENE-3848.
 Many filters currently aren't tested with anything but a simple tokenstream.
 We should test them with this, too; it might find bugs (zero-length terms,
 stacked terms/synonyms, etc.)




[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238402#comment-13238402
 ] 

Michael McCandless commented on LUCENE-3659:


This looks great Uwe!

I'm a little worried about the tiny file case; you're checking for
SEGMENTS_* now, but many other files can be much smaller than 1/64th
of the estimated segment size.

I wonder if we should improve IOContext to hold the [rough]
estimated file size (not just overall segment size)... the thing is
that's sort of a hassle on codec impls.

Or: maybe, on closing the ROS/RAMFile, we can downsize the final
buffer (yes, this means copying the bytes, but that cost is vanishingly
small as the RAMDir grows).  Then tiny files stay tiny, though they
are still [relatively] costly to create...

I don't think RAMDir.createOutput should publish the RAMFile until the
ROS is closed?  Ie, you are not allowed to openInput on something
still opened with createOutput in any Lucene Dir impl..?  This would
allow us to make RAMFile frozen (eg if ROS holds its own buffers and
then creates RAMFile on close), that requires no sync when reading?

I also don't think RAMFile should be public, ie, the only way to make
changes to a file stored in a RAMDir is via RAMOutputStream.  We can
do this separately...

Maybe we should pursue a growing buffer size...?  Ie, where each newly
added buffer is bigger than the one before (like ArrayUtil.oversize's
growth function)... I realize that adds complexity
(RAMInputStream.seek is more fun), but this would let tiny files use
tiny RAM and huge files use few buffers.  Ie, RAMDir would scale up
and scale down well.
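
The growing-buffer idea can be sketched as a simple growth policy (a standalone illustration with assumed floor/cap constants, not Lucene's actual ArrayUtil code):

```java
// Sketch of a geometric buffer-growth policy for a RAMFile-like structure:
// each newly added buffer is ~1.5x the size of the previous one, so tiny
// files use a single small buffer while huge files need only O(log n)
// buffers. The MIN/MAX constants here are illustrative assumptions.
public class BufferGrowth {
    static final int MIN_BUFFER_SIZE = 16;       // assumed floor for tiny files
    static final int MAX_BUFFER_SIZE = 1 << 20;  // assumed 1 MB cap per buffer

    static int nextBufferSize(int current) {
        if (current < MIN_BUFFER_SIZE) return MIN_BUFFER_SIZE;
        long grown = current + (current >> 1);   // grow by ~50%
        return (int) Math.min(grown, MAX_BUFFER_SIZE);
    }
}
```

Seeking (as in RAMInputStream.seek) then needs a search over cumulative buffer sizes instead of a constant-size division, which is the added complexity mentioned above.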

Separately: I noticed we still have IndexOutput.setLength, but, nobody
calls it anymore I think?  (In 3.x we call this when creating a CFS).
Maybe we should remove it...


 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238405#comment-13238405
 ] 

Robert Muir commented on LUCENE-3873:
-

One way we can tie this in is via LUCENE-3919.

But: I think we can use this filter in some individual tests immediately?

E.g. we can just add a method testRandomGraphs to the filters that do lots
of crazy state-capturing, putting this thing in-front-of/behind them in
the analyzer and call checkRandomData?

 tie MockGraphTokenFilter into all analyzers tests
 -

 Key: LUCENE-3873
 URL: https://issues.apache.org/jira/browse/LUCENE-3873
 Project: Lucene - Java
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Michael McCandless

 Mike made a MockGraphTokenFilter on LUCENE-3848.
 Many filters currently aren't tested with anything but a simple tokenstream.
 We should test them with this, too; it might find bugs (zero-length terms,
 stacked terms/synonyms, etc.)




[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238404#comment-13238404
 ] 

Mark Miller commented on SOLR-3274:
---

bq. Can all the exception be explained by connection loss between solr and 
zookeeper?

bq. SessionExpiredException

This indicates the connection with ZooKeeper was lost.

bq. org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates 
are disabled.

If there is no connection to ZooKeeper, you will see this if you send an update.

bq. org.apache.solr.common.SolrException: no servers hosting shard: 

Sami Siren has a JIRA issue about improving this message I believe - but 
normally it means that the cluster does not see a single node hosting a given 
shard. Not sure if this is related to the above - not the same smoking gun.

bq. Can you point me in the direction of how to set it manually?

The default is only 10 seconds. I'd try 30 seconds perhaps? You don't want it 
too low, but you also don't want it too high if you can help it. I can't 
remember what the zookeeper default is, but I've seen it set as high as 60 
seconds looking around some hbase usage...

You should be able to set it in solr.xml as a cores attribute: 
zkClientTimeout="30000" or whatever.

That is:   <cores adminPath="/admin/cores" zkClientTimeout="30000">

You'd want to do it for each node.
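
For example, assuming the solr.xml layout shown in SOLR-3273 (the timeout value here is illustrative):

{code:xml}
<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="false">
  <!-- zkClientTimeout is in milliseconds: 30000 = 30 seconds -->
  <cores adminPath="/admin/cores" zkClientTimeout="30000" host="server1"
         hostPort="8983" hostContext="solr">
    <core name="collA_slice1_shard1" instanceDir="." />
  </cores>
</solr>
{code}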

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller

 Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 
 Solr servers, running 28 slices of the same collection (collA) - all slices 
 have one replica (two shards all in all - leader + replica) - 56 cores all in 
 all (8 shards on each solr instance). But anyways...
 Besides the problem reported in SOLR-3273, the system seems to run fine under 
 high load for several hours, but eventually errors like the ones shown below 
 start to occur. I might be wrong, but they all seem to indicate some kind of 
 instability in the collaboration between Solr and ZooKeeper. I have to say 
 that I haven't been there to check ZooKeeper at the moment when those 
 exceptions occur, but basically I don't believe the exceptions occur because 
 ZooKeeper is not running stably - at least when I go and check ZooKeeper 
 through other channels (e.g. my eclipse ZK plugin) it is always accepting 
 my connection and generally seems to be doing fine.
 Exception 1) Often the first error we see in solr.log is something like this
 {code}
 Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
 Updates are disabled.
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at 
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
 at 

[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238407#comment-13238407
 ] 

Robert Muir commented on LUCENE-3659:
-

{quote}
I'm a little worried about the tiny file case; you're checking for
SEGMENTS_* now, but many other files can be much smaller than 1/64th
of the estimated segment size.

I wonder if we should improve IOContext to hold the [rough]
estimated file size (not just overall segment size)... the thing is
that's sort of a hassle on codec impls.
{quote}

Maybe it's enough for IOContext to specify that it's writing a 'metadata'
file? These are all the tiny ones (fieldinfos, segmentinfos, .cfe, etc.),
as opposed to 'real' files like frq or prx that are expected to be possibly 
huge.



 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Commented] (SOLR-3275) Add the ability to set shard and collection in web gui when adding a shard

2012-03-26 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238411#comment-13238411
 ] 

Mark Miller commented on SOLR-3275:
---

bq. we should expose options to set these values when creating a core.

But they should probably only be visible if in cloud mode.

 Add the ability to set shard and collection in web gui when adding a shard
 --

 Key: SOLR-3275
 URL: https://issues.apache.org/jira/browse/SOLR-3275
 Project: Solr
  Issue Type: New Feature
  Components: web gui
Affects Versions: 4.0
Reporter: Jamie Johnson

 Currently the latest web gui allows you to add an additional core but does 
 not allow you to specify the shard or collection that core should be part of. 
  In the core admin view when adding a core we should expose options to set 
 these values when creating a core.




[jira] [Commented] (SOLR-3273) 404 Not Found on action=PREPRECOVERY

2012-03-26 Thread Per Steffensen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238410#comment-13238410
 ] 

Per Steffensen commented on SOLR-3273:
--

Thanks a lot. It is ok for us just to use /admin/cores. We really do not 
mind. But at least it needs some documentation, or maybe share the admin-path in 
ZK, so that a remote solr can actually look it up. Well, you decide that.

Regards, Per Steffensen

 404 Not Found on action=PREPRECOVERY
 

 Key: SOLR-3273
 URL: https://issues.apache.org/jira/browse/SOLR-3273
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller
Priority: Minor

 We have an application based on a recent copy of 4.0-SNAPSHOT. We have a 
 performance test setup where we performance test our application (and 
 therefore indirectly Solr(Cloud)). When we run the performance test against a 
 setup using SolrCloud without replication, everything seems to run very 
 nicely for days. When we add replication to the setup the same performance 
 test shows some problems - which we will report (and maybe help fix) in 
 distinct issues here in jira.
 About the setup - the setup is a little more complex than described below, 
 but I believe the description will tell enough:
 We have two solr servers which we start from solr-install/example using 
 this command (ZooKeepers have been started before) - we first start solr on 
 server1, and then start solr on server2 after solr on server1 has finished 
 starting up: 
 {code}
 nohup java -Xmx4096m -Dcom.sun.management.jmxremote 
 -DzkHost=server1:2181,server2:2181,server3:2181 
 -Dbootstrap_confdir=./myapp/conf -Dcollection.configName=myapp_conf 
 -Dsolr.solr.home=./myapp -Djava.util.logging.config.file=logging.properties 
 -jar start.jar > ./myapp/logs/stdout.log 2> ./myapp/logs/stderr.log &
 {code}
 The ./myapp/solr.xml looks like this on server1:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server1" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard1" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The ./myapp/solr.xml looks like this on server2:
 {code:xml}
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr persistent="false">
   <cores adminPath="/admin/myapp" host="server2" hostPort="8983" hostContext="solr">
     <core name="collA_slice1_shard2" instanceDir="." dataDir="collA_slice1_data" collection="collA" shard="slice1" />
   </cores>
 </solr>
 {code}
 The first thing we observe is that Solr server1 (running collA_slice1_shard1) 
 seems to start up nicely, but when Solr server2 (running collA_slice1_shard2) 
 is started up later it quickly reports the following in its solr.log and keeps 
 doing that for a long time:
 {code}
 SEVERE: Error while trying to recover:org.apache.solr.common.SolrException: 
 Not Found
 request: 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
 at 
 org.apache.solr.common.SolrExceptionPropagationHelper.decodeFromMsg(SolrExceptionPropagationHelper.java:40)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:445)
 at 
 org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:264)
 at 
 org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:188)
 at 
 org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:285)
 at 
 org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:206)
 {code}
 Please note that we have changed a little bit in the way errors are logged, 
 but basically this means that Solr server2 gets a 404 Not Found on its 
 request 
 http://server1:8983/solr/admin/cores?action=PREPRECOVERY&core=collA_slice1_shard1&nodeName=server2%3A8983_solr&coreNodeName=server2%3A8983_solr_collA_slice1_shard2&state=recovering&checkLive=true&pauseFor=6000&wt=javabin&version=2
  to Solr server1.
 Seems like there is no common agreement among the Solr servers on 
 how/where to send those requests and how/where to listen for them.
 Regards, Per Steffensen


[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13238409#comment-13238409
 ] 

Mark Miller commented on SOLR-3274:
---

bq. not the same smoking gun.

Sorry - actually this does make sense with the other errors - if the zk 
connection is lost, that node is no longer considered live - if that happens to 
each node hosting a shard (say you have 1 replica and this happened to both 
nodes) then searches would fail with this.

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller

 Same setup as in SOLR-3273. Well if I have to tell the entire truth we have 7 
 Solr servers, running 28 slices of the same collection (collA) - all slices 
 have one replica (two shards all in all - leader + replica) - 56 cores all in 
 all (8 shards on each solr instance). But anyways...
 Besides the problem reported in SOLR-3273, the system seems to run fine under 
 high load for several hours, but eventually errors like the ones shown below 
 start to occur. I might be wrong, but they all seem to indicate some kind of 
 instability in the collaboration between Solr and ZooKeeper. I have to say 
 that I haven't been there to check ZooKeeper at the moment when those 
 exceptions occur, but basically I don't believe the exceptions occur because 
 ZooKeeper is not running stably - at least when I go and check ZooKeeper 
 through other channels (e.g. my eclipse ZK plugin) it is always accepting 
 my connection and generally seems to be doing fine.
 Exception 1) Often the first error we see in solr.log is something like this
 {code}
 Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
 SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - 
 Updates are disabled.
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
 at 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
 at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
 at 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
 at 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
 at 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
 at 
 org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
 at 
 org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
 at org.mortbay.jetty.Server.handle(Server.java:326)
 at 
 org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
 at 
 org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
 at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
 at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
 at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
 at 
 org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
 at 
 org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
 {code}
 I believe this error basically occurs because SolrZkClient.isConnected 
 reports false, which means that its internal keeper.getState does not 
 return ZooKeeper.States.CONNECTED. I'm pretty sure that it has been CONNECTED 
 for a long time, since this error only starts occurring after several hours of 
 processing without this problem showing. But why is it suddenly not connected 
 anymore?!
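The failure mode described above boils down to a single state check. A minimal sketch of that logic (illustrative names only; the real code lives in SolrZkClient and delegates to org.apache.zookeeper.ZooKeeper.getState()):

```java
// Illustrative stand-in for the connection-state check described above;
// not the real SolrZkClient code, just the logic the comment describes.
public class ZkCheckSketch {
    // Stand-in for org.apache.zookeeper.ZooKeeper.States
    enum States { CONNECTED, CONNECTING, CLOSED }

    // Mirrors the idea of SolrZkClient.isConnected(): updates are only
    // allowed while the underlying session reports CONNECTED.
    static boolean isConnected(States keeperState) {
        return keeperState == States.CONNECTED;
    }

    public static void main(String[] args) {
        System.out.println(isConnected(States.CONNECTED));  // true
        System.out.println(isConnected(States.CONNECTING)); // false
    }
}
```

Any non-CONNECTED session state (e.g. while reconnecting after a timeout) would make Solr reject updates with the exception above.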
 Exception 2) We also see errors like the following, and if I'm not mistaken, 
 they start 

[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Per Steffensen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238419#comment-13238419
 ] 

Per Steffensen commented on SOLR-3274:
--

Uhh, 10 secs is A LOT OF TIME. I really wouldn't want to set it higher than 
that. If ZK is not able to answer within 10 secs, I need to correct something 
else in my setup. 

I still believe that Solr might end up in this state (where it believes that the 
connection to ZK is lost) some other way than actually experiencing a 10+ sec 
response time from ZK, but I can't prove it (yet). So for now I will just thank 
you for your kind help, and assume that it is correct. Then basically my 
options are to set up a more responsive ZK cluster or maybe raise the ZK timeout 
on the Solr side. 

Thanks, again.

Regards, Per Steffensen

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller

 Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 
 Solr servers running 28 slices of the same collection (collA) - all slices 
 have one replica (two shards all in all - leader + replica) - 56 cores all in 
 all (8 shards on each Solr instance). But anyways...
 Besides the problem reported in SOLR-3273, the system seems to run fine under 
 high load for several hours, but eventually errors like the ones shown below 
 start to occur. I might be wrong, but they all seem to indicate some kind of 
 instability in the collaboration between Solr and ZooKeeper. I have to say 
 that I haven't been there to check ZooKeeper at the moment where those 
 exceptions occur, but basically I don't believe the exceptions occur because 
 ZooKeeper is not running stably - at least when I go and check ZooKeeper 
 through other channels (e.g. my Eclipse ZK plugin) it is always accepting 
 my connection and generally seems to be doing fine.

[jira] [Commented] (LUCENE-3873) tie MockGraphTokenFilter into all analyzers tests

2012-03-26 Thread Michael McCandless (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238422#comment-13238422
 ] 

Michael McCandless commented on LUCENE-3873:


I agree we can use it in specific places for starters...

The patch on LUCENE-3848 mixes in TokenStreamToAutomaton and 
MockGraphTokenFilter; I'll split that apart and only commit 
MockGraphTokenFilter here.

One problem is that MockGraphTokenFilter isn't setting offsets currently. I 
think to do this correctly it needs to buffer up pending input tokens until 
it's reached the posLength it wants to output for a random token, and then set 
the offset accordingly.
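The buffering idea can be modeled outside the real TokenStream API. In this simplified sketch (all names are illustrative, not Lucene's attribute API), a synthetic token spanning posLength positions takes its endOffset from the last buffered input token it spans:

```java
import java.util.*;

// Simplified model (not the real Lucene TokenStream API) of the buffering
// idea: to give a synthetic token spanning posLength positions a correct
// endOffset, buffer the pending input tokens covering those positions.
public class OffsetSketch {
    record Token(String text, int start, int end) {}

    // endOffset for a synthetic token starting at position `pos` and
    // spanning `posLength` positions of the buffered input.
    static int endOffsetFor(List<Token> buffered, int pos, int posLength) {
        return buffered.get(pos + posLength - 1).end;
    }

    public static void main(String[] args) {
        List<Token> buf = List.of(new Token("fast", 0, 4),
                                  new Token("wi", 5, 7),
                                  new Token("fi", 8, 10));
        // A synthetic "wifi" token at position 1 with posLength 2 should
        // end where "fi" ends.
        System.out.println(endOffsetFor(buf, 1, 2)); // 10
    }
}
```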

 tie MockGraphTokenFilter into all analyzers tests
 -

 Key: LUCENE-3873
 URL: https://issues.apache.org/jira/browse/LUCENE-3873
 Project: Lucene - Java
  Issue Type: Task
  Components: modules/analysis
Reporter: Robert Muir
Assignee: Michael McCandless

 Mike made a MockGraphTokenFilter on LUCENE-3848.
 Many filters currently aren't tested with anything but a simple tokenstream.
 We should test them with this, too; it might find bugs (zero-length terms,
 stacked terms/synonyms, etc.).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Per Steffensen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238428#comment-13238428
 ] 

Per Steffensen commented on SOLR-3274:
--

But why not just try to reconnect if/when this situation has occurred, so that 
Solr can continue doing its work? I guess Solr does not do that, because it 
seems like once this error has first occurred there is no recovering, and 
certainly (I'm close to 100% positive) ZK will not keep giving 10+ sec 
response times to all requests, even though it might give a 10+ sec response once 
in a while.

Regards, Per Steffensen

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller


[jira] [Commented] (LUCENE-3909) Move Kuromoji to analysis.ja and introduce Japanese* naming

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238435#comment-13238435
 ] 

Christian Moen commented on LUCENE-3909:


Committed revisions 1305367 and 1305372 on {{branch_3x}}.

I forgot to rename a few Solr test classes.  Will follow up now in this JIRA.

 Move Kuromoji to analysis.ja and introduce Japanese* naming
 ---

 Key: LUCENE-3909
 URL: https://issues.apache.org/jira/browse/LUCENE-3909
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
Assignee: Christian Moen

 Lucene/Solr 3.6 and 4.0 will get out-of-the-box Japanese language support 
 through {{KuromojiAnalyzer}}, {{KuromojiTokenizer}} and various other 
 filters.  These filters currently live in 
 {{org.apache.lucene.analysis.kuromoji}}.
 I'm proposing that we move Kuromoji to a new Japanese package 
 {{org.apache.lucene.analysis.ja}} in line with how other languages are 
 organized.  As part of this, I also think we should rename 
 {{KuromojiAnalyzer}} to {{JapaneseAnalyzer}}, etc. to further align naming to 
 our conventions by making it very clear that these analyzers are for 
 Japanese.  (As much as I like the name Kuromoji, I think Japanese is more 
 fitting.)
 A potential issue I see with this that I'd like to raise and get feedback on, 
 is that end-users in Japan and elsewhere who use lucene-gosen could have 
 issues after an upgrade since lucene-gosen is in fact releasing its analyzers 
 under the {{org.apache.lucene.analysis.ja}} namespace (and we'd have a name 
 clash).
 I believe users should have the freedom to choose whichever Japanese 
 analyzer, filter, etc. they'd like to use, and I don't want to propose a name 
 change that just creates unnecessary problems for users, but I think the 
 naming proposed above is most fitting for a Lucene/Solr release.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Issue Comment Edited] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Issue Comment Edited) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238441#comment-13238441
 ] 

Uwe Schindler edited comment on LUCENE-3659 at 3/26/12 2:46 PM:


Robert: That was the first idea that came to my mind, too. I think that's a good 
idea. It's especially strange that the segments_xx/segments.gen file (which is 
not part of the current segment) is written with a MERGE/FLUSH context. Shouldn't 
it be written with a standard context? Or am I missing something? (This was the 
reason why I added the file name check.) Initially I was expecting that writing 
the commit is done with a separate IOContext, but it isn't - the noisy 
debugging helps.

  was (Author: thetaphi):
Robert: That was the first idea that came to my mind, too. I think thats a 
good idea. It especially strange that the segments_xx file (which is not part 
of the current segment) is written with MERGE/FLUSH context. It should be 
written with a standard context? Or do I miss something? (This was the reason 
why I added the file name check). Initially I was expecting that writing the 
commit is done with a separate IOContext, but it isn't - the noisy debugging 
helps.
  
 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238448#comment-13238448
 ] 

Robert Muir commented on LUCENE-3659:
-

I think if we were to implement it this way, it's not a burden on codecs.
By default, code somewhere in Lucene core always initializes the codec APIs with 
a context. For example SegmentInfos.write():
{code}
infosWriter.writeInfos(directory, segmentFileName, codec.getName(), this, 
IOContext.DEFAULT);
{code}

and DocFieldProcessor/SegmentMerger for fieldinfos:
{code}
infosWriter.write(state.directory, state.segmentName, state.fieldInfos, 
IOContext.DEFAULT);
{code}

These callers would just set this in the IOContext. Most/all codecs just pass it 
along.
If a codec wants to ignore the IOContext and lie about it, that's its own choice.
So I think it's an easy change.
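The pass-through behavior described above can be sketched as a toy model (the names, the enum, and the writer signature are illustrative, not the actual Lucene 4.0 API):

```java
// Toy sketch of the proposal above: core code picks the IOContext once,
// and codec-level writers simply thread it through unchanged.
public class IOContextSketch {
    // Illustrative stand-in for org.apache.lucene.store.IOContext
    enum IOContext { DEFAULT, MERGE, FLUSH }

    interface InfosWriter {
        String writeInfos(String fileName, IOContext ctx);
    }

    // A typical codec implementation just passes the context along;
    // a codec could also choose to ignore or override it.
    static final InfosWriter PASS_THROUGH =
        (fileName, ctx) -> fileName + " written with " + ctx;

    public static void main(String[] args) {
        // Core always supplies a context, e.g. DEFAULT for segments_N.
        System.out.println(PASS_THROUGH.writeInfos("segments_1", IOContext.DEFAULT));
    }
}
```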


 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238453#comment-13238453
 ] 

Robert Muir commented on LUCENE-3659:
-

But also codecs that write their own private tiny metadata files (like .per 
from PerFieldPostingsFormat)
should set this in the context.

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238454#comment-13238454
 ] 

Robert Muir commented on LUCENE-3659:
-

Live docs aren't metadata. I think you are conflating 'tiny' with 'metadata'.

I'm saying we should declare that it's metadata, that's all. This is pretty black 
and white!

If a directory wants to, as a heuristic, interpret metadata == tiny, then that's 
fine, but that's separate.

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2012-03-26 Thread Mark Miller (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238455#comment-13238455
 ] 

Mark Miller commented on SOLR-3274:
---

{quote}
But why not just try to reconnect if/when this situation has occurred, so that 
Solr can continue doing its work? I guess Solr does not do that, because it 
seems like once this error has first occurred there is no recovering, and 
certainly (I'm close to 100% positive) ZK will not keep giving 10+ sec 
response times to all requests, even though it might give a 10+ sec response once 
in a while.
{quote}

Solr does try to reconnect - but there can be no recovering due to the other 
issue you posted: you have changed the core admin URL.

 ZooKeeper related SolrCloud problems
 

 Key: SOLR-3274
 URL: https://issues.apache.org/jira/browse/SOLR-3274
 Project: Solr
  Issue Type: Bug
  Components: SolrCloud
Affects Versions: 4.0
 Environment: Any
Reporter: Per Steffensen
Assignee: Mark Miller


[jira] [Resolved] (SOLR-3262) Remove threads from DIH (Trunk only)

2012-03-26 Thread James Dyer (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Dyer resolved SOLR-3262.
--

Resolution: Fixed

Committed. Trunk: r1305384

 Remove threads from DIH (Trunk only)
 --

 Key: SOLR-3262
 URL: https://issues.apache.org/jira/browse/SOLR-3262
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Affects Versions: 4.0
Reporter: James Dyer
Assignee: James Dyer
Priority: Minor
 Fix For: 4.0

 Attachments: SOLR-3262.patch


 SOLR-1352 introduced a multi-threading feature for DataImportHandler.  
 Historically, this feature only seemed to work in a limited set of cases, and 
 I don't think we can guarantee users that using threads will behave 
 consistently.  Also, the multi-threaded option adds considerable complexity, 
 making code refactoring difficult. 
 I propose removing threads from Trunk (but keeping them in 3.x, applying any 
 bug fixes for them there).  This can be a first step in improving the DIH code 
 base.  
 Eventually we can possibly add a carefully thought-out threads 
 implementation back in.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



3.6 status

2012-03-26 Thread Robert Muir
Please wrap up your changes to 3.6 by Wednesday.

As described earlier: on Wednesday branch_3x becomes our release
branch. I will move all JIRA issues out of 3.6 unless they are marked
as blocker bugs.
I will then send an email that the branch is frozen and any changes
should have an associated JIRA issue.

Wednesday is ~3.5 weeks since the initial "please wrap up in 2 weeks"
email, so I think it's fair.

-- 
lucidimagination.com

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Created] (SOLR-3276) Update ja_text entry in schema.xml with useful info

2012-03-26 Thread Christian Moen (Created) (JIRA)
Update ja_text entry in schema.xml with useful info
---

 Key: SOLR-3276
 URL: https://issues.apache.org/jira/browse/SOLR-3276
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.6, 4.0
Reporter: Christian Moen


Searching Japanese text is a big topic with many considerations that need to be 
made.  I think it's helpful to add a link to the wiki in a comment near 
{{text_ja}} in {{schema.xml}} to guide users to detailed information on 
features available, how to use them, etc.

I've made a placeholder page on 
[http://wiki.apache.org/solr/JapaneseLanguageSupport] and I'll add details 
post-release.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Assigned] (SOLR-3276) Update ja_text entry in schema.xml with useful info

2012-03-26 Thread Christian Moen (Assigned) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christian Moen reassigned SOLR-3276:


Assignee: Christian Moen

 Update ja_text entry in schema.xml with useful info
 ---

 Key: SOLR-3276
 URL: https://issues.apache.org/jira/browse/SOLR-3276
 Project: Solr
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
Assignee: Christian Moen

 Searching Japanese text is a big topic with many considerations that need to 
 be made.  I think it's helpful to add a link to the wiki in a comment near 
 {{text_ja}} in {{schema.xml}} to guide users to detailed information on 
 features available, how to use them, etc.
 I've made a placeholder page on 
 [http://wiki.apache.org/solr/JapaneseLanguageSupport] and I'll add details 
 post-release.




[jira] [Created] (LUCENE-3924) Optimize buffer size handling in RAMDirectory to make it more GC friendly

2012-03-26 Thread Uwe Schindler (Created) (JIRA)
Optimize buffer size handling in RAMDirectory to make it more GC friendly
-

 Key: LUCENE-3924
 URL: https://issues.apache.org/jira/browse/LUCENE-3924
 Project: Lucene - Java
  Issue Type: Improvement
  Components: core/store
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0


RAMDirectory currently uses a fixed buffer size of 1024 bytes to allocate 
memory. This is very wasteful for large indexes. Improvements may be:
- per-file buffer sizes based on IOContext and maximum segment size
- allocate only one buffer for files that are copied from another directory
- dynamically increase the buffer size when files grow (makes seek() complicated)
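The first idea above can be sketched as a small sizing heuristic. This is only an illustration: the `Context` enum, the `chooseBufferSize` name, and the size thresholds below are hypothetical stand-ins, not Lucene's actual IOContext API.

```java
// Hypothetical sketch of per-file buffer sizing for a RAMDirectory-like class.
// The Context enum and the thresholds are illustrative, not Lucene's IOContext.
public class BufferSizing {
    enum Context { FLUSH, MERGE, READ_COPY }

    // Pick a buffer size from the expected file length instead of a fixed
    // 1024 bytes, so large files need far fewer buffer objects (more GC friendly).
    static int chooseBufferSize(Context ctx, long expectedFileLength) {
        if (ctx == Context.READ_COPY && expectedFileLength > 0) {
            // copying a known-length file from another directory:
            // one buffer for the whole file
            return (int) Math.min(expectedFileLength, Integer.MAX_VALUE);
        }
        if (expectedFileLength <= 0) {
            return 1024; // unknown size: fall back to the old default
        }
        // round up to a power of two between 1 KB and 1 MB
        int size = 1024;
        while (size < expectedFileLength && size < 1 << 20) {
            size <<= 1;
        }
        return size;
    }

    public static void main(String[] args) {
        System.out.println(chooseBufferSize(Context.FLUSH, 100_000));  // 131072
        System.out.println(chooseBufferSize(Context.FLUSH, -1));       // 1024
        System.out.println(chooseBufferSize(Context.READ_COPY, 5000)); // 5000
    }
}
```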




[jira] [Created] (LUCENE-3925) Spatial field types should not store doc frequencies or positions

2012-03-26 Thread David Smiley (Created) (JIRA)
Spatial field types should not store doc frequencies or positions
-

 Key: LUCENE-3925
 URL: https://issues.apache.org/jira/browse/LUCENE-3925
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spatial
Reporter: Simon Willnauer
Assignee: David Smiley
Priority: Minor
 Fix For: 4.0


It appears the correction is simply to supply IndexOptions.DOCS_ONLY




[jira] [Created] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations

2012-03-26 Thread Uwe Schindler (Created) (JIRA)
Improve Javadocs of RAMDirectory to document its limitations


 Key: LUCENE-3926
 URL: https://issues.apache.org/jira/browse/LUCENE-3926
 Project: Lucene - Java
  Issue Type: Sub-task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0
 Attachments: LUCENE-3659.patch

Spinoff from several dev@lao issues:
- 
[http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
- issue LUCENE-3653

The use cases for RAMDirectory are very limited and to prevent users from using 
it for e.g. loading a 50 Gigabyte index from a file on disk, we should improve 
the javadocs.




[jira] [Updated] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3926:
--

Attachment: (was: LUCENE-3659.patch)

 Improve Javadocs of RAMDirectory to document its limitations
 

 Key: LUCENE-3926
 URL: https://issues.apache.org/jira/browse/LUCENE-3926
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/store
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Updated] (LUCENE-3659) Improve Javadocs of RAMDirectory to document its limitations and add improvements to make it more GC friendly on large indexes

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Issue Type: Sub-task  (was: Task)
Parent: LUCENE-3924

 Improve Javadocs of RAMDirectory to document its limitations and add 
 improvements to make it more GC friendly on large indexes
 --

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Sub-task
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Updated] (LUCENE-3659) Allow per-RAMFile buffer sizes based on IOContext and source of data (e.g. copy from another directory)

2012-03-26 Thread Uwe Schindler (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-3659:
--

Affects Version/s: (was: 3.5)
Fix Version/s: (was: 3.6)
  Summary: Allow per-RAMFile buffer sizes based on IOContext and 
source of data (e.g. copy from another directory)  (was: Improve Javadocs of 
RAMDirectory to document its limitations and add improvements to make it more 
GC friendly on large indexes)

 Allow per-RAMFile buffer sizes based on IOContext and source of data (e.g. 
 copy from another directory)
 ---

 Key: LUCENE-3659
 URL: https://issues.apache.org/jira/browse/LUCENE-3659
 Project: Lucene - Java
  Issue Type: Sub-task
Affects Versions: 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 4.0

 Attachments: LUCENE-3659.patch, LUCENE-3659.patch, LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




[jira] [Commented] (LUCENE-3925) Spatial field types should not store doc frequencies or positions

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3925?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238495#comment-13238495
 ] 

Robert Muir commented on LUCENE-3925:
-

+1

 Spatial field types should not store doc frequencies or positions
 -

 Key: LUCENE-3925
 URL: https://issues.apache.org/jira/browse/LUCENE-3925
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/spatial
Reporter: Simon Willnauer
Assignee: David Smiley
Priority: Minor
 Fix For: 4.0

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 It appears the correction is simply to supply IndexOptions.DOCS_ONLY




[jira] [Commented] (LUCENE-3926) Improve Javadocs of RAMDirectory to document its limitations

2012-03-26 Thread Uwe Schindler (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238496#comment-13238496
 ] 

Uwe Schindler commented on LUCENE-3926:
---

This issue should only do javadoc improvements!

 Improve Javadocs of RAMDirectory to document its limitations
 

 Key: LUCENE-3926
 URL: https://issues.apache.org/jira/browse/LUCENE-3926
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: core/store
Affects Versions: 3.5, 4.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.6, 4.0

 Attachments: LUCENE-3659.patch


 Spinoff from several dev@lao issues:
 - 
 [http://mail-archives.apache.org/mod_mbox/lucene-dev/201112.mbox/%3C001001ccbf1c%2471845830%24548d0890%24%40thetaphi.de%3E]
 - issue LUCENE-3653
 The use cases for RAMDirectory are very limited and to prevent users from 
 using it for e.g. loading a 50 Gigabyte index from a file on disk, we should 
 improve the javadocs.




RE: Case where StandardAnalyzer doesn't remove punctuation

2012-03-26 Thread colm.mchugh
Hi Steve,

thanks for your response. Totally makes sense, given that the comma character
is widely used in written number syntax (e.g. 1000 is the same as 1,000).
Thanks also for the notes re the mailing list and Nabble.

Colm.
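As an aside, the rule behind this behavior (a comma *between* digits does not break a token, a trailing comma does) can be approximated with a plain regex. This is only an illustration; StandardTokenizer's real grammar follows Unicode UAX#29 word-break rules, and the `CommaTokens` class below is a hypothetical sketch, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Rough approximation of why StandardAnalyzer keeps "1,000" as one token:
// a comma or period between word characters does not end a token,
// while a trailing comma does.
public class CommaTokens {
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+(?:[.,]\\w+)*").matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> toks = tokens("shipped 1,000 units, not 999");
        System.out.println(toks.get(1));          // 1,000 -- comma inside the number survives
        System.out.println(toks.contains("units,")); // false -- trailing comma is stripped
    }
}
```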

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Case-where-StandardAnalyzer-doesn-t-remove-punctuation-tp3848460p3858661.html
Sent from the Lucene - Java Developer mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-435) QParser must validate existance/absense of q parameter

2012-03-26 Thread Hoss Man (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238538#comment-13238538
 ] 

Hoss Man commented on SOLR-435:
---

bq. If the purpose of the QueryComponent is to be QParser agnostic and 
consequently unable to know if the 'q' parameter is even relevant, shouldn't it 
be up to the QParser to retrieve what it believes the query string to be from 
the request parameters?

Sorry ... i chose my words carelessly and wound up saying almost the exact 
opposite of what i meant.

What i should have said...

* QueryComponent is responsible for determining the QParser to use for the main 
query and passing the value of the q query-string param to the 
QParser.getParser(...) method
* QParser.getParser passes that query-string on to whatever QParserPlugin was 
selected, as the qstr param to createParser
* The QParser that gets created by the createParser call should do whatever 
validation it needs to do (including a null check) in its parse() method

In answer to your questions...

* QueryComponent can not do any validation of the q param, because it can't 
make any assumptions about what values are legal for the defType QParser -- 
not even a null check, because in the case of things like dismax, null is 
perfectly fine
* QParsers (and QParserPlugins) can't be made responsible for fetching the q 
param because they don't know if/when they are being used to parse the main 
query param, vs fq params, vs some other nested subquery
* by putting this kind of validation/error checking in the QParser.parse 
method, we ensure that it is used properly even when the QParser(s) are used 
for things like 'fq' params or in nested subqueries
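The last point can be sketched with stub types. These are simplified stand-ins, not Solr's real QParser classes; the idea is only that the validation lives in parse(), where each parser knows whether a missing query string is legal for it.

```java
// Simplified stand-ins for Solr's QParser classes, to show where validation
// belongs. A dismax-like parser may accept a null qstr; a lucene-like parser
// must reject it in its own parse() method, not in QueryComponent.
public class QParserSketch {
    static abstract class QParser {
        final String qstr;
        QParser(String qstr) { this.qstr = qstr; }
        abstract String parse(); // real Solr returns a Query; a String keeps this runnable
    }

    static class LuceneLikeParser extends QParser {
        LuceneLikeParser(String qstr) { super(qstr); }
        @Override String parse() {
            if (qstr == null || qstr.trim().isEmpty()) {
                // validate here, where we know q is required for this parser
                throw new IllegalArgumentException("missing required q parameter");
            }
            return "parsed:" + qstr;
        }
    }

    static class DismaxLikeParser extends QParser {
        DismaxLikeParser(String qstr) { super(qstr); }
        @Override String parse() {
            // null is fine here: fall back to an alternate query
            return qstr == null ? "match-all (q.alt)" : "parsed:" + qstr;
        }
    }
}
```

Because the check sits in parse(), the same parser behaves correctly whether it is handling the main q param, an fq param, or a nested subquery.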

bq. Hoss: I don't agree with your reasoning on the developer-user typo-ing the 
'q' parameter. If you mistype basically any parameter then clearly it is as if 
you didn't even specify that parameter and you get the default behavior of the 
parameter you were trying to type correctly but didn't. 

understood ... but most other situations the default behavior is either do 
nothing or error ... we don't have a lot of default behaviors which are 
give me tons of stuff ... if you use {{facet=true&faceet.field=foo}} (note 
the extra character) you don't silently get faceting on every field as a 
default -- you get no field faceting at all. if you mistype the q param name 
and get an error on your first attempt you immediately understand you did 
something wrong.  likewise if we made the default a matches nothing query, 
then you'd get no results and (hopefully) be suspicious enough to realize you 
made a mistake -- but if we give you a bunch of results by default you may not 
realize at all that you're looking at all results, not just the results of what 
you thought the query was.  the only situations i can think of where forgetting 
or mistyping a param name doesn't default to error or nothing are things with 
fixed expectations: start, rows, fl, etc...  Those have defaults that (if they 
don't match what you tried to specify) are immediately obvious ... the 'start' 
attribute on the docList returned is wrong, you get more results than you 
expected, you get field names you know you didn't specify, etc...  it's less 
obvious when you are looking at the results of a query that it's a match-all 
query instead of the query you thought you were specifying.

like i said ... i'm -0 to having a hardcoded default query for 
lucene/dismax/edismax ... if you feel strongly about it that's fine, although 
i would try to convince you match none is a better hardcoded default than 
'match all' (so that it's easier to recognize mistakes quickly) and i really 
don't think we should do it w/o also adding q.alt support to the LuceneQParser 
so people can override it.



 QParser must validate existance/absense of q parameter
 

 Key: SOLR-435
 URL: https://issues.apache.org/jira/browse/SOLR-435
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 3.6, 4.0

 Attachments: SOLR-435_q_defaults_to_all-docs.patch


 Each QParser should check if q exists or not.  For some it will be required, 
 for others not.
 Currently it throws a NullPointerException:
 {code}
 java.lang.NullPointerException
   at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36)
   at 
 org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
   at org.apache.solr.search.QParser.getQuery(QParser.java:80)
   at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67)
   at 
 org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150)
 ...
 {code}
 see:
 

[jira] [Commented] (LUCENE-3909) Move Kuromoji to analysis.ja and introduce Japanese* naming

2012-03-26 Thread Christian Moen (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-3909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238536#comment-13238536
 ] 

Christian Moen commented on LUCENE-3909:


Committed revision 1305421 on {{trunk}} and 1305437 to {{branch_3x}}.

 Move Kuromoji to analysis.ja and introduce Japanese* naming
 ---

 Key: LUCENE-3909
 URL: https://issues.apache.org/jira/browse/LUCENE-3909
 Project: Lucene - Java
  Issue Type: Improvement
  Components: modules/analysis
Affects Versions: 3.6, 4.0
Reporter: Christian Moen
Assignee: Christian Moen

 Lucene/Solr 3.6 and 4.0 will get out-of-the-box Japanese language support 
 through {{KuromojiAnalyzer}}, {{KuromojiTokenizer}} and various other 
 filters.  These filters currently live in 
 {{org.apache.lucene.analysis.kuromoji}}.
 I'm proposing that we move Kuromoji to a new Japanese package 
 {{org.apache.lucene.analysis.ja}} in line with how other languages are 
 organized.  As part of this, I also think we should rename 
 {{KuromojiAnalyzer}} to {{JapaneseAnalyzer}}, etc. to further align naming to 
 our conventions by making it very clear that these analyzers are for 
 Japanese.  (As much as I like the name Kuromoji, I think Japanese is more 
 fitting.)
 A potential issue I see with this that I'd like to raise and get feedback on, 
 is that end-users in Japan and elsewhere who use lucene-gosen could have 
 issues after an upgrade since lucene-gosen is in fact releasing its analyzers 
 under the {{org.apache.lucene.analysis.ja}} namespace (and we'd have a name 
 clash).
 I believe users should have the freedom to choose whichever Japanese 
 analyzer, filter, etc. they'd like to use, and I don't want to propose a name 
 change that just creates unnecessary problems for users, but I think the 
 naming proposed above is most fitting for a Lucene/Solr release.




[jira] [Reopened] (LUCENENET-466) optimisation for the GermanStemmer.vb‏

2012-03-26 Thread Christopher Currens (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Currens reopened LUCENENET-466:
---


I see what you're saying.  I missed that in the original conversation that was 
linked to in an earlier comment.

{quote}
ue occurs pretty often as an infix (think of *steuer*): about 1.5%
of the words of the German aspell dictionary are affected. ae and
oe are rather seldom.

Still, it may be worth a try, because the stemmer doesn't work
morphologically anyway. It doesn't really matter if steuer is
stemmed as steur or steu as long as it's consistent.
{quote}

I'm thinking that as long as it is made clear that this behavior is in the 
second stemmer, this would probably be an okay change to make as the second 
option in a way that doesn't break the root of the word.

 optimisation for the GermanStemmer.vb‏
 --

 Key: LUCENENET-466
 URL: https://issues.apache.org/jira/browse/LUCENENET-466
 Project: Lucene.Net
  Issue Type: Improvement
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Reporter: Prescott Nasser
Priority: Minor
 Fix For: Lucene.Net 3.0.3


 I have a little optimisation for the GermanStemmer.vb (in 
 Contrib.Analyzers) class. At the moment the function Substitute 
 converts the German umlauts "ä" to "a", "ö" to "o" and "ü" to "u". This 
 is not correct for German. They must be converted to "ae", 
 "oe" and "ue". So I can write the name "Björn" or "Bjoern" but not 
 "Bjorn". With this optimization a user can search for "Björn" and also 
 find "Bjoern".
  
 Here is the optimized code snippet:
  
 else if (buffer[c] == 'ä')
 {
     buffer[c] = 'a';
     buffer.Insert(c + 1, 'e');
 }
 else if (buffer[c] == 'ö')
 {
     buffer[c] = 'o';
     buffer.Insert(c + 1, 'e');
 }
 else if (buffer[c] == 'ü')
 {
     buffer[c] = 'u';
     buffer.Insert(c + 1, 'e');
 }
  
 Thank You
 Björn
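For reference, the same substitution is easy to express in Java as well. This is a standalone sketch of the proposed behavior, not the Lucene.Net patch itself, and the class and method names are made up for the example:

```java
// Standalone sketch of the proposed umlaut handling: expand ä/ö/ü to
// ae/oe/ue instead of just stripping the diaeresis, so "Björn" and
// "Bjoern" normalize to the same form.
public class UmlautExpansion {
    static String expandUmlauts(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (char ch : s.toCharArray()) {
            switch (ch) {
                case 'ä': sb.append("ae"); break;
                case 'ö': sb.append("oe"); break;
                case 'ü': sb.append("ue"); break;
                case 'ß': sb.append("ss"); break; // often handled the same way
                default:  sb.append(ch);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(expandUmlauts("Björn"));  // Bjoern
        System.out.println(expandUmlauts("Steuer")); // Steuer (no umlaut, unchanged)
    }
}
```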





[jira] [Updated] (SOLR-3231) Add the ability to KStemmer to preserve the original token when stemming

2012-03-26 Thread Mark Miller (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mark Miller updated SOLR-3231:
--

Affects Version/s: (was: 4.0)
Fix Version/s: 4.0

 Add the ability to KStemmer to preserve the original token when stemming
 

 Key: SOLR-3231
 URL: https://issues.apache.org/jira/browse/SOLR-3231
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jamie Johnson
 Fix For: 4.0

 Attachments: KStemFilter.patch


 While using the PorterStemmer, I found that there were often times that it 
 was far too aggressive in its stemming.  In my particular case it is 
 unrealistic to provide a protected word list which captures all possible 
 words which should not be stemmed.  To avoid this I proposed a solution 
 whereby we store the original token as well as the stemmed token so exact 
 searches would always work.  Based on discussions on the mailing list with 
 Ahmet Arslan, I believe the attached patch to KStemmer provides the desired 
 capabilities through a configuration parameter.  This is largely a copy of 
 org.apache.lucene.wordnet.SynonymTokenFilter.  
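The idea can be sketched independently of Lucene's TokenStream API. In a real TokenFilter the preserved original would be stacked at the same position via a 0 position increment; the class and method names below are illustrative only, not the patch's actual code.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of "preserve original when stemming": for each input
// token, emit the stem, and also keep the original form when it differs,
// so exact-match queries still hit. A real TokenFilter would emit the
// original at the same position (positionIncrement = 0).
public class PreserveOriginalSketch {
    // toy stemmer stand-in: strip a trailing "s"
    static String stem(String token) {
        return token.endsWith("s") ? token.substring(0, token.length() - 1) : token;
    }

    static List<String> analyze(String[] tokens, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            String s = stem(t);
            out.add(s);
            if (preserveOriginal && !s.equals(t)) {
                out.add(t); // same position as the stem in a real filter
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze(new String[] {"cats", "dog"}, true));  // [cat, cats, dog]
        System.out.println(analyze(new String[] {"cats", "dog"}, false)); // [cat, dog]
    }
}
```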




[JENKINS] Solr-3.x - Build # 642 - Failure

2012-03-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Solr-3.x/642/

No tests ran.

Build Log (for compile errors):
[...truncated 7071 lines...]
jar-analyzers-common:

common.init:

compile-lucene-core:

jflex-uptodate-check:

jflex-notice:

javacc-uptodate-check:

javacc-notice:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:

compile-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
[javac] Compiling 26 source files to 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:153:
 method does not override a method from its superclass
[javac]   @Override 
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:158:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:163:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:168:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:186:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:207:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:212:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:223:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:228:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:82:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:197:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:202:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:207:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:212:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Solr-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:217:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 

[JENKINS] Lucene-Solr-tests-only-3.x - Build # 12878 - Failure

2012-03-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x/12878/

No tests ran.

Build Log (for compile errors):
[...truncated 550 lines...]
 [echo] 

common.init:

compile-lucene-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:

compile-core:

jar-core:
 [exec] Result: 1
  [jar] Building jar: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/contrib/analyzers/common/lucene-analyzers-3.6-SNAPSHOT.jar

common.init:

compile-lucene-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
[javac] Compiling 26 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:153:
 method does not override a method from its superclass
[javac]   @Override 
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:158:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:163:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:168:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:186:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:207:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:212:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:223:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:228:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:82:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:197:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:202:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:207:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 

[JENKINS] Lucene-Solr-tests-only-3.x-java7 - Build # 2115 - Failure

2012-03-26 Thread Apache Jenkins Server
Build: https://builds.apache.org/job/Lucene-Solr-tests-only-3.x-java7/2115/

No tests ran.

Build Log (for compile errors):
[...truncated 532 lines...]
 [echo] 

common.init:

compile-lucene-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:

compile-core:

jar-core:
 [exec] Result: 1
  [jar] Building jar: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/contrib/analyzers/common/lucene-analyzers-3.6-SNAPSHOT.jar

common.init:

compile-lucene-core:

init:

clover.setup:

clover.info:
 [echo] 
 [echo]   Clover not found. Code coverage reports disabled.
 [echo] 

clover:

common.compile-core:
[mkdir] Created dir: 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
[javac] Compiling 26 source files to 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:153:
 method does not override a method from its superclass
[javac]   @Override 
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:158:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:163:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:168:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:186:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:207:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:212:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:223:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/BinaryDictionary.java:228:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:82:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:197:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:202:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 
/usr/home/hudson/hudson-slave/workspace/Lucene-Solr-tests-only-3.x-java7/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/analysis/ja/dict/UserDictionary.java:207:
 method does not override a method from its superclass
[javac]   @Override
[javac]^
[javac] 

RE: [JENKINS] Solr-3.x - Build # 642 - Failure

2012-03-26 Thread Uwe Schindler
Christian:

Could it be that you simply merged from trunk, where the change was done as an 
svn copy? In that case you did not really merge the changes into the 3.x 
files; your commit removed the old files and replaced them with new copies. 
@Override on interface implementations is not compatible with Java 5. You 
should maybe test-build with Java 5.

Yesterday I did something similar (I moved a file around in trunk) and wanted 
to backport that change. This is very risky when done by merge, as it removes 
the old file and adds the new one instead of renaming. What I did in the 
end: I renamed the files in 3.x by hand and then did a no-op merge to record 
the merge properties.
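
As an aside, the failure mode is easy to reproduce: Java 5's javac only 
accepted @Override on methods that override a superclass method, so annotating 
a method that merely implements an interface method fails with exactly the 
"method does not override a method from its superclass" error seen in the 
logs; Java 6 relaxed this. A minimal sketch with made-up names:

```java
// Compiles under Java 6+, but fails under "javac -source 1.5" with
// "method does not override a method from its superclass": Java 5 did not
// allow @Override on methods that implement an interface method.
interface Dict {
    int lookup(String word);
}

public class BinaryDict implements Dict {
    @Override  // implements an interface method, not a superclass override
    public int lookup(String word) {
        return word.length();  // dummy body for illustration
    }

    public static void main(String[] args) {
        System.out.println(new BinaryDict().lookup("kuromoji"));  // prints 8
    }
}
```

Dropping the annotation (or compiling with -source 1.6) makes the same file 
build on both compilers.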

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
 Sent: Monday, March 26, 2012 6:52 PM
 To: dev@lucene.apache.org
 Subject: [JENKINS] Solr-3.x - Build # 642 - Failure
 
 Build: https://builds.apache.org/job/Solr-3.x/642/
 
 No tests ran.
 
 Build Log (for compile errors):
 [...truncated 7071 lines...]
 jar-analyzers-common:
 
 common.init:
 
 compile-lucene-core:
 
 jflex-uptodate-check:
 
 jflex-notice:
 
 javacc-uptodate-check:
 
 javacc-notice:
 
 init:
 
 clover.setup:
 
 clover.info:
  [echo]
  [echo]   Clover not found. Code coverage reports disabled.
  [echo]
 
 clover:
 
 common.compile-core:
 
 compile-core:
 
 init:
 
 clover.setup:
 
 clover.info:
  [echo]
  [echo]   Clover not found. Code coverage reports disabled.
  [echo]
 
 clover:
 
 common.compile-core:
 [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
 [javac] Compiling 26 source files to /usr/home/hudson/hudson-
 slave/workspace/Solr-
 3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:153: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:158: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:163: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:168: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:186: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:207: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:212: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:223: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:228: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/UserDictionary.java:82: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/UserDictionary.java:197: method does not override a method
 from its superclass
 [javac]   @Override
 [javac]^
 [javac] 

[jira] [Commented] (SOLR-3231) Add the ability to KStemmer to preserve the original token when stemming

2012-03-26 Thread Robert Muir (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238557#comment-13238557
 ] 

Robert Muir commented on SOLR-3231:
---

I don't think we should approach the problem this way: this is the 
same discussion as LUCENE-3415

 Add the ability to KStemmer to preserve the original token when stemming
 

 Key: SOLR-3231
 URL: https://issues.apache.org/jira/browse/SOLR-3231
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jamie Johnson
 Fix For: 4.0

 Attachments: KStemFilter.patch


 While using the PorterStemmer, I found that it was often far too aggressive 
 in its stemming.  In my particular case it is unrealistic to provide a 
 protected word list which captures all possible words which should not be 
 stemmed.  To avoid this I proposed a solution whereby we store the original 
 token as well as the stemmed token, so exact searches would always work.  
 Based on discussions on the mailing list with Ahmet Arslan, I believe the 
 attached patch to KStemmer provides the desired capabilities through a 
 configuration parameter.  It is largely a copy of 
 org.apache.lucene.wordnet.SynonymTokenFilter.  

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-435) QParser must validate existance/absense of q parameter

2012-03-26 Thread Ryan McKinley (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238560#comment-13238560
 ] 

Ryan McKinley commented on SOLR-435:


bq. if no query string is supplied, or if its blank or just whitespace, then 
the default is to match all documents.

-0

When I opened this issue (4 years ago!) I was only worried that you get an NPE 
from a missing 'q'.

bq. don't think we should do it w/o also adding q.alt support to the LuceneQParser 
so people can override it.

+1

Match none seems like the most appropriate behavior unless you explicitly say 
something else 


 QParser must validate existance/absense of q parameter
 

 Key: SOLR-435
 URL: https://issues.apache.org/jira/browse/SOLR-435
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
Reporter: Ryan McKinley
Assignee: Ryan McKinley
 Fix For: 3.6, 4.0

 Attachments: SOLR-435_q_defaults_to_all-docs.patch


 Each QParser should check if q exists or not.  For some it will be required, 
 for others not.
 Currently it throws a NullPointerException:
 {code}
 java.lang.NullPointerException
   at org.apache.solr.common.util.StrUtils.splitSmart(StrUtils.java:36)
   at 
 org.apache.solr.search.OldLuceneQParser.parse(LuceneQParserPlugin.java:104)
   at org.apache.solr.search.QParser.getQuery(QParser.java:80)
   at 
 org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:67)
   at 
 org.apache.solr.handler.SearchHandler.handleRequestBody(SearchHandler.java:150)
 ...
 {code}
 see:
 http://www.nabble.com/query-parsing-error-to14124285.html#a14140108




[jira] [Updated] (LUCENENET-466) optimisation for the GermanStemmer.vb‏

2012-03-26 Thread Christopher Currens (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENENET-466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Christopher Currens updated LUCENENET-466:
--

Attachment: DIN2Stemmer.patch

Bjorn,

I've made this patch from the src/contrib/Analyzers folder, on top of the DIN2 
changes already committed to trunk.  Since the extent of my German is "danke!", 
I was hoping you could see if this stemmer is working properly before I commit 
it to trunk.

These are the test cases I made, which should hopefully emulate the results of 
the normal DIN1 stemmer; the word to the left of the semicolon is the input, 
and to the right is the expected stem.

{noformat}
# Test cases for words with ae, ue, or oe in them
Haus;hau
Hauses;hau
Haeuser;hau
Haeusern;hau
steuer;steur
rueckwaerts;ruckwar
geheimtuer;geheimtur
{noformat}

The last word in particular produces fairly different results in each 
stemmer, though I think they are expected, due to the different DIN.

The DIN2 stemmer will also translate 'Häuser' and 'Häusern' properly (to 
'hau'), so there is support for both umlauts and the expanded 'ae', 'oe' and 
'ue' forms.

 optimisation for the GermanStemmer.vb‏
 --

 Key: LUCENENET-466
 URL: https://issues.apache.org/jira/browse/LUCENENET-466
 Project: Lucene.Net
  Issue Type: Improvement
  Components: Lucene.Net Contrib
Affects Versions: Lucene.Net 2.9.4, Lucene.Net 2.9.4g, Lucene.Net 3.0.3
Reporter: Prescott Nasser
Priority: Minor
 Fix For: Lucene.Net 3.0.3

 Attachments: DIN2Stemmer.patch


 I have a little optimisation for the GermanStemmer.vb (in 
 Contrib.Analyzers) class. At the moment the function Substitute 
 converts the German umlauts 'ä' to 'a', 'ö' to 'o' and 'ü' to 'u'. This 
 is not the correct German transliteration. They must be converted to 'ae', 
 'oe' and 'ue'. So I can write the name Björn or Bjoern, but not 
 Bjorn. With this optimization a user can search for Björn and also 
 find Bjoern.
  
 Here is the optimized code snippet:
  
 else if (buffer[c] == 'ä')
 {
     buffer[c] = 'a';
     buffer.Insert(c + 1, 'e');
 }
 else if (buffer[c] == 'ö')
 {
     buffer[c] = 'o';
     buffer.Insert(c + 1, 'e');
 }
 else if (buffer[c] == 'ü')
 {
     buffer[c] = 'u';
     buffer.Insert(c + 1, 'e');
 }
  
 Thank You
 Björn
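
For reference, the substitution described above can be sketched as a 
standalone routine. This is a Java illustration with made-up names (the 
actual patch is C# against GermanStemmer in Contrib.Analyzers):

```java
// Expand German umlauts per the "DIN2" convention discussed in this thread:
// 'ä' -> 'ae', 'ö' -> 'oe', 'ü' -> 'ue', so that "Björn" and "Bjoern"
// normalize to the same form. Class and method names are illustrative only.
public class UmlautExpander {
    public static String expand(String s) {
        StringBuilder buffer = new StringBuilder(s);
        for (int c = 0; c < buffer.length(); c++) {
            char ch = buffer.charAt(c);
            if (ch == 'ä') {
                buffer.setCharAt(c, 'a');
                buffer.insert(c + 1, 'e');  // loop skips the inserted 'e'
            } else if (ch == 'ö') {
                buffer.setCharAt(c, 'o');
                buffer.insert(c + 1, 'e');
            } else if (ch == 'ü') {
                buffer.setCharAt(c, 'u');
                buffer.insert(c + 1, 'e');
            }
        }
        return buffer.toString();
    }

    public static void main(String[] args) {
        System.out.println(expand("Björn"));   // prints Bjoern
        System.out.println(expand("Häuser"));  // prints Haeuser
    }
}
```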





Re: [JENKINS] Solr-3.x - Build # 642 - Failure

2012-03-26 Thread Robert Muir
I took care of the @Overrides.

On Mon, Mar 26, 2012 at 12:55 PM, Uwe Schindler u...@thetaphi.de wrote:
 Christian:

 Could it be that you simply merged from trunk, where the change was done as an 
 svn copy? In that case you did not really merge the changes into the 3.x 
 files; your commit removed the old files and replaced them with new copies. 
 @Override on interface implementations is not compatible with Java 5. You 
 should maybe test-build with Java 5.

 Yesterday I did something similar (I moved a file around in trunk) and wanted 
 to backport that change. This is very risky when done by merge, as it removes 
 the old file and adds the new one instead of renaming. What I did in the 
 end: I renamed the files in 3.x by hand and then did a no-op merge to 
 record the merge properties.

 Uwe

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


 -Original Message-
 From: Apache Jenkins Server [mailto:jenk...@builds.apache.org]
 Sent: Monday, March 26, 2012 6:52 PM
 To: dev@lucene.apache.org
 Subject: [JENKINS] Solr-3.x - Build # 642 - Failure

 Build: https://builds.apache.org/job/Solr-3.x/642/

 No tests ran.

 Build Log (for compile errors):
 [...truncated 7071 lines...]
 jar-analyzers-common:

 common.init:

 compile-lucene-core:

 jflex-uptodate-check:

 jflex-notice:

 javacc-uptodate-check:

 javacc-notice:

 init:

 clover.setup:

 clover.info:
      [echo]
      [echo]       Clover not found. Code coverage reports disabled.
      [echo]

 clover:

 common.compile-core:

 compile-core:

 init:

 clover.setup:

 clover.info:
      [echo]
      [echo]       Clover not found. Code coverage reports disabled.
      [echo]

 clover:

 common.compile-core:
     [mkdir] Created dir: /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
     [javac] Compiling 26 source files to /usr/home/hudson/hudson-
 slave/workspace/Solr-
 3.x/checkout/lucene/build/contrib/analyzers/kuromoji/classes/java
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:153: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:158: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:163: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:168: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:186: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:207: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:212: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:223: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/BinaryDictionary.java:228: method does not override a method
 from its superclass
     [javac]   @Override
     [javac]    ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/UserDictionary.java:82: method does not override a method
 from its superclass
     [javac]       @Override
     [javac]        ^
     [javac] /usr/home/hudson/hudson-slave/workspace/Solr-
 3.x/checkout/lucene/contrib/analyzers/kuromoji/src/java/org/apache/lucene/a
 nalysis/ja/dict/UserDictionary.java:197: method does not override a method
 

Re: Indexing Boolean Expressions

2012-03-26 Thread J. Delgado
In full disclosure, there is a patent application that Yahoo! has filed
covering the use of inverted indexes with complex predicates for matching
contracts and opportunities in advertising:
http://www.google.com/patents/US20110016109?printsec=abstract#v=onepageqf=false

However, I believe there are many more applications that can benefit from
similar matching techniques (e.g. recommender systems, e-commerce,
recruiting, etc.), making it worthwhile to implement the ideas presented in
the original VLDB'09 paper (which is public) in Lucene.

As a Yahoo! employee, I might not be able to directly contribute to this
project, but I will be happy to point to any publicly available material
that can help.

Cheers,

-- Joaquin


On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev 
mkhlud...@griddynamics.com wrote:

 Hello Joaquin,

 I looked through the paper several times, and see no problem implementing
 it in Lucene (the trivial case at least):

 Let's index conjunctive condition as
  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}

 then, form query from the incoming fact (event):
 fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD

 to enforce overlap between condition and event, wrap the query above into a
 custom query whose scorer will check that numClauses for the matched doc is
 equal to the number of matched clauses.
 To get numClauses for the matched doc you can use the FieldCache, which is damn
 fast; the number of matched clauses can be obtained from
 DisjunctionSumScorer.nrMatchers().

 Negative clauses and multi-valued fields can be covered too, I believe.

 WDYT?
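
 The counting trick can be sketched outside Lucene with plain hash maps.
 ConjunctionMatcher and its methods are made-up names; a real implementation
 would get the clause count from the FieldCache and the hit count from
 DisjunctionSumScorer.nrMatchers() as described above:

```java
import java.util.*;

// Sketch of the counting trick: each indexed conjunction records its clause
// count (numClauses); an event matches a conjunction iff the number of the
// conjunction's clauses hit by the event equals that count.
public class ConjunctionMatcher {
    // posting list: clause "field:value" -> ids of conjunctions containing it
    private final Map<String, List<Integer>> postings = new HashMap<>();
    private final Map<Integer, Integer> numClauses = new HashMap<>();
    private int nextId = 0;

    public int index(Set<String> clauses) {
        int id = nextId++;
        numClauses.put(id, clauses.size());
        for (String clause : clauses)
            postings.computeIfAbsent(clause, k -> new ArrayList<>()).add(id);
        return id;
    }

    // An event is an assignment of values to fields, e.g. {"fieldA:valA", ...}.
    public Set<Integer> match(Set<String> event) {
        Map<Integer, Integer> hits = new HashMap<>();
        for (String clause : event)
            for (int id : postings.getOrDefault(clause, List.of()))
                hits.merge(id, 1, Integer::sum);   // count matched clauses
        Set<Integer> result = new TreeSet<>();
        for (Map.Entry<Integer, Integer> e : hits.entrySet())
            if (e.getValue().equals(numClauses.get(e.getKey())))
                result.add(e.getKey());            // full conjunction matched
        return result;
    }

    public static void main(String[] args) {
        ConjunctionMatcher m = new ConjunctionMatcher();
        int c0 = m.index(Set.of("fieldA:valA", "fieldB:valB", "fieldC:valC"));
        int c1 = m.index(Set.of("fieldA:valA", "fieldD:valD"));
        // An event supplying A, B, C and D satisfies both conjunctions.
        Set<Integer> both = m.match(Set.of("fieldA:valA", "fieldB:valB",
                                           "fieldC:valC", "fieldD:valD"));
        System.out.println(both);                            // prints [0, 1]
        // An event supplying only A satisfies neither in full.
        System.out.println(m.match(Set.of("fieldA:valA")));  // prints []
    }
}
```

 Negative clauses would need an extra veto check on top of the count, and
 multi-valued fields work unchanged, since each value is just another
 posting entry.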


 On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado joaquin.delg...@gmail.com wrote:

 I looked at LUCENE-2987 and its work on the query side (changes to the
 accepted syntax to accept lower case 'or' and 'and'), which isn't really
 related to my proposal.

 What I'm proposing is to be able to index complex boolean expressions
 using Lucene. This can be viewed as the opposite of the regular search
 task. The objective here is find a set of relevant queries given a document
 (assignment of values to fields).

 This by itself may not sound that interesting, but it's a key piece
 for efficiently implementing any MATCHING system, which is effectively a
 two-way search where constraints are defined both ways. An example of this
 would be:

 1) Job matching: Potential employers define their job postings as
 documents, along with complex boolean expressions used to narrow potential
 candidates. Job searchers upload their profiles and may formulate complex
 queries when executing a search. Once a search is initiated from either
 side, constraints need to be satisfied both ways.
 2) Advertising: Publishers define constraints on the type of
 advertisers/ads they are willing to show on their sites. On the other hand,
 advertisers define constraints (typically at the campaign level) on the
 publisher sites they want their ads to show on, as well as on the user
 audiences they are targeting. While some attribute values are known at
 definition time, others are only instantiated once the user visits a given
 page, which triggers a matching request that must be satisfied in a
 few milliseconds to select valid ads, which are then scored based on relevance.

 So in a matching system a MATCH QUERY is considered to be a tuple
 consisting of a value assignment to attributes/fields (a doc) plus a boolean
 expression (a query), which goes against a double index also built on tuples
 that simultaneously hold boolean expressions and associated documents.

 To do this efficiently we need to be able to build indexes on Boolean
 expressions (Lucene queries) and retrieve the set of matching expressions
 given a doc (typically a few attributes with values assigned), which is the
 core of what is described in the paper Indexing Boolean Expressions
 (see http://www.vldb.org/pvldb/2/vldb09-83.pdf)

 -- J



 On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera calcmaste...@gmail.com wrote:

  On 02/21/2012 12:15 PM, Aayush Kothari wrote:




  So if Aayush Kothari is interested in working on this as a Student,
 all we need is a formal mentor (I can be the informal one).

  Anyone up for the task?


   Completely interested in working for and learning about the
 aforementioned subject/project. +1.

 This may be related to the work I'm doing with LUCENE-2987, basically
 changing the grammar to accept the conjunctions AND and OR in the
 query text.
 I would be interested in working with you on some of the details.

 However, I too am not a formal committer.

 --
 Joe Cabrera
 eminorlabs.com





 --
 Sincerely yours
 Mikhail Khludnev
 Lucid Certified
 Apache Lucene/Solr Developer
 Grid Dynamics

 http://www.griddynamics.com
  mkhlud...@griddynamics.com




[jira] [Commented] (SOLR-3231) Add the ability to KStemmer to preserve the original token when stemming

2012-03-26 Thread Jamie Johnson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-3231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13238564#comment-13238564
 ] 

Jamie Johnson commented on SOLR-3231:
-

Thanks Robert.  I just read LUCENE-3415 and understand the approach.  My 
biggest issue is that I don't like having to create a separate field to do an 
exact search; this, of course, is based on the fact that I was burned by it, so 
perhaps I am biased.  It feels like the right thing to do, from a user of the 
API's perspective, would be the least destructive thing, but again I have a 
specific use case in mind and am not considering all other implications.

 Add the ability to KStemmer to preserve the original token when stemming
 

 Key: SOLR-3231
 URL: https://issues.apache.org/jira/browse/SOLR-3231
 Project: Solr
  Issue Type: Improvement
  Components: Schema and Analysis
Reporter: Jamie Johnson
 Fix For: 4.0

 Attachments: KStemFilter.patch


 While using the PorterStemmer, I found that it was often far too aggressive 
 in its stemming.  In my particular case it is unrealistic to provide a 
 protected word list which captures all possible words which should not be 
 stemmed.  To avoid this I proposed a solution whereby we store the original 
 token as well as the stemmed token, so exact searches would always work.  
 Based on discussions on the mailing list with Ahmet Arslan, I believe the 
 attached patch to KStemmer provides the desired capabilities through a 
 configuration parameter.  It is largely a copy of 
 org.apache.lucene.wordnet.SynonymTokenFilter.  




Re: Indexing Boolean Expressions

2012-03-26 Thread J. Delgado
BTW, the idea of indexing Boolean expressions inside a text indexing engine
is not new. For example, Oracle Text provides the CTXRULE index and the
MATCHES operator within their indexing stack, which is primarily used for
rule-based text classification.

See:

http://docs.oracle.com/cd/B28359_01/text.111/b28303/query.htm#autoId8

http://docs.oracle.com/cd/B28359_01/text.111/b28303/classify.htm#g1011013

-- J

On Mon, Mar 26, 2012 at 10:07 AM, J. Delgado joaquin.delg...@gmail.com wrote:

 In full disclosure, there is a patent application that Yahoo! has filed for
 the use of inverted indexes for using complex  predicates for matching
 contracts and opportunities in advertising:

 http://www.google.com/patents/US20110016109?printsec=abstract#v=onepageqf=false

 However I believe there are many more applications that can benefit from
 similar matching techniques (i.e. recommender systems,
 e-commerce, recruiting,etc) to make it worthwhile implementing the ideas
 exposed in the original VLDB'09 paper (which is public) in Lucene.

 As a Yahoo! employee, I might not be able to directly contribute to this
 project but will be happy to point to any publicly available pointer that
 can help.

 Cheers,

 -- Joaquin


 On Sun, Mar 25, 2012 at 11:44 PM, Mikhail Khludnev 
 mkhlud...@griddynamics.com wrote:

 Hello Joaquin,

 I looked through the paper several times, and I see no problem implementing
 it in Lucene (the trivial case, at least):

 Let's index conjunctive condition as
  {fieldA:valA,fieldB:valB,fieldC:valC,numClauses:3}

 then, form query from the incoming fact (event):
 fieldA:valA OR fieldB:valB OR fieldC:valC OR fieldD:valD

 to enforce overlap between condition and event, wrap the query above in our
 own query, whose scorer will check that numClauses for the matched doc is
 equal to the number of matched clauses.
 To get numClauses for the matched doc you can use FieldCache, which is damn
 fast; the number of matched clauses can be obtained from
 DisjunctionSumScorer.nrMatchers()
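The counting trick above can be simulated in plain Java, independent of Lucene. FieldCache and DisjunctionSumScorer are the real Lucene classes named above, but everything below (the BooleanExpressionMatch class and its matches method) is an illustrative sketch of the numClauses check, not actual Lucene scorer code:

```java
import java.util.Map;

// Sketch of the numClauses idea: a stored conjunctive condition matches an
// incoming event iff every one of its clauses is satisfied, detected by
// comparing the count of matched clauses to the condition's clause count.
public class BooleanExpressionMatch {

    // event: field -> value assignment from the incoming fact
    // condition: the conjunctive clauses of one indexed expression
    static boolean matches(Map<String, String> event, Map<String, String> condition) {
        int matched = 0; // plays the role of DisjunctionSumScorer.nrMatchers()
        for (Map.Entry<String, String> clause : condition.entrySet()) {
            if (clause.getValue().equals(event.get(clause.getKey()))) {
                matched++;
            }
        }
        // plays the role of the numClauses field check in the custom scorer
        return matched == condition.size();
    }

    public static void main(String[] args) {
        Map<String, String> condition =
                Map.of("fieldA", "valA", "fieldB", "valB", "fieldC", "valC");
        Map<String, String> event =
                Map.of("fieldA", "valA", "fieldB", "valB",
                       "fieldC", "valC", "fieldD", "valD");
        Map<String, String> partial = Map.of("fieldA", "valA", "fieldD", "valD");

        System.out.println(matches(event, condition));   // true: all 3 clauses overlap
        System.out.println(matches(partial, condition)); // false: only 1 of 3 matched
    }
}
```

In real Lucene the disjunctive query finds candidate conditions cheaply, and this equality check only runs on docs that matched at least one clause.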

 Negative clauses and multi-valued fields can also be covered, I believe.

 WDYT?


 On Mon, Mar 5, 2012 at 10:05 PM, J. Delgado joaquin.delg...@gmail.com wrote:

 I looked at LUCENE-2987 and its work on the query side (changes to the
 accepted syntax to accept lower case 'or' and 'and'), which isn't really
 related to my proposal.

 What I'm proposing is to be able to index complex boolean expressions
 using Lucene. This can be viewed as the opposite of the regular search
 task: the objective here is to find the set of relevant queries given a
 document (an assignment of values to fields).

 This by itself may not sound that interesting, but it's a key piece
 in efficiently implementing any MATCHING system, which is effectively a
 two-way search where constraints are defined both ways. Examples of this
 would be:

 1) Job matching: Potential employers define their job postings as
 documents along with complex boolean expressions used to narrow potential
 candidates. Job seekers upload their profiles and may formulate complex
 queries when executing a search. Once a search is initiated from either
 side, constraints need to be satisfied both ways.
 2) Advertising: Publishers define constraints on the type of
 advertisers/ads they are willing to show on their sites. On the other hand,
 advertisers define constraints (typically at the campaign level) on the
 publisher sites where they want their ads to show, as well as on the user
 audiences they are targeting. While some attribute values are known at
 definition time, others are only instantiated once the user visits a given
 page, which triggers a matching request that must be satisfied in a
 few milliseconds to select valid ads, which are then scored based on relevance.

 So in a matching system, a MATCH QUERY is considered to be a tuple
 consisting of a value assignment to attributes/fields (doc) + a boolean
 expression (query), which goes against a double index also built on tuples
 that simultaneously contain boolean expressions and associated documents.

 To do this efficiently, we need to be able to build indexes on Boolean
 expressions (Lucene queries) and retrieve the set of matching expressions
 given a doc (typically a few attributes with assigned values), which is the
 core of what is described in the paper Indexing Boolean Expressions
 (see http://www.vldb.org/pvldb/2/vldb09-83.pdf)

 -- J


 So to effectively resolve the problem of realtime matching one can

 On Tue, Feb 21, 2012 at 2:18 PM, Joe Cabrera calcmaste...@gmail.com wrote:

  On 02/21/2012 12:15 PM, Aayush Kothari wrote:




  So if Aayush Kothari is interested in working on this as a Student,
 all we need is a formal mentor (I can be the informal one).

  Anyone up for the task?


   Completely interested in working for and learning about the
 aforementioned subject/project. +1.

 This may be related to the work I'm doing with LUCENE-2987,
 basically changing the grammar to accept the conjunctions AND and OR in
 the query text.
 I would be interested in 

Re: Indexing Boolean Expressions

2012-03-26 Thread Walter Underwood
Efficient rule matching goes further back, at least to alerting in Verity K2.

wunder
Search Guy, Chegg


copyField and precedence with dynamic fields

2012-03-26 Thread Erick Erickson
This seems like it warrants a JIRA; on a quick look I couldn't find an
existing issue for it.

From a client:

Here's a fragment of a schema file:
<fields>
   <field name="id" type="string" indexed="true" stored="true"
required="true" />
   <field name="title_text" type="text_general" indexed="true"
stored="true" multiValued="false" />
   <field name="title_phonetic" type="phonetic" indexed="true"
stored="true" multiValued="false" />

   <dynamicField name="*_text" type="text_general" indexed="true"
stored="false" />
   <dynamicField name="*_phonetic" type="phonetic" indexed="true"
stored="false" />
 </fields>
 <copyField source="*_text" dest="*_phonetic" />

Here's an input doc:
<add>
  <doc>
    <field name="id">ID1</field>
    <field name="title_text">1st Document</field>
    <field name="description_text">Another field</field>
  </doc>
</add>

OK, add the doc with the above schema, and do a q=*:*&fl=* query.

The response does NOT contain title_phonetic.

It looks like IndexSchema.registerCopyField won't notice that
title_phonetic is a non-dynamic field, and so it never creates a
title_text -> title_phonetic mapping.
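If the client needs the copy today, one possible workaround (an untested assumption on my part, based only on the schema fragment above) is to declare the copy explicitly for each concrete field rather than relying on the wildcard pair:

```xml
<!-- hypothetical workaround: explicit per-field copy
     instead of the *_text -> *_phonetic wildcard pair -->
<copyField source="title_text" dest="title_phonetic" />
```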



Is this a JIRA or intended or just not worth fixing?

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org


