Re: Iso accents and wildcards

2009-10-30 Thread jfmelian
If the request contains any wildcard, the filters are not called:
no ISOLatin1AccentFilterFactory and no SnowballPorterFilterFactory!

"économie" is indexed to "econom"

So Solr finds nothing for:
 - terms starting with "éco" (éco*)
 - terms starting with "economi" (economi*)

If you index manger, mangé and mangue, the indexed terms will be mang and mangu.

request   ->   results

manger    ->   mange, mangé
mangé     ->   mange, mangé
mang      ->   mange, manger
mangu     ->   mangue
mang*     ->   manger, mangé, mangue
mang?     ->   mangue  (and not mangé)
mangé*    ->   nothing
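
Since the filters are skipped for wildcard terms, one possible workaround is to fold the accents on the client before building the query. The following is only a minimal sketch under that assumption: the class WildcardPrefix and its methods are hypothetical names, not part of Solr, and java.text.Normalizer only strips combining accents, so it does not reproduce every mapping of ISOLatin1AccentFilterFactory (ligatures, for example). It applies no stemming and does not escape query syntax characters.

```java
import java.text.Normalizer;

// Hypothetical client-side helper, not part of Solr or SolrJ.
class WildcardPrefix {

    // Decomposes accented characters (NFD) and drops the combining marks,
    // so that "\u00e9co" (éco) becomes "eco".
    static String foldAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{InCombiningDiacriticalMarks}+", "");
    }

    // Builds a prefix query string such as title:econ* from raw user input.
    // No stemming is applied here: a prefix longer than the indexed stem
    // (for example "economi" against "econom") still cannot match.
    static String prefixQuery(String field, String userPrefix) {
        return field + ":" + foldAccents(userPrefix) + "*";
    }
}
```

With this, WildcardPrefix.prefixQuery("title", "écon") gives title:econ*, which is the form that already returns answers. The economi* case is not helped, because the prefix is longer than the stemmed term.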

Jean-François


- "Nicolas Leconte"  wrote:

| Hi all,
| 
| I have a field that contains accented characters, and I want to be
| able to search it ignoring accents.
| I have set up that field with:
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| 
| In the index, the word "économie" is turned into "econom": the accent
| is removed by the ISOLatin1AccentFilterFactory and the end of the word
| is removed by the SnowballPorterFilterFactory.
| 
| When I query with title:econ* I get the correct answers, but if I
| query with title:écon* I get no answers.
| If I query with title:économ (the exact word of the index) it works,
| so there might be something wrong with the wildcard.
| As far as I understand, the analyzer should be used in exactly the
| same way at both index and query time.
| 
| I have tested changing the order of the filters (putting the
| ISOLatin1AccentFilterFactory first) without any result.
| 
| Could anybody help me with that and point out what may be wrong with
| my schema?


Re: is EmbeddedSolrServer thread safe ?

2009-10-22 Thread jfmelian
thanks
- "Noble Paul നോബിള്‍ नोब्ळ्"  wrote:

| yes
| 
| On Thu, Oct 22, 2009 at 2:38 PM,   wrote:
| > On the SolrJ wiki page: http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer
| >
| > "CommonsHttpSolrServer is thread-safe and if you are using the following
| > constructor, you *MUST* re-use the same instance for all requests. ..."
| >
| > But is it the same for EmbeddedSolrServer ?
| >
| > Best regards
| >
| > Jean-François
| >
| 
| 
| 
| -- 
| -
| Noble Paul | Principal Engineer| AOL | http://aol.com


is EmbeddedSolrServer thread safe ?

2009-10-22 Thread jfmelian
On the SolrJ wiki page: http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer

"CommonsHttpSolrServer is thread-safe and if you are using the following 
constructor, 
you *MUST* re-use the same instance for all requests. ..." 

But is it the same for EmbeddedSolrServer ? 

Best regards 

Jean-François 


browse terms of index

2009-10-15 Thread jfmelian
Hi 

I use a sample embedded Apache Solr to create a Lucene index with a few
documents for test purposes.
The documents have text string, sint, sfloat, bool, and date fields, all of
which are indexed.
For now they are also stored, but in the end only the document ids will be
stored.

I want to list the terms of the index. I did not find a way to do this with
the Solr API, so I tried Apache Luke (Lucene API).
Here is the Luke code that lists the terms of the index:

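// Needs java.io.IOException, java.util.SortedMap, java.util.TreeMap and, from
// org.apache.lucene.index: CorruptIndexException, IndexReader, Term, TermEnum.
public void terms(String field) throws CorruptIndexException, IOException {
    validateIndexSet();
    validateOperationPossible();
    // Maps "field:text" to the document frequency of that term, sorted by key.
    SortedMap<String, Integer> termMap = new TreeMap<String, Integer>();
    IndexReader reader = null;
    try {
        reader = IndexReader.open(indexName);
        TermEnum terms = reader.terms(); // enumeration of all terms in the index
        while (terms.next()) {
            Term term = terms.term();
            // An empty field name selects the terms of every field.
            if ((field.trim().length() == 0) || field.equals(term.field())) {
                termMap.put(term.field() + ":" + term.text(),
                        new Integer(terms.docFreq()));
            }
        }
        // Print at most Lucli.MAX_TERMS + 1 entries.
        int nkeys = 0;
        for (String key : termMap.keySet()) {
            Lucli.message(key + ": " + termMap.get(key));
            nkeys++;
            if (nkeys > Lucli.MAX_TERMS) {
                break;
            }
        }
    } finally {
        closeReader(reader);
    }
}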

But for an sfloat field (the same goes for sint) I cannot see the value of the term.
The Lucene Term class has just 2 fields of type String (name and value).

Here are the values returned for the dynamic field f_float of type sfloat:

f_float:┼?? 
f_float:┼?? 
f_float:┼?l 
f_float:┼?? 
f_float:┼?? 

So,
is there a way to convert a term to the right type (int, date, float)?
Or is there a way to see the index terms with the Solr API?
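
To illustrate why such terms print as unreadable characters, here is a self-contained sketch of a sortable float encoding. It is an assumption on my part, to be checked against the Solr sources, that sfloat uses this kind of 3-character scheme and that Solr ships its own helpers for decoding it (I believe org.apache.solr.util.NumberUtils and FieldType.indexedToReadable, which would be the right things to call in real code). The class SortableFloatSketch and its methods are hypothetical names used only for this example.

```java
// Hypothetical, self-contained sketch; not the Solr implementation itself.
class SortableFloatSketch {

    // Packs a float into 3 chars whose string order matches numeric order.
    static String encode(float value) {
        int bits = Float.floatToRawIntBits(value);
        if (bits < 0) {
            bits ^= 0x7fffffff;        // reverse the order of negative values
        }
        bits += Integer.MIN_VALUE;     // flip the sign bit so negatives sort first
        char[] out = new char[3];
        out[0] = (char) (bits >>> 24);             // top 8 bits
        out[1] = (char) ((bits >>> 12) & 0x0fff);  // middle 12 bits
        out[2] = (char) (bits & 0x0fff);           // low 12 bits
        return new String(out);
    }

    // Reverses encode(): rebuilds the int bits, then the float.
    static float decode(String term) {
        int bits = term.charAt(0) << 24;
        bits |= term.charAt(1) << 12;
        bits |= term.charAt(2);
        bits -= Integer.MIN_VALUE;
        if (bits < 0) {
            bits ^= 0x7fffffff;
        }
        return Float.intBitsToFloat(bits);
    }
}
```

The three chars are mostly control or non-Latin code points, which would explain output like the lines above when Term.text() is printed directly. Under this assumption, decoding term.text() before printing would give back the float.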

Thanks for your help

Jean-François Melian