Re: Indexing multiple entities

2009-11-01 Thread Christian López Espínola
On Sun, Nov 1, 2009 at 5:30 AM, Avlesh Singh avl...@gmail.com wrote:

 The use case on DocumentObjectBinder is that I could override
 toSolrInputDocument, and if the field is "id", I could do: setField("id",
 obj.getClass().getName() + obj.getId()) or something like that.


 Unless I am missing something here, can't you write the getter of the id field
 in your solr bean as underneath?

 @Field
 private String id;
 public String getId() {
   return this.getClass().getName() + this.id;
 }

I'm using a code generator for my entities, and I cannot modify the generation.
I need to work out another option :(


 Cheers
 Avlesh

 On Fri, Oct 30, 2009 at 1:33 PM, Christian López Espínola 
 penyask...@gmail.com wrote:

 On Fri, Oct 30, 2009 at 2:04 AM, Avlesh Singh avl...@gmail.com wrote:
 
  One thing I thought about is if I can define my own
  DocumentObjectBinder, so I can concatenate my entity names with the
  IDs in the XML creation.
 
  Does anyone know if something like this can be done without modifying
  the Solrj sources? Is there any injection or plugin mechanism for this?
 
  More details on the use-case please.

 If I index a Book with ID=3, and then a Magazine with ID=3, I'll really
 be removing my Book 3 when indexing Magazine 3. I want both entities
 to be in the index.

 The use case on DocumentObjectBinder is that I could override
 toSolrInputDocument, and if the field is "id", I could do: setField("id",
 obj.getClass().getName() + obj.getId()) or something like that.

 The goal is to avoid creating by hand all the XML to be sent to Solr, while
 still having the possibility of modifying it in some way.

 Do you know how I can do that, or a better way of achieving the same
 results?
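 For illustration, the prefixing scheme above can be sketched in plain Java;
 the class and helper names below are hypothetical, not part of SolrJ:

```java
// A plain-Java sketch of the prefixing scheme described above: the entity's
// class name is prepended to the database id so that Book #3 and Magazine #3
// map to distinct uniqueKey values. UniqueKeys and solrId are hypothetical
// names, not part of SolrJ.
public class UniqueKeys {
    static String solrId(Object entity, Object dbId) {
        return entity.getClass().getSimpleName() + ":" + dbId;
    }

    public static void main(String[] args) {
        class Book {}
        class Magazine {}
        System.out.println(solrId(new Book(), 3));      // Book:3
        System.out.println(solrId(new Magazine(), 3));  // Magazine:3
    }
}
```

 Whatever binder ends up producing the document, as long as the id field gets
 a value built like this, the two entities no longer collide on uniqueKey.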


  Cheers
  Avlesh
 
  On Fri, Oct 30, 2009 at 2:16 AM, Christian López Espínola 
  penyask...@gmail.com wrote:
 
  Hi Israel,
 
  Thanks for your suggestion,
 
  On Thu, Oct 29, 2009 at 9:37 PM, Israel Ekpo israele...@gmail.com
 wrote:
   On Thu, Oct 29, 2009 at 3:31 PM, Christian López Espínola 
   penyask...@gmail.com wrote:
  
   Hi, my name is Christian and I'm a newbie getting started with solr (and
  solrj).
  
   I'm working on a website where I want to index multiple entities,
 like
   Book or Magazine.
    The issue I'm facing is that both of them have an attribute ID, which I
    want to use as the uniqueKey on my schema, so I cannot uniquely identify
    a document (because the ID is saved in a database too, and it's
    auto-incremented).
   
    I'm sure that this is a common pattern, but I can't find the way of
   solving it.
   
    How do you usually solve this? Thanks in advance.
  
  
   --
   Cheers,
  
   Christian López Espínola penyaskito
  
  
   Hi Christian,
  
   It looks like you are bringing in data to Solr from a database where
  there
   are two separate tables.
  
   One for *Books* and another one for *Magazines*.
  
    If this is the case, you could define your uniqueKey element in the Solr
   schema
    to be a string instead of an integer. Then you can still load documents
    from both the books and magazines database tables, but you could prefix
   the
    uniqueKey field with B for books and M for magazines
  
   Like so :
  
    <field name="id" type="string" indexed="true" stored="true"
    required="true"/>
   
    <uniqueKey>id</uniqueKey>
  
   Then when loading the books or magazines into Solr you can create the
   documents with id fields like this
  
    <add>
     <doc>
       <field name="id">B14000</field>
     </doc>
     <doc>
       <field name="id">M14000</field>
     </doc>
     <doc>
       <field name="id">B14001</field>
     </doc>
     <doc>
       <field name="id">M14001</field>
     </doc>
    </add>
  
   I hope this helps
 
   This was my first thought, but in practice there aren't just Book and
   Magazine, but about 50 different entities, so I'm using the Field
   annotation of solrj for simplifying my code (it manages the XML
   creation for me, etc).
   One thing I thought about is whether I can define my own
   DocumentObjectBinder, so I can concatenate my entity names with the
   IDs in the XML creation.
  
   Does anyone know if something like this can be done without modifying
   the Solrj sources? Is there any injection or plugin mechanism for this?
 
  Thanks in advance.
 
 
   --
   Good Enough is not good enough.
   To give anything less than your best is to sacrifice the gift.
   Quality First. Measure Twice. Cut Once.
  
 
 
 
  --
  Cheers,
 
  Christian López Espínola penyaskito
 
 



 --
 Cheers,

 Christian López Espínola penyaskito





-- 
Cheers,

Christian López Espínola penyaskito


problems with PhraseHighlighter

2009-11-01 Thread AHMET ARSLAN
Hello everyone,

I am having problems with highlighting the complete text of a field. I have an
"xml" field. I am running proximity searches on this field.

xml:("proximity1" AND/OR "proximity2" AND/OR …)

Results are returned successfully, satisfying the proximity query. However, when
I request highlighting, sometimes it returns nothing and sometimes the returned
fragments are missing proximity terms.

I set my maxFieldLength to Integer.MAX_VALUE in solrconfig.xml:
<maxFieldLength>2147483647</maxFieldLength>

I am using these highlighting parameters:

hl.maxAnalyzedChars=2147483647
hl.fragsize=2147483647
hl.usePhraseHighlighter=true
hl.requireFieldMatch=true
hl.fl=xml
hl=true

I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false but it 
didn’t help. When I set hl.usePhraseHighlighter=false, highlighting returns, but 
all query terms are highlighted. 

Which value of hl.fragsize should I use to highlight the complete text of a
field: 0 or 2147483647?

What is the highest value that I can set to hl.maxAnalyzedChars and hl.fragsize?

I am querying the same field and requesting the same field in highlighting. Although 
a document matches the query, no highlighting comes back. What could be the reason?

If a document matches a query, there should be highlighting returning back, 
right?

Any help or pointers are really appreciated. 






Re: problems with PhraseHighlighter

2009-11-01 Thread Avlesh Singh
Copy-paste your field definition for the field you are trying to
highlight/search on.

Cheers
Avlesh

On Sun, Nov 1, 2009 at 8:24 PM, AHMET ARSLAN iori...@yahoo.com wrote:

 Hello everyone,

 I am having problems with highlighting the complete text of a field. I have
 an xml field. I am querying proximity searches on this field.

 xml:  ( proximity1 AND/OR proximity2 AND/OR …)

 Results are returned successfully, satisfying the proximity query. However,
 when I request highlighting, sometimes it returns nothing and sometimes the
 returned fragments are missing proximity terms.

 I set my maxFieldLength to Integer.MAX_VALUE in solrconfig.xml:
 <maxFieldLength>2147483647</maxFieldLength>

 I am using these highlighting parameters:

 hl.maxAnalyzedChars=2147483647
 hl.fragsize=2147483647
 hl.usePhraseHighlighter=true
 hl.requireFieldMatch=true
 hl.fl=xml
 hl=true

 I tried combinations of hl.fragsize=0 and hl.requireFieldMatch=false but it
 didn’t help. When I set hl.usePhraseHighlighter=false, highlighting returns,
 but all query terms are highlighted.

 What value of hl.fragsize should I use to highlight complete text of a
 field? 0 or 2147483647?

 What is the highest value that I can set to hl.maxAnalyzedChars and
 hl.fragsize?

 I am querying the same field and requesting the same field in highlighting.
 Although a document matches the query, no highlighting comes back. What could
 be the reason?

 If a document matches a query, there should be highlighting returning back,
 right?

 Any help or pointers are really appreciated.







Re: problems with PhraseHighlighter

2009-11-01 Thread AHMET ARSLAN
 Copy-paste your field definition for
 the field you are trying to
 highlight/search on.
 
 Cheers
 Avlesh

Thank you for your interest Avlesh,

My field type mostly contains custom filters and tokenizers.

<fieldType name="XMLText" class="solr.TextField" positionIncrementGap="100">
 <analyzer type="index">
  <tokenizer class="XMLStripStandardTokenizerFactory"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms_index.txt"
          ignoreCase="true" expand="true"/>
  <filter class="CustomStemFilterFactory" protected="protwords.txt"/>
  <filter class="LowerCaseFilterFactory"/>
 </analyzer>
 <analyzer type="query">
  <tokenizer class="CustomTokenizerFactory"/>
  <filter class="CustomDeasciifyFilterFactory"/>
  <filter class="CustomStemFilterFactory" protected="protwords.txt"/>
  <filter class="LowerCaseFilterFactory"/>
 </analyzer>
</fieldType>


Firstly I tried to use solr.HTMLStripCharFilterFactory to strip xml tags; it 
works fine, but when it comes to highlighting, the <em> tags are placed at an 
incorrect position. Same with solr.HTMLStripStandardTokenizerFactory. The <em> 
tags are, interestingly, inserted exactly one character before the actual term. 
So I added a new token definition to StandardTokenizer's jflex file to 
recognize xml tags and ignore them. I confirmed that it is working with some 
testcases. It strips xml tags at the tokenizer level. I am doing this because I am 
displaying the original documents with xml + xslt. Therefore I need to highlight 
the xml files for display.

And I am using ComplexPhraseQueryParser [1].

But I reproduced the problem with defType=lucene&q="term1 term2"~5. I see that 
term1 and term2 are within 5 terms of each other, therefore the document is 
returned. But the highlighting is empty. And there are no xml tags (stripped by 
the tokenizer) between those terms in the original document.

The hl.maxAnalyzedChars parameter applies to the original document, right? I 
mean, in my case, including the xml tags too.

[1] 
http://lucene.apache.org/java/2_9_0/api/contrib-misc/org/apache/lucene/queryParser/complexPhrase/package-summary.html


  


Question about DIH execution order

2009-11-01 Thread Bertie Shen
Hi folks,

  I have the following data-config.xml. Is there a way to
let the transformation take place after executing the SQL "select comment from
Rating where Rating.CourseId = ${Course.CourseId}"?  In the MySQL database,
column CourseId in table Course is an integer (1, 2, etc);
the template transformation will make them look like Course:1, Course:2; column
CourseId in table Rating is also an integer (1, 2, etc).

  If the transformation happens before executing "select comment from Rating
where Rating.CourseId = ${Course.CourseId}", then there will be no match for
the SQL statement execution.

 <document>
   <entity name="Course" transformer="TemplateTransformer"
           query="select * from Course">
     <field column="CourseId" template="Course:${Course.CourseId}" name="id"/>
     <entity name="Rating"
             query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
       <field column="comment" name="review"/>
     </entity>
   </entity>
 </document>


RE: autocomplete

2009-11-01 Thread Ankit Bhatnagar


Hey Avlesh,
Thanks for your reply.


-Ankit

-Original Message-
From: Avlesh Singh [mailto:avl...@gmail.com] 
Sent: Saturday, October 31, 2009 10:08 PM
To: solr-user@lucene.apache.org
Subject: Re: autocomplete



 q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=?

Why is the json.wrf not specified? Without the callback function, the string
that is returned is illegal javascript for the browser. You need to
specify this parameter, which is a wrapper or a callback function. If you
specify json.wrf=foo, as soon as the browser gets a response, it will call
a function named foo (which needs to be defined already). Inside foo you can
have your own implementation to interpret and render this data.
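As an aside, the json.wrf mechanism amounts to the server wrapping the JSON
payload in a call to the named function before sending it back; a stdlib-only
illustration (the class and method names here are hypothetical):

```java
// A stdlib-only illustration of what the json.wrf parameter does server-side:
// the JSON body is wrapped in a call to the named callback, so the browser
// can evaluate the response as a script and hand the data to your function.
// JsonpWrap and wrap are hypothetical names, not Solr APIs.
public class JsonpWrap {
    static String wrap(String callback, String json) {
        return callback + "(" + json + ");";
    }

    public static void main(String[] args) {
        String json = "{\"response\":{\"numFound\":3}}";
        System.out.println(wrap("foo", json)); // foo({"response":{"numFound":3}});
    }
}
```

With json.wrf=foo in the request, the browser receives exactly this wrapped
form and invokes your pre-defined foo with the parsed response object.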

Cheers
Avlesh

On Sat, Oct 31, 2009 at 12:13 AM, Ankit Bhatnagar abhatna...@vantage.comwrote:


 Hi guys,

 Enterprise 1.4 Solr Book (AutoComplete) says this works -

 My query looks like -


  q=*:*&fq=ac:*all*&wt=json&rows=15&start=0&indent=on&omitHeader=true&json.wrf=?


 And it returns three results


 {
  "responseHeader":{
    "status":0,
    "QTime":38,
    "params":{
      "indent":"on",
      "start":"0",
      "q":"*:*",
      "wt":"json",
      "fq":"ac:*all*",
      "rows":"15"}},
  "response":{"numFound":3,"start":0,"docs":[
    {
     "id":1,
     "ac":"Can you show me all the results"},
    {
     "id":2,
     "ac":"Can you show all companies"},
    {
     "id":3,
     "ac":"Can you list all companies"}]
  }}



 But browser says syntax error --


 Ankit
















latest lucene libraries in maven repo

2009-11-01 Thread Uri Boness

Hi,

It seems that the latest lucene libraries are not up to date in the Solr 
maven repo 
(http://people.apache.org/repo/m2-snapshot-repository/org/apache/solr/solr-lucene-core/1.4-SNAPSHOT/)


Can we expect them to be updated soon?

Cheers,
Uri


Programmatically configuring SLF4J for Solr 1.4?

2009-11-01 Thread Don Werve
So, I've spent a bit of the day banging my head against this, and can't get
it sorted.  I'm using a DirectSolrConnection embedded in a JRuby
application, and everything works great, except I can't seem to get it to do
anything except log to the console.  I've tried pointing
'java.util.logging.config.file' to a properties file, as well as specifying
a logfile as part of the constructor for DirectSolrConnection, but so far,
nothing has really worked.

What I'd like to do is programmatically direct the Solr logs to a logfile,
so that I can have my app start up, parse its config, and throw the Solr
logs where they need to go based on that.

So, I don't suppose anybody has a code snippet (in Java) that sets up SLF4J
for Solr logging (and that doesn't reference an external properties file)?

Using the latest (1 Nov 2009) nightly build of Solr 1.4.0-dev


Re: Programmatically configuring SLF4J for Solr 1.4?

2009-11-01 Thread Ryan McKinley
I'm sure it is possible to configure JDK logging (java.util.logging)
programmatically... but I have never had much luck with it.


It is very easy to configure log4j programmatically, and this works
great with solr.


To use log4j rather than JDK logging, simply add
slf4j-log4j12-1.5.8.jar (from http://www.slf4j.org/download.html) to your
classpath
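
If you do want to stay on JDK logging, it can also be pointed at a file
programmatically; a minimal stdlib-only sketch (the log file path and class
name are illustrative):

```java
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// A minimal, stdlib-only sketch of redirecting java.util.logging (which the
// slf4j-jdk14 binding delegates to) away from the console and into a file,
// without any external properties file. JdkLogToFile is a hypothetical name.
public class JdkLogToFile {
    public static void configure(String logFile) throws Exception {
        Logger root = Logger.getLogger("");
        // Remove the default console handler(s) so nothing goes to stderr.
        for (Handler h : root.getHandlers()) {
            root.removeHandler(h);
        }
        // Append-mode file handler with a plain-text format.
        FileHandler fh = new FileHandler(logFile, true);
        fh.setFormatter(new SimpleFormatter());
        root.addHandler(fh);
    }

    public static void main(String[] args) throws Exception {
        configure("solr.log");
        Logger.getLogger("org.apache.solr").info("logging redirected");
    }
}
```

Call configure(...) once at startup, after your app has parsed its own config,
and everything logged through SLF4J's JDK binding lands in that file.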


ryan



On Nov 1, 2009, at 11:05 PM, Don Werve wrote:

So, I've spent a bit of the day banging my head against this, and can't get
it sorted.  I'm using a DirectSolrConnection embedded in a JRuby
application, and everything works great, except I can't seem to get it to do
anything except log to the console.  I've tried pointing
'java.util.logging.config.file' to a properties file, as well as specifying
a logfile as part of the constructor for DirectSolrConnection, but so far,
nothing has really worked.

What I'd like to do is programmatically direct the Solr logs to a logfile,
so that I can have my app start up, parse its config, and throw the Solr
logs where they need to go based on that.

So, I don't suppose anybody has a code snippet (in Java) that sets up SLF4J
for Solr logging (and that doesn't reference an external properties file)?


Using the latest (1 Nov 2009) nightly build of Solr 1.4.0-dev




Re: Question about DIH execution order

2009-11-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sun, Nov 1, 2009 at 11:59 PM, Bertie Shen bertie.s...@gmail.com wrote:
 Hi folks,

  I have the following data-config.xml. Is there a way to
 let transformation take place after executing SQL select comment from
 Rating where Rating.CourseId = ${Course.CourseId}?  In MySQL database,
 column CourseId in table Course is integer 1, 2, etc;
 template transformation will make them like Course:1, Course:2; column
 CourseId in table Rating is also integer 1, 2, etc.

  If transformation happens before executing select comment from Rating
 where Rating.CourseId = ${Course.CourseId}, then there will no match for
 the SQL statement execution.

  <document>
     <entity name="Course" transformer="TemplateTransformer"
             query="select * from Course">
       <field column="CourseId" template="Course:${Course.CourseId}" name="id"/>
       <entity name="Rating"
               query="select comment from Rating where Rating.CourseId = ${Course.CourseId}">
         <field column="comment" name="review"/>
       </entity>
     </entity>
  </document>


keep the field as follows:

  <field column="TmpCourseId" template="Course:${Course.CourseId}" name="id"/>




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: multiple sql queries for one index?

2009-11-01 Thread Amit Nithian
I don't particularly like the nested entities approach because, from what I
recall, it will execute a separate SQL query for each top-level record, which,
to me, doesn't seem very ideal for large scale indexing.

I know it's a pain to do a ton of joins... believe me, our dataset has a
boatload of joins too, but I think it works out much better to have one GIANT
SQL statement execute, because you can do record-level fetching and index
faster (as opposed to waiting for the entire recordset to buffer and be sent
to the client).

Try using temporary tables in your connection to help reduce some of the
data down or stored procedures if you have that control in your DB.

Hope that helps!
Amit

On Thu, Oct 29, 2009 at 5:00 PM, Avlesh Singh avl...@gmail.com wrote:

 Read this example fully -
 http://wiki.apache.org/solr/DataImportHandler#Full_Import_Example
 nested entities is an answer to your question. The example has a sample.

 Cheers
 Avlesh

 On Fri, Oct 30, 2009 at 2:58 AM, Joel Nylund jnyl...@yahoo.com wrote:

  Hi, Its been hurting my brain all day to try to build 1 query for my
 index
  (joins upon joins upon joins). Is there a way I can do multiple queries
 to
  populate the same index? I have one main table that I can join everything
  back via ID, it should be theoretically possible
 
  If this can be done, can someone point me to an example?
 
  thanks
  Joel
 
 



Re: Solr YUI autocomplete

2009-11-01 Thread Amit Nithian
I've used the YUI autocomplete (albeit not with Solr, which shouldn't matter
here) and it should work with JSON. I did one that simply made XHR calls
to a method on my server which returned pipe-delimited text, and that
worked fine.

Are you using the XHR DataSource, and if so, what type are you telling it to
expect? One of the examples on the YUI site is text based, and I'm sure you
can specify TYPE_JSON or JS_ARRAY too.

- Amit

On Fri, Oct 30, 2009 at 7:04 AM, Ankit Bhatnagar abhatna...@vantage.comwrote:


 Does Solr support JSONP (JSON with Padding) in the response?

 -Ankit



 -Original Message-
 From: Ankit Bhatnagar [mailto:abhatna...@vantage.com]
 Sent: Friday, October 30, 2009 10:27 AM
 To: 'solr-user@lucene.apache.org'
 Subject: Solr YUI autocomplete

 Hi Guys,

 I have a question regarding how to specify the

 I am using YUI autocomplete widget and it expects the JSONP response.


 http://localhost:8983/solr/select/?q=monitor&version=2.2&start=0&rows=10&indent=on&wt=json&json.wrf=

 I am not sure how should I specify the json.wrf=function

 Thanks
 Ankit



Re: Greater-than and less-than in data import SQL queries

2009-11-01 Thread Amit Nithian
A thought I had on this from a DIH design perspective: would it be better to
have the SQL queries stored in an element rather than an attribute, so that
you can wrap them in a CDATA block without having to mess up the look of the
query with &lt; and &gt;? Makes debugging easier (I know find-and-replace is
trivial, but it can be annoying when debugging SQL issues :-)).

On Wed, Oct 28, 2009 at 5:15 PM, Lance Norskog goks...@gmail.com wrote:

 It is easier to put SQL select statements in a view, and just use that
 view from the DIH configuration file.

 On Tue, Oct 27, 2009 at 12:30 PM, Andrew Clegg andrew.cl...@gmail.com
 wrote:
 
 
  Heh, eventually I decided
 
   where 4 < node_depth
 
  was the most pleasing (if slightly WTF-ish) way of writing it...
 
  Cheers,
 
  Andrew.
 
 
  Erik Hatcher-4 wrote:
 
  Use &lt; instead of < in that attribute.  That should fix the issue.
  Remember, it's an XML file, so it has to obey XML encoding rules, which
  make it ugly, but whatcha gonna do?
 
Erik
 
  On Oct 27, 2009, at 11:50 AM, Andrew Clegg wrote:
 
 
  Hi,
 
   If I have a DataImportHandler query with a greater-than sign in it,
   like this:
  
     <entity name="higher_node" dataSource="database" query="select *,
     title as keywords from cathnode_text where node_depth > 4">
  
   Everything's fine. However, if it contains a less-than sign:
  
     <entity name="higher_node" dataSource="database" query="select *,
     title as keywords from cathnode_text where node_depth < 4">
 
  I get this exception:
 
   INFO: Processing configuration from solrconfig.xml: {config=dataconfig.xml}
   [Fatal Error] :240:129: The value of attribute "query" associated with an
   element type "null" must not contain the '<' character.
   27-Oct-2009 15:30:49 org.apache.solr.handler.dataimport.DataImportHandler
   inform
   SEVERE: Exception while loading DataImporter
   org.apache.solr.handler.dataimport.DataImportHandlerException: Exception
   occurred while initializing context
       at org.apache.solr.handler.dataimport.DataImporter.loadDataConfig(DataImporter.java:184)
       at org.apache.solr.handler.dataimport.DataImporter.<init>(DataImporter.java:101)
       at org.apache.solr.handler.dataimport.DataImportHandler.inform(DataImportHandler.java:113)
       at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:424)
       at org.apache.solr.core.SolrCore.<init>(SolrCore.java:588)
       at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:137)
       at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
       at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
       at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397)
       at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:108)
       at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3709)
       at org.apache.catalina.core.StandardContext.start(StandardContext.java:4356)
       at org.apache.catalina.manager.ManagerServlet.start(ManagerServlet.java:1244)
       at org.apache.catalina.manager.HTMLManagerServlet.start(HTMLManagerServlet.java:604)
       at org.apache.catalina.manager.HTMLManagerServlet.doGet(HTMLManagerServlet.java:129)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:690)
       at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
       at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
       at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
       at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
       at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
       at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:525)
       at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
       at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
       at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
       at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
       at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
       at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
       at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
       at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
       at 

Re: Iso accents and wildcards

2009-11-01 Thread Nicolas Leconte
Thanks for the explanation, now I can clearly understand why it doesn't work 
as I was expecting :)


jfmel...@free.fr wrote:

If the request contains any wildcard then the filters are not called:
no ISOLatin1AccentFilterFactory and no SnowballPorterFilterFactory!

"économie" is indexed as "econom".

Solr doesn't find:
 - terms starting with éco (éco*)
 - terms starting with economi (economi*)

If you index manger, mangé and mangue, the indexed terms will be mang and mangu.

requests  ->  results

manger   ->   mange, mangé
mangé    ->   mange, mangé
mang     ->   mange, manger
mangu    ->   mangue
mang*    ->   manger, mangé, mangue
mang?    ->   mangue  (and not mangé)
mangé*   ->   nothing

Jean-François


- Nicolas Leconte nicolas.ai...@aidel.com wrote:

| Hi all,
| 
| I have a field that contains accented chars in it; what I want is
| to be able to search while ignoring accents.
| 
| I have set up that field with:
| <analyzer>
|   <tokenizer class="solr.StandardTokenizerFactory"/>
|   <filter class="solr.StandardFilterFactory"/>
|   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
|           generateNumberParts="1" catenateWords="1" catenateNumbers="1"
|           catenateAll="0" splitOnCaseChange="1"/>
|   <filter class="solr.LowerCaseFilterFactory"/>
|   <filter class="solr.StopFilterFactory" ignoreCase="true"
|           words="stopwords.txt"/>
|   <filter class="solr.SnowballPorterFilterFactory" language="French"/>
|   <filter class="solr.LowerCaseFilterFactory"/>
|   <filter class="solr.ISOLatin1AccentFilterFactory"/>
|   <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
| </analyzer>
| 
| In the index the word "économie" is translated to "econom"; the accent
| is removed thanks to the ISOLatin1AccentFilterFactory and the end of the
| word is removed thanks to the SnowballPorterFilterFactory.
| 
| When I request title:econ* I get the correct answers, but if
| I request title:écon* I get no answers.
| 
| If I request title:économ (the exact word in the index) it works,
| so there might be something wrong with the wildcard.
| 
| As far as I can understand, the analyzer should be used exactly the same
| in both index and query time.
| 
| I have tested changing the order of the filters (putting the
| ISOLatin1AccentFilterFactory on top) without any result.
| 
| Could anybody help me with that and point me to what may be wrong with my
| schema?


  




Re: Iso accents and wildcards

2009-11-01 Thread Nicolas Leconte

Tks for the tips, I will try to do exactly what u suggest.

Avlesh Singh wrote:

When I request title:econ* I get the correct answers, but if I
request title:écon* I get no answers.
If I request title:économ (the exact word in the index) it works, so
there might be something wrong with the wildcard.
As far as I can understand, the analyzer should be used exactly the same in
both index and query time.



Wildcard queries are not analyzed, hence the inconsistent behaviour.
The easiest way out is to define one more field, title_original, as an
untokenized field. While querying, you can use both fields at the same
time, e.g. q=(title:écon* title_original:écon*). In either case, you would get
the desired matches.
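Another option, since wildcard terms bypass the analyzer chain entirely, is to
fold the accents on the client before building the wildcard query, so "écon*"
is sent as "econ*". A stdlib-only sketch (the class and method names are
illustrative); this mirrors, for Latin-1 accents, what
ISOLatin1AccentFilterFactory does at index time:

```java
import java.text.Normalizer;

// Client-side accent folding: decompose each character into its base letter
// plus combining marks (NFD), then strip the combining marks, so the wildcard
// prefix matches the accent-stripped terms in the index. AccentFold and fold
// are hypothetical names, not Solr APIs.
public class AccentFold {
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD)
                         .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        System.out.println(fold("écon") + "*"); // econ*
    }
}
```

This only helps the wildcard case; non-wildcard queries still go through the
normal query analyzer as before.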

Cheers
Avlesh

On Fri, Oct 30, 2009 at 9:19 PM, Nicolas Leconte nicolas.ai...@aidel.comwrote:

  

Hi all,

I have a field that contains accented chars in it; what I want is to be
able to search while ignoring accents.
I have set up that field with:
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StandardFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
          generateNumberParts="1" catenateWords="1" catenateNumbers="1"
          catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true"
          words="stopwords.txt"/>
  <filter class="solr.SnowballPorterFilterFactory" language="French"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ISOLatin1AccentFilterFactory"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

In the index the word "économie" is translated to "econom"; the accent is
removed thanks to the ISOLatin1AccentFilterFactory and the end of the word
is removed thanks to the SnowballPorterFilterFactory.

When I request title:econ* I get the correct answers, but if I
request title:écon* I get no answers.
If I request title:économ (the exact word in the index) it works, so
there might be something wrong with the wildcard.
As far as I can understand, the analyzer should be used exactly the same in
both index and query time.

I have tested changing the order of the filters (putting the
ISOLatin1AccentFilterFactory on top) without any result.

Could anybody help me with that and point me to what may be wrong with my
schema?