a bug in commit script?

2007-09-20 Thread Yu-Hui Jin
Hi guys, it seems there's a small bug in the bin/commit script for Solr 1.2. I was able to run snapinstaller successfully to install the index and open a new searcher (verified by querying the new docs through the web admin UI). However, the snapinstaller script failed due to the

Re: a bug in commit script?

2007-09-20 Thread Chris Hostetter
: : It seems there's a small bug in the bin/commit script for solr 1.2. A fix for this was already committed to the trunk as part of SOLR-282 (but there doesn't seem to be a note about it in the changelog). -Hoss

Re: Solr Index - no segments* file found in org.apache.lucene.store.FSDirectory

2007-09-20 Thread Chris Hostetter
: Does this case arise when i do a search when there is no index?? - If yes, : then i guess the Exception can be made more meaningful. in normal operation, i believe this shouldn't happen -- Solr will create the index for you on startup if there isn't one. You're attempting a fairly advanced

Re: How can i make a distribute search on Solr?

2007-09-20 Thread David Welton
Maybe I got this wrong...but isn't this what mapreduce is meant to deal with? E.g., 1) get the job (a query); 2) map it to workers (servers that provide search results from their own index); 3) wait for the results from all workers that reply within an acceptable timeframe; 4) comb through
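The four steps David lists describe a scatter-gather pattern. Below is a minimal sketch, assuming each worker is a Callable that returns its own top hits. It also assumes the scores from different workers are comparable, which is a big assumption: idf is computed per index, so raw scores from separate indexes can differ. ScatterGather and Hit are hypothetical names, not a Solr API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.CancellationException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

class ScatterGather {
    // One result from a worker: a document id and its relevance score.
    static final class Hit {
        final String id;
        final float score;
        Hit(String id, float score) { this.id = id; this.score = score; }
    }

    // Send the query to every worker, keep only the replies that arrive before
    // the deadline, then merge them into a single top-k list.
    static List<Hit> search(List<Callable<List<Hit>>> workers, int k, long timeoutMs) {
        if (workers.isEmpty()) return Collections.emptyList();
        ExecutorService pool = Executors.newFixedThreadPool(workers.size());
        try {
            // invokeAll cancels every task that has not finished when the timeout expires.
            List<Future<List<Hit>>> futures =
                pool.invokeAll(workers, timeoutMs, TimeUnit.MILLISECONDS);
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : futures) {
                try {
                    merged.addAll(f.get());
                } catch (CancellationException | ExecutionException e) {
                    // Worker was too slow or failed: leave it out of the merge.
                }
            }
            // "Comb through" the replies: highest score first, keep the top k.
            merged.sort((a, b) -> Float.compare(b.score, a.score));
            return new ArrayList<>(merged.subList(0, Math.min(k, merged.size())));
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return Collections.emptyList();
        } finally {
            pool.shutdownNow();
        }
    }
}
```

A worker that misses the deadline is simply left out. This matches step 3 ("all workers that reply within acceptable timeframe"), but it means results can silently be partial.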

Re: Strange behavior when searching with accents

2007-09-20 Thread Thorsten Scherler
On Thu, 2007-09-20 at 10:11 +0200, Thierry Collogne wrote: Hello, We are experiencing some strange behavior while searching with words containing accents. We are using two examples: rené and matthé. When we search for rené or for rene, we get the same results, so that is ok. But when we

Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote: ..when we search for matthé or for matthe, we get two totally different results The analyzer admin tool should help you find out what's happening, see http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9

Re: Strange behavior when searching with accents

2007-09-20 Thread Thierry Collogne
We are using this schema definition <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <!-- in this example, we will only use synonyms at query time <filter

Re: Strange behavior when searching with accents

2007-09-20 Thread Thierry Collogne
I have entered the matthé term in the analyzer, but as far as I understand, it should be ok. I have made some screenshots with the results. http://farm2.static.flickr.com/1407/1412619772_0b697789cd_o.jpg http://farm2.static.flickr.com/1245/1412619774_3351b287bc_o.jpg I find it strange

Re: Strange behavior when searching with accents

2007-09-20 Thread Thorsten Scherler
On Thu, 2007-09-20 at 13:33 +0200, Thierry Collogne wrote: We are using this schema definition Thierry, try to move the solr.ISOLatin1AccentFilterFactory up the filter queue, like: ... <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.ISOLatin1AccentFilterFactory"/> ... for

Re: Strange behavior when searching with accents

2007-09-20 Thread Thorsten Scherler
On Thu, 2007-09-20 at 14:01 +0200, Thierry Collogne wrote: I have entered the matthé term in the analyzer, but as far as I understand, it should be ok. I have made some screenshots with the results. http://farm2.static.flickr.com/1407/1412619772_0b697789cd_o.jpg

Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote: ...Thank you very much. Moving the <filter class="solr.ISOLatin1AccentFilterFactory"/> up in the chain fixed it. Yes, the problem was the EnglishPorterFilterFactory before the accents removal: the stemmer doesn't know about accents, so no
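Bertrand's explanation can be reproduced with a toy pipeline. The sketch below makes two simplifications, both assumptions:

- It implements only Porter's rule for dropping a final "e". The real stemmer behind EnglishPorterFilterFactory does much more.
- It approximates accent folding with java.text.Normalizer instead of ISOLatin1AccentFilter's own character table.

Because the stemmer treats "é" as an ordinary consonant, it leaves "matthé" untouched but strips the "e" from "matthe". Folding afterwards cannot bring the two forms back together.

```java
import java.text.Normalizer;

class AccentOrder {
    // Stand-in for accent folding: decompose "é" into "e" plus a combining
    // accent, then drop the combining marks.
    static String fold(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD).replaceAll("\\p{M}", "");
    }

    // Porter's consonant test: anything other than a, e, i, o, u (and y after
    // a vowel) is a consonant, so an accented "é" counts as a consonant.
    static boolean isConsonant(String w, int i) {
        switch (w.charAt(i)) {
            case 'a': case 'e': case 'i': case 'o': case 'u': return false;
            case 'y': return i == 0 || !isConsonant(w, i - 1);
            default: return true;
        }
    }

    // Porter's measure m: number of vowel-to-consonant transitions in the stem.
    static int measure(String s) {
        int m = 0;
        for (int i = 1; i < s.length(); i++) {
            if (!isConsonant(s, i - 1) && isConsonant(s, i)) m++;
        }
        return m;
    }

    // Porter's *o condition: stem ends consonant-vowel-consonant, last letter not w, x or y.
    static boolean endsCvc(String s) {
        int n = s.length();
        if (n < 3) return false;
        char last = s.charAt(n - 1);
        return isConsonant(s, n - 3) && !isConsonant(s, n - 2) && isConsonant(s, n - 1)
            && last != 'w' && last != 'x' && last != 'y';
    }

    // Only Porter step 5a: drop a final "e" if (m > 1) or (m == 1 and not *o).
    static String stem(String w) {
        if (!w.endsWith("e")) return w;
        String s = w.substring(0, w.length() - 1);
        int m = measure(s);
        return (m > 1 || (m == 1 && !endsCvc(s))) ? s : w;
    }

    static String stemThenFold(String w) { return fold(stem(w)); } // original filter order
    static String foldThenStem(String w) { return stem(fold(w)); } // accent filter moved up
}
```

With the stemmer first, "matthé" becomes "matthe" while the query "matthe" becomes "matth", so the two no longer match. "rené" and "rene" both end up as "rene" because the final-e rule does not fire for "rene". That is consistent with rené behaving correctly in the original report. With the accent filter first, both spellings of matthé become "matth".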

Re: Strange behavior when searching with accents

2007-09-20 Thread Thierry Collogne
Thorsten, Thank you very much. Moving the <filter class="solr.ISOLatin1AccentFilterFactory"/> up in the chain fixed it. On 20/09/2007, Thorsten Scherler [EMAIL PROTECTED] wrote: On Thu, 2007-09-20 at 14:01 +0200, Thierry Collogne wrote: I have entered the matthé term in the analyzer,

Re: Term extraction

2007-09-20 Thread Michael Kimsal
Not sure if this is in the same league or not, but Yahoo offers a term extraction web service. http://developer.yahoo.com/search/content/V1/termExtraction.html On 9/20/07, Grant Ingersoll [EMAIL PROTECTED] wrote: You might investigate some tools like Alias-i's LingPipe or do some searches

Re: Filter by Group

2007-09-20 Thread mark angelillo
Thanks, Pieter. I'll go for that then. Mark On Sep 19, 2007, at 10:15 PM, Pieter Berkel wrote: Sounds like you're on the right track. If your groups overlap (i.e. a document can be in group A and B), then you should ensure your groups field is multivalued. If you are searching for foo in
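Pieter's advice is cut off after the point about multivalued fields. Assume a multiValued field named groups; the field name and the helper below are hypothetical. One way to search for foo only within group A or group B is to send q=foo together with a filter query such as groups:("A" OR "B"). Solr caches fq results in its filterCache separately from the main query, so a recurring group filter is cheap to reuse. A minimal helper to build that fq value:

```java
import java.util.List;

class GroupFilter {
    // Builds an fq value matching documents that belong to at least one of the
    // given groups; assumes the field is multiValued so a document can be in several.
    static String anyOf(String field, List<String> groups) {
        if (groups.isEmpty()) throw new IllegalArgumentException("need at least one group");
        StringBuilder sb = new StringBuilder(field).append(":(");
        for (int i = 0; i < groups.size(); i++) {
            if (i > 0) sb.append(" OR ");
            // Quote each value and escape embedded double quotes for the query parser.
            sb.append('"').append(groups.get(i).replace("\"", "\\\"")).append('"');
        }
        return sb.append(')').toString();
    }
}
```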

Re: Strange behavior when searching with accents

2007-09-20 Thread Thierry Collogne
We are indexing both French and Dutch. I will take a look at SnowballPorterFilterFactory later, but thanks for the advice. On 20/09/2007, Bertrand Delacretaz [EMAIL PROTECTED] wrote: On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote: ...Thank you very much. Moving the filter class=

Re: How can i make a distribute search on Solr?

2007-09-20 Thread Yonik Seeley
On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote: Maybe I got this wrong...but isn't this what mapreduce is meant to deal with? Not really... you could force a *lot* of different problems into map-reduce (that's sort of the point... being able to automatically parallelize a lot of different

Re: Term extraction

2007-09-20 Thread Yonik Seeley
On 9/19/07, Pieter Berkel [EMAIL PROTECTED] wrote: However, I'd like to be able to analyze documents more intelligently to recognize phrase keywords such as "open source", "Microsoft Office", "Bill Gates" rather than splitting each word into separate tokens (the field is never used in search queries
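Yonik's reply is cut off above, so this is not his suggestion. A very naive way to surface multi-word keywords like "open source" instead of single tokens is to count adjacent word pairs and keep those that recur. The sketch uses raw counts only, and the minCount threshold is an arbitrary assumption. Dedicated tools like the ones mentioned in this thread are likely far more sophisticated.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class BigramPhrases {
    // Returns adjacent word pairs occurring at least minCount times, in
    // alphabetical order. Pairs may span sentence boundaries; no stopword handling.
    static List<String> frequentBigrams(String text, int minCount) {
        String[] tokens = text.toLowerCase().split("[^\\p{L}\\p{N}]+");
        Map<String, Integer> counts = new TreeMap<>();
        String prev = null;
        for (String tok : tokens) {
            if (tok.isEmpty()) continue; // a leading separator yields one empty token
            if (prev != null) counts.merge(prev + " " + tok, 1, Integer::sum);
            prev = tok;
        }
        List<String> phrases = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) phrases.add(e.getKey());
        }
        return phrases;
    }
}
```

For example, "Open source is great. We like open source and Microsoft Office. Microsoft Office users love open source." with minCount 2 yields ["microsoft office", "open source"].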

Re: Strange behavior when searching with accents

2007-09-20 Thread Thorsten Scherler
On Thu, 2007-09-20 at 15:27 +0200, Bertrand Delacretaz wrote: On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote: ...Thank you very much. Moving the <filter class="solr.ISOLatin1AccentFilterFactory"/> up in the chain fixed it. Yes, the problem was the EnglishPorterFilterFactory before

Re: How can i make a distribute search on Solr?

2007-09-20 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:58:17 +0200 David Welton [EMAIL PROTECTED] wrote: That seems to be how Sphinx works: http://www.sphinxsearch.com/doc.html#distributed Of course, the details of this are far over my head for either system, so I don't really know if that's a sensible way of doing

Re: How can i make a distribute search on Solr?

2007-09-20 Thread Norberto Meijome
On Thu, 20 Sep 2007 09:53:46 -0400 Yonik Seeley [EMAIL PROTECTED] wrote: On 9/19/07, Norberto Meijome [EMAIL PROTECTED] wrote: Maybe I got this wrong...but isn't this what mapreduce is meant to deal with? Not really... you could force a *lot* of different problems into map-reduce

RE: Index/Update Problems with Solrj/Tomcat and Larger Files

2007-09-20 Thread Daley, Kristopher M.
I am running against 1.2. Where would I get the 1.3-dev version? I will try different versions of Tomcat and/or Jetty. Thanks for all your suggestions, I'll let you know. -----Original Message----- From: Ryan McKinley [mailto:[EMAIL PROTECTED]] Sent: Wednesday, September 19, 2007 8:30 PM To:

Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thorsten Scherler [EMAIL PROTECTED] wrote: ...Bertrand, does the French Snowball work fine?... I've seen some weirdnesses, like "tennis" and "tenir" (which means "to hold") both stemmed to "ten", but in all of our (simple) tests it was ok. The application where we're using it does not require high

Solr and FieldCache

2007-09-20 Thread Walter Ferrara
I have an index with several fields, but just one stored: ID (string, unique). I need to access that ID field for each of the top nodes docs in my results (this is done inside a handler I wrote), code looks like: Hits hits = searcher.search(query); for (int i = 0; i < nodes; i++) {

Re: Solr and FieldCache

2007-09-20 Thread J.J. Larrea
At 5:30 PM +0200 9/20/07, Walter Ferrara wrote: I have an index with several fields, but just one stored: ID (string, unique). I need to access that ID field for each of the top nodes docs in my results (this is done inside a handler I wrote), code looks like: Hits hits =

Re: Solr and FieldCache

2007-09-20 Thread Walter Ferrara
About the stored/indexed difference: ID is a string (= solr.StrField), so FieldCache gives me what I need. I'm just wondering, as this cached object could be (theoretically) pretty big, do I need to be aware of some OOM? I know that FieldCache uses weak maps, so I presume the cached array for the older
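The memory question comes down to how the cache is keyed. As Walter notes, Lucene's FieldCache keeps its arrays in weak maps keyed by reader. A call such as FieldCache.DEFAULT.getStrings(reader, "ID") returns a String[] with one slot per document (maxDoc), so its size grows with the index, not with the number of hits.

Below is a simplified model of that caching pattern, not Lucene's actual code. PerReaderCache is hypothetical, the reader is modelled as a plain Object, and the loader function stands in for walking the field's terms.

```java
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.IntFunction;

class PerReaderCache {
    // Readers are held weakly: once nothing else references an old reader, its
    // entry (including the possibly large String[]) can be garbage collected.
    private final Map<Object, String[]> cache = new WeakHashMap<>();
    private int loads = 0;

    // Returns one value per document id (0..maxDoc-1), loading only on first use.
    synchronized String[] getStrings(Object reader, int maxDoc, IntFunction<String> loader) {
        String[] values = cache.get(reader);
        if (values == null) {
            values = new String[maxDoc];
            for (int doc = 0; doc < maxDoc; doc++) {
                values[doc] = loader.apply(doc);
            }
            cache.put(reader, values);
            loads++;
        }
        return values;
    }

    synchronized int loadCount() { return loads; }
}
```

In this model, an old array becomes collectable only once the old reader is unreachable. While a new searcher is being warmed and the old one is still serving requests, both arrays can be live at the same time.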

Faceting question

2007-09-20 Thread Cric Digs
I've been struggling with this a bit so here goes: I'm using faceting to get some results. I also want to get another field - the id field along with it. Is it possible to get that somehow in the facet results? Thanks!

Re: Solr and FieldCache

2007-09-20 Thread Yonik Seeley
On 9/20/07, Walter Ferrara [EMAIL PROTECTED] wrote: I'm just wondering, as this cached object could be (theoretically) pretty big, do I need to be aware of some OOM? I know that FieldCache uses weak maps, so I presume the cached array for the older reader(s) will be gc-ed when the reader is no

Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-20 Thread Yu-Hui Jin
OK, I should correct myself. For #1, I think we need to 1) configure a different port for each solr home dir (since they run on the same host); 2) run the rsyncd-start script under each solr home's bin dir. (BTW, just to be clear, I understand that we should run rsyncd-start after rsyncd-enable.)

Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-20 Thread Chris Hostetter
: 1) config different port for each solr home dir (since they run on the same : host); you mean a different rsync port, right? ... yes, the scripts as distributed assume that each rsync daemon will be dedicated to a single solr instance .. the idea being that even if you have 12 Solr instances

Re: Solr and FieldCache

2007-09-20 Thread Yonik Seeley
On 9/20/07, Walter Ferrara [EMAIL PROTECTED] wrote: I have an index with several fields, but just one stored: ID (string, unique). I need to access that ID field for each of the top nodes docs in my results (this is done inside a handler I wrote), code looks like: Hits hits =

Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-20 Thread Yu-Hui Jin
Thanks, Hoss. For the last question, yes, I understand now it's referring to whatever solr home we have named. However, the last part of my question still stands: it seems suspicious that the "solr" string is hard-coded in the script (unlike other cases, where they usually use ${solr_root} to get to

Re: Faceting question

2007-09-20 Thread Chris Hostetter
: I'm using faceting to get some results. I also want to get another field - : the id field along with it. Is it possible to get that somehow in the facet : results? you're going to have to elaborate on what it is you are trying to do ... i genuinely have no idea what you are asking (and i

RE: Faceting question

2007-09-20 Thread Binkley, Peter
You mean, when it says that facet term foo has 10 documents, you want those 10 ids? I think that will require a further query from your application. Peter -----Original Message----- From: Cric Digs [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 20, 2007 12:43 PM To:
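Peter's suggestion of a further query can be sketched as a second Solr request. It repeats the user's query, adds a filter query on the chosen facet value, and asks only for the id field. q, fq, fl and rows are standard Solr request parameters. The base URL, field name and value below are hypothetical, and the helper assumes the facet value contains no double quotes.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FacetDrillDown {
    // Follow-up request: same query, restricted to one facet value, returning
    // only the id field for as many rows as the facet count reported.
    static String drillDownUrl(String solrBase, String userQuery,
                               String facetField, String facetValue, int facetCount) {
        // Assumes facetValue contains no double quotes (no escaping done here).
        String fq = facetField + ":\"" + facetValue + "\"";
        return solrBase + "/select?q=" + enc(userQuery)
            + "&fq=" + enc(fq)
            + "&fl=id&rows=" + facetCount;
    }

    private static String enc(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }
}
```

For a facet term foo with 10 documents in a hypothetical category field, this produces http://localhost:8983/solr/select?q=ipod&fq=category%3A%22foo%22&fl=id&rows=10.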

Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-20 Thread Yu-Hui Jin
OK, Hoss. I think I'll believe you, since nobody has raised any issue running the script. And I'm about to try it out shortly with different solr home names. Just for my own understanding: where does this virtual setting of the "solr" string happen? Should it be in some config file or something? Thanks,

Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-20 Thread Chris Hostetter
: So just to help my knowledge, where does this virtual setting of this solr : string happen? Should it be in some config file or sth? rsyncd-start creates an rsync config file on the fly ... much of it is constants, but it fills in the rsync port using a variable from your config. -Hoss

Re: rsync start and enable for multiple solr instances within one tomcat

2007-09-20 Thread Bill Au
The "solr" that you are referring to in your third question is the name of the rsync area, which is mapped to the solr data directory. This is defined in the rsyncd configuration file, which is generated on the fly as Chris has pointed out. Take a look at rsyncd-start. snappuller rsyncs the index from
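Hoss and Bill describe rsyncd-start writing the rsync daemon config on the fly. Most of it is constant, the port comes from your config, and an area named solr is mapped to the data directory. The rough model below shows why several instances on one host can all use the module name solr: each daemon gets its own port and its own path.

This is not the text rsyncd-start actually writes. port, use chroot, path and read only are standard rsyncd.conf settings, but which ones the script emits is an assumption here. RsyncdConfSketch and the paths and ports are hypothetical.

```java
class RsyncdConfSketch {
    // Only the port and the data path vary per Solr instance; the module
    // name "solr" stays constant.
    static String generate(int rsyncPort, String dataDir) {
        return String.join("\n",
            "port = " + rsyncPort,
            "use chroot = no",
            "",
            "[solr]",
            "    path = " + dataDir,
            "    read only = yes",
            "");
    }
}
```

A client would then address each instance as rsync://host:18983/solr or rsync://host:18984/solr. The module name is the same, and the port picks the daemon and therefore the data directory.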

Re: a bug in commit script?

2007-09-20 Thread Bill Au
That would be my bad. I noticed the problem while fixing SOLR-282, which is not related. I fixed both problems instead of opening a different bug for the response format issue. I will update the change log. Bill On 9/20/07, Chris Hostetter [EMAIL PROTECTED] wrote: : : It seems there's a

clarification needed for the Ranking score

2007-09-20 Thread Dilip.TS
Hi, I need a clarification regarding Solr ranking. Consider the following scenario for searching courses, with this order of relevance: a. Courses with the term in the courseTitle, courseTag and courseDescription should appear first b. Courses with the term in the courseTitle and
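An ordering like this is usually expressed with per-field boosts, for example a DisMax qf value such as "courseTitle^4 courseTag^2 courseDescription^1". The boost values are hypothetical. The toy below only shows why power-of-two boosts put every title match above every non-title combination. It assumes an additive score over matched fields and ignores tf, idf, length norms and coord, so real Lucene scores only lean toward this order and do not guarantee it.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

class FieldBoostToy {
    // Hypothetical boosts: the title outweighs tag and description combined.
    static final Map<String, Integer> BOOSTS = new LinkedHashMap<>();
    static {
        BOOSTS.put("courseTitle", 4);
        BOOSTS.put("courseTag", 2);
        BOOSTS.put("courseDescription", 1);
    }

    // Toy score: sum of the boosts of the fields that contain the search term.
    static int score(Set<String> matchedFields) {
        int s = 0;
        for (String field : matchedFields) s += BOOSTS.getOrDefault(field, 0);
        return s;
    }
}
```

Under these boosts the order is: all three fields (7), title and tag (6), title and description (5), title alone (4), then tag and description (3).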