Re: Making stemming dynamic at query time

2007-12-18 Thread Bertrand Delacretaz
On Dec 18, 2007 9:41 PM, Kamran Shadkhast [EMAIL PROTECTED] wrote: ...it would be great if we could dynamically control this during search if we want to search with stemming or not... The easiest is probably to have two copies of your field, using copyField, one stemmed and one not, and search

Re: Which terms in the query match

2007-10-17 Thread Bertrand Delacretaz
On 10/16/07, Nishant Soni [EMAIL PROTECTED] wrote: ...So is there a way to query solr about which of the tokens in the query actually matched ?... The analyzer admin page should help, see http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9 -Bertrand

Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote: ..when we search for matthé or for matthe, we get two totally different results The analyzer admin tool should help you find out what's happening, see http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9

Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thierry Collogne [EMAIL PROTECTED] wrote: ...Thank you very much. Moving the <filter class="solr.ISOLatin1AccentFilterFactory"/> up in the chain fixed it... Yes, the problem was the EnglishPorterFilterFactory before the accents removal: the stemmer doesn't know about accents, so no
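
To see why the filter order matters, here is a rough Python sketch. It does not use Solr at all: toy_stem is a hypothetical stand-in for a real stemmer (it only drops one trailing ASCII "e"), and fold_accents only approximates what an accent-removal filter does. The point is just that a stemmer that ignores accented characters gives different results for "matthé" and "matthe" unless the accents are folded first.

```python
import unicodedata


def fold_accents(token: str) -> str:
    """Strip combining marks, roughly what an accent-folding filter does."""
    decomposed = unicodedata.normalize("NFD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def toy_stem(token: str) -> str:
    """Hypothetical stemmer stand-in: drops one trailing ASCII 'e' only."""
    return token[:-1] if token.endswith("e") else token


def stem_then_fold(token: str) -> str:
    """Stemmer first, accent removal second (the order that caused trouble)."""
    return fold_accents(toy_stem(token))


def fold_then_stem(token: str) -> str:
    """Accent removal first, stemmer second (the order that fixed it)."""
    return toy_stem(fold_accents(token))


if __name__ == "__main__":
    for word in ("matth\u00e9", "matthe"):
        print(word, stem_then_fold(word), fold_then_stem(word))
```

With the stemmer first, the two spellings end up as different terms; with the folding first, both reduce to the same term, which matches the fix reported in the thread.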

Re: Strange behavior when searching with accents

2007-09-20 Thread Bertrand Delacretaz
On 9/20/07, Thorsten Scherler [EMAIL PROTECTED] wrote: ...Bertrand, does the French Snowball work fine?... I've seen some weirdnesses, like tennis and tenir (means to hold) both stemmed to ten, but in all of our (simple) tests it was ok. The application where we're using it does not require high

Re: SOLR developer

2007-08-30 Thread Bertrand Delacretaz
On 8/31/07, Tim Archambault [EMAIL PROTECTED] wrote: ...I'm thinking of sending a similar list-serv item out, but I noticed this is a solr-user list, not necessarily a developers list so I thought I'd ask Note that there's also [EMAIL PROTECTED] for such purposes, see

Re: solr question

2007-07-21 Thread Bertrand Delacretaz
On 7/21/07, Alessandro Ferrucci [EMAIL PROTECTED] wrote: ... the user could enter the following combinations of words: ... WORD WORD ...where the second instance is either last-name first-name OR first-name last-name. ... The dismax handler can indeed search terms in several fields, but I'd

Re: LIUS/Fulltext indexing

2007-06-12 Thread Bertrand Delacretaz
On 6/12/07, Yonik Seeley [EMAIL PROTECTED] wrote: ... I think Tika will be the way forward (some of the code for Tika is coming from LIUS)... Work has indeed started to incorporate the LIUS code into Tika, see https://issues.apache.org/jira/browse/TIKA-7 and

Re: LIUS/Fulltext indexing

2007-06-12 Thread Bertrand Delacretaz
On 6/12/07, Vish D. [EMAIL PROTECTED] wrote: ...Sounds interesting. I can't seem to find any clear dates on the project website. Do you know? ...V1 shipping date?... Not at the moment, Tika just entered incubation and it's impossible to predict what will happen. But help is welcome, of course

Re: how to crawl when Solr is search engine?

2007-06-07 Thread Bertrand Delacretaz
On 6/7/07, Ian Holsman [EMAIL PROTECTED] wrote: ...it's called XSLT. most modern browsers can do the transform on the client side. otherwise there are some server-side tools (cocoon I think does this) to do the transform on the server before sending it out... Solr also does server-side XSLT,

Re: Solr in Windows

2007-04-26 Thread Bertrand Delacretaz
On 4/26/07, guruprasad [EMAIL PROTECTED] wrote: ...Is it only for Linux or can I install Solr on my Windows Desktop too?... Solr itself should run fine on any JVM 1.5, including Windows (and several Solr developers are working on Windows IIUC). Some of our docs refer to auxiliary scripts

Re: Re[2]: Things are not quite stable...

2007-04-25 Thread Bertrand Delacretaz
On 4/25/07, Jack L [EMAIL PROTECTED] wrote: ...Maybe it's time to think about upgrading Jetty... It's in the pipeline, see https://issues.apache.org/jira/browse/SOLR-128 -Bertrand

Re: Re[6]: Things are not quite stable...

2007-04-25 Thread Bertrand Delacretaz
On 4/25/07, Jack L [EMAIL PROTECTED] wrote: ...Regardless, I think it's a good idea to use a newer, released (not RC) version in general, considering 5.1 is one major version behind Agreed, but note that we don't have any factual evidence that the Jetty RC that we use is indeed the cause

Re: snapshooter on OS X

2007-04-22 Thread Bertrand Delacretaz
On 4/23/07, Grant Ingersoll [EMAIL PROTECTED] wrote: ...The error says something about command not found line 15, but all the files I looked at, line 15 was a comment... Running your script with bash -x myscript should help, it will echo commands before executing them. -Bertrand

Re: finalizer() in SolrCore (was: Commits and Container Shutdown)

2007-04-16 Thread Bertrand Delacretaz
On 4/16/07, Yonik Seeley [EMAIL PROTECTED] wrote: ...Yes, it's a typo. Fixed in revision 529367. -Bertrand

finalizer() in SolrCore (was: Commits and Container Shutdown)

2007-04-15 Thread Bertrand Delacretaz
On 4/16/07, Erik Hatcher [EMAIL PROTECTED] wrote: ...Further details on this: SolrCore has a finalizer() method that closes the update handler. I'm not clear on finalizer() though. How/ when is that invoked? I know about Object.finalize(), but not finalizer()... Looking at the code, it

Re: Solr Query Language

2007-04-15 Thread Bertrand Delacretaz
On 4/16/07, Jack L [EMAIL PROTECTED] wrote: Is the lucene query syntax available in solr? ... The syntax depends on the request handler used, if you're using the standard one the docs are at http://wiki.apache.org/solr/StandardRequestHandler -Bertrand

Re: Posting PDF,DOC,TXT

2007-04-06 Thread Bertrand Delacretaz
On 4/6/07, Suresh Kannan [EMAIL PROTECTED] wrote: I would like to post PDF, DOC, TXT into SOLR to do the indexing. There's no way to do that directly at the moment, you'll need to convert them to the XML format that Solr expects. The Lucene FAQ at http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Solr logo poll

2007-04-06 Thread Bertrand Delacretaz
On 4/6/07, Yonik Seeley [EMAIL PROTECTED] wrote: ...What form of logo do you prefer, A or B? B -Bertrand (a Tex Avery fan ;-)

Re: Instructables on solr

2007-04-05 Thread Bertrand Delacretaz
On 4/4/07, Ryan McKinley [EMAIL PROTECTED] wrote: ...We have been running solr for months as a band-aid, this release integrates solr deeply... Awesome - thanks for sharing this! If you don't mind, it'd be cool to add some info to http://wiki.apache.org/solr/PublicServers -Bertrand

Re: Reposting unABLE to match

2007-03-27 Thread Bertrand Delacretaz
On 3/27/07, Shridhar Venkatraman [EMAIL PROTECTED] wrote: ...Reposting unABLE to match No need to repost if your message made it to the list. If it hasn't been answered yet, it either means that no one knows the answer or that no one has had the time to answer yet. We're all volunteers here.

Re: schema field type doesn't work

2007-03-24 Thread Bertrand Delacretaz
On 3/24/07, Dimitar Ouzounov [EMAIL PROTECTED] wrote: ...I must be doing something wrong, maybe in the schema. Does anyone have any suggestions?.. The best way to debug such problems is with the analyzer admin tool: http://localhost:8983/solr/admin/analysis.jsp You can try various

Re: How to assure a permanent index.

2007-03-21 Thread Bertrand Delacretaz
On 3/21/07, Thierry Collogne [EMAIL PROTECTED] wrote: ...I mean if I do the following. - delete all documents from the index - add all documents - do a commit. Will this result in a temporary empty index, or will I always have results?... Changes to the index are invisible

Re: Problems with special characters

2007-03-21 Thread Bertrand Delacretaz
On 3/21/07, Thierry Collogne [EMAIL PROTECTED] wrote: ...I am using the post.jar file to update the search indexes. Problem is that foreign characters like é, à, ... don't work correctly... You're right, I have entered the issue in https://issues.apache.org/jira/browse/SOLR-194 For now,

Re: Problems with special characters

2007-03-21 Thread Bertrand Delacretaz
On 3/21/07, Bertrand Delacretaz [EMAIL PROTECTED] wrote: ...For now, using this as a workaround should help: java -Dfile.encoding=UTF-8 -jar post.jar http://localhost:8983/solr/update utf8-example.xml.. Should be fixed now, if you can grab the latest SimplePostToolCode [1] it should work
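
A small Python sketch of what probably goes wrong with characters like é. This is an assumption, not something the snippet above spells out: the guess is that the file is UTF-8 but gets read or sent using a different platform default encoding, which is what forcing file.encoding=UTF-8 would work around. The helper name encode_for_post is made up for the illustration.

```python
def encode_for_post(text: str, encoding: str = "utf-8") -> bytes:
    """Encode text the way a posting tool would before sending it (illustrative)."""
    return text.encode(encoding)


sample = "caf\u00e9"  # "café", written with an escape to avoid source-encoding issues

utf8_bytes = encode_for_post(sample)                  # two bytes for the é
latin1_bytes = encode_for_post(sample, "iso-8859-1")  # one byte for the é

# Assumed failure mode: UTF-8 bytes interpreted as ISO-8859-1 on the way in.
misread = utf8_bytes.decode("iso-8859-1")

if __name__ == "__main__":
    print(utf8_bytes, latin1_bytes, misread)
```

The same text gives different bytes under the two encodings, and reading UTF-8 bytes as ISO-8859-1 turns é into two unrelated characters, which is the kind of damage reported in this thread.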

Re: Date range boost

2007-03-12 Thread Bertrand Delacretaz
On 3/12/07, stefano nicolai [EMAIL PROTECTED] wrote: ...All of these items have a field containing the date they were created (it's a string field at the moment, as i have this type inside my DB). I want to give a higher score to the ones with the most recent date... You should be able to

Re: production solr - app server choice ?

2007-03-10 Thread Bertrand Delacretaz
On 3/9/07, rubdabadub [EMAIL PROTECTED] wrote: ...The site is a local portal and the traffic is very high and I am not sure if Jetty is enough maybe it is Just an additional note on this: asking four people about what very high traffic means might also give you five different answers ;-)

Re: Adding data as UTF-8

2007-03-10 Thread Bertrand Delacretaz
On 3/10/07, Walter Underwood [EMAIL PROTECTED] wrote: It is better to use application/xml. See RFC 3023. Using text/xml; charset=UTF-8 will override the XML encoding declaration. application/xml will not... I agree, but did you try this with our example setup, started with java -jar start.jar?

Re: Adding data as UTF-8

2007-03-10 Thread Bertrand Delacretaz
On 3/10/07, Walter Underwood [EMAIL PROTECTED] wrote: If it does something different, that is a bug. RFC 3023 is clear. --wunder.. Sure - just wanted to confirm what I'm seeing, thanks! -Bertrand

Re: production solr - app server choice ?

2007-03-09 Thread Bertrand Delacretaz
On 3/9/07, rubdabadub [EMAIL PROTECTED] wrote: ...I am wondering what everyone is using when it comes to app server i.e. Jetty, Resin, Tomcat etc I suspect that asking four people might give you five different answers on this one ;-) Whichever servlet container you use, IMHO the

Re: Error with bin/optimize and multiple solr webapps

2007-03-07 Thread Bertrand Delacretaz
On 3/7/07, Jeff Rodenburg [EMAIL PROTECTED] wrote: Oops, my bad I didn't see either 186 or 187 before entering 188. :-) I have closed SOLR-186 and SOLR-187 as duplicates, please add relevant info to SOLR-188 if needed. -Bertrand

Re: merely a suggestion: schema.xml validator or better schema validation logging

2007-03-03 Thread Bertrand Delacretaz
On 3/3/07, Ryan McKinley [EMAIL PROTECTED] wrote: ...The rationale with the solrconfig stuff is that a broken config should behave as best it can. This is great if you are running a real site with people actively using it - it is a pain in the ass if you are getting started and don't notice

Re: merely a suggestion: schema.xml validator or better schema validation logging

2007-03-01 Thread Bertrand Delacretaz
On 3/2/07, Jed Reynolds [EMAIL PROTECTED] wrote: ...my first try at defining a schema.xml file was tough because my only feedback for a long time was NullPointerException from SolrCore when I was trying to add content... Can you give us enough information to reproduce the problem? What was

Re: MoreLikeThis and term vectors - documentation suggestion

2007-02-26 Thread Bertrand Delacretaz
On 2/26/07, Ken Krugler [EMAIL PROTECTED] wrote: ...I was trying out the MoreLikeThis support, and getting some odd results... Thanks for the info, I have added a link to your message at https://issues.apache.org/jira/browse/SOLR-69 -Bertrand

Re: Tagging

2007-02-14 Thread Bertrand Delacretaz
On 2/14/07, Erik Hatcher [EMAIL PROTECTED] wrote: ...Sorry if I'm sending things mangled somehow - and if anyone has suggestions on correcting I'm all ears For long links I tend to use http://tinyurl.com/, but it's a bit painful to do that for all links. -Bertrand

Re: Incremental replication...

2007-02-13 Thread Bertrand Delacretaz
On 2/13/07, escher2k [EMAIL PROTECTED] wrote: ...At least from looking at the snapshooter script, it doesn't seem to be doing anything specific... The snapshooter script only makes an instant snapshot of the index directory using cp -lr. This does not involve any copying of index data. The
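
To illustrate what cp -lr does, here is a Python sketch that recreates a directory tree with hard links instead of copies. It is not the snapshooter script itself, and the function name snapshot_with_hard_links is made up for the example. It assumes both directories are on the same filesystem, since hard links cannot cross filesystems.

```python
import os


def snapshot_with_hard_links(index_dir: str, snapshot_dir: str) -> None:
    """Recreate index_dir under snapshot_dir, hard-linking every file.

    No file data is copied: each snapshot entry is just another name
    for the same underlying file, similar in spirit to `cp -lr`.
    """
    for root, _dirs, files in os.walk(index_dir):
        rel = os.path.relpath(root, index_dir)
        target_root = os.path.normpath(os.path.join(snapshot_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            os.link(os.path.join(root, name), os.path.join(target_root, name))
```

After a call such as snapshot_with_hard_links("index", "snapshot.1"), each file in the snapshot shares its data with the original, so the snapshot takes almost no extra space and no time proportional to the index size.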

Re: performance testing practices

2007-02-05 Thread Bertrand Delacretaz
On 2/5/07, Erik Hatcher [EMAIL PROTECTED] wrote: ...What numbers are folks capturing? What techniques are you using to capture numbers?... I've been using my httpstone utility (http://code.google.com/p/httpstone/) along with ab (http://httpd.apache.org/docs/2.2/programs/ab.html) to generate

Re: MoreLikeThis similarity-type queries in Solr

2007-01-31 Thread Bertrand Delacretaz
On 1/31/07, Brian Whitman [EMAIL PROTECTED] wrote: Does Solr have support for the Lucene query-contrib MoreLikeThis query type or anything like it? ... Yes, there's a patch in http://issues.apache.org/jira/browse/SOLR-69 - if you try it, please add your comments on that page. -Bertrand

Re: MoreLikeThis similarity-type queries in Solr

2007-01-31 Thread Bertrand Delacretaz
On 1/31/07, Andrew Nagy [EMAIL PROTECTED] wrote: ... Yes, there's a patch in http://issues.apache.org/jira/browse/SOLR-69 -... Any word on something like this being incorporated into the official SOLR release? The patch is quite simple, I think we could commit it soon if the other committers

Re: How to Index Word, Excel, PDF files?

2007-01-29 Thread Bertrand Delacretaz
On 1/29/07, Leandro Saad [EMAIL PROTECTED] wrote: ...I'd like to know if solr can index Word, Excel and PDF files or I must create a xml representation of those files matching my schema?... Currently you must create the XML yourself outside of Solr. This might change, see

Re: Split one string into many fields

2007-01-22 Thread Bertrand Delacretaz
On 1/22/07, Yonik Seeley [EMAIL PROTECTED] wrote: ...When we get to it, I'd like to hear why it (things like PDF parsing) should be inside Solr rather than outside using our update interfaces Same here. I haven't had time to follow the recent (rich) design discussions about this stuff,

Re: Document freshness and Boost Functions

2007-01-17 Thread Bertrand Delacretaz
On 1/17/07, Luis Neves [EMAIL PROTECTED] wrote: ...I see that is possible to use Boost Functions to influence the score. How would that work in order to improve the score of recent documents? (I have a timestamp field in the schema)... I've been using expressions like these in boolean

Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz
On 1/16/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: ...Could someone give me some code examples on how Solr requests can be called by Java code... Although our Java client landscape is still a bit fuzzy (there are several variants floating around), you might want to look at the code found

Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz
On 1/16/07, [EMAIL PROTECTED] [EMAIL PROTECTED] wrote: ...and how would you do it calling it from another web application, let's say from a servlet or so?... Doesn't make much difference if your client is a standalone or a web application: your Solr client class will need to be configured with

Re: Calling Solr requests from java code - examples?

2007-01-16 Thread Bertrand Delacretaz
On 1/16/07, Pavel Penchev [EMAIL PROTECTED] wrote: ...What about the case where solr and my application are deployed in the same instance of say tomcat. Is there a way to skip the http requests and use a direct api?... The javax.servlet.RequestDispatcher interface allows you to access other

Re: Faceted Dates

2007-01-09 Thread Bertrand Delacretaz
On 1/9/07, Ryan McKinley [EMAIL PROTECTED] wrote: ...I would like to use faceted browsing to group documents by year, month, and day. I can think of a few ways to do this, but I'd like to see what folks think before i start down the wrong track Dunno if you've already read it, but I found

Re: Handling disparate data sources in Solr

2006-12-23 Thread Bertrand Delacretaz
On 12/23/06, Alan Burlison [EMAIL PROTECTED] wrote: ...As well as centralising the index, I also want to centralise the handling of the different document types... My Subversion and Solr presentation from the last Cocoon GetTogether might give you ideas for how to handle this, see the link at

Re: Opinions wanted about a new Solr logo (SOLR-58)

2006-12-18 Thread Bertrand Delacretaz
On 12/18/06, Linda Tan [EMAIL PROTECTED] wrote: I just learned no attachments are allowed on this list. I've put the image in the jira.. Thanks, it looks good indeed! -Bertrand

Re: post the output of a URL to solr

2006-11-30 Thread Bertrand Delacretaz
On 11/30/06, Mike Klaas [EMAIL PROTECTED] wrote: ...Try something like: wget http://localhost:/gaz/solr/f0.xml -O - | curl http://localhost:8983/solr/update --data-binary - -H 'Content-type:text/xml; charset=utf-8' and if you use curl you can use it on both sides to avoid the dependency
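
For those who would rather not depend on wget or curl, the same kind of POST can be sketched with the Python standard library. This is only an illustration: build_update_request is a made-up helper, the default URL is the one used in the curl command above, and the code builds the request without sending it.

```python
import urllib.request


def build_update_request(
    xml_bytes: bytes,
    update_url: str = "http://localhost:8983/solr/update",
) -> urllib.request.Request:
    """Build a POST carrying an XML update document, with the same
    Content-type header as the curl command in the message above."""
    return urllib.request.Request(
        update_url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


if __name__ == "__main__":
    req = build_update_request(b"<add><doc></doc></add>")
    print(req.get_method(), req.full_url)
```

Actually sending it would be a matter of passing the request to urllib.request.urlopen, which of course needs a running Solr at that URL.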

Re: Solr and Oracle

2006-11-24 Thread Bertrand Delacretaz
On 11/23/06, Nicolas St-Laurent [EMAIL PROTECTED] wrote: ...I index huge Oracle tables with Lucene with a custom made indexer/search engine. But I would prefer to use Solr instead... Instead of using Lucene's API directly, with Solr you'll have to add your documents to the index using HTTP

Re: Extending Solr's Admin functionality

2006-09-24 Thread Bertrand Delacretaz
On 9/24/06, Erik Hatcher [EMAIL PROTECTED] wrote: ...perhaps some authentication/ authorization as well as HTTPS should eventually make it into the core, but getting more fine grained is unnecessary... If meaningful URLs are used (admin/stats, admin/config, admin/analysis, etc.), it is

Re: Re: Doc add limit

2006-07-28 Thread Bertrand Delacretaz
On 7/28/06, Yonik Seeley [EMAIL PROTECTED] wrote: ...Getting all the little details of connection handling correct can be tough... it's probably a good idea if we work toward common client libraries so everyone doesn't have to reinvent them Jakarta's HttpClient [1] is IMHO a good base for

Re: Re: Cyrillic characters

2006-07-19 Thread Bertrand Delacretaz
On 7/19/06, Tricia Williams [EMAIL PROTECTED] wrote: ...What I called the _solr url encoding_ was the q= parameter translated into I'm not sure what encoding in the url... I think I've seen the same problem, haven't investigated deeper but IIUC the encoding used when posting a form is related