Re: Doing url search in solr is slow

2012-01-09 Thread François Schiettecatte
About the search 'referal_url:*www.someurl.com*', having a wildcard at the start will cause a dictionary scan for every term you search on unless you use ReversedWildcardFilterFactory. That could be the cause of your slowdown if you are I/O bound, and even if you are CPU bound for that matter.
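To illustrate why reversing terms helps with a leading wildcard, here is a minimal Python sketch. It is not Solr code: the term list and the function names are made up for illustration, and it only covers a pattern of the form `*suffix`. A plain dictionary has to be scanned term by term, whereas a sorted list of reversed terms lets the same pattern be answered as a prefix range.

```python
import bisect


def scan_suffix(terms, suffix):
    """Leading wildcard (*suffix) on a plain dictionary: every term is checked."""
    return [t for t in terms if t.endswith(suffix)]


def build_reversed(terms):
    """Store each term reversed and sorted (the idea behind a reversed-term index)."""
    return sorted(t[::-1] for t in terms)


def prefix_lookup(reversed_terms, suffix):
    """Answer *suffix as a prefix range over the reversed terms."""
    key = suffix[::-1]
    i = bisect.bisect_left(reversed_terms, key)  # jump straight to the first candidate
    hits = []
    while i < len(reversed_terms) and reversed_terms[i].startswith(key):
        hits.append(reversed_terms[i][::-1])  # un-reverse before returning
        i += 1
    return hits
```

The price is the extra reversed entries that have to be stored alongside the originals.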

Re: best query for one-box search string over multiple types fields?

2012-01-15 Thread François Schiettecatte
Johnny, what you are going to want to do is boost the artist field with respect to the others. For example, using edismax, my 'qf' parameter is: number^5 title^3 default so hits in the number field get a five-fold boost and hits in the title field get a three-fold boost. In your case
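As a rough sketch of how such a 'qf' value might be assembled on the client side, the Python below builds the request parameters for an edismax query. The helper name and the sample query are hypothetical; `defType`, `q` and `qf` are the standard Solr parameter names, and the field names are the ones quoted in the message.

```python
from urllib.parse import urlencode


def edismax_params(user_query, field_boosts):
    """Build a URL-encoded parameter string for an edismax request.

    field_boosts is a list of (field, boost) pairs; a boost of 1 is left implicit.
    """
    qf = " ".join(f"{field}^{boost}" if boost != 1 else field
                  for field, boost in field_boosts)
    return urlencode({"defType": "edismax", "q": user_query, "qf": qf})


# Hypothetical usage with the boosts from the message above.
params = edismax_params("abbey road", [("number", 5), ("title", 3), ("default", 1)])
```

The resulting string would be appended to the select URL of whichever core is being queried.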

Re: Question on Reverse Indexing

2012-01-17 Thread François Schiettecatte
Using ReversedWildcardFilterFactory will double the size of your dictionary (more or less), maybe the drop in performance that you are seeing is a result of that? François On Jan 17, 2012, at 9:01 PM, Shyam Bhaskaran wrote: Hi, For reverse indexing we are using the

Re: Using UUID for uniqueId

2012-02-08 Thread François Schiettecatte
Anderson I would say that this is highly unlikely, but you would need to pay attention to how they are generated, this would be a good place to start: http://en.wikipedia.org/wiki/Universally_unique_identifier Cheers François On Feb 8, 2012, at 1:31 PM, Anderson vasconcelos wrote:

Re: Help:Solr can't put all pdf files into index

2012-02-09 Thread François Schiettecatte
Have you tried checking any logs? Have you tried identifying a file which did not make it in and submitting just that one and seeing what happens? François On Feb 9, 2012, at 10:37 AM, Rong Kang wrote: Yes, I put all file in one directory and I have tested file names using code.

Re: Development inside or outside of Solr?

2012-02-20 Thread François Schiettecatte
You could take a look at this: http://www.let.rug.nl/vannoord/TextCat/ Will probably require some work to integrate/implement though François On Feb 20, 2012, at 3:37 AM, bing wrote: I have looked into the TikaCLI with -language option, and learned that Tika can output only the

Re: Solr logging

2012-02-20 Thread François Schiettecatte
Ola Here is what I have for this: ## # # Log4J configuration for SOLR # # http://wiki.apache.org/solr/SolrLogging # # # 1) Download LOG4J: # http://logging.apache.org/log4j/1.2/ #

Re: Solr out of memory exception

2012-03-15 Thread François Schiettecatte
FWIW it looks like this feature has been enabled by default since JDK 6 Update 23: http://blog.juma.me.uk/2008/10/14/32-bit-or-64-bit-jvm-how-about-a-hybrid/ François On Mar 15, 2012, at 6:39 AM, Husain, Yavar wrote: Thanks a ton. From: Li

Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread François Schiettecatte
John You can still use leading wildcards even if you don't have the ReversedWildcardFilterFactory in your analysis but it means you will be scanning the entire dictionary when the search is run which can be a performance issue. If you do use ReversedWildcardFilterFactory you won't have that

Re: Is leading wildcard search turned on by default in Solr 3.6.1?

2012-11-12 Thread François Schiettecatte
I suspect it is just part of the wildcard handling, maybe someone can chime in here, you may need to catch this before it gets to SOLR. François On Nov 12, 2012, at 5:44 PM, johnmu...@aol.com wrote: Thanks for the quick response. So, I do not want to use ReversedWildcardFilterFactory,

Re: Indexing only on change

2012-11-24 Thread François Schiettecatte
I would create a hash of the document content and store that in SOLR along with any document info you wish to store. When a document is presented for indexing, hash that and compare to the hash of the stored document, index if they are different and skip if they are not. François On Nov 24,
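The hash-and-compare approach could be sketched roughly as follows. This is only an illustration: the `stored_hashes` dictionary stands in for the hash that would be stored in SOLR alongside each document, and the function names and the choice of SHA-1 are assumptions, not part of the original message.

```python
import hashlib


def content_hash(text):
    """Hash the document content; SHA-1 is an arbitrary choice for this sketch."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def needs_indexing(doc_id, text, stored_hashes):
    """Return True if the document is new or its content has changed.

    stored_hashes is a stand-in for the hash field kept with the indexed document.
    """
    new_hash = content_hash(text)
    if stored_hashes.get(doc_id) == new_hash:
        return False  # unchanged, skip indexing
    stored_hashes[doc_id] = new_hash  # remember the hash of what gets indexed
    return True
```

In a real setup the lookup would be a query for the stored hash field rather than a dictionary access.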

Re: Spellcheck compounded words

2011-07-26 Thread François Schiettecatte
FWIW, here is the process I follow to create a log4j aware version of the apache solr war file and the corresponding log4j.properties files. Have fun :) François ## # # Log4J configuration for SOLR # #

Re: Spellcheck compounded words

2011-07-26 Thread François Schiettecatte
I get slf4j-log4j12-1.6.1.jar from http://www.slf4j.org/dist/slf4j-1.6.1.tar.gz, it is what interfaces slf4j to log4j, you will also need to add log4j-1.2.16.jar to WEB-INF/lib. François On Jul 26, 2011, at 3:40 PM, O. Klein wrote: François Schiettecatte wrote: # # 4) Copy

Re: schema.xml changes, need re-indexing ?

2011-07-27 Thread François Schiettecatte
I have not seen this mentioned anywhere, but I found a useful 'trick' to restart solr without having to restart tomcat. All you need to do is 'touch' the solr.xml in the solr.home directory. It can take a few seconds but solr will restart and reload any config. Cheers François On Jul 27,

Re: Solr can not index F**K!

2011-07-31 Thread François Schiettecatte
That seems a little far fetched, have you checked your analysis? François On Jul 31, 2011, at 4:58 PM, randohi wrote: One of our clients (a hot girl!) brought this to our attention: In this document there are many f* words:

Re: Solr can not index F**K!

2011-07-31 Thread François Schiettecatte
Indeed, the analysis will show if the term is a stop word, the term gets removed by the stop filter, turning on verbose output shows that. François On Jul 31, 2011, at 6:27 PM, Shashi Kant wrote: Check your Stop words list On Jul 31, 2011 6:25 PM, François Schiettecatte fschietteca

Re: Solr 3.3 crashes after ~18 hours?

2011-08-02 Thread François Schiettecatte
Assuming you are running on Linux, you might want to check /var/log/messages too (the location might vary), I think the kernel logs forced process termination there. I recall that the kernel usually picks the process consuming the most memory, there may be other factors involved too.

Re: SolrServer instances

2011-08-26 Thread François Schiettecatte
Sounds to me that you are looking for HTTP Persistent Connections (connection keep-alive as opposed to close), and a singleton object. This would be outside SOLR per se. A few caveats though, I am not sure if tomcat supports keep-alive, and I am not sure how SOLR deals with multiple requests

Re: Error while decoding %DC (Ü) from URL - results in ?

2011-08-27 Thread François Schiettecatte
Merlin Ü encodes to two bytes in utf-8 (%C3%9C), and one in iso-8859-1 (%DC) so it looks like there is a charset mismatch somewhere. Cheers François On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote: Hello, I am having problems with searches that are issued from spiders that
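The mismatch can be reproduced outside SOLR with a few lines of Python. This is just a demonstration using the standard library, not a description of what the servlet container does internally: the same character percent-encodes differently under the two charsets, and decoding the iso-8859-1 form as utf-8 yields a replacement character.

```python
from urllib.parse import quote, unquote

ch = "Ü"

# Percent-encoding under each charset.
as_utf8 = quote(ch, encoding="utf-8")         # two bytes
as_latin1 = quote(ch, encoding="iso-8859-1")  # one byte

# Decoding the iso-8859-1 form as if it were utf-8 cannot succeed,
# so the invalid byte is replaced with U+FFFD.
misdecoded = unquote("%DC", encoding="utf-8", errors="replace")

# Decoding with the charset the sender actually used recovers the character.
recovered = unquote("%DC", encoding="iso-8859-1")
```

A client that sends %DC and a server that expects utf-8 would therefore see a garbled character rather than Ü.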

Re: Error while decoding %DC (Ü) from URL - results in ?

2011-08-29 Thread François Schiettecatte
/27 François Schiettecatte fschietteca...@gmail.com Merlin Ü encodes to two bytes in utf-8 (%C3%9C), and one in iso-8859-1 (%DC) so it looks like there is a charset mismatch somewhere. Cheers François On Aug 27, 2011, at 6:34 AM, Merlin Morgenstern wrote: Hello, I am

Re: shareSchema=true - location of schema.xml?

2011-08-31 Thread François Schiettecatte
Satish You don't say which platform you are on but have you tried links (with ln on linux/unix) ? François On Aug 31, 2011, at 12:25 AM, Satish Talim wrote: I have 1000's of cores and to reduce the cost of loading unloading schema.xml, I have my solr.xml as mentioned here -

Re: Solr and wikipedia for schools

2011-09-04 Thread François Schiettecatte
I note that there is a full download option available, might be easier than crawling. François On Sep 4, 2011, at 9:56 AM, Markus Jelsma wrote: Hi, Solr is a search engine, not a crawler. You can use Apache Nutch to crawl your site and have it indexed in Solr. Cheers, Hi, I am

Re: MMapDirectory failed to map a 23G compound index segment

2011-09-07 Thread François Schiettecatte
My memory of this is a little rusty but isn't mmap also limited by mem + swap on the box? What does 'free -g' report? François On Sep 7, 2011, at 12:25 PM, Rich Cariens wrote: Ahoy ahoy! I've run into the dreaded OOM error with MMapDirectory on a 23G cfs compound index segment file. The

Re: synonyms.txt: different results on admin and on site..

2011-09-08 Thread François Schiettecatte
Wildcard terms are not analyzed, so your synonyms.txt may come into play here, have you checked the analysis for deniz* ? François On Sep 7, 2011, at 10:08 PM, deniz wrote: well yea you are right... i realised that lack of detail issue here... so here it comes... This is from my

Re: drastic performance decrease with 20 cores

2011-09-26 Thread François Schiettecatte
You have not said how big your index is but I suspect that allocating 13GB for your 20 cores is starving the OS of memory for caching file data. Have you tried 6GB with 20 cores? I suspect you will see the same performance as 6GB 10 cores. Generally it is better to allocate just enough memory

Re: Uncomplete date expressions

2011-10-29 Thread François Schiettecatte
Erik I would complement the date with default values as you suggest and store a boolean flag indicating whether the date was complete or not, or store the original date if it is not complete which would probably be better because the presence of that data would tell you that the original date

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
Arshad Actually it is available, you need to use the ReversedWildcardFilterFactory which I am sure you can Google for. Solr and SQL address different problem sets with some overlaps but there are significant differences between the two technologies. Actually '%Solr%' is a worst case for SQL

Re: Is SQL Like operator feature available in Apache Solr query

2011-11-01 Thread François Schiettecatte
. This is what's Solr made for. :) -Kuli Am 01.11.2011 13:24, schrieb François Schiettecatte: Arshad Actually it is available, you need to use the ReversedWildcardFilterFactory which I am sure you can Google for. Solr and SQL address different problem sets with some overlaps

Re: query within search results

2011-11-08 Thread François Schiettecatte
Wouldn't 'diseases AND water' or '+diseases +water' return you that result? Or you could search on 'water' while filtering on 'diseases'. Or am I missing something here? François On Nov 8, 2011, at 4:19 PM, sharnel pereira wrote: Hi, I have 10k records indexed using solr 1.4 We have a

Re: how index words with their perfix in solr?

2011-11-28 Thread François Schiettecatte
It looks like you are using the plural stemmer, you might want to look into using the Porter stemmer instead: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Stemming François On Nov 28, 2011, at 9:14 AM, mina wrote: I use solr 3.3,I want solr index words with their suffixes.

Re: Don't snowball depending on terms

2011-11-29 Thread François Schiettecatte
It won't and depending on how your analyzer is set up the terms are most likely stemmed at index time. You could create a separate field for unstemmed terms though, or use a less aggressive stemmer such as EnglishMinimalStemFilterFactory. François On Nov 29, 2011, at 12:33 PM, Robert Brown

Re: how index words with their perfix in solr?

2011-11-29 Thread François Schiettecatte
You might try the snowball stemmer too, I am not sure how closely that will fit your requirements though. Alternatively you could use synonyms. François On Nov 29, 2011, at 1:08 AM, mina wrote: thank you for your answer.i read it and i use this filter in my schema.xml in solr: filter

Re: Shutdown hook issue

2011-12-14 Thread François Schiettecatte
I am not an expert on this but the oom-killer will kill off the process consuming the greatest amount of memory if the machine runs out of memory, and you should see something to that effect in the system log, /var/log/messages I think. François On Dec 14, 2011, at 2:54 PM, Adolfo Castro

Re: solr benchmarks

2010-12-31 Thread François Schiettecatte
I would shard the index so that each shard is no larger than the memory of the machine it sits on, that way your entire index will be in memory all the time. When I was at Feedster (I wrote the search engine), the rule of thumb I had was to have 14GB of index on a 16GB machine. François On
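The rule of thumb above translates into simple arithmetic. The sketch below is an assumption-laden illustration: the function name is made up, and the 2GB of headroom is inferred from the 14GB-on-16GB figure in the message rather than stated as a general rule.

```python
import math


def shards_needed(index_gb, machine_ram_gb, headroom_gb=2):
    """Number of shards so that each shard fits in RAM with some headroom left.

    headroom_gb=2 mirrors the '14GB of index on a 16GB machine' rule of thumb.
    """
    usable_gb = machine_ram_gb - headroom_gb
    if usable_gb <= 0:
        raise ValueError("machine has no memory left for the index")
    return math.ceil(index_gb / usable_gb)
```

For example, a 100GB index on 16GB machines would come out at 8 shards under these assumptions.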

Re: Spelling Suggestions vs Correction

2011-01-01 Thread François Schiettecatte
I have just been playing around with spell check and from what I can tell it does not do that automatically, you would need to program that in your application yourself, basically run a search, check the spellcheck, run the search again if needed, and present the results to the user, not all

Re: Improving Solr performance

2011-01-07 Thread François Schiettecatte
It sounds like your system is I/O bound and I suspect (bet even) that all your index files are on the same disk drive. Also you have only 8GB of RAM for 100GB of index, so while your SOLR instance will cache some stuff and the balance will be used for caching file blocks, there really isn't

Re: Box occasionally pegs one cpu at 100%

2011-01-10 Thread François Schiettecatte
This reminded me of a situation I ran into in the past where the JVM was being rendered useless because it was calling FGC repeatedly. Effectively what was going on is that a very large array was allocated which swamped the JVM memory and caused it to thrash, much like an OS. Here are some

Re: Malformed XML with exotic characters

2011-02-01 Thread François Schiettecatte
Markus A few things to check, make sure whatever SOLR is hosted on is outputting utf-8 ( URIEncoding="UTF-8" in the Connector section in server.xml on Tomcat for example), which it looks like here, also make sure that whatever http header there is tells firefox that it is getting utf-8

Re: Solr 4.0 trunk in production

2011-02-19 Thread François Schiettecatte
I use it in a production setting, but I don't have a very large data set or a very heavy query load, the reason I use it is for edismax. François On Feb 19, 2011, at 9:50 AM, Mark wrote: Would I be crazy even to consider putting this in production? Thanks

Re: change in field_type

2011-02-21 Thread François Schiettecatte
Hello What about adding or deleting fields? I have been reindexing after doing that but is it needed? François On Feb 21, 2011, at 7:16 AM, Otis Gospodnetic wrote: Hello, When you change types you typically want to reindex everything. Otis Sematext :: http://sematext.com/ ::

Re: memory leak during undeploying

2011-03-02 Thread François Schiettecatte
Hi I get the same problem on tomcat with other applications, so this does not appear to be limited to SOLR. I got the error on tomcat 6 and 7. The only solution I found was to kill tomcat and start it again. François On Mar 2, 2011, at 2:28 PM, Search Learn wrote: Hello, We currently

Re: How to handle searches across traditional and simplifies Chinese?

2011-03-07 Thread François Schiettecatte
I did a little research into this for a client a while ago. The character mapping is not one to one which complicates things (TC and SC have evolved independently) and if you want to do a perfect job you will need a dictionary. However there are tables out there (I can dig one up for you) that

Re: How to handle searches across traditional and simplifies Chinese?

2011-03-07 Thread François Schiettecatte
help. --- On Mon, 3/7/11, François Schiettecatte fschietteca...@gmail.com wrote: From: François Schiettecatte fschietteca...@gmail.com Subject: Re: How to handle searches across traditional and simplifies Chinese? To: solr-user@lucene.apache.org Date: Monday, March 7, 2011, 5:24 PM I

Re: Multiple Japanese Alphabets in Solr

2011-03-11 Thread François Schiettecatte
Tomás That won't really work, transliteration to Romaji works for individual terms only so you would need to tokenize the Japanese prior to transliteration. I am not sure what tool you plan to use for transliteration, I have used ICU in the past and from what I can tell it does not

Re: Multiple Japanese Alphabets in Solr

2011-03-11 Thread François Schiettecatte
with Lucene and Solr. wunder On Mar 11, 2011, at 8:09 AM, François Schiettecatte wrote: Tomás That won't really work, transliteration to Romaji works for individual terms only so you would need to tokenize the Japanese prior to transliteration. I am not sure what tool you plan to use

Re: Multiple Japanese Alphabets in Solr

2011-03-11 Thread François Schiettecatte
François Schiettecatte fschietteca...@gmail.com Good question about transliteration, the issue has to do with recall, for example, I can write 'Toyota' as 'トヨタ' or 'とよた' (Katakana and Hiragana respectively), not doing the transliteration will miss results. You will find that the big search engines
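One narrow piece of this recall problem, matching the Hiragana and Katakana spellings of the same word, can be sketched in Python. This is only an illustration of folding one script onto the other by code point offset; it is not transliteration to Romaji, it does nothing for Kanji, and the function name is made up.

```python
def hiragana_to_katakana(text):
    """Fold Hiragana onto Katakana so both spellings index to the same term.

    Hiragana U+3041..U+3096 and Katakana U+30A1..U+30F6 are laid out in
    parallel in Unicode, 0x60 code points apart.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if 0x3041 <= code <= 0x3096:
            out.append(chr(code + 0x60))
        else:
            out.append(ch)  # leave Katakana, Kanji, Latin etc. untouched
    return "".join(out)
```

Applying the same folding at index time and at query time would let 'とよた' and 'トヨタ' match each other.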

Re: Multiple Japanese Alphabets in Solr

2011-03-11 Thread François Schiettecatte
it doesn't do Kanji) Tomás 2011/3/11 François Schiettecatte fschietteca...@gmail.com Good question about transliteration, the issue has to do with recall, for example, I can write 'Toyota' as 'トヨタ' or 'とよた' (Katakana and Hiragana respectively), not doing the transliteration will miss

Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-16 Thread François Schiettecatte
Lewis Quick response, I am currently using Tomcat 7.0.8 with solr (with no issues), I will upgrade to 7.0.11 tonight and see if I run into the same issues. Stay tuned as they say. Cheers François On Mar 16, 2011, at 2:38 PM, McGibbney, Lewis John wrote: Hello list, Is anyone running

Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread François Schiettecatte
Lewis My update from tomcat 7.0.8 to 7.0.11 went with no hitches, I checked my context file and it does not have the xml preamble yours has, specifically: '<?xml version="1.0" encoding="utf-8"?>', Here is my context file: Context

Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11

2011-03-17 Thread François Schiettecatte
at the start of your XML file or some non blank lines ? Pierre -Message d'origine- De : François Schiettecatte [mailto:fschietteca...@gmail.com] Envoyé : jeudi 17 mars 2011 14:48 À : solr-user@lucene.apache.org Objet : Re: Using Solr 1.4.1 on most recent Tomcat 7.0.11 Lewis My

Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-25 Thread François Schiettecatte
François I think there is a language identification tool in the Nutch code base, otherwise I have written one in Perl which could easily be translated to Java. I won't have access to it for 10 days (I am traveling), but I am happy to send you a link to it when I get back (and anyone else who

Re: Wanted: a directory of quick-and-(not too)dirty analyzers for multi-language RDF.

2011-03-25 Thread François Schiettecatte
I had meant to also include a link to a blog post of mine that lists some useful links: http://fschiettecatte.wordpress.com/2008/07/23/language-recognition/ François On Mar 25, 2011, at 11:59 AM, Grant Ingersoll wrote: You are looking for a language identification tool. You could

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-05 Thread François Schiettecatte
And if you have control over machine placement, split them across racks so that a power outage on one rack does not take out your search cluster. François On Apr 5, 2011, at 3:19 AM, Ephraim Ofir wrote: I'm not sure about the scale you're aiming for, but you probably want to do both sharding

Re: Very very large scale Solr Deployment = how to do (Expert Question)?

2011-04-08 Thread François Schiettecatte
You might also want to look at the heritrix crawler too: http://crawler.archive.org/ I have written three crawlers in the past, all for RSS feeds, it is not easy. Happy to provide tips and help if you want to go down that route. François On Apr 8, 2011, at 1:53 AM, Andrea Campi wrote:

Re: QUESTION: SOLR INDEX BIG FILE SIZES

2011-04-15 Thread François Schiettecatte
Specifically to the file size support, all the file systems on current releases of linux (and unixes too) support large files with 64 bit offsets, and I am pretty sure that java VM supports 64 bit offsets in files, so there is no 2GB file size limit anymore. François On Apr 15, 2011, at 4:31

Re: Solr indexing size for a particular document.

2011-04-19 Thread François Schiettecatte
I think you could approximate this with some empirical measurements, i.e. index 1,000 'typical' documents and see what the resulting index size is. Of course you may need to adjust this number upwards if there is a lot of variability in document size. When I built the search engine that ran
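The extrapolation described above is a one-line calculation. The sketch below makes it explicit; the function name and the `variability_factor` knob are assumptions added for illustration, the latter being a crude way to adjust the estimate upwards.

```python
def estimate_index_bytes(sample_index_bytes, sample_docs, total_docs,
                         variability_factor=1.0):
    """Extrapolate full index size from a measured sample.

    sample_index_bytes: on-disk index size after indexing sample_docs documents.
    variability_factor: use a value above 1.0 to pad for uneven document sizes.
    """
    bytes_per_doc = sample_index_bytes / sample_docs
    return bytes_per_doc * total_docs * variability_factor
```

For instance, if 1,000 sample documents produce a 5MB index, 2 million similar documents would be estimated at roughly 10GB.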

Re: testing of stemming

2011-04-19 Thread François Schiettecatte
I would start here: http://snowball.tartarus.org/ François On Apr 19, 2011, at 11:15 AM, bryan rasmussen wrote: Hi, I was wondering if I have a large number of queries I want to test stemming on if there is a free standing library I can just run it against without having to do

Re: Searching performance suffers tremendously during indexing

2011-05-01 Thread François Schiettecatte
If you are on linux, I would recommend two tools you can use to track what is going on on the machine, atop ( http://freshmeat.net/projects/atop/ ) and dstat ( http://freshmeat.net/projects/dstat/ ). atop in particular has been very useful to me in tracking down performance issues in real time

Re: Searching performance suffers tremendously during indexing

2011-05-01 Thread François Schiettecatte
Couple of things. One, you are not swapping, which is a good thing. Second (and I am not sure what delay you selected for dstat, I would assume the default of 1 second) there is some pretty heavy write activity like this: 26 1 71 2 0 0 |4096B 1424k| 0 0 | 719 415 | 197M 11G|1.00

Re: Does the Solr enable Lemmatization [not the Stemming]

2011-05-05 Thread François Schiettecatte
Rajani You might also want to look at Balie ( http://balie.sourceforge.net/ ), from the web site: Features: • language identification • tokenization • sentence boundary detection • named-entity recognition Can't vouch for it though. On May 5, 2011, at 4:58

Re: Thoughts on Search Analytics?

2011-05-05 Thread François Schiettecatte
When I ran the search engine at Feedster, I wrote a perl script that ran nightly and gave me: total number of searches total number of searches per hour N most frequent searches max time for a search min time for a search mean time for searches median time for searches N slowest searches
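A report along these lines could be sketched in Python as follows. This is not the original perl script: the input format (a list of hour, query, elapsed-seconds tuples) and all names are assumptions, and parsing of the actual log file is left out.

```python
from collections import Counter
from statistics import mean, median


def summarize(searches, n=3):
    """Summarize a list of (hour, query, seconds) tuples parsed from a search log."""
    times = [seconds for _, _, seconds in searches]
    return {
        "total": len(searches),
        "per_hour": dict(Counter(hour for hour, _, _ in searches)),
        "most_frequent": Counter(q for _, q, _ in searches).most_common(n),
        "max_time": max(times),
        "min_time": min(times),
        "mean_time": mean(times),
        "median_time": median(times),
        "slowest": sorted(searches, key=lambda s: s[2], reverse=True)[:n],
    }
```

Run nightly over the previous day's log, this gives the same kind of figures as the list above.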

Re: Want to Delete Existing Index create fresh index

2011-05-14 Thread François Schiettecatte
You can also shut down solr/lucene, do: rm -rf /YourIndexName/data/index and restart, the index directory will be automatically recreated. François On May 14, 2011, at 1:53 AM, Gabriele Kahlout wrote: curl --fail $solrIndex/update?commit=true -d '<delete><query>*:*</query></delete>' #empty

Re: Want to Delete Existing Index create fresh index

2011-05-14 Thread François Schiettecatte
. -- Regards, Dmitry Kan On Sat, May 14, 2011 at 4:01 PM, Pawan Darira pawan.dar...@gmail.com wrote: I did that. Index directory is created but not contents in that 2011/5/14 François Schiettecatte fschietteca...@gmail.com You can also shut down solr/lucene, do: rm -rf

Re: UniqueKey field in schema.xml

2011-05-26 Thread François Schiettecatte
You concatenate the two keys into a single string, with some sort of delimiter between the two keys. François On May 26, 2011, at 6:05 AM, Romi wrote: what do you mean by combine two fields customerID and ProductId. what i tried is 1. make both fields unique but it doesnot server my

Re: UniqueKey field in schema.xml

2011-05-26 Thread François Schiettecatte
Here is some code: -- final String key1 = "1"; final String key2 = "2"; final String masterKey = key1 + ":" + key2; -- You need to combine the keys *before* you send them to Solr. François On May 26, 2011, at 7:02 AM, Romi wrote: I am not getting how can i combine two keys in to a

Re: DIH: Exception with Too many connections

2011-05-31 Thread François Schiettecatte
Hi You might also check the 'max_user_connections' settings too if you have that set: # Maximum number of connections, and per user max_connections = 2048 max_user_connections = 2048 http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html Cheers

Re: synonyms problem

2011-06-02 Thread François Schiettecatte
Are you sure solr.StrField is the way to go with this? solr.StrField stores the entire text verbatim and I am pretty sure skips any analysis. Perhaps you should use solr.TextField instead. François On Jun 2, 2011, at 2:28 AM, deniz wrote: Hi all, here is a piece from my solfconfig:

Re: how to request for Json object

2011-06-02 Thread François Schiettecatte
This is not really an issue with SOLR per se, and I have run into this before, you will need to read up on 'Access-Control-Allow-Origin' which needs to be set in the http headers that your ajax pager is returning. Beware that not all browsers obey it and Olivier is right when he suggested

Re: Solr Field name restrictions

2011-06-04 Thread François Schiettecatte
Underscores and dashes are fine, but I would think that colons (:) are verboten. François On Jun 4, 2011, at 9:49 PM, Jamie Johnson wrote: Is there a list anywhere detailing field name restrictions. I imagine fields containing periods (.) are problematic if you try to use that field when

Re: Strange behavior

2011-06-14 Thread François Schiettecatte
I think you will need to provide more information than this, no-one on this list is omniscient AFAIK. François On Jun 14, 2011, at 10:44 AM, Denis Kuzmenok wrote: Hi. I've debugged search on test machine, after copying to production server the entire directory (entire solr directory),

Re: Performance loss - querying more than 64 cores (randomly)

2011-06-16 Thread François Schiettecatte
I am assuming that you are running on linux here, I have found atop to be very useful to see what is going on. http://freshmeat.net/projects/atop/ dstat is also very useful but needs a little more work to 'decode'. Obviously there is contention going on, you just need to figure out

Re: Why does paste get parsed into past?

2011-06-18 Thread François Schiettecatte
What do you have set up for stemming? François On Jun 18, 2011, at 8:00 AM, Gabriele Kahlout wrote: Hello, Debugging query results I find that: <str name="querystring">paste</str> <str name="parsedquery">content:past</str> Now paste and past are two different words. Why does Solr not consider

Re: Why does paste get parsed into past?

2011-06-18 Thread François Schiettecatte
. My real issue is why are not query keywords treated as a set?http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201106.mbox/%3CBANLkTikHunhyWc2WVTofRYU4ZW=c8oe...@mail.gmail.com%3E 2011/6/18 François Schiettecatte fschietteca...@gmail.com What do you have set up for stemming

Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
Sure. François On Jun 18, 2011, at 2:25 PM, shacky wrote: 2011/6/15 Edoardo Tosca e.to...@sourcesense.com: Try to use multiple cores: http://wiki.apache.org/solr/CoreAdmin Can I do concurrent searches on multiple cores?

Re: Multiple indexes

2011-06-18 Thread François Schiettecatte
+ normalized tables, but only 4 cores. Perhaps describing what you are trying to achieve would give us greater insight and thus be able to make more concrete recommendation? Cheers François On Jun 18, 2011, at 2:36 PM, shacky wrote: Il 18 giugno 2011 20:27, François Schiettecatte

Re: Is it true that I cannot delete stored content from the index?

2011-06-19 Thread François Schiettecatte
That is correct, but you only need to commit, optimize is not a requirement here. François On Jun 18, 2011, at 11:54 PM, Mohammad Shariq wrote: I have define uniqueKey in my solr and Deleting the docs from solr using this uniqueKey. and then doing optimization once in a day. is this right

Re: Extending Solr Highlighter to pull information from external source

2011-06-20 Thread François Schiettecatte
Mike I would be very interested in the answer to that question too. My hunch is that the answer is no too. I have a few text databases that range from 200MB to about 60GB with which I could run some tests. I will have some downtime in early July and will post results. From what I can tell the

Re: Searching in Traditional / Simplified Chinese Record

2011-06-20 Thread François Schiettecatte
Wayne I am not sure what you mean by 'changing the record'. One option would be to implement something like the synonyms filter to generate the TC for SC when you index the document, which would index both the TC and the SC in the same location. That way your users would be able to search with

Re: Include synonys in solr

2011-06-28 Thread François Schiettecatte
Well you need to find word lists and/or a thesaurus. This is one place to start: http://wordlist.sourceforge.net/ I used the US/UK english word list for my synonyms for an index I have because it contains both US and UK english terms, the list lacks some medical terms though so we

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
Create a hash from the url and use that as the unique key, md5 or sha1 would probably be good enough. Cheers François On Jun 28, 2011, at 7:29 AM, Mohammad Shariq wrote: I also have the problem of duplicate docs. I am indexing news articles, Every news article will have the source URL, If
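Deriving the unique key from the URL could look roughly like this in Python. The function name is hypothetical and MD5 is used only because the message mentions it; any stable hash would serve the same purpose.

```python
import hashlib


def url_key(url):
    """Derive a fixed-length unique key from a source URL."""
    return hashlib.md5(url.encode("utf-8")).hexdigest()
```

Two articles with the same source URL then get the same key, so the second one overwrites the first instead of creating a duplicate. Note that URLs differing only in case, trailing slash or query parameters hash differently, so some normalization before hashing may be needed.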

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
using SOLR as index engine Only and using Riak(key-value storage) as storage engine, I dont want to do the overwrite on duplicate. I just need to discard the duplicates. 2011/6/28 François Schiettecatte fschietteca...@gmail.com Create a hash from the url and use that as the unique key

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
to work on it, as there are some other low hanging fruits I've to capture. Will share my thoughts soon. *Pranav Prakash* temet nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny 2011/6/28 François Schiettecatte

Re: Removing duplicate documents from search results

2011-06-28 Thread François Schiettecatte
nosce Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny 2011/6/28 François Schiettecatte fschietteca...@gmail.com Maybe there is a way to get Solr to reject documents that already exist in the index but I doubt

Re: Include synonys in solr

2011-06-28 Thread François Schiettecatte
wrote: Thanks François Schiettecatte, information you provided is very helpful. i need to know one more thing, i downloaded one of the given dictionary but it contains many files, do i need to add all this files data in to synonyms.text ?? - Thanks Regards Romi -- View this message

Re: filters effect on search results

2011-06-29 Thread François Schiettecatte
Indeed, I find the Porter stemmer to be too 'aggressive' for my taste, I prefer the EnglishMinimalStemFilterFactory, with the caveat that it depends on your data set. Cheers François On Jun 29, 2011, at 6:21 AM, Ahmet Arslan wrote: Hi, when i query for elegant in solr i get results for

Re: Wildcard search not working if full word is queried

2011-06-30 Thread François Schiettecatte
I would run that word through the analyzer, I suspect that the word 'teste' is being stemmed to 'test' in the index, at least that is the first place I would check. François On Jun 30, 2011, at 2:21 PM, Celso Pinto wrote: Hi everyone, I'm having some trouble figuring out why a query with

Re: Wildcard search not working if full word is queried

2011-07-01 Thread François Schiettecatte
, it is indeed being stemmed, thanks a lot for the heads up. It appears that stemming is also configured for the query so it should work just the same, no? Thanks again. Regards, Celso 2011/6/30 François Schiettecatte fschietteca...@gmail.com: I would run that word through the analyzer, I

Re: performance variation with respect to the index size

2011-07-08 Thread François Schiettecatte
Hi, I don't think that anyone has run such benchmarks. In fact this topic came up two weeks ago and I volunteered to do that because I have some spare time this week, so I am going to run some benchmarks this weekend and report back. The machine I have to do this is a Core i7 960, 24GB,

Re: Result list order in case of ties

2011-07-12 Thread François Schiettecatte
You just need to provide a second sort field along the lines of: sort=score desc, author desc François On Jul 12, 2011, at 6:13 AM, Lox wrote: Hi, In the case where two or more documents are returned with the same score, is there a way to tell Solr to sort them alphabetically?
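As a rough illustration of the tie-breaking sort above, the query string can be assembled like this. The helper name `build_select_query` and the example query `q=solr` are made up for the sketch; only the `sort=score desc, author desc` value comes from the message.

```python
from urllib.parse import parse_qs, urlencode


def build_select_query(q, sort_fields):
    """Build a URL-encoded query string with a multi-field sort.

    sort_fields is a list of (field, direction) pairs; later pairs only
    matter when the earlier ones tie.
    """
    sort = ", ".join(f"{field} {direction}" for field, direction in sort_fields)
    return urlencode({"q": q, "sort": sort})


# Sort by score first, then alphabetically by author (descending) on ties.
query_string = build_select_query("solr", [("score", "desc"), ("author", "desc")])
print(query_string)
# Decoding the string again shows the sort value the server would receive.
print(parse_qs(query_string)["sort"][0])
```

The resulting string would be appended to the select URL of your own Solr instance.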

Re: Wildcard

2011-07-13 Thread François Schiettecatte
http://lucene.apache.org/java/2_9_1/queryparsersyntax.html http://wiki.apache.org/solr/SolrQuerySyntax François On Jul 13, 2011, at 1:29 PM, GAURAV PAREEK wrote: Hello, What are wildcards we can use with the SOLR ? Regards, Gaurav

Re: - character in search query

2011-07-14 Thread François Schiettecatte
Easy, the hyphen is out on its own (with spaces on either side) and is probably getting removed from the search by the tokenizer. Check your analysis. François On Jul 14, 2011, at 6:05 AM, roySolr wrote: It looks like it's still not working. I send this to SOLR: q=arsenal \- london I

Re: How to find whether solr server is running or not

2011-07-19 Thread François Schiettecatte
I think anything but a 200 OK means it is dead like the proverbial parrot :) François On Jul 19, 2011, at 7:42 AM, Romi wrote: But the problem is when solr server is not running *http://host:port/solr/admin/ping* will not give me any json response then how will i get the status :( when
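The "anything but a 200 OK" rule could be checked along these lines. This is a minimal sketch: the function name `is_solr_alive`, the timeout value, and the injectable `opener` argument are assumptions added for illustration, and the ping URL is the placeholder from the message.

```python
import urllib.error
import urllib.request


def is_solr_alive(ping_url, timeout=2.0, opener=urllib.request.urlopen):
    """Return True only if the ping URL answers with HTTP 200.

    A refused connection, a timeout, or an HTTP error status all count as
    "not running", so no JSON body is needed to decide.
    """
    try:
        with opener(ping_url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # urlopen raises HTTPError (a URLError) for error statuses and
        # URLError/OSError when the server cannot be reached at all.
        return False


# Example call, with the host and port replaced by your own:
# is_solr_alive("http://host:port/solr/admin/ping")
```

The `opener` parameter exists only so the check can be exercised without a live server; in normal use it can be left at its default.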

Re: problem searching on non standard characters

2011-07-22 Thread François Schiettecatte
Check your analyzers to make sure that these characters are not getting stripped out in the tokenization process, the url for 3.3 is somewhere along the lines of: http://localhost/solr/admin/analysis.jsp?highlight=on And you should indeed be searching on \#test. François On Jul

Re: problem searching on non standard characters

2011-07-22 Thread François Schiettecatte
Adding to my previous reply, I just did a quick check on the 'text_en' and 'text_en_splitting' field types and they both strip leading '#'. Cheers François On Jul 22, 2011, at 10:49 AM, Shawn Heisey wrote: On 7/22/2011 8:34 AM, Jason Toy wrote: How does one search for words with characters

Re: Indexation Speed?

2012-06-19 Thread François Schiettecatte
Just a suggestion, you might want to monitor CPU usage and disk I/O, there might be a bottleneck. Cheers François On Jun 19, 2012, at 7:07 AM, Bruno Mannina wrote: Actually -Xmx512m and no effect Concerning maxFieldLength, no problem it's commented Le 19/06/2012 13:02, Erick Erickson

Re: Indexation Speed?

2012-06-19 Thread François Schiettecatte
during the process but How can I check IO HDD ? Le 19/06/2012 14:13, François Schiettecatte a écrit : Just a suggestion, you might want to monitor CPU usage and disk I/O, there might be a bottleneck. Cheers François On Jun 19, 2012, at 7:07 AM, Bruno Mannina wrote: Actually

Re: Indexation Speed?

2012-06-19 Thread François Schiettecatte
, at 9:03 AM, Bruno Mannina wrote: Linux Ubuntu :) since 2 months ! so I'm a new in this world :) Le 19/06/2012 15:01, François Schiettecatte a écrit : Well that depends on the platform you are on, you did not mention that. If you are using linux, you could use atop ( http://www.atoptool.nl

Re: difference between stored=false and stored=true ?

2012-06-30 Thread François Schiettecatte
Giovanni stored=true means the data is stored in the index and can be returned with the search results (see the 'fl' parameter). This is independent of indexed=.. Which means that you can store but not index a field: indexed=false stored=true Best regards François On Jun 30,

Re: Can't find solr.xml

2012-07-11 Thread François Schiettecatte
On Jul 11, 2012, at 2:52 PM, Shawn Heisey wrote: On 7/2/2012 2:33 AM, Nabeel Sulieman wrote: Argh! (and hooray!) I started from scratch again, following the wiki instructions. I did only one thing differently; put my data directory in /opt instead of /home/dev. And now it works! I'm
