Apache Solr for multiple searches
Hi,
I have been using Apache Solr for my job portal, and it has worked well for searching resumes based on keywords. Now I need to use the same for job search. Can we have one single instance of Apache Solr running for both kinds of search, i.e. job search and resume search?
Regards,
Bhuvan
Multi index
Hello users,
At the moment I am testing multicore Solr, but I can't search in more than one core directly. Is there a way to use multiple indexes, say 3-5 indexes in one core, and search directly across all of them, or only in one at a time? It is really important for my project.
Thanks,
King
Re: restore space between words by spell checker
Otis Gospodnetic wrote:
I'm not sure if that can be easily done (other than going char by char and testing), because nothing indicates where the space might be, not even an upper case there. I'd be curious to know if you find a better solution.
Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

----- Original Message ----
From: Andrey Klochkov akloch...@griddynamics.com
To: solr-user solr-user@lucene.apache.org
Sent: Fri, November 27, 2009 6:09:08 AM
Subject: restore space between words by spell checker

Hi,
If a user issued a misspelled query, forgetting to place a space between words, is it possible to fix it with a spell checker or by some other mechanism? For example, if we get the query "tommyhitfiger" and have the terms "tommy" and "hitfiger" in the index, how can we fix the query?

The usual approach to solving this is to index compound words, i.e. when producing the spellchecker dictionary, add a record "tommyhitfiger" with a field that points to "tommy hitfiger". Details vary depending on which spellchecking implementation you use.

--
Best regards,
Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com
Contact: info at sigram dot com
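To make the compound-word dictionary idea concrete, here is a minimal sketch in plain Java (this is not a Solr or Lucene API; the class and method names are made up for illustration) that maps the concatenation of each adjacent pair of indexed terms back to the spaced form:

import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CompoundDictionarySketch {

    // For each adjacent pair of tokens in a field, record an entry mapping the
    // run-together form ("tommyhitfiger") to the spaced form ("tommy hitfiger").
    // A spellchecker dictionary built from these entries can then point a
    // space-less query back at the two-word correction.
    public static Map<String, String> buildCompoundEntries(List<List<String>> tokenizedFields) {
        Map<String, String> compounds = new HashMap<String, String>();
        for (List<String> tokens : tokenizedFields) {
            for (int i = 0; i + 1 < tokens.size(); i++) {
                String spaced = tokens.get(i) + " " + tokens.get(i + 1);
                compounds.put(tokens.get(i) + tokens.get(i + 1), spaced);
            }
        }
        return compounds;
    }
}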
Re: Retrieving large num of docs
Hi Andrew,
I applied the patch you suggested. I am not finding any significant changes in the response times. I am wondering if I forgot some important configuration setting, etc. Here is what I did:
1. Wrote a small program using SolrJ to use EmbeddedSolrServer (most of the code is from the Solr wiki), ran the server on an index of ~700k docs, and noted down the average response time.
2. Applied the SOLR-797.patch to the source code of Solr 1.4.
3. Compiled the source code and rebuilt the jar files.
4. Reran step 1 using the new jar files.
Am I supposed to make any other config changes in order to see the performance jump that you are able to achieve?
Thanks a lot,
Raghu

On Fri, Nov 27, 2009 at 3:16 PM, AHMET ARSLAN iori...@yahoo.com wrote:

Hi Andrew,
We are running Solr using its HTTP interface from Python. From the resources I could find, EmbeddedSolrServer is possible only if I am using Solr from a Java program. It will be useful to understand if a significant part of the performance increase is due to bypassing HTTP before going down this path. In the meantime I am trying my luck with the other suggestions. Can you share the patch that helps cache Solr documents instead of Lucene documents?

Maybe these links can help:
http://wiki.apache.org/lucene-java/ImproveSearchingSpeed
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
http://www.lucidimagination.com/Downloads/LucidGaze-for-Solr
How often do you update your index? Is your index optimized? Configuring caching can also help:
http://wiki.apache.org/solr/SolrCaching
http://wiki.apache.org/solr/SolrPerformanceFactors
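For reference, a minimal SolrJ timing harness along the lines of the Solr wiki's EmbeddedSolrServer example might look roughly like the sketch below. The Solr home path, query, and row count are placeholders, and the CoreContainer.Initializer bootstrap is the Solr 1.4-era pattern; this is not the poster's actual code.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class EmbeddedTimingSketch {
    public static void main(String[] args) throws Exception {
        // Point Solr at the home directory containing the core to benchmark (placeholder path).
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer container = new CoreContainer.Initializer().initialize();
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "");

        SolrQuery query = new SolrQuery("*:*");
        query.setRows(100);

        long start = System.currentTimeMillis();
        QueryResponse response = server.query(query);
        long elapsed = System.currentTimeMillis() - start;

        System.out.println("found " + response.getResults().getNumFound()
                + " docs in " + elapsed + " ms");
        container.shutdown();
    }
}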
Re: Retrieving large num of docs
Hi Raghu,
Let me describe our use case in more detail. Probably that will clarify things. The usual use case for Lucene/Solr is retrieving a small portion of the result set (10-20 documents). In our case we need to read the whole result set, and this creates huge load on the Lucene index, meaning a lot of IO. Keep in mind that we have a large number of stored fields in the index.

In our case there's one thing that makes things simpler: our index is so small that we can get every document into cache. This means that even if we retrieve all documents for every result set, we don't retrieve them from the Lucene index, and then the performance should be OK. But here we've got 2 problems:

1. Solr caches Lucene's Document instances. In the case of retrieving the whole result set, it recreates SolrDocument instances every time. This creates load on the CPU and in particular on the Java GC.
2. EmbeddedSolrServer converts the whole response into a byte array and then restores it back, converting Lucene's Documents and DocLists to Solr's SolrDocument and SolrDocumentList instances. This creates additional load on the CPU and GC.

We patched Solr to eliminate those things, and that fixed our performance problems. I think that if you don't fit all your documents in caches, and/or you don't use stored fields (retrieving the ID field only), then those improvements probably won't help you. I suggest you first find your bottlenecks. Look at IO, memory usage, etc. Using a profiler is the best thing too. Probably you can use some tools from Lucid Imagination for profiling.

--
Andrew Klochkov
Senior Software Engineer, Grid Dynamics
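As a point of reference, the "read the whole result set" access pattern described above looks roughly like the following with stock SolrJ paging. This is only an illustration of the access pattern being optimized (class and method names are made up); it is not the patch discussed in this thread.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class FullResultSetReader {

    // Page through every matching document instead of just the top 10-20.
    public static long readAll(SolrServer server, String queryString) throws Exception {
        int rows = 500;               // page size (arbitrary)
        long seen = 0;
        long total = Long.MAX_VALUE;  // updated after the first response
        for (int start = 0; start < total; start += rows) {
            SolrQuery q = new SolrQuery(queryString);
            q.setStart(start);
            q.setRows(rows);
            QueryResponse rsp = server.query(q);
            total = rsp.getResults().getNumFound();
            for (SolrDocument doc : rsp.getResults()) {
                seen++;  // every stored field of every document is materialized here
            }
        }
        return seen;
    }
}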
Re: restore space between words by spell checker
For example, if we get the query "tommyhitfiger" and have the terms "tommy" and "hitfiger" in the index, how can we fix the query?

The usual approach to solving this is to index compound words, i.e. when producing the spellchecker dictionary, add a record "tommyhitfiger" with a field that points to "tommy hitfiger". Details vary depending on which spellchecking implementation you use.

I'm using Solr's default spell checker, which uses an n-gram index and Levenshtein distance. Can it be customized to include compound words? What alternative spell checkers exist for Lucene/Solr?

I tried to experiment with the Lucene spell checker and noticed that, if configured with a low accuracy, it can find the words "tommy" and "hilfiger" that form the whole word. So I was able to create some logic which post-processes the spell checker results and finds the correct query "tommy hilfiger". It just iterates over all possible combinations of terms suggested by the spell checker and compares the resulting query to the original using DoubleMetaphone. I'm not sure that this is the best solution though; probably it's just not fast enough.

--
Andrew Klochkov
Senior Software Engineer, Grid Dynamics
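A sketch of that post-processing idea using Commons Codec's DoubleMetaphone is below. The class and method names are illustrative of the approach described, not the poster's actual code, and a real version would also need to handle splits into more than two words and rank competing candidates.

import java.util.List;
import org.apache.commons.codec.language.DoubleMetaphone;

public class SuggestionCombiner {
    private final DoubleMetaphone metaphone = new DoubleMetaphone();

    public SuggestionCombiner() {
        // Longer phonetic codes make the comparison less coarse than the default length of 4.
        metaphone.setMaxCodeLen(8);
    }

    // Try pairs of suggested terms and keep the pair whose concatenation
    // sounds like the original space-less query, e.g. "tommyhitfiger" -> "tommy hilfiger".
    public String restoreSpaces(String original, List<String> suggestions) {
        for (String first : suggestions) {
            for (String second : suggestions) {
                if (first.equals(second)) {
                    continue;
                }
                if (metaphone.isDoubleMetaphoneEqual(first + second, original)) {
                    return first + " " + second;
                }
            }
        }
        return original; // no convincing split found
    }
}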
is it possible to use Xinclude in schema.xml?
I'm trying to determine if it's possible to use Xinclude to (for example) have a base schema file and then substitute various pieces. It seems that the schema fieldTypes throw exceptions if there is an unexpected attribute?

SEVERE: java.lang.RuntimeException: schema fieldtype text(org.apache.solr.schema.TextField) invalid arguments:{xml:base=solr/core2/conf/text-analyzer.xml}

This is what I'm trying to do (details of the analyzer chain omitted - nothing unusual) - so the error occurs when the external xml file is actually included:

<xi:include href="solr/core2/conf/text-analyzer.xml" xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:fallback>
    <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index"> ... </analyzer>
      <analyzer type="query"> ... </analyzer>
    </fieldType>
  </xi:fallback>
</xi:include>

Where (for testing) the text-analyzer.xml file just looks like the fallback:

<?xml version="1.0" encoding="UTF-8" ?>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index"> ... </analyzer>
  <analyzer type="query"> ... </analyzer>
</fieldType>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: is it possible to use Xinclude in schema.xml?
Follow-up: it seems the schema parser doesn't barf if you use xinclude with a single analyzer element, but so far it seems impossible for a whole field type. So this seems to work:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <xi:include href="solr/core2/conf/text-analyzer.xml">
    <xi:fallback>
      <analyzer type="index"> ... </analyzer>
    </xi:fallback>
  </xi:include>
  <analyzer type="query"> ... </analyzer>
</fieldType>

--
Peter M. Wolanin, Ph.D.
Momentum Specialist, Acquia. Inc.
peter.wola...@acquia.com
Re: ExternalFileField is broken in Solr 1.4?
Are you sure? TestFunctionQuery.testExternalField() has a test for reloading on a commit. Are you putting the file in the data directory?

-Yonik
http://www.lucidimagination.com

2009/11/28 Koji Sekiguchi k...@r.email.ne.jp:
It seems that ExternalFileField doesn't work in 1.4. In 1.4, I need to restart Solr for the external_[fieldname] file to take effect. Only a <commit/> was needed in 1.3...
Koji
Re: is it possible to use Xinclude in schema.xml?
Yeah, I tried it as well; it doesn't seem to implement XPointer properly, so you can't add multiple fields or field types.

David
RE: Trouble Configuring WordDelimiterFilterFactory
Hi Rahul,

On 11/26/2009 at 12:53 AM, Rahul R wrote:
Is there a way by which I can prevent the WordDelimiterFilterFactory from totally acting on numerical data?

"Prevent ... from totally acting on" is pretty vague, and nowhere, AFAICT, do you say precisely what it is you want. It would help if you could give example text and the terms you think should result from analyzing that text. If you want different index-time and query-time behavior, please provide this info for both.

Steve
Re: ExternalFileField is broken in Solr 1.4?
Hmm, if I set reopenReaders to false, the SolrIndexReader objects before and after a commit are different, and ExternalFileField works as expected. If I set reopenReaders to true, it doesn't work. ... I don't like this dependency. What I don't understand is that if I set reopenReaders to true in solrconfig-functionquery.xml, TestFunctionQuery.testExternalField() still passes.

Koji
--
http://www.rondhuit.com/en/
Re: ExternalFileField is broken in Solr 1.4?
Yonik Seeley wrote:
Go ahead and open a bug. One idea is to use a different key for the weak map (something that changes every commit).
-Yonik
http://www.lucidimagination.com

Yonik,
Thank you. I opened SOLR-1607. Do you have any ideas for a candidate key?

Koji
--
http://www.rondhuit.com/en/
Multi Index
Hi all,
I need to use a single Solr instance for multiple indexes. Please let me know if this is possible to do.
Regards,
Bhuvi