Challenge: Is dynamic data source possible for DataImportHandler JdbcDataSource?
Hi,

The challenge I'm facing is some sort of dynamic data source. Your valuable input is highly appreciated.

Below is my data-config.xml. I have one user database and two company databases. The user table in the user database has four columns: id, name, company_dbname and company_id. Depending on the company_dbname, I need to look up either companydb0 or companydb1 to get the company name by the company_id.

    <dataConfig>
      <dataSource type="JdbcDataSource" name="userdb" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://db0.com:3306/user" user="xxx" password="calltextual" batchSize="-1"/>
      <dataSource type="JdbcDataSource" name="companydb0" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://companydb0.com:3306/company" user="xxx" password="calltextual" batchSize="-1"/>
      <dataSource type="JdbcDataSource" name="companydb1" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://companydb1.com:3306/company" user="xxx" password="calltextual" batchSize="-1"/>
      <document name="USERS">
        <entity name="USER" dataSource="userdb" query="SELECT id, name, company_dbname, company_id from user">
          <field column="id" name="id"/>
          <field column="name" name="name"/>
          <entity name="company" dataSource="${USER.company_dbname}" query="SELECT name from company WHERE id = '${PG0.company_id}'">
            <field column="name" name="company_name"/>
          </entity>
        </entity>
      </document>
    </dataConfig>

Is it doable to set the data source dynamically for the child entity? In my case, I would like to set the company entity's dataSource to ${USER.company_dbname}, which is returned by the USER entity query.

If it's not doable with the current implementation, I would like to download the source code and customize it for my needs. Which Java source file should I start with?

Many many thanks,
Kevin
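To make the requested behaviour concrete, here is a minimal sketch of the per-row lookup done outside of DataImportHandler. It says nothing about whether DIH itself can resolve dataSource="${USER.company_dbname}". The in-memory SQLite databases stand in for the three MySQL servers, and the sample rows and the function name build_documents are made up for illustration.

```python
import sqlite3


def make_db(schema_sql, insert_sql, rows):
    """Create an in-memory SQLite database standing in for one MySQL server."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany(insert_sql, rows)
    return conn


# Hypothetical sample data; the real tables live on the three MySQL hosts.
userdb = make_db(
    "CREATE TABLE user (id INTEGER, name TEXT, company_dbname TEXT, company_id INTEGER)",
    "INSERT INTO user VALUES (?, ?, ?, ?)",
    [(1, "alice", "companydb0", 10), (2, "bob", "companydb1", 10)],
)
company_dbs = {
    "companydb0": make_db(
        "CREATE TABLE company (id INTEGER, name TEXT)",
        "INSERT INTO company VALUES (?, ?)",
        [(10, "Acme")],
    ),
    "companydb1": make_db(
        "CREATE TABLE company (id INTEGER, name TEXT)",
        "INSERT INTO company VALUES (?, ?)",
        [(10, "Globex")],
    ),
}


def build_documents(userdb, company_dbs):
    """Build one document per user, choosing the company database per row."""
    docs = []
    users = userdb.execute(
        "SELECT id, name, company_dbname, company_id FROM user ORDER BY id"
    )
    for user_id, name, dbname, company_id in users:
        # The data source is selected from the row's company_dbname value.
        row = company_dbs[dbname].execute(
            "SELECT name FROM company WHERE id = ?", (company_id,)
        ).fetchone()
        docs.append(
            {"id": user_id, "name": name, "company_name": row[0] if row else None}
        )
    return docs


documents = build_documents(userdb, company_dbs)
```

The same company_id resolves to a different company name depending on which database the user row points at, which is the routing the child entity is meant to perform.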
Re: muticore setup with tomcat
Below is my setup:

    <Context docBase="/home/zhangyongjiang/applications/solr/solr.war" debug="0" crossContext="true">
      <Environment name="solr/home" type="java.lang.String" value="/home/zhangyongjiang/applications/solr" override="false"/>
    </Context>

Then, under /home/zhangyongjiang/applications/solr, I have solr.xml as below:

    <solr persistent="true" sharedLib="lib">
      <cores adminPath="/admin/cores">
        <core name="core1" instanceDir="core1"/>
        <core name="core2" instanceDir="core2"/>
        <core name="core3" instanceDir="core3"/>
        <core name="core4" instanceDir="core4"/>
      </cores>
    </solr>

Under /home/zhangyongjiang/applications/solr, I created the core1/, core2/, core3/ and core4/ subdirectories.

Hope it helps.

----- Original Message -----
From: Chris Hostetter <hossman_luc...@fucit.org>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 17, 2009 3:46:11 PM
Subject: Re: muticore setup with tomcat

You haven't really given us a lot of information to work with...

what shows up in your logs?
what did you name the context fragment file?
where did you put the context fragment file?
where did you put the multicore directory?

Sharing *exact* directory listings and the *exact* commands you've executed is much more likely to help people understand what you're seeing. For example: the SolrTomcat wiki page shows an exact set of shell commands to install solr and tomcat on linux or cygwin and get it running against a simple example ... if you can provide a similar set of commands showing *exactly* what you've done, people might be able to spot the problem (or try the steps themselves and reproduce the problem)

http://wiki.apache.org/solr/SolrTomcat

: Date: Mon, 9 Mar 2009 14:55:47 +0530
:
: Hi,
:
: I am trying to do a multicore set up..
:
: I added the following from the 1.3 solr download to new dir called multicore
:
: core0, core1, solr.xml and solr.war
:
: in the tomcat context fragment i have defined as
:
: <Context docBase="c:/multicore/solr.war" debug="0" crossContext="true">
:   <Environment name="solr/home" type="java.lang.String" value="C:\multicore"
:     override="true"/>
: </Context>
:
: http://localhost:8080/multicore/admin
: http://localhost:8080/multicore/admin/core0
:
: The above 2 URLs give me resource not found error
:
: the solr.xml is the default one from the download.
:
: Please tell me as to what needs to be changed to make this work in tomcat
:
: Regards
: Sujatha

-Hoss
Re: How can I configure different types in Solr?
One Solr instance (a single core) has only one schema, that is, one document type. So if you have many types, the first option is to run multiple Solr server instances. The second option is to use multiple cores: you have one Solr server instance, but inside it you have more than one core. If you don't want to use multiple server instances or multiple cores, the third option is to use dynamic fields.

Here is my approach. In the schema, I define all the dynamic fields I need, which covers all of my cases. The name format is *_DATATYPE_INDEXED_STORED_MULTIPLEVALUE.

DATATYPE: Integer | Float | Double | String | Text | DaTe | Long. The short versions are i, f, d, s, t, dt, l.
INDEXED: i -- yes, it's indexed; ni -- no, it's not indexed
STORED: s -- yes, it's stored; ns -- no, it's not stored
MULTIPLEVALUE: m -- yes, this field has multiple values; nm -- no, this field has a single value

My list of dynamic fields:

    <dynamicField name="*_i_i_s_m" type="integer" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_i_i_s_nm" type="integer" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="*_i_i_ns_m" type="integer" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_i_i_ns_nm" type="integer" indexed="true" stored="false" multiValued="false"/>
    <dynamicField name="*_i_ni_s_m" type="integer" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*_i_ni_s_nm" type="integer" indexed="false" stored="true" multiValued="false"/>
    <dynamicField name="*_i_ni_ns_m" type="integer" indexed="false" stored="false" multiValued="true"/>
    <dynamicField name="*_i_ni_ns_nm" type="integer" indexed="false" stored="false" multiValued="false"/>

    <dynamicField name="*_l_i_s_m" type="long" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_l_i_s_nm" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="*_l_i_ns_m" type="long" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_l_i_ns_nm" type="long" indexed="true" stored="false" multiValued="false"/>
    <dynamicField name="*_l_ni_s_m" type="long" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*_l_ni_s_nm" type="long" indexed="false" stored="true" multiValued="false"/>
    <dynamicField name="*_l_ni_ns_m" type="long" indexed="false" stored="false" multiValued="true"/>
    <dynamicField name="*_l_ni_ns_nm" type="long" indexed="false" stored="false" multiValued="false"/>

    <dynamicField name="*_f_i_s_m" type="float" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_f_i_s_nm" type="float" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="*_f_i_ns_m" type="float" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_f_i_ns_nm" type="float" indexed="true" stored="false" multiValued="false"/>
    <dynamicField name="*_f_ni_s_m" type="float" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*_f_ni_s_nm" type="float" indexed="false" stored="true" multiValued="false"/>
    <dynamicField name="*_f_ni_ns_m" type="float" indexed="false" stored="false" multiValued="true"/>
    <dynamicField name="*_f_ni_ns_nm" type="float" indexed="false" stored="false" multiValued="false"/>

    <dynamicField name="*_d_i_s_m" type="double" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_d_i_s_nm" type="double" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="*_d_i_ns_m" type="double" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_d_i_ns_nm" type="double" indexed="true" stored="false" multiValued="false"/>
    <dynamicField name="*_d_ni_s_m" type="double" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*_d_ni_s_nm" type="double" indexed="false" stored="true" multiValued="false"/>
    <dynamicField name="*_d_ni_ns_m" type="double" indexed="false" stored="false" multiValued="true"/>
    <dynamicField name="*_d_ni_ns_nm" type="double" indexed="false" stored="false" multiValued="false"/>

    <dynamicField name="*_si_i_s_m" type="sint" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_si_i_s_nm" type="sint" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="*_si_i_ns_m" type="sint" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_si_i_ns_nm" type="sint" indexed="true" stored="false" multiValued="false"/>
    <dynamicField name="*_si_ni_s_m" type="sint" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*_si_ni_s_nm" type="sint" indexed="false" stored="true" multiValued="false"/>
    <dynamicField name="*_si_ni_ns_m" type="sint" indexed="false" stored="false" multiValued="true"/>
    <dynamicField name="*_si_ni_ns_nm" type="sint" indexed="false" stored="false" multiValued="false"/>

    <dynamicField name="*_sl_i_s_m" type="slong" indexed="true" stored="true" multiValued="true"/>
    <dynamicField name="*_sl_i_s_nm" type="slong" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="*_sl_i_ns_m" type="slong" indexed="true" stored="false" multiValued="true"/>
    <dynamicField name="*_sl_i_ns_nm" type="slong" indexed="true" stored="false" multiValued="false"/>
    <dynamicField name="*_sl_ni_s_m" type="slong" indexed="false" stored="true"
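The naming convention described above is regular enough to generate rather than type by hand. The following is a rough sketch, not part of the original message: the helper names (suffix, field_name, dynamic_field_xml) are hypothetical, and TYPES only covers the six type codes whose Solr type names appear in the listing.

```python
# Type codes whose Solr type names are visible in the listing above.
TYPES = {
    "i": "integer",
    "l": "long",
    "f": "float",
    "d": "double",
    "si": "sint",
    "sl": "slong",
}


def suffix(type_code, indexed, stored, multi_valued):
    """Build the DATATYPE_INDEXED_STORED_MULTIPLEVALUE suffix."""
    return "_".join(
        [
            type_code,
            "i" if indexed else "ni",
            "s" if stored else "ns",
            "m" if multi_valued else "nm",
        ]
    )


def field_name(base, type_code, indexed, stored, multi_valued):
    """Name a concrete field so that it matches one of the dynamic fields."""
    return f"{base}_{suffix(type_code, indexed, stored, multi_valued)}"


def dynamic_field_xml(type_code, indexed, stored, multi_valued):
    """Render one dynamicField declaration for a known type code."""
    solr_type = TYPES[type_code]  # KeyError for codes not listed in TYPES
    return (
        f'<dynamicField name="*_{suffix(type_code, indexed, stored, multi_valued)}" '
        f'type="{solr_type}" indexed="{str(indexed).lower()}" '
        f'stored="{str(stored).lower()}" multiValued="{str(multi_valued).lower()}"/>'
    )


example_name = field_name("productGroup", "t", True, True, False)
example_xml = dynamic_field_xml("i", True, True, True)
```

Generating the names in the indexing client keeps the field names and the schema declarations from drifting apart.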
use () in the query string
Hello,

In my case, the query id_s_i_s_nm:(om_B00114162K*) returned nothing, but the query id_s_i_s_nm:om_B00114162K* returned the right result. What's the difference between using () and not using them?

Thanks a lot,
Kevin
Re: unique result
It's exactly what I'm looking for. Thank you, Grant.

----- Original Message -----
From: Grant Ingersoll <gsing...@apache.org>
To: solr-user@lucene.apache.org
Sent: Thursday, February 26, 2009 6:56:22 AM
Subject: Re: unique result

I presume these all have different unique ids?

If you can address it at indexing time, then have a look at https://issues.apache.org/jira/browse/SOLR-799

Otherwise, you might look at https://issues.apache.org/jira/browse/SOLR-236

On Feb 25, 2009, at 6:54 PM, Cheng Zhang wrote:

> Is it possible to have Solr to remove duplicated query results?
>
> For example, instead of return
>
> <result name="response" numFound="572" start="0">
>   <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>   <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>   <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>   <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
>   <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
> </result>
>
> return:
>
> <result name="response" numFound="572" start="0">
>   <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
>   <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
> </result>
>
> Thanks a lot,
> Kevin

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
unique result
Is it possible to have Solr remove duplicated query results?

For example, instead of returning

    <result name="response" numFound="572" start="0">
      <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
      <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
      <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
      <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
      <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
    </result>

return:

    <result name="response" numFound="572" start="0">
      <doc> <str name="productGroup_t_i_s_nm">Wireless</str> </doc>
      <doc> <str name="productGroup_t_i_s_nm">Video Games</str> </doc>
    </result>

Thanks a lot,
Kevin
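For comparison, the collapsing asked for here can be approximated on the client after the results come back. This is only a sketch under stated assumptions: each document is represented as a plain dict, and unique_by_field is a hypothetical helper, not a Solr feature. It only collapses the page that was fetched, so numFound and paging still reflect the uncollapsed result set.

```python
def unique_by_field(docs, field):
    """Keep the first document seen for each distinct value of `field`."""
    seen = set()
    unique = []
    for doc in docs:
        value = doc.get(field)
        if value not in seen:
            seen.add(value)
            unique.append(doc)
    return unique


# The five documents from the example response, as dicts.
response_docs = [
    {"productGroup_t_i_s_nm": "Wireless"},
    {"productGroup_t_i_s_nm": "Wireless"},
    {"productGroup_t_i_s_nm": "Wireless"},
    {"productGroup_t_i_s_nm": "Video Games"},
    {"productGroup_t_i_s_nm": "Video Games"},
]
collapsed = unique_by_field(response_docs, "productGroup_t_i_s_nm")
```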
auto generated document id?
hello, in solr world, is it possible to have the doc id generated automatically? thx a lot, kevin
Re: auto generated document id?
thx. it's exactly what i'm looking for.

----- Original Message -----
From: Bruno Aranda <brunoara...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Sunday, February 22, 2009 12:10:47 PM
Subject: Re: auto generated document id?

And, as well, you could use automatically generated UUIDs:

http://wiki.apache.org/solr/UniqueKey

Cheers,
Bruno

2009/2/22 Cheng Zhang <zhangyongji...@yahoo.com>

> hello,
>
> in solr world, is it possible to have the doc id generated automatically?
>
> thx a lot,
> kevin
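The wiki page linked above covers UUIDs on the Solr side. A client-side alternative is to assign the id before the document is sent for indexing. The sketch below assumes the unique key field is called id and that documents are plain dicts; with_generated_id is a hypothetical helper name.

```python
import uuid


def with_generated_id(doc, id_field="id"):
    """Return a copy of `doc`, adding a random UUID if no id is set yet."""
    if doc.get(id_field):
        return dict(doc)
    return {**doc, id_field: str(uuid.uuid4())}


new_doc = with_generated_id({"name": "some product"})
kept_doc = with_generated_id({"id": "om_B00114162K", "name": "some product"})
```

Because the id is random, re-indexing the same source record this way creates a new document rather than replacing the old one, so it suits data that has no natural key.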
Re: score filter
Hi Grant,

In my case, take searching for a book as an example. Some of the returned documents have high relevance (score > 3), but some documents with a low score (0.01) are useless. Without a score filter, I have to go through each document to find out the number of documents I'm interested in (score > nnn). This causes a problem for pagination. For example, if I only need to display the first 10 records, I need to retrieve all 1000 documents to figure out the number of meaningful documents, which have score > nnn.

Thx,
Kevin

----- Original Message -----
From: Grant Ingersoll <gsing...@apache.org>
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 6:47:11 AM
Subject: Re: score filter

What's the motivation for wanting to do this? The reason I ask is that score is a relative thing determined by Lucene based on your index statistics. It is only meaningful for comparing the results of a specific query with a specific instance of the index. In other words, it isn't useful to filter on b/c there is no way of knowing what a good cutoff value would be. So, you won't be able to do score:[1.2 TO *] because score is not an actual Field.

That being said, you probably could implement a HitCollector at the Lucene level and somehow hook it into Solr to do what you want. Or, of course, just stop processing the results in your app after you see a score below a certain value. Naturally, this still means you have to retrieve the results.

-Grant

On Feb 10, 2009, at 10:01 PM, Cheng Zhang wrote:

> Hello,
>
> Is there a way to set a score filter? I tried +score:[1.2 TO *] but it did not work.
>
> Many thanks,
> Kevin
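The suggestion to stop processing results in the application once the score drops below a value could look roughly like the sketch below. It assumes the results arrive sorted by descending score and that each document is a dict with a score key; the function name above_cutoff and the sample scores are made up for illustration. As noted in the thread, the cutoff value itself is only meaningful for one query against one index.

```python
from itertools import takewhile


def above_cutoff(docs, min_score):
    """Return the leading documents whose score is at least `min_score`.

    Relies on `docs` being sorted by descending score, so iteration can
    stop at the first document below the cutoff.
    """
    return list(takewhile(lambda doc: doc["score"] >= min_score, docs))


# Hypothetical results, already sorted by descending score.
results = [
    {"id": "a", "score": 3.4},
    {"id": "b", "score": 3.1},
    {"id": "c", "score": 0.4},
    {"id": "d", "score": 0.01},
]
relevant = above_cutoff(results, 1.2)
```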
Re: score filter
Just did some research. It seems that it's doable with additional code added to Solr, but not out of the box. Thank you, Grant.

----- Original Message -----
From: Grant Ingersoll <gsing...@apache.org>
To: solr-user@lucene.apache.org
Sent: Wednesday, February 11, 2009 8:14:01 AM
Subject: Re: score filter

At what point do you draw the line? 0.01 is too low, but what about 0.5 or 0.3? In fact, there may be queries where 0.01 is relevant.

Relevance is a tricky thing and putting in arbitrary cutoffs is usually not a good thing. An alternative might be to instead look at the difference between scores and see if the gap is larger than some delta, but even that is subject to the vagaries of scoring.

What kind of relevance testing have you done so far to come up with those values? See also http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Debugging-Relevance-Issues-in-Search/

On Feb 11, 2009, at 10:16, Cheng Zhang <zhangyongji...@yahoo.com> wrote:

> Hi Grant,
>
> In my case, take searching for a book as an example. Some of the returned documents have high relevance (score > 3), but some documents with a low score (0.01) are useless. Without a score filter, I have to go through each document to find out the number of documents I'm interested in (score > nnn). This causes a problem for pagination. For example, if I only need to display the first 10 records, I need to retrieve all 1000 documents to figure out the number of meaningful documents, which have score > nnn.
>
> Thx,
> Kevin
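The alternative mentioned in this thread, looking at the gap between consecutive scores instead of a fixed cutoff, can be sketched as follows. The function name cut_at_gap, the delta value and the sample scores are illustrative assumptions, and the caveat from the thread still applies: the result depends on the vagaries of scoring.

```python
def cut_at_gap(scores, max_gap):
    """Return how many leading results to keep.

    `scores` must be sorted in descending order. The cut is placed before
    the first result whose score drops by more than `max_gap` relative to
    the previous one; if no such drop exists, all results are kept.
    """
    for i in range(1, len(scores)):
        if scores[i - 1] - scores[i] > max_gap:
            return i
    return len(scores)


# Hypothetical scores for one query, sorted by descending score.
keep = cut_at_gap([3.2, 3.0, 2.9, 0.4, 0.01], max_gap=1.0)
```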
score filter
Hello, Is there a way to set a score filter? I tried +score:[1.2 TO *] but it did not work. Many thanks, Kevin
Re: Decrease warmupTime
Otis,

I did restart the Solr server, but that may not have been enough. I just deleted the Tomcat work directory and it works now. No warming anymore.

Thanks a lot for your information.

-Kevin

----- Original Message -----
From: Cheng Zhang <zhangyongji...@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Friday, February 6, 2009 10:47:38 PM
Subject: Re: Decrease warmupTime

I did restart the solr server. Here is the config.

    <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>

    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested. -->
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

    <!-- documentCache caches Lucene Document objects (the stored fields for each document).
         Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Thx.
Decrease warmupTime
First, I'm new to Solr.

I have set up a Solr server and added some documents to it. I noticed that as I added more and more docs, the warmupTime became longer and longer. After adding 400K docs, I can see the warmupTime is now about 1 minute. Here is one log entry:

    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=6,evictions=0,size=6,warmupTime=56687,cumulative_lookups=2,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=2,cumulative_evictions=0}

If I try to insert more docs before the warmup ends, I get an exception.

Is there any way to decrease this warmupTime?

Thanks a lot,
Kevin
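To track how warmupTime changes as the index grows, the cache statistics line from the log can be parsed into numbers. This is a small sketch assuming the name{key=value,...} layout shown in the log entry above; parse_cache_stats is a hypothetical helper, and reading warmupTime as milliseconds is inferred from the "about 1 minute" description of 56687.

```python
import re


def parse_cache_stats(line):
    """Parse a 'cacheName{key=value,...}' log entry into (name, stats)."""
    match = re.search(r"(\w+)\{(.*)\}", line)
    if match is None:
        raise ValueError("no cache statistics found in line")
    name, body = match.groups()
    stats = {}
    for pair in body.split(","):
        key, _, value = pair.strip().partition("=")
        # Ratios such as hitratio contain a dot; everything else is a count.
        stats[key] = float(value) if "." in value else int(value)
    return name, stats


log_entry = (
    "queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=6,evictions=0,"
    "size=6,warmupTime=56687,cumulative_lookups=2, cumulative_hits=0,"
    "cumulative_hitratio=0.00,cumulative_inserts=2,cumulative_evictions=0}"
)
cache_name, cache_stats = parse_cache_stats(log_entry)
warmup_seconds = cache_stats["warmupTime"] / 1000
```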
Re: Decrease warmupTime
Hi Yonik,

I just changed the autowarmCount for queryResultCache but it did not work. The log still shows a warmupTime of about 45 seconds:

    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=6,evictions=0,size=6,warmupTime=44055,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=1,cumulative_evictions=0}

Any other suggestion?

Thanks a lot,
Kevin

----- Original Message -----
From: Yonik Seeley <ysee...@gmail.com>
To: solr-user@lucene.apache.org
Sent: Friday, February 6, 2009 5:18:47 PM
Subject: Re: Decrease warmupTime

On Fri, Feb 6, 2009 at 5:12 PM, Cheng Zhang <zhangyongji...@yahoo.com> wrote:
> Is there any way to decrease this warmupTime?

Go into solrconfig.xml and reduce (or eliminate) the autowarm counts for the caches.

-Yonik
Re: Decrease warmupTime
I did restart the solr server. Here is the config.

    <filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>

    <!-- queryResultCache caches results of searches - ordered lists of
         document ids (DocList) based on a query, a sort, and the range
         of documents requested. -->
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

    <!-- documentCache caches Lucene Document objects (the stored fields for each document).
         Since Lucene internal document ids are transient, this cache will not be autowarmed. -->
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Thx.

----- Original Message -----
From: Otis Gospodnetic <otis_gospodne...@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Friday, February 6, 2009 10:40:45 PM
Subject: Re: Decrease warmupTime

Have you restarted Solr after you made the change? Can you paste your query result cache config?

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

From: Cheng Zhang <zhangyongji...@yahoo.com>
To: solr-user@lucene.apache.org
Sent: Friday, February 6, 2009 11:04:07 PM
Subject: Re: Decrease warmupTime

Hi Yonik,

I just changed the autowarmCount for queryResultCache but it did not work. The log still shows a warmupTime of about 45 seconds:

    queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=6,evictions=0,size=6,warmupTime=44055,cumulative_lookups=1,cumulative_hits=0,cumulative_hitratio=0.00,cumulative_inserts=1,cumulative_evictions=0}

Any other suggestion?

Thanks a lot,
Kevin
newbie question --- multiple schemas
Hello, Is it possible to define more than one schema? I'm reading the example schema.xml. It seems that we can only define one schema? What about if I want to define one schema for document type A and another schema for document type B? Thanks a lot, Kevin