Re: date slider
Hi,

I'm not sure if this applies to your use case, but when I was building our faceted search (see http://www.mysecondhome.co.uk/search.html) I at first wanted to do the same: retrieve the minimum and maximum values. But when I did, the few values that were a lot higher than the others made it almost impossible to select a reasonable range. That's why I switched to a fixed range of reasonable values, with the last option being "anything higher". This way the result set is spread out pretty evenly over the length of the slider. If the values over which you want to do range selection don't vary a lot, I think this is the best option; otherwise I guess you'll have to use another solution. Maybe if the values do change a lot, but not very often, you could generate new fixed range values after updating Solr. If you think something like what I've made is useful to you, I'll be happy to answer any questions about how I implemented it.

Regards,
gwk

On 5/16/2010 10:07 PM, Lukas Kahwe Smith wrote:
> On 16.05.2010, at 21:01, Ahmet Arslan iori...@yahoo.com wrote:
>> http://wiki.apache.org/solr/StatsComponent can give you min and max values.
>> Sorry, my bad, I just tested StatsComponent with a tdate field. It is not working for date-typed fields; the wiki says it is for numeric fields.
> ok thx for checking. is my use case really so unusual? i guess i could store a unix timestamp or i just do a fixed range. hmm, if i use facets with a really large gap, will it always give me at least the min and max maybe? will try it out when i get home.
> regards
> Lukas
Re: date slider
Maybe you would like something like this:

lowest value: http://localhost:8983/solr/select?q=*:*&rows=1&fl=date&sort=date%20asc
highest value: http://localhost:8983/solr/select?q=*:*&rows=1&fl=date&sort=date%20desc

Hope this helps,
Péter

----- Original Message ----- From: gwk g...@eyefi.nl To: solr-user@lucene.apache.org Sent: Monday, May 17, 2010 11:04 AM Subject: Re: date slider
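The fixed-range approach gwk describes (evenly spread buckets, with the last one open-ended) can be sketched roughly like this. The bucket boundaries here are invented for illustration; in practice you would derive them from your data and regenerate them after updating Solr:

```python
# Sketch of the fixed-range slider idea: map a value to one of a fixed
# list of buckets, where the last bucket means "anything higher".
# Boundary values are made up for this example.
def bucket_for(value, boundaries):
    """Return the index of the slider bucket that `value` falls into."""
    for i, upper in enumerate(boundaries):
        if value < upper:
            return i
    return len(boundaries)  # open-ended "anything higher" bucket

boundaries = [100, 250, 500, 1000]   # 5 buckets: <100, <250, <500, <1000, 1000+
print(bucket_for(75, boundaries))    # 0
print(bucket_for(5000, boundaries))  # 4
```

The point of the design is that a handful of outliers no longer stretch the slider; they all land in the final bucket.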
Wildcards in Queries
Hi, I'm new to Solr. Can I use wildcards like '*' in my queries? Thanx, Robert
Re: Wildcards in Queries
Yes; you can also use '?' for a single-character wildcard. On Mon, May 17, 2010 at 11:21 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: Hi, i'm new to solr. Can I use wilcard like '*' in my queries? Thanx, Robert
Re: Wildcards in Queries
How can I do that? In that distributed example I can't use wildcards ;-( 2010/5/17 Leonardo Menezes leonardo.menez...@googlemail.com: Yes, also you can use '?' for a single character wild card. On Mon, May 17, 2010 at 11:21 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: Hi, i'm new to solr. Can I use wilcard like '*' in my queries? Thanx, Robert
Re: Wildcards in Queries
http://wiki.apache.org/solr/SolrQuerySyntax On Mon, May 17, 2010 at 11:44 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: How I can do that? I that distribute example I'cant use wildcards ;-( 2010/5/17 Leonardo Menezes leonardo.menez...@googlemail.com: Yes, also you can use '?' for a single character wild card. On Mon, May 17, 2010 at 11:21 AM, Robert Naczinski robert.naczin...@googlemail.com wrote: Hi, i'm new to solr. Can I use wilcard like '*' in my queries? Thanx, Robert
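As a quick illustration of the wildcard semantics the wiki page above describes ('*' matches zero or more characters, '?' matches exactly one), here is a rough Python sketch using the standard-library fnmatch module, which happens to share those two conventions. This is not Solr code, just a demonstration of the matching rules:

```python
from fnmatch import fnmatchcase

# '*' matches zero or more characters, '?' matches exactly one character,
# the same conventions Lucene/Solr wildcard terms use.
print(fnmatchcase("search", "se*"))  # True
print(fnmatchcase("solr", "so?r"))   # True
print(fnmatchcase("soar", "so?r"))   # True
print(fnmatchcase("sor", "so?r"))    # False -- '?' needs exactly one char
```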
Direct hits using Solr
Hi, Is there a way to have Solr return a URL that is not part of the index? We have a need that the search engine return a specific URL for a specific search term, and that result is supposed to be the first result (per Biz) among the result set. The URL is an external URL and there is no intent to index the contents of that site. Any help towards the feasibility of this is greatly appreciated. Thanks, Sai Thumuluri
DIH. behavior after an import. Log, delete table !?
Hello. For my delta-import, I get the IDs which should be updated from an extra table in my database. ... When DIH has finished the delta-import, the table with the IDs needs to be cleared. Can I put a SQL query in the DIH for that? This query should only be sent to the database when the import was successful ... any suggestions ?? thxxx -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-behavior-after-a-import-Log-delete-table-tp823232p823232.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Direct hits using Solr
> We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set.

This part seems like http://wiki.apache.org/solr/QueryElevationComponent

> The URL is an external URL and there is no intent to index contents of that site.

Can you explain in more detail? Even if you don't index the content of that site, you may have to index that URL.
Re: disable caches in real time
Any suggestions? I have thought of having two configurations per server and reloading each one with the appropriate config file, but I would prefer another solution if possible. Thanks, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/5/14 Marco Martinez mmarti...@paradigmatecnologico.com Hi, I want to know if there is any approach to disable caches in a specific core from a multicore server. My situation is the next: I have a multicore server where the core0 will be listen to the queries and other core (core1) that will be replicated from a master server. Once the replication has been done, i will swap the cores. My point is that i want to disable the caches in the core that is in charge of the replication to save memory in the machine. Any suggestions will be appreciated. Thanks in advance, Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42
Re: DIH. behavior after an import. Log, delete table !?
> for my Delta-Import, i get the Id's which are should be updatet from an extra table in my database. ... when dih finished the delta-import it's necessary, that the table with the ID's is to delete. can i put a sql query in the DIH for that issue ?

deletedPkQuery (a SQL query) is used in delta-import to delete documents from the Solr index. Is this what you mean?
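For reference, deletedPkQuery is expected to be a SELECT that returns the primary keys of documents deleted since the last import; DIH then removes the matching documents from the index. It is not a place to run DELETE statements against the database. A hypothetical sketch (table and column names invented):

```xml
<!-- Hypothetical DIH entity: deletedPkQuery SELECTs the primary keys of
     rows removed since the last import so DIH can delete the matching
     Solr documents. It does not execute DELETEs against the database. -->
<entity name="item" pk="id"
        query="SELECT id, name FROM item"
        deltaQuery="SELECT id FROM item
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deletedPkQuery="SELECT id FROM item_deletions
                        WHERE deleted_at &gt; '${dataimporter.last_index_time}'"/>
```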
Re: DIH. behavior after an import. Log, delete table !?
hm .. no =( I want to delete from a MySQL database, not from my Solr index
Re: DIH. behavior after an import. Log, delete table !?
hm, I think I can use deletedPkQuery, but it doesn't work for me; maybe you can help me. Here is my config:

<entity name="item" pk="id" transformer="script:BoostDoc"
        query="select i.id, i.shop_id, i.is_active, i.shop ..."
        deltaImportQuery="select i.id, i.s ... WHERE ... AND i.id='${dataimporter.delta.update_id}'"
        deltaQuery="SELECT update_id FROM solr_imports"
        deletedPkQuery="DELETE FROM solr_imports WHERE solr_imports.update_id='${dataimporter.item.update_id}'"/>

So, I only want to delete those IDs which were updated. This is my exception:

SCHWERWIEGEND: Delta Import Failed org.apache.solr.handler.dataimport.DataImportHandlerException: Unable to execute query: DELETE FROM solr_imports WHERE solr_imports.update_id='' Processing Document # 1

So the deletedPkQuery gets no IDs =(
Re: DIH. behavior after an import. Log, delete table !?
> hm i think i can use deletedPkQuery but it dont works for me [...]
> deletedPkQuery =DELETE FROM solr_imports WHERE solr_imports.update_id='${dataimporter.item.update_id}'
> [...]
> so the deletedPkQuery get no ID's =(

I am not sure what will happen with this kind of deletedPkQuery. Probably you won't be able to use the ${dataimporter.delta.update_id} variable, but I am also curious what will happen. Can you try this:

deletedPkQuery="DELETE FROM solr_imports WHERE solr_imports.update_id='${dataimporter.delta.update_id}'"

Since your deltaQuery does not contain a WHERE clause, why not delete (with another program or script) the solr_imports table after the delta-import?
Re: DIH. behavior after an import. Log, delete table !?
That's what I try! :D I don't want to do this with another script, because I never know when a delta-import is finished, and when it has completed, I don't know with which result: complete, fail, ?!?!? So I thought DIH could delete the updated IDs in my database =( I also tried to empty the table like this: TRUNCATE TABLE solr_imports. This works, but I get a new exception...
CFP for Lucene Revolution Conference, Boston, MA, October 7-8, 2010
Lucene Revolution Call For Participation - Boston, Massachusetts, October 7-8, 2010

The first US conference dedicated to Apache Lucene and Solr is coming to Boston, October 7-8, 2010. The conference is sponsored by Lucid Imagination with additional support from community and other commercial co-sponsors. The audience will include those experienced in Solr and Lucene application development, along with those experienced in other enterprise search technologies who are interested in becoming more familiar with Solr and Lucene and the opportunities they present. We are soliciting 45-minute presentations for the conference.

Key Dates:
May 12, 2010: Call For Participation opens
June 23, 2010: Call For Participation closes
June 28, 2010: Speaker acceptance/rejection notification
October 5-6, 2010: Lucene and Solr pre-conference training sessions
October 7-8, 2010: Conference sessions

Topics of interest include:
- Lucene and Solr in the Enterprise (case studies, implementation, return on investment, etc.)
- "How We Did It" development case studies
- Spatial/geo search
- Lucene and Solr in the Cloud (deployment cases as well as tutorials)
- Scalability and performance tuning
- Large-scale search
- Real-time search
- Data integration/data management
- Lucene and Solr for mobile applications

All accepted speakers will qualify for discounted conference admission. Financial assistance is available for speakers who qualify. To submit a 45-minute presentation proposal, please send an email to c...@lucenerevolution.org with a Subject line containing your name and your session title, and with the following information in plain text. If you have more than one topic to propose, send a separate email for each. Do not attach Word or other text file documents.
Return all fields completed as follows:

1. Your full name, title, and organization
2. Contact information, including your address, email, and phone number
3. The name of your proposed session (keep your title simple, interesting, and relevant to the topic)
4. A 75-200 word overview of your presentation; in addition to the topic, describe whether your presentation is intended as a tutorial, a description of an implementation, a theoretical/academic discussion, etc.
5. A 100-200 word speaker bio that includes prior conference speaking or related experience

To be considered, proposals must be received by 12 midnight PDT, Wednesday, June 23, 2010. Please email any general questions regarding the conference to i...@lucenerevolution.org. To be added to the conference mailing list, please email sig...@lucenerevolution.org. If your organization is interested in sponsorship opportunities, email spon...@lucenerevolution.org. We look forward to seeing you in Boston!
RE: Direct hits using Solr
How do I index a URL without indexing the content? Basically our requirement is that we have certain search terms for which there needs to be a URL that comes right on top. I tried to use the elevate option within Solr, but from what I know, I need to have an id of the indexed content to elevate a particular URL. Sai Thumuluri -----Original Message----- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, May 17, 2010 6:12 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set. This part seems like http://wiki.apache.org/solr/QueryElevationComponent The URL is an external URL and there is no intent to index contents of that site. Can you explain in more detail? Even if you don't index content of that site, you may have to index that URL.
Customized Solr DataImporter
Hi, I want to map my Solr fields using a customized DataImportHandler. For example, I have fields like:

<field column="NAME" name="field1"/>
<field column="NO" name="field2"/>

Actually my column names come dynamically from another table; they vary from client to client. Instead of giving the mapped DB column as 'NAME', I want to configure this dynamically using a customized import handler. Can I use my own DataImportHandler to implement this? Please help me. Thanks in advance
Issues with clustering in multicore
Hi, I was trying out a clustering example, which worked as mentioned in the documentation. Now I want to use the clustering feature in my multicore setup, where I have my core indexes saved. So I edited the solrconfig.xml in that file to add the clustering information (I did make sure that the lib declaration points to the correct location). But when I restart the Solr server for multicore, I get the following exception:

May 17, 2010 7:17:41 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.clustering.ClusteringComponent'
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:373)
    at org.apache.solr.core.SolrCore.createInstance(SolrCore.java:413)
    at org.apache.solr.core.SolrCore.createInitInstance(SolrCore.java:435)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1498)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1492)
    at org.apache.solr.core.SolrCore.initPlugins(SolrCore.java:1525)
    at org.apache.solr.core.SolrCore.loadSearchComponents(SolrCore.java:833)
    at org.apache.solr.core.SolrCore.init(SolrCore.java:551)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:428)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:594)
    at org.mortbay.jetty.servlet.Context.startContext(Context.java:139)
    at org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1218)
    at org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:500)
    at org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:448)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:161)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:147)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:117)
    at org.mortbay.jetty.Server.doStart(Server.java:210)
    at org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:40)
    at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:929)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.mortbay.start.Main.invokeMain(Main.java:183)
    at org.mortbay.start.Main.start(Main.java:497)
    at org.mortbay.start.Main.main(Main.java:115)
Caused by: java.lang.ClassNotFoundException: org.apache.solr.handler.clustering.ClusteringComponent
    at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
    at java.net.FactoryURLClassLoader.loadClass(URLClassLoader.java:592)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:252)
    at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:320)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.solr.core.SolrResourceLoader.findClass(SolrResourceLoader.java:357)
    ... 35 more

Any pointers?

Regards,
Raakhi
Re: Direct hits using Solr
Sai - this seems to be best built into your application tier above Solr, such that you have a database of special terms and URL mappings and simply present them above the results returned from Solr. Erik http://www.lucidimagination.com On May 17, 2010, at 3:11 PM, sai.thumul...@verizonwireless.com wrote: How do I index an URL without indexing the content? Basically our requirement is that - we have certain search terms for which there need to be a URL that should come right on top. I tried to use elevate option within Solr - but from what I know - I need to have an id of the indexed content for me to elevate a particular URL. Sai Thumuluri -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, May 17, 2010 6:12 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set. This part seems like http://wiki.apache.org/solr/QueryElevationComponent The URL is an external URL and there is no intent to index contents of that site. Can you explain in more detail? Even if you don't index content of that site, you may have to index that URL.
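Erik's application-tier suggestion could look roughly like this. The mapping table and result shapes below are invented for illustration; in practice the term-to-URL table would live in your database:

```python
# Sketch of the application-tier idea: keep special term -> URL mappings
# outside Solr and prepend a matching "direct hit" above Solr's results.
# All names and URLs here are made up.
DIRECT_HITS = {
    "billing": "http://example.com/billing-portal",
    "careers": "http://example.com/jobs",
}

def search_with_direct_hits(query, solr_results):
    """Prepend a direct-hit entry if the query matches a special term."""
    hit = DIRECT_HITS.get(query.strip().lower())
    return ([{"url": hit, "direct": True}] if hit else []) + solr_results

results = search_with_direct_hits("Billing", [{"url": "http://example.com/doc1"}])
print(results[0]["url"])  # http://example.com/billing-portal
```

Because the lookup happens before the Solr results are rendered, the external URL never needs to be indexed at all.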
Re: Solr Search problem; cannot search the existing word in the index content
A couple of things:

1) Try searching with &debugQuery=on attached to your URL; that'll give you some clues.

2) It's really worthwhile to explore the admin pages for a while; they'll also give you a world of information. It takes a while to understand what the various pages are telling you, but you'll come to rely on them.

3) Are you really searching with leading and trailing wildcards, or is that just the mail client changing bolding? Because this is tricky, very tricky. Search the mail archives for "leading wildcard" to see lots of discussion of this topic. You might back off a bit and try building up to wildcards, if that's what you're doing.

HTH
Erick

On Mon, May 17, 2010 at 1:11 AM, Mint o_O! mint@gmail.com wrote: Hi, I'm working on an index/search project recently and I found Solr, which is very fascinating to me. I followed the tutorial page successfully: starting up Jetty and adding new xml (user:~/solr/example/exampledocs$ java -jar post.jar *.xml); so far so good at this stage. Now I have created my own testing westpac.xml file with real data I intend to implement, put it in exampledocs, and again ran the command (user:~/solr/example/exampledocs$ java -jar post.jar westpac.xml). Everything went on very well; however, when I searched for "rhode", which is in the content, the index returned nothing. Could anyone guide me on what I did wrong, and why I couldn't search for that word even though it is in my index content? thanks, Mint
Re: Related terms/combined terms
So, is it possible to search with the TermsComponent and shingles for things like: "Driver Callaway"? The same suggestions should come up as when I search for "Callaway Dri.."
Re: Autosuggest
I have also thought about an autosuggest for our intranet search. One other solution could be: put all the searched queries into a database and do the lookup not on the terms indexed by Solr, but rather on what has been searched in the past. We have written a small script that takes the Solr log, extracts the query and hit count, puts everything into a MySQL database, and then has the autosuggest search these database entries.

markus

-----Original Message-----
From: Blargy [mailto:zman...@hotmail.com]
Sent: Saturday, May 15, 2010 17:45
To: solr-user@lucene.apache.org
Subject: Re: Autosuggest

Maybe I should have phrased it as: Is this ready to be used with Solr 1.4? Also, as Grang asked in the thread, what is the actual status of that patch? Thanks again!
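The log-mining script markus describes could be sketched roughly like this. The log line format and regex here are guesses (adjust them to your container's actual request log), and the MySQL step is replaced by an in-memory counter for illustration:

```python
import re
from collections import Counter

# Rough sketch: pull the q= parameter and hit count out of Solr
# request-log lines and tally popular queries for autosuggest.
# The log format below is invented; real logs will differ.
LOG_LINES = [
    "INFO: [] webapp=/solr path=/select params={q=ipod&rows=10} hits=42 status=0 QTime=3",
    "INFO: [] webapp=/solr path=/select params={q=ipod&rows=10} hits=42 status=0 QTime=2",
    "INFO: [] webapp=/solr path=/select params={q=camera&rows=10} hits=7 status=0 QTime=5",
]

pattern = re.compile(r"[{&]q=([^&}]+).*?hits=(\d+)")
counts = Counter()
for line in LOG_LINES:
    m = pattern.search(line)
    if m and int(m.group(2)) > 0:  # only suggest queries that had hits
        counts[m.group(1)] += 1

print(counts.most_common(1))  # [('ipod', 2)]
```

In the real setup these counts would be written to the MySQL table that the autosuggest box queries.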
Targeting two fields with the same query or one field gathering contents from both ?
Hey,

let's say I have:
- a field named A with specific contents
- a field named B with specific contents
- a field named C which gathers contents only from A and B, added with copyField

Are these queries equivalent in terms of performance:
- A:(the lazy fox) AND B:(the lazy fox)
- C:(the lazy fox)
??

Thanks,
Xavier
RE: Direct hits using Solr
> How do I index an URL without indexing the content? Basically our requirement is that - we have certain search terms for which there need to be a URL that should come right on top. I tried to use elevate option within Solr - but from what I know - I need to have an id of the indexed content for me to elevate a particular URL.

What does your current schema.xml look like? How many URLs do you have? You can add a new field (let's say named URL) to schema.xml and insert those URLs with some special uniqueKey. Then you can list those uniqueKeys and keywords in elevate.xml.
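A hypothetical sketch of what the elevate.xml entries Ahmet describes could look like. The uniqueKey value and query text here are invented; each external URL would be indexed as a small document with its own uniqueKey, then pinned to the top for the given search term:

```xml
<!-- Hypothetical elevate.xml: the document with uniqueKey "url-1"
     (a small indexed document carrying the external URL) is forced
     to the top of results for the query "annual report". -->
<elevate>
  <query text="annual report">
    <doc id="url-1"/>
  </query>
</elevate>
```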
Re: Customized Solr DataImporter
> Actually my column-names comes dynamically from another table it varies from client to client. instead of giving the Mapped-Db-columns as 'NAME' i want to configure this dynamically using the Customized Import Handler. can i use My Own DataImportHandler To implement this.

Sounds like you can do what you want using dynamic fields, with or without a custom transformer:

http://wiki.apache.org/solr/SchemaXml#Dynamic_fields
http://wiki.apache.org/solr/DIHCustomTransformer
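A minimal sketch of the dynamic-field idea (the pattern name and type are invented): a dynamicField in schema.xml catches columns whose names are not known in advance, and a DIH transformer can rename the per-client columns to match the pattern at import time:

```xml
<!-- Hypothetical schema.xml snippet: any incoming field whose name
     matches client_* is accepted without being declared explicitly,
     so per-client column names can be mapped onto it at import time. -->
<dynamicField name="client_*" type="text" indexed="true" stored="true"/>
```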
Re: Targeting two fields with the same query or one field gathering contents from both ?
On 17/05/2010 16:57, Xavier Schepler wrote:
> Hey, let's say I have : - a field named A with specific contents - a field named B with specific contents - a field named C witch contents only from A and B added with copyField. Are those queries equivalents in terms of performance : - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ?? Thanks, Xavier

I made some tests and it appears that the second query is much faster than the first ...
Re: Targeting two fields with the same query or one field gathering contents from both ?
No, the equivalent for this would be:
- A:(the lazy fox) *OR* B:(the lazy fox)
- C:(the lazy fox)

Imagine the situation where you don't have 'the lazy fox' in B: with AND you get 0 results, although you have 'the lazy fox' in A and C.

Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/5/17 Xavier Schepler xavier.schep...@sciences-po.fr Hey, let's say I have : - a field named A with specific contents - a field named B with specific contents - a field named C witch contents only from A and B added with copyField. Are those queries equivalents in terms of performance : - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ?? Thanks, Xavier
Re: Targeting two fields with the same query or one field gathering contents from both ?
On 17/05/2010 17:49, Marco Martinez wrote:
> No, the equivalent for this will be: - A: (the lazy fox) *OR* B: (the lazy fox) - C: (the lazy fox) Imagine the situation that you dont have in B 'the lazy fox', with the AND you get 0 results although you have 'the lazy fox' in A and C Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª Planta 28224 Pozuelo de Alarcón Tel.: 91 352 59 42 2010/5/17 Xavier Schepler xavier.schep...@sciences-po.fr Hey, let's say I have : - a field named A with specific contents - a field named B with specific contents - a field named C witch contents only from A and B added with copyField. Are those queries equivalents in terms of performance : - A: (the lazy fox) AND B: (the lazy fox) - C: (the lazy fox) ?? Thanks, Xavier

Yes, you're right, I figured it out after posting.
Re: DIH. behavior after an import. Log, delete table !?
> thats what i try ! :D i dont want to do this with another script, because i never know when a delta-import is finished, and when he is completed, i dont know with which result. complete, fail, ?!?!?

If you are updating your index *only* with DIH, a commit and optimize occur by default after every full/delta import. And you can auto-run your special code after every optimize/commit:

http://wiki.apache.org/solr/SolrConfigXml#A.22Update.22_Related_Event_Listeners

solrconfig.xml:

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/test</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
</listener>

test file:

#!/bin/bash
java -jar /home/search/junk.jar
Re: How to tell which field matched?
In our case, we had specific matching that we needed to return, so I can't really contribute this to the code base, but we did get this working. Basically, we have a custom request handler. After it receives the search results, we send them to our matcher algorithm. We then go through each document in the doc list. Based on the field type we are looking at, we send our input data through the correct analyzer and come up with a TokenStream. And then for each document, we also send each value in the field (for multivalued) through that field's analyzer to also produce a TokenStream. Each TokenStream is also put into a multi-valued HashMap with starting position as the key. We then step through each position to find matches. We use some other hash lists as well to make it more efficient, so that we are only analyzing the same data once. In our case, we were just looking for a score of how similar the index and input data were, as well as some other information that was specific to our application. So, it is not necessarily how Solr/Lucene determined a match, but it provided what we needed for our case; in fact, we did not want exactly how the search results were created. We then return a NamedList similar to how the highlighter or debug works. One warning is that this is a very doable problem, but it is definitely not trivial to implement, depending on your specific requirements. From: Jon Baer jonb...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, May 15, 2010 8:56:57 AM Subject: Re: How to tell which field matched? Sorry, my response wasn't to actually use debugQuery on for production; it was more wondering if it (the component) gave you the insight data you were looking for. On a side note, I'm also interested in this type of component, because there are a number of projects I have worked on recently where it seems people outside of tuning the index want to know "why did my query match these results?" in some sort of plain-English explanation.
I have the feeling what you want is possible; it's just not finding its way into the result set yet (guess), or needs a plugin. - Jon On May 15, 2010, at 11:16 AM, Tim Garton wrote: Additionally, I don't think this gets us what we want with multiValued fields. It tells if a multiValued field matched, but not which value out of the multiple values matched. I am beginning to suspect that this information can't be returned and we may have to restructure our schema. -Tim On Sat, May 15, 2010 at 7:12 AM, Sascha Szott sz...@zib.de wrote: Hi, I'm not sure if debugQuery=on is a feasible solution in a production environment, as generating such extra information requires a reasonable amount of computation. -Sascha Jon Baer wrote: Does the standard debug component (?debugQuery=on) give you what you need? http://wiki.apache.org/solr/SolrRelevancyFAQ#Why_does_id:archangel_come_before_id:hawkgirl_when_querying_for_.22wings.22 - Jon On May 14, 2010, at 4:03 PM, Tim Garton wrote: All, I've searched around for help with something we are trying to do and haven't come across much. We are running Solr 1.4. Here is a summary of the issue we are facing. A simplified example of our schema is something like this:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="title" type="text" indexed="true" stored="true" required="true"/>
<field name="date_posted" type="tdate" indexed="true" stored="true"/>
<field name="supplement_title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="supplement_pdf_url" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="supplement_pdf_text" type="text" indexed="true" stored="true" multiValued="true"/>

When someone does a search, we search across the title, supplement_title, and supplement_pdf_text fields. When we get our results, we would like to be able to tell which field the search matched and, if it's a multiValued field, which of the multiple values matched.
This is so that we can display results similar to:

Example Title
  Example Supplement Title
  Example Supplement Title 2 (your search matched this document)
  Example Supplement Title 3
Example Title 2
  Example Supplement Title 4
  Example Supplement Title 5
  Example Supplement Title 6 (your search matched this document)

etc. How would you recommend doing this? Is there some way to get Solr to tell us which field matched, including multiValued fields? As a workaround we have been using highlighting to tell which field matched, but it doesn't get us what we want for multiValued fields and there is a significant cost to enabling the highlighting. Should we design our schema in some other fashion to achieve these results? Thanks. -Tim
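The client-side re-analysis approach described earlier in the thread can be sketched roughly as follows. This is a toy Python illustration, not the poster's actual Java handler: the regex tokenizer stands in for a real analyzer chain, and the sample field values are invented.

```python
import re

def tokenize(text):
    # Stand-in for a field's analyzer chain: lowercase + word split.
    return set(re.findall(r"\w+", text.lower()))

def matching_values(query, field_values):
    """Return indices of the multivalued entries sharing a term with the query."""
    q_terms = tokenize(query)
    return [i for i, v in enumerate(field_values)
            if q_terms & tokenize(v)]

supplement_titles = [
    "Example Supplement Title",
    "Quarterly Wings Report",      # the only value matching the query "wings"
    "Example Supplement Title 3",
]
print(matching_values("wings", supplement_titles))  # → [1]
```

The real work in a Solr component would be re-running each stored value through the field type's analyzer, but the bookkeeping (which value index matched) is the same idea.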
RE: Wildcards in Queries
Yes, you can. But be careful with queries like *ababa (they might blow up). Also, it depends on how you are analysing the fields. Ankit -Original Message- From: Robert Naczinski [mailto:robert.naczin...@googlemail.com] Sent: Monday, May 17, 2010 5:22 AM To: solr-user@lucene.apache.org Subject: Wildcards in Queries Hi, I'm new to Solr. Can I use wildcards like '*' in my queries? Thanks, Robert
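The reason a leading wildcard like *ababa can blow up is that the term dictionary is sorted by prefix: a trailing wildcard can seek straight to its prefix and read forward, while a leading wildcard must scan every term. A rough Python sketch (the term list is invented; real Lucene term dictionaries work on the same principle, just at much larger scale):

```python
import bisect

terms = sorted(["aba", "ababa", "abacus", "banana", "cab", "rhode", "rhodes"])

def prefix_matches(prefix):
    # Trailing wildcard (aba*): binary-search to the prefix, then read forward.
    i = bisect.bisect_left(terms, prefix)
    out = []
    while i < len(terms) and terms[i].startswith(prefix):
        out.append(terms[i])
        i += 1
    return out

def suffix_matches(suffix):
    # Leading wildcard (*ababa): sort order doesn't help; every term is checked.
    return [t for t in terms if t.endswith(suffix)]

print(prefix_matches("aba"))    # → ['aba', 'ababa', 'abacus']
print(suffix_matches("ababa"))  # → ['ababa']
```

With millions of terms, the full scan in `suffix_matches` is what makes leading wildcards expensive.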
RE: Direct hits using Solr
Thank you Erik, I will follow this route Sai Thumuluri -Original Message- From: Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Monday, May 17, 2010 10:22 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr Sai - this seems to be best built into your application tier above Solr, such that you have a database of special terms and URL mappings and simply present them above the results returned from Solr. Erik http://www.lucidimagination.com On May 17, 2010, at 3:11 PM, sai.thumul...@verizonwireless.com wrote: How do I index an URL without indexing the content? Basically our requirement is that - we have certain search terms for which there need to be a URL that should come right on top. I tried to use elevate option within Solr - but from what I know - I need to have an id of the indexed content for me to elevate a particular URL. Sai Thumuluri -Original Message- From: Ahmet Arslan [mailto:iori...@yahoo.com] Sent: Monday, May 17, 2010 6:12 AM To: solr-user@lucene.apache.org Subject: Re: Direct hits using Solr We have a need that search engine return a specific URL for a specific search term and that result is supposed to be the first result (per Biz) among the result set. This part seems like http://wiki.apache.org/solr/QueryElevationComponent The URL is an external URL and there is no intent to index contents of that site. Can you explain in more detail? Even if you don't index content of that site, you may have to index that URL.
Date faceting and memory leaks
I have been running load testing using JMeter against a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterate over 100 query terms a number of times against the index. The GC does not seem to reclaim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as follows (as part of the appends section in the request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played with the filterCache settings but they do not have any effect, as the date field cache seems to be managed by the Lucene FieldCache. Please help, as I could be struggling with this for days. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
What garbage collection settings are you running at the command line when starting Solr? On May 17, 2010, at 2:41 PM, Yao wrote: I have been running load testing using JMeter on a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterator 100 query terms a number of times against the index. The GC does not seems to claim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as following (as part of append section in request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played for filterCache setting but does not have any effects as the date field cache seems be managed by Lucene FieldCahce. Please help as I can be struggling with this for days. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com. --- Antonio Lobato Symplicity Corporation www.symplicity.com (703) 351-0200 x 8101 alob...@symplicity.com
RE: Date faceting and memory leaks
I do not have any GC-specific settings on the command line. I had tried to force a GC collection via JConsole at the end of the run but it didn't seem to do anything to the heap size. -Yao -Original Message- From: Antonio Lobato [mailto:alob...@symplicity.com] Sent: Monday, May 17, 2010 2:44 PM To: solr-user@lucene.apache.org Subject: Re: Date faceting and memory leaks What garbage collection settings are you running at the command line when starting Solr? On May 17, 2010, at 2:41 PM, Yao wrote: I have been running load testing using JMeter on a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterator 100 query terms a number of times against the index. The GC does not seems to claim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as following (as part of append section in request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played for filterCache setting but does not have any effects as the date field cache seems be managed by Lucene FieldCahce. Please help as I can be struggling with this for days. Thanks in advance. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com.
--- Antonio Lobato Symplicity Corporation www.symplicity.com (703) 351-0200 x 8101 alob...@symplicity.com
Re: Date faceting and memory leaks
I have ~50 million docs, and use the follow lines without any issues: -XX:MaxNewSize=24m -XX:NewSize=24m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC Perhaps try them out? On May 17, 2010, at 2:47 PM, Ge, Yao (Y.) wrote: I do not have any GC specific setting in command line. I had tried to force GC collection via Jconsole at the end of the run but it didn't seems to do anything the heap size. -Yao -Original Message- From: Antonio Lobato [mailto:alob...@symplicity.com] Sent: Monday, May 17, 2010 2:44 PM To: solr-user@lucene.apache.org Subject: Re: Date faceting and memory leaks What garbage collection settings are you running at the command line when starting Solr? On May 17, 2010, at 2:41 PM, Yao wrote: I have been running load testing using JMeter on a Solr 1.4 index with ~4 million docs. I notice a steady JVM heap size increase as I iterator 100 query terms a number of times against the index. The GC does not seems to claim the heap after the test run is completed. It will run into OutOfMemory as I repeat the test or increase the number of threads/users. The date facet queries are specified as following (as part of append section in request handler): lst name=appends str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst The last_modified field is a TrieDateField with a precisionStep of 6. I have played for filterCache setting but does not have any effects as the date field cache seems be managed by Lucene FieldCahce. Please help as I can be struggling with this for days. Thanks in advance. 
-- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824372.html Sent from the Solr - User mailing list archive at Nabble.com. --- Antonio Lobato Symplicity Corporation www.symplicity.com (703) 351-0200 x 8101 alob...@symplicity.com
Re: DIH. behavior after a import. Log, delete table !?
oh, nice. So I can make a jar file with the query I need, and in solrconfig.xml I need to define this. -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-behavior-after-a-import-Log-delete-table-tp823232p824484.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: StackOverflowError during Delta-Import
Is there any more information I can post so someone can give me a clue on what's happening? -- View this message in context: http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-tp811053p824516.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
No, I still have the OOM issue with repeated facet query requests on the date field. I forgot to mention that I am running a 64-bit IBM 1.5 JVM. I also tried the Sun 1.6 JVM with and without your GC arguments. The GC pattern is different but the heap size does not drop as the test goes on. I tested with a single thread from JMeter just to make sure there is ample room for the GC to clean house. JMeter fires requests one after another without pause, but I assume that should not affect GC. It is clear to me that the date facet query has some major impact here, as I can run the load test with other field facets with no problem (the JVM heap size stabilizes at a certain level over time). -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p824577.html Sent from the Solr - User mailing list archive at Nabble.com.
shards design/customization coding question
We have a large index, separated into multiple shards, that consists of records exported from a database. One requirement is to support near real-time synchronization with the database. To accomplish this we are considering creating a daily shard where created and updated documents (records never get deleted) will be posted; at the end of the day, we would empty the daily shard into the other shards and start afresh the next day. The problem with this approach is that when an existing database record is updated into the daily shard, the daily shard contains an updated document whose id duplicates one in another shard. It is my understanding that in the case of duplicate document ids returned from multiple shards, the document returned first is kept in the search results and the other duplicates are discarded. My question is: where can I customize the Solr code to specify that documents from a particular shard should be given precedence in the search results? Any pointers would be very much appreciated.
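Independent of where the merge code lives in Solr, the dedup-with-precedence logic being asked for is itself simple. A hedged Python sketch, where the shard names, document shapes, and the `priority` parameter are all invented for illustration (Solr's actual distributed merge works differently):

```python
def merge_with_precedence(shard_results, priority):
    """Deduplicate docs by id, preferring the shard listed earliest in `priority`."""
    rank = {name: i for i, name in enumerate(priority)}
    best = {}
    for shard, docs in shard_results.items():
        for doc in docs:
            cur = best.get(doc["id"])
            # Keep this doc if we haven't seen the id, or this shard outranks
            # the shard the current copy came from.
            if cur is None or rank[shard] < rank[cur["_shard"]]:
                best[doc["id"]] = {**doc, "_shard": shard}
    return list(best.values())

results = {
    "daily":   [{"id": "42", "title": "updated"}],
    "static1": [{"id": "42", "title": "stale"}, {"id": "7", "title": "other"}],
}
merged = merge_with_precedence(results, priority=["daily", "static1"])
print(sorted((d["id"], d["title"]) for d in merged))
# → [('42', 'updated'), ('7', 'other')]
```

The daily shard wins the duplicate, which is the behavior the question is after.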
SOLR-788
I am looking at SOLR-788, trying to apply it to latest trunk. It looks like that's going to require some rework, because the included constant PURPOSE_GET_MLT_RESULTS conflicts with something added later, PURPOSE_GET_TERMS. How hard would it be to rework this to apply correctly to trunk? Is it simply a matter of advancing the constant to the next bit in the mask? There's been no discussion on the issue as to whether the original patch or the alternate one is better. Does anyone know? Thanks, Shawn
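If the conflict is just two purposes sharing the same bit, then advancing the patch's constant to the next free bit should in principle resolve it. A toy Python sketch with invented flag values (the real constants live in Solr's shard-request code and may differ):

```python
# Hypothetical purpose flags, mirroring how shard-request purposes are
# bitmask constants; the actual values in Solr are assumptions here.
PURPOSE_GET_FIELDS      = 0x1
PURPOSE_GET_HIGHLIGHTS  = 0x2
PURPOSE_GET_TERMS       = 0x4   # added to trunk after the patch was written

PURPOSE_GET_MLT_RESULTS = 0x4   # the patch's constant now collides
assert PURPOSE_GET_MLT_RESULTS == PURPOSE_GET_TERMS  # the conflict

# Rebasing the patch means advancing to the next unused bit:
PURPOSE_GET_MLT_RESULTS = 0x8
purpose = PURPOSE_GET_FIELDS | PURPOSE_GET_MLT_RESULTS
print(bool(purpose & PURPOSE_GET_TERMS))        # → False: flags no longer overlap
print(bool(purpose & PURPOSE_GET_MLT_RESULTS))  # → True
```

Whether the patch needs more than this one-bit shift depends on what else in trunk moved, so treat this as the optimistic case.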
Re: StackOverflowError during Delta-Import
I just found out that if I remove my deletedPkQuery then the import will work. Is it possible that there is some conflict between my delta indexing and my delta deleting? Any suggestions? -- View this message in context: http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-tp811053p824780.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
: Subject: Date faceting and memory leaks First off, just to be clear: you don't seem to be using the date faceting feature, you are using the facet query feature; your queries just happen to be on a date field. Second: to help people help you, you need to provide all the details. You've shown us the appends section of your request handler config, but you haven't given us any other details about the queries -- what does the *full* configuration look like for this handler? What do all the test URLs look like? etc... You also haven't given us any other details about your Solr setup. In particular, knowing what your cache configurations look like is crucial. : I have been running load testing using JMeter on a Solr 1.4 index with ~4 : million docs. I notice a steady JVM heap size increase as I iterator 100 : query terms a number of times against the index. The GC does not seems to : claim the heap after the test run is completed. It will run into OutOfMemory Third: how *exactly* are you measuring/monitoring heap size? ... you won't necessarily see the heap decrease in size, even after GC. Fourth: what do your cache sizes (and cache hit rates) look like before/during/after your test run? I ask about this specifically because the queries you have configured don't do any date rounding, which means Solr will attempt to cache a different range query for each of your hard-coded facet.query ranges every millisecond that it receives a request... : str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO : *]/str ...so you might want to consider changing those to things like... str name=facet.query{!ex=last_modified}last_modified:[NOW/DAY-90DAY TO NOW/DAY-30DAY]/str ...if what you care about is day precision. Presumably in your requests you have an fq that is tagged with the name last_modified? (See what I mean about needing all the details; I'm just guessing here based on what I know) ... you'll want that to round down to the start of the day as well.
These unique queries for every millisecond could easily explain getting an OOM if your filterCache is very large (since I don't know how big your filterCache is, or what kind of cache hit rates you are getting, I can only guess). : I have played for filterCache setting but does not have any effects as the : date field cache seems be managed by Lucene FieldCahce. No. A fieldCache is created for each field as needed (mainly for sorting, and in some cases for field term faceting), but for facet.querys like these (and for the corresponding fqs) an entry in the filterCache is created for each unique query. -Hoss
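The rounding advice above can be mechanized: generate the facet.query strings with NOW/DAY so the query text, and hence the filterCache key, stays identical for a whole day instead of changing every millisecond. A Python sketch, with the bucket edges and field name taken from this thread but the helper itself invented:

```python
def facet_queries(field, day_edges, round_to_day=True):
    """Build {!ex}-tagged facet.query range strings for buckets whose
    boundaries sit `day_edges` days in the past (newest edge first)."""
    now = "NOW/DAY" if round_to_day else "NOW"   # NOW/DAY is stable all day
    edges = [f"{now}-{d}DAY" for d in day_edges]
    queries = [f"{{!ex={field}}}{field}:[{edges[0]} TO *]"]
    for newer, older in zip(edges, edges[1:]):
        queries.append(f"{{!ex={field}}}{field}:[{older} TO {newer}]")
    queries.append(f"{{!ex={field}}}{field}:[* TO {edges[-1]}]")
    return queries

for q in facet_queries("last_modified", [30, 90, 180, 365, 730]):
    print(q)
# first line printed: {!ex=last_modified}last_modified:[NOW/DAY-30DAY TO *]
```

With `round_to_day=False` the strings reproduce the original, cache-hostile config from the thread.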
Re: grouping in fq
: Wait. If the default op is OR, I thought this query: : : (+category:xyz +price:[100 TO *]) -category:xyz : : meant with xyz and range, OR without xyz because without a plus or Nope. Regardless of the default op, you've got a BooleanQuery with two clauses, one of which is negative. The other clause is either mandatory because the default op says it should be, or it's mandatory because it's the only SHOULD clause and there are no MUST clauses. Consider it written out a little more simply... (+A +B) -A ...the parens around the A and B clauses make them a BooleanQuery, which we can call X... X -A ...and now hopefully it's clear: A is prohibited, and since there aren't any mandatory clauses and there is only one optional clause, that optional clause (X) is now mandatory ... Since X = (+A +B), that means (+A +B) is mandatory. So we get no matches, because we can't match A and -A at the same time. : minus, OR really means SHOULD (which, bizarrely, is not a keyword). (Yeah, it annoys me that there is no prefix markup for SHOULD ... it wouldn't be so bad except that if you change the default op to MUST there is no way of expressing whole families of queries .. that's why I never recommend making the default op MUST) -Hoss
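The reduction above can be checked mechanically. A small Python sketch treating documents as sets of matched terms (a deliberate simplification of Lucene's BooleanQuery semantics, ignoring scoring):

```python
def x_clause(doc_terms):
    # X = (+A +B): both A and B are required inside the sub-query.
    return "A" in doc_terms and "B" in doc_terms

def top_level(doc_terms):
    # Query: X -A. X is the only positive clause, so it is effectively
    # mandatory; -A prohibits any doc containing A.
    return x_clause(doc_terms) and "A" not in doc_terms

# X requires A while the top level prohibits A, so nothing can ever match:
print(top_level({"A", "B"}), top_level({"B"}))  # → False False
```

Every document either lacks A (and fails X) or has A (and trips the prohibition), which is exactly the "no matches" conclusion.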
Which Solr to use?
I've been investigating Solr on and off as a (or even the) search solution for my employer's content management solution. One of the biggest questions in my mind at this point is which version to go with. In general, 1.4 would seem the obvious choice, as it's the only released version on that list. There's a commercially supported distro from Lucid, and things should presumably be pretty stable. What led me down the rabbit hole is that a) we generally have quite a lot of business documents to index (Word and PDF, mostly), and b) the pull approach implemented in the DataImportHandler is much more attractive in our architecture than the push model we'd otherwise have to construct. Unfortunately, the TikaEntityProcessor and the binary data sources on which it depends were added after 1.4 was released. Back in early March, I was able to get things up and running with a 1.5 nightly (and Tika 0.7-snapshot), but since then the course of Solr development has... changed significantly. The 1.5 branch has been abandoned, and (to my uninformed eye) it seems that there's a lot of upheaval in the trunk as things merge with Lucene. And it also appears that the released Tika 0.7 might not be compatible with Solr? (Judging by SOLR-1902, that is.) What I'm looking for is some advice on which course to pursue: - Plunge ahead with the trunk, and hope that things stabilize by a few months from now, when we'd be hoping to go live on one of our biggest client sites. - Go with the last 1.5 code, knowing that the features we want are in there, and hope we don't run into anything majorly broken. - Stick with 1.4, and just accept the necessity of needing to push content to the HTTP interface. I don't expect a definitive answer, of course, but I'd like to be better informed about the risks and benefits. Also: does anyone have a sense whether it'd be possible to back-port the TikaEntityProcessor stuff to 1.4? Sixten
Re: synonyms not working with copyfield
: fields during indexing. However, my search interface is just a text : box like Google and I need to take the query and return only those : documents that match ALL terms in the query and if I am going to take as mentioned previously in this thread: this is exactly what the dismax QParser was designed for. -Hoss
Re: Date faceting and memory leaks
Chris, Thanks for the detailed response. No I am not using Date Facet but Facet Query as for facet display. Here is the full configuration of my dismax query handler: requestHandler name=dismax class=solr.SearchHandler lst name=defaults str name=defTypedismax/str str name=echoParamsexplicit/str float name=tie0.01/float str name=qf title text^0.5 domain^0.1 nature^0.1 author /str str name=pf title text /str str name=bf recip(ms(NOW,last_modified),3.16e-11,1,1) /str str name=fl url,title,domain,nature,src,last_modified,text,sz /str str name=mm 2lt;-1 5lt;-2 6lt;90% /str int name=ps100/int str name=q.alt*:*/str !-- example highlighter config, enable per-query with hl=true -- str name=hlon/str str name=hl.fltitle,text/str !-- for this field, we want no fragmenting, just highlighting -- str name=f.title.hl.fragsize0/str str name=f.text.hl.snippets3/str !-- instructs Solr to return the field itself if no query terms are found -- str name=f.text.hl.alternateFieldtext/str str name=f.text.h1.maxAlternateFieldLength400/str str name=f.text.hl.fragmenterregex/str !-- defined below -- /lst lst name=appends str name=facet.field{!ex=src}src/str str name=facet.field{!ex=domain}domain/str str name=facet.field{!ex=nature}nature/str str name=facet.query{!ex=last_modified}last_modified:[NOW-30DAY TO *]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-90DAY TO NOW-30DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-180DAY TO NOW-90DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-365DAY TO NOW-180DAY]/str str name=facet.query{!ex=last_modified}last_modified:[NOW-730DAY TO NOW-365DAY]/str str name=facet.query{!ex=last_modified}last_modified:[* TO NOW-730DAY]/str /lst /requestHandler Cache settings: filterCache class=solr.LRUCache size=1512000 initialSize=1512000 autowarmCount=1280/ queryResultCache class=solr.LRUCache size=512 initialSize=512 autowarmCount=32/ documentCache class=solr.LRUCache size=512 initialSize=512 autowarmCount=0/ I am 
monitoring the Solr JVM heap memory usage via a remote JConsole; the image below shows how the heap size keeps increasing as more facet query requests are sent to Solr via JMeter: http://n3.nabble.com/file/n825038/memory-1.jpg The following is the request URL pattern: select?rows=0&facet=true&facet.mincount=1&facet.method=enum&q=${query}&qt=dismax where ${query} is selected randomly from a list of 100 query terms. The date rounding suggestion is a very good one; I will need to rerun the test and report back on the cache settings. I remember my filterCache hit ratio is around 0.7. I did use the tagged results for multi-select display of facet values, but in this case there is no fq in the load test request URL. Thanks again, and I will report back on the re-run with date rounding. -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825038.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date faceting and memory leaks
: Cache settings: : filterCache class=solr.LRUCache size=1512000 initialSize=1512000 : autowarmCount=1280/ That's a monster filterCache ... I can easily imagine it causing an OOM if your heap is only 5G. : The date rounding suggest is a very good one, I will need to rerun the test : and report back on the cache setting. I remember my filterCache hit ratio is : around 0.7. I did use the tagged results for multi-select display of facet A hit ratio of 0.7, or a 0.7% hit rate? ... With that many unique facet queries, I can't imagine you were getting a 70% hit rate. I'm betting that if you monitor that filterCache size and hit rate as you run your test you'll see it just grow and grow until the OOM, and if you analyze the heap dumps you'll probably see the cache hanging on to a ton of DocSets that will never be used again. : values but in this case there is no fq in the load test request URL. I've never tested this, so I can't say for sure, but if it turns out that the filterCache is not your problem, then perhaps there is something wonky with the filter query exclusion code in cases like this -- where you explicitly exclude a tagged fq but that fq doesn't exist. The way to rule it out would be to remove the exclusion from your configs and test it that way to see if the behavior is the same. -Hoss
Re: Date faceting and memory leaks
Chris, Just completed the re-run, and your date rounding tip saved my day. I now realize that NOW as a timestamp is a very bad idea for query caching, as it is never the same value twice. NOW/DAY at least makes a set of facet query cache entries reusable for a period of time. It turns out you were able to help with just the little fraction of information provided. Thanks again! -Yao -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825059.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Date faceting and memory leaks
Just to close the loop. I was fooling around with all the cache settings trying to figure out my problem, so the filterCache was set that way as part of the experiments. It did not cause any memory issue in this case. After the date rounding adjustment, I re-ran the query with 15 threads and 6000 requests and got 1,500/minute throughput while using only a little more than 0.5 GB of heap memory. The hit ratio reported on the Solr admin statistics page shows the filterCache has a hitratio of 0.99: with 103800 lookups and 103773 hits, I assume it is 99%. Have a nice day. -Yao From: Chris Hostetter-3 [via Lucene] [mailto:ml-node+825052-1711725506-201...@n3.nabble.com] Sent: Monday, May 17, 2010 9:04 PM To: Ge, Yao (Y.) Subject: Re: Date faceting and memory leaks : Cache settings: : filterCache class=solr.LRUCache size=1512000 initialSize=1512000 : autowarmCount=1280/ That's a monster filterCache ... I can easily imagine it causing an OOM if your heap is only 5G. : The date rounding suggest is a very good one, I will need to rerun the test : and report back on the cache setting. I remember my filterCache hit ratio is : around 0.7. I did use the tagged results for multi-select display of facet A hit ratio of 0.7, or a 0.7% hit rate? ... With that many unique facet queries, I can't imagine you were getting a 70% hit rate. I'm betting that if you monitor that filterCache size and hit rate as you run your test you'll see it just grow and grow until the OOM, and if you analyze the heap dumps you'll probably see the cache hanging on to a ton of DocSets that will never be used again. : values but in this case there is no fq in the load test request URL. I've never tested this, so I can't say for sure, but if it turns out that the filterCache is not your problem, then perhaps there is something wonky with the filter query exclusion code in cases like this -- where you explicitly exclude a tagged fq but that fq doesn't exist.
The way to rule it out would be to remove the exclusion from your configs and test it that way to see if the behavior is the same. -Hoss View message @ http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825052.html To unsubscribe from Re: Date faceting and memory leaks, click here (link removed). -- View this message in context: http://lucene.472066.n3.nabble.com/Date-faceting-and-memory-leaks-tp824372p825086.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: shards design/customization coding question
On 5/17/2010 2:40 PM, D C wrote: We have a large index, separated into multiple shards, that consists of records exported from a database. One requirement is to support near real-time synchronization with the database. To accomplish this we are considering creating a daily shard where create and update documents (records never get deleted) will be posted and at the end of the day, empty the daily shard into the other shards and start afresh the next day. snip My question is where can I customize the solr code to specify that documents from a particular shard should be given precedence in the search results. Any pointers would be very much appreciated. Quick answer: SOLR-1537. https://issues.apache.org/jira/browse/SOLR-1537 Long answer begins with this: You probably don't need it. This is exactly how we've got our system arranged, which has only been in production for a few weeks now. There are six static shards that contain all but the newest content. Another shard, which we call the incremental, holds the most recent data, currently three weeks. The incremental shard gets updated every two minutes and optimized once an hour. Deletes are run against all of the shards every ten minutes. To avoid unnecessary cache warming, the delete script checks for the presence of the deleted data before actually running the update. Once a night, the incremental index is trimmed to three weeks, with that data being distributed among the other shards, and one static shard gets optimized. We have two unique identifiers in the database for each document. One is an autoincrement field we call did, for document ID. This is the primary key in the database table, but is used only behind the scenes. The other is tag_id, which is the field that a user sees and is the uniqueKey in Solr. When a document is updated, its did will change, but its tag_id will not. 
Deletes from Solr's perspective are handled by did, not tag_id, and when a document is updated, we treat the old did like any other delete. The new document gets added to our incremental shard very quickly, and a little bit later, the old one is deleted from the static shard that contains it. The incremental shard is much smaller than the others, so it responds a lot faster. This means that there's a significant likelihood that it will always take precedence. For reliability reasons in the event of a hardware problem, we did incorporate the patch from SOLR-1537 into our system, which in addition to keeping the index up when a shard goes away, makes the deduplication order explicit. If you go the route you are planning, it is unlikely you'll need this. I have since added load balancing to my setup, so when we upgrade SOLR, this patch will no longer be used. In the absence of a second identifier and SOLR-1537, you could get more deterministic behavior by using the delete mechanism in a slightly different way from mine - add it to your daily/incremental index, then find it in the other shards and delete it. It will mean a cache rewarm when the delete is committed, and I don't know if that will cause problems for your setup. Thanks, Shawn
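The check-before-delete trick described in this reply (query for the doc's presence before issuing the delete, so an untouched shard's caches are not invalidated for nothing) can be sketched as follows. Everything here is invented for illustration; in a real script, `count_fn` would issue a rows=0 Solr query and `delete_fn` would post a delete-by-query and commit.

```python
def delete_if_present(dids, count_fn, delete_fn):
    """Issue a delete only when the docs actually exist, so shards that never
    held them keep their warm caches (no needless commit/invalidation)."""
    query = " OR ".join(f"did:{d}" for d in dids)
    if count_fn(query) > 0:
        delete_fn(query)
        return True
    return False

# Toy stand-ins for a shard holding only did 5:
index = {5}
calls = []
count = lambda q: sum(1 for d in index if f"did:{d}" in q)
delete = lambda q: calls.append(q)

print(delete_if_present([5, 9], count, delete))  # → True: did 5 exists, delete issued
print(delete_if_present([7], count, delete))     # → False: nothing there, cache untouched
```

The payoff is on the shards where `count_fn` returns 0: they never see the delete, so their searchers stay warm.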
Re: Solr Search problem; cannot search the existing word in the index content
Escaping the asterisk with a backslash, i.e. \*rhode, may work. On Mon, May 17, 2010 at 7:23 AM, Erick Erickson erickerick...@gmail.com wrote: A couple of things: 1 Try searching with debugQuery=on attached to your URL; that'll give you some clues. 2 It's really worthwhile exploring the admin pages for a while; they'll also give you a world of information. It takes a while to understand what the various pages are telling you, but you'll come to rely on them. 3 Are you really searching with leading and trailing wildcards, or is that just the mail client's bolding? Because this is tricky, very tricky. Search the mail archives for leading wildcard to see lots of discussion of this topic. You might back off a bit and try building up to wildcards, if that's what you're doing. HTH Erick On Mon, May 17, 2010 at 1:11 AM, Mint o_O! mint@gmail.com wrote: Hi, I've been working on an index/search project recently and I found Solr, which is very fascinating to me. I followed the tutorial successfully: starting up Jetty and adding new xml files (user:~/solr/example/exampledocs$ java -jar post.jar *.xml); so far so good at this stage. Now I have created my own test westpac.xml file with the real data I intend to use, put it in exampledocs, and again ran the command (user:~/solr/example/exampledocs$ java -jar post.jar westpac.xml). Everything went very well; however, when I searched for *rhode*, which is in the content, the index returned nothing. Could anyone guide me as to what I did wrong, and why I couldn't search for that word even though it is in my index content? thanks, Mint -- Lance Norskog goks...@gmail.com
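When the goal is to search for such characters literally rather than as wildcards, the backslash-escaping can be generalized. A hedged Python sketch; the character set below aims to cover the Lucene query syntax specials, similar in spirit to what SolrJ's ClientUtils.escapeQueryChars handles, but verify it against your Solr version:

```python
import re

# Characters with special meaning in the Lucene query syntax
# (assumed set -- check your Solr/Lucene version's query parser docs).
_SPECIAL = re.compile(r'([+\-!(){}\[\]^"~*?:\\/|&;])')

def escape_query(term):
    """Backslash-escape query syntax characters so they are searched literally."""
    return _SPECIAL.sub(r"\\\1", term)

print(escape_query("*rhode"))  # → \*rhode
print(escape_query("AT&T (UK)"))
```

Note that escaping makes the character literal; if the poster actually wants wildcard behavior, the analysis chain for the field is the thing to investigate instead.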
Re: Customized Solr DataImporter
Thanks for the reply. I don't know in what pattern the user will configure the columns in the separate table; I have to read this table to map the Solr fields to these columns, so I can't use dynamic fields either, and Transformers also seem to be of no use in this case. Please suggest any other solution. -- View this message in context: http://lucene.472066.n3.nabble.com/Customized-Solr-DataImporter-tp823556p825428.html Sent from the Solr - User mailing list archive at Nabble.com.