SEVERE: SolrIndexWriter was not closed prior to finalize
Hi, I am getting the following two errors in my Solr log file:

SEVERE: SolrIndexWriter was not closed prior to finalize(), indicates a bug -- POSSIBLE RESOURCE LEAK!!!

and

SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/home/solr/simplify360/multicore/tw/data/index/write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1097)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:102)
at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:174)
at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:222)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:61)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:147)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:77)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:55)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1360)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)

I am not able to figure out what may be causing this.

Regards, Rohit
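A stale write.lock like this usually means a previous IndexWriter did not shut down cleanly, or that two cores/instances are pointing at the same index directory. As a hedged sketch only (Solr 3.x-style solrconfig.xml; enable unlockOnStartup only if you are certain no other writer is still running against this index):

```xml
<mainIndex>
  <!-- "native" uses OS-level file locking (NativeFSLock, as in the error above) -->
  <lockType>native</lockType>
  <!-- Removes a leftover write.lock file at startup; dangerous if another
       process really is still writing to this index directory -->
  <unlockOnStartup>true</unlockOnStartup>
</mainIndex>
```

If two webapps or cores share one data dir, the real fix is giving each its own index directory rather than unlocking on startup.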
How to boost a querystring at query time
I want to enable boosting for queries and search results. My dismax request handler configuration is:

<requestHandler name="dismax" class="solr.DisMaxRequestHandler">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">dismax</str>
    <float name="tie">0.01</float>
    <str name="qf">text^0.5 name^1.0 description^1.5</str>
    <str name="fl">UID_PK,name,price,description,score</str>
    <str name="mm">2&lt;-1 5&lt;-2 6&lt;90%</str>
    <int name="ps">100</int>
    <str name="q.alt">*:*</str>
    <str name="f.name.hl.fragsize">0</str>
    <str name="f.name.hl.alternateField">name</str>
    <str name="f.text.hl.fragmenter">regex</str>
  </lst>
</requestHandler>

When the query string is q=gold^2.0 ring (boosting gold) and qt=standard, I get results for gold ring, but with qt=dismax I get no results. Why is that? Please explain.

- Thanks & Regards, Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-boost-a-querystring-at-query-time-tp3139800p3139800.html Sent from the Solr - User mailing list archive at Nabble.com.
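The likely explanation: the standard (lucene) parser understands inline term boosts like gold^2.0, while dismax treats most Lucene syntax as literal text, so the token "gold^2.0" matches nothing and the mm rules then reject the query. With dismax, boosts belong on fields via qf, not on terms. A minimal sketch of the two request styles, using only Python's standard library to show the encoded parameters (host and handler are omitted, field names come from the config above):

```python
from urllib.parse import urlencode

# Standard parser: inline boost syntax is legal in q
standard = urlencode({"q": "gold^2.0 ring", "qt": "standard"})

# Dismax: q carries plain user terms; boosts live in qf
dismax = urlencode({
    "q": "gold ring",                  # no Lucene syntax here
    "defType": "dismax",
    "qf": "name^2.0 description^0.5",  # boost fields, not terms
    "fl": "UID_PK,name,score",
})
print(standard)
print(dismax)
```

Per-term boosting is not expressible this way under dismax; raising the weight of the field where the term matters most (here name) is the usual substitute.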
Re: configure dismax requesthandlar for boost a field
Add score to the fl parameter. fl=*,score On 7/4/11 11:09 PM, Romi romijain3...@gmail.com wrote: I am not returning score for the queries. as i suppose it should be reflected in search results. means doc having query string in description field come higher than the doc having query string in name field. And yes i restarted solr after making changes in configuration. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/configure-dismax-requesthandlar-for-boo st-a-field-tp3137239p3139680.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: configure dismax requesthandlar for boost a field
Will merely adding fl=score make a difference in the search results? I mean, will I get the desired results now? - Thanks & Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/configure-dismax-requesthandlar-for-boost-a-field-tp3137239p3139814.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How many fields can SOLR handle?
Hi, I know I can add components to my request handler. In this situation the facets depend on their category. So if a user chooses the category TV:

Inch:
32 inch (5)
34 inch (3)
40 inch (1)

Resolution:
Full HD (5)
HD ready (2)

And when a user searches for the category Computer:

CPU:
Intel (12)
AMD (10)

GPU:
Ati (5)
Nvidia (2)

So I can't put it in my request handler as a default; every search can have different facets. Do you understand what I mean? -- View this message in context: http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp3033910p3139833.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Exception when using result grouping and sorting by geodist() with Solr 3.3
Did you add: fq={!geofilt} ??

On 7/3/11 11:14 AM, Thomas Heigl tho...@umschalt.com wrote:

Hello, I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr 3.3.0, as result grouping was the only reason for us to stay with the trunk. Everything worked like a charm except for one of our queries, where we group results by the owning user and sort by distance. A simplified example for my query (that still fails) looks like this:

q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc

The exception thrown is:

Caused by: org.apache.solr.common.SolrException: Unweighted use of sort geodist(latlon(user.location_p),48.20927,16.3728)
at org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.java:106)
at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.init(AbstractFirstPassGroupingCollector.java:81)
at org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.init(TermFirstPassGroupingCollector.java:56)
at org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:587)
at org.apache.solr.search.Grouping.execute(Grouping.java:256)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:237)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:140)
... 39 more

Any ideas how to fix this or work around this error for now? I'd really like to move from the trunk to the stable 3.3.0 release, and this is the only problem currently keeping me from doing so. Cheers, Thomas
faceting on field with two values
Hello, I have two fields, TOWN and POSTALCODE, and I want to concatenate them into one field to do faceting. My two fields are declared as follows:

<field name="TOWN" type="string" indexed="true" stored="true"/>
<field name="POSTALCODE" type="string" indexed="true" stored="true"/>

The concatenated field is declared as follows:

<field name="TOWN_POSTALCODE" type="string" indexed="true" stored="true" multiValued="true"/>

and I do the copyField as follows:

<copyField source="TOWN" dest="TOWN_POSTALCODE"/>
<copyField source="POSTALCODE" dest="TOWN_POSTALCODE"/>

When I facet on the TOWN_POSTALCODE field, I only get answers like:

<lst name="TOWN_POSTALCODE">
  <int name="62200">5</int>
  <int name="62280">5</int>
  <int name="boulogne sur mer">5</int>
  <int name="saint martin boulogne">5</int>
  ...

which means the faceting is done on the TOWN part or the POSTALCODE part of TOWN_POSTALCODE separately. But I would like to have answers like:

<lst name="TOWN_POSTALCODE">
  <int name="boulogne sur mer 62200">5</int>
  <int name="paris 75016">5</int>

Is this possible with Solr? Thanks, Elisabeth
Re: faceting on field with two values
The easiest way is to concat() the fields in SQL, and pass it to indexing as one field already merged together. Thanks,

On 7/5/11 1:12 AM, elisabeth benoit elisaelisael...@gmail.com wrote:

Hello, I have two fields, TOWN and POSTALCODE, and I want to concatenate them into one field to do faceting. My two fields are declared as follows:

<field name="TOWN" type="string" indexed="true" stored="true"/>
<field name="POSTALCODE" type="string" indexed="true" stored="true"/>

The concatenated field is declared as follows:

<field name="TOWN_POSTALCODE" type="string" indexed="true" stored="true" multiValued="true"/>

and I do the copyField as follows:

<copyField source="TOWN" dest="TOWN_POSTALCODE"/>
<copyField source="POSTALCODE" dest="TOWN_POSTALCODE"/>

When I facet on the TOWN_POSTALCODE field, I only get answers like:

<lst name="TOWN_POSTALCODE">
  <int name="62200">5</int>
  <int name="62280">5</int>
  <int name="boulogne sur mer">5</int>
  <int name="saint martin boulogne">5</int>
  ...

which means the faceting is done on the TOWN part or the POSTALCODE part of TOWN_POSTALCODE separately. But I would like to have answers like:

<lst name="TOWN_POSTALCODE">
  <int name="boulogne sur mer 62200">5</int>
  <int name="paris 75016">5</int>

Is this possible with Solr? Thanks, Elisabeth
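The idea above — merge before indexing, instead of letting copyField store two separate values in a multiValued field — can be sketched like this (field names come from the thread; the database read and the actual posting to Solr are omitted):

```python
# Rows as they might come back from the database
rows = [
    {"TOWN": "boulogne sur mer", "POSTALCODE": "62200"},
    {"TOWN": "paris", "POSTALCODE": "75016"},
]

# Build one combined facet value per document before sending it to Solr;
# this way each document carries exactly one "town postalcode" string.
docs = [
    {**row, "TOWN_POSTALCODE": f"{row['TOWN']} {row['POSTALCODE']}"}
    for row in rows
]
print(docs[0]["TOWN_POSTALCODE"])  # boulogne sur mer 62200
```

With the combined value indexed as a single string-typed field, faceting returns counts like "boulogne sur mer 62200", which is exactly what the original question asked for.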
Re: How many fields can SOLR handle?
This is taxonomy/index design... One way is to have a series of fields by category:

TV - tv_size, resolution
Computer - cpu, gpu

Solr can have as many fields as you need, and if you do not store them into the index they are ignored. So if a user picks TV, you pass these to Solr:

q=*:*&facet=true&facet.field=tv_size&facet.field=resolution

If a user picks Computer, you pass these to Solr:

q=*:*&facet=true&facet.field=cpu&facet.field=gpu

The other option is to return ALL of the fields faceted, but this is not recommended, since you would certainly have performance issues depending on the number of fields.

On 7/5/11 1:00 AM, roySolr royrutten1...@gmail.com wrote:

Hi, I know I can add components to my request handler. In this situation the facets depend on their category. So if a user chooses the category TV: Inch: 32 inch (5) 34 inch (3) 40 inch (1) Resolution: Full HD (5) HD ready (2) And when a user searches for the category Computer: CPU: Intel (12) AMD (10) GPU: Ati (5) Nvidia (2) So I can't put it in my request handler as a default; every search can have different facets. Do you understand what I mean? -- View this message in context: http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp3033910p3139833.html Sent from the Solr - User mailing list archive at Nabble.com.
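The per-category lookup described above can be sketched in a few lines of client-side code. The mapping dict is hypothetical (in the thread it lives in a database); only the shape of the resulting Solr parameters is the point:

```python
from urllib.parse import urlencode

# Hypothetical category -> facet-field mapping; the thread fetches
# this from a database instead of hard-coding it.
FACETS_BY_CATEGORY = {
    "TV": ["tv_size", "resolution"],
    "Computer": ["cpu", "gpu"],
}

def facet_params(category):
    """Build the Solr query parameters for one category's facets."""
    fields = FACETS_BY_CATEGORY.get(category, [])
    params = [("q", "*:*"), ("facet", "true")]
    # facet.field may repeat, hence a list of tuples rather than a dict
    params += [("facet.field", f) for f in fields]
    return urlencode(params)

print(facet_params("TV"))
print(facet_params("Computer"))
```

This is exactly the "first do a request to get the facet fields, then add them to the query" flow discussed later in the thread; Solr itself has no built-in category-to-facet mapping.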
Re: How many fields can SOLR handle?
Thanks Bill, that's exactly what I mean. But first I do a request to get the right facet fields for a category. So when a user searches for TV, I do a request to a db to get tv_size and resolution. The next step is to add this to my query, like this: facet.field=tv_size&facet.field=resolution. I thought maybe it was possible to add the facet fields automatically to my query (based on the category). I understand this isn't possible and I first need to do a request to get the facet.fields. -- View this message in context: http://lucene.472066.n3.nabble.com/How-many-fields-can-SOLR-handle-tp3033910p3139921.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: faceting on field with two values
Are you using the DIH?? You can use the transformer to concat the two fields -- View this message in context: http://lucene.472066.n3.nabble.com/faceting-on-field-with-two-values-tp3139870p3139934.html Sent from the Solr - User mailing list archive at Nabble.com.
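For the DIH route suggested here, a TemplateTransformer can build the combined value at import time. A hedged sketch of a data-config.xml entity — the entity name, table, and SQL are made up for illustration; only the transformer pattern is the point:

```xml
<entity name="place" transformer="TemplateTransformer"
        query="select TOWN, POSTALCODE from places">
  <!-- TOWN_POSTALCODE is synthesized from the two source columns;
       ${place.X} refers to column X of this entity's current row -->
  <field column="TOWN_POSTALCODE" template="${place.TOWN} ${place.POSTALCODE}"/>
</entity>
```

This keeps the concatenation inside Solr's import config, so neither the SQL nor the schema's copyField rules need to change.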
Different Indexing formats for Older Lucene versions and Solr?
Hi All, a quick question about the index files of Lucene and Solr. Until recently I had an older version of Lucene (with UIMA), and an index built with it. I shifted to Solr (3.3, with UIMA) and tried to use the same index. While everything else seems fine, Solr does not seem to recognize the index format; it still shows zero documents. I noticed from an older Solr application I used that the index directory contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and segments_2v, segments.gen files. But my Lucene application's index contains only .cfx, .cfs and segments files. Is there a way to use my old Lucene index in the new Solr application? Sowmya. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: Different Indexing formats for Older Lucene versions and Solr?
Which Lucene version were you using? Regards, Tommaso 2011/7/5 Sowmya V.B. vbsow...@gmail.com Hi All A quick doubt on the index files of Lucene and Solr. I had an older version of lucene (with UIMA) till recently, and had an index built thus. I shifted to Solr (3.3, with UIMA)..and tried to use the same index. While everything else seems fine, the Solr does not seem to recognize the index format. It still shows zero documents. I noticed from an older Solr application I used, that index directory contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and segments_2v, segments_gen files. But, my lucene application contains only .cfx, .cfs and segments files. Is there a way to use my old lucene index in the new Solr application? Sowmya. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: Different Indexing formats for Older Lucene versions and Solr?
I was using 2.4 or 2.5. It was a 2yr old Lucene version. On Tue, Jul 5, 2011 at 10:07 AM, Tommaso Teofili tommaso.teof...@gmail.comwrote: Which Lucene version were you using? Regards, Tommaso 2011/7/5 Sowmya V.B. vbsow...@gmail.com Hi All A quick doubt on the index files of Lucene and Solr. I had an older version of lucene (with UIMA) till recently, and had an index built thus. I shifted to Solr (3.3, with UIMA)..and tried to use the same index. While everything else seems fine, the Solr does not seem to recognize the index format. It still shows zero documents. I noticed from an older Solr application I used, that index directory contained files like .fdt, .fdx, .fnm, .nrm, .prx, .tii, .tis and segments_2v, segments_gen files. But, my lucene application contains only .cfx, .cfs and segments files. Is there a way to use my old lucene index in the new Solr application? Sowmya. -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com -- Sowmya V.B. Losing optimism is blasphemy! http://vbsowmya.wordpress.com
Re: Exception when using result grouping and sorting by geodist() with Solr 3.3
I'm pretty sure my original query contained a distance filter as well. Do I absolutely need to filter by distance in order to sort my results by it? I'll write another unit test including a distance filter as soon as I get a chance. Cheers, Thomas

On Tue, Jul 5, 2011 at 9:04 AM, Bill Bell billnb...@gmail.com wrote:

Did you add: fq={!geofilt} ??

On 7/3/11 11:14 AM, Thomas Heigl tho...@umschalt.com wrote:

Hello, I just tried up(down?)grading our current Solr 4.0 trunk setup to Solr 3.3.0, as result grouping was the only reason for us to stay with the trunk. Everything worked like a charm except for one of our queries, where we group results by the owning user and sort by distance. A simplified example for my query (that still fails) looks like this:

q=*:*&group=true&group.field=user.uniqueId_s&group.main=true&group.format=grouped&sfield=user.location_p&pt=48.20927,16.3728&sort=geodist() asc

The exception thrown is:

Caused by: org.apache.solr.common.SolrException: Unweighted use of sort geodist(latlon(user.location_p),48.20927,16.3728)
at org.apache.solr.search.function.ValueSource$1.newComparator(ValueSource.java:106)
at org.apache.lucene.search.SortField.getComparator(SortField.java:413)
at org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.init(AbstractFirstPassGroupingCollector.java:81)
at org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.init(TermFirstPassGroupingCollector.java:56)
at org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:587)
at org.apache.solr.search.Grouping.execute(Grouping.java:256)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:237)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:140)
... 39 more

Any ideas how to fix this or work around this error for now? I'd really like to move from the trunk to the stable 3.3.0 release, and this is the only problem currently keeping me from doing so. Cheers, Thomas
Re: faceting on field with two values
Hmmm... that sounds interesting and takes me somewhere else. We are actually reindexing the data every night, but the whole process is done by Talend (reading and formatting data from a database), and this makes me wonder whether we should use Solr instead to do this. In this case (concatenating two fields) the change is quite heavy: we have to change the Talend process, pollute the XML files we use to index data with redundant fields, then modify the Solr process. So do you think the DIH (which I just discovered) would be appropriate to do the whole process (read a database, read fields from XML contained in some of the database columns, add information from a CSV file)? From what I just read about the DIH it seems so, but I'm still very confused about this DIH thing. Thanks again, Elisabeth

2011/7/5 roySolr royrutten1...@gmail.com

Are you using the DIH?? You can use the transformer to concat the two fields -- View this message in context: http://lucene.472066.n3.nabble.com/faceting-on-field-with-two-values-tp3139870p3139934.html Sent from the Solr - User mailing list archive at Nabble.com.
OOM at solr master node while updating document
Is there any memory leak when I update the index at the master node? Here is the stack trace.

o.a.solr.servlet.SolrDispatchFilter - java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.handler.ReplicationHandler$FileStream.write(ReplicationHandler.java:1000)
at org.apache.solr.handler.ReplicationHandler$3.write(ReplicationHandler.java:887)
at org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:322)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:254)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.jboss.web.tomcat.filters.ReplyHeaderFilter.doFilter(ReplyHeaderFilter.java:96)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:230)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:179)
at org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:84)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:157)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:262)
at org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:425)
at org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(AjpAprProtocol.java:378)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1508)
at java.lang.Thread.run(Thread.java:619)
searching a subset of SOLR index
Hi, let's say I have 10^10 documents in an index, with a unique id (the document id) assigned to each of them, from 1 to 10^10. Now I want to search for a particular query string in a subset of these documents, say document ids 100 to 1000. The question here is: will Solr be able to search just in this set of documents rather than the entire index? If yes, what should the query be to limit the search to this subset? Regards, JAME VAALET Software Developer EXT :8108 Capital IQ
Re: searching a subset of SOLR index
Range query On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote: Hi, Let say, I have got 10^10 documents in an index with unique id being document id which is assigned to each of those from 1 to 10^10 . Now I want to search a particular query string in a subset of these documents say ( document id 100 to 1000). The question here is.. will SOLR able to search just in this set of documents rather than the entire index ? if yes what should be query to limit search into this subset ? Regards, JAME VAALET Software Developer EXT :8108 Capital IQ
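A hedged sketch of what "range query" means here, assuming the unique key field is called id and uses a numeric (sortable) field type — with a plain string type the range is compared lexicographically, so "999" would sort after "1000":

```
q=your query terms&fq=id:[100 TO 1000]
```

Putting the range in fq rather than in q keeps relevance scoring unaffected and lets Solr cache the filter for reuse across queries.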
RE: searching a subset of SOLR index
Thanks. But does this range query just limit the universe logically, or does it have some mechanism to limit it physically as well? Do we gain any time by using the range query? Regards, JAME VAALET

-----Original Message----- From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi Kant Sent: Tuesday, July 05, 2011 2:26 PM To: solr-user@lucene.apache.org Subject: Re: searching a subset of SOLR index

Range query

On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote: Hi, let's say I have 10^10 documents in an index, with a unique id (the document id) assigned to each of them, from 1 to 10^10. Now I want to search for a particular query string in a subset of these documents, say document ids 100 to 1000. The question here is: will Solr be able to search just in this set of documents rather than the entire index? If yes, what should the query be to limit the search to this subset? Regards, JAME VAALET Software Developer EXT :8108 Capital IQ
Re: configure dismax requesthandlar for boost a field
On Tue, Jul 5, 2011 at 08:46, Romi romijain3...@gmail.com wrote: will merely adding fl=score make difference in search results, i mean will i get desired results now??? The fl parameter stands for field list and allows you to configure in a request which result fields should be returned. If you try to tweak the boosts in order to change your result order, it's wise to add the calculated score to the output by setting something like fl=score,*. This reminds me of another important question: Are you sorting the result by score? Because if not, your changes to the boosts/score won't ever have an effect on the ordering. http://wiki.apache.org/solr/CommonQueryParameters Marian
Re: Spellchecker in zero-hit search result
On Mon, Jul 4, 2011 at 17:19, Juan Grande juan.gra...@gmail.com wrote: Hi Marian, I guess that your problem isn't related to the number of results, but to the component's configuration. The configuration that you show is meant to set up an autocomplete component that will suggest terms from an incomplete user input (something similar to what google does while you're typing in the search box), see http://wiki.apache.org/solr/Suggester. That's why your suggestions to place are places and placed, all sharing the place prefix. But when you search for placw, the component doesn't return any suggestion, because in your index no term begins with placw. You can learn how to correctly configure a spellchecker here: http://wiki.apache.org/solr/SpellCheckComponent. Also, I'd recommend to take a look at the example's solrconfig, because it provides an example spellchecker configuration.

Juan, thanks for the information! I read through that page for quite a while before doing my tests, but it seems I had a different mental model, so all that reading might not have been worth much. I thought that the SpellCheckComponent would be able to fetch index terms which are similar to the query term. The use case for that, mainly (but not only) in the case of a zero-hit search, would be to display the famous "Did you mean ...?" hint. So I'm going back to the docs and to the example. :) Later, Marian
Re: faceting on field with two values
On Tue, Jul 5, 2011 at 10:21, elisabeth benoit elisaelisael...@gmail.com wrote: ... so do you think the dih (which I just discovered) would be appropriate to do the whole process (read a database, read fields from xml contained in some of the database columns, add informations from csv file)??? from what I just read about dih, it seems so, but I'm still very confused about this dih thing. As far as I can tell, the DataImportHandler is very useful if you want to get data (only) from a database directly to Solr, with only slight manipulation, e.g. concatenations. For that, it's much more convenient than the path via scripts to generate XML. It sounds like you are doing more than that in your importers.
Re: OOM at solr master node while updating document
2011/7/5 Chengyang atreey...@163.com: Is there any memory leak when I updating the index at the master node? Here is the stack trace. o.a.solr.servlet.SolrDispatchFilter - java.lang.OutOfMemoryError: Java heap space

You don't need a memory leak to get an OOM error in Java. It might just happen that the amount of RAM allocated to the virtual machine is used up. If you are running Solr as in the example, via Jetty on the command line, try

java -server -jar start.jar

Or try the -Xmx parameter, e.g.

java -Xmx1024M -jar start.jar

If you are using Tomcat or something else, you might want to look into the docs on how to deal with the memory limit. Marian
Re: what is the diff between katta and solrcloud?
Why does Katta store the index on HDFS? Any advantages? -- View this message in context: http://lucene.472066.n3.nabble.com/what-is-the-diff-between-katta-and-solrcloud-tp2275554p3139983.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: searching a subset of SOLR index
The limit will always be logical if you have all documents in the same index. But filters are very efficient when working with a subset of your index, especially if you reuse the same filter for many queries, since there is a cache. If your subsets are always the same, maybe you could use shards. But we would need to know more about what you intend to do to point to an adequate solution. Pierre

-----Original Message----- From: Jame Vaalet [mailto:jvaa...@capitaliq.com] Sent: Tuesday, July 05, 2011 11:10 AM To: solr-user@lucene.apache.org Subject: RE: searching a subset of SOLR index

Thanks. But does this range query just limit the universe logically or does it have any mechanism to limit this physically as well? Do we leverage the time factor by using the range query?
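The cache Pierre mentions is the filterCache configured in solrconfig.xml: each distinct fq value is cached as a document set and reused by later queries with the same filter. An illustrative (not prescriptive) configuration sketch — the sizes here are the kind of values shipped in the example config, not a recommendation:

```xml
<filterCache class="solr.FastLRUCache"
             size="512"
             initialSize="512"
             autowarmCount="128"/>
```

This is why repeating the exact same fq string across queries is cheap, while generating a slightly different filter per request defeats the cache.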
Re: Spellchecker in zero-hit search result
On Tue, Jul 5, 2011 at 11:24, Marian Steinbach mar...@sendung.de wrote: On Mon, Jul 4, 2011 at 17:19, Juan Grande juan.gra...@gmail.com wrote: ... You can learn how to correctly configure a spellchecker here: http://wiki.apache.org/solr/SpellCheckComponent. Also, I'd recommend to take a look at the example's solrconfig, because it provides an example spellchecker configuration.

I found the problem. My suggest component had the line

<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>

which I probably copied without knowing what I did. I removed it to use the default IndexBasedSpellChecker. Now I get my suggestions on zero-hit searches as well. Thanks again! Marian
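For reference, a minimal IndexBasedSpellChecker setup of the kind described here might look like the sketch below. The field name spell and the index directory are assumptions; the key point is that leaving out the classname entry falls back to the default IndexBasedSpellChecker instead of the Suggester:

```xml
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <!-- no classname entry: defaults to IndexBasedSpellChecker -->
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>
```

The spell field would typically be a lightly analyzed copyField target so the spellcheck dictionary sees clean terms.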
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
nice... where? I'm trying to figure out 2 things:

1) How to create an analyzer that corresponds to the one in the schema.xml:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
  <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>

2) I'd like to see the code that creates it by reading it from schema.xml.

On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma markus.jel...@openindex.io wrote:

No. SolrJ only builds input docs from NutchDocument objects. Solr will do analysis. The integration is analogous to XML post of Solr documents.

On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote: Hello, I'm trying to understand better the Nutch and Solr integration. My understanding is that documents are added to the Solr index from SolrWriter's write(NutchDocument doc) method. But does it make any use of the WhitespaceTokenizerFactory?

-- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

-- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: full text searching in cloud for minor enterprises
Look at searchblox On Monday, July 4, 2011, Li Li fancye...@gmail.com wrote: hi all, I want to provide full text searching for some small websites. It seems cloud computing is popular now. And it will save costs because it don't need employ engineer to maintain the machine. For now, there are many services such as amazon s3, google app engine, ms azure etc. I am not familiar with cloud computing. Anyone give me a direction or some advice? thanks - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org -- Joe Scanlon jscan...@element115.net Mobile: 603 459 3242 Office: 312 445 0018
RE: searching a subset of SOLR index
I have got two applications 1. website The website will enable any user to search the document repository , and the set they search on is known as website presentable 2. windows service The windows service will search on all the documents in the repository for fixed set of key words and store the found result in database.this set is universal set of documents in the doc repository including the website presentable. Website is a high prioritized app which should work smoothly without any interference , where as windows service should run all day long continuously without break to save result from incoming docs. The problem here is website set is predefined and I don't want the windows service request to SOLR to slow down website request. Suppose am segregating the website presentable docs index into a particular core and rest of them into different core will it solve the problem ? I have also read about multiple ports for listening request from different apps , can this be used. Regards, JAME VAALET -Original Message- From: Pierre GOSSE [mailto:pierre.go...@arisem.com] Sent: Tuesday, July 05, 2011 3:52 PM To: solr-user@lucene.apache.org Subject: RE: searching a subset of SOLR index The limit will always be logical if you have all documents in the same index. But filters are very efficient when working with subset of your index, especially if you reuse the same filter for many queries since there is a cache. If your subsets are always the same subsets, maybe your could use shards. But we would need to know more about what you intend to do, to point to an adequate solution. Pierre -Message d'origine- De : Jame Vaalet [mailto:jvaa...@capitaliq.com] Envoyé : mardi 5 juillet 2011 11:10 À : solr-user@lucene.apache.org Objet : RE: searching a subset of SOLR index Thanks. But does this range query just limit the universe logically or does it have any mechanism to limit this physically as well .Do we leverage time factor by using the range query ? 
Regards, JAME VAALET -Original Message- From: shashi@gmail.com [mailto:shashi@gmail.com] On Behalf Of Shashi Kant Sent: Tuesday, July 05, 2011 2:26 PM To: solr-user@lucene.apache.org Subject: Re: searching a subset of SOLR index Range query On Tue, Jul 5, 2011 at 4:37 AM, Jame Vaalet jvaa...@capitaliq.com wrote: Hi, Let's say I have got 10^10 documents in an index, with the unique id being a document id assigned to each of them from 1 to 10^10. Now I want to search for a particular query string in a subset of these documents, say document ids 100 to 1000. The question here is: will Solr be able to search in just this set of documents rather than the entire index? If yes, what should the query be to limit the search to this subset? Regards, JAME VAALET Software Developer EXT :8108 Capital IQ
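To make the "range query" suggestion above concrete: restricting a search to a document-id subset is normally done with a filter query (fq) rather than by rewriting the main query, since filter queries are cached and reused across requests. A minimal sketch of building such a request URL - the field name id and the server address are assumptions for illustration, not from the thread:

```python
from urllib.parse import urlencode

# Restrict the search to documents whose unique key falls in [100, 1000]
# by attaching a filter query; q carries the user's actual search terms.
params = {
    "q": "some query string",
    "fq": "id:[100 TO 1000]",  # hypothetical unique-key field name
    "rows": 10,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

Because fq results are cached, repeating the same subset filter across many queries is cheap, which matches Pierre's point about filters being efficient for fixed subsets.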
Re: @field for child object
Not yet - I've played around with support for it in this issue in the past, though: https://issues.apache.org/jira/browse/SOLR-1945 On Jul 4, 2011, at 6:04 AM, Kiwi de coder wrote: hi, I am wondering, does the solrj @Field annotation support an embedded child object? e.g. class A { @Field String somefield; @Embedded B b; } regards, kiwi - Mark Miller lucidimagination.com
Re: Feed index with analyzer output
Ok, the very short question is: Is there a way to submit the analyzer response so that Solr already knows what to do with that response? (That is, which fields are to be treated as payloads, which are tokens, etc.) Chris Hostetter-3 wrote: can you explain a bit more about what your goal is here? what info are you planning on extracting? what do you intend to change between the info you get back in the first request and the info you want to send in the second request? I plan to add some payloads to some terms between request#1 and request#2. Chris Hostetter-3 wrote: your analyzers and whatnot for request#1 would be exactly what you're used to, but for request#2 you'd need to specify an analyzer that would let you specify, in the field value, the details about the term and position, and offsets, and payloads and whatnot ... the DelimitedPayloadTokenFilterFactory / DelimitedPayloadTokenFilter can help with some of that, but not all -- you'd either need your own custom analyzer or custom FieldType or something depending on the specific changes you want to make. Frankly though i really believe you are going about this backwards -- if you want to manipulate the TokenStream after analysis but before indexing, then why not implement the custom logic that you want in a TokenFilter and use it as the last TokenFilterFactory you have for your analyzer? Yeah, I thought about that. I really wanted to know whether there was an already-implemented way to do it, to avoid reinventing the wheel. It would be cool if I were able to send info to Solr formatted in a way like I imagined in my last mail, so that a call to any Tokenizer or TokenFilter wouldn't be necessary. It would have been like using an empty analyzer while still retaining the various token information. Thank you! -- View this message in context: http://lucene.472066.n3.nabble.com/Feed-index-with-analyzer-output-tp3131771p3140460.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: searching a subset of SOLR index
From what you tell us, I guess a separate index for the website docs would be best. If you fear that requests from the Windows service would cripple your website's performance, why not have a totally separate index on another server, and have your website documents indexed in both indexes? Pierre
Re: configure dismax requesthandlar for boost a field
I got the point that to boost search results I have to sort by score. But in solrconfig, for the dismax request handler, I use *<str name="qf">text^0.5 name^1.0 description^1.5</str>* because I want docs that have the query string in their description field to come higher in the search results. But what I am getting is a first doc in the search results that does not have the query string in its description field. - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/configure-dismax-requesthandlar-for-boost-a-field-tp3137239p3140501.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is solrj 3.3.0 ready for field collapsing?
Let's see the results of adding debugQuery=on to your URL. Are you getting any documents back at all? If not, then your query isn't getting any documents to group. You haven't told us much about what you're trying to do, you might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Jul 4, 2011 11:55 AM, Per Newgro per.new...@gmx.ch wrote:
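For reference, field collapsing in Solr 3.3 is exposed through the group family of parameters, which SolrJ can pass as ordinary query parameters. A minimal sketch of the equivalent HTTP request, with debugQuery=on added as suggested above - the collapse field manu and the server address are placeholders, not from the thread:

```python
from urllib.parse import urlencode

# Field-collapsing (grouping) request expressed as plain HTTP parameters;
# SolrJ's SolrQuery.set("group", "true") etc. yields the same query string.
params = {
    "q": "*:*",
    "group": "true",
    "group.field": "manu",  # hypothetical field to collapse on
    "debugQuery": "on",     # inspect why documents do (or don't) match
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```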
Re: configure dismax requesthandlar for boost a field
I got the point that to boost search results I have to sort by score. But in solrconfig, for the dismax request handler, I use *<str name="qf">text^0.5 name^1.0 description^1.5</str>* because I want docs that have the query string in their description field to come higher in the search results. But what I am getting is a first doc in the search results that does not have the query string in its description field. Increase the boost factor on description until you get the ordering you want - say, make it 100. Adding &debugQuery=on will show how the actual score is calculated.
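A sketch of what trying that advice at query time could look like: dismax lets you override the handler's qf per request, so you can experiment with a much larger description boost and watch the score breakdown with debugQuery before committing the value to solrconfig.xml. The query string and the boost of 100 are illustrative assumptions:

```python
from urllib.parse import urlencode

# Override the configured qf at request time to weight description far
# more heavily, and turn on debugQuery to inspect the score calculation.
params = {
    "q": "diamond ring",  # hypothetical query string
    "defType": "dismax",
    "qf": "text^0.5 name^1.0 description^100",
    "debugQuery": "on",
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```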
Re: How to boost a querystring at query time
When the query string is q=gold^2.0 ring (boost gold) and qt=standard I got results for gold ring, but when qt=dismax I got no results - why? Please explain. With defType=dismax, q=gold^2.0 ring (boost gold) would only match a document that contains exactly gold^2.0 ring (boost gold) in it. dismax is designed to work with simple keyword queries. You can use only three special characters: + - ". The rest of the characters (: [ ] ^ ~ etc.) don't work, i.e. they are escaped. Please see http://www.lucidimagination.com/blog/2010/05/23/whats-a-dismax/
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
I suspect the following should do (1). I'm just not sure about file references, as in stopInit.put("words", "stopwords.txt"). (2) should clarify.

1)

class SchemaAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        HashMap<String, String> stopInit = new HashMap<String, String>();
        stopInit.put("words", "stopwords.txt");
        stopInit.put("ignoreCase", Boolean.TRUE.toString());
        StopFilterFactory stopFilterFactory = new StopFilterFactory();
        stopFilterFactory.init(stopInit);

        final HashMap<String, String> wordDelimInit = new HashMap<String, String>();
        wordDelimInit.put("generateWordParts", "1");
        wordDelimInit.put("generateNumberParts", "1");
        wordDelimInit.put("catenateWords", "1");
        wordDelimInit.put("catenateNumbers", "1");
        wordDelimInit.put("catenateAll", "0");
        wordDelimInit.put("splitOnCaseChange", "1");
        WordDelimiterFilterFactory wordDelimiterFilterFactory = new WordDelimiterFilterFactory();
        wordDelimiterFilterFactory.init(wordDelimInit);

        HashMap<String, String> porterInit = new HashMap<String, String>();
        porterInit.put("protected", "protwords.txt");
        EnglishPorterFilterFactory englishPorterFilterFactory = new EnglishPorterFilterFactory();
        englishPorterFilterFactory.init(porterInit);

        return new RemoveDuplicatesTokenFilter(
            englishPorterFilterFactory.create(
                new LowerCaseFilter(
                    wordDelimiterFilterFactory.create(
                        stopFilterFactory.create(
                            new WhitespaceTokenizer(reader))))));
    }
}

On Tue, Jul 5, 2011 at 1:00 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: nice...where? I'm trying to figure out 2 things: 1) How to create an analyzer that corresponds to the one in the schema.xml:

<analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
</analyzer>

2) I'd like to see the code that creates it, reading it from schema.xml. On Tue, Jul 5, 2011 at 12:33 PM, Markus Jelsma markus.jel...@openindex.io wrote: No. 
SolrJ only builds input docs from NutchDocument objects. Solr will do analysis. The integration is analogous to XML post of Solr documents. On Tuesday 05 July 2011 12:28:21 Gabriele Kahlout wrote: Hello, I'm trying to understand better the Nutch and Solr integration. My understanding is that Documents are added to the Solr index from SolrWriter's write(NutchDocument doc) method. But does it make any use of the WhitespaceTokenizerFactory? -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350 -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
Not yet an answer to 2), but this is where and how Solr initializes the Analyzer defined in the schema.xml:

// org.apache.solr.schema.IndexSchema

// Load the Tokenizer
// Although an analyzer only allows a single Tokenizer, we load a list to make sure
// the configuration is ok
final ArrayList<TokenizerFactory> tokenizers = new ArrayList<TokenizerFactory>(1);
AbstractPluginLoader<TokenizerFactory> tokenizerLoader =
    new AbstractPluginLoader<TokenizerFactory>("[schema.xml] analyzer/tokenizer", false, false) {
        @Override
        protected void init(TokenizerFactory plugin, Node node) throws Exception {
            if (!tokenizers.isEmpty()) {
                throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                    "The schema defines multiple tokenizers for: " + node);
            }
            final Map<String, String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
            // copy the luceneMatchVersion from config, if not set
            if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
                params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
            plugin.init(params);
            tokenizers.add(plugin);
        }

        @Override
        protected TokenizerFactory register(String name, TokenizerFactory plugin) throws Exception {
            return null; // used for map registration
        }
    };
tokenizerLoader.load(loader, (NodeList) xpath.evaluate("./tokenizer", node, XPathConstants.NODESET));

// Make sure something was loaded
if (tokenizers.isEmpty()) {
    throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
        "analyzer without class or tokenizer & filter list");
}

// Load the Filters
final ArrayList<TokenFilterFactory> filters = new ArrayList<TokenFilterFactory>();
AbstractPluginLoader<TokenFilterFactory> filterLoader =
    new AbstractPluginLoader<TokenFilterFactory>("[schema.xml] analyzer/filter", false, false) {
        @Override
        protected void init(TokenFilterFactory plugin, Node node) throws Exception {
            if (plugin != null) {
                final Map<String, String> params = DOMUtil.toMapExcept(node.getAttributes(), "class");
                // copy the luceneMatchVersion from config, if not set
                if (!params.containsKey(LUCENE_MATCH_VERSION_PARAM))
                    params.put(LUCENE_MATCH_VERSION_PARAM, solrConfig.luceneMatchVersion.toString());
                plugin.init(params);
                filters.add(plugin);
            }
        }

        @Override
        protected TokenFilterFactory register(String name, TokenFilterFactory plugin) throws Exception {
            return null; // used for map registration
        }
    };
filterLoader.load(loader, (NodeList) xpath.evaluate("./filter", node, XPathConstants.NODESET));

return new TokenizerChain(
    charFilters.toArray(new CharFilterFactory[charFilters.size()]),
    tokenizers.get(0),
    filters.toArray(new TokenFilterFactory[filters.size()]));
RE: searching a subset of SOLR index
But in case the website docs contribute around 50% of the entire doc set, why recreate the indexes? Don't you think that's redundant? Can two web apps (Solr instances) share a single index file and search on it without interfering with each other? Regards, JAME VAALET Software Developer EXT :8108 Capital IQ
Re: How to boost a querystring at query time
Then what should I do to get the required result? I.e., if I want to boost gold, which query type should I use? - Thanks Regards Romi -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-boost-a-querystring-at-query-time-tp3139800p3140703.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Reading data from Solr MoreLikeThis
Hi Juan, Thank you very much. Your code worked pretty awesomely and was a real help. Great start of the day... :) -- View this message in context: http://lucene.472066.n3.nabble.com/Reading-data-from-Solr-MoreLikeThis-tp3130184p3140715.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: searching a subset of SOLR index
I wouldn't share the same index across two Solr webapps - they could step on each other's toes. In this scenario, I think having two Solr instances replicating from the same master is the way to go, to allow you to scale the load from each application separately. Erik
Re: How to boost a querystring at query time
Then what should I do to get the required result? I.e., if I want to boost gold, which query type should I use? If you want to boost the keyword 'gold', you can use the bq parameter: defType=dismax&bq=someField:gold^100 See the other parameters: http://wiki.apache.org/solr/DisMaxQParserPlugin#bq_.28Boost_Query.29
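A sketch of the bq suggestion above as a complete request: because dismax escapes ^ inside q, the boost moves out of the main query and into a boost query. The field name "name" below is a hypothetical stand-in for the someField placeholder used in the thread:

```python
from urllib.parse import urlencode

# dismax escapes ^ inside q, so boost the term through bq instead:
# q stays a plain keyword query, bq adds score to docs matching name:gold.
params = {
    "q": "gold ring",
    "defType": "dismax",
    "bq": "name:gold^100",  # hypothetical field; thread uses someField
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```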
RE: searching a subset of SOLR index
It is redundancy. You have to balance the cost of that redundancy against the performance cost of having your web index queried by your Windows service. If your Windows service is not too aggressive in its requests, go for shards. Pierre
Nightly builds
The solr download link does not point to or mention nightly builds. Are they out there?
Re: Does Nutch make any use of solr.WhitespaceTokenizerFactory defined in schema.xml?
The answer to 2) is: new IndexSchema(solrConf, schema).getAnalyzer();
Apache Nutch and Solr Integration
Hello Friends, I am a newbie to Solr and am trying to integrate Apache Nutch 1.3 and Solr 3.2. I did the steps explained in the following two URLs: http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html I downloaded both packages; however, I am getting the error (*solrUrl is not set, indexing will be skipped..*) when I try to crawl using Cygwin. Can anyone please help me fix this issue? Otherwise, any other website covering Apache Nutch and Solr integration would be greatly helpful. Thanks Regards, Serenity
Re: Nightly builds
On 07/05/2011 04:08 PM, Benson Margulies wrote: The solr download link does not point to or mention nightly builds. Are they out there? http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuildsl=1 -- Auther of the book Plone 3 Multimedia - http://amzn.to/dtrp0C Tom Gross email.@toms-projekte.de skype.tom_gross web.http://toms-projekte.de blog...http://blog.toms-projekte.de
Re: Nightly builds
The reason for the email is not that I can't find them, but because the project, I claim, should be advertising them more prominently on the web site than buried in a wiki. Where I come from, an lmgtfy link is rather hostile. Oh, and you might want to fix the spelling of 'Author' in your own signature. On Tue, Jul 5, 2011 at 10:19 AM, Tom Gross itconse...@gmail.com wrote: On 07/05/2011 04:08 PM, Benson Margulies wrote: The solr download link does not point to or mention nightly builds. Are they out there? http://lmgtfy.com/?q=%2Bsolr+%2Bnightlybuildsl=1 -- Auther of the book Plone 3 Multimedia - http://amzn.to/dtrp0C Tom Gross email.@toms-projekte.de skype.tom_gross web.http://toms-projekte.de blog...http://blog.toms-projekte.de
Re: MergerFactor and MaxMergerDocs effecting num of segments created
On 7/4/2011 12:51 AM, Romi wrote: Shawn when i reindex data using full-import i got: *_0.fdt 3310 _0.fdx 23 _0.frq 857 _0.nrm 31 _0.prx 1748 _0.tis 350 _1.fdt 3310 _1.fdx 23 _1.fnm 1 _1.frq 857 _1.nrm 31 _1.prx 1748 _1.tii 5 _1.tis 350 segments.gen1 segments_3 1* Where all _1 marked as archived(A) And when i run again full import(for testing ) i got _1 and 2_ files where all 2_ marked as archive. What does it mean. and the problem i am not getting is while i am doing full import which deletes the old indexes and creates new than why i m getting the old one again. By mentioning the Archive bit, it sounds like you are running on Windows. I've only run it on Linux, but I understand from reading messages on this list that there are a lot of problems on Windows with deleting old files whenever you do anything that results in old segments going away -- reindex, optimize, replication, normal segment merging, etc. The current solr version is 3.3, previous versions are 3.2, 3.1, then 1.4.1. Others will have to comment about whether things have improved in more recent releases. The archive bit is simply a DOS/Windows attribute that says this file needs to be backed up. When you create or modify a file in a normal way, it is turned on. Normally the only thing that turns that bit off is backup software, but Solr might be programmed to clear it on files that are no longer needed, in case the delete fails, so there's a way to detect that they should not be backed up. I don't know if this is right, it's just speculation. Thanks, Shawn
RE: what s the optimum size of SOLR indexes
Hello, On Mon, 2011-07-04 at 13:51 +0200, Jame Vaalet wrote: What would be the maximum size of a single SOLR index file for resulting in optimum search time ? How do you define optimum? Do you want the fastest possible response time at any cost, or do you have a specific response time goal? Can you give us more details on your use case? What kind of load are you expecting? What kind of queries do you need to support? Some of the trade-offs depend on whether you are CPU bound or I/O bound. Assuming a fairly large index, if you *absolutely need* the fastest possible search response time and you can *afford the hardware*, you probably want to shard your index and size your indexes so they can all fit in memory (and do some work to make sure the index data is always in memory). If you can't afford that much memory but still need very fast response times, you might want to size your indexes so they all fit on SSDs. As an example of a use case on the opposite side of the spectrum, here at HathiTrust, we have a very low number of queries per second and we are running an index that totals 6 TB in size with shards of about 500GB and average response times of 200ms (but 99th percentile times of about 2 seconds). Tom Burton-West http://www.hathitrust.org/blogs/large-scale-search
Hit Rate
Hello all, Is there a good way to get the hit count of a search? Example query: textField:solr AND documentId:1000. Say the document with id = 1000 contains "solr" 13 times. Is there any way to extract that number [13] in the response? I know we can return the score, which is loosely related to hit counts via tf-idf, but for this case I need the actual hit counts. I believe you can get this information from the logs, but that is less useful when the use case is on the presentation layer. I tried faceting on the query, but that seems to return the number of documents the query matches rather than the hit count: http://localhost:8080/solr/ExampleCore/select/?q=textField%3Asolr+AND+documentId%3A1246727&version=2.2&start=0&rows=10&indent=on&facet=true&facet.query=textField:solr I was thinking that highlighting essentially returns the hit count if you request an unlimited number of snippets, but I imagine there must be a more elegant solution. Thanks in advance, Briggs
Re: Feed index with analyzer output
On 7/5/11 1:37 PM, Lox wrote: Ok, the very short question is: Is there a way to submit the analyzer response so that solr already knows what to do with that response? (that is, which field are to be treated as payloads, which are tokens, etc...) Check this issue: http://issues.apache.org/jira/browse/SOLR-1535 -- Best regards, Andrzej Bialecki ___. ___ ___ ___ _ _ __ [__ || __|__/|__||\/| Information Retrieval, Semantic Web ___|||__|| \| || | Embedded Unix, System Integration http://www.sigram.com Contact: info at sigram dot com
Re: Hit Rate
Is there a good way to get the hit count of a search? Example query: textField:solr AND documentId:1000 Say document with Id = 1000 has solr 13 times in the document. Any way to extract that number [13] in the response? Looks like you are looking for term frequency info: Two separate solutions: http://wiki.apache.org/solr/TermVectorComponent http://wiki.apache.org/solr/FunctionQuery#tf
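For the TermVectorComponent route from the wiki page above, the field needs term vectors enabled in schema.xml and the component hooked into a request handler. A rough sketch (field type and handler name are illustrative, not from the original thread):

```xml
<!-- schema.xml: store term vectors for the field you want frequencies from -->
<field name="textField" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>

<!-- solrconfig.xml: expose the TermVectorComponent on a handler -->
<searchComponent name="tvComponent"
                 class="org.apache.solr.handler.component.TermVectorComponent"/>
<requestHandler name="/tvrh"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>
```

A request such as /tvrh?q=documentId:1000&tv.tf=true&tv.fl=textField should then return per-term frequencies for the matched document, including the count for "solr".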
Re: Apache Nutch and Solr Integration
Can you let me know when and where you were getting the error? A screen-shot will be helpful. On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston serenity.kenings...@gmail.com wrote: Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained in the following two URL's : http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html I downloaded both the softwares, however, I am getting error (*solrUrl is not set, indexing will be skipped..*) when I am trying to crawl using Cygwin. Can anyone please help me out to fix this issue ? Else any other website suggesting for Apache Nutch and Solr integration would be greatly helpful. Thanks Regards, Serenity
primary key made of multiple fields from multiple source tables
Hello all I'm using Solr 3.2 and am trying to index a document whose primary key is built from multiple columns selected from an Oracle DB. I'm getting the following error: java.lang.IllegalArgumentException: deltaQuery has no column to resolve to declared primary key pk='ordersorderline_id' at org.apache.solr.handler.dataimport.DocBuilder.findMatchingPkColumn(DocBuilder.java:840) ~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30 23:09:08] at org.apache.solr.handler.dataimport.DocBuilder.collectDelta(DocBuilder.java:891) ~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30 23:09:08] at org.apache.solr.handler.dataimport.DocBuilder.doDelta(DocBuilder.java:284) ~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30 23:09:08] at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:178) ~[apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30 23:09:08] at org.apache.solr.handler.dataimport.DataImporter.doDeltaImport(DataImporter.java:374) [apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30 23:09:08] at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:413) [apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30 23:09:08] at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:392) [apache-solr-dataimporthandler-3.2.0.jar:3.2.0 1129474 - rmuir - 2011-05-30 23:09:08] The deltaQuery is: select orders.order_id || orders.order_booked_ind || order_line.order_line_id as ordersorderline_id, orders.order_id, orders.order_booked_ind, order_line.order_line_id, orders.order_dt, orders.cancel_dt, orders.account_manager_id, orders.of_header_id, orders.order_status_lov_id, orders.order_type_id, orders.approved_discount_pct, orders.campaign_nm, orders.approved_by_cd,orders.advertiser_id, orders.agency_id, order_line.accounting_comments_desc from orders, order_line where order_line.order_id = 
orders.order_id and order_line.order_booked_ind = orders.order_booked_ind I've just seen in the Solr Wiki Task List at http://wiki.apache.org/solr/TaskList?highlight=%28composite%29 a Big Idea for The Future is: support for *composite* keys ... either with some explicit change to the uniqueKey declaration or perhaps just copyField with some hidden magic that concats the resulting terms into a single key Term Does this prohibit my creating the key with the select as above? Mark
Re: Hit Rate
Yes indeed, that is what I was missing. Thanks Ahmet! On Tue, Jul 5, 2011 at 12:48 PM, Ahmet Arslan iori...@yahoo.com wrote: Is there a good way to get the hit count of a search? Example query: textField:solr AND documentId:1000 Say document with Id = 1000 has solr 13 times in the document. Any way to extract that number [13] in the response? Looks like you are looking for term frequency info: Two separate solutions: http://wiki.apache.org/solr/TermVectorComponent http://wiki.apache.org/solr/FunctionQuery#tf
Dynamic Facets
Hi, guys, We have more than 1000 attributes scattered across 700K docs. Each doc might have about 50 attributes. I would like Solr to return up to 20 facets for every search, with the facets chosen dynamically depending on the matched docs. Has anyone done that before? It would be awesome if the facets returned changed as we drill down on facets. I have looked at the following docs: http://wiki.apache.org/solr/SimpleFacetParameters http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr Wondering what's the best way to accomplish that. Any advice? Thanks, YH
Re: After the query component has the results, can I do more filtering on them?
Sorry for being vague. Okay, so these scores exist on an external server and they change often. The score for each returned user is actually dependent on the user doing the searching (if I make a request and you make the same request, the scores are different). So what I'm doing is getting a bunch of scores from the external server and aggregating them with the scores Solr gave in my component. So here's the flow (all numbers are arbitrary): 1) Get 10,000 results from Solr from the query component 2) Get a list of scores and ids from the external server (it'll return a lot of them) 3) Out of the results from step 1, I take the top 3500 docs after aggregating the external server's scores and netcon's scores. The problem is, the score for each doc is specific to the user making the request. The algorithm producing these scores is quite complex. I cannot simply re-index with new scores, hence I've written this component which runs after QueryComponent and does the magic of filtering. I've come up with a solution but it involved me changing a lot of Solr code. First and foremost, I've made the queryResultCache public and developed a small API for accessing and changing it. I've also changed the QueryResultKey to include a Long userId in its hashCode and equals functions. When a search is made, the QueryComponent caches its results, and then in my custom component I go into that cache, get my superset, filter it using the scores from my external server, and throw it back into the cache. Of course none of this happens if my custom scored results are already cached, so it's actually decent. If you have any suggestions and improvements I'd greatly appreciate them. Sorry for the long response...I didn't want to be an XY problem again :D -- View this message in context: http://lucene.472066.n3.nabble.com/After-the-query-component-has-the-results-can-I-do-more-filtering-on-them-tp3114775p3141652.html Sent from the Solr - User mailing list archive at Nabble.com.
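The aggregate-then-truncate step in 3) can be sketched with plain java.util collections. Everything here is illustrative (doc ids, the simple additive weighting, the class name) and not taken from the actual component, which would work against Solr's DocList/score arrays instead of maps:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ScoreAggregator {

    /**
     * Combine Solr scores with per-user external scores and return the
     * doc ids of the top k combined scores, best first. Docs absent from
     * the external map just keep their Solr score.
     */
    public static List<Integer> topK(Map<Integer, Float> solrScores,
                                     Map<Integer, Float> externalScores, int k) {
        // Aggregate: a simple sum here; the real weighting is application-specific.
        final Map<Integer, Float> combined = new HashMap<Integer, Float>();
        for (Map.Entry<Integer, Float> e : solrScores.entrySet()) {
            Float ext = externalScores.get(e.getKey());
            combined.put(e.getKey(), e.getValue() + (ext == null ? 0f : ext));
        }
        // Sort doc ids by combined score, descending, then truncate to k.
        List<Integer> ids = new ArrayList<Integer>(combined.keySet());
        Collections.sort(ids, new Comparator<Integer>() {
            public int compare(Integer a, Integer b) {
                return Float.compare(combined.get(b), combined.get(a));
            }
        });
        return ids.size() > k ? ids.subList(0, k) : ids;
    }
}
```

The per-user behavior comes from feeding a different externalScores map per requesting user, which matches the userId-keyed cache entries described above.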
Re: Apache Nutch and Solr Integration
You are using the crawl job, so you must specify the URL of your Solr instance. The newly updated wiki has your answer: http://wiki.apache.org/nutch/bin/nutch_crawl Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained in the following two URL's : http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apach e-solr.html I downloaded both the softwares, however, I am getting error (*solrUrl is not set, indexing will be skipped..*) when I am trying to crawl using Cygwin. Can anyone please help me out to fix this issue ? Else any other website suggesting for Apache Nutch and Solr integration would be greatly helpful. Thanks Regards, Serenity
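In Nutch 1.3 the solrUrl is passed to the crawl job on the command line; a minimal invocation from the Nutch runtime directory looks roughly like the following (the seed directory, depth, and topN values are illustrative):

```shell
# urls/ contains the seed list; -solr points at the running Solr instance
bin/nutch crawl urls -solr http://localhost:8983/solr/ -dir crawl -depth 3 -topN 50
```

Without the -solr argument the crawl still runs, but indexing is skipped with exactly the warning quoted above.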
Re: Apache Nutch and Solr Integration
Please find attached screenshot On Tue, Jul 5, 2011 at 11:53 AM, Way Cool way1.wayc...@gmail.com wrote: Can you let me know when and where you were getting the error? A screen-shot will be helpful. On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston serenity.kenings...@gmail.com wrote: Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained in the following two URL's : http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html I downloaded both the softwares, however, I am getting error (*solrUrl is not set, indexing will be skipped..*) when I am trying to crawl using Cygwin. Can anyone please help me out to fix this issue ? Else any other website suggesting for Apache Nutch and Solr integration would be greatly helpful. Thanks Regards, Serenity
Re: Custom Cache cleared after a commit?
Sorry for my ignorance, but do you have any lead in the code on where to look for this? Also, I'd still need a way of finding out how long it's been in the cache, because I don't want it to regenerate every time. I'd want it to regenerate only if it's been in the cache for less than 6 hours (or some time frame which I deem to be good). Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/Custom-Cache-cleared-after-a-commit-tp3136345p3141673.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dynamic Facets
YH - One technique (that the Smithsonian employs, I believe) is to index the field names for the attributes into a separate field, facet on that first, and then facet on the fields you'd like from that response in a second request to Solr. There's a basic hack here so the indexing client doesn't need to add the fields used field: https://issues.apache.org/jira/browse/SOLR-1280 Ideally, this could all be made part of one request to Solr - and I can envision a pre-faceting component (post querying) to dynamically figure out the best fields to facet on, set those into the request context, and the rest is magic. Erik On Jul 5, 2011, at 13:15 , Way Cool wrote: Hi, guys, We have more than 1000 attributes scattered around 700K docs. Each doc might have about 50 attributes. I would like Solr to return up to 20 facets for every searches, and each search can return facets dynamically depending on the matched docs. Anyone done that before? That'll be awesome if the facets returned will be changed after we drill down facets. I have looked at the following docs: http://wiki.apache.org/solr/SimpleFacetParameters http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr Wondering what's the best way to accomplish that. Any advice? Thanks, YH
Re: faceting on field with two values
: I have two fields TOWN and POSTALCODE and I want to concat those two in one : field to do faceting As others have pointed out, copyField doesn't do a concat; it just adds the field values from the source field to the dest field (so with those two copyField/ lines you will typically get two values for each doc in the dest field). If you don't want to go the DIH route, and you don't want to change your Talend process, you could use a simple UpdateProcessor for this (update processors are used to process add/delete requests no matter what source they come from, before analysis happens) ... but I don't think we have any off-the-shelf concat update processors in Solr at the moment. There is a patch for a script-based one which might be helpful: https://issues.apache.org/jira/browse/SOLR-1725 All of that said, based on what you've described about your use case, I would question from a UI standpoint whether this field would actually be a good idea... isn't there an extremely large number of postal codes even in a single city? Why not let people facet on just the town field first, and then only when they click on one, offer them a facet on postal code? Otherwise your facet UI is going to have a tendency to look like this... Gender: * Male (9000 results) * Female (8000 results) Town/Postal: * paris, 75016 (560 results) * paris, 75015 (490 results) * paris, 75022 (487 results) * boulogne sur mer 62200 (468 results) * paris, 75018 (465 results) * (click to see more) Color: * Red (900 results) * Blue (800 results) ...and many of your users will never find the town they are looking for (let alone the post code) -Hoss
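Hoss's UpdateProcessor suggestion could look roughly like the sketch below against the Solr 3.x update-processor API. The class name, target field name, and separator are illustrative assumptions, not an existing Solr class:

```java
import java.io.IOException;

import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

/**
 * Hypothetical processor that concatenates TOWN and POSTALCODE into a
 * single TOWN_POSTAL field before the document reaches analysis/indexing.
 */
public class ConcatUpdateProcessorFactory extends UpdateRequestProcessorFactory {
    @Override
    public UpdateRequestProcessor getInstance(SolrQueryRequest req,
            SolrQueryResponse rsp, UpdateRequestProcessor next) {
        return new UpdateRequestProcessor(next) {
            @Override
            public void processAdd(AddUpdateCommand cmd) throws IOException {
                SolrInputDocument doc = cmd.getSolrInputDocument();
                Object town = doc.getFieldValue("TOWN");
                Object postal = doc.getFieldValue("POSTALCODE");
                if (town != null && postal != null) {
                    doc.addField("TOWN_POSTAL", town + ", " + postal);
                }
                super.processAdd(cmd); // hand off to the rest of the chain
            }
        };
    }
}
```

It would then be registered in an updateRequestProcessorChain in solrconfig.xml and referenced from the update handler, so the concatenation happens no matter which client sends the documents.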
Cannot I search documents added by IndexWriter after commit?
@Test
public void testUpdate() throws IOException, ParserConfigurationException, SAXException, ParseException {
    Analyzer analyzer = getAnalyzer();
    QueryParser parser = new QueryParser(Version.LUCENE_32, "content", analyzer);
    Query allQ = parser.parse("*:*");
    IndexWriter writer = getWriter();
    IndexSearcher searcher = new IndexSearcher(IndexReader.open(writer, true));
    TopDocs docs = searcher.search(allQ, 10);
    assertEquals(0, docs.totalHits); // empty/no index
    Document doc = getDoc();
    writer.addDocument(doc);
    writer.commit();
    docs = searcher.search(allQ, 10);
    assertEquals(1, docs.totalHits); // it fails here: docs.totalHits equals 0
}

What am I doing wrong here? If I initialize searcher with new IndexSearcher(directory) I'm told: org.apache.lucene.index.IndexNotFoundException: no segments* file found in org.apache.lucene.store.RAMDirectory@3caa4 blockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c: files: [] -- Regards, K. Gabriele --- unchanged since 20/9/10 --- P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: Dynamic Facets
You can issue a new facet search as you drill down from your UI. You have to specify the fields you want to facet on and they can be dynamic. Take a look at recent threads here on taxonomy faceting for help. Also, look here[1] [1] http://wiki.apache.org/solr/SimpleFacetParameters On Tue, 5 Jul 2011 11:15:51 -0600, Way Cool way1.wayc...@gmail.com wrote: Hi, guys, We have more than 1000 attributes scattered around 700K docs. Each doc might have about 50 attributes. I would like Solr to return up to 20 facets for every searches, and each search can return facets dynamically depending on the matched docs. Anyone done that before? That'll be awesome if the facets returned will be changed after we drill down facets. I have looked at the following docs: http://wiki.apache.org/solr/SimpleFacetParameters http://www.lucidimagination.com/devzone/technical-articles/faceted-search-solr Wondering what's the best way to accomplish that. Any advice? Thanks, YH
Re: A beginner problem
: follow a receipe. So I went to the the solr site, downloaded solr and : tried to follow the tutorial. In the example folder of solr, using : java -jar start.jar I got: : : 2011-07-04 13:22:38.439:INFO::Logging to STDERR via org.mortbay.log.StdErrLog : 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT : 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983 if that is everything you got in the logs, then I suspect: a) you downloaded a source release (i.e., one with *-src-* in its name), in which the solr.war app has not yet been compiled; or b) you did not run "ant example" to build solr and set up the example instance. If I'm wrong, then yes, more details would be helpful: what exact URL did you download? -Hoss
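If (a)/(b) is the cause, the fix from a source checkout is roughly the following (assuming Ant is installed; exact targets can vary by release):

```shell
# from the top of the Solr source checkout
ant example          # builds solr.war and sets up the example instance
cd example
java -jar start.jar  # Jetty should now report the SolrDispatchFilter starting
```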
Re: Dynamic Facets
Thanks Erik and Darren. A pre-faceting component (post querying) would be ideal, even though there may be a small performance penalty. :-) I will try to implement one if no one has done so. Darren, I did look at the taxonomy faceting thread. My main concern is that I want dynamic facets to be returned, because I don't know ahead of time what facets I can specify as part of the query, and there are too many search terms. ;-) Thanks for the help. On Tue, Jul 5, 2011 at 11:49 AM, dar...@ontrenet.com wrote: You can issue a new facet search as you drill down from your UI. You have to specify the fields you want to facet on and they can be dynamic. Take a look at recent threads here on taxonomy faceting for help. Also, look here[1] [1] http://wiki.apache.org/solr/SimpleFacetParameters
Re: Cannot I search documents added by IndexWriter after commit?
After your writer.commit you need to reopen your searcher to see the changes. Mike McCandless http://blog.mikemccandless.com
Re: Cannot I search documents added by IndexWriter after commit?
and how do you do that? There is no reopen method On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless luc...@mikemccandless.com wrote: After your writer.commit you need to reopen your searcher to see the changes. Mike McCandless http://blog.mikemccandless.com -- Regards, K. Gabriele
Re: Cannot I search documents added by IndexWriter after commit?
Sorry, you must reopen the underlying IndexReader, and then make a new IndexSearcher from the reopened reader. Mike McCandless http://blog.mikemccandless.com On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: and how do you do that? There is no reopen method
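Mike's suggestion can be sketched as follows against the Lucene 3.x API. The key point (an assumption worth stating explicitly, since it is the usual trap) is that IndexReader.reopen() returns a new reader instance rather than modifying the one it is called on, so its return value must be kept:

```java
// Sketch only: assumes an existing IndexWriter `writer`, with the initial
// reader opened near-real-time via IndexReader.open(writer, true).
IndexReader reader = IndexReader.open(writer, true);
IndexSearcher searcher = new IndexSearcher(reader);

// ... add documents ...
writer.commit();

// reopen() returns a NEW reader if the index changed; the old searcher
// keeps its old point-in-time view until it is replaced.
IndexReader newReader = reader.reopen();
if (newReader != reader) {
    reader.close();          // release the stale reader
    reader = newReader;
    searcher = new IndexSearcher(reader);
}
// searcher now sees the committed documents
```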
The OR operator in a query ?
Hi all, Someone could tell me what is the OR syntax in SOLR and how to use it in a search query ? I tried: fq=sometag:1+sometag:5 fq=sometag:[1+5] fq=sometag:[1OR5] fq=sometag:1+5 and many more but impossible to get what I want. Thanks for advance -- View this message in context: http://lucene.472066.n3.nabble.com/The-OR-operator-in-a-query-tp3141843p3141843.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Cannot I search documents added by IndexWriter after commit?
Still won't work (same as before). @Test public void testUpdate() throws IOException, ParserConfigurationException, SAXException, ParseException { Analyzer analyzer = getAnalyzer(); QueryParser parser = new QueryParser(Version.LUCENE_32, content, analyzer); Query allQ = parser.parse(*:*); IndexWriter writer = getWriter(); final IndexReader indexReader = IndexReader.open(writer, true); IndexSearcher searcher = new IndexSearcher(indexReader); TopDocs docs = searcher.search(allQ, 10); assertEquals(0, docs.totalHits); // empty/no index Document doc = getDoc(); writer.addDocument(doc); writer.commit(); *indexReader.reopen(); searcher = new IndexSearcher(indexReader); docs = searcher.search(allQ, 10);* assertEquals(1,docs.totalHits); } private Document getDoc() { Document doc = new Document(); doc.add(new Field(id, 0, Field.Store.YES, Field.Index.NOT_ANALYZED)); return doc; } private IndexWriter getWriter() throws IOException {// 2 return new IndexWriter(directory, new WhitespaceAnalyzer(), // 2 IndexWriter.MaxFieldLength.UNLIMITED); // 2 } On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless luc...@mikemccandless.com wrote: Sorry, you must reopen the underlying IndexReader, and then make a new IndexSearcher from the reopened reader. Mike McCandless http://blog.mikemccandless.com On Tue, Jul 5, 2011 at 2:12 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote: and how do you do that? There is no reopen method On Tue, Jul 5, 2011 at 8:09 PM, Michael McCandless luc...@mikemccandless.com wrote: After your writer.commit you need to reopen your searcher to see the changes. 
>>> Mike McCandless
>>> http://blog.mikemccandless.com
>>>
>>> On Tue, Jul 5, 2011 at 1:48 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
>>>> @Test
>>>> public void testUpdate() throws IOException, ParserConfigurationException, SAXException, ParseException {
>>>>     Analyzer analyzer = getAnalyzer();
>>>>     QueryParser parser = new QueryParser(Version.LUCENE_32, "content", analyzer);
>>>>     Query allQ = parser.parse("*:*");
>>>>     IndexWriter writer = getWriter();
>>>>     IndexSearcher searcher = new IndexSearcher(IndexReader.open(writer, true));
>>>>     TopDocs docs = searcher.search(allQ, 10);
>>>>     assertEquals(0, docs.totalHits); // empty/no index
>>>>     Document doc = getDoc();
>>>>     writer.addDocument(doc);
>>>>     writer.commit();
>>>>     docs = searcher.search(allQ, 10);
>>>>     assertEquals(1, docs.totalHits); // it fails here. docs.totalHits equals 0
>>>> }
>>>>
>>>> What am I doing wrong here? If I initialize searcher with new IndexSearcher(directory) I'm told:
>>>> org.apache.lucene.index.IndexNotFoundException: no segments* file found in
>>>> org.apache.lucene.store.RAMDirectory@3caa4b lockFactory=org.apache.lucene.store.SingleInstanceLockFactory@ed0220c: files: []

--
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains [LON] or the addressee acknowledges the receipt within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) < Now + 48h) ⇒ ¬resend(I, this). If an email is sent by a sender that is not a trusted contact or the email does not contain a valid code then the email is not received. A valid code starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
Re: The OR operator in a query ?
Hi,

These two are valid and equivalent:

  fq=sometag:1 OR sometag:5
  fq=sometag:(1 OR 5)

Also, be aware that fq defines a filter query, which is different from a regular query (http://wiki.apache.org/solr/CommonQueryParameters#fq). For more details on the query syntax see http://lucene.apache.org/java/2_4_0/queryparsersyntax.html

Regards,
Juan
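For completeness, the rendered clause can also be built programmatically. A minimal sketch (plain Java, not a Solr API — `orClause` is a hypothetical helper) that produces the second form Juan shows:

```java
// Hypothetical helper (not part of Solr) that renders the filter-query
// syntax field:(v1 OR v2 OR ...), suitable as an fq parameter value.
public class OrFilter {
    public static String orClause(String field, String... values) {
        return field + ":(" + String.join(" OR ", values) + ")";
    }

    public static void main(String[] args) {
        System.out.println(orClause("sometag", "1", "5")); // sometag:(1 OR 5)
    }
}
```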
Re: The OR operator in a query ?
Thanks for your response. I'll check this.
Re: Is solrj 3.3.0 ready for field collapsing?
Thanks for your response.

On 05.07.2011 13:53, Erick Erickson wrote:
> Let's see the results of adding debugQuery=on to your URL. Are you getting
> any documents back at all? If not, then your query isn't getting any
> documents to group.

I didn't get any docs back, but they were in the response (I saw them in the debugger). The structure had changed, though, so DocumentBuilder didn't bring me any results (getBeans()). I investigated a bit further and found out that I had to set the group.main param to true:
https://builds.apache.org/job/Solr-trunk/javadoc/org/apache/solr/common/params/GroupParams.html#GROUP_MAIN
Now I get results. So the answer seems to be yes :-).

> You haven't told us much about what you're trying to do, you might want to review:
> http://wiki.apache.org/solr/UsingMailingLists

Sorry for that.

Cheers
Per
Re: A beginner problem
You can follow the links below to set up Nutch and Solr:
http://thetechietutorials.blogspot.com/2011/06/solr-and-nutch-integration.html
http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html
http://wiki.apache.org/nutch/RunningNutchAndSolr

Of course, more details will be helpful for troubleshooting your env issue. :-) Have fun!

On Tue, Jul 5, 2011 at 11:49 AM, Chris Hostetter hossman_luc...@fucit.org wrote:

: follow a receipe. So I went to the the solr site, downloaded solr and
: tried to follow the tutorial. In the example folder of solr, using
: java -jar start.jar I got:
:
: 2011-07-04 13:22:38.439:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
: 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
: 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983

if that is everything you got in the logs, then i suspect:

a) you downloaded a source release (ie: has *-src-* in its name, in which case the solr.war app has not yet been compiled), and
b) you did not run "ant example" to build solr and set up the example instance.

If i'm wrong, then yes, more details would be helpful: what exact URL did you download?

-Hoss
Re: Is solrj 3.3.0 ready for field collapsing?
On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote: i've tried to add the params for group=true and group.field=myfield by using the SolrQuery. But the result is null. Do i have to configure something? In wiki part for field collapsing i couldn't find anything. No specific (type-safe) support for grouping is in SolrJ currently. But you should still have access to the complete generic solr response via SolrJ regardless (i.e. use getResponse()) -Yonik http://www.lucidimagination.com
Re: Cannot I search documents added by IndexWriter after commit?
Re-open doesn't work, but open does.

    @Test
    public void testUpdate() throws IOException, ParserConfigurationException, SAXException, ParseException {
        Analyzer analyzer = getAnalyzer();
        QueryParser parser = new QueryParser(Version.LUCENE_32, "content", analyzer);
        Query allQ = parser.parse("*:*");
        IndexWriter writer = getWriter();
        final IndexReader indexReader = IndexReader.open(writer, true);
        IndexSearcher searcher = new IndexSearcher(indexReader);
        TopDocs docs = searcher.search(allQ, 10);
        assertEquals(0, docs.totalHits); // empty/no index
        Document doc = getDoc();
        writer.addDocument(doc);
        writer.commit();
        searcher = new IndexSearcher(IndexReader.open(writer, true)); // instead of reopen()
        docs = searcher.search(allQ, 10);
        assertEquals(1, docs.totalHits);
    }

On Tue, Jul 5, 2011 at 8:23 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
> Still won't work (same as before). [quoted test using indexReader.reopen() snipped]
>
> On Tue, Jul 5, 2011 at 8:15 PM, Michael McCandless luc...@mikemccandless.com wrote:
>> Sorry, you must reopen the underlying IndexReader, and then make a new
>> IndexSearcher from the reopened reader.
>>
>> Mike McCandless
>> http://blog.mikemccandless.com
Re: Cannot I search documents added by IndexWriter after commit?
re-open does work, but you cannot ignore its return value! see the javadocs for an example.

On Tue, Jul 5, 2011 at 3:10 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
> Re-open doesn't work, but open does.
>
>     searcher = new IndexSearcher(IndexReader.open(writer, true));
>     docs = searcher.search(allQ, 10);
>     assertEquals(1, docs.totalHits);
>
> [rest of quoted thread snipped]
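The gotcha Robert points out is that reopen() returns a new reader rather than mutating the one it is called on, so `indexReader.reopen();` with the result discarded still searches the old point-in-time snapshot. A toy model of that contract (plain Java, not Lucene code — class and method names here are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a point-in-time reader: reopen() returns a NEW snapshot
// instead of mutating the receiver, which is why calling reopen() and
// discarding the result never sees newly committed docs.
public class SnapshotReader {
    private final List<String> docs;

    public SnapshotReader(List<String> docsAtOpenTime) {
        this.docs = new ArrayList<>(docsAtOpenTime); // frozen snapshot
    }

    public int numDocs() { return docs.size(); }

    // Like IndexReader.reopen(): the receiver is unchanged; the caller
    // must switch over to the returned instance.
    public SnapshotReader reopen(List<String> liveDocs) {
        return new SnapshotReader(liveDocs);
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        SnapshotReader reader = new SnapshotReader(index);
        index.add("doc0");                     // "commit" after open
        reader.reopen(index);                  // BUG: return value ignored
        System.out.println(reader.numDocs());  // still 0
        reader = reader.reopen(index);         // correct: use the new reader
        System.out.println(reader.numDocs());  // now 1
    }
}
```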
Re: Apache Nutch and Solr Integration
On Tue, Jul 5, 2011 at 1:11 PM, Way Cool way1.wayc...@gmail.com wrote: Sorry, Serenity, somehow I don't see the attachment. On Tue, Jul 5, 2011 at 11:23 AM, serenity keningston serenity.kenings...@gmail.com wrote: Please find attached screenshot On Tue, Jul 5, 2011 at 11:53 AM, Way Cool way1.wayc...@gmail.com wrote: Can you let me know when and where you were getting the error? A screen-shot will be helpful. On Tue, Jul 5, 2011 at 8:15 AM, serenity keningston serenity.kenings...@gmail.com wrote: Hello Friends, I am a newbie to Solr and trying to integrate Apache Nutch 1.3 and Solr 3.2 . I did the steps explained in the following two URL's : http://wiki.apache.org/nutch/RunningNutchAndSolr http://thetechietutorials.blogspot.com/2011/06/how-to-build-and-start-apache-solr.html I downloaded both the softwares, however, I am getting error (*solrUrl is not set, indexing will be skipped..*) when I am trying to crawl using Cygwin. Can anyone please help me out to fix this issue ? Else any other website suggesting for Apache Nutch and Solr integration would be greatly helpful. Thanks Regards, Serenity
Re: Solr vs Hibernate Search (Huge number of DB DMLs)
Please suggest..

On Mon, Jul 4, 2011 at 10:37 PM, fire fox fyr3...@gmail.com wrote:
> From my exploration so far, I understood that we can opt for Solr straightaway if index changes are kept to a minimum. However, mine is absolutely the opposite. I'm still vague about the right solution for the scenario mentioned. Please share..
>
> On Mon, Jul 4, 2011 at 6:28 PM, fire fox fyr3...@gmail.com wrote:
>> Hi all,
>> There were several places I could find a discussion on this, but I failed to find one suited to me. I'd like to be clear on my requirements, so that you may suggest the better solution.
>>
>> The project deals with tons of database tables (with *millions* of records), some of which are to be indexed and must, of course, be searchable. It uses Hibernate for MySQL transactions. As per my knowledge, there could be two solutions to maintain sync between index and database effectively. There'd be a *huge number of transactions (DMLs) on the DB*, so I'm wondering which of the following will handle it effectively.
>>
>> 1) Configure a *Solr* server, and query it to search / send it events to update. This might be better than handling Lucene directly, since it provides index read/write and load balancing. The problem here could be maintaining sync between index and DB with no lag, as the updates (DMLs on the DB) are very frequent. Too many events to be sent!
>>
>> 2) Use *Hibernate Search*. I'm just wondering about its *performance* considering the high volume of transactions on the DB every minute.
>>
>> Please suggest. Thanks in advance.
Can I invert the inverted index?
Hello,

With an inverted index the term is the key, and the documents are the values. Is it still however possible that given a document id I get the terms indexed for that document?

--
Regards,
K. Gabriele
Re: Can I invert the inverted index?
Hi Gabriele,

I'm not sure I understand your problem, but the TermVectorComponent may fit your needs:
http://wiki.apache.org/solr/TermVectorComponent
http://wiki.apache.org/solr/TermVectorComponentExampleEnabled

Ludovic.
Re: what s the optimum size of SOLR indexes
It depends on how many queries you'd be making per second. I know for us, I have a gradient of index sizes. The first machine, which gets hit most often, is about 2.5 GB. Most queries only ever need to hit this index, but then I have bigger indices of about 5-10 GB each which are slower. They don't get queried as often, so I can afford for them to be a little slower (hence the bigger index).
Re: Can I invert the inverted index?
I had looked at term vectors but don't understand how they solve my problem. Consider the following index entries:

  t0 -> doc0, doc1
  t1 -> doc0

From the 2nd entry we know that t1 is only present in doc0. Now, my problem: given doc0, how can I know which terms occur in it (t0 and t1), without storing the content? One way is to go over all terms in the index using the term dictionary.

On Tue, Jul 5, 2011 at 10:14 PM, lboutros boutr...@gmail.com wrote:
> I'm not sure I understand your problem, but the TermVectorComponent may fit your needs:
> http://wiki.apache.org/solr/TermVectorComponent
Re: Can I invert the inverted index?
sounds like the Luke request handler will get what you're after:
http://wiki.apache.org/solr/LukeRequestHandler
http://wiki.apache.org/solr/LukeRequestHandler#id

cheers,
rob

On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
> With an inverted index the term is the key, and the documents are the
> values. Is it still however possible that given a document id I get the
> terms indexed for that document?
Re: Can I invert the inverted index?
You can do this, kind of, but it's a lossy process. Consider indexing "the cat in the hat strikes back", with "the" and "in" being stopwords and "strikes" getting stemmed to "strike". At very best, you can reconstruct that the original doc contained cat, hat, strike, back. Is that sufficient? And it's a very expensive process. What is the problem you're trying to solve? Perhaps there are other ways to get what you need.

Best
Erick

On Tue, Jul 5, 2011 at 4:22 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
> Now, my problem: given doc0, how can I know which terms occur in it (t0 and
> t1), without storing the content? One way is to go over all terms in the
> index using the term dictionary.
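The term-dictionary walk Gabriele describes amounts to inverting a term→docs map into a doc→terms map. A self-contained sketch (plain Java, using the t0/t1 example from the thread; as Erick notes, only the analyzed terms are recoverable, not the original text):

```java
import java.util.*;

// Inverts a term -> docs posting map into a doc -> terms map, which is
// the brute-force way to answer "which terms does doc0 contain?" when
// content is not stored.
public class InvertIndex {
    public static Map<String, Set<String>> invert(Map<String, Set<String>> postings) {
        Map<String, Set<String>> docToTerms = new TreeMap<>();
        for (Map.Entry<String, Set<String>> e : postings.entrySet()) {
            for (String doc : e.getValue()) {
                // record that this doc contains the current term
                docToTerms.computeIfAbsent(doc, d -> new TreeSet<>()).add(e.getKey());
            }
        }
        return docToTerms;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> postings = new HashMap<>();
        postings.put("t0", new TreeSet<>(Arrays.asList("doc0", "doc1")));
        postings.put("t1", new TreeSet<>(Arrays.asList("doc0")));
        System.out.println(invert(postings)); // {doc0=[t0, t1], doc1=[t0]}
    }
}
```

In a real index this walk is O(total terms), which is why TermVectorComponent or the Luke handler (which do this per-document work for you) are the practical answers.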
Re: Using FieldCache in SolrIndexSearcher - crazy idea?
: Correct me if I am wrong: In a standard distributed search with
: QueryComponent, the first query sent to the shards asks for
: fl=myUniqueKey or fl=myUniqueKey,score. When the response is being
: generated to send back to the coordinator, SolrIndexSearcher.doc(int i,
: Set<String> fields) is called for each document. As I understand it,
: this will read each document from the index _on disk_ and retrieve the
: myUniqueKey field value for each document.
:
: My idea is to have a FieldCache for the myUniqueKey field in
: SolrIndexSearcher (or somewhere else?) that would be used in cases where
: the only field that needs to be retrieved is myUniqueKey. Is this
: something that would improve performance?

Quite probably ... you typically can't assume that a FieldCache can be constructed for *any* field, but it should be a safe assumption for the uniqueKey field, so for that initial request of the multiphase distributed search it's quite possible it would speed things up.

if you want to try this and report back results, i'm sure a lot of people would be interested in a patch ... i would guess the best place to make the change would be in the QueryComponent so that it used the FieldCache (probably best to do it via getValueSource() on the uniqueKey's SchemaField) to put the ids in the response instead of using a SolrDocList.

Hmm, actually ... there's no reason why this kind of optimization would need to be specific to distributed queries, it could be done by the ResponseWriters directly -- if the field list they are being asked to return only contains the uniqueKeyField and computed values (like score) then don't bother calling SolrIndexSearcher.doc at all ... the only hitch is that with distributed search, and using function values as pseudo fields and whatnot, there are more places calling SolrIndexSearcher.doc than there used to be ... so maybe putting this change directly into SolrIndexSearcher.doc would make the most sense?

-Hoss
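The optimization being discussed — serving the uniqueKey from an in-memory per-docid array instead of fetching the stored document — can be sketched in miniature. All names here are illustrative, not Solr code:

```java
import java.util.HashMap;
import java.util.Map;

// Toy contrast between fetching the uniqueKey from a stored document
// (what SolrIndexSearcher.doc does today) and reading it from an
// in-memory array indexed by internal docid (the FieldCache idea).
public class UniqueKeyCache {
    // Simulates stored documents on disk: docid -> field -> value.
    static Map<Integer, Map<String, String>> storedDocs = new HashMap<>();

    // Simulates a FieldCache entry: uniqueKey values by internal docid.
    static String[] idCache;

    static String idViaStoredDoc(int docid) {
        return storedDocs.get(docid).get("id"); // per-doc "disk" fetch
    }

    static String idViaCache(int docid) {
        return idCache[docid];                  // single array read
    }

    public static void main(String[] args) {
        Map<String, String> d0 = new HashMap<>();
        d0.put("id", "A");
        storedDocs.put(0, d0);
        idCache = new String[]{"A"};
        System.out.println(idViaStoredDoc(0) + " == " + idViaCache(0));
    }
}
```

The two lookups return the same value; the win is that the cache path skips the stored-field read entirely for the fl=myUniqueKey,score phase.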
Re: A beginner problem
Thank you for your answer. I downloaded solr from the link you suggested and now it is ok, I can see the administration page. But it is strange that a download from the solr site does not work. Thanks also to Way Cool.

> I don't know why, but the same happened to me in the past (with 3.2). Apparently the zip I downloaded was not correct. I think you have to have a solr.war file in the webapps directory, do you have it? Do you know which version of Solr you downloaded? Download this one:
> http://apache.dattatec.com/lucene/solr/3.3.0/apache-solr-3.3.0.zip
> I just tried it and it's there.
>
> On Mon, Jul 4, 2011 at 1:49 PM, carmme...@qualidade.info wrote:
>> I use nutch as a search engine. Until now nutch did the crawl and the search functions. The newest version, however, delegated the search to solr. I know almost nothing about programming, but I'm able to follow a recipe. So I went to the solr site, downloaded solr and tried to follow the tutorial. In the example folder of solr, using java -jar start.jar I got:
>>
>> 2011-07-04 13:22:38.439:INFO::Logging to STDERR via org.mortbay.log.StdErrLog
>> 2011-07-04 13:22:38.893:INFO::jetty-6.1-SNAPSHOT
>> 2011-07-04 13:22:38.946:INFO::Started SocketConnector@0.0.0.0:8983
>>
>> When I tried to go to http://localhost:8983/solr/admin/ I got:
>> HTTP ERROR: 404 Problem accessing /solr/admin/. Reason: NOT_FOUND
>>
>> Can someone help me with this? Thanks
Re: ampersand, dismax, combining two fields, one of which is keywordTokenizer
: Maybe what I really need is a query parser that does not do disjunction
: maximum at all, but somehow still combines different 'qf' type fields with
: different boosts on each field. I personally don't _necessarily_ need the
: actual disjunction max calculation, but I do need combining of multiple
: fields with different boosts. Of course, I'm not sure exactly how it would
: combine multiple fields if not disjunction maximum, but perhaps one is
: conceivable that wouldn't be subject to this particular gotcha with differing
: analysis.

you can sort of do that today, something like this should work...

  q  = _query_:$q1^100 _query_:$q2^10 _query_:$q3^5 _query_:$q4
  q1 = {!lucene df=title v=$qq}
  q2 = {!lucene df=summary v=$qq}
  q3 = {!lucene df=author v=$qq}
  q4 = {!lucene df=body v=$qq}
  qq = ...user input here...

..but you might want to replace "lucene" with "field" depending on what metacharacters you want to support.

in general though, the reason i wrote the dismax parser (instead of a parser that works like this) is because of how multiword queries wind up matching/scoring. A guy named Chuck Williams wrote the earliest version of the DisjunctionMaxQuery class and his "albino elephant" example totally sold me on this approach back in 2005...
http://www.lucidimagination.com/search/document/8ce795c4b6752a1f/contribution_better_multi_field_searching
https://issues.apache.org/jira/browse/LUCENE-323

: I also remain kind of confused about how the existing dismax figures out how
: many terms for the 'mm' type calculations. If someone wanted to explain that,
: I would find it enlightening and helpful for understanding what's going on.

it's not really about terms -- it's just the total number of clauses in the outer BooleanQuery that it builds. if a chunk of input produces a valid DisjunctionMaxQuery (because the analyzer for at least one qf field generated tokens) then that's a clause; if a chunk of input doesn't produce a token (because none of the analyzers from any of the qf fields generated tokens) then that's not a clause.

-Hoss
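The DisjunctionMaxQuery scoring that motivated this design can be sketched numerically. Assuming the usual formulation — the best per-field score, plus the tie-breaker multiplier times the remaining fields' scores (tie is 0.01 in the dismax config quoted earlier in this digest) — a toy calculation, not Lucene code:

```java
// Sketch of DisjunctionMaxQuery scoring: a word's clause scores the best
// matching field, plus tie * the other fields' scores. With a small tie,
// a doc matching ONE word in several fields ("albino" in title and body)
// cannot outrank a doc matching BOTH words ("albino" and "elephant").
public class DisMaxScore {
    public static double score(double[] fieldScores, double tie) {
        double max = 0, sum = 0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max); // max plus tie-broken remainder
    }

    public static void main(String[] args) {
        // tie=0.01: extra field matches add only a tiny bonus (~2.015),
        // instead of the 3.5 a plain sum across fields would give.
        System.out.println(score(new double[]{2.0, 1.0, 0.5}, 0.01));
    }
}
```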
Re: CopyField into another CopyField?
: In solr, is it possible to 'chain' copyfields so that you can copy the value
: of one into another? ...
: <copyField source="name" dest="autocomplete"/>
: <copyField source="autocomplete" dest="ac_spellcheck"/>
:
: Point being, every time I add a new field to the autocomplete, I want it to
: automatically also be added to ac_spellcheck without having to do it twice.

Sorry, no, the IndexSchema won't recursively apply copyFields. As i recall it was implemented this way partly for simplicity, and largely to protect people against the risk of infinite loops. we could probably make a more sophisticated impl that detects infinite loops, but that check would slow things down and all solr could really do with it is throw an error.

-Hoss
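The loop-detection concern can be illustrated with a toy resolver. This is a hypothetical sketch, not Solr's IndexSchema: it expands copyField chains recursively and throws if a chain leads back to its source:

```java
import java.util.*;

// Hypothetical recursive copyField resolver. Expands chains like
// name -> autocomplete -> ac_spellcheck; throws when a chain loops back
// to the source field (other revisits are simply not re-expanded).
public class CopyFieldResolver {
    // source field -> destination fields
    static Map<String, List<String>> rules = new HashMap<>();

    public static Set<String> destinations(String source) {
        Set<String> reached = new LinkedHashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        stack.push(source);
        while (!stack.isEmpty()) {
            String field = stack.pop();
            for (String dest : rules.getOrDefault(field, Collections.<String>emptyList())) {
                if (dest.equals(source))
                    throw new IllegalStateException("copyField loop at " + dest);
                if (reached.add(dest)) stack.push(dest); // expand new fields only
            }
        }
        return reached;
    }

    public static void main(String[] args) {
        rules.put("name", Arrays.asList("autocomplete"));
        rules.put("autocomplete", Arrays.asList("ac_spellcheck"));
        System.out.println(destinations("name")); // [autocomplete, ac_spellcheck]
    }
}
```

This shows why Hoss's point stands: the cycle check is cheap here but would run on every copyField application in the schema, and all it can do on detection is fail.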
Re: Using FieldCache in SolrIndexSearcher - crazy idea?
On Tue, Jul 5, 2011 at 5:13 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Correct me if I am wrong: In a standard distributed search with : QueryComponent, the first query sent to the shards asks for : fl=myUniqueKey or fl=myUniqueKey,score. When the response is being : generated to send back to the coordinator, SolrIndexSearcher.doc (int i, : SetString fields) is called for each document. As I understand it, : this will read each document from the index _on disk_ and retrieve the : myUniqueKey field value for each document. : : My idea is to have a FieldCache for the myUniqueKey field in : SolrIndexSearcher (or somewhere else?) that would be used in cases where : the only field that needs to be retrieved is myUniqueKey. Is this : something that would improve performance? Quite probably ... you typically can't assume that a FieldCache can be constructed for *any* field, but it should be a safe assumption for the uniqueKey field, so for that initial request of the mutiphase distributed search it's quite possible it would speed things up. Ah, thanks Hoss - I had meant to respond to the original email, but then I lost track of it. Via pseudo-fields, we actually already have the ability to retrieve values via FieldCache. fl=id:{!func}id But using CSF would probably be better here - no memory overhead for the FieldCache entry. -Yonik http://www.lucidimagination.com if you want to try this and report back results, i'm sure a lot of people would be interested in a patch ... i would guess the best place to make the chance would be in the QueryComponent so thta it used the FieldCache (probably best to do it via getValueSource() on the uniqueKey's SchemaField) to put the ids in teh response instead of using a SolrDocList. Hmm, actually... 
there's no reason why this kind of optimization would need to be specific to distributed queries; it could be done by the ResponseWriters directly -- if the field list they are being asked to return only contains the uniqueKeyField and computed values (like score), then don't bother calling SolrIndexSearcher.doc at all ... the only hitch is that with distributed search and using function values as pseudo-fields and whatnot, there are more places calling SolrIndexSearcher.doc than there used to be ... so maybe putting this change directly into SolrIndexSearcher.doc would make the most sense?

-Hoss
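The pseudo-field trick Yonik mentions is exercised entirely through query parameters. A sketch of composing such a request, assuming a local single-core Solr at the default port and a uniqueKey field named id (both illustrative, not from the thread):

```python
from urllib.parse import urlencode

# Ask for the uniqueKey value via a function query (fl=id:{!func}id) so it
# is served from the FieldCache rather than by reading each stored document
# from disk -- the first-phase distributed-search pattern discussed above.
params = {
    "q": "*:*",
    "fl": "id:{!func}id,score",  # pseudo-field: id via FieldCache, plus score
    "rows": 10,
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

`urlencode` takes care of escaping the local-params braces, so the `fl` value arrives at Solr exactly as written above.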
Re: Using FieldCache in SolrIndexSearcher - crazy idea?
: Ah, thanks Hoss - I had meant to respond to the original email, but then
: I lost track of it. Via pseudo-fields, we actually already have the
: ability to retrieve values via FieldCache: fl=id:{!func}id. But using CSF
: would probably be better here - no memory overhead for the FieldCache
: entry.

Not sure if this is related, but we should also consider using the memory codec for the id field: https://issues.apache.org/jira/browse/LUCENE-3209
Re: Is solrj 3.3.0 ready for field collapsing?
Patches are always welcome!

On Tue, Jul 5, 2011 at 3:04 PM, Yonik Seeley yo...@lucidimagination.com wrote:

On Mon, Jul 4, 2011 at 11:54 AM, Per Newgro per.new...@gmx.ch wrote:
: I've tried to add the params for group=true and group.field=myfield by
: using the SolrQuery. But the result is null. Do I have to configure
: something? In the wiki part for field collapsing I couldn't find anything.

No specific (type-safe) support for grouping is in SolrJ currently. But you should still have access to the complete generic solr response via SolrJ regardless (i.e. use getResponse()).

-Yonik
http://www.lucidimagination.com
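Absent a type-safe API, "use getResponse()" means walking the generic response structure yourself. The navigation is sketched below in Python over a hand-built sample whose shape follows Solr's grouped-response format for group=true&group.field=myfield; the field name and documents are illustrative:

```python
# Sample shaped like Solr's grouped response:
#   grouped -> <field> -> matches, groups[] -> groupValue, doclist -> docs[]
sample_response = {
    "grouped": {
        "myfield": {
            "matches": 3,
            "groups": [
                {"groupValue": "a",
                 "doclist": {"numFound": 2, "docs": [{"id": "1"}, {"id": "2"}]}},
                {"groupValue": "b",
                 "doclist": {"numFound": 1, "docs": [{"id": "3"}]}},
            ],
        }
    }
}

def top_docs_per_group(response, field):
    """Return {groupValue: [doc ids]} by walking the generic grouped response,
    the same traversal a SolrJ client would do over getResponse()."""
    groups = response["grouped"][field]["groups"]
    return {g["groupValue"]: [d["id"] for d in g["doclist"]["docs"]]
            for g in groups}

print(top_docs_per_group(sample_response, "myfield"))
```

In SolrJ the equivalent is chained NamedList/SolrDocumentList lookups on the object returned by getResponse(); the keys are the same.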
Re: configure dismax requesthandlar for boost a field
: But i am not finding any effect in my search results. do i need to do some
: more configuration to see the effect.

Posting the solrconfig.xml section for your dismax handler is a helpful start, but to provide any meaningful assistance we need a lot more info than just that...

* an example of the URL you are using, so we can see what params you are sending at query time
* the scores and ids of docs that you don't feel are being returned in the order you think they should be
* the score explanation output for those docs using debugQuery=true (and, if needed, explainOther), so we can see what kinds of scores are being computed for various docs and why, and help you understand that same output to make any changes as needed.

-Hoss
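The checklist above boils down to re-running the query with debugQuery=true and capturing the URL. A sketch of composing such a request, assuming a local Solr at the default port; the query terms, qf boosts, and field list mirror the config posted earlier in the thread:

```python
from urllib.parse import urlencode

# Same dismax query, plus debugQuery=true so the response includes a
# per-document score explanation you can paste back to the list.
params = {
    "q": "gold ring",
    "qt": "dismax",
    "qf": "text^0.5 name^1.0 description^1.5",
    "fl": "UID_PK,name,score",
    "debugQuery": "true",  # adds an explain section per returned document
}
url = "http://localhost:8983/solr/select?" + urlencode(params)
print(url)
```

The explain output in the debug section shows exactly which qf boost contributed what to each document's score, which answers "why is doc X ranked there?" directly.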
Re: Can I invert the inverted index?
Gabriele,

I created a patch that does this about a year ago. See https://issues.apache.org/jira/browse/SOLR-1837. It was written for Solr 1.4 and is based upon the Document Reconstructor in Luke. The patch adds a link on the main solr admin page to a docinspector page which will reconstruct the document given a uniqueid (required). Keep in mind that you're only looking at what's in the index for non-stored fields, not the original text.

If you have any issues using this on the most recent release, let me know and I'd be happy to create a new patch for Solr 3.3. One of these days I'll remove the JSP dependency and this may eventually make it into trunk.

Thanks,

-Trey Grainger
Search Technology Development Team Lead, Careerbuilder.com
Site Architect, Celiaccess.com

On Tue, Jul 5, 2011 at 3:59 PM, Gabriele Kahlout gabri...@mysimpatico.com wrote:
: Hello,
:
: With an inverted index the term is the key, and the documents are the
: values. Is it still however possible that given a document id I get the
: terms indexed for that document?
:
: --
: Regards,
: K. Gabriele
:
: --- unchanged since 20/9/10 ---
: P.S. If the subject contains [LON] or the addressee acknowledges the receipt
: within 48 hours then I don't resend the email. subject(this) ∈ L(LON*) ∨ ∃x.
: (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x) Now + 48h) ⇒ ¬resend(I, this).
: If an email is sent by a sender that is not a trusted contact or the email
: does not contain a valid code then the email is not received. A valid code
: starts with a hyphen and ends with X. ∀x. x ∈ MyInbox ⇒ from(x) ∈
: MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈ L(-[a-z]+[0-9]X)).
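The mechanics behind SOLR-1837 and Luke's Document Reconstructor can be sketched in miniature: walk every term's postings and keep the terms whose postings contain the requested document id. The index below is a hand-built dict standing in for Lucene's term dictionary; the term and position data are invented for illustration:

```python
# term -> {doc_id: [positions]} -- a toy stand-in for Lucene's postings.
inverted_index = {
    "gold":   {1: [0], 2: [3]},
    "ring":   {1: [1]},
    "silver": {2: [0]},
}

def reconstruct(index, doc_id):
    """Rebuild the indexed terms of one document in position order.
    As Trey notes, for non-stored fields this recovers the analyzed
    (post-tokenization) form, not the original text."""
    positioned = []
    for term, postings in index.items():
        for pos in postings.get(doc_id, []):
            positioned.append((pos, term))
    return [term for _, term in sorted(positioned)]

print(reconstruct(inverted_index, 1))
```

This is O(total terms) per document, which is why reconstruction is a diagnostic tool rather than something you'd do per query.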