Re: Tag Cloud Generation Problem
The faceting engine can do this job.

On Thursday 08 April 2010 10:16:09 Ninad Raut wrote:

Hi, I have a business use case where I have to generate a tag cloud for words with frequency greater than a specified threshold. The way I store records in Solr is: for every Solr document (which includes content) I store a multivalued entry of buzzwords with their frequency. The technical problem I face is: while generating a tag cloud I do not know the buzzwords beforehand. Moreover, I want the frequency total for a buzzword across documents. In SQL the way to do this is:

Select buzzWord, sum(frequency) from Verbatim where count(frequency) > thresholdValue group by buzzWord

Is there a similar way I can query Solr? Even a workaround solution to this will do. Thanks. Regards, Ninad R

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Tag Cloud Generation Problem
Hi, It's simpler than you might think :)

?q=*:*&facet=true&facet.field=buzzWord&rows=0

This will retrieve an overall facet count (useful for navigation and tag cloud generation) but doesn't return the documents themselves. Check the faceting wiki [1] for more information.

[1]: http://wiki.apache.org/solr/SimpleFacetParameters

Cheers

On Thursday 08 April 2010 10:47:19 Ninad Raut wrote:

Hi Markus, But the problem is, we do not know the words beforehand. What will the facet query be? If you can just explain it to me with an example, it would be really nice of you. Regards, Ninad R

On Thu, Apr 8, 2010 at 2:09 PM, Markus Jelsma mar...@buyways.nl wrote:

The faceting engine can do this job.

On Thursday 08 April 2010 10:16:09 Ninad Raut wrote:

Hi, I have a business use case where I have to generate a tag cloud for words with frequency greater than a specified threshold. The way I store records in Solr is: for every Solr document (which includes content) I store a multivalued entry of buzzwords with their frequency. The technical problem I face is: while generating a tag cloud I do not know the buzzwords beforehand. Moreover, I want the frequency total for a buzzword across documents. In SQL the way to do this is:

Select buzzWord, sum(frequency) from Verbatim where count(frequency) > thresholdValue group by buzzWord

Is there a similar way I can query Solr? Even a workaround solution to this will do. Thanks. Regards, Ninad R

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
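Building on the faceting query above, Ninad's frequency threshold maps to Solr's facet.mincount parameter. A small sketch of assembling such a request in Python; the host, core path and field name are assumptions for illustration. Note that a facet count is the number of documents containing a term, not a sum of stored per-document frequency values, so this approximates rather than replicates the SQL sum(frequency).

```python
from urllib.parse import urlencode

def tag_cloud_url(base="http://localhost:8983/solr/select", field="buzzWord",
                  threshold=5, limit=100):
    # rows=0: we only want the facet counts, not the documents themselves.
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": threshold,  # drop terms below the threshold
        "facet.limit": limit,         # cap the cloud at the top-N terms
    }
    return base + "?" + urlencode(params)

print(tag_cloud_url())
```

The returned facet values and counts can then be mapped directly to tag sizes in the cloud.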
RE: Re: Using Solr with CouchDB
Hi, Setting up CouchDB-Lucene is quite easy, but you don't want that I guess. You could construct a show function to convert input to Solr-accepted XML; it should be very straightforward. You just need some program to fetch from CouchDB and push it into Solr. Cheers,

-Original message-
From: Patrick Petermair patrick.peterm...@openforce.com
Sent: Wed 28-04-2010 17:45
To: solr-user@lucene.apache.org
Subject: Re: Using Solr with CouchDB

Hey Brendan! Thanks for your response.

I don't know much about couch, but if you want to return JSON from Solr (which I think couch would understand) you can do that with wt=json in the query string when querying Solr. See here for more details: http://wiki.apache.org/solr/SolJSON

Actually I'm looking for the other way around. I'm trying to get Solr to index my CouchDB. CouchDB works with a REST API and returns plaintext JSON. So I'm looking to get JSON into Solr and not out of it :) On the CouchDB wiki I've found a reference to a project, CouchDB Solr2, which seemed to do exactly what I'm trying to do (full text indexing and searching with CouchDB), but it is no longer maintained as of January 2009 and cannot be found anymore on GitHub. Maybe it's because there is now a simple way to do it in Solr and I just haven't found it yet ;) Patrick
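The "fetch from CouchDB and push into Solr" glue described above can be tiny. A hedged sketch of the conversion step — the "id" uniqueKey mapping and field names are assumptions about the target schema, not part of any real connector: turn a parsed CouchDB JSON document into the add-XML that Solr's XmlUpdateRequestHandler accepts, dropping CouchDB-internal fields like _rev.

```python
from xml.sax.saxutils import escape

def couch_to_solr_xml(doc):
    # CouchDB's _id becomes the Solr uniqueKey field "id" (schema assumption);
    # other underscore-prefixed internals (_rev etc.) are dropped, and list
    # values become repeated <field> elements for multivalued fields.
    fields = [("id", doc["_id"])]
    for key, value in doc.items():
        if key.startswith("_"):
            continue
        values = value if isinstance(value, list) else [value]
        fields.extend((key, v) for v in values)
    body = "".join('<field name="%s">%s</field>' % (escape(k), escape(str(v)))
                   for k, v in fields)
    return "<add><doc>%s</doc></add>" % body

print(couch_to_solr_xml({"_id": "p1", "_rev": "1-ab", "title": "Hello", "tags": ["a", "b"]}))
```

A fetcher would then GET _all_docs?include_docs=true from CouchDB and POST each converted document to Solr's /update handler.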
RE: Re: Using Solr with CouchDB
Whether you need Solr depends on whether you require features such as highlighting, faceting, more-like-this etc. They will not work with CouchDB-Lucene, nor can you, at this moment, use CouchDB-Lucene behind CouchDB-Lounge; although a separate shard can have a sharded Lucene index, you cannot query them through smartproxyd. You need to know what you want to do with fulltext search before choosing, and join CouchDB's mailing list if you haven't already.

-Original message-
From: Patrick Petermair patrick.peterm...@openforce.com
Sent: Wed 28-04-2010 18:03
To: solr-user@lucene.apache.org
Subject: Re: Using Solr with CouchDB

Setting up CouchDB-Lucene is quite easy, but you don't want that I guess.

Yeah, I was thinking about CouchDB-Lucene too (also found it in the CouchDB wiki). It's not like I HAVE to make it work with Solr. If it turns out that it's not possible or a pain in the ass, I'll probably go the easy way with CouchDB-Lucene. Patrick
RE: schema.xml question
You could write your own requestHandler in solrconfig.xml; it'll allow you to predefine parameters for your configured search components.

-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:17
To: solr-user@lucene.apache.org
Subject: schema.xml question

Hello everyone, my question is: is it possible in schema.xml to set a group of fields to use as a default field to query in OR or in AND? Example:

<group name="group_name">
  <field name="a" type="." />
  <field name="b" type="." />
  <field name="c" type="." />
</group>

<defaultSearchField>group_name</defaultSearchField>

Thanks in advance
RE: Re: schema.xml question
A requestHandler works as a URL that can have predefined parameters. By default you will be querying the /select/ requestHandler. It, for instance, predefines the default number of rows to return (10) and returns all fields of a document (*).

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <!-- default values for query parameters -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!--
    <int name="rows">10</int>
    <str name="fl">*</str>
    <str name="version">2.1</str>
    -->
  </lst>
</requestHandler>

But you can also define more complex requestHandlers. The default configuration adds the dismax requestHandler (/dismax/), but it's actually the same as the default requestHandler if you were to define all those configured parameters in your URL. So by defining the parameters in solrconfig.xml, you won't need to pass them in your query. You can of course override predefined parameters, with the exception of parameters defined inside an invariants block. Check the documentation [1] on this subject, but I would suggest you study the shipped solrconfig.xml [2] configuration file; it offers a better explanation of the subject.

[1]: http://wiki.apache.org/solr/SolrConfigXml
[2]: http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/solrconfig.xml

Cheers,

-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:26
To: solr-user@lucene.apache.org
Subject: Re: schema.xml question

For the moment I don't know how to do it, but I'll follow your suggestion :) Thank you very much ... ps. I'm just a novice

2010/5/7 Markus Jelsma markus.jel...@buyways.nl

You could write your own requestHandler in solrconfig.xml; it'll allow you to predefine parameters for your configured search components.
-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:17
To: solr-user@lucene.apache.org
Subject: schema.xml question

Hello everyone, my question is: is it possible in schema.xml to set a group of fields to use as a default field to query in OR or in AND? Example:

<group name="group_name">
  <field name="a" type="." />
  <field name="b" type="." />
  <field name="c" type="." />
</group>

<defaultSearchField>group_name</defaultSearchField>

Thanks in advance
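The precedence rule for requestHandler parameters (the query string overrides configured defaults, but never configured invariants) can be sketched in a few lines. This is an illustration of the rule, not Solr's actual code; the parameter names are just examples.

```python
def resolve_params(defaults, query_params, invariants):
    # Start from the configured defaults, let the URL's query parameters
    # override them, then let configured invariants override everything.
    effective = dict(defaults)
    effective.update(query_params)
    effective.update(invariants)
    return effective

# rows from the URL beats the default, but the invariant wt cannot be changed.
print(resolve_params({"rows": 10, "wt": "xml"}, {"rows": 50, "wt": "json"}, {"wt": "xml"}))
```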
RE: Re: schema.xml question
I forgot, there is actually a proper wiki page on this subject: http://wiki.apache.org/solr/SolrRequestHandler

-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:26
To: solr-user@lucene.apache.org
Subject: Re: schema.xml question

For the moment I don't know how to do it, but I'll follow your suggestion :) Thank you very much ... ps. I'm just a novice

2010/5/7 Markus Jelsma markus.jel...@buyways.nl

You could write your own requestHandler in solrconfig.xml; it'll allow you to predefine parameters for your configured search components.

-Original message-
From: Antonello Mangone antonello.mang...@gmail.com
Sent: Fri 07-05-2010 15:17
To: solr-user@lucene.apache.org
Subject: schema.xml question

Hello everyone, my question is: is it possible in schema.xml to set a group of fields to use as a default field to query in OR or in AND? Example:

<group name="group_name">
  <field name="a" type="." />
  <field name="b" type="." />
  <field name="c" type="." />
</group>

<defaultSearchField>group_name</defaultSearchField>

Thanks in advance
RE: Help indexing PDF files
Hi, The wiki page [1] on this subject will get you started. [1]: http://wiki.apache.org/solr/ExtractingRequestHandler Cheers -Original message- From: Leonardo Azize Martins laz...@gmail.com Sent: Fri 07-05-2010 15:37 To: solr-user@lucene.apache.org; Subject: Help indexing PDF files Hi, I am new in Solr. I would like to index some PDF files. How can I do using example schema from 1.4.0 version? Regards, Leo
RE: Re: Help indexing PDF files
You don't need it, you can use any PDF file. -Original message- From: Leonardo Azize Martins laz...@gmail.com Sent: Fri 07-05-2010 15:45 To: solr-user@lucene.apache.org; Subject: Re: Help indexing PDF files I am using this page, but in my downloaded version there is no site directory. Thanks 2010/5/7 Markus Jelsma markus.jel...@buyways.nl Hi, The wiki page [1] on this subject will get you started. [1]: http://wiki.apache.org/solr/ExtractingRequestHandler Cheers -Original message- From: Leonardo Azize Martins laz...@gmail.com Sent: Fri 07-05-2010 15:37 To: solr-user@lucene.apache.org; Subject: Help indexing PDF files Hi, I am new in Solr. I would like to index some PDF files. How can I do using example schema from 1.4.0 version? Regards, Leo
RE: How to query for similar documents before indexing
Hi, Deduplication [1] is what you're looking for. It can utilize different analyzers that will add one or more signatures or hashes to your document, depending on exact or partial matches for configurable fields. Based on that, it should be able to prevent new documents from entering the index. The first part works very well, but I have some issues with removing those documents, which I also need to check with the community tomorrow back at work ;-)

[1]: http://wiki.apache.org/solr/Deduplication

Cheers,

-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 22:41
To: solr-user@lucene.apache.org
Subject: How to query for similar documents before indexing

Hi, I want to implement the following logic: before I index a new document into the index, I want to check if there are already documents in the index with similar content to the content of the document about to be inserted. If the request returns 1 or more documents, then I don't want to insert the document. What is the best way to achieve the above functionality? I read about Fuzzy searches in logic. But can I really build a request such as mydoc.title:wordexample~ AND mydoc.content:( all the content words)~0.9 ? Thank you for your help
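The mechanism behind [1] in miniature: a signature is computed from the configured fields and stored with the document, and identical field content yields identical signatures, which is how duplicates are detected. A toy sketch of that idea — MD5 here merely stands in for Solr's Lookup3Signature, and everything else is illustrative:

```python
import hashlib

def signature(doc, fields=("content",)):
    # Hash the configured fields' values; identical content -> identical hash.
    h = hashlib.md5()
    for f in fields:
        h.update(str(doc.get(f, "")).encode("utf-8"))
    return h.hexdigest()

docs = [{"id": 1, "content": "same text"},
        {"id": 2, "content": "same text"},
        {"id": 3, "content": "other text"}]

# Keep only the first document seen for each signature.
seen = {}
unique = [d for d in docs if seen.setdefault(signature(d), d["id"]) == d["id"]]
print([d["id"] for d in unique])  # prints [1, 3]: the duplicate collapsed
```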
RE: How to query for similar documents before indexing
Hi Matthieu, On the top of the wiki page you can see it's in 1.4 already. As far as I know the API doesn't return information on found duplicates in its response header; the wiki isn't clear on that subject. I, at least, never saw any other response than an error or the usual status code and QTime. Perhaps it would be a nice feature. On the other hand, you can also have a manual process that finds duplicates based on that signature and gather that information yourself as long as such a feature isn't there. Cheers,

-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 23:30
To: solr-user@lucene.apache.org
Subject: RE: How to query for similar documents before indexing

Markus, thank you for your response. That would be great if the index has the option to prevent duplicates from entering the index. But is it going to be a silent action? Or will the add method return that it failed indexing because it detected a duplicate? Is it committed to 1.4 already? Cheers, matt

--- On Mon, 5/10/10, Markus Jelsma markus.jel...@buyways.nl wrote:

From: Markus Jelsma markus.jel...@buyways.nl
Subject: RE: How to query for similar documents before indexing
To: solr-user@lucene.apache.org
Date: Monday, May 10, 2010, 4:11 PM

Hi, Deduplication [1] is what you're looking for. It can utilize different analyzers that will add one or more signatures or hashes to your document, depending on exact or partial matches for configurable fields. Based on that, it should be able to prevent new documents from entering the index. The first part works very well, but I have some issues with removing those documents, which I also need to check with the community tomorrow back at work ;-)

[1]: http://wiki.apache.org/solr/Deduplication

Cheers,

-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 22:41
To: solr-user@lucene.apache.org
Subject: How to query for similar documents before indexing

Hi, I want to implement the following logic: before I index a new document into the index, I want to check if there are already documents in the index with similar content to the content of the document about to be inserted. If the request returns 1 or more documents, then I don't want to insert the document. What is the best way to achieve the above functionality? I read about Fuzzy searches in logic. But can I really build a request such as mydoc.title:wordexample~ AND mydoc.content:( all the content words)~0.9 ? Thank you for your help
Dedupe and overwriteDupes setting
List, I've stumbled upon an issue with the deduplication mechanism. It either deletes all documents or does nothing at all, depending on the overwriteDupes setting, resp. true and false. I use a slightly modified configuration:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<field name="sig" type="string" stored="true" indexed="false" multiValued="true" />

After importing new documents I can (only with overwriteDupes=false) clearly see the correct signatures. Most documents have a distinct signature and some share the same because the content field's value is identical for those documents. Anyway, why does it delete all my documents? Any clues? The wiki is not very helpful on this subject. Cheers.

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Dedupe and overwriteDupes setting
It seems this e-mail had already left the outbox yesterday. Apologies for the spam.

On Tuesday 11 May 2010 10:13:18 Markus Jelsma wrote:

List, I've stumbled upon an issue with the deduplication mechanism. It either deletes all documents or does nothing at all, depending on the overwriteDupes setting, resp. true and false. I use a slightly modified configuration:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<field name="sig" type="string" stored="true" indexed="false" multiValued="true" />

After importing new documents I can (only with overwriteDupes=false) clearly see the correct signatures. Most documents have a distinct signature and some share the same because the content field's value is identical for those documents. Anyway, why does it delete all my documents? Any clues? The wiki is not very helpful on this subject. Cheers.

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: How to query for similar documents before indexing
If you set overwriteDupes = false, the exact or near-duplicate documents will not be deleted. The signature field is still set, however, so you can later query for duplicates yourself in an external program and do whatever you want with them.

On Tuesday 11 May 2010 15:41:33 Matthieu Labour wrote:

Hi Markus, thank you for your answer. Here is a use case where I think it would be nice to know there is a dup before I insert it. Let's say I create a summary out of the document, and I only index the summary and store the document itself on a separate device (S3, Cassandra etc ...). Then I would need addDocument on the summary to fail because it detected a duplicate, so that I don't need to store the document. When you write: "On the other hand, you can also have a manual process that finds duplicates based on that signature and gather that information yourself as long as such a feature isn't there." Can you explain more what you have in mind? Thank you for your help! matt

--- On Mon, 5/10/10, Markus Jelsma markus.jel...@buyways.nl wrote:

From: Markus Jelsma markus.jel...@buyways.nl
Subject: RE: How to query for similar documents before indexing
To: solr-user@lucene.apache.org
Date: Monday, May 10, 2010, 5:07 PM

Hi Matthieu, On the top of the wiki page you can see it's in 1.4 already. As far as I know the API doesn't return information on found duplicates in its response header; the wiki isn't clear on that subject. I, at least, never saw any other response than an error or the usual status code and QTime. Perhaps it would be a nice feature. On the other hand, you can also have a manual process that finds duplicates based on that signature and gather that information yourself as long as such a feature isn't there.

Cheers,

-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 23:30
To: solr-user@lucene.apache.org
Subject: RE: How to query for similar documents before indexing

Markus, thank you for your response. That would be great if the index has the option to prevent duplicates from entering the index. But is it going to be a silent action? Or will the add method return that it failed indexing because it detected a duplicate? Is it committed to 1.4 already? Cheers, matt

--- On Mon, 5/10/10, Markus Jelsma markus.jel...@buyways.nl wrote:

From: Markus Jelsma markus.jel...@buyways.nl
Subject: RE: How to query for similar documents before indexing
To: solr-user@lucene.apache.org
Date: Monday, May 10, 2010, 4:11 PM

Hi, Deduplication [1] is what you're looking for. It can utilize different analyzers that will add one or more signatures or hashes to your document, depending on exact or partial matches for configurable fields. Based on that, it should be able to prevent new documents from entering the index. The first part works very well, but I have some issues with removing those documents, which I also need to check with the community tomorrow back at work ;-)

[1]: http://wiki.apache.org/solr/Deduplication

Cheers,

-Original message-
From: Matthieu Labour matthieu_lab...@yahoo.com
Sent: Mon 10-05-2010 22:41
To: solr-user@lucene.apache.org
Subject: How to query for similar documents before indexing

Hi, I want to implement the following logic: before I index a new document into the index, I want to check if there are already documents in the index with similar content to the content of the document about to be inserted. If the request returns 1 or more documents, then I don't want to insert the document. What is the best way to achieve the above functionality? I read about Fuzzy searches in logic. But can I really build a request such as mydoc.title:wordexample~ AND mydoc.content:( all the content words)~0.9 ?

Thank you for your help

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
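The manual process suggested above can be as simple as faceting on the signature field and keeping every value that occurs in two or more documents. A sketch of building that request; the host and field name are assumptions, and note that the signature field must be indexed for faceting to work.

```python
from urllib.parse import urlencode

def duplicate_signatures_url(base="http://localhost:8983/solr/select", field="sig"):
    # Facet on the signature field; facet.mincount=2 keeps only signatures
    # shared by at least two documents, i.e. the duplicate groups.
    params = {
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.mincount": 2,
    }
    return base + "?" + urlencode(params)

print(duplicate_signatures_url())
```

Each returned facet value identifies a duplicate group; a follow-up query like sig:&lt;value&gt; fetches its members so the external program can decide which ones to keep or delete.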
RE: Re: Dedupe and overwriteDupes setting
Thanks Mark, I already fixed it in the meantime and quickly went on with the usual stuff; I know, bad me =). I'll file a Jira report tomorrow and update the wiki on this subject. I can also file another ticket from another current topic on this subject; that's about a proper use-case for the update handler to return information on which documents were rejected due to dedupe. I would like to think that updating the wiki with links to those new Jira tickets would be a good idea for other readers, is it not? Cheers,

-Original message-
From: Mark Miller markrmil...@gmail.com
Sent: Tue 11-05-2010 17:25
To: solr-user@lucene.apache.org
Subject: Re: Dedupe and overwriteDupes setting

1. You need to set the sig field to indexed.
2. This should be added to the wiki.
3. Want to make a JIRA issue? This is not very friendly behavior (when you have the sig field set to indexed=false and overwriteDupes=true it should likely complain).

--
- Mark
http://www.lucidimagination.com

On 5/11/10 4:13 AM, Markus Jelsma wrote:

List, I've stumbled upon an issue with the deduplication mechanism. It either deletes all documents or does nothing at all, depending on the overwriteDupes setting, resp. true and false. I use a slightly modified configuration:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

<field name="sig" type="string" stored="true" indexed="false" multiValued="true" />

After importing new documents I can (only with overwriteDupes=false) clearly see the correct signatures. Most documents have a distinct signature and some share the same because the content field's value is identical for those documents. Anyway, why does it delete all my documents? Any clues? The wiki is not very helpful on this subject. Cheers.

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Config issue for deduplication
What's your solrconfig? No deduplication means overwriteDupes = false and a signature field other than the (unique) doc ID field.

-Original message-
From: Markus Fischer i...@flyingfischer.ch
Sent: Thu 13-05-2010 17:01
To: solr-user@lucene.apache.org
Subject: Config issue for deduplication

I am trying to configure automatic deduplication for SOLR 1.4 in VuFind. I followed: http://wiki.apache.org/solr/Deduplication

Actually nothing happens. All records are being imported without any deduplication. What am I missing? Thanks, Markus

I did: create a duplicated set of records, only shifted their ID by a fixed number.

solrconfig.xml:

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
</requestHandler>

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">true</bool>
    <str name="signatureField">dedupeHash</str>
    <str name="fields">reference,issn</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

In schema.xml I added the field:

<field name="dedupeHash" type="string" stored="true" indexed="true" multiValued="false" />

If I look at the created field dedupeHash, it seems to be empty...!?
RE: Solr read-only core
Hi, I'd guess there are two ways of doing this, but I've never seen a solrconfig.xml file with directives that explicitly disallow updates. You'd either put a proxy in front that won't allow any HTTP method other than GET and HEAD, or you could remove the update request handler from your solrconfig.xml file. I've never tried the latter, but I'd figure that without a request handler to accommodate updates, no updates can be made. Cheers,

-Original message-
From: Yao y...@ford.com
Sent: Tue 25-05-2010 21:49
To: solr-user@lucene.apache.org
Subject: Solr read-only core

Is there a way to open a Solr index/core in read-only mode?

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-read-only-core-tp843049p843049.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Filtering near-duplicates using TextProfileSignature
Here's my config for the updateProcessor. It now uses another signature method, but I've used TextProfileSignature as well and it works, sort of.

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">sig</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">content</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Of course, you must define the updateProcessor in your requestHandler; it's commented out in mine at the moment.

<requestHandler name="/update" class="solr.XmlUpdateRequestHandler">
  <!--
  <lst name="defaults">
    <str name="update.processor">dedupe</str>
  </lst>
  -->
</requestHandler>

Also, I see you define minTokenLen=3. Where does that come from? I haven't seen anything on the wiki specifying such a parameter.

On Tuesday 08 June 2010 19:45:35 Neeb wrote:

Hey Andrew, just wondering if you ever managed to run TextProfileSignature-based deduplication. I would appreciate it if you could send me the code fragment for it from solrconfig. I currently have something like this, but I'm not sure if I am doing it right:

<updateRequestProcessorChain name="dedupe">
  <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <bool name="overwriteDupes">true</bool>
    <str name="fields">title,author,abstract</str>
    <str name="signatureClass">org.apache.solr.update.processor.TextProfileSignature</str>
    <str name="minTokenLen">3</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>

Thanks in advance, -Ali

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
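For readers wondering how TextProfileSignature differs from an exact hash like Lookup3Signature: it quantizes token frequencies before hashing, so texts that differ only slightly can still produce the same signature. A rough, hedged sketch of that idea; the real Nutch/Solr implementation differs in its tokenization, quantization and hash function.

```python
import hashlib
import re

def text_profile_signature(text, min_token_len=2, quant_rate=0.01):
    # Tokenize and count; short tokens are discarded (cf. minTokenLen).
    tokens = [t for t in re.findall(r"\w+", text.lower()) if len(t) >= min_token_len]
    counts = {}
    for t in tokens:
        counts[t] = counts.get(t, 0) + 1
    # Quantize counts: the bucket size grows with the most frequent term, so
    # small wording changes in a long text don't change the profile.
    max_freq = max(counts.values(), default=0)
    quant = max(1, int(max_freq * quant_rate))
    profile = sorted((t, (c // quant) * quant)
                     for t, c in counts.items() if c // quant > 0)
    return hashlib.md5(repr(profile).encode("utf-8")).hexdigest()
```

The signature is stable for identical input and, with a coarse enough quantization, also for near-identical input.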
Re: Filtering near-duplicates using TextProfileSignature
Well, it got me too! KMail didn't properly order this thread. Can't seem to find Hatcher's reply anywhere. ??!!? On Tuesday 08 June 2010 22:00:06 Andrew Clegg wrote: Andrew Clegg wrote: Re. your config, I don't see a minTokenLength in the wiki page for deduplication, is this a recent addition that's not documented yet? Sorry about this -- stupid question -- I should have read back through the thread and refreshed my memory. Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Issue with response header in SOLR running on Linux instance
Hi, Check your requestHandler. It may preset some values that you don't see. Your echoParams setting may be explicit instead of all [1]. Alternatively, you could add the echoParams parameter to your query if it isn't set as an invariant in your requestHandler.

[1]: http://wiki.apache.org/solr/CoreQueryParameters

Cheers,

On Wednesday 09 June 2010 15:25:09 bbarani wrote:

Hi, I have been using SOLR for some time now and had no issues while I was using it on Windows. Yesterday I moved the SOLR code to Linux servers and started to index the data. Indexing completed successfully on the Linux servers, but when I queried the index, the response header returned (by the SOLR instance running on the Linux server) is different from the response header returned by the SOLR instance running on Windows.

Response header returned by the SOLR instance running on the Windows machine:

<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">2219</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">credit</str>
    <str name="version">2.2</str>
    <str name="rows">10</str>
  </lst>
</lst>

Response header returned by the SOLR instance running on the Linux machine:

<response>
  <responseHeader>
    <status>0</status>
    <QTime>26</QTime>
    <lst name="params">
      <str name="q">credit</str>
    </lst>
  </responseHeader>

Any idea why this happens? Thanks, Barani

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Re: Solr and Nutch/Droids - to use or not to use?
Nutch does not, at this moment, support some form of consistent hashing to select an appropriate shard. It would be nice if someone could file an issue in Nutch's Jira to add sharding support to it; perhaps someone with a better understanding of and more experience with Solr's distributed search than I have at the moment, as I can't point Nutch's developers to the right piece of documentation on this one ;)

-Original message-
From: Otis Gospodnetic otis_gospodne...@yahoo.com
Sent: Wed 16-06-2010 21:03
To: solr-user@lucene.apache.org
Subject: Re: Solr and Nutch/Droids - to use or not to use?

Hi Mitch, Solr can do distributed search, so it can definitely handle indices that can't fit on a single server without sharding. What I think *might* be the case is that the Nutch indexer that sends docs to Solr might not be capable of sending documents to multiple Solr cores/shards. If that is the case, I think you need to move this to the Nutch user/dev list and see how to feed multiple Solr indices/cores/shards with Nutch data.

Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/

- Original Message

From: MitchK mitc...@web.de
To: solr-user@lucene.apache.org
Sent: Wed, June 16, 2010 2:27:16 PM
Subject: Re: Solr and Nutch/Droids - to use or not to use?

Thanks, that really helps to find the right beginning for such a journey. :-)

* Use Solr, not Nutch's search webapp

As far as I have read, Solr can't scale if the index gets too large for one server:

"The setup explained here has one significant caveat you also need to keep in mind: scale. You cannot use this kind of setup with vertical scale (collection size) that goes beyond one Solr box. The horizontal scaling (query throughput) is still possible with the standard Solr replication tools." ...from Lucidimagination.com

Is this still the case?

Furthermore, as far as I have understood this blogpost: http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/ (Lucidimagination.com: Nutch and Solr), they index the whole stuff with Nutch and reindex it into Solr, which sounds like a lot of redundant work. Lucid, Sematext and the Nutch wiki are the only information sources where I can find talks about Nutch and Solr, but no one seems to talk about these facts, except this one blogpost. If you say this is wrong or contingent on the shown setup, can you tell me how to avoid these problems? A lot of questions, but it's such an exciting topic... Hopefully you can answer some of them. Again, thank you for the feedback, Otis. - Mitch

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Nutch-Droids-to-use-or-not-to-use-tp900069p900604.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Re: Re: Solr and Nutch/Droids - to use or not to use?
You're right. Currently clients need to take care of this; in this case Nutch would be the client, but it cannot be configured as such. It would, indeed, be more appropriate for Solr to take care of this. We can already query any server with a set of shard hosts specified, so it would make sense if Solr also supported some kind of consistent hashing and shard management configuration. With CouchDB-Lounge we can easily create a shard map that supports redundant shards on different servers for fail-over. It would be marvelous if Solr supported this as well. -Original message- From: Otis Gospodnetic otis_gospodne...@yahoo.com Sent: Wed 16-06-2010 21:41 To: solr-user@lucene.apache.org; Subject: Re: Re: Solr and Nutch/Droids - to use or not to use? Well, it's not that Nutch doesn't support it. Solr itself doesn't support it. Indexing applications need to know which shard they want to send documents to. This may be a good case for a new wish issue in Solr JIRA? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/
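The consistent-hashing idea discussed above can be sketched in a few lines. This is a hypothetical illustration of how an indexing client could pick a shard per document; it is not part of Nutch or Solr, and the shard URLs and virtual-node count are made-up assumptions.

```python
import hashlib
from bisect import bisect

def build_ring(shards, vnodes=64):
    # Place each shard at several points on a hash ring (virtual nodes),
    # so removing one shard only remaps a fraction of the documents.
    ring = []
    for shard in shards:
        for i in range(vnodes):
            h = int(hashlib.md5(f"{shard}#{i}".encode()).hexdigest(), 16)
            ring.append((h, shard))
    ring.sort()
    return ring

def pick_shard(ring, doc_id):
    # Hash the document id and walk clockwise to the next shard point.
    h = int(hashlib.md5(doc_id.encode()).hexdigest(), 16)
    idx = bisect([p for p, _ in ring], h) % len(ring)
    return ring[idx][1]

# Illustrative shard URLs; a real deployment would list its own cores.
shards = ["http://solr1:8983/solr", "http://solr2:8983/solr"]
ring = build_ring(shards)
shard = pick_shard(ring, "http://example.com/page.html")
```

The same document id always maps to the same shard, which is the property an indexer needs when deciding where to send updates.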
RE: federated / meta search
Hi, Check out Solr's sharding [1] capabilities. I never tested it with different schemas, but if each node is queried only with fields that it supports, it should return useful results. [1]: http://wiki.apache.org/solr/DistributedSearch Cheers. -Original message- From: Sascha Szott sz...@zib.de Sent: Thu 17-06-2010 19:44 To: solr-user@lucene.apache.org; Subject: federated / meta search Hi folks, if I'm seeing it right, Solr currently does not provide any support for federated / meta searching. Therefore, I'd like to know if anyone has already put effort in this direction. Moreover, is federated / meta search considered a scenario Solr should be able to deal with at all, or is it (far) beyond the scope of Solr? To be more precise, I'll give you a short explanation of my requirements. Assume there are a couple of Solr instances running at different places. The documents stored within those instances are all from the same domain (bibliographic records), but it cannot be ensured that the schema definitions conform 100%. But let's say there are at least some index fields that are present in all instances (fields with the same name and type definition). Now, I'd like to perform a search on all instances at the same time (with the restriction that the query contains only those fields that overlap among the different schemas) and combine the results in a reasonable way by utilizing the score information associated with each hit. Please note that due to legal issues it is not feasible to build a single index that integrates the documents of all Solr instances under consideration. Thanks in advance, Sascha
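A distributed query along the lines suggested above boils down to one extra request parameter. The sketch below builds such a request; the host names and field names are illustrative assumptions, and `fl` is restricted to the fields assumed to exist in every schema.

```python
from urllib.parse import urlencode

# Sketch: query two hypothetical Solr nodes at once via the shards
# parameter, requesting only the fields shared by both schemas.
params = {
    "q": "title:lucene",
    "fl": "id,title,score",  # only the overlapping fields
    "shards": "solr-a:8983/solr,solr-b:8983/solr",
}
url = "http://solr-a:8983/solr/select?" + urlencode(params)
```

The node receiving the request fans it out to every shard listed and merges the results by score, which is exactly the combination step Sascha asks about.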
RE: remove from list
If you want to unsubscribe, then you can do so [1] without trying to sell something ;) [1]: http://lucene.apache.org/solr/mailing_lists.html Cheers! -Original message- From: Susan Rust su...@achieveinternet.com Sent: Wed 23-06-2010 18:23 To: solr-user@lucene.apache.org; Erik Hatcher erik.hatc...@gmail.com; Subject: remove from list Hey SOLR folks -- There's too much info for me to digest, so please remove me from the email threads. However, if we can build you a forum, bulletin board or other web-based tool, please let us know. For that matter, we would be happy to build you a new website. Bill O'Connor is our CTO and the Drupal.org SOLR Redesign Lead. So we love SOLR! Let us know how we can support your efforts. Susan Rust VP of Client Services "If you wish to travel quickly, go alone. If you wish to travel far, go together." Achieve Internet 1767 Grand Avenue, Suite 2 San Diego, CA 92109 800-618-8777 x106 858-453-5760 x106 Susan-Rust (skype) @Susan_Rust (twitter) @Achieveinternet (twitter) @drupalsandiego (San Diego Drupal Users' Group Twitter) This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail. Please notify the sender immediately by e-mail if you have received this e-mail by mistake and delete this e-mail from your system. E-mail transmission cannot be guaranteed to be secure or error-free as information could be intercepted, corrupted, lost, destroyed, arrive late or incomplete, or contain viruses. The sender therefore does not accept liability for any errors or omissions in the contents of this message, which arise as a result of e-mail transmission. If verification is required please request a hard-copy version. On Jun 23, 2010, at 1:52 AM, Mark Allan wrote: Cheers, Geert-Jan, that's very helpful.
We won't always be searching with dates, and we wouldn't want duplicates to show up in the results, so your second suggestion looks like a good workaround if I can't solve the actual problem. I didn't know about FieldCollapsing, so I'll definitely keep it in mind. Thanks Mark On 22 Jun 2010, at 3:44 pm, Geert-Jan Brits wrote: Perhaps my answer is useless, because I don't have an answer to your direct question, but: you *might* want to consider whether your concept of a Solr document is on the correct granular level, i.e. your problem as posted could (afaik) be tackled by defining a document to be a 'sub-event' with only one date range. So each event doc you have now is replaced by several sub-event docs in this proposed situation. Additionally, each sub-event doc gets an extra field 'parent-eventid' which maps to something like an event id (which you're probably using), so several sub-event docs can point to the same event id. Lastly, all sub-event docs belonging to a particular event carry all the other fields that you may have stored in that particular event doc. Now you can query for events based on date ranges like you envisioned, but instead of returning events you return sub-event docs. However, since all data of the original event (except the multiple date ranges) is available in the sub-event doc, this shouldn't really bother the client. If you need to display all dates of an event (the only info missing from the returned Solr doc), you could easily store them in an RDB and fetch them using the defined parent-eventid. The only caveat I see is that multiple sub-events with the same 'parent-eventid' might get returned for a particular query. This however depends on the type of queries you envision, i.e.: 1) If you always issue queries with date filters, and *assuming* that sub-events of a particular event don't temporally overlap, you will never get multiple sub-events returned.
2) If 1) doesn't hold, and assuming you *do* mind getting multiple sub-events of the same actual event, you could try to use Field Collapsing on 'parent-eventid' to only return the first sub-event per parent-eventid that matches the rest of your query. (Note however, that Field Collapsing is a patch at the moment: http://wiki.apache.org/solr/FieldCollapsing) Not sure if this helped you at all, but at the very least it was a nice conceptual exercise ;-) Cheers, Geert-Jan 2010/6/22 Mark Allan mark.al...@ed.ac.uk Hi all, Firstly, I apologise for the length of this email, but I need to describe properly what I'm doing before I get to the problem! I'm working on a project just now which requires the ability to store and search on temporal coverage data - i.e. a field which specifies a date range during which a certain event took place. I hunted around for a few days and couldn't find anything which seemed to fit, so I had a go at writing my
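The denormalisation Geert-Jan describes can be sketched as a small transformation from one event record into several sub-event documents. The field names (`parent_eventid`, `start`, `end`) and the sample event are illustrative assumptions, not a schema anyone in the thread actually posted.

```python
def flatten_event(event):
    # One sub-event document per date range; every other field of the
    # original event is copied into each sub-event doc.
    docs = []
    for n, (start, end) in enumerate(event["dateranges"]):
        doc = {k: v for k, v in event.items() if k != "dateranges"}
        doc["id"] = f'{event["id"]}-{n}'          # unique per sub-event
        doc["parent_eventid"] = event["id"]       # back-reference
        doc["start"], doc["end"] = start, end
        docs.append(doc)
    return docs

# Illustrative event with two date ranges.
event = {"id": "ev42", "title": "Festival",
         "dateranges": [("2010-06-01", "2010-06-03"),
                        ("2010-07-10", "2010-07-12")]}
subdocs = flatten_event(event)
```

Each resulting doc carries a single date range, so a plain range query matches it directly, and `parent_eventid` is what a field-collapsing step would group on.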
Re: Cache hits exposed by API
Hi, The AdminRequestHandler exposes a JSP [1] that'll return a nice XML document with all the information you need about cache statistics and more. [1]: http://localhost:8983/solr/admin/stats.jsp Cheers, On Tuesday 29 June 2010 15:52:56 Na_D wrote: This is just an enquiry. I just wanted to know if the cache hit rates of Solr are exposed via the API? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
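Since the stats page returns XML, pulling one number out of it is a small parsing exercise. The snippet below works on a trimmed, illustrative stand-in for the stats document; the real stats.jsp output has more nesting and more entries, so treat the structure here as an assumption.

```python
import xml.etree.ElementTree as ET

# Simplified, made-up excerpt standing in for what stats.jsp returns;
# in practice you would fetch the real document over HTTP first.
sample = """
<solr>
  <entry>
    <name>queryResultCache</name>
    <stats>
      <stat name="hitratio">0.87</stat>
    </stats>
  </entry>
</solr>
"""

root = ET.fromstring(sample)
hitratio = None
for entry in root.iter("entry"):
    # Match the cache we care about, then read its hitratio stat.
    if (entry.findtext("name") or "").strip() == "queryResultCache":
        for stat in entry.iter("stat"):
            if stat.get("name") == "hitratio":
                hitratio = float(stat.text)
```

A monitoring script can poll this periodically and alert when the hit ratio drops.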
RE: Disabling Access to Solr Admin Panel
Hi, Check out the wiki [1] on this subject. [1]: http://wiki.apache.org/solr/SolrSecurity Cheers, -Original message- From: Vladimir Sutskever vladimir.sutske...@jpmorgan.com Sent: Tue 29-06-2010 18:05 To: solr-user@lucene.apache.org; Subject: Disabling Access to Solr Admin Panel Hi All, How can I forbid access to the SOLR index admin panel? Can I configure this in the /jetty.xml? I understand that it's not true security, considering update/delete/re-indexing commands will still be allowed via GET requests. Kind regards, Vladimir Sutskever Investment Bank - Technology JPMorgan Chase, Inc. This email is confidential and subject to important disclaimers and conditions including on offers for the purchase or sale of securities, accuracy and completeness of information, viruses, confidentiality, legal privilege, and legal entity disclaimers, available at http://www.jpmorgan.com/pages/disclosures/email.
RE: Re: Faceted search outofmemory
http://wiki.apache.org/solr/SimpleFacetParameters#facet.limit -Original message- From: olivier sallou olivier.sal...@gmail.com Sent: Tue 29-06-2010 20:11 To: solr-user@lucene.apache.org; Subject: Re: Faceted search outofmemory How do I page over facets? 2010/6/29 Ankit Bhatnagar abhatna...@vantage.com Did you try paging them? -Original Message- From: olivier sallou [mailto:olivier.sal...@gmail.com] Sent: Tuesday, June 29, 2010 2:04 PM To: solr-user@lucene.apache.org Subject: Faceted search outofmemory Hi, I am trying a faceted search on a very large index (around 200GB with 200M docs) and I get an out of memory error. With no facets it works fine. There are quite a few questions around this, but I could not find the answer: how can we know the required memory when facets are used, so that I can scale my server/index correctly to handle it? Thanks Olivier
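Facet paging just combines the `facet.limit` and `facet.offset` parameters the wiki link above points at. A minimal sketch, with an illustrative field name:

```python
from urllib.parse import urlencode

def facet_page(field, page, per_page=100):
    # Request one page of facet values (no documents, rows=0).
    # Note this pages the response, not the memory Solr itself needs
    # to compute the facet counts.
    return urlencode({
        "q": "*:*",
        "rows": 0,
        "facet": "true",
        "facet.field": field,
        "facet.limit": per_page,
        "facet.offset": page * per_page,
    })

# Third page (values 200-299) of a hypothetical "category" facet.
qs = facet_page("category", page=2)
```

Appending `qs` to the select handler URL fetches facet values 200 through 299.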
RE: Re: Disable Solr Response Formatting
Hi, My client makes a mess out of your example, but if you mean formatting as in indenting, then send indent=false. It's already false by default, though, so check your requestHandler settings. Cheers, -Original message- From: JohnRodey timothydd...@yahoo.com Sent: Wed 30-06-2010 18:39 To: solr-user@lucene.apache.org; Subject: Re: Disable Solr Response Formatting Oops, let me try that again... By default my SOLR response comes back formatted, like such C/ Is there a way to tell it to return it unformatted? like: C/ -- View this message in context: http://lucene.472066.n3.nabble.com/Disable-Solr-Response-Formatting-tp933785p933793.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr results not updating
Hi, If q=*:* doesn't show your insert, then you forgot the commit: http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 Cheers, -Original message- From: Moazzam Khan moazz...@gmail.com Sent: Tue 06-07-2010 22:09 To: solr-user@lucene.apache.org; Subject: Solr results not updating Hi, I just successfully inserted a document into Solr, but when I search for it, it doesn't show up. Is it a cache issue or something? Is there a way to make sure it was inserted properly, and that it's there? Thanks, Moazzam
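The missing commit is a single `<commit/>` message posted to the update handler. Here is a sketch that builds (but does not send) that request; the URL assumes the default example port.

```python
from urllib.request import Request

# Sketch: an explicit commit after adding documents. Until this is
# sent, newly added documents are invisible to searchers.
req = Request(
    "http://localhost:8983/solr/update",
    data=b"<commit/>",
    headers={"Content-Type": "text/xml"},
)
# A real client would now call urllib.request.urlopen(req);
# shown unsent here so the sketch needs no running server.
```

Passing `data` makes urllib issue a POST, which is what the update handler expects for XML messages.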
RE: /select handler statistics
Hi, I think you're looking for the statistics for the standard request handler. Cheers, -Original message- From: Vladimir Sutskever vladimir.sutske...@jpmorgan.com Sent: Mon 12-07-2010 19:44 To: solr-user@lucene.apache.org; Subject: /select handler statistics Hi All, I am looking at the stats.jsp page in the SOLR admin panel. I do not see statistics for the /select request handler. I want to know the total # of search requests + avg time per request, etc. Am I overlooking something? Kind regards, Vladimir Sutskever Investment Bank - Technology JPMorgan Chase, Inc.
RE: Problem with Wildcard searches in Solr
Hi, The DisMaxQParser does not support wildcards in its q parameter [1]; you must use the LuceneQParser instead. AFAIK, in DisMax, wildcards are treated as part of the search query and may get filtered out in your query analyzer. [1]: http://wiki.apache.org/solr/DisMaxRequestHandler#q Cheers, -Original message- From: imranak imranak...@gmail.com Sent: Mon 12-07-2010 22:40 To: solr-user@lucene.apache.org; Subject: Problem with Wildcard searches in Solr Hi, I am having a problem doing wildcard searches in Lucene syntax using the edismax handler. I have a Solr 4.0 nightly build from the trunk. A general search like 'computer' returns results, but 'com*er' doesn't return any results. Similarly, a search like 'co?mput?r' returns no results. The only type of wildcard search currently working is one with a trailing wildcard (like compute? or comput*). I want to be able to do searches with wildcards at the beginning (*puter) and in the middle (com*er). Could someone please tell me what I am doing wrong and how to fix it. Thanks. Regards, Imran. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961448.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Problem with Wildcard searches in Solr
Hi, Check edismax's JIRA page and its unresolved related issues [1]. AFAIK, it hasn't been committed yet. [1]: https://issues.apache.org/jira/browse/SOLR-1553 Cheers, -Original message- From: imranak imranak...@gmail.com Sent: Mon 12-07-2010 23:55 To: solr-user@lucene.apache.org; Subject: RE: Problem with Wildcard searches in Solr Hi, Thanks for your response. The dismax query parser doesn't support it, but I heard the edismax parser supports all kinds of wildcards. I've been trying it out but without any luck. Could someone please help me with that? I'm unable to make leading and in-the-middle wildcard searches work. Thanks. Imran. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-with-Wildcard-searches-in-Solr-tp961448p961617.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Ranking position in solr
No, it can rebuild for each new searcher [1]. [1]: http://wiki.apache.org/solr/QueryElevationComponent#config-file On Tuesday 13 July 2010 11:02:10 Chamnap Chhorn wrote: The problem is that every time I update elevate.xml, I need to restart the Solr Tomcat service. This feature needs to be updated frequently. How would I handle that? Any idea or other solutions? On Mon, Jul 12, 2010 at 5:45 PM, Ahmet Arslan iori...@yahoo.com wrote: I wonder if there is a proper way to fulfill this requirement. A book has several keyphrases. Each keyphrase consists of one to three words. An author can either buy a keyphrase position or not. Note: each author could buy more than one keyphrase. The keyphrase search must be exact and case sensitive. For example: Book A, keyphrases: agile, web, development. Book B, keyphrases: css, html, web. Let's say the author of Book A buys search result position 1 for keyphrase web; his book should then be in the first position, listed before Book B. Anyone have suggestions on how to implement this in Solr? http://wiki.apache.org/solr/QueryElevationComponent - which is used to elevate results based on editorial decisions - may help. Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: indexing rich documents
Hi, Are you sure you followed the wiki [1] on this subject? There is an example there, but you need Solr 1.4.0 or higher; I'm unsure if just patching 1.3.0 will really do the trick. The patch must then also include Apache Tika, which sits under the hood extracting content and metadata from various formats. [1]: http://wiki.apache.org/solr/ExtractingRequestHandler Cheers, On Tuesday 13 July 2010 14:11:56 satya swaroop wrote: Hi all, I am new to Solr. I followed the wiki and got the Solr admin running successfully. It works well for XML files, but I am unable to index rich documents. I followed the wiki for rich documents too, but it didn't work; the error that comes when I send a PDF/HTML file is a lazy-loading error. Can anyone give a more detailed description of how to make rich documents indexable? I use Tomcat and work in Ubuntu. The home directory for Solr is /opt/solr/example and Catalina home is /opt/tomcat6. thanks regards, swaroop Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
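An ExtractingRequestHandler request, as described on the wiki, posts the raw file as the request body while `literal.id` and `commit` ride along as URL parameters. The sketch below only builds the URL (no file upload, no running server); the document id is a made-up example.

```python
from urllib.parse import urlencode

# Sketch of the extract-handler URL: literal.id assigns the document id
# and commit=true makes the result searchable immediately. The PDF
# itself would go in a multipart POST body, omitted here.
qs = urlencode({"literal.id": "doc1", "commit": "true"})
url = "http://localhost:8983/solr/update/extract?" + qs
```

Tika, bundled with the handler, does the actual content and metadata extraction once the file arrives.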
Re: Tag generation
Check out OpenCalais [1]. Maybe it works for your case and language. [1]: http://www.opencalais.com/ On Thursday 15 July 2010 17:34:31 kenf_nc wrote: A colleague mentioned that he knew of services where you pass some content and it spits out some suggested Tags or Keywords that would be best suited to associate with that content. Does anyone know if there is a contrib to Solr or Lucene that does something like this? Or a third party tool that can be given a solr index or solr query and it comes up with some good Tag suggestions? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Wiki, login and password recovery
Hi, This probably should go to INFRA (to which I'm not subscribed) or something like that. Anyway, for some reason my user/pass won't let me log in anymore, and I'm quite sure my browser still `remembers` the correct combination. I'm unsure whether this is a bug: to get that answer, I need to recover my current password so I can check... But, how convenient, the password recovery mechanism `cannot connect with the mailserver on localhost ERRNO: 60` and times out. Any assistance on this one? Cheers,
RE: Re: Wiki, login and password recovery
This happened just a few hours ago and the problem persists at this very moment. I filed an issue: https://issues.apache.org/jira/browse/INFRA-2884 Cheers! -Original message- From: Chris Hostetter hossman_luc...@fucit.org Sent: Mon 19-07-2010 20:23 To: solr-user@lucene.apache.org; Subject: Re: Wiki, login and password recovery You don't need to subscribe to any infra lists to file an INFRA bug, just use Jira... https://issues.apache.org/jira/browse/INFRA Note that there was infra work this weekend that involved moving servers for the wiki system (as was noted in advance on http://monitoring.apache.org and http://twitter.com/infrabot) so maybe you just got unlucky with the timing? https://blogs.apache.org/infra/entry/new_hardware_for_apache_org : This probably should be in INFRA (to which i'm not subscribed) or : something like that. Anyway, for some reason, my user/pass won't let me : login anymore and i'm quite sure my browser still `remembers` the : correct combination. I'm unsure whether this is a bug: to get that : answer, i need to recover my current password so i can check... But, how : convenient, the password recovery mechanism `cannot connect with the : mailserver on localhost ERRNO: 60` and times out. -Hoss
RE: boosting particular field values
function queries match all documents http://wiki.apache.org/solr/FunctionQuery#Using_FunctionQuery -Original message- From: Justin Lolofie jta...@gmail.com Sent: Wed 21-07-2010 20:24 To: solr-user@lucene.apache.org; Subject: boosting particular field values I'm using the dismax request handler, Solr 1.4. I would like to boost the weight of certain fields according to their values... this appears to work: bq=category:electronics^5.5 However, I think this boosting only affects the ordering of results that have already matched? So if I only get 10 rows back, I might not get any records back that are category electronics. If I get 100 rows, I can see that bq is working. However, I only want to get 10 rows. How does one affect the kinds of results that are matched to begin with? bq is the wrong thing to use, right? Thanks for any help, Justin
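The distinction Justin is circling can be shown side by side: `bq` reorders whatever already matched, while `fq` restricts what matches at all. A sketch with the field and values from his message:

```python
from urllib.parse import urlencode

# bq: electronics docs score higher but non-electronics docs still match,
# so the top 10 rows may contain none of them.
boost_only = urlencode({"q": "ipod", "defType": "dismax",
                        "bq": "category:electronics^5.5", "rows": 10})

# fq: only electronics docs match at all, guaranteeing every returned
# row is in that category.
restrict = urlencode({"q": "ipod", "defType": "dismax",
                      "fq": "category:electronics", "rows": 10})
```

Which one is right depends on whether non-electronics results should ever appear; `fq` also has the advantage of being cached independently of the main query.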
Re: SolrJ Response + JSON
Hi, I got a response to your e-mail in my inbox 30 minutes ago. Anyway, enable the JSONResponseWriter, if you haven't already, and query with wt=json. Can't get much easier. Cheers, On Wednesday 28 July 2010 15:08:26 MitchK wrote: Hello, second try to send a mail to the mailing list... I need to translate SolrJ's response into a JSON response. I cannot query Solr directly, because I need to do some math with the response data before I show the results to the client. Any experience with translating SolrJ's response into JSON without writing your own JSON writer? Thank you. - Mitch Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
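Mitch's middle-tier use case (fetch JSON from Solr, adjust it, then serve it) can be sketched without SolrJ at all. The response literal below is a trimmed, made-up stand-in for a real Solr JSON response, and the score doubling is a placeholder for whatever math the middle tier actually does.

```python
import json
from urllib.parse import urlencode

# Sketch: ask Solr for JSON directly via wt=json ...
qs = urlencode({"q": "*:*", "wt": "json"})

# ... then post-process the (illustrative) response before returning it.
raw = '{"response": {"numFound": 2, "docs": [{"id": "1", "score": 0.5}]}}'
data = json.loads(raw)
for doc in data["response"]["docs"]:
    doc["score"] = round(doc["score"] * 2, 3)  # placeholder for custom math
out = json.dumps(data)
```

This sidesteps translating a SolrJ object graph into JSON by never leaving JSON in the first place.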
RE: Stress Test Solr
Very interesting. Could you add some information and a link to the relevant wiki page [1]? [1]: http://wiki.apache.org/solr/BenchmarkingSolr -Original message- From: Tomas tomasflo...@yahoo.com.ar Sent: Mon 02-08-2010 17:34 To: solr-user@lucene.apache.org; Subject: Stress Test Solr Hi All, we've been building an open source tool for load tests on Solr installations. The tool is called SolrMeter. It's on Google Code at http://code.google.com/p/solrmeter/. Here is some information about it: SolrMeter is a stress testing / performance benchmarking tool for Apache Solr installations. It is licensed under the ASL and developed using Java SE and Swing components, connected to Solr using SolrJ. What can you do with SolrMeter? The main goal of this open source project is to bring the Apache Solr user community a tool for dealing with Solr-specific issues regarding performance and stress testing, like firing queries and adding documents, to make sure that your Solr installation will support real-world load and demands. With SolrMeter you can simulate a work load over the Apache Solr installation and obtain useful visual performance statistics and metrics. Relevant features: * Execute queries against a Solr installation * Execute dummy updates/inserts to the Solr installation; it can be the same server as the queries or a different one * Configure the number of queries to fire in a time interval * Configure the number of updates/inserts in a time period * Configure commit frequency during adds * Monitor error counts when adding and committing documents * Perform and monitor index optimization * Monitor query times online and visually * Add filter queries to the test queries * Add facet capabilities to the test queries * Import/export test configuration * Query time execution histogram chart * Query time distribution chart * Online error log and browsing capabilities * Individual query graphical log and statistics * and much more What do you need to use SolrMeter?
This is one of the most interesting points about SolrMeter: the requirements are minimal. It is simple to install and use. * JRE version 1.6 * The Solr server you want to test. Who can use SolrMeter? Everyone who needs to assess a Solr server's performance. To run the tool you only need to know about Solr. Try it and tell us what you think. SolrMeter group: mailto:solrme...@googlegroups.com What's next? We are now building version 0.2.0; the objective of this new version is to evolve SolrMeter into a pluggable architecture that allows deeper customizations like adding custom statistics, extractors or executors. We are also adding some usability improvements. In future versions we want to add better interaction with Solr request handlers; for example, showing cache statistics online and graphically on a chart would be a great tool. We also want to add more usability features to make SolrMeter a complete tool for testing a Solr installation. For more details on what's next, check the Issues page on the Google Code site.
RE: Phrase search
Well, the WordDelimiterFilterFactory in your query analyzer clearly makes Apple 2 out of Apple2; that's what it's for. If you're looking for an exact match, use a string field. Check the output with the debugQuery=true parameter. Cheers, -Original message- From: johnmu...@aol.com Sent: Mon 02-08-2010 20:18 To: solr-user@lucene.apache.org; Subject: Phrase search Hi All, I don't understand why I'm getting this behavior. I was under the impression that if I search for "Apple 2" (with quotes and a space before 2) it will give me different results vs. searching for "Apple2" (with quotes and no space before 2), but I'm not! Why? Here is my fieldType setting from my schema.xml:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!--
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

What am I missing?! What part of my solr.WordDelimiterFilterFactory needs to change (if that's where the issue is)? I'm using Solr 1.2. Thanks in advance. -M
RE: Re: Phrase search
Hi, Queries on an analyzed field will be analyzed as well, or they might not match. You can configure the WordDelimiterFilterFactory so it will not split into multiple tokens on numerics; see the splitOnNumerics parameter [1]. [1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory Cheers, -Original message- From: johnmu...@aol.com Sent: Mon 02-08-2010 21:29 To: solr-user@lucene.apache.org; Subject: Re: Phrase search Thanks for the quick response. Which part of my WordDelimiterFilterFactory is changing "Apple 2" to "Apple2"? How do I fix it? Also, I'm really confused about this. I was under the impression a phrase search is not impacted by the analyzer, no? -M
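The splitting behavior at the heart of this thread can be illustrated with a toy tokenizer. This regex is only a rough imitation of what WordDelimiterFilterFactory does at a letter/digit boundary, not the filter's actual implementation or the exact effect of the configuration quoted above.

```python
import re

def split_word_delims(token):
    # Imitate splitting on letter/digit transitions: "Apple2" becomes
    # the tokens "Apple" and "2", which is why a phrase search for
    # "Apple 2" can match a document that contained "Apple2".
    return re.findall(r"[A-Za-z]+|[0-9]+", token)

tokens = split_word_delims("Apple2")
```

Because both the indexed text and the quoted phrase pass through the same analyzer, the two queries end up searching for the same token sequence, which is exactly the behavior johnmunir observed.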
RE: Multi word synonyms
Hi,

This happens because your tokenizer will generate separate tokens for `exercise dvds`, so the SynonymFilter will try to find declared synonyms for `exercise` and `dvds` separately. Its behavior is documented [1] on the wiki.

[1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

Cheers,

-Original message-
From: Qwerky neil.j.tay...@hmv.co.uk
Sent: Tue 03-08-2010 18:35
To: solr-user@lucene.apache.org
Subject: Multi word synonyms

I'm having trouble getting multi word synonyms to work. As an example I have the following synonym:

exercise dvds => fitness

When I search for exercise dvds I want to return all docs in the index which contain the keyword fitness. I've read the wiki about solr.SynonymFilterFactory which recommends expanding the synonym when indexing, but I'm not sure this is what I want as none of my documents have the keywords exercise dvds. Here is the field definition from my schema.xml;

When I test my search with the analysis page on the admin console it seems to work fine:

Query Analyzer
org.apache.solr.analysis.WhitespaceTokenizerFactory {}
  term position: 1, 2
  term text: exercise, dvds
  term type: word, word
  source start,end: 0,8 / 9,13
org.apache.solr.analysis.SynonymFilterFactory {ignoreCase=true, synonyms=synonyms.txt, expand=true}
  term position: 1
  term text: fitness
  term type: word
  source start,end: 0,13
org.apache.solr.analysis.TrimFilterFactory {}
  term position: 1
  term text: fitness
  term type: word
  source start,end: 0,13
org.apache.solr.analysis.StopFilterFactory {ignoreCase=true, enablePositionIncrements=true, words=stopwords.txt}
  term position: 1
  term text: fitness
  term type: word
  source start,end: 0,13
org.apache.solr.analysis.LowerCaseFilterFactory {}
  term position: 1
  term text: fitness
  term type: word
  source start,end: 0,13
org.apache.solr.analysis.SnowballPorterFilterFactory {language=English, protected=protwords.txt}
  term position: 1
  term text: fit
  term type: word
  source start,end: 0,13
...but when I perform the search it doesn't seem to use the SynonymFilterFactory; debugQuery on the standard handler shows the query exercise dvds parsed as:

PRODUCTKEYWORDS:exercis PRODUCTKEYWORDS:dvds

-- View this message in context: http://lucene.472066.n3.nabble.com/Multi-word-synomyms-tp1019722p1019722.html Sent from the Solr - User mailing list archive at Nabble.com.
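The mismatch between the analysis page and a real query comes from the query parser splitting on whitespace before analysis runs, so the SynonymFilter only ever sees one token at a time. A minimal sketch of the difference (hypothetical toy code, not Lucene's parser):

```python
SYNONYMS = {("exercise", "dvds"): ["fitness"]}  # models "exercise dvds => fitness"

def analyze(tokens):
    """Toy synonym filter: maps a whole token sequence to its synonym, if any."""
    return SYNONYMS.get(tuple(tokens), list(tokens))

# Admin analysis page: the whole query string goes through ONE analyzer call.
analysis_page = analyze("exercise dvds".split())
# -> ['fitness']

# Standard query parser: splits on whitespace FIRST, then analyzes each word.
parsed = [tok for word in "exercise dvds".split() for tok in analyze([word])]
# -> ['exercise', 'dvds'] -- the two-word synonym never gets a chance to fire
```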
RE: Indexing fieldvalues with dashes and spaces
You shouldn't fetch faceting results from analyzed fields; it will mess with your results. Search on analyzed fields, but don't retrieve facet values from them.

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:15
To: solr-user@lucene.apache.org
Subject: RE: Indexing fieldvalues with dashes and spaces

I changed the values to text_ws. Now I only seem to have problems with field values that hold spaces, see below:

<field name="city" type="text_ws" indexed="true" stored="true"/>
<field name="theme" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true" termVectors="true"/>
<field name="features" type="text_ws" indexed="true" stored="true" multiValued="true"/>
<field name="services" type="text_ws" indexed="true" stored="true" multiValued="true"/>
<field name="province" type="text_ws" indexed="true" stored="true"/>

It has now become:

facet_counts:{
  facet_queries:{},
  facet_fields:{
    theme:[
      Gemeentehuis,2,
      ,1,              (still is created as a separate facet)
      Strand,1,
      Zee,1],
    features:[
      Cafe,3,
      Danszaal,2,
      Tuin,2,
      Strand,1],
    province:[
      Gelderland,1,
      Utrecht,1,
      Zuid-Holland,1], (this is now correct)
    services:[
      Exclusieve,2,
      Fotoreportage,2,
      huur,2,
      Live,1,          (Live muziek is split and separate facets are created)
      muziek,1]},
  facet_dates:{}}}

-- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023787.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Indexing fieldvalues with dashes and spaces
Hmm, you should first read a bit more on schema design on the wiki and learn about indexing and querying Solr. The copyField directive is what is commonly used in a faceted navigation system: search on analyzed fields, show faceting results using the primitive string field type. With copyField you can, well, copy the field from one to another without it being analyzed by the first, so no chaining is possible, which is good. Let's say you have a city field you want to navigate with, but also search in; then you would have an analyzed field for search and a string field for displaying the navigation. But do check the wiki on this subject.

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Wed 04-08-2010 22:23
To: solr-user@lucene.apache.org
Subject: RE: Indexing fieldvalues with dashes and spaces

Sorry, but I'm a newbie to Solr... how would I change my schema.xml to match your requirements? And what do you mean by "it will mess with your results"? What will happen then?

-- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-tp1023699p1023824.html Sent from the Solr - User mailing list archive at Nabble.com.
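The "mess with your results" from the reply above is just token-level counting; a small sketch of what faceting sees on a tokenized field versus a string copyField (illustrative only, with field values taken from this thread):

```python
from collections import Counter

docs = ["Live muziek", "Exclusieve huur", "Live muziek"]

# Faceting on an analyzed (whitespace-tokenized) field counts per token:
analyzed_facets = Counter(tok for doc in docs for tok in doc.split())
# -> {'Live': 2, 'muziek': 2, 'Exclusieve': 1, 'huur': 1}

# Faceting on a non-analyzed string copyField counts per whole value:
raw_facets = Counter(docs)
# -> {'Live muziek': 2, 'Exclusieve huur': 1}
```

The second Counter is what you want in a facet list; the first is what an analyzed field gives you.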
RE: Re: Load cores without restarting/reloading Solr
http://wiki.apache.org/solr/CoreAdmin

-Original message-
From: Karthik K karthikkato...@gmail.com
Sent: Thu 05-08-2010 12:00
To: solr-user@lucene.apache.org
Subject: Re: Load cores without restarting/reloading Solr

Can someone please answer this. Is there a way of creating/adding a core and starting it without having to reload Solr?
RE: dismax debugging hyphens dashes
Well, that smells like the WordDelimiterFilterFactory [1]. It splits, as your debug output shows, the value into three separate tokens. This means that (at least) the strings 'abc', '12' and 'def' are in your index and can be found; the 'abc12' value is not present. If you want to query for substrings, you can try the NGramFilterFactory [2]. It's not really documented on the wiki, but searching will help [3].

[1]: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
[2]: http://search.lucidimagination.com/search/document/CDRG_ch05_5.5.6
[3]: http://www.lucidimagination.com/blog/2009/09/08/auto-suggest-from-popular-queries-using-edgengrams/

-Original message-
From: j jta...@gmail.com
Sent: Sat 07-08-2010 19:18
To: solr-user@lucene.apache.org
Subject: dismax debugging hyphens dashes

How does one debug the index vs. the dismax query parser? I have a solr instance with 1 document whose title is ABC12-def. I am using dismax. While abc, 12, and def do match, abc12 and def do not. Here is the parsedquery toString; I'm having trouble understanding it: +(id:abc12^3.0 | title:(abc12 abc) 12^1.5) (id:abc12^3.0 | title:(abc12 abc) 12^1.5) Does anyone have advice for getting this to work?
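An NGram filter makes 'abc12' findable because every character substring within the configured size range becomes an indexed token; a quick sketch of the token set such a filter would emit (illustrative, with made-up min/max gram sizes):

```python
def ngrams(term, min_n=2, max_n=5):
    """All character n-grams of term, as an NGram token filter would emit them."""
    return [term[i:i + n]
            for n in range(min_n, max_n + 1)
            for i in range(len(term) - n + 1)]

tokens = ngrams("abc12def")
# 'abc12' (length 5) is now an indexed token, so the substring query matches:
print("abc12" in tokens)      # True
# Anything longer than max_n, like 'abc12def', still is not:
print("abc12def" in tokens)   # False
```

The trade-off is index size: every term fans out into many gram tokens.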
RE: Re: Facet Fields - ID vs. Display Value
Well, you can do both, of course, but there's no need for additional code if you get it for free. I'd prefer, as most would I assume, to use the label as a facet field.

-Original message-
From: Frank A fsa...@gmail.com
Sent: Tue 10-08-2010 01:11
To: solr-user@lucene.apache.org
Subject: Re: Facet Fields - ID vs. Display Value

What I meant (which I realize now wasn't very clear) was: if I have something like categoryID and categorylabel, is the normal practice to define categoryID as the facet field and then have the UI layer display the label? Or would it be normal to directly use categorylabel as the facet field?

On Mon, Aug 9, 2010 at 6:01 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Frank, I'm not sure what you mean by that. If the question is about what should be shown in the UI, it should be something pretty and human-readable, such as the original facet string value, assuming it was nice and clean. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

- Original Message From: Frank A fsa...@gmail.com To: solr-user@lucene.apache.org Sent: Mon, August 9, 2010 5:19:57 PM Subject: Facet Fields - ID vs. Display Value Is there a general best practice on whether facet fields should be on IDs or display values? -Frank
RE: Re: uniqueKey and custom fieldType
Using copyField to copy it to an analyzed field will do the trick.

-Original message-
From: j jta...@gmail.com
Sent: Sun 15-08-2010 20:30
To: solr-user@lucene.apache.org
Subject: Re: uniqueKey and custom fieldType

Hi Erick, thanks - your explanation makes sense. But how then do I make my unique field useful in terms of searching? If I have a unique column id with value sometexthere-1234567 and want it to match the query '1234567', I need to use an analyzer to split up the parts around the hyphen/dash. I guess I could make a copy of that field in another field which gets analyzed? Thanks for any advice.

The short answer is that unique keys should be a single term. String types are guaranteed to be single terms, since they aren't analyzed. Your SplitUpStuff type *does* analyze terms, and can make multiple tokens out of single strings via the WordDelimiterFilterFactory. A common error when thinking about the string type is not understanding that it is NOT analyzed; it's indexed as a single term. So when you define a uniqueKey of type string, it behaves as you expect, that is, documents are updated if the ID field matches exactly, case, spaces, order and all. By introducing your SplitUpStuff type as the uniqueKey, well, I don't even know what behavior I'd expect, and whatever behavior I happened to observe would not be guaranteed to be the behavior of the next release. Consider what you're asking for and you can see why you don't want to analyze your uniqueKey field. Consider the following simple text type (where each word is a term). You have two values from two different docs: doc1: this is a nice unique key; doc2: My Keys are Unique and Nice. It's quite possible, with combinations of analyzers and stemmers, to index the exact same tokens, namely nice, unique and key, for each document. Are these equivalent? Does order count? Capitalization? It'd just be a nightmare to try to explain/predict/implement. Likely whatever behavior you do get is just whatever falls out of the code.
I'm not even sure any attempt is made to enforce uniqueness on an analyzed field. HTH, Erick

On Sun, Aug 15, 2010 at 11:59 AM, j jta...@gmail.com wrote: I guess another way to pose the question is: what could cause <uniqueKey>id</uniqueKey> to no longer be respected? The last change I made since I noticed the problem of non-unique docs was changing the field title from string to splitUpStuff. But I don't understand how that could affect the uniqueness of a different field called id.

<fieldType name="splitUpStuff" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=0 c
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

In order to make even a guess, we'd have to see your new field type, particularly its field definitions and the analysis chain... Best, Erick

On Fri, Aug 13, 2010 at 5:16 PM, j jta...@gmail.com wrote: Does the fieldType have any effect on the thing that I specify should be unique? uniqueKey has been working for me up until recently. I changed the field that is unique from type string to a fieldType that I have defined. Now when I do an update I get a newly created document (so that I have duplicates). Has anyone else had this problem before?
RE: Newbie question about search behavior
You can append it in your middleware, or try the EdgeNGramTokenizer [1]. If you're going for the latter, don't forget to reindex and expect a larger index. [1]: http://lucene.apache.org/java/2_9_0/api/all/org/apache/lucene/analysis/ngram/EdgeNGramTokenizer.html -Original message- From: Mike Thomsen mikerthom...@gmail.com Sent: Mon 16-08-2010 19:09 To: solr-user@lucene.apache.org; Subject: Newbie question about search behavior Is it possible to set up Lucene to treat a keyword search such as title:News implicitly like title:News* so that any title that begins with News will be returned without the user having to throw in a wildcard? Also, are there any common filters and such that are generally considered a good practice to throw into the schema for an English-language website? Thanks, Mike
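One way to get the implicit-prefix behavior is to index edge n-grams of each title token, so a plain query for "new" hits a stored prefix token; a sketch of what an edge n-gram tokenizer emits (the sizes here are made up):

```python
def edge_ngrams(term, min_n=1, max_n=10):
    """Front-edge n-grams of a term, as an EdgeNGram tokenizer/filter would emit."""
    term = term.lower()
    return [term[:n] for n in range(min_n, min(max_n, len(term)) + 1)]

# Indexing "News" stores: n, ne, new, news -- so the query "new" matches
# without the user ever typing a wildcard.
print(edge_ngrams("News"))   # ['n', 'ne', 'new', 'news']
```

The alternative mentioned in the reply, appending the `*` in your middleware, keeps the index small at the cost of slower wildcard queries.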
RE: help on facet range
No, facet.range is not available in 1.4; see:

http://wiki.apache.org/solr/SimpleFacetParameters#Facet_by_Range
https://issues.apache.org/jira/browse/SOLR-1240

-Original message-
From: Peng, Wei wei.p...@xerox.com
Sent: Mon 16-08-2010 20:25
To: solr-user@lucene.apache.org
Subject: RE: help on facet range

The solr version that I am using is 1.4.0. Does it support facet.range?

Wei

-Original Message-
From: Peng, Wei [mailto:wei.p...@xerox.com]
Sent: Monday, August 16, 2010 2:12 PM
To: solr-user@lucene.apache.org
Subject: help on facet range

I have been trying to use facet by range. However, no matter how I tried, I did not get anything from the facet range (I do get results from the facet fields topic and author). The query is:

http://localhost:8983/solr/select/?facet.range=timestamp&facet.range.start=0&facet.range.end=1277942270&facet.range.gap=86400&facet.range.other=all&indent=on&q=*:*&facet.field=topic&facet.field=author

The facet range field is timestamp, which is defined to be int in the schema. Can someone help me with this problem? Many thanks, Wei
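On 1.4 the usual workaround is to emulate ranges with one facet.query per bucket; a hedged sketch of building such a request (the parameter names are standard Solr, the field and gap come from the thread, the bucket count is invented):

```python
from urllib.parse import urlencode

def range_facet_queries(field, start, gap, buckets):
    """Emulate facet.range on Solr 1.4: one facet.query per bucket."""
    return [f"{field}:[{start + i * gap} TO {start + (i + 1) * gap - 1}]"
            for i in range(buckets)]

params = [("q", "*:*"), ("rows", "0"), ("facet", "true")]
params += [("facet.query", fq) for fq in range_facet_queries("timestamp", 0, 86400, 3)]
url = "http://localhost:8983/solr/select/?" + urlencode(params)
# Each bucket comes back as its own count under facet_queries in the response.
```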
RE: Re: Solr searching performance issues, using large documents
I've no idea if it's possible, but I'd at least try to return an ArrayList of rows instead of just a single row. And if it doesn't work, which is probably the case, how about filing an issue in Jira? Reading the docs on the matter, I think it should be made possible to return multiple rows in an ArrayList.

-Original message-
From: Peter Spam ps...@mac.com
Sent: Tue 17-08-2010 00:47
To: solr-user@lucene.apache.org
Subject: Re: Solr searching performance issues, using large documents

Still stuck on this - any hints on how to write the JavaScript to split a document? Thanks! -Pete

On Aug 5, 2010, at 8:10 PM, Lance Norskog wrote: You may have to write your own javascript to read in the giant field and split it up.

On Thu, Aug 5, 2010 at 5:27 PM, Peter Spam ps...@mac.com wrote: I've read through the DataImportHandler page a few times, and still can't figure out how to separate a large document into smaller documents. Any hints? :-) Thanks! -Peter

On Aug 2, 2010, at 9:01 PM, Lance Norskog wrote: Spanning won't work - you would have to make overlapping mini-documents if you want to support this. I don't know how big the chunks should be - you'll have to experiment. Lance

On Mon, Aug 2, 2010 at 10:01 AM, Peter Spam ps...@mac.com wrote: What would happen if the search query phrase spanned separate document chunks? Also, what would the optimal size of chunks be? Thanks! -Peter

On Aug 1, 2010, at 7:21 PM, Lance Norskog wrote: Not that I know of. The DataImportHandler has the ability to create multiple documents from one input stream. It is possible to create a DIH file that reads large log files and splits each one into N documents, with the file name as a common field. The DIH wiki page tells you in general how to make a DIH file. http://wiki.apache.org/solr/DataImportHandler From this, you should be able to make a DIH file that puts log files in as separate documents.
As to splitting files up into mini-documents, you might have to write a bit of Javascript to achieve this. There is no data structure or software that implements structured documents. On Sun, Aug 1, 2010 at 2:06 PM, Peter Spam ps...@mac.com wrote: Thanks for the pointer, Lance! Is there an example of this somewhere? -Peter On Jul 31, 2010, at 3:13 PM, Lance Norskog wrote: Ah! You're not just highlighting, you're snippetizing. This makes it easier. Highlighting does not stream- it pulls the entire stored contents into one string and then pulls out the snippet. If you want this to be fast, you have to split up the text into small pieces and only snippetize from the most relevant text. So, separate documents with a common group id for the document it came from. You might have to do 2 queries to achieve what you want, but the second query for the same query will be blindingly fast. Often 1ms. Good luck! Lance On Sat, Jul 31, 2010 at 1:12 PM, Peter Spam ps...@mac.com wrote: However, I do need to search the entire document, or else the highlighting will sometimes be blank :-( Thanks! - Peter ps. sorry for the many responses - I'm rushing around trying to get this working. On Jul 31, 2010, at 1:11 PM, Peter Spam wrote: Correction - it went from 17 seconds to 10 seconds - I was changing the hl.regex.maxAnalyzedChars the first time. Thanks! -Peter On Jul 31, 2010, at 1:06 PM, Peter Spam wrote: On Jul 30, 2010, at 1:16 PM, Peter Karich wrote: did you already try other values for hl.maxAnalyzedChars=2147483647 Yes, I tried dropping it down to 21, but it didn't have much of an impact (one search I just tried went from 17 seconds to 15.8 seconds, and this is an 8-core Mac Pro with 6GB RAM - 4GB for java). ? Also regular expression highlighting is more expensive, I think. What does the 'fuzzy' variable mean? If you use this to query via ~someTerm instead someTerm then you should try the trunk of solr which is a lot faster for fuzzy or other wildcard search. 
fuzzy could be set to * but isn't right now. Thanks for the tips, Peter - this has been very frustrating! - Peter Regards, Peter. Data set: About 4,000 log files (will eventually grow to millions). Average log file is 850k. Largest log file (so far) is about 70MB. Problem: When I search for common terms, the query time goes from under 2-3 seconds to about 60 seconds. TermVectors etc are enabled. When I disable highlighting, performance improves a lot, but is still slow for some queries (7 seconds). Thanks in advance for any ideas! -Peter - 4GB RAM server % java -Xms2048M -Xmx3072M -jar start.jar - schema.xml changes:
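Earlier in this thread Lance suggests splitting each log into overlapping mini-documents that share a group id, so highlighting only ever analyzes a small stored field. A sketch of that idea (chunk size, overlap, and field names are made-up values; the overlap keeps phrases that span a chunk boundary matchable):

```python
def split_log(text, group_id, max_chars=50_000, overlap=2_000):
    """Split one large log into overlapping mini-documents sharing a group id."""
    docs, start, part = [], 0, 0
    while start < len(text):
        docs.append({"id": f"{group_id}-{part}",
                     "group_id": group_id,
                     "body": text[start:start + max_chars]})
        if start + max_chars >= len(text):
            break
        start += max_chars - overlap
        part += 1
    return docs

# A 120k-char log becomes three overlapping mini-documents:
chunks = split_log("x" * 120_000, "log-0001")
```

At query time you would group or deduplicate hits on group_id to get back to whole files.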
RE: Faceting by fields that contain special characters
A very common issue: you need to facet on a non-analyzed field. http://lucene.472066.n3.nabble.com/Indexing-fieldvalues-with-dashes-and-spaces-td1023699.html#a1222961

-Original message-
From: Christos Constantinou ch...@simpleweb.co.uk
Sent: Thu 19-08-2010 15:08
To: solr-user@lucene.apache.org
Subject: Faceting by fields that contain special characters

Hi all,

I am doing a faceted search on a solr field that contains URLs, for the sole purpose of trying to locate duplicate URLs in my documents. However, the solr response I get looks like this:

public 'com' => int 492198
public 'flickr' => int 492198
public 'http' => int 492198
public 'www' => int 253881
public 'photo' => int 253843
public 'n' => int 253318
public 'httpwwwflickrcomphoto' => int 253316
public 'farm' => int 238317
public 'httpfarm' => int 238317
public 'jpg' => int 238317
public 'static' => int 238317
public 'staticflickrcom' => int 238317
public '5' => int 237939
public '00' => int 61009
public 'b' => int 59463
public 'c' => int 59094
public 'f' => int 59004
public 'd' => int 58995
public 'e' => int 58818
public 'a' => int 58327
public '08' => int 33797
public '06' => int 33341
public '04' => int 29902
public '02' => int 29224
public '2' => int 26671
public '4' => int 26613
public '6' => int 26606
public '03' => int 26506
public '1' => int 26389
public '8' => int 26384

It should instead have the entire URL as the variable name, but the name is only a part of the URL. Is this because characters like :// in http:// cannot be used in variable names? If so, is there any workaround to the problem, or an alternative way to detect duplicates?

Thanks

Christos
RE: Showing results based on facet selection
Hi, A facet query serves a different purpose [1]. You need to filter your result set [2]. And don't forget to follow the links on caching and such. [1]: http://wiki.apache.org/solr/SimpleFacetParameters#facet.query_:_Arbitrary_Query_Faceting [2]: http://wiki.apache.org/solr/CommonQueryParameters#fq Cheers, -Original message- From: PeterKerk vettepa...@hotmail.com Sent: Thu 19-08-2010 14:10 To: solr-user@lucene.apache.org; Subject: Showing results based on facet selection I have indexed all data (as can be seen below). But now I want to be able to simulate when a user clicks on a facet value, for example clicks on the value Gemeentehuis of facet themes_raw AND has a selection on features facet on value Strand I've been playing with facet.query function: facet.query=themes_raw:Gemeentehuisfacet.query=features_raw:Strand But without luck. { responseHeader:{ status:0, QTime:0, params:{ facet:true, fl:id,title,city,score,themes,features,official,services, indent:on, q:*:*, facet.field:[province_raw, services_raw, themes_raw, features_raw], wt:json}}, response:{numFound:3,start:0,maxScore:1.0,docs:[ { id:1, title:Gemeentehuis Nijmegen, services:[ Fotoreportage], features:[ Tuin, Cafe], themes:[ Gemeentehuis], score:1.0}, { id:2, title:Gemeentehuis Utrecht, services:[ Fotoreportage, Exclusieve huur], features:[ Tuin, Cafe, Danszaal], themes:[ Gemeentehuis, Strand Zee], score:1.0}, { id:3, title:Beachclub Vroeger, services:[ Exclusieve huur, Live muziek], features:[ Strand, Cafe, Danszaal], themes:[ Strand Zee], score:1.0}] }, facet_counts:{ facet_queries:{}, facet_fields:{ province_raw:[ Gelderland,1, Utrecht,1, Zuid-Holland,1], services_raw:[ Exclusieve huur,2, Fotoreportage,2, Live muziek,1], themes_raw:[ Gemeentehuis,2, Strand Zee,2], features_raw:[ Cafe,3, Danszaal,2, Tuin,2, Strand,1]}, facet_dates:{}}} -- View this message in context: http://lucene.472066.n3.nabble.com/Showing-results-based-on-facet-selection-tp1223362p1223362.html Sent from the Solr - User mailing 
list archive at Nabble.com.
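Simulating the two facet clicks from the message above is a matter of adding one fq per selected value rather than using facet.query (field names are from the thread; the quoting is needed because the raw values can contain spaces):

```python
from urllib.parse import urlencode

selected = {"themes_raw": "Gemeentehuis", "features_raw": "Strand"}

params = [("q", "*:*"), ("facet", "true"), ("wt", "json")]
params += [("facet.field", f) for f in ("themes_raw", "features_raw")]
# Each clicked facet value becomes a filter query (fq), not a facet.query:
params += [("fq", f'{field}:"{value}"') for field, value in selected.items()]

url = "http://localhost:8983/solr/select/?" + urlencode(params)
```

fq narrows the result set (and is cached in the filter cache), while facet.query only adds an extra count to the facet output without filtering anything.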
RE: Solr for multiple websites
http://osdir.com/ml/solr-user.lucene.apache.org/2009-09/msg00630.html
http://osdir.com/ml/solr-user.lucene.apache.org/2009-03/msg00309.html

Load balancing is a bit out of scope here, but all you need is a simple HTTP load balancer and a replication mechanism, depending on your setup.

-Original message-
From: Hitendra Molleti hitendra.moll...@itp.com
Sent: Thu 19-08-2010 14:38
To: solr-user@lucene.apache.org
CC: 'Jonathan DeMello' jonathan.deme...@itp.com; amer.mahf...@itp.com; 'Nishchint Yogishwar' nishchint.yogish...@itp.com
Subject: RE: Solr for multiple websites

Thanks Girjesh. Can you please let me know what the pros and cons of this approach are? Also, how can we set up load balancing between multiple Solrs?

Thanks

Hitendra

-Original Message-
From: Grijesh.singh [mailto:pintu.grij...@gmail.com]
Sent: Thursday, August 19, 2010 10:25 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr for multiple websites

Using multicore is the right approach

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-for-multiple-websites-tp1173220p1219772.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Autosuggest on PART of cityname
You need a new analyzed field with the EdgeNGramTokenizer, or you can try facet.prefix, for this to work. To retrieve the number of locations for that city, just use the results from the faceting engine as usual. I'm unsure which approach is actually faster, but I'd guess the EdgeNGramTokenizer is, although it also takes up more disk space; using the faceting engine will not take more disk space.

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 16:46
To: solr-user@lucene.apache.org
Subject: Autosuggest on PART of cityname

I want to have a Google-like autosuggest function on city names. So when a user types some characters I want to show the cities that match those characters, but ALSO the number of locations in that city. With Solr I now have the parameter: fq=title:Bost. But the result doesn't show the city Boston. So the fq parameter now seems to be an exact match, where I want it to be a partial match as well, more like this in SQL: WHERE title LIKE 'value%'. How can I do this?

-- View this message in context: http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1226088.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Autosuggest on PART of cityname
Hmm, you have only four documents in your index, I guess? That would make sense, because you query for *:*. This technique doesn't rely on the found documents but on the faceting engine, so you should include rows=0 in your query, and the fl parameter is not required anymore. Also, add facet=true to enable the faceting engine:

http://localhost:8983/solr/db/select/?wt=json&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=bost

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Thu 19-08-2010 17:11
To: solr-user@lucene.apache.org
Subject: RE: Autosuggest on PART of cityname

Ok, I now tried this:

http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&fl=city&facet.field=city&facet.prefix=Bost

Then I get:

{
 responseHeader:{
  status:0,
  QTime:0,
  params:{
   fl:city,
   indent:on,
   q:*:*,
   facet.prefix:Bost,
   facet.field:city,
   wt:json}},
 response:{numFound:4,start:0,docs:[
  {},
  {},
  {},
  {}]
 }}

So 4 total results, but I would have expected 1. What am I doing wrong?

-- View this message in context: http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1226571.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Autosuggest on PART of cityname
You can't; it's analyzed. And if you facet on a non-analyzed field, you cannot distinguish between upper- and lowercase tokens. If you want that, you must create a new field with an EdgeNGramTokenizer, search on it, and then facet on a non-analyzed field. Your query will be a bit different then:

q=new_ngram_field:utr
rows=0
facet=true
facet.field=non_analyzed_city_field

-Original message-
From: PeterKerk vettepa...@hotmail.com
Sent: Fri 20-08-2010 12:36
To: solr-user@lucene.apache.org
Subject: RE: Autosuggest on PART of cityname

Ok, I now do this (searching for utr in cityname):

http://localhost:8983/solr/db/select/?wt=json&indent=on&q=*:*&rows=0&facet=true&facet.field=city&facet.prefix=utr

In the DB there's 1 location with cityname 'Utrecht' and the other 1 is with 'Utrecht Overvecht'. So in my dropdown I would like:

Utrecht (1)
Utrecht Overvecht (1)

But I get this:

{
 responseHeader:{
  status:0,
  QTime:0,
  params:{
   facet:true,
   indent:on,
   q:*:*,
   facet.prefix:utr,
   facet.field:city,
   wt:json,
   rows:0}},
 response:{numFound:6,start:0,docs:[]
 },
 facet_counts:{
  facet_queries:{},
  facet_fields:{
   city:[
    utrecht,2,
    utrechtovervecht,1]},
  facet_dates:{}}}

As you can see it looks at the field city, where the tokenizer looks at each individual word. I also tried city_raw, but that was without any results. How can I fix this so my dropdown will show the correct values?

-- View this message in context: http://lucene.472066.n3.nabble.com/Autosuggest-on-PART-of-cityname-tp1226088p1241444.html Sent from the Solr - User mailing list archive at Nabble.com.
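The two-field setup from the reply above can be modeled in a few lines; the field names (city_ngram, city_raw) are invented for this sketch, and the edge-ngram field is lowercased for matching while the raw field keeps the original value for display and counting:

```python
from collections import Counter

cities = ["Utrecht", "Utrecht", "Utrecht Overvecht"]

def edge_ngrams(value, max_n=20):
    """Lowercased front-edge n-grams of the whole field value."""
    value = value.lower()
    return {value[:n] for n in range(1, min(max_n, len(value)) + 1)}

index = [{"city_ngram": edge_ngrams(c), "city_raw": c} for c in cities]

def suggest(prefix):
    """Search the analyzed field, facet-count on the raw field."""
    hits = [doc for doc in index if prefix.lower() in doc["city_ngram"]]
    return Counter(doc["city_raw"] for doc in hits)

print(suggest("utr"))
# Counter({'Utrecht': 2, 'Utrecht Overvecht': 1}) -- original casing preserved
```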
Re: Document Section in Solr
You cannot divide a document into sections as far as I know. You could, however, store the divisions in different fields, if your use-case allows this, and retrieve only the fields that you need. This way you can avoid downloading 20MiB at once.

On Friday 27 August 2010 11:26:05 maheshkumar wrote: If the document which is indexed is a big file, is there a provision for dividing the document into sections? For e.g., a 20MB file divided into 10 sections, which will show the right section when searched.

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Status of Solr in the cloud?
That would be Solr 4.0, or maybe 3.1 first. http://wiki.apache.org/solr/Solr3.1 http://wiki.apache.org/solr/Solr4.0 On Thursday 26 August 2010 23:58:25 Charlie Jackson wrote: There seem to be a few parallel efforts at putting Solr in a cloud configuration. See http://wiki.apache.org/solr/KattaIntegration, which is based off of https://issues.apache.org/jira/browse/SOLR-1395. Also http://wiki.apache.org/solr/SolrCloud which is https://issues.apache.org/jira/browse/SOLR-1873. And another JIRA: https://issues.apache.org/jira/browse/SOLR-1301. These all seem aimed at the same goal, correct? I'm interested in evaluating one of these solutions for my company; which is the most stable or most likely to eventually be part of the Solr distribution? Thanks, Charlie Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Auto ID for Documents indexed
No. Solr doesn't require a unique ID, nor is an auto-incrementing value really useful in indices spanning multiple machines. Maybe SOLR-308 could help you out, but then the question remains: why would you need a feature like this? https://issues.apache.org/jira/browse/SOLR-308

On Friday 27 August 2010 11:41:55 maheshkumar wrote: Is there a feature to provide an auto-increment id for the document which is getting indexed? This is the schema file:

<field name="reference" type="string" indexed="true" stored="true" required="true"/>
<field name="id" type="string" indexed="true" stored="true"/>

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Multiple passes with WordDelimiterFilterFactory
It's just a configured filter, so you should be able to define it twice. Have you tried it? But it might be tricky: the output of the first will be the input of the second, so I doubt the usefulness of this approach.

On Thursday 26 August 2010 17:45:45 Shawn Heisey wrote: Can I pass my data through WordDelimiterFilterFactory more than once? It occurs to me that I might get better results if I can do some of the filters separately and use preserveOriginal on some of them but not others. Currently I am using the following definition on both indexing and querying. Would it make sense to do the two differently?

<filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>

Thanks, Shawn

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: A few query issues with solr
For solving the car/car-rent issue you'll need to add a SynonymFilter to your analyzer chain and configure it accordingly. On Friday 27 August 2010 13:40:15 hemantverm...@gmail.com wrote: this link will help you: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimi terFilterFactory Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: Problem related to Sorting in Solr1.4
What seems to be the problem? Did you consult the wiki on this matter? http://wiki.apache.org/solr/CommonQueryParameters#sort

On Friday 27 August 2010 15:14:06 deepak agrawal wrote: Hi, I have one text field in our schema and I want to do sorting on that column.

<field name="TITLE" type="text" indexed="true" stored="true"/>
<field name="UPDBY" type="text" indexed="true" stored="true"/>

I have these two columns and I want to use SORT on them. Can anyone please suggest what I need to do for that? I am currently using Solr 1.4.

Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Spellcheck in multilanguage search
Configure language-specific fields and spellcheckers just as you would for a single-language index, so multiple content_LANG fields and a spell_LANG field. This will, of course, only work if you know in what language the search operates.

-Original message-
From: Grijesh.singh pintu.grij...@gmail.com
Sent: Tue 31-08-2010 12:18
To: solr-user@lucene.apache.org
Subject: Spellcheck in multilanguage search

How can spellcheck be configured for multilanguage search? I have to index 17 languages in my indexes and search on them, and I also want to use spellcheck for that.

-- View this message in context: http://lucene.472066.n3.nabble.com/Spellcheck-in-multilanguage-search-tp1393357p1393357.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Memcache for Solr
Hi, On a restaurant index website we used Memcache only for storing the generated HTML facet list when q=*. This cached object was only used when no additional search parameters were specified. It was quite useful because the facet list was always present and only changed if real search parameters were specified. We found it wasn't feasible to cache arbitrary result sets: there would be far too many result sets to cache, most of which would never be reused, and there is the problem of invalidating cached result sets. I'd rather rely on Solr's filter cache instead. From that point of view, it's only feasible to cache generated objects (HTML or whatever format) that you know are requested many times. It's easy to implement and doesn't spend memory on objects that won't be reused. Cheers, -Original message- From: Hitendra Molleti hitendra.moll...@itp.com Sent: Tue 31-08-2010 16:38 To: solr-user@lucene.apache.org; Subject: Memcache for Solr Hi, We were looking at implementing Memcache for Solr. Can someone who has already implemented this let us know if it is a good option to go for, i.e. how effective is using Memcache compared to Solr's internal cache? Also, are there any downsides to it, and is it difficult to implement? Thanks Hitendra
Re: Proximity search + Highlighting
I think you need to enable usePhraseHighlighter in order to use the highlightMultiTerm parameter. On Wednesday 01 September 2010 12:12:11 Xavier Schepler wrote: Hi, can the highlighting component highlight terms only if the distance between them matches the query? I use those parameters: hl=on&hl.fl=qFR,iFR,mFR,vlFR&hl.usePhraseHighlighter=false&hl.highlightMultiTerm=true&hl.simple.pre=<b>&hl.simple.post=</b>&hl.mergeContiguous=false
Re: shingles work in analyzer but not real data
If your use-case is limited to this, why don't you encapsulate all queries in double quotes? On Wednesday 01 September 2010 14:21:47 Jeff Rose wrote: Hi, We are using SOLR to match query strings with a keyword database, where some of the keywords are actually more than one word. For example a keyword might be apple pie and we only want it to match for a query containing that word pair, but not one only containing apple. Here is the relevant piece of the schema.xml, defining the index and query pipelines: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.PatternTokenizerFactory" pattern=";"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.TrimFilterFactory" /> </analyzer> <analyzer type="query"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.TrimFilterFactory" /> <filter class="solr.ShingleFilterFactory" /> </analyzer> </fieldType> In the analysis tool this schema looks like it works correctly. Our multi-word keywords are indexed as a single entry, and then when a search phrase contains one of these multi-word keywords it is shingled and matched. Unfortunately, when we do the same queries on top of the actual index it responds with zero matches. I can see in the index histogram that the terms are correctly indexed from our MySQL datasource containing the keywords, but somehow the shingling doesn't appear to work on this live data. Does anyone have experience with shingling that might have some tips for us, or otherwise advice for debugging the issue? Thanks, Jeff
Re: morelikethis - stored=true is necessary?
The following table [1] will be most helpful! Keep it referenced! [1]: http://wiki.apache.org/solr/FieldOptionsByUseCase On Thursday 02 September 2010 13:20:33 zqzuk wrote: Hi all, I am learning to use the MoreLikeThis handler, which seems very straightforward, but I got some problems when testing and I wonder if you could help me. In my schema I have <field name="page_content" type="text" indexed="true" stored="false" required="false" multiValued="false" termVectors="true"/> With this schema, when I use the query parameter mlt.fl=page_content, the returned XML results in the moreLikeThis section show similarity scores of 0 for all documents. However it is not the case for fields that define stored="true". Does it mean I must set stored="true" for MLT to work? Also, does multiValued have an effect on the result? Thanks!
Re: How to retrieve the full corpus
You can use Luke to inspect a Lucene index. Check the schema browser in your Solr admin interface for an example. On Monday 06 September 2010 16:52:03 Roland Villemoes wrote: Hi All, How can I retrieve all words from a Solr core? I need a list of all the words and how often they occur in the index. med venlig hilsen/best regards Roland Villemoes Tel: (+45) 22 69 59 62 E-Mail: mailto:r...@alpha-solutions.dk Alpha Solutions A/S Borgergade 2, 3.sal, 1300 København K Tel: (+45) 70 20 65 38 Web: http://www.alpha-solutions.dk ** This message including any attachments may contain confidential and/or privileged information intended only for the person or entity to which it is addressed. If you are not the intended recipient you should delete this message. Any printing, copying, distribution or other use of this message is strictly prohibited. If you have received this message in error, please notify the sender immediately by telephone, or e-mail and delete all copies of this message and any attachments from your system. Thank you.
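The schema browser mentioned above is backed by the Luke request handler, so a request along these lines (host, path and field name are example assumptions based on the default configuration) returns the top terms and their frequencies per field:

```
http://localhost:8983/solr/admin/luke?fl=content&numTerms=100
```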
RE: getting started - books/in depth material
Did you miss the wiki? http://wiki.apache.org/solr/SolrResources -Original message- From: Dennis Gearon gear...@sbcglobal.net Sent: Mon 06-09-2010 22:05 To: solr-user@lucene.apache.org; Subject: getting started - books/in depth material I really don't want to understand the code that is IN Solr/Lucene. So I'm looking for books on USING Solr/Lucene and configuring it, plus making good queries. Any suggestions for current material? Dennis Gearon Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php
RE: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?)
The remainder of an arithmetic division: http://en.wikipedia.org/wiki/Modulo_operation -Original message- From: Dennis Gearon gear...@sbcglobal.net Sent: Mon 06-09-2010 22:04 To: solr-user@lucene.apache.org; Subject: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?) What is a 'simple MOD'? --- On Mon, 9/6/10, Andrzej Bialecki a...@getopt.org wrote: From: Andrzej Bialecki a...@getopt.org Subject: Re: SolrCloud distributed indexing (Re: anyone use hadoop+solr?) To: solr-user@lucene.apache.org Date: Monday, September 6, 2010, 11:30 AM On 2010-09-06 16:41, Yonik Seeley wrote: On Mon, Sep 6, 2010 at 10:18 AM, MitchK mitc...@web.de wrote: [...consistent hashing...] But it doesn't solve the problem at all, correct me if I am wrong, but: If you add a new server, let's call him IP3-1, and IP3-1 is nearer to the current resource X, then doc x will be indexed at IP3-1 - even if IP2-1 holds the older version. Am I right? Right. You still need code to handle migration. Consistent hashing is a way for everyone to be able to agree on the mapping, and for the mapping to change incrementally. i.e. you add a node and it only changes the docid-node mapping of a limited percentage of the mappings, rather than changing the mappings of potentially everything, as a simple MOD would do. Another strategy to avoid excessive reindexing is to keep splitting the largest shards, and then your mapping becomes a regular MOD plus a list of these additional splits. Really, there's an infinite number of ways you could implement this... For SolrCloud, I don't think we'll end up using consistent hashing - we don't need it (although some of the concepts may still be useful). I imagine there could be situations where a simple MOD won't do ;) so I think it would be good to hide this strategy behind an interface/abstract class.
It costs nothing, and gives you flexibility in how you implement this mapping. -- Best regards, Andrzej Bialecki (Information Retrieval, Semantic Web, Embedded Unix, System Integration) http://www.sigram.com Contact: info at sigram dot com
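To illustrate the "simple MOD" being discussed, and why it remaps most documents when a shard is added, here is a toy sketch (the hash function and shard counts are made up for the example):

```python
import zlib

def shard_for(doc_id, num_shards):
    # the "simple MOD": remainder of dividing a stable hash by the shard count
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards

ids = ["doc%d" % i for i in range(1000)]
# going from 4 to 5 shards changes the mapping for most documents,
# which is the migration cost discussed above (in theory 80% move)
moved = sum(1 for d in ids if shard_for(d, 4) != shard_for(d, 5))
```

Consistent hashing, by contrast, is designed so that adding a node only remaps roughly 1/N of the documents.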
Re: Nutch/Solr
Depends on your version of Nutch. At least trunk and 1.1 obey the solrmapping.xml file in Nutch's configuration directory. I'd suggest you start with that mapping file and the Solr schema.xml file shipped with Nutch, as it exactly matches the mapping file. Just restart Solr with the new schema (or change the mapping), crawl, fetch, parse and update your DBs, and then push the index from Nutch to your Solr instance. On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote: I tried to combine Nutch and Solr, and want to ask something. After crawling, Nutch has certain fields such as content, tstamp, title. How can I map the content field after crawling? Do I have to change the Lucene code (such as adding an extra field)? Or overcome it in the Solr stage? Any suggestion? Thx. -- Yavuz Selim YILMAZ
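For reference, the final push to Solr is a single Nutch job; something like the following (the URL and crawl paths are examples, and the exact invocation depends on your Nutch version):

```
bin/nutch solrindex http://localhost:8983/solr crawl/crawldb crawl/linkdb crawl/segments/*
```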
Re: Nutch/Solr
You should: - definitely upgrade to 1.1 (1.2 is on the way), and - subscribe to the Nutch mailing list for Nutch-specific questions. On Tuesday 07 September 2010 10:36:58 Yavuz Selim YILMAZ wrote: In fact, I used the Nutch 0.9 version, but I am thinking of moving to the new version. If anybody did something like that, I want to learn from their experience. If indexing an XML file, there are specific fields and all of them are dependent among them, so duplicates don't happen. I want to extract specific fields from the content field. Doing such extraction, new fields should be indexed as well; it occurs to me that content would then be indexed twice for every new field. By the way, any details about how to get new fields from the content will be helpful. -- Yavuz Selim YILMAZ 2010/9/7 Markus Jelsma markus.jel...@buyways.nl Depends on your version of Nutch. At least trunk and 1.1 obey the solrmapping.xml file in Nutch's configuration directory. I'd suggest you start with that mapping file and the Solr schema.xml file shipped with Nutch, as it exactly matches the mapping file. Just restart Solr with the new schema (or change the mapping), crawl, fetch, parse and update your DBs, and then push the index from Nutch to your Solr instance. On Tuesday 07 September 2010 10:00:47 Yavuz Selim YILMAZ wrote: I tried to combine Nutch and Solr, and want to ask something. After crawling, Nutch has certain fields such as content, tstamp, title. How can I map the content field after crawling? Do I have to change the Lucene code (such as adding an extra field)? Or overcome it in the Solr stage? Any suggestion? Thx. -- Yavuz Selim YILMAZ
RE: Is there a way to fetch the complete list of data from a particular column in SOLR document?
q=*:*&fl=id_FIELD&rows=NUM_DOCS ? -Original message- From: bbarani bbar...@gmail.com Sent: Tue 07-09-2010 23:09 To: solr-user@lucene.apache.org; Subject: Is there a way to fetch the complete list of data from a particular column in SOLR document? Hi, I am trying to get the complete list of unique document IDs and compare it with the back end to make sure that both the back end and SOLR documents are in sync. Is there a way to fetch the complete list of data from a particular column in a SOLR document? Once I get the list, I can easily compare it against the DB and delete the orphan documents.. Please let me know if there are any other ideas / suggestions to implement this. Thanks, Barani -- View this message in context: http://lucene.472066.n3.nabble.com/Is-there-a-way-to-fetch-the-complete-list-of-data-from-a-particular-column-in-SOLR-document-tp1435586p1435586.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Re: MoreLikethis and fq not giving exact results ?
I can think of two useful cases for an optional mlt.fq parameter that limits the MLT results for each document, based on that fq: 1. prevent irrelevant docs when in a deep faceted navigation 2. general search results with MLT where you need to distinguish between collections when there are many different collections sharing the same index -Original message- From: Chris Hostetter hossman_luc...@fucit.org Sent: Tue 07-09-2010 23:32 To: solr-user@lucene.apache.org; Subject: Re: MoreLikethis and fq not giving exact results ? I don't believe the MLT Component has any way of filtering like this. In your case you want the fq params to apply to the MLT results as well as the main results, but in other cases people want the fq to apply to the main result set and let the MLT be per individual doc with no other filters -- no one has implemented a configurable way to say when/if certain fqs should apply in the way you describe. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: Re: MoreLikethis and fq not giving exact results ?
I know =) I was just polling votes for a feature request - there is no such issue filed for this component. Perhaps there should be? -Original message- From: Chris Hostetter hossman_luc...@fucit.org Sent: Wed 08-09-2010 00:13 To: solr-user@lucene.apache.org; Subject: RE: Re: MoreLikethis and fq not giving exact results ? i don't disagree with you -- i was just commenting that it doesn't work that way at the moment, because it was designed with different use cases in mind (returning docs related to the result docs, independent of how you found those result docs) -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
Invariants on a specific fq value
Hi, I have an index with several collections. Every document has a collection field that specifies the collection it belongs to. To make querying easier (and restrict exposed parameters) I have a request handler for each collection. The request handlers are largely the same and preset all parameters using invariants. Well, this is all very nice. But there is a catch: I cannot make an invariant of the fq parameter because it's being used (from the outside) to navigate through the facets. This means that the outside world can specify any value for the fq parameter. With the fq parameter being exposed, it is possible for request handler X to query documents that belong to collection Y and vice versa. But, as you might guess by now, request handler X should only be allowed to retrieve documents that belong to collection X. I know there are some discussions on how to restrict users to certain documents, but I'd like to know if it is doable to patch the request handler logic to add an invariant-like directive that allows me to restrict a certain value for a certain parameter, but allow different values for that parameter. To give an example: <requestHandler name="collection_x"> <lst name="invariants"> <str name="defType">dismax</str> ... More invariants here </lst> <lst name="what_should_we_call_this?"> <str name="fq">fieldName:collection_x</str> </lst> </requestHandler> The above configuration won't allow changing the defType and won't allow a value to be specified for fieldName through the fq parameter. It will allow the outside world to specify a value on another field through the fq parameter, such as fq=anotherField:someValue. Any ideas? Cheers,
RE: How to import data with a different date format
No. The DateField [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory, including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
RE: Re: How to import data with a different date format
Your format (MM/DD/YYYY) is not compatible. -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 19:03 To: solr-user@lucene.apache.org; Subject: Re: How to import data with a different date format That was my first thought :-) But it would be nice to be able to do date queries. I guess when I export the data I can just add 00:00:00Z. Thanks. - Original Message From: Markus Jelsma markus.jel...@buyways.nl To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 11:34:32 AM Subject: RE: How to import data with a different date format No. The DateField [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features. [1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory, including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
RE: Re: How to import data with a different date format
Ah, that answers Erick's question. And mine ;) -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 19:25 To: solr-user@lucene.apache.org; Subject: Re: How to import data with a different date format I'm going with option 1, converting MM/DD/YYYY to YYYY-MM-DD (which is fairly easy in XSLT) and then adding T00:00:00Z to it. Thanks. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Wed, September 8, 2010 12:09:55 PM Subject: Re: How to import data with a different date format I think Markus is spot-on given the fact that you have 2 days. Using a string field is quickest. However, if you absolutely MUST have functioning dates, there are three options I can think of: 1. Can you make your XSLT transform the dates? Confession: I'm XSLT-ignorant. 2. Use DIH and the DateFormatTransformer, see: http://wiki.apache.org/solr/DataImportHandler#DateFormatTransformer - you can walk a directory importing all the XML files with FileDataSource. 3. You could write a program to do this manually. But given the time constraints, I suspect your time would be better spent doing the other stuff and just using string as per Markus. I have no clue how SOLR-savvy you are, so pardon if this is something you already know. But lots of people trip up over the string field type, which is NOT tokenized. You usually want text unless it's some sort of ID. So it might be worth it to do some searching earlier rather than later. Best, Erick On Wed, Sep 8, 2010 at 12:34 PM, Markus Jelsma markus.jel...@buyways.nl wrote: No. The DateField [1] will not accept it any other way. You could, however, fool your boss and dump your dates in an ordinary string field. But then you cannot use some of the nice date features.
[1]: http://lucene.apache.org/solr/api/org/apache/solr/schema/DateField.html -Original message- From: Rico Lelina rlel...@yahoo.com Sent: Wed 08-09-2010 17:36 To: solr-user@lucene.apache.org; Subject: How to import data with a different date format Hi, I am attempting to import some of our data into SOLR. I did it the quickest way I know because I literally only have 2 days to import the data and do some queries for a proof-of-concept. So I have this data in XML format and I wrote a short XSLT script to convert it to the format in solr/example/exampledocs (except I retained the element names, so I had to modify schema.xml in the conf directory). So far so good -- the import works and I can search the data. One of my immediate problems is that there is a date field with the format MM/DD/YYYY. Looking at schema.xml, it seems SOLR accepts only full date fields -- everything seems to be mandatory, including the Z for Zulu/UTC time according to the doc. Is there a way to specify the date format? Thanks very much. Rico
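If the conversion ever has to happen outside XSLT, the MM/DD/YYYY to Solr date mapping is a one-liner; a sketch (function name is a made-up example):

```python
from datetime import datetime

def to_solr_date(us_date):
    # "09/08/2010" (MM/DD/YYYY) -> "2010-09-08T00:00:00Z"
    return datetime.strptime(us_date, "%m/%d/%Y").strftime("%Y-%m-%dT00:00:00Z")
```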
RE: Re: Invariants on a specific fq value
Interesting! I haven't met the appends method before and I'll be sure to give it a try tomorrow, though the wiki [1] is not very clear on what it really does. More suggestions before tomorrow? [1]: http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication -Original message- From: Jonathan Rochkind rochk...@jhu.edu Sent: Wed 08-09-2010 19:19 To: solr-user@lucene.apache.org; markus.jel...@buyways.nl; Subject: Re: Invariants on a specific fq value I just found out about 'invariants', and I found out about another thing too: appends. (I don't think either of these are actually documented anywhere?) I think maybe appends, rather than invariants, with the fq you want always to be there, might be exactly what you want? I actually forget whether it's append or appends, and am not sure if it's documented anywhere, so try both I guess. But apparently it does exist in 1.4. Jonathan Markus Jelsma wrote: Hi, I have an index with several collections. Every document has a collection field that specifies the collection it belongs to. To make querying easier (and restrict exposed parameters) I have a request handler for each collection. The request handlers are largely the same and preset all parameters using invariants. Well, this is all very nice. But there is a catch: I cannot make an invariant of the fq parameter because it's being used (from the outside) to navigate through the facets. This means that the outside world can specify any value for the fq parameter. With the fq parameter being exposed, it is possible for request handler X to query documents that belong to collection Y and vice versa. But, as you might guess by now, request handler X should only be allowed to retrieve documents that belong to collection X.
I know there are some discussions on how to restrict users to certain documents, but I'd like to know if it is doable to patch the request handler logic to add an invariant-like directive that allows me to restrict a certain value for a certain parameter, but allow different values for that parameter. To give an example: <requestHandler name="collection_x"> <lst name="invariants"> <str name="defType">dismax</str> ... More invariants here </lst> <lst name="what_should_we_call_this?"> <str name="fq">fieldName:collection_x</str> </lst> </requestHandler> The above configuration won't allow changing the defType and won't allow a value to be specified for fieldName through the fq parameter. It will allow the outside world to specify a value on another field through the fq parameter, such as fq=anotherField:someValue. Any ideas? Cheers,
RE: Re: Re: Invariants on a specific fq value
Sounds great! I'll be very sure to put it to the test tomorrow and perhaps add documentation on these types to the solrconfigxml wiki page for reference. -Original message- From: Yonik Seeley yo...@lucidimagination.com Sent: Wed 08-09-2010 19:38 To: solr-user@lucene.apache.org; Subject: Re: Re: Invariants on a specific fq value 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote: Interesting! I haven't met the appends method before and I'll be sure to give it a try tomorrow, though the wiki [1] is not very clear on what it really does. Here's a comment from the example solrconfig.xml: <!-- In addition to defaults, appends params can be specified to identify values which should be appended to the list of multi-val params from the query (or the existing defaults). In this example, the param fq=instock:true will be appended to any query time fq params the user may specify, as a mechanism for partitioning the index, independent of any user selected filtering that may also be desired (perhaps as a result of faceted searching). NOTE: there is *absolutely* nothing a client can do to prevent these appends values from being used, so don't use this mechanism unless you are sure you always want it. --> -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
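Applied to the collection_x handler from the original question, the appends variant might look like this (an untested sketch, reusing the field and handler names from the earlier example):

```xml
<requestHandler name="collection_x" class="solr.SearchHandler">
  <lst name="invariants">
    <str name="defType">dismax</str>
  </lst>
  <lst name="appends">
    <!-- always ANDed with any user-supplied fq parameters -->
    <str name="fq">fieldName:collection_x</str>
  </lst>
</requestHandler>
```

Clients can still add their own fq filters for facet navigation, but cannot remove this one.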
RE: Re: Re: Invariants on a specific fq value
Excellent! You already made my day for tomorrow! I'll check its behavior with fq parameters specifying a filter for the same field! -Original message- From: Chris Hostetter hossman_luc...@fucit.org Sent: Wed 08-09-2010 21:04 To: solr-user@lucene.apache.org; Subject: RE: Re: Re: Invariants on a specific fq value : Sounds great! I'll be very sure to put it to the test tomorrow and : perhaps add documentation on these types to the solrconfigxml wiki page : for reference. SolrConfigXml wouldn't really be an appropriate place to document this -- it's not a general config item, it's a feature of the SearchHandler... http://wiki.apache.org/solr/SearchHandler That wiki page already documented defaults; i've updated it to add details on appends and invariants. -Hoss -- http://lucenerevolution.org/ ... October 7-8, Boston http://bit.ly/stump-hoss ... Stump The Chump!
RE: svn branch issues
http://svn.apache.org/repos/asf/lucene/dev/branches/ -Original message- From: Mark Allan mark.al...@ed.ac.uk Sent: Thu 09-09-2010 10:44 To: solr-user@lucene.apache.org; Subject: svn branch issues Hi all, As I've mentioned in the past, I've created some custom field types which make use of the AbstractSubTypeFieldType class in the current trunk version of solr for a service we're working on. We're getting close to putting our service into production (early 2011) and we're now looking for a stable version of Solr to use with these classes. Unfortunately, my field types don't compile against the current stable version (Solr 1.4) because of the missing AbstractSubTypeFieldType and other required classes. Having had a look at JIRA to see the number of outstanding unresolved issues, I tried downloading the now defunct 1.5 branch on the assumption that it's more stable than the current trunk. Whether or not that's a safe assumption remains to be seen! Anyway, the problem is when I try to checkout the 1.5 branch, I get an error from subversion: $ svn co http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev svn: Repository moved permanently to '/viewvc/lucene/solr/branches/ branch-1.5-dev/'; please relocate Going to http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev in a browser shows the web view and contents of that branch, so something's not right with the subversion server. Anyone got any pointers please? Alternatively, how stable is the current trunk? Does it have a long way to go before being released as a stable version? Many thanks Mark -- The University of Edinburgh is a charitable body, registered in Scotland, with registration number SC005336.
Re: svn branch issues
Well, it's under heavy development, but the 3.x branch is more likely to be released than 1.5, which is highly unlikely to ever be released. On Thursday 09 September 2010 13:04:38 Mark Allan wrote: Thanks. Are you suggesting I use branch_3x, and is that considered stable? Cheers Mark On 9 Sep 2010, at 10:47 am, Markus Jelsma wrote: http://svn.apache.org/repos/asf/lucene/dev/branches/ -Original message- From: Mark Allan mark.al...@ed.ac.uk Sent: Thu 09-09-2010 10:44 To: solr-user@lucene.apache.org; Subject: svn branch issues Hi all, As I've mentioned in the past, I've created some custom field types which make use of the AbstractSubTypeFieldType class in the current trunk version of solr for a service we're working on. We're getting close to putting our service into production (early 2011) and we're now looking for a stable version of Solr to use with these classes. Unfortunately, my field types don't compile against the current stable version (Solr 1.4) because of the missing AbstractSubTypeFieldType and other required classes. Having had a look at JIRA to see the number of outstanding unresolved issues, I tried downloading the now defunct 1.5 branch on the assumption that it's more stable than the current trunk. Whether or not that's a safe assumption remains to be seen! Anyway, the problem is when I try to checkout the 1.5 branch, I get an error from subversion: $ svn co http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev svn: Repository moved permanently to '/viewvc/lucene/solr/branches/branch-1.5-dev/'; please relocate Going to http://svn.apache.org/viewvc/lucene/solr/branches/branch-1.5-dev in a browser shows the web view and contents of that branch, so something's not right with the subversion server. Anyone got any pointers please? Alternatively, how stable is the current trunk? Does it have a long way to go before being released as a stable version?
Many thanks Mark
Re: Indexing checksum of field value
Hi, You can use an UpdateProcessor to do so. This can be used to deduplicate documents based on exact or near matches with fields in other documents. Check the wiki page on deduplication [1] for an example. [1]: http://wiki.apache.org/solr/Deduplication Cheers, On Thursday 09 September 2010 13:44:55 Staffan wrote: Hi, I am looking for a way to store the checksum of a field's value, something like: <field name="text" ... /> <!-- the SHA1 checksum of text (before applying analyzer) --> <field name="text_sha1" type="checksum" indexed="true" stored="true" ... /> <copyField source="text" dest="text_sha1"/> I haven't found anything like that in the docs or on google. Did I miss something? If not, would a custom tokenizer be a good way to implement it? /Staffan
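A sketch of such an update processor chain, adapted from the Deduplication wiki (note: the stock signature implementations are Lookup3Signature, MD5Signature and TextProfileSignature; an actual SHA-1 checksum would need a custom Signature subclass, and the chain and field names below are examples):

```xml
<updateRequestProcessorChain name="checksum">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <!-- field that receives the computed signature -->
    <str name="signatureField">text_sha1</str>
    <!-- keep duplicates; we only want the checksum stored -->
    <bool name="overwriteDupes">false</bool>
    <!-- source field(s) the signature is computed from -->
    <str name="fields">text</str>
    <str name="signatureClass">org.apache.solr.update.processor.Lookup3Signature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain is then referenced from the update handler, so no copyField or custom tokenizer is needed.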
Re: Inconsistent search results with multiple keywords
Looks like AND is your defaultOperator [1]. Check your schema.xml and try adding q.op=OR to your query. [1]: http://wiki.apache.org/solr/SearchHandler#q.op On Thursday 09 September 2010 15:34:52 Stéphane Corlosquet wrote: Hi all, I'm new to solr so please let me know if there is a more appropriate place for my question below. I'm noticing a rather unexpected number of results when I add more keywords to a search. I'm listing below an example (where I replaced the real keywords with placeholders): keyword1: 851 hits; keyword1 keyword2: 90 hits; keyword1 keyword2 keyword3: 269 hits; keyword1 keyword2 keyword3 keyword4: 47 hits. As you can see, adding k2 narrows down the number of results (as I would expect), but adding k3 to k1 and k2 suddenly increases the number of results. With 4 keywords, the results have been narrowed down again. Would the solr/lucene search algorithm with multiple keywords explain this inconsistent behavior? I would think that adding more keywords would narrow down my results. I'm pasting below the relevant log in case it helps: INFO: [] webapp=/solr path=/select/ params={spellcheck=true&facet=true&facet.mincount=1&facet.limit=20&spellcheck.q=keyword1+keyword2+keyword3+keyword4&json.nl=map&wt=json&version=1.2&rows=10&fl=id,nid,title,comment_count,type,created,changed,score,path,url,uid,name&start=0&facet.sort=true&q=keyword1+keyword2+keyword3+keyword4&bf=recip(rord(created),4,10704,10704)^200.0&facet.field=im_cck_field_author&facet.field=type&facet.field=im_vid_1&indent=on&start=0&version=2.2&rows=10} hits=10704 status=0 QTime=1 Any hint on whether this is expected or not is appreciated. Steph.
Re: Garbled facets even in a zero hit search
That's normal behavior if you haven't configured facet.mincount. Check the wiki. On Thursday 09 September 2010 16:05:01 Dennis Schafroth wrote: I am definitely not excluding the idea that the index is garbled, but.. it doesn't explain that I get facets on a zero-hit search. The schema is as follows: Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
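Without facet.mincount, Solr returns every term in the facet field, including terms with a count of zero, which is exactly what a zero-hit search produces. A toy sketch of the pruning that facet.mincount=1 requests (facet data is invented for illustration):

```python
def apply_mincount(facet_counts, mincount=1):
    """Drop facet buckets below mincount -- the same pruning that
    facet.mincount=1 asks Solr to do server-side."""
    return {term: n for term, n in facet_counts.items() if n >= mincount}

# a zero-hit search still enumerates all terms, just with count 0
counts = {"books": 0, "dvds": 0, "games": 0}
print(apply_mincount(counts))  # {} -- no "garbled" facets left
```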
RE: Re: Inconsistent search results with multiple keywords
Indeed, it's the dismax, I missed it! My bad.. -Original message- From: Ahmet Arslan iori...@yahoo.com Sent: Thu 09-09-2010 20:37 To: solr-user@lucene.apache.org; Subject: Re: Inconsistent search results with multiple keywords yes, my schema.xml file has <solrQueryParser defaultOperator="AND"/> which is why I thought that the number of hits would decrease every time you add a keyword. You are using dismax, so it is determined by the mm parameter. http://wiki.apache.org/solr/DisMaxQParserPlugin#mm_.28Minimum_.27Should.27_Match.29
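To see why mm produces the pattern Stéphane observed, here is a sketch that resolves the two simplest mm forms to a required clause count. This only covers a bare integer or a bare percentage; the real mm syntax on the wiki also allows conditional expressions like "2&lt;-25%", which are not handled here:

```python
def min_should_match(mm, clause_count):
    """Resolve a simple dismax mm value to a number of required clauses.
    Handles only a bare integer ("2") or a percentage ("75%")."""
    mm = mm.strip()
    if mm.endswith("%"):
        # percentages round down, per the dismax wiki
        return int(clause_count * int(mm[:-1]) / 100)
    return min(int(mm), clause_count)

print(min_should_match("100%", 3))  # 3 -- behaves like AND
print(min_should_match("75%", 4))   # 3 -- only 3 of 4 terms must match
```

A percentage mm means the required term count does not grow one-for-one with the query length, so adding a third keyword can genuinely widen the result set.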
RE: roadmap
You should check Jira's roadmap [1] instead. It shows a clear picture of what has been done since the 1.4.1 release and pending issues for the 3.x branch and others. [1]: https://issues.apache.org/jira/browse/SOLR?report=com.atlassian.jira.plugin.system.project:roadmap-panel -Original message- From: Lukas Kahwe Smith m...@pooteeweet.org Sent: Thu 09-09-2010 20:20 To: solr-user@lucene.apache.org; Subject: roadmap Hi, With the Lucene svn merge a lot of tentative release dates seem to have slipped. Which is fine, because I think the merge is for the greater good of both projects in the long run. However I do subscribe to the school of thought that believes OSS is best served with a "release often" mantra. Of course such a one-time restructure can add a few months. So right now the main thing I feel a lot of people are wanting to hear is a tentative timeline for when to expect the next release and the key features that we can expect. At least looking at http://lucene.apache.org/solr/ I do not see anything that communicates to the users where things are heading. Or am I just looking in the wrong place? I hope I am not coming off as a whiney user, again I am not telling you guys to work harder without me handing you a pay check. I am just suggesting that a bit more transparency as to what's going to happen in the near future would make it all the more easier for us users to bet our futures on solr :) regards, Lukas Kahwe Smith m...@pooteeweet.org
RE: Re: Re: Invariants on a specific fq value
It works as expected. The append, well, appends the parameter and because each collection has a unique value, specifying two filters on different collections will always yield zero results. This, of course, won't work for values that are shared between collections. -Original message- From: Yonik Seeley yo...@lucidimagination.com Sent: Wed 08-09-2010 19:38 To: solr-user@lucene.apache.org; Subject: Re: Re: Invariants on a specific fq value 2010 at 1:32 PM, Markus Jelsma markus.jel...@buyways.nl wrote: Interesting! I haven't met the appends method before and I'll be sure to give it a try tomorrow. Try, the wiki [1] is not very clear on what it really does. Here's a comment from the example solrconfig.xml: <!-- In addition to defaults, appends params can be specified to identify values which should be appended to the list of multi-val params from the query (or the existing defaults). In this example, the param fq=instock:true will be appended to any query time fq params the user may specify, as a mechanism for partitioning the index, independent of any user selected filtering that may also be desired (perhaps as a result of faceted searching). NOTE: there is *absolutely* nothing a client can do to prevent these appends values from being used, so don't use this mechanism unless you are sure you always want it. --> -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
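The zero-result behaviour Markus describes falls directly out of how filter queries combine: every fq is an intersection. A toy model (field names invented; each fq is a field/value pair applied as an exact match):

```python
def filter_docs(docs, fqs):
    """Apply filter queries as an intersection: a doc survives only if
    it satisfies every fq, which is how appended invariant filters
    combine with the user's own fq parameters."""
    return [doc for doc in docs
            if all(doc.get(field) == value for field, value in fqs)]

docs = [{"id": 1, "collection": "a"}, {"id": 2, "collection": "b"}]
user_fq = [("collection", "a")]
appended_fq = [("collection", "b")]  # invariant appended by the handler
print(filter_docs(docs, user_fq + appended_fq))  # [] -- zero results
```

Because a single-valued field can never equal two different values at once, a user fq on one collection plus an appended fq on another is always empty, exactly as observed.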
RE: Help on spelling.
I don't see you passing spellcheck parameters in the query string. Are they configured as default in your search handler? -Original message- From: Gregg Hoshovsky hosho...@ohsu.edu Sent: Thu 09-09-2010 22:40 To: solr-user@lucene.apache.org; Subject: Help on spelling. I am trying to use the spellchecker but cannot get past the point of having the spelling possibilities returned. I have a text field defined in the schema.xml file as: <field name="text" type="text_ws" indexed="true" stored="false" multiValued="true"/> I modified solrconfig.xml to point the analyzer to the same field type and have the name set the same. <searchComponent name="spellcheck" class="solr.SpellCheckComponent"> <str name="queryAnalyzerFieldType">text_ws</str> <lst name="spellchecker"> <str name="name">default</str> <str name="field">text</str> <str name="spellcheckIndexDir">./spellchecker</str> </lst> I left the handler alone: <requestHandler name="/spell" class="solr.SearchHandler" lazy="true"> <lst name="defaults"> I see that the spellchecker folder gets files built so I am assuming that the spelling data is being created. Then I ran the query as http://localhost:8983/solr/biolibrary/spell/?q=text:wedg&version=2.2&start=0&rows=10&indent=on&wt=json I would expect that this would have returned some spelling suggestions (such as wedge) but don't get anything besides: { responseHeader:{ status:0, QTime:1}, response:{numFound:0,start:0,docs:[] }} Any help is appreciated. Gregg
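As Markus notes, the component only runs when spellcheck=true reaches the handler, either in the URL or as a handler default. A sketch of building such a request (core name taken from Gregg's mail; parameter names from the SpellCheckComponent wiki):

```python
from urllib.parse import urlencode

def build_spell_url(base, q):
    """Build a /spell request; spellcheck=true activates the component
    and spellcheck.build=true (re)builds the sidecar spelling index."""
    params = {"q": q, "spellcheck": "true",
              "spellcheck.build": "true", "wt": "json"}
    return base + "/spell?" + urlencode(params)

url = build_spell_url("http://localhost:8983/solr/biolibrary", "wedg")
print(url)
```

After the first build request, spellcheck.build can be dropped from subsequent queries.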
RE: How to Update Value of One Field of a Document in Index?
The MoreLikeThis component actually can accept external input: http://wiki.apache.org/solr/MoreLikeThisHandler#Using_ContentStreams -Original message- From: Jonathan Rochkind rochk...@jhu.edu Sent: Fri 10-09-2010 18:59 To: solr-user@lucene.apache.org; Subject: RE: How to Update Value of One Field of a Document in Index? MoreLikeThis is intended to be run at query time. For what reasons are you thinking you want to (re-)index each document based on the results of MoreLikeThis? You're right that that's not what the component is intended for. Jonathan From: Savannah Beckett [savannah_becket...@yahoo.com] Sent: Friday, September 10, 2010 11:18 AM To: solr-user@lucene.apache.org Subject: Re: How to Update Value of One Field of a Document in Index? Thanks. I am trying to use MoreLikeThis in Solr to find similar documents in the solr index and use the data from these similar documents to modify a field in each document that I am indexing. I found that MoreLikeThis in Solr only works when the document is in the index, is it true? If so, I may have to wait until the indexing is finished, then run my own command to apply MoreLikeThis to each document in the index, and then reindex each document? That sounds inefficient. Is there a better way? Thanks. From: Liam O'Boyle liam.obo...@intelligencebank.com To: solr-user@lucene.apache.org Cc: u...@nutch.apache.org Sent: Thu, September 9, 2010 11:06:36 PM Subject: Re: How to Update Value of One Field of a Document in Index? Hi Savannah, You can only reindex the entire document; if you only have the ID, then do a search to retrieve the rest of the data, then reindex. This assumes that all of the fields you need to index are stored (so that you can retrieve them) and not just indexed. Liam On Fri, Sep 10, 2010 at 3:29 PM, Savannah Beckett savannah_becket...@yahoo.com wrote: I use nutch to crawl and index to Solr. My code is working.
Now, I want to update the value of one of the fields of a document in the solr index after the document was already indexed, and I have only the document id. How do I do that? Thanks.
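Liam's fetch-modify-reindex cycle can be sketched in a few lines. This toy model stands in for the real round trip (query by id, re-add the whole document); a plain dict plays the role of the index, since Solr of this era has no partial-update command:

```python
def update_field(index, doc_id, field, value):
    """Solr (1.x/3.x) has no partial update: fetch all stored fields,
    change one, and re-add the whole document under the same id."""
    doc = dict(index[doc_id])   # retrieve the stored fields
    doc[field] = value          # modify the one field
    index[doc_id] = doc         # re-adding replaces the old document
    return doc

index = {"d1": {"id": "d1", "title": "old", "body": "text"}}
update_field(index, "d1", "title", "new")
print(index["d1"]["title"])  # new
```

The untouched fields survive only because they were stored, which is the constraint Liam calls out.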
RE: multivalued fields in result
Yes, you'll get what is stored and asked for. -Original message- From: Jason Chaffee jchaf...@ebates.com Sent: Sat 11-09-2010 05:27 To: solr-user@lucene.apache.org; Subject: multivalued fields in result Is it possible to return multivalued fields in the result? I would like to have a multivalued field that is stored and not indexed (I also copy the same field into another field where it is tokenized and indexed). I would then like all the values of this field returned in the result set. Is there a way to do this? If it is not possible, could someone elaborate on why that is, so that I may see if I can make it work. Thanks, Jason
RE: Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field
It would be a nice feature if Solr supported queries with time zone support on an index where all times are UTC. There is some chatter about this in SOLR-750 but I haven't found an issue that would add support for time zone queries. Did I do a lousy search or is the issue missing as of yet? -Original message- From: Yonik Seeley yo...@lucidimagination.com Sent: Tue 14-09-2010 22:58 To: solr-user@lucene.apache.org; Subject: Re: solr.DateField: org.apache.solr.common.SolrException: Error while creating field On Tue, Sep 14, 2010 at 4:54 PM, h00kpub...@gmail.com h00kpub...@googlemail.com wrote: SEVERE: org.apache.solr.common.SolrException: Error while creating field 'metadata_last_modified{type=date,properties=indexed,stored,omitNorms}' from value '2010-09-14T22:29:24+0200' Different timezones are currently not allowed - you must use UTC (hence the Z timecode). -Yonik http://lucenerevolution.org Lucene/Solr Conference, Boston Oct 7-8
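The practical fix is to normalize timestamps to UTC before indexing. A small sketch converting the offending value from the error message into the Z-suffixed form DateField requires:

```python
from datetime import datetime, timezone

def to_solr_utc(value):
    """Convert an offset timestamp such as 2010-09-14T22:29:24+0200
    into the UTC 'Z' form Solr's DateField accepts."""
    dt = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S%z")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")

print(to_solr_utc("2010-09-14T22:29:24+0200"))  # 2010-09-14T20:29:24Z
```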
Re: Handling Aggregate Records/Roll-up in Solr
You should just flatten the representation of the shirt in the data model. On Wednesday 15 September 2010 22:23:17 Thomas Martin wrote: Can someone point me to the mechanism in Solr that might allow me to roll up or aggregate records for display. We have many items that are similar and only want to show a representative record to the user until they select that record. As an example - we carry a polo shirt and have 15 records that represent the individual colors for that shirt. Does the query API provide any way to roll up the records based on a property, or do we need to just flatten the representation of the shirt in the data model? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
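Flattening here means emitting one document per variant that carries the shared parent fields plus a group key, so the front end can show one representative per product. A sketch (field names like `group_id` are illustrative, not a prescribed schema):

```python
def flatten_variants(product, variants):
    """One doc per colour variant, each carrying the shared parent
    fields plus a group key for client-side roll-up."""
    docs = []
    for v in variants:
        doc = dict(product)               # shared fields
        doc.update(v)                     # variant-specific fields
        doc["group_id"] = product["product_id"]  # roll-up key
        docs.append(doc)
    return docs

shirt = {"product_id": "polo-1", "name": "Polo shirt"}
docs = flatten_variants(shirt, [{"sku": "polo-1-red", "color": "red"},
                                {"sku": "polo-1-blue", "color": "blue"}])
print(len(docs))  # 2
```

Later Solr versions do offer server-side roll-up on such a key (the field collapsing work in SOLR-236), but flattening keeps the query side simple in the meantime.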
RE: Re: Get all results from a solr query
Not according to the wiki; http://wiki.apache.org/solr/CommonQueryParameters#rows But you could always create an issue for this one. -Original message- From: Christopher Gross cogr...@gmail.com Sent: Thu 16-09-2010 22:50 To: solr-user@lucene.apache.org; Subject: Re: Get all results from a solr query That will still just return 10 rows for me. Is there something else in the configuration of solr to have it return all the rows in the results? -- Chris On Thu, Sep 16, 2010 at 4:43 PM, Shashi Kant sk...@sloan.mit.edu wrote: q=*:* On Thu, Sep 16, 2010 at 4:39 PM, Christopher Gross cogr...@gmail.com wrote: I have some queries that I'm running against a solr instance (older, 1.2 I believe), and I would like to get *all* the results back (and not have to put an absurdly large number as a part of the rows parameter). Is there a way that I can do that? Any help would be appreciated. -- Chris
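Since rows has no "all" value, the usual workaround is to page through the result set with start/rows until numFound is exhausted. A sketch with a stand-in search function (a real version would issue HTTP requests to /select):

```python
def fetch_all(search, page_size=100):
    """Page through a result set with start/rows instead of one huge
    rows value. `search(start, rows)` stands in for a Solr request and
    must return (total_hits, docs_for_that_page)."""
    start, docs = 0, []
    while True:
        total, page = search(start, page_size)
        docs.extend(page)
        start += page_size
        if start >= total:
            return docs

# toy backend over 250 fake documents
data = list(range(250))
search = lambda start, rows: (len(data), data[start:start + rows])
print(len(fetch_all(search)))  # 250
```

Paging also keeps memory bounded on both ends, which a single enormous rows value does not.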
Re: Search the mailinglist?
http://www.lucidimagination.com/search/?q= On Friday 17 September 2010 16:10:23 alexander sulz wrote: I'm sorry to bother you all with this, but is there a way to search through the mailing list archive? I've found http://mail-archives.apache.org/mod_mbox/lucene-solr-user/ so far but there isn't any convenient way to search through the archive. Thanks for your help Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
RE: Grouping in solr ?
http://wiki.apache.org/solr/FieldCollapsing https://issues.apache.org/jira/browse/SOLR-236 -Original message- From: Papp Richard ccode...@gmail.com Sent: Thu 23-09-2010 21:29 To: solr-user@lucene.apache.org; Subject: Grouping in solr ? Hi all, is it possible somehow to group documents? I have services as documents, and I would like to show the filtered services grouped by company. So I filter services by given criteria, but I show the results grouped by company. If I get 1000 services, maybe I need to show just 100 companies (this will affect pagination as well), and how could I get the company info? Should I store the company info in each service (I don't need the company info to be indexed)? regards, Rich
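Until the field collapsing patch (SOLR-236) is available in a release, the grouping can be done client-side over a page of results: keep one representative service per company, in result order. A sketch (field names are invented for illustration):

```python
def group_by_company(services):
    """Client-side stand-in for field collapsing: keep the first
    (best-ranked) service per company, preserving result order."""
    groups = {}
    for svc in services:
        groups.setdefault(svc["company_id"], svc)
    return list(groups.values())

services = [{"id": 1, "company_id": "acme"},
            {"id": 2, "company_id": "acme"},
            {"id": 3, "company_id": "globex"}]
print([s["id"] for s in group_by_company(services)])  # [1, 3]
```

Storing the company fields on each service document (stored but not indexed, as Rich suggests) is what lets the representative row render without a second lookup; note that client-side grouping only sees one page at a time, so pagination by company still needs care.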