DocSet getting cached in filterCache for facet request with {!cache=false}
Hello, it seems Solr is caching when faceting even with fq={!cache=false}*:* specified. This is what I am doing on Solr 4.10.0 on JRE 1.7.0_51. Query 1) No cache in filterCache, as expected: http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:* http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache confirms this. Query 2) Query result docset cached in filterCache unexpectedly? http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache shows an entry item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached. Any suggestions why, or how this may be avoided? I don't want to cache anything other than faceted terms in the filterCache (for predictable heap usage). The culprit seems to be line 1431 @ http://svn.apache.org/viewvc/lucene/dev/tags/lucene_solr_4_10_2/solr/core/src/java/org/apache/solr/search/SolrIndexSearcher.java?view=markup Thanks. -M
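For readers reconstructing the requests, the parameters can be assembled with proper `&` separators like this; a minimal sketch using only the Python standard library, with the host, collection, and field name (`foobar`) taken from the message above:

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/collection1/select"

# Query 1: match-all query with a filter query marked as non-cacheable
q1 = base + "?" + urlencode({
    "q": "*:*",
    "rows": 0,
    "fq": "{!cache=false}*:*",   # local param asks Solr not to cache this fq
})

# Query 2: same filter, plus enum faceting on the example field "foobar"
q2 = base + "?" + urlencode({
    "q": "*:*",
    "rows": 0,
    "fq": "{!cache=false}*:*",
    "facet": "true",
    "facet.field": "foobar",
    "facet.method": "enum",      # enum faceting walks terms via the filterCache
})
print(q1)
print(q2)
```

urlencode percent-encodes the `{!cache=false}` local-params prefix, which Solr decodes back on receipt.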
How to specify a property for all cores
Hi, I'm using Solr 4.10.1 with the new solr.xml format (auto-discovered cores). I'm trying to set a property that I can reference in the solrconfig.xml files of all cores. I know I can use JVM system properties or add the property to each core's core.properties file. Is there another possibility? I don't want to use system properties, as these would affect the whole servlet container. Can I specify the property in a central configuration file? I found some discussion about a solr.properties file, but there's no such thing in Solr 4.10, is there? (see SOLR-4615) Regards, Andreas
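For reference, the substitution mechanism involved works like this: solrconfig.xml can reference any property with the `${propertyName:defaultValue}` syntax, and values are supplied per core via core.properties or globally via a JVM system property (the two mechanisms mentioned above). The property name `my.data.root` below is made up for illustration; this is a sketch of the syntax, not a confirmed central-file solution:

```xml
<!-- solrconfig.xml: reference a property, with a fallback default -->
<dataDir>${my.data.root:/var/solr/data}/${solr.core.name}</dataDir>
```

A `my.data.root=...` line in a core's core.properties, or `-Dmy.data.root=...` on the JVM, would then override the default.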
Removing Common Web Page Header and Footer from content
Hi, I am using Nutch 1.9 and Solr 4.6 to index a web application with approximately 100 distinct URLs. Nutch is used to fetch the URLs and links, crawl the entire web application to extract the content of all pages, and send that content to Solr. The problem I have now is that the first 1000 or so characters and the last 400 or so characters of the pages, which are a common header and footer, show up in the search results. Is there a way to ignore the links or keep only the static text in the content? Any useful pointers would be highly appreciated. Regards, Moumita Dhar CAUTION - Disclaimer * This e-mail contains PRIVILEGED AND CONFIDENTIAL INFORMATION intended solely for the use of the addressee(s). If you are not the intended recipient, please notify the sender by e-mail and delete the original message. Further, you are not to copy, disclose, or distribute this e-mail or its contents to any other person and any such actions are unlawful. This e-mail may contain viruses. Infosys has taken every reasonable precaution to minimize this risk, but is not liable for any damage you may sustain as a result of any virus in this e-mail. You should carry out your own virus checks before opening the e-mail or attachment. Infosys reserves the right to monitor and review the content of all messages sent to or from this e-mail address. Messages sent to or from this e-mail address may be stored on the Infosys e-mail system. ***INFOSYS End of Disclaimer INFOSYS***
Re: Proper way to backup solr.
First, I want to thank you for your response! Can you provide more information about the suggested hardlink solution? What are its advantages and disadvantages? Can you provide an example, please? Meanwhile I'll try to read about it and test it myself ASAP. Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Proper-way-to-backup-solr-tp4168498p4168714.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: create new core based on named config set using the admin page
Okay, I've created https://issues.apache.org/jira/browse/SOLR-6728 Erick Erickson wrote on 11/06/2014 08:00 PM: Yeah, please create a JIRA. There are a couple of umbrella JIRAs that you might want to link it to. I'm not sure it quite fits in either; if not, just let it hang out there: https://issues.apache.org/jira/browse/SOLR-6703 https://issues.apache.org/jira/browse/SOLR-6084 On Wed, Nov 5, 2014 at 11:57 PM, Andreas Hubold andreas.hub...@coremedia.com wrote: Hi, Solr 4.8 introduced named config sets with https://issues.apache.org/jira/browse/SOLR-4478. You can create a new core based on a config set with the CoreAdmin API, as described in https://cwiki.apache.org/confluence/display/solr/Config+Sets The Solr Admin page allows the creation of new cores as well: there's an Add Core button in the Core Admin tab. This opens a dialog where you can enter name, instanceDir, dataDir and the names of solrconfig.xml / schema.xml. It would be nice and consistent if one could create a core based on a named config set here as well. I'm asking because I might have overlooked something, or maybe somebody is already working on this. But probably I should just create a JIRA issue, right? Regards, Andreas Ramzi Alqrainy wrote on 11/05/2014 08:24 PM: Sorry, I did not get your point, can you please elaborate more? -- View this message in context: http://lucene.472066.n3.nabble.com/create-new-core-based-on-named-config-set-using-the-admin-page-tp4167850p4167860.html Sent from the Solr - User mailing list archive at Nabble.com. -- Andreas Hubold Software Architect tel +49.40.325587.519 fax +49.40.325587.999 andreas.hub...@coremedia.com CoreMedia AG content | context | conversion Ludwig-Erhard-Str. 18 20459 Hamburg, Germany www.coremedia.com Executive Board: Gerrit Kolb (CEO), Dr. Klemens Kleiminger (CFO) Supervisory Board: Prof. Dr. Florian Matthes (Chairman) Trade Register: Amtsgericht Hamburg, HR B 76277
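As a side note to the thread, the CoreAdmin CREATE call that references a named config set (available since Solr 4.8, per SOLR-4478) can be built like this; a sketch where the core name `mycore` and config set name `myconfigset` are placeholders:

```python
from urllib.parse import urlencode

# CoreAdmin CREATE referencing a named config set;
# "mycore" and "myconfigset" are placeholder names.
params = {
    "action": "CREATE",
    "name": "mycore",
    "configSet": "myconfigset",
}
url = "http://localhost:8983/solr/admin/cores?" + urlencode(params)
print(url)
```

Issuing an HTTP GET to this URL against a running Solr would create the core from the config set.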
I want to translate solr wiki to Korean.
In Korea, only a few people can read English well, so it is difficult for them to use Solr. But I want Solr to spread. So I would like to translate the Solr wiki to Korean. Is there a good way to translate it?
Can I select dummy field(for count) from solr?
I want to show a cumulative graph in the Banana framework (SiLK). https://docs.lucidworks.com/display/SiLK/Banana There is no cumulative graph type, so I want to select something like count(*) from a Solr collection as a dummy field. Then I would sum that field and show a histogram graph. Do you have an idea? It does not work like this: (q = *:count(*)) http://lucene.472066.n3.nabble.com/file/n4168713/solr.png Help me please. Have a nice day! -- View this message in context: http://lucene.472066.n3.nabble.com/Can-I-select-dummy-field-for-count-from-solr-tp4168713.html Sent from the Solr - User mailing list archive at Nabble.com.
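One workaround for a count(*)-style value is to issue a normal query with rows=0 and read numFound from the response; a hedged sketch of the client-side part, using a hard-coded response in place of a live HTTP call:

```python
import json

# A trimmed Solr JSON response as it might look for q=*:*&rows=0
# (hard-coded here instead of an actual request to Solr).
raw = '{"response": {"numFound": 1234, "start": 0, "docs": []}}'

resp = json.loads(raw)
count = resp["response"]["numFound"]   # plays the role of count(*)
print(count)
```

With rows=0 no documents are fetched, so the query is cheap and numFound gives the total match count.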
Parent query yields document which is not matched by parents filter
Hi, folks! We are using a parent/child architecture in our project, and sometimes when using the child transformer ([child]) there is an exception: Parent query yields document which is not matched by parents filter, docID=... Example queries:

http://localhost/solr/core/select?fq=id:123456789&fl=*&q={!child of=DocumentType:parent}Text:foo
http://localhost/solr/core/select?q=id:123456789&fl=*,[child parentFilter=DocumentType:parent]

Documents in the index:

<doc>
  <id>123456789_0</id>
  <Text>foo</Text>
</doc>
<doc>
  <id>123456789</id>
  <DocumentType>parent</DocumentType>
  ... other fields
</doc>

The root field is present in schema.xml. When I optimize down to one segment the error sometimes disappears, but sometimes it does not. Please advise how to fix it. -- View this message in context: http://lucene.472066.n3.nabble.com/Parent-query-yields-document-which-is-not-matched-by-parents-filter-tp4168727.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Suggester not suggesting anything using DictionaryCompoundWordTokenFilterFactory
I think I found the problem. The definition of the suggester component has a field option which references the field the suggester uses to generate suggestions. Changing this to the field using the DictionaryCompoundWordTokenFilterFactory also suggests word parts. Am 11.11.2014 08:52 schrieb Thomas Michael Engelke: I'm toying around with the suggester component, as described here: http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx [1] So I made 4 fields:

<field name="text_suggest" type="text_suggest" indexed="true" stored="true" multiValued="true"/>
<copyField source="name" dest="text_suggest"/>
<field name="text_suggest_edge" type="text_suggest_edge" indexed="true" stored="true" multiValued="true"/>
<copyField source="name" dest="text_suggest_edge"/>
<field name="text_suggest_ngram" type="text_suggest_ngram" indexed="true" stored="true" multiValued="true"/>
<copyField source="name" dest="text_suggest_ngram"/>
<field name="text_suggest_dictionary_ngram" type="text_suggest_dictionary_ngram" indexed="true" stored="true" multiValued="true"/>
<copyField source="name" dest="text_suggest_dictionary_ngram"/>

with the corresponding definitions:

<fieldType name="text_suggest" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_suggest_edge" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
  </analyzer>
</fieldType>
<fieldType name="text_suggest_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
<fieldType name="text_suggest_dictionary_ngram" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary.txt" minWordSize="5" minSubwordSize="3" maxSubwordSize="30" onlyLongestMatch="false"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="50" side="front"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

I'm calling the suggester component this way:

http://address:8983/solr/core/suggest?qf=text_suggest^6.0%20test_suggest_edge^3.0%20text_suggest_ngram^1.0%20text_suggest_dictionary_ngram^0.2&q=wa

This seems to work fine:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="wa">
        <int name="numFound">5</int>
        <int name="startOffset">0</int>
        <int name="endOffset">2</int>
        <arr name="suggestion">
          <str>wandelement aus gitter</str>
          <str>wandelement aus stahlblech</str>
          <str>wandelement</str>
          <str>wandhalter für prospekte</str>
          <str>wandascher, h 300 × b 230 × t 60 mm</str>
        </arr>
      </lst>
      <str name="collation">(wandelement aus gitter)</str>
    </lst>
  </lst>
</response>

However, I added the fourth field so I could get low-boosted suggestions using the aforementioned DictionaryCompoundWordTokenFilterFactory. A sample analysis of the field type text_suggest_dictionary_ngram for the word Geländewagen: g ge gel gelä gelän geländ gelände geländew geländewa geländewag geländewage geländewagen g ge gel gelä gelän geländ gelände w wa wag wage wagen As we can see, the DictionaryCompoundWordTokenFilterFactory extracts the word wagen and EdgeNGrams it. However, I cannot get results from these NGrams. Trying wag as the search term for the suggester yields no results. However, an analysis of Geländewagen (as field value index) and wag (as field value query) shows a match.
I had the thought that it might be because the underlying component of the suggester is a spellchecker, and a spellchecker wouldn't correct wag to wagen because there was an NGram that spelled wag, and so the word was spelled correctly already. So I tried without the EdgeNGrams, but the result stays the same. Links: -- [1] http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx
How to suggest from multiple fields?
Like in this article (http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx), I am using multiple fields to generate different options for an autosuggest functionality: - First, the whole field (top priority) - Then, the whole field as EdgeNGrams from the left side (normal priority) - Lastly, single words or word parts (compound words) as EdgeNGrams However, I was not very successful in supplying a single requestHandler (/suggest) with data from multiple suggesters. I have also not been able to find any sample of how this might be done correctly. Is there a sample that I can read, or a documentation of how this might be done? The referenced article was doing it, yet only marginally described the technical implementation.
Re: Removing Common Web Page Header and Footer from content
Hi Moumita, Once, I used https://code.google.com/p/boilerpipe/ to remove common headers/footers etc. Ahmet
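Besides boilerpipe, a crude do-it-yourself heuristic is to drop any line that occurs on every crawled page before sending content to Solr; a sketch, not production code:

```python
def strip_common_lines(pages):
    """Remove lines (e.g. a shared header/footer) that appear in every page."""
    line_sets = [set(p.splitlines()) for p in pages]
    common = set.intersection(*line_sets) if line_sets else set()
    return [
        "\n".join(line for line in p.splitlines() if line not in common)
        for p in pages
    ]

# Two toy pages sharing a header and footer line
pages = [
    "SITE HEADER\npage one body\nSITE FOOTER",
    "SITE HEADER\npage two body\nSITE FOOTER",
]
print(strip_common_lines(pages))
```

Real crawled HTML would first need to be rendered to text, and a frequency threshold (e.g. "appears on 90% of pages") is usually more robust than strict intersection.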
Re: Analytics result for each Result Group
Hi Anurag, Where can I find the median function? I use it a lot. 2014-11-09 20:39 GMT+02:00 Anurag Sharma anura...@gmail.com: Can a function query (http://wiki.apache.org/solr/FunctionQuery) serve your use case? On Wed, Nov 5, 2014 at 3:36 PM, Talat Uyarer ta...@uyarer.com wrote: I searched the wiki pages about that but did not find any documentation. If you can help me I will be glad. Thanks 2014-11-04 11:34 GMT+02:00 Talat Uyarer ta...@uyarer.com: Hi folks, We use the Analytics Component for median, max etc. I wonder, if I use the group.field parameter with the analytics component, how are analytics calculated for each result group? Thanks -- Talat UYARER Websitesi: http://talat.uyarer.com Twitter: http://twitter.com/talatuyarer Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
Re: Proper way to backup solr.
On 11/11/2014 1:45 AM, elmerfudd wrote: First, I want to thank you for your response! can you provide more information about the suggested hardlink solution? What are the advantages and disadvantages using it? can you provide an example please? meanwhile try to read about it and test it myself asap. Something like this:

mkdir -p ${BACKUPDIR}/corename/index
rm -f ${BACKUPDIR}/corename/index/*
cp -pl ${SOLRHOME}/corename/data/index/* ${BACKUPDIR}/corename/index/.

This does not include necessary additional steps like renaming previous backups to set up an auto-rotating archive. Because only you know what your requirements are when it comes to backup archives, you'll need to fill that part in. As already mentioned, the source and destination must be on the same filesystem. There are very few disadvantages to this solution. It maintains instantaneous backups of previous index states with as little overhead as possible. Note that if you have a filesystem with good built-in snapshot support (typically zfs or btrfs), you can use filesystem snapshots instead, with much the same effect. Thanks, Shawn
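The same idea as the cp -pl commands above, expressed as a small Python script; a sketch using os.link, so as Shawn notes the source and backup directories must live on the same filesystem:

```python
import os
import shutil

def hardlink_backup(index_dir, backup_dir):
    """Snapshot an index directory by hardlinking each file.

    Hardlinks share the underlying data blocks, so the backup is
    near-instant and takes no extra space until segments are rewritten.
    """
    if os.path.isdir(backup_dir):
        shutil.rmtree(backup_dir)      # replace any previous snapshot
    os.makedirs(backup_dir)
    for name in os.listdir(index_dir):
        src = os.path.join(index_dir, name)
        if os.path.isfile(src):
            os.link(src, os.path.join(backup_dir, name))
```

As with the shell version, rotation of older snapshots is left to the caller.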
Re: I want to translate solr wiki to Korean.
Hi Jeon Woosung, The Solr community wiki is no longer the official Solr documentation location. The Solr Reference Guide is where Solr documentation is now maintained: https://cwiki.apache.org/confluence/display/solr/Apache+Solr+Reference+Guide. I’m not sure what you mean when you ask “Is there any good ways to translate it?” Are you asking about strategy? Or tools? Hmm, maybe you could translate each page using Google Translate or a similar service, then refine the translation manually? I think you may be the first to offer to translate Solr documentation. Perhaps we could create a new space in the Apache Confluence instance, where the Solr Reference Guide is, to host the translated documentation? Probably it would be best to create a Solr JIRA issue where this topic can be discussed: https://issues.apache.org/jira/browse/SOLR. Thanks for contributing! Steve On Nov 11, 2014, at 4:30 AM, Jeon Woosung jeonwoos...@gmail.com wrote: In Korea, only few people can read English well. Thus, it is difficult to use solr. But I want solr to spread out . So I would like to translate solr wiki to Korean. Is there any good ways to translate it?
Re: DocSet getting cached in filterCache for facet request with {!cache=false}
On 11/11/2014 1:22 AM, Mohsin Beg Beg wrote: It seems Solr is caching when faceting even with fq={!cache=false}*:* specified. This is what I am doing on Solr 4.10.0 on JRE 1.7.0_51. Query 1) No cache in filterCache, as expected: http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:* http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache confirms this. Query 2) Query result docset cached in filterCache unexpectedly? http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache shows an entry item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached. Suggestions why or how this may be avoided, since I don't want to cache anything other than faceted terms in the filterCache (for predictable heap usage)? I hope this is just for testing, because fq=*:* is completely unnecessary, and will cause Solr to do extra work that it doesn't need to do. Try changing that second query so q and fq are not the same, so you can see for sure which one is producing the filterCache entry. With the same query for both, you cannot know which one is populating the filterCache. If it's coming from the q parameter, then it's probably working as designed. If it comes from the fq, then we probably actually do have a problem that needs investigation. Thanks, Shawn
Re: Lucene to Solrcloud migration
Hi Erick, Michael, thank you both for your comments. 2014-11-11 5:05 GMT+01:00 Erick Erickson erickerick...@gmail.com: bq: - the documents are organized in shards according to date (integer) and language (a possibly extensible discrete set) bq: - the indexes are disjunct OK, I'm having a hard time getting my head around these two statements. If the indexes are disjunct in the sense that you only search one at a time, then they are different collections in SolrCloud jargon. I just meant that every document is contained in exactly one of the indexes. I have a lot of Lucene indexes for various [language X timespan] combinations, but logically we are speaking about a single huge index. That is why I thought it would be natural to represent it as a single SolrCloud collection. If, on the other hand, these are a big collection and you want to search them all with a single query, I suggest that in SolrCloud land you don't want them to be discrete shards. My reasoning here is that let's say you have a bunch of documents for October, 2014 in Spanish. By putting these all on a single shard, your queries all have to be serviced by that one shard. You don't get any parallelism. That is right. Actually, parallelization is not the main issue right now. The queries are very sparse; currently our system does not support load balancing at all. I imagined that in the future it could be achievable via SolrCloud replication. The main consideration is to be able to plug the indexes in and out on demand. The total size of the data is in terabytes. We usually want to search only the latest indexes, but occasionally it is necessary to plug in one of the older ones. Maybe (probably) I still have some misconceptions about the uses of SolrCloud... If it really does make sense in your case to route all the docs to a single shard, then Michael's comment is spot-on: use the compositeId router. You confuse me here.
I was not thinking about a single shard; on the contrary, each [language X timespan] index would itself be a shard. I agree that the compositeId router seems natural for what I need. I am currently searching for a way to convert my indexes so that my document IDs have the composite format. Currently these are just unique integers, so I would like to prefix all the document IDs of an index with its language and timespan. I do not know how, but I believe this should be possible, as it is a constant operation that would not change the structure of the index. Best, Michal Best, Erick On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi Michal, Is there a particular reason to shard your collections like that? If it was mainly for ease of operations, I'd consider just using compositeId to prevent specific types of queries from hotspotting particular nodes. If your ingest rate is fast, you might also consider making each collection an alias that points to many actual collections, and periodically closing off a collection and starting a new one. This prevents cache churn and the impact of large merges. Michael On 11/10/14 08:03, Michal Krajňanský wrote: Hi All, I have been working on a project that has long employed the Lucene indexer. Currently, the system implements proprietary document routing and index plugging/unplugging on top of Lucene, and of course contains a great body of indexes. Recently an idea came up to migrate from Lucene to SolrCloud, which appears to be more powerful than our proprietary system. Could you suggest the best way to seamlessly migrate the system to SolrCloud, when reindexing is not an option?
- all the existing indexes represent a single collection in terms of SolrCloud
- the documents are organized in shards according to date (integer) and language (a possibly extensible discrete set)
- the indexes are disjunct

I have been able to convert the existing indexes to the newest Lucene version and plug them individually into SolrCloud. However, there is the question of routing, sharding etc. Any insight appreciated. Best, Michal Krajnansky
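The ID-prefixing idea discussed above, rewriting each existing integer ID into the compositeId form `shardKey!docId` so that language and timespan drive routing, might look like this in outline (the value layout is illustrative):

```python
def to_composite_id(doc_id, language, timespan):
    """Build a compositeId-style ID: the part before '!' drives routing,
    so all docs sharing a language/timespan land on the same shard."""
    return "{}_{}!{}".format(language, timespan, doc_id)

print(to_composite_id(424242, "es", "2014-10"))
```

Actually rewriting stored uniqueKey values inside an existing Lucene index is the hard part, as the thread goes on to discuss; this only shows the target ID shape.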
Re: Lucene to Solrcloud migration
Yeah, Erick confused me a bit too, but I think what he's talking about takes for granted that you'd have your various indexes directly set up as individual collections. If instead you're considering one big collection, or a few collections based on aggregations of your individual indexes, having big, multisharded collections using compositeId should work, unless there's a use case we're not discussing. Michael
Re: Lucene to Solrcloud migration
Hm. So I found that one can update stored fields with atomic update operation, however according to http://stackoverflow.com/questions/19058795/it-is-possible-to-update-uniquekey-in-solr-4 this will not work for uniqueKey. So I guess with compositeId router I am out of luck. I have been also searching for a way to implement my own routing mechanism. Anyway, this seem to be a cleaner solution -- I would not need to modify existing index, just compute hash from the other (stored) fields than just document id. Can you confirm that it is possible? The documentation is however very modest (I only found that it is possible to specify custom hash function). Best, Michal 2014-11-11 16:48 GMT+01:00 Michael Della Bitta michael.della.bi...@appinions.com: Yeah, Erick confused me a bit too, but I think what he's talking about takes for granted that you'd have your various indexes directly set up as individual collections. If instead you're considering one big collection, or a few collections based on aggregations of your individual indexes, having big, multisharded collections using compositeId should work, unless there's a use case we're not discussing. Michael On 11/11/14 10:27, Michal Krajňanský wrote: Hi Eric, Michael, thank you both for your comments. 2014-11-11 5:05 GMT+01:00 Erick Erickson erickerick...@gmail.com: bq: - the documents are organized in shards according to date (integer) and language (a possibly extensible discrete set) bq: - the indexes are disjunct OK, I'm having a hard time getting my head around these two statements. If the indexes are disjunct in the sense that you only search one at a time, then they are different collections in SolrCloud jargon. I just meant that every document is contained in a single one of the indexes. I have a lot of Lucene indexes for various [language X timespan], but logically we are speaking about a single huge index. That is why I thought it would be natural to represent is as a single SolrCloud collection. 
If, on the other hand, these are a big collection and you want to search them all with a single query, I suggest that in SolrCloud land you don't want them to be discrete shards. My reasoning here is that let's say you have a bunch of documents for October, 2014 in Spanish. By putting these all on a single shard, your queries all have to be serviced by that one shard. You don't get any parallelism. That is right. Actually the parallelization is not the main issue right now. The queries are very sparse, currently our system does not support load balancing at all. I imagined that in the future it could be achievable via SolrCloud replication. The main consideration is to be able to plug the indexes in and out on demand. The total size of the data is in terabytes. We usually want to search only the latest indexes but occassionally it is needed to plug in one of the older ones. Maybe (probably) I still have some misconceptions about the uses of SolrCloud... If it really does make sense in your case to route all the doc to a single shard, then Michael's comment is spot-on use compositeId router. You confuse me here. I was not thinking about a single shard, on the contrary, any [language X timespan] index would be itself a shard. I agree that compositeId router seems to be natural for what I need. I am currently searching for the way to convert my indexes in such way that my document ID's have the composite format. Currently these are just unique integers, so I would like to prefix all the document ID's of an index with it's language and timespan. I do not know how, but I believe this should be possible, as it is a constant operation that would not change the structure of the index. Best, Michal Best, Erick On Mon, Nov 10, 2014 at 11:50 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: Hi Michal, Is there a particular reason to shard your collections like that? 
If it was mainly for ease of operations, I'd consider just using CompositeId to prevent specific types of queries hotspotting particular nodes. If your ingest rate is fast, you might also consider making each collection an alias that points to many actual collections, and periodically closing off a collection and starting a new one. This prevents cache churn and the impact of large merges. Michael On 11/10/14 08:03, Michal Krajňanský wrote: Hi All, I have been working on a project that has long employed the Lucene indexer. Currently, the system implements proprietary document routing and index plugging/unplugging on top of Lucene and of course contains a great body of indexes. Recently an idea came up to migrate from Lucene to SolrCloud, which appears to be more powerful than our proprietary system. Could you suggest the best way to seamlessly migrate the system to use SolrCloud, when reindexing is not an option? - all the existing indexes
Re: DocSet getting cached in filterCache for facet request with {!cache=false}
Well, the difference is that you're faceting with method=enum, which uses the filterCache (I think, it's been a while). I admit I'm a little surprised that when I tried faceting with the inStock field in the standard distro I got 3 entries when there are only two values, but I'm willing to let that go ;) i.e. this produces 3 entries in the filterCache: http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.field=inStock&facet.method=enum not an fq clause in sight.. Best, Erick On Tue, Nov 11, 2014 at 9:31 AM, Shawn Heisey apa...@elyograg.org wrote: On 11/11/2014 1:22 AM, Mohsin Beg Beg wrote: It seems Solr is caching when faceting even with fq={!cache=false}*:* specified. This is what I am doing on Solr 4.10.0 on jre 1.7.0_51. Query 1) No cache in filterCache as expected http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:* http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache confirms this. Query 2) Query result docset cached in filterCache unexpectedly ? http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache shows entry of item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached. Suggestions why or how this may be avoided since I don't want to cache anything other than facet(ed) terms in the filterCache (for predictable heap usage). I hope this is just for testing, because fq=*:* is completely unnecessary, and will cause Solr to do extra work that it doesn't need to do. Try changing that second query so q and fq are not the same, so you can see for sure which one is producing the filterCache entry. With the same query for both, you cannot know which one is populating the filterCache. If it's coming from the q parameter, then it's probably working as designed. If it comes from the fq, then we probably actually do have a problem that needs investigation. Thanks, Shawn
Re: Lucene to Solrcloud migration
bq: So I guess with compositeId router I am out of luck. No, not at all. Atomic updates are exactly about updating a doc and NOT changing the id. A different uniqueKey is a different doc by definition. So you can easily use atomic updates with composite IDs since you are changing a field of an existing doc, as long as the router bits are the same. But that may be irrelevant. Take a look at LotsOfCores (WARNING! this is NOT verified in SolrCloud!). The design there is exactly to limit the number of simultaneous cores in memory, having them load/unload themselves based on the limits you set up. So you can just fire queries blindly at your server where the URL includes the core name and be confident that you'll stay within your hardware limits. http://wiki.apache.org/solr/LotsOfCores If you're using SolrCloud, though, there's really no concept of unloading specific cores/indexes at once; it really pre-supposes that you've scaled your system such that you can have them all active at once. So I don't really see how routing to specific cores is going to help you. Then again I don't know your problem space. Best, Erick On Tue, Nov 11, 2014 at 11:33 AM, Michal Krajňanský michal.krajnan...@gmail.com wrote: Hm. So I found that one can update stored fields with an atomic update operation, however according to http://stackoverflow.com/questions/19058795/it-is-possible-to-update-uniquekey-in-solr-4 this will not work for uniqueKey. So I guess with compositeId router I am out of luck. I have been also searching for a way to implement my own routing mechanism. Anyway, this seems to be a cleaner solution -- I would not need to modify the existing index, just compute the hash from other (stored) fields than just the document id. Can you confirm that it is possible? The documentation is however very modest (I only found that it is possible to specify a custom hash function). 
Best, Michal 2014-11-11 16:48 GMT+01:00 Michael Della Bitta michael.della.bi...@appinions.com: Yeah, Erick confused me a bit too, but I think what he's talking about takes for granted that you'd have your various indexes directly set up as individual collections. If instead you're considering one big collection, or a few collections based on aggregations of your individual indexes, having big, multisharded collections using compositeId should work, unless there's a use case we're not discussing. Michael On 11/11/14 10:27, Michal Krajňanský wrote: Hi Eric, Michael, thank you both for your comments. 2014-11-11 5:05 GMT+01:00 Erick Erickson erickerick...@gmail.com: bq: - the documents are organized in shards according to date (integer) and language (a possibly extensible discrete set) bq: - the indexes are disjunct OK, I'm having a hard time getting my head around these two statements. If the indexes are disjunct in the sense that you only search one at a time, then they are different collections in SolrCloud jargon. I just meant that every document is contained in a single one of the indexes. I have a lot of Lucene indexes for various [language X timespan], but logically we are speaking about a single huge index. That is why I thought it would be natural to represent it as a single SolrCloud collection. If, on the other hand, these are a big collection and you want to search them all with a single query, I suggest that in SolrCloud land you don't want them to be discrete shards. My reasoning here is that let's say you have a bunch of documents for October, 2014 in Spanish. By putting these all on a single shard, your queries all have to be serviced by that one shard. You don't get any parallelism. That is right. Actually the parallelization is not the main issue right now. The queries are very sparse, currently our system does not support load balancing at all. I imagined that in the future it could be achievable via SolrCloud replication. 
The main consideration is to be able to plug the indexes in and out on demand. The total size of the data is in terabytes. We usually want to search only the latest indexes but occasionally it is needed to plug in one of the older ones. Maybe (probably) I still have some misconceptions about the uses of SolrCloud... If it really does make sense in your case to route all the docs to a single shard, then Michael's comment is spot-on: use the compositeId router. You confuse me here. I was not thinking about a single shard, on the contrary, any [language X timespan] index would be itself a shard. I agree that the compositeId router seems to be natural for what I need. I am currently searching for the way to convert my indexes in such a way that my document IDs have the composite format. Currently these are just unique integers, so I would like to prefix all the document IDs of an index with its language and timespan. I do not know how, but I believe this should be possible, as it is a constant operation that would not change the
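For concreteness, the LotsOfCores behavior Erick describes is driven by per-core flags plus a global cap on how many transient cores stay loaded; a rough sketch (the core name is hypothetical, and this is standalone Solr, not SolrCloud, per Erick's warning):

```
# core.properties for one [language X timespan] core
name=index_es_2014_10
loadOnStartup=false
transient=true
```

and in the new-style solr.xml:

```xml
<solr>
  <!-- keep at most N transient cores in memory at once -->
  <int name="transientCacheSize">8</int>
</solr>
```

Cores marked transient are loaded on first request and evicted LRU-style once the cache size is exceeded, which matches the "plug in an older index on demand" requirement.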
Re: How to Facet external fields
Thanks for your response.. It's indeed a good idea..I will try that out.. -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-generate-calculate-facet-counts-for-external-fields-tp4168653p4168790.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Does ReRankQuery support reranking the result of a FuzzyQuery?
This issue should be resolved in https://issues.apache.org/jira/browse/SOLR-6323. This is committed in trunk, 5x, 4x, and 4_10, but this did not make it into 4.10.2. If you take the version in the 4_10 branch you should be good to go. If a version 4.10.3 is cut, this will be included. Joel Bernstein Search Engineer at Heliosearch On Mon, Nov 10, 2014 at 1:50 PM, Brian Sawyer bsaw...@basistech.com wrote: Hello, We are trying to make use of the new ReRankQuery to rescore results according to a custom function but run into problems when our main query includes a FuzzyQuery. Using the example setup in Solr 4.10.2 querying: q=name:Dell~1 rq={!rerank reRankQuery=id:whatever} results in: java.lang.UnsupportedOperationException: Query name:delk~1 does not implement createWeight Is this a bug or is this intended? Thanks, Brian Full stack trace below: java.lang.UnsupportedOperationException: Query name:delk~1 does not implement createWeight at org.apache.lucene.search.Query.createWeight(Query.java:80) at org.apache.solr.search.ReRankQParserPlugin$ReRankWeight.<init>(ReRankQParserPlugin.java:177) at org.apache.solr.search.ReRankQParserPlugin$ReRankQuery.createWeight(ReRankQParserPlugin.java:163) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:209) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1619) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1433) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:485) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at 
org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at 
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72) at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264) at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543) at java.lang.Thread.run(Thread.java:722)
Re: DocSet getting cached in filterCache for facet request with {!cache=false}
Shawn, then how to skip filterCache for facet.method=enum ? Wiki says fq={!cache=false}*:* is ok, no? https://wiki.apache.org/solr/SolrCaching#filterCache -Mohsin - Original Message - From: erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, November 11, 2014 8:40:54 AM GMT -08:00 US/Canada Pacific Subject: Re: DocSet getting cached in filterCache for facet request with {!cache=false} Well, the difference is that you're faceting with method=enum, which uses the filterCache (I think, it's been a while). I admit I'm a little surprised that when I tried faceting with the inStock field in the standard distro I got 3 entries when there are only two values, but I'm willing to let that go ;) i.e. this produces 3 entries in the filterCache: http://localhost:8983/solr/techproducts/select?q=*:*&rows=0&facet=true&facet.field=inStock&facet.method=enum not an fq clause in sight.. Best, Erick On Tue, Nov 11, 2014 at 9:31 AM, Shawn Heisey apa...@elyograg.org wrote: On 11/11/2014 1:22 AM, Mohsin Beg Beg wrote: It seems Solr is caching when faceting even with fq={!cache=false}*:* specified. This is what I am doing on Solr 4.10.0 on jre 1.7.0_51. Query 1) No cache in filterCache as expected http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:* http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache confirms this. Query 2) Query result docset cached in filterCache unexpectedly ? http://localhost:8983/solr/collection1/select?q=*:*&rows=0&fq={!cache=false}*:*&facet=true&facet.field=foobar&facet.method=enum http://localhost:8983/solr/#/collection1/plugins/cache?entry=filterCache shows entry of item_*:*: org.apache.solr.search.BitDocSet@66afbbf cached. Suggestions why or how this may be avoided since I don't want to cache anything other than facet(ed) terms in the filterCache (for predictable heap usage). I hope this is just for testing, because fq=*:* is completely unnecessary, and will cause Solr to do extra work that it doesn't need to do. 
Try changing that second query so q and fq are not the same, so you can see for sure which one is producing the filterCache entry. With the same query for both, you cannot know which one is populating the filterCache. If it's coming from the q parameter, then it's probably working as designed. If it comes from the fq, then we probably actually do have a problem that needs investigation. Thanks, Shawn
Re: DocSet getting cached in filterCache for facet request with {!cache=false}
On Tue, Nov 11, 2014 at 1:25 PM, Mohsin Beg Beg mohsin@oracle.com wrote: Wiki says fq={!cache=false}*:* is ok, no? That's for the filtering... not for the faceting. then how to skip filterCache for facet.method=enum ? Specify a high minDF (the min docfreq or number of documents that need to match a term before the filter cache will be used). facet.enum.cache.minDf=1000 -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
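Applied to the original request, Yonik's parameter would look something like this (the threshold value is illustrative; pick one above the docFreq of your facet terms):

```
http://localhost:8983/solr/collection1/select?q=*:*&rows=0&facet=true&facet.field=foobar&facet.method=enum&facet.enum.cache.minDf=1000
```

Terms whose docFreq is below facet.enum.cache.minDf are enumerated without their DocSets being inserted into the filterCache, which keeps heap usage predictable.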
Re: Does ReRankQuery support reranking the result of a FuzzyQuery?
Just verified that fuzzy queries work in trunk with this test: params = new ModifiableSolrParams(); params.add("rq", "{!rerank reRankQuery=$rqq reRankDocs=6}"); params.add("q", "term_s:~1 AND test_ti:[0 TO 2000]"); params.add("rqq", "id:1^10 id:2^20 id:3^30 id:4^40 id:5^50 id:6^60"); params.add("fl", "id,score"); params.add("start", "0"); params.add("rows", "6"); assertQ(req(params), "*[count(//doc)=5]", "//result/doc[1]/float[@name='id'][.='6.0']", "//result/doc[2]/float[@name='id'][.='5.0']", "//result/doc[3]/float[@name='id'][.='4.0']", "//result/doc[4]/float[@name='id'][.='2.0']", "//result/doc[5]/float[@name='id'][.='1.0']"); Joel Bernstein Search Engineer at Heliosearch On Tue, Nov 11, 2014 at 1:04 PM, Joel Bernstein joels...@gmail.com wrote: This issue should be resolved in https://issues.apache.org/jira/browse/SOLR-6323. This is committed in trunk, 5x, 4x, and 4_10, but this did not make it into 4.10.2. If you take the version in the 4_10 branch you should be good to go. If a version 4.10.3 is cut, this will be included. Joel Bernstein Search Engineer at Heliosearch On Mon, Nov 10, 2014 at 1:50 PM, Brian Sawyer bsaw...@basistech.com wrote: Hello, We are trying to make use of the new ReRankQuery to rescore results according to a custom function but run into problems when our main query includes a FuzzyQuery. Using the example setup in Solr 4.10.2 querying: q=name:Dell~1 rq={!rerank reRankQuery=id:whatever} results in: java.lang.UnsupportedOperationException: Query name:delk~1 does not implement createWeight Is this a bug or is this intended? 
Thanks, Brian Full stack trace below: java.lang.UnsupportedOperationException: Query name:delk~1 does not implement createWeight at org.apache.lucene.search.Query.createWeight(Query.java:80) at org.apache.solr.search.ReRankQParserPlugin$ReRankWeight.<init>(ReRankQParserPlugin.java:177) at org.apache.solr.search.ReRankQParserPlugin$ReRankQuery.createWeight(ReRankQParserPlugin.java:163) at org.apache.lucene.search.IndexSearcher.createNormalizedWeight(IndexSearcher.java:684) at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:297) at org.apache.solr.search.SolrIndexSearcher.buildAndRunCollectorChain(SolrIndexSearcher.java:209) at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:1619) at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1433) at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:514) at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:485) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:218) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1967) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075) at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116) at org.eclipse.jetty.server.Server.handle(Server.java:368) at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489) at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53) at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942) at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004) at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640) at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235) at
SOLRJ Atomic updates of String field
I am using the below code to do a partial update (in SOLR 4.2) partialUpdate = new HashMap<String, Object>(); partialUpdate.put("set", Object); doc.setField("description", partialUpdate); server.add(docs); server.commit(); I am seeing the below description value with {set =...}, Any idea why this is getting added? <str name="description">{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip for faster processing and longer battery life, the M8 motion coprocessor to track speed, distance and elevation, and with an 8MP iSight camera, you can record 1080p HD Video at 60 FPS!}</str> -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Analytics result for each Result Group
Probably sum and division can be applied to get the median. If you are using a version above 5, http://svn.apache.org/repos/asf/lucene/dev//trunk/solr/contrib/analytics/src/java/org/apache/solr/analytics/statistics/MedianStatsCollector.java can be used directly On Tue, Nov 11, 2014 at 5:49 PM, Talat Uyarer ta...@uyarer.com wrote: Hi Anurag, How can I find the median function? I use it a lot. 2014-11-09 20:39 GMT+02:00 Anurag Sharma anura...@gmail.com: Can a function query (http://wiki.apache.org/solr/FunctionQuery) serve your use case? On Wed, Nov 5, 2014 at 3:36 PM, Talat Uyarer ta...@uyarer.com wrote: I searched the wiki pages about that. I did not find any documentation. If you help me I will be glad. Thanks 2014-11-04 11:34 GMT+02:00 Talat Uyarer ta...@uyarer.com: Hi folks, We use the Analytics Component for median, max etc. I wonder, if I use the group.field parameter with the analytics component, how to calculate analytics for each result group ? Thanks -- Talat UYARER Websitesi: http://talat.uyarer.com Twitter: http://twitter.com/talatuyarer Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304
Re: How to suggest from multiple fields?
The usual approach is to use copyField to copy multiple fields to a single field. I posted a solution using an UpdateRequestProcessor to merge fields, but with different analyzers, here: https://blog.safaribooksonline.com/2014/04/15/search-suggestions-with-solr-2/ My latest approach is this: https://github.com/safarijv/ifpress-solr-plugin/blob/master/src/main/java/com/ifactory/press/db/solr/spelling/suggest/MultiSuggester.java which merges the fields while building the suggester index, allowing us to provide different weights for suggestions from different fields. -Mike On 11/11/2014 06:59 AM, Thomas Michael Engelke wrote: Like in this article (http://www.andornot.com/blog/post/Advanced-autocomplete-with-Solr-Ngrams-and-Twitters-typeaheadjs.aspx), I am using multiple fields to generate different options for an autosuggest functionality: - First, the whole field (top priority) - Then, the whole field as EdgeNGrams from the left side (normal priority) - Lastly, single words or word parts (compound words) as EdgeNGrams However, I was not very successful in supplying a single requestHandler (/suggest) with data from multiple suggesters. I have also not been able to find any sample of how this might be done correctly. Is there a sample that I can read, or a documentation of how this might be done? The referenced article was doing it, yet only marginally described the technical implementation.
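Mike's copyField suggestion is ordinary schema.xml wiring; a minimal sketch (field and type names here are assumptions, not from the thread):

```xml
<!-- destination field that the suggester reads from -->
<field name="suggest_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title" dest="suggest_all"/>
<copyField source="keywords" dest="suggest_all"/>
```

The trade-off, which motivates the UpdateRequestProcessor and MultiSuggester approaches above, is that everything copied into suggest_all goes through that one field's analyzer and all sources carry the same weight.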
Different ids for the same document in different replicas.
Hi All, I am seeing interesting behavior on the replicas. I have a single shard and 6 replicas on SolrCloud 4.10.1. I only have a small number of documents, ~375, that are replicated across the six replicas. The interesting thing is that the same document has a different id in each one of those replicas. This is causing the fq(id:xyz) type queries to fail, depending on which replica the query goes to. I have specified the id field in the following manner in schema.xml; is it the right way to specify an auto-generated id in SolrCloud? <field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" /> Thanks.
Re: Different ids for the same document in different replicas.
“uuid” isn’t an out of the box field type that I’m familiar with. Generally, I’d stick with the out of the box advice of the schema.xml file, which includes things like…. <!-- Only remove the id field if you have a very good reason to. While not strictly required, it is highly recommended. A uniqueKey is present in almost all Solr installations. See the uniqueKey declaration below where uniqueKey is set to id. --> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> and… <!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required=false, it will be a required field --> <uniqueKey>id</uniqueKey> If you’re creating some key/value pair with uuid as the key as you feed documents in, and you know that the uuid values you’re creating are unique, just change the field name and unique key name from ‘id’ to ‘uuid’. Or change the key name you send in from ‘uuid’ to ‘id’. On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am seeing interesting behavior on the replicas. I have a single shard and 6 replicas on SolrCloud 4.10.1. I only have a small number of documents, ~375, that are replicated across the six replicas. The interesting thing is that the same document has a different id in each one of those replicas. This is causing the fq(id:xyz) type queries to fail, depending on which replica the query goes to. I have specified the id field in the following manner in schema.xml; is it the right way to specify an auto-generated id in SolrCloud? <field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" /> Thanks.
Re: Different ids for the same document in different replicas.
Looking a little deeper, I did find this about UUIDField http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/schema/UUIDField.html NOTE: Configuring a UUIDField instance with a default value of NEW is not advisable for most users when using SolrCloud (and not possible if the UUID value is configured as the unique key field) since the result will be that each replica of each document will get a unique UUID value. Using UUIDUpdateProcessorFactory http://lucene.apache.org/solr/4_9_0/solr-core/org/apache/solr/update/processor/UUIDUpdateProcessorFactory.html to generate UUID values when documents are added is recommended instead.” That might describe the behavior you saw. And the use of UUIDUpdateProcessorFactory to auto-generate IDs seems to be covered well here: http://solr.pl/en/2013/07/08/automatically-generate-document-identifiers-solr-4-x/ Though I’ve not actually tried that process before. On Nov 11, 2014, at 7:39 PM, Garth Grimm garthgr...@averyranchconsulting.com wrote: “uuid” isn’t an out of the box field type that I’m familiar with. Generally, I’d stick with the out of the box advice of the schema.xml file, which includes things like…. <!-- Only remove the id field if you have a very good reason to. While not strictly required, it is highly recommended. A uniqueKey is present in almost all Solr installations. See the uniqueKey declaration below where uniqueKey is set to id. --> <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false" /> and… <!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required=false, it will be a required field --> <uniqueKey>id</uniqueKey> If you’re creating some key/value pair with uuid as the key as you feed documents in, and you know that the uuid values you’re creating are unique, just change the field name and unique key name from ‘id’ to ‘uuid’. Or change the key name you send in from ‘uuid’ to ‘id’. 
On Nov 11, 2014, at 7:18 PM, S.L simpleliving...@gmail.com wrote: Hi All, I am seeing interesting behavior on the replicas. I have a single shard and 6 replicas on SolrCloud 4.10.1. I only have a small number of documents, ~375, that are replicated across the six replicas. The interesting thing is that the same document has a different id in each one of those replicas. This is causing the fq(id:xyz) type queries to fail, depending on which replica the query goes to. I have specified the id field in the following manner in schema.xml; is it the right way to specify an auto-generated id in SolrCloud? <field name="id" type="uuid" indexed="true" stored="true" required="true" multiValued="false" /> Thanks.
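For reference, the processor-chain approach from that solr.pl post looks roughly like this in solrconfig.xml (the chain name is arbitrary; reference it from your update handler via the update.chain parameter):

```xml
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Because the UUID is assigned once, before the document is distributed to the replicas, every replica stores the same id, avoiding the per-replica values described in the UUIDField note.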
Re: DocSet getting cached in filterCache for facet request with {!cache=false}
The first thing I'd try is to stop explicitly _telling_ Solr to use the enum method by omitting facet.method=enum from your URL ;)... I'm guessing that the field in question has very few unique values, so you probably need to do what Yonik suggests. Erick On Tue, Nov 11, 2014 at 1:30 PM, Yonik Seeley yo...@heliosearch.com wrote: On Tue, Nov 11, 2014 at 1:25 PM, Mohsin Beg Beg mohsin@oracle.com wrote: Wiki says fq={!cache=false}*:* is ok, no? That's for the filtering... not for the faceting. then how to skip filterCache for facet.method=enum ? Specify a high minDF (the min docfreq or number of documents that need to match a term before the filter cache will be used). facet.enum.cache.minDf=1000 -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
Re: SOLRJ Atomic updates of String field
Sorry, didn't get what you are trying to achieve and the issue. On Wed, Nov 12, 2014 at 12:20 AM, bbarani bbar...@gmail.com wrote: I am using the below code to do a partial update (in SOLR 4.2) partialUpdate = new HashMap<String, Object>(); partialUpdate.put("set", Object); doc.setField("description", partialUpdate); server.add(docs); server.commit(); I am seeing the below description value with {set =...}, Any idea why this is getting added? <str name="description">{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip for faster processing and longer battery life, the M8 motion coprocessor to track speed, distance and elevation, and with an 8MP iSight camera, you can record 1080p HD Video at 60 FPS!}</str> -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLRJ Atomic updates of String field
Hi Bbarani, A partial update SolrJ example can be found in: http://find.searchhub.org/document/5b1187abfcfad33f Ahmet On Tuesday, November 11, 2014 8:51 PM, bbarani bbar...@gmail.com wrote: I am using the below code to do a partial update (in SOLR 4.2) partialUpdate = new HashMap<String, Object>(); partialUpdate.put("set", Object); doc.setField("description", partialUpdate); server.add(docs); server.commit(); I am seeing the below description value with {set =...}, Any idea why this is getting added? <str name="description">{set=The iPhone 6 Plus features a 5.5-inch retina HD display, the A8 chip for faster processing and longer battery life, the M8 motion coprocessor to track speed, distance and elevation, and with an 8MP iSight camera, you can record 1080p HD Video at 60 FPS!}</str> -- View this message in context: http://lucene.472066.n3.nabble.com/SOLRJ-Atomic-updates-of-String-field-tp4168809.html Sent from the Solr - User mailing list archive at Nabble.com.
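For anyone landing on this thread: a typical SolrJ 4.x atomic update looks roughly like the sketch below (field values are illustrative; it assumes a running Solr whose solrconfig.xml has <updateLog/> enabled and whose schema stores all fields):

```
SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "12345");  // the uniqueKey must be sent with the update
Map<String, Object> partialUpdate = new HashMap<String, Object>();
partialUpdate.put("set", "new description text");  // "set" replaces the stored value
doc.addField("description", partialUpdate);
server.add(doc);
server.commit();
```

If the stored field comes back containing the literal {set=...} text, the request was treated as a plain add rather than an atomic update; common causes are a missing <updateLog/>, a missing uniqueKey in the submitted document, or a Solr version that predates atomic-update support in the handler being used.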
DIH Blob data
I am trying to index JSON data stored in a BLOB column in a database. The JSON is stored in the database as {a:1,b:2,c:3}. I want to search based on those fields later, like fq=a:1. The fields a, b, c are dynamic and can be anything based on the data posted by users. What is the correct way to index data based on dynamic fields in Solr and search them later based on those fields? -- Rahul Ranjan
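One common pattern (independent of DIH) is to map each JSON key onto a Solr dynamic field by suffix, e.g. a schema rule like <dynamicField name="*_s" type="string" indexed="true" stored="true"/>, and rename the keys before indexing. A minimal sketch of the renaming step (the _s suffix and field names are assumptions, not something the thread specifies):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DynamicFieldMapper {
    // Suffix each blob key so it matches a hypothetical "*_s" dynamic field.
    static Map<String, String> toDynamicFields(Map<String, Object> blob) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : blob.entrySet()) {
            out.put(e.getKey() + "_s", String.valueOf(e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        // The parsed form of {"a":1,"b":2,"c":3} from the BLOB column
        Map<String, Object> blob = new LinkedHashMap<>();
        blob.put("a", 1);
        blob.put("b", 2);
        blob.put("c", 3);
        System.out.println(toDynamicFields(blob)); // {a_s=1, b_s=2, c_s=3}
    }
}
```

A query like fq=a_s:1 then works against the indexed copy. Inside DIH this renaming could live in a ScriptTransformer; outside DIH, a small client program can do it before posting the documents.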