Re: hot deploy of newer version of solr schema in production
Hi,

To be able to do a true hot deploy of a newer schema without reindexing, you must carefully ensure that none of your changes are breaking changes, so you should test the process on your development machine and make sure it works. Adding and deleting fields will work, but not changing the field type or analysis of an existing field. Depending on the from/to version, you may want to keep the old schema-version number.

The process is:
1. Deploy the new schema, including all dependencies such as dictionaries
2. Do a RELOAD of the core: http://wiki.apache.org/solr/CoreAdmin#RELOAD

My preference is to do a more thorough upgrade of the schema, including new functionality and breaking changes, and then do a full reindex. The exception is if my index is huge and the reason for the Solr upgrade or schema change is to fix a bug, not to use new functionality.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 24. jan. 2012, at 01:51, roz dev wrote:

Hi All,

I need the community's feedback about deploying newer versions of a Solr schema into production while the existing (older) schema is in use by applications. How do people perform these things? What has been people's experience with this? Any thoughts are welcome.

Thanks
Saroj
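The two-step process (deploy files, then RELOAD) can be sketched as a plain CoreAdmin HTTP call; the host, port, and core name below are assumptions for illustration only:

```python
# Sketch of step 2 of a hot schema deploy, assuming a local Solr with a
# core named "collection1" (adjust host/port/core as needed). Step 1 -
# copying the new schema.xml and its dictionaries into the core's conf/
# directory - happens outside Solr, e.g. via scp/rsync.
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed location
CORE = "collection1"                      # assumed core name

def reload_core_url(base: str, core: str) -> str:
    """Build the CoreAdmin RELOAD URL for the given core."""
    params = urlencode({"action": "RELOAD", "core": core})
    return f"{base}/admin/cores?{params}"

# An HTTP GET on this URL (urllib.request, curl, ...) triggers the reload:
print(reload_core_url(SOLR_BASE, CORE))
```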
Re: Highlighting stopwords
(12/01/24 9:31), O. Klein wrote:

Let's say I search for "spellcheck solr" on a website that only contains info about Solr, so "solr" was added to stopwords.txt. The query that then gets parsed (dismax) will not contain the term "solr", so fragments won't contain highlights of the term "solr". So when a fragment with the highlighted term "spellcheck" is generated, it would be less confusing for people who don't know how search engines work to also highlight the term "solr".

My first test was to have a field with StopFilterFactory and search on that field, while using another field without StopFilterFactory to highlight on. This didn't do the trick.

Are you saying that using the hl.q parameter on the highlight field, while using q on the search field that has the StopFilter, doesn't work for you?

koji
--
http://www.rondhuit.com/en/
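The hl.q approach Koji mentions can be sketched as request parameters: match on the stopped field, but hand the highlighter the raw query against a parallel field that keeps stopwords. The field names ("text", "text_hl") and the field-qualified hl.q syntax below are assumptions for illustration:

```python
# Sketch of highlighting stopwords via hl.q, assuming two parallel fields:
# "text" (with StopFilterFactory) for matching and "text_hl" (no stop
# filter) for highlighting. Field names are illustrative, not from the
# original thread; hl.q/hl.fl/hl/q/qf are real Solr parameters.
from urllib.parse import urlencode

def build_highlight_params(user_query: str) -> dict:
    return {
        "defType": "dismax",
        "q": user_query,           # matched against the stopped field via qf
        "qf": "text",
        "hl": "true",
        "hl.fl": "text_hl",        # highlight on the field without stopwords
        "hl.q": f"text_hl:({user_query})",  # re-run all terms, incl. stopwords
    }

params = build_highlight_params("spellcheck solr")
print(urlencode(params))
```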
Re: Size of index to use shard
Hi,

It depends on your hardware. Read this:
http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/

Think about your cache config (few updates, big caches) and a good HW infrastructure. In my case I can handle a 250GB index with 100 mil. docs on an i7 machine with RAID10 and 24GB RAM => q-times under 1 sec.

Regards
Vadim

2012/1/24 Anderson vasconcelos anderson.v...@gmail.com:

Hi,

Is there some size of index (or number of docs) at which it becomes necessary to break the index into shards? I have an index 100GB in size. This index grows 10GB per year (I don't have information on how many docs it has) and the docs will never be deleted. Thinking ahead 30 years, the index will be 400GB in size. I think it is not required to break it into shards, because I don't consider this a large index. Am I correct? What is a real large index?

Thanks
RE: Filtering search results by an external set of values
Thanks for the responses.

Groups probably wouldn't work, as while there will be some overlap between customers, each will have a very different overall set of accessible resources.

I'll try the suggestion about simply reindexing, or using the no-cache option, and see how I get on. Failing that, are there hooks to write custom filter modules that use other parts of the records to decide whether to include them in a result set or not? In our use case the documents represent articles, which have an issue field. Each customer has defined issues (or ranges of issues) that they have subscriptions to, so the upper bound for what to filter would probably be fairly small (10k - 20k issues/ranges). This could probably be used with the no-cache option you've pointed me to.

Best wishes,
Phil.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 23 January 2012 17:34
To: solr-user@lucene.apache.org
Subject: Re: Filtering search results by an external set of values

A second, but arguably quite expert, option is to use the no-cache option. See: https://issues.apache.org/jira/browse/SOLR-2429

The idea here is that you can specify that a filter is expensive, and it will only be run after all the other filters etc. have been applied. Furthermore, it will not be cached, and only documents that pass through all the other filters will be matched against this filter. It has been specifically used for ACL calculations...

That said, see exactly how painful storing auth tokens is. I can index, on a relatively underpowered laptop, 11M Wiki documents in 5 minutes or so. If your worst-case rights update takes 1/2 hour to re-index and it only happens once a month, why be complex? And groups, as Jan says, often make even this unnecessary.

Best
Erick

On Mon, Jan 23, 2012 at 5:16 AM, Jan Høydahl jan@cominvent.com wrote:

Hi,

Do you have any kind of group membership for your users? If you have, a resource's list of security access tokens could be smaller, and you could avoid re-indexing most resources when adding normal users which mostly belong to groups.

The common way is to add filters on the query. You may do it yourself or have some framework/plugin do it for you, see http://wiki.apache.org/solr/SolrSecurity#Document_Level_Security

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 23. jan. 2012, at 11:49, John, Phil (CSS) wrote:

Hi,

We're building quite a large shared index of resources, using Solr. The application that makes use of these resources is a multitenant one (i.e., many customers using the same index). For resources that are private to a customer, it's fairly easy to tag a document with their customer ID and use a FilterQuery to limit results to just their stuff.

We are soon going to be adding a large number (many tens of millions) of records that will be shared amongst customers. Not all customers will have access to the same shared resources, e.g.:

* Shared resource 1:
  o Customer 1
  o Customer 3
* Shared resource 2:
  o Customer 2
  o Customer 1

The issue is, what is the best way to model this in Solr? Should we have multiple customer_id fields on each record, and then use the filter query as with private resources, or is there a better way of doing it? What happens if we need to do a bulk change - i.e. adding a new customer, or a previous customer has a large change in what shared resources they have access to? Am I right in thinking that we'd need to go through every shared resource, read it, make the required change, and reindex it? I'm wondering if there's a way, instead of updating these resources directly, I could construct a set of documents that would act as a filter at query time of which shared resources to return?
Kind regards,

Phil John
Technical Lead, Capita Software Services
Knights Court, Solihull Parkway
Birmingham Business Park B37 7YB
Office: 0870 400 5000
Fax: 0870 400 5001
email: philj...@capita.co.uk

Part of Capita plc www.capita.co.uk

This email and any attachment to it are confidential. Unless you are the intended recipient, you may not use, copy or disclose either the message or any information contained in the message. If you are not the intended recipient, you should delete this email and notify the sender immediately. Any views or opinions expressed in this email are those of the sender only, unless otherwise stated. All copyright in any Capita material in this email is reserved. All emails, incoming and outgoing, may be recorded by Capita and monitored for legitimate business purposes. Capita exclude all liability for any loss or damage arising or resulting from the receipt, use or transmission of this email to the fullest extent permitted by law.
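The no-cache option Erick points to (SOLR-2429) is expressed as local params on the filter query. A sketch of building such a request, with the field name and customer values assumed for illustration:

```python
# Sketch of an ACL-style filter query using SOLR-2429's cache=false/cost
# local params. The field name "customer_id" and the id values are
# assumptions for illustration; the {!cache=false cost=...} syntax is
# Solr's local-params syntax for expensive, uncached filters.
from urllib.parse import urlencode

def acl_filter(customer_id: str) -> str:
    # cache=false keeps this filter out of the filter cache; a high cost
    # makes it run only against documents that pass the cheaper filters.
    return f"{{!cache=false cost=100}}customer_id:{customer_id}"

params = [
    ("q", "title:solr"),
    ("fq", acl_filter("cust-42")),  # hypothetical customer id
]
print(urlencode(params))
```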
Re: ExractionHandler/Cell ignore just 2 fields defined in schema 3.5.0
Ah, perfect - thank you so much, Jan. :-)

On Tue, Jan 24, 2012 at 11:14 AM, Jan Høydahl jan@cominvent.com wrote:

Hi,

It's because lowernames=true by default in solrconfig.xml, and it will convert any "-" into "_" in field names. So try adding the request parameter lowernames=false, or change the default in solrconfig.xml. Alternatively, leave it as is but name your fields project_id and company_id :)

http://wiki.apache.org/solr/ExtractingRequestHandler#Input_Parameters

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 23. jan. 2012, at 22:26, Wayne W wrote:

Hi,

I've been trying to figure this out now for a few days and I'm just not getting anywhere, so any pointers would be MOST welcome. I'm in the process of upgrading from 1.3 to the latest and greatest version of Solr, and I'm getting there slowly. However I have this (final) problem: when sending a document for extraction, 2 of the fields defined in my schema are ignored. When I don't use extraction, the fields work fine (I can see them via Luke).
My schema has:

<field name="uid" type="string" stored="true"/>
<field name="type" type="string" stored="true"/>
<field name="id" indexed="false" type="long" stored="true"/>
<field name="project-id" type="long" stored="true"/>
<field name="company-id" type="long" stored="true"/>
<field name="importTimestamp" type="long" stored="true"/>
<field name="label" type="text_ws" indexed="true" stored="true" multiValued="true" omitNorms="true"/>
<field name="text" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="date" type="date" indexed="true" stored="true" multiValued="true"/>

My request:

INFO: [] webapp=/solr path=/update/extract params={literal.company-id=8&literal.uid=hub.app.model.Document#203657&literal.date=2012-01-23T21:10:42Z&literal.id=203657&literal.type=hub.app.model.Document&idx.attr=true&literal.label=&literal.title=hotel+surfers.pdf&def.fl=text&literal.project-id=36} status=0 QTime=3579
Jan 24, 2012 8:10:58 AM org.apache.solr.update.DirectUpdateHandler2 commit

For unknown reasons the fields 'company-id' and 'project-id' are ignored. Any ideas?

many thanks
Wayne
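The lowernames behaviour Jan describes can be sketched as a tiny mapping function (an illustrative re-implementation, not Solr's actual code):

```python
# Sketch of what lowernames=true does to incoming field names in the
# ExtractingRequestHandler: lowercase them and replace non-alphanumeric
# characters with underscores. Illustrative re-implementation, not
# Solr's actual code.
import re

def map_field_name(name: str, lowernames: bool = True) -> str:
    if not lowernames:
        return name
    return re.sub(r"[^a-z0-9]", "_", name.lower())

# With the default lowernames=true, the literal.* names are rewritten and
# no longer match the schema fields "project-id" / "company-id":
print(map_field_name("project-id"))                    # -> project_id
print(map_field_name("company-id"))                    # -> company_id
print(map_field_name("project-id", lowernames=False))  # -> project-id
```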
Re: Filtering search results by an external set of values
Phil,

Some time ago I posted my thoughts about a similar problem. Scroll to part II.
http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201201.mbox/%3CCANGii8egoB1_rXFfwJMheyxx72v48B_DA-6KteKOymiBrR=m...@mail.gmail.com%3E

Regards

On Tue, Jan 24, 2012 at 1:36 PM, John, Phil (CSS) philj...@capita.co.uk wrote:

[...]
Re: Highlighting stopwords
Ah, I never used the hl.q. That did the trick. Thanx!

--
View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-stopwords-tp3681901p3684245.html
Sent from the Solr - User mailing list archive at Nabble.com.
solr stopwords issue - documents are not matching
Hi,

I am using solr-3.4. The relevant part of my schema looks like:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

stopwords_en.txt contains: a, an, and, are, as, etc.

Now when I search for "buy house", Solr does not return the documents with the text "buy a house". Also, when I search for "buy a house", Solr does not return the documents with the text "buy house". Part of the debugQuery output is:

<str name="rawquerystring">cContent:buy a house</str>
<str name="querystring">cContent:buy a house</str>
<str name="parsedquery">PhraseQuery(cContent:"bui ? hous")</str>
<str name="parsedquery_toString">cContent:"bui ? hous"</str>

Any idea how I can solve this problem, or what is wrong?

Thanks
Ankita
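The parsed query (bui ? hous) shows a position hole: with enablePositionIncrements=true, the stop filter drops "a" but still lets it consume a position, and an exact phrase query only matches documents whose indexed positions contain the same hole. A minimal sketch of that position arithmetic (plain Python, not Solr/Lucene code):

```python
# Sketch of why "buy a house" (indexed) and "buy house" (query) stop
# matching as exact phrases once the stop filter leaves position holes.
# Plain-Python illustration, not Solr/Lucene code; stemming is ignored.
STOPWORDS = {"a", "an", "and", "are", "as"}

def analyze(text):
    """Return (token, position) pairs; stopwords are dropped but still
    consume a position (enablePositionIncrements=true behaviour)."""
    out = []
    for pos, tok in enumerate(text.lower().split()):
        if tok not in STOPWORDS:
            out.append((tok, pos))
    return out

doc = analyze("buy a house")   # hole at position 1 where "a" was
query = analyze("buy house")   # no hole

# An exact phrase match requires identical relative positions, so the
# hole makes these two fail to match each other in either direction:
print(doc)                # [('buy', 0), ('house', 2)]
print(query)              # [('buy', 0), ('house', 1)]
print(doc == query)       # False
```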
highlighter not supporting surround parser
I want to perform span queries using the surround query parser and show the results with the highlighter, but the problem is that the highlighter is not working properly with the surround query parser. Are there any plugins or updates available to do this?

--
View this message in context: http://lucene.472066.n3.nabble.com/highlighter-not-supporting-surround-parser-tp3684474p3684474.html
Re: index-time over boosted
Any idea? This is a snippet of my schema.xml now:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more ... -->
...
<!-- fields for index-basic plugin -->
<field name="host" type="url" stored="false" indexed="true"/>
<field name="site" type="string" stored="false" indexed="true"/>
<field name="url" type="url" stored="true" indexed="true" required="true"/>
<field name="content" type="text" stored="true" indexed="true" omitNorms="true"/>
<field name="cache" type="string" stored="true" indexed="false"/>
<field name="tstamp" type="long" stored="true" indexed="false"/>
<!-- fields for index-anchor plugin -->
<field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
...
<!-- uncomment the following to ignore any fields that don't already match an existing field name or dynamic field, rather than reporting them as an error. alternately, change the type="ignored" to some other type e.g. "text" if you want unknown fields indexed and/or stored by default -->
<!--dynamicField name="*" type="ignored" multiValued="true" /-->
</fields>
<!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent ... -->
</schema>

Remi

On Sun, Jan 22, 2012 at 6:31 PM, remi tassing tassingr...@gmail.com wrote:

Hi,

I got it wrong in the beginning by putting omitNorms in the query URL. Now, following your advice, I merged the schema.xml from Nutch and Solr and made sure omitNorms was set to true for the content field, just as you said. Unfortunately the problem remains :-(

On Thursday, January 19, 2012, Jan Høydahl jan@cominvent.com wrote:

Hi,

The schema you pasted in your mail is NOT Solr 3.5's default example schema. Did you get it from the Nutch project? And the omitNorms parameter is supposed to go in the field tag in schema.xml, and the content field in the example schema does not have omitNorms=true.
Try to change

<field name="content" type="text" stored="false" indexed="true"/>

to

<field name="content" type="text" stored="false" indexed="true" omitNorms="true"/>

and try again.

Please note that you SHOULD customize your schema; there is really no default schema in Solr (or Nutch), it's only an example or starting point. For your search application to work well you will have to invest some time in designing a schema, working with your queries, perhaps exploring the DisMax query parser etc.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 19. jan. 2012, at 13:01, remi tassing wrote:

Hello Jan,

My schema wasn't changed from the release 3.5.0. The content can be seen below:

<schema name="nutch" version="1.1">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<fieldType name="url" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
</types>
<fields>
<field name="id" type="string" stored="true" indexed="true"/>
<!-- core fields -->
<field name="segment" type="string" stored="true" indexed="false"/>
<field name="digest" type="string" stored="true" indexed="false"/>
<field name="boost" type="float" stored="true" indexed="false"/>
<!-- fields for index-basic plugin -->
<field name="host" type="url" stored="false" indexed="true"/>
<field name="site" type="string" stored="false" indexed="true"/>
<f
Re: Size of index to use shard
Hi,

The article you linked mentions 13GB of index size. That is quite a small index from our perspective. We have noticed that at least Solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on average taking more than 3-4 seconds) once the index size crosses a magic level (about 80GB following our practical observations). We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones. We also route the majority of user queries to the fast indices. Yes, caching may help, but we cannot necessarily afford adding more RAM for bigger indices. BTW, our documents are very small, thus in a 100GB index we can have around 200 mil. documents.

It would be interesting to see how you manage to ensure q-times under 1 sec with an index of 250GB. How many documents / facets do you ask for at most at a time? FYI, we ask for a thousand facets in one go.

Regards,
Dmitry

On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote:

[...]
Re: Advanced stopword handling edismax
O. Klein wrote:

As I understand it, with edismax in trunk, whenever you have a query that only contains stopwords, all the terms are required.

But when I try this I only get an empty parsedQuery like:

(+() () () () () () () () () () () FunctionQuery((1.0/(3.16E-11*float(ms(const(132710400),date(date_dt)))+1.0))^50.0))/no_coord

Am I misunderstanding this feature, or is something going wrong? Can someone at least confirm that when using edismax and a query like "to be or not to be" (with the English stopword list) the parsed query is empty?

--
View this message in context: http://lucene.472066.n3.nabble.com/Advanced-stopword-handling-edismax-tp3677878p3684599.html
Re: Size of index to use shard
Apparently it's not so easy to determine when to break the content into pieces. I'll investigate further the number of documents, the size of each document, and what kind of search is being used. It seems I will have to do a load test to identify the cutoff point at which to begin using the shards strategy.

Thanks

2012/1/24, Dmitry Kan dmitry@gmail.com:

[...]
Re: index-time over boosted
That looks right. Can you restart your Solr, do a new search with debugQuery=true, and copy/paste the full EXPLAIN output for your query?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 24. jan. 2012, at 13:22, remi tassing wrote:

[...]
RE: Highlighting more than 1 term
Nitin and any others who may have followed this item,

I resolved the issue, but I'm not exactly sure of the originating cause. I had changed the field types of my text fields to text_en and then re-indexed. Changing to text_en prevented highlighting of more than one term in the fields for which I desired highlighting. Note that I used the stock fieldtype definitions supplied with Solr. Once I changed the field type back to text and re-indexed again, highlighting multiple terms in the same field worked again.

Thanks,
Tim Hibbs

-----Original Message-----
From: csscouter [mailto:tim.hi...@verizon.net]
Sent: Thursday, January 19, 2012 9:54 AM
To: solr-user@lucene.apache.org
Subject: RE: Highlighting more than 1 term

Nitin (and any other interested parties here):

Unfortunately, re-indexing the content did not resolve the problem and the symptom remains the same. Any additional advice is appreciated.

Tim

--
View this message in context: http://lucene.472066.n3.nabble.com/Highlighting-more-than-1-term-tp3670862p3672538.html
Re: index-time over boosted
Hello, thanks for helping out Jan, I really appreciate that! These are the full explains of two results:

Result #1:
3.0412199E-5 = (MATCH) max of:
  3.0412199E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in 19081), product of:
    0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
      0.5 = boost
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      0.043826185 = queryNorm
    2.1845297E-4 = fieldWeight(content:"mobil broadband" in 19081), product of:
      3.6055512 = tf(phraseFreq=13.0)
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      9.536743E-6 = fieldNorm(field=content, doc=19081)

Result #2:
2.6991445E-5 = (MATCH) max of:
  2.6991445E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in 15306), product of:
    0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
      0.5 = boost
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      0.043826185 = queryNorm
    1.9388145E-4 = fieldWeight(content:"mobil broadband" in 15306), product of:
      1.0 = tf(phraseFreq=1.0)
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      3.0517578E-5 = fieldNorm(field=content, doc=15306)

Remi

On Tue, Jan 24, 2012 at 3:38 PM, Jan Høydahl jan@cominvent.com wrote: That looks right. Can you restart your Solr, do a new search with debugQuery=true and copy/paste the full EXPLAIN output for your query? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 24. jan. 2012, at 13:22, remi tassing wrote: Any idea? This is a snippet of my schema.xml now:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more ...
<!-- fields for index-basic plugin -->
<field name="host" type="url" stored="false" indexed="true"/>
<field name="site" type="string" stored="false" indexed="true"/>
<field name="url" type="url" stored="true" indexed="true" required="true"/>
<field name="content" type="text" stored="true" indexed="true" omitNorms="true"/>
<field name="cache" type="string" stored="true" indexed="false"/>
<field name="tstamp" type="long" stored="true" indexed="false"/>
<!-- fields for index-anchor plugin -->
<field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
...
<!-- uncomment the following to ignore any fields that don't already match an existing field name or dynamic field, rather than reporting them as an error. alternately, change the type="ignored" to some other type e.g. "text" if you want unknown fields indexed and/or stored by default -->
<!--dynamicField name="*" type="ignored" multiValued="true" /-->
</fields>
<!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent ... -->
</schema>

Remi

On Sun, Jan 22, 2012 at 6:31 PM, remi tassing tassingr...@gmail.com wrote: Hi, I went wrong in the beginning by putting omitNorms in the query URL. Now, following your advice, I merged the schema.xml from Nutch and Solr and made sure omitNorms was set to true for the content field, just as you said. Unfortunately the problem remains :-(

On Thursday, January 19, 2012, Jan Høydahl jan@cominvent.com wrote: Hi, The schema you pasted in your mail is NOT Solr 3.5's default example schema. Did you get it from the Nutch project? And the omitNorms parameter is supposed to go in the field tag in schema.xml, and the content field in the example schema does not have omitNorms=true. Try to change
<field name="content" type="text" stored="false" indexed="true"/>
to
<field name="content" type="text" stored="false" indexed="true" omitNorms="true"/>
and try again.
Please note that you SHOULD customize your schema; there is really no default schema in Solr (or Nutch), it's only an example or starting point. For your search application to work well you will have to invest some time in designing a schema, working with your queries, perhaps exploring the DisMax query parser, etc. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 19. jan. 2012, at 13:01, remi tassing wrote: Hello Jan, My schema wasn't changed from the release 3.5.0. The content can be seen below:

<schema name="nutch" version="1.1">
<types>
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<fieldType name="long" class="solr.LongField" omitNorms="true"/>
<fieldType name="float" class="solr.FloatField" omitNorms="true"/>
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer>
<tokenizer
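As a side note, the numbers in the explain outputs quoted earlier in this thread can be re-derived by hand: Lucene's score here is queryWeight (boost × idf × queryNorm) times fieldWeight (tf × idf × fieldNorm). A quick sanity check, with the values copied from the explains (this is arithmetic only, not Solr code):

```python
import math

# Values copied from the explain outputs in this thread.
boost = 0.5
idf = 6.3531075
query_norm = 0.043826185

query_weight = boost * idf * query_norm            # explain says 0.13921623

def field_weight(tf, field_norm):
    """fieldWeight = tf * idf * fieldNorm."""
    return tf * idf * field_norm

# Result #1 (doc 19081): tf = sqrt(phraseFreq=13), fieldNorm = 9.536743E-6
w1 = field_weight(math.sqrt(13.0), 9.536743e-6)
# Result #2 (doc 15306): tf = 1.0, fieldNorm = 3.0517578E-5
w2 = field_weight(1.0, 3.0517578e-5)

score1 = query_weight * w1                         # ~3.0412199E-5
score2 = query_weight * w2                         # ~2.6991445E-5

# Doc 19081 still outranks doc 15306: its ~3.2x smaller fieldNorm is more
# than offset by sqrt(13) ~ 3.61x from phrase frequency.
print(score1 > score2)  # True
```

This makes the effect of index-time norms visible: the fieldNorm values differ by a factor of about 3.2 between the two documents, which is exactly what omitNorms="true" is meant to neutralize.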
full import is not working and still not showing any errors
Hi all, can anyone help me with this please? I am trying to do a full import. I've done everything correctly, but when I try the full import an XML page displays showing the following, and it stays like this no matter how often I refresh the page:

This XML file does not appear to have any style information associated with it. The document tree is shown below.

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="initArgs">
<lst name="defaults">
<str name="config">C:\solr\conf\data-config.xml</str>
</lst>
</lst>
<str name="command">full-import</str>
<str name="status">busy</str>
<str name="importResponse">A command is still running...</str>
<lst name="statusMessages">
<str name="Time Elapsed">0:5:8.925</str>
<str name="Total Requests made to DataSource">1</str>
<str name="Total Rows Fetched">0</str>
<str name="Total Documents Processed">0</str>
<str name="Total Documents Skipped">0</str>
<str name="Full Dump Started">2012-01-24 16:29:31</str>
</lst>
<str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

-- View this message in context: http://lucene.472066.n3.nabble.com/full-import-is-not-working-and-still-not-showing-any-errors-tp3684751p3684751.html Sent from the Solr - User mailing list archive at Nabble.com.
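One way to watch whether the import is actually fetching rows is to poll the handler with command=status and read the counters. A minimal sketch that parses an abridged status response like the one above (the XML is shortened from the email; field names are taken from the response):

```python
import xml.etree.ElementTree as ET

# Abridged /dataimport?command=status response from the email above.
STATUS_XML = """<response>
  <lst name="responseHeader"><int name="status">0</int></lst>
  <str name="command">full-import</str>
  <str name="status">busy</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:5:8.925</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Processed">0</str>
  </lst>
</response>"""

def parse_dih_status(xml_text):
    """Extract the DataImportHandler state and counters from a status response."""
    root = ET.fromstring(xml_text)
    status = root.findtext("str[@name='status']")
    messages = {el.get("name"): el.text
                for el in root.findall("lst[@name='statusMessages']/str")}
    return status, messages

status, messages = parse_dih_status(STATUS_XML)
print(status, messages["Total Rows Fetched"])
```

If Total Rows Fetched stays at 0 while the status stays busy, the SQL query itself is likely returning nothing (or hanging), which is where I would look first.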
Not getting the expected search results
Hello, I am a newbie in this Solr world and I am getting surprised because when I do searches, both with the browser interface and by using a Java client, the expected results do not appear. The issue is:

1) I have set up an entity called via in my data-config.xml with 5 fields. I do the full-import and it indexes 1.5M records:

<entity name="via" query="select TVIA, NVIAC, CMUM, CVIA, CPRO from INE_VIAS">
<field column="TVIA" name="TVIA" />
<field column="NVIAC" name="NVIAC" />
<field column="CMUM" name="CMUM" />
<field column="CVIA" name="CVIA" />
<field column="CPRO" name="CPRO" />
</entity>

2) These 5 fields are mapped in the schema.xml, this way:

<field name="TVIA" type="text_general" indexed="true" stored="true" />
<field name="NVIAC" type="text_general" indexed="true" stored="true" />
<field name="CMUM" type="text_general" indexed="true" stored="true" />
<field name="CVIA" type="string" indexed="true" stored="true" />
<field name="CPRO" type="int" indexed="true" stored="true" />

3) I try to do a search for Alcala street in Madrid:

NVIAC:ALCALA AND CPRO:28 AND CMUM:079

But it just gets two results (neither of them the desired one):

<doc><str name="CMUM">079</str><int name="CPRO">28</int><str name="CVIA">45363</str><str name="NVIAC">ALCALA GAZULES</str><str name="TVIA">CALLE</str></doc>
<doc><str name="CMUM">079</str><int name="CPRO">28</int><str name="CVIA">08116</str><str name="NVIAC">ALCALA GUADAIRA</str><str name="TVIA">CALLE</str></doc>

4) When I do the indexing by delimiting the entity search:

<entity name="via" query="select TVIA, NVIAC, CMUM, CVIA, CPRO from INE_VIAS WHERE NVIAC LIKE '%ALCALA%'">

the full import indexes 913 documents, and when I do the same search, this time I get the desired result:

<doc><str name="CMUM">079</str><int name="CPRO">28</int><str name="CVIA">00132</str><str name="NVIAC">ALCALA</str><str name="TVIA">CALLE</str></doc>

Can anyone help me with that? I don't know why it does not work as expected when I do the full-import of the whole lot of streets. Thanks a lot in advance.
-- View this message in context: http://lucene.472066.n3.nabble.com/Not-getting-the-expected-search-results-tp3684974p3684974.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Limiting term frequency in a document to a specific term
With the Solr search relevancy functions, I get a ParseException: unknown function ttf in FunctionQuery.

http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=ttf(contents,amplifiers)

where contents is a field name and amplifiers is text in that field. Just curious why I get a parse exception for the above syntax.

On Monday, January 23, 2012, Ahmet Arslan iori...@yahoo.com wrote: Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms. [ http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents ] I would like to be able to limit the query to just one term that I know occurs in the document. I don't fully follow, but http://wiki.apache.org/solr/FunctionQuery#tf may be what you want?
analyzing stored fields (removing HTML tags)
Is it possible to configure schema to remove HTML tags from stored field content? As far as I can tell analyzers can only be applied to indexed content, but they don't affect stored content. I want to remove HTML tags from text fields so that returned text content from stored field has no HTML tags. Thanks Bob
Re: index-time over boosted
Hi, Well, I think you are doing it right, but are getting tricked by either editing the wrong file, a typo, or browser caching. Why not try to start with a fresh Solr 3.5.0: start the example app, index all exampledocs, search for "Podcasts", and you get one hit, in the fields text and features. Then change solr/example/solr/conf/schema.xml and add omitNorms="true" to these two fields. Then stop Solr, delete your index, start Solr, re-index the docs and try again. fieldNorm is now 1.0. Once you get that working you can start debugging where you got it wrong in your own setup. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 24. jan. 2012, at 14:55, remi tassing wrote: Hello, thanks for helping out Jan, I really appreciate that! These are the full explains of two results:

Result #1:
3.0412199E-5 = (MATCH) max of:
  3.0412199E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in 19081), product of:
    0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
      0.5 = boost
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      0.043826185 = queryNorm
    2.1845297E-4 = fieldWeight(content:"mobil broadband" in 19081), product of:
      3.6055512 = tf(phraseFreq=13.0)
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      9.536743E-6 = fieldNorm(field=content, doc=19081)

Result #2:
2.6991445E-5 = (MATCH) max of:
  2.6991445E-5 = (MATCH) weight(content:"mobil broadband"^0.5 in 15306), product of:
    0.13921623 = queryWeight(content:"mobil broadband"^0.5), product of:
      0.5 = boost
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      0.043826185 = queryNorm
    1.9388145E-4 = fieldWeight(content:"mobil broadband" in 15306), product of:
      1.0 = tf(phraseFreq=1.0)
      6.3531075 = idf(content: mobil=5270 broadband=2392)
      3.0517578E-5 = fieldNorm(field=content, doc=15306)

Remi

On Tue, Jan 24, 2012 at 3:38 PM, Jan Høydahl jan@cominvent.com wrote: That looks right. Can you restart your Solr, do a new search with debugQuery=true and copy/paste the full EXPLAIN output for your query?
-- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com

On 24. jan. 2012, at 13:22, remi tassing wrote: Any idea? This is a snippet of my schema.xml now:

<?xml version="1.0" encoding="UTF-8" ?>
<!-- Licensed to the Apache Software Foundation (ASF) under one or more ... -->
<!-- fields for index-basic plugin -->
<field name="host" type="url" stored="false" indexed="true"/>
<field name="site" type="string" stored="false" indexed="true"/>
<field name="url" type="url" stored="true" indexed="true" required="true"/>
<field name="content" type="text" stored="true" indexed="true" omitNorms="true"/>
<field name="cache" type="string" stored="true" indexed="false"/>
<field name="tstamp" type="long" stored="true" indexed="false"/>
<!-- fields for index-anchor plugin -->
<field name="anchor" type="string" stored="true" indexed="true" multiValued="true"/>
...
<!-- uncomment the following to ignore any fields that don't already match an existing field name or dynamic field, rather than reporting them as an error. alternately, change the type="ignored" to some other type e.g. "text" if you want unknown fields indexed and/or stored by default -->
<!--dynamicField name="*" type="ignored" multiValued="true" /-->
</fields>
<!-- Field to use to determine and enforce document uniqueness. Unless this field is marked with required="false", it will be a required field -->
<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent ... -->
</schema>

Remi

On Sun, Jan 22, 2012 at 6:31 PM, remi tassing tassingr...@gmail.com wrote: Hi, I went wrong in the beginning by putting omitNorms in the query URL. Now, following your advice, I merged the schema.xml from Nutch and Solr and made sure omitNorms was set to true for the content field, just as you said. Unfortunately the problem remains :-(

On Thursday, January 19, 2012, Jan Høydahl jan@cominvent.com wrote: Hi, The schema you pasted in your mail is NOT Solr 3.5's default example schema. Did you get it from the Nutch project?
And the omitNorms parameter is supposed to go in the field tag in schema.xml, and the content field in the example schema does not have omitNorms=true. Try to change
<field name="content" type="text" stored="false" indexed="true"/>
to
<field name="content" type="text" stored="false" indexed="true" omitNorms="true"/>
and try again. Please note that you SHOULD customize your schema; there is really no default schema in Solr (or Nutch), it's only an example or starting point. For your search application to work well you will have to invest some time in designing a schema, working with your queries, perhaps exploring the DisMax query parser, etc. -- Jan Høydahl, search solution architect Cominvent
Re: Solr Java client API
It would really help to see the relevant parts of the code you're using to see what you've tried. You might want to review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Mon, Jan 23, 2012 at 2:45 PM, jingjung Ng jingjun...@gmail.com wrote: Hi, I implemented the facet using query.addFacetQuery query.addFilterQuery to facet on: gender:male state:DC This works fine. How can I facet on multi-values using Solrj API, like following: gender:male gender:female state:DC I've tried, but return 0. Can anyone help ? Thanks, -jingjung ng
Re: analyzing stored fields (removing HTML tags)
You may want to use an HTML sanitizer before indexing, as we do here: http://stackoverflow.com/questions/1947021/libs-for-html-sanitizing -- View this message in context: http://lucene.472066.n3.nabble.com/analyzing-stored-fields-removing-HTML-tags-tp3685144p3685182.html Sent from the Solr - User mailing list archive at Nabble.com.
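Since analyzers never touch stored values, the stripping indeed has to happen before the document reaches Solr. A minimal client-side sketch using only the standard library (a toy stand-in for the sanitizer libraries linked above, not something Solr provides):

```python
from html.parser import HTMLParser

class TagStripper(HTMLParser):
    """Collects only text content, dropping tags and comments."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def text(self):
        return "".join(self.chunks)

def strip_html(markup):
    """Return `markup` with all HTML tags and comments removed."""
    parser = TagStripper()
    parser.feed(markup)
    parser.close()
    return parser.text()

print(strip_html("<p>Hello <b>world</b><!-- note --></p>"))  # Hello world
```

Run this over the field value before calling add(), and both the indexed and the stored copy are tag-free. For untrusted input, a real sanitizer library is still the safer choice.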
Re: Hierarchical faceting in UI
Darren, One challenge for me is that a term can appear in multiple places of the hierarchy. So it's not safe to simply use the term as it appears to get its children; I probably need to include the entire tree path up to this term. For example, if the hierarchy is Cardiovascular Diseases > Arteriosclerosis > Coronary Artery Disease, and I'm getting the children of the middle term Arteriosclerosis, I need to filter on something like parent:"Cardiovascular Diseases/Arteriosclerosis". I'm having trouble figuring out how I can get the complete path per above to add to the URL of each facet term. I know velocity/facet_field.vm is where I build the URL. I know how to simply add a parent:term filter to the URL. But I don't know how to access a document field, like the complete parent path, in facet_field.vm. Any help would be great. Yuhao

From: dar...@ontrenet.com dar...@ontrenet.com To: Yuhao nfsvi...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Monday, January 23, 2012 7:16 PM Subject: Re: Hierarchical faceting in UI

On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao nfsvi...@yahoo.com wrote: Programmatically, something like this might work: for each facet field, add another hidden field that identifies its parent. Then, program additional logic in the UI to show only the facet terms at the currently selected level. For example, if one filters on cat:electronics, the new UI logic would apply the additional filter cat_parent:electronics. Can this be done? Yes. This is how I do it. Would it be a lot of work? No. It's not a lot of work; simply represent your hierarchy as parent/child relations in the document fields, and in your UI drill down by issuing new faceted searches. Use the current facet (tree level) as the parent:level in the next query. It's much easier than other suggestions for this. Is there a better way? Not in my opinion, there isn't. This is the simplest to implement and understand.
By the way, Flamenco (another faceted browser) has built-in support for hierarchies, and it has worked well for my data in this aspect (but less well than Solr in others). I'm looking for the same kind of hierarchical UI feature in Solr.
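The full-path idea discussed above can be sketched simply: for each term, index every ancestor prefix of its path in a multivalued field, so a filter like parent:"Cardiovascular Diseases/Arteriosclerosis" is unambiguous even when the leaf term appears in several branches. The separator and field layout here are assumptions for illustration (Solr's PathHierarchyTokenizerFactory does a similar expansion at index time):

```python
def path_tokens(path, sep="/"):
    """Every ancestor prefix of `path`, shallowest first -- the values you
    would index in a multivalued 'parent path' field."""
    return [sep.join(path[:i]) for i in range(1, len(path) + 1)]

hierarchy = ["Cardiovascular Diseases", "Arteriosclerosis", "Coronary Artery Disease"]
print(path_tokens(hierarchy))
# ['Cardiovascular Diseases',
#  'Cardiovascular Diseases/Arteriosclerosis',
#  'Cardiovascular Diseases/Arteriosclerosis/Coronary Artery Disease']
```

The UI then only needs the currently selected prefix to build the next drill-down filter; no per-term lookup of the parent is required.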
Re: java.net.SocketException: Too many open files
Hi Jonty, no, not really. When we first had such problems, we really thought that the number of open files was the problem, so we implemented an algorithm that performed an optimize from time to time to force a segment merge. Due to some misconfiguration, this ran too often, with the result that an optimize was issued before the previous optimization was finished, which is a really bad idea. We removed the optimization calls, and since then we haven't had any more problems. However, I never found out the initial reason for the exception. Maybe there was some bug in Solr's 3.1 version - we're using 3.5 right now - but I couldn't find a hint in the changelog. At least we haven't had this exception for more than two months now, so I'm hoping that the cause for this has disappeared somehow. Sorry that I can't help you more. Greetings, Kuli

On 24.01.2012 07:48, Jonty Rhods wrote: Hi Kuli, Did you get the solution to this problem? I am still facing it. Please help me to overcome this problem. regards

On Wed, Oct 26, 2011 at 1:16 PM, Michael Kuhlmann k...@solarier.de wrote: Hi; we have a similar problem here. We already raised the file ulimit on the server to 4096, but this only deferred the problem. We get a TooManyOpenFilesException every few months. The problem has nothing to do with real files. When we had the last TooManyOpenFilesException, we investigated with netstat -a and saw that there were about 3900 open sockets in Jetty. Curiously, we only have one SolrServer instance per Solr client, and we only have three clients (our running web servers). We have set defaultMaxConnectionsPerHost to 20 and maxTotalConnections to 100. There should be room enough. Sorry that I can't help you, we still have not solved the problem on our own.
Greetings, Kuli

Am 25.10.2011 22:03, schrieb Jonty Rhods: Hi, I am using solrj, and for the connection to the server I am using an instance of the Solr server: SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/core0"); I noticed that after a few minutes it starts throwing the exception java.net.SocketException: Too many open files. It seems that it is related to instances of the HttpClient. How do I limit the instances to a certain number, like a connection pool in dbcp etc.? I am not experienced in Java, so please help me resolve this problem. solr version: 3.4 regards Jonty
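On the pooling question: the usual SolrJ advice is to create a single CommonsHttpSolrServer and share it across threads, since it reuses HTTP connections internally. The dbcp-style pooling idea itself can be sketched like this (a language-neutral sketch in Python; the factory is a hypothetical stand-in for whatever opens a connection):

```python
import queue

class ClientPool:
    """A tiny fixed-size object pool, dbcp-style: clients are created once
    up front and reused, instead of being opened per request."""
    def __init__(self, factory, size):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self):
        # Blocks until a client is free, capping concurrent sockets at `size`.
        return self._pool.get()

    def release(self, client):
        self._pool.put(client)

make_client = lambda: object()  # hypothetical connection factory
pool = ClientPool(make_client, size=5)
client = pool.acquire()
try:
    pass  # ... issue requests with `client` ...
finally:
    pool.release(client)
```

The key property is the bounded size: however many requests arrive, at most `size` underlying connections ever exist, so descriptors cannot leak per request.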
Re: java.net.SocketException: Too many open files
Hi Jonty, You can try changing the maximum number of files opened by a process using the command ulimit -n XXX. If the number of opened files is not increasing with time, and is just a constant number which is larger than the system default limit, this should fix it. -param

On 1/24/12 11:40 AM, Michael Kuhlmann k...@solarier.de wrote: Hi Jonty, no, not really. When we first had such problems, we really thought that the number of open files was the problem, so we implemented an algorithm that performed an optimize from time to time to force a segment merge. Due to some misconfiguration, this ran too often, with the result that an optimize was issued before the previous optimization was finished, which is a really bad idea. We removed the optimization calls, and since then we haven't had any more problems. However, I never found out the initial reason for the exception. Maybe there was some bug in Solr's 3.1 version - we're using 3.5 right now - but I couldn't find a hint in the changelog. At least we haven't had this exception for more than two months now, so I'm hoping that the cause for this has disappeared somehow. Sorry that I can't help you more. Greetings, Kuli

On 24.01.2012 07:48, Jonty Rhods wrote: Hi Kuli, Did you get the solution to this problem? I am still facing it. Please help me to overcome this problem. regards

On Wed, Oct 26, 2011 at 1:16 PM, Michael Kuhlmann k...@solarier.de wrote: Hi; we have a similar problem here. We already raised the file ulimit on the server to 4096, but this only deferred the problem. We get a TooManyOpenFilesException every few months. The problem has nothing to do with real files. When we had the last TooManyOpenFilesException, we investigated with netstat -a and saw that there were about 3900 open sockets in Jetty. Curiously, we only have one SolrServer instance per Solr client, and we only have three clients (our running web servers). We have set defaultMaxConnectionsPerHost to 20 and maxTotalConnections to 100.
There should be room enough. Sorry that I can't help you, we still have not solved the problem on our own. Greetings, Kuli

Am 25.10.2011 22:03, schrieb Jonty Rhods: Hi, I am using solrj, and for the connection to the server I am using an instance of the Solr server: SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solr/core0"); I noticed that after a few minutes it starts throwing the exception java.net.SocketException: Too many open files. It seems that it is related to instances of the HttpClient. How do I limit the instances to a certain number, like a connection pool in dbcp etc.? I am not experienced in Java, so please help me resolve this problem. solr version: 3.4 regards Jonty
using per-core properties in dih config
I have a multi-core setup, and for each core I have a shared data-config.xml which specifies a SQL query for data import. What I want is to have the same data-config.xml file shared between my cores (linked to the same physical file). I'd like to specify core properties in solr.xml such that each core loads a different set of data from SQL. So my query might look like this:

query="select * from index_values where mod(index_id,${NUM_CORES})=${CORE_ID}"

So I want to have NUM_CORES and CORE_ID specified as properties in solr.xml, something like:

<solr ...>
<cores ...>
<property name="NUM_CORES" value="3"/>
<core name="index0" ...>
<property name="CORE_ID" value="0"/>
</core>
<core name="index1" ...>
<property name="CORE_ID" value="1"/>
</core>
<core name="index2" ...>
<property name="CORE_ID" value="2"/>
</core>
</cores>
</solr>

So my question is: is this possible, and if so, what is the exact syntax to make it work? Thanks, Bob
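The mod-based split in the query above can be sanity-checked outside Solr. A minimal sketch (NUM_CORES value taken from the example; row ids invented for illustration):

```python
# Mirrors: select * from index_values where mod(index_id, NUM_CORES) = CORE_ID
NUM_CORES = 3

def core_for(index_id, num_cores=NUM_CORES):
    """The core whose WHERE mod(...) = CORE_ID clause this row satisfies."""
    return index_id % num_cores

# Rows with index_id 0..8 split across the three cores:
partitions = {core: [i for i in range(9) if core_for(i) == core]
              for core in range(NUM_CORES)}
print(partitions)  # {0: [0, 3, 6], 1: [1, 4, 7], 2: [2, 5, 8]}
```

Every row lands on exactly one core and the union of the partitions covers all rows, so the three cores together index the full table without overlap.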
Re: Size of index to use shard
Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used for stored fields, i.e. the verbatim copy of data put in the index when you specify stored="true". These files have virtually no impact on search speed. So, if your *.fdx and *.fdt files are 90G out of a 100G index, it is a much different thing than if these files are 10G out of a 100G index. And this doesn't even mention the peculiarities of your query mix. Nor does it say a thing about whether your cheapest alternative is to add more memory. Anderson's method is about the only reliable one: you just have to test with your index and real queries. At some point you'll find your tipping point, typically when you come under memory pressure. And it's a balancing act between how much memory you allocate to the JVM and how much you leave for the op system. Bottom line: no hard and fast numbers. And you should periodically re-test the empirical numbers you *do* arrive at... Best Erick

On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Apparently, it is not so easy to determine when to break the content into pieces. I'll investigate further the amount of documents, the size of each document and what kind of search is being used. It seems I will have to do a load test to identify the cutoff point to begin using the strategy of shards. Thanks

2012/1/24, Dmitry Kan dmitry@gmail.com: Hi, The article you gave mentions 13GB of index size. It is quite a small index from our perspective. We have noticed that at least Solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on avg taking more than 3-4 seconds) once the index size crosses a magic level (about 80GB following our practical observations). We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones.
We also route the majority of user queries to fast indices. Yes, caching may help, but we cannot necessarily afford adding more RAM for bigger indices. BTW, our documents are very small, thus in a 100GB index we can have around 200 mil. documents. It would be interesting to see how you manage to ensure q-times under 1 sec with an index of 250GB. How many documents / facets do you ask for max. at a time? FYI, we ask for a thousand facets in one go. Regards, Dmitry

On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi, it depends on your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/ Think about your cache config (few updates, big caches) and a good HW infrastructure. In my case I can handle a 250GB index with 100 mil. docs on an i7 machine with RAID10 and 24GB RAM, with q-times under 1 sec. Regards Vadim

2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Hi, Is there some size of index (or number of docs) at which it becomes necessary to break the index into shards? I have an index 100GB in size. This index increases by 10GB per year (I don't have information on how many docs it has) and the docs will never be deleted. Thinking 30 years ahead, the index will be 400GB in size. I think sharding is not required, because I do not consider this a large index. Am I correct? What is a real large index? Thanks
Re: Limiting term frequency in a document to a specific term
At a guess, you're using 3.x, and the relevance functions are only on trunk (4.0). Best Erick

On Tue, Jan 24, 2012 at 7:49 AM, solr user mvidaat...@gmail.com wrote: With the Solr search relevancy functions, I get a ParseException: unknown function ttf in FunctionQuery. http://localhost:8983/solr/select/?fl=score,documentPageId&defType=func&q=ttf(contents,amplifiers) where contents is a field name and amplifiers is text in that field. Just curious why I get a parse exception for the above syntax.

On Monday, January 23, 2012, Ahmet Arslan iori...@yahoo.com wrote: Below is an example query to search for the term frequency in a document, but it is returning the frequency for all the terms. [ http://localhost:8983/solr/select/?fl=documentPageId&q=documentPageId:49667.3&qt=tvrh&tv.tf=true&tv.fl=contents ] I would like to be able to limit the query to just one term that I know occurs in the document. I don't fully follow, but http://wiki.apache.org/solr/FunctionQuery#tf may be what you want?
phrase auto-complete with suggester component
I'm testing out the various auto-complete functionalities on the Wikipedia dataset. I first tried facet.prefix and found it slow at times. I'm now looking at the Suggester component. Given a query like "new york", I would like to get results like "New York" or "New York City". When I tried the suggest component, it suggests entries for each word rather than for the phrase (even if I add quotes). How can I change my config to get title matches and not have the query broken into separate words?

<lst name="spellcheck">
<lst name="suggestions">
<lst name="new">
<int name="numFound">5</int>
<int name="startOffset">0</int>
<int name="endOffset">3</int>
<arr name="suggestion">
<str>newt</str>
<str>newwy patitta</str>
<str>newyddion</str>
<str>newyorker</str>
<str>newyork–presbyterian hospital</str>
</arr>
</lst>
<lst name="york">
<int name="numFound">5</int>
<int name="startOffset">4</int>
<int name="endOffset">8</int>
<arr name="suggestion">
<str>york</str>
<str>york–dauphin (septa station)</str>
<str>york—humber</str>
<str>york—scarborough</str>
<str>york—simcoe</str>
</arr>
</lst>
<str name="collation">newt york</str>
</lst>
</lst>

/solr/suggest?q=new%20york&omitHeader=true&spellcheck.count=5&spellcheck.collate=true

solrconfig.xml:
<searchComponent name="suggest" class="solr.SpellCheckComponent">
<lst name="spellchecker">
<str name="name">suggest</str>
<str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
<str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
<str name="field">title_autocomplete</str>
<str name="buildOnCommit">true</str>
</lst>
</searchComponent>
<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
<lst name="defaults">
<str name="spellcheck">true</str>
<str name="spellcheck.dictionary">suggest</str>
<str name="spellcheck.count">10</str>
</lst>
<arr name="components">
<str>suggest</str>
</arr>
</requestHandler>

schema.xml:
<fieldType name="text_auto" class="solr.TextField">
<analyzer>
<tokenizer class="solr.KeywordTokenizerFactory"/>
<filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
</fieldType>
<field name="title_autocomplete" type="text_auto" indexed="true" stored="false" multiValued="false" />

-- Tommy Chheng
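What the KeywordTokenizerFactory field in the schema above is meant to achieve can be illustrated with a plain sorted-list prefix lookup over whole (untokenized, lowercased) titles. This is a toy stand-in for TSTLookup, with invented titles, just to show the phrase-level behavior being asked for:

```python
import bisect

# Whole titles, lowercased -- the keyword-tokenized view of the suggest field.
titles = sorted(t.lower() for t in [
    "New York", "New York City", "Newton", "Newark", "York"])

def suggest(prefix, limit=5):
    """Prefix lookup over whole titles, so 'new york' matches phrases,
    not the individual words 'new' and 'york'."""
    prefix = prefix.lower()
    i = bisect.bisect_left(titles, prefix)
    out = []
    while i < len(titles) and titles[i].startswith(prefix):
        out.append(titles[i])
        i += 1
        if len(out) == limit:
            break
    return out

print(suggest("new york"))  # ['new york', 'new york city']
```

The per-word suggestions in the response above happen when the query is analyzed into separate tokens before lookup; keeping the suggest field keyword-tokenized (and querying it with the whole phrase) avoids that split.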
Re: Hierarchical faceting in UI
Yuhao, Ok, let me think about this. A term can have multiple parents. Each of those parents would be 'different', yes? In this case, use a multivalued field for the parent and add all the parent names or id's to it. The relations should be unique. Your UI will associate the correct parent id to build the facet query from, and will return the correct children, because the user is descending down a specific path in the UI and the parent node unique id's are returned along the way. Now, if you have parent names/id's that can themselves appear in multiple locations (vs. just the terms, 'the leafs'), then perhaps your hierarchy needs refactoring for redundancy? Happy to help with more details. Darren

On 01/24/2012 11:22 AM, Yuhao wrote: Darren, One challenge for me is that a term can appear in multiple places of the hierarchy. So it's not safe to simply use the term as it appears to get its children; I probably need to include the entire tree path up to this term. For example, if the hierarchy is Cardiovascular Diseases > Arteriosclerosis > Coronary Artery Disease, and I'm getting the children of the middle term Arteriosclerosis, I need to filter on something like parent:"Cardiovascular Diseases/Arteriosclerosis". I'm having trouble figuring out how I can get the complete path per above to add to the URL of each facet term. I know velocity/facet_field.vm is where I build the URL. I know how to simply add a parent:term filter to the URL. But I don't know how to access a document field, like the complete parent path, in facet_field.vm. Any help would be great. Yuhao

From: dar...@ontrenet.com dar...@ontrenet.com To: Yuhao nfsvi...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Monday, January 23, 2012 7:16 PM Subject: Re: Hierarchical faceting in UI

On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhao nfsvi...@yahoo.com wrote: Programmatically, something like this might work: for each facet field, add another hidden field that identifies its parent.
Then, program additional logic in the UI to show only the facet terms at the currently selected level. For example, if one filters on cat:electronics, the new UI logic would apply the additional filter cat_parent:electronics. Can this be done? Yes. This is how I do it. Would it be a lot of work? No. It's not a lot of work; simply represent your hierarchy as parent/child relations in the document fields, and in your UI drill down by issuing new faceted searches. Use the current facet (tree level) as the parent:level in the next query. It's much easier than other suggestions for this. Is there a better way? Not in my opinion, there isn't. This is the simplest to implement and understand. By the way, Flamenco (another faceted browser) has built-in support for hierarchies, and it has worked well for my data in this aspect (but less well than Solr in others). I'm looking for the same kind of hierarchical UI feature in Solr.
SolrCell maximum file size
Hi everybody, Does anyone know if there is a maximum file size that can be uploaded to the ExtractingRequestHandler via HTTP request? Thanks in advance, Augusto Camarotti
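For reference, the practical cap is usually Solr's multipart upload limit in solrconfig.xml (plus whatever limit the servlet container imposes). A sketch of the relevant section — the 2 GB figure here is only an example value, not a recommendation:

```xml
<requestDispatcher handleSelect="true">
  <!-- multipartUploadLimitInKB caps the size of POSTed files;
       2048000 KB is roughly 2 GB -->
  <requestParsers enableRemoteStreaming="false"
                  multipartUploadLimitInKB="2048000" />
</requestDispatcher>
```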
HTMLStripCharFilterFactory not working in Solr4?
We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in Solr3.4 with the configuration shown here, but is not working the same way in Solr4. The label field is defined as type=text_general field name=label type=text_general indexed=true stored=false required=false multiValued=true/ Here's the type definition for text_general field: fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType In Solr 3.4, that configuration was completely stripping html constructs out of the indexed field which is exactly what we wanted. If for example, we then do a facet on the label field, like in the test below, we're getting some terms in the response that we would not like to be there. 
// test case (groovy) void specialHtmlConstructsGetStripped() { SolrInputDocument inputDocument = new SolrInputDocument() inputDocument.addField('label', 'Bose&#174; &#8482;') solrServer.add(inputDocument) solrServer.commit() QueryResponse response = solrServer.query(new SolrQuery('bose')) assert 1 == response.results.numFound SolrQuery facetQuery = new SolrQuery('bose') facetQuery.facet = true facetQuery.set(FacetParams.FACET_FIELD, 'label') facetQuery.set(FacetParams.FACET_MINCOUNT, '1') response = solrServer.query(facetQuery) FacetField ff = response.facetFields.find {it.name == 'label'} List suggestResponse = [] for (FacetField.Count facetField in ff?.values) { suggestResponse << facetField.name } assert suggestResponse == ['bose'] } With the upgrade to Solr4, the assertion fails; the suggested response contains 174 and 8482 as terms. Test output is: Assertion failed: assert suggestResponse == ['bose'] | | | false [174, 8482, bose] I just tried again using the latest build from today, namely: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/369/ and we're still getting the failing assertion. Is there a different way to configure the HTMLStripCharFilterFactory in Solr4? Thanks in advance for any tips! Mike
Re: HTMLStripCharFilterFactory not working in Solr4?
You can use LegacyHTMLStripCharFilterFactory to get the previous behavior. See https://issues.apache.org/jira/browse/LUCENE-3690 for more details. -Yonik http://www.lucidimagination.com On Tue, Jan 24, 2012 at 1:34 PM, Mike Hugo m...@piragua.com wrote: We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in Solr3.4 with the configuration shown here, but is not working the same way in Solr4. The label field is defined as type=text_general field name=label type=text_general indexed=true stored=false required=false multiValued=true/ Here's the type definition for text_general field: fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType In Solr 3.4, that configuration was completely stripping html constructs out of the indexed field which is exactly what we wanted. If for example, we then do a facet on the label field, like in the test below, we're getting some terms in the response that we would not like to be there. 
// test case (groovy) void specialHtmlConstructsGetStripped() { SolrInputDocument inputDocument = new SolrInputDocument() inputDocument.addField('label', 'Bose#174; #8482;') solrServer.add(inputDocument) solrServer.commit() QueryResponse response = solrServer.query(new SolrQuery('bose')) assert 1 == response.results.numFound SolrQuery facetQuery = new SolrQuery('bose') facetQuery.facet = true facetQuery.set(FacetParams.FACET_FIELD, 'label') facetQuery.set(FacetParams.FACET_MINCOUNT, '1') response = solrServer.query(facetQuery) FacetField ff = response.facetFields.find {it.name == 'label'} List suggestResponse = [] for (FacetField.Count facetField in ff?.values) { suggestResponse facetField.name } assert suggestResponse == ['bose'] } With the upgrade to Solr4, the assertion fails, the suggested response contains 174 and 8482 as terms. Test output is: Assertion failed: assert suggestResponse == ['bose'] | | | false [174, 8482, bose] I just tried again using the latest build from today, namely: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/369/ and we're still getting the failing assertion. Is there a different way to configure the HTMLStripCharFilterFactory in Solr4? Thanks in advance for any tips! Mike
Re: Hierarchical faceting in UI
Hi Darren. You said: Your UI will associate the correct parent id to build the facet query This is the part I'm having trouble figuring out how to accomplish and some guidance would help. How would I get the value of the parent to build the facet query in the UI, if the value is in another document field? I was imagining that I would add the additional filter of parent:parent path to the fq URL parameter. But I don't have a way to do it yet. Perhaps seeing some data would help. Here is a record in old (flattened) and new (parent-enabled) versions, both in JSON format: OLD: { ID : 3816, Gene Symbol : KLK1, Alternate Names : hCG_22931;Klk6;hK1;KLKR, Description : Kallikrein 1, a peptidase that cleaves kininogen, functions in glucose homeostasis, heart contraction, semen liquefaction, and vasoconstriction, aberrantly expressed in pancreatitis and endometrial cancer; gene polymorphism correlates with kidney failure (BKL), GAD_Positive_Disease_Associations : [Mental Disorders(MESH:D001523) Dementia, Vascular(MESH:D015140), Cardiovascular Diseases(MESH:D002318) Coronary Artery Disease(MESH:D003324)], HuGENet_GeneProspector_Associations : [atherosclerosis, HDL], } NEW: { ID : 3816, Gene Symbol : KLK1, Alternate Names : hCG_22931;Klk6;hK1;KLKR, Description : Kallikrein 1, a peptidase that cleaves kininogen, functions in glucose homeostasis, heart contraction, semen liquefaction, and vasoconstriction, aberrantly expressed in pancreatitis and endometrial cancer; gene polymorphism correlates with kidney failure (BKL), GAD_Positive_Disease_Associations : [Dementia, Vascular(MESH:D015140), Coronary Artery Disease(MESH:D003324)], GAD_Positive_Disease_Associations_parent : [Mental Disorders(MESH:D001523), Cardiovascular Diseases(MESH:D002318)], HuGENet_GeneProspector_Associations : [atherosclerosis, HDL], } In the old version, the field GAD_Positive_Disease_Associations had 2 levels of hierarchy that were flattened. 
It had the full path of the hierarchy leading to the current term. In the new version, the field only has the current term. A separate field called GAD_Positive_Disease_Associations_parent has the full path preceding the current term. So, let's say in the UI, I click on the term Dementia, Vascular(MESH:D015140) to get its child terms and data. My filters in the URL querystring would be exactly: fq=GAD_Positive_Disease_Associations:Dementia, Vascular(MESH:D015140)&fq=GAD_Positive_Disease_Associations_parent:Mental Disorders(MESH:D001523) My question is, how to get the parent value of Mental Disorders(MESH:D001523) to build that querystring? Thanks! Yuhao From: Darren Govoni dar...@ontrenet.com To: solr-user@lucene.apache.org Sent: Tuesday, January 24, 2012 1:23 PM Subject: Re: Hierarchical faceting in UI Yuhao, Ok, let me think about this. A term can have multiple parents. Each of those parents would be 'different', yes? In this case, use a multivalued field for the parent and add all the parent names or id's to it. The relations should be unique. Your UI will associate the correct parent id to build the facet query from and return the correct children because the user is descending down a specific path in the UI and the parent node unique id's are returned along the way. Now, if you are having parent names/id's that themselves can appear in multiple locations (vs. just terms 'the leafs'), then perhaps your hierarchy needs refactoring for redundancy? Happy to help with more details. Darren On 01/24/2012 11:22 AM, Yuhao wrote: Darren, One challenge for me is that a term can appear in multiple places of the hierarchy. So it's not safe to simply use the term as it appears to get its children; I probably need to include the entire tree path up to this term.
For example, if the hierarchy is Cardiovascular Diseases Arteriosclerosis Coronary Artery Disease, and I'm getting the children of the middle term Arteriosclerosi, I need to filter on something like parent:Cardiovascular Diseases/Arteriosclerosis. I'm having trouble figuring out how I can get the complete path per above to add to the URL of each facet term. I know velocity/facet_field.vm is where I build the URL. I know how to simply add a parent:term filter to the URL. But I don't know how to access a document field, like the complete parent path, in facet_field.vm. Any help would be great. Yuhao From: dar...@ontrenet.comdar...@ontrenet.com To: Yuhaonfsvi...@yahoo.com Cc: solr-user@lucene.apache.org Sent: Monday, January 23, 2012 7:16 PM Subject: Re: Hierarchical faceting in UI On Mon, 23 Jan 2012 14:33:00 -0800 (PST), Yuhaonfsvi...@yahoo.com wrote: Programmatically, something like this might work: for each facet field, add another hidden field that
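The querystring Yuhao describes can be assembled with ordinary URL encoding once the parent value is available (e.g. returned as a stored field on the facet's documents). A minimal Java sketch — the quoting of the values is an assumption about how these terms, which contain spaces, commas and parentheses, should be fed to the query parser:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FacetUrlDemo {
    public static void main(String[] args) {
        String field  = "GAD_Positive_Disease_Associations";
        String parent = "GAD_Positive_Disease_Associations_parent";
        String term       = "Dementia, Vascular(MESH:D015140)";
        String parentPath = "Mental Disorders(MESH:D001523)";

        // Phrase-quote each value so spaces, commas and parens survive
        // the query parser, then URL-encode each fq parameter whole.
        String fq1 = field  + ":\"" + term + "\"";
        String fq2 = parent + ":\"" + parentPath + "\"";
        String qs = "fq=" + URLEncoder.encode(fq1, StandardCharsets.UTF_8)
                  + "&fq=" + URLEncoder.encode(fq2, StandardCharsets.UTF_8);
        System.out.println(qs);
    }
}
```

The resulting string is what would be appended to the select URL for the drill-down request.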
Re: Size of index to use shard
Thanks for the explanation Erick :) 2012/1/24, Erick Erickson erickerick...@gmail.com: Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used to for stored fields, i.e. the verbatim copy of data put in the index when you specify stored=true. These files have virtually no impact on search speed. So, if your *.fdx and *.fdt files are 90G out of a 100G index it is a much different thing than if these files are 10G out of a 100G index. And this doesn't even mention the peculiarities of your query mix. Nor does it say a thing about whether your cheapest alternative is to add more memory. Anderson's method is about the only reliable one, you just have to test with your index and real queries. At some point, you'll find your tipping point, typically when you come under memory pressure. And it's a balancing act between how much memory you allocate to the JVM and how much you leave for the op system. Bottom line: No hard and fast numbers. And you should periodically re-test the empirical numbers you *do* arrive at... Best Erick On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Apparently, not so easy to determine when to break the content into pieces. I'll investigate further about the amount of documents, the size of each document and what kind of search is being used. It seems, I will have to do a load test to identify the cutoff point to begin using the strategy of shards. Thanks 2012/1/24, Dmitry Kan dmitry@gmail.com: Hi, The article you gave mentions 13GB of index size. It is quite small index from our perspective. We have noticed, that at least solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on avg taking more than 3-4 seconds) once index size crosses a magic level (about 80GB following our practical observations). 
We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones. We also route majority of user queries to fast indices. Yes, caching may help, but not necessarily we can afford adding more RAM for bigger indices. BTW, our documents are very small, thus in 100GB index we can have around 200 mil. documents. It would be interesting to see, how you manage to ensure q-times under 1 sec with an index of 250GB? How many documents / facets do you ask max. at a time? FYI, we ask for a thousand of facets in one go. Regards, Dmitry On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi, it depends from your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/ Think about your cache-config (few updates, big caches) and a good HW-infrastructure. In my case i can handle a 250GB index with 100mil. docs on a I7 machine with RAID10 and 24GB RAM = q-times under 1 sec. Regards Vadim 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Hi Has some size of index (or number of docs) that is necessary to break the index in shards? I have a index with 100GB of size. This index increase 10GB per year. (I don't have information how many docs they have) and the docs never will be deleted. Thinking in 30 years, the index will be with 400GB of size. I think is not required to break in shard, because i not consider this like a large index. Am I correct? What's is a real large index Thanks
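As a concrete illustration of the sharding under discussion: a distributed search is just an ordinary select request carrying a `shards` parameter listing the shard cores. A sketch with hypothetical host names:

```java
public class ShardQueryDemo {
    public static void main(String[] args) {
        // Hypothetical shard hosts; any one node can coordinate the query.
        String[] shards = { "host1:8983/solr", "host2:8983/solr" };
        String url = "http://" + shards[0] + "/select?q=*:*&shards="
                   + String.join(",", shards);
        System.out.println(url);
    }
}
```

Note that the `shards` values omit the `http://` prefix, per Solr's distributed-search convention.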
Re: HTMLStripCharFilterFactory not working in Solr4?
Thanks for the response Yonik, Interestingly enough, changing to to the LegacyHTMLStripCharFilterFactory does NOT solve the problem - in fact I get the same result I can see that the LegacyHTMLStripCharFilterFactory is being applied at startup: Jan 24, 2012 1:25:29 PM org.apache.solr.util.plugin.AbstractPluginLoader load INFO: created : org.apache.solr.analysis.LegacyHTMLStripCharFilterFactory however, I'm still getting the same assertion error. Any thoughts? Mike On Tue, Jan 24, 2012 at 12:40 PM, Yonik Seeley yo...@lucidimagination.comwrote: You can use LegacyHTMLStripCharFilterFactory to get the previous behavior. See https://issues.apache.org/jira/browse/LUCENE-3690 for more details. -Yonik http://www.lucidimagination.com On Tue, Jan 24, 2012 at 1:34 PM, Mike Hugo m...@piragua.com wrote: We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in Solr3.4 with the configuration shown here, but is not working the same way in Solr4. 
The label field is defined as type=text_general field name=label type=text_general indexed=true stored=false required=false multiValued=true/ Here's the type definition for text_general field: fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType In Solr 3.4, that configuration was completely stripping html constructs out of the indexed field which is exactly what we wanted. If for example, we then do a facet on the label field, like in the test below, we're getting some terms in the response that we would not like to be there. // test case (groovy) void specialHtmlConstructsGetStripped() { SolrInputDocument inputDocument = new SolrInputDocument() inputDocument.addField('label', 'Bose#174; #8482;') solrServer.add(inputDocument) solrServer.commit() QueryResponse response = solrServer.query(new SolrQuery('bose')) assert 1 == response.results.numFound SolrQuery facetQuery = new SolrQuery('bose') facetQuery.facet = true facetQuery.set(FacetParams.FACET_FIELD, 'label') facetQuery.set(FacetParams.FACET_MINCOUNT, '1') response = solrServer.query(facetQuery) FacetField ff = response.facetFields.find {it.name == 'label'} List suggestResponse = [] for (FacetField.Count facetField in ff?.values) { suggestResponse facetField.name } assert suggestResponse == ['bose'] } With the upgrade to Solr4, the assertion fails, the suggested response contains 174 and 8482 as terms. 
Test output is: Assertion failed: assert suggestResponse == ['bose'] | | | false [174, 8482, bose] I just tried again using the latest build from today, namely: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/369/ and we're still getting the failing assertion. Is there a different way to configure the HTMLStripCharFilterFactory in Solr4? Thanks in advance for any tips! Mike
Re: phrase auto-complete with suggester component
You might wanna read http://lucene.472066.n3.nabble.com/suggester-issues-td3262718.html#a3264740 which contains the solution to your problem. -- View this message in context: http://lucene.472066.n3.nabble.com/phrase-auto-complete-with-suggester-component-tp3685572p3685730.html Sent from the Solr - User mailing list archive at Nabble.com.
Indexing failover and replication
Hi, I'm now doing a test with replication using Solr 1.4.1. I configured two servers (server1 and server2) as master/slave to synchronize both. I put Apache on the front side, and we index sometimes on server1 and sometimes on server2. I realized that both index servers are now confused. In the Solr data folder, many index folders were created with the timestamp of synchronization (example: index.20120124041340), with some segments inside. I thought it was possible to index on two master servers and then synchronize both using replication. Is it really possible to do this with the replication mechanism? If it is possible, what have I done wrong? I need more than one node for indexing to guarantee failover for indexing. Is multi-master the best way to guarantee failover for indexing? Thanks
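For reference, Solr 1.4 replication is strictly one-way (one master, N slaves); pointing two masters at each other is what produces the stray index.&lt;timestamp&gt; directories described above. A sketch of the standard ReplicationHandler configuration — the host name and poll interval are placeholder values:

```xml
<!-- on the master, in solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="confFiles">schema.xml,stopwords.txt</str>
  </lst>
</requestHandler>

<!-- on the slave, in solrconfig.xml -->
<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://master-host:8983/solr/replication</str>
    <str name="pollInterval">00:00:60</str>
  </lst>
</requestHandler>
```

For indexing failover, the usual pattern is a single active master with a standby that can be promoted, rather than two concurrently written masters.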
Re: phrase auto-complete with suggester component
Thanks, I'll try out the custom class file. Any possibility this class could be merged into Solr? It seems like the expected behavior. On Tue, Jan 24, 2012 at 11:29 AM, O. Klein kl...@octoweb.nl wrote: You might wanna read http://lucene.472066.n3.nabble.com/suggester-issues-td3262718.html#a3264740 which contains the solution to your problem. -- View this message in context: http://lucene.472066.n3.nabble.com/phrase-auto-complete-with-suggester-component-tp3685572p3685730.html Sent from the Solr - User mailing list archive at Nabble.com. -- Tommy Chheng
RE: HTMLStripCharFilterFactory not working in Solr4?
Hi Mike, When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes: public void testNumericCharacterEntities() throws Exception { final String text = "Bose&#174; &#8482;"; // |Bose® ™| HTMLStripCharFilterFactory htmlStripFactory = new HTMLStripCharFilterFactory(); htmlStripFactory.init(Collections.<String,String>emptyMap()); CharStream charStream = htmlStripFactory.create(CharReader.get(new StringReader(text))); StandardTokenizerFactory stdTokFactory = new StandardTokenizerFactory(); stdTokFactory.init(DEFAULT_VERSION_PARAM); Tokenizer stream = stdTokFactory.create(charStream); assertTokenStreamContents(stream, new String[] { "Bose" }); } What's happening: First, htmlStripFactory converts &#174; to ® and &#8482; to ™. Then stdTokFactory declines to tokenize ® and ™, because they belong to the Unicode general category "Symbol, Other", and so are not included in any of the output tokens. StandardTokenizer uses the Word Break rules from UAX#29 http://unicode.org/reports/tr29/ to find token boundaries, and then outputs only alphanumeric tokens. See the JFlex grammar for details: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex?view=markup. The behavior you're seeing is not consistent with the above test. Steve -Original Message- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 1:34 PM To: solr-user@lucene.apache.org Subject: HTMLStripCharFilterFactory not working in Solr4? We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in Solr3.4 with the configuration shown here, but is not working the same way in Solr4.
The label field is defined as type=text_general field name=label type=text_general indexed=true stored=false required=false multiValued=true/ Here's the type definition for text_general field: fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType In Solr 3.4, that configuration was completely stripping html constructs out of the indexed field which is exactly what we wanted. If for example, we then do a facet on the label field, like in the test below, we're getting some terms in the response that we would not like to be there. // test case (groovy) void specialHtmlConstructsGetStripped() { SolrInputDocument inputDocument = new SolrInputDocument() inputDocument.addField('label', 'Bose#174; #8482;') solrServer.add(inputDocument) solrServer.commit() QueryResponse response = solrServer.query(new SolrQuery('bose')) assert 1 == response.results.numFound SolrQuery facetQuery = new SolrQuery('bose') facetQuery.facet = true facetQuery.set(FacetParams.FACET_FIELD, 'label') facetQuery.set(FacetParams.FACET_MINCOUNT, '1') response = solrServer.query(facetQuery) FacetField ff = response.facetFields.find {it.name == 'label'} List suggestResponse = [] for (FacetField.Count facetField in ff?.values) { suggestResponse facetField.name } assert suggestResponse == ['bose'] } With the upgrade to Solr4, the assertion fails, the suggested response contains 174 and 8482 as terms. 
Test output is: Assertion failed: assert suggestResponse == ['bose'] | | | false [174, 8482, bose] I just tried again using the latest build from today, namely: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/369/ and we're still getting the failing assertion. Is there a different way to configure the HTMLStripCharFilterFactory in Solr4? Thanks in advance for any tips! Mike
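Steve's explanation can be checked with nothing but the JDK: ® (U+00AE) and ™ (U+2122) fall in Unicode general category So ("Symbol, Other"), which StandardTokenizer never emits as token content. A minimal sketch:

```java
public class SymbolCategoryDemo {
    public static void main(String[] args) {
        // ® (U+00AE) and ™ (U+2122) are both general category So
        // ("Symbol, Other"); StandardTokenizer drops such characters.
        System.out.println(Character.getType('\u00AE') == Character.OTHER_SYMBOL);
        System.out.println(Character.getType('\u2122') == Character.OTHER_SYMBOL);
    }
}
```

So once the char filter has decoded the entities, the tokenizer alone is enough to keep the symbols out of the index — which is why the numeric terms 174 and 8482 showing up suggests the entities were never decoded in the first place.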
RE: HTMLStripCharFilterFactory not working in Solr4?
Try putting the HTMLStripCharFilterFactory before the StandardTokenizerFactory instead of after it. I vaguely recall being burned by something like this before. -Michael
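For clarity, here is the analyzer from the thread reconstructed as XML (the mailing-list archive stripped the angle brackets and quote marks), with the charFilter listed before the tokenizer as suggested — a sketch of the intended configuration, not a verified fix:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Char filters operate on the raw character stream, so they belong before the tokenizer in the analyzer declaration.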
Re: HTMLStripCharFilterFactory not working in Solr4?
Oops, I didn't read carefully enough to see that you wanted those constructs entirely stripped out. Given that you're seeing numbers indexed, this strongly indicates an escaping bug in the SolrJ client that must have been introduced at some point. I'll see if I can reproduce it in a unit test. -Yonik http://www.lucidimagination.com
Re: dismax: limiting term match to one field
This seems like a real shame. As soon as you search across more than one field, the mm setting becomes nearly useless. -- View this message in context: http://lucene.472066.n3.nabble.com/dismax-limiting-term-match-to-one-field-tp2056498p3685850.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Size of index to use shard
@Erick thanks :) I'm with you on your opinion; my load tests show the same. @Dmitry my docs are small too, I think about 3-15KB per doc. I update my index all the time and I have an average of 20-50 requests per minute (20% facet queries, 80% large boolean queries with wildcard/fuzzy). How many docs at a time? It depends on the chosen filters, from 10 to all 100 mio. I work with very small caches (strangely, if my index is under 100GB I need larger caches, over 100GB smaller caches..). My JVM has 6GB, 18GB for I/O. With few updates a day I would configure very big caches, like Tim Burton (see HathiTrust's blog). Regards Vadim 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Thanks for the explanation Erick :) 2012/1/24, Erick Erickson erickerick...@gmail.com: Talking about index size can be very misleading. Take a look at http://lucene.apache.org/java/3_5_0/fileformats.html#file-names. Note that the *.fdt and *.fdx files are used for stored fields, i.e. the verbatim copy of data put in the index when you specify stored=true. These files have virtually no impact on search speed. So, if your *.fdx and *.fdt files are 90G out of a 100G index it is a much different thing than if these files are 10G out of a 100G index. And this doesn't even mention the peculiarities of your query mix. Nor does it say a thing about whether your cheapest alternative is to add more memory. Anderson's method is about the only reliable one, you just have to test with your index and real queries. At some point, you'll find your tipping point, typically when you come under memory pressure. And it's a balancing act between how much memory you allocate to the JVM and how much you leave for the op system. Bottom line: No hard and fast numbers. And you should periodically re-test the empirical numbers you *do* arrive at...
Best Erick On Tue, Jan 24, 2012 at 5:31 AM, Anderson vasconcelos anderson.v...@gmail.com wrote: Apparently, not so easy to determine when to break the content into pieces. I'll investigate further about the amount of documents, the size of each document and what kind of search is being used. It seems, I will have to do a load test to identify the cutoff point to begin using the strategy of shards. Thanks 2012/1/24, Dmitry Kan dmitry@gmail.com: Hi, The article you gave mentions 13GB of index size. It is quite small index from our perspective. We have noticed, that at least solr 3.4 has some sort of choking point with respect to growing index size. It just becomes substantially slower than what we need (a query on avg taking more than 3-4 seconds) once index size crosses a magic level (about 80GB following our practical observations). We try to keep our indices at around 60-70GB for fast searches and above 100GB for slow ones. We also route majority of user queries to fast indices. Yes, caching may help, but not necessarily we can afford adding more RAM for bigger indices. BTW, our documents are very small, thus in 100GB index we can have around 200 mil. documents. It would be interesting to see, how you manage to ensure q-times under 1 sec with an index of 250GB? How many documents / facets do you ask max. at a time? FYI, we ask for a thousand of facets in one go. Regards, Dmitry On Tue, Jan 24, 2012 at 10:30 AM, Vadim Kisselmann v.kisselm...@googlemail.com wrote: Hi, it depends from your hardware. Read this: http://www.derivante.com/2009/05/05/solr-performance-benchmarks-single-vs-multi-core-index-shards/ Think about your cache-config (few updates, big caches) and a good HW-infrastructure. In my case i can handle a 250GB index with 100mil. docs on a I7 machine with RAID10 and 24GB RAM = q-times under 1 sec. 
Regards Vadim 2012/1/24 Anderson vasconcelos anderson.v...@gmail.com: Hi Has some size of index (or number of docs) that is necessary to break the index in shards? I have a index with 100GB of size. This index increase 10GB per year. (I don't have information how many docs they have) and the docs never will be deleted. Thinking in 30 years, the index will be with 400GB of size. I think is not required to break in shard, because i not consider this like a large index. Am I correct? What's is a real large index Thanks
Re: HTMLStripCharFilterFactory not working in Solr4?
Thanks for the responses everyone. Steve, the test method you provided also works for me. However, when I try a more end to end test with the HTMLStripCharFilterFactory configured for a field I am still having the same problem. I attached a failing unit test and configuration to the following issue in JIRA: https://issues.apache.org/jira/browse/LUCENE-3721 I appreciate all the prompt responses! Looking forward to finding the root cause of this guy :) If there's something I'm doing incorrectly in the configuration, please let me know! Mike On Tue, Jan 24, 2012 at 1:57 PM, Steven A Rowe sar...@syr.edu wrote: Hi Mike, When I add the following test to TestHTMLStripCharFilterFactory.java on Solr trunk, it passes: public void testNumericCharacterEntities() throws Exception { final String text = Bose#174; #8482;; // |Bose® ™| HTMLStripCharFilterFactory htmlStripFactory = new HTMLStripCharFilterFactory(); htmlStripFactory.init(Collections.String,StringemptyMap()); CharStream charStream = htmlStripFactory.create(CharReader.get(new StringReader(text))); StandardTokenizerFactory stdTokFactory = new StandardTokenizerFactory(); stdTokFactory.init(DEFAULT_VERSION_PARAM); Tokenizer stream = stdTokFactory.create(charStream); assertTokenStreamContents(stream, new String[] { Bose }); } What's happening: First, htmlStripFactory converts #174; to ® and #8482; to ™. Then stdTokFactory declines to tokenize ® and ™, because they are belong to the Unicode general category Symbol, Other, and so are not included in any of the output tokens. StandardTokenizer uses the Word Break rules find UAX#29 http://unicode.org/reports/tr29/ to find token boundaries, and then outputs only alphanumeric tokens. See the JFlex grammar for details: http://svn.apache.org/viewvc/lucene/dev/trunk/modules/analysis/common/src/java/org/apache/lucene/analysis/standard/StandardTokenizerImpl.jflex?view=markup . The behavior you're seeing is not consistent with the above test. 
Steve -Original Message- From: Mike Hugo [mailto:m...@piragua.com] Sent: Tuesday, January 24, 2012 1:34 PM To: solr-user@lucene.apache.org Subject: HTMLStripCharFilterFactory not working in Solr4? We recently updated to the latest build of Solr4 and everything is working really well so far! There is one case that is not working the same way it was in Solr 3.4 - we strip out certain HTML constructs (like trademark and registered, for example) in a field as defined below - it was working in Solr3.4 with the configuration shown here, but is not working the same way in Solr4. The label field is defined as type=text_general field name=label type=text_general indexed=true stored=false required=false multiValued=true/ Here's the type definition for text_general field: fieldType name=text_general class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ charFilter class=solr.HTMLStripCharFilterFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.LowerCaseFilterFactory/ /analyzer /fieldType In Solr 3.4, that configuration was completely stripping html constructs out of the indexed field which is exactly what we wanted. If for example, we then do a facet on the label field, like in the test below, we're getting some terms in the response that we would not like to be there. 
    // test case (groovy)
    void specialHtmlConstructsGetStripped() {
        SolrInputDocument inputDocument = new SolrInputDocument()
        inputDocument.addField('label', 'Bose&#174; &#8482;')
        solrServer.add(inputDocument)
        solrServer.commit()

        QueryResponse response = solrServer.query(new SolrQuery('bose'))
        assert 1 == response.results.numFound

        SolrQuery facetQuery = new SolrQuery('bose')
        facetQuery.facet = true
        facetQuery.set(FacetParams.FACET_FIELD, 'label')
        facetQuery.set(FacetParams.FACET_MINCOUNT, '1')

        response = solrServer.query(facetQuery)
        FacetField ff = response.facetFields.find { it.name == 'label' }
        List suggestResponse = []
        for (FacetField.Count facetField in ff?.values) {
            suggestResponse << facetField.name
        }

        assert suggestResponse == ['bose']
    }

With the upgrade to Solr4, the assertion fails; the suggested response contains
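An aside not from the original thread: Steve's point earlier in this thread that ® and ™ fall into the Unicode general category "Symbol, Other" (and are therefore never emitted by StandardTokenizer's UAX #29 word-break rules) can be checked with nothing but the JDK. A minimal standalone sketch:

```java
// Standalone check (not part of the Lucene test suite): &#174; decodes to
// U+00AE (registered sign) and &#8482; to U+2122 (trade mark sign); both
// carry the Unicode general category So ("Symbol, Other"), which the
// UAX #29 word-break rules exclude from alphanumeric tokens.
public class SymbolCategoryCheck {
    public static void main(String[] args) {
        System.out.println(Character.getType('\u00AE') == Character.OTHER_SYMBOL); // true
        System.out.println(Character.getType('\u2122') == Character.OTHER_SYMBOL); // true
    }
}
```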
Fw: Problem with SplitBy in Solr 3.4
----- Forwarded Message -----
From: Sumit Sen sumitse...@yahoo.com
To: Solr List solr-user@lucene.apache.org
Sent: Tuesday, January 24, 2012 3:53 PM
Subject: Problem with SplitBy in Solr 3.4

Hi All:

I have a very silly problem. I am using Solr 3.4. I have a data import handler for indexing which is not splitting a field's data by '|' in spite of the following setup:

    <dataConfig>
      <document>
        <entity dataSource="ds-1" name="associate" pk="id" transformer="RegexTransformer"
          query="Select case when EMPLID != ' ' then EMPLID END as ID,
            case when FIRST_NAME != ' ' then FIRST_NAME END as firstName,
            case when MIDDLE_NAME != ' ' then MIDDLE_NAME END as middleName,
            case when LAST_NAME != ' ' then LAST_NAME END as familyName,
            case when FORMER_NAME != ' ' then FORMER_NAME END as middleName,
            case when EMAIL_ADDRESS != ' ' then EMAIL_ADDRESS END as businessEmail,
            case when CITY != ' ' then CITY END as homeCity,
            case when STATE != ' ' then STATE END as homeCState,
            case when ZIP != ' ' then ZIP END as homeZip,
            case when COUNTRY_ISO != ' ' then COUNTRY_ISO END as homeCountry,
            case when WORK_PHONE != ' ' then WORK_PHONE END as businessTel,
            (select xlatlongname from xlattable where fieldname = 'PER_STATUS' and fieldvalue = t1.per_status and language_cd = 'ENG') as PER_STATUS,
            case when ORIG_HIRE_DT IS NOT NULL then ORIG_HIRE_DT END as hireDate,
            (select xlatlongname from xlattable where fieldname = 'SEX' and fieldvalue = t1.sex and language_cd = 'ENG') as sex,
            (select xlatlongname from xlattable where fieldname = 'ETHNIC_GROUP' and fieldvalue = t1.ethnic_group and language_cd = 'ENG') as ethnicityCode,
            case when CITZNS_CNTRY_ISO != ' ' then CITZNS_CNTRY_ISO END as citizenship,
            (select xlatlongname from xlattable where fieldname = 'MAR_STATUS' and fieldvalue = t1.mar_status and language_cd = 'ENG') as marritalStatus,
            case when PREFERRED_LANGUAGE != ' ' then PREFERRED_LANGUAGE END as primaryLanguageCode,
            case when BUSINESS_TITLE != ' ' then BUSINESS_TITLE END as businessTitle,
            case when TITLE != ' ' then TITLE END as title,
            case when JOBCODE != ' ' then JOBCODE END,
            (select xlatlongname from xlattable where fieldname = 'EMPL_STATUS' and fieldvalue = t1.empl_status and language_cd = 'ENG') as workLevelStatus,
            case when LOCATION != ' ' then LOCATION END,
            case when CITY_EMPL != ' ' then CITY_EMPL END,
            case when STATE_EMPL != ' ' then STATE_EMPL END,
            case when COUNTRY_2CHAR != ' ' then COUNTRY_2CHAR END,
            case when ZIP_INTL != ' ' then ZIP_INTL END,
            (select xlatlongname from xlattable where fieldname = 'EMPL_TYPE' and fieldvalue = t1.empl_type and language_cd = 'ENG') as employmenttype,
            case when HOME_DEPARTMENT != ' ' then HOME_DEPARTMENT END as DEPARTMENT,
            (Select case when name != ' ' then name end from ps_personal_data where employee_oid = t1.REPORTS_TO_AOID) as reportsTo,
            case when t1.ROLE_CODE1 != ' ' then t1.ROLE_CODE1 end ||'|'||
            case when t1.ROLE_CODE2 != ' ' then t1.ROLE_CODE2 end ||'|'||
            case when t1.ROLE_CODE3 != ' ' then t1.ROLE_CODE3 end ||'|'||
            case when t1.EE_ROLE_CODE1 != ' ' then t1.EE_ROLE_CODE1 end ||'|'||
            case when t1.EE_ROLE_CODE2 != ' ' then t1.EE_ROLE_CODE2 end ||'|'||
            case when t1.EE_ROLE_CODE3 != ' ' then t1.EE_ROLE_CODE3 end ||'|'||
            case when t1.EE_ROLE_CODE4 != ' ' then t1.EE_ROLE_CODE4 end ||'|'||
            case when t1.EE_ROLE_CODE5 != ' ' then t1.EE_ROLE_CODE5 end ||'|'||
            case when t1.EE_ROLE_CODE6 != ' ' then t1.EE_ROLE_CODE6 end as roleCode
            From PS_BOD_EE_VW t1 where t1.per_status = 'A'">
          <field column="id" />
          <field column="title" />
          <field column="firstName" />
          <field column="middleName" />
          <field column="familyName" />
          <field column="maidenName" />
          <field column="primaryLanguageCode" />
          ...
          ...
          <field column="education" />
          <field column="roleCode" splitBy="\|" name="roleCode" />
          <field column="applicationDate" />
          ...
          ...
          <field column="securityLevel" />
        </entity>
      </document>
    </dataConfig>

In schema.xml I have:

    <field name="id" type="string" indexed="true" stored="true" required="true" />
    <field name="title" type="string" indexed="true" stored="true" required="false" />
    <field name="firstName" type="string" indexed="true" stored="true" required="false" />
    <field name="middleName" type="string" indexed="true" stored="true" required="false" />
    <field name="familyName" type="string" indexed="true" stored="true" required="false" />
    <field name="maidenName" type="string" indexed="true" stored="true" required="false" />
    <field name="sex" type="string" indexed="true" stored="true" required="false" />
    <field
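One thing worth double-checking in a setup like the one above: RegexTransformer's splitBy value is treated as a regular expression, so a literal pipe must be escaped as \| (an unescaped | is regex alternation). A minimal standalone sketch of the split behavior, outside of DIH:

```java
import java.util.Arrays;

// Standalone illustration (not DIH itself): splitBy="\|" behaves like a
// Java regex split on an escaped pipe, turning the concatenated role
// codes into separate multiValued field entries.
public class SplitByPipeDemo {
    public static void main(String[] args) {
        String roleCode = "R1|R2|R3";
        String[] roles = roleCode.split("\\|");
        System.out.println(Arrays.toString(roles)); // [R1, R2, R3]
    }
}
```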
Re: Do Highlighting + proximity using surround query parser
I got this working the way you describe it (in the getHighlightQuery() method). The span queries were tripping it up, so I extracted the query terms and created a DisMax query from them. There'll be a loss of accuracy in the highlighting, but in my case that's better than no highlighting. Should I just go ahead and submit a patch to SOLR-2703? On Tue, Jan 10, 2012 at 9:35 AM, Ahmet Arslan iori...@yahoo.com wrote: I am not able to do highlighting with surround query parser on the returned results. I have tried the highlighting component but it does not return highlighted results. Highlighter does not recognize Surround Query. It must be re-written to enable highlighting in o.a.s.search.QParser#getHighlightQuery() method. Not sure this functionality should be added in SOLR-2703 or a separate jira issue. -- Scott Stults | Founder Solutions Architect | OpenSource Connections, LLC | 434.409.2780 http://www.opensourceconnections.com
Re: Solr 3.5.0 can't find Carrot classes
On Tuesday, January 24, 2012 at 3:07 PM, Christopher J. Bottaro wrote: SEVERE: java.lang.NoClassDefFoundError: org/carrot2/core/ControllerFactory at org.apache.solr.handler.clustering.carrot2.CarrotClusteringEngine.init(CarrotClusteringEngine.java:102) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(Unknown Source) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source) at java.lang.reflect.Constructor.newInstance(Unknown Source) at java.lang.Class.newInstance0(Unknown Source) at java.lang.Class.newInstance(Unknown Source) … I'm starting Solr with -Dsolr.clustering.enabled=true and I can see that the Carrot jars in contrib are getting loaded. Full log file is here: http://onespot-development.s3.amazonaws.com/solr.log Any ideas? Thanks for the help. Ok, got a little further. Seems that Solr doesn't like it if you include jars more than once (I had a lib dir and also lib directives in the solrconfig which ended up loading the same jars twice). But now I'm getting these errors: java.lang.NoClassDefFoundError: org/apache/solr/handler/clustering/SearchClusteringEngine Any help? Thanks.
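For the duplicate-jar issue described above, the usual fix is to load each contrib jar from exactly one place. A sketch of solrconfig.xml <lib> directives for the clustering contrib (the relative paths are assumptions; adjust them to your layout, and make sure the same jars are not also copied into the core's lib/ directory):

```xml
<!-- Load the Carrot2 clustering contrib and its dependencies once.
     If the same jars are also present in the core's lib/ directory,
     the classes get loaded twice by different classloaders, which can
     produce NoClassDefFoundError at engine init time. -->
<lib dir="../../contrib/clustering/lib/" regex=".*\.jar" />
<lib dir="../../dist/" regex="apache-solr-clustering-\d.*\.jar" />
```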
Re: Do Highlighting + proximity using surround query parser
I got this working the way you describe it (in the getHighlightQuery() method). The span queries were tripping it up, so I extracted the query terms and created a DisMax query from them. There'll be a loss of accuracy in the highlighting, but in my case that's better than no highlighting. Should I just go ahead and submit a patch to SOLR-2703?

I think a separate jira ticket would be more appropriate. By the way, o.a.l.search.Query#rewrite(IndexReader reader) should do the trick.

    /**
     * Highlighter does not recognize SurroundQuery.
     * It must be rewritten in its most primitive form to enable highlighting.
     */
    @Override
    public Query getHighlightQuery() throws ParseException {
      Query rewrittenQuery;
      try {
        rewrittenQuery = getQuery().rewrite(getReq().getSearcher().getIndexReader());
      } catch (IOException ioe) {
        rewrittenQuery = null;
        LOG.error("query.rewrite() failed", ioe);
      }
      if (rewrittenQuery == null) return getQuery();
      else return rewrittenQuery;
    }
solr not working with magento enterprise 1.11
I am integrating Solr 3.5 with Jetty in Magento EE 1.11. I have followed all the necessary steps, and configured and tested the Solr connection in the Magento catalog system config. I have copied the magento/lib/Solr/conf/ content to the Solr installation. I have run the index management and restarted Jetty, but when I search for any word or misspell it, it's not showing me the "Did you mean?" string, meaning misspellings are not being corrected. It seems Solr is not returning results.

Please let me know how I can verify that Solr is working with Magento, and where Solr saves the XML documents when Magento pushes attributes and product information into Solr - which directory does it store them in?

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-not-working-with-magento-enterprise-1-11-tp3686773p3686773.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr not working with magento enterprise 1.11
Hey, Shouldn't you be asking this question to the Magento people? You have an Enterprise edition, so you have paid for their support. Cheers, David On 25/01/2012 2:57 PM, vishal_asc wrote: I am integrating solr 3.5 with jetty in magento EE 1.11. I have followed all the necessary steps, configure and tested solr connection in magento catalog system config. I have copied magento/lib/Solr/conf/ content to solr installation. I have run the index management, restarted jetty but when I search any word or misspell its not showing me Did you mean ? string means not correcting misspelling. seems solr is not throwing results. please let me know how can i know solr is working with magento and where solr save XML documents when magento pushes attributes and product information in solr ? which directory it stores them. -- View this message in context: http://lucene.472066.n3.nabble.com/solr-not-working-with-magento-enterprise-1-11-tp3686773p3686773.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Cores
Thanks Erick.

Regards
Sujatha

On Mon, Jan 23, 2012 at 11:16 PM, Erick Erickson erickerick...@gmail.com wrote:

You can have a large number of cores; some people have multiple hundreds. Having multiple cores is preferred over having multiple JVMs since it's more efficient at sharing system resources. If you're running a 32-bit JVM, you are limited in the amount of memory you can let the JVM use, so that's a consideration, but otherwise use multiple cores in one JVM, give that JVM say half of the physical memory on the machine, and tune from there.

Best
Erick

On Sun, Jan 22, 2012 at 8:16 PM, Sujatha Arun suja.a...@gmail.com wrote:

Hello,

We have in production a number of individual Solr instances on a single JVM. As a result, we see that the permgen space keeps increasing with each additional instance added. I would like to know if we can have Solr cores instead of individual instances.

- Is there any limit to the number of cores for a single instance?
- Will this decrease the permgen space, as the lib is shared?
- Would there be any decrease in performance as cores are added?
- Anything else that I should know before moving to cores?

Any help would be appreciated.

Regards
Sujatha
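Not from the original thread, but as a concrete sketch of what "cores instead of instances" looks like in the solr.xml of that era (core names and instance directories are invented for illustration):

```xml
<!-- solr.xml in solr home: several cores share one JVM and, via
     sharedLib, one set of library jars - instead of one webapp/JVM
     per index, which duplicates classes in permgen. -->
<solr persistent="true" sharedLib="lib">
  <cores adminPath="/admin/cores">
    <core name="app1" instanceDir="app1" />
    <core name="app2" instanceDir="app2" />
  </cores>
</solr>
```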
Re: solr not working with magento enterprise 1.11
Thanks David. As of now we are configuring it on a local WAMP server, and we only have the development version provided by the sales team. Do you know where Solr saves information, or pushes the XML docs, when we run index management in Magento? I followed this site: http://www.summasolutions.net/blogposts/magento-apache-solr-set Please let me know if you have other info as well.

--
View this message in context: http://lucene.472066.n3.nabble.com/solr-not-working-with-magento-enterprise-1-11-tp3686773p3686816.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr not working with magento enterprise 1.11
Hey,

I am using Magento Community Edition; I wrote my own Magento extension to integrate Solr and it works fine, so I really don't know what the Enterprise edition does. On a personal and unrelated note, I would never use Windows for a server; it's unreliable, and most of the system resources go towards the OS.

Cheers,
David

On 25/01/2012 3:30 PM, vishal_asc wrote:
Thanks David. As of now we are configuring it on local WAMP server and we have only development version provided by sales team. Do you when where solr saves information or push the xml docs when we run index management in magento ? I followed this site: http://www.summasolutions.net/blogposts/magento-apache-solr-set Please let me know if you have some other info also.

Best Regards,
Vishal Porwal
Re: SpellCheck Help
I have installed the same Solr 3.5 with Jetty and am integrating it with Magento 1.11, but it seems to not be working: my search result is not showing the "Did you mean?" string when I misspell any word. I followed all the steps necessary for the Magento Solr integration. Please help ASAP.

Thanks
Vishal

--
View this message in context: http://lucene.472066.n3.nabble.com/SpellCheck-Help-tp3648589p3686756.html
Sent from the Solr - User mailing list archive at Nabble.com.