Re: Is SOLR best suited to this application - Finding co-ordinates
Normalising the data is a good idea, and it would be easy to do since I would only have around 50,000 entries, BUT it is a bit complicated with addresses, I think. Let's say I store the data in this form (town/city/country):

  London, England
  Swindon, Wiltshire, England
  Wiltshire, England
  England

What happens if someone searches for just "London", or just "Swindon"? I assume it wouldn't return any results, because they would have to type "London, England", for example. If I include an entry for both "London" and "London, England", then the autocomplete will show both, which would confuse the user. -- View this message in context: http://lucene.472066.n3.nabble.com/Is-SOLR-best-suited-to-this-application-Finding-co-ordinates-tp3998308p3998547.html Sent from the Solr - User mailing list archive at Nabble.com.
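One way to sidestep the duplicate-suggestion problem is to normalise each location into separate fields and copy them all into a single suggest field, so a query for just "London" or just "Swindon" still matches a single stored entry. A hypothetical schema.xml sketch (all field names, and the edge-n-gram autocomplete type `location_autocomplete`, are illustrative assumptions, not from the original message):

```
<!-- Illustrative only: store each address component separately -->
<field name="town"    type="string" indexed="true" stored="true"/>
<field name="county"  type="string" indexed="true" stored="true"/>
<field name="country" type="string" indexed="true" stored="true"/>

<!-- One multiValued field drives the autocomplete -->
<field name="location_suggest" type="location_autocomplete"
       indexed="true" stored="true" multiValued="true"/>

<copyField source="town"    dest="location_suggest"/>
<copyField source="county"  dest="location_suggest"/>
<copyField source="country" dest="location_suggest"/>
```

With something like this, "Swindon" matches on the town value alone, and the UI can render the stored town/county/country together, so the suggestion list need not contain both "London" and "London, England" as separate entries.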
Re: Solr upgrade from 1.4 to 3.6
Hi Kalyan, that is because SolrJ uses javabin as its format, which has class version numbers in the serialized objects that do not match between releases. Set the format to XML (the wt parameter) and it will work (JSON may work as well). Chantal On 31.07.2012 at 20:50, Manepalli, Kalyan wrote: Hi all, We are trying to upgrade our Solr instance from 1.4 to 3.6. We use the SolrJ API to fetch data from the index. We see that SolrJ 3.6 is not compatible with an index generated by 1.4. Is this a known issue, and is there a workaround? Thanks, Kalyan Manepalli
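For anyone hitting the same mismatch, the workaround amounts to forcing the XML wire format instead of javabin. A hedged sketch (host, port and query below are illustrative): on the URL side this is just the wt parameter, and in SolrJ 3.x the same effect is typically achieved by setting an `XMLResponseParser` on the SolrServer instance via `setParser(...)`.

```
# Illustrative request: ask Solr for XML instead of javabin
http://localhost:8983/solr/select?q=*:*&wt=xml
```

XML (and JSON) responses are version-tolerant text formats, which is why they survive a 1.4-to-3.6 mixed deployment where the binary javabin format does not.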
auto completion search with solr using NGrams in SOLR
I want to implement an auto-completion search with Solr using NGrams. If the user is searching for names of employees, then auto-completion should be applied, i.e.:

  if the user types "j", show the names starting with "j"
  if the user types "ja", show the names starting with "ja"
  if the user types "jac", show the names starting with "jac"
  if the user types "jack", show the names starting with "jack"

Below are my configuration settings in schema.xml. Please suggest if anything is wrong:

  <fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

  <field name="empname" type="edgytext" indexed="true" stored="true"/>
  <field name="autocomplete_text" type="edgytext" indexed="true" stored="true"
         omitNorms="true" omitTermFreqAndPositions="true"/>

  <copyField source="empname" dest="text"/>

When I'm searching with the name "mado" or "madonna" I get employee names, but when searching with "madon" I get no data. Please help me with this. Thanks in advance, Anil. -- View this message in context: http://lucene.472066.n3.nabble.com/auto-completion-search-with-solr-using-NGrams-in-SOLR-tp3998559.html Sent from the Solr - User mailing list archive at Nabble.com.
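To see why "madon" ought to match with this fieldType, it helps to enumerate the terms the index-time chain would produce. The sketch below is plain Java that mimics (it does not use) Lucene's EdgeNGramFilter with minGramSize=1, maxGramSize=15 after a keyword tokenizer plus lower-casing; it is an approximation for reasoning, not the actual Solr code:

```java
import java.util.ArrayList;
import java.util.List;

public class EdgeNGramDemo {
    // Emit front-anchored n-grams of length min..max, mimicking
    // KeywordTokenizer -> LowerCaseFilter -> EdgeNGramFilter(min, max).
    static List<String> edgeNGrams(String token, int min, int max) {
        List<String> grams = new ArrayList<>();
        String t = token.toLowerCase();
        int upper = Math.min(max, t.length());
        for (int len = min; len <= upper; len++) {
            grams.add(t.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // prints: [m, ma, mad, mado, madon, madonn, madonna]
        System.out.println(edgeNGrams("Madonna", 1, 15));
    }
}
```

Since "madon" is among the indexed grams, a lower-cased keyword query of "madon" against the edgytext field itself should match, which suggests the failing query is hitting a differently analyzed field (for example the copyField target "text") rather than empname or autocomplete_text.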
Urgent: Facetable but not Searchable Field
All, We have a requirement where we need to implement 2 fields as facetable, but the values of the fields should not be searchable. Please let me know whether this feature is supported in Solr. If yes, what configuration would be needed in schema.xml and solrconfig.xml to achieve it? This is kind of urgent, as we need to rely on the functionality. Thanks in advance, Jay
Re: Urgent: Facetable but not Searchable Field
On 01.08.2012 13:58, jayakeerthi s wrote: We have a requirement, where we need to implement 2 fields as Facetable, but the values of the fields should not be Searchable. Simply don't search on it; then it's not searchable. Or do I simply not understand your question? As long as Dismax doesn't have the field in its qf parameter, it doesn't get searched. Of course, if the user has direct access to Solr, then she can search on the field. And she can delete the index, or crash the server, if she likes. So the short answer is: no. Facetable fields must be searchable. But usually this is no problem. -Kuli
Re: Urgent: Facetable but not Searchable Field
On Wed, Aug 1, 2012 at 7:58 AM, jayakeerthi s mail2keer...@gmail.com wrote: We have a requirement, where we need to implement 2 fields as Facetable, but the values of the fields should not be Searchable. The user fields uf feature of the edismax parser may work for you: http://wiki.apache.org/solr/ExtendedDisMax#uf_.28User_Fields.29 -Yonik http://lucidimagination.com
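A sketch of how the uf whitelist might be combined with faceting (all parameter values below are illustrative, not from the original post): with edismax, a field omitted from both qf and uf cannot be targeted by the user's query syntax, yet can still be faceted. Note this only constrains the user-supplied q string; a client with direct access to Solr can still filter on the field.

```
defType=edismax
q=some user query
qf=title description
uf=title description
facet=true
facet.field=facet_only_field
```

Here `facet_only_field` never appears in qf or uf, so `facet_only_field:foo` in the user query will not search it, while facet.field still counts its values.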
Re: auto completion search with solr using NGrams in SOLR
Your configuration of the fieldtype looks quite OK. In which field are you searching? text? empname? autocomplete_text? If you are searching in autocomplete_text, how do you add content to it? Is there another copyField statement? If you are searching in text, what fieldtype does that field have? You can use analysis.jsp (linked from the admin console) to check what happens to your content at index time and at search time, and whether there is a match. Best regards from Augsburg, Markus Klose, SHI Elektronische Medien GmbH -----Original Message----- From: aniljayanti [mailto:anil.jaya...@gmail.com] Sent: Wednesday, August 1, 2012 12:05 To: solr-user@lucene.apache.org Subject: auto completion search with solr using NGrams in SOLR
termFrequency off and still use FastVector highlighter?
Hi, we would like to turn off term frequencies (TF) for a field, but we still want to use the FastVector highlighter. How would we do that? -- View this message in context: http://lucene.472066.n3.nabble.com/termFrequncy-off-and-still-use-fastvector-highlighter-tp3998590.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Urgent: Facetable but not Searchable Field
The indexed and stored field attributes are independent, so you can define a facet field as stored but not indexed (stored=true indexed=false), so that the field can be faceted but not indexed. In addition, you can also use a copyField to copy the original values for an indexed field (before the values get analyzed and transformed to be placed in the index as terms) to a stored field to facet them (or vice versa). -- Jack Krupansky -Original Message- From: jayakeerthi s Sent: Wednesday, August 01, 2012 6:58 AM To: solr-user@lucene.apache.org ; solr-user-h...@lucene.apache.org ; solr-dev-h...@lucene.apache.org Subject: Urgent: Facetable but not Searchable Field
Re: Urgent: Facetable but not Searchable Field
On 01.08.2012 15:40, Jack Krupansky wrote: The indexed and stored field attributes are independent, so you can define a facet field as stored but not indexed (stored=true indexed=false), so that the field can be faceted but not indexed. ? A field must be indexed to be used for faceting. -Kuli
Re: Urgent: Facetable but not Searchable Field
Oops. Obviously facet fields must be indexed. Not sure what I was thinking at the moment. -- Jack Krupansky -Original Message- From: Michael Kuhlmann Sent: Wednesday, August 01, 2012 8:54 AM To: solr-user@lucene.apache.org Subject: Re: Urgent: Facetable but not Searchable Field On 01.08.2012 15:40, Jack Krupansky wrote: The indexed and stored field attributes are independent, so you can define a facet field as stored but not indexed (stored=true indexed=false), so that the field can be faceted but not indexed. ? A field must be indexed to be used for faceting. -Kuli
Cloud and cores
Hi all, I'm playing around with SolrCloud and followed the indications I found at http://wiki.apache.org/solr/SolrCloud/

- Started Instance 1 with embedded ZooKeeper
- Started Instances 2, 3 and 4 using Instance 1 as the ZooKeeper server.

Everything works fine. Then, using CoreAdmin, I add a second core in collection1 for Instances 1 and 3... everything is OK in the admin GUI, meaning that the graph shows 2 shards of 3 server addresses each, with the servers that have 2 cores appearing twice on the graph:

  collection1
    shard1: wks-pge:7574, wks-pge:8900, wks-pge:8983
    shard2: wks-pge:8983, wks-pge:7500, wks-pge:8900

On Instances 1 and 3 I have 2 cores, both at the bottom of the left column and in the CoreAdmin screen. I restart everything, and find the server in what seems to be an inconsistent state: i.e. the graph is still showing 2 shards of 3 server addresses, but CoreAdmin is not showing my additional cores any more. Is there a problem in SolrCloud or CoreAdmin, or did I just do something stupid here? :) Pierre
Map Complex Datastructure with Solr
Hi, how can I map this complex data structure in Solr?

  Document
  - Groups
    - Group_ID
    - Group_Name
    - ...
  - Title
  - Chapter
    - Chapter_Title
    - Chapter_Content

Or:

  Product
  - Groups
    - Group_ID
    - Group_Name
    - ...
  - Title
  - Articles
    - Article_ID
    - Article_Color
    - Article_Size

Thanks for ideas
Re: Map Complex Datastructure with Solr
The general rule is to flatten the structures. You have a choice between sharing common fields between tables, such as title, or adding a prefix/suffix to qualify them, such as document_title vs. product_title. You also have the choice of storing different tables in separate Solr cores/collections, but then you have the burden of querying them separately and coordinating the separate results on your own. It all depends on your application. A lot hinges on: 1. How do you want to search the data? 2. How do you want to access the fields once the Solr documents have been identified by a query - such as fields to retrieve, join, etc. So, once the data is indexed, what are your requirements for accessing the data? E.g., some sample pseudo-queries and the fields you want to access. -- Jack Krupansky -Original Message- From: Thomas Gravel Sent: Wednesday, August 01, 2012 9:52 AM To: solr-user@lucene.apache.org Subject: Map Complex Datastructure with Solr
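As a hedged illustration of the "flatten with a prefix/suffix" option Jack describes (the field names and the product_ prefix are my own, not from the thread), here is a plain-Java sketch that turns one parent record plus its child records into one flat document per child, which is the shape Solr expects:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenDemo {
    // Flatten one parent (e.g. a product) with a list of children (e.g. articles)
    // into one flat field map per child, repeating the parent's fields under a
    // qualified name so they survive in every flat document.
    static List<Map<String, String>> flatten(Map<String, String> parent,
                                             List<Map<String, String>> children) {
        List<Map<String, String>> docs = new ArrayList<>();
        for (Map<String, String> child : children) {
            Map<String, String> doc = new LinkedHashMap<>();
            parent.forEach((k, v) -> doc.put("product_" + k, v)); // qualified parent fields
            doc.putAll(child);                                    // child's own fields
            docs.add(doc);
        }
        return docs;
    }

    public static void main(String[] args) {
        Map<String, String> product = new LinkedHashMap<>();
        product.put("name", "tank top");
        Map<String, String> red = new LinkedHashMap<>();
        red.put("color", "red");
        red.put("price", "10.99");
        Map<String, String> blue = new LinkedHashMap<>();
        blue.put("color", "blue");
        blue.put("price", "15.99");
        System.out.println(flatten(product, List.of(red, blue)));
    }
}
```

Each resulting map would become one Solr document; a search on color or price then finds the article, and the repeated product_ fields let you recover the parent without a join.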
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote: Hi All I am using Solr 4 from trunk and using it with Tomcat 6. I am noticing that when we are indexing lots of data with 16 concurrent threads, Heap grows continuously. It remains high and ultimately most of the stuff ends up being moved to Old Gen. Eventually, Old Gen also fills up and we start getting into excessive GC problems. Hi: I don't claim to know anything about how Tomcat manages threads, but really you shouldn't have all these objects. In general snowball stemmers should be reused per-thread-per-field. But if you have a lot of fields*threads, especially if there really is high thread churn on Tomcat, then this could be bad with snowball: see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841 I think it would be useful to see if you can tune Tomcat's threadpool as he describes. Separately: Snowball stemmers are currently really RAM-expensive for stupid reasons. Each one creates a ton of Among objects; e.g. an EnglishStemmer today is about 8KB. I'll regenerate these and open a JIRA issue, as the snowball code generator in their svn was improved recently and each one now takes about 64 bytes instead (the Amongs are static and reused). Still, this won't really solve your problem, because the analysis chain could have other heavy parts in initialization, but it seems good to fix. As a workaround until then you can also just use the good old PorterStemmer (PorterStemFilterFactory in Solr). It's not exactly the same as using Snowball(English) but it's pretty close and also much faster. -- lucidimagination.com
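The workaround Robert mentions (PorterStemFilterFactory in place of the Snowball English stemmer) is a one-line analyzer change in the field type. A hedged before/after sketch for schema.xml; adapt it to your own analysis chain:

```
<!-- before: Snowball-generated English stemmer (heavier per-instance) -->
<filter class="solr.SnowballPorterFilterFactory" language="English"/>

<!-- after: the classic Porter stemmer, close in behavior and cheaper -->
<filter class="solr.PorterStemFilterFactory"/>
```

Since the two stemmers are close but not identical, a full reindex is advisable after switching so that indexed and queried terms stem the same way.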
RE: Cloud and cores
It may have something to do with SOLR-3425, but I'm not sure it fits. I made some more tests.

Case 1: with SolrCloud, I can create a new core on one of the servers via the admin GUI or a CREATE directive in the URL. The data folder is created (but no conf folder; I believe the zk config is used). However, ./solr/solr.xml is not updated with the new core's parameters. If I restart the server, the core is lost (but the data folder is kept).

Case 2: on a single Solr server, creation of a new core fails in the GUI with this error:

  GRAVE: org.apache.solr.common.SolrException: Error executing default implementation of CREATE
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:396)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:141)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1337)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:484)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:233)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1065)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:413)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:999)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
    at org.eclipse.jetty.server.Server.handle(Server.java:351)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:454)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:47)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:890)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:944)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:634)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:230)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:66)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:254)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:599)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:534)
    at java.lang.Thread.run(Unknown Source)
  Caused by: java.lang.RuntimeException: Can't find resource 'solrconfig.xml' in classpath or 'solr\core2\conf/', cwd=F:\solr-4.0\Test
    at org.apache.solr.core.SolrResourceLoader.openResource(SolrResourceLoader.java:294)
    at org.apache.solr.core.SolrResourceLoader.openConfig(SolrResourceLoader.java:260)
    at org.apache.solr.core.Config.init(Config.java:111)
    at org.apache.solr.core.Config.init(Config.java:78)
    at org.apache.solr.core.SolrConfig.init(SolrConfig.java:117)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:742)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleCreateAction(CoreAdminHandler.java:391)
    ... 29 more

Using a CREATE URL and giving relative paths for solrconfig.xml and schema.xml fails later on stopwords.txt. Again solr/solr.xml is not updated, but the runtime exception could explain that in this case. Pierre

-----Original Message----- From: Pierre GOSSÉ [mailto:pierre.go...@arisem.com] Sent: Wednesday, August 1, 2012 16:22 To: solr-user@lucene.apache.org Subject: Cloud and cores
StandardTokenizerFactory is behaving differently in Solr 3.6?
I have a field type like the following:

  <fieldType name="text_general_name" class="solr.TextField" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

This type is behaving differently in Solr 3.3 and 3.6. In 3.3, the following doesn't return any records, because there is no author called 'Gerri Killis' (there is an author called 'Gerri Jonathan'):

  /select/?q=author:Gerri\ Killis

In 3.6, the same query returns records, because there is an author called 'Gerri Jonathan'. So is something wrong in 3.6? I didn't expect any records here, because there is no author called 'Gerri Killis'. Your help is appreciated. Thanks Srini -- View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Map Complex Datastructure with Solr
Thanks for the answer. I have to explain where the problem is... A shop solution may have products and articles; the product is the parent of all articles. In JSON:

  {
    "product_name": "tank top",
    "article_list": [
      { "color": "red",  "price": 10.99, "size": "XL", "inStore": true },
      { "color": "blue", "price": 15.99, "size": "XL", "inStore": false }
    ]
  }

The problem is not the search (I think, because you can use copyField), but the search results... I have read about the possibility to create my own FieldTypes, but I don't know if this is the answer to my issues...

2012/8/1 Jack Krupansky j...@basetechnology.com:
Re: Map Complex Datastructure with Solr
Sorry, that did not explain the problem, just more info about the data layout. What are you actually trying to get out of SOLR? Are you saying you want the parent's details repeated in every entry? Are you saying that you want to be able to find entries and, from there, find the specific parent? Whatever you do, SOLR will return you a list of flat entries plus some statistics on occurrences and facets. Given that, what would you like to see? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Aug 1, 2012 at 12:33 PM, Thomas Gravel thomas.gra...@gmail.com wrote:
Re: Map Complex Datastructure with Solr
Hm, OK, I think I have to write down my example data, the queries I want to make, and the response I expect...

Data:

  {
    "product_id": "xyz76",
    "product_name": "tank top",
    "brand": "adidas",
    "description": "this is the long description of the product",
    "short_description": "this is the short description of the product",
    "product_image": "/images/tanktop.jpg",
    "product_image": "/images/tanktop2.jpg",
    "article_list": [
      { "article_number": "TR47", "color": "red",  "price": 10.99, "size": "XL", "unit": "1 piece", "inStore": true },
      { "article_number": "TR48", "color": "blue", "price": 15.99, "size": "XL", "unit": "1 piece", "inStore": false }
    ]
  }

I want to search on:
- article_number (i.e. with inStore = true)
- color
- description
- short_description
- product_name

Facets:
- brand
- color
- size
- price

Example query response:

  {
    "responseHeader": {
      "status": 0,
      "QTime": 2,
      "params": {
        "indent": "on",
        "start": 0,
        "q": "IBProductName:Durch*",
        "wt": "json",
        "version": "2.2",
        "rows": 10 } },
    "response": { "numFound": 1, "start": 0, "docs": [
      {
        "product_id": "xyz76",
        "product_name": "tank top",
        "brand": "adidas",
        "description": "this is the long description of the product",
        "short_description": "this is the short description of the product",
        "product_image": "/images/tanktop.jpg",
        "product_image": "/images/tanktop2.jpg",
        "article_list": [
          { "color": "red",  "price": 10.99, "size": "XL", "unit": "1 piece", "inStore": true },
          { "color": "blue", "price": 15.99, "size": "XL", "unit": "1 piece", "inStore": false }
        ]
      }
    ] } }

2012/8/1 Alexandre Rafalovitch arafa...@gmail.com:
Exact match on few fields, fuzzy on others
Hi Folks, I am using Solr 3.4 and my document schema has the attributes title, transcript, and author_name. Presently, I am using DisMax to search a user query across transcript. I would also like to do an exact search on author_name, so that for a query "Albert Einstein" I would get all the documents which contain "Albert" or "Einstein" in transcript, and also those documents whose author_name is exactly 'Albert Einstein'. Can we do this with the dismax query parser? The schemas for both fields are below:

  <fieldType name="text_commongrams" class="solr.TextField">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.CommonGramsFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    </analyzer>
  </fieldType>

  <fieldType name="text_standard" class="solr.TextField">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.TrimFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    </analyzer>
  </fieldType>

  <field name="title" type="text_commongrams" indexed="true" stored="true" multiValued="false"/>
  <field name="author_name" type="text_standard" indexed="true" stored="false"/>

-- *Pranav Prakash* temet nosce
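One common way to get this with dismax is to keep the fuzzy match in qf and add an untokenized copy of the author for exact matching, then boost or filter on it. A hedged sketch (the author_exact field, its string type, and the boost value are illustrative assumptions, not from Pranav's schema):

```
<!-- schema.xml (illustrative): untokenized copy of the author -->
<field name="author_exact" type="string" indexed="true" stored="false"/>
<copyField source="author_name" dest="author_exact"/>

<!-- query (illustrative): dismax over transcript, with the whole phrase
     matched exactly against the untokenized copy via bq -->
<!-- defType=dismax&q=Albert Einstein&qf=transcript&bq=author_exact:"Albert Einstein"^10 -->
```

With bq the exact author matches are boosted alongside the term matches; if the exact matches must be included even when neither term occurs in transcript, an OR of two queries (or an fq on author_exact in a second request) would be needed instead.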
4.0 Strange Commit/Replication Issue
Hello all, I am running 4.0 alpha and have encountered something I am unable to explain. I am indexing content to a master server, and the data is replicating to a slave. The odd part is that when searching through the UI, no documents show up on the master with a standard *:* query. All cache types are set to zero. I know indexing is working because I am watching the logs and I can see documents getting added, not to mention the data is written to the filesystem. I have autocommit set to 60000 ms (1 minute), so it isn't a commit issue. The very strange part is that the slave is correctly replicating the data, and it is searchable in the UI on the slave (but not the master). I don't understand how/why the data is visible on the slave and not visible on the master. Does anyone have any thoughts on this or seen it before? Thanks in advance! Briggs
Solr spellcheck for words with quotes
Hi, I use Solr as the search engine for our application. We have a title "Pandora's Star". When I send a query such as

  http://localhost:8983/solr/select?q=pandora's star&spellcheck=true&spellcheck.collate=true

I get a response as below:

  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="pandora">
        <int name="numFound">1</int>
        <int name="startOffset">10</int>
        <int name="endOffset">17</int>
        <arr name="suggestion">
          <str>pandora's</str>
        </arr>
      </lst>
      <str name="collation">text_engb:pandora's's star</str>
    </lst>
  </lst>

The word goes in as "pandora" and not as "pandora's", and an additional 's is appended to the collation result. Below is my configuration for spellcheck:

  <fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_selma.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_selma.txt"/>
      <filter class="solr.StandardFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldType>

Please suggest. Thanks, Shri
Re: 4.0 Strange Commit/Replication Issue
Could your autocommit in the master be using openSearcher=false? If you go to the Master admin, do you see that the searcher has all the segments that you see in the filesystem? On Wed, Aug 1, 2012 at 4:24 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Hello all, I am running 4.0 alpha and have encountered something I am unable to explain. I am indexing content to a master server, and the data is replicating to a slave. The odd part is that when searching through the UI, no documents show up on master with a standard *:* query. All cache types are set to zero. I know indexing is working because I am watching the logs and I can see documents getting added, not to mention the data is written to the filesystem. I have autocommit set to 6 (1 minute) so it isn't a commit issue. The very strange part is that the slave is correctly replicating the data, and it is searchable in the UI on the slave (but not master). I don't understand how/why the data is visible on the slave and not visible on the master. Does anyone have any thoughts on this or seen it before? Thanks in advance! Briggs
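For readers hitting the same thing, a solrconfig.xml sketch of the behavior Tomás describes (the 60-second interval mirrors the poster's setup; the rest is illustrative):

```xml
<!-- Hard commit every 60s: segments are durably written to disk
     (which is why replication and the filesystem show the data),
     but openSearcher=false means no new searcher is opened on the
     master, so a *:* query there finds nothing until an explicit
     commit (or a soft commit) opens one. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
</updateHandler>
```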
Re: StandardTokenizerFactory is behaving differently in Solr 3.6?
I noticed that the escape character in the query is being ignored in Solr 3.6. For the following query, 3.3 returns results where 'Featuring Chimp' is matched, but 3.6 returns results where Featuring or Chimp or Featuring Chimp is matched. Any idea what difference between my 3.3 and 3.6 environments causes these inconsistent results? /select/?q=title:Featuring\ Chimp -- View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998665.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: 4.0 Strange Commit/Replication Issue
That is the problem. I wasn't aware of that new feature in 4.0. Thanks for the quick response Tomás. -Briggs On Wed, Aug 1, 2012 at 3:08 PM, Tomás Fernández Löbbe tomasflo...@gmail.com wrote: Could your autocommit in the master be using openSearcher=false? If you go to the Master admin, do you see that the searcher has all the segments that you see in the filesystem? On Wed, Aug 1, 2012 at 4:24 PM, Briggs Thompson w.briggs.thomp...@gmail.com wrote: Hello all, I am running 4.0 alpha and have encountered something I am unable to explain. I am indexing content to a master server, and the data is replicating to a slave. The odd part is that when searching through the UI, no documents show up on master with a standard *:* query. All cache types are set to zero. I know indexing is working because I am watching the logs and I can see documents getting added, not to mention the data is written to the filesystem. I have autocommit set to 6 (1 minute) so it isn't a commit issue. The very strange part is that the slave is correctly replicating the data, and it is searchable in the UI on the slave (but not master). I don't understand how/why the data is visible on the slave and not visible on the master. Does anyone have any thoughts on this or seen it before? Thanks in advance! Briggs
Re: StandardTokenizerFactory is behaving differently in Solr 3.6?
Which query parser do you have set in your request handler? There was a problem with edismax in 3.6 with the WordDelimiterFilter, that sounds exactly like your symptom. The workaround is to enclose the term in quotes (to make it a phrase), otherwise the terms would be ORed rather than ANDed. -- Jack Krupansky -Original Message- From: raonalluri Sent: Wednesday, August 01, 2012 3:25 PM To: solr-user@lucene.apache.org Subject: Re: StandardTokenizerFactory is behaving differently in Solr 3.6? I noticed, escape character which is in the query, is getting ignored in solr 3.6. For the following 3.3 gives results where 'Featuring Chimp' is matched. But in 3.6, it gives results where Featuring or Chimp or Featuring Chimp is matched. Any idea what is the difference between my 3.3 and 3.6 environments for this inconsistent results? /select/?q=title:Featuring\ Chimp -- View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998665.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: StandardTokenizerFactory is behaving differently in Solr 3.6?
Jack, thanks a lot for your reply. We are using the LuceneQParser query parser. I agree that if I turn the string into a phrase by adding double quotes, I am good, but I am checking whether there is a fix for this that does not require changing the query: as we are in a production environment, we would need to change the queries in many places. Can we avoid this issue by changing the query parser? regards Srini -- View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998677.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: StandardTokenizerFactory is behaving differently in Solr 3.6?
This may simply be a matter of changing the default query operator from OR to AND. Try adding q.op=AND to your request. -- Jack Krupansky -Original Message- From: raonalluri Sent: Wednesday, August 01, 2012 4:26 PM To: solr-user@lucene.apache.org Subject: Re: StandardTokenizerFactory is behaving differently in Solr 3.6? Jack, thanks a lot for your reply. We are using LuceneQParser query parser. I agree, if I phrase the string by adding double quotes, I am good. But I am checking if there is any fix for this without changing the query. As we are in production environment, we need to change the quries in different places. Can we escape from this issue by change the query parser? regards Srini -- View this message in context: http://lucene.472066.n3.nabble.com/StandardTokenizerFactory-is-behaving-differently-in-Solr-3-6-tp3998623p3998677.html Sent from the Solr - User mailing list archive at Nabble.com.
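For reference, a sketch of the query forms discussed in this thread (handler path and field name taken from the original post; the 3.6 behavior is as described above):

```
/select/?q=title:Featuring\ Chimp                    escaped space: terms end up ORed in 3.6
/select/?q=title:"Featuring Chimp"                   phrase query: both terms, adjacent
/select/?q=title:Featuring\ Chimp&q.op=AND           default operator AND: both terms required
```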
Re: Exact match on few fields, fuzzy on others
Try edismax with the PF2 option, which will automatically boost documents that contain occurrences of adjacent terms as you have suggested. See: http://wiki.apache.org/solr/ExtendedDisMax -- Jack Krupansky -Original Message- From: Pranav Prakash Sent: Wednesday, August 01, 2012 1:21 PM To: solr-user@lucene.apache.org Subject: Exact match on few fields, fuzzy on others Hi Folks, I am using Solr 3.4 and my document schema has the attributes title, transcript, and author_name. Presently, I am using DisMax to search a user query across transcript. I would also like to do an exact search on author_name, so that for a query Albert Einstein I would get all the documents which contain Albert or Einstein in transcript, and also those documents which have author_name exactly as 'Albert Einstein'. Can we do this with the dismax query parser? The schemas for both fields are below:

<fieldType name="text_commongrams" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
  </analyzer>
</fieldType>

<fieldType name="text_standard" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords_en.txt" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" preserveOriginal="1"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

<field name="title" type="text_commongrams" indexed="true" stored="true" multiValued="false"/>
<field name="author_name" type="text_standard" indexed="true" stored="false"/>

-- *Pranav Prakash* temet nosce
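To make Jack's pf2 suggestion concrete, a hedged solrconfig.xml sketch (the handler name and boost values are illustrative; the field names come from the schema in the question):

```xml
<!-- edismax: qf matches individual query terms loosely in the
     transcript, while pf boosts documents whose author_name contains
     the query terms as an adjacent phrase (e.g. "Albert Einstein");
     pf2 additionally boosts every adjacent pair of query terms. -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">transcript</str>
    <str name="pf">author_name^10</str>
    <str name="pf2">author_name^5 transcript^2</str>
  </lst>
</requestHandler>
```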
Re: Memory leak?? with CloseableThreadLocal with use of Snowball Filter
Thanks Robert for these inputs. Since we do not really need the Snowball analyzer for this field, we will not use it for now. If this still does not address our issue, we will tweak the thread pool as per eks dev's suggestion - I am a bit hesitant to make this change yet, as reducing the thread pool could adversely impact our throughput. If the Snowball filter is being optimized for the Solr 4 beta, that would be great for us. If you have already filed a JIRA for this, please let me know; I would like to follow it. Thanks again Saroj On Wed, Aug 1, 2012 at 8:37 AM, Robert Muir rcm...@gmail.com wrote: On Tue, Jul 31, 2012 at 2:34 PM, roz dev rozde...@gmail.com wrote: Hi All I am using Solr 4 from trunk with Tomcat 6. I am noticing that when we are indexing lots of data with 16 concurrent threads, the heap grows continuously. It remains high and ultimately most of the objects end up being moved to the old generation. Eventually, the old generation also fills up and we start getting into an excessive GC problem. Hi: I don't claim to know anything about how Tomcat manages threads, but really you shouldn't have all these objects. In general Snowball stemmers should be reused per-thread-per-field. But if you have a lot of fields*threads, especially if there really is high thread churn on Tomcat, then this could be bad with Snowball: see eks dev's comment on https://issues.apache.org/jira/browse/LUCENE-3841 I think it would be useful to see if you can tune Tomcat's thread pool as he describes. Separately: Snowball stemmers are currently really RAM-expensive for stupid reasons. Each one creates a ton of Among objects; e.g. an EnglishStemmer today is about 8KB. I'll regenerate these and open a JIRA issue: the snowball code generator in their svn was improved recently and each one now takes about 64 bytes instead (the Amongs are static and reused). Still, this won't really solve your problem, because the analysis chain could have other heavy parts in initialization, but it seems good to fix. As a workaround until then you can also just use the good old PorterStemmer (PorterStemFilterFactory in Solr). It's not exactly the same as using Snowball(English) but it's pretty close and also much faster. -- lucidimagination.com
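A sketch of the workaround Robert describes, swapping SnowballPorterFilterFactory for PorterStemFilterFactory (the field type name and surrounding filters here are illustrative, not taken from the poster's schema):

```xml
<fieldType name="text_en_porter" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- PorterStemFilterFactory carries far less per-instance state
         than SnowballPorterFilterFactory (no large Among object
         graph), so per-thread analyzer churn is much cheaper. -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```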