Multiple schemas in the same SolrCloud ?
Hi all, I want to use multiple schemas in the same SolrCloud. Is it allowed? If so, how? These schemas may have no relation to each other. Thank you. Dai. -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279.html Sent from the Solr - User mailing list archive at Nabble.com.
synonyms and term position
Hi: I'm involved in a process of upgrading Solr from 1.4 to 4.4 and I'm having a problem using SynonymFilterFactory in the filter chain SynonymFilterFactory, StopFilterFactory. I have configured synonyms.txt to expand the word AIO to: all-in-one. With Solr 1.4 I get the following term positions when analyzing the string "one aio two".

Solr 1.4 after synonym filter:

term position | 1   | 2   | 3  | 4   | 5
term text     | one | all | in | one | two

Solr 1.4 after stop filter (the term "in" is deleted; the terms "all" and "one" remain consecutive):

term position | 1   | 2   | 4   | 5
term text     | one | all | one | two

But with Solr 4.4 I get:

Solr 4.4 after synonym filter:

term position | 1   | 2   | 3  | 4   | 3
term text     | one | all | in | one | two

Solr 4.4 after stop filter ("in" is deleted and the term "two" is now close to "all"):

term position | 1   | 2   | 4   | 3
term text     | one | all | one | two

The problem is that the second word, "two", is at position 3 in Solr 4.4, so when I search for "aio" I get results in Solr 1.4 but find nothing in Solr 4.4. Is there any option to configure Solr 4 to imitate the Solr 1.4 behavior? Regards. Please find the field type configuration below.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
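For what it's worth, the position bookkeeping described above can be reproduced with a toy simulation (this is not Solr code, just an illustration of the two position-assignment strategies: in the 1.4-style behavior an expanded synonym pushes every later token to the right, while in the 4.4-style behavior later tokens keep positions relative to the original token stream):

```python
def positions_solr14_style(tokens, synonyms):
    """Old-style expansion: each injected synonym word consumes a fresh
    position, pushing all subsequent tokens to the right."""
    out, pos = [], 0
    for tok in tokens:
        for word in synonyms.get(tok, [tok]):
            pos += 1
            out.append((word, pos))
    return out

def positions_solr44_style(tokens, synonyms):
    """New-style expansion: injected words get increasing positions, but
    the next original token stays anchored to the original stream, so it
    can end up at a lower position than the expansion's tail."""
    out, base = [], 0
    for tok in tokens:
        base += 1
        for offset, word in enumerate(synonyms.get(tok, [tok])):
            out.append((word, base + offset))
    return out

def drop_stopwords(stream, stopwords):
    """StopFilter with enablePositionIncrements: remove the token but
    leave the surviving tokens' positions untouched (gaps remain)."""
    return [(word, pos) for word, pos in stream if word not in stopwords]

synonyms = {"aio": ["all", "in", "one"]}
tokens = ["one", "aio", "two"]

old = drop_stopwords(positions_solr14_style(tokens, synonyms), {"in"})
new = drop_stopwords(positions_solr44_style(tokens, synonyms), {"in"})
print(old)  # [('one', 1), ('all', 2), ('one', 4), ('two', 5)]
print(new)  # [('one', 1), ('all', 2), ('one', 4), ('two', 3)]
```

The simulation reproduces the tables above: "two" lands at position 5 under the old scheme but at position 3 under the new one, which is why the expanded match for "aio" succeeds in 1.4 and fails in 4.4.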
Re: Multiple schemas in the same SolrCloud ?
You can simply have multiple collections, each with its own independent schema; they can run on the same instance/JVM if you want. On Wed, Oct 9, 2013 at 12:36 PM, xinwu xinwu0...@gmail.com wrote: > Hi all, I want to use multiple schemas in the same SolrCloud, is it allowed? ... -- Anshum Gupta http://www.anshumgupta.net
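Concretely, a sketch for Solr 4.x (the ZooKeeper host, paths, and config/collection names here are placeholders): upload one config set per collection to ZooKeeper with zkcli, then create each collection against its own config set via the Collections API.

```shell
# Upload two independent config sets (each with its own schema.xml) to ZooKeeper.
./cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig \
    -confdir /opt/configs/collA/conf -confname confA
./cloud-scripts/zkcli.sh -zkhost zk1:2181 -cmd upconfig \
    -confdir /opt/configs/collB/conf -confname confB

# Create one collection per config set; both can live on the same nodes/JVM.
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collA&numShards=1&collection.configName=confA'
curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=collB&numShards=1&collection.configName=confB'
```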
Re: dynamic field question
Hi David, A separate Solr document for each section is a good option if you also need to handle phrases, case, special characters, etc. within the title field. How do you map them to dynamic fields? E.g.: Appendix for cities, APPENDIX 1: Cities Regards, Aloke On Wed, Oct 9, 2013 at 9:45 AM, Jack Krupansky j...@basetechnology.com wrote: I'd suggest that each of your source document sections be a distinct Solr document. All of the sections could have a source document ID field to tie them together. Dynamic fields work best when used in moderation; your use case seems like an excessive use of dynamic fields. -- Jack Krupansky -----Original Message----- From: Twomey, David Sent: Tuesday, October 08, 2013 6:59 PM To: solr-user@lucene.apache.org Subject: dynamic field question I am having trouble trying to return one particular dynamic field instead of all dynamic fields. Imagine I have a document with an unknown number of sections. Each section can have a 'title' and a 'body'. I have each section title and body as dynamic fields, such as section_title_* and section_body_*. Imagine that some documents contain a section that has title=Appendix. I want a query that will find all docs with that section and return just the Appendix section; I don't know how to return just that one section. I can copyField my dynamic field section_title_* into a static field called section_titles and query that for docs that contain Appendix, but I don't know how to return only that one dynamic field: ?q=section_titles:Appendix&fl=section_body_* Any ideas? I can't seem to put a conditional in the fl parameter.
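To make the section-per-document suggestion concrete, a hypothetical sketch (the field names `source_doc_id`, `section_title`, and `section_body` are illustrative, not from the thread):

```xml
<!-- One Solr document per section; a shared source_doc_id ties the
     sections of the same source document together. -->
<add>
  <doc>
    <field name="id">doc42_section7</field>
    <field name="source_doc_id">doc42</field>
    <field name="section_title">APPENDIX 1: Cities</field>
    <field name="section_body">...</field>
  </doc>
</add>
```

With that layout, a query like `q=section_title:Appendix&fl=source_doc_id,section_title,section_body` returns only the matching sections, and the `source_doc_id` values let you fetch or group the rest of each document when needed.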
Re: Multiple schemas in the same SolrCloud ?
I remember that I had to put -Dbootstrap_confdir=/opt/Solr_home/collection1/conf -Dcollection.configName=solrConfig in catalina.sh. Does that mean SolrCloud must have one, and only one, schema? -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-schemas-in-the-same-SolrCloud-tp4094279p4094281.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrJ best pratices
I suggest you look here: http://wiki.apache.org/solr/Solrj?action=fullsearchcontext=180value=cloudsolrservertitlesearch=Titles#Using_with_SolrCloud 2013/10/9 Shawn Heisey s...@elyograg.org On 10/7/2013 3:08 PM, Mark wrote: Some specific questions: - When working with HttpSolrServer, should we keep instances around forever, or should we create a singleton that can/should be used over and over? - Is there a way to change the collection after creating the server, or do we need to create a new server for each collection? If at all possible, you should create your server object once and use it for the life of your application. SolrJ is threadsafe. If there is any part of it that's not, the javadocs should say so - the SolrServer implementations definitely are. By using the word collection you are implying that you are using SolrCloud ... but earlier you said HttpSolrServer, which implies that you are NOT using SolrCloud. With HttpSolrServer, your base URL includes the core or collection name - http://server:port/solr/corename for example. Generally you will need one object for each core/collection, and another object for server-level things like CoreAdmin. With SolrCloud, you should be using CloudSolrServer instead, another implementation of SolrServer that stays aware of the SolrCloud clusterstate. With that object, you can use setDefaultCollection, and you can also add a collection parameter to each SolrQuery or other request object. Thanks, Shawn
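A minimal Java sketch of those two patterns (Solr 4.x-era SolrJ class names; the base URL, ZooKeeper hosts, and collection names are placeholders, and it needs the SolrJ jar on the classpath, so take it as a non-runnable sketch rather than a drop-in):

```java
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SolrClients {
    // Non-cloud: the base URL includes the core name; create once,
    // reuse for the life of the application (thread-safe).
    private static final HttpSolrServer CORE =
        new HttpSolrServer("http://localhost:8983/solr/corename");

    // SolrCloud: talks to ZooKeeper and stays aware of cluster state.
    private static final CloudSolrServer CLOUD = createCloud();

    private static CloudSolrServer createCloud() {
        try {
            CloudSolrServer s = new CloudSolrServer("zk1:2181,zk2:2181/solr");
            s.setDefaultCollection("collection1");
            return s;
        } catch (Exception e) {
            // Some 4.x constructors declare checked exceptions.
            throw new RuntimeException(e);
        }
    }
}
```

Either object is created once and shared; constructing a new server per request wastes connections and loses the cluster-state caching that CloudSolrServer provides.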
Re: SolrCloud High Availability during indexing operation
Hi Saurabh, Your link does not work (it is broken). 2013/10/9 Saurabh Saxena ssax...@gopivotal.com: Pastebin link http://pastebin.com/cnkXhz7A I am doing a bulk request: I am uploading 100 files, each file having 100 docs. -Saurabh On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller markrmil...@gmail.com wrote: The attachment did not go through - try using pastebin.com or something. Are you adding docs with curl one at a time or in bulk per request? - Mark On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com wrote: I repeated the experiment on a local system: a single-shard SolrCloud with a replica. I tried to index 10K docs; all of the indexing operations were directed to the replica Solr node. While the documents were being indexed on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900 got indexed. If I repeat the experiment without shutting down the leader instance, all 10K docs get indexed. I am using curl to upload the docs; there were no curl errors while uploading documents. The following error was in the replica log file: ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1 Attached replica log file. On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Sorry for the late reply. All the documents have unique ids. If I repeat the experiment, the number of docs indexed changes (I guess it depends on when I shut down a particular shard). When I run the experiment without shutting down leader shards, all 80K docs get indexed (which I think proves that all documents are valid). I need to dig through the logs to find the error messages. Also, I am not tracking the curl return codes; I will run again and reply. Regards, Saurabh On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson erickerick...@gmail.com wrote: And do any of the documents have the same uniqueKey, which is usually called id?
Subsequent adds of docs with the same uniqueKey replace the earlier one. It's not definitive because it changes as merges happen (old copies of docs that have been deleted or updated get purged), but what does your admin page show for maxDoc? If it's more than numDocs, then you have duplicate uniqueKeys. NOTE: if you optimize (which you usually shouldn't), maxDoc and numDocs will be the same, so if you test this, don't optimize. Best, Erick On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood wun...@wunderwood.org wrote: Did all of the curl update commands return success? Any errors in the logs? wunder On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote: Is it possible that some of those 80K docs were simply not valid? E.g. had a wrong field, had a missing required field, anything like that? What happens if you clear this collection and re-run the same indexing process with everything else the same? Still some docs missing? The same number? And what if you take one document that you know is valid and index it 80K times, with a different ID each time, of course? Do you see 80K docs in the end? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena ssax...@gopivotal.com wrote: The doc count did not change after I restarted the nodes. I am doing a single commit after all 80K docs. Using Solr 4.4. Regards, Saurabh On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Interesting. Did the doc count change after you started the nodes again? Can you tell us about commits? Which version? 4.5 will be out soon. Otis Solr ElasticSearch Support http://sematext.com/ On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Hello, I am testing the High Availability feature of SolrCloud.
I am using the following setup: 8 Linux hosts, 8 shards, 1 leader and 1 replica per host, using curl for update operations. I tried to index 80K documents on the replicas (10K per replica, in parallel). During the indexing process, I stopped 4 leader nodes. Once indexing was done, only 79808 of the 80K docs were indexed. Is this expected behaviour? In my opinion, the replica should take over indexing if the leader is down. If this is expected behaviour, are there any steps that can be taken on the client side to avoid such a situation? Regards, Saurabh Saxena -- Walter Underwood wun...@wunderwood.org
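On the client side, the main defense is to treat every update request as fallible: check the status of each batch and re-send batches that fail while leadership changes hands. A minimal, Solr-agnostic retry helper (an illustrative sketch; `send_batch` stands in for whatever upload call is actually used, and is not from the thread):

```python
import time

def with_retry(send_batch, batch, max_attempts=5, backoff_s=1.0):
    """Call send_batch(batch); on failure, back off linearly and retry.

    Re-raises the last error once max_attempts is exhausted, so the
    caller knows the batch was never acknowledged and can persist it
    for later replay.
    """
    last_err = None
    for attempt in range(1, max_attempts + 1):
        try:
            return send_batch(batch)
        except Exception as err:  # e.g. "No registered leader was found"
            last_err = err
            time.sleep(backoff_s * attempt)
    raise last_err
```

The key design point is that a failed batch is never silently dropped: it is either eventually acknowledged or surfaced to the caller, which is exactly the accounting that plain fire-and-forget curl uploads lack.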
Re: synonyms and term position
Could you send a screenshot of the admin Analysis page when analyzing those words? 2013/10/9 Alvaro Cabrerizo topor...@gmail.com: > Hi: I'm involved in a process of upgrading Solr from 1.4 to 4.4 and I'm having a problem using SynonymFilterFactory ...
Re: Multiple schemas in the same SolrCloud ?
You can find more information here: https://cwiki.apache.org/confluence/display/solr/Using+ZooKeeper+to+Manage+Configuration+Files 2013/10/9 xinwu xinwu0...@gmail.com: > I remember I must put -Dbootstrap_confdir=/opt/Solr_home/collection1/conf -Dcollection.configName=solrConfig in catalina.sh. Does that mean SolrCloud must have one, and only one, schema?
Re: synonyms and term position
Sure. Find attached the screenshots with almost all of the analysis (don't worry about the lowercase and the Porter stemmer). Regards. On Wed, Oct 9, 2013 at 10:17 AM, Furkan KAMACI furkankam...@gmail.com wrote: > Could you send a screenshot of the admin Analysis page when analyzing those words? ...
Re: no such field error:smaller big block size details while indexing doc files
I will try using SolrJ, thanks. But when I tried to index a .docx file I got a different error:

SEVERE: null:java.lang.RuntimeException: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.solr.servlet.SolrDispatchFilter.sendError(SolrDispatchFilter.java:651)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:364)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:141)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:928)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:539)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:298)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.VerifyError: (class: org/apache/poi/extractor/ExtractorFactory, method: createExtractor signature: (Lorg/apache/poi/poifs/filesystem/DirectoryNode;)Lorg/apache/poi/POITextExtractor;) Wrong return type in function
    at org.apache.tika.parser.microsoft.ooxml.OOXMLExtractorFactory.parse(OOXMLExtractorFactory.java:59)
    at org.apache.tika.parser.microsoft.ooxml.OOXMLParser.parse(OOXMLParser.java:82)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
    at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
    at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:219)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1797)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:637)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:343)
    ... 16 more

I read this solution (http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika), which says that removing certain jars solves the error, but none of the jars it mentions are in my classpath. Could the jars still be causing the issue? Thank You. On Wednesday, October 9, 2013 12:54 PM, sweety shinde sweetyshind...@yahoo.com wrote: > I will try using solrJ. ...
Re: no such field error:smaller big block size details while indexing doc files
I will try using solrJ. Now I tried indexing .docx files and I got a different error (the same java.lang.VerifyError from org/apache/poi/extractor/ExtractorFactory, "Wrong return type in function", quoted in full in the previous message). But could the jars be causing these errors? I read one solution which said that removing a few jars from the classpath may solve the error, but those jars are not present in my classpath (the link to the solution: http://stackoverflow.com/questions/14696371/how-to-extract-the-text-of-a-ppt-file-with-tika). Thank You. On Wednesday, October 9, 2013 6:05 AM, Erick Erickson [via Lucene] ml-node+s472066n4094231...@n3.nabble.com wrote: Hmmm, that is odd, the glob dynamicField should pick this up. Not quite sure what's going on.
You can parse the file via Tika yourself and look at what's in there; it's a relatively simple SolrJ program. Here's a sample: http://searchhub.org/2012/02/14/indexing-with-solrj/ Best, Erick On Tue, Oct 8, 2013 at 4:15 PM, sweety [hidden email] wrote: This is my new schema.xml:

<schema name="documents">
  <fields>
    <field name="id" type="string" indexed="true" stored="true" required="true" multiValued="false"/>
    <field name="author" type="string" indexed="true" stored="true" multiValued="true"/>
    <field name="comments" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="keywords" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="contents" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="title" type="text" indexed="true" stored="true" multiValued="false"/>
    <field name="revision_number" type="string" indexed="true" stored="true" multiValued="false"/>
    <field name="_version_" type="long" indexed="true" stored="true" multiValued="false"/>
    <dynamicField name="ignored_*" type="string" indexed="false" stored="true" multiValued="true"/>
    <dynamicField name="*" type="ignored" multiValued="true"/>
    <copyfield source="id" dest="text"/>
    <copyfield source="author" dest="text"/>
  </fields>
  <types>
    <fieldtype name="ignored" stored="false" indexed="false" class="solr.StrField"/>
    <fieldType name="integer" class="solr.IntField"/>
    <fieldType name="long" class="solr.LongField"/>
    <fieldType name="string" class="solr.StrField"/>
    <fieldType name="text"
Re: synonyms and term position
Does "two" have "in" and "one" as synonyms? 2013/10/9 Alvaro Cabrerizo topor...@gmail.com: > Sure. Find attached the screenshots with almost all the analysis ...
Re: synonyms and term position
No, it has no synonyms. On Wed, Oct 9, 2013 at 10:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: > Does two has a synonym of in and one? ...
Re: synonyms and term position
The synonyms.txt file defines the following mappings:

AIO = All in one
aio = all-in-one

Regards.

On Wed, Oct 9, 2013 at 11:05 AM, Alvaro Cabrerizo topor...@gmail.com wrote: No, it has no synonyms. On Wed, Oct 9, 2013 at 10:48 AM, Furkan KAMACI furkankam...@gmail.com wrote: Does "two" have a synonym of "in" and "one"? Please, find attached the fieldType configuration.
<fieldType name="text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>
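The practical effect of the changed positions can be seen with a quick sketch. The positions below are copied from the analysis tables in the original message (one occurrence per term kept for brevity); the comparison itself is mine, not from the thread:

```python
# Term -> position after SynonymFilter + StopFilter, per the post.
# ("one" occurs twice; its second occurrence is at position 4 in both versions.)
solr14 = {"one": 1, "all": 2, "two": 5}
solr44 = {"one": 1, "all": 2, "two": 3}

# Distance between "all" (start of the expanded synonym) and "two",
# the word that follows the synonym in the source text:
gap14 = solr14["two"] - solr14["all"]  # 3 -> "two" stays after the expansion
gap44 = solr44["two"] - solr44["all"]  # 1 -> "two" now sits inside the expansion
```

In 4.4 the trailing word "two" is placed inside the span that the expanded synonym occupies, which is why phrase and proximity matching against the expansion behaves differently than in 1.4.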
Collection API wrong configuration
I'm experimenting with SolrCloud using Solr 4.5.0 and the Collections API. What I did was:

1. upload the configuration to ZooKeeper:
zkcli.sh -cmd upconfig -zkhost 127.0.0.1:8993 -d solr/my_custom_collection/conf/ -n my_custom_collection

2. create a collection using the API:
/admin/collections?action=CREATE&name=my_custom_collection&numShards=2&replicationFactor=2&maxShardsPerNode=2&collection.configName=my_custom_config

The outcome of these actions seems to be that the collection's cores don't use my_custom_collection but the example configuration. Any idea why this is happening? -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-tp4094319.html Sent from the Solr - User mailing list archive at Nabble.com.
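Because the archive stripped the `&` separators from the URL, here is the same CREATE request reassembled as a sketch with Python's urlencode (parameter values copied verbatim from the message):

```python
from urllib.parse import urlencode

# Parameters copied from the CREATE call in the post. Note that the config
# was uploaded to ZooKeeper under the name "my_custom_collection", while the
# request asks for "my_custom_config" -- the two names differ in the post.
params = urlencode({
    "action": "CREATE",
    "name": "my_custom_collection",
    "numShards": 2,
    "replicationFactor": 2,
    "maxShardsPerNode": 2,
    "collection.configName": "my_custom_config",
})
url = "/admin/collections?" + params
```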
Re: dynamic field question
OK. Then the JSON returned would contain a lot of documents that are really sections. This would work fine for the use case I mentioned, but I also use the index for full-text search of the whole document. Therefore, I would need to parse the result JSON in a way that combines the Solr docs returned into one virtual doc based on source document ID. Is that correct? On 10/9/13 6:15 AM, Jack Krupansky j...@basetechnology.com wrote: I'd suggest that each of your source document sections would be a distinct Solr document. All of the sections could have a source document ID field to tie them together. Dynamic fields work best when used in moderation; your use case seems like an excessive use of dynamic fields. -- Jack Krupansky -Original Message- From: Twomey, David Sent: Tuesday, October 08, 2013 6:59 PM To: solr-user@lucene.apache.org Subject: dynamic field question I am having trouble trying to return a particular dynamic field only, instead of all dynamic fields. Imagine I have a document with an unknown number of sections. Each section can have a 'title' and a 'body'. I have each section title and body as dynamic fields such as section_title_* and section_body_*. Imagine that some documents contain a section whose title is "Appendix". I want a query that will find all docs with that section and return just the Appendix section. I don't know how to return just that one section, though. I can copyField my dynamic field section_title_* into a static field called section_titles and query that for docs that contain "Appendix", but I don't know how to only return that one dynamic field: ?q=section_titles:Appendix&fl=section_body_* Any ideas? I can't seem to put a conditional in the fl parameter.
Re: Collection API wrong configuration
Using Solr 4.4.0 the same scenario behaves as expected. Can anyone else try this, to check whether it only happens with 4.5.0 and, if so, whether this is desired behaviour or a bug? -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-tp4094319p4094323.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Collection API wrong configuration
This may be a bug in 4.5. Another user has also reported it: https://issues.apache.org/jira/browse/SOLR-5307 On Wed, Oct 9, 2013 at 3:51 PM, maephisto my_sky...@yahoo.com wrote: Using Solr 4.4.0 the same scenario behaves as expected. Can anyone else try this, to check if it only happens with 4.5.0 and if so, is this desired behaviour or a bug? -- Regards, Shalin Shekhar Mangar.
Re: Collection API wrong configuration
Works fine at my end. I use Solr 4.5.0 on Windows 7. I tried:

zkcli.bat -cmd upconfig -zkhost localhost:9000 -d ..\solr\collection2\conf -n my_custom_collection
java -Djetty.port=8001 -DzkHost=localhost:9000 -jar start.jar

and finally

http://localhost:8001/solr/admin/collections?action=CREATE&name=my_custom_collection&numShards=1&replicationFactor=1&maxShardsPerNode=1&collection.configName=my_custom_collection

If I open the newly created core/shard I can see the modified schema file under Schema. Best regards, Primož From: maephisto my_sky...@yahoo.com To: solr-user@lucene.apache.org Date: 09.10.2013 11:57 Subject: Collection API wrong configuration
Find documents that are composed of % words
Is there a way, in a Solr query, to find documents that are composed of a given percentage of words from a list? For example, here is the list of words:

- Love
- Ice
- Cream
- Sunny
- I
- To
- A
- On
- Elephant
- Balloon

And a percentage such as 80%. Let's assume you're analyzing the text of the following sentence: "I love to eat ice cream on a sunny day". This sentence contains 10 words, and only 2 of them (Day and Eat) do not appear on the list, so the score for this sentence would be 80% and it would be a valid search result. If the user had entered 90%, this sentence would not be a valid result, since more than 10% of its words aren't on the list. Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264.html Sent from the Solr - User mailing list archive at Nabble.com.
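The scoring described above can be sketched outside Solr (word list and example sentence taken from the question; this only illustrates the desired metric, not how to compute it inside Solr):

```python
import string

WORD_LIST = {"love", "ice", "cream", "sunny", "i", "to", "a", "on", "elephant", "balloon"}

def coverage(sentence):
    """Percentage of the sentence's words that appear on the list."""
    words = [w.strip(string.punctuation) for w in sentence.lower().split()]
    on_list = sum(1 for w in words if w in WORD_LIST)
    return 100.0 * on_list / len(words)

coverage("I love to eat ice cream on a sunny day")  # 80.0: only "eat" and "day" are off-list
```

A document would count as a valid result when `coverage(...)` is at least the user's threshold; pushing this computation into Solr is what the replies below discuss.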
Re: Collection API wrong configuration
Yes, the problem described in the ticket is the one I'm also running into. -- View this message in context: http://lucene.472066.n3.nabble.com/Collection-API-wrong-configuration-in-4-5-0-tp4094319p4094335.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: dynamic field question
David, Yes. Document grouping (aka field collapsing) will help you here. It should also allow you to create a better search experience on the front end - it's often better to narrow down where in a large document a match is than to give users a large doc and say: we know the match is in here somewhere, you just have to locate it now. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 9, 2013 6:21 AM, Twomey, David david.two...@novartis.com wrote: OK. Then the JSON returned would contain a lot of documents that are really sections.
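The grouped query suggested here could be assembled like the sketch below. The field names `source_doc_id` and the collection path are placeholders I've introduced for illustration; only `section_titles:Appendix` comes from the thread:

```python
from urllib.parse import urlencode

# Collapse section-documents back into their source document at query time.
# "source_doc_id" is a hypothetical field tying sections to their parent doc.
params = urlencode({
    "q": "section_titles:Appendix",
    "fl": "id,score",
    "group": "true",                 # enable result grouping (field collapsing)
    "group.field": "source_doc_id",  # one group per source document
    "group.limit": 3,                # up to 3 matching sections per document
})
query = "/solr/collection1/select?" + params
```

Each group in the response then represents one virtual source document, with its matching sections as group members.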
Re: Find documents that are composed of % words
Hi, You can take your words, combine some % of them with AND, then take another set of them, OR it with the previous set, and so on. Otis Solr ElasticSearch Support http://sematext.com/ On Oct 9, 2013 6:54 AM, shahzad73 shahzad...@yahoo.com wrote: Is there a way, in a Solr query, to find documents that are composed of a given percentage of words from a list?
Re: dynamically adding core with auto-discovery in Solr 4.5
Jan: This worked for me if I do NOT have a core.properties file at all in my new core. Personally I think the behavior in 4.4 was dangerous: what happens if you mis-type the command, for instance? You could do Bad Things to the old core you were inadvertently re-creating. The core.properties file gets created in the new core as a result of the CREATE command. So go ahead and try it again without a core.properties file perhaps? The RELOAD command is intended to be used when you change the schema or solrconfig files and want the core indicated to start working with the new definitions. It relies on having a record of that core to work. When Solr starts up in discovery mode, it explores the directory tree and keeps an internal map of all the cores: transient, loaded, etc. The reload then looks at that map and barfs if the core isn't there. Changing this seems like more work than reward; how would the code know where to look for the core to load? It would have to re-walk the tree, or rely on instanceDir being an absolute path, etc. Do-able but not worth it IMO. Best, Erick On Tue, Oct 8, 2013 at 4:38 PM, Jan Van Besien ja...@ngdata.com wrote: Hi, We are using auto-discovery and have a use case where we want to be able to add cores dynamically, without restarting Solr. In 4.4 we were able to: - add a directory (e.g. core1) with an empty core.properties - call http://localhost:8983/solr/admin/cores?action=CREATE&core=core1&name=core1&instanceDir=%2Fsomewhere%2Fcore1 In 4.5, however, this (the second step) fails, saying it cannot create a new core in that directory because another core is already defined there. From the documentation (http://wiki.apache.org/solr/CoreAdmin), I understand that since 4.3 we should actually do RELOAD.
However, RELOAD results in this stack trace:

org.apache.solr.common.SolrException: Error handling 'reload' action
    at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:673)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:172)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:655)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:246)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:322)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.solr.common.SolrException: Unable to reload core: core1
    at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:936)
    at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:691)
    at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:671)
    ... 20 more
Caused by: org.apache.solr.common.SolrException: No such core: core1
    at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:642)
    ... 21 more

Note that before I RELOAD, the core1 directory was created. Also note that next to the core1 directory there is a core0 directory which has exactly the same content and is auto-discovered perfectly fine at startup. So... what should it be? Or am I missing something here? Thanks in advance, Jan
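For reference, the CoreAdmin CREATE call from the message above can be assembled like this sketch; the percent-encoding of the instanceDir path matches the URL quoted in the thread:

```python
from urllib.parse import urlencode

# Parameters from the CoreAdmin CREATE call quoted above.
params = urlencode({
    "action": "CREATE",
    "core": "core1",
    "name": "core1",
    "instanceDir": "/somewhere/core1",
})
url = "/solr/admin/cores?" + params
# The slashes in instanceDir are encoded as %2F, as in the original URL.
```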
Re: {soft}Commit and cache flushing
Tim: I think you're mis-interpreting. By replying to a post with the subject: {soft}Commit and cache flushing but going in a different direction, it's easy for people to think I'm not interested in that thread, I'll ignore it, thereby missing the fact that you're asking a somewhat different question that they might have information about. It's not about whether you're doing anything particularly wrong with the question. It's about making it easy for people to help. See http://people.apache.org/~hossman/#threadhijack Best, Erick On Tue, Oct 8, 2013 at 6:23 PM, Tim Vaillancourt t...@elementspace.com wrote: I have a genuine question with substance here. If anything this nonconstructive, rude response was to get noticed. Thanks for contributing to the discussion. Tim On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote: Tim, I suggest you open a new thread and not reply to this one to get noticed. Dmitry On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com wrote: Is there a way to make autoCommit only commit if there are pending changes, ie: if there are 0 adds pending commit, don't autoCommit (open-a-searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: right. We've got the autoHard commit configured only atm. The soft-commits are controlled on the client. It was just easier to implement the first version of our internal commit policy that will commit to all solr instances at once. This is where we have noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Indeed. The easiest way to work around this is by disabling auto commits and only commit when you have to.
Re: How to share Schema between multicore on Solr 4.4
Shawn: Hmmm, I hadn't thought about that before. The shareSchema stuff is keyed off the absolute directory (and timestamp) of the schema.xml file associated with a core, and is about sharing the internal object that holds the parsed schema. Do you know for sure whether the fact that this is coming from ZK actually shares the schema object? 'Cause I've never looked to see, and it would be a good thing to have in my head... Thanks! Erick On Tue, Oct 8, 2013 at 8:33 PM, Shawn Heisey s...@elyograg.org wrote: On 10/7/2013 6:02 AM, Dharmendra Jaiswal wrote: I am using Solr 4.4 with SolrCloud on a Windows machine. Somehow I am not able to share the schema between multiple cores. If you're in SolrCloud mode, then you already *are* sharing your schema. You are also sharing your configuration. Both of them are in ZooKeeper. All collections (and all shards within a collection) which use a given config name are using the same copy. Any copies of your config/schema that might be on your disk are *NOT* being used. If you are starting Solr with any bootstrap options, then the config set that is in ZooKeeper might be getting overwritten by what's on your disk when Solr restarts, but otherwise SolrCloud *only* uses ZooKeeper for config/schema. The bootstrap options are meant to be used once, and I actually prefer to get SolrCloud operational without using bootstrap options at all. Thanks, Shawn
Re: dynamically adding core with auto-discovery in Solr 4.5
On Wed, Oct 9, 2013 at 2:15 PM, Erick Erickson erickerick...@gmail.com wrote: This worked for me if I do NOT have a core.properties at all in my new core. Personally I think the behavior in 4.4 was dangerous, what happens if you mis-type the command for instance? You could do Bad Things to the old core you were inadvertently re-creating. Thanks, this works. I think the documentation could be updated to indicate that:
- auto-discovery happens only at startup (that wasn't obvious to me)
- creating cores after startup should be done on a config directory without a core.properties file
Jan
Re: Find documents that are composed of % words
Hi Shahzad, Have you tried the Minimum Should Match feature? http://wiki.apache.org/solr/ExtendedDisMax#mm_.28Minimum_.27Should.27_Match.29 Regards, Aloke On Wed, Oct 9, 2013 at 4:55 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Hi, You can take your words, combine some % of them with AND, then take another set of them, OR it with the previous set, and so on. Otis Solr ElasticSearch Support http://sematext.com/
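The Minimum Should Match suggestion could be tried with an edismax request along these lines. This is a sketch only: the field name `text` is an assumption, and note that `mm` constrains how many of the *query's* words must appear in a document, which is the inverse of the original requirement (what fraction of the *document's* words are on the list); the sketch shows the mechanics, not a complete solution:

```python
from urllib.parse import urlencode

# Word list taken from the original question.
words = ["Love", "Ice", "Cream", "Sunny", "I", "To", "A", "On", "Elephant", "Balloon"]

# Require at least 80% of the listed words to match, via mm (Minimum Should Match).
params = urlencode({
    "defType": "edismax",
    "q": " ".join(words),
    "qf": "text",   # assumed field name
    "mm": "80%",
})
```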
Permissions not checked when calling discoverUnder
Hello. When the solr/home directory contains a subdirectory that Solr does not have rights to read, Solr fails to start with this exception:

2108 [main] INFO org.apache.solr.core.CoresLocator - Looking for core definitions underneath /var/lib/solr
2109 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter - Could not start Solr. Check solr/home property and the logs
2138 [main] ERROR org.apache.solr.core.SolrCore - null:java.lang.NullPointerException
    at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:121)
    at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:130)
    at org.apache.solr.core.CorePropertiesLocator.discover(CorePropertiesLocator.java:113)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:226)
    at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
    at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
    at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
    at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
    at org.apache.catalina.core.StandardService.start(StandardService.java:516)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:616)
    at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
    at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
2138 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init() done

For example: solr/home is located at /var/lib/solr, and /var/lib/solr is a separate file system, so it contains a lost+found directory that only root can read. As a result, Solr can't start, because core discovery walks into a directory that Tomcat has no rights to read. Yours faithfully.
Re: dynamically adding core with auto-discovery in Solr 4.5
If you create a Wiki login, I'll be happy to add you to the contributors list. It's always valuable to have fresh eyes update docs while the ambiguities are still fresh! Erick On Wed, Oct 9, 2013 at 8:37 AM, Jan Van Besien ja...@ngdata.com wrote: On Wed, Oct 9, 2013 at 2:15 PM, Erick Erickson erickerick...@gmail.com wrote: This worked for me if I do NOT have a core.properties at all in my new core. Personally I think the behavior in 4.4 was dangerous, what happens if you mis-type the command for instance? You could do Bad Things to the old core you were inadvertently re-creating. Thanks, this works. I think the documentation could be updated to indicate that - auto-discovery is only at startup (that wasn't obvious to me) - creating cores after startup should be done on a config directory without a core.properties file Jan
Re: Permissions not checked when calling discoverUnder
What do you think Solr should do in this case? If the process doesn't have permission to the dir, it can't write to it. You need to set the permissions, or the authority of the process that Solr is running as, appropriately. Best, Erick On Wed, Oct 9, 2013 at 8:54 AM, Said Chavkin schav...@gmail.com wrote: Hello. When the solr/home directory contains a subdirectory that Solr does not have rights to read, Solr fails to start with a NullPointerException in CorePropertiesLocator.discoverUnder (full stack trace quoted in the original message).
RE: Searching on (hyphenated/capitalized) word issue
Thank you Upayavira. I'm trying to figure out what will make Solr stem on "multi" in the word "multicad" so that any attempt to search on multicad, Multi-CAD or multiCAD will return results. The WordDelimiterFilterFactory helps with the case of "multi" followed by a dash or a capital letter, but I'm not sure how to get Solr to tokenize the word "multi". Should I look at ngram configurations? Or is there a filter which promotes (rather than protects) words from being stemmed? (In other words, I could configure in a txt file that "multi" should be stemmed.) Just to reiterate, I am not getting any results when I search for the word multicad, even though it appears many times in the text as multiCAD and Multi-CAD. Here is my configuration:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>

-Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Monday, September 30, 2013 1:45 PM To: solr-user@lucene.apache.org Subject: Re: Searching on (hyphenated/capitalized) word issue You need to look at your analysis chain. The stuff you're talking about there is all configurable. There are different tokenizers available to split your fields differently; then you might use the WordDelimiterFilterFactory to split existing tokens further (e.g. WiFi might become wi, fi and WiFi). So really, you need to craft your own analysis chain to fit the kind of data you are working with.
Upayavira On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote: I have a search term multi-CAD being issued against tokenized text. The problem is that you cannot get any search results when you type multicad unless you add a hyphen (multi-cad) or type multiCAD (omitting the hyphen, but correctly adding the CAPS into the spelling). However, for the similar but unhyphenated word AutoCAD, you can type autocad and get hits for AutoCAD, as you would expect. You can type auto-cad and get the same results. The query seems to get parsed as separate words (resulting in hits) for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad. In other words, the search terms become multi cad and auto cad for all cases except for when the term is multicad. I'm guessing this may be due in part to auto being a more common word prefix, but I may be wrong. Can anyone provide some clarity (and maybe point me towards a potential solution)? Thanks in advance! Kristian Van Tassell Siemens Industry Sector Siemens Product Lifecycle Management Software Inc. 5939 Rice Creek Parkway Shoreview, MN 55126 United States Tel. :+1 (651) 855-6194 Fax :+1 (651) 855-6280 kristian.vantass...@siemens.com www.siemens.com/plm
Re: Permissions aren't checked when calling discoverUnder
I'm not sure; maybe Solr should skip an inaccessible directory, because it is standard practice to place a service on a separate filesystem. On the other hand, it is possible to place solr/home somewhere other than the top of the mounted fs. Anyway, it would be better if the error message were clearer. 2013/10/9 Erick Erickson erickerick...@gmail.com: What do you think Solr should do in this case? If the process doesn't have permission to the dir, it can't write to it. You need to set the permissions, or the authority of the process that Solr is running as, appropriately. Best, Erick On Wed, Oct 9, 2013 at 8:54 AM, Said Chavkin schav...@gmail.com wrote: Hello. When the solr/home directory contains a directory that Solr does not have rights to, Solr fails to start with this exception:
2108 [main] INFO org.apache.solr.core.CoresLocator - Looking for core definitions underneath /var/lib/solr
2109 [main] ERROR org.apache.solr.servlet.SolrDispatchFilter - Could not start Solr. Check solr/home property and the logs
2138 [main] ERROR org.apache.solr.core.SolrCore - null:java.lang.NullPointerException
at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:121)
at org.apache.solr.core.CorePropertiesLocator.discoverUnder(CorePropertiesLocator.java:130)
at org.apache.solr.core.CorePropertiesLocator.discover(CorePropertiesLocator.java:113)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:226)
at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:115)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3838)
at
org.apache.catalina.core.StandardContext.start(StandardContext.java:4488)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:791)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:771)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:526)
at org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:637)
at org.apache.catalina.startup.HostConfig.deployDescriptors(HostConfig.java:563)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:498)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1277)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:321)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1053)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:722)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1045)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:443)
at org.apache.catalina.core.StandardService.start(StandardService.java:516)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:710)
at org.apache.catalina.startup.Catalina.start(Catalina.java:593)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:616)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
2138 [main] INFO org.apache.solr.servlet.SolrDispatchFilter - SolrDispatchFilter.init() done
For example: solr/home is located on /var/lib/solr. /var/lib/solr is a separate file system, so it has a lost+found directory, and only root can read from that directory. As a result, Solr can't start.
It is because core discovery walks into a directory that Tomcat does not have rights to read. Yours faithfully.
Re: Shard split issue
I opened https://issues.apache.org/jira/browse/SOLR-5324 On Mon, Oct 7, 2013 at 2:20 PM, Yago Riveiro yago.rive...@gmail.com wrote: If the replica has 20G, most probably the recovery will take more than 120 seconds. In my case I have SSDs and 120 is not enough. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Monday, October 7, 2013 at 9:19 AM, Shalin Shekhar Mangar wrote: I think what is happening here is that the sub-shard replicas are taking time to recover. We use a core admin command to wait for the replicas to become active before the shard states are switched. The timeout value for that command is just 120 seconds. We should wait for more than that. I'll open an issue. On Mon, Oct 7, 2013 at 2:47 AM, Yago Riveiro yago.rive...@gmail.com wrote: It seems the issue occurs when the shard has more than one replica. I unloaded all replicas of the shard (except 1, to do the split) and the SPLITSHARD finished as expected: the parent went to inactive and the children to active. If the parent has more than 1 replica, the process apparently finishes, and the total number of documents in the children is the same as in the parent; the problem is that the parent never goes to the inactive state and the children are stuck in the construction state. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Sunday, October 6, 2013 at 12:23 AM, Yago Riveiro wrote: I can attach the full log of the process if you want.
-- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Sunday, October 6, 2013 at 12:12 AM, Yago Riveiro wrote: The error in log are: ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: splitshard the collection time out:300s ERROR - 2013-10-05 21:06:22.997; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: splitshard the collection time out:300s INFO - 2013-10-05 22:48:54.083; org.apache.solr.cloud.OverseerCollectionProcessor; Overseer Collection Processor: Message id:/overseer/collection-queue-work/qn-000138 complete, response:{success={null={responseHeader={status=0,QTime=1901},core=statistics-13_shard17_0_replica1},null={responseHeader={status=0,QTime=1903},core=statistics-13_shard17_1_replica1},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=2000}},null={responseHeader={status=0,QTime=6324147}},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_1_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=0},core=statistics-13_shard17_0_replica1,status=EMPTY_BUFFER},null={responseHeader={status=0,QTime=1127},core=statistics-13_shard17_0_replica2},null={responseHeader={status=0,QTime=2109},core=statistics-13_shard17_1_replica2}},failure={null=org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:I was asked to wait on state active for 192.168. 20.105:8983_solr but I still do not see the requested state. 
I see state: recovering live:true},Operation splitshard caused exception:=org.apache.solr.common.SolrException: SPLTSHARD failed to create subshard replicas or timed out waiting for them to come up,exception={msg=SPLTSHARD failed to create subshard replicas or timed out waiting for them to come up,rspCode=500}} -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Saturday, October 5, 2013 at 5:03 PM, Yago Riveiro wrote: I don't have the log, the rotation log file is configured to only 5 files with a small size, I will reconfigured to a high value and retry the split again. -- Yago Riveiro Sent with Sparrow (http://www.sparrowmailapp.com/?sig) On Saturday, October 5, 2013 at 4:54 PM, Shalin Shekhar Mangar wrote: On Sat, Oct 5, 2013 at 8:37 PM, Yago Riveiro yago.rive...@gmail.com (mailto:yago.rive...@gmail.com) wrote: How I can see the logs of the parent? They are stored on solr.log? Yes. -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar. -- Regards, Shalin Shekhar Mangar.
Re: Searching on (hyphenated/capitalized) word issue
If you have the word multicad in your index and you want to get a result when you search for multi, you can use an ngram filter. However, you should consider the pros and cons of using an NGram filter. If you use ngrams you may find multicad from multi, but your index size will be much bigger. I suggest you look at: http://docs.lucidworks.com/display/solr/Tokenizers 2013/10/9 Van Tassell, Kristian kristian.vantass...@siemens.com Thank you Upayavira. I'm trying to figure out what will make Solr stem on multi in the word multicad so that any attempt to search on multicad, Multi-CAD or multiCAD will return results. The WordDelimiterFilterFactory helps with the case of multi followed by a dash or a capital letter, but I'm not sure how to get Solr to tokenize the word multi. Should I look at ngram configurations? Or is there a filter which promotes (rather than protects) words from being stemmed? (In other words, I could configure in a txt file that multi should be stemmed.) Just to reiterate, I am not getting any results when I search for the word multicad, even though it appears many times in the text as multiCAD and Multi-CAD. Here is my configuration:
<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>
-Original Message- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Monday, September 30, 2013 1:45 PM To: solr-user@lucene.apache.org Subject: Re: Searching on (hyphenated/capitalized) word issue You need to look at your analysis chain.
The stuff you're talking about there is all configurable. There are different tokenisers available to split your fields differently, then you might use the WordDelimiterFilterFactory to split existing tokens further (e.g. WiFi might become wi, fi and WiFi). So really, you need to craft your own analysis chain to fit the kind of data you are working with. Upayavira On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote: I have a search term multi-CAD being issued against tokenized text. The problem is that you cannot get any search results when you type multicad unless you add a hyphen (multi-cad) or type multiCAD (omitting the hyphen, but correctly adding the CAPS into the spelling). However, for the similar but unhyphenated word AutoCAD, you can type autocad and get hits for AutoCAD, as you would expect. You can type auto-cad and get the same results. The query seems to get parsed as separate words (resulting in hits) for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad. In other words, the search terms become multi cad and auto cad for all cases except for when the term is multicad. I'm guessing this may be due in part to auto being a more common word prefix, but I may be wrong. Can anyone provide some clarity (and maybe point me towards a potential solution)? Thanks in advance! Kristian Van Tassell Siemens Industry Sector Siemens Product Lifecycle Management Software Inc. 5939 Rice Creek Parkway Shoreview, MN 55126 United States Tel. :+1 (651) 855-6194 Fax :+1 (651) 855-6280 kristian.vantass...@siemens.com www.siemens.com/plm
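If you go the ngram route suggested above, one possible shape is sketched below. This is an assumption of how it might be configured, not a drop-in fix: it uses EdgeNGramFilterFactory at index time only, with illustrative gram sizes, and the index-size cost mentioned above applies.

```xml
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- "multicad" is additionally indexed as mul, mult, multi, multic, ... -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

With this, a query for multi matches the indexed gram multi of multicad; whether the extra index size is acceptable is the trade-off discussed above.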
Update existing documents when using ExtractingRequestHandler?
Hi, In a content management system I have a document and an attachment. The document contains the metadata and the attachment the actual data. I would like to combine the data of both in one Solr document. I have thought of several options:
1. Using ExtractingRequestHandler I would extract the data (extractOnly) and combine it with the metadata and send it to Solr. But this might be inefficient and increase the network traffic.
2. A separate Tika installation, used to extract the data and send it to Solr. This would stress an already busy web server.
3. First upload the file using ExtractingRequestHandler, then use atomic updates to add the other fields.
Or is there another way? First add the metadata and later use the ExtractingRequestHandler to add the file contents? Cheers, Jeroen -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: run filter queries after post filter
Hey, so the post filter logs the number of ids that it receives. With the above filter having cost=200, the post filter should have received the same number of ids as before ( when the filter was not present ). But that does not seem to be the case...with the filter query on the index, the number of ids that the post filter is receiving reduces. Thanks, Rohit On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.comwrote: Hmmm, seems like it should. What's our evidence that it isn't working? Best, Erick On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote: Hey, I am using solr 4.0 with my own PostFilter implementation which is executed after the normal solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the below line to the url but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
Re: Find documents that are composed of % words
Please help me formulate the query. Will that be easy, or do I have to build a custom filter for this? Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094372.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Find documents that are composed of % words
My client has a strange requirement: he will give a list of 500 words and then set a percentage like 80%. He wants to find those pages or documents which consist only of those words, with at most 20% unknown. For example, say we have this document: word1 word2 word3 word4, and he gives the list word1 word2 word3 and sets the accuracy to 75%. The above doc will meet the criteria because (1) it matches all the listed words and (2) only 25% of its words are unknown, i.e. not from the list of searched words. Here is another way to say this: if 500 words are provided in the search, then all 500 words must exist in the document, and unknown words should be at most 20% if the accuracy is 80%. -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094369.html Sent from the Solr - User mailing list archive at Nabble.com.
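To make the rule concrete, here is a client-side sketch in Python of the acceptance test described above. This illustrates the requirement, not a Solr query; applying it would mean post-filtering candidate documents returned by a normal OR query over the word list.

```python
def meets_criteria(doc_words, list_words, accuracy):
    """Client's rule: every listed word must appear in the document, and
    the fraction of unknown words (in the doc but not in the list) must
    not exceed 1 - accuracy."""
    doc = set(doc_words)
    wanted = set(list_words)
    if not wanted <= doc:       # all listed words must exist in the doc
        return False
    unknown = doc - wanted      # words not from the list
    return len(unknown) / len(doc) <= 1.0 - accuracy

# The example from the message: one unknown word out of four is 25%,
# so accuracy 75% passes and accuracy 80% fails.
doc = "word1 word2 word3 word4".split()
words = ["word1", "word2", "word3"]
```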
Re: How to share Schema between multicore on Solr 4.4
On 10/9/2013 6:24 AM, Erick Erickson wrote: Hmmm, I hadn't thought about that before. The shareSchema stuff is keyed off the absolute directory (and timestamp) of the schema.xml file associated with a core and is about sharing the internal object that holds the parsed schema. Do you know for sure if the fact that this is coming from ZK actually shares the schema object? 'Cause I've never looked to see and it would be a good thing to have in my head... With SolrCloud, I have no idea whether the actual internal objects are shared. Just now I tried to figure that out from the code, but I don't already have an understanding of how that code works, and a quick glance isn't enough to gain that knowledge. I can guarantee that you have a much deeper understanding of those internals than I do! My comments were to indicate that SolrCloud creates a situation where the config/schema are shared in the sense that there's only one canonical copy. Thanks, Shawn
Solr's Filtering approaches
Hi All, I have an issue in handling filters for one of our requirements and would like suggestions on the best approach. *Use Case:* 1. We have a list of groups, and the number of groups can increase up to 1 million. Currently we have almost 90 thousand groups in the Solr search system. 2. Just before the user hits a search, he has the option to select the groups he wants to retrieve. [The distinct list of these group names for display is retrieved from another Solr index that has more information about groups.] *User Operation:* Say the user selected group 1A - group 1A, and searches for key:cancer. The current approach I was thinking of is: get the search results and filter the query by the list of group ids selected by the user. But my concern is that when this group list increases to 50k unique ids, it can cause a lot of delay in getting search results. So I wanted to know whether there are different filtering approaches that I can try. I was thinking of one more approach, as suggested by my colleague: do an intersection. Get the group ids selected by the user, get the list of group ids from the search results, perform the intersection of both, and then get the entire result set of only those group ids that intersected. Is this a better way? Can I use any cache technique in this case? - David.
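The intersection idea can be sketched client-side. Assuming each result document carries a group-id field (the field name group_id is illustrative), putting the selected ids into a set makes each membership check O(1) even with 50k selected ids:

```python
def filter_by_groups(results, selected_group_ids):
    """Keep only result docs whose group id was selected by the user.

    Building the set once means the cost grows with the number of
    results, not with the size of the 50k-id selection."""
    selected = set(selected_group_ids)
    return [doc for doc in results if doc["group_id"] in selected]

# Illustrative data: three hits, two in selected groups.
results = [
    {"id": 1, "group_id": "1A"},
    {"id": 2, "group_id": "7C"},
    {"id": 3, "group_id": "2A"},
]
hits = filter_by_groups(results, ["1A", "2A", "3A"])
```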
Re: Multiple schemas in the same SolrCloud ?
On 10/9/2013 1:17 AM, xinwu wrote: I remember I must put -Dbootstrap_confdir=/opt/Solr_home/collection1/conf -Dcollection.configName=solrConfig in catalina.sh. Does that mean SolrCloud must have one, and only one, schema? Those bootstrap options are intended to be used ONCE, and on only one of your Solr instances, not all of them. What that does is take the configuration in the confdir and upload it to ZooKeeper, giving it the name you chose. You can have many configurations with different names in ZooKeeper. Each collection is associated with a config name. A better way than the bootstrap options is the zkcli script in the cloud-scripts directory of the example. The upconfig command can be used to upload or change your configurations. http://wiki.apache.org/solr/SolrCloud#Command_Line_Util Thanks, Shawn
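For example, the upconfig command mentioned above might be invoked like this (the ZooKeeper address, paths, and config name are illustrative, reusing the ones from the question):

```
cloud-scripts/zkcli.sh -zkhost localhost:2181 \
  -cmd upconfig \
  -confdir /opt/Solr_home/collection1/conf \
  -confname solrConfig
```

Re-running upconfig with the same -confname overwrites the stored configuration; any number of collections can then reference it (or other named configs) independently.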
Re: SolrJ best pratices
Thanks for the clarification. In SolrCloud just use 1 connection. In non-cloud environments you will need one per core. On Oct 8, 2013, at 5:58 PM, Shawn Heisey s...@elyograg.org wrote: On 10/7/2013 3:08 PM, Mark wrote: Some specific questions: - When working with HttpSolrServer should we keep instances around forever, or should we create a singleton that can/should be used over and over? - Is there a way to change the collection after creating the server, or do we need to create a new server for each collection? If at all possible, you should create your server object and use it for the life of your application. SolrJ is threadsafe. If there is any part of it that's not, the javadocs should say so - the SolrServer implementations definitely are. By using the word collection you are implying that you are using SolrCloud ... but earlier you said HttpSolrServer, which implies that you are NOT using SolrCloud. With HttpSolrServer, your base URL includes the core or collection name - http://server:port/solr/corename, for example. Generally you will need one object for each core/collection, and another object for server-level things like CoreAdmin. With SolrCloud, you should be using CloudSolrServer instead, another implementation of SolrServer that is constantly aware of the SolrCloud clusterstate. With that object, you can use setDefaultCollection, and you can also add a collection parameter to each SolrQuery or other request object. Thanks, Shawn
Re: Find documents that are composed of % words
Are you asking about something like this: http://wiki.apache.org/solr/TextProfileSignature On Wednesday, 9 October 2013, shahzad73 shahzad...@yahoo.com wrote: Please help me formulate the query. Will that be easy, or do I have to build a custom filter for this? Shahzad -- View this message in context: http://lucene.472066.n3.nabble.com/Find-documents-that-are-composed-of-words-tp4094264p4094372.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: limiting deep pagination
On 10/8/13 6:51 PM, Peter Keegan wrote: Is there a way to configure Solr 'defaults/appends/invariants' such that the product of the 'start' and 'rows' parameters doesn't exceed a given value? This would be to prevent deep pagination. Or would this require a custom requestHandler? Peter Just wondering -- isn't it the sum that you should be concerned about rather than the product? Actually, I think what we usually do is limit both independently, with slightly different concerns. E.g. start=1, rows=1000 causes memory problems if you have large fields in your results, whereas start=1000, rows=1 may not actually be a problem. -Mike
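Mike's suggestion of limiting both parameters independently could be enforced in front of Solr with something like this sketch (the limit values are illustrative; a custom requestHandler or SearchComponent would perform the equivalent check server-side):

```python
MAX_START = 10_000  # how deep pagination may go
MAX_ROWS = 500      # largest page size allowed

def check_paging(start, rows):
    """Reject requests that page too deep or ask for too many rows.
    Each limit is checked independently, as discussed above."""
    if start > MAX_START:
        raise ValueError("start=%d exceeds limit %d" % (start, MAX_START))
    if rows > MAX_ROWS:
        raise ValueError("rows=%d exceeds limit %d" % (rows, MAX_ROWS))
    return start, rows
```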
Re: SolrCloud High Availability during indexing operation
@Furkan The Pastebin link is working for me. Can you try again? On Wed, Oct 9, 2013 at 1:15 AM, Furkan KAMACI furkankam...@gmail.com wrote: Hi Saurabh, Your link does not work (it is broken). 2013/10/9 Saurabh Saxena ssax...@gopivotal.com Pastebin link http://pastebin.com/cnkXhz7A I am doing a bulk request. I am uploading 100 files, each file having 100 docs. -Saurabh On Tue, Oct 8, 2013 at 7:39 PM, Mark Miller markrmil...@gmail.com wrote: The attachment did not go through - try using pastebin.com or something. Are you adding docs with curl one at a time or in bulk per request? - Mark On Oct 8, 2013, at 9:58 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Repeated the experiments on a local system. Single-shard SolrCloud with a replica. Tried to index 10K docs. All the indexing operations were redirected to the replica Solr node. While the documents were getting indexed on the replica, I shut down the leader Solr node. Out of 10K docs, only 9900 docs got indexed. If I repeat the experiment without shutting down the leader instance, all 10K docs get indexed. I am using curl to upload the docs; there was no curl error while uploading documents. The following error was in the replica log file:
ERROR - 2013-10-08 16:10:32.662; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found, collection:test_collection slice:shard1
Attached replica log file. On Thu, Sep 26, 2013 at 7:15 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Sorry for the late reply. All the documents have unique ids. If I repeat the experiment, the number of docs indexed changes (I guess it depends on when I shut down a particular shard). When I do the experiment without shutting down leader shards, all 80k docs get indexed (which I think proves that all documents are valid). I need to dig through the logs to find the error message. Also, I am not keeping track of the curl return code; will run again and reply.
Regards, Saurabh On Wed, Sep 25, 2013 at 3:01 AM, Erick Erickson erickerick...@gmail.com wrote: And do any of the documents have the same uniqueKey, which is usually called id? Subsequent adds of docs with the same uniqueKey replace the earlier one. It's not definitive because it changes as merges happen, old copies of docs that have been deleted or updated will be purged, but what does your admin page show for maxDoc? If it's more than numDocs then you have duplicate uniqueKeys. NOTE: if you optimize (which you usually shouldn't) then maxDoc and numDocs will be the same, so if you test this don't optimize. Best, Erick On Tue, Sep 24, 2013 at 10:43 AM, Walter Underwood wun...@wunderwood.org wrote: Did all of the curl update commands return success? Any errors in the logs? wunder On Sep 24, 2013, at 6:40 AM, Otis Gospodnetic wrote: Is it possible that some of those 80K docs were simply not valid? e.g. had a wrong field, had a missing required field, anything like that? What happens if you clear this collection and just re-run the same indexing process and do everything else the same? Still some docs missing? Same number? And what if you take 1 document that you know is valid and index it 80K times, with a different ID, of course? Do you see 80K docs in the end? Otis -- Solr ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm On Tue, Sep 24, 2013 at 2:45 AM, Saurabh Saxena ssax...@gopivotal.com wrote: Doc count did not change after I restarted the nodes. I am doing a single commit after all 80k docs. Using Solr 4.4. Regards, Saurabh On Mon, Sep 23, 2013 at 6:37 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote: Interesting. Did the doc count change after you started the nodes again? Can you tell us about commits? Which version? 4.5 will be out soon.
Otis Solr ElasticSearch Support http://sematext.com/ On Sep 23, 2013 8:37 PM, Saurabh Saxena ssax...@gopivotal.com wrote: Hello, I am testing High Availability feature of SolrCloud. I am using the following setup - 8 linux hosts - 8 Shards - 1 leader, 1 replica / host - Using Curl for update operation I tried to index 80K documents on replicas (10K/replica in parallel). During indexing process, I stopped 4 Leader nodes. Once indexing is done, out of 80K docs only 79808 docs are indexed. Is this an expected behaviour ? In my opinion replica should take care of indexing if leader is down. If this is an expected behaviour, any steps that
Re: {soft}Commit and cache flusing
Apologies all. I think the suggestion that I was replying to get noticed is what irked me; otherwise I would have moved on. I'll follow this advice. Cheers, Tim On 9 October 2013 05:20, Erick Erickson erickerick...@gmail.com wrote: Tim: I think you're mis-interpreting. By replying to a post with the subject {soft}Commit and cache flushing but going in a different direction, it's easy for people to think I'm not interested in that thread, I'll ignore it, thereby missing the fact that you're asking a somewhat different question that they might have information about. It's not about whether you're doing anything particularly wrong with the question. It's about making it easy for people to help. See http://people.apache.org/~hossman/#threadhijack Best, Erick On Tue, Oct 8, 2013 at 6:23 PM, Tim Vaillancourt t...@elementspace.com wrote: I have a genuine question with substance here. If anything, this nonconstructive, rude response was to get noticed. Thanks for contributing to the discussion. Tim On 8 October 2013 05:31, Dmitry Kan solrexp...@gmail.com wrote: Tim, I suggest you open a new thread and not reply to this one, to get noticed. Dmitry On Mon, Oct 7, 2013 at 9:44 PM, Tim Vaillancourt t...@elementspace.com wrote: Is there a way to make autoCommit only commit if there are pending changes, i.e.: if there are 0 adds pending commit, don't autoCommit (open a searcher and wipe the caches)? Cheers, Tim On 2 October 2013 00:52, Dmitry Kan solrexp...@gmail.com wrote: right. We've got only the hard autoCommit configured atm. The soft commits are controlled on the client. It was just easier to implement the first version of our internal commit policy that will commit to all Solr instances at once. This is where we have noticed the reported behavior. On Wed, Oct 2, 2013 at 9:32 AM, Bram Van Dam bram.van...@intix.eu wrote: if there are no modifications to an index and a softCommit or hardCommit issued, then solr flushes the cache. Indeed.
The easiest way to work around this is by disabling auto commits and only committing when you have to.
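For reference, the pieces being discussed live in solrconfig.xml. A common arrangement (values illustrative) keeps automatic hard commits for durability but sets openSearcher=false, so those commits do not open a new searcher and therefore do not wipe the caches:

```xml
<autoCommit>
  <maxTime>60000</maxTime>           <!-- hard commit every 60s for durability -->
  <openSearcher>false</openSearcher> <!-- no new searcher: caches are kept -->
</autoCommit>
```

Visibility is then controlled separately, via soft commits issued only when new documents actually need to become searchable.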
matching starts with only
My index contains documents which could be a single word or a short sentence of up to 4-5 words. I need to return only documents which start with the searched pattern; in regex it would be ^my_query. For example, for the docs: black, beautiful black cat, cat, cat is black, black cat, and for the query black, only black and black cat should be returned. The text field I'm using is as follows:
<fieldType name="text_general_aa" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Solr version is 4.2. thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: matching starts with only
On 10/9/2013 12:57 PM, adm1n wrote: My index contains documents which could be a single word or a short sentence of up to 4-5 words. I need to return only documents which start with the searched pattern; in regex it would be ^my_query. For example, for the docs: black, beautiful black cat, cat, cat is black, black cat, and for the query black, only black and black cat should be returned. The text field I'm using is as follows:
<fieldType name="text_general_aa" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Solr version is 4.2. thanks! The presence of either the whitespace tokenizer or the NGram filter makes this impossible, because they both break the indexed value into smaller pieces. Together, they *really* break things up. Matching is done on a per-term basis, and these two components in your analysis chain ensure that black will be a term for all of those input documents, whether it appears at the beginning, middle, or end. If you set up a copyField to a new field whose fieldType uses the Keyword tokenizer (which treats the entire string as a single token) and the lowercase filter, you would be able to use the regex support in Solr 4.x and have this as your query string: newfield:/^black/ Thanks, Shawn
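The prefix-match semantics Shawn describes can be illustrated outside Solr. This plain-Python sketch mirrors what newfield:/^black/ would match against the untokenized, lowercased values:

```python
import re

docs = ["black", "beautiful black cat", "cat", "cat is black", "black cat"]

# Anchored at the start of the whole (single-token) value, like /^black/.
prefix = re.compile(r"^black")

# Only values that *start* with "black" survive; a "black" in the middle
# or at the end of the string does not match.
matches = [d for d in docs if prefix.search(d.lower())]
```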
Re: run filter queries after post filter
Ah, I think you're misunderstanding the nature of post-filters. Or I'm confused, which happens a lot! The whole point of post filters is that they're assumed to be expensive (think ACL calculation). So you want them to run on the fewest documents possible. So only docs that make it through the primary query _and_ all lower-cost filters will get to this post-filter. This means they can't be cached for instance, because they don't see (hopefully) very many docs. This is radically different than normal fq clauses, which are calculated on the entire corpus and can thus be cached. Best, Erick On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com wrote: Hey, so the post filter logs the number of ids that it receives. With the above filter having cost=200, the post filter should have received the same number of ids as before ( when the filter was not present ). But that does not seem to be the case...with the filter query on the index, the number of ids that the post filter is receiving reduces. Thanks, Rohit On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.comwrote: Hmmm, seems like it should. What's our evidence that it isn't working? Best, Erick On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote: Hey, I am using solr 4.0 with my own PostFilter implementation which is executed after the normal solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the below line to the url but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
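The ordering Erick describes can be modeled with a toy sketch (the real mechanism in Solr is a PostFilter returning a DelegatingCollector; the costs and predicates here are illustrative): filters with cost below 100 behave like ordinary cached fq clauses, while cost >= 100 with cache=false run last, in cost order, over only the documents that survived everything cheaper.

```python
def run_query(docs, main_query, filters):
    """Toy model: cheap filters intersect with the main query result;
    post filters (cost >= 100) only ever see the survivors."""
    normal = [f for f in filters if f["cost"] < 100]
    post = sorted((f for f in filters if f["cost"] >= 100),
                  key=lambda f: f["cost"])

    surviving = [d for d in docs if main_query(d)]
    for f in normal:   # ordinary fq: cacheable, computed broadly
        surviving = [d for d in surviving if f["fn"](d)]
    for f in post:     # post filters: run on the fewest docs possible
        surviving = [d for d in surviving if f["fn"](d)]
    return surviving

hits = run_query(
    list(range(1, 11)),
    lambda d: d % 2 == 0,                          # main query: even ids
    [
        {"cost": 50, "fn": lambda d: d > 2},       # ordinary fq
        {"cost": 200, "fn": lambda d: d % 3 == 0}, # expensive post filter
    ],
)
```

This is why adding an fq on the index reduces what the post filter sees: the post filter, by design, receives only the documents that passed every cheaper clause.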
Re: matching starts with only
Shawn Heisey-4: thanks for the quick response. Why does this field have to be a copyField? Couldn't it be a single field, for example:

<fieldType name="text_general_long" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="my_name" type="text_general_long" stored="true" multiValued="false" required="false"/>

thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094447.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to share Schema between multicore on Solr 4.4
bq: ...in the sense that there's only one canonical copy.

Agreed, and as you say that copy is kept in ZooKeeper. And I pretty much guarantee that the internal solrconfig object is NOT shared. I doubt the schema object is shared either, but it seems like it could be with some work. The savings potential here is rather small, though, unless you have a large number of cores. The LotsOfCores option is really, at this point, orthogonal to SolrCloud; I don't think they play nicely together (and we have some anecdotal evidence that they don't).

Erick

On Wed, Oct 9, 2013 at 12:17 PM, Shawn Heisey s...@elyograg.org wrote: On 10/9/2013 6:24 AM, Erick Erickson wrote: Hmmm, I hadn't thought about that before. The shareSchema stuff is keyed off the absolute directory (and timestamp) of the schema.xml file associated with a core, and is about sharing the internal object that holds the parsed schema. Do you know for sure whether the fact that this is coming from ZK actually shares the schema object? 'Cause I've never looked to see, and it would be a good thing to have in my head...

With SolrCloud, I have no idea whether the actual internal objects are shared. Just now I tried to figure that out from the code, but I don't already have an understanding of how that code works, and a quick glance isn't enough to gain that knowledge. I can guarantee that you have a much deeper understanding of those internals than I do! My comments were to indicate that SolrCloud creates a situation where the config/schema are shared in the sense that there's only one canonical copy.

Thanks, Shawn
Re: matching starts with only
On 10/9/2013 2:16 PM, adm1n wrote: Why does this field have to be a copyField? Couldn't it be a single field...

I always assume that people are already using the existing field and type for other purposes. Offering advice without making that assumption will usually result in people making a change and then complaining that something else no longer works. If you don't need what you already have for something else, then you can change the type on the existing field with no problem.

Thanks, Shawn
Re: Searching on (hyphenated/capitalized) word issue
The admin/analysis page is definitely your friend. On the surface, catenateWords=1 in WDFF should mash the split-up bits of multiCAD back into multicad and you should be fine. I suspect that StandardTokenizerFactory is somehow getting into the mix here; under any circumstance, the admin/analysis page should help. StandardTokenizerFactory, on a quick test, does split multi-cad into separate tokens that then do NOT get concatenated... That doesn't explain not getting hits on multiCAD when you search for multicad, though.

Best, Erick

On Wed, Oct 9, 2013 at 10:45 AM, Furkan KAMACI furkankam...@gmail.com wrote: If you have that word in the index: multicad, and you want to get a result when you search for multi, you can use an ngram filter. However, you should consider the pros and cons of using an NGram filter: if you use ngrams you may find multicad from multi, but your index size will be much bigger. I suggest you look here: http://docs.lucidworks.com/display/solr/Tokenizers

2013/10/9 Van Tassell, Kristian kristian.vantass...@siemens.com Thank you Upayavira. I'm trying to figure out what will make Solr stem on multi in the word multicad so that any attempt to search on multicad, Multi-CAD or multiCAD will return results. The WordDelimiterFilterFactory helps with the case of multi followed by a dash or a capital letter, but I'm not sure how to get Solr to tokenize the word multi. Should I look at ngram configurations? Or is there a filter which promotes (rather than protects) words from being stemmed? (In other words, could I configure in a txt file that multi should be stemmed?) Just to reiterate, I am not getting any results when I search for the word multicad, even though it appears many times in the text as multiCAD and Multi-CAD.
Here is my configuration:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>

-----Original Message----- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Monday, September 30, 2013 1:45 PM To: solr-user@lucene.apache.org Subject: Re: Searching on (hyphenated/capitalized) word issue

You need to look at your analysis chain. The stuff you're talking about there is all configurable. There are different tokenisers available to split your fields differently, and then you might use the WordDelimiterFilterFactory to split existing tokens further (e.g. WiFi might become wi, fi and WiFi). So really, you need to craft your own analysis chain to fit the kind of data you are working with. Upayavira

On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote: I have the search term multi-CAD being issued against tokenized text.
The problem is that you cannot get any search results when you type multicad unless you add a hyphen (multi-cad) or type multiCAD (omitting the hyphen, but correctly adding the caps into the spelling). However, for the similar but unhyphenated word AutoCAD, you can type autocad and get hits for AutoCAD, as you would expect; you can also type auto-cad and get the same results. The query seems to get parsed as separate words (resulting in hits) for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad. In other words, the search terms become "multi cad" and "auto cad" in all cases except when the term is multicad. I'm guessing this may be due in part to auto being a more common word prefix, but I may be wrong. Can anyone provide some clarity (and maybe point me towards a potential solution)? Thanks in advance!

Kristian Van Tassell Siemens Industry Sector Siemens Product Lifecycle Management Software Inc. 5939 Rice Creek Parkway Shoreview, MN 55126 United States Tel.: +1 (651) 855-6194 Fax: +1 (651) 855-6280 kristian.vantass...@siemens.com www.siemens.com/plm
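The splitting and catenating behavior being discussed in this thread can be imitated in Python. This is a very rough sketch of WordDelimiterFilterFactory with generateWordParts=1, splitOnCaseChange=1, and catenateWords=1, not the actual Lucene filter:

```python
import re

def word_delimiter(token, catenate_words=True):
    """Rough imitation of WordDelimiterFilter: split on non-alphanumerics
    and on lower->UPPER case changes, then optionally catenate the parts."""
    parts = re.split(r"[^A-Za-z0-9]+|(?<=[a-z])(?=[A-Z])", token)
    parts = [p.lower() for p in parts if p]
    if catenate_words and len(parts) > 1:
        parts.append("".join(parts))  # catenateWords=1 adds the joined form
    return parts

print(word_delimiter("multi-CAD"))  # ['multi', 'cad', 'multicad']
print(word_delimiter("multiCAD"))   # ['multi', 'cad', 'multicad']
print(word_delimiter("multicad"))   # ['multicad'] -- no split points, stays whole
```

Note the last case: lowercase "multicad" has no delimiter or case change to split on, so it survives as a single term. That matches the symptom in the thread — the catenated "multicad" term should exist in the index, so a query for it should hit, which is why Erick suspects something else (the StandardTokenizer) is interfering upstream.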
Re: matching starts with only
Search by "starts with" is something new I have to add, as is the data I have to index for this purpose, so it's OK to create a new field. But once I added the following field type:

<fieldType name="text_general_long" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and:

<field name="my_name" type="text_general_long" stored="true" multiValued="false" required="false"/>

indexing and afterwards searching by my_name:/^black/ returns no results, while searching by my_name:black returns only the "black" document. What am I missing? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094453.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: run filter queries after post filter
Yes, I get that. Actually I should have explained in more detail:

- I have a query which gets certain documents.
- The post filter gets these matched documents, does some processing on them, and filters the results.
- But after this is done I need to apply another filter, which is why I gave it a higher cost.

The reason I need to do this is that the processing done by the post filter depends on the documents matching the query up to that point. Since the normal fq clause is also getting executed before the post filter (despite the cost), the final results are not accurate.

thanks, Rohit

On Wed, Oct 9, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote: Ah, I think you're misunderstanding the nature of post-filters. Or I'm confused, which happens a lot! The whole point of post filters is that they're assumed to be expensive (think ACL calculation), so you want them to run on the fewest documents possible. Only docs that make it through the primary query _and_ all lower-cost filters will get to this post-filter. This means they can't be cached, for instance, because they don't see (hopefully) very many docs. This is radically different from normal fq clauses, which are calculated on the entire corpus and can thus be cached. Best, Erick

On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com wrote: Hey, so the post filter logs the number of ids that it receives. With the above filter having cost=200, the post filter should have received the same number of ids as before (when the filter was not present). But that does not seem to be the case: with the filter query on the index, the number of ids that the post filter receives is reduced. Thanks, Rohit

On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, seems like it should. What's our evidence that it isn't working?
Best, Erick On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote: Hey, I am using solr 4.0 with my own PostFilter implementation which is executed after the normal solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the below line to the url but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
Re: Searching on (hyphenated/capitalized) word issue
It depends whether multicad is a special case, or whether you want micr to match the term microsoft. If it is a special case, you can use synonyms, so that multi and multicad are considered the same term. If it isn't a special case, then ngrams could work: your document would be indexed with mul mult multi multic multica multicad, all indexed at the same term position, allowing any of those to match. Of course, that will make your index much larger. As Erick says, use the admin/analysis page to play with your analysis chains and see what they do to different inputs. Upayavira

On Wed, Oct 9, 2013, at 09:30 PM, Erick Erickson wrote: The admin/analysis page is definitely your friend. On the surface, catenateWords=1 in WDFF should mash the split-up bits of multiCAD back into multicad and you should be fine. I suspect that StandardTokenizerFactory is somehow getting into the mix here; under any circumstance, the admin/analysis page should help. StandardTokenizerFactory, on a quick test, does split multi-cad into separate tokens that then do NOT get concatenated... That doesn't explain not getting hits on multiCAD when you search for multicad, though. Best, Erick

On Wed, Oct 9, 2013 at 10:45 AM, Furkan KAMACI furkankam...@gmail.com wrote: If you have that word in the index: multicad, and you want to get a result when you search for multi, you can use an ngram filter. However, you should consider the pros and cons: if you use ngrams you may find multicad from multi, but your index size will be much bigger. I suggest you look here: http://docs.lucidworks.com/display/solr/Tokenizers

2013/10/9 Van Tassell, Kristian kristian.vantass...@siemens.com Thank you Upayavira. I'm trying to figure out what will make Solr stem on multi in the word multicad so that any attempt to search on multicad, Multi-CAD or multiCAD will return results.
The WordDelimiterFilterFactory helps with the case of multi followed by a dash or a capital letter, but I'm not sure how to get Solr to tokenize the word multi. Should I look at ngram configurations? Or is there a filter which promotes (rather than protects) words from being stemmed? (In other words, could I configure in a txt file that multi should be stemmed?) Just to reiterate, I am not getting any results when I search for the word multicad, even though it appears many times in the text as multiCAD and Multi-CAD.

Here is my configuration:

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
  <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
</analyzer>

-----Original Message----- From: Upayavira [mailto:u...@odoko.co.uk] Sent: Monday, September 30, 2013 1:45 PM To: solr-user@lucene.apache.org Subject: Re: Searching on (hyphenated/capitalized) word issue

You need to look at your analysis chain. The stuff you're talking about there is all configurable. There are different tokenisers available to split your fields differently, and then you might use the WordDelimiterFilterFactory to split existing tokens further (e.g. WiFi might become wi, fi and WiFi). So really, you need to craft your own analysis chain to fit the kind of data you are working with. Upayavira

On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote: I have the search term multi-CAD being issued against tokenized text.
The problem is that you cannot get any search results when you type multicad unless you add a hyphen (multi-cad) or type multiCAD (omitting the hyphen, but correctly adding the caps into the spelling). However, for the similar but unhyphenated word AutoCAD, you can type autocad and get hits for AutoCAD, as you would expect; you can also type auto-cad and get the same results. The query seems to get parsed as separate words (resulting in hits) for multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad. In other words, the search terms become "multi cad" and "auto cad" in all cases except when the term is multicad. I'm guessing this may be due in part to auto being a more common word prefix, but I may be wrong. Can anyone provide some clarity (and maybe point me towards a potential solution)? Thanks in advance! Kristian Van Tassell Siemens Industry Sector Siemens Product Lifecycle Management Software Inc. 5939 Rice Creek Parkway Shoreview, MN 55126
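The edge-ngram expansion Upayavira describes (mul, mult, multi, ...) is easy to reproduce. Here is a small Python sketch; the function and parameter names are illustrative, not exact Solr attribute names:

```python
def edge_ngrams(term, min_gram=3, max_gram=8):
    """Front-anchored n-grams, as an edge-ngram filter would emit them:
    every prefix of the term from min_gram to max_gram characters."""
    return [term[:n] for n in range(min_gram, min(max_gram, len(term)) + 1)]

print(edge_ngrams("multicad"))
# ['mul', 'mult', 'multi', 'multic', 'multica', 'multicad']
```

Since all of these grams are indexed at the same position, a query for any prefix ("multi", "multic", ...) matches the document containing "multicad" — at the cost Upayavira mentions, a much larger index.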
Dynamically loading synonym dictionary for solr SynonymFilter
Hi, All of our synonyms are maintained in a DB, and we would like to fetch those synonyms dynamically for query expansion (not at indexing time). Are there any code contributions? I saw some discussion years ago but without a conclusion. Thanks a lot!
Re: run filter queries after post filter
Hi Rohit, The main problem is that if the query inside the filter does not have a PostFilter implementation, then your post filter is silently transformed into a simple filter. The query field:value is based on the inverted lists and does not have post-filter support. If your field is a numeric field, take a look at the frange query parser, which does have post-filter support. To filter out documents with a field value less than 5:

fq={!frange l=5 cache=false cost=200}field(myField)

Cheers, Jim

2013/10/9 Rohit Harchandani rhar...@gmail.com Yes, I get that. Actually I should have explained in more detail: I have a query which gets certain documents; the post filter gets these matched documents, does some processing on them, and filters the results; but after this is done I need to apply another filter, which is why I gave it a higher cost. The reason I need to do this is that the processing done by the post filter depends on the documents matching the query up to that point. Since the normal fq clause is also getting executed before the post filter (despite the cost), the final results are not accurate. thanks, Rohit

On Wed, Oct 9, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote: Ah, I think you're misunderstanding the nature of post-filters. Or I'm confused, which happens a lot! The whole point of post filters is that they're assumed to be expensive (think ACL calculation), so you want them to run on the fewest documents possible. Only docs that make it through the primary query _and_ all lower-cost filters will get to this post-filter. This means they can't be cached, for instance, because they don't see (hopefully) very many docs. This is radically different from normal fq clauses, which are calculated on the entire corpus and can thus be cached. Best, Erick

On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com wrote: Hey, so the post filter logs the number of ids that it receives.
With the above filter having cost=200, the post filter should have received the same number of ids as before ( when the filter was not present ). But that does not seem to be the case...with the filter query on the index, the number of ids that the post filter is receiving reduces. Thanks, Rohit On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, seems like it should. What's our evidence that it isn't working? Best, Erick On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote: Hey, I am using solr 4.0 with my own PostFilter implementation which is executed after the normal solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the below line to the url but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
Re: Solr 4.4 - Master/Slave configuration - Replication Issue with Commits after deleting documents using Delete by ID
Bharat, Can you look at the logs on the master when you issue the delete and the subsequent commits, and share that? Otis -- Solr & ElasticSearch Support -- http://sematext.com/ Performance Monitoring -- http://sematext.com/spm

On Tue, Oct 8, 2013 at 3:57 PM, Akkinepalli, Bharat (ELS-CON) b.akkinepa...@elsevier.com wrote: Hi, We have recently migrated from Solr 3.6 to Solr 4.4. We are using the Master/Slave configuration in Solr 4.4 (not SolrCloud). We have noticed the following behavior/defect.

Configuration:
1. Hard commit and soft commit are disabled in the configuration (we control the commits from the application).
2. We have 1 master and 2 slaves configured, and the pollInterval is set to 10 minutes.
3. The master is configured with replicateAfter set to commit and startup.

Steps to reproduce the problem:
1. Delete a document in Solr (using delete by id). URL: http://localhost:8983/solr/annotation/update with body <delete><id>change.me</id></delete>
2. Issue a commit on the master (http://localhost:8983/solr/annotation/update?commit=true).
3. The replication of the DELETE WILL NOT happen; the master and slave have the same index version.
4. If we issue another commit on the master, we see that it replicates fine.

Request you to please confirm whether this is a known issue. Thank you. Regards, Bharat Akkinepalli
Re: Dynamically loading synonym dictionary for solr SynonymFilter
Hi, Not that I know of. You'd probably want to subclass SynonymFilter* with your own DB-aware implementation, and of course contribute this back :) -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com

On 9 Oct 2013, at 23:31, ALEX PKB alex...@gmail.com wrote: Hi, All of our synonyms are maintained in a DB, and we would like to fetch those synonyms dynamically for query expansion (not at indexing time). Are there any code contributions? I saw some discussion years ago but without a conclusion. Thanks a lot!
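Until such a filter exists, one pragmatic option is to expand the query in the client before sending it to Solr. Here is a hedged Python sketch using an in-memory SQLite table; the table name, schema, and query syntax are assumptions for illustration, not an existing API:

```python
import sqlite3

# Toy synonym store standing in for the real DB (schema is an assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE synonyms (term TEXT, synonym TEXT)")
conn.executemany("INSERT INTO synonyms VALUES (?, ?)",
                 [("aio", "all-in-one"), ("tv", "television")])

def expand(term):
    # Fetch synonyms for one term at query time (not indexing time).
    rows = conn.execute(
        "SELECT synonym FROM synonyms WHERE term = ?", (term,)).fetchall()
    return [term] + [r[0] for r in rows]

def expand_query(query):
    # OR each term with its synonyms, Solr-query style.
    return " ".join(
        "(" + " OR ".join(expand(t)) + ")" if len(expand(t)) > 1 else t
        for t in query.split())

print(expand_query("cheap aio printer"))
# cheap (aio OR all-in-one) printer
```

Because the expansion happens outside Solr, synonym updates in the DB take effect immediately, with no re-index and no filter reload — at the cost of an extra DB lookup per query term.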
Field with default value and stored=false, will be reset back to the default value in case of updating other fields
Hi all, I have encountered a problem and posted it on Stack Overflow here: http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false As you can see from the response there, does it make sense to open a bug ticket for this? Although I can work around it by setting everything back to stored=true, it does not make sense to keep every field stored when I don't need to return them in the search results. Or can anyone give a more detailed explanation of why this is expected and normal?

- Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields
You have to update the whole record including all fields... Bill Bell Sent from mobile On Oct 9, 2013, at 7:50 PM, deniz denizdurmu...@gmail.com wrote: hi all, I have encountered some problems and post it on stackoverflow here: http://stackoverflow.com/questions/19285251/solr-field-with-default-value-resets-itself-if-it-is-stored-false as you can see from the response, does it make sense to open a bug ticket for this? because, although i can workaround this by setting everything back to stored=true, it does not make sense to keep every field stored while i dont need to return them in the search result.. or will anyone can make more detailed explanations that this is expected and normal? - Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields
Billnbell wrote: You have to update the whole record including all fields...

So what is the point of having atomic updates if I need to update everything?

- Zeki ama calismiyor... Calissa yapar... -- View this message in context: http://lucene.472066.n3.nabble.com/Field-with-default-value-and-stored-false-will-be-reset-back-to-the-default-value-in-case-of-updatins-tp4094508p4094523.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Field with default value and stored=false, will be reset back to the default value in case of updating other fields
On 10/9/2013 8:39 PM, deniz wrote: Billnbell wrote: You have to update the whole record including all fields... so what is the point of having atomic updates if I need to update everything?

If you have any regular fields that are not stored, atomic updates will not work -- unstored field data will be lost. If you have copyField destination fields that *are* stored, atomic updates will not work as expected with those fields. The wiki spells out the requirements: http://wiki.apache.org/solr/Atomic_Updates#Caveats_and_Limitations

An atomic update is just a shortcut for "read all existing fields from the original document, apply the atomic updates, and re-insert the document, overwriting the original."

Thanks, Shawn
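Shawn's "read, apply, re-insert" description can be modeled directly. This is a conceptual sketch of why unstored fields are lost, not SolrJ or Solr internals:

```python
# An atomic update can only read back *stored* fields; anything that was
# indexed but not stored is absent from the rebuilt document.
stored = {"id": "1", "title": "black cat"}   # stored=true fields
unstored_indexed = {"popularity": 5}          # stored=false, indexed only

def atomic_update(stored_fields, changes):
    doc = dict(stored_fields)  # read back only what was stored
    doc.update(changes)        # apply the atomic update
    return doc                 # re-indexed doc overwrites the original

new_doc = atomic_update(stored, {"title": "white cat"})
print(new_doc)                   # {'id': '1', 'title': 'white cat'}
print("popularity" in new_doc)   # False -- the unstored data is gone
```

This also explains the default-value symptom in the thread: when the rebuilt document is re-indexed without the unstored field, the schema's default kicks back in, just as if the field had never been set.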