shard splitting
Hi,

I tried to split a shard but it failed, and if I try to do it again it does not start. I see the two extra shards in /collections/messages/leader_elect/ and /collections/messages/leaders/. How can I fix this?

root@solr07-dcg:/solr/messages_shard3_replica2# curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=messages&shard=shard3'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">300117</int>
  </lst>
  <lst name="error">
    <str name="msg">splitshard the collection time out:300s</str>
    <str name="trace">org.apache.solr.common.SolrException: splitshard the collection time out:300s
	at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166)
	at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300)
	at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
	at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
	at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
	at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
	at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
	at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:722)
</str>
    <int name="code">500</int>
  </lst>
</response>

INFO - 2013-05-22 06:45:54.148; org.apache.solr.handler.admin.CoreAdminHandler; Invoked split action for core: messages_shard3_replica1
INFO - 2013-05-22 06:45:54.271; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partitions=2 segments=29
INFO - 2013-05-22 06:46:03.240; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #0 range=2aaa-5554

BR
Arkadi
solr starting time takes too long
Hi,

We are using Solr 3.6.1 and our application has many cores (more than 1,000). The problem is that Solr startup takes a long time (about 10 minutes). Examining the log file and the code, we found that many resources are loaded for each core, even though in our app we are sure we always use the same solrconfig.xml and schema.xml for all cores. While we can configure schema.xml to be shared, we cannot share the SolrConfig object. But looking inside the SolrConfig code, we do not use any of its caches. Could we somehow change the configuration (or the source code) to share resources between cores and reduce Solr startup time?

Thanks very much for your help,
Lisheng
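For what it's worth, newer solr.xml formats expose a shareSchema attribute on the <cores> element that lets cores with identical schema.xml share one parsed IndexSchema instead of re-parsing it per core. I am not certain this attribute is honoured in 3.6.1, so treat the sketch below as something to verify against your version's solr.xml documentation, not a confirmed fix:

```xml
<solr persistent="true">
  <!-- shareSchema="true" (where supported) caches the parsed IndexSchema
       and reuses it across cores whose schema.xml is identical, which can
       cut startup time substantially with very many cores -->
  <cores adminPath="/admin/cores" shareSchema="true">
    <core name="core0001" instanceDir="core0001" />
    <!-- ...one entry per core... -->
  </cores>
</solr>
```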
Re: solr starting time takes too long
Hi Lisheng,

I had the same problem when I enabled autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem.

Cheers,
Carlos

2013/5/22 Zhang, Lisheng lisheng.zh...@broadvision.com:
> Hi, We are using solr 3.6.1, our application has many cores (more than 1K), the problem is that solr starting took a long time (10m). [...]
RE: solr starting time takes too long
Thanks very much for the quick help! I searched, but it seems that autoSoftCommit is a Solr 4.x feature and we are still using 3.6.1?

Best regards,
Lisheng

-----Original Message-----
From: Carlos Bonilla [mailto:carlosbonill...@gmail.com]
Sent: Wednesday, May 22, 2013 12:17 AM
To: solr-user@lucene.apache.org
Subject: Re: solr starting time takes too long

Hi Lisheng, I had the same problem when I enabled autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem. Cheers, Carlos. [...]
Re: Boosting Documents
Thank you for your reply bbarani. I can't do that, because I want to boost some documents over others independently of the query.

On 05/21/2013 05:41 PM, bbarani wrote:
> Why don't you boost during query time? Something like q=superman&qf=title^2 subject
> You can refer: http://wiki.apache.org/solr/SolrRelevancyFAQ

--
View this message in context: http://lucene.472066.n3.nabble.com/Boosting-Documents-tp4064955p4064966.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: shard splitting
clusterstate.json is now reporting shard3 as inactive. Any idea how to change clusterstate.json manually from the command line?

On 05/22/2013 08:59 AM, Arkadi Colson wrote:
> Hi, I tried to split a shard but it failed. If I try to do it again it does not start again. I see the two extra shards in /collections/messages/leader_elect/ and /collections/messages/leaders/. How can I fix this?
>
> root@solr07-dcg:/solr/messages_shard3_replica2# curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=messages&shard=shard3'
> [500 response: "splitshard the collection time out:300s" with full stack trace, quoted in the original message above] [...]
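One way to inspect and clean up ZooKeeper state by hand is ZooKeeper's own zkCli.sh. A hedged transcript sketch (host:port and the leftover node names are assumptions for this cluster; hand-editing cluster state is risky, since the Overseer can rewrite clusterstate.json, so verify paths with ls before removing anything):

```
# Connect with ZooKeeper's CLI (adjust host:port for your ensemble)
zkCli.sh -server localhost:2181

# Inside the zkCli shell: dump the current cluster state
get /clusterstate.json

# Inspect the leftover sub-shard nodes from the failed split,
# then remove them (node names below are illustrative)
ls /collections/messages/leader_elect
rmr /collections/messages/leader_elect/shard3_0
rmr /collections/messages/leader_elect/shard3_1
```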
RE: Upgrade Solr index from 4.0 to 4.2.1
My index is originally of version 4.0. My methods failed with this configuration, so I changed solrconfig.xml in my index to both versions, LUCENE_42 and LUCENE_41. For each version, in each method (loading and IndexUpgrader), I see the same errors as before. Thanks.

-----Original Message-----
From: Elran Dvir
Sent: Tuesday, May 21, 2013 6:48 PM
To: solr-user@lucene.apache.org
Subject: RE: Upgrade Solr index from 4.0 to 4.2.1

Why LUCENE_42? Why not LUCENE_41? Do I still need to run IndexUpgrader, or will just loading be enough? Thanks.

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, May 21, 2013 2:52 PM
To: solr-user@lucene.apache.org
Subject: Re: Upgrade Solr index from 4.0 to 4.2.1

This is always something that gives me a headache, but what happens if you change luceneMatchVersion in solrconfig.xml to LUCENE_40? I'm assuming it's LUCENE_42...

Best,
Erick

On Tue, May 21, 2013 at 5:48 AM, Elran Dvir elr...@checkpoint.com wrote:

Hi all, I have a 4.0 Solr (sharded/cored) index. I upgraded Solr to 4.2.1 and tried to load the existing index with it. I got the following exception:

May 21, 2013 12:03:42 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: other_2013-05-04
	at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634)
	at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345)
	at java.util.concurrent.FutureTask.run(FutureTask.java:177)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482)
	at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345)
	at java.util.concurrent.FutureTask.run(FutureTask.java:177)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614)
	at java.lang.Thread.run(Thread.java:779)
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:822)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:618)
	at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021)
	at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051)
	... 10 more
Caused by: org.apache.solr.common.SolrException: Error opening new searcher
	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435)
	at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547)
	at org.apache.solr.core.SolrCore.<init>(SolrCore.java:797)
	... 13 more
Caused by: org.apache.solr.common.SolrException: Error opening Reader
	at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172)
	at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:183)
	at org.apache.solr.search.SolrIndexSearcher.<init>(SolrIndexSearcher.java:179)
	at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411)
	... 15 more
Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene40StoredFieldsIndex vs expected codec=Lucene41StoredFieldsIndex (resource: MMapIndexInput(path="/var/solr/multicore_solr/other_2013-05-04/data/index/_3gfk.fdx"))
	at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:140)
	at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:130)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.<init>(CompressingStoredFieldsReader.java:102)
	at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113)
	at org.apache.lucene.index.SegmentCoreReaders.<init>(SegmentCoreReaders.java:147)
	at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:56)
	at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62)
	at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
	at org.apache.lucene.index.StandardDirectoryReader.open(StandardDirectoryReader.java:52)
	at org.apache.lucene.index.DirectoryReader.open(DirectoryReader.java:88)
	at
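For reference, Lucene's IndexUpgrader is run from the lucene-core jar on the command line. A sketch of the invocation (the jar path and version are assumptions for your installation; the index directory below is taken from the exception above; back up the index first, as the upgrade rewrites it in place):

```
java -cp lucene-core-4.2.1.jar \
  org.apache.lucene.index.IndexUpgrader \
  -verbose /var/solr/multicore_solr/other_2013-05-04/data/index
```

Note that IndexUpgrader upgrades the index to the format of the Lucene version on the classpath, so the jar version must match the Solr release you intend to run.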
Re: Boosting Documents
Hi Oussama,

This is explained very nicely on the Solr wiki:
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the below:

<add>
  <doc boost="2.5">
    <field name="employeeId">05991</field>
    <field name="office" boost="2.0">Bridgewater</field>
  </doc>
</add>

What is not clear from your message is whether you need better scoring or better sorting. So, additionally, you can consider adding a secondary sort parameter for the docs having the same score:
http://wiki.apache.org/solr/CommonQueryParameters#sort

HTH,
Sandeep

On 22 May 2013 09:21, Oussama Jilal jilal.ouss...@gmail.com wrote:
> Thank you for your reply bbarani. I can't do that, because I want to boost some documents over others independently of the query. [...]
Re: [Solr 4.2.1] LotsOfCores - Can't query cores with loadOnStartup=true and transient=true
Hi Erick,

I opened an issue in JIRA: SOLR-4850. But I don't see how to change the assignee; I don't think I have permission to do it.

Thank you.
Best regards,
Lyuba

On Mon, May 20, 2013 at 6:05 PM, Erick Erickson erickerick...@gmail.com wrote:
> Lyuba: Could you go ahead and raise a JIRA and assign it to me to investigate? You should definitely be able to define cores this way. Thanks, Erick
>
> On Sun, May 19, 2013 at 9:27 AM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote:
>> Hi, It seems that in order to query transient cores they must be defined with loadOnStartup=false. I define one core with loadOnStartup=true and transient=false, the other cores with loadOnStartup=true and transient=true, and transientCacheSize=Integer.MAX_VALUE. In this case CoreContainer.dynamicDescriptors will be empty, and then CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String) return null for all transient cores. I looked at the code of 4.3.0 and it doesn't seem that the flow was changed; the core is added only if it's not loaded on startup. Could you please assist with this issue? Best regards, Lyuba
Re: Boosting Documents
Thank you Sandeep. I did post the document like that (a minor difference is that I did not add the boost to a field, since I don't want to boost a specific field; I boosted the whole document, <doc boost="2.0">...</doc>), but the issue is that everything in the query results has the same score, even though the documents were indexed with different boosts, and I can't sort on another field since this is independent of any field value. Any ideas?

On 05/22/2013 10:30 AM, Sandeep Mestry wrote:
> Hi Oussama, This is explained very nicely on the Solr wiki:
> http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
> http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22
> [...]
Re: Boosting Documents
I don't know if this is the issue or not, but considering this note from the wiki:

"NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for any fields where the index-time boost should be stored."

In my case, where I only need to boost the whole document (not a specific field), do I have to set omitNorms=false for all the fields in the schema?

On 05/22/2013 10:41 AM, Oussama Jilal wrote:
> Thank you Sandeep. I did post the document like that (a minor difference is that I did not add the boost to a field, since I don't want to boost a specific field; I boosted the whole document, <doc boost="2.0">...</doc>), but the issue is that everything in the query results has the same score, even though the documents were indexed with different boosts, and I can't sort on another field since this is independent of any field value. Any ideas? [...]
Regular expression in solr
Hi,

How do we search based upon regular expressions in Solr?

Regards,
Sagar

DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
Re: Boosting Documents
I think that is applicable only for field-level boosting, not for document-level boosting. Can you post your query, field definition, and the results you're expecting? I am using index- and query-time boosting without any issues so far. Also, which version of Solr are you using?

On 22 May 2013 10:44, Oussama Jilal jilal.ouss...@gmail.com wrote:
> I don't know if this is the issue or not, but considering this note from the wiki: "NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for any fields where the index-time boost should be stored." In my case, where I only need to boost the whole document (not a specific field), do I have to set omitNorms=false for all the fields in the schema? [...]
Re: Boosting Documents
I don't know if this can help (since the document boost should be independent of any schema), but here is my schema:

<?xml version="1.0" encoding="UTF-8"?>
<schema name="" version="1.5">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
    <fieldType name="long" class="solr.TrieLongField" sortMissingLast="true" precisionStep="0" positionIncrementGap="0" />
    <fieldType name="text" class="solr.TextField" sortMissingLast="true" omitNorms="true">
      <analyzer type="index">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="255" />
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.KeywordTokenizerFactory" />
        <filter class="solr.LowerCaseFilterFactory" />
      </analyzer>
    </fieldType>
  </types>
  <fields>
    <field name="Id" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="Suggestion" type="text" indexed="true" stored="true" multiValued="false" required="false" />
    <field name="Type" type="string" indexed="true" stored="true" multiValued="false" required="true" />
    <field name="Sections" type="string" indexed="true" stored="true" multiValued="true" required="false" />
    <field name="_version_" type="long" indexed="true" stored="true" />
  </fields>
  <copyField source="Id" dest="Suggestion" />
  <uniqueKey>Id</uniqueKey>
  <defaultSearchField>Suggestion</defaultSearchField>
</schema>

My query is something like: Suggestion:"Olive Oil". The result is 9 documents, which all have the same score, 11.287682, even though they were indexed with different boosts (I am sure of this).

On 05/22/2013 10:54 AM, Sandeep Mestry wrote:
> I think that is applicable only for field-level boosting, not for document-level boosting. Can you post your query, field definition, and the results you're expecting? I am using index- and query-time boosting without any issues so far. Also, which version of Solr are you using? [...]
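A note on the mechanism under discussion in this thread: in Lucene/Solr of this era, an index-time document boost is multiplied into each field's boost and stored in that field's norm, so a field whose type sets omitNorms="true" (as the text type in the schema above does) silently discards the boost at index time. A hedged sketch of the change that would let the boost affect scoring on the queried field (re-indexing is required after changing it; this also re-enables length normalization, which may shift scores for other reasons):

```xml
<!-- omitNorms="false" keeps norms for fields of this type; the index-time
     document/field boost is folded into the norm and so survives into
     scoring, at the cost of one byte per field per document -->
<fieldType name="text" class="solr.TextField" sortMissingLast="true" omitNorms="false">
  <!-- analyzers unchanged from the schema above -->
</fieldType>
```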
Re: Regular expression in solr
You can write a regular expression query like this (you need to specify the regex between slashes, / /):

fieldName:/[rR]egular.*/

On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote:
> Hi, How do we search based upon regular expressions in Solr? Regards, Sagar [...]
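One detail worth knowing about the /regex/ syntax (available from Solr 4.x via the Lucene 4 query parser): the pattern is matched against whole indexed terms and is implicitly anchored at both ends, so it is not a substring search. A small Python sketch illustrating the anchored semantics with re.fullmatch as a stand-in (this models the matching behaviour only; it does not query Solr, and your field's analysis chain still determines what the indexed terms look like):

```python
import re

def lucene_style_matches(pattern, terms):
    """Mimic Lucene regexp-query semantics: the pattern must match
    the ENTIRE term, as field:/pattern/ does in a Solr query."""
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]

terms = ["regular", "Regular", "irregular", "regularity"]
# 'irregular' is excluded: the match is anchored, not a substring search
print(lucene_style_matches(r"[rR]egular.*", terms))
```

To match a term containing "regular" anywhere, the pattern itself must say so, e.g. fieldName:/.*regular.*/.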
Re: Boosting Documents
Did you use the debugQuery=true in solr console to see how the query is being interpreted and the result calculation? Also, I'm not sure but this copyfield directive seems a bit confusing to me.. copyField source=Id dest=Suggestion / Because multiValued is false for Suggestion field so does that schema mean Suggestion has value only from Id and not from any other input? You haven't mentioned the version of Solr, can you also post the query params? On 22 May 2013 11:04, Oussama Jilal jilal.ouss...@gmail.com wrote: I don't know if this can help (since the document boost should be independent of any schema) but here is my schema : |?xml version=1.0 encoding=UTF-8? schema name= version=1.5 types fieldType name=string class=solr.StrField sortMissingLast=true / fieldType name=long class=solr.TrieLongField sortMissingLast=true precisionStep=0 positionIncrementGap=0 / fieldType name=text class=solr.TextField sortMissingLast=true omitNorms=true analyzer type=index tokenizer class=solr.**KeywordTokenizerFactory / filter class=solr.**LowerCaseFilterFactory / filter class=solr.**EdgeNGramFilterFactory maxGramSize=255 / /analyzer analyzer type=query tokenizer class=solr.**KeywordTokenizerFactory / filter class=solr.**LowerCaseFilterFactory / /analyzer /fieldType /types fields field name=Id type=string indexed=true stored=true multiValued=false required=true / field name=Suggestion type=text indexed=true stored=true multiValued=false required=false / field name=Type type=string indexed=true stored=true multiValued=false required=true / field name=Sections type=string indexed=true stored=true multiValued=true required=false / field name=_version_ type=long indexed=true stored=true/ /fields copyField source=Id dest=Suggestion / uniqueKeyId/uniqueKey defaultSearchField**Suggestion/**defaultSearchField /schema| My query is somthing like : Suggestion:Olive Oil. 
The result is 9 documents, wich all has the same score 11.287682, even if they had been indexed with different boosts (I am sure of this). On 05/22/2013 10:54 AM, Sandeep Mestry wrote: I think that is applicable only for the field level boosting and not at document level boosting. Can you post your query, field definition and results you're expecting. I am using index and query time boosting without any issues so far. also which version of Solr you're using? On 22 May 2013 10:44, Oussama Jilal jilal.ouss...@gmail.com wrote: I don't know if this is the issue or not but, concidering this note from the wiki : NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for any fields where the index-time boost should be stored. In my case where I only need to boost the whole document (not a specific field), do I have to activate the omitNorms=false for all the fields in the schema ? On 05/22/2013 10:41 AM, Oussama Jilal wrote: Thank you Sandeep, I did post the document like that (a minor difference is that I did not add the boost to the field since I don't want to boost on specific field, I boosted the whole document 'doc boost=2.0 /doc'), but the issue is that everything in the queries results has the same score even if they had been indexed with different boosts, and I can't sort on another field since this is independent from any field value. Any ideas ? On 05/22/2013 10:30 AM, Sandeep Mestry wrote: Hi Oussama, This is explained very nicely on Solr Wiki.. 
http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts
http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22add.22

All you need to do is something similar to the below:

  <add>
    <doc boost="2.5">
      <field name="employeeId">05991</field>
      <field name="office" boost="2.0">Bridgewater</field>
    </doc>
  </add>

What is not clear from your message is whether you need better scoring or better sorting. So, additionally, you can consider adding a secondary sort parameter for the docs having the same score.
RE: Regular expression in solr
@Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required?

-----Original Message-----
From: Oussama Jilal [mailto:jilal.ouss...@gmail.com]
Sent: Wednesday, May 22, 2013 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Regular expression in solr

You can write a regular expression query like this (you need to specify the regex between slashes):

  fieldName:/[rR]egular.*/

On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote:
Hi, How do we search based upon regular expressions in Solr? Regards, Sagar

DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and/or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
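No additional settings are needed on the client side beyond URL-encoding the query, since the regex characters (slashes, brackets) are not URL-safe. A minimal sketch of building such a request URL, assuming a hypothetical local Solr instance and the fieldName:/[rR]egular.*/ example from above:

```python
from urllib.parse import urlencode

def regex_query_url(base, field, regex):
    # The regex goes between forward slashes inside the q parameter;
    # the whole query string must still be URL-encoded before sending.
    q = "{0}:/{1}/".format(field, regex)
    return base + "?" + urlencode({"q": q, "wt": "json"})

url = regex_query_url("http://localhost:8983/solr/select",
                      "fieldName", "[rR]egular.*")
```

The base URL and field name here are illustrative placeholders, not values from the thread.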
Re: Regular expression in solr
I don't think so, it always worked for me without anything special, just try it and see :)

On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote:
@Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required?
Re: Boosting Documents
Yes, I did debug it and there is nothing special about it; everything is treated the same. My Solr version is 4.2. The copyField is used because the two fields are of different types but only one value is indexed in them (so no multiValued is required, and it works perfectly).

On 05/22/2013 11:18 AM, Sandeep Mestry wrote:
Did you use debugQuery=true in the Solr console to see how the query is being interpreted and how the score is calculated? Also, I'm not sure, but this copyField directive seems a bit confusing to me. You haven't mentioned the version of Solr; can you also post the query params?
RE: Regular expression in solr
Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in Solr?

-----Original Message-----
From: Oussama Jilal [mailto:jilal.ouss...@gmail.com]
Sent: Wednesday, May 22, 2013 4:00 PM
To: solr-user@lucene.apache.org
Subject: Re: Regular expression in solr

I don't think so, it always worked for me without anything special, just try it and see :)
synonym indexing in solr
Hi, Since synonym searching has some limitations in Solr, I wanted to know the procedure for synonym indexing in Solr. Please let me know if any guide is available for that. Regards, Sagar
Re: Regular expression in solr
I am not sure, but I heard it works with the Java regex engine (a little obvious if it is true...), so any Java regex tutorial would help you.

On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote:
Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in Solr?
Re: synonym indexing in solr
Hello, I think that what is written about the SynonymFilterFactory in the wiki is well explained, so I will direct you there: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory

On 05/22/2013 11:44 AM, Sagar Chaturvedi wrote:
Hi, Since synonym searching has some limitations in Solr, I wanted to know the procedure for synonym indexing in Solr.
Re: [custom data structure] aligned dynamic fields
Jack, Thanks for your response.

1. Flattening could be an option, although our scale and required functionality (runtime non-DocValues-backed facets) is beyond what Solr 3 can handle (billions of docs). We have flattened the metadata at the expense of over-generating Solr documents. But solving the problem I have described via flattening would have a big impact on scalability and price.

2. We have quite the opposite of what you have described about the dynamic fields: there will be very few per document. I agree that caution should be taken here, as we have suffered (or should I say experienced) having multivalued fields (the good thing is we never had to facet on them).

Any other options? Maybe someone can share their experience with dynamic fields and discourage us from pursuing this path? Dmitry

On Mon, May 20, 2013 at 4:23 PM, Jack Krupansky j...@basetechnology.com wrote:
Before you dive off the deep end and go crazy with dynamic fields, try a clean, simple, Solr-oriented static design. Yes, you CAN do an over-complicated design with dynamic fields, but that doesn't mean you should. In a single phrase: denormalize and flatten your design. Sure, that will lead to a lot of rows, but Solr and Lucene are designed to do well in that scenario. If you are still thinking in terms of C structs, go for a long walk or do SOMETHING else until you can get that idea out of your head. It is a sub-optimal approach for exploiting the power of Lucene and Solr. Stay with a static schema design until you hit... just stay with a static schema, period. Dynamic fields and multi-valued fields do have value, but only when used in moderation - small numbers. If you start down a design path and find that you are heavily dependent on dynamic fields and/or multi-valued fields with large numbers of values per document, that is feedback that your design needs to be denormalized and flattened further.
-- Jack Krupansky

-----Original Message-----
From: Dmitry Kan
Sent: Monday, May 20, 2013 7:06 AM
To: solr-user@lucene.apache.org
Subject: [custom data structure] aligned dynamic fields

Hi all, Our current project requirement suggests that we should start storing custom data structures in the Solr index. The custom data structure would be an equivalent of a C struct. The task is as follows. Suppose we have two types of fields, one is FieldName1 and the other FieldName2. Suppose also that we can have multiple pairs of these two fields on a document in Solr. That is, in the notation of dynamic fields:

  doc1:
    FieldName1_id1, FieldName2_id1
    FieldName1_id2, FieldName2_id2

  doc2:
    FieldName1_id3, FieldName2_id3
    FieldName1_id4, FieldName2_id4
    FieldName1_id5, FieldName2_id5

What we would like to have is a value for FieldName1_(some_unique_id) and a value for FieldName2_(some_unique_id) as input for search. That is, we wouldn't care about the some_unique_id in some search scenarios, and the search would automatically iterate the pairs of dynamic fields and respect the pairings. I know it used to be that, with dynamic fields, a client must provide the dynamically generated field names coupled with their values up front when searching. What data structure / solution could be used as an alternative approach to help such a structured search? Thanks, Dmitry
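Jack's denormalize-and-flatten advice can be sketched as follows (hypothetical field names; this is not code from the thread). One Solr document is emitted per struct pair, with a parent id tying pair members back to the original record, so a query can match FieldName1 and FieldName2 values of the same pair without knowing the ids up front:

```python
def flatten(doc_id, pairs):
    # Emit one child document per (FieldName1, FieldName2) pair.
    # The 'parent' field lets results be grouped back to the source record.
    return [
        {"id": "{0}_{1}".format(doc_id, i), "parent": doc_id,
         "FieldName1": f1, "FieldName2": f2}
        for i, (f1, f2) in enumerate(pairs)
    ]

docs = flatten("doc1", [("a", "x"), ("b", "y")])
```

A paired search then becomes an ordinary conjunction, e.g. FieldName1:a AND FieldName2:x, at the cost of more documents in the index (the trade-off Dmitry notes above).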
Re: Boosting Documents
I'm running out of options now; I can't really see the issue you're facing unless the debug analysis is posted. I think a thorough debugging is required at both the application and Solr level. If you want customized scoring from Solr, you can also consider overriding the DefaultSimilarity implementation - but that'll be a separate issue.

On 22 May 2013 11:32, Oussama Jilal jilal.ouss...@gmail.com wrote:
Yes, I did debug it and there is nothing special about it; everything is treated the same. My Solr version is 4.2. The copyField is used because the two fields are of different types but only one value is indexed in them (so no multiValued is required, and it works perfectly).
Re: Regular expression in solr
I just can't get the $ anchor to work.

On 05/22/2013, Oussama Jilal wrote:
I am not sure, but I heard it works with the Java regex engine (a little obvious if it is true...), so any Java regex tutorial would help you.

-- Stéphane Roux hab...@habett.org http://habett.net
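A possible explanation for the $ problem, offered as a hedged aside rather than anything stated in the thread: Lucene's regexp queries match entire indexed terms, i.e. the pattern is implicitly anchored at both ends, and the Lucene RegExp syntax does not support ^ or $ at all. The behavior is closer to Python's re.fullmatch than to re.search, which the following sketch illustrates by analogy:

```python
import re

term = "regular"

# Lucene regexp queries behave like fullmatch on each term:
# the pattern must cover the whole term, so $ is unnecessary.
assert re.fullmatch("[rR]egular.*", term) is not None

# To match a suffix, lead with .* instead of trailing with $.
assert re.fullmatch(".*lar", term) is not None
assert re.fullmatch("lar", term) is None
```

So a query like fieldName:/.*lar/ would play the role that a trailing $ plays in an unanchored engine.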
Re: Boosting Documents
OK, thank you for your help. I think I will have to treat the problem in another way, even if it will complicate things for me. Thanks again.

On 05/22/2013 11:51 AM, Sandeep Mestry wrote:
I'm running out of options now; I can't really see the issue you're facing unless the debug analysis is posted. I think a thorough debugging is required at both the application and Solr level. If you want customized scoring from Solr, you can also consider overriding the DefaultSimilarity implementation - but that'll be a separate issue.
RE: synonym indexing in solr
Thanks. Already used it. Quite easy to setup. But it tells how to setup Synonym search. I am asking about synonym indexing. -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:18 PM To: solr-user@lucene.apache.org Subject: Re: synonym indexing in solr Hello, I think that what is written about the SynonymFilterFactory in the wiki is well explained, so I will direct you there : http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory On 05/22/2013 11:44 AM, Sagar Chaturvedi wrote: Hi, Since synonym searching has some limitations in solr, so I wanted to know the procedure of Synonym indexing in solr? Please let me know if any guide is available for that. Regards, Sagar DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
Re: Regular expression in solr
There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results you get will basically depend on your way of indexing; if you use the regex on a tokenized field and that is not what you want, try to use a copy field which is not tokenized and then use the regex on that one. On 05/22/2013 11:53 AM, Stéphane Habett Roux wrote: I just can't get the $ endpoint to work. I am not sure but I heard it works with the Java Regex engine (a little obvious if it is true ...), so any Java regex tutorial would help you. On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote: Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in solr? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr I don't think so, it always worked for me without anything special, just try it and see :) On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote: @Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 3:37 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr You can write a regular expression query like this (you need to specify the regex between slashes / ) : fieldName:/[rR]egular.*/ On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote: Hi, How do we search based upon regular expressions in solr? Regards, Sagar DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates.
Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
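A hedged sketch of sending such a regex query over HTTP (the field name is illustrative): the pattern sits between slashes and, since Solr regexes match whole tokens, no ^/$ anchors are used, but the query string still needs URL encoding before going into the request URL:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class RegexQueryDemo {
    public static void main(String[] args) {
        // Solr regex syntax: the pattern is placed between slashes and is
        // matched against individual tokens, so ^ and $ are not needed.
        String q = "fieldName:/[rR]egular.*/";

        // URL-encode the query before appending it as /select?q=...
        String encoded = URLEncoder.encode(q, StandardCharsets.UTF_8);
        System.out.println(encoded); // fieldName%3A%2F%5BrR%5Degular.*%2F
    }
}
```

Remember that the regex runs against analyzed tokens, so the field's analysis chain determines what the pattern can actually match.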
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Sandeep: You need to be a little careful here, I second Shawn's comment that you are mixing versions. You say you are using solr 4.0. But the jar that ships with that is apache-solr-core-4.0.0.jar. Then you talk about using solr-core, which is called solr-core-4.1.jar. Maven is not officially supported, so grabbing some solr-core.jar (with no apache) and doing _anything_ with it from a 4.0 code base is not a good idea. You can check out the 4.0 code branch and just compile the whole thing. Or you can get a new 4.0 distro and use the jars there. But I'd be _really_ cautious about using a 4.1 or later jar with 4.0. FWIW, Erick On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry sanmes...@gmail.com wrote: Thanks Steve, I could find solr-core.jar in the repo but could not find apache-solr-core.jar. I think my issue got misunderstood - which is totally my fault. Anyway, I took into account Shawn's comment and will use solr-core.jar only for compiling the project - not for deploying. Thanks, Sandeep On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote: The 4.0 solr-core jar is available in Maven Central: http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar Steve On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Steve, Solr 4.0 - mentioned in the subject.. :-) Thanks, Sandeep On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote: Sandeep, What version of Solr are you using? Steve On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Shawn, Thanks for your reply. I'm not mixing versions. The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. 
The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Re: Upgrade Solr index from 4.0 to 4.2.1
LUCENE_40 since your original index was built with 4.0. As for the other, I'll defer to people who actually know what they're talking about. Best Erick On Wed, May 22, 2013 at 5:19 AM, Elran Dvir elr...@checkpoint.com wrote: My index is originally of version 4.0. My methods failed with this configuration. So, I changed solrconfig.xml in my index to both versions: LUCENE_42 and LUCENE_41. For each version in each method (loading and IndexUpgrader), I see the same errors as before. Thanks. -Original Message- From: Elran Dvir Sent: Tuesday, May 21, 2013 6:48 PM To: solr-user@lucene.apache.org Subject: RE: Upgrade Solr index from 4.0 to 4.2.1 Why LUCENE_42? Why not LUCENE_41? Do I still need to run IndexUpgrader, or will just loading be enough? Thanks. -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, May 21, 2013 2:52 PM To: solr-user@lucene.apache.org Subject: Re: Upgrade Solr index from 4.0 to 4.2.1 This is always something that gives me a headache, but what happens if you change luceneMatchVersion in solrconfig.xml to LUCENE_40? I'm assuming it's LUCENE_42... Best Erick On Tue, May 21, 2013 at 5:48 AM, Elran Dvir elr...@checkpoint.com wrote: Hi all, I have a 4.0 Solr (sharded/cored) index. I upgraded Solr to 4.2.1 and tried to load the existing index with it.
I got the following exception: May 21, 2013 12:03:42 PM org.apache.solr.common.SolrException log SEVERE: null:org.apache.solr.common.SolrException: Unable to create core: other_2013-05-04 at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:1672) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1057) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:634) at org.apache.solr.core.CoreContainer$3.call(CoreContainer.java:629) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:482) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:345) at java.util.concurrent.FutureTask.run(FutureTask.java:177) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1121) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:614) at java.lang.Thread.run(Thread.java:779) Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.init(SolrCore.java:822) at org.apache.solr.core.SolrCore.init(SolrCore.java:618) at org.apache.solr.core.CoreContainer.createFromLocal(CoreContainer.java:1021) at org.apache.solr.core.CoreContainer.create(CoreContainer.java:1051) ... 10 more Caused by: org.apache.solr.common.SolrException: Error opening new searcher at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1435) at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1547) at org.apache.solr.core.SolrCore.init(SolrCore.java:797) ... 
13 more Caused by: org.apache.solr.common.SolrException: Error opening Reader at org.apache.solr.search.SolrIndexSearcher.getReader(SolrIndexSearcher.java:172) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:183) at org.apache.solr.search.SolrIndexSearcher.init(SolrIndexSearcher.java:179) at org.apache.solr.core.SolrCore.openNewSearcher(SolrCore.java:1411) ... 15 more Caused by: org.apache.lucene.index.CorruptIndexException: codec mismatch: actual codec=Lucene40StoredFieldsIndex vs expected codec=Lucene41StoredFieldsIndex (resource: MMapIndexInput(path=/var/solr/multicore_solr/other_2013-05-04/data/index/_3gfk.fdx)) at org.apache.lucene.codecs.CodecUtil.checkHeaderNoMagic(CodecUtil.java:140) at org.apache.lucene.codecs.CodecUtil.checkHeader(CodecUtil.java:130) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsReader.init(CompressingStoredFieldsReader.java:102) at org.apache.lucene.codecs.compressing.CompressingStoredFieldsFormat.fieldsReader(CompressingStoredFieldsFormat.java:113) at org.apache.lucene.index.SegmentCoreReaders.init(SegmentCoreReaders.java:147) at org.apache.lucene.index.SegmentReader.init(SegmentReader.java:56) at org.apache.lucene.index.StandardDirectoryReader$1.doBody(StandardDirectoryReader.java:62) at
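A hedged sketch of the IndexUpgrader route discussed in this thread (the jar name is illustrative; the index path is the one from the stack trace). The tool rewrites the index in the format of the Lucene version on the classpath, so back up the index directory first:

```
java -cp lucene-core-4.2.1.jar org.apache.lucene.index.IndexUpgrader \
  -delete-prior-commits -verbose /var/solr/multicore_solr/other_2013-05-04/data/index
```

Note that the codec-mismatch error above suggests some segments may already have been written with a mixed format, in which case upgrading alone may not be sufficient.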
Re: [Solr 4.2.1] LotsOfCores - Can't query cores with loadOnStartup=true and transient=true
Thanks, I saw that and assigned it to myself. On the original form when you create the issue, there's an assign to entry field, but I don't know whether you see the same thing Best Erick On Wed, May 22, 2013 at 5:36 AM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote: Hi Erick, I opened an issue in JIRA: SOLR-4850. But I don't see how to change an assignee, I don't think that I have permissions to do it. Thank you. Best regards, Lyuba On Mon, May 20, 2013 at 6:05 PM, Erick Erickson erickerick...@gmail.comwrote: Lyuba: Could you go ahead and raise a JIRA and assign it to me to investigate? You should definitely be able to define cores this way. Thanks, Erick On Sun, May 19, 2013 at 9:27 AM, Lyuba Romanchuk lyuba.romanc...@gmail.com wrote: Hi, It seems like in order to query transient cores they must be defined with loadOnStartup=false. I define one core loadOnStartup=true and transient=false, and another cores to be loadOnStartup=true and transient=true, and transientCacheSize=Integer.MAX_VALUE. In this case CoreContainer.dynamicDescriptors will be empty and then CoreContainer.getCoreFromAnyList(String) and CoreContainer.getCore(String) returns null for all transient cores. I looked at the code of 4.3.0 and it doesn't seem that the flow was changed, the core is added only if it's not loaded on start up. Could you please assist with this issue? Best regards, Lyuba
Re: Solr 4.0 war startup issue - apache-solr-core.jar Vs solr-core
Thanks Erick for your suggestion. Turns out I won't be going that route after all as the highlighter component is quite complicated - to follow and to override - and not much time left in hand so did it the manual (dirty) way. Best Regards, Sandeep On 22 May 2013 12:21, Erick Erickson erickerick...@gmail.com wrote: Sandeep: You need to be a little careful here, I second Shawn's comment that you are mixing versions. You say you are using solr 4.0. But the jar that ships with that is apache-solr-core-4.0.0.jar. Then you talk about using solr-core, which is called solr-core-4.1.jar. Maven is not officially supported, so grabbing some solr-core.jar (with no apache) and doing _anything_ with it from a 4.0 code base is not a good idea. You can check out the 4.0 code branch and just compile the whole thing. Or you can get a new 4.0 distro and use the jars there. But I'd be _really_ cautious about using a 4.1 or later jar with 4.0. FWIW, Erick On Tue, May 21, 2013 at 12:05 PM, Sandeep Mestry sanmes...@gmail.com wrote: Thanks Steve, I could find solr-core.jar in the repo but could not find apache-solr-core.jar. I think my issue got misunderstood - which is totally my fault. Anyway, I took into account Shawn's comment and will use solr-core.jar only for compiling the project - not for deploying. Thanks, Sandeep On 21 May 2013 16:46, Steve Rowe sar...@gmail.com wrote: The 4.0 solr-core jar is available in Maven Central: http://search.maven.org/#artifactdetails%7Corg.apache.solr%7Csolr-core%7C4.0.0%7Cjar Steve On May 21, 2013, at 11:26 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Steve, Solr 4.0 - mentioned in the subject.. :-) Thanks, Sandeep On 21 May 2013 14:58, Steve Rowe sar...@gmail.com wrote: Sandeep, What version of Solr are you using? Steve On May 21, 2013, at 6:55 AM, Sandeep Mestry sanmes...@gmail.com wrote: Hi Shawn, Thanks for your reply. I'm not mixing versions.
The problem I faced is I want to override Highlighter from solr-core jar and if I add that as a dependency in my project then there was a clash between solr-core.jar and the apache-solr-core.jar that comes bundled within the solr distribution. It was complaining about MorfologikFilterFactory classcastexception. I can't use apache-solr-core.jar as a dependency as no such jar exists in any maven repo. The only thing I could do is to remove apache-solr-core.jar from solr.war and then use solr-core.jar as a dependency - however I do not think this is the ideal solution. Thanks, Sandeep On 20 May 2013 15:18, Shawn Heisey s...@elyograg.org wrote: On 5/20/2013 8:01 AM, Sandeep Mestry wrote: And I do remember the discussion on the forum about dropping the name *apache* from solr jars. If that's what caused this issue, then can you tell me if the mirrors need updating with solr-core.jar instead of apache-solr-core.jar? If it's named apache-solr-core, then it's from 4.0 or earlier. If it's named solr-core, then it's from 4.1 or later. That might mean that you are mixing versions - don't do that. Make sure that you have jars from the exact same version as your server. Thanks, Shawn
Solr Faceting doesn't return values.
Hello, I have a field defined in my schema.xml like so:

<field name="sa_site_city" type="string" indexed="true" stored="true"/>

string is a type:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" />

When I run the query for faceting data by the city:

http://XX.XX.XX.XX/solr/collection1/select?q=mm_state_code&wt=json&indent=true&facet=true&facet.field=sa_site_city

I get an empty result like so:

{
  "responseHeader":{
    "status":0,
    "QTime":1,
    "params":{
      "facet":"true",
      "indent":"true",
      "q":"mm_state_code",
      "facet.field":"sa_site_city",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "sa_site_city":[]},
    "facet_dates":{},
    "facet_ranges":{}}}

I wonder what am I doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr starting time takes too long
Zhang: In 3.6, there's really no choice except to load all the cores on startup. 10 minutes still seems excessive, do you perhaps have a heavy-weight firstSearcher query? Yes, soft commits are 4.x only, so that's not your problem. There's a shareSchema option that tries to only load 1 copy of the schema that should help, but that doesn't help with loading solrconfig.xml. Also in the 4.3+ world there's the option to lazily-load cores, see: http://wiki.apache.org/solr/LotsOfCores for the overview. Perhaps not an option, but I thought I'd mention it. But I'm afraid you're stuck. You might be able to run bigger hardware (perhaps you're memory-starved). Other than that, you may need to use more than one machine to get fast enough startup times. Best, Erick On Wed, May 22, 2013 at 3:27 AM, Zhang, Lisheng lisheng.zh...@broadvision.com wrote: Thanks very much for quick helps! I searched but it seems that autoSoftCommit is solr 4x feature and we are still using 3.6.1? Best regards, Lisheng -Original Message- From: Carlos Bonilla [mailto:carlosbonill...@gmail.com] Sent: Wednesday, May 22, 2013 12:17 AM To: solr-user@lucene.apache.org Subject: Re: solr starting time takes too long Hi Lisheng, I had the same problem when I enabled the autoSoftCommit in solrconfig.xml. If you have it enabled, disabling it could fix your problem, Cheers. Carlos. 2013/5/22 Zhang, Lisheng lisheng.zh...@broadvision.com Hi, We are using solr 3.6.1, our application has many cores (more than 1K), the problem is that solr starting took a long time (10m). Examing log file and code we found that for each core we loaded many resources, but in our app, we are sure we are always using the same solrconfig.xml and schema.xml for all cores. While we can config schema.xml to be shared, we cannot share SolrConfig object. But looking inside SolrConfig code, we donot use any of the cache. Could we somehow change config (or source code) to share resource between cores to reduce solr starting time? 
Thanks very much for helps, Lisheng
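A hedged sketch of the two options Erick mentions, assuming the legacy solr.xml format (core names are illustrative): shareSchema to load identical schema.xml files only once, and the 4.3+ LotsOfCores attributes for lazy loading:

```xml
<solr persistent="true">
  <!-- shareSchema="true" lets cores with identical schema.xml share one copy. -->
  <cores adminPath="/admin/cores" shareSchema="true" transientCacheSize="100">
    <!-- 4.3+ only: don't load at startup, evict from cache when idle. -->
    <core name="core0001" instanceDir="core0001" loadOnStartup="false" transient="true"/>
    <core name="core0002" instanceDir="core0002" loadOnStartup="false" transient="true"/>
  </cores>
</solr>
```

On 3.6.1 only the shareSchema part applies; the transient/loadOnStartup attributes are ignored before 4.3.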
Re: synonym indexing in solr
Look at the text_general type (solr 4.x) in the example schema.xml. That has an example of including synonyms at index time (although it is commented out, but you can get the idea). So to substitute synonyms at index time, just uncomment the index-time analyzer mention of synonyms and comment out the one in the query-time analysis chain. Be cautious about doing synonym expansion at both index and query time. It's perfectly legal but often not what you want if you use the same synonym list. Best Erick On Wed, May 22, 2013 at 7:02 AM, Sagar Chaturvedi sagar.chaturv...@nectechnologies.in wrote: Thanks. Already used it. Quite easy to setup. But it tells how to setup Synonym search. I am asking about synonym indexing. -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:18 PM To: solr-user@lucene.apache.org Subject: Re: synonym indexing in solr Hello, I think that what is written about the SynonymFilterFactory in the wiki is well explained, so I will direct you there : http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory On 05/22/2013 11:44 AM, Sagar Chaturvedi wrote: Hi, Since synonym searching has some limitations in solr, so I wanted to know the procedure of Synonym indexing in solr? Please let me know if any guide is available for that. Regards, Sagar DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited.
If you have received this email in error please delete it and notify the sender immediately.
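A hedged sketch of what Erick describes (the field type name and filter attributes follow the stock example schema; treat the details as illustrative): the SynonymFilterFactory sits in the index-time chain, and the query-time chain omits it:

```xml
<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- Synonyms substituted at index time instead of query time. -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <!-- No SynonymFilterFactory here, so expansion happens only once. -->
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Documents must be reindexed after this change for the index-time synonyms to take effect.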
Re: Solr Faceting doesn't return values.
Probably you're not querying the field you think you are. Try adding debug=all to the URL and I think you'll see something like default_search_field:mm_state_code, which means you're searching for the literal phrase mm_state_code in your default search field (defined in solrconfig.xml for the handler you're using). You won't get any facets if you don't have any documents that match. Best Erick On Wed, May 22, 2013 at 7:42 AM, samabhiK qed...@gmail.com wrote: Hello, I have a field defined in my schema.xml like so: <field name="sa_site_city" type="string" indexed="true" stored="true"/> string is a type: <fieldType name="string" class="solr.StrField" sortMissingLast="true" /> When I run the query for faceting data by the city: http://XX.XX.XX.XX/solr/collection1/select?q=mm_state_code&wt=json&indent=true&facet=true&facet.field=sa_site_city I get an empty result like so: { "responseHeader":{ "status":0, "QTime":1, "params":{ "facet":"true", "indent":"true", "q":"mm_state_code", "facet.field":"sa_site_city", "wt":"json"}}, "response":{"numFound":0,"start":0,"docs":[]}, "facet_counts":{ "facet_queries":{}, "facet_fields":{ "sa_site_city":[]}, "facet_dates":{}, "facet_ranges":{}}} I wonder what am I doing wrong? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276.html Sent from the Solr - User mailing list archive at Nabble.com.
too many boolean clauses
I got: SyntaxError: Cannot parse 'name:Bbbbm' Using Solr 4.2.1. The name field type definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" types="characters.txt" />
    <filter class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" types="characters.txt" />
    <filter class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
</fieldType>

Any ideas how to fix it? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288.html Sent from the Solr - User mailing list archive at Nabble.com.
Sorting solr search results using multiple fields
hi all, I wanted to know: is there a way I can sort my documents based on 3 fields? I have fields like pop (which is basically the frequency of the searched term in history), autosug (auto-suggested words) and initial_boost (a copy field of autosug such that only a match on the initial term matches, with the whole sentence saved as one token). Now I want the documents to be returned as:
1. initial_boost with pop of 192
2. initial_boost with pop of 156
3. initial_boost with pop of 120
4. autosug with pop of 205
5. autosug with pop of 180
6. autosug with pop of 112
I have tried boosting the initial_boost field, and without the sort it does boost initial_boost over autosug, but as soon as I add sort=pop desc the documents get sorted according to the pop field, disturbing the boost on the fields that I had set. Help anyone... thanks in advance. regards Rohan
Re: setting the collection in cloudsolrserver without using setdefaultcollection.
On 5/21/2013 11:20 PM, mike st. john wrote: Is there any way to set the collection without passing setDefaultCollection in cloudsolrserver? I'm using cloudsolrserver with spring, and would like to autowire it. It's a query parameter: http://wiki.apache.org/solr/SolrCloud#Distributed_Requests Here's how you do it in SolrJ:

SolrQuery query = new SolrQuery();
query.set("collection", "collection3");

Thanks, Shawn
Re: Sorting solr search results using multiple fields
On 22 May 2013 18:26, Rohan Thakur rohan.i...@gmail.com wrote: hi all I wanted to know is there a way I can sort the my documents based on 3 fields I have fields like pop(which is basically frequency of the term searched history) and autosug(auto suggested words) and initial_boost(copy field of autosug such that only match with initial term match having whole sentence saved as one token) [...] You seem to be confusing boosting with sorting. If you sort the results, the boosts are irrelevant. You can sort on multiple fields by separating them by commas, as described under http://wiki.apache.org/solr/CommonQueryParameters#sort Regards, Gora
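A hedged sketch of the comma-separated form for the fields in this thread (assuming the fields used for sorting hold single, sortable values — sorting on a tokenized text field is not meaningful):

```
sort=initial_boost desc, pop desc
```

Solr applies the second field only as a tie-breaker within equal values of the first, which is why any relevance boosts are irrelevant once an explicit sort is given.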
Re: Crawl Anywhere -
Hi, I didn't see this question. Yes, I confirm Crawl-Anywhere can crawl in a distributed environment. If you have several huge web sites to crawl, you can dispatch crawling across several crawler engines. However, one single web site can only be crawled by one crawler engine at a time. This limitation should be removed in a future version. For your information, the new version 4.0.0 is now available as an open-source project hosted on Github - https://github.com/bejean/crawl-anywhere Regards. On 11/02/13 12:02, O. Klein wrote: Yes you can run CA on different machines. In Manage you have to set target and engine for this to work. I've never done this, so you have to contact the developer for more details. SivaKarthik wrote: Hi All, in our project, we need to download around millions of pages... so is there any support for doing the crawling in a distributed environment using the crawl-anywhere apps? Or what could be the alternatives...? Thanks in advance.. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039674.html Sent from the Solr - User mailing list archive at Nabble.com. -- Dominique Béjean +33 6 08 46 12 43 skype: dbejean www.eolya.fr www.crawl-anywhere.com www.mysolrserver.com
Re: [ANNOUNCE] Web Crawler
Hi, Crawl-Anywhere is now open-source - https://github.com/bejean/crawl-anywhere Best regards. On 02/03/11 10:02, findbestopensource wrote: Hello Dominique Bejean, Good job. We identified almost 8 open source web crawlers http://www.findbestopensource.com/tagged/webcrawler I don't know how far yours would be different from the rest. Your license states that it is not open source but it is free for personal use. Regards Aditya www.findbestopensource.com http://www.findbestopensource.com On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean dominique.bej...@eolya.fr mailto:dominique.bej...@eolya.fr wrote: Hi, I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java Web Crawler. It includes:

* a crawler
* a document processing pipeline
* a solr indexer

The crawler has a web administration interface in order to manage the web sites to be crawled. Each web site crawl is configured with a lot of possible parameters (not all mandatory):

* number of simultaneous items crawled by site
* recrawl period rules based on item type (html, PDF, …)
* item type inclusion / exclusion rules
* item path inclusion / exclusion / strategy rules
* max depth
* web site authentication
* language
* country
* tags
* collections
* ...

The pipeline includes various ready-to-use stages (text extraction, language detection, Solr ready-to-index xml writer, ...). All is very configurable and extendible, either by scripting or Java coding. With scripting technology, you can help the crawler to handle javascript links or help the pipeline to extract relevant titles and clean up the html pages (remove menus, headers, footers, ..). With Java coding, you can develop your own pipeline stage. The Crawl Anywhere web site provides good explanations and screen shots. All is documented in a wiki. The current version is 1.1.4.
You can download and try it out from here : www.crawl-anywhere.com http://www.crawl-anywhere.com Regards Dominique -- Dominique Béjean +33 6 08 46 12 43 skype: dbejean www.eolya.fr www.crawl-anywhere.com www.mysolrserver.com
Re: Solr Faceting doesn't return values.
Ok after I added debug=all to the query, I get:

{
  "responseHeader":{
    "status":0,
    "QTime":11,
    "params":{
      "facet":"true",
      "indent":"true",
      "q":"mm_state_code",
      "debug":"all",
      "facet.field":"sa_site_city",
      "wt":"json"}},
  "response":{"numFound":0,"start":0,"docs":[]},
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "sa_site_city":[]},
    "facet_dates":{},
    "facet_ranges":{}},
  "debug":{
    "rawquerystring":"mm_state_code",
    "querystring":"mm_state_code",
    "parsedquery":"sa_property_id:mm_state_code",
    "parsedquery_toString":"sa_property_id:mm_state_code",
    "explain":{},
    "QParser":"LuceneQParser",
    "timing":{
      "time":4.0,
      "prepare":{
        "time":2.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "debug":{"time":0.0}},
      "process":{
        "time":1.0,
        "query":{"time":0.0},
        "facet":{"time":0.0},
        "mlt":{"time":0.0},
        "highlight":{"time":0.0},
        "stats":{"time":0.0},
        "debug":{"time":1.0}}}}}

I have not defined any default facet field in the handler in the solrconfig.xml file. Also, there is plenty of data available and the field sa_site_city. What I am trying to understand is this: "parsedquery":"sa_property_id:mm_state_code". I have a field sa_property_id in the schema but I have not defined it in the query nor in solrconfig.xml, so why is it still evaluated? Any help in solving this problem will be greatly appreciated. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065294.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: [ANNOUNCE] Web Crawler
Hi, I did see this message (again). Please use the new dedicated Crawl-Anywhere forum for your next questions: https://groups.google.com/forum/#!forum/crawl-anywhere Did you solve your problem? Thank you Dominique On 29/01/13 09:28, SivaKarthik wrote: Hi, I resolved the issue Access denied for user 'crawler'@'localhost' (using password: YES) - the MySQL user crawler/crawler was created and privileges added as mentioned in the tutorial. Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4036978.html Sent from the Solr - User mailing list archive at Nabble.com. -- Dominique Béjean +33 6 08 46 12 43 skype: dbejean www.eolya.fr www.crawl-anywhere.com www.mysolrserver.com
search filter
Dear All, Can I write a search filter for a field whose value is either in a range or equal to a specific value? For example: select profiles with salary 5 to 10, or salary 0. So I expect profiles having salary 0, 5, 6, 7, 8, 9, or 10. It should be possible; can somebody help me with the syntax of the 'fq' filter please? Best Regards kamal
Re: Crawl Anywhere -
Hi, Crawl-Anywhere includes a customizable document processing pipeline. Crawl-Anywhere can also cache original crawled pages and documents in a MongoDB database. Best regards. Dominique On 11/02/13 06:16, SivaKarthik wrote: Dear Erick, Thanks for your reply. Yes, Nutch can meet my requirement, but the problem is that I want to store the crawled documents in HTML or XML format instead of MapReduce format. I am not sure Nutch plugins are available to convert into XML files. Please share if you have any idea. Thank You -- View this message in context: http://lucene.472066.n3.nabble.com/ANNOUNCE-Web-Crawler-tp2607831p4039619.html Sent from the Solr - User mailing list archive at Nabble.com. -- Dominique Béjean www.crawl-anywhere.com
Re: Solr Faceting doesn't return values.
Ok my bad. I do have a default field defined in the /select handler in the config file. <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">10</int> <str name="df">sa_property_id</str> </lst> But then how do I change my query now? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065298.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: search filter
Hello! You can try sending a filter like this fq=Salary:[5+TO+10]+OR+Salary:0 It should work -- Regards, Rafał Kuć Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch Dear All Can I write a search filter for a field having a value in a range or a specific value. Say if I want to have a filter like 1. Select profiles with salary 5 to 10 or Salary 0. So I expect profiles having salary either 0 , 5, 6, 7, 8, 9, 10 etc. It should be possible, can somebody help me with syntax of 'fq' filter please. Best Regards kamal
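A quick way to sanity-check what that filter selects - a minimal Python sketch (the field name Salary and the bounds come from the question; the predicate simply mirrors the semantics of the fq clause):

```python
def build_fq(low, high, special):
    # URL-encoded Solr filter query: inclusive range OR an exact value
    return f"fq=Salary:[{low}+TO+{high}]+OR+Salary:{special}"

def matches(salary, low=5, high=10, special=0):
    # Mirror of what the filter selects: low..high inclusive, or exactly `special`
    return low <= salary <= high or salary == special

print(build_fq(5, 10, 0))                    # fq=Salary:[5+TO+10]+OR+Salary:0
print([s for s in range(12) if matches(s)])  # [0, 5, 6, 7, 8, 9, 10]
```

Note the square brackets make the range bounds inclusive, so 5 and 10 themselves are returned.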
Re: too many boolean clauses
On 5/22/2013 6:43 AM, adm1n wrote: SyntaxError: Cannot parse 'name:Bbbbm' The subject mentions one error, the message says another. If you are getting too many boolean clauses, then you need to increase the maxBooleanClauses in your solrconfig.xml file. The default is 1024: <maxBooleanClauses>1024</maxBooleanClauses> Looking at your analyzer chain, I see two potential problems. One is that you have two tokenizer factories, though one is specified as a filter. I don't know if you can use a tokenizer as a filter - you might need NGramFilterFactory instead. If using a tokenizer as a filter actually works, then we run into the other possible problem: I can imagine that with the input you have specified, the NGram expansion in your config might balloon that to more than 1024 tokens, which would exceed the default maxBooleanClauses. Thanks, Shawn
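To see how quickly n-gram expansion can blow past the default limit, here is a rough Python sketch (the example term and the gram-size bounds are invented for illustration; real token counts depend on the minGramSize/maxGramSize configured on the filter):

```python
def char_ngrams(term, min_n=1, max_n=25):
    # All character n-grams an NGram filter would emit for a single term
    return [term[i:i + n]
            for n in range(min_n, min(max_n, len(term)) + 1)
            for i in range(len(term) - n + 1)]

MAX_BOOLEAN_CLAUSES = 1024  # Solr's default

# Hypothetical query of three long terms, each expanded into n-grams
terms = ["somereallylongproductidentifier"] * 3
total_clauses = sum(len(char_ngrams(t)) for t in terms)
print(total_clauses)                        # 1425
print(total_clauses > MAX_BOOLEAN_CLAUSES)  # True
```

A 31-character term with grams of length 1 to 25 alone yields 475 tokens, so a query of just three such terms already exceeds the 1024-clause limit.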
Re: Sorting solr search results using multiple fields
Thanks Gora, I got that. One more thing: what I have actually done is made a document consisting of fields: { autosug:galaxy, query_id:1414, pop:168, initial_boost:galaxy _version_:1435669695565922305, score:1.8908522} This initial_boost is basically a copy field of autosug, but saved using different analysers: the whole sentence is kept as a single token and edge ngrams are generated, so that when I search on this field only a term matching from the start will match... and for any other infix term match I have the autosug field. So now what I want is to show the documents matched on initial_boost first, and then the documents matched on the autosug field, each group sorted by the pop field (separately), and return the result. Now, from your suggestion, I could do this using sort on multiple fields by separating them by commas, as described under http://wiki.apache.org/solr/CommonQueryParameters#sort but for that I would require a field having a greater value (all equal, say 2) for the initial_boost matches and a smaller one (all same, say 1) for the autosug matches. How can I do this? Or is there some better solution? thanks regards Rohan On Wed, May 22, 2013 at 6:39 PM, Gora Mohanty g...@mimirtech.com wrote: On 22 May 2013 18:26, Rohan Thakur rohan.i...@gmail.com wrote: hi all I wanted to know is there a way I can sort the my documents based on 3 fields I have fields like pop(which is basically frequency of the term searched history) and autosug(auto suggested words) and initial_boost(copy field of autosug such that only match with initial term match having whole sentence saved as one token) [...] You seem to be confusing boosting with sorting. If you sort the results, the boosts are irrelevant. You can sort on multiple fields by separating them by commas, as described under http://wiki.apache.org/solr/CommonQueryParameters#sort Regards, Gora
RE: How do I use CachedSqlEntityProcessor?
Thank you bbarani. Unfortunately, this does not work. I do not get any exception, and the documents import OK. However there is no Category1, Category2 … etc. when I retrieve the documents. I don’t think I am using the Alpha or Beta of 4.0. I think I downloaded the plain vanilla release version. O. O. bbarani wrote Try this.. <entity name="Cat1" query="SELECT CategoryName,SKU from CAT_TABLE WHERE CategoryLevel=1" cacheKey="Cat1.SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> sample data import config: <entity name="property" query="select UID,name as name, value as value from opTable where type='${dataimporter.request.type}' and indexed='Y'" processor="CachedSqlEntityProcessor" cacheKey="UID" cacheLookup="object.uid" transformer="RegexTransformer,DateFormatTransformer,TemplateTransformer"> <field column="value" name="${property.name}"/> <!-- dynamic column --> </entity> Also not sure if you are using the Alpha / Beta release of SOLR 4.0. In Solr 3.6, 3.6.1, 4.0-Alpha and 4.0-Beta, the cacheKey parameter was re-named cachePk. This is renamed back for 4.0 (and 3.6.2, if released). See SOLR-3850 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065309.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: How do I use CachedSqlEntityProcessor?
There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: O. Olson [mailto:olson_...@yahoo.it] Sent: Tuesday, May 21, 2013 5:06 PM To: solr-user@lucene.apache.org Subject: RE: How do I use CachedSqlEntityProcessor? Thank you James and bbarani. This worked in the sense that there was no error or exception in the data import. Unfortunately, I do not see any of my Category1, Category2 etc. when I retrieve the documents. If I use the first configuration of the db-data-config.xml posted in my original post, I see these fields in each document. Doing an import with your suggestion of <entity name="Cat1" query="SELECT CategoryName from CAT_TABLE WHERE CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> I do not see Category1. I have not changed my schema.xml, so I don’t think this should affect the results. For e.g. Category1 is declared as: <field name="Category1" type="string" indexed="true" stored="true" multiValued="true"/> I am curious what I am doing wrong. I should mention that I am using Solr 4.0.0. I know a more recent version is out – but I don’t think it should make a difference. Thank you again for your help. O. O. Dyer, James-2 wrote First remove the where condition from the child entities, then use the cacheKey and cacheLookup parameters to instruct DIH how to do the join. Example: <entity name="Cat1" cacheKey="SKU" cacheLookup="Product.SKU" query="SELECT CategoryName from CAT_TABLE where CategoryLevel=1"/> See http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor , particularly the 3rd configuration option.
James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065091.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: too many boolean clauses
First of all, thanks for the response! Regarding the two tokenizers - it's ok. Switching to NGramFilterFactory didn't help (though I didn't reindex, I don't think that was needed since I switched it in the 'query' section). Now regarding maxBooleanClauses - how does it affect performance (response times, memory usage) when increasing it? thanks -- View this message in context: http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288p4065314.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: too many boolean clauses
Now regarding maxBooleanClauses - how does it affect performance (response times, memory usage) when increasing it? Changing maxBooleanClauses doesn't make any difference at all. Having thousands of clauses is what makes things run slower and take more memory. The setting just causes large queries to fail without running. If you need a query with more than 1024 clauses and there's no other way to do the job, then you have to increase it. Thanks, Shawn
Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
*Problem:* We periodically rebuild our Solr index from scratch. We have built a custom publisher that horizontally scales to increase write throughput. On a given rebuild, we will have ~60 JVMs running with 5 threads that are actively publishing to all Solr masters. For each thread, we instantiate one StreamingUpdateSolrServer( QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread. At the end of a publish cycle (we publish in smaller chunks = 5MM records), we execute server.blockUntilFinished() on each of the 20 servers on each thread ( 100 total ). Before we applied a recent change, this would always execute to completion. There were a few hang-ups on publishes but we consistently re-published our entire corpus in 6-7 hours. The *problem* is that the blockUntilFinished hangs indefinitely. From the java thread dumps, it appears that the loop in StreamingUpdateSolrServer thinks a runner thread is still active so it blocks (as expected). The other note about the java thread dump is that the active runner thread is exactly this: *Hung Runner Thread:* pool-1-thread-8 prio=3 tid=0x0001084c nid=0xfe runnable [0x5c7fe000] java.lang.Thread.State: RUNNABLE at java.net.SocketInputStream.socketRead0(Native Method) at java.net.SocketInputStream.read(SocketInputStream.java:129) at java.io.BufferedInputStream.fill(BufferedInputStream.java:218) at java.io.BufferedInputStream.read(BufferedInputStream.java:237) - locked 0xfffe81dbcbe0 (a java.io.BufferedInputStream) at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:78) at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:106) at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1116) at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1413) at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1973) at 
org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1735) at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1098) at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398) at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397) at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323) at org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer$Runner.run(StreamingUpdateSolrServer.java:154) Although the runner thread is reading the socket, there is absolutely no activity on the Solr clients. Other than the blockUntilFinished thread, the client is basically sleeping. *Recent Change:* We increased the maxFieldLength from 1(default) to 2147483647 (Integer.MAX_VALUE). Given this change is server side, I don't know how this would impact adding a new document. I see how it would increase commit times and index size, but don't see the relationship to hanging client adds. *Ingest Workflow:* 1) Pull artifacts from relational database (PDF/TXT/Java bean) 2) Extract all searchable text fields -- this is where we use Tika, independent of Solr 3) Using the SolrJ client, we publish an object that is serialized to XML and written to the master 4) execute blockUntilFinished for all 20 servers on each thread. 5) Autocommit set on servers at 30 minutes or 50k documents. During republish, 50k threshold is met first. *Environment:* Solr v3.5.0 20 masters 2 slaves/master = 40 slaves *Corpus:* We have ~100MM records, ranging in size from 50MB PDFs to 1KB TXT files. Our schema has an unusually large number of fields, 200. Our index size averages about 30GB/shard, totaling 600GB.
*Related Bugs:* My symptoms are most closely related to this bug, but we are not executing any deletes, so I have low confidence that it is 100% related: https://issues.apache.org/jira/browse/SOLR-1990 Although we have similar stack traces, we are only ADDING docs. Thanks ahead for any input/help! -- Justin Babuscio
RE: How do I use CachedSqlEntityProcessor?
Thank you very much James. Your suggestion worked perfectly! I am curious why I did not get any errors before. For others, the following worked for me: <entity name="Cat1" query="SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> Similarly for the other categories, i.e. Category2, Category3, etc. I am now going to try this for a larger dataset. I hope this works. O.O. Dyer, James-2 wrote There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How do I use CachedSqlEntityProcessor?
I am curious why I did not get any errors before. Because there was no (syntax) error before - the fact that you didn't include a SKU (but were using it as cacheKey) just doesn't match anything .. therefore you got nothing added to your documents. Perhaps we should add a ticket as an improvement for that, to issue a notice/warning if the result set itself doesn't contain the cacheKey? WDYT James? Stefan On Wednesday, May 22, 2013 at 5:14 PM, O. Olson wrote: Thank you very much James. Your suggestion worked perfectly! I am curious why I did not get any errors before. For others, the following worked for me: <entity name="Cat1" query="SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1" cacheKey="SKU" cacheLookup="Product.SKU" processor="CachedSqlEntityProcessor"> <field column="CategoryName" name="Category1"/> </entity> Similarly for the other categories, i.e. Category2, Category3, etc. I am now going to try this for a larger dataset. I hope this works. O.O. Dyer, James-2 wrote There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
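The silent failure mode Stefan describes is easy to reproduce outside DIH. Here is a small Python sketch of the cache join that CachedSqlEntityProcessor performs (table and column names come from the thread; the row data is invented): if the child SELECT omits the cacheKey column, every lookup quietly comes back empty rather than raising an error.

```python
# Parent rows (Product) and child rows (CAT_TABLE, CategoryLevel=1) -- invented data
products = [{"SKU": "A1"}, {"SKU": "B2"}]
cats_with_sku = [{"SKU": "A1", "CategoryName": "Books"},
                 {"SKU": "B2", "CategoryName": "Toys"}]
cats_without_sku = [{"CategoryName": "Books"}, {"CategoryName": "Toys"}]

def cached_join(parents, children, cache_key="SKU", cache_lookup="SKU"):
    # Build the child-entity cache keyed on cacheKey, then probe it with each
    # parent's cacheLookup value -- roughly what DIH's cached processor does.
    cache = {}
    for row in children:
        if cache_key in row:  # rows missing the key column can never be found
            cache.setdefault(row[cache_key], []).append(row["CategoryName"])
    return {p["SKU"]: cache.get(p[cache_lookup], []) for p in parents}

print(cached_join(products, cats_with_sku))     # {'A1': ['Books'], 'B2': ['Toys']}
print(cached_join(products, cats_without_sku))  # {'A1': [], 'B2': []} -- no error, just empty
```

This mirrors why the fixed query (SELECT SKU, CategoryName ...) works while the original one imported cleanly but attached no categories.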
Re: [custom data structure] aligned dynamic fields
Although we are entering the era of Big Data, that does not mean there are no limits or restrictions on what a given technology can do. Maybe you need to consider either a smaller scope for your project, or more limited features, or some other form of simplification. Solr can do billions of documents - for a heavily sharded cluster, but you will have to work really hard to make that work well. So, I can confirm, that maybe in this case, there is no free lunch - unless you are willing to strip down the project. Or, maybe we just need a deeper feel for what your data model is really trying to achieve. Suggestion: Think about your data model again, and then try rephrasing it for this group. You have violated one cardinal rule of this group: you focused on a proposed solution rather than focusing our attention on the original problem you are trying to solve. That short-circuited our focus on really solving your problem. -- Jack Krupansky -Original Message- From: Dmitry Kan Sent: Wednesday, May 22, 2013 6:50 AM To: solr-user@lucene.apache.org Subject: Re: [custom data structure] aligned dynamic fields Jack, Thanks for your response. 1. Flattening could be an option, although our scale and required functionality (runtime non DocValues backed facets) is beyond what solr3 can handle (billions of docs). We have flattened the meta data at the expense of over-generating solr documents. But to solve the problem I have described via flattening would make big impact on the scalability and price. 2. We have quite the opposite of what you have described about the dynamic fields: there will be very few per document. I agree, that caution should be taken here, as we have suffered (or should I say experienced) having multivalued fields (the good thing is we never had to facet on them). Any other options? Maybe someone can share their experience with dynamic fields and discourage from pursuing this path? 
Dmitry On Mon, May 20, 2013 at 4:23 PM, Jack Krupansky j...@basetechnology.com wrote: Before you dive off the deep end and go crazy with dynamic fields, try a clean, simple, Solr-oriented static design. Yes, you CAN do an over-complicated design with dynamic fields, but that doesn't mean you should. In a single phrase, denormalize and flatten your design. Sure, that will lead to a lot of rows, but Solr and Lucene are designed to do well in that scenario. If you are still thinking in terms of a C struct, go for a long walk or do SOMETHING else until you can get that idea out of your head. It is a sub-optimal approach for exploiting the power of Lucene and Solr. Stay with a static schema design until you hit... just stay with a static schema, period. Dynamic fields and multi-valued fields do have value, but only when used in moderation - small numbers. If you start down a design path and find that you are heavily dependent on dynamic fields and/or multi-valued fields with large numbers of values per document, that is feedback that your design needs to be denormalized and flattened further. -- Jack Krupansky -Original Message- From: Dmitry Kan Sent: Monday, May 20, 2013 7:06 AM To: solr-user@lucene.apache.org Subject: [custom data structure] aligned dynamic fields Hi all, Our current project requirement suggests that we should start storing custom data structures in the Solr index. The custom data structure would be an equivalent of a C struct. The task is as follows. Suppose we have two types of fields, one is FieldName1 and the other FieldName2. Suppose also that we can have multiple pairs of these two fields on a document in Solr. That is, in notation of dynamic fields: doc1 FieldName1_id1 FieldName2_id1 FieldName1_id2 FieldName2_id2 doc2 FieldName1_id3 FieldName2_id3 FieldName1_id4 FieldName2_id4 FieldName1_id5 FieldName2_id5 etc What we would like to have is a value for the Field1_(some_unique_id) and a value for Field2_(some_unique_id) as input for search.
That is we wouldn't care about the some_unique_id in some search scenarios. And the search would automatically iterate the pairs of dynamic fields and respect the pairings. I know it used to be so, that with dynamic fields a client must provide the dynamically generated field names coupled with their values up front when searching. What data structure / solution could be used as an alternative approach to help such a structured search? Thanks, Dmitry
filter query by string length or word count?
I have in schema.xml:

<field name="body" type="text_en_html" indexed="true" stored="true" omitNorms="true"/>
...
<fieldType name="text_en_html" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

How can I query docs whose body has more than 80 words (or 80 characters)?
Re: Solr Faceting doesn't return values.
Hi There, Not sure I understand your problem correctly, but is 'mm_state_code' a real value or is it field name? Also, as Erick pointed out above, the facets are not calculated if there are no results. Hence you get no facets. You have mentioned which facets you want but you haven't mentioned which field you want to search against. That field should be defined in df parameter instead of sa_property_id. Can you post example solr document you're indexing? -Sandeep On 22 May 2013 14:28, samabhiK qed...@gmail.com wrote: Ok my bad. I do have a default field defined in the /select handler in the config file. lst name=defaults str name=echoParamsexplicit/str int name=rows10/int str name=dfsa_property_id/str /lst But then how do I change my query now? -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065298.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
On 5/22/2013 9:08 AM, Justin Babuscio wrote: We periodically rebuild our Solr index from scratch. We have built a custom publisher that horizontally scales to increase write throughput. On a given rebuild, we will have ~60 JVMs running with 5 threads that are actively publishing to all Solr masters. For each thread, we instantiate one StreamingUpdateSolrServer( QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread. Looking over all your details, you might want to try first reducing the maxFieldLength to slightly below Integer.MAX_VALUE. Try setting it to 2 billion, or even something more modest, in the millions. It's theoretically possible that the other value might be leading to an overflow somewhere. I've been looking for evidence of this, nothing's turned up yet. There MIGHT be bugs in the Apache Commons libraries that SolrJ uses. The next thing I would try is upgrading those component jars in your application's classpath - httpclient, commons-io, commons-codec, etc. Upgrading to a newer SolrJ version is also a good idea. Your notes imply that you are using the default XML request writer in SolrJ. If that's true, you should be able to use a 4.3 SolrJ even with an older Solr version, which would give you a server object that's based on HttpComponents 4.x, where your current objects are based on HttpClient 3.x. You would need to make adjustments in your source code. If you're not using the default XML request writer, you can get a similar change by using SolrJ 3.6.2. IMHO you should switch to HttpSolrServer (CommonsHttpSolrServer in SolrJ 3.5 and earlier). StreamingUpdateSolrServer (and its replacement in 3.6 and later, named ConcurrentUpdateSolrServer) has one glaring problem - it never informs the calling application about any errors that it encounters during indexing. It lies to you, and tells you that everything has succeeded even when it doesn't. 
The one advantage that SUSS/CUSS has over its Http sibling is that it is multi-threaded, so it can send updates concurrently. You seem to know enough about how it works, so I'll just say that you don't need additional complexity that is not under your control and refuses to throw exceptions when an error occurs. You already have a large-scale concurrent and multi-threaded indexing setup, so SolrJ's additional thread handling doesn't really buy you much. Thanks, Shawn
RE: How do I use CachedSqlEntityProcessor?
That would be a worthy enhancement to do. Always nice to give the user a warning when something is going to fail so they can troubleshoot better... James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Stefan Matheis [mailto:matheis.ste...@gmail.com] Sent: Wednesday, May 22, 2013 10:30 AM To: solr-user@lucene.apache.org Subject: Re: How do I use CachedSqlEntityProcessor? I am curious why I did not get any errors before. Because there was no (syntax) error before - the fact that you didn't include a SKU (but using it as cacheKey) just doesn't match anything .. therefore you got nothing added to your documents. Perhaps we should add an ticket as improvement for that, to issue a notice/warning if the result set itself doesn't contain the cacheKey? WDYT James? Stefan On Wednesday, May 22, 2013 at 5:14 PM, O. Olson wrote: Thank you very much James. Your suggestion worked exactly! I am curious why I did not get any errors before. For others, the following worked for me: entity name=Cat1 query=SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1 cacheKey=SKU cacheLookup=Product.SKU processor=CachedSqlEntityProcessor field column=CategoryName name=Category1 / /entity Similarly for other Categories i.e. Category2, Category3, etc. I am now going to try this for a larger dataset. I hope this works. O.O. Dyer, James-2 wrote There was a mistake in my last reply. Your child entities need to SELECT on the join key so DIH has it to do the join. So use SELECT SKU, CategoryName... James Dyer Ingram Content Group (615) 213-4311 -- View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065342.html Sent from the Solr - User mailing list archive at Nabble.com (http://Nabble.com).
Solr French search optimisation
Hello to all, I'm trying to set up Solr 4.2 to index and search French content. I defined a special fieldType for French content:

<fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="French" protected="protwords.txt"/>
  </analyzer>
</fieldType>

Unfortunately, this field does not behave as I wish. I'd like to be able to get results from misspelled words. I.e. I wish to get the same result typing Pompe à chaleur as typing pomppe a chaler, or with solère and solaire. I cannot find the right way to create a fieldType that reaches this aim. Thanks in advance for your help; do not hesitate to ask if you need more information. Regards David
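Note that the accent-folding MappingCharFilter plus the Snowball stemmer handle accents and inflection, not typos: nothing in that chain maps pomppe to pompe. What is usually suggested for typos is fuzzy search (e.g. pomppe~2 with the standard query parser) or a phonetic filter. Fuzzy matching is based on edit distance; this small Python sketch shows the distances involved for the examples in the question (the accent-folded form solere is used, i.e. what the MappingCharFilter would produce):

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

print(levenshtein("pomppe", "pompe"))    # 1 -> a fuzzy query pomppe~2 could match pompe
print(levenshtein("solere", "solaire"))  # 2 -> solere~2 could match solaire
```

Both misspellings are within two edits of the intended term, which is the maximum distance Lucene's fuzzy queries support, so fuzzy search is a plausible fit here.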
Re: Solr Faceting doesn't return values.
Thanks for your reply. I have my request url modified like this: http://xx.xx.xx.xx/solr/collection1/select?q=TX&df=mm_state_code&wt=xml&indent=true&facet=true&facet.field=sa_site_city&debug=all Facet Field = sa_site_city (city-wise facet) Default Field = mm_state_code Query = TX When I run this query, I get something like this: ?xml version=1.0 encoding=UTF-8? response lst name=responseHeader int name=status0/int int name=QTime3/int lst name=params str name=facettrue/str str name=dfsa_site_city/str str name=indenttrue/str str name=qTX/str str name=_1369238921109/str str name=debugall/str str name=facet.fieldsa_site_city/str str name=wtxml/str /lst /lst result name=response numFound=0 start=0 /result lst name=facet_counts lst name=facet_queries/ lst name=facet_fields lst name=sa_site_city/ /lst lst name=facet_dates/ lst name=facet_ranges/ /lst lst name=debug str name=rawquerystringTX/str str name=querystringTX/str str name=parsedquerysa_site_city:TX/str str name=parsedquery_toStringsa_site_city:TX/str lst name=explain/ str name=QParserLuceneQParser/str lst name=timing double name=time2.0/double lst name=prepare double name=time0.0/double lst name=query double name=time0.0/double /lst lst name=facet double name=time0.0/double /lst lst name=mlt double name=time0.0/double /lst lst name=highlight double name=time0.0/double /lst lst name=stats double name=time0.0/double /lst lst name=debug double name=time0.0/double /lst /lst lst name=process double name=time2.0/double lst name=query double name=time1.0/double /lst lst name=facet double name=time1.0/double /lst lst name=mlt double name=time0.0/double /lst lst name=highlight double name=time0.0/double /lst lst name=stats double name=time0.0/double /lst lst name=debug double name=time0.0/double /lst /lst /lst /lst /response I do have the data in my index, and I verified that by running other queries. I can't figure out what I am missing.
-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065360.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: filter query by string length or word count?
I doubt if there is any straight out of the box feature that supports this requirement, you will probably need to handle this at the index time. You can play around with Function Queries http://wiki.apache.org/solr/FunctionQuery for any such feature. On 22 May 2013 16:37, Sam Lee skyn...@gmail.com wrote: I have schema.xml field name=body type=text_en_html indexed=true stored=true omitNorms=true/ ... fieldType name=text_en_html class=solr.TextField positionIncrementGap=100 analyzer type=index charFilter class=solr.HTMLStripCharFilterFactory/ tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.KeywordMarkerFilterFactory protected=protwords.txt/ filter class=solr.PorterStemFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.StandardTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords_en.txt enablePositionIncrements=true / filter class=solr.LowerCaseFilterFactory/ filter class=solr.EnglishPossessiveFilterFactory/ filter class=solr.KeywordMarkerFilterFactory protected=protwords.txt/ filter class=solr.PorterStemFilterFactory/ /analyzer /fieldType how can I query docs whose body has more than 80 words (or 80 characters) ?
Can anyone explain this Solr query behavior?
This query returns 0 documents: *q=(+Title:() +Classification:() +Contributors:() +text:())* This returns 1 document: *q=doc-id:3000* And this returns 631580 documents when I was expecting 0: *q=doc-id:3000 AND (+Title:() +Classification:() +Contributors:() +text:())* Am I missing something here? Can someone please explain? I am using Solr 4.2.1 Thanks -Shankar
Re: filter query by string length or word count?
Sam,

I would highly suggest counting the words in your external pipeline and sending that value in as a specific field. It can then be queried quite simply with:

wordcount:{80 TO *]

(Note the { next to 80, excluding the value of 80.)

Jason

On May 22, 2013, at 11:37 AM, Sam Lee skyn...@gmail.com wrote:

[schema.xml quoted in full upthread]

how can I query docs whose body has more than 80 words (or 80 characters)?
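To make the pipeline-side counting concrete, a rough Python sketch (the "wordcount" field name and the HTML stripping are illustrative, not part of Solr):

```python
import re

def word_count(html: str) -> int:
    """Strip markup roughly the way HTMLStripCharFilterFactory would,
    then count whitespace-separated words."""
    text = re.sub(r"<[^>]+>", " ", html)
    return len(text.split())

# Attach the count as its own field on the document before it is sent
# to Solr; "wordcount" is an illustrative field name.
doc = {"id": "1", "body": "<p>some short body</p>"}
doc["wordcount"] = word_count(doc["body"])

# At query time, docs with more than 80 words are then just:
#   fq=wordcount:{80 TO *]
```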
Re: Solr Faceting doesn't return values.
From the response you've mentioned it appears to me that the query term TX is searched against sa_site_city instead of mm_state_code. Can you try your query like below:

http://xx.xx.xx.xx/solr/collection1/select?q=*mm_state_code:(**TX)*&wt=xml&indent=true&facet=true&facet.field=sa_site_city&debug=all

and post your output?

On 22 May 2013 17:13, samabhiK qed...@gmail.com wrote:

<str name="df">sa_site_city</str>
RE: How do I use CachedSqlEntityProcessor?
Thank you guys, particularly James, very much. I just imported 200K documents in a little more than 2 mins, which is great for me :-). Thank you Stefan; I did not realize that it was not a syntax error and hence no error. Thank you for clearing that up.

O. O.

--
View this message in context: http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-tp4064919p4065392.html
Re: shard splitting
You will need to edit it manually and upload using a ZooKeeper client. You can use kazoo; it's very easy to use.

-- Yago Riveiro
Sent with Sparrow (http://www.sparrowmailapp.com/?sig)

On Wednesday, May 22, 2013 at 10:04 AM, Arkadi Colson wrote:

clusterstate.json is now reporting shard3 as inactive. Any idea how to change clusterstate.json manually from the command line?

On 05/22/2013 08:59 AM, Arkadi Colson wrote:

Hi

I tried to split a shard but it failed. If I try to do it again it does not start again. I see the two extra shards in /collections/messages/leader_elect/ and /collections/messages/leaders/. How can I fix this?

root@solr07-dcg:/solr/messages_shard3_replica2# curl 'http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=messages&shard=shard3'
<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">500</int>
    <int name="QTime">300117</int>
  </lst>
  <lst name="error">
    <str name="msg">splitshard the collection time out:300s</str>
    <str name="trace">org.apache.solr.common.SolrException: splitshard the collection time out:300s
      at org.apache.solr.handler.admin.CollectionsHandler.handleResponse(CollectionsHandler.java:166)
      at org.apache.solr.handler.admin.CollectionsHandler.handleSplitShardAction(CollectionsHandler.java:300)
      at org.apache.solr.handler.admin.CollectionsHandler.handleRequestBody(CollectionsHandler.java:136)
      at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
      at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:608)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:215)
      at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155)
      at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
      at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
      at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
      at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
      at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
      at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
      at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
      at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
      at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
      at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
      at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
      at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
      at java.lang.Thread.run(Thread.java:722)
    </str>
    <int name="code">500</int>
  </lst>
</response>

INFO - 2013-05-22 06:45:54.148; org.apache.solr.handler.admin.CoreAdminHandler; Invoked split action for core: messages_shard3_replica1
INFO - 2013-05-22 06:45:54.271; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partitions=2 segments=29
INFO - 2013-05-22 06:46:03.240; org.apache.solr.update.SolrIndexSplitter; SolrIndexSplitter: partition #0 range=2aaa-5554

BR
Arkadi
RE: Speed up import of Hierarchical Data
Just an update for others reading this thread: I had some questions about CachedSqlEntityProcessor and had them addressed in the thread "How do I use CachedSqlEntityProcessor?" (http://lucene.472066.n3.nabble.com/How-do-I-use-CachedSqlEntityProcessor-td4064919.html).

I basically had to declare the child entities in the db-data-config.xml like:

<entity name="Cat1"
        query="SELECT CategoryName, SKU from CAT_TABLE WHERE CategoryLevel=1"
        cacheKey="SKU"
        cacheLookup="Product.SKU"
        processor="CachedSqlEntityProcessor">
  <field column="CategoryName" name="Category1"/>
</entity>

Thanks to James and others for their help.

O. O.

--
View this message in context: http://lucene.472066.n3.nabble.com/Speed-up-import-of-Hierarchical-Data-tp4063924p4065400.html
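For readers wondering what the cacheKey/cacheLookup pair buys: the processor runs the child query once and joins against parent rows in memory, instead of issuing one SQL query per parent. A toy Python sketch of that in-memory join (not DIH code; table and field names are illustrative):

```python
# Parent rows from the Product query, child rows from CAT_TABLE.
products = [{"SKU": "A1"}, {"SKU": "B2"}]
categories = [
    {"SKU": "A1", "CategoryName": "Tools"},
    {"SKU": "B2", "CategoryName": "Paint"},
]

# Build the cache once, keyed on cacheKey="SKU"...
cache = {}
for row in categories:
    cache.setdefault(row["SKU"], []).append(row)

# ...then each parent row does a dict lookup (cacheLookup="Product.SKU")
# instead of a round trip to the database.
for p in products:
    p["Category1"] = [c["CategoryName"] for c in cache.get(p["SKU"], [])]
```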
Re: Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
Shawn,

Thank you! Just some quick responses:

On your overflow theory, why would this impact the client? Is it possible that a write attempt to Solr would block indefinitely while the Solr server is running wild or in a bad state due to the overflow?

We attempt to set the BinaryRequestWriter, but per this bug: https://issues.apache.org/jira/browse/SOLR-1565, v3.5 uses the default XML writer.

On upgrading to 3.6.2 or 4.x, we have an organizational challenge that requires approval of the software/upgrade. I am promoting/supporting this idea but cannot execute in the short term.

For the mass publish, we originally used the CommonsHttpSolrServer (what we use in live production updates) but we found the trade-off in performance was quite large.

I really like your idea about KISS on threading. Since I'm already introducing complexity with all the multi-threading, why stress the older 3.x software? We may need to trade off time for this.

My first tactics will be to adjust the maxFieldLength and toggle the configuration to use CommonsHttpSolrServer. I will follow up with any discoveries.

Thanks again,
Justin

On Wed, May 22, 2013 at 11:46 AM, Shawn Heisey s...@elyograg.org wrote:

On 5/22/2013 9:08 AM, Justin Babuscio wrote:

We periodically rebuild our Solr index from scratch. We have built a custom publisher that horizontally scales to increase write throughput. On a given rebuild, we will have ~60 JVMs running with 5 threads that are actively publishing to all Solr masters. For each thread, we instantiate one StreamingUpdateSolrServer( QueueSize:100, QueueThreadSize: 2 ) for each master = 20 servers/thread.

Looking over all your details, you might want to try first reducing the maxFieldLength to slightly below Integer.MAX_VALUE. Try setting it to 2 billion, or even something more modest, in the millions. It's theoretically possible that the other value might be leading to an overflow somewhere. I've been looking for evidence of this; nothing's turned up yet.
There MIGHT be bugs in the Apache Commons libraries that SolrJ uses. The next thing I would try is upgrading those component jars in your application's classpath - httpclient, commons-io, commons-codec, etc. Upgrading to a newer SolrJ version is also a good idea. Your notes imply that you are using the default XML request writer in SolrJ. If that's true, you should be able to use a 4.3 SolrJ even with an older Solr version, which would give you a server object that's based on HttpComponents 4.x, where your current objects are based on HttpClient 3.x. You would need to make adjustments in your source code. If you're not using the default XML request writer, you can get a similar change by using SolrJ 3.6.2. IMHO you should switch to HttpSolrServer (CommonsHttpSolrServer in SolrJ 3.5 and earlier). StreamingUpdateSolrServer (and its replacement in 3.6 and later, named ConcurrentUpdateSolrServer) has one glaring problem - it never informs the calling application about any errors that it encounters during indexing. It lies to you, and tells you that everything has succeeded even when it doesn't. The one advantage that SUSS/CUSS has over its Http sibling is that it is multi-threaded, so it can send updates concurrently. You seem to know enough about how it works, so I'll just say that you don't need additional complexity that is not under your control and refuses to throw exceptions when an error occurs. You already have a large-scale concurrent and multi-threaded indexing setup, so SolrJ's additional thread handling doesn't really buy you much. Thanks, Shawn -- Justin Babuscio 571-210-0035 http://linchpinsoftware.com
Re: Solr Faceting doesn't return values.
When I use your query, I get:

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">400</int>
    <int name="QTime">12</int>
    <lst name="params">
      <str name="facet">true</str>
      <str name="df">mm_state_code</str>
      <str name="indent">true</str>
      <str name="q">*mm_state_code:(**TX)*</str>
      <str name="_">1369244078714</str>
      <str name="debug">all</str>
      <str name="facet.field">sa_site_city</str>
      <str name="wt">xml</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">org.apache.solr.search.SyntaxError: Cannot parse '*mm_state_code:(**TX)*': Encountered ":" at line 1, column 14. Was expecting one of: EOF, AND ..., OR ..., NOT ..., "+" ..., "-" ..., BAREOPER ..., "(" ..., "*" ..., "^" ..., QUOTED ..., TERM ..., FUZZY_SLOP ..., PREFIXTERM ..., WILDTERM ..., REGEXPTERM ..., "[" ..., "{" ..., LPARAMS ..., NUMBER ...</str>
    <int name="code">400</int>
  </lst>
</response>

Not sure why the data won't show up. Almost all the records have data in the field sa_site_city, and it is also indexed. :(

--
View this message in context: http://lucene.472066.n3.nabble.com/Solr-Faceting-doesn-t-return-values-tp4065276p4065406.html
Re: Large-scale Solr publish - hanging at blockUntilFinished indefinitely - stuck on SocketInputStream.socketRead0
On 5/22/2013 11:25 AM, Justin Babuscio wrote: On your overflow theory, why would this impact the client? Is is possible that a write attempt to Solr would block indefinitely while the Solr server is running wild or in a bad state due to the overflow? That's the general notion. I could be completely wrong about this, but as that limit is the only thing you changed, it was the idea that came to mind first. One other thing I thought of, though this would be a band-aid, not a real solution - if there's a definable maximum amount of time that an individual update request should take to complete (1 minute? 5 minutes?) then you might be able to use the setSoTimeout call on your server object. In the 3.5.0 source code, this method is inherited, so it might not actually work correctly, but I'm hopeful. If the problem is stuck update requests (and not a bug in blockUntilFinished), setting the SoTimeout (assuming it works) might unplug the works. The stuck requests might fail, but your SolrJ log might contain enough info to help you track that down. I don't think your application would ever be notified about such failures, but they should be logged. Good luck with the upgrade plan. Would you be able to upgrade the dependent jars for the existing SolrJ without an extensive approval process? I won't be surprised if the answer is no. On SOLR-1990, I don't think that's it, because unless blockUntilFinished() itself is broken, calling it more often than strictly necessary shouldn't be an issue. Do you see any problems in the server log? Thanks, Shawn
MoreLikeThis - No Results
I'm developing a recommendation feature in our app using the MoreLikeThisHandler (http://wiki.apache.org/solr/MoreLikeThisHandler), and so far it is doing a great job. We're using a user's competency keywords as the MLT field list and the user's corresponding document in Solr as the comparison document. I have found that for one user I'm not receiving any recommendations, and I'm not sure why.

Solr: 4.1.0

relevant schema:

<field name="competencyKeywords" type="short-mlt-text" indexed="true" stored="true" multiValued="true" termVectors="true"/>
<fieldType name="short-mlt-text" class="solr.TextField" positionIncrementGap="100" autoGeneratePhraseQueries="true">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>

user's values:

<arr name="competencyKeywords">
  <str>Healthcare Cost Trends</str>
</arr>

Is it possible that among all the ~40,000 users in this index (about 500 of which have the same competency keywords), the words "healthcare", "cost" and "trends" are just judged by Lucene to not be significant? I realize that I may not understand how the MLT Handler is doing things under the covers... I've only been guessing until now based on the (otherwise excellent) results I've been seeing.

Thanks,
Andy Pickler

P.S. For some additional information, the following query:

/mlt?q=objectId:user91813&mlt.fl=competencyKeywords&mlt.interestingTerms=details&debugQuery=true&mlt.match.include=false

...produces the following results...

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">2</int>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="interestingTerms"/>
  <lst name="debug">
    <str name="rawquerystring">objectId:user91813</str>
    <str name="querystring">objectId:user91813</str>
    <str name="parsedquery"/>
    <str name="parsedquery_toString"/>
    <lst name="explain"/>
  </lst>
</response>
hostname - ipaddress change in solr4.0 to solr4.1+
Logging/UI used to show the hostname in 4.0; in 4.1+ it switched to IP addresses. Is this by design, or a bug/side effect? It's pretty painful to look at IP addresses. I am planning to change it; let me know if you have any concerns.

--
Anirudha
Re: solr starting time takes too long
: Subject: solr starting time takes too long : In-Reply-To: 519c6cd6.90...@smartbit.be : Thread-Topic: shard splitting https://people.apache.org/~hossman/#threadhijack -Hoss
Re: hostname - ipaddress change in solr4.0 to solr4.1+
On 5/22/2013 12:53 PM, Anirudha Jadhav wrote: Logging/UI used to show hostname in 4.0 in 4.1+ it switched to ip addresses is this by design or a bug/side effect ? If you are talking about SolrCloud, this was an intentional change. By including a host property either on the Solr startup command or in solr.xml, you can force SolrCloud to use hostnames. http://wiki.apache.org/solr/SolrCloud#SolrCloud_Instance_Params If you aren't talking about SolrCloud, can you give specific examples of what you are seeing? Thanks, Shawn
RE: Slow Highlighter Performance Even Using FastVectorHighlighter
After taking your advice on profiling, I didn't see any memory issues. I wanted to verify this with a small set of data, so I created a new sandbox core with the exact same schema and config file settings. I indexed only 25 PDF documents with an average size of 2.8 MB; the largest is approx 5 MB (39 pages). I run the exact same query on that core and I'm seeing response times of 7 secs or more. Without highlighting the response is usually 1 ms. I don't understand why it's taking 7 secs to return highlights. The size of the index is only 20.93 MB. The JVM heap Xms and Xmx are both set to 1024 for this verification purpose and that should be more than enough. The processor is plenty powerful enough as well.

Running VisualVM shows all my CPU time being taken by mainly these 3 methods:

org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
org.apache.lucene.search.vectorhighlight.FieldPhraseList$WeightedPhraseInfo.getStartOffset()
org.apache.lucene.search.vectorhighlight.FieldPhraseList.addIfNoOverlap()

My guess is that this has something to do with how I'm handling partial word matches/highlighting. I have set up another request handler that only searches the whole-word fields and it returns in 850 ms with highlighting. Any ideas?

- Andy

-Original Message-
From: Bryan Loofbourrow [mailto:bloofbour...@knowledgemosaic.com]
Sent: Monday, May 20, 2013 1:39 PM
To: solr-user@lucene.apache.org
Subject: RE: Slow Highlighter Performance Even Using FastVectorHighlighter

My guess is that the problem is those 200M documents. FastVectorHighlighter is fast at deciding whether a match, especially a phrase, appears in a document, but it still starts out by walking the entire list of term vectors, and ends by breaking the document into candidate-snippet fragments, both processes that are proportional to the length of the document.
It's hard to do much about the first, but for the second you could choose to expose FastVectorHighlighter's FieldPhraseList representation, and return offsets to the caller rather than fragments, building up your own snippets from a separate store of indexed files. This would also permit you to set stored=false, improving your memory/core size ratio, which I'm guessing could use some improving. It would require some work, and it would require you to store a representation of what was indexed outside the Solr core, in some constant-bytes-to-character representation that you can use offsets with (e.g. UTF-16, or ASCII+entity references). However, you may not need to do this -- it may be that you just need more memory for your search machine. Not JVM memory, but memory that the O/S can use as a file cache. What do you have now? That is, how much memory do you have that is not used by the JVM or other apps, and how big is your Solr core? One way to start getting a handle on where time is being spent is to set up VisualVM. Turn on CPU sampling, send in a bunch of the slow highlight queries, and look at where the time is being spent. If it's mostly in methods that are just reading from disk, buy more memory. If you're on Linux, look at what top is telling you. If the CPU usage is low and the wa number is above 1% more often than not, buy more memory (I don't know why that wa number makes sense, I just know that it has been a good rule of thumb for us). -- Bryan -Original Message- From: Andy Brown [mailto:andy_br...@rhoworld.com] Sent: Monday, May 20, 2013 9:53 AM To: solr-user@lucene.apache.org Subject: Slow Highlighter Performance Even Using FastVectorHighlighter I'm providing a search feature in a web app that searches for documents that range in size from 1KB to 200MB of varying MIME types (PDF, DOC, etc). Currently there are about 3000 documents and this will continue to grow. I'm providing full word search and partial word search. 
For each document, there are three source fields that I'm interested in searching and highlighting on: name, description, and content. Since I'm providing both full and partial word search, I've created additional fields that get tokenized differently: name_par, description_par, and content_par. Those are indexed and stored as well for querying and highlighting. As suggested in the Solr wiki, I've got two catch all fields text and text_par for faster querying. An average search results page displays 25 results and I provide paging. I'm just returning the doc ID in my Solr search results and response times have been quite good (1 to 10 ms). The problem in performance occurs when I turn on highlighting. I'm already using the FastVectorHighlighter and depending on the query, it has taken as long as 15 seconds to get the highlight snippets. However, this isn't always the case. Certain query terms result in 1 sec or less response time. In any case, 15 seconds is way too long. I'm fairly new to Solr but I've spent days coming up with what I've got so far. Feel free to
Re: MoreLikeThis - No Results
Answered my own question...

mlt.mintf: Minimum Term Frequency - the frequency below which terms will be ignored in the source doc

Our source doc is a set of limited terms... not a large content field. So in our case I need to set that value to 1 (rather than the default of 2). Now I'm getting results... and they indeed are relevant.

Thanks,
Andy Pickler

On Wed, May 22, 2013 at 12:20 PM, Andy Pickler andy.pick...@gmail.com wrote:

[original message quoted in full upthread]
Re: Boosting Documents
: NOTE: make sure norms are enabled (omitNorms=false in the schema.xml) for
: any fields where the index-time boost should be stored.
:
: In my case where I only need to boost the whole document (not a specific
: field), do I have to activate omitNorms=false for all the fields
: in the schema?

docBoost is really just syntactic sugar for a field boost on each field in the document -- it's factored into the norm value for each field in the document. (I'll update the wiki to make this more clear.) If you do a query that doesn't utilize any field which has norms, then the docBoost you specified when indexing the document never comes into play.

In general, doc boosts and field boosts, and the way they come into play as part of the field norm, are fairly inflexible and (in my opinion) antiquated. A much better way of dealing with this type of problem is also discussed in the section of the wiki you linked to. Immediately below...

http://wiki.apache.org/solr/SolrRelevancyFAQ#index-time_boosts

...you'll find...

http://wiki.apache.org/solr/SolrRelevancyFAQ#Field_Based_Boosting

-Hoss
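A minimal sketch of what the query-time alternative can look like, building dismax-style parameters in Python (assuming the (e)dismax parser; field names and boost values here are illustrative, not from the thread):

```python
from urllib.parse import urlencode

# Query-time, field-based boosting instead of index-time docBoost:
# weight "title" matches ten times heavier than "body", and fold a
# stored "popularity" field into the score via a function boost.
params = {
    "defType": "edismax",
    "q": "solr relevancy",
    "qf": "title^10 body",     # per-field query boosts
    "bf": "log(popularity)",   # additive function boost
}
url = "/select?" + urlencode(params)
print(url)
```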
Scheduling DataImports
Hi,

I am new to Solr and recently started exploring it for the search/sort needs in our webapp. I have a couple of questions (I am using Solr 4.2.1 with the default core named collection1):

1. We have a use case where we would like to index data every 10 mins (avg). What's the best way to schedule a data import every 10 mins or so? A cron job?

2. We are indexing data returned from an API which returns different cache TTLs. How can I re-index after a TTL has expired? Some process which polls for the expiring-soon entries and issues a data-import command?

Any pointers will be much appreciated.

Thanks,
-M

--
View this message in context: http://lucene.472066.n3.nabble.com/Scheduling-DataImports-tp4065435.html
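A rough sketch of both pieces (the endpoint, field names, and TTL window are illustrative): cron hits the DataImportHandler URL on a schedule, and a small poller picks out entries whose TTL is about to expire so they can be re-imported first.

```python
import time

# Illustrative endpoint; DIH exposes /dataimport?command=full-import|delta-import.
DIH_URL = "http://localhost:8983/solr/collection1/dataimport?command=delta-import"

def entries_to_refresh(entries, now=None, window=600):
    """Pick entries whose TTL expires within the next `window` seconds,
    so they can be re-imported before they go stale."""
    now = time.time() if now is None else now
    return [e for e in entries if e["expires_at"] <= now + window]

# A cron line could hit DIH_URL every 10 minutes, e.g.:
#   */10 * * * *  curl -s "http://localhost:8983/solr/collection1/dataimport?command=delta-import"
entries = [
    {"id": "a", "expires_at": 100},
    {"id": "b", "expires_at": 10000},
]
print(entries_to_refresh(entries, now=0))
```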
RE: solr starting time takes too long
Very sorry about hijacking the existing thread (I thought it would be OK if I just changed the title and content, but that was still wrong). It will never happen again.

Lisheng

-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org]
Sent: Wednesday, May 22, 2013 11:58 AM
To: solr-user@lucene.apache.org
Subject: Re: solr starting time takes too long

: Subject: solr starting time takes too long
: In-Reply-To: 519c6cd6.90...@smartbit.be
: Thread-Topic: shard splitting

https://people.apache.org/~hossman/#threadhijack

-Hoss
Re: Russian stopwords
I'm encountering the same issue, but my Russian stopwords.txt IS encoded in UTF-8. I verified the encoding using EmEditor (I've used it for years, and I use it for the existing English, French, Spanish, Portuguese and German Solr configurations, without issues). Just to make extra sure, I downloaded Edit Plus, as mentioned in this thread, and verified the encoding again: UTF-8.

I realize this will pass for a stupid question, but... could there be any issue other than encoding?

Thanks;

--
View this message in context: http://lucene.472066.n3.nabble.com/Russian-stopwords-tp491490p4065440.html
Re: Regular expression in solr
API doc says that: Lucene supports regular expression searches matching a pattern between forward slashes /. The syntax may change across releases, but the current supported syntax is documented in the RegExp class. For example to find documents containing moat or boat: /[mb]oat/ I think that this may help you: http://lucene.apache.org/core/4_3_0/core/org/apache/lucene/util/automaton/RegExp.html 2013/5/22 Oussama Jilal jilal.ouss...@gmail.com There is no ^ or $ in the solr regex since the regular expression will match tokens (not the complete indexed text). So the results you get will basicly depend on your way of indexing, if you use the regex on a tokenized field and that is not what you want, try to use a copy field wich is not tokenized and then use the regex on that one. On 05/22/2013 11:53 AM, Stéphane Habett Roux wrote: I just can't get the $ endpoint to work. I am not sure but I heard it works with the Java Regex engine (a little obvious if it is true ...), so any Java regex tutorial would help you. On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote: Yes, it works for me too. But many times result is not as expected. Is there some guide on use of regex in solr? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr I don't think so, it always worked for me without anything special, just try it and see :) On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote: @Oussama Thank you for your reply. Is it as simple as that? I mean no additional settings required? 
-Original Message-
From: Oussama Jilal [mailto:jilal.ouss...@gmail.com]
Sent: Wednesday, May 22, 2013 3:37 PM
To: solr-user@lucene.apache.org
Subject: Re: Regular expression in solr

You can write a regular expression query like this (you need to specify the regex between slashes / ):

fieldName:/[rR]egular.*/

On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote:

Hi, How do we search based upon regular expressions in solr?

Regards,
Sagar

DISCLAIMER: The contents of this e-mail and any attachment(s) are confidential and intended for the named recipient(s) only. It shall not attach any liability on the originator or NEC or its affiliates. Any views or opinions presented in this email are solely those of the author and may not necessarily reflect the opinions of NEC or its affiliates. Any form of reproduction, dissemination, copying, disclosure, modification, distribution and / or publication of this message without the prior written consent of the author of this e-mail is strictly prohibited. If you have received this email in error please delete it and notify the sender immediately.
Re: Regular expression in solr
If the indexed data includes positions, it should be possible to implement ^ and $ as the first and last positions. On 05/22/2013 04:08 AM, Oussama Jilal wrote: There is no ^ or $ in the Solr regex, since the regular expression matches tokens (not the complete indexed text). So the results you get will basically depend on how you index: if you use the regex on a tokenized field and that is not what you want, try using a copy field which is not tokenized, and then use the regex on that one. On 05/22/2013 11:53 AM, Stéphane Habett Roux wrote: I just can't get the $ endpoint to work. I am not sure, but I heard it works with the Java regex engine (a little obvious if it is true ...), so any Java regex tutorial would help you. On 05/22/2013 11:42 AM, Sagar Chaturvedi wrote: Yes, it works for me too. But many times the result is not as expected. Is there some guide on the use of regex in Solr? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 4:00 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr I don't think so, it always worked for me without anything special, just try it and see :) On 05/22/2013 11:26 AM, Sagar Chaturvedi wrote: @Oussama Thank you for your reply. Is it as simple as that? I mean, no additional settings required? -Original Message- From: Oussama Jilal [mailto:jilal.ouss...@gmail.com] Sent: Wednesday, May 22, 2013 3:37 PM To: solr-user@lucene.apache.org Subject: Re: Regular expression in solr You can write a regular expression query like this (you need to specify the regex between slashes / ) : fieldName:/[rR]egular.*/ On 05/22/2013 10:51 AM, Sagar Chaturvedi wrote: Hi, How do we search based upon regular expressions in solr? Regards, Sagar
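As the thread says, Lucene/Solr regex queries run against individual indexed terms, with each pattern implicitly anchored to the whole term, which is why ^ and $ have no effect on a tokenized field. A minimal Python analogy of that behavior (this is not Solr's actual engine, and the whitespace split is a crude stand-in for a tokenizer):

```python
import re

# Solr's regex queries match individual indexed terms, not the whole
# stored text. Simulate a tokenized field with a naive tokenizer.
text = "Regular expressions in Solr"
tokens = text.lower().split()

# A pattern written against "the whole field value" matches nothing
# token-by-token, because no single token spans the phrase.
whole = [t for t in tokens if re.fullmatch(r"regular expressions.*", t)]

# Each token is its own matching unit, so "regular.*" matches the first
# token, just as fieldName:/regular.*/ would match that indexed term.
per_token = [t for t in tokens if re.fullmatch(r"regular.*", t)]
print(whole)      # []
print(per_token)  # ['regular']
```

This also illustrates the copy-field advice: on an untokenized (string) field the entire value is one term, so a regex can effectively match the whole text.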
Tool to read Solr4.2 index
Hi All, We can use lukeall-4.0 to read a Solr 3.x index. Is there anything similar to read a Solr 4.x index? Please help. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tool-to-read-Solr4-2-index-tp4065448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Date Field
What is the format of the UTC string? An example? Thanks. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Wednesday, May 22, 2013 00:03 To: solr-user@lucene.apache.org Subject: Re: Date Field : 2) Chain TemplateTransformer either by itself or before the : DateFormatTransformer (not sure if the evaluator spits the date out or : not). Either way, I think you should be able to use the formatDate : function in the transformer That sounds correct .. it should be possible to use TemplateTransformer (or something like RegexTransformer) prior to DateFormatTransformer so that the value you extract from the xpath (i.e.: 5/13) gets the literal string UTC appended to it, and then configure a dateTimeFormat that parses the timezone from the value (i.e.: MM/yy z) -Hoss
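Hoss's two-step chain can be sketched in Python to show what each transformer contributes (the Java format MM/yy z roughly corresponds to Python's %m/%y %Z; the raw value 5/13 is taken from the message above, and the variable names are illustrative):

```python
from datetime import datetime

# Step 1 (TemplateTransformer-style): append the literal string " UTC"
# to the raw value extracted by xpath, e.g. "5/13" -> "5/13 UTC".
raw = "5/13"
templated = raw + " UTC"

# Step 2 (DateFormatTransformer-style): parse with a format that also
# consumes the timezone token, the analogue of Java's "MM/yy z".
dt = datetime.strptime(templated, "%m/%y %Z")
print(dt.year, dt.month)  # 2013 5
```

The key point is that the timezone is injected as literal text before parsing, so the date format itself can declare where the timezone sits in the value.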
Re: Tool to read Solr4.2 index
This might help http://wiki.apache.org/solr/LukeRequestHandler -- Shreejay Nair Sent from my mobile device. Please excuse brevity and typos. On Wednesday, May 22, 2013 at 13:47, gpssolr2020 wrote: Hi All, We can use lukeall4.0 for reading Solr3.x index . Like that do we have anything to read solr 4.x index. Please help. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tool-to-read-Solr4-2-index-tp4065448.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Tool to read Solr4.2 index
Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Tool-to-read-Solr4-2-index-tp4065448p4065453.html Sent from the Solr - User mailing list archive at Nabble.com.
fq facet on double and non-indexed field
Hi, I am trying to apply filtering on a non-indexed double field, but it does not return any results. So can't we use fq on a non-indexed field? The error returned is: "can not use FieldCache on a field which is neither indexed nor has doc values: EXCH_RT_AMT" (status code 400). We are using Solr 4.2. Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/fq-facet-on-double-and-non-indexed-field-tp4065457.html Sent from the Solr - User mailing list archive at Nabble.com.
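The error message itself points at the fix: filtering and faceting need the field to be either indexed or backed by doc values. A hedged schema.xml sketch (the type name tdouble is an assumption, and docValues support for Trie fields only arrived around Solr 4.2, so verify it against your build; a full re-index is required after the change):

```xml
<!-- Either indexed="true" or docValues="true" makes fq/facet possible -->
<field name="EXCH_RT_AMT" type="tdouble" indexed="false" stored="true" docValues="true"/>
```

The simpler alternative, if index size is not a concern, is just to set indexed="true" on the field.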
Low Priority: Lucene Facets in Solr?
Hi All, Not really a pressing need for this at all, but having worked through a few tutorials, I was wondering if there was any work being done to incorporate Lucene Facets into solr: http://lucene.apache.org/core/4_3_0/facet/org/apache/lucene/facet/doc-files/userguide.html Brendan
Re: Scheduling DataImports
For the first: a cron job that hits the DIH trigger URL will probably be the easiest way. I'm not sure I understood the second question. How do you store/know when the entries expire? And how do you pull those specific entries? Regards, Alex. Personal blog: http://blog.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, May 22, 2013 at 3:36 PM, smanad sma...@gmail.com wrote: Hi, I am new to Solr and recently started exploring it for search/sort needs in our webapp. I have a couple of questions, as below. (I am using Solr 4.2.1 with the default core named collection1.) 1. We have a use case where we would like to index data every 10 minutes (on average). What's the best way to schedule a data import every 10 minutes or so? A cron job? 2. Also, we are indexing data returned from an API which returns different cache TTLs. How can I re-index after a TTL expires? Some process which polls for the soon-to-expire entries and issues a data-import command? Any pointers will be much appreciated. Thanks, -M -- View this message in context: http://lucene.472066.n3.nabble.com/Scheduling-DataImports-tp4065435.html Sent from the Solr - User mailing list archive at Nabble.com.
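For reference, the cron route can be a single crontab entry that hits the DIH handler every 10 minutes; the host, port, core name, and choice of command below are assumptions to adapt to your setup (delta-import is the usual choice for incremental updates when the DIH config supports it):

```
*/10 * * * * curl -s "http://localhost:8983/solr/collection1/dataimport?command=full-import&clean=false" > /dev/null
```

Note that DIH ignores a new command while an import is still running, so an overlapping cron firing is harmless.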
Re: Low Priority: Lucene Facets in Solr?
The topic has come up, but nobody has expressed a sense of urgency. It actually has a placeholder Jira: https://issues.apache.org/jira/browse/SOLR-4774 Feel free to add your encouragement there. -- Jack Krupansky -Original Message- From: Brendan Grainger Sent: Wednesday, May 22, 2013 6:39 PM To: solr-user@lucene.apache.org Subject: Low Priority: Lucene Facets in Solr? Hi All, Not really a pressing need for this at all, but having worked through a few tutorials, I was wondering if there was any work being done to incorporate Lucene Facets into solr: http://lucene.apache.org/core/4_3_0/facet/org/apache/lucene/facet/doc-files/userguide.html Brendan
Using alternate Solr index location for SolrCloud
Our prod environment is going to be on Azure. As such, I want our index to live on the Azure VM's local storage rather than the default VM disk (blob storage). Normally, I just use /var/opt/tomcat7/PORT/solr/collection1/data, but I want to use something else. I am also using the Collections API to create my collections (I have several). Is my only option to hardcode the data directory in the collection's solrconfig.xml? I would prefer to avoid this because not all environments will have this same disk structure. Ideally, I could put a parameter in the Collections API for the instance directory. I see this is available for Core Admin, but I don't see it for the Collections API itself. Or failing that, solr.xml would be better. Does anyone have any suggestions? Thanks. -- *KEVIN OSBORN* LEAD SOFTWARE ENGINEER CNET Content Solutions OFFICE 949.399.8714 CELL 949.310.4677 SKYPE osbornk 5 Park Plaza, Suite 600, Irvine, CA 92614 [image: CNET Content Solutions]
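One hedged option, assuming Solr 4.x and the stock solrconfig.xml idiom: point dataDir at a system property and set that property per environment at JVM startup, so the same shared config works on every disk layout. The property name and path below are examples, not requirements:

```xml
<!-- solrconfig.xml: use the property when set, fall back to the core default -->
<dataDir>${solr.data.dir:}</dataDir>
```

Then start Tomcat on the Azure VMs with, e.g., -Dsolr.data.dir=/mnt/resource/solr/data. One caveat: with several cores on the same node a single shared property would make them collide on one directory, so per-core values (for example via each core's properties) would be needed in that case.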
Storing and retrieving json
Hello all, I am facing a need to store and retrieve a JSON string in a field. E.g., imagine a schema like below. [Please note that this is just an example, not the actual specification.] <str name="carName" type="string" indexed="true" stored="false"/> <str name="carDescription" type="string" indexed="false" stored="false"/> carDescription is a JSON string. An example would be {"model": 1988, "type": "manual"}. I don't need to search on carDescription. I want to store some JSON data and retrieve it. When I feed JSON data to the carDescription field through DIH, the response for the query looks like {\"model\": 1988, \"type\": \"manual\"}. All the quotes are escaped. I don't want this. I want the original unmodified data. Is there a way to do this? Thanks, Karthick
Re: Storing and retrieving json
Yes, the quotes need to be escaped, since they are contained within a quoted string, which you didn't show. That is the proper convention for representing strings in JSON. Are you familiar with the JSON format? If not, try XML - it won't have to represent a string as a quoted JSON string. If you read and parse the Solr response with a JSON parser, you should get your original JSON string value back intact. Now, you may want to do a JSON parse of that string itself, but that has nothing to do with the Solr JSON response itself. As you said, you wanted to store and retrieve JSON as a string field, which Solr appears to be doing correctly. -- Jack Krupansky -Original Message- From: Karthick Duraisamy Soundararaj Sent: Wednesday, May 22, 2013 8:03 PM To: solr-user@lucene.apache.org Subject: Storing and retrieving json Hello all, I am facing a need to store and retrieve a JSON string in a field. E.g., imagine a schema like below. [Please note that this is just an example, not the actual specification.] <str name="carName" type="string" indexed="true" stored="false"/> <str name="carDescription" type="string" indexed="false" stored="false"/> carDescription is a JSON string. An example would be {"model": 1988, "type": "manual"}. I don't need to search on carDescription. I want to store some JSON data and retrieve it. When I feed JSON data to the carDescription field through DIH, the response for the query looks like {\"model\": 1988, \"type\": \"manual\"}. All the quotes are escaped. I don't want this. I want the original unmodified data. Is there a way to do this? Thanks, Karthick
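Jack's point can be verified in a few lines: the escaping exists only on the wire, and a JSON parse restores the stored string exactly. The response shape below is illustrative, not a real Solr response:

```python
import json

# A JSON string stored in a field, embedded in a JSON response envelope.
car_description = '{"model": 1988, "type": "manual"}'
solr_response = json.dumps(
    {"response": {"docs": [{"carDescription": car_description}]}}
)

# On the wire, the inner quotes appear escaped inside the quoted string...
assert '\\"model\\"' in solr_response

# ...but decoding the response restores the stored value exactly.
decoded = json.loads(solr_response)["response"]["docs"][0]["carDescription"]
print(decoded == car_description)  # True

# A second parse turns the stored JSON string itself into a structure.
print(json.loads(decoded)["type"])  # manual
```

So the client never needs to unescape anything by hand; it only needs to parse the response with a JSON library instead of reading the raw bytes.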
Re: List of Solr Query Parsers
Hello, I have just created a new JIRA issue; if you are interested in trying out the new query parser, please visit: https://issues.apache.org/jira/browse/LUCENE-5014 Thanks, roman On Mon, May 6, 2013 at 5:36 PM, Jan Høydahl jan@cominvent.com wrote: Added. Please try editing the page now. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 6. mai 2013 kl. 19:58 skrev Roman Chyla roman.ch...@gmail.com: Hi Jan, My login is RomanChyla Thanks, Roman On 6 May 2013 10:00, Jan Høydahl jan@cominvent.com wrote: Hi Roman, This sounds great! Please register as a user on the WIKI and give us your username here, then we'll grant you editing karma so you can edit the page yourself! The NEAR/5 syntax is really something I think we should get into the default lucene parser. Can't wait to have a look at your code. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com 6. mai 2013 kl. 15:41 skrev Roman Chyla roman.ch...@gmail.com: Hi Jan, Please add this one http://29min.wordpress.com/category/antlrqueryparser/ - I can't edit the wiki. This parser is written with ANTLR on top of Lucene's modern query parser. There is a version which implements the Lucene standard QP, as well as a version which includes proximity operators, multi-token synonym handling, and all of Solr's qparsers using function syntax, i.e., for a query like: multi synonym NEAR/5 edismax(foo) I would like to create a JIRA ticket soon. Thanks Roman On 6 May 2013 09:21, Jan Høydahl jan@cominvent.com wrote: Hi, I just added a Wiki page to try to gather a list of all known Solr query parsers in one place, both those which are part of Solr and those in JIRA or 3rd party. http://wiki.apache.org/solr/QueryParser If you know about other cool parsers out there, please add to the list. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com
Question about Coordination factor
Hello folks, I have a question about the coordination factor, to make sure my understanding of this value is correct. If I have documents that contain some keywords like the following:

Doc1: A, B, C
Doc2: A, C
Doc3: B, C

And my query is A OR B OR C OR D. In this case, the coord factor value for each document will be the following:

Doc1: 3/4
Doc2: 2/4
Doc3: 2/4

In the same fashion, the respective coord factor values are the following if I have the query C OR D:

Doc1: 1/2
Doc2: 1/2
Doc3: 1/2

Is this correct, or did I miss something? Please correct me if I am wrong. Regards, Kazuaki Hiraga
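The arithmetic above can be checked directly; the classic Lucene TF-IDF similarity (the default in Solr 4.x) defines coord(q, d) as the number of query clauses a document matches divided by the total number of clauses:

```python
# Documents and their terms, as given in the question.
docs = {"Doc1": {"A", "B", "C"}, "Doc2": {"A", "C"}, "Doc3": {"B", "C"}}

def coord(query_terms, doc_terms):
    # Number of matching query clauses over the total clause count.
    overlap = len(query_terms & doc_terms)
    return overlap / len(query_terms)

# Query: A OR B OR C OR D  -> expected 3/4, 2/4, 2/4
q1 = {"A", "B", "C", "D"}
print([coord(q1, docs[d]) for d in ("Doc1", "Doc2", "Doc3")])
# [0.75, 0.5, 0.5]

# Query: C OR D  -> every document matches only C, so coord is 1/2 for all
q2 = {"C", "D"}
print([coord(q2, docs[d]) for d in ("Doc1", "Doc2", "Doc3")])
# [0.5, 0.5, 0.5]
```

So both sets of values in the question come out as expected; the D clause matches nothing but still counts in the denominator.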