Re: Apache solr sink issue
Do you have this tag <uniqueKey>id</uniqueKey> defined in your schema? It is not mandatory to have a unique field, but if you need it then you have to provide it, else you can remove it. See the wiki page below for more details: http://wiki.apache.org/solr/SchemaXml#The_Unique_Key_Field Some options to generate this field if your document cannot derive one: https://wiki.apache.org/solr/UniqueKey On Mon, Aug 18, 2014 at 10:48 PM, Jeniba Johnson jeniba.john...@lntinfotech.com wrote: Hi, I want to index a log file in Solr using Flume + Apache Solr sink. I am referring to the below-mentioned URL: https://cwiki.apache.org/confluence/display/FLUME/How+to+Setup+Solr+Sink+for+Flume Error from flume console: 2014-08-19 15:38:56,451 (concurrentUpdateScheduler-2-thread-1) [ERROR - org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.handleError(ConcurrentUpdateSolrServer.java:354)] error java.lang.Exception: Bad Request request: http://xxx.xx.xx:8983/solr/update?wt=javabin&version=2 at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer$Runner.run(ConcurrentUpdateSolrServer.java:208) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) at java.lang.Thread.run(Thread.java:722) Error from solr console: 473844 [qtp176433427-19] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: Document is missing mandatory uniqueKey field: id Can anyone help me with this issue and help me with the steps for integrating Flume with the Solr sink? Regards, Jeniba Johnson The contents of this e-mail and any attachment(s) may contain confidential or privileged information for the intended recipient(s). Unintended recipients are prohibited from taking action on the basis of information in this e-mail and using or disseminating the information, and must notify the sender and delete it from their system.
L&T Infotech will not accept responsibility or liability for the accuracy or completeness of, or the presence of any virus or disabling code in, this e-mail.
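For reference, the declaration Gopal mentions looks like this in schema.xml, and (as one of the options on the UniqueKey wiki page linked above) Solr can generate the value for you with an update processor. This is only a sketch; the chain name is arbitrary and the field names follow the stock example schema:

```xml
<!-- schema.xml: declare the field, then mark it as the unique key -->
<field name="id" type="string" indexed="true" stored="true" required="true"/>
<uniqueKey>id</uniqueKey>

<!-- solrconfig.xml: auto-generate an id when the incoming document has none -->
<updateRequestProcessorChain name="uuid">
  <processor class="solr.UUIDUpdateProcessorFactory">
    <str name="fieldName">id</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

Remember to point the update handler (or the request) at the chain, e.g. with update.chain=uuid, for the processor to run.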
Any recommendation for Solr Cloud version.
Hi, I am trying to build a new Solr Cloud which will replace an old cluster (2 indexers + 2 searchers). The version I am using is 4.1. The sooner the better? i.e. version 4.9.0. Please give any suggestions. Thanks, Chunki.
Exact match?
If I have a long string, how do I match on 90% of the terms to see if there is a duplicate? If I add the field and index it, what is the best way to require 90%, i.e. (# of terms matched) / (# of terms in the field)? -- Bill Bell billnb...@gmail.com cell 720-256-8076
Re: Syntax unavailable for parameter substitution Solr 3.5
Thanks Chris. Yes, I am comfortable writing Java code; I will try to give it a shot. Thanks, Deepak -- View this message in context: http://lucene.472066.n3.nabble.com/Syntax-unavailable-for-parameter-substitution-Solr-3-5-tp4153197p4153722.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr, weblogic managed server and log4j logging
Maybe some of you use Solr with WebLogic and can help me... I have WebLogic 12.1.3 and would like to deploy/run Solr on a managed server. I started the node manager, created a server named server-solr and deployed Solr (4.7.9). In the Server Start tab of the server configuration I added C:\lib\wllog4j.jar;C:\lib\log4j-1.2.16.jar to the Class Path and -Dlog4j.configuration=C:\download\log4j.properties -Dweblogic.log.Log4jLoggingEnabled=true to the Arguments. When I try to start the server I get the following error: Aug 19, 2014 10:53:07 AM CEST Critical WebLogicServer BEA-000386 Server subsystem failed. Reason: A MultiException has 6 exceptions. They are: 1. java.lang.NoSuchMethodError: com.bea.logging.LogBufferHandler.getServerLogBufferHandler()Lcom/bea/logging/LogBufferHandler; 2. java.lang.IllegalStateException: Unable to perform operation: post construct on weblogic.diagnostics.lifecycle.LoggingServerService 3. java.lang.IllegalArgumentException: While attempting to resolve the dependencies of weblogic.diagnostics.lifecycle.DiagnosticFoundationService errors were found 4. java.lang.IllegalStateException: Unable to perform operation: resolve on weblogic.diagnostics.lifecycle.DiagnosticFoundationService 5. java.lang.IllegalArgumentException: While attempting to resolve the dependencies of com.oracle.injection.integration.CDIIntegrationService errors were found 6. java.lang.IllegalStateException: Unable to perform operation: resolve on com.oracle.injection.integration.CDIIntegrationService Do I still miss something? Thank you in advance Francesco
Re: solr cloud going down repeatedly
On 08/18/2014 08:38 PM, Shawn Heisey wrote: With an 8GB heap and UseConcMarkSweepGC as your only GC tuning, I can pretty much guarantee that you'll see occasional GC pauses of 10-15 seconds, because I saw exactly that happening with my own setup. This is what I use now: http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning I can't claim that my problem is 100% solved, but collections that go over one second are *very* rare now, and I'm pretty sure they are all under two seconds. Thank you for your comment. How did you test these settings? I mean, that's a lot of tuning and I would like to set up some test environment to be certain this is what I want...
sample Cell schema question
In the sample schema.xml I can see this: <!-- Main body of document extracted by SolrCell. NOTE: This field is not indexed by default, since it is also copied to "text" using copyField below. This is to save space. Use this field for returning and highlighting document content. Use the "text" field to search the content. --> <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/> I am wondering, how does having this split in two fields (text/content) save space?
Re: sample Cell schema question
OK, I had not noticed that text also contains the other metadata like keywords, description, etc. Never mind! On Tue, Aug 19, 2014 at 11:28 AM, jmlucjav jmluc...@gmail.com wrote: In the sample schema.xml I can see this: <!-- Main body of document extracted by SolrCell. NOTE: This field is not indexed by default, since it is also copied to "text" using copyField below. This is to save space. Use this field for returning and highlighting document content. Use the "text" field to search the content. --> <field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/> I am wondering, how does having this split in two fields (text/content) save space?
Re: BlendedInfixSuggester index write.lock failures on core reload
Hi, Yes, this indeed is a bug. I am currently trying to get a patch for it. This is the Jira issue: https://issues.apache.org/jira/browse/SOLR-6246 On Thu, Aug 14, 2014 at 7:52 PM, Zisis Tachtsidis zist...@runbox.com wrote: Hi all, I'm using Solr 4.9.0 and have set up a spellcheck component for returning suggestions. The configuration inside my solr.SpellCheckComponent is as follows: <str name="classname">org.apache.solr.spelling.suggest.Suggester</str> <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.BlendedInfixLookupFactory</str> along with a custom value for <str name="indexPath"/>. The server is starting properly and data gets indexed, but once I hit the 'Reload' button from 'Core Admin' I get the following error. null:org.apache.solr.common.SolrException: Error handling 'reload' action at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:791) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestInternal(CoreAdminHandler.java:224) at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:187) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:729) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:258) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at com.caucho.server.dispatch.FilterFilterChain.doFilter(FilterFilterChain.java:89) at com.caucho.server.webapp.WebAppFilterChain.doFilter(WebAppFilterChain.java:156) at com.caucho.server.webapp.AccessLogFilterChain.doFilter(AccessLogFilterChain.java:95) at com.caucho.server.dispatch.ServletInvocation.service(ServletInvocation.java:289) at com.caucho.server.http.HttpRequest.handleRequest(HttpRequest.java:838) at com.caucho.network.listen.TcpSocketLink.dispatchRequest(TcpSocketLink.java:1345) at
com.caucho.network.listen.TcpSocketLink.handleRequest(TcpSocketLink.java:1301) at com.caucho.network.listen.TcpSocketLink.handleRequestsImpl(TcpSocketLink.java:1285) at com.caucho.network.listen.TcpSocketLink.handleRequests(TcpSocketLink.java:1193) at com.caucho.network.listen.TcpSocketLink.handleAcceptTaskImpl(TcpSocketLink.java:992) at com.caucho.network.listen.ConnectionTask.runThread(ConnectionTask.java:117) at com.caucho.network.listen.ConnectionTask.run(ConnectionTask.java:93) at com.caucho.network.listen.SocketLinkThreadLauncher.handleTasks(SocketLinkThreadLauncher.java:169) at com.caucho.network.listen.TcpSocketAcceptThread.run(TcpSocketAcceptThread.java:61) at com.caucho.env.thread2.ResinThread2.runTasks(ResinThread2.java:173) at com.caucho.env.thread2.ResinThread2.run(ResinThread2.java:118) Caused by: org.apache.solr.common.SolrException: Unable to reload core: autocomplete at org.apache.solr.core.CoreContainer.recordAndThrow(CoreContainer.java:911) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:660) at org.apache.solr.handler.admin.CoreAdminHandler.handleReloadAction(CoreAdminHandler.java:789) ... 24 more Caused by: org.apache.solr.common.SolrException at org.apache.solr.core.SolrCore.init(SolrCore.java:868) at org.apache.solr.core.SolrCore.reload(SolrCore.java:426) at org.apache.solr.core.CoreContainer.reload(CoreContainer.java:650) ... 25 more Caused by: java.lang.RuntimeException at org.apache.solr.spelling.suggest.fst.BlendedInfixLookupFactory.create(BlendedInfixLookupFactory.java:102) at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:105) at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:636) at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:651) at org.apache.solr.core.SolrCore.init(SolrCore.java:851) ... 
27 more Debugging the Solr code, I found out that the original exception comes from the IndexWriter construction inside AnalyzingInfixSuggester.java (more specifically org.apache.lucene.store.Lock:89). The exception is Lock obtain timed out: NativeFSLock@$indexPath/write.lock but it seems to be hidden by the RuntimeException thrown by BlendedInfixLookupFactory. If I use the default indexPath I get another error (write-lock related again) in the logs. org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@$indexPath/blendedInfixSuggesterIndexDir/write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:89) at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:724) at
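For readers trying to reproduce this, the configuration fragments quoted above assemble into something like the following search component. This is a sketch only: the component and dictionary names, and the indexPath value, are placeholders, not values from the original report:

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.BlendedInfixLookupFactory</str>
    <!-- custom on-disk location for the suggester's side index (placeholder path) -->
    <str name="indexPath">/path/to/suggester-index</str>
  </lst>
</searchComponent>
```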
Re: sample Cell schema question
I have a question: does storing the data in copyFields save space? With Regards Aman Tandon On Tue, Aug 19, 2014 at 3:02 PM, jmlucjav jmluc...@gmail.com wrote: ok, I had not noticed text contains also the other metadata like keywords, description etc, nevermind! ...
Re: sample Cell schema question
No, it does not. The intent here, I think, is not to duplicate stored info: other metadata fields like author, keywords, etc. are already stored, so if 'text' were stored (text is where all the fields: content, author, etc. are copied), it would contain some duplicate info. On Tue, Aug 19, 2014 at 1:05 PM, Aman Tandon amantandon...@gmail.com wrote: I have a question, does storing the data in copyfields save space? With Regards Aman Tandon ...
Re: sample Cell schema question
Indexed means you can search it; stored means you can return the value to the user or highlight it. Both consume disk space. A copyField is not a kind of special field: it is a directive that copies one field's values to another field. There are many use cases for copy fields. In the example, we use a specific field, text, as the default field where users will perform their searches. That is why we copy all the fields we want to search into that specific field text (note that there are other ways to search multiple fields: have a look at http://wiki.apache.org/solr/ExtendedDisMax). For example, the field content is copied to the text field (which is indexed) for searching. As we will use the field text to perform our search, we don't need to index the content field too, and since we don't, we save some disk space. Regards, Aurélien On 19/08/2014 13:05, Aman Tandon wrote: I have a question, does storing the data in copyfields save space? With Regards Aman Tandon ...
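A minimal sketch of the pattern Aurélien describes, using the field names from the stock example schema:

```xml
<!-- stored but not indexed: only used for returning/highlighting content -->
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>
<!-- indexed but not stored: the catch-all field searches run against -->
<field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
<!-- copy content (and typically author, keywords, etc.) into text -->
<copyField source="content" dest="text"/>
```

Each field pays only for what it enables: content pays for stored values, text pays for the inverted index, and nothing is paid for twice.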
Re: Exact match?
Maybe use dismax for this? Something like q={!dismax qf=field_name mm=90%}query_string, or more verbosely and separately, q=query_string&defType=dismax&mm=90% Erik On Aug 19, 2014, at 2:43 AM, William Bell billnb...@gmail.com wrote: If I have a long string, how do I match on 90% of the terms to see if there is a duplicate? If I add the field and index it, what is the best way to require 90%, i.e. (# of terms matched) / (# of terms in the field)? -- Bill Bell billnb...@gmail.com cell 720-256-8076
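As a rough sketch of how a percentage mm translates into a clause count: dismax computes the number of required optional clauses from the percentage and rounds the result down. The helper below is illustrative only, not Solr code:

```python
def required_clauses(total_terms, mm_percent):
    """Number of optional clauses that must match for a positive
    percentage mm, rounding the computed value down as dismax does."""
    return (total_terms * mm_percent) // 100

# With mm=90%: a 10-term query needs 9 matching terms,
# while a 4-term query needs only 3 (3.6 rounds down to 3).
```

So very short strings effectively tolerate one missing term, while long strings tolerate proportionally more.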
Re: faceted query with stats not working in solrj
That's a good suggestion, I hadn't checked that log file. What I found works is hitting these methods on the SolrQuery object: query.setGetFieldStatistics(true); query.setParam("stats.field", "MyStatsFieldName"); query.setParam("stats.facet", "MyFacetFieldName"); Now I see the stats in the QueryResponse. -- View this message in context: http://lucene.472066.n3.nabble.com/faceted-query-with-stats-not-working-in-solrj-tp4153608p4153764.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Any recommendation for Solr Cloud version.
On August 19, 2014 at 2:39:32 AM, Lee Chunki (lck7...@coupang.com) wrote: the sooner the better? i.e. version 4.9.0. Yes, certainly. -- Mark Miller about.me/markrmiller
Re: Apache Solr Wiki
Can I also have access to the wiki? We are at the outset of a Solr/Hybris implementation. From: Mark Sun mark...@motionelements.com To: solr-user@lucene.apache.org Date: 08/18/2014 08:06 PM Subject: Apache Solr Wiki Dear Solr Wiki admin, We are using Solr for our multilingual Asian language keyword search, as well as a visual similarity search engine (via the pixolution plugin). We would like to update the Powered by Solr section, as well as help add to the knowledge base for other Solr setups. Can you add me, username MarkSun, as a contributor to the wiki? Thank you! Cheers, Mark Sun CTO MotionElements Pte Ltd 190 Middle Road, #10-05 Fortune Centre Singapore 188979 mark...@motionelements.com www.motionelements.com = Asia-inspired Stock Animation | Video Footage l AE Template online marketplace = This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation.
Re: solr cloud going down repeatedly
On 8/19/2014 3:12 AM, Jakov Sosic wrote: Thank you for your comment. How did you test these settings? I mean, that's a lot of tuning and I would like to set up some test environment to be certain this is what I want... I included a section on tools when I wrote this page: http://wiki.apache.org/solr/SolrPerformanceProblems#GC_pause_problems Thanks, Shawn
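If it helps while experimenting: GC pauses are easy to observe in a test environment by adding the standard HotSpot GC-logging options to the Solr JVM and reading the pause times out of the log afterwards (e.g. with a log viewer such as GCViewer). The log path below is a placeholder:

```
-Xloggc:/var/log/solr/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
```

The "Total time for which application threads were stopped" lines then give you the actual stop-the-world durations to compare before and after a tuning change.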
Near Realtime get
Hi, I tried the realtime get today in a SolrCloud setup, and it's returning only a subset of my stored fields. Did I miss a parameter that would return all the fields? Thanks for your help! Philippe This email message and any attachments are confidential and may be privileged. If you are not the intended recipient, please notify GenomeQuest immediately by replying to this message or by sending an email to postmas...@genomequest.com and destroy all copies of this message and any attachments without reading or disclosing their contents. Thank you.
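For reference, a bare realtime-get request looks like the following (host, core name and id value are placeholders). If some stored fields are missing, explicitly requesting everything with fl=* is a cheap first check; also note that /get only ever returns stored fields:

```
http://localhost:8983/solr/collection1/get?id=mydoc1&fl=*
```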
Indexing and Querying MS SQL Server 2012 Spatial
Hello, I'm new to Solr. I have a SQL Server 2012 database with spatial columns (points/lines/polys). Do you have any resources to point to for the following:
* Creating a Solr index of a SQL Server spatial table
* Bounding box query (intersect) example, possibly with a front-end from GMaps or OpenLayers
I'm currently reading Apache Solr Beginner's Guide and have reviewed https://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4 I am able to index and query my non-spatial data; I am just looking for some resource that may have more detail about how to set everything up. I can provide more detail if needed. Thanks Alex Bostic GIS Developer URS Corporation 12420 Milestone Center Drive, Suite 150 Germantown, MD 20876 direct line: 301-820-3287 cell line: 301-213-2639 This e-mail and any attachments contain URS Corporation confidential information that may be proprietary or privileged. If you receive this message in error or are not the intended recipient, you should not retain, distribute, disclose or use any of this information and you should destroy the e-mail and any attachments or copies.
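Not a complete recipe, but for the bounding-box half: with the Lucene 4 spatial support on the wiki page linked above, a schema sketch looks roughly like this (the field name is an assumption; the JTS context factory is only needed for lines/polygons, points work without it):

```xml
<fieldType name="location_rpt" class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           geo="true" distErrPct="0.025" maxDistErr="0.000009" units="degrees"/>
<field name="geom" type="location_rpt" indexed="true" stored="true"/>
```

You would index shapes as WKT strings (e.g. POINT(-77.03 38.89)), which SQL Server can emit via the spatial column's .STAsText() method, and then a bounding-box intersect filter takes the form fq=geom:"Intersects(minX minY maxX maxY)" per the SolrAdaptersForLuceneSpatial4 page.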
Re: Apache solr sink issue
While Gopal is correct that having a uniqueKey is not mandatory, if you're using SolrCloud it _is_ necessary. And I don't know the internals of the Flume Solr Sink, but if it uses CloudSolrServer under the covers I'd be surprised if it worked without a uniqueKey defined. And I'd guess it does use CloudSolrServer. The error is explicit: you're sending documents to Solr that don't have an id field (or whatever your uniqueKey is set to in schema.xml). Best, Erick On Mon, Aug 18, 2014 at 11:08 PM, Gopal Patwa gopalpa...@gmail.com wrote: Do you have this tag <uniqueKey>id</uniqueKey> defined in your schema? It is not mandatory to have a unique field, but if you need it then you have to provide it, else you can remove it. ...
Re: Apache Solr Wiki
Julie: bq: Can I also have access to the wiki? Sure. You need to create a Wiki logon and let us know what that is before we can add you to the list. Best, Erick On Tue, Aug 19, 2014 at 6:54 AM, julie.v...@anixter.com wrote: Can I also have access to the wiki? We are at the outset of a Solr/Hybris implementation. ...
Substring and Case In sensitive Search
Hi, I am very new to Solr. How can I make Solr search on a string field case-insensitive and by substring? Thanks, Nishanth
Re: Substring and Case In sensitive Search
Substring-search a string field using a wildcard, *, at the beginning and end of the query term. A case-insensitive match on a string field is not supported. Instead, copy the string field to a text field, use the keyword tokenizer, and then apply the lower case filter. But... review your use case to confirm whether you really need to use a string field as opposed to a text field. -- Jack Krupansky -Original Message- From: Nishanth S Sent: Tuesday, August 19, 2014 12:03 PM To: solr-user@lucene.apache.org Subject: Substring and Case In sensitive Search Hi, I am very new to Solr. How can I make Solr search on a string field case-insensitive and by substring? Thanks, Nishanth
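A sketch of the copy-to-text-field approach Jack describes; all the field and type names here are illustrative, not a standard schema:

```xml
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <!-- keep the whole value as one token, then lowercase it -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_ci" type="string_ci" indexed="true" stored="false"/>
<copyField source="title" dest="title_ci"/>
```

A case-insensitive substring query then looks like title_ci:*acme* (lowercase the query term yourself, since wildcard terms may bypass analysis); be aware that leading wildcards can be slow on large indexes.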
Re: Apache Solr Wiki
User name: julievoss From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Date: 08/19/2014 10:34 AM Subject: Re: Apache Solr Wiki ...
Replication of full index to replica after merge index into leader not working
Hi, Using the coreAdmin mergeindexes command to merge an index into a leader (SolrCloud mode on 4.9.0) and the replica does not do a snap pull from the leader as I would have expected. The merge into the leader worked like a charm except I had to send a hard commit after that (which makes sense). I'm guessing the replica would snap pull from the leader if I restarted it, but reloading the collection or core does not trigger the replica to pull from the leader. This seems like an oversight in the mergeindex interaction with SolrCloud. Seems like the simplest would be for the leader to send all replicas a request recovery command after performing the merge. Advice? Cheers, Tim
Re: Replication of full index to replica after merge index into leader not working
I’d just file a JIRA. Merge, like optimize and a few other things, were never tested or considered in early SolrCloud days. It’s used in the HDFS stuff, but in that case, the index is merged to all replicas and no recovery is necessary. If you want to make the local filesystem merge work well with SolrCloud, sounds like we should write a test and make it work. -- Mark Miller about.me/markrmiller On August 19, 2014 at 1:20:54 PM, Timothy Potter (thelabd...@gmail.com) wrote: Hi, Using the coreAdmin mergeindexes command to merge an index into a leader (SolrCloud mode on 4.9.0) and the replica does not do a snap pull from the leader as I would have expected. ...
Re: Replication of full index to replica after merge index into leader not working
On August 19, 2014 at 1:33:10 PM, Mark Miller (markrmil...@gmail.com) wrote: sounds like we should write a test and make it work. Keeping in mind that when using a shared filesystem like HDFS or especially if using the MapReduce contrib, you probably won’t want this new behavior. -- Mark Miller about.me/markrmiller
Help with StopFilterFactory
Hi, I have the following text field:
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+"/>
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
url_stopwords.txt looks like: http https ftp www So, very simple. In the index I have: * twitter.com/testuser All of these queries match: * twitter.com/testuser * com/testuser * testuser But none of these do: * https://twitter.com/testuser * https://www.twitter.com/testuser * www.twitter.com/testuser What am I doing wrong? Analysis makes me think something is wrong with token positions: http://lucene.472066.n3.nabble.com/file/n4153839/oi7o69.jpg but I was thinking StopFilterFactory is supposed to remove the https/http/ftp/www keywords. Why do they figure there at all? That doesn't make much sense. Regards, Alexander -- View this message in context: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839.html Sent from the Solr - User mailing list archive at Nabble.com.
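Not a definitive diagnosis, but the token positions in that analysis screenshot are the usual suspect: the stop filter does remove the tokens, yet it leaves a position gap behind, and if the parsed query ends up position-sensitive (e.g. the fieldType has autoGeneratePhraseQueries="true") that gap can prevent a match. On Solr 4.x, one experiment is to disable position increments on the stop filter:

```xml
<filter class="solr.StopFilterFactory" words="url_stopwords.txt"
        ignoreCase="true" enablePositionIncrements="false"/>
```

Caveats: enablePositionIncrements is deprecated and only honored for older luceneMatchVersion settings, so also check the autoGeneratePhraseQueries attribute on the fieldType, and reindex after any analyzer change.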
Performance of Boolean query with hundreds of OR clauses.
I am using Solr to perform search for finding similar pictures. For this purpose, every image is indexed as a set of descriptors (a descriptor is a string of 6 chars). The number of descriptors per image may vary (from a few to many thousands). When I want to search for a similar image, I extract the descriptors from it and create a query like: MyImage:( desc1 desc2 ... desc n ) The number of descriptors in a query may also vary; usually it is about 1000. Of course the performance of this query is very bad and it may take a few minutes to return. Any ideas for performance improvement? P.S. I also tried to use LIRE, but it does not fit my use case. -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-Boolean-query-with-hundreds-of-OR-clauses-tp4153844.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replication of full index to replica after merge index into leader not working
Was able to get around it for now by sending the REQUESTRECOVERY command to the replica. Will open an improvement JIRA, but not sure if it's worth it as the work-around is pretty clean (IMO). Tim On Tue, Aug 19, 2014 at 5:33 PM, Mark Miller markrmil...@gmail.com wrote: I’d just file a JIRA. Merge, like optimize and a few other things, were never tested or considered in early SolrCloud days. ...
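For anyone searching the archives later, the work-around Tim mentions is the core admin REQUESTRECOVERY action, issued against the node hosting the replica. The host and core name below are placeholders:

```
http://replica-host:8983/solr/admin/cores?action=REQUESTRECOVERY&core=collection1_shard1_replica2
```

The replica then recovers from its shard leader, pulling over the merged index.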
Re: Replicating Between Solr Clouds
Are there any more OOB solutions for inter-SolrCloud replication now? Our indexing is so slow that we cannot rely on a complete re-index of data from our DB of record (SQL) to recover data in the Solr indices. -- View this message in context: http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196p4153856.html Sent from the Solr - User mailing list archive at Nabble.com.
Index not respecting Omit Norms
Please reference the below images: http://lucene.472066.n3.nabble.com/file/n4153863/Schema.png http://lucene.472066.n3.nabble.com/file/n4153863/SolrDescriptionSchemaBrowser.png http://lucene.472066.n3.nabble.com/file/n4153863/SolrDescriptionDebugResults.png As you can see from the first image, the text field-type doesn't define the omitNorms flag, meaning it is set to false. Also on the first image you can see that the description field doesn't define the omitNorms flag, again meaning it is set to false. (Default for omitNorms is false). This can all be confirmed on the second image, where the Properties and Schema rows have omitNorms set to checked. I am having some issues understanding why some results have a fieldNorm set to 1 for matches on the description field. As you can see from the third image, the description field has a rather large number of terms in it, yet the fieldNorm is being set to 1.0 for matching 'supply' on the description field. My guess is that the Omit Norms flag for the 'Index' row is causing the issue. Questions: From the first picture, can anyone tell me what each row (Properties, Schema and Index) refers to? I think the Properties row refers to the flags set when defining the Field Type, which for this field is text. The Schema row refers to the flags set when defining the field, which is description. I'm not as sure where the Index row flags come from, but I'm assuming it defines what the index is really representing? Am I right in assuming the Omit Norms flag in the Index row of the first picture is what is causing fieldNorm issues in the second image? If I am correct in the above question, how do I fix it? Additional information: I am not using the standard request handler. I am using a custom request handler that uses eDisMax. 
The description_sortAlpha field that the description field is copying to is a text field *but* it has omitNorms set to true. My Index Analyzers for the description field are: WhitespaceTokenizerFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory and RemoveDuplicatesTokenFilterFactory, in that order. My Query Analyzers for the description field are: WhitespaceTokenizerFactory, SynonymFilterFactory, StopFilterFactory, WordDelimiterFilterFactory, LowerCaseFilterFactory and RemoveDuplicatesTokenFilterFactory, in that order. The description field is not the only text field to be having this omit norms issue for the Index row. There are actually a couple of others. Thanks, -Tim -- View this message in context: http://lucene.472066.n3.nabble.com/Index-not-respecting-Omit-Norms-tp4153863.html Sent from the Solr - User mailing list archive at Nabble.com.
Question on Solr Relevancy using Okapi BM25F
I am trying to get Okapi BM25F working over some press release articles I am indexing. The data has text portions spread across 3 fields - Title, Summary and Full Article. I would like to influence the standard BM25 by giving more weight to words in the title and summary of the article than in the full description. The importance has to be of the order Title > Summary > Full Description. I am unable to find schema examples online that can help me with it. Can someone guide me to a possible schema for this (or a link to an article/blog that explains it)? Thanks for your help. -Ritesh -- View this message in context: http://lucene.472066.n3.nabble.com/Question-on-Solr-Relevancy-using-Okapi-BM25F-tp4153866.html Sent from the Solr - User mailing list archive at Nabble.com.
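Solr has no turnkey BM25F (which fuses term statistics across fields), but a common approximation is per-field BM25 scoring combined with eDisMax field boosts. A hedged sketch — the field names, boost values, and handler name are illustrative, not from the thread:

```xml
<!-- schema.xml: score with BM25 instead of the default similarity -->
<similarity class="solr.BM25SimilarityFactory">
  <float name="k1">1.2</float>
  <float name="b">0.75</float>
</similarity>

<!-- solrconfig.xml: weight Title > Summary > Full Article via edismax -->
<requestHandler name="/bm25" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title^3 summary^2 fullarticle^1</str>
  </lst>
</requestHandler>
```

This weights each field's independent BM25 score rather than merging term frequencies the way true BM25F does, but it gives the Title > Summary > Full Article ordering asked for above.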
Re: Performance of Boolean query with hundreds of OR clauses.
A large number of query terms is definitely an anti-pattern and not a recommended use case for Solr, but I'm a little surprised that it takes minutes, as opposed to 10 to 20 seconds. Does your index fit entirely in the OS system memory available for file caching? IOW, are those few minutes CPU-bound or I/O-bound? -- Jack Krupansky -Original Message- From: SolrUser1543 Sent: Tuesday, August 19, 2014 2:57 PM To: solr-user@lucene.apache.org Subject: Performance of Boolean query with hundreds of OR clauses. I am using Solr to perform search for finding similar pictures. For this purpose, every image indexed as a set of descriptors ( descriptor is a string of 6 chars ) . Number of descriptors for every image may vary ( from few to many thousands) When I want to search for a similar image , I am extracting the descriptors from it and create a query like : MyImage:( desc1 desc2 ... desc n ) Number of descriptors in query may also vary. Usual it is about 1000. Of course performance of this query very bad and may take few minutes to return . Any ideas for performance improvement ? P.s I also tried to use lire , but it is not fits my use case. -- View this message in context: http://lucene.472066.n3.nabble.com/Performance-of-Boolean-query-with-hundreds-of-OR-clauses-tp4153844.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Index not respecting Omit Norms
: As you can see from the first image, the text field-type doesn't define the : omitNorms flag, meaning it is set to false. Also on the first image you can : see that the description field doesn't define the omitNorms flag, again : meaning it is set to false. (Default for omitNorms is false). This can all ... : I am having some issues understanding why some results have a fieldNorm set : to 1 for matches on the description field. As you can see from the third ... : From the first picture, can anyone tell me what each row (Properties, Schema : and Index) refers to? I think the Properties row refers to the flags set : when defining the Field Type, which for this field is text. The Schema row : refers to the flags set when defining the field, which is description. I'm : not as sure where the Index row flags come from, but I'm assuming it defines : what the index is really representing? : Am I right in assuming the Omit Norms flag in the Index row of the first : picture is what is causing fieldNorm issues in the second image? : If I am correct in the above question, how do I fix it? From a quick glance at the UI JavaScript code (and the underlying LukeRequestHandler) I'm honestly not sure what the intended difference is between the Properties row and the Schema row. I can tell you that the Index row represents what information about the field can actually be extracted from the underlying index itself -- completely independently from the schema. The fact that Omit Norms is checked in that row means that there is at least one document in your index that was indexed with omitNorms=true. Most likely what happened is that you indexed a bunch of docs with omitNorms=true in your schema.xml, then later changed your schema to use norms, but those docs are still there in the index. -Hoss http://www.lucidworks.com/
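If that diagnosis fits, one fix is to make the setting explicit in schema.xml and then reindex from scratch, since norms cannot be recovered for documents that were already indexed with omitNorms=true. A sketch using the field and type names from the thread (verify against the actual schema):

```xml
<!-- schema.xml: state omitNorms explicitly so the intent is unambiguous -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  ...
</fieldType>
<field name="description" type="text" indexed="true" stored="true"
       omitNorms="false"/>
```

After deleting the old documents (or starting from an empty index) and reindexing, the Index row in the schema browser should stop showing Omit Norms once no omitNorms=true documents remain in any segment.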
Re: Replicating Between Solr Clouds
I've been working on this tool, which wraps the collections API to do more advanced cluster-management operations: https://github.com/whitepages/solrcloud_manager One of the operations I've added (copy) is a deployment mechanism that uses the replication handler's snap puller to hot-load a pre-indexed collection from one solrcloud cluster into another. You create the same collection name with the same shard count in two clusters, index into one, and copy from that into the other. This method won't work as a method of active replication, since it copies the whole index. If you only need a periodic copy between data centers though, or want someplace to restore from in case of critical failure (until you can properly rebuild), there might be something you can use here. On 8/19/14, 12:45 PM, reparker23 reparke...@gmail.com wrote: Are there any more OOB solutions for inter-SolrCloud replication now? Our indexing is so slow that we cannot rely on a complete re-index of data from our DB of record (SQL) to recover data in the Solr indices. -- View this message in context: http://lucene.472066.n3.nabble.com/Replicating-Between-Solr-Clouds-tp4121196p4153856.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: sample Cell schema question
Thanks Aurélien On Aug 19, 2014 3:00 PM, jmlucjav jmluc...@gmail.com wrote: In the sample schema.xml I can see this:

<!-- Main body of document extracted by SolrCell. NOTE: This field is not
     indexed by default, since it is also copied to "text" using copyField
     below. This is to save space. Use this field for returning and
     highlighting document content. Use the "text" field to search the
     content. -->
<field name="content" type="text_general" indexed="false" stored="true" multiValued="true"/>

I am wondering, how does having this split in two fields text/content save space?
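For reference, the arrangement the comment describes looks roughly like this in the sample schema (the "text" field shown here is the usual indexed-but-not-stored counterpart; double-check against your schema version):

```xml
<!-- stored but not indexed: for returning/highlighting document content -->
<field name="content" type="text_general" indexed="false" stored="true"
       multiValued="true"/>
<!-- indexed but not stored: for searching -->
<field name="text" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="content" dest="text"/>
```

The space saving is that the body is stored once (in content) and indexed once (in text), rather than a single field carrying both the stored bytes and the inverted index.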
Re: logging in solr
As you are using tomcat you can configure the log file name, folder,etc. by configuring the server.xml present in the Conf directory of tomcat. On Aug 19, 2014 4:17 AM, Shawn Heisey s...@elyograg.org wrote: On 8/18/2014 2:43 PM, M, Arjun (NSN - IN/Bangalore) wrote: Currently in my component Solr is logging to catalina.out. What is the configuration needed to redirect those logs to some custom logfile eg: Solr.log. Solr uses the slf4j library for logging. Simply change your program to use slf4j, and very likely the logs will go to the same place the Solr logs do. http://www.slf4j.org/manual.html See also the wiki page on logging jars and Solr: http://wiki.apache.org/solr/SolrLogging Thanks, Shawn
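With the slf4j-over-log4j setup Shawn describes (the default in the Solr 4.x example distribution), pointing Solr's logging at its own file is a log4j.properties change. A sketch — the file path and rotation sizes below are arbitrary choices, not defaults:

```properties
# log4j.properties: send all Solr logging to its own rolling file
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=${catalina.base}/logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{2} - %m%n
```

This file must be on the webapp's classpath (and the log4j/slf4j jars in place) for Tomcat to pick it up.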
Re: Apache Solr Wiki
Done, have fun! On Tue, Aug 19, 2014 at 10:07 AM, julie.v...@anixter.com wrote: user name: julievoss From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Date: 08/19/2014 10:34 AM Subject:Re: Apache Solr Wiki Julie: bq: Can I also have access to the wiki? Sure. Sou need to create a Wiki logon and let us know what that is before we can add you to the list. Best, Erick On Tue, Aug 19, 2014 at 6:54 AM, julie.v...@anixter.com wrote: Can I also have access to the wiki? We are at the outset of a Solr/Hybris implementation. From: Mark Sun mark...@motionelements.com To: solr-user@lucene.apache.org Date: 08/18/2014 08:06 PM Subject:Apache Solr Wiki Dear Solr Wiki admin, We are using Solr for our multilingual asian language keywords search, as well as visual similarity search engine (via pixolution plugin). We would like to update the Powered by Solr section. As well as help to add on to the knowledge base for other Solr setups. Can you add me, username MarkSun as a contributor to the wiki? Thank you! Cheers, Mark Sun CTO MotionElements Pte Ltd 190 Middle Road, #10-05 Fortune Centre Singapore 188979 mark...@motionelements.com www.motionelements.com = Asia-inspired Stock Animation | Video Footage l AE Template online marketplace = This message may contain confidential and/or privileged information. If you are not the addressee or authorized to receive this for the addressee, you must not use, copy, disclose or take any action based on this message or any information herein. If you have received this message in error, please advise the sender immediately by reply e-mail and delete this message. Thank you for your cooperation.
Re: Help with StopFilterFactory
What release of Solr? Do you have autoGeneratePhraseQueries=true on the field? And when you said But any of these does, did you mean But NONE of these does? -- Jack Krupansky -Original Message- From: heaven Sent: Tuesday, August 19, 2014 2:34 PM To: solr-user@lucene.apache.org Subject: Help with StopFilterFactory Hi, I have the next text field:

<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+" />
    <filter class="solr.StopFilterFactory" words="url_stopwords.txt" ignoreCase="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

url_stopwords.txt looks like: http https ftp www So very simple. In index I have:
* twitter.com/testuser
All these queries do match:
* twitter.com/testuser
* com/testuser
* testuser
But any of these does:
* https://twitter.com/testuser
* https://www.twitter.com/testuser
* www.twitter.com/testuser
What do I do wrong? Analysis makes me think something is wrong with token positions: http://lucene.472066.n3.nabble.com/file/n4153839/oi7o69.jpg but I was thinking StopFilterFactory is supposed to remove the https/http/ftp/www keywords. Why do they figure there at all? That doesn't make much sense. Regards, Alexander -- View this message in context: http://lucene.472066.n3.nabble.com/Help-with-StopFilterFactory-tp4153839.html Sent from the Solr - User mailing list archive at Nabble.com.
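Not an answer from the thread, but one way to sidestep position gaps left by removed stopwords (which break auto-generated phrase queries) is to strip the scheme and leading www. with a char filter before tokenization, instead of stopping them out afterwards. A sketch, assuming the field only ever holds URL-like values:

```xml
<fieldType name="words_ngram" class="solr.TextField" omitNorms="false">
  <analyzer>
    <!-- strip scheme and leading www. so no stopword gap is ever created -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="^(?:https?|ftp)://(?:www\.)?|^www\."
                replacement=""/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="[^\w]+"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the prefixes never reach the tokenizer, indexed and queried token positions line up regardless of whether the input carried a scheme.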
Inconsistent Solr Index Behavior
A while back we added span support for multi-value fields and did a full re-index of data spanning over 4 years. It worked perfectly for a month, and then suddenly the results were not reliable anymore. We are noticing that the span is not working on most of the data and is returning wrong results. Is there anything that might corrupt or drop index data (old data)? Less RAM? Something? -E
Intermittent error indexing SolrCloud 4.7.0
Hi All, I get a "No live SolrServers available to handle this request" error intermittently while indexing in a SolrCloud cluster with 3 shards and a replication factor of 2. I am using Solr 4.7.0. Please see the stack trace below.

org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request
    at org.apache.solr.client.solrj.impl.LBHttpSolrServer.request(LBHttpSolrServer.java:352) ~[DynaOCrawlerUtils.jar:?]
    at org.apache.solr.client.solrj.impl.CloudSolrServer.request(CloudSolrServer.java:640) ~[DynaOCrawlerUtils.jar:?]
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117) ~[DynaOCrawlerUtils.jar:?]
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168) ~[DynaOCrawlerUtils.jar:?]
    at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146) ~[DynaOCrawlerUtils.jar:?]
Re: Integrating Solr with HBase Using Lily Project
Try Cloudera Search. -- View this message in context: http://lucene.472066.n3.nabble.com/Integrating-Solr-with-HBase-Using-Lily-Project-tp4147868p4153906.html Sent from the Solr - User mailing list archive at Nabble.com.