Re: Indexed=false for a field, but still able to search on field.
Hi, I have tried two scenarios:

1. docValues is not set at all: you cannot search directly on that field, but when you search on any other field of the doc, this field still shows up in the result. You cannot do faceting on this field either. If you search on this field in the Solr Admin panel, no results are found, but you can still see the field on the doc there.

2. docValues is set to true: the field is searchable and can be faceted on as well.

Please correct me if I am going wrong. Thanks Renuka Srishti

On Tue, Aug 29, 2017 at 1:06 AM, AshB wrote:
> Hi,
>
> Yes docValues is true for fieldType
>
> docValues="true"/>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Indexed-false-for-a-field-but-still-able-to-search-on-field-tp4352338p4352442.html
> Sent from the Solr - User mailing list archive at Nabble.com.
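In schema.xml terms, scenario #2 above corresponds to a definition along these lines (the field name is taken from the original question; the other attributes are assumptions for illustration):

```xml
<!-- Not indexed, but docValues="true" still makes the field searchable
     (and facetable) in Solr 6.x - the trappy behaviour discussed here -->
<field name="fileName" type="string" indexed="false" stored="true" docValues="true"/>
```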
Solr client
Hi, I am aggregating open source Solr client libraries across all languages. Below are the links. Very few projects are currently active; most of them were last updated a few years back. Please give me pointers if I missed any Solr client library. http://www.findbestopensource.com/tagged/solr-client http://www.findbestopensource.com/tagged/solr-gui Regards Ganesh PS: The search on the website http://www.findbestopensource.com is powered by Solr.
solr index replace with index from another environment
Hi there, We are using Solr 6.3.0 and need to replace the Solr index in production with the index from another environment on a periodical basis. But the JVMs have to be recycled for the updated index to take effect. Is there any way this can be achieved without restarting the JVMs? Using aliases as described below is an alternative, but I don't think it is useful in my case, where I already have the index from the other environment ready. If I build a new collection and replace the index, again the JVMs need to be restarted for the new index to take effect. https://stackoverflow.com/questions/45158394/replacing-old-indexed-data-with-new-data-in-apache-solr-with-zero-downtime Any other suggestions please? Thanks, satya
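For reference, the alias approach from the linked answer boils down to building a new collection and re-pointing an alias via the Collections API, with no JVM restart involved; a rough sketch of the request (host, alias, and collection names are illustrative):

```python
# Sketch: zero-downtime index swap by re-pointing a collection alias at a
# freshly built collection (Collections API CREATEALIAS). Names are examples.
def create_alias_url(solr_base, alias, collection):
    """Build the CREATEALIAS request URL; queries against the alias then
    hit the new collection without restarting any JVM."""
    return (solr_base + "/admin/collections"
            "?action=CREATEALIAS&name=" + alias +
            "&collections=" + collection)

print(create_alias_url("http://localhost:8983/solr", "prod", "prod_v2"))
```

Once the alias points at the new collection, the old collection can be dropped at leisure.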
Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)
Dani, It might be time to attach some instrumentation to one of your nodes. Finding out which classes are occupying the memory will help narrow the issue. Are you using a lot of facets, grouping, or stats during your queries? Also, when you were doing Master/Slave, was that on the same version of Solr as you're using now in SolrCloud mode? -Scott

On Mon, Aug 28, 2017 at 4:50 AM, Daniel Ortega wrote:
> Hi Scott,
>
> Yes, we think that our usage scenario falls into Index-Heavy/Query-Heavy
> too. We have tested several softcommit/hardcommit values (from a few
> seconds to minutes) with no appreciable improvements :(
>
> Thanks for your reply!
>
> - Daniel
>
> 2017-08-25 6:45 GMT+02:00 Scott Stults:
> >
> > Hi Dani,
> >
> > It seems like your use case falls into the Index-Heavy / Query-Heavy
> > category, so you might try increasing your hard commit frequency to 15
> > seconds rather than 15 minutes:
> >
> > https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >
> > -Scott
> >
> > On Thu, Aug 24, 2017 at 10:03 AM, Daniel Ortega <danielortegauf...@gmail.com> wrote:
> > >
> > > Hi Scott,
> > >
> > > In our indexing service we are using that client too
> > > (org.apache.solr.client.solrj.impl.CloudSolrClient) :)
> > >
> > > This is our Update Request Processor chain configuration:
> > >
> > > <updateProcessor class="solr.processor.SignatureUpdateProcessorFactory" name="signature">
> > >   <bool name="enabled">true</bool>
> > >   <str name="signatureField">hash</str>
> > >   <bool name="overwriteDupes">false</bool>
> > >   <str name="signatureClass">solr.processor.Lookup3Signature</str>
> > > </updateProcessor>
> > >
> > > <updateRequestProcessorChain processor="signature" name="dedupe">
> > >   <processor class="solr.LogUpdateProcessorFactory" />
> > >   <processor class="solr.RunUpdateProcessorFactory" />
> > > </updateRequestProcessorChain>
> > >
> > > <requestHandler name="/update" class="solr.UpdateRequestHandler">
> > >   <lst name="defaults">
> > >     <str name="update.chain">dedupe</str>
> > >   </lst>
> > > </requestHandler>
> > >
> > > Thanks for your reply :)
> > >
> > > - Dani
> > >
> > > 2017-08-24 14:49 GMT+02:00 Scott Stults opensourceconnections.com:
> > > >
> > > > Hi Daniel,
> > > >
> > > > SolrJ has a few client implementations to choose from: CloudSolrClient,
> > > > ConcurrentUpdateSolrClient, HttpSolrClient, LBHttpSolrClient. You said
> > > > your query service uses CloudSolrClient, but it would be good to verify
> > > > which implementation your indexing service uses.
> > > >
> > > > One of the problems you might be having is with your deduplication step.
> > > > Can you post your Update Request Processor Chain?
> > > >
> > > > -Scott
> > > >
> > > > On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega <danielortegauf...@gmail.com> wrote:
> > > > >
> > > > > Hi Scott,
> > > > >
> > > > > - *Can you describe the process that queries the DB and sends records to Solr?*
> > > > >
> > > > > We are enqueueing ids during every ORACLE transaction (in inserts/updates).
> > > > >
> > > > > An application dequeues every id and performs queries against dozens of
> > > > > tables in the relational model to retrieve the fields to build the
> > > > > document. As we know that we are modifying the same ORACLE row in
> > > > > different (but consecutive) transactions, we store only the last version
> > > > > of the modified documents in a map data structure.
> > > > >
> > > > > The application has a configurable interval to send the documents stored
> > > > > in the map to the update handler (we have tested different intervals, from
> > > > > a few milliseconds to several seconds) using the SolrJ client. Currently we
> > > > > are sending all the documents every 15 seconds.
> > > > >
> > > > > This application is developed using Java, Spring and Maven, and we have
> > > > > several instances.
> > > > >
> > > > > - *Is it a SolrJ-based application?*
> > > > >
> > > > > Yes, it is. We aren't using the latest version of the SolrJ client (we are
> > > > > currently using SolrJ v6.3.0).
> > > > >
> > > > > - *If it is, which client package are you using?*
> > > > >
> > > > > I don't know exactly what you mean by 'client package' :)
> > > > >
> > > > > - *How many documents do you send at once?*
> > > > >
> > > > > It depends on the defined interval described before and the number of
> > > > > transactions executed in our relational database. From dozens to a few
> > > > > hundred (and even thousands).
> > > > >
> > > > > - *Are you sending your indexing or query traffic through a load balancer?*
> > > > >
> > > > > We aren't using a load balancer for indexing, but we have all our Rest
> > > > > Query services behind an HAProxy (using the 'leastconn' algorithm). The Rest
> > > > > Query Services perform queries using the CloudSolrClient.
> > > > >
> > > > > Thanks for your reply,
> > > > > if you need any further information don't hesitate to ask
> > > > >
> > > > > Daniel
> > > > >
> > > > > 2017-08-23 14:57 GMT+02:00 Scott Stults opensourceconnections.com:
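The "map of last document versions" described by the indexing service above can be sketched like this (field names are illustrative):

```python
# Sketch of last-write-wins buffering before sending to Solr: documents are
# keyed by id, so consecutive updates to the same ORACLE row collapse into
# a single document per flush interval.
pending = {}

def enqueue(doc):
    pending[doc["id"]] = doc  # a newer version overwrites the older one

def drain():
    """Return the buffered batch (what would go to the update handler)."""
    batch = list(pending.values())
    pending.clear()
    return batch

enqueue({"id": "row-1", "version": 1})
enqueue({"id": "row-1", "version": 2})
enqueue({"id": "row-2", "version": 1})
```

Every flush interval, `drain()` would hand the batch to the SolrJ client; between flushes, only the latest version of each row is kept.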
Re: Process to fix typos in ref-guide
Hi, I think the PR is the easiest and best option. They are only typos in the ref-guide. Thanks!

On Mon, Aug 28, 2017 at 4:45 PM, Erick Erickson wrote:
> If you're a committer, yes. If not, I guess you'd have to create a
> patch or a PR and ask a committer to pick it up.
>
> And probably not only master but 7x as well.
>
> Erick
>
> On Mon, Aug 28, 2017 at 1:32 PM, Leonardo Perez Pulido wrote:
> > Hi,
> > What is the process to help fix typos in Solr's ref. guide? Can I merge it
> > directly to master?
> > @Cassandra.
> > Thanks.
Re: Process to fix typos in ref-guide
If you're a committer, yes. If not, I guess you'd have to create a patch or a PR and ask a committer to pick it up. And probably not only master but 7x as well. Erick

On Mon, Aug 28, 2017 at 1:32 PM, Leonardo Perez Pulido wrote:
> Hi,
> What is the process to help fix typos in Solr's ref. guide? Can I merge it
> directly to master?
> @Cassandra.
> Thanks.
Process to fix typos in ref-guide
Hi, What is the process to help fix typos in Solr's ref. guide? Can I merge it directly to master? @Cassandra. Thanks.
Re: Solr cloud in kubernetes
Thanks Björn for the detailed information. Just wanted to understand: when you say a separate service for external traffic, does this mean a home-brewed one that proxies Solr queries? And what is the difference between the above and "solr-discovery"? Do you specify pod anti-affinity for Solr hosts? Regards Lars

On Sat, 26 Aug 2017 at 13:19, Björn Häuser wrote:
> Hi Lars,
>
> we are running Solr in Kubernetes and after some initial problems we are
> running quite stable now.
>
> Here is the setup we chose for Solr:
>
> - a separate service for external traffic to Solr (called "solr")
> - a statefulset for Solr with 3 replicas with another service (called
> "solr-discovery")
>
> We set the SOLR_HOST (which is used for intra-cluster communication) to
> the pod name inside the statefulset
> (solr-0.solr-discovery.default.svc.cluster.local). This ensures that on Solr
> pod restart the intra-cluster communication still continues to work. In the
> beginning we used the IP address of the pod; this caused problems when
> restarting pods, as they tried to talk to the old IP addresses.
>
> Zookeeper inside Kubernetes is a different story. Use the latest version
> of Kubernetes, because old versions never re-resolved DNS names. For
> connecting to Zookeeper we use the same approach, one service IP for all
> pods. The statefulset works again with a different service name.
>
> The problems we are currently facing:
>
> - Client timeouts whenever a Solr pod stops and starts again; we currently
> try to solve this with better readiness probes, no success yet
> - Sometimes Solr collections do not recover completely after a pod restart
> and we manually have to force recovery, still not investigated fully
>
> Hope this helps you!
>
> Thanks
> Björn
>
> > On 26. Aug 2017, at 12:08, Lars Karlsson wrote:
> >
> > Hi, I wanted to hear if anyone successfully got Solr Cloud running on
> > Kubernetes and can share challenges and limitations.
> > Can't find many up-to-date GitHub projects; it would be great if you could
> > point out blog posts or other useful links.
> >
> > Thanks in advance.
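The SOLR_HOST arrangement Björn describes can be sketched as a StatefulSet pod-template excerpt (namespace and service name as in the thread; the wiring itself is an assumption of one way to set it up):

```yaml
# Sketch: give each Solr pod a stable DNS name for intra-cluster
# communication, so restarts don't leave stale pod IPs in the cluster state.
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: SOLR_HOST
    # resolves to e.g. solr-0.solr-discovery.default.svc.cluster.local
    value: "$(POD_NAME).solr-discovery.default.svc.cluster.local"
```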
Re: Indexed=false for a field, but still able to search on field.
Hi, Yes docValues is true for fieldType -- View this message in context: http://lucene.472066.n3.nabble.com/Indexed-false-for-a-field-but-still-able-to-search-on-field-tp4352338p4352442.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facet on a Payload field type?
The issue is that we lack translations for much of our attribute data. We do have English versions. The idea is to use the English values for the faceted values and for the filters, but be able to retrieve different language versions of the term for the caller. If we have a facet on color and the value is red, we want to be able to retrieve "rojo" for Spanish, etc. Also, users can switch regions between searches. If a user starts out in French, executes a search, selects a facet, then switches to German, they should get the German for the facet (if it exists) even though they originally used French. If all of the searching was in English, where we have the data, we could then show French (or German, etc.) for the facet value. The real field value that we use for filtering would be in English, but the values returned to the user would be in the language of their locale, or English if we don't have a translation for it. The idea being that the translations would be stored in the payloads.

On Wed, Aug 23, 2017 at 7:47 PM, Chris Hostetter wrote:
>
> : The payload idea was from my boss, it's similar to how they did this in
> : Endeca.
> ...
> : My alternate idea is to have sets of facet fields for different languages,
> : then let our service layer determine the correct one for the user's
> : language, but I'm curious as to how others have solved this.
>
> Let's back up for a minute -- can you please explain your ultimate goal,
> from a "solr client application" perspective? (assuming we have no
> knowledge of if/how you might have used Endeca in the past)
>
> What is it you want your application to be able to do when indexing docs
> to solr and making queries to solr?
give us some real world examples > > > > (If i had to guess: i gather maybe you're just dealing with a "keywords" > type field that you want to facet on -- and maybe you could use a diff > field for each langauge, or encode the langauges as a prefix on each term > and use facet.prefix to restrict the facet contraints returned) > > > > https://people.apache.org/~hossman/#xyproblem > XY Problem > > Your question appears to be an "XY Problem" ... that is: you are dealing > with "X", you are assuming "Y" will help you, and you are asking about "Y" > without giving more details about the "X" so that we can understand the > full issue. Perhaps the best solution doesn't involve "Y" at all? > See Also: http://www.perlmonks.org/index.pl?node_id=542341 > > > > : > : On Wed, Aug 23, 2017 at 2:10 PM, Markus Jelsma < > markus.jel...@openindex.io> > : wrote: > : > : > Technically they could, facetting is possible on TextField, but it > would > : > be useless for facetting. Payloads are only used for scoring via a > custom > : > Similarity. Payloads also can only contain one byte of information (or > was > : > it 64 bits?) > : > > : > Payloads are not something you want to use when dealing with > translations. > : > We handle facet constraint (and facet field) translations just by > mapping > : > internal value to a translated value when displaying facet, and vice > versa > : > when filtering. > : > > : > -Original message- > : > > From:Webster Homer > : > > Sent: Wednesday 23rd August 2017 20:22 > : > > To: solr-user@lucene.apache.org > : > > Subject: Facet on a Payload field type? > : > > > : > > Is it possible to facet on a payload field type? > : > > > : > > We are moving from Endeca to Solr. We have a number of Endeca facets > : > where > : > > we have hacked in multilangauge support. The multiple languages are > : > really > : > > just for displaying the value of a term internally the value used to > : > search > : > > is in English. 
The problem is that we don't have translations for > most of > : > > our facet data and this was a way to support multiple languages with > the > : > > data we have. > : > > > : > > Looking at the Solrj FacetField class I cannot tell if the value can > : > > contain a payload or not > : > > > : > > -- > : > > > : > > > : > > This message and any attachment are confidential and may be > privileged or > : > > otherwise protected from disclosure. If you are not the intended > : > recipient, > : > > you must not copy this message or attachment or disclose the > contents to > : > > any other person. If you have received this transmission in error, > please > : > > notify the sender immediately and delete the message and any > attachment > : > > from your system. Merck KGaA, Darmstadt, Germany and any of its > : > > subsidiaries do not accept liability for any omissions or errors in > this > : > > message which may arise as a result of E-Mail-transmission or for > damages > : > > resulting from any unauthorized changes of the content of this > message > : > and > : > > any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its > : > > subsidiaries do not guarantee that this message is free of viruses > and > : > does > : > > not accept liability for any damages caused by any virus transmitted > : > > therewith. >
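The fallback behaviour described in this thread (facet and filter on one internal English value, translate only for display) can be sketched without payloads at all; the mapping approach Markus suggests boils down to something like this (data and function names are illustrative):

```python
# Sketch: keep a single internal facet value, translate it at display time,
# and fall back to English when the user's locale has no translation.
translations = {"red": {"es": "rojo", "de": "rot"}}

def display_value(internal_value, locale):
    return translations.get(internal_value, {}).get(locale, internal_value)

print(display_value("red", "es"))  # rojo
print(display_value("red", "fr"))  # red (no French entry -> English fallback)
```

Filtering always uses the internal value, so switching locale between searches never changes which documents match.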
Re: Solr Wiki issues
This appears to have happened for at least one other Apache project using Apache's Confluence installation: https://issues.apache.org/jira/browse/INFRA-14971. You should use the new Ref Guide anyway: https://lucene.apache.org/solr/guide/post-tool.html. An automatic redirect from the old location is in the works. On Mon, Aug 28, 2017 at 11:32 AM, Erick Erickson wrote: > Hmmm, no it's not just you, I see them too. > > > On Mon, Aug 28, 2017 at 7:45 AM, Steve Pruitt wrote: >> Is it just me, but the Solr Wiki shows nonsensical characters for what looks >> like example commands, etc.? I tried both Chrome and IE and get the same >> result. >> >> Example, on https://cwiki.apache.org/confluence/display/solr/Post+Tool >> >> This shows: >> >> Index a PDF file into gettingstarted. >> #66nonesolid >> >> Automatically detect content types in a folder, and recursively scan it for >> documents for indexing into gettingstarted. >> #66nonesolid >> >> Automatically detect content types in a folder, but limit it to PPT and HTML >> files and index into gettingstarted. >> #66nonesolid >> >> This started showing up a few days ago. >> >> Thanks. >> >> -S
Zookeeper issues
Hi, when trying to ingest data into Solr, we got a lot of Zookeeper exceptions and the load fails:

org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/WIP_DWO/state.json

org.apache.solr.common.SolrException: Could not load collection from ZK: WIP_DWO
	at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1098) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.common.cloud.ZkStateReader$LazyCollectionRef.get(ZkStateReader.java:638) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.client.solrj.impl.CloudSolrClient.getDocCollection(CloudSolrClient.java:1482) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1092) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1057) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:160) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:484) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.client.solrj.SolrClient.commit(SolrClient.java:448) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at oracle.ecc.index.retrieve.solr.util.QueryExecuter.commitThenOptimize(QueryExecuter.java:143) ~[ecc-ir-1.0-SNAPSHOT.jar:na]
	at oracle.ecc.index.tools.IndexDataHelper.commit(IndexDataHelper.java:73) ~[ecc-ir-1.0-SNAPSHOT.jar:na]
	at oracle.ecc.index.tools.DataLoadUtil.loadDataForDataset(DataLoadUtil.java:286) ~[ecc-ir-1.0-SNAPSHOT.jar:na]
	at oracle.ecc.index.retrieve.services.impl.IRDataLoadServiceImpl.lambda$runProcessorsForJobSync$1(IRDataLoadServiceImpl.java:151) [ecc-ir-1.0-SNAPSHOT.jar:na]
	at java.util.concurrent.CompletableFuture$AsyncSupply.run(CompletableFuture.java:1590) ~[na:1.8.0_73]
	at java.util.concurrent.CompletableFuture$AsyncSupply.exec(CompletableFuture.java:1582) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:289) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1056) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinPool.runWorker(ForkJoinPool.java:1692) ~[na:1.8.0_73]
	at java.util.concurrent.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:157) ~[na:1.8.0_73]
Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = Session expired for /collections/WIP_DWO/state.json
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:127) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.zookeeper.ZooKeeper.getData(ZooKeeper.java:1155) ~[zookeeper-3.4.6.jar:3.4.6-1569965]
	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:356) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.common.cloud.SolrZkClient$7.execute(SolrZkClient.java:353) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:60) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.common.cloud.SolrZkClient.getData(SolrZkClient.java:353) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.common.cloud.ZkStateReader.fetchCollectionState(ZkStateReader.java:1110) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	at org.apache.solr.common.cloud.ZkStateReader.getCollectionLive(ZkStateReader.java:1096) ~[solr-solrj-6.5.0.jar:6.5.0 4b16c9a10c3c00cafaf1fc92ec3276a7bc7b8c95 - jimczi - 2017-03-21 20:47:15]
	... 17 common frames omitted
Re: Indexed=false for a field, but still able to search on field.
Is docValues enabled (this happens by default in some versions)? I think I've seen this enable searching on a field. If that's the root of the problem, let us know, since it's trappy and we should discuss it on the dev list. Best, Erick

On Sun, Aug 27, 2017 at 10:58 PM, AshB wrote:
> Hi,
>
> I created a field as below, expecting I wouldn't be able to search on it
>
> .
>
> But I am able to search on it. Sample query below:
>
> fileName:"ipgb20080916_1078.xml"
>
> What is wrong here? I am not doing any copy of this field.
>
> Solr version: 6.5.1
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Indexed-false-for-a-field-but-still-able-to-search-on-field-tp4352338.html
> Sent from the Solr - User mailing list archive at Nabble.com.
Re: Search by similarity?
What are the results of adding &debug=query to the URL? The parsed query will be especially illuminating. Best, Erick

On Mon, Aug 28, 2017 at 4:37 AM, Emir Arnautovic wrote:
> Hi Darko,
>
> The issue is wrong expectations: title-1-end is parsed to 3 tokens
> (guessing) and mm=99% of 3 tokens is 2.97, which is rounded down to 2. Since
> all your documents have 'title' and 'end' tokens, all match. If you want to
> round up, you can use mm=-1% - that will result in zero matches (or one, if
> you do not filter out the original document).
>
> You have to play with your tokenizers and define what your similarity match
> percentage is (if you want to stick with mm).
>
> Regards,
> Emir
>
> On 28.08.2017 09:17, Darko Todoric wrote:
>>
>> Hm... I cannot make this DisMax work on my Solr...
>>
>> In Solr I have documents with titles:
>> - "title-1-end"
>> - "title-2-end"
>> - "title-3-end"
>> - ...
>> - ...
>> - "title-312-end"
>>
>> and when I make the query
>> "*http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json*"
>> I get all documents from Solr :\
>> What am I doing wrong?
>>
>> Also, I don't know if it affects results, but on the "title" field I use
>> "WhitespaceTokenizerFactory".
>>
>> Kind regards,
>> Darko
>>
>> On 08/25/2017 06:38 PM, Junte Zhang wrote:
>>>
>>> If you already have the title of the document, then you could run that
>>> title as a new query against the whole index and exclude the source document
>>> from the results as a filter.
>>>
>>> You could use the DisMax query parser:
>>> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
>>>
>>> And then set the minimum match ratio of the OR clauses to 90%.
>>>
>>> /JZ
>>>
>>> -Original Message-
>>> From: Darko Todoric [mailto:todo...@mdpi.com]
>>> Sent: Friday, August 25, 2017 5:49 PM
>>> To: solr-user@lucene.apache.org
>>> Subject: Search by similarity?
>>> Hi,
>>>
>>> I have 90,000,000 documents in Solr and I need to compare the "title" of a
>>> document and get all documents with more than 80% similarity. PHP has
>>> "similar_text", but it's not so smart to insert 90M documents into an array...
>>> Can I do a query in Solr which will give me documents with more than 80%
>>> similarity?
>>>
>>> Kind regards,
>>> Darko Todoric
>>>
>>> --
>>> Darko Todoric
>>> Web Engineer, MDPI DOO
>>> Veljka Dugosevica 54, 11060 Belgrade, Serbia
>>> +381 65 43 90 620
>>> www.mdpi.com
>>>
>>> Disclaimer: The information and files contained in this message are
>>> confidential and intended solely for the use of the individual or entity to
>>> whom they are addressed.
>>> If you have received this message in error, please notify me and delete
>>> this message from your system.
>>> You may not copy this message in its entirety or in part, or disclose its
>>> contents to anyone.
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
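Emir's rounding explanation above can be checked with a small sketch (the formula follows the documented mm semantics; the clause counts come from the thread):

```python
import math

# Sketch of mm (minimum-should-match) percentage handling: a positive
# percentage rounds the required clause count DOWN; a negative percentage
# caps the number of clauses allowed to be missing (rounded down), which
# effectively rounds the required count up.
def required_clauses(mm_percent, n_clauses):
    if mm_percent >= 0:
        return int(math.floor(mm_percent / 100.0 * n_clauses))
    return n_clauses - int(math.floor(-mm_percent / 100.0 * n_clauses))

# "title-1-end" tokenizes to 3 tokens; mm=99% only requires 2 of them:
print(required_clauses(99, 3))   # 2
# mm=-1% requires all 3 tokens, so only near-identical titles match:
print(required_clauses(-1, 3))   # 3
```

This is why mm=99% on a 3-token title behaves like "any 2 tokens", matching every document that contains 'title' and 'end'.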
Re: Solr Wiki issues
Hmmm, no it's not just you, I see them too. On Mon, Aug 28, 2017 at 7:45 AM, Steve Pruitt wrote: > Is it just me, but the Solr Wiki shows nonsensical characters for what looks > like example commands, etc.? I tried both Chrome and IE and get the same > result. > > Example, on https://cwiki.apache.org/confluence/display/solr/Post+Tool > > This shows: > > Index a PDF file into gettingstarted. > #66nonesolid > > Automatically detect content types in a folder, and recursively scan it for > documents for indexing into gettingstarted. > #66nonesolid > > Automatically detect content types in a folder, but limit it to PPT and HTML > files and index into gettingstarted. > #66nonesolid > > This started showing up a few days ago. > > Thanks. > > -S
Re: Solr memory leak
Varun Thacker is the RM for Solr 6.6.1, I've pinged him about including it. On Mon, Aug 28, 2017 at 8:52 AM, Walter Underwood wrote: > That would be a really good reason for a 6.7. > > wunder > Walter Underwood > wun...@wunderwood.org > http://observer.wunderwood.org/ (my blog) > > >> On Aug 28, 2017, at 8:48 AM, Markus Jelsma >> wrote: >> >> It is, unfortunately, not committed for 6.7. >> >> >> >> >> >> -Original message- >>> From:Markus Jelsma >>> Sent: Monday 28th August 2017 17:46 >>> To: solr-user@lucene.apache.org >>> Subject: RE: Solr memory leak >>> >>> See https://issues.apache.org/jira/browse/SOLR-10506 >>> Fixed for 7.0 >>> >>> Markus >>> >>> >>> >>> -Original message- From:Hendrik Haddorp Sent: Monday 28th August 2017 17:42 To: solr-user@lucene.apache.org Subject: Solr memory leak Hi, we noticed that triggering collection reloads on many collections has a good chance to result in an OOM-Error. To investigate that further I did a simple test: - Start solr with a 2GB heap and 1GB Metaspace - create a trivial collection with a few documents (I used only 2 fields and 100 documents) - trigger a collection reload in a loop (I used SolrJ for this) Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 worked better but also failed after 1100 loops. When looking at the memory usage on the Solr dashboard it looks like the space left after GC cycles gets less and less. Then Solr gets very slow, as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In my last run this was actually for the Metaspace. So it looks like more and more heap and metaspace is being used by just constantly reloading a trivial collection. regards, Hendrik >>> >
Re: Solr memory leak
That would be a really good reason for a 6.7. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 28, 2017, at 8:48 AM, Markus Jelsma wrote: > > It is, unfortunately, not committed for 6.7. > > > > > > -Original message- >> From:Markus Jelsma >> Sent: Monday 28th August 2017 17:46 >> To: solr-user@lucene.apache.org >> Subject: RE: Solr memory leak >> >> See https://issues.apache.org/jira/browse/SOLR-10506 >> Fixed for 7.0 >> >> Markus >> >> >> >> -Original message- >>> From:Hendrik Haddorp >>> Sent: Monday 28th August 2017 17:42 >>> To: solr-user@lucene.apache.org >>> Subject: Solr memory leak >>> >>> Hi, >>> >>> we noticed that triggering collection reloads on many collections has a >>> good chance to result in an OOM-Error. To investigate that further I did >>> a simple test: >>> - Start solr with a 2GB heap and 1GB Metaspace >>> - create a trivial collection with a few documents (I used only 2 >>> fields and 100 documents) >>> - trigger a collection reload in a loop (I used SolrJ for this) >>> >>> Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 >>> worked better but also failed after 1100 loops. >>> >>> When looking at the memory usage on the Solr dashboard it looks like the >>> space left after GC cycles gets less and less. Then Solr gets very slow, >>> as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In >>> my last run this was actually for the Metaspace. So it looks like more >>> and more heap and metaspace is being used by just constantly reloading a >>> trivial collection. >>> >>> regards, >>> Hendrik >>> >>
RE: Solr memory leak
It is, unfortunately, not committed for 6.7. -Original message- > From:Markus Jelsma > Sent: Monday 28th August 2017 17:46 > To: solr-user@lucene.apache.org > Subject: RE: Solr memory leak > > See https://issues.apache.org/jira/browse/SOLR-10506 > Fixed for 7.0 > > Markus > > > > -Original message- > > From:Hendrik Haddorp > > Sent: Monday 28th August 2017 17:42 > > To: solr-user@lucene.apache.org > > Subject: Solr memory leak > > > > Hi, > > > > we noticed that triggering collection reloads on many collections has a > > good chance to result in an OOM-Error. To investigate that further I did > > a simple test: > > - Start solr with a 2GB heap and 1GB Metaspace > > - create a trivial collection with a few documents (I used only 2 > > fields and 100 documents) > > - trigger a collection reload in a loop (I used SolrJ for this) > > > > Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 > > worked better but also failed after 1100 loops. > > > > When looking at the memory usage on the Solr dashboard it looks like the > > space left after GC cycles gets less and less. Then Solr gets very slow, > > as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In > > my last run this was actually for the Metaspace. So it looks like more > > and more heap and metaspace is being used by just constantly reloading a > > trivial collection. > > > > regards, > > Hendrik > > >
RE: Solr memory leak
See https://issues.apache.org/jira/browse/SOLR-10506 Fixed for 7.0 Markus -Original message- > From:Hendrik Haddorp > Sent: Monday 28th August 2017 17:42 > To: solr-user@lucene.apache.org > Subject: Solr memory leak > > Hi, > > we noticed that triggering collection reloads on many collections has a > good chance to result in an OOM-Error. To investigate that further I did > a simple test: > - Start solr with a 2GB heap and 1GB Metaspace > - create a trivial collection with a few documents (I used only 2 > fields and 100 documents) > - trigger a collection reload in a loop (I used SolrJ for this) > > Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 > worked better but also failed after 1100 loops. > > When looking at the memory usage on the Solr dashboard it looks like the > space left after GC cycles gets less and less. Then Solr gets very slow, > as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In > my last run this was actually for the Metaspace. So it looks like more > and more heap and metaspace is being used by just constantly reloading a > trivial collection. > > regards, > Hendrik >
Solr memory leak
Hi,

we noticed that triggering collection reloads on many collections has a good chance to result in an OOM-Error. To investigate that further I did a simple test:

- Start Solr with a 2GB heap and 1GB Metaspace
- create a trivial collection with a few documents (I used only 2 fields and 100 documents)
- trigger a collection reload in a loop (I used SolrJ for this)

Using Solr 6.3 the test started to fail after about 250 loops. Solr 6.6 worked better but also failed after 1100 loops.

When looking at the memory usage on the Solr dashboard it looks like the space left after GC cycles gets less and less. Then Solr gets very slow, as the JVM is busy with the GC. A bit later Solr gets an OOM-Error. In my last run this was actually for the Metaspace. So it looks like more and more heap and metaspace is being used by just constantly reloading a trivial collection.

regards,
Hendrik
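For reference, a reload loop like Hendrik describes can also be driven through the Collections API over HTTP instead of SolrJ. This is only a sketch: the base URL and collection name are assumptions, and the loop should only ever be pointed at a disposable test node.

```python
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumed local test node
COLLECTION = "reloadtest"                 # hypothetical collection name

def reload_url(base, collection):
    # Collections API RELOAD action; SolrJ's
    # CollectionAdminRequest.reloadCollection() issues the same call
    return f"{base}/admin/collections?action=RELOAD&name={collection}&wt=json"

def reload_loop(n):
    # Repro loop: watch heap/Metaspace on the admin dashboard while it runs.
    for i in range(n):
        with urllib.request.urlopen(reload_url(SOLR_BASE, COLLECTION)) as resp:
            assert resp.status == 200, f"reload {i} failed"

# reload_loop(2000)  # uncomment only against a disposable test node
```

With a 2GB heap, failure after a few hundred to a thousand iterations (as reported above) would reproduce the leak fixed in SOLR-10506.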
Solr Wiki issues
Is it just me, or does the Solr Wiki show nonsensical characters where example commands should be? I tried both Chrome and IE and get the same result. For example, https://cwiki.apache.org/confluence/display/solr/Post+Tool shows:

Index a PDF file into gettingstarted.
#66nonesolid

Automatically detect content types in a folder, and recursively scan it for documents for indexing into gettingstarted.
#66nonesolid

Automatically detect content types in a folder, but limit it to PPT and HTML files and index into gettingstarted.
#66nonesolid

This started showing up a few days ago.

Thanks.
-S
Re: Index relational database
Hello Renuka,

I would suggest starting with your use case(s). For your first use case, ask:

a) What do you want to search on (which fields, e.g. name, desc, city)?
b) What do you want to show as part of the search result (name, city, etc.)?

Based on these two questions, you will know what data to pull in from the relational database, how to create the Solr schema, and then index the data. You may first try to denormalize / flatten the structure so that you deal with one collection/schema and query upon it.

HTH.

Thanks,
Susheel

On Mon, Aug 28, 2017 at 8:04 AM, Renuka Srishti wrote:
> Hii,
>
> What is the best way to index a relational database, and how does it
> impact performance?
>
> Thanks
> Renuka Srishti
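Susheel's denormalization advice can be sketched as below. The tables and field names are hypothetical, purely to illustrate flattening a 1:N parent/child join into one Solr document per child row.

```python
# Hypothetical relational rows, e.g. fetched with two SQL queries
customers = [{"id": 1, "name": "Acme", "city": "Pune"}]
orders = [
    {"id": 10, "customer_id": 1, "desc": "bolts"},
    {"id": 11, "customer_id": 1, "desc": "nuts"},
]

def denormalize(customers, orders):
    """Flatten a 1:N join into one Solr document per order."""
    by_id = {c["id"]: c for c in customers}
    docs = []
    for o in orders:
        c = by_id[o["customer_id"]]
        docs.append({
            "id": f"order-{o['id']}",    # unique key for the collection
            "desc": o["desc"],           # field you search on
            "customer_name": c["name"],  # field you show in results
            "city": c["city"],           # parent data copied onto each child
        })
    return docs

docs = denormalize(customers, orders)
```

The parent fields are duplicated onto every child document, which trades index size for being able to query a single collection without joins.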
Index relational database
Hii,

What is the best way to index a relational database, and how does it impact performance?

Thanks
Renuka Srishti
Re: Search by similarity?
Hi Darko,

The issue is wrong expectations: title-1-end is parsed into 3 tokens (guessing), and mm=99% of 3 tokens is 2.97, which is rounded down to 2. Since all your documents contain the 'title' and 'end' tokens, they all match. If you want to round up, you can use mm=-1% - that will result in zero matches (or one match if you do not filter out the original document). You have to play with your tokenizers and define what similarity match percentage means for you (if you want to stick with mm).

Regards,
Emir

On 28.08.2017 09:17, Darko Todoric wrote:
> [original message quoted in full; snipped]

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/
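Emir's rounding explanation can be checked numerically. The sketch below approximates the minimum-should-match arithmetic from Lucene's mm specification (positive percentages round the required count down; negative percentages round the allowed-missing count down, which in effect rounds the required count up); the function name is mine, not a Solr API.

```python
def effective_min_match(mm_percent: int, n_optional: int) -> int:
    """Approximate effective minimum-should-match for a percentage mm spec."""
    # Java-style truncation toward zero, matching Lucene's integer cast
    calc = int(n_optional * mm_percent / 100)
    if mm_percent > 0:
        # positive %: that fraction of clauses must match, rounded down
        return calc
    # negative %: that fraction of clauses may be MISSING, rounded down
    return n_optional + calc

# "title-1-end" split into 3 tokens:
effective_min_match(99, 3)   # 2  -> any doc with 'title' and 'end' matches
effective_min_match(-1, 3)   # 3  -> all three tokens required
```

This is why mm=99% on a 3-token query behaves like mm=2, while mm=-1% behaves like mm=3.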
Re: Excessive resources consumption migrating from Solr 6.6.0 Master/Slave to SolrCloud 6.6.0 (dozen times more resources)
Hi Scott,

Yes, we think that our usage scenario falls into the Index-Heavy/Query-Heavy category too. We have tested several softCommit/hardCommit intervals (from a few seconds to minutes) with no appreciable improvement :(

Thanks for your reply!

- Daniel

2017-08-25 6:45 GMT+02:00 Scott Stults:
> Hi Dani,
>
> It seems like your use case falls into the Index-Heavy / Query-Heavy
> category, so you might try increasing your hard commit frequency to 15
> seconds rather than 15 minutes:
>
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> -Scott
>
> On Thu, Aug 24, 2017 at 10:03 AM, Daniel Ortega <danielortegauf...@gmail.com> wrote:
>
> > Hi Scott,
> >
> > In our indexing service we are using that client too
> > (org.apache.solr.client.solrj.impl.CloudSolrClient) :)
> >
> > This is our Update Request Processor chain configuration:
> >
> >   <updateProcessor class="solr.processor.SignatureUpdateProcessorFactory" name="signature">
> >     <bool name="enabled">true</bool>
> >     <str name="signatureField">hash</str>
> >     <bool name="overwriteDupes">false</bool>
> >     <str name="signatureClass">solr.processor.Lookup3Signature</str>
> >   </updateProcessor>
> >
> >   <updateRequestProcessorChain processor="signature" name="dedupe">
> >     <processor class="solr.LogUpdateProcessorFactory" />
> >     <processor class="solr.RunUpdateProcessorFactory" />
> >   </updateRequestProcessorChain>
> >
> >   <requestHandler name="/update" class="solr.UpdateRequestHandler">
> >     <lst name="defaults">
> >       <str name="update.chain">dedupe</str>
> >     </lst>
> >   </requestHandler>
> >
> > Thanks for your reply :)
> >
> > - Dani
> >
> > 2017-08-24 14:49 GMT+02:00 Scott Stults (opensourceconnections.com):
> >
> > > Hi Daniel,
> > >
> > > SolrJ has a few client implementations to choose from: CloudSolrClient,
> > > ConcurrentUpdateSolrClient, HttpSolrClient, LBHttpSolrClient. You said
> > > your query service uses CloudSolrClient, but it would be good to verify
> > > which implementation your indexing service uses.
> > >
> > > One of the problems you might be having is with your deduplication step.
> > > Can you post your Update Request Processor Chain?
> > > -Scott
> > >
> > > On Wed, Aug 23, 2017 at 4:13 PM, Daniel Ortega <danielortegauf...@gmail.com> wrote:
> > >
> > > > Hi Scott,
> > > >
> > > > - *Can you describe the process that queries the DB and sends records to Solr?*
> > > >
> > > > We are enqueueing ids during every ORACLE transaction (on inserts/updates).
> > > >
> > > > An application dequeues every id and performs queries against dozens of
> > > > tables in the relational model to retrieve the fields to build the
> > > > document. As we know that we may modify the same ORACLE row in
> > > > different (but consecutive) transactions, we store only the last version
> > > > of the modified documents in a map data structure.
> > > >
> > > > The application has a configurable interval at which it sends the documents
> > > > stored in the map to the update handler (we have tested different intervals,
> > > > from a few milliseconds to several seconds) using the SolrJ client. Currently
> > > > we are sending all the documents every 15 seconds.
> > > >
> > > > This application is developed using Java, Spring and Maven, and we have
> > > > several instances.
> > > >
> > > > - *Is it a SolrJ-based application?*
> > > >
> > > > Yes, it is. We aren't using the latest version of the SolrJ client (we are
> > > > currently using SolrJ v6.3.0).
> > > >
> > > > - *If it is, which client package are you using?*
> > > >
> > > > I don't know exactly what you mean by 'client package' :)
> > > >
> > > > - *How many documents do you send at once?*
> > > >
> > > > It depends on the defined interval described before and the number of
> > > > transactions executed in our relational database. From dozens to a few
> > > > hundred (and even thousands).
> > > > - *Are you sending your indexing or query traffic through a load balancer?*
> > > >
> > > > We aren't using a load balancer for indexing, but we have all our Rest
> > > > Query services behind an HAProxy (using the 'leastconn' algorithm). The
> > > > Rest Query Services perform queries using the CloudSolrClient.
> > > >
> > > > Thanks for your reply,
> > > > if you need any further information don't hesitate to ask
> > > >
> > > > Daniel
> > > >
> > > > 2017-08-23 14:57 GMT+02:00 Scott Stults (opensourceconnections.com):
> > > >
> > > > > Hi Daniel,
> > > > >
> > > > > Great background information about your setup! I've got just a few
> > > > > more questions:
> > > > >
> > > > > - Can you describe the process that queries the DB and sends records to Solr?
> > > > > - Is it a SolrJ-based application?
> > > > > - If it is, which client package are you using?
> > > > > - How many documents do you send at once?
> > > > > - Are you sending your indexing or query traffic through a load balancer?
> > > > >
> > > > > If you're sending documents to each replica as fast as they can take
> > > > > them, you might be seeing a bottleneck at the shard leaders. The SolrJ
> > > > > CloudSolrClient finds out from Zooke
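Daniel's "keep only the last version in a map, flush on an interval" scheme quoted above can be sketched as follows. This is a rough sketch in Python rather than the Java/Spring application described; the class and field names are mine, and the actual send-to-Solr step is left out.

```python
import threading

class LastWriteBuffer:
    """Keep only the newest version of each document between flushes,
    as described for the ORACLE-id dequeueing application above."""

    def __init__(self):
        self._lock = threading.Lock()
        self._docs = {}  # doc id -> latest version of the document

    def put(self, doc):
        # Consecutive transactions on the same row overwrite each other,
        # so only one update per document reaches Solr per interval.
        with self._lock:
            self._docs[doc["id"]] = doc

    def drain(self):
        # Called on the configured interval (e.g. every 15 s) to get the
        # batch that would be sent to the /update handler via SolrJ.
        with self._lock:
            batch = list(self._docs.values())
            self._docs.clear()
        return batch

buf = LastWriteBuffer()
buf.put({"id": "42", "price": 10})
buf.put({"id": "42", "price": 12})  # same row touched again: overwrites
buf.put({"id": "43", "price": 7})
batch = buf.drain()                 # two docs; id 42 only in its latest version
```

Collapsing updates this way keeps the per-interval batch bounded by the number of distinct rows touched, not by the raw transaction count.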
Re: Search by similarity?
Hm... I cannot make this DisMax work on my Solr...

In Solr I have documents with titles:

- "title-1-end"
- "title-2-end"
- "title-3-end"
- ...
- "title-312-end"

and when I make the query

http://localhost:8983/solr/SciLit/select?defType=dismax&indent=on&mm=99%&q=title:"title-123123123-end"&wt=json

I get all documents from Solr :\ What am I doing wrong?

Also, I don't know if it affects the results, but on the "title" field I use "WhitespaceTokenizerFactory".

Kind regards,
Darko

On 08/25/2017 06:38 PM, Junte Zhang wrote:
> If you already have the title of the document, then you could run that title
> as a new query against the whole index and exclude the source document from
> the results with a filter.
>
> You could use the DisMax query parser:
> https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser
> And then set the minimum match ratio of the OR clauses to 90%.
>
> /JZ
>
> -----Original Message-----
> From: Darko Todoric [mailto:todo...@mdpi.com]
> Sent: Friday, August 25, 2017 5:49 PM
> To: solr-user@lucene.apache.org
> Subject: Search by similarity?
>
> Hi,
>
> I have 90.000.000 documents in Solr and I need to compare the "title" of a
> document and get all documents with more than 80% similarity. PHP has
> "similar_text", but it's not so smart to insert 90M documents into an
> array... Can I do a query in Solr which will give me the more-than-80%
> similarity?
>
> Kind regards,
> Darko Todoric
>
> --
> Darko Todoric
> Web Engineer, MDPI DOO
> Veljka Dugosevica 54, 11060 Belgrade, Serbia
> +381 65 43 90 620
> www.mdpi.com
>
> Disclaimer: The information and files contained in this message are
> confidential and intended solely for the use of the individual or entity to
> whom they are addressed. If you have received this message in error, please
> notify me and delete this message from your system. You may not copy this
> message in its entirety or in part, or disclose its contents to anyone.
--
Darko Todoric
Web Engineer, MDPI DOO
Veljka Dugosevica 54, 11060 Belgrade, Serbia
+381 65 43 90 620
www.mdpi.com