Solr Expand throws NPE along with elevate component
Hi, I'm facing an issue with the expand component when it is used alongside the elevate component. For some requests (not all), the expand component throws an NPE; the stack trace is below. Any idea why the sort inside ArrayTimSorter hits a null, and is there any way to avoid it?

Solr log stack trace:

```
2018-02-21 05:09:06.404 ERROR (qtp444920847-16) [c:test s:shard1 r:core_node1 x:test_shard1_replica1] o.a.s.s.HttpSolrCall null:java.io.IOException: java.lang.NullPointerException
	at org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:339)
	at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:304)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:173)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:2477)
	at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:723)
	at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:529)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:361)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:305)
	at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1691)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
	at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
	at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
	at org.eclipse.jetty.server.Server.handle(Server.java:534)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
	at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
	at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
	at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
	at java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:52)
	at java.util.Comparators$NaturalOrderComparator.compare(Comparators.java:47)
	at org.apache.lucene.util.ArrayTimSorter.compare(ArrayTimSorter.java:48)
	at org.apache.lucene.util.Sorter.comparePivot(Sorter.java:50)
	at org.apache.lucene.util.Sorter.binarySort(Sorter.java:197)
	at org.apache.lucene.util.TimSorter.nextRun(TimSorter.java:120)
	at org.apache.lucene.util.TimSorter.sort(TimSorter.java:201)
	at org.apache.lucene.util.ArrayUtil.timSort(ArrayUtil.java:426)
	at org.apache.lucene.util.ArrayUtil.timSort(ArrayUtil.java:445)
	at org.apache.lucene.util.ArrayUtil.timSort(ArrayUtil.java:453)
	at org.apache.lucene.search.TermInSetQuery.<init>(TermInSetQuery.java:87)
	at org.apache.lucene.search.TermInSetQuery.<init>(TermInSetQuery.java:109)
	at org.apache.solr.handler.component.ExpandComponent.getGroupQuery(ExpandComponent.java:718)
	at org.apache.solr.handler.component.ExpandComponent.process(ExpandComponent.java:337)
	... 34 more
```

The response stack trace (truncated) shows the same java.io.IOException: java.lang.NullPointerException.
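The "Caused by" section shows Lucene's TimSort calling Java's natural-order comparator on a null element: that comparator dereferences both arguments, so a null anywhere in the array being sorted produces exactly this NPE. If the same mechanism applies here, one of the group-field terms being sorted inside TermInSetQuery is null; that is a guess from the trace, not a confirmed diagnosis. A minimal, self-contained Java sketch of the failure mode (not Solr code; the list of group keys and the workarounds are illustrative assumptions):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class NullSortDemo {
    public static void main(String[] args) {
        // Hypothetical group keys, one of which is null -- e.g. a document
        // with no value in the field the expand component groups on.
        List<String> keys = new ArrayList<>();
        keys.add("b");
        keys.add(null);
        keys.add("a");

        try {
            // Natural order calls String.compareTo on each element,
            // so the null element throws, just like in the stack trace.
            keys.sort(Comparator.naturalOrder());
        } catch (NullPointerException e) {
            System.out.println("NPE while sorting: list contains null");
        }

        // Handling nulls explicitly (or filtering them out first) avoids it.
        keys.sort(Comparator.nullsFirst(Comparator.naturalOrder()));
        System.out.println(keys); // [null, a, b]
    }
}
```

The workaround this suggests, hedged accordingly, is to make sure every document reaching the expand component has a value in the expand field.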
Re: Streaming Expressions using Solrj.io
On 2/20/2018 7:54 PM, Ryan Yacyshyn wrote:
> I'd like to get a stream of search results using the solrj.io package but running into a small issue.
> Exception in thread "main" java.lang.NoSuchMethodError: org.apache.http.impl.client.HttpClientBuilder.evictIdleConnections(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/impl/client/HttpClientBuilder;

There is a problem accessing the HttpClient library. Either the httpclient jar is missing from your project, or it's the wrong version. You can use pretty much any 4.5.x version with recent SolrJ versions. 3.x versions won't work at all, and older 4.x versions will not work either. The 5.0 beta releases also won't work. You can find information about 4.0 and later versions of HttpClient here: http://hc.apache.org/

If you use a dependency manager like Gradle, Maven, or Ivy for your project, just be sure it's set to pull in all transitive dependencies for solrj, and you should be fine. If you manage dependencies manually, you will find all of the extra jars required by the solrj client in the download, in the dist/solrj-lib directory. Note that you can very likely upgrade individual dependencies to newer versions than Solr includes with no issues.

Thanks,
Shawn
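With Maven, Shawn's dependency-manager advice amounts to declaring only solr-solrj and letting the resolver pull httpclient (and the other solrj-lib jars) transitively; the version below is an assumption, so match it to your Solr release:

```xml
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-solrj</artifactId>
  <!-- assumption: use the version matching your Solr cluster -->
  <version>7.2.1</version>
</dependency>
```

Running mvn dependency:tree afterwards shows which httpclient version was actually resolved, which is a quick way to find the conflicting jar behind a NoSuchMethodError.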
Streaming Expressions using Solrj.io
Hello all, I'd like to get a stream of search results using the solrj.io package but I'm running into a small issue. It seems to have something to do with HttpClientUtil. I'm testing on SolrCloud 7.1.0, using the sample_techproducts_configs configs, and indexed the manufacturers.xml file. I'm following the test code in the method `testCloudSolrStreamWithZkHost` found in StreamExpressionTest.java:

```
package ca.ryac.testing;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.client.solrj.io.stream.TupleStream;
import org.apache.solr.client.solrj.io.stream.expr.StreamExpression;
import org.apache.solr.client.solrj.io.stream.expr.StreamExpressionParser;
import org.apache.solr.client.solrj.io.stream.expr.StreamFactory;

public class SolrStreamingClient {

  String zkHost = "localhost:9983";
  String COLLECTIONORALIAS = "gettingstarted";

  public SolrStreamingClient() throws Exception {
    init();
  }

  public static void main(String[] args) throws Exception {
    new SolrStreamingClient();
  }

  private void init() throws Exception {
    System.out.println(zkHost);

    StreamFactory factory = new StreamFactory();
    StreamExpression expression;
    CloudSolrStream stream;
    List<Tuple> tuples;

    StreamContext streamContext = new StreamContext();
    SolrClientCache solrClientCache = new SolrClientCache();
    streamContext.setSolrClientCache(solrClientCache);

    // basic test..
    String expr = "search(" + COLLECTIONORALIAS + ", zkHost=\"" + zkHost
        + "\", q=*:*, fl=\"id,compName_s\", sort=\"compName_s asc\")";
    System.out.println(expr);

    expression = StreamExpressionParser.parse(expr);
    stream = new CloudSolrStream(expression, factory);
    stream.setStreamContext(streamContext);
    tuples = getTuples(stream);
    System.out.println(tuples.size());
  }

  protected List<Tuple> getTuples(TupleStream tupleStream) throws IOException {
    List<Tuple> tuples = new ArrayList<>();
    try {
      System.out.println("open stream..");
      tupleStream.open();
      for (Tuple t = tupleStream.read(); !t.EOF; t = tupleStream.read()) {
        tuples.add(t);
      }
    } finally {
      tupleStream.close();
    }
    return tuples;
  }
}
```

And this is the output I get:

```
localhost:9983
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
search(gettingstarted, zkHost="localhost:9983", q=*:*, fl="id,compName_s", sort="compName_s asc")
open stream..
```
```
Exception in thread "main" java.lang.NoSuchMethodError: org.apache.http.impl.client.HttpClientBuilder.evictIdleConnections(JLjava/util/concurrent/TimeUnit;)Lorg/apache/http/impl/client/HttpClientBuilder;
	at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:279)
	at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:298)
	at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:236)
	at org.apache.solr.client.solrj.impl.HttpClientUtil.createClient(HttpClientUtil.java:223)
	at org.apache.solr.client.solrj.impl.CloudSolrClient.<init>(CloudSolrClient.java:276)
	at org.apache.solr.client.solrj.impl.CloudSolrClient$Builder.build(CloudSolrClient.java:1525)
	at org.apache.solr.client.solrj.io.SolrClientCache.getCloudSolrClient(SolrClientCache.java:62)
	at org.apache.solr.client.solrj.io.stream.TupleStream.getShards(TupleStream.java:138)
	at org.apache.solr.client.solrj.io.stream.CloudSolrStream.constructStreams(CloudSolrStream.java:368)
	at org.apache.solr.client.solrj.io.stream.CloudSolrStream.open(CloudSolrStream.java:274)
	at ca.ryac.testing.SolrStreamingClient.getTuples(SolrStreamingClient.java:61)
	at ca.ryac.testing.SolrStreamingClient.init(SolrStreamingClient.java:51)
	at ca.ryac.testing.SolrStreamingClient.<init>(SolrStreamingClient.java:22)
	at ca.ryac.testing.SolrStreamingClient.main(SolrStreamingClient.java:26)
```

It's not finding or connecting to my SolrCloud instance; I can put *anything* in zkHost and get the same results. I'm not really sure why it can't find or connect to it. Any thoughts or ideas?

Thank you,
Ryan
Re: Filesystems supported by Solr
On 2/20/2018 3:22 PM, Ritesh Chaman wrote:
> May I know what all filesystems are supported by Solr. For eg ADLS,WASB, S3
> etc. Thanks.

Solr supports whatever your operating system supports. It will expect file locking to be fully functional, so things like NFS don't always work. Local filesystems are very much preferred, and will generally have the best performance.

As far as I am aware, the only filesystem that Solr has explicit support for (outside of what the OS itself provides) is HDFS.

https://lucene.apache.org/solr/guide/7_2/running-solr-on-hdfs.html

There may be plugins available to store indexes in other stores like S3, but if those exist, I am not immediately aware of them. They would be third-party plugins, not supported by the Solr project.

Thanks,
Shawn
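The HDFS support mentioned above is enabled through the directory factory and lock type in solrconfig.xml; a minimal sketch, where the HDFS URI is a placeholder assumption:

```xml
<directoryFactory name="DirectoryFactory" class="solr.HdfsDirectoryFactory">
  <!-- assumption: point this at your HDFS namenode and path -->
  <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
</directoryFactory>
<lockType>${solr.lock.type:hdfs}</lockType>
```

The linked running-solr-on-hdfs guide covers the remaining options, such as the block cache settings.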
Re: Filesystems supported by Solr
Ritesh,

The filesystems you mention are used by Spark so it can stream huge quantities of data (corrections please). By comparison, Solr uses a more 'reasonable' sized filesystem, but needs enough memory that all the index data can be resident. The regular Linux ext3 or ext4 is fine. If you are integrating Solr with Spark, then the filesystems you mention would be for Spark, not Solr.

Cheers -- Rick

On February 20, 2018 5:22:33 PM EST, Ritesh Chaman wrote:
>Hi team
>
>May I know what all filesystems are supported by Solr. For eg
>ADLS,WASB, S3
>etc. Thanks.
>
>Ritesh

-- Sorry for being brief. Alternate email is rickleir at yahoo dot com
Filesystems supported by Solr
Hi team, May I know which filesystems are supported by Solr? For e.g. ADLS, WASB, S3, etc. Thanks. Ritesh
Re: storing large text fields in a database? (instead of inside index)
Say there is a high load and I'd like to bring up a new machine and let it replicate the index; if 100GB or more can be shaved off, that will have a significant impact on how quickly the new searcher is ready and added to the cluster. The impact on search speed is likely minimal. We are investigating the idea of two clusters, but I have to say it seems more complex to me than storing/loading a field from an external source. Having said that, I wonder why this was not done before (maybe it was) and what the cons are (besides the obvious ones: maintenance, and the database being a potential point of failure; well, in that case I'd miss highlights - I can live with that...)

On Tue, Feb 20, 2018 at 10:36 AM, David Hastings <hastings.recurs...@gmail.com> wrote:
> Really depends on what you consider too large, and why the size is a big
> issue, since most replication will go at about 100MB/second give or take,
> and replicating a 300GB index is only an hour or two. What I do for this
> purpose is store my text in a separate index altogether, and call on that
> core for highlighting. So for my use case, the primary index with no
> stored text is around 300GB and replicates as needed, and the full text
> indexes with stored text totals around 500GB and are replicating non stop.
> All searching goes against the primary index, and for highlighting I call
> on the full text indexes that have a stupid simple schema. This has worked
> for me pretty well at least.
>
> On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla wrote:
>
> > Hello,
> >
> > We have a use case of a very large index (slave-master; for unrelated
> > reasons the search cannot work in the cloud mode) - one of the fields is a
> > very large text, stored mostly for highlighting. To cut down the index size
> > (for purposes of replication/scaling) I thought I could try to save it in a
> > database - and not in the index.
> > > > Lucene has codecs - one of the methods is for 'stored field', so that > seems > > likes a natural path for me. > > > > However, I'd expect somebody else before had a similar problem. I googled > > and couldn't find any solutions. Using the codecs seems really good thing > > for this particular problem, am I missing something? Is there a better > way > > to cut down on index size? (besides solr cloud/sharding, compression) > > > > Thank you, > > > >Roman > > >
Re: What is “high cardinality” in facet streams?
The rollup streaming expression rolls up aggregations on a stream that has been sorted by the group-by fields. This is basically a MapReduce reduce operation and can work with extremely high cardinality (basically unlimited). The rollup function is designed to roll up data produced by the /export handler, which can also sort data sets with very high cardinality. The docs should describe the correct usage of the rollup expression with the /export handler.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, Feb 20, 2018 at 11:10 AM, Shawn Heisey wrote:
> On 2/20/2018 4:44 AM, Alfonso Muñoz-Pomer Fuentes wrote:
>
>> We have a query that we can resolve using either facet or search with
>> rollup. In the Stream Source Reference section of Solr’s Reference Guide (
>> https://lucene.apache.org/solr/guide/7_1/stream-source-reference.html#facet) it says “To support high cardinality aggregations see
>> the rollup function”. I was wondering what is considered “high
>> cardinality”. If it serves, our query returns up to 60k results. I haven’t
>> got to do any benchmarking to see if there’s any difference, though,
>> because facet so far performs very well, but I don’t know if I’m near the
>> “tipping point”. Any feedback would be appreciated.
>>
>
> There's no hard and fast rule for this. The tipping point is going to be
> different for every use case. With a little bit of information about your
> setup, experienced users can make an educated guess about whether or not
> performance will be good, but cannot say with absolute certainty what
> you're going to run into.
>
> Let's start with some definitions, which you may or may not already know:
>
> https://en.wikipedia.org/wiki/Cardinality_(data_modeling)
> https://en.wikipedia.org/wiki/Cardinality
>
> You haven't said how many unique values are in your field.
The only > information I have from you is 60K results from your queries, which may or > may not have any bearing on the total number of documents in your index, or > the total number of unique values in the field you're using for faceting. > So the next paragraph may or may not apply to your index. > > In general, 60,000 unique values in a field would be considered very low > cardinality, because computers can typically operate on 60,000 values > *very* quickly, unless the size of each value is enormous. But if the > index has 60,000 total documents, then *in relation to other data*, the > cardinality is very high, even though most people would say the opposite. > Sixty thousand documents or unique values is almost always a very small > index, not prone to performance issues. > > The warnings about cardinality in the Solr documentation mostly refer to > *absolute* cardinality -- how many unique values there are in a field, > regardless of the actual number of documents. If there are millions or > billions of unique values, then operations like facets, grouping, sorting, > etc are probably going to be slow. If there are a lot less, such as > thousands or only a handful, then those operations are likely to be very > fast, because the computer will have less information it must process. > > Thanks, > Shawn > >
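The rollup-over-/export usage Joel describes can be sketched as a streaming expression like the following, where the collection and field names are assumptions; the inner search must sort on the same field(s) listed in over, and qt="/export" streams the entire sorted result set:

```
rollup(
  search(myCollection,
         q="*:*",
         fl="category_s,price_f",
         sort="category_s asc",
         qt="/export"),
  over="category_s",
  count(*),
  sum(price_f))
```

Because the stream arrives sorted by category_s, each group can be reduced and emitted as soon as the next key appears, which is why the cardinality it can handle is effectively unlimited.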
solr.DictionaryCompoundWordTokenFilterFactory filter and double quotes
Hi, We have the field type below defined in our schema.xml to support German compound-word search. This works fine. But even when double quotes are present in the search term, the term gets split. Is there a way to avoid splitting the term when double quotes are present in the query with this field type? Thanks in advance, Rajeswari
Re: What is “high cardinality” in facet streams?
On 2/20/2018 4:44 AM, Alfonso Muñoz-Pomer Fuentes wrote: We have a query that we can resolve using either facet or search with rollup. In the Stream Source Reference section of Solr’s Reference Guide (https://lucene.apache.org/solr/guide/7_1/stream-source-reference.html#facet) it says “To support high cardinality aggregations see the rollup function”. I was wondering what it’s considered “high cardinality”. If it serves, our query returns up to 60k results. I haven’t got to do any benchmarking to see if there’s any difference, though, because facet so far performs very well, but I don’t know if I’m near the “tipping point”. Any feedback would be appreciated. There's no hard and fast rule for this. The tipping point is going to be different for every use case. With a little bit of information about your setup, experienced users can make an educated guess about whether or not performance will be good, but cannot say with absolute certainty what you're going to run into. Let's start with some definitions, which you may or may not already know: https://en.wikipedia.org/wiki/Cardinality_(data_modeling) https://en.wikipedia.org/wiki/Cardinality You haven't said how many unique values are in your field. The only information I have from you is 60K results from your queries, which may or may not have any bearing on the total number of documents in your index, or the total number of unique values in the field you're using for faceting. So the next paragraph may or may not apply to your index. In general, 60,000 unique values in a field would be considered very low cardinality, because computers can typically operate on 60,000 values *very* quickly, unless the size of each value is enormous. But if the index has 60,000 total documents, then *in relation to other data*, the cardinality is very high, even though most people would say the opposite. Sixty thousand documents or unique values is almost always a very small index, not prone to performance issues. 
The warnings about cardinality in the Solr documentation mostly refer to *absolute* cardinality -- how many unique values there are in a field, regardless of the actual number of documents. If there are millions or billions of unique values, then operations like facets, grouping, sorting, etc are probably going to be slow. If there are a lot less, such as thousands or only a handful, then those operations are likely to be very fast, because the computer will have less information it must process. Thanks, Shawn
Re: Auto-Suggestions are not propagating to Solr Cluster Nodes
FYI.

Thanks,
Kalahasthi Satyanarayana
Mobile: 08884581161

From: Kalahasthi Satyanarayana
Sent: Tuesday, February 20, 2018 11:57 AM
To: 'solr-user@lucene.apache.org
Cc: Deepak Udapudi; Venkata MR; v...@delta.org; Nareshkumar P; Nareshkumar P; Soma Das; Soma Das
Subject: Auto-Suggestions are not propagating to Solr Cluster Nodes

Hi All,

Problem: not able to build suggest data on all Solr cluster nodes.

We configured three Solr nodes using an external ZooKeeper, and configured the requestHandler for auto-suggestion as below:

```
true 5 Name suggest Name name name AnalyzingInfixLookupFactory name_suggester_infix_dir DocumentDictionaryFactory key lowercase name_suggestor_dictionary string
```

When we manually issue a request with suggest.build=true on one of the nodes to build the suggest data, the suggest data is built for that particular node only; the other nodes of the cluster do not build their suggest data. Is there any configuration mismatch?

Thanks,
Kalahasthi Satyanarayana
Mobile: 08884581161
Re: storing large text fields in a database? (instead of inside index)
Really depends on what you consider too large, and why the size is a big issue, since most replication will go at about 100MB/second give or take, and replicating a 300GB index is only an hour or two. What I do for this purpose is store my text in a separate index altogether, and call on that core for highlighting. So for my use case, the primary index with no stored text is around 300GB and replicates as needed, and the full-text indexes with stored text total around 500GB and are replicating non-stop. All searching goes against the primary index, and for highlighting I call on the full-text indexes, which have a stupidly simple schema. This has worked pretty well for me, at least.

On Tue, Feb 20, 2018 at 10:27 AM, Roman Chyla wrote:
> Hello,
>
> We have a use case of a very large index (slave-master; for unrelated
> reasons the search cannot work in the cloud mode) - one of the fields is a
> very large text, stored mostly for highlighting. To cut down the index size
> (for purposes of replication/scaling) I thought I could try to save it in a
> database - and not in the index.
>
> Lucene has codecs - one of the methods is for 'stored field', so that seems
> likes a natural path for me.
>
> However, I'd expect somebody else before had a similar problem. I googled
> and couldn't find any solutions. Using the codecs seems really good thing
> for this particular problem, am I missing something? Is there a better way
> to cut down on index size? (besides solr cloud/sharding, compression)
>
> Thank you,
>
> Roman
storing large text fields in a database? (instead of inside index)
Hello, We have a use case of a very large index (slave-master; for unrelated reasons the search cannot work in cloud mode). One of the fields is a very large text field, stored mostly for highlighting. To cut down the index size (for purposes of replication/scaling) I thought I could try to save it in a database - and not in the index. Lucene has codecs - one of the methods is for 'stored fields', so that seems like a natural path for me. However, I'd expect somebody else has had a similar problem before. I googled and couldn't find any solutions. Using the codecs seems like a really good fit for this particular problem; am I missing something? Is there a better way to cut down on index size? (besides Solr cloud/sharding, compression) Thank you, Roman
Save the date: ApacheCon North America, September 24-27 in Montréal
Dear Apache Enthusiast,

(You're receiving this message because you're subscribed to a user@ or dev@ list of one or more Apache Software Foundation projects.)

We're pleased to announce the upcoming ApacheCon [1] in Montréal, September 24-27. This event is all about you, the Apache project community. We'll have four tracks of technical content this time, as well as lots of opportunities to connect with your project community, hack on the code, and learn about other related (and unrelated!) projects across the foundation.

The Call For Papers (CFP) [2] and registration are now open. Register early to take advantage of the early bird prices and secure your place at the event hotel.

Important dates:
March 30: CFP closes
April 20: CFP notifications sent
August 24: Hotel room block closes (please do not wait until the last minute)

Follow @ApacheCon on Twitter to be the first to hear announcements about keynotes, the schedule, evening events, and everything you can expect to see at the event.

See you in Montréal!

Sincerely,
Rich Bowen, V.P. Events, on behalf of the entire ApacheCon team

[1] http://www.apachecon.com/acna18
[2] https://cfp.apachecon.com/conference.html?apachecon-north-america-2018
What is “high cardinality” in facet streams?
Hi, We have a query that we can resolve using either facet or search with rollup. In the Stream Source Reference section of Solr’s Reference Guide (https://lucene.apache.org/solr/guide/7_1/stream-source-reference.html#facet) it says “To support high cardinality aggregations see the rollup function”. I was wondering what is considered “high cardinality”. If it serves, our query returns up to 60k results. I haven’t got around to doing any benchmarking to see if there’s any difference, though, because facet so far performs very well, but I don’t know if I’m near the “tipping point”. Any feedback would be appreciated. Many thanks in advance.

--
Alfonso Muñoz-Pomer Fuentes
Senior Lead Software Engineer @ Expression Atlas Team
European Bioinformatics Institute (EMBL-EBI)
European Molecular Biology Laboratory
Tel: +44 (0) 1223 49 2633
Skype: amunozpomer
Sitecore Analytics Index
Hi, For those who have a Sitecore website app with multiple sites (but only one Sitecore code base): have you separated the analytics index for each site? How were you able to manage it? Also, do you have archiving in place, since analytics data keeps growing? Thanks. Best Regards, Jeck
Re: Need help with match contains query in SOLR
It was not clear at the beginning, but if I understood correctly you could:

*Index Time Analysis*
Use whatever charFilter you need, the keyword tokenizer [1], and then the token filters you like (such as the lowercase filter, synonyms etc.).

*Query Time Analysis*
Use a tokenizer of your choice (one that actually tokenizes, so not the keyword tokenizer), the Shingle token filter [2], and whatever additional filters you need.

This should do the trick.

Cheers

[1] https://lucene.apache.org/solr/guide/6_6/tokenizers.html#Tokenizers-KeywordTokenizer
[2] https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-ShingleFilter

-----
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
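Alessandro's recipe (keyword tokenization at index time, shingles at query time) could look roughly like the following schema.xml fieldType. This is a sketch: the type name, tokenizer choice, and shingle sizes are assumptions to adapt; note that maxShingleSize caps how many query words a match can span.

```xml
<fieldType name="text_exact_contains" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <!-- the whole field value becomes one token, lowercased -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- shingles recombine query words so one token can equal the indexed value -->
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="5"
            outputUnigrams="true" tokenSeparator=" "/>
  </analyzer>
</fieldType>
```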
Re: One of three cores is missing userData and lastModified fields from /admin/cores
Were you able to get a solution to this issue?

Aaron Daubman wrote
> On a Solr server running 4.10.2 with three cores, two return the expected
> info from /solr/admin/cores?wt=json but the third is missing userData and
> lastModified.
>
> The first (artists) and third (tracks) cores from the linked screenshot are
> the ones I care about. *Unfortunately, the third (tracks) is the one missing
> lastModified.*
>
> As far as I can see, that comes from:
> https://github.com/apache/lucene-solr/blob/lucene_solr_4_10_2/solr/core/src/java/org/apache/solr/handler/admin/LukeRequestHandler.java#L568
>
> I can't trace back to see what would possibly cause getUserData() to return
> an empty Object, but that appears to be what is happening?

-- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Facet performance problem
On 2/20/2018 1:18 AM, LOPEZ-CORTES Mariano-ext wrote:
> We return a facet list of values in "motifPresence" field (person status).
>
> Status:
> [ ] status1
> [x] status2
> [x] status3
>
> The user then selects 1 or multiple status (It's this step that we called
> "facet filtering"). Query is then re-executed with
> fq=motifPresence:(status2 OR status3)
> We use fq in order to not alter the score in main query.
> We've read that docValues=true for facet fields. We need also indexed=true?

Facets, grouping, and sorting are more efficient with docValues, but searches aren't helped by docValues. Without indexed="true", searches on the field will be VERY slow. A filter query is still a search. The "filter" in filter query just refers to the fact that it's separate from the main query, and that it does not affect relevancy scoring.

Thanks,
Shawn
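Putting Shawn's advice into the schema, the "motifPresence" field from this thread would need both attributes. A sketch, where the type and stored values are assumptions:

```xml
<field name="motifPresence" type="string" indexed="true" stored="true" docValues="true"/>
```

indexed="true" lets fq=motifPresence:(status2 OR status3) use the inverted index, while docValues="true" keeps faceting and sorting efficient. A full reindex is needed after changing these attributes.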
Re: ZK session times out intermittently
On 2/19/2018 3:33 PM, Roy Lim wrote:
> 6 x Solr (3 primary shard, 3 secondary)
> 3 x ZK
> The client is indexing over 16 million documents using 8 threads. Auto-soft
> commit is 3 minutes, auto-commit is 10 minutes.

I would probably reduce the autoCommit time to 1 minute, as long as openSearcher is set to false, which is the recommended setting. This is not necessary, but it would probably reduce the size of your transaction logs, which will make Solr restarts faster.

> The following timeout is observed in our client log, intermittently:

There is no information here. I checked Nabble as well, because sometimes when they replicate to the mailing list, there is information on their forum that does not show up on the mailing list. In this case, Nabble didn't have any information either. If you can't get the data to stay in the message, you may need to use a paste website and provide a URL.

> Thinking that this is a case where ZK could no longer establish connection
> to the Solr node it is communicating with, I went to the primary nodes and
> correlated the timestamps. They all are very similar to below:

Again, there is nothing here for us to examine. BTW, ZK does not connect to Solr. Solr connects to ZK. It's possible that you're already aware of this, but because of the way you phrased your comment, I cannot tell for sure.

> Note the time gap of over 1 minute, which I can only surmise means ZK is
> waiting this whole time for Solr to return, only to time out. Is that
> reasonable? The thing is, I have no idea what is happening during that time
> and why Solr is taking so long. Note the second statement signaling the
> start of the soft commit, so I don't think this is a case of a long commit.
> Finally, checking the GC logs, there are no long pauses either! Hoping an
> expert can shed some light here.

Because we can't actually see the information you've referenced, which I assume are excerpts from logfiles, it's difficult to make any kind of recommendation, or even make a guess.
We'll need to see your solr logfile, and maybe your ZK logfile. Hopefully there are ERROR logs that we can attempt to decipher, but you'll want the logging to be at the default level of INFO, so we can see the errors in context. If Solr and ZK are on separate servers, you'll want to make sure that there is good time synchronization, so that timestamps in different logs are in sync with each other. How have you determined that the GC log does not have long pauses? Can you share a GC log that includes the timeframe where the problem happened? Thanks, Shawn
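Shawn's suggested commit settings translate to a solrconfig.xml fragment along these lines (a sketch; the soft-commit value restates the 3 minutes mentioned in the thread):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>           <!-- hard commit every minute -->
    <openSearcher>false</openSearcher> <!-- recommended: don't open a new searcher -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>180000</maxTime>          <!-- soft commit every 3 minutes for visibility -->
  </autoSoftCommit>
</updateHandler>
```

More frequent hard commits truncate the transaction log regularly, which is what makes restarts faster.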
RE: Facet performance problem
Our query looks like this: ...facet=true&facet.field=motifPresence

We return a facet list of values in the "motifPresence" field (person status).

Status:
[ ] status1
[x] status2
[x] status3

The user then selects one or multiple statuses (it's this step that we called "facet filtering"). The query is then re-executed with fq=motifPresence:(status2 OR status3). We use fq in order to not alter the score of the main query. We've read that docValues=true is recommended for facet fields. Do we also need indexed=true? Is there any other problem in our solution?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Monday, February 19, 2018 18:18
To: solr-user
Subject: Re: Facet performance problem

I'm confused here. What do you mean by "facet filtering"? Your examples have no facets at all, just a _filter query_. I'll assume you want to use a filter query (fq), and faceting has nothing to do with it.

This is one of the tricky bits of docValues. While it's _possible_ to search on a field that's defined as above, it's very inefficient since there's no "inverted index" for the field; you specified 'indexed="false"'. So the docValues are searched, and it's essentially a table scan. If you mean to search against this field, set indexed="true". You'll have to completely reindex your corpus of course. If you intend to facet, group or sort on this field, you should _also_ have docValues="true".

Best,
Erick

On Mon, Feb 19, 2018 at 7:47 AM, MOUSSA MZE Oussama-ext wrote:
> Hi
>
> We have the following environment:
>
> 3 nodes cluster
> 1 shard
> Replication factor = 2
> 8GB per node
>
> 29 millions of documents
>
> We're faceting over field "motifPresence" defined as follows:
>
> indexed="false" stored="true" required="false"/>
>
> Once the user selects the motifPresence filter we execute the search again with:
>
> fq: (value1 OR value2 OR value3 OR ...)
>
> The problem is: during facet filtering the query is too slow and its response time is greater than the main search (without facet filtering).
>
> Thanks in advance!