Re: How to know which value matched in multivalued field
I found this page. https://stackoverflow.com/questions/2135072/determine-which-value-produced-a-hit-in-solr-multivalued-field-type Hmmm... On Fri, Jul 12, 2019 at 22:08, Takashi Sasaki wrote: > > Hi Solr experts, > > I have a multivalued location in an RPT field. > Is there a way to know which location was matched by the query? > > sample query: > q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5 > > Of course I can recalculate on the client side, > but I want to know how to do it using Solr's features. > > Solr version is 7.3.1. > > Thanks, > Takashi Sasaki
Re: indexing slow in solr 8.0.0
You reduced CPU by half and see slower indexing. That is to be expected. But you don't tell us any real details about your setup, your docs, how you index, how you measure throughput, what your bottleneck is, etc. Also note that you get better throughput when indexing for the first time than when you re-index on top of an existing index. Jan > On Jul 12, 2019, at 15:25, derrick cui wrote: > > Hi, > I am facing a problem now. I just moved my Solr cloud from one environment > to another one, but performance is extremely slow on the new servers. The > only difference is CPU. Also, I just copied my whole solr folder from the old env to > the new env and changed the configuration file. > Before: hardware: three servers: 8 core CPU, mem 32G, SSD 300G; indexing 400k > only needs 5 minutes; collection: 3 shards/2 replicas/3 nodes > Now: hardware: three servers: 4 core CPU, mem 32G, SSD 300G; > indexing 400k, less than 1 document per minute; > collection: 3 shards/2 replicas/3 nodes > > Anyone know what could cause the issue? Thanks in advance
Solr 7.7 restore issue
I have a 4 node cluster. My goal is to have 2 shards with two replicas each, allowing only 1 core on each node. I have a cluster policy set to: [{"replica":"2", "shard": "#EACH", "collection":"test", "port":"8983"},{"cores":"1", "node":"#ANY"}] I then manually create a collection with: name: test config set: test numShards: 2 replicationFactor: 2 This works and I get a collection that looks like what I expect. I then back up this collection. But when I try to restore the collection it fails and says "Error getting replica locations : No node can satisfy the rules" [{"replica":"2", "shard": "#EACH", "collection":"test", "port":"8983"},{"cores":"1", "node":"#ANY"}] If I set my cluster-policy rules back to [] and try to restore, it successfully restores my collection exactly how I expect it to be. It appears that having any cluster-policy rules in place is affecting my restore, but the "error getting replica locations" is strange. Any suggestions? mark
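For reference, the restore that fails here is the stock Collections API call; the error comes from the autoscaling policy evaluation, not from the request itself. A minimal sketch of the call (the backup name and location below are placeholders, not values from this thread):

```text
/admin/collections?action=RESTORE&name=test_backup&collection=test&location=/path/to/backups
```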
Re: SolrCloud indexing triggers merges and timeouts
Upon further investigation on this issue, I see the below log lines during the indexing process: 2019-06-06 22:24:56.203 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84] org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: trigger flush: activeBytes=352402600 deleteBytes=279 vs limit=104857600 2019-06-06 22:24:56.203 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84] org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: thread state has 352402600 bytes; docInRAM=1 2019-06-06 22:24:56.204 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 x:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623_shard22_replica_n84] org.apache.solr.update.LoggingInfoStream [FP][qtp1169794610-5652]: 1 in-use non-flushing threads states 2019-06-06 22:24:56.204 INFO (qtp1169794610-5652) [c:UM_IndexServer_MailArchiv_Spelle_66AC8340-4734-438A-9D1D-A84B659B1623 s:shard22 r:core_node87 I have the below questions: 1) The log line which says "thread state has 352402600 bytes; docInRAM=1", does it mean that the buffer was flushed to disk with only one huge document? 2) If yes, does this flush create a segment with just one document? 3) Heap dump analysis shows large (>350 MB) instances of DocumentsWriterPerThread. Does one instance of this class correspond to one document? Help is much appreciated. Thanks, Rahul On Fri, Jul 5, 2019 at 2:11 AM Rahul Goswami wrote: > Shawn,Erick, > Thank you for the explanation. The merge scheduler params make sense now. 
> > Thanks, > Rahul > > On Wed, Jul 3, 2019 at 11:30 AM Erick Erickson > wrote: > >> Two more tidbits to add to Shawn’s explanation: >> >> There are heuristics built in to ConcurrentMergeScheduler. >> From the Javadocs: >> * If it's an SSD, >> * {@code maxThreadCount} is set to {@code max(1, min(4, >> cpuCoreCount/2))}, >> * otherwise 1. Note that detection only currently works on >> * Linux; other platforms will assume the index is not on an SSD. >> >> Second, TieredMergePolicy (the default) merges in “tiers” that >> are of similar size. So you can have multiple merges going on >> at the same time on disjoint sets of segments. >> >> Best, >> Erick >> >> > On Jul 3, 2019, at 7:54 AM, Shawn Heisey wrote: >> > >> > On 7/2/2019 10:53 PM, Rahul Goswami wrote: >> >> Hi Shawn, >> >> Thank you for the detailed suggestions. Although, I would like to >> >> understand the maxMergeCount and maxThreadCount params better. The >> >> documentation >> >> < >> https://lucene.apache.org/solr/guide/7_3/indexconfig-in-solrconfig.html#mergescheduler >> > >> >> mentions >> >> that >> >> maxMergeCount : The maximum number of simultaneous merges that are >> allowed. >> >> maxThreadCount : The maximum number of simultaneous merge threads that >> >> should be running at once >> >> Since one thread can only do 1 merge at any given point of time, how >> does >> >> maxMergeCount being greater than maxThreadCount help anyway? I am >> having >> >> difficulty wrapping my head around this, and would appreciate if you >> could >> >> help clear it for me. >> > >> > The maxMergeCount setting controls the number of merges that can be >> *scheduled* at the same time. As soon as that number of merges is reached, >> the indexing thread(s) will be paused until the number of merges in the >> schedule drops below this number. This ensures that no more merges will be >> scheduled. 
>> > >> > By setting maxMergeCount higher than the number of merges that are >> expected in the schedule, you can ensure that indexing will never be >> paused. It would require very atypical merge policy settings for the >> number of scheduled merges to ever reach six. On my own indexing, I >> reached three scheduled merges quite frequently. The default setting for >> maxMergeCount is three. >> > >> > The maxThreadCount setting controls how many of the scheduled merges >> will be simultaneously executed. With index data on standard spinning >> disks, you do not want to increase this number beyond 1, or you will have a >> performance problem due to thrashing disk heads. If your data is on SSD, >> you can make it larger than 1. >> > >> > Thanks, >> > Shawn >> >>
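For readers finding this thread in the archives: the two settings discussed above live in the <mergeScheduler> section of solrconfig.xml. A sketch using the values mentioned in this thread (maxMergeCount raised above the default of three so indexing is not paused, and a single merge thread as recommended for spinning disks):

```xml
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <!-- how many merges may be *scheduled* before indexing threads are paused -->
    <int name="maxMergeCount">6</int>
    <!-- how many scheduled merges may *run* concurrently; keep at 1 for spinning disks -->
    <int name="maxThreadCount">1</int>
  </mergeScheduler>
</indexConfig>
```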
Re: Spark-Solr connector
Thanks Shawn I'll raise a question on the GitHub page. Cheers, Dwane From: Shawn Heisey Sent: Friday, 12 July 2019 10:05 PM To: solr-user@lucene.apache.org Subject: Re: Spark-Solr connector On 7/11/2019 8:50 PM, Dwane Hall wrote: > I’ve just started looking at the excellent spark-solr project (thanks Tim > Potter, Kiran Chitturi, Kevin Risden and Jason Gerlowski for their efforts > with this project it looks really neat!!). > > I’m only at the initial stages of my exploration but I’m running into a class > not found exception when connecting to a secure solr cloud instance (basic > auth, ssl). Everything is working as expected on a non-secure solr cloud > instance. > > The process looks pretty straightforward according to the doco so I’m > wondering if I’m missing anything obvious or if I need to bring any extra > classes to the classpath when using this project? > > Any advice would be greatly appreciated. The exception here (which I did not quote) is in code from Google, Spark, and Lucidworks. There are no Solr classes mentioned at all in the stacktrace. Which means that we won't be able to help you on this list. Looking closer at the stacktrace, it looks to me like you're going to need to talk to Lucidworks about this problem. Thanks, Shawn
Re: Getting list of unique values in a field
i found this: https://stackoverflow.com/questions/14485031/faceting-using-solrj-and-solr4 and this https://www.programcreek.com/java-api-examples/?api=org.apache.solr.client.solrj.response.FacetField just from a google search On Fri, Jul 12, 2019 at 9:46 AM Steven White wrote: > Thanks David. But is there a SolrJ sample code on how to do this? I need > to see one, or at least the API, so I know how to make the call. > > Steven > > On Fri, Jul 12, 2019 at 9:42 AM David Hastings < > hastings.recurs...@gmail.com> > wrote: > > > just use a facet on the field should work yes? > > > > On Fri, Jul 12, 2019 at 9:39 AM Steven White > wrote: > > > > > Hi everyone, > > > > > > One of my indexed fields is as follows: > > > > > > <field name="CC_FILE_EXT" > > multiValued="false" indexed="true" required="true" stored="false"/> > > > > > > It holds the file extension of the files I'm indexing. That is, let us > > say > > > I indexed 10 million files and as a result of that indexing, the field > > > CC_FILE_EXT will now have the file extension. In my case the unique > file > > > extension list is about 300. > > > > > > Using SolrJ, is there a quick and fast way for me to get back all the > > > unique values this field has across all of my documents? I don't and > > cannot > > > scan all the 10 million indexed documents in Solr to build that list. > > That > > > would be very inefficient. > > > > > > Thanks, > > > > > > Steven > > > > > >
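For the archives: the facet approach David suggests needs no document scan at all; it reads the field's indexed terms. A minimal request sketch (rows=0 skips document retrieval and facet.limit=-1 returns all ~300 values; in SolrJ the equivalent calls are SolrQuery.setFacet(true), SolrQuery.addFacetField("CC_FILE_EXT"), and QueryResponse.getFacetField("CC_FILE_EXT")):

```text
/select?q=*:*&rows=0&facet=true&facet.field=CC_FILE_EXT&facet.limit=-1&facet.mincount=1
```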
Re: Getting list of unique values in a field
Thanks David. But is there a SolrJ sample code on how to do this? I need to see one, or at least the API, so I know how to make the call. Steven On Fri, Jul 12, 2019 at 9:42 AM David Hastings wrote: > just use a facet on the field should work yes? > > On Fri, Jul 12, 2019 at 9:39 AM Steven White wrote: > > > Hi everyone, > > > > One of my indexed fields is as follows: > > > > <field name="CC_FILE_EXT" > multiValued="false" indexed="true" required="true" stored="false"/> > > > > It holds the file extension of the files I'm indexing. That is, let us > say > > I indexed 10 million files and as a result of that indexing, the field > > CC_FILE_EXT will now have the file extension. In my case the unique file > > extension list is about 300. > > > > Using SolrJ, is there a quick and fast way for me to get back all the > > unique values this field has across all of my documents? I don't and > cannot > > scan all the 10 million indexed documents in Solr to build that list. > That > > would be very inefficient. > > > > Thanks, > > > > Steven > > >
Re: Getting list of unique values in a field
just use a facet on the field should work yes? On Fri, Jul 12, 2019 at 9:39 AM Steven White wrote: > Hi everyone, > > One of my indexed fields is as follows: > > <field name="CC_FILE_EXT" multiValued="false" indexed="true" required="true" stored="false"/> > > It holds the file extension of the files I'm indexing. That is, let us say > I indexed 10 million files and as a result of that indexing, the field > CC_FILE_EXT will now have the file extension. In my case the unique file > extension list is about 300. > > Using SolrJ, is there a quick and fast way for me to get back all the > unique values this field has across all of my documents? I don't and cannot > scan all the 10 million indexed documents in Solr to build that list. That > would be very inefficient. > > Thanks, > > Steven >
Getting list of unique values in a field
Hi everyone, One of my indexed fields is as follows: <field name="CC_FILE_EXT" multiValued="false" indexed="true" required="true" stored="false"/> It holds the file extension of the files I'm indexing. That is, let us say I indexed 10 million files and as a result of that indexing, the field CC_FILE_EXT will now have the file extension. In my case the unique file extension list is about 300. Using SolrJ, is there a quick and fast way for me to get back all the unique values this field has across all of my documents? I don't and cannot scan all the 10 million indexed documents in Solr to build that list. That would be very inefficient. Thanks, Steven
indexing slow in solr 8.0.0
Hi, I am facing a problem now. I just moved my Solr cloud from one environment to another one, but performance is extremely slow on the new servers. The only difference is CPU. Also, I just copied my whole solr folder from the old env to the new env and changed the configuration file. Before: hardware: three servers: 8 core CPU, mem 32G, SSD 300G; indexing 400k only needs 5 minutes; collection: 3 shards/2 replicas/3 nodes. Now: hardware: three servers: 4 core CPU, mem 32G, SSD 300G; indexing 400k, less than 1 document per minute; collection: 3 shards/2 replicas/3 nodes. Anyone know what could cause the issue? Thanks in advance
How to know which value matched in multivalued field
Hi Solr experts, I have a multivalued location in an RPT field. Is there a way to know which location was matched by the query? sample query: q=*:*&fq={!bbox sfield=store}&pt=45.15,-93.85&d=5 Of course I can recalculate on the client side, but I want to know how to do it using Solr's features. Solr version is 7.3.1. Thanks, Takashi Sasaki
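Until Solr can report the matching value directly, the client-side recalculation mentioned above is straightforward. A hedged sketch in plain Java (no SolrJ): the location strings below are hypothetical stored values of the multivalued field, and since {!bbox} actually filters on a lat/lon bounding box rather than a circle, a great-circle distance check is a close but not exact reproduction of the filter.

```java
// Client-side sketch: given the stored values of a multivalued location field
// and the query point (pt=45.15,-93.85 with d=5 km), find which value(s)
// fall within the search radius using the haversine great-circle distance.
public class MatchingLocation {
    // mean earth radius in km, as used by Lucene spatial utilities
    static final double EARTH_RADIUS_KM = 6371.0087714;

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        double ptLat = 45.15, ptLon = -93.85, d = 5.0; // from the sample query
        // hypothetical stored values of the multivalued "store" field for one hit
        String[] locations = {"45.17,-93.87", "40.71,-74.00"};
        for (String loc : locations) {
            String[] parts = loc.split(",");
            double dist = haversineKm(ptLat, ptLon,
                    Double.parseDouble(parts[0]), Double.parseDouble(parts[1]));
            if (dist <= d) {
                System.out.println(loc + " matched at " + dist + " km");
            }
        }
    }
}
```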
Re: Spark-Solr connector
On 7/11/2019 8:50 PM, Dwane Hall wrote: I’ve just started looking at the excellent spark-solr project (thanks Tim Potter, Kiran Chitturi, Kevin Risden and Jason Gerlowski for their efforts with this project it looks really neat!!). I’m only at the initial stages of my exploration but I’m running into a class not found exception when connecting to a secure solr cloud instance (basic auth, ssl). Everything is working as expected on a non-secure solr cloud instance. The process looks pretty straightforward according to the doco so I’m wondering if I’m missing anything obvious or if I need to bring any extra classes to the classpath when using this project? Any advice would be greatly appreciated. The exception here (which I did not quote) is in code from Google, Spark, and Lucidworks. There are no Solr classes mentioned at all in the stacktrace. Which means that we won't be able to help you on this list. Looking closer at the stacktrace, it looks to me like you're going to need to talk to Lucidworks about this problem. Thanks, Shawn
Re: QTime
Yeah, for network latency I would recommend a tool like charlesproxy. Edward Em qui, 11 de jul de 2019 20:59, Erick Erickson escreveu: > true, although there’s still network that can’t be included. > > > On Jul 11, 2019, at 5:55 PM, Edward Ribeiro > wrote: > > > > Wouldn't it be the case of using the rows=0 parameter on those requests? Wdyt? > > > > Edward > > > > Em qui, 11 de jul de 2019 14:24, Erick Erickson > > > escreveu: > > > >> Not only does Qtime not include network latency, it also doesn't include > >> the time it takes to assemble the docs for return, which can be lengthy > >> when rows is large. > >> > >> On Wed, Jul 10, 2019, 14:39 Shawn Heisey wrote: > >> > >>> On 7/10/2019 3:17 PM, Lucky Sharma wrote: > I am seeing one very weird behaviour of QTime of SOLR. > > Scenario is : > When I am hitting the Solr Cloud Instance, situated at a DC, with my > >> local > machine during a load test I was seeing 400ms Qtime and 1sec > Http > Response time. > >>> > >>> How much data was in the response? If it's large, I can see it taking > >>> that long to transfer. This is even more likely if there is a lot of > >>> network latency in the network between the client and the server. > >>> > While I am trying to do the same process within the same DC location, > I > >>> am > getting 100 ms Solr QTime and 130ms Response Time. > > Does QTime count network latency too?? > >>> > >>> There's no way Solr can include the time to send the response over the > >>> network in QTime. The value is calculated and put into the response > >>> before Solr starts sending. > >>> > >>> Thanks, > >>> Shawn > >>> > >> > >
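To make the comparison Edward suggests concrete: issuing the same query with rows=0 keeps query execution (and therefore QTime) but drops document assembly and most of the response transfer, so the remaining gap between QTime and client-measured elapsed time is close to pure network overhead. A sketch of the two requests to compare (the query itself is a placeholder):

```text
/select?q=<your query>&rows=100   QTime vs elapsed includes doc assembly + transfer
/select?q=<your query>&rows=0     QTime vs elapsed is mostly network overhead
```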