Re: slow queries
A couple of things don't particularly make sense here: you specify edismax and q=*:*, yet you don't specify qf=, so you're searching across whatever you defined as the default field in the request handler. What do you see if you attach debug=true to the query?

I think this clause is wrong: (cents_ri: [* 3000]). I think you mean (cents_ri: [* TO 3000]). I'm not sure either of those is the problem, but those are places I'd start.

As far as the size of your filter cache goes, a hit ratio of 0.87 actually isn't bad. Upping the size would add some marginal benefit, but it's unlikely to be a magic bullet.

But are these slow queries constant or intermittent? In other words, are all queries of this general form slow, or just the first few? In particular, is the first query that mentions sorting on this field slow but subsequent ones faster? In that case consider adding a query to the newSearcher event in solrconfig.xml that mentions this sort; that would pre-warm the sort values. Also, defining all fields that you sort on as docValues="true" is recommended at this point.

What I'd try is removing clauses to see which one is the problem. On the surface this is surprisingly slow. And how heavily loaded is the server? Your autocommit settings look fine; my question is more how much indexing and querying is going on when you take these measurements.
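The newSearcher warming suggested above could look roughly like this in solrconfig.xml (a sketch only; the sort field name is taken from the query in this thread, and the exact query should mirror your real request):

```xml
<!-- solrconfig.xml: pre-warm the view_counter_i sort when a new searcher
     opens, so the first user query sorting on it doesn't pay the cost. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">view_counter_i desc</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```

With docValues="true" on the sort field (which requires a reindex), the warming cost itself also shrinks, since the sort values no longer have to be un-inverted onto the heap.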
Best, Erick On Wed, Oct 14, 2015 at 3:03 AM, Lorenzo Fundarówrote: > Hello, > > I have following conf for filters and commits : > > Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57, > acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8, > regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd) > > > >${solr.autoCommit.maxTime:15000} >false > > > > >${solr.autoSoftCommit.maxTime:60} > > > and the following stats for filters: > > lookups = 3602 > hits = 3148 > hit ratio = 0.87 > inserts = 455 > evictions = 400 > size = 63 > warmupTime = 770 > > *Problem: *a lot of slow queries, for example: > > {q=*:*=1.0=edismax=standard=map==pk_i,score=0=view_counter_i > desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50 > cache=true}in_languages_t:de={!cost=99 > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND > (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378 > > I could increase the size of the filter so I would decrease the amount of > evictions, but it seems to me this would not be solving the root problem. > > Some ideas on where/how to start for optimisation ? Is it actually normal > that this query takes this time ? > > We have an index of ~14 million docs. 4 replicas with two cores and 1 shard > each. > > thank you. > > > -- > > -- > Lorenzo Fundaro > Backend Engineer > E-Mail: lorenzo.fund...@dawandamail.com > > Fax + 49 - (0)30 - 25 76 08 52 > Tel+ 49 - (0)179 - 51 10 982 > > DaWanda GmbH > Windscheidstraße 18 > 10627 Berlin > > Geschäftsführer: Claudia Helming, Michael Pütz > Amtsgericht Charlottenburg HRB 104695 B
Re: Replication and soft commits for NRT searches
bq: If a timeout between shard leader and replica can lead to a smaller rf value (because replication has timed out), is it possible to increase this timeout in the configuration?

Why do you care? If it timed out, then the follower will no longer be active and will not serve queries. The Cloud view should show it in "down", "recovery" or the like. Before it goes back to the "active" state, it will synchronize from the leader automatically without you having to do anything, and any docs that were indexed to the leader will be faithfully reflected on the follower _before_ the recovering follower serves any new queries. So practically it makes no difference whether there was an update timeout or not.

This is feeling a lot like an "XY" problem. You're asking detailed questions about "X" (in this case timeouts, what rf means and the like) without telling us what the problem you're concerned about is ("Y"). So please back up and tell us what your higher-level concern is. Do you have any evidence of Bad Things Happening?

And do, please, change your commit intervals to not commit after every doc. That's a Really Bad Practice in Solr.

Best, Erick On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO) wrote: > Hello, > > thank you for the detailed answer. > > If a timeout between shard leader and replica can lead to a smaller rf value > (because replication has timed out), is it possible to increase this timeout > in the configuration? > > Best Regards, > Martin Mois > > Comments inline: > > On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO) > wrote: >> Hello, >> >> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been >> created with > replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am > using autoCommit/maxDocs=1 > and autoSoftCommits/maxDocs=1 in order to achieve near realtime search > behavior. 
>> >> As far as I understand from section "Write Side Fault Tolerance" in the >> documentation > (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance), > I > cannot enforce that an update gets replicated to all replicas, but I can only > get the achieved > replication factor by requesting the return value rf. >> >> My question is now, what exactly does rf=2 mean? Does it only mean that the >> replica has > written the update to its transaction log? Or has the replica also performed > the soft commit > as configured with autoSoftCommits/maxDocs=1? The answer is important for me, > as if the update > would only get written to the transaction log, I could not search for it > reliable, as the > replica may not have added it to the searchable index. > > rf=2 means that the update was successfully replicated to and > acknowledged by two replicas (including the leader). The rf only deals > with the durability of the update and has no relation to visibility of > the update to searchers. The auto(soft)commit settings are applied > asynchronously and do not block an update request. > >> >> My second question is, does rf=1 mean that the update was definitely not >> successful on > the replica or could it also represent a timeout of the replication request > from the shard > leader? If it could also represent a timeout, then there would be a small > chance that the > replication was successfully despite of the timeout. > > Well, rf=1 implies that the update was only applied on the leader's > index + tlog and either replicas weren't available or returned an > error or the request timed out. So yes, you are right that it can > represent a timeout and as such there is a chance that the replication > was indeed successful despite of the timeout. > >> >> Is there a way to retrieve the replication factor for a specific document >> after the update > in order to check if replication was successful in the meantime? 
>> > > No, there is no way to do that. > >> Thanks in advance. >> >> Best Regards, >> Martin Mois >> # >> " This e-mail and any attached documents may contain confidential or >> proprietary information. > If you are not the intended recipient, you are notified that any > dissemination, copying of > this e-mail and any attachments thereto or use of their contents by any means > whatsoever is > strictly prohibited. If you have received this e-mail in error, please advise > the sender immediately > and delete this e-mail and all attached documents from your computer system." >> # > > > > -- > Regards, > Shalin Shekhar Mangar.
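For reference, the "Write Side Fault Tolerance" page linked above also describes a min_rf request parameter: pass it with the update and the response header reports the achieved replication factor, so the client itself can decide to re-send. A hedged illustration (host and collection name are placeholders):

```
POST http://localhost:8983/solr/mycollection/update?min_rf=2

The responseHeader of the reply then carries an "rf" value; if
rf < min_rf the client knows the update is under-replicated and
can retry once the replicas have recovered.
```

Note that, as discussed above, Solr does not fail the update when min_rf is not met; it only reports the achieved factor.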
Bioinformatics search event in Cambridge UK Feb 3rd & 4th 2016
Hi all, We're helping to run an event in Cambridge UK next year which will be an open workshop on search for bioinformatics: http://www.ebi.ac.uk/pdbe/about/events/open-source-search-bioinformatics Do please spread the word to anyone working with biological data and open source search! It's linked to our project BioSolr which is developing Solr features for bioinformaticians such as ontology indexers, JOINs with external data and faceting improvements (although we're hoping they're also of general use). Cheers Charlie -- Charlie Hull Flax - Open Source Enterprise Search tel/fax: +44 (0)8700 118334 mobile: +44 (0)7767 825828 web: www.flax.co.uk
Re: Run Solr 5.3.0 as a Service on Windows using NSSM
Did you add the -f param for running it in the foreground? I noticed that the Solr service was restarted indefinitely when running it as a background service. It's also needed to stop the Windows service. This test worked well here (on Windows 2012): REM Test for running solr 5.3.1 as a windows service C:\nssm\nssm64.exe install "Solr 5.3.1" C:\search\solr-5.3.1\bin\solr.cmd "start -f -p 8983" On 8 October 2015 at 04:34, Zheng Lin Edwin Yeo wrote: > Hi Adrian and Upayavira, > > It works fine when I start Solr outside NSSM. > As for the NSSM, so far I haven't tried the automatic startup yet. I start > the services for ZooKeeper and Solr in NSSM manually from the Windows > Component Services, so the ZooKeeper will have been started before I start > Solr. > > I'll also try to write the script for Solr that can check it can access > Zookeeper before attempting to start Solr. > > Regards, > Edwin > > > On 7 October 2015 at 19:16, Upayavira wrote: > > > Wrap your script that starts Solr with one that checks it can access > > Zookeeper before attempting to start Solr, that way, once ZK starts, > > Solr will come up. Then, hand *that* script to NSSM. > > > > And finally, when one of you has got a setup that works with NSSM > > starting Solr via the default bin\solr.cmd script, create a patch and > > upload it to JIRA. It would be a valuable thing for Solr to have a > > *standard* way to start Solr on Windows as a service. I recall checking > > the NSSM license and it wouldn't be an issue to include it within Solr - > > or to have a script that assumes it is installed. > > > > Upayavira > > > > On Wed, Oct 7, 2015, at 11:49 AM, Adrian Liew wrote: > > > Hi Edwin, > > > > > > You may want to try explore some of the configuration properties to > > > configure in zookeeper. 
> > > > > > > > > http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkMulitServerSetup > > > > > > My recommendation is to try run your batch files outside of NSSM so it > is > > > easier to debug and observe what you see from the command window. I > don't > > > think ZK and Solr can be automated on startup well using NSSM due to > the > > > fact that ZK services need to be running before you start up Solr > > > services. I just had conversation with Shawn on this topic. NSSM cannot > > > do the magic startup in a cluster setup. In that, you may need to write > > > custom scripting to get it right. > > > > > > Back to your original issue, I guess it is worth exploring timeout > > > values. Then again, I will leave the real Solr experts to chip in their > > > thoughts. > > > > > > Best regards, > > > > > > Adrian Liew > > > > > > > > > -Original Message- > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > > > Sent: Wednesday, October 7, 2015 1:40 PM > > > To: solr-user@lucene.apache.org > > > Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM > > > > > > Hi Adrian, > > > > > > I've waited for more than 5 minutes and most of the time when I refresh > > > it says that the page cannot be found. Got one or twice the main Admin > > > page is loaded, but none of the cores are loaded. > > > > > > I have 20 cores which I'm loading. The core are of various sizes, but > the > > > maximum one is 38GB. Others ranges from 10GB to 15GB, and there're some > > > which are less than 1GB. > > > > > > My overall core size is about 200GB. > > > > > > Regards, > > > Edwin > > > > > > > > > On 7 October 2015 at 12:11, Adrian Liew > wrote: > > > > > > > Hi Edwin, > > > > > > > > I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up Solr > > > > with a base standalone installation. > > > > > > > > You may have to give Solr some time to bootstrap things and wait for > > > > the page to reload. 
Are you still seeing the page after 1 minute or > so? > > > > > > > > What are your core sizes? And how many cores are you trying to load? > > > > > > > > Best regards, > > > > Adrian > > > > > > > > -Original Message- > > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > > > > Sent: Wednesday, October 7, 2015 11:46 AM > > > > To: solr-user@lucene.apache.org > > > > Subject: Run Solr 5.3.0 as a Service on Windows using NSSM > > > > > > > > Hi, > > > > > > > > I tried to follow this to start my Solr as a service using NSSM. > > > > http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/ > > > > > > > > Everything is fine when I start the services under Component > Services. > > > > However, when I tried to point to the Solr Admin page, it says that > > > > the page cannot be found. > > > > > > > > I have tried the same thing in Solr 5.1, and it was able to work. Not > > > > sure why it couldn't work for Solr 5.2 and Solr 5.3. > > > > > > > > Is there any changes required to what is listed on the website? > > > > > > > > Regards, > > > > Edwin > > > > > > > -- Kind Regards / Med vänlig hälsning *Anders Thulin* Founder, CTO
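Upayavira's suggestion of wrapping the Solr start script in a check that ZooKeeper is reachable can be sketched like this (an illustration only; the host, port, and solr.cmd path are assumptions to adapt to your setup):

```python
import socket
import time

def wait_for_port(host, port, timeout=30.0, interval=1.0):
    """Poll a TCP port (e.g. ZooKeeper's 2181) until it accepts
    connections; return True on success, False once the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False

# Hypothetical usage in the wrapper script handed to NSSM:
# if wait_for_port("localhost", 2181):
#     subprocess.call([r"C:\search\solr-5.3.1\bin\solr.cmd",
#                      "start", "-f", "-p", "8983"])
```

The same idea works in a batch or PowerShell wrapper; the point is only that NSSM should launch the check-then-start script rather than solr.cmd directly.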
Re: slow queries
Consider: 1. Turning on docValues for the fields you are sorting or faceting on. This will require reindexing your data. 2. Using a TrieInt-type field for the field you do range searches on (you may have to fiddle with precisionStep to balance index size vs. performance). 3. If the slowness is intermittent, turn on GC logging, look for any long pauses, and tune your GC strategy accordingly. -- Pushkar Raste On Wed, Oct 14, 2015 at 5:03 AM, Lorenzo Fundaró < lorenzo.fund...@dawandamail.com> wrote: > Hello, > > I have following conf for filters and commits : > > Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57, > acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8, > regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd) > > > >${solr.autoCommit.maxTime:15000} >false > > > > >${solr.autoSoftCommit.maxTime:60} > > > and the following stats for filters: > > lookups = 3602 > hits = 3148 > hit ratio = 0.87 > inserts = 455 > evictions = 400 > size = 63 > warmupTime = 770 > > *Problem: *a lot of slow queries, for example: > > {q=*:*=1.0=edismax=standard > =map==pk_i,score=0=view_counter_i > desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50 > cache=true}in_languages_t:de={!cost=99 > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND > (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378 > > I could increase the size of the filter so I would decrease the amount of > evictions, but it seems to me this would not be solving the root problem. > > Some ideas on where/how to start for optimisation ? Is it actually normal > that this query takes this time ? > > We have an index of ~14 million docs. 4 replicas with two cores and 1 shard > each. > > thank you. 
> > > -- > > -- > Lorenzo Fundaro > Backend Engineer > E-Mail: lorenzo.fund...@dawandamail.com > > Fax + 49 - (0)30 - 25 76 08 52 > Tel+ 49 - (0)179 - 51 10 982 > > DaWanda GmbH > Windscheidstraße 18 > 10627 Berlin > > Geschäftsführer: Claudia Helming, Michael Pütz > Amtsgericht Charlottenburg HRB 104695 B >
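Points 1 and 2 above might look like this in schema.xml (a sketch; the field names mirror the query in this thread, and precisionStep="8" is a starting point to benchmark, not a recommendation):

```xml
<!-- smaller precisionStep = more index terms but faster range queries -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           positionIncrementGap="0"/>

<!-- range-filtered field from the slow query -->
<field name="cents_ri" type="tint" indexed="true" stored="false"/>

<!-- sort field: docValues keeps sort values off the heap;
     changing this requires a full reindex -->
<field name="view_counter_i" type="tint" indexed="true" stored="false"
       docValues="true"/>
```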
Can I use tokenizer twice ?
I have Solr 4.2. I need to do the following: 1. whitespace tokenize 2. create shingles 3. use EdgeNGramFilter for each word in the shingles, but not on a shingle as a whole string. So can I do this?
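The intended output (shingles first, then edge n-grams of each word inside a shingle rather than of the shingle string) can be sketched outside Solr like this; it is an illustration of the goal, not a Solr analyzer chain, and the gram sizes are assumptions:

```python
def shingles(tokens, size=2):
    """Adjacent word groups, as a shingle filter would emit for size 2."""
    return [" ".join(tokens[i:i + size])
            for i in range(len(tokens) - size + 1)]

def edge_ngrams(word, min_gram=2):
    """Leading-edge n-grams of a single word, EdgeNGramFilter style."""
    return [word[:n] for n in range(min_gram, len(word) + 1)]

tokens = "quick brown fox".split()     # step 1: whitespace tokenize
for sh in shingles(tokens):            # step 2: shingles
    # step 3: n-gram each word inside the shingle, not the shingle itself
    grams = [g for w in sh.split() for g in edge_ngrams(w)]
    print(sh, "->", grams)
```

In a single Solr analyzer chain you get exactly one tokenizer, so this usually means filters only: ShingleFilterFactory after the whitespace tokenizer, and the per-word n-gramming done in a separate field or a custom filter, because EdgeNGramFilterFactory placed after shingling would n-gram the whole shingle string (the behavior the question wants to avoid).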
Re: slow queries
Hi Lorenzo, Can you provide which solr version you are using, index size on disks & hardware config (memory/processor on each machine. Thanks, Susheel On Wed, Oct 14, 2015 at 6:03 AM, Lorenzo Fundaró < lorenzo.fund...@dawandamail.com> wrote: > Hello, > > I have following conf for filters and commits : > > Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57, > acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8, > regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd) > > > >${solr.autoCommit.maxTime:15000} >false > > > > >${solr.autoSoftCommit.maxTime:60} > > > and the following stats for filters: > > lookups = 3602 > hits = 3148 > hit ratio = 0.87 > inserts = 455 > evictions = 400 > size = 63 > warmupTime = 770 > > *Problem: *a lot of slow queries, for example: > > {q=*:*=1.0=edismax=standard > =map==pk_i,score=0=view_counter_i > desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50 > cache=true}in_languages_t:de={!cost=99 > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND > (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378 > > I could increase the size of the filter so I would decrease the amount of > evictions, but it seems to me this would not be solving the root problem. > > Some ideas on where/how to start for optimisation ? Is it actually normal > that this query takes this time ? > > We have an index of ~14 million docs. 4 replicas with two cores and 1 shard > each. > > thank you. > > > -- > > -- > Lorenzo Fundaro > Backend Engineer > E-Mail: lorenzo.fund...@dawandamail.com > > Fax + 49 - (0)30 - 25 76 08 52 > Tel+ 49 - (0)179 - 51 10 982 > > DaWanda GmbH > Windscheidstraße 18 > 10627 Berlin > > Geschäftsführer: Claudia Helming, Michael Pütz > Amtsgericht Charlottenburg HRB 104695 B >
RE: How to formulate query
Hi Susheel, Mikhail, Erick, Thanks for the replies. I need to learn more. Regards, Prasanna. -Original Message- From: Susheel Kumar [mailto:susheel2...@gmail.com] Sent: Tuesday, October 13, 2015 12:54 AM To: solr-user@lucene.apache.org Subject: Re: How to formulate query Hi Prasanna, This is a highly custom relevancy/ordering requirement. One possible way you can try is creating multiple fields, coming up with a query for each of the searches, and boosting them accordingly. Thnx On Mon, Oct 12, 2015 at 12:50 PM, Erick Erickson wrote: > Nothing exists currently that would do this. I would urge you to > revisit the requirements, this kind of super-specific ordering is > often not worth the effort to try to enforce, how does the _user_ > benefit here? > > Best, > Erick > > On Mon, Oct 12, 2015 at 12:47 AM, Prasanna S. Dhakephalkar > wrote: > > Hi, > > > > I am trying to make a Solr search query to get results as under, but am > > unable to do so. > > > > I have a search term say "pit" > > > > The result should have (in that order) > > > > All docs that have "pit" as first WORD in search field (pit\ *)+ > > > > All docs that have first WORD that starts with "pit" (pit*\ *)+ > > > > All docs that have "pit" as WORD anywhere in search field (except > > first) (*\ pit\ *)+ > > > > All docs that have a WORD starting with "pit" anywhere in search > > field (except first) (*\ pit*\ *)+ > > > > All docs that have "pit" as string anywhere in the search field > > except cases > > covered above (*pit*) > > > > Example : > > > > Pit the pat > > > > Pit digger > > > > Pitch ball > > > > Pitcher man > > > > Dig a pit with shovel > > > > Why do you want to dig a pit with shovel > > > > Cricket pitch is 22 yards > > > > What is pithy, I don't know > > > > Per capita income > > > > Epitome of blah blah > > > > How can I achieve this ? > > > > Regards, > > > > Prasanna. > >
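Susheel's suggestion amounts to bucketing matches into tiers and boosting each tier. Purely as an illustration of the target ordering (outside Solr; in Solr each tier would become its own field, e.g. an exact first-word copy and an edge-n-grammed copy, queried with descending boosts):

```python
def tier(text, term):
    """Lower tier = earlier in the desired ordering for `term`."""
    words, t = text.lower().split(), term.lower()
    if words and words[0] == t:                  return 0  # first word is term
    if words and words[0].startswith(t):         return 1  # first word starts with term
    if t in words[1:]:                           return 2  # term is a later word
    if any(w.startswith(t) for w in words[1:]):  return 3  # later word starts with term
    if t in text.lower():                        return 4  # substring anywhere
    return 5                                               # no match

docs = ["Pitch ball", "Per capita income", "Pit digger",
        "Dig a pit with shovel", "Cricket pitch is 22 yards"]
for d in sorted(docs, key=lambda d: tier(d, "pit")):
    print(tier(d, "pit"), d)
```

Erick's caution still applies: five boost-separated fields is a lot of index machinery for an ordering users may not notice.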
Re: are there any SolrCloud supervisors?
I’m aware of two public administration tools: This was announced to the list just recently: https://github.com/bloomreach/solrcloud-haft And I’ve been working in this: https://github.com/whitepages/solrcloud_manager Both of these hook the Solrcloud client’s ZK access to inspect the cluster state and execute more complex cluster-aware operations. I was also a bit amused, because it looks like we both independently arrived at the same replication-handler-based copy-collection operation. (Which suggests to me that the functionality should be pushed into the collections API.) Neither of these is a supervisor though, they merely provide a way to execute cluster aware commands. Another monitor-oriented mechanism would be needed to detect when to perform those commands, and I’ve not seen anything existing along those lines. On 10/13/15, 5:35 AM, "Susheel Kumar"wrote: >Sounds interesting... > >On Tue, Oct 13, 2015 at 12:58 AM, Trey Grainger >wrote: > >> I'd be very interested in taking a look if you post the code. >> >> Trey Grainger >> Co-Author, Solr in Action >> Director of Engineering, Search & Recommendations @ CareerBuilder >> >> On Fri, Oct 2, 2015 at 3:09 PM, r b wrote: >> >> > I've been working on something that just monitors ZooKeeper to add and >> > remove nodes from collections. the use case being I put SolrCloud in >> > an autoscaling group on EC2 and as instances go up and down, I need >> > them added to the collection. It's something I've built for work and >> > could clean up to share on GitHub if there is much interest. >> > >> > I asked in the IRC about a SolrCloud supervisor utility but wanted to >> > extend that question to this list. are there any more "full featured" >> > supervisors out there? >> > >> > >> > -renning >> > >>
partial search EdgeNGramFilterFactory
I have the following fieldtype in my schema: and the following field: With the following data: SellerName: CARDINAL HEALTH. When I do the following search q:SellerName:cardinal I get back the results with SellerName: CARDINAL HEALTH (correct). Or when I do the search q:SellerName:cardinal he I get back the results with SellerName: CARDINAL HEALTH (correct). But when I do the search q:SellerName:cardinal hea I get the results back with SellerName: INTEGRA RADIONICS. Why is that? I need it to continue to return the correct results with CARDINAL HEALTH. How do I make that happen? Thanks in advance,
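One common cause of this (assuming EdgeNGramFilterFactory is in the analyzer chain, per the subject line) is running the n-gram filter at query time as well as index time. With it only at index time, the literal query token "hea" matches because indexing "HEALTH" produced that gram; with it also at query time, "hea" expands down to 1- and 2-character grams that match many unrelated documents. A small sketch of the idea (not a Solr analyzer; the min/max gram sizes are assumptions):

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Roughly what EdgeNGramFilterFactory emits for one token."""
    return [token[:n] for n in range(min_gram, min(len(token), max_gram) + 1)]

# Index-time analysis of the stored value:
index_terms = set()
for word in "cardinal health".lower().split():
    index_terms.update(edge_ngrams(word))

print("hea" in index_terms)   # the prefix is already covered at index time
print(edge_ngrams("hea"))     # query-time grams add over-broad "h", "he"
```

The usual fix is a fieldType whose index analyzer includes the EdgeNGram filter but whose query analyzer does not (or a larger minGramSize), so the query token stays "hea" and only true prefixes of HEALTH match.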
Re: slow queries
You may want to start solr with following settings to enable logging GC details. Here are some flags you might want to enable. -Xloggc:/gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC Once you have GC logs, look for string "Total time for which application threads were stopped" to check if you have long pauses (you may get long pauses even with young generation GC). -- Pushkar Raste On Wed, Oct 14, 2015 at 11:47 AM, Lorenzo Fundaró < lorenzo.fund...@dawandamail.com> wrote: > < =true to the query?>> > > "debug": { "rawquerystring": "*:*", "querystring": "*:*", "parsedquery": > "(+MatchAllDocsQuery(*:*))/no_coord", "parsedquery_toString": "+*:*", " > explain": { "Product:47047358": "\n1.0 = (MATCH) MatchAllDocsQuery, product > of:\n 1.0 = queryNorm\n", "Product:3223": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:30852121": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:35018929": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:31682082": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", "Product:31077677": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:22298365": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:41094514": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:13106166": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", "Product:19142249": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38243373": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:20434065": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:25194801": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", 
"Product:885482": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:45356790": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:67719831": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:12843394": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", "Product:38126213": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38798130": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:30292169": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:11535854": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", "Product:8443674": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:51012182": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:75780871": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:20227881": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", "Product:38093629": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3142218": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:15295602": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:3375982": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", "Product:38276777": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:10726118": > "\n1.0 > = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:50827742": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n", "Product:5771722": "\n1.0 = (MATCH) MatchAllDocsQuery, > product of:\n 1.0 = queryNorm\n", "Product:3245678": "\n1.0 = (MATCH) > MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:13702130": > "\n1.0 > = (MATCH) 
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " > Product:25679953": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = > queryNorm\n" }, "QParser": "ExtendedDismaxQParser", "altquerystring": null, > "boost_queries": null, "parsed_boost_queries": [], "boostfuncs": null, " > filter_queries": [ "{!cost=1 cache=true}type_s:Product AND > is_valid_b:true", > "{!cost=50 cache=true}in_languages_t:de", "{!cost=99 > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND > (cents_ri: [* TO 3000])" ], "parsed_filter_queries": [ "+type_s:Product > +is_valid_b:true", "in_languages_t:de", "{!cache=false > cost=99}+(shipping_country_codes_mt:de shipping_country_codes_mt:euro > shipping_country_codes_mt:eur shipping_country_codes_mt:all) +cents_ri:[* > TO 3000]" ], "timing": { "time": 18, "prepare": { "time": 0, "query": { " > time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { " > time": 0 }, "stats": { "time": 0 }, "expand": { "time": 0 }, "spellcheck": > { >
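Once the GC flags above are in place, scanning the log for long stop-the-world pauses can be as simple as the following sketch (the log lines are abridged samples of -XX:+PrintGCApplicationStoppedTime output; real lines carry timestamp prefixes):

```python
import re

# Three sample stopped-time lines as the JVM would emit them.
log = """\
Total time for which application threads were stopped: 0.0012345 seconds
Total time for which application threads were stopped: 1.2345678 seconds
Total time for which application threads were stopped: 0.0200000 seconds
"""

PAUSE = re.compile(r"stopped: ([0-9.]+) seconds")
pauses = [float(m.group(1)) for m in PAUSE.finditer(log)]
long_pauses = [p for p in pauses if p > 1.0]
print(f"{len(pauses)} pauses, longest {max(pauses):.3f}s, "
      f"over 1s: {len(long_pauses)}")
```

If the longest pauses line up with the slow queries, GC tuning is the place to look; if not, the slowness is elsewhere.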
Re: slow queries
On 14 October 2015 at 18:18, Pushkar Rastewrote: > Consider > 1. Turning on docValues for fields you are sorting, faceting on. This will > require to reindex your data > Yes. I am considering doing this. > 2. Try using TrieInt type field you are trying to do range search on (you > may have to fiddle with precisoinStep) to balance index size vs > performance. > Ok. > 3. If slowness is intermittent - turn on GC logging and see if there are > any long and tune GC strategy accordingly. > The Gc strategy is the default that comes when starting solr with bin/solr start script. And I was looking at the GC logs, and saw no Full GC at all. Thank you ! > > -- Pushkar Raste > > On Wed, Oct 14, 2015 at 5:03 AM, Lorenzo Fundaró < > lorenzo.fund...@dawandamail.com> wrote: > > > Hello, > > > > I have following conf for filters and commits : > > > > Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57, > > acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8, > > regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd) > > > > > > > >${solr.autoCommit.maxTime:15000} > >false > > > > > > > > > >${solr.autoSoftCommit.maxTime:60} > > > > > > and the following stats for filters: > > > > lookups = 3602 > > hits = 3148 > > hit ratio = 0.87 > > inserts = 455 > > evictions = 400 > > size = 63 > > warmupTime = 770 > > > > *Problem: *a lot of slow queries, for example: > > > > {q=*:*=1.0=edismax=standard > > =map==pk_i,score=0=view_counter_i > > desc={!cost=1 cache=true}type_s:Product AND > is_valid_b:true={!cost=50 > > cache=true}in_languages_t:de={!cost=99 > > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND > > (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378 > > > > I could increase the size of the filter so I would decrease the amount of > > evictions, but it seems to me this would not be solving the root problem. > > > > Some ideas on where/how to start for optimisation ? 
Is it actually normal > > that this query takes this time ? > > > > We have an index of ~14 million docs. 4 replicas with two cores and 1 > shard > > each. > > > > thank you. > > > > > > -- > > > > -- > > Lorenzo Fundaro > > Backend Engineer > > E-Mail: lorenzo.fund...@dawandamail.com > > > > Fax + 49 - (0)30 - 25 76 08 52 > > Tel+ 49 - (0)179 - 51 10 982 > > > > DaWanda GmbH > > Windscheidstraße 18 > > 10627 Berlin > > > > Geschäftsführer: Claudia Helming, Michael Pütz > > Amtsgericht Charlottenburg HRB 104695 B > > > -- -- Lorenzo Fundaro Backend Engineer E-Mail: lorenzo.fund...@dawandamail.com Fax + 49 - (0)30 - 25 76 08 52 Tel+ 49 - (0)179 - 51 10 982 DaWanda GmbH Windscheidstraße 18 10627 Berlin Geschäftsführer: Claudia Helming, Michael Pütz Amtsgericht Charlottenburg HRB 104695 B
Re: slow queries
<> "debug": { "rawquerystring": "*:*", "querystring": "*:*", "parsedquery": "(+MatchAllDocsQuery(*:*))/no_coord", "parsedquery_toString": "+*:*", " explain": { "Product:47047358": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3223": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:30852121": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:35018929": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:31682082": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:31077677": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:22298365": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:41094514": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:13106166": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:19142249": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38243373": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:20434065": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:25194801": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:885482": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:45356790": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:67719831": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:12843394": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38126213": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38798130": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:30292169": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:11535854": "\n1.0 = (MATCH) 
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:8443674": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:51012182": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:75780871": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:20227881": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38093629": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3142218": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:15295602": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3375982": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38276777": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:10726118": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:50827742": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:5771722": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3245678": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:13702130": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", " Product:25679953": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n" }, "QParser": "ExtendedDismaxQParser", "altquerystring": null, "boost_queries": null, "parsed_boost_queries": [], "boostfuncs": null, " filter_queries": [ "{!cost=1 cache=true}type_s:Product AND is_valid_b:true", "{!cost=50 cache=true}in_languages_t:de", "{!cost=99 cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND (cents_ri: [* TO 3000])" ], "parsed_filter_queries": [ "+type_s:Product +is_valid_b:true", "in_languages_t:de", "{!cache=false cost=99}+(shipping_country_codes_mt:de shipping_country_codes_mt:euro shipping_country_codes_mt:eur shipping_country_codes_mt:all) 
+cents_ri:[* TO 3000]" ], "timing": { "time": 18, "prepare": { "time": 0, "query": { " time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { " time": 0 }, "stats": { "time": 0 }, "expand": { "time": 0 }, "spellcheck": { "time": 0 }, "debug": { "time": 0 } }, "process": { "time": 18, "query": { " time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { " time": 0 }, "stats": { "time": 0 }, "expand": { "time": 0 }, "spellcheck": { "time": 0 }, "debug": { "time": 18 } } I think this clause is wrong: (cents_ri: [* 3000]) I think you mean (cents_ri: [* TO 3000]) I think I made no difference. I tried both and they both worked. But are these slow queries constant or intermittent? They are definetly cached. The second time runs in no time. I gonna try adding them in the pre warmcache too. And see the results. The field that I used for sorting is indexed but not stored and it's not a DocValue. I tried the query without the sort and the performance didnt
Re: slow queries
bq: They are definitely cached. The second time they run in no time. That's not what I was referring to. Submitting the same query over and over will certainly hit the queryResultCache and return in almost no time. What I meant was to do things like vary the fq clause you have where you've set cache=false, or vary the parameters in the fq clauses. The point is to only take measurements after enough queries have gone through that you're sure the low-level caches are initialized. But the queries all have to be different or you hit the queryResultCache. Best, Erick On Wed, Oct 14, 2015 at 9:50 AM, Lorenzo Fundaró wrote: > On 14 October 2015 at 18:18, Pushkar Raste wrote: > >> Consider >> 1. Turning on docValues for fields you are sorting or faceting on. This will >> require you to reindex your data. >> > > Yes. I am considering doing this. > > >> 2. Try using a TrieInt type for the field you are doing range searches on (you >> may have to fiddle with precisionStep) to balance index size vs >> performance. >> > > Ok. > > >> 3. If slowness is intermittent - turn on GC logging, look for long pauses, and tune your GC strategy accordingly. >> > > The GC strategy is the default that comes when starting Solr with the bin/solr > start script. And I was looking at the GC logs, and saw no Full GC at all. > > Thank you !
> > > >> >> -- Pushkar Raste >> >> On Wed, Oct 14, 2015 at 5:03 AM, Lorenzo Fundaró < >> lorenzo.fund...@dawandamail.com> wrote: >> >> > Hello, >> > >> > I have following conf for filters and commits : >> > >> > Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57, >> > acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8, >> > regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd) >> > >> > >> > >> >${solr.autoCommit.maxTime:15000} >> >false >> > >> > >> > >> > >> >${solr.autoSoftCommit.maxTime:60} >> > >> > >> > and the following stats for filters: >> > >> > lookups = 3602 >> > hits = 3148 >> > hit ratio = 0.87 >> > inserts = 455 >> > evictions = 400 >> > size = 63 >> > warmupTime = 770 >> > >> > *Problem: *a lot of slow queries, for example: >> > >> > {q=*:*=1.0=edismax=standard >> > =map==pk_i,score=0=view_counter_i >> > desc={!cost=1 cache=true}type_s:Product AND >> is_valid_b:true={!cost=50 >> > cache=true}in_languages_t:de={!cost=99 >> > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND >> > (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378 >> > >> > I could increase the size of the filter so I would decrease the amount of >> > evictions, but it seems to me this would not be solving the root problem. >> > >> > Some ideas on where/how to start for optimisation ? Is it actually normal >> > that this query takes this time ? >> > >> > We have an index of ~14 million docs. 4 replicas with two cores and 1 >> shard >> > each. >> > >> > thank you. 
>> > -- >> > Lorenzo Fundaro >> > Backend Engineer >> > E-Mail: lorenzo.fund...@dawandamail.com >> > >> > Fax + 49 - (0)30 - 25 76 08 52 >> > Tel + 49 - (0)179 - 51 10 982 >> > >> > DaWanda GmbH >> > Windscheidstraße 18 >> > 10627 Berlin >> > >> > Geschäftsführer: Claudia Helming, Michael Pütz >> > Amtsgericht Charlottenburg HRB 104695 B
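The warming Erick alludes to can also be made permanent in solrconfig.xml with a newSearcher event listener, so the sort values are populated before real traffic hits a new searcher. A hedged sketch, with the query and field name taken from this thread (adjust to whatever slow query you actually want pre-warmed):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- warms the sort structures for view_counter_i on every new searcher -->
    <lst>
      <str name="q">*:*</str>
      <str name="sort">view_counter_i desc</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>
```

Note that warming queries add work on every commit that opens a searcher, so keep the list short.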
Re: partial search EdgeNGramFilterFactory
try adding =true to your query. The query q=SellerName:cardinal he actually parses as q=SellerName:cardinal defaultSearchField:he so I suspect you're getting hits on the default search field. I'm not sure EdgeNGram is what you want here though. That only grams individual tokens, so CARDINAL is grammed totally separately from HEALTH. You might consider a different tokenizer, say KeywordTokenizer and LowerCaseFilter followed by edgeNGram, to treat the whole thing as a unit. You'd have to take some care to make sure you escaped spaces to get the whole thing through the query parser though. Best, Erick On Wed, Oct 14, 2015 at 11:03 AM, Brian Narsi wrote: > I have the following fieldtype in my schema: > > positionIncrementGap="100"> > > > > maxGramSize="25"/> > > > > > > > > and the following field: > required="true" multiValued="false" /> > > With the following data: > SellerName:CARDINAL HEALTH > > When I do the following search > > q:SellerName:cardinal > > I get back the results with SellerName: CARDINAL HEALTH (correct) > > or when I do the search > > q:SellerName:cardinal he > > I get back the results with SellerName: CARDINAL HEALTH (correct) > > But when I do the search > > q:SellerName:cardinal hea > > I am getting the results back with SellerName:INTEGRA RADIONICS > > Why is that? > > I need it to continue to return the correct results with CARDINAL HEALTH. > How do I make that happen? > > Thanks in advance,
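The gramming difference Erick describes can be illustrated outside Solr. This is a rough Python sketch, not Solr's actual implementation; the min/max gram sizes are assumptions. It shows why per-token grams (WhitespaceTokenizer + EdgeNGramFilter) match "hea" but never the phrase prefix "cardinal hea", while whole-value grams (KeywordTokenizer-style) do:

```python
def edge_ngrams(token, min_g=1, max_g=25):
    """All leading substrings of a token, like an edge n-gram filter."""
    return [token[:n] for n in range(min_g, min(max_g, len(token)) + 1)]

text = "CARDINAL HEALTH".lower()

# Per-token gramming: each whitespace token is grammed separately.
per_token = [g for tok in text.split() for g in edge_ngrams(tok)]

# Whole-value gramming: the entire field value is one token, then grammed.
whole = edge_ngrams(text)

print("hea" in per_token)           # True - matches via HEALTH alone
print("cardinal hea" in per_token)  # False - no gram spans the space
print("cardinal hea" in whole)      # True - whole-field grams keep the phrase
```

This is why "cardinal hea" drifts to unrelated sellers: "cardinal" and "hea" match independently rather than as one prefix.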
Re: Replication and soft commits for NRT searches
Hello, thank you for the detailed answer. If a timeout between shard leader and replica can lead to a smaller rf value (because replication has timed out), is it possible to increase this timeout in the configuration? Best Regards, Martin Mois Comments inline: On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO) wrote: > Hello, > > I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been > created with replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am using autoCommit/maxDocs=1 and autoSoftCommits/maxDocs=1 in order to achieve near-real-time search behavior. > > As far as I understand from section "Write Side Fault Tolerance" in the > documentation (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance), I cannot enforce that an update gets replicated to all replicas, but I can only get the achieved replication factor by requesting the return value rf. > > My question is now, what exactly does rf=2 mean? Does it only mean that the > replica has written the update to its transaction log? Or has the replica also performed the soft commit as configured with autoSoftCommits/maxDocs=1? The answer is important for me: if the update were only written to the transaction log, I could not search for it reliably, as the replica may not have added it to the searchable index. rf=2 means that the update was successfully replicated to and acknowledged by two replicas (including the leader). The rf only deals with the durability of the update and has no relation to visibility of the update to searchers. The auto(soft)commit settings are applied asynchronously and do not block an update request. > > My second question is, does rf=1 mean that the update was definitely not > successful on the replica, or could it also represent a timeout of the replication request from the shard leader?
If it could also represent a timeout, then there would be a small chance that the replication was successful despite the timeout. Well, rf=1 implies that the update was only applied on the leader's index + tlog, and either the replicas weren't available, or they returned an error, or the request timed out. So yes, you are right that it can represent a timeout, and as such there is a chance that the replication was indeed successful despite the timeout. > > Is there a way to retrieve the replication factor for a specific document > after the update in order to check if replication was successful in the meantime? > No, there is no way to do that. > Thanks in advance. > > Best Regards, > Martin Mois > # > " This e-mail and any attached documents may contain confidential or > proprietary information. If you are not the intended recipient, you are notified that any dissemination, copying of this e-mail and any attachments thereto or use of their contents by any means whatsoever is strictly prohibited. If you have received this e-mail in error, please advise the sender immediately and delete this e-mail and all attached documents from your computer system." > # -- Regards, Shalin Shekhar Mangar.
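For context, the rf value being discussed is requested per update request via the min_rf parameter described on the "Write Side Fault Tolerance" page cited above. A hedged sketch of the request shape (host, collection, and document fields are placeholders):

```text
POST http://localhost:8983/solr/mycollection/update?min_rf=2
Content-Type: application/json

[{"id": "doc1", "title_t": "example"}]
```

Sending min_rf asks Solr to report the replication factor it actually achieved for the update; the response then carries the achieved rf, which the client can compare against its requirement and retry if it is too low. As Shalin notes, this is a durability signal only, not a visibility guarantee.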
Re: Request for Wiki edit right
Thank you very much Erick. Arcadius. On 13 October 2015 at 22:04, Erick Erickson wrote: > Just added you to the Solr Wiki contributors group, if you need to > access the Lucene Wiki let us know. > > Best, > Erick > > On Tue, Oct 13, 2015 at 1:57 PM, Arcadius Ahouansou > wrote: > > Hello Erick. > > Thank you for the detailed info. > > My username is arcadius. > > > > Thanks. > > > > > > On 13 October 2015 at 16:58, Erick Erickson > wrote: > > > >> Create a user on the Wiki (anyone can), then tell us the user name > >> you've created and we'll add you to the auth lists. There are separate > >> lists for Solr and Lucene. We had to lock these down because we were > >> getting a lot of spam pages created. > >> > >> The reference guide (CWiki) is restricted to committers though. > >> > >> Best, > >> Erick > >> > >> On Tue, Oct 13, 2015 at 6:30 AM, Arcadius Ahouansou > >> wrote: > >> > Hello. > >> > > >> > Please, can I have the right to edit the Wiki? > >> > > >> > Thanks. > >> > > >> > Arcadius. > >> > > > > > > > > -- > > Arcadius Ahouansou > > Menelic Ltd | Information is Power > > M: 07908761999 > > W: www.menelic.com > > --- > -- Arcadius Ahouansou Menelic Ltd | Information is Power M: 07908761999 W: www.menelic.com ---
slow queries
Hello, I have the following conf for filters and commits:

Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57, acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8, regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
</autoSoftCommit>

and the following stats for filters:

lookups = 3602
hits = 3148
hit ratio = 0.87
inserts = 455
evictions = 400
size = 63
warmupTime = 770

*Problem: *a lot of slow queries, for example:

{q=*:*=1.0=edismax=standard=map==pk_i,score=0=view_counter_i desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50 cache=true}in_languages_t:de={!cost=99 cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378

I could increase the size of the filter cache and thereby decrease the amount of evictions, but it seems to me this would not be solving the root problem. Any ideas on where/how to start optimising? Is it actually normal that this query takes this long? We have an index of ~14 million docs. 4 replicas with two cores and 1 shard each. Thank you. -- -- Lorenzo Fundaro Backend Engineer E-Mail: lorenzo.fund...@dawandamail.com Fax + 49 - (0)30 - 25 76 08 52 Tel + 49 - (0)179 - 51 10 982 DaWanda GmbH Windscheidstraße 18 10627 Berlin Geschäftsführer: Claudia Helming, Michael Pütz Amtsgericht Charlottenburg HRB 104695 B
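For reference, the filter cache in question is configured in solrconfig.xml. A hedged sketch of a larger LFU filter cache (the numbers are illustrative, not a recommendation; as noted elsewhere in the thread, a 0.87 hit ratio is already reasonable, so sizing up is only a marginal win):

```xml
<filterCache class="solr.LFUCache"
             size="512"
             initialSize="512"
             autowarmCount="32"
             timeDecay="true"/>
```

Each cached filter is roughly one bit per document in the index, so with ~14M docs every entry costs on the order of 1.75 MB; that is worth keeping in mind before setting the size very high.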
Re: Using SimpleNaiveBayesClassifier in solr
Thanks Ales and Tommaso for your replies. So, does the classifier query the whole index and load it into memory first, before running the tokenizer against the input document? It sounds like if I don't close the classifier and my index is big, I might need a bigger machine. Any way to reverse the order? Do I sound dumb? On 12 October 2015 at 16:11, Alessandro Benedetti < benedetti.ale...@gmail.com> wrote: > Hi Yewint, > > > > The sample test code inside seems like that classifier read the whole > index > > db to train the model everytime when classification happened for > > inputDocument. or am I misunderstanding something here? > > > I would suggest you take a look at a couple of articles I wrote last > summer about the Classification in Lucene and Solr : > > > http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html > > > http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html > > Basically your misunderstanding is that this module works as a standard > classifier, which is not the case. > Lucene Classification doesn't train a model over time; the Index is your > model. > It uses the Index data structures to perform the classification processes > (Knn and Simple Bayes are the algorithms I explored at that time). > Basically the algorithms access the Term Frequencies and Document > Frequencies stored in the Inverted index. > > Having a big Index will have an impact since, of course, we are querying the index, but > not because we are building a model. > > +1 on all Tommaso's observations! > > Cheers > > > > On 10 October 2015 at 20:36, Yewint Ko wrote: > > > Hi > > > > I am trying to use SimpleNaiveBayesClassifier in my solr project. > Currently > > looking at its test base ClassificationTestBase.java. > > > > The sample test code inside seems like that classifier read the whole > index > > db to train the model everytime when classification happened for > > inputDocument. or am I misunderstanding something here?
If i had a large > > index db, will it impact performance? > > > > protected void checkCorrectClassification(Classifier classifier, > String > > inputDoc, T expectedResult, Analyzer analyzer, String textFieldName, > String > > classFieldName, Query query) throws Exception { > > > > AtomicReader atomicReader = null; > > > > try { > > > > populateSampleIndex(analyzer); > > > > atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter > > .getReader()); > > > > classifier.train(atomicReader, textFieldName, classFieldName, > > analyzer, > > query); > > > > ClassificationResult classificationResult = > > classifier.assignClass( > > inputDoc); > > > > assertNotNull(classificationResult.getAssignedClass()); > > > > assertEquals("got an assigned class of " + > > classificationResult.getAssignedClass(), > > expectedResult, classificationResult.getAssignedClass()); > > > > assertTrue("got a not positive score " + > > classificationResult.getScore(), > > classificationResult.getScore() > 0); > > > > } finally { > > > > if (atomicReader != null) > > > > atomicReader.close(); > > > > } > > > > } > > > > > > -- > -- > > Benedetti Alessandro > Visiting card - http://about.me/alessandro_benedetti > Blog - http://alexbenedetti.blogspot.co.uk > > "Tyger, tyger burning bright > In the forests of the night, > What immortal hand or eye > Could frame thy fearful symmetry?" > > William Blake - Songs of Experience -1794 England >
Re: Grouping facets: Possible to get facet results for each Group?
Mmm, let's say that nested facets are a subset of pivot facets: while pivot faceting works with the classic flat document structure, sub-facets work with any nested structure. So be careful with pivot faceting on a flat document with multi-valued fields, because you lose the relation across the different fields' values. Cheers On 13 October 2015 at 18:06, Peter Sturge wrote: > Hi, > Thanks for your response. > I did have a look at pivots, and they could work in a way. We're still on > Solr 4.3, so I'll have to wait for sub-facets - but they sure look pretty > cool! > Peter > > > On Tue, Oct 13, 2015 at 12:30 PM, Alessandro Benedetti < > benedetti.ale...@gmail.com> wrote: > > > Can you model your business domain with Solr nested Docs ? In the case > you > > can use Yonik article about nested facets. > > > > Cheers > > > > On 13 October 2015 at 05:05, Alexandre Rafalovitch > > wrote: > > > > > Could you use the new nested facets syntax? > > > http://yonik.com/solr-subfacets/ > > > > > > Regards, > > >Alex. > > > > > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > > > http://www.solr-start.com/ > > > > > > On 11 October 2015 at 09:51, Peter Sturge > > wrote: > > > > Been trying to coerce Group faceting to give some faceting back for > > each > > > > group, but maybe this use case isn't catered for in Grouping? : > > > > > > > > So the Use Case is this: > > > > Let's say I do a grouped search that returns say, 9 distinct groups, > > and > > > in > > > > these groups are various numbers of unique field values that need > > > faceting > > > > - but the faceting needs to be within each group: > > > > > > > > > > > -- > > -- > > > > Benedetti Alessandro > > Visiting card - http://about.me/alessandro_benedetti > > Blog - http://alexbenedetti.blogspot.co.uk > > > > "Tyger, tyger burning bright > > In the forests of the night, > > What immortal hand or eye > > Could frame thy fearful symmetry?"
> > > > William Blake - Songs of Experience -1794 England > > > -- -- Benedetti Alessandro Visiting card - http://about.me/alessandro_benedetti Blog - http://alexbenedetti.blogspot.co.uk "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
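For readers on a Solr version with the JSON Facet API, the sub-facet syntax from Yonik's article gives exactly the "facets within each group" shape asked about. A hedged sketch with made-up field names (group_s as the grouping field, color_s as the field to facet within each group):

```text
json.facet={
  groups: {
    type: terms,
    field: group_s,
    facet: {
      colors: { type: terms, field: color_s }
    }
  }
}
```

Each bucket of the outer terms facet (one per group value) then carries its own inner colors facet, which is the per-group faceting that classic result grouping does not provide.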
Re: AutoComplete Feature in Solr
Using the suggester feature you can, in some cases, rank the suggestions based on an additional numeric field. That's not your use case: you actually want to use a search handler with a well-defined schema that will allow you, for example, to query on an edge n-gram token-filtered field while applying a geo-distance boost function. This is what I would use, and it would work fine with your applied filter queries as well (reducing the space of suggestions). Cheers On 14 October 2015 at 05:09, William Bell wrote: > We want to use suggester but also want to show those results closest to my > lat,long... Kinda combine suggester and bq=geodist() > > On Mon, Oct 12, 2015 at 2:24 PM, Salman Ansari > wrote: > > > Hi, > > > > I have been trying to get the autocomplete feature in Solr working with > no > > luck up to now. First I read that "suggest component" is the recommended > > way as in the below article (and this is the exact functionality I am > > looking for, which is to autocomplete multiple words) > > > > > http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/ > > > > Then I tried implementing suggest as described in the following articles > in > > this order > > 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration > > 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ (I > > implemented suggesting phrases) > > 3) > > > > > http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms > > > > With no luck, after implementing each article when I run my query as > > http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack > > > > > > > > I get > > > > > > 0 > > 0 > > > > > > > > Although I have an entry for Barack Obama in my index.
I am posting my > > Solr configuration as well > > > > > > > > suggest > > org.apache.solr.spelling.suggest.Suggester > >> name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup > > entity_autocomplete > > true > > > > > > > > > class="org.apache.solr.handler.component.SearchHandler"> > > > > true > > suggest > > 10 > > true > > false > > > > > > suggest > > > > > > > > It looks like a very simple job, but even after following so many > articles, > > I could not get it right. Any comment will be appreciated! > > > > Regards, > > Salman > > > > > > -- > Bill Bell > billnb...@gmail.com > cell 720-256-8076 > -- -- Benedetti Alessandro Visiting card - http://about.me/alessandro_benedetti Blog - http://alexbenedetti.blogspot.co.uk "Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry?" William Blake - Songs of Experience -1794 England
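The mail archiver stripped the XML tags from Salman's configuration above; only the values survived. A hedged reconstruction, with the structure assumed from the Suggester wiki page he cites and the names/classes taken from what remains of the post:

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">entity_autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest"
                class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.collate">false</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With a spellcheck-based suggester like this, an empty result for spellcheck.q=Barack often means the dictionary was never built (spellcheck.build=true, or buildOnCommit plus a commit), which is one of the first things worth checking.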
Re: catchall fields or multiple fields
Thanks for your suggestion Jack. In fact we're doing geographic search (fields are country, state, county, town, hamlet, district) So it's difficult to split. Best regards, Elisabeth 2015-10-13 16:01 GMT+02:00 Jack Krupansky: > Performing a sequence of queries can help too. For example, if users > commonly search for a product name, you could do an initial query on just > the product name field which should be much faster than searching the text > of all product descriptions, and highlighting would be less problematic. If > that initial query comes up empty, then you could move on to the next > highest most likely field, maybe product title (short one line > description), and query voluminous fields like detailed product > descriptions, specifications, and user comments/reviews only as a last > resort. > > -- Jack Krupansky > > On Tue, Oct 13, 2015 at 6:17 AM, elisabeth benoit < > elisaelisael...@gmail.com > > wrote: > > > Thanks to you all for those informed advices. > > > > Thanks Trey for your very detailed point of view. This is now very clear > to > > me how a search on multiple fields can grow slower than a search on a > > catchall field. > > > > Our actual search model is problematic: we search on a catchall field, > but > > need to know which fields match, so we do highlighting on multi fields > (not > > indexed, but stored). To improve performance, we want to get rid of > > highlighting and use the solr explain output. To get the explain output > on > > those fields, we need to do a search on those fields. > > > > So I guess we have to test if removing highlighting and adding multi > fields > > search will improve performances or not. > > > > Best regards, > > Elisabeth > > > > > > > > 2015-10-12 17:55 GMT+02:00 Jack Krupansky : > > > > > I think it may all depend on the nature of your application and how > much > > > commonality there is between fields. 
> > > > > > One interesting area is auto-suggest, where you can certainly suggest > > from > > > the union of all fields, you may want to give priority to suggestions > > from > > > preferred fields. For example, for actual product names or important > > > keywords rather than random words from the English language that happen > > to > > > occur in descriptions, all of which would occur in a catchall. > > > > > > -- Jack Krupansky > > > > > > On Mon, Oct 12, 2015 at 8:39 AM, elisabeth benoit < > > > elisaelisael...@gmail.com > > > > wrote: > > > > > > > Hello, > > > > > > > > We're using solr 4.10 and storing all data in a catchall field. It > > seems > > > to > > > > me that one good reason for using a catchall field is when using > > scoring > > > > with idf (with idf, a word might not have same score in all fields). > We > > > got > > > > rid of idf and are now considering using multiple fields. I remember > > > > reading somewhere that using a catchall field might speed up > searching > > > > time. I was wondering if some of you have any opinion (or experience) > > > > related to this subject. > > > > > > > > Best regards, > > > > Elisabeth > > > > > > > > > >
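For comparison, the catchall approach being discussed is typically a copyField setup in schema.xml. A minimal sketch using the geographic field names from the thread (the field type name is an assumption):

```xml
<field name="catchall" type="text_general" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="country"  dest="catchall"/>
<copyField source="state"    dest="catchall"/>
<copyField source="county"   dest="catchall"/>
<copyField source="town"     dest="catchall"/>
<copyField source="hamlet"   dest="catchall"/>
<copyField source="district" dest="catchall"/>
```

Searching the one catchall field means a single term lookup per query term, whereas qf across six fields multiplies the lookups; the trade-off, as Elisabeth notes, is that the catchall cannot tell you which source field matched.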
Re: Solr Pagination
I have not benchmarked various number of segments at different sizes on different HW etc, so my hunch could very well be wrong for Salman’s case. I don’t know how frequent updates there is to his data either. Have you done #segments benchmarking for your huge datasets? -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com > 12. okt. 2015 kl. 12.56 skrev Toke Eskildsen: > > On Mon, 2015-10-12 at 10:05 +0200, Jan Høydahl wrote: >> What you do when you call optimize is to force Lucene to merge all >> those 35M docs into ONE SINGLE index segment. You get better HW >> utilization if you let Lucene/Solr automatically handle merging, >> meaning you’ll have around 10 smaller segments that are faster to >> search across than one huge segment. > > As individual Lucene/Solr shard searches are very much single threaded, > the single segment version should be faster. Have you observed > otherwise? > > > Optimization is a fine feature if ones workflow is batch oriented with > sufficiently long pauses between index updates. Nightly index updates > with few active users at that time could be an example. > > - Toke Eskildsen, State and University Library, Denmark > >
Re: Using SimpleNaiveBayesClassifier in solr
Ahahah, absolutely not, you don't sound dumb. You only need a basic knowledge of how Lucene manages IndexReaders and IndexSearchers. On 14 October 2015 at 09:08, Yewint Ko wrote: > Thanks Ales and Tommaso for your replies > > So, is it like the classifier query the whole index db and load onto memory > first before running tokenizer against InputDocument? Your index is flushed to disk on every hard commit for durability, so it will be physically present as a set of files in your data directory (each file corresponds to a specific data structure in the index). From there, how Lucene accesses the data directory depends on the index Directory implementation. For example, in Solr the default is NRTCachingDirectoryFactory. This Directory implementation is based on the OS memory-mapping feature, optimized to cache small files for near-real-time (NRT) search systems. This means that Lucene leverages the OS memory-map implementation (using the memory available to the OS). Ideally, if your RAM allows it, the entire index ends up in memory and searches will be really fast. If the whole index doesn't fit, only portions of it are held in memory at a time, some I/O will happen, and performance degrades. Hope this clarifies your first doubt. The index is not required to be loaded into memory immediately; files are cached in memory over time, during the life of your system. > It sounds like if I > don't close the classifier and my index is big, i might need bigger > machine. Anyway to reverse the order? Do I sound dump? > I would not be too worried about this: the memory-mapping management is quite efficient. Just focus on implementing your functionality and prototype to measure the performance.
If you don't reach the performance you expect, you can then attack the bottlenecks; if the bottleneck turns out to be disk I/O, for instance, switching to an SSD could improve things, etc. Cheers > > On 12 October 2015 at 16:11, Alessandro Benedetti < > benedetti.ale...@gmail.com> wrote: > > > Hi Yewint, > > > > > > The sample test code inside seems like that classifier read the whole > > index > > > db to train the model everytime when classification happened for > > > inputDocument. or am I misunderstanding something here? > > > > > > I would suggest you take a look at a couple of articles I wrote last > > summer about the Classification in Lucene and Solr : > > > > > > > http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html > > > > > > > http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html > > > > Basically your misunderstanding is that this module works as a standard > > classifier, which is not the case. > > Lucene Classification doesn't train a model over time; the Index is your > > model. > > It uses the Index data structures to perform the classification processes > > (Knn and Simple Bayes are the algorithms I explored at that time). > > Basically the algorithms access the Term Frequencies and Document > > Frequencies stored in the Inverted index. > > > > Having a big Index will have an impact since, of course, we are querying the index, > but > > not because we are building a model. > > > > +1 on all Tommaso's observations! > > > > Cheers > > > > > > > > On 10 October 2015 at 20:36, Yewint Ko wrote: > > > > > Hi > > > > > > I am trying to use SimpleNaiveBayesClassifier in my solr project. > > Currently > > > looking at its test base ClassificationTestBase.java. > > > > > > The sample test code inside seems like that classifier read the whole > > index > > > db to train the model everytime when classification happened for > > > inputDocument. or am I misunderstanding something here?
If i had a > large > > > index db, will it impact performance? > > > > > > protected void checkCorrectClassification(Classifier classifier, > > String > > > inputDoc, T expectedResult, Analyzer analyzer, String textFieldName, > > String > > > classFieldName, Query query) throws Exception { > > > > > > AtomicReader atomicReader = null; > > > > > > try { > > > > > > populateSampleIndex(analyzer); > > > > > > atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter > > > .getReader()); > > > > > > classifier.train(atomicReader, textFieldName, classFieldName, > > > analyzer, > > > query); > > > > > > ClassificationResult classificationResult = > > > classifier.assignClass( > > > inputDoc); > > > > > > assertNotNull(classificationResult.getAssignedClass()); > > > > > > assertEquals("got an assigned class of " + > > > classificationResult.getAssignedClass(), > > > expectedResult, classificationResult.getAssignedClass()); > > > > > > assertTrue("got a not positive score " + > > > classificationResult.getScore(), > > > classificationResult.getScore() > 0); > > > > > > }
Re: Can I use tokenizer twice ?
Hi, Analyzers must have exactly one tokenizer, no more and no less. You could achieve what you want by copying to another field and defining a separate analyzer for each. One would create shingles, and the other edge ngrams. Steve > On Oct 14, 2015, at 11:58 AM, vit wrote: > > I have Solr 4.2 > I need to do the following: > > 1. white space tokenize > 2. create shingles > 3. use EdgeNGramFilter for each word in shingles, but not in a shingle as a > string > > So can I do this? > > <analyzer> > <tokenizer class="solr.WhitespaceTokenizerFactory"/> > <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="false" outputUnigramsIfNoShingles="true" > /> > <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/> > </analyzer> > > -- > View this message in context: > http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438.html > Sent from the Solr - User mailing list archive at Nabble.com.
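Steve's copyField suggestion could look roughly like this in schema.xml. The field and type names here are illustrative, and the filter parameters are taken from the original question; this is a sketch, not a tested config:

```xml
<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
  </analyzer>
</fieldType>

<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
</fieldType>

<field name="title_shingle" type="text_shingle" indexed="true" stored="false"/>
<field name="title_edge"    type="text_edge"    indexed="true" stored="false"/>

<copyField source="title" dest="title_shingle"/>
<copyField source="title" dest="title_edge"/>
```

Queries can then target title_shingle or title_edge (or both, e.g. via edismax qf) depending on whether shingle or per-word prefix matching is wanted.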
Re: AutoComplete Feature in Solr
Actually what you mentioned, Alessandro, is something interesting for me. I am looking to boost the ranking of some suggestions based on some dynamic criteria (let's say how frequently they are used). Do I need to update the boost field each time I request the suggestion (to capture the frequency)? If you can direct me to an article that explains this with some scenarios of using boost, that would be appreciated. Regards, Salman On Wed, Oct 14, 2015 at 11:49 AM, Alessandro Benedetti < benedetti.ale...@gmail.com> wrote: > using the suggester feature you can in some cases rank the suggestions based > on an additional numeric field. > It's not your use case; you actually want to use a search handler with a > well defined schema that will allow you, for example, to query on an edge > ngram token filtered field, applying a geo distance boost function. > > This is what I would use, and it would work fine with your applied filter > queries as well (reducing the space of Suggestions) > > Cheers > > On 14 October 2015 at 05:09, William Bell wrote: > > > We want to use suggester but also want to show those results closest to > my > > lat,long... Kinda combine suggester and bq=geodist() > > > > On Mon, Oct 12, 2015 at 2:24 PM, Salman Ansari > > wrote: > > > > > Hi, > > > > > > I have been trying to get the autocomplete feature in Solr working with > > no > > > luck up to now.
First I read that "suggest component" is the > recommended > > > way as in the below article (and this is the exact functionality I am > > > looking for, which is to autocomplete multiple words) > > > > > > > > > http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/ > > > > > > Then I tried implementing suggest as described in the following > articles > > in > > > this order > > > 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration > > > 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/ (I > > > implemented suggesting phrases) > > > 3) > > > > > > > > > http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms > > > > > > With no luck, after implementing each article when I run my query as > > > http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack > > > > > > > > > > > > I get > > > > > > > > > 0 > > > 0 > > > > > > > > > > > > Although I have an entry for Barack Obama in my index. I am posting my > > > Solr configuration as well > > > > > > > > > > > > suggest > > >name="classname">org.apache.solr.spelling.suggest.Suggester > > >> > name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup > > > entity_autocomplete > > > true > > > > > > > > > > > > > > class="org.apache.solr.handler.component.SearchHandler"> > > > > > > true > > > suggest > > > 10 > > > true > > > false > > > > > > > > > suggest > > > > > > > > > > > > It looks like a very simple job, but even after following so many > > articles, > > > I could not get it right. Any comment will be appreciated! 
> > > > > > Regards, > > > Salman > > > > > > > > > > > -- > > Bill Bell > > billnb...@gmail.com > > cell 720-256-8076 > > > > > > -- > -- > > Benedetti Alessandro > Visiting card - http://about.me/alessandro_benedetti > Blog - http://alexbenedetti.blogspot.co.uk > > "Tyger, tyger burning bright > In the forests of the night, > What immortal hand or eye > Could frame thy fearful symmetry?" > > William Blake - Songs of Experience -1794 England >
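The solrconfig.xml snippet quoted above lost most of its XML tags in the archive. Reconstructed from the surviving values, it would look roughly like this; the parameter names (spellcheck.dictionary, buildOnCommit, the /suggest handler name, and so on) are inferred, so treat this as a best guess rather than Salman's exact configuration:

```xml
<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
    <str name="field">entity_autocomplete</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>
```

With a setup like this, an empty suggest response often means the dictionary was never built (e.g. buildOnCommit is off and spellcheck.build=true was never sent), which is one concrete thing to check.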
Re: Grouping facets: Possible to get facet results for each Group?
Yes, you are right about that - I've used pivots before and they do need to be used judiciously. Fortunately, we only ever use single-value fields, as it gives some good advantages in a heavily sharded environment. Our document structure is, by its very nature, always flat, so it could be an impediment to nested facets, but I don't know enough about them to know for sure. Thanks, Peter On Wed, Oct 14, 2015 at 9:44 AM, Alessandro Benedetti < benedetti.ale...@gmail.com> wrote: > mmm, let's say that nested facets are a subset of Pivot Facets. > While pivot faceting works with the classic flat document structure, sub-facets > work with any nested structure. > So be careful about pivot faceting in a flat document with multi-valued > fields, because you lose the relation across the different fields' values. > > Cheers > > On 13 October 2015 at 18:06, Peter Sturge wrote: > > > Hi, > > Thanks for your response. > > I did have a look at pivots, and they could work in a way. We're still on > > Solr 4.3, so I'll have to wait for sub-facets - but they sure look pretty > > cool! > > Peter > > > > > > On Tue, Oct 13, 2015 at 12:30 PM, Alessandro Benedetti < > > benedetti.ale...@gmail.com> wrote: > > > > > Can you model your business domain with Solr nested Docs? In that case > > > you can use Yonik's article about nested facets. > > > > > > Cheers > > > > > > On 13 October 2015 at 05:05, Alexandre Rafalovitch > > > > wrote: > > > > > > > Could you use the new nested facets syntax? > > > > http://yonik.com/solr-subfacets/ > > > > > > > > Regards, > > > >Alex. > > > > > > > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter: > > > > http://www.solr-start.com/ > > > > > > > > On 11 October 2015 at 09:51, Peter Sturge > > > wrote: > > > > > Been trying to coerce Group faceting to give some faceting back for > > > each > > > > > group, but maybe this use case isn't catered for in Grouping?
: > > > > > > > > > > So the Use Case is this: > > > > > Let's say I do a grouped search that returns say, 9 distinct > groups, > > > and > > > > in > > > > > these groups are various numbers of unique field values that need > > > > faceting > > > > > - but the faceting needs to be within each group: > > > > > > > > > > > > > > > > -- > > > -- > > > > > > Benedetti Alessandro > > > Visiting card - http://about.me/alessandro_benedetti > > > Blog - http://alexbenedetti.blogspot.co.uk > > > > > > "Tyger, tyger burning bright > > > In the forests of the night, > > > What immortal hand or eye > > > Could frame thy fearful symmetry?" > > > > > > William Blake - Songs of Experience -1794 England > > > > > > > > > -- > -- > > Benedetti Alessandro > Visiting card - http://about.me/alessandro_benedetti > Blog - http://alexbenedetti.blogspot.co.uk > > "Tyger, tyger burning bright > In the forests of the night, > What immortal hand or eye > Could frame thy fearful symmetry?" > > William Blake - Songs of Experience -1794 England >
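For reference, the pivot route discussed in this thread is driven by a single request parameter; with illustrative field names, a request could look like:

```
q=*:*&rows=0&facet=true&facet.pivot=group_field,value_field
```

This returns, for each value of group_field, a nested count breakdown over value_field, effectively per-group facet counts, subject to the multi-valued-field caveat Alessandro describes.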
Re: Can I use tokenizer twice ?
Steve, /You could achieve what you want by copying to another field and defining a separate analyzer for each. One would create shingles, and the other edge ngrams. / Could you please elaborate on this? I am not sure I understand how to do it using copyField. -- View this message in context: http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438p4234503.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Run Solr 5.3.0 as a Service on Windows using NSSM
Hi, I am trying to implement some scripting to detect whether all ZooKeeper nodes have started in a cluster, then restart the Solr servers. Has anyone achieved this yet through scripting? I also saw there is the ZookeeperClient that is available in .NET via a NuGet package. Not sure if this could also be used to check whether a ZooKeeper server is running. Any thoughts? Regards, Adrian -Original Message- From: Anders Thulin [mailto:anders.thu...@comintelli.com] Sent: Wednesday, October 14, 2015 11:44 PM To: solr-user@lucene.apache.org Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM Did you add the -f param for running it in the foreground? I noticed that the Solr service was restarted indefinitely when running it as a background service. It's also needed to be able to stop the Windows service. This test worked well here (on Windows 2012): REM Test for running solr 5.3.1 as a windows service C:\nssm\nssm64.exe install "Solr 5.3.1" C:\search\solr-5.3.1\bin\solr.cmd "start -f -p 8983" On 8 October 2015 at 04:34, Zheng Lin Edwin Yeo wrote: > Hi Adrian and Upayavira, > > It works fine when I start Solr outside NSSM. > As for the NSSM, so far I haven't tried the automatic startup yet. I > start the services for ZooKeeper and Solr in NSSM manually from the > Windows Component Services, so the ZooKeeper will have been started > before I start Solr. > > I'll also try to write the script for Solr that can check it can > access Zookeeper before attempting to start Solr. > > Regards, > Edwin > > > On 7 October 2015 at 19:16, Upayavira wrote: > > > Wrap your script that starts Solr with one that checks it can access > > Zookeeper before attempting to start Solr, that way, once ZK starts, > > Solr will come up. Then, hand *that* script to NSSM. > > > > And finally, when one of you has got a setup that works with NSSM > > starting Solr via the default bin\solr.cmd script, create a patch > > and upload it to JIRA.
It would be a valuable thing for Solr to have > > a > > *standard* way to start Solr on Windows as a service. I recall > > checking the NSSM license and it wouldn't be an issue to include it > > within Solr - or to have a script that assumes it is installed. > > > > Upayavira > > > > On Wed, Oct 7, 2015, at 11:49 AM, Adrian Liew wrote: > > > Hi Edwin, > > > > > > You may want to try explore some of the configuration properties > > > to configure in zookeeper. > > > > > > > > > http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkMulitS > erverSetup > > > > > > My recommendation is to try run your batch files outside of NSSM > > > so it > is > > > easier to debug and observe what you see from the command window. > > > I > don't > > > think ZK and Solr can be automated on startup well using NSSM due > > > to > the > > > fact that ZK services need to be running before you start up Solr > > > services. I just had conversation with Shawn on this topic. NSSM > > > cannot do the magic startup in a cluster setup. In that, you may > > > need to write custom scripting to get it right. > > > > > > Back to your original issue, I guess it is worth exploring timeout > > > values. Then again, I will leave the real Solr experts to chip in > > > their thoughts. > > > > > > Best regards, > > > > > > Adrian Liew > > > > > > > > > -Original Message- > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > > > Sent: Wednesday, October 7, 2015 1:40 PM > > > To: solr-user@lucene.apache.org > > > Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM > > > > > > Hi Adrian, > > > > > > I've waited for more than 5 minutes and most of the time when I > > > refresh it says that the page cannot be found. Got one or twice > > > the main Admin page is loaded, but none of the cores are loaded. > > > > > > I have 20 cores which I'm loading. The core are of various sizes, > > > but > the > > > maximum one is 38GB. 
Others ranges from 10GB to 15GB, and there're > > > some which are less than 1GB. > > > > > > My overall core size is about 200GB. > > > > > > Regards, > > > Edwin > > > > > > > > > On 7 October 2015 at 12:11, Adrian Liew > wrote: > > > > > > > Hi Edwin, > > > > > > > > I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up > > > > Solr with a base standalone installation. > > > > > > > > You may have to give Solr some time to bootstrap things and wait > > > > for the page to reload. Are you still seeing the page after 1 > > > > minute or > so? > > > > > > > > What are your core sizes? And how many cores are you trying to load? > > > > > > > > Best regards, > > > > Adrian > > > > > > > > -Original Message- > > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > > > > Sent: Wednesday, October 7, 2015 11:46 AM > > > > To: solr-user@lucene.apache.org > > > > Subject: Run Solr 5.3.0 as a Service on Windows using NSSM > > > > > > > > Hi, > > > > > > > > I tried to follow this to
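For the "detect whether every ZooKeeper has started" scripting question raised in this thread, one option is ZooKeeper's four-letter admin command ruok, which a healthy server answers with imok. A minimal sketch in Python follows; the host names are placeholders, and note that ZooKeeper releases newer than the 3.4.x line discussed here require ruok to be whitelisted via 4lw.commands.whitelist:

```python
import socket

def zk_is_running(host, port=2181, timeout=2.0):
    """Return True if a ZooKeeper server at host:port answers 'ruok' with 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            sock.settimeout(timeout)
            return sock.recv(4) == b"imok"
    except OSError:  # connection refused, timed out, host unreachable, ...
        return False

# A start-up wrapper would poll every ensemble member before launching Solr,
# e.g.: if all(zk_is_running(h) for h in ["zk1", "zk2", "zk3"]): launch Solr.
```

Wrapping bin\solr.cmd in a script that loops on this check until all ensemble members answer, then starts Solr, is the pattern Upayavira describes, and that wrapper is what would be handed to NSSM.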
Re: Run Solr 5.3.0 as a Service on Windows using NSSM
Hi Anders, Yes, I did put the -f param for running it in the foreground. I put start -f -p 8983 in the Arguments parameter in the NSSM service installer. Is that the correct place for Solr 5.3.0? I did it the same way for Solr 5.1 and it was working then. I'm using Windows 8.1. Regards, Edwin On 14 October 2015 at 23:44, Anders Thulin wrote: > Did you add the -f param for running it in the foreground? > I noticed that the Solr service was restarted indefinitely when running it > as a background service. > It's also needed to be able to stop the Windows service. > > This test worked well here (on Windows 2012): > > REM Test for running solr 5.3.1 as a windows service > C:\nssm\nssm64.exe install "Solr 5.3.1" C:\search\solr-5.3.1\bin\solr.cmd > "start -f -p 8983" > > On 8 October 2015 at 04:34, Zheng Lin Edwin Yeo > wrote: > > > Hi Adrian and Upayavira, > > > > It works fine when I start Solr outside NSSM. > > As for the NSSM, so far I haven't tried the automatic startup yet. I > start > > the services for ZooKeeper and Solr in NSSM manually from the Windows > > Component Services, so the ZooKeeper will have been started before I > start > > Solr. > > > > I'll also try to write the script for Solr that can check it can access > > Zookeeper before attempting to start Solr. > > > > Regards, > > Edwin > > > > > > On 7 October 2015 at 19:16, Upayavira wrote: > > > > > Wrap your script that starts Solr with one that checks it can access > > > Zookeeper before attempting to start Solr, that way, once ZK starts, > > > Solr will come up. Then, hand *that* script to NSSM. > > > > > > And finally, when one of you has got a setup that works with NSSM > > > starting Solr via the default bin\solr.cmd script, create a patch and > > > upload it to JIRA. It would be a valuable thing for Solr to have a > > > *standard* way to start Solr on Windows as a service.
I recall checking > > > the NSSM license and it wouldn't be an issue to include it within Solr > - > > > or to have a script that assumes it is installed. > > > > > > Upayavira > > > > > > On Wed, Oct 7, 2015, at 11:49 AM, Adrian Liew wrote: > > > > Hi Edwin, > > > > > > > > You may want to try explore some of the configuration properties to > > > > configure in zookeeper. > > > > > > > > > > > > > > http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkMulitServerSetup > > > > > > > > My recommendation is to try run your batch files outside of NSSM so > it > > is > > > > easier to debug and observe what you see from the command window. I > > don't > > > > think ZK and Solr can be automated on startup well using NSSM due to > > the > > > > fact that ZK services need to be running before you start up Solr > > > > services. I just had conversation with Shawn on this topic. NSSM > cannot > > > > do the magic startup in a cluster setup. In that, you may need to > write > > > > custom scripting to get it right. > > > > > > > > Back to your original issue, I guess it is worth exploring timeout > > > > values. Then again, I will leave the real Solr experts to chip in > their > > > > thoughts. > > > > > > > > Best regards, > > > > > > > > Adrian Liew > > > > > > > > > > > > -Original Message- > > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > > > > Sent: Wednesday, October 7, 2015 1:40 PM > > > > To: solr-user@lucene.apache.org > > > > Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM > > > > > > > > Hi Adrian, > > > > > > > > I've waited for more than 5 minutes and most of the time when I > refresh > > > > it says that the page cannot be found. Got one or twice the main > Admin > > > > page is loaded, but none of the cores are loaded. > > > > > > > > I have 20 cores which I'm loading. The core are of various sizes, but > > the > > > > maximum one is 38GB. 
Others ranges from 10GB to 15GB, and there're > some > > > > which are less than 1GB. > > > > > > > > My overall core size is about 200GB. > > > > > > > > Regards, > > > > Edwin > > > > > > > > > > > > On 7 October 2015 at 12:11, Adrian Liew > > wrote: > > > > > > > > > Hi Edwin, > > > > > > > > > > I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up > Solr > > > > > with a base standalone installation. > > > > > > > > > > You may have to give Solr some time to bootstrap things and wait > for > > > > > the page to reload. Are you still seeing the page after 1 minute or > > so? > > > > > > > > > > What are your core sizes? And how many cores are you trying to > load? > > > > > > > > > > Best regards, > > > > > Adrian > > > > > > > > > > -Original Message- > > > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com] > > > > > Sent: Wednesday, October 7, 2015 11:46 AM > > > > > To: solr-user@lucene.apache.org > > > > > Subject: Run Solr 5.3.0 as a Service on Windows using NSSM > > > > > > > > > > Hi, > > > > > > > > > > I tried to follow this to start my Solr as a service
Re: partial search EdgeNGramFilterFactory
Thank you Erick. Yes, it was the default search field. So for the following SellerName: 1) cardinal healthcare products 2) cardinal healthcare 3) postoperative cardinal healthcare 4) surgical cardinal products My requirement is: q=SellerName:cardinal - all 4 records returned q=SellerName:healthcare - 1,2,3 returned q=SellerName:surgical cardinal - 4 returned q=SellerName:cardinal healthcare - 1,2,3 returned q=SellerName:products - 1,4 returned q=SellerName:car - nothing returned q=SellerName:card - all 4 returned How should I set up my fieldtype? Thanks On Wed, Oct 14, 2015 at 1:14 PM, Erick Erickson wrote: > try adding debug=true to your query. The query > q=SellerName:cardinal he > actually parses as > q=SellerName:cardinal defaultSearchField:he > > so I suspect you're getting matches on the default search field. > > I'm not sure EdgeNGram is what you want here though. > That only grams individual tokens, so CARDINAL is grammed > totally separately from HEALTH. You might consider > a different tokenizer, say KeywordTokenizer and LowerCaseFilter > followed by edgeNGram to treat the whole thing as a unit. You'd have > to take some care to make sure you escaped spaces to get > the whole thing through the query parser though.
> > Best, > Erick > > On Wed, Oct 14, 2015 at 11:03 AM, Brian Narsi wrote: > > I have the following fieldtype in my schema: > > > > > positionIncrementGap="100"> > > > > > > > > > maxGramSize="25"/> > > > > > > > > > > > > > > > > and the following field: > > > required="true" multiValued="false" /> > > > > With the following data: > > SellerName:CARDINAL HEALTH > > > > When I do the following search > > > > q:SellerName:cardinal > > > > I get back the results with SellerName: CARDINAL HEALTH (correct) > > > > or I do the search > > > > q:SellerName:cardinal he > > > > I get back the results with SellerName: CARDINAL HEALTH (correct) > > > > But when I do the search > > > > q:SellerName:cardinal hea > > > > I am getting the results back with SellerName:INTEGRA RADIONICS > > > > Why is that? > > > > I need it to continue to return the correct results with CARDINAL HEALTH. > > How do I make that happen? > > > > Thanks in advance, >
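A field type along the lines discussed in this thread, but keeping the whitespace tokenizer so that each word is grammed individually, appears to satisfy the requirement list above. This is a sketch, not a tested config, and the names are illustrative:

```xml
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

minGramSize="4" is what makes q=car return nothing while q=card matches all four records. The multi-word cases (surgical cardinal, cardinal healthcare) additionally need the default operator set to AND (q.op=AND) and the query written as SellerName:(surgical cardinal) so that both terms hit the field rather than falling back to the default search field.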