Re: Start With and contain search
Thanks, I think the NGramFilterFactory is the right filter; I will try it today. But what if I want to search for the query *dom host* and get the result *domhost*? -- View this message in context: http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172031.html Sent from the Solr - User mailing list archive at Nabble.com.
Get list of collection
Hi All, I have a requirement to get the list of *collections* available in Solr. We are using the SolrJ library. I am able to fetch the list of cores, but I have not found a way to fetch the list of collections. Below is the sample code that I am using to fetch the cores:

CoreAdminRequest request = new CoreAdminRequest();
request.setAction(CoreAdminAction.STATUS);
CoreAdminResponse cores = request.process(server);
// List of the cores
List<String> coreList = new ArrayList<String>();
for (int i = 0; i < cores.getCoreStatus().size(); i++) {
    coreList.add(cores.getCoreStatus().getName(i));
}

Please help. -- Thanks, Ankit Jain
Slow queries
Hi, I have a Solr collection with 16 million documents, growing daily. Recently it has become slow to answer my requests (several seconds), especially when I use multi-word queries. I am running Solr on a machine with 32G RAM, but a heavily used one. What are my options to optimize the collection and speed up querying? Is this normal with this volume of data? Is sharding a good solution? regards, -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow queries
If your performance was fine but degraded over time, it might be easiest to check / increase the memory to get better disk caching. Cheers, Siegfried Goeschl On 02.12.14 09:27, melb wrote: Hi, I have a Solr collection with 16 million documents, growing daily. Recently it has become slow to answer my requests (several seconds), especially when I use multi-word queries. I am running Solr on a machine with 32G RAM, but a heavily used one. What are my options to optimize the collection and speed up querying? Is this normal with this volume of data? Is sharding a good solution? regards, -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Slow queries
Yes, performance degraded over time. I can raise the memory but I can't do that every time, and the volume will keep growing. Is it better to put Solr on a dedicated machine? Is there anything else that can be done to the Solr instance, for example dividing the collection? rgds, -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4172039.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Different update handlers for auto commit configuration
Thanks for the clarification, I indeed mixed it up with UpdateRequestHandler. On Mon, Dec 1, 2014 at 11:24 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I thought that the auto commit is per update handler because they are : configured within the update handler tag. updateHandler is not the same thing as a requestHandler that does updates. There can be many update request handlers configured, but there is only ever one <updateHandler/> in a SolrCore. -Hoss http://www.lucidworks.com/
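To make Hoss's distinction concrete: auto commit is configured once, on the single updateHandler in solrconfig.xml, never on a request handler. A sketch (the maxDocs/maxTime values below are made-up examples, not recommendations):

```xml
<!-- solrconfig.xml: exactly one updateHandler per SolrCore;
     autoCommit lives here, not on any requestHandler. -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>  <!-- example: commit after 10,000 docs -->
    <maxTime>60000</maxTime>  <!-- example: or after 60 seconds -->
  </autoCommit>
</updateHandler>
```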
Replication of a corrupt master index
Hi, If I have a master/slave setup and the master index gets corrupted, will the slaves realize they should not replicate from the master anymore, since the master does not have a newer index version? I'm using Solr version 4.2.1. Regards, Johannes
Re: SOLR not starting after restart 2 node cloud setup
Dear Erick, Thanks for your thoughts, they helped me a lot. In my instances no Solr logs were appended to catalina.out. I have now placed the log4j.properties file, so Solr logs are captured in the solr.log file, and with its help I found the reason for the issue. I was starting Tomcat with the option -Dbootstrap_conf=true, which made Solr look for core configuration files in the wrong directory; after removing it, Solr started without any issues. I also commented out the suggester component, which made Solr load faster. Thanks, Doss. On Thu, Nov 20, 2014 at 9:47 PM, Erick Erickson erickerick...@gmail.com wrote: Doss: Tomcat often puts things in catalina.out, you might check there; I've often seen logging information from Solr go there by default. Without having some idea what kinds of problems Solr is reporting when you see this situation, it's really hard to say. Some things I'd check first though, in order of what I _guess_ is most likely. There have been anecdotal reports (in fact, I'm trying to understand the why of it right now) of the suggester taking a long time to initialize, even if you don't use it! So if you're not using the suggest component, try commenting out those sections in solrconfig.xml for the cores in question. I like this explanation since it fits with your symptoms, but I don't like it since the index you are using isn't all that big. So it's something of a shot in the dark. I expect that the core will _eventually_ come up, but I've seen reports of 10-15 minutes being required, far beyond my patience! That said, this would also explain why deleting the index works. OutOfMemory errors. You might be able to attach jConsole (part of the standard Java stuff) to the process and monitor the memory usage. If it's being pushed near the 5G limit that's the first thing I'd suspect. If you're using the default setups, then the Zookeeper timeout may be too low; I think the default (not sure whether it's been changed in 4.9) is 15 seconds, 30-60 is usually much better. 
Best, Erick On Thu, Nov 20, 2014 at 3:47 AM, Doss itsmed...@gmail.com wrote: Dear Erick, Forgive my ignorance. Please find some of the details you required. *have you looked at the solr logs?* Sorry I haven't defined the log4j.properties file, so I don't have solr logs. Since it requires tomcat restart I am planning to do it in next restart. But found the following in tomcat log 18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/mima] appears to have started a thread named [localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) *How big are the cores?* We have 16 cores, out of it only 5 are big ones. Total size of all 16 cores is 10+ GB *How many docs in the cores when the problem happens?* 1 core with 163 fields and 33,00,000 documents (Index size 2+ GB) 4 cores with 3 fields and has 150,00,000 (approx) documents (1.2 to 1.5 GB) remaining cores are 1,00,000 to 40,00,000 documents *How much memory are you allocating the JVM? * 5GB for JVM, Total RAM available in the systems is 30 GB *can you restart Tomcat without a problem?* This problem is occurring in production, I never tried. Thanks, Doss. On Wed, Nov 19, 2014 at 7:55 PM, Erick Erickson erickerick...@gmail.com wrote: You've really got to provide details for us to say much of anything. There are about a zillion things that it could be. In particular, have you looked at the solr logs? 
Are there any interesting things in them? How big are the cores? How much memory are you allocating the JVM? How many docs in the cores when the problem happens? Before the nodes stop responding, can you restart Tomcat without a problem? You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Wed, Nov 19, 2014 at 1:04 AM, Doss itsmed...@gmail.com wrote: I have two node SOLR (4.9.0) cloud with Tomcat (8), Zookeeper. At times SOLR in Node 1 stops responding, to fix the issue I am restarting tomcat in Node 1, but SOLR not starting up, but if I remove the solr cores in both nodes and try restarting it starts working, and then I have to reindex the whole data again. We are using this setup in production because of this issue we
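For anyone hitting the same missing-logs problem: a minimal log4j.properties in the spirit of what Doss set up might look like the following (log4j 1.x syntax, as used by Solr 4.x; the file path is an example, adjust for your Tomcat layout):

```properties
# Send Solr logging to its own rolling file instead of catalina.out.
log4j.rootLogger=INFO, file
log4j.appender.file=org.apache.log4j.RollingFileAppender
log4j.appender.file.File=logs/solr.log
log4j.appender.file.MaxFileSize=10MB
log4j.appender.file.MaxBackupIndex=9
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d{ISO8601} %-5p [%t] %c{1} - %m%n
```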
Re: Getting the position of a word via Solr API
Small update: I have managed to make the Term Vector component work, and I am getting all the words of the text field. The problem is that it doesn't work with several words combined; I can't find the offset where the needed expression starts... Any ideas, anyone? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-the-position-of-a-word-via-Solr-API-tp4171877p4172092.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Start With and contain search
It's not clear what you actually mean with that space. Do you mean any two words should try to match as if they were one? What's the business-level description of what you are trying to do? Also, you are not reinventing https://domainr.com/ , are you? If you are, search around; I think they had a technical architecture description somewhere in their early articles. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 2 December 2014 at 03:15, melb melaggo...@gmail.com wrote: Thanks, I think the NGramFilterFactory is the right filter; I will try it today. But what if I want to search for the query *dom host* and get the result *domhost*? -- View this message in context: http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172031.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Start With and contain search
Yes, this is exactly what I am trying to do, but with a less extensive database. Can I do it with Solr? rgds, -- View this message in context: http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172105.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Start With and contain search
Well, if all you are doing is substring searches, then Solr could be overkill. But if you are doing a search and then want to do faceting or additional queries, then Solr is a good bet. And yes, it can do it; you just need to really understand your input patterns and what you want to find with them. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 2 December 2014 at 09:20, melb melaggo...@gmail.com wrote: Yes, this is exactly what I am trying to do, but with a less extensive database. Can I do it with Solr? rgds, -- View this message in context: http://lucene.472066.n3.nabble.com/Start-With-and-contain-search-tp4171854p4172105.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Get list of collection
I think you want CloudSolrServer.getCollectionList() Best, Erick On Tue, Dec 2, 2014 at 12:27 AM, Ankit Jain ankitjainc...@gmail.com wrote: Hi All, I have a requirement to get the list of *collections* available in Solr. We are using the SolrJ library. I am able to fetch the list of cores, but I have not found a way to fetch the list of collections. Below is the sample code that I am using to fetch the cores: CoreAdminRequest request = new CoreAdminRequest(); request.setAction(CoreAdminAction.STATUS); CoreAdminResponse cores = request.process(server); // List of the cores List<String> coreList = new ArrayList<String>(); for (int i = 0; i < cores.getCoreStatus().size(); i++) { coreList.add(cores.getCoreStatus().getName(i)); } Please help. -- Thanks, Ankit Jain
Re: Slow queries
bq: Is it better to put the solr on dedicated machine? Yes, absolutely. Solr _likes_ memory, and on a machine with lots of other processes you'll keep running into this problem. FWIW, I've seen between 10M and 300M docs fit into 16G for the JVM. But see Uwe's excellent blog on MMapDirectory and not over-allocating memory to the JVM here: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html Also see: https://wiki.apache.org/solr/SolrPerformanceProblems and http://wiki.apache.org/solr/SolrPerformanceFactors Best, Erick On Tue, Dec 2, 2014 at 1:02 AM, melb melaggo...@gmail.com wrote: Yes, performance degraded over time. I can raise the memory but I can't do that every time, and the volume will keep growing. Is it better to put Solr on a dedicated machine? Is there anything else that can be done to the Solr instance, for example dividing the collection? rgds, -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4172039.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replication of a corrupt master index
No. The master is the master and will always stay the master unless you change it. This is one of the reasons I really like to keep the original source around, in case I ever have this problem. Best, Erick On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes johannes.charrahorstm...@haufe-lexware.com wrote: Hi, If I have a master/slave setup and the master index gets corrupted, will the slaves realize they should not replicate from the master anymore, since the master does not have a newer index version? I'm using Solr version 4.2.1. Regards, Johannes
Re: SOLR not starting after restart 2 node cloud setup
Glad you found a solution! Best, Erick On Tue, Dec 2, 2014 at 4:30 AM, Doss itsmed...@gmail.com wrote: Dear Erick, Thanks for your thoughts, they helped me a lot. In my instances no Solr logs were appended to catalina.out. I have now placed the log4j.properties file, so Solr logs are captured in the solr.log file, and with its help I found the reason for the issue. I was starting Tomcat with the option -Dbootstrap_conf=true, which made Solr look for core configuration files in the wrong directory; after removing it, Solr started without any issues. I also commented out the suggester component, which made Solr load faster. Thanks, Doss. On Thu, Nov 20, 2014 at 9:47 PM, Erick Erickson erickerick...@gmail.com wrote: Doss: Tomcat often puts things in catalina.out, you might check there; I've often seen logging information from Solr go there by default. Without having some idea what kinds of problems Solr is reporting when you see this situation, it's really hard to say. Some things I'd check first though, in order of what I _guess_ is most likely. There have been anecdotal reports (in fact, I'm trying to understand the why of it right now) of the suggester taking a long time to initialize, even if you don't use it! So if you're not using the suggest component, try commenting out those sections in solrconfig.xml for the cores in question. I like this explanation since it fits with your symptoms, but I don't like it since the index you are using isn't all that big. So it's something of a shot in the dark. I expect that the core will _eventually_ come up, but I've seen reports of 10-15 minutes being required, far beyond my patience! That said, this would also explain why deleting the index works. OutOfMemory errors. You might be able to attach jConsole (part of the standard Java stuff) to the process and monitor the memory usage. If it's being pushed near the 5G limit that's the first thing I'd suspect. 
If you're using the default setups, then the Zookeeper timeout may be too low, I think the default (not sure about whether it's been changed in 4.9) is 15 seconds, 30-60 is usually much better. Best, Erick On Thu, Nov 20, 2014 at 3:47 AM, Doss itsmed...@gmail.com wrote: Dear Erick, Forgive my ignorance. Please find some of the details you required. *have you looked at the solr logs?* Sorry I haven't defined the log4j.properties file, so I don't have solr logs. Since it requires tomcat restart I am planning to do it in next restart. But found the following in tomcat log 18-Nov-2014 11:27:29.028 WARNING [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoader.clearReferencesThreads The web application [/mima] appears to have started a thread named [localhost-startStop-1-SendThread(10.236.149.28:2181)] but has failed to stop it. This is very likely to create a memory leak. Stack trace of thread: sun.nio.ch.EPollArrayWrapper.epollWait(Native Method) sun.nio.ch.EPollArrayWrapper.poll(EPollArrayWrapper.java:269) sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:79) sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:87) sun.nio.ch.SelectorImpl.select(SelectorImpl.java:98) org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:349) org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081) *How big are the cores?* We have 16 cores, out of it only 5 are big ones. Total size of all 16 cores is 10+ GB *How many docs in the cores when the problem happens?* 1 core with 163 fields and 33,00,000 documents (Index size 2+ GB) 4 cores with 3 fields and has 150,00,000 (approx) documents (1.2 to 1.5 GB) remaining cores are 1,00,000 to 40,00,000 documents *How much memory are you allocating the JVM? * 5GB for JVM, Total RAM available in the systems is 30 GB *can you restart Tomcat without a problem?* This problem is occurring in production, I never tried. Thanks, Doss. 
On Wed, Nov 19, 2014 at 7:55 PM, Erick Erickson erickerick...@gmail.com wrote: You've really got to provide details for us to say much of anything. There are about a zillion things that it could be. In particular, have you looked at the solr logs? Are there any interesting things in them? How big are the cores? How much memory are you allocating the JVM? How many docs in the cores when the problem happens? Before the nodes stop responding, can you restart Tomcat without a problem? You might review: http://wiki.apache.org/solr/UsingMailingLists Best, Erick On Wed, Nov 19, 2014 at 1:04 AM, Doss itsmed...@gmail.com wrote: I have two node SOLR (4.9.0) cloud with Tomcat (8), Zookeeper. At times SOLR in Node 1 stops responding, to fix the issue I am restarting tomcat in Node 1, but SOLR not starting up, but if I remove the solr cores in both nodes and try restarting it starts working, and
Re: Slow queries
It might be a good idea to
* move SOLR to a dedicated box :-)
* load your SOLR server with 20.000.000 documents (the estimated number of documents after three years) and do performance testing and tuning
Afterwards you have some hard facts about hardware sizing and expected performance for the next three years :-) Cheers, Siegfried Goeschl On 02 Dec 2014, at 10:02, melb melaggo...@gmail.com wrote: Yes, performance degraded over time. I can raise the memory but I can't do that every time, and the volume will keep growing. Is it better to put Solr on a dedicated machine? Is there anything else that can be done to the Solr instance, for example dividing the collection? rgds, -- View this message in context: http://lucene.472066.n3.nabble.com/Slow-queries-tp4172032p4172039.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Replication of a corrupt master index
Thanks for your response, Erick. Do you think it is possible to corrupt an index merely with HTTP requests? I've been using the aforementioned master/slave setup for years now and have never seen a master failure. I'm trying to think of scenarios where this setup (1 master, 4 slaves) might have a total outage. The master runs on an h/a cluster. Regards, Johannes -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, 2 December 2014 15:54 To: solr-user@lucene.apache.org Subject: Re: Replication of a corrupt master index No. The master is the master and will always stay the master unless you change it. This is one of the reasons I really like to keep the original source around, in case I ever have this problem. Best, Erick On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes johannes.charrahorstm...@haufe-lexware.com wrote: Hi, If I have a master/slave setup and the master index gets corrupted, will the slaves realize they should not replicate from the master anymore, since the master does not have a newer index version? I'm using Solr version 4.2.1. Regards, Johannes
Find duplicates
Hi Is it possible to formulate a Solr query which finds all documents which have the same value in a particular field? Note, I don't know what the value is, I just want to find all documents with duplicate values. For example, I have 5 documents: Doc1: field Name = Peter Doc2: field Name = Jack Doc3: field Name = Peter Doc4: field Name = Paul Doc5: field Name = Jack If I executed the query, it would find documents Doc1 and Doc3 (Peter is the same), and Doc2 and Doc5 (Jack is the same). Thanks, Peter
Re: Find duplicates
Sort of… if you indexed the full value of the field (and you’re looking for truly exact matches) as a string field type you could facet on that field with facet.mincount=2 and the facets returned would be the ones with duplicate values. You’d have to drill down on each of the facets returned to find the actual docs. Erik On Dec 2, 2014, at 10:57 AM, Peter Kirk p...@alpha-solutions.dk wrote: Hi Is it possible to formulate a Solr query which finds all documents which have the same value in a particular field? Note, I don't know what the value is, I just want to find all documents with duplicate values. For example, I have 5 documents: Doc1: field Name = Peter Doc2: field Name = Jack Doc3: field Name = Peter Doc4: field Name = Paul Doc5: field Name = Jack If I executed the query, it would find documents Doc1 and Doc3 (Peter is the same), and Doc2 and Doc5 (Jack is the same). Thanks, Peter
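Client-side, the drill-down Erik describes starts from the facet counts the response hands back. The filtering step can be sketched in plain Java (the hard-coded map stands in for a real facet response, so this is illustration, not SolrJ API -- it applies the same cut that facet.mincount=2 applies server-side):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DuplicateValues {
    // Keep only facet values whose count is >= 2 -- i.e. the field
    // values that appear in more than one document.
    static List<String> duplicatedValues(Map<String, Integer> facetCounts) {
        List<String> duplicates = new ArrayList<String>();
        for (Map.Entry<String, Integer> e : facetCounts.entrySet()) {
            if (e.getValue() >= 2) {
                duplicates.add(e.getKey());
            }
        }
        return duplicates;
    }

    public static void main(String[] args) {
        // Facet counts for the Name field of Peter's five example docs.
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("Peter", 2);
        counts.put("Jack", 2);
        counts.put("Paul", 1);
        System.out.println(duplicatedValues(counts)); // prints [Peter, Jack]
    }
}
```

Each value that survives the filter then becomes an fq (e.g. Name:"Peter") to fetch the actual duplicate documents.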
RE: Find duplicates
Have you tried using result grouping for your query? There are some very good examples in the wiki: https://wiki.apache.org/solr/FieldCollapsing Gonzalo -Original Message- From: Peter Kirk [mailto:p...@alpha-solutions.dk] Sent: Tuesday, December 02, 2014 9:58 AM To: solr-user@lucene.apache.org Subject: Find duplicates Hi Is it possible to formulate a Solr query which finds all documents which have the same value in a particular field? Note, I don't know what the value is, I just want to find all documents with duplicate values. For example, I have 5 documents: Doc1: field Name = Peter Doc2: field Name = Jack Doc3: field Name = Peter Doc4: field Name = Paul Doc5: field Name = Jack If I executed the query, it would find documents Doc1 and Doc3 (Peter is the same), and Doc2 and Doc5 (Jack is the same). Thanks, Peter
spellchecker returns correctlySpelled=true if one term in phrase is correctly spelled
Hi, It seems that when I do a phrase search, SOLR's spellchecker returns correctlySpelled=true if at least one term in the phrase was correctly spelled. For example: If I search for soriasis treatment, SOLR returns over 8000 search results for treatment, correctlySpelled: true, and a spelling suggestion of psoriasis for soriasis. If I search for soriasis treatmnt, SOLR returns 0 results, correctlySpelled:false, and spelling suggestions for both soriasis and treatmnt. Does this mean that if I want to display a Did You Mean for soriasis treatment, I need to 1) check if there are any suggestions returned by the spellchecker for any of the terms, and 2) compare the number of hits for each collation with the numFound for the original query? Another spellchecker question I have is: how can I configure SOLR to suggest heart attack if someone searches for heart attach? Technically, there are no misspellings, but heart attach as a phrase does not make sense. Thanks, Jing
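Jing's two checks boil down to one comparison per collation. A minimal client-side sketch (plain Java; the method and its inputs are hypothetical -- the caller is assumed to have already pulled numFound and the collation's hit count out of the spellcheck response):

```java
public class DidYouMean {
    // Show a "Did you mean" suggestion when the spellchecker produced a
    // collation and that collation would match more documents than the
    // original query did -- regardless of what correctlySpelled says.
    static boolean shouldSuggest(long originalNumFound, Long collationHits) {
        if (collationHits == null) {
            return false; // spellchecker offered no collation at all
        }
        return collationHits > originalNumFound;
    }

    public static void main(String[] args) {
        // "soriasis treatment": 8000 hits, but a collation such as
        // "psoriasis treatment" matching more is still worth showing.
        System.out.println(shouldSuggest(8000, 12000L)); // prints true
        // No collation returned: nothing to suggest.
        System.out.println(shouldSuggest(8000, null));   // prints false
    }
}
```

Getting useful collation hit counts requires asking the spellchecker to try collations (e.g. spellcheck.collate with extended results) so it reports how many documents each collation would match.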
Re: Find duplicates
And if I am correct, enabling docValues will do this kind of grouping as part of the indexing with docValues data structure (per segment). So, all one has to do is to get it back (through faceting). Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 2 December 2014 at 11:02, Erik Hatcher erik.hatc...@gmail.com wrote: Sort of… if you indexed the full value of the field (and you’re looking for truly exact matches) as a string field type you could facet on that field with facet.mincount=2 and the facets returned would be the ones with duplicate values. You’d have to drill down on each of the facets returned to find the actual docs. Erik On Dec 2, 2014, at 10:57 AM, Peter Kirk p...@alpha-solutions.dk wrote: Hi Is it possible to formulate a Solr query which finds all documents which have the same value in a particular field? Note, I don't know what the value is, I just want to find all documents with duplicate values. For example, I have 5 documents: Doc1: field Name = Peter Doc2: field Name = Jack Doc3: field Name = Peter Doc4: field Name = Paul Doc5: field Name = Jack If I executed the query, it would find documents Doc1 and Doc3 (Peter is the same), and Doc2 and Doc5 (Jack is the same). Thanks, Peter
Re: Contextual search
Hi Alex, thanks. I was able to get the suggestion for thri book as the book of three. But when I search for threebook (three and book are now combined), I am not able to get the suggestion for a book of three. How do we solve this? On 01-Dec-2014 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: If you need Solr to treat 'thri' (invalid English) as 'three', you need to tell it to do so. Look at the synonym modules in the example's schema.xml. Or you could do phonetic matches. You have a couple of choices for those, but basically it's all about the specific analyzer chains to experiment with. So, start with that and come back if you still have troubles once you understand the way analyzers work. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 1 December 2014 at 09:46, ASHOK SARMAH ashoksarmah1...@gmail.com wrote: Hi all, I wanted to know how Solr performs contextual search. In my search I had given the query as three book and got the suggestion a book of three, which I wanted. But when I specify it as thri book, it gives me a spelling check for thri as three, which is fine, but why don't I get the result a book of three in this case, like before?
Re: SOLR Join Query, Use highest weight.
Thanks! I will take a look at this. I do have an additional question, since after a bunch of digging I believe I am going to run into another dead end. I want to execute the join (or rollup) query, but I want the facets to represent the facets of all the child documents, not the resulting product documents. From what I gather, this is not possible. My thought process of what I want to get goes as follows: 1) Execute my search for children 2) Get the facets for all the children 3) Roll up the child dataset into its parent dataset, keeping the score. Is this easily possible with the tools available today? Thanks! Darin On Dec 1, 2014, at 11:01 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, AFAIK {!join} doesn't supply any meaningful scores. I can suggest https://issues.apache.org/jira/browse/SOLR-6234 On Tue, Dec 2, 2014 at 4:35 AM, Darin Amos dari...@gmail.com wrote: Hello, I had sent an email a few days ago talking about implementing a custom rollup query component. I have changed direction a little bit because I have learned about the JoinQuery. I have an index that contains a combination of parent and child documents. The parent-child relationship is always one-to-many. Here is a very simple sample query: http://localhost:8983/solr/testcore/select?q=*:*&fq={!join%20from=parent%20to=id}type:child When I have a more specific query that actually gives some meaningful weights: q=name:(*Shirt*)%20OR%20name:(*Small*), it appears the join query assigns the parent the weight of the last document encountered. For example, if a parent's 2 children have weights of 1.4 and 0.4 without the join query, the parent has a weight of 0.4 after the join query. 
Is there a way that I can extend or modify the join query so it would assign the highest child weight to the parent document? Thanks!! Darin -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Tika HTTP 400 Errors with DIH
Hi all, I am using Solr 4.9.0 to index a DB with DIH. In the DB there is a URL field. In the DIH, Tika uses that field to fetch and parse the documents. The URL from the field is valid and will download the document in the browser just fine. But Tika is getting HTTP response code 400. Any ideas why?

ERROR BinURLDataSource java.io.IOException: Server returned HTTP response code: 400 for URL:
EntityProcessorWrapper Exception in entity : tika_content:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url

DIH dataConfig:

<dataConfig>
  <dataSource type="JdbcDataSource" name="ds-1"
              driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://1.2.3.4/database;instance=INSTANCE;user=USER;password=PASSWORD" />
  <dataSource type="BinURLDataSource" name="ds-2" />
  <document>
    <entity name="db_content" dataSource="ds-1"
            transformer="ClobTransformer, RegexTransformer"
            query="SELECT ContentID, DownloadURL FROM DATABASE.VIEW">
      <field column="ContentID" name="id" />
      <field column="DownloadURL" clob="true" name="DownloadURL" />
      <entity name="tika_content" processor="TikaEntityProcessor"
              url="${db_content.DownloadURL}" onError="continue" dataSource="ds-2">
        <field column="TikaParsedContent" />
      </entity>
    </entity>
  </document>
</dataConfig>

SCHEMA - Fields:

<field name="DownloadURL" type="string" indexed="true" stored="true" />
<field name="TikaParsedContent" type="text_general" indexed="true" stored="true" multiValued="true"/>
Re: Contextual search
Well, how would you expect it to solve it, in non-technical terms? What's the high-level description of book of three matching threebook and not, say, threeof? Random permutation of any two words? It's a bit of a strange requirement so far. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 2 December 2014 at 12:55, ASHOK SARMAH ashoksarmah1...@gmail.com wrote: Hi Alex, thanks. I was able to get the suggestion for thri book as the book of three. But when I search for threebook (three and book are now combined), I am not able to get the suggestion for a book of three. How do we solve this? On 01-Dec-2014 9:34 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: If you need Solr to treat 'thri' (invalid English) as 'three', you need to tell it to do so. Look at the synonym modules in the example's schema.xml. Or you could do phonetic matches. You have a couple of choices for those, but basically it's all about the specific analyzer chains to experiment with. So, start with that and come back if you still have troubles once you understand the way analyzers work. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 1 December 2014 at 09:46, ASHOK SARMAH ashoksarmah1...@gmail.com wrote: Hi all, I wanted to know how Solr performs contextual search. In my search I had given the query as three book and got the suggestion a book of three, which I wanted. But when I specify it as thri book, it gives me a spelling check for thri as three, which is fine, but why don't I get the result a book of three in this case, like before?
Re: Tika HTTP 400 Errors with DIH
On 2 December 2014 at 13:19, Teague James teag...@insystechinc.com wrote: clob=true What is the ClobTransformer doing on the DownloadURL field? Is it possible it is corrupting the value somehow? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
Re: SOLR Join Query, Use highest weight.
Have you considered using grouping? If I understand your requirements, I think it does what you want. https://cwiki.apache.org/confluence/display/solr/Result+Grouping On 12/02/2014 12:59 PM, Darin Amos wrote: Thanks! I will take a look at this. I do have an additional question, since after a bunch of digging I believe I am going to run into another dead end. I want to execute the join (or rollup) query, but I want the facets to represent the facets of all the child documents, not the resulting product documents. From what I gather, this is not possible. My thought process of what I want goes as follows: 1) Execute my search for children 2) Get the facets for all the children 3) Roll up the child dataset into its parent dataset, keeping the score. Is this easily possible with the tools available today? Thanks! Darin On Dec 1, 2014, at 11:01 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello, AFAIK {!join} doesn't supply any meaningful scores. I can suggest https://issues.apache.org/jira/browse/SOLR-6234 On Tue, Dec 2, 2014 at 4:35 AM, Darin Amos dari...@gmail.com wrote: Hello, I had sent an email a few days ago talking about implementing a custom rollup query component. I have changed directions a little bit because I have learned about the JoinQuery. I have an index that contains a combination of parent and child documents. The parent-child relationship is always one-to-many. Here is a very simple sample query: http://localhost:8983/solr/testcore/select?q=*:*&fq={!join from=parent to=id}type:child When I have a more specific query that actually gives some meaningful weights, q=name:(*Shirt*) OR name:(*Small*), it appears the join query assigns the parent the weight of the last child document encountered. For example, if a parent's 2 children have weights of 1.4 and 0.4 without the join query, the parent has a weight of 0.4 after the join query. Is there a way that I can extend or modify the join query so it would assign the highest child weight to the parent document? Thanks!! Darin -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
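Besides the SOLR-6234 patch, if the index can be rearranged so that each parent and its children are indexed together as one Lucene document block, the lucene-join module's ToParentBlockJoinQuery supports exactly this rollup via ScoreMode. A sketch against the Lucene 4.x join API (not runnable standalone, and it requires re-indexing as blocks; it is not a drop-in change to {!join}):

```java
// Score each parent with the MAXIMUM score among its matching children.
Query childQuery = new TermQuery(new Term("name", "shirt"));
Filter parentFilter = new CachingWrapperFilter(
    new QueryWrapperFilter(new TermQuery(new Term("type", "parent"))));
Query rollup = new ToParentBlockJoinQuery(childQuery, parentFilter, ScoreMode.Max);
```

ScoreMode also offers Avg, Total and None, so the same query shape covers several rollup policies.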
Re: Getting the position of a word via Solr API
I would keep trying with the highlighters. Some of them, at least, have options to provide an external text source, although you will almost certainly have to write some Java code to get this working; extend the highlighter you choose and supply its text from an external source. -Mike On 12/02/2014 08:13 AM, adfel70 wrote: Small update, I have managed to make the Term Vector component work and I am getting all the words of the text field. The problem is that it doesn't work with several words combined; I can't find the offset where the needed expression starts... Any ideas, anyone? Thanks! -- View this message in context: http://lucene.472066.n3.nabble.com/Getting-the-position-of-a-word-via-Solr-API-tp4171877p4172092.html Sent from the Solr - User mailing list archive at Nabble.com.
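For reference, the per-term positions and offsets discussed above come from the TermVectorComponent, driven by the tv.* request parameters. A SolrJ sketch of such a request (the /tvrh handler and the field name "text" are assumptions taken from the stock example configs, and the field must be indexed with termVectors, termPositions and termOffsets enabled):

```java
// Ask the TermVectorComponent for positions and character offsets
// of every term of the "text" field in the matching document.
SolrQuery q = new SolrQuery("id:1");
q.set("tv", true);
q.set("tv.fl", "text");
q.set("tv.positions", true);
q.set("tv.offsets", true);
q.setRequestHandler("/tvrh");   // handler configured with the component
QueryResponse rsp = solrServer.query(q);
```

For a multi-word expression you still have to combine the single-term offsets yourself, which is the difficulty described in the post.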
Re: SOLR Join Query, Use highest weight.
Hi, Thanks for the response. I have considered grouping often, but grouping does not return the parent document, just the group id. I would still have to add something to take the group ids and fetch the parent documents. Thanks, Darin On Dec 2, 2014, at 2:11 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: Have you considered using grouping? If I understand your requirements, I think it does what you want. https://cwiki.apache.org/confluence/display/solr/Result+Grouping
Re: SOLR Join Query, Use highest weight.
We simply index parent and child documents with the same field value and group on that, querying both parent and child documents. If you boost the parent it will show up as the first result in the group. Then you get all related documents together in the same group. -Mike On 12/02/2014 02:27 PM, Darin Amos wrote: Hi, Thanks for the response. I have considered grouping often, but grouping does not return the parent document, just the group id. I would still have to add something to take the group ids and fetch the parent documents. Thanks, Darin
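Michael's same-group approach can be expressed with the Result Grouping parameters. A hedged SolrJ sketch (the shared field name "rollupId", the boost value and the configured SolrServer are illustrative assumptions, not from the thread):

```java
// Group parents and children on a shared field; boosting type:parent
// pushes the parent to the top of each group, so parent and children
// come back together.
SolrQuery q = new SolrQuery("name:shirt OR type:parent^10");
q.set("group", true);
q.set("group.field", "rollupId");
q.set("group.limit", 10);   // documents returned per group
QueryResponse rsp = solrServer.query(q);
```

With group.limit above 1 the response contains the parent plus its matching children in one group, which addresses the "grouping only returns the group id" objection at the cost of post-processing the grouped response.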
Solr collection alias - how rank is affected
Solr allows creating an alias for a few collections via its API. Suppose I have two collections, C1 and C2, and an alias C3 = C1, C2. C1 and C2 are deployed on different machines but share a common ZooKeeper. How is ranking affected when searching the C3 collection? When they have the same schema? Different schemas? (It is possible to search across different schemas if we use field aliases on both collections.) -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-collection-alias-how-rank-is-affected-tp4172197.html Sent from the Solr - User mailing list archive at Nabble.com.
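For reference, the alias in question is created with the Collections API CREATEALIAS action. A trivial helper that just forms the request URL (host and collection names are placeholders):

```java
public class AliasUrl {
    // CREATEALIAS call from the Collections API; "host" is a placeholder
    // for any node of the SolrCloud cluster.
    public static String createAlias(String host, String alias, String collections) {
        return host + "/solr/admin/collections?action=CREATEALIAS"
            + "&name=" + alias + "&collections=" + collections;
    }

    public static void main(String[] args) {
        System.out.println(createAlias("http://localhost:8983", "C3", "C1,C2"));
    }
}
```

A query against C3 then fans out to C1 and C2 like any distributed search, so scores are computed per shard with local term statistics unless distributed IDF is arranged separately.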
indexing numbers in texts for range queries
Hello Searchers, Does anyone remember examples of indexing numbers inside plain text? E.g., if I have the text "foo and 10 bars", I want to find it with a query like foo [8 TO 20] bars. Question no. 1: should the trie terms go into a separate field, or can they reside in the same text field? Note, enumerating [0-9]* terms in a MultiTermQuery is not an option for me; I definitely need the trie field magic! Perhaps you can point me to a blog or chapter, whatever makes me happy. Thanks a lot! -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: indexing numbers in texts for range queries
Mikhail - I can imagine a filter that strips out everything but numbers and then indexes those with a (separate) numeric (trie) field. But I don't believe you can do phrase or other proximity queries across multiple fields. As long as an or-query is good enough, I think this problem is not too hard? But if you need proximity it becomes more complicated. Once in the distant past we coded a numeric range query using a complicated set of wildcard queries that could handle large numbers efficiently - this search index (Verity) had no range capability, so we had to mock it up using text. The way this worked was something along these lines: 1) transform all the numbers into their binary encoding (8 = 0b1000, eg) 2) write queries by encoding the range as a set of bitmasks represented by wildcard queries: [8 TO 20] becomes (0b1000 0b000100?? 0b00010100) I know you said you cannot use [0-9]* terms, but you will not see terrible term explosion with this. What's your concern there? -Mike On 12/02/2014 02:59 PM, Mikhail Khludnev wrote: Hello Searchers, Don't you remember any examples of indexing numbers inside of plain text. eg. if I have a text: foo and 10 bars I want to find it with a query like foo [8 TO 20] bars. The question no.1 whether to put trie terms into the separate field or they can reside at the same text one? Note, enumerating [0-9]* terms in MultiTermQuery is not an option for me, I definitely need the trie field magic! Perhaps you can remind a blog or chapter, whatever makes me happy. Thanks a lot!
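The bitmask trick described above can be made concrete. A minimal sketch (width and the exact pattern set are illustrative; real terms would need the same fixed-width binary encoding at index time) that expands a numeric range into fixed-width binary wildcard patterns, where '?' matches a single bit position:

```java
import java.util.ArrayList;
import java.util.List;

public class BinaryRangePatterns {
    // Expand [lo, hi] into fixed-width binary patterns where '?' is a
    // one-character wildcard, e.g. [8, 20] at width 8 becomes
    // 00001??? (8..15), 000100?? (16..19), 00010100 (20).
    public static List<String> expand(int lo, int hi, int width) {
        List<String> out = new ArrayList<String>();
        while (lo <= hi) {
            // Grow the largest power-of-two block aligned at lo that
            // still fits entirely inside [lo, hi].
            int block = 1;
            while (lo % (block * 2) == 0 && lo + block * 2 - 1 <= hi) {
                block *= 2;
            }
            int bits = Integer.numberOfTrailingZeros(block);
            StringBuilder p = new StringBuilder(Integer.toBinaryString(lo >> bits));
            while (p.length() < width - bits) {
                p.insert(0, '0');            // left-pad to fixed width
            }
            for (int i = 0; i < bits; i++) {
                p.append('?');               // wildcard the low bits
            }
            out.add(p.toString());
            lo += block;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(expand(8, 20, 8));
    }
}
```

The pattern count grows only logarithmically with the range size, which is why the term explosion stays modest; it is essentially the same precision-step idea the trie fields implement natively.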
Re: indexing numbers in texts for range queries
Hello Michael, On Tue, Dec 2, 2014 at 11:15 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: Mikhail - I can imagine a filter that strips out everything but numbers and then indexes those with a (separate) numeric (trie) field. But I don't believe you can do phrase or other proximity queries across multiple fields. Technically it's not a big deal; I used FieldMaskingSpanQuery before. As long as an or-query is good enough, I think this problem is not too hard? But if you need proximity it becomes more complicated. Once in the distant past we coded a numeric range query using a complicated set of wildcard queries that could handle large numbers efficiently - this search index (Verity) had no range capability, so we had to mock it up using text. The way this worked was something along these lines: 1) transform all the numbers into their binary encoding (8 = 0b1000, e.g.) 2) write queries by encoding the range as a set of bitmasks represented by wildcard queries: [8 TO 20] becomes (0b1000 0b000100?? 0b00010100) I know you said you cannot use [0-9]* terms, but you will not see terrible term explosion with this. What's your concern there? It's not terrible but it is significant; I want to give the trie magic a try, since it reduces query-time processing. Thanks for the suggestions. Do I remember correctly that you ignored the last Lucene Revolution?
-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
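The FieldMaskingSpanQuery approach mentioned above, sketched against the Lucene span API (field names are illustrative; not runnable standalone):

```java
// Span over "foo" in the text field, combined positionally with a
// number term indexed in a parallel "nums" field that is masked to
// look like the text field, so both spans can join in one SpanNear.
SpanQuery foo = new SpanTermQuery(new Term("text", "foo"));
SpanQuery num = new FieldMaskingSpanQuery(
    new SpanTermQuery(new Term("nums", "10")), "text");
SpanQuery near = new SpanNearQuery(new SpanQuery[] { foo, num }, 2, true);
```

This only works if the two fields are analyzed so that their token positions line up, which is the usual caveat with field masking.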
Re: indexing numbers in texts for range queries
On 12/02/2014 03:41 PM, Mikhail Khludnev wrote: Thanks for the suggestions. Do I remember correctly that you ignored the last Lucene Revolution? I wouldn't say I ignored it, but it's true I wasn't there in DC; I'm excited to catch up on the presentations as the videos become available, though. -Mike
Re: indexing numbers in texts for range queries
Hi Mikhail, Range queries are allowed inside phrases with ComplexPhraseQParser, but I think string (lexicographic) order is used. Also, LUCENE-5205 / SOLR-5410 is meant to supersede complex phrase; it might have that functionality too. Ahmet
Re: Replication of a corrupt master index
If nothing else, the disk underlying the index could have a bad spot... There have been some corrupt-index bugs in the past, but they always get a super-high priority for fixing, so they don't hang around for long. You can always take periodic backups. Perhaps the slickest way to do that is to set up a slave that does nothing but poll once a day. Since you know that's not changing, you can do simple disk copies of the index and at least minimize your possible outage. Now, all that said, you may want to consider SolrCloud. The advantage there is that each node gets the raw input and very rarely does replication. Failover in that scenario is as simple as killing the bad node, and things just work. Best, Erick On Tue, Dec 2, 2014 at 7:40 AM, Charra, Johannes johannes.charrahorstm...@haufe-lexware.com wrote: Thanks for your response, Erick. Do you think it is possible to corrupt an index merely with HTTP requests? I've been using the aforementioned m/s setup for years now and have never seen a master failure. I'm trying to think of scenarios where this setup (1 master, 4 slaves) might have a total outage. The master runs on an h/a cluster. Regards, Johannes -----Original Message----- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Tuesday, 2 December 2014 15:54 To: solr-user@lucene.apache.org Subject: Re: Replication of a corrupt master index No. The master is the master and will always stay the master unless you change it. This is one of the reasons I really like to keep the original source around, in case I ever have this problem. Best, Erick On Tue, Dec 2, 2014 at 2:34 AM, Charra, Johannes johannes.charrahorstm...@haufe-lexware.com wrote: Hi, If I have a master/slave setup and the master index gets corrupted, will the slaves realize they should not replicate from the master anymore, since the master does not have a newer index version? I'm using Solr version 4.2.1. Regards, Johannes
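For the periodic-backup suggestion, the ReplicationHandler also exposes a backup command over HTTP, which avoids hand-copying index directories. A trivial helper that forms the request (host and core name are placeholders):

```java
public class BackupUrl {
    // ReplicationHandler backup call; numberToKeep bounds how many
    // snapshots are retained on disk.
    public static String build(String host, String core, int maxKeep) {
        return host + "/solr/" + core
            + "/replication?command=backup&numberToKeep=" + maxKeep;
    }

    public static void main(String[] args) {
        System.out.println(build("http://localhost:8983", "collection1", 3));
    }
}
```

Hitting that URL on the dedicated polling slave once a day gives consistent snapshots without touching the live master.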
Re: Contextual search
Hi Alex, I have specified the following in my solrconfig.xml:

<str name="spellcheck">on</str>
<str name="spellcheck.extendedResults">true</str>
<str name="spellcheck.count">5</str>
<str name="spellcheck.alternativeTermCount">2</str>
<str name="spellcheck.maxResultsForSuggest">5</str>
<str name="spellcheck.collate">true</str>
<str name="spellcheck.collateExtendedResults">true</str>
<str name="spellcheck.maxCollationTries">5</str>
<str name="spellcheck.maxCollations">3</str>
<str name="spellcheck.dictionary">wordbreak</str>
<str name="spellcheck.MinBreakWordLength">5</str>

I added the spellcheck.dictionary=wordbreak and spellcheck.MinBreakWordLength=5 entries to break words with minimum length 5; then it should break my word 'threebook' into 'three' and 'book', right? Correct me if I am wrong. But I am not getting the required search results. Kindly suggest. On Wed, Dec 3, 2014 at 12:08 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Well, how would you expect it to solve it, in non-technical terms? What's the high-level description of 'book of three' matching 'threebook' and not, say, 'threeof'? Random permutation of any two words? It's a bit of a strange requirement so far. Regards, Alex.
Re: Contextual search
Sorry, beyond my area of expertise now. Hopefully somebody else will pitch in. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 2 December 2014 at 22:03, ASHOK SARMAH ashoksarmah1...@gmail.com wrote: Hi Alex, I have specified these in the solrconfig.xml: <str name="spellcheck.dictionary">wordbreak</str> and <str name="spellcheck.MinBreakWordLength">5</str> are for breaking the word 'threebook' into 'three' and 'book'. But even so it does not find the string 'A book of three'. Kindly suggest what other ways it can be done.