Re: Re: How to properly use Levenstein distance with ~ in Java
Hi Aleksander, the fuzzy search operator '~' is not supported in dismax (defType=dismax): https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser You are using the SearchComponent spellchecker. That does not change the query results. Btw: it looks like you are using the path /select with qt=dismax. This would normally throw an exception. Is there a tag <requestHandler name="/dismax" ...> inside your solrconfig.xml? Best regards Karsten P.S. in context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-td4164793.html On 20 October 2014 11:13, Aleksander Sadecki wrote: Ok, thank you for your response. But why can't I use '~'?
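If such a handler does exist, it would look something like this in solrconfig.xml (a minimal sketch; the qf fields are invented placeholders, not from the original thread):

    <requestHandler name="/dismax" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">dismax</str>
        <str name="qf">title^2 description</str>
      </lst>
    </requestHandler>

With defType=dismax in effect, the '~' in a query is escaped and treated as a literal character rather than as fuzzy syntax, which matches the behavior Aleksander is seeing.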
Re: unstable results on refresh
My user interface shows some boxes to describe results categories. After half a day of small updates and deletes I noticed, with various queries, that the boxes started swapping while browsing. For sure I relied too much on getting the same results on each call; now I'm keeping the categories order in request parameters to avoid the blink effect while browsing. The optimize process is really slow, and I can't use it. Since I have many other parameters that should be carried along the request to make sure that the navigation is consistent, I would like to understand if there is a setup that can limit the idf change and keep it low enough. I tried with

    <indexConfig>
      <mergeFactor>5</mergeFactor>
    </indexConfig>

in solrconfig, but this morning /solr/admin/cores?action=STATUS still reports a number of segments above ten for all cores of the shard. (I'm sure I have reloaded each core after changing the value.) Now I'm trying with expungeDeletes called from SolrJ, but still I don't see the segment count decrease:

    UpdateRequest commitRequest = new UpdateRequest();
    // (action, waitFlush, waitSearcher, maxSegments, softCommit, expungeDeletes)
    commitRequest.setAction(ACTION.COMMIT, true, true, 10, false, true);
    commitRequest.process(solrServer);

2014-10-22 15:48 GMT+02:00 Erick Erickson erickerick...@gmail.com: I would rather ask whether such small differences matter enough to do this. Is this something users will _ever_ notice? Optimization is quite a heavyweight operation, and is generally not recommended on indexes that change often, and 5 minutes is certainly below the recommendation for optimizing. There is/has been work done on distributed IDF that should address this (I think), but I don't quite know the current status. But other than in a test setup, is it worth the effort? Best, Erick

On Wed, Oct 22, 2014 at 3:54 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote: I have made some small patches to the application to make this problem less visible, and I'm trying to perform the optimize once per hour; yesterday it took 5 minutes, this morning 15 minutes. Today I will collect some statistics, but the publication process sends documents every 5 minutes, and I think the optimize is taking too much time. I have no default mergeFactor configured for this collection; do you think that setting it to a small value could improve the situation? If I have understood well, having to merge segments will keep similar stats on all nodes. It's ok to have the indexing process a little bit slower.

2014-10-21 18:44 GMT+02:00 Erick Erickson erickerick...@gmail.com: Giovanni: To see how this happens, consider a shard with a leader and two followers. Assume your autocommit interval is 60 seconds on each. This interval can expire at slightly different wall-clock times. Even if the servers started perfectly in sync, they can get slightly out of sync. So, you index a bunch of docs and these replicas close the current segment and re-open a new segment with slightly different contents. Now docs come in that replace older docs. The tf/idf statistics _include_ deleted document data (which is purged on optimize). Given that doc X can be in different segments (or, more accurately, segments that get merged at different times on different machines), replica 1 may have slightly different stats than replica 2, thus computing slightly different scores. Optimizing purges all data related to deleted documents, so it all regularizes itself on optimize.
Best, Erick On Tue, Oct 21, 2014 at 11:08 AM, Giovanni Bricconi giovanni.bricc...@banzai.it wrote: I noticed the problem again, and now I was able to collect some data. In my paste http://pastebin.com/nVwf327c you can see the result of the same query issued twice; the 2nd and 3rd group are swapped. I pasted also the clusterstate and the core state for each core. The logs didn't show any problem related to indexing, only some malformed queries. After doing an optimize the problem disappeared. So, is the problem related to documents that were deleted from the index? The optimization took 5 minutes to complete. 2014-10-21 11:41 GMT+02:00 Giovanni Bricconi giovanni.bricc...@banzai.it: Nice! I will monitor the index and try this if the problem comes back. Actually the problem was due to small differences in score, so I think it has the same origin. 2014-10-21 8:10 GMT+02:00 lboutros boutr...@gmail.com: Hi Giovanni, we had this problem as well. The cause was that the different nodes had slightly different idf values. We solved this problem by doing an optimize operation, which really removes suppressed data. Ludovic. - Jouve France.
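Since the eventual workaround in this thread is a periodic optimize, here is a minimal SolrJ sketch of what such an hourly job might look like (the URL and core name are placeholders; this uses the SolrServer.optimize() convenience method rather than a raw UpdateRequest):

    import org.apache.solr.client.solrj.impl.HttpSolrServer;

    public class HourlyOptimize {
        public static void main(String[] args) throws Exception {
            HttpSolrServer solrServer = new HttpSolrServer("http://localhost:8983/solr/collection1");
            // Optimize down to a single segment; this purges deleted-document
            // data, which is what regularizes tf/idf stats across replicas.
            solrServer.optimize(true, true, 1); // waitFlush, waitSearcher, maxSegments
            solrServer.shutdown();
        }
    }

Erick's caveat still applies, of course: optimize is a heavyweight operation, so running it this often against a frequently updated index is a trade-off, not a fix.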
Re: StatelessScriptUpdateProcessorFactory Access to Solr Core/schema/analyzer etc
On Oct 22, 2014, at 3:27 PM, Shawn Heisey apa...@elyograg.org wrote: On 10/22/2014 11:50 AM, Tom LAMPERT wrote: I am attempting to create a script (JavaScript) using the StatelessScriptUpdateProcessorFactory feature of Solr, but I am blocked on how to access the current core instance (ultimately to access its schema). In the wiki example the input document is accessible using doc = cmd.solrDoc, but no other information is given. The aim of the script is to apply any filters/tokenisers to the input fields before Solr indexes them, so that the stored values are those after processing, not the original data. Any tips would be gratefully received, as I cannot find any info on the API for this framework... I would guess that you'd need to be writing Java code to have this kind of detail, not JavaScript. The info in the other replies you received is talking about Java code. JavaScript would not be able to execute the analysis on the input anyway -- that's all Java as well. Ummm… see slides 10 and 11 here: http://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks So yes, you can do analysis tricks in an update script. And it’s incredibly useful and powerful! :) Erik
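For reference, wiring such a script in is plain solrconfig.xml configuration, along these lines (the chain and script names are placeholders):

    <updateRequestProcessorChain name="script">
      <processor class="solr.StatelessScriptUpdateProcessorFactory">
        <str name="script">update-script.js</str>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

And if I'm reading the javadocs right, the script is also given a 'req' global (the SolrQueryRequest), so req.getCore() and req.getSchema() should get you to the core and its schema, which is what the original question was after.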
Analytics component
Hi All, I'm trying to use Solr to do some analytic functions (percentile, median...). I got the trunk branch of Solr, which contains the analytics component implementation. I've rebuilt Solr, but unfortunately this component wasn't taken into consideration and no lib was generated in /contrib/analytics. Do you have any idea how to get it compiled? Otherwise, any idea how to have these analytics in Solr? Regards, Nabil.
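For what it's worth, on an ant-based checkout the contrib jars normally come from building in the solr directory; a sketch from memory, so the exact targets and paths on trunk may differ:

    cd solr
    ant dist    # compiles Solr and its contrib modules into solr/dist/

If the analytics jar still doesn't show up, it's worth checking that the checkout actually contains solr/contrib/analytics with its own build.xml, i.e. that the module really is present on the branch you pulled.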
Re: Difference between unloading of cores with LotsOfCores and unloading a core with CoreAdmin
Memory should eventually be returned when a core is unloaded. There's a very small amount of overhead for keeping a list of all the cores and their locations, but this shouldn't increase with time unless you're adding more cores. Do note that the transient cache size is fixed, but may be exceeded: a core that gets reclaimed is held open long enough to serve any outstanding requests, but it _should_ have the memory reclaimed eventually. Of course there's always the possibility of some memory being kept inadvertently. I'd consider that a bug, so if you can define how this happens, perhaps with a test case, that would be great. Dumping the memory would help see what's kept, if anything actually is. Best, Erick

On Wed, Oct 22, 2014 at 12:33 PM, Xiaolu Zhao xiaolu.z...@oracle.com wrote: Hi Erick, Thanks a lot for your explanation. Last time, when I tried out LotsOfCores, I found JVM memory usage would increase as the total number of cores grew, though the transient cache size was fixed. Finally, the JVM would run out of memory when I had thousands of cores. Does it mean other currently unloaded cores will consume memory? Or that swapping among loaded/unloaded cores will consume memory? Best, Xiaolu

On 10/22/2014 12:23 PM, Erick Erickson wrote: The difference here is that LotsOfCores is intended to cache open cores and thus limit the number of currently loaded cores. However, cores not currently loaded are available for use; the next request that needs such a core will cause it to be loaded (or reloaded). The admin/core/UNLOAD command, on the other hand, is designed to _permanently_ remove the core from Solr, or at least have it become unavailable until another explicit admin/core command is executed to bring it back. There is nothing automatic about this. Another way of looking at it is that LotsOfCores is used in a situation where you don't know what requests are coming in, but you _can_ predict that not many cores will be used at once. So if I have 500 cores, and my expectation is that only 20 of them are used at once, there's no good in having the 480 other cores loaded all the time. When a query comes in for one of the currently-unloaded cores (call it core21), that core is loaded (perhaps displacing one of the currently-loaded cores) and the request is served. If core21 above had been unloaded with the core/admin command, then a request directed to it would return an error instead. Best, Erick

On Wed, Oct 22, 2014 at 12:11 PM, Xiaolu Zhao xiaolu.z...@oracle.com wrote: Hi All, I am confused about the difference between unloading of cores with LotsOfCores and unloading a core with CoreAdmin. From my understanding of LotsOfCores, if one core is removed from the transient cache, it is pending close; that means closing all resources allocated by the core (e.g. searcher, updateHandler...) once it is no longer in use. While for unloading a core with CoreAdmin, the core needs to be removed from the cores list, either the ordinary cores list or the transient cores list, and the cores locator will delete it; if the core is loaded but not pending close, it will be closed. Also, one more interesting thing: if I unload a core with CoreAdmin, core.properties will be renamed core.properties.unloaded. Then this core cannot be found in the Solr API, and the STATUS url won't return its status either. But with LotsOfCores, a core not in the transient cache will still have core.properties and can be found through the STATUS url, though it is marked with isLoaded=false. Could anyone tell me the underlying mechanism for these two cases?
Why can LotsOfCores support frequent unloading/loading of cores? Do cores not in the transient cache still consume JVM memory, while cores unloaded with CoreAdmin do not? Thanks, Xiaolu
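For readers following along, the LotsOfCores behavior discussed here is driven by per-core and container settings; a minimal sketch (the values are examples only):

    # core.properties for a core managed by the transient cache
    name=core21
    transient=true
    loadOnStartup=false

and, in the 4.4+ solr.xml format:

    <solr>
      <int name="transientCacheSize">500</int>
    </solr>

Cores marked transient are loaded on demand and evicted in roughly LRU order once the cache is full, while a CoreAdmin UNLOAD removes the core outright.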
Re: How to properly use Levenstein distance with ~ in Java
We’re reimplementing fuzzy support in edismax on Solr 4.x right now. See: https://issues.apache.org/jira/browse/SOLR-629 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Oct 22, 2014, at 11:08 PM, karsten-s...@gmx.de wrote: Hi Aleksander, the fuzzy search operator '~' is not supported in dismax (defType=dismax): https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser You are using the SearchComponent spellchecker. That does not change the query results. Btw: it looks like you are using the path /select with qt=dismax. This would normally throw an exception. Is there a tag <requestHandler name="/dismax" ...> inside your solrconfig.xml? Best regards Karsten P.S. in context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-td4164793.html On 20 October 2014 11:13, Aleksander Sadecki wrote: Ok, thank you for your response. But why can't I use '~'?
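Until that lands, one workaround sketch is to route the fuzzy term through the standard lucene parser via local params (the field name here is just an example):

    q={!lucene df=name}aleksander~2

where ~2 is ordinary Lucene fuzzy syntax, i.e. a maximum edit distance of 2.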
Re: unstable results on refresh
On 10/23/2014 2:44 AM, Giovanni Bricconi wrote: My user interface shows some boxes to describe results categories. After half a day of small updates and deletes I noticed, with various queries, that the boxes started swapping while browsing. For sure I relied too much on getting the same results on each call; now I'm keeping the categories order in request parameters to avoid the blink effect while browsing. The optimize process is really slow, and I can't use it. Since I have many other parameters that should be carried along the request to make sure that the navigation is consistent, I would like to understand if there is a setup that can limit the idf change and keep it low enough. I tried with <indexConfig><mergeFactor>5</mergeFactor></indexConfig> in solrconfig, but this morning /solr/admin/cores?action=STATUS still reports a number of segments above ten for all cores of the shard. (I'm sure I have reloaded each core after changing the value.) Now I'm trying with expungeDeletes called from SolrJ, but still I don't see the segment count decrease.

It's completely normal to have more segments than the mergeFactor. Think about this scenario with a mergeFactor of 5: You index five segments. They get merged to one segment. Let's say that this happens a total of four times, so you've indexed a total of 20 segments and merging has reduced that to four larger segments. Let's say that you now index four more segments. You'll be completely stable with eight segments. If you index another one, that will result in a fifth larger segment. This sets conditions up just right for another merge -- to one even larger segment. This represents three levels of merging, and there can be even more levels, each of which can have four segments and remain stable. Starting at the last state I described, if you then indexed 24 more segments, you'd have a stable index with a total of nine segments -- four of them would be normal sized, four of them would be about five times normal size, and the first one would be about 25 times normal size. The Solr default for the merge policy in all recent versions is TieredMergePolicy, and this can make things slightly more complicated than I've described, because it can merge *any* segments, not just those indexed sequentially, and I believe that it can delay merging until the right number of segments with suitable characteristics appear. I've got merge settings equivalent to a mergeFactor of 35, but I regularly see the segment count approach 100, and there's absolutely nothing wrong with my merging. If I understand it correctly, expungeDeletes will not decrease the segment count. It will simply rewrite segments that have deleted documents so there are none. I'm not 100% sure that I know exactly what expungeDeletes does, though. Thanks, Shawn
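Since Shawn mentions merge settings equivalent to a mergeFactor of 35, this is roughly what that looks like with TieredMergePolicy in a 4.x solrconfig.xml (the values are illustrative, not Shawn's actual settings):

    <indexConfig>
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">35</int>
        <int name="segmentsPerTier">35</int>
      </mergePolicy>
    </indexConfig>

With TieredMergePolicy, maxMergeAtOnce and segmentsPerTier together play the role the old mergeFactor did, which is part of why a single small mergeFactor value doesn't cap the segment count the way Giovanni expected.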
Re: StatelessScriptUpdateProcessorFactory Access to Solr Core/schema/analyzer etc
On 10/23/2014 2:47 AM, Erik Hatcher wrote: Ummm… see slides 10 and 11 here: http://www.slideshare.net/erikhatcher/solr-indexing-and-analysis-tricks So yes, you can do analysis tricks in an update script. And it’s incredibly useful and powerful! :) That's pretty amazing. I would not have imagined that this kind of crossover would be possible. Thanks for the info! Shawn
QueryAutoStopWordAnalyzer
I just located the QueryAutoStopWordAnalyzer in Lucene. Has anyone managed to use it with Solr? I could imagine having a language-independent search clean-up for the text_all field. Can it be used in Solr right out of the box, or do I have to write a wrapper or factory? Regards Bernd
Re: Analytics component
I believe some of the statistics functions that you're trying to use are present in facets. - Original Message - From: nabil Kouici koui...@yahoo.fr To: solr-user@lucene.apache.org Sent: Thursday, October 23, 2014 5:57:27 AM Subject: Analytics component Hi All, I'm trying to use Solr to do some analytic functions (percentile, median...). I got the trunk branch of Solr, which contains the analytics component implementation. I've rebuilt Solr, but unfortunately this component wasn't taken into consideration and no lib was generated in /contrib/analytics. Do you have any idea how to get it compiled? Otherwise, any idea how to have these analytics in Solr? Regards, Nabil.
Re: Difference between unloading of cores with LotsOfCores and unloading a core with CoreAdmin
Hi Erick, Actually we are adding more cores. In this case, we set transientCacheSize=500 and create 16,000 cores in total, each with 10k log entries. During the process, we can easily see JVM memory usage increase as the total number of cores grows. It runs out of memory when the total number of cores reaches 5,400. Then we restart Solr and continue creating and loading cores. JVM memory usage will rise to over 7GB (max: 8GB), but not exceed the maximum. The process can be very slow then; we believe garbage collection may take place and cost some time.

How about the resource usage for LotsOfCores (loaded/unloaded), e.g. the searcher? Are all resources allocated by the core closed for unloaded cores? And how about the processing time for an unloaded core to get loaded first if we issue a query to it?

We did some testing to look into the processing time for unloaded cores. In this case, we have 100 cores: 1-50 with 100M, 51-55 with 1M, 56-60 with 10M, 61-70 with 100K, 71-100 with 10K. Then we query unloaded cores with different data sizes to get the processing time for each group. Here, the query is for all documents: select?q=*.

    Collection Name       Total Time(ms)   QTime(ms)   Processing Time(ms)
    collection71 (10K)               418           1                  417
    collection72 (10K)               413           0                  413
    collection61 (100K)              439           2                  437
    collection62 (100K)              424           1                  423
    collection51 (1M)                527           5                  522
    collection52 (1M)                538           5                  533
    collection56 (10M)               560          33                  527
    collection57 (10M)               553          33                  520
    collection3 (100M)              5971         322                 5649
    collection4 (100M)              6052         327                 5725

Based on the table above, we can see an ascending trend with larger data, but there is a big gap between 10M and 100M. Thanks, Xiaolu

On 10/23/2014 9:51 AM, Erick Erickson wrote: Memory should eventually be returned when a core is unloaded. There's a very small amount of overhead for keeping a list of all the cores and their locations, but this shouldn't increase with time unless you're adding more cores. Do note that the transient cache size is fixed, but may be exceeded: a core that gets reclaimed is held open long enough to serve any outstanding requests, but it _should_ have the memory reclaimed eventually. Of course there's always the possibility of some memory being kept inadvertently. I'd consider that a bug, so if you can define how this happens, perhaps with a test case, that would be great. Dumping the memory would help see what's kept, if anything actually is. Best, Erick

On Wed, Oct 22, 2014 at 12:33 PM, Xiaolu Zhao xiaolu.z...@oracle.com wrote: Hi Erick, Thanks a lot for your explanation. Last time, when I tried out LotsOfCores, I found JVM memory usage would increase as the total number of cores grew, though the transient cache size was fixed. Finally, the JVM would run out of memory when I had thousands of cores. Does it mean other currently unloaded cores will consume memory? Or that swapping among loaded/unloaded cores will consume memory? Best, Xiaolu

On 10/22/2014 12:23 PM, Erick Erickson wrote: The difference here is that LotsOfCores is intended to cache open cores and thus limit the number of currently loaded cores. However, cores not currently loaded are available for use; the next request that needs such a core will cause it to be loaded (or reloaded). The admin/core/UNLOAD command, on the other hand, is designed to _permanently_ remove the core from Solr, or at least have it become unavailable until another explicit admin/core command is executed to bring it back. There is nothing automatic about this.
Another way of looking at it is that LotsOfCores is used in a situation where you don't know what requests are coming in, but you _can_ predict that not many cores will be used at once. So if I have 500 cores, and my expectation is that only 20 of them are used at once, there's no good in having the 480 other cores loaded all the time. When a query comes in for one of the currently-unloaded cores (call it core21), that core is loaded (perhaps displacing one of the currently-loaded cores) and the request is served. If core21 above had been unloaded with the core/admin command, then a request directed to it would return an error instead. Best, Erick

On Wed, Oct 22, 2014 at 12:11 PM, Xiaolu Zhao xiaolu.z...@oracle.com wrote: Hi All, I am confused about the difference between unloading of cores with LotsOfCores and unloading a core with CoreAdmin. From my understanding of LotsOfCores, if one core is removed from the transient cache, it is pending close; that means closing all resources allocated by the core (e.g. searcher, updateHandler...) once it is no longer in use. While for unloading a core with CoreAdmin, this
Re: QueryAutoStopWordAnalyzer
How is this different from using StopFilterFactory in Solr: http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/analysis/core/StopFilterFactory.html ? Lucene wraps analyzers; Solr has a chain instead (though analyzers are supported as well). You just configure the chain. Writing a factory for the case where one analyzer wraps another would just duplicate the chain code. What am I missing? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 23 October 2014 10:31, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote: I just located the QueryAutoStopWordAnalyzer in Lucene. Has anyone managed to use it with Solr? I could imagine having a language-independent search clean-up for the text_all field. Can it be used in Solr right out of the box, or do I have to write a wrapper or factory? Regards Bernd
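As an illustration of "just configure the chain", a stop-word setup in schema.xml looks like this (the field type name and stopwords file are examples; note this uses a static stopwords.txt, whereas QueryAutoStopWordAnalyzer derives its stop words from term frequencies in the index):

    <fieldType name="text_all" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
      </analyzer>
    </fieldType>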
Re: How to properly use Levenstein distance with ~ in Java
The last real update on that issue is 2.5 years old. Is there a more recent update? I am interested in this topic as well. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 23 October 2014 10:10, Walter Underwood wun...@wunderwood.org wrote: We’re reimplementing fuzzy support in edismax on Solr 4.x right now. See: https://issues.apache.org/jira/browse/SOLR-629 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Oct 22, 2014, at 11:08 PM, karsten-s...@gmx.de wrote: Hi Aleksander, the fuzzy search operator '~' is not supported in dismax (defType=dismax): https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser You are using the SearchComponent spellchecker. That does not change the query results. Btw: it looks like you are using the path /select with qt=dismax. This would normally throw an exception. Is there a tag <requestHandler name="/dismax" ...> inside your solrconfig.xml? Best regards Karsten P.S. in context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-td4164793.html On 20 October 2014 11:13, Aleksander Sadecki wrote: Ok, thank you for your response. But why can't I use '~'?
update external file
I've been looking at ExternalFileField to handle popularity boosting, since Solr's updatable docvalues (SOLR-5944) aren't quite there yet. My question is whether there is any support for uploading the external file via Solr, or if people do that some other (external, I guess) way? -Mike
Re: update external file
Of course there is support for uploading the external file via Solr; you can find more details in the links below: https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes http://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/schema/ExternalFileField.html
Re: update external file
Thanks for the links, Ramzi. I had already read the wiki page, which merely talks about how to reload the file into memory once it has been updated on disk. It doesn't mention any support for uploading that I can see. Did I miss it? -Mike On 10/23/14 1:36 PM, Ramzi Alqrainy wrote: Of course there is support for uploading the external file via Solr; you can find more details in the links below: https://cwiki.apache.org/confluence/display/solr/Working+with+External+Files+and+Processes http://lucene.apache.org/solr/4_10_0/solr-core/org/apache/solr/schema/ExternalFileField.html
Re: update external file
I hope I understand your question well; I had the same problem. This is what I did:

1. Create a file: solr_home/PROJECT/multicore/core1/data/external_popularProducts.txt The file should contain values like this: uniqueID_in_core=count Example:

    873728721=19
    842728342=20

2. Update schema.xml. Add this under <types>...</types>:

    <fieldType name="popularProductsFile" keyField="key" defVal="0" stored="true" indexed="true" class="solr.ExternalFileField" valType="float"/>

Here, key is the column name for the primary ID of the Solr core. Add this under <fields>...</fields>:

    <field name="popularProducts" type="popularProductsFile" indexed="true" stored="true"/>

3. Reload the core.
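One addition worth knowing about for step 3: instead of a full core reload, Solr can re-read the external file whenever a new searcher opens, via event listeners in solrconfig.xml (this is documented on the ref guide page linked earlier in the thread):

    <listener event="newSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>
    <listener event="firstSearcher" class="org.apache.solr.schema.ExternalFileFieldReloader"/>

With these in place, a commit that opens a new searcher is enough to pick up a freshly copied file.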
RE: update external file
You either need to upload them and issue the reload command, or download them onto the machine and then issue the reload command. There is no REST support for it (yet) like the synonym filter, or was it the stop filter? Markus -Original message- From:Michael Sokolov msoko...@safaribooksonline.com Sent: Thursday 23rd October 2014 19:19 To: solr-user solr-user@lucene.apache.org Subject: update external file I've been looking at ExternalFileField to handle popularity boosting, since Solr's updatable docvalues (SOLR-5944) aren't quite there yet. My question is whether there is any support for uploading the external file via Solr, or if people do that some other (external, I guess) way? -Mike
Re: Difference between unloading of cores with LotsOfCores and unloading a core with CoreAdmin
bq: ..allocated by the core close for unloaded cores? And how about the processing time for unloaded cores to get it loaded first if we issue a query to it?

Well, all resources are supposed to be returned to the system. Even 500 cores open at one time is a lot, though. My theory is this has nothing to do with transient or non-transient cores. What's happening here is that you simply are opening too many cores (eventually) for the memory you're allocating. Plus, various caches get filled up at different times depending on the query. Also, if you have, say, 1,000 simultaneous queries outstanding to 1,000 different cores, _all_ 1,000 will be loaded in memory at the same time (I'm simplifying a bit here). After 500 of the queries have been satisfied, the number should drop back. So here's what I'd do to test whether there's really a memory leak or you're just being too ambitious: Drop the transient cache size to, say, 100 (or 50 or 10). You'll also have to take some care not to flood the system with lots of queries to lots of different cores, but you should vary the cores to cycle through them all. If your process still shows memory creeping, you'll need to take some memory snapshots so we can analyze what's going on. And by mixing very different numbers of documents in your various cores, you're introducing another variable that will make apples-to-apples comparisons difficult.

The model the LotsOfCores stuff was built to deal with is having 100's to 1,000's of cores, but not very many of them active at once. Consider a situation where each e-mail user has their own core. A user searches old e-mails only very rarely, so of 10,000 cores on a machine only, say, 10-20 may be active at once. You never know which ones, of course; eventually all of them will be used, but rarely very many simultaneously. So you may be hitting an edge case if you are continually firing queries at different cores. Loading a core is expensive: all the underlying caches will be warmed, firstSearcher queries will be fired, etc. And on only 8G of memory for 500 active cores, it's not surprising that you're blowing up memory IMO. Best, Erick

On Thu, Oct 23, 2014 at 11:28 AM, Xiaolu Zhao xiaolu.z...@oracle.com wrote: Hi Erick, Actually we are adding more cores. In this case, we set transientCacheSize=500 and create 16,000 cores in total, each with 10k log entries. During the process, we can easily see JVM memory usage increase as the total number of cores grows. It runs out of memory when the total number of cores reaches 5,400. Then we restart Solr and continue creating and loading cores. JVM memory usage will rise to over 7GB (max: 8GB), but not exceed the maximum. The process can be very slow then; we believe garbage collection may take place and cost some time. How about the resource usage for LotsOfCores (loaded/unloaded), e.g. the searcher? Are all resources allocated by the core closed for unloaded cores? And how about the processing time for an unloaded core to get loaded first if we issue a query to it? We did some testing to look into the processing time for unloaded cores. In this case, we have 100 cores: 1-50 with 100M, 51-55 with 1M, 56-60 with 10M, 61-70 with 100K, 71-100 with 10K. Then we query unloaded cores with different data sizes to get the processing time for each group. Here, the query is for all documents: select?q=*.
    Collection Name       Total Time(ms)   QTime(ms)   Processing Time(ms)
    collection71 (10K)               418           1                  417
    collection72 (10K)               413           0                  413
    collection61 (100K)              439           2                  437
    collection62 (100K)              424           1                  423
    collection51 (1M)                527           5                  522
    collection52 (1M)                538           5                  533
    collection56 (10M)               560          33                  527
    collection57 (10M)               553          33                  520
    collection3 (100M)              5971         322                 5649
    collection4 (100M)              6052         327                 5725

Based on the table above, we can see an ascending trend with larger data, but there is a big gap between 10M and 100M. Thanks, Xiaolu

On 10/23/2014 9:51 AM, Erick Erickson wrote: Memory should eventually be returned when a core is unloaded. There's a very small amount of overhead for keeping a list of all the cores and their locations, but this shouldn't increase with time unless you're adding more cores. Do note that the transient cache size is fixed, but may be exceeded: a core that gets reclaimed is held open long enough to serve any outstanding requests, but it _should_ have the memory reclaimed eventually. Of course there's always the possibility of some memory being kept inadvertently. I'd consider that a bug, so if you can define how this happens, perhaps with a test case, that would be great. Dumping the memory would help see what's kept, if anything actually is. Best, Erick On Wed, Oct 22, 2014 at 12:33 PM, Xiaolu Zhao xiaolu.z...@oracle.com wrote: Hi Erick, Thanks a lot for your explanation. Last time, when I tried out
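For the memory snapshots Erick suggests, one standard JDK approach is a heap dump analyzed in a tool such as Eclipse MAT; for example (the pid placeholder is whatever your Solr/Tomcat process id is):

    jmap -dump:live,format=b,file=solr-heap.hprof <pid>

The live option forces a full GC first, so only reachable objects end up in the dump -- useful here, since the question is whether unloaded cores are still reachable.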
RE: update external file
Right, there is no REST support for it like there is for the synonym filter -- or was it the stop filter?
Re: update external file
That's what I thought; thanks, Markus. On 10/23/14 2:19 PM, Markus Jelsma wrote: You either need to upload them and issue the reload command, or download them onto the machine and then issue the reload command. There is no REST support for it (yet) like the synonym filter, or was it the stop filter? Markus -Original message- From:Michael Sokolov msoko...@safaribooksonline.com Sent: Thursday 23rd October 2014 19:19 To: solr-user solr-user@lucene.apache.org Subject: update external file I've been looking at ExternalFileField to handle popularity boosting, since Solr's updatable docvalues (SOLR-5944) aren't quite there yet. My question is whether there is any support for uploading the external file via Solr, or if people do that some other (external, I guess) way? -Mike
recip function error
Good evening, I'm using Solr 4.0 Final. I tried using this function: boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05)) but it fails with this error: org.apache.lucene.queryparser.classic.ParseException: Expected ')' at position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))' I applied this patch: https://issues.apache.org/jira/browse/SOLR-3522 Rebuilt and redeployed AND I get the exact same error. I only copied over the new jars and war file. None of the other libraries seemed to have changed; the patch is in Solr core, so I figured I was safe. Does anyone know how to fix this? Thanks,
RE: How to properly use Levenstein distance with ~ in Java
In terms of recent work with edit distance (specifically Levenshtein), given your expressed interest you might find this paper provocative: "We measure the keyword similarity between two strings by lemmatizing them, removing stopwords, and computing the cosine similarity. We then include the keyword similarity between the query and the input question, the keyword similarity between the query and the returned evidence, and an indicator feature for whether the query involves a join. The evidence features compute KB-specific properties... We compute the join-key string similarity measured using the Levenshtein distance." http://dx.doi.org/10.1145/2623330.2623677 re will

-Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Thursday, October 23, 2014 12:05 PM To: solr-user Subject: Re: How to properly use Levenstein distance with ~ in Java

The last real update on that issue is 2.5 years old. Is there a more recent update? I am interested in this topic as well. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 23 October 2014 10:10, Walter Underwood wun...@wunderwood.org wrote: We’re reimplementing fuzzy support in edismax on Solr 4.x right now. See: https://issues.apache.org/jira/browse/SOLR-629 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/

On Oct 22, 2014, at 11:08 PM, karsten-s...@gmx.de wrote: Hi Aleksander, the fuzzy search operator '~' is not supported in dismax (defType=dismax): https://cwiki.apache.org/confluence/display/solr/The+DisMax+Query+Parser You are using the SearchComponent spellchecker. That does not change the query results. Btw: it looks like you are using the path /select with qt=dismax. This would normally throw an exception. Is there a tag <requestHandler name="/dismax" ...> inside your solrconfig.xml? Best regards Karsten P.S. in context: http://lucene.472066.n3.nabble.com/How-to-properly-use-Levenstein-distance-with-in-Java-td4164793.html On 20 October 2014 11:13, Aleksander Sadecki wrote: Ok, thank you for your response. But why can't I use '~'?
Re: recip function error
On 10/23/2014 3:09 PM, eShard wrote: Good evening, I'm using Solr 4.0 Final. I tried using this function: boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05)) but it fails with this error: org.apache.lucene.queryparser.classic.ParseException: Expected ')' at position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))' I applied this patch: https://issues.apache.org/jira/browse/SOLR-3522 Rebuilt and redeployed AND I get the exact same error. I only copied over the new jars and war file. None of the other libraries seemed to have changed; the patch is in Solr core, so I figured I was safe. Does anyone know how to fix this?

The Solr version you are running is more than two years old. There have been MANY new releases and MANY problems fixed since July 2012. I have been using the recip function in a similar manner without any problem on Solr versions starting at 4.2.1, up through 4.9.1, but I have never used 4.0: boost=min(recip(abs(ms(NOW/HOUR,pd)),1.92901e-10,1.5,1.5),0.85) Upgrading is strongly advised. The current Solr version is 4.10.1, released less than a month ago. Thanks, Shawn
Re: Analytics component
Thank you for this reply. Yes, but many analytics functions are not available, like percentile, median, standard deviation... Regards, Nabil On Thursday, 23 October 2014 at 16:34, Jorge Luis Betancourt González jlbetanco...@uci.cu wrote: I believe some of the statistics functions that you're trying to use are present in facets. - Original Message - From: nabil Kouici koui...@yahoo.fr To: solr-user@lucene.apache.org Sent: Thursday, October 23, 2014 5:57:27 AM Subject: Analytics component Hi All, I'm trying to use Solr to do some analytic functions (percentile, median...). I got the trunk branch of Solr, which contains the analytics component implementation. I've rebuilt Solr, but unfortunately this component wasn't taken into consideration and no lib was generated in /contrib/analytics. Do you have any idea how to get it compiled? Otherwise, any idea how to have these analytics in Solr? Regards, Nabil.
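For concreteness, the statistics Jorge refers to come from the StatsComponent, which is driven by query parameters; a minimal request sketch (the field name is an example):

    /select?q=*:*&rows=0&stats=true&stats.field=price

On 4.x this returns min, max, sum, count, missing, sumOfSquares, mean and stddev for the field -- which covers the standard deviation ask, though, as Nabil says, percentile and median are not among them.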
Re: recip function error
Thanks, we're planning on going to 4.10.1 in a few months. I discovered that recip only works with dismax; I use edismax by default. Does anyone know why I can't use recip with edismax? I hope this is fixed in 4.10.1... Thanks,
Re: recip function error
: I tried using this function : boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05)) : but it fails with this error: : org.apache.lucene.queryparser.classic.ParseException: Expected ')' at : position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))' Look very carefully at your input, and at the error message. You are only passing *1* argument to the recip() function -- the output of the ms() function. You are passing *5* arguments to the ms() function -- it supports a max of 2. Which is why at the 29th character of your input, after the second argument to your ms() function, it's complaining that it's expecting a ')' character -- not more arguments. -Hoss http://www.lucidworks.com/
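In other words, the closing parenthesis of ms() is misplaced; what was presumably intended (also dropping the stray '.0' from the exponent, as Yonik points out later in the thread) is:

    boost=recip(ms(NOW/HOUR,startdatez),3.16e-11,0.08,0.05)

Here recip(x,m,a,b) computes a/(m*x+b), so it gets its four arguments: x is the ms() age in milliseconds, and m=3.16e-11 is roughly 1/(one year in milliseconds), the usual date-boost scaling.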
Re: SolrCloud 4.7 not doing distributed search when querying from a load balancer.
Shawn, just wanted to follow up. I still face this issue of inconsistent search results on Solr Cloud 4.1.0.1. Upon further looking into the logs, I found a few exceptions; what was obvious were zkConnection timeout issues and other exceptions. Please take a look.

*Logs*

/opt/tomcat1/logs/catalina.out:103651230 [http-bio-8081-exec-206] WARN org.apache.solr.handler.ReplicationHandler – Exception while writing response for params: file=_68v.fnm&command=filecontent&checksum=true&wt=filestream&qt=/replication&generation=2410
/opt/tomcat1/logs/catalina.out:java.nio.file.NoSuchFileException: /opt/solr/home1/dyCollection1_shard2_replica1/data/index/_68v.fnm
/opt/tomcat1/logs/catalina.out: at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
/opt/tomcat1/logs/catalina.out: at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
/opt/tomcat1/logs/catalina.out: at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)

[the same ReplicationHandler WARN, NoSuchFileException, and stack frames repeat at 103651579, 103651586, 103651592, 103651600, and 103651611]

471640118 [localhost-startStop-1-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – Watcher org.apache.solr.common.cloud.ConnectionManager@2a7dcd74 name:ZooKeeperConnection Watcher:server1.mydomain.com:2181,server2.mydomain.com:2181,server3.mydomain.com:2181 got event WatchedEvent state:Disconnected type:None path:null
471640120 [localhost-startStop-1-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – zkClient has disconnected
471642457 [zkCallback-2-thread-8] INFO org.apache.solr.cloud.DistributedQueue – LatchChildWatcher fired on path: null state: Expired type None
471642458 [localhost-startStop-1-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager –
Re: recip function error
3.16e-11.0 looks fishy to me. On 10/23/14 5:09 PM, eShard wrote: Good evening, I'm using Solr 4.0 Final. I tried using this function: boost=recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05)) but it fails with this error: org.apache.lucene.queryparser.classic.ParseException: Expected ')' at position 29 in 'recip(ms(NOW/HOUR,startdatez,3.16e-11.0,0.08,0.05))' I applied this patch: https://issues.apache.org/jira/browse/SOLR-3522 Rebuilt and redeployed AND I get the exact same error. I only copied over the new jars and war file. None of the other libraries seemed to have changed; the patch is in Solr core, so I figured I was safe. Does anyone know how to fix this? Thanks,
Re: recip function error
On Thu, Oct 23, 2014 at 7:47 PM, Michael Sokolov msoko...@safaribooksonline.com wrote: 3.16e-11.0 looks fishy to me Indeed... looks like it should be 3.16e-11 Standard scientific notation shouldn't have decimal points in the exponent. Not sure if that causes Java problems or not though... -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data