Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
On Tue, Oct 20, 2009 at 11:57 PM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
> On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller wrote:
>>
>> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>
>>> I don't think the debate is about weak reference vs. soft references.
>>
>> There appears to be confusion between the two here no matter what the
>> debate - soft references are for caching, weak references are not so much.
>> Getting it right is important.
>>
>>> I guess the point that Lance is making is that using such a technique will
>>> make application performance less predictable. There's also a good chance
>>> that a soft reference based cache will cause cache thrashing and will hide
>>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
>>> more CPU usage (due to re-computation of results).
>>
>> That's the whole point. You're not hiding anything. I don't follow you.
>
> Using a soft reference based cache can hide the fact that one has inadequate
> memory for the cache size one has configured. Don't get me wrong. I'm not
> against the feature. I was merely trying to explain Lance's concerns as I
> understood them.

Lance's concern is valid. Assuming that we are going to have this feature
(non-default), we need a way to know that cache thrashing has happened. I mean
the statistics should also expose the number of cache entries which got
removed. This should enable the user to decide whether he needs more RAM or is
happy to live w/ the extra CPU cycles for recomputation.

>>> Personally, I think giving an option is fine. What if the user does not
>>> have enough RAM and he is willing to pay the price? Right now, there is no
>>> way he can do that at all. However, the most frequent reason behind OOMs
>>> is not having enough RAM to create the field caches and not Solr caches,
>>> so I'm not sure how important this is.
>>
>> How important is any feature? You don't have a use for it, so it's not
>> important to you - someone else does so it is important to them. Soft value
>> caches can be useful.
>
> Don't jump to conclusions :)
>
> The reason behind this feature request is to have Solr caches which resize
> themselves when enough memory is not available. I agree that soft value
> caches are useful for this. All I'm saying is that most OOMs that get
> reported on the list are due to inadequate free memory for allocating field
> caches. Finding a way around that will be the key to making a Lucene/Solr
> application practical in a limited-memory environment.
>
> Just for the record, I'm +1 for adding this feature but keeping the current
> behavior as the default.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
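The soft-value behavior being debated can be sketched with plain java.lang.ref.SoftReference. This is a stdlib-only illustration, not the actual SOLR-1513 patch (which uses Google Collections); the class name and the evicted-entry counter are invented here, the counter showing the kind of statistic Noble asks for:

```java
import java.lang.ref.SoftReference;
import java.util.HashMap;
import java.util.Map;

// Minimal sketch: a map whose values are held through SoftReferences, so the
// JVM may reclaim them under memory pressure instead of throwing OOM. The
// evicted counter exposes how many entries were cleared by GC (i.e. how much
// "thrashing"/recomputation the configuration is causing).
public class SoftValueCache<K, V> {
    private final Map<K, SoftReference<V>> map = new HashMap<K, SoftReference<V>>();
    private int evicted = 0;

    public synchronized void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    public synchronized V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {      // GC cleared the value: caller must recompute
            map.remove(key);
            evicted++;
        }
        return value;
    }

    public synchronized int evictedCount() {
        return evicted;
    }

    public static void main(String[] args) {
        SoftValueCache<String, Integer> cache = new SoftValueCache<String, Integer>();
        cache.put("q1", Integer.valueOf(42));
        System.out.println("hit: " + cache.get("q1")
                + ", evicted so far: " + cache.evictedCount());
    }
}
```

Under memory pressure the JVM clears soft references before throwing OOM, so get() starts returning null and the caller recomputes - exactly the CPU-for-RAM trade discussed in this thread.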
Re: deploy solr in Eclipse IDE
Pradeep, Attached are the files. You may have to open them in a text editor and rename the project to match yours, but it should be pretty straightforward. I used this with 1.3 trunk at the time, so things may have changed, but it's easy enough to modify in Eclipse.

- Amit

On Mon, Oct 19, 2009 at 4:16 PM, Pradeep Pujari wrote:
> This url is helpful. If I check out in Eclipse using SVN (Subclipse), the
> source files are not as per package structure. Can you please send me your
> .project and .classpath files? Thank you in advance.
>
> Pradeep
>
> --- On Sun, 10/18/09, Amit Nithian wrote:
>
>> From: Amit Nithian
>> Subject: Re: deploy solr in Eclipse IDE
>> To: solr-dev@lucene.apache.org
>> Date: Sunday, October 18, 2009, 11:06 PM
>> Hey Pradeep,
>> Check out
>> http://lucene.apache.org/solr/version_control.html#Anonymous+Access+%28read-only%29
>> If you need more help with setting up Eclipse and Solr trunk,
>> send me an email.
>> I can send you my .project and .classpath files as I have
>> them for my setup.
>>
>> Take care
>> Amit
>>
>> On Sun, Oct 18, 2009 at 11:34 AM, Pradeep Pujari <prade...@rocketmail.com> wrote:
>>
>>> Hi Amit,
>>> This is what I am looking for. Do you know the URL for trunk?
>>>
>>> Thanks,
>>> Pradeep.
>>>
>>> --- On Sun, 10/18/09, Amit Nithian wrote:
>>>
>>>> From: Amit Nithian
>>>> Subject: Re: deploy solr in Eclipse IDE
>>>> To: solr-dev@lucene.apache.org
>>>> Date: Sunday, October 18, 2009, 12:55 AM
>>>> I think you may have better luck
>>>> setting up Eclipse, Subclipse etc. and hooking
>>>> off of trunk rather than having to re-create the eclipse
>>>> project every time a nightly build comes out.
>>>> I simply have an eclipse project tied to trunk, and every so
>>>> often I'll do an SVN update when I want/need the latest code.
>>>> hope that helps some!
>>>> Amit
>>>>
>>>> On Thu, Oct 15, 2009 at 2:31 AM, Brian Carmalt wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I start Solr with Jetty using the following code. If the classpath and
>>>>> src paths are set correctly in Eclipse and you pass the solr.home to the
>>>>> VM on startup, you just have to start this class and you can debug Solr
>>>>> in Eclipse.
>>>>>
>>>>> import org.mortbay.jetty.Connector;
>>>>> import org.mortbay.jetty.Server;
>>>>> import org.mortbay.jetty.webapp.WebAppContext;
>>>>>
>>>>> public class JettyStarter {
>>>>>
>>>>>     /**
>>>>>      * @param args
>>>>>      */
>>>>>     public static void main(String[] args) {
>>>>>         try {
>>>>>             Server server = new Server();
>>>>>
>>>>>             WebAppContext solr = new WebAppContext();
>>>>>             solr.setContextPath("/solr");
>>>>>             solr.setWar("Path to solr directory or war");
>>>>>
>>>>>             server.addHandler(solr);
>>>>>             server.setStopAtShutdown(true);
>>>>>             server.start();
>>>>>         } catch (Exception e) {
>>>>>             // TODO Auto-generated catch block
>>>>>             e.printStackTrace();
>>>>>         }
>>>>>     }
>>>>> }
>>>>>
>>>>> On Tuesday, 13.10.2009, at 16:43 -0700, Pradeep Pujari wrote:
>>>>>> Hi All,
>>>>>>
>>>>>> I am trying to install the solr nightly build into the Eclipse IDE and
>>>>>> facing a lot of issues while importing the zip file. The build path,
>>>>>> libs and various source files are scattered. It took me a lot of time
>>>>>> to configure and make it run.
>>>>>> What development environment is being used, and is there a smooth way
>>>>>> of importing the daily nightly build into Eclipse?
>>>>>>
>>>>>> Please help.
>>>>>>
>>>>>> Thanks,
>>>>>> Pradeep.
Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
On-topic: Will the Google implementations + soft references behave well with 8+ processors?

Semi-on-topic: If you want to really know multiprocessor algorithms, this is the bible: "The Art of Multiprocessor Programming". Hundreds of parallel algorithms for many different jobs, all coded in Java, and cross-referenced with the java.util.concurrent package. Just amazing.
http://www.elsevier.com/wps/find/bookdescription.cws_home/714091/description#description

Off-topic: I was representing a system troubleshooting philosophy: "Fail Early, Fail Loud". Meaning, if there is a problem like OOMs, tell me and I'll fix it permanently. But different situations call for different answers, and Mark is representing "just keep working, ok?". Brittle vs. supple is one way to think of it.

On Tue, Oct 20, 2009 at 11:27 AM, Shalin Shekhar Mangar wrote:
> On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller wrote:
>>
>> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>>
>>> I don't think the debate is about weak reference vs. soft references.
>>
>> There appears to be confusion between the two here no matter what the
>> debate - soft references are for caching, weak references are not so much.
>> Getting it right is important.
>>
>>> I guess the point that Lance is making is that using such a technique will
>>> make application performance less predictable. There's also a good chance
>>> that a soft reference based cache will cause cache thrashing and will hide
>>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
>>> more CPU usage (due to re-computation of results).
>>
>> That's the whole point. You're not hiding anything. I don't follow you.
>
> Using a soft reference based cache can hide the fact that one has inadequate
> memory for the cache size one has configured. Don't get me wrong. I'm not
> against the feature. I was merely trying to explain Lance's concerns as I
> understood them.
>
>>> Personally, I think giving an option is fine. What if the user does not
>>> have enough RAM and he is willing to pay the price? Right now, there is no
>>> way he can do that at all. However, the most frequent reason behind OOMs
>>> is not having enough RAM to create the field caches and not Solr caches,
>>> so I'm not sure how important this is.
>>
>> How important is any feature? You don't have a use for it, so it's not
>> important to you - someone else does so it is important to them. Soft value
>> caches can be useful.
>
> Don't jump to conclusions :)
>
> The reason behind this feature request is to have Solr caches which resize
> themselves when enough memory is not available. I agree that soft value
> caches are useful for this. All I'm saying is that most OOMs that get
> reported on the list are due to inadequate free memory for allocating field
> caches. Finding a way around that will be the key to making a Lucene/Solr
> application practical in a limited-memory environment.
>
> Just for the record, I'm +1 for adding this feature but keeping the current
> behavior as the default.
>
> --
> Regards,
> Shalin Shekhar Mangar.

--
Lance Norskog
goks...@gmail.com
Re: clustering schema
Actually, just copying the example schema to contrib seemed to work fine... those should probably be kept in alignment regardless of whether we decide to do something different about the data directory.

-Yonik
http://www.lucidimagination.com
Re: clustering schema
On Tue, Oct 20, 2009 at 5:31 PM, Grant Ingersoll wrote:
> Can't we set up the clustering solrconfig to have a different data directory
> and remove the default of ./solr/data? I get caught on this gotcha in a
> lot of places these days b/c I am often trying out lots of different
> configs.

We could, but that has its own downsides... like creating lucene indexes in various places in source directories like contrib.

-Yonik
http://www.lucidimagination.com

> On Oct 20, 2009, at 5:13 PM, Yonik Seeley wrote:
>
>> So when I go to try the clustering example, I fire up the server, hit
>> it with the example on the Wiki
>>
>> http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true
>>
>> And... boom.
>>
>> java.lang.NullPointerException
>> at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
>> at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
>> [...]
>>
>> It's because of a schema mismatch, of course... I had already indexed data
>> using the normal schema, and now we're using a different schema/config
>> with the same data dir.
>> I imagine this will be a common mistake.
>>
>> Should we try to do this like SolrCell... just make it a lazy handler
>> and reference the libs in solrconfig.xml? Oh wait... searchComponents
>> can't be lazy, I don't think... darn.
>> I guess the only "fix" (it's not really a bug, just undesirable) is to
>> try and get the schemas closer together?
>>
>> -Yonik
>> http://www.lucidimagination.com
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
> http://www.lucidimagination.com/search
Re: clustering schema
Can't we set up the clustering solrconfig to have a different data directory and remove the default of ./solr/data? I get caught on this gotcha in a lot of places these days b/c I am often trying out lots of different configs.

On Oct 20, 2009, at 5:13 PM, Yonik Seeley wrote:

> So when I go to try the clustering example, I fire up the server, hit
> it with the example on the Wiki
>
> http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true
>
> And... boom.
>
> java.lang.NullPointerException
> at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
> at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
> [...]
>
> It's because of a schema mismatch, of course... I had already indexed data
> using the normal schema, and now we're using a different schema/config
> with the same data dir. I imagine this will be a common mistake.
>
> Should we try to do this like SolrCell... just make it a lazy handler
> and reference the libs in solrconfig.xml? Oh wait... searchComponents
> can't be lazy, I don't think... darn.
> I guess the only "fix" (it's not really a bug, just undesirable) is to
> try and get the schemas closer together?
>
> -Yonik
> http://www.lucidimagination.com

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:
http://www.lucidimagination.com/search
clustering schema
So when I go to try the clustering example, I fire up the server, hit it with the example on the Wiki

http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true

And... boom.

java.lang.NullPointerException
at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
[...]

It's because of a schema mismatch, of course... I had already indexed data using the normal schema, and now we're using a different schema/config with the same data dir. I imagine this will be a common mistake.

Should we try to do this like SolrCell... just make it a lazy handler and reference the libs in solrconfig.xml? Oh wait... searchComponents can't be lazy, I don't think... darn. I guess the only "fix" (it's not really a bug, just undesirable) is to try and get the schemas closer together?

-Yonik
http://www.lucidimagination.com
RE: Where to free Tokenizer resources?
Erik,
That's a good idea. But that means the resource-releasing code must live in the finalize method, and it has to wait until GC kicks in. Correct?
-kuro

> -----Original Message-----
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com]
> Sent: Tuesday, October 20, 2009 12:37 PM
> To: solr-dev@lucene.apache.org
> Subject: Re: Where to free Tokenizer resources?
>
> What about acquiring the resource in your tokenizer factory
> instead of at the tokenizer level?
>
> Erik
>
> On Oct 20, 2009, at 1:16 PM, Teruhiko Kurosaka wrote:
>
>> Yonik,
>>
>>> If you really want to release/acquire your resources each time the
>>> tokenizer is used, then release it in the close() and acquire in the
>>> reset(). There is no "done with this forever" callback.
>>
>> I wanted to avoid that because acquiring this resource is a relatively
>> expensive operation. I wanted to do that per instance. I guess I
>> should lobby Lucene folks and ask them to consider adding a new method
>> to do so.
>>
>> Is my guess that Solr calls Tokenizer.close() more than once correct?
>> My observation of the behavior suggests it, but I couldn't find
>> concrete evidence in the source.
>>
>>> -Yonik
>>> http://www.lucidimagination.com
>>>
>>> On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka wrote:
>>>> Hi,
>>>> I have my own Tokenizer that was working fine with Solr 1.3 but threw
>>>> an Exception when used with Solr 1.4 dev.
>>>>
>>>> This Tokenizer uses some JNI-side resources that it takes in the
>>>> constructor and frees in close().
>>>>
>>>> The behavior seems to indicate that Solr 1.4 calls close() then
>>>> reset(Reader) in order to reuse the Tokenizer. But my Tokenizer
>>>> threw an Exception because its resource had been freed already. My
>>>> temporary fix was to move the resource release code from close() to
>>>> finalize(). But I'm not very happy with it because the timing of
>>>> resource release is up to the garbage collector.
>>>>
>>>> Question #1: Is close() supposed to be called more than once? To me,
>>>> close() should be called only once at the end of the life cycle of the
>>>> Tokenizer. (The old reader should be closed when reset(Reader) is
>>>> called.)
>>>>
>>>> If the answer is Yes, then
>>>>
>>>> Question #2: Is there any better place to release the internal
>>>> resource than in finalize()?
>>>>
>>>> Thank you.
>>>>
>>>> T. "Kuro" Kurosaka
[jira] Commented: (SOLR-1516) DocumentList and Document QueryResponseWriter
[ https://issues.apache.org/jira/browse/SOLR-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767925#action_12767925 ] Chris A. Mattmann commented on SOLR-1516: - Hi All: I don't mean to be a pest here, but I've seen the amount of activity going on the SOLR lists recently, as well as the decision to hold off on calling for a vote on 1.4 until Lucene 2.9.1 is released. This patch is self-contained, doesn't touch any code, and honestly, it only adds functionality that would have made my life as a user of SOLR a lot easier (I would have saved the hour of debugging and printing out #getClass on the Objects in NamedList, and on top of that only had to implement an #emitDoc or #emitDocList function and optionally #emitHeader and #emitFooter, rather than the rest of the supporting code). Am I the only one that's run into a problem trying to write a custom XML SOLR output that's inherently simple? That is, XML output that doesn't need to worry about the inherent types of the named values in the NamedList, output that only cares about spitting out the set of returned Documents? It would be great to see this get into 1.4, but if I'm the outlier, I can wait. Just thought I'd raise the issue. Cheers, Chris > DocumentList and Document QueryResponseWriter > - > > Key: SOLR-1516 > URL: https://issues.apache.org/jira/browse/SOLR-1516 > Project: Solr > Issue Type: New Feature > Components: search >Affects Versions: 1.3 > Environment: My MacBook Pro laptop. >Reporter: Chris A. Mattmann >Priority: Minor > Fix For: 1.5 > > Attachments: SOLR-1516.Mattmann.101809.patch.txt > > > I tried to implement a custom QueryResponseWriter the other day and was > amazed at the level of unmarshalling and weeding through objects that was > necessary just to format the output o.a.l.Document list. 
As a user, I wanted > to be able to implement either 2 functions: > * process a document at a time, and format it (for speed/efficiency) > * process all the documents at once, and format them (in case an aggregate > calculation is necessary for outputting) > So, I've decided to contribute 2 simple classes that I think are sufficiently > generic and reusable. The first is o.a.s.request.DocumentResponseWriter -- it > handles the first bullet above. The second is > o.a.s.request.DocumentListResponseWriter. Both are abstract base classes and > require the user to implement either an #emitDoc function (in the case of > bullet 1), or an #emitDocList function (in the case of bullet 2). Both > classes provide an #emitHeader and #emitFooter function set that handles > formatting and output before the Document list is processed. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
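Based on the description above, the proposed base classes might look roughly like the sketch below. The #emitDoc/#emitHeader/#emitFooter names come from the patch description, but everything else here is a stand-in: the Map-based document type and the write() driver are invented for illustration and are not Solr's actual API.

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of the proposed writer: subclasses only format
// documents; header/footer hooks are optional. "Document" here is a plain
// Map<String, Object> stand-in, not Solr's or Lucene's actual type.
public abstract class DocumentResponseWriterSketch {
    protected void emitHeader(StringBuilder out) {}   // optional hook
    protected void emitFooter(StringBuilder out) {}   // optional hook
    protected abstract void emitDoc(Map<String, Object> doc, StringBuilder out);

    // Drives formatting one document at a time: no per-field NamedList
    // unmarshalling required of the subclass.
    public String write(List<Map<String, Object>> docs) {
        StringBuilder out = new StringBuilder();
        emitHeader(out);
        for (Map<String, Object> doc : docs) {
            emitDoc(doc, out);
        }
        emitFooter(out);
        return out.toString();
    }

    public static void main(String[] args) {
        DocumentResponseWriterSketch w = new DocumentResponseWriterSketch() {
            @Override protected void emitHeader(StringBuilder out) { out.append("<docs>"); }
            @Override protected void emitFooter(StringBuilder out) { out.append("</docs>"); }
            @Override protected void emitDoc(Map<String, Object> doc, StringBuilder out) {
                out.append("<doc id=\"").append(doc.get("id")).append("\"/>");
            }
        };
        System.out.println("writer ready: " + (w != null));
    }
}
```

A subclass that only cares about spitting out the returned documents implements emitDoc alone; an aggregate calculation would instead override a list-level hook, as the second proposed class does.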
Re: Where to free Tokenizer resources?
What about acquiring the resource in your tokenizer factory instead of at the tokenizer level?

Erik

On Oct 20, 2009, at 1:16 PM, Teruhiko Kurosaka wrote:

> Yonik,
>
>> If you really want to release/acquire your resources each time the
>> tokenizer is used, then release it in the close() and acquire in the
>> reset(). There is no "done with this forever" callback.
>
> I wanted to avoid that because acquiring this resource is a relatively
> expensive operation. I wanted to do that per instance. I guess I should
> lobby Lucene folks and ask them to consider adding a new method to do so.
>
> Is my guess that Solr calls Tokenizer.close() more than once correct? My
> observation of the behavior suggests it, but I couldn't find concrete
> evidence in the source.
>
>> -Yonik
>> http://www.lucidimagination.com
>>
>> On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka wrote:
>>> Hi,
>>> I have my own Tokenizer that was working fine with Solr 1.3 but threw an
>>> Exception when used with Solr 1.4 dev.
>>>
>>> This Tokenizer uses some JNI-side resources that it takes in the
>>> constructor and frees in close().
>>>
>>> The behavior seems to indicate that Solr 1.4 calls close() then
>>> reset(Reader) in order to reuse the Tokenizer. But my Tokenizer threw an
>>> Exception because its resource had been freed already. My temporary fix
>>> was to move the resource release code from close() to finalize(). But I'm
>>> not very happy with it because the timing of resource release is up to
>>> the garbage collector.
>>>
>>> Question #1: Is close() supposed to be called more than once? To me,
>>> close() should be called only once at the end of the life cycle of the
>>> Tokenizer. (The old reader should be closed when reset(Reader) is called.)
>>>
>>> If the answer is Yes, then
>>>
>>> Question #2: Is there any better place to release the internal resource
>>> than in finalize()?
>>>
>>> Thank you.
>>>
>>> T. "Kuro" Kurosaka
Re: TrieField -> NumericField ?
Yonik Seeley wrote: > On Tue, Oct 20, 2009 at 2:18 PM, Chris Hostetter > wrote: > >> I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr >> but Lucene switched to using "NumericField" ... should we convert the Solr >> class names prior to 1.4? >> > > I dunno - NumericField is too generic. We still have two other types > of numeric fields. > > -Yonik > http://www.lucidimagination.com > I spotted the same thing this morning and was about to raise it when I came to the same conclusion. -- - Mark http://www.lucidimagination.com
Re: TrieField -> NumericField ?
: > I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr : > but Lucene switched to using "NumericField" ... should we convert the Solr : > class names prior to 1.4? : : I dunno - NumericField is too generic. We still have two other types : of numeric fields. I'm fine with that ... I just wanted to make sure it was something we at least thought about (and not just an oversight) -Hoss
Re: TrieField -> NumericField ?
On Tue, Oct 20, 2009 at 2:18 PM, Chris Hostetter wrote: > > I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr > but Lucene switched to using "NumericField" ... should we convert the Solr > class names prior to 1.4? I dunno - NumericField is too generic. We still have two other types of numeric fields. -Yonik http://www.lucidimagination.com
Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller wrote:
>
> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <shalinman...@gmail.com> wrote:
>
>> I don't think the debate is about weak reference vs. soft references.
>
> There appears to be confusion between the two here no matter what the
> debate - soft references are for caching, weak references are not so much.
> Getting it right is important.
>
>> I guess the point that Lance is making is that using such a technique will
>> make application performance less predictable. There's also a good chance
>> that a soft reference based cache will cause cache thrashing and will hide
>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
>> more CPU usage (due to re-computation of results).
>
> That's the whole point. You're not hiding anything. I don't follow you.

Using a soft reference based cache can hide the fact that one has inadequate memory for the cache size one has configured. Don't get me wrong. I'm not against the feature. I was merely trying to explain Lance's concerns as I understood them.

>> Personally, I think giving an option is fine. What if the user does not
>> have enough RAM and he is willing to pay the price? Right now, there is no
>> way he can do that at all. However, the most frequent reason behind OOMs
>> is not having enough RAM to create the field caches and not Solr caches,
>> so I'm not sure how important this is.
>
> How important is any feature? You don't have a use for it, so it's not
> important to you - someone else does so it is important to them. Soft value
> caches can be useful.

Don't jump to conclusions :)

The reason behind this feature request is to have Solr caches which resize themselves when enough memory is not available. I agree that soft value caches are useful for this. All I'm saying is that most OOMs that get reported on the list are due to inadequate free memory for allocating field caches. Finding a way around that will be the key to making a Lucene/Solr application practical in a limited-memory environment.

Just for the record, I'm +1 for adding this feature but keeping the current behavior as the default.

--
Regards,
Shalin Shekhar Mangar.
TrieField -> NumericField ?
I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr but Lucene switched to using "NumericField" ... should we convert the Solr class names prior to 1.4? -Hoss
[jira] Commented: (SOLR-1514) Facet search results contain 0:0 entries although '0' values were not indexed.
[ https://issues.apache.org/jira/browse/SOLR-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767890#action_12767890 ]

Hoss Man commented on SOLR-1514:
--------------------------------

Can you provide a JUnit test case, or a schema.xml + some sample docs that reproduce this behavior?

> Facet search results contain 0:0 entries although '0' values were not indexed.
> ------------------------------------------------------------------------------
>
> Key: SOLR-1514
> URL: https://issues.apache.org/jira/browse/SOLR-1514
> Project: Solr
> Issue Type: Bug
> Components: search
> Affects Versions: 1.3
> Environment: Solr is on: Linux 2.6.18-92.1.13.el5xen
> Reporter: Renata Perkowska
>
> Hi,
> in my JMeter ATs I can see that under some circumstances facet search
> results contain '0' both as keys and values for the integer field called
> 'year', although I never index zeros. When I do a normal search, I don't
> see any indexed fields with zeros.
> When I run my facet test (using JMeter) in isolation, everything works
> fine. It happens only when it's being run after other tests (and other
> indexing/deleting). On the other hand, it shouldn't be the case that other
> indexing runs are influencing this test, as at the end of each test I'm
> deleting the indexed documents, so before running the facet test the index
> is empty.
> My facet test looks as follows:
> 1. Index a group of documents
> 2. Perform a search on facets
> 3. Remove the documents from the index.
> The results that I'm getting for an integer field 'year':
> 1990:4
> 1995:4
> 0:0
> 1991:0
> 1992:0
> 1993:0
> 1994:0
> 1996:0
> 1997:0
> 1998:0
> I'm indexing only values 1990-1999, so there certainly shouldn't be any '0'
> keys in the result set.
> The index is optimized not after each document deletion, but only when the
> index is loaded/unloaded, so optimization won't solve the problem in this
> case.
> If the facet.mincount>0 is provided, then I'm not getting 0:0, but other > entries with '0' values are gone as well: > 1990:4 > 1995:4 > I'm also indexing text fields, but I don't see a similar situation in this > case. This bug only happens for integer fields. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
RE: Where to free Tokenizer resources?
Yonik,

> If you really want to release/acquire your resources each time the
> tokenizer is used, then release it in the close() and acquire in the
> reset(). There is no "done with this forever" callback.

I wanted to avoid that because acquiring this resource is a relatively expensive operation. I wanted to do that per instance. I guess I should lobby Lucene folks and ask them to consider adding a new method to do so.

Is my guess that Solr calls Tokenizer.close() more than once correct? My observation of the behavior suggests it, but I couldn't find concrete evidence in the source.

> -Yonik
> http://www.lucidimagination.com
>
> On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka wrote:
>> Hi,
>> I have my own Tokenizer that was working fine with Solr 1.3 but threw an
>> Exception when used with Solr 1.4 dev.
>>
>> This Tokenizer uses some JNI-side resources that it takes in the
>> constructor and frees in close().
>>
>> The behavior seems to indicate that Solr 1.4 calls close() then
>> reset(Reader) in order to reuse the Tokenizer. But my Tokenizer threw an
>> Exception because its resource had been freed already. My temporary fix
>> was to move the resource release code from close() to finalize(). But I'm
>> not very happy with it because the timing of resource release is up to the
>> garbage collector.
>>
>> Question #1: Is close() supposed to be called more than once? To me,
>> close() should be called only once at the end of the life cycle of the
>> Tokenizer. (The old reader should be closed when reset(Reader) is called.)
>>
>> If the answer is Yes, then
>>
>> Question #2: Is there any better place to release the internal resource
>> than in finalize()?
>>
>> Thank you.
>>
>> T. "Kuro" Kurosaka
Re: Where to free Tokenizer resources?
If you really want to release/acquire your resources each time the tokenizer is used, then release it in the close() and acquire in the reset(). There is no "done with this forever" callback.

-Yonik
http://www.lucidimagination.com

On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka wrote:
> Hi,
> I have my own Tokenizer that was working fine with Solr 1.3 but threw an
> Exception when used with Solr 1.4 dev.
>
> This Tokenizer uses some JNI-side resources that it takes in the
> constructor and frees in close().
>
> The behavior seems to indicate that Solr 1.4 calls close() then
> reset(Reader) in order to reuse the Tokenizer. But my Tokenizer threw an
> Exception because its resource had been freed already. My temporary fix was
> to move the resource release code from close() to finalize(). But I'm not
> very happy with it because the timing of resource release is up to the
> garbage collector.
>
> Question #1: Is close() supposed to be called more than once? To me,
> close() should be called only once at the end of the life cycle of the
> Tokenizer. (The old reader should be closed when reset(Reader) is called.)
>
> If the answer is Yes, then
>
> Question #2: Is there any better place to release the internal resource
> than in finalize()?
>
> Thank you.
>
> T. "Kuro" Kurosaka
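The release-in-close()/acquire-in-reset() lifecycle described above can be sketched as follows. This is a stand-in class, not Lucene's actual Tokenizer API: the resource handle and the acquire/release stubs are placeholders for the JNI-side calls, and the sketch additionally makes close() idempotent so a second close() (which Solr appears to issue) cannot fail on an already-freed resource.

```java
import java.io.Reader;
import java.io.StringReader;

// Stand-in sketch (not Lucene's Tokenizer API): acquire the expensive
// resource in reset(Reader), release it in close(), and tolerate close()
// being called more than once.
public class ResourceTokenizerSketch {
    private Object nativeResource;  // placeholder for the JNI-side handle
    private Reader input;           // the stream to tokenize next

    public void reset(Reader reader) {
        this.input = reader;
        if (nativeResource == null) {
            nativeResource = acquireResource();  // re-acquire after a close()
        }
    }

    public void close() {
        if (nativeResource != null) {  // idempotent: a second close() is a no-op
            releaseResource(nativeResource);
            nativeResource = null;
        }
    }

    public boolean isOpen() {
        return nativeResource != null;
    }

    // Stubs standing in for the expensive JNI acquire/free calls.
    private Object acquireResource() { return new Object(); }
    private void releaseResource(Object resource) {}

    public static void main(String[] args) {
        ResourceTokenizerSketch t = new ResourceTokenizerSketch();
        t.reset(new StringReader("some text"));
        t.close();
        t.close();  // must not throw even though the resource is already freed
        System.out.println("open after double close: " + t.isOpen());
    }
}
```

The cost Kuro worries about shows up only when close()/reset() pairs alternate; caching the handle in the factory (Erik's suggestion) would move the expensive acquire out of this cycle entirely.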
Where to free Tokenizer resources?
Hi,
I have my own Tokenizer that was working fine with Solr 1.3 but threw an Exception when used with Solr 1.4 dev.

This Tokenizer uses some JNI-side resources that it takes in the constructor and frees in close().

The behavior seems to indicate that Solr 1.4 calls close() then reset(Reader) in order to reuse the Tokenizer. But my Tokenizer threw an Exception because its resource had been freed already. My temporary fix was to move the resource release code from close() to finalize(). But I'm not very happy with it because the timing of resource release is up to the garbage collector.

Question #1: Is close() supposed to be called more than once? To me, close() should be called only once at the end of the life cycle of the Tokenizer. (The old reader should be closed when reset(Reader) is called.)

If the answer is Yes, then

Question #2: Is there any better place to release the internal resource than in finalize()?

Thank you.

T. "Kuro" Kurosaka
Re: maxClauseCount in solrconfig.xml
On Tue, Oct 20, 2009 at 10:53 AM, Mark Miller wrote: > Any objections to sneaking into 1.4? Nope - do it quick! -Yonik http://www.lucidimagination.com
Re: maxClauseCount in solrconfig.xml
Mark Miller wrote: > Yonik Seeley wrote: > >> On Tue, Oct 20, 2009 at 9:06 AM, Mark Miller wrote: >> >> >>> >>>1024 >>> >>> Anyone think we should clarify that? The built-in multiterm queries are >>> constant score now, so its a bit misleading. >>> >>> >> Hmmm, yep... out of date. >> >> >> >>> Also, why are we using >>> >>> prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE); >>> >>> >> I dunno - ask the guy who made the change ;-) >> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/search/SolrQueryParser.java?revision=801872&view=markup >> >> -Yonik >> http://www.lucidimagination.com >> >> > Heh - I suspected it was me - but I think I made them before AUTO was > available. Just didn't want to flip them now without bringing it up first :) > > Any objections to sneaking into 1.4? -- - Mark http://www.lucidimagination.com
Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
+1 for making soft references an available option via configuration, keeping the current behavior as the default. Bill 2009/10/20 Noble Paul നോബിള് नोब्ळ् > On Tue, Oct 20, 2009 at 6:07 PM, Mark Miller > wrote: > > > I'm +1 obviously ;) No one is talking about making it the default. And I > > think its well known that soft value caches can be a valid choice - > > thats why google has one in their collections here ;) Its a nice way to > > let your cache grow and shrink based on the available RAM. Its not > > always the right choice, but sure is a nice option. And it doesn't have > > much to do with Lucene's FieldCaches. The main reason for a soft value > > cache is not to avoid OOM. Set your cache sizes correctly for that. And > > even if it was to avoid OOM, who cares if something else causes more of > > them? Thats like not fixing a bug in a piece of code because another > > piece of code has more bugs. Anyway, their purpose is to allow the cache > > to size depending on the available free RAM IMO. > > > +1 > > > > > Noble Paul നോബിള് नोब्ळ् wrote: > > > So , is everyone now in favor of this feature? Who has a -1 on this? and > > > what is the concern? > > > > > > On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller > > wrote: > > > > > > > > >> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar < > > >> shalinman...@gmail.com> wrote: > > >> > > >> I don't think the debate is about weak reference vs. soft references. > > >> > > >> There appears to be confusion between the two here no matter what the > > >> debate - soft references are for cachinh, weak references are not so > > much. > > >> Getting it right is important. > > >> > > >> I > > >> > > >>> guess the point that Lance is making is that using such a technique > > will > > >>> make application performance less predictable. There's also a good > > chance > > >>> that a soft reference based cache will cause cache thrashing and will > > hide > > >>> OOMs caused by inadequate cache sizes.
So basically we trade an OOM > for > > >>> more > > >>> CPU usage (due to re-computation of results). > > >>> > > >>> > > >> That's the whole point. Your not hiding anything. I don't follow you. > > >> > > >> > > >> > > >> > > >>> Personally, I think giving an option is fine. What if the user does > not > > >>> have > > >>> enough RAM and he is willing to pay the price? Right now, there is no > > way > > >>> he > > >>> can do that at all. However, the most frequent reason behind OOMs is > > not > > >>> having enough RAM to create the field caches and not Solr caches, so > > I'm > > >>> not > > >>> sure how important this is. > > >>> > > >>> > > >> How important is any feature? You don't have a use for it, so it's not > > >> important to you - someone else does so it is important to them. Soft > > value > > >> caches can be useful. > > >> > > >> > > >> > > >> > > >>> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller > > >>> wrote: > > >>> > > >>> There is a difference - weak references are not for very good for > > caches > > >>> > > - > > soft references (soft values here) are good for caches in most jvms. > > They > > can be very nice. Weak refs are eagerly reclaimed - it's suggested > > that > > impls should not eagerly reclaim soft refs. > > > > - Mark > > > > http://www.lucidimagination.com (mobile) > > > > > > On Oct 19, 2009, at 8:22 PM, Lance Norskog > wrote: > > > > "Soft references" then. "Weak pointers" is an older term. (They're > > > > > > > "weak" because some bully can steal their candy.) > > > > > > On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen > > > wrote: > > > > > > Lance, > > > > > >> Do you mean soft references? > > >> > > >> On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog > > > >> wrote: > > >> > > >> -1 for weak references in caching. > > >> > > >>> This makes memory management less deterministic (predictable) and > > at > > >>> peak can cause cache-thrashing. In other words, the worst case > gets > > >>> even more worse. 
When designing a system I want predictability > and > > I > > >>> want to control the worst case, because system meltdowns are > caused > > by > > >>> the worst case. Having thousands of small weak references does > the > > >>> opposite. > > >>> > > >>> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) < > > j...@apache.org> > > >>> wrote: > > >>> > > >>> > > >>> > > [ > > > > > > > https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864 > > ] > > > > Noble Paul commented on SOLR-1513: > > -- > > > > bq.Google Collections is already checked in as a dependency of > > Carrot > > clustering. > > > > in that e
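For reference, the kind of soft-value cache under discussion can be sketched with java.lang.ref.SoftReference values in a ConcurrentHashMap. This is a simplified illustration, not Solr's ConcurrentLRUCache or the Google Collections implementation: the GC may reclaim values under memory pressure, so a lookup can miss even after a put and callers must be prepared to recompute.

```java
import java.lang.ref.SoftReference;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified soft-value cache: the GC may reclaim values under memory
// pressure, so a lookup can miss even after a put. Callers must recompute
// on a miss - that is the CPU-for-OOM trade-off discussed in the thread.
public class SoftValueCache<K, V> {
    private final Map<K, SoftReference<V>> map = new ConcurrentHashMap<>();

    public void put(K key, V value) {
        map.put(key, new SoftReference<>(value));
    }

    /** Returns the cached value, or null if absent or already reclaimed. */
    public V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;              // never cached
        }
        V value = ref.get();
        if (value == null) {
            map.remove(key, ref);     // value reclaimed; drop the stale entry
        }
        return value;
    }
}
```

Counting the lookups that come back null for keys that were previously put would give exactly the eviction statistic needed to detect the cache thrashing Lance is worried about.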
Re: maxClauseCount in solrconfig.xml
Yonik Seeley wrote: > On Tue, Oct 20, 2009 at 9:06 AM, Mark Miller wrote: > >> >>1024 >> >> Anyone think we should clarify that? The built-in multiterm queries are >> constant score now, so its a bit misleading. >> > > Hmmm, yep... out of date. > > >> Also, why are we using >> >> prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE); >> > > I dunno - ask the guy who made the change ;-) > http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/search/SolrQueryParser.java?revision=801872&view=markup > > -Yonik > http://www.lucidimagination.com > Heh - I suspected it was me - but I think I made them before AUTO was available. Just didn't want to flip them now without bringing it up first :) -- - Mark http://www.lucidimagination.com
Re: maxClauseCount in solrconfig.xml
On Tue, Oct 20, 2009 at 9:06 AM, Mark Miller wrote: > > 1024 > > Anyone think we should clarify that? The built-in multiterm queries are > constant score now, so its a bit misleading. Hmmm, yep... out of date. > Also, why are we using > > prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE); I dunno - ask the guy who made the change ;-) http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/search/SolrQueryParser.java?revision=801872&view=markup -Yonik http://www.lucidimagination.com
Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
On Tue, Oct 20, 2009 at 6:07 PM, Mark Miller wrote: > I'm +1 obviously ;) No one is talking about making it the default. And I > think its well known that soft value caches can be a valid choice - > thats why google has one in their collections here ;) Its a nice way to > let your cache grow and shrink based on the available RAM. Its not > always the right choice, but sure is a nice option. And it doesn't have > much to do with Lucene's FieldCaches. The main reason for a soft value > cache is not to avoid OOM. Set your cache sizes correctly for that. And > even if it was to avoid OOM, who cares if something else causes more of > them? Thats like not fixing a bug in a piece of code because another > piece of code has more bugs. Anyway, their purpose is to allow the cache > to size depending on the available free RAM IMO. > +1 > > Noble Paul നോബിള് नोब्ळ् wrote: > > So , is everyone now in favor of this feature? Who has a -1 on this? and > > what is the concern? > > > > On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller > wrote: > > > > > >> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar < > >> shalinman...@gmail.com> wrote: > >> > >> I don't think the debate is about weak reference vs. soft references. > >> > >> There appears to be confusion between the two here no matter what the > >> debate - soft references are for cachinh, weak references are not so > much. > >> Getting it right is important. > >> > >> I > >> > >>> guess the point that Lance is making is that using such a technique > will > >>> make application performance less predictable. There's also a good > chance > >>> that a soft reference based cache will cause cache thrashing and will > hide > >>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for > >>> more > >>> CPU usage (due to re-computation of results). > >>> > >>> > >> That's the whole point. Your not hiding anything. I don't follow you. > >> > >> > >> > >> > >>> Personally, I think giving an option is fine. 
What if the user does not > >>> have > >>> enough RAM and he is willing to pay the price? Right now, there is no > way > >>> he > >>> can do that at all. However, the most frequent reason behind OOMs is > not > >>> having enough RAM to create the field caches and not Solr caches, so > I'm > >>> not > >>> sure how important this is. > >>> > >>> > >> How important is any feature? You don't have a use for it, so it's not > >> important to you - someone else does so it is important to them. Soft > value > >> caches can be useful. > >> > >> > >> > >> > >>> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller > >>> wrote: > >>> > >>> There is a difference - weak references are not for very good for > caches > >>> > - > soft references (soft values here) are good for caches in most jvms. > They > can be very nice. Weak refs are eagerly reclaimed - it's suggested > that > impls should not eagerly reclaim soft refs. > > - Mark > > http://www.lucidimagination.com (mobile) > > > On Oct 19, 2009, at 8:22 PM, Lance Norskog wrote: > > "Soft references" then. "Weak pointers" is an older term. (They're > > > > "weak" because some bully can steal their candy.) > > > > On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen > > wrote: > > > > Lance, > > > >> Do you mean soft references? > >> > >> On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog > >> wrote: > >> > >> -1 for weak references in caching. > >> > >>> This makes memory management less deterministic (predictable) and > at > >>> peak can cause cache-thrashing. In other words, the worst case gets > >>> even more worse. When designing a system I want predictability and > I > >>> want to control the worst case, because system meltdowns are caused > by > >>> the worst case. Having thousands of small weak references does the > >>> opposite. 
> >>> > >>> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) < > j...@apache.org> > >>> wrote: > >>> > >>> > >>> > [ > > > https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864 > ] > > Noble Paul commented on SOLR-1513: > -- > > bq.Google Collections is already checked in as a dependency of > Carrot > clustering. > > in that e need to move it to core. > > Jason . We do not need to remove the original option. We can > probably > add an extra parameter say softRef="true" or something. That way , > we > are > not screwing up anything and perf benefits can be studied > separately. > > > Use Google Collections in ConcurrentLRUCache > > > >
Re: maxClauseCount in solrconfig.xml
Mark Miller wrote: > > 1024 > > Anyone think we should clarify that? The built-in multiterm queries are > constant score now, so its a bit misleading. > > Also, why are we using > > > prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE); > > Don't we want to use AUTO for the multi-term queries? Its essentially > the same but with better performance for low term counts? > > In fact, range query is using auto - almost doesn't make sense not to use it for wildcard and prefix as well ... -- - Mark http://www.lucidimagination.com
maxClauseCount in solrconfig.xml
<maxBooleanClauses>1024</maxBooleanClauses> Anyone think we should clarify that? The built-in multiterm queries are constant score now, so it's a bit misleading. Also, why are we using prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE); Don't we want to use AUTO for the multi-term queries? It's essentially the same but with better performance for low term counts? -- - Mark http://www.lucidimagination.com
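For context, the AUTO rewrite discussed here (CONSTANT_SCORE_AUTO in Lucene 2.9) picks between the two strategies based on how many terms the query expands to. The sketch below models only that decision; the cutoff constant is illustrative, not Lucene's actual value:

```java
// Models the decision behind Lucene 2.9's CONSTANT_SCORE_AUTO rewrite:
// expand a prefix/wildcard into a BooleanQuery while the number of matching
// terms is small (cheap, constant score either way), and switch to a filter
// once expansion would be large. TERM_CUTOFF is illustrative only - it is
// not Lucene's actual constant.
public class AutoRewriteSketch {
    public enum Rewrite { BOOLEAN_QUERY, CONSTANT_SCORE_FILTER }

    static final int TERM_CUTOFF = 350; // illustrative threshold

    public static Rewrite choose(int matchingTerms) {
        return matchingTerms <= TERM_CUTOFF
                ? Rewrite.BOOLEAN_QUERY          // faster for low term counts
                : Rewrite.CONSTANT_SCORE_FILTER; // scales to huge expansions
    }
}
```

This is why AUTO is "essentially the same" as the filter rewrite for scoring purposes: both produce constant scores, but the small-expansion path avoids building a filter when a tiny BooleanQuery would be faster.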
Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
I'm +1 obviously ;) No one is talking about making it the default. And I think it's well known that soft value caches can be a valid choice - that's why Google has one in their collections here ;) It's a nice way to let your cache grow and shrink based on the available RAM. It's not always the right choice, but it sure is a nice option. And it doesn't have much to do with Lucene's FieldCaches. The main reason for a soft value cache is not to avoid OOM. Set your cache sizes correctly for that. And even if it was to avoid OOM, who cares if something else causes more of them? That's like not fixing a bug in a piece of code because another piece of code has more bugs. Anyway, their purpose is to allow the cache to size itself depending on the available free RAM, IMO. Noble Paul നോബിള് नोब्ळ् wrote: > So , is everyone now in favor of this feature? Who has a -1 on this? and > what is the concern? > > On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller wrote: > > >> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar < >> shalinman...@gmail.com> wrote: >> >> I don't think the debate is about weak reference vs. soft references. >> >> There appears to be confusion between the two here no matter what the >> debate - soft references are for cachinh, weak references are not so much. >> Getting it right is important. >> >> I >> >>> guess the point that Lance is making is that using such a technique will >>> make application performance less predictable. There's also a good chance >>> that a soft reference based cache will cause cache thrashing and will hide >>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for >>> more >>> CPU usage (due to re-computation of results). >>> >>> >> That's the whole point. Your not hiding anything. I don't follow you. >> >> >> >> >>> Personally, I think giving an option is fine. What if the user does not >>> have >>> enough RAM and he is willing to pay the price? Right now, there is no way >>> he >>> can do that at all.
However, the most frequent reason behind OOMs is not >>> having enough RAM to create the field caches and not Solr caches, so I'm >>> not >>> sure how important this is. >>> >>> >> How important is any feature? You don't have a use for it, so it's not >> important to you - someone else does so it is important to them. Soft value >> caches can be useful. >> >> >> >> >>> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller >>> wrote: >>> >>> There is a difference - weak references are not for very good for caches >>> - soft references (soft values here) are good for caches in most jvms. They can be very nice. Weak refs are eagerly reclaimed - it's suggested that impls should not eagerly reclaim soft refs. - Mark http://www.lucidimagination.com (mobile) On Oct 19, 2009, at 8:22 PM, Lance Norskog wrote: "Soft references" then. "Weak pointers" is an older term. (They're > "weak" because some bully can steal their candy.) > > On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen > wrote: > > Lance, > >> Do you mean soft references? >> >> On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog >> wrote: >> >> -1 for weak references in caching. >> >>> This makes memory management less deterministic (predictable) and at >>> peak can cause cache-thrashing. In other words, the worst case gets >>> even more worse. When designing a system I want predictability and I >>> want to control the worst case, because system meltdowns are caused by >>> the worst case. Having thousands of small weak references does the >>> opposite. >>> >>> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) >>> wrote: >>> >>> >>> [ https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864 ] Noble Paul commented on SOLR-1513: -- bq.Google Collections is already checked in as a dependency of Carrot clustering. in that e need to move it to core. Jason . We do not need to remove the original option. 
We can probably add an extra parameter say softRef="true" or something. That way , we are not screwing up anything and perf benefits can be studied separately. Use Google Collections in ConcurrentLRUCache > > > Key: SOLR-1513 > URL: https://issues.apache.org/jira/browse/SOLR-1513 > Project: Solr > Issue Type: Improvement > Components: search > Affects Versions: 1.4 >>
[jira] Resolved: (SOLR-1099) FieldAnalysisRequestHandler
[ https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Koji Sekiguchi resolved SOLR-1099. -- Resolution: Fixed Committed revision 827032. Thanks. > FieldAnalysisRequestHandler > --- > > Key: SOLR-1099 > URL: https://issues.apache.org/jira/browse/SOLR-1099 > Project: Solr > Issue Type: New Feature > Components: Analysis >Affects Versions: 1.3 >Reporter: Uri Boness >Assignee: Koji Sekiguchi > Fix For: 1.4 > > Attachments: AnalisysRequestHandler_refactored.patch, > analysis_request_handlers_incl_solrj.patch, > AnalysisRequestHandler_refactored1.patch, > FieldAnalysisRequestHandler_incl_test.patch, > SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, > SOLR-1099.patch > > > The FieldAnalysisRequestHandler provides the analysis functionality of the > web admin page as a service. This handler accepts a filetype/fieldname > parameter and a value and as a response returns a breakdown of the analysis > process. It is also possible to send a query value which will use the > configured query analyzer as well as a showmatch parameter which will then > mark every matched token as a match. > If this handler is added to the code base, I also recommend to rename the > current AnalysisRequestHandler to DocumentAnalysisRequestHandler and have > them both inherit from one AnalysisRequestHandlerBase class which provides > the common functionality of the analysis breakdown and its translation to > named lists. This will also enhance the current AnalysisRequestHandler which > right now is fairly simplistic. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
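For readers following along, the resolved handler would be registered in solrconfig.xml roughly as below. The handler path and parameter names follow the issue description; verify them against the committed code, as this is a sketch rather than the canonical configuration:

```xml
<!-- hypothetical registration of the new field analysis handler -->
<requestHandler name="/analysis/field"
                class="org.apache.solr.handler.FieldAnalysisRequestHandler" />
```

A request such as /analysis/field?analysis.fieldtype=text&analysis.fieldvalue=The+Quick+Fox&analysis.query=fox&analysis.showmatch=true would then return the per-stage analysis breakdown, with tokens matching the query value flagged as matches.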
Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
So , is everyone now in favor of this feature? Who has a -1 on this? and what is the concern? On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller wrote: > > > On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar < > shalinman...@gmail.com> wrote: > > I don't think the debate is about weak reference vs. soft references. >> > > There appears to be confusion between the two here no matter what the > debate - soft references are for cachinh, weak references are not so much. > Getting it right is important. > > I >> guess the point that Lance is making is that using such a technique will >> make application performance less predictable. There's also a good chance >> that a soft reference based cache will cause cache thrashing and will hide >> OOMs caused by inadequate cache sizes. So basically we trade an OOM for >> more >> CPU usage (due to re-computation of results). >> > > That's the whole point. Your not hiding anything. I don't follow you. > > > >> Personally, I think giving an option is fine. What if the user does not >> have >> enough RAM and he is willing to pay the price? Right now, there is no way >> he >> can do that at all. However, the most frequent reason behind OOMs is not >> having enough RAM to create the field caches and not Solr caches, so I'm >> not >> sure how important this is. >> > > How important is any feature? You don't have a use for it, so it's not > important to you - someone else does so it is important to them. Soft value > caches can be useful. > > > >> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller >> wrote: >> >> There is a difference - weak references are not for very good for caches >>> - >>> soft references (soft values here) are good for caches in most jvms. They >>> can be very nice. Weak refs are eagerly reclaimed - it's suggested that >>> impls should not eagerly reclaim soft refs. >>> >>> - Mark >>> >>> http://www.lucidimagination.com (mobile) >>> >>> >>> On Oct 19, 2009, at 8:22 PM, Lance Norskog wrote: >>> >>> "Soft references" then. 
"Weak pointers" is an older term. (They're >>> "weak" because some bully can steal their candy.) On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen wrote: Lance, > > Do you mean soft references? > > On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog > wrote: > > -1 for weak references in caching. >> >> This makes memory management less deterministic (predictable) and at >> peak can cause cache-thrashing. In other words, the worst case gets >> even more worse. When designing a system I want predictability and I >> want to control the worst case, because system meltdowns are caused by >> the worst case. Having thousands of small weak references does the >> opposite. >> >> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) >> wrote: >> >> >>> [ >>> >>> https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864 >>> ] >>> >>> Noble Paul commented on SOLR-1513: >>> -- >>> >>> bq.Google Collections is already checked in as a dependency of Carrot >>> clustering. >>> >>> in that e need to move it to core. >>> >>> Jason . We do not need to remove the original option. We can probably >>> add an extra parameter say softRef="true" or something. That way , we >>> are >>> not screwing up anything and perf benefits can be studied separately. >>> >>> >>> Use Google Collections in ConcurrentLRUCache >>> Key: SOLR-1513 URL: https://issues.apache.org/jira/browse/SOLR-1513 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: google-collect-snapshot.jar, SOLR-1513.patch ConcurrentHashMap is used in ConcurrentLRUCache. The Google Colletions concurrent map implementation allows for soft values that are great for caches that potentially exceed the allocated heap. Though I suppose Solr caches usually don't use too much RAM? 
http://code.google.com/p/google-collections/ >>> -- >>> This message is automatically generated by JIRA. >>> - >>> You can reply to this email to add a comment to the issue online. >>> >>> >>> >>> >> >> -- >> Lance Norskog >> goks...@gmail.com >> >> >> > -- Lance Norskog goks...@gmail.com >>> >> >> -- >> Regards, >> Shalin Shekhar Mangar. >> > -- - Noble
Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache
On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar > wrote: I don't think the debate is about weak reference vs. soft references. There appears to be confusion between the two here no matter what the debate - soft references are for caching, weak references are not so much. Getting it right is important. I guess the point that Lance is making is that using such a technique will make application performance less predictable. There's also a good chance that a soft reference based cache will cause cache thrashing and will hide OOMs caused by inadequate cache sizes. So basically we trade an OOM for more CPU usage (due to re-computation of results). That's the whole point. You're not hiding anything. I don't follow you. Personally, I think giving an option is fine. What if the user does not have enough RAM and he is willing to pay the price? Right now, there is no way he can do that at all. However, the most frequent reason behind OOMs is not having enough RAM to create the field caches and not Solr caches, so I'm not sure how important this is. How important is any feature? You don't have a use for it, so it's not important to you - someone else does so it is important to them. Soft value caches can be useful. On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller wrote: There is a difference - weak references are not very good for caches - soft references (soft values here) are good for caches in most JVMs. They can be very nice. Weak refs are eagerly reclaimed - it's suggested that impls should not eagerly reclaim soft refs. - Mark http://www.lucidimagination.com (mobile) On Oct 19, 2009, at 8:22 PM, Lance Norskog wrote: "Soft references" then. "Weak pointers" is an older term. (They're "weak" because some bully can steal their candy.) On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen wrote: Lance, Do you mean soft references? On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog wrote: -1 for weak references in caching.
This makes memory management less deterministic (predictable) and at peak can cause cache-thrashing. In other words, the worst case gets even worse. When designing a system I want predictability and I want to control the worst case, because system meltdowns are caused by the worst case. Having thousands of small weak references does the opposite. On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) > wrote: [ https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864 ] Noble Paul commented on SOLR-1513: -- bq.Google Collections is already checked in as a dependency of Carrot clustering. in that e need to move it to core. Jason. We do not need to remove the original option. We can probably add an extra parameter, say softRef="true" or something. That way, we are not screwing up anything and perf benefits can be studied separately. Use Google Collections in ConcurrentLRUCache Key: SOLR-1513 URL: https://issues.apache.org/jira/browse/SOLR-1513 Project: Solr Issue Type: Improvement Components: search Affects Versions: 1.4 Reporter: Jason Rutherglen Priority: Minor Fix For: 1.5 Attachments: google-collect-snapshot.jar, SOLR-1513.patch ConcurrentHashMap is used in ConcurrentLRUCache. The Google Collections concurrent map implementation allows for soft values that are great for caches that potentially exceed the allocated heap. Though I suppose Solr caches usually don't use too much RAM? http://code.google.com/p/google-collections/ -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. -- Lance Norskog goks...@gmail.com -- Lance Norskog goks...@gmail.com -- Regards, Shalin Shekhar Mangar.
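The weak-vs-soft distinction Mark describes (weak refs eagerly reclaimed, soft refs held while memory allows) can be observed directly. A small demo follows, with the caveat that GC behavior is JVM-dependent: the weak reference is only *typically* cleared by an explicit GC, so no particular outcome is asserted.

```java
import java.lang.ref.SoftReference;
import java.lang.ref.WeakReference;

// Demonstrates the weak-vs-soft distinction from the thread: after the last
// strong reference is dropped, a weak reference is eligible for clearing at
// the very next GC, while a soft reference is kept as long as memory is
// plentiful. GC behavior is JVM-dependent; this is illustrative only.
public class RefDemo {
    public static void main(String[] args) {
        Object strong = new Object();
        WeakReference<Object> weak = new WeakReference<>(strong);
        SoftReference<Object> soft = new SoftReference<>(strong);

        strong = null;   // drop the only strong reference
        System.gc();     // a hint; HotSpot typically clears weak refs here

        // The spec only promises weak refs are cleared more eagerly than
        // soft refs, not exactly when - so we print rather than assert.
        System.out.println("weak cleared: " + (weak.get() == null));
        System.out.println("soft cleared: " + (soft.get() == null));
    }
}
```

This is why a soft-value map is a reasonable cache while a weak-value map is not: weakly held entries can disappear at the next collection even when memory is abundant.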