Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Oct 20, 2009 at 11:57 PM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller 
> wrote:
>
> >
> >
> > On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> >  I don't think the debate is about weak reference vs. soft references.
> >>
> >
> > There appears to be confusion between the two here no matter what the
> > debate - soft references are for caching, weak references are not so
> > much.
> > Getting it right is important.
> >
> >  I
> >> guess the point that Lance is making is that using such a technique will
> >> make application performance less predictable. There's also a good
> chance
> >> that a soft reference based cache will cause cache thrashing and will
> hide
> >> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
> >> more
> >> CPU usage (due to re-computation of results).
> >>
> >
> > That's the whole point. You're not hiding anything. I don't follow you.
> >
>
> Using a soft reference based cache can hide the fact that one has
> inadequate
> memory for the cache size one has configured. Don't get me wrong. I'm not
> against the feature. I was merely trying to explain Lance's concerns as I
> understood them.
>
Lance's concern is valid. Assuming that we are going to have this feature
(non-default), we need a way to know that cache thrashing has happened. I mean
the statistics should also expose the number of cache entries which got evicted.
This should enable the user to decide whether there should be more RAM or he
is happy to live with the extra CPU cycles for recomputation.
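
For what it's worth, a minimal sketch of what such a cache could look like,
assuming Google Collections' MapMaker API. The wrapper class and its counter
are illustrative, not actual Solr code:

import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;
import com.google.common.base.Function;
import com.google.common.collect.MapMaker;

// Hypothetical wrapper: counts how often a value had to be (re)computed,
// which approximates how many soft-referenced entries the GC reclaimed.
public class SoftValueCache<K, V> {
    private final AtomicLong computations = new AtomicLong();
    private final ConcurrentMap<K, V> map;

    public SoftValueCache(final Function<K, V> loader) {
        this.map = new MapMaker()
                .softValues()  // values become reclaimable under memory pressure
                .makeComputingMap(new Function<K, V>() {
                    public V apply(K key) {
                        computations.incrementAndGet();  // expose via cache stats
                        return loader.apply(key);
                    }
                });
    }

    public V get(K key) { return map.get(key); }

    public long getComputationCount() { return computations.get(); }
}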

>
>
> >
> >
> >
> >> Personally, I think giving an option is fine. What if the user does not
> >> have
> >> enough RAM and he is willing to pay the price? Right now, there is no
> way
> >> he
> >> can do that at all. However, the most frequent reason behind OOMs is not
> >> having enough RAM to create the field caches and not Solr caches, so I'm
> >> not
> >> sure how important this is.
> >>
> >
> > How important is any feature? You don't have a use for it, so it's not
> > important to you - someone else does so it is important to them. Soft
> value
> > caches can be useful.
>
>
> Don't jump to conclusions :)
>
> The reason behind this feature request is to have Solr caches which resize
> themselves when enough memory is not available. I agree that soft value
> caches are useful for this. All I'm saying is that most OOMs that get
> reported on the list are due to inadequate free memory for allocating field
> caches. Finding a way around that will be the key to make a Lucene/Solr
> application practical in a limited memory environment.
>
> Just for the record, I'm +1 for adding this feature but keeping the current
> behavior as the default.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
-
Noble Paul | Principal Engineer | AOL | http://aol.com


Re: deploy solr in Eclipse IDE

2009-10-20 Thread Amit Nithian
Pradeep,
Attached are the files. You may have to open them in a text editor and
rename the project to match yours, but it should be pretty straightforward. I
used this with 1.3 trunk at the time so things may have changed but it's
easy enough to modify in eclipse.
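
For the archives, a rough sketch of what such a .classpath can look like,
assuming the 1.4-era trunk layout; the source paths and the jar name are
illustrative, so match them to your checkout:

<?xml version="1.0" encoding="UTF-8"?>
<classpath>
    <!-- Solr trunk source roots (adjust to your checkout) -->
    <classpathentry kind="src" path="src/java"/>
    <classpathentry kind="src" path="src/common"/>
    <classpathentry kind="src" path="src/solrj"/>
    <classpathentry kind="src" path="src/test"/>
    <classpathentry kind="con" path="org.eclipse.jdt.launching.JRE_CONTAINER"/>
    <!-- add one lib entry per jar under lib/, for example: -->
    <classpathentry kind="lib" path="lib/lucene-core-2.9.jar"/>
    <classpathentry kind="output" path="bin"/>
</classpath>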

- Amit

On Mon, Oct 19, 2009 at 4:16 PM, Pradeep Pujari wrote:

> This URL is helpful. If I check out in Eclipse using SVN (Subclipse), the
> source files are not as per the package structure. Can you please send me your
> .project and .classpath files? Thank you in advance.
>
> Pradeep
>
> --- On Sun, 10/18/09, Amit Nithian  wrote:
>
> > From: Amit Nithian 
> > Subject: Re: deploy solr in Eclipse IDE
> > To: solr-dev@lucene.apache.org
> > Date: Sunday, October 18, 2009, 11:06 PM
> > Hey Pradeep,
> > Check out
> >
> http://lucene.apache.org/solr/version_control.html#Anonymous+Access+%28read-only%29
> >
> > If
> > you need more help with setting up Eclipse and Solr trunk
> > send me an email.
> > I can send you my .project and .classpath files as I have
> > it for my setup.
> >
> > Take care
> > Amit
> >
> > On Sun, Oct 18, 2009 at 11:34 AM, Pradeep Pujari <
> prade...@rocketmail.com>wrote:
> >
> > > Hi Amit,
> > > This is what I am looking for. Do you know the URL for
> > trunk?
> > >
> > > Thanks,
> > > Pradeep.
> > >
> > > --- On Sun, 10/18/09, Amit Nithian 
> > wrote:
> > >
> > > > From: Amit Nithian 
> > > > Subject: Re: deploy solr in Eclipse IDE
> > > > To: solr-dev@lucene.apache.org
> > > > Date: Sunday, October 18, 2009, 12:55 AM
> > > > I think you may have better luck setting up Eclipse, Subclipse, etc.
> > > > and hooking off of trunk rather than having to re-create the eclipse
> > > > project every time a nightly build comes out. I simply have an eclipse
> > > > project tied to trunk, and every so often I'll do an SVN update when I
> > > > want/need the latest code.
> > > >
> > > > hope that helps some!
> > > > Amit
> > > >
> > > > On Thu, Oct 15, 2009 at 2:31 AM, Brian Carmalt
> > 
> > > > wrote:
> > > >
> > > > > Hello,
> > > > >
> > > > > I start Solr with Jetty using the following code. If the classpath
> > > > > and src paths are set correctly in Eclipse and you pass solr.home to
> > > > > the VM on startup, you just have to start this class and you can
> > > > > debug Solr in Eclipse.
> > > > >
> > > > >
> > > > > import org.mortbay.jetty.Server;
> > > > > import org.mortbay.jetty.webapp.WebAppContext;
> > > > >
> > > > > public class JettyStarter {
> > > > >
> > > > >     /**
> > > > >      * @param args
> > > > >      */
> > > > >     public static void main(String[] args) {
> > > > >         try {
> > > > >             // A no-arg Server() has no connector and never listens;
> > > > >             // 8983 matches the Solr example port. Adjust as needed.
> > > > >             Server server = new Server(8983);
> > > > >             WebAppContext solr = new WebAppContext();
> > > > >             solr.setContextPath("/solr");
> > > > >             solr.setWar("Path to solr directory or war");
> > > > >             server.addHandler(solr);
> > > > >             server.setStopAtShutdown(true);
> > > > >             server.start();
> > > > >         } catch (Exception e) {
> > > > >             // TODO Auto-generated catch block
> > > > >             e.printStackTrace();
> > > > >         }
> > > > >     }
> > > > > }
> > > > >
> > > > >
> > > > >
> > > > > On Tuesday, 13.10.2009 at 16:43 -0700, Pradeep Pujari wrote:
> > > > > > Hi All,
> > > > > >
> > > > > > I am trying to install the solr nightly build into the Eclipse IDE
> > > > > > and am facing a lot of issues while importing the zip file. The
> > > > > > build path, libs, and various source files are scattered. It took
> > > > > > me a lot of time to configure and make it run.
> > > > > >
> > > > > > What development environment is being used, and is there a smooth
> > > > > > way of importing the daily-nightly build into eclipse?
> > > > > >
> > > > > > Please help.
> > > > > >
> > > > > > Thanks,
> > > > > > Pradeep.
> > > > > >
> > > > > >
> > > > >
> > > > >
> > > >
> > >
> > >
> >
>
>


Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Lance Norskog
On-topic: Will the Google implementations + soft references behave
well with 8+ processors?

Semi-on-topic: If you want to really know multiprocessor algorithms,
this is the bible: "The Art Of Multiprocessor Programming". Hundreds
of parallel algorithms for many different jobs, all coded in Java, and
cross-referenced with the java.util.concurrent package. Just amazing.

http://www.elsevier.com/wps/find/bookdescription.cws_home/714091/description#description

Off-topic: I was representing a system troubleshooting philosophy:
"Fail Early, Fail Loud". Meaning, if there is a problem like OOMs,
tell me and I'll fix it permanently. But different situations call for
different answers, and Mark is representing "just keep working, ok?".
Brittle vs. supple is one way to think of it.

On Tue, Oct 20, 2009 at 11:27 AM, Shalin Shekhar Mangar
 wrote:
> On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller  wrote:
>
>>
>>
>> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>  I don't think the debate is about weak reference vs. soft references.
>>>
>>
>> There appears to be confusion between the two here no matter what the
>> debate - soft references are for caching, weak references are not so much.
>> Getting it right is important.
>>
>>  I
>>> guess the point that Lance is making is that using such a technique will
>>> make application performance less predictable. There's also a good chance
>>> that a soft reference based cache will cause cache thrashing and will hide
>>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
>>> more
>>> CPU usage (due to re-computation of results).
>>>
>>
>> That's the whole point. You're not hiding anything. I don't follow you.
>>
>
> Using a soft reference based cache can hide the fact that one has inadequate
> memory for the cache size one has configured. Don't get me wrong. I'm not
> against the feature. I was merely trying to explain Lance's concerns as I
> understood them.
>
>
>>
>>
>>
>>> Personally, I think giving an option is fine. What if the user does not
>>> have
>>> enough RAM and he is willing to pay the price? Right now, there is no way
>>> he
>>> can do that at all. However, the most frequent reason behind OOMs is not
>>> having enough RAM to create the field caches and not Solr caches, so I'm
>>> not
>>> sure how important this is.
>>>
>>
>> How important is any feature? You don't have a use for it, so it's not
>> important to you - someone else does so it is important to them. Soft value
>> caches can be useful.
>
>
> Don't jump to conclusions :)
>
> The reason behind this feature request is to have Solr caches which resize
> themselves when enough memory is not available. I agree that soft value
> caches are useful for this. All I'm saying is that most OOMs that get
> reported on the list are due to inadequate free memory for allocating field
> caches. Finding a way around that will be the key to make a Lucene/Solr
> application practical in a limited memory environment.
>
> Just for the record, I'm +1 for adding this feature but keeping the current
> behavior as the default.
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Lance Norskog
goks...@gmail.com


Re: clustering schema

2009-10-20 Thread Yonik Seeley
Actually just copying the example schema to contrib seemed to work
fine... those should probably be kept in alignment regardless of whether we
decide to do something different about the data directory.

-Yonik
http://www.lucidimagination.com


Re: clustering schema

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 5:31 PM, Grant Ingersoll  wrote:
> Can't we set up the clustering solrconfig to have a different data directory
> and remove the default of ./solr/data?   I get caught on this gotcha in a
> lot of places these days b/c I am often trying out lots of different
> configs.

We could, but that has its own downsides... like creating lucene
indexes in various places in source directories like contrib.

-Yonik
http://www.lucidimagination.com

> On Oct 20, 2009, at 5:13 PM, Yonik Seeley wrote:
>
>> So when I go to try the clustering example, I fire up the server, hit
>> it with the example on the Wiki
>>
>> http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true
>>
>> And... boom.
>>
>> java.lang.NullPointerException
>>        at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
>>        at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
>> [...]
>>
>> It's because of a schema mismatch of course... I had already indexed data
>> using the normal schema, and now we're using a different schema/config
>> with the same data dir.
>> I imagine this will be a common mistake.
>>
>> Should we try to do this like SolrCell... just make it a lazy handler
>> and reference the libs in solrconfig.xml?  Oh wait... searchComponents
>> can't be lazy I don't think... darn.
>> I guess the only "fix" (it's not really a bug, just undesirable) is to
>> try and get the schemas closer together?
>>
>> -Yonik
>> http://www.lucidimagination.com
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/
>
> Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using
> Solr/Lucene:
> http://www.lucidimagination.com/search
>
>


Re: clustering schema

2009-10-20 Thread Grant Ingersoll
Can't we set up the clustering solrconfig to have a different data  
directory and remove the default of ./solr/data?   I get caught on  
this gotcha in a lot of places these days b/c I am often trying out  
lots of different configs.
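
For reference, the knob in question is solrconfig.xml's <dataDir> element; a
sketch of what the clustering config could use, with an illustrative
system-property name and default:

<!-- in the clustering example's solrconfig.xml; the property name is illustrative -->
<dataDir>${solr.clustering.data.dir:./solr/data-clustering}</dataDir>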


On Oct 20, 2009, at 5:13 PM, Yonik Seeley wrote:


So when I go to try the clustering example, I fire up the server, hit
it with the example on the Wiki

http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true

And... boom.

java.lang.NullPointerException
        at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
        at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
[...]

It's because of a schema mismatch of course... I had already indexed data
using the normal schema, and now we're using a different schema/config
with the same data dir.
I imagine this will be a common mistake.

Should we try to do this like SolrCell... just make it a lazy handler
and reference the libs in solrconfig.xml?  Oh wait... searchComponents
can't be lazy I don't think... darn.
I guess the only "fix" (it's not really a bug, just undesirable) is to
try and get the schemas closer together?

-Yonik
http://www.lucidimagination.com


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



clustering schema

2009-10-20 Thread Yonik Seeley
So when I go to try the clustering example, I fire up the server, hit
it with the example on the Wiki

http://localhost:8983/solr/select?indent=on&q=*:*&rows=10&clustering=true

And... boom.

java.lang.NullPointerException
        at org.apache.solr.schema.SortableIntField.write(SortableIntField.java:72)
        at org.apache.solr.schema.SchemaField.write(SchemaField.java:108)
[...]

It's because of a schema mismatch of course... I had already indexed data
using the normal schema, and now we're using a different schema/config
with the same data dir.
I imagine this will be a common mistake.

Should we try to do this like SolrCell... just make it a lazy handler
and reference the libs in solrconfig.xml?  Oh wait... searchComponents
can't be lazy I don't think... darn.
I guess the only "fix" (it's not really a bug, just undesirable) is to
try and get the schemas closer together?

-Yonik
http://www.lucidimagination.com


RE: Where to free Tokenizer resources?

2009-10-20 Thread Teruhiko Kurosaka
Erik,
That's a good idea.  

But that means the resource-releasing code must live in
the finalize method, and it has to wait until GC
kicks in.  Correct?
-kuro  

> -Original Message-
> From: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
> Sent: Tuesday, October 20, 2009 12:37 PM
> To: solr-dev@lucene.apache.org
> Subject: Re: Where to free Tokenizer resources?
> 
> What about acquiring the resource in your tokenizer factory 
> instead of at the tokenizer level?
> 
>   Erik
> 
> 
> On Oct 20, 2009, at 1:16 PM, Teruhiko Kurosaka wrote:
> 
> >
> > Yonik,
> >
> >> If you really want to release/acquire your resources each time the 
> >> tokenizer is used, then release it in the close() and 
> acquire in the 
> >> reset().  There is no "done with this forever" callback.
> >
> > I wanted to avoid that because acquiring this resource is a
> > relatively
> > expensive operation.  I wanted to do that per instance.  I guess I 
> > should lobby Lucene folks and ask them to consider adding a 
> new method 
> > to do so.
> >
> > Is my guess that Solr calls Tokenizer.close() more than 
> once correct? 
> > My observation of the behavior suggests it, but I couldn't find
> > concrete evidence in the source.
> >
> >
> >>
> >> -Yonik
> >> http://www.lucidimagination.com
> >>
> >> On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka 
> >>  wrote:
> >>> Hi,
> >>> I have my own Tokenizer that was working with Solr 1.3 fine
> >> but threw an Exception when used with Solr 1.4 dev.
> >>>
> >>> This Tokenizer uses some JNI-side resources that it takes
> >> in the constructor and it frees it in close().
> >>>
> >>> The behavior seems to indicate that Solr 1.4 calls close()
> >> then reset(Reader) in order to reuse the Tokenizer.  But 
> my Tokenizer 
> >> threw an Exception because its resource has been freed already. My 
> >> temporary fix was to move the resource release code from 
> close() to 
> >> finalize().  But I'm not very happy with it because the timing of 
> >> resource release is up to the garbage collector.
> >>>
> >>> Question #1: Is close() supposed to be called more than
> >> once? To me,
> >>> close() should be called only once at the end of life 
> cycle of the 
> >>> Tokenizer.  (The old reader should be closed when reset(Reader) is
> >>> called.)
> >>>
> >>> If the answer is Yes, then
> >>>
> >>> Question #2: Is there any better place to release the
> >> internal resource than in finalize()?
> >>>
> >>> Thank you.
> >>>
> >>> T. "Kuro" Kurosaka
> >>>
> >>>
> >>
> 
> 

[jira] Commented: (SOLR-1516) DocumentList and Document QueryResponseWriter

2009-10-20 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767925#action_12767925
 ] 

Chris A. Mattmann commented on SOLR-1516:
-

Hi All:

I don't mean to be a pest here, but I've seen the amount of activity going on
on the SOLR lists recently, as well as the decision to hold off on calling for a
vote on 1.4 until Lucene 2.9.1 is released. This patch is self-contained, 
doesn't touch any code, and honestly, it only adds functionality that would 
have made my life as a user of SOLR a lot easier (I would have saved the hour 
of debugging and printing out #getClass on the Objects in NamedList, and on top 
of that only had to implement an #emitDoc or #emitDocList function and 
optionally #emitHeader and #emitFooter, rather than the rest of the supporting 
code).

Am I the only one that's run into a problem trying to write a custom XML SOLR 
output that's inherently simple? That is, XML output that doesn't need to worry 
about the inherent types of the named values in the NamedList, output that only 
cares about spitting out the set of returned Documents?

It would be great to see this get into 1.4, but if I'm the outlier, I can wait. 
Just thought I'd raise the issue.

Cheers,
Chris
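
As a rough illustration of the proposed shape, a hypothetical subclass might
look like the following. The signatures here are inferred from the issue
description below, not taken from the attached patch:

import java.io.IOException;
import java.io.Writer;
import org.apache.lucene.document.Document;

// Hypothetical subclass of the proposed o.a.s.request.DocumentResponseWriter.
public class SimpleXmlDocWriter extends DocumentResponseWriter {
    protected void emitHeader(Writer w) throws IOException {
        w.write("<docs>\n");
    }

    protected void emitDoc(Writer w, Document doc) throws IOException {
        // emit just the returned document, ignoring NamedList internals
        w.write("  <doc id=\"" + doc.get("id") + "\"/>\n");
    }

    protected void emitFooter(Writer w) throws IOException {
        w.write("</docs>\n");
    }
}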


> DocumentList and Document QueryResponseWriter
> -
>
> Key: SOLR-1516
> URL: https://issues.apache.org/jira/browse/SOLR-1516
> Project: Solr
>  Issue Type: New Feature
>  Components: search
>Affects Versions: 1.3
> Environment: My MacBook Pro laptop.
>Reporter: Chris A. Mattmann
>Priority: Minor
> Fix For: 1.5
>
> Attachments: SOLR-1516.Mattmann.101809.patch.txt
>
>
> I tried to implement a custom QueryResponseWriter the other day and was 
> amazed at the level of unmarshalling and weeding through objects that was 
> necessary just to format the output o.a.l.Document list. As a user, I wanted 
> to be able to implement either of 2 functions:
> * process a document at a time, and format it (for speed/efficiency)
> * process all the documents at once, and format them (in case an aggregate 
> calculation is necessary for outputting)
> So, I've decided to contribute 2 simple classes that I think are sufficiently 
> generic and reusable. The first is o.a.s.request.DocumentResponseWriter -- it 
> handles the first bullet above. The second is 
> o.a.s.request.DocumentListResponseWriter. Both are abstract base classes and 
> require the user to implement either an #emitDoc function (in the case of 
> bullet 1), or an #emitDocList function (in the case of bullet 2). Both 
> classes provide an #emitHeader and #emitFooter function set that handles 
> formatting and output before the Document list is processed.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: Where to free Tokenizer resources?

2009-10-20 Thread Erik Hatcher
What about acquiring the resource in your tokenizer factory instead of  
at the tokenizer level?
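
A rough sketch of that idea follows. NativeResource and MyTokenizer are
hypothetical stand-ins for the JNI-side handle and the custom tokenizer, and
error handling is omitted:

import java.io.Reader;
import java.util.Map;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.solr.analysis.BaseTokenizerFactory;

public class MyTokenizerFactory extends BaseTokenizerFactory {
    // Acquired once per factory instance, shared by all tokenizers it creates.
    private NativeResource resource;  // hypothetical JNI-side handle

    public void init(Map<String, String> args) {
        super.init(args);
        resource = NativeResource.acquire();  // expensive, so do it once here
    }

    public Tokenizer create(Reader input) {
        // Tokenizers borrow the resource and never free it themselves, so
        // Solr can safely close() and reset() them for reuse.
        return new MyTokenizer(resource, input);
    }
}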


Erik


On Oct 20, 2009, at 1:16 PM, Teruhiko Kurosaka wrote:



Yonik,


If you really want to release/acquire your resources each
time the tokenizer is used, then release it in the close()
and acquire in the reset().  There is no "done with this
forever" callback.


I wanted to avoid that because acquiring this resource
is a relatively expensive operation.  I wanted to do
that per instance.  I guess I should lobby Lucene folks
and ask them to consider adding a new method to do so.

Is my guess that Solr calls Tokenizer.close() more than
once correct? My observation of the behavior suggests
it, but I couldn't find concrete evidence in the source.




-Yonik
http://www.lucidimagination.com

On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka
 wrote:

Hi,
I have my own Tokenizer that was working with Solr 1.3 fine

but threw an Exception when used with Solr 1.4 dev.


This Tokenizer uses some JNI-side resources that it takes

in the constructor and it frees it in close().


The behavior seems to indicate that Solr 1.4 calls close()

then reset(Reader) in order to reuse the Tokenizer.  But my
Tokenizer threw an Exception because its resource has been
freed already. My temporary fix was to move the resource
release code from close() to finalize().  But I'm not very
happy with it because the timing of resource release is up to
the garbage collector.


Question #1: Is close() supposed to be called more than

once? To me,

close() should be called only once at the end of life cycle of the
Tokenizer.  (The old reader should be closed when reset(Reader) is
called.)

If the answer is Yes, then

Question #2: Is there any better place to release the

internal resource than in finalize()?


Thank you.

T. "Kuro" Kurosaka








Re: TrieField -> NumericField ?

2009-10-20 Thread Mark Miller
Yonik Seeley wrote:
> On Tue, Oct 20, 2009 at 2:18 PM, Chris Hostetter
>  wrote:
>   
>> I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr
>> but Lucene switched to using "NumericField" ... should we convert the Solr
>> class names prior to 1.4?
>> 
>
> I dunno - NumericField is too generic.  We still have two other types
> of numeric fields.
>
> -Yonik
> http://www.lucidimagination.com
>   
I spotted the same thing this morning and was about to raise it when I
came to the same conclusion.

-- 
- Mark

http://www.lucidimagination.com





Re: TrieField -> NumericField ?

2009-10-20 Thread Chris Hostetter

: > I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr
: > but Lucene switched to using "NumericField" ... should we convert the Solr
: > class names prior to 1.4?
: 
: I dunno - NumericField is too generic.  We still have two other types
: of numeric fields.

I'm fine with that ... I just wanted to make sure it was something we at 
least thought about (and not just an oversight)


-Hoss



Re: TrieField -> NumericField ?

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 2:18 PM, Chris Hostetter
 wrote:
>
> I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr
> but Lucene switched to using "NumericField" ... should we convert the Solr
> class names prior to 1.4?

I dunno - NumericField is too generic.  We still have two other types
of numeric fields.

-Yonik
http://www.lucidimagination.com


Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Shalin Shekhar Mangar
On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller  wrote:

>
>
> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>  I don't think the debate is about weak reference vs. soft references.
>>
>
> There appears to be confusion between the two here no matter what the
> debate - soft references are for caching, weak references are not so much.
> Getting it right is important.
>
>  I
>> guess the point that Lance is making is that using such a technique will
>> make application performance less predictable. There's also a good chance
>> that a soft reference based cache will cause cache thrashing and will hide
>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
>> more
>> CPU usage (due to re-computation of results).
>>
>
> That's the whole point. You're not hiding anything. I don't follow you.
>

Using a soft reference based cache can hide the fact that one has inadequate
memory for the cache size one has configured. Don't get me wrong. I'm not
against the feature. I was merely trying to explain Lance's concerns as I
understood them.


>
>
>
>> Personally, I think giving an option is fine. What if the user does not
>> have
>> enough RAM and he is willing to pay the price? Right now, there is no way
>> he
>> can do that at all. However, the most frequent reason behind OOMs is not
>> having enough RAM to create the field caches and not Solr caches, so I'm
>> not
>> sure how important this is.
>>
>
> How important is any feature? You don't have a use for it, so it's not
> important to you - someone else does so it is important to them. Soft value
> caches can be useful.


Don't jump to conclusions :)

The reason behind this feature request is to have Solr caches which resize
themselves when enough memory is not available. I agree that soft value
caches are useful for this. All I'm saying is that most OOMs that get
reported on the list are due to inadequate free memory for allocating field
caches. Finding a way around that will be the key to make a Lucene/Solr
application practical in a limited memory environment.

Just for the record, I'm +1 for adding this feature but keeping the current
behavior as the default.

-- 
Regards,
Shalin Shekhar Mangar.


TrieField -> NumericField ?

2009-10-20 Thread Chris Hostetter


I just realized we still have "o.a.s.schema.Trie*Field" classes in Solr
but Lucene switched to using "NumericField" ... should we convert the Solr 
class names prior to 1.4?




-Hoss



[jira] Commented: (SOLR-1514) Facet search results contain 0:0 entries although '0' values were not indexed.

2009-10-20 Thread Hoss Man (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12767890#action_12767890
 ] 

Hoss Man commented on SOLR-1514:


Can you provide a JUnit test case, or a schema.xml + some sample docs that 
reproduce this behavior?
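
(For reference, a request of the shape under discussion, using standard facet
parameters against the example server; the field name and values come from the
report below:

http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=year&facet.mincount=1

Setting facet.mincount=1 suppresses the zero-count entries, as the reporter
notes.)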

> Facet search results contain 0:0 entries although '0' values were not indexed.
> --
>
> Key: SOLR-1514
> URL: https://issues.apache.org/jira/browse/SOLR-1514
> Project: Solr
>  Issue Type: Bug
>  Components: search
>Affects Versions: 1.3
> Environment: Solr is on: Linux  2.6.18-92.1.13.el5xen
>Reporter: Renata Perkowska
>
> Hi,
> in my JMeter ATs I can see that under some circumstances facet search
> results contain '0' both as keys
> and values for the integer field called 'year' although I never index zeros. 
> When I do a normal search, I don't see any indexed fields with zeros. 
> When I run my facet test (using JMeter) in isolation, everything works fine. 
> It happens only when it's being run after other tests
> (and other indexing/deleting). On the other hand, it shouldn't be the case
> that other indexing is influencing this test, as at the end of each test I'm
> deleting the indexed documents, so before running the facet test the index
> is empty.
> My facet test looks as follows:
>  1. Index group of documents
>  2. Perform search on facets
>  3. Remove documents from the index.
> The results that I'm getting for an integer field 'year':
>  1990:4
>  1995:4
>  0:0
>  1991:0
>  1992:0
>  1993:0
>  1994:0
>  1996:0
>  1997:0
>  1998:0
> I'm indexing only values 1990-1999, so there certainly shouldn't be any '0'  
> as keys in the result set.
> The index is being optimized not after each document deletion, but only
> when an index is loaded/unloaded, so the optimization won't solve the
> problem in this case.
> If the facet.mincount>0 is provided, then  I'm not getting 0:0, but other 
> entries with '0' values are gone as well:
> 1990:4
> 1995:4
> I'm also indexing text fields, but I don't see a similar situation in this 
> case. This bug only happens for integer fields.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



RE: Where to free Tokenizer resources?

2009-10-20 Thread Teruhiko Kurosaka

Yonik,

> If you really want to release/acquire your resources each 
> time the tokenizer is used, then release it in the close() 
> and acquire in the reset().  There is no "done with this 
> forever" callback.

I wanted to avoid that because acquiring this resource
is a relatively expensive operation.  I wanted to do
that per instance.  I guess I should lobby Lucene folks 
and ask them to consider adding a new method to do so.

Is my guess that Solr calls Tokenizer.close() more than
once correct? My observation of the behavior suggests
it, but I couldn't find concrete evidence in the source.


> 
> -Yonik
> http://www.lucidimagination.com
> 
> On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka 
>  wrote:
> > Hi,
> > I have my own Tokenizer that was working with Solr 1.3 fine 
> but threw an Exception when used with Solr 1.4 dev.
> >
> > This Tokenizer uses some JNI-side resources that it takes 
> in the constructor and it frees it in close().
> >
> > The behavior seems to indicate that Solr 1.4 calls close() 
> then reset(Reader) in order to reuse the Tokenizer.  But my 
> Tokenizer threw an Exception because its resource has been 
> freed already. My temporary fix was to move the resource 
> release code from close() to finalize().  But I'm not very 
> happy with it because the timing of resource release is up to 
> the garbage collector.
> >
> > Question #1: Is close() supposed to be called more than 
> once? To me, 
> > close() should be called only once at the end of life cycle of the 
> > Tokenizer.  (The old reader should be closed when reset(Reader) is
> > called.)
> >
> > If the answer is Yes, then
> >
> > Question #2: Is there any better place to release the 
> internal resource than in finalize()?
> >
> > Thank you.
> >
> > T. "Kuro" Kurosaka
> >
> >
> 

Re: Where to free Tokenizer resources?

2009-10-20 Thread Yonik Seeley
If you really want to release/acquire your resources each time the
tokenizer is used, then release it in the close() and acquire in the
reset().  There is no "done with this forever" callback.
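
A sketch of that pattern, with a hypothetical NativeResource standing in for
the expensive JNI-side handle; the actual tokenization methods are omitted:

import java.io.IOException;
import java.io.Reader;
import org.apache.lucene.analysis.Tokenizer;

public class MyTokenizer extends Tokenizer {
    private NativeResource resource;  // hypothetical JNI-side handle

    public MyTokenizer(Reader input) {
        super(input);
        resource = NativeResource.acquire();
    }

    public void reset(Reader input) throws IOException {
        super.reset(input);
        if (resource == null) {
            resource = NativeResource.acquire();  // re-acquire on reuse
        }
    }

    public void close() throws IOException {
        super.close();
        if (resource != null) {
            resource.release();
            resource = null;  // tolerate close() being called more than once
        }
    }

    // ... incrementToken() and friends omitted ...
}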

-Yonik
http://www.lucidimagination.com

On Tue, Oct 20, 2009 at 12:25 PM, Teruhiko Kurosaka  wrote:
> Hi,
> I have my own Tokenizer that was working with Solr 1.3 fine but threw an 
> Exception when used with Solr 1.4 dev.
>
> This Tokenizer uses some JNI-side resources that it takes in the constructor 
> and it frees it in close().
>
> The behavior seems to indicate that Solr 1.4 calls close() then reset(Reader) 
> in order to reuse the Tokenizer.  But my Tokenizer threw an Exception because 
> its resource has been freed already. My temporary fix was to move the 
> resource release code from close() to finalize().  But I'm not very happy 
> with it because the timing of resource release is up to the garbage collector.
>
> Question #1: Is close() supposed to be called more than once? To me, close() 
> should be called only once at the end of life cycle of the Tokenizer.  (The 
> old reader should be closed when reset(Reader) is called.)
>
> If the answer is Yes, then
>
> Question #2: Is there any better place to release the internal resource than 
> in finalize()?
>
> Thank you.
>
> T. "Kuro" Kurosaka
>
>


Where to free Tokenizer resources?

2009-10-20 Thread Teruhiko Kurosaka
Hi,
I have my own Tokenizer that was working with Solr 1.3 fine but threw an 
Exception when used with Solr 1.4 dev.

This Tokenizer uses some JNI-side resources that it takes in the constructor 
and it frees it in close().

The behavior seems to indicate that Solr 1.4 calls close() then reset(Reader) 
in order to reuse the Tokenizer.  But my Tokenizer threw an Exception because 
its resource has been freed already. My temporary fix was to move the resource 
release code from close() to finalize().  But I'm not very happy with it 
because the timing of resource release is up to the garbage collector.

Question #1: Is close() supposed to be called more than once? To me, close() 
should be called only once at the end of life cycle of the Tokenizer.  (The old 
reader should be closed when reset(Reader) is called.)

If the answer is Yes, then

Question #2: Is there any better place to release the internal resource than in 
finalize()?

Thank you.

T. "Kuro" Kurosaka



Re: maxClauseCount in solrconfig.xml

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 10:53 AM, Mark Miller  wrote:
> Any objections to sneaking into 1.4?

Nope - do it quick!

-Yonik
http://www.lucidimagination.com


Re: maxClauseCount in solrconfig.xml

2009-10-20 Thread Mark Miller
Mark Miller wrote:
> Yonik Seeley wrote:
>   
>> On Tue, Oct 20, 2009 at 9:06 AM, Mark Miller  wrote:
>>   
>> 
>>>
>>>    <maxBooleanClauses>1024</maxBooleanClauses>
>>>
>>> Anyone think we should clarify that? The built-in multiterm queries are
>>> constant score now, so it's a bit misleading.
>>> 
>>>   
>> Hmmm, yep... out of date.
>>
>>   
>> 
>>> Also, why are we using
>>>
>>> prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE);
>>> 
>>>   
>> I dunno - ask the guy who made the change ;-)
>> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/search/SolrQueryParser.java?revision=801872&view=markup
>>
>> -Yonik
>> http://www.lucidimagination.com
>>   
>> 
> Heh - I suspected it was me - but I think I made them before AUTO was
> available. Just didn't want to flip them now without bringing it up first :)
>
>   
Any objections to sneaking into 1.4?

-- 
- Mark

http://www.lucidimagination.com





Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Bill Au
+1 for making soft references an available option via configuration, keeping
the current behavior as the default.

Bill

2009/10/20 Noble Paul നോബിള്‍ नोब्ळ् 

> On Tue, Oct 20, 2009 at 6:07 PM, Mark Miller 
> wrote:
>
> > I'm +1 obviously ;) No one is talking about making it the default. And I
> > think it's well known that soft value caches can be a valid choice -
> > that's why google has one in their collections here ;) It's a nice way to
> > let your cache grow and shrink based on the available RAM. It's not
> > always the right choice, but sure is a nice option. And it doesn't have
> > much to do with Lucene's FieldCaches. The main reason for a soft value
> > cache is not to avoid OOM. Set your cache sizes correctly for that. And
> > even if it was to avoid OOM, who cares if something else causes more of
> > them? That's like not fixing a bug in a piece of code because another
> > piece of code has more bugs. Anyway, their purpose is to allow the cache
> > to size depending on the available free RAM IMO.
> >
> +1
>
> >
> > Noble Paul നോബിള്‍ नोब्ळ् wrote:
> > > So, is everyone now in favor of this feature? Who has a -1 on this,
> > > and what is the concern?
> > >
> > > On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller 
> > wrote:
> > >
> > >
> > >> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <
> > >> shalinman...@gmail.com> wrote:
> > >>
> > >>  I don't think the debate is about weak reference vs. soft references.
> > >>
> > >> There appears to be confusion between the two here no matter what the
> > >> debate - soft references are for caching, weak references are not so
> > much.
> > >> Getting it right is important.
> > >>
> > >>  I
> > >>
> > >>> guess the point that Lance is making is that using such a technique
> > will
> > >>> make application performance less predictable. There's also a good
> > chance
> > >>> that a soft reference based cache will cause cache thrashing and will
> > hide
> > >>> OOMs caused by inadequate cache sizes. So basically we trade an OOM
> for
> > >>> more
> > >>> CPU usage (due to re-computation of results).
> > >>>
> > >>>
> > >> That's the whole point. You're not hiding anything. I don't follow you.
> > >>
> > >>
> > >>
> > >>
> > >>> Personally, I think giving an option is fine. What if the user does
> not
> > >>> have
> > >>> enough RAM and he is willing to pay the price? Right now, there is no
> > way
> > >>> he
> > >>> can do that at all. However, the most frequent reason behind OOMs is
> > not
> > >>> having enough RAM to create the field caches and not Solr caches, so
> > I'm
> > >>> not
> > >>> sure how important this is.
> > >>>
> > >>>
> > >> How important is any feature? You don't have a use for it, so it's not
> > >> important to you - someone else does so it is important to them. Soft
> > value
> > >> caches can be useful.
> > >>
> > >>
> > >>
> > >>
> > >>> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller 
> > >>> wrote:
> > >>>
> > >>>  There is a difference - weak references are not for very good for
> > caches
> > >>>
> >  -
> >  soft references (soft values here) are good for caches in most jvms.
> > They
> >  can be very nice. Weak refs are eagerly reclaimed - it's suggested
> > that
> >  impls should not eagerly reclaim soft refs.
> > 
> >  - Mark
> > 
> >  http://www.lucidimagination.com (mobile)
> > 
> > 
> >  On Oct 19, 2009, at 8:22 PM, Lance Norskog 
> wrote:
> > 
> >  "Soft references" then. "Weak pointers" is an older term. (They're
> > 
> > 
> > > "weak" because some bully can steal their candy.)
> > >
> > > On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen
> > >  wrote:
> > >
> > >  Lance,
> > >
> > >> Do you mean soft references?
> > >>
> > >> On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog  >
> > >> wrote:
> > >>
> > >>  -1 for weak references in caching.
> > >>
> > >>> This makes memory management less deterministic (predictable) and
> > at
> > >>> peak can cause cache-thrashing. In other words, the worst case
> gets
> > >>> even more worse. When designing a system I want predictability
> and
> > I
> > >>> want to control the worst case, because system meltdowns are
> caused
> > by
> > >>> the worst case. Having thousands of small weak references does
> the
> > >>> opposite.
> > >>>
> > >>> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) <
> > j...@apache.org>
> > >>> wrote:
> > >>>
> > >>>
> > >>>
> >   [
> > 
> > 
> >
> https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864
> >  ]
> > 
> >  Noble Paul commented on SOLR-1513:
> >  --
> > 
> >  bq.Google Collections is already checked in as a dependency of
> > Carrot
> >  clustering.
> > 
> >  in that e 

Re: maxClauseCount in solrconfig.xml

2009-10-20 Thread Mark Miller
Yonik Seeley wrote:
> On Tue, Oct 20, 2009 at 9:06 AM, Mark Miller  wrote:
>   
>>
>>1024
>>
>> Anyone think we should clarify that? The built-in multiterm queries are
>> constant score now, so it's a bit misleading.
>> 
>
> Hmmm, yep... out of date.
>
>   
>> Also, why are we using
>>
>> prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE);
>> 
>
> I dunno - ask the guy who made the change ;-)
> http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/search/SolrQueryParser.java?revision=801872&view=markup
>
> -Yonik
> http://www.lucidimagination.com
>   
Heh - I suspected it was me - but I think I made them before AUTO was
available. Just didn't want to flip them now without bringing it up first :)

-- 
- Mark

http://www.lucidimagination.com





Re: maxClauseCount in solrconfig.xml

2009-10-20 Thread Yonik Seeley
On Tue, Oct 20, 2009 at 9:06 AM, Mark Miller  wrote:
>    
>    <maxBooleanClauses>1024</maxBooleanClauses>
>
> Anyone think we should clarify that? The built-in multiterm queries are
> constant score now, so it's a bit misleading.

Hmmm, yep... out of date.

> Also, why are we using
>
> prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE);

I dunno - ask the guy who made the change ;-)
http://svn.apache.org/viewvc/lucene/solr/trunk/src/java/org/apache/solr/search/SolrQueryParser.java?revision=801872&view=markup

-Yonik
http://www.lucidimagination.com


Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Tue, Oct 20, 2009 at 6:07 PM, Mark Miller  wrote:

> I'm +1 obviously ;) No one is talking about making it the default. And I
> think it's well known that soft value caches can be a valid choice -
> that's why google has one in their collections here ;) It's a nice way to
> let your cache grow and shrink based on the available RAM. It's not
> always the right choice, but sure is a nice option. And it doesn't have
> much to do with Lucene's FieldCaches. The main reason for a soft value
> cache is not to avoid OOM. Set your cache sizes correctly for that. And
> even if it was to avoid OOM, who cares if something else causes more of
> them? That's like not fixing a bug in a piece of code because another
> piece of code has more bugs. Anyway, their purpose is to allow the cache
> to size depending on the available free RAM IMO.
>
+1

>
> Noble Paul നോബിള്‍ नोब्ळ् wrote:
> > So, is everyone now in favor of this feature? Who has a -1 on this, and
> > what is the concern?
> >
> > On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller 
> wrote:
> >
> >
> >> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <
> >> shalinman...@gmail.com> wrote:
> >>
> >>  I don't think the debate is about weak reference vs. soft references.
> >>
> >> There appears to be confusion between the two here no matter what the
> >> debate - soft references are for caching, weak references are not so
> much.
> >> Getting it right is important.
> >>
> >>  I
> >>
> >>> guess the point that Lance is making is that using such a technique
> will
> >>> make application performance less predictable. There's also a good
> chance
> >>> that a soft reference based cache will cause cache thrashing and will
> hide
> >>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
> >>> more
> >>> CPU usage (due to re-computation of results).
> >>>
> >>>
> >> That's the whole point. You're not hiding anything. I don't follow you.
> >>
> >>
> >>
> >>
> >>> Personally, I think giving an option is fine. What if the user does not
> >>> have
> >>> enough RAM and he is willing to pay the price? Right now, there is no
> way
> >>> he
> >>> can do that at all. However, the most frequent reason behind OOMs is
> not
> >>> having enough RAM to create the field caches and not Solr caches, so
> I'm
> >>> not
> >>> sure how important this is.
> >>>
> >>>
> >> How important is any feature? You don't have a use for it, so it's not
> >> important to you - someone else does so it is important to them. Soft
> value
> >> caches can be useful.
> >>
> >>
> >>
> >>
> >>> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller 
> >>> wrote:
> >>>
> >>>  There is a difference - weak references are not for very good for
> caches
> >>>
>  -
>  soft references (soft values here) are good for caches in most jvms.
> They
>  can be very nice. Weak refs are eagerly reclaimed - it's suggested
> that
>  impls should not eagerly reclaim soft refs.
> 
>  - Mark
> 
>  http://www.lucidimagination.com (mobile)
> 
> 
>  On Oct 19, 2009, at 8:22 PM, Lance Norskog  wrote:
> 
>  "Soft references" then. "Weak pointers" is an older term. (They're
> 
> 
> > "weak" because some bully can steal their candy.)
> >
> > On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen
> >  wrote:
> >
> >  Lance,
> >
> >> Do you mean soft references?
> >>
> >> On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog 
> >> wrote:
> >>
> >>  -1 for weak references in caching.
> >>
> >>> This makes memory management less deterministic (predictable) and
> at
> >>> peak can cause cache-thrashing. In other words, the worst case gets
> >>> even more worse. When designing a system I want predictability and
> I
> >>> want to control the worst case, because system meltdowns are caused
> by
> >>> the worst case. Having thousands of small weak references does the
> >>> opposite.
> >>>
> >>> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) <
> j...@apache.org>
> >>> wrote:
> >>>
> >>>
> >>>
>   [
> 
> 
> https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864
>  ]
> 
>  Noble Paul commented on SOLR-1513:
>  --
> 
>  bq.Google Collections is already checked in as a dependency of
> Carrot
>  clustering.
> 
>  in that case we need to move it to core.
> 
>  Jason . We do not need to remove the original option. We can
> probably
>  add an extra parameter say softRef="true" or something. That way ,
> we
>  are
>  not screwing up anything and perf benefits can be studied
> separately.
> 
> 
>  Use Google Collections in ConcurrentLRUCache
> 
> 
> > 

Re: maxClauseCount in solrconfig.xml

2009-10-20 Thread Mark Miller
Mark Miller wrote:
> 
> <maxBooleanClauses>1024</maxBooleanClauses>
>
> Anyone think we should clarify that? The built-in multiterm queries are
> constant score now, so it's a bit misleading.
>
> Also, why are we using
>
>
> prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE);
>
> Don't we want to use AUTO for the multi-term queries? It's essentially
> the same but with better performance for low term counts?
>
>   
In fact, range query is using auto - it almost doesn't make sense not to
use it for wildcard and prefix as well ...

-- 
- Mark

http://www.lucidimagination.com





maxClauseCount in solrconfig.xml

2009-10-20 Thread Mark Miller

<maxBooleanClauses>1024</maxBooleanClauses>

Anyone think we should clarify that? The built-in multiterm queries are
constant score now, so it's a bit misleading.

Also, why are we using

   
prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_FILTER_REWRITE);

Don't we want to use AUTO for the multi-term queries? It's essentially
the same but with better performance for low term counts?
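
For the record, the change under discussion is roughly the line below, using
the rewrite-method constants Lucene 2.9 provides:

// Let Lucene choose between a constant-score filter and a BooleanQuery
// rewrite based on the number of matching terms, instead of always
// building a filter:
prefixQuery.setRewriteMethod(MultiTermQuery.CONSTANT_SCORE_AUTO_REWRITE_DEFAULT);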

-- 
- Mark

http://www.lucidimagination.com





Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Mark Miller
I'm +1 obviously ;) No one is talking about making it the default. And I
think it's well known that soft value caches can be a valid choice -
that's why google has one in their collections here ;) It's a nice way to
let your cache grow and shrink based on the available RAM. It's not
always the right choice, but sure is a nice option. And it doesn't have
much to do with Lucene's FieldCaches. The main reason for a soft value
cache is not to avoid OOM. Set your cache sizes correctly for that. And
even if it was to avoid OOM, who cares if something else causes more of
them? That's like not fixing a bug in a piece of code because another
piece of code has more bugs. Anyway, their purpose is to allow the cache
to size depending on the available free RAM IMO.

Noble Paul നോബിള്‍ नोब्ळ् wrote:
> So, is everyone now in favor of this feature? Who has a -1 on this, and
> what is the concern?
>
> On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller  wrote:
>
>   
>> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <
>> shalinman...@gmail.com> wrote:
>>
>>  I don't think the debate is about weak reference vs. soft references.
>> 
>> There appears to be confusion between the two here no matter what the
>> debate - soft references are for caching, weak references are not so much.
>> Getting it right is important.
>>
>>  I
>> 
>>> guess the point that Lance is making is that using such a technique will
>>> make application performance less predictable. There's also a good chance
>>> that a soft reference based cache will cause cache thrashing and will hide
>>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
>>> more
>>> CPU usage (due to re-computation of results).
>>>
>>>   
>> That's the whole point. You're not hiding anything. I don't follow you.
>>
>>
>>
>> 
>>> Personally, I think giving an option is fine. What if the user does not
>>> have
>>> enough RAM and he is willing to pay the price? Right now, there is no way
>>> he
>>> can do that at all. However, the most frequent reason behind OOMs is not
>>> having enough RAM to create the field caches and not Solr caches, so I'm
>>> not
>>> sure how important this is.
>>>
>>>   
>> How important is any feature? You don't have a use for it, so it's not
>> important to you - someone else does so it is important to them. Soft value
>> caches can be useful.
>>
>>
>>
>> 
>>> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller 
>>> wrote:
>>>
>>>  There is a difference - weak references are not for very good for caches
>>>   
 -
 soft references (soft values here) are good for caches in most jvms. They
 can be very nice. Weak refs are eagerly reclaimed - it's suggested that
 impls should not eagerly reclaim soft refs.

 - Mark

 http://www.lucidimagination.com (mobile)


 On Oct 19, 2009, at 8:22 PM, Lance Norskog  wrote:

 "Soft references" then. "Weak pointers" is an older term. (They're

 
> "weak" because some bully can steal their candy.)
>
> On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen
>  wrote:
>
>  Lance,
>   
>> Do you mean soft references?
>>
>> On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog 
>> wrote:
>>
>>  -1 for weak references in caching.
>> 
>>> This makes memory management less deterministic (predictable) and at
>>> peak can cause cache-thrashing. In other words, the worst case gets
>>> even worse. When designing a system I want predictability and I
>>> want to control the worst case, because system meltdowns are caused by
>>> the worst case. Having thousands of small weak references does the
>>> opposite.
>>>
>>> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) 
>>> wrote:
>>>
>>>
>>>   
  [

 https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864
 ]

 Noble Paul commented on SOLR-1513:
 --

 bq.Google Collections is already checked in as a dependency of Carrot
 clustering.

 in that e need to move it to core.

 Jason . We do not need to remove the original option. We can probably
 add an extra parameter say softRef="true" or something. That way , we
 are
 not screwing up anything and perf benefits can be studied separately.


 Use Google Collections in ConcurrentLRUCache

 
> 
>
>  Key: SOLR-1513
>  URL: https://issues.apache.org/jira/browse/SOLR-1513
>  Project: Solr
>   Issue Type: Improvement
>   Components: search
>  Affects Versions: 1.4
>>

[jira] Resolved: (SOLR-1099) FieldAnalysisRequestHandler

2009-10-20 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1099?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi resolved SOLR-1099.
--

Resolution: Fixed

Committed revision 827032. Thanks.

> FieldAnalysisRequestHandler
> ---
>
> Key: SOLR-1099
> URL: https://issues.apache.org/jira/browse/SOLR-1099
> Project: Solr
>  Issue Type: New Feature
>  Components: Analysis
>Affects Versions: 1.3
>Reporter: Uri Boness
>Assignee: Koji Sekiguchi
> Fix For: 1.4
>
> Attachments: AnalisysRequestHandler_refactored.patch, 
> analysis_request_handlers_incl_solrj.patch, 
> AnalysisRequestHandler_refactored1.patch, 
> FieldAnalysisRequestHandler_incl_test.patch, 
> SOLR-1099-ordered-TokenizerChain.patch, SOLR-1099.patch, SOLR-1099.patch, 
> SOLR-1099.patch
>
>
> The FieldAnalysisRequestHandler provides the analysis functionality of the 
> web admin page as a service. This handler accepts a filetype/fieldname 
> parameter and a value and as a response returns a breakdown of the analysis 
> process. It is also possible to send a query value which will use the 
> configured query analyzer as well as a showmatch parameter which will then 
> mark every matched token as a match.
> If this handler is added to the code base, I also recommend renaming the 
> current AnalysisRequestHandler to DocumentAnalysisRequestHandler and having 
> them both inherit from one AnalysisRequestHandlerBase class which provides 
> the common functionality of the analysis breakdown and its translation to 
> named lists. This will also enhance the current AnalysisRequestHandler which 
> right now is fairly simplistic.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Noble Paul നോബിള്‍ नोब्ळ्
So, is everyone now in favor of this feature? Who has a -1 on this, and
what is the concern?

On Tue, Oct 20, 2009 at 3:56 PM, Mark Miller  wrote:

>
>
> On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
>  I don't think the debate is about weak reference vs. soft references.
>>
>
> There appears to be confusion between the two here no matter what the
> debate - soft references are for caching, weak references are not so much.
> Getting it right is important.
>
>  I
>> guess the point that Lance is making is that using such a technique will
>> make application performance less predictable. There's also a good chance
>> that a soft reference based cache will cause cache thrashing and will hide
>> OOMs caused by inadequate cache sizes. So basically we trade an OOM for
>> more
>> CPU usage (due to re-computation of results).
>>
>
> That's the whole point. You're not hiding anything. I don't follow you.
>
>
>
>> Personally, I think giving an option is fine. What if the user does not
>> have
>> enough RAM and he is willing to pay the price? Right now, there is no way
>> he
>> can do that at all. However, the most frequent reason behind OOMs is not
>> having enough RAM to create the field caches and not Solr caches, so I'm
>> not
>> sure how important this is.
>>
>
> How important is any feature? You don't have a use for it, so it's not
> important to you - someone else does so it is important to them. Soft value
> caches can be useful.
>
>
>
>> On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller 
>> wrote:
>>
>>  There is a difference - weak references are not for very good for caches
>>> -
>>> soft references (soft values here) are good for caches in most jvms. They
>>> can be very nice. Weak refs are eagerly reclaimed - it's suggested that
>>> impls should not eagerly reclaim soft refs.
>>>
>>> - Mark
>>>
>>> http://www.lucidimagination.com (mobile)
>>>
>>>
>>> On Oct 19, 2009, at 8:22 PM, Lance Norskog  wrote:
>>>
>>> "Soft references" then. "Weak pointers" is an older term. (They're
>>>
 "weak" because some bully can steal their candy.)

 On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen
  wrote:

  Lance,
>
> Do you mean soft references?
>
> On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog 
> wrote:
>
>  -1 for weak references in caching.
>>
>> This makes memory management less deterministic (predictable) and at
>> peak can cause cache-thrashing. In other words, the worst case gets
>> even worse. When designing a system I want predictability and I
>> want to control the worst case, because system meltdowns are caused by
>> the worst case. Having thousands of small weak references does the
>> opposite.
>>
>> On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) 
>> wrote:
>>
>>
>>>  [
>>>
>>> https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864
>>> ]
>>>
>>> Noble Paul commented on SOLR-1513:
>>> --
>>>
>>> bq.Google Collections is already checked in as a dependency of Carrot
>>> clustering.
>>>
>>> in that case we need to move it to core.
>>>
>>> Jason . We do not need to remove the original option. We can probably
>>> add an extra parameter say softRef="true" or something. That way , we
>>> are
>>> not screwing up anything and perf benefits can be studied separately.
>>>
>>>
>>> Use Google Collections in ConcurrentLRUCache
>>>
 

  Key: SOLR-1513
  URL: https://issues.apache.org/jira/browse/SOLR-1513
  Project: Solr
   Issue Type: Improvement
   Components: search
  Affects Versions: 1.4
 Reporter: Jason Rutherglen
 Priority: Minor
  Fix For: 1.5

  Attachments: google-collect-snapshot.jar, SOLR-1513.patch


 ConcurrentHashMap is used in ConcurrentLRUCache.  The Google
 Collections concurrent map implementation allows for soft values that
 are
 great for caches that potentially exceed the allocated heap.  Though
 I
 suppose Solr caches usually don't use too much RAM?
 http://code.google.com/p/google-collections/


>>> --
>>> This message is automatically generated by JIRA.
>>> -
>>> You can reply to this email to add a comment to the issue online.
>>>
>>>
>>>
>>>
>>
>> --
>> Lance Norskog
>> goks...@gmail.com
>>
>>
>>
>

 --
 Lance Norskog
 goks...@gmail.com


>>>
>>
>> --
>> Regards,
>> Shalin Shekhar Mangar.
>>
>


-- 
-
Noble 

Re: [jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-20 Thread Mark Miller



On Oct 20, 2009, at 12:12 AM, Shalin Shekhar Mangar wrote:



I don't think the debate is about weak reference vs. soft references.


There appears to be confusion between the two here no matter what the  
debate - soft references are for caching, weak references are not so
much. Getting it right is important.



I
guess the point that Lance is making is that using such a technique  
will
make application performance less predictable. There's also a good  
chance
that a soft reference based cache will cause cache thrashing and  
will hide
OOMs caused by inadequate cache sizes. So basically we trade an OOM  
for more

CPU usage (due to re-computation of results).


That's the whole point. You're not hiding anything. I don't follow you.




Personally, I think giving an option is fine. What if the user does  
not have
enough RAM and he is willing to pay the price? Right now, there is  
no way he
can do that at all. However, the most frequent reason behind OOMs is  
not
having enough RAM to create the field caches and not Solr caches, so  
I'm not

sure how important this is.


How important is any feature? You don't have a use for it, so it's not  
important to you - someone else does so it is important to them. Soft  
value caches can be useful.




On Tue, Oct 20, 2009 at 8:41 AM, Mark Miller   
wrote:


There is a difference - weak references are not for very good for  
caches -
soft references (soft values here) are good for caches in most  
jvms. They
can be very nice. Weak refs are eagerly reclaimed - it's suggested  
that

impls should not eagerly reclaim soft refs.

- Mark

http://www.lucidimagination.com (mobile)


On Oct 19, 2009, at 8:22 PM, Lance Norskog  wrote:

"Soft references" then. "Weak pointers" is an older term. (They're

"weak" because some bully can steal their candy.)

On Sun, Oct 18, 2009 at 8:37 PM, Jason Rutherglen
 wrote:


Lance,

Do you mean soft references?

On Sun, Oct 18, 2009 at 3:59 PM, Lance Norskog 
wrote:


-1 for weak references in caching.

This makes memory management less deterministic (predictable) and at
peak can cause cache-thrashing. In other words, the worst case gets
even worse. When designing a system I want predictability and I
want to control the worst case, because system meltdowns are caused by
the worst case. Having thousands of small weak references does the
opposite.
opposite.

On Sat, Oct 17, 2009 at 2:00 AM, Noble Paul (JIRA) wrote:



 [
https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766864#action_12766864
]

Noble Paul commented on SOLR-1513:
--

bq. Google Collections is already checked in as a dependency of Carrot
clustering.

in that case we need to move it to core.

Jason, we do not need to remove the original option. We can probably
add an extra parameter, say softRef="true" or something. That way, we are
not screwing up anything and perf benefits can be studied separately.



Use Google Collections in ConcurrentLRUCache



  Key: SOLR-1513
  URL: https://issues.apache.org/jira/browse/SOLR-1513
  Project: Solr
   Issue Type: Improvement
   Components: search
 Affects Versions: 1.4
 Reporter: Jason Rutherglen
 Priority: Minor
  Fix For: 1.5

  Attachments: google-collect-snapshot.jar, SOLR-1513.patch


ConcurrentHashMap is used in ConcurrentLRUCache.  The Google
Collections concurrent map implementation allows for soft
values that are
great for caches that potentially exceed the allocated heap.   
Though I

suppose Solr caches usually don't use too much RAM?
http://code.google.com/p/google-collections/



--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.






--
Lance Norskog
goks...@gmail.com







--
Lance Norskog
goks...@gmail.com






--
Regards,
Shalin Shekhar Mangar.