SolrIndexWriter holding reference to deleted file?

2007-12-20 Thread amamare
I have an application consisting of three web applications running on JBoss 1.4.2 on a Red Hat Linux server. I'm using Solr/Lucene in embedded mode to create and maintain a frequently updated index. Once updated, the index is copied to another directory used for searching. Old index files in the search

Re: SolrIndexWriter holding reference to deleted file?

2007-12-20 Thread Yonik Seeley
This is probably related to using Solr/Lucene embedded. See the warning at the top of http://wiki.apache.org/solr/EmbeddedSolr It does sound like your SolrIndexSearcher objects aren't being closed. Solr (via SolrCore) doesn't rely on garbage collection to close the searchers (since gc
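A minimal sketch of the behavior behind this thread's subject, assuming a POSIX filesystem such as the Linux server described above: a file deleted while a handle is still open stays alive until that handle is closed. This is plain Python, not Solr code; the function name and the file name are hypothetical, and the open handle only stands in for a searcher that was never closed.

```python
import os
import tempfile
from typing import Tuple


def read_after_delete(data: bytes) -> Tuple[bool, bytes]:
    """Delete a file while it is open, then read it through the open handle.

    Returns (path_still_exists, bytes_read_after_delete).
    Assumes POSIX semantics; on Windows the delete would fail instead.
    """
    directory = tempfile.mkdtemp()
    path = os.path.join(directory, "segment.dat")  # hypothetical file name
    with open(path, "wb") as out:
        out.write(data)

    handle = open(path, "rb")  # stands in for a searcher that is never closed
    try:
        os.remove(path)  # the directory entry is gone...
        exists = os.path.exists(path)
        content = handle.read()  # ...but the data is still reachable
    finally:
        handle.close()  # only now can the OS reclaim the disk space
        os.rmdir(directory)
    return exists, content
```

Closing the handle explicitly, rather than waiting for garbage collection, is what releases the deleted file.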

Re: Retrieving Tokens

2007-12-20 Thread Erick Erickson
I think that what Yonik wants is a higher-level response. *Why* do you want to process the tokens later? What is the use case you're trying to satisfy? Best Erick On Dec 20, 2007 1:37 AM, Rishabh Joshi [EMAIL PROTECTED] wrote: What are you trying to do with the tokens? Yonik, we wanted a

Re: Making stemming dynamic at query time

2007-12-20 Thread Otis Gospodnetic
Kamran, I think Bertrand's suggestion is the only possible solution. I can't think of a way you can not stem at index time and make it an option at search time. If you look at and understand the low-level/basic indexing and term-matching process, I think you'll see why this seems impossible.

Re: debugging slowness

2007-12-20 Thread Otis Gospodnetic
Brian (moving to solr-user), Sounds like GC to me. That is, the JVM not having a large enough heap. Run jconsole and you'll quickly see if this guess is correct or not (kill -QUIT is also your friend, believe it or not). We recently had somebody who had a nice little Solr spellchecker instance

Re: debugging slowness

2007-12-20 Thread Brian Whitman
On Dec 20, 2007, at 11:02 AM, Otis Gospodnetic wrote: Sounds like GC to me. That is, the JVM not having a large enough heap. Run jconsole and you'll quickly see if this guess is correct or not (kill -QUIT is also your friend, believe it or not). We recently had somebody who had a nice

RE: Making stemming dynamic at query time

2007-12-20 Thread Steven A Rowe
I can think of a way to not store stems in the index, but to gain the benefit from stemming, i.e. improved recall: expand the query to include all index terms that share stems with the original query terms. Here's one way to achieve this: - When indexing, run all terms through the stemmer, and
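The query-expansion idea above can be sketched roughly as follows. The message is cut off after its first step, so the stem-to-terms map used here is an assumption about how the expansion could be built, not the author's stated method. `toy_stem`, `build_stem_map`, and `expand_query` are hypothetical names, and the stemmer is a deliberately crude stand-in for a real one.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ning", "ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def build_stem_map(index_terms: Iterable[str]) -> Dict[str, Set[str]]:
    """At index time, record which unstemmed index terms share each stem."""
    stem_map: Dict[str, Set[str]] = defaultdict(set)
    for term in index_terms:
        stem_map[toy_stem(term)].add(term)
    return stem_map


def expand_query(query_terms: Iterable[str],
                 stem_map: Dict[str, Set[str]]) -> List[List[str]]:
    """At query time, replace each term with all index terms sharing its stem."""
    expanded = []
    for term in query_terms:
        variants = set(stem_map.get(toy_stem(term), set()))
        variants.add(term)  # always keep the original query term
        expanded.append(sorted(variants))
    return expanded
```

Each inner list would be OR-ed together in the final query, so the index itself only ever holds unstemmed terms.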

Re: debugging slowness

2007-12-20 Thread Ryan McKinley
Can't run jconsole, no X at the moment, if need be I'll install it though... You should be able to connect jconsole to a remote process (no need for X on the server) check: http://java.sun.com/j2se/1.5.0/docs/guide/management/

Successful project based on SOLR

2007-12-20 Thread Marius Hanganu
Hi guys, I just wanted to let you know our company has successfully launched a new high traffic website based on a powerful CMS built on top of SOLR. The website - http://www.hotnews.ro - serves up to 80k users per day with an average of 400K pages per day. It uses a custom hibernate-SOLR

RE: Successful project based on SOLR

2007-12-20 Thread Charlie Jackson
Congratulations! It uses a custom hibernate-SOLR bridge which allows transparent persistence of entities on different SOLR servers. Any chance of this code making its way back to the SOLR community? Or, if not, can you give me an idea how you did it? This seamless integration of Hibernate

Re: Making stemming dynamic at query time

2007-12-20 Thread Kamran Shadkhast
Kamran, I think Bertrand's suggestion is the only possible solution. I can't think of a way you can not stem at index time and make it an option at search time. If you look at and understand the low-level/basic indexing and term-matching process, I think you'll see why this seems impossible.

Re: Making stemming dynamic at query time

2007-12-20 Thread Erick Erickson
Well, you *still* have to store the stemmed and unstemmed version in your index, otherwise you can't distinguish between, say, run and running because you'd have indexed run both times. But you could think about using special tokenizing. That is, for a word that's stemmed, index a stem form.
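A rough sketch of the special-tokenizing idea, under stated assumptions: the message is truncated, so the choice to index a marked stem next to every original token, and the `$` marker itself, are hypothetical details filled in for illustration. `toy_stem`, `analyze_for_index`, `query_token`, and `matches` are made-up names, and the stemmer is a crude stand-in.

```python
from typing import List

STEM_MARKER = "$"  # assumed marker; any character that cannot occur in a token works


def toy_stem(term: str) -> str:
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ning", "ing", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def analyze_for_index(text: str) -> List[str]:
    """Emit each original token followed by its marked stem form."""
    tokens = []
    for word in text.lower().split():
        tokens.append(word)  # exact form, used by unstemmed queries
        tokens.append(toy_stem(word) + STEM_MARKER)  # marked stem, used by stemmed queries
    return tokens


def query_token(word: str, stemmed: bool) -> str:
    """Pick which indexed form a query term should match."""
    word = word.lower()
    return toy_stem(word) + STEM_MARKER if stemmed else word


def matches(doc_tokens: List[str], word: str, stemmed: bool) -> bool:
    """True if the document contains the form the query asks for."""
    return query_token(word, stemmed) in doc_tokens
```

Because the marked stems can never collide with raw tokens, an unstemmed query for "run" does not match a document that only contains "running", while a stemmed query does.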

Re: Successful project based on SOLR

2007-12-20 Thread Jonathan Ariel
What's the difference between that and Hibernate Search (http://www.hibernate.org/410.html)? On Dec 20, 2007 2:09 PM, Charlie Jackson [EMAIL PROTECTED] wrote: Congratulations! It uses a custom hibernate-SOLR bridge which allows transparent persistence of entities on different SOLR servers.

Re: Retrieving Tokens

2007-12-20 Thread Eswar K
Yonik/Erick, We are building a custom search that runs in two parts, executed at different points in time. As a result, in the first step we want to tokenize the information and store it, so that we can retrieve it at a later point in time for further processing and then store it back into

clearing solr write.lock

2007-12-20 Thread Kasi Sankaralingam
I am running into a problem where residual lock files from a previous run are left in the Solr data directory after a failed process. Can we programmatically and efficiently remove this .lock file? Also, has anyone externalized the handling of lock files (meaning, for example, keeping the lock file in a database)? Any
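One way to remove a leftover lock programmatically, as a minimal sketch only: it assumes the lock is a plain file named `write.lock` (taken from the thread subject) inside the index directory, and it is only safe when you are certain no other process is still writing to the index. The function name and the directory layout are hypothetical; this is not a Solr or Lucene API.

```python
import os


def clear_stale_lock(index_dir: str, lock_name: str = "write.lock") -> bool:
    """Remove a leftover lock file; return True if a lock was removed.

    Call this only before any index writer is opened, for example at
    application startup after a crash. Removing a lock that a live
    writer still holds can corrupt the index.
    """
    lock_path = os.path.join(index_dir, lock_name)
    try:
        os.remove(lock_path)
    except FileNotFoundError:
        return False  # nothing to clean up
    return True
```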

Re: clearing solr write.lock

2007-12-20 Thread Mike Klaas
On 20-Dec-07, at 11:24 AM, Kasi Sankaralingam wrote: I am running into a problem where residual lock files from a previous run are left in the Solr data directory after a failed process. Can we programmatically and efficiently remove this .lock file? Also, has anyone externalized the handling of lock files

RE: Successful project based on SOLR

2007-12-20 Thread Charlie Jackson
That's the first I've seen of Hibernate Search. Looks interesting, but I think it's a little different than what I was looking for. Since it indexes into Lucene, it's close, but I wouldn't have a bunch of my favorite Solr features, such as remote indexing and field-level analysis at index and

Re: Successful project based on SOLR

2007-12-20 Thread Jonathan Ariel
It could be really nice to have Solr support for Hibernate Search... good project to work on ;) I'll start digging into it. On Dec 20, 2007 5:28 PM, Charlie Jackson [EMAIL PROTECTED] wrote: That's the first I've seen of Hibernate Search. Looks interesting, but I think it's a little different

RE: Successful project based on SOLR

2007-12-20 Thread Charlie Jackson
Yeah I remember seeing that at one point when I was first looking at the solrj client. I had plans to build on it but I got pulled away on something else. Maybe it's time to take another look and see what I can do with it. As Jonathan said, it's a good project to work on. -Original

RE: clearing solr write.lock

2007-12-20 Thread Kasi Sankaralingam
Hi Mike, Thanks a lot. Where would this lock information go, and how do I set the lock timeout? Kasi -Original Message- From: Mike Klaas [mailto:[EMAIL PROTECTED] Sent: Thursday, December 20, 2007 12:19 PM To: solr-user@lucene.apache.org Subject: Re: clearing solr write.lock On

Re: solr and NFS in distributed deployment, real time indexing and real time searching

2007-12-20 Thread Erick Erickson
You might try searching the Lucene users list for NFS. I know there has been frequent discussion of locking issues etc. But since I'm not using an NFS mount, I just glossed over them. Also, my recollection is that many (most? all?) of the underlying issues have been dealt with with new versions

Re: solr and NFS in distributed deployment, real time indexing and real time searching

2007-12-20 Thread Ryan McKinley
If you can avoid NFS, that is much better. The Solr distribution scripts should help with this. I think recent Lucene changes have made it possible, but it will still be slower... Kasi Sankaralingam wrote: Has anyone done the above successfully without pulling hairs (stale NFS handle

maxBooleanClauses

2007-12-20 Thread Stu Hood
Hello, Is the 'maxBooleanClauses' setting just there for sanity checking, to protect me from my users? Thanks, Stu Hood

Re: get the fields of solr

2007-12-20 Thread Yonik Seeley
On Dec 20, 2007 8:47 PM, Edward Zhang [EMAIL PROTECTED] wrote: I tried it, but the QTime was beyond my tolerance. It takes about 53s on average with show=schema. That's probably because Luke tries to find the top terms for each field by default. Try passing in numTerms=0 -Yonik The index
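A small sketch of building such a request URL, assuming the Luke request handler is registered at `/admin/luke` as in the example configuration. The base URL, host, and port are hypothetical, and `luke_url` is a made-up helper name; only the `show` and `numTerms` parameters come from the thread.

```python
from urllib.parse import urlencode


def luke_url(base_url: str, show: str = "schema", num_terms: int = 0) -> str:
    """Build a Luke handler URL; num_terms=0 skips the per-field top-terms scan."""
    params = urlencode({"show": show, "numTerms": num_terms})
    return f"{base_url.rstrip('/')}/admin/luke?{params}"


# Hypothetical Solr base URL, for illustration only.
print(luke_url("http://localhost:8983/solr"))
```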

Re: get the fields of solr

2007-12-20 Thread Edward Zhang
Wow, thanks for Yonik's quick reply! :) That is what I want! I had only tried numTerms=500, so I overlooked the usefulness of numTerms. On 12/21/07, Yonik Seeley [EMAIL PROTECTED] wrote: On Dec 20, 2007 8:47 PM, Edward Zhang [EMAIL PROTECTED] wrote: I tried it, but the QTime was beyond my

Re: clearing solr write.lock

2007-12-20 Thread Mike Klaas
Hey Kasi, Take a look at the solr config file for the included example (example/solr/conf/solrconfig.xml). It is the canonical documentation. cheers, -Mike On 20-Dec-07, at 1:46 PM, Kasi Sankaralingam wrote: Hi Mike, Thanks a lot, where would this lock information go and also How do I

substituting a db

2007-12-20 Thread alexander lind
Hi List, I have a pretty big app in the works, and in short it will need to index a lot of items, with some core attributes, and hundreds of optional attributes for each item. The app then needs to be able to make queries like 'find all items with attributes attribute_1=yes,