RE: Solr replication

2008-01-16 Thread Dilip.TS
Hi Bill, I have some questions regarding the SOLR collection distribution. 1) Is it possible to add index operations on the slave server using SOLR collection distribution and still have the master server updated with these changes? 2) I have a requirement of having more than one solr

Problem with dismax handler when searching Solr along with field

2008-01-16 Thread farhanali
when i search a query, for example http://localhost:8983/solr/select/?q=category&qt=dismax, it gives results, but when i want to search on the basis of a field name, like http://localhost:8983/solr/select/?q=maincategory:Cars&qt=dismax, it does not give results however

Indexing two sets of details

2008-01-16 Thread Gavin
Hi, In the web application we are developing we have two sets of details: the personal details and the resume details. We allow 5 different resumes to be available for each user, but we want the personal details to remain the same across all 5 resumes. The problem is when personal details are

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Shalin Shekhar Mangar
Look at http://issues.apache.org/jira/browse/SOLR-303 Please note that it is still a work in progress, so you may not be able to use it immediately. On Jan 16, 2008 10:53 AM, Srikant Jakilinki [EMAIL PROTECTED] wrote: Hi All, There is a requirement in our group of indexing and searching

Re: Solr replication

2008-01-16 Thread Bill Au
my answers inline... On Jan 16, 2008 3:51 AM, Dilip.TS [EMAIL PROTECTED] wrote: Hi Bill, I have some questions regarding the SOLR collection distribution. 1) Is it possible to add index operations on the slave server using SOLR collection distribution and still the master server is

Re: Indexing very large files.

2008-01-16 Thread David Thibault
All, I just found a thread about this on the mailing list archives because I'm troubleshooting the same problem. The kicker is that it doesn't take such large files to kill the StringBuilder. I have discovered the following: By using a text file made up of 3,443,464 bytes or less, I get no

Cache size and Heap size

2008-01-16 Thread Evgeniy Strokin
Hello,.. I have relatively large RAM (10Gb) on my server which is running Solr. I increased the Cache settings and started to see OutOfMemory exceptions, especially on facet search. Does anybody have suggestions on how Cache settings relate to memory consumption? What are the optimal settings? How they

conceptual issues with solr

2008-01-16 Thread Philippe Guillard
Hi here, It seems that Lucene accepts any kind of XML document but Solr accepts only flat name/value pairs inside a document to be indexed. You'll find below what I'd like to do, Thanks for help of any kind ! Phil I need to index products (hotels)

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
I don't think this is a StringBuilder limitation, but rather your Java JVM doesn't start with enough memory. i.e. -Xmx. In raw Lucene, I've indexed 240M files Best Erick On Jan 16, 2008 10:12 AM, David Thibault [EMAIL PROTECTED] wrote: All, I just found a thread about this on the

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
P.S. Lucene by default limits the maximum field length to 10K tokens, so you have to bump that for large files. Erick On Jan 16, 2008 11:04 AM, Erick Erickson [EMAIL PROTECTED] wrote: I don't think this is a StringBuilder limitation, but rather your Java JVM doesn't start with enough memory.
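
For reference, the setting Erick mentions lives in solrconfig.xml. A hedged sketch of what bumping it looks like (the element names and placement follow the stock Solr 1.x example config; verify against your own file):

```xml
<!-- solrconfig.xml: raise maxFieldLength in both sections when indexing
     large documents; otherwise each field is silently truncated at this
     many tokens. Integer.MAX_VALUE makes it effectively unlimited. -->
<indexDefaults>
  <maxFieldLength>2147483647</maxFieldLength>
</indexDefaults>
<mainIndex>
  <maxFieldLength>2147483647</maxFieldLength>
</mainIndex>
```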

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I think your PS might do the trick. My JVM doesn't seem to be the issue, because I've set it to -Xmx512m -Xms256m. I will track down the solr config parameter you mentioned and try that. Thanks for the quick response! Dave On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote: P.S. Lucene by

Re: Indexing very large files.

2008-01-16 Thread David Thibault
I tried raising maxFieldLength under mainIndex as well as indexDefaults and still no luck. I'm trying to upload a text file that is about 8 MB in size. I think the following stack trace still points to some sort of overflowed String issue. Thoughts? Solr returned an

Re: Indexing very large files.

2008-01-16 Thread Erick Erickson
The PS really wasn't related to your OOM, and raising that shouldn't have changed the behavior. All that happens if you go beyond 10,000 tokens is that the rest gets thrown away. But we're beyond my real knowledge level about SOLR, so I'll defer to others. A very quick-n-dirty test as to whether

Re: Indexing very large files.

2008-01-16 Thread Walter Underwood
This error means that the JVM has run out of heap space. Increase the heap space. That is an option on the java command. I set my heap to 600 Meg and do it this way with Tomcat 6: JAVA_OPTS=-Xmx600M tomcat/bin/startup.sh wunder On 1/16/08 8:33 AM, David Thibault [EMAIL PROTECTED] wrote:

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Nice signature...=) On 1/16/08, Erick Erickson [EMAIL PROTECTED] wrote: The PS really wasn't related to your OOM, and raising that shouldn't have changed the behavior. All that happens if you go beyond 10,000 tokens is that the rest gets thrown away. But we're beyond my real knowledge level

Re: Indexing very large files.

2008-01-16 Thread David Thibault
Walter and all, I had been bumping up the heap for my Java app (running outside of Tomcat) but I hadn't yet tried bumping up my Tomcat heap. That seems to have helped me upload the 8MB file, but it's crashing while uploading a 32MB file now. I just bumped tomcat to 1024MB of heap, so I'm not sure

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Srikant Jakilinki
Thanks for that Shalin. Looks like I have to wait and keep track of developments. Forgetting about indexes that cannot be fit on a single machine (distributed search), any links to have Solr running in a 2-machine environment? I want to measure how much improvement there will be in

Re: Cache size and Heap size

2008-01-16 Thread evgeniy . strokin
I'm using Tomcat. I set Max Size = 5Gb and I checked in a profiler that it actually uses the whole memory. There is no significant memory use by other applications. The whole change was that I increased the size of the cache to: LRU Cache(maxSize=1048576, initialSize=1048576, autowarmCount=524288, [EMAIL
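
As a rough, hypothetical calculation of why a cache this large can exhaust even a 5 GB heap (assuming an index of ~10M documents and bitset-backed filterCache entries; actual entry sizes vary by cache type and query):

```java
public class CacheMemoryEstimate {
    // A bitset over the whole index costs roughly one bit per document.
    static long bitsetBytes(long numDocs) {
        return numDocs / 8;
    }

    public static void main(String[] args) {
        long numDocs = 10000000L;               // assumed index size
        long perEntry = bitsetBytes(numDocs);   // 1,250,000 bytes: about 1.2 MB per cached filter
        long maxSize = 1048576L;                // the maxSize reported above
        long worstCase = perEntry * maxSize;    // about 1.3e12 bytes if the cache ever filled
        System.out.println(perEntry + " bytes/entry, " + worstCase + " bytes worst case");
    }
}
```

Even a few thousand live entries at that size would exceed the heap, which is consistent with OutOfMemory showing up first on facet searches.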

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Shalin Shekhar Mangar
Solr provides a few scripts to create a multiple-machine deployment. One box is setup as the master (used primarily for writes) and others as slaves. Slaves are added as per application requirements. The index is transferred using rsync. Look at http://wiki.apache.org/solr/CollectionDistribution

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Mike Klaas
On 16-Jan-08, at 11:09 AM, Srikant Jakilinki wrote: Thanks for that Shalin. Looks like I have to wait and keep track of developments. Forgetting about indexes that cannot be fit on a single machine (distributed search), any links to have Solr running in a 2-machine environment? I want to

Re: Solr in a distributed multi-machine high-performance environment

2008-01-16 Thread Mike Klaas
On 15-Jan-08, at 9:23 PM, Srikant Jakilinki wrote: 2) Solr that has to handle a large collective index which has to be split up on multi-machines - The index is ever increasing (TB scale) and dynamic and all of it has to be searched at any point This will require significant development on

Re: Cache size and Heap size

2008-01-16 Thread Mike Klaas
On 16-Jan-08, at 11:15 AM, [EMAIL PROTECTED] wrote: I'm using Tomcat. I set Max Size = 5Gb and I checked in a profiler that it actually uses the whole memory. There is no significant memory use by other applications. The whole change was that I increased the size of the cache to: LRU Cache(maxSize=1048576,

Re: Problem with dismax handler when searching Solr along with field

2008-01-16 Thread Mike Klaas
On 16-Jan-08, at 3:15 AM, farhanali wrote: when i search a query, for example http://localhost:8983/solr/select/?q=category&qt=dismax, it gives results, but when i want to search on the basis of a field name, like http://localhost:8983/solr/select/?q=maincategory:Cars&qt=dismax, it does

IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I am using the embedded Solr API for my indexing process. I created a brand new index with my application without any problem. I then ran my indexer in incremental mode. This process copies the working index to a temporary Solr location, adds/updates any records, optimizes the index, and then

Re: Big number of conditions of the search

2008-01-16 Thread evgeniy . strokin
I see,.. but I really need to run it on Solr. We have already indexed everything. I don't really want to construct a query with 1K OR conditions and send it to Solr to parse first and run after. Maybe there is a way to go directly to Lucene, or Solr, and run such a query from Java, passing

Re: Indexing very large files.

2008-01-16 Thread David Thibault
OK, I have now bumped my tomcat JVM up to 1024MB min and 1500MB max. For some reason Walter's suggestion helped me get past the 8MB file upload to Solr, but it's still choking on a 32MB file. Is there a way to set per-webapp JVM settings in tomcat, or is setting the overall tomcat JVM sufficient?

RE: Indexing very large files.

2008-01-16 Thread Timothy Wonil Lee
I think you should try isolating the problem. It may turn out that the problem isn't really to do with Solr, but with file uploading. I'm no expert, but that's what I'd try in such a situation. Cheers, Timothy Wonil Lee http://timundergod.blogspot.com/

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin, Don't have the answer to the EOF, but I'm wondering why the index is moving. You don't need to do that as far as Solr is concerned. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Kevin Osborn [EMAIL PROTECTED] To: Solr

Re: Spell checker index rebuild

2008-01-16 Thread Otis Gospodnetic
Do you trust the spellchecker 100%? (not looking at its source now) I'd peek at the index with Luke (Luke I trust :)) and see if that term is really there first. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Doug Steigerwald [EMAIL

Re: Indexing very large files.

2008-01-16 Thread Yonik Seeley
From your stack trace, it looks like it's your client running out of memory, right? SimplePostTool was meant as a command-line replacement for curl to remove that dependency, not as a recommended way to talk to Solr. -Yonik On Jan 16, 2008 4:29 PM, David Thibault [EMAIL PROTECTED] wrote: OK, I
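
The general client-side fix is to stream the file in bounded chunks rather than accumulating it in a String/StringBuilder before posting. A minimal sketch of the chunked-copy idiom (the class name and buffer size are illustrative, not Solr API):

```java
import java.io.*;
import java.nio.file.*;

public class ChunkedReader {
    // Copy an input stream to an output stream in fixed-size chunks,
    // so memory use stays bounded regardless of file size.
    public static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("chunked", ".txt");
        byte[] data = new byte[3 * 8192 + 123]; // deliberately not a multiple of the buffer size
        Files.write(tmp, data);
        try (InputStream in = Files.newInputStream(tmp);
             OutputStream out = new ByteArrayOutputStream()) {
            System.out.println(copy(in, out)); // prints 24699
        }
        Files.delete(tmp);
    }
}
```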

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Otis Gospodnetic
Kevin, Perhaps you want to look at how Solr can be used in a master-slave setup. This will separate your indexing from searching. Don't have the URL, but it's on zee Wiki. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Kevin Osborn

Re: Indexing very large files.

2008-01-16 Thread Otis Gospodnetic
David, I bet you can quickly identify the source using YourKit or another Java profiler. The jmap command line tool might also give you some direction. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: David Thibault [EMAIL PROTECTED] To:

Re: IOException: read past EOF during optimize phase

2008-01-16 Thread Kevin Osborn
I did see that bug, which made me suspect Lucene. In my case, I tracked down the problem: it was my own application. I was using Java's FileChannel.transferTo functions to copy my index from one location to another. One of the files is bigger than 2^31-1 bytes, so one of my files was corrupted
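
For anyone hitting the same thing: FileChannel.transferTo is documented as possibly transferring fewer bytes than requested, and some platforms cap a single call near 2^31-1 bytes, so the call belongs in a loop. A sketch (the helper class is ours, not from Kevin's code):

```java
import java.io.*;
import java.nio.channels.FileChannel;

public class SafeCopy {
    // transferTo may move fewer bytes than requested, so loop until
    // the whole file (including files larger than 2 GB) is copied.
    public static void copy(File src, File dst) throws IOException {
        try (FileInputStream fis = new FileInputStream(src);
             FileOutputStream fos = new FileOutputStream(dst);
             FileChannel in = fis.getChannel();
             FileChannel out = fos.getChannel()) {
            long size = in.size();
            long pos = 0;
            while (pos < size) {
                pos += in.transferTo(pos, size - pos, out);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File src = File.createTempFile("demo-src", ".bin");
        File dst = File.createTempFile("demo-dst", ".bin");
        try (FileOutputStream fos = new FileOutputStream(src)) {
            fos.write(new byte[1024]);
        }
        copy(src, dst);
        System.out.println(dst.length()); // prints 1024
        src.delete();
        dst.delete();
    }
}
```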

Logging in Solr

2008-01-16 Thread David Thibault
All, I'm new to Solr and Tomcat and I'm trying to track down some odd errors. How do I set up Tomcat to do fine-grained Solr-specific logging? I have looked around enough to know that it should be possible to do per-webapp logging in Tomcat 5.5, but the details are hard to follow for a newbie.
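
For the archives, the usual Tomcat 5.5 recipe is a JULI logging.properties placed inside the webapp itself. A hedged sketch (handler class and level names follow the stock Tomcat examples; verify against your install):

```properties
# WEB-INF/classes/logging.properties inside the Solr webapp (Tomcat 5.5 JULI)
handlers = org.apache.juli.FileHandler, java.util.logging.ConsoleHandler
org.apache.juli.FileHandler.level = FINE
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = solr.
# Turn up only Solr's own loggers
org.apache.solr.level = FINE
```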

Re: conceptual issues with solr

2008-01-16 Thread Norberto Meijome
On Wed, 16 Jan 2008 16:54:56 +0100 Philippe Guillard [EMAIL PROTECTED] wrote: Hi here, It seems that Lucene accepts any kind of XML document but Solr accepts only flat name/value pairs inside a document to be indexed. You'll find below what I'd like to do, Thanks for help of any kind !

Re: Solr schema filters

2008-01-16 Thread Chris Hostetter
: For this exact example, use the WordDelimiterFilter exactly as : configured in the text fieldType in the example schema that ships : with solr. The trick is to then use some slop when querying. : : FT-50-43 will be indexed as FT, 50, 43 / 5043 (the last two tokens : are in the same position).

Re: DisMax Syntax

2008-01-16 Thread Chris Hostetter
: I may be mistaken, but this is not equivalent to my query.In my query i have : matches for x1, matches for x2 without slope and/or boosting and then match : to x1 x2 (exact match) with slope (~) a and boost (b) in order to have : results with exact match score better. : The total score is the

Re: Fuzziness with DisMaxRequestHandler

2008-01-16 Thread Chris Hostetter
: Is there any way to make the DisMaxRequestHandler a bit more forgiving with : user queries, I'm only getting results when the user enters a close to : perfect match. I'd like to allow near matches if possible, but I'm not sure : how to add something like this when special query syntax isn't

Re: Transactions and Solr Was: Re: Delte by multiple id problem

2008-01-16 Thread Chris Hostetter
: Does anyone have more experience doing this kind of stuff and wants to share? My advice: don't. I work with (or work with people who work with) about two dozen Solr indexes -- we don't attempt to update a single one of them in any sort of transactional way. Some of them are updated real

Re: Restrict values in a multivalued field

2008-01-16 Thread Chris Hostetter
: In my schema I have a multivalued field, and the values of that field are : stored and indexed in the index. I wanted to know if its possible to : restrict the number of multiple values being returned from that field, on a : search? And how? Because, lets say, if I have thousands of values in

Re: Fwd: Solr Text field

2008-01-16 Thread Chris Hostetter
: searches. That is fine by me. But I'm still at the first question: : How do I conduct a wildcard search for ARIZONA on a solr.textField? I tried as i said: it really depends on what kind of index analyzer you have configured for the field -- the query analyzer isn't used at all when dealing

Re: batch indexing takes more time than shown on SOLR output -- something to do with IO?

2008-01-16 Thread Chris Hostetter
: INFO: {add=[10485, 10488, 10489, 10490, 10491, 10495, 10497, 10498, ...(42 : more) : ]} 0 875 : : However, when timing this instruction on the client-side (I use SOlrJ -- : req.process(server)) I get totally different numbers (in the beginning the : client-side measured time is about 2 seconds

Re: FunctionQuery in a custom request handler

2008-01-16 Thread Chris Hostetter
: How do I access the ValueSource for my DateField? I'd like to use a : ReciprocalFloatFunction from inside the code, adding it aside others in the : main BooleanQuery. The FieldType API provides a getValueSource method (so every FieldType picks its own best ValueSource implementation). -Hoss