Solr Locking Issue

2008-12-06 Thread Rinesh1
Hi, Please help me with the following scenario. I have a Solr data folder, SOLR_DATA, and 2 web applications, solr1 and solr2, referring to the same SOLR_DATA folder. I am trying to index data using solr1/update and solr2/update sequentially. Indexing using the first

Adding External Metadata to pdf document

2008-12-06 Thread Jana, Kumar Raja
Hi, I need to add some external metadata along with the documents I send to ExtractingRequestHandler. Can someone please tell me how to achieve this? E.g., say I need to index the file abc.pdf. I want to add some additional information to the metadata, such as Category = Alphabets,

RE: Russian stopwords

2008-12-06 Thread tushar kapoor
Hi Steve, You were right, it turned out to be an encoding issue, but a really weird one. I was using Windows Notepad to save the stopwords file in UTF-8 encoding. On the other hand, I was using EditPlus to save the synonyms file. That was the only difference. The moment I switched to EditPlus for
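
One plausible cause of this kind of difference (an assumption, not confirmed in the thread) is that Windows Notepad writes a byte order mark (BOM) at the start of UTF-8 files, while many other editors do not. A minimal Python sketch of how a BOM can leak into the first stopword unless it is stripped; the function name and the sample words are hypothetical:

```python
import codecs

def load_stopwords(raw_bytes, encoding="utf-8-sig"):
    """Decode a stopwords file and return its non-empty lines.

    The "utf-8-sig" codec drops a leading BOM if one is present
    and behaves like plain UTF-8 otherwise.
    """
    text = raw_bytes.decode(encoding)
    return [line.strip() for line in text.splitlines() if line.strip()]

# Hypothetical file contents: two Russian stopwords, one per line.
plain = "и\nв\n".encode("utf-8")
with_bom = codecs.BOM_UTF8 + plain  # assumed Notepad-style output

naive = load_stopwords(with_bom, encoding="utf-8")  # BOM stays glued to the first word
clean = load_stopwords(with_bom)                    # BOM removed before splitting
```

With the naive decode, the first entry no longer equals the intended stopword, so it would silently fail to match at analysis time.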

Returning snippets with results

2008-12-06 Thread Jana, Kumar Raja
Hi, I want to get snippets along with my results. For this, I use the Highlighting Feature to return the context of fragment size 10. Some of the documents are very large (over 30 MB) in size and the Highlighting feature works only for stored fields. So this makes it necessary for me to store

Re: Solr Locking Issue

2008-12-06 Thread Grant Ingersoll
In Lucene (hence Solr) only one IndexWriter may write to an index at a time (by design), so pointing two separate Solr instances at the same index will result in the lock issue you describe. I guess the question back to you is, why do you need two web apps pointing to the same Solr data

Re: Adding External Metadata to pdf document

2008-12-06 Thread Grant Ingersoll
Hi Kumar, Wow, a brave soul trying out Solr Cell (aka the ExtractingRequestHandler) already! Cool! To add in external metadata, you can pass in literal parameters, as in: In your example, you could do something like: ext.literal.Category=Alphabets&ext.literal.Catalog_ID=1213123 This will
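
A minimal Python sketch of how the literal parameters from the reply above could be assembled into a request URL. Only the ext.literal.* parameter names come from the message; the host, port, request path, and function name are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust host, port, and path to the local Solr install.
BASE_URL = "http://localhost:8983/solr/update/extract"

def build_extract_url(literals):
    """Return a URL carrying each metadata field as an ext.literal.<name> parameter."""
    params = [("ext.literal." + name, value) for name, value in literals.items()]
    return BASE_URL + "?" + urlencode(params)

url = build_extract_url({"Category": "Alphabets", "Catalog_ID": "1213123"})
print(url)
```

Using urlencode rather than string concatenation keeps the "&" separators and any escaping of special characters in the values correct.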

Re: Solr Locking Issue

2008-12-06 Thread Rinesh1
Hi Grant, Q. Why do you need two web apps pointing to the same Solr data directory? A. I am planning to deploy Solr in a load-balanced environment where there will be 3 web servers and 3 app servers. So there will be a Solr web app deployed on 3 app servers and there will be 1 SOLR_DATA

Re: Solr Locking Issue

2008-12-06 Thread Grant Ingersoll
Typically this is handled through Solr's built-in replication capabilities. This is commonly referred to as a master/slave or master/worker setup whereby indexing takes place in one instance of Solr, and then the worker nodes pull snapshots from the master on a regular basis (I've seen

Re: Returning snippets with results

2008-12-06 Thread Grant Ingersoll
I don't think there is, since storage (or term vectors, but that likely won't save you any space) is the only place that Solr has the content stored in the correct order. Namely, for searching, documents are split up into an inverted index and it is really cumbersome to recreate a

Re: Trying to exclude integer field with certain numbers

2008-12-06 Thread Grant Ingersoll
Can you retrieve those thread_ids as is? That is, if you query for thread_id:456 (w/o all the other stuff) what happens? Also, try adding debugQuery=true to your input parameters. This should give you some more information about how the query was parsed, etc. HTH, Grant On Dec 2,
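
A minimal Python sketch of the check suggested above: querying for a single thread_id with debugQuery=true added. The host, port, /select path, and function name are assumptions for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint; adjust to the local Solr install.
SELECT_URL = "http://localhost:8983/solr/select"

def debug_query_url(query):
    """Return a select URL for `query` with debugQuery=true appended."""
    return SELECT_URL + "?" + urlencode({"q": query, "debugQuery": "true"})

url = debug_query_url("thread_id:456")
print(url)
```

The response should then include a debug section showing how the query string was parsed.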

Delta-import hack to use last indexed id document

2008-12-06 Thread Marc Sturlese
Hey there, I am doing some hacks to some parts of the Solr source. I am building a feature so that every time I use the delta import handler, it starts getting info from the db starting from the last indexed document id (from the latest execution). The point of doing that is that if I start a full

Re: Delta-import hack to use last indexed id document

2008-12-06 Thread Jon Baer
This sounds a little like my original problem of deltaQuery imports per entity ... https://issues.apache.org/jira/browse/SOLR-783 I wonder if those 2 hacks could be combined to fix the issue. - Jon On Dec 6, 2008, at 12:29 PM, Marc Sturlese wrote: Hey there, I am doing some hacks to some

Re: Delta-import hack to use last indexed id document

2008-12-06 Thread Noble Paul നോബിള്‍ नोब्ळ्
On Sat, Dec 6, 2008 at 10:59 PM, Marc Sturlese [EMAIL PROTECTED] wrote: Hey there, I am doing some hacks to some parts of the Solr source. I am building a feature so that every time I use the delta import handler, it starts getting info from the db starting from the last indexed document id (from

RE: Russian stopwords

2008-12-06 Thread Lance Norskog
The default encoding on Windows is not UTF-8. This causes various weirdness when you develop on Windows. This has helped me find all the places in string handling that need the encoding name parameter, so it's not all bad. Lance -Original Message- From: tushar kapoor [mailto:[EMAIL

RE: Dealing with field values as key/value pairs

2008-12-06 Thread Lance Norskog
This is really cool. U... How does it integrate with the Data Import Handler? Lance -Original Message- From: Chris Hostetter [mailto:[EMAIL PROTECTED] Sent: Friday, December 05, 2008 8:31 PM To: solr-user@lucene.apache.org Subject: Re: Dealing with field values as key/value pairs

MoreLikeThis and boost functions

2008-12-06 Thread Jérôme Etévé
Hi everyone, I'm wondering if the MoreLikeThis handler takes the boost function parameter into account for the scoring (hence the sorting I guess) of the similar documents it finds. Thanks for your help ! Jerome. -- Jerome Eteve. Chat with me live at http://www.eteve.net [EMAIL PROTECTED]

Re: Trying to exclude integer field with certain numbers

2008-12-06 Thread Yonik Seeley
On Tue, Dec 2, 2008 at 9:29 PM, Jake Conk [EMAIL PROTECTED] wrote: I am trying to exclude certain records from my search results in my query by specifying which ones I don't want back, but it's not working as expected. Here is my query: +message:test AND (-thread_id:123 OR -thread_id:456 OR
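
The reply is cut off here, so the following is not the poster's answer. As a hedged illustration of one common way to write such an exclusion: in Lucene query syntax, a parenthesized group containing only negative clauses typically matches nothing on its own, so the prohibited terms are usually attached directly to the required clause. A minimal Python sketch; the function name is hypothetical and the field names are taken from the quoted query:

```python
def exclude_ids_query(required_clause, field, excluded_ids):
    """Append one prohibited (-field:value) clause per excluded id."""
    prohibited = " ".join("-{}:{}".format(field, value) for value in excluded_ids)
    return "{} {}".format(required_clause, prohibited).strip()

query = exclude_ids_query("+message:test", "thread_id", [123, 456])
print(query)
```

This produces a flat query in which message:test is required and each listed thread_id is prohibited.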

Limitations of Distributed Search ....

2008-12-06 Thread souravm
Hi, We are planning to use Solr for processing a large volume of application log files (roughly 10 billion documents, 5-6 TB in size). One of the approaches we are considering is to use Distributed Search extensively. What we have in mind is distributing the log files in multiple