How can I omit the illegal characters when indexing the docs?

2009-01-02 Thread RaghavPrabhu
Hi all, I am extracting a Word document using Apache POI, then generating the XML doc that I want to index in Solr. The problem I faced is that the browser shows the error below. HTTP Status 500 - Illegal character ((CTRL-CHAR, code 8)) at
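
Code 8 (backspace) is one of the control characters that XML 1.0 does not allow, so one option is to strip such characters from the extracted text before building the XML document. The following is a minimal sketch in Python, not the poster's code; the function name `strip_illegal_xml_chars` is hypothetical, and the same character ranges could be applied in Java on the POI side.

```python
import re

# Matches anything outside the XML 1.0 "Char" production:
# #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
_ILLEGAL_XML_CHARS = re.compile(
    r"[^\x09\x0A\x0D\x20-\uD7FF\uE000-\uFFFD\U00010000-\U0010FFFF]"
)


def strip_illegal_xml_chars(text: str) -> str:
    """Return text with characters that XML 1.0 forbids removed.

    Hypothetical helper: tab, newline and carriage return are kept,
    other control characters such as code 8 are dropped.
    """
    return _ILLEGAL_XML_CHARS.sub("", text)
```

Calling it on each field value before writing the XML would remove the backspace character that triggers the error above, assuming the character carries no meaning worth preserving.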

Re: synonyms.txt file updated frequently

2009-01-02 Thread Alexander Ramos Jardim
People, thanks for all the replies. The business requirement I have is to update the synonyms list every time someone from the sales department establishes a new dictionary (they do that a couple of times a week). I must add the new synonyms to the index. I think I will stick with query time

Re: synonyms.txt file updated frequently

2009-01-02 Thread Alexander Ramos Jardim
Grant, I am following your idea to write a new TokenFilter. As far as I can tell from the SynonymTokenFilter and Factory code, it is the Factory that is responsible for loading the new Just let me ask some stupid questions: 1. I will have to write a custom TokenFilter and TokenFilterFactory, right? 2.

cannot allocate memory for snapshooter

2009-01-02 Thread Brian Whitman
I have an indexing machine on a test server (a mid-level EC2 instance, 8GB of RAM) and I run jetty like: java -server -Xms5g -Xmx5g -XX:MaxPermSize=128m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heap -Dsolr.solr.home=/vol/solr -Djava.awt.headless=true -jar start.jar The indexing

debugging long commits

2009-01-02 Thread Brian Whitman
We have a distributed setup that has been experiencing glacially slow commit times on only some of the shards. (10s on a good shard, 263s on a slow shard.) Each shard for this index has about 10GB of Lucene index data and the documents are segregated by an md5 hash, so the distribution of

Re: debugging long commits

2009-01-02 Thread Brian Whitman
Not sure if these help. Here's the stack trace and jmap -histo output during a long (bad) commit Full thread dump Java HotSpot(TM) 64-Bit Server VM (11.0-b16 mixed mode): Attach Listener daemon prio=10 tid=0x2aabf9954400 nid=0x5e1c runnable [0x..0x42048d20]

Highlighting not working

2009-01-02 Thread Sushil Vegad
Hi, I can't get highlighting to work. I tried everything mentioned about it on the forum. PLEASE HELP... We use SolrJ and search a field called content; it is the default search field, indexed and stored. Its type is text and it has an analyzer associated with it. There is no uniqueKey in the schema

Re: debugging long commits

2009-01-02 Thread Brian Whitman
I think I'm getting close with this (sorry for the self-replies) I tried an optimize (which we never do) and it took 30m and said this a lot: Exception in thread Lucene Merge Thread #4 org.apache.lucene.index.MergePolicy$MergeException: java.lang.ArrayIndexOutOfBoundsException: Array index out

Re: Using query functions against a type field

2009-01-02 Thread Chris Hostetter
: I would like to use a query function to boost documents of a certain : type. I realize that I can use a boost query for this, but in : analyzing the scoring it doesn't seem as predictable as the query : functions. It should be fairly predictable, can you elaborate on what problems you have

Re: cannot allocate memory for snapshooter

2009-01-02 Thread Bill Au
add more swap space: http://www.nabble.com/Not-enough-space-to11423199.html#a11424938 Bill On Fri, Jan 2, 2009 at 10:52 AM, Brian Whitman br...@echonest.com wrote: I have an indexing machine on a test server (a mid-level EC2 instance, 8GB of RAM) and I run jetty like: java -server -Xms5g

Re: Different results return for capital and small letters.

2009-01-02 Thread Otis Gospodnetic
Tushar, Could you ask on solr-user in the future, please? Your last sentence got cut off. Do you have LowerCaseFilter in both the index and query-time analyzer sections? Perhaps you should just paste that section of the config. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr -

Re: cannot allocate memory for snapshooter

2009-01-02 Thread Brian Whitman
Thanks for the pointer. (It seems really weird to alloc 5GB of swap just because the JVM needs to run a shell script.. but I get hoss's explanation in the following post) On Fri, Jan 2, 2009 at 2:37 PM, Bill Au bill.w...@gmail.com wrote: add more swap space:

Re: cannot allocate memory for snapshooter

2009-01-02 Thread Otis Gospodnetic
Here is another one that I just saw on hadoop's core-user list: If you have overcommit_mem turned on, Java 1.5 will lock *all* of its maximum heap size into RAM (ignores swap!) upon startup. Earlier versions of 1.5 also allocate 1GB of RAM for code compilation. I've seen situations where there

Re: Dismax query parser with different field classes

2009-01-02 Thread Mark Ferguson
Hello, It looks like a boost query will accomplish what I am looking for quite nicely. Mark On Wed, Dec 31, 2008 at 5:29 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote: Hello, I have a set of documents in which I have different classes of fields that I would like to search separately.

Re: Dismax query parser with different field classes

2009-01-02 Thread Mark Ferguson
Hi again, I have a small problem with using a boost query, which is that I would like documents found in the boost query to be returned even if the main query does not include those results. So what I am effectively looking for is an OR between the dismax query and the boost query, rather than a

understanding queryNorm

2009-01-02 Thread vinay kumar kaku
Hi, I wanted to understand how the queryNorm is calculated. I did read the Similarity documentation of Lucene; it says it is 1 / sqrt(sumOfSquaredWeights) sumOfSquaredWeights =
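
The formula quoted from the Lucene Similarity documentation can be sketched in a few lines of Python. The definition of sumOfSquaredWeights is cut off in the message, so this sketch takes that value as an input rather than guessing how it is built; the function name `query_norm` is hypothetical.

```python
import math


def query_norm(sum_of_squared_weights: float) -> float:
    """Compute 1 / sqrt(sumOfSquaredWeights), as quoted in the message.

    Assumption: the caller supplies sumOfSquaredWeights, since its
    definition is truncated in the original post.
    """
    if sum_of_squared_weights <= 0:
        raise ValueError("sumOfSquaredWeights must be positive")
    return 1.0 / math.sqrt(sum_of_squared_weights)
```

For instance, a sumOfSquaredWeights of 4.0 gives a queryNorm of 0.5, so a larger sum of squared weights yields a smaller normalization factor.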

Pgination in Solr

2009-01-02 Thread Bhawani Sharma
Hi All, How can I do pagination in Solr? Is there any Solr API that provides a method through which I can do this? Please reply ASAP. Thanks in advance. Thanks: Bhawani Sharma -- View this message in context: http://www.nabble.com/Pgination-in-Solr-tp21262532p21262532.html Sent

Re: Pgination in Solr

2009-01-02 Thread Umar Shah
Solr supports the start and rows parameters: append start=X&rows=Y to the URL (assuming you are using the standard request handler), where X = the zero-based offset of the first result to return (page number times results per page, counting pages from 0) and Y = results per page. On Sat, Jan 3, 2009 at 11:57 AM, Bhawani Sharma bhawanisha...@aol.com wrote: Hi All, How can i do Pagination in Solr ?
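
A minimal sketch of how a page number might be turned into the start and rows parameters, written in Python. It assumes start is a zero-based offset into the result list, so the offset for a page is the page number (counted from 0) times the page size. The helper names and the base URL in the usage note are hypothetical.

```python
from urllib.parse import urlencode


def page_params(page: int, rows_per_page: int) -> dict:
    """Map a zero-based page number to Solr's start/rows parameters."""
    if page < 0 or rows_per_page <= 0:
        raise ValueError("page must be >= 0 and rows_per_page must be > 0")
    # start is the offset of the first result, not the page number itself.
    return {"start": page * rows_per_page, "rows": rows_per_page}


def page_url(base_url: str, query: str, page: int, rows_per_page: int) -> str:
    """Build a select URL for one page of results (hypothetical helper)."""
    params = {"q": query, **page_params(page, rows_per_page)}
    return base_url + "?" + urlencode(params)
```

With an assumed base URL such as `http://localhost:8983/solr/select`, `page_url(base, "solr", 2, 10)` would request the third page of ten results by sending start=20 and rows=10.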