Exact match

2008-07-28 Thread Sunil
Hi, I am sending a request to solr for exact match. Example: (title:("Web 2.0" OR "Social Networking") OR description: ("Web 2.0" OR "Social Networking")) But in the results I am getting stories matching "Social", "Web" etc. Please let me know what's going wrong. Thanks, Sunil

Re: Exact match

2008-07-28 Thread Erik Hatcher
Look at what Solr returns when adding &debugQuery=true for the parsed query, and also consider how your fields are analyzed (their associated type, etc). Erik On Jul 28, 2008, at 4:56 AM, Sunil wrote: Hi, I am sending a request to solr for exact match. Example: (title:("Web 2.0"

RE: Exact match

2008-07-28 Thread Sunil
Both the fields are "text" type: How will "&debugQuery=true" help? I am not familiar with the output. Thanks, Sunil -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED] Sent: Monday, July 28, 2008 2:33 PM To: solr-user@lucene.apache.org Subject: Re: Exact match Look at wh

Re: Exact match

2008-07-28 Thread Erik Hatcher
On Jul 28, 2008, at 5:31 AM, Sunil wrote: Both the fields are "text" type: The definition of the field type is important - perhaps it is stripping "2.0"? You can find out by using Solr analysis.jsp (see the Solr admin area in your installation). How will "&debugQuery=true" help? I
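
As a rough illustration of the debugQuery suggestion in this thread, the sketch below builds a request URL with the query URL-encoded and debugQuery=true appended. The host, port, and /solr/select path are assumptions (they match a default example install, not necessarily Sunil's setup), and the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class DebugQueryUrl {
    // Hypothetical helper: appends the URL-encoded query and debugQuery=true
    // to a select URL. The select URL itself is supplied by the caller.
    static String build(String selectUrl, String query) {
        try {
            return selectUrl + "?q=" + URLEncoder.encode(query, "UTF-8") + "&debugQuery=true";
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is required of every Java platform, so this should not happen.
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    public static void main(String[] args) {
        String query = "(title:(\"Web 2.0\" OR \"Social Networking\") "
                + "OR description:(\"Web 2.0\" OR \"Social Networking\"))";
        // Assumed location of the select handler; adjust for the actual installation.
        System.out.println(build("http://localhost:8983/solr/select", query));
    }
}
```

Opening the printed URL in a browser should return the normal results plus a debug section; the parsed query shown there indicates whether the phrases survived analysis as phrases or were broken into separate terms.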

nested data structure definition

2008-07-28 Thread Ranjeet
Hi, Can we define a nested data structure in schema.xml for searching? Is it possible or not? Thanks & Regards, Ranjeet Jha

Re: nested data structure definition

2008-07-28 Thread Shalin Shekhar Mangar
Hi Ranjeet, Solr supports multi-valued fields and you can always denormalize your data. Can you give more details on the problem you are trying to solve? On Mon, Jul 28, 2008 at 3:20 PM, Ranjeet <[EMAIL PROTECTED]> wrote: > Hi, > > Can we define a nested data structure in schema.xml for searching?

Re: nested data structure definition

2008-07-28 Thread Ranjeet
Hi, In our case there is a Category object under the Catalog object, so I do not want to define the data structure for the Category. I want to give the reference of the Category under the Catalog. How can I do this? Regards, Ranjeet - Original Message - From: "Shalin Shekhar Mangar" <[EMAIL PROTE

Re: nested data structure definition

2008-07-28 Thread Shalin Shekhar Mangar
Hi, In Solr there is no hierarchy of objects. De-normalize everything into one schema using multi-valued fields where applicable. Decide on what the document should be. What do you want to return as individual results -- are they catalogs or categories? You can get more help if you give an exampl
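
The de-normalization advice above can be sketched in plain Java as follows. This is only an illustration under assumptions: the Catalog and Category classes and the field names (id, catalog_title, category_name) are hypothetical, and the flat map stands in for whatever document representation is actually sent to Solr. It assumes the catalog is the unit returned as a search result.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DenormalizeDemo {
    // Hypothetical nested model: a catalog that owns several categories.
    static class Category {
        final String name;
        Category(String name) { this.name = name; }
    }

    static class Catalog {
        final String id;
        final String title;
        final List<Category> categories;
        Catalog(String id, String title, List<Category> categories) {
            this.id = id;
            this.title = title;
            this.categories = categories;
        }
    }

    // Flattens the nested object into one document: scalar fields for the
    // catalog, plus one multi-valued field holding every category name.
    static Map<String, Object> toFlatDocument(Catalog catalog) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", catalog.id);
        doc.put("catalog_title", catalog.title);
        List<String> names = new ArrayList<>();
        for (Category category : catalog.categories) {
            names.add(category.name);
        }
        doc.put("category_name", names); // would map to a multiValued field in schema.xml
        return doc;
    }

    public static void main(String[] args) {
        Catalog catalog = new Catalog("c1", "Summer",
                Arrays.asList(new Category("Shoes"), new Category("Hats")));
        System.out.println(toFlatDocument(catalog));
    }
}
```

If categories were the unit of retrieval instead, the flattening would run the other way: one document per category, with the parent catalog's fields copied onto each.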

RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Andrew Nagy
Shalin - yes the allfields field exists in my schema.xml file. It is a field that has all of the text from all of the fields concatenated together into one field. My spellCheckIndexDir is created and has 2 segment files, but I think the index is empty. When I initiate the 1st spellcheck.build

Re: Unsure about omitNorms, termVectors...

2008-07-28 Thread Grant Ingersoll
On Jul 24, 2008, at 9:48 AM, Fuad Efendi wrote: Hi, It's unclear... found in schema.xml: omitNorms: (expert) set to true to omit the norms associated with this field (this disables length normalization and index-time boosting for the field, and saves some memory). Only full-text

Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Shalin Shekhar Mangar
Can you show us the query you are issuing? Make sure you add spellcheck=true to the query as a parameter to turn on spell checking. On Mon, Jul 28, 2008 at 6:16 PM, Andrew Nagy <[EMAIL PROTECTED]> wrote: > Shalin - yes the allfields field exists in my schema.xml file. It is a > field that has all

RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Andrew Nagy
> -Original Message- > From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] > Sent: Monday, July 28, 2008 10:09 AM > To: solr-user@lucene.apache.org > Subject: Re: SpellCheckComponent problems (was: Multiple search > components in one handler - ie spellchecker) > > Can you show us the quer

Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Shalin Shekhar Mangar
Hi Andrew, Your configuration which you specified in the earlier thread looks fine. Your query is also ok. The complete lack of spell check results in the response you pasted suggests that the SpellCheckComponent is not added to the SearchHandler's list of components. Can you check your solrconfi

RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Andrew Nagy
I was just reviewing the solr logs and I noticed the following: Jul 28, 2008 11:52:01 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.SpellCheckComponent' It looks like the SpellCheckComponent is not

RE: solr synonyms behaviour

2008-07-28 Thread Laurent Gilles
Hi, I was faced with the same issues regarding multi-word synonyms. Let's say a synonyms list like: club, bar, night cabaret Now if we have a document containing "club", with the default synonyms filter behaviour with expand=true, we will end up in the Lucene index with a document containing "cl

Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Shalin Shekhar Mangar
No, SpellCheckComponent was in the nightly long before July 25. There must be a stack trace after that error message. Can you post that? On Mon, Jul 28, 2008 at 9:26 PM, Andrew Nagy <[EMAIL PROTECTED]> wrote: > I was just reviewing the solr logs and I noticed the following: > > Jul 28, 2008 11:52:

RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Andrew Nagy
Hmm ... sorry, that was the output of a java program that uses solr that I ran and noticed the error. That error doesn't happen when I start solr. Sorry for the confusion. I just changed my schema to have a dedicated field for spelling called "spelling" and I created a new field type for the

RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Andrew Nagy
Well I will include the stack trace for the aforementioned error: Jul 28, 2008 12:20:17 PM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.handler.component.SpellCheckComponent' at org.apache.solr.core.SolrResour

Re: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Shalin Shekhar Mangar
Well that means the nightly solr jar you are using is older than you think it is. Try running solr normally without the program and see if you can get it working. On Mon, Jul 28, 2008 at 9:54 PM, Andrew Nagy <[EMAIL PROTECTED]> wrote: > Well I will include the stack trace for the aforementioned er

RE: SpellCheckComponent problems (was: Multiple search components in one handler - ie spellchecker)

2008-07-28 Thread Andrew Nagy
> -Original Message- > From: Shalin Shekhar Mangar [mailto:[EMAIL PROTECTED] > Sent: Monday, July 28, 2008 12:38 PM > To: solr-user@lucene.apache.org > Subject: Re: SpellCheckComponent problems (was: Multiple search > components in one handler - ie spellchecker) > > Well that means the nigh

Unsynchronized FIFOCache - 9x times performance boost on 8-CPU system

2008-07-28 Thread Fuad Efendi
Please see discussion at http://issues.apache.org/jira/browse/SOLR-665. Very simple: map = new LinkedHashMap(initialSize, 0.75f, true) - LRU cache (and we need synchronized get()); map = new LinkedHashMap(initialSize, 0.75f, false) - FIFO (and we do not need synchronized get()) -- Thanks, Fuad E
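
The difference between the two constructor calls quoted above can be seen in a small stand-alone sketch. This is not the code from SOLR-665; the class name, the capacity of 2, and the removeEldestEntry-based eviction are assumptions chosen only to make the ordering visible.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CacheOrderDemo {
    // Hypothetical helper: a LinkedHashMap that evicts its eldest entry once
    // it grows past maxSize. accessOrder=true gives LRU, false gives FIFO.
    static <K, V> Map<K, V> boundedMap(final int maxSize, boolean accessOrder) {
        return new LinkedHashMap<K, V>(16, 0.75f, accessOrder) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxSize;
            }
        };
    }

    // Inserts a and b, reads a, then inserts c into a map of capacity 2,
    // and returns the keys that remain.
    static List<String> survivingKeys(boolean accessOrder) {
        Map<String, Integer> cache = boundedMap(2, accessOrder);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");    // reorders the entries only when accessOrder is true
        cache.put("c", 3); // exceeds capacity, so the eldest entry is evicted
        return new ArrayList<>(cache.keySet());
    }

    public static void main(String[] args) {
        System.out.println("LRU  (accessOrder=true):  " + survivingKeys(true));
        System.out.println("FIFO (accessOrder=false): " + survivingKeys(false));
    }
}
```

With access order enabled, get() moves the entry to the end of the internal list, so a read is itself a structural modification; with insertion order, get() leaves the list alone. Whether unsynchronized reads are then actually safe alongside concurrent writers is a separate question that this single-threaded sketch does not address.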

RE: nested data structure definition

2008-07-28 Thread Lance Norskog
If you want to think of Solr in database terms, it has only one table. The fields in this table have very flexible type definitions. There can be many optional fields. They can also have various indexes which, used together, can search text in useful ways. If you want to model multiple tables, you

Multiple Update servers

2008-07-28 Thread Rakesh Godhani
Hi, we are currently evaluating Solr and have been browsing the archives for one particular issue but can't seem to find the answer, so please forgive me if I'm asking a repetitive question. We like the idea of having multiple slave servers serving up queries and a master performing updates. How

big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Britske
Hi all, For some queries I need to return a lot of rows at once (say 100). When performing these queries I notice a big difference between qTime (which is mostly in the 15-30 ms range due to caching) and total time taken to return the response (measured through SolrJ's elapsedTime), which takes

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Yonik Seeley
That high of a difference is due to the part of the index containing these particular stored fields not being in OS cache. What's the size on disk of your index compared to your physical RAM? -Yonik On Mon, Jul 28, 2008 at 4:10 PM, Britske <[EMAIL PROTECTED]> wrote: > > Hi all, > > For some quer

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Britske
Size on disk is 1.84 GB (of which 1.3 GB sits in FDT files if that matters) Physical RAM is 2 GB with -Xmx800M set to Solr. Yonik Seeley wrote: > > That high of a difference is due to the part of the index containing > these particular stored fields not being in OS cache. What's the size > on

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Yonik Seeley
That's a bit too tight to have *all* of the index cached...your best bet is to go to 4GB+, or figure out a way not to have to retrieve so many stored fields. -Yonik On Mon, Jul 28, 2008 at 4:27 PM, Britske <[EMAIL PROTECTED]> wrote: > > Size on disk is 1.84 GB (of which 1.3 GB sits in FDT files i

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Mike Klaas
Another possibility is to partition the stored fields into a frequently-accessed set and a full set. If the frequently-accessed set is significantly smaller (in terms of # bytes), then the documents will be tightly-packed on disk and the os caching will be much more effective given the sam

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Britske
I'm on a development box currently and production servers will be bigger, but at the same time the index will be too. Each query requests at most 20 stored fields. Why doesn't lazy field loading help in this situation? I don't need to retrieve all stored fields and I thought I wasn't doing this (

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Yonik Seeley
On Mon, Jul 28, 2008 at 4:53 PM, Britske <[EMAIL PROTECTED]> wrote: > Each query requests at most 20 stored fields. Why doesn't lazy field > loading help in this situation? It's the disk seek that kills you... loading 1 byte or 1000 bytes per document would be about the same speed. > Also, if I und

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Grant Ingersoll
What version of Solr/Lucene are you using? On Jul 28, 2008, at 4:53 PM, Britske wrote: I'm on a development box currently and production servers will be bigger, but at the same time the index will be too. Each query requests at most 20 stored fields. Why doesn't lazy field loading help in th

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Britske
Thanks for clearing that up for me. I'm going to investigate some more... Yonik Seeley wrote: > > On Mon, Jul 28, 2008 at 4:53 PM, Britske <[EMAIL PROTECTED]> wrote: >> Each query requests at most 20 stored fields. Why doesn't lazy field >> loading help in this situation? > > It's the disk seek

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Britske
I'm using the solr-nightly of 2008-04-05 Grant Ingersoll-6 wrote: > > What version of Solr/Lucene are you using? > > On Jul 28, 2008, at 4:53 PM, Britske wrote: > >> >> I'm on a development box currently and production servers will be >> bigger, but >> at the same time the index will be to

RE: Tokenizing and searching named character entity references

2008-07-28 Thread Steven A Rowe
Hi Frances, HTMLStripWhitespaceTokenizerFactory wraps a WhitespaceTokenizer around an HTMLStripReader. You could extend HTMLStripReader to not decode named character entities, e.g. by overriding HTMLStripReader.read() so that it calls an alternative readEntity(), which instead of converting en

Re: Expansion stemming

2008-07-28 Thread Chris Hostetter
: "Expansion stemming - Takes a root word and 'expands' it to all of its : various forms - can be used either at insertion time or at query : time." : : How do I specify that I want the expansion stemming instead of the porter : stemming? there isn't an explicit expansion stemming filter included

Re: morphology and queryPrase

2008-07-28 Thread Chris Hostetter
: When I'm looking for words taking care of distance between them, I'm using : Lucene syntax "A B"~distance... unfortunately if A leads to A1 and A2 forms I : should split this into syntax +("A1 B"~dist "A2 B"~dist ") - this grows with : progression depending of normal forms quantity of each term. :

Re: Best way to return ExternalFileField in the results

2008-07-28 Thread Chris Hostetter
: I've been trying to return a field of type ExternalFileField in the search : result. Upon examining XMLWriter class, it seems like Solr can't do this out : of the box. Therefore, I've tried to hack Solr to enable this behaviour. : The goal is to call to ExternalFileField.getValueSource(SchemaFi

Re: Unsure about omitNorms, termVectors...

2008-07-28 Thread Chris Hostetter
: > omitNorms: do I need it for full-text fields even if I don't need index-time : > boosting? I don't want to boost text where keyword repeated several times. Is : > my understanding correct? if you omitNorms="true" then you not only lose index-time doc/field boosting, but you also lose lengthN

Re: Best way to return ExternalFileField in the results

2008-07-28 Thread Ryan McKinley
In general though I'm wondering if stepping back a bit and modifying your request handler to use a SolrDocumentList where you've already flattened the ExternalFileField into each SolrDocument would be an easier approach -- then you wouldn't need to modify the ResponseWriter at all. Consider

RE: Tokenizing and searching named character entity references

2008-07-28 Thread Chris Hostetter
: You could extend HTMLStripReader to not decode named character entities, : e.g. by overriding HTMLStripReader.read() so that it calls an : alternative readEntity(), which instead of converting entity references : to characters would just leave the entity references as-is, something : like:

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Mike Klaas
On 28-Jul-08, at 1:53 PM, Britske wrote: Each query requests at most 20 stored fields. Why doesn't lazy field loading help in this situation? It does help, but not enough. With lots of data per document and not a lot of memory, it becomes probabilistically likely that each doc resides in a

javax.xml.stream.XMLStreamException while indexing

2008-07-28 Thread Pieter Berkel
I've recently encountered a strange error while batch indexing around 500 average-sized documents: HTTP Status 500 - null javax.xml.stream.XMLStreamException at com.bea.xml.stream.MXParser.fillBuf(MXParser.java:3700) at com.bea.xml.stream.MXParser.more(MXParser.java:3715) at com.bea.x

RE: solr synonyms behaviour

2008-07-28 Thread swarag
Hi Laurent Laurent Gilles wrote: > > Hi, > > I was faced with the same issues regarding multi-word synonyms > Let's say a synonyms list like: > > club, bar, night cabaret > > Now if we have a document containing "club", with the default synonyms > filter behaviour with expand=true, we will

Re: big discrepancy between elapsedtime and qtime although enableLazyFieldLoading= true

2008-07-28 Thread Britske
That sounds interesting. Let me explain my situation, which may be a variant of what you are proposing. My documents contain more than 10,000 fields, but these fields are divided like: 1. about 20 general purpose fields, of which more than 1 can be selected in a query. 2. about 10,000 fields of

Re: nested data structure definition

2008-07-28 Thread matt connolly
In my site, I have a document, which may have multiple comments. For each comment, I would like to know several pieces of information, like: text, author, and date. -Matt Shalin Shekhar Mangar wrote: > > Hi Ranjeet, > > Solr supports multi-valued fields and you can always denormalize your >