newbie lucene indexing/search question

2006-12-28 Thread moraleslos
I currently have a book containing content that is stored in the database by paragraph. For example, a book contains content with 5 paragraphs. Therefore each paragraph is stored as a distinct record in a database. In the object domain, I have a Book object which holds a java.util.List of Paragrap

Text storing design and performance question

2007-01-10 Thread moraleslos
I'm running into a little dilemma with Lucene highlighting and indexing. I currently index anything and everything that gets inserted into a database. This database includes all the content that is searched. Now I'll have lots and lots of content, in the range of 50GB+, all stored in t

Re: Text storing design and performance question

2007-01-10 Thread moraleslos
e it from there and pass that string to the > highlighter instead. > > Erik > > > On Jan 10, 2007, at 10:45 AM, moraleslos wrote: > >> >> I'm running into a little dilemma with Lucene highlighting and >> indexing. I >> currently

Re: Text storing design and performance question

2007-01-10 Thread moraleslos
y, and so > you would only highlight the docs as they became visible to the user. > This is generally a small amount...often one at a time. > > - Mark > > moraleslos wrote: >> Hi Erik, >> >> Would that slow performance a bit? For example, say I receive 50,000

Re: Text storing design and performance question

2007-01-10 Thread moraleslos
ext from the database through the highlighter (with the query) before > displaying it. > > - Mark > > On 1/10/07, moraleslos <[EMAIL PROTECTED]> wrote: >> >> >> Hi Mark, >> >> Looks like I've got to implement some sort of pagination for my clients.

RE: Text storing design and performance question

2007-01-10 Thread moraleslos
hen shove each piece of relevant > text from the database through the highlighter (with the query) before > displaying it. > > - Mark > > On 1/10/07, moraleslos <[EMAIL PROTECTED]> wrote: >> >> >> Hi Mark, >> >> Looks like I've got to

sort on a searchable field

2007-01-10 Thread moraleslos
From what I understand about Lucene, one can only sort on a field that is indexed but not tokenized (and hence not searchable). I have content that can be searched by keyword and also a date string, e.g. text:Lucene AND date:[2007-01-01 TO 2007-01-10] Since my date is searchable, I need to inde

Sorting using Lucene search query syntax

2007-01-16 Thread moraleslos
Is it possible to specify a sort on a field using standard Lucene search query syntax? I was not able to find it in the query doc so I assume not but I would like to make sure before going on to use the API. Thanks in advance! -los -- View this message in context: http://www.nabble.com/Sortin

sorting issue with un-tokenized field

2007-01-17 Thread moraleslos

Re: sorting issue with un-tokenized field

2007-01-17 Thread moraleslos
Oops, accidentally pressed the ENTER key before doing anything ;-) I have a field called "bookTitle" that I specified as UN_TOKENIZED and STORED in the index (i.e. keyword). However, when I do a sort on this field during a search I get this error: Exception occurred during search: java.lang.Runt

Highlighting issues

2007-02-27 Thread moraleslos
In my search query I have two fields to search, a metadata field and the actual contents. The metadata field is just an enum containing FIRST and LAST. Here is an example search query: Content:"Barry Bonds" and Metadata:FIRST I have Lucene highlight the hits like this: ... getBestFragment(stan

Lucene for name matching

2007-04-05 Thread moraleslos
I was wondering if anyone has done people name matching using Lucene. For example, I have a name coming from some external source that I would like to match with the one I have in my DB. Let's say my DB contains the name "John Smith". If the external source has something like "Smith John", "Smit

Re: Lucene for name matching

2007-04-05 Thread moraleslos
ng matching algorithms as well that are used in > various approaches. See http://en.wikipedia.org/wiki/Edit_distance. > Googling record linkage may help. From there, you can pretty much > knock yourself out with all the different approaches > > On Apr 5, 2007, at 3:58 PM, morale

Re: Lucene for name matching

2007-04-06 Thread moraleslos
Thanks guys! I really really appreciate your feedback. I didn't know a "simple" problem like people name matching would be this complicated. I knew there would be some unusual circumstances or rules, but I did not realize how much work has been done to solve parts of the problem (string matching
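
A minimal sketch of the edit-distance idea raised in this thread, in plain Java with no Lucene dependency. It is not from the original messages: the class name NameMatchSketch, its method names, and the maxEdits threshold are all hypothetical, chosen only for illustration. It assumes names are normalized by lowercasing and sorting their tokens, so that "Smith John" and "John Smith" compare equal, and then compared with Levenshtein distance.

```java
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper, not part of Lucene or of the original thread.
final class NameMatchSketch {

    // Lowercase, split on whitespace and commas, and sort the tokens
    // so that word order does not affect the comparison.
    static String normalize(String name) {
        String[] tokens = name.toLowerCase(Locale.ROOT).trim().split("[\\s,]+");
        Arrays.sort(tokens);
        return String.join(" ", tokens);
    }

    // Levenshtein distance computed with two rolling rows.
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // distance to the empty prefix of b
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                        Math.min(curr[j - 1] + 1, prev[j] + 1),
                        prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Treat two names as a likely match when their normalized forms
    // differ by at most maxEdits single-character edits (an assumed threshold).
    static boolean likelySame(String a, String b, int maxEdits) {
        return editDistance(normalize(a), normalize(b)) <= maxEdits;
    }
}
```

A call such as NameMatchSketch.likelySame("John Smith", "Smith John", 1) would treat the two spellings as the same person. As the replies point out, real record linkage involves far more than this (nicknames, initials, transliteration), so the sketch covers only the string-matching part.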

issues with optimizer

2007-06-06 Thread moraleslos
I'm running into the "WARN | Compass Scheduled Optimizer | AdaptiveOptimizer | ne.optimizer.AdaptiveOptimizer 104 | Failed to obtain lock on sub-index [book], will do it next time." messages more frequently, most likely due to the Lucene index getting enormous. The adaptive optimizer is scheduled

Re: issues with optimizer

2007-06-06 Thread moraleslos
Hoss, actually it's more of a "general" question rather than a Compass-specific one. Here's my complete process: I have incoming data being indexed every hour. The data varies from 100 to 1 documents. I'm also having the index optimized via Compass (using its Adaptive or Aggressive optimize

FNFE on the index

2007-06-07 Thread moraleslos
Hi, I'm encountering this error and not sure why this is happening: java.io.FileNotFoundException: /index/book/_19b87.tis (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)