Re: Searching against Database
Hello, I am searching for the same code, as all my web display information is stored in a database. An early response would be very helpful. Thanks and regards, Raju

- Original Message - From: Hetan Shah [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, July 15, 2004 5:56 AM Subject: Searching against Database

Hello All, I have got all the answers from this fantastic mailing list. I have another question ;) What is the best way (Best Practices) to integrate Lucene with a live database, Oracle to be more specific. Any pointers are very much appreciated. Thanks guys. -H

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ArrayIndexOutOfBoundsException if stopword on left of bool clause w/ StandardAnalyzer
Claude Devarenne writes: My question is: should the QueryParser catch that there is no term before trying to add a clause when using a StandardAnalyzer? Is this even possible? Should the burden be on the application to either catch the exception or parse the query before handing it to the QueryParser?

Yes. Yes. No. There are fixes in Bugzilla that would make the query parser read that query as title:bla and simply drop the stop word. See http://issues.apache.org/bugzilla/show_bug.cgi?id=9110 http://issues.apache.org/bugzilla/show_bug.cgi?id=25820 Morus
Re: Re: Searching against Database
I don't have any best practices to offer, but I have been using Lucene with MySQL for a year. All I do is store a key of some sort in the index, new Field("id", getPK(), true, false, false), and then relate that to the database in code. For live Oracle databases, you might consider different things. As I hear, Oracle lets you use Java in PL/SQL (no experience here), so you might consider adding some code to the triggers to add and delete documents from the index. But modifying the index is not as quick as modifying a database in most cases, so you may want to come up with some sort of compromise here. Perhaps more experienced users on this list will have better insights. Hope that helps.

On Thu, 15 Jul 2004 lingaraju wrote: [earlier messages snipped; see the list archive]
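A minimal sketch of that pattern, assuming Lucene 1.4's Field(name, value, store, index, token) constructor; getPK() in the message above would supply the pk string, and the flags (true, false, false) match the message (stored for retrieval, not indexed):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class DbIndexer {
    // Index one database row: the primary key is stored verbatim so search
    // results can be joined back to the table in application code.
    public static void addRow(IndexWriter writer, String pk, String text)
            throws java.io.IOException {
        Document doc = new Document();
        // store=true, index=false, token=false: a retrievable key, not searchable
        doc.add(new Field("id", pk, true, false, false));
        // the searchable column content (indexed and tokenized)
        doc.add(Field.Text("contents", text));
        writer.addDocument(doc);
    }
}
```

After a search, hits.doc(i).get("id") yields the primary key to look up in the database.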
Re: Searching against Database
Hi again, I'm thinking of getting the list of IDs from the database and the list of hits from the Lucene index, and creating a comparator in order to eliminate the not-permitted Hits from the list. Which solution do you think is better? Thanks, Sergiu

Sergiu Gordea wrote: Hi, I have a similar problem. I'm working on a web application in which the users have different permissions. Not all information stored in the index is public for all users. The documents in the index are identified by the same ID that the rows have in the database tables. I can get the IDs of the documents that are accessible by the user, but if these are 1000, what will happen in Lucene? Is this a valid solution? Can anyone provide a better idea? Thanks, Sergiu

lingaraju wrote: [earlier messages snipped; see the list archive]
Search +QueryParser+Score
Hey Guys, apologies. I have a question: is there any API available in Lucene 1.4 to set the Score value to 1.0f or lesser BEFORE doing the Query Parser for search, so that it returns Hits for those Score settings only? With regards, Karthik
Re: One Field!
On Jul 14, 2004, at 10:19 PM, Jones G wrote: I have an index with multiple fields. Right now I am using MultiFieldQueryParser to search the fields. This means that if the same term occurs in multiple fields, it will be weighted accordingly. Is there any way to treat all the fields in question as one field and score the document accordingly, without having to reindex?

You could change the coord() factor of Similarity in a custom implementation - that might do what you want with scoring. But I prefer having a single queryable field that aggregates everything I want searchable, which would require re-indexing in your scenario. Erik
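Erik's preferred approach - a single aggregate field - might look like this at indexing time (a sketch; the field names "title", "body" and "all" are illustrative, not from the thread):

```java
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

public class AggregateField {
    // Index title and body separately for precise per-field queries, and
    // also concatenate them into one unstored catch-all field so a plain
    // single-field query scores the document as a whole.
    public static Document makeDoc(String title, String body) {
        Document doc = new Document();
        doc.add(Field.Text("title", title));
        doc.add(Field.Text("body", body));
        // indexed + tokenized, but not stored (the parts are stored above)
        doc.add(Field.UnStored("all", title + " " + body));
        return doc;
    }
}
```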
Re: Searching against Database
In this situation, you may want to investigate implementing a custom Filter which is user-specific and constrains the search space to only the rows a specific user is allowed to search. Erik

On Jul 15, 2004, at 3:04 AM, Sergiu Gordea wrote: [earlier messages snipped; see the list archive] -H
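A sketch of such a filter, assuming Lucene 1.4's Filter API (a single bits(IndexReader) method), an "id" field indexed untokenized, and a hypothetical allowedIds set fetched from the database for the current user:

```java
import java.io.IOException;
import java.util.BitSet;
import java.util.Iterator;
import java.util.Set;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.index.TermDocs;
import org.apache.lucene.search.Filter;

public class PermissionFilter extends Filter {
    private final Set allowedIds; // database IDs this user may see

    public PermissionFilter(Set allowedIds) {
        this.allowedIds = allowedIds;
    }

    // Mark the Lucene documents whose indexed "id" term matches an allowed
    // database ID; everything else is excluded from the search space.
    public BitSet bits(IndexReader reader) throws IOException {
        BitSet bits = new BitSet(reader.maxDoc());
        for (Iterator it = allowedIds.iterator(); it.hasNext();) {
            TermDocs docs = reader.termDocs(new Term("id", (String) it.next()));
            while (docs.next()) {
                bits.set(docs.doc());
            }
            docs.close();
        }
        return bits;
    }
}
```

searcher.search(query, new PermissionFilter(ids)) would then return only permitted rows, with no post-search merging.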
Re: Search +QueryParser+Score
Karthik, I have a really hard time following your questions, otherwise I'd chime in on them more often. Your meaning is not often clear. In the case of normalizing the score to 1.0 or less - this is precisely what Hits does for you. I'm not sure what you mean by BEFORE doing QueryParser - a score is computed based on a query, so it necessarily must come after. Erik

On Jul 15, 2004, at 6:55 AM, Karthik N S wrote: Hey Guys, apologies. I have a question: is there any API available in Lucene 1.4 to set the Score value to 1.0f or lesser BEFORE doing the Query Parser for search, so that it returns Hits for those Score settings only? With regards, Karthik
Re: Search +QueryParser+Score
I don't really understand what QueryParser has to do with your question. If you want only Hits that have a score of 1.0 (keep in mind that Hits normalizes scores if they are over 1.0), why not just walk all the Hits in order until you get to one that is not 1.0? Or, use a HitCollector to collect hits (scores are not normalized with a HitCollector) and bail out when you are done (although bailing out of a HitCollector is not as clean as we should make it in Lucene 2.0 - we should add that to the whiteboard). Erik

On Jul 15, 2004, at 7:36 AM, Karthik N S wrote: Hey Guys... Apologies. Let me be more specific regarding the last mail. I would like to get all Hits returned with score = 1.0 ONLY using Query Parser. What are my options? With regards, Karthik

[earlier messages snipped; see the list archive]
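A sketch of the HitCollector approach Erik describes, assuming Lucene 1.4's HitCollector.collect(int doc, float score) callback; the 1.0f cutoff is the value from the question, and the class name is illustrative:

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.search.HitCollector;

public class ScoreCutoffCollector extends HitCollector {
    private final float cutoff;
    private final List docIds = new ArrayList();

    public ScoreCutoffCollector(float cutoff) {
        this.cutoff = cutoff;
    }

    // Called once per matching document, with the raw (un-normalized) score.
    public void collect(int doc, float score) {
        if (score >= cutoff) {
            docIds.add(Integer.valueOf(doc));
        }
    }

    public List getDocIds() {
        return docIds;
    }
}
```

Usage would be searcher.search(query, new ScoreCutoffCollector(1.0f)). Note that collect() is invoked for every match, so this skips low-scoring documents rather than bailing out; early termination would need the exception trick Erik alludes to.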
RE: Searching against Database
If you know ahead of time which documents are viewable by a certain user group, you could add a field, such as group, and when you index the document you put in the names of the user groups that are allowed to view that document. Then your query tool can append, for example, AND group:developers to the user's query. Then you will not have to merge results. -Will

-Original Message- From: Sergiu Gordea [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 15, 2004 2:58 AM To: Lucene Users List Subject: Re: Searching against Database [earlier messages snipped; see the list archive]
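That rewriting step is plain string handling before the query ever reaches QueryParser; a trivial sketch (restrictToGroup is an illustrative helper, not a Lucene API):

```java
public class GroupRestrict {
    // Wrap the user's raw query string and constrain it to one group.
    // The result is what you would hand to QueryParser.
    static String restrictToGroup(String userQuery, String group) {
        return "(" + userQuery + ") AND group:" + group;
    }

    public static void main(String[] args) {
        System.out.println(restrictToGroup("oracle integration", "developers"));
        // prints: (oracle integration) AND group:developers
    }
}
```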
Re: Problems indexing Japanese with CJKAnalyzer
If it's a web application, you have to call request.setEncoding("UTF-8") before reading any parameters. Also make sure the HTML page encoding is specified as UTF-8 in the meta tag. Most web app servers decode the request parameters in the system's default encoding. If you call the above method, I think it will solve your problem. Praveen

- Original Message - From: Bruno Tirel [EMAIL PROTECTED] To: 'Lucene Users List' [EMAIL PROTECTED] Sent: Thursday, July 15, 2004 6:15 AM Subject: RE: Problems indexing Japanese with CJKAnalyzer

Hi All, I am also trying to localize everything for a French application, using UTF-8 encoding. I have already applied what Jon described. I fully confirm his recommendation for the HTMLParser and HTMLDocument changes with UNICODE and UTF-8 encoding specification. In my case, I still have one case that is not functional: using meta-data from an HTML document, as in the demo3 example. Trying to convert to UTF-8, or ISO-8859-1, it is still not correctly encoded when I check with Luke. The word Propriété is seen either as Propri?t? with a square, or as Propriã©tã©. My local codepage is Cp1252, so it should be viewed as ISO-8859-1. Same result when I use the local FileEncoding parameter. All the other fields are correctly encoded into UTF-8, tokenized and successfully searched through a JSP page. Is anybody else facing this issue? Any help available? Best regards, Bruno

-Original Message- From: Jon Schuster [mailto:[EMAIL PROTECTED]] Sent: Wednesday, July 14, 2004 22:51 To: 'Lucene Users List' Subject: RE: Problems indexing Japanese with CJKAnalyzer

Hi all, Thanks for the help on indexing Japanese documents. I eventually got things working, and here's an update so that other folks might have an easier time in similar situations. The problem I had was indeed with the encoding, but it was more than just the encoding on the initial creation of the HTMLParser (from the Lucene demo package).
In HTMLDocument, doing this:

InputStreamReader reader = new InputStreamReader(new FileInputStream(f), "SJIS");
HTMLParser parser = new HTMLParser(reader);

creates the parser and feeds it Unicode from the original Shift-JIS encoded document, but then when the document contents are fetched using this line:

Field fld = Field.Text("contents", parser.getReader());

HTMLParser.getReader creates an InputStreamReader and OutputStreamWriter using the default encoding, which in my case was Windows 1252 (essentially Latin-1). That was bad. In the HTMLParser.jj grammar file, adding an explicit encoding of UTF8 on both the Reader and Writer got things mostly working. The one missing piece was in the options section of the HTMLParser.jj file. The original grammar file generates an input character stream class that treats the input as a stream of 1-byte characters. To have JavaCC generate a stream class that handles double-byte characters, you need the option UNICODE_INPUT=true. So, there were essentially three changes in two files:

HTMLParser.jj - add UNICODE_INPUT=true to the options section; add an explicit UTF8 encoding on Reader and Writer creation in getReader(). As far as I can tell, these changes work fine for all of the languages I need to handle, which are English, French, German, and Japanese.

HTMLDocument - add an explicit encoding of SJIS when creating the Reader used to create the HTMLParser. (For western languages, I use an encoding of ISO8859_1.) And of course, use the right language tokenizer. --Jon

earlier responses snipped; see the list archive
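The effect Jon describes can be seen in a few lines of plain Java (a standalone illustration, not the demo code itself): the same Shift-JIS bytes read back correctly only when the Reader is told which charset to use.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class EncodingDemo {
    // Read an entire byte stream into a String using the given charset,
    // the same way HTMLDocument wraps a FileInputStream.
    static String readAll(byte[] bytes, String charset) throws IOException {
        Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), charset);
        StringBuffer sb = new StringBuffer();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String original = "\u65e5\u672c";        // "Japan" in Japanese
        byte[] sjis = original.getBytes("SJIS"); // bytes as found in the file
        // Naming the charset explicitly recovers the text; relying on the
        // platform default (e.g. Cp1252) would garble it.
        System.out.println(original.equals(readAll(sjis, "SJIS"))); // prints: true
    }
}
```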
Re: Searching against Database
This is not a solution in my case, because the permissions of the groups and the user groups can be changed, and that would make managing the index a nightmare. Anyway, I appreciate the advice; maybe it will be useful for the other guys that asked this question. Sergiu

[EMAIL PROTECTED] wrote: [earlier messages snipped; see the list archive] -H
Wildcard search with my own analyzer
I wanted to support categories, so I created my own analyzer so that: Root Category||My Category||Some Other Things would be split into three terms, split on ||, and I wanted it to stay case sensitive. If I do a search for: categories:"Root Category" it works fine. But if I do a search for: categories:"Root Cate*" it doesn't find it. What do I need to do so that wildcard searching will work on this? I am using the same analyzer for indexing and searching (otherwise the first search wouldn't work either). Thank you, Joel Shellman
RE: Anyone use MultiSearcher class
Don, I think I finally understand your problem -- and mine -- with MultiSearcher. I had tested an implementation of my system using ParallelMultiSearcher to split a huge index over many computers. I was very impressed by the results on my test data, but alarmed after a trial with live data :) Consider MultiSearcher.search(Query Q). Suppose that Q aggregated over ALL the Searchables in the MultiSearcher would return 1000 documents. But, the Hits object created by search() will only cache the first 100 documents. When Hits.doc(101) is called, Hits will cache 200 documents -- then 400, 800, 1600 and so on. How does Hits get these extra documents? By calling the MultiSearcher again. Now consider a MultiSearcher as described above with 2 Searchables. With respect to Q, Searchable S has 1000 documents, Searchable T has zero. So to fetch the 101st document, not only is S searched, but T is too, even though the result of Q applied to T is still zero and will always be zero. The same thing will happen when fetching the 201st, 401st and 801st document. This accounts for my slow performance, and I think yours too. That your observed degradation is a power of 2 is a clue. My performance is especially vulnerable because slave Searchables in the MultiSearcher are Remote -- accessed via RMI. I guess I have to code smarter around MultiSearcher. One problem you highlight is that Hits is final -- so it is not possible even to modify the 100/200/400 cache size logic. Any ideas from anyone would be much appreciated. Mark Florence CTO, AIRS 800-897-7714 x 1703 [EMAIL PROTECTED] -Original Message- From: Don Vaillancourt [mailto:[EMAIL PROTECTED] Sent: Monday, July 12, 2004 12:36 pm To: Lucene Users List Subject: Anyone use MultiSearcher class Hello, Has anyone used the Multisearcher class? I have noticed that searching two indexes using this MultiSearcher class takes 8 times longer than searching only one index. 
I could understand if it took 3 to 4 times longer to search, due to sorting the two search results and such, but why 8 times longer? Is there some optimization that can be done to hasten the search? Or should I just write my own MultiSearcher? The problem, though, is that there is no way for me to create my own Hits object (no methods are available and the class is final). Anyone have any clue? Thanks, Don Vaillancourt, Director of Software Development, WEB IMPACT INC.
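Mark's doubling description can be illustrated with a small standalone simulation (not Lucene code; the 100-document initial batch and the doubling schedule are taken from his description):

```java
import java.util.ArrayList;
import java.util.List;

public class HitsCacheSim {
    private int cached = 0;
    private final List searchSizes = new ArrayList(); // one entry per re-search

    // Mimic Hits.doc(n): if n is beyond the cache, re-run the whole query
    // against every Searchable, doubling the number of cached results.
    void doc(int n) {
        if (n >= cached) {
            cached = Math.max(2 * cached, 100);
            while (cached <= n) {
                cached *= 2;
            }
            searchSizes.add(Integer.valueOf(cached));
        }
    }

    List getSearchSizes() {
        return searchSizes;
    }

    public static void main(String[] args) {
        HitsCacheSim sim = new HitsCacheSim();
        for (int i = 0; i <= 1000; i++) {
            sim.doc(i);
        }
        // Iterating 1001 hits triggers 5 full re-searches, each one hitting
        // every Searchable - including ones with zero matches.
        System.out.println(sim.getSearchSizes());
        // prints: [100, 200, 400, 800, 1600]
    }
}
```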
RE: Searching against Database
Hi, do you know how to convert an RTF file to a txt file? Is any API available? If you have any sample code, please send it to me. Regards, Natarajan.

-Original Message- From: Sergiu Gordea [mailto:[EMAIL PROTECTED]] Sent: Thursday, July 15, 2004 2:16 PM To: Lucene Users List Subject: Re: Searching against Database [earlier messages snipped; see the list archive]
Re: Wildcard search with my own analyzer
On Jul 15, 2004, at 10:02 AM, Morus Walter wrote: Joel Shellman writes: What do I need to do so that wildcard searching will work on this? I am using the same analyzer for indexing and searching (otherwise the first search wouldn't work either). Check what query is produced (query.toString(...)). I guess that the query parser, which seems to be what you are using, does not support wildcards within quotes.

Right... when you use double-quotes, a PhraseQuery is implied, and it has no support for wildcards (currently). Check the AnalysisParalysis page on the wiki for some insight into how to go about trouble-shooting things like this. First is to eliminate QueryParser and see if you can make a query through the API that matches what you're after. Erik
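Erik's suggestion - building the query through the API instead of QueryParser - might look like this, assuming the categories field holds the ||-delimited terms from the question (a sketch against the Lucene 1.4 API):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.WildcardQuery;

public class CategoryWildcard {
    // Construct the wildcard query directly, so neither the analyzer nor
    // phrase parsing gets between the term text and the indexed term.
    public static Query rootCatePrefix() {
        return new WildcardQuery(new Term("categories", "Root Cate*"));
    }
}
```

If this query finds the document, the problem is confirmed to be QueryParser's phrase handling rather than the custom analyzer.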
RE: Searching against Database
See this document! http://www.jguru.com/faq/view.jsp?EID=1074229 Regards! -- Daniel

Hi, do you know how to convert an RTF file to a txt file? Is any API available? If you have any sample code, please send it to me. Regards, Natarajan. [earlier messages snipped; see the list archive]
RE: Problems indexing Japanese with CJKAnalyzer ... Or French with UTF-8 and MetaData
I don't think I understand correctly your proposal. As a basis, I am using Demo3 with indexHTML, HTMLDocument and HTMLParser. Inside HTML parser, I am calling getMetaTags (calling addMetaData) wich return Properties object. My issue is coming fron this definition : Properties are stored into ISO-8859-1 encoding, when all my data encodings inside and outside are UTF-8. I am not successful in getting UTF-8 values from this Parser.GetMetaTags() through any conversion. These data are extracted from an HTML page, with UTF-8 encoding declared at the beginning of the file. I do not see how to call a request.setEncoding(UTF-8) : I need the Parser to have knowledge of UTF-8 encoding... And it doesn't appear when using Properties object. Any feedback? -Message d'origine- De : Praveen Peddi [mailto:[EMAIL PROTECTED] Envoyé : jeudi 15 juillet 2004 15:12 À : Lucene Users List Objet : Re: Problems indexing Japanese with CJKAnalyzer If its a web application, you have to cal request.setEncoding(UTF-8) before reading any parameters. Also make sure html page encoding is specified as UTF-8 in the metatag. most web app servers decode the request paramaters in the system's default encoding algorithm. If u call above method, I think it will solve ur problem. Praveen - Original Message - From: Bruno Tirel [EMAIL PROTECTED] To: 'Lucene Users List' [EMAIL PROTECTED] Sent: Thursday, July 15, 2004 6:15 AM Subject: RE: Problems indexing Japanese with CJKAnalyzer Hi All, I am also trying to localize everything for French application, using UTF-8 encoding. I have already applied what Jon described. I fully confirm his recommandation for HTML Parser and HTML Document changes with UNICODE and UTF-8 encoding specification. In my case, I have still one case not functional : using meta-data from HTML document, as in demo3 example. Trying to convert to UTF-8, or ISO-8859-1, it is still not correctly encoded when I check with Luke. A word Propriété is seen either as Propri?t? 
with a square, or as Propriã©tã©. My local codepage is Cp1252, so should be viewed as ISO-8859-1. Same result when I use local FileEncoding parameter. All the other fields are correctly encoded into UTF-8, tokenized and successfully searched through JSP page. Is anybody already facing this issue? Any help available? Best regards, Bruno -Message d'origine- De : Jon Schuster [mailto:[EMAIL PROTECTED] Envoyé : mercredi 14 juillet 2004 22:51 À : 'Lucene Users List' Objet : RE: Problems indexing Japanese with CJKAnalyzer Hi all, Thanks for the help on indexing Japanese documents. I eventually got things working, and here's an update so that other folks might have an easier time in similar situations. The problem I had was indeed with the encoding, but it was more than just the encoding on the initial creation of the HTMLParser (from the Lucene demo package). In HTMLDocument, doing this: InputStreamReader reader = new InputStreamReader( new FileInputStream(f), SJIS); HTMLParser parser = new HTMLParser( reader ); creates the parser and feeds it Unicode from the original Shift-JIS encoding document, but then when the document contents is fetched using this line: Field fld = Field.Text(contents, parser.getReader() ); HTMLParser.getReader creates an InputStreamReader and OutputStreamWriter using the default encoding, which in my case was Windows 1252 (essentially Latin-1). That was bad. In the HTMLParser.jj grammar file, adding an explicit encoding of UTF8 on both the Reader and Writer got things mostly working. The one missing piece was in the options section of the HTMLParser.jj file. The original grammar file generates an input character stream class that treats the input as a stream of 1-byte characters. To have JavaCC generate a stream class that handles double-byte characters, you need the option UNICODE_INPUT=true. 
So, there were essentially three changes in two files:

HTMLParser.jj - add UNICODE_INPUT=true to the options section; add an explicit UTF8 encoding on Reader and Writer creation in getReader(). As far as I can tell, these changes work fine for all of the languages I need to handle, which are English, French, German, and Japanese.

HTMLDocument - add an explicit encoding of SJIS when creating the Reader used to create the HTMLParser. (For western languages, I use an encoding of ISO8859_1.) And of course, use the right language tokenizer.

--Jon

earlier responses snipped; see the list archive

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
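The Propri?t? / Propriã©tã© symptoms Bruno reports are classic charset mismatches, and the second one can be reproduced with nothing but the JDK: java.util.Properties is defined as an ISO-8859-1 format, so feeding it raw UTF-8 bytes mangles every accented character. A minimal sketch (the class and method names are mine, not from the thread):

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Decode a string's UTF-8 bytes as ISO-8859-1, which is what happens
    // when an ISO-8859-1 reader (such as Properties.load) is handed UTF-8 data.
    static String misdecode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "é" is 0xC3 0xA9 in UTF-8; read as ISO-8859-1 those two bytes
        // become "Ã" and "©", producing the garbage Bruno sees in Luke.
        System.out.println(misdecode("Propri\u00e9t\u00e9"));
    }
}
```

This is why no conversion after the fact succeeds: the damage is done at decode time, so the fix has to be applied where the bytes are first turned into characters.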
Powered By Lucene image?
Hi, Are there any "Powered by Lucene" images? I thought there used to be some on the site but I can't find them now. Any help is appreciated! Thanks. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: RE: Scoring without normalization!
Sadly, I am still running into problems. Explain shows the following after the modification:

Rank: 1  ID: 11285358  Score: 5.5740864E8
5.5740864E8 = product of:
  8.3611296E8 = sum of:
    8.3611296E8 = product of:
      6.6889037E9 = weight(title:iron in 1235940), product of:
        0.12621856 = queryWeight(title:iron), product of:
          7.0507255 = idf(docFreq=10816)
          0.017901499 = queryNorm
        5.2994613E10 = fieldWeight(title:iron in 1235940), product of:
          1.0 = tf(termFreq(title:iron)=1)
          7.0507255 = idf(docFreq=10816)
          7.5161928E9 = fieldNorm(field=title, doc=1235940)
      0.125 = coord(1/8)
    2.7106019E-8 = product of:
      1.08424075E-7 = sum of:
        5.7318403E-9 = weight(abstract:an in 1235940), product of:
          0.03711049 = queryWeight(abstract:an), product of:
            2.073038 = idf(docFreq=1569960)
            0.017901499 = queryNorm
          1.5445337E-7 = fieldWeight(abstract:an in 1235940), product of:
            1.0 = tf(termFreq(abstract:an)=1)
            2.073038 = idf(docFreq=1569960)
            7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
        1.0269223E-7 = weight(abstract:iron in 1235940), product of:
          0.111071706 = queryWeight(abstract:iron), product of:
            6.2046037 = idf(docFreq=25209)
            0.017901499 = queryNorm
          9.24558E-7 = fieldWeight(abstract:iron in 1235940), product of:
            2.0 = tf(termFreq(abstract:iron)=4)
            6.2046037 = idf(docFreq=25209)
            7.4505806E-8 = fieldNorm(field=abstract, doc=1235940)
      0.25 = coord(2/8)
  0.667 = coord(2/3)

Rank: 2  ID: 8157438  Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 159395), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 159395), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=159395)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)

Rank: 3  ID: 10543103  Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 553967), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 553967), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=553967)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)

Rank: 4  ID: 8753559  Score: 2.7870432E8
2.7870432E8 = product of:
  8.3611296E8 = product of:
    6.6889037E9 = weight(title:iron in 2563152), product of:
      0.12621856 = queryWeight(title:iron), product of:
        7.0507255 = idf(docFreq=10816)
        0.017901499 = queryNorm
      5.2994613E10 = fieldWeight(title:iron in 2563152), product of:
        1.0 = tf(termFreq(title:iron)=1)
        7.0507255 = idf(docFreq=10816)
        7.5161928E9 = fieldNorm(field=title, doc=2563152)
    0.125 = coord(1/8)
  0.3334 = coord(1/3)

I would like to get rid of all normalizations and just have TF and IDF. What am I missing?

On Thu, 15 Jul 2004 Anson Lau wrote:

If you don't mind hacking the source: in Hits.java, in method getMoreDocs():

// Comment out the following:
// float scoreNorm = 1.0f;
// if (length > 0 && scoreDocs[0].score > 1.0f) {
//   scoreNorm = 1.0f / scoreDocs[0].score;
// }
// and just set scoreNorm to 1:
float scoreNorm = 1.0f;

I don't know if you can do it without going to the src. Anson

-Original Message- From: Jones G [mailto:[EMAIL PROTECTED] Sent: Thursday, July 15, 2004 6:52 AM To: [EMAIL PROTECTED] Subject: Scoring without normalization! How do I remove document normalization from scoring in Lucene? I just want to stick to TF-IDF. Thanks.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
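As a sanity check on the numbers above: Lucene's default Similarity computes idf as ln(numDocs / (docFreq + 1)) + 1, and the three idf values in the explain output are mutually consistent with an index of roughly 4.59 million documents. That figure is back-solved by me, not stated anywhere in the thread, so treat it as an assumption:

```java
public class IdfCheck {
    // Lucene's default Similarity idf: ln(numDocs / (docFreq + 1)) + 1
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / (docFreq + 1)) + 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 4591100; // back-solved from the explain output; an assumption
        System.out.println(idf(10816, numDocs));   // explain reports 7.0507255
        System.out.println(idf(25209, numDocs));   // explain reports 6.2046037
        System.out.println(idf(1569960, numDocs)); // explain reports 2.073038
    }
}
```

The agreement across all three docFreq values suggests the idf terms themselves are healthy; the anomaly in the output is the enormous fieldNorm values.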
Re: Scoring without normalization!
Have you looked at: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html in particular, at: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int) http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float) Doug

Jones G wrote: earlier explain output snipped; see the previous message in the list archive

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Re: Scoring without normalization!
Thanks. I tried overriding Similarity, returning 1 in lengthNorm and queryNorm, and calling setSimilarity on the IndexSearcher with it.

Query: 1 Found: 1540632
Rank: 1  ID: 8157438  Score: 0.9994
3.73650457E11 = weight(title:iron in 159395), product of:
  7.0507255 = queryWeight(title:iron), product of:
    7.0507255 = idf(docFreq=10816)
    1.0 = queryNorm
  5.2994613E10 = fieldWeight(title:iron in 159395), product of:
    1.0 = tf(termFreq(title:iron)=1)
    7.0507255 = idf(docFreq=10816)
    7.5161928E9 = fieldNorm(field=title, doc=159395)

How do I get rid of queryWeight, fieldWeight, and fieldNorm from the scoring? I tried modifying TermQuery without much luck.

On Thu, 15 Jul 2004 Doug Cutting wrote:

Have you looked at: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html in particular, at: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#lengthNorm(java.lang.String,%20int) http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#queryNorm(float) Doug

Jones G wrote: Sadly, I am still running into problems. Explain shows the following after the modification.
earlier explain output and quoted replies snipped; see the previous messages in the list archive
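For what it's worth, the remaining factors in that post-override explain output decompose exactly as the Similarity documentation describes: weight = queryWeight × fieldWeight, where queryWeight = idf × queryNorm and fieldWeight = tf × idf × fieldNorm. Note also that fieldNorm is read from norms written into the index at indexing time, so overriding lengthNorm at search time alone would not change documents that are already indexed (my reading of the API, worth verifying). The arithmetic from the Rank 1 numbers, as a sketch with a helper name of my own:

```java
public class ScoreDecomposition {
    // weight = queryWeight * fieldWeight
    //        = (idf * queryNorm) * (tf * idf * fieldNorm)
    static double weight(double tf, double idf, double fieldNorm, double queryNorm) {
        double queryWeight = idf * queryNorm;        // 7.0507255 in the explain
        double fieldWeight = tf * idf * fieldNorm;   // 5.2994613E10 in the explain
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Numbers copied from the explain output after the Similarity override;
        // queryNorm is 1 because it was overridden, fieldNorm still comes
        // from the index's stored norms.
        System.out.println(weight(1.0, 7.0507255, 7.5161928E9, 1.0));
    }
}
```

With queryNorm and fieldNorm both equal to 1, the per-term weight reduces to tf × idf², which is the TF-IDF-only scoring the thread is after.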
Token or not Token, PerFieldAnalyzer
Hello, When indexing a field, we have the choice of tokenizing it or not. I have a custom analyzer that contains a tokenizer... does that mean that if the boolean token flag is set to false, the analyzer is not applied to the field content? Everywhere in the documentation (and it seems logical) you say to use the same analyzer for indexing and querying... how is this handled for fields that are not tokenized? In my case, I have certain fields on which I want the tokenization and analysis and everything to happen... but on other fields, I just want to index the content as it is (no alterations at all) and not analyze it at query time... is that possible? -- Florian - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Token or not Token, PerFieldAnalyzer
Florian Sauvin wrote: Everywhere in the documentation (and it seems logical) you say to use the same analyzer for indexing and querying... how is this handled on not tokenized fields? Imperfectly. The QueryParser knows nothing about the index, so it does not know which fields were tokenized and which were not. Moreover, even the index does not know this, since you can freely intermix tokenized and untokenized values in a single field. In my case, I have certain fields on which I want the tokenization and anlysis and everything to happen... but on other fields, I just want to index the content as it is (no alterations at all) and not analyze at query time... is that possible? It is very possible. A good way to handle this is to use PerFieldAnalyzerWrapper. Doug - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
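The per-field dispatch Doug recommends is easy to picture: PerFieldAnalyzerWrapper keeps a map from field name to analyzer and falls back to a default for unmapped fields. The shape of the idea, sketched with plain JDK types rather than the Lucene classes (all names here are mine):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class PerFieldDispatch {
    // Stand-ins for analyzers: a function from field text to indexed form.
    private final Map<String, Function<String, String>> perField = new HashMap<>();
    private final Function<String, String> fallback;

    PerFieldDispatch(Function<String, String> fallback) {
        this.fallback = fallback;
    }

    void addAnalyzer(String field, Function<String, String> analyzer) {
        perField.put(field, analyzer);
    }

    // Look up the analyzer for this field, defaulting to the fallback.
    String analyze(String field, String text) {
        return perField.getOrDefault(field, fallback).apply(text);
    }

    public static void main(String[] args) {
        // Default "analyzer" lowercases; the "id" field is indexed verbatim,
        // which is Florian's no-alterations case.
        PerFieldDispatch d = new PerFieldDispatch(s -> s.toLowerCase());
        d.addAnalyzer("id", s -> s);
        System.out.println(d.analyze("contents", "Iron Ore"));
        System.out.println(d.analyze("id", "DOC-1234"));
    }
}
```

Using the same dispatch object at both index and query time is what keeps the two sides consistent, which is the property the documentation keeps stressing.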
Re: release migration plan
fp235-5 wrote: I am looking at the code to implement setIndexInterval() in IndexWriter. I'd like to have your opinion on the best way to do it. Currently the creation of an instance of TermInfosWriter requires the following steps:

IndexWriter.addDocument(Document)
IndexWriter.addDocument(Document, Analyzer)
DocumentWriter.addDocument(String, Document)
DocumentWriter.writePostings(Posting[], String)
TermInfosWriter constructor

To give a different value to indexInterval in TermInfosWriter, we need to add a variable holding this value to IndexWriter and DocumentWriter and modify the constructors for DocumentWriter and TermInfosWriter (quite heavy changes).

I think this is the best approach. I would replace the other parameters in these constructors which can be derived from an IndexWriter with the IndexWriter itself. That way, if we add more parameters like this, they can also be passed in through the IndexWriter. All of the parameters to the DocumentWriter constructor are fields of IndexWriter, so one can instead simply pass a single parameter, an IndexWriter, and then access its directory, analyzer, similarity and maxFieldLength in the DocumentWriter constructor. A public getDirectory() method would also need to be added to IndexWriter for this to work. Similarly, two of SegmentMerger's constructor parameters, the directory and the boolean useCompoundFile, could be replaced with an IndexWriter. In SegmentMerger I would replace the directory parameter with an IndexWriter. Doug - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
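The refactoring Doug suggests is the familiar "pass the owner, not its fields" move. A hypothetical before/after sketch — the class and field names mirror the discussion, but the bodies are mine and much simplified from the real Lucene classes:

```java
public class WriterRefactor {
    // Stand-in for IndexWriter: it owns the settings its helpers need.
    static class IndexWriter {
        final String directory;
        final int maxFieldLength;
        final int indexInterval; // the new tunable under discussion

        IndexWriter(String directory, int maxFieldLength, int indexInterval) {
            this.directory = directory;
            this.maxFieldLength = maxFieldLength;
            this.indexInterval = indexInterval;
        }

        // The accessor Doug says would need to be added.
        String getDirectory() { return directory; }
    }

    // After the refactor: DocumentWriter takes the writer itself, so future
    // settings flow through without touching every constructor signature.
    static class DocumentWriter {
        final IndexWriter owner;

        DocumentWriter(IndexWriter owner) { this.owner = owner; }

        int indexInterval() { return owner.indexInterval; }
    }

    public static void main(String[] args) {
        IndexWriter w = new IndexWriter("/tmp/index", 10000, 128);
        DocumentWriter dw = new DocumentWriter(w);
        System.out.println(dw.indexInterval());
    }
}
```

The trade-off is a tighter coupling between DocumentWriter and IndexWriter, which the thread accepts in exchange for stable constructor signatures.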
Re: Searching against Database
Is it possible to search against a column in a table? If so, are there any limitations on the number of columns one should target to search against? Any other suggestions? Thanks. -H - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: Searching against Database
- Original Message - From: Hetan Shah [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, July 15, 2004 7:51 PM Subject: Re: Searching against Database

Is it possible to search against a column in a table? If so, are there any limitations on the number of columns one should target to search against?

What you can search against all depends on how you index your columns. I believe you mentioned that you had data in multiple tables for each record (or Document in Lucene). If you map your columns to Lucene Fields, and make sure that the primary key for each record is stored in the same Lucene Document object as the columns (Fields), then you should be golden. Someone earlier pointed out that Oracle allows Java in its stored procedures, so if you use a single stored procedure to insert a new record, that same procedure can create a matching Lucene Document and add it to the index. For updates, you will need to delete the Lucene document and then add a new copy of the updated record. If you have a primary key field that is indexed and stored in Lucene, you can use http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#delete(org.apache.lucene.index.Term) to delete the old version. Pete

Any other suggestions? Thanks. -H

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
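Pete's update recipe — delete by primary-key term, then add the fresh document — can be sketched with a map standing in for the index. This is purely a concept demo with names of my own; the real calls would be IndexReader.delete(Term) followed by IndexWriter.addDocument, per the javadoc he links:

```java
import java.util.HashMap;
import java.util.Map;

public class PkUpdateSketch {
    // Toy "index" keyed by the stored primary-key field.
    private final Map<String, Map<String, String>> index = new HashMap<>();

    void add(String pk, Map<String, String> fields) {
        index.put(pk, fields);
    }

    // Lucene has no in-place update: remove the old Document, add the new one.
    void update(String pk, Map<String, String> fields) {
        index.remove(pk);      // IndexReader.delete(new Term("id", pk)) in real code
        index.put(pk, fields); // IndexWriter.addDocument(doc) in real code
    }

    Map<String, String> get(String pk) { return index.get(pk); }

    public static void main(String[] args) {
        PkUpdateSketch idx = new PkUpdateSketch();
        Map<String, String> row = new HashMap<>();
        row.put("title", "Iron ore pricing");
        idx.add("42", row);

        Map<String, String> updated = new HashMap<>();
        updated.put("title", "Iron ore pricing, revised");
        idx.update("42", updated);
        System.out.println(idx.get("42").get("title"));
    }
}
```

The important detail carried over from the thread is that the primary key must be a stored, indexed field in each Document, or there is no term to delete by.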