RE: tool to check the index field
Try using:

Luke: http://www.getopt.org/luke/
Limo: http://limo.sourceforge.net/

Regards,
Kiran.

-----Original Message-----
From: lingaraju [mailto:[EMAIL PROTECTED]
Sent: 17 November 2004 16:00
To: Lucene Users List
Subject: tool to check the index field

Hi all,

I have an index that was created by other people, and I want to know how many fields it contains. Is there any third-party tool to do this? I saw a GUI tool for this somewhere but forgot its name.

Regards,
LingaRaju

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: SELECTIVE Indexing
I doubt it can be used as a plug-in, but it would be good to know if it can.

Regards,
Kiran.

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: 17 May 2004 12:30
To: Lucene Users List
Subject: RE: SELECTIVE Indexing

Hi,

Can I use Tidy (as a plug-in) with Lucene?

With regards,
Karthik

-----Original Message-----
From: Viparthi, Kiran (AFIS) [mailto:[EMAIL PROTECTED]
Sent: Monday, May 17, 2004 3:27 PM
To: 'Lucene Users List'
Subject: RE: SELECTIVE Indexing

Try using Tidy. It creates a Document from the HTML and allows you to apply XPath.

Hope this helps.
Kiran.

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: 17 May 2004 11:59
To: Lucene Users List
Subject: SELECTIVE Indexing

Hi all,

Can somebody tell me how to index only a certain portion of an HTML file, e.g. <table> ... </table>?

With regards,
Karthik
RE: SELECTIVE Indexing
Try using Tidy. It creates a Document from the HTML and allows you to apply XPath.

Hope this helps.
Kiran.

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: 17 May 2004 11:59
To: Lucene Users List
Subject: SELECTIVE Indexing

Hi all,

Can somebody tell me how to index only a certain portion of an HTML file, e.g. <table> ... </table>?

With regards,
Karthik
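The "Document plus XPath" idea can be sketched with the JDK's own XML classes. This is only an illustration: it assumes Tidy (or a similar cleaner) has already turned the page into well-formed markup with no default XML namespace, and the class and method names (TableTextExtractor, tableText) are made up for this example. No Tidy or Lucene calls are shown.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: pulls the text inside every <table> element.
final class TableTextExtractor {
    static String tableText(String wellFormedHtml) {
        try {
            // Parse the (already cleaned-up) markup into a DOM Document.
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(wellFormedHtml)));
            // Select all table elements anywhere in the document.
            NodeList tables = (NodeList) XPathFactory.newInstance()
                    .newXPath()
                    .evaluate("//table", doc, XPathConstants.NODESET);
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < tables.getLength(); i++) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(tables.item(i).getTextContent().trim());
            }
            return sb.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed markup", e);
        }
    }
}
```

The returned string is what you would then hand to the indexer as the field value, so that only the table content ends up in the index.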
RE: Did you mean...
Hi Timo,

I was referring to your previous code: you can collect the text of every term in the index.

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.index.TermEnum;

    IndexReader reader = IndexReader.open(ram);
    TermEnum te = reader.terms();
    StringBuffer sb = new StringBuffer();
    while (te.next()) {
        Term t = te.term();
        // separate the terms with a space so they can be tokenized afterwards
        sb.append(t.text()).append(' ');
    }
    te.close();

You can then run a StringTokenizer over sb.toString() and put the tokens into a Map, counting the occurrences.

As mentioned, I didn't use any information from the index, so I didn't use a TokenStream, but let me check it out.

Kiran

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 16 February 2004 11:38
To: Lucene Users List
Subject: Re: Did you mean...

On Thursday 12 February 2004 18:35, Viparthi, Kiran (AFIS) wrote:
> As mentioned the only way I can see is to get the output of the analyzer
> directly as a TokenStream iterate through it and insert it into a Map.

Could you provide or point me to some example code on how to get and use TokenStream? The API docs are somewhat unclear to me...
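The counting step described in this message (tokenize the collected text, then tally the tokens in a Map) can be sketched in plain Java. The class name TermCounts is made up for this example, and the input is assumed to be whitespace-separated text; no Lucene classes are involved.

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical helper: counts how often each token occurs in a string.
final class TermCounts {
    static Map<String, Integer> count(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        // StringTokenizer splits on whitespace by default.
        StringTokenizer st = new StringTokenizer(text);
        while (st.hasMoreTokens()) {
            // Insert 1 for a new token, otherwise add 1 to its current count.
            counts.merge(st.nextToken(), 1, Integer::sum);
        }
        return counts;
    }
}
```

Calling TermCounts.count(sb.toString()) on the buffer built above would give one entry per distinct term text.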
RE: Did you mean...
Hi,

We achieved this by creating a separate index, "words", built by extracting the complete list of words. You can also take term frequency into account if you are extracting the words from other indexes, but that could be expensive.

Rewriting the search as a fuzzy search against the words index gives you a better list of matching words for spelling suggestions.

Kiran.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 12 February 2004 08:48
To: Lucene Users List
Subject: Re: Did you mean...

On Thursday 12 February 2004 00:15, Matt Tucker wrote:
> We implemented that type of system using a spelling engine by Wintertree:
> http://www.wintertree-software.com
> There are some free Java spelling packages out there too that you could
> likely use.

But this does not ensure that the word really exists in the index. The word Google proposes, however, does exist.

Regards
Timo
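To illustrate why matching against the list of indexed words only ever suggests words that really exist in the index, here is a rough sketch. It is not the setup described above: a plain in-memory word list stands in for the words index, and a hand-written Levenshtein edit distance stands in for Lucene's fuzzy search. The names SpellingSuggester, editDistance and suggest are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a fuzzy lookup against a "words" index.
final class SpellingSuggester {
    /** Levenshtein distance between a and b, computed row by row. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                int insertOrDelete = Math.min(curr[j - 1], prev[j]) + 1;
                curr[j] = Math.min(insertOrDelete, prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns the indexed words within maxDistance edits of the input, excluding exact matches. */
    static List<String> suggest(String input, List<String> indexedWords, int maxDistance) {
        List<String> suggestions = new ArrayList<>();
        for (String word : indexedWords) {
            int d = editDistance(input, word);
            if (d > 0 && d <= maxDistance) {
                suggestions.add(word);
            }
        }
        return suggestions;
    }
}
```

Because every candidate comes from the indexed word list, a suggestion can never point at a word with no hits, which is the concern raised in the quoted message.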
RE: Did you mean...
Hi Timo,

We only deal with a small and limited KAON ontology, so I should say we use a crude approach: a StringTokenizer searching for [...] and maintaining a unique list. I assume there could be better ways if you are getting the words from another index.

Kiran.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: 12 February 2004 17:54
To: Lucene Users List
Subject: Re: Did you mean...

On Thursday 12 February 2004 09:43, Viparthi, Kiran (AFIS) wrote:
> We archived this by creating a separate index words extracting the
> complete list of words.

How were you extracting the words?

Timo
RE: umlaut normalisation
Hi, is it possible to use umlaut normalisation with Lucene? For example, Query: Hühnerstall --> Query: Huehnerstall.

Just a comment, I'm not really answering the questions you ask. I assume you can manipulate your query to remove the significance of accented characters when doing searches, such that Hühnerstall would find Huhnerstall.

I achieved this by removing accents in my searchString and making sure that the analyzer replaces accents when indexing the document as well.

This of course requires that the document was indexed with normalized umlauts. This issue is very important, because not everyone starting a search against German documents may have a German keyboard.

This brings me to the next problem. Currently only Luke delivers a result for Hühnerstall; my self-implemented solution always makes huhnerstall out of it in the query (why?). But there is no huhnerstall indexed.

regards
Thomas

Regards,
Kiran
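The two normalisations mentioned in this thread (Hühnerstall to Huehnerstall, and Hühnerstall to Huhnerstall) can be sketched in plain Java. The class and method names are made up for this example and nothing here is a Lucene API; whichever variant is chosen, the same function would have to be applied both to the text being indexed and to the query string, otherwise the two sides will not match.

```java
import java.text.Normalizer;

// Hypothetical helper showing two ways to normalise German umlauts.
final class UmlautNormalizer {
    /** German transliteration: ae, oe, ue for the umlauts and ss for sharp s. */
    static String transliterate(String s) {
        return s.replace("\u00e4", "ae").replace("\u00f6", "oe").replace("\u00fc", "ue")
                .replace("\u00c4", "Ae").replace("\u00d6", "Oe").replace("\u00dc", "Ue")
                .replace("\u00df", "ss");
    }

    /** Accent stripping: decompose each character, then drop the combining marks. */
    static String stripAccents(String s) {
        String decomposed = Normalizer.normalize(s, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }
}
```

With this sketch, transliterate turns Hühnerstall into Huehnerstall, while stripAccents turns it into Huhnerstall.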
Query expansion
We want to provide "did you mean" search suggestions on our search results pages. Most of the "did you mean" searches will be derived from synonyms, translations and other information from our ontology (KAON).

1. It would be nice to be able to navigate the Query object created by QueryParser.parse(String) and modify the Query, expanding certain clauses, prior to calling Query.toString() to create the "did you mean" searches. This would require accessor methods to navigate the query clauses and methods to actually change the Query. These do not appear to be present in the current API. To our minds the inferior alternative is to modify the QueryParser itself to do the expansion and build an expand/nonexpand instruction into the QueryParser grammar. Does anyone have better ideas?

2. A related issue is that we are basically happy with the standard Lucene QueryParser, though we need to make some minor changes to the grammar. In this case it would be convenient to create an equivalent of the Query.toString() method, outside of the Query class, that serializes according to the new grammar. The problem here is that there don't appear to be enough accessor methods in the Query classes to write a new X.toString(Query).

Richard and Kiran
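As a point of comparison for item 1, expansion can also be done on the raw query string before it reaches the parser. The sketch below is only that: it handles a simple whitespace-separated list of terms (no fields, phrases or operators), the synonym map is a stand-in for whatever the ontology would supply, and TermExpander and expand are made-up names. It does not touch Lucene's Query or QueryParser classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: rewrites each term that has synonyms into an OR group.
final class TermExpander {
    static String expand(String query, Map<String, List<String>> synonyms) {
        List<String> clauses = new ArrayList<>();
        for (String term : query.trim().split("\\s+")) {
            List<String> alternatives = synonyms.get(term);
            if (alternatives == null || alternatives.isEmpty()) {
                // No expansion known for this term: keep it unchanged.
                clauses.add(term);
            } else {
                clauses.add("(" + term + " OR " + String.join(" OR ", alternatives) + ")");
            }
        }
        return String.join(" ", clauses);
    }
}
```

The result is an ordinary query string that can be shown to the user or passed on to the parser. The obvious limitation is the one the message is about: working on strings gives no access to the parsed clause structure.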