RE: tool to check the index field

2004-11-17 Thread Viparthi, Kiran (AFIS)
Try using:

Luke: http://www.getopt.org/luke/
Limo: http://limo.sourceforge.net/
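
If a GUI tool is not handy, the field names can also be listed programmatically. A minimal sketch (an editor's illustration, not from the original reply), assuming a Lucene 1.4-era index in a local directory called "index":

import java.util.Iterator;
import org.apache.lucene.index.IndexReader;

// Minimal sketch: print every field name present in an existing index.
public class ListFields {
    public static void main(String[] args) throws Exception {
        IndexReader reader = IndexReader.open("index");
        try {
            for (Iterator it = reader.getFieldNames().iterator(); it.hasNext();) {
                System.out.println(it.next());
            }
        } finally {
            reader.close();
        }
    }
}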

Regards,
Kiran.


-Original Message-
From: lingaraju [mailto:[EMAIL PROTECTED] 
Sent: 17 November 2004 16:00
To: Lucene Users List
Subject: tool to check the index field


Hi all,

I have an index file created by other people.
Now I want to know how many fields are in the index.
Is there a third-party tool to do this?
I saw a GUI tool for this somewhere but forgot the name.

Regards
LingaRaju 

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: SELECTIVE Indexing

2004-05-19 Thread Viparthi, Kiran (AFIS)
I doubt it can be used as a plug-in, but it would be good to know whether it can.

Regards,
Kiran.

-Original Message-
From: Karthik N S [mailto:[EMAIL PROTECTED] 
Sent: 17 May 2004 12:30
To: Lucene Users List
Subject: RE: SELECTIVE Indexing


Hi

Can I use Tidy [as a plug-in] with Lucene?


with regards
Karthik

-Original Message-
From: Viparthi, Kiran (AFIS) [mailto:[EMAIL PROTECTED]
Sent: Monday, May 17, 2004 3:27 PM
To: 'Lucene Users List'
Subject: RE: SELECTIVE Indexing



Try using Tidy.
It creates a DOM Document from the HTML and allows you to apply XPath.
Hope this helps.

Kiran.

-Original Message-
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: 17 May 2004 11:59
To: Lucene Users List
Subject: SELECTIVE Indexing



Hi all

   Can somebody tell me how to index only a CERTAIN PORTION of an HTML file?

   ex:-
      <table>
         ...
      </table>


with regards
Karthik




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: SELECTIVE Indexing

2004-05-17 Thread Viparthi, Kiran (AFIS)

Try using Tidy.
It creates a DOM Document from the HTML and allows you to apply XPath.
Hope this helps.
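
For illustration (an editor's sketch, not part of the original reply), the approach could look roughly like this: JTidy builds a DOM Document from the HTML, XPath (Java 5's javax.xml.xpath here) pulls out the first table's text, and only that text is indexed. The file names, field name and XPath expression are placeholders:

import java.io.FileInputStream;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.tidy.Tidy;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

public class SelectiveIndexer {
    public static void main(String[] args) throws Exception {
        // Clean up the HTML and build a DOM tree with JTidy.
        Tidy tidy = new Tidy();
        tidy.setQuiet(true);
        tidy.setShowWarnings(false);
        Document dom = tidy.parseDOM(new FileInputStream("page.html"), null);

        // Extract only the text inside the first <table> element.
        XPath xpath = XPathFactory.newInstance().newXPath();
        String tableText = xpath.evaluate("//table[1]", dom);

        // Index just that portion of the page.
        IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), true);
        org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
        doc.add(Field.Text("contents", tableText));
        writer.addDocument(doc);
        writer.close();
    }
}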

Kiran.

-Original Message-
From: Karthik N S [mailto:[EMAIL PROTECTED] 
Sent: 17 May 2004 11:59
To: Lucene Users List
Subject: SELECTIVE Indexing



Hi all

   Can somebody tell me how to index only a CERTAIN PORTION of an HTML file?

   ex:-
      <table>
         ...
      </table>


with regards
Karthik




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Did you mean...

2004-02-16 Thread Viparthi, Kiran (AFIS)
Hi Timo,

I was referring to your previous code: you can collect all the text from the
terms.

// Open the index and walk every term in it, collecting the term text.
IndexReader reader = IndexReader.open(ram);
TermEnum te = reader.terms();
StringBuffer sb = new StringBuffer();
while (te.next()) {
    Term t = te.term();
    sb.append(t.text()).append(' ');   // separator so the text can be tokenized later
}

You can then get the tokens by running a StringTokenizer over sb.toString() and
put them into a Map, counting the occurrences.
As mentioned, I didn't use any information from the index, so I didn't use any
TokenStream, but let me check it out.
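
Since the terms coming out of a TermEnum are already unique, one way to build that occurrence Map directly (an editor's sketch, not the thread's code) is to record each term's document frequency while iterating, continuing with the reader opened above:

// Sketch: map each term's text to its document frequency.
// (Needs java.util.HashMap and java.util.Map.)
Map counts = new HashMap();
TermEnum terms = reader.terms();
while (terms.next()) {
    // docFreq() = number of documents that contain this term.
    counts.put(terms.term().text(), new Integer(terms.docFreq()));
}
terms.close();
reader.close();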

Kiran

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: 16 February 2004 11:38
To: Lucene Users List
Subject: Re: Did you mean...


On Thursday 12 February 2004 18:35, Viparthi, Kiran (AFIS) wrote:
> As mentioned, the only way I can see is to get the output of the analyzer
> directly as a TokenStream, iterate through it, and insert it into a Map.

Could you provide or point me to some example code on how to get and use a
TokenStream? The API docs are somewhat unclear to me...

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Did you mean...

2004-02-12 Thread Viparthi, Kiran (AFIS)
Hi,

We achieved this by creating a separate index of words, extracting the
complete list of words.
You can also work on the frequency if you are extracting these from other
indexes, but that could be expensive.
Rewriting the search to do a fuzzy search in the words index would give you a
better list of matching words for spelling suggestions.
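
For illustration (an editor's sketch, not part of the original message), the fuzzy lookup against such a words index might look like this, assuming each word was indexed in a field called "word" and the index lives in "words-index":

// Sketch: suggest corrections by running a fuzzy query against the words index.
// (Uses org.apache.lucene.search.IndexSearcher, FuzzyQuery, Hits and org.apache.lucene.index.Term.)
IndexSearcher searcher = new IndexSearcher("words-index");
Hits hits = searcher.search(new FuzzyQuery(new Term("word", "retreival")));
for (int i = 0; i < hits.length() && i < 5; i++) {
    System.out.println(hits.doc(i).get("word"));   // candidate spellings, best matches first
}
searcher.close();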

Kiran.


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: 12 February 2004 08:48
To: Lucene Users List
Subject: Re: Did you mean...


On Thursday 12 February 2004 00:15, Matt Tucker wrote:
> We implemented that type of system using a spelling engine by
> Wintertree:
>
> http://www.wintertree-software.com
>
> There are some free Java spelling packages out there too that you
> could likely use.

But this does not ensure that the word really exists in the index. The words
that Google proposes, however, do exist.

Regards
Timo

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: Did you mean...

2004-02-12 Thread Viparthi, Kiran (AFIS)
Hi Timo,

As we deal with just a small and limited KAON ontology, I should say we use a
crude approach: a StringTokenizer searching for ... and maintaining a unique
list.

But I assume there could be other, better ways if you are getting the words
from another index.
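
For what it's worth, a minimal sketch of that crude approach (an editor's illustration; ontologyText stands in for whatever text is pulled from KAON):

// Sketch: tokenize the extracted text and keep a unique, sorted word list.
// (Uses java.util.Set, java.util.TreeSet and java.util.StringTokenizer.)
Set uniqueWords = new TreeSet();
StringTokenizer st = new StringTokenizer(ontologyText);
while (st.hasMoreTokens()) {
    uniqueWords.add(st.nextToken().toLowerCase());
}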

Kiran.

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
Sent: 12 February 2004 17:54
To: Lucene Users List
Subject: Re: Did you mean...


On Thursday 12 February 2004 09:43, Viparthi, Kiran (AFIS) wrote:
> We achieved this by creating a separate index of words, extracting the
> complete list of words.

How were you extracting the words?

Timo

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: umlaut normalisation

2004-01-27 Thread Viparthi, Kiran (AFIS)

> Hi,
>
> Is it possible with Lucene to do umlaut normalisation?
> For example Query: Hühnerstall -- Query: Huehnerstall.

Just a comment; I'm not really answering the questions you ask.

I assume you can manipulate your query to remove the significance of accented
characters when doing searches, such that Hühnerstall would find Huhnerstall.
I achieved this by removing accents in my search string and making sure that
the analyzer replaces accents when indexing the documents as well.
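
A rough sketch of such a normalisation step (an editor's illustration, not the poster's actual code); the same helper is applied to the document text before indexing and to the search string before parsing:

// Sketch: fold German umlauts the same way on the indexing and the query side.
public static String normalizeUmlauts(String s) {
    return s.replaceAll("ä", "ae").replaceAll("ö", "oe").replaceAll("ü", "ue")
            .replaceAll("Ä", "Ae").replaceAll("Ö", "Oe").replaceAll("Ü", "Ue")
            .replaceAll("ß", "ss");
}

// normalizeUmlauts("Hühnerstall") -> "Huehnerstall"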



> This of course requires that the document was indexed with normalised
> umlauts. This issue is very important, because not everyone starting a
> search against German documents may have a German keyboard.
>
> This brings me to the next problem. Currently only Luke delivers results
> for Hühnerstall; my self-implemented solution always turns it into
> huhnerstall in the query (why?). But there is no huhnerstall indexed.
>
> regards Thomas

Regards,
Kiran



Query expansion

2003-12-18 Thread Viparthi, Kiran (AFIS)
We want to provide "did you mean" search suggestions on our search results
pages. Most of the "did you mean" searches will be derived from synonyms,
translations and other information from our ontology (KAON).
 
 1. It would be nice to be able to navigate the Query object created by
QueryParser.parse(String) and modify the Query, expanding certain clauses
prior to calling Query.toString() to create the "did you mean" searches. This
would require accessor methods to navigate the query clauses and methods to
actually change the Query; these do not appear to be present in the current
API. To our minds, the inferior alternative is to modify the QueryParser
itself to do the expansion and build an expand/no-expand instruction into the
QueryParser grammar. Does anyone have better ideas? (See the sketch after
item 2 below.)
 
 2. A related issue is that we are basically happy with the standard Lucene
QueryParser, though we need to make some minor changes to the grammar. In
this case it would be convenient to create an equivalent of the
Query.toString() method that serialises to the new grammar, outside of the
Query class. The problem here is that there don't appear to be enough
accessor methods in the Query classes to write a new X.toString(Query).
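
Purely to illustrate the kind of traversal described in item 1 (an editor's sketch, not an existing API): with the accessors that do exist, BooleanQuery.getClauses() and TermQuery.getTerm(), a top-level synonym expansion might look roughly like this; other query types run into exactly the missing-accessor problem described above. The getSynonyms() helper is hypothetical.

// Sketch: expand each term clause with synonyms pulled from the ontology.
// (Uses org.apache.lucene.search.BooleanQuery, BooleanClause, TermQuery, Query
//  and org.apache.lucene.index.Term.)
public static Query expand(Query q) {
    if (q instanceof TermQuery) {
        Term t = ((TermQuery) q).getTerm();
        BooleanQuery expanded = new BooleanQuery();
        expanded.add(q, false, false);                       // keep the original term
        String[] synonyms = getSynonyms(t.text());           // hypothetical ontology lookup
        for (int i = 0; i < synonyms.length; i++) {
            expanded.add(new TermQuery(new Term(t.field(), synonyms[i])), false, false);
        }
        return expanded;
    }
    if (q instanceof BooleanQuery) {
        BooleanClause[] clauses = ((BooleanQuery) q).getClauses();
        BooleanQuery rewritten = new BooleanQuery();
        for (int i = 0; i < clauses.length; i++) {
            rewritten.add(expand(clauses[i].query), clauses[i].required, clauses[i].prohibited);
        }
        return rewritten;
    }
    return q;   // other query types: no accessors available to rewrite them
}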
 
 Richard and Kiran