Problem in unicode field value retrieval

2002-06-10 Thread Harpreet S Walia

Hi,

I am trying to index and search Unicode (UTF-8) text. The code I am using to index the
documents is as follows:

/**/
// Create a new index (third argument true = overwrite any existing index)
IndexWriter iw = new IndexWriter("d:\\jakarta-tomcat3.2.3\\webapps\\lucene\\index",
        new SimpleAnalyzer(), true);
String dirBase = "d:\\jakarta-tomcat3.2.3\\webapps\\lucene\\docs";
File docDir = new File(dirBase);
String[] docFiles = docDir.list();
InputStreamReader isr;
InputStream is;
Document doc;
for (int i = 0; i < docFiles.length; i++) {
    File tempFile = new File(dirBase + "\\" + docFiles[i]);
    if (tempFile.isFile()) {
        System.out.println("Indexing File : " + docFiles[i]);
        // Decode the file contents as UTF-8
        is = new FileInputStream(tempFile);
        isr = new InputStreamReader(is, "utf-8");
        doc = new Document();
        doc.add(Field.UnIndexed("path", tempFile.toString()));
        doc.add(Field.Text("abc", (Reader) isr));
        doc.add(Field.Text("all", "sansui"));
        iw.addDocument(doc);
        is.close();
        isr.close();
        doc = null;
    }
}
iw.close();
is = null;
isr = null;
iw = null;
docDir = null;

System.out.println("Indexing Complete");

/**/

Now when I search the index and try to get the field called "abc" by calling
doc.get("abc"), I get null as the output.

Can anyone please tell me where I am going wrong?

Thanks And Regards
Harpreet



Re: Problem in unicode field value retrieval

2002-06-10 Thread Ian Lea

I don't think you can retrieve the contents of Fields that have
been loaded by a Reader.  From the javadoc for Field:

Text(String name, Reader value)

   Constructs a Reader-valued Field that is tokenized and indexed, but is
   not stored in the index verbatim.


--
Ian.
[EMAIL PROTECTED]


--
Searchable personal storage and archiving from http://www.digimem.net/
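
Ian's point about Reader-valued fields suggests one possible workaround: decode the
file into a String first and hand that String to Lucene. The sketch below shows only
the decoding step, using the standard library. The class and method names (Utf8Text,
readAll) are made up for illustration, and the commented Lucene call assumes the
Field.Text(String, String) overload, which the Field javadoc describes as stored in
the index. Storing whole files this way makes the index larger, so treat it as a
sketch rather than a recommendation.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8Text {
    /** Decodes the whole stream as UTF-8 and returns it as one String. */
    static String readAll(InputStream in) {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        try (Reader r = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            int n;
            while ((n = r.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Stand-in for a FileInputStream over one of the documents.
        byte[] bytes = "caf\u00e9 \u0928\u092e\u0938\u094d\u0924\u0947"
                .getBytes(StandardCharsets.UTF_8);
        String text = readAll(new ByteArrayInputStream(bytes));
        System.out.println(text.length() + " chars decoded");
        // Assumption: with Lucene on the classpath, the String overload could
        // replace the Reader one so that doc.get("abc") returns the text:
        //   doc.add(Field.Text("abc", text));
    }
}
```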



--
To unsubscribe, e-mail:   mailto:[EMAIL PROTECTED]
For additional commands, e-mail: mailto:[EMAIL PROTECTED]


Re: Problem in unicode field value retrieval

2002-06-10 Thread Harpreet S Walia

Hi,

That was the problem, thanks :-). I am still struggling to get Lucene to
search non-English Unicode content. It works partially with SimpleAnalyzer
but doesn't return any results with StandardAnalyzer. Is there a way to
output the exact contents that are going into the index?

Thanks and regards,
Harpreet






Re: Problem in unicode field value retrieval

2002-06-10 Thread Otis Gospodnetic

Hello,

 That was the problem, thanks :-). I am still struggling to get Lucene to
 search non-English Unicode content. It works partially with SimpleAnalyzer
 but doesn't return any results with StandardAnalyzer. Is there a way to
 output the exact contents that are going into the index?

Perhaps something like this will help.  This is a very recent post from
the searchable mailing list archives at http://nagoya.apache.org/:

http://nagoya.apache.org/eyebrowse/ReadMsg?[EMAIL PROTECTED]msgId=352570

Otis






Re: Within Search

2002-06-10 Thread Otis Gospodnetic

Hello,

I'm sending this to the lucene-user list, as that seems more appropriate.
I haven't used Lucene's slop feature, but it looks like both
QueryParser and PhraseQuery support slop.  I am not sure what
the syntax for it is, but if nothing else you should be able to call
the setSlop(int) method on an instance of PhraseQuery.

Oh, it looks like you missed it in the Query Parser Syntax document:
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

Otis


--- none none [EMAIL PROTECTED] wrote:
 Hi,
 I asked for help with this feature some time ago but got no answer.
 What I need is a WithinPhraseSearch. For example:
 
 search for:  car w/10 rent
 
 This means: look for documents that contain 'car' and, within 10 words
 of it, 'rent'. So what I think I need is:
 
 1. Change QueryParser.jj to recognize the operator w/xx as the
    within operator.
 
 2. Have the QueryParser return a PhraseQuery with a slop factor
    equal to 10 for the example above. It should also ignore w/xx if xx
    is not numeric.
 
 Another question: what should I do if I want the query operators
 (AND, OR, NOT, etc.) to be case insensitive? What should I change inside
 QueryParser.jj?
 
 PLEASE HELP, because I really don't know how to use the JavaCC
 utility.
 
 Thanks,
 bye.
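
The "car w/10 rent" semantics described in the quoted question can be illustrated
without Lucene. The sketch below is only a stand-in for what a sloppy PhraseQuery
does inside the index: it treats "within n words" as "token positions differ by at
most n", which is an assumption and not Lucene's exact slop computation. The class
and method names (WithinSketch, tokens, within) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class WithinSketch {
    /** Lowercases and splits on runs of non-letters; a crude stand-in for an analyzer. */
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    /** True if some occurrence of a and some occurrence of b are at most n positions apart. */
    static boolean within(String text, String a, String b, int n) {
        String first = a.toLowerCase(Locale.ROOT);
        String second = b.toLowerCase(Locale.ROOT);
        List<String> toks = tokens(text);
        for (int i = 0; i < toks.size(); i++) {
            if (!toks.get(i).equals(first)) {
                continue;
            }
            // Scan the window of n positions on either side of this occurrence.
            int lo = Math.max(0, i - n);
            int hi = Math.min(toks.size() - 1, i + n);
            for (int j = lo; j <= hi; j++) {
                if (j != i && toks.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String text = "Rent a car today";
        System.out.println(within(text, "car", "rent", 2));
        System.out.println(within(text, "car", "rent", 1));
    }
}
```

In a real index the positions come from the analyzer at indexing time, so the
distance that matters is measured in analyzed tokens, not in raw words.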
 
  
 
 




Re: Within Search

2002-06-10 Thread none none

Thanks, I read the QueryParser documentation and it works fine.
Now, how can I make the query operators like 'AND', 'OR', etc. case insensitive? Also,
how can I change the '~' to 'w/'?
I really don't know how to use JavaCC, but maybe it is easy for someone else.
Can someone help me?
Thank you.
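
One way to get both behaviours without touching QueryParser.jj might be to rewrite
the query string before handing it to QueryParser. The sketch below is a hedged
illustration of that idea, not a grammar change: it turns `car w/10 rent` into the
documented proximity form `"car rent"~10` and upper-cases and/or/not. The class and
method names (QueryRewriteSketch, rewrite) are hypothetical. A non-numeric w/xx is
simply left untouched, and the operator rewrite is naive: it would also upper-case
the word "and" inside a quoted phrase.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QueryRewriteSketch {
    // term w/N term, where N is one or more digits
    private static final Pattern WITHIN =
            Pattern.compile("(\\S+)\\s+w/(\\d+)\\s+(\\S+)", Pattern.CASE_INSENSITIVE);
    // the three boolean operators as whole words, in any case
    private static final Pattern OPERATOR =
            Pattern.compile("\\b(and|or|not)\\b", Pattern.CASE_INSENSITIVE);

    static String rewrite(String query) {
        // car w/10 rent  ->  "car rent"~10
        String q = WITHIN.matcher(query).replaceAll("\"$1 $3\"~$2");
        // and / Or / not  ->  AND / OR / NOT
        Matcher m = OPERATOR.matcher(q);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            m.appendReplacement(sb, m.group(1).toUpperCase(Locale.ROOT));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("car w/10 rent"));
        System.out.println(rewrite("apple and pie or not cake"));
    }
}
```

The rewritten string would then be passed to QueryParser as usual, so the slop
handling stays entirely inside Lucene.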




Re: Within Search

2002-06-10 Thread Peter Carlson

This is a bit more complicated.

We had a discussion a while ago about adding a NEAR operator. The
queryParser.jj showing how to do this is on the developer mailing list.

The problem is that the solution is not generic. That is, what if a term is
a wildcard or a more complicated subquery (a query in parentheses)?

For example:
(a AND b) w/ c

This type of query is not supported by the Lucene slop factor. That's why
it's not supported in Lucene as part of the general QueryParser.

If you are willing to live with these limitations, the queryParser.jj with
the NEAR operator should work.

--Peter 






How does simple analyser work

2002-06-10 Thread Harpreet S Walia

Hi,

Are there any resources available that explain how SimpleAnalyzer processes the
data given to it?
Suppose I have a set of words: what exact rules are applied to tokenize and index
these words, and how can I customize them?

My requirement is that the words be broken only at spaces and not at any other
character. I understand that this can be done by writing a parser in JavaCC, but is
there any simpler way of achieving this?

I would really appreciate the help.

Thanks and regards
Harpreet
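
The difference between the two splitting rules in the question can be seen in a
small standard-library sketch. As far as I know, SimpleAnalyzer breaks text at
every non-letter character and lowercases the result; letterTokens below only
approximates that with Character.isLetter, and whitespaceTokens shows the
"break only at spaces" rule. Both method names and the class name are hypothetical.
If your Lucene version ships WhitespaceAnalyzer, that is probably the closest
ready-made match for the second rule, with no JavaCC involved.

```java
import java.util.ArrayList;
import java.util.List;

final class TokenizeSketch {
    /** Approximates letter-based tokenizing: split at non-letters, lowercase. */
    static List<String> letterTokens(String text) {
        List<String> out = new ArrayList<String>();
        StringBuilder cur = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                cur.append(Character.toLowerCase(c));
            } else if (cur.length() > 0) {
                // A non-letter ends the current token.
                out.add(cur.toString());
                cur.setLength(0);
            }
        }
        if (cur.length() > 0) {
            out.add(cur.toString());
        }
        return out;
    }

    /** Splits only at runs of whitespace and keeps every other character. */
    static List<String> whitespaceTokens(String text) {
        List<String> out = new ArrayList<String>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "Wi-Fi v2.0 ready";
        System.out.println(letterTokens(text));
        System.out.println(whitespaceTokens(text));
    }
}
```

Whichever rule is used for indexing should also be used when parsing queries,
otherwise the query terms will not line up with the indexed tokens.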