Re: Tool for analyzing analyzers

2004-05-28 Thread markharw00d
Hi Erik,
I've had this running OK from the command line and in Eclipse on XP.
I suspect it might be because you're running a different OS. The Classfinder tries
to split the system property java.class.path on the ';' character, but I forgot that
different OSes have different separators.

As for Luke etc., I had a vague notion that this could be extended into a more
generalised workbench for Lucene that could also help with indexing.
Using a plug-in architecture (once we get classloading sorted!) you could define
interfaces for things such as fetchers (db/file/web) and parsers (PDF/Word...) and
configure them to create indexes using a GUI like this, or a web-based interface.
People could then contribute plug-in implementations as Jars that you could just
drop into the workbench.

Let me know your setup details and I'll try to fix the classloader issue.

Cheers
Mark

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: Tool for analyzing analyzers

2004-05-28 Thread Morus Walter
Hi Mark,

> I've had this running OK from the command line and in Eclipse on XP.
> I suspect it might be because you're running a different OS? The Classfinder tries
> to split the system property java.class.path on the ; character but I forgot
> different OSes have different separators.
>
> Let me know your setup details and I'll try to fix the classloader issue.
 
I have the same problem, and I am running on Linux, which uses ':' to separate
the class path entries...

BTW: I tried to compile your sources but you left out the part in thinlet.
  2928 Sun Oct 12 19:47:56 CEST 2003 thinlet/AppletLauncher.class
  2643 Sun Oct 12 19:47:56 CEST 2003 thinlet/FrameLauncher.class
 74823 Sun Oct 12 19:47:56 CEST 2003 thinlet/Thinlet.class
Was that intentional?

Morus




Re: Range Query Sombody HELP please

2004-05-28 Thread Ype Kingma
Karthik,

On Friday 28 May 2004 05:54, Karthik N S wrote:

...
> When we do a search in SQL using '*' we all know that the result would be
> the total number of records in the table, but when we want to limit our records
> we apply a range between 2 specific row records [which we call a
> subsearch].
>
> Similarly, on an indexed record I would like to perform the same technique
> as above.

In case you need to reuse the limitation, a filter is the way to go in Lucene.
However, it seems better to get the range query working first.

> In fact I was looking at the URL you sent me in the last mail on using
> getRange queries and was working on the same:
>
> http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

The query I gave uses two +'s prefixed to the query parts:

+search_word +(book:[100 TO 200])

Both query parts are required because of the +'s, i.e. it works
like the AND operator in SQL. The TO operator queries the range
in the book field.

> and
>
> http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
>
> but without results for the last 12 hrs.

You have probably seen a lot of different things that will be useful later.

> If you could spare a few minutes, please explain or provide a simple [full]
> example using and overriding the getRange() method.

The problem you'll probably run into is that Lucene does not
support numbers directly; you'll have to index them as strings,
e.g. by prefixing zeros, as Erik indicated:
http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

You may have to reindex your data for this. In case you have a lot of data,
consider setting up a test first.

Then in the getRangeQuery() method of your parser you'll need to prefix the queried
numbers in the same way. The example in the article is about date fields,
but the adaptation to numbers shouldn't be a problem.

When you override this in your query parser:
getRangeQuery(String field, Analyzer analyzer, String start, String end, boolean inclusive)
it will be called for the example query with start = 100 and end = 200.

(See http://today.java.net/pub/a/today/2003/11/07/QueryParserRules.html
under Customizing query parser).

In the overriding method you can then call the super method with the
start and end prefixed with zeros, as indicated in the page on searching
numerical fields referred to above.
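
A minimal sketch of the zero-prefixing step, kept free of Lucene calls so it runs on its own. The class name PaddedRange, the method pad() and the width of 8 are assumptions made for illustration; the width has to match whatever was used when the numbers were indexed.

```java
class PaddedRange {
    // Hypothetical fixed width; it must match the width used at index time.
    static final int WIDTH = 8;

    /** Left-pads a non-negative decimal string with zeros up to WIDTH characters. */
    static String pad(String number) {
        StringBuilder sb = new StringBuilder();
        for (int i = number.length(); i < WIDTH; i++) {
            sb.append('0');
        }
        return sb.append(number).toString();
    }

    public static void main(String[] args) {
        // Unpadded strings compare character by character, so "1000" sorts before "200".
        System.out.println("1000".compareTo("200") < 0);            // true
        // After padding, string order agrees with numeric order.
        System.out.println(pad("1000").compareTo(pad("200")) > 0);  // true
        // The bounds that would be handed on for the example query.
        System.out.println(pad("100") + " TO " + pad("200"));       // 00000100 TO 00000200
    }
}
```

In a query parser subclass, the overriding getRangeQuery() would pass pad(start) and pad(end) on to the super method; that wiring is left out here because it depends on the Lucene version in use.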

Have fun, you'll get it working,

Ype

> with regards
> Karthik
>
> -Original Message-
> From: Ype Kingma [mailto:[EMAIL PROTECTED]
> Sent: Thursday, May 27, 2004 11:03 PM
> To: [EMAIL PROTECTED]
> Subject: Re: Range Query Sombody HELP please
>
> On Thursday 27 May 2004 09:37, Karthik N S wrote:
> > Hi
> > Lucene -Developer My main intention was
> >
> > Search for a word hit in a Unique Field between ranges say
> > book100 - book 200 indexed numbers
> > It's something like creating a SUBSEARCH within the SEARCHINDEX.
> ...
> Could you explain what you mean by subsearch?
> I suppose you might want to have a look at the various filter classes
> in the org.apache.lucene.search package.
>
> Regards,
> Ype





Range Query Sombody HELP please

2004-05-28 Thread Karthik N S
Hey Ype,

Thanks for the advice, but I still need to get my exact situation working:

1) I have a unique Field [called filename] which is indexed as type Text.
   It takes the name of the HTML file as the indexing parameter.
   There is also another Field called Contents, which stores all the
   contents of that uniquely named HTML file.

2) The indexer completes indexing of about 5000 HTML files successfully.

3) When I do a search for a word, it returns 400 hits on various HTML
   files.

Now in this situation, if I want to limit the hits to between the first 200 and
400 HTML page names only,
what exactly should I do using the getRange() method?


Please advise on how to proceed ...


with regards
Karthik





Re: Range Query Sombody HELP please

2004-05-28 Thread Erik Hatcher
On May 28, 2004, at 4:54 AM, Karthik N S wrote:
> 1) I have a unique Field [ called filename ] which is indexed of type
> Text.

You probably do not want to use Field.Text for a filename.  Use
Field.Keyword instead.

> 2) The indexer complete indexes for about 5000 html files successfully.

Now use Luke (Google for _luke lucene_) to browse your index, and check
that you are getting what you think.  You can do ad-hoc queries there
also.

> Now in this situation if I want to limit the hits between First 200
> to 400 html Page Names only
> what exactly should I do to using getRange() method.

If you want the first 200 - 400, start your Hits walking at index 200,
and proceed through 400.

Is there some field you want to key off to do the range?  Or do you
just want the 200th - 400th hits from the search, which is an entirely
different question than about ranges.
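
A minimal sketch of walking a window of results, assuming the hits have already been collected into a java.util.List in rank order. HitWindow and window() are hypothetical names; with Lucene's Hits object the same bounds would drive a loop over hits.doc(i), stopping at hits.length().

```java
import java.util.ArrayList;
import java.util.List;

class HitWindow {
    /** Returns the hits at positions [from, to), clamped to the number of hits available. */
    static <T> List<T> window(List<T> hits, int from, int to) {
        int start = Math.max(0, Math.min(from, hits.size()));
        int end = Math.max(start, Math.min(to, hits.size()));
        return new ArrayList<>(hits.subList(start, end));
    }

    public static void main(String[] args) {
        // Stand-in for a ranked result list: 450 file names.
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < 450; i++) {
            hits.add("page" + i + ".html");
        }
        List<String> page = window(hits, 200, 400);
        System.out.println(page.size());   // 200
        System.out.println(page.get(0));   // page200.html
    }
}
```

The clamping matters when the search returns fewer hits than the upper bound: with only 300 hits, the window from 200 to 400 holds 100 entries instead of failing.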

> Please advise on how to proceed ...

Please send (succinct) code examples in the future to really keep this
discussion concrete and clear.

Erik


Re: Tool for analyzing analyzers

2004-05-28 Thread Erik Hatcher
On May 28, 2004, at 2:46 AM, [EMAIL PROTECTED] wrote:
> Hi Erik,
> I've had this running OK from the command line and in Eclipse on XP.
> I suspect it might be because you're running a different OS? The
> Classfinder tries to split the system property
> java.class.path on the ; character but I forgot different OSes
> have different separators.

There is another OS other than Mac OS X?  :)

There is a File constant that gives you the OS-specific separator:
File.pathSeparatorChar.
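
A minimal sketch of a portable split, assuming nothing about the actual Classfinder code; ClassPathSplitter and split() are hypothetical names. Only System.getProperty("java.class.path") and File.pathSeparatorChar come from the JDK.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

class ClassPathSplitter {
    /** Splits a class path string on the given separator, skipping empty entries. */
    static List<String> split(String classPath, char separator) {
        List<String> entries = new ArrayList<>();
        int start = 0;
        for (int i = 0; i <= classPath.length(); i++) {
            if (i == classPath.length() || classPath.charAt(i) == separator) {
                if (i > start) {
                    entries.add(classPath.substring(start, i));
                }
                start = i + 1;
            }
        }
        return entries;
    }

    public static void main(String[] args) {
        // File.pathSeparatorChar is ';' on Windows and ':' on Unix-like systems.
        String classPath = System.getProperty("java.class.path");
        for (String entry : split(classPath, File.pathSeparatorChar)) {
            System.out.println(entry);
        }
    }
}
```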

> Using a plug-in architecture (once we get classloading sorted!) you
> could define interfaces for
> things such as fetchers (db/file/web) and parsers (PDF/Word..) and
> configure them to create indexes
> using a GUI like this, or a web-based interface. People could then
> contribute plug-in implementations
> as Jars that you could just drop in to the workbench.

Sounds like we'd be re-inventing Nutch :)

But I'd love to build a Lucene demo application that is powerful
enough to be used as a foundation for folks to use out-of-the-box.

Erik


Re: Tool for analyzing analyzers

2004-05-28 Thread Zilverline info
Hi Erik,

Erik Hatcher wrote:
[snip]
> But I'd love to build a Lucene demo application that is powerful
> enough to be used as a foundation for folks to use out-of-the-box.

That's just what I thought. Here's one: http://www.zilverline.org

Cheers,
   Michael Franken


Re: Range Query Sombody HELP please

2004-05-28 Thread Ype Kingma
On Friday 28 May 2004 10:54, Karthik N S wrote:
> Hey Ype
>
> Thx for the advice but still I need to get the exact situation working,
>
> 1) I have a unique Field [ called filename ] which is indexed of type Text.
> It accepts the name of the HTML files as the indexing parameter,
> Also there is another Field called Contents which stores all the
> contents of that indicated unique named html file.
>
> 2) The indexer complete indexes for about 5000 html files successfully.
>
> 3) When I do a search for word, it returns a hit of 400 on various html
> files
>
> Now in this situation if I want to limit the hits between First 200 to
> 400 html Page Names only
> what exactly should I do to using getRange() method.

A range query will provide a range of indexed values, and
I thought you needed to add the record number as an indexed field
in each record.

However, you seem to use the 200 and 400 here as the order number
for each record in the result of the query on the Contents field.
Is that correct?
If so, in which order do you expect the results of your query?

Kind regards,
Ype





Exact Field Match

2004-05-28 Thread Reece . 1247688
Hi,

Does Lucene have support for exact field match?  Is there a way to
say that this field equals exactly this value?  I know I can do it by using
an untokenized field.  But I have some values that I would want to store in
both tokenized and untokenized copies of the same field.  Instead of doing
that, I'm just storing the tokenized version.

For example:

MyField = "My value."

I want to search where "My value." is the exact match for this
field, but I also sometimes want to do a containing search so that just a query
for "value" matches.

I'm planning on extracting the stored value and
comparing it to see if it's an exact match.  If you have a better idea please
send it my way!
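
A minimal sketch of that post-filtering idea, assuming the stored field values of the hits from the tokenized search have already been read into a list. ExactMatchPostFilter and keepExact() are hypothetical names and no Lucene calls are shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ExactMatchPostFilter {
    /** Keeps only the stored values that equal the wanted value exactly. */
    static List<String> keepExact(List<String> storedValues, String wanted) {
        List<String> exact = new ArrayList<>();
        for (String stored : storedValues) {
            if (wanted.equals(stored)) {
                exact.add(stored);
            }
        }
        return exact;
    }

    public static void main(String[] args) {
        // Stored values of hits that all contain the token "value".
        List<String> stored = Arrays.asList("My value.", "My value. And more", "Another value");
        System.out.println(keepExact(stored, "My value."));  // [My value.]
    }
}
```

The comparison is case-sensitive and runs over every hit, so it costs time proportional to the number of hits from the tokenized search.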



Thanks,

Reece




RE: Exact Field Match

2004-05-28 Thread Gus Kormeier
Yes, you can.  
And others probably have a much better example than mine...

There is probably a wiki or other document describing it.

You can chain queries together with BooleanQuery.  I am creating a Vector of
Query objects based on restriction criteria from my site and then loading them into
the BooleanQuery.
You might check out WildcardQuery, which works well with or without wildcard
characters inside it.
-Gus



    QueryParser qp = new QueryParser("contents", analyzer);
    qp.setOperator(DEFAULT_OPERATOR);
    Query query = qp.parse(queryline);

    if (vFilters != null && vFilters.size() > 0) {
        // Require the parsed query and every filter query to match.
        BooleanQuery bq = new BooleanQuery();
        bq.add(query, true /*required*/, false /*not prohibited*/);
        Enumeration filters = vFilters.elements();
        while (filters.hasMoreElements()) {
            bq.add((Query) filters.nextElement(), true /*required*/, false /*not prohibited*/);
        }
        hits = searcher.search(bq);
    } else {
        hits = searcher.search(query);
    }
...

    public void setFilter(String fieldname, String fieldvalue) {
        if (fieldname != null && fieldvalue != null &&
                fieldname.length() > 0 && fieldvalue.length() > 0) {
            // If the value has no '?' wildcard, append one.
            if (fieldvalue.indexOf("?") == -1) {
                fieldvalue += "?";
            }
            Term t = new Term(fieldname, fieldvalue);
            WildcardQuery tq = new WildcardQuery(t);
            Filter myfilter = new QueryFilter(tq);
            setFilter(myfilter);
            vFilters.addElement(tq);
        }
    }
