Re: searching using the CJKAnalyzer

2004-10-12 Thread Daan Hoogland
Jon Schuster wrote:

I didn't need to make any changes to Entities to get Japanese searches working. Are 
you using the CJKAnalyzer when you perform the search, not only when building the 
index?
  

Yes, I use CJKAnalyzer throughout. When searching, I have to translate 
character entities in order to find anything. When displaying search 
results, I don't see anything that looks like part of an East Asian 
character set. Instead I see accented Latin characters and mathematical symbols.

By the way, when I don't pass entities, things get really nasty:
query passed: ??
 char(, LATIN_1_SUPPLEMENT)  char(?, LATIN_1_SUPPLEMENT) token found : 
  length: 1
 char(?, LATIN_1_SUPPLEMENT)  char(, LATIN_1_SUPPLEMENT)  char(, 
LATIN_1_SUPPLEMENT) token found :  length: 1
 char(, LATIN_1_SUPPLEMENT) searching contents: 

This was a query for two Japanese characters.

-Original Message-
From: Daan Hoogland [mailto:[EMAIL PROTECTED] 
Sent: Sunday, October 10, 2004 10:48 PM
To: Lucene Users List
Subject: Re: searching using the CJKAnalyzer
Importance: Low


Che Dong wrote:

  

This seems to be an HTML parser charset-detection error rather than an analyzer problem.

Could you show me the details of the problem?



Thanks Che,
I got it working by making the decode() method of the Entities class in the demo 
public. I wrote a scanner to translate any entities in the query.
I want to translate back to entities in the results, but I'm not sure 
what the criteria should be. It seems to be just binary data.
How can I conclude that 04?03?04 means ?
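
One possible criterion, sketched here under the assumption that the results are already held as decoded Java Strings rather than raw bytes: leave ASCII untouched and write every other code point as a decimal numeric character reference. The class and method names are made up for illustration and are not part of the Lucene demo.

```java
final class EntityEncoder {
    /** Replaces every non-ASCII code point with a decimal reference such as &#26085;. */
    static String toNumericEntities(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (cp < 128) {
                out.append((char) cp);                     // plain ASCII passes through
            } else {
                out.append("&#").append(cp).append(';');   // everything else becomes &#N;
            }
            i += Character.charCount(cp);                  // step over surrogate pairs as one unit
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Two kanji written as escapes so the source file encoding does not matter.
        System.out.println(toNumericEntities("\u65e5\u672c"));
    }
}
```

If the text looks like "just binary data", the bytes were probably never decoded with the right charset, and no criterion applied afterwards will recover the characters.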

  

Thanks

Che Dong

Daan Hoogland wrote:



LS,
in
http://issues.apache.org/eyebrowse/ReadMsg?listId=30&msgNo=8980
Jon Schuster explains how to get a Japanese search system working. I 
followed his advice and got an index that Luke shows as what I 
expected it to be.
I don't know how to enter a search so that it gets passed to the 
engine properly. It works in Luke but not in WebLucene or in my own app.


  

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]







  






Re: Indexing Strategy for 20 million documents

2004-10-12 Thread Otis Gospodnetic

--- Christoph Kiehl [EMAIL PROTECTED] wrote:

 Otis Gospodnetic wrote:
 
  I would try putting everything in a single index first, and split it up
  only if I see performance issues.
 
 Why would you put everything into a single index? I found some benchmark
 results on the list (starting with your post from 06/08/04) from which I
 got the impression that the performance loss is very small if I choose
 to search in multiple indexes with MultiSearcher instead of using one
 big index.

I think it's simpler to deal with a single index.  One directory, one
set of lock files, etc.  If you don't gain anything by having multiple
indices, why have them?

  Going from 1 index to N indices is
  not a lot of work (not a lot of Lucene-related code). 
 
 How do you get from 1 index to N indices without adding the documents again?

Yes, you would have to re-create N Lucene indices.

Otis





Re: searching using the CJKAnalyzer

2004-10-12 Thread Che Dong
CJKAnalyzer does not support single-byte streams. Both the front-end 
interface and the back-end indexing process need to convert the source into 
a proper double-byte character stream before searching or indexing.
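
A minimal sketch of the effect, not taken from the thread's code: the same six bytes are decoded once with the wrong charset and once with the right one. The class and method names are illustrative, and UTF-8 is only an assumption about how the query bytes were encoded.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class CharsetDemo {
    /** Turns raw bytes into a character string using the given charset. */
    static String decode(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    /** True if the string is non-empty and every char is in the CJK Unified Ideographs block. */
    static boolean allCjk(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.UnicodeBlock.of(s.charAt(i))
                    != Character.UnicodeBlock.CJK_UNIFIED_IDEOGRAPHS) {
                return false;
            }
        }
        return !s.isEmpty();
    }

    public static void main(String[] args) {
        // Two kanji, written as escapes; in UTF-8 they occupy three bytes each.
        byte[] raw = "\u65e5\u672c".getBytes(StandardCharsets.UTF_8);
        String wrong = decode(raw, StandardCharsets.ISO_8859_1); // one char per byte
        String right = decode(raw, StandardCharsets.UTF_8);      // one char per kanji
        System.out.println(wrong.length() + " chars, all CJK: " + allCjk(wrong));
        System.out.println(right.length() + " chars, all CJK: " + allCjk(right));
    }
}
```

Decoded as ISO-8859-1, every byte becomes a separate char in the range U+0080 to U+00FF, which is the LATIN_1_SUPPLEMENT block reported in the diagnostic output earlier in this thread.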

Please let me know the output of
http://www.chedong.com/tech/HelloUnicode.java
when compiled with javac -encoding gb2312 and with javac -encoding iso-8859-1
Regards
Che Dong


Multisearcher question

2004-10-12 Thread Sreedhar, Dantam
Hi,

Index side information:

No. of indexes: two (for clarity, I call them index_a and index_b).

Fields in index_a: x and y.
Fields in index_b: y and z.

I have written multi-search code like this:

Searcher search_a = new IndexSearcher(LOCATION_OF_INDEX_A);
Searcher search_b = new IndexSearcher(LOCATION_OF_INDEX_B);
// The array needs its own name: reusing "searcher" for both the array
// and the MultiSearcher does not compile.
Searcher[] searchers = new Searcher[2];
searchers[0] = search_a;
searchers[1] = search_b;
MultiSearcher searcher = new MultiSearcher(searchers);

I am getting the following results,

x:query  - WORKS
x:query AND y:query - WORKS
x:query AND z:query - DOESN'T WORK

Is this expected behavior?

My question is: can MultiSearcher be used to search indexes with
different fields? If yes, could you please correct the above code?

Thanks,
-Sreedhar





Special field values

2004-10-12 Thread Michael Hartmann
Hi everybody,

I am thinking about extending the Lucene search with metadata in the
following way

Field   Value
-----   -----
Title   (n1, n2, n3, ..., nm), where each ni is an element of {0,1} and
        m is the number of distinct metadata values for title

Expressed in an informal way, I want to store a tuple of values in a field.
The values in the tuple show whether a value is used in the title or not.

My question is then, whether I have to code that on my own or if the model
is already set up to work like that.

Thanks,
Michael






Re: Special field values

2004-10-12 Thread Otis Gospodnetic
Hello Michael,

This is something you'd have to code on your own.

Otis




Re: Multisearcher question

2004-10-12 Thread Otis Gospodnetic
Hello Sreedhar,

This is the expected behaviour.  The query is run against each index,
and it won't have any matches in either index, because neither index
has both fields.

Otis




RE: Multisearcher question

2004-10-12 Thread Sreedhar, Dantam
Thanks, Otis, for your reply.

If I want to solve the problem that I have defined in my previous mail,
what is the suggested approach? 

Thanks,
-Sreedhar




SearchBlox J2EE Search Component Version 2.0 released

2004-10-12 Thread Robert Selvaraj

SearchBlox is a J2EE Search Component that delivers out-of-the-box search
functionality for fast and easy implementation with your websites,
applications, intranets and portals. SearchBlox uses the Lucene Search API
and incorporates integrated HTTP/HTTPS and File System crawlers, support for
various document formats including HTML, Word, PDF, PowerPoint and Excel,
support for indexing and searching content in 18 languages and customizable
search results, all controlled from a browser-based Admin Console. 

Main features in this release: 
==
- Advanced Search: search by file format, language, keyword occurrence and
modified date
- Keyword-in-Context Display: search results are displayed with areas of
content where the keyword occurs
- Upgrade to Lucene 1.4.2
- Performance and stability improvements
- Bug fixes 

SearchBlox is available as a Web Archive (WAR) and is deployable on any
Servlet 2.3/JSP 1.2 compliant server. SearchBlox Getting-Started Guides are
available for the following servers: 

JBoss - http://www.searchblox.com/gettingstarted_jboss.html
Jetty - http://www.searchblox.com/gettingstarted_jetty.html
JRun - http://www.searchblox.com/gettingstarted_jrun.html
Oracle - http://www.searchblox.com/gettingstarted_oracle.html
Pramati - http://www.searchblox.com/gettingstarted_pramati.html
Resin - http://www.searchblox.com/gettingstarted_resin.html
Sun - http://www.searchblox.com/gettingstarted_sun.html
Tomcat - http://www.searchblox.com/gettingstarted_tomcat.html
Weblogic - http://www.searchblox.com/gettingstarted_weblogic.html
Websphere - http://www.searchblox.com/gettingstarted_websphere.html 

SearchBlox is also available as SearchBlox Server. The Server is an
integrated application incorporating everything you need to run SearchBlox.
The Server includes the SearchBlox J2EE Component, the Jetty Application
Server and the Java Runtime Environment (JRE) 1.4. With the SearchBlox
Server, there are no additional software requirements to deploy SearchBlox. 

The SearchBlox FREE Edition is available free of charge and can index up to
1000 documents. 

The software can be downloaded from http://www.searchblox.com







Re: indexing numeric entities?

2004-10-12 Thread Damian Gajda
Yes, you need to parse the entities yourself. I implemented an HTML
entity parser as part of the http://objectledge.org project. You may use
it if it fits your needs. It is in the ledge-components project
module. See http://objectledge.org/modules/ledge-components/index.html
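
For readers who only need numeric references and not the full named-entity table, a minimal sketch of such a parser follows. It is not the objectledge implementation; the class and method names are hypothetical, and named entities such as &amp; are deliberately left untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class EntityDecoder {
    // Matches decimal (&#26085;) and hexadecimal (&#x65E5;) numeric references.
    private static final Pattern NUMERIC =
            Pattern.compile("&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));");

    /** Replaces numeric character references with the characters they denote. */
    static String decodeNumericEntities(String text) {
        Matcher m = NUMERIC.matcher(text);
        StringBuilder out = new StringBuilder(text.length());
        int last = 0;
        while (m.find()) {
            out.append(text, last, m.start());   // copy the text before the reference
            try {
                int cp = m.group(1) != null
                        ? Integer.parseInt(m.group(1), 16)
                        : Integer.parseInt(m.group(2), 10);
                if (Character.isValidCodePoint(cp)) {
                    out.appendCodePoint(cp);
                } else {
                    out.append(m.group());       // not a Unicode code point: keep as-is
                }
            } catch (NumberFormatException e) {
                out.append(m.group());           // number too large: keep as-is
            }
            last = m.end();
        }
        out.append(text, last, text.length());   // copy the remaining tail
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decodeNumericEntities("&#26085;&#x672C;").length());
    }
}
```

Running the decoder over both the indexed text and the query text before analysis keeps the two sides consistent.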

Have fun,
-- 
Damian Gajda
Caltha Sp. j.
http://www.caltha.pl/







RE: indexing numeric entities?

2004-10-12 Thread Patel, Viral





Re: Special field values

2004-10-12 Thread Paul Elschot
On Tuesday 12 October 2004 15:02, Otis Gospodnetic wrote:
 Hello Michael,

 This is something you'd have to code on your own.

 Otis

 --- Michael Hartmann [EMAIL PROTECTED] wrote:
  Hi everybody,
 
  I am thinking about extending the Lucene search with metadata in the
  following way
 
  Field   Value

 ---

  Title   (n1, n2, n3, ..., nm) | ni element of {0,1} and m amount of distinct
  metadata values for title
 
  Expressed in an informal way, I want to store a tuple of values in a field.
  The values in the tuple show whether a value is used in the title or not.

A Lucene index can easily be used to determine whether or not a term is
in a field of a document:

IndexReader.open(indexName).termDocs(new Term(field, term)).skipTo(documentNr)

returns the boolean indicating that.
What do you need the {0,1} values for?

Regards,
Paul Elschot.





Re: Multisearcher question

2004-10-12 Thread Terry Steichen
I think what Sreedhar is asking for is the capability to form a join across multiple 
indices - and if so, I could sure use that capability myself.  However, I think 
Lucene's logic focuses only on a single query, so I doubt if that's easily done.
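
One way an application could approximate such a join outside Lucene is sketched below. It assumes that field y holds an identifier shared by both indexes and that the application has already collected the stored y values of the hits for x:query in index_a and for z:query in index_b. That collection step is omitted, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class KeyJoin {
    /** Returns the keys that occur in both hit lists, in the order of the first list. */
    static List<String> join(List<String> keysFromA, List<String> keysFromB) {
        Set<String> inB = new HashSet<>(keysFromB);
        Set<String> joined = new LinkedHashSet<>();   // keeps order, drops duplicates
        for (String key : keysFromA) {
            if (inB.contains(key)) {
                joined.add(key);
            }
        }
        return new ArrayList<>(joined);
    }

    public static void main(String[] args) {
        List<String> yOfHitsInA = Arrays.asList("doc1", "doc2", "doc3"); // hits for x:query
        List<String> yOfHitsInB = Arrays.asList("doc3", "doc1", "doc9"); // hits for z:query
        System.out.println(join(yOfHitsInA, yOfHitsInB));
    }
}
```

This ignores scoring entirely. If relevance across both indexes matters, merging everything into one index with fields x, y and z is likely the simpler route.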




Re: Special field values

2004-10-12 Thread Paul Elschot
On Tuesday 12 October 2004 19:27, Paul Elschot wrote:


 IndexReader.open(indexName).termDocs(new Term(field, term)).skipTo(documentNr)

 returns the boolean indicating that.

Well, almost. When it returns true, one still needs to check that the
TermDocs is positioned at documentNr.
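
The extra check can be illustrated with a small stand-in model. PostingCursor below is a hypothetical class that only imitates the skipTo()/doc() contract over a sorted array of document numbers; it is not Lucene's TermDocs and simplifies its behaviour.

```java
final class PostingCursor {
    private final int[] docs;   // ascending document numbers that contain the term
    private int pos = 0;

    PostingCursor(int[] sortedDocs) {
        this.docs = sortedDocs;
    }

    /** Advances to the first entry >= target; returns false if no such entry exists. */
    boolean skipTo(int target) {
        while (pos < docs.length && docs[pos] < target) {
            pos++;
        }
        return pos < docs.length;
    }

    /** Document number at the current position; only valid after skipTo returned true. */
    int doc() {
        return docs[pos];
    }

    /** True only if the term occurs in exactly the requested document. */
    static boolean termInDocument(int[] sortedDocs, int documentNr) {
        PostingCursor cursor = new PostingCursor(sortedDocs);
        // skipTo alone is not enough: it also succeeds when it lands on a later document.
        return cursor.skipTo(documentNr) && cursor.doc() == documentNr;
    }

    public static void main(String[] args) {
        int[] postings = {2, 5, 9};
        System.out.println(termInDocument(postings, 5)); // term is in document 5
        System.out.println(termInDocument(postings, 6)); // skipTo lands on 9, not 6
    }
}
```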

Paul Elschot





sorting and score ordering

2004-10-12 Thread Chris Fraschetti
If I use a Sort instance on my searcher, what will have priority?
Score or Sort? Assuming I have pages with scores of .9, .9, and .5:
if the .5 page has a higher 'sort' value, will it be returned ahead of
the .9 pages whose sort values are lower?

-- 
___
Chris Fraschetti
e [EMAIL PROTECTED]




Problem indexing

2004-10-12 Thread Miguel Angel
Hi, I have a problem indexing files under the path C:\TXT\DOC\

But indexing under the path C:\TXT works fine.

What could the problem be?

P.S. If anybody on the list speaks Spanish, please reply. Thanks.

-- 
Miguel Angel Angeles R.
Asesoria en Conectividad y Servidores
Telf. 97451277




Re: sorting and score ordering

2004-10-12 Thread Nader Henein
As far as my testing showed, the sort will take priority, because it's 
basically an opt-in sort as opposed to the defaulted score sort. So 
you're basically displaying a sorted set over all your results as 
opposed to sorting the most relevant results.
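
The ordering described above can be modelled in plain Java. This is only an illustration of "sort value first": the Hit class is hypothetical, and using the score as a tie-breaker is an assumption of the sketch, since whether the score is consulted at all depends on how the Sort is constructed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class SortPriority {
    /** Hypothetical search hit: an id, a relevance score and a sortable field value. */
    static final class Hit {
        final String id;
        final float score;
        final int sortValue;

        Hit(String id, float score, int sortValue) {
            this.id = id;
            this.score = score;
            this.sortValue = sortValue;
        }
    }

    /** Orders by sortValue descending; score (descending) only breaks ties. */
    static List<String> order(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator.comparingInt((Hit h) -> h.sortValue).reversed()
                .thenComparing(Comparator.comparingDouble((Hit h) -> h.score).reversed()));
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("a", 0.9f, 1),
                new Hit("b", 0.9f, 2),
                new Hit("c", 0.5f, 3));
        // "c" comes first despite having the lowest score, because its sort value is highest.
        System.out.println(order(hits));
    }
}
```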

Hope this helps
Nader Henein