Re: Special field values

2004-10-13 Thread Michael Hartmann
| -Original Message-
| From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Paul Elschot
| Sent: Tuesday, 12 October 2004 19:27
| To: [EMAIL PROTECTED]
| Subject: Re: Special field values
| 
| On Tuesday 12 October 2004 15:02, Otis Gospodnetic wrote:
|  Hello Michael,
| 
|  This is something you'd have to code on your own.
| 
|  Otis
| 
|  --- Michael Hartmann [EMAIL PROTECTED] wrote:
|   Hi everybody,
|  
|   I am thinking about extending the Lucene search with metadata in the
|   following way:
|  
|   Field   Value
|   ------------------------------------------------------------------
|   Title   (n1, n2, n3, ..., nm), where each ni is in {0,1} and m is the
|           number of distinct metadata values for title
|  
|   Expressed informally, I want to store a tuple of values in a field.
|   The values in the tuple show whether or not a value is used in the
|   title.
| 
| A Lucene index can easily be used to determine whether or not
| a term is in a field of a document:
| 
| TermDocs termDocs = IndexReader.open(indexName).termDocs(new Term(field, term));
| boolean inDoc = termDocs.skipTo(documentNr) && termDocs.doc() == documentNr;
| 
| Then inDoc indicates that.
| What do you need the {0,1} values for?
| 
| Regards,
| Paul Elschot.
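Paul's check relies on TermDocs.skipTo(target), which moves to the first document numbered target or higher and returns false only when none is left. A true result therefore still needs the doc() == documentNr comparison before it proves the term occurs in that exact document. The plain-Java toy below mimics that contract with a sorted int array standing in for a posting list; ToyPostings and its methods are hypothetical illustrations, not Lucene classes.

```java
import java.util.Arrays;

// Toy posting list: the sorted document numbers that contain one term.
// Hypothetical stand-in for the skipTo/doc contract of a TermDocs.
class ToyPostings {
    private final int[] docs;
    private int pos = -1; // positioned before the first entry, like a fresh iterator

    ToyPostings(int... docs) {
        this.docs = docs.clone();
        Arrays.sort(this.docs);
    }

    // Advance to the first entry with doc >= target; false if none is left.
    boolean skipTo(int target) {
        do {
            pos++;
        } while (pos < docs.length && docs[pos] < target);
        return pos < docs.length;
    }

    int doc() {
        return docs[pos];
    }

    // Exact membership test: skipTo must succeed AND land on target itself.
    static boolean contains(int[] postings, int target) {
        ToyPostings p = new ToyPostings(postings);
        return p.skipTo(target) && p.doc() == target;
    }
}
```

With postings {2, 5, 9}, skipTo(4) returns true but stops on document 5, which is exactly the case the extra comparison rules out.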

Hi Paul,

Thanks for your answer. The field should store a vector of values that
indicate whether or not a term exists in a document. Just a plain
vanilla vector space model. I've read that Lucene has some kind of VSM, but
I don't yet understand how to use it.
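For reference, the "plain vanilla" binary model Michael describes can be sketched in a few lines of plain Java: each document becomes a {0,1} vector over a fixed vocabulary, and documents are compared by cosine similarity. This is an illustration only, not how Lucene stores or scores documents (its scoring weights terms by tf and idf rather than using 0/1 entries); the vocabulary and class name are hypothetical.

```java
import java.util.List;
import java.util.Set;

// Binary vector-space model: entry i is 1 if vocabulary term i occurs, else 0.
class BinaryVsm {
    static int[] vectorize(List<String> vocabulary, Set<String> terms) {
        int[] v = new int[vocabulary.size()];
        for (int i = 0; i < v.length; i++) {
            v[i] = terms.contains(vocabulary.get(i)) ? 1 : 0;
        }
        return v;
    }

    // Cosine similarity; for 0/1 vectors the dot product counts shared terms.
    static double cosine(int[] a, int[] b) {
        int dot = 0, na = 0, nb = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            na += a[i] * a[i];
            nb += b[i] * b[i];
        }
        return (na == 0 || nb == 0) ? 0.0 : dot / Math.sqrt((double) na * nb);
    }
}
```

Over the vocabulary (lucene, search, index, query), a document with {lucene, index} becomes [1, 0, 1, 0], and its cosine with a query {lucene, query} is 1 / (sqrt(2) * sqrt(2)) = 0.5.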

Regards,
Michael



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Multi + Parallel

2004-10-13 Thread Karthik N S


Hi guys,

Apologies.

I was curious to know the difference between ParallelMultiSearcher and
MultiSearcher:

1) Is their internal functionality the same or different?

2) In terms of time, do they differ when searching the same number of
fields / words?

3) What features does each API offer?


Thx in advance


  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Re: Special field values

2004-10-13 Thread Daniel Naber
On Wednesday 13 October 2004 08:45, Michael Hartmann wrote:

 The field should store a vector of values that
 indicate whether or not a term exists in a document or not.

You can just add more than one field with the same name but different
values per document; searching for a single value should then work.
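Daniel's suggestion works because each value added under a field name becomes a separate term of that field, so a query for any single value finds the document. The sketch below models this with a toy in-memory inverted index in plain Java; it is not Lucene's API. In Lucene itself the equivalent is calling doc.add(...) once per value, reusing the same field name.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index: "field:term" -> sorted set of document numbers.
// (The ":" join is fine for a sketch; real field names could contain it.)
class ToyMultiValueIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Add one value for a field; call repeatedly to give a field several values.
    void add(int docNr, String field, String value) {
        postings.computeIfAbsent(field + ":" + value, k -> new TreeSet<>()).add(docNr);
    }

    // Documents whose field holds this exact value among any of its values.
    Set<Integer> search(String field, String value) {
        return Collections.unmodifiableSet(
                postings.getOrDefault(field + ":" + value, new TreeSet<>()));
    }
}
```

Adding "n1" and "n3" to the title of document 0 and "n3" to document 1 makes a search for title "n3" return {0, 1} and a search for "n1" return {0}.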

Regards
 Daniel

-- 
http://www.danielnaber.de




RE: Too many Open Files + lucene 1.4.1 + Linux O/s

2004-10-13 Thread Karthik N S
Hi,

Apologies for the long wait.

On my Linux system, ulimit -a reports:


core file size        (blocks, -c) 0
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
max locked memory     (kbytes, -l) unlimited
max memory size       (kbytes, -m) unlimited
open files                    (-n) 1024
pipe size          (512 bytes, -p) 8
stack size            (kbytes, -s) 8192
cpu time             (seconds, -t) unlimited
max user processes            (-u) 1983
virtual memory        (kbytes, -v) unlimited


The "Too many open files" problem happens on every second search.

I think, as you say, open files (-n) 1024 should be increased.

More advice is gratefully accepted.

Thx in advance





-Original Message-
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]
Sent: Sunday, October 03, 2004 5:08 AM
To: Lucene Users List
Subject: Re: Too many Open Files + lucene 1.4.1 + Linux O/s


Karthik N S wrote:

Hi Luceners,


Apologies.


The other day I was trying to search using the Luceneweb version
with Lucene1-4-1.zip on Linux, J2SDK version 1.4.2_03-b02,
with roughly 500 documents (715116 KB) indexed using
Lucene1.4-final.jar and writer.setUseCompoundFile(true);


Here are a couple of possibilities:
- setUseCompoundFile(true) will only apply to indexes created (or
  optimized) after the option is set. All pre-existing indexes will
  still be in the multi-file format.
- The number of documents does not directly affect the number of files
  needed by Lucene. If the index really is in the compound file format
  (see above) and is optimized, you will need a fixed number of file
  handles. Even if the index is in the multi-file format, the number of
  files needed depends on the number of indexed *fields* in the index
  (not documents).
- Do you get the error on the first and every search, or only once in a
  while? Perhaps when there are lots of concurrent users? Perhaps after
  you've done X searches?
- Check your OS-level setting for the number of open files. This is
  somewhat shell/system-dependent, but ulimit -a should get you
  started. The number of open files should be large enough to allow for
  all the files and sockets your application needs to open. In a
  typical server-side Java app this value should be around 8000.
  Defaults are much smaller, so unless you have changed this, this may
  be the answer.
- Look into the lsof utility. It can display all file handles in use by
  a given process, and is a good tool for troubleshooting "too many
  open files" issues.
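As a complement to lsof, a JVM running on Linux can read its own descriptor count by listing /proc/self/fd. Logging that number before and after each search is one way to see whether it keeps climbing, for example because readers or searchers are opened per search and never closed. A minimal sketch, assuming the Linux /proc filesystem; the class name is hypothetical, and the method returns -1 where /proc/self/fd does not exist.

```java
import java.io.File;

// Counts this process's open file descriptors on Linux via /proc/self/fd.
// Assumes a Linux /proc filesystem; returns -1 where that directory is absent.
// The count may include the descriptor briefly opened to read the directory.
class OpenFileCount {
    static int count() {
        String[] entries = new File("/proc/self/fd").list();
        return entries == null ? -1 : entries.length;
    }
}
```

Calling OpenFileCount.count() around each search and watching whether the value creeps toward the 1024 limit shown by ulimit helps separate a too-low limit from a handle leak.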

Good luck.
Dmitry.





Re: Multi + Parallel

2004-10-13 Thread Erik Hatcher
On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
 I was Curious to Know the Difference between ParallelMultiSearcher  and
 MultiSearcher ,
 1) Is the working internal functionality of these  are  same or different .
They are different internally.  Externally they should return identical 
results and not appear different at all.

Internally, ParallelMultiSearcher searches each index in a separate 
thread (searches wait until all threads finish before returning).   In 
MultiSearcher, each index is searched serially.

You will not likely see a benefit from using ParallelMultiSearcher unless
your environment is specialized to accommodate multi-threading
(multiple CPUs, indexes on separate drives that can operate
independently, etc).
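The serial-versus-threaded difference can be illustrated without Lucene. In the sketch below, a toy model and not Lucene's implementation, each sub-index is just a function from a query to scored hits. serial() queries them one after another on the calling thread; parallel() submits one task per sub-index to a thread pool and waits for all of them before merging. Both merge and sort the same way, so they return identical results and can differ only in timing and threading overhead. The Hit type and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Toy model of serial vs. threaded search over several sub-indexes.
class ToyMultiSearch {
    // One scored hit from a sub-index (hypothetical type, not Lucene's).
    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Best score first; ties broken by id so both variants merge identically.
    static final Comparator<Hit> ORDER =
            Comparator.comparingDouble((Hit h) -> h.score).reversed()
                    .thenComparing(h -> h.id);

    // Serial: query each sub-index in turn on the calling thread.
    static List<Hit> serial(List<Function<String, List<Hit>>> indexes, String query) {
        List<Hit> all = new ArrayList<>();
        for (Function<String, List<Hit>> index : indexes) {
            all.addAll(index.apply(query));
        }
        all.sort(ORDER);
        return all;
    }

    // Parallel: one task per sub-index; get() blocks until that task is done,
    // so nothing is returned before every sub-search has finished.
    static List<Hit> parallel(List<Function<String, List<Hit>>> indexes, String query) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Function<String, List<Hit>> index : indexes) {
                pending.add(pool.submit(() -> index.apply(query)));
            }
            List<Hit> all = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                all.addAll(f.get());
            }
            all.sort(ORDER);
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for sub-searches", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a sub-search failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

The parallel variant pays for thread creation and coordination on every call, which only pays off when the sub-searches can genuinely run at the same time.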

 2) In terms of time domain do these differ when searching same no of  fields / words .

 3)What are the features used on each of  API.
There is no external difference between using either implementation.
Benchmark searches using both and see which is best, but MultiSearcher
will be better in most environments, as it avoids the overhead of
starting up and managing multiple threads.

Erik


WhitespaceAnalyzer Problem

2004-10-13 Thread Gabriela D
I have been indexing my flat files (plain-text documents) using
WhitespaceAnalyzer, so as not to lose any characters during tokenizing.
The results are satisfactory when I search with exact terms. However, I
get no results or hits when I use wildcard searches with * or ?. Why is
this so? Is there any workaround?
I am using Lucene 1.4 rc3. FYI, I am using the same WhitespaceAnalyzer for
both indexing and searching.
Please help.
Regards, Dera.



-
Do you Yahoo!?
vote.yahoo.com - Register online to vote today!

Re: WhitespaceAnalyzer Problem

2004-10-13 Thread Erik Hatcher
Dera - give the troubleshooting techniques provided here a try: 
http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Provide us with a more detailed example of a sentence of text you 
indexed and how you are searching (using QueryParser, I presume) and we 
can likely offer more assistance.
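This thread does not establish the cause, but one thing worth ruling out is letter case. WhitespaceAnalyzer keeps each token exactly as written, capitals and punctuation included, while wildcard terms are not run through the analyzer and may be lowercased by QueryParser (an assumption to verify against the QueryParser javadoc for your version). The plain-Java toy below imitates whitespace tokenizing and * / ? matching to show how case alone can make a wildcard miss; ToyWildcard is hypothetical, not Lucene code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class ToyWildcard {
    // Whitespace tokenizing: split on whitespace only; case and punctuation kept.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.trim().split("\\s+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // '*' matches any run of characters, '?' exactly one; all else is literal.
    static boolean matches(String pattern, String token) {
        StringBuilder re = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') re.append(".*");
            else if (c == '?') re.append('.');
            else re.append(Pattern.quote(String.valueOf(c)));
        }
        return token.matches(re.toString());
    }

    // True if the pattern matches any token of the text (case-sensitive).
    static boolean anyMatch(String pattern, String text) {
        for (String t : tokens(text)) {
            if (matches(pattern, t)) return true;
        }
        return false;
    }
}
```

Against the text "Lucene-1.4 Release notes", the pattern Luc* matches but luc* does not, even though both "look" right to a person.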

Erik
On Oct 13, 2004, at 7:21 AM, Gabriela D wrote:
I have been indexing my flat files (plain text documents) using
WhitespaceAnalyzer, in order not to miss out any characters during 
tokenizing.
The results are satisfactory when I use exact search criteria for
searching. However, I am unable to get any results or hits when I use
wildcard searching using * or ?. Why is this so? Any work around for
this?
I am using Lucene 1.4 rc3. FYI, I am using same WhitespaceAnalyzer for
both indexing as well as searching.
Please help.
Regards, Dera.




Encrypted indexes

2004-10-13 Thread Weir, Michael
We need to have index files that can't be reverse engineered, etc. An
obvious approach would be to write an 'FSEncryptedDirectory' class, but
that sounds like a performance killer.

Does anyone have experience in making an index secure?

Thanks for any help,
Michael Weir 
  
   This message may contain privileged and/or confidential information.  If you 
have received this e-mail in error or are not the intended recipient, you may not use, 
copy, disseminate or distribute it; do not open any attachments, delete it immediately 
from your system and notify the sender promptly by e-mail that you have done so.  
Thank you. 
 




Re: Encrypted indexes

2004-10-13 Thread Nader Henein
Well, are you storing any data for retrieval from the index? Because
you could encrypt the actual data and then encrypt the search string,
public-key style.
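One way to realize "encrypt the search string too" for exact-term matching is to index a keyed hash of each term and hash query terms with the same key, so matching still works while the stored terms are unreadable without the key. Note that this swaps in a symmetric HMAC-SHA256 (from the JDK's javax.crypto) rather than the public-key scheme Nader mentions, and that it gives up prefix, wildcard and range queries, since hashes preserve neither order nor substrings. TermBlinder is a hypothetical name; key storage is left out.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Turns plaintext terms into opaque tokens with HMAC-SHA256 under a secret key.
// Index-time and query-time terms must be blinded with the same key.
// Not thread-safe: a Mac holds state, so use one TermBlinder per thread.
class TermBlinder {
    private final Mac mac;

    TermBlinder(byte[] key) {
        Mac m;
        try {
            m = Mac.getInstance("HmacSHA256");
            m.init(new SecretKeySpec(key, "HmacSHA256"));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("cannot initialise HmacSHA256", e);
        }
        this.mac = m;
    }

    // Hex-encoded HMAC of the term; deterministic, so equal terms give equal tokens.
    String blind(String term) {
        byte[] digest = mac.doFinal(term.getBytes(StandardCharsets.UTF_8));
        StringBuilder hex = new StringBuilder(digest.length * 2);
        for (byte b : digest) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }
}
```

Because the mapping is deterministic, the same term always yields the same 64-character token, which is what lets a term query on the blinded query term find it; the flip side is that term frequencies remain visible to anyone reading the index.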

Nader Henein
Weir, Michael wrote:
We need to have index files that can't be reverse engineered, etc. An
obvious approach would be to write a 'FSEncryptedDirectory' class, but
sounds like a performance killer.
Does anyone have experience in making an index secure?
Thanks for any help,
Michael Weir 
 


Re: Encrypted indexes

2004-10-13 Thread petite_abeille
On Oct 13, 2004, at 15:26, Nader Henein wrote:
Well, are you storing any data for retrieval from the index, because 
you could encrypt the actual data and then encrypt the search string 
public key style.
Alternatively, write your index to an encrypted volume... something
along the lines of FileVault and PGP Disk [1] [2].

PA.
[1] http://www.apple.com/macosx/features/filevault/
[2] http://www.pgp.com/products/desktop/index.html


Re: Encrypted indexes

2004-10-13 Thread Cheolgoo Kang
I think it's possible to encrypt a field with a symmetric encryption
algorithm, in the same way as a field can be compressed, and algorithms
such as DES can be used with little performance loss.

If the ability to block reverse engineering is critical, you should use
PKI, which would cost considerably more performance than those
symmetric methods.
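A minimal sketch of Cheolgoo's symmetric approach for stored field values, using the JDK's javax.crypto. It uses AES in GCM mode instead of DES, because DES's 56-bit key is no longer considered adequate, and it prepends a fresh random IV to every value. Because the output is randomized, such values can be stored and decrypted after retrieval but not searched with term queries; FieldCipher is a hypothetical name and key management is out of scope.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Encrypts stored field values with AES-GCM.
// Stored form: Base64(IV || ciphertext-with-tag), with a fresh random IV per value.
class FieldCipher {
    private static final int IV_BYTES = 12;   // usual GCM nonce length
    private static final int TAG_BITS = 128;  // authentication tag length
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    FieldCipher(SecretKey key) {
        this.key = key;
    }

    String encrypt(String plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            random.nextBytes(iv); // never reuse an IV with the same key
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = Arrays.copyOf(iv, IV_BYTES + ct.length);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    String decrypt(String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            // doFinal checks the tag and throws on a wrong key or altered data
            byte[] pt = c.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(pt, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed (wrong key or tampered data)", e);
        }
    }
}
```

Encrypting the same text twice gives two different stored strings that both decrypt to the original, which is the property that keeps repeated values from being spotted in the index.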


On Wed, 13 Oct 2004 15:33:53 +0200, petite_abeille
[EMAIL PROTECTED] wrote:
 
 On Oct 13, 2004, at 15:26, Nader Henein wrote:
 
  Well, are you storing any data for retrieval from the index, because
  you could encrypt the actual data and then encrypt the search string
  public key style.
 
 Alternatively, write your index to an encrypted volume... something
 along the line of FileVault and PGP Disk [1] [2].
 
 PA.
 
 [1] http://www.apple.com/macosx/features/filevault/
 [2] http://www.pgp.com/products/desktop/index.html
 
 
 
 
 
 


Cheolgoo, Kang




Re: sorting and score ordering

2004-10-13 Thread Chris Fraschetti
Is there a way I can (without recompiling) ... make the score have
priority and then my sort take effect when two results have the same
rank?

Along with that, is there a simple way to assign a new scorer to the
searcher? So I can use the same lucene algorithm for my hits, but
tweak it a little to fit my needs?

-Chris


On Wed, 13 Oct 2004 09:36:04 +0400, Nader Henein [EMAIL PROTECTED] wrote:
 As far as my testing showed, the sort will take priority, because it's
 basically an opt-in sort as opposed to the defaulted score sort. So
 you're basically displaying a sorted set over all your results as
 opposed to sorting the most relevant results.
 
 Hope this helps
 
 Nader Henein
 
 Chris Fraschetti wrote:
 
 If I use a Sort instance on my searcher, what will have priority?
 Score or Sort? Assuming I have pages with .9, .9, and .5 scores, ...
 if the .5 has a higher 'sort' value, will it return higher than one of
 the .9 lucene score values if they are lower?
 
 
 
 
 
 


-- 
___
Chris Fraschetti, Student CompSci System Admin
University of San Francisco
e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu




Re: sorting and score ordering

2004-10-13 Thread Daniel Naber
On Wednesday 13 October 2004 19:53, Chris Fraschetti wrote:

 Is there a way I can (without recompiling) ... make the score have
 priority and then my sort take effect when two results have the same
 rank?

You can just (explicitly) sort by score and use some other field as a 
second sort key.

Regards
 Daniel




Re: sorting and score ordering

2004-10-13 Thread Paul Elschot
On Wednesday 13 October 2004 19:53, Chris Fraschetti wrote:
 Is there a way I can (without recompiling) ... make the score have
 priority and then my sort take effect when two results have the same
 rank?

 Along with that, is there a simple way to assign a new scorer to the
 searcher? So I can use the same lucene algorithm for my hits, but
 tweak it a little to fit my needs?

There is no one-to-one relationship between a searcher and a scorer.

When a query consists of, e.g., two terms, there will be three scorers
executing the search for that query: one TermScorer for each term,
and one scorer that combines the other two to provide the search results,
usually a BooleanScorer or a ConjunctionScorer.
For proximity queries, other scorers are used.
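Paul's description of a query being executed by a small tree of scorers can be pictured with a plain-Java toy: leaf scorers return (document, score) pairs for one term, and a conjunction node keeps only the documents both children contain. The classes are hypothetical stand-ins, not Lucene's TermScorer or ConjunctionScorer, and simply adding the child scores is a simplification; Lucene's scorers also apply query weights and, depending on the scorer, a coordination factor.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy scorer tree: leaves score single terms, an inner node combines children.
interface ToyScorer {
    Map<Integer, Double> scores(); // document number -> score, in doc order
}

// Leaf: pre-computed scores for the documents containing one term.
class ToyTermScorer implements ToyScorer {
    private final Map<Integer, Double> docs;

    ToyTermScorer(Map<Integer, Double> docs) {
        this.docs = new TreeMap<>(docs);
    }

    public Map<Integer, Double> scores() {
        return docs;
    }
}

// Inner node for "a AND b": keep documents present in both, add their scores.
class ToyConjunctionScorer implements ToyScorer {
    private final ToyScorer left, right;

    ToyConjunctionScorer(ToyScorer left, ToyScorer right) {
        this.left = left;
        this.right = right;
    }

    public Map<Integer, Double> scores() {
        Map<Integer, Double> r = right.scores();
        Map<Integer, Double> out = new TreeMap<>();
        for (Map.Entry<Integer, Double> e : left.scores().entrySet()) {
            Double other = r.get(e.getKey());
            if (other != null) out.put(e.getKey(), e.getValue() + other);
        }
        return out;
    }
}
```

A two-term AND query is then three objects: two ToyTermScorer leaves and one ToyConjunctionScorer on top, mirroring the "three scorers" count above.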

Regards,
Paul Elschot





Index + Searching

2004-10-13 Thread Hetan Shah
Hello,
I am using the IndexHTML class to index around 30,000 files, and it is
working fine. My question is: is there a way to add multiple fields to
the index so that, when the actual search is performed, I can extract
the exact match?
E.g.
the fields can be
1) title - abc
2) name - foo inc,
3) description - Lorem ipsum dolor sit
4) URL - www.lorem.ipsum

and so on.
When a search finds a match for title 'abc', calling
doc.get("name") would return foo inc, and so on.

Does any other indexing class already do this? If not, what do I
need to add to the IndexHTML class to accomplish this?

thanks for all the help gang.
-H


Re: sorting and score ordering

2004-10-13 Thread Chris Fraschetti
I haven't seen an example of how to apply two sorts to a search. Can
you help me out with that?

-Chris


On Wed, 13 Oct 2004 20:03:05 +0200, Daniel Naber
[EMAIL PROTECTED] wrote:
 On Wednesday 13 October 2004 19:53, Chris Fraschetti wrote:
 
  Is there a way I can (without recompiling) ... make the score have
  priority and then my sort take effect when two results have the same
  rank?
 
 You can just (explicitly) sort by score and use some other field as a
 second sort key.
 
 Regards
 Daniel
 
 
 





Re: sorting and score ordering

2004-10-13 Thread Doug Cutting
Paul Elschot wrote:
Along with that, is there a simple way to assign a new scorer to the
searcher? So I can use the same lucene algorithm for my hits, but
tweak it a little to fit my needs?

There is no one-to-one relationship between a searcher and a scorer.
But you can use a different Similarity implementation with each Searcher.
Doug
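In Lucene 1.4, Doug's suggestion amounts to subclassing DefaultSimilarity, overriding the factors to change, and passing an instance to Searcher.setSimilarity(...). To stay runnable on its own, the sketch below only imitates that pattern in plain Java: a base class with tf = sqrt(freq) and idf = 1 + ln(numDocs / (docFreq + 1)), believed to match DefaultSimilarity of that era (check the javadoc for your version), and a hypothetical subclass that counts a term once however often it occurs. termScore is a simplified tf * idf, not Lucene's full formula.

```java
// Stand-in for the shape of a Lucene Similarity, reduced to two factors.
class ToySimilarity {
    float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Simplified per-term score for illustration (no norms, boosts, or coord).
    float termScore(float freq, int docFreq, int numDocs) {
        return tf(freq) * idf(docFreq, numDocs);
    }
}

// Hypothetical tweak: a term counts the same however often it occurs.
class FlatTfSimilarity extends ToySimilarity {
    @Override
    float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }
}
```

The searching code stays the same; only the object supplying the factors changes, which is the point of keeping Similarity separate from the scorers.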


Re: sorting and score ordering

2004-10-13 Thread Daniel Naber
On Wednesday 13 October 2004 20:44, Chris Fraschetti wrote:

 I haven't seen an example on how to apply two sorts to a search.. can
 you help me out with that?

Check out the documentation for Sort(SortField[] fields) and SortField.

Regards
 Daniel




Re: sorting and score ordering

2004-10-13 Thread Praveen Peddi
Use SortField.FIELD_SCORE as the first element in the SortField[] array
you pass to the Sort(SortField[]) constructor.
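Put together with Daniel's pointer, the Lucene 1.4 form is roughly new Sort(new SortField[] { SortField.FIELD_SCORE, new SortField("date") }) passed to searcher.search(query, sort), where "date" is a placeholder for whatever secondary field is indexed. The score is computed per query rather than stored, which is why FIELD_SCORE exists as a special SortField instead of naming a field. The plain-Java sketch below shows the resulting order with score as a computed value: highest score first, the second key only breaking ties. The class and field names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class ScoreThenField {
    // A search result: computed relevance plus one stored field value.
    static final class Result {
        final String id;
        final float score;
        final String date;

        Result(String id, float score, String date) {
            this.id = id;
            this.score = score;
            this.date = date;
        }
    }

    // Primary key: score, highest first. Secondary key: date, ascending.
    static final Comparator<Result> ORDER =
            Comparator.comparingDouble((Result r) -> r.score).reversed()
                    .thenComparing(r -> r.date);

    static List<Result> sort(List<Result> results) {
        List<Result> copy = new ArrayList<>(results);
        copy.sort(ORDER);
        return copy;
    }
}
```

With Chris's example of scores .9, .9 and .5, the .5 result stays last whatever its date; the date only decides the order of the two .9 results.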

Praveen
- Original Message - 
From: Chris Fraschetti [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, October 13, 2004 3:19 PM
Subject: Re: sorting and score ordering


Will do.
My other question was: as far as I know, the 'score' for a page is
only accessible post-search... and is not contained in a field. How
can I specify the score as a sort field when there is no field 'score'?
-Chris
On Wed, 13 Oct 2004 21:06:14 +0200, Daniel Naber
[EMAIL PROTECTED] wrote:
On Wednesday 13 October 2004 20:44, Chris Fraschetti wrote:
 I haven't seen an example on how to apply two sorts to a search.. can
 you help me out with that?
Check out the documentation for Sort(SortField[] fields) and SortField.

Regards
Daniel




Re: Lucene disk usage

2004-10-13 Thread Tea Yu
Hi,

If I remember right, there was a discussion about the 3x vs. 2x
index-size disk usage of a compound index during optimization. Was
that patched in 1.4.2?

Cheers,
Tea

