Re: Special field values
| -----Original Message-----
| From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On behalf of Paul Elschot
| Sent: Tuesday, October 12, 2004 19:27
| To: [EMAIL PROTECTED]
| Subject: Re: Special field values
|
| On Tuesday 12 October 2004 15:02, Otis Gospodnetic wrote:
| > Hello Michael,
| >
| > This is something you'd have to code on your own.
| >
| > Otis
| >
| > --- Michael Hartmann [EMAIL PROTECTED] wrote:
| > > Hi everybody,
| > >
| > > I am thinking about extending the Lucene search with metadata in the following way:
| > >
| > > Field   Value
| > > -----   -----
| > > Title   (n1, n2, n3, ..., nm), with ni element of {0,1} and m the number of distinct metadata values for title
| > >
| > > Expressed in an informal way, I want to store a tuple of values in a field. The values in the tuple show whether a value is used in the title or not.
|
| A Lucene index can easily be used to determine whether or not a term is in a field of a document:
|
|     IndexReader.open(indexName).termDocs(new Term(field, term)).skipTo(documentNr)
|
| returns the boolean indicating that.
| What do you need the {0,1} values for?
|
| Regards,
| Paul Elschot.

Hi Paul,

Thanks for your answer. The field should store a vector of values that indicate whether or not a term exists in a document. Just the pure vanilla vector space model. I've read that Lucene has some kind of VSM, but currently I don't understand how to handle that.

Regards,
Michael

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Multi + Parallel
Hi guys,

Apologies. I was curious to know the difference between ParallelMultiSearcher and MultiSearcher:

1) Is their internal functionality the same or different?
2) In terms of time, do they differ when searching the same number of fields / words?
3) What features does each API offer?

Thanks in advance.

With warm regards, have a nice day,
[ N.S.KARTHIK]
Re: Special field values
On Wednesday 13 October 2004 08:45, Michael Hartmann wrote:
> The field should store a vector of values that indicate whether or not a term exists in a document.

You can just add more than one field with the same name but different values per document. Searching for single values should then work.

Regards
Daniel

--
http://www.danielnaber.de
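The suggestion above (repeat a field name with different values, then look up single values) can be pictured with a small plain-Java sketch. It does not use Lucene; the class `MultiValuedFieldSketch`, its methods, and the sample vocabulary are hypothetical names made up for illustration. It also shows how the (n1, ..., nm) tuple Michael asked about could be derived from such a field on demand instead of being stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java stand-in for a document with repeatable fields; NOT Lucene's Document class.
public class MultiValuedFieldSketch {
    private final Map<String, List<String>> fields = new LinkedHashMap<>();

    // Adding the same field name again appends another value instead of replacing it.
    public void add(String name, String value) {
        fields.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Single-value lookup: true if the named field holds exactly this value.
    public boolean contains(String name, String value) {
        return fields.getOrDefault(name, Collections.<String>emptyList()).contains(value);
    }

    // Derives the (n1, ..., nm) tuple: ni is 1 if vocabulary entry i occurs in the field, else 0.
    public int[] presenceVector(String name, List<String> vocabulary) {
        int[] vector = new int[vocabulary.size()];
        for (int i = 0; i < vocabulary.size(); i++) {
            vector[i] = contains(name, vocabulary.get(i)) ? 1 : 0;
        }
        return vector;
    }

    public static void main(String[] args) {
        MultiValuedFieldSketch doc = new MultiValuedFieldSketch();
        doc.add("title", "lucene");
        doc.add("title", "search");
        List<String> vocabulary = Arrays.asList("index", "lucene", "search");
        // prints [0, 1, 1]
        System.out.println(Arrays.toString(doc.presenceVector("title", vocabulary)));
    }
}
```

The point of the sketch is that the {0,1} tuple carries no information beyond which values are present, so storing the values themselves and testing membership gives the same answer.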
RE: Too many Open Files + lucene 1.4.1 + Linux O/s
Hi,

Apologies for the long wait. On my Linux system, ulimit -a reports:

    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) unlimited
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 1024
    pipe size          (512 bytes, -p) 8
    stack size            (kbytes, -s) 8192
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 1983
    virtual memory        (kbytes, -v) unlimited

The "Too many open files" problem happens on every second search. I think, as you say, the open files (-n) limit of 1024 should be increased.

More advice is gratefully accepted. Thanks in advance.

-----Original Message-----
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 03, 2004 5:08 AM
To: Lucene Users List
Subject: Re: Too many Open Files + lucene 1.4.1 + Linux O/s

> Karthik N S wrote:
> > Hi Luceners,
> >
> > Apologies. The other day I was trying to search using the Luceneweb version with Lucene1-4-1.zip and O/s = Linux, J2SDK version 1.4.2_03-b02, with roughly 500 documents (715116 kb) indexed using Lucene1.4-final.jar and writer.setUseCompoundFile(true);
>
> Here are a couple of possibilities:
>
> - The setUseCompoundFile(true) will only apply to indexes created (or optimized) after the option is set. All pre-existing indexes will still be in the multi-file format.
> - The number of documents does not directly impact the number of files needed by Lucene. If the index is really in a compound file format (see above), and is optimized, you will need a fixed number of file handles. Even if the index is in a multi-file format, the number of files needed depends on the number of indexed *fields* in the index (not documents).
> - Do you get the error on the first and every search or only once in a while? Perhaps when there are lots of concurrent users? Perhaps after you've done X searches?
> - Check your OS-level setting for the number of open files. This is somewhat shell/system-dependent, but ulimit -a should get you started. The number of open files should be large enough to allow for all files and sockets that your application needs to open. In a typical server-side Java app setting this value should be around 8000. Defaults are much smaller, so unless you have changed this, this may be the answer.
> - Look into the lsof utility. It can display all file handles in use by a given process. This is a good tool to troubleshoot "too many open files" issues.
>
> Good luck.
> Dmitry.
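As a rough in-process complement to ulimit -a and lsof, the sketch below counts the file descriptors the running JVM currently holds. It rests on a Linux-specific assumption: that /proc/self/fd contains one entry per open descriptor. The class and method names are hypothetical, and the count is approximate because listing the directory briefly opens a descriptor of its own.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class OpenFileCountSketch {
    // Counts the entries of a directory; returns -1 if it is missing or unreadable.
    public static int countEntries(Path dir) {
        if (!Files.isDirectory(dir)) {
            return -1;
        }
        int count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                count++;
            }
        } catch (IOException e) {
            return -1;
        }
        return count;
    }

    // Assumption (Linux only): /proc/self/fd lists the descriptors open in this process.
    public static int openDescriptors() {
        return countEntries(Paths.get("/proc/self/fd"));
    }

    public static void main(String[] args) {
        int open = openDescriptors();
        System.out.println(open < 0
                ? "descriptor count unavailable on this system"
                : "open descriptors: " + open);
    }
}
```

Logging this number before and after each search would show whether descriptors accumulate from one search to the next (for example, readers or searchers that are never closed) or whether the process simply needs more than the 1024 limit allows.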
Re: Multi + Parallel
On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
> I was Curious to Know the Difference between ParallelMultiSearcher and MultiSearcher,
> 1) Is the working internal functionality of these are same or different.

They are different internally. Externally they should return identical results and not appear different at all. Internally, ParallelMultiSearcher searches each index in a separate thread (the search waits until all threads finish before returning). In MultiSearcher, each index is searched serially. You are unlikely to see a benefit from ParallelMultiSearcher unless your environment is specialized to accommodate multi-threading (multiple CPUs, indexes on separate drives that can operate independently, etc.).

> 2) In terms of time domain do these differ when searching same no of fields / words.
> 3) What are the features used on each of API.

There is no external difference between the two implementations. Benchmark searches using both and see which is best, but generally MultiSearcher will be better in most environments, as it avoids the overhead of starting up and managing multiple threads.

Erik
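The serial-versus-threaded distinction described above can be sketched in plain Java. This is not Lucene's implementation: each "index" here is just a list of strings, and `searchOne`, `searchSerial`, and `searchParallel` are hypothetical names. The sketch only illustrates that both strategies return the same merged results, and that the threaded one waits for every per-index task before returning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SerialVsParallelSketch {
    // Hypothetical stand-in for searching one index: returns entries containing the term.
    public static List<String> searchOne(List<String> index, String term) {
        List<String> hits = new ArrayList<>();
        for (String doc : index) {
            if (doc.contains(term)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    // Serial strategy: search each index in turn on the calling thread.
    public static List<String> searchSerial(List<List<String>> indexes, String term) {
        List<String> all = new ArrayList<>();
        for (List<String> index : indexes) {
            all.addAll(searchOne(index, term));
        }
        return all;
    }

    // Threaded strategy: one task per index; block until all tasks have finished.
    public static List<String> searchParallel(List<List<String>> indexes, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            List<Future<List<String>>> futures = new ArrayList<>();
            for (List<String> index : indexes) {
                futures.add(pool.submit(() -> searchOne(index, term)));
            }
            List<String> all = new ArrayList<>();
            for (Future<List<String>> future : futures) {
                all.addAll(future.get()); // merged in index order, so results match the serial run
            }
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<List<String>> indexes = Arrays.asList(
                Arrays.asList("apache lucene", "apache ant"),
                Arrays.asList("lucene in action"));
        System.out.println(searchSerial(indexes, "lucene"));
        System.out.println(searchParallel(indexes, "lucene"));
    }
}
```

The thread pool setup and teardown in `searchParallel` is the overhead Erik refers to; with small indexes on a single CPU and disk it can easily outweigh any gain, which is why benchmarking both is the sensible test.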
WhitespaceAnalyzer Problem
I have been indexing my flat files (plain text documents) using WhitespaceAnalyzer, in order not to miss any characters during tokenizing. The results are satisfactory when I search with exact search criteria. However, I am unable to get any results or hits when I use wildcard searching with * or ?. Why is this so? Is there any workaround? I am using Lucene 1.4 rc3.

FYI, I am using the same WhitespaceAnalyzer for both indexing and searching.

Please help.

Regards,
Dera.
Re: WhitespaceAnalyzer Problem
Dera - give the troubleshooting techniques provided here a try: http://wiki.apache.org/jakarta-lucene/AnalysisParalysis

Provide us with a more detailed example of a sentence of text you indexed and how you are searching (using QueryParser, I presume) and we can likely offer more assistance.

Erik

On Oct 13, 2004, at 7:21 AM, Gabriela D wrote:
> I have been indexing my flat files (plain text documents) using WhitespaceAnalyzer, in order not to miss any characters during tokenizing. The results are satisfactory when I search with exact search criteria. However, I am unable to get any results or hits when I use wildcard searching with * or ?. Why is this so? Is there any workaround? I am using Lucene 1.4 rc3.
>
> FYI, I am using the same WhitespaceAnalyzer for both indexing and searching.
>
> Please help.
>
> Regards,
> Dera.
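One possible cause worth ruling out, which nothing in this thread confirms, is a case mismatch: whitespace-only tokenizing leaves the original case in the index, so a wildcard pattern that gets lowercased somewhere before matching will miss mixed-case terms. The plain-Java sketch below illustrates only that effect; it does not use Lucene, and the class, methods, and sample text are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class WildcardCaseSketch {
    // Whitespace-only tokenizing: split on runs of whitespace, keep case and punctuation.
    public static List<String> whitespaceTokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    // Translates a pattern using * and ? into a regex and tests it against one whole token.
    public static boolean wildcardMatches(String pattern, String token) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') {
                regex.append(".*");
            } else if (c == '?') {
                regex.append('.');
            } else {
                regex.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.matches(regex.toString(), token);
    }

    // True if the pattern matches at least one token.
    public static boolean anyMatch(String pattern, List<String> tokens) {
        for (String token : tokens) {
            if (wildcardMatches(pattern, token)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> tokens = whitespaceTokens("Report for Lucene-1.4 RC3");
        System.out.println(anyMatch("Luc*", tokens)); // true: case matches the indexed token
        System.out.println(anyMatch("luc*", tokens)); // false: lowercased pattern misses "Lucene-1.4"
    }
}
```

If the indexed text contains upper-case characters, comparing the query as typed with the query actually executed (for example, by printing the parsed query) would show quickly whether this is what is happening.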
Encrypted indexes
We need to have index files that can't be reverse engineered, etc. An obvious approach would be to write a 'FSEncryptedDirectory' class, but that sounds like a performance killer. Does anyone have experience in making an index secure?

Thanks for any help,
Michael Weir

This message may contain privileged and/or confidential information. If you have received this e-mail in error or are not the intended recipient, you may not use, copy, disseminate or distribute it; do not open any attachments, delete it immediately from your system and notify the sender promptly by e-mail that you have done so. Thank you.
Re: Encrypted indexes
Well, are you storing any data for retrieval from the index? I ask because you could encrypt the actual data and then encrypt the search string, public-key style.

Nader Henein

Weir, Michael wrote:
> We need to have index files that can't be reverse engineered, etc. An obvious approach would be to write a 'FSEncryptedDirectory' class, but sounds like a performance killer. Does anyone have experience in making an index secure?
>
> Thanks for any help,
> Michael Weir
Re: Encrypted indexes
On Oct 13, 2004, at 15:26, Nader Henein wrote:
> Well, are you storing any data for retrieval from the index, because you could encrypt the actual data and then encrypt the search string public key style.

Alternatively, write your index to an encrypted volume... something along the lines of FileVault and PGP Disk [1] [2].

PA.

[1] http://www.apple.com/macosx/features/filevault/
[2] http://www.pgp.com/products/desktop/index.html
Re: Encrypted indexes
I think it's possible to encrypt a field with a symmetric encryption algorithm, in just the same way as the compressed field, and algorithms such as DES can be used with little performance loss. If the ability to block reverse engineering is critical, you should use PKI, which would cost considerably more performance than the symmetric methods.

On Wed, 13 Oct 2004 15:33:53 +0200, petite_abeille [EMAIL PROTECTED] wrote:
> On Oct 13, 2004, at 15:26, Nader Henein wrote:
> > Well, are you storing any data for retrieval from the index, because you could encrypt the actual data and then encrypt the search string public key style.
>
> Alternatively, write your index to an encrypted volume... something along the lines of FileVault and PGP Disk [1] [2].
>
> PA.
>
> [1] http://www.apple.com/macosx/features/filevault/
> [2] http://www.pgp.com/products/desktop/index.html

Cheolgoo, Kang
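A minimal sketch of the symmetric approach, using only the JDK's javax.crypto classes: a stored field value is encrypted before it is written and decrypted after it is read back. Two assumptions to note: the sketch substitutes AES-GCM for the DES mentioned above, and the class and method names (`EncryptedFieldSketch`, `encrypt`, `decrypt`) are hypothetical. Nothing here is a Lucene API.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class EncryptedFieldSketch {
    private static final int IV_BYTES = 12;   // nonce length used for GCM in this sketch
    private static final int TAG_BITS = 128;  // authentication tag length

    // Generates a fresh 128-bit AES key; key storage and distribution are out of scope here.
    public static SecretKey newKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(128);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts a field value. Output is Base64 of (IV followed by ciphertext), storable as text.
    public static String encrypt(SecretKey key, String plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ciphertext = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[iv.length + ciphertext.length];
            System.arraycopy(iv, 0, out, 0, iv.length);
            System.arraycopy(ciphertext, 0, out, iv.length, ciphertext.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): splits off the IV, then decrypts and verifies the remainder.
    public static String decrypt(SecretKey key, String stored) {
        try {
            byte[] in = Base64.getDecoder().decode(stored);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey key = newKey();
        String stored = encrypt(key, "confidential field value");
        System.out.println(decrypt(key, stored)); // prints the original value
    }
}
```

Because a random IV is used, the same value encrypts differently each time, so this protects stored field contents only. It does not hide the indexed terms, and encrypted values cannot be matched by an ordinary term query.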
Re: sorting and score ordering
Is there a way I can (without recompiling) make the score have priority and then have my sort take effect when two results have the same rank?

Along with that, is there a simple way to assign a new scorer to the searcher, so that I can use the same Lucene algorithm for my hits but tweak it a little to fit my needs?

-Chris

On Wed, 13 Oct 2004 09:36:04 +0400, Nader Henein [EMAIL PROTECTED] wrote:
> As far as my testing showed, the sort will take priority, because it's basically an opt-in sort as opposed to the default score sort. So you're basically displaying a sorted set over all your results as opposed to sorting the most relevant results.
>
> Hope this helps
>
> Nader Henein
>
> Chris Fraschetti wrote:
> > If I use a Sort instance on my searcher, what will have priority? Score or Sort? Assuming I have pages with .9, .9, and .5 scores, if the .5 has a higher 'sort' value, will it be returned above the .9 Lucene score values if their sort values are lower?

--
Chris Fraschetti, Student
CompSci System Admin
University of San Francisco
e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu
Re: sorting and score ordering
On Wednesday 13 October 2004 19:53, Chris Fraschetti wrote:
> Is there a way I can (without recompiling) make the score have priority and then have my sort take effect when two results have the same rank?

You can just (explicitly) sort by score and use some other field as a second sort key.

Regards
Daniel

--
http://www.danielnaber.de
Re: sorting and score ordering
On Wednesday 13 October 2004 19:53, Chris Fraschetti wrote:
> Is there a way I can (without recompiling) make the score have priority and then have my sort take effect when two results have the same rank?
>
> Along with that, is there a simple way to assign a new scorer to the searcher, so that I can use the same Lucene algorithm for my hits but tweak it a little to fit my needs?

There is no one-to-one relationship between a searcher and a scorer. When a query consists of, e.g., two terms, there will be three scorers executing the search for that query: one TermScorer for each term, and one scorer to combine the other two to provide the search results, usually a BooleanScorer or a ConjunctionScorer. For proximity queries, other scorers are used.

Regards,
Paul Elschot
Index + Searching
Hello,

I am using the IndexHTML class to index around 30,000 files and it is working fine. My question is: is there a way to add multiple fields to the index, so that when the actual search is performed I can extract the exact match? E.g. the fields can be:

1) title - abc
2) name - foo inc
3) description - Lorem ipsum dolor sit
4) URL - www.lorem.ipsum

and so on. When a search finds the match for title 'abc', calling doc.get("name") should return "foo inc", and so on.

Is this already happening in any other indexing class? If not, what do I need to add to the IndexHTML class to accomplish this?

Thanks for all the help, gang.

-H
Re: sorting and score ordering
I haven't seen an example of how to apply two sorts to a search. Can you help me out with that?

-Chris

On Wed, 13 Oct 2004 20:03:05 +0200, Daniel Naber [EMAIL PROTECTED] wrote:
> On Wednesday 13 October 2004 19:53, Chris Fraschetti wrote:
> > Is there a way I can (without recompiling) make the score have priority and then have my sort take effect when two results have the same rank?
>
> You can just (explicitly) sort by score and use some other field as a second sort key.
>
> Regards
> Daniel
>
> --
> http://www.danielnaber.de

--
Chris Fraschetti, Student
CompSci System Admin
University of San Francisco
e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu
Re: sorting and score ordering
Paul Elschot wrote:
> > Along with that, is there a simple way to assign a new scorer to the searcher? So I can use the same lucene algorithm for my hits, but tweak it a little to fit my needs?
>
> There is no one-to-one relationship between a searcher and a scorer.

But you can use a different Similarity implementation with each Searcher.

Doug
Re: sorting and score ordering
On Wednesday 13 October 2004 20:44, Chris Fraschetti wrote:
> I haven't seen an example of how to apply two sorts to a search. Can you help me out with that?

Check out the documentation for Sort(SortField[] fields) and SortField.

Regards
Daniel

--
http://www.danielnaber.de
Re: sorting and score ordering
Use SortField.FIELD_SCORE as the first element in the SortField[] when you pass it to the sort method.

Praveen

----- Original Message -----
From: Chris Fraschetti [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, October 13, 2004 3:19 PM
Subject: Re: sorting and score ordering

> Will do. My other question was: the 'score' for a page, as far as I know, is only accessible post-search and is not contained in a field. How can I specify the score as a sort field when there is no field 'score'?
>
> -Chris
>
> On Wed, 13 Oct 2004 21:06:14 +0200, Daniel Naber [EMAIL PROTECTED] wrote:
> > On Wednesday 13 October 2004 20:44, Chris Fraschetti wrote:
> > > I haven't seen an example of how to apply two sorts to a search. Can you help me out with that?
> >
> > Check out the documentation for Sort(SortField[] fields) and SortField.
> >
> > Regards
> > Daniel
> >
> > --
> > http://www.danielnaber.de
>
> --
> Chris Fraschetti, Student
> CompSci System Admin
> University of San Francisco
> e [EMAIL PROTECTED] | http://meteora.cs.usfca.edu
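The ordering this thread arrives at (score first, another field only to break ties) can be illustrated with a plain-Java comparator. This does not call Lucene: the `Hit` record, its `date` field, and the sample values are hypothetical, and the comparator merely stands in for what a SortField[] starting with SortField.FIELD_SCORE is meant to express. The scores reuse Chris's .9, .9, .5 example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScoreThenFieldSortSketch {
    // Hypothetical hit record: a relevance score plus one sortable field value.
    public static final class Hit {
        final String id;
        final float score;
        final String date;

        public Hit(String id, float score, String date) {
            this.id = id;
            this.score = score;
            this.date = date;
        }
    }

    // Primary key: score, highest first. Secondary key: date ascending, consulted only on ties.
    public static final Comparator<Hit> SCORE_THEN_DATE = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : a.date.compareTo(b.date);
    };

    // Returns the ids of the hits in sorted order, leaving the input list untouched.
    public static List<String> sortedIds(List<Hit> hits) {
        List<Hit> copy = new ArrayList<>(hits);
        copy.sort(SCORE_THEN_DATE);
        List<String> ids = new ArrayList<>();
        for (Hit hit : copy) {
            ids.add(hit.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit("A", 0.9f, "2004-10-13"),
                new Hit("B", 0.9f, "2004-10-12"),
                new Hit("C", 0.5f, "2004-10-01"));
        // prints [B, A, C]
        System.out.println(sortedIds(hits));
    }
}
```

Note that C has the earliest date but still sorts last: the secondary key never overrides a difference in score. In practice exact score ties may be rare, so the second key may come into play less often than expected.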
Re: Lucene disk usage
Hi,

If I remember right, there was a discussion about a compound index needing 3x rather than 2x the index size in disk space during optimization. Was that patched in 1.4.2?

Cheers,
Tea