Re: what's the use of proximity data?

2007-03-26 Thread karl wettin
On 27 Mar 2007, at 09:33, SK R wrote: Hi, I'm speaking about term positions. In the Lucene file format, the .prx file contains the lists of positions at which each term occurs within documents. I asked what's the purpose of this .prx file?
It is generally used for phrase/span queries.
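
To illustrate why phrase queries need position data, here is a minimal Python sketch of a positional index. It is not Lucene code and does not read a .prx file; the function names and the whitespace tokenizer are assumptions made for the example. It only shows that a phrase matches when its terms occur at consecutive positions in the same document.

```python
from collections import defaultdict


def build_positional_index(docs):
    """Map term -> {doc_id -> [positions]}; a toy stand-in for stored position data."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in enumerate(docs):
        for position, term in enumerate(text.lower().split()):
            index[term][doc_id].append(position)
    return index


def phrase_match(index, phrase):
    """Return sorted ids of documents that contain the phrase terms at consecutive positions."""
    terms = phrase.lower().split()
    if not terms or any(term not in index for term in terms):
        return []
    # Only documents that contain every term can match the phrase.
    candidates = set(index[terms[0]])
    for term in terms[1:]:
        candidates &= set(index[term])
    hits = []
    for doc_id in sorted(candidates):
        position_sets = [set(index[term][doc_id]) for term in terms]
        # The phrase matches if some start position is followed by the
        # remaining terms at start + 1, start + 2, and so on.
        if any(
            all(start + offset in position_sets[offset] for offset in range(len(terms)))
            for start in position_sets[0]
        ):
            hits.append(doc_id)
    return hits


docs = ["the quick brown fox", "brown quick the fox", "quick brown"]
index = build_positional_index(docs)
print(phrase_match(index, "quick brown"))
```

Without the position lists, the second document would be indistinguishable from the first for this phrase, since both contain the same terms.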

Re: what's the use of proximity data?

2007-03-26 Thread SK R
Hi, I'm speaking about term positions. In the Lucene file format, the .prx file contains the lists of positions at which each term occurs within documents. I asked what's the purpose of this .prx file? Also, how is a PhraseQuery handled? Thanks & Regards RSK On 3/27/07, karl wettin <[EMAI

Re: what's the use of proximity data?

2007-03-26 Thread karl wettin
On 27 Mar 2007, at 08:49, SK R wrote: Hi, Please clarify my doubts. What's the use of storing proximity data internally while indexing? Is it only for score calculation, or does it serve any other purpose? How does Lucene handle phrase queries? Does it depend on the proximity data of phrase

what's the use of proximity data?

2007-03-26 Thread SK R
Hi, Please clarify my doubts. What's the use of storing proximity data internally while indexing? Is it only for score calculation, or does it serve any other purpose? How does Lucene handle phrase queries? Does it depend on the proximity data of the phrase terms, or on something else? Thanks & Regards RSK

Re: why Apache doesnt create a nice forum like the others???

2007-03-26 Thread karl wettin
On 27 Mar 2007, at 08:28, Mohammad Norouzi wrote: Karl, Maybe I am out of date! Do you mean with Nabble I can access this mailing list? Yes. -- karl On 3/27/07, karl wettin <[EMAIL PROTECTED]> wrote: On 27 Mar 2007, at 08:03, Mohammad Norouzi wrote: > I am using some JBoss products and th

Re: why Apache doesnt create a nice forum like the others???

2007-03-26 Thread Mohammad Norouzi
Karl, Maybe I am out of date! Do you mean with Nabble I can access this mailing list? On 3/27/07, karl wettin <[EMAIL PROTECTED]> wrote: On 27 Mar 2007, at 08:03, Mohammad Norouzi wrote: > I am using some JBoss products and they have a very nice and great > forum, > > I am wondering why Apache st

Re: why Apache doesnt create a nice forum like the others???

2007-03-26 Thread karl wettin
On 27 Mar 2007, at 08:03, Mohammad Norouzi wrote: I am using some JBoss products and they have a very nice and great forum, I am wondering why Apache still uses this old-fashioned mailing list?? Are you sure it is not just your email client that is old-fashioned? You can also try services su

Re: why Apache doesnt create a nice forum like the others???

2007-03-26 Thread Daniel Noll
Mohammad Norouzi wrote: I am using some JBoss products and they have a very nice and great forum, I am wondering why Apache still uses this old-fashioned mailing list?? Probably because you can simulate a forum using a mailing list, but you can't easily simulate a mailing list using a forum.

why Apache doesnt create a nice forum like the others???

2007-03-26 Thread Mohammad Norouzi
I am using some JBoss products and they have a very nice and great forum, I am wondering why Apache still uses this old-fashioned mailing list?? -- Regards, Mohammad

Re: Virtually merge two indexes?

2007-03-26 Thread Chris Lu
Thanks! I need to use fields from both indexes to calculate a final ranking. For example, one index is for the major content, the other index has the frequently-updated vote/score/popularity information. Like you said, the ParallelIndex seems too much hassle to maintain. It could be simple if I stor

Re: Virtually merge two indexes?

2007-03-26 Thread Xiaocheng Luan
How will the indexes be searched? Do you need to search fields in both indexes? If the ParallelReader is not an attractive solution for you, finding a general solution may be difficult. Would it be possible to explore solutions that may work for your specific case? Just a thought. Xiaocheng C

Re: Matched Query Part in Hit Object

2007-03-26 Thread Chris Hostetter
Lucene Query objects do not generally "carry" this kind of information ... for debugging purposes you can use the Explanation class, but it is not particularly efficient. You may also want to look at SpanQueries ... they are a specialized subset of Queries which do keep track of this info, and yo

Re: Virtually merge two indexes?

2007-03-26 Thread Daniel Noll
Chris Hostetter wrote: : I think the better question could be, given a large/stale index A, a : small/updated index B, and the B does not satisfy the requirement of : ParallelReader. How can I create an index C that "add the same : documents in the same order of index A"? 1) optimize A so it has

Re: index word files ( doc )

2007-03-26 Thread Daniel Noll
Ryan Ackley wrote: >> Any comments on this are appreciated. One thing I thought of would be >> to continue to offer the text extraction as open source but add html >> conversion with hit highlighting for a variety of file formats as a >> commercial add on. Is this something anyone would pay for? W

Re: Virtually merge two indexes?

2007-03-26 Thread Chris Hostetter
: 1. When I use setBoost for document in index C, will that be counted in? I don't know what "counted in" means .. are you asking how document boosts affect ParallelReader? ... because I have no idea. : 2. Does index A allow any deletion at all? If index A has some : deletions, I suppose index

Custom Analyzer Help please

2007-03-26 Thread TimF
I would like to be able to get terms from my data that are a combination of two existing analyzers. I would like this for both posting and searching of various fields. An example of the data might be as follows: Hello XY&Z Corporation - [EMAIL PROTECTED] I would like the following terms to come

Re: Virtually merge two indexes?

2007-03-26 Thread Chris Lu
Thanks Chris Hostetter! More questions: 1. When I use setBoost for document in index C, will that be counted in? 2. Does index A allow any deletion at all? If index A has some deletions, I suppose index C should also delete those after optimizing? But which deletion takes precedence? 3. If ind

Re: Virtually merge two indexes?

2007-03-26 Thread Chris Hostetter
: I think the better question could be, given a large/stale index A, a : small/updated index B, and the B does not satisfy the requirement of : ParallelReader. How can I create an index C that "add the same : documents in the same order of index A"? 1) optimize A so it has a single segment with n

Re: Linking two different indexes

2007-03-26 Thread Yakn
Thanks for the ParallelReader, but that is not going to work either. I can see a use for it if I could add the document correctly with the content from Nutch. Ok, so I will try to elaborate as much as possible here. From the previous post I made, I had: Nutch Index

Re: Matched Query Part in Hit Object

2007-03-26 Thread mark harwood
Not sure I understand the problem fully, to be honest. >>however it doesn't give the effective keyword in query string. Are you saying it doesn't add the highlight markup in the appropriate place? If so, can you provide a JUnit example? >>I want to find that "lorem" is the matched part of the que

Re: Matched Query Part in Hit Object

2007-03-26 Thread Mohsen Saboorian
Any hint? Mohsen Saboorian wrote: > > Hi, > Is there a way to find the matched part of query string in the Hit object? > Lucene's Highlighter module does part of the job, highlighting the > matched word in the result document, however it doesn't give the effective > keyword in query string. >

Re: Virtually merge two indexes?

2007-03-26 Thread Chris Lu
Hi, Steven, Although it's true that you would need to re-index your content for the frequently updated fields, you would *not* need to re-index the large/stale content index, as long as you keep constant the number of documents and the order in which you index them. This seems good but too st

Re: Virtually merge two indexes?

2007-03-26 Thread Steven Rowe
Hi Chris, Chris Lu wrote: > Hi, Steven, > > Thanks for the instant reply! But let's see the warning in the > ParallelReader javadoc: > "It is up to you to make sure all indexes are created and modified > the same way. For example, if you add documents to one index, you need > to add the same docu

Re: Virtually merge two indexes?

2007-03-26 Thread Chris Lu
Hi, Steven, Thanks for the instant reply! But let's see the warning in the ParallelReader javadoc: "It is up to you to make sure all indexes are created and modified the same way. For example, if you add documents to one index, you need to add the same documents in the same order to the other ind

Re: Virtually merge two indexes?

2007-03-26 Thread Steven Rowe
I think ParallelReader, first released in Lucene-Java 1.9, should meet your needs: - An IndexReader which reads multiple, parallel indexes. Each index added must have the same number of documents, but typical
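
The constraint quoted above (every parallel index must hold the same number of documents, added in the same order) can be pictured with a small Python sketch. This is not Lucene's ParallelReader; the class name `ParallelView` and the list-of-dicts representation are assumptions made for illustration. The point is that documents are paired purely by document number, so any drift in count or order silently pairs the wrong fields.

```python
class ParallelView:
    """Toy view over several 'indexes' (lists of field dicts) aligned by document number."""

    def __init__(self, *indexes):
        sizes = {len(index) for index in indexes}
        if len(sizes) > 1:
            # Mirrors the requirement that all parallel indexes have the same document count.
            raise ValueError("all parallel indexes must have the same number of documents")
        self._indexes = indexes

    def __len__(self):
        return len(self._indexes[0]) if self._indexes else 0

    def document(self, doc_num):
        """Merge the fields stored for doc_num in every underlying index."""
        merged = {}
        for index in self._indexes:
            merged.update(index[doc_num])
        return merged


# Large, rarely updated content fields in one index ...
content = [{"id": "a", "body": "first text"}, {"id": "b", "body": "second text"}]
# ... and small, frequently updated fields in another, in the same document order.
votes = [{"votes": 3}, {"votes": 7}]

view = ParallelView(content, votes)
print(view.document(1))
```

Rebuilding only the small `votes` list is cheap, but it must be rebuilt with exactly the same document order as `content` for the pairing to stay correct.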

Virtually merge two indexes?

2007-03-26 Thread Chris Lu
Hi, Gurus, One thing I want to do is: one index has fields like [primary-key, not-so-frequently-updated-fields, large-content-fields,...], and another index has [primary-key, frequently-updated-fields]. The purpose is to make the indexing process faster by keeping large/stale fields in one index

Re: Running out of memory while doing a search

2007-03-26 Thread Santa Clause
Here are the queries being run: +spanFirst(FIELD1:bfc, 2) spanNear([FIELD1:bfc,FIELD1:51], 2, true) This works with 603 matches +spanFirst(FIELD1:bfc, 2) spanNear([FIELD1:bfc,FIELD1:51], 2, true) +YNFIELD:y Runs out of memory (should have ~300 matches) +(+spanFirst(FIELD1:bfc,

Re: how to search over another search

2007-03-26 Thread Steven Rowe
Oops, sorry for the confusion, I was thinking of ParallelReader, first available in Lucene-Java release 1.9: - An IndexReader which reads multiple, parallel indexes. Each index added must have the same number

Re: how to search over another search

2007-03-26 Thread Steven Rowe
Hi Mohammad, Have you looked at MultiSearcher? Section 5.6 of Lucene in Action covers its use. Steve Mohammad Norouzi wrote: > hi > I have two separated index but there are some fields that are common > betwee

Re: search on multiple fields

2007-03-26 Thread Grant Ingersoll
Have a look at the MultiFieldQueryParser. It isn't exactly what you are describing, but should be able to handle it. If you are building your queries programmatically, you can ask the IndexReader for the field names using the getFieldNames method. On Mar 26, 2007, at 4:51 AM, Melanie Langloi
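
The programmatic approach mentioned above can be sketched in Python as plain string handling. This is not the MultiFieldQueryParser API; the function name and the field names (`content_en`, `content_fr`) are hypothetical, and the term is assumed to be a single plain word that needs no escaping. Given the list of field names in the index, it expands one term into an OR query over every field that shares a prefix.

```python
def expand_over_fields(field_names, prefix, term):
    """Build 'field:term OR field:term ...' for every field name starting with prefix."""
    matching = sorted(name for name in field_names if name.startswith(prefix))
    return " OR ".join(f"{name}:{term}" for name in matching)


# Hypothetical field names, as they might be reported by the index.
fields = ["title", "content_fr", "content_en"]
print(expand_over_fields(fields, "content", "mysearch"))
```

The resulting string uses ordinary `field:term` query syntax, so it could be handed to a regular query parser instead of relying on a wildcard in the field name.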

Re: how to search over another search

2007-03-26 Thread Erick Erickson
The short form is no. Lucene is emphatically NOT a relational database. Of course, you could take the results of the first search, collect the IDs and query the second, but for large sets this may not be practical. Why not combine the indexes? That would be the "Lucene way"... There has been exte
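
The two-step approach described above (search the first index, collect the IDs, then query the second) looks roughly like this in Python. It is a toy sketch over lists of dicts rather than Lucene indexes; the helper names, the `id` key, and the sample records are assumptions for illustration.

```python
def search(index, field, value):
    """Toy exact-match search: return the records whose field equals value."""
    return [record for record in index if record.get(field) == value]


def join_on_id(first_hits, second_index, key="id"):
    """Return the records of second_index whose key appears among the first hits."""
    ids = {record[key] for record in first_hits}
    return [record for record in second_index if record.get(key) in ids]


# Hypothetical sample data standing in for two separate indexes.
index1 = [{"id": 1, "name": "jack"}, {"id": 2, "name": "jill"}, {"id": 3, "name": "jack"}]
index2 = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}, {"id": 3, "city": "Lima"}]

hits = search(index1, "name", "jack")
print(join_on_id(hits, index2))
```

The set of collected IDs grows with the size of the first result, which is why this becomes impractical for large result sets and why combining the indexes is the suggested alternative.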

Re: search on multiple fields

2007-03-26 Thread Erick Erickson
The fastest way to figure this out for yourself would be to fire up Luke and try it. That said, I'm quite sure it won't work. Erick On 3/26/07, Melanie Langlois <[EMAIL PROTECTED]> wrote: Hi, I'm wondering if Lucene would understand such a query: content*:mysearch It's just because I i

Re: index word files ( doc )

2007-03-26 Thread Antony Bowesman
Ryan Ackley wrote: The 512 byte thing is a limitation of POIFS I think. I could be wrong though. Have you tried opening the file with just POIFS? It was some time ago, but it looks like I used both org.apache.poi.hwpf.extractor.WordExtractor and org.apache.poi.hdf.extractor.WordDocument with the

Re: index word files ( doc )

2007-03-26 Thread John Haxby
John Haxby wrote: Sami Siren wrote: There's also antiword [1] which can convert your .doc to plain text or PS, not sure how good it is. antiword isn't very good. I use wvWare (http://wvware.sourceforge.net/) directly, but you may find that using abiword is better for you (abiword is an edi

Re: index word files ( doc )

2007-03-26 Thread John Haxby
Sami Siren wrote: There's also antiword [1] which can convert your .doc to plain text or PS, not sure how good it is. antiword isn't very good. I use wvWare (http://wvware.sourceforge.net/) directly, but you may find that using abiword is better for you (abiword is an editor, but it also do

Re: index word files ( doc )

2007-03-26 Thread Ryan Ackley
The 512 byte thing is a limitation of POIFS I think. I could be wrong though. Have you tried opening the file with just POIFS? On 3/26/07, Antony Bowesman <[EMAIL PROTECTED]> wrote: Ryan Ackley wrote: > Yes I do have plans for adding fast save support and support for more > file formats. The tim

Re: index word files ( doc )

2007-03-26 Thread Ryan Ackley
That is good to know, thank you. Looking at their documentation, their preview seems to show the contents of the index for a particular file and you can transform this using XML. I can see how this would be useful. What I was proposing was a conversion from the binary format to HTML and including t

Re: how to search over another search

2007-03-26 Thread Mohammad Norouzi
I mean when I get result from the first index, find the common records from the second index depending on first result. something like relation between two database tables, relation by primary key index1: id name somefield1 1 jack value1 2

RE: Reverse search

2007-03-26 Thread Melanie Langlois
Hi Mark, Thanks, it does help. I will try that. Mélanie -Original Message- From: markharw00d [mailto:[EMAIL PROTECTED] Sent: Monday, March 26, 2007 12:36 AM To: java-user@lucene.apache.org Subject: Re: Reverse search On app startup: 1) parse all Queries and place in an array. 2) Creat

search on multiple fields

2007-03-26 Thread Melanie Langlois
Hi, I'm wondering if Lucene would understand such a query: content*:mysearch It's just because I index several translations of my document contents in addition to common fields, and this separation is really useful when a user specifies the language in which he wants to search, but I w

Re: how to search over another search

2007-03-26 Thread jafarim
What do you mean by "applying the result to the second one"? On 3/26/07, Mohammad Norouzi <[EMAIL PROTECTED]> wrote: hi I have two separate indexes but there are some fields that are common between them. Now I want to search from one index and then apply the result to the second one. what soluti

Re: index word files ( doc )

2007-03-26 Thread jafarim
Good to know that your devised commercial feature is already offered by Enhydra Snapper as an open-source feature. Check here: http://www.enhydra.org/apps/snapper/index.html On 3/26/07, Ryan Ackley <[EMAIL PROTECTED]> wrote: Yes I do have plans for adding fast save support and support for more