Re: Get element Class DOM !!!!

2009-01-13 Thread Fredrik Andersson
This has nothing to do with Lucene, but as I have written something very similar I'm taking the bait. You're best off using XPath or a similar XML/HTML query language to parse the product specs, prices or whatever you're after. Each webshop you're indexing will have its own set of query expressions fo
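
A minimal sketch of the XPath approach described above, using only the JDK's built-in `javax.xml.xpath` package. The class name `ProductSpecExtractor`, the sample markup and the expression are hypothetical, and the sketch assumes the page is well-formed XML/XHTML; real shop pages usually need a tidying step first.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Lucene or of the original post.
class ProductSpecExtractor {
    // Evaluates one XPath expression against a well-formed XML/XHTML page
    // and returns the string value of the first matching node
    // (an empty string when nothing matches).
    static String extract(String xml, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(xml)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad expression or unparsable page", e);
        }
    }
}
```

Each shop would then get its own expression, for example `//span[@class='price']/text()` for a page that marks prices that way.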

Re: ranking

2008-06-28 Thread Fredrik Andersson
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/Similarity.html On Sat, Jun 28, 2008 at 2:16 AM, Maha Khairy <[EMAIL PROTECTED]> wrote: > > > I wanted to know how the ranking work in Lucene and if it is only according > to the frequency or there is any oth

Re: Multiple instances of Lucene IndexWriter

2007-10-13 Thread Fredrik Andersson
); > directory.makeLock(IndexWriter.WRITE_LOCK_NAME).isLocked() > > or > Directory directory = FSDirectory.getDirectory(indexDir); > IndexReader.isLocked(directory) > > many thanks, > David > > > Fredrik Andersson-2 wrote: > > > > What you suggested is

Re: Multiple instances of Lucene IndexWriter

2007-10-12 Thread Fredrik Andersson
What you suggested is generally the most easygoing way to deal with it, i.e. having a separate index per writer and one serial merging process. I have dabbled with disabling (file system) locks and synchronizing the writing processes by different means, but it's failure-prone unless you're very famil

Re: Sub indices in Lucene

2007-02-14 Thread Fredrik Andersson
This is functionality you would build on top of Lucene. I.e., when a document is handed to the indexing module, you check the category and append the document to the appropriate index. You can then search multiple indices with one searcher or with multiple searchers. Also, Lucene 1.4 is kinda old... wel

Term hitscores in multiterm searches

2006-11-23 Thread Fredrik Andersson
Hi gang! If you send a multiterm query to Lucene, say "foo bar zoo", it gives you a heap of documents (Hits) as a result and all is well. If you want some tracking abilities for this query, for instance you want to know that document X was included in the Hits because "foo" and "bar" matched but "zo
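
To make the question concrete, here is a plain-Java sketch of the bookkeeping being asked for: for every document that matches at least one query term, record which terms matched. It deliberately avoids the Lucene API; the class `MatchedTerms`, the whitespace tokenization and the in-memory document map are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration, not Lucene code.
class MatchedTerms {
    // Splits text into lower-cased, whitespace-separated tokens.
    static Set<String> tokens(String text) {
        return new LinkedHashSet<>(Arrays.asList(text.trim().toLowerCase(Locale.ROOT).split("\\s+")));
    }

    // Maps each matching document id to the subset of query terms it contains.
    // Documents that contain none of the terms are left out, like non-hits.
    static Map<String, Set<String>> matchedTerms(Map<String, String> docs, String query) {
        Set<String> queryTerms = tokens(query);
        Map<String, Set<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            Set<String> docTerms = new HashSet<>(tokens(doc.getValue()));
            Set<String> matched = new LinkedHashSet<>(queryTerms);
            matched.retainAll(docTerms); // keep only the query terms present in this document
            if (!matched.isEmpty()) {
                result.put(doc.getKey(), matched);
            }
        }
        return result;
    }
}
```

With a document X containing "foo" and "bar" but not "zoo", the result for the query "foo bar zoo" maps X to the set {foo, bar}.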

Re: Unable to create lucene index

2006-10-18 Thread Fredrik Andersson
If you want to create an index, you have to supply true as the last constructor argument to IndexWriter. The lock files use some kind of hash for their IDs and might very well persist even if you delete the directory. So, delete the new directory (if it ever was created), delete any lock files, ch

Re: Forcing an IndexReader to read-only

2006-09-01 Thread Fredrik Andersson
Hossman, thank you! Exactly what I was looking for. And I know the application of "locks", it's just a little peculiar situation right now which requires this... "fix" : ) Fredrik On 8/31/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : Even if it's very briefly whilst opening the index - a w

Re: Yonik Seeley joins Lucene PMC

2006-08-31 Thread Fredrik Andersson
Project Management Committee On 8/31/06, Leimbach, Johannes <[EMAIL PROTECTED]> wrote: Please excuse my stupid question - but what is a PMC? -Original Message- From: Fredrik Andersson [mailto:[EMAIL PROTECTED] Sent: Thursday, 31 August 2006 09:56 To: g

Re: Yonik Seeley joins Lucene PMC

2006-08-31 Thread Fredrik Andersson
Congrats, well deserved! On 8/31/06, Peter Keegan <[EMAIL PROTECTED]> wrote: Wow, from the contributions I've seen, I thought he was already a member. Congratulations, Yonik! Peter On 8/30/06, Doug Cutting <[EMAIL PROTECTED]> wrote: > > The Lucene PMC has voted to add Yonik Seeley to its ran

Re: Forcing an IndexReader to read-only

2006-08-31 Thread Fredrik Andersson
ased and the IndexReader has no further need to lock the index unless you attempt a delete. : Date: Wed, 30 Aug 2006 17:31:32 +0200 : From: Fredrik Andersson <[EMAIL PROTECTED]> : Reply-To: general@lucene.apache.org : To: general@lucene.apache.org : Subject: Forcing an IndexReader to read-only

Forcing an IndexReader to read-only

2006-08-30 Thread Fredrik Andersson
Hi guys! I don't know if I've missed some crucial feature here, but how d'you actually force an IndexSearcher (and hence, the underlying IndexReader) to go read-only? The default behaviour now seems to be that the first one to acquire a lock automatically gets a read/write-lock, instead of leavin

Re: Kind of hardware config ?

2006-08-29 Thread Fredrik Andersson
Hey guys. 4 GB of RAM for an index of 2 million documents should really not be a problem. You should consider separating the index from the actual content (i.e., only save the index data in your index, not the html), if you have the possibility to do that. I am not very comfortable with the very c

Re: Enforcing total matches

2006-05-29 Thread Fredrik Andersson
Yeah, I meant field matches of course. Well ok, I'll check out the Keyword Analyzer and give it a go, thanks! On 5/30/06, Chris Hostetter <[EMAIL PROTECTED]> wrote: : I'd like to know if there's a way to force a query to return only exact : hits, not partials/subsets. For instance: : A query "fo

Enforcing total matches

2006-05-28 Thread Fredrik Andersson
Hey guys! I'd like to know if there's a way to force a query to return only exact hits, not partials/subsets. For instance: A query "foo bar" will match on a field with "foo bar zoo". This I would like to avoid as I'm in need of removing duplicates on certain fields. Two options considered so far,
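
The difference between the two behaviours can be shown without Lucene. The sketch below contrasts a token-level match (the field is split into words, so "foo bar" matches "foo bar zoo") with a whole-field match (the field is compared as a single value). The class `ExactFieldMatch` and its normalization rules are assumptions made for illustration, and the token-level check is a simplification that ignores word order.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration, not Lucene code.
class ExactFieldMatch {
    // Lower-cases and collapses whitespace so "Foo  Bar" and "foo bar" compare equal.
    static String normalize(String text) {
        return text.trim().toLowerCase(Locale.ROOT).replaceAll("\\s+", " ");
    }

    // Token-level behaviour: true when every query token occurs somewhere in the field.
    static boolean tokenMatch(String field, String query) {
        Set<String> fieldTokens = new HashSet<>(Arrays.asList(normalize(field).split(" ")));
        return fieldTokens.containsAll(Arrays.asList(normalize(query).split(" ")));
    }

    // Whole-field behaviour: true only when the normalized field equals the normalized query.
    static boolean exactMatch(String field, String query) {
        return normalize(field).equals(normalize(query));
    }
}
```

For duplicate detection on a field, the whole-field comparison is the behaviour wanted here: "foo bar" no longer matches "foo bar zoo".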

Re: Should I use one or many index's?

2006-05-26 Thread Fredrik Andersson
Ok, I figured you had some setup like that. Personally, I would prefer one large index. The overhead associated with opening/closing/managing thousands of searchers/modifiers is much bigger than that of incorporating the per-user restriction in the query. Also, you risk running out of file pointers, depe

Re: Should I use one or many index's?

2006-05-26 Thread Fredrik Andersson
If users should only be able to search their own documents, it would probably make sense to keep their respective indices local. Besides greater query speed, it would also simplify things when updating/appending to the index. So, that would mean one index, one IndexModifier and one IndexSearc

Moving index from 32 to 64 bit

2006-05-26 Thread Fredrik Andersson
Hi guys. Short question: Can a Lucene index (v1.9) be moved from a 32-bit Linux platform to a 64-bit Linux platform without breaking it? Thanks, Fredrik

Re: Term Vectors -- searching or just ranking?

2006-04-21 Thread Fredrik Andersson
Hi James, I can't speak for anyone else, but my experience is that the general approach is to first select a subset based on the angle between the query vector and the document vector, in their non-reduced forms (this is a normal search-for-keyword, what Lucene does by default, in vector notation)
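
The "angle between the query vector and the document vector" is usually measured by its cosine. A small sketch, assuming both vectors are dense arrays over the same term vocabulary (the class name `Cosine` and the array representation are illustrative, not how Lucene stores term vectors):

```java
// Hypothetical illustration, not Lucene code.
class Cosine {
    // Returns the cosine of the angle between two equally sized term-weight vectors:
    // 1.0 for the same direction, 0.0 for no shared terms (or an all-zero vector).
    static double cosine(double[] query, double[] doc) {
        if (query.length != doc.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double dot = 0.0, queryNorm = 0.0, docNorm = 0.0;
        for (int i = 0; i < query.length; i++) {
            dot += query[i] * doc[i];
            queryNorm += query[i] * query[i];
            docNorm += doc[i] * doc[i];
        }
        if (queryNorm == 0.0 || docNorm == 0.0) {
            return 0.0; // the angle is undefined for a zero vector; treat as no similarity
        }
        return dot / (Math.sqrt(queryNorm) * Math.sqrt(docNorm));
    }
}
```

Selecting a subset then amounts to keeping the documents whose cosine against the query exceeds some threshold, or simply the top few by that score.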

Re: updating Lucene Index

2006-01-26 Thread Fredrik Andersson
[EMAIL PROTECTED] > Sent: Tuesday, January 24, 2006 4:04 PM > To: general@lucene.apache.org > Subject: RE: updating Lucene Index > > even if you use IndexModifier class, > you should delete then addDoc the document to be updated. > > Thanks, > > Koji > > > -Orig

Re: updating Lucene Index

2006-01-24 Thread Fredrik Andersson
Hi, Use the IndexModifier class? On 1/24/06, Kodumuri, Madhavi <[EMAIL PROTECTED]> wrote: > > Hi, > > My Lucene Indexer indexes from scratch with no problem. But I would like > to update the index database next time I run Indexer rather than > deleting the database and creating index from scratch

Any tricks to IndexReader.termDocs(Term)?

2005-10-04 Thread Fredrik Andersson
Hey Gang. Problem regarding the termDocs(Term) function, help most appreciated. // Create stored, indexed, non-tokenized field Field field = new Field("someId", someInteger+"", true, true, false); doc.add(field); This field looks fine in Luke and can be read properly, however, when I try to get a

Re: Reading back binary fields

2005-09-29 Thread Fredrik Andersson
Problem solved, it was a problem located elsewhere in the code not related to Lucene. Sorry! Fredrik On 9/29/05, Fredrik Andersson <[EMAIL PROTECTED]> wrote: > > Hey Gang! > > I'm having some problems when modifying an existing index, adding a binary > field

Reading back binary fields

2005-09-29 Thread Fredrik Andersson
Hey Gang! I'm having some problems when modifying an existing index, adding a binary field to each document. Or more specifically, I have a problem reading back that field. I'm using the IndexModifier from the trunk, and I am positive that the binary field gets written down, since the field name s

Re: Binary fields in index

2005-09-26 Thread Fredrik Andersson
> You can encode (e.g. base64) the binary data to get a String > and store the String. > > Koji > > > -Original Message- > > From: Fredrik Andersson [mailto:[EMAIL PROTECTED] > > Sent: Monday, September 26, 2005 6:31 PM > > To: general@lucene.apache.org &g
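
The base64 suggestion quoted above can be sketched as follows. This uses `java.util.Base64`, which only exists in Java 8 and later, so it is a present-day illustration rather than what was available at the time of the thread; the class name `BinaryFieldCodec` is hypothetical. The encoded String is what would be stored in an ordinary stored, unindexed field.

```java
import java.util.Base64;

// Hypothetical helper, not part of Lucene.
class BinaryFieldCodec {
    // Turns arbitrary bytes into an ASCII-only String that is safe to store in a text field.
    static String encode(byte[] data) {
        return Base64.getEncoder().encodeToString(data);
    }

    // Restores the original bytes from the stored String.
    static byte[] decode(String stored) {
        return Base64.getDecoder().decode(stored);
    }
}
```

The trade-off is size: base64 output is roughly a third larger than the raw bytes.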

Binary fields in index

2005-09-26 Thread Fredrik Andersson
Hello Gang! Is there any trick, or undocumented way, to store binary (unindexed, untokenized) data in a Lucene Field? All the Field constructors just deal with Strings. I'm currently using another database to store binary data, but it would be very neat, and more efficient, to store it directly in

VSM in Lucene, again

2005-09-04 Thread Fredrik Andersson
Hi folks. I read a transcript from last month's digest of this list, in a post by Rajesh Munavalli, that Lucene uses a VSM retrieval method. In my previous work with VSM, it has included matching a query vector against the documents in the term-document space. I have dissected and customized a l