Query search syntax: abs_path
Hello list,

When I search on the property abs_path, I only get results if the path name is entirely lower-case. If the path contains even one upper-case letter, the search returns nothing. Must the path contain only lower-case letters?

Best regards,
Rodrigo Baptista.

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
RE: clustering results
-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: April 11, 2004 1:03 PM
To: Lucene Users List
Subject: Re: clustering results

> I got all excited reading the subject line "clustering results", but this isn't really clustering, is it? This is more sorting.
>
> Does anyone know of any work within Lucene (or another indexer) to do actual subject clustering (i.e. like Vivisimo @ http://vivisimo.com/ or Kartoo @ http://www.kartoo.com/)? It would be pretty awesome if Lucene had such an ability. I know there aren't a whole lot of clustering options, and the commercial products are very expensive. Anyhow, just curious.

The one I know about is Carrot - http://www.cs.put.poznan.pl/dweiss/carrot/

Regards,
Bruce Ritchie
http://www.jivesoftware.com/
Re: ValueListHandler pattern with Lucene
On Sunday 11 April 2004 17:46, Erik Hatcher wrote:
> In other words, you need to invent your own pattern here?! :)

I just experimented a bit and came up with the ValueListSupplier, which replaces the ValueList in the VLH. Seems to work so far... :-)

Comments are greatly appreciated!

Timo

    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;

    import org.apache.commons.logging.Log;
    import org.apache.commons.logging.LogFactory;
    import org.apache.lucene.search.Hits;

    // IValueListIterator and SearchResultAdapter are project classes not shown here.
    public class ValueListSupplier implements IValueListIterator {

        private final Log log = LogFactory.getLog(this.getClass());

        // TODO junit test case
        private Hits hits;
        protected BitSet fetched;   // bit i is set once hit i has been loaded
        protected List list;        // one slot per hit, null until fetched
        protected int index;        // current cursor position

        public ValueListSupplier(Hits hits) {
            int size = hits.length();
            this.list = new ArrayList(size);
            // new ArrayList(size) only reserves capacity, so fill with null placeholders
            for (int i = 0; i < size; i++)
                list.add(null);
            this.fetched = new BitSet();
            this.hits = hits;
            this.index = 0;
        }

        public List getList() { return list; }

        public int size() { return list.size(); }

        public boolean hasPrevious() { return index > 0; }

        public boolean hasNext() { return index < size(); }

        /**
         * @param index the new cursor position
         */
        public synchronized void move(int index) { this.index = index; }

        public void reset() { move(0); }

        public Object current() {
            validate(index, index + 1);
            return list.get(index);
        }

        public List previous(int count) {
            int from = Math.max(0, index - count);
            int to = index;
            validate(from, to);
            move(from);
            return list.subList(from, to);
        }

        public List next(int count) {
            int from = index;
            // "to" is exclusive, so clamp to size() or the last hit is never returned
            int to = Math.min(size(), index + count);
            validate(from, to);
            move(to);
            return list.subList(from, to);
        }

        /**
         * Loads every hit in the range that has not been fetched yet.
         *
         * @param from starting index (inclusive)
         * @param to ending index (exclusive)
         */
        private void validate(int from, int to) {
            // always advance past the current index so a failed fetch cannot loop forever
            for (int i = fetched.nextClearBit(from); i < to; i = fetched.nextClearBit(i + 1)) {
                log.debug("fetching #" + i);
                try {
                    list.set(i, SearchResultAdapter.wrap(hits.doc(i)));
                    fetched.set(i);
                } catch (IOException e) {
                    // TODO potential bug: the slot stays null and is retried on the next call
                    log.error("could not fetch hit #" + i, e);
                }
            }
        }
    }
Re: verifying index integrity
Doug Cutting wrote:
> If you use this method, it is possible to corrupt things. In particular, if you unlock an index that another process is modifying, then modify it, then these two processes might step on one another. So this method should only be called when you are certain that no one else is modifying the index.

We're handling this by using .pid files. We use a standard initializer and our own lock files containing process IDs. If you're on UNIX I can give you the source to the JNI getpid that I created. I've been meaning to open-source this anyway... probably by putting it into commons.

This way you can prevent multiple initialization if a Java process that might be working with your index is currently running. Otherwise there's no real way to be sure the lock isn't stale (short of going by the lock's age, and that slows things down).

Kevin

--
Please reply using PGP. http://peerfear.org/pubkey.asc
NewsMonster - http://www.newsmonster.org/
Kevin A. Burton, Location - San Francisco, CA, Cell - 415.595.9965
AIM/YIM - sfburtonator, Web - http://peerfear.org/
GPG fingerprint: 5FB2 F3E2 760E 70A8 6174 D393 E84D 8D04 99F1 4412
IRC - freenode.net #infoanarchy | #p2p-hackers | #newsmonster
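For readers who want to try the .pid idea, here is a minimal sketch of how such a lock file might work. It is not Kevin's implementation: the class name PidLock and both method names are made up for illustration, and it assumes Java 9 or later, where ProcessHandle makes a JNI getpid unnecessary. The lock file simply holds the owner's process ID, and a lock counts as stale when no live process has that ID.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: a lock file that stores the owner's PID.
class PidLock {

    /** Returns true if lockFile exists and names a process that is still running. */
    static boolean isHeldByLiveProcess(Path lockFile) throws IOException {
        if (!Files.exists(lockFile)) {
            return false;
        }
        String text = new String(Files.readAllBytes(lockFile), StandardCharsets.UTF_8).trim();
        try {
            long pid = Long.parseLong(text);
            if (pid <= 0) {
                return false; // not a valid process ID: treat the lock as stale
            }
            return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
        } catch (NumberFormatException e) {
            return false; // unreadable contents: treat the lock as stale
        }
    }

    /**
     * Claims the lock for the current process unless a live process already holds it.
     * The check and the write are not atomic, so two processes starting at the same
     * instant could both succeed; this sketch ignores that race.
     */
    static boolean tryAcquire(Path lockFile) throws IOException {
        if (isHeldByLiveProcess(lockFile)) {
            return false;
        }
        long myPid = ProcessHandle.current().pid();
        Files.write(lockFile, Long.toString(myPid).getBytes(StandardCharsets.UTF_8));
        return true;
    }
}
```

An initializer would call PidLock.tryAcquire(path) once at startup and refuse to touch the index if it returns false. Note that operating systems reuse process IDs, so a stale file can occasionally look live; the sketch only narrows the problem Doug describes, it does not remove it.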
suitability of lucene for project
Hi all,

I am investigating technologies for a project that retrieves HTML pages on a regular basis (or whenever they change), parses the HTML to extract specific information, and presents the results as links on a web page. Note that this is not a general search-engine project; we are extracting clinical information from various websites and consolidating it.

Please advise me whether Lucene can do the above. For the areas where it cannot, suggestions for alternative solutions would be appreciated.

Thanks,
Sebastian Ho
Bioinformatics Institute
Re: suitability of lucene for project
No, Lucene is not the right solution for this particular use. It does not include anything to retrieve HTML pages or to parse them. However, if you ever need full-text search, then Lucene is where it's at.

Erik

On Apr 12, 2004, at 9:28 PM, Sebastian Ho wrote:
> Hi all,
>
> I am investigating technologies for a project that retrieves HTML pages on a regular basis (or whenever they change), parses the HTML to extract specific information, and presents the results as links on a web page. Note that this is not a general search-engine project; we are extracting clinical information from various websites and consolidating it.
>
> Please advise me whether Lucene can do the above. For the areas where it cannot, suggestions for alternative solutions would be appreciated.
>
> Thanks,
> Sebastian Ho
> Bioinformatics Institute
Re: suitability of lucene for project
It could be part of your solution, but I don't think so. Let me explain: I've done something similar to what you describe a few times. I often use HttpUnit to fetch the pages. How you process them is up to you. If you want the content to be indexed (searchable), you can use Lucene. If you want to extract structured (or semi-structured) information, use wrapper induction techniques (not Lucene).

cheers,
sv

On 13 Apr 2004, Sebastian Ho wrote:
> Hi all,
>
> I am investigating technologies for a project that retrieves HTML pages on a regular basis (or whenever they change), parses the HTML to extract specific information, and presents the results as links on a web page. Note that this is not a general search-engine project; we are extracting clinical information from various websites and consolidating it.
>
> Please advise me whether Lucene can do the above. For the areas where it cannot, suggestions for alternative solutions would be appreciated.
>
> Thanks,
> Sebastian Ho
> Bioinformatics Institute