Query search syntax: abs_path

2004-04-12 Thread Rodrigo Baptista

Hello list,

When I search using the property abs_path, I only get results if
the path name is entirely lower-case. If it contains even one
upper-case letter, the search doesn't work.
Must the path contain only lower-case letters?

Best regards,
Rodrigo Baptista.


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



RE: clustering results

2004-04-12 Thread Bruce Ritchie
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
> Sent: April 11, 2004 1:03 PM
> To: Lucene Users List
> Subject: Re: clustering results
>
> I got all excited reading the subject line "clustering
> results" but this isn't really clustering is it?  This is
> more sorting.  Does anyone know of any work within Lucene (or
> another indexer) to do actual subject clustering (i.e. like
> Vivisimo @ http://vivisimo.com/ or Kartoo @
> http://www.kartoo.com/)?  It would be pretty awesome if
> Lucene had such ability, I know there aren't a whole lot of
> clustering options, and the commercial products are very expensive.
> Anyhow, just curious.

The one I know about is Carrot - http://www.cs.put.poznan.pl/dweiss/carrot/


Regards,

Bruce Ritchie
http://www.jivesoftware.com/




Re: ValueListHandler pattern with Lucene

2004-04-12 Thread lucene
On Sunday 11 April 2004 17:46, Erik Hatcher wrote:
> In other words, you need to invent your own pattern here?!  :)

I just experimented a bit and came up with the ValueListSupplier which 
replaces the ValueList in the VLH. Seems to work so far... :-) Comments are 
greatly appreciated!

Timo

import java.io.IOException;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.lucene.search.Hits;

// IValueListIterator and SearchResultAdapter are my own classes (not shown).
public class ValueListSupplier implements IValueListIterator
{
    private final Log log = LogFactory.getLog(this.getClass());

    // TODO junit test case
    private Hits hits;
    protected BitSet fetched;
    protected List list;
    protected int index;

    public ValueListSupplier(Hits hits)
    {
        int size = hits.length();
        this.list = new ArrayList(size);
        // stupid idiots at SUN: ArrayList(int) only reserves capacity,
        // so pad with nulls to make set(i, ...) legal
        for (int i = 0; i < size; i++) list.add(null);
        this.fetched = new BitSet();
        this.hits = hits;
        this.index = 0;
    }

    public List getList()
    {
        return list;
    }

    public int size()
    {
        return list.size();
    }

    public boolean hasPrevious()
    {
        return index > 0;
    }

    public boolean hasNext()
    {
        return index < size();
    }

    /**
     * @param index
     */
    public synchronized void move(int index)
    {
        this.index = index;
    }

    public void reset()
    {
        move(0);
    }

    public Object current()
    {
        validate(index, index + 1);
        return list.get(index);
    }

    public List previous(int count)
    {
        int from = Math.max(0, index - count);
        int to = index;

        validate(from, to);
        move(from);
        return list.subList(from, to);
    }

    public List next(int count)
    {
        int from = index;
        // "to" is exclusive, so clamp to size(); clamping to size() - 1
        // would never return the last hit and hasNext() would stay true
        int to = Math.min(size(), index + count);

        validate(from, to);
        move(to);
        return list.subList(from, to);
    }

    /**
     * Fetches every document in the range that has not been fetched yet.
     *
     * @param from
     * starting index (inclusive)
     * @param to
     * ending index (exclusive)
     */
    private void validate(int from, int to)
    {
        while ((from = fetched.nextClearBit(from)) < to)
        {
            log.debug("fetching #" + from);

            try
            {
                list.set(from,
                    SearchResultAdapter.wrap(hits.doc(from)));
                fetched.set(from);
            }
            catch (IOException e)
            {
                // TODO potential bug: entries from here on stay null
                e.printStackTrace();
                // stop here: the bit is still clear, so looping again
                // would retry the same document forever
                break;
            }
        }
    }

}
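
The lazy-fetch idea in the class above can be tried without a Lucene index. The following is only a sketch under assumptions: LazyPageList is a hypothetical name, and the loader function stands in for hits.doc(i) plus the SearchResultAdapter wrapping, neither of which is reproduced here. It shows a BitSet recording which slots are already loaded, so each page request loads only what is still missing.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collections;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical stand-in for ValueListSupplier: items are loaded on first access. */
class LazyPageList<T> {
    private final IntFunction<T> loader;          // assumption: replaces hits.doc(i)
    private final List<T> items;                  // pre-filled with nulls
    private final BitSet fetched = new BitSet();  // bit i set => items[i] is loaded
    private int index = 0;

    LazyPageList(int size, IntFunction<T> loader) {
        this.loader = loader;
        this.items = new ArrayList<>(Collections.nCopies(size, (T) null));
    }

    boolean hasNext() {
        return index < items.size();
    }

    /** Returns up to count items starting at the cursor and advances the cursor. */
    List<T> next(int count) {
        int from = index;
        int to = Math.min(items.size(), index + count); // exclusive upper bound
        ensureFetched(from, to);
        index = to;
        return items.subList(from, to);
    }

    /** Loads every slot in [from, to) whose bit is still clear. */
    private void ensureFetched(int from, int to) {
        for (int i = fetched.nextClearBit(from); i < to; i = fetched.nextClearBit(i + 1)) {
            items.set(i, loader.apply(i));
            fetched.set(i);
        }
    }

    public static void main(String[] args) {
        LazyPageList<String> pages = new LazyPageList<>(5, i -> "doc-" + i);
        while (pages.hasNext()) {
            System.out.println(pages.next(2)); // pages of two, last page is shorter
        }
    }
}
```

Because the upper bound is exclusive and clamped to the list size, the last page is simply shorter and hasNext() turns false once it has been returned.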




Re: verifying index integrity

2004-04-12 Thread Kevin A. Burton
Doug Cutting wrote:

> If you use this method, it is possible to corrupt things.  In
> particular, if you unlock an index that another process is modifying,
> then modify it, then these two processes might step on one another.
> So this method should only be called when you are certain that no one
> else is modifying the index.

We're handling this by using .pid files.  We use a standard initializer 
and our own lock files containing process IDs.  If you're on UNIX I can 
give you the source to the JNI getpid that I created.  I've been meaning 
to open-source this anyway... probably by putting it into commons.

This way you can prevent multiple initialization if a Java process that 
might be working with your index is currently running.  Otherwise 
there's no real way to be sure the lock isn't stale (unless time is a 
factor, but that slows things down).

Kevin
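
A rough sketch of the .pid file check described above, for readers on a current JVM. This is not Kevin's JNI code: it assumes Java 11 or later and swaps in the standard ProcessHandle API for a native getpid. The class name PidLock and both method names are made up for illustration. The check-then-write step is not atomic and the operating system can reuse process IDs, so treat it as a starting point only.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical helper: a lock file that stores the owning process ID. */
class PidLock {

    /** True if the pid file names a process that is still running. */
    static boolean isHeldByLiveProcess(Path pidFile) throws IOException {
        if (!Files.exists(pidFile)) {
            return false;
        }
        String text = Files.readString(pidFile, StandardCharsets.UTF_8).trim();
        try {
            long pid = Long.parseLong(text);
            // ProcessHandle.of returns an empty Optional when no such process exists
            return ProcessHandle.of(pid).map(ProcessHandle::isAlive).orElse(false);
        } catch (NumberFormatException e) {
            return false; // empty or unreadable pid file: treat the lock as stale
        }
    }

    /** Claims the lock for this JVM unless a live process already holds it. */
    static boolean tryAcquire(Path pidFile) throws IOException {
        if (isHeldByLiveProcess(pidFile)) {
            return false;
        }
        long myPid = ProcessHandle.current().pid();
        Files.writeString(pidFile, Long.toString(myPid), StandardCharsets.UTF_8);
        return true;
    }

    public static void main(String[] args) throws IOException {
        Path pidFile = Files.createTempFile("index", ".pid");
        System.out.println(tryAcquire(pidFile) ? "lock acquired" : "lock held by a live process");
        Files.delete(pidFile);
    }
}
```

A stale file left behind by a crashed process names a PID that is no longer alive, so the next initializer can take the lock over without relying on a timeout.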




suitability of lucene for project

2004-04-12 Thread Sebastian Ho
Hi all,

I am investigating technologies for a project that retrieves HTML pages
on a regular basis (or whenever they change), parses the HTML to extract
specific information, and presents the results as links on a web page.
Note that this is not a general search-engine kind of project; we are
extracting clinical information from various websites and consolidating it.

Please advise whether Lucene can do the above. For the areas where it
cannot, suggestions for alternatives would be appreciated.

Thanks

Sebastian Ho
Bioinformatics Institute





Re: suitability of lucene for project

2004-04-12 Thread Erik Hatcher
No, Lucene is not the right solution for this particular use.  It does 
not include anything to retrieve HTML pages, or parse them.  However, 
if you ever need full-text search, then Lucene is where it's at.

	Erik

On Apr 12, 2004, at 9:28 PM, Sebastian Ho wrote:

> hi all
>
> i am investigating technologies to use for a project which basically
> retrieves html pages on a regular basis(or whenever there are changes)
> and allow html parsing to extract specific information, and presenting
> them as links in a webpage. Note that this is not a general search
> engine kind of project but we are extracting clinical information from
> various website and consolidating them.
> Pls advise me whether Lucene can do the above and in areas where it
> cannot, suggestions to solutions will be appreciated.
> Thanks
>
> Sebastian Ho
> Bioinformatics Institute


Re: suitability of lucene for project

2004-04-12 Thread Stephane James Vaucher
It could be part of your solution, but I don't think so. Let me explain:

I've done something similar to what you describe a few times. I 
often use HttpUnit to get the information. How you process it is up 
to you. If you want it to be indexed (searchable), you can use Lucene. If 
you want to extract structured (or semi-structured) information, use 
wrapper induction techniques (not Lucene).

cheers,
sv
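
To make the split above concrete, here is a small sketch of the "process it yourself" step, separate from any indexing. It is only an illustration under assumptions: LinkExtractor is a hypothetical name, the HTML is passed in as a string rather than fetched with HttpUnit, and a crude regular expression stands in for real HTML parsing or wrapper induction, which it does not implement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: pulls (href, link text) pairs out of an HTML string. */
class LinkExtractor {

    // Crude pattern for <a ... href="...">text</a>. It ignores single-quoted
    // attributes, nested tags and malformed markup; real pages need a parser.
    private static final Pattern ANCHOR = Pattern.compile(
            "<a\\s+[^>]*href\\s*=\\s*\"([^\"]*)\"[^>]*>(.*?)</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns one {href, text} pair per anchor found, in document order. */
    static List<String[]> extractLinks(String html) {
        List<String[]> links = new ArrayList<>();
        Matcher m = ANCHOR.matcher(html);
        while (m.find()) {
            links.add(new String[] { m.group(1), m.group(2).trim() });
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<p><a href=\"http://example.org/trial\">Trial A</a></p>";
        for (String[] link : extractLinks(html)) {
            System.out.println(link[1] + " -> " + link[0]);
        }
    }
}
```

The extracted pairs could then be rendered as links on a page, or handed to Lucene as document fields if search over them is wanted later.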

On 13 Apr 2004, Sebastian Ho wrote:

> hi all
>
> i am investigating technologies to use for a project which basically
> retrieves html pages on a regular basis(or whenever there are changes)
> and allow html parsing to extract specific information, and presenting
> them as links in a webpage. Note that this is not a general search
> engine kind of project but we are extracting clinical information from
> various website and consolidating them.
>
> Pls advise me whether Lucene can do the above and in areas where it
> cannot, suggestions to solutions will be appreciated.
>
> Thanks
>
> Sebastian Ho
> Bioinformatics Institute
 