pyLucene and indexes

2007-02-24 Thread Raghavan Srinivasan
Has anyone on this forum successfully created indexes using pyLucene
and then read them using the Java API? I realize this ought to be
theoretically possible, but I don't have a lot of time left in my
current project to chase down bugs. It would be very helpful to know if
someone has succeeded with this on Lucene 2.0.


Thanks,
Raghavan

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: updating index

2007-02-24 Thread no spam

I didn't fully understand your last post, or why I would want to call
IndexReader.terms() and then IndexReader.termDocs(). Won't something like this
work?

   for (Business biz : updates)
   {
       Term t = new Term("id", biz.getId() + "");
       TermDocs tDocs = reader.termDocs(t);

       while (tDocs.next())
       {
           Document doc = reader.document(tDocs.doc());
       }
   }

But tDocs never returns any documents. Is this because I've indexed my primary
key like this:

doc.add(new Field("id", biz.getId() + "", Field.Store.YES, Field.Index.NO));

instead of

doc.add(new Field("id", biz.getId() + "", Field.Store.YES, Field.Index.UN_TOKENIZED));

Mark

On 2/21/07, Erick Erickson <[EMAIL PROTECTED]> wrote:


I think you can get MUCH better efficiency by using TermEnum/TermDocs. But I
think you need to index your primary key (UN_TOKENIZED). I'm not completely
sure about that, but I'd be surprised if TermEnum worked with un-indexed data.
It would still be worth trying, but I've always assumed that TermEnums only
work on indexed fields.
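Erick's assumption matches how Lucene 2.x behaves: a field added with Field.Index.NO is stored but contributes no terms to the term dictionary that TermEnum and TermDocs read, while Field.Index.UN_TOKENIZED indexes the whole value as a single term. The toy model below sketches that split in plain Java; it is not Lucene's API, and StoredVsIndexed, addDocument, termDocs and storedValue are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Toy model (not Lucene's API): every field value is stored with its document,
 * but only indexed fields add terms to the inverted index that termDocs() reads.
 */
public class StoredVsIndexed {
    private final List<Map<String, String>> stored = new ArrayList<>();
    // key is field + '\0' + text, value is the ascending list of doc ids
    private final Map<String, List<Integer>> postings = new HashMap<>();

    /** Adds a document; only fields named in indexedFields become searchable terms. */
    public int addDocument(Map<String, String> fields, Set<String> indexedFields) {
        int docId = stored.size();
        stored.add(new HashMap<>(fields)); // every field is stored (like Field.Store.YES)
        for (Map.Entry<String, String> f : fields.entrySet()) {
            if (indexedFields.contains(f.getKey())) {
                // whole value as a single term, roughly what UN_TOKENIZED does
                postings.computeIfAbsent(f.getKey() + '\0' + f.getValue(), k -> new ArrayList<>())
                        .add(docId);
            }
        }
        return docId;
    }

    /** Doc ids containing the exact term; empty for fields that were never indexed. */
    public List<Integer> termDocs(String field, String text) {
        return postings.getOrDefault(field + '\0' + text, Collections.emptyList());
    }

    public String storedValue(int docId, String field) {
        return stored.get(docId).get(field);
    }

    public static void main(String[] args) {
        StoredVsIndexed idx = new StoredVsIndexed();
        Map<String, String> doc = Collections.singletonMap("id", "42");
        int storedOnly = idx.addDocument(doc, Collections.<String>emptySet()); // like Index.NO
        int indexedDoc = idx.addDocument(doc, Collections.singleton("id"));    // like UN_TOKENIZED
        System.out.println(idx.storedValue(storedOnly, "id")); // prints 42: still retrievable
        System.out.println(idx.termDocs("id", "42"));          // prints [1]: only doc 1 is findable
        System.out.println(indexedDoc);                         // prints 1
    }
}
```

The stored-only copy can still be read back once you have its doc id, but a term lookup can never lead you to it.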

Anyway, your loop would look more like this:

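// Assumes "reader" is an open IndexReader, "writer" an open IndexWriter and
// "docsToUpdate" a Set<String> of primary keys whose documents need replacing.
TermEnum terms = reader.terms(new Term("primarykey", ""));
TermDocs tDocs = reader.termDocs();

while (terms.next()) {
    Term term = terms.term();
    if (!term.field().equals("primarykey")) break;   // past this field's terms
    if (docsToUpdate.contains(term.text())) {
        tDocs.seek(term);
        if (tDocs.next()) {                           // move onto the matching doc
            Document old = reader.document(tDocs.doc());
            // updateDocument(Term, Document) needs Lucene 2.1+; rebuild() is hypothetical
            writer.updateDocument(term, rebuild(old));
        }
    }
}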

NOTE: I've been fast and loose with edge conditions, like ensuring that
while (terms.next()) doesn't skip the first term, so caveat emptor. This
loop also assumes that there is one and only one document in your index
with the primary key. Otherwise, you have to do some more work with the
TermDocs class to process each document that has your primary key...
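The two edge conditions in the note can be illustrated with a toy model in plain Java, not Lucene's TermEnum API: a sorted map stands in for the term dictionary, the walk starts at the first term greater than or equal to (field, "") without skipping it, and it stops as soon as the enumeration moves into another field. TermWalk, Term and findDocs are hypothetical names, and the sketch keeps only the first document per key, matching the one-document-per-primary-key assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Set;
import java.util.TreeMap;

/** Toy model (not Lucene's API) of walking one field's slice of a sorted term dictionary. */
public class TermWalk {
    /** A (field, text) pair ordered by field, then text, like index terms. */
    static final class Term implements Comparable<Term> {
        final String field;
        final String text;

        Term(String field, String text) {
            this.field = field;
            this.text = text;
        }

        @Override
        public int compareTo(Term o) {
            int c = field.compareTo(o.field);
            return c != 0 ? c : text.compareTo(o.text);
        }
    }

    /** Maps each wanted key to its first doc id, visiting the field's terms once in order. */
    static Map<String, Integer> findDocs(NavigableMap<Term, List<Integer>> dict,
                                         String field, Set<String> wanted) {
        Map<String, Integer> found = new TreeMap<>();
        // tailMap(from, true) starts AT the first term >= (field, ""), so nothing is skipped
        for (Map.Entry<Term, List<Integer>> e : dict.tailMap(new Term(field, ""), true).entrySet()) {
            Term t = e.getKey();
            if (!t.field.equals(field)) {
                break; // walked past the last term of this field
            }
            if (wanted.contains(t.text) && !e.getValue().isEmpty()) {
                found.put(t.text, e.getValue().get(0)); // assumes one doc per primary key
            }
        }
        return found;
    }

    public static void main(String[] args) {
        NavigableMap<Term, List<Integer>> dict = new TreeMap<>();
        dict.put(new Term("body", "lucene"), Arrays.asList(0, 1, 2));
        dict.put(new Term("primarykey", "1"), Arrays.asList(0));
        dict.put(new Term("primarykey", "2"), Arrays.asList(1));
        dict.put(new Term("primarykey", "3"), Arrays.asList(2));
        dict.put(new Term("zip", "3"), Arrays.asList(7)); // same text, different field

        Set<String> updates = new HashSet<>(Arrays.asList("1", "3"));
        System.out.println(findDocs(dict, "primarykey", updates)); // prints {1=0, 3=2}
    }
}
```

Without the field check, the "zip" term with text "3" would overwrite the real match and report doc 7.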

This is similar to how Lucene filters are created, which is very fast.

Hope this helps
Erick






[ANN]VTD-XML 2.0

2007-02-24 Thread Jimmy Zhang
The VTD-XML project team is proud to announce the release of 
version 2.0 of VTD-XML, the next generation XML parser/indexer.

The new features introduced in this version are:

* VTD+XML version 1.0: the world's first true native XML index
that is simple, general-purpose and backward-compatible with XML.
* NodeRecorder class that saves VTDNav's cursor location for
later sequential access.
* Overwrite capability
* Lexical comparisons between VTD and strings

To download the software, please go to
http://sourceforge.net/project/showfiles.php?group_id=110612

To read the latest benchmark report, please go to
http://vtd-xml.sf.net/benchmark1.html

To get the latest API overview, please go to
http://www.ximpleware.com/vtd-xml_intro.pdf





Re: how to define a pool for Searcher?

2007-02-24 Thread Mohammad Norouzi

Thank you, Mark, for your help. The code you pointed me to was very helpful.

My only remaining question: I need to set an idle time for each open
searcher, so that if it exceeds that time, the searcher is released and made
available to another thread.

How can I add such a feature? I was thinking of a timeout listener, but I
don't know where to put it. I have a SingleSearcher that wraps Lucene's
Searcher and returns a ResultSet in which I put a Hits object. Should I put
the timer in my ResultSet or in my SingleSearcher?
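Mark's advice was to build on LUCENE-390 rather than start from scratch; purely to illustrate the idle-timeout idea, here is a hypothetical, standalone sketch that is not a Lucene class. Each borrower gets a lease, every use refreshes the lease's timer, and when the pool runs dry any lease idle longer than the timeout is reclaimed and handed to the next caller. A reclaimed lease refuses further use, so its original holder has to acquire again. The clock is injected as a LongSupplier so the timeout can be tested; in production it could be System::currentTimeMillis. IdleTimeoutPool, Lease, acquire, use, release and reclaimIdle are all invented names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch, not a Lucene class: lends out pooled resources (such as
 * searchers) and reclaims any lease that has been idle longer than a timeout.
 */
public class IdleTimeoutPool<T> {

    /** Handle to one borrowed resource; it stops working once released or reclaimed. */
    public static final class Lease<R> {
        private final R resource;
        private long lastUsed;
        private boolean active = true;

        private Lease(R resource, long now) {
            this.resource = resource;
            this.lastUsed = now;
        }
    }

    private final Deque<T> free = new ArrayDeque<>();
    private final List<Lease<T>> leased = new ArrayList<>();
    private final long idleMillis;
    private final LongSupplier clock; // injected so the timeout can be tested

    public IdleTimeoutPool(Collection<T> resources, long idleMillis, LongSupplier clock) {
        this.free.addAll(resources);
        this.idleMillis = idleMillis;
        this.clock = clock;
    }

    /** Borrows a resource, reclaiming idle leases first if none is free; null if still none. */
    public synchronized Lease<T> acquire() {
        if (free.isEmpty()) {
            reclaimIdle();
        }
        T r = free.pollFirst();
        if (r == null) {
            return null;
        }
        Lease<T> lease = new Lease<>(r, clock.getAsLong());
        leased.add(lease);
        return lease;
    }

    /** Returns the resource and restarts its idle timer; fails if the lease was reclaimed. */
    public synchronized T use(Lease<T> lease) {
        if (!lease.active) {
            throw new IllegalStateException("lease expired or released; acquire again");
        }
        lease.lastUsed = clock.getAsLong();
        return lease.resource;
    }

    /** Gives the resource back; releasing an expired lease is a no-op. */
    public synchronized void release(Lease<T> lease) {
        if (lease.active && leased.remove(lease)) {
            lease.active = false;
            free.addLast(lease.resource);
        }
    }

    /** Reclaims every lease idle for longer than idleMillis and returns how many it took back. */
    public synchronized int reclaimIdle() {
        long now = clock.getAsLong();
        int reclaimed = 0;
        for (Iterator<Lease<T>> it = leased.iterator(); it.hasNext(); ) {
            Lease<T> lease = it.next();
            if (now - lease.lastUsed > idleMillis) {
                it.remove();
                lease.active = false;
                free.addLast(lease.resource);
                reclaimed++;
            }
        }
        return reclaimed;
    }
}
```

Handing out leases rather than the raw resource is what keeps a late release() from the original holder from freeing a resource that has already been given to someone else.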

I still don't know whether the reader matters to Hits or to the Searcher.
Suppose I pass a Hits to my ResultSet: if I then close the searcher, will the
reader get closed? Another thing I'm unclear on: can one Reader be used
thread-safely by every Searcher, with different queries?

Thank you very much again.

On 2/22/07, Mark Miller <[EMAIL PROTECTED]> wrote:


I would not do this from scratch. If you are interested in Solr, go that
route; otherwise I would build off
http://issues.apache.org/jira/browse/LUCENE-390

- Mark

Mohammad Norouzi wrote:
> Hi all,
> I am going to build a Searcher pool. If anyone has experience with
> this, I would be glad to hear their recommendations and suggestions.
> I want to know what issues I should consider, given that I am going to
> use this in a web application with many user sessions.
>
> Thank you very much in advance.






--
Regards,
Mohammad


Re: QueryParser bug?

2007-02-24 Thread Antony Bowesman

Thanks Doron, that works.
Antony


Doron Cohen wrote:

Hi Antony,

Could you try the patch in
http://issues.apache.org/jira/browse/LUCENE-813

Thanks,
Doron







Re: TextMining.org Word extractor

2007-02-24 Thread Antony Bowesman

Yes, I found the info, but it seems his offer to hand over the software

http://mail-archives.apache.org/mod_mbox/lucene-java-user/200602.mbox/[EMAIL PROTECTED]

went unanswered.  Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it 
is still Apache 2, but as I am about to ship some software, I wanted to put the 
right licence text where it should be.


Antony


Chris Hostetter wrote:


googling...
TextMining.org licence
...turns up lots of useful info, some from the archive of this list.


: Date: Fri, 23 Feb 2007 16:04:53 +1100
: From: Antony Bowesman <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: TextMining.org Word extractor
:
: I'm extracting text from Word using TextMining.org extractors - it works better
: than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
: However, I'm trying to find out about licence issues with the TM jar. The TM
: website seems to be permanently hacked these days.
:
: Anyone know?







Re: TextMining.org Word extractor

2007-02-24 Thread Chris Hostetter

: Yes, I found the info, but it seems his offer to hand over the software
: went un-answered.  Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it

I don't know that you can assume that... he specifically said "Send me an
email directly if you are interested"

: is still Apache 2, but as I am about to ship some software, I wanted to put the
: right licence text where it should be.

He did explicitly say it was Apache 2 in that email, and whatever copy
you have that you want to ship should have come with the license.




-Hoss





Re: TextMining.org Word extractor

2007-02-24 Thread Antony Bowesman

Hi Hoss,


: Yes, I found the info, but it seems his offer to hand over the software
: went un-answered.  Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it

I don't know that you can assume that... he specifically said "Send me an
email directly if you are interested"


Yes, hence this thread ;)  I'd rather not rely on the TextMining parser only to
discover it's not usable. I can use POI if I have to, but it does not handle
Word 6, which is bad, so I'd rather use TM.



: is still Apache 2, but as I am about to ship some software, I wanted to put the
: right licence text where it should be.

He did explicitly say it was Apache 2 in that email, and whatever copy
you have that you want to ship should have come with the license.


Actually, the jar file is the one that's downloaded with the LuceneInAction.zip
file from the Manning website:


http://www.lucenebook.com/LuceneInAction.zip from 
http://www.manning.com/hatcher2/

and there's no licence file.  The book does not refer to the licence although it 
refers to the parser as 'freely available'.  The book just refers to the website 
- now unavailable.


I've tried mailing Ryan Ackley directly. Hopefully he will clarify its
status.

Antony





Querying multiple fields in a document

2007-02-24 Thread ruchi thakur

Hello,
I am new to Lucene.
I have documents with 3 fields: name, subject and rollno.

I want to search on two of the fields, name and subject, i.e. I want to find
documents with a particular combination of name and subject (say, all
documents whose name is bob and whose subject is maths).
I would appreciate any ideas on this.
Thanks and Regards,
Ruchika
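In Lucene this kind of AND across two fields is usually written as a BooleanQuery holding two TermQuery clauses that are both required (BooleanClause.Occur.MUST), or with QueryParser syntax such as +name:bob +subject:maths. What a conjunction does underneath can be sketched with a toy model in plain Java, not Lucene's API: each (field, term) pair has a sorted list of document ids, and a document matches only if it appears in both lists. RequiredClauses and intersect are hypothetical names, and the document ids are made up.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy model (not Lucene's API) of a query with two required clauses:
 * a document matches only if it is in the posting list of both terms.
 */
public class RequiredClauses {
    /** Intersects two ascending doc-id lists with a single linear merge. */
    static List<Integer> intersect(int[] a, int[] b) {
        List<Integer> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i] == b[j]) {
                out.add(a[i]); // present in both lists: both clauses match
                i++;
                j++;
            } else if (a[i] < b[j]) {
                i++;           // advance whichever list is behind
            } else {
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("name:bob", new int[] {1, 4, 7, 9});
        postings.put("subject:maths", new int[] {2, 4, 9, 12});
        // documents with name=bob AND subject=maths
        System.out.println(intersect(postings.get("name:bob"), postings.get("subject:maths"))); // prints [4, 9]
    }
}
```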