pyLucene and indexes
Has anyone on this forum successfully created indexes using PyLucene and then read them using the Java API? I realize this ought to be possible in theory, but I don't have much time left in my current project to chase down bugs. It would be very helpful to know if someone has succeeded with this on Lucene 2.0.

Thanks,
Raghavan

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
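For readers of the archive: PyLucene in this era wrapped the same Java Lucene code (compiled with GCJ), and the on-disk index format does not depend on the language binding, so an index written by PyLucene 2.0 should be readable by Java Lucene 2.0. A minimal Java-side sanity check might look like the sketch below (the index path is hypothetical):

```java
import org.apache.lucene.index.IndexReader;

public class CheckIndex {
    public static void main(String[] args) throws Exception {
        // Open the index that PyLucene wrote and confirm Java can read it.
        IndexReader reader = IndexReader.open("/path/to/pylucene-index");
        System.out.println("documents in index: " + reader.numDocs());
        reader.close();
    }
}
```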
Re: updating index
I didn't fully understand your last post and why I would want to call IndexReader.terms() and then IndexReader.termDocs(). Won't something like this work?

    for (Business biz : updates) {
        Term t = new Term("id", biz.getId() + "");
        TermDocs tDocs = reader.termDocs(t);
        while (tDocs.next()) {
            Document doc = reader.document(tDocs.doc());
        }
    }

But tDocs never contains any docs. Is this because I've indexed my primary key like this:

    doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.NO));

instead of:

    doc.add(new Field("id", biz.getId(), Field.Store.YES, Field.Index.UN_TOKENIZED));

Mark

On 2/21/07, Erick Erickson <[EMAIL PROTECTED]> wrote:

I think you can get MUCH better efficiency by using TermEnum/TermDocs. But I think you need to index your primary key (UN_TOKENIZED), although now I'm not sure. I'd be surprised if TermEnum worked with un-indexed data. Still, it'd be worth trying, but I've always assumed that TermEnum only works on indexed fields. Anyway, your loop looks more like this:

    TermEnum terms = reader.terms(new Term("primarykey", ""));
    TermDocs tDocs = reader.termDocs();
    while (terms.next()) {
        if (docsToUpdate.contains(terms.term().text())) {
            tDocs.seek(terms.term());
            // replace the old document: updateDocument takes the key Term
            // and the replacement Document
            writer.updateDocument(terms.term(), newDoc);
        }
    }

NOTE: I've been fast and loose with edge conditions, like ensuring that while (terms.next()) doesn't skip the first term, so caveat emptor. This loop also assumes that there is one and only one document in your index with the primary key; otherwise, you have to do some more work with the TermDocs class to process each document that has your primary key. This is similar to creating Lucene filters, which is very fast.

Hope this helps
Erick
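For readers of the archive: on Lucene 2.1, where IndexWriter.updateDocument(Term, Document) is available, the whole delete-then-re-add dance collapses into a single call, provided the "id" field was indexed UN_TOKENIZED so the term matches exactly. A minimal sketch (indexDir, id, and the replacement Document are hypothetical):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.Term;

public class UpdateByKey {
    // Deletes every document whose "id" term equals the given key,
    // then adds newDoc: an atomic delete-by-term plus add.
    static void update(String indexDir, String id, Document newDoc) throws Exception {
        IndexWriter writer = new IndexWriter(indexDir, new StandardAnalyzer(), false);
        try {
            writer.updateDocument(new Term("id", id), newDoc);
        } finally {
            writer.close();
        }
    }
}
```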
[ANN]VTD-XML 2.0
The VTD-XML project team is proud to announce the release of version 2.0 of VTD-XML, the next-generation XML parser/indexer. The new features introduced in this version are:

* VTD+XML version 1.0: the world's first true native XML index that is simple, general-purpose and backward-compatible with XML.
* A NodeRecorder class that saves VTDNav's cursor location for later sequential access.
* Overwrite capability.
* Lexical comparison between VTD and strings.

To download the software, please go to http://sourceforge.net/project/showfiles.php?group_id=110612
To read the latest benchmark report, please go to http://vtd-xml.sf.net/benchmark1.html
To get the latest API overview: http://www.ximpleware.com/vtd-xml_intro.pdf
Re: how to define a pool for Searcher?
Thank you Mark for your useful help. The code you introduced was very helpful, but my remaining question is that I need to place an idle time on each open searcher, so that if a searcher exceeds that time it is released and made ready for another thread. How can I add such a feature? I was thinking of a timeout listener, but I don't know where to put it. I have a SingleSearcher that wraps Lucene's Searcher and returns a ResultSet in which I put a Hits object. Do I have to put the timeout in my ResultSet or in my SingleSearcher? I also still don't know whether the reader matters to the Hits or to the Searcher: if I pass a Hits to my ResultSet and then close the searcher, will the Reader get closed too? One more unclear point: can a single Reader be used thread-safely by every Searcher, each with different queries?

Thank you very much again.

On 2/22/07, Mark Miller <[EMAIL PROTECTED]> wrote:

I would not do this from scratch... if you are interested in Solr, go that route; otherwise I would build off http://issues.apache.org/jira/browse/LUCENE-390

- Mark

Mohammad Norouzi wrote:
> Hi all,
> I am going to build Searcher pooling. If anyone has experience with this, I
> would be glad to hear his/her recommendations and suggestions. I want to know
> what issues I should consider, given that I am going to use this in a web
> application with many user sessions.
>
> Thank you very much in advance.

--
Regards,
Mohammad
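No reply with a timeout recipe appears in the archive, so here is a generic sketch of the idle-timeout bookkeeping, using only java.util.concurrent and made-up names, not any Lucene API: each checked-in object remembers when it was last used, and a periodic sweep closes anything idle too long. (On the reader question: an IndexSearcher constructed from an IndexReader does not close that reader when the searcher is closed, while one constructed from a directory path does, and a single IndexReader can safely serve concurrent searches.)

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal idle-timeout pool: checkIn() stamps the entry with the current time,
// sweep() closes and removes anything idle longer than maxIdleMs.
public class IdlePool<T> {
    public interface Closer<T> { void close(T t); }

    private final Map<T, Long> lastUsed = new ConcurrentHashMap<T, Long>();
    private final long maxIdleMs;
    private final Closer<T> closer;

    public IdlePool(long maxIdleMs, Closer<T> closer) {
        this.maxIdleMs = maxIdleMs;
        this.closer = closer;
    }

    // Return an object to the pool, recording when it was last used.
    public void checkIn(T t) { lastUsed.put(t, System.currentTimeMillis()); }

    // Borrow any pooled object, or null if the pool is empty
    // (the caller then opens a fresh searcher).
    public T checkOut() {
        Iterator<T> it = lastUsed.keySet().iterator();
        if (it.hasNext()) { T t = it.next(); lastUsed.remove(t); return t; }
        return null;
    }

    // Call periodically (e.g. from a background thread) to release idle
    // entries. Returns how many were closed.
    public int sweep() {
        int closed = 0;
        long now = System.currentTimeMillis();
        Iterator<Map.Entry<T, Long>> it = lastUsed.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<T, Long> e = it.next();
            if (now - e.getValue() > maxIdleMs) {
                it.remove();
                closer.close(e.getKey());
                closed++;
            }
        }
        return closed;
    }
}
```

A background thread (or a check on each request) would call sweep(); the Closer callback is where SingleSearcher.close() would go.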
Re: QueryParser bug?
Thanks Doron, that works.

Antony

Doron Cohen wrote:

Hi Antony, could you try the patch in http://issues.apache.org/jira/browse/LUCENE-813 ?

Thanks, Doron
Re: TextMining.org Word extractor
Yes, I found the info, but it seems his offer to hand over the software http://mail-archives.apache.org/mod_mbox/lucene-java-user/200602.mbox/[EMAIL PROTECTED] went unanswered. Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it is still Apache 2, but as I am about to ship some software, I wanted to put the right licence text where it should be.

Antony

Chris Hostetter wrote:

googling... TextMining.org licence ...turns up lots of useful info, some from the archives of this list.

: Date: Fri, 23 Feb 2007 16:04:53 +1100
: From: Antony Bowesman <[EMAIL PROTECTED]>
: Reply-To: java-user@lucene.apache.org
: To: java-user@lucene.apache.org
: Subject: TextMining.org Word extractor
:
: I'm extracting text from Word using TextMining.org extractors - it works better
: than POI because it extracts Word 6/95 as well as 97-2002, which POI cannot do.
: However, I'm trying to find out about licence issues with the TM jar. The TM
: website seems to be permanently hacked these days.
:
: Anyone know?
Re: TextMining.org Word extractor
: Yes, I found the info, but it seems his offer to hand over the software
: went unanswered. Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it

I don't know that you can assume that... he specifically said "Send me an email directly if you are interested".

: is still Apache 2, but as I am about to ship some software, I wanted to put the
: right licence text where it should be.

He did explicitly say it was Apache 2 in that email, and whatever copy you have that you want to ship should have come with the licence.

-Hoss
Re: TextMining.org Word extractor
Hi Hoss,

: Yes, I found the info, but it seems his offer to hand over the software
: went unanswered. Nutch uses Ryan Ackley's Word6 extractor, so I'm guessing it

> I don't know that you can assume that... he specifically said "Send me an
> email directly if you are interested".

Yes, hence this thread ;) I'd not like to rely on the TextMining parser only to discover it's not usable. I can use POI if I have to, but it does not handle Word 6, which is bad, so I'd rather use TM.

: is still Apache 2, but as I am about to ship some software, I wanted to put the
: right licence text where it should be.

> He did explicitly say it was Apache 2 in that email, and whatever copy you
> have that you want to ship should have come with the licence.

Actually, the jar file is the one that's downloaded with the LuceneInAction.zip file from the Manning website http://www.lucenebook.com/LuceneInAction.zip from http://www.manning.com/hatcher2/ and there's no licence file. The book does not refer to the licence, although it describes the parser as 'freely available'; it just refers to the website, which is now unavailable. I've tried sending Ryan Ackley mail directly; hopefully he will clarify its status.

Antony
Querying multiple fields in a document
Hello, I am new to Lucene. I have documents with 3 fields: name, subject, rollno. I want to search on two of the fields, name and subject, i.e. I want to find documents having a particular combination of name and subject (say, all the documents with name "bob" and subject "maths"). I would appreciate any ideas on this.

Thanks and Regards,
Ruchika
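No reply appears in the archive; for reference, the usual approach in the Lucene 2.x API is a BooleanQuery with two required clauses, one per field. A minimal sketch (the index path is hypothetical, and this assumes the stored terms match exactly, e.g. lowercased by the analyzer at index time):

```java
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;

public class AndSearch {
    public static void main(String[] args) throws Exception {
        IndexSearcher searcher = new IndexSearcher("/path/to/index"); // hypothetical path

        // Two MUST clauses behave like a boolean AND: a document matches
        // only if both terms are present in their respective fields.
        BooleanQuery query = new BooleanQuery();
        query.add(new TermQuery(new Term("name", "bob")), BooleanClause.Occur.MUST);
        query.add(new TermQuery(new Term("subject", "maths")), BooleanClause.Occur.MUST);

        Hits hits = searcher.search(query);
        System.out.println("matches: " + hits.length());
        searcher.close();
    }
}
```

The same query can also be produced by QueryParser from the string `name:bob AND subject:maths`, which is convenient when the criteria come from user input.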