RE: BOOLEAN EXCEPTION APPSERVER SOLUTION
Hi guys,

Apologies. The forum was correct. The class-loading problem was, or may be, a bug with JDK 1.4.1 and Tomcat 5.5.3 on Gentoo Linux. I switched to JDK 1.4.2 and everything seems to be in proper order as of now.

Thanks for the advice.

With regards,
Karthik

-----Original Message-----
From: Ronnie [mailto:[EMAIL PROTECTED]
Sent: Friday, February 11, 2005 4:37 PM
To: Lucene Users List
Subject: Re: BOOLEAN EXCEPTION APPSERVER

Do a search for Lucene JARs, something like:

# find $TOMCAT_HOME/ -name 'lucene*.jar'

Replace $TOMCAT_HOME with the correct directory of your Tomcat installation. Also check the classpath of the user running Tomcat.

/Ronnie
RE: BOOLEAN EXCEPTION APPSERVER
Hi,

I removed the Lucene 1.4.3 JAR from the webapp directory, and the resulting exception was:

Feb 11, 2005 3:48:26 PM org.apache.catalina.core.ApplicationContext log
SEVERE: Error configuring application listener of class com.controlnet.servertool.WebContextReporter
java.lang.NoClassDefFoundError: org/apache/lucene/analysis/Analyzer
    at java.lang.Class.getDeclaredConstructors0(Native Method)
    at java.lang.Class.privateGetDeclaredConstructors(Class.java:1590)
    at java.lang.Class.getConstructor0(Class.java:1762)
    at java.lang.Class.newInstance0(Class.java:276)
    at java.lang.Class.newInstance(Class.java:259)
    at org.apache.catalina.core.StandardContext.listenerStart(StandardContext.java:3546)
    at org.apache.catalina.core.StandardContext.start(StandardContext.java:4031)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:755)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:739)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:525)
    at org.apache.catalina.startup.HostConfig.deployDirectory(HostConfig.java:886)
    at org.apache.catalina.startup.HostConfig.deployDirectories(HostConfig.java:849)
    at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:474)
    at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1079)
    at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:310)
    at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:119)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1011)
    at org.apache.catalina.core.StandardHost.start(StandardHost.java:718)
    at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1003)
    at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:437)
    at org.apache.catalina.core.StandardService.start(StandardService.java:450)
    at org.apache.catalina.core.StandardServer.start(StandardServer.java:2009)
    at org.apache.catalina.startup.Catalina.start(Catalina.java:538)

So this means I have only one copy of Lucene in the classpath of Tomcat 5. The same exceptions also occur on both the Windows 2000 and Gentoo Linux servers.

Please help. Thanks in advance.

-----Original Message-----
From: Miles Barr [mailto:[EMAIL PROTECTED]
Sent: Friday, February 11, 2005 2:51 PM
To: Lucene Users List
Subject: Re: BOOLEAN EXCEPTION APPSERVER

On Fri, 2005-02-11 at 12:20 +0530, Karthik N S wrote:
> I am getting this error on every FIRST SEARCH after startup of the
> web server, and I have declared the following code only once in the
> method of execution:
>
> <%@ page import="org.apache.lucene.search.BooleanQuery" %>
> BooleanQuery.setMaxClauseCount(Integer.MAX_VALUE);
>
> The exception is as follows:
>
> Feb 11, 2005 12:16:42 PM org.apache.catalina.core.StandardWrapperValve invoke
> SEVERE: Servlet.service() for servlet jsp threw exception
> java.lang.LinkageError: duplicate class definition: org/apache/lucene/search/BooleanQuery
>     at java.lang.ClassLoader.defineClass0(Native Method)
>     at java.lang.ClassLoader.defineClass(ClassLoader.java:502)
>     at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:123)
>     at org.apache.catalina.loader.WebappClassLoader.findClassInternal(WebappClassLoader.java:1626)
>     at org.apache.catalina.loader.WebappClassLoader.findClass(WebappClassLoader.java:850)
>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1299)
>     at org.apache.catalina.loader.WebappClassLoader.loadClass(WebappClassLoader.java:1181)
>     at org.apache.jasper.servlet.JasperLoader.loadClass(JasperLoader.java:148)
>     at org.apache.jasper.servlet.JasperLoader.loadClass(JasperLoader.java:69)
>
> O/S       = Gentoo Linux
> Java      = 1.4.1
> RAM       = 256
> Webserver = Tomcat 5.5.3

It looks like the class definition is being loaded twice. But if it's being done by different classloaders it should be fine. You might have two different versions of Lucene being loaded. Tomcat uses several classloaders depending on where it finds the JAR file:

http://jakarta.apache.org/tomcat/tomcat-5.5-doc/class-loader-howto.html

Make sure you only have one copy of the Lucene JAR visible to Tomcat.

--
Miles Barr [EMAIL PROTECTED]
Runtime Collective Ltd.

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
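To check from inside the webapp which loader and which JAR actually supplied a class, something like the following may help. It is a minimal sketch, not part of Tomcat or Lucene: the class name ClassOrigin is hypothetical, and it assumes a modern Java compiler rather than the JDK 1.4 used in this thread. It relies only on the standard Class.getClassLoader() and ProtectionDomain.getCodeSource() calls.

```java
import java.net.URL;
import java.security.CodeSource;
import java.security.ProtectionDomain;

// Hypothetical diagnostic helper: reports which class loader defined a class
// and which location (JAR or directory) its bytes came from.
final class ClassOrigin {
    private ClassOrigin() {}

    static String describe(Class<?> type) {
        // null means the class was defined by the JVM's bootstrap loader
        ClassLoader loader = type.getClassLoader();
        ProtectionDomain domain = type.getProtectionDomain();
        CodeSource source = (domain == null) ? null : domain.getCodeSource();
        URL location = (source == null) ? null : source.getLocation();
        return type.getName()
                + " loaded by " + (loader == null ? "bootstrap loader" : loader.toString())
                + " from " + (location == null ? "unknown location" : location.toString());
    }

    public static void main(String[] args) {
        System.out.println(describe(String.class));
        System.out.println(describe(ClassOrigin.class));
    }
}
```

Calling describe(org.apache.lucene.search.BooleanQuery.class) from the JSP and again from a servlet or listener would show whether both see the same loader and the same JAR path.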
RE: BOOLEAN EXCEPTION APPSERVER
Hi,

Apologies.

> When I said 'defined another BooleanQuery class' I meant actually
> writing another class with the name
> org.apache.lucene.search.BooleanQuery. I'm guessing this isn't the case.

No. None of my packages start with, or are named with, anything similar to the Lucene names. [I use the Eclipse IDE and it would definitely flag the name conflict if this were the case.]

I will come back after switching the JDK from 1.4.1 to 1.4.2. Any more ideas posted to the forum will be of great help.

Thanks in advance.

-----Original Message-----
From: Miles Barr [mailto:[EMAIL PROTECTED]
Sent: Friday, February 11, 2005 4:03 PM
To: Lucene Users List
Subject: RE: BOOLEAN EXCEPTION APPSERVER

On Fri, 2005-02-11 at 15:50 +0530, Karthik N S wrote:
> Hi
>
> I have one JSP [Query.jsp] which constructs a query something like below:
>
> +CLOTHS +(+SHOES SOCKS) +(PANTS SHIRTS) -COTTON AND itemPrice:[0010 TO 0020]
>
> > That's odd. You haven't defined another BooleanQuery class have you?
>
> So for the itemPrice range I use the BooleanQuery.

When I said 'defined another BooleanQuery class' I meant actually writing another class with the name org.apache.lucene.search.BooleanQuery. I'm guessing this isn't the case.

I'm afraid I'm out of ideas. Maybe as a last-ditch attempt you could try switching JVMs?

--
Miles Barr [EMAIL PROTECTED]
Runtime Collective Ltd.
REPLACE USING ANALYZERS
Hi guys,

Apologies. I would like to know if there are any analyzers out there which can give me the required output as shown below:

1) Input  = "+~shoes -~nike"
   Output = "+shoes -nike"

2) Input  = +(+"~shoes -~nike")
   Output = +(+"shoes -nike")

3) Input  = +~shoes -~nike
   Output = +shoes -nike

[Note: I am using the JavaScript tool available from the Lucene contributors' site to build an advanced search with a synonym factor.]

Thanks in advance.

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: REPLACE USING ANALYZERS
Hi Erik,

Oops, I forgot! What if the input is

+~shoes~ -~nike~
or
+(+~shoes~ -~nike~)
or
+~shoes~ -~nike~

Using replaceAll would not solve the problem, since the fuzzy searches in the query parser would then not return hits for the equivalents. |:(

Thanks in advance,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, February 02, 2005 3:50 PM
To: Lucene Users List
Subject: Re: REPLACE USING ANALYZERS

On Feb 2, 2005, at 4:12 AM, Karthik N S wrote:
> Hi guys, apologies. I would like to know if there are any analyzers
> out there which can give me the required output as shown below

Sure: string.replaceAll("~", "") :)

> 1) Input = "+~shoes -~nike"     Output = "+shoes -nike"
> 2) Input = +(+"~shoes -~nike")  Output = +(+"shoes -nike")
> 3) Input = +~shoes -~nike       Output = +shoes -nike
>
> [Note: I am using the JavaScript tool available from the Lucene
> contributors' site to build an advanced search with a synonym factor.]
>
> Thanks in advance
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
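One way to keep the trailing fuzzy '~' while dropping the leading synonym marker is a regular expression with lookarounds instead of a blanket replaceAll. This is only a sketch under stated assumptions: the class name SynonymMarkerStripper is hypothetical, and it assumes the synonym marker is always a '~' placed directly in front of a letter and not directly after a word character.

```java
import java.util.regex.Pattern;

// Hypothetical helper: removes a leading '~' (synonym marker) but keeps
// a trailing '~' (fuzzy operator) and a '~' followed by digits (proximity).
final class SynonymMarkerStripper {
    // A '~' that is NOT preceded by a word character and IS followed by a letter.
    private static final Pattern LEADING_TILDE =
            Pattern.compile("(?<!\\w)~(?=[A-Za-z])");

    private SynonymMarkerStripper() {}

    static String strip(String query) {
        return LEADING_TILDE.matcher(query).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(strip("+~shoes~ -~nike~"));
        System.out.println(strip("+(+\"~shoes -~nike\")"));
    }
}
```

The negative lookbehind is what protects the fuzzy operator: in shoes~ the tilde follows a word character, so it is left in place.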
LUCENE + EXCEPTION
Hi guys,

Apologies. In standalone usage (updating, deleting and adding documents in the MergerIndex), my code runs perfectly without any problems.

But when the same code is plugged into a webapp on Tomcat, with a servlet running in single-thread mode, I frequently get the error below:

java.io.IOException: read past EOF
    at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:218)
    at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
    at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:356)
    at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:323)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)

Could somebody please tell me why this is happening?

O/S    = Gentoo
Java   = JDK 1.4.2
Webapp = Tomcat
Lucene = 1.4.3

Thanks in advance,
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: LUCENE + EXCEPTION
Hi,

OK. I still get the exception, even if I try to have a single servlet instance [maybe through the authentication process], and I made sure that Lucene's MergerIndexing is controlled by a single initiation. But even without any shared resources, the exception pops up frequently:

java.io.IOException: read past EOF
    at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:218)
    at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
    at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:356)
    at org.apache.lucene.index.SegmentReader.norms(SegmentReader.java:323)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:64)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)

Please help me. [I could not find any solution for this on the Lucene forum; maybe I am the only one with the issue.]

Karthik

-----Original Message-----
From: Chris Lamprecht [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 25, 2005 9:48 AM
To: Lucene Users List
Subject: Re: LUCENE + EXCEPTION

Hi Karthik,

If you are talking about SingleThreadModel (i.e. your servlet implements javax.servlet.SingleThreadModel), this does not guarantee that two different instances of your servlet won't be run at the same time. It only guarantees that each instance of your servlet will only be run by one thread at a time. See:

http://java.sun.com/j2ee/sdk_1.3/techdocs/api/javax/servlet/SingleThreadModel.html

If you are accessing a shared resource (a Lucene index), you'll have to prevent concurrent modifications somehow other than SingleThreadModel. I think they've finally deprecated SingleThreadModel in the latest (maybe not even out yet) servlet spec.

-chris

> In standalone usage (updating, deleting and adding documents in the
> MergerIndex), my code runs perfectly without any problems. But when
> the same code is plugged into a webapp on Tomcat, with a servlet
> running in single-thread mode, I frequently get the error below
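Chris's point about guarding the shared index by some means other than SingleThreadModel could be sketched as one JVM-wide lock that every servlet instance goes through before modifying the index. This is a minimal illustration, not a diagnosis of the "read past EOF" error: the class name IndexWriteGate is hypothetical, the Runnable stands in for whatever code adds, updates or deletes documents, and java.util.concurrent assumes Java 5 or later rather than the JDK 1.4.2 of the thread.

```java
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical gate: all index modifications in the webapp funnel through
// one static lock, so two servlet instances can never modify concurrently.
final class IndexWriteGate {
    // static, so it is shared by every servlet instance loaded by the same class loader
    private static final ReentrantLock LOCK = new ReentrantLock();

    private IndexWriteGate() {}

    static void runExclusive(Runnable indexModification) {
        LOCK.lock();
        try {
            indexModification.run();
        } finally {
            LOCK.unlock(); // always release, even if the modification throws
        }
    }

    public static void main(String[] args) {
        runExclusive(() -> System.out.println("modifying index (placeholder)"));
    }
}
```

The lock only covers code running in one JVM and one webapp; a second process writing to the same index directory would need a different mechanism.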
RE: help in indexing
Hi,

You probably need to use the Luke software to peek inside your index. Use it, then come back for more help.

Karthik

-----Original Message-----
From: chetan minajagi [mailto:[EMAIL PROTECTED]
Sent: Thursday, January 20, 2005 12:05 PM
To: lucene-user@jakarta.apache.org
Subject: help in indexing

Hi,

It might seem elementary to most of you. I am trying to build a search tool for internal use using Lucene. I have used the following:

.pdf  -- PDFBox
.html -- demo file of Lucene (HTMLDocument)
.xls  -- POI

The indexing seems to work without throwing up any errors. But when I try to search, I always end up with zero hits. I have tried to use the same string that I see (System.out.print(Document)), but in vain.

Can somebody let me know where and what could be wrong?

Regards,
Chetan
RE: QUERYPARSIN BOOSTING
Hi guys,

Apologies. If anybody has been closely watching Google, it boosts websites in the paid category based on search words.

Can this [boosting a full website] be achieved in Lucene's search, based on the search word? If so, please explain with examples.

With regards,
Karthik

-----Original Message-----
From: Chuck Williams [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 11, 2005 2:00 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: RE: QUERYPARSIN BOOSTING

Karthik,

I don't think the boost in your example does much since you are using an AND query, i.e. all hits will have to contain both vendor:nike and contents:shoes. If you used an OR, then the boost would put nike products above (non-nike) shoes, unless there was some other factor that causes the score of contents:shoes to be 10x greater than that of vendor:nike. It's a good idea to look at the results of explain() when analyzing what's happening with scoring, tuning your boosts and your Similarity.

Chuck

-----Original Message-----
From: Nader Henein [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 11, 2005 12:21 AM
To: Lucene Users List
Subject: Re: QUERYPARSIN BOOSTING

From the text on the Lucene Jakarta site:
http://jakarta.apache.org/lucene/docs/queryparsersyntax.html

    Lucene provides the relevance level of matching documents based on
    the terms found. To boost a term use the caret, "^", symbol with a
    boost factor (a number) at the end of the term you are searching.
    The higher the boost factor, the more relevant the term will be.

    Boosting allows you to control the relevance of a document by
    boosting its term. For example, if you are searching for

        jakarta apache

    and you want the term "jakarta" to be more relevant boost it using
    the ^ symbol along with the boost factor next to the term. You
    would type:

        jakarta^4 apache

    This will make documents with the term jakarta appear more
    relevant. You can also boost Phrase Terms as in the example:

        "jakarta apache"^4 "jakarta lucene"

    By default, the boost factor is 1. Although the boost factor must
    be positive, it can be less than 1 (e.g. 0.2)

Regards,
Nader Henein

Karthik N S wrote:
> Hi guys,
>
> Apologies. This question may have been asked a million times on this
> forum, but I need some clarifications.
>
> 1) FieldType = keyword, name = vendor
> 2) FieldType = text, name = contents
>
> Questions:
> 1) How do I construct a query which would make hits for the VENDOR
>    appear first?
> 2) If boosting is to be applied, how is it done?
> 3) Is the query constructed below correct?
>
>    +Contents:shoes +((vendor:nike)^10)
>
> Please advise. Thanks in advance.
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
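Chuck's suggestion amounts to keeping the contents clause required while making the vendor clause optional and boosted, so that the boost can influence ranking. A minimal sketch of building such a query string for the query parser follows; the class and method names are hypothetical, and the helper assumes plain terms with no characters that would need escaping.

```java
// Hypothetical helper: builds query-parser text with one required clause
// and one optional, boosted clause (e.g. "+contents:shoes vendor:nike^10").
final class BoostedQueryText {
    private BoostedQueryText() {}

    static String requiredWithPreferred(String field, String term,
                                        String preferredField, String preferredTerm,
                                        int boost) {
        if (boost <= 0) {
            throw new IllegalArgumentException("boost must be positive");
        }
        // '+' marks the required clause; the second clause has no prefix, so it is
        // optional when the parser's default operator is OR, and '^' carries the boost.
        return "+" + field + ":" + term + " "
                + preferredField + ":" + preferredTerm + "^" + boost;
    }

    public static void main(String[] args) {
        System.out.println(requiredWithPreferred("contents", "shoes", "vendor", "nike", 10));
    }
}
```

Whether the resulting order is what you want is best checked with explain(), as Chuck recommends.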
SYNONYM + GOOGLE
Hi guys,

Apologies. Does Lucene have synonym functionality like Google's?

If you search Google using '~shoes', it returns hits based on the synonyms.

[I know there is a WordNet-based synonym package for Lucene in the sandbox:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]

Can this be achieved in Lucene? If so, how?

Thanks in advance,
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: SYNONYM + GOOGLE
Hi Erik,

Apologies. This may be a little off-topic for this forum, but it may help you with the next edition of Lucene in Action.

I was working with the Java WordNet Library and, while fiddling with the APIs, found something interesting: the attached code gets more synonyms than the WordNet indexed format available from the Lucene in Action zip file.

1) It needs the WordNet 2.0 dictionary installed.
2) It needs jwnl.jar from SourceForge
   [ http://sourceforge.net/project/showfiles.php?group_id=33824&package_id=33975&release_id=196864 ]

After successful compilation, for the input "watch":

ORIGINAL  : watch OR analog_watch OR digital_watch OR hunter OR hunting_watch OR pendulum_watch OR pocket_watch OR stem-winder OR wristwatch OR wrist_watch
FORMATTED : watch OR analog watch OR digital watch OR hunter OR hunting watch OR pendulum watch OR pocket watch

Check this out; maybe you will come up with brilliant ideas.

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Monday, January 10, 2005 5:19 PM
To: Lucene Users List
Subject: Re: SYNONYM + GOOGLE

On Jan 10, 2005, at 5:33 AM, Karthik N S wrote:
> If you search Google using '~shoes', it returns hits based on the
> synonyms.
>
> [I know there is a WordNet-based synonym package for Lucene in the
> sandbox:
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/contributions/WordNet/ ]
>
> Can this be achieved in Lucene? If so, how?

Yes, it can be achieved. Not quite synonyms, but various forms of the same word can be found in this example, like this search for "similar" (see the highlighted variations):

http://www.lucenebook.com/search?query=similar

This is accomplished using the Snowball stemmer filter found in the sandbox.

For synonyms, you have lots of options. In Lucene in Action I demonstrate custom analyzers that inject synonyms using the WordNet database (from the sandbox). From the source code distribution of LIA:

% ant SynonymAnalyzerViewer
Buildfile: build.xml

SynonymAnalyzerViewer:
     [echo]
     [echo]       Using a custom SynonymAnalyzer, two fixed strings are
     [echo]       analyzed with the results displayed. Synonyms, from the
     [echo]       WordNet database, are injected into the same positions
     [echo]       as the original words.
     [echo]
     [echo]       See the Analysis chapter for more on synonym injection and
     [echo]       position increments. The Tools and extensions chapter covers
     [echo]       the WordNet feature found in the Lucene sandbox.
     [echo]
    [input] Press return to continue...
     [echo] Running lia.analysis.synonym.SynonymAnalyzerViewer...
     [java] 1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] [promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
     [java] 2: [brown] [brownness] [brownish]
     [java] 3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] [discombobulate] [confuse] [confound] [befuddle] [bedevil]
     [java] 4: [jumps]
     [java] 5: [over] [o] [across]
     [java] 6: [lazy] [faineant] [indolent] [otiose] [slothful]
     [java] 7: [dogs]
...

The phrase analyzed was "The quick brown fox jumps over the lazy dogs". Why no synonyms for "jumps" and "dogs"? WordNet has synonyms for "jump" and "dog", but not the plural forms. Stemming would be a necessary step in achieving full synonym look-up, though this would need to be done carefully as the stem of a word is not necessarily a real word itself - so you'd probably want to stem the synonym database also to ensure accurate lookup. Also notice the semantically incorrect synonyms that appear for the animal "fox" ("confuse", for example). Be careful! :)

Erik
RE: Please help - installation problem
Hi,

I think you need to add one more line at the end of the path settings:

JAVA_HOME=/home/JDK..

and finally

export TOMCAT_HOME CLASSPATH ANT_HOME PATH

Once you have done this, type

echo $CLASSPATH

to check whether the JAR files are available for compilation and execution.

> Can I also know if tomcat version 3.2.4 is sufficient for Lucene to run?

Any relevant Tomcat that works with JDK 1.4.2 is sufficient for Lucene to run.

With regards,
Karthik

-----Original Message-----
From: jac jac [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 11, 2005 8:21 AM
To: lucene-user@jakarta.apache.org
Subject: Please help - installation problem

Hi all,

This is jac here, and I am currently in urgent need to install Lucene on a Unix machine. However, I am not sure where to set the paths, because I am unfamiliar with Unix and am a newbie to Java as well. I have installed Lucene on Windows and it works, but I can't understand why it doesn't on Unix.

The following paths are what I have entered. Can someone please check them for me?

- PATH=.:$PATH:$ANT_HOME/bin
- TOMCAT_HOME=/home/jac/jakarta-tomcat-3.2.4
- ANT_HOME=/home/jac/apache-ant-1.6.2
- CLASSPATH=$TOMCAT_HOME/webapps/luc/WEB-INF/lib/lucene-1.4.3.jar:$TOMCAT_HOME/webapps/luc/WEB-INF/lib/lucene-demos-1.4.3

Can I also know if Tomcat version 3.2.4 is sufficient for Lucene to run?

Thanks in advance!

Regards,
jac
QUERYPARSIN BOOSTING
Hi guys,

Apologies. This question may have been asked a million times on this forum, but I need some clarifications.

1) FieldType = keyword, name = vendor
2) FieldType = text, name = contents

Questions:

1) How do I construct a query which would make hits for the VENDOR appear first?
2) If boosting is to be applied, how is it done?
3) Is the query constructed below correct?

   +Contents:shoes +((vendor:nike)^10)

Please advise. Thanks in advance.

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
INDEXREADER + MAXDOC
Hi guys,

Apologies. Using the integer returned by the IndexReader.maxDoc() API, is it possible to get the values from the various field types?

Example: 'docs.get(contents) at IndexReader.maxdoc()'

If so, how?

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: INDEXREADER + MAXDOC
Hi Erik,

Apologies. I would like to extract the data from the various fields of the last document [as you said].

Example: at IndexReader.maxDoc = 100

doc.get("Content") == "ISBN100"
doc.get("name")    == "LUCENE IN ACTION"
doc.get("author")  == "Erik Hatcher"

This is my requirement. Please help.

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, January 04, 2005 5:10 PM
To: Lucene Users List
Subject: Re: INDEXREADER + MAXDOC

On Jan 4, 2005, at 5:19 AM, Karthik N S wrote:
> Using the integer returned by the IndexReader.maxDoc() API, is it
> possible to get the values from the various field types?
>
> Example: 'docs.get(contents) at IndexReader.maxdoc()'
>
> If so, how?

Just to be sure I understand... you want the last document in the index? IndexReader.document(n) will give you this.

Erik
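Since maxDoc() is one greater than the largest possible document number, the n passed to IndexReader.document(n) for the last document is at most maxDoc() - 1, and that slot may hold a deleted document. A minimal sketch of finding the last live document number follows. DocReader is a hypothetical stand-in interface that mirrors only the maxDoc() and isDeleted(int) methods of IndexReader, so the example runs without Lucene on the classpath.

```java
// Hypothetical stand-in for the two IndexReader methods this sketch uses.
interface DocReader {
    int maxDoc();             // one greater than the largest possible document number
    boolean isDeleted(int n); // true if document n has been deleted
}

final class LastDocument {
    private LastDocument() {}

    // Returns the highest document number that is not deleted, or -1 if there is none.
    static int lastLiveDocId(DocReader reader) {
        for (int n = reader.maxDoc() - 1; n >= 0; n--) {
            if (!reader.isDeleted(n)) {
                return n;
            }
        }
        return -1;
    }
}
```

With a real IndexReader, the returned number would then be passed to reader.document(n), and the field values read with doc.get(fieldName) for stored fields.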
RE: New Highlighter features + api
Hi Mark,

Apologies. Could you please tell the forum where to find the Javadoc API for the Highlighter package you have created?

Thanks in advance.

[WISHING YOU AND THE FORUM 'A HAPPY NEW YEAR']

KARTHIK

-----Original Message-----
From: markharw00d [mailto:[EMAIL PROTECTED]
Sent: Monday, January 03, 2005 4:14 AM
To: Lucene Users List
Subject: New Highlighter features

The Highlighter package in CVS has been updated with the following new features:

* GradientFormatter is a new formatter that can be used to change the colour intensity of matching terms, based on their score. I have found this to be a useful way of visualizing the basis of query matches, especially when the query was derived automatically, e.g. in a MoreLikeThis style of query.

* The QueryScorer class has a new constructor that takes an IndexReader which is used to provide term scores based on scarcity (idf score). Using this with the new GradientFormatter ensures the most important terms are highlighted most strongly.

* New class TokenSources offers methods to produce a TokenStream from indexes using the new TermVector features (saving the cost of reanalyzing to produce TokenStreams for the highlighter).

TokenStreams, Formatters and Scorers are all pluggable elements of the main Highlighter class, so these are just new extensions around the existing core functionality. See the Javadocs for further details on how to use these components.

Cheers
Mark
SNOWBALL STEMMER + BOOSTING
Hi guys,

Apologies. I am using the Analysis Paralysis example with the Snowball stemmer [using StandardAnalyzer.ENGLISH_STOP_WORDS and StopAnalyzer.ENGLISH_STOP_WORDS] from

http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html?page=last#thread

For the words 'jakarta^4 apache', both cases return something like this:

org.apache.lucene.analysis.snowball.SnowballAnalyzer:
[JAKARTHA] [4] [APACHE]

I wonder what happened to the boosting symbol '^'. If the same words are passed to QueryParser.parse(), what hits would be returned?

Thanks in advance.

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
MergerIndex + Searchables
Hi guys,

Apologies. I have several MERGERINDEXES [MGR1, MGR2, MGR3]. For searching across these MERGERINDEXES I use the following code:

IndexSearcher[] indexToSearch = new IndexSearcher[CNTINDXDBOOK];
for (int all = 0; all < CNTINDXDBOOK; all++) {
    indexToSearch[all] = new IndexSearcher(INDEXEDBOOKS[all]);
    System.out.println(all + " ADDED TO SEARCHABLES " + INDEXEDBOOKS[all]);
}
MultiSearcher searcher = new MultiSearcher(indexToSearch);

Question: during the search, how can I display which MGR a relevant document id originated from?

[Something like this: search word 'ISBN12345' is available from MGRx]

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
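A searcher that spans several indexes typically numbers documents by stacking the sub-indexes end to end, so a global document number can be mapped back to its sub-index with a table of starting offsets. The sketch below illustrates that idea only; SubIndexLocator is a hypothetical class, and the per-index sizes are assumed to come from each sub-index's maxDoc(). If the MultiSearcher in your Lucene version exposes subSearcher(int n) and subDoc(int n), those would do this lookup for you; that is an assumption to check against the Javadoc.

```java
// Hypothetical illustration: maps a global document number to the position of
// the sub-index it came from, using cumulative start offsets.
final class SubIndexLocator {
    // starts[i] is the first global doc number of sub-index i;
    // the final entry is the total number of document slots.
    private final int[] starts;

    SubIndexLocator(int[] maxDocs) {
        starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    // Position (in the constructor array) of the sub-index holding globalDoc.
    int subIndexOf(int globalDoc) {
        if (globalDoc < 0 || globalDoc >= starts[starts.length - 1]) {
            throw new IllegalArgumentException("doc out of range: " + globalDoc);
        }
        int i = 0;
        while (globalDoc >= starts[i + 1]) {
            i++;
        }
        return i;
    }

    // Document number of globalDoc within its own sub-index.
    int localDocOf(int globalDoc) {
        return globalDoc - starts[subIndexOf(globalDoc)];
    }
}
```

The position returned by subIndexOf can be used to index the same INDEXEDBOOKS array that built the searchers, which gives the name of the originating index for display.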
RE: NUMERIC RANGE BOOLEAN
Hi Erik,

Apologies. Yes. As I said in my previous mail, we have to get all the hits in the range, so 0.99 cents will always be 0.99 cents, on which we do the price comparison from the consumer's point of view.

I hope I have answered your question.

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Thursday, December 16, 2004 5:24 PM
To: Lucene Users List
Subject: Re: NUMERIC RANGE BOOLEAN

On Dec 16, 2004, at 5:03 AM, Morus Walter wrote:
> Erik Hatcher writes:
> > TooManyClauses exception occurs when a query such as a RangeQuery
> > expands to more than 1024 terms. I don't see how this could be the
> > case in the query you provided - are you certain that is the query
> > that generated the error?
>
> Why not: the terms might be 0003 0003.1 0003.11 ...
> So the question is, what do his terms look like...

Ah, good point! So, Karthik - what are the values of those terms?

Pragmatically, do you really need to do a range involving the cents of a price?

Erik
RE: Indexing with Lucene 1.4.3
Hi there,

Apologies. If you are using IndexHTML from the demo JAR, which is available in Lucene1.4.3.zip, then you had better look at the file extensions of your files. They may be filtered out of the indexing process by this code in IndexHTML.java:

} else if (file.getPath().endsWith(".html") || // index .html files
           file.getPath().endsWith(".htm") ||  // index .htm files
           file.getPath().endsWith(".txt")) {  // index .txt files

If your files' extensions are among the 'endsWith' options, then you have successfully indexed your 6000 documents.

Try the Luke monitoring software available from the Jakarta Lucene web site to check this.

[Hint: try using SearchFiles.class from Lucene1.4.3.zip to search the documents you have successfully indexed.]

With regards,
Karthik

-----Original Message-----
From: Hetan Shah [mailto:[EMAIL PROTECTED]
Sent: Friday, December 17, 2004 12:30 AM
To: Lucene Users List
Subject: Indexing with Lucene 1.4.3

Hello,

I have been trying to index around 6000 documents using IndexHTML from 1.4.3, and at the end of indexing, in my index directory, I only have 3 files:

segments
deletable
_5en.cfs

Can someone tell me what is going on and where the actual index files are? How can I resolve this issue?

Thanks.
-H
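The extension check quoted from IndexHTML.java can be tried on its own against a list of file names before re-running the indexer. The following is a minimal sketch with hypothetical names (IndexableFiles, isIndexable); like the quoted demo code it is case-sensitive, so a file named page.HTML would be skipped. Separately, a directory holding only segments, deletable and a single .cfs file is what an index written in the compound file format looks like, so the three files are not by themselves a sign that indexing failed.

```java
// Hypothetical helper mirroring the endsWith() filter quoted above.
final class IndexableFiles {
    private static final String[] EXTENSIONS = {".html", ".htm", ".txt"};

    private IndexableFiles() {}

    // True if the path ends with one of the extensions the demo indexes.
    static boolean isIndexable(String path) {
        for (String extension : EXTENSIONS) {
            if (path.endsWith(extension)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] samples = {"docs/index.html", "docs/report.pdf", "notes.txt"};
        for (String sample : samples) {
            System.out.println(sample + " -> " + isIndexable(sample));
        }
    }
}
```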
NUMERIC RANGE BOOLEAN
Hi guys,

Apologies. Can somebody please tell me why this is happening, and whether there is any workaround for it?

Constructed String : +bags +itemPrice:[0003 TO 0020]
Query String       : +contents:bags +itemPrice:[0003 TO 0020]

org.apache.lucene.search.BooleanQuery$TooManyClauses
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:79)
    at org.apache.lucene.search.BooleanQuery.add(BooleanQuery.java:71)
    at org.apache.lucene.search.RangeQuery.rewrite(RangeQuery.java:99)
    at org.apache.lucene.search.BooleanQuery.rewrite(BooleanQuery.java:243)
    at org.apache.lucene.search.IndexSearcher.rewrite(IndexSearcher.java:166)

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: NUMERIC RANGE BOOLEAN
Hi Erik,

Yes, this is happening in our case, and we are using Lucene 1.4.3 with the same system configuration as in my earlier mail.

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 16, 2004 3:17 PM
To: Lucene Users List
Subject: Re: NUMERIC RANGE BOOLEAN

On Dec 16, 2004, at 4:07 AM, Karthik N S wrote:
> Can somebody please tell me why this is happening, and whether there is any workaround for it?
>
> Constructed String : +bags +itemPrice:[0003 TO 0020]
> Query String       : +contents:bags +itemPrice:[0003 TO 0020]
>
> org.apache.lucene.search.BooleanQuery$TooManyClauses

TooManyClauses exception occurs when a query such as a RangeQuery expands to more than 1024 terms. I don't see how this could be the case in the query you provided - are you certain that is the query that generated the error?

Erik
LUCENE1.4.1 - LUCENE1.4.2 - LUCENE1.4.3 Exception
Hi guys,

Can somebody please tell me what this exception I am getting means?

System specifications:
O/s       : Linux Gentoo
Appserver : Apache Tomcat/4.1.24
Jdk       : build 1.4.2_03-b02
Lucene    : 1.4.1, 1.4.2, 1.4.3

Note: this exception is displayed on every 2nd query after Tomcat is started.

java.io.IOException: Stale NFS file handle
    at java.io.RandomAccessFile.readBytes(Native Method)
    at java.io.RandomAccessFile.read(RandomAccessFile.java:307)
    at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:420)
    at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
    at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:220)
    at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
    at org.apache.lucene.index.SegmentTermEnum.readTerm(SegmentTermEnum.java:142)
    at org.apache.lucene.index.SegmentTermEnum.next(SegmentTermEnum.java:115)
    at org.apache.lucene.index.TermInfosReader.scanEnum(TermInfosReader.java:143)
    at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:137)
    at org.apache.lucene.index.SegmentReader.docFreq(SegmentReader.java:253)
    at org.apache.lucene.search.IndexSearcher.docFreq(IndexSearcher.java:69)
    at org.apache.lucene.search.Similarity.idf(Similarity.java:255)
    at org.apache.lucene.search.TermQuery$TermWeight.sumOfSquaredWeights(TermQuery.java:47)
    at org.apache.lucene.search.Query.weight(Query.java:86)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:85)
    at org.apache.lucene.search.MultiSearcherThread.run(ParallelMultiSearcher.java:251)

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: HITCOLLECTOR+SCORE+DELIMMA
Hi Vikas Gupta,

Since Erik replied to my last mail, I understand that a FILTER can be built to fetch scores between 0.2f and 1.0f. Can you please spare me some code for this?

[ Sorry for the spelling mistakes, my mail client does not have a spell checker. ]

With regards,
Karthik

-----Original Message-----
From: Vikas Gupta [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 13, 2004 3:17 PM
To: Lucene Users List
Subject: RE: HITCOLLECTOR+SCORE+DELIMA

On Dec 10, 2004, at 7:39 AM, Karthik N S wrote:
> I am still in a dilemma about how to use the HitCollector for returning Hits with scores between 0.2f and 1.0f. There is no simple example of this, yet there is a lot of talk about its usage on the forum.

1) I am not 100% sure about this but it might work. Add the score condition to the following code in IndexSearcher.java::search():

    // inherit javadoc
    public TopDocs search(Query query, Filter filter, final int nDocs)
            throws IOException {
        Scorer scorer = query.weight(this).scorer(reader);
        if (scorer == null)
            return new TopDocs(0, new ScoreDoc[0]);

        final BitSet bits = filter != null ? filter.bits(reader) : null;
        final HitQueue hq = new HitQueue(nDocs);
        final int[] totalHits = new int[1];
        scorer.score(new HitCollector() {
            public final void collect(int doc, float score) {
                if (score > 0.0f &&                       // ignore zeroed buckets
                    (score > 0.2f && score < 1.0f) &&     // keep only scores in the wanted range
                    (bits == null || bits.get(doc))) {    // skip docs not in bits
                    totalHits[0]++;
                    hq.insert(new ScoreDoc(doc, score));
                }
            }
        });

2) Filter examples are in the Lucene in Action book, Chapter 5. I wrote an example as well:

    String query = "odyssey";

    BooleanQuery bq = new BooleanQuery();
    bq.add(new TermQuery(new Term("content", query)), true, false);

    BooleanQuery bqf = new BooleanQuery();
    bqf.add(new TermQuery(new Term("H2", query)), true, false);
    Filter f = new QueryFilter(bqf);

    IndexReader reader = IndexReader.open(new File(dir, "index").getCanonicalPath());
    Searcher luceneSearcher = new org.apache.lucene.search.IndexSearcher(reader);
    luceneSearcher.setSimilarity(new NutchSimilarity());

    // Logically the following would be executed as follows: find all
    // the docs matching bq, then select the ones which match bqf.
    hits = luceneSearcher.search(bq, f);

    System.out.print("query: " + query);
    System.out.println("Total hits: " + hits.length());

3) "delima" is spelled "dilemma".

-Vikas Gupta
RE: HITCOLLECTOR+SCORE+DELIMMA
Hi Erik,

What exactly do you mean by this?

> We've emphasized numerous times that calling hits.doc(i) is a resource hit. Don't do it for documents you aren't going to show. To filter by score, use hits.score(i) first.

I am a bit confused. Do you mean to say we should replace hits.doc(i) with hits.score(i)?

Also:

> Ah, so you are accessing every document to get this field information. It is incorrect that you cannot filter prior to getting hits. You have a couple of options in filtering by a field value - use a QueryFilter or simply AND a RangeQuery to the original query.

Since the portal we are building is an eCommerce one, we have to search for the search word across ( 7 ) x 1000 x 15000 documents, get most of the relevant hits (wherever the score is between 0.5 and 1.0), and then sort on the adjacent fields 'Vendors' and 'Price' in ascending order. In such a case we cannot use a RangeQuery without knowing beforehand exactly what the consumer wants.

Is it not possible to have a generalized Filter in future versions of the API, to inject some minor factors prior to getting the hits returned?

Thanks in advance,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, December 14, 2004 3:44 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMMA

On Dec 13, 2004, at 11:16 PM, Karthik N S wrote:
> time [ A simple search of 'handbags' returned 1,60,000 hits and time taken was 440 secs, in production Env / May be our coding is poor, but we are constantly improving the process ].

If your searches are taking 440 seconds, you have something more fundamentally wrong. You are either doing some large wildcard/range/fuzzy expansions or you're accessing every document from all your hits.

Is the searcher.search() method taking that long? I bet not. Or rather is it the iteration over the Hits that is killing the search time, which is what I suspect?

We've emphasized numerous times that calling hits.doc(i) is a resource hit. Don't do it for documents you aren't going to show. To filter by score, use hits.score(i) first.

> { O/s Linux Gentoo, RAM 1GB, Lucene1.4.1, Appserver = Tomcat5, and Blackdown Java 1.4.2 with Args -XX:+UseParallelGC for Garbage Collection }

Please narrow your code down to a clean, succinct example that you can post. It is difficult to help you without details of your code (but let me emphasize again - it needs to be clean and succinct so it is quick for us to get a handle on).

> To be one step in advance, we also have adjacent fields 'Vendor', 'Price' on which we have to compare Best/Poor/Least results accordingly. So we have to limit the hits accordingly, since the Lucene API does not provide any way to inject this limiting facility *prior* to getting the hits.

Ah, so you are accessing every document to get this field information. It is incorrect that you cannot filter prior to getting hits. You have a couple of options in filtering by a field value - use a QueryFilter or simply AND a RangeQuery to the original query.

Erik
RE: HITCOLLECTOR+SCORE+DELIMMA
Hi Erik,

Apologies... In this mail

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED] che.orgmsgNo=11254

I have already told you that doc.get( ) was coming in batches for a mere 4000 hits, and this is happening in real time [ A simple search of 'handbags' returned 1,60,000 hits and time taken was 440 secs, in production Env / May be our coding is poor, but we are constantly improving the process ].

{ O/s Linux Gentoo, RAM 1GB, Lucene1.4.1, Appserver = Tomcat5, and Blackdown Java 1.4.2 with Args -XX:+UseParallelGC for Garbage Collection }

To be one step in advance, we also have adjacent fields 'Vendor', 'Price' on which we have to compare Best/Poor/Least results accordingly. So we have to limit the hits accordingly, since the Lucene API does not provide any way to inject this limiting facility *prior* to getting the hits.

[ Excuse me Nader Henein, I am on the Lucene-Users forum, NOT the Lucene-Developers forum, so we expect at least some help. ]

With warm regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 13, 2004 6:39 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMMA

On Dec 13, 2004, at 6:58 AM, Karthik N S wrote:
> Iterating over Hits returns large hit values, and iterating over Hits for scores consumes time, so how do I limit my search between [ X.xf to Y.yf ] prior to getting the Hits?

Why do you need to do this *prior* to getting Hits? You have yet to justify what you're asking. I almost guarantee you that navigating Hits in the way I said will be as fast as you need it to be.

Erik
RE: HITCOLLECTOR+SCORE+DELIMA
Hi guys,

Apologies. So you say I have to build a Filter to collect all the scores between the two bounds [ 0.2f to 1.0f ], so the API for this would be

    Hits hit = search(Query query, Filter filtertoGetScore)

But while writing the Filter, the score again depends on Hits:

    score = hits.score(x);

How do I solve this, or am I following the wrong process? Any simple source for this will be greatly appreciated. :)

Thanks in advance

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Friday, December 10, 2004 6:54 PM
To: Lucene Users List
Subject: Re: HITCOLLECTOR+SCORE+DELIMA

On Dec 10, 2004, at 7:39 AM, Karthik N S wrote:
> I am still in a dilemma about how to use the HitCollector for returning Hits with scores between 0.2f and 1.0f. There is no simple example of this, yet there is a lot of talk about its usage on the forum.

Unfortunately there isn't a clean way to stop a HitCollector - it will simply collect all hits. Also, scores are _not_ normalized when passed to a HitCollector, so you may get scores > 1.0. Hits, however, does normalize and you're guaranteed that scores will be <= 1.0. Hits are in descending score order, so you may just want to use Hits and filter based on the score provided by hits.score(i).

Erik
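Erik's suggestion can be sketched in plain Java without any Lucene classes. The sketch assumes the scores arrive as an array already sorted in descending order and normalized to at most 1.0, which is what Erik describes for Hits; the `scores` array stands in for successive `hits.score(i)` calls, and the class and method names are hypothetical. Because of the ordering, the loop can stop at the first score below the lower bound.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration; scores[] plays the role of hits.score(i).
class ScoreWindowSketch {
    // Returns the ranks (positions in the hit list) whose score lies in
    // [min, max]. Assumes scores[] is sorted in descending order.
    static List<Integer> ranksInWindow(float[] scores, float min, float max) {
        List<Integer> ranks = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            if (scores[i] < min) {
                break;              // all later scores are lower still
            }
            if (scores[i] <= max) {
                ranks.add(i);       // only these ranks need their document loaded
            }
        }
        return ranks;
    }
}
```

Only the ranks returned here would then need the more expensive document fetch.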
HITCOLLECTOR+SCORE+DELIMA
Hi guys,

Apologies. I am still in a dilemma about how to use the HitCollector for returning Hits with scores between 0.2f and 1.0f. There is no simple example of this, yet there is a lot of talk about its usage on the forum.

Please, could somebody spare a bit of code (and intelligence) on this forum?

Thanks in advance,
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
SEARCH +HITS+LIMIT
Hi guys,

Apologies... One question for the forum [ especially Erik ]:

1) I have a MERGED index with 100,000 files indexed into it ( Content is one of the fields, of type 'Text' ).
2) A search for a simple word such as "Camera" returns 6000 hits.
3) Since the search process runs in a web app, a simple JSP is used to display the content.

Question: how do I display the contents of the hits incrementally [ each time re-querying the merged index with an incremented X value ]? This would solve the out-of-memory problem caused by prefetching all the hits in one go.

Ex: total hits 6000
    1st page  - hits returned (1 to 25)
    2nd page  - hits returned (26 to 50)
    . . . .
    Nth page  - hits returned (5975 - 6000)

Hint: this is similar to the SQL query SELECT * FROM LUCENE LIMIT 10, 5

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
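The paging arithmetic asked about above can be sketched in plain Java as follows. This is only the index calculation, under the assumption that the search is re-run for each request and that only the hits inside the returned window are read; the class and method names are hypothetical and nothing here is a Lucene API.

```java
// Plain-Java illustration of a LIMIT-style window over a hit list.
class PagingSketch {
    // Returns {start, endExclusive} as 0-based hit positions for a 1-based
    // page number. The window is empty (start == endExclusive) when the
    // page lies past the last hit.
    static int[] window(int totalHits, int pageSize, int page) {
        if (totalHits < 0 || pageSize <= 0 || page <= 0) {
            throw new IllegalArgumentException("totalHits >= 0, pageSize > 0 and page > 0 required");
        }
        long start = (long) (page - 1) * pageSize;   // long avoids int overflow
        int from = (int) Math.min(start, totalHits);
        int to = (int) Math.min(start + pageSize, totalHits);
        return new int[] {from, to};
    }
}
```

For 6000 hits and 25 hits per page, page 240 is the last non-empty page; a display loop would run from `start` to `endExclusive - 1` and fetch only those documents.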
RE: LUCENE + 1.4.2
Hi Erik,

Apologies... This means that the issues with 1.4.1 and 1.4.2 are fixed in 1.4.3 as of now.

1) So you are saying we can safely move our code under development from 1.4.1 to 1.4.3?
2) Do we need to reindex everything we indexed with 1.4.1, or can we continue by replacing the jar with 1.4.3.jar alone?
3) Can we also have an announcement on the forum whenever new and final versions are released?

Thanks in advance.
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Monday, December 06, 2004 3:39 PM
To: Lucene Users List
Subject: Re: LUCENE + 1.4.2

On Dec 6, 2004, at 1:22 AM, Karthik N S wrote:
> I am not able to find the FINAL Lucene 1.4.2 source anywhere on http://jakarta.apache.org/lucene/docs/index.html
>
> Please can somebody reply to the forum with the URL.

Actually Lucene 1.4.3 is now available and I recommend you use it instead, through the official Jakarta binary downloads. We're still tidying up some loose ends on it before officially announcing it, but you can find it here:

http://www.apache.org/dist/jakarta/lucene/binaries/

We did not release 1.4.1 or 1.4.2 properly, so those binaries are not available there currently. Whether we retroactively put them there is still undecided.

Erik
LUCENE + 1.4.2
Hi guys,

Apologies... I am not able to find the FINAL Lucene 1.4.2 source anywhere on http://jakarta.apache.org/lucene/docs/index.html

Please can somebody reply to the forum with the URL.

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: UNIQUE FIELD NAMES + SEARCH
Hi Erik,

Apologies... Thanks, that source worked perfectly. Wow, that really got me over a huge boulder. :|

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Thursday, December 02, 2004 4:23 PM
To: Lucene Users List
Subject: Re: UNIQUE FIELD NAMES + SEARCH

On Dec 2, 2004, at 2:13 AM, Karthik N S wrote:
> In my index I have a field of type Keyword, 'FILE_NAME'. It captures UNIQUE FOLDER NAMES [ starting with B1, B2, B3 ... ] during the indexing process.
>
> Please can somebody tell me how to display ALL the FOLDER NAMES from the field 'FILE_NAME' without any search word?

I guess if you keep asking, someone will eventually answer :)

Here's an example I use to get all categories from the Lucene index that drives my blog at http://www.blogscene.org/erik

    Set categories = new TreeSet();
    IndexReader reader = IndexReader.open(indexDir);
    try {
        TermEnum terms = reader.terms(new Term("category", ""));
        while ("category".equals(terms.term().field())) {
            categories.add(terms.term().text());
            if (!terms.next()) {
                break;
            }
        }
    } finally {
        reader.close();
    }

IndexReader is enumerating all the terms in the "category" field (you'll use your filename field name instead).

> [ Can I use 'B*' to search exclusively on the field type? ]

Sure, but that would require that you walk every document returned from the search and pull its filename field. This would be vastly slower than the above code that goes directly to the terms.

My apologies for not replying sooner on this.

Erik
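Why Erik's loop seeks to an empty term text and stops when the field name changes can be shown with a toy model in plain Java. The sketch assumes a term dictionary kept sorted by field and then by text, modelled here as a `TreeSet` of `field + '\u0000' + text` strings; that encoding, the class name and the method names are illustrative assumptions, not Lucene internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy model of a term dictionary sorted by (field, text).
class TermEnumSketch {
    // Builds the sort key for one term; '\u0000' sorts before any other char.
    static String key(String field, String text) {
        return field + '\u0000' + text;
    }

    // Lists every term text of one field: seek to (field, ""), then walk
    // forward until the field name changes.
    static List<String> valuesOf(TreeSet<String> dictionary, String field) {
        List<String> values = new ArrayList<>();
        String prefix = field + '\u0000';
        for (String k : dictionary.tailSet(prefix, true)) {
            if (!k.startsWith(prefix)) {
                break;              // moved on to the next field
            }
            values.add(k.substring(prefix.length()));
        }
        return values;
    }
}
```

The walk touches only the terms of the requested field, never the documents, which is why it is much cheaper than a 'B*' search followed by loading each hit.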
RE: GETVALUES +SEARCH
Hi Erik,

Apologies. We create an ArrayList object, load all the hit values into it, and return it to a servlet for display. On the servlet we walk the server-side ArrayList for the required number of displays. [ At any time we have to have all the hit values loaded into the ArrayList; we cannot compromise on that. ]

We observed that doc.get() was not continuous for 4000 hits and was coming in batches, so any new API features will definitely help us.

With regards,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, December 01, 2004 4:04 PM
To: Lucene Users List
Subject: Re: GETVALUES +SEARCH

On Dec 1, 2004, at 12:41 AM, Karthik N S wrote:
> Is there any API in Lucene which can retrieve all the searched values in a single fetch into some sort of 'Array' WITHOUT using this looping process [ below ]? [ This would make the search and display faster. ]
>
>     for (int i = 0; i < hits.length(); i++) {
>         Document doc = hits.doc(i);
>         String path = doc.get("path");
>         ...
>     }

Are you really showing *all* results at one time? Or just the first several?

Iterating over all hits and retrieving each Document is often unwise and generally unnecessary if only the first 20 or so are shown at first.

I don't know of a simpler way to get all the path values in your example. Perhaps a HitCollector is more to your liking? Though it probably would not speed anything up for you.

Erik
UNIQUE FIELD NAMES + SEARCH
Hi guys,

Apologies. In my index I have a field of type Keyword, 'FILE_NAME'. It captures UNIQUE FOLDER NAMES [ starting with B1, B2, B3 ... ] during the indexing process.

Please can somebody tell me how to display ALL the FOLDER NAMES from the field 'FILE_NAME' without any search word?

[ Can I use 'B*' to search exclusively on the field type? ]

Thanks in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
SEARCH CRITERIA
Hi guys,

Apologies. On Yahoo and AltaVista, searching for a word like 'kid' returns results together with suggestions like the following:

    Also try: kid rock, kid games, star wars kid, karate kid  More...

How can we obtain similar search suggestions using Lucene?

Thanks in advance

Warm regards,
Karthik
GETVALUES +SEARCH
Hi guys,

Apologies. In the search API, in the class [ org.apache.lucene.document.Document ]:

Will 'public final String[] getValues(String name)' return me all the docs without looping through them?

Please explain with an example.

Thanks in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: GETVALUES +SEARCH
Hi guys,

Apologies... Is there any API in Lucene which can retrieve all the searched values in a single fetch into some sort of 'Array' WITHOUT using this looping process [ below ]? [ This would make the search and display faster. ]

    for (int i = 0; i < hits.length(); i++) {
        Document doc = hits.doc(i);
        String path = doc.get("path");
        ...
    }

Thanks in advance,
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, November 30, 2004 8:06 PM
To: Lucene Users List
Subject: Re: GETVALUES +SEARCH

On Nov 30, 2004, at 7:10 AM, Karthik N S wrote:
> In the search API, in the class [ org.apache.lucene.document.Document ]:
>
> Will 'public final String[] getValues(String name)' return me all the docs without looping through them?

getValues(fieldName) returns a String[] of the values of the field. It's similar to getValue(fieldName). If you index a field multiple times:

    doc.add(Field.Keyword("keyword", "one"));
    doc.add(Field.Keyword("keyword", "two"));

getValue("keyword") will return "one", but getValues("keyword") will return a String[] {"one", "two"}.

If you want to retrieve all documents, use IndexReader's various API methods.

Erik
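The difference Erik describes between the single-value and multi-value lookups can be modelled with a toy class in plain Java. This is not Lucene's Document class; the class name and its internals are hypothetical, and the empty-array result for a missing field is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a document that may hold a field name more than once.
class MultiValueFieldSketch {
    // {name, value} pairs in the order they were added.
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // First value stored under the name, or null if the name is absent.
    String getValue(String name) {
        for (String[] f : fields) {
            if (f[0].equals(name)) {
                return f[1];
            }
        }
        return null;
    }

    // All values stored under the name, in insertion order
    // (empty array if the name is absent; a choice made for this sketch).
    String[] getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] f : fields) {
            if (f[0].equals(name)) {
                values.add(f[1]);
            }
        }
        return values.toArray(new String[0]);
    }
}
```

Both lookups work on the fields of one document; neither returns other documents, which is the point of Erik's answer.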
UNIQUE FILE SEARCH
Hi guys,

Apologies. I have an index in which one of the fields is of type 'Keyword'. To this field I add UNIQUE file names.

On search, how can I display all the file names without any search keyword?

Thanks in advance.

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
MERGERINDEX + SOLUTION
Hi guys,

Apologies. I have a MERGERINDEX [ 1000 subindexes merged ]. The question is: does somebody have a solution for repairing the merged index [ in case of corruption ]?

If so, please let the forum know about it, so developers like us can use it.

Thanks in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
X-Thread Message
Hi guys,

Apologies. Does anybody have an improved suggestion for this thread?

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED] he.orgmsgId=1992830

I am still in a dilemma.

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: COUNT SUBINDEX [IN MERGERINDEX]
Hi guys,

Apologies. I am still confused. ;(

Let me make the question simpler: searching an index without any search word, I would like to count the total number of documents present in it. [ I only have a field of type 'Field.Keyword', which stores the unique filename. ]

Will IndexReader.termDocs(term) give me this count? If so, how do I use it, please?

Thanks in advance.
Karthik

-----Original Message-----
From: Paul Elschot [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 17, 2004 2:02 PM
To: [EMAIL PROTECTED]
Subject: Re: COUNT SUBINDEX [IN MERGERINDEX]

On Wednesday 17 November 2004 07:10, Karthik N S wrote:
> Hi guys,
>
> Apologies. So a merged index is again a single index [ the addition of the subindexes ... ]. In that case, if one of the fields is of type 'Field.Keyword' and is unique across the subindexes [ before merging ], and I want to count this unique field in a merged index [ after it has been merged ], how do I do this, please?

IndexReader.numDocs() will give the number of docs in an index.

Lucene has no direct support for unique fields. After merging, if the same unique field value occurs in both source indexes, the merged index will contain two documents with that value. In case one wants to merge into unique field values, the non-unique values in one of the source indexes need to be deleted before merging.

See IndexReader.termDocs(term) on how to get the document numbers for (unique) terms via a TermDocs, and IndexReader.delete(docNum) for deleting docs.

Regards,
Paul.
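Paul's remark about duplicate key values can be illustrated in plain Java, with each source index modelled simply as a collection of its unique-key values (one per document). The class and method names are hypothetical, and the model ignores everything about an index except those key values.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

// Plain-Java model: each source index is just the list of its key values.
class MergeDuplicatesSketch {
    // Key values present in both sources. Unless they are deleted from one
    // source first, each of them appears twice in the merged index.
    static Set<String> duplicatedKeys(Collection<String> first, Collection<String> second) {
        Set<String> duplicates = new TreeSet<>(first);
        duplicates.retainAll(new HashSet<>(second));
        return duplicates;
    }

    // Document count after a plain merge with no deletions
    // (the figure a numDocs()-style count would report).
    static int mergedDocCount(Collection<String> first, Collection<String> second) {
        return first.size() + second.size();
    }
}
```

The number of distinct key values after merging is then `mergedDocCount` minus the number of duplicated keys, assuming each key occurs at most once per source.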
RE: COUNT SUBINDEX [IN MERGERINDEX]
Hi guys,

Apologies. So a merged index is again a single index [ the addition of the subindexes ... ]. In that case, if one of the fields is of type 'Field.Keyword' and is unique across the subindexes [ before merging ], and I want to count this unique field in a merged index [ after it has been merged ], how do I do this, please?

Ex:
    SubIndex1    = filename1, filename2, filename3
    SubIndex2    = filename4, filename5, filename6
    MergerIndex1 = filename1, filename2, filename3, filename4, filename5, filename6

    [ From MergerIndex ] Count = 6 nos

Something like the above.

Thanks in advance

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 17, 2004 10:30 AM
To: Lucene Users List
Subject: Re: COUNT SUBINDEX [IN MERGERINDEX]

Once the index is merged there is only 1 index - there are no subindices.

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:
> Hi guys,
>
> Apologies. Can somebody tell me which API to use to count the number of subindexes in a MERGED index?
>
> Thanks in advance
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
RE: Lucene1.4.1 + OutOf Memory
Hi guys,

Apologies. I am NOT using the sorting code

    hits = multiSearcher.search(query, new Sort(new SortField("filename", SortField.STRING)));

but am using multiSearcher.search(query) in the core files setup, and I am still getting the error.

More advice required.

Karthik

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 10, 2004 12:46 PM
To: Lucene Users List
Subject: Re: Lucene1.4.1 + OutOf Memory

There is a memory leak in the sorting code of Lucene 1.4.1. 1.4.2 has the fix!

--- Karthik N S [EMAIL PROTECTED] wrote:
> Hi guys,
>
> Apologies.
>
> History
>
> Ist type  : 4 subindexes + MultiSearcher + Search on Content Field Only for 2000 hits
>             = Exception [ Too many Files Open ]
>
> IInd type : 40 Merged Indexes [1000 subindexes each] + MultiSearcher / ParallelSearcher + Search on Content Field Only for 2 hits
>             = Exception [ Out of Memory ]
>
> System Config [same for both types]
>
> AMD Processor [High End Single]
> RAM 1GB
> O/s Linux ( Gentoo type )
> Appserver Tomcat 5.05
> Jdk [ IBM Blackdown-1.4.1-01 ( == Jdk1.4.1 ) ]
>
> Index contains 15 fields
> Search done only on 1 field
> Retrieve 11 corresponding fields
> 3 fields are for debug details
>
> Switched from Ist type to IInd type.
>
> Can somebody suggest why this is happening?
>
> Thanks in advance
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
RE: Lucene1.4.1 + OutOf Memory
Hi guys,

Apologies. That's why somebody on the forum asked me to switch to:

    40 Merged Indexes [1000 subindexes each] + MultiSearcher / ParallelSearcher + Search on Content Field Only for 2

The problem of too many open files was solved, since now there were only 40 merged indexes [ 1 merged index has 1000 subindexes ] instead of 4 subindexes.

Now I am getting an Out of Memory exception.

Any idea how to solve this problem?

Thanks in advance

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 10, 2004 2:16 PM
To: Lucene Users List
Subject: RE: Lucene1.4.1 + OutOf Memory

The exception "too many files open" means:
- the searcher object is not closed after query execution
- too few file handles

Regards
J.

On 10.11.2004 09:41, Karthik N S [EMAIL PROTECTED] wrote:
> Hi guys,
>
> Apologies. I am NOT using the sorting code
>
>     hits = multiSearcher.search(query, new Sort(new SortField("filename", SortField.STRING)));
>
> but am using multiSearcher.search(query) in the core files setup, and I am still getting the error.
>
> More advice required.
>
> Karthik
RE: Lucene1.4.1 + OutOf Memory
Hi guys,

Apologies. Yes Erik, since the day I switched from Lucene 1.3.1 to Lucene 1.4.1 we have been using the compound file format:

    writer.setUseCompoundFile(true);

Some more advice, please.

Thanks in advance

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, November 10, 2004 3:05 PM
To: Lucene Users List
Subject: Re: Lucene1.4.1 + OutOf Memory

On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
> Hi Guys
>
> Apologies..

No need to apologize for asking questions.

> History
>
> Ist type : 4 subindexes + MultiSearcher + Search on Content Field

You've got 40,000 indexes aggregated under a MultiSearcher and you're wondering why you're running out of memory?! :O

> Exception [ Too many Files Open ]

Are you using the compound file format?

Erik
RE: Lucene1.4.1 + OutOf Memory
> files not being deleted after 1.4.1. Not sure if that could cause the problems you're experiencing.
>
> Regards
> Daniel

Well, it seems not to be files, it looks more like those SegmentTermEnum objects accumulating in memory. I've seen some discussion on these objects in the developer newsgroup that had taken place some time ago. I am afraid this is some kind of runaway caching I have to deal with. Maybe not correctly addressed in this newsgroup, after all...

Anyway: any idea if there is an API command to re-init caches?

Thanks,
Daniel

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: 10 November 2004 09:35
To: Lucene Users List
Subject: Re: Lucene1.4.1 + OutOf Memory

On Nov 10, 2004, at 1:55 AM, Karthik N S wrote:
> Hi Guys
>
> Apologies..

No need to apologize for asking questions.

> History
>
> Ist type : 4 subindexes + MultiSearcher + Search on Content Field

You've got 40,000 indexes aggregated under a MultiSearcher and you're wondering why you're running out of memory?! :O

> Exception [ Too many Files Open ]

Are you using the compound file format?

Erik
LUCENE + DATA RETRIEVAL
Hi guys,

Apologies... Has anyone on the forum attempted to retrieve data from, and index, Macromedia Flash based files? If there is an example, please share it; it may be useful for developers.

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
Lucene1.4.1 + OutOf Memory
Hi Guys,

Apologies.

History

Ist type: 4 subindexes + MultiSearcher + search on the Content field only, for 2000 hits
  = Exception [ Too many Files Open ]

IInd type: 40 merged indexes [1000 subindexes each] + MultiSearcher / ParallelSearcher + search on the Content field only, for 2 hits
  = Exception [ OutOf Memory ]

System config [same for both types]

  Processor : AMD [high end, single]
  RAM       : 1 GB
  O/s       : Linux ( Gentoo type )
  Appserver : Tomcat 5.05
  Jdk       : IBM Blackdown-1.4.1-01 ( == Jdk1.4.1 )

The index contains 15 fields:
  the search is done on only 1 field,
  11 corresponding fields are retrieved,
  3 fields are for debug details.

We switched from the Ist type to the IInd type. Can somebody suggest why this is happening?

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
UPDATION+MERGERINDEX
Hi Guys,

Apologies.

a)
  1) SEARCH FOR SUBINDEX IN AN OPTIMISED MERGED INDEX
  2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
  3) OPTIMISE THE MERGERINDEX
  4) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGERINDEX
  5) OPTIMISE THE MERGERINDEX

b)
  1) SEARCH FOR SUBINDEX IN AN OPTIMISED MERGED INDEX
  2) DELETE THE FOUND SUBINDEX FROM THE OPTIMISED MERGERINDEX
  3) ADD A NEW VERSION OF THE SUBINDEX TO THE MERGERINDEX
  4) OPTIMISE THE MERGERINDEX

WHICH IS THE BETTER CHOICE, a OR b?

THX IN ADVANCE

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
INDEXREADER + DELETE + LUCENE1.4.1
Hi Guys,

Apologies. There seems to be an unresolved bug [ or maybe I am doing something wrong ] in IndexReader.delete(int docNum).

Here is the code:

    indexSearcher = null;
    indexDirectory = null;
    indexReader = null;

    indexDirectory = FSDirectory.getDirectory("/root/MERGEDINDEX/MERGER_1", false);
    indexReader = IndexReader.open(indexDirectory);
    IndexReader.unlock(indexDirectory);
    indexSearcher = new IndexSearcher(indexReader);

    query = new TermQuery(new Term(fieldName, FiledValue));
    hits = indexSearcher.search(query);

    if ( hits.length() > 0 ) {
        for (int k = 0; k <= hits.length(); k++) {
            PRINTDBG_.append("QUERY : " + query.toString() + "\n" +
                             "FIELD NAME : " + fieldName + "\n" +
                             "FIELD VALUE: " + FiledValue + "\n" +
                             "TOTAL HITS : " + hits.length() + "\n" +
                             "DELETING : " + k);
            indexReader.delete(k);
        }
    }

    indexReader.close();
    indexSearcher.close();
    indexDirectory.close();
    System.out.println(" Debugger : " + PRINTDBG_);

    indexReader = null;
    indexSearcher = null;
    indexDirectory = null;

    // optimization
    indexDirectory = FSDirectory.getDirectory(pathMergeIndex, false);
    IndexWriter writer = new IndexWriter(indexDirectory, analyzer, false);
    writer.mergeFactor = mergeFactorVal_;
    writer.maxMergeDocs = maxMergeDocsVal_;
    writer.optimize();
    writer.close();
    indexDirectory = null;
    writer = null;

In spite of using a new IndexReader for every deletion of documents, and optimizing afterwards, 'indexReader.delete(k)' does not seem to work.

Configuration history

  a) 1 MergerIndex = 1000 subindexes [ fieldName = Keyword field type ]
  b) O/s Windows
  c) AMD processor
  e) Lucene 1.4.1
  f) Jdk 1.4.2

Could somebody please suggest alternatives?

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
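
One possible explanation, offered as a guess from the posted snippet rather than a confirmed diagnosis: the loop passes the loop counter k to IndexReader.delete(int docNum), but k is only the rank of the hit, not the document number the hit refers to. In Lucene 1.4 the document number of the k-th hit is available from Hits.id(k), and IndexReader.delete(Term) removes every document containing a term without going through a search at all. The sketch below is a toy model with no Lucene dependency; the class and method names are hypothetical and only illustrate how deleting by rank removes the wrong documents.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy model (hypothetical names, no Lucene dependency): a "hit list" is an
// array holding the index-wide document number of each hit, in rank order.
class HitRankVersusDocNumber {

    // Mirrors the posted loop: removes document number k, where k is only
    // the position of the hit in the result list.
    static Set<Integer> deleteByRank(Set<Integer> liveDocs, int[] hitDocNumbers) {
        Set<Integer> remaining = new TreeSet<>(liveDocs);
        for (int k = 0; k < hitDocNumbers.length; k++) {
            remaining.remove(k);
        }
        return remaining;
    }

    // Removes the document number that the k-th hit actually refers to
    // (the role Hits.id(k) plays in Lucene 1.4).
    static Set<Integer> deleteByDocNumber(Set<Integer> liveDocs, int[] hitDocNumbers) {
        Set<Integer> remaining = new TreeSet<>(liveDocs);
        for (int k = 0; k < hitDocNumbers.length; k++) {
            remaining.remove(hitDocNumbers[k]);
        }
        return remaining;
    }
}
```

With eight live documents numbered 0 to 7 and hits on documents 5 and 7, deleteByRank removes documents 0 and 1 and leaves both matches in place, while deleteByDocNumber removes 5 and 7. Separately, if the loop bound in the post really is k <= hits.length(), the last iteration runs one position past the final hit.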
FW: Searchable Solutions Please
Hi Guys,

Apologies. I am a little confused about the search behaviour.

If the search word 'kid' is supposed to return kid, kid's, kidoos, children:

1) Do I need to use a combination of more than one analyzer? If so, how?
2) Is there any alternative modification to be made to the simple searcher methods?

Thx in advance.

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 28, 2004 8:55 PM
To: [EMAIL PROTECTED]
Subject: RE: Searchable Solutions Please

A quick pointer.. What you want to look at is using a stemming implementation. Look, for example, at the FAQ and docs related to the PorterStemFilter and writing a custom analyzer (http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi?file=chapter.indexing&toc=faq#q17). There is a lot of information regarding this, but you'll need the same analyzer for index and query, and this would be more or less English only.

-George

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]]
Sent: Thursday, October 28, 2004 1:47 AM
To: LUCENE
Subject: Searchable Solutions Please

Hi Guys,

Apologies. Using the Lucene search, hits for the following need to be returned:

Search word = ' kids watches '
Hits on docs returned should have = kid's, kid watch, junior watches

Solutions please.

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
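
To make George's point about using the same analyzer for index and query concrete, here is a deliberately naive sketch. It is not the Porter algorithm and not Lucene's PorterStemFilter; the class name and the suffix rules are made up for illustration. It only shows the idea that indexed terms and query terms are reduced by the same function, so that different surface forms meet at one term.

```java
import java.util.Locale;

// Toy suffix normalizer (hypothetical, NOT the Porter stemmer): lower-cases a
// token, drops a possessive 's, and strips a simple English plural ending.
class ToySuffixNormalizer {

    static String normalize(String token) {
        String t = token.toLowerCase(Locale.ROOT);
        if (t.endsWith("'s")) {
            t = t.substring(0, t.length() - 2);
        }
        // "watches" -> "watch", "boxes" -> "box", "glasses" -> "glass"
        if (t.endsWith("ches") || t.endsWith("shes") || t.endsWith("xes") || t.endsWith("sses")) {
            return t.substring(0, t.length() - 2);
        }
        // "kids" -> "kid"; leave short words and words ending in "ss" alone
        if (t.endsWith("s") && !t.endsWith("ss") && t.length() > 3) {
            return t.substring(0, t.length() - 1);
        }
        return t;
    }
}
```

Applied on both sides, 'kids', 'kid's' and 'kid' all become 'kid', and 'watches' becomes 'watch', so a query for 'kids watches' can meet a document containing 'kid watch'. Suffix stripping of any kind does not relate 'kid' to 'children' or 'junior'; those are synonyms, which would need separate handling.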
LUCENE INDEX STATISTICS
Hi Guys,

Apologies. Can somebody provide approximate statistics on the following factors for development and deployment of Lucene? [ It may be useful for professional developers. ]

a) Creating indexes

  1) X [ say 100 million ] documents of Y [ kilobytes ] each, with Z fields
       Hardware requirement [ RAM / OS / Processor / Hard disk space / Other specific details ]
       Software [ Jdk version / Lucene version / Appserver version ]

  2) X [ say 100 million ] indexes to be merged into merged indexes
       Hardware requirement [ RAM / OS / Processor / Hard disk space / Other specific details ]
       Software [ Jdk version / Lucene version / Appserver version ]

b) Searching on indexes [ 2 persons searching per second ]

  1) X [ say 100 million ] documents of Y [ kilobytes ] each, with Z fields
       Hardware requirement [ RAM / OS / Processor / Hard disk space / Other specific details ]
       Software [ Jdk version / Lucene version / Appserver version ]

  2) X [ say 100 million ] merged indexes
       Hardware requirement [ RAM / OS / Processor / Hard disk space / Other specific details ]
       Software [ Jdk version / Lucene version / Appserver version ]

Thx in advance
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: Range Query
Hi Guys,

Apologies. Please correct me if I am wrong. With reference to http://issues.apache.org/eyebrowse/ReadMsg?listId=30&msgNo=7103, I will have to re-index all my 1 million subindexes with the 'Price' field padded to a standard number of '0's, so that I can use the modified code while searching to find the range of the query.

[ Is there any other way to handle this during the search process only? ]

Some more advice, please :(

Thx in advance.

-----Original Message-----
From: Chuck Williams [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 20, 2004 8:06 PM
To: Lucene Users List
Subject: RE: Range Query

Karthik,

It is all spelled out in a Lucene HowTo here: http://wiki.apache.org/jakarta-lucene/SearchNumericalFields

Have fun with it,
Chuck

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 20, 2004 12:15 AM
To: Lucene Users List; Jonathan Hager
Subject: RE: Range Query

Hi Jonathan,

> When searching I also pad the query term

??? When exactly are you handling this [ during the indexing process as well, or during the search process only ]? Can you please be specific? [ If time permits and it is possible, can you please send me the sample code for the same? ] :)

Thx in advance

-----Original Message-----
From: Jonathan Hager [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 20, 2004 3:31 AM
To: Lucene Users List
Subject: Re: Range Query

That is exactly right. It is searching the ASCII. To solve it I pad my price using a method like this:

    /**
     * Pads the Price so that all prices are the same number of characters and
     * can be compared lexicographically.
     * @param price
     * @return
     */
    public static String formatPriceAsString(Double price) {
        if (price == null) {
            return null;
        }
        return PRICE_FORMATTER.format(price.doubleValue());
    }

where PRICE_FORMATTER contains enough digits for your largest number.

    private static final DecimalFormat PRICE_FORMATTER = new DecimalFormat("000.00");

When searching I also pad the query term. I looked into hooking into QueryParser, but since the lower/upper prices for my application are different inputs, I chose to handle them without hooking into the QueryParser.

Jonathan

On Tue, 19 Oct 2004 12:35:06 +0530, Karthik N S [EMAIL PROTECTED] wrote:
> Hi Guys
>
> Apologies. I have a field of type Text, 'ItemPrice', which I use to store the price as a number such as 10, 25.25, 50.00.
>
> If I want to find the range between 2 prices, e.g.
>
>   Contents:shoes +ItemPrice:[10.00 TO 50.60]
>
> I get results other than the range that was requested [ this may be due to the query comparing the ASCII values instead of the numeric values ].
>
> Am I missing something in the query syntax, or is this the wrong way to construct the query?
>
> Please, somebody advise me ASAP. :(
>
> Thx in advance
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
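
A self-contained sketch of the padding idea from Jonathan's message, assuming the same "000.00" pattern; the class and method names are hypothetical. It pins the decimal symbols to Locale.ROOT so the separator is always '.', and creates a formatter per call because DecimalFormat is not thread-safe. The same function would have to be applied both when the field is indexed and when the range bounds of a query are built.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical helper: pads prices to a fixed width so that plain string
// ordering agrees with numeric ordering.
class PaddedPrice {

    // "000.00" covers prices below 1000; larger values need a wider pattern.
    static String pad(double price) {
        DecimalFormat formatter =
                new DecimalFormat("000.00", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return formatter.format(price);
    }

    // Pads every price and sorts the results as strings.
    static String[] padAndSort(double... prices) {
        String[] padded = new String[prices.length];
        for (int i = 0; i < prices.length; i++) {
            padded[i] = pad(prices[i]);
        }
        Arrays.sort(padded);
        return padded;
    }
}
```

For example, padAndSort(50.6, 5, 100.5, 25.25) yields 005.00, 025.25, 050.60, 100.50, which is also the numeric order. A price of 1000 or more would format with four integer digits and break the ordering again, which is why the pattern has to be sized for the largest expected value.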
Analysis Re visited
Hi Guys,

Apologies... Can somebody tell me what I have been doing wrong with the Lucene basics? :( [6 months 15 days]

Using Lucene 1.4.1, O/s Win/Linux, RAM 1GB.

I used a modified version of StandardAnalyzer.java [ called it GrammerAnalyzer.java ] and added the symbols '$,@,#,' to it. I also added this analyzer to AnalysisDemo.java, available from the web site http://today.java.net/pub/a/today/2003/07/30/LuceneIntro.html?page=last#thread

1) On a search for '$100.50', AnalysisDemo returned '[100.00]' for this analyzer.

2) So I used the same analyzer for both indexing and searching.

3) I then hacked the Luke source [ added the same GrammerAnalyzer ], available from http://www.getopt.org/luke/

When I looked at the file containing the same values, I was surprised to find '$100.50' instead of 100.50.

Please, somebody advise me.

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: Range Query
Hi Jonathan,

> When searching I also pad the query term

??? When exactly are you handling this [ during the indexing process as well, or during the search process only ]? Can you please be specific? [ If time permits and it is possible, can you please send me the sample code for the same? ] :)

Thx in advance

-----Original Message-----
From: Jonathan Hager [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 20, 2004 3:31 AM
To: Lucene Users List
Subject: Re: Range Query

That is exactly right. It is searching the ASCII. To solve it I pad my price using a method like this:

    /**
     * Pads the Price so that all prices are the same number of characters and
     * can be compared lexicographically.
     * @param price
     * @return
     */
    public static String formatPriceAsString(Double price) {
        if (price == null) {
            return null;
        }
        return PRICE_FORMATTER.format(price.doubleValue());
    }

where PRICE_FORMATTER contains enough digits for your largest number.

    private static final DecimalFormat PRICE_FORMATTER = new DecimalFormat("000.00");

When searching I also pad the query term. I looked into hooking into QueryParser, but since the lower/upper prices for my application are different inputs, I chose to handle them without hooking into the QueryParser.

Jonathan

On Tue, 19 Oct 2004 12:35:06 +0530, Karthik N S [EMAIL PROTECTED] wrote:
> Hi Guys
>
> Apologies. I have a field of type Text, 'ItemPrice', which I use to store the price as a number such as 10, 25.25, 50.00.
>
> If I want to find the range between 2 prices, e.g.
>
>   Contents:shoes +ItemPrice:[10.00 TO 50.60]
>
> I get results other than the range that was requested [ this may be due to the query comparing the ASCII values instead of the numeric values ].
>
> Am I missing something in the query syntax, or is this the wrong way to construct the query?
>
> Please, somebody advise me ASAP. :(
>
> Thx in advance
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
RE: Downloading Full Copies of Web Pages
Hi,

Try Nutch [ http://www.nutch.org/docs/en/about.html ]; underneath it uses Lucene :)

-----Original Message-----
From: Luciano Barbosa [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 20, 2004 3:06 AM
To: [EMAIL PROTECTED]
Subject: Downloading Full Copies of Web Pages

Hi folks,

I want to download full copies of web pages and store them locally, with the hyperlink structure mirrored as local directories. I tried to use Lucene, but I've realized that it doesn't have a crawler. Does anyone know of software that does this?

Thanks,
TestRangeQuery.java
Hi,

Does anybody have trouble compiling TestRangeQuery.java in the Eclipse 3.0 IDE?
[ http://cvs.apache.org/viewcvs.cgi/jakarta-lucene/src/test/org/apache/lucene/search ]

There seems to be an error at:

    doc.add(new Field("id", "id" + docCount, Field.Store.YES, Field.Index.UN_TOKENIZED));
    doc.add(new Field("content", content, Field.Store.NO, Field.Index.TOKENIZED));

The compiler error, with Lucene 1.4.1 on Windows, is that Field.Store.YES is not found.

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
Range Query
Hi Guys,

Apologies. I have a field of type Text, 'ItemPrice', which I use to store the price as a number such as 10, 25.25, 50.00.

If I want to find the range between 2 prices, e.g.

  Contents:shoes +ItemPrice:[10.00 TO 50.60]

I get results other than the range that was requested [ this may be due to the query comparing the ASCII values instead of the numeric values ].

Am I missing something in the query syntax, or is this the wrong way to construct the query?

Please, somebody advise me ASAP. :(

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
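
The suspicion in the message can be checked without Lucene. The sketch below (hypothetical class name, made-up sample prices) applies an inclusive range test using ordinary string comparison, which is how unpadded price terms would be ordered, and returns the terms that fall inside the bounds.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demonstration: an inclusive range check done with string
// comparison instead of numeric comparison.
class LexicographicRangeDemo {

    static List<String> matching(String[] terms, String lower, String upper) {
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            // compareTo orders character by character, so "100.00" < "50.60".
            if (lower.compareTo(term) <= 0 && term.compareTo(upper) <= 0) {
                result.add(term);
            }
        }
        return result;
    }
}
```

With the terms 5.00, 25.25, 100.00 and 9.99 and the bounds 10.00 and 50.60, the method returns 5.00, 25.25 and 100.00. Only 25.25 is numerically inside the range; 5.00 and 100.00 slip in because their first characters sort between '1' and '5', and 9.99 is excluded only by coincidence.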
Multi + Parallel
Hi,

Apologies. Can somebody provide approximate answers on which is the better choice:

a search of 10,000 subindexes using MultiSearcher, or a search on one single merged index [ 10,000 subindexes merged ]?

  a) Subindexes: 10,000 ( in future )
  b) Fields to be searched upon = 4
  c) Field types present in indexed format = 15
  d) RAM = 1GB
  e) O/s Linux [ clustered environment ]
  f) Processor make: AMD [ probably high end ]
  g) Web server: Tomcat 5.0.x

1) Which would be faster?
2) If neither, what may be the probable solution?

Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, October 13, 2004 3:53 PM
To: Lucene Users List
Subject: Re: Multi + Parallel

On Oct 13, 2004, at 3:14 AM, Karthik N S wrote:
> I was curious to know the difference between ParallelMultiSearcher and MultiSearcher.
>
> 1) Is the internal functionality of these the same or different?

They are different internally. Externally they should return identical results and not appear different at all.

Internally, ParallelMultiSearcher searches each index in a separate thread (searches wait until all threads finish before returning). In MultiSearcher, each index is searched serially. You will not likely see a benefit to using ParallelMultiSearcher unless your environment is specialized to accommodate multi-threading (multiple CPUs, indexes on separate drives that can operate independently, etc).

> 2) In terms of time, do these differ when searching the same number of fields / words?
>
> 3) What are the features used in each API?

There is no external difference to using either implementation. Benchmark searches using both and see what is best, but generally MultiSearcher will be better in most environments as it avoids the overhead of starting up and managing multiple threads.

Erik
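
The serial versus thread-per-index distinction Erik describes can be sketched with a toy model. Nothing below is Lucene code: each "index" is just a list of terms, a "search" counts occurrences, and the class and method names are hypothetical. The point is only that both strategies produce the same result, while the parallel one pays for creating threads and waiting on all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model (hypothetical, no Lucene dependency) of searching several
// indexes one after another versus one task per index.
class SerialVersusParallelSearch {

    // Counts occurrences of the term in a single "index".
    static int searchOne(List<String> index, String term) {
        int hits = 0;
        for (String t : index) {
            if (t.equals(term)) {
                hits++;
            }
        }
        return hits;
    }

    // Serial strategy: every index is searched on the calling thread.
    static int searchSerially(List<List<String>> indexes, String term) {
        int total = 0;
        for (List<String> index : indexes) {
            total += searchOne(index, term);
        }
        return total;
    }

    // Parallel strategy: one task per index; returns only after all finish.
    static int searchInParallel(List<List<String>> indexes, String term) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, indexes.size()));
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (List<String> index : indexes) {
                Callable<Integer> task = () -> searchOne(index, term);
                pending.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> future : pending) {
                total += future.get();
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("search interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("search task failed", e);
        } finally {
            pool.shutdown();
        }
    }
}
```

On a single CPU with all indexes on one disk, the parallel variant does the same work plus the thread management, which matches Erik's advice to benchmark both before choosing.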
Multi + Parallel
Hi Guys,

Apologies. I was curious to know the difference between ParallelMultiSearcher and MultiSearcher.

1) Is the internal functionality of these the same or different?
2) In terms of time, do these differ when searching the same number of fields / words?
3) What are the features used in each API?

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: Too many Open Files + lucene 1.4.1 + Linux O/s
Hi,

Apologies for the long wait.

On my Linux system, ulimit -a reports:

    core file size        (blocks, -c) 0
    data seg size         (kbytes, -d) unlimited
    file size             (blocks, -f) unlimited
    max locked memory     (kbytes, -l) unlimited
    max memory size       (kbytes, -m) unlimited
    open files                    (-n) 1024
    pipe size          (512 bytes, -p) 8
    stack size            (kbytes, -s) 8192
    cpu time             (seconds, -t) unlimited
    max user processes            (-u) 1983
    virtual memory        (kbytes, -v) unlimited

The 'Too many open files' problem happens on every 2nd search. I think, as you say, open files (-n) 1024 should be increased.

More advice is gratefully accepted.

Thx in advance

-----Original Message-----
From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED]]
Sent: Sunday, October 03, 2004 5:08 AM
To: Lucene Users List
Subject: Re: Too many Open Files + lucene 1.4.1 + Linux O/s

Karthik N S wrote:
> Hi Luceners,
>
> Apologies. The other day I was trying to search using the Luceneweb version with Lucene1-4-1.zip, O/s = Linux, J2SDK version 1.4.2_03-b02, with roughly 500 documents (715116 kb) indexed using Lucene1.4-final.jar and writer.setUseCompoundFile(true);

Here are a couple of possibilities:

- the setUseCompoundFile(true) will only apply to indexes created (or optimized) after the option is set. All pre-existing indexes will still be in the multi-file format.

- number of documents does not directly impact the number of files needed by Lucene. If the index is really in a compound file format (see above), and is optimized, you will need a fixed number of file handles. Even if the index is in a multi-file format, the number of files needed depends on the number of indexed *fields* in the index (not documents).

- do you get the error on the first and every search or only once in a while? Perhaps where there are lots of concurrent users? Perhaps after you've done X searches?

- check your OS-level setting for the number of open files. This is shell/system-dependent somewhat, but ulimit -a should get you started. The number of open files should be large enough to allow for all files and sockets that your application needs to open. In a typical server-side Java app setting this value should be around 8000. Defaults are much smaller, so unless you have changed this, this may be the answer.

- look into the lsof utility. It can display all file handles in use by a given process. This is a good tool to troubleshoot too many open files issues.

Good luck.
Dmitry.
IndexHTML parser + Constructor
Hi,

Apologies. Can somebody please tell me how to add a constructor to 'org.apache.lucene.demo.html.HtmlParser.java' that takes a String argument, strips the HTML tags, and returns the String without tags?

Currently 'org.apache.lucene.demo.html.HtmlParser.java' accepts the full path of the file and then reads the content to strip the tags.

Thx in advance
Karthik

-----Original Message-----
From: Daniel Naber [mailto:[EMAIL PROTECTED]]
Sent: Saturday, September 25, 2004 12:47 AM
To: Lucene Users List
Subject: Re: demo IndexHTML parser breaks unicode?

On Friday 24 September 2004 19:58, Fred Toth wrote:
> I've got unicode in my source HTML. In particular, within meta tags, and it's getting broken by the indexer. Note that I'm not trying to query on any of this, just store and retrieve document titles with unicode characters.

Please try again with the code from CVS, Christoph Goller committed a fix for this problem (at least I think it was this problem) 1-3 weeks ago.

Regards
Daniel

--
http://www.danielnaber.de
Too many Open Files + lucene 1.4.1 + Linux O/s
Hi Luceners,

Apologies. The other day I was trying to search using the Luceneweb version with Lucene1-4-1.zip, O/s = Linux, J2SDK version 1.4.2_03-b02, with roughly 500 documents (715116 kb) indexed using Lucene1.4-final.jar and

writer.setUseCompoundFile(true);

My intention was to search across all 500 documents using MultiFieldQueryParser. I have replaced

  'QueryParser.parse(srchkey,fildtpe[i], analyzer)' with 'MultiFieldQueryParser.parse(SEARCHKEYS,fildtpe[],analyzer)'

and

  hits = searcher.search(query) with hits = multiSearcher.search(query, new Sort(new SortField(filename, SortField.STRING)));

I am getting the 'Too many open files' exception. Can somebody help me with the solution? [ I have also inserted the reference JSP file. ]

java.io.IOException: Too many open files
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.createNewFile(File.java:828)
    at org.apache.lucene.store.FSDirectory$1.obtain(FSDirectory.java:307)
    at org.apache.lucene.store.Lock.obtain(Lock.java:53)
    at org.apache.lucene.store.Lock$With.run(Lock.java:108)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
    at org.apache.lucene.index.IndexReader.open(IndexReader.java:95)
    at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:38)
    at org.apache.jsp.results_jsp._jspService(results_jsp.java:130)
    at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:137)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:210)
    at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:295)
    at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:241)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:853)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:247)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:193)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:256)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.catalina.core.StandardContext.invoke(StandardContext.java:2415)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:180)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.valves.ErrorDispatcherValve.invoke(ErrorDispatcherValve.java:171)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:172)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:641)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:174)
    at org.apache.catalina.core.StandardPipeline$StandardPipelineValveContext.invokeNext(StandardPipeline.java:643)
    at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:480)
    at org.apache.catalina.core.ContainerBase.invoke(ContainerBase.java:995)
    at org.apache.coyote.tomcat4.CoyoteAdapter.service(CoyoteAdapter.java:223)
    at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:594)
    at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.processConnection(Http11Protocol.java:392)
    at org.apache.tomcat.util.net.TcpWorkerThread.runIt(PoolTcpEndpoint.java:565)
    at org.apache.tomcat.util.threads.ThreadPool$ControlRunnable.run(ThreadPool.java:619)
    at java.lang.Thread.run(Thread.java:534)

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
MultiSearcher + Sort
Guys,

Apologies. Am I doing something wrong, or is there a bug in Lucene on Linux when using 'MultiSearcher with Sort'? Please, somebody reply ASAP.

Tested with both Lucene-1.4-final.jar and Lucene-1.4.1.jar:

    hits = multiSearcher.search(query, sortField);

The exception is raised on Linux only [ on Windows it works perfectly ].

Query String : (contents:gifts contents:articles) (path:gifts path:articles) (modified:gifts modified:articles) (filename:gifts filename:articles) (bookid:gifts bookid:articles) (creation:gifts creation:articles) (chapNme:gifts chapNme:articles) (itmName:gifts itmName:articles) (urltext:gifts urltext:articles) (itemCode:gifts itemCode:articles) (itemPrice:gifts itemPrice:articles) (pageid:gifts pageid:articles)

--- EXCEPTION START ---
The Exception Raised file = SearchCreateArrayDataFiles.createArray1
Centralized Boolean Factor = false
SYSTEM IS STOPPING COMPILATION
--- EXCEPTION END ---

java.lang.RuntimeException: no terms in field bookid - cannot determine sort type
    at org.apache.lucene.search.FieldCacheImpl.getAuto(FieldCacheImpl.java:319)
    at org.apache.lucene.search.FieldSortedHitQueue.comparatorAuto(FieldSortedHitQueue.java:326)
    at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:167)
    at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:58)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:118)
    at org.apache.lucene.search.MultiSearcher.search(MultiSearcher.java:141)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:51)
    at org.apache.lucene.search.Searcher.search(Searcher.java:41)
    /* at com.controlnet.indexing.search.SearchCreateArrayDataFiles.createArray1(SearchCreateArrayDataFiles.java:263)
     * at com.controlnet.indexing.search.SearchCreateArrayDataFiles.main(SearchCreateArrayDataFiles.java:308)
     */

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
displaying 'pages' of search results...
Hi,

Can you share the searcher.search(query, hitCollector); code [ the lightweight paging API ] on the forum? Maybe somebody like me needs it. ;)

Karthik

-----Original Message-----
From: Praveen Peddi [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, September 22, 2004 1:24 AM
To: Lucene Users List
Subject: Re: displaying 'pages' of search results...

The way we do it is: get all the document ids, cache them, and then get the first 50, second 50 documents etc. We wrote a lightweight paging API on top of Lucene. We call searcher.search(query, hitCollector); our HitCollectorImpl implements the collect method and collects just the document id.

Praveen

----- Original Message -----
From: Chris Fraschetti [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, September 21, 2004 3:33 PM
Subject: displaying 'pages' of search results...

> I was wondering what the best way was to go about returning say 1,000,000 results, divided up into say 50 element sections, and then accessing them via the first 50, second 50, etc.
>
> Is there a way to keep the query around so that lucene doesn't need to search again, or would the search be cached and no delay arise?
>
> Just looking for some ideas and possibly some implementation issues...
>
> --
> Chris Fraschetti
> e [EMAIL PROTECTED]
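
Praveen's code was not posted, so the following is only a guess at the shape of the paging half of such an API; the class and method names are hypothetical. It assumes the document ids have already been gathered into a list (in Lucene 1.4 that would happen inside a HitCollector's collect(int doc, float score) method) and cached, for example per user session.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical paging helper over a cached list of document ids.
class HitPager {

    // Returns the ids on the given zero-based page, or an empty list when
    // the page lies past the end of the cached results.
    static List<Integer> page(List<Integer> docIds, int pageIndex, int pageSize) {
        if (pageIndex < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("pageIndex must be >= 0 and pageSize > 0");
        }
        long from = (long) pageIndex * pageSize;   // long avoids int overflow
        if (from >= docIds.size()) {
            return Collections.emptyList();
        }
        int to = (int) Math.min(docIds.size(), from + pageSize);
        return new ArrayList<>(docIds.subList((int) from, to));
    }
}
```

With 120 cached ids and a page size of 50, page 0 and page 1 each hold 50 ids, page 2 holds the remaining 20, and page 3 is empty. Only the ids on the requested page then need to be turned into full documents. One caveat: a HitCollector receives documents in index order rather than by relevance, so if ranked pages are wanted the scores would have to be collected and sorted as well.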
RE: ANT +BUILD + LUCENE
Hi Erik,

1) Using Ant and Build.xml, I want to run org.apache.lucene.demo.IndexFiles to create an index folder.

2) The problem is that the same Build.xml is to be used across operating systems for creating the index.

3) The path of Lucene1-4-final.jar is in a different directory on each O/s.

[ Note: the paths of Lucene_home and of the I/P and O/P directories are also O/s specific. They should be in the Build.xml and should be triggered by something of this type:

    <condition property="isWindows">
        <os family="windows" />
    </condition>

or

    <condition property="isUnix">
        <os family="unix" />
    </condition>
]

I hope you get the situation. :{

With regards
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]]
Sent: Tuesday, September 14, 2004 7:37 PM
To: Lucene Users List
Subject: Re: ANT +BUILD + LUCENE

I'm not following what you want very clearly, but there is an index task in Lucene's Sandbox. Please post what you are trying, and I'd be happy to help once I see the details.

Erik

On Sep 12, 2004, at 4:44 PM, Karthik N S wrote:
> Hi Guys
>
> Apologies. The task for me is to build the index folder using Lucene and a simple Build.xml for ANT.
>
> The problem: the same 'Build.xml' should be used for different operating systems [ Win / Linux ].
>
> The glitch is that the respective jar files, such as Lucene-1.4.jar and other jar files, are not in the same directory on each O/s. Also, the I/P and O/P indexer paths for source/target may vary.
>
> Please, somebody help me. :(
>
> with regards
> Karthik
>
> WITH WARM REGARDS
> HAVE A NICE DAY
> [ N.S.KARTHIK]
ANT +BUILD + LUCENE
Hi Guys,

Apologies. The task for me is to build the index folder using Lucene and a simple Build.xml for ANT.

The problem: the same 'Build.xml' should be used for different operating systems [ Win / Linux ].

The glitch is that the respective jar files, such as Lucene-1.4.jar and other jar files, are not in the same directory on each O/s. Also, the I/P and O/P indexer paths for source/target may vary.

Please, somebody help me. :(

with regards
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
Lucene Minor Version ????
Hi Guys,

Apologies... I was just curious to know: is Lucene-1.4.1-final.jar a minor version change of Lucene1-4-final.jar, or not? ;{

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
RE: Time to index documents
Hi Hetan,

That's the major problem with non-standardized tags in the HTML documents you are indexing; they add to the time taken by the indexing process. You could tweak the HTMLParser.jj file under '/demo/html' in lucene.zip [you need some knowledge of JavaCC for this].

Karthik

-----Original Message-----
From: Hetan Shah [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 26, 2004 3:01 AM
To: Lucene Users List
Subject: Time to index documents

Hello all,

Is there a way to reduce the indexing time when the indexer is indexing about 30,000+ files? It is roughly taking around 6-7 hours to do this. I am using the IndexHTML class to create the index out of HTML files.

Another issue is that every once in a while I get the following output on the screen:

    adding ../31/1104852.html
    Parse Aborted: Encountered \ at line 7, column 1.
    Was expecting one of: ArgName ... = ... TagEnd ...

Any suggestions on preventing this from happening?

Thanks in advance.
-H
RE: integrationofLucene and PDF box
Hi Santosh,

Many people have worked in this area. Look through the forum posts one by one and you may come across some example code that does something similar.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 11:40 AM
To: Lucene Users List
Subject: integrationofLucene and PDF box

Has anybody integrated Lucene with PDFBox? Can we do it by changing the code in IndexFiles.java or IndexHTML.java?

regards
Santosh kumar

---SOFTPRO DISCLAIMER-- Information contained in this E-MAIL and any attachments are confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' and 'confidential'. If you are not an intended or authorised recipient of this E-MAIL or have received it in error, You are notified that any use, copying or dissemination of the information contained in this E-MAIL in any manner whatsoever is strictly prohibited. Please delete it immediately and notify the sender by E-MAIL. In such a case reading, reproducing, printing or further dissemination of this E-MAIL is strictly prohibited and may be unlawful. SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment hereto is free from computer viruses or other defects. The opinions expressed in this E-MAIL and any ATTACHEMENTS may be those of the author and are not necessarily those of SOFTPRO SYSTEMS.
RE: pdfboxhelp
Hi,

To begin with, try to build the indexes offline [outside the Tomcat container]. Once the indexes are complete, feed your search with the real path of the offline index folder, start Tomcat, and then use the search. As you experiment with it you will become comfortable with the requirements of indexing and search.. ; [

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:55 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Yes, I did the same. I copied all the classes into the classes folder, but now when I build the index using IndexHTML the PDFs are not added to the index; only text and HTML files are added. What changes should I make to IndexHTML.java to build the index with PDFs?

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:54 PM
Subject: RE: pdfboxhelp

Hi,

1) If you are using the jar file with a web interface for JSP/servlet development, place the jar file in webapps/<your application>/WEB-INF/lib and also correct the classpath for this modification.
2) Create your own package, put all your Java files in it, and copy them to /WEB-INF/classes/<your package>.

Then use the same.. ;{

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:31 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Thanks Natarajan and Karthik,

I corrected the classpath, but where should I write your code? Should I write it in the IndexHTML.java that comes with Lucene, or some other place? One more thing: I kept the pdfbox jar file in the classpath. Is this enough, or do I have to build pdfbox?

thank you

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 3:20 PM
Subject: RE: pdfboxhelp

Hi Santhosh,

Try out the code below (the pdfbox.jar file must be in your classpath).

    import java.io.IOException;
    import java.io.InputStream;
    import org.pdfbox.encryption.DecryptDocument;
    import org.pdfbox.pdfparser.PDFParser;
    import org.pdfbox.pdmodel.PDDocument;
    import org.pdfbox.util.PDFTextStripper;

    // Extracts the plain text of a PDF read from the given stream.
    public String getContent(InputStream reader) throws IOException {
        PDFParser parser = null;
        PDDocument pdDoc = null;
        PDFTextStripper stripper = null;
        String pdftext = "";
        try {
            parser = new PDFParser(reader);
            parser.parse();
            pdDoc = parser.getPDDocument();
            if (pdDoc.isEncrypted()) {
                DecryptDocument decryptor = new DecryptDocument(pdDoc);
                decryptor.decryptDocument();
            }
            stripper = new PDFTextStripper();
            pdftext = stripper.getText(pdDoc);
            // 'info' is not declared in this snippet; presumably a field of the enclosing class.
            info = pdDoc.getDocumentInformation();
        } catch (Exception err) {
            System.out.println(err.getMessage());
        }
        // Guard against a failed parse, which would leave pdDoc null.
        if (pdDoc != null) {
            pdDoc.close();
        }
        return pdftext;
    }

Natarajan.

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 3:14 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Hi Don,

Your idea is nice, but whenever I write the following code in Lucene's IndexHTML.java

    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Below returns a parsed PDF file in a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

I am getting the following error:

    package org.pdfbox.searchengine.lucene does not exist

I have downloaded the pdfbox source code and kept the jar file in the classpath. Please help me with this.

----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp

Here is the super simple code required.

    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Below returns a parsed PDF file in a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

Santosh wrote:

Exactly, the same is required for me.

----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intentions with PDFBox? Do you want to use it to index PDF files?

Santosh wrote:

Hi, I have downloaded the pdfbox zip, but I am unsure where to start. How can I try the demo? I don't see any help document with this download. Please help me.

regards
Santosh kumar
SoftPro Systems, Hyderabad
The harder you train in peace, the lesser you bleed in war
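Santosh's question about why only text and HTML files reach the index comes down to which file extensions the indexing loop accepts. The following is a minimal sketch of that dispatch step only, assuming an indexer that decides per file name. The class `FileTypeRouter` and its `Handler` labels are hypothetical names for illustration and are not part of Lucene or PDFBox; in a real indexer the PDF branch is where a call such as the `LucenePDFDocument.getDocument(pdfFile)` shown in the thread would go.

```java
import java.util.Locale;

/** Toy dispatcher; the Handler labels are hypothetical, not Lucene or PDFBox API. */
class FileTypeRouter {
    enum Handler { HTML, TEXT, PDF, SKIP }

    /** Chooses a handler from the file extension, ignoring case. */
    static Handler handlerFor(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".html") || name.endsWith(".htm")) {
            return Handler.HTML;
        }
        if (name.endsWith(".txt")) {
            return Handler.TEXT;
        }
        if (name.endsWith(".pdf")) {
            // The branch a PDF-aware indexer needs; without it PDFs are silently skipped.
            return Handler.PDF;
        }
        return Handler.SKIP;
    }

    public static void main(String[] args) {
        String[] files = {"a.html", "notes.txt", "Manual.PDF", "logo.gif"};
        for (String f : files) {
            System.out.println(f + " -> " + handlerFor(f));
        }
    }
}
```

Keeping the extension check in one method makes it easy to see which file types are dropped before any parser is invoked.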
RE: pdfboxhelp
Hi Santosh,

I think your PDFBox is using the Log4j package. Try to set the classpath to include the log4j.jar path. [Is it just a WARNING or an ERROR you are getting?]

Send me your configuration and let me help you with it ; [

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:11 AM
To: Lucene Users List
Cc: Ben Litchfield
Subject: Re: pdfboxhelp

Hi Karthik,

I have downloaded pdfbox and kept the pdfbox jar file in the classpath, but when I type the following command at the command prompt I get the error:

    D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
    log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
    log4j:WARN Please initialize the log4j system properly

Why am I getting this error? Please help.
RE: pdfboxhelp
Hi Santosh,

Hold on, it's Monday and I am running behind schedule with my job... I will reply to you some time around noon.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:51 AM
To: Lucene Users List
Subject: Fw: pdfboxhelp

Hi Karthik, did you find any solution? Should I send the PDF to you?

----- Original Message -----
From: Santosh [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:23 AM
Subject: Re: pdfboxhelp

Hi Karthik,

I kept log4j in the classpath. I am sending the classpath variable:

    CLASSPATH
    .;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar

Please check the error.
RE: pdf search
Hi,

What is it that you intend to search, and what are these own 'search words'? First explain your requirement properly to the forum to get the intended results.

with regards
Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Friday, August 20, 2004 5:59 PM
To: Lucene Users List
Subject: pdf search

Hi,

I am a newbie to Lucene. I have downloaded the zip file. Now how can I give my own list of words to Lucene? In the demo I saw that Lucene automatically creates the index if we run the Java program, but I want to give my own search words. How is that possible?

regards
Santosh kumar
SoftPro Systems, Hyderabad
The harder you train in peace, the lesser you bleed in war
RE: Index Size
Guys,

Are you optimizing the index before closing it? If not, try it... :}

karthik

-----Original Message-----
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 1:00 PM
To: Lucene Users List
Subject: Re: Index Size

Hi,

Please check for hidden files in the index folder. If you are using Linux, do something like

    ls -al <index folder>

I am also facing a similar problem where the index size is greater than the data size. In my case there were some hidden temporary files which Lucene creates. They were taking half of the total size.

My problem is that after deleting the temporary files, the index size is the same as the data size. That again seems to be a problem. I am yet to find out the reason.

Thanks,
george

--- Rob Jose [EMAIL PROTECTED] wrote:

Hello,

I have indexed several thousand (52 to be exact) text files and I keep running out of disk space to store the indexes. The size of the documents I have indexed is around 2.5 GB. The size of the Lucene indexes is around 287 GB. Does this seem correct? I am not storing the contents of the file, just indexing and tokenizing. I am using Lucene 1.3 final. Can you guys let me know what you are experiencing? I don't want to go into production with something that I should be configuring better.

I am not sure if this helps, but I have a temp index and a real index. I index the file into the temp index, and then merge the temp index into the real index using the addIndexes method on the IndexWriter. I have also set setUseCompoundFile to true on the production writer. I did not set this on the temp index. The last thing that I do before closing the production writer is to call the optimize method.

I would really appreciate any ideas to get the index size smaller if it is at all possible.

Thanks
Rob
RE: Restoring a corrupt index
Hi George,

Do you think the same would work for MERGED indexes? Please can you suggest a solution.

Karthik

-----Original Message-----
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 2:08 PM
To: Lucene Users List
Subject: RE: Restoring a corrupt index

This is what I did.

There are 2 classes in the Lucene source which are not public and therefore cannot be accessed from outside the package. The classes are

1. org.apache.lucene.index.SegmentInfos - collection of segments
2. org.apache.lucene.index.SegmentInfo - represents a single segment

I took these two files and moved them to a separate folder. Then I created a class with the following code fragment.

    // Prints the name of every segment listed in the index's segments file.
    public void displaySegments(String indexDir) throws Exception {
        Directory dir = (Directory) FSDirectory.getDirectory(indexDir, false);
        SegmentInfos segments = new SegmentInfos();
        segments.read(dir);
        StringBuffer str = new StringBuffer();
        int size = segments.size();
        str.append("Index Dir = " + indexDir);
        str.append("\nTotal Number of Segments " + size);
        str.append("\n--");
        for (int i = 0; i < size; i++) {
            str.append("\n");
            str.append((i + 1) + ". ");
            str.append(((SegmentInfo) segments.get(i)).name);
        }
        str.append("\n--");
        System.out.println(str.toString());
    }

    // Removes the named segment from the segments file and writes the file back.
    public void deleteSegment(String indexDir, String segmentName) throws Exception {
        Directory dir = (Directory) FSDirectory.getDirectory(indexDir, false);
        SegmentInfos segments = new SegmentInfos();
        segments.read(dir);
        int size = segments.size();
        String name = null;
        boolean found = false;
        for (int i = 0; i < size; i++) {
            name = ((SegmentInfo) segments.get(i)).name;
            if (segmentName.equals(name)) {
                found = true;
                segments.remove(i);
                System.out.println("Deleted the segment with name " + name
                        + " from the segments file");
                break;
            }
        }
        if (found) {
            segments.write(dir);
        } else {
            System.out.println("Invalid segment name: " + segmentName);
        }
    }

Use the displaySegments() method to display the segments and deleteSegment() to delete the corrupt segment.

Thanks,
George
AnalyZer HELP Please
Hi Guys,

Finally, after a lot of experimentation, I came to know that a word such as 'new' that is already present in the Analyzer's stop-word list will not return any hits [even when enclosed in quotes (\")], as in "New Year".

That's really interesting :(

Thx
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 7:35 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please

On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
> I did as Erik replied in his mail, and searched for the complete word
> \"New Year\", but the QueryParser still returns me hits for Year only.
> [The Analyzer I use has 555 English stop words with "new" present in it]

No wonder!

> That's when I checked with the Analyzers to verify. If you look at the
> Analyzers' output, GrammerAnalyzer is the one that has 555 English
> STOPWORDS. Do you think this is a bug in my code?

Whether this is a bug or not is really for your users to determine :) But it is absolutely the expected behavior. QueryParser analyzes the expression too. Even if you somehow changed QueryParser, if you never indexed the word "new" then you certainly cannot expect to search on it and find it.

Erik
RE: Restoring a corrupt index
Hi Guys,

In our situation we will be indexing millions upon millions of documents, with many gigabytes of data indexed, and finally everything will be put into a MERGED INDEX, categorized accordingly.

There may be a possibility of corruption, so please do post the code for reference.

Thx
Karthik

-----Original Message-----
From: Honey George [mailto:[EMAIL PROTECTED]
Sent: Wednesday, August 18, 2004 5:51 PM
To: Lucene Users List
Subject: Re: Restoring a corrupt index

Thanks Erik, that worked. I was able to remove the corrupt segment and now it looks like the index is OK. I was able to view the number of documents in the index. Before that I was getting the error

    java.io.IOException: read past EOF

I am yet to find out how my index got corrupted. There is another thread going on about this topic: http://www.mail-archive.com/[EMAIL PROTECTED]/msg03165.html

If anybody is facing a similar problem and is interested in the code, I can post it here.

Thanks,
George

--- Erik Hatcher [EMAIL PROTECTED] wrote:

The details of the segments file (and all the others) are freely available here: http://jakarta.apache.org/lucene/docs/fileformats.html

Also, there is Java code in Lucene, of course, that manipulates the segments file and could be leveraged (although it is probably package scoped and not easily usable in a standalone repair tool).

Erik

On Aug 18, 2004, at 6:50 AM, Honey George wrote:

Looks like the problem is not with the hex editor; even in UltraEdit (I had access to a Windows box) I am seeing the same display. The problem is that I am not able to identify where a record starts with just 1 record in the file. Need to try some alternate approach.

Thanks,
George
AnalyZer HELP Please
Hey Guys,

Apologies.. Some small help needed.

When I run the Analyzers on the word "New Year" (with quotes) using Lucene1-4 final.jar on the Win 2k O/S, why is the SimpleAnalyzer splitting it into 2 words? Or am I missing something here?

    Analyzing "New Year"
    org.apache.lucene.analysis.WhitespaceAnalyzer: [] [New] [+] [Year] []
    org.apache.lucene.analysis.SimpleAnalyzer: [new] [year]
    org.apache.lucene.analysis.StopAnalyzer: [new] [year]
    org.apache.lucene.analysis.standard.StandardAnalyzer: [new] [year]
    com.controlnet.indexing.analyzers.GrammerAnalyzer: [year]

WITH WARM REGARDS HAVE A NICE DAY [ N.S.KARTHIK]
RE: AnalyZer HELP Please
Hi Erik,

Apologies... What I meant to say was that a word such as "New Year" (quotes meaning \") passed to QueryParser.parse(word, "contents", analyzer) should return me hits for the full word, but it did not.

So I did a quick run of the Analyzer process and found that it was splitting the word: "New Year" = [New] [Year]

Am I doing something wrong here?

Thx in advance.
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 6:18 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please

This is what analyzers do. I don't know of any analyzer that deals with quotes in the way you're requesting, by keeping the contents together as a complete token. You'll have to write your own variant that does this. QueryParser, however, uses quotes to denote a phrase query, and will query for the words together. Perhaps this is sufficient for your needs?

Erik
RE: AnalyZer HELP Please
Hi Patrick,

I did as Erik replied in his mail and searched for the complete word \"New Year\", but the QueryParser still returns me hits for Year only. [The Analyzer I use has 555 English stop words with "new" present in it.]

That's when I checked with the Analyzers to verify. If you look at the Analyzers' output, GrammerAnalyzer is the one that has 555 English STOPWORDS.

Do you think this is a bug in my code?

Thx
Karthik

-----Original Message-----
From: Patrick Burleson [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 6:55 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please

Karthik,

What you would want to do with the split tokens ("New" and "Year") is then create a PhraseQuery containing a Term object for each token. This should do what you want.

As Erik said, QueryParser would have done this internally, but only if you actually sent in the quotes... not just New Year, but \"New Year\".

Patrick
RE: AnalyZer HELP Please
Hi Guys,

Apologies. Correct me if I am wrong.

During indexing, if the analyzer has the word 'new' in its STOPWORD array, that word is stopped from being indexed. A search would then not return a hit for the phrase New Year, since the word 'new' is in the STOPWORD array [even if the phrase is surrounded by \"].

With regards
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, August 17, 2004 7:35 PM
To: Lucene Users List
Subject: Re: AnalyZer HELP Please

On Aug 17, 2004, at 9:47 AM, Karthik N S wrote:
> I did as Erik replied in his mail, and searched for the complete
> phrase "New Year", but the QueryParser still returns me hits for
> Year only. [The Analyzer I use has 555 English stop words with "new"
> present in it]

No wonder!

> That's when I checked with the analyzers to verify. If you look at
> the list of analyzer output, GrammerAnalyzer is the one that has 555
> English STOPWORDS. Do you think this is a bug in my code?

Whether this is a bug or not is really for your users to determine :) But it is absolutely the expected behavior. QueryParser analyzes the expression too. Even if you somehow changed QueryParser, if you never indexed the word "new" then you certainly cannot expect to search on it and find it.

Erik
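
To make Erik's point concrete, here is a minimal plain-Java sketch of what happens when the same stop-word filtering is applied to both the indexed text and the query. It is not Lucene code: `StopWordSketch`, `analyze`, and the three-word stop list are made up for illustration, and the real GrammerAnalyzer is assumed to behave roughly like this.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration only; none of these names are Lucene classes.
class StopWordSketch {
    // Lower-case the text, split on non-letters, and drop stop words.
    // A stop-word analyzer does roughly this to documents AND to queries.
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!raw.isEmpty() && !stopWords.contains(raw)) {
                tokens.add(raw);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Hypothetical stop list; the list in the thread has 555 entries.
        Set<String> stopWords = new HashSet<>(Arrays.asList("new", "the", "a"));
        // What ends up in the index for a document:
        System.out.println(analyze("Happy New Year to all", stopWords));
        // What is left of the quoted query after analysis:
        System.out.println(analyze("\"New Year\"", stopWords));
    }
}
```

Because "new" is removed on both sides, the quoted query collapses to the single token year, which is why only Year produces hits. Taking "new" out of the stop list and re-indexing would be one way around it.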
HitCollector
Hello,

Could somebody please explain how to use a HitCollector with a simple Searcher.search(query) call to obtain only the hits whose score lies between 1.0f and 0.02456f?

Thx in advance

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
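
A rough sketch of the idea in plain Java, assuming a collector callback of the shape collect(int doc, float score). `ScoreCallback` below is a local stand-in written for this example, not the Lucene class, and the scores fed to it are invented.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for a collector callback; NOT the Lucene HitCollector.
abstract class ScoreCallback {
    abstract void collect(int doc, float score);
}

// Keeps only the document numbers whose score lies in [min, max].
class ScoreRangeCollector extends ScoreCallback {
    private final float min;
    private final float max;
    final List<Integer> docs = new ArrayList<>();

    ScoreRangeCollector(float min, float max) {
        this.min = min;
        this.max = max;
    }

    @Override
    void collect(int doc, float score) {
        if (score >= min && score <= max) {
            docs.add(doc);
        }
    }
}

class ScoreRangeDemo {
    public static void main(String[] args) {
        ScoreRangeCollector collector = new ScoreRangeCollector(0.02456f, 1.0f);
        // Invented scores standing in for what a searcher would report.
        float[] scores = {1.0f, 0.5f, 0.01f, 0.02456f};
        for (int doc = 0; doc < scores.length; doc++) {
            collector.collect(doc, scores[doc]);
        }
        System.out.println(collector.docs); // documents inside the range
    }
}
```

With the real API the same range check would go inside the collect method of a HitCollector subclass passed to the searcher. Check the Javadoc for how the scores handed to a collector relate to the ones reported by Hits before picking the limits.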
Highlighter package updated with overlapping token support
Hi Mark,

Apologies. Could you please provide a URL on your main website page from which users can download the new version of the Highlighter package (jar / zip format)? [Some developers may not be able to download from CVS in the Lucene sandbox because of organization restrictions.]

Thx in advance
with regards
Karthik

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 27, 2004 2:28 AM
To: [EMAIL PROTECTED]
Subject: Highlighter package updated with overlapping token support

I have updated the Highlighter code in CVS to support tokenizers that generate overlapping tokens. The JUnit test rig has a new example test that uses a SynonymTokenizer which generates multiple tokens in the same position for the same input token (e.g. the token "football" is expanded into the tokens "soccer", "footie" and "football").

The Formatter interface had to be changed to take a new TokenGroup object instead of a single token, but I doubt any code changes in clients are required because most people use the default Formatter implementation and haven't created their own implementations.

Cheers
Mark
RE: Large index files
Hi,

I think (a) would be the better choice. [I have done it on Linux up to 7GB; it is considerably faster than doing the same on the Win2000 platform.]

with regards
Karthik

-----Original Message-----
From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED]
Sent: Friday, July 23, 2004 5:55 PM
To: Lucene Users List
Subject: Large index files

Hi all,

I am using Lucene to index a large dataset. It so happens that 10% of this data yields indexes of 400MB, so in all likelihood the index may go up to 7GB. My deployment will be on a Linux/Tomcat system. Which would be the better solution?

a) create one large index and hope Linux does not mind
b) generate 7-10 indexes based on some criteria and glue them together using MultiReader; in this case I may cross the max file handles limit of Tomcat?

regards
RE: Extracting Lucene onto Tomcat
Hi,

Just copy the luceneweb.war file into the Tomcat webapps directory and then start Tomcat. In the browser, typing http://localhost:8080/luceneweb will serve you the pages.

But first you have to index your directory so that the web module can serve you searchable hits. I think there should be some information in the Lucene package itself on doing this.

with regards
Karthik

-----Original Message-----
From: Zilverline info [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 21, 2004 7:56 PM
To: Lucene Users List
Subject: Re: Extracting Lucene onto Tomcat

Hi Ian,

Depending on what you want to do, you could also follow the installation instructions on http://www.zilverline.org. It describes how to install zilverline, but the same goes for the lucene war.

Hope this helps,
Michael Franken

Ian McDonnell wrote:

Also another silly question, do I need to set up a war on the server?

--- Ian McDonnell [EMAIL PROTECTED] wrote:

Well, when I extracted it, it created the org/apache/lucene directories in the public_html directory. When I try to compile any of the source it just throws numerous errors. I've got the classpath set to web-inf/classes. Have I extracted it to the wrong directory?

--- Erik Hatcher [EMAIL PROTECTED] wrote:

On Jul 21, 2004, at 8:10 AM, Ian McDonnell wrote:
> Is the package information and import paths ready to deploy on
> Tomcat server. I tried extracting lucene on the server, but when I
> compile files, it just throws numerous no class definition errors
> and errors relating to the package.

Huh? Lucene certainly deploys just fine in Tomcat web applications (in a WAR under WEB-INF/lib). Could you elaborate on what you mean here?

Erik
Score Range....
Hey Guys,

Apologies. I have a silly question.

From the available hits returned, how would one get only those whose score X lies between a lower and an upper limit, say X > 0.4 and X < 1.0? Do you think this will work?

with regards
Karthik

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK]
Search +QueryParser+Score
Hey Guys,

Apologies. I have a question.

Is there any API available in Lucene 1.4 to set the score value to 1.0f or less BEFORE running the QueryParser search, so that only hits matching that score setting are returned?

with regards
Karthik
HOWTO USE SORT on QUERY PARSER :)
Hey Guys,

Apologies. Gee, that's so simple the way you have explained it. Thx a lot.

Please correct me if I am wrong:

1) So you are telling me that for a search on field FIELD_CONTENTS, the relevant hits can be sorted with respect to field FIELD_DATE [where FIELD_DATE and FIELD_CONTENTS are field types for Lucene].

2) To run the JUnit tests, do I need to download all the files from CVS [will there be a build.xml within the CVS] to run and execute the tests?

with regards
Karthik

-----Original Message-----
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 12:08 PM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(

example:

    query = QueryParser.parse(queryString, FIELD_CONTENTS, analyzer);
    Sort sort = new Sort();
    sort.setSort(FIELD_DATE, true);
    //hits = searcher.search(query, sort);
    hits = multiSearcher.search(query, sort);
    ...

FIELD_DATE - indexed field.

Regards,
Vladimir

On Wed, 14 Jul 2004 12:02:33 +0530 Karthik N S [EMAIL PROTECTED] wrote:

Hey Guys,

Apologies. Before running the build.xml for the JUnit test files, do I need to download all the files present in the search folder of the Lucene CVS tests in order to get the output results?

With regards
Karthik

-----Original Message-----
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:38 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(

It is a config problem. Run build.xml -- [Run ANT...] -- Run unit tests.

Vladimir.

On Wed, 14 Jul 2004 11:27:25 +0530 Karthik N S [EMAIL PROTECTED] wrote:

Hi Guys,

Apologies. I am using the Eclipse 3.0 IDE, and when I run this file within the IDE I am not able to view the output results. [Till now I have no idea how to set up and run the JUnit tests or view their output.] Please give me some tips on this.

With regards
Karthik

-----Original Message-----
From: Vladimir Yuryev [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 11:12 AM
To: Lucene Users List
Subject: Re: HOWTO USE SORT on QUERY PARSER :(

Hi!

From CVS -- jakarta-lucene/src/test/org/apache/lucene/search/TestSort.java

Run it as a unit test ( :-( -- :-))

Best regards,
Vladimir.

On Tue, 13 Jul 2004 15:31:18 +0530 Karthik N S [EMAIL PROTECTED] wrote:

Hey Guys,

Apologies. Can somebody please explain to me, with a simple source example, how to use SORT with QueryParser [Lucene 1.4]? [I am confused by the code snippet in the CVS test case.]

with regards
Karthik

-----Original Message-----
From: Grant Ingersoll [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 13, 2004 2:29 AM
To: [EMAIL PROTECTED]
Subject: Re: Could search results give an idea of which field matched

See the explain functionality in the Javadocs and previous threads. You can ask Lucene to explain why it got the results it did for a given hit.

[EMAIL PROTECTED] 07/12/04 04:52PM

I search the index on multiple fields. Could the search results also tell me which field matched so that the document was selected? From what I can tell, only the document number and a score are returned. Is there a way to also find out which field(s) of the document matched the query?

Sildy
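
For anyone still unsure what the reverse flag in Vladimir's setSort(FIELD_DATE, true) call does to the result order, here is a small plain-Java sketch of "match on contents, order by date, newest first". `DatedHit` and `SortSketch` are invented for this example and are not part of Lucene; dates are assumed to be stored as yyyyMMdd strings so that string order equals date order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Invented stand-in for a search hit carrying a date field.
class DatedHit {
    final String title;
    final String date; // yyyyMMdd, so string order equals date order

    DatedHit(String title, String date) {
        this.title = title;
        this.date = date;
    }
}

class SortSketch {
    // Returns a copy of the hits ordered by date; reverse = true puts the
    // latest date first.
    static List<DatedHit> sortByDate(List<DatedHit> hits, boolean reverse) {
        List<DatedHit> sorted = new ArrayList<>(hits);
        Comparator<DatedHit> byDate = Comparator.comparing((DatedHit h) -> h.date);
        sorted.sort(reverse ? byDate.reversed() : byDate);
        return sorted;
    }

    public static void main(String[] args) {
        List<DatedHit> hits = Arrays.asList(
                new DatedHit("a", "20040701"),
                new DatedHit("b", "20040714"),
                new DatedHit("c", "20040623"));
        for (DatedHit h : sortByDate(hits, true)) {
            System.out.println(h.date + " " + h.title);
        }
    }
}
```

The point is only that relevance no longer decides the order once a sort field is given; which documents match is still decided by the query on the contents field.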
Search Result + Highlighter
Hi Guys,

Some weeks back I reported a problem regarding search on an indexed file using the Highlighter: the Highlighter used to display [Pad] or [0] between words. (The field is of type Field.Text and stores the HTML summary.) [I am using a CustomAnalyzer which is similar to StandardAnalyzer, with 555 ENGLISH_STOP_WORDS.]

If anybody has looked into this matter for a patch, please say so.

with regards
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 14, 2004 1:06 AM
To: Lucene Users List
Subject: Re: Search Result

Look at the Term Highlighter here: http://jakarta.apache.org/lucene/docs/lucene-sandbox/

On Jul 13, 2004, at 2:32 PM, Hetan Shah wrote:

I think I have not explained my question correctly. What is happening is that when I show the result on a page, the text below the link appears as shown below.

Test Page for Apache Installation
http://dev-server.sfbay:8880/docs/sample.htm
Sample content

Jakarta Lucene - Lucene Sandbox
http://dev-server.sfbay:8880/docs/lucene-sandbox/index.html
[Jakarta Lucene] About Overview Powered by Lucene Who We Are Mailing Lists Resources FAQ (Official) jGuru FAQ Getting Started Query Syntax File Formats Javadoc Contributions Articles, etc. Benchmark

In the first example the search criteria "sample" occurs at the beginning of the page, and so it shows up in the text below the link. In the second example the keyword "sample" shows up somewhere later in the document, and so it does not show up in the text below the link. What can I do so that in all cases the text below the link always has the piece of the document where the keyword is found?

thanks in advance.
-H

Hetan Shah wrote:

What I am trying to figure out is this. In my search result, which is returned by

    Document doc = hits.doc(i);
    text to show = doc.get("summary");

the summary field seems to contain only the first few lines of the document. How can I make it contain the piece that matches the query string?

Thanks.
-H

Hetan Shah wrote:

David,

Do you know, in the demo code, how I override or change this value so that I get to see the appropriate chunk of the document? Would this change make the actual result show the relevant section of the document? Sorry to sound so ignorant; I am very new to the whole search technology and am learning a lot from a great, supportive community.

Thanks,
-H

David Spencer wrote:

Hetan Shah wrote:
> My search results are only displaying the top portion of the indexed
> documents. It does match the query in the later part of the document.
> Where should I look to change the code in demo3 of the default 1.3
> final distribution? In general, if I want to show the block of the
> document that matches the query string, which classes should I use?

Sounds like this: http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexWriter.html#DEFAULT_MAX_FIELD_LENGTH

> Thanks guys.
> -H
Lucene 1.3 final to 1.4final problem
Hey Dev Guys,

Apologies. I have a quick problem.

The number of hits on a set of documents indexed using 1.3-final is not the same as with the 1.4-final version. [The only modification done to the source is that I have upgraded my CustomAnalyzer on the basis of the StopAnalyzer available in 1.4.] Does doing this affect the results? Somebody please explain.

with regards
Karthik

-----Original Message-----
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem

Hi!

I'm using Lucene 1.3 final currently, and everything was working fine. But after I upgraded from Lucene 1.3 final to 1.4rc3 (I simply replaced lucene-1.3-final.jar with lucene-1.4-rc3.jar and recompiled), the code recompiles successfully, but when we try to index a document it gives the error below:

java.lang.NullPointerException
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
    at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)

What is wrong? Pls help.

Thanks.
Regards,
Alex
Lucene 1.3 final to 1.4final problem
Hey Dev Guys,

Apologies. Can somebody explain to me why, for an input word TA, StopAnalyzer.java returns [ta] instead of [ta]?

TA == [ta] instead of [ta]
$125.96 === [125.95] instead of [$125.95]

Is there something I have been missing?

with regards
Karthik

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Thursday, July 08, 2004 11:59 AM
To: Lucene Users List
Subject: Lucene 1.3 final to 1.4final problem

Hey Dev Guys,

Apologies. I have a quick problem.

The number of hits on a set of documents indexed using 1.3-final is not the same as with the 1.4-final version. [The only modification done to the source is that I have upgraded my CustomAnalyzer on the basis of the StopAnalyzer available in 1.4.] Does doing this affect the results? Somebody please explain.

with regards
Karthik

-----Original Message-----
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem

Hi!

I'm using Lucene 1.3 final currently, and everything was working fine. But after I upgraded from Lucene 1.3 final to 1.4rc3 (I simply replaced lucene-1.3-final.jar with lucene-1.4-rc3.jar and recompiled), the code recompiles successfully, but when we try to index a document it gives the error below:

java.lang.NullPointerException
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
    at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)

What is wrong? Pls help.

Thanks.
Regards,
Alex
RE: Search Hit Score
Hey Ype,

Apologies. I would be more interested in the boost/weight factor in terms of the query rather than the fields. Please explain with example source.

With regards
Karthik

-----Original Message-----
From: Ype Kingma [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 12:08 PM
To: [EMAIL PROTECTED]
Subject: Re: Search Hit Score

On Wednesday 07 July 2004 08:25, Ype Kingma wrote:
> For a single term query, one can iterate through
> IndexReader.termDocs(Term) and store the document numbers by
> TermDocs.docFreq().

That should be TermDocs.freq()

Oops,
Ype
RE: Search Hit Score
Hey Dev Guys,

Apologies. Can somebody explain to me how to retrieve all the hits available within a single indexed document?

To explain in detail: a physical search through a single document would list 3 places where a certain word occurs. If I am supposed to retrieve all 3 occurrences from the same field using Lucene, how should I handle the query? Please explain with a simple source example.

with regards
Karthik
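
As a way of pinning down what "all 3 occurrences" means, here is a plain-Java sketch that lists the token positions of a word inside one field's text. It does not use the Lucene API at all: `OccurrenceSketch` and its crude tokenization (lower-case, split on anything that is not a letter or digit) are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Plain-Java illustration; not a Lucene class.
class OccurrenceSketch {
    // Returns the zero-based token positions at which the word occurs.
    static List<Integer> positions(String fieldText, String word) {
        List<Integer> found = new ArrayList<>();
        String wanted = word.toLowerCase(Locale.ROOT);
        int position = 0;
        for (String token : fieldText.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.isEmpty()) {
                continue; // leading separator produces one empty token
            }
            if (token.equals(wanted)) {
                found.add(position);
            }
            position++;
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "Lucene is fast. Lucene is small; use Lucene.";
        System.out.println(positions(text, "lucene"));
    }
}
```

A search hit only says that a document matched; the per-document occurrence list is a separate lookup like the one above, done either on the stored text or on whatever position data the index keeps.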
Search Hit Score
Hi Dev Guys,

Apologies. I have 3 questions for you.

1) I have a situation here where I am supposed to group unique indexed documents depending upon the number of hits per document. To explain briefly: all documents with n hits for a search word would be grouped under Category A, and all documents with n+1 hits for the same search word should be grouped under Category B. Can Lucene provide some means internally to handle this situation?

2) What is the weight/boost factor available for the hits, and how can it be used effectively?

3) Is there anything in the Lucene core which reveals the version number of the jar files currently in use, something like java -version displaying the version at the command prompt?

with regards
Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 06, 2004 4:22 PM
To: Lucene Users List
Subject: Re: Latest StopAnalyzer.java

On Jul 6, 2004, at 2:53 AM, Morus Walter wrote:
> Karthik N S writes:
>> Can somebody tell me where I can find the latest copy of
>> StopAnalyzer.java which can be used with Lucene1_4-final? On
>> Lucene-Sandbox I am not able to find it. [My company prohibits me
>> from using CVS]
>
> There is no lucene 1.4 final but
> org.apache.lucene.analysis.StopAnalyzer is part of the lucene core.

Actually Doug did create Lucene 1.4 final: http://jakarta.apache.org/lucene/docs/index.html

I'll try to squeeze in some time today to make it more official by ensuring the binaries are mirrored and such.

Erik
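
For question 1, one way to picture the grouping is sketched below in plain Java: count how often the search word occurs in each document and bucket the document names by that count. Nothing here is Lucene API; `HitCountGrouping`, the sample texts, and the simple tokenization are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration; not a Lucene class.
class HitCountGrouping {
    // Number of times the word occurs as a whole token in the text.
    static int count(String text, String word) {
        int n = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (token.equals(word.toLowerCase(Locale.ROOT))) {
                n++;
            }
        }
        return n;
    }

    // Maps occurrence count -> names of the documents with that count.
    // Documents that do not contain the word are left out.
    static Map<Integer, List<String>> groupByCount(Map<String, String> docs, String word) {
        Map<Integer, List<String>> groups = new TreeMap<>();
        for (Map.Entry<String, String> doc : docs.entrySet()) {
            int n = count(doc.getValue(), word);
            if (n > 0) {
                groups.computeIfAbsent(n, k -> new ArrayList<>()).add(doc.getKey());
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<String, String> docs = new LinkedHashMap<>();
        docs.put("a.txt", "sale today, sale tomorrow");
        docs.put("b.txt", "one sale only");
        docs.put("c.txt", "sale now and sale later");
        docs.put("d.txt", "nothing relevant");
        System.out.println(groupByCount(docs, "sale"));
    }
}
```

Against a real index, the per-document count would come from the term frequency data rather than from re-reading the text, but the bucketing step stays the same.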
Upgrade from Lucene 1.3 final to 1.4 problem
Hey,

Apologies. Same with me too.

The number of hits on a set of documents indexed using 1.3-final is not the same as with the 1.4-final version. [The only modification done to the source is that I have upgraded my CustomAnalyzer on the basis of the StopAnalyzer available in 1.4.] Does doing this affect the results? Somebody please explain.

with regards
Karthik

-----Original Message-----
From: Alex Aw Seat Kiong [mailto:[EMAIL PROTECTED]
Sent: Wednesday, July 07, 2004 9:50 AM
To: Lucene Users List
Subject: upgrade from Lucene 1.3 final to 1.4rc3 problem

Hi!

I'm using Lucene 1.3 final currently, and everything was working fine. But after I upgraded from Lucene 1.3 final to 1.4rc3 (I simply replaced lucene-1.3-final.jar with lucene-1.4-rc3.jar and recompiled), the code recompiles successfully, but when we try to index a document it gives the error below:

java.lang.NullPointerException
    at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
    at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
    at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
    at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)

What is wrong? Pls help.

Thanks.
Regards,
Alex
Using Highlighter in web Demo
Hello Developers,

I am NOT able to find the class below in the API [1.3-final or 1.4rc4], so I do not know what to import:

    QueryScorer scorer = new QueryScorer(query);

I was just curious to compile and execute the example :)

Karthik

-----Original Message-----
From: Erik Hatcher [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 29, 2004 5:52 PM
To: Lucene Users List
Subject: Re: Using Highlighter in web Demo

On Jun 28, 2004, at 5:18 PM, Hetan Shah wrote:
> Is it possible to use highlighter successfully in the demos, the web
> demo to be specific? Has anyone tried it out there? If so, can they
> explain to me how to go about it? Any code sample is really very
> appreciated.

Straight from Lucene in Action:

    import java.io.FileWriter;
    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.PhraseQuery;
    // The highlight classes come from the sandbox Highlighter jar,
    // not from the core Lucene jar.
    import org.apache.lucene.search.highlight.Fragmenter;
    import org.apache.lucene.search.highlight.Highlighter;
    import org.apache.lucene.search.highlight.QueryScorer;
    import org.apache.lucene.search.highlight.SimpleFragmenter;
    import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

    public class HighlightIt {
      private static final String text =
          "Contrary to popular belief, Lorem Ipsum is " +
          "not simply random text. It has roots in a piece of " +
          "classical Latin literature from 45 BC, making it over " +
          "2000 years old. Richard McClintock, a Latin professor " +
          "at Hampden-Sydney College in Virginia, looked up one " +
          "of the more obscure Latin words, consectetur, from " +
          "a Lorem Ipsum passage, and going through the cites " +
          "of the word in classical literature, discovered the " +
          "undoubtable source. Lorem Ipsum comes from sections " +
          "1.10.32 and 1.10.33 of \"de Finibus Bonorum et " +
          "Malorum\" (The Extremes of Good and Evil) by Cicero, " +
          "written in 45 BC. This book is a treatise on the " +
          "theory of ethics, very popular during the " +
          "Renaissance. The first line of Lorem Ipsum, \"Lorem " +
          "ipsum dolor sit amet..\", comes from a line in " +
          "section 1.10.32."; // from http://www.lipsum.com/

      public static void main(String[] args) throws IOException {
        // Check the argument count first; args[0] on an empty array throws.
        if (args.length != 1) {
          System.err.println("Usage: HighlightIt <filename>");
          System.exit(-1);
        }
        String filename = args[0];

        //TermQuery query = new TermQuery(new Term("f", "ipsum"));
        PhraseQuery query = new PhraseQuery();
        query.add(new Term("f", "lorem"));
        query.add(new Term("f", "ipsum"));

        QueryScorer scorer = new QueryScorer(query);
        SimpleHTMLFormatter formatter =
            new SimpleHTMLFormatter("<span class=\"highlight\">", "</span>");
        Highlighter highlighter = new Highlighter(formatter, scorer);
        Fragmenter fragmenter = new SimpleFragmenter(50);
        highlighter.setTextFragmenter(fragmenter);

        TokenStream tokenStream = new StandardAnalyzer()
            .tokenStream("f", new StringReader(text));
        String result =
            highlighter.getBestFragments(tokenStream, text, 5, "...");

        // Write a small HTML page with the highlighted fragments.
        FileWriter writer = new FileWriter(filename);
        writer.write("<html>");
        writer.write("<style>\n" +
            ".highlight {\n" +
            " background: yellow;\n" +
            "}\n" +
            "</style>");
        writer.write("<body>");
        writer.write(result);
        writer.write("</body></html>");
        writer.close();
      }
    }

I just added the PhraseQuery in there instead of the TermQuery that is commented out. Highlighter works well with phrases also (although it highlights each term individually, not the breadth of the phrase by itself).

The above code runs like it says in the usage statement: give it a filename to save an HTML file that shows the terms highlighted in yellow.

Erik
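
For readers who do not have the sandbox jar at hand, the core idea of the formatter step (wrap every matched term in a span that a stylesheet colours yellow) can be sketched with the plain JDK. This is not the Highlighter's algorithm: there is no fragmenting or scoring, and `HighlightSketch` is a name made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-JDK illustration; not the sandbox Highlighter.
class HighlightSketch {
    // Wraps each whole-word, case-insensitive match of term in a span.
    static String highlight(String text, String term) {
        Pattern pattern = Pattern.compile(
                "\\b" + Pattern.quote(term) + "\\b", Pattern.CASE_INSENSITIVE);
        Matcher matcher = pattern.matcher(text);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String wrapped = "<span class=\"highlight\">" + matcher.group() + "</span>";
            // quoteReplacement keeps '$' and '\' in the match from being
            // treated as replacement syntax.
            matcher.appendReplacement(out, Matcher.quoteReplacement(wrapped));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(highlight("Lorem ipsum dolor sit amet", "ipsum"));
    }
}
```

The sketch leaves out HTML escaping of the surrounding text, which a real page would need, and it matches on the raw string rather than on analyzed tokens the way the Highlighter does.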
RE: Using Highlighter in web Demo
Oh! So silly of me. Apologies please, and thx for the same.

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 29, 2004 7:27 PM
To: Lucene Users List
Subject: RE: Using Highlighter in web Demo

It sounds like you don't have the Highlighter jar in your CLASSPATH at compile time.

1. Get Highlighter from CVS
2. Build a jar using Ant and Highlighter's build.xml
3. Add the resulting jar to CLASSPATH
4. Compile your code (the one that imports QueryScorer)

That should work.

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:

Hi Otis,

I am not able to compile the code because the import statement for the class below is not available in the API:

    QueryScorer scorer = new QueryScorer(query);

with regards
Karthik

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 29, 2004 6:55 PM
To: Lucene Users List
Subject: Re: Using Highlighter in web Demo

Karthik,

I don't understand your question. Sorting was only added in 1.4* versions, if I recall correctly. There was no sorting in 1.3.

Otis
Delete Indexed from Merged Document
Guys,

Has somebody out there tried DELETING/UPDATING indexed files in a MERGED index? If so, please explain how to do this.

with regards
Karthik

-----Original Message-----
From: Karthik N S [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 23, 2004 9:24 AM
To: Lucene Users List
Subject: RE: Delete Indexed from Merged Document

Hi Otis,

The link you specified shows how to update an indexed file [deleting the old one and then adding the new one]. But my question, to be more specific, is this: when we have MERGED more than 2 indexes [using writer.addIndexes(luceneDirs)], how do we delete one of the indexed files from the MERGED index in order to insert a new, updated one?

Please provide a sample code snippet in this regard.

with regards
Karthik

-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Tuesday, June 22, 2004 12:52 PM
To: Lucene Users List
Subject: Re: Delete Indexed from Merged Document

Hello Karthik,

Here is the answer: http://www.jguru.com/faq/view.jsp?EID=492423

Otis

--- Karthik N S [EMAIL PROTECTED] wrote:

Dev Guys,

Apologies please. How do I DELETE an indexed document from a MERGED index? Can somebody write me some code snippets on this, please?

With Regards
Karthik
RE: Delete Indexed from Merged Document
Hi Mr Wolf,

What is this

    // remove the document from index
    int docID = hits.id(0);

and can I increment the 0 in the bracket for deletion?

Thx in advance
Karthik

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent: Wednesday, June 23, 2004 5:33 PM
To: [EMAIL PROTECTED]
Subject: AW: Delete Indexed from Merged Document

Hello,

Karthik N S [mailto:[EMAIL PROTECTED]
> Has somebody out there tried DELETING/UPDATING indexed files in a
> MERGED index? If so, please explain how to do this.

Of course you can delete or update a document in a merged index. It works in the same way as for all other indexes. You need a unique key (e.g. the file name or URI), indexed for searching, to find the right document, because the internal document numbers change after merging indexes or after deleting documents and optimizing an index. Using this key you can search for the document and remove it. It doesn't matter whether your index was created by merging several indexes or not.

Example:

    /* Create index: */
    Document document = new Document();
    // this must be unique for each document!
    document.add(Field.Keyword("filename", file_name));
    document.add(Field.Text("content", file_content));
    writer.addDocument(document);
    /* ... */
    writer.close();

    /* Update or remove document:
       Use the file name to find the original document
       and remove it from index */
    FSDirectory indexDirectory = FSDirectory.getDirectory(indexPath, false);
    IndexReader indexReader = IndexReader.open(indexDirectory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // create query and search for document using its filename
    TermQuery query = new TermQuery(new Term("filename", file_name));
    Hits hits = indexSearcher.search(query);
    if ( hits.length() > 0 ) {
        // remove the document from index
        int docID = hits.id(0);
        indexReader.delete( docID );
    }
    // else: this is a new file or already removed, so we can simply add it.

    indexSearcher.close();
    indexReader.close();
    indexDirectory.close();

    // now open an IndexWriter for the same index and add the updated file
    // as new document
    /* done */

Hope it helps.

Regards,
Wolf-Dietrich Materna
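
The reason for looking documents up by a unique key rather than by a remembered document number can be shown with a toy model in plain Java. Here a list position plays the role of the internal document number; `KeyedUpdateSketch` and its methods are invented for this sketch and have nothing to do with the Lucene classes used above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model: each entry is {filename, content}; the list position stands in
// for the internal document number. Not Lucene code.
class KeyedUpdateSketch {
    // Returns the current position of the entry with this filename, or -1.
    static int find(List<String[]> index, String filename) {
        for (int docId = 0; docId < index.size(); docId++) {
            if (index.get(docId)[0].equals(filename)) {
                return docId;
            }
        }
        return -1;
    }

    // Delete-then-add update, keyed on the unique filename.
    static void update(List<String[]> index, String filename, String newContent) {
        int docId = find(index, filename);
        if (docId >= 0) {
            index.remove(docId); // remove(int) deletes by position
        }
        index.add(new String[] {filename, newContent});
    }

    public static void main(String[] args) {
        List<String[]> first = new ArrayList<>();
        first.add(new String[] {"a.txt", "old a"});
        List<String[]> second = new ArrayList<>();
        second.add(new String[] {"b.txt", "old b"}); // position 0 in its own index
        second.add(new String[] {"c.txt", "old c"});

        // Merging renumbers: b.txt moves from position 0 to position 1.
        List<String[]> merged = new ArrayList<>(first);
        merged.addAll(second);

        update(merged, "b.txt", "new b");
        System.out.println(find(merged, "b.txt") + " of " + merged.size());
    }
}
```

In the toy model, as in Wolf-Dietrich's example, the number looked up just before the delete is the only one that can be trusted; hits.id(0) is simply the number of the first (and, with a unique key, only) match.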