RE: Lucene for Indian Languages
Hi,

I do not think so, but there was one request in the forum for the Devanagari script. Have a look at the forums; you might find something on this.

Karthik

-----Original Message-----
From: srinivasa raghavan [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 23, 2004 11:35 AM
To: [EMAIL PROTECTED]
Subject: Lucene for Indian Languages

Hi all,

Is the Lucene API implemented for Indian contexts? I know that Lucene has stemmers and filters for the German and Russian languages. I would like to know whether there are stemmers and filters available, or being developed, for Indian languages.

Thanks,
Rahavan.
Lucene for Indian Languages
Hi all,

Is the Lucene API implemented for Indian contexts? I know that Lucene has stemmers and filters for the German and Russian languages. I would like to know whether there are stemmers and filters available, or being developed, for Indian languages.

Thanks,
Rahavan.
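As a rough illustration of the first step any Indian-language filter would need, the sketch below splits text into runs of Devanagari code points using only the JDK. It is not part of the Lucene API and does no stemming; the class name DevanagariTokens and the sample string (the Hindi words for "Hindi language", written with Unicode escapes) are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

public class DevanagariTokens {

    /** Splits text into runs of Devanagari code points; every other character acts as a separator. */
    public static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (Character.UnicodeScript.of(cp) == Character.UnicodeScript.DEVANAGARI) {
                // Letters, vowel signs and the virama all carry the Devanagari script property.
                current.appendCodePoint(cp);
            } else if (current.length() > 0) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two Hindi words separated by a space, kept as escapes so the source file stays ASCII.
        String sample = "\u0939\u093F\u0928\u094D\u0926\u0940 \u092D\u093E\u0937\u093E";
        System.out.println(tokenize(sample).size() + " tokens");
    }
}
```

A real analyzer would still need normalization and stemming on top of this; the sketch only shows that script-aware splitting is possible without extra libraries.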
Re: pdfboxhelp
Hi Natarajan,

I put log4j.properties in the classpath. My new classpath is:

.;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar;C:\j2sdk1.4.1\lib\log4j.properties;

but there is no difference in the output.

----- Original Message -----
From: "Natarajan.T" <[EMAIL PROTECTED]>
To: "'Lucene Users List'" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 10:56 AM
Subject: RE: pdfboxhelp

> Hi Santosh,
>
> The attached file must be in your class path.
>
> Natarajan.
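One likely reason the output did not change: log4j 1.x looks up log4j.properties as a classpath resource, and a classpath entry is normally a directory or a jar/zip archive. Listing the properties file itself as an entry does not make it visible; its containing directory has to be listed instead. The sketch below is a JDK-only way to check this. The class name ClasspathCheck and its helper methods are made up for this example, and the "plain file" test is only a heuristic based on the file extension.

```java
import java.io.File;
import java.net.URL;

public class ClasspathCheck {

    /** Returns the URL at which the named resource is visible on the classpath, or null if it is not visible. */
    public static URL locate(String resourceName) {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        if (loader == null) {
            loader = ClassLoader.getSystemClassLoader();
        }
        return loader.getResource(resourceName);
    }

    /** Heuristic: true when the entry is an existing regular file that is not a .jar or .zip archive. */
    public static boolean isPlainFileEntry(String entry) {
        String lower = entry.toLowerCase();
        boolean archive = lower.endsWith(".jar") || lower.endsWith(".zip");
        return new File(entry).isFile() && !archive;
    }

    public static void main(String[] args) {
        URL found = locate("log4j.properties");
        System.out.println(found == null
                ? "log4j.properties is NOT visible on the classpath"
                : "log4j.properties found at " + found);

        // Flag entries that cannot act as a resource location.
        String classpath = System.getProperty("java.class.path", "");
        for (String entry : classpath.split(File.pathSeparator)) {
            if (isPlainFileEntry(entry)) {
                System.out.println("Suspicious entry (plain file, list its directory instead): " + entry);
            }
        }
    }
}
```

Run with the same CLASSPATH as the failing command, this would be expected to flag the C:\j2sdk1.4.1\lib\log4j.properties entry, assuming that file exists. Since C:\j2sdk1.4.1\lib is already listed as a directory, the file would normally be found through that entry, so the check also shows whether the lookup succeeds at all.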
RE: pdfboxhelp
Hi Santosh,

Hold on. It's Monday and I am running behind schedule with my job. I will reply to you some time around noon.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 23, 2004 10:51 AM
To: Lucene Users List
Subject: Fw: pdfboxhelp

hi karthik,
did you find any solution? Should I send the PDF to you?
RE: pdfboxhelp
Hi Santosh,

The attached file must be in your class path.

Natarajan.

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 23, 2004 10:51 AM
To: Lucene Users List
Subject: Fw: pdfboxhelp

hi karthik,
did you find any solution? Should I send the PDF to you?
Fw: pdfboxhelp
hi karthik,
did you find any solution? Should I send the PDF to you?

----- Original Message -----
From: "Santosh" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 10:23 AM
Subject: Re: pdfboxhelp

> hi karthik,
> I put log4j in the classpath. I am sending the classpath variable:
>
> CLASSPATH
>
> .;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar
>
> Please check the error.
Re: pdfboxhelp
hi karthik,

I put log4j in the classpath. I am sending the classpath variable:

CLASSPATH

.;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar

Please check the error.

----- Original Message -----
From: "Karthik N S" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Monday, August 23, 2004 10:26 AM
Subject: RE: pdfboxhelp

> Hi Santosh
>
> I think your PDFBox setup is using the Log4j package. Try to add the
> log4j.jar path to the classpath.
>
> Is it just a WARNING or an ERROR that you are getting?
>
> Send me your configuration and let me help you with it.
>
> Karthik
RE: pdfboxhelp
Hi Santosh

I think your PDFBox setup is using the Log4j package. Try to add the log4j.jar path to the classpath.

Is it just a WARNING or an ERROR that you are getting?

Send me your configuration and let me help you with it.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]]
Sent: Monday, August 23, 2004 10:11 AM
To: Lucene Users List
Cc: Ben Litchfield
Subject: Re: pdfboxhelp

hi karthik,

I have downloaded PDFBox and put the PDFBox jar file in the classpath, but when I type the following command at the command prompt I get this error:

D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly

Why am I getting this error? Please help.
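On the WARNING versus ERROR question: the two log4j:WARN lines only say that log4j found no configuration, so no appender is attached to the logger. That usually does not by itself stop the text extraction. One way to silence it is to place a minimal log4j.properties in a directory that is on the classpath. The sketch below writes such a file using only the JDK. The class name Log4jConfigWriter, the WARN level and the pattern string are choices made for this example; the property keys assume the bundled log4j.jar is a 1.2.x release.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.Properties;

public class Log4jConfigWriter {

    /** Builds a minimal log4j 1.x configuration: one console appender attached to the root logger. */
    public static Properties minimalConfig() {
        Properties p = new Properties();
        p.setProperty("log4j.rootLogger", "WARN, stdout");
        p.setProperty("log4j.appender.stdout", "org.apache.log4j.ConsoleAppender");
        p.setProperty("log4j.appender.stdout.layout", "org.apache.log4j.PatternLayout");
        p.setProperty("log4j.appender.stdout.layout.ConversionPattern", "%d %-5p %c - %m%n");
        return p;
    }

    /** Writes log4j.properties into the given directory, which must itself be on the classpath. */
    public static File write(File dir) throws IOException {
        File target = new File(dir, "log4j.properties");
        try (OutputStream out = new FileOutputStream(target)) {
            minimalConfig().store(out, "Minimal log4j 1.x configuration (example)");
        }
        return target;
    }

    public static void main(String[] args) throws IOException {
        // Default to the current directory, which "." in the CLASSPATH refers to.
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("Wrote " + write(dir).getAbsolutePath());
    }
}
```

Because the CLASSPATH quoted in this thread starts with ".", running the extraction command from the directory that holds the generated file should be enough for log4j to pick it up.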
Re: pdfboxhelp
hi karthik, I have downloaded pdfbox and kept pdfjar file in the classpath, but when I am typing following command in the command prompt I am getting the error: D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParse r). log4j:WARN Please initialize the log4j system properly why I am getting this error? plz help - Original Message - From: "Karthik N S" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Monday, August 23, 2004 9:21 AM Subject: RE: pdfboxhelp > Hi > > > To Begin with try to build Indexes offline [ out of Tomcat container] > and on completing indxexes, feed u'r search with the realpath of the offline indexed folder,Start the Tomcat and then use the > search on As u experiment it out u will be comfortable withrequirment of Indexing /Search.. ; [ > > Karthik > > -Original Message- > From: Santosh [mailto:[EMAIL PROTECTED] > Sent: Saturday, August 21, 2004 4:55 PM > To: Lucene Users List > Subject: Re: pdfboxhelp > > > Yes I did the same. > I copied all the classes into classes folder but > now when I am building the index using IndexHTML the pdfs are not added to > this index, only text and htmls are added to index. > what changes should I do for IndexHTML.java to build index with pdf > - Original Message - > From: "Karthik N S" <[EMAIL PROTECTED]> > To: "Lucene Users List" <[EMAIL PROTECTED]> > Sent: Saturday, August 21, 2004 4:54 PM > Subject: RE: pdfboxhelp > > > > Hi > > > > If u are using the jar file with Web Interface for jsp/servlet dev, Place > > the jar file in "webapps///lib" > > and also correct the Classpath for the present modification. 
RE: pdfboxhelp
Hi

To begin with, try to build the indexes offline [outside the Tomcat container]. Once the indexes are complete, give your search the real path of the offline index folder, start Tomcat, and then use the search. As you experiment with it you will become comfortable with the requirements of indexing and search.

Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:55 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Yes, I did the same.
I copied all the classes into the classes folder, but now when I build the index using IndexHTML the PDFs are not added to the index; only text and HTML files are added.
What changes should I make to IndexHTML.java to build an index that includes PDFs?

- Original Message -
From: "Karthik N S" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Saturday, August 21, 2004 4:54 PM
Subject: RE: pdfboxhelp

> Hi
>
> If you are using the jar file with a web interface for JSP/servlet development, place the jar file in "webapps///lib" and also correct the classpath for this change.
>
> 2) Create your own package, put all your java files in it, and copy the java files to /Web-inf/Classes/
>
> Then use the same.
>
> Karthik
>
> -Original Message-
> From: Santosh [mailto:[EMAIL PROTECTED]
> Sent: Saturday, August 21, 2004 4:31 PM
> To: Lucene Users List
> Subject: Re: pdfboxhelp
>
> thanks Natarajan and karthik,
>
> I corrected the classpath,
>
> but where should I write your code?
> Should I write it in IndexHTML.java, which comes along with Lucene, or in some other place?
> One more thing:
> I kept the pdfbox jar file in the classpath. Is this enough, or do I have to build pdfbox?
>
> thankyou
> - Original Message -
> From: "Natarajan.T" <[EMAIL PROTECTED]>
> To: "'Lucene Users List'" <[EMAIL PROTECTED]>
> Sent: Saturday, August 21, 2004 3:20 PM
> Subject: RE: pdfboxhelp
>
> > Hi Santhosh,
> >
> > Try out the code below. (The pdfbox.jar file must be in your classpath.)
> >
> > public String getContent(InputStream reader) throws IOException {
> >     PDFParser parser = null;
> >     PDDocument pdDoc = null;
> >     PDFTextStripper stripper = null;
> >     String pdftext = "";
> >     try {
> >         parser = new PDFParser(reader);
> >         parser.parse();
> >         pdDoc = parser.getPDDocument();
> >         if (pdDoc.isEncrypted()) {
> >             // Try to decrypt with an empty password.
> >             DecryptDocument decryptor = new DecryptDocument(pdDoc);
> >             decryptor.decryptDocument("");
> >         }
> >         stripper = new PDFTextStripper();
> >         pdftext = stripper.getText(pdDoc);
> >         // "info" is not declared in this snippet; it has to be declared elsewhere in the class.
> >         info = pdDoc.getDocumentInformation();
> >     } catch (Exception err) {
> >         System.out.println(err.getMessage());
> >     } finally {
> >         // pdDoc is still null if parsing failed, so guard the close.
> >         if (pdDoc != null) {
> >             pdDoc.close();
> >         }
> >     }
> >     return pdftext;
> > }
> >
> > Natarajan.
> >
> > -Original Message-
> > From: Santosh [mailto:[EMAIL PROTECTED]
> > Sent: Saturday, August 21, 2004 3:14 PM
> > To: Lucene Users List
> > Subject: Re: pdfboxhelp
> >
> > Hi Don,
> >
> > Your idea is nice, but whenever I write the following code in IndexHTML.java of Lucene
> >
> > import org.pdfbox.searchengine.lucene.*;
> >
> > File pdfFile = new File("/path/to/the/file.pdf");
> >
> > // Below returns a parsed PDF file in a Lucene Document object.
> > Document doc = LucenePDFDocument.getDocument(pdfFile);
> >
> > I am getting the following error:
> >
> > package org.pdfbox.searchengine.lucene does not exist
> >
> > I have downloaded the pdfbox source code and kept the jar file in the classpath. Please help me with this.
> >
> > - Original Message -
> > From: Don Vaillancourt
> > To: Lucene Users List
> > Sent: Friday, August 20, 2004 7:37 PM
> > Subject: Re: pdfboxhelp
> >
> > Here is the super simple code required.
> >
> > import org.pdfbox.searchengine.lucene.*;
> >
> > File pdfFile = new File("/path/to/the/file.pdf");
> >
> > // Below returns a parsed PDF file in a Lucene Document object.
> > Document doc = LucenePDFDocument.getDocument(pdfFile);
> >
> > Santosh wrote:
> >
> > Exactly, that is what I need.
> >
> > - Original Message -
> > From: Don Vaillancourt
> > To: Lucene Users List
> > Sent: Friday, August 20, 2004 6:39 PM
> > Subject: Re: pdfboxhelp
> >
> > What are your intentions with PDFBox?
> >
> > You want to use it to index PDF files?
> >
> > Santosh wrote:
> >
> > hi,
> >
> > I have downloaded the pdfbox zip, but I am unsure where to start. How can I try the demo? I don't see any help document with this download. Please help me.
> >
> > regards
> > Santosh kumar
> > SoftPro Systems
> > Hyderabad
> >
> > "The harder you train in peace, the lesser you bleed in war"
> >
> > ---SOFTPRO DISCLAIMER--
> >
> > Information contained in this E-MAIL and any attachments are confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' and 'confidential'.
> >
> > If you are not an intended or authorised recipient of this E-MAIL or have received it in error, You are notified that any use, copying or dissemination of the information contained in this E-MAIL in any manne
Re: speeding up queries (MySQL faster)
> For example, Nutch automatically translates such
> clauses into QueryFilters.

Thanks for the excellent pointer, Doug! I will definitely be implementing this optimization.

If anyone cares, I did a 1 minute hprof test with the search server in a servlet container. Here are the results (sorry about Yahoo's short line length).

-Yonik

resin.hprof.txt: Exclusive Method Times (CPU) (virtual times)

27390 (37.5%) java.net.PlainSocketImpl.socketAccept
14885 (20.4%) org.apache.lucene.index.SegmentTermDocs.skipTo
 6700  (9.2%) org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal
 5810  (8.0%) java.io.UnixFileSystem.list
 4785  (6.5%) org.apache.lucene.store.InputStream.readByte
 3315  (4.5%) java.io.RandomAccessFile.readBytes
 1302  (1.8%) java.net.SocketOutputStream.socketWrite0
 1004  (1.4%) java.io.RandomAccessFile.seek
  546  (0.7%) java.lang.String.intern
  336  (0.5%) com.caucho.vfs.WriteStream.print
  248  (0.3%) org.apache.lucene.search.TermScorer.next
  236  (0.3%) org.apache.lucene.queryParser.QueryParser.jj_scan_token
  232  (0.3%) org.apache.lucene.index.SegmentTermEnum.readTerm
  228  (0.3%) org.apache.lucene.search.ConjunctionScorer.score
  200  (0.3%) org.apache.lucene.queryParser.FastCharStream.refill
  196  (0.3%) org.apache.lucene.store.InputStream.readVInt
  180  (0.2%) java.security.AccessController.doPrivileged
  172  (0.2%) org.apache.lucene.search.ConjunctionScorer.doNext
  152  (0.2%) java.lang.Object.clone
  152  (0.2%) org.apache.lucene.index.SegmentReader.document
  148  (0.2%) java.lang.Throwable.fillInStackTrace
  128  (0.2%) org.apache.lucene.index.SegmentReader.norms
  116  (0.2%) org.apache.lucene.store.InputStream.readString
  112  (0.2%) java.lang.StrictMath.log
  108  (0.1%) java.util.LinkedList.addLast
  100  (0.1%) java.net.SocketInputStream.socketRead0
   88  (0.1%) org.apache.lucene.search.ConjunctionScorer.next
Re: speeding up queries (MySQL faster)
Yonik Seeley wrote:
> Setup info & Stats:
> - 4.3M documents, 12 keyword fields per document, 11 [ ... ]
> "field1:4 AND field2:188453 AND field3:1"
> field1:4 done alone selects around 4.2M records
> field2:188453 done alone selects around 1.6M records
> field3:1 done alone selects around 1K records
> The whole query normally selects less than 50 records
> Only the first 10 are returned (or whatever range the client selects).

The "field1:4" clause is probably dominating the cost of query execution. Clauses which match large portions of the collection are slow to evaluate. If there are not too many different such clauses, then you can optimize this by re-using a Filter in place of such clauses, typically a QueryFilter.

For example, Nutch automatically translates such clauses into QueryFilters. See:

http://cvs.sourceforge.net/viewcvs.py/nutch/nutch/src/java/net/nutch/searcher/LuceneQueryOptimizer.java?view=markup

Note that this only converts clauses whose boost is zero. Since filters do not affect ranking, we can only safely convert clauses which do not contribute to the score, i.e., those whose boost is zero. Scores might still be different in the filtered results because of Similarity.coord(). But in Nutch, Similarity.coord() is overridden to always return 1.0, so that the replacement of clauses with filters does not alter the final scores at all.

Doug
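The filter re-use described above can be illustrated with a small self-contained sketch. It is a toy model in plain Java, not Lucene's API: the class CachedFilterSketch, its in-memory postings map and the filterBuilds() counter are hypothetical names made up for the illustration. The point it tries to show is that the set of documents matching a broad clause such as "field1:4" is computed once, kept as a BitSet, and then only intersected with the hits of the selective clause on later queries. Scoring is left out entirely, which matches the caveat that a filter does not contribute to ranking.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

/** Toy model of re-using a cached filter for broad clauses. Not Lucene code. */
class CachedFilterSketch {
    // Hypothetical stand-in for an index: clause text -> ids of matching documents.
    private final Map<String, int[]> postings = new HashMap<>();
    // One cached bit set per broad clause, built on first use.
    private final Map<String, BitSet> filterCache = new HashMap<>();
    private final int maxDoc;
    private int filterBuilds = 0;

    CachedFilterSketch(int maxDoc) {
        this.maxDoc = maxDoc;
    }

    void addPostings(String clause, int... docIds) {
        postings.put(clause, docIds);
    }

    /** Number of times a filter had to be built from the postings. */
    int filterBuilds() {
        return filterBuilds;
    }

    /** Returns the cached bits for a clause, walking its postings only once. */
    private BitSet filterFor(String clause) {
        BitSet bits = filterCache.get(clause);
        if (bits == null) {
            bits = new BitSet(maxDoc);
            for (int doc : postings.getOrDefault(clause, new int[0])) {
                bits.set(doc);
            }
            filterCache.put(clause, bits);
            filterBuilds++;
        }
        return bits;
    }

    /** Evaluates the selective clause directly and ANDs in the cached filters. */
    BitSet search(String selectiveClause, String... broadClauses) {
        BitSet hits = new BitSet(maxDoc);
        for (int doc : postings.getOrDefault(selectiveClause, new int[0])) {
            hits.set(doc);
        }
        for (String clause : broadClauses) {
            hits.and(filterFor(clause));
        }
        return hits;
    }

    public static void main(String[] args) {
        CachedFilterSketch index = new CachedFilterSketch(8);
        index.addPostings("field1:4", 0, 1, 2, 3, 4, 5, 6);
        index.addPostings("field2:188453", 1, 3, 5);
        index.addPostings("field3:1", 3, 7);

        // The first search builds both filters; the second one re-uses them.
        System.out.println(index.search("field3:1", "field1:4", "field2:188453"));
        System.out.println(index.search("field3:1", "field1:4", "field2:188453"));
        System.out.println("filters built: " + index.filterBuilds());
    }
}
```

In Lucene itself the corresponding step would presumably be to wrap the broad clause in a QueryFilter, keep that filter object around, and pass it to the searcher together with the remaining query on each request; the sketch only models the caching and intersection, not how Lucene stores or scores anything.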
Re: speeding up queries (MySQL faster)
Oops, CPU usage is *not* 50%, but closer to 98%. This is due to a bug in the per-process CPU% on RHEL 3 on multiprocessor systems (I can run multiple threads in while(1) loops, and it will still only show 50% CPU usage for that process).

The aggregated (not per-process) statistics shown by top are correct, and they show about 73% user time, 25% system time, and anywhere between .5% and 2% idle time.

Unfortunately, this means that I won't be getting any performance improvements from using a second IndexSearcher, and I'm stuck at being 3 times slower than MySQL on the same data/queries.

I guess the next step is some profiling: move the server out of the servlet container, move the clients in with the server, and then try some hprof work.

Does anyone have pointers to Lucene caching and how to tune it?

-Yonik

--- Bernhard Messer <[EMAIL PROTECTED]> wrote:
> Yonik,
>
> there is another "synchronized" block in CSInputStream which could block your second cpu out.