problem restoring index
Hi, when I restart Tomcat, the index gets corrupted. If I take a backup of the index and then restart Tomcat, the index does not work properly. Do I have to index all the documents again whenever I restart Tomcat?
---SOFTPRO DISCLAIMER-- Information contained in this E-MAIL and any attachments are confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' and 'confidential'. If you are not an intended or authorised recipient of this E-MAIL or have received it in error, You are notified that any use, copying or dissemination of the information contained in this E-MAIL in any manner whatsoever is strictly prohibited. Please delete it immediately and notify the sender by E-MAIL. In such a case reading, reproducing, printing or further dissemination of this E-MAIL is strictly prohibited and may be unlawful. SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment hereto is free from computer viruses or other defects. The opinions expressed in this E-MAIL and any ATTACHEMENTS may be those of the author and are not necessarily those of SOFTPRO SYSTEMS.
Help for alternatives for search words
I am using Lucene for searching. It is working fine, but I have a problem with alternate words. If I search for fooddy, Lucene also returns results for food. How can I trace these alternate words (such as foam) in the documents? How does Lucene support this feature, and how can I get a list of alternate words? I need three alternate words for each search word. Each time a user enters roam, I need to show: Are you searching for (food, foods, foody)?
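The "three alternate words" requirement can be sketched without any Lucene-specific API. The following is a minimal illustration, not Lucene's own mechanism: it ranks a word list by Levenshtein edit distance to the query and returns the closest few. The class name `DidYouMean`, its methods, and the sample vocabulary are hypothetical; in a real application the vocabulary would presumably be collected from the terms stored in the index.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene: suggests near matches for a query word.
public class DidYouMean {

    /** Classic dynamic-programming Levenshtein edit distance between two strings. */
    public static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Returns up to {@code limit} vocabulary words closest to the query, skipping exact matches. */
    public static List<String> suggest(String query, List<String> vocabulary, int limit) {
        List<String> candidates = new ArrayList<>();
        for (String word : vocabulary) {
            if (!word.equals(query)) {
                candidates.add(word);
            }
        }
        // Sort by distance to the query; ties are broken alphabetically.
        candidates.sort(Comparator
                .comparingInt((String w) -> editDistance(query, w))
                .thenComparing(Comparator.naturalOrder()));
        return new ArrayList<>(candidates.subList(0, Math.min(limit, candidates.size())));
    }

    public static void main(String[] args) {
        // Assumed sample vocabulary, using the words mentioned in the post.
        List<String> vocab = Arrays.asList("food", "foods", "foody", "foam", "roam", "index");
        System.out.println("Are you searching for " + suggest("fooddy", vocab, 3) + "?");
    }
}
```

Computing the distance against every word is fine for a small vocabulary; a large index would likely need a cheaper pre-filter (for example, by shared prefix or character n-grams) before ranking.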
Re: modifying existing index
I am now able to delete from the index using the following code:

    if (indexDir.exists()) {
        IndexReader reader = IndexReader.open(indexDir);
        uidIter = reader.terms(new Term("id", ""));
        while (uidIter.term() != null && uidIter.term().field() == "id") {
            reader.delete(uidIter.term());
            uidIter.next();
        }
        reader.close();
    }

where id is the keyword field. But this also deletes all the documents. How can I modify my code to delete only the particular document with a given id?

I am creating the index in the following way:

    Document doc = new Document();
    doc.add(Field.Text("text", text));
    doc.add(Field.Keyword("id", Long.toString(id)));
    doc.add(Field.Keyword("title", title));
    doc.add(Field.Keyword("keywords", keywords));
    doc.add(Field.Keyword("type", type));
    writer.addDocument(doc);

----- Original Message -----
From: Chuck Williams [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 1:06 PM
Subject: RE: modifying existing index

A good way to do this is to add a keyword field with whatever unique id you have for the document. Then you can delete the term containing a unique id to delete the document from the index (look at IndexReader.delete(Term)). You can look at the demo class IndexHTML to see how it does incremental indexing for an example.

Chuck

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
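The difference between the loop in the post and a delete by a single id term can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions: `TinyIndex` and its methods are hypothetical and are not Lucene classes. The model suggests why enumerating every term of the id field, starting from the empty string, removes every document, whereas passing one specific term (as in the `IndexReader.delete(Term)` call named in the reply) should remove only the documents carrying that id.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical toy model of an index keyed by the value of an "id" keyword field.
public class TinyIndex {
    private final TreeMap<String, List<String>> docsById = new TreeMap<>();

    /** Adds a document; adding the same id twice leaves two copies, like re-adding in the post. */
    public void add(String id, String text) {
        docsById.computeIfAbsent(id, k -> new ArrayList<>()).add(text);
    }

    /** Mirrors the posted loop: walks every id term from startId onward and deletes each one. */
    public int deleteAllFrom(String startId) {
        SortedMap<String, List<String>> tail = docsById.tailMap(startId, true);
        int deleted = 0;
        for (List<String> docs : tail.values()) {
            deleted += docs.size();
        }
        tail.clear(); // clearing the view removes the entries from the backing map
        return deleted;
    }

    /** Deletes only the documents whose id equals the given value. */
    public int deleteById(String id) {
        List<String> removed = docsById.remove(id);
        return removed == null ? 0 : removed.size();
    }

    /** Update pattern from the thread: delete the old copy, then add the new one. */
    public void update(String id, String text) {
        deleteById(id);
        add(id, text);
    }

    public int size() {
        int total = 0;
        for (List<String> docs : docsById.values()) {
            total += docs.size();
        }
        return total;
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("1", "first version");
        index.add("2", "another document");
        index.update("1", "second version");
        System.out.println("documents after update: " + index.size());
        System.out.println("deleted by full enumeration: " + index.deleteAllFrom(""));
    }
}
```

In this model, `update` keeps exactly one copy per id, which is the behaviour the original question asks for.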
modifying existing index
I am using Lucene for indexing. When I create the index, the documents are added. But when I modify a single existing document and index it again, it is treated as a new document and added one more time, so I get the same document twice in the results. To overcome this, I am deleting the existing index and recreating the whole index. Is it possible to index the modified document again and overwrite the existing document without deleting and recreating the index? If so, how?

One more question: can Lucene do stemming? If I search for roam, I know that a fuzzy query can return results for foam. But my requirement is: if I search for roam, can I get the list of similar words as output, so that I can show the end user, in a column, a prompt such as "Do you mean foam?" How can I get a list of similar words from the given content?
fetching similar wordlist as given word
Can Lucene do stemming? If I search for roam, I know that a fuzzy query can return results for foam. But my requirement is: if I search for roam, can I get the list of similar words as output, so that I can show the end user, in a column, a prompt such as "Do you mean foam?" How can I get a list of similar words from the given content?
Re: modifying existing index
I have gone through IndexReader and found the method delete(int docNum), but where do I get the document number from? Is it predefined, or do we have to assign a number prior to indexing?

----- Original Message -----
From: Luke Francl [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 1:26 AM
Subject: Re: modifying existing index

You do not need to recreate the whole index. Just mark the document as deleted using the IndexReader and then add it again with the IndexWriter. Remember to close your IndexReader and IndexWriter after doing this. The deleted document will be removed the next time you optimize your index.

Luke Francl
Re: worddoucments search
I have gone through textmining.org and I am able to extract the text as a string. But how can I get it into Lucene Document format?

----- Original Message -----
From: Otis Gospodnetic [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, August 24, 2004 11:54 PM
Subject: Re: worddoucments search

As I just answered in a separate email to Ryan - we used the textmining.org library, too, as an example of something that is easier to use than POI. It's been a while since I wrote that chapter, so it slipped my mind when I replied. Yes, use textmining.org first, you'll be able to include it in your code in 2 minutes. Good stuff.

Otis
integrationofLucene and PDF box
Has anybody integrated Lucene with PDFBox? Can we do it by changing the code in IndexFiles.java or IndexHTML.java?

Regards,
Santosh kumar
Re: integration of lucene with pdfbox
I don't know how to add a Lucene Document to the index; I only know how to add a given directory. Can anybody please tell me how to add a Lucene Document to the index?

----- Original Message -----
From: Ben Litchfield [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 8:13 PM
Subject: Re: integration of lucene with pdfbox

If you can use lucene on its own then you already know how to add a lucene Document to the index. So you need to be able to take a PDF and get a lucene Document. org.pdfbox.searchengine.lucene.LucenePDFDocument.getDocument() does that for you.

Ben
worddoucments search
Can Lucene search Word documents? If so, please give me information about it.

Regards,
Santosh kumar
Re: pdfboxhelp
Hi Natarajan, I put log4j.properties in the classpath. My new classpath is:

    .;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar;C:\j2sdk1.4.1\lib\log4j.properties;

but there is no difference in the output.

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:56 AM
Subject: RE: pdfboxhelp

Hi Santhosh,
The attached file must be in your class path.
Natarajan.
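One way to narrow down the log4j warning discussed in this thread is to check whether the configuration file is actually visible to the class loader. The sketch below assumes log4j 1.x default initialization, which looks for a resource named log4j.properties on the classpath; the class `ClasspathCheck` and its methods are hypothetical helpers, not part of log4j or PDFBox.

```java
import java.net.URL;

// Hypothetical diagnostic helper: reports whether a resource can be found on the classpath.
public class ClasspathCheck {

    /** Returns the URL under which the resource is visible to the system class loader, or null. */
    public static URL locate(String resourceName) {
        return ClassLoader.getSystemResource(resourceName);
    }

    /** True if the named resource can be found on the classpath. */
    public static boolean isVisible(String resourceName) {
        return locate(resourceName) != null;
    }

    public static void main(String[] args) {
        // Defaults to the file name log4j 1.x looks for; another name can be passed as an argument.
        String name = args.length > 0 ? args[0] : "log4j.properties";
        URL url = locate(name);
        if (url == null) {
            System.out.println(name + " is NOT visible on the classpath");
        } else {
            System.out.println(name + " found at " + url);
        }
    }
}
```

Running it with the same CLASSPATH as the failing command shows which copy of the file, if any, would be picked up. If the file is found and the warning persists, the contents of the file (for example, a root logger with no appender attached) would be the next thing to check.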
Re: pdfboxhelp
I put the file in the classpath:

    .;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;D:\JAVAPRO;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar;C:\j2sdk1.4.1\lib\log4j.properties;D:\setups\searchEngine\PDFBox-0.6.6\external\ant.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\checkstyle-all-2.4.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\junit.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\lucene-1.4-final.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\lucene-demos-1.4-final.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\xercesImpl.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\xml-apis.jar;

but there is no change in the output; it is the same as before:

    E:\>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
    log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
    log4j:WARN Please initialize the log4j system properly.

What might be the error?

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:56 AM
Subject: RE: pdfboxhelp

Hi Santhosh,
The attached file must be in your class path.
Natarajan.
integration of lucene with pdfbox
I have downloaded PDFBox and Lucene and put the jar files in the classpath. I am able to work with both of them independently, but how can I integrate the two?

Regards,
Santosh kumar
Re: pdfboxhelp
Hi Karthik, I have downloaded PDFBox and put the PDFBox jar file in the classpath, but when I type the following command at the command prompt I get this error:

    D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
    log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
    log4j:WARN Please initialize the log4j system properly

Why am I getting this error? Please help.

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 9:21 AM
Subject: RE: pdfboxhelp

Hi,
To begin with, try to build the indexes offline (outside the Tomcat container). Once the indexes are complete, give your search the real path of the offline index folder, start Tomcat, and then use the search. As you experiment with it, you will become comfortable with the requirements of indexing and search.
Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:55 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Yes, I did the same. I copied all the classes into the classes folder, but now when I build the index using IndexHTML, the PDFs are not added to the index; only text and HTML files are added. What changes should I make to IndexHTML.java to build the index with PDFs?

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:54 PM
Subject: RE: pdfboxhelp

Hi,
1) If you are using the jar file with a web interface for JSP/servlet development, place the jar file in webapps/<your application>/WEB-INF/lib and also correct the classpath for the present modification.
2) Create your own package, put all your Java files in it, and copy them to /WEB-INF/classes/<your package>. Then use the same.
Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:31 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Thanks Natarajan and Karthik. I corrected the classpath, but where should I write your code? Should I write it in IndexHTML.java, which comes with Lucene, or somewhere else? One more thing: I put the PDFBox jar file in the classpath. Is this enough, or do I have to build PDFBox?
Thank you.

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 3:20 PM
Subject: RE: pdfboxhelp

Hi Santhosh,
Try out the code below (the pdfbox.jar file must be in your classpath).

    public String getContent(InputStream reader) throws IOException {
        PDFParser parser = null;
        PDDocument pdDoc = null;
        PDFTextStripper stripper = null;
        String pdftext = "";
        try {
            parser = new PDFParser(reader);
            parser.parse();
            pdDoc = parser.getPDDocument();
            if (pdDoc.isEncrypted()) {
                // Try the empty default password before extracting text.
                DecryptDocument decryptor = new DecryptDocument(pdDoc);
                decryptor.decryptDocument("");
            }
            stripper = new PDFTextStripper();
            pdftext = stripper.getText(pdDoc);
            // info is not declared in this snippet; presumably a field of the enclosing class.
            info = pdDoc.getDocumentInformation();
        } catch (Exception err) {
            System.out.println(err.getMessage());
        } finally {
            // Guard against a failed parse, which leaves pdDoc null.
            if (pdDoc != null) {
                pdDoc.close();
            }
        }
        return pdftext;
    }

Natarajan.

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 3:14 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Hi Don, your idea is nice, but whenever I write the following code in Lucene's IndexHTML.java

    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Below returns a parsed PDF file in a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

I get the following error:

    package org.pdfbox.searchengine.lucene does not exist

I have downloaded the PDFBox source code and put the jar file in the classpath. Please help me with this.

----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp

Here is the super simple code required.

    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Below returns a parsed PDF file in a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

Santosh wrote:
Exactly, that is what I need.

----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intentions with PDFBox? You want to use it to index PDF files?

Santosh wrote:
Hi, I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document with this download. Please help me.

Regards,
Santosh kumar
SoftPro Systems
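The thread asks what would have to change in IndexHTML.java so that PDFs are indexed alongside HTML and text files. One plausible shape for that change is a per-file dispatch on the file extension, sketched below in plain Java. The class `FileKind`, its enum, and the extension list are assumptions made for illustration; in the PDF branch an indexer would presumably call `LucenePDFDocument.getDocument(file)` as shown in Don's message and pass the result to the IndexWriter.

```java
import java.util.Locale;

// Hypothetical helper: decides how an indexer should treat a file, based on its extension.
public class FileKind {

    public enum Kind { HTML, TEXT, PDF, SKIP }

    /** Classifies a path by extension, ignoring case. */
    public static Kind classify(String path) {
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.endsWith(".html") || lower.endsWith(".htm")) {
            return Kind.HTML;
        }
        if (lower.endsWith(".txt")) {
            return Kind.TEXT;
        }
        if (lower.endsWith(".pdf")) {
            return Kind.PDF; // the branch an HTML/text-only indexer would be missing
        }
        return Kind.SKIP;
    }

    public static void main(String[] args) {
        String[] samples = {"C:\\test.pdf", "index.html", "notes.txt", "photo.jpg"};
        for (String path : samples) {
            System.out.println(path + " -> " + classify(path));
        }
    }
}
```

Keeping the classification separate from the indexing code makes it easier to add further formats later, such as Word documents handled through a text extractor.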
Re: pdfboxhelp
Hi Karthik,

I kept log4j in the classpath. Here is my CLASSPATH variable:

.;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar

Please check the error.

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:26 AM
Subject: RE: pdfboxhelp

Hi Santosh,

I think your PDF library is using the log4j package. Try to set the classpath to include the path of log4j.jar. Is it just a WARNING or an ERROR that you are getting? Send me your configuration and let me help you with it.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:11 AM
To: Lucene Users List
Cc: Ben Litchfield
Subject: Re: pdfboxhelp

Hi Karthik,

I have downloaded PDFBox and put the PDFBox jar file in the classpath, but when I type the following command at the command prompt I get this error:

D:\setups\searchEngine\PDFBox-0.6.6\src>java org.pdfbox.ExtractText C:\test.pdf C:\test.txt
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Why am I getting this error? Please help.

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 9:21 AM
Subject: RE: pdfboxhelp

Hi,

To begin with, try to build the indexes offline (outside the Tomcat container). Once indexing is complete, give your search code the real path of the offline index folder, start Tomcat, and then use the search. As you experiment, you will become comfortable with the requirements of indexing and search.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:55 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Yes, I did the same. I copied all the classes into the classes folder, but now when I build the index using IndexHTML the PDFs are not added to the index; only text and HTML files are added. What changes should I make to IndexHTML.java to build the index with PDFs?

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:54 PM
Subject: RE: pdfboxhelp

Hi,

If you are using the jar file with a web interface for JSP/servlet development:

1) Place the jar file in webapps/yourapplication/WEB-INF/lib and also correct the classpath for this change.
2) Create your own package and copy your Java files to /WEB-INF/classes/yourpackage.

Then use the same.

Karthik

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 4:31 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Thanks, Natarajan and Karthik. I corrected the classpath, but where should I write your code? Should I write it in IndexHTML.java, which comes with Lucene, or in some other place? One more thing: I kept the PDFBox jar file in the classpath. Is this enough, or do I have to build PDFBox?

Thank you.

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 3:20 PM
Subject: RE: pdfboxhelp

Hi Santhosh,

Try out the code below (the pdfbox.jar file must be in your classpath).

    public String getContent(InputStream reader) throws IOException {
        PDFParser parser = null;
        PDDocument pdDoc = null;
        PDFTextStripper stripper = null;
        String pdftext = "";
        try {
            parser = new PDFParser(reader);
            parser.parse();
            pdDoc = parser.getPDDocument();
            // Decrypt first if the PDF is encrypted
            if (pdDoc.isEncrypted()) {
                DecryptDocument decryptor = new DecryptDocument(pdDoc);
                decryptor.decryptDocument();
            }
            stripper = new PDFTextStripper();
            pdftext = stripper.getText(pdDoc);
            // 'info' is not declared in this snippet; presumably a field defined elsewhere
            info = pdDoc.getDocumentInformation();
        } catch (Exception err) {
            System.out.println(err.getMessage());
        } finally {
            // Close only if parsing got far enough to create the document
            if (pdDoc != null) {
                pdDoc.close();
            }
        }
        return pdftext;
    }

Natarajan.

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday
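A side note on the classpath question in this thread: the "No appenders could be found" lines are printed by log4j itself, which usually means log4j.jar was found and loaded but has not been given any configuration, so it is a warning rather than a missing-jar failure. To double-check which jars a long CLASSPATH really contains (especially one that was wrapped in an email), a small helper like the one below can list the matching entries. This is only a sketch using the standard library; the class name ClasspathCheck and the method entriesContaining are made up for illustration and are not part of Lucene, PDFBox, or log4j.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper (not part of any library mentioned in the thread).
class ClasspathCheck {

    // Returns the classpath entries whose file name contains the given text,
    // ignoring case. The separator is ";" on Windows and ":" on Unix.
    static List<String> entriesContaining(String classpath, String separator, String needle) {
        List<String> matches = new ArrayList<>();
        String wanted = needle.toLowerCase();
        for (String entry : classpath.split(Pattern.quote(separator))) {
            String trimmed = entry.trim();
            if (trimmed.isEmpty()) {
                continue;
            }
            // The file name is whatever follows the last slash or backslash.
            int cut = Math.max(trimmed.lastIndexOf('\\'), trimmed.lastIndexOf('/'));
            String fileName = trimmed.substring(cut + 1).toLowerCase();
            if (fileName.contains(wanted)) {
                matches.add(trimmed);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        // A shortened version of the CLASSPATH quoted in the thread.
        String cp = ".;C:\\j2sdk1.4.1\\lib\\PDFBox-0.6.6.jar;"
                + "C:\\j2sdk1.4.1\\lib\\lucene-20030909.jar;"
                + "D:\\setups\\searchEngine\\PDFBox-0.6.6\\external\\log4j.jar";
        System.out.println("log4j:  " + entriesContaining(cp, ";", "log4j"));
        System.out.println("pdfbox: " + entriesContaining(cp, ";", "pdfbox"));
    }
}
```

Note that the match is on the file name only, so a log4j.jar that merely sits inside a PDFBox directory is not counted as a PDFBox entry.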
Fw: pdfboxhelp
Hi Karthik,

Did you find any solution? Should I send the PDF to you?

----- Original Message -----
From: Santosh [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Monday, August 23, 2004 10:23 AM
Subject: Re: pdfboxhelp

Hi Karthik,

I kept log4j in the classpath. Here is my CLASSPATH variable:

.;..;C:\j2sdk1.4.1\lib;C:\j2sdk1.4.1\lib\jndi.jar;C:\j2sdk1.4.1\lib\webclient.jar;C:\j2sdk1.4.1\lib\mail.jar;C:\j2sdk1.4.1\lib\activation.jar;C:\j2sdk1.4.1\lib\xml-apis.jar;D:\JAVAPRO;C:\j2sdk1.4.1\jre\lib\ext\msbase.jar;C:\j2sdk1.4.1\lib\servlet.jar;E:\Program Files\Apache Tomcat 4.0\common\lib\servlet.jar;C:\Program Files\Altova\xmlspy\XMLSpyInterface.jar;C:\j2sdk1.4.1\lib\sax.jar;C:\j2sdk1.4.1\lib\dom.jar;C:\j2sdk1.4.1\lib\xalan.jar;C:\j2sdk1.4.1\lib\xercesImpl.jar;C:\j2sdk1.4.1\lib\xmlParserAPIs.jar;C:\j2sdk1.4.1\lib\parser.jar;C:\j2sdk1.4.1\lib\jaxp.jar;C:\j2sdk1.4.1\lib\xml.jar;C:\j2sdk1.4.1\lib\classes12.zip;C:\struts.jar;F:\apache-ant-1.6.1\lib\ant.jar;C:\j2sdk1.4.1\lib\PDFBox-0.6.6.jar;C:\j2sdk1.4.1\lib\lucene-20030909.jar;D:\setups\searchEngine\PDFBox-0.6.6\external\log4j.jar

Please check the error.
Re: pdfboxhelp
Thanks, Natarajan and Karthik. I corrected the classpath, but where should I write your code? Should I write it in IndexHTML.java, which comes with Lucene, or in some other place? One more thing: I kept the PDFBox jar file in the classpath. Is this enough, or do I have to build PDFBox?

Thank you.

----- Original Message -----
From: Natarajan.T [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 3:20 PM
Subject: RE: pdfboxhelp

Hi Santhosh,

Try out the code below (the pdfbox.jar file must be in your classpath).

    public String getContent(InputStream reader) throws IOException {
        PDFParser parser = null;
        PDDocument pdDoc = null;
        PDFTextStripper stripper = null;
        String pdftext = "";
        try {
            parser = new PDFParser(reader);
            parser.parse();
            pdDoc = parser.getPDDocument();
            // Decrypt first if the PDF is encrypted
            if (pdDoc.isEncrypted()) {
                DecryptDocument decryptor = new DecryptDocument(pdDoc);
                decryptor.decryptDocument();
            }
            stripper = new PDFTextStripper();
            pdftext = stripper.getText(pdDoc);
            // 'info' is not declared in this snippet; presumably a field defined elsewhere
            info = pdDoc.getDocumentInformation();
        } catch (Exception err) {
            System.out.println(err.getMessage());
        } finally {
            // Close only if parsing got far enough to create the document
            if (pdDoc != null) {
                pdDoc.close();
            }
        }
        return pdftext;
    }

Natarajan.

-----Original Message-----
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Saturday, August 21, 2004 3:14 PM
To: Lucene Users List
Subject: Re: pdfboxhelp

Hi Don,

Your idea is nice, but whenever I write the following code in IndexHTML.java of Lucene:

    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Below returns a parsed PDF file in a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

I get the following error:

    package org.pdfbox.searchengine.lucene does not exist

I have downloaded the PDFBox source code and kept the jar file in the classpath. Please help me with this.

----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp

Here is the super simple code required.

    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Below returns a parsed PDF file in a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com

This email message is intended only for the addressee(s) and contains information that may be confidential and/or copyright. If you are not the intended recipient please notify the sender by reply email and immediately delete this email. Use, disclosure or reproduction of this email by anyone other than the intended recipient(s) is strictly prohibited. No representation is made that this email or any attachments are free of viruses. Virus scanning is recommended and is the responsibility of the recipient.
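On the question of where the PDF code would go: judging from the thread, the demo indexer picks up only text and HTML files, which suggests it decides by file extension. Assuming that is so, the PDF call would sit in a new branch of that decision. The sketch below shows only the branching, using the standard library; the class IndexDispatch, the enum Kind, and the method kindOf are hypothetical names for illustration, and the actual indexing calls are left as comments because they depend on the Lucene and PDFBox versions in use.

```java
import java.util.Locale;

// Hypothetical illustration of choosing an indexing branch per file type.
class IndexDispatch {

    enum Kind { HTML, TEXT, PDF, SKIP }

    // Decides how a file should be handled, based only on its extension.
    static Kind kindOf(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        if (name.endsWith(".html") || name.endsWith(".htm")) {
            return Kind.HTML;
        }
        if (name.endsWith(".txt")) {
            return Kind.TEXT;
        }
        if (name.endsWith(".pdf")) {
            return Kind.PDF;
        }
        return Kind.SKIP;
    }

    public static void main(String[] args) {
        String[] files = {"index.html", "notes.txt", "C:\\test.pdf", "logo.gif"};
        for (String f : files) {
            switch (kindOf(f)) {
                case HTML:
                case TEXT:
                    // The existing HTML/text handling of the indexer would stay here.
                    System.out.println(f + " -> existing HTML/text branch");
                    break;
                case PDF:
                    // The PDF extraction quoted in the thread would be called here,
                    // and the resulting document added to the index.
                    System.out.println(f + " -> new PDF branch");
                    break;
                default:
                    System.out.println(f + " -> skipped");
            }
        }
    }
}
```

The point of the design is that the PDF support is an extra case next to the existing ones, so the HTML and text handling does not need to change.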
pdf search
Hi,

I am a newbie to Lucene. I have downloaded the zip file. Now, how can I give my own list of words to Lucene? In the demo I saw that Lucene automatically creates the index when we run the Java program, but I want to give my own search words. How is that possible?

Regards,
Santosh Kumar
SoftPro Systems
Hyderabad

The harder you train in peace, the lesser you bleed in war
Fw: pdf search
How can I search through PDFs?

----- Original Message -----
From: Santosh
To: Lucene Users List
Sent: Friday, August 20, 2004 5:59 PM
Subject: pdf search

Hi,

I am a newbie to Lucene. I have downloaded the zip file. Now, how can I give my own list of words to Lucene? In the demo I saw that Lucene automatically creates the index when we run the Java program, but I want to give my own search words. How is that possible?

Regards,
Santosh Kumar
SoftPro Systems
Hyderabad
pdfboxhelp
Hi,

I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me.

Regards,
Santosh Kumar
SoftPro Systems
Hyderabad

The harder you train in peace, the lesser you bleed in war
Re: pdf search
Hi Karthik,

I have a website with some items, each containing HTML and PDF documents. I have to store keywords against each item. Whenever a user enters a search word, if it matches any of the existing keywords, the site should show a link to that particular item.

----- Original Message -----
From: Karthik N S [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, August 20, 2004 6:56 PM
Subject: RE: pdf search

Hi,

What is it that you intend to search, and what are these "own search words"? First explain your requirement properly to the forum to get the intended results.

With regards,
Karthik

-----
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
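The requirement described in this message (keywords stored against each item, and a search word that returns the links of matching items) can be pictured as a simple exact-match lookup. Below is a minimal sketch of that idea with a plain map, assuming case-insensitive matching on whole keywords. The class KeywordIndex, its methods, and the "/items/..." links are hypothetical names chosen for illustration; this is not the Lucene API, only a picture of the behaviour being asked for.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical in-memory keyword lookup, for illustration only.
class KeywordIndex {

    // keyword (lower-cased) -> links of the items tagged with it
    private final Map<String, List<String>> linksByKeyword = new HashMap<>();

    // Registers an item link under each of its keywords.
    void addItem(String itemLink, String... keywords) {
        for (String kw : keywords) {
            String key = kw.trim().toLowerCase(Locale.ROOT);
            linksByKeyword.computeIfAbsent(key, k -> new ArrayList<>()).add(itemLink);
        }
    }

    // Returns the links of all items whose keyword list contains the word.
    List<String> search(String word) {
        String key = word.trim().toLowerCase(Locale.ROOT);
        return linksByKeyword.getOrDefault(key, Collections.emptyList());
    }

    public static void main(String[] args) {
        KeywordIndex index = new KeywordIndex();
        index.addItem("/items/1", "lucene", "search");
        index.addItem("/items/2", "PDF", "search");
        System.out.println(index.search("Search"));
        System.out.println(index.search("tomcat"));
    }
}
```

A map like this only matches whole keywords exactly; a search library becomes useful once matching on the document text itself, or on partial and related words, is needed.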
Re: pdfboxhelp
Exactly, that is what I need.

----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intentions with PDFBox? You want to use it to index PDF files?
Re: pdfboxhelp
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp

Here is the super simple code required.

    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Below returns a parsed PDF file in a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);
Re: pdfboxhelp
I am sorry, the mail was sent accidentally.

----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 8:02 PM
Subject: Re: pdfboxhelp

Did I leave you speechless!? :-)
searchhelp
Hi,

I am using the Lucene search engine for my application. I am able to search through text files and HTML files as specified by Lucene. Can you please clarify my doubts?

1. Can Lucene search through PDFs and Word documents? If yes, then how?
2. Can Lucene search through a database? If yes, then how?

Thank you,
Santosh
Re: searchhelp
I recently joined the list and have not gone through any previous mails. If you have any mails or related code, please forward them to me.

----- Original Message -----
From: Chandan Tamrakar [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 3:47 PM
Subject: Re: searchhelp

For PDF you need to extract the text from the PDF files using the PDFBox library, and for Word documents you can use the Apache POI APIs. There are messages posted on the Lucene list related to your queries. About databases, I guess someone must have done it. :)
Re: searchhelp
Thanks, everybody, but I did not get any code or any real help from these links. Has anybody done this kind of search before? If yes, then please send me the code, or tell me what code I have to add to my present Lucene setup.

----- Original Message -----
From: David Townsend [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 4:17 PM
Subject: RE: searchhelp

JGURU FAQ: http://www.jguru.com/faq/Lucene
OFFICIAL FAQ: http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi
MAIL ARCHIVE: http://www.mail-archive.com/[EMAIL PROTECTED]/

Hope this helps.