Re: searchhelp
For PDF, you need to extract the text from the PDF files using the PDFBox library; for Word documents, you can use the Apache POI APIs. There are messages posted on the Lucene list related to your queries. About the database, I guess someone must have done it. :) - Original Message - From: Santosh [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 3:58 PM Subject: searchhelp Hi, I am using the Lucene search engine for my application. I am able to search through text files and HTML files as specified by Lucene. Can you please clarify my doubts? 1. Can Lucene search through PDFs and Word documents? If yes, then how? 2. Can Lucene search through a database? If yes, then how? Thank you, Santosh ---SOFTPRO DISCLAIMER-- Information contained in this E-MAIL and any attachments are confidential being proprietary to SOFTPRO SYSTEMS is 'privileged' and 'confidential'. If you are not an intended or authorised recipient of this E-MAIL or have received it in error, You are notified that any use, copying or dissemination of the information contained in this E-MAIL in any manner whatsoever is strictly prohibited. Please delete it immediately and notify the sender by E-MAIL. In such a case reading, reproducing, printing or further dissemination of this E-MAIL is strictly prohibited and may be unlawful. SOFTPRO SYSTEMS does not REPRESENT or WARRANT that an attachment hereto is free from computer viruses or other defects. The opinions expressed in this E-MAIL and any ATTACHMENTS may be those of the author and are not necessarily those of SOFTPRO SYSTEMS. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: searchhelp
The PDF and Word stuff has been done too: have a look at http://www.zilverline.org. Michael Franken Chandan Tamrakar wrote: For PDF, you need to extract the text from the PDF files using the PDFBox library; for Word documents, you can use the Apache POI APIs. There are messages posted on the Lucene list related to your queries. About the database, I guess someone must have done it. :)
Re: searchhelp
I recently joined the list and have not gone through the previous mails. If you have any mails or related code, please forward them to me. - Original Message - From: Chandan Tamrakar [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 3:47 PM Subject: Re: searchhelp For PDF, you need to extract the text from the PDF files using the PDFBox library; for Word documents, you can use the Apache POI APIs. There are messages posted on the Lucene list related to your queries. About the database, I guess someone must have done it. :)
Re: searchhelp
Hi, Note that Lucene only provides an API for building a search engine; you can use it however you want. You can pass data for indexing in two forms: 1. java.lang.String 2. java.io.Reader. What Lucene receives is one of the two objects above. In the case of non-text documents, you need to extract the text from the documents and either write it to a text file and wrap that in a Reader object, or create a String object (for small files). For indexing database contents, you need to write your own code to get the data from the database (using JDBC/EJB etc.), convert the data to a String object, and pass it to Lucene for indexing. Again, Lucene is not responsible for getting the data from your application. It only indexes the data you give it. Also, for extracting contents from PDF and DOC files (generally known as straining), I know of two more tools: wvWare for Word documents and pdftotext (xpdf) for PDF documents. Google around and you will find a lot of links. Hope this helps. Thanks, George --- Santosh [EMAIL PROTECTED] wrote: I recently joined the list and have not gone through the previous mails. If you have any mails or related code, please forward them to me.
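The flow George describes (extract text yourself, then hand it over as a String or a Reader; flatten database rows into a String) can be sketched roughly as follows. This is only an illustration under stated assumptions: TinyIndex is a hypothetical in-memory stand-in so the example runs with the JDK alone, it is not the Lucene API, and rowToText assumes the row has already been read from the database (for example via JDBC) into a Map.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a search index; this is NOT the Lucene API.
class TinyIndex {
    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new TreeMap<>();

    // Small inputs can be handed over as a String ...
    void add(String docId, String text) {
        add(docId, new StringReader(text));
    }

    // ... larger ones as a Reader, consumed line by line.
    void add(String docId, Reader reader) {
        try (BufferedReader in = new BufferedReader(reader)) {
            String line;
            while ((line = in.readLine()) != null) {
                // Split on anything that is not a letter or digit.
                for (String token : line.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                    if (!token.isEmpty()) {
                        postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docId);
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the ids of all documents containing the term (case-insensitive).
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}

class IndexingSketch {
    // Flattens one database row (column name -> value) into a single string,
    // skipping NULL columns. Assumption: the caller filled the map from the database.
    static String rowToText(Map<String, Object> row) {
        StringBuilder sb = new StringBuilder();
        for (Object value : row.values()) {
            if (value != null) {
                if (sb.length() > 0) {
                    sb.append(' ');
                }
                sb.append(value);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();

        // Text already extracted from a non-text document, passed as a Reader.
        index.add("report.pdf", new StringReader("Quarterly sales report"));

        // One database row, flattened to a String.
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("id", 42);
        row.put("title", "Sales contact");
        row.put("notes", null);
        index.add("customers/42", rowToText(row));

        System.out.println(index.search("sales"));
    }
}
```

With real Lucene, the two add methods would be replaced by building a document whose field is created from the String or the Reader; the extraction and row-flattening steps stay in the application either way.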
Re: searchhelp
For PDF, you can refer to www.pdfbox.org; please also check the Apache POI project on the jakarta.apache.org site for indexing MS documents. - Original Message - From: Santosh [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 4:09 PM Subject: Re: searchhelp I recently joined the list and have not gone through the previous mails. If you have any mails or related code, please forward them to me.
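The advice in this thread (one library for PDF, another for MS documents) amounts to picking a text extractor per file type before indexing. A minimal sketch of that dispatch step is shown here; ExtractorRegistry and its methods are hypothetical names invented for illustration, and the PDF and Word extractors are deliberately left as plug-in points rather than guessed library calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical helper: picks a text extractor by file extension.
class ExtractorRegistry {
    private final Map<String, Function<byte[], String>> byExtension = new HashMap<>();

    // Registers an extractor for an extension such as "txt" (case-insensitive).
    void register(String extension, Function<byte[], String> extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    // Returns the extracted text, or empty if no extractor handles this file type.
    Optional<String> extract(String fileName, byte[] content) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return Optional.empty();
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        Function<byte[], String> extractor = byExtension.get(ext);
        return extractor == null ? Optional.empty() : Optional.of(extractor.apply(content));
    }
}

class ExtractorDemo {
    public static void main(String[] args) {
        ExtractorRegistry registry = new ExtractorRegistry();

        // Plain text needs no library: decode the bytes directly.
        registry.register("txt", bytes -> new String(bytes, StandardCharsets.UTF_8));

        // Assumption: "pdf" and "doc" extractors would wrap a PDF library and a
        // Word library respectively; they are not registered in this sketch.

        byte[] content = "hello lucene".getBytes(StandardCharsets.UTF_8);
        System.out.println(registry.extract("notes.TXT", content).orElse("(no extractor)"));
        System.out.println(registry.extract("manual.pdf", content).orElse("(no extractor)"));
    }
}
```

The text returned by extract is what would then be passed on to the indexer; files with no registered extractor can be skipped or logged.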
RE: searchhelp
JGURU FAQ http://www.jguru.com/faq/Lucene OFFICIAL FAQ http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi MAIL ARCHIVE http://www.mail-archive.com/[EMAIL PROTECTED]/ Hope this helps. -Original Message- From: Santosh [mailto:[EMAIL PROTECTED] Sent: 19 August 2004 11:25 To: Lucene Users List Subject: Re: searchhelp I recently joined the list and have not gone through the previous mails. If you have any mails or related code, please forward them to me.
Re: searchhelp
Thanks, everybody, but I didn't get any code or any real help from these links. Has anybody performed this kind of search before? If yes, then please send me the code, or tell me what code I have to add to my present Lucene code. - Original Message - From: David Townsend [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Thursday, August 19, 2004 4:17 PM Subject: RE: searchhelp JGURU FAQ http://www.jguru.com/faq/Lucene OFFICIAL FAQ http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi MAIL ARCHIVE http://www.mail-archive.com/[EMAIL PROTECTED]/ Hope this helps.
Re: searchhelp
As far as I remember, the PDFBox release includes some existing code to index PDFs with Lucene, based upon the demo created for Lucene 1.3. In fact, I think the code only works with Lucene 1.3; it has something to do with a change from arrays to vectors in Lucene 1.4. I may be wrong, though. http://www.csh.rit.edu/~ben/projects/pdfbox/javadoc/org/pdfbox/searchengine/lucene/package-summary.html