Re: searchhelp

2004-08-19 Thread Chandan Tamrakar
For PDF, you need to extract the text from the PDF files using the PDFBox library, and
for Word documents you can use the Apache POI APIs. There are messages
posted on the Lucene list related to your queries. As for databases, I guess
someone must have done it. :)

- Original Message - 
From: Santosh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 3:58 PM
Subject: searchhelp


Hi,

I am using the Lucene search engine for my application.

I am able to search through text files and HTML files as described by Lucene.

Could you please clarify my doubts?

1. Can Lucene search through PDFs and Word documents? If yes, then how?

2. Can Lucene search through a database? If yes, then how?

Thank you,

santosh


---SOFTPRO DISCLAIMER--

Information contained in this E-MAIL and any attachments are
confidential being  proprietary to SOFTPRO SYSTEMS  is 'privileged'
and 'confidential'.

If you are not an intended or authorised recipient of this E-MAIL or
have received it in error, You are notified that any use, copying or
dissemination  of the information contained in this E-MAIL in any
manner whatsoever is strictly prohibited. Please delete it immediately
and notify the sender by E-MAIL.

In such a case reading, reproducing, printing or further dissemination
of this E-MAIL is strictly prohibited and may be unlawful.

SOFTPRO SYSYTEMS does not REPRESENT or WARRANT that an attachment
hereto is free from computer viruses or other defects.

The opinions expressed in this E-MAIL and any ATTACHEMENTS may be
those of the author and are not necessarily those of SOFTPRO SYSTEMS.




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: searchhelp

2004-08-19 Thread Zilverline info
The PDF and Word stuff has been done too: have a look at 
http://www.zilverline.org.

Michael Franken


Re: searchhelp

2004-08-19 Thread Santosh
I have only recently joined the list and have not gone through any previous mails. If
you have any mails or related code, please forward them to me.



Re: searchhelp

2004-08-19 Thread Honey George
Hi,
  Note that Lucene only provides an API for building a
search engine; you can use it however you want. You
can pass data to the indexer in two forms:
1. java.lang.String
2. java.io.Reader

What Lucene receives is one of the two objects above.
For non-text documents, you need to
extract the text from the documents and
either write it to a text file and open that with a Reader
or create a String object (for small files).
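
As a rough sketch of the two input forms described above, the helper below turns already-extracted text into either a String or a Reader. The class and method names (ExtractedText, asString, asReader) are hypothetical and not part of Lucene, and UTF-8 is an assumed encoding; only standard java.io and java.nio calls are used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, not part of Lucene: it only prepares the two input
// forms (String and Reader) from text that has already been extracted.
class ExtractedText {

    // Small documents: hold the whole extracted text in memory as a String.
    // Assumes the text file is UTF-8 encoded.
    static String asString(Path textFile) throws IOException {
        return new String(Files.readAllBytes(textFile), StandardCharsets.UTF_8);
    }

    // Large documents: stream the extracted text through a Reader instead.
    static Reader asReader(Path textFile) throws IOException {
        return Files.newBufferedReader(textFile, StandardCharsets.UTF_8);
    }

    // Text that is already in memory can also be exposed as a Reader.
    static Reader asReader(String text) {
        return new StringReader(text);
    }
}
```

Whichever form is produced would then be handed to the indexing code; which one fits better depends mostly on document size.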

For indexing database contents, you need to write your
own code to get the data from the database (using JDBC, EJB,
etc.), convert the data to a String object, and pass it
to Lucene for indexing.
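
A minimal sketch of the "convert the data to a String" step, assuming rows are read through JDBC. The RowText class and its method names are hypothetical. Joining column values with a single space is just one possible choice, and the resulting String is what would be passed on for indexing.

```java
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: flattens one database row into indexable text.
class RowText {

    // Copies the current row of a JDBC ResultSet into an ordered map of
    // column label -> value. The caller must already have called rs.next().
    static Map<String, Object> readRow(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        Map<String, Object> row = new LinkedHashMap<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
            row.put(meta.getColumnLabel(i), rs.getObject(i));
        }
        return row;
    }

    // Joins the non-null column values into a single space-separated String.
    static String toText(Map<String, Object> row) {
        StringJoiner joiner = new StringJoiner(" ");
        for (Object value : row.values()) {
            if (value != null) {
                joiner.add(value.toString());
            }
        }
        return joiner.toString();
    }
}
```

In practice it is often more useful to index each column under its own field name rather than as one combined string, but that depends on how the application needs to search.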

Again, Lucene is not responsible for getting the data
from your application. It only indexes the data you
give it.

Also, for extracting content from PDF and DOC
files (generally known as straining), I know of two more
tools:
wvWare - for Word documents
pdftotext (xpdf) - for PDF documents
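
One way to call pdftotext from Java is sketched below. It assumes the pdftotext binary is installed and on the PATH; passing "-" as the output file asks it to write the text to standard output, and "-enc UTF-8" requests UTF-8 output. The PdfToTextRunner class is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical wrapper around the external pdftotext command-line tool.
class PdfToTextRunner {

    // Builds the command line: pdftotext -enc UTF-8 <pdf> -
    static List<String> command(Path pdf) {
        return Arrays.asList("pdftotext", "-enc", "UTF-8", pdf.toString(), "-");
    }

    // Runs pdftotext and returns everything it wrote to standard output.
    static String extract(Path pdf) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command(pdf))
                .redirectError(ProcessBuilder.Redirect.INHERIT) // show tool errors on our stderr
                .start();
        ByteArrayOutputStream collected = new ByteArrayOutputStream();
        try (InputStream out = process.getInputStream()) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = out.read(buffer)) != -1) {
                collected.write(buffer, 0, read);
            }
        }
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("pdftotext exited with status " + exit);
        }
        return new String(collected.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

The returned String can then be indexed like any other text.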

Google around and you will find a lot of links.

Hope this helps.

Thanks,
   George




Re: searchhelp

2004-08-19 Thread Chandan Tamrakar
For PDF you can refer to www.pdfbox.org, and please check the Apache POI project
on the jakarta.apache.org site for indexing MS documents.




RE: searchhelp

2004-08-19 Thread David Townsend
JGURU FAQ
http://www.jguru.com/faq/Lucene

OFFICIAL FAQ
http://lucene.sourceforge.net/cgi-bin/faq/faqmanager.cgi

MAIL ARCHIVE
http://www.mail-archive.com/[EMAIL PROTECTED]/

hope this helps.





Re: searchhelp

2004-08-19 Thread Santosh
Thanks, everybody,

but I didn't find any code or any real help in these links.
Has anybody performed this kind of search before? If yes, then please send me the
code, or tell me what code I have to add to my present Lucene application.



Re: searchhelp

2004-08-19 Thread Reyhood Farhan
As far as I remember, the PDFBox release includes some existing code to 
index PDFs with Lucene, based on the demo created for Lucene 1.3. In 
fact, I think the code only works with Lucene 1.3, something to do with 
a change from arrays to vectors in Lucene 1.4. I may be wrong though. 

http://www.csh.rit.edu/~ben/projects/pdfbox/javadoc/org/pdfbox/searchengine/lucene/package-summary.html

