Re: Indexing documents with SOLR

2010-12-11 Thread Adam Estrada
Pankaj,

Check out this article on how to get going with Nutch: http://bit.ly/dbBdK4
It is a few months old, so note that there is now a new parameter, called
something like -SolrUrl, that lets you update your solr index with the
crawled data.

To crawl your local file system, change the http:// to file:// in your
seed.txt file so that it points to the directory you want to crawl.
Another VERY important option is to increase your Java heap size. I do this
by using the JAVA_OPT environment variable.
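
As a rough illustration of the seed.txt change described above, here is a
minimal Python sketch that writes a one-line seed file containing the
file:// URL of a local directory. The function name write_seed_file and the
example paths are made up for this sketch; only the seed.txt name and the
file:// scheme come from the message above.

```python
from pathlib import Path


def write_seed_file(crawl_dir: str, seed_path: str) -> str:
    """Write a one-line seed file pointing at a local directory.

    Returns the file:// URL that was written.
    """
    # resolve() makes the path absolute, which as_uri() requires.
    url = Path(crawl_dir).resolve().as_uri()
    # One URL per line, as in a typical seed list.
    Path(seed_path).write_text(url + "\n", encoding="utf-8")
    return url
```

For example, write_seed_file("/data/docs", "seed.txt") would put
file:///data/docs into seed.txt on a Unix-like system.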

Adam

On Sat, Dec 11, 2010 at 8:27 AM, pankaj bhatt panbh...@gmail.com wrote:

 Hi Adam,
 Thanks a lot for pointing me to NUTCH.
 Can you please tell me whether NUTCH can read a directory on the local
 system or on a shared file system?

 I will wait for your response.

 / Pankaj Bhatt



Indexing documents with SOLR

2010-12-10 Thread pankaj bhatt
Hi All,
I am a newbie to SOLR and am trying to integrate TIKA with SOLR.
Can anyone please guide me on how to achieve this?

*My requirement is:* I have a directory containing a lot of PDF and DOC
files, and I need to search within those documents. I am using the SOLR web
application.

I just need some sample XML for both solrconfig.xml and schema.xml.
I eagerly await your response.

Regards,
Pankaj Bhatt.


Re: Indexing documents with SOLR

2010-12-10 Thread Tommaso Teofili
Hi Pankaj,
you can find the needed documentation right here [1].
Hope this helps,
Tommaso

[1] : http://wiki.apache.org/solr/ExtractingRequestHandler
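
For readers who want to see what a call to the handler documented in [1]
might look like, here is a minimal Python sketch that builds (but does not
send) a POST request carrying one document. It assumes the handler is mapped
at /update/extract as in the wiki examples, that the schema's unique key
field is named id, and that Solr is reachable at a URL such as
http://localhost:8983/solr; none of these details come from this thread, and
the function name build_extract_request is made up for the sketch.

```python
import urllib.parse
import urllib.request
from pathlib import Path


def build_extract_request(solr_url: str, file_path: str, doc_id: str,
                          content_type: str = "application/pdf") -> urllib.request.Request:
    """Build a POST request that sends one file to /update/extract."""
    # literal.id sets the document's id field; commit=true makes the
    # document searchable right after it is indexed.
    params = urllib.parse.urlencode({"literal.id": doc_id, "commit": "true"})
    url = solr_url.rstrip("/") + "/update/extract?" + params
    # The raw file bytes are sent as the request body.
    data = Path(file_path).read_bytes()
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen() would send it. To
index a whole directory, one could loop over its files and use each file
name as the doc_id.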



Re: Indexing documents with SOLR

2010-12-10 Thread Adam Estrada
Nutch is also a great option if you want a crawler. I have found that you
will need to use the latest version of PDFBox and its dependencies for
better results. Also, make sure to set JAVA_OPT to something really large so
that you won't exceed your heap size.

Adam
