Re: Indexing documents with SOLR
Pankaj,

Check out this article on how to get going with Nutch: http://bit.ly/dbBdK4

It is a few months old, so note that there is now a new parameter, called something like -SolrUrl, that lets you update your Solr index with the crawled data. To crawl your local file system, change the http:// URLs in your seed.txt file to file:// URLs that point to the directory you want to crawl.

Another VERY important option is to increase your Java heap size. I do this by using the JAVA_OPT environment variable.

Adam

On Sat, Dec 11, 2010 at 8:27 AM, pankaj bhatt panbh...@gmail.com wrote:

Hi Adam,

Thanks a lot for pointing me to Nutch. Can you please tell me whether Nutch can read a directory on the local system or on a shared file system? I will wait for your response.

Pankaj Bhatt

On Fri, Dec 10, 2010 at 9:35 PM, Adam Estrada estrada.a...@gmail.com wrote:

Nutch is also a great option if you want a crawler. I have found that you will need to use the latest version of PDFBox and its dependencies for better results. Also, make sure to set JAVA_OPT to something really large so that you won't exceed your heap size.

Adam

On Fri, Dec 10, 2010 at 6:27 AM, Tommaso Teofili tommaso.teof...@gmail.com wrote:

Hi Pankaj,

You can find the needed documentation right here [1]. Hope this helps,

Tommaso

[1]: http://wiki.apache.org/solr/ExtractingRequestHandler

2010/12/10 pankaj bhatt panbh...@gmail.com

Hi All,

I am new to SOLR and am trying to integrate TIKA + SOLR. Can anyone please guide me on how to achieve this?

My requirement is: I have a directory containing a lot of PDF and DOC files, and I need to search within these documents. I am using the SOLR web application.

I just need some sample XML for both solr-config.xml and the directory-schema.xml.

Eagerly awaiting your response.

Regards,
Pankaj Bhatt
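To illustrate Adam's point about switching the seed from http:// to file://, here is a minimal Python sketch that writes a seed.txt containing file:// URLs for local directories. The function names are hypothetical and not part of Nutch. Depending on the Nutch version, the file protocol plugin and the URL filter rules may also need adjusting, which this sketch does not cover.

```python
from pathlib import Path


def directory_to_seed_url(directory):
    """Return a file:// URL for a local directory, ending with a slash."""
    # resolve() makes the path absolute, which as_uri() requires.
    url = Path(directory).resolve().as_uri()
    return url if url.endswith("/") else url + "/"


def write_seed_file(directories, seed_path="seed.txt"):
    """Write one file:// URL per directory to the seed file and return the URLs."""
    urls = [directory_to_seed_url(d) for d in directories]
    Path(seed_path).write_text("\n".join(urls) + "\n", encoding="utf-8")
    return urls
```

Calling write_seed_file(["/path/to/docs"]) would produce a seed.txt with a single file:// line for that directory; the path here is only a placeholder.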
Indexing documents with SOLR
Hi All,

I am new to SOLR and am trying to integrate TIKA + SOLR. Can anyone please guide me on how to achieve this?

My requirement is: I have a directory containing a lot of PDF and DOC files, and I need to search within these documents. I am using the SOLR web application.

I just need some sample XML for both solr-config.xml and the directory-schema.xml.

Eagerly awaiting your response.

Regards,
Pankaj Bhatt
Re: Indexing documents with SOLR
Hi Pankaj,

You can find the needed documentation right here [1]. Hope this helps,

Tommaso

[1]: http://wiki.apache.org/solr/ExtractingRequestHandler
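As a rough illustration of what the ExtractingRequestHandler page describes, the Python sketch below walks a directory and posts each PDF or DOC file to Solr's extract handler, which hands the file to Tika. It assumes the handler is mapped at /update/extract in your Solr configuration, that Solr runs at http://localhost:8983/solr, and that the file path is an acceptable unique id. The function names are hypothetical.

```python
import mimetypes
import urllib.request
from pathlib import Path
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr URL


def build_extract_url(solr_base, doc_id, commit=True):
    """Build the extract-handler URL, passing the document id as literal.id."""
    params = {"literal.id": doc_id}
    if commit:
        params["commit"] = "true"
    return solr_base.rstrip("/") + "/update/extract?" + urlencode(params)


def find_documents(root, extensions=(".pdf", ".doc")):
    """Return all files under root whose extension matches, in sorted order."""
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )


def post_document(solr_base, path):
    """POST the raw file bytes to Solr and return the HTTP status code."""
    content_type = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    request = urllib.request.Request(
        build_extract_url(solr_base, str(path)),
        data=path.read_bytes(),
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

To index a whole directory, call post_document(SOLR_BASE, path) for each path returned by find_documents. Committing on every request keeps the sketch simple; for many files it is usually better to pass commit=False and commit once at the end.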
Re: Indexing documents with SOLR
Nutch is also a great option if you want a crawler. I have found that you will need to use the latest version of PDFBox and its dependencies for better results. Also, make sure to set JAVA_OPT to something really large so that you won't exceed your heap size.

Adam