Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-27 Thread Gary Taylor
/> </entity> </entity> </document> </dataConfig> So it's something related to BinFileDataSource and TikaEntityProcessor. Thanks, Gary. On 26/02/2015 14:24, Gary Taylor wrote: Alex, That's great. Thanks for the pointers. I'll try and get more info on this and file a JIRA issue. Kind

Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor
Alex, Same results on recursive=true / recursive=false. I also tried importing plain text files instead of epub (still using TikaEntityProcessor though) and get exactly the same result, i.e. all files fetched, but only one document indexed in Solr. With verbose output, I get a row for each

Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-26 Thread Gary Taylor
Alex, That's great. Thanks for the pointers. I'll try and get more info on this and file a JIRA issue. Kind regards, Gary. On 26/02/2015 14:16, Alexandre Rafalovitch wrote: On 26 February 2015 at 08:32, Gary Taylor g...@inovem.com wrote: Alex, Same results on recursive=true / recursive

Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor
. Thanks for any assistance / pointers. Regards, Gary -- Gary Taylor | www.inovem.com | www.kahootz.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston, Berkshire. RG20 8JE kahootz.com is a trading name of INOVEM Ltd.

Re: Can't index all docs in a local folder with DIH in Solr 5.0.0

2015-02-25 Thread Gary Taylor
, URPs and even a newsletter: http://www.solr-start.com/ On 25 February 2015 at 11:14, Gary Taylor g...@inovem.com wrote: I can't get the FileListEntityProcessor and TikaEntityProcessor to correctly add a Solr document for each epub file in my local directory. I've just downloaded Solr 5.0.0

Re: tika integration exception and other related queries

2011-06-09 Thread Gary Taylor
Naveen, Not sure our requirement matches yours, but one of the things we index is a comment item that can have one or more files attached to it. To index the whole thing as a single Solr document we create a zipfile containing a file with the comment details in it and any additional
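
The zip-bundling approach described above can be sketched roughly as follows. This is only an illustration in Python, not the code used in the thread: the function name `build_comment_zip`, the entry name `comment.txt`, and the sample attachment names are all assumptions made for the example. It builds the archive in memory; sending it to Solr is left out.

```python
import io
import zipfile


def build_comment_zip(comment_text, attachments):
    """Bundle a comment and its attached files into one in-memory zip.

    `attachments` maps a file name to its raw bytes. The entry name
    "comment.txt" is a hypothetical choice for this sketch.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # The comment details go in as their own file inside the archive.
        zf.writestr("comment.txt", comment_text)
        # Each attachment is added alongside it under its own name.
        for name, data in attachments.items():
            zf.writestr(name, data)
    return buf.getvalue()


payload = build_comment_zip(
    "Looks good to me.",
    {"doc1.txt": b"first attachment", "doc2.txt": b"second attachment"},
)
names = zipfile.ZipFile(io.BytesIO(payload)).namelist()
print(names)
```

The resulting bytes would then be posted as a single file, so that the comment and its attachments end up in one Solr document.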

Re: tika integration exception and other related queries

2011-06-08 Thread Gary Taylor
Naveen, For indexing Zip files with Tika, take a look at the following thread : http://lucene.472066.n3.nabble.com/Extracting-contents-of-zipped-files-with-Tika-and-Solr-1-4-1-td2327933.html I got it to work with the 3.1 source and a couple of patches. Hope this helps. Regards, Gary. On

Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-23 Thread Gary Taylor
Jayendra, I cleared out my local repository, and replayed all of my steps from Friday and now it works. The only difference (or the only one that's obvious to me) was that I applied the patch before doing a full compile/test/dist. But I assumed that given I was seeing my new log entries

Re: Extracting contents of zipped files with Tika and Solr 1.4.1 (now Solr 3.1)

2011-05-20 Thread Gary Taylor
grateful. Thanks and kind regards, Gary. On 11/04/2011 11:12, Gary Taylor wrote: Jayendra, Thanks for the info - been keeping an eye on this list in case this topic cropped up again. It's currently a background task for me, so I'll try and take a look at the patches and re-test soon. Joey

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-04-11 Thread Gary Taylor
message, some people have been able to get this functionality to work as desired. -- Gary Taylor INOVEM Tel +44 (0)1488 648 480 Fax +44 (0)7092 115 933 gary.tay...@inovem.com www.inovem.com INOVEM Ltd is registered in England and Wales No 4228932 Registered Office 1, Weston Court, Weston

Re: adding a document using curl

2011-03-03 Thread Gary Taylor
As an example, I run this in the same directory as the msword1.doc file: curl "http://localhost:8983/solr/core0/update/extract?literal.docid=74&literal.type=5" -F "file=@msword1.doc" The type literal is just part of my schema. Gary. On 03/03/2011 11:45, Ken Foskey wrote: On Thu, 2011-03-03
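
Each literal.* parameter in a request URL like the one above has to be joined to the next with an `&`, and the whole URL quoted when passed to a shell. A minimal Python sketch of building that query string is shown below; the helper name `extract_url` is hypothetical, and the host, core name and field values are simply the ones from the message.

```python
from urllib.parse import urlencode


def extract_url(base, literals):
    """Build an /update/extract URL with literal.* parameters.

    `base` is the core URL; `literals` maps schema field names to values.
    (Hypothetical helper for illustration only.)
    """
    # Prefix each field with "literal." so Solr stores it as given.
    params = {f"literal.{field}": value for field, value in literals.items()}
    # urlencode joins the pairs with "&" and escapes values where needed.
    return f"{base}/update/extract?{urlencode(params)}"


url = extract_url("http://localhost:8983/solr/core0", {"docid": 74, "type": 5})
print(url)
```

The printed URL can be pasted into the curl command in double quotes, with the file still supplied via -F.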

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-31 Thread Gary Taylor
Can anyone shed any light on this, and whether it could be a config issue? I'm now using the latest SVN trunk, which includes the Tika 0.8 jars. When I send a ZIP file (containing two txt files, doc1.txt and doc2.txt) to the ExtractingRequestHandler, I get the following log entry (formatted

Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
Hi, I posted a question in November last year about indexing content from multiple binary files into a single Solr document and Jayendra responded with a simple solution to zip them up and send that single file to Solr. I understand that the Tika 0.4 JARs supplied with Solr 1.4.1 don't

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
Thanks Erlend. Not used SVN before, but have managed to download and build latest trunk code. Now I'm getting an error when trying to access the admin page (via Jetty) because I specify HTMLStripStandardTokenizerFactory in my schema.xml, but this appears to be no longer supplied as part of

Re: Extracting contents of zipped files with Tika and Solr 1.4.1

2011-01-25 Thread Gary Taylor
the filenames and contents. Should I be able to index the contents of files stored in a zip by using extract ? Thanks and kind regards, Gary. On 25/01/2011 15:32, Gary Taylor wrote: Thanks Erlend. Not used SVN before, but have managed to download and build latest trunk code. Now I'm getting

Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
Hi, We're trying to use Solr to replace a custom Lucene server. One requirement we have is to be able to index the content of multiple binary files into a single Solr document. For example, a uniquely named object in our app can have multiple attached-files (eg. Word, PDF etc.), and we

Re: Extracting and indexing content from multiple binary files into a single Solr document

2010-11-17 Thread Gary Taylor
to the ExtractingRequestHandler for indexing and included as a part of single Solr document. Regards, Jayendra On Wed, Nov 17, 2010 at 6:27 AM, Gary Taylor <g...@inovem.com> wrote: > Hi, > > We're trying to use Solr to replace a custom Lucene server. One > requirement we have