Re: Help with a DIH config file

2019-03-16 Thread Jörn Franke
You have to specify the option recursive="true" on the "files" entity.
On Fri, Mar 15, 2019 at 7:59 PM wclarke wrote:
> One last question.
>
> I have everything running as it should finally. However, when I pull out of
> testing to do the entire directory it's just cycling through. The
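For readers following along, the recursive option mentioned above might look like the sketch below. It is only an illustration, not the poster's actual config (which is not reproduced in this thread): the baseDir path, the fileName pattern, the entity names "files" and "tika", the data source name "bin", and the target field names are all assumptions.

```xml
<dataConfig>
  <!-- BinFileDataSource hands the raw file bytes to Tika -->
  <dataSource type="BinFileDataSource" name="bin"/>
  <document>
    <!-- Outer entity lists the files; recursive="true" descends into subfolders.
         baseDir and fileName are placeholders, not values from the thread. -->
    <entity name="files" processor="FileListEntityProcessor"
            dataSource="null"
            baseDir="/path/to/docs" fileName=".*\.(pdf|docx?)"
            recursive="true" rootEntity="false">
      <field column="fileAbsolutePath" name="id"/>
      <!-- Inner entity parses each listed file with Tika -->
      <entity name="tika" processor="TikaEntityProcessor"
              dataSource="bin" url="${files.fileAbsolutePath}" format="text">
        <field column="text" name="content"/>
      </entity>
    </entity>
  </document>
</dataConfig>
```

Without recursive="true", the FileListEntityProcessor only looks at files directly inside baseDir, which would match the symptom of a directory full of subfolders producing no documents.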

Re: Help with a DIH config file

2019-03-16 Thread Jörn Franke
https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#the-filelistentityprocessor
On Sun, Mar 17, 2019 at 1:32 AM Jörn Franke wrote:
> You have to specify the option recursive=true on the entity files
>
> On Fri, Mar 15, 2019 at 7:59 PM

Re: Help with a DIH config file

2019-03-15 Thread wclarke
One last question. I have everything running as it should finally. However, when I pull out of testing to do the entire directory it's just cycling through. The directory is full of folders that have the documents in them. Do I need an html or other file sitting in there randomly to get it to

Re: Help with a DIH config file

2019-03-15 Thread wclarke
Thanks! That fixed it. -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Help with a DIH config file

2019-03-15 Thread Tim Allison
n this instance).
>
> Any suggestions would be greatly appreciated!
>
> thanks,
> Demian
>
> -----Original Message-----
> From: Jörn Franke
> Sent: Friday, March 15, 2019 4:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Help with a DIH config file
>
> Do you have

Re: Help with a DIH config file

2019-03-15 Thread Jörn Franke
ssage-
> From: Jörn Franke
> Sent: Friday, March 15, 2019 4:18 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Help with a DIH config file
>
> Do you have an exception?
> It could be that the pdf is broken - can you open it on your computer with a
> pdfreader?
>
> If t

RE: Help with a DIH config file

2019-03-15 Thread Demian Katz
@lucene.apache.org
Subject: Re: Help with a DIH config file
Do you have an exception? It could be that the pdf is broken - can you open it on your computer with a pdfreader? If the exception is related to Tika and pdf then file an issue with the pdfbox project. If there is an issue with Tika

Re: Help with a DIH config file

2019-03-15 Thread Jörn Franke
Do you have an exception? It could be that the PDF is broken - can you open it on your computer with a PDF reader? If the exception is related to Tika and PDF, then file an issue with the PDFBox project. If there is an issue with Tika and MS Office documents, then Apache POI is the right project

Re: Help with a DIH config file

2019-03-15 Thread wclarke
Thank you so much. You helped a great deal. I am running into one last issue where the Tika DIH is stopping at a specific language and fails there (Malayalam). Do you know of a workaround?
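The visible part of the thread does not settle on a workaround for this. One option that DIH entities document, offered here only as an assumption about what might help, is the onError attribute, which lets an import carry on past a document that throws. The entity name, data source name and field names below are hypothetical.

```xml
<!-- Fragment of the inner Tika entity only. onError="skip" skips the document
     that raised the exception instead of aborting the whole import. -->
<entity name="tika" processor="TikaEntityProcessor"
        dataSource="bin" url="${files.fileAbsolutePath}" format="text"
        onError="skip">
  <field column="text" name="content"/>
</entity>
```

This only hides the failing files from the import; the underlying exception would still be worth capturing from the Solr log and reporting.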

Re: Help with a DIH config file

2019-03-14 Thread Jörn Franke
Sorry for my late reply, and thanks for sharing. Yes, this is possible. Maybe my last mail was confusing; I hope the examples below help. Alternative 1 - Use only DIH without an update processor. tika-data-config-2.xml - add the transformer in the entity and the transformation in the field (here done for id and for

Re: Help with a DIH config file

2019-03-13 Thread wclarke
Got each one working individually, but not several together. Is it possible? Please see attached files. Thanks!!! tika-data-config-2.xml solrconfig.xml

Re: Help with a DIH config file

2019-03-13 Thread wclarke
I didn't know I could do an updateProcessorChain and call it in the config file. I tried doing it in solrconfig.xml, but it just wouldn't take. I will try this, though. Thanks! The value is the file path in id/url.
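As a rough sketch of what "calling" an update chain for DIH can look like: the chain is defined in solrconfig.xml and selected through the update.chain parameter in the /dataimport handler defaults. The chain name "fix-id", the field name and the regex are hypothetical, since the actual transformation wanted on the file path is not shown in the thread.

```xml
<!-- solrconfig.xml: the chain name, field name and pattern are placeholders -->
<updateRequestProcessorChain name="fix-id">
  <processor class="solr.RegexReplaceProcessorFactory">
    <str name="fieldName">id</str>
    <str name="pattern">^/path/to/docs</str>
    <str name="replacement">/docs</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>

<requestHandler name="/dataimport" class="solr.DataImportHandler">
  <lst name="defaults">
    <str name="config">tika-data-config-2.xml</str>
    <!-- route the documents produced by DIH through the chain above -->
    <str name="update.chain">fix-id</str>
  </lst>
</requestHandler>
```

A chain that is defined but never referenced by the handler (or marked as the default chain) is not applied, which is one possible reason a chain "wouldn't take".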

Re: Help with a DIH config file

2019-03-13 Thread wclarke
Absolutely! I attached it to the original message, but I can post it here too. I am VERY new to Solr and am winging it, and while the documentation has been a little helpful, I just need more complex examples. tika-data-config-2.xml

Re: Help with a DIH config file

2019-03-12 Thread Jörn Franke
One addition: you can also strip HTML in DIH using the HTMLStripTransformer: https://wiki.apache.org/solr/DataImportHandler#HTMLStripTransformer That way you can probably live without an UpdateRequestProcessorChain.
On Tue, Mar 12, 2019 at 10:24 PM Jörn Franke wrote:
> Would it be possible
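A minimal sketch of the HTMLStripTransformer suggestion, assuming a Tika entity whose "text" column contains markup: the transformer is declared on the entity and switched on per field with stripHTML="true". The entity name, data source name, format and target field name are illustrative only.

```xml
<!-- format="html" is assumed here so that the column actually contains markup -->
<entity name="tika" processor="TikaEntityProcessor"
        dataSource="bin" url="${files.fileAbsolutePath}" format="html"
        transformer="HTMLStripTransformer">
  <!-- stripHTML="true" removes the tags from this column before indexing -->
  <field column="text" name="content" stripHTML="true"/>
</entity>
```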

Re: Help with a DIH config file

2019-03-12 Thread Jörn Franke
Would it be possible to share the DIH config file? I am not sure I understand all your points correctly. Ad 1) Is this about a value in a field? Then use the RegexTransformer: https://wiki.apache.org/solr/DataImportHandler#RegexTransformer Alternatively, use a RegexReplaceProcessorFactory in
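To illustrate the RegexTransformer route, here is a hedged sketch that rewrites the file path before it is stored as the id. The pattern, replacement, paths and entity name are hypothetical; only the transformer attribute and the regex, replaceWith and sourceColName field attributes come from the DIH documentation linked above.

```xml
<entity name="files" processor="FileListEntityProcessor"
        dataSource="null" baseDir="/path/to/docs" fileName=".*\.pdf"
        recursive="true" rootEntity="false"
        transformer="RegexTransformer">
  <!-- Read fileAbsolutePath, replace the matched prefix, store the result in id -->
  <field column="id" sourceColName="fileAbsolutePath"
         regex="^/path/to/docs" replaceWith="/docs"/>
  <!-- nested Tika entity omitted for brevity -->
</entity>
```

Doing the rewrite in DIH keeps everything in one config file; the update processor alternative does the same job at index time for documents from any source.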