Modify the log directory for dih

2018-10-02 Thread lala
Hi, is there a way to set the log directory and the log file name for a DIH request? Thanks in advance... -- Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Modify the log directory for dih

2018-10-02 Thread lala
I know that Solr logs the DIH operations (and most other operations) in the server\logs\solr.log file. What I want is to configure the DIH requests to be logged to another path, with another file name if possible.

Re: Modify the log directory for dih

2018-10-02 Thread lala
Thanks a lot for your reply, Shawn. Regarding what you said: Shawn Heisey-2 wrote > With a change to the log4j configuration file, you can direct all logs > created by the DIH classes to a separate file, no code changes needed. Since I'm a newbie regarding log4j, can you please give me an example abou

Re: Modify the log directory for dih

2018-10-03 Thread lala
Hi, I am using: Solr: 7.4, OS: Windows 7. I start Solr using a service on startup. Additional info: I am developing a web application that uses Solr as its search engine; I use DIH to index folders in Solr using the FileListEntityProcessor. What I need is logging each index operation in a file that I ca

Re: Modify the log directory for dih

2018-10-09 Thread lala
Hi, I installed Solr as a service on Windows using nssm. The log4j2.xml file resides in solr\server\resources. I managed to direct logging of the class "LogUpdateProcessorFactory" to a new location (a sub-directory in logs). My problem now is: - I need to create a new log file for each DIH request
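
The approach quoted earlier in this thread (routing the DIH classes to their own file through the Log4j configuration) could look roughly like the fragment below for the log4j2.xml that Solr 7.4 ships in server\resources. This is a minimal sketch, not a tested Solr setup: the appender name "DihFile", the "dih" sub-directory, and the file name "dih.log" are hypothetical, and it assumes the solr.log.dir system property is set by whatever starts Solr (it may not be when Solr runs under a custom service wrapper). It sends everything from the DIH package to one file; it does not create a separate file per DIH request.

```xml
<!-- Goes inside the existing <Appenders> element.
     "DihFile", the "dih" sub-directory and "dih.log" are hypothetical names. -->
<RollingFile name="DihFile"
             fileName="${sys:solr.log.dir}/dih/dih.log"
             filePattern="${sys:solr.log.dir}/dih/dih.log.%i">
  <PatternLayout pattern="%d{yyyy-MM-dd HH:mm:ss.SSS} %-5p %c{1.} %m%n"/>
  <Policies>
    <!-- Roll the file over once it reaches this size -->
    <SizeBasedTriggeringPolicy size="32 MB"/>
  </Policies>
  <DefaultRolloverStrategy max="10"/>
</RollingFile>

<!-- Goes inside the existing <Loggers> element.
     additivity="false" keeps these messages out of the root logger's solr.log. -->
<Logger name="org.apache.solr.handler.dataimport" level="info" additivity="false">
  <AppenderRef ref="DihFile"/>
</Logger>
```

With additivity left at its default of true, the same messages would be written both to the new file and to solr.log.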

Solr dih extract text from inline images in pdf

2018-03-06 Thread lala
Hi, I am working with Solr 7, indexing multilingual files in a folder using DIH (FileListEntityProcessor for the basic entity and TikaEntityProcessor for the child entity in the configuration file). My problem lies here: I want to extract texts from images inside PDF files, that works fine

Re: Solr dih extract text from inline images in pdf

2018-03-07 Thread lala
Thanks for your reply, Erick. Actually I am using SolrJ to index files, among other operations with Solr, but to index a large number of different kinds of files, I'm sending a DIH request to Solr using the SolrJ API: FileListEntityProcessor with TikaEntityProcessor... Why not benefit from this technolog

Re: Solr dih extract text from inline images in pdf

2018-03-07 Thread lala
Thanks Charlie... It's just confusing for me. In the DIH configuration file, for the inner entity that takes "TikaEntityProcessor" as its processor, I can easily specify a tikaConfig attribute pointing to an XML file located inside the config folder of the core, and in this file I should be able to overr

Re: Solr dih extract text from inline images in pdf

2018-03-07 Thread lala
I don't know what the problem is: when the message is posted, the XML in it is not rendered correctly. It should contain ["<"param name="extractInlineImages" type="bool">true] AND ["<"param name="sortByPosition" type="bool">true]...
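
For reference, the two parameters named in the message above are PDFParser settings, and a Tika configuration file carrying them might look like the sketch below. The posted XML itself did not survive in this archive, so this is an assumed reconstruction based on Tika's general parser-configuration format, not the poster's actual file, and whether a given Solr and Tika version honors it through the tikaConfig attribute is not confirmed by the thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch of a Tika config file (the file name and location are up to the user). -->
<properties>
  <parsers>
    <!-- Keep the default parsers for everything except PDF -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <parser-exclude class="org.apache.tika.parser.pdf.PDFParser"/>
    </parser>
    <!-- Configure the PDF parser with the two parameters from the message -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <param name="extractInlineImages" type="bool">true</param>
        <param name="sortByPosition" type="bool">true</param>
      </params>
    </parser>
  </parsers>
</properties>
```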

Re: Fwd: configuring Solr with Tesseract

2018-05-28 Thread lala
Hi, can you please point me to "the discussion about how OCR can take minutes of CPU per page"? I really need to better understand Tika's OCR behavior with Solr.

Is there a way to force content extraction with a given encoding

2019-11-07 Thread lala
I am using the /update/extract request handler to push documents into Solr, but some text documents that are encoded as windows-1255 (arabic texts) are not extracted properly; the extracted text is not readable. I searched the web and the Solr documentation and found nothing. I need to send the file