Hi, I ran into a problem before: some files could not be detected from their content alone. Adding the file name solved that problem.
But now I have another problem: adding the file name actually causes detection to fail. If that's the case, I need to make two attempts: try content detection first, and then try file name detection.

[email protected]

From: Tilman Hausherr
Date: 2023-04-20 14:32
To: user
Subject: Re: Tika server extraction failed

Hi,

I don't see why this is a problem, and you're mentioning the solution yourself. If you want detection by content, then don't pass the filename.

Tilman

On 20.04.2023 08:19, [email protected] wrote:

Hi, Tilman

I have encountered another problem. t1.xml is a simple plain text file, not a standard XML file. When I use Tika Server 2.7.0 to extract file content, the results are as follows:

curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.xml"
Result: fail (empty)

curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.txt"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.docx"
Result: success

The file name information affects the extraction result.
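
For reference, the two-attempt approach mentioned at the top of this thread (content detection first, then file name detection) could be sketched as a small shell wrapper around the same curl calls. This is only a sketch under assumptions: the function names and the TIKA_URL variable are made up for illustration, the server address is the one from the commands in this thread, and an empty response is treated as a failed extraction, which matches the "fail (empty)" result reported here but may not cover every failure mode.

```shell
# Assumption: same Tika Server endpoint as in the commands in this thread.
TIKA_URL="${TIKA_URL:-http://127.0.0.1:12000/tika}"

# Attempt 1: send the file without a filename, so only the content is used.
extract_by_content() {
    curl -s -T "$1" "$TIKA_URL" --header "Accept: text/plain"
}

# Attempt 2: send the file with its name as a hint in Content-Disposition.
extract_with_name() {
    curl -s -T "$1" "$TIKA_URL" --header "Accept: text/plain" \
        -H "Content-Disposition: attachment; filename=$(basename "$1")"
}

# Try content detection first; if the result is empty, retry with the filename.
extract_with_fallback() {
    text=$(extract_by_content "$1")
    if [ -z "$text" ]; then
        text=$(extract_with_name "$1")
    fi
    printf '%s\n' "$text"
}
```

Calling `extract_with_fallback t1.xml` would print the text from whichever attempt returns something first. For t1.xml, the results above suggest the first attempt (no filename) should already succeed, so the filename is only sent for files whose content alone yields nothing.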
