Hi,

I ran into a problem earlier: some files could not be detected from their 
content alone.
So I passed the file name, and that solved the problem.

But now I have the opposite problem: passing the file name caused detection 
to fail.

If that's the case, I need to make two attempts:
first try detection by content only (no file name),
and if that fails, try again with the file name.
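
A minimal sketch of that two-attempt flow, assuming an empty (whitespace-only) response means extraction failed. The helper name extract_text and the TIKA_URL variable are made up for illustration; the server address is the one from my earlier test.

```shell
# Sketch only: TIKA_URL and extract_text are hypothetical names, not part of Tika.
TIKA_URL="${TIKA_URL:-http://127.0.0.1:12000/tika}"

# Body runs in a subshell (parentheses), so "file" and "out" stay private.
extract_text() (
  file="$1"

  # Attempt 1: send no file name, so the server can only detect by content.
  out=$(curl -s -T "$file" "$TIKA_URL" --header "Accept: text/plain")

  # Assumption: a response with nothing but whitespace counts as a failure.
  if [ -z "$(printf '%s' "$out" | tr -d '[:space:]')" ]; then
    # Attempt 2: retry with the file name passed as a detection hint.
    out=$(curl -s -T "$file" "$TIKA_URL" --header "Accept: text/plain" \
      -H "Content-Disposition: attachment; filename=$(basename "$file")")
  fi

  printf '%s\n' "$out"
)
```

Usage would be something like `extract_text t1.xml`. This costs a second request for every file that fails content detection.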




[email protected]
 
From: Tilman Hausherr
Date: 2023-04-20 14:32
To: user
Subject: Re: Tika server extraction failed
Hi,

I don't see why this is a problem, and you're mentioning the solution yourself. 
If you want detection by content, then don't pass the filename.

Tilman

On 20.04.2023 08:19, [email protected] wrote:
Hi, Tilman

    I have encountered another problem.
    
    t1.xml is a simple plain text file, not a standard XML file.
    When I use Tika Server 2.7.0 to extract file content, the results are as follows:

curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.xml"
Result: fail (empty)    

curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.txt"
curl -T t1.xml http://127.0.0.1:12000/tika --header "Accept: text/plain" -H "Content-Disposition: attachment; filename=t1.docx"
Result: success

    The file name information affects the extraction result.

