RE: getLanguage returns "lt" if pdf-file contains only images

Ken Krugler Thu, 18 Dec 2014 06:41:44 -0800

Hi Sven,

From your email below, it seems you get 2 characters per page. Can you provide
details on what those characters are?


Thanks,

-- Ken

> From: Krüger, Sven
> Sent: June 25, 2014 6:22:52am PDT
> To: [email protected]
> Subject: getLanguage returns "lt" if pdf-file contains only images
> 
> Hello,
>  
> If a PDF file contains only graphics and no extractable text, getLanguage
> returns "lt".
>  
> Currently I can filter these files out because the length of the extracted
> content equals 2 * metadata.get("xmpTPg:NPages"), but I don't think this is
> how it is supposed to work.
>  
> Is there any way to get a value that indicates the probability of the
> detected language, or another way to get a proper (in this case, no) language?
> 
> Regards,
> Sven
>  

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
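Sven's workaround in the quoted message can be made a little more explicit. The following is a minimal Java sketch, not a Tika API. The class and method names are hypothetical, and it assumes the caller has already extracted the body text and read the page count from the "xmpTPg:NPages" metadata value. It skips language detection when the text contains no letters or digits, or when it is no longer than two characters per page.

```java
// Hypothetical helper (not part of Tika): flags extracted text that is
// probably residue from an image-only PDF, so language detection can be skipped.
public final class ImageOnlyCheck {

    private ImageOnlyCheck() {}

    /**
     * Returns true when the text has no letters or digits, or when its length
     * is at most 2 characters per page (the ratio reported in the thread).
     *
     * @param text  the extracted body text
     * @param pages the page count, e.g. parsed from the "xmpTPg:NPages" value
     */
    public static boolean looksImageOnly(String text, int pages) {
        // Without any letter or digit there is nothing to detect a language from.
        boolean hasLetterOrDigit = text.codePoints().anyMatch(Character::isLetterOrDigit);
        if (!hasLetterOrDigit) {
            return true;
        }
        // Fallback: the length heuristic from the original report.
        return pages > 0 && text.length() <= 2 * pages;
    }

    public static void main(String[] args) {
        // Assumed stand-in for metadata.get("xmpTPg:NPages").
        int pages = Integer.parseInt("3");
        // Assumed: whitespace-only output; the real characters are what Ken asks about.
        System.out.println(looksImageOnly("\n\n\n\n\n\n", pages));
        System.out.println(looksImageOnly("This page has real text.", pages));
    }
}
```

The letter-or-digit check does not depend on what the two characters per page turn out to be. The length check is kept only as a fallback for very short outputs.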
