Hi Sven,

From your email below, it seems like you get 2 characters per page - can you provide details on what those are?

Thanks,

-- Ken

> From: Krüger, Sven
> Sent: June 25, 2014 6:22:52am PDT
> To: [email protected]
> Subject: getLanguage returns "lt" if pdf-file contains only images
>
> Hello,
>
> if a pdf-file only contains graphics without extractable text, getLanguage
> returns "lt".
>
> Currently I can filter that because the length of the extracted content is 2
> * metadata.get("xmpTPg:NPages") - but I don't think this is supposed to work
> that way.
>
> Is there any way to get a value that indicates the probability of the
> detected language or another way to get a proper (in this case no) language?
>
> Regards
> Sven
>

--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr
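
For reference, a less brittle variant of the length check described above might count letters instead of comparing against 2 * NPages. This is only a minimal sketch: the class TextGuard, its methods, and the threshold are hypothetical names chosen for illustration and are not part of Tika. It does not answer what the two characters per page actually are.

```java
// Hypothetical helper, not part of Tika: decides whether extracted text
// contains enough letters to make language detection worth attempting.
final class TextGuard {
    private TextGuard() {}

    /** Counts code points that are letters in any script. */
    public static int letterCount(String content) {
        if (content == null) {
            return 0;
        }
        return (int) content.codePoints().filter(Character::isLetter).count();
    }

    /**
     * Returns true if the content has at least minLetters letters.
     * The threshold is an assumption the caller has to tune.
     */
    public static boolean hasEnoughText(String content, int minLetters) {
        return letterCount(content) >= minLetters;
    }
}
```

A caller would run this on the extracted content first and skip getLanguage (treating the language as unknown) when hasEnoughText returns false. In Tika 1.x, LanguageIdentifier also offers isReasonablyCertain(), which may be worth checking alongside getLanguage().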
