[jira] [Commented] (TIKA-1401) occured infinite loop using tika library

2015-05-06 Thread Matthias Krueger (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14530284#comment-14530284
 ] 

Matthias Krueger commented on TIKA-1401:


I'm preparing a patch for this issue. I think we should avoid all DTD parsing 
in the XML root extractor.

I need some input on how important Android code compatibility is.

There are two options:

1. Continue to use SAX and enable the Xerces disallow-doctype-decl feature: 
https://xerces.apache.org/xerces2-j/features.html#disallow-doctype-decl
* This will prevent any XML containing a DTD from being parsed.
* Advantages: This will solve all DTD-related security issues. The code will 
still compile for Android.
* Disadvantage: Root element extraction will fail even for XML that is 
otherwise fine but happens to contain a DTD.

2. Re-implement using StAX and javax.xml.stream.supportDTD=false
* This causes the DTD part of the XML to be skipped (no exception is 
thrown).
* Advantages: This will solve all DTD-related security issues. Root elements 
can still be extracted from XML that contains a DTD (provided no entity is 
used in any of the root element's attribute values). The code is slightly more 
elegant because it avoids the ignorable exception.
* Disadvantage: StAX not supported on Android.
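
To make the trade-off concrete, here is a minimal sketch of both options using only JDK classes. It is not the proposed patch: the class name DtdSafeRootSketch and both method names are hypothetical, and it assumes the JDK's bundled parsers, which recognize the Xerces feature URI and the javax.xml.stream.supportDTD property. Both methods return null when no root element can be determined.

```java
import java.io.ByteArrayInputStream;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; not part of Tika.
final class DtdSafeRootSketch {

    // Option 1: SAX with DOCTYPE declarations disallowed.
    // Returns null if the document contains a DTD or cannot be parsed.
    static String rootWithSax(byte[] xml) {
        final String[] root = new String[1];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            factory.setFeature(
                    "http://apache.org/xml/features/disallow-doctype-decl", true);
            factory.newSAXParser().parse(new ByteArrayInputStream(xml),
                    new DefaultHandler() {
                        @Override
                        public void startElement(String uri, String local,
                                String qName, Attributes atts) throws SAXException {
                            root[0] = local;
                            // Stop parsing once the root is known
                            // (the "ignorable" exception).
                            throw new SAXException("root element found");
                        }
                    });
        } catch (Exception e) {
            // Either the deliberate abort above or a rejected DOCTYPE;
            // root[0] is still null in the latter case.
        }
        return root[0];
    }

    // Option 2: StAX with DTD support switched off.
    // The DTD is skipped and the root element is still reported.
    static String rootWithStax(byte[] xml) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, Boolean.FALSE);
            XMLStreamReader reader =
                    factory.createXMLStreamReader(new ByteArrayInputStream(xml));
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        return reader.getLocalName();
                    }
                }
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            // Malformed input, or an entity reference in the root element's
            // attribute values that can no longer be resolved.
        }
        return null;
    }
}
```

With a document that declares entities in an internal DTD subset, the SAX variant is expected to reject the input outright, while the StAX variant should still report the root element, which matches the disadvantage and advantage listed above.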


 occured infinite loop using tika library
 

 Key: TIKA-1401
 URL: https://issues.apache.org/jira/browse/TIKA-1401
 Project: Tika
  Issue Type: Bug
  Components: detector
Affects Versions: 1.5
Reporter: Robin.Hwang

 Hi
 1. Save the file with the following content as errorfile.xml
 {code}
 <?xml version="1.0"?>
 <!DOCTYPE billion [
 <!ELEMENT billion (#PCDATA)>
 <!ENTITY laugh0 
 

[jira] [Commented] (TIKA-1401) occured infinite loop using tika library

2015-03-20 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372172#comment-14372172
 ] 

Tyler Palsulich commented on TIKA-1401:
---

This still loops infinitely with Tika 1.8-SNAPSHOT.


[jira] [Commented] (TIKA-1401) occured infinite loop using tika library

2014-08-26 Thread Robin.Hwang (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110536#comment-14110536
 ] 

Robin.Hwang commented on TIKA-1401:
---

Thanks for the answer.
Please let me know once this is confirmed.

I am considering an alternative approach that runs detection in a separate 
thread with a timeout. I would like to hear your feedback.
The code is attached below.




import java.io.InputStream;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

import org.apache.tika.Tika;
import org.apache.tika.mime.MimeTypes;

public class TikaFileTypeDetector implements Callable<String> {

    // Maximum time to wait for detection, in seconds.
    private static final int TIMEOUT = 10;

    private final Tika tika = new Tika();

    private final InputStream inputStream;

    private TikaFileTypeDetector(InputStream inputStream) {
        this.inputStream = inputStream;
    }

    @Override
    public String call() throws Exception {
        return tika.detect(inputStream);
    }

    // Returns the detected type, or application/octet-stream on failure or timeout.
    public static String detect(InputStream inputStream) {
        if (inputStream == null) {
            return null;
        }
        String mimetype = MimeTypes.OCTET_STREAM;

        TikaFileTypeDetector detector = new TikaFileTypeDetector(inputStream);
        ExecutorService executor = Executors.newSingleThreadExecutor();

        Future<String> future = executor.submit(detector);

        try {
            mimetype = future.get(TIMEOUT, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            // Preserve the caller's interrupt status.
            Thread.currentThread().interrupt();
        } catch (ExecutionException e) {
            e.printStackTrace();
        } catch (TimeoutException e) {
            e.printStackTrace();
        } finally {
            // Interrupts the worker thread; a loop that never checks the
            // interrupt flag will keep running in the background.
            executor.shutdownNow();
        }

        return mimetype;
    }

}



[jira] [Commented] (TIKA-1401) occured infinite loop using tika library

2014-08-25 Thread Nick Burch (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14110294#comment-14110294
 ] 

Nick Burch commented on TIKA-1401:
--

At first glance, it looks like we might need to bring over the naughty xml 
protection we have in the xml parsing side to the xml detector as well. 
Hopefully one of our xml experts can be along shortly to confirm!
