[Tika Wiki] Update of "MockParser" by TimothyAllison

2017-02-16 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Tika Wiki" for change 
notification.

The "MockParser" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/MockParser?action=diff=3=4

  
  Please note that for 3., permanent hangs -- you cannot terminate the Thread.  
Thread's ''stop'', ''suspend'', ''destroy'' sound like they'll do the trick, 
but they won't. '''You need to kill the entire process.'''
  
- As of Tika 1.15, we added a MockParser in the tika-core-tests.jar that will 
allow you to test your framework against 1-3.  Simply add that jar to your 
class path and then include a  xml file in your set of test documents, 
and crash, crash away.
+ As of Tika 1.15, we added a MockParser in the tika-core-tests.jar that will 
allow you to test your framework against items 1-3.  Simply add that jar to 
your class path and then include a  xml file in your set of test 
documents, and crash, crash away.
  
  == Usage ==
  
@@ -36, +36 @@

  
  === Your Framework ===
  Place the tika-core-tests.jar on your class path (NOT IN PRODUCTION!!!) and 
then add some mock.xml files to your batch of documents.
- 
  
  
  === Mock options ===


[Tika Wiki] Update of "MockParser" by TimothyAllison

2017-02-16 Thread Apache Wiki

The "MockParser" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/MockParser?action=diff=1=2

  == Background ==
  So, you've tried Tika on a couple of files and all works well.  Problem 
solved!
  
+ No. 
+ 
- No. In very rare cases, Tika can do some really bad things.  We try to fix 
these problems when we can, but if history is any indication (e.g. 
[[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]), if you are 
processing millions of files, you'll need to defend against:
+ In very rare cases, Tika can do some really bad things.  We try to fix these 
problems when we can, but if history is any indication (e.g. 
[[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]), if you are 
processing millions/billions of files from the wild, you'll need to defend 
against:
  
   1. Regular catchable exceptions
   2. !OutOfMemory errors which can put the jvm in an unreliable state
@@ -24, +26 @@

  `java -cp "bin/*" org.apache.tika.TikaCLI mock_example.xml`
  
  === Tika-server ===
- Place the tika-server.jar and the tika-core-tests.jar in a "bin directory.
+ Place the tika-server.jar and the tika-core-tests.jar in a "bin" directory.
  
- `java -cp "serverbin/*" org.apache.tika.server.TikaServerCli`
+ `java -cp "bin/*" org.apache.tika.server.TikaServerCli`
+ 
+ Then curl away:
+ 
+ `curl -T mock_example.xml http://localhost:9998/rmeta/text`
  
  === Your Framework ===
  Place the tika-core-tests.jar on your class path (NOT IN PRODUCTION!!!) and 
then add some mock.xml files to your batch of documents.
  
  
- 
- Then curl away:
- 
- `curl -T mock_example.xml http://localhost:9998/rmeta/text`
  
  === Mock options ===
  See the mock example.xml file in 
tika-parsers/src/test/resources/test-documents/mock.  
@@ -84, +86 @@

  
  
  ``
+ == References ==
+  1. [[http://openpreservation.org/blog/2014/03/21/tika-ride-characterising-web-content-nanite/|Tika to Ride]]
+  2. [[http://events.linuxfoundation.org/sites/events/files/slides/TikaEval_ACNA15_allison_herceg_v2.pdf|Evaluating Text Extraction]]
  


[Tika Wiki] Update of "MockParser" by TimothyAllison

2017-02-16 Thread Apache Wiki

The "MockParser" page has been changed by TimothyAllison:
https://wiki.apache.org/tika/MockParser

New page:
= MockParser =

== Background ==
So, you've tried Tika on a couple of files and all works well.  Problem solved!

No. In very rare cases, Tika can do some really bad things.  We try to fix 
these problems when we can, but if history is any indication (e.g. 
[[https://issues.apache.org/jira/browse/TIKA-1132|TIKA-1132]]), if you are 
processing millions of files, you'll need to defend against:

 1. Regular catchable exceptions
 2. !OutOfMemory errors, which can put the JVM in an unreliable state
 3. Permanent hangs (Tika can chew up massive amounts of resources and go 
''forever'')
 4. Security vulnerabilities (e.g. 
[[http://seclists.org/bugtraq/2016/Nov/40|CVE-2016-6809]] and 
[[http://seclists.org/oss-sec/2016/q2/413|CVE-2016-4434]])

Please note that for item 3 (permanent hangs), you cannot terminate the 
thread.  Thread's ''stop'', ''suspend'' and ''destroy'' sound like they'll do 
the trick, but they won't. '''You need to kill the entire process.'''
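
One way to make "kill the entire process" concrete is to run each parse in a child JVM and let the parent enforce a timeout. The sketch below is only an illustration and not part of Tika: the class name `ParseWatchdog`, the `Outcome` enum and the 60 second limit are assumptions, and the command line simply reuses the tika-app invocation from the Usage section.

```java
import java.io.File;
import java.io.IOException;
import java.util.List;
import java.util.concurrent.TimeUnit;

public class ParseWatchdog {

    /** Result of one watched child process (hypothetical names). */
    public enum Outcome { COMPLETED, FAILED, TIMED_OUT }

    /**
     * Runs the command in a separate process and forcibly kills that
     * process if it has not exited within timeoutMillis.
     */
    public static Outcome runWithTimeout(List<String> command, long timeoutMillis)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Discard child output so a full pipe buffer cannot block the child.
        pb.redirectErrorStream(true);
        pb.redirectOutput(ProcessBuilder.Redirect.DISCARD);
        Process process = pb.start();
        if (!process.waitFor(timeoutMillis, TimeUnit.MILLISECONDS)) {
            // The child is presumed hung: kill the whole process, not a thread.
            process.destroyForcibly();
            process.waitFor();
            return Outcome.TIMED_OUT;
        }
        return process.exitValue() == 0 ? Outcome.COMPLETED : Outcome.FAILED;
    }

    public static void main(String[] args) throws Exception {
        String java = System.getProperty("java.home")
                + File.separator + "bin" + File.separator + "java";
        // Assumption: jars live in "bin" as described in the Usage section.
        String file = args.length > 0 ? args[0] : "mock_example.xml";
        List<String> command =
                List.of(java, "-cp", "bin/*", "org.apache.tika.TikaCLI", file);
        System.out.println(runWithTimeout(command, 60_000));
    }
}
```

The parent never tries to stop a thread; it only waits on the child and destroys it when the deadline passes, which is the one remedy that works for a permanent hang.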

As of Tika 1.15, we added a MockParser in the tika-core-tests.jar that will 
allow you to test your framework against items 1-3.  Simply add that jar to 
your class path, include a mock XML file in your set of test documents, 
and crash, crash away.
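
When you test a framework against items 1-3, it helps to record which of the three failure modes each document triggered. The following is a rough sketch under stated assumptions: `OutcomeClassifier`, its `Outcome` enum and the `parseTask` callable are hypothetical names, and the callable stands in for whatever call your framework makes into Tika.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class OutcomeClassifier {

    /** Hypothetical labels for items 1-3 plus the success case. */
    public enum Outcome { OK, EXCEPTION, OOM, TIMEOUT }

    public static Outcome classify(Callable<?> parseTask, long timeoutMillis)
            throws InterruptedException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a hung worker must not keep the JVM alive
            return t;
        });
        Future<?> future = executor.submit(parseTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // cancel(true) only interrupts; a truly hung parser may ignore it.
            future.cancel(true);
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            // Errors thrown by the task arrive wrapped as the cause.
            return (e.getCause() instanceof OutOfMemoryError)
                    ? Outcome.OOM : Outcome.EXCEPTION;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(classify(() -> "parsed", 1000));
        System.out.println(classify(() -> { throw new java.io.IOException("bad file"); }, 1000));
    }
}
```

This only classifies what happened. After an `OOM` or `TIMEOUT` result the JVM should be treated as unreliable and restarted, because the worker thread itself cannot be terminated.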

== Usage ==

=== Tika-app ===
Place the tika-app.jar and the tika-core-tests.jar in a "bin" directory.

`java -cp "bin/*" org.apache.tika.TikaCLI mock_example.xml`

=== Tika-server ===
Place the tika-server.jar and the tika-core-tests.jar in a "bin" directory.

`java -cp "bin/*" org.apache.tika.server.TikaServerCli`

Then curl away:

`curl -T mock_example.xml http://localhost:9998/rmeta/text`

=== Your Framework ===
Place the tika-core-tests.jar on your class path (NOT IN PRODUCTION!!!) and 
then add some mock.xml files to your batch of documents.
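
If your framework talks to tika-server over HTTP instead of calling Tika in-process, the curl request to `/rmeta/text` can be issued from Java as well. This is a minimal sketch using the JDK's `java.net.http` client (Java 11+); the class name `RmetaClient`, the helper `buildRequest` and the 60 second timeout are assumptions for illustration.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;
import java.time.Duration;

public class RmetaClient {

    /** Builds the same PUT that `curl -T <file> <baseUrl>/rmeta/text` sends. */
    public static HttpRequest buildRequest(String baseUrl, Path file) throws IOException {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rmeta/text"))
                // Bounds how long the client waits; it does not stop the server.
                .timeout(Duration.ofSeconds(60))
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    public static void main(String[] args) {
        try {
            HttpClient client = HttpClient.newHttpClient();
            HttpRequest request =
                    buildRequest("http://localhost:9998", Path.of("mock_example.xml"));
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode());
            System.out.println(response.body());
        } catch (IOException e) {
            // Missing file, no server listening, or the request timed out.
            System.err.println("Request failed: " + e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

The client-side timeout keeps the caller from waiting forever on a hung parse, but the server process still has to be killed and restarted separately.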

=== Mock options ===
See the mock example.xml file in 
tika-parsers/src/test/resources/test-documents/mock.  

It shows every option the MockParser supports.
```
Nikolai Lobachevsky
some content
writing to System.out
writing to System.err
not another IOException
```