RE: Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Allison, Timothy B.
Thank you, Chris, Luís and Konstantin!



-Original Message-
From: Mattmann, Chris A (3010) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Thursday, February 16, 2017 10:18 AM
To: dev@tika.apache.org; lfcnas...@gmail.com
Cc: solr-u...@lucene.apache.org
Subject: Re: Testing an ingest framework that uses Apache Tika

++1 awesome job

++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
 

On 2/16/17, 5:28 AM, "Luís Filipe Nassif"  wrote:

Excellent, Tim! Thank you for all your great work on Apache Tika!

2017-02-16 11:23 GMT-02:00 Konstantin Gribov :

> Tim,
>
> it's an awesome feature for downstream projects' integration tests. Thanks
> for implementing it!
>
> Thu, 16 Feb 2017 at 16:17, Allison, Timothy B. :
>
> > All,
> >
> > I finally got around to documenting Apache Tika's MockParser[1].  As of
> > Tika 1.15 (unreleased), add tika-core-tests.jar to your class path, and you
> > can simulate:
> >
> > 1. Regular catchable exceptions
> > 2. OOMs
> > 3. Permanent hangs
> >
> > This will allow you to determine if your ingest framework is robust
> > against these issues.
> >
> > As always, we fix Tika when we can, but if history is any indicator,
> > you'll want to make sure your ingest code can handle these issues if you
> > are handling millions/billions of files from the wild.
> >
> > Cheers,
> >
> > Tim
> >
> >
> > [1] https://wiki.apache.org/tika/MockParser
> >
> --
>
> Best regards,
> Konstantin Gribov
>




Re: Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Mattmann, Chris A (3010)
++1 awesome job

++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, NSF & Open Source Projects Formulation and Development Offices (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-503
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++
 





Re: Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Luís Filipe Nassif
Excellent, Tim! Thank you for all your great work on Apache Tika!



Re: Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Konstantin Gribov
Tim,

it's an awesome feature for downstream projects' integration tests. Thanks
for implementing it!

-- 

Best regards,
Konstantin Gribov


Testing an ingest framework that uses Apache Tika

2017-02-16 Thread Allison, Timothy B.
All,

I finally got around to documenting Apache Tika's MockParser[1].  As of Tika 
1.15 (unreleased), add tika-core-tests.jar to your class path, and you can 
simulate:

1. Regular catchable exceptions
2. OutOfMemoryErrors (OOMs)
3. Permanent hangs

This will allow you to determine if your ingest framework is robust against 
these issues.

As always, we fix Tika when we can, but if history is any indicator, you'll 
want to make sure your ingest code can handle these issues if you are handling 
millions/billions of files from the wild.
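
As a rough illustration of what "handling these issues" can look like on the ingest side, here is a minimal sketch of a guard wrapped around each parse call. It deliberately uses only the JDK and no Tika API: IngestGuard, Outcome, and the four stand-in tasks are hypothetical names, and the tasks merely imitate a misbehaving parser. In a real test, the MockParser described at [1] would take their place.

```java
import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical harness; none of these names come from Tika.
class IngestGuard {

    enum Outcome { OK, EXCEPTION, OOM, TIMEOUT }

    // Runs one parse task on a throwaway daemon thread and classifies how it ended.
    static Outcome run(Callable<String> parseTask, long timeoutMillis) {
        ExecutorService pool = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<String> future = pool.submit(parseTask);
        try {
            future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return Outcome.OK;
        } catch (TimeoutException e) {
            // Interrupts the worker; this only helps if the task honors interrupts.
            future.cancel(true);
            return Outcome.TIMEOUT;
        } catch (ExecutionException e) {
            // Anything thrown by the task, Errors included, arrives as the cause.
            return (e.getCause() instanceof OutOfMemoryError) ? Outcome.OOM : Outcome.EXCEPTION;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
            return Outcome.EXCEPTION;
        } finally {
            pool.shutdownNow();
        }
    }

    // Stand-ins for the three simulated failure modes, plus a well-behaved task.
    static Callable<String> okTask() {
        return () -> "extracted text";
    }

    static Callable<String> throwingTask() {
        return () -> { throw new IOException("simulated parse failure"); };
    }

    static Callable<String> oomTask() {
        // Throws the Error type without actually exhausting the heap.
        return () -> { throw new OutOfMemoryError("simulated"); };
    }

    static Callable<String> hangingTask() {
        // Never returns on its own; ends only when interrupted.
        return () -> { while (true) { Thread.sleep(1000); } };
    }
}
```

Calling IngestGuard.run(IngestGuard.hangingTask(), 200) reports TIMEOUT instead of blocking the caller. Future.cancel only interrupts the worker, so a parser that ignores interrupts, or a genuine OOM, would likely call for running the parse in a child process that can be killed outright.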

Cheers,

Tim


[1] https://wiki.apache.org/tika/MockParser