Re: [EXTERNAL] Docker image along with 1.23?

2019-11-21 Thread Tim Allison
OK. Sounds like an example Dockerfile will meet your needs, Eric?

Users can currently build their own images with the Dockerfile in
tika-server, and there's LogicalSpark.

As noted, there are complexities with distributing an image.

Between those two options, folks should basically be ok.  Right?

I might want to add an advanced Dockerfile example on our wiki (or
perhaps in LogicalSpark?) that:
1) runs tika-server in spawn-child mode
2) returns stack traces
3) includes the "provided" xerial sqlite jar
4) includes non-ASL-2.0-compatible dependencies for image processing in PDFs

Anything else?



On Thu, Nov 21, 2019 at 7:10 AM Eric Pugh wrote:

> That makes sense. Having a robust Dockerfile, even if it isn’t
> published, is a great way of modeling best practices in running Tika in
> server mode.
>
> > On Nov 21, 2019, at 3:26 AM, Nick Burch wrote:
> >
> > On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
> >> My question is more pragmatic.
> >> What do we put inside the Dockerfile, and which image will it be based
> >> on (say Ubuntu)?
> >> What will the entrypoint contain? Tika Server? Should we "install"
> >> tesseract? Anything more?
> >
> > If we want to be trendy, then Sergey Beryozkin did some cool stuff with
> > Quarkus and a GraalVM native image of Tika, video online at
> > https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus
> >
> > I'd possibly suggest two Dockerfiles (but not published images!), both
> > based on a fairly thin common Java base image (so probably Ubuntu rather
> > than Alpine). One with just Tika Server + tesseract + English tesseract
> > data, one with all the optional Tika dependencies (SQL native libraries
> > etc.) and tesseract and all the available tesseract languages.
> >
> > Some other projects are currently leading the debate on ASF binary
> > releases that bundle the JVM. I'd suggest we wait for that to resolve
> > before we think about trying to publish pre-built images ourselves. Linking
> > to images from external organisations we trust should be fine though, e.g.
> > similar to http://httpd.apache.org/docs/current/platform/windows.html#down
> >
> > Nick
>
> ___
> Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 |
> http://www.opensourceconnections.com <
> http://www.opensourceconnections.com/> | My Free/Busy <
> http://tinyurl.com/eric-cal>
> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <
> https://www.packtpub.com/big-data-and-business-intelligence/apache-solr-enterprise-search-server-third-edition-raw>
>
> This e-mail and all contents, including attachments, is considered to be
> Company Confidential unless explicitly stated otherwise, regardless of
> whether attachments are marked as such.
>
>


Re: [EXTERNAL] Docker image along with 1.23?

2019-11-21 Thread Eric Pugh
That makes sense. Having a robust Dockerfile, even if it isn’t published, is
a great way of modeling best practices in running Tika in server mode.







Re: [EXTERNAL] Docker image along with 1.23?

2019-11-21 Thread Nick Burch

On Thu, 21 Nov 2019, Oleg Tikhonov wrote:
> My question is more pragmatic.
> What do we put inside the Dockerfile, and which image will it be based
> on (say Ubuntu)?
> What will the entrypoint contain? Tika Server? Should we "install"
> tesseract? Anything more?

If we want to be trendy, then Sergey Beryozkin did some cool stuff with
Quarkus and a GraalVM native image of Tika, video online at
https://aceu19.apachecon.com/session/apache-tika-goes-native-graalvm-and-quarkus

I'd possibly suggest two Dockerfiles (but not published images!), both
based on a fairly thin common Java base image (so probably Ubuntu rather
than Alpine). One with just Tika Server + tesseract + English tesseract
data, one with all the optional Tika dependencies (SQL native libraries
etc.) and tesseract and all the available tesseract languages.

Some other projects are currently leading the debate on ASF binary
releases that bundle the JVM. I'd suggest we wait for that to resolve
before we think about trying to publish pre-built images ourselves.
Linking to images from external organisations we trust should be fine
though, e.g. similar to
http://httpd.apache.org/docs/current/platform/windows.html#down

Nick


Re: [EXTERNAL] Docker image along with 1.23?

2019-11-21 Thread Oleg Tikhonov
My question is more pragmatic.
What do we put inside the Dockerfile, and which image will it be based
on (say Ubuntu)?
What will the entrypoint contain? Tika Server? Should we "install"
tesseract? Anything more?

Thanks,
Oleg

On Thu, Nov 21, 2019 at 4:46 AM Chris Mattmann wrote:

> Yeah, producing the actual image is tricky and my recommendation is for
> Tika to stay out of the business of that. Leave it to LogicalSpark or
> others to do this. It’s tricky with licenses and I doubt ASF will ever
> develop an optimal solution to this due to the nature of its core mission,
> as Nick stated.
>
> From: Eric Pugh
> Reply-To: "dev@tika.apache.org"
> Date: Wednesday, November 20, 2019 at 6:02 PM
> To: "dev@tika.apache.org"
> Cc: "Allison, Timothy B (US 1760-Affiliate)" <timothy.b.alli...@jpl.nasa.gov>
> Subject: Re: [EXTERNAL] Docker image along with 1.23?
>
> I was thinking more of producing the actual image, so that others don’t
> have to go through the pain of compiling an image. Having the Dockerfile
> made available as well does give a nice recipe for modifying the “official”
> image. I recently tested Tesseract 3 with the latest Tika, and I did it
> by tweaking the existing Dockerfile that LogicalSpark has published.
>
> I don’t know how other projects at ASF handle the image publishing.
>
> On Nov 20, 2019, at 7:02 PM, Chris Mattmann wrote:
>
> Nick, TBH, I don’t get it. If we ship the “Dockerfile” we are simply
> shipping a text file, code. Under a license. If we create a “docker image”
> and then publish it to the ASF hub then I agree with you.
>
> My suggestion and my interpretation of Tim’s is to ship a standard
> “Dockerfile”. Do you agree with this? It should be air covered (as former
> VP, Legal, at least it would have been with me).
>
> Cheers,
>
> Chris
>
> From: Nick Burch
> Reply-To: "dev@tika.apache.org"
> Date: Wednesday, November 20, 2019 at 3:57 PM
> To: "Allison, Timothy B (US 1760-Affiliate)" <timothy.b.alli...@jpl.nasa.gov>
> Cc: ""
> Subject: [EXTERNAL] Re: Docker image along with 1.23?
>
> On Wed, 20 Nov 2019, Tim Allison wrote:
>
> > Eric Pugh recently asked on another channel if we had any plans to
> > release an official docker image for 1.23.
>
> Depending on what we put in the container, we do need to be a little
> careful. There are "platform dependencies" under non-compatible licenses
> that we can optionally use if people have installed them, which we
> ourselves can't directly ship under ASF rules. (Tesseract is fine as
> that's Apache Licensed; Java itself is trickier, see the NetBeans
> discussions on legal-discuss@ and LEGAL jira.)
>
> Shipping an official docker container with the Tika Server on seems to me
> to be a helpful step for users, but we just need to make sure we're
> following ASF policies. (The Apache Software Foundation mission is to
> "provide software for the public good", but source code is the main focus
> for the mission; binaries are trickier!)
>
> Nick


Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Chris Mattmann
Yeah, producing the actual image is tricky and my recommendation is for Tika to
stay out of the business of that. Leave it to LogicalSpark or others to do
this. It’s tricky with licenses and I doubt ASF will ever develop an optimal
solution to this due to the nature of its core mission, as Nick stated.

 

 

 

 



Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Eric Pugh
I was thinking more of producing the actual image, so that others don’t have to
go through the pain of compiling an image. Having the Dockerfile made
available as well does give a nice recipe for modifying the “official” image.
I recently tested Tesseract 3 with the latest Tika, and I did it by tweaking
the existing Dockerfile that LogicalSpark has published.

I don’t know how other projects at ASF handle the image publishing.







Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Mattmann, Chris A (US 1760)
Sure let's do that. I also have a set of tika-dockers in USCDataScience useful 
for the ML/Deep learning stuff.



Chris Mattmann, Ph.D.
Deputy Chief Technology & Innovation Officer
17x   |   Office of the Chief Information Officer, Chief Technology and 
Innovation Office (1760)

JPL   |   jpl.nasa.gov
4800 Oak Grove Dr, Mail Stop 171-377
Pasadena, California 91109
O 818-354-8810   |   M 626-755-6564


From: Tim Allison 
Reply-To: "dev@tika.apache.org" , "Allison, Timothy B (US 
1760-Affiliate)" 
Date: Wednesday, November 20, 2019 at 1:20 PM
To: "" 
Subject: [EXTERNAL] Docker image along with 1.23?

All,
  Eric Pugh recently asked on another channel if we had any plans to
release an official docker image for 1.23.  IIRC Dave had that up and
running, but I couldn't get it to work as part of the release
process because I was behind a proxy or on Windows or something.  My dev
environment is now different, and I _should_ be able to get it to work.
  Do we want to try to release an official Docker image as part of the 1.23
release?

   Cheers,

   Tim