Re: Re-using a TikaStream

2021-03-01 Thread Nick Burch
On Mon, 1 Mar 2021, Tim Allison wrote: detectors should return the stream reset to the beginning. I agree - needs to be ready for the parser to then process Parsers, IIRC, should return the stream fully(?) read but not closed. Not always - if the parser wanted a File then it may not have
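
The detector contract described above (peek at the stream, then hand it back positioned at the beginning) can be illustrated with plain java.io mark/reset. This is only a sketch under stated assumptions: `DetectorResetSketch`, `looksLikePdf` and `detectThenRead` are hypothetical names, and none of this is Tika's own detector code.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not Tika's Detector interface.
final class DetectorResetSketch {
    private static final byte[] PDF_MAGIC = {'%', 'P', 'D', 'F'};

    // Peeks at the leading bytes, then resets so the caller sees the stream from byte 0.
    static boolean looksLikePdf(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(PDF_MAGIC.length);
        try {
            byte[] head = in.readNBytes(PDF_MAGIC.length);
            return Arrays.equals(head, PDF_MAGIC);
        } finally {
            in.reset(); // leave the stream ready for a parser
        }
    }

    // Runs the "detector", then reads the whole stream as a "parser" would.
    static String detectThenRead(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.ISO_8859_1);
        try (InputStream in = new BufferedInputStream(new ByteArrayInputStream(bytes))) {
            boolean pdf = looksLikePdf(in);
            String body = new String(in.readAllBytes(), StandardCharsets.ISO_8859_1);
            return pdf + ":" + body;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

After the detection step the full content is still readable, which is the property the parser relies on.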

RE: Re-using a TikaStream

2021-03-01 Thread Peter Kronenberg
@tika.apache.org; lfcnas...@gmail.com Subject: Re: Re-using a TikaStream detectors should return the stream reset to the beginning. Parsers, IIRC, should return the stream fully(?) read but not closed. On Mon, Mar 1, 2021 at 10:29 AM Tim Allison <talli...@apache.org> wrote: Reusing streams

RE: Re-using a TikaStream

2021-03-01 Thread Nick Burch
On Fri, 26 Feb 2021, Peter Kronenberg wrote: For most audio files, using the AudioParser, the buffer is still at the beginning. Even though there is no text extraction, I would think that Tika still needs to read through the stream. The MP3Parser consumes the stream, but the MP4Parser does

Re: Re-using a TikaStream

2021-03-01 Thread Tim Allison
> *From:* Peter Kronenberg >> *Sent:* Friday, February 26, 2021 10:03 PM >> *To:* talli...@apache.org >> *Cc:* user@tika.apache.org; lfcnas...@gmail.com >> *Subject:* RE: Re-using a TikaStream >> >> >> >> This email was sent from outside your

Re: Re-using a TikaStream

2021-03-01 Thread Tim Allison
onenberg > *Sent:* Friday, February 26, 2021 10:03 PM > *To:* talli...@apache.org > *Cc:* user@tika.apache.org; lfcnas...@gmail.com > *Subject:* RE: Re-using a TikaStream

RE: Re-using a TikaStream

2021-03-01 Thread Peter Kronenberg
From: Peter Kronenberg Sent: Friday, February 26, 2021 10:03 PM To: talli...@apache.org Cc: user@tika.apache.org; lfcnas...@gmail.com Subject: RE: Re-using a TikaStream

RE: Re-using a TikaStream

2021-02-26 Thread Peter Kronenberg
3:17 PM To: Peter Kronenberg Cc: user@tika.apache.org; lfcnas...@gmail.com Subject: Re: Re-using a TikaStream The stream.available() call comes from ProxyInputStream. We don't modify that in TikaInputStream...maybe we should. TikaInputStream wraps an incoming InputStream

Re: Re-using a TikaStream

2021-02-26 Thread Tim Allison
ck to my original question, which > is, what is the best way to consistently be able to re-use the stream? > > > > *From:* Peter Kronenberg > *Sent:* Friday, February 26, 2021 12:18 PM > *To:* user@tika.apache.org; talli...@apache.org > *Cc:* lfcnas...@gmail.com > *Subject:* RE

RE: Re-using a TikaStream

2021-02-26 Thread Peter Kronenberg
to my original question, which is, what is the best way to consistently be able to re-use the stream? From: Peter Kronenberg Sent: Friday, February 26, 2021 12:18 PM To: user@tika.apache.org; talli...@apache.org Cc: lfcnas...@gmail.com Subject: RE: Re-using a TikaStream

RE: Re-using a TikaStream

2021-02-26 Thread Peter Kronenberg
ble: 10546620, position: 0 From: Peter Kronenberg Sent: Thursday, February 25, 2021 11:28 AM To: user@tika.apache.org; talli...@apache.org Cc: lfcnas...@gmail.com Subject: RE: Re-using a TikaStream

RE: Re-using a TikaStream

2021-02-25 Thread Peter Kronenberg
as...@gmail.com>; user@tika.apache.org Subject: Re: Re-using a TikaStream Are you initializing with a file or a stream? On Thu, Feb 25, 2021 at 9:00 AM Peter Kronenberg <peter.kronenb...@torch.ai> wrote: But how is TikaInputStream allowing me to re-u

Re: Re-using a TikaStream

2021-02-25 Thread Tim Allison
> it can do it all in memory, that’s obviously better. And for my use case, > I don’t **always** have to re-read the stream. > > > > *From:* Tim Allison > *Sent:* Thursday, February 25, 2021 5:48 AM > *To:* user@tika.apache.org > *Cc:* lfcnas...@gmail.com > *Subject:* R

RE: Re-using a TikaStream

2021-02-25 Thread Peter Kronenberg
: Thursday, February 25, 2021 5:48 AM To: user@tika.apache.org Cc: lfcnas...@gmail.com Subject: Re: Re-using a TikaStream My $0.02 would be to use TikaInputStream because that gets a lot more use and is battle-tested. Within the last year or so, we started using RereadableInputStream in one

Re: Re-using a TikaStream

2021-02-25 Thread Tim Allison
ne, the stream would be used up. > > > > What is going on? > > > > > > *From:* Peter Kronenberg > *Sent:* Tuesday, February 23, 2021 10:00 AM > *To:* user@tika.apache.org; lfcnas...@gmail.com > *Subject:* RE: Re-using a TikaStream > > > > This

RE: Re-using a TikaStream

2021-02-23 Thread Peter Kronenberg
that once the Tika parse was done, the stream would be used up. What is going on? From: Peter Kronenberg Sent: Tuesday, February 23, 2021 10:00 AM To: user@tika.apache.org; lfcnas...@gmail.com Subject: RE: Re-using a TikaStream

RE: Re-using a TikaStream

2021-02-23 Thread Peter Kronenberg
From: Peter Kronenberg Sent: Monday, February 22, 2021 8:30 PM To: lfcnas...@gmail.com Cc: user@tika.apache.org Subject: RE: Re-using a TikaStream

RE: Re-using a TikaStream

2021-02-23 Thread Nick Burch
On Tue, 23 Feb 2021, Peter Kronenberg wrote: I was re-reading some emails with Nick Burch back around Dec 22-23 and maybe I misunderstood him, but it sounds like he was saying that TikaInputStream was smart enough to automatically spool the stream to disk to allow re-use. If a parser knows

RE: Re-using a TikaStream

2021-02-22 Thread Peter Kronenberg
eal pass InputStream is = tis.getInputStreamFactory().getInputStream() // second real pass } From: Luís Filipe Nassif Sent: Monday, February 22, 2021 5:42 PM To: Peter Kronenberg Cc: user@tika.apache.org Subject: Re: Re-using a TikaStream Something like: class MyInputStreamFactory impleme
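
The factory idea in the snippet above (ask a factory for a fresh stream on each pass rather than rewinding a single one) can be sketched with the standard library alone. `StreamSource` below is a hypothetical local stand-in, not Tika's InputStreamFactory, and `FactoryReuseSketch` and `twoPasses` are likewise made-up names; treat it as an illustration of the pattern, not of the Tika API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical illustration only; does not use Tika classes.
final class FactoryReuseSketch {

    // Local stand-in for a stream factory: every call opens a new, unread stream.
    interface StreamSource {
        InputStream open() throws IOException;
    }

    // Consumes the stream completely and reports how many bytes it held.
    static int countBytes(InputStream in) throws IOException {
        int n = 0;
        while (in.read() != -1) {
            n++;
        }
        return n;
    }

    // Makes two independent passes over the same data via the factory.
    static int[] twoPasses(byte[] data) {
        StreamSource source = () -> new ByteArrayInputStream(data);
        try {
            int first;
            int second;
            try (InputStream in = source.open()) {
                first = countBytes(in);  // first real pass
            }
            try (InputStream in = source.open()) {
                second = countBytes(in); // second real pass, from the start again
            }
            return new int[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because each pass gets its own stream, nothing has to be buffered or spooled, provided the underlying data can be reopened cheaply.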

Re: Re-using a TikaStream

2021-02-22 Thread Luís Filipe Nassif
ser@tika.apache.org; lfcnas...@gmail.com > *Subject:* RE: Re-using a TikaStream

RE: Re-using a TikaStream

2021-02-22 Thread Peter Kronenberg
I sent this question late on Friday. Sending it again. Can you provide a little more information on how to use the InputStreamFactory? From: Peter Kronenberg Sent: Friday, February 19, 2021 5:10 PM To: user@tika.apache.org; lfcnas...@gmail.com Subject: RE: Re-using a TikaStream This email

RE: Re-using a TikaStream

2021-02-19 Thread Peter Kronenberg
that TikaInputStream already automatically saved to disk to allow re-reading. From: Luís Filipe Nassif <lfcnas...@gmail.com> Sent: Friday, February 19, 2021 3:44 PM To: user@tika.apache.org Subject: Re: Re-using a TikaStream You could call TikaInputSt

RE: Re-using a TikaStream

2021-02-19 Thread Peter Kronenberg
Thanks. I thought that TikaInputStream already automatically saved to disk to allow re-reading. From: Luís Filipe Nassif Sent: Friday, February 19, 2021 3:44 PM To: user@tika.apache.org Subject: Re: Re-using a TikaStream You could call TikaInputStream.getPath() at the beginning of your parser

Re: Re-using a TikaStream

2021-02-19 Thread Luís Filipe Nassif
You could call TikaInputStream.getPath() at the beginning of your parser; it will spool the stream to a file if it is not already file-based. After consuming the original InputStream, create a new one from the temp file that was created. If you are using 2.0.0-ALPHA, there is:
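
The spool-then-reopen approach suggested above can be shown with java.nio alone. This is a minimal sketch of the general technique, not of what TikaInputStream.getPath() does internally: `SpoolToFileSketch`, `spool`, `readAll` and `readTwice` are hypothetical names, and the temp-file handling is an assumption made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; does not use Tika classes.
final class SpoolToFileSketch {

    // Copies a one-shot stream to a temp file so it can be reopened later.
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-", ".bin");
        // createTempFile already created the file, so overwrite it.
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Opens a fresh stream on the spooled file and reads it fully.
    static String readAll(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    // Spools the content once, then reads it back in two separate passes.
    static String[] readTwice(String content) {
        try {
            InputStream original =
                    new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
            Path tmp = spool(original);
            try {
                return new String[] {readAll(tmp), readAll(tmp)};
            } finally {
                Files.deleteIfExists(tmp); // the caller owns the temp file's cleanup
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The trade-off is disk I/O on every spool; when the data fits comfortably in memory, an in-memory buffer avoids that cost.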

Re-using a TikaStream

2021-02-19 Thread Peter Kronenberg
If I finish parsing a TikaStream, can I re-use the stream (before it is closed)? I know you said that there is some magic behind the scenes where it spools it to a file. Can I just call reset() to start from the beginning? Peter