On Mon, 1 Mar 2021, Tim Allison wrote:
detectors should return the stream reset to the beginning.
I agree - needs to be ready for the parser to then process
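For illustration, the "reset to the beginning" behaviour expected of detectors can be sketched with the JDK's mark/reset alone. This is a minimal sketch and not Tika code: the class PeekSketch and the method peek are hypothetical names, and it assumes the caller passes a stream that supports mark/reset (for example a BufferedInputStream).

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of Tika.
final class PeekSketch {

    // Reads up to n leading bytes the way a detector might, then rewinds
    // so the next consumer (e.g. a parser) sees the stream from byte 0.
    static byte[] peek(InputStream in, int n) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        in.mark(n);                    // remember the current position
        try {
            return in.readNBytes(n);   // may return fewer bytes at end of stream
        } finally {
            in.reset();                // always hand the stream back rewound
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "%PDF-1.7 rest of file".getBytes(StandardCharsets.US_ASCII);
        InputStream in = new BufferedInputStream(new ByteArrayInputStream(data));
        byte[] head = peek(in, 4);
        System.out.println(new String(head, StandardCharsets.US_ASCII));
        System.out.println((char) in.read()); // first byte again, stream was rewound
    }
}
```

Whether a parser leaves the stream in a comparable state is a separate question, as the rest of the thread discusses.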
Parsers, IIRC, should return the stream fully(?) read but not closed.
Not always - if the parser wanted a File then it may not have
@tika.apache.org; lfcnas...@gmail.com
Subject: Re: Re-using a TikaStream
detectors should return the stream reset to the beginning.
Parsers, IIRC, should return the stream fully(?) read but not closed.
On Mon, Mar 1, 2021 at 10:29 AM Tim Allison <talli...@apache.org> wrote:
Reusing streams
On Fri, 26 Feb 2021, Peter Kronenberg wrote:
For most audio files, using the AudioParser, the buffer is still at the
beginning. Even though there is no text extraction, I would think that
Tika still needs to read through the stream. The MP3Parser consumes the
stream, but the MP4Parser does
> *From:* Peter Kronenberg
>> *Sent:* Friday, February 26, 2021 10:03 PM
>> *To:* talli...@apache.org
>> *Cc:* user@tika.apache.org; lfcnas...@gmail.com
>> *Subject:* RE: Re-using a TikaStream
>>
>>
>>
>> This email was sent from outside your
3:17 PM
To: Peter Kronenberg
Cc: user@tika.apache.org; lfcnas...@gmail.com
Subject: Re: Re-using a TikaStream
The stream.available() call comes from ProxyInputStream. We don't modify that
in TikaInputStream...maybe we should.
TikaInputStream wraps an incoming InputStream
to my original question, which is,
what is the best way to consistently be able to re-use the stream?
From: Peter Kronenberg
Sent: Friday, February 26, 2021 12:18 PM
To: user@tika.apache.org; talli...@apache.org
Cc: lfcnas...@gmail.com
Subject: RE: Re-using a TikaStream
ble: 10546620, position: 0
From: Peter Kronenberg
Sent: Thursday, February 25, 2021 11:28 AM
To: user@tika.apache.org; talli...@apache.org
Cc: lfcnas...@gmail.com
Subject: RE: Re-using a TikaStream
as...@gmail.com>;
user@tika.apache.org
Subject: Re: Re-using a TikaStream
Are you initializing with a file or a stream?
On Thu, Feb 25, 2021 at 9:00 AM Peter Kronenberg <peter.kronenb...@torch.ai> wrote:
But how is TikaInputStream allowing me to re-u
> it can do it all in memory, that’s obviously better. And for my use case,
> I don’t **always** have to re-read the stream.
>
>
>
> *From:* Tim Allison
> *Sent:* Thursday, February 25, 2021 5:48 AM
> *To:* user@tika.apache.org
> *Cc:* lfcnas...@gmail.com
> *Subject:* R
: Thursday, February 25, 2021 5:48 AM
To: user@tika.apache.org
Cc: lfcnas...@gmail.com
Subject: Re: Re-using a TikaStream
My $0.02 would be to use TikaInputStream because that gets a lot more use and
is battle-tested. Within the last year or so, we started using
RereadableInputStream in one
that once the Tika parse was done, the stream
would be used up.
What is going on?
From: Peter Kronenberg
Sent: Tuesday, February 23, 2021 10:00 AM
To: user@tika.apache.org; lfcnas...@gmail.com
Subject: RE: Re-using a TikaStream
From: Peter Kronenberg
Sent: Monday, February 22, 2021 8:30 PM
To: lfcnas...@gmail.com
Cc: user@tika.apache.org
Subject: RE: Re-using a TikaStream
On Tue, 23 Feb 2021, Peter Kronenberg wrote:
I was re-reading some emails with Nick Burch from around Dec 22-23, and
maybe I misunderstood him, but it sounded like he was saying that
TikaInputStream was smart enough to automatically spool the stream to
disk to allow re-use.
If a parser knows
eal pass
InputStream is = tis.getInputStreamFactory().getInputStream();
// second real pass
}
From: Luís Filipe Nassif
Sent: Monday, February 22, 2021 5:42 PM
To: Peter Kronenberg
Cc: user@tika.apache.org
Subject: Re: Re-using a TikaStream
Something like:
class MyInputStreamFactory impleme
ser@tika.apache.org; lfcnas...@gmail.com
> *Subject:* RE: Re-using a TikaStream
I sent this question late on Friday, so I am sending it again. Can you provide
a little more information on how to use the InputStreamFactory?
From: Peter Kronenberg
Sent: Friday, February 19, 2021 5:10 PM
To: user@tika.apache.org; lfcnas...@gmail.com
Subject: RE: Re-using a TikaStream
Thanks. I thought that TikaInputStream already automatically saved to disk to
allow re-reading.
From: Luís Filipe Nassif
Sent: Friday, February 19, 2021 3:44 PM
To: user@tika.apache.org
Subject: Re: Re-using a TikaStream
You could call TikaInputStream.getPath() at the beginning of your parser,
it will spool to file if not file based. After consuming the original
inputStream, create a new one from the temp file created.
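The general idea of spooling to a temp file and then opening a fresh stream for each pass can be sketched with plain JDK calls. This is only an illustration under stated assumptions, not how TikaInputStream is implemented: SpoolSketch, spoolToTempFile and readAll are hypothetical names, and the sketch copies the whole stream up front rather than spooling lazily.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical illustration only; not part of Tika.
final class SpoolSketch {

    // Copies the entire stream into a new temp file and returns its path.
    // The caller is responsible for deleting the file when done.
    static Path spoolToTempFile(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("spool-sketch-", ".bin");
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    // Reads a stream to the end and decodes it as UTF-8 text.
    static String readAll(InputStream in) throws IOException {
        return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        InputStream original =
                new ByteArrayInputStream("hello tika".getBytes(StandardCharsets.UTF_8));
        Path tmp = spoolToTempFile(original);
        try {
            // First pass: a fresh stream over the spooled file.
            try (InputStream first = Files.newInputStream(tmp)) {
                System.out.println(readAll(first));
            }
            // Second pass: another fresh stream, independent of the first.
            try (InputStream second = Files.newInputStream(tmp)) {
                System.out.println(readAll(second));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Opening a new stream per pass avoids relying on reset() at all, at the cost of one full copy to disk.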
If you are using 2.0.0-ALPHA, there is:
If I finish parsing a TikaStream, can I re-use the stream (before it is
closed)? I know you said that there is some magic behind the scenes where it
spools it to a file. Can I just call reset() to start from the beginning?
Peter
Peter Kronenberg | Senior AI Analytic ENGINEER
C: