Subject: Re: Extracting Text from embedded images in PDF docs
Hi Tim
Sure, once I get an initial PR ready I'll send an update and explain what
I did as a start, and we can discuss it further.
:)
-Original Message-
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 12:40 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
Hi Tim
On 19/05/17 17:31, Allison, Timothy B. wrote:
The autoscaling feature of Beam and the job
This is fantastic news! Let me know if I can help...I know _nothing_ about
Beam, tho.
:)
-Original Message-
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 12:40 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
-2328
It will take me a few more weeks to create a PR,
Thanks, Sergey
-Original Message-
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Friday, May 19, 2017 12:27 PM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
On 5/19/17, 9:27 AM, "Sergey Beryozkin" wrote:
Hi Chris
I'm getting nervous now, what will happen to me if it will not work out
in the end :-). Though, it actually does work, for me at least :-)
Cheers, Sergey
> Well, I'm trying to integrate Tika with Apache Beam,
Awesome! I saw two fantastic Beam talks at ApacheCon (two days ago?). I won't
tell anyone. ;)
On 19/05/17 17:23, Mattmann, Chris A (3010) wrote:
Thanks Sergey, what an awesome surprise, you are the best!
Hi Tim
On 19/05/17 16:47, Allison, Timothy B. wrote:
>Yes, I was asking about it as I thought it was confusing that it did not work
>- I saw you following up on this possible issue in the other email...
Y, I agree. That _should_ work.
>I'm doing some work with Tika now so it was of an immediate interest to me...
Yay! What are you working on?
>Sure. By
On 19/05/17 16:25, Allison, Timothy B. wrote:
>>and when is "extractInlineImages" actually effective?
Not sure I understand the question exactly.
If the question is "why didn't extractInlineImages work on a specific
document?", that's probably a bug or could be user error in the
configuration...either way, please follow up and help us
at the very beginning of integrating OCR with
PDFs. We’d like to add a strategy that applies OCR on a given page if, say,
fewer than 10 words are extracted from the text… WDYT?
From: David Pilato [mailto:da...@pilato.fr]
Sent: Friday, May 19, 2017 5:55 AM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
Got it working. In case someone else hits the same issue, here is my
config file... Well... That was obvious :D
<properties>
<parsers>
<parser class="org.apache.tika.parser.DefaultParser"/>
<parser cl
the documentation so that you don’t waste an hour?
From: David Pilato [mailto:da...@pilato.fr]
Sent: Friday, May 19, 2017 5:55 AM
To: user@tika.apache.org
Subject: Re: Extracting Text from embedded images in PDF docs
Got it working. In case someone else hits the same issue, here is my config
file... Well
: Friday, May 19, 2017 at 2:55 AM
To: "user@tika.apache.org" <user@tika.apache.org>
Subject: Re: Extracting Text from embedded images in PDF docs
Got it working. In case someone else hits the same issue, here is my config
file... Well... That was obvious :D
ocr_and_text
David
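For anyone landing here with the same problem, here is a minimal sketch of what such a tika-config.xml can look like. It is an illustration, not necessarily David's exact file: the `ocr_and_text` value and the DefaultParser entry come from this thread, while the PDFParser `<params>` block and the `extractInlineImages` entry are assumptions based on the PDFParser settings discussed in this thread.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Keep the default parsers for everything else (as in the thread). -->
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <!-- Assumed: PDF-specific settings are passed to PDFParser as params. -->
    <parser class="org.apache.tika.parser.pdf.PDFParser">
      <params>
        <!-- Assumed: enables extraction of inline images, which is off by
             default (config.getExtractInlineImages() returned false). -->
        <param name="extractInlineImages" type="bool">true</param>
        <!-- Value taken from the thread: run OCR and regular text extraction. -->
        <param name="ocrStrategy" type="string">ocr_and_text</param>
      </params>
    </parser>
  </parsers>
</properties>
```

Either setting only helps if Tesseract is installed and can be found by Tika, since the OCR step is delegated to it.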
> On 19 May 2017 at 10:59, David Pilato wrote:
>
> So I saw
So I saw in debug mode that indeed config.getExtractInlineImages() is false so
I'm going to check my config.
:D
David
> On 18 May 2017 at 22:18, David Pilato wrote:
>
> Hey guys
>
>
> First post here ;)
>
> I'm trying to play with OCR with Tika. I installed Tesseract