Re: [EXTERNAL] Re: DL4JVGG16NetTest failures

2019-05-08 Thread Thejan Wijesinghe
works locally. :D > > > > On Wed, May 8, 2019 at 11:50 AM Chris Mattmann > wrote: > > > > Great work ☺ So it’s fixed? > > > > > > > > > > > > > > > > From: Tim Allison > > Reply-To: "dev@tika.apache.org"

Re: [VOTE] Release Apache Tika 1.19.1 Candidate #2

2018-10-09 Thread Thejan Wijesinghe
All tests passed for me on my linux, +1 from me. On Tue, Oct 9, 2018 at 10:37 PM Tim Allison wrote: > No problem at all. Thank you, Oleg! > On Tue, Oct 9, 2018 at 12:33 PM Oleg Tikhonov > wrote: > > > > sorry. > > +1 > > > > On Tue, Oct 9, 2018 at 7:26 PM Tim Allison wrote: > > > > > Thank yo

[jira] [Created] (TIKA-2720) A parser to output universal sentence encodings to text

2018-09-02 Thread Thejan Wijesinghe (JIRA)
Thejan Wijesinghe created TIKA-2720: --- Summary: A parser to output universal sentence encodings to text Key: TIKA-2720 URL: https://issues.apache.org/jira/browse/TIKA-2720 Project: Tika

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta2

2018-08-14 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16580018#comment-16580018 ] Thejan Wijesinghe commented on TIKA-2672: - Thanks a lot [~talli...@apache

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-07-26 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16558262#comment-16558262 ] Thejan Wijesinghe commented on TIKA-2672: - [~talli...@apache.org] we get:1

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-07-06 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16534966#comment-16534966 ] Thejan Wijesinghe commented on TIKA-2672: - [~talli...@apache.org] sorry for

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-06-19 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16517630#comment-16517630 ] Thejan Wijesinghe commented on TIKA-2672: - [~thammegowda] anytime bro! [~t

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-06-17 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16515034#comment-16515034 ] Thejan Wijesinghe commented on TIKA-2672: - [~talli...@apache.org] I'll

[jira] [Commented] (TIKA-94) Speech recognition

2018-06-10 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-94?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507725#comment-16507725 ] Thejan Wijesinghe commented on TIKA-94: --- WOW! this issue is so old. [~edwinyeozl]

Re: Welcome Thejan Wijesinghe as an Apache Tika PMC and committer!

2018-05-10 Thread Thejan Wijesinghe
lifetime to accomplish them. [1] https://issues.apache.org/jira/browse/TIKA-2262 Thanks and Best Regards, ThejanW On Tue, May 8, 2018 at 12:10 AM Chris Mattmann wrote: > Welcome to Thejan Wijesinghe who has joined as a new Tika PMC m

Re: [ANNOUNCE] Welcome Madhav Sharan as Tika Committer and PMC Member

2017-09-01 Thread Thejan Wijesinghe
Congratulations Madhav Sharan ☺ On Fri, Sep 1, 2017 at 6:58 AM, Tyler Bui-Palsulich wrote: > Welcome, Madhav! > > Tyler > > On Aug 31, 2017 1:22 PM, "Allison, Timothy B." wrote: > > > W00t! Welcome, Madhav! > > > > -Original Message- > > From: Chris Mattmann [mailto:mattm...@apache.org

[jira] [Updated] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-08-28 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejan Wijesinghe updated TIKA-2400: Description: # This involves adding apiBaseUris and refactoring current Object Recognition

[jira] [Updated] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-08-28 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejan Wijesinghe updated TIKA-2400: Summary: Standardizing current Object Recognition REST parsers (was: Standardizing current

[jira] [Commented] (TIKA-2402) Support all image formats in Object Recognition REST Parser

2017-08-02 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16110428#comment-16110428 ] Thejan Wijesinghe commented on TIKA-2402: - [~lfcnassif] Oh! sorry! I didn&#

[jira] [Commented] (TIKA-2402) Support all image formats in Object Recognition REST Parser

2017-06-30 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16070128#comment-16070128 ] Thejan Wijesinghe commented on TIKA-2402: - Hi [~lfcnassif] :), this is for

[jira] [Updated] (TIKA-2402) Support all image formats in Object Recognition REST Parser

2017-06-26 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejan Wijesinghe updated TIKA-2402: Summary: Support all image formats in Object Recognition REST Parser (was: Support all

[jira] [Created] (TIKA-2402) Support all image format support for Object Recognition REST Parser

2017-06-26 Thread Thejan Wijesinghe (JIRA)
Thejan Wijesinghe created TIKA-2402: --- Summary: Support all image format support for Object Recognition REST Parser Key: TIKA-2402 URL: https://issues.apache.org/jira/browse/TIKA-2402 Project: Tika

[jira] [Updated] (TIKA-2398) Unifying Object Recognition REST services

2017-06-20 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejan Wijesinghe updated TIKA-2398: Affects Version/s: (was: 1.6) 1.16 > Unifying Object Recognit

[jira] [Created] (TIKA-2400) Standardizing current Object Recognition parsers

2017-06-20 Thread Thejan Wijesinghe (JIRA)
Thejan Wijesinghe created TIKA-2400: --- Summary: Standardizing current Object Recognition parsers Key: TIKA-2400 URL: https://issues.apache.org/jira/browse/TIKA-2400 Project: Tika Issue Type

[jira] [Updated] (TIKA-2398) Unifying Object Recognition REST services

2017-06-20 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejan Wijesinghe updated TIKA-2398: Summary: Unifying Object Recognition REST services (was: Unifying REST services

[jira] [Created] (TIKA-2398) Unifying REST services

2017-06-19 Thread Thejan Wijesinghe (JIRA)
Thejan Wijesinghe created TIKA-2398: --- Summary: Unifying REST services Key: TIKA-2398 URL: https://issues.apache.org/jira/browse/TIKA-2398 Project: Tika Issue Type: Improvement

Re: GSoC 2017 report: Wiki pages created to track the progress

2017-06-06 Thread Thejan Wijesinghe
Hi Thamme, Thank you for creating this. Will try my best to create a great wiki, then we will be able to use it as reference for future GSoC students of Tika. best, ThejanW On Tue, Jun 6, 2017 at 11:34 PM, Thamme Gowda wrote: > Hi Thejan, > > I created wiki pages for reporting the progress of

Re: experiences with Tika in Docker

2017-05-31 Thread Thejan Wijesinghe
Hi Tim, I've used tika-server in Docker, but only as a single instance. Yes, its ability to limit a container's resources with respect to memory & CPU on the host machine is great; it gives us so much flexibility. We could enforce hard/soft memory limits, we could even manipulate the host machine's

[jira] [Commented] (TIKA-2362) Skipping Header and Footer data from documents

2017-05-16 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16012245#comment-16012245 ] Thejan Wijesinghe commented on TIKA-2362: - Can't we use regular expre

Re: Welcome Thejan Wijesinghe GSoC 2017 student!

2017-05-09 Thread Thejan Wijesinghe
Thank you Avtar, Tim Allison.. 😊 On Mon, May 8, 2017 at 6:33 PM, Allison, Timothy B. wrote: > Welcome! > > -Original Message- > From: David Meikle [mailto:loo...@gmail.com] > Sent: Saturday, May 6, 2017 8:10 AM > To: dev@tika.apache.org > Subject: Re: Welcome T

Re: Welcome Thejan Wijesinghe GSoC 2017 student!

2017-05-07 Thread Thejan Wijesinghe
Thank you Thamme Gowda and David Meikle.. 😊 On Sat, May 6, 2017 at 5:40 PM, David Meikle wrote: > Congratulations and welcome, Thejan! > > On 4 May 2017 at 18:19, Chris Mattmann wrote: > > > I’d like to welcome Thejan Wijesinghe our Apache Tika GSoC 2017 student, > &g

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-05-05 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15998298#comment-15998298 ] Thejan Wijesinghe commented on TIKA-2293: - Thank you [~talli...@mitre.org] .

Re: Welcome Thejan Wijesinghe GSoC 2017 student!

2017-05-04 Thread Thejan Wijesinghe
@Kranthi G V, Thank you brother. I hope you will be there to help me through out this summer. * On Fri, May 5, 2017 at 12:01 AM, Thejan Wijesinghe < thejan.k.wijesin...@gmail.com> wrote: > @Kranthi G V, Thank you brother. I hope you will be there for me to help > you through out

[jira] [Commented] (TIKA-2322) Video labeling using existing ObjectRecognition

2017-05-02 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15993236#comment-15993236 ] Thejan Wijesinghe commented on TIKA-2322: - I can now edit the Wiki. thank

[jira] [Commented] (TIKA-2322) Video labeling using existing ObjectRecognition

2017-05-02 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2322?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15992998#comment-15992998 ] Thejan Wijesinghe commented on TIKA-2322: - Will do, thanks. [~chrismattmann],

Re: apache tikka is not working for scanned image documents

2017-04-04 Thread Thejan Wijesinghe
btw it's not tikka. It's Tika :) On Wed, Apr 5, 2017 at 11:53 AM, Thejan Wijesinghe < thejan.k.wijesin...@gmail.com> wrote: > Hi Vadivelhan, > > As Chris mentioned, please visit https://wiki.apache.org/tika/TikaOCR and > install Tesseract on your machine. To check t

Re: apache tikka is not working for scanned image documents

2017-04-04 Thread Thejan Wijesinghe
Hi Vadivelhan, As Chris mentioned, please visit https://wiki.apache.org/tika/TikaOCR and install Tesseract on your machine. To check that Tesseract is available on your machine, type this command without quotes "tesseract test.jpg out" in the terminal and check whether you can OCR an image and

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-24 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15941164#comment-15941164 ] Thejan Wijesinghe commented on TIKA-2293: - Thank you, [~talli...@mitre.org]

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-23 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15939037#comment-15939037 ] Thejan Wijesinghe commented on TIKA-2293: - About automating the download pro

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-23 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15938238#comment-15938238 ] Thejan Wijesinghe commented on TIKA-2293: - I just have some sudden good new

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-22 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15937769#comment-15937769 ] Thejan Wijesinghe commented on TIKA-2293: - Thank you Tim and Nick for

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-18 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931126#comment-15931126 ] Thejan Wijesinghe commented on TIKA-2293: - Other than that, I have also add

[jira] [Commented] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-18 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15931124#comment-15931124 ] Thejan Wijesinghe commented on TIKA-2293: - # So I have created Tess4JOCRParser

[jira] [Updated] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-17 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thejan Wijesinghe updated TIKA-2293: Fix Version/s: 1.15 > Tess4jOCRParser - A simpler Java version of TesseractOCRPar

Re: TIKA build error

2017-03-17 Thread Thejan Wijesinghe
ll take a look today. > > -Original Message- > From: Thejan Wijesinghe [mailto:thejan.k.wijesin...@gmail.com] > Sent: Wednesday, March 15, 2017 2:19 PM > To: dev@tika.apache.org > Subject: Re: TIKA build error > > Hi Ken, > > I also came across his report and the failing

Re: TIKA build error

2017-03-15 Thread Thejan Wijesinghe
ng details on OS/java used. > > > On Mar 15, 2017, at 10:20am, Thejan Wijesinghe < > thejan.k.wijesin...@gmail.com> wrote: > > > > With the latest commits of TIKA, it fails to build in my system because > one > > of the unit tests in TIKA parsers g

TIKA build error

2017-03-15 Thread Thejan Wijesinghe
With the latest commits of TIKA, the build fails on my system because one of the unit tests in TIKA parsers fails. This is the failing test. Results : Failed tests: ODFParserTest.testOO2Metadata:196 expected:<[]> but was:<[ ]> Tests in error: ODFParserTest.testNullStylesInODTFooter:36

Request for the dataset to test OCR

2017-03-11 Thread Thejan Wijesinghe
Hello Tim, I need the OCR dataset to benchmark my new OCR parser with the existing one. I just saw that you've commented here, https://issues.apache.org/jira/browse/TIKA-2262?focusedCommentId=15862781&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15862781 sayin

Re: Tess4j API for TIKA OCR parser

2017-03-08 Thread Thejan Wijesinghe
t; > Best, > TG > > On Mar 7, 2017 7:44 AM, "Allison, Timothy B." wrote: > > Y and why not give the new tika-eval module a trial to evaluate the > differences in output? :) > > -Original Message- > From: Thamme Gowda [mailto:thammego...@apache.org] &

[jira] [Created] (TIKA-2293) Tess4jOCRParser - A simpler Java version of TesseractOCRParser

2017-03-08 Thread Thejan Wijesinghe (JIRA)
Thejan Wijesinghe created TIKA-2293: --- Summary: Tess4jOCRParser - A simpler Java version of TesseractOCRParser Key: TIKA-2293 URL: https://issues.apache.org/jira/browse/TIKA-2293 Project: Tika

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thejan Wijesinghe
Hi Nick, I thought the same thing. I will try to keep the public method signatures unchanged and will send updates on my progress. On Tue, Mar 7, 2017 at 5:48 PM, Nick Burch wrote: > On Tue, 7 Mar 2017, Thejan Wijesinghe wrote: > >> I have already used the Tess4j API to

Re: Tess4j API for TIKA OCR parser

2017-03-07 Thread Thejan Wijesinghe
/ocr/TesseractOCRParser.java On Tue, Mar 7, 2017 at 1:17 PM, Thejan Wijesinghe < thejan.k.wijesin...@gmail.com> wrote: > Thamme, > I have already used the Tess4j API to rewrite the TesseractOCRParser class. > Although it successfully extracts content from most of the file types,

Re: Tess4j API for TIKA OCR parser

2017-03-06 Thread Thejan Wijesinghe
ter.com/thammegowda> > ~Sent via somebody's Webmail server! > > On Sat, Mar 4, 2017 at 9:04 AM, Thejan Wijesinghe < > thejan.k.wijesin...@gmail.com> wrote: > > > > > Hi Thamme, > > > > Yes. I am using Ubuntu :) and I had ImageMagick and Tesseract bo

Tess4j API for TIKA OCR parser

2017-03-04 Thread Thejan Wijesinghe
Hi Thamme, Yes. I am using Ubuntu :) and I had both ImageMagick and Tesseract installed on my system using apt-get. Since I wasn't sure whether this is a problem with the APT software packages, I built both ImageMagick and Tesseract from sources. I also double checked the availability of Tessera

Re: Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

2017-03-03 Thread Thejan Wijesinghe
017 at 6:16 AM, Thamme Gowda wrote: > Thejan, > > Yes, send your questions to us, and cc dev list. > Looking forward to working with you! > > Best, > TG > > -- > Thamme Gowda > TG | @thammegowda > ~Sent via somebody's IMAP server > > On Mar 2, 2017

Re: Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

2017-03-02 Thread Thejan Wijesinghe
Dear Thamme and Chris, I have commented on the particular JIRA page and subscribed to the dev-mailing list as Thamme suggested. I am really interested in looking into the challenges that Thamme has provided. Thank you for guiding me this way. If I get any issues while working on these problems, is

[jira] [Commented] (TIKA-2262) Supporting Image-to-Text (Image Captioning) in Tika for Image MIME Types

2017-03-02 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15892824#comment-15892824 ] Thejan Wijesinghe commented on TIKA-2262: - I'm comfortable with most of t