Hello!

2015-04-06 Thread Doshi, Nipurn
Hi, I would like to introduce myself to the dev list. I am Nipurn Doshi, a graduate student at Indiana University. I will be interning as a User Experience Researcher in summer 2015, working with Chris Mattmann. Excited to learn and work with everyone. -Regards, Nipurn Doshi

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-06 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481555#comment-14481555 ] Konstantin Gribov commented on TIKA-1330: - [~talli...@mitre.org], you have mixed

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481614#comment-14481614 ] Hudson commented on TIKA-1330: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #607 (See

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481902#comment-14481902 ] Tim Allison commented on TIKA-1330: --- Thank you, [~grossws]! Add robust tika-batch code

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-06 Thread Konstantin Gribov (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481917#comment-14481917 ] Konstantin Gribov commented on TIKA-1330: - That was just a test to check that

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482242#comment-14482242 ] Tim Allison commented on TIKA-1330: --- Thank you for the offer! I'm going to turn to some

Re: Hello!

2015-04-06 Thread Mattmann, Chris A (3980)
Welcome Nipurn, looking forward to this!! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519, Mailstop: 168-527

[jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482241#comment-14482241 ] Tim Allison commented on TIKA-1330: --- Thank you for the offer! I'm going to turn to some

Re: Warm hello!

2015-04-06 Thread Mattmann, Chris A (3980)
Shivika, really looking forward to your contribution! ++ Chris Mattmann, Ph.D. Chief Architect Instrument Software and Science Data Systems Section (398) NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 168-519,

Re: Any interest in running Apache Tika as part of CommonCrawl?

2015-04-06 Thread John Hewson
Yes, this would be great, if you need any PDFBox assistance then count me in. -- John On 3 Apr 2015, at 05:35, tallison314...@gmail.com wrote: All, What do we think? On Friday, April 3, 2015 at 8:23:11 AM UTC-4, talliso...@gmail.com wrote: CommonCrawl currently has the WET format that

[jira] [Commented] (TIKA-1519) Don't allow whatever is in http-equiv Content-Type to overwrite actual Content-Type in HtmlParser

2015-04-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481227#comment-14481227 ] Hudson commented on TIKA-1519: -- SUCCESS: Integrated in tika-trunk-jdk1.7 #605 (See

[jira] [Resolved] (TIKA-1519) Don't allow whatever is in http-equiv Content-Type to overwrite actual Content-Type in HtmlParser

2015-04-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1519. --- Resolution: Fixed r1671533. For now, I've added Content-Type-Hint, and I'm only currently using it in

[jira] [Updated] (TIKA-1519) Don't allow whatever is in http-equiv Content-Type to overwrite actual Content-Type in HtmlParser

2015-04-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1519: -- Fix Version/s: (was: 1.9) 1.8 Don't allow whatever is in http-equiv Content-Type

[jira] [Updated] (TIKA-93) OCR support

2015-04-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-93: - Labels: memex (was: ) OCR support --- Key: TIKA-93

[jira] [Updated] (TIKA-1422) org.apache.tika.parser.mail.RFC822ParserTest fails

2015-04-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1422: --- Labels: memex (was: ) org.apache.tika.parser.mail.RFC822ParserTest fails

[jira] [Updated] (TIKA-605) Tika GDAL parser

2015-04-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-605: -- Labels: gdal gsoc2013 integration memex mentor tika (was: gdal gsoc2013 integration

[jira] [Updated] (TIKA-1441) ExternalParsers should allow dynamic keys to be specified for Regexs

2015-04-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1441: --- Labels: memex (was: ) ExternalParsers should allow dynamic keys to be specified for

[jira] [Updated] (TIKA-1391) Create Parser.parse() example

2015-04-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1391: --- Labels: memex (was: ) Create Parser.parse() example -

[jira] [Updated] (TIKA-1421) Tika-Parsers tests fail on CentOS6 if tesseract isn't installed

2015-04-06 Thread Lewis John McGibbney (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lewis John McGibbney updated TIKA-1421: --- Labels: memex (was: ) Tika-Parsers tests fail on CentOS6 if tesseract isn't