[
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175914#comment-16175914
]
ASF GitHub Bot commented on TIKA-2400:
--
smadha commented on a change in pull request #208: Fix for
[
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175913#comment-16175913
]
ASF GitHub Bot commented on TIKA-2400:
--
smadha commented on a change in pull request #208: Fix for
[
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175911#comment-16175911
]
ASF GitHub Bot commented on TIKA-2400:
--
smadha commented on a change in pull request #208: Fix for
[
https://issues.apache.org/jira/browse/TIKA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175537#comment-16175537
]
Hudson commented on TIKA-2466:
--
SUCCESS: Integrated in Jenkins build Tika-trunk #1370 (See
Like Sergey, it’ll take me some time to understand your recommendations. Thank
you!
On one small point:
>return a PCollection>, where ParseResult is a
>class with properties { String content, Metadata metadata }
For this option, I’d strongly encourage using the
Hi all,
One other thing is that Tika extracts metadata, and language information in
which order
doesn’t matter (Keys can be out of order).
Would this be useful?
Cheers,
Chris
On 9/21/17, 2:10 PM, "Sergey Beryozkin" wrote:
Hi Eugene
Thank you, very
Hi Eugene
Thank you, very helpful, let me read it a few times before I get what
exactly I need to clarify :-), two questions so far:
On 21/09/17 21:40, Eugene Kirpichov wrote:
Thanks all for the discussion. It seems we have consensus that both
within-document order and association with the
[
https://issues.apache.org/jira/browse/TIKA-2466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-2466.
---
Resolution: Fixed
Fix Version/s: 1.17
Many thanks [~rombert] for your patches! We'll probably
Hi all,
Please also welcome Chris to this thread,
Chris, thanks for joining in :-), FYI, the main concern that was raised
is that it was not obvious when to use TikaIO in the current form, given
that Beam+TikaIO will have a totally unordered sequence of data
(originally extracted by Tika in
Hi Chris, thanks,
On 21/09/17 18:54, Chris Mattmann wrote:
Thanks Sergey, feel free to CC me directly at mattm...@apache.org on the Beam
thread.
My own 2c is that Tika’s “metadata” extraction can be any order, and with our
tika-dl module
and the new feature extraction from multimedia files
Thanks Sergey, feel free to CC me directly at mattm...@apache.org on the Beam
thread.
My own 2c is that Tika’s “metadata” extraction can be any order, and with our
tika-dl module
and the new feature extraction from multimedia files using Tensorflow and DL4j
these are
perfect examples where the
Hi Tim
On 21/09/17 14:33, Allison, Timothy B. wrote:
Thank you, Sergey.
My knowledge of Apache Beam is limited -- I saw Davor and Jean-Baptiste's talk
at ApacheCon in Miami, and I was and am totally impressed, but I haven't had a
chance to work with it yet.
From my perspective, if I
Thank you, Sergey.
My knowledge of Apache Beam is limited -- I saw Davor and Jean-Baptiste's talk
at ApacheCon in Miami, and I was and am totally impressed, but I haven't had a
chance to work with it yet.
From my perspective, if I understand this thread (and I may not!), getting
unordered
Hi Tim
Thanks, will link you to the thread shortly
In general, I'd say TikaIO has probably generated more interest than
some of the other Beam IOs which is a good sign :-)
The questions at the moment:
1) what interesting things can be done with the unordered Tika produced data
2) would it
Hi Sergey,
I just subscribed to Beam's dev list. Can you forward me your latest email so
that I can respond to the thread? Or can you ping me via their list? Thank
you!
-----Original Message-----
From: Sergey Beryozkin [mailto:sberyoz...@gmail.com]
Sent: Thursday, September 21, 2017 5:53
Hi Guys
TikaIO is getting some serious attention now on the Beam dev, and
unfortunately it is not all about it being a great addition to Beam.
The team is wondering what one can do with TikaIO vs someone just doing
some custom Beam function.
TikaIO and as any other Bounded text reader will