[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-11-21 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16262055#comment-16262055
 ] 

Hudson commented on TIKA-2400:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1395 (See 
[https://builds.apache.org/job/Tika-trunk/1395/])
Update changes with TIKA-2400 / GH-208 (chris.a.mattmann: 
[https://github.com/apache/tika/commit/946614badc212eab8cd59a437ed28f07b14c2fc4])
* (edit) CHANGES.txt


> Standardizing current Object Recognition REST parsers
> -
>
> Key: TIKA-2400
> URL: https://issues.apache.org/jira/browse/TIKA-2400
> Project: Tika
>  Issue Type: Sub-task
>  Components: parser
>Reporter: Thejan Wijesinghe
>Assignee: Chris A. Mattmann
>Priority: Minor
> Fix For: 1.17
>
>
> # This involves adding apiBaseUris and refactoring current Object Recognition 
> REST parsers,
> # Refactoring dockerfiles related to those parsers.
> #  Moving the logic related to checking minimum confidence into servers



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261082#comment-16261082
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

chrismattmann commented on issue #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-346096928
 
 
   Never mind, @ThejanW, I did it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261069#comment-16261069
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

chrismattmann commented on issue #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-346094114
 
 
   @ThejanW can you please also remove the Docker files present in 
captioning/tf and in recognition/tf?




[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261065#comment-16261065
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

chrismattmann closed pull request #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java b/tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java
index d49ef0fed..5fd9d9a97 100644
--- a/tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java
+++ b/tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java
@@ -72,16 +72,16 @@
 MediaType.image("gif")
 })));
 
-private static final String LABEL_LANG = "en";
+private static final String LABEL_LANG = "eng";
 
 @Field
-private URI apiBaseUri;
+private URI apiBaseUri = URI.create("http://localhost:8764/inception/v3");
 
 @Field
-private int captions;
+private int captions = 5;
 
 @Field
-private int maxCaptionLength;
+private int maxCaptionLength = 15;
 
 private URI apiUri;
 
@@ -107,7 +107,7 @@ public boolean isAvailable() {
 public void initialize(Map params) throws TikaConfigException {
 try {
 healthUri = URI.create(apiBaseUri + "/ping");
-apiUri = URI.create(apiBaseUri + String.format(Locale.getDefault(), "/captions?beam_size=%1$d&max_caption_length=%2$d",
+apiUri = URI.create(apiBaseUri + String.format(Locale.getDefault(), "/caption/image?beam_size=%1$d&max_caption_length=%2$d",
 captions, maxCaptionLength));
 
 DefaultHttpClient client = new DefaultHttpClient();
diff --git 
a/tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 
b/tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
index 37caf4538..a5a126ba9 100644
--- 
a/tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
+++ 
b/tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
@@ -55,11 +55,9 @@
  * <properties>
  *  <parsers>
  *   <parser class="org.apache.tika.parser.recognition.ObjectRecognitionParser">
- *    <mime>image/jpeg</mime>
  *    <params>
- *      <param name="topN" type="int">2</param>
- *      <param name="minConfidence" type="double">0.015</param>
  *      <param name="class" type="string">org.apache.tika.parser.recognition.tf.TensorflowRESTRecogniser</param>
+ *      <param name="class" type="string">org.apache.tika.parser.captioning.tf.TensorflowRESTCaptioner</param>
  *    </params>
  *   </parser>
  *  </parsers>
@@ -83,12 +81,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
-
-@Field
-private int topN = 2;
-
 private ObjectRecogniser recogniser;
 
 @Field(name = "class")
@@ -102,7 +94,6 @@ public void initialize(Map params) throws TikaConfigException {
 recogniser.initialize(params);
 LOG.info("Recogniser = {}", recogniser.getClass().getName());
 LOG.info("Recogniser Available = {}", recogniser.isAvailable());
-LOG.info("minConfidence = {}, topN={}", minConfidence, topN);
 }
 
 @Override
@@ -140,29 +131,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String labelAndConfidence = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_IMG_CAP, labelAndConfidence);
 xhtmlIds.add(String.valueOf(count++));
 } else {
 if (xhtmlStartVal == null) xhtmlStartVal = "objects";
-if (object.getConfidence() >= minConfidence) {
-count++;
-LOG.info("Add {}", object);
-String mdValue = 
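
The captioner change in this diff derives both endpoints (`/ping` and the caption endpoint) from a single `apiBaseUri` field and gives `captions` and `maxCaptionLength` defaults. A minimal standalone sketch of that URI assembly follows; `CaptionerUris` is a hypothetical helper, not Tika code, and the second query parameter name, `max_caption_length`, is inferred from the field name and should be checked against the server.

```java
import java.net.URI;
import java.util.Locale;

public class CaptionerUris {

    // Default base URI, as set on the apiBaseUri field in the diff.
    static final URI DEFAULT_API_BASE_URI = URI.create("http://localhost:8764/inception/v3");

    // Health-check endpoint: apiBaseUri + "/ping", mirroring initialize().
    static URI healthUri(URI apiBaseUri) {
        return URI.create(apiBaseUri + "/ping");
    }

    // Caption endpoint. "max_caption_length" is an assumed parameter name
    // inferred from the maxCaptionLength field.
    static URI captionUri(URI apiBaseUri, int captions, int maxCaptionLength) {
        // Locale.ROOT keeps the formatted numbers independent of the JVM default locale.
        String query = String.format(Locale.ROOT,
                "/caption/image?beam_size=%1$d&max_caption_length=%2$d",
                captions, maxCaptionLength);
        return URI.create(apiBaseUri + query);
    }

    public static void main(String[] args) {
        // Defaults from the diff: captions = 5, maxCaptionLength = 15.
        System.out.println(healthUri(DEFAULT_API_BASE_URI));
        System.out.println(captionUri(DEFAULT_API_BASE_URI, 5, 15));
    }
}
```

The sketch uses `Locale.ROOT` rather than the diff's `Locale.getDefault()` so that the numbers in the query string do not depend on the locale the JVM happens to run under.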

[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261066#comment-16261066
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

chrismattmann commented on issue #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-346093396
 
 
   Yes, looks great! Great job, @ThejanW.




[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-11-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16261049#comment-16261049
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on issue #208: Fix for TIKA-2400 Standardizing current Object 
Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-346091934
 
 
   @chrismattmann can we merge this?




[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16240829#comment-16240829
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on issue #208: Fix for TIKA-2400 Standardizing current Object 
Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-342274754
 
 
   @thammegowda @chrismattmann @smadha This is complete now. I have updated the TensorFlow version and models to the latest (TF 1.4.0). Currently the object recognition REST parsers are not functioning because the URLs of imagenet_lsvrc_2015_synsets.txt and imagenet_metadata.txt have changed; this PR resolves those issues as well. It would therefore be nice if we could merge this before 1.17. Testing instructions are included in the initial comment.




[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-10-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205529#comment-16205529
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on issue #208: Fix for TIKA-2400 Standardizing current Object 
Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-336805333
 
 
   The new URLs are:
   
https://raw.githubusercontent.com/tensorflow/models/master/research/inception/inception/data/imagenet_lsvrc_2015_synsets.txt
   
https://raw.githubusercontent.com/tensorflow/models/master/research/inception/inception/data/imagenet_metadata.txt
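
Comparing these with the old, now-404 locations, the only difference is an extra `research/` path segment after `master/`. A small, hypothetical helper (not part of Tika or its Dockerfiles) that rewrites old URLs to the new layout could look like this:

```java
public class InceptionDataUrls {

    static final String OLD_PREFIX =
            "https://raw.githubusercontent.com/tensorflow/models/master/inception/";
    static final String NEW_PREFIX =
            "https://raw.githubusercontent.com/tensorflow/models/master/research/inception/";

    // Rewrites an old inception data URL to its research/ location.
    // URLs that are already migrated or unrelated are returned unchanged.
    static String migrate(String url) {
        if (url.startsWith(OLD_PREFIX)) {
            return NEW_PREFIX + url.substring(OLD_PREFIX.length());
        }
        return url;
    }

    public static void main(String[] args) {
        String[] files = {"imagenet_lsvrc_2015_synsets.txt", "imagenet_metadata.txt"};
        for (String file : files) {
            System.out.println(migrate(OLD_PREFIX + "inception/data/" + file));
        }
    }
}
```

Because the new prefix does not start with the old one, calling `migrate` twice is harmless.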
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-10-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205509#comment-16205509
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on issue #208: Fix for TIKA-2400 Standardizing current Object 
Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-336800656
 
 
   I was getting the same error. Nothing is wrong with your Docker setup. The problem was the download URLs of **imagenet_lsvrc_2015_synsets.txt** and **imagenet_metadata.txt**. Apparently the TF maintainers have moved these metadata files and models to another repo, https://github.com/tensorflow/serving. 
   Both of the old URLs now return a 404:
https://raw.githubusercontent.com/tensorflow/models/master/inception/inception/data/imagenet_lsvrc_2015_synsets.txt
   
https://raw.githubusercontent.com/tensorflow/models/master/inception/inception/data/imagenet_metadata.txt
   I'll update them with the new URLs.
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-10-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16205254#comment-16205254
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on issue #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-336733795
 
 
   I can't test either the image docker or the video docker.
   
   ```
   docker run -it -p 8764:8764 uscdatascience/inception-rest-tika
   Unable to find image 'uscdatascience/inception-rest-tika:latest' locally
   latest: Pulling from uscdatascience/inception-rest-tika
   9fb6c798fa41: Already exists
   3b61febd4aef: Already exists
   9d99b9777eb0: Already exists
   d010c8cf75d7: Already exists
   7fac07fb303e: Already exists
   5601f0fca79b: Already exists
   dad2688af054: Already exists
   efa7176a3f6c: Already exists
   5ba941a90099: Already exists
   b5a6f1155f94: Already exists
   7e863f718dc4: Already exists
   Digest: 
sha256:20840a9c9e5cd2fed7d6c19ba38901c9ba6ec06fe0afe13b9f6624dc12e2
   Status: Downloaded newer image for uscdatascience/inception-rest-tika:latest
   Can't import video libraries, No video functionality is available
   Traceback (most recent call last):
     File "/usr/bin/inceptionapi.py", line 265, in <module>
       app = Classifier(__name__)
     File "/usr/bin/inceptionapi.py", line 221, in __init__
       self.names = imagenet.create_readable_names_for_imagenet_labels()
     File "/models-c15fada28113eca32dc98d6e3bec4755d0d5b4c2/slim/datasets/imagenet.py", line 93, in create_readable_names_for_imagenet_labels
       assert num_synsets_in_ilsvrc == 1000
   AssertionError
   ```
   
   ```
   $ docker run -it -p 8764:8764 uscdatascience/inception-video-rest-tika
   .
   cv2.__version__ 3.2.0
   Traceback (most recent call last):
     File "/usr/bin/inceptionapi.py", line 265, in <module>
       app = Classifier(__name__)
     File "/usr/bin/inceptionapi.py", line 221, in __init__
       self.names = imagenet.create_readable_names_for_imagenet_labels()
     File "/models-c15fada28113eca32dc98d6e3bec4755d0d5b4c2/slim/datasets/imagenet.py", line 93, in create_readable_names_for_imagenet_labels
       assert num_synsets_in_ilsvrc == 1000
   AssertionError
   ```
   
   I will wait for others to test and confirm whether the issue is with my docker setup or with the images.
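
The bare `AssertionError` in these tracebacks is what a failed synsets download looks like downstream: when the synsets URL returns a 404, the file no longer holds the 1000 synset IDs the assertion expects. The servers themselves are Python; to stay with the thread's Java, here is a hedged sketch of a fail-fast check with a more actionable message. `SynsetCheck` and its methods are hypothetical, and the 1000 is taken from the assertion in the traceback.

```java
public class SynsetCheck {

    // Count asserted by slim/datasets/imagenet.py in the traceback above.
    static final int EXPECTED_SYNSETS = 1000;

    // Counts non-blank lines; each line of the synsets file should be one WordNet ID.
    static int countSynsets(String fileContents) {
        return (int) fileContents.lines().filter(line -> !line.isBlank()).count();
    }

    // Fails fast with an actionable message instead of a bare AssertionError.
    static void requireValidSynsets(String fileContents, String sourceUrl) {
        int found = countSynsets(fileContents);
        if (found != EXPECTED_SYNSETS) {
            throw new IllegalStateException("Expected " + EXPECTED_SYNSETS + " synsets from "
                    + sourceUrl + " but found " + found
                    + "; the file may have moved (check for an HTTP 404)");
        }
    }

    public static void main(String[] args) {
        // Illustrative error body standing in for a failed download.
        String errorBody = "404: Not Found";
        try {
            requireValidSynsets(errorBody,
                    "https://raw.githubusercontent.com/tensorflow/models/master/inception/inception/data/imagenet_lsvrc_2015_synsets.txt");
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```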
   
   
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187225#comment-16187225
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r142016934
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
 
 Review comment:
   As of now, to get the label and confidence, people have to split the string. I think traversing two arrays in a single loop would be easier than that, and we can ensure that the two arrays are always the same length.
   
   Also, if you want JSON, why not store serialised JSON in one metadata key? It looks bad, but it is better than a single String with space-separated label and confidence.
   
   I'll leave it up to you guys. :+1:
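
The alternatives in this review can be sketched side by side: the combined `"label (confidence)"` string from the diff, two parallel lists of equal length, and one serialised JSON value. `Result`, `labels`, `confidences` and `toJson` are hypothetical names; in Tika each parallel list would presumably go under its own multi-valued metadata key. The JSON builder is deliberately minimal and only escapes quotes and backslashes, so a real implementation should use a JSON library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LabelConfidenceEncoding {

    // Hypothetical stand-in for a recognised object or caption.
    record Result(String label, double confidence) {}

    // Current format: "label (confidence)"; consumers must split the string,
    // which is ambiguous when the label itself contains " (".
    static String combined(Result r) {
        return String.format(Locale.ENGLISH, "%s (%.5f)", r.label(), r.confidence());
    }

    // Option 1: parallel lists; index i of each list describes the same object.
    static List<String> labels(List<Result> results) {
        List<String> out = new ArrayList<>();
        for (Result r : results) out.add(r.label());
        return out;
    }

    static List<String> confidences(List<Result> results) {
        List<String> out = new ArrayList<>();
        for (Result r : results) out.add(String.format(Locale.ENGLISH, "%.5f", r.confidence()));
        return out;
    }

    // Option 2: one serialised JSON array stored under a single key.
    static String toJson(List<Result> results) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < results.size(); i++) {
            Result r = results.get(i);
            if (i > 0) sb.append(',');
            String label = r.label().replace("\\", "\\\\").replace("\"", "\\\"");
            sb.append("{\"label\":\"").append(label).append("\",\"confidence\":")
              .append(String.format(Locale.ENGLISH, "%.5f", r.confidence())).append('}');
        }
        return sb.append(']').toString();
    }

    public static void main(String[] args) {
        List<Result> results = List.of(new Result("a dog (puppy) on a beach", 0.5),
                                       new Result("a cat", 0.25));
        for (Result r : results) System.out.println(combined(r));
        System.out.println(labels(results) + " " + confidences(results));
        System.out.println(toJson(results));
    }
}
```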
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186973#comment-16186973
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on issue #208: Fix for TIKA-2400 Standardizing current Object 
Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-333291933
 
 
   @thammegowda you can't test the video docker, because you haven't pulled the 
correct docker image. The docker image for video docker is 
`uscdatascience/inception-video-rest-tika`. Please see my initial comment of 
this PR.
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16186968#comment-16186968
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r142000829
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   @thammegowda your understanding is exactly correct.
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178404#comment-16178404
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140669630
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   Correct me if my understanding is wrong:
   - We have removed minConfidence and topN from ObjectRecognitionParser.
   + We have added them to the classes that implement the `ObjectRecogniser` interface, like TensorflowRestRecogniser, TensorflowRestImageCaptioner, etc. These are referred to as _clients_ in Thejan's terminology.
   + We also have a URL accompanying each _client_, which allows these parameters to be tweaked.
   
   
   Food for design thought: we might not have URLs for every client. To be specific, we could have a client using DL4J that doesn't use REST communication. These parameters are required by the client, and hence the client should hold them.
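
One way to read this design point: the recogniser interface stays transport-agnostic, and each client owns `topN` and `minConfidence`. A REST client forwards them to its server, while an in-process client (the DL4J case) applies them itself. Everything below is a hypothetical simplification, not Tika's actual `ObjectRecogniser` API, and the REST path and query parameter names are invented for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.function.Function;

public class RecogniserDesign {

    // Hypothetical result type; Tika's real RecognisedObject carries more fields.
    record Recognised(String label, double confidence) {}

    // Simplified stand-in for a recogniser interface.
    interface Recogniser {
        List<Recognised> recognise(byte[] image);
    }

    // REST client: the server applies the filtering, so the client only forwards
    // its parameters. Path and parameter names are illustrative only.
    static final class RestRecogniser implements Recogniser {
        private final String apiBaseUri;
        private final int topN;
        private final double minConfidence;

        RestRecogniser(String apiBaseUri, int topN, double minConfidence) {
            this.apiBaseUri = apiBaseUri;
            this.topN = topN;
            this.minConfidence = minConfidence;
        }

        String requestUri() {
            return apiBaseUri + String.format(Locale.ROOT,
                    "/classify/image?topn=%d&min_confidence=%s", topN, minConfidence);
        }

        @Override
        public List<Recognised> recognise(byte[] image) {
            throw new UnsupportedOperationException("HTTP call omitted in this sketch");
        }
    }

    // In-process client: no server, so it holds and applies the parameters itself.
    static final class LocalRecogniser implements Recogniser {
        private final Function<byte[], List<Recognised>> model; // raw, unfiltered predictions
        private final int topN;
        private final double minConfidence;

        LocalRecogniser(Function<byte[], List<Recognised>> model, int topN, double minConfidence) {
            this.model = model;
            this.topN = topN;
            this.minConfidence = minConfidence;
        }

        @Override
        public List<Recognised> recognise(byte[] image) {
            List<Recognised> sorted = new ArrayList<>(model.apply(image));
            sorted.sort(Comparator.comparingDouble(Recognised::confidence).reversed());
            List<Recognised> accepted = new ArrayList<>();
            for (Recognised r : sorted) {
                // Sorted by descending confidence, so nothing after a miss can qualify.
                if (accepted.size() == topN || r.confidence() < minConfidence) break;
                accepted.add(r);
            }
            return accepted;
        }
    }

    public static void main(String[] args) {
        Recogniser local = new LocalRecogniser(img -> List.of(
                new Recognised("lynx", 0.03),
                new Recognised("tabby cat", 0.72),
                new Recognised("tiger cat", 0.21)), 2, 0.05);
        System.out.println(local.recognise(new byte[0]));
        System.out.println(new RestRecogniser("http://localhost:8764", 2, 0.05).requestUri());
    }
}
```

The defaults used in `main` (topN = 2, minConfidence = 0.05) mirror the fields removed from `ObjectRecognitionParser` in this PR.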
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178398#comment-16178398
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140669630
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject 
o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   Correct me if my understanding is wrong:
   - We have removed minConfidence and topN from ObjectRecognitionParser.
   + We have added them to the classes that implement the `ObjectRecogniser` interface, like TensorflowRESTRecogniser, TensorflowRESTCaptioner, etc. These are referred to as _clients_ in Thejan's terminology.
   + We also have a URL accompanying each _client_, which allows tweaking these parameters.
   
   
   Food for design thought: we might not have URLs for every client. To be specific, we could have a client using DL4J that doesn't use REST communication. So these parameters belong to the client, and hence the client should hold them.
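   The design point above can be sketched as follows. This is a minimal illustration, not Tika's actual `ObjectRecogniser` API: the interface `Recogniser`, the classes `RestRecogniser` and `LocalModelRecogniser`, and the string-keyed `params` map are all hypothetical. Each client owns `minConfidence` and `topN`; only the REST-backed one also carries a base URI.

   ```java
   import java.net.URI;
   import java.util.Map;

   // Hypothetical, simplified client contract; NOT Tika's actual ObjectRecogniser interface.
   interface Recogniser {
       void initialize(Map<String, String> params);

       double getMinConfidence();

       int getTopN();

       // Helpers so each client reads only the keys it cares about.
       static double doubleParam(Map<String, String> params, String key, double fallback) {
           String v = params.get(key);
           return v == null ? fallback : Double.parseDouble(v);
       }

       static int intParam(Map<String, String> params, String key, int fallback) {
           String v = params.get(key);
           return v == null ? fallback : Integer.parseInt(v);
       }
   }

   // REST-backed client: owns the filtering knobs and a base URI for its server.
   class RestRecogniser implements Recogniser {
       private double minConfidence = 0.015;
       private int topN = 2;
       private URI apiBaseUri = URI.create("http://localhost:8764/inception/v4");

       @Override
       public void initialize(Map<String, String> params) {
           minConfidence = Recogniser.doubleParam(params, "minConfidence", minConfidence);
           topN = Recogniser.intParam(params, "topN", topN);
           String base = params.get("apiBaseUri");
           if (base != null) {
               apiBaseUri = URI.create(base);
           }
       }

       @Override public double getMinConfidence() { return minConfidence; }
       @Override public int getTopN() { return topN; }
       URI getApiBaseUri() { return apiBaseUri; }
   }

   // In-process client (e.g. a DL4J model): same knobs, but no URL at all.
   class LocalModelRecogniser implements Recogniser {
       private double minConfidence = 0.05; // hypothetical default
       private int topN = 5;                // hypothetical default

       @Override
       public void initialize(Map<String, String> params) {
           minConfidence = Recogniser.doubleParam(params, "minConfidence", minConfidence);
           topN = Recogniser.intParam(params, "topN", topN);
       }

       @Override public double getMinConfidence() { return minConfidence; }
       @Override public int getTopN() { return topN; }
   }
   ```

   The parser would then only ask a client for results, without knowing whether a URL is involved.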
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178403#comment-16178403
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140669424
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java
 ##
 @@ -107,7 +107,7 @@ public boolean isAvailable() {
 public void initialize(Map params) throws TikaConfigException {
 try {
 healthUri = URI.create(apiBaseUri + "/ping");
-apiUri = URI.create(apiBaseUri + String.format(Locale.getDefault(), "/captions?beam_size=%1$d_caption_length=%2$d",
+apiUri = URI.create(apiBaseUri + String.format(Locale.getDefault(), "/caption/image?beam_size=%1$d_caption_length=%2$d",
 
 Review comment:
   Improvement: `String.format(Locale.getDefault(), ...)` and `String.format(...)` are equivalent, right (the default is inferred implicitly)? 
   Rule of thumb: 1) When you have two options, pick the simpler one! For me, the latter looks simpler.
   2) If you want to enforce a specific locale, then it is not the same as the default.
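   A small sketch of why the locale argument matters for the `"%s (%.5f)"` pattern used in `ObjectRecognitionParser`: the decimal separator follows the locale. One nuance on the question above: the no-locale overload `String.format(String, Object...)` uses `Locale.getDefault(Locale.Category.FORMAT)`, which normally equals `Locale.getDefault()` but can differ if the FORMAT category was set separately. The class name `LocaleFormatDemo` is illustrative.

   ```java
   import java.util.Locale;

   class LocaleFormatDemo {
       // Same pattern the parser uses for metadata values: "label (confidence)".
       static String mdValue(Locale locale, String label, double confidence) {
           return String.format(locale, "%s (%.5f)", label, confidence);
       }

       // String.format(String, Object...) formats with the FORMAT-category default locale,
       // which is usually, but not necessarily, equal to Locale.getDefault().
       static boolean formatDefaultMatchesDefault() {
           return Locale.getDefault(Locale.Category.FORMAT).equals(Locale.getDefault());
       }

       public static void main(String[] args) {
           System.out.println(mdValue(Locale.ENGLISH, "cat", 0.87654321)); // cat (0.87654)
           System.out.println(mdValue(Locale.GERMANY, "cat", 0.87654321)); // cat (0,87654)
           System.out.println(formatDefaultMatchesDefault()); // depends on JVM settings
       }
   }
   ```

   This is why the metadata code pins `Locale.ENGLISH`: the stored value stays parseable regardless of where Tika runs.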
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178401#comment-16178401
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140669367
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java
 ##
 @@ -75,13 +75,13 @@
 private static final String LABEL_LANG = "en";
 
 Review comment:
   Improvement: We should use `eng` as per [ISO 639-2](https://www.loc.gov/standards/iso639-2/php/code_list.php). I wish I had known this when I coded this up.
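   If a three-letter code is needed, the JDK can derive it from the two-letter one, as in this minimal sketch (`LangCodes` is a hypothetical helper). Note that `Locale.getISO3Language()` returns ISO 639-2/T codes, so German maps to `deu` rather than the bibliographic `ger`; for English both forms are `eng`.

   ```java
   import java.util.Locale;

   class LangCodes {
       // ISO 639-1 (two-letter) tag -> ISO 639-2/T (three-letter) code, using the JDK's tables.
       // Throws MissingResourceException if the JDK has no three-letter code for the language.
       static String toThreeLetterCode(String twoLetterCode) {
           return Locale.forLanguageTag(twoLetterCode).getISO3Language();
       }

       public static void main(String[] args) {
           System.out.println(toThreeLetterCode("en"));          // eng
           System.out.println(toThreeLetterCode("de"));          // deu (T code; the B code is "ger")
           System.out.println(Locale.ENGLISH.getISO3Language()); // eng
       }
   }
   ```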
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178402#comment-16178402
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140670497
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_IMG_CAP, mdVal);
 xhtmlIds.add(String.valueOf(count++));
 } else {
 if (xhtmlStartVal == null) xhtmlStartVal = "objects";
-if (object.getConfidence() >= minConfidence) {
-count++;
-LOG.info("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_OBJ_REC, mdValue);
-acceptedObjects.add(object);
-xhtmlIds.add(object.getId());
-if (count >= topN) {
-break;
-}
-} else {
-LOG.warn("Object {} confidence {} less than min {}", object, object.getConfidence(), minConfidence);
-}
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_OBJ_REC, mdVal);
+xhtmlIds.add(object.getId());
 }
+LOG.info("Add {}", object);
 
 Review comment:
   > will be great if you can remove String concatenation from 
RecognisedObject.toString to use StringBuffer or String format 
   
   If you suggested this for a performance gain, let's take a deeper look. `RecognisedObject.toString()` does not run in a loop; it's just one giant concatenation with `+`. I remember reading somewhere that the JDK can easily optimize such statements, but I couldn't find the source of this knowledge now, so I am giving you this test: 
   ```java
   class Main {
   
       public static long concat(int n) {
           long st = System.nanoTime();
           for (int i = 0; i < n; i++) {
               String s = "a" + "b" + "c" + "d" + "e" + "f" +
                       "g" + "h" + "i" + "j" + "k";
           }
           return System.nanoTime() - st;
       }
   
       public static long builder(int n) {
           long st = System.nanoTime();
           for (int i = 0; i < n; i++) {
               String s = new StringBuilder().append("a").append("b")
                       .append("c").append("d").append("e").append("f")
                       .append("g").append("h").append("i").append("j")
                       .append("k").toString();
           }
           return System.nanoTime() - st;
       }
   
       public static void main(String[] args) {
           int n = 1_000_000;
           System.out.printf("Builder Time in ns : %10d%n", builder(n));
           System.out.printf(" Concat Time in ns : %10d%n", concat(n));
       }
   }
   ```
   I ran it on  https://repl.it/languages/java
   
   ```
   java version "1.8.0_31"
   Java(TM) SE Runtime Environment (build 1.8.0_31-b13)
   Java HotSpot(TM) 64-Bit Server VM (build 25.31-b07, mixed mode)
  
   Builder Time in ns :   50614748
    Concat Time in ns :    2500615
   ```
   See, it's in fact better!
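   A caveat on the benchmark above: every operand of the `+` chain is a string literal, so the whole expression is a constant expression in JLS terms and javac emits the single literal `"abcdefghijk"`. The concat loop therefore does no string building at runtime, and since `s` is never used, the JIT may drop either loop entirely. The conclusion about `toString()` still plausibly holds for a different reason: javac compiles a single non-constant `+` expression into a `StringBuilder` chain (Java 8) or an `invokedynamic` call via `StringConcatFactory` (Java 9+, JEP 280). The sketch below (hypothetical class `ConcatBench`) uses runtime operands and consumes the results; for trustworthy numbers, a harness such as JMH is the better tool.

   ```java
   class ConcatBench {
       // Operands come from an array, so javac cannot fold them into one constant.
       static final String[] PARTS = {"a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k"};

       static String concat(String[] p) {
           return p[0] + p[1] + p[2] + p[3] + p[4] + p[5] + p[6] + p[7] + p[8] + p[9] + p[10];
       }

       static String builder(String[] p) {
           return new StringBuilder().append(p[0]).append(p[1]).append(p[2]).append(p[3])
                   .append(p[4]).append(p[5]).append(p[6]).append(p[7])
                   .append(p[8]).append(p[9]).append(p[10]).toString();
       }

       // Literal-only "+" is a constant expression: javac folds it into one interned
       // literal, so this reference comparison is true. This is why the original
       // concat loop measured almost nothing.
       static boolean isFoldedConstant() {
           return ("a" + "b" + "c") == "abc";
       }

       public static void main(String[] args) {
           int n = 1_000_000;
           long sink = 0; // consuming results makes it harder for the JIT to discard the work
           for (int round = 0; round < 3; round++) { // early rounds include JIT warm-up
               long t0 = System.nanoTime();
               for (int i = 0; i < n; i++) sink += builder(PARTS).length();
               long t1 = System.nanoTime();
               for (int i = 0; i < n; i++) sink += concat(PARTS).length();
               long t2 = System.nanoTime();
               System.out.printf("round %d  builder: %10d ns  concat: %10d ns%n", round, t1 - t0, t2 - t1);
           }
           System.out.println(sink);
       }
   }
   ```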
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Standardizing current Object Recognition REST parsers
> -
>
> Key: TIKA-2400
> URL: https://issues.apache.org/jira/browse/TIKA-2400
> Project: Tika
>  Issue Type: Sub-task
>  Components: parser
>Reporter: 

[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178399#comment-16178399
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140670009
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
 
 Review comment:
   > would be great if we can store object.getLabel() and 
object.getConfidence() into separate metadata fields. 
   
   IMHO, it complicates the metadata key-values. If we split them, we get two arrays, one of confidences and one of labels, and users then have to match labels with confidences by array index. One solution to this problem is still an open issue in Tika, i.e., supporting complex data structures like JSON for metadata. Until then, we have the full info captured in the XHTML content, so it should be fine.
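   To illustrate that the combined value keeps each label paired with its confidence, here is a minimal sketch (hypothetical class `LabelConfidence`) that writes values with the same `"%s (%.5f)"` pattern and `Locale.ENGLISH` used in the diff, and parses them back. The regular expression is an assumption tied to that exact format.

   ```java
   import java.util.Locale;
   import java.util.regex.Matcher;
   import java.util.regex.Pattern;

   class LabelConfidence {
       // Assumed to match values written as String.format(Locale.ENGLISH, "%s (%.5f)", label, confidence).
       // The greedy label group lets labels themselves contain parentheses.
       private static final Pattern MD_VALUE = Pattern.compile("^(.*) \\((\\d+\\.\\d{5})\\)$");

       final String label;
       final double confidence;

       LabelConfidence(String label, double confidence) {
           this.label = label;
           this.confidence = confidence;
       }

       String format() {
           return String.format(Locale.ENGLISH, "%s (%.5f)", label, confidence);
       }

       static LabelConfidence parse(String mdValue) {
           Matcher m = MD_VALUE.matcher(mdValue);
           if (!m.matches()) {
               throw new IllegalArgumentException("Unexpected metadata value: " + mdValue);
           }
           return new LabelConfidence(m.group(1), Double.parseDouble(m.group(2)));
       }
   }
   ```

   Note that a round trip keeps only the five decimals written by the format string.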
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16178400#comment-16178400
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140670568
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -73,19 +74,27 @@
 /**
  * Maximum buffer size for image
  */
-private static final String LABEL_LANG = "en";
+protected static final String LABEL_LANG = "en";
 
 Review comment:
   Also, in the future, wherever you need a language code, please use ISO 639-2, which is `eng` for English.
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176010#comment-16176010
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140421441
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   Yes, minConfidence and topN can be set through the CLI / Tika config, since we have defined them in the REST clients. TensorflowRESTVideoRecogniser extends TensorflowRESTRecogniser; that's why I made some of the fields in TensorflowRESTRecogniser protected (we need them there to derive apiUri and healthUri).
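   A minimal sketch of the pattern described above, where a subclass reuses protected fields and only swaps the endpoint. `BaseRestRecogniser`, `VideoRestRecogniser`, the `/classify/...` paths and the `topn`/`min_confidence` query names are hypothetical; only the `apiBaseUri + "/ping"` health check mirrors the diff in this thread.

   ```java
   import java.net.URI;

   // Hypothetical base client; fields are protected so subclasses can reuse them.
   class BaseRestRecogniser {
       protected URI apiBaseUri = URI.create("http://localhost:8764/inception/v4");
       protected int topN = 2;
       protected double minConfidence = 0.015;
       protected URI apiUri;
       protected URI healthUri;

       // Subclasses override only this; the path is hypothetical.
       protected String endpointPath() {
           return "/classify/image";
       }

       // Derive every endpoint from apiBaseUri (query parameter names are hypothetical).
       void initialize() {
           healthUri = URI.create(apiBaseUri + "/ping");
           apiUri = URI.create(apiBaseUri + endpointPath()
                   + "?topn=" + topN + "&min_confidence=" + minConfidence);
       }
   }

   class VideoRestRecogniser extends BaseRestRecogniser {
       @Override
       protected String endpointPath() {
           return "/classify/video"; // hypothetical path
       }
   }
   ```

   Overriding a small method keeps the URI-building logic in one place, while the protected fields stay configurable in both classes.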
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175997#comment-16175997
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140420790
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   Please see:
   
https://github.com/ThejanW/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/captioning/tf/TensorflowRESTCaptioner.java#L77-L84
   
   
https://github.com/ThejanW/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java#L79-L86
   
   
https://github.com/ThejanW/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTVideoRecogniser.java#L71-L72
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175996#comment-16175996
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140420704
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   Sorry, I misunderstood your question. The reason I removed minConfidence and topN from ObjectRecognitionParser is that ObjectRecognitionParser does not need to keep such client-specific parameters. Those fields belong in the specific client; we are just using ObjectRecognitionParser to process the objects returned by the respective REST client and put them in the XHTML.
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175914#comment-16175914
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140411633
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -73,19 +74,27 @@
 /**
  * Maximum buffer size for image
  */
-private static final String LABEL_LANG = "en";
+protected static final String LABEL_LANG = "en";
 
 @Field
-private URI apiUri = URI.create("http://localhost:8764/inception/v4/classify?topk=10");
+protected URI apiBaseUri = URI.create("http://localhost:8764/inception/v4");
+
+@Field
+protected int topN = 2;
+
 @Field
-private URI healthUri = URI.create("http://localhost:8764/inception/v4/ping");
+protected double minConfidence = 0.015;
+
+protected URI apiUri;
+
+protected URI healthUri;
 
 Review comment:
   You can still keep a default value by extracting String constants and deriving the default values from them. No big deal, though.
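   The suggestion above could look roughly like this (hypothetical class `UriDefaults` and constant names): the derived URIs get non-null defaults built from the same constants, and `initialize()` recomputes them if `apiBaseUri` was overridden. The `/classify` path is an assumption.

   ```java
   import java.net.URI;

   class UriDefaults {
       // Extracted constants (names and the /classify path are hypothetical).
       static final String DEFAULT_API_BASE_URI = "http://localhost:8764/inception/v4";
       static final String HEALTH_PATH = "/ping";
       static final String CLASSIFY_PATH = "/classify";

       // Configurable field with a default, as with an @Field-annotated member.
       URI apiBaseUri = URI.create(DEFAULT_API_BASE_URI);

       // Derived URIs get non-null defaults built from the same constants.
       URI healthUri = URI.create(DEFAULT_API_BASE_URI + HEALTH_PATH);
       URI apiUri = URI.create(DEFAULT_API_BASE_URI + CLASSIFY_PATH);

       // Recompute the derived URIs in case apiBaseUri was overridden by configuration.
       void initialize() {
           healthUri = URI.create(apiBaseUri + HEALTH_PATH);
           apiUri = URI.create(apiBaseUri + CLASSIFY_PATH);
       }
   }
   ```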
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175913#comment-16175913
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325811
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_IMG_CAP, mdVal);
 xhtmlIds.add(String.valueOf(count++));
 } else {
 if (xhtmlStartVal == null) xhtmlStartVal = "objects";
-if (object.getConfidence() >= minConfidence) {
-count++;
-LOG.info("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_OBJ_REC, mdValue);
-acceptedObjects.add(object);
-xhtmlIds.add(object.getId());
-if (count >= topN) {
-break;
-}
-} else {
-LOG.warn("Object {} confidence {} less than min {}", object, object.getConfidence(), minConfidence);
-}
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_OBJ_REC, mdVal);
+xhtmlIds.add(object.getId());
 }
+LOG.info("Add {}", object);
 
 Review comment:
   - [ ] Thanks for following the good logging practice of using `{}`. It would be great if you could remove the String concatenation from [`RecognisedObject.toString`](https://github.com/ThejanW/tika/blob/92c65e0a43e7f09a0566bec34f352314dffe5def/tika-parsers/src/main/java/org/apache/tika/parser/recognition/RecognisedObject.java#L84-L90) and use `StringBuffer` or `String.format` instead. You can do it through the IDE with a few clicks. Thanks in advance for the cleanup.
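   For comparison, a hypothetical `Detection` class (not the real `RecognisedObject`, whose fields may differ) with both styles of `toString`. Both produce the same text; `String.format` parses its format string on every call, while a single `+` expression needs no parsing, so the choice is mostly about readability. If a builder is written by hand, `StringBuilder` is usually preferred over the synchronized `StringBuffer`.

   ```java
   import java.util.Locale;

   // Hypothetical stand-in for a recognised object; the real RecognisedObject may differ.
   class Detection {
       private final String label;
       private final String labelLang;
       private final String id;
       private final double confidence;

       Detection(String label, String labelLang, String id, double confidence) {
           this.label = label;
           this.labelLang = labelLang;
           this.id = id;
           this.confidence = confidence;
       }

       // One "+" expression: javac generates the concatenation code itself.
       String toStringConcat() {
           return "Detection{label='" + label + "' (" + labelLang + "), id=" + id
                   + ", confidence=" + confidence + "}";
       }

       // String.format version: same text; Locale.ROOT avoids any dependence on the default locale.
       String toStringFormat() {
           return String.format(Locale.ROOT, "Detection{label='%s' (%s), id=%s, confidence=%s}",
                   label, labelLang, id, confidence);
       }

       @Override
       public String toString() {
           return toStringConcat();
       }
   }
   ```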
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175911#comment-16175911
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140411401
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   @ThejanW For my understanding: `minConfidence` and `topN` can still be tweaked through Tika config / CLI options, right?
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173466#comment-16173466
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140024736
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -73,19 +74,27 @@
 /**
  * Maximum buffer size for image
  */
-private static final String LABEL_LANG = "en";
+protected static final String LABEL_LANG = "en";
 
 @Field
-private URI apiUri = URI.create("http://localhost:8764/inception/v4/classify?topk=10");
+protected URI apiBaseUri = URI.create("http://localhost:8764/inception/v4");
+
+@Field
+protected int topN = 2;
+
 @Field
-private URI healthUri = URI.create("http://localhost:8764/inception/v4/ping");
+protected double minConfidence = 0.015;
+
+protected URI apiUri;
+
+protected URI healthUri;
 
 Review comment:
   I have defined an apiBaseUri and follow that practice in all REST clients; from that apiBaseUri we can derive healthUri and apiUri, see https://github.com/ThejanW/tika/blob/2a81e975e48f2d1e051920725221fc5341e6db5f/tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java#L111-L112
 
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173452#comment-16173452
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140022863
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   Hey! Good catch. It's not easy maintaining comments like these (https://github.com/ThejanW/tika/blob/92c65e0a43e7f09a0566bec34f352314dffe5def/tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java#L49-L70).
   
   Future developers will also miss these. I will update them ASAP :+1: 
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16173438#comment-16173438
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r140021556
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   Yeah, I have moved the minConfidence logic to the REST servers. It is kind of odd
to ask the backend for the top-k objects and then filter them again in the client
against minConfidence to select the topN objects; that is just too much logic in
the client. We can directly ask the backend to give us the topN objects whose
confidence is greater than minConfidence: fewer iterations and a simpler client :100:
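To make the trade-off concrete, here is a minimal sketch of the selection the backend would perform after this change: drop predictions below `minConfidence`, sort by confidence, keep at most `topN`. The backend is a separate REST service, so this Java version is only an illustration; `ScoredLabel` and `ServerSideFilter` are hypothetical names, and the `>=` comparison mirrors the removed client-side check rather than any documented server behaviour.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical (label, confidence) pair as a backend might hold it. */
final class ScoredLabel {
    final String label;
    final double confidence;

    ScoredLabel(String label, double confidence) {
        this.label = label;
        this.confidence = confidence;
    }
}

/** Hypothetical server-side selection replacing the old client-side loop. */
final class ServerSideFilter {
    private ServerSideFilter() {
    }

    /** Keeps predictions with confidence >= minConfidence, best first, at most topN. */
    static List<ScoredLabel> select(List<ScoredLabel> predictions, int topN, double minConfidence) {
        return predictions.stream()
                .filter(p -> p.confidence >= minConfidence)
                .sorted(Comparator.comparingDouble((ScoredLabel p) -> p.confidence).reversed())
                .limit(topN)
                .collect(Collectors.toList());
    }
}
```

With the client defaults that appear later in this PR (`topN = 2`, `minConfidence = 0.015`), `ServerSideFilter.select(predictions, 2, 0.015)` returns the two most confident labels that clear the threshold, so the client no longer needs its own loop with a `break`.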
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172812#comment-16172812
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139329533
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   If you plan to put these controls in the REST URI, please document them somewhere
in the comments and on the wiki too. Also, this needs to be updated in the comments here:
https://github.com/ThejanW/tika/blob/92c65e0a43e7f09a0566bec34f352314dffe5def/tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java#L60
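One way to carry these controls in the REST URI is to append them as query parameters to the configured `apiBaseUri`. The sketch below assumes a `/classify` endpoint (taken from the old default `apiUri`) and hypothetical `topn` / `min_confidence` parameter names; the real names should be checked against the server scripts before they go into comments or the wiki.

```java
import java.net.URI;
import java.util.Locale;

/** Hypothetical helper that puts topN / minConfidence into the request URI. */
final class RecogniserUris {
    private RecogniserUris() {
    }

    static URI classifyUri(URI apiBaseUri, int topN, double minConfidence) {
        String base = apiBaseUri.toString();
        // Avoid a double slash if the configured base already ends with '/'.
        if (base.endsWith("/")) {
            base = base.substring(0, base.length() - 1);
        }
        // Locale.ROOT keeps the digits ASCII; Double.toString is locale-independent.
        String query = String.format(Locale.ROOT, "topn=%d&min_confidence=%s",
                topN, Double.toString(minConfidence));
        return URI.create(base + "/classify?" + query);
    }
}
```

For the default base `http://localhost:8764/inception/v4`, `classifyUri(base, 2, 0.015)` yields `http://localhost:8764/inception/v4/classify?topn=2&min_confidence=0.015`, which is exactly the kind of URI that is worth documenting next to the field.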
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172806#comment-16172806
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325945
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -73,19 +74,27 @@
 /**
  * Maximum buffer size for image
  */
-private static final String LABEL_LANG = "en";
+protected static final String LABEL_LANG = "en";
 
 Review comment:
   - [ ]  It would be great if you could add a comment explaining why it is now
`protected`, so no one changes it in the future.
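A minimal sketch of the kind of comment being requested, on hypothetical stand-ins for `TensorflowRESTRecogniser` and `TensorflowRESTVideoRecogniser`. It assumes the reason for widening the modifier is that the video subclass needs the same label language. Note also that the Javadoc sitting above `LABEL_LANG` in the diff ("Maximum buffer size for image") does not describe this constant, which makes it a natural place for the explanation.

```java
/** Hypothetical stand-in for TensorflowRESTRecogniser. */
class BaseRecogniser {
    /**
     * Language tag attached to labels returned by the REST service.
     * Kept protected (not private) because subclasses, e.g. the video
     * recogniser, build their recognised objects with the same tag;
     * narrowing it back to private would break them.
     */
    protected static final String LABEL_LANG = "en";
}

/** Hypothetical stand-in for TensorflowRESTVideoRecogniser. */
class VideoRecogniser extends BaseRecogniser {
    String labelLanguage() {
        // Reads the inherited constant; this is why it cannot be private.
        return LABEL_LANG;
    }
}
```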
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172799#comment-16172799
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139326146
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -183,4 +164,4 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 metadata.add("no.objects", Boolean.TRUE.toString());
 }
 }
-}
 
 Review comment:
   - [ ] Extra line break
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172796#comment-16172796
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325333
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   - [ ] Any specific reason to remove `minConfidence` and `topN`?
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172804#comment-16172804
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325490
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
 
 Review comment:
   - [ ] It would be great if we could store `object.getLabel()` and
`object.getConfidence()` in separate metadata fields, e.g. by creating a new key
`MD_KEY_CAP_CONFIDENCE` for the confidence, instead of packing both into a
single `String mdVal`.
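A minimal sketch of the suggestion: add the label and the confidence under two keys instead of one formatted string. A `LinkedHashMap` of lists stands in for Tika's multi-valued `Metadata`, and the key strings are hypothetical (only the name `MD_KEY_CAP_CONFIDENCE` comes from this comment).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch of storing caption label and confidence under separate keys. */
final class CaptionMetadataSketch {
    static final String MD_KEY_IMG_CAP = "CAPTION";                   // hypothetical value
    static final String MD_KEY_CAP_CONFIDENCE = "CAPTION_CONFIDENCE"; // hypothetical value

    // Stand-in for a multi-valued metadata store (one key, many values).
    private final Map<String, List<String>> metadata = new LinkedHashMap<>();

    void add(String key, String value) {
        metadata.computeIfAbsent(key, k -> new ArrayList<>()).add(value);
    }

    List<String> get(String key) {
        return metadata.getOrDefault(key, new ArrayList<>());
    }

    /** Adds one caption; the i-th label lines up with the i-th confidence. */
    void addCaption(String label, double confidence) {
        add(MD_KEY_IMG_CAP, label);
        add(MD_KEY_CAP_CONFIDENCE, String.format(Locale.ENGLISH, "%.5f", confidence));
    }
}
```

Because both keys are multi-valued, consumers have to pair the i-th caption with the i-th confidence; that positional coupling is the main cost of splitting the fields.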
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172802#comment-16172802
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325427
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
 
 Review comment:
   - [ ] Can we please rename `mdVal` to something that better describes its value?
For example `imageLabelAndConfidence` or `objectLabelAndConfidence`.
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172790#comment-16172790
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325563
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_IMG_CAP, mdVal);
 xhtmlIds.add(String.valueOf(count++));
 } else {
 if (xhtmlStartVal == null) xhtmlStartVal = "objects";
-if (object.getConfidence() >= minConfidence) {
-count++;
-LOG.info("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_OBJ_REC, mdValue);
-acceptedObjects.add(object);
-xhtmlIds.add(object.getId());
-if (count >= topN) {
-break;
-}
-} else {
-LOG.warn("Object {} confidence {} less than min {}", object, object.getConfidence(), minConfidence);
-}
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
 
 Review comment:
   - [ ] Same comments here: variable name and separate metadata key.
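For reference, a sketch of what the loop reduces to once filtering happens on the server, with the suggested rename applied to both branches. `Recognised`, its `isCaption` flag, and the plain lists are hypothetical stand-ins for `RecognisedObject`, `CaptionObject`, and Tika's `Metadata`; the control flow mirrors the `+` lines of the hunk, with no `minConfidence` or `topN` checks left in the client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical stand-in for RecognisedObject / CaptionObject. */
final class Recognised {
    final String id;
    final String label;
    final double confidence;
    final boolean isCaption;

    Recognised(String id, String label, double confidence, boolean isCaption) {
        this.id = id;
        this.label = label;
        this.confidence = confidence;
        this.isCaption = isCaption;
    }
}

/** Client-side loop: every object is accepted because the server already filtered. */
final class ClientLoopSketch {
    final List<String> captions = new ArrayList<>(); // stands in for MD_KEY_IMG_CAP values
    final List<String> objects = new ArrayList<>();  // stands in for MD_KEY_OBJ_REC values
    final List<String> xhtmlIds = new ArrayList<>();
    String xhtmlStartVal;

    void accept(List<Recognised> recognised) {
        int count = 0;
        for (Recognised object : recognised) {
            String objectLabelAndConfidence = String.format(Locale.ENGLISH, "%s (%.5f)",
                    object.label, object.confidence);
            if (object.isCaption) {
                if (xhtmlStartVal == null) {
                    xhtmlStartVal = "captions";
                }
                captions.add(objectLabelAndConfidence);
                xhtmlIds.add(String.valueOf(count++)); // captions get positional ids
            } else {
                if (xhtmlStartVal == null) {
                    xhtmlStartVal = "objects";
                }
                objects.add(objectLabelAndConfidence);
                xhtmlIds.add(object.id);               // objects keep the server's id
            }
        }
    }
}
```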
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172797#comment-16172797
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139326190
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTVideoRecogniser.java
 ##
 @@ -17,63 +17,91 @@
 
 package org.apache.tika.parser.recognition.tf;
 
+import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.net.URI;
-import java.util.Collections;
+import java.util.Locale;
+import java.util.Map;
 import java.util.Set;
+import java.util.Collections;
+import java.util.HashSet;
 
 import javax.ws.rs.core.UriBuilder;
 
+import org.apache.http.HttpResponse;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.methods.HttpPost;
+import org.apache.http.entity.ByteArrayEntity;
+import org.apache.http.impl.client.DefaultHttpClient;
 import org.apache.tika.Tika;
 import org.apache.tika.config.Field;
+import org.apache.tika.config.Param;
 import org.apache.tika.config.TikaConfig;
+import org.apache.tika.exception.TikaConfigException;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.IOUtils;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.mime.MediaType;
 import org.apache.tika.mime.MimeType;
 import org.apache.tika.mime.MimeTypeException;
+import org.apache.tika.parser.ParseContext;
+import org.apache.tika.parser.recognition.RecognisedObject;
+import org.json.JSONArray;
+import org.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import org.xml.sax.ContentHandler;
+import org.xml.sax.SAXException;
 
 /**
  * Tensor Flow video recogniser which has high performance.
  * This implementation uses Tensorflow via REST API.
  * 
- * NOTE : //TODO: link to wiki page here
+ * NOTE : https://wiki.apache.org/tika/TikaAndVisionVideo
  *
  * @since Apache Tika 1.15
  */
-public class TensorflowRESTVideoRecogniser extends TensorflowRESTRecogniser{
+public class TensorflowRESTVideoRecogniser extends TensorflowRESTRecogniser {
 
 Review comment:
   - [ ] Extra space
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172792#comment-16172792
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139329533
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
-private double minConfidence = 0.05;
 
 Review comment:
   If you plan to put it in the REST URI, please document it somewhere in the comments
too. Also, this needs to be updated in the comments here:
https://github.com/ThejanW/tika/blob/92c65e0a43e7f09a0566bec34f352314dffe5def/tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java#L60
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172794#comment-16172794
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139329240
 
 

 ##
 File path: 
tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/video_util.py
 ##
 @@ -1,5 +1,5 @@
 #!/usr/bin/env python
-# 
 
 Review comment:
   I guess there are very few actual changes in this file; it is mostly extra
spaces and new lines. Your code is great, but I'd suggest avoiding extra spaces
and new lines in the future, as that keeps the focus on the actual change. Makes
sense?
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172798#comment-16172798
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325811
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -140,29 +133,17 @@ public synchronized void parse(InputStream stream, ContentHandler handler, Metad
 for (RecognisedObject object : objects) {
 if (object instanceof CaptionObject) {
 if (xhtmlStartVal == null) xhtmlStartVal = "captions";
-LOG.debug("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_IMG_CAP, mdValue);
-acceptedObjects.add(object);
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_IMG_CAP, mdVal);
 xhtmlIds.add(String.valueOf(count++));
 } else {
 if (xhtmlStartVal == null) xhtmlStartVal = "objects";
-if (object.getConfidence() >= minConfidence) {
-count++;
-LOG.info("Add {}", object);
-String mdValue = String.format(Locale.ENGLISH, "%s (%.5f)",
-object.getLabel(), object.getConfidence());
-metadata.add(MD_KEY_OBJ_REC, mdValue);
-acceptedObjects.add(object);
-xhtmlIds.add(object.getId());
-if (count >= topN) {
-break;
-}
-} else {
-LOG.warn("Object {} confidence {} less than min {}", object, object.getConfidence(), minConfidence);
-}
+String mdVal = String.format(Locale.ENGLISH, "%s (%.5f)", object.getLabel(), object.getConfidence());
+metadata.add(MD_KEY_OBJ_REC, mdVal);
+xhtmlIds.add(object.getId());
 }
+LOG.info("Add {}", object);
 
 Review comment:
   - [ ] Thanks for following good logging practice by using `{}`. It would be
great if you could replace the String concatenation in
[`RecognisedObject.toString`](https://github.com/ThejanW/tika/blob/92c65e0a43e7f09a0566bec34f352314dffe5def/tika-parsers/src/main/java/org/apache/tika/parser/recognition/RecognisedObject.java#L84-L90)
 with a `StringBuffer` or `String.format`. Your IDE can do it in a few
clicks. Thanks in advance for the cleanup.
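A sketch of a `toString` built with `String.format`, on a hypothetical stand-in whose fields (label, language, id, confidence) are assumptions rather than the real layout of `RecognisedObject`. With SLF4J's `LOG.info("Add {}", object)`, `toString()` only runs when INFO is enabled, and `javac` already compiles `+` chains to `StringBuilder` (or to `invokedynamic` string concatenation on Java 9+), so the gain is mainly readability. Where a builder is preferred, `StringBuilder` is the unsynchronized counterpart of `StringBuffer`.

```java
import java.util.Locale;

/** Hypothetical stand-in for RecognisedObject; field names are assumptions. */
final class RecognisedObjectSketch {
    private final String label;
    private final String labelLang;
    private final String id;
    private final double confidence;

    RecognisedObjectSketch(String label, String labelLang, String id, double confidence) {
        this.label = label;
        this.labelLang = labelLang;
        this.id = id;
        this.confidence = confidence;
    }

    @Override
    public String toString() {
        // A single format string replaces the chain of '+' concatenations;
        // Locale.ENGLISH keeps '.' as the decimal separator.
        return String.format(Locale.ENGLISH,
                "RecognisedObject{label='%s' (%s), id=%s, confidence=%f}",
                label, labelLang, id, confidence);
    }
}
```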
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172791#comment-16172791
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139326176
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTVideoRecogniser.java
 ##
 @@ -17,63 +17,91 @@
 
 package org.apache.tika.parser.recognition.tf;
 
+import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.net.URI;
-import java.util.Collections;
+import java.util.Locale;
+import java.util.Map;
 import java.util.Set;
+import java.util.Collections;
+import java.util.HashSet;
 
 import javax.ws.rs.core.UriBuilder;
 
+import org.apache.http.HttpResponse;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.methods.HttpPost;
+import org.apache.http.entity.ByteArrayEntity;
+import org.apache.http.impl.client.DefaultHttpClient;
 import org.apache.tika.Tika;
 import org.apache.tika.config.Field;
+import org.apache.tika.config.Param;
 import org.apache.tika.config.TikaConfig;
+import org.apache.tika.exception.TikaConfigException;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.IOUtils;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.mime.MediaType;
 import org.apache.tika.mime.MimeType;
 import org.apache.tika.mime.MimeTypeException;
+import org.apache.tika.parser.ParseContext;
+import org.apache.tika.parser.recognition.RecognisedObject;
+import org.json.JSONArray;
+import org.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import org.xml.sax.ContentHandler;
+import org.xml.sax.SAXException;
 
 /**
  * Tensor Flow video recogniser which has high performance.
  * This implementation uses Tensorflow via REST API.
  * 
- * NOTE : //TODO: link to wiki page here
+ * NOTE : https://wiki.apache.org/tika/TikaAndVisionVideo
 
 Review comment:
    
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172800#comment-16172800
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139326006
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -73,19 +74,27 @@
 /**
  * Maximum buffer size for image
  */
-private static final String LABEL_LANG = "en";
+protected static final String LABEL_LANG = "en";
 
 @Field
-private URI apiUri = URI.create("http://localhost:8764/inception/v4/classify?topk=10");
+protected URI apiBaseUri = URI.create("http://localhost:8764/inception/v4");
+
+@Field
+protected int topN = 2;
+
 @Field
-private URI healthUri = URI.create("http://localhost:8764/inception/v4/ping");
+protected double minConfidence = 0.015;
+
+protected URI apiUri;
+
+protected URI healthUri;
 
 Review comment:
   - [ ] Why remove default value?
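A plausible reason for dropping the defaults is that `apiUri` and `healthUri` are now derived from `apiBaseUri`: a hard-coded default would keep pointing at `localhost:8764` even after a user configures a different base. The sketch below shows that derivation. `deriveEndpoints()` is a hypothetical name (the diff's imports of `Param` and `TikaConfigException` suggest the real code does this during initialization), and the `/classify` and `/ping` suffixes come from the removed defaults.

```java
import java.net.URI;

/** Hypothetical sketch: endpoint URIs derived from one configurable base URI. */
final class EndpointConfigSketch {
    // Only the base carries a default; the derived fields start out null.
    URI apiBaseUri = URI.create("http://localhost:8764/inception/v4");
    URI apiUri;
    URI healthUri;

    /** Recomputes the derived URIs; call again whenever apiBaseUri changes. */
    void deriveEndpoints() {
        // Strip trailing slashes so "base/" and "base" give the same endpoints.
        String base = apiBaseUri.toString().replaceAll("/+$", "");
        apiUri = URI.create(base + "/classify");
        healthUri = URI.create(base + "/ping");
    }
}
```

If a reviewer prefers visible defaults, a Javadoc line on each derived field stating "derived from `apiBaseUri` at initialization" gives the same information without the risk of a stale value.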
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172803#comment-16172803
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139326151
 
 

 ##
 File path: 
tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -160,4 +175,4 @@ public void checkInitialization(InitializableProblemHandler handler)
 LOG.debug("Num Objects found {}", recObjs.size());
 return recObjs;
 }
-}
+}
 
 Review comment:
   - [ ] Extra line break
   
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172801#comment-16172801
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325945
 
 

 ##
 File path: tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -73,19 +74,27 @@
 /**
  * Maximum buffer size for image
  */
-private static final String LABEL_LANG = "en";
+protected static final String LABEL_LANG = "en";
 
 Review comment:
   - [ ] It would be great if you could put the reason in a comment so no one changes it in the future.
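A minimal sketch of the kind of comment the reviewer is asking for. It assumes the reason for widening `LABEL_LANG` is that a subclass needs to read it (in this PR `TensorflowRESTVideoRecogniser` extends `TensorflowRESTRecogniser`). The class names below are hypothetical stand-ins, and the `lang=` query fragment is invented purely for illustration.

```java
/** Hypothetical stand-in for TensorflowRESTRecogniser. */
class BaseRestRecogniser {
    /**
     * Language code requested for recognition labels.
     * Protected rather than private because subclasses (for example a video
     * recogniser) reuse it; narrowing it back to private would break them.
     */
    protected static final String LABEL_LANG = "en";
}

/** Hypothetical stand-in for TensorflowRESTVideoRecogniser. */
class VideoRestRecogniser extends BaseRestRecogniser {
    /** Builds an illustrative query fragment from the inherited constant. */
    static String labelQuery() {
        return "lang=" + LABEL_LANG;
    }
}
```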
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172793#comment-16172793
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325921
 
 

 ##
 File path: tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTRecogniser.java
 ##
 @@ -56,7 +57,7 @@
  * Tensor Flow image recogniser which has high performance.
  * This implementation uses Tensorflow via REST API.
  * 
- * NOTE : //TODO: link to wiki page here
+ * NOTE : https://wiki.apache.org/tika/TikaAndVision
 
 Review comment:
   Thanks
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16172795#comment-16172795
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139326197
 
 

 ##
 File path: tika-parsers/src/main/java/org/apache/tika/parser/recognition/tf/TensorflowRESTVideoRecogniser.java
 ##
 @@ -17,63 +17,91 @@
 
 package org.apache.tika.parser.recognition.tf;
 
+import java.io.ByteArrayOutputStream;
 import java.io.IOException;
 import java.io.InputStream;
 import java.net.URI;
-import java.util.Collections;
+import java.util.Locale;
+import java.util.Map;
 import java.util.Set;
+import java.util.Collections;
+import java.util.HashSet;
 
 import javax.ws.rs.core.UriBuilder;
 
+import org.apache.http.HttpResponse;
+import org.apache.http.client.methods.HttpGet;
+import org.apache.http.client.methods.HttpPost;
+import org.apache.http.entity.ByteArrayEntity;
+import org.apache.http.impl.client.DefaultHttpClient;
 import org.apache.tika.Tika;
 import org.apache.tika.config.Field;
+import org.apache.tika.config.Param;
 import org.apache.tika.config.TikaConfig;
+import org.apache.tika.exception.TikaConfigException;
+import org.apache.tika.exception.TikaException;
+import org.apache.tika.io.IOUtils;
 import org.apache.tika.metadata.Metadata;
 import org.apache.tika.mime.MediaType;
 import org.apache.tika.mime.MimeType;
 import org.apache.tika.mime.MimeTypeException;
+import org.apache.tika.parser.ParseContext;
+import org.apache.tika.parser.recognition.RecognisedObject;
+import org.json.JSONArray;
+import org.json.JSONObject;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+import org.xml.sax.ContentHandler;
+import org.xml.sax.SAXException;
 
 /**
  * Tensor Flow video recogniser which has high performance.
  * This implementation uses Tensorflow via REST API.
  * 
- * NOTE : //TODO: link to wiki page here
+ * NOTE : https://wiki.apache.org/tika/TikaAndVisionVideo
  *
  * @since Apache Tika 1.15
  */
-public class TensorflowRESTVideoRecogniser extends TensorflowRESTRecogniser{
+public class TensorflowRESTVideoRecogniser extends TensorflowRESTRecogniser {
 
-private static final Logger LOG = LoggerFactory.getLogger(TensorflowRESTRecogniser.class);
+private static final Logger LOG = LoggerFactory.getLogger(TensorflowRESTVideoRecogniser.class);
 
 Review comment:
   Super thanks
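For context on the logger fix: a logger obtained from a class literal is named after that class (SLF4J's `LoggerFactory.getLogger(Class)` uses the class name), so copying the parent's class literal into a subclass attributes the subclass's log lines to the parent. A dependency-free sketch using `java.util.logging`, with hypothetical class names:

```java
import java.util.logging.Logger;

/** Hypothetical stand-in for the parent recogniser. */
class ParentRecogniser {
    static final Logger LOG = Logger.getLogger(ParentRecogniser.class.getName());
}

/** Hypothetical stand-in for the video subclass. */
class ChildRecogniser extends ParentRecogniser {
    // Before the fix: the parent's class literal, so log lines appear under the parent's name.
    static final Logger COPIED_LOG = Logger.getLogger(ParentRecogniser.class.getName());

    // After the fix: the logger is named after the class that declares it.
    static final Logger LOG = Logger.getLogger(ChildRecogniser.class.getName());
}
```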
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171165#comment-16171165
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139603908
 
 

 ##
 File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/InceptionVideoRestDockerfile
 ##
 @@ -61,31 +48,22 @@ RUN make -j4
 RUN make install
 
 WORKDIR /
-
-# Install tensorflow and other dependencies
-RUN \
-  pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.1-cp27-none-linux_x86_64.whl --ignore-installed  && \
-  pip install flask requests pillow
-
-# Get the TF-slim dependencies
-# Downloading from a specific commit for future compatibility
-RUN wget https://github.com/tensorflow/models/archive/c15fada28113eca32dc98d6e3bec4755d0d5b4c2.zip
-
-RUN unzip c15fada28113eca32dc98d6e3bec4755d0d5b4c2.zip
-
 RUN \
-  wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py -O /usr/bin/inceptionapi.py && \
-  wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/video_util.py -O /usr/bin/video_util.py && \
+  wget https://raw.githubusercontent.com/ThejanW/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py -O /usr/bin/inceptionapi.py && \
 
 Review comment:
   will do once merged :+1:
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16171164#comment-16171164
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139603897
 
 

 ##
 File path: tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/Im2txtRestDockerfile
 ##
 @@ -46,7 +43,7 @@ RUN \
 wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/caption_generator.py \
 -O caption_generator.py && \
 
-wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/im2txtapi.py \
+wget https://raw.githubusercontent.com/ThejanW/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/im2txtapi.py \
 
 Review comment:
   will do once merged :+1:
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170701#comment-16170701
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

chrismattmann commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139537923
 
 

 ##
 File path: tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/InceptionVideoRestDockerfile
 ##
 @@ -61,31 +48,22 @@ RUN make -j4
 RUN make install
 
 WORKDIR /
-
-# Install tensorflow and other dependencies
-RUN \
-  pip install --upgrade https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.1-cp27-none-linux_x86_64.whl --ignore-installed  && \
-  pip install flask requests pillow
-
-# Get the TF-slim dependencies
-# Downloading from a specific commit for future compatibility
-RUN wget https://github.com/tensorflow/models/archive/c15fada28113eca32dc98d6e3bec4755d0d5b4c2.zip
-
-RUN unzip c15fada28113eca32dc98d6e3bec4755d0d5b4c2.zip
-
 RUN \
-  wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py -O /usr/bin/inceptionapi.py && \
-  wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/video_util.py -O /usr/bin/video_util.py && \
+  wget https://raw.githubusercontent.com/ThejanW/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/recognition/tf/inceptionapi.py -O /usr/bin/inceptionapi.py && \
 
 Review comment:
   reminder to change back after applying
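One way to avoid forgetting this kind of revert is a small pre-merge check. A minimal sketch, assuming the Dockerfile is available as a list of lines and that every `raw.githubusercontent.com` download should come from the `apache` owner; `ForkUrlCheck` is a hypothetical helper, not part of Tika.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical check: flags lines that download from a raw GitHub owner other than the expected one. */
final class ForkUrlCheck {
    // Captures the repository owner from URLs such as
    // https://raw.githubusercontent.com/<owner>/<repo>/...
    private static final Pattern RAW_URL =
            Pattern.compile("https://raw\\.githubusercontent\\.com/([^/\\s]+)/");

    private ForkUrlCheck() {}

    /** Returns each line containing at least one raw GitHub URL whose owner differs from expectedOwner. */
    static List<String> offendingLines(List<String> dockerfileLines, String expectedOwner) {
        List<String> offending = new ArrayList<>();
        for (String line : dockerfileLines) {
            Matcher m = RAW_URL.matcher(line);
            while (m.find()) {
                if (!m.group(1).equals(expectedOwner)) {
                    offending.add(line);
                    break; // report each line only once
                }
            }
        }
        return offending;
    }
}
```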
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170700#comment-16170700
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

chrismattmann commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139537857
 
 

 ##
 File path: tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/Im2txtRestDockerfile
 ##
 @@ -46,7 +43,7 @@ RUN \
 wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/caption_generator.py \
 -O caption_generator.py && \
 
-wget https://raw.githubusercontent.com/apache/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/im2txtapi.py \
+wget https://raw.githubusercontent.com/ThejanW/tika/master/tika-parsers/src/main/resources/org/apache/tika/parser/captioning/tf/im2txtapi.py \
 
 Review comment:
   Reminder this needs to be changed back
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16170102#comment-16170102
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW commented on issue #208: Fix for TIKA-2400 Standardizing current Object 
Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-330246151
 
 
   @chrismattmann @thammegowda yeah! lemme configure Docker builds in uscdatascience :+1:
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169526#comment-16169526
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

chrismattmann commented on issue #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-330115750
 
 
   Yes, please use @uscdatascience, thanks dudes
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169501#comment-16169501
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

thammegowda commented on issue #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#issuecomment-330109421
 
 
   @ThejanW Great work. Looks like you have done a lot of cleaning, so here is another:
   
   Please publish these Docker images under some organization. Since we cannot use the `apache` organization on Docker Hub, let's just use https://hub.docker.com/u/uscdatascience/; I gave you all the permissions.
   
   
   
 



[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169479#comment-16169479
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

smadha commented on a change in pull request #208: Fix for TIKA-2400 
Standardizing current Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208#discussion_r139325308
 
 

 ##
 File path: tika-parsers/src/main/java/org/apache/tika/parser/recognition/ObjectRecognitionParser.java
 ##
 @@ -83,12 +83,6 @@ public int compare(RecognisedObject o1, RecognisedObject o2) {
 }
 };
 
-@Field
 
 Review comment:
   Any specific reason to remove `minConfidence` and `topN`?
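For context, the PR description lists moving the minimum-confidence check from the client to the server. A minimal sketch of the kind of client-side filtering that `minConfidence` and `topN` would otherwise drive; `Label` and `ConfidenceFilter` are hypothetical, and the inclusive `>=` threshold is an assumption rather than Tika's actual rule.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical stand-in for a recognised label and its confidence score. */
final class Label {
    final String name;
    final double confidence;

    Label(String name, double confidence) {
        this.name = name;
        this.confidence = confidence;
    }
}

final class ConfidenceFilter {
    private ConfidenceFilter() {}

    /**
     * Keeps labels whose confidence is at least minConfidence, sorts them by
     * descending confidence, and returns at most topN of them (topN must be >= 0).
     */
    static List<Label> filter(List<Label> labels, double minConfidence, int topN) {
        return labels.stream()
                .filter(l -> l.confidence >= minConfidence)
                .sorted(Comparator.comparingDouble((Label l) -> l.confidence).reversed())
                .limit(topN)
                .collect(Collectors.toList());
    }
}
```

Doing this on the server instead means every client receives already-trimmed results, at the cost of sending the thresholds with each request.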
 





[jira] [Commented] (TIKA-2400) Standardizing current Object Recognition REST parsers

2017-09-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2400?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169285#comment-16169285
 ] 

ASF GitHub Bot commented on TIKA-2400:
--

ThejanW opened a new pull request #208: Fix for TIKA-2400 Standardizing current 
Object Recognition REST parsers
URL: https://github.com/apache/tika/pull/208
 
 
   **This PR consists of:**
   
   1. Reformatting of inceptionapi.py, im2txtapi.py and the related Java clients
   2. Logic for checking the minimum confidence on the server side instead of the client side in the Object Recognition REST parsers
   3. Refactoring of the Dockerfiles
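Item 2, together with the `apiBaseUri` change from the issue description, suggests a client that forwards its thresholds to a configurable server instead of filtering locally. A minimal sketch under that assumption; `RecognitionEndpoint` is hypothetical, and the `/classify/image` path and the `topn`/`min_confidence` query parameter names are illustrative, not a documented API.

```java
import java.net.URI;

/** Hypothetical helper that builds a recognition endpoint from a configurable base URI. */
final class RecognitionEndpoint {
    private RecognitionEndpoint() {}

    static URI build(String apiBaseUri, String path, int topN, double minConfidence) {
        // Drop one trailing slash so "http://host:8764/x/" and "http://host:8764/x" behave the same.
        String base = apiBaseUri.endsWith("/")
                ? apiBaseUri.substring(0, apiBaseUri.length() - 1)
                : apiBaseUri;
        // The thresholds travel as query parameters so the server can apply them.
        // Concatenating a double uses Double.toString, which is locale-independent.
        return URI.create(base + path + "?topn=" + topN + "&min_confidence=" + minConfidence);
    }
}
```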
   
   **How to test?**
   
   1. `docker run -it -p 8764:8764 thejanw/inception-rest-tika`, then run the tests in the **ObjectRecognitionParserTest** class
   
   2. `docker run -it -p 8764:8764 thejanw/im2txt-rest-tika`, then run the tests in the **ObjectRecognitionParserTest** class
   
   3. `docker run -it -p 8764:8764 thejanw/inception-video-rest-tika`, then run the tests in the **TensorflowVideoRecParserTest** class
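Because these tests need a container listening on port 8764, a test could first probe the port and skip itself when nothing answers. A minimal sketch; `ServerProbe` is a hypothetical helper and the timeout value is arbitrary.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper: reports whether a TCP server accepts connections on host:port. */
final class ServerProbe {
    private ServerProbe() {}

    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            // Completes only if something accepts the TCP connection within the timeout.
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat as "not running".
            return false;
        }
    }
}
```

With JUnit 4, for example, calling `Assume.assumeTrue(ServerProbe.isListening("localhost", 8764, 500))` at the start of a test would mark it as skipped rather than failed when the container is not running.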
 
