[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2016-06-09 Thread Peter Ansell (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15323532#comment-15323532
 ] 

Peter Ansell commented on ANY23-226:


I think this can be resolved.

> Extract JSON-LD embedded in HTML
>
> Key: ANY23-226
> URL: https://issues.apache.org/jira/browse/ANY23-226
> Project: Apache Any23
> Issue Type: Wish
> Components: core
> Affects Versions: 1.0
> Reporter: Lewis John McGibbney
> Assignee: Lewis John McGibbney
> Fix For: 1.3
>
> See http://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents
> I feel that we need to push this down at the jsonld-java level.
> I am investigating.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372874#comment-14372874
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/16#issuecomment-84384643
  
OK Peter, thank you for looking. This is great. I have not seen the test
failures. Can you please tell me whether they are in Any23 or in JSONLD-Java?
We could also upgrade the JSONLD-Java implementation to the 0.5.1 release.

On Saturday, March 21, 2015, Peter Ansell notificati...@github.com wrote:

 The main bug was that the entire script node was being sent to
 JSONLD-Java, and not just its content.

 However, I also made a few other changes while doing that testing.

 It turned out that the JSON-LD was invalid, but at some point the exception
 thrown when parsing fails had been changed to be silently swallowed, so the
 only indication was that the count was 0. I turned exception propagation
 back on (there is no reason to swallow it outside of temporary testing).

 However, in addition to the 4 tests currently failing on the core tests,
 there are now other tests failing due to an inability to parse 
 




-- 
*Lewis*





[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372516#comment-14372516
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/16




[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-20 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372528#comment-14372528
 ] 

Hudson commented on ANY23-226:
--

UNSTABLE: Integrated in Any23-trunk #1309 (See [https://builds.apache.org/job/Any23-trunk/1309/])
ANY23-226 Extract JSON-LD embedded in HTML (lewis.j.mcgibbney: rev 1e3eb9c31af2f93906eee1081179d73c30a0881b)
* core/src/main/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorFactory.java
* core/src/main/java/org/apache/any23/extractor/html/DomUtils.java
* core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java
* plugins/integration-test/src/test/java/org/apache/any23/plugin/PluginIT.java
* core/src/main/resources/org/apache/any23/prefixes/prefixes.properties
* core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java
* core/src/main/resources/org/apache/any23/extractor/html/example-embedded-jsonld.html
* test-resources/src/test/resources/html/html-embedded-jsonld-extractor.html
* core/src/main/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractor.java
ANY23-226 : Make JSONLD extraction work (p_ansell: rev fd822849190240b8cf981ecc7abd0b4f592381d5)
* core/src/main/resources/META-INF/services/org.apache.any23.extractor.ExtractorFactory
* src/site/apt/any23-plugins.apt
* core/src/test/java/org/apache/any23/extractor/rdf/JSONLDExtractorTest.java
* core/src/main/java/org/apache/any23/cli/MicrodataParser.java
* plugins/html-scraper/src/main/java/org/apache/any23/plugin/htmlscraper/HTMLScraperExtractor.java
* core/src/main/java/org/apache/any23/extractor/html/HResumeExtractorFactory.java
* core/src/main/java/org/apache/any23/writer/TriXWriterFactory.java
* core/src/test/java/org/apache/any23/extractor/html/RDFMergerTest.java
* core/src/main/resources/META-INF/services/org.apache.any23.cli.Tool
* core/src/main/java/org/apache/any23/extractor/rdfa/RDFa11ExtractorFactory.java
* core/src/test/java/org/apache/any23/extractor/html/HCalendarExtractorTest.java
* core/src/main/java/org/apache/any23/extractor/html/HCardExtractorFactory.java
* plugins/basic-crawler/src/main/java/org/apache/any23/cli/Crawler.java
* core/src/main/java/org/apache/any23/writer/URIListWriterFactory.java
* core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java
* core/src/main/java/org/apache/any23/extractor/xpath/XPathExtractorFactory.java
* core/src/test/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractorTest.java
* core/src/main/java/org/apache/any23/extractor/html/HeadLinkExtractorFactory.java
* core/src/main/java/org/apache/any23/extractor/rdf/JSONLDExtractorFactory.java
* test-resources/src/test/resources/html/html-embedded-jsonld-extractor.html
* core/src/main/java/org/apache/any23/cli/ExtractorDocumentation.java
* plugins/html-scraper/src/main/resources/META-INF/services/org.apache.any23.extractor.ExtractorFactory
* core/src/main/java/org/apache/any23/writer/TurtleWriterFactory.java
* core/src/main/java/org/apache/any23/extractor/rdf/NTriplesExtractorFactory.java
* core/src/main/resources/META-INF/services/org.apache.any23.writer.WriterFactory
* core/src/main/java/org/apache/any23/writer/RDFXMLWriterFactory.java
* core/src/main/java/org/apache/any23/cli/MimeDetector.java
* core/src/test/java/org/apache/any23/extractor/example/ExampleExtractorFactory.java
* core/src/main/java/org/apache/any23/writer/NTriplesWriterFactory.java
* core/src/test/java/org/apache/any23/extractor/rdfa/AbstractRDFaExtractorTestCase.java
* plugins/office-scraper/src/main/resources/META-INF/services/org.apache.any23.extractor.ExtractorFactory
* core/src/main/java/org/apache/any23/extractor/html/AdrExtractorFactory.java
* core/src/main/java/org/apache/any23/cli/VocabPrinter.java
* core/src/main/java/org/apache/any23/extractor/rdf/TriXExtractorFactory.java
* plugins/html-scraper/src/main/java/org/apache/any23/plugin/htmlscraper/HTMLScraperExtractorFactory.java
* core/src/main/java/org/apache/any23/extractor/html/EmbeddedJSONLDExtractor.java
* core/src/main/java/org/apache/any23/extractor/rdfa/RDFaExtractorFactory.java
* plugins/office-scraper/src/main/java/org/apache/any23/plugin/officescraper/ExcelExtractor.java
* core/src/main/java/org/apache/any23/writer/NQuadsWriterFactory.java
* plugins/office-scraper/src/main/java/org/apache/any23/plugin/officescraper/ExcelExtractorFactory.java
* core/src/test/java/org/apache/any23/extractor/html/SpeciesExtractorTest.java
* core/src/main/java/org/apache/any23/extractor/html/TurtleHTMLExtractorFactory.java
* core/pom.xml
* core/src/main/java/org/apache/any23/extractor/html/SpeciesExtractorFactory.java
* core/src/main/java/org/apache/any23/extractor/html/HTMLMetaExtractorFactory.java
* nquads/src/main/java/org/apache/any23/io/nquads/NQuadsParserFactory.java
* core/src/main/java/org/apache/any23/extractor/html/ICBMExtractorFactory.java
* 

[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372518#comment-14372518
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user ansell commented on the pull request:

https://github.com/apache/any23/pull/16#issuecomment-84257478
  
The main bug was that the entire script node was being sent to JSONLD-Java, 
and not just its content.

However, I also made a few other changes while doing that testing.

It turned out that the JSON-LD was invalid, but at some point the exception 
thrown when parsing fails had been changed to be silently swallowed, so the 
only indication was that the count was 0. I turned exception propagation back 
on (there is no reason to swallow it outside of temporary testing).
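
To illustrate why a swallowed parse exception is hard to diagnose, here is a minimal sketch in plain Java. It is not the Any23 or JSONLD-Java code: the class `ParseFailureSketch`, the exception `ParseFailure`, and the toy `parse` method are all hypothetical stand-ins. The swallowing variant reports a count of 0 for invalid input, which looks the same as a valid document with no statements, while the propagating variant surfaces the cause.

```java
import java.util.ArrayList;
import java.util.List;

public class ParseFailureSketch {

    /** Hypothetical unchecked exception standing in for a real parser failure. */
    static class ParseFailure extends RuntimeException {
        ParseFailure(String message) {
            super(message);
        }
    }

    /** Toy parser (assumption): accepts only input that looks like a JSON object. */
    static List<String> parse(String input) {
        String trimmed = input.trim();
        if (!trimmed.startsWith("{") || !trimmed.endsWith("}")) {
            throw new ParseFailure("not a JSON object: " + trimmed);
        }
        List<String> statements = new ArrayList<>();
        statements.add(trimmed);
        return statements;
    }

    /** Swallows the failure, so the caller only ever sees a count of 0. */
    static int countSwallowing(String input) {
        try {
            return parse(input).size();
        } catch (ParseFailure e) {
            return 0;
        }
    }

    /** Lets the failure propagate, so the caller learns why nothing was extracted. */
    static int countPropagating(String input) {
        return parse(input).size();
    }

    public static void main(String[] args) {
        String invalid = "<script>{}</script>";
        System.out.println("swallowing count: " + countSwallowing(invalid));
        try {
            countPropagating(invalid);
        } catch (ParseFailure e) {
            System.out.println("propagated: " + e.getMessage());
        }
    }
}
```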

However, in addition to the 4 tests currently failing on the core tests, 
there are now other tests failing due to an inability to parse <div itemscope>




[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-12 Thread Lewis John McGibbney (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14359041#comment-14359041
 ] 

Lewis John McGibbney commented on ANY23-226:


bq. Hence, it may be more appropriate to pick out the 
<script type="application/ld+json">...</script> elements in Any23 and pass 
their content to JSONLD-Java for parsing.
I think that you are right, Peter.
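
The approach quoted above can be sketched with the JDK's own DOM parser. This is only an illustration under stated assumptions, not the Any23 extractor: the class `EmbeddedJsonLdSketch` and the method `extractJsonLd` are hypothetical names, and the input is assumed to be well-formed XHTML (real-world HTML would need a tolerant HTML parser first). The point is that only the text content of each matching script element is collected for the JSON-LD parser, never the element itself.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class EmbeddedJsonLdSketch {

    /** Returns the text content of every script element typed application/ld+json. */
    static List<String> extractJsonLd(String xhtml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new InputSource(new StringReader(xhtml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            // Assumption of this sketch: input that is not well-formed XML is rejected outright.
            throw new IllegalArgumentException("markup is not well-formed XML", e);
        }
        NodeList scripts = doc.getElementsByTagName("script");
        List<String> payloads = new ArrayList<>();
        for (int i = 0; i < scripts.getLength(); i++) {
            Element script = (Element) scripts.item(i);
            if ("application/ld+json".equalsIgnoreCase(script.getAttribute("type").trim())) {
                // Collect only the text inside the element, not the serialized <script> node.
                payloads.add(script.getTextContent().trim());
            }
        }
        return payloads;
    }

    public static void main(String[] args) {
        String page = "<html><head>"
            + "<script type=\"application/ld+json\">{\"@id\": \"http://example.org/a\"}</script>"
            + "<script type=\"text/javascript\">var x = 1;</script>"
            + "</head><body/></html>";
        // Each payload printed here is what would be handed to a JSON-LD parser.
        for (String json : extractJsonLd(page)) {
            System.out.println(json);
        }
    }
}
```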
