Re: GSoC 2015 with Apache any23
Hi Nisala,

On Thu, Mar 26, 2015 at 12:23 AM, dev-digest-h...@any23.apache.org wrote:
> Hi all,
> The test failures I came across have been reported recently in https://issues.apache.org/jira/browse/ANY23-256.
> Can I have access to the ANY23 wiki? My user name is nisala12.
> Regards
> Nisala

DONE. Apologies for the delay.
Best.
Lewis
Re: GSoC 2015 with Apache any23
Hi Lewis,

Thanks for adding me to the wiki. Could you please comment on my previous mail in this thread regarding the microformat parser?

Regards
Nisala

On Thu, Mar 26, 2015 at 7:07 PM, Lewis John Mcgibbney lewis.mcgibb...@gmail.com wrote:
> Hi Nisala,
>
> On Thu, Mar 26, 2015 at 12:23 AM, dev-digest-h...@any23.apache.org wrote:
>> Hi all,
>> The test failures I came across have been reported recently in https://issues.apache.org/jira/browse/ANY23-256.
>> Can I have access to the ANY23 wiki? My user name is nisala12.
>> Regards
>> Nisala
>
> DONE. Apologies for the delay.
> Best.
> Lewis
Jenkins build is still unstable: Any23-trunk #1315
See https://builds.apache.org/job/Any23-trunk/1315/
[jira] [Commented] (ANY23-247) FIX Attribute name itemscope associated with an element type html must be followed by the ' = ' character.
[ https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381978#comment-14381978 ]

Lewis John McGibbney commented on ANY23-247:

An example of a failing test for this issue:
{code}
org.apache.any23.Any23Test.testMicrodataSupport

Failing for the past 6 builds (Since Unstable#1309)
Took 0.43 sec.

Error Message

Error while parsing RDF document.

Stacktrace

org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
	at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1236)
	at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
	at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
	at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
	at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
	at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
	at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
	at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
	at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
	at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:462)
	at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:254)
	at org.apache.any23.Any23.extract(Any23.java:298)
	at org.apache.any23.Any23.extract(Any23.java:433)
	at org.apache.any23.Any23.extract(Any23.java:347)
	at org.apache.any23.Any23Test.detectAndExtract(Any23Test.java:559)
	at org.apache.any23.Any23Test.assertExtractorActivation(Any23Test.java:590)
	at org.apache.any23.Any23Test.testMicrodataSupport(Any23Test.java:484)

Standard Output

[2015-03-26 02:01:37,665] INFO 4947[main] - org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:221) - Processing http://host.com/path

Standard Error

[Fatal Error] :23:15: Attribute name itemscope associated with an element type div must be followed by the ' = ' character.
{code}

FIX Attribute name itemscope associated with an element type html must be followed by the ' = ' character.
--

Key: ANY23-247
URL: https://issues.apache.org/jira/browse/ANY23-247
Project: Apache Any23
Issue Type: Improvement
Affects Versions: 1.1
Reporter: Lewis John McGibbney
Assignee: Lewis John McGibbney
Fix For: 1.3

In the following markup
{code}
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" version="HTML+RDFa 1.0" xml:lang="en" itemscope itemtype="http://schema.org/Product">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<meta name="generator" content="ToolTwist" />
...
{code}
the *itemscope* attribute has no value, so we get the following error in our web server logs:
{code}
[Fatal Error] :2:185: Attribute name itemscope associated with an element type html must be followed by the ' = ' character.
{code}
Although the markup itself is incorrect, Any23 should simply check whether the itemscope value is null and, if so, add *=*. There is a precedent for doing something like this; I just can't find the ticket right now!

The code we need to add belongs in either
core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java
or
core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
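A rough illustration of the check proposed above. It assumes the fix is applied to the raw markup before it reaches an XML parser (the stack trace shows Semargl's XmlSource feeding Xerces). Note that appending a bare *=* would still not be well-formed XML, because an XML attribute needs a quoted value. Writing `itemscope=""` is a minimal form that XML parsers accept and that remains a valid HTML boolean attribute. The class `BareItemscopeFix` and the methods `fixBareItemscope` and `isWellFormedXml` are hypothetical names, not Any23 code. The regex approach is also only a sketch: it can misfire on the word "itemscope" inside attribute values or comments, so a real patch in ItemScope.java or MicrodataParser.java would more likely operate on the DOM.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;

// Hypothetical helper sketching the "add a value to a bare itemscope" idea.
public class BareItemscopeFix {

    // A bare "itemscope" attribute inside a start tag: preceded by whitespace,
    // not already followed by "=", and ending at whitespace, "/" or ">".
    private static final Pattern BARE_ITEMSCOPE = Pattern.compile(
            "(<[^<>]*?\\s)(itemscope)(?!\\s*=)(?=[\\s/>])",
            Pattern.CASE_INSENSITIVE);

    /** Rewrites a bare itemscope to itemscope="" so an XML parser accepts it. */
    public static String fixBareItemscope(String markup) {
        Matcher m = BARE_ITEMSCOPE.matcher(markup);
        return m.replaceAll("$1$2=\"\"");
    }

    /** Returns true if the string parses as well-formed XML with the JDK parser. */
    public static boolean isWellFormedXml(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // Throw on errors instead of printing "[Fatal Error]" to stderr.
            builder.setErrorHandler(new ErrorHandler() {
                @Override public void warning(SAXParseException e) { }
                @Override public void error(SAXParseException e) throws SAXException { throw e; }
                @Override public void fatalError(SAXParseException e) throws SAXException { throw e; }
            });
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String html = "<html xml:lang=\"en\" itemscope itemtype=\"http://schema.org/Product\"><head/></html>";
        String fixed = fixBareItemscope(html);
        System.out.println(isWellFormedXml(html));  // false: bare attribute rejected
        System.out.println(fixed);
        System.out.println(isWellFormedXml(fixed)); // true
    }
}
```

The rewritten line is `<html xml:lang="en" itemscope="" itemtype="http://schema.org/Product"><head/></html>`. Attributes that already carry a value, such as `itemscope=""`, are left untouched.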