Re: GSoC 2015 with Apache any23

2015-03-26 Thread Lewis John Mcgibbney
Hi Nisala,

On Thu, Mar 26, 2015 at 12:23 AM, dev-digest-h...@any23.apache.org wrote:


 Hi all,
 The test failures I came across were reported recently in
 https://issues.apache.org/jira/browse/ANY23-256. Could I have access to the
 ANY23 wiki? My user name is nisala12.
 Regards
 Nisala


DONE. Apologies for the delay. Best.
Lewis


Re: GSoC 2015 with Apache any23

2015-03-26 Thread Nisala Mendis
Hi Lewis,
Thanks for adding me to the wiki. Could you please comment on my
previous mail on this thread regarding the microformat parser?
Regards
Nisala

On Thu, Mar 26, 2015 at 7:07 PM, Lewis John Mcgibbney 
lewis.mcgibb...@gmail.com wrote:

 Hi Nisala,

 On Thu, Mar 26, 2015 at 12:23 AM, dev-digest-h...@any23.apache.org
 wrote:

 
  Hi all,
  The test failures I came across were reported recently in
  https://issues.apache.org/jira/browse/ANY23-256. Could I have access to the
  ANY23 wiki? My user name is nisala12.
  Regards
  Nisala


 DONE. Apologies for the delay. Best.
 Lewis



Jenkins build is still unstable: Any23-trunk #1315

2015-03-26 Thread Apache Jenkins Server
See https://builds.apache.org/job/Any23-trunk/1315/



[jira] [Commented] (ANY23-247) FIX Attribute name itemscope associated with an element type html must be followed by the ' = ' character.

2015-03-26 Thread Lewis John McGibbney (JIRA)

[ https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14381978#comment-14381978 ]

Lewis John McGibbney commented on ANY23-247:


An example of a failing test for this issue:
{code}
org.apache.any23.Any23Test.testMicrodataSupport
Failing for the past 6 builds (Since Unstable#1309 )
Took 0.43 sec.
Error Message

Error while parsing RDF document.

Stacktrace

org.apache.any23.extractor.ExtractionException: Error while parsing RDF document.
    at com.sun.org.apache.xerces.internal.parsers.AbstractSAXParser.parse(AbstractSAXParser.java:1236)
    at org.semarglproject.source.XmlSource.process(XmlSource.java:48)
    at org.semarglproject.source.StreamProcessor.processInternal(StreamProcessor.java:87)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:167)
    at org.semarglproject.source.BaseStreamProcessor.process(BaseStreamProcessor.java:154)
    at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:109)
    at org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser.parse(SesameRDFaParser.java:95)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:105)
    at org.apache.any23.extractor.rdf.BaseRDFExtractor.run(BaseRDFExtractor.java:41)
    at org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:462)
    at org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:254)
    at org.apache.any23.Any23.extract(Any23.java:298)
    at org.apache.any23.Any23.extract(Any23.java:433)
    at org.apache.any23.Any23.extract(Any23.java:347)
    at org.apache.any23.Any23Test.detectAndExtract(Any23Test.java:559)
    at org.apache.any23.Any23Test.assertExtractorActivation(Any23Test.java:590)
    at org.apache.any23.Any23Test.testMicrodataSupport(Any23Test.java:484)

Standard Output

[2015-03-26 02:01:37,665] INFO  4947[main] - org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:221) - Processing http://host.com/path

Standard Error

[Fatal Error] :23:15: Attribute name itemscope associated with an element type div must be followed by the ' = ' character.

{code}
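
The fatal error above can be reproduced outside Any23 with any strict XML parser. Below is a minimal sketch using the JDK's built-in SAX parser; the class name `ItemscopeStrictParseDemo` and the sample markup are made up for illustration and are not part of Any23 or its test suite. It only shows that a valueless `itemscope` is rejected by an XML parser, while `itemscope=""` is accepted.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class, not Any23 code.
class ItemscopeStrictParseDemo {

    /** Returns "line:column: message" for a fatal parse error, or null if the markup parses. */
    static String parseError(String markup) throws Exception {
        try {
            SAXParserFactory.newInstance()
                    .newSAXParser()
                    .parse(new InputSource(new StringReader(markup)), new DefaultHandler());
            return null; // well-formed as far as the XML parser is concerned
        } catch (SAXParseException e) {
            return e.getLineNumber() + ":" + e.getColumnNumber() + ": " + e.getMessage();
        }
    }

    public static void main(String[] args) throws Exception {
        // Valueless attribute: legal in HTML, rejected by an XML parser.
        System.out.println(parseError(
                "<div itemscope itemtype=\"http://schema.org/Product\"></div>"));
        // Same attribute with an explicit (empty) value: accepted.
        System.out.println(parseError(
                "<div itemscope=\"\" itemtype=\"http://schema.org/Product\"></div>"));
    }
}
```

The stack trace shows the document reaching the Semargl RDFa parser through an XML source, which is consistent with this behaviour, but the sketch does not exercise that code path itself.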

 FIX Attribute name itemscope associated with an element type html must be 
 followed by the ' = ' character.
 --

              Key: ANY23-247
              URL: https://issues.apache.org/jira/browse/ANY23-247
          Project: Apache Any23
       Issue Type: Improvement
 Affects Versions: 1.1
         Reporter: Lewis John McGibbney
         Assignee: Lewis John McGibbney
          Fix For: 1.3


 In the following markup
 {code}
 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml" version="HTML+RDFa 1.0" xml:lang="en" itemscope itemtype="http://schema.org/Product">
 <head>
 <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
 <meta http-equiv="X-UA-Compatible" content="IE=edge" />
 <meta name="generator" content="ToolTwist" />
 ...
 {code}
 Because *itemscope* is not followed by a value, we get the following error
 in our web server logs:
 {code}
 [Fatal Error] :2:185: Attribute name itemscope associated with an element 
 type html must be followed by the ' = ' character.
 {code}
 Although the markup semantics are incorrect, Any23 should simply check
 whether the itemscope value is null and, if so, add *=*. There is a
 precedent for us doing something like this before; I just can't find the
 ticket right now!
 The code we need to add belongs in one of the following:
 core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java
 core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
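
 A rough sketch of the kind of check described above, written as a standalone pre-processing step rather than as a change to ItemScope.java or MicrodataParser.java. The class `ItemscopeNormalizer` and its `normalize` method are hypothetical names, and the regex approach is an assumption: it rewrites a valueless `itemscope` to `itemscope=""` so that a strict XML parser accepts it. It is deliberately naive and would also touch the word "itemscope" in text content if it happened to be followed by whitespace, `/` or `>`, so a real fix would more likely work on the parsed DOM.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Any23.
class ItemscopeNormalizer {

    // Matches "itemscope" preceded by whitespace and followed by whitespace,
    // "/" or ">", but only when no "=" follows (i.e. the attribute has no value).
    private static final Pattern BARE_ITEMSCOPE = Pattern.compile(
            "(\\sitemscope)(?=[\\s/>])(?!\\s*=)", Pattern.CASE_INSENSITIVE);

    /** Gives every valueless itemscope attribute an explicit empty value. */
    static String normalize(String markup) {
        return BARE_ITEMSCOPE.matcher(markup).replaceAll("$1=\"\"");
    }

    public static void main(String[] args) {
        String tag = "<html xml:lang=\"en\" itemscope itemtype=\"http://schema.org/Product\">";
        System.out.println(normalize(tag));
    }
}
```

 Attributes that already carry a value, such as `itemscope="itemscope"` or `itemscope = ""`, are left untouched, so running the step twice gives the same result as running it once.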


