[jira] [Commented] (ANY23-337) BenchmarkTripleHandler does not report accurate extraction interval times

2018-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437726#comment-16437726
 ] 

Hudson commented on ANY23-337:
--

SUCCESS: Integrated in Jenkins build Any23-trunk #1561 (See 
[https://builds.apache.org/job/Any23-trunk/1561/])
ANY23-337 fixed: BenchmarkTripleHandler reported inaccurate runtimes (hans: rev 
c0db95e7c370eac13bbfcb9018eb960295a12faa)
* (edit) core/src/main/java/org/apache/any23/extractor/ExtractionResultImpl.java


> BenchmarkTripleHandler does not report accurate extraction interval times
> -
>
> Key: ANY23-337
> URL: https://issues.apache.org/jira/browse/ANY23-337
> Project: Apache Any23
> Issue Type: Bug
> Components: core, extractors
> Affects Versions: 2.2
> Reporter: Hans Brende
> Assignee: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> The begin- and end-time statistics of the BenchmarkTripleHandler are recorded 
> in TripleHandler.openContext() and TripleHandler.closeContext(), 
> respectively. However, neither of these methods is called until *after the 
> extraction has already completed*, resulting in extraction times that 
> approach zero!
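
The effect described above can be reproduced with a small, self-contained sketch. Everything in it is an assumption made for illustration: `TimingSketch`, `BenchmarkHandler` and the 200 ms of simulated work are hypothetical stand-ins, not the Any23 classes, and the sketch does not show how the actual fix was made.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical illustration only; these are not the Any23 classes. */
final class TimingSketch {

    /** Measures the time between openContext() and closeContext(). */
    static final class BenchmarkHandler {
        private long startNanos;
        private long elapsedNanos;

        void openContext() {
            startNanos = System.nanoTime();
        }

        void closeContext() {
            elapsedNanos = System.nanoTime() - startNanos;
        }

        long elapsedMillis() {
            return TimeUnit.NANOSECONDS.toMillis(elapsedNanos);
        }
    }

    /** Simulated extraction work (assumption: 200 ms of sleeping). */
    static void extract() {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Reported behaviour: both callbacks fire only after the work is done. */
    static long measureAfterExtraction() {
        BenchmarkHandler handler = new BenchmarkHandler();
        extract();
        handler.openContext();
        handler.closeContext();
        return handler.elapsedMillis();
    }

    /** Expected behaviour: the callbacks bracket the work. */
    static long measureAroundExtraction() {
        BenchmarkHandler handler = new BenchmarkHandler();
        handler.openContext();
        extract();
        handler.closeContext();
        return handler.elapsedMillis();
    }

    public static void main(String[] args) {
        System.out.println("callbacks after extraction:  " + measureAfterExtraction() + " ms");
        System.out.println("callbacks around extraction: " + measureAroundExtraction() + " ms");
    }
}
```

In the first variant the measured interval covers only the gap between the two callbacks, so it stays close to zero no matter how long the simulated extraction takes.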



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Frequent Releases of Any23

2018-04-13 Thread Hans Brende
Hi everyone!

There are only 8 issues remaining in the Any23 codebase with the
designation of "Bug": ANY23-322, ANY23-345, ANY23-67, ANY23-330, ANY23-251,
ANY23-132, ANY23-154, and ANY23-159.

These issues are potentially thorny, so I'm hoping more people will
jump in and help knock them out, or at least start discussing them!

In particular, Lewis, you reported three of them: ANY23-159, ANY23-251, and
ANY23-322, and were actively involved in discussing three more: ANY23-154,
ANY23-132, and ANY23-67. Any update on the status of those?

It would also be cool to release version 2.3 first, and then defer these 8
bugs to version 2.4.

Thoughts?



On Thu, Apr 12, 2018 at 12:44 AM, Hans Brende wrote:

> +1 for a swift next release!
>
> Nice work on those 8 issues everyone.
>
> Side question: does anyone have any idea why the any23.org server is
> returning a 500 Internal Server Error? Could it have something to do with
> the OpenIE stuff we added? We should probably get that fixed before the
> next release.
>
>
>
> On Wed, Apr 11, 2018 at 6:01 PM, lewis john mcgibbney 
> wrote:
>
>> Hi Folks,
>> I am very happy to see more use of the Any23 codebase and I encourage
>> anyone to step up and essentially learn the release management process.
>> Right now we have addressed some 8 issues, which could merit a new
>> release... I would have no issues with that.
>> We have a GSoC project hopefully coming up this summer again so I think
>> things are looking up for Any23 moving forward.
>> Lewis
>>
>>
>> --
>> http://home.apache.org/~lewismc/
>> http://people.apache.org/keys/committer/lewismc
>>
>
>


[GitHub] any23 pull request #79: ANY23-334 fixed: default language was a UUID

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/79


---


[jira] [Commented] (ANY23-334) SingleDocumentExtraction.createExtractionContext() uses UUID as defaultLanguage

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437563#comment-16437563
 ] 

ASF GitHub Bot commented on ANY23-334:
--

GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/79

ANY23-334 fixed: default language was a UUID

mvn clean test -> all tests pass

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-334

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/79.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #79


commit 53e9cdd69632c1b73fb4a1eb3348352a5e241b82
Author: Hans 
Date:   2018-04-13T16:47:03Z

ANY23-334 fixed: default language was a UUID




> SingleDocumentExtraction.createExtractionContext() uses UUID as 
> defaultLanguage
> ---
>
> Key: ANY23-334
> URL: https://issues.apache.org/jira/browse/ANY23-334
> Project: Apache Any23
> Issue Type: Bug
> Components: core
> Affects Versions: 2.2
> Reporter: Hans Brende
> Priority: Major
> Fix For: 2.3
>
>
> {code}SingleDocumentExtraction.createExtractionContext() {code}
> returns an extraction context with defaultLanguage set to: 
> {code}UUID.randomUUID().toString(){code}
> I'm assuming this was meant to be localID rather than defaultLanguage.
> Here are the links to the relevant lines of code: 
> [SingleDocumentExtraction.createExtractionContext()|https://github.com/apache/any23/blob/1867cc66de9a82cd98f1962fdabbd3a8680ff408/core/src/main/java/org/apache/any23/extractor/SingleDocumentExtraction.java#L648]
> [ExtractionContext(String extractorName, IRI documentIRI, String 
> defaultLanguage)|https://github.com/apache/any23/blob/1867cc66de9a82cd98f1962fdabbd3a8680ff408/api/src/main/java/org/apache/any23/extractor/ExtractionContext.java#L61]
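
A minimal sketch of the kind of argument mix-up described above, under stated assumptions: `ContextSketch` is a simplified, hypothetical stand-in and not the Any23 `ExtractionContext`; the field layout, the `buggy`/`intended` helpers and the extractor name are invented for illustration. The "intended" variant follows the reporter's guess that the UUID was meant to be the local ID.

```java
import java.util.UUID;

/** Simplified, hypothetical stand-in for an extraction context; not the Any23 class. */
final class ContextSketch {
    final String extractorName;
    final String defaultLanguage; // language tag such as "en", or null if unknown
    final String localID;         // identifier unique to this context, or null

    ContextSketch(String extractorName, String defaultLanguage, String localID) {
        this.extractorName = extractorName;
        this.defaultLanguage = defaultLanguage;
        this.localID = localID;
    }

    /** Reported behaviour: the random UUID ends up in the language slot. */
    static ContextSketch buggy(String extractorName) {
        return new ContextSketch(extractorName, UUID.randomUUID().toString(), null);
    }

    /** Assumed intent: the UUID is the local ID; the language is supplied by the caller. */
    static ContextSketch intended(String extractorName, String documentLanguage) {
        return new ContextSketch(extractorName, documentLanguage, UUID.randomUUID().toString());
    }

    public static void main(String[] args) {
        System.out.println("buggy default language:    " + buggy("example-extractor").defaultLanguage);
        System.out.println("intended default language: " + intended("example-extractor", "en").defaultLanguage);
    }
}
```

Because both values are plain strings, the compiler cannot catch the swap; only a test that inspects the default language would notice it.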





[GitHub] any23 pull request #79: ANY23-334 fixed: default language was a UUID

2018-04-13 Thread HansBrende
GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/79

ANY23-334 fixed: default language was a UUID

mvn clean test -> all tests pass

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-334

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/79.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #79


commit 53e9cdd69632c1b73fb4a1eb3348352a5e241b82
Author: Hans 
Date:   2018-04-13T16:47:03Z

ANY23-334 fixed: default language was a UUID




---


[jira] [Updated] (ANY23-307) Ensure Microformats test suite compliance

2018-04-13 Thread Hans Brende (JIRA)

 [ 
https://issues.apache.org/jira/browse/ANY23-307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende updated ANY23-307:
--
Issue Type: Improvement  (was: Bug)

> Ensure Microformats test suite compliance
> -
>
> Key: ANY23-307
> URL: https://issues.apache.org/jira/browse/ANY23-307
> Project: Apache Any23
> Issue Type: Improvement
> Components: microformats
> Reporter: Lewis John McGibbney
> Priority: Major
> Fix For: 2.3
>
>
> I've been over on the Microformats IRC channel and it turns out they have a 
> wiki page and code relating to an entire compliance test suite
> http://microformats.org/wiki/test-suite
> We should implement compliance within Any23 for our microformats code 
> implementations
> https://github.com/apache/any23/tree/master/core/src/main/java/org/apache/any23/extractor/html





[GitHub] any23 pull request #78: ANY23-337 fixed: BenchmarkTripleHandler reported ina...

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/78


---


[jira] [Commented] (ANY23-337) BenchmarkTripleHandler does not report accurate extraction interval times

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437499#comment-16437499
 ] 

ASF GitHub Bot commented on ANY23-337:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/78




[GitHub] any23 pull request #78: ANY23-337 fixed: BenchmarkTripleHandler reported ina...

2018-04-13 Thread HansBrende
GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/78

ANY23-337 fixed: BenchmarkTripleHandler reported inaccurate runtimes

mvn clean test -> all tests pass

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-337

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/78.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #78


commit c0db95e7c370eac13bbfcb9018eb960295a12faa
Author: Hans 
Date:   2018-04-13T16:07:03Z

ANY23-337 fixed: BenchmarkTripleHandler reported inaccurate runtimes




---


[jira] [Commented] (ANY23-237) Fix RDFa test 0087: stylesheet reserved word is stripped out

2018-04-13 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437433#comment-16437433
 ] 

Hudson commented on ANY23-237:
--

SUCCESS: Integrated in Jenkins build Any23-trunk #1560 (See 
[https://builds.apache.org/job/Any23-trunk/1560/])
ANY23-237 added test case to ensure no regression (hans: rev 
b13472c202f23d9faacafdaa9bd29fbfddfcfe0b)
* (add) test-resources/src/test/resources/html/rdfa/0087.xhtml
* (edit) 
core/src/test/java/org/apache/any23/extractor/rdfa/RDFaExtractorTest.java
* (edit) 
core/src/test/java/org/apache/any23/extractor/rdfa/RDFa11ExtractorTest.java


> Fix RDFa test 0087: stylesheet reserved word is stripped out
> 
>
> Key: ANY23-237
> URL: https://issues.apache.org/jira/browse/ANY23-237
> Project: Apache Any23
> Issue Type: Bug
> Reporter: stephane corlosquet
> Priority: Major
> Fix For: 2.3
>
>
> We have pretty much 100% green results on the official RDFa test suite at 
> http://rdfa.info/test-suite/. There is only one failure remaining: test 0087.
> For some reason, any23 isn't able to extract a triple out of this markup:
> {code}
>   href="http://example.org/stylesheet">stylesheet
> {code}
> when it can extract the right triple for all the other elements in the test 
> such as 
> {code}
>   href="http://example.org/alternate">alternate
> {code}
> I'm going to need some help to figure this out, as I have no idea what part 
> of any23 is causing this. I checked the same test on semargl (our RDFa 
> parser) and it passes without a problem.





[jira] [Resolved] (ANY23-13) Verify why the maven-changelog-plugin doesn't work properly

2018-04-13 Thread Hans Brende (JIRA)

 [ 
https://issues.apache.org/jira/browse/ANY23-13?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende resolved ANY23-13.
--
Resolution: Invalid

We have no "maven-changelog-plugin".

[~lewismc] Please re-open this issue if you find that an existing plugin we are 
using has this issue.

> Verify why the maven-changelog-plugin doesn't work properly
> ---
>
> Key: ANY23-13
> URL: https://issues.apache.org/jira/browse/ANY23-13
> Project: Apache Any23
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Lewis John McGibbney
> Priority: Minor
> Labels: maven
> Fix For: 2.3
>
>
> The maven-changelog-plugin produces a report that is not linked within the 
> project reports.
> Check it.





[jira] [Commented] (ANY23-17) problem detecting media type for turtle content with comment at the top

2018-04-13 Thread Hans Brende (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437347#comment-16437347
 ] 

Hans Brende commented on ANY23-17:
--

[~lewismc] I am unable to reproduce this issue. The file you provided works 
fine for me (the extractor detected 58 turtle triples). I will mark this issue 
as "resolved", but if you find that it's not fully resolved, please reopen.

> problem detecting media type for turtle content with comment at the top 
> 
>
> Key: ANY23-17
> URL: https://issues.apache.org/jira/browse/ANY23-17
> Project: Apache Any23
> Issue Type: Bug
> Components: mime
> Affects Versions: 0.7.0
> Reporter: Lewis John McGibbney
> Priority: Major
> Fix For: 2.3
>
> Attachments: sigma.config.turtle
>
>
> What steps will reproduce the problem?
> 1. paste the content of this file into any23.org
> 2. press extract
> What is the expected output? What do you see instead?
> triples 
> but instead you will see
> No suitable extractor found for this media type
> What version of the product are you using?
> 0.6.1
> Please provide any additional information below.
> So the problem occurs if there is a long comment at the top of the file.
> If you repeat the operation but delete the last word "sections" from the 
> first line, then it works fine.
> The proposed solution:
> If no suitable extractor is found in the first place, it might be worth 
> trying to remove blank lines and Turtle-style comments from the source:
> skip a line if it matches
> line.matches("^\\s+$")   // remove empty line
> or
> line.matches("^\\s*#.*$")// remove line which starts with # or white space and #
> and then check for the Turtle mime type again.
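
The proposed pre-processing step could be sketched roughly as follows. This is only an illustration of the reporter's idea, not code from Any23: `TurtleCommentStripper` and `strip` are hypothetical names, and the sketch uses `^\\s*$` for blank lines so that zero-length lines are dropped as well as whitespace-only ones.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

/** Hypothetical helper illustrating the proposal; not part of Any23. */
final class TurtleCommentStripper {

    /** Drops blank lines and lines whose first non-whitespace character is '#'. */
    static String strip(String source) {
        return Arrays.stream(source.split("\\R"))            // split on any line break
                .filter(line -> !line.matches("^\\s*$"))     // empty or whitespace-only line
                .filter(line -> !line.matches("^\\s*#.*$"))  // full-line comment
                .collect(Collectors.joining("\n"));
    }

    public static void main(String[] args) {
        String input = "# a long comment at the top\n"
                + "   # an indented comment\n"
                + "\n"
                + "@prefix ex: <http://example.org/> .\n"
                + "ex:a ex:b ex:c . # trailing comment\n";
        System.out.println(strip(input));
    }
}
```

A line-based filter like this is only suitable as a detection heuristic: it would also remove a line starting with `#` inside a multi-line string literal, so the stripped text should probably be used for media type sniffing only and not passed on to the parser.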





[jira] [Commented] (ANY23-237) Fix RDFa test 0087: stylesheet reserved word is stripped out

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437305#comment-16437305
 ] 

ASF GitHub Bot commented on ANY23-237:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/77




[GitHub] any23 pull request #77: ANY23-237 added test case to ensure no regression

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/77


---


[jira] [Commented] (ANY23-237) Fix RDFa test 0087: stylesheet reserved word is stripped out

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-237?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437303#comment-16437303
 ] 

ASF GitHub Bot commented on ANY23-237:
--

GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/77

ANY23-237 added test case to ensure no regression

Added a test case to ensure we don't inadvertently switch back to buggy 
behavior.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-237

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/77.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #77


commit b13472c202f23d9faacafdaa9bd29fbfddfcfe0b
Author: Hans 
Date:   2018-04-13T13:25:04Z

ANY23-237 added test case to ensure no regression






[GitHub] any23 pull request #77: ANY23-237 added test case to ensure no regression

2018-04-13 Thread HansBrende
GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/77

ANY23-237 added test case to ensure no regression

Added a test case to ensure we don't inadvertently switch back to buggy 
behavior.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-237

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/77.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #77


commit b13472c202f23d9faacafdaa9bd29fbfddfcfe0b
Author: Hans 
Date:   2018-04-13T13:25:04Z

ANY23-237 added test case to ensure no regression




---


[jira] [Resolved] (ANY23-237) Fix RDFa test 0087: stylesheet reserved word is stripped out

2018-04-13 Thread Hans Brende (JIRA)

 [ 
https://issues.apache.org/jira/browse/ANY23-237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hans Brende resolved ANY23-237.
---
Resolution: Fixed



[GitHub] any23 pull request #76: ANY23-169 Fixed url resolving errors in MicrodataExt...

2018-04-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/76


---


[GitHub] any23 pull request #76: ANY23-169 Fixed url resolving errors in MicrodataExt...

2018-04-13 Thread HansBrende
GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/76

ANY23-169 Fixed url resolving errors in MicrodataExtractor

mvn clean test -> all tests pass

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-169

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #76


commit 3e5dce1dd9b043b6d6c9687f6f212b9b2ed2e573
Author: Hans 
Date:   2018-04-13T08:33:22Z

ANY23-169 Fixed url resolving errors in MicrodataExtractor




---


[jira] [Commented] (ANY23-169) Incorrect interpretation of relative and absolute paths with Microdata

2018-04-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16437012#comment-16437012
 ] 

ASF GitHub Bot commented on ANY23-169:
--

GitHub user HansBrende opened a pull request:

https://github.com/apache/any23/pull/76

ANY23-169 Fixed url resolving errors in MicrodataExtractor

mvn clean test -> all tests pass

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/HansBrende/any23 ANY23-169

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/76.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #76


commit 3e5dce1dd9b043b6d6c9687f6f212b9b2ed2e573
Author: Hans 
Date:   2018-04-13T08:33:22Z

ANY23-169 Fixed url resolving errors in MicrodataExtractor




> Incorrect interpretation of relative and absolute paths with Microdata
> --
>
> Key: ANY23-169
> URL: https://issues.apache.org/jira/browse/ANY23-169
> Project: Apache Any23
> Issue Type: Bug
> Components: microdata
> Reporter: Ruben Verborgh
> Priority: Major
> Labels: microdata, url, urls
> Fix For: 2.3
>
>
> Parsing the following fragment located at 
> http://ruben.verborgh.org/tmp/slash-test.html
> Homepage
> Other
> results in the URIs
> http://ruben.verborgh.org/tmp/slash-test.html//
> http://ruben.verborgh.org/tmp/slash-test.html/other.html
> instead of the correct
> http://ruben.verborgh.org/tmp/
> http://ruben.verborgh.org/tmp/other.html
> Note that there is no trailing slash in the original.
> Test case:
> http://ruben.verborgh.org/tmp/slash-test.html
> http://any23.org/any23/?format=best=http%3A%2F%2Fruben.verborgh.org%2Ftmp%2Fslash-test.html=none
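
The difference between the reported and the expected URIs can be illustrated with standard reference resolution from `java.net.URI`. This is a sketch under assumptions: `RelativeUrlSketch` is a hypothetical class, the reference `other.html` is assumed for illustration, and `naiveConcat` is only a guess that happens to reproduce the shape of the wrong URI; the thread does not say what MicrodataExtractor actually did.

```java
import java.net.URI;

/** Hypothetical illustration of reference resolution; not Any23 code. */
final class RelativeUrlSketch {

    /** Resolves a reference against the document URL using java.net.URI. */
    static String resolve(String documentUrl, String reference) {
        return URI.create(documentUrl).resolve(reference).toString();
    }

    /** A naive join that treats the document URL as if it were a directory. */
    static String naiveConcat(String documentUrl, String reference) {
        return documentUrl + "/" + reference;
    }

    public static void main(String[] args) {
        String document = "http://ruben.verborgh.org/tmp/slash-test.html";
        // prints http://ruben.verborgh.org/tmp/slash-test.html/other.html
        System.out.println(naiveConcat(document, "other.html"));
        // prints http://ruben.verborgh.org/tmp/other.html
        System.out.println(resolve(document, "other.html"));
    }
}
```

Resolution replaces the last path segment of the document URL (`slash-test.html`) with the reference, which is why the expected result sits directly under `/tmp/`.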


