[jira] [Commented] (ANY23-229) 501 error if no triples are extracted

2014-07-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14081869#comment-14081869
 ] 

ASF GitHub Bot commented on ANY23-229:
--

GitHub user scor opened a pull request:

https://github.com/apache/any23/pull/8

ANY23-229: don't return a 501 error if no triples are extracted

https://issues.apache.org/jira/browse/ANY23-229

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/scor/any23 ANY23-229

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/8.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8


commit 2ad518bfc500ad831b2d3bae6bf79326df3daa14
Author: scor 
Date:   2014-08-01T03:23:26Z

ANY23-229: don't return a 501 error if no triples are extracted




> 501 error if no triples are extracted
> -
>
> Key: ANY23-229
> URL: https://issues.apache.org/jira/browse/ANY23-229
> Project: Apache Any23
>  Issue Type: Bug
>Reporter: stephane corlosquet
>
> I found this bug when checking the RDFa test suite results. Some tests are 
> failing because any23 returns a 501 error when it's not able to extract 
> triples: "Extraction completed. No triples have been found." Those tests are 
> specifically expecting no triples to be extracted, and they fail because 
> any23 returns an error.
> I don't see any reason why this situation would trigger an error. It's fine 
> if a document doesn't generate any triples. It's up to the consumer to deal 
> with the lack of triples.
> The error is generated at 
> service/src/main/java/org/apache/any23/servlet/WebResponder.java:151
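
The change requested above can be sketched as follows. This is only an illustration of the reported and the requested behaviour, not the code in WebResponder.java; the class ExtractionStatus and both method names are hypothetical, and the sketch assumes that a completed extraction should be answered with a plain 200.

```java
import java.net.HttpURLConnection;

// Hypothetical helper, not part of Any23: contrasts the two behaviours.
final class ExtractionStatus {

    /** Behaviour described in the report: an empty result is answered with 501. */
    static int reportedStatus(int tripleCount) {
        return tripleCount == 0
                ? HttpURLConnection.HTTP_NOT_IMPLEMENTED  // 501
                : HttpURLConnection.HTTP_OK;              // 200
    }

    /** Behaviour requested: a completed extraction succeeds, even with no triples. */
    static int requestedStatus(int tripleCount) {
        if (tripleCount < 0) {
            throw new IllegalArgumentException("tripleCount must not be negative");
        }
        // Zero triples is a valid, empty result; the consumer decides what to do with it.
        return HttpURLConnection.HTTP_OK;
    }
}
```

With this sketch, a test that expects no triples sees 200 from requestedStatus(0), where reportedStatus(0) would have produced the 501 that makes the RDFa tests fail.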



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (ANY23-229) 501 error if no triples are extracted

2014-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14082051#comment-14082051
 ] 

ASF GitHub Bot commented on ANY23-229:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/8#issuecomment-50859496
  
Hey @scor do you have a patch? Or do I need to do this one? Is this the 
webservice?




[jira] [Commented] (ANY23-229) 501 error if no triples are extracted

2014-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14083378#comment-14083378
 ] 

ASF GitHub Bot commented on ANY23-229:
--

Github user scor commented on the pull request:

https://github.com/apache/any23/pull/8#issuecomment-50954409
  
This is a pull request, so yes, there is a patch. It only affects the 
web service. See my comment at 
https://issues.apache.org/jira/browse/ANY23-229?focusedCommentId=14081872&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14081872




[jira] [Commented] (ANY23-229) 501 error if no triples are extracted

2014-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086981#comment-14086981
 ] 

ASF GitHub Bot commented on ANY23-229:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/8#issuecomment-51275247
  
commit 2ad518bfc500ad831b2d3bae6bf79326df3daa14
Author: scor 
Date:   Thu Jul 31 23:23:26 2014 -0400

ANY23-229: don't return a 501 error if no triples are extracted

Thank you




[jira] [Commented] (ANY23-229) 501 error if no triples are extracted

2014-08-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14086983#comment-14086983
 ] 

ASF GitHub Bot commented on ANY23-229:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/8




[jira] [Commented] (ANY23-238) Fix generation of BNode name for microdata when 'itemid' is given without a value.

2014-09-12 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14131894#comment-14131894
 ] 

ASF GitHub Bot commented on ANY23-238:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/9#issuecomment-55443359
  
Hi @Timpy, thank you so much for this PR.
I absolutely agree with you that this is a problem as I do not like the 
look of the BNode hash either; it is useless for many purposes.
For the record, can you please post here what the new output will look like?
Thank you
p.s. I posted https://issues.apache.org/jira/browse/ANY23-238 so that we 
can add this to our release report once we release Any23 1.0


> Fix generation of BNode name for microdata when 'itemid' is given without a 
> value.
> --
>
> Key: ANY23-238
> URL: https://issues.apache.org/jira/browse/ANY23-238
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: microdata
>Affects Versions: 1.0
>Reporter: Lewis John McGibbney
> Fix For: 1.1
>
>
> Linking this issue to the relevant Github issue
> https://github.com/apache/any23/pull/9



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-131) Nested Microdata are not extracted

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288729#comment-14288729
 ] 

ASF GitHub Bot commented on ANY23-131:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/10

ANY23-131 Nested Microdata are not extracted

Trivial patch which addresses a recent mailing list item
http://www.mail-archive.com/user%40any23.apache.org/msg00166.html

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-131

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/10.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #10


commit 19abecc58c064cd388d3dcfe29ac90f1b7750ae0
Author: Lewis John McGibbney 
Date:   2015-01-23T04:37:04Z

ANY23-131 Nested Microdata are not extracted




> Nested Microdata are not extracted
> --
>
> Key: ANY23-131
> URL: https://issues.apache.org/jira/browse/ANY23-131
> Project: Apache Any23
>  Issue Type: Bug
>  Components: microdata
>Affects Versions: 0.7.0
>Reporter: Sebastien Richard
> Fix For: 1.2
>
>
> Proposed patch:
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java:
> remove incorrect optim:
> L166
> - return getUnnestedNodes( topLevelItemScopes ); 
> + return topLevelItemScopes;





[jira] [Commented] (ANY23-246) Add Open Graph Protocol and Facebook prefixes to popular.prefixes

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288836#comment-14288836
 ] 

ASF GitHub Bot commented on ANY23-246:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/11

ANY23-246 Add Open Graph Protocol and Facebook prefixes to popular.prefixes

Pull request bundled with ANY23-131.
I am getting better results here, with more triples being extracted, hence 
bumping the expected OG triples from 9 to 12 in 
core/src/test/java/org/apache/any23/Any23Test#testExtractionParameters

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-246

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/11.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11


commit 19abecc58c064cd388d3dcfe29ac90f1b7750ae0
Author: Lewis John McGibbney 
Date:   2015-01-23T04:37:04Z

ANY23-131 Nested Microdata are not extracted

commit 403a8631ed7e9fd4f15968eb94ab736157e72d8f
Author: Lewis John McGibbney 
Date:   2015-01-23T06:06:47Z

ANY23-246 Add Open Graph Protocol and Facebook prefixes to popular.prefixes




> Add Open Graph Protocol and Facebook prefixes to popular.prefixes
> -
>
> Key: ANY23-246
> URL: https://issues.apache.org/jira/browse/ANY23-246
> Project: Apache Any23
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.2
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> We have recently discussed problems with the extraction of some data 
> relationships from both OGP and FB prefixes (in particular).
> This issue should quite simply add these two prefixes to the current list of 
> popular.prefixes we maintain [0] and should also review the accuracy of 
> existing prefixes within this list.
> [0] 
> https://github.com/apache/any23/blob/master/core/src/main/resources/org/apache/any23/prefixes/prefixes.properties
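
As a rough illustration of what adding the two prefixes means, the sketch below loads a small prefix table and expands a prefixed name. It assumes a plain prefix=namespace layout for prefixes.properties, and it assumes the namespaces http://ogp.me/ns# and http://ogp.me/ns/fb# as commonly used with the Open Graph Protocol; the entries actually committed may differ. The class PrefixLookup and its methods are hypothetical and not part of Any23.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper, not part of Any23.
final class PrefixLookup {

    // Assumed entries; the real file may use different keys or namespace URIs.
    static final String SAMPLE =
            "og=http://ogp.me/ns#\n"
          + "fb=http://ogp.me/ns/fb#\n";

    /** Parses prefix=namespace lines in java.util.Properties syntax. */
    static Properties load(String text) {
        Properties prefixes = new Properties();
        try {
            prefixes.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return prefixes;
    }

    /** Expands "prefix:local" to a full URI, or returns null if the prefix is unknown. */
    static String expand(Properties prefixes, String prefixedName) {
        int colon = prefixedName.indexOf(':');
        if (colon < 0) {
            return null;
        }
        String namespace = prefixes.getProperty(prefixedName.substring(0, colon));
        return namespace == null ? null : namespace + prefixedName.substring(colon + 1);
    }
}
```

Under these assumptions, expand(load(SAMPLE), "og:title") yields http://ogp.me/ns#title, while a prefix missing from the table yields null, which is the situation the issue sets out to avoid for OGP and FB data.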





[jira] [Commented] (ANY23-246) Add Open Graph Protocol and Facebook prefixes to popular.prefixes

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288838#comment-14288838
 ] 

ASF GitHub Bot commented on ANY23-246:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/11


> Add Open Graph Protocol and Facebook prefixes to popular.prefixes
> -
>
> Key: ANY23-246
> URL: https://issues.apache.org/jira/browse/ANY23-246
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.2
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> We have recently discussed problems with the extraction of some data 
> relationships from both OGP and FB prefixes (in particular).
> This issue should quite simply add these two prefixes to the current list of 
> popular.prefixes we maintain [0] and should also review the accuracy of 
> existing prefixes within this list.
> [0] 
> https://github.com/apache/any23/blob/master/core/src/main/resources/org/apache/any23/prefixes/prefixes.properties





[jira] [Commented] (ANY23-131) Nested Microdata are not extracted

2015-01-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288839#comment-14288839
 ] 

ASF GitHub Bot commented on ANY23-131:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/10


> Nested Microdata are not extracted
> --
>
> Key: ANY23-131
> URL: https://issues.apache.org/jira/browse/ANY23-131
> Project: Apache Any23
>  Issue Type: Bug
>  Components: microdata
>Affects Versions: 0.7.0
>Reporter: Sebastien Richard
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> Proposed patch:
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java:
> remove incorrect optim:
> L166
> - return getUnnestedNodes( topLevelItemScopes ); 
> + return topLevelItemScopes;





[jira] [Commented] (ANY23-248) NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies to 2.7.14

2015-02-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307466#comment-14307466
 ] 

ASF GitHub Bot commented on ANY23-248:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/12#issuecomment-73071919
  
@ansell can you take a look? Thank you Peter


> NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies 
> to 2.7.14
> -
>
> Key: ANY23-248
> URL: https://issues.apache.org/jira/browse/ANY23-248
> Project: Apache Any23
>  Issue Type: Bug
>Affects Versions: 1.1
> Environment: hadoop,linux
>Reporter: Souri
>Priority: Minor
> Fix For: 1.2
>
>
> I am trying to create n-triples from an html string. I am using the following 
> code to do it:
> StringDocumentSource documentSource = new StringDocumentSource(html, null);
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> final NTriplesWriter tripleHandler = new NTriplesWriter(out);
> Any23 runner = new Any23();
>
> runner.extract(documentSource,tripleHandler);
> tripleHandler.close();
> String result = out.toString("us-ascii");
> return result;
> This is giving me the error :
> java.lang.NullPointerException
>   at 
> org.apache.any23.extractor.SingleDocumentExtraction.filterExtractorsByMIMEType(SingleDocumentExtraction.java:421)
>   at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:223)
>   at org.apache.any23.Any23.extract(Any23.java:298)
>   at org.apache.any23.Any23.extract(Any23.java:433)
> I am running this in hadoop. When I run locally with a single file it works, 
> but doesn't work when run on hadoop.
> Can someone please tell me how to go about this issue?
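
The stack trace places the exception in the MIME-type based filtering of extractors, but the report does not say which reference was null. Purely as an illustration of that class of failure, and assuming (unconfirmed) that the detected MIME type can be absent, a null-tolerant filter could look like the sketch below. MimeFilterSketch, its method and the extractor names are hypothetical; none of this is Any23 code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, library-independent sketch; not the Any23 implementation.
final class MimeFilterSketch {

    /**
     * Returns the names of the extractors that support the detected MIME type.
     * If no type was detected (null), every extractor is kept instead of
     * dereferencing the missing value.
     */
    static List<String> filterByMimeType(Map<String, List<String>> supportedTypes,
                                         String detectedType) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : supportedTypes.entrySet()) {
            if (detectedType == null || entry.getValue().contains(detectedType)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

Falling back to all extractors is one possible design choice; rejecting the document with a descriptive error would be another. Either way the caller gets something more useful than a bare NullPointerException.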





[jira] [Commented] (ANY23-248) NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies to 2.7.14

2015-02-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14307465#comment-14307465
 ] 

ASF GitHub Bot commented on ANY23-248:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/12

ANY23-248 NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame 
dependencies to 2.7.14

Trivial upgrade of sesame deps

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-248

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/12.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #12


commit 3332f4a89b2357a6a2bfe374dea1c894f3a298e6
Author: Lewis John McGibbney 
Date:   2015-02-05T16:04:03Z

ANY23-248 NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame 
dependencies to 2.7.14






[jira] [Commented] (ANY23-248) NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies to 2.7.14

2015-02-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14314435#comment-14314435
 ] 

ASF GitHub Bot commented on ANY23-248:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/12


> NTriplesWriter on hadoop : issue with MIME type/Upgrade sesame dependencies 
> to 2.7.14
> -
>
> Key: ANY23-248
> URL: https://issues.apache.org/jira/browse/ANY23-248
> Project: Apache Any23
>  Issue Type: Bug
>Affects Versions: 1.1
> Environment: hadoop,linux
>Reporter: Souri
>Assignee: Lewis John McGibbney
>Priority: Minor
> Fix For: 1.2
>
>
> I am trying to create n-triples from an html string. I am using the following 
> code to do it:
> StringDocumentSource documentSource = new StringDocumentSource(html, null);
> ByteArrayOutputStream out = new ByteArrayOutputStream();
> final NTriplesWriter tripleHandler = new NTriplesWriter(out);
> Any23 runner = new Any23();
>
> runner.extract(documentSource,tripleHandler);
> tripleHandler.close();
> String result = out.toString("us-ascii");
> return result;
> This is giving me the error :
> java.lang.NullPointerException
>   at 
> org.apache.any23.extractor.SingleDocumentExtraction.filterExtractorsByMIMEType(SingleDocumentExtraction.java:421)
>   at 
> org.apache.any23.extractor.SingleDocumentExtraction.run(SingleDocumentExtraction.java:223)
>   at org.apache.any23.Any23.extract(Any23.java:298)
>   at org.apache.any23.Any23.extract(Any23.java:433)
> I am running this in hadoop. When I run locally with a single file it works, 
> but doesn't work when run on hadoop.
> Can someone please tell me how to go about this issue?





[jira] [Commented] (ANY23-250) Upgrade to Tika 1.7

2015-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340550#comment-14340550
 ] 

ASF GitHub Bot commented on ANY23-250:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/13#issuecomment-76448755
  
All tests pass locally. It would be appreciated if someone could confirm.



> Upgrade to Tika 1.7
> ---
>
> Key: ANY23-250
> URL: https://issues.apache.org/jira/browse/ANY23-250
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core, mime
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> Tika 1.7 was released a while back. We should upgrade.





[jira] [Commented] (ANY23-250) Upgrade to Tika 1.7

2015-02-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14340548#comment-14340548
 ] 

ASF GitHub Bot commented on ANY23-250:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/13

ANY23-250 Upgrade to Tika 1.7

Hi Folks,
Please see PR for master branch which addresses ANY23-250

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-250

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/13.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #13


commit e47563ecc72572a97b612cded2717b2db4c5be72
Author: Lewis John McGibbney 
Date:   2015-02-27T18:44:41Z

ANY23-250 Upgrade to Tika 1.7






[jira] [Commented] (ANY23-250) Upgrade to Tika 1.7

2015-03-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14346201#comment-14346201
 ] 

ASF GitHub Bot commented on ANY23-250:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/13




[jira] [Commented] (ANY23-255) apache-any23-quads dependency should not be test in core pom.xml

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357345#comment-14357345
 ] 

ASF GitHub Bot commented on ANY23-255:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/14

ANY23-255 apache-any23-quads dependency should not be <scope>test</scope> in core 
pom.xml



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-255

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/14.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #14


commit 098fc5bf25fdd21a0019c15001a288f3614cf5aa
Author: Lewis John McGibbney 
Date:   2015-03-11T18:29:01Z

ANY23-255 apache-any23-quads dependency should not be <scope>test</scope> in core 
pom.xml




> apache-any23-quads dependency should not be <scope>test</scope> in core pom.xml
> 
>
> Key: ANY23-255
> URL: https://issues.apache.org/jira/browse/ANY23-255
> Project: Apache Any23
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> Right now if you build core and try to invoke the app assembler
> {code}
> ./bin/any23 vocab
> {code}
> it spits out a message saying that no Factory implementation could be located 
> for nquads. This is because nquads is set as <scope>test</scope> within the 
> core/pom.xml.
> This trivial issue will address and fix that.





[jira] [Commented] (ANY23-255) apache-any23-quads dependency should not be test in core pom.xml

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357349#comment-14357349
 ] 

ASF GitHub Bot commented on ANY23-255:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/14




[jira] [Commented] (ANY23-253) JSON-LD cannot be processed by Rover

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357400#comment-14357400
 ] 

ASF GitHub Bot commented on ANY23-253:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/15

ANY23-253 JSON-LD cannot be processed by Rover

Hi @michelemostarda, please check out the results which I now get when I 
run the following

lmcgibbn@LMC-032857 /usr/local/any23/core/target/appassembler(master) $ 
./bin/any23 rover -l output.log -o output.txt -s 
"http://people.apache.org/~lewismc/example-jsonld.jsonld"
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.


Apache Any23 :: rover



>Summary:
   -total calls: 2
   -total triples: 5
   -total runtime: 12 ms!
   -tripls/ms: 0
   -ms/calls: 6
>Extractor: consolidation-extractor
   -total calls: 1
   -total triples: 0
   -total runtime: 0 ms!
   -ms/calls: 0
>Extractor: rdf-jsonld
   -total calls: 1
   -total triples: 5
   -total runtime: 12 ms!
   -tripls/ms: 0
   -ms/calls: 12


Apache Any23 SUCCESS
Total time: 2s
Finished at: Wed Mar 11 11:57:42 PDT 2015
Final Memory: 75M/480M
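
The per-unit figures in the summary above are consistent with truncating integer division: 5 triples over 12 ms gives 0, and 12 ms over 2 calls gives 6. Whether rover actually computes them this way is an assumption; the sketch below only reproduces the arithmetic, and RoverStats and perUnit are hypothetical names.

```java
// Hypothetical helper, not part of Any23: reproduces the summary arithmetic.
final class RoverStats {

    /** Integer ratio, truncated toward zero; returns 0 when there are no units. */
    static long perUnit(long total, long units) {
        return units == 0 ? 0 : total / units;
    }
}
```

For the figures above, perUnit(5, 12) is 0 (the triples per ms line) and perUnit(12, 2) is 6 (the ms per call line), so the 0 reflects truncation rather than a lack of extracted triples.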


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY-253

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/15.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15


commit f88cc51f3194e85978c55c330807a94254a2167c
Author: Lewis John McGibbney 
Date:   2015-03-11T18:59:03Z

ANY23-253 JSON-LD cannot be processed by Rover




> JSON-LD cannot be processed by Rover
> 
>
> Key: ANY23-253
> URL: https://issues.apache.org/jira/browse/ANY23-253
> Project: Apache Any23
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.1
>Reporter: Michele Mostarda
> Fix For: 1.2
>
>






[jira] [Commented] (ANY23-253) JSON-LD cannot be processed by Rover

2015-03-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357447#comment-14357447
 ] 

ASF GitHub Bot commented on ANY23-253:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/15




[jira] [Commented] (ANY23-253) JSON-LD cannot be processed by Rover

2015-03-17 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14365240#comment-14365240
 ] 

ASF GitHub Bot commented on ANY23-253:
--

Github user michelemostarda commented on the pull request:

https://github.com/apache/any23/pull/15#issuecomment-82389735
  
Hi Lewis, sounds fine now.
Thanks
Best


> JSON-LD cannot be processed by Rover
> 
>
> Key: ANY23-253
> URL: https://issues.apache.org/jira/browse/ANY23-253
> Project: Apache Any23
>  Issue Type: Bug
>  Components: CLI
>Affects Versions: 1.1
>Reporter: Michele Mostarda
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>






[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371550#comment-14371550
 ] 

ASF GitHub Bot commented on ANY23-226:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/16

ANY23-226 Extract JSON-LD embedded in HTML

Initial patch for this support. 
It is not working correctly. @ansell, can you have a look into the parsing of 
JSON-LD textual content?
I've added a '//' comment at the point where I can see the correct parser being 
selected. It seems not to parse and extract the JSON-LD, so I know I am doing 
something wrong.
Thank you very much @ansell if you can have a wee look.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-226

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/16.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16


commit 1e3eb9c31af2f93906eee1081179d73c30a0881b
Author: Lewis John McGibbney 
Date:   2015-03-20T15:55:29Z

ANY23-226 Extract JSON-LD embedded in HTML




> Extract JSON-LD embedded in HTML
> 
>
> Key: ANY23-226
> URL: https://issues.apache.org/jira/browse/ANY23-226
> Project: Apache Any23
>  Issue Type: Wish
>  Components: core
>Affects Versions: 1.0
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.3
>
>
>  See http://www.w3.org/TR/json-ld/#embedding-json-ld-in-html-documents
> I feel that we need to push this down at the jsonld-java level.
> I am investigating.





[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14371554#comment-14371554
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/16#issuecomment-84058410
  
It is important to state that this is largely based on our existing META 
extractor. We merely look for the presence of /HTML/HEAD/SCRIPT/. 
Therefore, this initial effort needs to be augmented by a fully functional 
implementation which can also catch JSONLD present in the body.
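
As an illustration of the gap described above, here is a minimal sketch
that finds JSON-LD script elements anywhere in a document, head or body.
It is not the Any23 extractor: it uses only the JDK's DOM and XPath APIs,
the class and method names are hypothetical, and it assumes well-formed
XHTML input with lowercase element names.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only; not part of Any23.
class JsonLdScriptFinder {

    // '//' matches at any depth, so scripts in head and in body are both found.
    static final String ANYWHERE = "//script[@type='application/ld+json']";

    // Returns the trimmed text content of every JSON-LD script element.
    static List<String> findJsonLd(String xhtml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xhtml)));
            NodeList scripts = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(ANYWHERE, doc, XPathConstants.NODESET);
            List<String> bodies = new ArrayList<>();
            for (int i = 0; i < scripts.getLength(); i++) {
                bodies.add(scripts.item(i).getTextContent().trim());
            }
            return bodies;
        } catch (Exception e) {
            // Surface the failure instead of pretending nothing was found.
            throw new IllegalStateException("could not scan document for JSON-LD", e);
        }
    }
}
```

A path restricted to the head would miss the second script in a page that
carries one block in the head and another in the body; the '//' form
returns both.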




[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372516#comment-14372516
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/16




[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372518#comment-14372518
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user ansell commented on the pull request:

https://github.com/apache/any23/pull/16#issuecomment-84257478
  
The main bug was that the entire script node was being sent to JSONLD-Java, 
and not just its content.

However, I also made a few other changes while doing that testing.

It turned out that the JSONLD was invalid, but at some point the exception 
thrown when parsing fails had been changed to be silently swallowed, so the only 
indication was that the count was 0. I turned the exception propagation on again 
(there is no reason it should be swallowed outside of temporary testing).

However, in addition to the 4 tests currently failing on the core tests, 
there are now other tests failing due to an inability to parse ""
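
The effect of a swallowed parse exception can be sketched in a few lines.
Everything here is hypothetical and for illustration only: the Parser
interface is a stand-in, not the JSONLD-Java API, and the two count
methods are not Any23 code.

```java
import java.io.IOException;

// Illustration only: contrasts swallowing a parse failure with propagating it.
class ParseFailureDemo {

    // Hypothetical stand-in for a parser that reports how many statements it read.
    interface Parser {
        int parse(String input) throws IOException;
    }

    // Rejects anything that does not look like a JSON object.
    static final Parser STRICT = input -> {
        if (!input.trim().startsWith("{")) {
            throw new IOException("not a JSON object: " + input);
        }
        return 1;
    };

    // Swallowing: a parse failure becomes indistinguishable from "nothing found".
    static int countSwallowing(Parser parser, String input) {
        try {
            return parser.parse(input);
        } catch (IOException e) {
            return 0;
        }
    }

    // Propagating: the caller (and the build) sees that parsing failed.
    static int countPropagating(Parser parser, String input) {
        try {
            return parser.parse(input);
        } catch (IOException e) {
            throw new IllegalStateException("JSON-LD parse failed", e);
        }
    }
}
```

With the swallowing variant, invalid input simply yields a count of 0,
which is the only symptom described above.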




[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372874#comment-14372874
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/16#issuecomment-84384643
  
OK Peter, thank you for looking. This is great. I have not seen the test
failures. Can you please tell me whether they are in Any23 or in JSONLD-Java?
We could also upgrade the JSONLD-Java implementation to the 0.5.1
release.

On Saturday, March 21, 2015, Peter Ansell  wrote:

> The main bug was that the entire script node was being sent to
> JSONLD-Java, and not just its content.
>
> However, I also made a few other changes while doing that testing.
>
> It turned out that the jsonld was invalid, but somehow the exception when
> parses fail was changed to be silently swallowed, so the only indication
> was that the count was 0. I turned on the exception propagation again (no
> reason it should be swallowed outside of temporary testing).
>
> However, in addition to the 4 tests currently failing on the core tests,
> there are now other tests failing due to an inability to parse "
> "
>


-- 
*Lewis*





[jira] [Commented] (ANY23-226) Extract JSON-LD embedded in HTML

2015-03-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14375170#comment-14375170
 ] 

ASF GitHub Bot commented on ANY23-226:
--

Github user ansell commented on the pull request:

https://github.com/apache/any23/pull/16#issuecomment-84706349
  
The test failures are in the Microdata parsing code, not JSONLD-Java, so I 
thought it was fine to push this even though it was going to break the Jenkins 
build (it was already silently broken before due to the swallowed exception). 
The JSONLD parsing now works. The key fix to what you had done was to send the 
first child of the script element, which is the actual JSON code.
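
A minimal sketch of that distinction, using only the JDK's DOM API. The
class and method names are hypothetical and this is not the code from the
pull request. The first child of a script element is its text node, so
its node value is the raw JSON without the surrounding markup.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, for illustration only.
class ScriptContent {

    // Returns the JSON held by the script element's first child (a text node),
    // or an empty string when the element has no children.
    static String jsonOf(Node scriptElement) {
        Node first = scriptElement.getFirstChild();
        return first == null ? "" : first.getNodeValue();
    }

    // Parses a small well-formed XML snippet and returns its root element.
    static Node parseRoot(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse snippet", e);
        }
    }
}
```

Handing the element itself to a JSON parser would include the script tag
markup, which is not JSON; handing over the value of the first child
gives the parser only the JSON text.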




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14386954#comment-14386954
 ] 

ASF GitHub Bot commented on ANY23-247:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/17

ANY23-247 FIX Attribute name itemscope associated with an element type html 
must be followed by the ' = ' character.

Hi Folks,
This PR fixes the issue locally. I am getting clean builds again now 
after introducing the new MissingItemscopeAttributeValueRule class.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-247

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/17.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #17


commit 5ac2307a0245f06f07cbdbe300bc8608f73b1ba1
Author: Lewis John McGibbney 
Date:   2015-03-30T16:43:25Z

ANY23-247 FIX Attribute name itemscope associated with an element type html 
must be followed by the ' = ' character.




> FIX Attribute name "itemscope" associated with an element type "html" must be 
> followed by the ' = ' character.
> --
>
> Key: ANY23-247
> URL: https://issues.apache.org/jira/browse/ANY23-247
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.3
>
>
> In the following markup
> {code}
>  "http://www.w3.org/TR/html4/loose.dtd">
> <html xmlns="http://www.w3.org/1999/xhtml" 
> xmlns:og="http://opengraphprotocol.org/schema/" 
> xmlns:fb="http://www.facebook.com/2008/fbml" version="HTML+RDFa 1.0" 
> xml:lang="en" itemscope itemtype="http://schema.org/Product">
> 
> 
> 
> 
> ...
> {code}
> Due to the absence of any value for *itemscope*, we get the 
> following error in our web server logs
> {code}
> [Fatal Error] :2:185: Attribute name "itemscope" associated with an element 
> type "html" must be followed by the ' = ' character.
> {code}
> Although the markup semantics are incorrect, Any23 should simply check 
> whether the itemscope value is null and, if so, add *=""*. 
> There is a precedent for us doing something like this before; I just can't 
> find the ticket right now!
> The code we need to add belongs in either 
> core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java or 
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
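
The idea in the description, giving a bare itemscope attribute an empty
value so that an XML parser accepts it, can be sketched at the string
level. This is an assumption-laden illustration, not the approach taken
in the pull request (which adds a validator Fix class): the class name is
hypothetical, and the regular expression is deliberately simple, so it
would also touch the word itemscope inside another attribute's quoted
value if it is followed by whitespace.

```java
import java.util.regex.Pattern;

// Hypothetical pre-processing step, for illustration only.
class ItemscopeNormalizer {

    // Inside a tag: whitespace, then 'itemscope', then whitespace, '/' or '>',
    // and no '=' following. The two lookaheads exclude attributes that
    // already carry a value.
    private static final Pattern BARE_ITEMSCOPE = Pattern.compile(
            "(<[^<>]*?\\sitemscope)(?=[\\s/>])(?!\\s*=)",
            Pattern.CASE_INSENSITIVE);

    // Rewrites every bare itemscope attribute as itemscope="".
    static String addEmptyValue(String markup) {
        return BARE_ITEMSCOPE.matcher(markup).replaceAll("$1=\"\"");
    }
}
```

Attributes that already have a value, including an empty one, are left
untouched, so applying the method twice gives the same result as applying
it once.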





[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387512#comment-14387512
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/17#discussion_r27437088
  
--- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.any23.validator.rule;
+
+import org.apache.any23.validator.DOMDocument;
+import org.apache.any23.validator.Fix;
+import org.apache.any23.validator.Rule;
+import org.apache.any23.validator.RuleContext;
+
+/**
+ * This fixes missing attribute values for the 'itemscope' attribute, 
+ * which was be associated with  nodes.
+ * Typically when such a snippet of XHTML is fed through the 
+ * {@link org.apache.any23.extractor.rdfa.RDFa11Extractor}, and
+ * subsequently to Sesame's {@link org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser},
+ * it will result in the following behavior. 
+ * 
+ * {@code
+ * [Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.
+ * }
+ * 
+ * This Fix is an effort to mitigate against that happening. 
+ *
+ */
+public class MissingItemscopeAttributeValueRule implements Fix {
--- End diff --

How is this class recognised or instantiated? META-INF/services/ or another 
method?
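
For readers unfamiliar with the META-INF/services mechanism mentioned in
the question, here is a minimal sketch of how java.util.ServiceLoader
discovers implementations. The HypotheticalFix interface is invented for
this example; whether Any23 registers its validator rules this way is
exactly what the question leaves open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Illustration of JDK service discovery; not Any23 code.
class FixDiscovery {

    // Hypothetical service interface standing in for a validator fix.
    public interface HypotheticalFix {
        String name();
    }

    // Lists the providers that ServiceLoader can find on the classpath.
    // A provider is found only if a resource named
    // META-INF/services/<binary name of the interface> lists its class name.
    static List<String> discoveredFixNames() {
        List<String> names = new ArrayList<>();
        for (HypotheticalFix fix : ServiceLoader.load(HypotheticalFix.class)) {
            names.add(fix.name());
        }
        return names;
    }
}
```

Without such a registration file the list is empty, so a class that
merely implements the interface is never picked up by this mechanism;
it would have to be registered explicitly or found some other way.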




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387600#comment-14387600
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/17#discussion_r27442017
  

I looked for it being registered during a single document extraction. It 
was my understanding that validations and fixes are registered and active as 
part of the extraction parameters. If a vanilla SingleDocumentExtraction 
is invoked, as per the Any23Test, then by default the Fixes and Validations 
are activated.




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387623#comment-14387623
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/17#discussion_r27442717
  

It may be done using a classpath scan. I will look into it further.




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387704#comment-14387704
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/17#discussion_r27444885
  

Ack

On Monday, March 30, 2015, Peter Ansell  wrote:

> In
> 
core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java
> :
>
> > +/**
> > + * This fixes missing attribute values for the 'itemscope' attribute,
> > + * which was be associated with  nodes.
> > + * Typically when such a snippet of XHTML is fed through the
> > + * {@link org.apache.any23.extractor.rdfa.RDFa11Extractor}, and
> > + * subsequently to Sesame's {@link 
org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser},
> > + * it will result in the following behavior.
> > + * 
> > + * {@code
> > + * [Fatal Error] :23:15: Attribute name "itemscope" associated with an 
element type "div" must be followed by the ' = ' character.
> > + * }
> > + * 
> > + * This Fix is an effort to mitigate against that happening.
> > + *
> > + */
> > +public class MissingItemscopeAttributeValueRule implements Fix {
>
> It may be done using a classpath scan. I will look into it further.
>


-- 
*Lewis*




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387707#comment-14387707
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/17#discussion_r27444925
  

Everything I've uploaded to the patch is what I have coded. There is no
other black magic on my end to get this invoked.

On Monday, March 30, 2015, Lewis John Mcgibbney 
wrote:

> Ack
>
> On Monday, March 30, 2015, Peter Ansell wrote:
>
>> In
>> 
core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java
>> :
>>
>> > +/**
>> > + * This fixes missing attribute values for the 'itemscope' attribute,
>> > + * which was be associated with  nodes.
>> > + * Typically when such a snippet of XHTML is fed through the
>> > + * {@link org.apache.any23.extractor.rdfa.RDFa11Extractor}, and
>> > + * subsequently to Sesame's {@link 
org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser},
>> > + * it will result in the following behavior.
>> > + * 
>> > + * {@code
>> > + * [Fatal Error] :23:15: Attribute name "itemscope" associated with 
an element type "div" must be followed by the ' = ' character.
>> > + * }
>> > + * 
>> > + * This Fix is an effort to mitigate against that happening.
>> > + *
>> > + */
>> > +public class MissingItemscopeAttributeValueRule implements Fix {
>>
>> It may be done using a classpath scan. I will look into it further.
>>
>
>
> --
> *Lewis*
>
>

-- 
*Lewis*




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387757#comment-14387757
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/17#discussion_r27446443
  
--- Diff: core/src/main/java/org/apache/any23/validator/rule/MissingItemscopeAttributeValueRule.java ---
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.any23.validator.rule;
+
+import org.apache.any23.validator.DOMDocument;
+import org.apache.any23.validator.Fix;
+import org.apache.any23.validator.Rule;
+import org.apache.any23.validator.RuleContext;
+
+/**
+ * This fixes missing attribute values for the 'itemscope' attribute, 
+ * which was be associated with  nodes.
+ * Typically when such a snippet of XHTML is fed through the 
+ * {@link org.apache.any23.extractor.rdfa.RDFa11Extractor}, and
+ * subsequently to Sesame's {@link org.semarglproject.sesame.rdf.rdfa.SesameRDFaParser},
+ * it will result in the following behavior. 
+ * 
+ * {@code
+ * [Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.
+ * }
+ * 
+ * This Fix is an effort to mitigate against that happening. 
+ *
+ */
+public class MissingItemscopeAttributeValueRule implements Fix {
--- End diff --

There is a hardcoded set in DefaultValidator.loadDefaultRules, but I can't 
find any place that is doing classpath scanning there.

I also do not understand the relationship between Rule and Fix. In the 
DefaultValidator, there are either Rule, or Rule+Fix, not just a Fix like you 
have here.

I will look into it further when I get a chance.
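
For readers unfamiliar with the validator, the Rule/Fix pairing questioned above can be pictured roughly as follows. This is only a sketch under stated assumptions: `SketchRule` and `SketchFix` are hypothetical stand-ins defined here, not the real `org.apache.any23.validator.Rule` and `Fix` interfaces, and it assumes a bare `itemscope` shows up in the DOM as an attribute with an empty value. The replacement value `itemscope` is also just illustrative.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class RuleFixPairSketch {

    /** Hypothetical stand-in for a rule: detects a problem in the DOM. */
    interface SketchRule {
        boolean appliesTo(Document doc);
    }

    /** Hypothetical stand-in for a fix: repairs what its rule detected. */
    interface SketchFix {
        void execute(Document doc);
    }

    /** Assumption: a bare itemscope is represented as an empty attribute value. */
    static boolean hasEmptyItemscope(Element element) {
        return element.hasAttribute("itemscope")
                && element.getAttribute("itemscope").isEmpty();
    }

    /** Fires when any element carries an itemscope attribute without a value. */
    static final SketchRule MISSING_VALUE_RULE = doc -> {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            if (hasEmptyItemscope((Element) all.item(i))) {
                return true;
            }
        }
        return false;
    };

    /** Gives every value-less itemscope attribute an explicit value. */
    static final SketchFix MISSING_VALUE_FIX = doc -> {
        NodeList all = doc.getElementsByTagName("*");
        for (int i = 0; i < all.getLength(); i++) {
            Element element = (Element) all.item(i);
            if (hasEmptyItemscope(element)) {
                element.setAttribute("itemscope", "itemscope");
            }
        }
    };

    /** Runs the fix only when its rule fires; returns whether the fix ran. */
    static boolean validate(Document doc) {
        if (!MISSING_VALUE_RULE.appliesTo(doc)) {
            return false;
        }
        MISSING_VALUE_FIX.execute(doc);
        return true;
    }

    /** Builds a one-element document: a div with an empty itemscope attribute. */
    static Document sampleDocument() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element div = doc.createElement("div");
            div.setAttribute("itemscope", "");
            div.setAttribute("itemtype", "http://schema.org/Product");
            doc.appendChild(div);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Document doc = sampleDocument();
        System.out.println("fix applied on first pass: " + validate(doc));
        System.out.println("fix applied on second pass: " + validate(doc));
    }
}
```

The point of the pairing is that a fix on its own has nothing to trigger it: the rule decides whether the document needs repair, and the fix runs only in that case.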


> FIX Attribute name "itemscope" associated with an element type "html" must be 
> followed by the ' = ' character.
> --
>
> Key: ANY23-247
> URL: https://issues.apache.org/jira/browse/ANY23-247
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.3
>
>
> In the following markup
> {code}
>  "http://www.w3.org/TR/html4/loose.dtd";>
> http://www.w3.org/1999/xhtml"; 
> xmlns:og="http://opengraphprotocol.org/schema/"; 
> xmlns:fb="http://www.facebook.com/2008/fbml"; version="HTML+RDFa 1.0" 
> xml:lang="en" itemscope itemtype="http://schema.org/Product";>
> 
> 
> 
> 
> ...
> {code}
> Due to the absence of any subsequent value for *itemscope*, we get the 
> following error in our web server logs
> {code}
> [Fatal Error] :2:185: Attribute name "itemscope" associated with an element 
> type "html" must be followed by the ' = ' character.
> {code}
> Although the markup semantics are incorrect, Any23 should simply perform a 
> check for the itemscope value being null, if this is the case then add *=""*, 
> there is a precedent for us doing something like this before, I just cant 
> find the ticket right now!
> The code we need to add is present within either 
> core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
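
As a rough illustration of the workaround described in the issue (give a bare `itemscope` an empty value before an XML parser sees it), here is a minimal sketch. It is not the approach taken in the pull request, which works through a validator fix. The class name, the regular expression and the helper methods are all hypothetical, and the pattern is deliberately naive: it would also rewrite the word `itemscope` if it appeared inside another attribute's value within a tag.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.SAXException;

class BareItemscopeSketch {

    // Hypothetical pattern: "itemscope" inside a tag, followed by whitespace,
    // "/" or ">" and not by an "=" sign.
    private static final Pattern BARE_ITEMSCOPE =
            Pattern.compile("(<[^<>]*?\\s)itemscope(?=[\\s/>])(?!\\s*=)");

    /** Rewrites a bare itemscope attribute to itemscope="". */
    static String addEmptyValue(String markup) {
        return BARE_ITEMSCOPE.matcher(markup).replaceAll("$1itemscope=\"\"");
    }

    /** Returns true if the JDK's default XML parser accepts the markup. */
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            // The parser rejects attributes that are not followed by '='.
            return false;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String raw = "<div itemscope itemtype=\"http://schema.org/Product\"></div>";
        String fixed = addEmptyValue(raw);
        System.out.println(fixed);
        System.out.println("raw parses: " + parsesAsXml(raw));
        System.out.println("fixed parses: " + parsesAsXml(fixed));
    }
}
```

Markup that already has `itemscope=""` is left untouched, so the rewrite can be applied more than once without changing the result.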



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-30 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14387787#comment-14387787
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user ansell commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-87894632
  
Could you rebase your branch onto upstream master and try again? 

The place where the error started to become visible as a test failure (when 
I started rethrowing an exception that was being swallowed incorrectly) is on 
the current master, but your master branch is 4 commits behind that so the test 
will still silently succeed on your branch.




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388384#comment-14388384
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-88051079
  
@ansell done, the branch is now 2 ahead of master




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388395#comment-14388395
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-88056267
  
By the way @ansell, one observation is that whenever we attempt to 
infer the document language, we never succeed. It always returns null. On 
every single occasion I get back null.




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2015-03-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14388397#comment-14388397
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-88056599
  
When I debug this, a good place to set a breakpoint is at line 253 of SingleDocumentExtraction.java:

https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/SingleDocumentExtraction.java#L253

The parse still fails in the RDFa 1.1 parser with the following error:
```
[Fatal Error] :23:15: Attribute name "itemscope" associated with an element type "div" must be followed by the ' = ' character.
[2015-03-31 04:46:46,618]DEBUG544766[main] - org.apache.any23.extractor.SingleDocumentExtraction.runExtractor(SingleDocumentExtraction.java:488) - html-rdfa11: Error while parsing RDF document.
```




[jira] [Commented] (ANY23-207) Implement Microformats2

2015-09-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738349#comment-14738349
 ] 

ASF GitHub Bot commented on ANY23-207:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/18#issuecomment-139149386
  
OK folks I am going to go ahead and merge this issue and associate it with 
https://issues.apache.org/jira/browse/ANY23-207


> Implement Microformats2
> ---
>
> Key: ANY23-207
> URL: https://issues.apache.org/jira/browse/ANY23-207
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.0
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> http://microformats.org/2014/03/05/getting-started-with-microformats2





[jira] [Commented] (ANY23-207) Implement Microformats2

2015-09-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14738354#comment-14738354
 ] 

ASF GitHub Bot commented on ANY23-207:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/18




[jira] [Commented] (ANY23-277) Any23 master branch will not build to to build due to lacking maven-assembly-plugin

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154693#comment-15154693
 ] 

ASF GitHub Bot commented on ANY23-277:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/20

ANY23-277 Any23 master branch will not build to to build due to lacking 
maven-assembly-plugin

This issue addresses https://issues.apache.org/jira/browse/ANY23-277
It also adds some configuration to .gitignore for the any23-site as well as 
Eclipse files.
Would like to commit EoB today if possible. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-277

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/20.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #20






> Any23 master branch will not build to to build due to lacking 
> maven-assembly-plugin
> ---
>
> Key: ANY23-277
> URL: https://issues.apache.org/jira/browse/ANY23-277
> Project: Apache Any23
>  Issue Type: Bug
>Affects Versions: 1.2
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Blocker
> Fix For: 1.2
>
>
> I have no idea when this happened, but right now I cannot even build Any23 
> master branch. After consulting [~simone.tripodi] I've fixed the issues and 
> we are not back to an unsuccessful build due to previous issues reported. 





[jira] [Commented] (ANY23-274) Change any23.microdata.ns.default configuration value to http://schema.org

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154703#comment-15154703
 ] 

ASF GitHub Bot commented on ANY23-274:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/21

ANY23-274 Change any23.microdata.ns.default configuration value to 
http://schema.org

This issue addresses https://issues.apache.org/jira/browse/ANY23-274

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-274

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/21.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21


commit a3631de0e957424dab3cf1d0c1833017d7ccc6fc
Author: Lewis John McGibbney 
Date:   2016-02-19T19:16:12Z

ANY23-274 Change any23.microdata.ns.default configuration value to 
http://schema.org




> Change any23.microdata.ns.default configuration value to http://schema.org
> --
>
> Key: ANY23-274
> URL: https://issues.apache.org/jira/browse/ANY23-274
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: api
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> ./api/src/main/resources/default-configuration.properties uses the very old 
> (deprecated) http://rdf.data-vocabulary.org namespace as a default namespace 
> for prepending to Microdata extractions. 
> {code}
> #  Microdata default namespace.
> any23.microdata.ns.default=http://rdf.data-vocabulary.org/
> {code}
> For obvious reasons we should change this to http://schema.org
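
To make the proposed change concrete, here is a small sketch that reads the default Microdata namespace from properties text using plain `java.util.Properties`. This is an assumption-laden illustration only: it does not show how Any23 itself loads `default-configuration.properties`, the class and method names are hypothetical, and the new value is copied from the issue title as written (without a trailing slash).

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

class MicrodataNamespaceSketch {

    // Property key quoted in the issue description.
    static final String KEY = "any23.microdata.ns.default";

    /** Parses properties text and returns the default Microdata namespace. */
    static String defaultNamespace(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) {
        String before = "#  Microdata default namespace.\n"
                + "any23.microdata.ns.default=http://rdf.data-vocabulary.org/\n";
        String after = "#  Microdata default namespace.\n"
                + "any23.microdata.ns.default=http://schema.org\n";
        System.out.println("before: " + defaultNamespace(before));
        System.out.println("after:  " + defaultNamespace(after));
    }
}
```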





[jira] [Commented] (ANY23-278) Upgrade all Maven plugin versions in parent pom.xml

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154708#comment-15154708
 ] 

ASF GitHub Bot commented on ANY23-278:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/22

ANY23-278 Upgrade all Maven plugin versions in parent pom.xml

This issue addresses https://issues.apache.org/jira/browse/ANY23-278

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-278

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/22.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #22


commit 6e042b7ad1fe6600ffa36c0dd08e21ec4819553f
Author: Lewis John McGibbney 
Date:   2016-02-19T19:26:13Z

ANY23-278 Upgrade all Maven plugin versions in parent pom.xml




> Upgrade all Maven plugin versions in parent pom.xml
> ---
>
> Key: ANY23-278
> URL: https://issues.apache.org/jira/browse/ANY23-278
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: Plugin Management
>Affects Versions: 1.2
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> This issue is merely an update of all Maven plugin versions and 
> configuration within parent pom.xml





[jira] [Commented] (ANY23-278) Upgrade all Maven plugin versions in parent pom.xml

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155387#comment-15155387
 ] 

ASF GitHub Bot commented on ANY23-278:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/22




[jira] [Commented] (ANY23-277) Any23 master branch will not build to to build due to lacking maven-assembly-plugin

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155389#comment-15155389
 ] 

ASF GitHub Bot commented on ANY23-277:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/20




[jira] [Commented] (ANY23-274) Change any23.microdata.ns.default configuration value to http://schema.org

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155393#comment-15155393
 ] 

ASF GitHub Bot commented on ANY23-274:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/21




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-02-19 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15155451#comment-15155451
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-186536685
  
@ansell the line I am getting the error on is way down in semargl, here:

https://github.com/levkhomich/semargl/blob/ee8b35fc330deae6cb623fa3c57f583f3684bb76/rdfa/src/main/java/org/semarglproject/rdf/rdfa/RdfaParser.java#L1130
I am going to investigate this issue again this weekend as it is high time 
we got Any23 master back to successful healthy builds.




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212476#comment-15212476
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-201537723
  
hi @ansell OK I've added in the correct rule and fix as well as a test to 
verify that empty itemscope values are identified and fixed. 
Whilst debugging this, however, the core issue persists. The reason is that 
```RDFa11Extractor extends BaseRDFExtractor```, which inherits the 
[parser function inputstream parameter](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java#L105). 
This input stream is not the 'fixed' stream but the raw document. 
The only way I can think around this is for us to 
 * refactor the [RDFa1.1Extractor](https://github.com/apache/any23/blob/master/core/src/main/java/org/apache/any23/extractor/rdfa/RDFa11Extractor.java) such that it extends [TagSoupDomExtractor](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L60) as opposed to (eventually) the [ContentExtractor](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L44), or
 * undertake a mass refactoring which essentially removes the [ContentExtractor](https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L44) altogether... this would provide us with a much more flexible and adaptable extraction framework IMHO.

What do you think?


> FIX Attribute name "itemscope" associated with an element type "html" must be 
> followed by the ' = ' character.
> --
>
> Key: ANY23-247
> URL: https://issues.apache.org/jira/browse/ANY23-247
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> In the following markup
> {code}
>  "http://www.w3.org/TR/html4/loose.dtd";>
> http://www.w3.org/1999/xhtml"; 
> xmlns:og="http://opengraphprotocol.org/schema/"; 
> xmlns:fb="http://www.facebook.com/2008/fbml"; version="HTML+RDFa 1.0" 
> xml:lang="en" itemscope itemtype="http://schema.org/Product";>
> 
> 
> 
> 
> ...
> {code}
> Due to the absence of any subsequent value for *itemscope*, we get the 
> following error in our web server logs
> {code}
> [Fatal Error] :2:185: Attribute name "itemscope" associated with an element 
> type "html" must be followed by the ' = ' character.
> {code}
> Although the markup semantics are incorrect, Any23 should simply perform a 
> check for the itemscope value being null, if this is the case then add *=""*, 
> there is a precedent for us doing something like this before, I just cant 
> find the ticket right now!
> The code we need to add is present within either 
> core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java





[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212495#comment-15212495
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user ansell commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-201545776
  
The system does seem a little too complex for our purposes and isn't usable 
because of that.

Removing generics would be the first step IMO as there are too many 
rawtypes definitions which indicate generics are being used badly.

ContentExtractor may be able to be completely removed instead of being 
refitted into the process after that and the parser should always be set to 
parse as far as practical for our purposes.

It is a little strange that there isn't a buffered, markable, InputStream 
provided for all of the steps to reuse as necessary rather than pushing a raw 
InputStream or other source into different extractors.
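
The buffered, markable stream idea mentioned above can be sketched with the JDK's own `BufferedInputStream`. This is a minimal illustration, not Any23 code: the class and method names are hypothetical, and it assumes the document is no larger than the read limit passed to `mark`, since a larger document would invalidate the mark.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

class MarkableStreamSketch {

    /** Reads the stream to its end and decodes the bytes as UTF-8. */
    static String drain(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }

    /**
     * Lets two consumers (think: two extractors) read the same content by
     * marking the buffered stream before the first read and resetting it
     * before the second.
     */
    static String[] readTwice(InputStream raw, int readLimit) {
        try {
            BufferedInputStream in = new BufferedInputStream(raw);
            in.mark(readLimit);          // remember the start of the document
            String first = drain(in);    // first consumer reads everything
            in.reset();                  // rewind to the mark
            String second = drain(in);   // second consumer sees the same bytes
            return new String[] {first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] doc = "<div itemscope=\"\"></div>".getBytes(StandardCharsets.UTF_8);
        String[] reads = readTwice(new ByteArrayInputStream(doc), 1 << 16);
        System.out.println("both reads identical: " + reads[0].equals(reads[1]));
    }
}
```

The trade-off is memory: everything read after the mark is kept in the buffer up to the read limit, so the limit has to be sized for the largest document that should be re-readable.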


> FIX Attribute name "itemscope" associated with an element type "html" must be 
> followed by the ' = ' character.
> --
>
> Key: ANY23-247
> URL: https://issues.apache.org/jira/browse/ANY23-247
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> In the following markup
> {code}
>  "http://www.w3.org/TR/html4/loose.dtd";>
> http://www.w3.org/1999/xhtml"; 
> xmlns:og="http://opengraphprotocol.org/schema/"; 
> xmlns:fb="http://www.facebook.com/2008/fbml"; version="HTML+RDFa 1.0" 
> xml:lang="en" itemscope itemtype="http://schema.org/Product";>
> 
> 
> 
> 
> ...
> {code}
> Due to the absence of any subsequent value for *itemscope*, we get the 
> following error in our web server logs
> {code}
> [Fatal Error] :2:185: Attribute name "itemscope" associated with an element 
> type "html" must be followed by the ' = ' character.
> {code}
> Although the markup semantics are incorrect, Any23 should simply check 
> whether the itemscope value is null and, if so, add *=""*. There is a 
> precedent for us doing something like this before; I just can't find the 
> ticket right now!
> The code we need to add belongs in either 
> core/src/main/java/org/apache/any23/extractor/microdata/ItemScope.java or
> core/src/main/java/org/apache/any23/extractor/microdata/MicrodataParser.java
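
The workaround described in the issue can be sketched as follows. This is only an illustration under stated assumptions, not the change that was merged: the class `BooleanAttributeSketch` and both helpers are hypothetical, and a plain regular expression stands in for whatever check Any23 actually performs. It shows a strict XML parser rejecting a bare `itemscope` and accepting the markup once `=""` has been added.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;

class BooleanAttributeSketch {

    // A bare "itemscope" that is not already followed by "=".
    private static final Pattern BARE_ITEMSCOPE =
            Pattern.compile("\\bitemscope\\b(?!\\s*=)");

    /** Hypothetical helper: gives a valueless itemscope an empty value. */
    static String addEmptyValue(String markup) {
        return BARE_ITEMSCOPE.matcher(markup).replaceAll("itemscope=\"\"");
    }

    /** Returns true if the JDK's strict XML parser accepts the markup. */
    static boolean parsesAsXml(String markup) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (Exception e) {
            return false;   // e.g. a SAXParseException for the bare attribute
        }
    }

    public static void main(String[] args) {
        String raw = "<html itemscope itemtype=\"http://schema.org/Product\"></html>";
        String fixed = addEmptyValue(raw);
        System.out.println(parsesAsXml(raw));    // false
        System.out.println(fixed);
        System.out.println(parsesAsXml(fixed));  // true
    }
}
```

A regular expression over raw markup would also rewrite the word "itemscope" if it appeared in text content, so a real fix would more likely act on parsed attributes.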



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212500#comment-15212500
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-201550510
  
I agree. Stepping through this in the debugger made me think the same.
I think it would be different if Any23 were meant to be a PURE
implementation... but that is clearly not the case. Any23 fits in best when
it can be used to extract semantics from any old crap input that it is fed.
Parsers and extractors *should not* fail when there is a piece of crap input
HTML. Currently, that's exactly what happens and it is extremely limiting.

I would like to propose that this PR is committed to master as is. We then
open a brand new issue which acts on exactly your comments: refactoring out
ContentExtractor, reusing the input stream which has been fixed, etc.

Any thoughts, Peter? Thanks for the quick response.

On Friday, March 25, 2016, Peter Ansell  wrote:

> The system does seem a little too complex for our purposes and isn't
> usable because of that.
>
> Removing generics would be the first step IMO as there are too many
> rawtypes definitions which indicate generics are being used badly.
>
> ContentExtractor may be able to be completely removed instead of being
> refitted into the process after that and the parser should always be set to
> parse as far as practical for our purposes.
>
> It is a little strange that there isn't a buffered, markable, InputStream
> provided for all of the steps to reuse as necessary rather than pushing a
> raw InputStream or other source into different extractors.
>


-- 
*Lewis*





[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212539#comment-15212539
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user ansell commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-201564662
  
I tested this pull request and it has a few failing tests for me. I know 
that the Any23 master hasn't been perfect for its test record (mostly due to 
unreliable remote queries), but I haven't been watching recently to know which 
tests are expected to fail.




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-03-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212568#comment-15212568
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-201573785
  
ACK @ansell , master branch is unstable with the following test failures


https://builds.apache.org/view/A-D/view/Any23/job/Any23-trunk/1466/#showFailuresLink

If you can reproduce this locally (or up until your test build fails within 
core with 3 failing tests) then that is the 'expected' behaviour right now. The 
Microdata test is directly related to the issue we are now discussing here. 

This issue is the most pressing for Any23 right now, IMHO it is a complete 
blocker to us releasing Any23 1.2




[jira] [Commented] (ANY23-279) Fix EmbeddedJSONLDExtractor ExtractorDescription getDescription() implementation

2016-03-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15212909#comment-15212909
 ] 

ASF GitHub Bot commented on ANY23-279:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/23#issuecomment-201743212
  
https://issues.apache.org/jira/browse/ANY23-279


> Fix EmbeddedJSONLDExtractor ExtractorDescription getDescription() 
> implementation
> 
>
> Key: ANY23-279
> URL: https://issues.apache.org/jira/browse/ANY23-279
> Project: Apache Any23
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> Based on https://github.com/apache/any23/pull/23 a bug has been identified 
> which shows the following
> https://github.com/apache/any23/pull/23/files
> This issue will address that and merge the pull request





[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-03-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15215394#comment-15215394
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/17#issuecomment-202702530
  
@ansell any further comments here? I will try to get to work on the larger 
issue this week. 




[jira] [Commented] (ANY23-247) FIX Attribute name "itemscope" associated with an element type "html" must be followed by the ' = ' character.

2016-04-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-247?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223021#comment-15223021
 ] 

ASF GitHub Bot commented on ANY23-247:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/17




[jira] [Commented] (ANY23-280) Refactor ContentExtractor to improve extraction flexibility

2016-04-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15228968#comment-15228968
 ] 

ASF GitHub Bot commented on ANY23-280:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/24

Initial move towards addressing ANY23-280 Refactor ContentExtractor to 
improve extraction flexibility

Hi Folks,
This is an initial crack at addressing 
https://issues.apache.org/jira/browse/ANY23-280
Essentially, the main API difference is the complete removal of ```public 
interface ContentExtractor extends Extractor``` from the Extractor 
interface in the api module.
This patch has a long way to go, with numerous failing tests; however, I 
wanted to post it for feedback.
Although Any23 still builds with -DskipTests, without that flag the failing 
tests are as follows:
```
Results :

Failed tests:
  Any23Test.testDemoCodeSnippet1:201
  Any23Test.testN3Detection1:92->assertDetection:661
  Any23Test.testN3Detection2:97->assertDetection:661
  Any23Test.testTTLDetection:87->assertDetection:661
  RoverTest.testRunMultiURLs:104->runWithMultiSourcesAndVerify:134 Unexpected number of statements.

Tests in error:
  Any23Test.testProgrammaticExtraction:279 » NullPointer
  CSVExtractorTest.testExtractionCommaSeparated:49->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionEmptyValue:112->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionSemicolonSeparated:64->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testExtractionTabSeparated:79->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  CSVExtractorTest.testTypeManagement:94->AbstractExtractorTestCase.dumpModelToRDFXML:714 » Runtime
  RDFa11ExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer
  RDFaExtractorTest>AbstractRDFaExtractorTestCase.testDrupalTestPage:124->AbstractExtractorTestCase.assertExtract:217->AbstractExtractorTestCase.assertExtract:200->AbstractExtractorTestCase.extract:185 » NullPointer

Tests run: 403, Failures: 5, Errors: 8, Skipped: 11
```
You will see that some of the tests concern 
https://issues.apache.org/jira/browse/ANY23-267 as well.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-280

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/24.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #24


commit 801f2f93967bfd1295700223085eef3f54181517
Author: Lewis John McGibbney 
Date:   2016-04-06T19:44:35Z

Initial move towards addressing ANY23-280 Refactor ContentExtractor to 
improve extraction flexibility




> Refactor ContentExtractor to improve extraction flexibility
> ---
>
> Key: ANY23-280
> URL: https://issues.apache.org/jira/browse/ANY23-280
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core, extractors
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 1.2
>
>
> As discussed on ANY23-247, the 
> [ContentExtractor|https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L44]
>  is simply not fit for purpose. This issue was discovered and the cause has 
> plagued our builds ever since. Any extractors which implement 
> [BaseRDFExtractor|https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java]
>  are based on the Extractor.ContentExtractor and hence work off an 
> 'unfixed' raw data stream as opposed to a more flexible model such as the 
> [TagSoupDOMExtractor|https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L60].
> This issue should refactor RDF extractors to enable more flexibility and to 
> avoid issues we encounter with the strict SAX parsing logic.
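
The difference between the two extractor shapes described in the issue can be sketched as below. All names here (`SketchExtractor`, `RawStreamExtractor`, `DomExtractor`, `repairedDom`) are hypothetical simplifications and do not reproduce the Any23 interfaces; the hand-built DOM merely stands in for the output of a lenient parser such as TagSoup.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class ExtractorShapesSketch {

    /** Hypothetical stand-in for an extractor parameterised by its input type. */
    interface SketchExtractor<I> {
        String run(I input) throws Exception;
    }

    /** Sees the raw bytes exactly as served, markup errors included. */
    interface RawStreamExtractor extends SketchExtractor<InputStream> { }

    /** Sees a DOM that was built (and possibly repaired) before it was called. */
    interface DomExtractor extends SketchExtractor<Document> { }

    // Strict parsing of the raw stream: fails on a bare boolean attribute.
    static final RawStreamExtractor STRICT = in ->
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(in).getDocumentElement().getTagName();

    // Works from the DOM, so it never meets the original syntax error.
    static final DomExtractor FROM_DOM = doc -> doc.getDocumentElement().getTagName();

    /** Stand-in for what a lenient parser would hand to a DOM-based extractor. */
    static Document repairedDom() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element html = doc.createElement("html");
            html.setAttribute("itemscope", "");
            doc.appendChild(html);
            return doc;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    static <I> String tryRun(SketchExtractor<I> extractor, I input) {
        try {
            return extractor.run(input);
        } catch (Exception e) {
            return "failed: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        byte[] raw = "<html itemscope></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(tryRun(STRICT, new ByteArrayInputStream(raw)));
        System.out.println(tryRun(FROM_DOM, repairedDom()));
    }
}
```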





[jira] [Commented] (ANY23-280) Refactor ContentExtractor to improve extraction flexibility

2016-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15251342#comment-15251342
 ] 

ASF GitHub Bot commented on ANY23-280:
--

Github user lewismc commented on the pull request:

https://github.com/apache/any23/pull/24#issuecomment-212761670
  
Anyone get a chance to take a look at this? This is THE critical issue for 
Any23 to address right now IMHO.




[jira] [Commented] (ANY23-293) Package log4j configuration with core appassembler

2016-06-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333167#comment-15333167
 ] 

ASF GitHub Bot commented on ANY23-293:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/25

ANY23-293 Package log4j configuration with core appassembler

This PR addresses a few things
 * Packages log4j configuration with core appassembler
 * adds 4 missing license headers
 * changes core log4j verbosity from DEBUG to INFO
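
For readers unfamiliar with the setting being changed, a root logger at INFO might look like the fragment below. This is a sketch assuming a log4j 1.x properties file; the appender name `stdout` and the pattern are illustrative and are not taken from the pull request.

```
# Root logger at INFO instead of DEBUG, writing to the console
log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d{ISO8601} %-5p [%c{1}] %m%n
```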

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-293

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/25.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #25


commit 2df4845e8694e6c8206a25e0585e16bcfe293a07
Author: Lewis John McGibbney 
Date:   2016-06-16T06:00:05Z

ANY23-293 Package log4j configuration with core appassembler




> Package log4j configuration with core appassembler
> --
>
> Key: ANY23-293
> URL: https://issues.apache.org/jira/browse/ANY23-293
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> This issue relates to a log4j warning which notifies us that no config has 
> been provided for the logger implementation. The message is seen when one 
> invokes the any23 core app.





[jira] [Commented] (ANY23-293) Package log4j configuration with core appassembler

2016-06-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-293?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15333172#comment-15333172
 ] 

ASF GitHub Bot commented on ANY23-293:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/25




[jira] [Commented] (ANY23-280) Refactor ContentExtractor to improve extraction flexibility

2016-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340798#comment-15340798
 ] 

ASF GitHub Bot commented on ANY23-280:
--

Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/24#discussion_r67792344
  
--- Diff: 
plugins/office-scraper/src/test/java/org/apache/any23/plugin/officescraper/ExcelExtractorTest.java
 ---
@@ -40,6 +42,8 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import com.fasterxml.jackson.databind.introspect.WithMember;
--- End diff --

This import seems to be unused




[jira] [Commented] (ANY23-280) Refactor ContentExtractor to improve extraction flexibility

2016-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340803#comment-15340803
 ] 

ASF GitHub Bot commented on ANY23-280:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/24
  
There are a large number of whitespace modifications to change from tabs to 
2-space indentation. Is 2-space indentation what Any23 is aiming for, given 
that most Java code uses either tab or 4-space indentation?




[jira] [Commented] (ANY23-280) Refactor ContentExtractor to improve extraction flexibility

2016-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340807#comment-15340807
 ] 

ASF GitHub Bot commented on ANY23-280:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/24
  
If we are going to be modifying the public API, we probably should be aiming 
for a 2.0 release; otherwise the version numbers are arbitrary.




[jira] [Commented] (ANY23-280) Refactor ContentExtractor to improve extraction flexibility

2016-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340809#comment-15340809
 ] 

ASF GitHub Bot commented on ANY23-280:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/24
  
Given how broad this pull request is, it needs to be completed before I can 
work on some of the issues I have assigned to me.




[jira] [Commented] (ANY23-280) Refactor ContentExtractor to improve extraction flexibility

2016-06-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15340890#comment-15340890
 ] 

ASF GitHub Bot commented on ANY23-280:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/24
  
> There are a large number of whitespace modifications to change from tabs 
to 2-space indentation. Is 2-space indentation what Any23 is aiming for, given 
that most java code is either tab or 4-space indentation.

I'll revert these changes to 4 spaces as per remainder of codebase and 
force an update to this PR.

> If we are going to be modifying the public API we probably should be 
aiming for a 2.0 release, otherwise the version numbers are arbitrary

I would have no issues with this as all... it is a v good suggestion.

> Given how broad this pull request is, it needs to be completed before I 
can work on some of the issues I have assigned to me.

Agreed. I'll put some time into it this week and see if I can complete it, 
stabilize the tests and update the PR for review.


> Refactor ContentExtractor to improve extraction flexibility
> ---
>
> Key: ANY23-280
> URL: https://issues.apache.org/jira/browse/ANY23-280
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core, extractors
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 1.2
>
>
> As discussed on ANY23-247, the 
> [ContentExtractor|https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L44]
>  is simply not fit for purpose. This issue was discovered and the cause has 
> plagued our builds ever since. Any extractors which implement 
> [BaseRDFExtractor|https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/core/src/main/java/org/apache/any23/extractor/rdf/BaseRDFExtractor.java]
>  are based on the Extractor.ContentExtractor and hence work off of an 
> 'unfixed' raw data stream as opposed to a more flexible model such as the 
> [TagSoupDOMExtractor|https://github.com/apache/any23/blob/63ba2fc82966cc056a2e475af849154d0dfdcf93/api/src/main/java/org/apache/any23/extractor/Extractor.java#L60].
> This issue should refactor RDF extractors to enable more flexibility and to 
> avoid issues we encounter with the strict SAX parsing logic.
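
As a rough illustration of the "strict SAX parsing" problem described above, the sketch below feeds well-formed and malformed markup to the JDK's own SAX parser. It is not Any23 code: the class name StrictSaxSketch and the method parsesStrictly are hypothetical, and the JDK parser merely stands in for whatever strict parser sits behind a raw-stream extractor. A DOM-repairing front end (the TagSoup-style model mentioned in the issue) would be expected to tolerate the second input, which a strict parser rejects.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Any23.
class StrictSaxSketch {

    /** Returns true if the strict JDK SAX parser accepts the markup as-is. */
    static boolean parsesStrictly(String markup) {
        try {
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new ByteArrayInputStream(markup.getBytes(StandardCharsets.UTF_8)),
                    new DefaultHandler()); // default handler rethrows fatal errors
            return true;
        } catch (SAXException e) {
            // Well-formedness violations surface here (SAXParseException).
            return false;
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsesStrictly("<p>ok</p>"));    // well-formed input
        System.out.println(parsesStrictly("<p>unclosed"));  // missing end tag
    }
}
```

A single unclosed tag is enough to abort the strict parse, which is the kind of inflexibility the refactoring is meant to avoid.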





[jira] [Commented] (ANY23-235) NQuads links broken on Supported Formats Page

2016-06-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15341926#comment-15341926
 ] 

ASF GitHub Bot commented on ANY23-235:
--

GitHub user band opened a pull request:

https://github.com/apache/any23/pull/26

ANY23-235 NQuads links broken on Supported Formats Page

Documentation updates:
1. supported-formats.apt page: now has working links to N-QUADS 
documentation; and better information about when current N-QUADS spec will be 
supported.
2. site/index.apt file was missing a set of matching "{","}" in two places.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/band/any23 ANY23-235

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/26.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #26


commit 14bf373467549751c71a4e550b4cf02ecabb5453
Author: William L. Anderson 
Date:   2016-06-21T14:49:47Z

ANY23-235 NQuads links broken on Supported Formats Page




> NQuads links broken on Supported Formats Page
> -
>
> Key: ANY23-235
> URL: https://issues.apache.org/jira/browse/ANY23-235
> Project: Apache Any23
>  Issue Type: Bug
>  Components: site
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
>Priority: Trivial
> Fix For: 1.2
>
>
> There are broken links here which we should fix.
> http://any23.apache.org/supported-formats.html





[jira] [Commented] (ANY23-235) NQuads links broken on Supported Formats Page

2016-06-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15344882#comment-15344882
 ] 

ASF GitHub Bot commented on ANY23-235:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/26
  
This looks brilliant. I will merge ASAP. Thanks @band 




[jira] [Commented] (ANY23-235) NQuads links broken on Supported Formats Page

2016-06-23 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-235?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15346850#comment-15346850
 ] 

ASF GitHub Bot commented on ANY23-235:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/26




[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403177#comment-15403177
 ] 

ASF GitHub Bot commented on ANY23-276:
--

GitHub user ansell opened a pull request:

https://github.com/apache/any23/pull/27

ANY23-276 : Work on conversion to RDF4J, including bump to version 2.0

Conversion to RDF4J isn't backwards compatible, so bumping to version 2.0 
to reflect that.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ansell/any23 version2

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/27.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #27


commit a1f5e5ea07d48bffbf69f5cf2fa132b16a32c2f4
Author: Peter Ansell 
Date:   2016-08-02T00:50:17Z

ANY23-276 : Start work on converting from Sesame to RDF4J

commit d866c71d8ffee83fe962ecbc96509119af8338dc
Author: Peter Ansell 
Date:   2016-08-02T01:09:26Z

Fix remaining compile issues

commit 47ac64ebf463dd2fb15731aaee7fd210051a
Author: Peter Ansell 
Date:   2016-08-02T01:14:14Z

Bump to version 2.0-SNAPSHOT to reflect the large changes

commit 1a084c16644a5fa444ec689d0d1cfb915b7c8d45
Author: Peter Ansell 
Date:   2016-08-02T01:16:53Z

Fix the changes to RELEASE-NOTES by grep/sed

commit bd1f787d3cc20e25fd354d249282ddbad3a72096
Author: Peter Ansell 
Date:   2016-08-02T01:18:08Z

Change property name to IRI

commit 4aacac8a02112cb819051931f3f3e87ef09ad0f4
Author: Peter Ansell 
Date:   2016-08-02T01:20:56Z

Fix encodeIRI to encodeURI

commit ce7bdbb743750f6442ae1254dd3cd48d7260beb4
Author: Peter Ansell 
Date:   2016-08-02T01:22:28Z

Fix tika configuration files




> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 1.3
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.





[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-08-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15403299#comment-15403299
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/27
  
Pull request to Semargl to support RDF4J has been sent:

https://github.com/levkhomich/semargl/pull/45




[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-09-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491873#comment-15491873
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/27
  
The errors are likely to be in relation to me temporarily taking out 
Semargl. The pull request for Semargl linked above has been accepted, but the 
next release has not been done for it yet.


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.





[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-09-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15491902#comment-15491902
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
ACK @ansell. I've made an additional trivial correction within the default 
configuration file. If you merge, and we can then get Semargl released and 
upgrade to it, that would be great and we can put this one to bed.



[jira] [Commented] (ANY23-296) Tar complains about groupid value being too big

2016-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688657#comment-15688657
 ] 

ASF GitHub Bot commented on ANY23-296:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/29

ANY23-296 Tar complains about groupid value being too big

This issue addresses https://issues.apache.org/jira/browse/ANY23-296

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-296

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/29.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #29


commit 120b5a4fd7a792912f5fd7019ac77e736405dafc
Author: Lewis John McGibbney 
Date:   2016-11-23T02:31:27Z

ANY23-296 Tar complains about groupid value being too big




> Tar complains about groupid value being too big 
> 
>
> Key: ANY23-296
> URL: https://issues.apache.org/jira/browse/ANY23-296
> Project: Apache Any23
>  Issue Type: Bug
>  Components: build
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 1.2
>
>
> I am attempting to build Any23 on Mac OS X with the following setup:
> {code}
> Apache Maven 3.3.9 (bb52d8502b132ec0a5a3f4c09453c07478323dc5; 
> 2015-11-10T08:41:47-08:00)
> Maven home: /usr/local/Cellar/maven/3.3.9/libexec
> Java version: 1.8.0_91, vendor: Oracle Corporation
> Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_91.jdk/Contents/Home/jre
> Default locale: en_US, platform encoding: UTF-8
> OS name: "mac os x", version: "10.11.6", arch: "x86_64", family: "mac"
> {code}
> I get the following error.
> {code}
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 1.498 s
> [INFO] Finished at: 2016-11-22T18:29:44-08:00
> [INFO] Final Memory: 23M/379M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single (assembly) on 
> project apache-any23: Execution assembly of goal 
> org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single failed: user id 
> '498339010' is too big ( > 2097151 ). -> [Help 1]
> [ERROR]
> [ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> [ERROR] Re-run Maven using the -X switch to enable full debug logging.
> [ERROR]
> [ERROR] For more information about the errors and possible solutions, please 
> read the following articles:
> [ERROR] [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/PluginExecutionException
> {code}
> This is explained at the following FAQ
> https://maven.apache.org/plugins/maven-assembly-plugin/faq.html#tarFileModes
> PR coming up.
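
The limit of 2097151 in the error above is consistent with an id field that holds seven octal digits, since 8^7 - 1 = 2097151. The sketch below works through that arithmetic. The seven-digit field width is an assumption about the classic tar header layout rather than something stated in the issue, and the class name TarIdLimit is hypothetical.

```java
// Hypothetical helper, not part of Any23 or the Maven assembly plugin.
class TarIdLimit {

    // Assumption: the header's id field stores 7 octal digits plus a terminator.
    static final int OCTAL_DIGITS = 7;

    /** Largest id that fits in the field: 8^7 - 1. */
    static long maxId() {
        return (1L << (3 * OCTAL_DIGITS)) - 1; // each octal digit carries 3 bits
    }

    /** True if the id can be written into the fixed-width field. */
    static boolean fits(long id) {
        return id >= 0 && id <= maxId();
    }

    public static void main(String[] args) {
        long reported = 498339010L; // user id quoted in the build error
        System.out.println("max id = " + maxId());                 // max id = 2097151
        System.out.println(reported + " fits: " + fits(reported)); // 498339010 fits: false
    }
}
```

Ids assigned by a directory service are often far above this bound, which would explain why the build fails on some machines and not others.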





[jira] [Commented] (ANY23-296) Tar complains about groupid value being too big

2016-11-22 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15688659#comment-15688659
 ] 

ASF GitHub Bot commented on ANY23-296:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/29




[jira] [Commented] (ANY23-297) Any23 doesn't build under JDK1.8

2016-11-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15696726#comment-15696726
 ] 

ASF GitHub Bot commented on ANY23-297:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/30

ANY23-297 Any23 doesn't build under JDK1.8

This issue addresses https://issues.apache.org/jira/browse/ANY23-297
@ansell can you please check how this affects your open pull request? Do 
you have any objections to merging this into master branch? Thanks

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-297

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/30.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #30


commit bb55685859a26b36ca4a9893bb93aa9eb7687b8c
Author: Lewis John McGibbney 
Date:   2016-11-25T21:33:57Z

ANY23-297 Any23 doesn't build under JDK1.8
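
For orientation, the Javadoc failures quoted in the issue below fall into a few classes: an @return tag on a void method, missing @throws or @param descriptions, and unbalanced inline HTML. The sketch below shows comments written to avoid those classes of complaint. It is only an illustration, not the content of this pull request: DoclintSketch, Tool and isDefined are hypothetical names, and the use of <code> tags is an assumption, since the log does not show which tags were unbalanced.

```java
import java.util.Collections;
import java.util.Map;

/** Hypothetical example class; none of these names come from Any23. */
class DoclintSketch {

    /** A minimal command abstraction. */
    interface Tool {
        /**
         * Runs the tool. A void method carries no return tag.
         *
         * @throws Exception if the tool cannot complete.
         */
        void run() throws Exception;
    }

    /**
     * Checks whether a property is defined.
     *
     * @param props the properties to inspect.
     * @param key the property name to look up.
     * @return <code>true</code> if defined, <code>false</code> otherwise.
     */
    static boolean isDefined(Map<String, String> props, String key) {
        return props.containsKey(key);
    }

    public static void main(String[] args) throws Exception {
        Tool tool = () -> System.out.println("tool ran");
        tool.run();
        System.out.println(isDefined(Collections.singletonMap("k", "v"), "k"));
    }
}
```

Every tag has a description, every opened HTML element is closed, and no tag contradicts the method signature.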




> Any23 doesn't build under JDK1.8
> 
>
> Key: ANY23-297
> URL: https://issues.apache.org/jira/browse/ANY23-297
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build, documentation
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 1.2
>
>
> When I attempt to build Any23 master branch using JDK1.8 I get the following 
> issue regarding Javadoc
> {code}
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 6.975 s
> [INFO] Finished at: 2016-11-22T18:36:44-08:00
> [INFO] Final Memory: 40M/768M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on 
> project apache-any23-api: MavenReportException: Error while creating archive:
> [ERROR] Exit code: 1 - 
> /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:30: error: 
> invalid use of @return
> [ERROR] * @return exit code.
> [ERROR] ^
> [ERROR] /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:32: 
> warning: no @throws for java.lang.Exception
> [ERROR] void run() throws Exception;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: unexpected end tag: 
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: element not closed: i
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/encoding/EncodingDetector.java:37:
>  warning: no @throws for java.io.IOException
> [ERROR] String guessEncoding(InputStream input) throws IOException;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:29:
>  error: reference not found
> [ERROR] * @see org.apache.any23.Any23
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:126:
>  error: reference not found
> [ERROR] * {@link SingleDocumentExtraction#METADATA_NESTING_FLAG}.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionResult.java:60:
>  error: self-closing element not allowed
> [ERROR] * 
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/IssueReport.java:43:
>  warning: no description for @param
> [ERROR] * @param ps
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:41:
>  warning: no @return
> [ERROR] Collection getSupportedMIMETypes();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:54:
>  warning: no @return
> [ERROR] String getExampleInput();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorRegistry.java:33:
>  warning: no description for @param
> [ERROR] * @param factory
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/jav

[jira] [Commented] (ANY23-297) Any23 doesn't build under JDK1.8

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706550#comment-15706550
 ] 

ASF GitHub Bot commented on ANY23-297:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/30
  
Hi @ansell, any objections to merging into the master branch? It seems like 
the Semargl issues are progressing, which is great; however, I don't think 
that should block us here in Any23. I'll commit by CoB today unless there are 
objections. Thanks



[jira] [Commented] (ANY23-297) Any23 doesn't build under JDK1.8

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706681#comment-15706681
 ] 

ASF GitHub Bot commented on ANY23-297:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/30
  
I will run a build to verify that it works and get back to you.


> Any23 doesn't build under JDK1.8
> 
>
> Key: ANY23-297
> URL: https://issues.apache.org/jira/browse/ANY23-297
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build, documentation
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 1.2
>
>
> When I attempt to build Any23 master branch using JDK1.8 I get the following 
> issue regarding Javadoc
> {code}
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 6.975 s
> [INFO] Finished at: 2016-11-22T18:36:44-08:00
> [INFO] Final Memory: 40M/768M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on 
> project apache-any23-api: MavenReportException: Error while creating archive:
> [ERROR] Exit code: 1 - 
> /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:30: error: 
> invalid use of @return
> [ERROR] * @return exit code.
> [ERROR] ^
> [ERROR] /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:32: 
> warning: no @throws for java.lang.Exception
> [ERROR] void run() throws Exception;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: unexpected end tag: 
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: element not closed: i
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/encoding/EncodingDetector.java:37:
>  warning: no @throws for java.io.IOException
> [ERROR] String guessEncoding(InputStream input) throws IOException;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:29:
>  error: reference not found
> [ERROR] * @see org.apache.any23.Any23
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:126:
>  error: reference not found
> [ERROR] * {@link SingleDocumentExtraction#METADATA_NESTING_FLAG}.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionResult.java:60:
>  error: self-closing element not allowed
> [ERROR] * 
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/IssueReport.java:43:
>  warning: no description for @param
> [ERROR] * @param ps
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:41:
>  warning: no @return
> [ERROR] Collection getSupportedMIMETypes();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:54:
>  warning: no @return
> [ERROR] String getExampleInput();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorRegistry.java:33:
>  warning: no description for @param
> [ERROR] * @param factory
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorRegistry.java:82:
>  warning: no @return
> [ERROR] List getAllNames();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/MIMEType.java:42: 
> warning: no description for @param
> [ERROR] * @param mimeType
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/purifier/Purifier.java:38:
>  warning: no @throws for java.io.IOException
> [ERROR] void purify(InputStream inputStream) throws IOException;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/purifier/Purifier.java:25:
>  error: reference not found
> [ERROR] * a {@link org.apache.any23.mime.TikaMIMETypeDetector} could
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/ma

[jira] [Commented] (ANY23-297) Any23 doesn't build under JDK1.8

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706699#comment-15706699
 ] 

ASF GitHub Bot commented on ANY23-297:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/30
  
Excellent Sir thank you :)


> Any23 doesn't build under JDK1.8
> 
>
> Key: ANY23-297
> URL: https://issues.apache.org/jira/browse/ANY23-297
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build, documentation
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 1.2
>
>
> When I attempt to build Any23 master branch using JDK1.8 I get the following 
> issue regarding Javadoc
> {code}
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 6.975 s
> [INFO] Finished at: 2016-11-22T18:36:44-08:00
> [INFO] Final Memory: 40M/768M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on 
> project apache-any23-api: MavenReportException: Error while creating archive:
> [ERROR] Exit code: 1 - 
> /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:30: error: 
> invalid use of @return
> [ERROR] * @return exit code.
> [ERROR] ^
> [ERROR] /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:32: 
> warning: no @throws for java.lang.Exception
> [ERROR] void run() throws Exception;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: unexpected end tag: 
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: element not closed: i
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/encoding/EncodingDetector.java:37:
>  warning: no @throws for java.io.IOException
> [ERROR] String guessEncoding(InputStream input) throws IOException;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:29:
>  error: reference not found
> [ERROR] * @see org.apache.any23.Any23
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:126:
>  error: reference not found
> [ERROR] * {@link SingleDocumentExtraction#METADATA_NESTING_FLAG}.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionResult.java:60:
>  error: self-closing element not allowed
> [ERROR] * 
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/IssueReport.java:43:
>  warning: no description for @param
> [ERROR] * @param ps
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:41:
>  warning: no @return
> [ERROR] Collection getSupportedMIMETypes();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:54:
>  warning: no @return
> [ERROR] String getExampleInput();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorRegistry.java:33:
>  warning: no description for @param
> [ERROR] * @param factory
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorRegistry.java:82:
>  warning: no @return
> [ERROR] List getAllNames();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/MIMEType.java:42: 
> warning: no description for @param
> [ERROR] * @param mimeType
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/purifier/Purifier.java:38:
>  warning: no @throws for java.io.IOException
> [ERROR] void purify(InputStream inputStream) throws IOException;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/purifier/Purifier.java:25:
>  error: reference not found
> [ERROR] * a {@link org.apache.any23.mime.TikaMIMETypeDetector} could
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/plugin/Any2

[jira] [Commented] (ANY23-297) Any23 doesn't build under JDK1.8

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706790#comment-15706790
 ] 

ASF GitHub Bot commented on ANY23-297:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/30
  
Thank you @ansell 



[jira] [Commented] (ANY23-297) Any23 doesn't build under JDK1.8

2016-11-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15706789#comment-15706789
 ] 

ASF GitHub Bot commented on ANY23-297:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/30


> Any23 doesn't build under JDK1.8
> 
>
> Key: ANY23-297
> URL: https://issues.apache.org/jira/browse/ANY23-297
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: build, documentation
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
>Priority: Critical
> Fix For: 1.2
>
>
> When I attempt to build Any23 master branch using JDK1.8 I get the following 
> issue regarding Javadoc
> {code}
> [INFO] 
> 
> [INFO] BUILD FAILURE
> [INFO] 
> 
> [INFO] Total time: 6.975 s
> [INFO] Finished at: 2016-11-22T18:36:44-08:00
> [INFO] Final Memory: 40M/768M
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on 
> project apache-any23-api: MavenReportException: Error while creating archive:
> [ERROR] Exit code: 1 - 
> /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:30: error: 
> invalid use of @return
> [ERROR] * @return exit code.
> [ERROR] ^
> [ERROR] /usr/local/any23/api/src/main/java/org/apache/any23/cli/Tool.java:32: 
> warning: no @throws for java.lang.Exception
> [ERROR] void run() throws Exception;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:36:
>  error: unexpected end tag: 
> [ERROR] * @return true if defined, false otherwise.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: unexpected end tag: 
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/configuration/Configuration.java:21:
>  error: element not closed: i
> [ERROR] * Defines the main Any23 configuration.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/encoding/EncodingDetector.java:37:
>  warning: no @throws for java.io.IOException
> [ERROR] String guessEncoding(InputStream input) throws IOException;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:29:
>  error: reference not found
> [ERROR] * @see org.apache.any23.Any23
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionParameters.java:126:
>  error: reference not found
> [ERROR] * {@link SingleDocumentExtraction#METADATA_NESTING_FLAG}.
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractionResult.java:60:
>  error: self-closing element not allowed
> [ERROR] * 
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/IssueReport.java:43:
>  warning: no description for @param
> [ERROR] * @param ps
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:41:
>  warning: no @return
> [ERROR] Collection getSupportedMIMETypes();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorFactory.java:54:
>  warning: no @return
> [ERROR] String getExampleInput();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorRegistry.java:33:
>  warning: no description for @param
> [ERROR] * @param factory
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/extractor/ExtractorRegistry.java:82:
>  warning: no @return
> [ERROR] List getAllNames();
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/MIMEType.java:42: 
> warning: no description for @param
> [ERROR] * @param mimeType
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/purifier/Purifier.java:38:
>  warning: no @throws for java.io.IOException
> [ERROR] void purify(InputStream inputStream) throws IOException;
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/mime/purifier/Purifier.java:25:
>  error: reference not found
> [ERROR] * a {@link org.apache.any23.mime.TikaMIMETypeDetector} could
> [ERROR] ^
> [ERROR] 
> /usr/local/any23/api/src/main/java/org/apache/any23/plugin/Any23PluginManager.java:99:
>  erro

[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-12-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15749918#comment-15749918
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/27
  
I rebased this to fix the conflicts. There are some test failures, some of 
which I @Ignore'd while others I have left failing for now. Could you have a 
look to see if you can figure out why the tests that have been ignored or are 
still failing are broken?


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755866#comment-15755866
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
ack will do @ansell thanks


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.





[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15755879#comment-15755879
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
Hi @ansell 
```
[INFO] 

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-javadoc-plugin:2.8:jar (attach-javadocs) on 
project apache-any23-core: MavenReportException: Error while creating archive:
[ERROR] Exit code: 1 - 
/usr/local/any23/core/src/main/java/org/apache/any23/extractor/rdfa/RDFa11Parser.java:324:
 error: @param name not found
[ERROR] * @param curieOrURIList list of CURIE/URI.
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/extractor/rdfa/RDFa11Parser.java:328:
 warning: no @param for curieOrIRIList
[ERROR] protected IRI[] resolveCIRIeOrIRIList(Node n, String 
curieOrIRIList, boolean termAllowed)
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/extractor/rdfa/RDFa11Parser.java:328:
 warning: no @param for termAllowed
[ERROR] protected IRI[] resolveCIRIeOrIRIList(Node n, String 
curieOrIRIList, boolean termAllowed)
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/extractor/rdfa/RDFa11Parser.java:364:
 warning: no description for @param
[ERROR] * @param curieOrIRI
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/extractor/rdfa/XSLTStylesheet.java:62:
 warning: no @throws for org.apache.any23.extractor.rdfa.XSLTStylesheetException
[ERROR] public synchronized void applyTo(Document document, Writer output)
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/extractor/rdfa/XSLTStylesheet.java:74:
 warning: no @throws for org.apache.any23.extractor.rdfa.XSLTStylesheetException
[ERROR] public synchronized void applyTo(Document document, Writer output,
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/Any23ValueFactoryWrapper.java:219:
 warning: no description for @param
[ERROR] * @param uri
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/Any23ValueFactoryWrapper.java:235:
 error: reference not found
[ERROR] * @return a valid {@link org.openrdf.model.URI}
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/Any23ValueFactoryWrapper.java:39:
 error: reference not found
[ERROR] * Any23 specialization of the {@link 
org.openrdf.model.ValueFactory}.
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:185: 
error: @param name not found
[ERROR] * @param namespace a base namespace for the {@link IRI}
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:186: 
error: @param name not found
[ERROR] * @param localName a local name to associate with the namespace
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:189: 
warning: no @param for uri
[ERROR] public static IRI iri(String uri) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:198: 
warning: no @param for namespace
[ERROR] public static IRI uri(String namespace, String localName) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:198: 
warning: no @param for localName
[ERROR] public static IRI uri(String namespace, String localName) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:198: 
warning: no @return
[ERROR] public static IRI uri(String namespace, String localName) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:205: 
warning: no @param for namespace
[ERROR] public static IRI iri(String namespace, String localName) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:205: 
warning: no @param for localName
[ERROR] public static IRI iri(String namespace, String localName) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:205: 
warning: no @return
[ERROR] public static IRI iri(String namespace, String localName) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:313: 
warning: no @param for s
[ERROR] public static Literal literal(String s, IRI datatype) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/main/java/org/apache/any23/rdf/RDFUtils.java:313: 
warning: no @param for datatype
[ERROR] public static Literal literal(String s, IRI datatype) {
[ERROR] ^
[ERROR] 
/usr/local/any23/core/src/ma

[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2016-12-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15756078#comment-15756078
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/27
  
Okay, will fix the javadoc
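
For reference, the doclint output quoted earlier in this thread falls into two groups: `@param name not found` errors, where a tag names a parameter that does not exist in the signature (for example `curieOrURIList` documented against a parameter called `curieOrIRIList`), and `no @param` / `no @return` warnings, where a tag is missing altogether. The following is only a minimal sketch of a comment that avoids both. `IriStrings` and its `iri` method are hypothetical names made up for illustration and are not part of Any23 or RDF4J.

```java
/**
 * Hypothetical helper used only to illustrate doclint-clean Javadoc.
 * It is not part of Any23.
 */
final class IriStrings {

    private IriStrings() {
        // static helper, no instances
    }

    /**
     * Joins a namespace and a local name into a single IRI string.
     *
     * @param namespace a base namespace, for example <code>http://example.org/</code>
     * @param localName a local name to append to the namespace
     * @return the concatenation of <code>namespace</code> and <code>localName</code>
     * @throws IllegalArgumentException if either argument is <code>null</code>
     */
    static String iri(String namespace, String localName) {
        // Every tag above names a parameter that exists in this signature,
        // and the return value and the thrown exception are both described.
        if (namespace == null || localName == null) {
            throw new IllegalArgumentException("namespace and localName must not be null");
        }
        return namespace + localName;
    }

    public static void main(String[] args) {
        System.out.println(iri("http://example.org/", "name"));
    }
}
```

The point of the sketch is that the `@param` names mirror the signature exactly, so renaming a parameter (as happened with URI to IRI) requires renaming the tag as well.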


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.





[jira] [Commented] (ANY23-299) Missing YAML to RDF parser

2017-01-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15801566#comment-15801566
 ] 

ASF GitHub Bot commented on ANY23-299:
--

GitHub user jgrzebyta opened a pull request:

https://github.com/apache/any23/pull/32

Any23-299

Issue described in task ANY23-299. 

This patch adds support for YAML->RDF.

- update tika version
- update mimetypes.xml file
- add extractor factory and extractor of YAML files

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jgrzebyta/any23 ANY23-299

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/32.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #32


commit df619d6c3382f043a7ff279b666107008009c132
Author: Jacek Grzebyta 
Date:   2017-01-03T23:18:37Z

Update Apache Tika version 1.7->1.14

Update apache-tika version: 1.7 -> 1.14
Update commons-compress version: 1.9 -> 1.10

Signed-off-by: Jacek Grzebyta 

commit 0f2bb81815fff353f86ec2c5bfa9a22d78428340
Author: Jacek Grzebyta 
Date:   2017-01-04T16:35:47Z

Adds YAML support

- Adds YAML vocabulary
- Adds YAML extractor and extractor factory
- Adds unit tests
Signed-off-by:Jacek Grzebyta 

commit 33ee8b180f39aaa30d382a3c594eb66fd1712580
Author: Jacek Grzebyta 
Date:   2017-01-04T19:55:13Z

Fix problem with multiple-document files 

Add layer of document for all files.


Signed-off-by:Jacek Grzebyta 

commit f751ff888333e58da8ca937e563d2600958e95a3
Author: Jacek Grzebyta 
Date:   2017-01-04T23:41:44Z

Fix a few bugs

Signed-off-by:Jacek Grzebyta 

commit 95742dacb1548f8b5a7c73034d480c79701a5149
Author: Jacek Grzebyta 
Date:   2017-01-05T14:45:50Z

Update mimetypes.xml

- Import main content from
   org/apache/tika/mime/tika-mimetypes.xml in 
   org.apache.tika:tika-core:1.14:jar
   except application/x-atom and application/x-wsdl
 


Signed-off-by:Jacek Grzebyta 

commit bc90b1257dbb598d8985efb8ec116cbb93082138
Author: Jacek Grzebyta 
Date:   2017-01-05T14:46:26Z

Add mime detection test for yaml files

Signed-off-by:Jacek Grzebyta 




> Missing YAML to RDF parser
> --
>
> Key: ANY23-299
> URL: https://issues.apache.org/jira/browse/ANY23-299
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core, extractors
>Affects Versions: 1.1
>Reporter: Jacek
>Priority: Minor
>  Labels: features, patch
> Fix For: 1.2
>
>
> I created a YAML to RDF parser, but I found there is not enough flexibility 
> within the core. I will fix the issue.
> Issue related to ANY23-280





[jira] [Commented] (ANY23-299) Missing YAML to RDF parser

2017-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15814574#comment-15814574
 ] 

ASF GitHub Bot commented on ANY23-299:
--

Github user jgrzebyta commented on the issue:

https://github.com/apache/any23/pull/32
  
ANY23-299 task reopened. More tests needed.


> Missing YAML to RDF parser
> --
>
> Key: ANY23-299
> URL: https://issues.apache.org/jira/browse/ANY23-299
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core, extractors
>Affects Versions: 1.1
>Reporter: Jacek
>Priority: Minor
>  Labels: features, patch
> Fix For: 1.2
>
>
> I created a YAML to RDF parser, but I found there is not enough flexibility 
> within the core. I will fix the issue.
> Issue related to ANY23-280





[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2017-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815205#comment-15815205
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
Hi @ansell, what's the current state of this patch? Thanks for merging in the 
PR.


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.





[jira] [Commented] (ANY23-300) Ignore NetBeans configuration files

2017-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815213#comment-15815213
 ] 

ASF GitHub Bot commented on ANY23-300:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/31


> Ignore NetBeans configuration files
> ---
>
> Key: ANY23-300
> URL: https://issues.apache.org/jira/browse/ANY23-300
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Jacek
>Priority: Trivial
>  Labels: git
> Fix For: 1.2
>
>
> NetBeans creates additional configuration files.
> Just added relevant patterns into .gitignore





[jira] [Commented] (ANY23-300) Ignore NetBeans configuration files

2017-01-10 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15815223#comment-15815223
 ] 

ASF GitHub Bot commented on ANY23-300:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/31
  
Merged thank you @jgrzebyta 


> Ignore NetBeans configuration files
> ---
>
> Key: ANY23-300
> URL: https://issues.apache.org/jira/browse/ANY23-300
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Jacek
>Assignee: Jacek
>Priority: Trivial
>  Labels: git
> Fix For: 1.2
>
>
> NetBeans creates additional configuration files.
> Just added relevant patterns into .gitignore





[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2017-01-11 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15818761#comment-15818761
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
@ansell I just pulled the most recent version of this PR, built it and tested 
locally... all tests pass.


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.





[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15821856#comment-15821856
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
Hi @ansell, this is starting to look REALLY good. I've pulled your 
suggestions. I did notice that the appassembler output is still generated in 
core/target/appassembler. I think this can be removed now that the CLI has 
been modularized as you've implemented.
Is there anything else you want to add to this patch? If not, then I suggest 
we merge into master.


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.





[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822313#comment-15822313
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
I am now getting the following error when I attempt to build:
```
[INFO] 

[INFO] BUILD FAILURE
[INFO] 

[INFO] Total time: 34.292 s
[INFO] Finished at: 2017-01-13T12:21:37-08:00
[INFO] Final Memory: 53M/760M
[INFO] 

[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-assembly-plugin:3.0.0:single (assembly) on 
project apache-any23-cli: Error reading assemblies: No assembly descriptors 
found. -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e 
switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, 
please read the following articles:
[ERROR] [Help 1] 
http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the 
command
[ERROR]   mvn  -rf :apache-any23-cli
```


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822400#comment-15822400
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user ansell commented on the issue:

https://github.com/apache/any23/pull/27
  
I will look into the appassembler still being present for core, check the 
cli module's assembly-plugin configuration, and then merge to master.

Can you do a silent retarget (no notifications sent to mailing list) of all 
Jira issues to 2.0 to reflect the version bump?


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822446#comment-15822446
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user asfgit closed the pull request at:

https://github.com/apache/any23/pull/27


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-276) Upgrade sesame dependencies to RDF4J

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822541#comment-15822541
 ] 

ASF GitHub Bot commented on ANY23-276:
--

Github user lewismc commented on the issue:

https://github.com/apache/any23/pull/27
  
@ansell yes I will work on transitioning issues through the workflow. Thanks


> Upgrade sesame dependencies to RDF4J
> 
>
> Key: ANY23-276
> URL: https://issues.apache.org/jira/browse/ANY23-276
> Project: Apache Any23
>  Issue Type: Improvement
>Affects Versions: 1.1
>Reporter: Lewis John McGibbney
>Assignee: Peter Ansell
> Fix For: 2.0
>
>
> The successor to OpenRDF Sesame, named Eclipse RDF4J, is in the process of 
> releasing its first versions, with milestone builds available for testing on 
> Maven Central. 
> We should upgrade in Any23 as OpenRDF Sesame will not be releasing any more 
> versions now that the Eclipse project is up and running.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-299) Missing YAML to RDF parser

2017-01-13 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15822640#comment-15822640
 ] 

ASF GitHub Bot commented on ANY23-299:
--

Github user jgrzebyta closed the pull request at:

https://github.com/apache/any23/pull/32


> Missing YAML to RDF parser
> --
>
> Key: ANY23-299
> URL: https://issues.apache.org/jira/browse/ANY23-299
> Project: Apache Any23
>  Issue Type: Improvement
>  Components: core, extractors
>Affects Versions: 1.1
>Reporter: Jacek
>Assignee: Jacek
>Priority: Minor
>  Labels: features, patch
> Fix For: 2.0
>
>
> I created YAML to RDF parser but I found there is not enough flexibility 
> within the core. I will fix the issue.
> Issue related to ANY23-280



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (ANY23-304) Add extractor for OpenIE

2017-02-15 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15869386#comment-15869386
 ] 

ASF GitHub Bot commented on ANY23-304:
--

GitHub user lewismc opened a pull request:

https://github.com/apache/any23/pull/33

ANY23-304 Add extractor for OpenIE 1st pass

This pull request is a first pass at addressing 
https://issues.apache.org/jira/browse/ANY23-304. There is more work to be done 
on the new extractor. The newly introduced dependency significantly increases 
the size of Any23 because it pulls in many new transitive dependencies. 
@Yongyao this is for context.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lewismc/any23 ANY23-304

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/any23/pull/33.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #33


commit 1ffd60a68d20e7f313745202c012ac3fd2b49738
Author: Lewis John McGibbney 
Date:   2017-02-16T07:17:40Z

ANY23-304 Add extractor for OpenIE 1st pass




> Add extractor for OpenIE
> 
>
> Key: ANY23-304
> URL: https://issues.apache.org/jira/browse/ANY23-304
> Project: Apache Any23
>  Issue Type: Bug
>  Components: core, extractors
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 2.1
>
>
> I'm going to start work on an extractor which uses the OpenIE library 
> https://github.com/allenai/openie-standalone
> This will provide us with the ability to execute structured extractions from 
> unstructured content essentially taking Any23 in a new direction.
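
To illustrate what a "structured extraction" could look like, here is a minimal sketch of a relation tuple (confidence, first argument, relation, second arguments) rendered as a tab-separated line. The `RelationTuple` class and its `toTsv` method are hypothetical names used only for illustration; they are not part of Any23 or of the OpenIE library.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical holder for one open information extraction result:
// a confidence score plus an (arg1, relation, arg2...) tuple.
final class RelationTuple {
    final double confidence;
    final String arg1;
    final String relation;
    final List<String> arg2s;

    RelationTuple(double confidence, String arg1, String relation, List<String> arg2s) {
        this.confidence = confidence;
        this.arg1 = arg1;
        this.relation = relation;
        this.arg2s = arg2s;
    }

    // Renders the tuple as one tab-separated line,
    // joining the second arguments with "; ".
    String toTsv() {
        return confidence + "\t" + arg1 + "\t" + relation + "\t"
                + String.join("; ", arg2s);
    }
}
```

A tuple built from a sentence such as "Any23 extracts RDF triples from HTML" might then be created with `new RelationTuple(0.9, "Any23", "extracts", Arrays.asList("RDF triples", "from HTML"))`, where the confidence value is made up for the example.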



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ANY23-304) Add extractor for OpenIE

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870779#comment-15870779
 ] 

ASF GitHub Bot commented on ANY23-304:
--

Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101641510
  
--- Diff: 
core/src/main/java/org/apache/any23/extractor/openie/OpenIEExtractor.java ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.any23.extractor.openie;
+
+import java.io.IOException;
+import java.util.List;
+
+import javax.xml.transform.TransformerConfigurationException;
+import javax.xml.transform.TransformerFactoryConfigurationError;
+
+import org.apache.any23.extractor.Extractor;
+import org.apache.any23.extractor.ExtractionContext;
+import org.apache.any23.extractor.ExtractorDescription;
+import org.apache.any23.util.StreamUtils;
+import org.eclipse.rdf4j.model.IRI;
+import org.apache.any23.extractor.ExtractionException;
+import org.apache.any23.extractor.ExtractionParameters;
+import org.apache.any23.extractor.ExtractionResult;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.w3c.dom.Document;
+
+import edu.knowitall.openie.Argument;
+import edu.knowitall.openie.Instance;
+import edu.knowitall.openie.OpenIE;
+import edu.knowitall.tool.parse.ClearParser;
+import edu.knowitall.tool.postag.ClearPostagger;
+import edu.knowitall.tool.srl.ClearSrl;
+import edu.knowitall.tool.tokenize.ClearTokenizer;
+import scala.collection.JavaConversions;
+import scala.collection.Seq;
+
+
+
+/**
+ * An <a href="https://github.com/allenai/openie-standalone">OpenIE</a> 
+ * extractor able to generate RDF statements from 
+ * sentences representing relations in the text.
+ */
+public class OpenIEExtractor implements Extractor.TagSoupDOMExtractor {
+
+private final Logger LOG = LoggerFactory.getLogger(getClass());
+
+private IRI documentRoot;
+
+/**
+ * default constructor
+ */
+OpenIEExtractor() {
+// default constructor
+}
+
+/**
+ * @see org.apache.any23.extractor.Extractor#getDescription()
+ */
+@Override
+public ExtractorDescription getDescription() {
+return OpenIEExtractorFactory.getDescriptionInstance();
+}
+
+@Override
+public void run(ExtractionParameters extractionParameters,
+ExtractionContext context, Document in, ExtractionResult out)
+throws IOException, ExtractionException {
+OpenIE openIE = new OpenIE(new ClearParser(new ClearPostagger(new ClearTokenizer())), new ClearSrl(), false, false);
+
+
+Seq<Instance> extractions = null;
+try {
+extractions = openIE.extract(StreamUtils.asString(StreamUtils.documentToInputStream(in)));
+} catch (TransformerConfigurationException | TransformerFactoryConfigurationError e) {
+LOG.error("Error during extraction: {}", e);
+}
+
+List<Instance> listExtractions = JavaConversions.seqAsJavaList(extractions);
+for(Instance instance : listExtractions) {
+StringBuilder sb = new StringBuilder();
+
+sb.append(instance.confidence())
+.append('\t')
+.append(instance.extr().context())
+.append('\t')
+.append(instance.extr().arg1().text())
+.append('\t')
+.append(instance.extr().rel().text())
+.append('\t');
+
+List<Argument> listArg2s = JavaConversions.seqAsJavaList(instance.extr().arg2s());
+for(Argument argument : listArg2s) {
+sb.append(argument.text()).append("; ");
+}
+System.out.println(sb.toString());
--- End diff --

Presuming

[jira] [Commented] (ANY23-304) Add extractor for OpenIE

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870778#comment-15870778
 ] 

ASF GitHub Bot commented on ANY23-304:
--

Github user ansell commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101638032
  
--- Diff: core/src/main/java/org/apache/any23/rdf/RDFUtils.java ---
@@ -564,6 +569,4 @@ public static boolean isAbsoluteIRI(String href) {
 }
 }
 
-private RDFUtils() {}
--- End diff --

Is this signalling an intention to change the scope of this class to 
include non-static methods?
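
For context, a private no-argument constructor is the usual Java idiom for a class that only exposes static helpers; once it is removed, the compiler supplies an implicit default constructor and the class becomes instantiable. A minimal sketch of the idiom follows, where `IriChecks` and its method are hypothetical stand-ins and not the actual `RDFUtils` code:

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical stand-in for a static-only utility class; not the real RDFUtils.
final class IriChecks {

    // The private constructor blocks "new IriChecks()" from outside the class.
    // Deleting it would let the compiler generate a default constructor instead.
    private IriChecks() {}

    // Returns true when the string parses as a URI that has a scheme.
    static boolean isAbsolute(String href) {
        try {
            return new URI(href).isAbsolute();
        } catch (URISyntaxException e) {
            return false;
        }
    }
}
```

With the constructor in place, callers can only use the class through its static methods, for example `IriChecks.isAbsolute("http://example.org/")`.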


> Add extractor for OpenIE
> 
>
> Key: ANY23-304
> URL: https://issues.apache.org/jira/browse/ANY23-304
> Project: Apache Any23
>  Issue Type: Bug
>  Components: core, extractors
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 2.1
>
>
> I'm going to start work on an extractor which uses the OpenIE library 
> https://github.com/allenai/openie-standalone
> This will provide us with the ability to execute structured extractions from 
> unstructured content essentially taking Any23 in a new direction.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (ANY23-304) Add extractor for OpenIE

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870843#comment-15870843
 ] 

ASF GitHub Bot commented on ANY23-304:
--

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101650651
  
--- Diff: 
core/src/main/java/org/apache/any23/extractor/openie/OpenIEExtractor.java ---
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *  http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.any23.extractor.openie;
+
+import java.io.IOException;
+import java.util.List;
+
+import javax.xml.transform.TransformerConfigurationException;
+import javax.xml.transform.TransformerFactoryConfigurationError;
+
+import org.apache.any23.extractor.Extractor;
+import org.apache.any23.extractor.ExtractionContext;
+import org.apache.any23.extractor.ExtractorDescription;
+import org.apache.any23.util.StreamUtils;
+import org.eclipse.rdf4j.model.IRI;
+import org.apache.any23.extractor.ExtractionException;
+import org.apache.any23.extractor.ExtractionParameters;
+import org.apache.any23.extractor.ExtractionResult;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+import org.w3c.dom.Document;
+
+import edu.knowitall.openie.Argument;
+import edu.knowitall.openie.Instance;
+import edu.knowitall.openie.OpenIE;
+import edu.knowitall.tool.parse.ClearParser;
+import edu.knowitall.tool.postag.ClearPostagger;
+import edu.knowitall.tool.srl.ClearSrl;
+import edu.knowitall.tool.tokenize.ClearTokenizer;
+import scala.collection.JavaConversions;
+import scala.collection.Seq;
+
+
+
+/**
+ * An <a href="https://github.com/allenai/openie-standalone">OpenIE</a> 
+ * extractor able to generate RDF statements from 
+ * sentences representing relations in the text.
+ */
+public class OpenIEExtractor implements Extractor.TagSoupDOMExtractor {
+
+private final Logger LOG = LoggerFactory.getLogger(getClass());
+
+private IRI documentRoot;
+
+/**
+ * default constructor
+ */
+OpenIEExtractor() {
+// default constructor
+}
+
+/**
+ * @see org.apache.any23.extractor.Extractor#getDescription()
+ */
+@Override
+public ExtractorDescription getDescription() {
+return OpenIEExtractorFactory.getDescriptionInstance();
+}
+
+@Override
+public void run(ExtractionParameters extractionParameters,
+ExtractionContext context, Document in, ExtractionResult out)
+throws IOException, ExtractionException {
+OpenIE openIE = new OpenIE(new ClearParser(new ClearPostagger(new ClearTokenizer())), new ClearSrl(), false, false);
+
+
+Seq<Instance> extractions = null;
+try {
+extractions = openIE.extract(StreamUtils.asString(StreamUtils.documentToInputStream(in)));
+} catch (TransformerConfigurationException | TransformerFactoryConfigurationError e) {
+LOG.error("Error during extraction: {}", e);
+}
+
+List<Instance> listExtractions = JavaConversions.seqAsJavaList(extractions);
+for(Instance instance : listExtractions) {
+StringBuilder sb = new StringBuilder();
+
+sb.append(instance.confidence())
+.append('\t')
+.append(instance.extr().context())
+.append('\t')
+.append(instance.extr().arg1().text())
+.append('\t')
+.append(instance.extr().rel().text())
+.append('\t');
+
+List<Argument> listArg2s = JavaConversions.seqAsJavaList(instance.extr().arg2s());
+for(Argument argument : listArg2s) {
+sb.append(argument.text()).append("; ");
+}
+System.out.println(sb.toString());
--- End diff --

Yes, cor

[jira] [Commented] (ANY23-304) Add extractor for OpenIE

2017-02-16 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/ANY23-304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870842#comment-15870842
 ] 

ASF GitHub Bot commented on ANY23-304:
--

Github user lewismc commented on a diff in the pull request:

https://github.com/apache/any23/pull/33#discussion_r101650632
  
--- Diff: core/src/main/java/org/apache/any23/rdf/RDFUtils.java ---
@@ -564,6 +569,4 @@ public static boolean isAbsoluteIRI(String href) {
 }
 }
 
-private RDFUtils() {}
--- End diff --

Possibly, I will see how the remainder of the implementation goes.


> Add extractor for OpenIE
> 
>
> Key: ANY23-304
> URL: https://issues.apache.org/jira/browse/ANY23-304
> Project: Apache Any23
>  Issue Type: Bug
>  Components: core, extractors
>Reporter: Lewis John McGibbney
>Assignee: Lewis John McGibbney
> Fix For: 2.1
>
>
> I'm going to start work on an extractor which uses the OpenIE library 
> https://github.com/allenai/openie-standalone
> This will provide us with the ability to execute structured extractions from 
> unstructured content essentially taking Any23 in a new direction.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

