Re: Un-depending on Apache Xerces

2018-05-01 Thread Claude Warren
I think undepending on Xerces is a good idea as well.  With lots of other
faster parsers to choose from it seems like we should not be forcing apps
to include Xerces as well.

Claude

On Tue, May 1, 2018 at 12:05 PM, Andy Seaborne  wrote:

> FYI:
>
> Xerces 2.12.0 is out (as of April 21) though it has not made it to Maven
> central.
>
> One thing of interest (to me) is whether it has a bugfixed version of
> Duration. JENA-1402
>
> I still think we should un-depend on Xerces.
>
> Andy
>
>
> On 28/04/18 20:38, Andy Seaborne wrote:
>
>> JENA-1537
>>
>> While the JDK does have a Xerces derived parser (it split off long before
>> 2.11.0 and separately evolved), it is behind Java9 module "java.xml".
>>
>> Jena uses Xerces 2.11.0 in two ways - for the datatypes (oaj.datatypes)
>> and XML parsing (oaj.rdfxml.xmlinput - also known as ARP).  Both make
>> internal use of Xerces.
>>
>> The datatypes code uses Xerces to provide XSD datatypes, including validation.
>>
>> RDFXMLParser uses Xerces SAXParser and in a minor way some other stuff
>> that isn't in java.xml.sax.
>>
>> I've had a prototype-hack go at removing Xerces from Jena:
>> https://github.com/afs/jena-xerces
>>
>> Datatypes:
>>
>> * One feature omitted: XSDDatatype.loadUserDefined.
>>
>> These functions parse XSD schema datatype definitions. The implementation
>> calls into the internal XML parsing which would not be legal in Java9
>> modules if using the JDK built-in parser. It seems to need a fairly
>> complete XML parser engine.
>>
>> We should consider dropping this feature.
>>
>> XML Parsing:
>>
>> * Loses the check on whether InputStreamReader or FileReader have the
>> right encoding for the XML document. It hooks into an interface call that
>> does not seem to be available in a standard SAX parser. (Shouldn't be using
>> Readers anyway!)
>>
>>  Andy
>>
>


-- 
I like: Like Like - The likeliest place on the web

LinkedIn: http://www.linkedin.com/in/claudewarren


ApacheCon North America 2018 schedule is now live.

2018-05-01 Thread Rich Bowen

Dear Apache Enthusiast,

We are pleased to announce our schedule for ApacheCon North America 
2018. ApacheCon will be held September 23-27 at the Montreal Marriott 
Chateau Champlain in Montreal, Canada.


Registration is open! The early bird rate of $575 lasts until July 21, 
at which time it goes up to $800. And the room block at the Marriott 
($225 CAD per night, including wifi) closes on August 24th.


We will be featuring more than 100 sessions on Apache projects. The 
schedule is now online at https://apachecon.com/acna18/


The schedule includes full tracks of content from Cloudstack[1], 
Tomcat[2], and our GeoSpatial community[3].


We will have 4 keynote speakers, two of whom are Apache members, and two 
from the wider community.


On Tuesday, Apache member and former board member Cliff Schmidt will be 
speaking about how Amplio uses technology to educate and improve the 
quality of life of people living in very difficult parts of the 
world[4]. And Apache Fineract VP Myrle Krantz will speak about how Open 
Source banking is helping the global fight against poverty[5].


Then, on Wednesday, we’ll hear from Bridget Kromhout, Principal Cloud 
Developer Advocate from Microsoft, about the really hard problem in 
software - the people[6]. And Euan McLeod, VP VIPER at Comcast will 
show us the many ways that Apache software delivers your favorite shows 
to your living room[7].


ApacheCon will also feature old favorites like the Lightning Talks, the 
Hackathon (running the duration of the event), PGP key signing, and lots 
of hallway-track time to get to know your project community better.


Follow us on Twitter, @ApacheCon, and join the disc...@apachecon.com 
mailing list (send email to discuss-subscr...@apachecon.com) to stay up 
to date with developments. And if your company wants to sponsor this 
event, get in touch at h...@apachecon.com for opportunities that are 
still available.


See you in Montreal!

Rich Bowen
VP Conferences, The Apache Software Foundation
h...@apachecon.com
@ApacheCon

[1] http://cloudstackcollab.org/
[2] http://tomcat.apache.org/conference.html
[3] http://apachecon.dukecon.org/acna/2018/#/schedule?search=geospatial
[4] http://apachecon.dukecon.org/acna/2018/#/scheduledEvent/df977fd305a31b903
[5] http://apachecon.dukecon.org/acna/2018/#/scheduledEvent/22c6c30412a3828d6
[6] http://apachecon.dukecon.org/acna/2018/#/scheduledEvent/fbbb2384fa91ebc6b
[7] http://apachecon.dukecon.org/acna/2018/#/scheduledEvent/88d50c3613852c2de


[jira] [Commented] (JENA-1537) Remove requirement for Apache Xerces.

2018-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459661#comment-16459661
 ] 

ASF GitHub Bot commented on JENA-1537:
--

Github user afs commented on the issue:

https://github.com/apache/jena/pull/413
  
It is from the released artifacts (specifically, the source artifact).

The Xerces `Version` class has `getVersion()` -> "2.11.0-jena".
The last Xerces dependency is in the JIRA ticket.
And the commit has the change to the dependency management in the POM.



> Remove requirement for Apache Xerces.
> -
>
> Key: JENA-1537
> URL: https://issues.apache.org/jira/browse/JENA-1537
> Project: Apache Jena
> Issue Type: Improvement
> Components: Datatypes, RDF/XML
> Reporter: Andy Seaborne
> Assignee: Andy Seaborne
> Priority: Major
>
> Apache Xerces is used for parsing and also for datatype support.
> We can switch to the JDK built-in XML parser (which is actually a forked 
> Xerces).
> For jena-core datatype, we can extract the necessary code from Xerces and put 
> it in Jena (repackaged).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JENA-1537) Remove requirement for Apache Xerces.

2018-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459649#comment-16459649
 ] 

ASF GitHub Bot commented on JENA-1537:
--

Github user stain commented on the issue:

https://github.com/apache/jena/pull/413
  
Would be good for the history if this pull request (or ideally git commit) 
referenced which particular commit/version of Xerces this was extracted from - 
even if we are "importing from ourself" as Xerces is ASF.



[jira] [Commented] (JENA-1537) Remove requirement for Apache Xerces.

2018-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459646#comment-16459646
 ] 

ASF GitHub Bot commented on JENA-1537:
--

Github user stain commented on the issue:

https://github.com/apache/jena/pull/413
  
+1 - the extracted code is under `org.apache.jena.ext.*`, so the 
modified OSGi export looks good.



[GitHub] jena issue #411: Include TDB2 in the jena-osgi module

2018-05-01 Thread acoburn
Github user acoburn commented on the issue:

https://github.com/apache/jena/pull/411
  
This deploys just fine in Karaf.


---


Re: Un-depending on Apache Xerces

2018-05-01 Thread Andy Seaborne

FYI:

Xerces 2.12.0 is out (as of April 21) though it has not made it to Maven 
central.


One thing of interest (to me) is whether it has a bugfixed version of 
Duration. JENA-1402


I still think we should un-depend on Xerces.

Andy

On 28/04/18 20:38, Andy Seaborne wrote:

JENA-1537

While the JDK does have a Xerces derived parser (it split off long 
before 2.11.0 and separately evolved), it is behind Java9 module 
"java.xml".


Jena uses Xerces 2.11.0 in two ways - for the datatypes (oaj.datatypes) 
and XML parsing (oaj.rdfxml.xmlinput - also known as ARP).  Both make 
internal use of Xerces.


The datatypes code uses Xerces to provide XSD datatypes, including validation.

RDFXMLParser uses Xerces SAXParser and in a minor way some other stuff 
that isn't in java.xml.sax.
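
For readers following the thread, here is a minimal sketch of parsing RDF/XML-shaped input through the JAXP SAX API alone, with no compile-time reference to Xerces classes. It is not Jena's ARP code. The class name `JdkSaxSketch` and the method `countElements` are hypothetical, and `SAXParserFactory.newInstance()` returns whichever parser JAXP resolves, which is the JDK built-in one only when no other implementation is on the classpath.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Jena.
final class JdkSaxSketch {

    /** Counts element start events reported by the JAXP-resolved SAX parser. */
    static int countElements(String xml) {
        final int[] count = {0};
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            // RDF/XML relies on namespaces, so the parser must be namespace aware.
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++;
                }
            });
        } catch (Exception e) {
            // Wrap checked parser exceptions to keep the sketch easy to call.
            throw new IllegalStateException("XML parsing failed", e);
        }
        return count[0];
    }

    public static void main(String[] args) {
        String rdfXml =
            "<rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\""
            + " xmlns:ex=\"http://example.org/\">"
            + "<rdf:Description rdf:about=\"http://example.org/s\">"
            + "<ex:p>o</ex:p>"
            + "</rdf:Description>"
            + "</rdf:RDF>";
        System.out.println(countElements(rdfXml));
    }
}
```

Only `javax.xml.parsers` and `org.xml.sax` types appear here; anything ARP needs beyond those interfaces is the part that would have to be replaced or dropped.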


I've had a prototype-hack go at removing Xerces from Jena:
https://github.com/afs/jena-xerces

Datatypes:

* One feature omitted: XSDDatatype.loadUserDefined.

These functions parse XSD schema datatype definitions. The 
implementation calls into the internal XML parsing which would not be 
legal in Java9 modules if using the JDK built-in parser. It seems to 
need a fairly complete XML parser engine.


We should consider dropping this feature.

XML Parsing:

* Loses the check on whether InputStreamReader or FileReader have the 
right encoding for the XML document. It hooks into an interface call 
that does not seem to be available in a standard SAX parser. (Shouldn't 
be using Readers anyway!)


     Andy


[jira] [Commented] (JENA-1537) Remove requirement for Apache Xerces.

2018-05-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-1537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16459528#comment-16459528
 ] 

ASF GitHub Bot commented on JENA-1537:
--

GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/413

JENA-1537: Remove dependency on Apache Xerces.

The dependency on Apache Xerces 2.11.0 can be removed by extracting the 
necessary datatype validation code from Xerces and using the JDK XML parser. 
The Xerces release jars `xercesImpl-2.11.0.jar` and `xml-apis-1.4.01.jar` are 
no longer needed, which has OSGi and JPMS advantages.

Impacts:

* Switch to using the JDK built-in XML parser (this affects any use of XML 
in an application using Jena)
* Drop `XSDDatatype.loadUserDefined` - the necessary code isn't available 
via JDK APIs
* Add package tree `org.apache.jena.ext.xerces` for the extracted datatype 
validation and regex code (a SPARQL corner case)
* Remove Xerces from `pom.xml` and `jena-core/pom.xml`
* Remove Xerces from `jena-osgi/pom.xml`
* No checking of encoding mismatches between reader and XML charset 
declaration (access to the XML Declaration is not available, at least not in 
the same way)
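
The last impact in the list above concerns a check that an application could approximate for itself if it still passes Readers. The following is a minimal sketch under stated assumptions, not the check that was removed from Jena: it reads the encoding named in the XML declaration with the JDK StAX API (`XMLStreamReader.getCharacterEncodingScheme()`) and compares it with the charset the application used to build its Reader. The class `EncodingCheckSketch` and its methods are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

// Hypothetical illustration only; not part of Jena.
final class EncodingCheckSketch {

    /** Returns the encoding named in the XML declaration, or null if none is given. */
    static String declaredEncoding(byte[] xmlBytes) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            XMLStreamReader reader =
                factory.createXMLStreamReader(new ByteArrayInputStream(xmlBytes));
            try {
                return reader.getCharacterEncodingScheme();
            } finally {
                reader.close();
            }
        } catch (Exception e) {
            // Wrap checked StAX exceptions to keep the sketch easy to call.
            throw new IllegalStateException("Could not read XML declaration", e);
        }
    }

    /** True when no encoding is declared or the declared one names the reader's charset. */
    static boolean matches(String declared, Charset readerCharset) {
        // Charset.forName resolves aliases and case, so "utf-8" and "UTF8" both map to UTF-8.
        return declared == null || Charset.forName(declared).equals(readerCharset);
    }

    public static void main(String[] args) {
        byte[] doc = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><a/>"
            .getBytes(StandardCharsets.US_ASCII);
        String declared = declaredEncoding(doc);
        if (!matches(declared, StandardCharsets.UTF_8)) {
            System.out.println("Reader charset differs from declared encoding " + declared);
        }
    }
}
```

Comparing `Charset` objects rather than strings matters because `InputStreamReader.getEncoding()` reports historical names such as `UTF8`, which would not match the declaration text literally.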

The extracted code is only slightly cleaned up to keep some degree of 
alignment with the original Xerces source. That code originates from a long 
time ago and has a lot of warnings which have been suppressed.

There will also need to be a change to NOTICE to reflect NOTICE from Xerces 
(it is already in the NOTICE for the distribution).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena xerces

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/413.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #413


commit c9a7e646be45d44f26b44e51187487f12f182b89
Author: Andy Seaborne 
Date:   2018-04-30T17:26:05Z

JENA-1537: Remove dependency on Xerces. Import needed code





[GitHub] jena issue #411: Include TDB2 in the jena-osgi module

2018-05-01 Thread christopher-johnson
Github user christopher-johnson commented on the issue:

https://github.com/apache/jena/pull/411
  
BND reads the import packages from the jar manifest, which would include 
the bundle requirements when deployed.  The question is whether a resolver 
(like Aether[1]) can locate the coordinates of transitive dependencies in a 
provided scope when they are not explicitly referenced bundles.  For the other 
provided-scope dependencies, the pattern seems to be that these transitive 
coordinates are identified explicitly (like `jsonld-java`).

This could be tested with something like this:
`karaf@root()> bundle:install mvn:org.apache.jena/jena-osgi:3.8.0-SNAPSHOT`

It might work, but it would have to read the pom.xml from `tdb2` and then 
`trans-data` in separate resolution requests.  Also, I am not sure what happens 
if `trans-data`, etc. are deployed in OSGi without BND manifests.

  [1] http://wiki.eclipse.org/Aether/Transitive_Dependency_Resolution


---