Re: Sonar?
I feel like we had this discussion before... but it could have been on a different project. I ran SonarQube against Jena's codebase a few times in the past, but haven't done it in a while. They also offer a cloud service similar to Travis, called SonarCloud.io:

https://sonarcloud.io/dashboard?id=org.apache.jena%3Ajena

+1 from me

Bruno

On Tuesday, 16 April 2019, 1:54:30 am NZST, ajs6f wrote:
> I see that Apache has Sonar code analysis services at: https://builds.apache.org/analysis and I wasn't able to find Jena there.
> It would be interesting to see what Sonar says about the codebase. Of course it has to be taken with a grain of salt, but it's often useful.
> Before I investigate turning Sonar on for our codebase, any thoughts/objections/information? Am I missing anything (like we already have it turned on and I just didn't see it, which happens all the time)?
> ajs6f
Re: GeoSPARQL process
Thanks, Greg, this is very detailed. Once the new module is in and settled and we have a release or two to learn from, I will take a closer look at the usage of this code to understand how it differs from the kind of caching that occurs elsewhere in Jena.

ajs6f

> On Apr 14, 2019, at 6:21 AM, Greg Albiston wrote:
>
> Hi,
>
> There are a lot of permutations that a GeoSPARQL query could take, which
> can generate different values that may or may not be useful later on.
> The general strategy is to keep what is generated for a while and, if it
> isn't used, then drop it. I don't think any of the Cache implementations
> offer this or a suitable alternative.
>
> The expiring-map removes entries that haven't been reused after a period
> of time. The duration to retain, rate of checking and maximum size can
> all be set. It is used for three purposes:
>
> - The Geometry Wrapper object resulting from de-serialising the Geometry
>   Literals.
> - The transformed Geometry Wrapper object from changing the spatial
>   reference system.
> - The result of a spatial relation between two Geometry Literals, to
>   avoid re-testing when Query Re-writing is applied.
>
> Most of the GeoSPARQL functions are between two Geometry Literals, so
> one could be needed in the next iteration of the query and the other
> could be needed later.
>
> The first purpose has the biggest impact on performance, as it avoids
> repeatedly de-serialising the same Geometry Literal while Jena is
> processing the query. Complex shapes, e.g. polygons, can be very costly
> to extract.
>
> The second purpose offers most benefit when complex shapes need
> transforming. These transformations may be needed again during this
> query but not the next. e.g. dataset is in SRS A. Query 1 is a
> comparison with a set of values in SRS B. Query 2 then is a comparison
> with a set of values in SRS C. The results from Query 1 are useless and
> may never be needed again.
>
> The third purpose is due to GeoSPARQL allowing query re-writing where
> the Geometry Literal isn't specified and instead Features and Geometries
> are used, so a single query could test the same spatial relations up to
> four times depending on bindings.
>
> The expiring-map is allowed to fill up while the query is processing and
> then drops entries that aren't reused (in batches) or once the query
> completes. Once it is full, new entries are quickly rejected, but space
> is freed up later from those entries not being re-used. A user with a
> small dataset can cache everything, while a user with a large dataset can
> choose to constrain it to get some benefit from caching without consuming
> vast chunks of memory.
>
> I tried using the Apache Commons Collections 4 LRUMap and it made
> performance worse once it was filled (at a guess due to "one out, one in"
> and constant searching). I only found one Java implementation of a
> time-based cache. It seemed excessive to have the whole dependency for one
> class and it wasn't as flexible as required.
>
> Hopefully this clarifies why the expiring-map approach was adopted.
>
> Thanks,
>
> Greg
>
> On 10/04/2019 16:50, ajs6f wrote:
>> Just out of curiosity, Greg, what is the functionality offered by Expiring
>> Map that isn't offered by Jena's already-extant oaj.atlas.lib.Cache
>> implementations? Is it the ability to manually trigger expirations?
>>
>> ajs6f
>>
>>> On Apr 9, 2019, at 12:02 PM, Andy Seaborne wrote:
>>>
>>> [INFO] | \- io.github.galbiston:expiring-map:jar:1.0.2:compile
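The behaviour Greg describes above (retain entries for a set duration, drop unused ones in batches, and quickly reject new entries once full instead of evicting "one out, one in" like an LRU map) can be illustrated with a minimal Java sketch. `ExpiringMapSketch` is a hypothetical class written only for illustration; it is not the API of `io.github.galbiston:expiring-map`, and the names `retainMillis`, `maxSize` and `expire()` are assumptions. A real implementation would presumably call `expire()` from a timer (the "rate of checking") rather than manually, and the size check here is only approximate under concurrent writers.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical sketch of a size-bounded, time-based expiring map (not the expiring-map library API). */
public class ExpiringMapSketch<K, V> {

    /** A cached value plus the time (ms) it was last written or read. */
    private static final class Timestamped<T> {
        final T value;
        volatile long lastAccess;
        Timestamped(T value, long now) { this.value = value; this.lastAccess = now; }
    }

    private final Map<K, Timestamped<V>> map = new ConcurrentHashMap<>();
    private final long retainMillis;   // how long an unused entry survives
    private final int maxSize;         // new keys are rejected once this size is reached
    private final LongSupplier clock;  // injectable so expiry can be tested deterministically

    public ExpiringMapSketch(long retainMillis, int maxSize, LongSupplier clock) {
        this.retainMillis = retainMillis;
        this.maxSize = maxSize;
        this.clock = clock;
    }

    /** Stores the value unless the map is full; returns false if the new key was rejected. */
    public boolean put(K key, V value) {
        if (map.size() >= maxSize && !map.containsKey(key)) {
            return false; // full: reject cheaply instead of evicting one entry per insert
        }
        map.put(key, new Timestamped<>(value, clock.getAsLong()));
        return true;
    }

    /** Returns the value (or null) and refreshes its last-access time, so reuse keeps it alive. */
    public V get(K key) {
        Timestamped<V> e = map.get(key);
        if (e == null) return null;
        e.lastAccess = clock.getAsLong();
        return e.value;
    }

    /** Drops, in one batch, every entry not accessed within retainMillis; returns how many were removed. */
    public int expire() {
        long cutoff = clock.getAsLong() - retainMillis;
        int before = map.size();
        map.values().removeIf(e -> e.lastAccess < cutoff);
        return before - map.size();
    }

    public int size() { return map.size(); }

    public static void main(String[] args) {
        long[] now = {0}; // fake clock in milliseconds
        ExpiringMapSketch<String, String> cache = new ExpiringMapSketch<>(1000, 2, () -> now[0]);
        cache.put("a", "A");
        cache.put("b", "B");
        System.out.println(cache.put("c", "C")); // false: map is full
        now[0] = 800;
        cache.get("a");                          // "a" is reused, "b" is not
        now[0] = 1500;
        System.out.println(cache.expire());      // 1: only "b" expired
        System.out.println(cache.put("c", "C")); // true: space was freed
    }
}
```

Rejecting on a full map keeps `put` cheap when the cache is saturated; the trade-off is that a newly useful value is not cached until an expiry pass frees space, which matches the behaviour described for large datasets.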
Sonar?
I see that Apache has Sonar code analysis services at: https://builds.apache.org/analysis and I wasn't able to find Jena there. It would be interesting to see what Sonar says about the codebase. Of course it has to be taken with a grain of salt, but it's often useful. Before I investigate turning Sonar on for our codebase, any thoughts/objections/information? Am I missing anything (like we already have it turned on and I just didn't see it, which happens all the time)? ajs6f
Re: GeoSPARQL process
Hi Greg,

Neither of those (jdom2, rdf-tables) is a problem or needs anything done before we can release Jena with GeoSPARQL in it. They can be changed, or not, later.

For timing: everyone is busy! We could release 3.11.0 ASAP (it's 4 months since 3.10.0) and immediately start on 3.12.0. I have some time to help with a 3.12 ... hoping to get it all done during May. Or we could just accept a delay to 3.11.0. It is the usual tension between perfect and timely with volunteer time!

What needs to happen for geosparql is contribution:

1/ The code should be under java package org.apache.jena. I suggested:
   io.github.galbiston.geosparql_jena => org.apache.jena.geosparql
   io.github.galbiston.geosparql_fuseki => org.apache.jena.fuseki.geosparql
2/ Modules:
   jena-geosparql
   jena-fuseki/jena-fuseki-geospatial
3/ A "pull request" from Greg. That makes it clear it is being contributed.

Then the project can do:

4/ A NOTICE file for combined fuseki jars. It goes in the code tree at src/main/resources/META-INF and ends up in the shaded jar. I can help with that.
5/ POM files ... because the build is maven. (Were the ones I put on the gist OK?)

It is not necessary for release to do every piece of tidying up, like dependency management of versions in the top pom.xml.

    Andy

On 14/04/2019 10:01, Greg Albiston wrote:

Hi,

- rdf-tables: This could be taken out if problematic. It is a CSV/TSV to RDF converter to provide another route to load geospatial data and was useful on another project. Given that jena-csv has been deprecated, there might not be the demand for its inclusion.
- jdom2: This is only used for GML reading/writing. Could look into replacing it with any XML library already used by Jena. Recently found that Apache SIS offers a GML parser, so will investigate whether this can be used (it would offer more flexibility and maintenance with the GML versions).
Thanks,

Greg

On 10/04/2019 22:15, Andy Seaborne wrote:

On 09/04/2019 17:02, Andy Seaborne wrote:

Here are the new dependencies:

[INFO] | +- org.apache.sis.core:sis-referencing:jar:0.8:compile
[INFO] | | +- javax.measure:unit-api:jar:1.0:compile
[INFO] | | \- org.opengis:geoapi:jar:3.0.1:compile

via the org.apache.sis dependency:

org.opengis:geoapi
https://github.com/opengeospatial/geoapi
A form of BSD license.

javax.measure:unit-api
https://github.com/unitsofmeasurement/unit-api
BSD 3-clause.

[INFO] | +- org.locationtech.jts:jts-core:jar:1.16.1:compile

Eclipse Distribution License 1.0
EDL 1.0 is cat-A. Treat like BSD - NOTICE entry when repackage needed.
Link to http://www.eclipse.org/org/documents/edl-v10.php is acceptable.
(Generally, links instead of a copy are now considered acceptable.)

[INFO] | +- org.jdom:jdom2:jar:2.0.6:compile

Modified BSD - it does not appear to be the problematic, old BSD 4-clause. Seems like 3-clause with clause 3 split in two. Needs more eyes on it.
https://issues.apache.org/jira/browse/LEGAL-204

It is the BSD 2-clause license with two extra clauses about name usage. NOTICE entry when repackage needed.
https://github.com/hunterhacker/jdom/blob/master/LICENSE.txt

[INFO] | \- io.github.galbiston:expiring-map:jar:1.0.2:compile
[INFO] +- io.github.galbiston:rdf-tables:jar:1.0.4:compile

AL2 :-)

[INFO] | +- com.opencsv:opencsv:jar:3.9:runtime
https://sourceforge.net/p/opencsv/source/ci/master/tree/LICENSE
AL2

[INFO] +- com.beust:jcommander:jar:1.72:compile
https://github.com/cbeust/jcommander
AL2

    Andy

On 08/04/2019 17:29, Andy Seaborne wrote:
> Added a POM file for jena-fuseki-geosparql to the same gist:
>
> https://gist.github.com/afs/c6c291812bbc96fe55ac64ecdd1edfe4
>
> Had to do some exclusions on rdf-tables.
>
> Andy
>
[jira] [Commented] (JENA-1702) InputStream for HTTP constructModel queries are not closed
[ https://issues.apache.org/jira/browse/JENA-1702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16817644#comment-16817644 ]

Rob Vesse commented on JENA-1702:
-

[~trueg] No worries about the duplicate; in this case your bug actually describes the problem better than either the PR or the associated JIRA, IMO. I happened to remember that we'd had a similar bug recently, and it took me quite a lot of searching to find the corresponding duplicate!

> InputStream for HTTP constructModel queries are not closed
> --
>
> Key: JENA-1702
> URL: https://issues.apache.org/jira/browse/JENA-1702
> Project: Apache Jena
> Issue Type: Bug
> Components: ARQ
> Affects Versions: Jena 3.10.0
> Reporter: Sebastian Trüg
> Priority: Major
> Fix For: Jena 3.11.0
>
>
> I am accessing a Fuseki installation as follows:
> {code:java}
> String uri = fusekiHost + "/" + dataset;
> // Build the connection inside try-with-resources so it is closed on exit
> try (RDFConnection conn = RDFConnectionFuseki.create().destination(uri).build()) {
>     Model model = conn.queryConstruct("construct { ?s ?p ?o . } where { ?s ?p ?o . }");
>     return model;
> }{code}
> The problem is that after 5 of these requests the Spring Boot application
> this code runs in blocks, due to the PoolingHttpClientConnectionManager
> running out of free routes.
> After lots of debugging I noticed that the InputStream that is used to read
> the data is never closed.
> InputStreams from "select" requests are closed in QueryEngineHTTP::close due
> to "retainedConnection" being set.
> The same is not true for "construct" queries, since their results are parsed
> via RDFDataMgr, which does not close the InputStream.
> I do not understand the code well enough to propose a proper solution, but
> maybe just setting "retainedConnection" for construct queries would be
> enough? Either way, I think the stream needs to be closed somehow.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
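The report comes down to a general rule: whatever code obtains the response InputStream must close it, even when the parser (RDFDataMgr, according to the report) only reads it. The following is a minimal, JDK-only sketch of that pattern. It does not use Jena, and `TrackingInputStream`, `parseWithoutClosing` and `parseAndClose` are hypothetical names standing in for a pooled HTTP response stream and a parser; it is not meant to represent the actual change made for JENA-1702. It assumes Java 9+ for `InputStream.readAllBytes()`.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public class CloseAfterParseSketch {

    /** Hypothetical stand-in for an HTTP response stream: records whether close() was called. */
    static final class TrackingInputStream extends FilterInputStream {
        boolean closed = false;
        TrackingInputStream(InputStream in) { super(in); }
        @Override public void close() throws IOException { closed = true; super.close(); }
    }

    /** Hypothetical parser that reads the whole stream but, like the one in the report, never closes it. */
    static String parseWithoutClosing(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Caller-side pattern: the code holding the stream closes it, even if parsing throws. */
    static <T> T parseAndClose(InputStream in, Function<InputStream, T> parser) {
        try (InputStream s = in) {
            return parser.apply(s);
        } catch (IOException e) { // only close() can throw a checked IOException here
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] body = "<s> <p> <o> .".getBytes(StandardCharsets.UTF_8);

        TrackingInputStream leaky = new TrackingInputStream(new ByteArrayInputStream(body));
        parseWithoutClosing(leaky);
        System.out.println("leaky closed: " + leaky.closed); // leaky closed: false

        TrackingInputStream safe = new TrackingInputStream(new ByteArrayInputStream(body));
        String parsed = parseAndClose(safe, CloseAfterParseSketch::parseWithoutClosing);
        System.out.println("safe closed: " + safe.closed + ", parsed: " + parsed);
    }
}
```

With a pooling connection manager, every stream left open in the "leaky" style keeps a connection checked out, which is consistent with the symptom in the report (blocking after a handful of requests).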
[GitHub] [jena] Claudenw commented on issue #534: [WIP] Proof of concept for prometheus endpoint
Claudenw commented on issue #534: [WIP] Proof of concept for prometheus endpoint
URL: https://github.com/apache/jena/pull/534#issuecomment-483125506

The two modules in question come under dual licenses: CC0 or BSD-2-Clause. My understanding from the legal-discuss thread is that CC0 does not require notification because it is public domain, but it also does not confer patent rights, so we elected to use BSD-2-Clause. We need to add the BSD-2-Clause license file and the notation that it applies to the HdrHistogram and LatencyUtils packages. I assume the notation will be in the NOTICE files for the Fuseki bundles that include the libraries mentioned.

Claude

On Sun, Apr 14, 2019 at 1:56 PM Andy Seaborne wrote:
> From the legal-discuss@ emailing list, the decision seems to be we are using 2-clause BSD. Please confirm that here or on dev@ so it is in public, on a jena list/forum.
>
> Then we need to execute on that for the two dependencies that will be bundled into shaded jars, because we unpack and repack them.

--
I like: Like Like - The likeliest place on the web
LinkedIn: http://www.linkedin.com/in/claudewarren

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

With regards,
Apache Git Services