FINAL REMINDER: Apache EU Roadshow 2018 in Berlin next week!
Hello Apache Supporters and Enthusiasts,

This is a final reminder that our Apache EU Roadshow will be held in Berlin next week, on 13th and 14th June 2018. We will have 28 different sessions running over 2 days that cover some great topics. So if you are interested in Microservices, Internet of Things (IoT), Cloud, Apache Tomcat or Apache HTTP Server, then we have something for you.

https://foss-backstage.de/sessions/apache-roadshow

We will be co-located with FOSS Backstage, so if you are interested in topics such as the Incubator, the Apache Way, open source governance, legal, trademarks or simply open source communities, then there will be something there for you too. You can attend any of the talks, presentations and workshops from the Apache EU Roadshow or FOSS Backstage. You can find the combined Apache EU Roadshow and FOSS Backstage conference schedule below:

https://foss-backstage.de/schedule?day=2018-06-13

Ticket prices go up on 8th June 2018, and we have a last-minute discount code that anyone can use before the deadline:

15% discount code: ASF15_discount, valid until June 7, 23:55 CET

You can register at the following link: https://foss-backstage.de/tickets

Our Apache booth and lounge will be open from 11th - 14th June for meetups, hacking or simply relaxing between sessions. We will also be posting regular updates on social media throughout next week, so please follow us on Twitter @ApacheCon.

Thank you for your continued support, and we look forward to seeing you in Berlin!

Thanks,
Sharan Foga, VP Apache Community Development
http://apachecon.com/

PLEASE NOTE: You are receiving this message because you are subscribed to a user@ or dev@ list of one or more Apache Software Foundation projects.
[jira] [Commented] (JENA-1556) text:query multilingual enhancements
[ https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503724#comment-16503724 ] Code Ferret commented on JENA-1556: ---

[~kinow] The enhancements are neutral w.r.t. {{Analyzers}}. From my _very limited_ understanding, hiragana and katakana are one-to-one: for each hiragana sequence there is a unique katakana sequence, and vice versa. I don't know whether the same can be said for kanji in relation to hiragana/katakana. If there is a unique hiragana sequence for any kanji sequence (not the same number of codepoints, but a unique sequence nonetheless), and you have an analyzer for each of the three encodings, either the same code with different configurations (as in the case of our Tibetan, Chinese and Sanskrit analyzers) or independent analyzers for each, then *as long as they tokenize to a common underlying token encoding for indexing*, the {{text:searchFor}} list of tags will allow strings in kanji to retrieve kanas, and so on. The hard work is likely in the analyzers; these enhancements serve to configure their use.

For encodings that collapse distinct syllables or ideograms to a single representation, e.g., homophones as happens in pinyin, the {{text:auxIndex}} is used to manage the analyzers that perform the collapsed mapping when indexing triples, and {{text:searchFor}} is used as needed to manage the query side.

I am not familiar with whether Solr or Elasticsearch have similar features for selecting analyzers in these data-dependent ways. RDF is rather unique in providing a rigorous framework for consistently representing language and encoding information, and I think this provides interesting opportunities to couple Jena with Lucene along the lines of these enhancements.

The implementation that I have ready for a PR is strictly {{TextIndexLucene}} and supporting code for assemblers and such. There is no involvement of the {{jena-text-es}} sub-project.
What is _the selective filter_? > text:query multilingual enhancements > > > Key: JENA-1556 > URL: https://issues.apache.org/jira/browse/JENA-1556 > Project: Apache Jena > Issue Type: New Feature > Components: Text >Affects Versions: Jena 3.7.0 >Reporter: Code Ferret >Assignee: Code Ferret >Priority: Major > Labels: pull-request-available > > This issue proposes two related enhancements of Jena Text. These enhancements > have been implemented and a PR can be issued. > There are two multilingual search situations that we want to support: > # We want to be able to search in one encoding and retrieve results that may > have been entered in other encodings. For example, searching via Simplified > Chinese (Hans) and retrieving results that may have been entered in > Traditional Chinese (Hant) or Pinyin. This will simplify applications by > permitting encoding independent retrieval without additional layers of > transcoding and so on. It's all done under the covers in Lucene. > # We want to search with queries entered in a lossy, e.g., phonetic, > encoding and retrieve results entered with accurate encoding. For example, > searching via Pinyin without diacritics and retrieving all possible Hans and > Hant triples. > The first situation arises when entering triples that include languages with > multiple encodings that for various reasons are not normalized to a single > encoding. In this situation we want to be able to retrieve appropriate result > sets without regard for the encodings used at the time that the triples were > inserted into the dataset. > There are several such languages of interest in our application: Chinese, > Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and > ideographic variants. > Encodings may not be normalized when inserting triples for a variety of reasons. > A principal one is that the {{rdf:langString}} object often must be entered > in the same encoding that it occurs in some physical text that is being > catalogued. 
Another is that metadata may be imported from sources that use > different encoding conventions and we want to preserve that form. > The second situation arises as we want to provide simple support for phonetic > or other forms of lossy search at the time that triples are indexed directly > in the Lucene system. > To handle the first situation we introduce a {{text}} assembler predicate, > {{text:searchFor}}, that specifies a list of language tags that provides a > list of language variants that should be searched whenever a query string of > a given encoding (language tag) is used. For example, the following > {{text:TextIndexLucene/text:defineAnalyzers}} fragment : > {code:java} > [ text:addLang "bo" ; > text:searchFor ( "bo"
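Code Ferret's remark above that hiragana and katakana are one-to-one can be made concrete: in Unicode the two kana blocks are parallel, with each katakana code point offset from its hiragana counterpart by 0x60, so folding to a single kana form (a "common underlying token encoding" in the sense used above) is a fixed-offset mapping. The sketch below is purely illustrative, not part of the proposed PR; the class name {{KanaFold}} is made up.

```java
public class KanaFold {
    // Fold hiragana to katakana: the Unicode blocks U+3041-U+3096 and
    // U+30A1-U+30F6 are parallel, offset by exactly 0x60 code points.
    static String hiraganaToKatakana(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= '\u3041' && c <= '\u3096') {
                out.append((char) (c + 0x60));
            } else {
                out.append(c); // kanji, Latin, etc. pass through unchanged
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // "kinou" (yesterday) written in hiragana folds to its katakana spelling.
        System.out.println(hiraganaToKatakana("きのう")); // キノウ
    }
}
```

A real jena-text analyzer would do this inside a Lucene TokenFilter rather than on whole strings, but the mapping itself is this simple.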
[jira] [Commented] (JENA-1557) Update OSGi imports for 3.8 release
[ https://issues.apache.org/jira/browse/JENA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503558#comment-16503558 ] ASF GitHub Bot commented on JENA-1557: -- GitHub user acoburn opened a pull request: https://github.com/apache/jena/pull/428 Update OSGi imports Resolves: JENA-1557 This adds an exclusion for `org.checkerframework.checker.*`, which is now pulled in by `jena-shaded-guava`. This also adds guava (jsonld-java depends on this) and commons-compress to the `features.xml` file. In addition, this reformats the `Import-Package` definition to make it easier to read (and maintain). You can merge this pull request into a Git repository by running: $ git pull https://github.com/acoburn/jena JENA-1557 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/428.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #428 commit 443d5697fc22c90172833695ec0bd01252e80a54 Author: Aaron Coburn Date: 2018-06-06T16:33:47Z Update OSGi imports Resolves: JENA-1557 > Update OSGi imports for 3.8 release > --- > > Key: JENA-1557 > URL: https://issues.apache.org/jira/browse/JENA-1557 > Project: Apache Jena > Issue Type: Improvement > Components: OSGi > Environment: Karaf 4.2.0 >Reporter: Aaron Coburn >Priority: Major > Fix For: Jena 3.8.0 > > > The updates to various dependencies make it hard to install Jena in an OSGi > container. It would be good to update the features.xml file and add some > exclusions to the jena-osgi import declaration. > In particular: > jsonld-java now depends on Guava/24.1-jre > jena now depends on commons-compress > jena-shaded-guava tries to import too many packages (e.g. > org.checkerframework) > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] jena pull request #428: Update OSGi imports
GitHub user acoburn opened a pull request: https://github.com/apache/jena/pull/428 Update OSGi imports Resolves: JENA-1557 This adds an exclusion for `org.checkerframework.checker.*`, which is now pulled in by `jena-shaded-guava`. This also adds guava (jsonld-java depends on this) and commons-compress to the `features.xml` file. In addition, this reformats the `Import-Package` definition to make it easier to read (and maintain). You can merge this pull request into a Git repository by running: $ git pull https://github.com/acoburn/jena JENA-1557 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/428.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #428 commit 443d5697fc22c90172833695ec0bd01252e80a54 Author: Aaron Coburn Date: 2018-06-06T16:33:47Z Update OSGi imports Resolves: JENA-1557 ---
[jira] [Created] (JENA-1557) Update OSGi imports for 3.8 release
Aaron Coburn created JENA-1557: -- Summary: Update OSGi imports for 3.8 release Key: JENA-1557 URL: https://issues.apache.org/jira/browse/JENA-1557 Project: Apache Jena Issue Type: Improvement Components: OSGi Environment: Karaf 4.2.0 Reporter: Aaron Coburn Fix For: Jena 3.8.0 The updates to various dependencies make it hard to install Jena in an OSGi container. It would be good to update the features.xml file and add some exclusions to the jena-osgi import declaration. In particular: jsonld-java now depends on Guava/24.1-jre jena now depends on commons-compress jena-shaded-guava tries to import too many packages (e.g. org.checkerframework) -- This message was sent by Atlassian JIRA (v7.6.3#76005)
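As a sketch of the kind of change the issue describes, an `Import-Package` exclusion in a maven-bundle-plugin configuration might look like the fragment below. This is illustrative only; the actual instruction lives in the jena-osgi build and its real package list differs (see PR #428 for the actual change).

```xml
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <configuration>
    <instructions>
      <!-- A leading '!' excludes a package so the bundle does not try to
           import it at runtime; '*' imports everything else as usual. -->
      <Import-Package>
        !org.checkerframework.checker.*,
        *
      </Import-Package>
    </instructions>
  </configuration>
</plugin>
```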
[jira] [Commented] (JENA-1553) Can't Backup data - java.io.IOException: Illegal UTF-8: 0xFFFFFFB1
[ https://issues.apache.org/jira/browse/JENA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503424#comment-16503424 ] Brian Mullen commented on JENA-1553: I think I identified the problem. I believe the TDB data is corrupt, I ran some queries and get the "Impossibly Large Object" error from the troubleshooting page. I'm assuming that's why I can't backup the data. > Can't Backup data - java.io.IOException: Illegal UTF-8: 0xFFB1 > -- > > Key: JENA-1553 > URL: https://issues.apache.org/jira/browse/JENA-1553 > Project: Apache Jena > Issue Type: Bug > Components: Jena > Environment: Ubuntu 16.04 running Docker. Running stain/jena-fuseki > from the official Docker Hub. >Reporter: Brian Mullen >Priority: Major > > Attempting to backup through Fuseki, TDB 500M+ triples, breaking with error: > > {code:java} > [2018-06-01 13:25:46] Log4jLoggerAdapter WARN Exception in backup > org.apache.jena.atlas.RuntimeIOException: java.io.IOException: Illegal UTF-8: > 0xFFB1 > at org.apache.jena.atlas.io.IO.exception(IO.java:233) > at org.apache.jena.atlas.io.BlockUTF8.exception(BlockUTF8.java:275) > at > org.apache.jena.atlas.io.BlockUTF8.toCharsBuffer(BlockUTF8.java:150) > at org.apache.jena.atlas.io.BlockUTF8.toChars(BlockUTF8.java:73) > at org.apache.jena.atlas.io.BlockUTF8.toString(BlockUTF8.java:95) > at > org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:101) > at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:105) > at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:81) > at > org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186) > at > org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111) > at > org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70) > at > org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128) > at > 
org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82) > at > org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50) > at > org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67) > at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:107) > at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:84) > at > org.apache.jena.tdb.lib.TupleLib.lambda$convertToTriples$2(TupleLib.java:54) > at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270) > at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270) > at org.apache.jena.atlas.iterator.Iter.next(Iter.java:891) > at > org.apache.jena.riot.system.StreamOps.sendQuadsToStream(StreamOps.java:140) > at > org.apache.jena.riot.writer.NQuadsWriter.write$(NQuadsWriter.java:62) > at > org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:45) > at > org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:91) > at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:208) > at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:165) > at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:112) > at > org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:149) > at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1269) > at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1162) > at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1153) > at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:115) > at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:75) > at > org.apache.jena.fuseki.mgt.ActionBackup$BackupTask.run(ActionBackup.java:58) > at > org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:55) > at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) > at java.lang.Thread.run(Thread.java:745) > Caused by: java.io.IOException: Illegal UTF-8: 0xFFB1 > ... 40 more > [2018-06-01 13:25:46] Log4jLoggerAdapter INFO > Backup(/fuseki/backups/PDE_PROD_2018-06-01_13-24-00):2{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
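The 0xFF byte in the reported sequence can never appear in well-formed UTF-8, which is why decoding fails during backup: the NodeTable is returning bytes that are not a valid UTF-8 string. A small self-contained check (not Jena code; the class name is made up for illustration) shows the JDK's CharsetDecoder rejecting the same kind of malformed input that TDB's node decoder is hitting:

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Check {
    // Strictly validate a byte sequence as UTF-8 (report malformed input
    // instead of silently substituting replacement characters).
    static boolean isValidUtf8(byte[] bytes) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            dec.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The byte pair from the stack trace is not legal UTF-8.
        System.out.println(isValidUtf8(new byte[] {(byte) 0xFF, (byte) 0xB1})); // false
        System.out.println(isValidUtf8("abc".getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

If any intact data is recoverable, dumping what can still be read (e.g. with the tdbdump command-line tool) before rebuilding the store from source data is a common approach, though with a corrupted NodeTable the dump may fail at the same point.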
[jira] [Commented] (JENA-1556) text:query multilingual enhancements
[ https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503197#comment-16503197 ] Bruno P. Kinoshita commented on JENA-1556: -- Oh, actually the selective filter is enough for te reo Māori. But for Japanese I think this could still be useful? > text:query multilingual enhancements > > > Key: JENA-1556 > URL: https://issues.apache.org/jira/browse/JENA-1556 > Project: Apache Jena > Issue Type: New Feature > Components: Text >Affects Versions: Jena 3.7.0 >Reporter: Code Ferret >Assignee: Code Ferret >Priority: Major > Labels: pull-request-available > > This issue proposes two related enhancements of Jena Text. These enhancements > have been implemented and a PR can be issued. > There are two multilingual search situations that we want to support: > # We want to be able to search in one encoding and retrieve results that may > have been entered in other encodings. For example, searching via Simplified > Chinese (Hans) and retrieving results that may have been entered in > Traditional Chinese (Hant) or Pinyin. This will simplify applications by > permitting encoding independent retrieval without additional layers of > transcoding and so on. It's all done under the covers in Lucene. > # We want to search with queries entered in a lossy, e.g., phonetic, > encoding and retrieve results entered with accurate encoding. For example, > searching via Pinyin without diacritics and retrieving all possible Hans and > Hant triples. > The first situation arises when entering triples that include languages with > multiple encodings that for various reasons are not normalized to a single > encoding. In this situation we want to be able to retrieve appropriate result > sets without regard for the encodings used at the time that the triples were > inserted into the dataset. > There are several such languages of interest in our application: Chinese, > Tibetan, Sanskrit, Japanese and Korean. 
There are various Romanizations and > ideographic variants. > Encodings may not be normalized when inserting triples for a variety of reasons. > A principal one is that the {{rdf:langString}} object often must be entered > in the same encoding that it occurs in some physical text that is being > catalogued. Another is that metadata may be imported from sources that use > different encoding conventions and we want to preserve that form. > The second situation arises as we want to provide simple support for phonetic > or other forms of lossy search at the time that triples are indexed directly > in the Lucene system. > To handle the first situation we introduce a {{text}} assembler predicate, > {{text:searchFor}}, that specifies a list of language tags that provides a > list of language variants that should be searched whenever a query string of > a given encoding (language tag) is used. For example, the following > {{text:TextIndexLucene/text:defineAnalyzers}} fragment : > {code:java} > [ text:addLang "bo" ; > text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ; > text:analyzer [ > a text:GenericAnalyzer ; > text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ; > text:params ( > [ text:paramName "segmentInWords" ; > text:paramValue false ] > [ text:paramName "lemmatize" ; > text:paramValue true ] > [ text:paramName "filterChars" ; > text:paramValue false ] > [ text:paramName "inputMode" ; > text:paramValue "unicode" ] > [ text:paramName "stopFilename" ; > text:paramValue "" ] > ) > ] ; > ] > {code} > indicates that when using a search string such as "རྡོ་རྗེ་སྙིང་"@bo the > Lucene index should also be searched for matches tagged as {{bo-x-ewts}} and > {{bo-alalc97}}. > This is made possible by a Tibetan {{Analyzer}} that tokenizes strings in all > three encodings into Tibetan Unicode. This is feasible since the > {{bo-x-ewts}} and {{bo-alalc97}} encodings are one-to-one with Unicode > Tibetan. 
Since all fields with these language tags will have a common set of > indexed terms, i.e., Tibetan Unicode, it suffices to arrange for the query > analyzer to have access to the language tag for the query string along with > the various fields that need to be considered. > Supposing that the query is: > {code:java} > (?s ?sc ?lit) text:query ("rje"@bo-x-ewts) > {code} > Then the query formed in {{TextIndexLucene}} will be: > {code:java} > label_bo:rje label_bo-x-ewts:rje label_bo-alalc97:rje > {code} > which is translated using a suitable {{Analyzer}}, >
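The query expansion described above is, at its core, one `label_<tag>:<term>` clause per tag in the {{text:searchFor}} list. A minimal sketch of that string construction (illustrative only; {{QueryExpansion}} and {{expand}} are made-up names, not the actual {{TextIndexLucene}} code):

```java
import java.util.List;
import java.util.stream.Collectors;

public class QueryExpansion {
    // Build the expanded Lucene query string from the issue description:
    // one "label_<tag>:<term>" clause per language tag to search.
    static String expand(String term, List<String> searchFor) {
        return searchFor.stream()
                .map(tag -> "label_" + tag + ":" + term)
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Matches the example query shown in the issue for "rje"@bo-x-ewts.
        System.out.println(expand("rje", List.of("bo", "bo-x-ewts", "bo-alalc97")));
        // label_bo:rje label_bo-x-ewts:rje label_bo-alalc97:rje
    }
}
```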
[jira] [Commented] (JENA-1556) text:query multilingual enhancements
[ https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503181#comment-16503181 ] Bruno P. Kinoshita commented on JENA-1556: -- Great idea! For the first case I think it would help Japanese users too. {quote} For example, most Japanese emails are in ISO-2022-JP ("JIS encoding") and web pages in Shift-JIS and yet mobile phones in Japan usually use some form of Extended Unix Code. (https://en.wikipedia.org/wiki/Japanese_language_and_computers) {quote} I am more interested in the second situation. I wonder if that would work for Japanese and for Te Reo Maori. The word for yesterday can be written in Japanese as "昨日". But there are a few ways users could search for it: * 昨日 (kanji) * きのう (hiragana) * キノウ (katakana [bit weird I think, but...]) * kinou (wapuro romaji) * kinō (hepburn romaji, harder to type, but still common) * kino Though I am not sure if this would work after these new analyzers/assemblers are added? Would it work [~code-ferret]? The same would apply for Te Reo Maori I think, where in GIS systems places can be found to have been indexed as the English version (e.g. Taupo, name of a town and of a lake), or with the Maori name(s) ("Taupo" or "Taupō" for the town and lake, "Taupō-nui-a-Tia","Taupo-nui-a-Tia", or "Taupō nui a Tia" & etc for the city) Never used Lucene or ElasticSearch with Japanese nor with Maori. But happy to help testing it if there's a PR, and maybe also investigate & learn how it works for Lucene & ElasticSearch at the moment :-D > text:query multilingual enhancements > > > Key: JENA-1556 > URL: https://issues.apache.org/jira/browse/JENA-1556 > Project: Apache Jena > Issue Type: New Feature > Components: Text >Affects Versions: Jena 3.7.0 >Reporter: Code Ferret >Assignee: Code Ferret >Priority: Major > Labels: pull-request-available > > This issue proposes two related enhancements of Jena Text. These enhancements > have been implemented and a PR can be issued. 
> There are two multilingual search situations that we want to support: > # We want to be able to search in one encoding and retrieve results that may > have been entered in other encodings. For example, searching via Simplified > Chinese (Hans) and retrieving results that may have been entered in > Traditional Chinese (Hant) or Pinyin. This will simplify applications by > permitting encoding independent retrieval without additional layers of > transcoding and so on. It's all done under the covers in Lucene. > # We want to search with queries entered in a lossy, e.g., phonetic, > encoding and retrieve results entered with accurate encoding. For example, > searching via Pinyin without diacritics and retrieving all possible Hans and > Hant triples. > The first situation arises when entering triples that include languages with > multiple encodings that for various reasons are not normalized to a single > encoding. In this situation we want to be able to retrieve appropriate result > sets without regard for the encodings used at the time that the triples were > inserted into the dataset. > There are several such languages of interest in our application: Chinese, > Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and > ideographic variants. > Encodings may not be normalized when inserting triples for a variety of reasons. > A principal one is that the {{rdf:langString}} object often must be entered > in the same encoding that it occurs in some physical text that is being > catalogued. Another is that metadata may be imported from sources that use > different encoding conventions and we want to preserve that form. > The second situation arises as we want to provide simple support for phonetic > or other forms of lossy search at the time that triples are indexed directly > in the Lucene system. 
> To handle the first situation we introduce a {{text}} assembler predicate, > {{text:searchFor}}, that specifies a list of language tags that provides a > list of language variants that should be searched whenever a query string of > a given encoding (language tag) is used. For example, the following > {{text:TextIndexLucene/text:defineAnalyzers}} fragment : > {code:java} > [ text:addLang "bo" ; > text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ; > text:analyzer [ > a text:GenericAnalyzer ; > text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ; > text:params ( > [ text:paramName "segmentInWords" ; > text:paramValue false ] > [ text:paramName "lemmatize" ; > text:paramValue true ] > [ text:paramName "filterChars" ; > text:paramValue false ] >
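For the lossy second situation Bruno describes (kinou / kinō / kino all matching), an index-time analyzer could normalize romanized variants to a single collapsed form. The sketch below only illustrates that idea and is not the proposed implementation; a real solution would be a Lucene TokenFilter, and the folding rules shown (strip macrons, then collapse the "ou" long-vowel spelling) are deliberately crude.

```java
public class RomajiFold {
    // Collapse common Hepburn/wapuro romaji variants to one lossy key:
    // lowercase, strip macron vowels, then collapse "ou" to "o".
    static String fold(String s) {
        return s.toLowerCase()
                .replace("ā", "a")
                .replace("ī", "i")
                .replace("ū", "u")
                .replace("ē", "e")
                .replace("ō", "o")
                .replace("ou", "o");
    }

    public static void main(String[] args) {
        // All three ways of writing "yesterday" fold to the same key.
        System.out.println(fold("kinou")); // kino
        System.out.println(fold("kinō"));  // kino
        System.out.println(fold("kino"));  // kino
    }
}
```

The same folding idea applies to the Māori examples: stripping macrons maps "Taupō" and "Taupo" to one indexed term.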
Re: Towards Jena 3.8.0
+1 I'm adding a post-it to make sure I will help reviewing the release, and also going through the documentation to see if there's anything that needs updating (I remember at least one page I think I used 3.7.1 instead of 3.8 as the @since for a feature). Cheers Bruno From: Andy Seaborne To: "dev@jena.apache.org" Sent: Wednesday, 6 June 2018 11:32 PM Subject: Towards Jena 3.8.0 Let's look at a Jena 3.8.0 release - there are some significant new items in this release. JIRA report: https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311220=12343042 --- Headlines: ** JENA-632: JSON templated SPARQL queries. http://jena.staging.apache.org/documentation/query/generate-json-from-sparql.html JENA-1542: Integrate Lucene index in transaction lifecycle (TDB1, TDB2). JENA-1550: Parallel bulk loader for TDB2 http://jena.staging.apache.org/documentation/tdb2/tdb2_cmds.html --- Dependency changes Removed: org.apache.xerces is no longer a dependency. Remove xercesImpl-2.11.0.jar Remove xml-apis-1.4.01.jar Added: Add Apache Commons Compress : commons-compress 1.17 https://lists.apache.org/thread.html/40ebcb548cd2cb6d404d150cc1c919605689cf242ae17fe9e47191b1@%3Cdev.jena.apache.org%3E Updated: jsonldjava 0.11.1 ==> 0.12.0 jackson 2.9.0 ==> 2.9.5 (addresses CVE-2018-5968) httpclient 4.5.3 ==> 4.5.5 httpcore 4.4.6 ==> 4.4.9 Shared guava update 21.0 ==> 21.1-jre Tests: com.jayway.awaitility::1.7.0 ==> org.awaitility.awaitility::3.1.0 org.objenesis:objenesis:jar: 2.1 ==> 2.6 Build: maven-surefire-plugin: 2.20.1 ==> 2.21.0 System changes: JENA-1537: Remove xerces JENA-1525 / Christopher Johnson Java Automatic Module Names JENA-1524 / Christopher Johnson Split package Note to repacking and deep system integrations: Package "org.apache.jena.system" was split across jars. There are now two packages: "org.apache.jena.sys" "org.apache.jena.system" and "sys" contains the system service loader code. 
JenaSystem.init() has migrated, with deprecated proxy, from "org.apache.jena.system" to "org.apache.jena.sys" ** NB ServiceLoader file change ** The ServiceLoader interface for system initialization is now: org.apache.jena.sys.JenaSubsystemLifecycle --- Other Changes JENA-1544: Consistent FROM/FROM NAMED naming handling JENA-1519: OpExt / Jeremy Coulon JENA-1488: SelectiveFoldingFilter for jena-text
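For anyone maintaining a module initialized via ServiceLoader, the NB above means the provider file under META-INF/services must be renamed to match the interface's new package. A sketch, assuming the interface moved with the package split described above (the provider class name here is hypothetical):

```
# Old provider file (remove):
#   META-INF/services/org.apache.jena.system.JenaSubsystemLifecycle
# New provider file:
#   META-INF/services/org.apache.jena.sys.JenaSubsystemLifecycle
# Contents: one implementing class per line, e.g.
org.example.mymodule.MyModuleInit
```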
Towards Jena 3.8.0
Let's look at a Jena 3.8.0 release - there are some significant new items in this release.

JIRA report:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311220=12343042

--- Headlines:

** JENA-632: JSON templated SPARQL queries.
http://jena.staging.apache.org/documentation/query/generate-json-from-sparql.html

JENA-1542: Integrate Lucene index in transaction lifecycle (TDB1, TDB2).

JENA-1550: Parallel bulk loader for TDB2
http://jena.staging.apache.org/documentation/tdb2/tdb2_cmds.html

--- Dependency changes

Removed:
org.apache.xerces is no longer a dependency.
  Remove xercesImpl-2.11.0.jar
  Remove xml-apis-1.4.01.jar

Added:
Add Apache Commons Compress : commons-compress 1.17
https://lists.apache.org/thread.html/40ebcb548cd2cb6d404d150cc1c919605689cf242ae17fe9e47191b1@%3Cdev.jena.apache.org%3E

Updated:
jsonldjava 0.11.1 ==> 0.12.0
jackson 2.9.0 ==> 2.9.5 (addresses CVE-2018-5968)
httpclient 4.5.3 ==> 4.5.5
httpcore 4.4.6 ==> 4.4.9
Shared guava update 21.0 ==> 21.1-jre

Tests:
com.jayway.awaitility::1.7.0 ==> org.awaitility.awaitility::3.1.0
org.objenesis:objenesis:jar: 2.1 ==> 2.6

Build:
maven-surefire-plugin: 2.20.1 ==> 2.21.0

System changes:

JENA-1537: Remove xerces

JENA-1525 / Christopher Johnson: Java Automatic Module Names

JENA-1524 / Christopher Johnson: Split package

Note to repacking and deep system integrations:
Package "org.apache.jena.system" was split across jars. There are now two packages:
  "org.apache.jena.sys"
  "org.apache.jena.system"
and "sys" contains the system service loader code.

JenaSystem.init() has migrated, with deprecated proxy, from "org.apache.jena.system" to "org.apache.jena.sys"

** NB ServiceLoader file change **
The ServiceLoader interface for system initialization is now:
org.apache.jena.sys.JenaSubsystemLifecycle

--- Other Changes

JENA-1544: Consistent FROM/FROM NAMED naming handling
JENA-1519: OpExt / Jeremy Coulon
JENA-1488: SelectiveFoldingFilter for jena-text
[jira] [Closed] (JENA-1541) Jena Eyeball - ant test fails with TestCase Error
[ https://issues.apache.org/jira/browse/JENA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Seaborne closed JENA-1541. --- > Jena Eyeball - ant test fails with TestCase Error > - > > Key: JENA-1541 > URL: https://issues.apache.org/jira/browse/JENA-1541 > Project: Apache Jena > Issue Type: Question > Components: Eyeball, Jena >Reporter: Edward >Priority: Major > Labels: ant, eyeball, fail, jena, test > > Hello, > I'm trying to get Jena Eyeball running. I wanted to do the Tutorial on > [https://jena.apache.org/documentation/tools/eyeball-getting-started.html] > but it failed with following error: > {code:java} > [junit] Testcase: warning(junit.framework.TestSuite$1): FAILED > [junit] Class > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no > public constructor TestCase(String name) or TestCase() > [junit] junit.framework.AssertionFailedError: Class > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no > public constructor TestCase(String name) or TestCase() > [junit] > [junit] > BUILD FAILED > /**/Downloads/eyeball-2.3/build.xml:146: Test > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed > {code} > The tutorial says if the ant test doesn't pass, I should file a Jira Issue. > So that's what I am doing. > Help would be much appreciated! > > Greetings, > Edward -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (JENA-1541) Jena Eyeball - ant test fails with TestCase Error
[ https://issues.apache.org/jira/browse/JENA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503158#comment-16503158 ] Andy Seaborne commented on JENA-1541: - No one has stepped forward to offer to update Eyeball so it remains an unreleased codebase which has not been updated to Apache Jena releases. With no offers of support and maintenance, all there is the original source code. I wish there were more to offer, but if no-one volunteers to support and maintain the code, there is little that can be done. Anyone reading this is able to offer help. The documentation has been unlinked from the tools page and the status updated. > Jena Eyeball - ant test fails with TestCase Error > - > > Key: JENA-1541 > URL: https://issues.apache.org/jira/browse/JENA-1541 > Project: Apache Jena > Issue Type: Question > Components: Eyeball, Jena >Reporter: Edward >Priority: Major > Labels: ant, eyeball, fail, jena, test > > Hello, > I'm trying to get Jena Eyeball running. I wanted to do the Tutorial on > [https://jena.apache.org/documentation/tools/eyeball-getting-started.html] > but it failed with following error: > {code:java} > [junit] Testcase: warning(junit.framework.TestSuite$1): FAILED > [junit] Class > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no > public constructor TestCase(String name) or TestCase() > [junit] junit.framework.AssertionFailedError: Class > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no > public constructor TestCase(String name) or TestCase() > [junit] > [junit] > BUILD FAILED > /**/Downloads/eyeball-2.3/build.xml:146: Test > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed > {code} > The tutorial says if the ant test doesn't pass, I should file a Jira Issue. > So that's what I am doing. > Help would be much appreciated! > > Greetings, > Edward -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (JENA-1541) Jena Eyeball - ant test fails with TestCase Error
[ https://issues.apache.org/jira/browse/JENA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Seaborne resolved JENA-1541. - Resolution: Won't Fix > Jena Eyeball - ant test fails with TestCase Error > - > > Key: JENA-1541 > URL: https://issues.apache.org/jira/browse/JENA-1541 > Project: Apache Jena > Issue Type: Question > Components: Eyeball, Jena >Reporter: Edward >Priority: Major > Labels: ant, eyeball, fail, jena, test > > Hello, > I'm trying to get Jena Eyeball running. I wanted to do the Tutorial on > [https://jena.apache.org/documentation/tools/eyeball-getting-started.html] > but it failed with following error: > {code:java} > [junit] Testcase: warning(junit.framework.TestSuite$1): FAILED > [junit] Class > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no > public constructor TestCase(String name) or TestCase() > [junit] junit.framework.AssertionFailedError: Class > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no > public constructor TestCase(String name) or TestCase() > [junit] > [junit] > BUILD FAILED > /**/Downloads/eyeball-2.3/build.xml:146: Test > com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed > {code} > The tutorial says if the ant test doesn't pass, I should file a Jira Issue. > So that's what I am doing. > Help would be much appreciated! > > Greetings, > Edward -- This message was sent by Atlassian JIRA (v7.6.3#76005)
Re: Jena Eyeball [was: JENA-1541]
Done and JIRA closed "Won't fix". On 23/05/18 15:26, ajs6f wrote: +1. ajs6f On May 21, 2018, at 1:44 PM, Bruno P. Kinoshita wrote: Sounds good to me +1 Sent from Yahoo Mail on Android On Tue, 22 May 2018 at 5:08, Andy Seaborne wrote: To pull this into a plan ... The Eyeball code is already elsewhere (in Apache SVN - keep that link). Proposal: * Unlink from the tools page. * Upgrade the notice on all Eyeball pages to "This page is historical for information only - there is no Apache release of Eyeball". * Remove links to JIRA, SF distribution (broken anyway) and mention of help/support. PMC - OK? Andy On 11/05/18 14:10, Chris Dollin wrote: (Sorry for previous empty reply, hit wrong button) I'm the original developer of Eyeball. It is several years since any work has been done on it, and I suspect that it is now obsolete, that is, that there are other tools with equivalent functionality (though I don't know what they are). Putting it somewhere designated "Attic" with a red label saying "use at your own risk" seems reasonable. Chris On 11 May 2018 at 13:57, Chris Dollin wrote: On 11 May 2018 at 13:11, Andy Seaborne wrote: Edward found Eyeball via: https://jena.apache.org/documentation/tools/eyeball-getting-started.html Eyeball is mentioned: documentation/tools/index.mdtext documentation/tools/eyeball-guide.mdtext documentation/tools/eyeball-getting-started.mdtext documentation/tools/eyeball-manual.mdtext The tools page has schemagen and eyeball on it. eyeball-getting-started says "file a JIRA". What should we do? If it is not released, do we retire it from the documentation? Andy On 04/05/18 11:39, Edward (JIRA) wrote: Edward created JENA-1541: Summary: Jena Eyeball - ant test fails with TestCase Error Key: JENA-1541 URL: https://issues.apache.org/jira/browse/JENA-1541 Project: Apache Jena Issue Type: Question Components: Eyeball, Jena Reporter: Edward Hello, I'm trying to get Jena Eyeball running. 
I wanted to do the Tutorial on [here|https://jena.apache.org/documentation/tools/eyeball-getting-started.html] but it failed with the following error: {code:java} [junit] Testcase: warning(junit.framework.TestSuite$1): FAILED [junit] Class com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no public constructor TestCase(String name) or TestCase() [junit] junit.framework.AssertionFailedError: Class com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no public constructor TestCase(String name) or TestCase() [junit] [junit] BUILD FAILED /**/Downloads/eyeball-2.3/build.xml:146: Test com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed {code} The tutorial says if the ant test doesn't pass, I should file a Jira Issue. So that's what I am doing. Help would be much appreciated! Greetings, Edward -- This message was sent by Atlassian JIRA (v7.6.3#76005) -- "What I don't understand is this ..." Trevor Chaplin, /The Beiderbeck Affair/ Epimorphics Ltd, http://www.epimorphics.com Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20 6PT Epimorphics Ltd. is a limited company registered in England (number 7016688)
[jira] [Resolved] (JENA-1552) Bulk loader for TDB2 (phased loading)
[ https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andy Seaborne resolved JENA-1552. - Resolution: Done Fix Version/s: Jena 3.8.0 > Bulk loader for TDB2 (phased loading) > - > > Key: JENA-1552 > URL: https://issues.apache.org/jira/browse/JENA-1552 > Project: Apache Jena > Issue Type: Improvement > Components: TDB2 >Reporter: Andy Seaborne >Assignee: Andy Seaborne >Priority: Major > Fix For: Jena 3.8.0 > > > Following on from JENA-1550, this ticket is for phased loading, which combines > features of the sequential loader and the parallel loader. > When building all the persistent data structures (parallel loader), the work > on different indexes at the same time is competing for hardware resources, > RAM and I/O bandwidth. As the size to load grows, this becomes a noticeable > slowdown. > The sequential loader is the other extreme of the design spectrum. It works > on one index at a time so as to maximize caching efficiency. > Phased loading has parallel operation per phase and splits work into subsets > of indexes. > At 200M triples, loading to rotational disk, an experimental phased loader working > with 2 indexes at a time starts to become faster than parallel on the same > hardware as used for the [figures in > JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] > (57K parallel, 70K phased). -- This message was sent by Atlassian JIRA (v7.6.3#76005)
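The scheduling idea described in the ticket (build a few indexes in parallel within a phase, run the phases one after another) can be sketched as follows. This is a minimal illustration under stated assumptions: the index names and the phase size of 2 are illustrative, and none of this is the actual TDB2 loader code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of phased loading: split the index list into
// phases of a fixed size; within a phase, indexes build in parallel;
// the next phase starts only after the current one finishes.
public class PhasedLoaderSketch {

    // Partition the index names into consecutive groups of perPhase.
    static List<List<String>> phases(List<String> indexes, int perPhase) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < indexes.size(); i += perPhase) {
            out.add(indexes.subList(i, Math.min(i + perPhase, indexes.size())));
        }
        return out;
    }

    static void load(List<String> indexes, int perPhase) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(perPhase);
        for (List<String> phase : phases(indexes, perPhase)) {
            List<Future<?>> running = new ArrayList<>();
            for (String idx : phase) {
                // Stand-in for the real index-building work.
                running.add(pool.submit(() -> System.out.println("building " + idx)));
            }
            // Barrier: wait for the whole phase before starting the next,
            // so only perPhase indexes compete for RAM and I/O at once.
            for (Future<?> f : running) {
                f.get();
            }
        }
        pool.shutdown();
    }

    public static void main(String[] args) throws Exception {
        // Illustrative index names only.
        load(List.of("SPO", "POS", "OSP", "GSPO"), 2);
    }
}
```

The point of the phase barrier is the trade-off the ticket describes: fully parallel building thrashes RAM and I/O on large loads, fully sequential building wastes parallelism, and a small phase size sits between the two.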
[GitHub] jena pull request #426: JENA-1552: Phased loader
Github user asfgit closed the pull request at: https://github.com/apache/jena/pull/426 ---
[jira] [Commented] (JENA-1552) Bulk loader for TDB2 (phased loading)
[ https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503132#comment-16503132 ] ASF subversion and git services commented on JENA-1552: --- Commit 2934c5506f9caa237868fbbb5aabf247106ec16b in jena's branch refs/heads/master from [~an...@apache.org] [ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=2934c55 ] JENA-1552: Phased loader
[jira] [Commented] (JENA-1552) Bulk loader for TDB2 (phased loading)
[ https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503133#comment-16503133 ] ASF GitHub Bot commented on JENA-1552: -- Github user asfgit closed the pull request at: https://github.com/apache/jena/pull/426
[jira] [Commented] (JENA-1556) text:query multilingual enhancements
[ https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503116#comment-16503116 ] Osma Suominen commented on JENA-1556: - Whoa, sounds like quite advanced functionality for jena-text! Do you know if there is anything similar in Solr or Elasticsearch? ISTR that at least with Solr you can define fields that are based on transformations of other fields. I don't object to this if you're willing to do a PR, of course :) > text:query multilingual enhancements > > > Key: JENA-1556 > URL: https://issues.apache.org/jira/browse/JENA-1556 > Project: Apache Jena > Issue Type: New Feature > Components: Text >Affects Versions: Jena 3.7.0 >Reporter: Code Ferret >Assignee: Code Ferret >Priority: Major > Labels: pull-request-available > > This issue proposes two related enhancements of Jena Text. These enhancements > have been implemented and a PR can be issued. > There are two multilingual search situations that we want to support: > # We want to be able to search in one encoding and retrieve results that may > have been entered in other encodings. For example, searching via Simplified > Chinese (Hans) and retrieving results that may have been entered in > Traditional Chinese (Hant) or Pinyin. This will simplify applications by > permitting encoding-independent retrieval without additional layers of > transcoding and so on. It's all done under the covers in Lucene. > # We want to search with queries entered in a lossy, e.g., phonetic, > encoding and retrieve results entered with accurate encoding. For example, > searching via Pinyin without diacritics and retrieving all possible Hans and > Hant triples. > The first situation arises when entering triples that include languages with > multiple encodings that for various reasons are not normalized to a single > encoding. 
In this situation we want to be able to retrieve appropriate result > sets without regard for the encodings used at the time that the triples were > inserted into the dataset. > There are several such languages of interest in our application: Chinese, > Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and > ideographic variants. > Encodings may not be normalized when inserting triples for a variety of reasons. > A principal one is that the {{rdf:langString}} object often must be entered > in the same encoding that it occurs in some physical text that is being > catalogued. Another is that metadata may be imported from sources that use > different encoding conventions and we want to preserve that form. > The second situation arises as we want to provide simple support for phonetic > or other forms of lossy search at the time that triples are indexed directly > in the Lucene system. > To handle the first situation we introduce a {{text}} assembler predicate, > {{text:searchFor}}, that specifies a list of language tags identifying the language > variants that should be searched whenever a query string with > a given encoding (language tag) is used. For example, the following > {{text:TextIndexLucene/text:defineAnalyzers}} fragment : > {code:java} > [ text:addLang "bo" ; > text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ; > text:analyzer [ > a text:GenericAnalyzer ; > text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ; > text:params ( > [ text:paramName "segmentInWords" ; > text:paramValue false ] > [ text:paramName "lemmatize" ; > text:paramValue true ] > [ text:paramName "filterChars" ; > text:paramValue false ] > [ text:paramName "inputMode" ; > text:paramValue "unicode" ] > [ text:paramName "stopFilename" ; > text:paramValue "" ] > ) > ] ; > ] > {code} > indicates that when using a search string such as "རྡོ་རྗེ་སྙིང་"@bo the > Lucene index should also be searched for matches tagged as {{bo-x-ewts}} and > {{bo-alalc97}}. 
> This is made possible by a Tibetan {{Analyzer}} that tokenizes strings in all > three encodings into Tibetan Unicode. This is feasible since the > {{bo-x-ewts}} and {{bo-alalc97}} encodings are one-to-one with Unicode > Tibetan. Since all fields with these language tags will have a common set of > indexed terms, i.e., Tibetan Unicode, it suffices to arrange for the query > analyzer to have access to the language tag for the query string along with > the various fields that need to be considered. > Supposing that the query is: > {code:java} > (?s ?sc ?lit) text:query ("rje"@bo-x-ewts) > {code} > Then the query formed in {{TextIndexLucene}}
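As a rough sketch of what the {{text:searchFor}} expansion implies on the query side: given the language tag of the query string, the index is queried over the fields for all configured variant tags. The tag-to-variants map and the field-per-tag naming scheme below are illustrative assumptions, not the actual jena-text internals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: expand a query's language tag into the set of
// indexed fields to search, per a text:searchFor-style configuration.
public class SearchForSketch {

    // Stand-in for the assembled text:searchFor configuration:
    // a query tagged "bo" also searches the bo-x-ewts and bo-alalc97 variants.
    static final Map<String, List<String>> SEARCH_FOR = Map.of(
            "bo", List.of("bo", "bo-x-ewts", "bo-alalc97"));

    // Assumed field-naming convention: one indexed field per language tag,
    // e.g. "label_bo-x-ewts". Tags with no searchFor entry search themselves.
    static List<String> fieldsFor(String field, String langTag) {
        List<String> fields = new ArrayList<>();
        for (String tag : SEARCH_FOR.getOrDefault(langTag, List.of(langTag))) {
            fields.add(field + "_" + tag);
        }
        return fields;
    }
}
```

Because the Tibetan analyzer tokenizes all three encodings to a common Unicode term representation, querying these extra fields with the same analyzed terms is what lets a {{"rje"@bo-x-ewts}} query match literals entered in any of the variants.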