FINAL REMINDER: Apache EU Roadshow 2018 in Berlin next week!

2018-06-06 Thread sharan

Hello Apache Supporters and Enthusiasts

This is a final reminder that our Apache EU Roadshow will be held in 
Berlin next week on 13th and 14th June 2018. We will have 28 different 
sessions running over 2 days, covering some great topics. So if you are 
interested in Microservices, the Internet of Things (IoT), Cloud, Apache 
Tomcat or the Apache HTTP Server, then we have something for you.


https://foss-backstage.de/sessions/apache-roadshow

We will be co-located with FOSS Backstage, so if you are interested in 
topics such as the Apache Incubator, the Apache Way, open source governance, 
legal issues, trademarks or simply open source communities, then there will 
be something there for you too. You can attend any of the talks, 
presentations and workshops from the Apache EU Roadshow or FOSS Backstage.


You can find details of the combined Apache EU Roadshow and FOSS 
Backstage conference schedule below:


https://foss-backstage.de/schedule?day=2018-06-13

Ticket prices go up on 8th June 2018, and we have a last-minute discount 
code that anyone can use before the deadline:


15% discount code: ASF15_discount
valid until June 7, 23:55 CET

You can register at the following link:

https://foss-backstage.de/tickets

Our Apache booth and lounge will be open from 11th - 14th June for 
meetups, hacking or simply relaxing between sessions. We will also be 
posting regular updates on social media throughout next week, so please 
follow us on Twitter: @ApacheCon


Thank you for your continued support and we look forward to seeing you 
in Berlin!


Thanks
Sharan Foga, VP Apache Community Development

http://apachecon.com/

PLEASE NOTE: You are receiving this message because you are subscribed 
to a user@ or dev@ list of one or more Apache Software Foundation projects.





[jira] [Commented] (JENA-1556) text:query multilingual enhancements

2018-06-06 Thread Code Ferret (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503724#comment-16503724
 ] 

Code Ferret commented on JENA-1556:
---

[~kinow] The enhancements are neutral w.r.t. {{Analyzers}}. From my _very 
limited_ understanding, hiragana and katakana are one-to-one: for each 
hiragana sequence there is a unique katakana sequence and vice versa. I don't 
know whether the same can be said for kanji in relation to hiragana/katakana. 
Suppose there is a unique hiragana sequence for any kanji sequence (not the 
same number of codepoints, but a unique sequence nonetheless), and that you 
have an analyzer for each of the three, either the same code with different 
configurations, as in the case of our Tibetan, Chinese and Sanskrit analyzers, 
or independent analyzers for each. Then, *as long as they tokenize to a common 
underlying token encoding for indexing*, the {{text:searchFor}} list of tags 
will allow strings in kanji to retrieve kana, and so on.

The hard work is likely in the analyzers. These enhancements serve to configure 
the use of the analyzers.

For encodings that collapse distinct syllables or ideograms to a single 
representation (e.g., homophones, as happens in pinyin), {{text:auxIndex}} is 
used to manage the use of analyzers that can perform the collapsing mapping 
when indexing triples, and {{text:searchFor}} is then used as needed to manage 
the query side.
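
A sketch of how such a configuration might look, following the 
{{text:defineAnalyzers}} fragments quoted in the issue description (the 
language tags and the auxiliary index tag here are hypothetical, not a tested 
configuration):

{code:java}
[ text:addLang "zh-hans" ;
  # search Hans queries against Hant and pinyin entries as well
  text:searchFor ( "zh-hans" "zh-hant" "zh-latn-pinyin" ) ;
  # hypothetical auxiliary index whose analyzer collapses homophones
  # to toneless pinyin at indexing time
  text:auxIndex ( "zh-aux-han2pinyin" )
]
{code}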

I am not sure whether Solr or Elasticsearch have similar features for 
selecting analyzers in these data-dependent ways -- RDF is rather unique in 
providing a rigorous framework for consistently representing language and 
encoding information, and I think this provides interesting opportunities to 
couple Jena with Lucene along the lines of these enhancements. 

The implementation that I have ready for a PR is strictly {{TextIndexLucene}} 
and supporting code for assemblers and such. There is no involvement of the 
{{jena-text-es}} sub-project.

What is _the selective filter_?

> text:query multilingual enhancements
> 
>
> Key: JENA-1556
> URL: https://issues.apache.org/jira/browse/JENA-1556
> Project: Apache Jena
>  Issue Type: New Feature
>  Components: Text
>Affects Versions: Jena 3.7.0
>Reporter: Code Ferret
>Assignee: Code Ferret
>Priority: Major
>  Labels: pull-request-available
>
> This issue proposes two related enhancements of Jena Text. These enhancements 
> have been implemented and a PR can be issued. 
> There are two multilingual search situations that we want to support:
>  # We want to be able to search in one encoding and retrieve results that may 
> have been entered in other encodings. For example, searching via Simplified 
> Chinese (Hans) and retrieving results that may have been entered in 
> Traditional Chinese (Hant) or Pinyin. This will simplify applications by 
> permitting encoding independent retrieval without additional layers of 
> transcoding and so on. It's all done under the covers in Lucene.
>  # We want to search with queries entered in a lossy, e.g., phonetic, 
> encoding and retrieve results entered with accurate encoding. For example, 
> searching via Pinyin without diacritics and retrieving all possible Hans and 
> Hant triples.
> The first situation arises when entering triples that include languages with 
> multiple encodings that for various reasons are not normalized to a single 
> encoding. In this situation we want to be able to retrieve appropriate result 
> sets without regard for the encodings used at the time that the triples were 
> inserted into the dataset.
> There are several such languages of interest in our application: Chinese, 
> Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and 
> ideographic variants.
> Encodings may not be normalized when inserting triples for a variety of reasons. 
> A principal one is that the {{rdf:langString}} object often must be entered 
> in the same encoding that it occurs in some physical text that is being 
> catalogued. Another is that metadata may be imported from sources that use 
> different encoding conventions and we want to preserve that form.
> The second situation arises as we want to provide simple support for phonetic 
> or other forms of lossy search at the time that triples are indexed directly 
> in the Lucene system.
> To handle the first situation we introduce a {{text}} assembler predicate, 
> {{text:searchFor}}, that specifies a list of language tags that provides a 
> list of language variants that should be searched whenever a query string of 
> a given encoding (language tag) is used. For example, the following 
> {{text:TextIndexLucene/text:defineAnalyzers}} fragment :
> {code:java}
> [ text:addLang "bo" ; 
>   text:searchFor ( "bo" 

[jira] [Commented] (JENA-1557) Update OSGi imports for 3.8 release

2018-06-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503558#comment-16503558
 ] 

ASF GitHub Bot commented on JENA-1557:
--

GitHub user acoburn opened a pull request:

https://github.com/apache/jena/pull/428

Update OSGi imports

Resolves: JENA-1557

This adds an exclusion for `org.checkerframework.checker.*`, which is now 
pulled in by `jena-shaded-guava`. This also adds guava (jsonld-java depends on 
this) and commons-compress to the `features.xml` file.

In addition, this reformats the `Import-Package` definition to make it 
easier to read (and maintain).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/acoburn/jena JENA-1557

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #428


commit 443d5697fc22c90172833695ec0bd01252e80a54
Author: Aaron Coburn 
Date:   2018-06-06T16:33:47Z

Update OSGi imports

Resolves: JENA-1557




> Update OSGi imports for 3.8 release
> ---
>
> Key: JENA-1557
> URL: https://issues.apache.org/jira/browse/JENA-1557
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: OSGi
> Environment: Karaf 4.2.0
>Reporter: Aaron Coburn
>Priority: Major
> Fix For: Jena 3.8.0
>
>
> The updates to various dependencies make it hard to install Jena in an OSGi 
> container. It would be good to update the features.xml file and add some 
> exclusions to the jena-osgi import declaration.
> In particular:
> jsonld-java now depends on Guava/24.1-jre
> jena now depends on commons-compress
> jena-shaded-guava tries to import too many packages (e.g. 
> org.checkerframework)
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] jena pull request #428: Update OSGi imports

2018-06-06 Thread acoburn
GitHub user acoburn opened a pull request:

https://github.com/apache/jena/pull/428

Update OSGi imports

Resolves: JENA-1557

This adds an exclusion for `org.checkerframework.checker.*`, which is now 
pulled in by `jena-shaded-guava`. This also adds guava (jsonld-java depends on 
this) and commons-compress to the `features.xml` file.

In addition, this reformats the `Import-Package` definition to make it 
easier to read (and maintain).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/acoburn/jena JENA-1557

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/428.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #428


commit 443d5697fc22c90172833695ec0bd01252e80a54
Author: Aaron Coburn 
Date:   2018-06-06T16:33:47Z

Update OSGi imports

Resolves: JENA-1557




---


[jira] [Created] (JENA-1557) Update OSGi imports for 3.8 release

2018-06-06 Thread Aaron Coburn (JIRA)
Aaron Coburn created JENA-1557:
--

 Summary: Update OSGi imports for 3.8 release
 Key: JENA-1557
 URL: https://issues.apache.org/jira/browse/JENA-1557
 Project: Apache Jena
  Issue Type: Improvement
  Components: OSGi
 Environment: Karaf 4.2.0
Reporter: Aaron Coburn
 Fix For: Jena 3.8.0


The updates to various dependencies make it hard to install Jena in an OSGi 
container. It would be good to update the features.xml file and add some 
exclusions to the jena-osgi import declaration.

In particular:

jsonld-java now depends on Guava/24.1-jre
jena now depends on commons-compress
jena-shaded-guava tries to import too many packages (e.g. org.checkerframework)

 





[jira] [Commented] (JENA-1553) Can't Backup data - java.io.IOException: Illegal UTF-8: 0xFFFFFFB1

2018-06-06 Thread Brian Mullen (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503424#comment-16503424
 ] 

Brian Mullen commented on JENA-1553:


I think I have identified the problem. I believe the TDB data is corrupt: I 
ran some queries and got the "Impossibly Large Object" error from the 
troubleshooting page. I'm assuming that's why I can't back up the data.

> Can't Backup data - java.io.IOException: Illegal UTF-8: 0xFFB1
> --
>
> Key: JENA-1553
> URL: https://issues.apache.org/jira/browse/JENA-1553
> Project: Apache Jena
>  Issue Type: Bug
>  Components: Jena
> Environment: Ubuntu 16.04 running Docker.  Running stain/jena-fuseki 
> from the official Docker Hub.
>Reporter: Brian Mullen
>Priority: Major
>
> Attempting to backup through Fuseki, TDB 500M+ triples, breaking with error:  
>  
> {code:java}
> [2018-06-01 13:25:46] Log4jLoggerAdapter WARN  Exception in backup
> org.apache.jena.atlas.RuntimeIOException: java.io.IOException: Illegal UTF-8: 
> 0xFFB1
>     at org.apache.jena.atlas.io.IO.exception(IO.java:233)
>     at org.apache.jena.atlas.io.BlockUTF8.exception(BlockUTF8.java:275)
>     at 
> org.apache.jena.atlas.io.BlockUTF8.toCharsBuffer(BlockUTF8.java:150)
>     at org.apache.jena.atlas.io.BlockUTF8.toChars(BlockUTF8.java:73)
>     at org.apache.jena.atlas.io.BlockUTF8.toString(BlockUTF8.java:95)
>     at 
> org.apache.jena.tdb.store.nodetable.NodecSSE.decode(NodecSSE.java:101)
>     at org.apache.jena.tdb.lib.NodeLib.decode(NodeLib.java:105)
>     at org.apache.jena.tdb.lib.NodeLib.fetchDecode(NodeLib.java:81)
>     at 
> org.apache.jena.tdb.store.nodetable.NodeTableNative.readNodeFromTable(NodeTableNative.java:186)
>     at 
> org.apache.jena.tdb.store.nodetable.NodeTableNative._retrieveNodeByNodeId(NodeTableNative.java:111)
>     at 
> org.apache.jena.tdb.store.nodetable.NodeTableNative.getNodeForNodeId(NodeTableNative.java:70)
>     at 
> org.apache.jena.tdb.store.nodetable.NodeTableCache._retrieveNodeByNodeId(NodeTableCache.java:128)
>     at 
> org.apache.jena.tdb.store.nodetable.NodeTableCache.getNodeForNodeId(NodeTableCache.java:82)
>     at 
> org.apache.jena.tdb.store.nodetable.NodeTableWrapper.getNodeForNodeId(NodeTableWrapper.java:50)
>     at 
> org.apache.jena.tdb.store.nodetable.NodeTableInline.getNodeForNodeId(NodeTableInline.java:67)
>     at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:107)
>     at org.apache.jena.tdb.lib.TupleLib.triple(TupleLib.java:84)
>     at 
> org.apache.jena.tdb.lib.TupleLib.lambda$convertToTriples$2(TupleLib.java:54)
>     at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>     at org.apache.jena.atlas.iterator.Iter$2.next(Iter.java:270)
>     at org.apache.jena.atlas.iterator.Iter.next(Iter.java:891)
>     at 
> org.apache.jena.riot.system.StreamOps.sendQuadsToStream(StreamOps.java:140)
>     at 
> org.apache.jena.riot.writer.NQuadsWriter.write$(NQuadsWriter.java:62)
>     at 
> org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:45)
>     at 
> org.apache.jena.riot.writer.NQuadsWriter.write(NQuadsWriter.java:91)
>     at org.apache.jena.riot.RDFWriter.write$(RDFWriter.java:208)
>     at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:165)
>     at org.apache.jena.riot.RDFWriter.output(RDFWriter.java:112)
>     at 
> org.apache.jena.riot.RDFWriterBuilder.output(RDFWriterBuilder.java:149)
>     at org.apache.jena.riot.RDFDataMgr.write$(RDFDataMgr.java:1269)
>     at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1162)
>     at org.apache.jena.riot.RDFDataMgr.write(RDFDataMgr.java:1153)
>     at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:115)
>     at org.apache.jena.fuseki.mgt.Backup.backup(Backup.java:75)
>     at 
> org.apache.jena.fuseki.mgt.ActionBackup$BackupTask.run(ActionBackup.java:58)
>     at 
> org.apache.jena.fuseki.async.AsyncPool.lambda$submit$0(AsyncPool.java:55)
>     at org.apache.jena.fuseki.async.AsyncTask.call(AsyncTask.java:100)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>     at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.IOException: Illegal UTF-8: 0xFFB1
>     ... 40 more
> [2018-06-01 13:25:46] Log4jLoggerAdapter INFO  
> Backup(/fuseki/backups/PDE_PROD_2018-06-01_13-24-00):2{code}





[jira] [Commented] (JENA-1556) text:query multilingual enhancements

2018-06-06 Thread Bruno P. Kinoshita (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503197#comment-16503197
 ] 

Bruno P. Kinoshita commented on JENA-1556:
--

Oh, actually the selective filter is enough for te reo Māori. But for Japanese 
I think this could still be useful?

> text:query multilingual enhancements
> 
>
> Key: JENA-1556
> URL: https://issues.apache.org/jira/browse/JENA-1556
> Project: Apache Jena
>  Issue Type: New Feature
>  Components: Text
>Affects Versions: Jena 3.7.0
>Reporter: Code Ferret
>Assignee: Code Ferret
>Priority: Major
>  Labels: pull-request-available
>
> This issue proposes two related enhancements of Jena Text. These enhancements 
> have been implemented and a PR can be issued. 
> There are two multilingual search situations that we want to support:
>  # We want to be able to search in one encoding and retrieve results that may 
> have been entered in other encodings. For example, searching via Simplified 
> Chinese (Hans) and retrieving results that may have been entered in 
> Traditional Chinese (Hant) or Pinyin. This will simplify applications by 
> permitting encoding independent retrieval without additional layers of 
> transcoding and so on. It's all done under the covers in Lucene.
>  # We want to search with queries entered in a lossy, e.g., phonetic, 
> encoding and retrieve results entered with accurate encoding. For example, 
> searching via Pinyin without diacritics and retrieving all possible Hans and 
> Hant triples.
> The first situation arises when entering triples that include languages with 
> multiple encodings that for various reasons are not normalized to a single 
> encoding. In this situation we want to be able to retrieve appropriate result 
> sets without regard for the encodings used at the time that the triples were 
> inserted into the dataset.
> There are several such languages of interest in our application: Chinese, 
> Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and 
> ideographic variants.
> Encodings may not be normalized when inserting triples for a variety of reasons. 
> A principal one is that the {{rdf:langString}} object often must be entered 
> in the same encoding that it occurs in some physical text that is being 
> catalogued. Another is that metadata may be imported from sources that use 
> different encoding conventions and we want to preserve that form.
> The second situation arises as we want to provide simple support for phonetic 
> or other forms of lossy search at the time that triples are indexed directly 
> in the Lucene system.
> To handle the first situation we introduce a {{text}} assembler predicate, 
> {{text:searchFor}}, that specifies a list of language tags that provides a 
> list of language variants that should be searched whenever a query string of 
> a given encoding (language tag) is used. For example, the following 
> {{text:TextIndexLucene/text:defineAnalyzers}} fragment :
> {code:java}
> [ text:addLang "bo" ; 
>   text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ;
>   text:analyzer [ 
> a text:GenericAnalyzer ;
> text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ;
> text:params (
> [ text:paramName "segmentInWords" ;
>   text:paramValue false ]
> [ text:paramName "lemmatize" ;
>   text:paramValue true ]
> [ text:paramName "filterChars" ;
>   text:paramValue false ]
> [ text:paramName "inputMode" ;
>   text:paramValue "unicode" ]
> [ text:paramName "stopFilename" ;
>   text:paramValue "" ]
> )
> ] ; 
>   ]
> {code}
> indicates that when using a search string such as "རྡོ་རྗེ་སྙིང་"@bo the 
> Lucene index should also be searched for matches tagged as {{bo-x-ewts}} and 
> {{bo-alalc97}}.
> This is made possible by a Tibetan {{Analyzer}} that tokenizes strings in all 
> three encodings into Tibetan Unicode. This is feasible since the 
> {{bo-x-ewts}} and {{bo-alalc97}} encodings are one-to-one with Unicode 
> Tibetan. Since all fields with these language tags will have a common set of 
> indexed terms, i.e., Tibetan Unicode, it suffices to arrange for the query 
> analyzer to have access to the language tag for the query string along with 
> the various fields that need to be considered.
> Supposing that the query is:
> {code:java}
> (?s ?sc ?lit) text:query ("rje"@bo-x-ewts) 
> {code}
> Then the query formed in {{TextIndexLucene}} will be:
> {code:java}
> label_bo:rje label_bo-x-ewts:rje label_bo-alalc97:rje
> {code}
> which is translated using a suitable {{Analyzer}}, 
> 

[jira] [Commented] (JENA-1556) text:query multilingual enhancements

2018-06-06 Thread Bruno P. Kinoshita (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503181#comment-16503181
 ] 

Bruno P. Kinoshita commented on JENA-1556:
--

Great idea! 

For the first case I think it would help Japanese users too.

{quote}
For example, most Japanese emails are in ISO-2022-JP ("JIS encoding") and web 
pages in Shift-JIS and yet mobile phones in Japan usually use some form of 
Extended Unix Code.
(https://en.wikipedia.org/wiki/Japanese_language_and_computers)
{quote}

I am more interested in the second situation. I wonder whether it would work 
for Japanese and for Te Reo Māori. The word for yesterday can be written in 
Japanese as "昨日", but there are a few ways users could search for it:

* 昨日 (kanji)
* きのう (hiragana)
* キノウ (katakana [bit weird I think, but...])
* kinou (wapuro romaji)
* kinō (hepburn romaji, harder to type, but still common)
* kino
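
If each of these forms can be tokenized to a common underlying representation, 
the {{text:searchFor}} mechanism might be configured along these lines (the 
tags and the analyzer arrangement are hypothetical; a sketch only, not a 
tested configuration):

{code:java}
[ text:addLang "ja" ;
  # hypothetical: search Japanese queries across kanji, kana and romaji entries
  text:searchFor ( "ja" "ja-hira" "ja-kana" "ja-latn" )
]
{code}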

Though I am not sure whether this would work once these new 
analyzers/assemblers are added. Would it work, [~code-ferret]?

I think the same would apply to Te Reo Māori, where in GIS systems places can 
be found indexed under the English version (e.g. Taupo, the name of a town and 
of a lake) or under the Māori name(s): "Taupo" or "Taupō" for the town and 
lake, and "Taupō-nui-a-Tia", "Taupo-nui-a-Tia", "Taupō nui a Tia", etc. for 
the city.

I have never used Lucene or Elasticsearch with Japanese or Māori, but I am 
happy to help test it if there's a PR, and maybe also investigate and learn 
how it currently works in Lucene and Elasticsearch :-D

> text:query multilingual enhancements
> 
>
> Key: JENA-1556
> URL: https://issues.apache.org/jira/browse/JENA-1556
> Project: Apache Jena
>  Issue Type: New Feature
>  Components: Text
>Affects Versions: Jena 3.7.0
>Reporter: Code Ferret
>Assignee: Code Ferret
>Priority: Major
>  Labels: pull-request-available
>
> This issue proposes two related enhancements of Jena Text. These enhancements 
> have been implemented and a PR can be issued. 
> There are two multilingual search situations that we want to support:
>  # We want to be able to search in one encoding and retrieve results that may 
> have been entered in other encodings. For example, searching via Simplified 
> Chinese (Hans) and retrieving results that may have been entered in 
> Traditional Chinese (Hant) or Pinyin. This will simplify applications by 
> permitting encoding independent retrieval without additional layers of 
> transcoding and so on. It's all done under the covers in Lucene.
>  # We want to search with queries entered in a lossy, e.g., phonetic, 
> encoding and retrieve results entered with accurate encoding. For example, 
> searching via Pinyin without diacritics and retrieving all possible Hans and 
> Hant triples.
> The first situation arises when entering triples that include languages with 
> multiple encodings that for various reasons are not normalized to a single 
> encoding. In this situation we want to be able to retrieve appropriate result 
> sets without regard for the encodings used at the time that the triples were 
> inserted into the dataset.
> There are several such languages of interest in our application: Chinese, 
> Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and 
> ideographic variants.
> Encodings may not be normalized when inserting triples for a variety of reasons. 
> A principal one is that the {{rdf:langString}} object often must be entered 
> in the same encoding that it occurs in some physical text that is being 
> catalogued. Another is that metadata may be imported from sources that use 
> different encoding conventions and we want to preserve that form.
> The second situation arises as we want to provide simple support for phonetic 
> or other forms of lossy search at the time that triples are indexed directly 
> in the Lucene system.
> To handle the first situation we introduce a {{text}} assembler predicate, 
> {{text:searchFor}}, that specifies a list of language tags that provides a 
> list of language variants that should be searched whenever a query string of 
> a given encoding (language tag) is used. For example, the following 
> {{text:TextIndexLucene/text:defineAnalyzers}} fragment :
> {code:java}
> [ text:addLang "bo" ; 
>   text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ;
>   text:analyzer [ 
> a text:GenericAnalyzer ;
> text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ;
> text:params (
> [ text:paramName "segmentInWords" ;
>   text:paramValue false ]
> [ text:paramName "lemmatize" ;
>   text:paramValue true ]
> [ text:paramName "filterChars" ;
>   text:paramValue false ]
>   

Re: Towards Jena 3.8.0

2018-06-06 Thread Bruno P. Kinoshita
+1

I'm adding a post-it note to remind myself to help review the release, and 
also to go through the documentation to see if anything needs updating (I 
remember at least one page where I think I used 3.7.1 instead of 3.8 as the 
@since for a feature).


Cheers
Bruno


From: Andy Seaborne 
To: "dev@jena.apache.org"  
Sent: Wednesday, 6 June 2018 11:32 PM
Subject: Towards Jena 3.8.0



Let's look at a Jena 3.8.0 release - there are some significant new 
items in this release.

JIRA report:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311220=12343042

--- Headlines:

** JENA-632:  JSON templated SPARQL queries.
http://jena.staging.apache.org/documentation/query/generate-json-from-sparql.html

JENA-1542: Integrate Lucene index in transaction lifecycle (TDB1, TDB2).

JENA-1550: Parallel bulk loader for TDB2
http://jena.staging.apache.org/documentation/tdb2/tdb2_cmds.html

--- Dependency changes

Removed:

org.apache.xerces is no longer a dependency.
  Remove xercesImpl-2.11.0.jar
  Remove xml-apis-1.4.01.jar

Added:

Add Apache Commons Compress : commons-compress 1.17
https://lists.apache.org/thread.html/40ebcb548cd2cb6d404d150cc1c919605689cf242ae17fe9e47191b1@%3Cdev.jena.apache.org%3E

Updated:

  jsonldjava 0.11.1 ==> 0.12.0
  jackson 2.9.0 ==> 2.9.5 (addresses CVE-2018-5968)
  httpclient 4.5.3 ==> 4.5.5
  httpcore  4.4.6 ==> 4.4.9
  Shared guava update 21.0 ==> 21.1-jre

Tests:
  com.jayway.awaitility::1.7.0 ==> org.awaitility.awaitility::3.1.0
  org.objenesis:objenesis:jar: 2.1 ==> 2.6
Build:
  maven-surefire-plugin: 2.20.1 ==> 2.21.0

 System changes:

JENA-1537: Remove xerces

JENA-1525 / Christopher Johnson
Java Automatic Module Names

JENA-1524 / Christopher Johnson
Split package

Note to those repackaging Jena or doing deep system integrations:

Package "org.apache.jena.system" was split across jars.
There are now two packages:

"org.apache.jena.sys"
"org.apache.jena.system"

and "sys" contains the system service loader code.

JenaSystem.init() has migrated, with deprecated proxy, from 
"org.apache.jena.system" to "org.apache.jena.sys"

** NB ServiceLoader file change **

The ServiceLoader interface for system initialization is now:

org.apache.jena.sys.JenaSubsystemLifecycle

--- Other Changes

JENA-1544: Consistent FROM/FROM NAMED naming handling

JENA-1519: OpExt / Jeremy Coulon

JENA-1488: SelectiveFoldingFilter for jena-text


Towards Jena 3.8.0

2018-06-06 Thread Andy Seaborne
Let's look at a Jena 3.8.0 release - there are some significant new 
items in this release.


JIRA report:
https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12311220=12343042

--- Headlines:

** JENA-632:  JSON templated SPARQL queries.

http://jena.staging.apache.org/documentation/query/generate-json-from-sparql.html


JENA-1542: Integrate Lucene index in transaction lifecycle (TDB1, TDB2).

JENA-1550: Parallel bulk loader for TDB2
http://jena.staging.apache.org/documentation/tdb2/tdb2_cmds.html

--- Dependency changes

Removed:

org.apache.xerces is no longer a dependency.
  Remove xercesImpl-2.11.0.jar
  Remove xml-apis-1.4.01.jar


Added:

Add Apache Commons Compress : commons-compress 1.17

https://lists.apache.org/thread.html/40ebcb548cd2cb6d404d150cc1c919605689cf242ae17fe9e47191b1@%3Cdev.jena.apache.org%3E

Updated:

 jsonldjava 0.11.1 ==> 0.12.0
 jackson 2.9.0 ==> 2.9.5 (addresses CVE-2018-5968)
 httpclient 4.5.3 ==> 4.5.5
 httpcore  4.4.6 ==> 4.4.9
 Shared guava update 21.0 ==> 21.1-jre

Tests:
 com.jayway.awaitility::1.7.0 ==> org.awaitility.awaitility::3.1.0
 org.objenesis:objenesis:jar: 2.1 ==> 2.6
Build:
 maven-surefire-plugin: 2.20.1 ==> 2.21.0


 System changes:

JENA-1537: Remove xerces

JENA-1525 / Christopher Johnson
Java Automatic Module Names

JENA-1524 / Christopher Johnson
Split package

Note to those repackaging Jena or doing deep system integrations:

Package "org.apache.jena.system" was split across jars.
There are now two packages:

"org.apache.jena.sys"
"org.apache.jena.system"

and "sys" contains the system service loader code.

JenaSystem.init() has migrated, with deprecated proxy, from 
"org.apache.jena.system" to "org.apache.jena.sys"


** NB ServiceLoader file change **

The ServiceLoader interface for system initialization is now:

org.apache.jena.sys.JenaSubsystemLifecycle
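
For subsystem modules, this means the ServiceLoader registration file must 
be renamed to match the new package. A sketch of the layout (the module and 
class names below are illustrative only):

  META-INF/services/org.apache.jena.sys.JenaSubsystemLifecycle
  containing the implementing class name, e.g.:
  org.example.mymodule.MyModuleInit

where MyModuleInit implements the start()/stop() methods of 
org.apache.jena.sys.JenaSubsystemLifecycle.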

--- Other Changes

JENA-1544: Consistent FROM/FROM NAMED naming handling

JENA-1519: OpExt / Jeremy Coulon

JENA-1488: SelectiveFoldingFilter for jena-text


[jira] [Closed] (JENA-1541) Jena Eyeball - ant test fails with TestCase Error

2018-06-06 Thread Andy Seaborne (JIRA)


 [ 
https://issues.apache.org/jira/browse/JENA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne closed JENA-1541.
---

> Jena Eyeball - ant test fails with TestCase Error
> -
>
> Key: JENA-1541
> URL: https://issues.apache.org/jira/browse/JENA-1541
> Project: Apache Jena
>  Issue Type: Question
>  Components: Eyeball, Jena
>Reporter: Edward
>Priority: Major
>  Labels: ant, eyeball, fail, jena, test
>
> Hello,
> I'm trying to get Jena Eyeball running. I wanted to do the Tutorial on 
> [https://jena.apache.org/documentation/tools/eyeball-getting-started.html] 
> but it failed with following error:
> {code:java}
>     [junit] Testcase: warning(junit.framework.TestSuite$1):    FAILED
>     [junit] Class 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no 
> public constructor TestCase(String name) or TestCase()
>     [junit] junit.framework.AssertionFailedError: Class 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no 
> public constructor TestCase(String name) or TestCase()
>     [junit]
>     [junit]
> BUILD FAILED
> /**/Downloads/eyeball-2.3/build.xml:146: Test 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed
> {code}
> The tutorial says if the ant test doesn't pass, I should file a Jira Issue. 
> So that's what I am doing.
>  Help would be much appreciated!
>  
> Greetings,
> Edward





[jira] [Commented] (JENA-1541) Jena Eyeball - ant test fails with TestCase Error

2018-06-06 Thread Andy Seaborne (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503158#comment-16503158
 ] 

Andy Seaborne commented on JENA-1541:
-

No one has stepped forward to offer to update Eyeball, so it remains an 
unreleased codebase that has not been updated for Apache Jena releases.

With no offers of support and maintenance, all there is is the original source 
code.

I wish there were more to offer, but if no one volunteers to support and 
maintain the code, there is little that can be done. Anyone reading this is 
welcome to offer help.

The documentation has been unlinked from the tools page and the status updated.

> Jena Eyeball - ant test fails with TestCase Error
> -
>
> Key: JENA-1541
> URL: https://issues.apache.org/jira/browse/JENA-1541
> Project: Apache Jena
>  Issue Type: Question
>  Components: Eyeball, Jena
>Reporter: Edward
>Priority: Major
>  Labels: ant, eyeball, fail, jena, test
>
> Hello,
> I'm trying to get Jena Eyeball running. I wanted to do the Tutorial on 
> [https://jena.apache.org/documentation/tools/eyeball-getting-started.html] 
> but it failed with the following error:
> {code:java}
>     [junit] Testcase: warning(junit.framework.TestSuite$1):    FAILED
>     [junit] Class 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no 
> public constructor TestCase(String name) or TestCase()
>     [junit] junit.framework.AssertionFailedError: Class 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no 
> public constructor TestCase(String name) or TestCase()
>     [junit]
>     [junit]
> BUILD FAILED
> /**/Downloads/eyeball-2.3/build.xml:146: Test 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed
> {code}
> The tutorial says if the ant test doesn't pass, I should file a Jira Issue. 
> So that's what I am doing.
>  Help would be much appreciated!
>  
> Greetings,
> Edward



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (JENA-1541) Jena Eyeball - ant test fails with TestCase Error

2018-06-06 Thread Andy Seaborne (JIRA)


 [ 
https://issues.apache.org/jira/browse/JENA-1541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-1541.
-------------------------------------
Resolution: Won't Fix

> Jena Eyeball - ant test fails with TestCase Error
> -------------------------------------------------
>
> Key: JENA-1541
> URL: https://issues.apache.org/jira/browse/JENA-1541
> Project: Apache Jena
>  Issue Type: Question
>  Components: Eyeball, Jena
>Reporter: Edward
>Priority: Major
>  Labels: ant, eyeball, fail, jena, test
>
> Hello,
> I'm trying to get Jena Eyeball running. I wanted to do the Tutorial on 
> [https://jena.apache.org/documentation/tools/eyeball-getting-started.html] 
> but it failed with the following error:
> {code:java}
>     [junit] Testcase: warning(junit.framework.TestSuite$1):    FAILED
>     [junit] Class 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no 
> public constructor TestCase(String name) or TestCase()
>     [junit] junit.framework.AssertionFailedError: Class 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no 
> public constructor TestCase(String name) or TestCase()
>     [junit]
>     [junit]
> BUILD FAILED
> /**/Downloads/eyeball-2.3/build.xml:146: Test 
> com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed
> {code}
> The tutorial says if the ant test doesn't pass, I should file a Jira Issue. 
> So that's what I am doing.
>  Help would be much appreciated!
>  
> Greetings,
> Edward



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Jena Eyeball [was: JENA-1541]

2018-06-06 Thread Andy Seaborne

Done and JIRA closed "Won't fix".


On 23/05/18 15:26, ajs6f wrote:

+1.

ajs6f


On May 21, 2018, at 1:44 PM, Bruno P. Kinoshita  wrote:

Sounds good to me
+1

Sent from Yahoo Mail on Android

On Tue, 22 May 2018 at 5:08, Andy Seaborne wrote:

To pull this into a plan ...

The Eyeball code is already elsewhere (in Apache SVN - keep that link).

Proposal:

* Unlink from the tools page.
* Upgrade the notice on all Eyeball pages to "This page is historical,
for information only - there is no Apache release of Eyeball".
* Remove links to JIRA, SF distribution (broken anyway) and mention of
help/support.

PMC - OK?

 Andy

On 11/05/18 14:10, Chris Dollin wrote:

(Sorry for previous empty reply, hit wrong button)

I'm the original developer of Eyeball. It is several years since any work
has been done on it, and I suspect that it is now obsolete, that is,
that there are other tools with equivalent functionality (though I don't
know what they are).

Putting it somewhere designated "Attic" with a red label saying
"use at your own risk" seems reasonable.

Chris

On 11 May 2018 at 13:57, Chris Dollin  wrote:




On 11 May 2018 at 13:11, Andy Seaborne  wrote:


Edward found Eyeball via:

https://jena.apache.org/documentation/tools/eyeball-getting-started.html

Eyeball is mentioned:

documentation/tools/index.mdtext
documentation/tools/eyeball-guide.mdtext
documentation/tools/eyeball-getting-started.mdtext
documentation/tools/eyeball-manual.mdtext

The tools page has schemagen and eyeball on it.

eyeball-getting-started says "file a JIRA".

What should we do?
If it is not released, do we retire it from the documentation?

   Andy


On 04/05/18 11:39, Edward (JIRA) wrote:


Edward created JENA-1541:


 Summary: Jena Eyeball - ant test fails with TestCase Error
 Key: JENA-1541
 URL: https://issues.apache.org/jira/browse/JENA-1541
 Project: Apache Jena
 Issue Type: Question
 Components: Eyeball, Jena
   Reporter: Edward


Hello,

I'm trying to get Jena Eyeball running. I wanted to do the Tutorial on
[https://jena.apache.org/documentation/tools/eyeball-getting-started.html] but it failed with the following error:
{code:java}
    [junit] Testcase: warning(junit.framework.TestSuite$1):    FAILED
    [junit] Class com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no public constructor TestCase(String name) or TestCase()
    [junit] junit.framework.AssertionFailedError: Class com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector has no public constructor TestCase(String name) or TestCase()
    [junit]
    [junit]

BUILD FAILED
/**/Downloads/eyeball-2.3/build.xml:146: Test com.hp.hpl.jena.eyeball.inspectors.test.TestMoreOwlSyntaxInspector failed
{code}
The tutorial says if the ant test doesn't pass, I should file a Jira
Issue. So that's what I am doing.
Help would be much appreciated!


Greetings,

Edward



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)





--
"What I don't understand is this ..."  Trevor Chaplin, /The Beiderbeck
Affair/

Epimorphics Ltd, http://www.epimorphics.com
Registered address: Court Lodge, 105 High Street, Portishead, Bristol BS20
6PT
Epimorphics Ltd. is a limited company registered in England (number
7016688)











[jira] [Resolved] (JENA-1552) Bulk loader for TDB2 (phased loading)

2018-06-06 Thread Andy Seaborne (JIRA)


 [ 
https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andy Seaborne resolved JENA-1552.
-------------------------------------
   Resolution: Done
Fix Version/s: Jena 3.8.0

> Bulk loader for TDB2 (phased loading)
> -------------------------------------
>
> Key: JENA-1552
> URL: https://issues.apache.org/jira/browse/JENA-1552
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: TDB2
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
> Fix For: Jena 3.8.0
>
>
> Following on from JENA-1550, this ticket is for phased loading, which combines 
> features of the sequential loader and the parallel loader.
> When building all the persistent datastructures (parallel loader), the work 
> on different indexes at the same time is competing for hardware resources, 
> RAM and I/O bandwidth.  As the size to load grows, this becomes a noticeable 
> slowdown.
> The sequential loader is the other extreme of the design spectrum. It does 
> work on one index at a time so as to maximize caching efficiency.
> Phased loading has parallel operation per phase and splits work into subsets 
> of indexes.
> At 200m and loading to rotational disk, an experimental phased loader working 
> with 2 indexes at a time, starts to become faster than parallel on the same 
> hardware as used for the [figures in 
> JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] 
> (57K parallel, 70K phased).
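
The phase scheme described above can be sketched in a few lines. This is a conceptual illustration only, not the actual TDB2 loader code: indexes are split into small groups, work inside a group runs in parallel, and groups run one after another, so only a bounded number of indexes compete for RAM and I/O bandwidth at any moment.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Conceptual sketch of phased loading (illustrative, not TDB2 code).
public class PhasedLoadSketch {

    // Split the index list into consecutive groups of at most perPhase entries.
    public static List<List<String>> phases(List<String> indexes, int perPhase) {
        List<List<String>> out = new ArrayList<>();
        for (int i = 0; i < indexes.size(); i += perPhase) {
            out.add(indexes.subList(i, Math.min(i + perPhase, indexes.size())));
        }
        return out;
    }

    public static void load(List<String> indexes, int perPhase) throws InterruptedException {
        for (List<String> phase : phases(indexes, perPhase)) {
            List<Thread> workers = new ArrayList<>();
            for (String index : phase) {
                Thread t = new Thread(() -> buildIndex(index));  // parallel within a phase
                t.start();
                workers.add(t);
            }
            for (Thread t : workers) {
                t.join();                    // complete the phase before starting the next
            }
        }
    }

    static void buildIndex(String name) {    // stand-in for the real index build
        System.out.println("building " + name);
    }

    public static void main(String[] args) throws InterruptedException {
        // TDB2-style index names, grouped two at a time as in the experiment
        load(Arrays.asList("SPO", "POS", "OSP", "GSPO", "GPOS", "GOSP"), 2);
    }
}
```

With `perPhase = 1` this degenerates to the sequential loader, and with `perPhase = indexes.size()` to the fully parallel loader, which is why the phase size is the tuning knob between cache efficiency and parallelism.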



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[GitHub] jena pull request #426: JENA-1552: Phased loader

2018-06-06 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/426


---


[jira] [Commented] (JENA-1552) Bulk loader for TDB2 (phased loading)

2018-06-06 Thread ASF subversion and git services (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503132#comment-16503132
 ] 

ASF subversion and git services commented on JENA-1552:
-------------------------------------------------------

Commit 2934c5506f9caa237868fbbb5aabf247106ec16b in jena's branch 
refs/heads/master from [~an...@apache.org]
[ https://git-wip-us.apache.org/repos/asf?p=jena.git;h=2934c55 ]

JENA-1552: Phased loader


> Bulk loader for TDB2 (phased loading)
> -------------------------------------
>
> Key: JENA-1552
> URL: https://issues.apache.org/jira/browse/JENA-1552
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: TDB2
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>
> Following on from JENA-1550, this ticket is for phased loading, which combines 
> features of the sequential loader and the parallel loader.
> When building all the persistent datastructures (parallel loader), the work 
> on different indexes at the same time is competing for hardware resources, 
> RAM and I/O bandwidth.  As the size to load grows, this becomes a noticeable 
> slowdown.
> The sequential loader is the other extreme of the design spectrum. It does 
> work on one index at a time so as to maximize caching efficiency.
> Phased loading has parallel operation per phase and splits work into subsets 
> of indexes.
> At 200m and loading to rotational disk, an experimental phased loader working 
> with 2 indexes at a time, starts to become faster than parallel on the same 
> hardware as used for the [figures in 
> JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] 
> (57K parallel, 70K phased).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JENA-1552) Bulk loader for TDB2 (phased loading)

2018-06-06 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503133#comment-16503133
 ] 

ASF GitHub Bot commented on JENA-1552:
--------------------------------------

Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/426


> Bulk loader for TDB2 (phased loading)
> -------------------------------------
>
> Key: JENA-1552
> URL: https://issues.apache.org/jira/browse/JENA-1552
> Project: Apache Jena
>  Issue Type: Improvement
>  Components: TDB2
>Reporter: Andy Seaborne
>Assignee: Andy Seaborne
>Priority: Major
>
> Following on from JENA-1550, this ticket is for phased loading, which combines 
> features of the sequential loader and the parallel loader.
> When building all the persistent datastructures (parallel loader), the work 
> on different indexes at the same time is competing for hardware resources, 
> RAM and I/O bandwidth.  As the size to load grows, this becomes a noticeable 
> slowdown.
> The sequential loader is the other extreme of the design spectrum. It does 
> work on one index at a time so as to maximize caching efficiency.
> Phased loading has parallel operation per phase and splits work into subsets 
> of indexes.
> At 200m and loading to rotational disk, an experimental phased loader working 
> with 2 indexes at a time, starts to become faster than parallel on the same 
> hardware as used for the [figures in 
> JENA-1550|https://issues.apache.org/jira/browse/JENA-1550#comment-16484269] 
> (57K parallel, 70K phased).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (JENA-1556) text:query multilingual enhancements

2018-06-06 Thread Osma Suominen (JIRA)


[ 
https://issues.apache.org/jira/browse/JENA-1556?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16503116#comment-16503116
 ] 

Osma Suominen commented on JENA-1556:
-------------------------------------

Whoa, sounds like quite advanced functionality for jena-text! Do you know if 
there is anything similar in Solr or Elasticsearch? ISTR that at least with 
Solr you can define fields that are based on transformations of other fields.

I don't object to this if you're willing to do a PR, of course :)

> text:query multilingual enhancements
> ------------------------------------
>
> Key: JENA-1556
> URL: https://issues.apache.org/jira/browse/JENA-1556
> Project: Apache Jena
>  Issue Type: New Feature
>  Components: Text
>Affects Versions: Jena 3.7.0
>Reporter: Code Ferret
>Assignee: Code Ferret
>Priority: Major
>  Labels: pull-request-available
>
> This issue proposes two related enhancements of Jena Text. These enhancements 
> have been implemented and a PR can be issued. 
> There are two multilingual search situations that we want to support:
>  # We want to be able to search in one encoding and retrieve results that may 
> have been entered in other encodings. For example, searching via Simplified 
> Chinese (Hans) and retrieving results that may have been entered in 
> Traditional Chinese (Hant) or Pinyin. This will simplify applications by 
> permitting encoding independent retrieval without additional layers of 
> transcoding and so on. It's all done under the covers in Lucene.
>  # We want to search with queries entered in a lossy, e.g., phonetic, 
> encoding and retrieve results entered with accurate encoding. For example, 
> searching via Pinyin without diacritics and retrieving all possible Hans and 
> Hant triples.
> The first situation arises when entering triples that include languages with 
> multiple encodings that for various reasons are not normalized to a single 
> encoding. In this situation we want to be able to retrieve appropriate result 
> sets without regard for the encodings used at the time that the triples were 
> inserted into the dataset.
> There are several such languages of interest in our application: Chinese, 
> Tibetan, Sanskrit, Japanese and Korean. There are various Romanizations and 
> ideographic variants.
> Encodings may not be normalized when inserting triples for a variety of reasons. 
> A principal one is that the {{rdf:langString}} object often must be entered 
> in the same encoding that it occurs in some physical text that is being 
> catalogued. Another is that metadata may be imported from sources that use 
> different encoding conventions and we want to preserve that form.
> The second situation arises as we want to provide simple support for phonetic 
> or other forms of lossy search at the time that triples are indexed directly 
> in the Lucene system.
> To handle the first situation we introduce a {{text}} assembler predicate, 
> {{text:searchFor}}, that specifies a list of language tags that provides a 
> list of language variants that should be searched whenever a query string of 
> a given encoding (language tag) is used. For example, the following 
> {{text:TextIndexLucene/text:defineAnalyzers}} fragment:
> {code:java}
> [ text:addLang "bo" ; 
>   text:searchFor ( "bo" "bo-x-ewts" "bo-alalc97" ) ;
>   text:analyzer [ 
> a text:GenericAnalyzer ;
> text:class "io.bdrc.lucene.bo.TibetanAnalyzer" ;
> text:params (
> [ text:paramName "segmentInWords" ;
>   text:paramValue false ]
> [ text:paramName "lemmatize" ;
>   text:paramValue true ]
> [ text:paramName "filterChars" ;
>   text:paramValue false ]
> [ text:paramName "inputMode" ;
>   text:paramValue "unicode" ]
> [ text:paramName "stopFilename" ;
>   text:paramValue "" ]
> )
> ] ; 
>   ]
> {code}
> indicates that when using a search string such as "རྡོ་རྗེ་སྙིང་"@bo the 
> Lucene index should also be searched for matches tagged as {{bo-x-ewts}} and 
> {{bo-alalc97}}.
> This is made possible by a Tibetan {{Analyzer}} that tokenizes strings in all 
> three encodings into Tibetan Unicode. This is feasible since the 
> {{bo-x-ewts}} and {{bo-alalc97}} encodings are one-to-one with Unicode 
> Tibetan. Since all fields with these language tags will have a common set of 
> indexed terms, i.e., Tibetan Unicode, it suffices to arrange for the query 
> analyzer to have access to the language tag for the query string along with 
> the various fields that need to be considered.
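> The field expansion this implies can be sketched as follows (an illustration of the idea, not the actual {{TextIndexLucene}} implementation; the {{"<baseField>_<langTag>"}} field-naming convention is an assumption for the example):
> {code:java}
```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch of the expansion implied by text:searchFor (illustrative only):
// a query tagged "bo" is run against the indexed field of every variant
// tag listed for "bo". Field names follow an assumed "<base>_<tag>" scheme.
public class SearchForExpansion {

    static final Map<String, List<String>> SEARCH_FOR = new LinkedHashMap<>();
    static {
        SEARCH_FOR.put("bo", Arrays.asList("bo", "bo-x-ewts", "bo-alalc97"));
    }

    public static List<String> fieldsFor(String baseField, String langTag) {
        List<String> fields = new ArrayList<>();
        // Fall back to the query's own tag when no searchFor list is declared.
        for (String tag : SEARCH_FOR.getOrDefault(langTag, Arrays.asList(langTag))) {
            fields.add(baseField + "_" + tag);
        }
        return fields;
    }

    public static void main(String[] args) {
        // A search string tagged @bo is expanded to all three variant fields.
        System.out.println(fieldsFor("label", "bo"));
        // prints [label_bo, label_bo-x-ewts, label_bo-alalc97]
    }
}
```
> {code}
> Because all three encodings tokenize to the same Tibetan Unicode terms, a single query analyzed once can be matched against each expanded field.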
> Supposing that the query is:
> {code:java}
> (?s ?sc ?lit) text:query ("rje"@bo-x-ewts) 
> {code}
> Then the query formed in {{TextIndexLucene}}