[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-113273093

@afs, as with the other doc, I put it in a gist: https://gist.github.com/amiara514/ef839adb7c5cb9dd697d

It covers only the "Deletion of Indexed Entities" section of text-query.mdtext.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-112409094

Yes, absolutely, you can merge it... before another conflict ;-)
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-110457559

Done. I'll update the "Deletion of Indexed Entities" part of the jena-text doc once the PR is merged.
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-110094664

I reorganized the tests.
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-109986517

Hi, the PR is mergeable again after fixing the conflicts with #72.
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-110026337

OK, I see. I will add a similar graph-specific case for deletion support.

One question about graph indexing. The jena-text documentation says:

> This allows for more efficient text queries when the query targets only a single named graph.

But there's no example of using this (even in the tests).
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-110032217

@osma oops, forget my message.
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-108383450

I have created a gist instead: https://gist.github.com/amiara514/262d73ed35580b7dbdfe
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-108376304

OK, which address?
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-108118228

I sent it from the edit page (via "Improve this Page") with anonymous access. By default, this posts to dev@jena.apache.org. Is there another way to send the file?
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-107989314

@afs: done. I modified text-query.mdtext and sent it to the dev list.
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-107572459

> The direct reflection of site/trunk is http://jena.staging.apache.org/ and the button should work there as well.

Hmm, this link leads to the same wrong place... For example, the section "Configuring an Analyzer" should end with the explanation about specifying a query analyzer ("New in Jena 2.13.0 is the optional ability to specify..."), as in the online version, but it doesn't.
[GitHub] jena pull request: Add (?uri ?score) to jena-text
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/72#issuecomment-107581689

> Maybe @amiara514 would have a comment as I understood he is using jena-text via Java code?

@osma: Not exactly. I execute SPARQL queries via Java code that involve jena-text; I don't manipulate it at that level.
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-106848124

@osma The status is still pending. OK, I will make the minor fixes to provide a re-mergeable version.

1. Yes, maybe the unique ID could be configurable. The feature would then be a kind of "enable deletion mode" switch. Do you agree?
2. No problem changing it.
3. I will take a look.
[GitHub] jena pull request: Update TextDatasetFactory.java
GitHub user amiara514 opened a pull request: https://github.com/apache/jena/pull/74

Update TextDatasetFactory.java

Reintroducing previous static methods for backward compatibility

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amiara514/jena patch-1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/74.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #74

commit 8b9c0ffb39bd6b6f4df8f7c359491cde891e1788
Author: Alexis Miara alexis_mi...@hotmail.com
Date: 2015-05-28T17:15:08Z

Update TextDatasetFactory.java Reintroducing previous static methods for backward compatibility
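Reintroducing old static methods usually means keeping the previous signatures and having them delegate to the new configuration-based entry point, so existing callers still compile. A minimal sketch of that pattern follows; `TextDatasetFactorySketch`, `IndexConfig`, and `createIndex` are hypothetical names, not the real `TextDatasetFactory` API, and the returned string merely stands in for a real index object.

```java
import java.util.Objects;

// Hypothetical sketch: names are illustrative, NOT the real jena-text API.
final class TextDatasetFactorySketch {

    /** Hypothetical config object standing in for the newer configuration style. */
    static final class IndexConfig {
        final String directory;
        final String analyzer;

        IndexConfig(String directory, String analyzer) {
            this.directory = Objects.requireNonNull(directory, "directory");
            this.analyzer = analyzer; // null means "use the default analyzer"
        }
    }

    /** Newer entry point: all settings travel in one config object. */
    static String createIndex(IndexConfig config) {
        String analyzer = (config.analyzer == null) ? "StandardAnalyzer" : config.analyzer;
        return "index@" + config.directory + " using " + analyzer;
    }

    /**
     * Older signature kept for backward compatibility: existing callers still
     * compile, and the work is delegated to the config-based method.
     */
    @Deprecated
    static String createIndex(String directory, String analyzer) {
        return createIndex(new IndexConfig(directory, analyzer));
    }

    public static void main(String[] args) {
        System.out.println(createIndex("/tmp/lucene", null));
    }
}
```

Marking the old overload `@Deprecated` signals the preferred path without breaking code compiled against the earlier release.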
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-106037240

Hi Andy, I'm working on the documentation for the linguistic features. The current doc has version references such as "Starting with version 1.0.1, jena-text..." or "New in Jena 2.13.0 is the...". Should I introduce my new changes with a reference to the upcoming Jena 3.0.0 release?
[GitHub] jena pull request: Update TextQueryPF.java
GitHub user amiara514 opened a pull request: https://github.com/apache/jena/pull/73

Update TextQueryPF.java

A simple warning insertion about the comment: https://github.com/apache/jena/pull/64#issuecomment-105229534

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amiara514/jena patch-1

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/73.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #73

commit 1503c2597cd0d3294a2295699d71d1b5ae31e9f1
Author: Alexis Miara alexis_mi...@hotmail.com
Date: 2015-05-26T19:15:14Z

Update TextQueryPF.java A simple warning insertion about the comment: https://github.com/apache/jena/pull/64#issuecomment-105229534
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-105237062

Everything seems OK with the merge, thanks. I'll try to write the doc by the end of the week. A question about this: it will only go online once Jena 3 is available, right?
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-105237426

> I played a bit with this code. I noticed one potential issue: if I do a language specific query with a text:query argument such as 'lang:en', but there is no langField set for the index, the query parameter will just be silently ignored. Would it be better to return some kind of query error instead?

The documentation may be sufficient, no?
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-104762759

I see the deletions in /64.patch:

Date: Tue, 19 May 2015 14:41:32 -0400
Subject: [PATCH 3/3] langField implementation to store lang tags of literals + refactoring growing methods of TextDatasetFactory
...
delete mode 100644 jena-text/src/main/java/org/apache/jena/query/text/LuceneUtil.java
create mode 100644 jena-text/src/main/java/org/apache/jena/query/text/TextIndexConfig.java
create mode 100644 jena-text/src/main/java/org/apache/jena/query/text/analyzer/Util.java
delete mode 100644 jena-text/src/main/java/org/apache/jena/query/text/assembler/TextIndexLuceneMultilingualAssembler.java
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-104720469

> I tried applying it as a patch and TextIndexLuceneMultilingualAssembler uses non-existent `TextDatasetFactory.createLuceneIndexMultilingual`. Is this PR dependent on another?

Strange, `TextIndexLuceneMultilingualAssembler` has been deleted... `LuceneUtil` too. Which source code branch are you using?

> Would it be possible to put back EntityDefinition for maximum compatibility with the original code?

OK, I'll do it soon.

> There is document at query/text-query. In what way should that be updated?

Should I post the explanatory paragraphs in this discussion or elsewhere? It will be a reformatting of the [dev mail](http://mail-archives.apache.org/mod_mbox/jena-dev/201505.mbox/raw/%3CBLU181-W56D79F6BA6AC7103317BC6ECC20%40phx.gbl%3E/1/2).
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-104257598

As @osma said, there's no direct relation between those PRs (apart from both touching jena-text). Both were mixed in the previous proposal; I split them into two distinct PRs to make them easier to understand.

[#53](https://github.com/apache/jena/pull/53) is a small patch that cleans the Lucene index on SPARQL triple deletion (for triples with literal objects). [#64](https://github.com/apache/jena/pull/64) is a larger one that enables a linguistic index.

A summary of #64 has been posted in another thread on the dev mailing list (2015-05-20). See details [here](http://mail-archives.apache.org/mod_mbox/jena-dev/201505.mbox/raw/%3CBLU181-W56D79F6BA6AC7103317BC6ECC20%40phx.gbl%3E/1/2).
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-103661706

1. Not exactly. Suppose langField is configured with the name `language`. If the lang argument in the SPARQL query is `none`, the query extension for Lucene will be:

   ```java
   String qs2 = "-language:*";
   ```

   with a leading minus to exclude documents that have a value in that field.
2. OK, I will submit it to the dev list. Is there a required format for the message?

PS: I will also mention the refactoring to get their advice on it.
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-103635153

Languages can now be stored in the index by providing a langField param to EntityDefinition (in the assembler and in Java code). So you can use a TextIndexLucene with your own analyzer and target the language of localized literals.

Three cases in SPARQL clauses:

- `?s text:query (rdfs:label 'word' 'lang:en')` targets English literals
- `?s text:query (rdfs:label 'word' 'lang:none')` targets unlocalized literals
- `?s text:query (rdfs:label 'word')` ignores language

NOTE: in SPARQL queries, `lang` is a predefined keyword, and the Lucene query will be mapped to the configured langField name.

Moreover, a TextIndexConfig class has been introduced and EntityDefinition refactored to simplify TextDatasetFactory, as requested. LuceneUtil has moved to a Util class in the analyzer package; it is still used by LocalizedAnalyzerAssembler.
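The three cases above amount to appending (or not) one extra clause on the language field. A minimal sketch, assuming a hypothetical `LangQuerySketch.buildQuery` helper and a `+(...)` query shape of our own choosing; it is not the jena-text code. It follows the `-language:*` idea for `lang:none`, and it throws when a lang argument arrives without a configured langField, which is the alternative to silently ignoring it. Depending on how the string is parsed (for example, whether leading wildcards are allowed), `language:*` may need special handling.

```java
// Hypothetical sketch: turn an optional 'lang:xx' argument into an extra
// Lucene clause. Names and query shape are assumptions, not jena-text code.
final class LangQuerySketch {

    /**
     * @param userQuery the text query from text:query, e.g. "word"
     * @param langArg   the raw extra argument, e.g. "lang:en", "lang:none", or null
     * @param langField the configured langField name, or null if none is configured
     */
    static String buildQuery(String userQuery, String langArg, String langField) {
        if (langArg == null) {
            return userQuery; // no lang argument: language is ignored
        }
        if (!langArg.startsWith("lang:")) {
            throw new IllegalArgumentException("Unexpected argument: " + langArg);
        }
        if (langField == null) {
            // Fail loudly instead of silently ignoring the argument.
            throw new IllegalStateException("lang argument given but no langField is configured");
        }
        String lang = langArg.substring("lang:".length());
        String base = "+(" + userQuery + ")";
        if (lang.equals("none")) {
            // Unlocalized literals: exclude every document that has a language value.
            return base + " -" + langField + ":*";
        }
        // Localized literals: require the given language value.
        return base + " +" + langField + ":" + lang;
    }

    public static void main(String[] args) {
        System.out.println(buildQuery("word", "lang:en", "language"));   // +(word) +language:en
        System.out.println(buildQuery("word", "lang:none", "language")); // +(word) -language:*
        System.out.println(buildQuery("word", null, "language"));        // word
    }
}
```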
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-102398852

Yes, I completely agree; otherwise TextDatasetFactory will be difficult to maintain. And TextIndexLuceneMultilingualAssembler would no longer be needed either. TextIndexConfiguration should cover analyzer, queryAnalyzer, multilingual, etc., while graphField and langField belong to EntityDefinition, no? Moreover, to avoid the same ever-growing constructors, EntityDefinition should be configurable with setters/getters.

Just a question about the usual workflow: who decides whether or not to integrate proposed code?
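The setter-based configuration proposed above keeps the required fields in the constructor and moves optional ones (graphField, langField) to setters, so adding an option does not add a constructor overload. A minimal sketch; `EntityDefinitionSketch` is hypothetical and not jena-text's `EntityDefinition`, and having setters return `this` (for chaining) is a design choice, not a requirement.

```java
// Hypothetical sketch: required fields via the constructor, optional fields
// via setters, so new options do not require new constructor overloads.
final class EntityDefinitionSketch {
    private final String entityField;   // required: field holding the entity URI
    private final String primaryField;  // required: default text field
    private String graphField;          // optional, null when unused
    private String langField;           // optional, null when unused

    EntityDefinitionSketch(String entityField, String primaryField) {
        this.entityField = entityField;
        this.primaryField = primaryField;
    }

    // Setters return this so optional settings can be chained.
    EntityDefinitionSketch setGraphField(String graphField) {
        this.graphField = graphField;
        return this;
    }

    EntityDefinitionSketch setLangField(String langField) {
        this.langField = langField;
        return this;
    }

    String getEntityField()  { return entityField; }
    String getPrimaryField() { return primaryField; }
    String getGraphField()   { return graphField; }
    String getLangField()    { return langField; }

    public static void main(String[] args) {
        EntityDefinitionSketch def = new EntityDefinitionSketch("uri", "text").setLangField("lang");
        System.out.println(def.getLangField() + " " + def.getGraphField());
    }
}
```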
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-102412893

> As it turns out the TextIndexConfiguration class I suggested will only be concerned with analyzer configuration, maybe it could be called TextIndexAnalyzerConfiguration or something like that.

I prefer TextIndexConfiguration; it will make it easier to add configuration parameters in the future.

Thanks for the explanation. About `lang:xx`, I think extra params should be generalized in the same manner (`limit:10`, `score:x`, ...). That would make params optional and remove the order and count constraints.
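Generalizing the extra params to `key:value` form means they can be parsed into a map in any order, with absent keys falling back to defaults. A minimal sketch, assuming the query string itself has already been separated out (so only the extra arguments are passed in); `TextQueryArgsSketch.parseOptions` is a hypothetical helper, not part of jena-text.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: parse optional "key:value" arguments in any order.
// Assumes the text query string has already been removed from the list.
final class TextQueryArgsSketch {

    static Map<String, String> parseOptions(List<String> args) {
        Map<String, String> options = new LinkedHashMap<>();
        for (String arg : args) {
            int colon = arg.indexOf(':');
            // Reject arguments with an empty key or an empty value.
            if (colon <= 0 || colon == arg.length() - 1) {
                throw new IllegalArgumentException("Expected key:value but got: " + arg);
            }
            String key = arg.substring(0, colon);
            String value = arg.substring(colon + 1);
            if (options.putIfAbsent(key, value) != null) {
                throw new IllegalArgumentException("Duplicate option: " + key);
            }
        }
        return options;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parseOptions(List.of("limit:10", "lang:en"));
        // Missing options fall back to defaults chosen by the caller.
        int limit = Integer.parseInt(opts.getOrDefault("limit", "-1"));
        System.out.println(opts.get("lang") + " " + limit);
    }
}
```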
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-102207391

OK, it shouldn't be a big job. I'll take a look soon. The multilingual analyzer depends on the language, so the lang must be stored anyway (as you said in point 2). So should the langField param then be ignored, or should an exception be raised to flag the missing field?

Another point: in the current version, I store an `undef` value rather than an empty one for unlocalized literals, because querying an empty field is not straightforward in Lucene and I want to be able to search unlocalized values explicitly. In your case, I don't think it will be necessary.
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/64#issuecomment-101808418

OK, I will separate it so it has a distinct behavior. Removing the isMultilingual() method will probably require switching some private fields and/or methods to protected.

> So you should never see a literal such as `"book"@eng` in well-formed RDF data.

Yes, unfortunately. It was meant to accept this case. OK, if a strict implementation of the standards is sufficient, I will bypass the 3-to-2-letter conversion.
[GitHub] jena pull request: Jena-text multilingual alternative implementati...
GitHub user amiara514 opened a pull request: https://github.com/apache/jena/pull/64

Jena-text multilingual alternative implementation

This proposal is an alternative to [pull52](https://github.com/apache/jena/pull/52) (JENA-928). It uses a single index that stores the language as a new field entry.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena jena-text-ml-single-index

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/jena/pull/64.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #64

commit 9553c6b2c246bc9c05906096c1f56d65ba15fed8
Author: Alexis Miara alexis_mi...@hotmail.com
Date: 2015-05-13T15:23:56Z

Implementation of jena-text multilingual with a single index
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-101331810

Hi, I'm not against your suggestion; it's probably easier to deal with one index. So I wrote some positive tests that cover the points discussed previously (indexing a lang value and querying on it).

But one problem remains: we need to set the indexing analyzer dynamically on each triple addition, since each triple may have a different language. I don't think it's possible to change it on the fly: the IndexWriter config is set at startup, and the lock mechanism prevents changing it...
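Whatever mechanism ends up passing the analyzer to the writer, each added literal first needs its analyzer chosen from its language tag. A minimal sketch of that lookup only: `AnalyzerSelectorSketch` and its three-language table are hypothetical, and it returns Lucene class names as strings so it runs without Lucene on the classpath. It uses the primary subtag only (`en-GB` becomes `en`) and does no 3-to-2-letter conversion, so a tag like `eng` falls back to the default.

```java
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: pick an analyzer per literal from its language tag.
// Returns class names only, so it runs without Lucene on the classpath.
final class AnalyzerSelectorSketch {
    private static final String DEFAULT =
        "org.apache.lucene.analysis.standard.StandardAnalyzer";

    // Illustrative subset; a real table would cover more languages.
    private static final Map<String, String> BY_LANGUAGE = Map.of(
        "en", "org.apache.lucene.analysis.en.EnglishAnalyzer",
        "fr", "org.apache.lucene.analysis.fr.FrenchAnalyzer",
        "de", "org.apache.lucene.analysis.de.GermanAnalyzer");

    static String analyzerFor(String langTag) {
        if (langTag == null || langTag.isEmpty()) {
            return DEFAULT; // unlocalized literal
        }
        // Primary subtag only: "en-GB" -> "en". No 3-to-2 letter mapping.
        String primary = langTag.split("-", 2)[0].toLowerCase(Locale.ROOT);
        return BY_LANGUAGE.getOrDefault(primary, DEFAULT);
    }

    public static void main(String[] args) {
        System.out.println(analyzerFor("en-GB"));
        System.out.println(analyzerFor(""));
    }
}
```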
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-101387550

Thanks a lot, I had not seen this method. It seems to work, and all tests pass. Should I propose the OneIndex branch in a new pull request?
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-100941772

Some new tests have been pushed. About the implementation: would your proposal use a StandardAnalyzer for indexing and a localized queryAnalyzer for queries?
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-101035985

> But since there is only a relatively small number of Lucene analyzers anyway, maybe this is OK.

That's why it's done this way :-)

> No, that wouldn't work. You have to use the same analyzer for both indexing and queries (in this case, the language-specific analyzer), otherwise the tokens won't match.

Exactly.

> But I think it should still be possible to share the same index, if you have a field that specifies the language and make sure to target your queries only to the specific language.

Storing the language as an extra field is easy to do during document creation (in the addEntity method). Adding an extra param to queries is not a problem either (done in my solution). But how should the existing code be changed so that Lucene queries take that extra language field into account?
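On the indexing side, storing the language during document creation means adding one more field when the literal carries a language tag. A minimal sketch, with the document modelled as a plain map; `EntityDocSketch.toDocument` and the `uri` field name are hypothetical. Here unlocalized literals get no language field at all, which is what a `-language:*` exclusion relies on; the earlier variant described in this thread stored an `undef` value instead.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: the fields of one index document for one literal,
// with a language field added only when the literal has a language tag.
final class EntityDocSketch {

    static Map<String, String> toDocument(String subjectUri, String field, String text,
                                          String langTag, String langField) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("uri", subjectUri); // assumed name of the entity field
        doc.put(field, text);       // the text to be analyzed and indexed
        if (langField != null && langTag != null && !langTag.isEmpty()) {
            // Language tags are case-insensitive, so store one canonical case.
            doc.put(langField, langTag.toLowerCase(Locale.ROOT));
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(toDocument("http://example.org/s", "label", "book", "EN", "language"));
        System.out.println(toDocument("http://example.org/s", "label", "book", null, "language"));
    }
}
```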
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-99979270

Hi, with the latest proposal:

1) It's now possible to set up multilingual indexing via the assembler configuration file, by declaring the multilingual class and using it in the index definition:

```turtle
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset rdfs:subClassOf ja:RDFDataset .
#text:TextIndexLucene rdfs:subClassOf text:TextIndex .
text:TextIndexLuceneMultilingual rdfs:subClassOf text:TextIndex .

<#indexLucene> a text:TextIndexLuceneMultilingual ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .
```

This multilingual index handles all localized literals automatically, using the corresponding Lucene localized analyzers.

2) Moreover, with a default Lucene index setup, a localized analyzer can be specified (as for SimpleAnalyzer, KeywordAnalyzer, etc.) with this config:

```turtle
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:entityMap <#entMap> ;
    text:queryAnalyzer [
        a text:LocalizedAnalyzer ;
        text:language "en"
    ] .
```

Reference: JENA-928
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-99072949

> Ah, I see. But this still doesn't help for cases where there are small differences between literals within the same language, for example singular/plural forms that get stemmed by the analyzer, or variations in capitalization.

That's exactly it! So I pushed the hash solution, which covers all the previous cases. For the other issue (with conjunctive queries), maybe deletion has to be handled with an updateDocument?
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-98744341

> I'm curious, how would the multi language proposal help with this problem?

The multilingual index dynamically manages one index per language, so two identical literals with different languages are not stored in the same index.

The hash solution works fine with SHA-1. It adds one more field per document, but I don't think that's a problem for the final index size. Should I commit it?

I don't know either whether it could interfere with the conjunctive feature. However, addEntity interacts with updateEntity, and entries still correspond to triples/quads, don't they?
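The hash solution gives each indexed triple a stable ID computed from its exact, unanalyzed parts, so deleting a triple can target exactly one document even when stemming or case folding would make several literals look alike to the analyzer. A minimal sketch of computing such an ID with the JDK's `MessageDigest`; `TripleHashSketch.tripleId` is hypothetical, and including only subject, predicate, lexical form, and language tag (no datatype) is an assumption. Each part is length-prefixed so different triples cannot concatenate to the same input.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical sketch: a stable SHA-1 ID for one triple with a literal object,
// stored as an extra field so deletion can remove exactly one document.
final class TripleHashSketch {

    static String tripleId(String subject, String predicate, String lexicalForm, String langTag) {
        // Length-prefix every part ("5:hello") so the encoding is unambiguous.
        StringBuilder key = new StringBuilder();
        for (String part : new String[] {subject, predicate, lexicalForm, langTag == null ? "" : langTag}) {
            key.append(part.length()).append(':').append(part);
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(key.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        String s = "http://example.org/s";
        String p = "http://www.w3.org/2000/01/rdf-schema#label";
        System.out.println(tripleId(s, p, "Book", "en"));
        System.out.println(tripleId(s, p, "books", "en")); // different exact form, different ID
    }
}
```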
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-98183195

Hi Andy, The new Jena 3 package format is fine; the last commit already deals with it. OK, I'll write some tests soon. Two questions about the documentation, to be sure:
1) Are we talking about this page: https://jena.apache.org/documentation/query/text-query.html ?
2) Is there a special place to write it, or should I write a paragraph in this conversation?
PS: I don't need a dedicated branch for the moment, thanks.
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
GitHub user amiara514 opened a pull request: https://github.com/apache/jena/pull/53

Lucene index synchro on triple deletion (jena-text)

Hi,
This code synchronizes the Lucene index (from jena-text) when a triple is removed from the associated graph. It is based on a trick for exact matching in Lucene. See it [here](http://blogs.perl.org/users/mark_leighton_fisher/2012/01/stupid-lucene-tricks-exact-match-starts-with-ends-with.html).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena upstream/jena-text-triple-deletion

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/53.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #53

commit a052b6d26c62a218c516aa64621d2505040be30a Author: Alexis Miara alexis_mi...@hotmail.com Date: 2015-04-29T18:52:16Z Lucene index synchro on triple deletion on jena-text
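The general idea behind sentinel-token exact matching, as we understand it (the linked post may differ in details), is to wrap each indexed value in begin/end marker tokens and then require a phrase match that includes both markers. The sketch below models tokens as Java lists instead of using Lucene; the marker strings and the whitespace/lowercase tokenizer are assumptions. It also shows the limitation raised later in the thread: values that differ only in case still match, because the analyzer has already discarded that difference.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: exact match via begin/end sentinel tokens, modelled with token lists.
final class SentinelExactMatch {
    // Hypothetical marker tokens; they must never occur in real text.
    static final String BEGIN = "xxbeginxx";
    static final String END = "xxendxx";

    // Toy tokenizer standing in for an analyzer: lowercase, split on whitespace.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Tokens as they would be indexed for one field value: BEGIN, tokens..., END.
    static List<String> indexTokens(String fieldValue) {
        List<String> out = new ArrayList<>();
        out.add(BEGIN);
        out.addAll(tokenize(fieldValue));
        out.add(END);
        return out;
    }

    // Exact match = the phrase [BEGIN, query tokens..., END] occurs contiguously.
    static boolean exactMatch(List<String> indexed, String query) {
        return Collections.indexOfSubList(indexed, indexTokens(query)) >= 0;
    }

    public static void main(String[] args) {
        List<String> doc = indexTokens("Semantic Web");
        System.out.println(exactMatch(doc, "Semantic Web")); // true
        System.out.println(exactMatch(doc, "Web"));          // false: not the whole value
        System.out.println(exactMatch(doc, "semantic web")); // true: case is lost by the analyzer
    }
}
```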
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-97123004

Hi Osma, it's now ready to merge. For the other part, I will soon propose triple deletion, which cleans up the related entry in the index. As for synchronizing Lucene and TDB transactions, the current codebase seems to handle that, but I'm not 100% sure yet.
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/52#issuecomment-97165961

Hi Osma, about your comments:
1. I'm not familiar with assembler configuration, but if you want to give some help ;-)
2. OK, I will refactor it to keep the previous signatures and calls.
3. Sure, it's cleaner to extend Entity... OK, it's on the todo list.
As for the tests and doc, I'm pretty busy at the moment.
[GitHub] jena pull request: jena-text multilingual indexing
Github user amiara514 closed the pull request at: https://github.com/apache/jena/pull/51
[GitHub] jena pull request: jena-text multilingual indexing (take 2)
GitHub user amiara514 opened a pull request: https://github.com/apache/jena/pull/52

jena-text multilingual indexing (take 2)

Hi,
This version allows the use of localized Lucene indexes in jena-text. All existing Lucene language analyzers are taken into account. There are 2 new cases in TextDatasetFactory:
- createLuceneFromLanguage: creates a Lucene index with the associated language-specific Lucene analyzer.
- createLuceneMultilingual: creates a dynamic multilingual index (a collection of localized Lucene indexes), chosen according to the language of each triple's literal.

On the SPARQL side, the pattern is: ?uri text:query (property 'query' ['lang:language']) ; the query is dispatched to the right Lucene index.

Note 1: If the 'lang' argument is not present, the behavior is the same as the existing default case.
Note 2: For the moment, the 'lang' argument is not supported together with the ?limit and ?score variables, but it works if they are not present.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena upstream/jena-text-multilingual

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/52.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #52

commit 5e4e1c1432f44151356fe25cc44c87e0085c1873 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2015-04-21T18:19:32Z change on pom.xml to have local groupId
commit d3f21853c0d0556ad95ae06c393fb8a8619feb35 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2015-04-22T18:55:58Z Introducing Lucene multilingual index
commit abdc602fe505167562b7ce9218433bf7c99f2f9e Author: Alexis Miara alexis_mi...@hotmail.com Date: 2015-04-21T18:19:32Z change on pom.xml to have local groupId
commit a88b6e47a8ab0d595a1a7077f46fd8396ae3e89d Author: Alexis Miara alexis_mi...@hotmail.com Date: 2015-04-22T18:55:58Z Introducing Lucene multilingual index
commit ad87c035d841243dfc972d2b0e220f207ed5 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2015-04-22T19:07:09Z Merge branch 'upstream/jena-text-multilingual' of github.com:LICEF/jena into upstream/jena-text-multilingual
commit a125642e1f6bd8e9ec732784d897df6c4e7cd28c Author: Alexis Miara alexis_mi...@hotmail.com Date: 2015-04-22T19:44:31Z original pom.xml
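The per-language dispatching can be sketched as a map from language tag to a sub-index that is created the first time a literal in that language is seen, plus a parser for the optional `'lang:xx'` query argument. This is a hypothetical illustration, not the TextDatasetFactory code: plain lists stand in for localized Lucene indexes, matching is a naive case-insensitive substring test rather than analyzer-based search, and an empty language tag is assumed to select the default index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: one sub-index per language tag, created on demand.
final class MultilingualIndexSketch {
    private final Map<String, List<String>> indexes = new HashMap<>();

    // Route a literal to the sub-index for its language ("" = no language tag).
    void add(String lexicalForm, String lang) {
        indexes.computeIfAbsent(normalize(lang), k -> new ArrayList<>()).add(lexicalForm);
    }

    // Parse an optional 'lang:xx' argument; "" (default index) when absent.
    static String parseLangArg(String arg) {
        if (arg == null || !arg.startsWith("lang:")) {
            return "";
        }
        return normalize(arg.substring("lang:".length()));
    }

    // Naive search restricted to the sub-index selected by the 'lang' argument.
    List<String> query(String text, String langArg) {
        String needle = text.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        for (String doc : indexes.getOrDefault(parseLangArg(langArg), Collections.emptyList())) {
            if (doc.toLowerCase(Locale.ROOT).contains(needle)) {
                hits.add(doc);
            }
        }
        return hits;
    }

    int indexCount() {
        return indexes.size();
    }

    private static String normalize(String lang) {
        return lang == null ? "" : lang.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        MultilingualIndexSketch idx = new MultilingualIndexSketch();
        idx.add("Le chat dort", "fr");
        idx.add("The cat sleeps", "en");
        idx.add("cat", "");
        System.out.println(idx.indexCount());             // 3
        System.out.println(idx.query("chat", "lang:fr")); // [Le chat dort]
        System.out.println(idx.query("cat", "lang:en"));  // [The cat sleeps]
        System.out.println(idx.query("cat", null));       // [cat]
    }
}
```

Creating sub-indexes lazily means no language list has to be configured up front; the trade-off is that a query without a 'lang' argument only sees literals that had no language tag.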
[GitHub] jena pull request: jena-text multilingual indexing
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/51#issuecomment-92953558

> I just wanted to point out that if your current code requires extra synchronization calls in the calling code, that won't work for the Fuseki use case.

Indeed, so it definitely has to be the dataset transaction mechanism that manages the index transaction.
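One way to picture this: the caller only begins, commits, or aborts the dataset transaction, and the dataset drives the text index's transaction itself, so a server such as Fuseki needs no index-specific calls. The sketch below illustrates that coupling with hypothetical toy classes (`ToyTextIndex`, `ToyTextDataset`); it is not the jena-text implementation, and it ignores crash recovery, which a real implementation committing two separate stores would have to handle.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a text index with staged and committed entries.
final class ToyTextIndex {
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();

    void add(String entry) { pending.add(entry); }
    void commit() { committed.addAll(pending); pending.clear(); }
    void abort() { pending.clear(); }
    List<String> committedEntries() { return new ArrayList<>(committed); }
}

// Hypothetical dataset wrapper: its transaction boundaries also drive the index.
final class ToyTextDataset {
    private final ToyTextIndex index;
    private final List<String> committed = new ArrayList<>();
    private final List<String> pending = new ArrayList<>();
    private boolean inTransaction = false;

    ToyTextDataset(ToyTextIndex index) { this.index = index; }

    void begin() {
        if (inTransaction) throw new IllegalStateException("already in a transaction");
        inTransaction = true;
    }

    void add(String triple) {
        requireTransaction();
        pending.add(triple);
        index.add(triple); // the index change is staged together with the data change
    }

    void commit() {
        requireTransaction();
        committed.addAll(pending); // data becomes visible...
        pending.clear();
        index.commit();            // ...and so does the matching index change
        inTransaction = false;
    }

    void abort() {
        requireTransaction();
        pending.clear();
        index.abort();             // staged index entries are discarded too
        inTransaction = false;
    }

    int size() { return committed.size(); }

    private void requireTransaction() {
        if (!inTransaction) throw new IllegalStateException("not in a transaction");
    }

    public static void main(String[] args) {
        ToyTextIndex idx = new ToyTextIndex();
        ToyTextDataset ds = new ToyTextDataset(idx);
        ds.begin(); ds.add(":s rdfs:label \"kept\""); ds.commit();
        ds.begin(); ds.add(":s rdfs:label \"dropped\""); ds.abort();
        System.out.println(ds.size());              // 1
        System.out.println(idx.committedEntries()); // [:s rdfs:label "kept"]
    }
}
```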
[GitHub] jena pull request: jena-text multilingual indexing
GitHub user amiara514 reopened a pull request: https://github.com/apache/jena/pull/51

jena-text multilingual indexing

Hi,
Instead of having a default Lucene index, this version of jena-text allows localized indexes to be associated with Jena datasets (English, French and Spanish for the moment). See README.md for use cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena multilingual-indexing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/51.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #51

commit 9bfd6991628d6f33ffacc086e25e0b718793993a Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-10T16:10:31Z Description of fork version
commit 94f565d9585825fa62ffa7a2c9e94a2c9da29a23 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-10T16:23:30Z Rename README to README.md
commit 423081388158be98a430d9497f0dabee179fc908 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-10T16:23:44Z Update README.md
commit de2516f07706b9121714582f2a66be3522e8f19b Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-10T16:30:27Z Update README.md
commit 2c9a52537609da8286678f5ab36047d97a4b182a Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-10T19:34:25Z Update README.md
commit fec8778c2ee66dec2551895604be6da3755aea56 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-10T20:44:53Z First import from precedent fork
commit ce845a4ef24b99d39fde14109e7089a6fb43 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-10T20:45:04Z Merge branch 'multilingual-indexing' of github.com:LICEF/jena into multilingual-indexing
commit 80eae193d0dad5c2fa8f618666904a0637ac38ba Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-11T16:22:31Z Update README.md
commit 4e7dcf6bbb0210d499de60a4fb8216f7d8bb2d2e Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-11T21:51:30Z Fix multilingual index retrieving
commit b33e70847c270f03a2101f3934a97d986d892dde Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-11T21:51:36Z Merge branch 'multilingual-indexing' of github.com:LICEF/jena into multilingual-indexing
commit 652abe5fb7c1d5f7491dac712280db47fd39bcb1 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-12T15:14:01Z Update README.md
commit ae00f390ac104d4d72498029b93d5b94d9687a29 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-12T15:14:26Z Update README.md
commit a05ca9d28fa57de8865fc784f572b9f6f2fc4f3b Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-11-20T15:39:46Z maven group management change
commit 457328a19ec7709ce66cca9d0dafea71b101aa67 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-05T19:15:38Z Update README.md
commit 3ba2117b042aa0c82a041144ab45462ea47809fa Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-05T19:27:06Z Works now in transaction mode with Jena. No more auto commit. call finishIndexing() or abortIndexing() on index when manipulation is over.
commit 68943dbf74d4cb3aa2692839218f73cb10c1ed2a Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-05T19:27:13Z Merge branch 'multilingual-indexing' of github.com:LICEF/jena into multilingual-indexing
commit e56bef1c3e6f0c978f7e768dc8b27e9257f6bcea Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-05T19:35:33Z Update README.md
commit 6f0642bea2a4fd8b18327a2a0408b0fd31a7bcf6 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-16T20:41:39Z Update README.md
commit 0889472a88d947c2b0006c2f20d9622b19242269 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-16T20:44:12Z Storage of index in transaction process for retrieve it in query execution
commit c2273559666cce97d9e03c7af698070125411abc Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-16T20:44:20Z Merge branch 'multilingual-indexing' of github.com:LICEF/jena into multilingual-indexing
commit 34b5517878f1f0d05f1023afae846eec2c7126a0 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-16T23:09:04Z Simplify index reference
commit b568109d24ad7028db2260e28189049ab37d6c22 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-17T15:53:27Z languages for indexing are again in array of strings.
commit 9f37922bc81dd37fdf7d5996b0eeb3f2fe94ba97 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-17T15:56:41Z Update README.md
commit 75bfbd131e8cb86083b1a17238b1a903d5b8d9e0 Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-17T21:48:37Z separating get and remove for index context.
commit 8ee5f27198b48b377ae392bf6016ae6985f1f26f Author: Alexis Miara alexis_mi...@hotmail.com Date: 2014-12-17T21
[GitHub] jena pull request: jena-text multilingual indexing
Github user amiara514 closed the pull request at: https://github.com/apache/jena/pull/51
[GitHub] jena pull request: jena-text multilingual indexing
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/51#issuecomment-92953251

> I just wanted to point out that if your current code requires extra synchronization calls in the calling code, that won't work for the Fuseki use case.

Indeed, so it definitely has to be the dataset transaction mechanism that manages the index transaction.
[GitHub] jena pull request: jena-text multilingual indexing
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/51#issuecomment-92387676

Hi Osma,
If you want to use jena-text with Fuseki, you need to attach an assembler description. See the configuration section of [Text searches with SPARQL](https://jena.apache.org/documentation/query/text-query.html). In my opinion, using jena-text from code is more flexible: it lets you combine 'default' and 'indexed' datasets.
[GitHub] jena pull request: jena-text multilingual indexing
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/51#issuecomment-91261263

Hi Andy, I think my previous message was not clear. The changes made to jena-text are independent of Comète, and they are already under the Apache License. Comète uses this fork as an external dependency, so I think there's no licensing issue.
[GitHub] jena pull request: jena-text multilingual indexing
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/51#issuecomment-90671001

Hi, Thanks for the comment.
1. I'm working for a university research center. There is no explicit copyright on this code; the changes were made inside a GNU GPL open source project (https://github.com/LICEF/comete). Is that a standard way to proceed?
2. Yes, more precisely on tag jena-1.12.1-rc2 (rev 050c298ada38749a1ff166a77851b963991e4785)
PS: In the proposed version of the code, the Lucene and Jena dataset transactions are synchronized.
Regards
Alexis

Date: Tue, 7 Apr 2015 10:11:50 -0700 From: notificati...@github.com To: j...@noreply.github.com CC: alexis_mi...@hotmail.com Subject: Re: [jena] jena-text multilingual indexing (#51)

> Hi there - thank you very much for the pull request. It looks very interesting. Can I ask a couple of things:
> Who owns the copyright on the code? If you work for a company or institution, often the company or institution owns the copyright.
> This is based on jena 2.12.1? I tried to apply it to the current codebase but there have been 2 significant contributions since then and the pull request does not align with the codebase nowadays.