[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-18 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-113273093
  
@afs As with the other doc, I put it in a gist:
https://gist.github.com/amiara514/ef839adb7c5cb9dd697d
It covers only the "Deletion of Indexed Entities" section of
text-query.mdtext.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-16 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-112409094
  
Yes, absolutely, you can merge it... before another conflict ;-)




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-09 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-110457559
  
Done.
I'll update the "Deletion of Indexed Entities" part of the jena-text doc
once the PR is merged.




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-110094664
  
I reorganized the tests.




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-109986517
  
Hi, the PR is mergeable again after fixing the conflicts with #72.




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-110026337
  
OK, I see. I will add a similar graph-specific case for deletion support.

One question about graph indexing. The jena-text documentation says: "This
allows for more efficient text queries when the query targets only a single
named graph." But there is no example of using this (even in the tests).




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-110032217
  
@osma oops, forget my message.




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-06-03 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-108383450
  
I have created a gist instead:
https://gist.github.com/amiara514/262d73ed35580b7dbdfe




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-06-03 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-108376304
  
OK, which address?




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-06-02 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-108118228
  
I sent it from the edit page (via "Improve this Page") with anonymous
access.
The default setting is a post to dev@jena.apache.org.

Is there another way to send the file?





[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-06-02 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-107989314
  
@afs: done, I modified text-query.mdtext and sent it to the dev list.




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-06-01 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-107572459
  
> The direct reflection of site/trunk is http://jena.staging.apache.org/
> and the button should work there as well.

Hmm, this link leads to the same wrong place...
For example, the section "Configuring an Analyzer" should finish with the
explanation about specifying a query analyzer ("New in Jena 2.13.0 is the
optional ability to specify..."), as in the online version, and that is not
the case.




[GitHub] jena pull request: Add (?uri ?score) to jena-text

2015-06-01 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/72#issuecomment-107581689
  
> Maybe @amiara514 would have a comment as I understood he is using
> jena-text via Java code?

@osma: Not exactly. I execute SPARQL queries that involve jena-text from
Java code. I don't manipulate jena-text itself at that level.





[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-05-29 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-106848124
  
@osma The status is still pending. OK, I will make the minor changes needed
to provide a re-mergeable version.

1. Yes, maybe the unique ID could be configurable. The feature would then
be a kind of "enable deletion mode" option. Do you agree with that?

2. No problem, I will change it.

3. I will take a look.








[GitHub] jena pull request: Update TextDatasetFactory.java

2015-05-28 Thread amiara514
GitHub user amiara514 opened a pull request:

https://github.com/apache/jena/pull/74

Update TextDatasetFactory.java

Reintroducing previous static methods for backward compatibility

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amiara514/jena patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/74.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #74


commit 8b9c0ffb39bd6b6f4df8f7c359491cde891e1788
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-05-28T17:15:08Z

Update TextDatasetFactory.java

Reintroducing previous static methods for backward compatibility






[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-27 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-106037240
  
Hi Andy,
I'm looking at the part of the documentation about the linguistic features.
The current doc contains version references such as
"Starting with version 1.0.1, jena-text..." or "New in Jena 2.13.0 is
the...".

Should I introduce my changes with a reference to the upcoming 3.0.0
version of Jena?





[GitHub] jena pull request: Update TextQueryPF.java

2015-05-26 Thread amiara514
GitHub user amiara514 opened a pull request:

https://github.com/apache/jena/pull/73

Update TextQueryPF.java

A simple warning insertion about the comment : 
https://github.com/apache/jena/pull/64#issuecomment-105229534

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/amiara514/jena patch-1

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/73.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #73


commit 1503c2597cd0d3294a2295699d71d1b5ae31e9f1
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-05-26T19:15:14Z

Update TextQueryPF.java

A simple warning insertion about the comment : 
https://github.com/apache/jena/pull/64#issuecomment-105229534






[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-25 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-105237062
  
Everything seems OK with the merge, thanks.

I'll try to write the doc by the end of the week.
A question about this: it will go online only when Jena 3 is available,
right?






[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-25 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-105237426
  
> I played a bit with this code. I noticed one potential issue: if I do a
> language specific query with a text:query argument such as 'lang:en', but there
> is no langField set for the index, the query parameter will just be silently
> ignored. Would it be better to return some kind of query error instead?

Documenting this behavior may be sufficient, no?




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-22 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-104762759
  
I see the deletions in /64.patch:

Date: Tue, 19 May 2015 14:41:32 -0400
Subject: [PATCH 3/3] langField implementation to store lang tags of literals +
 refactoring growing methods of TextDatasetFactory
...
 delete mode 100644 jena-text/src/main/java/org/apache/jena/query/text/LuceneUtil.java
 create mode 100644 jena-text/src/main/java/org/apache/jena/query/text/TextIndexConfig.java
 create mode 100644 jena-text/src/main/java/org/apache/jena/query/text/analyzer/Util.java
 delete mode 100644 jena-text/src/main/java/org/apache/jena/query/text/assembler/TextIndexLuceneMultilingualAssembler.java
 




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-22 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-104720469
  
> I tried applying it as a patch and TextIndexLuceneMultilingualAssembler
> uses non-existent TextDatasetFactory.createLuceneIndexMultilingual. Is this
> PR dependent on another?

Strange, `TextIndexLuceneMultilingualAssembler` has been deleted, and
`LuceneUtil` too.
Which source code branch are you using?

> Would it be possible to put back EntityDefinition for maximum
> compatibility with the original code?

OK, I'll do it soon.

> There is document at query/text-query. In what way should that be updated?

Should I put the explanatory paragraphs in this discussion or elsewhere?
They would be a reformatted version of the
[dev mail](http://mail-archives.apache.org/mod_mbox/jena-dev/201505.mbox/raw/%3CBLU181-W56D79F6BA6AC7103317BC6ECC20%40phx.gbl%3E/1/2).




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-05-21 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-104257598
  
As @osma said, there is no direct relation between these pull requests
(except that both concern jena-text).
Both were mixed in the previous proposal. I split them into 2 distinct
pull requests to make them easier to understand.

[#53](https://github.com/apache/jena/pull/53) is a small patch that cleans
the Lucene index when triples with literal objects are deleted via SPARQL.

[#64](https://github.com/apache/jena/pull/64) is a larger one that enables
a linguistic index.
A summary of #64 has been posted in another thread of the dev
mailing list (20-05-2015).
See details
[here](http://mail-archives.apache.org/mod_mbox/jena-dev/201505.mbox/raw/%3CBLU181-W56D79F6BA6AC7103317BC6ECC20%40phx.gbl%3E/1/2).










[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-19 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-103661706
  
1. Not exactly. Suppose that langField is used with the name `language`.
If the lang argument is `none` in the SPARQL query, then the query extension
for Lucene will be:
```
String qs2 = "-language:*";
```
with a leading minus to exclude documents where the field is filled.

2. OK, I will submit it to the dev list.
Is there a required format for the message?
PS: I will also mention the refactoring to get their advice on it.
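
The mapping described in point 1 can be sketched in plain Java. This is only an illustration: `LangQuerySketch` and `langClause` are hypothetical names, not part of jena-text, and the `field:value` form used for a concrete language is an assumption (the comment above only states the `lang:none` case).

```java
// Sketch only: LangQuerySketch and langClause are hypothetical names,
// not part of the jena-text code base.
final class LangQuerySketch {

    /**
     * Builds the extra Lucene query-string clause for a 'lang:xx' argument.
     *
     * @param langField name of the index field holding the language tag
     *                  (e.g. "language"), or null if the index stores none
     * @param lang      the value after "lang:" in the SPARQL argument,
     *                  or null if no lang argument was given
     */
    static String langClause(String langField, String lang) {
        if (langField == null || lang == null) {
            return "";                      // no language restriction
        }
        if ("none".equals(lang)) {
            return "-" + langField + ":*";  // exclude docs whose language field is filled
        }
        return langField + ":" + lang;      // assumed form for a concrete language
    }

    public static void main(String[] args) {
        System.out.println(langClause("language", "none"));
        System.out.println(langClause("language", "en"));
    }
}
```

The returned clause is meant to be appended to the main text query, as the "query extension" wording above suggests, rather than used on its own.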






[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-19 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-103635153
  
Languages can now be stored in the index by providing a langField param to
EntityDefinition (assembler and Java code). So you can use a TextIndexLucene
with your own analyzer and target the language of localized literals.

There are 3 cases for SPARQL clauses:
```?s text:query (rdfs:label 'word' 'lang:en' )``` will target English
literals
```?s text:query (rdfs:label 'word' 'lang:none')``` will target unlocalized
literals
```?s text:query (rdfs:label 'word')``` will ignore language
NOTE: in SPARQL queries, ```lang``` is a predefined keyword, and the Lucene
query will be mapped to the configured langField name.

Moreover, a TextIndexConfig class has been introduced and EntityDefinition
refactored to simplify TextDatasetFactory, as desired.

LuceneUtil has moved to the Util class in the analyzer package. It is still
used by the LocalizedAnalyzerAssembler.




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-15 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-102398852
  
Yes, I completely agree with that; otherwise TextDatasetFactory will be
difficult to maintain.
And there is no more need for TextIndexLuceneMultilingualAssembler either.

TextIndexConfiguration should cover analyzer, queryAnalyzer, multilingual,
etc., while graphField and langField concern EntityDefinition, no?
Moreover, to avoid the same growing constructors, EntityDefinition should
be configurable with setters/getters.


Just a question about the usual workflow: who decides whether or not to
integrate proposed code?





[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-15 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-102412893
  
> As it turns out the TextIndexConfiguration class I suggested will only be
> concerned with analyzer configuration, maybe it could be called
> TextIndexAnalyzerConfiguration or something like that.

I prefer TextIndexConfiguration; it will make it easier to add future
configuration parameters.

Thanks for the explanation.

About lang:xx, I think that extra params should be generalized in the
same manner: limit:10, score:x, etc. That would allow params to be
optional and would remove the order and size constraints.
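
To illustrate the generalized `key:value` idea, here is a minimal sketch in plain Java. `QueryParamsSketch` and `parse` are hypothetical names, and this is not how jena-text parses its arguments; it only shows why keyed parameters no longer depend on position or count.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch only: QueryParamsSketch is a hypothetical helper, not jena-text code.
final class QueryParamsSketch {

    /** Parses optional arguments such as "lang:en" or "limit:10" into a map. */
    static Map<String, String> parse(List<String> extraArgs) {
        Map<String, String> params = new LinkedHashMap<>();
        for (String arg : extraArgs) {
            int sep = arg.indexOf(':');
            // Require a non-empty key and a non-empty value around the first colon.
            if (sep <= 0 || sep == arg.length() - 1) {
                throw new IllegalArgumentException("Expected key:value but got: " + arg);
            }
            params.put(arg.substring(0, sep), arg.substring(sep + 1));
        }
        return params;
    }
}
```

With this shape, `parse` yields the same map for `["lang:en", "limit:10"]` and `["limit:10", "lang:en"]`, and any parameter can simply be left out.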




[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-14 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-102207391
  
OK, it's not supposed to be a big job. I'll take a look soon.
For the multilingual analyzer, the lang must be stored anyway, since the
analyzer depends on it (as you said in point 2). So, should the langField
param be ignored, or should an exception be raised to flag the forgotten
field?

Another point:
In the current version, I store an undef value rather than an empty one for
unlocalized literals, because querying on an empty field is not
straightforward with Lucene and I want to be able to search unlocalized
values explicitly. In your case, I don't think it will be necessary.






[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-13 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/64#issuecomment-101808418
  
OK, I will separate it to get a distinct behavior.
Removing the isMultilingual() method will probably force some private
fields and/or methods to become protected.

> So you should never see a literal such as book@eng in well-formed RDF
> data.

Yes, unfortunately. The conversion was there to accept this case. OK, if a
strict implementation of the standards is sufficient, I will bypass the
3-to-2-letter conversion.
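
For reference, the kind of 3-to-2-letter conversion being dropped could look like the sketch below. It uses only `java.util.Locale`; `LangCodeSketch` and `toTwoLetter` are hypothetical names, and this is not the code from the pull request.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.MissingResourceException;

// Sketch only: LangCodeSketch is a hypothetical helper, not jena-text code.
final class LangCodeSketch {

    // Maps 3-letter language codes to 2-letter ones, built from the JDK's own tables.
    private static final Map<String, String> THREE_TO_TWO = new HashMap<>();

    static {
        for (String two : Locale.getISOLanguages()) {
            try {
                String three = Locale.forLanguageTag(two).getISO3Language();
                if (!three.isEmpty()) {
                    THREE_TO_TWO.putIfAbsent(three, two);
                }
            } catch (MissingResourceException e) {
                // No 3-letter code known for this language: skip it.
            }
        }
    }

    /** Returns the 2-letter code for a known 3-letter code, else the lower-cased input. */
    static String toTwoLetter(String lang) {
        if (lang == null) {
            return null;
        }
        String lower = lang.toLowerCase(Locale.ROOT);
        return lower.length() == 3 ? THREE_TO_TWO.getOrDefault(lower, lower) : lower;
    }
}
```

Note that the JDK exposes the terminology codes (for example `fra`, `deu`), so bibliographic variants such as `fre` or `ger` would pass through this sketch unchanged.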





[GitHub] jena pull request: Jena-text multilingual alternative implementati...

2015-05-13 Thread amiara514
GitHub user amiara514 opened a pull request:

https://github.com/apache/jena/pull/64

Jena-text multilingual alternative implementation

This proposal is an alternative to
[pull52](https://github.com/apache/jena/pull/52) (JENA-928).
It uses a single index that includes the language as a new field.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena jena-text-ml-single-index

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/64.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #64


commit 9553c6b2c246bc9c05906096c1f56d65ba15fed8
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-05-13T15:23:56Z

Implementation of jena-text multilingual with a single index






[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-12 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101331810
  
Hi, I'm not against your suggestion; it's probably easier to deal with one
index.
So I ran some tests that cover the points discussed previously (indexing a
lang value and querying on it), and they pass.
But a problem persists: we need to set the indexing analyzer dynamically on
each triple addition, since each triple may have a different language.
I don't think it's possible to change it on the fly. The indexWriter config
is set at startup and the lock mechanism prevents changing it...






[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-12 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101387550
  
Thanks a lot, I had not seen this method.
Well, it seems to work and all tests pass.
Should I propose the OneIndex branch in a new pull request?




[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-11 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-100941772
  
Some new tests have been submitted.

About the implementation: would your proposal use a StandardAnalyzer in the
indexing phase and a localized queryAnalyzer for queries?





[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-11 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-101035985
  
> But since there is only a relatively small number of Lucene analyzers
> anyway, maybe this is OK.

That's why it's done like this :-)

> No, that wouldn't work. You have to use the same analyzer for both
> indexing and queries (in this case, the language-specific analyzer), otherwise
> the tokens won't match.

Exactly.

> But I think it should still be possible to share the same index, if you
> have a field that specifies the language and make sure to target your queries
> only to the specific language.

Storing the language as an extra field is easy to do during document
creation (in the addEntity method). Adding an extra param in queries is not
a problem either (done in my solution).
But how should the existing code be changed so that the Lucene query takes
that extra language into account?






[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-07 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-99979270
  
Hi,
with the latest proposal:
1) It's now possible to set up multilingual indexing via an assembler
configuration file by defining the multilingual class and using it in the
index definition:
```
[] ja:loadClass "org.apache.jena.query.text.TextQuery" .
text:TextDataset  rdfs:subClassOf   ja:RDFDataset .
#text:TextIndexLucene  rdfs:subClassOf   text:TextIndex .
text:TextIndexLuceneMultilingual rdfs:subClassOf   text:TextIndex .

<#indexLucene> a text:TextIndexLuceneMultilingual ;
    text:directory <file:Lucene> ;
    ##text:directory "mem" ;
    text:entityMap <#entMap> ;
    .
```
This multilingual index manages all localized literals automatically with
all Lucene localized analyzers.

2) Moreover, with a default Lucene index setup, a localized analyzer can be
specified (as with SimpleAnalyzer, KeywordAnalyzer, etc.) with this config:

```
<#indexLucene> a text:TextIndexLucene ;
    text:directory <file:Lucene> ;
    text:entityMap <#entMap> ;
    text:queryAnalyzer [
        a text:LocalizedAnalyzer ;
        text:language "en"
    ]
    .
```

reference for JENA-928





[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-05-05 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-99072949
  
> Ah, I see. But this still doesn't help for cases where there are small
> differences between literals within the same language, for example
> singular/plural forms that get stemmed by the analyzer, or variations in
> capitalization.

Exactly!
So I pushed the hash solution, which covers all the previous cases.

For the other issue (with conjunctive queries), maybe deletion has to be
managed with an updateDocument?




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-05-04 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-98744341
  
> I'm curious, how would the multi language proposal help with this problem?

The multilingual index dynamically manages one index per language. Hence two 
identical literals with different language tags are not stored in the same index.
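
To make the "one index per language" idea concrete, here is a minimal sketch in plain Java. It is not the jena-text implementation: the class name, the `default` bucket for untagged literals and the in-memory lists standing in for Lucene indexes are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy stand-in for "one index per language"; not the jena-text implementation.
final class PerLanguageIndexSketch {
    // Hypothetical bucket name for literals without a language tag.
    static final String DEFAULT_BUCKET = "default";

    // One list per language plays the role of one Lucene index per language.
    private final Map<String, List<String>> indexes = new HashMap<>();

    private static String bucket(String lang) {
        return (lang == null || lang.isEmpty()) ? DEFAULT_BUCKET : lang.toLowerCase(Locale.ROOT);
    }

    // Route a literal to the index of its language, creating that index on first use.
    void add(String lexicalForm, String lang) {
        indexes.computeIfAbsent(bucket(lang), k -> new ArrayList<>()).add(lexicalForm);
    }

    // Entries held by the index for one language (empty if that index was never created).
    List<String> entries(String lang) {
        return indexes.getOrDefault(bucket(lang), Collections.emptyList());
    }

    public static void main(String[] args) {
        PerLanguageIndexSketch idx = new PerLanguageIndexSketch();
        idx.add("chat", "en");
        idx.add("chat", "fr");
        // The same lexical form ends up in two separate per-language indexes.
        System.out.println(idx.entries("en") + " " + idx.entries("fr"));
    }
}
```

With this layout, deleting "chat"@fr only ever touches the French index, which is why same-text literals in different languages no longer interfere.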

The hash solution works fine with SHA-1. It adds one more 
field per document, but I don't think that is a problem for the final index size. 
Should I commit it?
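
As an illustration of the hash idea, the sketch below derives a SHA-1 key from the parts that identify one indexed entry, so that an exact key can be stored in an extra field and used at deletion time. Which parts go into the hash, the length-prefixed encoding and all names here are assumptions; the code in the pull request may differ.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Illustrative only: a stable per-entry key, not the code from the pull request.
final class EntryHashSketch {
    // Hash the un-analyzed parts of an entry, so "Cat" and "cat" get different keys
    // even if an analyzer would reduce them to the same token.
    static String entryHash(String graph, String subject, String predicate,
                            String lexicalForm, String lang) {
        StringBuilder sb = new StringBuilder();
        for (String part : new String[] {graph, subject, predicate, lexicalForm, lang}) {
            String p = (part == null) ? "" : part;
            // Length prefix keeps ("ab", "c") and ("a", "bc") from producing the same input.
            sb.append(p.length()).append(':').append(p).append('|');
        }
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(entryHash(null, "http://example/s", "http://example/label", "Cats", "en"));
    }
}
```

Because the key is computed from the raw literal rather than from analyzed tokens, stemming and case folding cannot make two different triples collide on deletion.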

I also don't know whether it could disturb the conjunctive query work.
However, addEntity interacts with updateEntity, and entries 
still correspond to triples/quads, don't they?

 






[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-05-01 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-98183195
  
Hi Andy,
The new Jena 3 package naming is fine. The last commit already deals 
with it.

OK, I'll write some tests soon.

Two questions about the documentation, to be sure:
1) Are we talking about this page: 
https://jena.apache.org/documentation/query/text-query.html ? 
2) Is there a dedicated place to write it, or should I write a paragraph in 
this conversation?

PS: I don't need a dedicated branch for the moment, thanks.




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-04-29 Thread amiara514
GitHub user amiara514 opened a pull request:

https://github.com/apache/jena/pull/53

Lucene index synchro on triple deletion (jena-text)

Hi,
This code synchronizes the Lucene index (from jena-text) when a triple is 
removed from the associated graph. It is based on a trick for exact matching in 
Lucene, described 
[here](http://blogs.perl.org/users/mark_leighton_fisher/2012/01/stupid-lucene-tricks-exact-match-starts-with-ends-with.html).
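
One common way to get exact matching out of a token-based index is to wrap the stored value in sentinel tokens and then search for the whole wrapped phrase. The sketch below simulates that in plain Java with no Lucene dependency; the sentinel strings, the toy analyzer and the class name are assumptions, and the pull request's actual code may work differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Simulation of sentinel-token exact matching; no Lucene classes are used.
final class ExactMatchSketch {
    // Hypothetical sentinel tokens, chosen so they are unlikely to occur in real text.
    static final String START = "xxstartxx";
    static final String END = "xxendxx";

    // Toy analyzer: lowercase, split on anything that is not a letter or digit.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // Tokens as they would be indexed: sentinels mark the field boundaries.
    static List<String> withSentinels(String text) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        tokens.addAll(analyze(text));
        tokens.add(END);
        return tokens;
    }

    // Phrase match: the phrase tokens occur contiguously in the document tokens.
    static boolean phraseMatch(List<String> doc, List<String> phrase) {
        return Collections.indexOfSubList(doc, phrase) >= 0;
    }

    // Exact match: the sentinel-wrapped query must cover the whole stored field.
    static boolean exactMatch(String stored, String query) {
        return phraseMatch(withSentinels(stored), withSentinels(query));
    }

    public static void main(String[] args) {
        System.out.println(phraseMatch(analyze("big red car"), analyze("red car")));
        System.out.println(exactMatch("big red car", "red car"));
    }
}
```

Note that matching still happens on analyzed tokens, so values that differ only in case (or, with a stemming analyzer, in inflection) remain indistinguishable with this approach.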


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena upstream/jena-text-triple-deletion

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/53.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #53


commit a052b6d26c62a218c516aa64621d2505040be30a
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-04-29T18:52:16Z

Lucene index synchro on triple deletion on jena-text






[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-28 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-97123004
  
Hi Osma, it's now OK to merge. 

As for the other part, I will soon propose triple deletion that cleans up 
the related entry in the index.

As for synchronizing the Lucene and TDB transactions, the current 
codebase seems to handle that, 
but I'm not 100% sure yet. 




[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-28 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/52#issuecomment-97165961
  
Hi Osma, 
about your comments: 
1. I'm not familiar with assembler configuration, but if you want to give 
some help ;-)
2. OK, I will refactor it to keep the previous signatures and calls.
3. Sure, it's cleaner to extend Entity... OK, added to the to-do list. 
As for the tests and docs, I'm pretty busy at the moment.




[GitHub] jena pull request: jena-text multilingual indexing

2015-04-22 Thread amiara514
Github user amiara514 closed the pull request at:

https://github.com/apache/jena/pull/51




[GitHub] jena pull request: jena-text multilingual indexing (take 2)

2015-04-22 Thread amiara514
GitHub user amiara514 opened a pull request:

https://github.com/apache/jena/pull/52

jena-text multilingual indexing (take 2)

Hi,
This version allows the use of localized Lucene indexes (in jena-text).
All existing Lucene language analyzers are taken into account.

Two new cases in TextDatasetFactory:
- createLuceneFromLanguage: creates a Lucene index with the associated 
Lucene analyzer.
- createLuceneMultilingual: creates a dynamic multilingual index 
(a collection of localized Lucene indexes) depending on the languages of the triples' literals.


On the SPARQL side, the pattern is:

?uri text:query (property 'query' ['lang:language'])

The query is dispatched to the right Lucene index. 

Note 1: If the 'lang' argument is not present, the existing default 
behavior applies.
Note 2: For the moment, the 'lang' argument is not handled together with the ?limit and 
?score variables, but it works if they are not present.
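
The dispatch on the optional 'lang:' argument can be pictured with the following plain-Java sketch. It only models the argument handling described above; the class, the method names and the `index-…` labels are hypothetical and do not come from jena-text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Models the optional 'lang:xx' argument of the text:query pattern; names are hypothetical.
final class LangArgSketch {
    static final String PREFIX = "lang:";

    // args holds the string arguments after the property:
    // the query string first, then any optional extras such as 'lang:fr'.
    static Optional<String> requestedLanguage(List<String> args) {
        int firstExtra = Math.min(1, args.size());
        for (String arg : args.subList(firstExtra, args.size())) {
            if (arg.startsWith(PREFIX) && arg.length() > PREFIX.length()) {
                return Optional.of(arg.substring(PREFIX.length()));
            }
        }
        return Optional.empty();
    }

    // Pick the per-language index when a language is given, else the default one.
    static String targetIndex(List<String> args) {
        return requestedLanguage(args).map(lang -> "index-" + lang).orElse("index-default");
    }

    public static void main(String[] args) {
        System.out.println(targetIndex(Arrays.asList("printemps", "lang:fr")));
        System.out.println(targetIndex(Arrays.asList("spring")));
    }
}
```

Skipping the first argument means a query string that happens to start with "lang:" is never mistaken for the language selector.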


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena upstream/jena-text-multilingual

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/52.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #52


commit 5e4e1c1432f44151356fe25cc44c87e0085c1873
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-04-21T18:19:32Z

change on pom.xml to have local groupId

commit d3f21853c0d0556ad95ae06c393fb8a8619feb35
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-04-22T18:55:58Z

Introducing Lucene multilingual index

commit abdc602fe505167562b7ce9218433bf7c99f2f9e
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-04-21T18:19:32Z

change on pom.xml to have local groupId

commit a88b6e47a8ab0d595a1a7077f46fd8396ae3e89d
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-04-22T18:55:58Z

Introducing Lucene multilingual index

commit ad87c035d841243dfc972d2b0e220f207ed5
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-04-22T19:07:09Z

Merge branch 'upstream/jena-text-multilingual' of github.com:LICEF/jena 
into upstream/jena-text-multilingual

commit a125642e1f6bd8e9ec732784d897df6c4e7cd28c
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2015-04-22T19:44:31Z

original pom.xml






[GitHub] jena pull request: jena-text multilingual indexing

2015-04-14 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/51#issuecomment-92953558
  
> I just wanted to point out that if your current code requires extra 
> synchronization calls in the calling code, that won't work for the Fuseki use 
> case. 

Indeed, so it is definitely the dataset transaction mechanism that should manage the 
index transaction.
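
A rough sketch of that idea: the dataset transaction drives the index, so calling code (Fuseki, for example) never has to make extra synchronization calls. The method names finishIndexing() and abortIndexing() are taken from the commit message in this pull request; the `Txn` and `Index` interfaces, the call order and everything else are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a dataset transaction that also closes the index transaction.
final class LinkedTxnSketch {
    // Hypothetical minimal views of a dataset transaction and a text index.
    interface Txn { void commit(); void abort(); }
    interface Index { void finishIndexing(); void abortIndexing(); }

    // Wrap the dataset transaction so that ending it also ends the index work.
    static Txn linked(Txn datasetTxn, Index index) {
        return new Txn() {
            @Override public void commit() {
                index.finishIndexing();   // flush index changes first (an arbitrary choice here)
                datasetTxn.commit();
            }
            @Override public void abort() {
                index.abortIndexing();    // discard pending index changes
                datasetTxn.abort();
            }
        };
    }

    // Runs one transaction against recording stubs and returns the call order.
    static List<String> demo(boolean commit) {
        List<String> log = new ArrayList<>();
        Txn datasetTxn = new Txn() {
            @Override public void commit() { log.add("dataset.commit"); }
            @Override public void abort() { log.add("dataset.abort"); }
        };
        Index index = new Index() {
            @Override public void finishIndexing() { log.add("index.finishIndexing"); }
            @Override public void abortIndexing() { log.add("index.abortIndexing"); }
        };
        Txn txn = linked(datasetTxn, index);
        if (commit) {
            txn.commit();
        } else {
            txn.abort();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo(true));
        System.out.println(demo(false));
    }
}
```

Whether the index or the dataset should be committed first, and what happens if the second step fails, is a real design question that this sketch does not settle.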




[GitHub] jena pull request: jena-text multilingual indexing

2015-04-14 Thread amiara514
GitHub user amiara514 reopened a pull request:

https://github.com/apache/jena/pull/51

jena-text multilingual indexing

Hi,
Instead of having a single default Lucene index, this version of jena-text allows 
localized indexes to be associated with Jena datasets (English, French and Spanish 
for the moment).

See README.md for use cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/LICEF/jena multilingual-indexing

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/51.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #51


commit 9bfd6991628d6f33ffacc086e25e0b718793993a
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-10T16:10:31Z

Description of fork version

commit 94f565d9585825fa62ffa7a2c9e94a2c9da29a23
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-10T16:23:30Z

Rename README to README.md

commit 423081388158be98a430d9497f0dabee179fc908
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-10T16:23:44Z

Update README.md

commit de2516f07706b9121714582f2a66be3522e8f19b
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-10T16:30:27Z

Update README.md

commit 2c9a52537609da8286678f5ab36047d97a4b182a
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-10T19:34:25Z

Update README.md

commit fec8778c2ee66dec2551895604be6da3755aea56
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-10T20:44:53Z

First import from precedent fork

commit ce845a4ef24b99d39fde14109e7089a6fb43
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-10T20:45:04Z

Merge branch 'multilingual-indexing' of github.com:LICEF/jena into 
multilingual-indexing

commit 80eae193d0dad5c2fa8f618666904a0637ac38ba
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-11T16:22:31Z

Update README.md

commit 4e7dcf6bbb0210d499de60a4fb8216f7d8bb2d2e
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-11T21:51:30Z

Fix multilingual index retrieving

commit b33e70847c270f03a2101f3934a97d986d892dde
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-11T21:51:36Z

Merge branch 'multilingual-indexing' of github.com:LICEF/jena into 
multilingual-indexing

commit 652abe5fb7c1d5f7491dac712280db47fd39bcb1
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-12T15:14:01Z

Update README.md

commit ae00f390ac104d4d72498029b93d5b94d9687a29
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-12T15:14:26Z

Update README.md

commit a05ca9d28fa57de8865fc784f572b9f6f2fc4f3b
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-11-20T15:39:46Z

maven group management change

commit 457328a19ec7709ce66cca9d0dafea71b101aa67
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-05T19:15:38Z

Update README.md

commit 3ba2117b042aa0c82a041144ab45462ea47809fa
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-05T19:27:06Z

Works now in transaction mode with Jena. No more auto commit.
call finishIndexing() or abortIndexing() on index when manipulation is over.

commit 68943dbf74d4cb3aa2692839218f73cb10c1ed2a
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-05T19:27:13Z

Merge branch 'multilingual-indexing' of github.com:LICEF/jena into 
multilingual-indexing

commit e56bef1c3e6f0c978f7e768dc8b27e9257f6bcea
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-05T19:35:33Z

Update README.md

commit 6f0642bea2a4fd8b18327a2a0408b0fd31a7bcf6
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-16T20:41:39Z

Update README.md

commit 0889472a88d947c2b0006c2f20d9622b19242269
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-16T20:44:12Z

Storage of index in transaction process for retrieve it in query execution

commit c2273559666cce97d9e03c7af698070125411abc
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-16T20:44:20Z

Merge branch 'multilingual-indexing' of github.com:LICEF/jena into 
multilingual-indexing

commit 34b5517878f1f0d05f1023afae846eec2c7126a0
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-16T23:09:04Z

Simplify index reference

commit b568109d24ad7028db2260e28189049ab37d6c22
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-17T15:53:27Z

languages for indexing are again in array of strings.

commit 9f37922bc81dd37fdf7d5996b0eeb3f2fe94ba97
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-17T15:56:41Z

Update README.md

commit 75bfbd131e8cb86083b1a17238b1a903d5b8d9e0
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-17T21:48:37Z

separating get and remove for index context.

commit 8ee5f27198b48b377ae392bf6016ae6985f1f26f
Author: Alexis Miara alexis_mi...@hotmail.com
Date:   2014-12-17T21

[GitHub] jena pull request: jena-text multilingual indexing

2015-04-14 Thread amiara514
Github user amiara514 closed the pull request at:

https://github.com/apache/jena/pull/51




[GitHub] jena pull request: jena-text multilingual indexing

2015-04-14 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/51#issuecomment-92953251
  
> I just wanted to point out that if your current code requires extra 
> synchronization calls in the calling code, that won't work for the Fuseki use 
> case. 

Indeed, so it is definitely the dataset transaction mechanism that should manage the 
index transaction.




[GitHub] jena pull request: jena-text multilingual indexing

2015-04-13 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/51#issuecomment-92387676
  
Hi Osma
If you want to use jena-text with Fuseki, you need to attach an assembler 
description. Read the configuration section of [Text searches with 
SPARQL](https://jena.apache.org/documentation/query/text-query.html). In my 
opinion, using jena-text from code is more flexible: it lets you combine 
'default' and 'indexed' datasets.







[GitHub] jena pull request: jena-text multilingual indexing

2015-04-09 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/51#issuecomment-91261263
  
Hi Andy,

I think my previous message was not clear.

The changes made to jena-text are independent of Comète. They are already 
under the Apache License. Comète uses this fork as an external dependency.
So I think there is no licensing issue.









[GitHub] jena pull request: jena-text multilingual indexing

2015-04-07 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/51#issuecomment-90671001
  
Hi,
Thanks for the comment.
 
1. I work for a university research center. 
There is no explicit copyright on this code; the changes were made inside a GNU 
GPL open source project (https://github.com/LICEF/comete).
Is that a standard way to proceed?
 
2. Yes, more precisely on tag jena-1.12.1-rc2 (rev 
050c298ada38749a1ff166a77851b963991e4785)
 
 
PS: In the proposed version of the code, the Lucene and Jena 
dataset transactions are synchronized.
 
 
 
Regards
Alexis
 
 
 
 

 
Date: Tue, 7 Apr 2015 10:11:50 -0700
From: notificati...@github.com
To: j...@noreply.github.com
CC: alexis_mi...@hotmail.com
Subject: Re: [jena] jena-text multilingual indexing (#51)

Hi there - thank you very much for the pull request.  It looks very 
interesting.


Can I ask a couple of things:



1. Who owns the copyright on the code?  If you work for a company or 
institution, often the company or institution owns the copyright.
2. This is based on jena 2.12.1?  I tried to apply it to the current codebase 
but there have been 2 significant contributions since then and the pull request 
does not align with the codebase nowadays.


