Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.

2015-06-08 Thread Andy Seaborne

On 08/06/15 10:25, Claude Warren wrote:

What exactly is this review asking?  Change in strategy or change in docs?


Both :-)

concurrency-howto does not mention transactions except in passing.  It 
should be more pro-transactions IMO.


A possibility is that all Datasets are transactional, even if that is 
only DatasetGraphWithLock.


No Dataset.supportsTransactions - it's always true.
Remove Dataset.getLock.

concurrency-howto would be for model-only use.  Everything else is 
transactional in style.  The documentation should reflect this preferred 
style.
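For reference, the transactional style in Jena's own API is dataset.begin(ReadWrite.WRITE), then commit(), with end() in a finally block. The stdlib-only sketch below is not Jena code and not DatasetGraphWithLock; it assumes that a lock-only dataset gives multiple-reader/single-writer isolation without rollback, and shows how such a lock can sit behind the same begin/commit/end shape. LockedStore, Mode and read() are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: transaction-style access backed only by a read/write lock.
class LockedStore {
    enum Mode { READ, WRITE }

    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final List<String> data = new ArrayList<>();
    // Mode of the transaction open on the current thread, or null if none.
    private final ThreadLocal<Mode> current = new ThreadLocal<>();

    void begin(Mode mode) {
        if (current.get() != null) throw new IllegalStateException("already in a transaction");
        if (mode == Mode.WRITE) rwLock.writeLock().lock();
        else rwLock.readLock().lock();
        current.set(mode);
    }

    void add(String item) {
        if (current.get() != Mode.WRITE) throw new IllegalStateException("not in a write transaction");
        data.add(item);
    }

    List<String> read() {
        if (current.get() == null) throw new IllegalStateException("not in a transaction");
        return Collections.unmodifiableList(new ArrayList<>(data));
    }

    void commit() {
        // Lock-only isolation: changes were applied in place, so there is nothing
        // to flush. A store with real rollback would publish its changes here.
        if (current.get() != Mode.WRITE) throw new IllegalStateException("not in a write transaction");
    }

    void end() {
        Mode mode = current.get();
        if (mode == null) return; // safe to call more than once
        if (mode == Mode.WRITE) rwLock.writeLock().unlock();
        else rwLock.readLock().unlock();
        current.remove();
    }
}
```

Usage follows the same idiom as a Jena dataset: begin(Mode.WRITE), then add(...) and commit() inside try, and end() in finally. Outside a transaction every operation is refused, which is the "everything is transactional in style" position.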


If we had (hi ajs6f!) an in-memory dataset as well as the general 
container one, and the in-memory one were transactional, with copy-in for 
addGraph, we could make models always be views of datasets.  A 
free-standing model would then have an implicit Dataset.
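A minimal sketch of the copy-in idea, assuming triples are plain strings and graphs are keyed by name (CopyInDataset, NamedGraphView and the default-graph name are hypothetical, not Jena classes). addGraph stores its own copy, a model is only a (dataset, name) handle, and a free-standing model gets a private dataset behind it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: triples are plain strings, graphs are keyed by name.
class CopyInDataset {
    static final String DEFAULT_GRAPH = "urn:x-hypothetical:default";
    private final Map<String, Set<String>> graphs = new HashMap<>();

    /** Copy-in: store a private copy, so later edits to 'source' do not leak in. */
    void addGraph(String name, Set<String> source) {
        graphs.put(name, new HashSet<>(source));
    }

    /** A "model" is just a view: it keeps no triples, only (dataset, graph name). */
    NamedGraphView getGraph(String name) {
        return new NamedGraphView(this, name);
    }

    /** A free-standing model gets an implicit, private dataset behind it. */
    static NamedGraphView freeStandingModel() {
        return new CopyInDataset().getGraph(DEFAULT_GRAPH);
    }

    Set<String> storage(String name) {
        return graphs.computeIfAbsent(name, n -> new HashSet<>());
    }
}

class NamedGraphView {
    private final CopyInDataset dataset;
    private final String name;

    NamedGraphView(CopyInDataset dataset, String name) {
        this.dataset = dataset;
        this.name = name;
    }

    void add(String triple) { dataset.storage(name).add(triple); }
    boolean contains(String triple) { return dataset.storage(name).contains(triple); }
    int size() { return dataset.storage(name).size(); }
}
```

Because every view reads and writes through the dataset, a single dataset-level lock or transaction can cover all of its graphs, which is the coherence that copy-in is meant to buy.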


Andy



On Fri, Jun 5, 2015 at 8:30 PM, Andy Seaborne (JIRA) j...@apache.org
wrote:


Andy Seaborne created JENA-957:
--

  Summary: Review concurrency howto in the light of
transactions.
  Key: JENA-957
  URL: https://issues.apache.org/jira/browse/JENA-957
  Project: Apache Jena
   Issue Type: Bug
 Reporter: Andy Seaborne
 Priority: Minor


http://jena.apache.org/documentation/notes/concurrency-howto.html

Include {{DatasetGraphWithLock}}.

Consider if that should be the default for in-memory and general datasets.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)









Re: TDB2

2015-06-08 Thread Andy Seaborne

On 08/06/15 17:48, Marco Neumann wrote:

is TDB2 going to replace TDB or is TDB2 a new cluster product?


Whatever people (users, developers) want.  Migrating DBs is not as easy 
as upgrading code.  Running oaj.tdb and oaj.tdb2 side by side


(TDB2 is itself 7 Maven modules at the moment; some can be combined, as they are 
small and were just a good idea at the time.)


TDB2 is not the cluster (that's Lizard).  Mantis started as the 
separation of the low-level code needed for Lizard. Initially it was a 
validation of the reworked transactions and data structures; a little 
extra work has made it viable as TDB2.


Andy

(oaj = org.apache.jena)



Marco

On Mon, Jun 8, 2015 at 11:41 AM, Andy Seaborne a...@apache.org wrote:

Informational announcement: TDB2

TDB2 is a reworking of TDB based on updated implementations of transactions
and transactional data structures for project Lizard (a clustered SPARQL
store).

TDB2 has:

* Arbitrary scale write-once transactions
* New transaction system - can add other first class components.
   (e.g. text indexes, cache tables)
* Models work across transaction boundaries
* Cleaner, simpler, more maintainable

TDB2 databases are not compatible with TDB databases.  It uses a more
efficient encoding for RDF terms.  [1]

As this is a database, the new indexing and transaction code needs time to settle
and mature.  I'm using that tech in Lizard development.

 Andy

TDB2 code:
https://github.com/afs/mantis/tree/master/tdb2

Lizard slides:
http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard


[1] An upgrade path using TDB1-style encoding is possible; it is a one-way
upgrade path and not reversible [2].  TDB2 adds control files for the
copy-on-write data structures that TDB1 does not understand.

[2] Actually, if the encoding is compatible, what will happen is that TDB1
will see the database at the time of the upgrade.  Welcome to copy-on-write
immutable data structures.
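Footnote [2] can be illustrated with a small copy-on-write store. This sketch (CowStore is a hypothetical name) copies the whole map on each write for brevity; a copy-on-write B+tree would typically copy only the path from the changed leaf to the root. A reader that captured the old root keeps seeing that version, much as footnote [2] describes TDB1 seeing the database as of the upgrade.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch of copy-on-write versioning: a published version is never
// modified; a writer builds a new version and swaps the root pointer.
class CowStore {
    private final AtomicReference<Map<String, String>> root =
        new AtomicReference<>(Collections.<String, String>emptyMap());

    /** Whoever holds this snapshot sees that version, whatever writers do later. */
    Map<String, String> snapshot() {
        return root.get();
    }

    void put(String key, String value) {
        while (true) {
            Map<String, String> old = root.get();
            Map<String, String> copy = new HashMap<>(old); // copy; never mutate 'old'
            copy.put(key, value);
            Map<String, String> next = Collections.unmodifiableMap(copy);
            if (root.compareAndSet(old, next)) return;     // retry if another writer won
        }
    }
}
```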








Re: Trouble Building Under Eclipse

2015-06-08 Thread Andy Seaborne
Hadoop/Elephas is an example of a general problem with Guava.  By 
reputation, upgrading Guava across versions has been problematic - 
subtle and not-so-subtle changes of behaviour or removed code.


When Jena is used as a library, the system or application in which it is 
used might use Guava itself - and need a specific version.  But Jena 
uses Guava and needs a specific version with certain code in it, which 
might be different.


We are isolating Jena's use of Guava from the system in which Jena is 
used.  Hadoop has very strong requirements on Guava versions, and the 
same might well apply to other user applications.


We do <exclude/> in the sense that the dependency-reduced POM 
(dependency-reduced-pom.xml) of jena-shaded-guava does not mention 
com.google.guava:guava. Elephas picks up the Hadoop dependency.


Andy

On 08/06/15 14:26, aj...@virginia.edu wrote:

I think the idea of breaking the shaded Guava artifact out of the
main  cycle is great. It's clearly not a subject of work under most
circumstances and having one less moving part in a developer's mix is
usually a good thing, especially for the simple-minded ({raises hand}).

Is it only Hadoop's Guava that is at issue? Would it be possible
perhaps to just <exclude/> Guava from the Hadoop dependencies in
Elephas? Or does that blow up Hadoop? Or should I go experiment and find
out?



---
A. Soroka
The University of Virginia Library

On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote:


Ah right. To summarise what is happening:

The POM file in the Maven repo is not the POM file in git. The shade plugin 
produces a different POM for the output artifact, with the shaded dependency 
removed.

When the project is not open, Eclipse sees the reduced POM, which does not have a 
dependency on Google Guava.

When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the 
module source, which names Google Guava as a dependency.

Result: a certain degree of chaos.
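A hypothetical diagnostic for this situation, using the relocated class name from the NoClassDefFoundError in Rob's report (org.apache.jena.ext.com.google.common.cache.RemovalNotification). It reports whether the shaded class and the upstream Guava class are loadable from the current classpath; with jena-shaded-guava open in Eclipse, one would expect the upstream class to be visible and the shaded one missing. GuavaVisibility is not part of Jena.

```java
// Hypothetical diagnostic: which Guava flavour can the current classpath load?
class GuavaVisibility {
    static boolean isLoadable(String className) {
        try {
            // 'false': look the class up without running its static initializers.
            Class.forName(className, false, GuavaVisibility.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError, as seen in the report.
            return false;
        }
    }

    public static void main(String[] args) {
        String shaded = "org.apache.jena.ext.com.google.common.cache.RemovalNotification";
        String upstream = "com.google.common.cache.RemovalNotification";
        System.out.println("shaded (jena-shaded-guava jar): " + isLoadable(shaded));
        System.out.println("upstream Guava:                 " + isLoadable(upstream));
    }
}
```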

Andy

On 06/06/15 03:19, Stian Soiland-Reyes wrote:

Yes, you would need to keep the jena-guava project closed so you get the
Maven-built shaded jar on the classpath, which has the shaded package name,
otherwise you will just see the upstream Guava through Eclipse's project
sharing.

The package name is not shaded for OSGi; it is easy to define private
packages there. It is shaded to avoid duplicate, mismatched versions alongside
other dependencies on the real Guava, e.g. Hadoop, which as you know has
an ancient Guava.
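What relocation does can be sketched as a prefix rewrite on class names. The prefix pair below is inferred from the shaded class name in the NoClassDefFoundError quoted in this thread; the Relocation class itself is hypothetical and is not the shade plugin's code. Because the rewritten names differ, Hadoop's old com.google.common classes and Jena's copy can share one classpath.

```java
// Hypothetical illustration of package relocation as performed by shading.
class Relocation {
    static final String FROM = "com.google.common.";
    static final String TO = "org.apache.jena.ext.com.google.common.";

    /** Rewrite Guava class names into Jena's private namespace; leave others alone. */
    static String relocate(String className) {
        return className.startsWith(FROM)
            ? TO + className.substring(FROM.length())
            : className;
    }
}
```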

It might be good to keep it out of the normal build/release cycle, then you
would get the jena-guava shade from Maven central, which should only change
when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT
build or vote+released as a separate artifact (which might be slightly odd
as it contains no Jena contributions beyond the package name)
  On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote:


I have had this problem since I began tinkering. The only solution I have
found is to make sure that the jena-shaded-guava project is never open when
any project that refers to types therein is open. This isn't much of a
burden, and I suppose it has something to do with the Maven magic that is
going on inside jena-shaded-guava.

I'm not totally clear as to why Jena shades Guava into its own namespace--
is it to avoid OSGi-exporting Guava packages? (We have something like that
going on in another project on which I work.)

---
A. Soroka
The University of Virginia Library

On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote:


Folks

Recently I've been having a lot of trouble getting Jena to build in
Eclipse, which seems to be due to the use of the Shade plugin to shade
Guava.  Any module that has a reference to the shaded classes refuses to
build with various variations of the following error:

java.lang.NoClassDefFoundError:
org/apache/jena/ext/com/google/common/cache/RemovalNotification

Anybody else been having this issue?  If so how did you resolve it?

Sometimes cleaning my workspace and/or doing a mvn package at the command
line seems to help, but other times it doesn't.

Rob
















Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.

2015-06-08 Thread aj...@virginia.edu
So to be clear, part of the idea here is to boost the visibility of 
transactions, and one of the things that wants doing as part of that is to 
provide for copy-on-add-graph semantics for the in-memory dataset so that 
transactionality is coherent across such a dataset. Right now it instead is a 
sort of patchwork of whatever forms of transactionality were available in the 
graphs that have been added to it, which isn't an attractive thing to 
advertise, and may not even really work all the time.

As far as model-as-views-of-datasets: is it true that all that is needed for 
this is a good in-memory dataset? What about datasets that are much too large 
for memory? Or impls of Dataset that incur network latency in operation? Or do 
these cases just imply the need for the right kinds of laziness in views on 
Datasets?

---
A. Soroka
The University of Virginia Library




[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-110094664
  
I reorganized the tests part.




Re: [ANN] GSoC 2015 Accepts a Student Project for Jena

2015-06-08 Thread Andy Seaborne

On 08/06/15 10:27, Qihong Lin wrote:

Hi,

The grammar has been modified for the problems you pointed out.
I've tried to run the grammar script to generate arq.jj, sparql_11.jj and
their parser java classes, in cygwin with JavaCC 5.0. However the
generated java classes are different from those in the code base:
1) ARQParser
- the new generated one:
public class ARQParser extends ARQParserBase implements ARQParserConstants
- the old one in the code base:
public class ARQParser extends ARQParserBase


Ignore that difference - implements ARQParserConstants is fine and 
correct.


(ARQParserBase implements ARQParserConstants)

ARQParser got modified in some code clean up and should not have been.
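A small illustration with hypothetical names standing in for ARQParserConstants, ARQParserBase and ARQParser: re-declaring an interface that the superclass already implements is legal and changes nothing, which is why the extra implements clause is harmless.

```java
// Hypothetical stand-ins for the generated JavaCC types.
interface ParserConstants {
    int EOF = 0;
}

class ParserBase implements ParserConstants { }

// As regenerated: redundant but legal re-declaration of the interface.
class GeneratedParser extends ParserBase implements ParserConstants {
    int eof() { return EOF; }
}

// As in the cleaned-up code: the constants are still inherited via ParserBase.
class HandWrittenParser extends ParserBase {
    int eof() { return EOF; }
}
```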


There's no such difference for SPARQLParser11 (both new and old ones
have implements ...)


Good.


2) checksum for Token, ParseException, JavaCharStream and so on
- the new generated one (Token.java):
/* JavaCC - OriginalChecksum=335d1922781852977208d5cdca0fc164 (do not
edit this line) */
- the old one in the code base (Token.java):
/* JavaCC - OriginalChecksum=d9b4c8c9332fa3054a004615fdb22b89 (do not
edit this line) */


I have no idea what the checksum is a checksum of!

If the line endings are different, the checksums might be affected.
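A quick way to see the line-ending effect, assuming the OriginalChecksum is a digest over the file text (the thread does not establish what it covers; the 32 hex digits are merely MD5-sized). LineEndingDigest is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: hex MD5 of some text, to compare LF vs CRLF versions.
class LineEndingDigest {
    static String md5Hex(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(text.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b)); // two lowercase hex digits per byte
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }
}
```

The same two lines of source give different digests depending on whether they end in "\n" or "\r\n", so files generated under Cygwin could differ from the committed ones for that reason alone.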


The log from grammar script seems OK:

$ ./grammar
 Process grammar -- sparql_11.jj
Java Compiler Compiler Version 5.0 (Parser Generator)


Ok - version 5.0.


(type javacc with no arguments for help)
Reading from file sparql_11.jj . . .
File TokenMgrError.java does not exist.  Will create one.
File ParseException.java does not exist.  Will create one.
File Token.java does not exist.  Will create one.
File JavaCharStream.java does not exist.  Will create one.
Parser generated successfully.
 Create text form
Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4)
(type jjdoc with no arguments for help)
Reading from file sparql_11.jj . . .
Grammar documentation generated successfully in sparql_11.txt
 Fixing Java warnings in TokenManager ...
 Fixing Java warnings in Token ...
 Fixing Java warnings in TokenMgrError ...
 Fixing Java warnings in SPARQLParser11 ...
 Done
 Process grammar -- arq.jj
Java Compiler Compiler Version 5.0 (Parser Generator)
(type javacc with no arguments for help)
Reading from file arq.jj . . .
File TokenMgrError.java does not exist.  Will create one.
File ParseException.java does not exist.  Will create one.
File Token.java does not exist.  Will create one.
File JavaCharStream.java does not exist.  Will create one.


"does not exist" is to be expected.  The script deletes old files before 
it runs javacc to ensure everything is clean.



Parser generated successfully.
 Create text form
Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4)
(type jjdoc with no arguments for help)
Reading from file arq.jj . . .
Grammar documentation generated successfully in arq.txt
 Fixing Java warnings in TokenManager ...
 Fixing Java warnings in Token ...
 Fixing Java warnings in TokenMgrError ...
 Fixing Java warnings in ARQParser ...
 Done

Is that the expected behavior for the grammar script? Anything wrong?


looks good.

If the ARQ test suite runs, it should be good.

cd jena-arq ; mvn clean test



regards,
Qihong


On Sat, Jun 6, 2015 at 11:05 AM, Ying Jiang jpz6311...@gmail.com wrote:

Hi,

The grammar needs revisions in some way. For example, in your
proposal, the GRAPH token can be optional. Another problem for default
graph: both { ?s :p ?o } and ?s :p ?o are valid, so QuadsNotTriples
should be re-defined.

On the other hand, you can start playing with the code of master.jj.
There's no need to wait until the grammar is ready. Your code is
supposed to be delivered as soon as possible. We can have early
feedback from the end users. Merging early will also reduce any
problems with several people changing the same file.

Best regards,
Ying Jiang

On Fri, Jun 5, 2015 at 6:25 PM, Qihong Lin confidence@gmail.com wrote:

Hi,

I added the grammar draft at the end of [1]. There are actually only minor
changes to the grammar of ConstructQuery, which are marked in red. Much
of the Quads-related grammar from SPARQL INSERT can be reused. Any
comments?


regards,
Qihong

[1] 
https://docs.google.com/document/d/1KiDlfxMq5ZsU7vj7ZDm10yC96OZgdltwmZAZl56sTw0




Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.

2015-06-08 Thread Andy Seaborne

On 08/06/15 20:38, aj...@virginia.edu wrote:

So to be clear, part of the idea here is to boost the visibility of
transactions, and one of the things that wants doing as part of that
is to provide for copy-on-add-graph semantics for the in-memory
dataset so that transactionality is coherent across such a dataset.
Right now it instead is a sort of patchwork of whatever forms of
transactionality were available in the graphs that have been added to
it, which isn't an attractive thing to advertise, and may not even
really work all the time.


Less than that: there is no transactionality across the contained graphs. 
(Model.graph transactions aren't connected to dataset transactions.)



As far as model-as-views-of-datasets: is it true that all that is
needed for this is a good in-memory dataset?


It would be useful for working in-memory. For example, the default union 
graph can be made to work efficiently, as can dataset transactions.



What about datasets that
are much too large for memory? Or impls of Dataset that incur network
latency in operation? Or do these cases just imply the need for the
right kinds of laziness in views on Datasets?


Models from TDB are already views.

public class GraphTDB extends GraphView ...
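A sketch of why views suit the large or remote case, assuming a string-keyed stand-in for the backing dataset (UnionGraphView is hypothetical and unrelated to Jena's GraphView). The view holds no triples, so nothing is loaded up front, and graphs added later are visible immediately; the same delegation is one way to offer a default union graph without copying.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: the view stores only a reference to the backing data.
// Every call goes through to it, so nothing is copied or loaded up front.
class UnionGraphView {
    private final Map<String, Set<String>> namedGraphs; // stand-in for a dataset

    UnionGraphView(Map<String, Set<String>> namedGraphs) {
        this.namedGraphs = namedGraphs;
    }

    /** True if any named graph contains the triple (union semantics). */
    boolean contains(String triple) {
        for (Set<String> graph : namedGraphs.values()) {
            if (graph.contains(triple)) return true;
        }
        return false;
    }
}
```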

Andy

















Re: TDB2

2015-06-08 Thread Marco Neumann
is TDB2 going to replace TDB or is TDB2 a new cluster product?

Marco




-- 


---
Marco Neumann
KONA


Re: [ANN] GSoC 2015 Accepts a Student Project for Jena

2015-06-08 Thread Qihong Lin
Hi,

The grammar has been modified for the problems you pointed out.
I've tried to run the grammar script to generate arq.jj, sparql_11.jj and
their parser java classes, in cygwin with JavaCC 5.0. However the
generated java classes are different from those in the code base:
1) ARQParser
- the new generated one:
public class ARQParser extends ARQParserBase implements ARQParserConstants
- the old one in the code base:
public class ARQParser extends ARQParserBase
There's no such difference for SPARQLParser11 (both new and old ones
have implements ...)
2) checksum for Token, ParseException, JavaCharStream and so on
- the new generated one (Token.java):
/* JavaCC - OriginalChecksum=335d1922781852977208d5cdca0fc164 (do not
edit this line) */
- the old one in the code base (Token.java):
/* JavaCC - OriginalChecksum=d9b4c8c9332fa3054a004615fdb22b89 (do not
edit this line) */

The log from grammar script seems OK:

$ ./grammar
 Process grammar -- sparql_11.jj
Java Compiler Compiler Version 5.0 (Parser Generator)
(type javacc with no arguments for help)
Reading from file sparql_11.jj . . .
File TokenMgrError.java does not exist.  Will create one.
File ParseException.java does not exist.  Will create one.
File Token.java does not exist.  Will create one.
File JavaCharStream.java does not exist.  Will create one.
Parser generated successfully.
 Create text form
Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4)
(type jjdoc with no arguments for help)
Reading from file sparql_11.jj . . .
Grammar documentation generated successfully in sparql_11.txt
 Fixing Java warnings in TokenManager ...
 Fixing Java warnings in Token ...
 Fixing Java warnings in TokenMgrError ...
 Fixing Java warnings in SPARQLParser11 ...
 Done
 Process grammar -- arq.jj
Java Compiler Compiler Version 5.0 (Parser Generator)
(type javacc with no arguments for help)
Reading from file arq.jj . . .
File TokenMgrError.java does not exist.  Will create one.
File ParseException.java does not exist.  Will create one.
File Token.java does not exist.  Will create one.
File JavaCharStream.java does not exist.  Will create one.
Parser generated successfully.
 Create text form
Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4)
(type jjdoc with no arguments for help)
Reading from file arq.jj . . .
Grammar documentation generated successfully in arq.txt
 Fixing Java warnings in TokenManager ...
 Fixing Java warnings in Token ...
 Fixing Java warnings in TokenMgrError ...
 Fixing Java warnings in ARQParser ...
 Done

Is that the expected behavior for the grammar script? Anything wrong?

regards,
Qihong


On Sat, Jun 6, 2015 at 11:05 AM, Ying Jiang jpz6311...@gmail.com wrote:
 Hi,

 The grammar needs revisions in some way. For example, in your
 proposal, the GRAPH token can be optional. Another problem for default
 graph: both { ?s :p ?o } and ?s :p ?o are valid, so QuadsNotTriples
 should be re-defined.

 On the other hand, you can start playing with the code of master.jj.
 There's no need to wait until the grammar is ready. Your code is
 supposed to be delivered as soon as possible. We can have early
 feedback from the end users. Merging early will also reduce any
 problems with several people changing the same file.

 Best regards,
 Ying Jiang

 On Fri, Jun 5, 2015 at 6:25 PM, Qihong Lin confidence@gmail.com wrote:
 Hi,

 I added the grammar draft at the end of [1]. There are actually only minor
 changes to the grammar of ConstructQuery, which are marked in red. Much
 of the Quads-related grammar from SPARQL INSERT can be reused. Any
 comments?


 regards,
 Qihong

 [1] 
 https://docs.google.com/document/d/1KiDlfxMq5ZsU7vj7ZDm10yC96OZgdltwmZAZl56sTw0

 On Tue, Jun 2, 2015 at 10:10 PM, Ying Jiang jpz6311...@gmail.com wrote:
 Hi Qihong,

 Your grammar in the proposal is not formal. Why not compose a BNF/EBNF
 notation one, so that others can provide more accurate comments? For example,
 the WHERE clauses for the complete form and the short form are quite
 different: no complex graph patterns are allowed in the short form.

 Best regards,
 Ying Jiang

 On Thu, May 28, 2015 at 10:59 PM, Qihong Lin confidence@gmail.com 
 wrote:
 Hi,

 Ying,
 I'll stick to the list for discussion. Thanks for your guidance! I
 re-created a fresh branch for JENA-491, which no longer contains the hp
 package.

 Andy,
 You mentioned that the GRAPH grammar needs revisions. Please check the
 following. I have added the short form. Am I missing anything else?

 Complete form:

 CONSTRUCT {
   # Named graph
   GRAPH :g { ?s :p ?o }
   # Default graph
   { ?s :p ?o }
   # Named graph
   :g { ?s :p ?o }
   # Default graph
   ?s :p ?o
 } WHERE { ... }

 Short form:

 CONSTRUCT {

 } WHERE { ... }

 regards,
 Qihong



 On Tue, May 26, 2015 at 11:12 PM, Ying Jiang jpz6311...@gmail.com wrote:
 Hi Qihong,

 As Andy mentioned, the bonding period is for community bonding, not
 just mentor 

Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.

2015-06-08 Thread Claude Warren
What exactly is this review asking?  Change in strategy or change in docs?


-- 
I like: Like Like - The likeliest place on the web
http://like-like.xenei.com
LinkedIn: http://www.linkedin.com/in/claudewarren


[jira] [Created] (JENA-959) riot: gzip output option

2015-06-08 Thread Stian Soiland-Reyes (JIRA)
Stian Soiland-Reyes created JENA-959:


 Summary: riot: gzip output option
 Key: JENA-959
 URL: https://issues.apache.org/jira/browse/JENA-959
 Project: Apache Jena
  Issue Type: New Feature
  Components: RIOT
Reporter: Stian Soiland-Reyes
Priority: Trivial


The riot command line tool supports incoming file formats like *.ttl.gz, but 
there is no (obvious) way to also output in compressed formats.

This can of course also be achieved with piping and gzip, but that is easily 
platform-specific and not so easily 

So my suggestion is to support the extension .gz in the various --output options to 
enable output via a GZIPOutputStream -- 
http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html

For example:

{code}
stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz 
chembl_20.0_target_targetcmpt_ls.ttl.gz 
Not recognized as an RDF language : 'nquads.gz'
{code}
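A minimal sketch of the suggested behaviour, assuming the language name is whatever precedes a trailing .gz (as in the nquads.gz example above). GzipByExtension and its methods are hypothetical, not riot code; only GZIPOutputStream and GZIPInputStream are the real java.util.zip classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not riot code: ".gz" on the output name selects compression.
class GzipByExtension {
    /** "nquads.gz" -> "nquads"; names without ".gz" are returned unchanged. */
    static String languageName(String output) {
        return output.endsWith(".gz") ? output.substring(0, output.length() - 3) : output;
    }

    /** Wrap the sink in a GZIPOutputStream only when the output name ends in ".gz". */
    static OutputStream wrap(String output, OutputStream sink) throws IOException {
        return output.endsWith(".gz") ? new GZIPOutputStream(sink) : sink;
    }

    /** Write text to memory; closing the stream writes the gzip trailer. */
    static byte[] writeText(String output, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = wrap(output, sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompress, e.g. to check what was written. */
    static String gunzip(byte[] bytes) {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(bytes))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In a real tool the sink would be a file or stdout rather than a byte array; the wrapping decision is the same.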





[jira] [Updated] (JENA-959) riot: gzip output option

2015-06-08 Thread Stian Soiland-Reyes (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stian Soiland-Reyes updated JENA-959:
-
Description: 
The riot command line tool supports incoming file formats like *.ttl.gz, but 
there is no (obvious) way to also output in compressed formats.

This can of course also be achieved with piping and gzip, but that is easily 
platform-specific. Writing *.format.gz from the command line is probably as 
much within the remit of someone using riot on the command line as reading 
those.

So my suggestion is to support the extension .gz in the various --output options to 
enable output via a GZIPOutputStream -- 
http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html

For example:

{code}
stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz 
chembl_20.0_target_targetcmpt_ls.ttl.gz 
Not recognized as an RDF language : 'nquads.gz'
{code}

  was:
The riot command line tool supports incoming file formats like *.ttl.gz, but 
there is no (obvious) way to also output in compressed formats.

This can of course also be achieved with piping and gzip, but that is easily 
platform-specific and not so easily 

So my suggestion is to support extension .gz in the various --output options to 
enabled outputting via a GzipOutputStream -- 
http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html

For example:

{code}
stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz 
chembl_20.0_target_targetcmpt_ls.ttl.gz 
Not recognized as an RDF language : 'nquads.gz'
{code}




[jira] [Comment Edited] (JENA-959) riot: gzip output option

2015-06-08 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215
 ] 

A. Soroka edited comment on JENA-959 at 6/8/15 1:53 PM:


What do you think of the idea of an independent flag ({{--compress}} or the 
like)? Since compression can be applied orthogonally to any format, it seems a 
little simpler to keep it separate.
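A sketch of the orthogonal alternative: compression is a separate boolean, so any output language can be combined with it. --compress is the flag proposed in this comment, not an existing riot option, and OutputOptions is hypothetical. A writer would then wrap its stream in a GZIPOutputStream whenever compress is true.

```java
// Hypothetical parsing of the proposed flag; --compress is not an existing riot option.
class OutputOptions {
    final String language;  // e.g. "nquads"; compression is not encoded in the name
    final boolean compress; // applies to whichever language was chosen

    private OutputOptions(String language, boolean compress) {
        this.language = language;
        this.compress = compress;
    }

    static OutputOptions parse(String... args) {
        String language = null;
        boolean compress = false;
        for (String arg : args) {
            if (arg.startsWith("--output=")) {
                language = arg.substring("--output=".length());
            } else if (arg.equals("--compress")) {
                compress = true;
            }
        }
        return new OutputOptions(language, compress);
    }
}
```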


was (Author: ajs6f):
What do you think of the idea of an independent flag ({{--compress}}) or the 
like. Since compression can be applied orthogonally to any format, it seems a 
little simpler to keep it separate.



[jira] [Comment Edited] (JENA-959) riot: gzip output option

2015-06-08 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215
 ] 

A. Soroka edited comment on JENA-959 at 6/8/15 1:53 PM:


What do you think of the idea of an independent flag ({{--compress}}) or the 
like. Since compression can be applied orthogonally to any format, it seems a 
little simpler to keep it separate.


was (Author: ajs6f):
What do you think of the idea of an independent flag ({{{--compress}}}) or the 
like. Since compression can be applied orthogonally to any format, it seems a 
little simpler to keep it separate.



[jira] [Commented] (JENA-959) riot: gzip output option

2015-06-08 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215
 ] 

A. Soroka commented on JENA-959:


What do you think of the idea of an independent flag ({{{--compress}}}) or the 
like. Since compression can be applied orthogonally to any format, it seems a 
little simpler to keep it separate.
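A sketch of the independent-flag alternative. It assumes a hypothetical WriterOptions class and does not reflect riot's actual argument handling. The syntax and compression are parsed separately, so any format can be combined with --compress without registering names like "nquads.gz".

```java
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

/**
 * Hypothetical option holder (not riot's real argument parsing): the output
 * syntax and compression are independent settings.
 */
final class WriterOptions {
    String outputSyntax = null; // e.g. "turtle"; null means "not given"
    boolean compress = false;   // set by a separate --compress flag

    static WriterOptions fromArgs(String... args) {
        WriterOptions opts = new WriterOptions();
        for (String arg : args) {
            if (arg.startsWith("--output=")) {
                opts.outputSyntax = arg.substring("--output=".length());
            } else if (arg.equals("--compress")) {
                opts.compress = true;
            }
            // anything else (input files, other options) is ignored in this sketch
        }
        return opts;
    }

    /** Compression is applied to the byte stream, whatever the syntax. */
    OutputStream open(OutputStream out) throws IOException {
        return compress ? new GZIPOutputStream(out) : out;
    }
}
```

The design point is that the set of syntaxes never has to grow to cover compressed variants, because compression is one extra stream layer.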


[jira] [Commented] (JENA-959) riot: gzip output option

2015-06-08 Thread Stian Soiland-Reyes (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577220#comment-14577220
 ] 

Stian Soiland-Reyes commented on JENA-959:
--

Yeah, either should work. It might also be worth having explicit compression 
support for input formats. For instance, this works now:

{code}
riot --syntax=turtle chembl_20.0_target_targetcmpt_ls.ttl.gz

<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7619> .
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7612> .
<http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022> <http://www.w3.org/2004/02/skos/core#relatedMatch> <http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7611> .

{code}

but it still infers the .gz from the filename, so I can't do the same if 
I have piped in a gzipped stream or the file doesn't have a recognisable extension:

{code}
stain@biggie-utopic:~/Downloads$ riot --syntax=nquads fred
stain@biggie-utopic:~/Downloads$ riot --syntax=turtle fred
Exception in thread "main" org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1
    at org.apache.jena.atlas.io.IO.exception(IO.java:222)
    at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77)
    at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154)
    at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137)
    at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:241)
    at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:235)
    at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:157)
    at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:98)
    at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41)
    at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:138)
    at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:180)
    at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:267)
    at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:185)
    at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:175)
    at riotcmd.CmdLangParse.exec(CmdLangParse.java:148)
    at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:63)
    at arq.cmdline.CmdMain.mainRun(CmdMain.java:50)
    at riotcmd.riot.main(riot.java:35)
Caused by: java.nio.charset.MalformedInputException: Input length = 1
    at java.nio.charset.CoderResult.throwException(CoderResult.java:281)
    at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339)
    at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
    at java.io.InputStreamReader.read(InputStreamReader.java:184)
    at java.io.Read
{code}


So for this case, I would appreciate it if --syntax supported the same compression 
option:

{code}
stain@biggie-utopic:~/Downloads$ riot --syntax=turtle.gz fred
Can not detemine the synatx from 'turtle.gz'
{code}
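The thread proposes explicit options for this case, either --syntax=turtle.gz or a --decompress flag. A different technique, which none of the participants suggested, is to sniff the gzip magic bytes (0x1f 0x8b, defined in RFC 1952). That way, piped or oddly named input is detected from its content rather than from its name. MaybeGunzip is a hypothetical helper. This is a sketch, not a proposed patch.

```java
import java.io.BufferedInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

/** Hypothetical helper: decompress the stream only if it starts with the gzip magic bytes. */
final class MaybeGunzip {
    private MaybeGunzip() {}

    static InputStream wrap(InputStream in) throws IOException {
        BufferedInputStream buf = new BufferedInputStream(in);
        buf.mark(2);              // remember the start so we can un-read the peeked bytes
        int id1 = buf.read();     // -1 if the stream is empty
        int id2 = buf.read();
        buf.reset();              // the caller (or GZIPInputStream) sees the stream from byte 0
        if (id1 == 0x1f && id2 == 0x8b) {
            return new GZIPInputStream(buf);
        }
        return buf;               // not gzip: pass the bytes through unchanged
    }
}
```

An explicit option is still useful when the user wants to be strict. Content sniffing only removes the dependence on the filename.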

 riot: gzip output option
 

 Key: JENA-959
 URL: https://issues.apache.org/jira/browse/JENA-959
 Project: Apache Jena
  Issue Type: New Feature
  Components: RIOT
Reporter: Stian Soiland-Reyes
Priority: Trivial

 The riot command line tool supports incoming file formats like *.ttl.gz, but 
 there is no (obvious) way to also output in compressed formats.
 This can of course also be achieved with piping and gzip, but that is easily 
 platform-specific. Writing *.format.gz from the command line is probably as 
 much within the remit of someone using riot on the command line as reading 
 such files is.
 So my suggestion is to support the .gz extension in the various --output options 
 to enable output via a GZIPOutputStream -- 
 http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html
 For example:
 {code}
 stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz 
 chembl_20.0_target_targetcmpt_ls.ttl.gz 
 Not recognized as an RDF language : 'nquads.gz'
 {code}




[jira] [Issue Comment Deleted] (JENA-959) riot: gzip output option

2015-06-08 Thread Stian Soiland-Reyes (JIRA)

 [ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stian Soiland-Reyes updated JENA-959:
-
Comment: was deleted


[jira] [Commented] (JENA-959) riot: gzip output option

2015-06-08 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577228#comment-14577228
 ] 

A. Soroka commented on JENA-959:


Okay, I'll take this ticket forward a bit working on the assumption that a 
separate flag for output compression is best. I agree that 'manually 
adjustable' input compression would be nice, and I think that belongs in a 
separate ticket, or maybe we break this one down into subtasks?


[jira] [Comment Edited] (JENA-959) riot: gzip output option

2015-06-08 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215
 ] 

A. Soroka edited comment on JENA-959 at 6/8/15 1:53 PM:


What do you think of the idea of an independent flag ({{--compress}} or the 
like)? Since compression can be applied orthogonally to any format, it seems a 
little simpler to keep it separate.


was (Author: ajs6f):
What do you think of the idea of an independent flag ({{--compress}} or the 
like). Since compression can be applied orthogonally to any format, it seems a 
little simpler to keep it separate.


[jira] [Commented] (JENA-959) riot: gzip output option

2015-06-08 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577264#comment-14577264
 ] 

Andy Seaborne commented on JENA-959:


Currently, it is reasonably orthogonal to the format.

The asymmetries are
# {{riot}} -- output is not named, whereas input usually is.
# {{RDFDataMgr}} takes I/O streams, not file names.

{{RDFLanguages.filenameToLang}} maps a file extension to a language symbol, and it 
handles {{.gz}}. {{Lang}} objects themselves don't register compressed variants, i.e. they 
don't have a specific file extension of {{.ttl.gz}}.

Then when reading, {{IO.openFileEx(String)}} has a similar understanding of 
{{.gz}} and adds the decompressor.

{{IO.openOutputFileEx(String)}} already has the complementary code to 
{{IO.openFileEx(String)}} to add the compressor.

This then all works from {{RDFDataMgr.(read|load)}} and {{model.read}}. The 
command {{riot}} isn't special for input.

Making syntax names work with compression extensions looks interesting.  

If there is a {{--compress}}, then there should be a {{--decompress}} for streamed input.

Don't forget the {{http://.../gz}} case and decompression (i.e. when the HTTP 
response does not add the decompression step as you GET the compressed file).
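The following sketch mirrors the arrangement described above using only the JDK. The ".gz" suffix is handled in one place when a file is opened, and it is stripped before the syntax extension is examined. GzFiles and its methods are hypothetical stand-ins. They are not the actual IO or RDFLanguages implementations, whose details may differ.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helpers (not Jena code): ".gz" handling kept separate from the RDF syntax. */
final class GzFiles {
    private GzFiles() {}

    static boolean isGz(String name) {
        return name.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /** "data.ttl.gz" -> "ttl": drop ".gz" first, then take the syntax extension. */
    static String syntaxExtension(String name) {
        String base = isGz(name) ? name.substring(0, name.length() - ".gz".length()) : name;
        int dot = base.lastIndexOf('.');
        return dot < 0 ? "" : base.substring(dot + 1);
    }

    /** Open for reading, adding a decompressor for ".gz" files. */
    static InputStream openInput(Path path) throws IOException {
        InputStream in = Files.newInputStream(path);
        if (!isGz(path.toString())) {
            return in;
        }
        try {
            return new GZIPInputStream(in);   // reads and checks the gzip header
        } catch (IOException e) {
            in.close();                       // don't leak the file handle on a bad header
            throw e;
        }
    }

    /** Open for writing, adding the complementary compressor for ".gz" files. */
    static OutputStream openOutput(Path path) throws IOException {
        OutputStream out = Files.newOutputStream(path);
        if (!isGz(path.toString())) {
            return out;
        }
        try {
            return new GZIPOutputStream(out); // writes the gzip header
        } catch (IOException e) {
            out.close();
            throw e;
        }
    }
}
```

Because both directions key off the same suffix test, a file written with openOutput can be read back with openInput, whatever the syntax inside.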


[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-109986517
  
Hi, the PR is mergeable again after fixing the conflict with #72.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread osma
Github user osma commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-109996262
  
Thanks for fixing, and sorry for causing the conflict with #72.

It's good that you've added unit tests, however I think there could be more 
of them.

The current test adds and removes a resource, and only then checks that 
it's gone. I think it should check that it got into the index in the first 
place, otherwise it could be that text indexing is completely broken (no hits 
ever) and the test would still pass.

Would it be possible/easy to structure the unit tests so that all regular 
tests get executed also with the uid field enabled? After all, it shouldn't 
affect the current functionality if you enable deletion support (if it does 
it's a bug, either in implementation or the tests). You could get a lot of free 
tests this way and there would perhaps be no need for further tests of 
uid/deletion functionality.

A similar trick is used for the graph-specific indexing, i.e. there are 
general tests in AbstractTestDatasetWithTextIndex, then a couple of extra tests 
for graph-aware functionality in AbstractTestDatasetWithGraphTextIndex, and 
finally TestDatasetWithLuceneGraphTextIndex pulls it together with the right 
(graph-aware) configuration. You could similarly try to reuse all the tests in 
AbstractTestDatasetWithTextIndex for the uid case. I admit the class hierarchy 
and naming is a bit complicated...
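Below is a plain-Java sketch of the suggested test layout. It avoids JUnit so that it stays self-contained. The general tests live in an abstract class with a factory method, and the uid-enabled configuration subclasses it so that it reruns them. The deletion test first checks that the resource was indexed before deleting it. ToyTextIndex is a toy in-memory stand-in, not the jena-text or Lucene API, and its uid switch is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for a text index; NOT the jena-text API. */
final class ToyTextIndex {
    private final boolean uidEnabled; // hypothetical switch standing in for the uid field
    private final Map<String, String> docs = new HashMap<>();

    ToyTextIndex(boolean uidEnabled) { this.uidEnabled = uidEnabled; }

    void add(String uri, String text) { docs.put(uri, text); }

    void delete(String uri) {
        if (!uidEnabled) throw new UnsupportedOperationException("deletion needs the uid field");
        docs.remove(uri);
    }

    List<String> query(String word) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (e.getValue().contains(word)) hits.add(e.getKey());
        }
        return hits;
    }
}

/** General tests, written once against an abstract factory method. */
abstract class AbstractToyIndexTests {
    protected abstract ToyTextIndex createIndex();

    static void check(boolean cond, String msg) {
        if (!cond) throw new AssertionError(msg);
    }

    void testQueryFindsAddedResource() {
        ToyTextIndex idx = createIndex();
        idx.add("http://example/a", "hello world");
        check(idx.query("hello").contains("http://example/a"), "added resource not found");
    }

    void runAll() { testQueryFindsAddedResource(); }
}

/** Default configuration: inherits the general tests unchanged. */
class ToyIndexTests extends AbstractToyIndexTests {
    @Override protected ToyTextIndex createIndex() { return new ToyTextIndex(false); }
}

/** uid-enabled configuration: reruns every general test, then adds deletion tests. */
class ToyIndexWithUidTests extends AbstractToyIndexTests {
    @Override protected ToyTextIndex createIndex() { return new ToyTextIndex(true); }

    void testDeleteRemovesResource() {
        ToyTextIndex idx = createIndex();
        idx.add("http://example/a", "hello world");
        // First confirm the resource really got into the index...
        check(idx.query("hello").contains("http://example/a"), "not indexed before delete");
        idx.delete("http://example/a");
        // ...then that deletion removed it.
        check(idx.query("hello").isEmpty(), "still found after delete");
    }

    @Override void runAll() {
        super.runAll();
        testDeleteRemovesResource();
    }
}
```

With JUnit, the same shape maps onto an abstract test class whose @Test methods are inherited by each concrete configuration class.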



Re: Trouble Building Under Eclipse

2015-06-08 Thread aj...@virginia.edu
I think the idea of breaking the shaded Guava artifact out of the main cycle is 
great. It's clearly not a subject of work under most circumstances and having 
one less moving part in a developer's mix is usually a good thing, especially 
for the simple-minded ({raises hand}).

Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to 
just exclude Guava from the Hadoop dependencies in Elephas? Or does that 
blow up Hadoop? Or should I go experiment and find out?

---
A. Soroka
The University of Virginia Library

On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote:

 Ah right. To summarise what is happening:
 
 The POM file in the Maven repo is not the POM file in git. The shade plugin 
 produces a different POM for the output artifact, with the shaded 
 dependency removed.
 
 When the project is not open, Eclipse sees the reduced POM, which does not 
 have a dependency on Google Guava.
 
 When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in 
 the module source, which declares Google Guava as a dependency.
 
 Result: a certain degree of chaos.
 
   Andy
 
 On 06/06/15 03:19, Stian Soiland-Reyes wrote:
 Yes, you would need to keep the jena-guava project closed so you get the
 Maven-built shaded jar on the classpath, which has the shaded package name,
 otherwise you will just see the upstream Guava through Eclipse's project
 sharing.
 
 The package name is not shaded for OSGi; it is easy to define private
 packages there. It is shaded to avoid duplicate version mismatches against
 other dependencies with the real guava, e.g. Hadoop which as you know has
 an ancient Guava.
 
 It might be good to keep it out of the normal build/release cycle, then you
 would get the jena-guava shade from Maven central, which should only change
 when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT
 build or vote+released as a separate artifact (which might be slightly odd
 as it contains no Jena contributions beyond the package name)
  On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote:
 
 I have had this problem since I began tinkering. The only solution I have
 found is to make sure that the jena-shaded-guava project is never open when
 any project that refers to types therein is open. This isn't much of a
 burden, and I suppose it has something to do with the Maven magic that is
 going on inside jena-shaded-guava.
 
 I'm not totally clear as to why Jena shades Guava into its own namespace--
 is it to avoid OSGi-exporting Guava packages? (We have something like that
 going on in another project on which I work.)
 
 ---
 A. Soroka
 The University of Virginia Library
 
 On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote:
 
 Folks
 
 Recently I've been having a lot of trouble getting Jena to build in
 Eclipse
 which seems to be due to the use of the Shade plugin to Shade Guava.  Any
 module that has a reference to the shaded classes refuses to build
 with
 various variations of the following error:
 
 java.lang.NoClassDefFoundError:
 org/apache/jena/ext/com/google/common/cache/RemovalNotification
 
 Anybody else been having this issue?  If so how did you resolve it?
 
 Sometimes cleaning my workspace and/or doing a mvn package at the command
 line seems to help but other times it doesn't
 
 Rob
 
 
 
 
 
 
 



Re: Trouble Building Under Eclipse

2015-06-08 Thread Andy Seaborne

Ah right. To summarise what is happening:

The POM file in the Maven repo is not the POM file in git. The shade 
plugin produces a different POM for the output artifact, with the 
shaded dependency removed.


When the project is not open, Eclipse sees the reduced POM, which does 
not have a dependency on Google Guava.


When the module jena-shaded-guava is open in Eclipse, Eclipse sees the 
POM in the module source, which declares Google Guava as a 
dependency.


Result: a certain degree of chaos.

Andy

On 06/06/15 03:19, Stian Soiland-Reyes wrote:

Yes, you would need to keep the jena-guava project closed so you get the
Maven-built shaded jar on the classpath, which has the shaded package name,
otherwise you will just see the upstream Guava through Eclipse's project
sharing.

The package name is not shaded for OSGi; it is easy to define private
packages there. It is shaded to avoid duplicate version mismatches against
other dependencies with the real guava, e.g. Hadoop which as you know has
an ancient Guava.

It might be good to keep it out of the normal build/release cycle, then you
would get the jena-guava shade from Maven central, which should only change
when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT
build or vote+released as a separate artifact (which might be slightly odd
as it contains no Jena contributions beyond the package name)
  On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote:


I have had this problem since I began tinkering. The only solution I have
found is to make sure that the jena-shaded-guava project is never open when
any project that refers to types therein is open. This isn't much of a
burden, and I suppose it has something to do with the Maven magic that is
going on inside jena-shaded-guava.

I'm not totally clear as to why Jena shades Guava into its own namespace--
is it to avoid OSGi-exporting Guava packages? (We have something like that
going on in another project on which I work.)

---
A. Soroka
The University of Virginia Library

On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote:


Folks

Recently I've been having a lot of trouble getting Jena to build in Eclipse,
which seems to be due to the use of the Shade plugin to shade Guava.  Any
module that has a reference to the shaded classes refuses to build with
various variations of the following error:

java.lang.NoClassDefFoundError:
org/apache/jena/ext/com/google/common/cache/RemovalNotification

Anybody else been having this issue?  If so how did you resolve it?

Sometimes cleaning my workspace and/or doing a mvn package at the command
line seems to help but other times it doesn't

Rob
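This is a small diagnostic sketch. It assumes the underlying question is which Guava classes end up on the classpath, and it reports whether two classes can be loaded: the relocated class named in the error quoted in this thread, and its upstream Guava counterpart. GuavaProbe is hypothetical and not part of the Jena build. If the shaded class is missing while upstream Guava is visible, that would be consistent with Eclipse resolving jena-shaded-guava from its source POM rather than from the shaded jar.

```java
/** Hypothetical diagnostic (not part of Jena): which Guava variant can be loaded here? */
final class GuavaProbe {
    private GuavaProbe() {}

    static boolean isVisible(String className) {
        try {
            // initialize=false: only check that the class can be located and linked
            Class.forName(className, false, GuavaProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String shaded = "org.apache.jena.ext.com.google.common.cache.RemovalNotification";
        String upstream = "com.google.common.cache.RemovalNotification";
        System.out.println("shaded (jena-shaded-guava jar): " + isVisible(shaded));
        System.out.println("upstream Guava: " + isVisible(upstream));
    }
}
```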












[jira] [Comment Edited] (JENA-959) riot: gzip output option

2015-06-08 Thread Andy Seaborne (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577264#comment-14577264
 ] 

Andy Seaborne edited comment on JENA-959 at 6/8/15 2:42 PM:


Currently, it is reasonably orthogonal to the format.

The asymmetries are
# {{riot}} -- output is not named, whereas input usually is.
# {{RDFDataMgr}} takes I/O streams, not file names.

{{RDFLanguages.filenameToLang}} maps a file extension to a language symbol, and it 
handles {{.gz}}. {{Lang}} objects themselves don't register compressed variants, i.e. they 
don't have a specific file extension of {{.ttl.gz}}.

Then when reading, {{IO.openFileEx(String)}} has a similar understanding of 
{{.gz}} and adds the decompressor.

{{IO.openOutputFileEx(String)}} already has the complementary code to 
{{IO.openFileEx(String)}} to add the compressor.

This then all works from {{RDFDataMgr.(read|load)}} and {{model.read}}. The 
command {{riot}} isn't special for input.

Making syntax names work with compression extensions looks interesting.  

If there is a {{\--compress}}, then there should be a {{\--decompress}} for streamed input.

Don't forget the {{http://.../gz}} case and decompression (i.e. when the HTTP 
response does not add the decompression step as you GET the compressed file).


was (Author: andy.seaborne):
Currently, it is reasonably orthogonal to the format.

The asymmetries are
# {{riot}} -- output is not named where as input is usually.
# {{RDFDataMgr}} takes I/O streams, not file names.

{{RDFLanguages.filenameToLang}} maps file extension to language symbol and it 
handles {{.gz}}.  {{Lang}} themselves don't register compressions i.e. don't 
have a specific file extension of {{.ttl.gz}}. 

Then when reading, {{IO.openFileEx(String)}} has the similar understanding of 
{{.gz}} and it adds the decompressor.

{{IO.openOutputFileEx(String)}} already has the complementary code to 
{{IO.openFileEx(String)}} to add the compressor.

This then all works from {{RDFDataMgr.(read|load)}} and {{model.read}}. The 
command {{riot}} isn't special for input.

Making syntax names work with compression extensions look interesting.  

If {{--compress}} then {{--decompress}} for stream in.

Don't forget {{http://.../gz}} case and decompression (i.e. when the HTTP 
response does not add the decompression step as you GET the compressed file.

 riot: gzip output option
 

 Key: JENA-959
 URL: https://issues.apache.org/jira/browse/JENA-959
 Project: Apache Jena
  Issue Type: New Feature
  Components: RIOT
Reporter: Stian Soiland-Reyes
Priority: Trivial

 The riot command line tool supports incoming file formats like *.ttl.gz, but 
 there is no (obvious) way to also output in compressed formats.
 This can of course also be achieved with piping and gzip, but that tends to be 
 platform-specific. Writing *.format.gz from the command line is probably as 
 much within the remit of someone using riot on the command line as reading 
 those files.
 So my suggestion is to support the extension .gz in the various --output options 
 to enable output via a GZIPOutputStream -- 
 http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html
 For example:
 {code}
 stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz 
 chembl_20.0_target_targetcmpt_ls.ttl.gz 
 Not recognized as an RDF language : 'nquads.gz'
 {code}
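One way the suggestion could work, sketched with hypothetical names: strip a trailing .gz from the --output value, pass the remaining name (here nquads) on to the existing language lookup, and remember that the output stream should be wrapped in a GZIPOutputStream. This is not riot's actual argument handling.

```java
import java.util.Locale;

// Hypothetical: split an --output value such as "nquads.gz" into the syntax
// name ("nquads") and a gzip flag. The syntax name would go to the usual
// language lookup; the flag decides whether to wrap the output stream in a
// java.util.zip.GZIPOutputStream.
final class OutputSpec {
    final String syntax;
    final boolean gzip;

    private OutputSpec(String syntax, boolean gzip) {
        this.syntax = syntax;
        this.gzip = gzip;
    }

    static OutputSpec parse(String value) {
        String v = value.trim();
        if (v.toLowerCase(Locale.ROOT).endsWith(".gz")) {
            return new OutputSpec(v.substring(0, v.length() - ".gz".length()), true);
        }
        return new OutputSpec(v, false);
    }

    public static void main(String[] args) {
        OutputSpec s = parse("nquads.gz");
        System.out.println(s.syntax + " gzip=" + s.gzip);   // prints "nquads gzip=true"
        OutputSpec t = parse("turtle");
        System.out.println(t.syntax + " gzip=" + t.gzip);   // prints "turtle gzip=false"
    }
}
```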



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (JENA-804) Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files

2015-06-08 Thread Keith Wells (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577340#comment-14577340
 ] 

Keith Wells commented on JENA-804:
--

This issue has become a pain point: we have encountered cases with our 
customers where the TDB index has grown very large.  One example is a 
170 GB index that shrank to 17 GB after the N-Quads were unloaded and reloaded.

 Jena is not reusing already allocated space on the file system which results 
 in large amounts of disk space reserved by Jena files
 --

 Key: JENA-804
 URL: https://issues.apache.org/jira/browse/JENA-804
 Project: Apache Jena
  Issue Type: Bug
  Components: Jena
Affects Versions: Jena 2.11.2, TDB 1.0.2
 Environment: Windows 7, IBM JRE 1.7, Tomcat 7.0.54
Reporter: Keith Wells
 Attachments: TdbGrowthTests.java, out.txt, test-tdb-size.sh


 We have a product based on Jena TDB where we both insert quads into and 
 delete quads from Jena TDB.  We understand the architectural decision to favour 
 performance over space by not cleaning up deleted node ids from the indexes. 
 But the disk space usage suggests that Jena TDB is not reusing space that it 
 had allocated previously.  Based on the following comment, something appears 
 to be wrong with file space utilization, 
 http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3cce7d7929.2a707%25rve...@dotnetrdf.org%3E:
  "The indexes won't shrink - TDB never gives disk space back to the OS - but 
 disk space is reused when reallocated within the same JVM."
 In this scenario, on the same JVM with NO server stops or starts, we add 27765 
 graphs to IndexTdb and immediately remove them, repeating this process 
 several times. 
 {noformat}
                MB   Bytes       Diff (Bytes)
 Start         193   203239424

 Reindex 5     249   262066176   58826752
 Reindex 6     249   262086656   20480
 Reindex 10    298   312500224   50413568
 Reindex 11    298   312520704   20480
 Reindex 12    298   312541184   20480
 Reindex 13    298   312586240   45056
 Reindex 14    306   320995328   8409088
 Reindex 15    330   346181632   25186304
 Reindex 16    330   346198538   16906
 Reindex 17    346   362999808   16801270
 Reindex 18    346   363020288   20480
 Reindex 19    346   363040768   20480
 Reindex 20    346   363061248   20480
 Reindex 21    346   363081728   20480
 Reindex 22    354   371490816   8409088
 Reindex 23    378   396677120   25186304

 End           193   203239424
 {noformat}
 The system starts with 193 MB of data allocated by indexTdb.  A reindex 
 consists of a remove followed by an add of these graphs. As the data shows, 
 there is a dramatic increase in the size of indexTdb on disk after repeatedly 
 removing and adding graphs.  After Reindex 23, 378 MB of disk space is used. 
 If Jena TDB reused allocated space, there would be no need to allocate more 
 space beyond what is used by deleted node ids (unless node id storage is 
 eating all of this space?).  Jena does not appear to be reusing the allocated 
 disk space.  At the very end of this scenario, we exported the nquads and 
 reloaded them, and disk usage was back to the original 193 MB. 
 We believe Jena TDB is not reusing the space allocated by the TDB file system 
 within the same JVM.
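The behaviour the reporter expected can be illustrated with a toy block allocator, assuming a file whose size is a high-water mark that never shrinks (as the quoted comment says of TDB) and a free list that may or may not be consulted. This is a hypothetical model, not TDB's block manager, and it ignores deleted node ids, which TDB deliberately does not clean up.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical model of a file-backed block allocator; it counts blocks instead of doing I/O.
final class BlockAllocator {
    private final boolean reuseFreed;
    private final Deque<Integer> freeList = new ArrayDeque<>();
    private int fileBlocks = 0;   // high-water mark: the file never shrinks

    BlockAllocator(boolean reuseFreed) { this.reuseFreed = reuseFreed; }

    int allocate() {
        if (reuseFreed && !freeList.isEmpty()) return freeList.pop();  // recycle a freed block
        return fileBlocks++;                                           // otherwise extend the file
    }

    void free(int block) {
        if (reuseFreed) freeList.push(block);
        // else: the block is simply never used again
    }

    int fileBlocks() { return fileBlocks; }

    // Allocate n blocks and free them again, 'rounds' times (like add-then-remove cycles).
    static int cycle(BlockAllocator a, int n, int rounds) {
        for (int r = 0; r < rounds; r++) {
            int[] blocks = new int[n];
            for (int i = 0; i < n; i++) blocks[i] = a.allocate();
            for (int b : blocks) a.free(b);
        }
        return a.fileBlocks();
    }

    public static void main(String[] args) {
        System.out.println(cycle(new BlockAllocator(true), 100, 5));   // prints 100
        System.out.println(cycle(new BlockAllocator(false), 100, 5));  // prints 500
    }
}
```

With reuse, repeated remove/add cycles leave the high-water mark unchanged; without it, the file grows on every cycle, which is the kind of growth reported above.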





Re: CLI libraries

2015-06-08 Thread Andy Seaborne

On 08/06/15 15:47, aj...@virginia.edu wrote:

In examining and discussing
https://issues.apache.org/jira/browse/JENA-959, it seems to me (a
Jena newbie!) that Jena's CLI action is built up in jena-core, in
package jena.cmdline.

If that is correct, and Jena has its own CLI code, wouldn't it be
better to replace this with a modern CLI library like that provided
by Apache Commons? Does that sound like a ticket?


arq.cmdline.CmdLineArgs

The whole cmd support does more than Apache Commons CLI.

Beyond command line parsing, it supports grouping arguments and reusing them 
across commands, and it provides an execution model.
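To make grouping, reuse and an execution model concrete, here is a small sketch with hypothetical names (ArgModule, Cmd and EchoCmd are not the arq.cmdline classes): each module declares the options it owns and reads its own values, commands are assembled from modules, and the base class always runs parse, then process, then exec. For brevity it only accepts --name=value and bare flags.

```java
import java.util.*;

// Hypothetical: an ArgModule groups related options so several commands can reuse it.
interface ArgModule {
    Set<String> options();                      // option names this module understands
    void process(Map<String, String> args);     // read its values after parsing
}

// Example reusable module: a --time flag.
final class TimeModule implements ArgModule {
    boolean timing;
    public Set<String> options() { return Collections.singleton("time"); }
    public void process(Map<String, String> args) { timing = args.containsKey("time"); }
}

// Example reusable module: --syntax=NAME with a default.
final class SyntaxModule implements ArgModule {
    String syntax;
    public Set<String> options() { return Collections.singleton("syntax"); }
    public void process(Map<String, String> args) { syntax = args.getOrDefault("syntax", "turtle"); }
}

// Execution model: parse the arguments, let each module process them, then run the body.
abstract class Cmd {
    private final List<ArgModule> modules = new ArrayList<>();
    protected final List<String> positional = new ArrayList<>();

    protected void add(ArgModule m) { modules.add(m); }

    final void mainRun(String... argv) {
        Set<String> known = new HashSet<>();
        for (ArgModule m : modules) known.addAll(m.options());
        Map<String, String> parsed = new HashMap<>();
        for (String a : argv) {
            if (!a.startsWith("-")) { positional.add(a); continue; }
            String body = a.replaceFirst("^--?", "");   // one or two dashes
            int eq = body.indexOf('=');
            String name = eq < 0 ? body : body.substring(0, eq);
            if (!known.contains(name)) throw new IllegalArgumentException("Unknown argument: " + a);
            parsed.put(name, eq < 0 ? "" : body.substring(eq + 1));
        }
        for (ArgModule m : modules) m.process(parsed);
        exec();
    }

    protected abstract void exec();
}

// A command assembled from reusable modules.
final class EchoCmd extends Cmd {
    final TimeModule time = new TimeModule();
    final SyntaxModule syntax = new SyntaxModule();
    String result;

    EchoCmd() { add(time); add(syntax); }

    protected void exec() { result = syntax.syntax + " " + positional + (time.timing ? " (timed)" : ""); }

    public static void main(String[] args) {
        EchoCmd cmd = new EchoCmd();
        cmd.mainRun("--syntax=ntriples", "-time", "data.ttl");
        System.out.println(cmd.result);   // prints "ntriples [data.ttl] (timed)"
    }
}
```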


There are a lot of commands, and Apache Commons CLI would also cause 
changes in syntax, e.g. arq.cmd does not treat -- and - differently, and 
combined POSIX-like options (such as -abc for -a -b -c) aren't supported.
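The syntax difference fits in a few lines. The sketch below (a hypothetical class, not arq.cmd's code) reads an argument the way described above, where -x and --x are the same option and -abc is one option named abc, and contrasts that with POSIX-style bundling, where -abc means -a -b -c.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch contrasting two ways of reading a dashed argument.
final class DashRules {
    // arq.cmd-style reading, as described: "-x" and "--x" name the same option.
    static String optionName(String arg) {
        if (arg.startsWith("--")) return arg.substring(2);
        if (arg.startsWith("-") && arg.length() > 1) return arg.substring(1);
        return null;                              // a positional argument (or "-")
    }

    // POSIX-style reading for contrast: "-abc" is three single-letter flags.
    static List<String> posixFlags(String arg) {
        List<String> flags = new ArrayList<>();
        if (arg.startsWith("-") && !arg.startsWith("--")) {
            for (char c : arg.substring(1).toCharArray()) flags.add(String.valueOf(c));
        }
        return flags;
    }

    public static void main(String[] args) {
        System.out.println(optionName("--time") + " " + optionName("-time")); // prints "time time"
        System.out.println(optionName("-abc"));                               // prints "abc"
        System.out.println(posixFlags("-abc"));                               // prints "[a, b, c]"
    }
}
```

A command line that relies on one reading means something else under the other, which is why swapping parsers would change the syntax of existing commands.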


(jena.cmdline looks like some partial copy to get older development 
working).


A useful goal might be to have a jena-cmd module, built after SDB, 
TDB and the rest, holding the set of command line tools we deem to be the 
public set of commands (some of the old tools need retiring, or at least 
bringing into the general style even if incompatibly, e.g. rdfcompare).


People use rdfcat :-( but nowadays riot is better IMO (scale, speed, 
arguments, ...), though I'm not unbiased.


A useful but bounded step might be to move arq.cmd* to 
jena-base/jena.cmd* and drop jena-core/jena.cmdline (I have not tried this, 
so there may be a forgotten dependency).



Andy








[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-110026337
  
OK, I see. I will add a similar graph-specific case for deletion 
support.

One question about graph indexing. The jena-text documentation says: 
"This allows for more efficient text queries when the query targets only a 
single named graph." But there's no example of using this (not even in the tests).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...

2015-06-08 Thread amiara514
Github user amiara514 commented on the pull request:

https://github.com/apache/jena/pull/53#issuecomment-110032217
  
@osma oops, forget my message.




[jira] [Commented] (JENA-959) riot: gzip output option

2015-06-08 Thread A. Soroka (JIRA)

[ 
https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577268#comment-14577268
 ] 

A. Soroka commented on JENA-959:


I'm a little confused now [~andy.seaborne]-- are you arguing _against_ or _for_ 
a separate flag?






CLI libraries

2015-06-08 Thread aj...@virginia.edu
In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it 
seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, 
in package jena.cmdline.

If that is correct, and Jena has its own CLI code, wouldn't it be better to 
replace this with a modern CLI library like that provided by Apache Commons? 
Does that sound like a ticket?

---
A. Soroka
The University of Virginia Library



Re: CLI libraries

2015-06-08 Thread aj...@virginia.edu
Okay, that makes sense.

Is the larger move (the construction of 'jena-cmd') worth an epic in Jira, with 
the smaller one (take arq.cmd* to jena-base/jena.cmd* and drop 
jena-core/jena.cmdline) as a first story therein?

---
A. Soroka
The University of Virginia Library

On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote:

 A useful goal might be to have a jena-cmd module, built after SDB, 
 TDB and the rest, holding the set of command line tools we deem to be the 
 public set of commands (some of the old tools need retiring, or at least 
 bringing into the general style even if incompatibly, e.g. rdfcompare).
 
 A useful but bounded step might be to move arq.cmd* to 
 jena-base/jena.cmd* and drop jena-core/jena.cmdline (I have not tried this, 
 so there may be a forgotten dependency).



TDB2

2015-06-08 Thread Andy Seaborne

Informational announcement: TDB2

TDB2 is a reworking of TDB based on updated implementations of 
transactions and transactional data structures for project Lizard (a 
clustered SPARQL store).


TDB2 has:

* Arbitrary scale write-once transactions
* New transaction system: other first-class components can be added
  (e.g. text indexes, cache tables)
* Models work across transaction boundaries
* Cleaner, simpler, more maintainable
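A component-based transaction system can be sketched as a coordinator running a simple two-phase commit over registered participants, so that, say, a text index commits or aborts together with the main indexes. The interface and classes here are hypothetical, not TDB2's, and the sketch omits the durable commit record a real implementation needs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical participant interface; not TDB2's API.
interface TxnComponent {
    void begin(long txn);
    boolean prepare(long txn);   // phase 1: vote to commit
    void commit(long txn);       // phase 2
    void abort(long txn);
}

final class Coordinator {
    private final List<TxnComponent> components = new ArrayList<>();
    private long nextTxn = 1;

    void register(TxnComponent c) { components.add(c); }

    // Runs work inside one transaction spanning every registered component.
    boolean execute(Runnable work) {
        long txn = nextTxn++;
        for (TxnComponent c : components) c.begin(txn);
        try {
            work.run();
        } catch (RuntimeException e) {
            abortAll(txn);
            throw e;
        }
        for (TxnComponent c : components) {
            if (!c.prepare(txn)) { abortAll(txn); return false; }  // any "no" vote aborts all
        }
        for (TxnComponent c : components) c.commit(txn);
        return true;
    }

    private void abortAll(long txn) {
        for (TxnComponent c : components) c.abort(txn);
    }
}

// A participant that just records what it was asked to do.
final class LoggingComponent implements TxnComponent {
    final boolean voteYes;
    final List<String> log = new ArrayList<>();

    LoggingComponent(boolean voteYes) { this.voteYes = voteYes; }

    public void begin(long t)      { log.add("begin " + t); }
    public boolean prepare(long t) { log.add("prepare " + t); return voteYes; }
    public void commit(long t)     { log.add("commit " + t); }
    public void abort(long t)      { log.add("abort " + t); }

    public static void main(String[] args) {
        Coordinator co = new Coordinator();
        LoggingComponent indexes = new LoggingComponent(true);
        LoggingComponent text = new LoggingComponent(true);
        co.register(indexes);
        co.register(text);
        System.out.println(co.execute(() -> {}) + " " + text.log); // prints "true [begin 1, prepare 1, commit 1]"
    }
}
```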

TDB2 databases are not compatible with TDB databases.  It uses a more 
efficient encoding for RDF terms.  [1]


As with any database, the new indexing and transaction code needs time to 
settle and mature.  I'm using that tech in Lizard development.


Andy

TDB2 code:
https://github.com/afs/mantis/tree/master/tdb2

Lizard slides:
http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard


[1] An upgrade path using TDB1-style encoding is possible; it is a 
one-way upgrade path and not reversible [2].  TDB2 adds control files 
for the copy-on-write data structures that TDB1 does not understand.


[2] Actually, if the encoding is compatible, what will happen is that 
TDB1 will see the database at the time of the upgrade.  Welcome to 
copy-on-write immutable data structures.
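Footnote [2] follows from how copy-on-write works: writers never modify a published version; they build a new one and switch the root, so anything still holding the old root keeps seeing the old state. A toy model, assuming a single in-memory map; the class is hypothetical, and TDB2's data structures copy only the changed parts rather than everything as done here.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical toy model of copy-on-write versioning with a swappable root.
final class CowStore {
    private final AtomicReference<Map<String, String>> root =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    // A reader captures the current root; that version never changes afterwards.
    Map<String, String> snapshot() { return root.get(); }

    // A writer copies, modifies the copy, and publishes it as the new root.
    void put(String key, String value) {
        while (true) {
            Map<String, String> current = root.get();
            Map<String, String> copy = new HashMap<>(current);
            copy.put(key, value);
            if (root.compareAndSet(current, Collections.unmodifiableMap(copy))) return;
        }
    }

    public static void main(String[] args) {
        CowStore store = new CowStore();
        store.put("s1", "v1");
        Map<String, String> old = store.snapshot();   // like a reader that never sees later roots
        store.put("s2", "v2");
        System.out.println(old.size() + " " + store.snapshot().size()); // prints "1 2"
    }
}
```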


Re: TDB2

2015-06-08 Thread Andy Seaborne

On 08/06/15 16:41, Andy Seaborne wrote:

Informational announcement: TDB2


TDB2 is for transactional use only.
Additional fun with Java 8: all the begin/commit foo is hidden.

   Dataset ds = TDBFactory.createDataset() ;

Here is a write transaction to load a file:

  TDBTxn.executeWrite(ds, () -> RDFDataMgr.read(ds, "http:...")) ;

Or to get the size of the default model safely:

  long size =
      TDBTxn.executeReadReturn(ds, () -> ds.getDefaultModel().size()) ;
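Helpers like these can be written directly with Java 8 lambdas. The sketch below uses a hypothetical minimal Txn interface instead of a Jena dataset, and its method names only mirror the ones above; it shows the begin/commit/abort/end sequence hidden behind a Runnable and a Supplier.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical minimal transactional interface, standing in for a dataset.
interface Txn {
    void begin(boolean write);
    void commit();
    void abort();
    void end();
}

// Hypothetical helpers; the names mirror TDBTxn's but this is not its code.
final class TxnSketch {
    static void executeWrite(Txn t, Runnable action) {
        t.begin(true);
        try {
            action.run();
            t.commit();
        } catch (RuntimeException | Error e) {
            t.abort();            // undo partial work, then rethrow
            throw e;
        } finally {
            t.end();
        }
    }

    static <T> T executeReadReturn(Txn t, Supplier<T> action) {
        t.begin(false);
        try {
            return action.get();
        } finally {
            t.end();              // a read transaction only needs ending
        }
    }
}

// Records calls so the hidden begin/commit/end sequence is visible.
final class RecordingTxn implements Txn {
    final List<String> log = new ArrayList<>();
    public void begin(boolean write) { log.add(write ? "begin(write)" : "begin(read)"); }
    public void commit() { log.add("commit"); }
    public void abort()  { log.add("abort"); }
    public void end()    { log.add("end"); }

    public static void main(String[] args) {
        RecordingTxn t = new RecordingTxn();
        TxnSketch.executeWrite(t, () -> {});
        long size = TxnSketch.executeReadReturn(t, () -> 42L);
        System.out.println(size + " " + t.log);
        // prints "42 [begin(write), commit, end, begin(read), end]"
    }
}
```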

Andy