Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
On 08/06/15 10:25, Claude Warren wrote:
What exactly is this review asking? Change in strategy or change in docs?

Both :-)

concurrency-howto does not mention transactions except in passing. It should be more pro-transactions IMO.

A possibility is that all Datasets are transactional, even if that is only DatasetGraphWithLock:
No Dataset.supportsTransactions - it's always true.
Remove Dataset.getLock.
concurrency-howto would be for model-only use. Everything else is transactional in style.

The documentation should reflect this preferred style.

If we had (hi ajs6f!) an in-memory dataset as well as the general container one, and the in-memory one were transactional, with copy-in for addGraph, we could make models be views of datasets always. Creating a free-standing model would create an implicit Dataset.

Andy

On Fri, Jun 5, 2015 at 8:30 PM, Andy Seaborne (JIRA) j...@apache.org wrote:
Andy Seaborne created JENA-957:

Summary: Review concurrency howto in the light of transactions.
Key: JENA-957
URL: https://issues.apache.org/jira/browse/JENA-957
Project: Apache Jena
Issue Type: Bug
Reporter: Andy Seaborne
Priority: Minor

http://jena.apache.org/documentation/notes/concurrency-howto.html
Include {{DatasetGraphWithLock}}. Consider if that should be the default for in-memory and general datasets.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
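The "transactional even if that is only a lock" idea above can be sketched in plain Java. This is only an illustration under stated assumptions: the class LockedStore and its Mode enum are hypothetical and are not Jena classes; they mimic the begin/commit/end calling style over a multiple-reader/single-writer lock. In a lock-only design like this one, commit has nothing to flush and there is no way to undo a write.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch, not Jena code: transaction-style calls over a plain lock. */
public class LockedStore {
    public enum Mode { READ, WRITE }

    private final ReentrantReadWriteLock rw = new ReentrantReadWriteLock();
    private final List<String> triples = new ArrayList<>();
    // The lock held by the current thread's transaction, if any.
    private final ThreadLocal<Lock> held = new ThreadLocal<>();

    /** Start a transaction by taking the read or the write lock. */
    public void begin(Mode mode) {
        if (held.get() != null)
            throw new IllegalStateException("already in a transaction");
        Lock lock = (mode == Mode.WRITE) ? rw.writeLock() : rw.readLock();
        lock.lock();
        held.set(lock);
    }

    /** With only a lock there is nothing to flush; commit is just a marker. */
    public void commit() {
        if (held.get() == null)
            throw new IllegalStateException("not in a transaction");
    }

    /** Release the lock; safe to call from a finally block. */
    public void end() {
        Lock lock = held.get();
        if (lock != null) {
            held.remove();
            lock.unlock();
        }
    }

    /** Updates are only allowed inside a WRITE transaction. */
    public void add(String triple) {
        if (held.get() != rw.writeLock())
            throw new IllegalStateException("need a WRITE transaction");
        triples.add(triple);
    }

    /** Reads are allowed inside any transaction. */
    public int size() {
        if (held.get() == null)
            throw new IllegalStateException("not in a transaction");
        return triples.size();
    }

    public static void main(String[] args) {
        LockedStore store = new LockedStore();
        store.begin(Mode.WRITE);
        try {
            store.add("<s> <p> <o> .");
            store.commit();
        } finally {
            store.end();
        }
        store.begin(Mode.READ);
        try {
            System.out.println(store.size()); // prints 1
        } finally {
            store.end();
        }
    }
}
```

The calling pattern (begin, work, commit, end in a finally block) is the same shape an application would use with Dataset.begin(ReadWrite), commit() and end() on a transactional Jena dataset; only the storage behind it differs.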
Re: TDB2
On 08/06/15 17:48, Marco Neumann wrote:
is TDB2 going to replace TDB or is TDB2 a new cluster product?

Whatever people (users, developers) want. Migrating databases is not as easy as upgrading code. oaj.tdb and oaj.tdb2 can run side by side (TDB2 is itself 7 maven modules ATM - some can be combined as they are small and were just a good idea at the time).

TDB2 is not the cluster (that's Lizard). Mantis started as the separation out of the low level code needed for Lizard. Initially a validation of the reworking of transactions and data structures, a little extra work has made it viable as TDB2.

Andy

(oaj = org.apache.jena)

Marco

On Mon, Jun 8, 2015 at 11:41 AM, Andy Seaborne a...@apache.org wrote:
Informational announcement: TDB2

TDB2 is a reworking of TDB based on updated implementations of transactions and transactional data structures for project Lizard (a clustered SPARQL store).

TDB2 has:
* Arbitrary scale write-once transactions
* New transaction system - can add other first class components (e.g. text indexes, cache tables)
* Models work across transaction boundaries
* Cleaner, simpler, more maintainable

TDB2 databases are not compatible with TDB databases. It uses a more efficient encoding for RDF terms. [1]

Being a database, the new indexing and transaction code needs time to settle to bring the maturity up. I'm using that tech in Lizard development.

Andy

TDB2 code: https://github.com/afs/mantis/tree/master/tdb2
Lizard slides: http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard

[1] An upgrade path using TDB1-style encoding is possible; it is a one-way upgrade path and not reversible [2]. TDB2 adds control files for the copy-on-write data structures that TDB1 does not understand.

[2] Actually, if the encoding is compatible, what will happen is that TDB1 will see the database at the time of the upgrade. Welcome to copy-on-write immutable data structures.
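Footnote [2] describes a general property of copy-on-write structures: a reader holding an old root keeps seeing the old state. The sketch below illustrates that property only. It is not TDB2 or Mantis code, and the class name CowStore is hypothetical. A writer copies the current immutable snapshot, changes the copy, and publishes it as the new root; earlier snapshots are never modified.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical illustration of copy-on-write; not part of TDB or TDB2. */
public class CowStore {
    // The current "root": an immutable snapshot of all data.
    private final AtomicReference<Map<String, String>> root =
            new AtomicReference<>(Collections.<String, String>emptyMap());

    /** A reader pins the root once and sees exactly that state from then on. */
    public Map<String, String> snapshot() {
        return root.get();
    }

    /** A writer copies the old root, modifies the copy, then publishes it. */
    public void put(String key, String value) {
        Map<String, String> oldRoot;
        Map<String, String> newRoot;
        do {
            oldRoot = root.get();
            Map<String, String> copy = new HashMap<>(oldRoot);
            copy.put(key, value);
            newRoot = Collections.unmodifiableMap(copy);
            // Retry if another writer published a new root in the meantime.
        } while (!root.compareAndSet(oldRoot, newRoot));
    }

    public static void main(String[] args) {
        CowStore store = new CowStore();
        store.put("s1", "o1");
        Map<String, String> before = store.snapshot(); // state "at the time of the upgrade"
        store.put("s2", "o2");
        System.out.println(before.size());           // prints 1
        System.out.println(store.snapshot().size()); // prints 2
    }
}
```

Copying the whole map on every write is only acceptable for a toy; real copy-on-write stores copy just the path from the changed block up to the root.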
Re: Trouble Building Under Eclipse
Hadoop/Elephas is an example of a general problem with Guava. By reputation, upgrading Guava across versions has been problematic - subtle and not-so-subtle changes of behaviour or removed code.

When Jena is used as a library, the system or application in which it is used might use Guava itself - and need a specific version. But Jena uses Guava and needs a specific version with certain code in it, which might be different. We are isolating Jena's use of Guava from the system in which Jena is used. Hadoop has very strong requirements on Guava versions - this might well apply to other user applications as well.

We do <exclude/> in the sense that the dependency-reduced-pom.xml POM of jena-shaded-guava does not mention com.google.guava:guava. Elephas picks up the Hadoop dependency.

Andy

On 08/06/15 14:26, aj...@virginia.edu wrote:
I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}).

Is it only Hadoop's Guava that is at issue? Would it be possible perhaps to just <exclude/> Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out?

---
A. Soroka
The University of Virginia Library

On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote:
Ah right. To summarise what is happening:

The POM file in the maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact with the shaded dependency removed.

When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava.

When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source, which names Google Guava as a dependency.

Result: a certain degree of chaos.
Andy

On 06/06/15 03:19, Stian Soiland-Reyes wrote:
Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name, otherwise you will just see the upstream Guava through Eclipse's project sharing.

The package name is not shaded for OSGi, it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop, which as you know has an ancient Guava.

It might be good to keep it out of the normal build/release cycle, then you would get the jena-guava shade from Maven Central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or voted on and released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name).

On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote:
I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava.

I'm not totally clear as to why Jena shades Guava into its own namespace -- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.)

---
A. Soroka
The University of Virginia Library

On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote:
Folks

Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build, with variations of the following error:

java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification

Anybody else been having this issue? If so how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help but other times it doesn't.

Rob
Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
So to be clear, part of the idea here is to boost the visibility of transactions, and one of the things that wants doing as part of that is to provide copy-on-add-graph semantics for the in-memory dataset so that transactionality is coherent across such a dataset. Right now it is instead a sort of patchwork of whatever forms of transactionality were available in the graphs that have been added to it, which isn't an attractive thing to advertise, and may not even really work all the time.

As far as model-as-views-of-datasets: is it true that all that is needed for this is a good in-memory dataset? What about datasets that are much too large for memory? Or impls of Dataset that incur network latency in operation? Or do these cases just imply the need for the right kinds of laziness in views on Datasets?

---
A. Soroka
The University of Virginia Library

On Jun 8, 2015, at 3:23 PM, Andy Seaborne a...@apache.org wrote:
On 08/06/15 10:25, Claude Warren wrote:
What exactly is this review asking? Change in strategy or change in docs?

Both :-)

concurrency-howto does not mention transactions except in passing. It should be more pro-transactions IMO.

A possibility is that all Datasets are transactional, even if that is only DatasetGraphWithLock:
No Dataset.supportsTransactions - it's always true.
Remove Dataset.getLock.
concurrency-howto would be for model-only use. Everything else is transactional in style.

The documentation should reflect this preferred style.

If we had (hi ajs6f!) an in-memory dataset as well as the general container one, and the in-memory one were transactional, with copy-in for addGraph, we could make models be views of datasets always. Creating a free-standing model would create an implicit Dataset.

Andy

On Fri, Jun 5, 2015 at 8:30 PM, Andy Seaborne (JIRA) j...@apache.org wrote:
Andy Seaborne created JENA-957:

Summary: Review concurrency howto in the light of transactions.
Key: JENA-957
URL: https://issues.apache.org/jira/browse/JENA-957
Project: Apache Jena
Issue Type: Bug
Reporter: Andy Seaborne
Priority: Minor

http://jena.apache.org/documentation/notes/concurrency-howto.html
Include {{DatasetGraphWithLock}}. Consider if that should be the default for in-memory and general datasets.
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-110094664 I reorganized tests part
Re: [ANN] GSoC 2015 Accepts a Student Project for Jena
On 08/06/15 10:27, Qihong Lin wrote: Hi, The grammar has been modified for the problems you pointed out. I've tried to run grammar script to generate arq.jj, sparql_11.jj and their parser java classes, in cygwin with JavaCC 5.0. However the generated java classes are different from those in the code base: 1) ARQParser - the new generated one: public class ARQParser extends ARQParserBase implements ARQParserConstants - the old one in the code base: public class ARQParser extends ARQParserBase Ignore that difference - implements ARQParserConstants is fine and correct. (ARQParserBase implements ARQParserConstants) ARQParser got modified in some code clean up and should not have been. There's no such difference for SPARQLParser11 (both new and old ones have implements ...) Good. 2) checksum for Token, ParseException, JavaCharStream and so on - the new generated one (Token.java): /* JavaCC - OriginalChecksum=335d1922781852977208d5cdca0fc164 (do not edit this line) */ - the old one in the code base (Token.java): /* JavaCC - OriginalChecksum=d9b4c8c9332fa3054a004615fdb22b89 (do not edit this line) */ I have no idea what the checksum is a checksum of! If the line endings are different, the checksums might be affected. The log from grammar script seems OK: $ ./grammar Process grammar -- sparql_11.jj Java Compiler Compiler Version 5.0 (Parser Generator) Ok - version 5.0. (type javacc with no arguments for help) Reading from file sparql_11.jj . . . File TokenMgrError.java does not exist. Will create one. File ParseException.java does not exist. Will create one. File Token.java does not exist. Will create one. File JavaCharStream.java does not exist. Will create one. Parser generated successfully. Create text form Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4) (type jjdoc with no arguments for help) Reading from file sparql_11.jj . . . Grammar documentation generated successfully in sparql_11.txt Fixing Java warnings in TokenManager ... 
Fixing Java warnings in Token ... Fixing Java warnings in TokenMgrError ... Fixing Java warnings in SPARQLParser11 ... Done Process grammar -- arq.jj Java Compiler Compiler Version 5.0 (Parser Generator) (type javacc with no arguments for help) Reading from file arq.jj . . . File TokenMgrError.java does not exist. Will create one. File ParseException.java does not exist. Will create one. File Token.java does not exist. Will create one. File JavaCharStream.java does not exist. Will create one. does not exist is to be expected. The script deletes old files before it runs javacc to ensure everything is clean. Parser generated successfully. Create text form Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4) (type jjdoc with no arguments for help) Reading from file arq.jj . . . Grammar documentation generated successfully in arq.txt Fixing Java warnings in TokenManager ... Fixing Java warnings in Token ... Fixing Java warnings in TokenMgrError ... Fixing Java warnings in ARQParser ... Done Is that the expected behavior for the grammar script? Anything wrong? looks good. If the ARQ test suite runs, it should be good. cd jena-arq ; mvn clean test regard, Qihong On Sat, Jun 6, 2015 at 11:05 AM, Ying Jiang jpz6311...@gmail.com wrote: Hi, The grammar needs revisions in some way. For example, in your proposal, the GRAPH token can be optional. Another problem for default graph: both { ?s :p ?o } and ?s :p ?o are valid, so QuadsNotTriples should be re-defined. On the other hand, you can start playing with the code of master.jj. There's no need to wait until the grammar is ready. Your code is supposed to be delivered as soon as possible. We can have early feedback from the end users. Merging early will also reduce any problems with several people changing the same file. Best regards, Ying Jiang On Fri, Jun 5, 2015 at 6:25 PM, Qihong Lin confidence@gmail.com wrote: Hi, I added the grammar draft at the end of [1]. 
There're actually minor changes on the grammar of ConstructQuery, which are marked red. Much of the grammar from SPARQL INSERT can be reused, related to Quads. Any comments? regards, Qihong [1] https://docs.google.com/document/d/1KiDlfxMq5ZsU7vj7ZDm10yC96OZgdltwmZAZl56sTw0
Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
On 08/06/15 20:38, aj...@virginia.edu wrote:
So to be clear, part of the idea here is to boost the visibility of transactions, and one of the things that wants doing as part of that is to provide for copy-on-add-graph semantics for the in-memory dataset so that transactionality is coherent across such a dataset. Right now it instead is a sort of patchwork of whatever forms of transactionality were available in the graphs that have been added to it, which isn't an attractive thing to advertise, and may not even really work all the time.

Less - there is no transactionality across the contained graphs. (Model.graph transactions aren't connected to dataset transactions.)

As far as model-as-views-of-datasets: is it true that all that is needed for this is a good in-memory dataset?

It would be useful for working in-memory. For example, the default union graph can be made to work efficiently, as can dataset transactions.

What about datasets that are much too large for memory? Or impls of Dataset that incur network latency in operation? Or do these cases just imply the need for the right kinds of laziness in views on Datasets?

Models from TDB are already views.

public class GraphTDB extends GraphView ...

Andy

---
A. Soroka
The University of Virginia Library

On Jun 8, 2015, at 3:23 PM, Andy Seaborne a...@apache.org wrote:
On 08/06/15 10:25, Claude Warren wrote:
What exactly is this review asking? Change in strategy or change in docs?

Both :-)

concurrency-howto does not mention transactions except in passing. It should be more pro-transactions IMO.

A possibility is that all Datasets are transactional, even if that is only DatasetGraphWithLock:
No Dataset.supportsTransactions - it's always true.
Remove Dataset.getLock.
concurrency-howto would be for model-only use. Everything else is transactional in style.

The documentation should reflect this preferred style.

If we had (hi ajs6f!) an in-memory dataset as well as the general container one, and the in-memory one were transactional, with copy-in for addGraph, we could make models be views of datasets always. Creating a free-standing model would create an implicit Dataset.

Andy

On Fri, Jun 5, 2015 at 8:30 PM, Andy Seaborne (JIRA) j...@apache.org wrote:
Andy Seaborne created JENA-957:

Summary: Review concurrency howto in the light of transactions.
Key: JENA-957
URL: https://issues.apache.org/jira/browse/JENA-957
Project: Apache Jena
Issue Type: Bug
Reporter: Andy Seaborne
Priority: Minor

http://jena.apache.org/documentation/notes/concurrency-howto.html
Include {{DatasetGraphWithLock}}. Consider if that should be the default for in-memory and general datasets.
Re: TDB2
is TDB2 going to replace TDB or is TDB2 a new cluster product? Marco On Mon, Jun 8, 2015 at 11:41 AM, Andy Seaborne a...@apache.org wrote: Informational announcement: TDB2 TDB2 is a reworking of TDB based on updated implementations of transactions and transactional data structures for project Lizard (a clustered SPARQL store). TDB2 has: * Arbitrary scale write-once transactions * New transaction system - can add other first class components. (e.g. text indexes, cache tables) * Models works across transaction boundaries * Cleaner, simpler, more maintainable TDB2 databases are not compatible with TDB databases. It uses a more efficient encoding for RDF terms. [1] Being a database, the new indexing and transaction code needs time to settle to bring the maturity up. I'm using that tech in Lizard development. Andy TDB2 code: https://github.com/afs/mantis/tree/master/tdb2 Lizard slides: http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard [1] An upgrade path using TDB1-style encoding is possible; it is an one-way upgrade path and not reversible [2]. TDB2 adds control files for the copy-on-write data structures that TDB1 does not understand. [2] Actually, if the encoding is compatible, what will happen is that TDB1 will see the database at the time of the upgrade. Welcome to copy-on-write immutable data structures. -- --- Marco Neumann KONA
Re: [ANN] GSoC 2015 Accepts a Student Project for Jena
Hi, The grammar has been modified for the problems you pointed out. I've tried to run grammar script to generate arq.jj, sparql_11.jj and their parser java classes, in cygwin with JavaCC 5.0. However the generated java classes are different from those in the code base: 1) ARQParser - the new generated one: public class ARQParser extends ARQParserBase implements ARQParserConstants - the old one in the code base: public class ARQParser extends ARQParserBase There's no such difference for SPARQLParser11 (both new and old ones have implements ...) 2) checksum for Token, ParseException, JavaCharStream and so on - the new generated one (Token.java): /* JavaCC - OriginalChecksum=335d1922781852977208d5cdca0fc164 (do not edit this line) */ - the old one in the code base (Token.java): /* JavaCC - OriginalChecksum=d9b4c8c9332fa3054a004615fdb22b89 (do not edit this line) */ The log from grammar script seems OK: $ ./grammar Process grammar -- sparql_11.jj Java Compiler Compiler Version 5.0 (Parser Generator) (type javacc with no arguments for help) Reading from file sparql_11.jj . . . File TokenMgrError.java does not exist. Will create one. File ParseException.java does not exist. Will create one. File Token.java does not exist. Will create one. File JavaCharStream.java does not exist. Will create one. Parser generated successfully. Create text form Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4) (type jjdoc with no arguments for help) Reading from file sparql_11.jj . . . Grammar documentation generated successfully in sparql_11.txt Fixing Java warnings in TokenManager ... Fixing Java warnings in Token ... Fixing Java warnings in TokenMgrError ... Fixing Java warnings in SPARQLParser11 ... Done Process grammar -- arq.jj Java Compiler Compiler Version 5.0 (Parser Generator) (type javacc with no arguments for help) Reading from file arq.jj . . . File TokenMgrError.java does not exist. Will create one. File ParseException.java does not exist. 
Will create one. File Token.java does not exist. Will create one. File JavaCharStream.java does not exist. Will create one. Parser generated successfully. Create text form Java Compiler Compiler Version 5.0 (Documentation Generator Version 0.1.4) (type jjdoc with no arguments for help) Reading from file arq.jj . . . Grammar documentation generated successfully in arq.txt Fixing Java warnings in TokenManager ... Fixing Java warnings in Token ... Fixing Java warnings in TokenMgrError ... Fixing Java warnings in ARQParser ... Done Is that the expected behavior for the grammar script? Anything wrong? regard, Qihong On Sat, Jun 6, 2015 at 11:05 AM, Ying Jiang jpz6311...@gmail.com wrote: Hi, The grammar needs revisions in some way. For example, in your proposal, the GRAPH token can be optional. Another problem for default graph: both { ?s :p ?o } and ?s :p ?o are valid, so QuadsNotTriples should be re-defined. On the other hand, you can start playing with the code of master.jj. There's no need to wait until the grammar is ready. Your code is supposed to be delivered as soon as possible. We can have early feedback from the end users. Merging early will also reduce any problems with several people changing the same file. Best regards, Ying Jiang On Fri, Jun 5, 2015 at 6:25 PM, Qihong Lin confidence@gmail.com wrote: Hi, I added the grammar draft at the end of [1]. There're actually minor changes on the grammar of ConstructQuery, which are marked red. Much of the grammar from SPARQL INSERT can be reused, related to Quads. Any comments? regards, Qihong [1] https://docs.google.com/document/d/1KiDlfxMq5ZsU7vj7ZDm10yC96OZgdltwmZAZl56sTw0 On Tue, Jun 2, 2015 at 10:10 PM, Ying Jiang jpz6311...@gmail.com wrote: Hi Qihong, Your grammar in the proposal is not formal. Why not compose a BNF/EBNF notation one, so that others can provide more accurate comments? e.g, the WHERE clause for the complete form and short form are quite different. 
No complex graph patterns are allowed in the short form.

Best regards,
Ying Jiang

On Thu, May 28, 2015 at 10:59 PM, Qihong Lin confidence@gmail.com wrote:
Hi,

Ying, I'll stick to the list for discussion. Thanks for your guide! I re-created a fresh new branch of JENA-491, which does not contain the hp package any more.

Andy, you mention that the GRAPH grammar needs revisions. Please check the following ones. I added the short form. Am I missing anything else?

Complete form:

    CONSTRUCT {
        # Named graph
        GRAPH :g { ?s :p ?o }
        # Default graph
        { ?s :p ?o }
        # Named graph
        :g { ?s :p ?o }
        # Default graph
        ?s :p ?o
    }
    WHERE { ... }

Short form:

    CONSTRUCT { } WHERE { ... }

regards,
Qihong

On Tue, May 26, 2015 at 11:12 PM, Ying Jiang jpz6311...@gmail.com wrote:
Hi Qihong, As Andy mentioned, the bonding period is for community bonding, not just mentor
Re: [jira] [Created] (JENA-957) Review concurrency howto in the light of transactions.
What exactly is this review asking? Change in strategy or change in docs? On Fri, Jun 5, 2015 at 8:30 PM, Andy Seaborne (JIRA) j...@apache.org wrote: Andy Seaborne created JENA-957: -- Summary: Review concurrency howto in the light of transactions. Key: JENA-957 URL: https://issues.apache.org/jira/browse/JENA-957 Project: Apache Jena Issue Type: Bug Reporter: Andy Seaborne Priority: Minor http://jena.apache.org/documentation/notes/concurrency-howto.html Include {{DatasetGraphWithLock}}. Consider if that should be the default for in-memory and general datasets. -- This message was sent by Atlassian JIRA (v6.3.4#6332) -- I like: Like Like - The likeliest place on the web http://like-like.xenei.com LinkedIn: http://www.linkedin.com/in/claudewarren
[jira] [Created] (JENA-959) riot: gzip output option
Stian Soiland-Reyes created JENA-959:

Summary: riot: gzip output option
Key: JENA-959
URL: https://issues.apache.org/jira/browse/JENA-959
Project: Apache Jena
Issue Type: New Feature
Components: RIOT
Reporter: Stian Soiland-Reyes
Priority: Trivial

The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific and not so easily

So my suggestion is to support the extension .gz in the various --output options to enable output via a GZIPOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html

For example:

{code}
stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz
Not recognized as an RDF language : 'nquads.gz'
{code}
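A minimal sketch of the suggestion, assuming the .gz suffix is split off the --output value before the format name is looked up. The class GzipOutput and its methods are hypothetical helpers, not riot code; only java.util.zip.GZIPOutputStream is a real JDK class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper for illustration; not part of riot. */
public class GzipOutput {
    /** True if the --output value asks for compression, e.g. "nquads.gz". */
    public static boolean wantsGzip(String outputArg) {
        return outputArg.endsWith(".gz");
    }

    /** The format name with any trailing ".gz" removed: "nquads.gz" -> "nquads". */
    public static String baseFormat(String outputArg) {
        return wantsGzip(outputArg)
                ? outputArg.substring(0, outputArg.length() - ".gz".length())
                : outputArg;
    }

    /** Wrap the destination stream only when compression was requested. */
    public static OutputStream wrap(OutputStream out, String outputArg) throws IOException {
        return wantsGzip(outputArg) ? new GZIPOutputStream(out) : out;
    }

    public static void main(String[] args) throws IOException {
        String arg = "nquads.gz";
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapper finishes the gzip trailer.
        try (OutputStream out = wrap(sink, arg)) {
            out.write("<s> <p> <o> <g> .\n".getBytes(StandardCharsets.UTF_8));
        }
        byte[] bytes = sink.toByteArray();
        System.out.println(baseFormat(arg));                                // prints nquads
        System.out.printf("%02x %02x%n", bytes[0] & 0xff, bytes[1] & 0xff); // prints 1f 8b
    }
}
```

Because the writer only sees an OutputStream, the same wrapping would work with an independent compression switch instead of a file-extension convention.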
[jira] [Updated] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stian Soiland-Reyes updated JENA-959: - Description: The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific. Writing *.format.gz with the command line is probably as much within remit of someone using riot on the command line as for reading those. So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} was: The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific and not so easily So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific. 
Writing *.format.gz with the command line is probably as much within remit of someone using riot on the command line as for reading those. So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215 ] A. Soroka edited comment on JENA-959 at 6/8/15 1:53 PM: What do you think of the idea of an independent flag ({{--compress}} or the like). Since compression can be applied orthogonally to any format, it seems a little simpler to keep it separate. was (Author: ajs6f): What do you think of the idea of an independent flag ({{--compress}}) or the like. Since compression can be applied orthogonally to any format, it seems a little simpler to keep it separate. riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific and not so easily So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215 ] A. Soroka edited comment on JENA-959 at 6/8/15 1:53 PM: What do you think of the idea of an independent flag ({{--compress}}) or the like. Since compression can be applied orthogonally to any format, it seems a little simpler to keep it separate. was (Author: ajs6f): What do you think of the idea of an independent flag ({{{--compress}}}) or the like. Since compression can be applied orthogonally to any format, it seems a little simpler to keep it separate. riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific and not so easily So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215 ] A. Soroka commented on JENA-959: What do you think of the idea of an independent flag ({{{--compress}}}) or the like. Since compression can be applied orthogonally to any format, it seems a little simpler to keep it separate. riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific and not so easily So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577220#comment-14577220 ] Stian Soiland-Reyes commented on JENA-959: -- Yeah, either should work. It might also be worth having explicit compression support for input formats. For instance, it currently works with: {code} riot --syntax=turtle chembl_20.0_target_targetcmpt_ls.ttl.gz http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022 http://www.w3.org/2004/02/skos/core#relatedMatch http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7619 . http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022 http://www.w3.org/2004/02/skos/core#relatedMatch http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7612 . http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022 http://www.w3.org/2004/02/skos/core#relatedMatch http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7611 . {code} but it is still guessing the .gz from the filename, 
so I can't do the same if I have piped in a gziped stream or don't have a valid extension: {code} stain@biggie-utopic:~/Downloads$ riot --syntax=nquads fred stain@biggie-utopic:~/Downloads$ riot --syntax=turtle fred Exception in thread main org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1 at org.apache.jena.atlas.io.IO.exception(IO.java:222) at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77) at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154) at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137) at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:241) at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:235) at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:157) at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:98) at org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41) at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:138) at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:180) at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:267) at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:185) at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:175) at riotcmd.CmdLangParse.exec(CmdLangParse.java:148) at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102) at arq.cmdline.CmdMain.mainRun(CmdMain.java:63) at arq.cmdline.CmdMain.mainRun(CmdMain.java:50) at riotcmd.riot.main(riot.java:35) Caused by: java.nio.charset.MalformedInputException: Input length = 1 at java.nio.charset.CoderResult.throwException(CoderResult.java:281) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.Read {code} So for this I would appreciate if --syntax supported the same 
compression option: {code} stain@biggie-utopic:~/Downloads$ riot --syntax=turtle.gz fred Can not detemine the synatx from 'turtle.gz' {code} riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific. Writing *.format.gz with the command line is probably as much within remit of someone using riot on the command line as for reading those. So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
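The piped-stream case described above could, as one option, be handled by looking at the content instead of the file name. The following is only a minimal sketch under that assumption: the class `GzipSniff` and its methods are hypothetical and are not part of riot or Jena. It peeks at the first two bytes for the gzip magic number (0x1f 0x8b) and wraps the stream in the JDK's `GZIPInputStream` only when they match.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper, not part of Jena: decides on gzip by content, not by file name. */
final class GzipSniff {

    /** Returns a decompressing stream if the input starts with the gzip magic bytes 0x1f 0x8b. */
    static InputStream maybeDecompress(InputStream raw) {
        try {
            BufferedInputStream in = new BufferedInputStream(raw);
            in.mark(2);              // remember the start so the peeked bytes can be replayed
            int b1 = in.read();
            int b2 = in.read();
            in.reset();              // rewind to the marked position
            boolean gzip = (b1 == 0x1f && b2 == 0x8b);
            return gzip ? new GZIPInputStream(in) : in;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: gzip-compresses a string encoded as UTF-8. */
    static byte[] gzip(String text) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(bytes)) {
                gz.write(text.getBytes(StandardCharsets.UTF_8));
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: reads a whole stream as a UTF-8 string. */
    static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                bytes.write(buffer, 0, n);
            }
            return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Feeds the text, optionally gzipped, through maybeDecompress and returns what comes out. */
    static String roundTrip(String text, boolean compressed) {
        byte[] input = compressed ? gzip(text) : text.getBytes(StandardCharsets.UTF_8);
        return readAll(maybeDecompress(new ByteArrayInputStream(input)));
    }

    public static void main(String[] args) {
        String line = "<http://example/s> <http://example/p> <http://example/o> .\n";
        System.out.println(roundTrip(line, true).equals(line));
        System.out.println(roundTrip(line, false).equals(line));
    }
}
```

Content sniffing would make an extension-less file such as `fred` readable without any extra flag, at the cost of guessing; an explicit option such as the `--syntax=turtle.gz` form requested above keeps the behaviour predictable.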
[jira] [Issue Comment Deleted] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stian Soiland-Reyes updated JENA-959: - Comment: was deleted (was: Yeah, either should work. It might be worth also having explicit compression support for input formats.. FOr instance now it works with: {code} riot --syntax=turtle chembl_20.0_target_targetcmpt_ls.ttl.gz http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022 http://www.w3.org/2004/02/skos/core#relatedMatch http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7619 . http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022 http://www.w3.org/2004/02/skos/core#relatedMatch http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7612 . http://rdf.ebi.ac.uk/resource/chembl/target/CHEMBL2364022 http://www.w3.org/2004/02/skos/core#relatedMatch http://rdf.ebi.ac.uk/resource/chembl/targetcomponent/CHEMBL_TC_7611 . {code} but it is still guessing the .gz from the filename.. so I can't do the same if I have piped in a gziped stream or don't have a valid extension: {code} stain@biggie-utopic:~/Downloads$ riot --syntax=nquads fred stain@biggie-utopic:~/Downloads$ riot --syntax=turtle fred Exception in thread main org.apache.jena.atlas.RuntimeIOException: java.nio.charset.MalformedInputException: Input length = 1 at org.apache.jena.atlas.io.IO.exception(IO.java:222) at org.apache.jena.atlas.io.CharStreamBuffered$SourceReader.fill(CharStreamBuffered.java:77) at org.apache.jena.atlas.io.CharStreamBuffered.fillArray(CharStreamBuffered.java:154) at org.apache.jena.atlas.io.CharStreamBuffered.advance(CharStreamBuffered.java:137) at org.apache.jena.atlas.io.PeekReader.advanceAndSet(PeekReader.java:241) at org.apache.jena.atlas.io.PeekReader.init(PeekReader.java:235) at org.apache.jena.atlas.io.PeekReader.peekChar(PeekReader.java:157) at org.apache.jena.atlas.io.PeekReader.makeUTF8(PeekReader.java:98) at 
org.apache.jena.riot.tokens.TokenizerFactory.makeTokenizerUTF8(TokenizerFactory.java:41) at org.apache.jena.riot.RiotReader.createParser(RiotReader.java:138) at org.apache.jena.riot.RDFParserRegistry$ReaderRIOTLang.read(RDFParserRegistry.java:180) at riotcmd.CmdLangParse.parseRIOT(CmdLangParse.java:267) at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:185) at riotcmd.CmdLangParse.parseFile(CmdLangParse.java:175) at riotcmd.CmdLangParse.exec(CmdLangParse.java:148) at arq.cmdline.CmdMain.mainMethod(CmdMain.java:102) at arq.cmdline.CmdMain.mainRun(CmdMain.java:63) at arq.cmdline.CmdMain.mainRun(CmdMain.java:50) at riotcmd.riot.main(riot.java:35) Caused by: java.nio.charset.MalformedInputException: Input length = 1 at java.nio.charset.CoderResult.throwException(CoderResult.java:281) at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:339) at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) at java.io.InputStreamReader.read(InputStreamReader.java:184) at java.io.Read {code} So for this I would appreciate if --syntax supported the same compression option: {code} stain@biggie-utopic:~/Downloads$ riot --syntax=turtle.gz fred Can not detemine the synatx from 'turtle.gz' {code}) riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific. Writing *.format.gz with the command line is probably as much within remit of someone using riot on the command line as for reading those. 
So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577228#comment-14577228 ] A. Soroka commented on JENA-959: Okay, I'll take this ticket forward a bit working on the assumption that a separate flag for output compression is best. I agree that 'manually adjustable' input compression would be nice, and I think that belongs in a separate ticket, or maybe we break this one down into subtasks? riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific. Writing *.format.gz with the command line is probably as much within remit of someone using riot on the command line as for reading those. So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577215#comment-14577215 ] A. Soroka edited comment on JENA-959 at 6/8/15 1:53 PM: What do you think of the idea of an independent flag ({{--compress}} or the like)? Since compression can be applied orthogonally to any format, it seems a little simpler to keep it separate. was (Author: ajs6f): What do you think of the idea of an independent flag ({{--compress}} or the like). Since compression can be applied orthogonally to any format, it seems a little simpler to keep it separate. riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific and not so easily So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577264#comment-14577264 ] Andy Seaborne commented on JENA-959: Currently, it is reasonably orthogonal to the format. The asymmetries are # {{riot}} -- output is not named, whereas input usually is. # {{RDFDataMgr}} takes I/O streams, not file names. {{RDFLanguages.filenameToLang}} maps a file extension to a language symbol, and it handles {{.gz}}. The {{Lang}} objects themselves don't register compressions, i.e. they don't have a specific file extension of {{.ttl.gz}}. Then, when reading, {{IO.openFileEx(String)}} has a similar understanding of {{.gz}} and adds the decompressor. {{IO.openOutputFileEx(String)}} already has the complementary code to {{IO.openFileEx(String)}} to add the compressor. This then all works from {{RDFDataMgr.(read|load)}} and {{model.read}}. The command {{riot}} isn't special for input. Making syntax names work with compression extensions looks interesting. If there is a {{--compress}}, then there should also be a {{--decompress}} for stream input. Don't forget the {{http://.../gz}} case and decompression (i.e. when the HTTP response does not add the decompression step as you GET the compressed file). riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific. Writing *.format.gz with the command line is probably as much within remit of someone using riot on the command line as for reading those. 
So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
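To make the idea of syntax names with compression extensions concrete, here is a small sketch using only the JDK. Everything in it is an assumption for illustration: `OutputSpec` is a hypothetical class, not riot code, and the way a name such as `nquads.gz` is split is just one possible convention. Since RDFDataMgr works on I/O streams, the wrapped stream is the kind of object that could then be handed to a writer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPOutputStream;

/** Hypothetical sketch: splits an --output style value into a syntax name and a gzip flag. */
final class OutputSpec {
    final String syntax;   // e.g. "nquads"
    final boolean gzip;    // true when the value ended in ".gz"

    private OutputSpec(String syntax, boolean gzip) {
        this.syntax = syntax;
        this.gzip = gzip;
    }

    /** "nquads.gz" becomes (nquads, gzip); "turtle" becomes (turtle, no gzip). */
    static OutputSpec parse(String value) {
        if (value.toLowerCase(Locale.ROOT).endsWith(".gz")) {
            return new OutputSpec(value.substring(0, value.length() - 3), true);
        }
        return new OutputSpec(value, false);
    }

    /** Adds the compressor only when requested; otherwise returns the stream unchanged. */
    OutputStream wrap(OutputStream out) {
        try {
            return gzip ? new GZIPOutputStream(out) : out;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Demo helper: writes the text through the wrapped stream and returns the raw bytes. */
    static byte[] encode(String value, String text) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = parse(value).wrap(sink)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) {
        OutputSpec spec = parse("nquads.gz");
        System.out.println(spec.syntax + " gzip=" + spec.gzip);
        byte[] bytes = encode("nquads.gz", "<http://example/s> <http://example/p> <http://example/o> .\n");
        // A gzip stream starts with the magic bytes 0x1f 0x8b.
        System.out.println((bytes[0] & 0xff) == 0x1f && (bytes[1] & 0xff) == 0x8b);
    }
}
```

A separate `--compress` flag, as proposed earlier in the ticket, would replace the suffix test in `parse` with a boolean option while leaving `wrap` unchanged, which is one argument for keeping compression orthogonal to the format.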
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-109986517 Hi, PR is mergeable again after conflict fixing of #72. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user osma commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-109996262 Thanks for fixing, and sorry for causing the conflict with #72. It's good that you've added unit tests; however, I think there could be more of them. The current test adds and removes a resource, and only then checks that it's gone. I think it should also check that the resource got into the index in the first place; otherwise text indexing could be completely broken (no hits ever) and the test would still pass. Would it be possible/easy to structure the unit tests so that all regular tests also get executed with the uid field enabled? After all, enabling deletion support shouldn't affect the current functionality (if it does, it's a bug, either in the implementation or in the tests). You could get a lot of free tests this way, and there would perhaps be no need for further tests of the uid/deletion functionality. A similar trick is done with the graph-specific indexing, i.e. there are general tests in AbstractTestDatasetWithTextIndex, then a couple of extra tests for graph-aware functionality in AbstractTestDatasetWithGraphTextIndex, and finally TestDatasetWithLuceneGraphTextIndex pulls it together with the right (graph-aware) configuration. You could similarly try to reuse all the tests in AbstractTestDatasetWithTextIndex for the uid case. I admit the class hierarchy and naming are a bit complicated...
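The two suggestions above (check presence before checking removal, and inherit the general tests in a deletion-enabled configuration) can be illustrated with a toy sketch. It deliberately uses neither JUnit nor the jena-text classes: `ToyIndex`, `AbstractIndexChecks` and the other names are hypothetical stand-ins, and the boolean-returning methods only mimic what real test methods would assert.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy stand-in for a text index; this is not the jena-text API. */
final class ToyIndex {
    private final Set<String> entries = new HashSet<>();
    private final boolean deletionSupported;

    ToyIndex(boolean deletionSupported) {
        this.deletionSupported = deletionSupported;
    }

    void add(String id) {
        entries.add(id);
    }

    /** Removal only has an effect when deletion support is enabled. */
    void remove(String id) {
        if (deletionSupported) {
            entries.remove(id);
        }
    }

    boolean contains(String id) {
        return entries.contains(id);
    }
}

/** General checks, written once and inherited by every configuration. */
abstract class AbstractIndexChecks {
    /** Each concrete class supplies its own index configuration. */
    abstract ToyIndex newIndex();

    /** Regular behaviour: an added entry must be findable. */
    boolean addIsVisible() {
        ToyIndex index = newIndex();
        index.add("r1");
        return index.contains("r1");
    }
}

/** Default configuration: runs only the general checks. */
final class PlainIndexChecks extends AbstractIndexChecks {
    @Override
    ToyIndex newIndex() {
        return new ToyIndex(false);
    }
}

/** Deletion-enabled configuration: inherits the general checks and adds one more. */
final class DeletingIndexChecks extends AbstractIndexChecks {
    @Override
    ToyIndex newIndex() {
        return new ToyIndex(true);
    }

    /** Verifies presence before removal, so a permanently empty index cannot pass. */
    boolean removeAfterAdd() {
        ToyIndex index = newIndex();
        index.add("r1");
        boolean wasIndexed = index.contains("r1");
        index.remove("r1");
        return wasIndexed && !index.contains("r1");
    }
}

final class IndexChecksDemo {
    public static void main(String[] args) {
        System.out.println(new PlainIndexChecks().addIsVisible());
        System.out.println(new DeletingIndexChecks().addIsVisible());
        System.out.println(new DeletingIndexChecks().removeAfterAdd());
    }
}
```

With a real test framework the same shape would apply: the abstract class holds the test methods, and each concrete class differs only in the configuration it builds.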
Re: Trouble Building Under Eclipse
I think the idea of breaking the shaded Guava artifact out of the main cycle is great. It's clearly not a subject of work under most circumstances, and having one less moving part in a developer's mix is usually a good thing, especially for the simple-minded ({raises hand}). Is it only Hadoop's Guava that is at issue? Would it perhaps be possible to just exclude Guava from the Hadoop dependencies in Elephas? Or does that blow up Hadoop? Or should I go experiment and find out? --- A. Soroka The University of Virginia Library On Jun 8, 2015, at 9:21 AM, Andy Seaborne a...@apache.org wrote: Ah right. To summarise what is happening: The POM file in the maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source, which names Google Guava as a dependency. Result: a certain degree of chaos. Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name; otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi; it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop, which as you know has an ancient Guava. 
It might be good to keep it out of the normal build/release cycle; then you would get the jena-guava shade from Maven central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or vote+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name). On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace -- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A. Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with various variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
Re: Trouble Building Under Eclipse
Ah right. To summarise what is happening: The POM file in the maven repo is not the POM file in git. The shade plugin produces a different POM for the output artifact with the shaded dependency removed. When the project is not open, Eclipse sees the reduced POM, which does not have a dependency on Google Guava. When the module jena-shaded-guava is open in Eclipse, Eclipse sees the POM in the module source, which names Google Guava as a dependency. Result: a certain degree of chaos. Andy On 06/06/15 03:19, Stian Soiland-Reyes wrote: Yes, you would need to keep the jena-guava project closed so you get the Maven-built shaded jar on the classpath, which has the shaded package name; otherwise you will just see the upstream Guava through Eclipse's project sharing. The package name is not shaded for OSGi; it is easy to define private packages there. It is shaded to avoid duplicate version mismatches against other dependencies with the real Guava, e.g. Hadoop, which as you know has an ancient Guava. It might be good to keep it out of the normal build/release cycle; then you would get the jena-guava shade from Maven central, which should only change when we upgrade Guava, in which case it could be re-enabled in the SNAPSHOT build or vote+released as a separate artifact (which might be slightly odd as it contains no Jena contributions beyond the package name). On 4 Jun 2015 14:33, aj...@virginia.edu aj...@virginia.edu wrote: I have had this problem since I began tinkering. The only solution I have found is to make sure that the jena-shaded-guava project is never open when any project that refers to types therein is open. This isn't much of a burden, and I suppose it has something to do with the Maven magic that is going on inside jena-shaded-guava. I'm not totally clear as to why Jena shades Guava into its own namespace -- is it to avoid OSGi-exporting Guava packages? (We have something like that going on in another project on which I work.) --- A. 
Soroka The University of Virginia Library On Jun 4, 2015, at 9:22 AM, Rob Vesse rve...@dotnetrdf.org wrote: Folks Recently I've been having a lot of trouble getting Jena to build in Eclipse, which seems to be due to the use of the Shade plugin to shade Guava. Any module that has a reference to the shaded classes refuses to build with various variations of the following error: java.lang.NoClassDefFoundError: org/apache/jena/ext/com/google/common/cache/RemovalNotification Anybody else been having this issue? If so, how did you resolve it? Sometimes cleaning my workspace and/or doing a mvn package at the command line seems to help, but other times it doesn't. Rob
[jira] [Comment Edited] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14577264#comment-14577264 ] Andy Seaborne edited comment on JENA-959 at 6/8/15 2:42 PM: Currently, it is reasonably orthogonal to the format. The asymmetries are # {{riot}} -- output is not named where as input is usually. # {{RDFDataMgr}} takes I/O streams, not file names. {{RDFLanguages.filenameToLang}} maps file extension to language symbol and it handles {{.gz}}. {{Lang}} themselves don't register compressions i.e. don't have a specific file extension of {{.ttl.gz}}. Then when reading, {{IO.openFileEx(String)}} has the similar understanding of {{.gz}} and it adds the decompressor. {{IO.openOutputFileEx(String)}} already has the complementary code to {{IO.openFileEx(String)}} to add the compressor. This then all works from {{RDFDataMgr.(read|load)}} and {{model.read}}. The command {{riot}} isn't special for input. Making syntax names work with compression extensions look interesting. If {{\--compress}} then {{\--decompress}} for stream in. Don't forget {{http://.../gz}} case and decompression (i.e. when the HTTP response does not add the decompression step as you GET the compressed file. was (Author: andy.seaborne): Currently, it is reasonably orthogonal to the format. The asymmetries are # {{riot}} -- output is not named where as input is usually. # {{RDFDataMgr}} takes I/O streams, not file names. {{RDFLanguages.filenameToLang}} maps file extension to language symbol and it handles {{.gz}}. {{Lang}} themselves don't register compressions i.e. don't have a specific file extension of {{.ttl.gz}}. Then when reading, {{IO.openFileEx(String)}} has the similar understanding of {{.gz}} and it adds the decompressor. {{IO.openOutputFileEx(String)}} already has the complementary code to {{IO.openFileEx(String)}} to add the compressor. This then all works from {{RDFDataMgr.(read|load)}} and {{model.read}}. 
The command {{riot}} isn't special for input. Making syntax names work with compression extensions look interesting. If {{--compress}} then {{--decompress}} for stream in. Don't forget {{http://.../gz}} case and decompression (i.e. when the HTTP response does not add the decompression step as you GET the compressed file. riot: gzip output option Key: JENA-959 URL: https://issues.apache.org/jira/browse/JENA-959 Project: Apache Jena Issue Type: New Feature Components: RIOT Reporter: Stian Soiland-Reyes Priority: Trivial The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that is easily platform-specific. Writing *.format.gz with the command line is probably as much within remit of someone using riot on the command line as for reading those. So my suggestion is to support extension .gz in the various --output options to enabled outputting via a GzipOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html For example: {code} stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz Not recognized as an RDF language : 'nquads.gz' {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (JENA-804) Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files
[ https://issues.apache.org/jira/browse/JENA-804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577340#comment-14577340 ] Keith Wells commented on JENA-804: -- This issue has become a pain point: we have encountered examples with our customers where the TDB index has grown to very large sizes. One example is a 170 GB index that shrank to 17 GB after the nquads were unloaded and reloaded. Jena is not reusing already allocated space on the file system which results in large amounts of disk space reserved by Jena files -- Key: JENA-804 URL: https://issues.apache.org/jira/browse/JENA-804 Project: Apache Jena Issue Type: Bug Components: Jena Affects Versions: Jena 2.11.2, TDB 1.0.2 Environment: Windows 7, IBM JRE 1.7, Tomcat 7.0.54 Reporter: Keith Wells Attachments: TdbGrowthTests.java, out.txt, test-tdb-size.sh We have a product based on Jena TDB in which we both insert quads into and delete quads from Jena TDB. We understand the performance-over-space architectural decision not to clean up deleted nodeids from the indexes. However, the disk usage suggests that Jena TDB is not reusing space that it had previously allocated. Based on this comment, something appears to be incorrect in the file space utilization, http://mail-archives.apache.org/mod_mbox/jena-users/201310.mbox/%3cce7d7929.2a707%25rve...@dotnetrdf.org%3E: The indexes won't shrink - TDB never gives disk space back to the OS - but disk space is reused when reallocated within the same JVM. In this scenario, on the same JVM with NO server stops or starts, we add 27765 graphs to IndexTdb and immediately remove them, repeating this process several times. 
{noformat}
             MB    Bytes       Diff (Bytes)
Start        193   203239424
Reindex 5    249   262066176   58826752
Reindex 6    249   262086656   20480
Reindex 10   298   312500224   50413568
Reindex 11   298   312520704   20480
Reindex 12   298   312541184   20480
Reindex 13   298   312586240   45056
Reindex 14   306   320995328   8409088
Reindex 15   330   346181632   25186304
Reindex 16   330   346198538   16906
Reindex 17   346   362999808   16801270
Reindex 18   346   363020288   20480
Reindex 19   346   363040768   20480
Reindex 20   346   363061248   20480
Reindex 21   346   363081728   20480
Reindex 22   354   371490816   8409088
Reindex 23   378   396677120   25186304
End          193   203239424
{noformat}
The system starts with 193 MB of data allocated by indexTdb. A reindex consists of a remove followed by an add of these graphs. As the data shows, there is a dramatic increase in the size of indexTdb on disk after repeatedly removing and adding graphs. After Reindex 23, 378 MB of disk space is used. If Jena TDB reused allocated space, there would be no need to allocate more space other than what is used by deleted node ids (unless nodeid storage is eating all of this space?). Jena does not appear to be reusing the allocated disk space. At the very end of this scenario, we exported the nquads and reloaded them, which brought the disk usage back to the 193 MB where it started. We believe Jena TDB is not reusing the space allocated by the TDB file system within the same JVM. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
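For anyone wanting to take similar before/after readings, a rough sketch of measuring a directory's size with the JDK follows. It is not the attached TdbGrowthTests.java or test-tdb-size.sh; `DirSize` is a hypothetical name, and summing logical file lengths is an assumption that may differ from the space actually allocated on disk (for example with sparse files).

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical measuring helper; sums the logical lengths of all regular files in a directory tree. */
final class DirSize {

    /** Total size in bytes of the regular files below the given directory. */
    static long sizeInBytes(Path dir) {
        try (Stream<Path> paths = Files.walk(dir)) {
            return paths.filter(Files::isRegularFile)
                        .mapToLong(p -> p.toFile().length())
                        .sum();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Growth between two readings, matching the "Diff (Bytes)" column of the table. */
    static long diff(long before, long after) {
        return after - before;
    }

    /** Self-contained demo: creates a temporary directory with a 10-byte and a 5-byte file. */
    static long demo() {
        try {
            Path dir = Files.createTempDirectory("dirsize-demo");
            Path a = Files.write(dir.resolve("a.dat"), new byte[10]);
            Path b = Files.write(dir.resolve("b.dat"), new byte[5]);
            long size = sizeInBytes(dir);
            // Clean up the temporary files again.
            Files.delete(a);
            Files.delete(b);
            Files.delete(dir);
            return size;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
        // First and last readings from the table: Start versus Reindex 23.
        System.out.println(diff(203239424L, 396677120L));
    }
}
```

Taking one reading per reindex cycle and printing `diff` against the previous reading would reproduce the shape of the table above.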
Re: CLI libraries
On 08/06/15 15:47, aj...@virginia.edu wrote:

In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, in package jena.cmdline. If that is correct, and Jena has its own CLI code, wouldn't it be better to replace this with a modern CLI library like that provided by Apache Commons? Does that sound like a ticket?

arq.cmdline.CmdLineArgs

The whole cmd support does more than Apache Commons CLI. Around command line processing is support for grouping and reuse across commands, and an execution model. There are a lot of commands -- Apache Commons CLI would also cause changes in syntax, e.g. arq.cmd does not treat -- and - differently; combined POSIX-like options aren't supported. (jena.cmdline looks like some partial copy to get older development working.)

A useful goal might be to have a module jena-cmd which is after SDB, TDB and the rest with the set of command line tools we deem to be the public set of commands (some of the old stuff needs retiring or at least bringing, incompatibly, into the general style - e.g. rdfcompare). People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased.

A useful but bounded step might be to take arq.cmd* to jena-base/jena.cmd* and drop jena-core/jena.cmdline (not tried this so there may be a forgotten dependency).

Andy

--- A. Soroka The University of Virginia Library
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-110026337 OK, I see. I will add a similar graph-specific case for deletion support. One question about graph indexing: the jena-text documentation says, "This allows for more efficient text queries when the query targets only a single named graph." But there is no example of using this (even in the tests). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] jena pull request: Lucene index synchro on triple deletion (jena-t...
Github user amiara514 commented on the pull request: https://github.com/apache/jena/pull/53#issuecomment-110032217 @osma oups, forget my message.
[jira] [Commented] (JENA-959) riot: gzip output option
[ https://issues.apache.org/jira/browse/JENA-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14577268#comment-14577268 ]

A. Soroka commented on JENA-959:

I'm a little confused now [~andy.seaborne]-- are you arguing _against_ or _for_ a separate flag?

riot: gzip output option

Key: JENA-959
URL: https://issues.apache.org/jira/browse/JENA-959
Project: Apache Jena
Issue Type: New Feature
Components: RIOT
Reporter: Stian Soiland-Reyes
Priority: Trivial

The riot command line tool supports incoming file formats like *.ttl.gz, but there is no (obvious) way to also output in compressed formats. This can of course also be achieved with piping and gzip, but that quickly becomes platform-specific. Writing *.format.gz from the command line is probably as much within the remit of someone using riot as reading such files is. So my suggestion is to support the extension .gz in the various --output options to enable output via a GZIPOutputStream -- http://docs.oracle.com/javase/7/docs/api/java/util/zip/GZIPOutputStream.html

For example:
{code}
stain@biggie-utopic:~/Downloads$ riot --output=nquads.gz chembl_20.0_target_targetcmpt_ls.ttl.gz
Not recognized as an RDF language : 'nquads.gz'
{code}
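Editorial sketch: the suggestion above amounts to recognising a trailing `.gz` on the requested output name, resolving the remainder as the RDF syntax, and wrapping the output stream in `java.util.zip.GZIPOutputStream`. The following is a minimal illustration of that idea using only the JDK. The class `GzipOutputSketch` and its methods are hypothetical and are not riot code; how riot would actually resolve the syntax name is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper illustrating the proposal; not part of riot.
class GzipOutputSketch {

    /** True when the requested output name ends in ".gz" (case-insensitive). */
    static boolean wantsGzip(String outputName) {
        return outputName.toLowerCase(Locale.ROOT).endsWith(".gz");
    }

    /** Drops a trailing ".gz" so the remainder can be looked up as the syntax name. */
    static String baseName(String outputName) {
        return wantsGzip(outputName)
                ? outputName.substring(0, outputName.length() - 3)
                : outputName;
    }

    /** Wraps the raw stream in a GZIPOutputStream only when the name asks for it. */
    static OutputStream wrap(OutputStream raw, String outputName) {
        try {
            return wantsGzip(outputName) ? new GZIPOutputStream(raw) : raw;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes text through the (possibly compressing) stream and returns the bytes produced. */
    static byte[] write(String text, String outputName) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the wrapper is what flushes the gzip trailer.
        try (OutputStream out = wrap(sink, outputName)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sink.toByteArray();
    }

    /** Decompresses gzip bytes back to text; used here only to check the round trip. */
    static String gunzip(byte[] data) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String line = "<http://example/s> <http://example/p> <http://example/o> .\n";
        byte[] compressed = write(line, "nquads.gz");
        System.out.println(baseName("nquads.gz") + ": " + compressed.length + " compressed bytes");
        System.out.print(gunzip(compressed));
    }
}
```

Keeping the `.gz` detection separate from the syntax lookup means the same check could serve every --output variant, which is one way to read the "various --output options" wording in the issue.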
CLI libraries
In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, in package jena.cmdline. If that is correct, and Jena has its own CLI code, wouldn't it be better to replace this with a modern CLI library like that provided by Apache Commons? Does that sound like a ticket? --- A. Soroka The University of Virginia Library
Re: CLI libraries
Okay, that makes sense. Is the larger move (the construction of 'jena-cmd') worth an epic in Jira? With the smaller (take arq.cmd* to jena-base/jena.cmd* and drop jena-core/jena.cmdline) as a first story therein?

--- A. Soroka The University of Virginia Library

On Jun 8, 2015, at 11:24 AM, Andy Seaborne a...@apache.org wrote:

On 08/06/15 15:47, aj...@virginia.edu wrote:

In examining and discussing https://issues.apache.org/jira/browse/JENA-959, it seems to me (a Jena newbie!) that Jena's CLI action is built up in jena-core, in package jena.cmdline. If that is correct, and Jena has its own CLI code, wouldn't it be better to replace this with a modern CLI library like that provided by Apache Commons? Does that sound like a ticket?

arq.cmdline.CmdLineArgs

The whole cmd support does more than Apache Commons CLI. Around command line processing is support for grouping and reuse across commands, and an execution model. There are a lot of commands -- Apache Commons CLI would also cause changes in syntax, e.g. arq.cmd does not treat -- and - differently; combined POSIX-like options aren't supported. (jena.cmdline looks like some partial copy to get older development working.)

A useful goal might be to have a module jena-cmd which is after SDB, TDB and the rest with the set of command line tools we deem to be the public set of commands (some of the old stuff needs retiring or at least bringing, incompatibly, into the general style - e.g. rdfcompare). People use rdfcat :-( but nowadays riot is better IMO (scale, speed, arguments, ..) but I'm not unbiased.

A useful but bounded step might be to take arq.cmd* to jena-base/jena.cmd* and drop jena-core/jena.cmdline (not tried this so there may be a forgotten dependency).

Andy

--- A. Soroka The University of Virginia Library
TDB2
Informational announcement: TDB2

TDB2 is a reworking of TDB based on updated implementations of transactions and transactional data structures for project Lizard (a clustered SPARQL store).

TDB2 has:
* Arbitrary scale write-once transactions
* New transaction system - can add other first class components. (e.g. text indexes, cache tables)
* Models work across transaction boundaries
* Cleaner, simpler, more maintainable

TDB2 databases are not compatible with TDB databases. It uses a more efficient encoding for RDF terms. [1]

Being a database, the new indexing and transaction code needs time to settle to bring the maturity up. I'm using that tech in Lizard development.

Andy

TDB2 code: https://github.com/afs/mantis/tree/master/tdb2
Lizard slides: http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard

[1] An upgrade path using TDB1-style encoding is possible; it is a one-way upgrade path and not reversible [2]. TDB2 adds control files for the copy-on-write data structures that TDB1 does not understand.

[2] Actually, if the encoding is compatible, what will happen is that TDB1 will see the database at the time of the upgrade. Welcome to copy-on-write immutable data structures.
Re: TDB2
On 08/06/15 16:41, Andy Seaborne wrote:

Informational announcement: TDB2

TDB2 is a reworking of TDB based on updated implementations of transactions and transactional data structures for project Lizard (a clustered SPARQL store).

TDB2 has:
* Arbitrary scale write-once transactions
* New transaction system - can add other first class components. (e.g. text indexes, cache tables)
* Models work across transaction boundaries
* Cleaner, simpler, more maintainable

TDB2 databases are not compatible with TDB databases. It uses a more efficient encoding for RDF terms. [1]

Being a database, the new indexing and transaction code needs time to settle to bring the maturity up. I'm using that tech in Lizard development.

Andy

TDB2 code: https://github.com/afs/mantis/tree/master/tdb2
Lizard slides: http://www.slideshare.net/andyseaborne/201411-apache-coneu-lizard

[1] An upgrade path using TDB1-style encoding is possible; it is a one-way upgrade path and not reversible [2]. TDB2 adds control files for the copy-on-write data structures that TDB1 does not understand.

[2] Actually, if the encoding is compatible, what will happen is that TDB1 will see the database at the time of the upgrade. Welcome to copy-on-write immutable data structures.

TDB2 is transactional use only.

Additional fun with Java8: all the begin/commit foo is hidden.

    Dataset ds = TDBFactory.createDataset() ;

Here is a write transaction to load a file:

    TDBTxn.executeWrite(ds, ()->RDFDataMgr.read(ds, http:...)) ;

Or to get the size of the default model safely:

    long size = TDBTxn.executeReadReturn(ds, ()->ds.getDefaultModel().size()) ;

Andy
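Editorial sketch: one way helpers like `executeWrite` and `executeReadReturn` can hide the begin/commit calls is to take the work as a lambda and run it between the transaction lifecycle calls. The code below illustrates that pattern only. It is not the TDB2 implementation: the `Transactional` interface, `RecordingTxn` and the class `TxnSketch` are hypothetical stand-ins defined here so the example runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration of lambda-based transaction helpers; not TDB2 code.
class TxnSketch {

    /** Minimal stand-in for a transactional dataset (an assumption, not a Jena type). */
    interface Transactional {
        void begin(boolean write);
        void commit();
        void abort();
        void end();
    }

    /** Runs the action in a write transaction: commit on success, abort on failure, always end. */
    static void executeWrite(Transactional txn, Runnable action) {
        txn.begin(true);
        try {
            action.run();
            txn.commit();
        } catch (RuntimeException e) {
            txn.abort();
            throw e;
        } finally {
            txn.end();
        }
    }

    /** Runs the action in a read transaction and returns its result. */
    static <T> T executeReadReturn(Transactional txn, Supplier<T> action) {
        txn.begin(false);
        try {
            return action.get();
        } finally {
            txn.end();
        }
    }

    /** Records lifecycle calls so the order can be inspected. */
    static class RecordingTxn implements Transactional {
        final List<String> calls = new ArrayList<>();
        @Override public void begin(boolean write) { calls.add(write ? "begin(write)" : "begin(read)"); }
        @Override public void commit() { calls.add("commit"); }
        @Override public void abort() { calls.add("abort"); }
        @Override public void end() { calls.add("end"); }
    }

    public static void main(String[] args) {
        RecordingTxn txn = new RecordingTxn();
        executeWrite(txn, () -> System.out.println("loading data inside the transaction"));
        long size = executeReadReturn(txn, () -> 42L);
        System.out.println("size=" + size + " calls=" + txn.calls);
    }
}
```

The design point is that the caller cannot forget to commit or end a transaction, because those calls live in one place and the `finally` block runs whether the lambda succeeds or throws.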