[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/427

---
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192794776

--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
             filename = IRILib.decode(filename) ;
         }
         OutputStream out = new FileOutputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            out = new GZIPOutputStream(out) ;
+        String ext = FileOps.extension(filename);
--- End diff --

Good idea as a separate "clean up FileOps/FileUtils" item and let this PR go in now. Got to finish sometime!

---
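The hunk above shows `openOutputFileEx` moving from a single `.endsWith(".gz")` test to dispatching on `FileOps.extension(filename)`; the rest of that method is not quoted. A minimal JDK-only sketch of such a dispatch follows, assuming a hypothetical `CompressedOutput` class (not Jena's `IO` or `FileOps`). gzip uses `java.util.zip.GZIPOutputStream`; bz2 needs Commons Compress, so the sketch only names it in a comment and rejects it; "sz" is rejected because the PR adds Snappy decompression only.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not Jena's IO or FileOps: choose an output compressor by extension.
class CompressedOutput {

    /** Text after the last '.' in the last path segment, or "" if there is none. */
    static String extension(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return dot > sep ? filename.substring(dot + 1) : "";
    }

    /** Open a file for writing, gzip-compressed if the extension is "gz". */
    static OutputStream openOutputFile(String filename) throws IOException {
        String ext = extension(filename);
        // Reject before creating the file, so an unsupported name leaves nothing behind.
        // With Commons Compress on the classpath, "bz2" could wrap the stream in a
        // BZip2CompressorOutputStream instead; the PR has no Snappy compressor for "sz".
        if (ext.equals("bz2") || ext.equals("sz"))
            throw new UnsupportedOperationException("No JDK compressor for: " + filename);
        OutputStream out = new FileOutputStream(filename);
        if (!ext.equals("gz"))
            return out;                         // any other extension: write as-is
        try {
            return new GZIPOutputStream(out);   // writes the gzip header straight away
        } catch (IOException ex) {
            out.close();                        // don't leak the file handle
            throw ex;
        }
    }
}
```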
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192794317

--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -77,10 +81,28 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = IRILib.decode(filename) ;
         }
         InputStream in = new FileInputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            in = new GZIPInputStream(in) ;
+        String ext = FileOps.extension(filename);
+        switch ( ext ) {
+            case "":    return in;
+            case "gz":  return new GZIPInputStream(in) ;
+            case "bz2": return new BZip2CompressorInputStream(in);
+            case "sz":  return new SnappyCompressorInputStream(in);
+        }
         return in ;
     }
+
+    private static String[] extensions = { ".gz", ".bz2", ".sz" };
+
+    /** The filename without any compression extension, or the original filename.
+     * It tests for compression types handled by {@link #openFileEx}.
+     */
+    static public String filenameNoCompression(String filename) {
+        for ( String ext : extensions ) {
+            if ( filename.endsWith(ext) )
+                return filename.substring(0, filename.length()-ext.length());
+        }
+        return filename;
+    }
--- End diff --

Done.

---
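On the input side, the `switch` in `openFileEx` falls through to `return in` for any extension it does not list, so a file such as `data.ttl` is read uncompressed. A JDK-only sketch of the same dispatch follows, assuming a hypothetical `CompressedInput` class (not Jena's `IO`). The bz2 and sz branches are only comments here, because `BZip2CompressorInputStream` and `SnappyCompressorInputStream` come from Commons Compress rather than the JDK.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

// Hypothetical JDK-only helper mirroring the shape of openFileEx; not Jena's IO class.
class CompressedInput {

    /** Text after the last '.' in the last path segment, or "" if there is none. */
    static String extension(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return dot > sep ? filename.substring(dot + 1) : "";
    }

    /** Open a file for reading, decompressing gzip by extension. */
    static InputStream openFile(String filename) throws IOException {
        String ext = extension(filename);
        // The PR uses Commons Compress here: BZip2CompressorInputStream for "bz2" and
        // SnappyCompressorInputStream for "sz". Neither is in the JDK, so this sketch refuses them.
        if (ext.equals("bz2") || ext.equals("sz"))
            throw new UnsupportedOperationException("Needs Commons Compress: " + filename);
        InputStream in = new FileInputStream(filename);
        if (!ext.equals("gz"))
            return in;                      // "" or any other extension: read as-is
        try {
            return new GZIPInputStream(in); // reads and checks the gzip header now
        } catch (IOException ex) {
            in.close();                     // don't leak the file handle on a bad header
            throw ex;
        }
    }
}
```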
Github user kinow commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192701731

--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
             filename = IRILib.decode(filename) ;
         }
         OutputStream out = new FileOutputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            out = new GZIPOutputStream(out) ;
+        String ext = FileOps.extension(filename);
--- End diff --

Digressing, but as we have `FilenameUtils.getExtension()` from commons-io on the classpath, perhaps this could later be marked as `deprecated`?

---
Github user kinow commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192694578

--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -77,10 +81,28 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
             filename = IRILib.decode(filename) ;
         }
         InputStream in = new FileInputStream(filename) ;
-        if ( filename.endsWith(".gz") )
-            in = new GZIPInputStream(in) ;
+        String ext = FileOps.extension(filename);
+        switch ( ext ) {
+            case "":    return in;
+            case "gz":  return new GZIPInputStream(in) ;
+            case "bz2": return new BZip2CompressorInputStream(in);
+            case "sz":  return new SnappyCompressorInputStream(in);
+        }
         return in ;
     }
+
+    private static String[] extensions = { ".gz", ".bz2", ".sz" };
+
+    /** The filename without any compression extension, or the original filename.
+     * It tests for compression types handled by {@link #openFileEx}.
+     */
+    static public String filenameNoCompression(String filename) {
+        for ( String ext : extensions ) {
+            if ( filename.endsWith(ext) )
+                return filename.substring(0, filename.length()-ext.length());
+        }
+        return filename;
+    }
--- End diff --

Maybe instead:

```java
/** The filename without any compression extension, or the original filename.
 * It tests for compression types handled by {@link #openFileEx}.
 */
static public String filenameNoCompression(String filename) {
    if ( FilenameUtils.isExtension(filename, extensions) ) {
        return FilenameUtils.removeExtension(filename);
    }
    return filename;
}
```

I believe commons-io is already in the dependencies list. `FilenameUtils` also does an extra check for null bytes when checking the extension, but that's not so important. It's just simpler, I think.

---
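If this route is taken later, note that `FilenameUtils.isExtension` compares each entry against the extension returned by `getExtension`, which has no leading dot, so the `extensions` array from the diff (`".gz"`, `".bz2"`, `".sz"`) would have to become `"gz"`, `"bz2"`, `"sz"` to match anything. A JDK-only sketch of the suggested logic with dot-less names follows; `CompressionNames` is a hypothetical class, and its separator handling only approximates commons-io.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical class; separator handling only approximates commons-io FilenameUtils.
class CompressionNames {

    // Dot-less, matching what FilenameUtils.getExtension returns.
    private static final List<String> EXTENSIONS = Arrays.asList("gz", "bz2", "sz");

    /** Position of the extension's '.' in the last path segment, or -1 if there is none. */
    static int indexOfExtension(String filename) {
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        int dot = filename.lastIndexOf('.');
        return dot > sep ? dot : -1;
    }

    /** The filename without a trailing compression extension, or the original filename. */
    static String filenameNoCompression(String filename) {
        int dot = indexOfExtension(filename);
        if (dot >= 0 && EXTENSIONS.contains(filename.substring(dot + 1)))
            return filename.substring(0, dot);   // like removeExtension: drop ".gz" etc.
        return filename;
    }
}
```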
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192678383

--- Diff: pom.xml ---
@@ -68,6 +68,7 @@
     3.4
     1.5
     1.11
+    1.16.1
--- End diff --

Yes! Thanks for the pointer.

---
Github user kinow commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192605754

--- Diff: pom.xml ---
@@ -68,6 +68,7 @@
     3.4
     1.5
     1.11
+    1.16.1
--- End diff --

Commons Compress 1.17 was just released; maybe worth using it instead? I just received Stefan's announcement about it on the commons mailing list.

---
GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/427

JENA-1554, JENA-1555: Support bz2 compressed files directly from Java.

JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress
JENA-1554: Add bz2 compression/decompression

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena compressed

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/427.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #427

commit eb9ba394f59ae5f827a54db718b032d797d1bafb
Author: Andy Seaborne
Date: 2018-06-03T08:51:44Z

    JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress

commit f88fbc578d02ed8925104bf5d4a03795470d9275
Author: Andy Seaborne
Date: 2018-06-03T09:11:13Z

    JENA-1554: Add bz2 compression/decompression

    Add Snappy default 32k block decompress only; compressor not available
    Update javadoc (RDFLanguages, BinRDF) that mentions gz.

---