[GitHub] jena pull request #427: JENA-1554, JENA-1555: Support bz2 compressed files d...

2018-06-05 asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/jena/pull/427


---


2018-06-04 afs
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192794776
  
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
 filename = IRILib.decode(filename) ;
 }
 OutputStream out = new FileOutputStream(filename) ;
-if ( filename.endsWith(".gz") )
-out = new GZIPOutputStream(out) ;
+String ext = FileOps.extension(filename);
--- End diff --

Good idea, as a separate "clean up FileOps/FileUtils" item; let's let this PR
go in now. Got to finish sometime!
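The output-side hunk above is cut off right after the `FileOps.extension(filename)` lookup. As a hedged illustration only, here is a minimal JDK-only sketch of extension-based dispatch when opening a file for writing. The class `CompressedOut` and its methods are hypothetical and not Jena's API. Only `gz` is handled (with `java.util.zip.GZIPOutputStream`); the PR's bz2 support comes from Apache Commons Compress, which is left out so the sketch needs no extra dependency. Rejecting `sz` is this sketch's own choice, based on the commit note that Snappy is decompress-only.

```java
import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedOut {

    // Text after the last '.', or "" if the file-name part has no '.'
    // (a '.' inside a directory name does not count).
    static String extension(String filename) {
        int dot = filename.lastIndexOf('.');
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        return (dot > sep) ? filename.substring(dot + 1) : "";
    }

    // Open a file for writing, choosing a compressor from the extension.
    static OutputStream openOutput(String filename) throws IOException {
        switch (extension(filename)) {
            case "gz":
                return new GZIPOutputStream(new FileOutputStream(filename));
            case "bz2":
                // The PR handles bz2 via Apache Commons Compress; omitted
                // here so the sketch depends on the JDK alone.
                throw new UnsupportedOperationException("bz2 needs Commons Compress");
            case "sz":
                // The commit message notes Snappy is decompress-only.
                throw new UnsupportedOperationException("no Snappy compressor");
            default:
                return new FileOutputStream(filename);   // uncompressed
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("example", ".nt.gz");
        try (Writer w = new OutputStreamWriter(openOutput(tmp.toString()), StandardCharsets.UTF_8)) {
            w.write("<http://example/s> <http://example/p> \"o\" .\n");
        }
        // Read back with the JDK's GZIP reader to confirm the data was compressed.
        try (BufferedReader r = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(tmp)), StandardCharsets.UTF_8))) {
            System.out.println(r.readLine());
        }
        Files.delete(tmp);
    }
}
```

The unknown-extension case falls through to a plain stream, so a name like `data.nt` is written uncompressed.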



---


2018-06-04 afs
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192794317
  
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -77,10 +81,28 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
 filename = IRILib.decode(filename) ;
 }
 InputStream in = new FileInputStream(filename) ;
-if ( filename.endsWith(".gz") )
-in = new GZIPInputStream(in) ;
+String ext = FileOps.extension(filename);
+switch ( ext ) {
+case "":return in;
+case "gz":  return new GZIPInputStream(in) ;
+case "bz2": return new BZip2CompressorInputStream(in);
+case "sz":  return new SnappyCompressorInputStream(in);
+}
 return in ;
 }
+
+private static String[] extensions = { ".gz", ".bz2", ".sz" }; 
+
+/** The filename without any compression extension, or the original filename.
+ *  It tests for compression types handled by {@link #openFileEx}.
+ */
+static public String filenameNoCompression(String filename) {
+for ( String ext : extensions ) {
+if ( filename.endsWith(ext) )
+return filename.substring(0, filename.length()-ext.length());
+}
+return filename;
+}
--- End diff --

Done.
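For readers following the `openFileEx` change, here is a hedged, JDK-only sketch of the same read-side idea written as a lookup table rather than a `switch`. Everything named here (`CompressedIn`, `Decoder`, `DECODERS`, `openInput`) is hypothetical. Only gzip is registered, because the PR's bz2 and Snappy decoders (`BZip2CompressorInputStream`, `SnappyCompressorInputStream`) come from Apache Commons Compress, which this sketch does not assume is available. It uses `Map.of` and `readAllBytes`, so it needs Java 9 or later.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class CompressedIn {

    // Wraps a raw stream in a decompressor; may throw while reading a header.
    interface Decoder {
        InputStream wrap(InputStream in) throws IOException;
    }

    // Hypothetical registry: extension (no leading dot) -> decoder.
    // The PR's switch also maps "bz2" and "sz" to Commons Compress classes;
    // only the JDK's gzip decoder is registered here.
    static final Map<String, Decoder> DECODERS = Map.of("gz", GZIPInputStream::new);

    // Open a file, decompressing when its name ends in a registered extension.
    // Any other name is returned as a plain stream, like the switch's fall-through.
    static InputStream openInput(String filename) throws IOException {
        InputStream in = new FileInputStream(filename);
        for (Map.Entry<String, Decoder> e : DECODERS.entrySet()) {
            if (filename.endsWith("." + e.getKey())) {
                try {
                    return e.getValue().wrap(in);
                } catch (IOException ex) {
                    in.close();   // don't leak the file handle on a bad header
                    throw ex;
                }
            }
        }
        return in;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("example", ".ttl.gz");
        try (OutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
            out.write("hello".getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = openInput(tmp.toString())) {
            System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8));   // prints "hello"
        }
        Files.delete(tmp);
    }
}
```

A table keeps each supported format to one entry, which may be convenient if the extension list and `filenameNoCompression` are meant to stay in step; a `switch`, as in the diff, is just as valid for a fixed, small set.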


---


2018-06-04 kinow
Github user kinow commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192701731
  
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -158,15 +177,18 @@ static public OutputStream openOutputFileEx(String filename) throws FileNotFound
 filename = IRILib.decode(filename) ;
 }
 OutputStream out = new FileOutputStream(filename) ;
-if ( filename.endsWith(".gz") )
-out = new GZIPOutputStream(out) ;
+String ext = FileOps.extension(filename);
--- End diff --

Digressing, but since `FilenameUtils.getExtension()` from commons-io is
already on the classpath, perhaps `FileOps.extension` could later be marked
as `deprecated`?


---


2018-06-04 kinow
Github user kinow commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192694578
  
--- Diff: jena-base/src/main/java/org/apache/jena/atlas/io/IO.java ---
@@ -77,10 +81,28 @@ static public InputStream openFileEx(String filename) throws IOException, FileNo
 filename = IRILib.decode(filename) ;
 }
 InputStream in = new FileInputStream(filename) ;
-if ( filename.endsWith(".gz") )
-in = new GZIPInputStream(in) ;
+String ext = FileOps.extension(filename);
+switch ( ext ) {
+case "":return in;
+case "gz":  return new GZIPInputStream(in) ;
+case "bz2": return new BZip2CompressorInputStream(in);
+case "sz":  return new SnappyCompressorInputStream(in);
+}
 return in ;
 }
+
+private static String[] extensions = { ".gz", ".bz2", ".sz" }; 
+
+/** The filename without any compression extension, or the original filename.
+ *  It tests for compression types handled by {@link #openFileEx}.
+ */
+static public String filenameNoCompression(String filename) {
+for ( String ext : extensions ) {
+if ( filename.endsWith(ext) )
+return filename.substring(0, filename.length()-ext.length());
+}
+return filename;
+}
--- End diff --

Maybe instead

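```java
// FilenameUtils.isExtension compares against the extension without its
// leading dot, so it needs { "gz", "bz2", "sz" } rather than ".gz" etc.
private static String[] compressionExtensions = { "gz", "bz2", "sz" };

/** The filename without any compression extension, or the original filename.
 *  It tests for compression types handled by {@link #openFileEx}.
 */
static public String filenameNoCompression(String filename) {
    // FilenameUtils is org.apache.commons.io.FilenameUtils (commons-io).
    if ( FilenameUtils.isExtension(filename, compressionExtensions) ) {
        return FilenameUtils.removeExtension(filename);
    }
    return filename;
}
```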

I believe commons-io is already in the dependencies list. `FilenameUtils.isExtension`
also checks the filename for null bytes, but that's not so important. It's
just simpler, I think.
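Along the lines of the suggestion above, but without the commons-io dependency so it runs on its own: a hedged JDK-only sketch that compares the file's extension in its dotless form (the form `FilenameUtils.isExtension` matches against) and strips one compression suffix. The class name `NoCompression` is hypothetical; the extension list is the one from the diff with the dots removed.

```java
import java.util.Arrays;
import java.util.List;

public class NoCompression {

    // Compression extensions without the leading dot.
    static final List<String> COMPRESSION = Arrays.asList("gz", "bz2", "sz");

    // Strip one trailing compression extension, if present; a '.' that
    // belongs to a directory name is not treated as an extension.
    static String filenameNoCompression(String filename) {
        int dot = filename.lastIndexOf('.');
        int sep = Math.max(filename.lastIndexOf('/'), filename.lastIndexOf('\\'));
        if (dot > sep && COMPRESSION.contains(filename.substring(dot + 1)))
            return filename.substring(0, dot);
        return filename;
    }

    public static void main(String[] args) {
        for (String f : new String[] { "data.nt.bz2", "data.ttl.gz", "data.nt", "dir.gz/file" })
            System.out.println(f + " -> " + filenameNoCompression(f));
    }
}
```

Like `FilenameUtils.removeExtension`, this removes only the last extension: `data.ttl.gz` becomes `data.ttl`, `data.nt` is returned unchanged, and `dir.gz/file` is left alone because its only dot is in the directory part.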


---


2018-06-04 afs
Github user afs commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192678383
  
--- Diff: pom.xml ---
@@ -68,6 +68,7 @@
 3.4
 1.5
 1.11
+1.16.1
--- End diff --

Yes! Thanks for the pointer.


---


2018-06-03 kinow
Github user kinow commented on a diff in the pull request:

https://github.com/apache/jena/pull/427#discussion_r192605754
  
--- Diff: pom.xml ---
@@ -68,6 +68,7 @@
 3.4
 1.5
 1.11
+1.16.1
--- End diff --

Commons Compress 1.17 was just released; maybe worth using it instead? I just
received Stefan's announcement about it on the commons mailing list.


---


2018-06-03 afs
GitHub user afs opened a pull request:

https://github.com/apache/jena/pull/427

JENA-1554, JENA-1555: Support bz2 compressed files directly from Java.

JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress
JENA-1554: Add bz2 compression/decompression


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/afs/jena compressed

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/jena/pull/427.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #427


commit eb9ba394f59ae5f827a54db718b032d797d1bafb
Author: Andy Seaborne 
Date:   2018-06-03T08:51:44Z

JENA-1555, JENA-1554: Update awaitility ; add Apache Commons compress

commit f88fbc578d02ed8925104bf5d4a03795470d9275
Author: Andy Seaborne 
Date:   2018-06-03T09:11:13Z

JENA-1554: Add bz2 compression/decompression

Add Snappy
  default 32k block
  decompress only; compressor not available

Update javadoc (RDFLanguages, BinRDF) that mentions gz.




---