Re: [julia-users] ANN: RawArray.jl

2016-10-14 Thread Páll Haraldsson
On Thursday, October 13, 2016 at 6:00:30 PM UTC, Tim Holy wrote:
>
> If you just want a raw dump of memory, you can get that, and if it's big 
> it uses `Mmap.mmap` when it reads the data back in. So you can read 
> terabyte-sized arrays.
>

[Not clear on mmap: is it just a possibility, or a requirement when arrays 
are this big?]
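Below is a minimal Julia sketch of what memory-mapping buys here. It assumes a made-up file layout (two Int64 dimensions followed by raw Float64 data), which is not the RawArray format. `Mmap.mmap` maps the data section into the address space and the OS pages it in on demand, so nothing is copied into RAM up front. That is why it helps with arrays larger than memory. It is an option rather than a strict requirement: if the array fits in RAM, a plain `read!` works too.

```julia
using Mmap

# Hypothetical layout for illustration only (not the RawArray format):
# two Int64 dimensions, then the Float64 data in column-major order.
function write_with_dims(path::AbstractString, A::Matrix{Float64})
    open(path, "w") do io
        write(io, Int64(size(A, 1)), Int64(size(A, 2)))
        write(io, A)
    end
end

# Map the data section instead of reading it into RAM; the OS loads
# pages from disk lazily as elements are touched.
function sum_mapped(path::AbstractString)
    open(path, "r") do io
        m = read(io, Int64)
        n = read(io, Int64)
        A = Mmap.mmap(io, Matrix{Float64}, (m, n), position(io))
        return sum(A)
    end
end

# Usage
path = tempname()
write_with_dims(path, [1.0 2.0; 3.0 4.0; 5.0 6.0])
sum_mapped(path)  # 21.0
```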

Good to know; I have another thread on array sizes (and the 2 GB limit).

You mean you could read terabyte-sized arrays, not that it's common, or that 
you know of any such 1D arrays?

[Would those be arrays of big structs, with fewer than 2G elements, so that 
a 32-bit index would do?]

I'm not at all worried about 2D (or more dimensions).
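Some back-of-envelope arithmetic for the 32-bit-index question follows. It is a sketch, and the helper names are made up for illustration. A signed 32-bit index reaches 2^31 - 1 elements, which is about 16 GiB of Float64 or about 32 GiB of ComplexF64. A terabyte of Float64 in a single 1D array is roughly 1.25e11 elements, so such an array needs 64-bit indexing (Julia's default `Int` on 64-bit systems).

```julia
# Largest length addressable with a signed 32-bit index: 2^31 - 1.
const MAX_INT32_LEN = Int64(typemax(Int32))

# Bytes spanned by a 1D array of that length with element type T.
max_bytes_32bit(::Type{T}) where {T} = MAX_INT32_LEN * sizeof(T)

# Number of elements of type T that fit in `nbytes` bytes.
elements_for(nbytes::Integer, ::Type{T}) where {T} = nbytes ÷ sizeof(T)

max_bytes_32bit(Float64)      # 17179869176 bytes (about 16 GiB)
max_bytes_32bit(ComplexF64)   # 34359738352 bytes (about 32 GiB)
elements_for(10^12, Float64)  # 125000000000 elements for 1 TB
```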



Re: [julia-users] ANN: RawArray.jl

2016-10-13 Thread Páll Haraldsson

[Explaining more, and correcting a typo.]

On Sunday, October 9, 2016 at 12:04:35 AM UTC, Páll Haraldsson wrote:

> FLIF is not a replacement for all uses (it would be interesting to know 
> whether it could be extended to multidimensional arrays), but it seems to 
> be the best option for lossless image compression:
>

Of course, a three-dimensional array, say, could trivially be stored as 
concatenated FLIF files/bytestreams of 2D slices.
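Here is one way to sketch the slice-by-slice idea in Julia: one length-prefixed frame per 2D slice, so a reader can split the stream back into slices. `encode_slice` and `decode_slice` are hypothetical stand-ins that just copy the bytes. A real setup would call a 2D codec such as a FLIF encoder, and no FLIF API is assumed here.

```julia
# Hypothetical stand-ins for a real 2D codec (e.g. a FLIF encoder);
# here they only copy the slice's bytes so the framing can be run.
encode_slice(S::AbstractMatrix{UInt16}) = collect(reinterpret(UInt8, vec(Matrix(S))))
decode_slice(b::Vector{UInt8}, m::Int, n::Int) = reshape(collect(reinterpret(UInt16, b)), m, n)

# Stream layout: the three dimensions, then one length-prefixed blob per
# 2D slice, so a reader can split the stream back into slices.
function write_slices(io::IO, A::Array{UInt16,3})
    write(io, Int64(size(A, 1)), Int64(size(A, 2)), Int64(size(A, 3)))
    for z in axes(A, 3)
        blob = encode_slice(view(A, :, :, z))
        write(io, Int64(length(blob)))   # length prefix for this slice
        write(io, blob)
    end
end

function read_slices(io::IO)
    m = Int(read(io, Int64)); n = Int(read(io, Int64)); k = Int(read(io, Int64))
    A = Array{UInt16}(undef, m, n, k)
    for z in 1:k
        len = read(io, Int64)
        A[:, :, z] = decode_slice(read(io, len), m, n)
    end
    return A
end
```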

But what I really meant was: could there be a way to compress the whole 
array/"image" at once, similar to how wavelets (and the FFT) generalize to 
more dimensions?
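For the "whole array" question, 1D transforms usually generalize by separability: apply the 1D transform along each dimension in turn. The minimal Julia sketch below uses one level of a plain Haar average/difference step. It is a generic illustration, not how FLIF works. On smooth data most detail coefficients come out zero or small, which is what an entropy coder then exploits. A real lossless codec would typically use an integer-to-integer (lifting) variant so the transform is exactly reversible.

```julia
# One level of a plain Haar step along dimension `d`: each pair (a, b)
# becomes an average (a + b)/2 and a detail (a - b)/2. Averages go in
# the first half along `d`, details in the second half.
function haar_step(A::AbstractArray{Float64}, d::Int)
    n = size(A, d)
    iseven(n) || throw(ArgumentError("size along dimension $d must be even"))
    a = selectdim(A, d, 1:2:n)
    b = selectdim(A, d, 2:2:n)
    return cat((a .+ b) ./ 2, (a .- b) ./ 2; dims = d)
end

# Separable N-D transform: apply the 1D step along every dimension in turn.
haar_nd(A::AbstractArray{Float64}) = foldl(haar_step, 1:ndims(A); init = A)

haar_nd([1.0 3.0; 5.0 7.0])   # [4.0 -1.0; -2.0 0.0]
C = haar_nd(ones(4, 4, 4))
count(!iszero, C)             # 8: only the coarse averages survive
```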


Re: [julia-users] ANN: RawArray.jl

2016-10-08 Thread Páll Haraldsson


On Monday, September 26, 2016 at 1:59:15 PM UTC, David Smith wrote:
>
> 3. RawArray is surely faster. All it does is read. It doesn't perform any 
> transformations or encoding, so it can't possibly be slower than NRRD.
>

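Maybe not slower than NRRD, but it can be slower than reading losslessly 
compressed images: when the disk is the bottleneck, reading fewer bytes and 
decompressing them can beat reading the raw bytes.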
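Whether compression makes reading faster comes down to a simple bandwidth calculation. The Julia sketch below is back-of-envelope only. Every number in it (file size, ratio, disk and decoder throughput) is hypothetical and serves to show how the trade-off flips between a slow disk and a fast one. It also assumes reading and decoding happen one after the other rather than overlapping.

```julia
# Seconds to read `raw_bytes` uncompressed at `disk_bps` bytes/second.
read_time_raw(raw_bytes, disk_bps) = raw_bytes / disk_bps

# Seconds to read the compressed file (raw_bytes / ratio on disk) and
# then decode it back to raw_bytes at `decode_bps` bytes/second of output.
function read_time_compressed(raw_bytes, ratio, disk_bps, decode_bps)
    stored = raw_bytes / ratio
    return stored / disk_bps + raw_bytes / decode_bps
end

# Hypothetical: 4 GB array, 2:1 compression, decoder outputs 1 GB/s.
read_time_raw(4e9, 2e8)                    # 20.0 s on a 200 MB/s disk
read_time_compressed(4e9, 2.0, 2e8, 1e9)   # 14.0 s: compression wins
read_time_raw(4e9, 2e9)                    # 2.0 s on a 2 GB/s disk
read_time_compressed(4e9, 2.0, 2e9, 1e9)   # 5.0 s: raw wins
```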

I did read the abstract (short, and good):
https://github.com/davidssmith/ra/blob/master/doc/ra-sedona-abstract.pdf

https://en.wikipedia.org/wiki/Free_Lossless_Image_Format

FLIF is not a replacement for all uses (it would be interesting to know 
whether it could be extended to multidimensional arrays), but it seems to be 
the best option for lossless image compression:

http://flif.info/index.html
"
53% smaller than lossless JPEG 2000 compression,
74% smaller than lossless JPEG XR compression.

Even if the best image format was picked out of PNG, JPEG 2000, WebP or BPG 
for a given image corpus, depending on the type of images (photograph, line 
art, 8 bit or higher bit depth, etc), then FLIF still beats that by 12% on 
a median corpus 
[..]
FLIF does away with knowing what image format performs the best at any 
given task.
[..]
Other lossless formats also support progressive decoding (e.g. PNG with 
Adam7 interlacing), but FLIF is better at it. Here is a simple 
demonstration video, which shows an image as it is slowly being downloaded:
[..]
No patents, Free

Unlike some other image formats (e.g. BPG and JPEG 2000), FLIF is 
completely royalty-free and it is not known to be encumbered by software 
patents. At least as far as we know. FLIF uses arithmetic coding, just 
like FFV1 (which inspired FLIF), but as far as we know, all patents related 
to arithmetic coding are expired. Other than that, we do not think FLIF 
uses any techniques on which patents are claimed. However, we are not 
lawyers. There are a stunning number of software patents, some of which are 
very broad and vague; it is impossible to read them all, let alone 
guarantee that nobody will ever claim part of FLIF to be covered by some 
patent. All we know is that we did not knowingly use any technique which is 
(still) patented, and we did not patent FLIF ourselves either.

The reference implementation of FLIF is Free Software. It is released 
under the terms of the GNU Lesser General Public License (LGPL), version 3 
or any later version.
[..]
The reference FLIF decoder is also available as a shared library, 
released under the more permissive (non-copyleft) terms of the Apache 2.0 
license. Public domain example code is available to illustrate how to use 
the decoder library.

Moreover, the reference implementation is available free of charge 
(gratis) under these terms.
[..]
FLIF currently has the following features:

Lossless compression
Lossy compression (encoder preprocessing option, format itself is 
lossless so no generation loss)
Greyscale, RGB, RGBA (also palette and color-bucket modes)
Color depth: up to 16 bits per channel (high bit depth)"

-- 
Palli.


Re: [julia-users] ANN: RawArray.jl

2016-10-01 Thread 'Tobias Knopp' via julia-users
https://github.com/JuliaIO/NRRD.jl

That's pure Julia.


On Saturday, October 1, 2016 at 8:48:37 PM UTC+2, Steven G. Johnson wrote:
> The NRRD spec is not that complicated at first glance; it looks like it 
> wouldn't be too hard to write a pure-Julia implementation of it.   If you 
> only want to support the subset of NRRD's functionality provided by 
> RawArray, the implementation effort wouldn't be much harder than RawArray.
>


Re: [julia-users] ANN: RawArray.jl

2016-10-01 Thread Steven G. Johnson


On Monday, September 26, 2016 at 9:59:15 AM UTC-4, David Smith wrote:
>
> I also don't need it to read image formats. Part of the reason behind 
> RawArray is to avoid standard image formats because they are not optimized 
> for large complex-float arrays. I just want to save multi-GB data arrays to 
> disk quickly and read them back quickly on a different machine, five years 
> later. 
>

 Aside from a small ASCII header, it looks (from the specs) like NRRD can 
save a multidimensional complex floating-point array as just the raw data, 
i.e. a single "write" call.  So I'm not sure what you mean by "not 
optimized".

As for being able to read something 5 years later, using a pre-existing 
format with some kind of userbase seems to improve the odds of that.

On Monday, September 26, 2016 at 10:03:20 AM UTC-4, David Smith wrote:
>
> Sorry I forgot to add: 
>
> JuliaIO/Images.jl relies on having ImageMagick installed, whereas 
> RawArray.jl is a pure Julia solution without any dependencies. 
>

The NRRD spec is not that complicated at first glance; it looks like it 
wouldn't be too hard to write a pure-Julia implementation of it. If you 
only want to support the subset of NRRD's functionality provided by 
RawArray, the implementation effort wouldn't be much greater than for RawArray.
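To make the "small ASCII header plus a single write call" point concrete, here is a minimal Julia sketch that writes a complex array as an NRRD file with an attached raw payload. It is a hypothetical helper, not NRRD.jl's API. The field names (`type`, `dimension`, `sizes`, `kinds`, `endian`, `encoding`) are based on the format description at teem.sourceforge.net/nrrd/descformat.html and should be checked against that spec before relying on them. Because NRRD has no complex element type, the sketch stores `double` with an extra leading axis of length 2 marked with the `complex` kind.

```julia
# Hypothetical helper, not part of NRRD.jl. Field names follow the NRRD
# format description; verify against the spec before relying on them.
function write_nrrd_sketch(io::IO, A::Array{ComplexF64})
    endian = ENDIAN_BOM == 0x04030201 ? "little" : "big"
    # NRRD lists the fastest-varying axis first. Each ComplexF64 stores
    # (re, im) as two adjacent doubles, so prepend an axis of length 2.
    sizes = join((2, size(A)...), " ")
    kinds = join(("complex", ntuple(_ -> "domain", ndims(A))...), " ")
    lines = [
        "NRRD0004",
        "type: double",
        "dimension: $(ndims(A) + 1)",
        "sizes: $sizes",
        "kinds: $kinds",
        "endian: $endian",
        "encoding: raw",
    ]
    header = join(lines, "\n") * "\n\n"   # a blank line ends the header
    n = write(io, header)
    return n + write(io, A)               # the payload is one raw write
end
```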


Re: [julia-users] ANN: RawArray.jl

2016-09-26 Thread Isaiah Norton
Thanks for the response.

> 1. I don't think NRRD is as substantially used as you might think. I've
> worked in imaging science for years on the data processing/file format end,
> and I've never seen anyone use it, and I've never even heard of it.  (Pity,
> because it looks nice enough. :-\)


I think I can make a solid argument that the install base of software
supporting NRRD is on the order of 30-50k -- probably larger than anything
but NIfTI (which is substantial, to me, as far as obscure, non-DICOM
medical imaging formats go...)

But that's neither here nor there. I'm not going to change your mind, but
perhaps someone else will see this thread and think twice before creating
`RawArraysWithSlightlyMoreMetadata.jl`.


> JuliaIO/Images.jl relies on having ImageMagick installed, whereas
> RawArray.jl is a pure Julia solution without any dependencies.
>

Indeed, that was a major pain point, especially for Windows users. However,
Images.jl has been modularized, and no longer requires ImageMagick. NRRD.jl
does have more dependencies (~10) than might be expected, in order to
support Images inter-op, but from what I can tell they are all pure-Julia
except for Rmath (the binary portions of which are distributed as part of
base Julia).

I'll take my other, minor, comments off-list since this curmudgeonly pet
peeve probably isn't of much general interest!



Re: [julia-users] ANN: RawArray.jl

2016-09-26 Thread David Smith
Sorry, I forgot to add:

JuliaIO/Images.jl relies on having ImageMagick installed, whereas 
RawArray.jl is a pure Julia solution without any dependencies. 




Re: [julia-users] ANN: RawArray.jl

2016-09-26 Thread David Smith
Hi, Isaiah. This is a valid question.

0. As a preface, I'd like to say I'm not trying to replace anything. I 
wrote RawArray to solve a problem we have in magnetic resonance imaging 
(quickly saving and loading large complex float arrays), and then I decided 
to share it so if other people like it and find it useful, then cool beans.

Now for the mild stumping...

1. I don't think NRRD is as substantially used as you might think. I've 
worked in imaging science for years on the data processing/file format end, 
and I've never seen anyone use it, and I've never even heard of it.  (Pity, 
because it looks nice enough. :-\)

2. RawArray is simpler to handle and trivial to understand. I believe all 
you need from an I/O library is I/O.* I don't want my file I/O library 
performing transformations on my data. 

I also don't need it to read image formats. Part of the reason behind 
RawArray is to avoid standard image formats because they are not optimized 
for large complex-float arrays. I just want to save multi-GB data arrays to 
disk quickly and read them back quickly on a different machine, five years 
later. 

I have other implementations (https://github.com/davidssmith/ra), and all 
are super short and platform agnostic.

3. RawArray is surely faster. All it does is read. It doesn't perform any 
transformations or encoding, so it can't possibly be slower than NRRD. 
There is a C library at https://github.com/davidssmith/ra if you think a 
pure Julia implementation isn't fast enough.

Cheers,
Dave

[*] That said, I'm not completely ruling out having transformations 
available in RawArray between the RAM and disk. For example, when I first 
wrote it, I had included Blosc compression as an option, signaled by a flag 
in the header. But in general most transformations are best made in RAM 
after reading or on disk with already existing, battle-proven tools, such 
as gzip, uuencode, tar, etc.


On Sunday, September 25, 2016 at 9:59:45 PM UTC-5, Isaiah wrote:
>
> Is there a reason to use this file format over NRRD [1]? To borrow a wise 
> phrasing: I wonder if the world needs another lightweight raw data format ;)
>
> For what it's worth, NRRD is already supported by JuliaIO/Images.jl, and I 
> believe addresses the use-cases identified in your readme, but with a 
> number of technical and non-technical advantages (not least: a number of 
> independent implementations, and a substantial user base, at least as far 
> as these things go).
>
> I say this -- very selfishly I admit -- as someone who has been on the 
> receiving end of far too many files in home-brewed formats.
>
> [1] http://teem.sourceforge.net/nrrd/descformat.html
>

Re: [julia-users] ANN: RawArray.jl

2016-09-25 Thread Isaiah Norton
Is there a reason to use this file format over NRRD [1]? To borrow a wise
phrasing: I wonder if the world needs another lightweight raw data format ;)

For what it's worth, NRRD is already supported by JuliaIO/Images.jl, and I
believe addresses the use-cases identified in your readme, but with a
number of technical and non-technical advantages (not least: a number of
independent implementations, and a substantial user base, at least as far
as these things go).

I say this -- very selfishly I admit -- as someone who has been on the
receiving end of far too many files in home-brewed formats.

[1] http://teem.sourceforge.net/nrrd/descformat.html



[julia-users] ANN: RawArray.jl

2016-09-25 Thread David Smith
Hi, all:

I finally pushed this out, and it might satisfy some of your needs for a 
simple way to store N-d arrays to disk. Hope you enjoy it.

RawArray (.ra) is a simple file format for storing n-dimensional arrays. 
RawArray was designed to be portable, fast, storage-efficient, and 
future-proof. Basically it writes the binary array data directly to disk with a 
short header that is used to recreate type and dimension information. 
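Here is a minimal Julia sketch of the "short header plus raw bytes" idea. The header layout, magic number, and type codes are all made up for illustration and are not the actual .ra format (the repository documents that). The sketch shows why reading back amounts to a few small reads followed by one `read!` into a preallocated array. It also assumes the writing and reading machines share a byte order.

```julia
# Hypothetical type codes for this sketch; the real RawArray header
# encodes element types its own way.
const TYPE_CODES = Dict{DataType,UInt64}(
    Int8 => 1, Int16 => 2, Int32 => 3, Int64 => 4,
    UInt8 => 5, UInt16 => 6, UInt32 => 7, UInt64 => 8,
    Float32 => 9, Float64 => 10, ComplexF32 => 11, ComplexF64 => 12)
const CODE_TYPES = Dict(v => k for (k, v) in TYPE_CODES)
const MAGIC = 0x00000000deadbeef   # arbitrary tag chosen for this sketch

# Header: magic, type code, ndims, then each dimension, all as UInt64
# in the host byte order (assumed to match on the reading machine).
function write_raw(io::IO, A::Array{T}) where {T}
    write(io, MAGIC, TYPE_CODES[T], UInt64(ndims(A)))
    for d in size(A)
        write(io, UInt64(d))
    end
    return write(io, A)   # payload: the array's bytes, unchanged
end

function read_raw(io::IO)
    read(io, UInt64) == MAGIC || error("not a file written by write_raw")
    T = CODE_TYPES[read(io, UInt64)]
    nd = Int(read(io, UInt64))
    dims = Tuple([Int(read(io, UInt64)) for _ in 1:nd])
    A = Array{T}(undef, dims)
    return read!(io, A)   # fill the array straight from the stream
end
```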

RawArray is faster than HDF5 and supports complex numbers out of the box, 
which HDF5 does not. RawArray supports all basic `Int`, `UInt`, `Float`, 
and `Complex{}` types, and more can be easily added in the future, such as 
Rational or Big*. It can also handle derived types, but the serialization 
of them is currently left up to the user.

A system of version numbers and flags is implemented to future-proof the 
data files as well, in case the implementation needs to change for some 
reason.
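One possible shape for the version-and-flags idea is the Julia sketch below. The reader refuses files newer than it understands and files carrying flag bits it does not know, rather than silently misreading them. The version number and the two flag bits are made up for illustration: one for byte order, and one for compression, in the spirit of the Blosc flag mentioned in the thread. The actual RawArray header layout may differ.

```julia
# Hypothetical flag bits and version; the real RawArray layout may differ.
const FLAG_BIG_ENDIAN = UInt64(1) << 0
const FLAG_COMPRESSED = UInt64(1) << 1
const KNOWN_FLAGS     = FLAG_BIG_ENDIAN | FLAG_COMPRESSED
const READER_VERSION  = UInt64(1)

# Reject anything this reader cannot interpret, then decode the flags.
function check_header(version::UInt64, flags::UInt64)
    version <= READER_VERSION ||
        error("file version $version is newer than reader version $READER_VERSION")
    unknown = flags & ~KNOWN_FLAGS
    unknown == 0 || error("unknown flag bits: $unknown")
    return (big_endian = (flags & FLAG_BIG_ENDIAN) != 0,
            compressed = (flags & FLAG_COMPRESSED) != 0)
end

check_header(UInt64(1), FLAG_COMPRESSED)  # (big_endian = false, compressed = true)
```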

You can grab it with `Pkg.add("RawArray")`. A minimum of Julia 0.4 is 
required.

Repository: https://github.com/davidssmith/RawArray.jl

Cheers,
Dave