[julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread paul analyst
I have data in a txt file, several million lines like this:
0,0,2,0,0,0,0,0,0,0,0,0,0,0,0,0,0
0,0,0,0,0,0,0,2,0,0,0,2,0,0,0,0,1
0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1

The encoding is win1250.

The size of dane.txt is 1.3 GB.

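# Read the whole CSV file into a matrix
D = readcsv("dane.txt")
k, l = size(D)

using HDF5, JLD
# Create an empty HDF5 file
hfi = h5open("D.h5", "w")
close(hfi)

# Reopen it for writing and store D as a k-by-l Int64 dataset
fid = h5open("D.h5", "r+")
g = fid["/"]
dset1 = d_create(g, "/D", datatype(Int64), dataspace(k, l))
dset1[:,:] = D
close(fid)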

After saving to the h5 file, the file is 6.3 GB. Why is the new file 4 times bigger?
Paul


Re: [julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread Erik Schnetter
HDF5 files support compression. It is enabled via a flag when writing the
file; when reading, the data is decompressed automatically. I assume that
compression would greatly reduce the file size.

-erik
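
As an illustration of the compression flag mentioned above, here is a minimal
sketch. It assumes a recent HDF5.jl (0.16 or later), where the dataset
constructor is named create_dataset (the d_create used in this thread is the
older name) and where chunk and deflate are the keyword names for the chunk
shape and gzip level. The file path, matrix size, chunk shape, and compression
level are all made up for the example, and the random matrix is only a
stand-in for the real data.

```julia
using HDF5

# Stand-in for the real data: values in 0:2 stored as UInt8 (assumption).
A = rand(UInt8(0):UInt8(2), 10_000, 17)

# Hypothetical output path in a temporary directory.
path = joinpath(mktempdir(), "D_compressed.h5")

h5open(path, "w") do fid
    # HDF5 compresses chunk by chunk, so a chunk shape must be given.
    # Chunk shape and deflate level here are illustrative, not tuned.
    dset = create_dataset(fid, "D", datatype(UInt8), dataspace(size(A));
                          chunk=(1_000, 17), deflate=3)
    dset[:, :] = A
end

# Reading needs no extra flag; decompression is transparent.
B = h5read(path, "D")
```

Comparing filesize(path) with and without the chunk and deflate keywords would
show how much compression helps for a given data set.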

On Tue, Jul 21, 2015 at 1:21 PM, Stefan Karpinski ste...@karpinski.org
wrote:

 In your example data, each value is represented with two bytes: one for
 the value, one for a comma or newline. Each Int64 value is 8 bytes. If all
 your values are between 0 and 255, you could use UInt8 to represent them
 and cut the size in half.

-- 
Erik Schnetter schnet...@gmail.com
http://www.perimeterinstitute.ca/personal/eschnetter/


Re: [julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread Stefan Karpinski
In your example data, each value is represented with two bytes: one for the
value, one for a comma or newline. Each Int64 value is 8 bytes. If all your
values are between 0 and 255, you could use UInt8 to represent them and cut
the size in half.
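
The arithmetic behind this estimate can be sketched as follows. The
assumptions are that 1.3 GB means 1.3e9 bytes and that every value is a single
digit followed by one separator; the small matrix D below is only a stand-in
for the real data.

```julia
# Assumption: each value takes 2 bytes in the text file
# (one digit plus one comma or newline).
text_bytes      = 1.3e9
bytes_per_value = 2
n_values        = text_bytes / bytes_per_value

int64_bytes = n_values * sizeof(Int64)   # 8 bytes per value
uint8_bytes = n_values * sizeof(UInt8)   # 1 byte per value

println("Int64: ", int64_bytes / 1e9, " GB")
println("UInt8: ", uint8_bytes / 1e9, " GB")

# Converting is only safe if every value fits in 0:255.
D = [0 0 2; 0 1 0]                       # tiny stand-in matrix
fits = all(x -> 0 <= x <= 255, D)
D8 = fits ? UInt8.(D) : D
```

Under these assumptions the Int64 array comes out at about four times the text
size, and the UInt8 array at half of it.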

On Tue, Jul 21, 2015 at 1:16 PM, paul analyst paul.anal...@mail.com wrote:

 After saving to the h5 file, the file is 6.3 GB. Why is the new file 4 times bigger?
 Paul


Re: [julia-users] HDF5 file is bigger than txt. What's wrong?

2015-07-21 Thread Stefan Karpinski
Yes, that could be even more effective.

On Tue, Jul 21, 2015 at 2:09 PM, Erik Schnetter schnet...@gmail.com wrote:

 HDF5 files support compression. This is enabled via a flag when writing the
 file; when reading, it is automatically decompressed. I assume that
 compression would greatly reduce the file size.

 -erik
