Re: New archive file format (was: [omgps] collect feature requests)

2009-07-04 Thread mqy

Hi BillK:

Don't worry :)

I've said that I'm afraid of the corruption. So this feature will be
configurable if it can be integrated.


2009/7/2 William Kenworthy (via Nabble) <ml-user+1677-108203...@n2.nabble.com>:
> I hope not - I have over 2 million tiles stored on SD card - if file
> corruption or disaster occurs, it may affect only one tile if its being
> accessed at the time - imagine the effect of file system corruption on
> one large archive ... you will most likely lose the lot.
>
> Then there is the extra overhead needed - Ive gotta ask why? - if you
> can justify the extra cpu needed for this, why not do vector maps?
>
> BillK
>
> On Thu, 2009-07-02 at 00:42 -0700, mqy wrote:
>> x and y are tile no in tile coordinate system within range of [0.. 2^zoom).
>> just do it if you have time, since proof of concept is necessary :) keep in
>> mind clear APIs.
>> it's likely that, the final version to be integrated into omgps is rewritten
>> in C.
>>
>> Laszlo KREKACS wrote:
>>> If I understand right the OSM tiles, they have the following directory
>>> ...


-- 
View this message in context: 
http://n2.nabble.com/New-archive-file-format-%28was%3A--omgps--collect-feature-requests%29-tp3191899p3205707.html
Sent from the Openmoko Community mailing list archive at Nabble.com.

___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community


Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Alexander Shulgin
On Thu, Jul 2, 2009 at 00:20, Laszlo KREKACS <laszlo.krekacs.l...@gmail.com> wrote:
>> I dont want to compress at all. The 118MB for me is perfect. I only
>> want to pack the directory into a file. But not compressing.
>> Im thinking about tar or ar.
>
> Tar completely fail at random access, simply it lacks the
> table of content, so accessing the last file in the archive
> requires reading the whole content before it.

I fail to see how this is true for normal tar files (vs. data read
from a pipe).  Can you elaborate please?

> Zip support accessing each files in the archive, although
> it compress the file by default.

Pardon my ignorance, but wouldn't zip -0 do the trick for your purpose? :)

--
Regards,
Alex



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Laszlo KREKACS
On Thu, Jul 2, 2009 at 7:22 AM, mqy <meng.qing...@gmail.com> wrote:

> XML is used as a database, elements can be easily added, modified, removed.
> xar tends to be overkill for map tile usage -- we don't need iterating,
> delete, and that much meta information. With my suggested design, we can
> even add newly downloaded tiles:
> insert record into meta database, and append tile content into heap file.


If I understand the OSM tiles correctly, they have the following directory
structure:
XX/YYY/ZZZ.png
10/558/357.png

All the information (position, zoom level) can be obtained from this path,
am I right?
XX: zoom level
YYY: position x? (or I don't know what to call it;)
ZZZ: position y?

So I think we should only pack the files into the KISS archive file.
For a more in-depth explanation, see the end of this mail.

So something like this:
XX/YY1.kiss
XX/YY2.kiss
XX/YYY.kiss

I'm willing to implement a simple kiss/unkiss program (just like tar/untar)
for easy archiving.
I will use Python with standard modules only.

Best regards,
 Laszlo

ps:
Some statistical data:

Number of all tiles
# cd ~/Maps/OSM; find . -name '*.png' | wc -l
63818

Subdirs in zoom level dirs (YYY), and total number of files.
for i in *; do echo -n "$i "; cd "$i"; ls -1 | wc -l; cd ..; done
for f in *; do cd "$f"; for i in *; do cd "$i"; for k in *; do echo "$i/$k" >> ~/Maps/OSM/"$f".txt; done; cd ..; done; cd ..; done
cd ~/Maps/OSM
for i in *.txt; do echo -n "$i "; cat "$i" | wc -l; done

 2:   4 dirs,    16 files
 3:   8 dirs,    64 files
 4:  11 dirs,    77 files
 5:  17 dirs,    83 files
 6:  22 dirs,   265 files
 7:  22 dirs,   217 files
 8:  17 dirs,    75 files
 9:  26 dirs,   152 files
10:  39 dirs,   426 files
11:  71 dirs,  1484 files
12: 100 dirs,  1046 files
13:  78 dirs,  2902 files
14: 193 dirs, 23400 files
15:  86 dirs,  1941 files
16: 119 dirs,  4033 files
17: 277 dirs, 27637 files


Count the files in the subdirs (ZZZ):
for i in *; do cd "$i"; for j in *; do cd "$j"; echo -n "$i # $j @ "; ls -1 | wc -l; cd ..; done; cd ..; done

The number of files is in general 20-30, and the maximum was 180.



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Patryk Benderz
> You can read it here, I also included it (at the end of mail)
> for reference:
> http://pastebin.com/m608acaeb
From your reference:
> ## General properties
> - blocksize: 512 bytes
> - only store filename (and directory if any) and content
It might be convenient in the future to store file properties such as the
time of modification. This way you could implement automatic updates of tiles
that have been modified since the last update or archive creation.

-- 
Kind Regards

Patryk Benderz
IT Specialist
Linux Registered User #377521
+48 22 538 6292

ERSTE Securities Polska S.A.
ul. Królewska 16
Warszawa 00-103
KRS 065121
NIP 526-10-27-638
REGON 011136053
Kapitał akcyjny: 15.500.000 złotych (w pełni opłacony)



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Laszlo KREKACS
On Thu, Jul 2, 2009 at 8:42 AM, Alexander Shulgin <alex.shul...@gmail.com> wrote:
> I fail to see how is this true for normal tar files (vs. data read
> from pipe).  Can you elaborate please?

Yepp, of course;)

A tar archive does not contain the byte positions of the files inside it.
That means accessing a file inside the archive requires reading the whole
content before it and determining where each file ends (you test whether
you are at the desired file by reading its header).

It simply lacks a TOC (table of contents).

So accessing the last file in the archive requires reading the whole archive.
You can read about it here:
http://en.wikipedia.org/wiki/Tar_(file_format)#Format_details

Simplification of a tar archive:
[1. file header][1. file][2. file header][2. file][3. file header][3. file]

So how do you read the third file from the archive? You read the archive up
to the [3. file header], your test succeeds (is it the right file?),
and you read the file itself. You see? You have read the whole archive just
to access the last item inside.

>> Zip support accessing each files in the archive, although
>> it compress the file by default.
>
> Pardon my ignorance, but wouldn't zip -0 do the trick for your purpose? :)

It will do, more or less; however, there are three main problems with it:

1. You can only obtain the whole file from the archive, so you can't
   read just a part of a file. If you packed, let's say, a 700MB file into
   a zip, you would run out of memory on the Neo.
   At least this is the case with the standard Python zipfile module.

2. There is no random access feature, at
   least not in the standard Python modules.

3. Significant processor time is wasted when accessing a file
   (many computations are required). By the way, this needs to be
   benchmarked on the Neo to see how much worse it is.
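For comparison, here is a minimal sketch of reading part of one member from an uncompressed zip. It assumes a newer Python than the 2.5 documentation referenced in this thread: ZipFile.open() (added in Python 2.6) returns a file-like object, so a member can be read in pieces. The tile names, the payloads, and the helpers read_prefix and is_stored are made up for illustration, and this says nothing about how fast it would be on the Neo.

```python
import io
import zipfile

# Build a small uncompressed ("stored") archive in memory; names are made up.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_STORED) as zf:
    zf.writestr("10/558/357.png", b"A" * 3000)
    zf.writestr("10/558/358.png", b"B" * 5000)


def read_prefix(archive, member, nbytes):
    """Read only the first nbytes of one member instead of the whole member."""
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as fh:  # file-like object for this member's data
            return fh.read(nbytes)


def is_stored(archive, member):
    """True if the member was written without compression (like zip -0)."""
    with zipfile.ZipFile(archive) as zf:
        info = zf.getinfo(member)
        return (info.compress_type == zipfile.ZIP_STORED
                and info.compress_size == info.file_size)
```

The member is located through the zip central directory, so the members stored before it do not have to be read.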

Best regards,
  Laszlo



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Laszlo KREKACS
On Thu, Jul 2, 2009 at 9:15 AM, Patryk Benderz <patryk.bend...@esp.pl> wrote:
>> - only store filename (and directory if any) and content
> It might be convenient for future to store file properties like time of
> modification. This way you could implement automatic update of tiles
> that have been modified since last update or archive creation.

This is a never-ending game. You store one property, and others want
another property stored. You finally end up with something overcomplicated,
like an XML structure.
And for accessing the files inside the archive, you don't need this info at all.

But this file format is flexible enough: just attach the meta-information
about the files as a file in the archive!
Then your problem is solved, and it is future-proof.

So the file structure would be something like this in your case:
[header]
[filenames]
[1. file = metadata file]
[2. file]
[3. file]
[4. file]


However, this file format is not final; I'm open to suggestions;)
I haven't decided whether the filenames section should go at the end or
right after the header. And where should the metadata be stored, if any?
(start vs. end of file)

So
[header]
[filenames]
[1. file]
[2. file]
[3. file]

vs.
[header]
[1. file]
[2. file]
[3. file]
[filenames]

The same with metadata:
[header]
[filenames]
[1. file = metadata file]
[2. file]
[3. file]

vs.
[header]
[1. file]
[2. file]
[3. file = metadata file]
[filenames]


Need a bit of thinking here.

Laszlo



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread mqy

x and y are tile numbers in the tile coordinate system, within the range
[0, 2^zoom).
Just do it if you have time, since a proof of concept is necessary :) Keep
clear APIs in mind.
The final version to be integrated into omgps will likely be rewritten
in C.
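A minimal sketch of what that range means for the XX/YYY/ZZZ.png layout discussed in this thread, assuming XX is the zoom level, YYY is x and ZZZ is y. The helper name parse_tile_path is hypothetical and not part of omgps.

```python
def parse_tile_path(path):
    """Split 'zoom/x/y.png' into integers and check the tile range.

    Assumes the layout XX/YYY/ZZZ.png = zoom/x/y.png (hypothetical helper).
    """
    zoom_s, x_s, rest = path.split("/")
    y_s, _ext = rest.rsplit(".", 1)
    zoom, x, y = int(zoom_s), int(x_s), int(y_s)
    limit = 2 ** zoom  # valid tile numbers are 0 .. 2^zoom - 1
    if not (0 <= x < limit and 0 <= y < limit):
        raise ValueError("tile %d/%d is outside [0, %d)" % (x, y, limit))
    return zoom, x, y
```

For the example path from this thread, parse_tile_path("10/558/357.png") gives zoom 10 with x and y both below 2^10 = 1024.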


Laszlo KREKACS wrote:
> If I understand right the OSM tiles, they have the following directory
> ...



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Laszlo KREKACS
On Thu, Jul 2, 2009 at 9:42 AM, mqy <meng.qing...@gmail.com> wrote:

> x and y are tile no in tile coordinate system within range of [0.. 2^zoom).
> just do it if you have time, since proof of concept is necessary :) keep in
> mind clear APIs.
> it's likely that, the final version to be integrated into omgps is rewritten
> in C.

Ok. I'll do it.

I will put the [filenames] section at the end of the file. That way appending
to the file is dead simple.
The header structure will stay the same, so bytes 10-20 always
hold the [filenames] position.

I was thinking more about the metadata stuff. If we agree on a filename,
like .metadata-kiss, and attach it as a simple file, it does not matter
where in the archive it is placed (but I think it should be placed at
the end):
[header]
[1. file]
[2. file]
[3. file = metadata file]
[filenames]


But metadata is really for future consideration (if people find this
archive format useful).

I will also make some test archive files along with the kiss and unkiss
programs, to ease implementation.

Here is the updated specification (I added two more FAQ entries,
max filesize and max filename length):
http://pastebin.com/f51927121

Laszlo



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Laszlo KREKACS
> Here is the updated specification (I added two more faq entries,
> max filesize and max filename length):
> http://pastebin.com/f51927121


I made a small mistake (header structure), here we go:
http://pastebin.com/f5feafd7a

Laszlo



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread William Kenworthy
I hope not - I have over 2 million tiles stored on SD card - if file
corruption or disaster occurs, it may affect only one tile if it's being
accessed at the time - imagine the effect of file system corruption on
one large archive ... you will most likely lose the lot.

Then there is the extra overhead needed - I've gotta ask why? - if you
can justify the extra CPU needed for this, why not do vector maps?

BillK


On Thu, 2009-07-02 at 00:42 -0700, mqy wrote:
> x and y are tile no in tile coordinate system within range of [0.. 2^zoom).
> just do it if you have time, since proof of concept is necessary :) keep in
> mind clear APIs.
> it's likely that, the final version to be integrated into omgps is rewritten
> in C.
>
> Laszlo KREKACS wrote:
>> If I understand right the OSM tiles, they have the following directory
>> ...

-- 
William Kenworthy bi...@iinet.net.au
Home in Perth!




Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Laszlo KREKACS
On Thu, Jul 2, 2009 at 10:08 AM, William Kenworthy <bi...@iinet.net.au> wrote:
> I hope not - I have over 2 million tiles stored on SD card - if file
> corruption or disaster occurs, it may affect only one tile if its being
> accessed at the time

My experience differs completely from yours.

> - imagine the effect of file system corruption on
> one large archive ... you will most likely lose the lot.

I would even prefer losing my map files (backups?) to crashing
the whole filesystem.

However, this is not the case. I don't intend to pack everything
into a single file. Instead, I would have files of about 1MB (or pack
subdirs only). So if you lose something, you lose 1MB.
But this file format should be safe enough: if the header is untouched,
you can recover files from the archive (and there are the checksum
options too).

> Then there is the extra overhead needed - Ive gotta ask why? - if you
> can justify the extra cpu needed for this, why not do vector maps?

A serious benchmark is needed here to see whether the extra overhead is
real or not.

Contrary to your opinion, I expect speed improvements. ;)

Laszlo



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Dr. H. Nikolaus Schaller
I do not completely understand why there is a need for (once again) a
new file format.
As far as I understand the proposal, it is just a file system running
in an image file, like mounting an ISO or any other file system that
resides not on a raw disk but within a file.

So what problem does it solve better than just using the existing file
system hierarchy directly (/tiles/z/y/x.png), if it does not compress,
has no directories, is not faster and, as William pointed out, is not
more reliable?
I see only one benefit: you can copy the whole archive as a single
object instead of copying a file tree.

New file formats usually create more problems than they solve...



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Michal Brzozowski
2009/7/2 Dr. H. Nikolaus Schaller <h...@computer.org>

> I do not completely understand the reasons why there is a need for
> (once again) a new file format.
> As far as I understand the proposal, it is just a file system running
> in an image file. Like mounting an ISO or any other file system
> residing not on a raw disk but within a file.


Good point. You can divide the archive into 100kb blocks and use mount -o
loop on them. When you run out of space, just create a new block.


Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Alexander Shulgin
On Thu, Jul 2, 2009 at 10:15, Laszlo KREKACS <laszlo.krekacs.l...@gmail.com> wrote:
> On Thu, Jul 2, 2009 at 8:42 AM, Alexander Shulgin <alex.shul...@gmail.com> wrote:
>> I fail to see how is this true for normal tar files (vs. data read
>> from pipe).  Can you elaborate please?
>
> Yepp, of course;)

[snip]

> Simplification of tar archive:
> [1. file header][1. file][2.file header][2. file][3. file header][3. file]
>
> So how you read the third file from the archive? You read the file until the
> [3. file header], your test is successfull (is it the right file?),
> and you read the file itself. You see? You have read the whole file, just
> accessing the last item inside.

Yes, but is lseek(2) banned on the Neo?  This is what I was talking about
when I mentioned normal files (i.e. not pipes). :)
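The seek-based approach can be sketched in Python as follows. This is only an illustration under stated assumptions: plain ustar headers with short names (no long-name or pax extensions), a seekable file object, and a hypothetical helper find_member. One header per preceding member is still read, but the file contents are skipped with seek() rather than read.

```python
import io
import tarfile

BLOCK = 512  # tar headers and data padding use 512-byte blocks


def find_member(f, wanted):
    """Return (data_offset, size) of `wanted`, reading headers only."""
    offset = 0
    while True:
        f.seek(offset)
        header = f.read(BLOCK)
        if len(header) < BLOCK or header == b"\0" * BLOCK:
            return None  # end of archive reached, member not found
        name = header[0:100].rstrip(b"\0").decode("utf-8")
        # The size field is an octal number stored as ASCII text.
        size = int(header[124:136].rstrip(b"\0 ").decode("ascii") or "0", 8)
        if name == wanted:
            return offset + BLOCK, size
        # Jump over the data, rounded up to a whole number of blocks.
        offset += BLOCK + ((size + BLOCK - 1) // BLOCK) * BLOCK


# Demo archive built in memory; the names and sizes are made up.
demo = io.BytesIO()
with tarfile.open(fileobj=demo, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for name, payload in [("1.png", b"a" * 700),
                          ("2.png", b"b" * 1024),
                          ("3.png", b"c" * 10)]:
        info = tarfile.TarInfo(name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
```

The cost still grows with the number of members stored before the wanted one, which is the part a table of contents would avoid.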

>> Pardon my ignorance, but wouldn't zip -0 do the trick for your purpose? :)
>
> It will do more or less, however there are three main problems with it:
>
> 1. you can only obtain the whole file from the archive. So you cant
>    read a part of the file. So if you packed lets say a 700MB file to zip,
>    you run out of memory on neo.
>    At least this is the case on standard python zipfile module.
>
> 2. There is no random access feature, at
>    least not in standard python modules.
> 3. There are significant processor time wasted when accessing to a file
>    (many computation required). Btw, it needs to benchmark on the neo, how
>    worse is it.

OK, I see now.  Thanks for explanation.

--
Cheers,
Alex



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Jose Luis Perez Diez
On Wednesday, 1 July 2009 23:20:40, Laszlo KREKACS wrote:
> ## General properties
> - blocksize: 512 bytes
> - only store filename (and directory if any) and content
> - first file contains the filenames
> - header: start block, end block, position of last block
>
> ## Overall file structure
> [header][filenames][1. file][2. file][3. file]
>
> ## [header]
> [SB][EB][POS] [SB][EB][POS] [SB][EB][POS] etc..

My first reaction to this was: Why do you need this?

My points are:

1- With this format the resulting archive is nearly read-only (every few
 inserts the whole file needs to be rewritten).
 One could use a loop-mounted filesystem and well-tested tools instead.

2- To make it useful with every app, I think we need to mount it with FUSE.

3- Not enough metadata.

I think it could be simpler this way:

Metadata Block [0..511]
  [0..3] Previous metadata block # (last block for the first metadata block)
  [4..7] Next metadata block # (first block for the last metadata block)
  [8..]  Metadata_items   # list of Metadata_item

Metadata_item
   [0..1] Metadata_size # bytes
   [2] Kind # of metadata (Name, Block, Size, Date, CRC, ...)
   [3..6] File id
   [7..Metadata_size-1] Value

Block Value
 [0..3] Start Block
 [4..7] End Block

The example in the Q&A could have the following metadata:

  00 00 00 00 # Previous
  00 00 00 00 # Next
  00 1F 01 00 00 00 01 first filename.extension # 31 bytes, Name, id 1
  00 11 01 00 00 00 02 second try # 17 bytes, Name, id 2
  00 1D 01 00 00 00 03 I want a sexy name.txt # 29 bytes, Name, id 3
  00 0F 02 00 00 00 01 00 00 00 01 00 00 00 02 # id 1 blocks 1-2
  00 0F 02 00 00 00 02 00 00 00 03 00 00 00 04 # id 2 blocks 3-4
  00 0F 02 00 00 00 03 00 00 00 05 00 00 00 08 # id 3 blocks 5-8
  00 0B 03 00 00 00 01 00 00 03 00 # id 1 768 bytes
  00 0B 03 00 00 00 02 00 00 04 00 # id 2 1024 bytes
  00 0B 03 00 00 00 03 00 00 07 FF # id 3 2047 bytes
  00 00 # end of metadata

And a total file size of 9 blocks or 4608 bytes, but with the same disk usage
of 8 kB.
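The byte counts in the example can be reproduced with a short Python sketch. It assumes what the example bytes imply: each item starts with a 7-byte header (2-byte size, 1-byte kind, 4-byte file id, all big-endian), the size field counts the header too, and the kind codes are 1 = Name, 2 = Block, 3 = Size. The function names are hypothetical.

```python
import struct

KIND_NAME, KIND_BLOCK, KIND_SIZE = 1, 2, 3  # kind codes as used in the example
BLOCKSIZE = 512


def pack_item(kind, file_id, value):
    """Pack one metadata item: 2-byte size, 1-byte kind, 4-byte file id, value."""
    size = 7 + len(value)  # the size field counts the 7 header bytes as well
    return struct.pack(">HBI", size, kind, file_id) + value


def name_item(file_id, name):
    return pack_item(KIND_NAME, file_id, name.encode("utf-8"))


def block_item(file_id, start_block, end_block):
    return pack_item(KIND_BLOCK, file_id, struct.pack(">II", start_block, end_block))


def size_item(file_id, nbytes):
    return pack_item(KIND_SIZE, file_id, struct.pack(">I", nbytes))


def blocks_needed(nbytes):
    """Number of 512-byte blocks needed to hold nbytes (rounded up)."""
    return (nbytes + BLOCKSIZE - 1) // BLOCKSIZE
```

With file sizes of 768, 1024 and 2047 bytes, blocks_needed gives 2, 2 and 4 data blocks; adding one metadata block yields the 9 blocks (4608 bytes) quoted above.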



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-02 Thread Laszlo KREKACS
On Thu, Jul 2, 2009 at 10:24 AM, Laszlo KREKACS <laszlo.krekacs.l...@gmail.com> wrote:
> Need a serious benchmark here, if the extra overhead is true or not.

OK, I have written the Python implementation of the archive maker.
I still need to finish it (i.e. write the unpacking part).

I ran a few benchmarks...

I archived all the OSM map tiles on my laptop (I repeated it 10 times):
l...@buldergep:~/Maps/OSM$ echo -e "\noutput.kiss";
time python ../../Asztal/down/openmoko/paroli/data/kiss/kiss.py >> ../report.txt;
mv output.kiss ..;
echo -e "\noutput.tar";
time tar -cf ../output.tar .;
echo -e "\noutput.zip";
time zip -0 -r output * >> ../report.txt;
mv output.zip ..;
echo -e "\noutput_comp.zip";
time zip -r output_comp * >> ../report.txt;
mv output_comp.zip ..;
rm ../output*; rm ../report.txt

output.kiss

real    0m4.447s
user    0m2.748s
sys     0m1.520s

output.tar

real    0m4.039s
user    0m0.236s
sys     0m1.188s

output.zip

real    0m5.556s
user    0m1.276s
sys     0m2.632s

output_comp.zip

real    0m12.438s
user    0m8.437s
sys     0m2.620s


So the speed is about the same as in the .tar case, and it beats zip.
File sizes:
-rw-r--r-- 1 lol lol 109M 2009-07-02 19:11 output_comp.zip
-rw-r--r-- 1 lol lol 125M 2009-07-02 19:11 output.kiss
-rw-r--r-- 1 lol lol 156M 2009-07-02 19:11 output.tar
-rw-r--r-- 1 lol lol 113M 2009-07-02 19:11 output.zip

-rw-r--r--  1 lol lol  93M 2009-07-02 19:11 output.kiss.bz2
-rw-r--r--  1 lol lol  94M 2009-07-02 19:11 output.tar.bz2

Total size of individual files:
l...@buldergep:~/Maps/OSM$ du -hs .
290M    .

Pretty strange: the archive takes about half of that size.
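One plausible explanation for the gap, offered as an assumption rather than something measured in this thread: du reports allocated disk space, and a filesystem stores each file in whole blocks (often 4 KiB), so tens of thousands of small tiles leave a lot of unused slack. A minimal Python sketch for comparing the two figures, assuming a Unix-like system where os.lstat exposes st_blocks in 512-byte units; tree_sizes is a hypothetical helper name.

```python
import os


def tree_sizes(root):
    """Return (apparent_bytes, allocated_bytes) summed over files under root."""
    apparent = allocated = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))  # symlinks are not followed
            apparent += st.st_size           # bytes of content
            allocated += st.st_blocks * 512  # st_blocks counts 512-byte units (Unix)
    return apparent, allocated
```

Calling tree_sizes on the tile directory and comparing the two numbers would show how much of the 290M is slack rather than tile data; directory entries themselves are not counted here.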

I think this file format is worth the effort.

Best regards,
 Laszlo



New archive file format (was: [omgps] collect feature requests)

2009-07-01 Thread Laszlo KREKACS
> I dont want to compress at all. The 118MB for me is perfect. I only
> want to pack the directory into a file. But not compressing.
> Im thinking about tar or ar.

Hi!

I have studied all the available archive and compression options,
most notably the tar [1][2][4][6] and zip [3] file formats.
They are the most common archive types. I also read about the ar (dpkg
and ipkg use it) and cpio formats. So I did my homework and
did some research.

Our requirements:
- no compression (no wasted CPU time)
- random access (no slow waiting time and no memory issues)
- readily available module/library for ease of integration
  (best: no additional package has to be installed on the phone)

Tar completely fails at random access: it simply lacks a
table of contents, so accessing the last file in the archive
requires reading the whole content before it.

Zip supports accessing each file in the archive, although
it compresses files by default.

There are dar[5] and xar[7], which meet our random access
criteria. However, dar needs to be ported to the device, and
xar is still in development (that means limited Python support,
for example).

So I wrote down the most dumb archive file format ever;)
When I wrote the specification, I had only one goal:
make it so simple that everybody can implement it,
so there is no need to wait for a ready-made library.

It is called the KISS file format (keep it simple and stupid);
the preferred extension would be filename.kiss

You can read it here; I also included it (at the end of this mail)
for reference:
http://pastebin.com/m608acaeb

I think it is suitable for our map tile usage.

What do you think?

Best regards,
 Laszlo

[1]: http://en.wikipedia.org/wiki/Tar_(file_format)
[2]: http://www.python.org/doc/2.5.2/lib/module-tarfile.html
[3]: http://www.python.org/doc/2.5.2/lib/module-zipfile.html
[4]: http://en.wikipedia.org/wiki/Comparison_of_file_archivers
[5]: http://en.wikipedia.org/wiki/DAR_(Disk_Archiver)
[6]: http://en.wikipedia.org/wiki/Archive_formats
[7]: http://code.google.com/p/xar/

KISS archive fileformat specification:

# KISS archive format (Keep It Simple and Stupid)

## General properties
- blocksize: 512 bytes
- only store filename (and directory if any) and content
- first file contains the filenames
- header: start block, end block, position of last block

## Overall file structure
[header][filenames][1. file][2. file][3. file]

## [header]
[SB][EB][POS] [SB][EB][POS] [SB][EB][POS] etc..
[ 4][ 4][  2] [ 4][ 4][  2] [ 4][ 4][  2] etc..
[   header  ] [ filenames ] [1. file] etc..

SB (start block): 4 byte
EB (end block): 4 byte
POS (position of last block): 2 byte

All numbers are stored big-endian. That means most significant byte first.
Example:
613 dec = 265 hex = \00 \00 \02 \65 (4 bytes)
130411 dec = 1FD6B hex = \00 \01 \FD \6B (4 bytes)
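The two examples can be checked with Python's struct module, where the ">" prefix selects big-endian byte order. The helper names be32 and from_be32 are made up for this sketch.

```python
import struct


def be32(n):
    """Encode n as a 4-byte big-endian unsigned integer."""
    return struct.pack(">I", n)


def from_be32(data):
    """Decode a 4-byte big-endian unsigned integer."""
    return struct.unpack(">I", data)[0]
```

Here be32(613) yields the bytes 00 00 02 65 and be32(130411) yields 00 01 FD 6B, matching the examples.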

Note:
The remaining part of the header block MUST be filled with zero bytes.
There will always be a remaining part in the block, simply because each
entry takes 10 bytes (512/10 = 51 entries, with 2 bytes left over).
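A minimal sketch of packing such a header block in Python, assuming a header that fits into a single 512-byte block; how longer headers are handled is not covered here. The entry values in any call are illustrative, and the function names are hypothetical.

```python
import struct

BLOCK = 512
ENTRY = struct.Struct(">IIH")       # SB, EB, POS: 4 + 4 + 2 = 10 bytes, big-endian
MAX_ENTRIES = BLOCK // ENTRY.size   # 51 entries fit, leaving 2 bytes over


def pack_header_block(entries):
    """Pack (SB, EB, POS) tuples and zero-fill the rest of the block."""
    if len(entries) > MAX_ENTRIES:
        raise ValueError("too many entries for one header block")
    raw = b"".join(ENTRY.pack(sb, eb, pos) for sb, eb, pos in entries)
    return raw + b"\0" * (BLOCK - len(raw))


def unpack_header_block(block, count):
    """Read back the first `count` (SB, EB, POS) tuples from a header block."""
    return [ENTRY.unpack_from(block, i * ENTRY.size) for i in range(count)]
```

The zero padding is what the note above requires; a reader needs to know the number of entries from elsewhere, which this sketch simply takes as a parameter.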

## [filenames]
UTF-8 text for each filename, delimited with '\n' byte.
The directory structure is preserved too.
[name of 1. file]['\n'][name of 2. file]['\n'][name of 3. file] etc..

Some examples:
this is a file.txt
this2.tar.gz
this3.html
images/loller.html
weird_dir/this\/files contains\/several\\ slashes.txt

Special characters:
'\n': You can't have a '\n' character in a filename. It is reserved.
  (it is not supported in most filesystems anyway)
'/': directory delimiter. To save directory structure.
'\/': if the filename itself contains a / character
'\\': if the filename itself contains a \ character
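The escaping rules can be sketched in Python as a pair of helpers. This is one possible reading of the rules above, with hypothetical function names: a backslash escapes the character that follows it, and an unescaped / separates directory components.

```python
def escape_component(name):
    """Escape one path component for the [filenames] section."""
    if "\n" in name:
        raise ValueError("newline is the filename delimiter and cannot be stored")
    # Escape backslashes first so the backslashes added for '/' stay intact.
    return name.replace("\\", "\\\\").replace("/", "\\/")


def split_path(stored):
    """Split a stored path into unescaped components."""
    parts, current, i = [], [], 0
    while i < len(stored):
        ch = stored[i]
        if ch == "\\" and i + 1 < len(stored):
            current.append(stored[i + 1])  # keep the escaped character literally
            i += 2
        elif ch == "/":
            parts.append("".join(current))  # unescaped '/' ends a component
            current = []
            i += 1
        else:
            current.append(ch)
            i += 1
    parts.append("".join(current))
    return parts
```

Applied to the last example above, split_path returns the directory weird_dir and a single filename that itself contains two slashes and one backslash.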


## [X. file]
The file content as is.


## FAQ:
Q: Why another archive format?
A: Because it is the most dumb format ever;)

Q: Why not tar, ar, zip, [name archive type here]?
A: Short answer: widely used archive formats are not suited for random access
   with no compression.
   Long answer:
   tar: there is no index; reading the last file of the archive
        requires reading the whole file before it.
   zip: individual files are compressed, which costs processor time.
   xar: it would fit the requirements, but it is not widely
        supported, and not in every language.

Q: I use language X. Is KISS supported there?
A: The file format is intentionally so simple that every programmer
   could implement it in no time.

Q: Is compression supported?
A: No. But you can compress the whole file,
   just like in the tar case: filename.kiss.bz2. Use it for file sharing.

Q: Are advanced features (rights, symlinks, hardlinks, user/group/other)
   preserved?
A: No. It was not the goal of this archive. You can implement it, though: just
   write that information in the first file. It is not recommended.

Q: If the original file size is not a multiple of 512 bytes, how will it look
   in the archive, and how many bytes will it take?
A: Let's have an example. We have three files:
   768 bytes, 1024 bytes, 2047 bytes
 

Re: New archive file format (was: [omgps] collect feature requests)

2009-07-01 Thread jeremy jozwik
wow

On Wed, Jul 1, 2009 at 2:20 PM, Laszlo
KREKACSlaszlo.krekacs.l...@gmail.com wrote:
 I dont want to compress at all. The 118MB for me is perfect. I only
 want to pack the directory into a file. But not compressing.
 Im thinking about tar or ar.

 Hi!

 I have studied all the available archive and compression options.
 Most notably tar[1][2][4][6] and zip file format [3].
 They are the most common archive types. I read also ar (dpkg
 and ipkg uses it) and cpio format. So I did my homework, and
 made some researches.

 Our requirements:
 - no compression (no wasted cpu time)
 - random access (no slow waiting time and memory issue)
 - readily available module/library for easy of integrating
  (best: no additional package is required to install on the phone)

 Tar completely fail at random access, simply it lacks the
 table of content, so accessing the last file in the archive
 requires reading the whole content before it.

 Zip support accessing each files in the archive, although
 it compress the file by default.

 There are dar[5] and xar[7], which meets our random access
 criteria. However dar needs to be ported to the device, and
 xar is still in development (that means limited python support
 for example).

 So I wrote down the most dumb archive fileformat ever;)
 When I wrote the specification, I only had one goal:
 make it so simple, that everybody can implement it,
 so no need to wait for ready-made library.

 It is called KISS fileformat (keep it simple and stupid),
 the preferred extension would be filename.kiss

 You can read it here, I also included it (at the end of mail)
  for reference:
 http://pastebin.com/m608acaeb

 I think it is suitable for our map tile usage.

 What do you think?

 Best regards,
  Laszlo

 [1]: http://en.wikipedia.org/wiki/Tar_(file_format)
 [2]: http://www.python.org/doc/2.5.2/lib/module-tarfile.html
 [3]: http://www.python.org/doc/2.5.2/lib/module-zipfile.html
 [4]: http://en.wikipedia.org/wiki/Comparison_of_file_archivers
 [5]: http://en.wikipedia.org/wiki/DAR_(Disk_Archiver)
 [6]: http://en.wikipedia.org/wiki/Archive_formats
 [7]: http://code.google.com/p/xar/

 KISS archive fileformat specification:

 # KISS archive format (Keep It Simple and Stupid)

 ## General properties
 - blocksize: 512 bytes
 - only store filename (and directory if any) and content
 - first file contains the filenames
 - header: start block, end block, position of last block

 ## Overall file structure
 [header][filenames][1. file][2. file][3. file]

 ## [header]
 [SB][EB][POS] [SB][EB][POS] [SB][EB][POS] etc..
 [ 4][ 4][  2] [ 4][ 4][  2] [ 4][ 4][  2] etc..
 [   header  ] [ filenames ] [1. file] etc..

 SB (start block): 4 byte
 EB (end block): 4 byte
 POS (position of last block): 2 byte

 All numbers are stored big-endian. That means most significant bit first.
 Example:
 613 dec = 265 hex = \00 \00 \02 \65 (4 bytes)
 130411 dec = 1FD6B hex = \00 \01 \FD \6B (4 bytes)

 Note:
 The remaining part of the header block MUST be filled with zero bytes.
 You will always have remaining part in the block, simply each file
 takes 10 bytes. (512/10 = 51 and 2 bytes left)

 ## [filenames]
 UTF-8 text for each filename, delimited with '\n' byte.
 The directory structure is preserved too.
 [name of 1. file]['\n'][name of 2. file]['\n'][name of 3. file] etc..

 Some examples:
 this is a file.txt
 this2.tar.gz
 this3.html
 images/loller.html
 weird_dir/this\/files contains\/several\\ slashes.txt

 Special characters:
 '\n': You can't have a '\n' character in a filename. It is reserved.
  (some filesystems, such as FAT, do not allow it anyway)
 '/': directory delimiter, used to preserve the directory structure.
 '\/': used if the filename itself contains a / character
 '\\': used if the filename itself contains a \ character
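
The escaping rules above can be sketched in Python as follows. This is one possible reading of the rules, not a reference implementation: all function names are hypothetical, and a path is represented as a list of components so that a '/' inside a name can be told apart from a directory delimiter.

```python
def escape_name(component):
    """Escape one path component: backslashes first, then slashes."""
    return component.replace("\\", "\\\\").replace("/", "\\/")


def split_path(entry):
    """Split one stored entry on unescaped '/' and undo the escaping."""
    parts, current, chars = [], [], iter(entry)
    for ch in chars:
        if ch == "\\":
            # Escaped character: take the next one literally.
            current.append(next(chars, "\\"))
        elif ch == "/":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def encode_filenames(paths):
    """Build the [filenames] section from lists of path components."""
    lines = []
    for components in paths:
        if any("\n" in c for c in components):
            raise ValueError("'\\n' is reserved and cannot appear in a name")
        lines.append("/".join(escape_name(c) for c in components))
    return "\n".join(lines).encode("utf-8")


def decode_filenames(blob):
    """Inverse of encode_filenames."""
    return [split_path(line) for line in blob.decode("utf-8").split("\n")]
```

With this reading, the last example in the list above decodes to the directory `weird_dir` and a file whose name contains two slashes and one backslash.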


 ## [X. file]
 The file content as is.
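
The spec does not spell out how SB, EB and POS map to byte offsets, so the following Python sketch rests on loudly stated assumptions: blocks are numbered from 0, EB is inclusive, and POS is the number of bytes used in the last block. The function names are hypothetical.

```python
import io

BLOCK_SIZE = 512


def byte_range(sb, eb, pos):
    """Return (offset, length) in bytes for one header entry.

    Assumes 0-based block numbers, EB inclusive, and POS = number of
    bytes used in block EB. None of this is confirmed by the spec.
    """
    offset = sb * BLOCK_SIZE
    length = (eb - sb) * BLOCK_SIZE + pos
    return offset, length


def read_member(archive, sb, eb, pos):
    """Seek directly to one member and read it (random access)."""
    offset, length = byte_range(sb, eb, pos)
    archive.seek(offset)
    return archive.read(length)


# Usage: block 0 stands in for the header, block 1 holds a 5-byte member.
archive = io.BytesIO(bytes(BLOCK_SIZE) + b"hello".ljust(BLOCK_SIZE, b"\x00"))
member = read_member(archive, 1, 1, 5)
```

Under these assumptions a reader never touches the other members, which is the random-access property the format is after.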


 ## FAQ:
 Q: Why another archive format?
 A: Because it is the dumbest format ever ;)

 Q: Why not tar, ar, zip, [name archive type here]?
 A: Short answer: widely used archive formats are not suited for random access
                  with no compression.
    Long answer: tar: there is no index, so reading the last file of the archive
                      requires reading everything before it.
                 zip: individual files are compressed, which costs processor time.
                 xar: it would fit the requirements, but it is not widely
                      supported, and not in every language.

 Q: I use language X. Is KISS supported there?
 A: The file format is intentionally so simple that any programmer
    could implement it in no time.

 Q: Is compression supported?
 A: No. But you can compress the whole file,
    just as with tar: filename.kiss.bz2. Use that for file sharing.

 Q: Are advanced features (permissions, symlinks, hardlinks, user/group/other)
    preserved?
 A: No. That was not the goal of this archive. You could implement it by
    writing that information into the first file, but it is not recommended.

 Q: If the original 

Re: New archive file format (was: [omgps] collect feature requests)

2009-07-01 Thread David Reyes Samblas Martinez
add another wow from here :o

2009/7/1 jeremy jozwik jerjoz.for...@gmail.com:
 wow

 On Wed, Jul 1, 2009 at 2:20 PM, Laszlo KREKACS
 laszlo.krekacs.l...@gmail.com wrote:
 I don't want to compress at all. 118 MB is perfect for me. I only
 want to pack the directory into a file, without compressing it.
 I'm thinking about tar or ar.


Re: New archive file format (was: [omgps] collect feature requests)

2009-07-01 Thread mqy

Thumbs up for your effort :)

A simpler design choice would be:

1. int get_tile_meta(int zoom, int x, int y, TileMeta *tm) -- fill in
   offset and size; return -1 if the tile is not found
   -- implemented by collecting per-map meta info into a sqlite database
2. int get_tile_bytes(char *buf, TileMeta *tm) -- read the tile content into
   buf, which must hold tm->size bytes
   -- implemented by collecting the tiles into one big file.

Where TileMeta is defined as:
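#include <stdint.h>

typedef uint32_t U4;    /* assumed: unsigned 4-byte integer */

typedef struct TileMeta
{
    int offset;         /* byte offset of the tile in the big file */
    int size;           /* tile size in bytes */
    U4 crc;             /* CRC of the tile content */
    char *name;         /* optional */
} TileMeta;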

This kind of data source is abstracted as a tile provider, in addition to
the default standard file system based one.
I'd like to see if xar works well too.
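
The two calls described above could look roughly like this as a Python proof of concept. It is a sketch under assumptions, not the omgps implementation: the table name `tile_meta`, its columns, and the helper `open_meta_db` are made up for illustration, and a missing tile is reported as `None` rather than -1. It relies only on the standard `sqlite3` and `zlib` modules.

```python
import io
import sqlite3
import zlib


def open_meta_db(path=":memory:"):
    """Open the meta database and create the (hypothetical) table."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS tile_meta ("
        " zoom INTEGER, x INTEGER, y INTEGER,"
        " offset INTEGER, size INTEGER, crc INTEGER,"
        " PRIMARY KEY (zoom, x, y))"
    )
    return db


def get_tile_meta(db, zoom, x, y):
    """Return (offset, size, crc), or None if the tile is not found."""
    return db.execute(
        "SELECT offset, size, crc FROM tile_meta"
        " WHERE zoom = ? AND x = ? AND y = ?",
        (zoom, x, y),
    ).fetchone()


def get_tile_bytes(heap, meta):
    """Read one tile from the big file and check it against its CRC."""
    offset, size, crc = meta
    heap.seek(offset)
    data = heap.read(size)
    if len(data) != size or zlib.crc32(data) != crc:
        raise IOError("tile is truncated or corrupt")
    return data


# Usage with an in-memory database and an in-memory "big file".
db = open_meta_db()
heap = io.BytesIO(b"xxxxTILE")
db.execute("INSERT INTO tile_meta VALUES (?, ?, ?, ?, ?, ?)",
           (3, 1, 2, 4, 4, zlib.crc32(b"TILE")))
tile = get_tile_bytes(heap, get_tile_meta(db, 3, 1, 2))
```

The CRC check means a damaged region of the big file affects only the tiles stored there, since every other tile still verifies on its own.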

regards, 
  mqy


Laszlo KREKACS wrote:
 
 Hi!
 
 I have studied all the available archive and compression options.
 ...
 Best regards,
  Laszlo
 ...
 





Re: New archive file format (was: [omgps] collect feature requests)

2009-07-01 Thread Laszlo KREKACS
Hi!

Thank you for the kind words.

 I'd like to see if xar works well too.

I have only one problem with xar: xml.

It complicates things unnecessarily.

Laszlo



Re: New archive file format (was: [omgps] collect feature requests)

2009-07-01 Thread mqy

XML is used as a database: elements can easily be added, modified, or removed.
xar tends to be overkill for map tile usage -- we don't need iteration,
deletion, or that much meta information. With my suggested design, we can
even add newly downloaded tiles:
insert a record into the meta database, and append the tile content to the heap file.
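
Adding a newly downloaded tile, as described above, could be sketched like this in Python. The function name `add_tile` and the `tile_meta` schema are assumptions made for illustration; only standard-library calls are used.

```python
import io
import sqlite3
import zlib


def add_tile(db, heap, zoom, x, y, data):
    """Append the tile bytes to the heap file, then record where they are."""
    offset = heap.seek(0, io.SEEK_END)  # the new tile starts at the old end
    heap.write(data)
    db.execute(
        "INSERT OR REPLACE INTO tile_meta VALUES (?, ?, ?, ?, ?, ?)",
        (zoom, x, y, offset, len(data), zlib.crc32(data)),
    )
    db.commit()
    return offset


# Usage with an in-memory database and heap file (hypothetical schema).
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE tile_meta (zoom INTEGER, x INTEGER, y INTEGER,"
    " offset INTEGER, size INTEGER, crc INTEGER,"
    " PRIMARY KEY (zoom, x, y))"
)
heap = io.BytesIO()
first = add_tile(db, heap, 3, 1, 2, b"AAAA")
second = add_tile(db, heap, 3, 1, 3, b"BBBBBB")
```

Writing the bytes before inserting the record is a deliberate choice in this sketch: an interruption between the two steps leaves unreferenced bytes at the end of the heap file, not a record that points at missing data.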


Laszlo KREKACS wrote:
 
 ...
 I have only one problem with xar: xml. 
 It complicates things unnecessarily.
 ...
 Laszlo
 


