Re: [wikireader] Project Gutenberg (again)
Sure. We'll add it to our todo list. Please keep us posted as to your progress. This is super exciting work you're doing!

Well, there is not all that much more to say. I have been fixing minor glitches over the last few days. I completely converted gutenberg-de yesterday and it is working fine in the simulator. I'm currently converting all of the German and English ebooks of Project Gutenberg (about 25000 ebooks, which will yield about 3.5GB of .dat files). This will probably take all day and longer on my dual-core laptop. When I return to Germany on Sunday (I study in the UK) I will finally order a wikireader to test this on real hardware.

Tom
___
Openmoko community mailing list
community@lists.openmoko.org
http://lists.openmoko.org/mailman/listinfo/community
Re: [wikireader] Project Gutenberg (again)
Sean, thanks for your quick reply.

1) Is there a deep reason why boldface fonts are not implemented? I figure they are not really relevant for wikis, but would be nice for some of the books. Unless there is something that complicates the matter I'm not seeing, I think I will add them (should be straightforward to mimic the behaviour of italic fonts?).

They are implemented. We just didn't include them to save space. (Font sets are super huge when you include all the Unicode characters!) If you look at the function handle_data within http://github.com/wikireader/wikireader/blob/master/host-tools/offline-renderer/ArticleRenderer.py you'll see what I mean.

Hm. I thought I had convinced myself that the real problem was that only two bits are used to encode the font id, and they are already used up (default, italic, title, subtitle, and supplements [large files with all characters, I suppose] for default, title, subtitle). So adding boldface fonts to the wiki-app *does* seem to involve some non-trivial work. (I guess the advantage of splitting the fonts like this is that the small subset can be kept in memory all the time? The size of the font files themselves is on the order of megabytes, so it shouldn't matter, should it?)

Sure we can do this. No problem! The font is getting more and more complex since we actually hand-make many of the characters now.

That would be really awesome.
Re: [wikireader] Project Gutenberg (again)
This has nothing to do with the data structures. We cache the fonts in the SDRAM to speed up the entire system. Without this, WikiReader is painfully slow (reading from the SD card tops out at around 125kb/s). Currently we use 32MB of SDRAM. This means we can hold a few font styles, but we need to move to smaller SDRAM in future production runs for cost reasons. So we have to be super careful with how we handle fonts. It's quite a complex problem for us, especially as we add more and more language support.

I see. That's the kind of deep problem I'd rather leave to you experts. I'll just wait and see if you cook something up. Till then I can live without boldface.

Regards, Tom
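A rough back-of-the-envelope sketch of why the font cache matters, using the SD card figure quoted above. Two assumptions here are not from the thread: "125kb/s" is read as 125 kilobytes per second, and the 1 MB font file is only an illustrative size in line with the "order of megabytes" estimate from the earlier message.

```python
# Assumption: "125kb/s" in the message means 125 kilobytes per second.
SD_READ_BYTES_PER_S = 125 * 1000


def seconds_to_read(size_bytes: int, rate: int = SD_READ_BYTES_PER_S) -> float:
    """Time needed to stream size_bytes from the SD card at the given rate."""
    return size_bytes / rate


# Hypothetical font file size, chosen only for illustration.
one_mb_font = 1_000_000
print(f"{seconds_to_read(one_mb_font):.1f} s to read one uncached font file")
```

Under these assumptions, every font file that does not fit in the SDRAM cache would cost several seconds of card reading whenever it is needed, which is consistent with the concern about shrinking the SDRAM.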
[wikireader] Project Gutenberg (again)
Dear all,

after taking a rather longish break, I'm back working on my Project Gutenberg integration code. This message consists of three parts: the first quickly describes what it is all about, the second contains a number of technical questions, and the third talks about bugs in wiki-app. All code is available at gitorious: git://gitorious.org/wikireader-ness/wikireader-ness2.git

What this is all about:

My idea is that, akin to Wikipedia, Project Gutenberg provides a large collection of free data that would be nice to carry in your pocket. So I have been working on extending the offline-renderer to also process ebooks in EPUB format. To quickly see what this is about, try

  make DESTDIR=image WORKDIR=work WIKI_FILE_PREFIX=wiki WIKI_LANGUAGE=en WIKI_DIR_SUFFIX=guten EBOOK_FILES=ebook-samples VERBOSE=yes cleandirs createdirs birc

There is some more functionality on which I can elaborate if anyone is interested, but this is basically it. You have to harvest the ebooks yourself, but I can provide scripts for Project Gutenberg, and also for Project Gutenberg-DE.

Technical Questions:

1) Is there a deep reason why boldface fonts are not implemented? I figure they are not really relevant for wikis, but would be nice for some of the books. Unless there is something that complicates the matter I'm not seeing, I think I will add them (should be straightforward to mimic the behaviour of italic fonts?).

2) Could you please add the characters U+2039 and U+203A ('SINGLE LEFT-POINTING ANGLE QUOTATION MARK' and its right-pointing counterpart) to the font? They are used quite often in some books, and the box just looks ugly. Again, I would do this myself, but there seem to be a number of intermediate stages in font generation that I don't really understand.

3) Is it possible that the English language image on dev.thewikireader.org is corrupted? When I try to extract it with 7z x enpedia.7z I get a cryptic Error: E_FAIL message. (I'm running the standard 7z of Debian testing, version 9.04 beta.)

Bugs in wiki-app:

I believe that in the course of writing and testing my extensions, I have fixed some minor bugs in the core wiki-app code. My changes are very small and isolated, so the maintainers of the main repository may wish to look at these files only.

Thanks, Tom
Re: [wikireader] Rudimentary support for several wikis
Thomas HOCEDEZ wrote:
Have you seen this on the git (http://wiki.github.com/wikireader/wikireader/structure-of-sd-card)

Nope. I take it most of my work was useless …
Re: [wikireader] Rudimentary support for several wikis
Actually that fits my needs very well. My support for several wikis was a rudimentary hack at best, to enable my real goal: using the wikireader as an ebook reader. *That* code was almost trivial to port, and can be found at git://gitorious.org/wikireader-ness/wikireader-ness2.git. I'll keep pushing there (mainly for backup), so if anyone is interested in having the entire Project Gutenberg library in their pocket …
Re: [wikireader] Rudimentary support for several wikis
Alright. The latest commit has a ChangeCollection script. Use it like this:

  ChangeCollection.py --from=none --to=1 --prefix=/path/to/image/pedia --dat-offset=${next free dat}

where ${next free dat} is the first unused number in the .dat namespace of the English wiki. This will take a long while (it has to decompress and recompress all articles!), but it is probably faster than re-rendering everything (on my laptop it takes about 40 seconds to patch 1000 articles).

Next, copy the pedia.idx, pedia.pfx, pedia.fnd, pedia.hsh and pedia?.dat files of the English wiki to your image, renaming the first four to pedia0.idx, pedia0.pfx, pedia0.fnd, pedia0.hsh (the pedia?.dat files can keep their names). If you now boot my kernel, you should be able to switch between both wikis, as described in my first post. Please tell me if everything works as expected.

Thomas HOCEDEZ wrote:
It would be awesome! I finished the French wiki last night; the upload is in progress. It will be available before tonight on some mirrors. I'll post URLs as soon as it is available. Thomas
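The copy-and-rename step described in the message above could be scripted roughly as follows. This is only a sketch: the function name copy_primary_collection and the directory arguments are hypothetical and not part of the repository; the file names and suffixes are the ones listed in the message.

```python
import shutil
from pathlib import Path

# Per-collection index files named in the message.
INDEX_SUFFIXES = (".idx", ".pfx", ".fnd", ".hsh")


def copy_primary_collection(src_dir: Path, image_dir: Path,
                            prefix: str = "pedia") -> list[Path]:
    """Copy the English wiki from src_dir into image_dir as collection 0.

    Hypothetical helper, not part of the wikireader tools.
    """
    image_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    # Index files gain a collection number: pedia.idx -> pedia0.idx, etc.
    for suffix in INDEX_SUFFIXES:
        dst = image_dir / f"{prefix}0{suffix}"
        shutil.copy2(src_dir / f"{prefix}{suffix}", dst)
        copied.append(dst)
    # The pedia?.dat files keep their names.
    for dat in sorted(src_dir.glob(f"{prefix}?.dat")):
        dst = image_dir / dat.name
        shutil.copy2(dat, dst)
        copied.append(dst)
    return copied
```

The sketch assumes the patched second collection already uses .dat numbers above those of the English wiki (the purpose of --dat-offset), so the copied .dat files do not overwrite anything in the image.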
Re: [wikireader] Rudimentary support for several wikis
In the light of the awfully long render times for complete wikis, I figure I should create a 'change collection number' script.

Thomas HOCEDEZ wrote:
It would be awesome! I finished the French wiki last night; the upload is in progress. It will be available before tonight on some mirrors. I'll post URLs as soon as it is available. Thomas
[wikireader] Rudimentary support for several wikis
I have now registered to the list, since unregistered posts didn't seem to come through and c...@thewikireader doesn't seem to respond. You might possibly receive this message more than once.

Original Message
Subject: [wikireader] Rudimentary support for several wikis
Date: Sun, 17 Jan 2010 00:56:53 +
From: Tom Bachmann tb...@cam.ac.uk
To: community@lists.openmoko.org

Hello,

first of all, please CC me since I'm not registered to the list.

Over the last few days I have been hacking together rudimentary support for displaying several collections of data (e.g. wikis of different languages) on the wikireader. This code is not yet ready to be incorporated into the main repository (I think), and furthermore I don't actually know if it complies with your ideas of simplicity.

HOWEVER, I would be very grateful to everyone who can test the code. I don't yet have a real wikireader (i.e. I have been developing this on the simulator; I will get one after sorting out my budget...) and I'm worried that there might be problems related to e.g. the scarcity of memory on the reader (how much RAM does it have?).

Here is what I did: basically, articles are now identified by their index and by their collection id (the highest four bits of the 32-bit identifier). The .pfx, .fnd, .hsh and .idx files are replicated per collection. The .dat files are just numbered consecutively (and identified in the usual way). So if you have e.g. two collections, say English and French Wikipedia, then your image layout may look like this:

  pedia0.idx pedia0.hsh pedia0.pfx pedia0.fnd
  pedia1.idx pedia1.hsh pedia1.pfx pedia1.fnd
  pedia0.dat pedia1.dat pedia2.dat pedia3.dat pedia4.dat

You cannot tell which articles are in which .dat files (in principle articles from several wikis could be mixed in one file), but in practice we might have pedia{0,1,2}.dat corresponding to collection 0 (English wiki) and pedia{3,4}.dat corresponding to collection 1 (French wiki).

The searching functionality etc. is implemented in the wiki-app; the user interface is rather non-existent. As a hack for testing I'm statically configuring the system to use two collections (identified 0 and 1) and I added an invisible button to the upper right corner of the search menu to switch between the collections (in the simulator you will see a message). There seem to be some bugs in that button but it's really for testing only.

In addition to implementing all that in the wiki-app, I modified the render, index and combine programs. All take a new --coll-number argument to identify the collection being worked on, and ArticleRenderer.py has a new --dat-number argument to specify the .dat file (--number only identifies the block for the .idx file).

The good news is, you can just re-use your primary collection (the one identified by 0). The bad news is, all extra collections have to be re-built. For a quick test, try

  make DESTDIR=image WORKDIR=work \
    XML_FILES=xml-file-samples/japanese_architects.xml \
    COLL_NUMBER=1 DAT_NUMBER=${first unused index in .dat} iprch
  make DESTDIR=image WORKDIR=work install

and then copy everything to your wikireader (or try sim4). Again, it would be *greatly* appreciated if someone could build a large second collection and try two real-life datasets on the wikireader.

All the code is at gitorious (just because I am already registered there but not yet on github). To get it, do

  git clone git://gitorious.org/wikireader-ness/wikireader-ness.git

Let me know what you think!

Thanks, Tom
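The identifier scheme described in the message above (collection id in the highest four bits of the 32-bit identifier) can be illustrated with a minimal sketch. The function and constant names are hypothetical, and using the remaining 28 bits as a plain article index is an assumption; the actual wiki-app code may lay this out differently.

```python
COLLECTION_BITS = 4                     # "the highest four bits"
INDEX_BITS = 32 - COLLECTION_BITS       # assumption: the other 28 bits hold the index
INDEX_MASK = (1 << INDEX_BITS) - 1


def pack_article_id(collection: int, index: int) -> int:
    """Combine a collection id and an article index into one 32-bit identifier."""
    if not 0 <= collection < (1 << COLLECTION_BITS):
        raise ValueError("collection id must fit in four bits")
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("article index must fit in 28 bits")
    return (collection << INDEX_BITS) | index


def unpack_article_id(article_id: int) -> tuple[int, int]:
    """Split a 32-bit identifier back into (collection, index)."""
    return article_id >> INDEX_BITS, article_id & INDEX_MASK
```

With this layout there is room for up to 16 collections of up to 2^28 (about 268 million) articles each, and collection 0 identifiers are numerically identical to plain article indices, which fits the remark that the primary collection can be re-used unchanged.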