On 1 August 2013 17:34, Martyn Russell <mar...@lanedo.com> wrote:
> On 01/08/13 15:23, Jonatan Pålsson wrote:
>>
>> Hi list,
>
> Hello Jonatan,

Hi Martyn!

>> OK. So I think I have pitched the problem now. What I have done is to
>> combine media encoders (LAME and ImageMagick) and metadata tagging
>> software (id3v2 and exiftool) with the random number generator of
>> Python. By using the random numbers generated by Python as input to
>> these tools, random, reproducible (by reusing the seed for the PRNG),
>> media files can be created.
>>
>> I'm using this to create large numbers of media files to test Tracker
>> extractor modules on, and it works pretty well. So far I can generate
>> MP3, PNG, JPG, TIF, and GIF.
>
> Just before you go on, what are you trying to test here? That we
> index/extract properly? Or test the data with queries to the database?

The main purpose of generating actual files, rather than putting the
data directly into the database, is to test the actual extractors.
Specifically, I am working on reducing the extraction time as much as
possible. I have previously written to the list asking for tips on how
to improve store insertion performance, and this is also something I am
experimenting with, via the extractors.

As you mention, I am also looking at the accuracy of the extractors,
since it is quite simple to compare, for instance, the ID3 tags of the
MP3 files with the extracted fields.

My general idea is that having actual files to index produces a testing
scenario as close as possible to the actual use cases of Tracker.

> At this point I wanted to ask if you had seen the data generators we have in
> the Tracker tree already? NOTE: I say "data" not "file" there.
>
>   utils/data-generators/cc/
>
> You can run
>
>   $ ./generate ./default.cfg
>
> It will create a bunch of ttl files which you can import as you wish using
> tracker-import. I think you can even use tracker-import *.ttl.
>
> Anyway this is fake data, not based on files - so it really depends on what
> you're testing. You can also tweak where the data draws its random crap from
> :)

This is news to me, thanks for pointing it out.
These scripts should be good for measuring the best-case insertion speed
into the database, since they exclude the time required for actual
extraction. I am currently experimenting with developing much simpler
ontologies, holding only the data we (Pelagicore) are interested in. I
will look into whether it is worth adapting this ttl generator to our
ontologies in order to establish a best-case insertion speed.

The source data you are using (source-data/) looks very interesting. I
might have a look at integrating that into my script.

> I think it would be quite useful to include your file generator into the
> tracker tree for people to make use of or at least reference to it from a
> README somewhere.

This sounds like a good plan. I will ping the list when I feel the
script is ready for inclusion (or for a mention in a README). I have
some more features and fixes to add soon.

--
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden
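
For illustration, the seeded-generation idea quoted at the top of this message could be sketched roughly as follows. This is not the script discussed in this thread. The function names (random_word, random_tags, id3v2_command), the choice of tag fields, and the way the seed and file index are combined are all assumptions made for the example. The sketch only builds an id3v2 command line and does not run it.

```python
import random
import string


def random_word(rng, min_len=3, max_len=10):
    """Return a random lowercase word drawn from the given PRNG."""
    length = rng.randint(min_len, max_len)
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def random_tags(seed, index):
    """Return reproducible, made-up ID3-style tags for file number `index`."""
    # A private Random instance per file: the same (seed, index) pair
    # always yields the same tags, independent of generation order.
    # The way seed and index are combined here is an arbitrary choice.
    rng = random.Random(seed * 1_000_003 + index)
    return {
        "artist": random_word(rng).capitalize(),
        "album": random_word(rng).capitalize(),
        "title": random_word(rng).capitalize(),
        "year": rng.randint(1950, 2013),
    }


def id3v2_command(tags, path):
    """Build (but do not run) an id3v2 invocation that writes the tags."""
    return [
        "id3v2",
        "--artist", tags["artist"],
        "--album", tags["album"],
        "--song", tags["title"],
        "--year", str(tags["year"]),
        path,
    ]


if __name__ == "__main__":
    # Hypothetical file names; reusing seed=42 reproduces the same tags.
    for i in range(3):
        tags = random_tags(seed=42, index=i)
        print(" ".join(id3v2_command(tags, f"track{i:03d}.mp3")))
```

Because the expected tags can be regenerated from the seed alone, an accuracy check can compare the extracted fields against random_tags(seed, index) without storing the expected values anywhere.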