Re: [Tracker] Script for media library generation

Jonatan Pålsson Thu, 01 Aug 2013 23:38:23 -0700

On 1 August 2013 17:34, Martyn Russell <mar...@lanedo.com> wrote:
> On 01/08/13 15:23, Jonatan Pålsson wrote:
>>
>> Hi list,
>
>
> Hello Jonatan,
>


Hi Martyn!

>> OK. So I think I have pitched the problem now. What I have done is to
>> combine media encoders (LAME and ImageMagick) and metadata tagging
>> software (id3v2 and exiftool) with the random number generator of
>> Python. By using the random numbers generated by Python as input to
>> these tools, random, reproducible (by reusing the seed for the PRNG),
>> media files can be created.
>>
>> I'm using this to create large numbers of media files to test Tracker
>> extractor modules on, and it works pretty well. So far I can generate
>> MP3, PNG, JPG, TIF, and GIF.
>
>
> Just before you go on, what are you trying to test here? That we
> index/extract properly? Or test the data with queries to the database?
>

The main purpose of generating actual files, rather than putting the
data directly into the database, is to test the actual extractors.
Specifically, I am working on reducing extraction time as much as
possible. I have previously written to the list asking for tips on how
to improve store insertion performance, and this is also something I
am experimenting with, via the extractors. I am also, as you mention,
looking at accuracy in the extractors, since it is quite simple to
compare, for instance, the ID3 tags of the MP3 files with the
extracted fields.

My general idea is that having actual files to index produces a
testing scenario as close as possible to actual use cases of Tracker.
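
To illustrate the seeded approach, here is a minimal sketch (not the
actual script). The word list, function names and file names are made up
for this example, and the id3v2 command line is only built, never run.
It shows how reusing a seed reproduces the same tags, and how the
expected tags could be compared with whatever an extractor returns:

```python
import random

# Hypothetical word list; the real script may draw its names elsewhere.
WORDS = ["amber", "birch", "cobalt", "delta", "ember", "fjord"]


def random_tags(rng):
    """Draw one set of ID3-style tags from the given PRNG."""
    return {
        "artist": " ".join(rng.choice(WORDS) for _ in range(2)).title(),
        "album": rng.choice(WORDS).title(),
        "title": " ".join(rng.choice(WORDS) for _ in range(3)).title(),
        "year": str(rng.randint(1960, 2013)),
        "track": str(rng.randint(1, 20)),
    }


def id3v2_command(tags, path):
    """Build (but do not run) an id3v2 command line that writes the tags."""
    return ["id3v2",
            "-a", tags["artist"], "-A", tags["album"], "-t", tags["title"],
            "-y", tags["year"], "-T", tags["track"], path]


def generate_library(seed, count):
    """Return {path: tags} for `count` files; same seed, same result."""
    rng = random.Random(seed)
    return {"track%04d.mp3" % i: random_tags(rng) for i in range(count)}


def mismatches(expected, extracted):
    """List (field, expected, extracted) for every field that differs."""
    return [(key, value, extracted.get(key))
            for key, value in sorted(expected.items())
            if extracted.get(key) != value]
```

Since the expected tags can be regenerated from the seed at any time,
they do not need to be stored alongside the files.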

>
> At this point I wanted to ask if you had seen the data generators we have in
> the Tracker tree already? NOTE: I say "data" not "file" there.
>
>   utils/data-generators/cc/
>
> You can run
>
>   $ ./generate ./default.cfg
>
> It will create a bunch of ttl files which you can import as you wish using
> tracker-import. I think you can even use tracker-import *.ttl.
>
> Anyway this is fake data, not based on files - so it really depends on what
> you're testing. You can also tweak where the data draws its random crap from
> :)
>

This is news to me, thanks for pointing it out. These scripts should
be good for measuring the best-case insertion speed to the database,
excluding the time required for actual extraction. I am currently
experimenting with developing much simpler ontologies, holding only
the data we (Pelagicore) are interested in. I will look into whether
it is worth adapting this ttl generator to our ontologies in order to
establish a best-case insertion speed. The source data you are using
(source-data/) looks very interesting. I might have a look at
integrating that into my script.
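
For the insertion-speed measurement, something along these lines might
be enough. This is only a sketch: the helper names are hypothetical, and
it assumes that tracker-import accepts a list of ttl files as arguments,
as suggested above, and returns only once the import has finished.

```python
import subprocess
import sys
import time


def time_command(argv):
    """Run argv to completion and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(argv, check=True)
    return time.perf_counter() - start


def items_per_second(item_count, seconds):
    """Throughput, in whatever unit item_count was counted in."""
    return item_count / seconds if seconds > 0 else float("inf")


# Intended use (assumption, not run here):
#   files = sys.argv[1:]                      # e.g. the generated *.ttl
#   elapsed = time_command(["tracker-import"] + files)
#   print(items_per_second(len(files), elapsed))
```

Wall-clock time for the whole command includes process startup, so it
is only meaningful for reasonably large imports.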

> I think it would be quite useful to include your file generator in the
> tracker tree for people to make use of, or at least to reference it from a
> README somewhere.

This sounds like a good plan. I will ping the list when I feel the
script is ready for inclusion (or for a mention in a README); I have
some more features and fixes to add soon.

-- 
Regards,
Jonatan Pålsson

Pelagicore AB
Ekelundsgatan 4, 6th floor, SE-411 18 Gothenburg, Sweden
