Rob Til Freedmen wrote:
> trackerd treats musepack & monkey audio files as 'mime text/plain and 
> service type 6'
> 
>  From the tracker.log:
> 13 Dec 2006, 16:21:04:379 - Extracting Metadata for *new* file 
> /mnt/sound/Oldies/John.Peel.Last.Show.14.10.04/John Peel Last Show 
> 14.10.04 digital recorded (q8).mpc with mime text/plain and service type 6
> 
> 13 Dec 2006, 16:21:32:906 - Please wait while data is being flushed to 
> the inverted word index...
> 13 Dec 2006, 16:21:48:288 - flushing data (36255 words left) to inverted 
> word index - please wait
> ...
> 13 Dec 2006, 16:23:46:572 - flushing data (355 words left) to inverted 
> word index - please wait
> 
> It needs more than 2 minutes to write something like 
> 'utf-japanese/chinese'  to the database!

We normally flush once 6000 words have built up, but in your case a 
binary file was misdiagnosed as text, so we ended up with a huge number of 
words (36255 in your log). We are not optimized for flushing that many 
words at once, but that should never occur in practice.
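
To illustrate why one misdiagnosed file can overshoot the threshold so badly, here is a minimal sketch in Python (not tracker's actual code). The `WordBuffer` class and its `flush` callback are hypothetical names, and it assumes the threshold is only checked after a whole file's words have been added:

```python
FLUSH_THRESHOLD = 6000  # figure quoted in the message above


class WordBuffer:
    """Hypothetical buffer that collects words and flushes them in batches."""

    def __init__(self, flush, threshold=FLUSH_THRESHOLD):
        self._flush = flush          # callback that writes a batch to the index
        self._threshold = threshold
        self._pending = []

    def add_file_words(self, words):
        # Assumption: the threshold is checked once per file, so a single
        # file with tens of thousands of "words" lands in one oversized flush.
        self._pending.extend(words)
        if len(self._pending) >= self._threshold:
            self.flush()

    def flush(self):
        if self._pending:
            self._flush(self._pending)
            self._pending = []


if __name__ == "__main__":
    batches = []
    buf = WordBuffer(batches.append)
    buf.add_file_words(["word"] * 100)       # below threshold, nothing flushed
    buf.add_file_words(["junk"] * 36255)     # one misdiagnosed binary file
    print([len(b) for b in batches])
```

Under that assumption the second call produces a single batch far larger than the 6000-word target, which matches the long flush seen in the log.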

> 
> I haven't added the correct mime type for .mpc and .ape - still looking 
> where to add/modify things :(
> 
> Anyway, a binary type file like musepack or monkey audio should not be 
> treated as Unicode text -
> it can't really be legal UTF-8!

We check the first 4 KB and only treat an unidentified file as text if 
that first 4 KB is legal UTF-8.
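
A rough sketch of that kind of check, in Python rather than tracker's own code. The name `looks_like_utf8` and the constant `SNIFF_BYTES` are made up for illustration. It uses an incremental decoder so that a multi-byte character cut off at the 4 KB boundary is not counted as an error:

```python
import codecs

SNIFF_BYTES = 4096  # the "first 4 KB" mentioned above


def looks_like_utf8(head: bytes) -> bool:
    """Return True if the first SNIFF_BYTES of `head` decode as UTF-8."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates an incomplete sequence at the very end,
        # which can happen when the 4 KB cut splits a character.
        decoder.decode(head[:SNIFF_BYTES], final=False)
    except UnicodeDecodeError:
        return False
    return True


def file_looks_like_text(path: str) -> bool:
    with open(path, "rb") as fh:
        return looks_like_utf8(fh.read(SNIFF_BYTES))
```

Note that a check like this accepts any byte sequence that happens to be valid UTF-8, including NUL and other control bytes, so a binary file can still pass it.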

> 
> Maybe a 'file'-check for a catch-all-mime-type could prevent this?

Yeah, maybe: if a file is suspicious, then 'file' could be used as a 
last-resort check.
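
One way such a last-resort check could look, sketched in Python and not taken from tracker. The function name `mime_from_file_command` is hypothetical. It assumes the `file` utility is on the PATH and supports the `--brief` and `--mime-type` options, and it returns None whenever the answer is unavailable:

```python
import os
import shutil
import subprocess
from typing import Optional


def mime_from_file_command(path: str) -> Optional[str]:
    """Ask the external `file` tool for a MIME type; None if unavailable."""
    if shutil.which("file") is None or not os.path.isfile(path):
        return None
    try:
        result = subprocess.run(
            ["file", "--brief", "--mime-type", "--", path],
            capture_output=True,
            text=True,
            check=True,
            timeout=10,
        )
    except (subprocess.SubprocessError, OSError):
        return None
    mime = result.stdout.strip()
    # Anything without a "type/subtype" shape is treated as no answer.
    return mime if "/" in mime and " " not in mime else None
```

A caller would only reach for this after the cheaper checks have left the file as a suspicious "text/plain" guess, since spawning a process per file is comparatively slow.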

Thanks for the suggestion. I will try to squeeze it into the next release.


-- 
Mr Jamie McCracken
http://jamiemcc.livejournal.com/

_______________________________________________
tracker-list mailing list
http://mail.gnome.org/mailman/listinfo/tracker-list
