Re: [FFmpeg-devel] A few filter questions
On Fri, Jul 18, 2014 at 12:38:43PM +0200, Gerion Entrup wrote:
> On Thursday, 17 July 2014, at 17:24:35, Clément Bœsch wrote:
> > On Thu, Jul 17, 2014 at 04:56:08PM +0200, Gerion Entrup wrote:
> > [...]
> > > > Also, you still have the string metadata possibility (git grep
> > > > SET_META libavfilter).
> > >
> > > Hmm, thank you, I will take a look at it. If I see it right, it is
> > > used to fill a dictionary per frame with some kind of data?
> >
> > Strings only, so you'll have to find a serialization somehow. Maybe
> > simply an ASCII hex string or something. But yeah, it just allows you
> > to map some key → value string pairs to the frames passing by in the
> > filter.
> >
> > How huge is the information to store per frame?
>
> 82 bytes per frame for the finesignature. (It could be split again into
> three parts: a one-byte confidence, a 5-byte words vector, and a 76-byte
> framesignature, something like:
>
>     struct finesignature {
>         uint8_t confidence;
>         uint8_t words[5];
>         uint8_t framesignature[76];
>     };
>
> ) And 152 bytes per 90 frames for the coarsesignature. (Note that there
> are two coarsesignatures with an offset of 45 frames: 0-89, 45-134,
> 90-179, ...)
>
> If I see it right, there are two possibilities:
> 1. Write the bytes as raw chars in the output (looks crappy, but needs
>    the same amount of memory).
> 2. Write them as ASCII hex in the output (looks nice, but needs twice as
>    much memory).

It won't be encoded in the output (at least I'm not sure which muxer would
store this metadata), so the bandwidth issue is not a problem. An ASCII
hex string would be nice IMO: it's extremely small, would appear fine, and
is easily parsable.

[...]

-- 
Clément B.
___
ffmpeg-devel mailing list
ffmpeg-devel@ffmpeg.org
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
Re: [FFmpeg-devel] A few filter questions
On Thursday, 17 July 2014, at 17:24:35, Clément Bœsch wrote:
> On Thu, Jul 17, 2014 at 04:56:08PM +0200, Gerion Entrup wrote:
> [...]
> > > Also, you still have the string metadata possibility (git grep
> > > SET_META libavfilter).
> >
> > Hmm, thank you, I will take a look at it. If I see it right, it is
> > used to fill a dictionary per frame with some kind of data?
>
> Strings only, so you'll have to find a serialization somehow. Maybe
> simply an ASCII hex string or something. But yeah, it just allows you to
> map some key → value string pairs to the frames passing by in the
> filter.
>
> How huge is the information to store per frame?

82 bytes per frame for the finesignature. (It could be split again into
three parts: a one-byte confidence, a 5-byte words vector, and a 76-byte
framesignature, something like:

    struct finesignature {
        uint8_t confidence;
        uint8_t words[5];
        uint8_t framesignature[76];
    };

) And 152 bytes per 90 frames for the coarsesignature. (Note that there
are two coarsesignatures with an offset of 45 frames: 0-89, 45-134,
90-179, ...)

If I see it right, there are two possibilities:
1. Write the bytes as raw chars in the output (looks crappy, but needs the
   same amount of memory).
2. Write them as ASCII hex in the output (looks nice, but needs twice as
   much memory).

[...]

> > > stdout/stderr really isn't a good thing. Using metadata is way
> > > better because you can output them from ffprobe, and parse them
> > > according to various outputs (XML, CSV, JSON, ...).
> >
> > Sounds good…
>
> tools/normalize.py makes use of such a feature if you want examples
> (that's the -of option of ffprobe).

Ok.

[...]

> > > Am I understanding your question right?
> >
> > No ;), but anyway thanks for your answer. In your 2nd method, is your
> > filter a VV->V filter? Am I right that this filter then can also take
> > only one stream? Said another way: can a VV->V filter also behave as a
> > V->V filter?
>
> Yes, fieldmatch is a (complex) example of this. But typically it's
> simply a filter with dynamic inputs, based on the user input. The
> simplest example would be the split filter. Look at it for an example of
> dynamic allocation of the number of outputs based on the user input
> (-vf split=4 is a V->VVVV filter).

Hmm, interesting code, thank you.

[...]

> > > Check tools/normalize.py, it's using ebur128 and the metadata
> > > system.
> >
> > That's what I mean. Someone has to write an external script which
> > calls ffmpeg/ffprobe two times, parses stdout of the first call and
> > passes it to the filter options of the second call. As I see it,
> > there is no direct way to do something like:
> >     ffmpeg -i foo -f:a volume=mode=autodetect normalized.opus
>
> We had a discussion several times about doing this in real time with
> that filter. If we do a 2-pass, that's simply because it's more
> efficient. Typically, some live normalization can be done easily (we had
> patches for this): ebur128 already attaches some metadata to frames, so
> a following filter such as volume could reuse them, something like
> -filter_complex ebur128=metadata=1,volume=metadata.

[...]
[FFmpeg-devel] A few filter questions
Good day,

I'm currently working on a video signature filter for ffmpeg. This allows
you to fingerprint videos. This fingerprint is built up of 9mb/s of bits,
or 2-3 mb/s compressed. In this context, a few questions come to my mind:

- Should I print this whole bitstream to stdout/stderr at the end? Or is
  it maybe a better choice to make an own stream out of this? But which
  type of stream would this be? (Btw, the video signature algorithm needs
  90 consecutive frames, so I can theoretically write something somewhere
  every 90 frames.)

- If I print the whole bitstream to stdout/stderr (my current
  implementation), is there a possibility to use this later in an external
  program? The only other globally analyzing filter I found is
  volumedetect. At the end, this filter prints the calculated results to
  the console via print_stats. Is there a possibility within the API for
  an external program to use these values, or do I have to grep the
  output?

  A similar example is AcoustID (a fingerprinting technique for audio).
  Currently chromaprint (the AcoustID library) provides an executable
  (fpcalc) to calculate the AcoustID. It uses FFmpeg to decode the audio
  and then its own library to calculate the fingerprint. The better way, I
  think, would be to have an ffmpeg filter for this. But is it possible to
  use the calculated number in an external program without grepping the
  output?

Another thing that came to my mind: can a filter force other filters into
the filterchain? As I see it, when I force GREY_8 only in my filter, it
automatically enables the scale filter, too.

The reason I ask is the lookup for my filter. Currently my filter analyzes
a video and then produces a lot of numbers. To compare two videos and
decide whether they match or not, these numbers have to be compared. I see
three possibilities:

1. Write a VV->V filter. Reimplement (copy) the code from the V->V
   signature filter and give a boolean as output (match or no match).
2. Take the V->V filter and write a python (or whatever) script that
   fetches the output and then calculates the rest.
3. Write a VV->V filter, but enforce that the normal signature filter is
   executed first on both streams, use the result, and then calculate the
   matching type. Unfortunately I have no idea how to do this and whether
   it is possible at all. Can you give me advice?

The last possibility would also allow something like two-pass volume
normalisation. Currently there are a volumedetect and a volume filter. To
normalize, one could run volumedetect, then fetch the output and put the
values into the volume filter, but I currently don't see a way to do this
automatically, directly in ffmpeg.

(Once the filter is in a good state, I will try to bring it upstream.)

Best,
Gerion
Re: [FFmpeg-devel] A few filter questions
On Thu, Jul 17, 2014 at 12:33:41PM +0200, Gerion Entrup wrote:
> Good day,
> I'm currently working on a video signature filter for ffmpeg. This
> allows you to fingerprint videos.

Oh, nice.

> This fingerprint is built up of 9mb/s of bits or 2-3 mb/s compressed. In
> this context a few questions come to my mind:
> - Should I print this whole bitstream to stdout/stderr at the end? Is it
>   maybe a better choice to make an own stream out of this? But which
>   type of stream would this be?

What does the fingerprint look like? Could it make sense as a gray video
output, a fractal, or maybe some kind of audio signal?

Also, you still have the string metadata possibility (git grep SET_META
libavfilter).

> (btw, the video signature algorithm needs 90 consecutive frames, so I
> can theoretically write something somewhere every 90 frames.)

Do you cache all these frames, or do you just update your caches/stats and
drop them?

> - If I print the whole bitstream to stdout/stderr (my current
>   implementation), is there a possibility to use this later in an
>   external program? The only other globally analyzing filter I found is
>   volumedetect. At the end, this filter prints the calculated results to
>   the console via print_stats. Is there a possibility within the API for
>   an external program to use these values, or do I have to grep the
>   output?

stdout/stderr really isn't a good thing. Using metadata is way better
because you can output them from ffprobe, and parse them according to
various outputs (XML, CSV, JSON, ...).

Another solution I can think of now is to simply pass an output file as an
option to the filter. That's typically how we do the 2-pass thing with the
vidstab filter.

[...]

> Another thing that came to my mind: can a filter force other filters
> into the filterchain? As I see it, when I force GREY_8 only in my
> filter, it automatically enables the scale filter, too.

Some filters are inserted automatically for conversion constraints, but
that's not decided by the filters but by the framework itself.

> The reason I ask is the lookup for my filter. Currently my filter
> analyzes a video and then produces a lot of numbers. To compare two
> videos and decide whether they match or not, these numbers have to be
> compared. I see three possibilities:
> 1. Write a VV->V filter. Reimplement (copy) the code from the V->V
>    signature filter and give a boolean as output (match or no match).
> 2. Take the V->V filter and write a python (or whatever) script that
>    fetches the output and then calculates the rest.
> 3. Write a VV->V filter, but enforce that the normal signature filter is
>    executed first on both streams, use the result, and then calculate
>    the matching type.
> Unfortunately I have no idea how to do this and whether it is possible
> at all. Can you give me advice?

So if you output a file in the filter itself:

    ffmpeg -i video -vf fingerprint=video.sig -f null -
    ffmpeg -i another -vf fingerprint=video.sig:check=1 -f null -

Or if you save the signature stream in a video (in gray8 for instance):

    ffmpeg -i video -vf fingerprint -c:v ffv1 sig.nut
    ffmpeg -i another -i sig.nut -vf '[0][1] fingerprint=mode=check' -f null -

The 2nd method is better because it doesn't require file handling in the
library, and it also allows stuff like using a diff filter (if you also
apply fingerprint - not with mode=check - on `another`).

Am I understanding your question right?

> The last possibility would also allow something like two-pass volume
> normalisation. Currently there are a volumedetect and a volume filter.
> To normalize, one could run volumedetect, then fetch the output and put
> the values into the volume filter, but I currently don't see a way to do
> this automatically, directly in ffmpeg.

Check tools/normalize.py, it's using ebur128 and the metadata system.

> (Once the filter is in a good state, I will try to bring it upstream.)

Cool.

> Best,
> Gerion

-- 
Clément B.
Re: [FFmpeg-devel] A few filter questions
On Thursday, 17 July 2014, at 13:00:13, Clément Bœsch wrote:
> On Thu, Jul 17, 2014 at 12:33:41PM +0200, Gerion Entrup wrote:
> > Good day,
> > I'm currently working on a video signature filter for ffmpeg. This
> > allows you to fingerprint videos.
>
> Oh, nice.
>
> > This fingerprint is built up of 9mb/s of bits or 2-3 mb/s compressed.

Argh, fail, sorry. I meant: 9 MB per hour of video (and 2-3 MB per hour
compressed).

> > In this context a few questions come to my mind:
> > - Should I print this whole bitstream to stdout/stderr at the end? Is
> >   it maybe a better choice to make an own stream out of this? But
> >   which type of stream would this be?
>
> What does the fingerprint look like? Could it make sense as a gray video
> output, a fractal, or maybe some kind of audio signal?

There are finesignatures per frame and coarsesignatures per 90
finesignatures. Coarsesignatures are binarized histograms (0 or 1 possible
as count). A finesignature is mainly a vector of 380 difference values
between -128 and 127, which are ternarized into 0, 1 or 2. (See the MPEG-7
standard for more details.) I doubt this makes a good video or audio
stream. Interpreting it as video would definitely make sense in some way,
but metadata looks more useful.

> Also, you still have the string metadata possibility (git grep SET_META
> libavfilter).

Hmm, thank you, I will take a look at it. If I see it right, it is used to
fill a dictionary per frame with some kind of data?

> > (btw, the video signature algorithm needs 90 consecutive frames, so I
> > can theoretically write something somewhere every 90 frames.)
>
> Do you cache all these frames, or do you just update your caches/stats
> and drop them?

ATM I don't cache the frames, but the whole signature. As said above, the
coarsesignature (the part which needs the 90 frames) is calculated only
from the finesignatures (and the finesignatures are cached anyway).

> > - If I print the whole bitstream to stdout/stderr (my current
> >   implementation), is there a possibility to use this later in an
> >   external program? The only other globally analyzing filter I found
> >   is volumedetect. At the end, this filter prints the calculated
> >   results to the console via print_stats. Is there a possibility
> >   within the API for an external program to use these values, or do I
> >   have to grep the output?
>
> stdout/stderr really isn't a good thing. Using metadata is way better
> because you can output them from ffprobe, and parse them according to
> various outputs (XML, CSV, JSON, ...).

Sounds good…

> Another solution I can think of now is to simply pass an output file as
> an option to the filter. That's typically how we do the 2-pass thing
> with the vidstab filter.

I don't like output files. If you want to write a program that performs a
lookup of signatures stored somewhere in a database, and this program uses
ffmpeg internally and then always has to write a file and read it again,
that's not very elegant. (Btw, an example of such a program is MusicBrainz
Picard, but for AcoustID ;))

[...]

> > Another thing that came to my mind: can a filter force other filters
> > into the filterchain? As I see it, when I force GREY_8 only in my
> > filter, it automatically enables the scale filter, too.
>
> Some filters are inserted automatically for conversion constraints, but
> that's not decided by the filters but by the framework itself.

> > The reason I ask is the lookup for my filter. Currently my filter
> > analyzes a video and then produces a lot of numbers. To compare two
> > videos and decide whether they match or not, these numbers have to be
> > compared. I see three possibilities:
> > 1. Write a VV->V filter. Reimplement (copy) the code from the V->V
> >    signature filter and give a boolean as output (match or no match).
> > 2. Take the V->V filter and write a python (or whatever) script that
> >    fetches the output and then calculates the rest.
> > 3. Write a VV->V filter, but enforce that the normal signature filter
> >    is executed first on both streams, use the result, and then
> >    calculate the matching type.
> > Unfortunately I have no idea how to do this and whether it is possible
> > at all. Can you give me advice?
>
> So if you output a file in the filter itself:
>
>     ffmpeg -i video -vf fingerprint=video.sig -f null -
>     ffmpeg -i another -vf fingerprint=video.sig:check=1 -f null -
>
> Or if you save the signature stream in a video (in gray8 for instance):
>
>     ffmpeg -i video -vf fingerprint -c:v ffv1 sig.nut
>     ffmpeg -i another -i sig.nut -vf '[0][1] fingerprint=mode=check' -f null -
>
> The 2nd method is better because it doesn't require file handling in the
> library, and it also allows stuff like using a diff filter (if you also
> apply fingerprint - not with mode=check - on `another`).
>
> Am I understanding your question right?

No ;), but anyway thanks for your answer. In your 2nd method, is your
filter a VV->V filter? Am I right that this filter then can also take only
one stream? Said another way: can a VV->V filter also behave as a V->V
filter?
Re: [FFmpeg-devel] A few filter questions
On Thu, Jul 17, 2014 at 04:56:08PM +0200, Gerion Entrup wrote:
[...]
> > Also, you still have the string metadata possibility (git grep
> > SET_META libavfilter).
>
> Hmm, thank you, I will take a look at it. If I see it right, it is used
> to fill a dictionary per frame with some kind of data?

Strings only, so you'll have to find a serialization somehow. Maybe simply
an ASCII hex string or something. But yeah, it just allows you to map some
key → value string pairs to the frames passing by in the filter.

How huge is the information to store per frame?

[...]

> > stdout/stderr really isn't a good thing. Using metadata is way better
> > because you can output them from ffprobe, and parse them according to
> > various outputs (XML, CSV, JSON, ...).
>
> Sounds good…

tools/normalize.py makes use of such a feature if you want examples
(that's the -of option of ffprobe).

[...]

> > Am I understanding your question right?
>
> No ;), but anyway thanks for your answer. In your 2nd method, is your
> filter a VV->V filter? Am I right that this filter then can also take
> only one stream? Said another way: can a VV->V filter also behave as a
> V->V filter?

Yes, fieldmatch is a (complex) example of this. But typically it's simply
a filter with dynamic inputs, based on the user input. The simplest
example would be the split filter. Look at it for an example of dynamic
allocation of the number of outputs based on the user input (-vf split=4
is a V->VVVV filter).

[...]

> > Check tools/normalize.py, it's using ebur128 and the metadata system.
>
> That's what I mean. Someone has to write an external script which calls
> ffmpeg/ffprobe two times, parses stdout of the first call and passes it
> to the filter options of the second call. As I see it, there is no
> direct way to do something like:
>     ffmpeg -i foo -f:a volume=mode=autodetect normalized.opus

We had a discussion several times about doing this in real time with that
filter. If we do a 2-pass, that's simply because it's more efficient.
Typically, some live normalization can be done easily (we had patches for
this): ebur128 already attaches some metadata to frames, so a following
filter such as volume could reuse them, something like
-filter_complex ebur128=metadata=1,volume=metadata.

[...]

-- 
Clément B.
Re: [FFmpeg-devel] A few filter questions
On nonidi, 29 Messidor, year CCXXII, Clément Bœsch wrote:
> We had a discussion several times about doing this in real time with
> that filter. If we do a 2-pass, that's simply because it's more
> efficient. Typically, some live normalization can be done easily (we had
> patches for this): ebur128 already attaches some metadata to frames, so
> a following filter such as volume could reuse them, something like
> -filter_complex ebur128=metadata=1,volume=metadata.

I believe you are wrong in this paragraph: we do two passes for
normalization because that is the only way of doing it without
distortions; the level of volume adjustment depends on the whole stream.
Normalization can be done in a single pass with distortions, but
currently no filter is capable of smoothing the measures computed by
ebur128 to make the distortions inaudible. Patch welcome.

Regards,

-- 
Nicolas George