Re: [FFmpeg-user] Advice on using silence removal

2021-08-21 Thread Paul B Mahol
On Fri, Aug 20, 2021 at 9:47 AM Alex R  wrote:

> Hi everyone,
>
> I am attempting to leverage ffmpeg in a project that involves recording
> short audio clips. So far I have gotten some mixed results and I'd like to
> tap into your collective knowledge to ensure my approach is sound.
>
> Context:
> - a person records an audio clip of themselves pronouncing a word (imagine
> that you read aloud a flash-card that says "tree" or "helicopter")
> - the recording is usually made on a mobile phone
>
> The clip contains some silence at both ends, because there is a delay
> between the moment the user presses the record button, the moment they
> pronounce their word, and the moment they press "stop". Depending on the
> device, there may also be an audible click in the beginning.
>
> My objective is to trim the silence at both ends and apply fade-in/out to
> soften the clicks, if any.
>
> The challenges are:
> - ffmpeg's silenceremove filter needs a threshold value, however,
> - each user is in their own environment, with different levels of ambient
> noise
> - each device is unique in terms of sensitivity
>
> Thus, I can achieve my desired result with one specific clip through trial
> and error, tinkering with thresholds until I get what I need. But I cannot
> figure out how to detect these thresholds automatically, such that I can
> replicate the result with a broad range of users, environments and
> recording devices.
>
> Note that there is no expectation to produce perfect results that match the
> quality of an audio recording studio, I'm more in the "rough, but good
> enough for practical purposes" territory.
>
> Having read the documentation and various forums, I put together this
> pipeline (actual commands in the appendix):
>
> 1. run volumedetect to see what the maximum level is
> 1a. parse stdout to extract `max_volume`
> 2. normalize audio to `max_volume`
> 3. apply silenceremove with 
> 3a. for the beginning of the file
> 3b. invert the stream and run another silenceremove for the beginning
> (which is actually the end)
> 3c. invert it back and save the output
>
>
>
> What I read in the forums gave me the impression that we need step#2 such
> that at step#3 we could say the threshold is 0. However, that is not the
> case, I still had to find a reasonable threshold via trial and error.
>
> After I found a value that produces a good result, I assumed that it might
> be good enough for practical purposes and it would be OK to simply hardcode
> it into my code as a magic number. However, on the next day I attempted to
> replicate the results using the same recording device in the same room -
> but this time ffmpeg would tell me the filtered stream is empty, nothing to
> write. The environment wasn't 100% identical, since I'm not doing this in a
> controlled lab, but most of the variables are the same, though perhaps the
> windows were open and it was a different time of the day, so the baseline
> noise level outside was somewhat different.
>
> Clearly, my approach is not robust. I'd like to understand whether there
> are any low-hanging fruits that I can try, or if I'm not on the right
> track.
>
> I imagine that the solution I need would somehow determine the silence
> threshold relative to the rest of the file, instead of using a "one fits
> all" value. However I did not find such filters or analyzers in ffmpeg.
>
>
> Your guidance will be greatly appreciated,
> Alex
>
>
>
>
> Appendix, pipeline commands
>
> 1. ffmpeg -i input.mp3 -af "volumedetect"  -f null /dev/null
> here I parse stdout, looking for something like "[Parsed_volumedetect_0 @
> 0x559dbe815f00] max_volume: -15.9 dB"
>
> 2. ffmpeg -i input.mp3 -af "volume=15.9dB" out2-normalized.mp3
>
> 3. ffmpeg -i out2-normalized.mp3 -af
> silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,afade=t=in:st=0:d=0.3,areverse,afade=t=in:st=0:d=0.3
> out3-trimmed.mp3
>

Use the window option too, and set detection to peak if rms (the default
value) is not working as expected.
There is not much that can be done if the level of the silence, in dBFS,
varies a lot from one recording to the next.
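
A minimal sketch of how those two options could be added to the filter from the appendix. Python is used here only to assemble the filter string. `window` and `detection` are existing silenceremove options; the helper names `trim_filter` and `trim_both_ends`, and the values -40dB and 0.05, are placeholders for illustration and would need tuning per recording.

```python
def trim_filter(threshold="-40dB", window=0.05, detection="peak"):
    """Assemble a silenceremove filter that trims leading silence.

    The default threshold and window are illustrative, not recommendations.
    """
    if detection not in ("peak", "rms"):
        raise ValueError("detection must be 'peak' or 'rms'")
    options = {
        "start_periods": 1,
        "start_threshold": threshold,
        "detection": detection,
        "window": window,
    }
    return "silenceremove=" + ":".join(f"{k}={v}" for k, v in options.items())


def trim_both_ends(**kwargs):
    """Trim the start, reverse, trim the former end, then reverse back."""
    one_end = trim_filter(**kwargs)
    return ",".join([one_end, "areverse", one_end, "areverse"])


print(trim_both_ends())
```

The resulting string would be passed to ffmpeg as the value of -af, in place of the silenceremove/areverse part of step 3.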


>
>
> An example of an input file is available at
> railean.net/files/public-temp/in-fresh.mp3, after normalization you can
> hear some church bells in the distance. I'm totally fine with them
> remaining audible in the result, as long as the leading and trailing
> silence is removed.
> ___
> ffmpeg-user mailing list
> ffmpeg-user@ffmpeg.org
> https://ffmpeg.org/mailman/listinfo/ffmpeg-user
>
> To unsubscribe, visit link above, or email
> ffmpeg-user-requ...@ffmpeg.org with subject "unsubscribe".
>

Re: [FFmpeg-user] Advice on using silence removal

2021-08-21 Thread Carl Zwanzig

What you're doing is a "noise gate" function.

A few ideas-
Use either the mean volume or 3-6 dB below the max as the threshold. A more 
complicated version would be to examine the level at maybe half-second 
intervals and use that to determine the levels of the background and spoken 
parts of the clip.
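
As a rough sketch of the first idea, the threshold could be derived from the volumedetect report instead of being hardcoded. ffmpeg prints that report on stderr. The function names, the 6 dB default margin and the sample log below are assumptions made for illustration (the mean_volume figure is invented; only the max_volume line comes from this thread).

```python
import re

# Matches lines such as "[Parsed_volumedetect_0 @ 0x...] max_volume: -15.9 dB"
_STAT = re.compile(r"(mean_volume|max_volume):\s*(-?\d+(?:\.\d+)?) dB")


def parse_volumedetect(log_text):
    """Return a dict of mean_volume / max_volume (in dB) found in the log."""
    return {name: float(value) for name, value in _STAT.findall(log_text)}


def pick_threshold(stats, below_max_db=6.0, use_mean=False):
    """Threshold in dB: the mean level, or a fixed margin under the peak."""
    if use_mean:
        return stats["mean_volume"]
    return stats["max_volume"] - below_max_db


# Made-up sample of what ffmpeg writes to stderr for -af volumedetect.
sample_log = """\
[Parsed_volumedetect_0 @ 0x559dbe815f00] mean_volume: -38.2 dB
[Parsed_volumedetect_0 @ 0x559dbe815f00] max_volume: -15.9 dB
"""
stats = parse_volumedetect(sample_log)
print(f"start_threshold={pick_threshold(stats):.1f}dB")
```

Because the threshold is computed from each clip's own levels, it moves with the recording rather than being one value for all users.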


Do the silence removal before the normalize so you're not bringing up the 
noise level along with the speech.


Expand the dynamic range (compand) to push the voice level up and the noise 
level down. Compand can even do a noise-gate, there's a somewhat cryptic 
example in the audio filters doc 
(https://ffmpeg.org/ffmpeg-filters.html#toc-compand).
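
A hedged sketch of what such a gate might look like, modeled on the shape of the noise-gate example in the compand documentation. The helper name `noise_gate` and the -50 dB default are assumptions; the gate level would have to sit between the clip's noise floor and the speech level.

```python
def noise_gate(gate_db=-50.0, attack=0.1, decay=0.2):
    """Build a compand filter string that mutes input below gate_db.

    The transfer points map everything under the gate to -900 dB
    (effectively silence) and leave the gate level itself unchanged.
    """
    points = "|".join([
        "-900/-900",                     # far below the gate: stays silent
        f"{gate_db - 0.1:.1f}/-900",     # just under the gate: pushed to silence
        f"{gate_db:.1f}/{gate_db:.1f}",  # at the gate: passed through as-is
    ])
    return f"compand=attacks={attack}:decays={decay}:points={points}"


print(noise_gate())
```

The string contains `|`, so it needs quoting when pasted into a shell command line.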


One thing that will bite is if the recording's automatic gain control is too 
aggressive and gets the background noise at the start/finish to the same 
level as the voice. Not much you can easily do about that but ask for a new 
recording.


Later,

z!



Re: [FFmpeg-user] Decimal times to frame numbers

2021-08-21 Thread Nicolas George
Gyan Doshi (12021-08-21):
> A nit, but MP4 is inherently VFR-ready - that's what the (mandatory) stts
> box is for.
> 
> It is for some opaque reason that the muxer defaults to CFR, possibly to do
> with some limitation or bug in the early days.

Good to know.

I will still argue that MP4 is a crappy format designed with more
concern for pleasing big investors than with concern for good design.
For careful examination of the output, NUT would be the most reliable.

Regards,

-- 
  Nicolas George




Re: [FFmpeg-user] Decimal times to frame numbers

2021-08-21 Thread Gyan Doshi




On 2021-08-21 03:05 pm, Nicolas George wrote:
> - By adding -r on your output and by using a format like MP4 that only
>   supports constant frame rate, you are forcing ffmpeg to duplicate or


A nit, but MP4 is inherently VFR-ready - that's what the (mandatory) 
stts box is for.


It is for some opaque reason that the muxer defaults to CFR, possibly to 
do with some limitation or bug in the early days.


Regards,
Gyan


Re: [FFmpeg-user] Decimal times to frame numbers

2021-08-21 Thread Nicolas George
amindfv--- via ffmpeg-user (12021-08-20):
> I then tested the output of running commands like this, only changing the 
> value for $STARTTIME :
> 
> export STARTTIME=0.33 ; ffmpeg -r 24 -i color-frames.mp4 -r 24 -ss 
> $STARTTIME "test-$STARTTIME.mp4"

There are several flaws in your problem report:

- You did not post the full output. We need the full output to spot some
  usual missteps.

- You do not need export since you are using the variable directly.

- By adding -r on your input, you force ffmpeg to re-synthesize the
  timestamps. This is wrong in general. In this particular case, it
  changes the precision of the timestamps: your input file has a 1/12288
  time base, with -r 24 it becomes 1/24.

- By adding -r on your output and by using a format like MP4 that only
  supports constant frame rate, you are forcing ffmpeg to duplicate or
  drop frames to a certain rate. If you want to examine the output
  carefully, either use a format that supports arbitrary timestamps or
  do not use a format at all and write to individual images with "-vsync
  passthrough".

It is possible you have unearthed an off-by-one bug somewhere. But in
general, the logic should be:

- The timestamps you write are rounded to the nearest integer multiple
  of the time base.

- The frames you get in the filter line are exactly the interval you
  requested. That means if the time base is more precise than the frame
  rate you can get frames with half the duration.
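
To make the rounding rule concrete, here is a small sketch using exact fractions with the two time bases mentioned above and the 0.33 start time from the report. The function name is hypothetical, and exact ties follow Python's round(), which may not match ffmpeg's own tie-breaking.

```python
from fractions import Fraction


def round_to_time_base(seconds, time_base):
    """Round a timestamp to the nearest integer multiple of the time base.

    Returns (ticks, rounded_seconds) as (int, Fraction).
    """
    ticks = round(Fraction(seconds) / time_base)
    return ticks, ticks * time_base


for tb in (Fraction(1, 12288), Fraction(1, 24)):
    ticks, t = round_to_time_base("0.33", tb)
    print(f"time base {tb}: {ticks} ticks = {float(t):.6f} s")
```

With a 1/12288 time base the request stays within a fraction of a millisecond of 0.33 s, while with 1/24 it lands on frame 8, that is 1/3 s.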

What happens after the filter chain, on the other hand, is subject to
more complex heuristics. I strongly suggest you try to work as much as
possible only with filters.

> On Wed, Aug 18, 2021 at 09:26:03PM -0600, amindfv--- via ffmpeg-user wrote:

Top-posting is forbidden on this mailing-list. Do not do it again if you
want help. Look it up if you do not know what it means.

Regards,

-- 
  Nicolas George

