Re: [FFmpeg-user] Decimal times to frame numbers

2021-08-20 Thread amindfv--- via ffmpeg-user
To test this, I've just created a fixed-frame-rate video at 24 fps:

|---R|G|---B|

(1/3) seconds (8 frames) solid Red, followed by (1/24) seconds (1 frame) solid 
Green, followed by (1/3) seconds (8 frames) solid Blue.

Per the FFmpeg Mailing List FAQ, I've uploaded the sample file to 
https://0x0.st/-yD3.mp4

I then tested the output of running commands like this, only changing the value 
for $STARTTIME :

export STARTTIME=0.33 ; ffmpeg -r 24 -i color-frames.mp4 -r 24 -ss $STARTTIME "test-$STARTTIME.mp4"

(Side note: I originally put the -ss value before the input specifier, but the 
times were very inaccurate: some clips dropped no frames - i.e. I saw 8 frames 
of red in some cases! Other times too many frames appeared to be dropped. If 
I'm reading the man page entry for -ss correctly, this seems like a bug.)

Below are values for $STARTTIME, with the color of the first frame of the 
output:

0.3: Red
0.30: Red
0.31: Red
0.311: Red
0.312: Red
0.31249: Red
0.31249: Red
0.3125: Green < This is halfway between the start time of the last Red 
frame (7/24) and the start time of the first Green one (1/3). Calculation: 
((7/24) + (((1/3)-(7/24))/2)) == (5/16). Note this output file starts with 2(!) 
green frames even though the input only contains 1.
0.3125001: Green (still 2 frames of Green)
0.312501: Green (still 2 frames)
0.31251: Green (still 2 frames)
0.31250001: Green (still 2 frames)
0.3125001: Green (still 2 frames)
0.312501: Green (back to only 1 frame of Green)
0.31251: Green
0.313: Green
0.315: Green
0.317: Green
0.32: Green
0.33: Green
0.333: Green
< This is where the source video turns from Red to Green 
(1/3 == 0.333...)
0.334: Green
0.334: Green
0.34: Green
0.35: Green
0.36: Blue
0.37: Blue
0.374: Blue
0.374: Blue
0.375: Blue < This is where the source video turns from Green to Blue
0.3750: Blue
0.3751: Blue
0.4: Blue
0.41: Blue
0.416: Blue
0.41: Blue
0.417: Blue

In summary, I'm more confused than when I started. It seems (though I haven't 
tried hairier time values, e.g. with frame rate 23.976) that seeking may 
compute the closest start frame - i.e. simply round. But then the output seems 
to be working in some other way: what's the distinction between time 0.312501 
(1 frame of Green) and 0.3125001 (2 frames of Green)? I don't know.
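
A minimal sketch of the "simply round" hypothesis for the first output frame, using exact fractions so that no floating-point error creeps in. It only models the guess above and is not a description of what ffmpeg does internally; the helper names (frame_start, nearest_frame, colour) are made up for illustration, and it says nothing about the 1-vs-2 Green frames question.

```python
from fractions import Fraction
import math

FPS = Fraction(24)  # fixed frame rate of the test clip

def frame_start(n, fps=FPS):
    """Start time in seconds of 0-based frame n."""
    return Fraction(n) / fps

def nearest_frame(t, fps=FPS):
    """0-based index of the frame whose start time is closest to t.

    Ties (exactly halfway between two frames) go to the later frame.
    """
    return math.floor(Fraction(t) * fps + Fraction(1, 2))

def colour(n):
    """Colour of 0-based frame n in the 8 Red / 1 Green / 8 Blue test clip."""
    return "Red" if n < 8 else "Green" if n == 8 else "Blue"

# Halfway between the last Red frame (7/24) and the Green frame (8/24 == 1/3).
midpoint = (frame_start(7) + frame_start(8)) / 2
print(midpoint)  # 5/16

# Decimal strings are parsed exactly by Fraction.
for t in ["0.31249", "0.3125", "0.35", "0.36"]:
    n = nearest_frame(t)
    print(t, "-> frame", n, colour(n))
```

Under this model the Red/Green switch falls at 0.3125 and the Green/Blue switch at 17/48 (about 0.3542), which agrees with the first-frame colours listed above.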

Any help here is much appreciated. Thanks!
Tom


On Wed, Aug 18, 2021 at 09:26:03PM -0600, amindfv--- via ffmpeg-user wrote:
> How are frame numbers converted to and from decimal numbers of seconds in 
> ffmpeg and related tools?
> 
> For example, given a file foo.mp4 at 24fps, when I run a command like:
> 
> ffmpeg -i foo.mp4 -t 0.72 bar.mp4
> 
> 0.72 is a time between frame 18 (0.70833... seconds) and frame 19 (0.75 
> seconds).
> 
> In my tests, it seems that the number of seconds is rounded down, i.e. any 
> value less than 0.75 is equivalent to the earlier frame.
> 
> Is this always true? Is this the best way to think about the decimal 
> seconds<->frame number conversion? Is there any difference (e.g. in audio 
> track duration) between saying 0.71 or 0.73 in the above?
> 
> Thanks,
> Tom
> 
___
ffmpeg-user mailing list
ffmpeg-user@ffmpeg.org
https://ffmpeg.org/mailman/listinfo/ffmpeg-user

To unsubscribe, visit link above, or email
ffmpeg-user-requ...@ffmpeg.org with subject "unsubscribe".


[FFmpeg-user] Framerate automatically increased while cropping

2021-08-20 Thread Ulf Zibis

Hi,

Why is the frame rate automatically changed here from 29.98 fps to 120 fps, and 
how can I prevent this?

$ ffmpeg -i Demo\ 31.10.2020\ Köln\ Vorspiel+Zugriff.mp4 -vf 
crop=640:320:640:176 Demo\ 31.10.2020\ Köln\ Vorspiel+Zugriff\ Ausschnitt.mp4
ffmpeg version 4.4-static https://johnvansickle.com/ffmpeg/ Copyright (c) 
2000-2021 the FFmpeg developers
  built with gcc 8 (Debian 8.3.0-6)
  configuration: --enable-gpl --enable-version3 --enable-static --disable-debug 
--disable-ffplay --disable-indev=sndio --disable-outdev=sndio --cc=gcc 
--enable-fontconfig --enable-frei0r --enable-gnutls --enable-gmp 
--enable-libgme --enable-gray --enable-libaom --enable-libfribidi 
--enable-libass --enable-libvmaf --enable-libfreetype --enable-libmp3lame 
--enable-libopencore-amrnb --enable-libopencore-amrwb --enable-libopenjpeg 
--enable-librubberband --enable-libsoxr --enable-libspeex --enable-libsrt 
--enable-libvorbis --enable-libopus --enable-libtheora --enable-libvidstab 
--enable-libvo-amrwbenc --enable-libvpx --enable-libwebp --enable-libx264 
--enable-libx265 --enable-libxml2 --enable-libdav1d --enable-libxvid 
--enable-libzvbi --enable-libzimg
  libavutil  56. 70.100 / 56. 70.100
  libavcodec 58.134.100 / 58.134.100
  libavformat    58. 76.100 / 58. 76.100
  libavdevice    58. 13.100 / 58. 13.100
  libavfilter 7.110.100 /  7.110.100
  libswscale  5.  9.100 /  5.  9.100
  libswresample   3.  9.100 /  3.  9.100
  libpostproc    55.  9.100 / 55.  9.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'Demo 31.10.2020 Köln 
Vorspiel+Zugriff.mp4':
  Metadata:
    major_brand : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf58.76.100
  Duration: 00:07:30.03, start: 0.00, bitrate: 2580 kb/s
  Stream #0:0(und): Video: h264 (High) (avc1 / 0x31637661), yuv420p, 1280x720, 
2508 kb/s, 29.98 fps, 120 tbr, 90k tbn, 2k tbc (default)
    Metadata:
  handler_name    : VideoHandler
  vendor_id   : [0][0][0][0]
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 
63 kb/s (default)
    Metadata:
  handler_name    : VideoHandler
  vendor_id   : [0][0][0][0]
Stream mapping:
  Stream #0:0 -> #0:0 (h264 (native) -> h264 (libx264))
  Stream #0:1 -> #0:1 (aac (native) -> aac (native))
Press [q] to stop, [?] for help
[libx264 @ 0x65debc0] using cpu capabilities: MMX2 SSE2Fast SSSE3 SSE4.1 Cache64
[libx264 @ 0x65debc0] profile High, level 3.1, 4:2:0, 8-bit
[libx264 @ 0x65debc0] 264 - core 161 r3048 b86ae3c - H.264/MPEG-4 AVC codec - 
Copyleft 2003-2021 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 
deblock=1:0:0 analyse=0x3:0x113 me=hex subme=7 psy=1 psy_rd=1.00:0.00 
mixed_ref=1 me_range=16 chroma_me=1 trellis=1 8x8dct=1 cqm=0 deadzone=21,11 
fast_pskip=1 chroma_qp_offset=-2 threads=3 lookahead_threads=1 sliced_threads=0 
nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=3 
b_pyramid=2 b_adapt=1 b_bias=0 direct=1 weightb=1 open_gop=0 weightp=2 
keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=crf 
mbtree=1 crf=23.0 qcomp=0.60 qpmin=0 qpmax=69 qpstep=4 ip_ratio=1.40 aq=1:1.00
Output #0, mp4, to 'Demo 31.10.2020 Köln Vorspiel+Zugriff Ausschnitt.mp4':
  Metadata:
    major_brand : isom
    minor_version   : 512
    compatible_brands: isomiso2avc1mp41
    encoder : Lavf58.76.100
  Stream #0:0(und): Video: h264 (avc1 / 0x31637661), yuv420p(progressive), 
640x320, q=2-31, 120 fps, 15360 tbn (default)
    Metadata:
  handler_name    : VideoHandler
  vendor_id   : [0][0][0][0]
  encoder : Lavc58.134.100 libx264
    Side data:
  cpb: bitrate max/min/avg: 0/0/0 buffer size: 0 vbv_delay: N/A
  Stream #0:1(und): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, mono, fltp, 
69 kb/s (default)
    Metadata:
  handler_name    : VideoHandler
  vendor_id   : [0][0][0][0]
  encoder : Lavc58.134.100 aac
frame=    1 fps=0.0 q=0.0 size=   0kB time=00:00:00.00 bitrate= 920.9kbits/s
frame=   61 fps=0.0 q=33.0 size=   0kB time=00:00:00.51 bitrate=   0.7kbits/s
frame=   81 fps= 54 q=33.0 size=   0kB time=00:00:00.68 bitrate=   0.6kbits/s
frame=  125 fps= 62 q=33.0 size=   0kB time=00:00:01.04 bitrate= 0.4kbits/s
frame=  165 fps= 65 q=33.0 size=   0kB time=00:00:01.38 bitrate=   0.3kbits/s
frame=  193 fps= 62 q=33.0 size=   0kB time=00:00:01.62 bitrate=   0.2kbits/s
frame=  229 fps= 62 q=33.0 size=   0kB time=00:00:01.92 bitrate= 0.2kbits/s
frame=  265 fps= 63 q=33.0 size= 256kB time=00:00:02.21 bitrate= 945.1kbits/s
frame=  301 fps= 61 q=33.0 size= 256kB time=00:00:02.51 bitrate= 832.9kbits/s
frame=  329 fps= 61 q=33.0 size= 256kB time=00:00:02.73 bitrate= 768.0kbits/s
frame=  361 fps= 60 q=33.0 size= 256kB time=00:00:03.03 bitrate= 692.3kbits/s
frame=  389 fps= 59 q=33.0 size= 256kB time=00:00:03.26 bitrate= 642.5kbits/

-Ulf

___

Re: [FFmpeg-user] Decimal times to frame numbers

2021-08-20 Thread Phil Rhodes via ffmpeg-user
I must reinforce Carl's interpretation here.

To define the problem more formally: timestamps are a necessary and perfectly 
reasonable approach to timing video playback in a variable frame rate 
environment. They're not a particularly good way of looking up specific images 
in a sequence in a fixed frame rate environment. There's a lot of ambiguity, in 
that a whole range of times refers to any one image, and applications working 
in a fixed frame rate environment (which means absolutely all of film and 
television production) have to make some sort of hopefully-sensible estimated 
conversion, which can cause, and has caused, real-world problems. Issues arise 
particularly when bringing variable frame rate material into a fixed frame rate 
environment, which may be as simple as video shot on cellphones, which often 
don't maintain a very consistent frame rate even when asked to.

The reason it feels like there isn't any very obvious and guaranteed-correct 
way to convert frame counts to timestamps and vice versa is that... there isn't 
any guaranteed-correct way to do that. That doesn't mean it shouldn't be done.

People involved with the ffmpeg project in the past have made some odd 
statements about timecode, though; I don't think most of the people involved in 
the project have much experience of real-world film and TV production, so I 
wouldn't hold your breath for a fix.

P


Re: [FFmpeg-user] Decimal times to frame numbers

2021-08-20 Thread Carl Zwanzig

I hate to disagree with you, but

On 8/19/2021 5:56 AM, Nicolas George wrote:

> No, it is the other way around: you should not be thinking about frame
> numbers, you should be thinking about timestamps.

Not necessarily.

> Timestamps are an inherent property of the frame, they will be preserved
> or converted by filters and storage. Frame numbers are not.

Frame numbers are not an internal property of a frame itself, but I'm sure 
you know that in production and editing environments*, nobody cares about a 
timestamp; they care that this frame is 3 seconds and 5 frames from that 
one, that the clip is 32 seconds and 13 frames long, or that a transition 
takes 19 frames. Frame numbers are ordinal from an arbitrary point.

And it's not relevant that they be "preserved by filters" because they don't 
exist in that context anyway; it's kind of like saying that a 20-story 
building doesn't have a 13th floor: it surely does, even if it's labeled "14".


*which tend to be fixed rate, not variable

It's also quite a lot easier and more clear and precise to specify a point 
as 00:00:00.13 (the thirteenth frame interpreted at the current frame rate) 
than as 0.43377... (at 29.97); it will always be the 13th frame of the 
second and can't mean anything else.
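
To make the arithmetic concrete, here is a small sketch of converting a frame count to decimal seconds and back with exact fractions. It assumes "29.97" means the NTSC rate 30000/1001; the function names are invented for illustration and are not part of any ffmpeg tool.

```python
from fractions import Fraction
import math

# Assumption: "29.97 fps" stands for the exact NTSC rate 30000/1001.
NTSC = Fraction(30000, 1001)

def frames_to_seconds(frames, fps):
    """Exact time in seconds at which `frames` whole frames have elapsed."""
    return Fraction(frames) / fps

def seconds_to_frames(seconds, fps):
    """Number of whole frames elapsed at `seconds` (rounds down)."""
    return math.floor(Fraction(seconds) * fps)

t = frames_to_seconds(13, NTSC)
print(t)                          # exact value as a fraction
print(round(float(t), 5))         # decimal approximation, as in 0.43377...
print(seconds_to_frames(t, NTSC)) # round trip back to the frame count
```

The exact fraction survives the round trip, whereas a truncated decimal such as 0.4337 lands back on frame 12 when rounded down, which is one reason a frame count is the less ambiguous way to name the position.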



So, yes, frame numbers are meaningful and commonly used. If ffmpeg only 
accepts decimal seconds, so be it, but that complicates matters for a fair 
few users and they're not likely to change to decimals for their daily work.


Later,

z!



[FFmpeg-user] Advice on using silence removal

2021-08-20 Thread Alex R
Hi everyone,

I am attempting to leverage ffmpeg in a project that involves recording
short audio clips. So far I have gotten some mixed results and I'd like to
tap into your collective knowledge to ensure my approach is sound.

Context:
- a person records an audio clip of themselves pronouncing a word (imagine
that you read aloud a flash-card that says "tree" or "helicopter")
- the recording is usually made on a mobile phone

The clip contains some silence at both ends, because there is a delay
between the moment the user presses the record button, the moment they
pronounce their word, and the moment they press "stop". Depending on the
device, there may also be an audible click in the beginning.

My objective is to trim the silence at both ends and apply fade-in/out to
soften the clicks, if any.

The challenges are:
- ffmpeg's silenceremove filter needs a threshold value, however,
- each user is in their own environment, with different levels of ambient
noise
- each device is unique in terms of sensitivity

Thus, I can achieve my desired result with one specific clip through trial
and error, tinkering with thresholds until I get what I need. But I cannot
figure out how to detect these thresholds automatically, such that I can
replicate the result with a broad range of users, environments and
recording devices.

Note that there is no expectation to produce perfect results that match the
quality of an audio recording studio, I'm more in the "rough, but good
enough for practical purposes" territory.

Having read the documentation and various forums, I put together this
pipeline (actual commands in the appendix):

1. run volumedetect to see what the maximum level is
1a. parse stdout to extract `max_volume`
2. normalize audio by applying the opposite gain (-`max_volume`)
3. apply silenceremove with 
3a. for the beginning of the file
3b. reverse the stream (areverse) and run another silenceremove for the
beginning (which is actually the end)
3c. reverse it back and save the output



What I read in the forums gave me the impression that we need step#2 such
that at step#3 we could say the threshold is 0. However, that is not the
case, I still had to find a reasonable threshold via trial and error.

After I found a value that produces a good result, I assumed that it might
be good enough for practical purposes and it would be OK to simply hardcode
it into my code as a magic number. However, on the next day I attempted to
replicate the results using the same recording device in the same room -
but this time ffmpeg would tell me the filtered stream is empty, nothing to
write. The environment wasn't 100% identical, since I'm not doing this in a
controlled lab, but most of the variables are the same, though perhaps the
windows were open and it was a different time of the day, so the baseline
noise level outside was somewhat different.

Clearly, my approach is not robust. I'd like to understand whether there
is any low-hanging fruit that I can try, or if I'm not on the right track.

I imagine that the solution I need would somehow determine the silence
threshold relative to the rest of the file, instead of using a
"one-size-fits-all" value. However, I did not find such filters or analyzers
in ffmpeg.
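
To illustrate what a file-relative threshold could look like, here is a rough sketch in plain Python. It is not an ffmpeg feature: it assumes the audio has already been decoded to a list of float samples in [-1, 1], and the window size, floor_fraction and margin_db values are hypothetical tuning knobs.

```python
import math

def window_levels_db(samples, window):
    """RMS level (dBFS) of each consecutive full window of float samples."""
    levels = []
    for i in range(0, len(samples) - window + 1, window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(x * x for x in chunk) / window)
        levels.append(20 * math.log10(max(rms, 1e-10)))  # clamp to avoid log10(0)
    return levels

def relative_threshold_db(levels, floor_fraction=0.1, margin_db=6.0):
    """Noise floor = mean of the quietest windows; threshold = floor + margin."""
    ordered = sorted(levels)
    k = max(1, int(len(ordered) * floor_fraction))
    noise_floor = sum(ordered[:k]) / k
    return noise_floor + margin_db

# Synthetic clip: quiet lead-in, a loud "word", quiet tail.
samples = [0.001] * 1000 + [0.5] * 1000 + [0.001] * 1000
levels = window_levels_db(samples, window=100)
threshold = relative_threshold_db(levels)
voiced = [i for i, level in enumerate(levels) if level > threshold]
print(round(threshold, 1))    # roughly -54 dB for this synthetic input
print(voiced[0], voiced[-1])  # first and last window above the threshold
```

The computed value could then be substituted for the hardcoded start_threshold in the silenceremove call, so that each recording gets a threshold derived from its own background noise.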


Your guidance will be greatly appreciated,
Alex




Appendix, pipeline commands

1. ffmpeg -i input.mp3 -af "volumedetect"  -f null /dev/null
here I parse stdout, looking for something like "[Parsed_volumedetect_0 @
0x559dbe815f00] max_volume: -15.9 dB"

2. ffmpeg -i input.mp3 -af "volume=15.9dB" out2-normalized.mp3

3. ffmpeg -i out2-normalized.mp3 -af
silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,areverse,silenceremove=start_periods=1:start_duration=0:start_threshold=-6dB:start_silence=0.5,afade=t=in:st=0:d=0.3,areverse,afade=t=in:st=0:d=0.3
out3-trimmed.mp3
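
Steps 1a and 2 can be glued together with a few lines of Python. This is a sketch only: the function name is made up, and it works on text that has already been captured from ffmpeg. Note that ffmpeg normally writes its log, including the volumedetect lines, to stderr rather than stdout, so that is the stream to capture.

```python
import re

def normalization_gain_db(log_text):
    """Return the gain (in dB) that brings max_volume up to 0 dB.

    `log_text` is ffmpeg's captured log output containing a volumedetect
    line such as "max_volume: -15.9 dB".
    """
    match = re.search(r"max_volume:\s*(-?\d+(?:\.\d+)?)\s*dB", log_text)
    if match is None:
        raise ValueError("no max_volume line found in the log")
    return -float(match.group(1))

sample = "[Parsed_volumedetect_0 @ 0x559dbe815f00] max_volume: -15.9 dB"
gain = normalization_gain_db(sample)
print(f"volume={gain:.1f}dB")  # volume=15.9dB
```

The printed string is the filter argument used in step 2.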


An example of an input file is available at
railean.net/files/public-temp/in-fresh.mp3, after normalization you can
hear some church bells in the distance. I'm totally fine with them
remaining audible in the result, as long as the leading and trailing
silence is removed.