Hi,

I'm trying to understand how pix_fmt and colorspace work together when encoding h264 and h265 (using the x264 and x265 encoders, respectively). This is a very long email, but I feel I need to provide context for what I'm trying to understand. My questions are marked Q1 to Q3 below.

My original goal was to get some videos encoded with h265 in the RGB, YCoCg, and ICtCp color spaces, to understand the effect of the different color spaces on the actual encoded sizes. I was expecting the following:

* ICtCp should reduce the encoded size of a file, but only by a small amount. The color space is supposed to be optimized to concentrate the video energy in the I component, leaving the Ct and Cp ones with very little energy.
* YCoCg should increase the encoded size of a file, but only by a small amount. The color space is interesting because the conversion operations are very simple (no FP ops, just shifts and adds/subs).
* RGB should increase the encoded size of the file significantly when compared to YUV444. I am expecting up to 3x (if we assume the RGB->YUV axis rotation puts all the energy on the Y, and therefore the chromas are free to encode).

First I checked the standards: All 3 color spaces are part of the h265 standard (described in Table E.5 of the 2018/02 standard). The first 2 (RGB and YCoCg) are part of the h264 standard (described in Table E.5 of the 2016/02 standard).

So I tried to encode some h265 files. I tried the "-colorspace <colorspace>" option, alongside "yuv444p" as the pixel format (I want to avoid the effect of chroma subsampling in the final file sizes).

First I get the yardstick. Input is a 15-second excerpt from Tears of Steel (1280x800).

```
$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv420p -qp 30 out.yuv420p.265
```

Step 1. The file is 1701366 bytes. Let's see the effect of chroma subsampling:

```
$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -qp 30 out.yuv444p.265
```

Step 2. The file is 1706445 bytes, or 1.003x. While I understand most of the benefit of chroma subsampling is due to the RGB->YUV conversion, the number feels on the smaller side.

I checked the contents of the .265 files (using [h265nal](https://github.com/chemag/h265nal)).
I ran the following command:

```
$ h265nal --noas-one-line --add-length --add-checksum out.yuv444p.265 > out.yuv444p.265.txt
```

Step 3. I then diffed the parser outputs of the yuv420p and yuv444p files. I can see the following differences (there are more, but I thought these were the interesting ones):

* general_profile_idc is 1 for yuv420p, and 4 for yuv444p (both VPS and SPS)
* chroma_format_idc is 1 and 3, respectively (4:2:0 and 4:4:4)
* all the slice payloads are different

This looks legit. Clearly the first file is 4:2:0, and the second 4:4:4, and the contents are different.

OK, let's try with the RGB, YCoCg, and ICtCp color spaces.

```
$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -colorspace rgb -qp 30 out.yuv444p.rgb.265
$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -colorspace ycocg -qp 30 out.yuv444p.ycocg.265
$ ffmpeg -y -i input -c:v libx265 -pix_fmt yuv444p -colorspace ictcp -qp 30 out.yuv444p.ictcp.265
```

Step 4. File sizes are 1706466, 1706466, and 1706473 bytes, respectively.

The same size in the RGB and YCoCg files is very suspicious. Checking their NALU data, I can see that the file contents are exactly the same, except:

* the PREFIX_SEI NALUs are the same size, but contain different contents. This is likely the ffmpeg parameters encoded in the SEI ("colormatrix=0" for RGB vs. "colormatrix=8" for YCoCg).
* the SPS NALUs are the same size, but contain different `matrix_coeffs` values (again, 0 for RGB, and 8 for YCoCg) in the VUI

That's it. Everything else is the same, including all the NALU slice payloads. This makes no sense, as one of the encoded files is RGB, and the other is a slight variation of YCoCg. In fact, playing the RGB file I can see the colors are heavily distorted, as when you cast RGB to YUV.

Checking the ICtCp file, the size is 7 bytes bigger, which is explained by the 7x PREFIX_SEI NALUs in my file (the encoded string is "colormatrix=14", which is one character longer). Apart from the SEI NALUs, the SPS NALUs are also different, as their `VUI.matrix_coeffs` value is 14.
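
The 7-byte difference can be checked with a little arithmetic. This is a minimal sketch, assuming (as the NALU comparison suggests) that the only size difference between the three files is the length of the "colormatrix=N" string carried in each of the 7 PREFIX_SEI NALUs. The variable names are made up for illustration.

```python
# Assumption: the only size difference between the files is the
# "colormatrix=<N>" string carried in each PREFIX_SEI NALU.
NUM_PREFIX_SEI = 7  # number of PREFIX_SEI NALUs reported for these files

# matrix_coeffs values observed in the SPS VUI of each file
matrix_coeffs = {"rgb": 0, "ycocg": 8, "ictcp": 14}

# Length of the SEI string for each colorspace
sei_len = {name: len(f"colormatrix={value}") for name, value in matrix_coeffs.items()}

# Extra bytes relative to the rgb file, summed over all PREFIX_SEI NALUs
extra_bytes = {name: (n - sei_len["rgb"]) * NUM_PREFIX_SEI for name, n in sei_len.items()}
print(extra_bytes)  # {'rgb': 0, 'ycocg': 0, 'ictcp': 7}

# Measured sizes (bytes) from step 4
measured = {"rgb": 1706466, "ycocg": 1706466, "ictcp": 1706473}
for name, size in measured.items():
    assert size - measured["rgb"] == extra_bytes[name]
```

The predicted deltas match the measured sizes, which is consistent with the slice payloads being byte-identical across the three files.
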
Again, all the NALU slice payloads are the same.

Compared with the original yuv444p file, I see similar differences:

* SPS VUIs are different. When comparing the rgb/ycocg/ictcp cases with the simple yuv444p, I can see that there are more differences in the SPS VUI fields, which can be explained because in order to set `matrix_coeffs` you need to first set the `colour_description_present_flag`, and also provide valid `colour_primaries` and `transfer_characteristics` values.
* the PREFIX_SEI values are different

But all the slice payloads are exactly the same.

So, it seems that setting the `-colorspace` parameter only affects the SPS VUI. The encoded content will always be the yuv444p input, but it will be marked as a different color space in the SPS VUI.

*Q1: Is this intended?*

Finally I tried using "rgb24" as the pix_fmt, and "rgb" as the colorspace.

```
$ ffmpeg -y -i input -c:v libx265 -pix_fmt rgb24 -colorspace rgb -qp 30 out.rgb24.rgb.265
```

Step 5. The file is 2766445 bytes. That's 1.6x the yuv444p case. This is lower than what I expected, but in the ballpark.

Compared with the previous attempt to produce RGB (using yuv444p), I can see the following differences:

* the SPS.VUI `video_full_range_flag` is 1 now (before it was 0). Changing the pix_fmt causes the video to be encoded full range instead of limited range. Other than that, the VPS/SPS/PPS content is exactly the same
* the slice payloads are all different

The out.rgb24.rgb.265 file has the colors right, which suggests this is the right way to get h265 video using RGB.

| File                  | Size (bytes) |
| --------------------- | ------------ |
| out.yuv420p.265       | 1701366      |
| out.yuv444p.265       | 1706445      |
| out.yuv444p.rgb.265   | 1706466      |
| out.yuv444p.ycocg.265 | 1706466      |
| out.yuv444p.ictcp.265 | 1706473      |
| out.rgb24.rgb.265     | 2766445      |

So it seems that the pix_fmt parameter does affect what is actually sent to the encoder, while the colorspace parameter only affects how the SPS.VUI is set.
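
The ratios quoted for the h265 files can be recomputed from the sizes listed for them. This is a minimal sketch; the `ratio` helper is made up for illustration, and the sizes are simply copied from the measurements in this email.

```python
# File sizes in bytes, copied from the h265 measurements
sizes = {
    "out.yuv420p.265": 1701366,
    "out.yuv444p.265": 1706445,
    "out.yuv444p.rgb.265": 1706466,
    "out.yuv444p.ycocg.265": 1706466,
    "out.yuv444p.ictcp.265": 1706473,
    "out.rgb24.rgb.265": 2766445,
}


def ratio(name, baseline):
    """Size of `name` relative to `baseline`."""
    return sizes[name] / sizes[baseline]


# Cost of 4:4:4 over 4:2:0
print(f"{ratio('out.yuv444p.265', 'out.yuv420p.265'):.3f}x")  # 1.003x
# Cost of real RGB input over yuv444p
print(f"{ratio('out.rgb24.rgb.265', 'out.yuv444p.265'):.2f}x")  # 1.62x
```

Note that the three `-colorspace`-tagged yuv444p files all stay within a few dozen bytes of the untagged one, which is what one would expect if only headers and SEI strings change.
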
Since only pix_fmt changes what is actually encoded, this suggests there's no actual support to encode anything but vanilla YUV: Running `ffmpeg -pix_fmts` I see lots of combinations of 'y', 'u', 'v', 'j' (IIUC an obsolete way to denote full range), and 'a' (alpha channels), but nothing else.

*Q2: Did I get this right?*

Finally, I repeated all the experiments (steps 1 to 5) with h264. As for the parser, I used [h264nal](https://github.com/chemag/h264nal). The commands were the same except for using "libx264" instead of "libx265". The file sizes are now:

| File                  | Size (bytes) |
| --------------------- | ------------ |
| out.yuv420p.264       | 2023618      |
| out.yuv444p.264       | 2208675      |
| out.yuv444p.rgb.264   | 2208703      |
| out.yuv444p.ycocg.264 | 2208703      |
| out.yuv444p.ictcp.264 | 2208703      |
| out.rgb24.rgb.264     | 2208703      |
| out.rgb24.264         | 6002892      |

Interesting differences:

* the cost of going from 4:2:0 to 4:4:4 is 1.09x, which seems more logical than the 1.003x we saw in h265.
* the file produced using "`-pix_fmt rgb24 -colorspace rgb`" is the same as the one produced using "`-pix_fmt yuv444p -colorspace rgb`". Of course, the colors are again broken.

The fact that the behavior is different for x264 vs. x265 is kind of worrying.

I found that you can use "`libx264rgb`" as the encoder.

```
$ ffmpeg -y -i input.264 -c:v libx264rgb -pix_fmt rgb24 -qp 30 out.rgb24.264
```

Step 6. Note that the file size for libx264rgb is 2.7x the size of the yuv444p experiments, which is closer to my original 3x expectation than in the x265 case.

Comparing the produced file with out.yuv444p.rgb.264, I can see the following differences:

* the libx264rgb file is marked as full range (SPS.VUI `video_full_range_flag` field is 1)
* the libx264rgb file uses different chroma QP index offsets
* all the slice payloads are different

Assuming I understood the function of pix_fmt and colorspace from the h265 study, my hunch here is that the "`-pix_fmt rgb24 -colorspace rgb`" combination is broken in h264, and that the libx264rgb mode was created to fix the case.
This would explain why there is no libx265rgb.

*Q3: Did I get this right?*

Thanks,
-Chema

PS: In case it matters, I'm running my own (modern) ffmpeg build: N-99932-g17a0dfebf5
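
As a footnote to the YCoCg expectation near the top of this email ("no FP ops, just shifts and adds/subs"), here is a minimal sketch of what such a conversion can look like. Note the assumptions: this is the reversible YCoCg-R lifting variant, picked because it round-trips exactly in integers. It is not necessarily the exact matrix signaled by `matrix_coeffs` 8, and it says nothing about what ffmpeg, x264, or x265 do internally. The function names are made up for illustration.

```python
def rgb_to_ycocg_r(r, g, b):
    """Forward YCoCg-R (assumed variant): integer adds, subtracts, and shifts only."""
    co = r - b
    t = b + (co >> 1)
    cg = g - t
    y = t + (cg >> 1)
    return y, co, cg


def ycocg_r_to_rgb(y, co, cg):
    """Inverse YCoCg-R: undoes the lifting steps in reverse order."""
    t = y - (cg >> 1)
    g = cg + t
    b = t - (co >> 1)
    r = b + co
    return r, g, b


# A gray pixel ends up with all of its energy in Y
print(rgb_to_ycocg_r(255, 255, 255))  # (255, 0, 0)
# A saturated pixel spreads energy into Co and Cg
print(rgb_to_ycocg_r(255, 0, 0))  # (63, 255, -127)
# The transform is exactly invertible
print(ycocg_r_to_rgb(*rgb_to_ycocg_r(255, 0, 0)))  # (255, 0, 0)
```

The gray-pixel case illustrates the "energy moves to the luma-like component" intuition behind the size expectations: low-saturation content leaves Co and Cg close to zero.
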