Hi,
thanks for pointing me to
https://sourceforge.net/p/sox/mailman/message/37325794/. This is exactly the
problem I am trying to solve.
I am somewhat surprised that we can actually recover the whole buffer after a
seek and write because the memory stream maintains a null byte after the data.
I was assuming that this means that we can not simply recover by seeking back
to the end of the buffer because one byte would have been overwritten by that
terminator byte. I tried what you suggested (code attached), and it seems that
the terminating null byte is not moved on a seek operation, but only when the
file descriptor is closed.
Of course it would be desirable to have a proper WAV header even for streams
opened with sox_open_memstream_write(). I can prepare a fix that introduces a
seek back, but I guess I would have to do something similar for all formats
that seek in the output buffer. Before I start to work on that I would like to
get approval from the maintainer that this is the right approach.
Regards,
Sven
From: Sun Zhenliang <hisunzhenli...@outlook.com>
Sent: 07 October 2021 04:24
To: sox-devel@lists.sourceforge.net <sox-devel@lists.sourceforge.net>
Subject: Re: [SoX-devel] [PATCH] formats: disallow seeking in dynamic memory
buffers
在 2021年10月7日 +0800 05:35,Sven Neumann via SoX-devel
<sox-devel@lists.sourceforge.net>,写道:
Hi,
in one of our internal applications we are using SoX (the 14.4.2+git20190427
version from Ubuntu) to convert from a variety of audio formats to the WAV file
format. We observed that the tests for the conversion occasionally failed and
over the last days I found time to dig deeper into this.
We are using sox_open_memstream_write() to write to a dynamically allocated
in-memory stream. In our tests sometimes the size of the resulting WAV buffer
would have the expected size, sometimes it would be 44 bytes, the size of the
WAV header. Valgrind told me that the behavior of is_seekable() in formats.c
depends on uninitialized memory. In your git repository I found a fix for this:
commit bb38934e11035c8fab141f70dabda3afdd17da36
Author: Mans Rullgard <m...@mansr.com>
Date: Tue Aug 4 17:19:49 2020 +0100
format: improve is_seekable() test
Streams opened with fmemopen() do not have an underlying file descriptor,
so the fstat() will fail, and a random result is returned.
A simpler method that works regardless of file type is to call fseek()
and check if it reports success.
Suggested by Stefan Sauer <enso...@google.com>.
Now with this fix applied valgrind was happy, however now our conversion from
MP3 to WAV would always result in only 44 bytes, as read from the
buffer_size_ptr location passed to sox_open_memstream_write(). It turns out
that with above change the undefined behavior is fixed for streams created with
open_memstream() and is_seekable() will now reliably returns sox_true for such
streams. This allows the WAV writer code to do an fseek() to the start of the
stream followed by a write of the WAV header with correct length information.
However such a seek followed by a write causes the dynamically allocated memory
stream to be truncated. Thus after calling sox_close() the size reported for
the stream will be 44 bytes, that's not what we want. Unfortunately we can not
simply fix this by reporting the full buffer size as the buffer will actually
have been truncated, and a trailing null byte is appended after the WAV header.
It looks like we can indeed not seek and fix data in a dynamically allocated
stream. Thus I am attaching a patch that changes the code in formats.c to set
ft->seekable to false for streams opened with open_memstream(). With this
change applied on top of the improvement for the is_seekable() test, our tests
pass reliably and valgrind seems happy as well.
I am attaching the patch here, please consider it for inclusion. I am also
attaching a simple test application that writes to a stream, seeks to the front
and performs another write. The output of this program illustrates that the
buffer is truncated:
buf = `hello', size = 5
buf = `hello, world', size = 12
buf = `heyho', size = 5
Bug mentioned https://sourceforge.net/p/sox/mailman/message/37325794/.
It is possible to fix the whole data. From my test to open_memstream(), the
dynamically allocated memory would not be free before the ft->fp was closed.
The data written is still there if it’s not covered by other write opt.
In this WAV issue, the seek opt is just in sox_close() to rewrite header, which
is the end of transcoding and will not cover the data area. We can just ftell()
to get the end of file before the fseek and seek back to the end after
rewriting.
Output with ftell opt:
buf = `hello', size = 5
buf = `hello, world', size = 12
buf = `heyho, world', size = 12
As for other formats, the situation may be different from WAV and needs
specific
analysis.
Regards,
Sven
_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel
#include <stdio.h>
int
main (void)
{
char *bp;
size_t size;
off_t end;
FILE *stream;
stream = open_memstream (&bp, &size);
fprintf (stream, "hello");
fflush (stream);
printf ("buf = `%s', size = %ld\n", bp, size);
fprintf (stream, ", world");
fflush (stream);
printf ("buf = `%s', size = %ld\n", bp, size);
end = ftell (stream);
fseek (stream, 0, SEEK_SET);
fprintf (stream, "heyho");
fflush (stream);
printf ("buf = `%s', size = %ld\n", bp, size);
fseek (stream, end, SEEK_SET);
fclose (stream);
printf ("buf = `%s', size = %ld\n", bp, size);
return 0;
}
_______________________________________________
SoX-devel mailing list
SoX-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/sox-devel