RE: Magic in videobuf
Hans Verkuil wrote: Andy Walls wrote: On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote: Hans Verkuil wrote: Hans Verkuil wrote: Pawel Osciak wrote: is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. All it does is oops anyway, so it really doesn't 'avoid' a crash (as if you could in such scenarios). And most likely the damage has been done already in that case. It won't avoid the damage, but the error message could potentially help to track the issue. It will also likely limit the damage. Please let us get rid of this. It makes no sense whatsoever. I don't have a strong opinion about this subject, but if this code might help to avoid propagating the damage and to track the issue, I don't see why we need to remove it, especially since it is easy to disable the entire logic by just adding a few #if's to remove this code on environments where no problem is expected. It is highly unlikely that this code ever prevented these issues. Especially given the places where the check is done. I think this is just debug code that has been dragged along for all these years without anyone bothering to remove it. I remember I had to re-format one disk, during that time, due to a videobuf issue. So, those checks help people that are touching at the videobuf code, reducing the chances of damaging their disk partitions when trying to implement overlay mode and userptr on the videobuf implementations that misses those features, or when working on a different mmap() logic at the driver. In a previous job, working on a particularly large application, I had occasional corruption in a shared memory segment that was shared by many writer processes and 2 readers. A simple checksum on the data header (and contents if appropriate) was enough to detect corrpution and avoid dereferencing a corrupted pointer to the next data element (when walking a data area filled with Key-Length-Value encoded data). This forward error detection was inelegant to me - kind of like putting armor on one's car instead of learning to drive properly. I only resorted to using the checksum because there was almost no way to find which process was corrupting shared memory in a reasonable amount of time. It allowed me to change a show stopper bug into an annoying data presentation bug, so the product could be released to a production environment. In a development environment, it would be much better to disable such defensive coding and let the kernel Oops. You'll never find the problems if you keep hiding them from yourself. So, to sum up (I hope I understood you guys correctly): we are not seeing any particular reason (besides debugging) for having the checks in videobuf-core. Checks in memory-specific handling may have some uses, although I am not sure how much. I am not an expert on sg drivers, but as the magics are in the kernel control structures, they are not really a subject to corruption. What may get corrupted is video data or sg lists, but the magics are in a separate memory areas anyway. So videobuf-core magics should be removed and we are leaning towards removing memory-type magics as well? That is my opinion, yes. However, there is one case where this is actually useful. Take for example the function videobuf_to_dma in videobuf-dma-sg.c. This is called by drivers and it makes sense that that function should double-check that the videobuf_buffer is associated with the dma_sg memtype. But calling this 'magic' is a poor choice of name. There is nothing magic about it, in this case it is just an identifier of the memtype. And there may be better ways to do this check anyway. I have not done any analysis, but might be enough to check whether the int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the whole magic handling can go away in this case. Well... I see this discussion is dragging on a bit. I will not be touching magics for now then, at least not until we arrive at a consensus sometime in the future. Best regards -- Pawel Osciak Linux Platform Group Samsung Poland RD Center -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Magic in videobuf
On Wednesday 17 March 2010 07:50:27 Pawel Osciak wrote: That is my opinion, yes. However, there is one case where this is actually useful. Take for example the function videobuf_to_dma in videobuf-dma-sg.c. This is called by drivers and it makes sense that that function should double-check that the videobuf_buffer is associated with the dma_sg memtype. But calling this 'magic' is a poor choice of name. There is nothing magic about it, in this case it is just an identifier of the memtype. And there may be better ways to do this check anyway. I have not done any analysis, but might be enough to check whether the int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the whole magic handling can go away in this case. Well... I see this discussion is dragging on a bit. I will not be touching magics for now then, at least not until we arrive at a consensus sometime in the future. You give up too easily :-) But I agree that it is probably best to start with checkpatch cleanups. I did a bit more research. The idea that I had to check the int_ops field doesn't work at the moment because videobuf_qtype_ops contains a wild variety of functions: some operate on a queue, some on a buffer. Some get a queue pointer, but really should have gotten a buffer pointer. Ops like copy_to_user and copy_stream shouldn't have been in the qtype at all, at least not in the current form. Anyway, it would help a lot if the qtype_ops are split into two structs: one for queues, one for buffers. Each queue or buffer struct then has a pointer to the ops. And that can be used by the qtype code as a check to detect whether it is called from the right context. Frankly, in the long term the queue-specific ops should probably be replaced by buffer-specific ops anyway since we want to have a lot more flexibility here and possible have different memory types per buffer (or even plane) on the same queue. It is my impression that all the queue-specific ops just walk over the buffers anyway, so they can easily be replaced by buffer-specific ops and the 'walk over' part can be moved to the core. So perhaps switching immediately to buffer-only ops might actually be better than create separate queue and buffer ops. Regards, Hans -- Hans Verkuil - video4linux developer - sponsored by TANDBERG -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Magic in videobuf
Andy Walls wrote: On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote: Hans Verkuil wrote: Hans Verkuil wrote: Pawel Osciak wrote: is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. All it does is oops anyway, so it really doesn't 'avoid' a crash (as if you could in such scenarios). And most likely the damage has been done already in that case. It won't avoid the damage, but the error message could potentially help to track the issue. It will also likely limit the damage. Please let us get rid of this. It makes no sense whatsoever. I don't have a strong opinion about this subject, but if this code might help to avoid propagating the damage and to track the issue, I don't see why we need to remove it, especially since it is easy to disable the entire logic by just adding a few #if's to remove this code on environments where no problem is expected. It is highly unlikely that this code ever prevented these issues. Especially given the places where the check is done. I think this is just debug code that has been dragged along for all these years without anyone bothering to remove it. I remember I had to re-format one disk, during that time, due to a videobuf issue. So, those checks help people that are touching at the videobuf code, reducing the chances of damaging their disk partitions when trying to implement overlay mode and userptr on the videobuf implementations that misses those features, or when working on a different mmap() logic at the driver. In a previous job, working on a particularly large application, I had occasional corruption in a shared memory segment that was shared by many writer processes and 2 readers. A simple checksum on the data header (and contents if appropriate) was enough to detect corrpution and avoid dereferencing a corrupted pointer to the next data element (when walking a data area filled with Key-Length-Value encoded data). This forward error detection was inelegant to me - kind of like putting armor on one's car instead of learning to drive properly. I only resorted to using the checksum because there was almost no way to find which process was corrupting shared memory in a reasonable amount of time. It allowed me to change a show stopper bug into an annoying data presentation bug, so the product could be released to a production environment. In a development environment, it would be much better to disable such defensive coding and let the kernel Oops. You'll never find the problems if you keep hiding them from yourself. So, to sum up (I hope I understood you guys correctly): we are not seeing any particular reason (besides debugging) for having the checks in videobuf-core. Checks in memory-specific handling may have some uses, although I am not sure how much. I am not an expert on sg drivers, but as the magics are in the kernel control structures, they are not really a subject to corruption. What may get corrupted is video data or sg lists, but the magics are in a separate memory areas anyway. So videobuf-core magics should be removed and we are leaning towards removing memory-type magics as well? Best regards -- Pawel Osciak Linux Platform Group Samsung Poland RD Center -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Magic in videobuf
Mauro Carvalho Chehab wrote: is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. The original videobuf, written back in 2001/2002 had this code, and I've kept it on the redesign I did in 2007, since I know that DMA is very badly implemented on some chipsets. There are several reports of the video driver to corrupt the system memory and damaging the disk data when a PCI transfer to disk happens at the same time that a PCI2PCI data transfer happens (This basically affects overlay mode, where the hardware is programmed to transfer data from the video board to the video adapter board). The DMA bug is present on several VIA and SYS old chipsets. It happened again in some newer chips (2007?), and the fix were to add a quirk blocking overlay mode on the reported broken hardware. It seems that newer BIOSes for those newer hardware fixed this issue. That's said, I never got any report from anyone explicitly saying that they hit the MAGIC_CHECK() logic. I prefer to keep this logic, but maybe we can add a CONFIG option to disable it. Something like: #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK #define MAGIC_CHECK() ... #else #define MAGIC_CHECK() #endif What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. Yes, that's the assumption. As, in general, there are more than one videobuffer, and assuming that one buffer physical address is close to the other, if the data got miss-aligned at the DMA, it is likely that the magic number of the next buffer will be overwritten if something got bad. The real situation will depend on how fragmented is the memory. For the record: we are talking about the magic fields as found in include/media/videobuf*.h. None of the magic field there are actually in the video buffers. They are in administrative structures or in ops structs which are unlikely to be close in memory to the actual buffers. Well, Pawel's email didn't mentioned that he is referring just to one type of magic check. Thanks for the explanation. Although I don't really understand how any of the magic numbers would help with DMA corruption, short of DMA operations overwriting random memory areas (or just huge amounts of memory). If I understood correctly, even the magics in dma-sg code are in structs videobuf_dmabuf and videobuf_dma_sg_memory. Do dma operations touch those structures directly? Best regards -- Pawel Osciak Linux Platform Group Samsung Poland RD Center -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
RE: Magic in videobuf
Andy Walls wrote: On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote: Hans Verkuil wrote: Hans Verkuil wrote: Pawel Osciak wrote: is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. All it does is oops anyway, so it really doesn't 'avoid' a crash (as if you could in such scenarios). And most likely the damage has been done already in that case. It won't avoid the damage, but the error message could potentially help to track the issue. It will also likely limit the damage. Please let us get rid of this. It makes no sense whatsoever. I don't have a strong opinion about this subject, but if this code might help to avoid propagating the damage and to track the issue, I don't see why we need to remove it, especially since it is easy to disable the entire logic by just adding a few #if's to remove this code on environments where no problem is expected. It is highly unlikely that this code ever prevented these issues. Especially given the places where the check is done. I think this is just debug code that has been dragged along for all these years without anyone bothering to remove it. I remember I had to re-format one disk, during that time, due to a videobuf issue. So, those checks help people that are touching at the videobuf code, reducing the chances of damaging their disk partitions when trying to implement overlay mode and userptr on the videobuf implementations that misses those features, or when working on a different mmap() logic at the driver. In a previous job, working on a particularly large application, I had occasional corruption in a shared memory segment that was shared by many writer processes and 2 readers. A simple checksum on the data header (and contents if appropriate) was enough to detect corrpution and avoid dereferencing a corrupted pointer to the next data element (when walking a data area filled with Key-Length-Value encoded data). This forward error detection was inelegant to me - kind of like putting armor on one's car instead of learning to drive properly. I only resorted to using the checksum because there was almost no way to find which process was corrupting shared memory in a reasonable amount of time. It allowed me to change a show stopper bug into an annoying data presentation bug, so the product could be released to a production environment. In a development environment, it would be much better to disable such defensive coding and let the kernel Oops. You'll never find the problems if you keep hiding them from yourself. So, to sum up (I hope I understood you guys correctly): we are not seeing any particular reason (besides debugging) for having the checks in videobuf-core. Checks in memory-specific handling may have some uses, although I am not sure how much. I am not an expert on sg drivers, but as the magics are in the kernel control structures, they are not really a subject to corruption. What may get corrupted is video data or sg lists, but the magics are in a separate memory areas anyway. So videobuf-core magics should be removed and we are leaning towards removing memory-type magics as well? That is my opinion, yes. However, there is one case where this is actually useful. Take for example the function videobuf_to_dma in videobuf-dma-sg.c. This is called by drivers and it makes sense that that function should double-check that the videobuf_buffer is associated with the dma_sg memtype. But calling this 'magic' is a poor choice of name. There is nothing magic about it, in this case it is just an identifier of the memtype. And there may be better ways to do this check anyway. I have not done any analysis, but might be enough to check whether the int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the whole magic handling can go away in this case. Regards, Hans -- Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Magic in videobuf
On Tue, Mar 16, 2010 at 12:35 PM, Hans Verkuil hverk...@xs4all.nl wrote: That is my opinion, yes. However, there is one case where this is actually useful. Take for example the function videobuf_to_dma in videobuf-dma-sg.c. This is called by drivers and it makes sense that that function should double-check that the videobuf_buffer is associated with the dma_sg memtype. But calling this 'magic' is a poor choice of name. There is nothing magic about it, in this case it is just an identifier of the memtype. And there may be better ways to do this check anyway. It's funny, because when I saw the word magic, I knew *exactly* what it did. The notion of putting a magic value at the top of a structure is not that uncommon (although you don't see it in Linux much). I've seen it done many times over the years (all using the term magic which is why I knew what it did). The cases where it is really helpful is when you have lots of (void *) pointers in your buffer management code, as it lets you detect cases where a function gets passed the wrong buffer type (which wouldn't fail a compile time due to the void * pointer). And by using different magic values for different structure types, in many cases it will give you a hint where the problem is (because you would now know *what* type of structure got passed into the function). That said, these sorts of cases usually aren't usually intended to catch *random* memory corruption as they are to help isolate problems in buffer handling code. Devin -- Devin J. Heitmueller - Kernel Labs http://www.kernellabs.com -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Magic in videobuf
Hans Verkuil wrote: Hi Pawel, Pawel Osciak wrote: Hello, is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. The original videobuf, written back in 2001/2002 had this code, and I've kept it on the redesign I did in 2007, since I know that DMA is very badly implemented on some chipsets. There are several reports of the video driver to corrupt the system memory and damaging the disk data when a PCI transfer to disk happens at the same time that a PCI2PCI data transfer happens (This basically affects overlay mode, where the hardware is programmed to transfer data from the video board to the video adapter board). The DMA bug is present on several VIA and SYS old chipsets. It happened again in some newer chips (2007?), and the fix were to add a quirk blocking overlay mode on the reported broken hardware. It seems that newer BIOSes for those newer hardware fixed this issue. That's said, I never got any report from anyone explicitly saying that they hit the MAGIC_CHECK() logic. I prefer to keep this logic, but maybe we can add a CONFIG option to disable it. Something like: #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK #define MAGIC_CHECK() ... #else #define MAGIC_CHECK() #endif What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. Yes, that's the assumption. As, in general, there are more than one videobuffer, and assuming that one buffer physical address is close to the other, if the data got miss-aligned at the DMA, it is likely that the magic number of the next buffer will be overwritten if something got bad. The real situation will depend on how fragmented is the memory. It pollutes the code There are only 18 occurences of MAGIC* at a given videobuf driver: $ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l 18 So, I don't think it is too much pollution. for no good reason. All it does is oops anyway, so it really doesn't 'avoid' a crash (as if you could in such scenarios). And most likely the damage has been done already in that case. It won't avoid the damage, but the error message could potentially help to track the issue. It will also likely limit the damage. Please let us get rid of this. It makes no sense whatsoever. I don't have a strong opinion about this subject, but if this code might help to avoid propagating the damage and to track the issue, I don't see why we need to remove it, especially since it is easy to disable the entire logic by just adding a few #if's to remove this code on environments where no problem is expected. -- Cheers, Mauro -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Magic in videobuf
Hans Verkuil wrote: Hi Pawel, Pawel Osciak wrote: Hello, is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. The original videobuf, written back in 2001/2002 had this code, and I've kept it on the redesign I did in 2007, since I know that DMA is very badly implemented on some chipsets. There are several reports of the video driver to corrupt the system memory and damaging the disk data when a PCI transfer to disk happens at the same time that a PCI2PCI data transfer happens (This basically affects overlay mode, where the hardware is programmed to transfer data from the video board to the video adapter board). The DMA bug is present on several VIA and SYS old chipsets. It happened again in some newer chips (2007?), and the fix were to add a quirk blocking overlay mode on the reported broken hardware. It seems that newer BIOSes for those newer hardware fixed this issue. That's said, I never got any report from anyone explicitly saying that they hit the MAGIC_CHECK() logic. I prefer to keep this logic, but maybe we can add a CONFIG option to disable it. Something like: #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK #define MAGIC_CHECK() ... #else #define MAGIC_CHECK() #endif What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. Yes, that's the assumption. As, in general, there are more than one videobuffer, and assuming that one buffer physical address is close to the other, if the data got miss-aligned at the DMA, it is likely that the magic number of the next buffer will be overwritten if something got bad. The real situation will depend on how fragmented is the memory. For the record: we are talking about the magic fields as found in include/media/videobuf*.h. None of the magic field there are actually in the video buffers. They are in administrative structures or in ops structs which are unlikely to be close in memory to the actual buffers. Magic values that are actually put in the buffers themselves might serve some purpose. It pollutes the code There are only 18 occurences of MAGIC* at a given videobuf driver: $ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l 18 So, I don't think it is too much pollution. It is, because it is absolute not clear what its purpose is, and in this case even when I know the purpose it still makes no sense. Code like that confuses people and does more harm than good. for no good reason. All it does is oops anyway, so it really doesn't 'avoid' a crash (as if you could in such scenarios). And most likely the damage has been done already in that case. It won't avoid the damage, but the error message could potentially help to track the issue. It will also likely limit the damage. Please let us get rid of this. It makes no sense whatsoever. I don't have a strong opinion about this subject, but if this code might help to avoid propagating the damage and to track the issue, I don't see why we need to remove it, especially since it is easy to disable the entire logic by just adding a few #if's to remove this code on environments where no problem is expected. It is highly unlikely that this code ever prevented these issues. Especially given the places where the check is done. I think this is just debug code that has been dragged along for all these years without anyone bothering to remove it. Regards, Hans -- Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Magic in videobuf
Hans Verkuil wrote: Hans Verkuil wrote: Hi Pawel, Pawel Osciak wrote: Hello, is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. The original videobuf, written back in 2001/2002 had this code, and I've kept it on the redesign I did in 2007, since I know that DMA is very badly implemented on some chipsets. There are several reports of the video driver to corrupt the system memory and damaging the disk data when a PCI transfer to disk happens at the same time that a PCI2PCI data transfer happens (This basically affects overlay mode, where the hardware is programmed to transfer data from the video board to the video adapter board). The DMA bug is present on several VIA and SYS old chipsets. It happened again in some newer chips (2007?), and the fix were to add a quirk blocking overlay mode on the reported broken hardware. It seems that newer BIOSes for those newer hardware fixed this issue. That's said, I never got any report from anyone explicitly saying that they hit the MAGIC_CHECK() logic. I prefer to keep this logic, but maybe we can add a CONFIG option to disable it. Something like: #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK #define MAGIC_CHECK() ... #else #define MAGIC_CHECK() #endif What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. Yes, that's the assumption. As, in general, there are more than one videobuffer, and assuming that one buffer physical address is close to the other, if the data got miss-aligned at the DMA, it is likely that the magic number of the next buffer will be overwritten if something got bad. The real situation will depend on how fragmented is the memory. For the record: we are talking about the magic fields as found in include/media/videobuf*.h. None of the magic field there are actually in the video buffers. They are in administrative structures or in ops structs which are unlikely to be close in memory to the actual buffers. Well, Pawel's email didn't mentioned that he is referring just to one type of magic check. Magic values that are actually put in the buffers themselves might serve some purpose. It pollutes the code There are only 18 occurences of MAGIC* at a given videobuf driver: $ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l 18 So, I don't think it is too much pollution. It is, because it is absolute not clear what its purpose is, and in this case even when I know the purpose it still makes no sense. Code like that confuses people and does more harm than good. for no good reason. All it does is oops anyway, so it really doesn't 'avoid' a crash (as if you could in such scenarios). And most likely the damage has been done already in that case. It won't avoid the damage, but the error message could potentially help to track the issue. It will also likely limit the damage. Please let us get rid of this. It makes no sense whatsoever. I don't have a strong opinion about this subject, but if this code might help to avoid propagating the damage and to track the issue, I don't see why we need to remove it, especially since it is easy to disable the entire logic by just adding a few #if's to remove this code on environments where no problem is expected. It is highly unlikely that this code ever prevented these issues. Especially given the places where the check is done. I think this is just debug code that has been dragged along for all these years without anyone bothering to remove it. I remember that when I did the conversion, the memory magic helped a lot to find several issues. So, yes, they are very useful when debug troubles at videbuf. I remember I had to re-format one disk, during that time, due to a videobuf issue. So, those checks help people that are touching at the videobuf code, reducing the chances of damaging their disk partitions when trying to implement overlay mode and userptr on the videobuf implementations that misses those features, or when working on a different mmap() logic at the driver. They also helped me with some troubles related to compat32 stuff and troubles at mmap() logic on the driver, as the videobuf magic hits when the data segment is pointing to the wrong place. This may also help to find bugs with troubles with the memory allocators. By looking only at the adm struct magic, I don't see any problem on getting rid of them. -- Cheers, Mauro -- To unsubscribe from this list: send the line
Re: Magic in videobuf
On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote: Hans Verkuil wrote: Hans Verkuil wrote: Pawel Osciak wrote: is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code besides driver debugging? I intend to remove them, as we weren't able to find any particular use for them when we were discussing this at the memory handling meeting in Norway... It is a sort of paranoid check to avoid the risk of mass memory corruption if something goes deadly wrong with the video buffers. What on earth does this magic check have to do with possible DMA overruns/memory corruption? This assumes that somehow exactly these magic fields are overwritten and that you didn't crash because of memory corruption elsewhere much earlier. All it does is oops anyway, so it really doesn't 'avoid' a crash (as if you could in such scenarios). And most likely the damage has been done already in that case. It won't avoid the damage, but the error message could potentially help to track the issue. It will also likely limit the damage. Please let us get rid of this. It makes no sense whatsoever. I don't have a strong opinion about this subject, but if this code might help to avoid propagating the damage and to track the issue, I don't see why we need to remove it, especially since it is easy to disable the entire logic by just adding a few #if's to remove this code on environments where no problem is expected. It is highly unlikely that this code ever prevented these issues. Especially given the places where the check is done. I think this is just debug code that has been dragged along for all these years without anyone bothering to remove it. I remember I had to re-format one disk, during that time, due to a videobuf issue. So, those checks help people that are touching at the videobuf code, reducing the chances of damaging their disk partitions when trying to implement overlay mode and userptr on the videobuf implementations that misses those features, or when working on a different mmap() logic at the driver. In a previous job, working on a particularly large application, I had occasional corruption in a shared memory segment that was shared by many writer processes and 2 readers. A simple checksum on the data header (and contents if appropriate) was enough to detect corrpution and avoid dereferencing a corrupted pointer to the next data element (when walking a data area filled with Key-Length-Value encoded data). This forward error detection was inelegant to me - kind of like putting armor on one's car instead of learning to drive properly. I only resorted to using the checksum because there was almost no way to find which process was corrupting shared memory in a reasonable amount of time. It allowed me to change a show stopper bug into an annoying data presentation bug, so the product could be released to a production environment. In a development environment, it would be much better to disable such defensive coding and let the kernel Oops. You'll never find the problems if you keep hiding them from yourself. Regards, Andy -- To unsubscribe from this list: send the line unsubscribe linux-media in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html