RE: Magic in videobuf

2010-03-17 Thread Pawel Osciak
Hans Verkuil wrote:
Andy Walls wrote:
On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
 Hans Verkuil wrote:
  Hans Verkuil wrote:
  Pawel Osciak wrote:

  is anyone aware of any other uses for MAGIC_CHECK()s in videobuf
 code
  besides driver debugging? I intend to remove them, as we weren't
 able
  to find any particular use for them when we were discussing this
 at
  the memory handling meeting in Norway...
  It is a sort of paranoid check to avoid the risk of mass memory
  corruption
  if something goes deadly wrong with the video buffers.
 

  What on earth does this magic check have to do with possible DMA
  overruns/memory corruption? This assumes that somehow exactly these
  magic
  fields are overwritten and that you didn't crash because of memory
  corruption elsewhere much earlier.

  All it does is oops anyway, so it really doesn't 'avoid' a crash
  (as if you could in such scenarios). And most likely the damage has
 been
  done already in that case.
  It won't avoid the damage, but the error message could potentially
 help
  to track the issue. It will also likely limit the damage.
 
  Please let us get rid of this. It makes no sense whatsoever.
  I don't have a strong opinion about this subject, but if this code
 might
  help
  to avoid propagating the damage and to track the issue, I don't see
 why we
  need to remove it, especially since it is easy to disable the entire
 logic
  by just adding a few #if's to remove this code on environments where
 no
  problem is expected.
 
  It is highly unlikely that this code ever prevented these issues.
  Especially given the places where the check is done. I think this is
 just
  debug code that has been dragged along for all these years without
 anyone
  bothering to remove it.

 I remember I had to re-format one disk, during that time, due to a
 videobuf issue.
 So, those checks help people that are touching at the videobuf code,
 reducing the
 chances of damaging their disk partitions when trying to implement
 overlay mode and
 userptr on the videobuf implementations that misses those features, or
 when
 working on a different mmap() logic at the driver.


In a previous job, working on a particularly large application, I had
occasional corruption in a shared memory segment that was shared by many
writer processes and 2 readers.  A simple checksum on the data header
(and contents if appropriate) was enough to detect corrpution and avoid
dereferencing a corrupted pointer to the next data element (when walking
a data area filled with Key-Length-Value encoded data).

This forward error detection was inelegant to me - kind of like
putting armor on one's car instead of learning to drive properly.  I
only resorted to using the checksum because there was almost no way to
find which process was corrupting shared memory in a reasonable amount
of time.  It allowed me to change a show stopper bug into an annoying
data presentation bug, so the product could be released to a production
environment.

In a development environment, it would be much better to disable such
defensive coding and let the kernel Oops.  You'll never find the
problems if you keep hiding them from yourself.

 So, to sum up (I hope I understood you guys correctly):

 we are not seeing any particular reason (besides debugging) for having
 the checks in videobuf-core. Checks in memory-specific handling may have
 some
 uses, although I am not sure how much. I am not an expert on sg drivers,
 but as
 the magics are in the kernel control structures, they are not really a
 subject
 to corruption. What may get corrupted is video data or sg lists, but the
 magics
 are in a separate memory areas anyway. So videobuf-core magics should be
 removed
 and we are leaning towards removing memory-type magics as well?

That is my opinion, yes. However, there is one case where this is actually
useful. Take for example the function videobuf_to_dma in
videobuf-dma-sg.c. This is called by drivers and it makes sense that that
function should double-check that the videobuf_buffer is associated with
the dma_sg memtype.

But calling this 'magic' is a poor choice of name. There is nothing magic
about it, in this case it is just an identifier of the memtype. And there
may be better ways to do this check anyway.

I have not done any analysis, but might be enough to check whether the
int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the
whole magic handling can go away in this case.


Well... I see this discussion is dragging on a bit.
I will not be touching magics for now then, at least not until we arrive at
a consensus sometime in the future.


Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland RD Center



--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Magic in videobuf

2010-03-17 Thread Hans Verkuil
On Wednesday 17 March 2010 07:50:27 Pawel Osciak wrote:
That is my opinion, yes. However, there is one case where this is actually
useful. Take for example the function videobuf_to_dma in
videobuf-dma-sg.c. This is called by drivers and it makes sense that that
function should double-check that the videobuf_buffer is associated with
the dma_sg memtype.

But calling this 'magic' is a poor choice of name. There is nothing magic
about it, in this case it is just an identifier of the memtype. And there
may be better ways to do this check anyway.

I have not done any analysis, but might be enough to check whether the
int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the
whole magic handling can go away in this case.


 Well... I see this discussion is dragging on a bit.
 I will not be touching magics for now then, at least not until we arrive at
 a consensus sometime in the future.

You give up too easily :-)

But I agree that it is probably best to start with checkpatch cleanups.

I did a bit more research. The idea that I had to check the int_ops field
doesn't work at the moment because videobuf_qtype_ops contains a wild variety
of functions: some operate on a queue, some on a buffer. Some get a queue
pointer, but really should have gotten a buffer pointer. Ops like copy_to_user
and copy_stream shouldn't have been in the qtype at all, at least not in the
current form.

Anyway, it would help a lot if the qtype_ops are split into two structs: one
for queues, one for buffers. Each queue or buffer struct then has a pointer
to the ops. And that can be used by the qtype code as a check to detect
whether it is called from the right context.

Frankly, in the long term the queue-specific ops should probably be replaced
by buffer-specific ops anyway since we want to have a lot more flexibility
here and possible have different memory types per buffer (or even plane) on
the same queue.

It is my impression that all the queue-specific ops just walk over the buffers
anyway, so they can easily be replaced by buffer-specific ops and the 'walk 
over'
part can be moved to the core. So perhaps switching immediately to buffer-only
ops might actually be better than create separate queue and buffer ops.

Regards,

Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Magic in videobuf

2010-03-16 Thread Pawel Osciak
Andy Walls wrote:
On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
 Hans Verkuil wrote:
  Hans Verkuil wrote:
  Pawel Osciak wrote:

  is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
  besides driver debugging? I intend to remove them, as we weren't able
  to find any particular use for them when we were discussing this at
  the memory handling meeting in Norway...
  It is a sort of paranoid check to avoid the risk of mass memory
  corruption
  if something goes deadly wrong with the video buffers.
 

  What on earth does this magic check have to do with possible DMA
  overruns/memory corruption? This assumes that somehow exactly these
  magic
  fields are overwritten and that you didn't crash because of memory
  corruption elsewhere much earlier.

  All it does is oops anyway, so it really doesn't 'avoid' a crash
  (as if you could in such scenarios). And most likely the damage has been
  done already in that case.
  It won't avoid the damage, but the error message could potentially help
  to track the issue. It will also likely limit the damage.
 
  Please let us get rid of this. It makes no sense whatsoever.
  I don't have a strong opinion about this subject, but if this code might
  help
  to avoid propagating the damage and to track the issue, I don't see why we
  need to remove it, especially since it is easy to disable the entire logic
  by just adding a few #if's to remove this code on environments where no
  problem is expected.
 
  It is highly unlikely that this code ever prevented these issues.
  Especially given the places where the check is done. I think this is just
  debug code that has been dragged along for all these years without anyone
  bothering to remove it.

 I remember I had to re-format one disk, during that time, due to a videobuf 
 issue.
 So, those checks help people that are touching at the videobuf code, 
 reducing the
 chances of damaging their disk partitions when trying to implement overlay 
 mode and
 userptr on the videobuf implementations that misses those features, or when
 working on a different mmap() logic at the driver.


In a previous job, working on a particularly large application, I had
occasional corruption in a shared memory segment that was shared by many
writer processes and 2 readers.  A simple checksum on the data header
(and contents if appropriate) was enough to detect corrpution and avoid
dereferencing a corrupted pointer to the next data element (when walking
a data area filled with Key-Length-Value encoded data).

This forward error detection was inelegant to me - kind of like
putting armor on one's car instead of learning to drive properly.  I
only resorted to using the checksum because there was almost no way to
find which process was corrupting shared memory in a reasonable amount
of time.  It allowed me to change a show stopper bug into an annoying
data presentation bug, so the product could be released to a production
environment.

In a development environment, it would be much better to disable such
defensive coding and let the kernel Oops.  You'll never find the
problems if you keep hiding them from yourself.

So, to sum up (I hope I understood you guys correctly):

we are not seeing any particular reason (besides debugging) for having
the checks in videobuf-core. Checks in memory-specific handling may have some
uses, although I am not sure how much. I am not an expert on sg drivers, but as
the magics are in the kernel control structures, they are not really a subject
to corruption. What may get corrupted is video data or sg lists, but the magics
are in a separate memory areas anyway. So videobuf-core magics should be removed
and we are leaning towards removing memory-type magics as well?


Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland RD Center


--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Magic in videobuf

2010-03-16 Thread Pawel Osciak
Mauro Carvalho Chehab wrote:
 is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
 besides driver debugging? I intend to remove them, as we weren't able
 to find any particular use for them when we were discussing this at
 the memory handling meeting in Norway...
 It is a sort of paranoid check to avoid the risk of mass memory
 corruption
 if something goes deadly wrong with the video buffers.

 The original videobuf, written back in 2001/2002 had this code, and
 I've
 kept it on the redesign I did in 2007, since I know that DMA is very
 badly
 implemented on some chipsets. There are several reports of the video
 driver
 to corrupt the system memory and damaging the disk data when a PCI
 transfer
 to disk happens at the same time that a PCI2PCI data transfer happens
 (This
 basically affects overlay mode, where the hardware is programmed to
 transfer
 data from the video board to the video adapter board).

 The DMA bug is present on several VIA and SYS old chipsets. It happened
 again
 in some newer chips (2007?), and the fix were to add a quirk blocking
 overlay
 mode on the reported broken hardware. It seems that newer BIOSes for
 those
 newer hardware fixed this issue.

 That's said, I never got any report from anyone explicitly saying that
 they
 hit the MAGIC_CHECK() logic.

 I prefer to keep this logic, but maybe we can add a CONFIG option to
 disable it.
 Something like:

 #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
   #define MAGIC_CHECK() ...
 #else
   #define MAGIC_CHECK()
 #endif
 What on earth does this magic check have to do with possible DMA
 overruns/memory corruption? This assumes that somehow exactly these
 magic
 fields are overwritten and that you didn't crash because of memory
 corruption elsewhere much earlier.
 Yes, that's the assumption. As, in general, there are more than one
 videobuffer,
 and assuming that one buffer physical address is close to the other, if
 the data
 got miss-aligned at the DMA, it is likely that the magic number of the
 next buffer
 will be overwritten if something got bad. The real situation will depend
 on how
 fragmented is the memory.

 For the record: we are talking about the magic fields as found in
 include/media/videobuf*.h. None of the magic field there are actually in
 the video buffers. They are in administrative structures or in ops structs
 which are unlikely to be close in memory to the actual buffers.

Well, Pawel's email didn't mentioned that he is referring just to one type
of magic check.

Thanks for the explanation. Although I don't really understand how any of
the magic numbers would help with DMA corruption, short of DMA operations
overwriting random memory areas (or just huge amounts of memory). If I 
understood
correctly, even the magics in dma-sg code are in structs videobuf_dmabuf and
videobuf_dma_sg_memory. Do dma operations touch those structures directly?


Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland RD Center



--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Magic in videobuf

2010-03-16 Thread Hans Verkuil

Andy Walls wrote:
On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
 Hans Verkuil wrote:
  Hans Verkuil wrote:
  Pawel Osciak wrote:

  is anyone aware of any other uses for MAGIC_CHECK()s in videobuf
 code
  besides driver debugging? I intend to remove them, as we weren't
 able
  to find any particular use for them when we were discussing this
 at
  the memory handling meeting in Norway...
  It is a sort of paranoid check to avoid the risk of mass memory
  corruption
  if something goes deadly wrong with the video buffers.
 

  What on earth does this magic check have to do with possible DMA
  overruns/memory corruption? This assumes that somehow exactly these
  magic
  fields are overwritten and that you didn't crash because of memory
  corruption elsewhere much earlier.

  All it does is oops anyway, so it really doesn't 'avoid' a crash
  (as if you could in such scenarios). And most likely the damage has
 been
  done already in that case.
  It won't avoid the damage, but the error message could potentially
 help
  to track the issue. It will also likely limit the damage.
 
  Please let us get rid of this. It makes no sense whatsoever.
  I don't have a strong opinion about this subject, but if this code
 might
  help
  to avoid propagating the damage and to track the issue, I don't see
 why we
  need to remove it, especially since it is easy to disable the entire
 logic
  by just adding a few #if's to remove this code on environments where
 no
  problem is expected.
 
  It is highly unlikely that this code ever prevented these issues.
  Especially given the places where the check is done. I think this is
 just
  debug code that has been dragged along for all these years without
 anyone
  bothering to remove it.

 I remember I had to re-format one disk, during that time, due to a
 videobuf issue.
 So, those checks help people that are touching at the videobuf code,
 reducing the
 chances of damaging their disk partitions when trying to implement
 overlay mode and
 userptr on the videobuf implementations that misses those features, or
 when
 working on a different mmap() logic at the driver.


In a previous job, working on a particularly large application, I had
occasional corruption in a shared memory segment that was shared by many
writer processes and 2 readers.  A simple checksum on the data header
(and contents if appropriate) was enough to detect corrpution and avoid
dereferencing a corrupted pointer to the next data element (when walking
a data area filled with Key-Length-Value encoded data).

This forward error detection was inelegant to me - kind of like
putting armor on one's car instead of learning to drive properly.  I
only resorted to using the checksum because there was almost no way to
find which process was corrupting shared memory in a reasonable amount
of time.  It allowed me to change a show stopper bug into an annoying
data presentation bug, so the product could be released to a production
environment.

In a development environment, it would be much better to disable such
defensive coding and let the kernel Oops.  You'll never find the
problems if you keep hiding them from yourself.

 So, to sum up (I hope I understood you guys correctly):

 we are not seeing any particular reason (besides debugging) for having
 the checks in videobuf-core. Checks in memory-specific handling may have
 some
 uses, although I am not sure how much. I am not an expert on sg drivers,
 but as
 the magics are in the kernel control structures, they are not really a
 subject
 to corruption. What may get corrupted is video data or sg lists, but the
 magics
 are in a separate memory areas anyway. So videobuf-core magics should be
 removed
 and we are leaning towards removing memory-type magics as well?

That is my opinion, yes. However, there is one case where this is actually
useful. Take for example the function videobuf_to_dma in
videobuf-dma-sg.c. This is called by drivers and it makes sense that that
function should double-check that the videobuf_buffer is associated with
the dma_sg memtype.

But calling this 'magic' is a poor choice of name. There is nothing magic
about it, in this case it is just an identifier of the memtype. And there
may be better ways to do this check anyway.

I have not done any analysis, but might be enough to check whether the
int_ops field of videobuf_queue equals the sg_ops pointer. If so, then the
whole magic handling can go away in this case.

Regards,

 Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Magic in videobuf

2010-03-16 Thread Devin Heitmueller
On Tue, Mar 16, 2010 at 12:35 PM, Hans Verkuil hverk...@xs4all.nl wrote:
 That is my opinion, yes. However, there is one case where this is actually
 useful. Take for example the function videobuf_to_dma in
 videobuf-dma-sg.c. This is called by drivers and it makes sense that that
 function should double-check that the videobuf_buffer is associated with
 the dma_sg memtype.

 But calling this 'magic' is a poor choice of name. There is nothing magic
 about it, in this case it is just an identifier of the memtype. And there
 may be better ways to do this check anyway.

It's funny, because when I saw the word magic, I knew *exactly* what
it did.  The notion of putting a magic value at the top of a structure
is not that uncommon (although you don't see it in Linux much).  I've
seen it done many times over the years (all using the term magic
which is why I knew what it did).  The cases where it is really
helpful is when you have lots of (void *) pointers in your buffer
management code, as it lets you detect cases where a function gets
passed the wrong buffer type (which wouldn't fail a compile time due
to the void * pointer).  And by using different magic values for
different structure types, in many cases it will give you a hint where
the problem is (because you would now know *what* type of structure
got passed into the function).

That said, these sorts of cases usually aren't usually intended to
catch *random* memory corruption as they are to help isolate problems
in buffer handling code.

Devin

-- 
Devin J. Heitmueller - Kernel Labs
http://www.kernellabs.com
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Magic in videobuf

2010-03-15 Thread Pawel Osciak
Hello,

is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
besides driver debugging? I intend to remove them, as we weren't able
to find any particular use for them when we were discussing this at
the memory handling meeting in Norway...


Best regards
--
Pawel Osciak
Linux Platform Group
Samsung Poland RD Center

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Magic in videobuf

2010-03-15 Thread Mauro Carvalho Chehab
Hans Verkuil wrote:
 Hi Pawel,

 Pawel Osciak wrote:
 Hello,

 is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
 besides driver debugging? I intend to remove them, as we weren't able
 to find any particular use for them when we were discussing this at
 the memory handling meeting in Norway...
 It is a sort of paranoid check to avoid the risk of mass memory corruption
 if something goes deadly wrong with the video buffers.

 The original videobuf, written back in 2001/2002 had this code, and I've
 kept it on the redesign I did in 2007, since I know that DMA is very badly
 implemented on some chipsets. There are several reports of the video
 driver
 to corrupt the system memory and damaging the disk data when a PCI
 transfer
 to disk happens at the same time that a PCI2PCI data transfer happens
 (This
 basically affects overlay mode, where the hardware is programmed to
 transfer
 data from the video board to the video adapter board).

 The DMA bug is present on several VIA and SYS old chipsets. It happened
 again
 in some newer chips (2007?), and the fix were to add a quirk blocking
 overlay
 mode on the reported broken hardware. It seems that newer BIOSes for those
 newer hardware fixed this issue.

 That's said, I never got any report from anyone explicitly saying that
 they
 hit the MAGIC_CHECK() logic.

 I prefer to keep this logic, but maybe we can add a CONFIG option to
 disable it.
 Something like:

 #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
  #define MAGIC_CHECK() ...
 #else
  #define MAGIC_CHECK()
 #endif
 
 What on earth does this magic check have to do with possible DMA
 overruns/memory corruption? This assumes that somehow exactly these magic
 fields are overwritten and that you didn't crash because of memory
 corruption elsewhere much earlier. 

Yes, that's the assumption. As, in general, there are more than one videobuffer,
and assuming that one buffer physical address is close to the other, if the data
got miss-aligned at the DMA, it is likely that the magic number of the next 
buffer
will be overwritten if something got bad. The real situation will depend on how 
fragmented is the memory.

 It pollutes the code

There are only 18 occurences of MAGIC* at a given videobuf driver:
$ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l
18

So, I don't think it is too much pollution.

 for no good
 reason. All it does is oops anyway, so it really doesn't 'avoid' a crash
 (as if you could in such scenarios). And most likely the damage has been
 done already in that case.

It won't avoid the damage, but the error message could potentially help
to track the issue. It will also likely limit the damage.

 Please let us get rid of this. It makes no sense whatsoever.

I don't have a strong opinion about this subject, but if this code might help
to avoid propagating the damage and to track the issue, I don't see why we
need to remove it, especially since it is easy to disable the entire logic
by just adding a few #if's to remove this code on environments where no
problem is expected.

-- 

Cheers,
Mauro
--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Magic in videobuf

2010-03-15 Thread Hans Verkuil

 Hans Verkuil wrote:
 Hi Pawel,

 Pawel Osciak wrote:
 Hello,

 is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
 besides driver debugging? I intend to remove them, as we weren't able
 to find any particular use for them when we were discussing this at
 the memory handling meeting in Norway...
 It is a sort of paranoid check to avoid the risk of mass memory
 corruption
 if something goes deadly wrong with the video buffers.

 The original videobuf, written back in 2001/2002 had this code, and
 I've
 kept it on the redesign I did in 2007, since I know that DMA is very
 badly
 implemented on some chipsets. There are several reports of the video
 driver
 to corrupt the system memory and damaging the disk data when a PCI
 transfer
 to disk happens at the same time that a PCI2PCI data transfer happens
 (This
 basically affects overlay mode, where the hardware is programmed to
 transfer
 data from the video board to the video adapter board).

 The DMA bug is present on several VIA and SYS old chipsets. It happened
 again
 in some newer chips (2007?), and the fix were to add a quirk blocking
 overlay
 mode on the reported broken hardware. It seems that newer BIOSes for
 those
 newer hardware fixed this issue.

 That's said, I never got any report from anyone explicitly saying that
 they
 hit the MAGIC_CHECK() logic.

 I prefer to keep this logic, but maybe we can add a CONFIG option to
 disable it.
 Something like:

 #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
 #define MAGIC_CHECK() ...
 #else
 #define MAGIC_CHECK()
 #endif

 What on earth does this magic check have to do with possible DMA
 overruns/memory corruption? This assumes that somehow exactly these
 magic
 fields are overwritten and that you didn't crash because of memory
 corruption elsewhere much earlier.

 Yes, that's the assumption. As, in general, there are more than one
 videobuffer,
 and assuming that one buffer physical address is close to the other, if
 the data
 got miss-aligned at the DMA, it is likely that the magic number of the
 next buffer
 will be overwritten if something got bad. The real situation will depend
 on how
 fragmented is the memory.

For the record: we are talking about the magic fields as found in
include/media/videobuf*.h. None of the magic field there are actually in
the video buffers. They are in administrative structures or in ops structs
which are unlikely to be close in memory to the actual buffers.

Magic values that are actually put in the buffers themselves might serve
some purpose.


 It pollutes the code

 There are only 18 occurences of MAGIC* at a given videobuf driver:
   $ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l
   18

 So, I don't think it is too much pollution.

It is, because it is absolute not clear what its purpose is, and in this
case  even when I know the purpose it still makes no sense. Code like that
confuses people and does more harm than good.


 for no good
 reason. All it does is oops anyway, so it really doesn't 'avoid' a crash
 (as if you could in such scenarios). And most likely the damage has been
 done already in that case.

 It won't avoid the damage, but the error message could potentially help
 to track the issue. It will also likely limit the damage.

 Please let us get rid of this. It makes no sense whatsoever.

 I don't have a strong opinion about this subject, but if this code might
 help
 to avoid propagating the damage and to track the issue, I don't see why we
 need to remove it, especially since it is easy to disable the entire logic
 by just adding a few #if's to remove this code on environments where no
 problem is expected.

It is highly unlikely that this code ever prevented these issues.
Especially given the places where the check is done. I think this is just
debug code that has been dragged along for all these years without anyone
bothering to remove it.

Regards,

 Hans

-- 
Hans Verkuil - video4linux developer - sponsored by TANDBERG Telecom

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Magic in videobuf

2010-03-15 Thread Mauro Carvalho Chehab
Hans Verkuil wrote:
 Hans Verkuil wrote:
 Hi Pawel,

 Pawel Osciak wrote:
 Hello,

 is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
 besides driver debugging? I intend to remove them, as we weren't able
 to find any particular use for them when we were discussing this at
 the memory handling meeting in Norway...
 It is a sort of paranoid check to avoid the risk of mass memory
 corruption
 if something goes deadly wrong with the video buffers.

 The original videobuf, written back in 2001/2002 had this code, and
 I've
 kept it on the redesign I did in 2007, since I know that DMA is very
 badly
 implemented on some chipsets. There are several reports of the video
 driver
 to corrupt the system memory and damaging the disk data when a PCI
 transfer
 to disk happens at the same time that a PCI2PCI data transfer happens
 (This
 basically affects overlay mode, where the hardware is programmed to
 transfer
 data from the video board to the video adapter board).

 The DMA bug is present on several VIA and SYS old chipsets. It happened
 again
 in some newer chips (2007?), and the fix were to add a quirk blocking
 overlay
 mode on the reported broken hardware. It seems that newer BIOSes for
 those
 newer hardware fixed this issue.

 That's said, I never got any report from anyone explicitly saying that
 they
 hit the MAGIC_CHECK() logic.

 I prefer to keep this logic, but maybe we can add a CONFIG option to
 disable it.
 Something like:

 #ifdef CONFIG_VIDEO_DMA_PARANOID_CHECK
#define MAGIC_CHECK() ...
 #else
#define MAGIC_CHECK()
 #endif
 What on earth does this magic check have to do with possible DMA
 overruns/memory corruption? This assumes that somehow exactly these
 magic
 fields are overwritten and that you didn't crash because of memory
 corruption elsewhere much earlier.
 Yes, that's the assumption. As, in general, there are more than one
 videobuffer,
 and assuming that one buffer physical address is close to the other, if
 the data
 got miss-aligned at the DMA, it is likely that the magic number of the
 next buffer
 will be overwritten if something got bad. The real situation will depend
 on how
 fragmented is the memory.
 
 For the record: we are talking about the magic fields as found in
 include/media/videobuf*.h. None of the magic field there are actually in
 the video buffers. They are in administrative structures or in ops structs
 which are unlikely to be close in memory to the actual buffers.

Well, Pawel's email didn't mentioned that he is referring just to one type
of magic check.


 Magic values that are actually put in the buffers themselves might serve
 some purpose.
 
 It pollutes the code
 There are only 18 occurences of MAGIC* at a given videobuf driver:
  $ grep MAGIC ~v4l/master_hg/v4l/videobuf-dma-sg.c |wc -l
  18

 So, I don't think it is too much pollution.
 
 It is, because it is absolute not clear what its purpose is, and in this
 case  even when I know the purpose it still makes no sense. Code like that
 confuses people and does more harm than good.
 
 for no good
 reason. All it does is oops anyway, so it really doesn't 'avoid' a crash
 (as if you could in such scenarios). And most likely the damage has been
 done already in that case.
 It won't avoid the damage, but the error message could potentially help
 to track the issue. It will also likely limit the damage.

 Please let us get rid of this. It makes no sense whatsoever.
 I don't have a strong opinion about this subject, but if this code might
 help
 to avoid propagating the damage and to track the issue, I don't see why we
 need to remove it, especially since it is easy to disable the entire logic
 by just adding a few #if's to remove this code on environments where no
 problem is expected.
 
 It is highly unlikely that this code ever prevented these issues.
 Especially given the places where the check is done. I think this is just
 debug code that has been dragged along for all these years without anyone
 bothering to remove it.

I remember that when I did the conversion, the memory magic helped a lot to find
several issues. So, yes, they are very useful when debug troubles at videbuf.

I remember I had to re-format one disk, during that time, due to a videobuf 
issue.
So, those checks help people that are touching at the videobuf code, reducing 
the 
chances of damaging their disk partitions when trying to implement overlay mode 
and
userptr on the videobuf implementations that misses those features, or when
working on a different mmap() logic at the driver.

They also helped me with some troubles related to compat32 stuff and troubles 
at 
mmap() logic on the driver, as the videobuf magic hits when the data segment 
is pointing to the wrong place. This may also help to find bugs with troubles 
with
the memory allocators.

By looking only at the adm struct magic, I don't see any problem on getting rid
of them.

-- 

Cheers,
Mauro
--
To unsubscribe from this list: send the line

Re: Magic in videobuf

2010-03-15 Thread Andy Walls
On Mon, 2010-03-15 at 14:26 -0300, Mauro Carvalho Chehab wrote:
 Hans Verkuil wrote:
  Hans Verkuil wrote:
  Pawel Osciak wrote:

  is anyone aware of any other uses for MAGIC_CHECK()s in videobuf code
  besides driver debugging? I intend to remove them, as we weren't able
  to find any particular use for them when we were discussing this at
  the memory handling meeting in Norway...
  It is a sort of paranoid check to avoid the risk of mass memory
  corruption
  if something goes deadly wrong with the video buffers.
 

  What on earth does this magic check have to do with possible DMA
  overruns/memory corruption? This assumes that somehow exactly these
  magic
  fields are overwritten and that you didn't crash because of memory
  corruption elsewhere much earlier.

  All it does is oops anyway, so it really doesn't 'avoid' a crash
  (as if you could in such scenarios). And most likely the damage has been
  done already in that case.
  It won't avoid the damage, but the error message could potentially help
  to track the issue. It will also likely limit the damage.
 
  Please let us get rid of this. It makes no sense whatsoever.
  I don't have a strong opinion about this subject, but if this code might
  help
  to avoid propagating the damage and to track the issue, I don't see why we
  need to remove it, especially since it is easy to disable the entire logic
  by just adding a few #if's to remove this code on environments where no
  problem is expected.
  
  It is highly unlikely that this code ever prevented these issues.
  Especially given the places where the check is done. I think this is just
  debug code that has been dragged along for all these years without anyone
  bothering to remove it.

 I remember I had to re-format one disk, during that time, due to a videobuf 
 issue.
 So, those checks help people that are touching at the videobuf code, reducing 
 the 
 chances of damaging their disk partitions when trying to implement overlay 
 mode and
 userptr on the videobuf implementations that misses those features, or when
 working on a different mmap() logic at the driver.


In a previous job, working on a particularly large application, I had
occasional corruption in a shared memory segment that was shared by many
writer processes and 2 readers.  A simple checksum on the data header
(and contents if appropriate) was enough to detect corrpution and avoid
dereferencing a corrupted pointer to the next data element (when walking
a data area filled with Key-Length-Value encoded data).

This forward error detection was inelegant to me - kind of like
putting armor on one's car instead of learning to drive properly.  I
only resorted to using the checksum because there was almost no way to
find which process was corrupting shared memory in a reasonable amount
of time.  It allowed me to change a show stopper bug into an annoying
data presentation bug, so the product could be released to a production
environment.

In a development environment, it would be much better to disable such
defensive coding and let the kernel Oops.  You'll never find the
problems if you keep hiding them from yourself.


Regards,
Andy

--
To unsubscribe from this list: send the line unsubscribe linux-media in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html